The Digital Information Technology and Architectures 2016 module at #citylis is drawing to a close.  As I reflect on all of the interesting topics that we have covered this term, it is difficult to select just one to focus on.  So, I will briefly introduce a few subjects and, perhaps like me, you might be inspired to find out more about them in the future.

Searching for the Data

Having worked as a desk researcher, I found this session enlightening because I had never stopped to ask myself whether the underlying data was highly structured, partially structured or had little structure at all, e.g. web pages.  It highlighted to me how many key elements there are to consider when choosing a structure for a dataset.  For example, who is the target audience?  How will they access the data?  How will they use it?  How often will it be updated?  Will it be chargeable and, if so, what methods of payment will be utilised?  What security measures are needed?  What ontologies, taxonomies and metadata will be used?  And the list goes on.

Medline is a highly structured, specialised medical database.  Its users tend to be experts in their domain and have a deep understanding of the subject area.  They require a controlled vocabulary and need to be able to specify exactly where in a document their search terms appear.  They generally receive training to get the most out of the system and, if they require further help, they can call a specialist.  An example of a Medline query might be "the relation between blood pressure and mortality due to coronary heart disease among men in different parts of the world".

In contrast, someone with a more general requirement and/or limited experience in searching or using a computer might be more comfortable using a single search box, e.g. Google, where they can type in a word or phrase.  They will have less control over the results, which are drawn from multiple sources, but the upside is that it will require minimal effort and expertise on their part.  An example of this type of search might be "Brexit".
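The contrast between the two styles of searching can be sketched in a few lines of Python.  This is only an illustration under stated assumptions: the records, field names and subject terms below are invented, and neither function reflects how Medline or Google actually work.

```python
# Toy records: titles, subject terms and abstracts are invented for illustration.
RECORDS = [
    {
        "title": "Blood pressure and coronary mortality in men",
        "subject_terms": ["Blood Pressure", "Coronary Disease", "Mortality"],
        "abstract": "A cohort comparison across several regions.",
    },
    {
        "title": "Dietary salt and health",
        "subject_terms": ["Sodium, Dietary"],
        "abstract": "Discusses blood pressure only in passing.",
    },
]


def controlled_term_search(records, term):
    """Structured search: the term must exactly match an assigned subject term."""
    wanted = term.lower()
    return [
        record["title"]
        for record in records
        if any(wanted == subject.lower() for subject in record["subject_terms"])
    ]


def single_box_search(records, phrase):
    """Single search box: the phrase may appear anywhere in any field."""
    wanted = phrase.lower()
    hits = []
    for record in records:
        # Flatten every field into one block of text before matching.
        parts = []
        for value in record.values():
            parts.extend(value if isinstance(value, list) else [value])
        if wanted in " ".join(parts).lower():
            hits.append(record["title"])
    return hits


print(controlled_term_search(RECORDS, "Blood Pressure"))
print(single_box_search(RECORDS, "blood pressure"))
```

The structured search returns only the record that was indexed under the subject term, while the single search box also returns the record that merely mentions the phrase in passing.  That is the trade-off between control and effort described above.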

Working with the Data

One way to interrogate large amounts of data is to use Application Programming Interfaces (APIs).  An API acts as an interface between a dataset, usually a complex one, and the person or program that wants to use it.  It exposes a defined set of operations for requesting data, and tools built on top of it can allow non-programmers to access data from a database.

Bloomberg provide thousands of financial data series, such as share prices, going back decades.  Without programming skills or a Bloomberg terminal interface, it would not be possible to extract the data and analyse it in Excel.  Bloomberg have an API that plugs into Excel, meaning that it appears as a menu option along with other headings like File, Edit, View and Format.

With Excel and a subscription to Bloomberg, data can be imported using a guide, known as a wizard.  The wizard, in the case of the Bloomberg API, is a series of pop-up windows that contain search boxes, pull-down menus, controlled vocabulary and tick boxes.  It is possible to request the share price for a list of retail companies in Western Europe, with a turnover of at least €10bn, ranked by market capitalisation without knowing a single word of programming code.
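For comparison, the request that the wizard builds through menus and tick boxes amounts to a filter followed by a sort.  The Python sketch below shows that logic only.  It does not use the Bloomberg API, and the `Company` class, the `screen` function and every company and figure in it are hypothetical placeholders.

```python
from dataclasses import dataclass


@dataclass
class Company:
    name: str
    sector: str
    region: str
    turnover_eur_bn: float
    market_cap_eur_bn: float


# Placeholder data: these companies and figures are made up for illustration.
COMPANIES = [
    Company("Example Retail A", "Retail", "Western Europe", 25.0, 18.0),
    Company("Example Retail B", "Retail", "Western Europe", 8.0, 30.0),
    Company("Example Retail C", "Retail", "Western Europe", 12.0, 22.0),
    Company("Example Bank D", "Banking", "Western Europe", 40.0, 50.0),
    Company("Example Retail E", "Retail", "North America", 60.0, 90.0),
]


def screen(companies, sector, region, min_turnover_eur_bn):
    """Keep companies matching the criteria, largest market capitalisation first."""
    matches = [
        company
        for company in companies
        if company.sector == sector
        and company.region == region
        and company.turnover_eur_bn >= min_turnover_eur_bn
    ]
    return sorted(matches, key=lambda c: c.market_cap_eur_bn, reverse=True)


for company in screen(COMPANIES, "Retail", "Western Europe", 10.0):
    print(company.name, company.market_cap_eur_bn)
```

Each pull-down menu in the wizard corresponds to one of the conditions in the filter, which is why the same request can be made without writing any code.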

During the lecture, we had the opportunity to set up access to the Twitter API to glean data from our Twitter feeds.  I can see that analysing data from social media sites is a powerful tool.

Counting the Data

In week seven of the course we were treated to a presentation by Amy, a #citylis student currently working for Altmetric as a Customer & Sales Support representative.  Altmetric provide web-based metrics by tracking a range of sources that discuss scholarly content, for example news, blogs and social media.  The aim is not to convey sentiment but simply to indicate the amount of attention that a paper is receiving.  The name Altmetric is short for alternative metrics although, strictly speaking, the metrics they provide are not designed to replace existing metrics but to complement them.  The existing metrics I am referring to here are established citation impact metrics, such as impact factor and h-index.
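As an aside on the established metrics, the h-index has a simple definition: an author has an h-index of h if h of their papers have each been cited at least h times.  A minimal Python sketch of that definition follows.  The function name and the citation counts are made up for illustration, and this is not how any particular citation service computes the figure.

```python
def h_index(citation_counts):
    """Return the largest h such that h papers have at least h citations each."""
    ranked = sorted(citation_counts, reverse=True)
    h = 0
    for rank, citations in enumerate(ranked, start=1):
        if citations >= rank:
            h = rank
        else:
            break
    return h


# Illustrative citation counts for five papers (invented numbers).
print(h_index([10, 8, 5, 4, 3]))
```

With these five counts, four papers have at least four citations each but there are not five papers with at least five, so the result is 4.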

As I come from a commercial background, the idea of tracking citations is new to me.  After using the system to search during our practical session, I was amazed at how useful it is as a search tool for finding documents too.  If something I am interested in is cited several times, whether it is being talked about in a positive way or being challenged, I can find out more about it.  I have been using the alert function for a few weeks and I am impressed with the results.  It is a fascinating addition to my existing pool of resources for research.

The Meaning in the Data

In week eight we went on to explore word clouds as a way of expressing data.  Experimenting with various shapes, colours and sizes gave us more control over how the data was conveyed.  The images below are drawn from an article by Lyn Robinson and David Bawden, The Dark Side of Information (Bawden and Robinson, 2009).



Word Clouds created using Tagxedo
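Behind a word cloud sits a simple count: the more often a word appears in the text, the larger it is drawn.  The Python sketch below shows that counting step only.  It is not how Tagxedo is implemented, and the stop word list and the sample sentence are assumptions chosen for illustration.

```python
import re
from collections import Counter

# A tiny, assumed stop word list; real tools use much longer ones.
STOP_WORDS = {"the", "of", "and", "a", "to", "in", "is"}


def word_frequencies(text, top_n=5):
    """Count words in the text, ignoring case and common stop words."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(word for word in words if word not in STOP_WORDS)
    return counts.most_common(top_n)


# Sample sentence invented for illustration.
sample = "Information overload and information anxiety: the paradoxes of information"
for word, count in word_frequencies(sample):
    print(word, count)
```

A word cloud tool would then map each count to a font size, with shape and colour left as purely presentational choices.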


This is not the end of a module but the beginning of a lifetime of learning.



Bawden, D. & Robinson, L. (2009). The dark side of information: overload, anxiety and other paradoxes and pathologies. Journal of Information Science, 35(2), pp. 180-191.