“Metadata is a map. Metadata is a means by which the complexity of an object is represented in a simpler form.”
Jeffrey Pomerantz, Metadata (2015), p. 12
Welcome to this Digital Technology & Architecture blog. Why have I chosen to write about metadata? Partly in rebellion against the standard answer to the question “what is metadata?”, which tends to be “data about data.” In my experience, people who give that answer seem quite satisfied that they have explained it adequately. The truth is, until now I didn’t understand what that meant. What exactly is metadata? When was it first used? Who uses it now, and for what purpose? And why are more and more people talking about it? To answer these questions, I had to dig a little deeper.
The Oxford English Dictionary defines metadata as “A set of data that describes and gives information about other data.” The word started to make more sense to me once I read that, in epistemology, the prefix meta- is used to mean “about (its own category)”.
As this is a modern term, I had always assumed that it only pertained to digital data, particularly databases and data warehouses. Until recently, I had never heard it referred to outside of an information technology setting. I believed it to be a complex concept that only computer programmers and web developers fully comprehended.
An example of digital metadata:
Extract from a ProQuest Dialog ProSheet for the MEDLINE database. MEDLINE is the U.S. National Library of Medicine’s (NLM) premier bibliographic database. http://media2.proquest.com/documents/medline_prosheet.pdf
Much to my surprise, I discovered that although the word is relatively new, the first recorded use of metadata dates back to 280 BC! It is simply a way of describing an object, just as the library staff did in the Library of Alexandria under the management of Zenodotus. Tags displaying the name of the author, the title and the subject were attached to the scrolls so that readers could locate literature efficiently without having to unroll several scrolls just to see what each one contained.
You can find out more about the progression of the use of metadata throughout the ages by clicking on the following link from M-Files https://www.m-files.com/en/infographic-the-history-of-metadata
Personally, I like the analogy that data resides in a container, and the metadata describes what is inside the container. With that simplistic explanation in mind, I can see that metadata is ubiquitous.
I agree with Sarah Higgins that “metadata is the backbone of digital curation” http://www.dcc.ac.uk/resources/briefing-papers/standards-watch-papers/what-are-metadata-standards. However, it reaches far beyond that.
For example, value is added to a piece of artwork or a photograph when metadata is attached to it. At auction, two similar paintings might be up for sale, but if one has known facts attributed to it, such as the artist’s name, origin, date of the work and ownership, it will be deemed more valuable. Have you ever looked back at a photograph and forgotten where it was taken, or struggled to remember the name of someone in a group shot? We are human and our memories fade with time, so without a record of these details they might be irretrievably lost.
Types of Metadata
Without getting too technical, it is worth noting that there are different kinds of metadata, namely descriptive, administrative and structural. There are several resources available on the subject for further investigation, but below is a brief overview of each type.
Descriptive: terms used to describe an object. The title, author or publisher are typical examples. Thinking about the audience, what words will they use to locate or sort the underlying data?
Administrative: often found in a business environment. Who created the document? When did they create it? Does it have a shelf life and, if so, when should it be discarded?
Structural: data about the structure of the content. Does it have different sections and, if so, do you want users to be able to search for them independently?
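To make the three categories a little more concrete, here is a minimal Python sketch of how a metadata record for a single report might be organised. The report, the field names and the values are all hypothetical, chosen purely for illustration; they are not taken from any particular metadata standard. Notice that the record describes the report without containing any of its content, which is the container idea from earlier.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional

# Hypothetical field names for illustration; not from any metadata standard.

@dataclass
class Descriptive:
    # Terms a reader might use to find or sort the item
    title: str
    author: str
    publisher: str
    keywords: list[str] = field(default_factory=list)

@dataclass
class Administrative:
    # Who created the item, when, and how long it should be kept
    created_by: str
    created_on: date
    retain_until: Optional[date] = None

@dataclass
class Structural:
    # The sections of the content, so each can be searched independently
    sections: list[str] = field(default_factory=list)

@dataclass
class MetadataRecord:
    descriptive: Descriptive
    administrative: Administrative
    structural: Structural

    def is_due_for_disposal(self, today: date) -> bool:
        # Administrative metadata answers "when should it be discarded?"
        retain = self.administrative.retain_until
        return retain is not None and today > retain

# A hypothetical record describing a report (the report itself is not stored here)
record = MetadataRecord(
    descriptive=Descriptive("Annual Report", "J. Smith", "Example Ltd", ["finance", "2015"]),
    administrative=Administrative("J. Smith", date(2015, 3, 1), retain_until=date(2022, 3, 1)),
    structural=Structural(["Summary", "Accounts", "Appendix"]),
)

print(record.descriptive.title)                       # Annual Report
print(record.is_due_for_disposal(date(2023, 1, 1)))   # True
print("Accounts" in record.structural.sections)       # True
```

Grouping the fields this way simply mirrors the three categories above; real-world schemes often mix them or use many more fields.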
By engaging in the digital world, we are sharing staggering amounts of metadata about ourselves, sometimes willingly and sometimes unknowingly, with retailers and service providers, to name but a few. This information might include data about our telephone calls, travel arrangements, medical information, shopping habits, political views, and the list goes on ad infinitum. With such transparency inevitably comes controversy (a topic for another blog post, I think!), but it is fascinating to see how far we have come in terms of analytics.
Data scientists Deepak Jagdish and Daniel Smilkov demonstrate, in their YouTube video https://www.youtube.com/watch?v=i2a8pDbCabg, a product they developed to analyse metadata from e-mails. It lets them map out who connects with whom, where they are located, how often they correspond and how that shifts over time. The interesting part is that they can then build a story from those fields, for example, spotting when someone had a major change in their life, such as moving jobs or severing contact with a key person. This level of transparency could have all sorts of implications. The point is that all of this is gleaned from metadata alone, without touching the underlying text of the e-mails.
The Power of Metadata: Deepak Jagdish and Daniel Smilkov at TEDxCambridge 2013 (published on 25 September 2013) https://www.youtube.com/watch?v=i2a8pDbCabg
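The idea that relationships can be read from e-mail headers alone can be illustrated with a small Python sketch. It is not the tool shown in the video, just a simplified illustration of the same principle. It assumes a hypothetical list of message records holding only sender, recipient and date (no subject lines or message bodies), counts how often each pair of people corresponds per year, and flags pairs whose contact stops.

```python
from collections import Counter
from datetime import date

# Hypothetical header-only records: (sender, recipient, date sent).
# No subject lines or message bodies are used.
messages = [
    ("alice@example.com", "bob@example.com", date(2011, 2, 3)),
    ("bob@example.com", "alice@example.com", date(2011, 2, 4)),
    ("alice@example.com", "bob@example.com", date(2012, 6, 9)),
    ("alice@example.com", "carol@example.com", date(2012, 7, 1)),
    ("alice@example.com", "carol@example.com", date(2013, 1, 15)),
    ("carol@example.com", "alice@example.com", date(2013, 1, 16)),
]

def contacts_per_year(msgs):
    """Count messages exchanged between each pair of people, per year."""
    counts = Counter()
    for sender, recipient, sent in msgs:
        # Treat A->B and B->A as the same link between two people
        pair = tuple(sorted((sender, recipient)))
        counts[(pair, sent.year)] += 1
    return counts

def dropped_contacts(counts, year):
    """Pairs that corresponded in the previous year but not in `year`."""
    before = {pair for (pair, y) in counts if y == year - 1}
    now = {pair for (pair, y) in counts if y == year}
    return before - now

counts = contacts_per_year(messages)
print(counts[(("alice@example.com", "bob@example.com"), 2011)])  # 2
print(dropped_contacts(counts, 2013))  # {('alice@example.com', 'bob@example.com')}
```

Even with six made-up messages, the headers alone reveal that Alice stopped corresponding with Bob in 2013 while her contact with Carol grew, without reading a single word of the e-mails themselves.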
Some people believe that sharing their personal metadata is harmless, but I take a far more cautious stance. Companies have been analysing metadata for years, but more people are talking about it now because the analysis and use of our personal data are increasing exponentially.