Welcome back! In my last post, “Demystifying Metadata”, I introduced the basics of metadata, covering its history, usage, various types and the extent to which it is analysed today.
Our classes over the past few weeks have been building on these concepts, and I am gaining a deeper understanding of the terms Linked Data & Semantic Web and how they relate to Library & Information Science (LIS). I am beginning to appreciate how librarians and information professionals are playing a significant role in transforming the World Wide Web (WWW) into the Web of Data.
All the painstaking work that has been put into creating metadata for collections around the world can now be leveraged as it is unleashed from its current confinements. We are now connecting it to metadata from external sources, with meaningful relationships, which will help to open up the Web in a way that has not been previously possible.
During a TED Talk back in 2009, Tim Berners-Lee enthusiastically appealed to all of us to share our data on the Web. The man who invented the WWW over 20 years ago acknowledged that we have been sharing our documents and, in fact, thanked us for doing so. But once again he was feeling the frustration of untapped potential, and he shared his vision of a world where data linked together will allow us to access an abundance of power.
Linked Data and Semantic Web are terms coined by Tim Berners-Lee to describe a Web environment where computers understand the meaning of our search terms. Linked Data allows machines to act more like humans when searching for and retrieving data, and therefore to return more relevant results. Machines cannot process the content on Web pages in the same way that humans can, but they can understand semantically structured metadata. The Semantic Web is not a new Web but an extension of the existing one, in which W3C standards such as the Resource Description Framework (RDF) and Extensible Markup Language (XML) are used to structure data.
Linked Data is a way of joining a subject with an object using a predicate. A predicate is another word for relationship. This additional element, the relationship, implicitly links concepts together, creating meaningful connections. The three parts together form a Triple.
This diagram is a simplified version of what is possible. One subject can be related to several objects, and RDF statements can be both internal and external to the source of the data. For further reading on RDF Triples, see the links at the end of this page.
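To make the idea of a Triple more concrete, here is a minimal sketch in Python. It is not real RDF: the `Triple` tuple and the `objects_of` helper are names I have made up for illustration, and the statements are simply facts mentioned earlier in this post. It shows one subject related to several objects through predicates.

```python
from collections import namedtuple

# A Triple is just (subject, predicate, object).
# "Triple" and "objects_of" are illustrative names, not part of any RDF library.
Triple = namedtuple("Triple", ["subject", "predicate", "object"])

# Example statements taken from this post.
triples = [
    Triple("Tim Berners-Lee", "invented", "World Wide Web"),
    Triple("Tim Berners-Lee", "coined", "Linked Data"),
    Triple("Tim Berners-Lee", "coined", "Semantic Web"),
]


def objects_of(subject, predicate, statements):
    """Return every object linked to `subject` by `predicate`."""
    return [
        t.object
        for t in statements
        if t.subject == subject and t.predicate == predicate
    ]


print(objects_of("Tim Berners-Lee", "coined", triples))
```

In real Linked Data each of these plain-text names would be replaced by a unique identifier, so that different sources can refer to exactly the same thing.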
Each entity or concept has a unique Uniform Resource Identifier (URI). A URI is similar to a Uniform Resource Locator (URL), but instead of pointing to the location of a Web page, it homes in on a person, company, place, film or book, etc.
Linking in this way has many advantages. For example, a Web page is static and needs to be maintained to keep it up-to-date, which can be an expensive and time-consuming mission. With Linked Data the computer can do the work of updating the information by retrieving the URI that you are searching for along with other relevant data linked by the predicates. The links can come from anywhere within the Web of Data and comprise several sources.
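As a rough sketch of how data from several sources can be pulled together through a shared URI, consider the Python example below. Everything in it is an assumption for illustration: the `example.org` URIs, the two small data sets and the helper functions `describe` and `describe_with_links` are hypothetical, and a real system would fetch the data over the Web rather than from in-memory lists.

```python
from collections import defaultdict

# Hypothetical URIs; example.org is a placeholder domain.
BOOK = "http://example.org/book/weaving-the-web"
AUTHOR = "http://example.org/person/tim-berners-lee"

# Two independent sources that happen to use the same URI for the author.
library_catalogue = [
    (BOOK, "title", "Weaving the Web"),
    (BOOK, "author", AUTHOR),
]
external_source = [
    (AUTHOR, "name", "Tim Berners-Lee"),
    (AUTHOR, "invented", "World Wide Web"),
]


def describe(uri, *sources):
    """Collect every predicate/object pair about `uri` from all sources."""
    description = defaultdict(list)
    for source in sources:
        for subject, predicate, obj in source:
            if subject == uri:
                description[predicate].append(obj)
    return dict(description)


def describe_with_links(uri, *sources):
    """Describe `uri`, then follow one hop to any object that is itself described."""
    result = {uri: describe(uri, *sources)}
    for objects in result[uri].values():
        for obj in objects:
            linked = describe(obj, *sources)
            if linked:
                result[obj] = linked
    return result


combined = describe_with_links(BOOK, library_catalogue, external_source)
for uri, facts in combined.items():
    print(uri, facts)
```

Because both sources identify the author with the same URI, the catalogue record picks up the extra facts from the external source without anyone having to update the record by hand.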
Other benefits include more accurate search results, thanks to pre-defined ontologies assigned at the indexing stage; reduced time spent searching for information; no need to enter multiple search terms; and access to additional data that would not otherwise be retrieved.
Libraries are preparing to convert their existing bibliographic data into Linked Data, opening up their catalogs and resources to a wider audience. More users will be able to access, search and combine the information with additional data sources. Having the metadata presented in this way also gives developers more opportunity to be creative when developing search tools.
Many people, especially the younger generation who have only looked for information in a world where the Internet is omnipresent, start and end their search with a search engine such as Google. Search engines are useful, but the results retrieved are not all-inclusive. Usually, to locate resources held in libraries and information centres, a separate exploration is required.
The Library of Congress has instigated a Bibliographic Framework Initiative (BIBFRAME), which aims to convert bibliographic data from the existing Machine-Readable Cataloging (MARC) format to the BIBFRAME format. “The BIBFRAME vocabulary uses a Linked Data model and thus leverages the RDF modelling practice of uniquely identifying as Web resources all entities, attributes, and relationships (i.e., properties) between entities.” (http://www.loc.gov/bibframe/)
The BIBFRAME model will enable library data to be connected with other Linked Data on the Web. When a person uses a search engine to look for, say, “investment management”, they will see news, videos, blogs, images, etc., together with library references on investment management. Likewise, when they search library catalogs they will gain access to a host of other relevant information on the subject.
Despite the possibilities and buy-in from many communities, including LIS, there are sceptics who have concerns about the Semantic Web. Their concerns include documents being falsely tagged (to gain higher rankings in the results), less human oversight leading to ease of abuse, and doubts that the vision will ever come to fruition in the business environment.
As Web 3.0 is still in development, it is too early to tell, but I believe that it will have a positive impact on LIS professionals and the users of library and information services.
Examples of Web sites using Linked Data:
Linked Data: Evolving the Web into a Global Data Space http://linkeddatabook.com
Linked Data: Connect Distributed Data across the Web http://linkeddata.org
ALCTS: Transforming Library Metadata into Linked Library Data http://www.ala.org/alcts/resources/org/cat/research/linked-data
OCLC: Data strategy and linked data – Helping libraries thrive on the Web https://www.oclc.org/en-UK/data.html