Conrad Taylor reports from the Linked Data and Libraries conference
Talis, a company whose name is heard quite often in discussions of Linked Data and the Semantic Web, has deep roots in the library world: it arose in the early seventies from the pioneering ‘Birmingham Libraries Co-operative Mechanisation Project’ and developed a suite of Library Management System software that is at the heart of library cataloguing and management in many British libraries.
In March 2011, Talis took the bold step of selling off its library management division to Capita; from now on, Talis will concentrate on Linked Data and Semantic Web projects. These are approaches to handling information on the Web in such a way that machines can do more useful things with it, and the Linked Data idea is finding applications across many business areas, such as health information, government open data and support for scientific research.
However, Talis has not lost contact with the library world, and on 14 July I attended a free one-day conference on ‘Linked Data and Libraries’, a follow-up to a similar event they ran last year. The event was held in the British Library conference centre; it was attended by about 50-60 people, a surprising proportion from abroad, including Germany, the Netherlands, Denmark, France and the USA. There were six major presentations, plus four short talks and a welcoming address by Dame Lynne Brindley, the CEO of the British Library.
British Library’s big Linked Data experiment
Being at the BL was extremely apposite because the Library is in the midst of a major conversion of its bibliographic records into Linked Data format, assisted by Talis, as I’ll explain below. As Lynne Brindley explained, the British Library has made a commitment to follow the government’s avowed philosophy of opening up data and media assets for free use by the public.
A highlight of the day was the description by Neil Wilson, BL’s Head of Metadata Services, of the project that is converting the British National Bibliography from MARC 21 records into Linked Data in RDF/XML form. The aim is to break this data out from its ‘entrapment’ within the MARC format, so that it continues to perform its traditional functions but is also opened up for data mining and can be integrated into new kinds of services and mash-ups that build on and extend the information within bibliographic data. There should also be cost savings from converting to more generic, more widely-supported RDF/XML-based formats – as Deanna Marcum of the US Library of Congress also noted in May 2011.
The British National Bibliography is the record of all books published in the UK. This was a good test case for Linked Data conversion: it’s a database of published output of general utility, copies of which can be found in many places, rather than an index of a special collection of unique items; and over the sixty years since the BNB started, the data format of the bibliography has stayed pretty consistent. What’s more, most of the items already come with a unique identifier such as an ISBN.
So far some 485,200 of these records have been converted. Here is an introductory link to the project.
The evangelist’s perspective
Richard Wallis is Talis’ ‘evangelist’, and his presentation set the stage for the day. He noted an explosion of interest in Linked Data in the libraries sector, over two busy years. Libraries are well placed to lead in the use of Linked Data because they have centuries of experience of describing and classifying things, they have loads of standards (too many, think some), and they certainly have lots of very large lists of things, both tangible (such as books) and conceptual (such as subject headings).
Linked Data can be hard to explain, but Richard provided an example. He showed us a picture of an American spacecraft, to which NASA had assigned the unique identifier ‘1969-059A’. In front of this, we could build a string that locates this identity on the Web: ‘http://nasa.dataincubator.org/spacecraft/1969-059A’. That is the Uniform Resource Identifier for that entity, its URI.
If you go there (.html gets added automatically to the end of the URI), you end up on a Web page with all sorts of interesting information about the spacecraft, which was the Apollo 11 Command and Service Module, also called ‘Columbia’; associated with that is a .rdf (Resource Description Framework) page which has all of that information contained in a highly structured form. And if you examine the RDF even in a cursory fashion, you’ll see that the RDF links out to other URIs, where other objects and concepts are defined: mission, launch date, launch site. They in turn link to other URIs for objects and concepts: latitude and longitude of Cape Canaveral, for example…
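To make the idea of ‘linking out’ a little more concrete, here is a minimal sketch in Python. It is not the real NASA dataset: only the spacecraft URI comes from Richard's example, while the example.org URIs and the ‘ex:’ property names are placeholders I have invented for illustration. Each statement is a triple of subject, property and value, and a value that is itself a URI can be followed to find further statements.

```python
# The spacecraft URI is the one quoted in the talk. Everything else below
# (example.org URIs, "ex:" property names) is a made-up placeholder.
SPACECRAFT = "http://nasa.dataincubator.org/spacecraft/1969-059A"
LAUNCH = "http://example.org/launch/apollo-11"
SITE = "http://example.org/site/cape-canaveral"

TRIPLES = [
    (SPACECRAFT, "ex:name", "Apollo 11 Command and Service Module"),
    (SPACECRAFT, "ex:launch", LAUNCH),
    (LAUNCH, "ex:launchSite", SITE),
    (SITE, "ex:label", "Cape Canaveral"),
]


def describe(subject, triples):
    """Collect every statement about one subject as {property: [values]}."""
    description = {}
    for s, p, o in triples:
        if s == subject:
            description.setdefault(p, []).append(o)
    return description


def follow(start, path, triples):
    """Hop from URI to URI along a chain of properties, taking the first value each time."""
    node = start
    for prop in path:
        node = describe(node, triples)[prop][0]
    return node


print(follow(SPACECRAFT, ["ex:launch", "ex:launchSite", "ex:label"], TRIPLES))
```

Starting from the spacecraft and following three links in turn arrives at the label of the launch site, which is the kind of journey the RDF page makes possible for a machine.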
Once you take a Linked Data approach to bibliography, what starts as a set of records for describing books has the potential to extend by linking out to other resources. Consider how a work of history might reference people, places, times, events. Or consider how the ‘Author’ field of a Brother Cadfael mystery book by Ellis Peters might link to a resource that would reveal this name as a pseudonym for Edith Pargeter, who also wrote as ‘Jolyon Carr’, ‘John Redfern’ and ‘Peter Benedict’. The library record thus becomes more Web-like: not a mere end-point of a query, but a waystation on a journey of discovery.
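The pseudonym example can be sketched in a few lines of Python. This is only an illustration of the principle, assuming a hypothetical person URI on example.org and a hand-built lookup table; a real service would fetch this information from a published authority dataset rather than from a dictionary.

```python
# Placeholder URI standing for the person behind the pen names (hypothetical).
PERSON = "http://example.org/person/edith-pargeter"

# Each name found in an 'Author' field points at the same person resource.
PEN_NAME_OF = {
    "Ellis Peters": PERSON,
    "Jolyon Carr": PERSON,
    "John Redfern": PERSON,
    "Peter Benedict": PERSON,
}
REAL_NAME = {PERSON: "Edith Pargeter"}


def resolve_author(author_field):
    """Follow the link from an author name to the person and their other pen names."""
    person = PEN_NAME_OF.get(author_field)
    if person is None:
        return None  # no link known for this name
    others = sorted(
        name for name, uri in PEN_NAME_OF.items()
        if uri == person and name != author_field
    )
    return {"real_name": REAL_NAME[person], "also_wrote_as": others}


print(resolve_author("Ellis Peters"))
```

The point is that the catalogue record stores a link rather than a bare string, so anything known about the person at the other end of the link becomes available to the reader.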
Richard also drew our attention to a new business launched by Talis called ‘Kasabi’. This enterprise will provide hosting for Linked Data publishing, with plenty of options to do this for free: it might be a space where organisations wanting to experiment with Linked Data might play: take a look! There is also a useful blog post at the Open Knowledge Foundation about Kasabi, written in June 2011 by Leigh Dodds: http://blog.okfn.org/2011/06/25/open-data-and-kasabi/
LOD-LAM report, and a London event soon?
Last month, with generous funding from the Alfred Sloan Foundation, 85 organisations worldwide which are Linked Open Data pioneers came together at The Internet Archive in San Francisco for a “LOD-LAM Summit” (Linked Open Data in Libraries, Archives and Museums). This was a kind of unconference for a selected group of people who are currently exploring what the issues are and where the areas for international collaboration lie. Adrian Stevenson of UKOLN at the University of Bath gave a report from the event, and his slides are embedded on the LOD-LAM site at http://lod-lam.net/summit/ There is a #lodlam Twitter hashtag for LOD-LAM activities, and there is to be an event in London some time in November, on the topic of Museums and the Web.
Prism shows the nitty and the gritty
For me one of the most interesting presentations was the ‘Linked Data OPAC’ one by Phil John of ‘Prism’, that part of Capita that used to be the Talis LMS services. The Prism product is a next-generation discovery interface to library catalogue resources, built on top of Linked Data, and delivered through a Cloud software service model. Handling existing library catalogue data is the current target, but Prism is also being readied for extended purposes such as reaching out towards the retrieval of abstracts, e-journal article texts, thesis repositories, or even towards local organisations and events that are conceptually linked to what is being searched for (e.g. health-related workshops).
Phil primarily described what it takes to convert MARC 21 library records into Linked Data structures. MARC (MAchine-Readable Cataloging) is a data format that made its debut in 1966, and essentially it is an electronic replica of the old index-card catalogue. As Rob Styles of Talis later remarked, MARC is ‘a brilliant standard – but from so, so long ago’. Three-digit numerical codes reference the field names, and an arcane symbology of semicolons and dashes defines the relationships between parts of a statement, largely because early computer systems had such severe memory constraints.
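For readers who have never looked inside a MARC record, here is a rough Python sketch of what those numeric codes and punctuation marks look like. It assumes a simplified text rendering of a field, with ‘$’ marking each subfield and indicators left out; real MARC 21 files use a binary layout, so this is an illustration and not a MARC parser. The three tags shown (020 for ISBN, 100 for the main personal name, 245 for the title statement) are standard MARC 21 tags; the names I give them are my own.

```python
# A few MARC 21 tags and informal names for them (the names are my own labels).
FIELD_NAMES = {
    "020": "isbn",
    "100": "main_entry_personal_name",
    "245": "title_statement",
}


def parse_field(line):
    """Split a simplified 'TAG $a...$c...' line into a field name and its subfields."""
    tag, _, rest = line.partition(" ")
    subfields = {}
    for chunk in rest.split("$")[1:]:
        if not chunk:
            continue  # ignore an empty piece such as a doubled '$'
        code, value = chunk[0], chunk[1:].strip()
        # Trailing marks like ' /' and '.' separate parts of the statement
        # for a human reader; strip them to leave the bare value.
        subfields[code] = value.rstrip(" /:;,.")
    return FIELD_NAMES.get(tag, tag), subfields


print(parse_field("245 $aA morbid taste for bones /$cEllis Peters."))
```

Even in this toy form you can see how much of the meaning is carried by punctuation that a program has to know about in advance.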
Prism itself performs conversions from MARC 21 to RDF for Linked Data. The conversion system can be conceptualised as having three chained modules. A ‘parser’ ingests the MARC 21 records, understands the syntax and identifies various ‘events’ within them. An ‘observer’ module notices the events and discovers the data structures embedded within them. Based on that identification process, the structures are piped over to a family of ‘bibliographic handlers’ which actually perform the conversion to Linked Data.
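The three chained modules can be pictured with a small Python sketch. To be clear, this is my own simplified reconstruction of the idea and not Prism's code: the function names, the ‘TAG value’ input lines, the example.org subject URI and the ‘ex:’ properties are all assumptions made for illustration.

```python
# Simplified stand-ins for the three modules; all names here are hypothetical.

def parse(record_lines):
    """Parser: read 'TAG value' lines and emit one (tag, value) event per field."""
    for line in record_lines:
        tag, _, value = line.partition(" ")
        if tag and value:
            yield tag, value.strip()


# Which structure each tag signals (020 ISBN, 100 main author, 245 title).
STRUCTURES = {"020": "isbn", "100": "author", "245": "title"}


def observe(events):
    """Observer: recognise the structure behind each event, skipping unknown tags."""
    for tag, value in events:
        kind = STRUCTURES.get(tag)
        if kind is not None:
            yield kind, value


def handle_isbn(subject, value):
    return (subject, "ex:isbn", value)


def handle_author(subject, value):
    return (subject, "ex:author", value)


def handle_title(subject, value):
    return (subject, "ex:title", value)


# Handlers: one small function per structure, each producing a triple.
HANDLERS = {"isbn": handle_isbn, "author": handle_author, "title": handle_title}


def convert(record_lines, subject):
    """Chain parser -> observer -> handlers and return the resulting triples."""
    return [
        HANDLERS[kind](subject, value)
        for kind, value in observe(parse(record_lines))
    ]


record = ["100 Peters, Ellis", "245 A morbid taste for bones", "650 Detective fiction"]
for triple in convert(record, "http://example.org/book/1"):
    print(triple)
```

Keeping the stages separate means that a new kind of field only needs a new entry in the observer's table and a new handler; the parser does not have to change.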
A MARC 21 record is intended for humans to understand. That leads to loads of ambiguities when trying to turn records into webs of links between RDF triples which a machine must be able to negotiate. Prism has therefore put a lot of programming effort into automating as much as possible of this disambiguation and transformation process.
MARC 21 records fail to discriminate between various formats: for example an Audio CD is logged as a 12 cm sound disc with a velocity of 1.4 metres/second. With this kind of new media, you are going to want to say different things about it than you would about a book. By building into Prism conversions some ‘reasoners’ that can infer what the format is from the kind of descriptions in some of the other fields in a MARC 21 record, the data quality actually improves en route to the Linked Data format.
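A ‘reasoner’ of this kind might, at its very simplest, be a rule that looks for tell-tale phrases in the physical description. The Python sketch below is purely illustrative: the function name, the format labels and the two rules are my own assumptions, and Prism's actual reasoners were not shown in that level of detail.

```python
def infer_format(physical_description):
    """Guess a format label from the free-text physical description of an item."""
    text = physical_description.lower()
    is_sound_disc = "sound disc" in text
    # A sound disc of CD size (12 cm, or 4 3/4 inches) is taken to be an Audio CD.
    if is_sound_disc and ("12 cm" in text or "4 3/4 in" in text):
        return "audio-cd"
    if is_sound_disc:
        return "sound-disc"
    return "unknown"


print(infer_format("1 sound disc : digital ; 12 cm"))
```

Once the format is known explicitly, the converted data can say things about a CD that the original record only implied.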
Through APIs, Prism can offer extensions such as RSS feeds: we can imagine that a search you have performed could be stored so that an RSS feed could be generated for that search, notifying customers of new books as they are acquired by a library.
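As a sketch of how a stored search could become a feed, the following Python builds a small RSS 2.0 document with the standard library. It does not use any Prism API: the function name, the example.org URLs and the shape of the ‘new_items’ list are assumptions for illustration only.

```python
import xml.etree.ElementTree as ET


def search_feed(query, catalogue_url, new_items):
    """Build an RSS 2.0 feed listing newly acquired items that match a saved search."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    # title, link and description are the required channel elements in RSS 2.0.
    ET.SubElement(channel, "title").text = f"New items matching: {query}"
    ET.SubElement(channel, "link").text = catalogue_url
    ET.SubElement(channel, "description").text = (
        "Items recently added to the catalogue for this saved search"
    )
    for item in new_items:
        node = ET.SubElement(channel, "item")
        ET.SubElement(node, "title").text = item["title"]
        ET.SubElement(node, "link").text = item["link"]
    return ET.tostring(rss, encoding="unicode")


# Hypothetical data: in practice this list would come from re-running the search.
feed = search_feed(
    "linked data",
    "http://example.org/catalogue",
    [{"title": "Book A", "link": "http://example.org/item/1"}],
)
print(feed)
```

A library system would regenerate the feed each time the saved search turns up new acquisitions, and the reader's feed software would do the notifying.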
Are records dead?
The final presentation of the day was by Rob Styles, also of Talis: Richard had landed him with the session title of “The Record is Dead”. Which records signally aren’t, said Rob – though maybe they should be! The records-based approach, particularly as expressed in MARC, is machine-processable in quite limited ways, whereas Linked Data can extend out into a wider world of knowledge.
Rob dug a little further into some of the conversion issues between MARC and Linked Data, and on the way took some philosophical side-swipes at SKOS (the Simple Knowledge Organization System promoted by the World Wide Web Consortium) and FRBR (the Functional Requirements for Bibliographic Records entity-relationship model promoted by IFLA, the International Federation of Library Associations and Institutions). In both cases, Rob appears to believe that the classifications are performed at too great a level of abstraction, and not in terms that an ordinary user can relate to.
Rob had been part of the Talis team on the British Library project, and he added to Neil Wilson’s account of the methods used. BL chose to tackle the project themselves, with their own staff and their own PCs, a significant aim of the project being to increase their internal knowledge and skills. Talis provided initial training, ‘to get everyone on the same page’ as Rob put it; and as the project explored paths and solutions, Talis had people on hand as mentors.
There was more
In presenting this summary of the Talis event, I have focussed on the talks which had the greatest interest to me, and skipped entirely over the short ‘lightning talks’.
Further reading and viewing
The folks at Talis very efficiently stuck up the slides from the event in SlideShare format at http://consulting.talis.com/resources/presentations-from-linked-data-and-libraries-2011/
They’ve also added a blog item about the British Library’s release of part of the British National Bibliography, which explains how BL reached beyond a mere transformation of their catalogue data and sought to model ‘things of interest’ using various available descriptive schemas, for example the ‘foaf’ (Friend of a Friend) schema which lets you say interesting things about people (such as authors).
Richard Wallis tried Talis’ first ever experiment with streaming video, which was achieved simply with a little Windows netbook with a built-in webcam, and the audio supplemented with a USB condenser mic. The recordings are there to be watched at this Ustream URL (it’s a free account so there are ads to put up with).