Wednesday 20 October 2010

3-in-1 : Merging bibliographic data

iFind Discover has a single bibliographic database extracted from the three partner libraries. We decided early on to de-duplicate records as far as reasonably possible, so records are extracted from the local LMSs and then pre-processed in order to:

  • Remove duplicates. At present, this is done solely on ISBN/ISSN, but this creates problems, e.g. where serial titles have changed but retain the same ISSN. I am working on a more sophisticated merger process. When records are merged, a preferred record is selected and then protected fields (e.g. local notes, online links) are transferred from the non-preferred record.
  • Store system numbers. VuFind normally uses the system number in the 001 field to connect to the LMS in order to obtain holdings information. Since 001 is a non-repeatable field, one of the changes we have had to make to VuFind is to use another field to hold the links to the LMS. We chose field 969, for no better reason than that it is a local field and not used in any of our existing records. The field contains the institution’s initials in subfield a and the system number in subfield b.
  • Replace the record system number with the ISBN/ISSN, which is used to match duplicates. 10-digit ISBNs are converted to their 13-digit form.
  • Transfer holdings data. Where holdings records are included in the extract (currently only at Swansea University), the 852 and 866 fields are merged into their parent bibliographic record. This makes the shelfmarks searchable even if they were not present in the bib record, and it allows for location information to be displayed even when the LMS is offline.
  • Correct oddities. There are some UTF-8 character set problems, and there have been other structural problems in one of the datasets, to which the Perl MARC modules took exception (although these have mostly been fixed at source now).
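
To illustrate the ISBN conversion mentioned in the list above, here is a minimal Python sketch of the standard ISBN-10 to ISBN-13 calculation (prefix 978, drop the old check digit, recompute the new one). It is not the pre-processing code itself; the function name isbn10_to_isbn13 is made up for this example.

```python
def isbn10_to_isbn13(isbn10: str) -> str:
    """Convert a 10-digit ISBN to its 13-digit form (hypothetical helper)."""
    # Strip common separators; the last character of an ISBN-10 may be 'X'
    digits = isbn10.replace("-", "").replace(" ", "").upper()
    if len(digits) != 10 or not digits[:9].isdigit():
        raise ValueError(f"not a 10-digit ISBN: {isbn10!r}")

    # Prefix 978 and drop the old ISBN-10 check digit
    core = "978" + digits[:9]

    # ISBN-13 check digit: weights alternate 1, 3 across the first 12 digits
    total = sum(int(d) * (1 if i % 2 == 0 else 3) for i, d in enumerate(core))
    check = (10 - total % 10) % 10
    return core + str(check)
```

For example, isbn10_to_isbn13("0-306-40615-2") gives "9780306406157". Note that the sketch does not validate the original ISBN-10 check digit, which a real pre-processor might want to do before trusting the number as a match key.
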
At present, the pre-process and load procedure is manual and is performed roughly once a week, but we are working on an automated procedure for daily updates. The manual procedure pre-processes and loads records from each institution in turn, and finds duplicates by querying VuFind for records already loaded from other institutions.
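
The institution-by-institution load can be sketched roughly as follows, in Python. This is only an illustration under several assumptions: a plain dictionary stands in for the VuFind index, records are simple dictionaries rather than real MARC records, the first record loaded for a given ISBN/ISSN is treated as the preferred one, and tags 590 and 856 stand in for the protected "local notes" and "online links" fields. The names load_institution and PROTECTED_TAGS, and the institution codes "AAA" and "BBB", are invented for the example.

```python
# Assumption: 590 (local note) and 856 (online link) are the protected fields
PROTECTED_TAGS = ("590", "856")


def load_institution(index, institution, records):
    """Load one institution's records into `index`, merging on the match key.

    `index` is a dict keyed by ISBN/ISSN that stands in for the VuFind index.
    """
    for rec in records:
        key = rec["match_key"]  # ISBN/ISSN, assumed already normalised
        # 969 link: institution initials in subfield a, system number in b
        link = {"a": institution, "b": rec["system_number"]}

        existing = index.get(key)
        if existing is None:
            # First record seen for this key becomes the preferred record
            index[key] = {
                "id": key,
                "fields": {t: list(v) for t, v in rec["fields"].items()},
                "969": [link],
            }
            continue

        # Duplicate: keep the preferred record, add the new LMS link,
        # and copy across protected fields that are not already present
        existing["969"].append(link)
        for tag in PROTECTED_TAGS:
            for value in rec["fields"].get(tag, []):
                target = existing["fields"].setdefault(tag, [])
                if value not in target:
                    target.append(value)
    return index


if __name__ == "__main__":
    index = {}
    load_institution(index, "AAA", [{
        "match_key": "9780306406157", "system_number": "a1",
        "fields": {"245": ["A title"], "856": ["http://example.org/a"]},
    }])
    load_institution(index, "BBB", [{
        "match_key": "9780306406157", "system_number": "b7",
        "fields": {"245": ["A title /"], "856": ["http://example.org/b"]},
    }])
    print(index["9780306406157"]["969"])
```

After both loads the index holds a single record carrying two 969 links, one per institution, with the title taken from the first record and both online links preserved.
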

Paul
