Posts Tagged ‘MARC’

Transition away from MARC?

Friday, November 11th, 2011

A recent announcement by the Library of Congress (LC) has provided more details about its intent to steer away from the MARC format and begin investigating a replacement. MARC has been the format for bibliographic and authority record exchange for nearly 40 years. Because of its age, the format cannot fully accommodate the features planned for RDA.

The Working Group on the Future of Bibliographic Control wrote in their report: (more…)

Backstage Library Works Automated Services

Monday, August 31st, 2009

You may have noticed that we’ve recently changed the name of this blog from MARS Authority Control to MARS Automation Services. Why the change? I’ve asked our Chief Operations Officer of the Utah location, John Reese, to explain. Here is what he had to say.

***

Before the purchase of MARS from OCLC, Backstage Library Works ran several automated bibliographic products. These services include Non-MARC or Brief MARC Record Upgrades (Machine Matching), Deduplication and Consolidation of bibliographic data, Union Database creation, Marcadia and Custom Programming. Prior to this week, these services were run independently of the MARS 2.0 Automated Authority Control Service. Backstage is happy to announce the consolidation of all of the above-mentioned services and MARS 2.0 Automated Authority Control under one division, to be called “Backstage Library Works Automated Services.”

We found considerable overlap between client needs involving authority control and the above-mentioned automated bibliographic services. All of these services require technical knowledge of the automated process. At Backstage we are now sharing this valuable resource under one umbrella, Automated Services. We look forward to having one department work with your sales representative to offer the most efficient automated solution for you. Below is a list of all of the services under our newly formed Automated Services department. If you have questions about this change, please contact John Reese, jreese@bslw.com.

MARS 2.0 Automated Authority Control:

The MARS 2.0 service is one of the oldest and most reliable automated authority control offerings on the market. With our new system upgrade introduced in 2008, managing your authority records has never been easier. MARS 2.0 offers name and subject authority control based on the Library of Congress name and subject authority databases (and other available national databases such as MeSH and NLC). The authority control process standardizes name, subject, series and uniform title headings.

Automated Machine Matching:

This service offers several options to upgrade non-MARC or brief MARC records to full MARC bibliographic records. This process searches electronic records against the Backstage Library Works database and the Library of Congress Bibliographic Database to return a full standard MARC record. There are over twenty million records to match against in this database.

Automated Deduplication:

Backstage Library Works offers a deduplication process that consolidates bibliographic or authority records in a library’s database(s). This process is performed according to the profile specifications of the library and is often used when a library or a library consortium is forming or adding new libraries.

Union Database Creation:

Library consortia or library districts often require a central database for their members to work from. Backstage helps libraries create Union Bibliographic as well as Union Authority databases.

Marcadia:

Marcadia is an automated batch copy cataloging service offered jointly by OCLC and Backstage Library Works. This product finds, evaluates and delivers catalog records from OCLC WorldCat. It is based on search records you supply from your local system and the selection criteria you provide. Marcadia selects matching records from WorldCat and delivers them to you.

Custom Programming:

It has been a long-standing Backstage Library Works tradition to customize our services to the needs of our libraries. Many libraries require special programming, either to update records created under earlier cataloging practices that no longer meet current standards or to create unique processes for their library. Backstage takes pride in its ability to accommodate these special needs.

MARC8 and UTF8 – what does it mean?

Tuesday, August 25th, 2009

Last week we looked at the Anatomy of an Authority Record, but what if we look even deeper? Both Bibliographic and Authority records are essentially text, made up of characters encoded in either MARC-8 or UTF-8. But what does that mean, and what’s the difference?

MARC-8

The MARC-8 character set uses 8-bit characters, meaning it natively displays ASCII and ANSEL text. Because 8 bits allow only a limited number of characters, the MARC-8 character set includes methods to extend the displayable characters. One method is to include both spacing base characters and nonspacing modifier characters (diacritics).

Spacing or nonspacing refers to cursor movement: a spacing character moves the cursor, a nonspacing character does not. A nonspacing character is always associated with a single spacing character, but multiple nonspacing characters may be associated with the same spacing character.

In MARC-8, when there is a nonspacing character, it precedes the associated spacing character: any cursor movement occurs after displaying the character. This method allows basic and extended Latin characters to be displayed using the default character set.

Another method MARC-8 uses to extend the displayable characters is to use alternate character sets. This is done with escape sequences: special character sequences containing codes that indicate which character set is being selected for display. Possible alternate sets include subscripts, superscripts, Hebrew, Cyrillic, Arabic, and Greek. Chinese, Japanese and Korean are also possible by this method, using EACC character encodings for these characters. While this method allows many additional characters to be used, it is still limited and somewhat burdensome.

UTF-8

As computers needed to support a wider range of characters, many computer-related companies formed a group to define the Unicode Standard. The standard was originally based on 16-bit characters and has since been extended beyond that range. UTF-8 is a method of encoding these characters into sequences of 1 to 4 bytes (characters in the original 16-bit range need at most 3). Unicode, using the UTF-8 encoding, was accepted as an alternative character set for use in MARC records, with an initial limitation to using only the Unicode characters that have corresponding characters in the MARC-8 character set.
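
To see the variable-length encoding in action, here is a small Python sketch that encodes a few characters as UTF-8 and counts the bytes each one needs. The sample characters are our own choice for illustration. Note that the last one lies beyond the 16-bit range, which was added to Unicode later, and takes four bytes.

```python
# Encode a few sample characters as UTF-8 and report the byte count of each.
samples = [
    "a",           # U+0061, plain ASCII
    "\u00e1",      # U+00E1, a with acute (precomposed)
    "\u20ac",      # U+20AC, euro sign
    "\U0001d11e",  # U+1D11E, musical G clef (beyond the 16-bit range)
]

byte_lengths = []
for ch in samples:
    encoded = ch.encode("utf-8")
    byte_lengths.append(len(encoded))
    print(f"U+{ord(ch):04X} -> {len(encoded)} byte(s): {encoded.hex()}")

print(byte_lengths)  # [1, 2, 3, 4]
```

Plain ASCII text stays one byte per character, which is why an English-only record is the same size in MARC-8 and UTF-8.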

Decomposed

Unicode has definitions for nonspacing characters like MARC-8, except that the nonspacing character follows the character it modifies: cursor movement occurs before the nonspacing character is displayed. Decomposed UTF-8 characters are similar to MARC-8 diacritics, in that a base character is modified by one or more nonspacing characters. For example, a base character ‘n’ with a nonspacing ‘~’ would combine to display ‘ñ’. Decomposed is also the current LC standard.
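
The difference in ordering can be sketched in a few lines of Python. This is only an illustration, not a real converter: the function name `marc8_latin_to_unicode` and the two-entry mapping table are our own assumptions, it covers just the acute and tilde from the default MARC-8 (ANSEL) set, and it ignores escape sequences entirely.

```python
# Partial, illustrative table: MARC-8 (ANSEL) nonspacing bytes mapped to
# Unicode combining characters. A real converter needs the full tables.
ANSEL_COMBINING = {
    0xE2: "\u0301",  # acute accent
    0xE4: "\u0303",  # tilde
}


def marc8_latin_to_unicode(data: bytes) -> str:
    """Convert default-set MARC-8 bytes to decomposed Unicode text."""
    out = []
    pending = []  # nonspacing marks waiting for their base character
    for byte in data:
        if byte in ANSEL_COMBINING:
            # MARC-8: the mark arrives BEFORE the character it modifies
            pending.append(ANSEL_COMBINING[byte])
        elif byte < 0x80:
            # Unicode: write the base character first, then its marks
            out.append(chr(byte))
            out.extend(pending)
            pending.clear()
        else:
            raise ValueError(f"byte 0x{byte:02X} is outside this sketch's table")
    if pending:
        raise ValueError("nonspacing character without a base character")
    return "".join(out)


marc8_bytes = b"\xe4n"  # MARC-8 order: tilde first, then "n"
text = marc8_latin_to_unicode(marc8_bytes)
print([f"U+{ord(c):04X}" for c in text])  # ['U+006E', 'U+0303']
```

The result is two code points, the base ‘n’ followed by the combining tilde, which a display will render as a single ‘ñ’.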

Precomposed

Unicode also includes many precomposed characters. These are spacing characters that are the equivalent of one or more nonspacing characters and a spacing character. A precomposed ‘ñ’, instead of having a base character and an additional nonspacing diacritic mark, combines all those elements into one code that represents the character with the diacritic as a whole. Because the same character can now be stored in more than one way, normalization becomes more complicated.

Normalization

To handle the various ways a composite character can be represented, standardized normalization forms have been defined. These include NFD (Normalization Form D, canonical decomposition) and NFC (Normalization Form C, canonical composition). In NFD, every character that can be decomposed is converted to its most decomposed form following rules for canonical decomposition. In NFC, the characters are first decomposed as in NFD, then composed into precomposed (composite) forms following canonical rules. This may result in the sequence of code points for a given character changing to an alternate, equivalent form.
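
Python’s standard `unicodedata` module implements both forms, so the ‘ñ’ example can be tried directly. This is a minimal sketch; the variable names are our own.

```python
import unicodedata

composed = "\u00f1"      # precomposed ñ: one code point
decomposed = "n\u0303"   # n followed by a combining tilde: two code points

# The two strings display the same but do not compare equal as stored.
print(composed == decomposed)  # False

nfd = unicodedata.normalize("NFD", composed)    # break apart
nfc = unicodedata.normalize("NFC", decomposed)  # break apart, then recombine

print(nfd == decomposed)  # True
print(nfc == composed)    # True

# The choice of form also changes the number of bytes stored in UTF-8.
print(len(nfd.encode("utf-8")), len(nfc.encode("utf-8")))  # 3 2
```

Normalizing both sides to the same form before comparing is what lets a system match headings that were keyed in different ways.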

Conclusion

Many library systems are moving from MARC-8 to UTF-8 character encodings. This is a good move because it lets you reflect the data accurately while lessening the possibility of error. Backstage Library Works can return data in MARC-8 or UTF-8 (decomposed or composed) form.

Understanding the MARC Structure

Monday, March 16th, 2009

Underneath the MARC records that we all know and love is a somewhat cryptic structure that tells our systems how to read the record. Luckily, this structure rarely gets corrupted, but when it does, it’s good to have a basic understanding of how to read the MARC format.

The MARC format is a text-based format, meaning you can open it with a text editor. It is probably a good idea to open only a few records (or only one!) in a text editor, because a large file is very difficult to read.

Every MARC record starts with a leader. The leader gives your system information about the record, including how big it is and what type of record it is. Next comes the directory. Just like a normal directory, it tells you what tags are in the record and where the data for each tag is located.

This is how a leader is defined for a Bibliographic Record:

Leader: 

Position | Description              | Explanation
------------------------------------------
00-04 | Record Length               = This is how long the record is
05    | Record Status               = Is the record new, changed or deleted
06    | Type of record              = Authority, Book, Computer file, etc.
07    | Bibliographic level         = Monograph/Serial/etc.
08    | Type of control             = Archival or not
09    | Character coding scheme     = MARC-8 or UTF-8
10    | Indicator count             = # of indicators each tag has
11    | Subfield code count         = # of characters in a subfield code (delimiter + code)
12-16 | Base address of data        = The byte where actual record data begins
17    | Encoding level              = Level of encoding/cataloging
18    | Descriptive cataloging form = Cataloging rules used (AACR2, ISBD, etc.)
19    | Multipart resource record level = Set, part, or not specified
20    | Length of the length-of-field portion = # of digits in each directory entry's field length
21    | Length of the starting-character-position portion = # of digits in each directory entry's starting position
22    | Length of the implementation-defined portion = Rarely used
23    | Undefined                   = Not used

Looking at a leader can be confusing, but it’s also the only way to find some problems.

Here is an example record, as it may appear on your screen:
001 __ 3180021
005 __ 20050216201852.0
008 __ 040805s2005    nyu      b    001 0 eng 
010 __ ▼a  2004018260
020 __ ▼a0415971675 (alk. paper)
035 __ ▼a(DLC)  2004018260
040 __ ▼aDLC▼cDLC▼dDLC▼dCaONFJC▼dOrLoB-B
043 __ ▼an-us---
050 00 ▼aPS374.H56▼bO73 2004
082 00 ▼a813/.5409358▼222
090 __ ▼aPS374.H56▼bO73 2005
100 1_ ▼aOrbán, Katalin.
245 10 ▼aEthical diversions :▼bthe post-holocaust narratives of Pynchon, Abish, DeLillo, and Spiegelman /▼cKatalin Orbán.
260 __ ▼aNew York :▼bRoutledge,▼c2005.
300 __ ▼aix, 209 p. ;▼c24 cm.
440 _0 ▼aLiterary criticism and cultural theory
504 __ ▼aIncludes bibliographical references (p. 193-205) and index.
505 00 ▼gCh. 1.▼t"Mauschwitz" : monsters, memory, and testimony -- ▼gCh. 2.▼tFamiliarity and forgetfulness in Walter Abish's fiction -- ▼gCh. 3.▼tPinpricks on the Ars(e) Narrandi : liminality and oven-games in Gravity's rainbow.
600 10 ▼aPynchon, Thomas▼xKnowledge▼xHolocaust, Jewish (1939-1945)
600 10 ▼aSpiegelman, Art▼xKnowledge▼xHolocaust, Jewish (1939-1945)
600 10 ▼aAbish, Walter▼xKnowledge▼xHolocaust, Jewish (1939-1945)
600 10 ▼aDeLillo, Don▼xKnowledge▼xHolocaust, Jewish (1939-1945)
650 _0 ▼aAmerican fiction▼y20th century▼xHistory and criticism.
650 _0 ▼aHolocaust, Jewish (1939-1945), in literature.
650 _0 ▼aJudaism and literature▼zUnited States▼xHistory▼y20th century.
650 _0 ▼aWorld War, 1939-1945▼zUnited States▼xLiterature and the war.
650 _0 ▼aEthics in literature.
650 _0 ▼aJews in literature.
852 0_ ▼bkr▼hPS374.H56▼iO73 2005
856 41 ▼3Table of contents▼uhttp://www.loc.gov/catdir/toc/ecip0421/2004018260.html
949 __ ▼aApproval▼b1628024-35▼c67.20▼d1▼i20027328▼jUSD▼tBook

Now, if we are to take a look at the actual MARC structure, it looks like this (when looking at an underlying MARC record, you may not be able to see all of the special characters like end of field and end of record marks, but they are there):

01872cam a2200397 a 450000100080000000500170000800800410002501000170006602000280008303500220011104000360013304300120016905000240018108200

210020509000240022610000210025024501180027126000340038930000250042344000430044850400640049150502240055560000620077960000

620084160000600090360000590096365000590102265000500108165000660113165000650119765000260126265000240128885200280131285600

7801340949005601418318002120050216201852.0040805s2005    nyu      b    001 0 eng    a  2004018260  a0415971675 (alk. paper)  a(DLC)  2004018260  aDLCcDLCdDLCdCaONFJCdOrLoB-B  an-us---00aPS374.H56bO73 200400a813/.5409358222  aPS374.H56bO73 20051 aOrbán, Katalin.10aEthical diversions :bthe post-holocaust narratives of Pynchon, Abish, DeLillo, and Spiegelman /cKatalin Orbán.  aNew York :bRoutledge,c2005.  aix, 209 p. ;c24 cm. 0aLiterary criticism and cultural theory  aIncludes bibliographical references (p. 193-205) and index.00gCh. 1.t"Mauschwitz" : monsters, memory, and testimony -- gCh. 2.tFamiliarity and forgetfulness in Walter Abish's fiction -- gCh. 3.tPinpricks on the Ars(e) Narrandi : liminality and oven-games in Gravity's rainbow.10aPynchon, ThomasxKnowledgexHolocaust, Jewish (1939-1945)10aSpiegelman, ArtxKnowledgexHolocaust, Jewish (1939-1945)10aAbish, WalterxKnowledgexHolocaust, Jewish (1939-1945)10aDeLillo, DonxKnowledgexHolocaust, Jewish (1939-1945) 0aAmerican fictiony20th centuryxHistory and criticism. 0aHolocaust, Jewish (1939-1945), in literature. 0aJudaism and literaturezUnited StatesxHistoryy20th century. 0aWorld War, 1939-1945zUnited StatesxLiterature and the war. 0aEthics in literature. 0aJews in literature.0 bkrhPS374.H56iO73 2005413Table of contentsuhttp://www.loc.gov/catdir/toc/ecip0421/2004018260.html  aApprovalb1628024-35c67.20d1i20027328jUSDtBook

And here is a sample analysis of this record:

Leader: 

Position | Description              | Data
------------------------------------------
00-04 | Record Length               = 01872 : confirmed in a hex editor that the record is this length
05    | Record Status               = c
06    | Type of record              = a
07    | Bibliographic level         = m
08    | Type of control             = # (blank)
09    | Character coding scheme     = a
10    | Indicator count             = 2
11    | Subfield code count         = 2
12-16 | Base address of data        = 00397 : confirmed in a hex editor that this is correct
17    | Encoding level              = # (blank)
18    | Descriptive cataloging form = a
19    | Multipart resource record level = # (blank)
20    | Length of the length-of-field portion = 4
21    | Length of the starting-character-position portion = 5
22    | Length of the implementation-defined portion = 0
23    | Undefined                   = 0
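
The leader table above can be reproduced with a short Python sketch that slices the 24-character leader by position. The `Leader` class, its field names, and `parse_leader` are our own illustrative names, not part of any MARC library; the leader string is the one from the sample record.

```python
from typing import NamedTuple


class Leader(NamedTuple):
    record_length: int        # positions 00-04
    record_status: str        # position 05
    record_type: str          # position 06
    bibliographic_level: str  # position 07
    type_of_control: str      # position 08
    character_coding: str     # position 09 ("a" = UTF-8, blank = MARC-8)
    indicator_count: int      # position 10
    subfield_code_count: int  # position 11
    base_address: int         # positions 12-16
    encoding_level: str       # position 17
    cataloging_form: str      # position 18
    multipart_level: str      # position 19
    entry_map: str            # positions 20-23


def parse_leader(leader: str) -> Leader:
    """Slice a MARC leader into its fixed-position elements."""
    if len(leader) != 24:
        raise ValueError("a MARC leader is exactly 24 characters long")
    return Leader(
        record_length=int(leader[0:5]),
        record_status=leader[5],
        record_type=leader[6],
        bibliographic_level=leader[7],
        type_of_control=leader[8],
        character_coding=leader[9],
        indicator_count=int(leader[10]),
        subfield_code_count=int(leader[11]),
        base_address=int(leader[12:17]),
        encoding_level=leader[17],
        cataloging_form=leader[18],
        multipart_level=leader[19],
        entry_map=leader[20:24],
    )


leader = parse_leader("01872cam a2200397 a 4500")
print(leader.record_length, leader.base_address, leader.entry_map)
# 1872 397 4500
```

If `int()` raises an error on the length or base address, the leader itself is damaged, which is a useful first check on a record that will not load.
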
Directory: 

Tag | Field length | Starting character position
001 0008 00000
005 0017 00008
008 0041 00025
010 0017 00066
020 0028 00083
035 0022 00111
040 0036 00133
043 0012 00169
050 0024 00181
082 0021 00205
090 0024 00226
100 0021 00250
245 0118 00271
260 0034 00389
300 0025 00423
440 0043 00448
504 0064 00491
505 0224 00555
600 0062 00779
600 0062 00841
600 0060 00903
600 0059 00963
650 0059 01022
650 0050 01081
650 0066 01131
650 0065 01197
650 0026 01262
650 0024 01288
852 0028 01312
856 0078 01340
949 0056 01418|
(1418 + 56 = 1474, the position of the ending field terminator shown below)
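
Reading the directory by hand gets tedious, so here is a minimal Python sketch that splits it into 12-character entries and checks that each field starts where the previous one ended. The function names are our own, the sketch assumes the usual entry map of 4500 (3-character tag, 4-digit length, 5-digit start), and for brevity it uses only the first four entries of the sample record.

```python
def parse_directory(directory: str):
    """Split a MARC directory into (tag, field_length, start) tuples."""
    if len(directory) % 12 != 0:
        raise ValueError("directory length must be a multiple of 12")
    entries = []
    for i in range(0, len(directory), 12):
        entry = directory[i:i + 12]
        tag = entry[0:3]
        field_length = int(entry[3:7])  # includes the field terminator
        start = int(entry[7:12])        # relative to the base address
        entries.append((tag, field_length, start))
    return entries


def is_contiguous(entries) -> bool:
    """Check that every field begins where the previous field ended."""
    expected_start = 0
    for _tag, field_length, start in entries:
        if start != expected_start:
            return False
        expected_start = start + field_length
    return True


# First four directory entries of the sample record (001, 005, 008, 010).
sample = "001000800000" "005001700008" "008004100025" "010001700066"
entries = parse_directory(sample)
for tag, field_length, start in entries:
    print(tag, field_length, start)
print(is_contiguous(entries))  # True
```

A gap or overlap reported by a check like this usually means a field length in the directory no longer matches the data, for example after a character set conversion changed the byte count.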

Fields:
(For convenience in reading the record, we have replaced certain non-displayable characters with graphical representations)
| = field terminator (also used to terminate the directory)
$ = subfield code delimiter
^ = record terminator
_ = each byte of a UTF-8 character has been replaced by an underscore so the positions line up in a non-hex display

3180021|20050216201852.0|040805s2005    nyu      b    001 0 eng  |  $a  2004018260|  $a0415971675 (alk. paper)|  $a(DLC)  2004018260|  $aDLC$cDLC$dDLC$dCaONFJC$dOrLoB-B|  $an-us---|00$aPS374.H56$bO73 2004|00$a813/.5409358$222|  $aPS374.H56$bO73 2005|1 $aOrb__n, Katalin.|10$aEthical diversions :$bthe post-holocaust narratives of Pynchon, Abish, DeLillo, and Spiegelman /$cKatalin Orb__n.|  $aNew York :$bRoutledge,$c2005.|  $aix, 209 p. ;$c24 cm.| 0$aLiterary criticism and cultural theory|  $aIncludes bibliographical references (p. 193-205) and index.|00$gCh. 1.$t"Mauschwitz" : monsters, memory, and testimony -- $gCh. 2.$tFamiliarity and forgetfulness in Walter Abish's fiction -- $gCh. 3.$tPinpricks on the Ars(e) Narrandi : liminality and oven-games in Gravity's rainbow.|10$aPynchon, Thomas$xKnowledge$xHolocaust, Jewish (1939-1945)|10$aSpiegelman, Art$xKnowledge$xHolocaust, Jewish (1939-1945)|10$aAbish, Walter$xKnowledge$xHolocaust, Jewish (1939-1945)|10$aDeLillo, Don$xKnowledge$xHolocaust, Jewish (1939-1945)| 0$aAmerican fiction$y20th century$xHistory and criticism.| 0$aHolocaust, Jewish (1939-1945), in literature.| 0$aJudaism and literature$zUnited States$xHistory$y20th century.| 0$aWorld War, 1939-1945$zUnited States$xLiterature and the war.| 0$aEthics in literature.| 0$aJews in literature.|0 $bkr$hPS374.H56$iO73 2005|41$3Table of contents$uhttp://www.loc.gov/catdir/toc/ecip0421/2004018260.html|  $aApproval$b1628024-35$c67.20$d1$i20027328$jUSD$tBook|^

Ending field terminator    = 1474
Record terminator position = 1475

Summary
-----------
Record terminator position = 1475
Base address of data  = 397

1475 + 397 = 1872 (Record Length)
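
The same arithmetic can be checked with a few lines of Python. This is a sketch using the field lengths copied from the directory of the sample record; the variable names are our own.

```python
LEADER_LENGTH = 24           # the leader is always 24 characters
DIRECTORY_ENTRY_LENGTH = 12  # tag (3) + field length (4) + start (5)

# Field lengths from the directory above, in order (001 through 949).
field_lengths = [
    8, 17, 41, 17, 28, 22, 36, 12, 24, 21,
    24, 21, 118, 34, 25, 43, 64, 224, 62, 62,
    60, 59, 59, 50, 66, 65, 26, 24, 28, 78,
    56,
]

# Leader + directory entries + the field terminator that ends the directory.
base_address = LEADER_LENGTH + DIRECTORY_ENTRY_LENGTH * len(field_lengths) + 1

# Each field length already includes that field's own terminator.
data_length = sum(field_lengths)

# One more byte for the record terminator at the very end.
record_length = base_address + data_length + 1

print(base_address, data_length, record_length)  # 397 1474 1872
```

Both computed values match leader positions 12-16 (00397) and 00-04 (01872), which is exactly the agreement you want to see in a healthy record.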