Difference between revisions of "Profile Guide Chapter 2"

== Step 2.2 Question ==
 
[[Image:2-2.png]]<br>
 
  
=== Numeric Field Validation ===
 
MARC fields that are incorrectly formatted often cause user searches to fail and prevent items in the collection from being included in the system indexes. MARS 2.0 software can validate the structure of numeric data in the following fields:<ul><li>010:  Library of Congress Control Number (LCCN)<li>020:  International Standard Book Number (ISBN)<li>022:  International Standard Serial Number (ISSN)<li>034:  Coded Cartographic Mathematical Data (CCMD)</ul>
 
  
=== Historical Fact ===
 
LC changed the structure of the LCCN beginning on Jan. 1, 2001 in order to accommodate a four-digit year. The length of the control number remains 12 characters, as it was prior to the change. However, in the old LCCN structure (A), suffixes were occasionally used. Under the new LCCN structure (B), the location of elements is slightly altered to accommodate a four-digit year. Under both structures, the prefix, year and serial number are the basic elements required to make an LCCN unique.
 
 
Please indicate on Step 2.2 what kind of validation you would like performed on your 010, 020, 022 or 034 fields. Choosing “Yes, With these modifications” means that you would like the MARS 2.0 software to perform a modified validation (e.g., validate fields 020 and 022, but not fields 010 or 034).
 
 
=== Pre-2001 LCCN ===
 
LCCN Structure A (2000 and earlier) numbers are formatted according to the following 6 divisions (separated by hyphens):<br>
 
[[Image:Pre2001lccn.png]]
 
 
<ol><li>3-character prefix with lowercase letters and/or blanks<li>2 digits, usually the last 2 digits of the year<li>6-digit serial number, with zeroes padded to the left to make 6 digits<li>Blank space<li>Optional variable length suffix and/or alphabetic identifier<li>Optional revision date</ol><br>
 
 
Examples of LCCN Structure A (the character # represents a single space):<br>
 
[[Image:pre2001lccn2.png]]<br>
 
 
 
== 2.2 Numeric Field Validation ==
 
=== Post-2000 LCCN ===
 

Revision as of 10:18, 6 August 2008

2.1 Introduction

Overview

MARS 2.0 makes changes in over 100 different MARC fields within your bibliographic records. Our Bibliographic Record Validation service updates many elements in MARC bibliographic records to conform to current MARC21 standards, providing increased consistency within your bibliographic files.

The level of MARC update is entirely configurable by you and your staff. By default, we perform all of the updates to your MARC bib records. If you prefer, we can perform only the updates you specify. In the end, the update process is tailored to your expectations of what you’d like to see happen within your bibliographic records.

Standard MARC21 Validation

As soon as we receive your files, they are prepared for processing. The MARS 2.0 programs check all files of MARC records submitted to ensure they conform to the basic structural requirements of the MARC21 communications format. Our validation programs ensure that all records meet the following criteria:
  • Leader is present and correctly structured
  • Directory is present and correctly structured
  • No record exceeds 99,999 characters. Bib records larger than the 99,999-byte maximum prevent successful processing of the input files, and records cannot be segmented (broken apart into multiple physical records) to fit within the limit. Such records are output as potentially corrupt for the library to review
  • No field exceeds 9,999 characters (MARC21 directory limitation)
  • If a record exceeds the record or field size limit, it is not processed. If a large number of records are rejected, our programmers will contact the library project manager to determine a course of action
  • All records contain the following standard MARC delimiters:
    • Record terminators (ASCII hex 1D)
    • Field terminators (ASCII hex 1E)
    • Subfield delimiters (ASCII hex 1F)
  • All records contain valid characters (in either MARC-8 or UTF-8)
  • Any null characters (hex 00) are changed to spaces when records are loaded
  • MARS 2.0 will also delete empty fields or subfields as records are loaded
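
The structural checks listed above can be sketched in a few lines of Python. This is only an illustration of the kind of test involved, not the MARS 2.0 implementation: the function name `check_record` and the wording of the messages are hypothetical, and character-set validation is left out.

```python
RECORD_TERMINATOR = b"\x1d"
FIELD_TERMINATOR = b"\x1e"
MAX_RECORD_LENGTH = 99_999


def check_record(raw: bytes) -> list:
    """Return a list of structural problems found in one raw MARC21 record."""
    problems = []
    if len(raw) > MAX_RECORD_LENGTH:
        problems.append("record exceeds 99,999 bytes")
    if not raw.endswith(RECORD_TERMINATOR):
        problems.append("missing record terminator")

    # Leader: 24 bytes; 00-04 hold the record length, 12-16 the base address.
    leader = raw[:24]
    if len(leader) < 24 or not leader[:5].isdigit() or not leader[12:17].isdigit():
        problems.append("leader is malformed")
        return problems
    if int(leader[:5]) != len(raw):
        problems.append("leader record length does not match actual length")

    # Directory: 12-byte entries (tag, 4-digit length, 5-digit start),
    # closed by a field terminator just before the base address.
    base = int(leader[12:17])
    directory = raw[24:base - 1]
    if len(directory) % 12 != 0 or raw[base - 1:base] != FIELD_TERMINATOR:
        problems.append("directory is malformed")
        return problems

    for i in range(0, len(directory), 12):
        entry = directory[i:i + 12]
        if not entry[3:].isdigit():
            problems.append(f"bad directory entry {entry!r}")
            continue
        tag = entry[:3].decode("ascii", "replace")
        length, start = int(entry[3:7]), int(entry[7:12])
        field = raw[base + start:base + start + length]
        if not field.endswith(FIELD_TERMINATOR):
            problems.append(f"field {tag} lacks a field terminator")
    return problems
```

An empty list means the record passed these checks. The 9,999-character field limit needs no separate test here because the directory stores each field length in 4 digits.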

Note: MARS 2.0 programs can process MARC21 records that lack 001, 008 or other fields.

Topics

Chapter two is one of our most comprehensive documents, and as such is too large for a single page.

  • Step 2-1 - MARC Update Service Levels
  • Step 2-2 - Normalization of Generic Name Headings




2.2 Numeric Field Validation

Post-2000 LCCN

LCCN Structure B (2001 and later) numbers are formatted according to the following 3 divisions (separated by hyphens):
Pst2001lccn.png

  1. 2-character prefix with lowercase letters and/or blanks
  2. 4-digit year
  3. 6-digit serial number, with zeroes padded to the left to make 6 digits

Examples of LCCN Structure B (the character # represents a single space):
Pst2001lccn2.png

According to the Library of Congress, Structure A LCCNs will not be changed to Structure B. This minimizes the impact of the LCCN change for local systems. Since LCCN structures A and B will continue to exist in authority and bibliographic records, MARS 2.0 programs provide for validation of both old and new LCCN formats. No provision is necessary, therefore, for the conversion of Structure A to the new Structure B formats, or vice versa.
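
As a rough illustration of validating both formats side by side, the two structures can be written as regular expressions. This is a sketch under stated assumptions: the character # stands for a blank, as in the examples in this guide; Structure A is taken in its stored form (3-character prefix, 2-digit year, 6-digit serial number, trailing blank, no suffix); and the function name `lccn_structure` is hypothetical.

```python
import re
from typing import Optional

# "#" stands for a blank, following the convention used in this guide.
STRUCTURE_A = re.compile(r"[a-z#]{3}\d{2}\d{6}#")  # prefix, 2-digit year, serial, blank
STRUCTURE_B = re.compile(r"[a-z#]{2}\d{4}\d{6}")   # prefix, 4-digit year, serial


def lccn_structure(lccn: str) -> Optional[str]:
    """Return "A" or "B" for a well-formed LCCN, or None if it matches neither."""
    if STRUCTURE_A.fullmatch(lccn):
        return "A"
    if STRUCTURE_B.fullmatch(lccn):
        return "B"
    return None
```

Both patterns describe 12-character strings, so a stored LCCN can match at most one of them.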

LCCN Structure A Corrections

If the LCCN in the 010 $a is identified as a Structure A LCCN and does not have a valid structure, MARS 2.0 programs make the following format corrections (all changes are subsequently checked for validity):
  • If the first character of the LCCN is a number (no prefix is present), the programs insert 3 blanks (###) before the number:
    95-156543
    ###95156543#
  • If the first character of the LCCN is an alphabetic character and the second character is a number, MARS 2.0 programs insert 2 blanks (##) between the alphabetic character and the number to make a valid 3-character prefix:
    n95-156543
    n##95156543#
  • If the first 2 characters of the LCCN are alphabetic and the third character is a number, MARS 2.0 programs insert 1 blank (#) between the alphabetic characters and the number to make a valid 3-character prefix:
    nb95-156543
    nb#95156543#
  • If a hyphen appears in the 010 subfield $a, MARS 2.0 programs count the number of digits before the hyphen. If one digit is before the hyphen, a 0 (zero) is inserted before the first digit in the LCCN (following the prefix). If 2 digits are before the hyphen, no zeroes are inserted at the beginning of the LCCN:
    nb#9-156543
    nb#09156543#
    nb#95-156543
    nb#95156543#
  • MARS 2.0 programs also count the number of digits following the hyphen. If there are fewer than 6 digits, zeroes are added following the first 2 digits (##-) of the LCCN to make 6 digits (for a total of 8 digits). The hyphen is deleted from the LCCN:
    nb#95-6543
    nb#95006543#
    nb#95-56543
    nb#95056543#
  • If the LCCN contains a suffix, the suffix is removed in accordance with the revised LC standard for Structure A LCCNs:
    nb#95-156543//r86
    nb#95156543#
  • If the LCCN does not end with a blank, MARS 2.0 programs insert a blank following the last digit:
    nb#95-156543
    nb#95156543#
  • If the 010 field data has been modified, the 010 field length is recalculated and the 010 directory entry is updated. The record length is recalculated and updated in the record leader.
  • If MARS 2.0 programs cannot correct the format of the LCCN in the 010 subfield $a (e.g., there are 4 characters in the prefix or there are 9 digits), the 010 subfield $a code is changed to $z and a report can be generated. See report R50 in Step 5 for more information about this report.
  • The following invalid LCCN prefixes are corrected to the valid format (# = blank):
    • #a# -> a##
    • ##a -> a##
    • #bc -> bc#
    • ## -> ###
    • # -> ###
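
The Structure A corrections above can be approximated as follows. This is a minimal sketch, not the MARS 2.0 code: the function name `correct_lccn_a` is hypothetical, # stands for a blank, a suffix is assumed to begin at the first slash, and the caller is assumed to have already identified the number as Structure A. A return value of None corresponds to the case where the subfield code would be changed to $z.

```python
import re
from typing import Optional


def correct_lccn_a(value: str) -> Optional[str]:
    """Return a corrected Structure A LCCN, or None if it cannot be corrected."""
    # Remove any suffix (assumed to start at the first slash) and trailing blanks.
    value = value.split("/", 1)[0].rstrip("#")
    match = re.fullmatch(r"([a-z#]*)(\d+)(?:-(\d+))?", value)
    if match is None:
        return None
    prefix, first, second = match.groups()

    # Left-justify the letters and pad the prefix to 3 characters with blanks.
    letters = prefix.replace("#", "")
    if len(letters) > 3:
        return None
    prefix = letters.ljust(3, "#")

    if second is None:
        digits = first  # no hyphen: the digits must already be complete
    else:
        if len(first) > 2 or len(second) > 6:
            return None
        # Pad the year to 2 digits and the serial number to 6 digits.
        digits = first.zfill(2) + second.zfill(6)
    if len(digits) != 8:
        return None

    # A valid Structure A LCCN ends with a single blank.
    return f"{prefix}{digits}#"
```

For example, `correct_lccn_a("n95-156543")` returns `n##95156543#`, and `correct_lccn_a("nb#95-6543")` returns `nb#95006543#`.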

LCCN Structure B Corrections

If the LCCN in the 010 subfield $a is identified as a Structure B LCCN and does not have a valid structure, MARS 2.0 programs attempt to correct it by making these conversions (all changes are subsequently checked for validity):
  • If the first character of the LCCN is a number (no prefix is present), the programs insert 2 blanks before the number:
    2005-256543
    ##2005256543
  • If the first character of the LCCN is an alphabetic character and the second character is a number, MARS 2.0 programs insert 1 blank (#) between the alphabetic character and the number to make a valid 2-character prefix:
    n2005-256543
    n#2005256543
    nb2005-256543
    nb2005256543
  • If a hyphen or blank space appears in the 010 subfield $a, MARS 2.0 programs count the number of digits following the hyphen or blank. If there are fewer than 6 digits, zeroes are added following the first 4 digits (####-) of the LCCN to make 6 digits (for a total of 10 digits). The hyphen or blank is deleted from the LCCN:
    nb2005-6543
    nb2005006543
  • If the 010 field data has been modified, the 010 field length is recalculated and the 010 directory entry is updated. The record length is recalculated and updated in the record leader.
  • If MARS 2.0 programs cannot correct the format of the LCCN in the 010 subfield $a (e.g., there are 4 characters in the prefix or there are 9 digits), the 010 subfield $a code is changed to $z and a report can be generated. See report R50 in Step 5 for more information about this report.
  • The following invalid LCCN prefixes are corrected to the valid format (# = blank):
    • #bc -> bc
    • #a -> a#
    • # -> ##
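
A comparable sketch for the Structure B corrections is shown below. As before, it is an illustration only: the name `correct_lccn_b` is hypothetical, # stands for a blank, and the serial number is padded only when a hyphen or blank separates it from the year, since the rules above describe padding only in that case.

```python
import re
from typing import Optional


def correct_lccn_b(value: str) -> Optional[str]:
    """Return a corrected Structure B LCCN, or None if it cannot be corrected."""
    # Either "year, separator, 1-6 digit serial" or "year, 6-digit serial".
    match = re.fullmatch(r"([a-z#]*)(\d{4})(?:[-#](\d{1,6})|(\d{6}))", value)
    if match is None:
        return None
    prefix, year, short_serial, full_serial = match.groups()

    # Left-justify the letters and pad the prefix to 2 characters with blanks.
    letters = prefix.replace("#", "")
    if len(letters) > 2:
        return None

    # Pad a separated serial number with leading zeroes to 6 digits.
    serial = full_serial if full_serial is not None else short_serial.zfill(6)
    return letters.ljust(2, "#") + year + serial
```

For example, `correct_lccn_b("2005-256543")` returns `##2005256543`, and `correct_lccn_b("nb2005-6543")` returns `nb2005006543`.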

020 Field

Some automated systems do not index an ISBN if the format is invalid. An ISBN in field 020 subfield $a should be 10 digits or 13 digits. If the ISBN in 020 subfield $a does not have the valid structure, MARS 2.0 programs attempt to correct the ISBN structure by performing the following conversions:
  • If there are 9 digits in the ISBN, a 0 (zero) is inserted before the first digit in the ISBN:
    873671008
    0873671008
  • All hyphens are deleted:
    1-873671-008
    1873671008
  • A lowercase x is converted to uppercase:
    187367100x
    187367100X
  • If the ISBN is 13 digits, MARS 2.0 programs will verify that the first 3 digits are 978.
  • As an optional service, MARS 2.0 programs will correct the order of the ISBNs so that they are grouped in pairs, ISBN-13 first (i.e., 13/10, 13/10)
  • As an optional service, MARS 2.0 programs will convert ISBN-10 to ISBN-13 (includes check-sum value for both 10 and 13 length ISBNs):
    1873671008
    9781873671009
  • If MARS 2.0 programs cannot correct the format of the ISBN in the 020 subfield $a (e.g., there are 11 digits), the 020 subfield $a code is changed to $z and a report is generated. See report R50 in Step 5 for more information about this report.
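
The ISBN corrections and the ISBN-10 to ISBN-13 conversion can be sketched as follows. The check-digit arithmetic follows the standard ISBN rules (weights 10 down to 2, modulo 11, for ISBN-10; alternating weights 1 and 3, modulo 10, for ISBN-13). The function names are hypothetical, and the sketch assumes that `normalize_isbn` checks structure only, as described above, rather than verifying check digits.

```python
import re
from typing import Optional


def isbn10_check_digit(first9: str) -> str:
    """Check digit for the first 9 digits of an ISBN-10."""
    total = sum((10 - i) * int(d) for i, d in enumerate(first9))
    check = (11 - total % 11) % 11
    return "X" if check == 10 else str(check)


def isbn13_check_digit(first12: str) -> str:
    """Check digit for the first 12 digits of an ISBN-13."""
    total = sum(int(d) * (3 if i % 2 else 1) for i, d in enumerate(first12))
    return str((10 - total % 10) % 10)


def normalize_isbn(value: str) -> Optional[str]:
    """Apply the format corrections; return None if the ISBN stays invalid."""
    isbn = value.replace("-", "").upper()   # delete hyphens, x -> X
    if re.fullmatch(r"\d{8}[\dX]", isbn):   # 9 characters: insert a leading zero
        isbn = "0" + isbn
    if re.fullmatch(r"\d{9}[\dX]", isbn) or re.fullmatch(r"978\d{10}", isbn):
        return isbn
    return None                             # caller would recode $a as $z


def isbn10_to_isbn13(isbn10: str) -> str:
    """Convert a normalized ISBN-10 to ISBN-13 by adding 978 and a new check digit."""
    body = "978" + isbn10[:9]
    return body + isbn13_check_digit(body)
```

For the ISBN-10 `1873671008`, the weighted sum of the first 12 digits of `978187367100` is 101, so the ISBN-13 check digit is 9 and `isbn10_to_isbn13("1873671008")` returns `9781873671009`.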

Historical Fact

The structure of the ISBN has changed over the past thirty years. Prior to 1977, the 020 field was not repeatable, and multiple ISBNs and related information were placed in repeated subfields. Older bibliographic records may still have multiple ISBNs in a single 020 field rather than in multiple 020 fields. The Library of Congress began accommodating ISBN-13 on October 1, 2004. Between 2005 and 2008, publishers were encouraged to supply both an ISBN-10 and an ISBN-13 for the same manifestation, based on guidelines issued by the International ISBN Agency (IIA). January 1, 2007 marked the final date for fully adopting ISBN-13; from that date, publishers were expected to supply only ISBN-13.

Ordering 020 Fields

LC will accept both an ISBN-13 and an ISBN-10 for the same manifestation. These numbers are shown by publishers according to guidelines issued by the IIA, which call for grouping the pairs of ISBNs by manifestation. In printed products the ISBN-13 appears first, and each number is preceded by a print constant as in the following example:
     ISBN-13:  978-1-873671-00-9
     ISBN-10:  1-873671-00-8

Repeating 020 Subfields

MARS 2.0 Update processing validates an 020 field for correct subfield repeatability. If the 020 field contains multiple subfields $a, each subfield $a is placed in a separate 020 field:
     020 $a11111111$a22222222
       Corrected to:
     020 $a11111111
     020 $a22222222

Binding Information in 020 Fields

Prior to 1978, binding information was placed in a subfield $b. Older bibliographic records may have binding information in a subfield $b rather than as a parenthetical qualifier in the subfield $a.

If the 020 field contains a subfield $b and an 020 subfield $a exists:
  • Subfield $b delimiter and subfield code are deleted
  • 020 subfield $b data is enclosed in parentheses
  • A blank is inserted at the end of the immediately preceding 020 subfield $a data
  • 020 subfield $b data, enclosed in parentheses, is moved after the blank at the end of the 020 subfield $a data
     020 $a1873671008$bpbk.
       Corrected to:
     020 $a1873671008 (pbk.)

No Subfield $a in 020 Field

If the 020 field contains a subfield $b and no 020 subfield $a exists, the subfield $b code will be changed to $c:
     020 $bpbk.
       Corrected to:
     020 $cpbk.
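
The three 020 rules described so far (repeated subfields $a, binding information in $b, and $b without $a) can be combined in one short sketch. It assumes a field is represented as a list of (subfield code, data) pairs and that "no subfield $a exists" means no $a precedes the $b in the same field; the name `fix_020` is hypothetical. Fields that mix several subfields $a and $c are not covered here.

```python
from typing import List, Tuple

Subfield = Tuple[str, str]  # (subfield code, data)


def fix_020(subfields: List[Subfield]) -> List[List[Subfield]]:
    """Return the 020 field(s) that result from correcting one input field."""
    fields: List[List[Subfield]] = []
    current: List[Subfield] = []
    for code, data in subfields:
        has_a = any(c == "a" for c, _ in current)
        if code == "a" and has_a:
            # A second $a starts a new 020 field.
            fields.append(current)
            current = []
        elif code == "b":
            if has_a:
                # Append the binding to the preceding $a as "(...)".
                i = max(j for j, (c, _) in enumerate(current) if c == "a")
                current[i] = ("a", f"{current[i][1]} ({data})")
            else:
                # No $a to attach to: recode $b as $c.
                current.append(("c", data))
            continue
        current.append((code, data))
    if current:
        fields.append(current)
    return fields
```

For example, `fix_020([("a", "1873671008"), ("b", "pbk.")])` returns a single field whose subfield $a contains `1873671008 (pbk.)`.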

Multiple Subfields $c in 020 Field

If the 020 field contains multiple subfields $c, each subfield $c is placed in a separate 020 field:
     020 $c4.95 (lib. bdg.)$c3.60 (pbk.)
       Corrected to:
     020 $c4.95 (lib. bdg.)
     020 $c3.60 (pbk.)

Multiple Subfields $a and $c in 020 Field

MARS 2.0 programs correctly handle 020 fields with multiple subfields $a and $c:
     020 $a11111111$c4.95$a22222222$c3.60$c8.97$bpbk.
       Corrected to:
     020 $a11111111$c4.95
     020 $a22222222$c3.60
     020 $c8.97 (pbk.)

022 Field

MARS 2.0 Update processing can validate the format of the ISSN in field 022 subfield $a. Some automated systems do not index an ISSN if the format is invalid. A valid ISSN in field 022 subfield $a has the following structure: 4 digits, hyphen, 4 digits (or 3 digits and an X):
     1234-1234
     1234-123X

If the ISSN in field 022 subfield $a does not have the valid structure, MARS 2.0 programs attempt to correct it by making these conversions:
  • If the ISSN has no hyphen, a hyphen is added between the fourth and fifth digits:
    12345678
    1234-5678
  • A lowercase x is converted to uppercase:
    1234-567x
    1234-567X
  • If MARS 2.0 programs cannot correct the format of the ISSN in the 022 subfield $a (e.g., there are 9 digits), the 022 subfield $a code is changed to $y and a report is generated. See report R50 in Step 5 for more information about this report.
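
The ISSN corrections are simple enough to show in full. This is a sketch with a hypothetical function name, `correct_issn`; it checks structure only and does not verify the ISSN check digit, which the text above does not mention.

```python
import re
from typing import Optional


def correct_issn(value: str) -> Optional[str]:
    """Return a corrected ISSN, or None if the structure cannot be repaired."""
    issn = value.upper()                   # lowercase x -> X
    if re.fullmatch(r"\d{7}[\dX]", issn):  # no hyphen: insert one after 4 digits
        issn = issn[:4] + "-" + issn[4:]
    if re.fullmatch(r"\d{4}-\d{3}[\dX]", issn):
        return issn
    return None                            # caller would recode $a as $y
```

For example, `correct_issn("12345678")` returns `1234-5678`, while a 9-digit value returns None.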

034 Field

MARS 2.0 Update processing can validate field 034 (Coded Cartographic Mathematical Data) for correct format. If the 034 field first indicator has value 2 and the 034 field contains multiple subfields $a, MARS 2.0 Update processing:
  • Places each subfield $a in a separate 034 field
  • Changes each 034 field first indicator to value 1
    034 2_$aa$b100000$aa$b120000
         Corrected to:
    034 1_$aa$b100000
    034 1_$aa$b120000
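
The 034 split can be sketched in the same style. The representation is an assumption: a field is passed as its two indicators plus a list of (subfield code, data) pairs, and the name `split_034` is hypothetical.

```python
from typing import List, Tuple

Subfield = Tuple[str, str]                # (subfield code, data)
Field = Tuple[str, str, List[Subfield]]   # (indicator 1, indicator 2, subfields)


def split_034(ind1: str, ind2: str, subfields: List[Subfield]) -> List[Field]:
    """Split an 034 field with first indicator 2 into one field per subfield $a."""
    count_a = sum(1 for code, _ in subfields if code == "a")
    if ind1 != "2" or count_a < 2:
        return [(ind1, ind2, list(subfields))]  # nothing to correct

    groups: List[List[Subfield]] = []
    for code, data in subfields:
        # Each $a opens a new field; other subfields stay with the current $a.
        if code == "a" or not groups:
            groups.append([])
        groups[-1].append((code, data))
    return [("1", ind2, group) for group in groups]
```

Applied to the example above, the single input field yields two fields, each with first indicator 1 and one $a and $b pair.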

Historical Fact

First indicator value 2 became obsolete when field 034 was made repeatable in 1982. Older bibliographic records may still have first indicator value 2.

Step 2.3 Question

2-3.png

Leader & Fixed Field Updates

MARS 2.0 Update processing provides a variety of updates and corrections to values in the Leader and fixed fields (006, 007, 008). Indicate whether or not to perform Leader & Fixed Field updates.

You may also elect to modify the standard fixed field updates by making a copy of the fixed field update chart, describing desired changes and returning a copy of the edited chart with your profiles.

Fixed Field Updates

Changes to MARC21 replaced many fixed field values or made them obsolete. In the Leader, for example, the value p designating a “Record in partial ISBD form” in byte 18, Descriptive Cataloging Form, was made obsolete in 1987 and is now coded using value i (ISBD). MARS 2.0 Update converts a p value in Leader byte 18 to i.

Bytes 18 (Frequency) and 19 (Regularity) in the 008 fixed field for Computer files/Electronic resources format materials were made obsolete in 1995. Additionally, 008 bytes 18-19 are undefined (should not be used) for Mixed materials format. The MARS 2.0 Update software, therefore, converts any values in 008 bytes 18-19 to blanks for Computer files/Electronic resources and Mixed materials records.
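
As a small illustration of this kind of fixed field update, the 008 change can be written as a string operation. The function name `blank_008_bytes_18_19` is hypothetical, and the format codes CF and MX are the ones used in this guide for Computer files/Electronic resources and Mixed materials.

```python
def blank_008_bytes_18_19(field_008: str, format_code: str) -> str:
    """Blank out 008 bytes 18-19 for CF and MX records; leave others unchanged."""
    if format_code in {"CF", "MX"} and len(field_008) >= 20:
        # Byte positions are counted from 00, so bytes 18-19 are indexes 18 and 19.
        return field_008[:18] + "  " + field_008[20:]
    return field_008
```

The length of the field is unchanged, so no directory or Leader recalculation is needed for this update.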

MARS 2.0 Leader & Fixed Field Updates Table

In the following table, byte position is counted with the first byte being “00” (zero) to be consistent with MARC21 Bibliographic Format documentation. The table uses the following symbols:
     # =  blank space
     | =  fill character

Format codes are as follows:
     BK = Books
     CF = Computer files/Electronic resources
     MP = Maps
     MU = Music
     CR = Continuing resources (serials, etc.)
     VM = Visual materials
     MX = Mixed materials (includes obsolete Archival and Manuscripts control)