Difference between revisions of "Dedupe 2.1"

From AC Wiki

Revision as of 13:47, 26 March 2013

Dedupe 2.1: Numeric Field Hits - Group 1

D2-1.png

Numeric Field Hits

LCCN

LCCN - 010 subfield a:

For the Library of Congress Control Number, subfield a is used by default. For searching only (the final record is not changed), this field is normalized. Normalization removes extra spaces, punctuation, and extra data that is usually contained in a different subfield.

  • Examples of LCCN normalization:
 
 Example 1: ###755262 becomes 75005262
 Example 2: ###80020863 /AC/r86 becomes 80020863ACr86 and 80020863
            Both will be searched to find a potential match.
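
The second example above can be sketched in Python. This is a minimal illustration, not the tool's actual implementation: the function name lccn_search_keys is hypothetical, "#" is assumed to stand for a blank position, and the suffix is assumed to be whatever follows the leading digits. It does not attempt the zero-padding seen in Example 1, since that rule is not spelled out here.

```python
import re


def lccn_search_keys(subfield_a: str) -> list[str]:
    """Return candidate search keys for an 010 subfield a value (illustrative)."""
    # Assumption: '#' marks a blank position, so treat it like a space.
    text = subfield_a.replace("#", " ")
    # Key 1: drop spaces and punctuation, keeping letters and digits.
    full = re.sub(r"[^0-9A-Za-z]", "", text)
    # Key 2: the leading run of digits, i.e. the number without its suffix.
    base = re.match(r"\d*", full).group(0)
    keys = [full]
    if base and base != full:
        keys.append(base)
    return keys


print(lccn_search_keys("###80020863 /AC/r86"))
```

Returning both keys mirrors the statement that both forms are searched to find a potential match.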

Historical fact

LC changed the structure of the LCCN beginning on Jan. 1, 2001 in order to accommodate a four-digit year. The length of the control number remains 12 characters, as it was prior to the change. However, in the old LCCN structure (A), suffixes were occasionally used. Under the new LCCN structure (B), the location of elements is slightly altered to accommodate a four-digit year. Under both structures, the prefix, year and serial number are the basic elements required to make an LCCN unique.

ISBN

ISBN - 020 subfield a:

For the International Standard Book Number, subfield a is used by default. For searching only (the final record is not changed), this field is normalized. Normalization removes extra spaces, punctuation, and extra data that is usually contained in a different subfield.

  • Examples of ISBN normalization:
 
 Example 1: 0937295124 : $12.95 becomes 0937295124
 Example 2: 978-0-06-108096-8 becomes 9780061080968
 Example 3: 0688076815 (pbk.) becomes 0688076815
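
The three examples above can be reproduced with a short Python sketch. It is an illustration under assumptions, not the tool's actual code: the function name normalize_isbn is hypothetical, and the ISBN is assumed to be the leading run of digits, hyphens, and the X check character in subfield a.

```python
import re
from typing import Optional


def normalize_isbn(subfield_a: str) -> Optional[str]:
    """Return the ISBN from an 020 subfield a value without hyphens or trailing data."""
    # Take the leading run of digits, hyphens, and the X check character;
    # anything after it (price, binding note) is ignored.
    match = re.match(r"\s*([0-9Xx-]+)", subfield_a)
    if match is None:
        return None
    # Remove hyphens and normalize a lowercase check character.
    return match.group(1).replace("-", "").upper() or None


for raw in ("0937295124 : $12.95", "978-0-06-108096-8", "0688076815 (pbk.)"):
    print(raw, "becomes", normalize_isbn(raw))
```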

Historical fact

The structure of the ISBN has changed over the past thirty years. Prior to 1977, the 020 field was not repeatable, and multiple ISBNs and related information were placed in repeated subfields. Older bibliographic records may still have multiple ISBNs in a single 020 field rather than in multiple 020 fields. The Library of Congress began accommodating ISBN-13 on October 1, 2004. Between 2005 and 2008, publishers were encouraged to supply both an ISBN-10 and an ISBN-13 for the same manifestation, based on guidelines issued by the International ISBN Agency (IIA). January 1, 2007 marked the final date for fully adopting ISBN-13; from that date, publishers were expected to supply only ISBN-13.

ISSN

ISSN - 022 subfield a:

For the International Standard Serial Number, subfield a is used by default. For searching only (the final record is not changed), this field is normalized. Normalization removes extra spaces, punctuation, and extra data that is usually contained in a different subfield.

  • Examples of ISSN normalization:
 
 Example 1: 0829-0784 becomes 08290784
 Example 2: 0009-5753 PERIODICAL becomes 00095753
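
A comparable Python sketch for the ISSN examples above follows. The function name normalize_issn is hypothetical, and the sketch assumes the ISSN appears at the start of subfield a as four digits, an optional hyphen, and four more characters (the last may be X).

```python
import re
from typing import Optional


def normalize_issn(subfield_a: str) -> Optional[str]:
    """Return the eight-character ISSN from an 022 subfield a value, without the hyphen."""
    # Four digits, an optional hyphen, three digits, then a digit or X.
    match = re.match(r"\s*(\d{4})-?(\d{3}[\dXx])", subfield_a)
    if match is None:
        return None
    # Join the two halves; trailing text such as "PERIODICAL" is dropped.
    return (match.group(1) + match.group(2)).upper()


for raw in ("0829-0784", "0009-5753 PERIODICAL"):
    print(raw, "becomes", normalize_issn(raw))
```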

Default

For the 010/020/022 fields, the default is to search the entire subfield a after normalization.

The other options are as follows:

  1. Use the entire field data including subfields other than subfield a
  2. Don't use normalization
  3. Connect different tag numbers (see Step 4-3. Like Tags)
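
How the default and the first two options might change the text that is searched can be sketched in Python. Everything here is hypothetical: the names search_text and strip_separators, the representation of a field as (code, value) pairs, and the choice to join subfields with a space are assumptions made for illustration. The placeholder normalizer only drops spaces and hyphens; the like-tags option is not covered.

```python
def strip_separators(text: str) -> str:
    """Placeholder normalizer (assumption): drops spaces and hyphens only."""
    return text.replace(" ", "").replace("-", "")


def search_text(subfields, use_entire_field=False, normalize=True,
                normalizer=strip_separators):
    """Build the text to search from one field's (code, value) pairs."""
    # Default: subfield a only. Option 1: every subfield in the field.
    wanted = [value for code, value in subfields
              if use_entire_field or code == "a"]
    text = " ".join(wanted)
    # Option 2: skip normalization and search the raw text.
    return normalizer(text) if normalize else text


field_022 = [("a", "0829-0784"), ("y", "0829-0792")]
print(search_text(field_022))                         # default behaviour
print(search_text(field_022, use_entire_field=True))  # option 1
print(search_text(field_022, normalize=False))        # option 2
```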
