From AC Wiki
Latest revision as of 13:16, 2 April 2013

Dedupe 2.8: Group 1 - 245 $n, $p - Title (Verify)


Description

The 245 field contains the title of the book, including the main title, subtitle, statement(s) of responsibility, and occasionally other information appearing on the title page. This parameter uses subfields $n and $p.

Every MARC record must have a 245 field, but the field need not contain subfield $n or $p.

245 $n

If a book includes only a part or section of a larger work, the number of that part or section may be entered in $n. The numbering may include letters as well as numbers, or any other designation of sequence.

 
 245 $a Divine comedy. $n Third part.
 245 $a Travels through the orient. $n Volume 2, pt. 4.

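245 $p

If a book includes only a named part or section of a larger work, the name of that part or section may be entered in $p.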

 
 245 $a Travels through the orient. $n Volume 2, $p Burma.
 245 $a New England byways. $n Part 1, $p Vermont. $n Part 2, $p Massachusetts
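The examples above write subfields in display form, with `$` followed by a one-letter code (real MARC records use a dedicated delimiter character rather than `$`). As a minimal sketch of how a program might pick out the $n and $p values this parameter uses, assuming that display form, the hypothetical helpers `subfields` and `part_subfields` split a 245 field into (code, value) pairs:

```python
import re

# Hypothetical helpers for illustration; they parse the "$a ... $n ..."
# display form used on this page, not binary MARC.
SUBFIELD = re.compile(r"\$(\w)\s*([^$]*)")

def subfields(field: str) -> list[tuple[str, str]]:
    """Split a field such as '$a Title. $n Part 1' into (code, value) pairs."""
    return [(m.group(1), m.group(2).strip()) for m in SUBFIELD.finditer(field)]

def part_subfields(field: str) -> list[tuple[str, str]]:
    """Keep only $n and $p, in the order they appear in the field."""
    return [(code, value) for code, value in subfields(field) if code in ("n", "p")]

print(part_subfields("$a New England byways. $n Part 1, $p Vermont. $n Part 2, $p Massachusetts"))
# [('n', 'Part 1,'), ('p', 'Vermont.'), ('n', 'Part 2,'), ('p', 'Massachusetts')]
```

Punctuation such as the trailing comma is left in place by this sketch; it is the job of normalization to deal with it before any comparison.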

Verify Method

There are three verify methods: FULL, PARTIAL, and WITHIN. Each one compares the data found in a specific field against the same field in a potential match record.

Full

Full compares the entire verify string up to the verify length.
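A minimal sketch of the FULL method, assuming both strings have already been normalized and that a verify length of None stands for "all" (the function name `full_verify` is hypothetical):

```python
from typing import Optional

def full_verify(a: str, b: str, length: Optional[int] = None) -> bool:
    """FULL: compare both normalized strings in their entirety,
    up to the verify length (None means all characters)."""
    if length is not None:
        a, b = a[:length], b[:length]
    return a == b

print(full_verify("DANIELBOONENO1", "DANIELBOONENO2"))      # False
print(full_verify("DANIELBOONENO1", "DANIELBOONENO2", 10))  # True
```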

Partial

Partial truncates both compare strings to the length of the shorter string, then does a full compare:

 
 Record A:
   $a American history in the 20th century. $p Part 1.
 
 Record B:
   $a American history in the 20th century. $p Part 1-1.

Partial matching returns these two records as a match: truncated to the length of the shorter string, Record B's "Part 1-1" reduces to "Part 1".
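The PARTIAL rule can be sketched the same way. This sketch takes already-normalized input; the first pair of strings assumes a normalization that turns the hyphen in "Part 1-1" into a space, and `partial_verify` is a hypothetical name:

```python
def partial_verify(a: str, b: str) -> bool:
    """PARTIAL: truncate both strings to the length of the shorter one,
    then compare them in full."""
    shortest = min(len(a), len(b))
    return a[:shortest] == b[:shortest]

print(partial_verify("PART 1", "PART 1 1"))               # True
print(partial_verify("THE FOX", "THE FOX IN THE HOUND"))  # True
print(partial_verify("PART 1", "PART 2"))                 # False
```

Note that in this sketch an empty string matches anything, because truncating to zero characters leaves nothing to compare.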

Within

Within truncates each compare string at the verify length and searches for it anywhere in the full, untruncated string of the other record's field:

 
 Record A:
   $a American history in the 20th century. $p Includes other information besides part 1.
 
 Record B:
   $a American history in the 20th century. $p Part 1.

Because "Part 1" appears within the other record's 245 $p, these two records would be considered a match.
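A minimal sketch of WITHIN, again on already-normalized (uppercased) strings, which is why "part 1" and "Part 1" compare equal here; `within_verify` is a hypothetical name, and a verify length of None stands for "all":

```python
from typing import Optional

def within_verify(a: str, b: str, length: Optional[int] = None) -> bool:
    """WITHIN: truncate each string at the verify length and search for it
    inside the other string, which is left untruncated. A hit in either
    direction counts as a match."""
    a_cut = a if length is None else a[:length]
    b_cut = b if length is None else b[:length]
    return a_cut in b or b_cut in a

record_a = "INCLUDES OTHER INFORMATION BESIDES PART 1"
record_b = "PART 1"
print(within_verify(record_a, record_b))           # True
print(within_verify("CAT", "THE CAT IN THE HAT"))  # True
print(within_verify("PART 2", record_a))           # False
```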

Normalization

Normalization determines how a string is prepared before it is compared with another string. Normalization never changes the record itself; it is applied only while the program compares the strings.

Types of normalization are:

  • NACO/CJK - retains spaces and subfield delimiters
  • FULL - NACO normalization with all spaces and subfield delimiters removed
 
 original field:
   $a Daniel Boone. $n No 1.
 
 normalized (naco/cjk):
   $ DANIEL BOONE $ NO 1
 
 normalized (full):
   DANIELBOONENO1
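The two normalization types can be approximated as follows. This is a simplified sketch, not the full NACO normalization rule set: it keeps a `$` for each subfield delimiter, drops the subfield code and punctuation, treats a hyphen as a space (an assumption), uppercases, and collapses spaces. The names `naco_like` and `full_normalize` are hypothetical.

```python
import re

def naco_like(field: str) -> str:
    """Simplified NACO/CJK-style normalization (not the full NACO rules)."""
    text = re.sub(r"\$\w", " $ ", field)   # '$n' -> ' $ ': keep the delimiter, drop the code
    text = text.replace("-", " ")          # assumption: a hyphen acts as a space
    text = re.sub(r"[^\w\s$]", "", text)   # drop remaining punctuation
    return " ".join(text.upper().split())  # uppercase and collapse runs of spaces

def full_normalize(field: str) -> str:
    """FULL: the NACO-style string with all spaces and delimiters removed."""
    return naco_like(field).replace("$", "").replace(" ", "")

print(naco_like("$a Daniel Boone. $n No 1."))       # $ DANIEL BOONE $ NO 1
print(full_normalize("$a Daniel Boone. $n No 1."))  # DANIELBOONENO1
```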

Length

This setting controls how much of a given string the program uses when comparing potential matches:

  • Length - The number of characters of the verify field to use: any value from 1 to 2048, or all. Using a length of 10 gives this example:
 
 original heading:
   $a Daniel Boone. $n No 1.
 
 normalized (full), length = 10:
   1-------10
   DANIELBOONENO1
   (only DANIELBOON is compared)
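The effect of the Length setting can be sketched as a simple truncation applied after normalization. Representing "all" as None and the name `apply_length` are assumptions of this sketch:

```python
from typing import Optional

def apply_length(normalized: str, length: Optional[int]) -> str:
    """Keep only the first `length` characters of a normalized verify string.
    None stands for 'all'; any other value must be between 1 and 2048."""
    if length is None:
        return normalized
    if not 1 <= length <= 2048:
        raise ValueError("verify length must be 1-2048 or all")
    return normalized[:length]

print(apply_length("DANIELBOONENO1", 10))    # DANIELBOON
print(apply_length("DANIELBOONENO1", None))  # DANIELBOONENO1
```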

Words

  • Words - The number of words to match within a given string:
 
 original heading:
   $a Daniel Boone. $n No 1
 
 words = 2:
   No and 1 are possibilities for keyword matching.

NOTE: Non-filing characters (for example, an initial article such as "The") are excluded from Words.
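One plausible reading of the Words setting is "take the first N words of the value being verified". The sketch below follows that reading; it also assumes the number of non-filing characters is known (for example from the 245 second indicator), which this page does not state. `first_words` is a hypothetical name.

```python
def first_words(value: str, count: int, nonfiling: int = 0) -> list[str]:
    """Return up to `count` words from a subfield value, after skipping the
    first `nonfiling` characters (for example an initial article)."""
    return value[nonfiling:].split()[:count]

print(first_words("No 1", 2))                               # ['No', '1']
print(first_words("The fox in the hound", 2, nonfiling=4))  # ['fox', 'in']
```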

Must Verify

This option requires the verify field to exist in both records and to match. It is commonly included as part of 245 title verification, though it can be useful for other fields as well.
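Must Verify adds a presence requirement on top of the chosen verify method. In this sketch a missing field is represented as None and the verify method is passed in as a function; both choices, and the name `must_verify`, are assumptions:

```python
from typing import Callable, Optional

def must_verify(a: Optional[str], b: Optional[str],
                method: Callable[[str, str], bool]) -> bool:
    """The field must exist in both records AND pass the verify method."""
    if a is None or b is None:
        return False  # missing in either record: verification fails
    return method(a, b)

def same(x: str, y: str) -> bool:
    return x == y  # stand-in for the FULL method on normalized strings

print(must_verify("PART TWO", "PART TWO", same))  # True
print(must_verify("PART TWO", None, same))        # False
```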

Only if Both

This option does a verify comparison only if both records have the specified field; if only one of the records has the field, it verifies as true. If this option is used on 245 $n, the following would be true:

Example 1:

 
 Record A has:
   245 $a Adventures of Huckleberry Finn. $n Part two.
 
 Record B has:
   245 $a Adventures of Huckleberry Finn.

RESULT: This would be a match because 245 $n exists in only one of the records, so no comparison is made.

However, when both records have a 245 $n and the values differ, we have this scenario:

 
 Record A has:
   245 $a Adventures of Huckleberry Finn. $n Part one.
 
 Record B has:
   245 $a Adventures of Huckleberry Finn. $n Part two.

RESULT: This would not be a match because 245 $n of each record differs.
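Only if Both differs from Must Verify in how a missing field is handled. A sketch under the same assumptions (None for a missing field, the verify method passed as a function, hypothetical name `only_if_both`), run on the two Huckleberry Finn examples after normalization:

```python
from typing import Callable, Optional

def only_if_both(a: Optional[str], b: Optional[str],
                 method: Callable[[str, str], bool]) -> bool:
    """Compare only when both records have the field; if only one has it
    (or neither, an assumption here), verification is treated as true."""
    if a is None or b is None:
        return True
    return method(a, b)

def same(x: str, y: str) -> bool:
    return x == y  # stand-in for the FULL method on normalized strings

print(only_if_both("PART TWO", None, same))        # True  (Example 1)
print(only_if_both("PART ONE", "PART TWO", same))  # False (second scenario)
```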

Default

Group 1 (010, 020, 022)
245 $n, $p titles must verify if fields exist in both records (full, naco)

links

2.1 - 2.2 - 2.3 - 2.4 - 2.5 - 2.6 - 2.7 - 2.8 - 2.9 - 2.10 - 2.11 - 2.12 - 2.13
1.0 - 2.0 - 3.0 - 4.0 - 5.0 - 6.0