Difference between revisions of "Dedupe 2.7"

From AC Wiki

Latest revision as of 11:58, 2 April 2013

Dedupe 2.7: Group 1 - 245 $a, $b - Title (Verify)

Description

This field contains the title of the book, including main title, subtitle, statement(s) of responsibility, and occasionally other information appearing on the title page.

Every record must have a 245 tag.

Subfields $a and $b are used for this parameter. To use only subfield $a, or to add subfields other than $n, $p and $h, use section 2-13 of the dedupe profile.

Verify Method

There are three verify methods: FULL, PARTIAL, and WITHIN. Each one compares the data found in a specific field against the same field in a potential match record.

Full

Full compares the entire verify string up to the verify length.
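
As a minimal sketch of the Full method, assuming both strings have already been normalized and that "up to the verify length" means only the first verify_length characters are compared. The function name verify_full is hypothetical and is not taken from the dedupe program:

```python
def verify_full(a: str, b: str, verify_length: int) -> bool:
    """Hypothetical helper: compare two normalized strings,
    looking only at the first verify_length characters of each."""
    return a[:verify_length] == b[:verify_length]


# The two titles agree on their first 12 characters and differ at the 13th.
print(verify_full("DANIELBOONEAPIONEER", "DANIELBOONEAHERO", 12))
print(verify_full("DANIELBOONEAPIONEER", "DANIELBOONEAHERO", 13))
```

With a shorter verify length, titles that differ only near the end are treated as matching, so the length setting directly controls how strict Full is.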

Partial

Partial truncates both compare strings to the length of the shorter one, then does a full compare:

 
 Record A:
   The fox and the hound.
 
 Record B:
   The fox.
 
 Both are truncated to The fox and compared.
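
The fox example can be sketched as follows. This is an illustration only: it assumes the strings are already normalized, and verify_partial is a hypothetical name rather than a function of the dedupe program:

```python
def verify_partial(a: str, b: str) -> bool:
    """Hypothetical helper: cut both normalized strings down to the
    length of the shorter one, then compare the results exactly."""
    n = min(len(a), len(b))
    return a[:n] == b[:n]


# "THE FOX AND THE HOUND" is cut to "THE FOX" before the comparison.
print(verify_partial("THE FOX AND THE HOUND", "THE FOX"))
```

Note that in this sketch an empty string would match anything, because both strings are cut to length zero. How the real program treats an empty field is not stated here.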

Within

Within takes each compare string, truncated at the verify length, and searches for it inside the full, untruncated string of the other field:

 
 Record A:
   Cat.
 
 Record B:
   The cat in the hat.
 
 Cat in Record A will verify against The cat in the hat in Record B.
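
A rough sketch of the Within method, assuming normalized input and a plain substring search in both directions. The name verify_within is hypothetical, and the real program may search differently (for example on word boundaries):

```python
def verify_within(a: str, b: str, verify_length: int) -> bool:
    """Hypothetical helper: truncate each string at verify_length and
    look for it anywhere inside the full text of the other string."""
    return a[:verify_length] in b or b[:verify_length] in a


# "CAT" is found inside "THE CAT IN THE HAT".
print(verify_within("CAT", "THE CAT IN THE HAT", 10))
# "DOG" is not found, and "THE CAT IN" is not inside "DOG".
print(verify_within("DOG", "THE CAT IN THE HAT", 10))
```

Because a short string such as "CAT" can occur inside many longer titles, Within is the loosest of the three methods in this sketch.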

Normalization

Normalization determines the form a string takes when it is compared to another string. Normalization never changes anything in the record; it is applied only while the program compares the strings.

Types of normalization are:

  • NACO/CJK - retains spaces and subfield delimiters
  • FULL - all spaces and subfield delimiters removed
 
 original field:
   $a Daniel Boone :$b a pioneer.
 
 normalized (naco/cjk):
   $ DANIEL BOONE $ A PIONEER
 
 normalized (full):
   DANIELBOONEAPIONEER
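
The two results above can be reproduced with the simplified sketch below. It is not the actual NACO rule set, which also covers diacritics and special characters; it only assumes uppercasing, reducing subfield codes to a bare "$", and turning punctuation into spaces. The function names are hypothetical:

```python
import re


def normalize_naco(field: str) -> str:
    """Simplified, hypothetical NACO/CJK-style normalization that
    keeps spaces and subfield delimiters."""
    # Reduce each subfield code such as "$a" to a bare "$" set off by spaces.
    text = re.sub(r"\$[a-z0-9]", " $ ", field)
    # Replace punctuation (anything except word characters, spaces, "$") with a space.
    text = re.sub(r"[^\w\s$]", " ", text)
    # Uppercase and collapse runs of whitespace.
    return " ".join(text.upper().split())


def normalize_full(field: str) -> str:
    """FULL: the same normalization with spaces and delimiters removed."""
    return re.sub(r"[\s$]", "", normalize_naco(field))


field = "$a Daniel Boone :$b a pioneer."
print(normalize_naco(field))  # $ DANIEL BOONE $ A PIONEER
print(normalize_full(field))  # DANIELBOONEAPIONEER
```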

Length

Length is the number of characters from the 245 $a and $b that the program uses for verification with the verify method chosen above. The maximum length is 2048 characters. A length of 10 gives this example:

 
 original heading:
   $a Daniel Boone :$b a pioneer.
 
 normalized (full), length = 10:
   1-------10
   DANIELBOONEAPIONEER
 
 Only the first 10 characters (DANIELBOON) are used.
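
The length setting can be sketched as a simple truncation of the normalized string. The name truncate_for_verify is hypothetical, and rejecting values outside 1 to 2048 is an assumption about how an out-of-range setting would be handled:

```python
MAX_VERIFY_LENGTH = 2048  # maximum length stated for this parameter


def truncate_for_verify(normalized: str, length: int) -> str:
    """Hypothetical helper: keep only the first `length` characters."""
    if not 1 <= length <= MAX_VERIFY_LENGTH:
        # Assumption: out-of-range settings are rejected rather than clamped.
        raise ValueError(f"length must be between 1 and {MAX_VERIFY_LENGTH}")
    return normalized[:length]


print(truncate_for_verify("DANIELBOONEAPIONEER", 10))  # DANIELBOON
```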

Words

This pertains to the number of words to be used in the verification for 245 $a & $b:

 
 original heading:
   $a Daniel Boone :$b a pioneer.
 
 words = 2:
   Daniel, Boone, A, or Pioneer are all possibilities for keyword matching.

NOTE: Non-filers are excluded from Words.

Must Verify

This option requires the given field to be present in both records and to match between them. It is commonly enabled for the 245 title verification, though it can be useful for other fields as well.
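
A small sketch of how Must Verify might change the outcome when a field is missing. The name field_verifies is hypothetical. The behavior with the option switched off (a missing field does not block the match) is an assumption inferred from the description, not something the profile documents:

```python
from typing import Optional


def field_verifies(a: Optional[str], b: Optional[str], must_verify: bool) -> bool:
    """Hypothetical helper: a and b are the normalized verify strings
    of the two records, or None when the field is absent."""
    if a is None or b is None:
        # Assumption: a missing field only fails when Must Verify is on.
        return not must_verify
    return a == b


print(field_verifies("DANIELBOONE", None, must_verify=True))   # False
print(field_verifies("DANIELBOONE", None, must_verify=False))  # True
```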

Default

Group 1 (010, 020, 022)
245 $a, $b main entry must verify (full, naco); all characters are considered
