Difference between revisions of "Dedupe 5.1"

From AC Wiki

Latest revision as of 09:38, 1 April 2013

Dedupe 5.1: Keep Which Record

D5-1.png

Definition

When two records are found to be duplicates, the program decides which record to keep based on the option chosen in this section.

Base

Keeps the first record in a file (or group of files). When a record is read into the deduplication process and does not match any previously read record, it becomes the base record. Any subsequent matching records are merged into that base record.

Latest

This option retains the record with the latest processing date according to the 005 field. If either record lacks an 005, the program compares the dates in the 008 field (positions 0 through 5) instead. If one of the records also lacks an 008, it keeps the record that has an 005. If neither record has an 005, it keeps the record that has an 008. If neither record has an 005 or an 008, it keeps the base record.
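The fallback order described above can be sketched in Python. This is only an illustration of one reading of the rules, not the program's actual code. It assumes a record is a list of (tag, value) pairs, that ties keep the base record, and that the six-digit 008 date (YYMMDD) can be resolved with Python's default two-digit-year pivot. The names first_value and keep_latest are hypothetical.

```python
from datetime import datetime


def first_value(record, tag):
    """Return the value of the first field with this tag, or None (hypothetical helper)."""
    for field_tag, value in record:
        if field_tag == tag:
            return value
    return None


def keep_latest(base, incoming):
    """Pick the record to keep under the 'Latest' option (illustrative sketch)."""
    b005, i005 = first_value(base, "005"), first_value(incoming, "005")
    b008, i008 = first_value(base, "008"), first_value(incoming, "008")

    if b005 and i005:
        # 005 values look like yyyymmddhhmmss.f, so plain string
        # comparison orders them chronologically.
        return incoming if i005 > b005 else base  # assumption: ties keep base

    if b008 and i008:
        # 008 positions 0 through 5 hold a YYMMDD date. strptime's %y maps
        # 69-99 to 1969-1999 and 00-68 to 2000-2068 (an assumption here).
        b_date = datetime.strptime(b008[:6], "%y%m%d")
        i_date = datetime.strptime(i008[:6], "%y%m%d")
        return incoming if i_date > b_date else base

    if b005 or i005:
        # At least one 008 is missing: keep the record that has an 005.
        return base if b005 else incoming

    if b008 or i008:
        # Neither record has an 005: keep the record that has an 008.
        return base if b008 else incoming

    return base  # neither record has an 005 or an 008
```

A malformed date in the 008 would raise ValueError in this sketch; how the real program handles that case is not documented here.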

Incoming

This option is the opposite of the Base option: the incoming (later) record is kept instead of the base record. If a database of 1,000 records is sent through the deduplication process and record 1 and record 200 are duplicates, record 200 will be retained regardless of which record might be better.
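The difference between Base and Incoming can be shown with a small Python sketch. It assumes duplicates are detected by a single match key (real match criteria are configured elsewhere and may be more involved), and it only selects a record without merging any data. The function dedupe and its parameters are hypothetical.

```python
def dedupe(records, match_key, keep="base"):
    """Keep one record per match key (illustrative sketch, no merging).

    keep="base" retains the first record read for each key;
    keep="incoming" retains the most recently read one.
    """
    kept = {}
    for record in records:
        key = match_key(record)
        if key not in kept or keep == "incoming":
            kept[key] = record
    return list(kept.values())


# Records 1 and 200 share a match key, as in the example above.
records = [
    {"id": 1, "isbn": "A"},
    {"id": 2, "isbn": "B"},
    {"id": 200, "isbn": "A"},
]
base_ids = [r["id"] for r in dedupe(records, lambda r: r["isbn"], keep="base")]
incoming_ids = [r["id"] for r in dedupe(records, lambda r: r["isbn"], keep="incoming")]
```

With keep="base" record 1 survives; with keep="incoming" record 200 replaces it.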

Most tags

Compares the original record with the potential match. The record kept is the one with the highest count of a given field, which is not necessarily the largest record.

Use this option to retain the record with the most occurrences of a specific field, such as 650, or of a field range, such as 5XX.
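A minimal Python sketch of the field count comparison follows. It assumes records are lists of (tag, value) pairs, that "X" in a pattern such as 5XX stands for any single character, and that ties keep the original record. All function names are hypothetical.

```python
def tag_matches(tag, pattern):
    """True if tag fits the pattern, where 'X' matches any character (assumption)."""
    return len(tag) == len(pattern) and all(
        p in ("X", "x") or p == t for p, t in zip(pattern, tag)
    )


def count_tags(record, pattern):
    """Count the fields in a record whose tag fits the pattern."""
    return sum(1 for tag, _ in record if tag_matches(tag, pattern))


def keep_most_tags(base, incoming, pattern):
    """Keep the record with more matching fields; ties keep base (assumption)."""
    if count_tags(incoming, pattern) > count_tags(base, pattern):
        return incoming
    return base
```

For example, comparing on "650" favors a record with two subject headings over a record with one, even if the latter carries a long 5XX note.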

Largest

Compares the original record with the potential match. The record kept is the one that contains the most data, which is not necessarily the one with the most fields. For example:

 Record A has a lot of data in a 5XX field, but has no 6XX fields
 Record B has limited data in the 5XX field along with 6XX fields

Record A will be retained if it has more data in total, measured in characters/bytes.

Ignore fields

Within the Largest option, one or more fields can be excluded from the size calculation. In the example above, if 5XX were ignored, Record B would be retained, unless the data in other fields made Record A larger overall. This allows the 9XX and holdings fields to be ignored.
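The size comparison with ignored fields can be sketched as follows. This is an assumption-laden illustration: records are lists of (tag, value) pairs, size is the character count of the field values (the program may count bytes), "X" in a pattern matches any character, and ties keep the original record. The function names are hypothetical.

```python
def tag_matches(tag, pattern):
    """True if tag fits the pattern, where 'X' matches any character (assumption)."""
    return len(tag) == len(pattern) and all(
        p in ("X", "x") or p == t for p, t in zip(pattern, tag)
    )


def record_size(record, ignore=()):
    """Total characters of field data, skipping tags that fit an ignored pattern."""
    return sum(
        len(value)
        for tag, value in record
        if not any(tag_matches(tag, p) for p in ignore)
    )


def keep_largest(base, incoming, ignore=()):
    """Keep the record with more data; ties keep base (assumption)."""
    if record_size(incoming, ignore) > record_size(base, ignore):
        return incoming
    return base


# Record A: a long 5XX note and no 6XX fields.
record_a = [("245", "Title"), ("520", "x" * 400)]
# Record B: a short 5XX note plus 6XX fields.
record_b = [("245", "Title"), ("520", "short"), ("650", "Topic one"), ("650", "Topic two")]
```

Here keep_largest(record_a, record_b) selects Record A, while keep_largest(record_a, record_b, ignore=("5XX",)) selects Record B. Passing ignore=("9XX",) would mirror the default described below.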

Defaults

  • Keep Largest Record and ignore the 9XX fields in determining largest record
