Latest revision as of 08:38, 1 April 2013
Dedupe 5.1: Keep Which Record
Definition
When comparing two records, the program decides which record to keep based on the options chosen in this section.
Base
Keeps the first record in a file (or group of files). Any subsequent matching records will be merged into the original record. When a record is read into the deduping process, if it does not match any previously read records, it becomes the base record into which all future matches will be merged.
Latest
This option retains the record with the latest processing date according to the 005 field. If one of the records does not have an 005, the program compares the date in the 008 field (positions 0 through 5) instead. If one of the records does not have an 008, it keeps the record with an 005. If neither record has an 005, it keeps the record with an 008. If neither record has an 005 or an 008, it keeps the base record.
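The fallback order for Latest can be sketched in Python. This is only one reading of the rules above, not the program's actual code: the record layout (a dict mapping each tag to a list of field strings) and the names first, entered_date and keep_latest are hypothetical, and ties are assumed to go to the base record.

```python
from datetime import datetime


def first(record, tag):
    """Return the first occurrence of a field, or None if the tag is absent."""
    values = record.get(tag, [])
    return values[0] if values else None


def entered_date(record):
    """Parse positions 0 through 5 of the 008 (yymmdd); None if missing or invalid."""
    value = first(record, "008")
    if value is None:
        return None
    try:
        return datetime.strptime(value[:6], "%y%m%d")
    except ValueError:
        return None


def keep_latest(base, incoming):
    """Pick the record to retain under the Latest option (assumed interpretation)."""
    b005, i005 = first(base, "005"), first(incoming, "005")
    if b005 and i005:
        # 005 values (yyyymmddhhmmss.f) sort chronologically as plain strings.
        return incoming if i005 > b005 else base
    b008, i008 = entered_date(base), entered_date(incoming)
    if b008 and i008:
        # An 005 is missing on at least one side, so compare the 008 dates.
        return incoming if i008 > b008 else base
    if b005 or i005:
        # An 008 is missing on at least one side: keep the record with an 005.
        return base if b005 else incoming
    if b008 or i008:
        # Neither record has an 005: keep the record with an 008.
        return base if b008 else incoming
    # Neither record has an 005 or an 008: keep the base record.
    return base
```

Where the description is ambiguous (for example, when neither record has an 005 but both have an 008), the sketch compares the 008 dates; that choice is an assumption.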
Incoming
This option is the opposite of the Base option: the incoming record is kept instead of the base record. If you have a database of 1,000 records being sent through the deduplication process and record 1 and record 200 are duplicates, then record 200 will be retained regardless of which record might be better.
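The contrast between Base and Incoming can be illustrated with a small Python sketch. It models only which record is retained, not the merging of matching records, and the dedupe function, its keep parameter and the match_key callback are hypothetical names chosen for illustration.

```python
def dedupe(records, match_key, keep="base"):
    """Return one record per match key, keeping the first ("base") or the last ("incoming")."""
    kept = {}
    for record in records:
        key = match_key(record)
        # Base: only the first record seen for a key is stored.
        # Incoming: every later match replaces the stored record.
        if key not in kept or keep == "incoming":
            kept[key] = record
    return list(kept.values())


records = [
    {"id": 1, "title": "Moby Dick"},
    {"id": 2, "title": "Walden"},
    {"id": 200, "title": "Moby Dick"},  # duplicate of record 1
]


def by_title(record):
    return record["title"]


print([r["id"] for r in dedupe(records, by_title, keep="base")])      # [1, 2]
print([r["id"] for r in dedupe(records, by_title, keep="incoming")])  # [200, 2]
```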
Most tags
Compares the original record with the potential match. The record kept is the one with the highest count of a given field, which is not necessarily the largest record.
Use this option to retain the record with the most 650 fields or the most 5XX fields, for example.
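A minimal Python sketch of the Most tags comparison follows. It assumes records are dicts mapping each tag to a list of field strings, that a pattern such as 5XX treats X as a wildcard digit, and that ties go to the original record; the names tag_matches, count_fields and keep_most_tags are hypothetical.

```python
def tag_matches(tag, pattern):
    """True if a tag such as "520" fits a pattern such as "5XX" or "650"."""
    return len(tag) == len(pattern) and all(
        p in "Xx" or p == t for t, p in zip(tag, pattern)
    )


def count_fields(record, pattern):
    """Count every field occurrence whose tag fits the pattern."""
    return sum(
        len(values) for tag, values in record.items() if tag_matches(tag, pattern)
    )


def keep_most_tags(base, incoming, pattern):
    """Keep the record with more matching fields; the base record wins ties."""
    if count_fields(incoming, pattern) > count_fields(base, pattern):
        return incoming
    return base
```

For instance, with pattern "650" a record carrying two 650 fields would be kept over a record carrying one, even if the second record holds more data overall.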
Largest
Compares the original record with the potential match. The record kept is the one that has the most data, which is not necessarily the one with the most fields. For example:
- Record A has a lot of data in a 5XX field, but has no 6XX fields.
- Record B has limited data in the 5XX field along with 6XX fields.
Record A will be retained if it has more data in total number of characters/bytes.
Ignore fields
Within the Largest option, one or more fields can be excluded from the size calculation. In the example above, if 5XX were ignored, Record B would be retained, unless another field changes the outcome. This makes it possible to ignore the 9XX and holdings fields.
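The Largest comparison and its ignore list can be sketched in Python using the Record A and Record B example. The sketch assumes records are dicts mapping each tag to a list of field strings, measures size in characters with len() (a byte count could differ for non-ASCII data), and gives ties to the original record. The names tag_matches, record_size and keep_largest are hypothetical.

```python
def tag_matches(tag, pattern):
    """True if a tag such as "949" fits a pattern such as "9XX"."""
    return len(tag) == len(pattern) and all(
        p in "Xx" or p == t for t, p in zip(tag, pattern)
    )


def record_size(record, ignore=()):
    """Total characters of field data, skipping tags that fit an ignored pattern."""
    return sum(
        len(value)
        for tag, values in record.items()
        if not any(tag_matches(tag, pattern) for pattern in ignore)
        for value in values
    )


def keep_largest(base, incoming, ignore=("9XX",)):
    """Keep the record with more data; the base record wins ties."""
    if record_size(incoming, ignore) > record_size(base, ignore):
        return incoming
    return base


# Record A: a lot of data in a 5XX field, no 6XX fields.
record_a = {"245": ["Title"], "520": ["x" * 200]}
# Record B: limited 5XX data along with 6XX fields.
record_b = {"245": ["Title"], "500": ["short note"], "650": ["Topic one", "Topic two"]}

print(keep_largest(record_a, record_b, ignore=()) is record_a)        # True
print(keep_largest(record_a, record_b, ignore=("5XX",)) is record_b)  # True
```

The default ignore=("9XX",) mirrors the default setting on this page, so that local or holdings data in 9XX fields does not make a record count as larger.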
Defaults
- Keep the largest record, ignoring the 9XX fields when determining which record is largest.