Difference between revisions of "Dedupe 1.1"


Revision as of 14:41, 21 March 2013

Dedupe 1.1: Records Uploaded to Backstage


Bibliographic records

Bibliographic records come in many formats and encodings: MARC-8, UTF-8, MARCXML, etc. Files of these types may contain similar information, yet that information is broken up into different fields or elements depending on the format.

While BSLW processes records natively in MARC format, we have many tools available to convert to or from any of the other formats listed above.

A library can process all or part of the bibliographic records from its local ILS, or process bibliographic records purchased from a different cataloging source.

MARC-8

MARC-8 is the character encoding that has been used in MARC records since 1968, and it remains a standard encoding for MARC-21 records. Nearly every system that can export records in MARC format can do so in MARC-8.

The MARC-21 format has built-in size limits: no record can exceed 99,999 bytes, and no field can exceed 9,999 bytes. The limits exist because the leader stores the record length in five digits and each directory entry stores the field length in four. If a record exceeds the field or record size limits, there may be truncation or loss of data.

During the deduplication process, Backstage will notify customers in the event that a record cannot be processed due to field or record length.
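The size limits described above can be checked before a file is sent. The following is a minimal sketch, not Backstage's own validation: the function name `find_length_problems` and the `(tag, value)` input shape are assumptions made for illustration, and the lengths are counted in bytes of the already encoded field data.

```python
MAX_RECORD_BYTES = 99_999  # the leader stores the record length in five digits
MAX_FIELD_BYTES = 9_999    # each directory entry stores the field length in four digits

LEADER_BYTES = 24
DIRECTORY_ENTRY_BYTES = 12  # tag (3) + field length (4) + start position (5)


def find_length_problems(fields):
    """Return a list of problems for a record given as (tag, value_bytes) pairs.

    `fields` is a hypothetical input shape: each value holds the encoded
    indicators, subfield codes, and data of one field, without its terminator.
    """
    problems = []
    data_bytes = 0
    for tag, value in fields:
        field_bytes = len(value) + 1  # the field terminator counts toward the length
        if field_bytes > MAX_FIELD_BYTES:
            problems.append(
                f"field {tag}: {field_bytes} bytes exceeds {MAX_FIELD_BYTES}"
            )
        data_bytes += field_bytes

    # Leader + directory (entries plus terminator) + field data + record terminator.
    directory_bytes = DIRECTORY_ENTRY_BYTES * len(fields) + 1
    record_bytes = LEADER_BYTES + directory_bytes + data_bytes + 1
    if record_bytes > MAX_RECORD_BYTES:
        problems.append(
            f"record: {record_bytes} bytes exceeds {MAX_RECORD_BYTES}"
        )
    return problems
```

An empty list means the record fits within both limits. A long summary note or a record with a very large number of holdings fields is the typical way to hit one of them.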

UTF-8 Format

UTF-8 has been in use since early 1993. The main difference between MARC-8 and UTF-8 is that UTF-8 can represent a far larger set of characters within the records.

Since UTF-8 can represent many more characters than MARC-8, the files tend to be larger in size. Each character in UTF-8 takes between 1 and 4 bytes, whereas most MARC-8 characters take a single byte (East Asian characters in MARC-8 take three).

If your system uses UTF-8, please also let us know whether the characters are in precomposed or decomposed format. A precomposed character stores the letter and its diacritic as a single character (e.g., n and ~ are combined to form ñ). Decomposed format stores them separately: the base letter followed by a combining diacritic.
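As an illustration of the byte lengths and of the precomposed and decomposed forms, here is a small sketch using Python's standard `unicodedata` module. It assumes that "precomposed" corresponds to Unicode normalization form NFC and "decomposed" to NFD; the helper names `utf8_len` and `detect_form` are made up for this example.

```python
import unicodedata

PRECOMPOSED_N_TILDE = "\u00f1"  # ñ stored as one character
DECOMPOSED_N_TILDE = "n\u0303"  # n followed by a combining tilde


def utf8_len(text):
    """Number of bytes the text occupies once encoded as UTF-8."""
    return len(text.encode("utf-8"))


def detect_form(text):
    """Classify text as 'precomposed', 'decomposed', 'either', or 'mixed'."""
    is_nfc = text == unicodedata.normalize("NFC", text)
    is_nfd = text == unicodedata.normalize("NFD", text)
    if is_nfc and is_nfd:
        return "either"  # no diacritics, so both forms are identical
    if is_nfc:
        return "precomposed"
    if is_nfd:
        return "decomposed"
    return "mixed"


print(utf8_len("n"), utf8_len(PRECOMPOSED_N_TILDE), utf8_len(DECOMPOSED_N_TILDE))
print(detect_form("Espa\u00f1a"), detect_form("Espan\u0303a"))
```

The two forms display identically but differ in length and in byte content, which is why it matters for matching records that both sides agree on one of them.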

MARCXML

MARCXML was developed by the Library of Congress and is based on the MARC-21 format. The main advantage of MARCXML is that it places no limit on either the field or the record size of the data. Binary MARC files, whether encoded in MARC-8 or UTF-8, are constrained by the field and record limits; MARCXML avoids them.
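To show what this looks like in practice, the sketch below builds a small MARCXML record with Python's standard `xml.etree.ElementTree`, including a note far longer than a binary MARC field could hold. The element names and namespace are those of the Library of Congress MARCXML schema; the helper `make_record`, the sample leader, and the field values are illustrative assumptions.

```python
import xml.etree.ElementTree as ET

MARCXML_NS = "http://www.loc.gov/MARC21/slim"
ET.register_namespace("marc", MARCXML_NS)


def q(name):
    """Qualify an element name with the MARCXML namespace."""
    return f"{{{MARCXML_NS}}}{name}"


def make_record(leader, datafields):
    """Build a <record> element from a leader string and datafield tuples.

    Each datafield is (tag, ind1, ind2, [(subfield_code, value), ...]).
    """
    record = ET.Element(q("record"))
    ET.SubElement(record, q("leader")).text = leader
    for tag, ind1, ind2, subfields in datafields:
        field = ET.SubElement(
            record, q("datafield"), {"tag": tag, "ind1": ind1, "ind2": ind2}
        )
        for code, value in subfields:
            ET.SubElement(field, q("subfield"), {"code": code}).text = value
    return record


SAMPLE_LEADER = "00000nam a2200000 a 4500"  # placeholder lengths, 24 characters
long_note = "x" * 20_000  # twice the 9,999-byte field limit of binary MARC

record = make_record(
    SAMPLE_LEADER,
    [
        ("245", "1", "0", [("a", "A short title")]),
        ("520", " ", " ", [("a", long_note)]),
    ],
)
xml_bytes = ET.tostring(record, encoding="utf-8")
```

The long 520 field serializes without complaint because the XML form carries no length digits; the same field could not be written to a binary MARC file.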

File handling

At Backstage, we enjoy providing our customers with options. Each part of our profile is geared to be as customizable as possible, providing you with a few different options to get you started. Any part of the profile may then be expanded so that the results match your expectations.

Our preferred method of file-handling is through our website portal. Each customer will have their own login and password to access the site. Once logged in, our customers can view or edit their profile at any time, upload new files for processing, or retrieve files at their convenience.

We also recognize that our customers may already have upload and download scripts written on their side. In that case, it may make more sense to use traditional FTP to transfer files between Backstage and your system.
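For customers who script their transfers, a transfer of this kind might look like the sketch below, which uses Python's standard `ftplib`. The host name, credentials, and file names in the usage comment are placeholders, not real Backstage values; the actual connection details come from your Backstage profile.

```python
from ftplib import FTP


def upload_file(ftp: FTP, local_path: str, remote_name: str) -> str:
    """Send one file in binary mode over an already logged-in connection."""
    with open(local_path, "rb") as handle:
        return ftp.storbinary(f"STOR {remote_name}", handle)


def download_file(ftp: FTP, remote_name: str, local_path: str) -> str:
    """Fetch one file in binary mode and write it to local_path."""
    with open(local_path, "wb") as handle:
        return ftp.retrbinary(f"RETR {remote_name}", handle.write)


# Usage (hypothetical host, login, and file names):
#
# with FTP("ftp.example.org") as ftp:
#     ftp.login("username", "password")
#     upload_file(ftp, "bibs.mrc", "bibs.mrc")
#     download_file(ftp, "bibs_deduped.mrc", "bibs_deduped.mrc")
```

Binary mode matters here: MARC files contain control characters as field and record terminators, and a text-mode transfer can alter them.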

This first step is also a good time to discuss the format of your MARC records. Here you can let us know which encoding your submitted files will use: MARC-8 or UTF-8. If you do not know, chances are excellent that we can tell you once you upload your file to us.
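If you would like to check the encoding yourself, a binary MARC-21 record declares it in position 09 of the leader: a blank means MARC-8 and the letter a means Unicode (UTF-8). The sketch below reads that flag; it is one possible check, not a description of how Backstage identifies files, and the function names are made up for this example. Because the flag can be wrong, the second helper tests whether the bytes are at least valid UTF-8.

```python
LEADER_BYTES = 24
CODING_SCHEME_POSITION = 9  # Leader/09, character coding scheme


def declared_encoding(record: bytes) -> str:
    """Return the encoding a binary MARC-21 record declares in Leader/09."""
    if len(record) < LEADER_BYTES:
        raise ValueError("a MARC leader is 24 bytes long")
    flag = record[CODING_SCHEME_POSITION:CODING_SCHEME_POSITION + 1]
    if flag == b"a":
        return "UTF-8"
    if flag == b" ":
        return "MARC-8"
    return "unknown"


def is_valid_utf8(data: bytes) -> bool:
    """True when the bytes decode as UTF-8 (plain ASCII also passes)."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True
```

A record that declares UTF-8 but fails `is_valid_utf8` is worth mentioning when you upload, since it usually means the flag and the data disagree.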
