Difference between revisions of "RDA 1.2"

From AC Wiki
(STEP 1.2 : FILE DOWNLOAD AND FINAL FORMAT)
(UTF-8 vs MARC-8 format)
 
(11 intermediate revisions by 4 users not shown)
Line 1: Line 1:
- ==STEP 1.2 : TYPE OF RDA PROCESSING==
+ ==RDA 1.2: Records Delivered by Backstage==
+ [[Image:rda1-1.png]]<br><br>
+ ===UTF-8 vs MARC-8 format===
+ MARC-8 has been the standard character encoding for MARC records since 1968. Nearly every system that can export records in MARC format can do so in MARC-8. The MARC-8 character set uses 8-bit characters. Because this limits the number of characters available, MARC-8 includes methods to extend the displayable characters: spacing characters (which advance the cursor) and non-spacing characters (diacritics, which do not).
- ===records to process===
- The steps taken in this profile can be used to convert all of your bibliographic records from AACR2 (or an older standard) to RDA, upgrade existing RDA bib records (as designated by the 040 $e rda), or create hybrid AACR2/RDA bib records.
+ MARC-8 also uses alternate character sets to work around this limitation. An escape sequence is a special code that indicates which character set is being selected for display: subscripts, superscripts, CJK characters, etc.
- ===convert all bib records to rda===
- If you choose to convert all records to RDA, then every AACR2 record that is processed will be updated to include the RDA updates chosen in the rest of this profile.
+ While these methods allow many additional characters to be used, MARC-8 is still limited and somewhat burdensome. For example, the MARC-21 format has a built-in limitation that no record can exceed 99,999 bytes and no field can exceed 9,999 bytes, and the extra bytes used for diacritics and escape sequences count toward these limits. If a record exceeds the field or record size limits, data may be truncated or lost.
- ===upgrade existing rda records===
- When you choose to have Backstage upgrade your existing RDA bibliographic records, our processes will validate and correct standard information only within records that already contain 040 $e rda.
+ UTF-8 has been in use since early 1993 and is an encoding of the Unicode standard built on 8-bit units. The main difference between MARC-8 and UTF-8 is that UTF-8 allows many more character types to be used within the records. Since UTF-8 can represent many more characters than MARC-8, and characters outside the ASCII range take more than one byte, the files tend to be larger. Each character in UTF-8 is between 1 and 4 bytes long (whereas a basic MARC-8 character is 1 byte).
- ===create hybrid aacr2/rda records===
- The PCC is working on guidelines for creating a hybrid AACR2/RDA bibliographic record. Options within this profile can help you upgrade your existing AACR2 and RDA bib records to be hybrid records as outlined by the PCC standard.
+ If your system uses UTF-8, please also let us know whether the characters are in precomposed or decomposed format. Precomposed format stores a letter and its diacritic as a single combined character (e.g., n and ~ are combined to form ñ). Decomposed format stores them as separate characters: the base letter followed by a combining diacritic.
- ==LINKS==
+ Additionally, to handle the various ways a composite character can be represented, normalization forms have been defined. Normalization Form Decomposed (NFD) and Normalization Form Composed (NFC) are standardized forms for handling composite characters. In NFD, every character that can be decomposed is converted to its most decomposed form following the rules for canonical decomposition. In NFC, the characters are first decomposed as in NFD, then recombined into precomposed (composite) forms following canonical rules. As a result, the sequence of code points for a given character may change into an alternate, equivalent form.
+ ==Default==
+ {| border="0" cellspacing="0" cellpadding="5" align="left" style="border-collapse:collapse;"
+ |- style="background:#CCFFFF; font-size: 110%;"
+ | Files are delivered in UTF-8 format through the website.
+ |}
+ <div style=clear:both></div><br>
+ ==links==
  <center><font size="4">[[RDA_1.1|1.1]] - [[RDA_1.2|1.2]] - [[RDA_1.3|1.3]] - [[RDA_1.4|1.4]] - [[RDA_1.5|1.5]]
  <hr>
- [[RDA_1.0|1.0]] - [[RDA_2.0|2.0]] - [[RDA_3.0|3.0]] - [[RDA_4.0|4.0]] - [[RDA_5.0|5.0]] - [[RDA_6.0|6.0]] - [[RDA_7.0|7.0]]</font></center>
+ [[RDA_1.0|1.0]] - [[RDA_2.0|2.0]] - [[RDA_3.0|3.0]] - [[RDA_4.0|4.0]] - [[RDA_5.0|5.0]] - [[RDA_6.0|6.0]]</font></center>
  [[category:RDA Profile Guide]]

Latest revision as of 16:51, 28 March 2013

RDA 1.2: Records Delivered by Backstage

Rda1-1.png

UTF-8 vs MARC-8 format

MARC-8 has been the standard character encoding for MARC records since 1968. Nearly every system that can export records in MARC format can do so in MARC-8. The MARC-8 character set uses 8-bit characters. Because this limits the number of characters available, MARC-8 includes methods to extend the displayable characters: spacing characters (which advance the cursor) and non-spacing characters (diacritics, which do not).

MARC-8 also uses alternate character sets to work around this limitation. An escape sequence is a special code that indicates which character set is being selected for display: subscripts, superscripts, CJK characters, etc.
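
To make the idea of an escape sequence concrete, here is a minimal Python sketch that scans raw MARC-8 bytes for the two-byte switches to subscripts, superscripts, Greek symbols, and back to the default ASCII set. The function name `find_charset_switches` is hypothetical, and the sketch deliberately ignores the longer escape sequences used for other scripts such as CJK, so it should be read as an illustration rather than a full MARC-8 parser.

```python
ESC = 0x1B  # the escape byte that introduces a character set switch

# Two-byte switches (ESC + one final byte); longer sequences are not handled here.
SWITCHES = {
    ord("b"): "subscripts",
    ord("p"): "superscripts",
    ord("g"): "Greek symbols",
    ord("s"): "ASCII (default)",
}


def find_charset_switches(data: bytes):
    """Return (offset, character set name) for each simple switch found."""
    found = []
    for i in range(len(data) - 1):
        if data[i] == ESC and data[i + 1] in SWITCHES:
            found.append((i, SWITCHES[data[i + 1]]))
    return found


# "H", switch to subscripts, "2", switch back to ASCII, "O"
sample = b"H\x1bb2\x1bsO"
print(find_charset_switches(sample))
```

Each switch costs two extra bytes in the record, which is one reason this approach is described as burdensome.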

While these methods allow many additional characters to be used, MARC-8 is still limited and somewhat burdensome. For example, the MARC-21 format has a built-in limitation that no record can exceed 99,999 bytes and no field can exceed 9,999 bytes, and the extra bytes used for diacritics and escape sequences count toward these limits. If a record exceeds the field or record size limits, data may be truncated or lost.
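
The size limits can be checked before export. The sketch below is a simplified illustration in Python: it assumes the limits are counted in encoded bytes, and it measures only the field text, leaving out the indicators, subfield delimiters, and field terminator that a real MARC field also carries. The names `oversized_fields` and `record_too_long` are hypothetical.

```python
MAX_FIELD_BYTES = 9_999    # field length is stored in 4 digits
MAX_RECORD_BYTES = 99_999  # record length is stored in 5 digits


def oversized_fields(fields, encoding="utf-8"):
    """Return the tags of fields whose encoded text exceeds the field limit.

    `fields` is a list of (tag, text) pairs; this is a simplification of a
    real MARC field, used here only to show the length check.
    """
    return [
        tag
        for tag, text in fields
        if len(text.encode(encoding)) > MAX_FIELD_BYTES
    ]


def record_too_long(raw_record: bytes) -> bool:
    """Return True if the serialized record exceeds the record limit."""
    return len(raw_record) > MAX_RECORD_BYTES


fields = [
    ("245", "Short title"),
    ("520", "\u00f1" * 5000),  # 5,000 characters, but 10,000 bytes in UTF-8
]
print(oversized_fields(fields))
```

The second field shows why counting characters is not enough: text with many diacritics can pass a character count and still exceed the byte limit.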

UTF-8 has been in use since early 1993 and is an encoding of the Unicode standard built on 8-bit units. The main difference between MARC-8 and UTF-8 is that UTF-8 allows many more character types to be used within the records. Since UTF-8 can represent many more characters than MARC-8, and characters outside the ASCII range take more than one byte, the files tend to be larger. Each character in UTF-8 is between 1 and 4 bytes long (whereas a basic MARC-8 character is 1 byte).
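
The variable length of UTF-8 characters is easy to observe. This short Python sketch encodes four sample characters, chosen here only for illustration, and reports how many bytes each one takes.

```python
# One sample character for each possible UTF-8 length.
SAMPLES = [
    "A",           # U+0041, basic Latin letter
    "\u00f1",      # U+00F1, n with tilde
    "\u4e2d",      # U+4E2D, a CJK ideograph
    "\U0001d11e",  # U+1D11E, musical G clef symbol
]


def utf8_length(ch: str) -> int:
    """Return the number of bytes the character occupies in UTF-8."""
    return len(ch.encode("utf-8"))


for ch in SAMPLES:
    print(f"U+{ord(ch):04X} takes {utf8_length(ch)} byte(s)")
```

A file made up mostly of unaccented Latin text stays close to one byte per character; the growth comes from diacritics and non-Latin scripts.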

If your system uses UTF-8, please also let us know whether the characters are in precomposed or decomposed format. Precomposed format stores a letter and its diacritic as a single combined character (e.g., n and ~ are combined to form ñ). Decomposed format stores them as separate characters: the base letter followed by a combining diacritic.
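
If you are unsure which format your system exports, a sample of the data can be inspected. The following Python sketch uses the standard `unicodedata` module to report whether a string contains precomposed characters, combining diacritics, or both. The function name `composition_style` and its four return labels are hypothetical choices made for this illustration.

```python
import unicodedata


def composition_style(text: str) -> str:
    """Classify text as 'precomposed', 'decomposed', 'mixed', or 'plain'."""
    # A combining mark (non-zero combining class) signals decomposed data.
    has_combining = any(unicodedata.combining(ch) for ch in text)

    # A character with a canonical decomposition is a precomposed character.
    # Compatibility decompositions are tagged like "<compat>" and are skipped.
    def is_precomposed(ch: str) -> bool:
        mapping = unicodedata.decomposition(ch)
        return bool(mapping) and not mapping.startswith("<")

    has_precomposed = any(is_precomposed(ch) for ch in text)

    if has_combining and has_precomposed:
        return "mixed"
    if has_combining:
        return "decomposed"
    if has_precomposed:
        return "precomposed"
    return "plain"


print(composition_style("Espa\u00f1a"))   # single code point for the accented n
print(composition_style("Espan\u0303a"))  # n followed by a combining tilde
```

Both sample strings display identically on screen, which is why the format has to be checked at the code point level.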

Additionally, to handle the various ways a composite character can be represented, normalization forms have been defined. Normalization Form Decomposed (NFD) and Normalization Form Composed (NFC) are standardized forms for handling composite characters. In NFD, every character that can be decomposed is converted to its most decomposed form following the rules for canonical decomposition. In NFC, the characters are first decomposed as in NFD, then recombined into precomposed (composite) forms following canonical rules. As a result, the sequence of code points for a given character may change into an alternate, equivalent form.
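
The two normalization forms can be tried out with Python's standard `unicodedata.normalize` function. This is a small sketch with sample characters picked for illustration; it also shows the case where normalization replaces a character with an alternate, equivalent one.

```python
import unicodedata

precomposed = "\u00f1"  # n with tilde as one code point
decomposed = "n\u0303"  # n followed by a combining tilde

# NFD splits the precomposed character into base letter + combining mark.
nfd_result = unicodedata.normalize("NFD", precomposed)

# NFC recombines the pair into the single precomposed character.
nfc_result = unicodedata.normalize("NFC", decomposed)

# The angstrom sign (U+212B) is canonically equivalent to the letter
# A with ring above (U+00C5), so NFC replaces one with the other.
angstrom_nfc = unicodedata.normalize("NFC", "\u212b")

print(nfd_result == decomposed)
print(nfc_result == precomposed)
print(f"U+{ord(angstrom_nfc):04X}")
```

Normalizing both sides to the same form before comparing or matching headings avoids treating visually identical strings as different.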

Default

Files are delivered in UTF-8 format through the website.
