Difference between revisions of "RDA 1.2"

Latest revision as of 16:51, 28 March 2013

RDA 1.2: Records Delivered by Backstage

UTF-8 vs MARC-8 format

MARC-8 has been the standard format for MARC-21 records since 1968. Nearly every system that can export records in MARC format can do so in MARC-8 format. The MARC-8 character set uses 8-bit characters. Because 8 bits allow only a limited number of characters, MARC-8 also includes methods to extend the displayable characters: spacing-based characters (for cursor movement) and non-spacing characters (diacritics).

MARC-8 also uses alternate character sets to tackle the diacritic display issue. These are selected with escape sequences, which are special codes that indicate which character set is being switched to for display: subscripts, superscripts, CJK characters, etc.

While these methods allow for many additional characters to be used, they are still limited and somewhat burdensome. For example, built into the MARC-21 format is a limitation that no record can exceed 99,999 characters, and no field can exceed 9,999 characters. If a record exceeds the field or record size limits, there may be truncation or loss of data.
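The size limits above can be checked before export. The following is a minimal sketch, not Backstage's actual validation: the helper names are hypothetical, fields are modeled as simple (tag, text) pairs, and it assumes the limits are counted on the encoded bytes. The record check sums field lengths only and ignores leader and directory overhead, so treat it as a rough guide.

```python
# Limits quoted in the text above; helper names and the field model are hypothetical.
MAX_RECORD_LENGTH = 99_999
MAX_FIELD_LENGTH = 9_999


def oversized_fields(fields, encoding="utf-8"):
    """Return (tag, length) pairs for fields whose encoded length exceeds the field limit."""
    problems = []
    for tag, value in fields:
        length = len(value.encode(encoding))
        if length > MAX_FIELD_LENGTH:
            problems.append((tag, length))
    return problems


def record_too_long(fields, encoding="utf-8"):
    """Rough check: sums encoded field lengths, ignoring leader and directory overhead."""
    total = sum(len(value.encode(encoding)) for _, value in fields)
    return total > MAX_RECORD_LENGTH


fields = [("245", "A short title"), ("520", "x" * 10_000)]
print(oversized_fields(fields))  # [('520', 10000)]
print(record_too_long(fields))   # False
```

A field flagged here is a candidate for truncation or data loss and is worth reviewing before the file is sent.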

UTF-8 has been in use since early 1993, and is a variable-width encoding of Unicode built from 8-bit units. The main difference between MARC-8 and UTF-8 is that UTF-8 allows for more character types to be used within the records. Since UTF-8 can represent many more characters than MARC-8, the files tend to be larger in size. Each character in UTF-8 takes between 1 and 4 bytes (whereas most MARC-8 characters are only 1 byte in length).
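To illustrate the 1 to 4 byte range, here is a small sketch using only Python's built-in string encoding. The sample characters are chosen for illustration and the helper name is hypothetical.

```python
def utf8_length(char):
    """Number of bytes a single character occupies when encoded as UTF-8."""
    return len(char.encode("utf-8"))


samples = [
    "A",           # U+0041 LATIN CAPITAL LETTER A
    "\u00f1",      # U+00F1 LATIN SMALL LETTER N WITH TILDE
    "\u20ac",      # U+20AC EURO SIGN
    "\U0001d11e",  # U+1D11E MUSICAL SYMBOL G CLEF
]

for char in samples:
    print(f"U+{ord(char):04X}: {utf8_length(char)} byte(s)")
# U+0041: 1 byte(s)
# U+00F1: 2 byte(s)
# U+20AC: 3 byte(s)
# U+1D11E: 4 byte(s)
```

Plain ASCII text is the same size in both encodings; the growth comes from accented letters, symbols, and non-Latin scripts.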

If your system uses UTF-8, please also let us know whether the characters are in precomposed or decomposed format. Precomposed characters combine a base letter and its diacritic into a single character (e.g., n and ~ are combined to form ñ). Decomposed format stores the base letter and the diacritic as separate characters.
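If you are unsure which format your system produces, inspecting the code points of an exported string can help. This is a minimal sketch using Python's standard `unicodedata` module; the helper names are hypothetical, and the check is a simple heuristic that only looks for combining marks.

```python
import unicodedata

precomposed = "\u00f1"  # one code point: LATIN SMALL LETTER N WITH TILDE
decomposed = "n\u0303"  # two code points: "n" followed by COMBINING TILDE


def code_points(text):
    """List the code points in a string as U+XXXX labels."""
    return [f"U+{ord(char):04X}" for char in text]


def has_combining_marks(text):
    """Heuristic: True if any character is a combining mark, which suggests decomposed data."""
    return any(unicodedata.combining(char) for char in text)


print(code_points(precomposed))          # ['U+00F1']
print(code_points(decomposed))           # ['U+006E', 'U+0303']
print(precomposed == decomposed)         # False
print(has_combining_marks(precomposed))  # False
print(has_combining_marks(decomposed))   # True
```

Both strings display as the same letter, but they are not equal when compared and differ in length, which is why the format matters when records are matched or loaded.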

Additionally, to handle the various ways a composite character could be displayed, normalization forms have been defined. Normalization Form Decomposed (NFD) and Normalization Form Composed (NFC) are standardized forms for handling composite characters. In NFD, every character that can be decomposed is converted to its most decomposed form following rules for canonical decomposition. In NFC, the characters are first decomposed as in NFD, then composed into precomposed (composite) forms following canonical rules. This may result in the sequence of characters for a given character changing into an alternate, equivalent form.
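The two normalization forms can be tried out with Python's standard `unicodedata.normalize` function. The sketch below is illustrative only and does not describe how Backstage processes records; the wrapper names are hypothetical.

```python
import unicodedata


def to_nfc(text):
    """Decompose, then recompose into precomposed characters where possible."""
    return unicodedata.normalize("NFC", text)


def to_nfd(text):
    """Break every decomposable character into base letter plus combining marks."""
    return unicodedata.normalize("NFD", text)


# Converting between the two forms of the same letter.
print(to_nfd("\u00f1") == "n\u0303")  # True
print(to_nfc("n\u0303") == "\u00f1")  # True

# A character changing into an alternate, equivalent form:
# ANGSTROM SIGN (U+212B) becomes LATIN CAPITAL LETTER A WITH RING ABOVE (U+00C5).
print(f"U+{ord(to_nfc(chr(0x212B))):04X}")  # U+00C5
```

Normalizing both sides to the same form before comparing strings avoids false mismatches between precomposed and decomposed data.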

Default

Files are delivered in UTF-8 format through the website.

links

1.1 - 1.2 - 1.3 - 1.4 - 1.5

1.0 - 2.0 - 3.0 - 4.0 - 5.0 - 6.0

@@ Line 1: / Line 1: @@
-==STEP 1.2 : TYPE OF RDA PROCESSING==
+==RDA 1.2: Records Delivered by Backstage==
-[[File:RDA_1.2.jpg]]
+[[Image:rda1-1.png]]<br><br>
-===records to process===
+===UTF-8 vs MARC-8 format===
-The steps taken in this profile can be used to convert all of your bibliographic
+MARC-8 has been the standard format for MARC-21 records since 1968.  Nearly every system that can export records in MARC format can do so in MARC-8 format. The MARC-8 character set uses 8-bit characters. Due to the limitation of
-records from AACR2 (or an older standard) to RDA, upgrade existing RDA bib
+characters that this allows, MARC-8 also includes methods to extend the
-records (as designated by the 040 $e rda), or create hybrid AACR2/RDA bib
+displayable characters: spacing-based characters (for cursor movement) and
-records.
+non-spacing characters (diacritics).
-===convert all bib records to rda===
+MARC-8 also employs the use of alternate character sets in order to tackle the
-If you choose to convert all records to RDA, then every AACR2 record that is
+diacritic display issue. This is done by using escape sequences, which are special
-processed will be updated to include the RDA updates chosen in the rest of this
+codes to indicate which character set is being selected for display: subscripts,
-profile.
+superscripts, CJK characters, etc.
-===upgrade existing rda records===
+While these methods allow for many additional characters to be used, they are still
-When you choose to have Backstage upgrade your existing RDA bibliographic
+limited and somewhat burdensome.  For example, built into the MARC-21 format is a limitation that no record can exceed 99,999 characters, and no field can exceed 9,999 characters.  If a record exceeds the field or record size limits, there may be truncation or loss of data.
-records, our processes will validate and correct standard information within
-only records that already contain 040 $e rda.
-===create hybrid aacr2/rda records===
+UTF-8 has been in use since early 1993, and is a variable-width encoding of Unicode built from 8-bit units. The main difference between MARC-8 and UTF-8 is that UTF-8 allows for more character types to be used within the records.  Since UTF-8 can represent many more characters than MARC-8, the files tend to be larger in size.  Each character in UTF-8 takes between 1 and 4 bytes (whereas most MARC-8 characters are only 1 byte in length).
-The PCC is working on guidelines for creating a hybrid AACR2/RDA bibliographic
-record. Options within this profile can help you upgrade your existing AACR2
-and RDA bib records to be hybrid records as outlined by the PCC standard.
-==LINKS==
+If your system uses UTF-8, please also let us know whether the characters are in precomposed or decomposed format.  Precomposed characters use combined diacritics (e.g., n & ~ are combined to form: ñ).  Decomposed format separates the characters.
+Additionally, to handle the various ways a composite character could be displayed,
+normalization forms have been defined.  Normalization Form Decomposed (NFD) and Normalization Form Composed
+(NFC) are standardized forms for handling composite characters.  In NFD, every character that can be decomposed is converted to its most
+decomposed form following rules for canonical decomposition.  In NFC, the characters are first decomposed as in NFD, then composed into
+precomposed (composite) forms following canonical rules. This may result in
+the sequence of characters for a given character changing into an alternate,
+equivalent form.
+==Default==
+{| border="0" cellspacing="0" cellpadding="5" align="left" style="border-collapse:collapse;"
+|- style="background:#CCFFFF; font-size: 110%;"
+| Files are delivered in UTF-8 format through the website.
+|}
+<div style=clear:both></div><br>
+==links==
 <center><font size="4">[[RDA_1.1|1.1]] - [[RDA_1.2|1.2]] - [[RDA_1.3|1.3]] - [[RDA_1.4|1.4]] - [[RDA_1.5|1.5]]
 <hr>
 [[RDA_1.0|1.0]] - [[RDA_2.0|2.0]] - [[RDA_3.0|3.0]] - [[RDA_4.0|4.0]] - [[RDA_5.0|5.0]] - [[RDA_6.0|6.0]]</font></center>
 [[category:RDA Profile Guide]]
