Difference between revisions of "Terminology-C"
Revision as of 09:03, 11 November 2008
Contents
- 1 Terminology… C
- 1.1 Canadiana
- 1.2 Chronological conversion
- 1.3 CJK
- 1.4 Collapsed Report Format
- 1.5 Collocate
- 1.6 Composed vs Decomposed Characters
- 1.7 Compound Heading
- 1.8 Conference name heading
- 1.9 Corporate name heading
- 1.10 CONSER – Cooperative Online Serials
- 1.11 Control Number
- 1.12 Controlled Vocabulary
- 1.13 Cross-references
- 1.14 Current Cataloging
Terminology… C
Canadiana
see NLC Canadiana Files
Chronological conversion
MARS 2.0 Authority Cleanup uses a table to convert chronological headings ($y) to their correct form. Corrections are made to spelling and punctuation as well as to format: (mpg)
Subdivision | Changes to | In Field / Subfield |
$yTwentieth century | $y20th century | LC 6XX fields |
$z20th century | $y20th century | LC 6XX fields |
$y20th centry | $y20th century | LC 6XX fields |
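The table-driven conversion described above can be sketched as a simple lookup. This is a minimal illustration, not the MARS 2.0 implementation: the name `CHRONOLOGICAL_TABLE` and the function `convert_chronological` are hypothetical, and only the three rows shown in the table come from this page.

```python
# Hypothetical lookup table; the three entries are the rows listed above.
CHRONOLOGICAL_TABLE = {
    "$yTwentieth century": "$y20th century",  # format correction
    "$z20th century": "$y20th century",       # subfield code correction
    "$y20th centry": "$y20th century",        # spelling correction
}


def convert_chronological(subdivision: str) -> str:
    """Return the corrected form, or the input unchanged if no row matches."""
    return CHRONOLOGICAL_TABLE.get(subdivision, subdivision)


if __name__ == "__main__":
    for heading in ("$yTwentieth century", "$y20th centry", "$y20th century"):
        print(heading, "->", convert_chronological(heading))
```

A subdivision that is already in its correct form is not in the table and passes through untouched.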
CJK
Chinese-Japanese-Korean bibliographic records. CJK is a collective term for Chinese, Japanese, and Korean, which constitute the main East Asian languages. The term is used in the field of software and communications internationalization. (wp)
Collapsed Report Format
If bib control numbers are not included in your MARS 2.0 Bibliographic Reports, a heading contained in two or more bibliographic records will appear in the report only once. This type of report is in collapsed format. Collapsed-format reports are usually requested by libraries with a local system that offers robust global update tools. (mpg)
Collocate
Collocation is defined as a sequence of words or terms which co-occur more often than would be expected by chance. It refers to the restrictions on how words can be used together. (wp)
Composed vs Decomposed Characters
This entry covers composed and decomposed characters and how they affect translation between the UTF-8 and MARC-8 formats.
At the 2007 ALA Midwinter meeting, the Library of Congress announced the change to UTF-8 as the internal data exchange format for its database. The change resulted from LC’s migration to a Voyager-based server environment. As a Unicode format, UTF-8 holds an advantage over MARC-8 in allowing a broader range of languages and characters.
Backstage Library Works followed LC’s lead in developing MARS 2.0 by making UTF-8 our internal data exchange format. If your ILS and institutional policies allow, we recommend that you use its Unicode capabilities and switch your data to UTF-8 for greater compatibility with the Library of Congress standard.
If your library cannot convert to UTF-8 data exchange, MARS can receive and deliver data in MARC-8. However, translating between UTF-8 and MARC-8 can be problematic because some characters can be represented in more than one way, particularly in a character’s level of composition or decomposition.
Characters with diacritical marks can generally be represented either as a single, composed character or as a decomposed sequence of a base letter plus one or more non-spacing marks. For example, a Spanish ‘ñ’ can be a self-contained, composed character, separate from the 26 letters of the English alphabet, or it can be made up of two decomposed elements, a standard ‘n’ and a non-spacing tilde ‘~’, sharing the same display space.
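The two representations of ‘ñ’ can be inspected with Python's standard `unicodedata` module. This is a sketch under an assumption: it treats the Unicode normalization forms NFC and NFD as stand-ins for "composed" and "decomposed", which this page does not state, and it does not touch MARC-8 at all. The helper names are illustrative only.

```python
import unicodedata

composed = "\u00f1"     # ñ stored as one precomposed code point
decomposed = "n\u0303"  # n followed by U+0303 COMBINING TILDE


def to_decomposed(text: str) -> str:
    """Split precomposed characters into base letter plus combining marks (NFD)."""
    return unicodedata.normalize("NFD", text)


def to_composed(text: str) -> str:
    """Combine base letter plus combining marks into precomposed characters (NFC)."""
    return unicodedata.normalize("NFC", text)


if __name__ == "__main__":
    # The two strings are meant to look alike but are not equal code point for code point.
    print(composed == decomposed)
    print(len(composed), len(decomposed))
    print(to_decomposed(composed) == decomposed)
    print(to_composed(decomposed) == composed)
```

Because the two forms compare as unequal strings, a matching step that does not first bring both sides to the same form can miss headings that look identical on screen.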
In theory, both methods should display the same. But in practice, the appearance of composed and decomposed characters can vary depending upon what rendering engine and fonts are being used on the display end. The Library of Congress uses the decomposed sequence when creating a Unicode character.
In the example above, the ‘ñ’ character contains two elements and must be either entirely composed or completely decomposed; there is no in-between state. In Korean, a character may contain several elements, with multiple possibilities for combining those elements into composed subsets. We sometimes run into problems identifying the right level of composition or decomposition when translating from MARC-8 to UTF-8 for authority matching and back to MARC-8 for delivery to your system.
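The in-between states possible in Korean can be illustrated with a single Hangul syllable. As before, this is only a sketch: it uses Unicode normalization from Python's standard library as an assumed model of composition levels, and it says nothing about how MARS 2.0 or MARC-8 actually handles these characters. The sample syllable and the helper `code_points` are chosen for illustration.

```python
import unicodedata

syllable = "\ud55c"       # 한 as one fully composed syllable
partial = "\ud558\u11ab"  # 하 (composed) followed by a trailing consonant jamo
jamo = unicodedata.normalize("NFD", syllable)  # fully decomposed form


def code_points(text: str) -> list:
    """List the code points in a string in U+XXXX notation."""
    return [f"U+{ord(ch):04X}" for ch in text]


if __name__ == "__main__":
    for label, text in (("composed", syllable), ("partial", partial), ("decomposed", jamo)):
        print(label, code_points(text))
    # All three levels normalize to the same fully composed syllable.
    print(unicodedata.normalize("NFC", partial) == syllable)
    print(unicodedata.normalize("NFC", jamo) == syllable)
```

The same visible syllable is stored as one, two, or three code points here, which is why a converter has to decide at which level to stop.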
Our programmers have completed an enhancement that will allow characters, as they convert, to stop at the correct level of translation for MARC-8 compatibility. You should no longer see blank fields in your authority records that have Korean representations. However, you may run into other diacritic representations with similar problems. Please contact your project manager to bring this to our attention when this occurs.
Compound Heading
see Name/title heading
Conference name heading
Also known as meeting name heading. The conference name heading is used in a name or name/title heading in established heading records that describe a particular meeting or conference associated with works published from it. The tag designation is X11. (m21,ac)
Corporate name heading
Corporate name used in a name, name/title, or extended subject heading in established heading records. In an established heading record, field 110 contains the established form of a corporate name. (m21)
CONSER – Cooperative Online Serials
CONSER is a cooperative online serials cataloging program. CONSER began in the early 1970s as a project to convert manual serial cataloging into machine-readable records and has evolved into an ongoing program to create and maintain high quality bibliographic records for serials. In keeping with its evolution, the name was changed in 1986 from the CONSER (CONversion of SERials) Project to the CONSER (Cooperative ONline SERials) Program. In October 1997, CONSER became a bibliographic component of the Program for Cooperative Cataloging. (PCC web site)
Contiguous pairs of subfields
A subfield string will have more than one subfield constructed in a hierarchical order. The meaning of the string is contingent on the combination of subfields, often referred to as contiguous pairs of subfields. (ac, dictionary)
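One way to picture contiguous pairs is to walk a subfield string and pair each subfield with the one that follows it. The sketch below is illustrative only: the `Subfield` type, the function `contiguous_pairs`, and the sample heading are assumptions, not part of MARS 2.0 or of the MARC format itself.

```python
from typing import List, Tuple

Subfield = Tuple[str, str]  # (subfield code, value)


def contiguous_pairs(subfields: List[Subfield]) -> List[Tuple[Subfield, Subfield]]:
    """Pair each subfield with its immediate successor, preserving order."""
    return list(zip(subfields, subfields[1:]))


# Hypothetical subject heading, used only to show the pairing.
heading = [("a", "United States"), ("x", "History"), ("y", "20th century")]

if __name__ == "__main__":
    for first, second in contiguous_pairs(heading):
        print(f"${first[0]}{first[1]}  +  ${second[0]}{second[1]}")
```

A string of three subfields yields two pairs, and each pair can then be examined as a unit.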
Control Number
Record control number and other coded information used in the processing of MARC authority records. These fields have no indicators or subfield codes. Control number is assigned by the organization creating, using, or distributing the record. The control number is found in the 001 tag of the bibliographic and authority record. The MARC code identifying whose system control number is present in field 001 is contained in field 003 (Control Number Identifier). (m21)
Controlled Vocabulary
Controlled vocabularies (CV) provide a way to organize knowledge for subsequent retrieval. They are used in subject indexing schemes, subject headings, thesauri and taxonomies. Controlled vocabulary schemes mandate the use of predefined, authorized terms that have been preselected by the designer of the vocabulary, in contrast to natural language vocabularies, where there is no restriction on the vocabulary. (wp)
Cross-references
A cross-reference is an instance within a MARC authority record which refers to related or synonymous information elsewhere. The term "cross-reference" in MARC format is designated by the 5XX tag of the authority records and is often called a see also reference. Cross-referencing is used in the MARC record to link to another piece of work that is of related interest. (ac)
Cross-references are other forms of the name (or title) that might appear in the catalog. There are two types of cross-references: see references, which reference forms of the name (or title) that have been deprecated in favor of the authorized form; and see also references, which point to other forms of the name (or title) that are authorized. See also references are most commonly used to point to earlier or later forms of a name (or title). (wp)
Current Cataloging
One of two ongoing automated authority control services offered through the MARS 2.0 software. The Current Cataloging Service provides automated Authority Control on an accelerated schedule (weekly, monthly, quarterly, annually, or some other frequency determined by you) with rapid record turnaround. The Current Cataloging Service provides both Authority Control for the headings in your current cataloging records and the matching authority records. You can tailor MARS 2.0 profiles to support local requirements. The first phase of a Current Cataloging run is standard MARS 2.0 Bibliographic Validation processing. Elements of the MARC21 structure are validated, updated or corrected, as appropriate. Next, MARS 2.0 updates and corrects various heading subdivisions. Then, MARS 2.0 compares each heading against the national authority files specified. (mpg)