Statistical Summary

From AC Wiki
Revision as of 08:04, 8 July 2009 by Nate (Talk | contribs)

A MARS 2.0 Statistical Summary is generated for every project that involves processing bibliographic records for authority control. The Statistical Summary includes both high-level and detailed statistical information about the records processed. It also includes the number of times selected actions were taken and the number of headings that met certain criteria.

Five Sections

The statistical information is divided into five sections:

Section 1: Record Overview

A high-level view of the processed files. This section includes the number of bibliographic records by type (books, serials, etc.) and how many records were changed during MARS 2.0 processing.

Section 2: Field Distribution

A statistical analysis of the distribution of fields (by tag) within the bibliographic file. Included are how many records had none, one, or two or more instances of each field, and how many fields changed (by tag). Changes listed in this section correspond to Step 2 of the Planning Guide.

Section 3: Authority Control

Provides match-rate statistics for fields under authority control examined during Step 3 of the Planning Guide.

Section 4: Authority Control Processing

Counts of specific changes made, and conditions found, during Step 3 of the Planning Guide.

Section 5: MARC Update Processing

Counts of specific changes made, and conditions found, during Step 2 of the Planning Guide.

Sections 4 and 5 also serve as a list of the reports available. Those reports marked with an asterisk (following the report number) are available for all MARS 2.0 authority control projects at no additional cost.

Section 1: Record Overview

The first section lists a breakdown of the bibliographic formats processed. Records are assigned to a format according to the values in bytes 6 and 7 of the bibliographic record's leader. The Record Format Table (see below) shows which leader values our system treats as a Book, a Computer File, and so on.

Format                    # of Records  % of File  Changed  % Changed
Books (BK)                         536       98.9      519       95.8
Continuing Resources (CR)            2        0.4        2        0.4
Mixed Materials (MX)                 0        0.0        0        0.0
Music (MU)                           2        0.4        1        0.2
Maps (MP)                            0        0.0        0        0.0
Sound recording (MU)                 0        0.0        0        0.0
Visual Materials (VM)                2        0.4        2        0.4
Computer Files (CF)                  0        0.0        0        0.0
Other                                0        0.0        0        0.0
Totals                             542      100.0      524       96.7

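The # of Records column gives the number of records of each format in your bibliographic file. The next column to the right, % of File, expresses that count as a percentage of all records in the file. The Changed column is the number of records of that format that were changed during processing. In this sample, % Changed is also calculated against the whole file (519 of 542 records is 95.8%).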
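As a rough illustration of how the percentage columns relate to the counts, the sketch below recomputes them from the sample figures. The names `sample` and `percent` are made up for this example, and the rounding to one decimal place is an assumption based on the sample output, not a description of how MARS 2.0 itself calculates the values.

```python
# Record counts and changed counts copied from the sample table
# (formats with zero records are left out for brevity).
sample = {
    "BK": (536, 519),
    "CR": (2, 2),
    "MU": (2, 1),
    "VM": (2, 2),
}


def percent(part: int, whole: int) -> float:
    """Return part as a percentage of whole, rounded to one decimal place."""
    return round(100 * part / whole, 1) if whole else 0.0


total_records = sum(records for records, _ in sample.values())
total_changed = sum(changed for _, changed in sample.values())

# Both percentages are taken against the total number of records in the file.
for code, (records, changed) in sample.items():
    print(code, percent(records, total_records), percent(changed, total_records))
print("Totals", percent(total_records, total_records),
      percent(total_changed, total_records))
```

Dividing both columns by the file total is what reproduces the sample figures. Dividing Changed by the format's own record count would give 96.8 for Books instead of the 95.8 shown.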

Record Format Table

Type  LDR 06            LDR 07
BK    t or a            a or c or d or m
CF    m
MP    e or f
MU    c or d or i or j
CR    a                 b or i or j
VM    g or k or o or r
MX    p
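The lookup described by the Record Format Table can be sketched as a small function. This is only an illustration: `record_format` is a hypothetical name, the code values are copied from the table as printed, and the order in which the rules are tested is an assumption. Leader/06 value `p` is mapped to MX here because MARC 21 defines it as mixed materials.

```python
def record_format(leader: str) -> str:
    """Classify a MARC leader using the Record Format Table (illustrative only)."""
    type_code = leader[6]   # LDR 06: type of record
    level_code = leader[7]  # LDR 07: bibliographic level

    # Language material ("a") is split between CR and BK by LDR 07.
    if type_code == "a" and level_code in {"b", "i", "j"}:
        return "CR"
    if type_code in {"a", "t"} and level_code in {"a", "c", "d", "m"}:
        return "BK"
    if type_code == "m":
        return "CF"
    if type_code in {"e", "f"}:
        return "MP"
    if type_code in {"c", "d", "i", "j"}:
        return "MU"
    if type_code in {"g", "k", "o", "r"}:
        return "VM"
    if type_code == "p":
        return "MX"  # MARC 21 leader/06 "p" is mixed materials
    return "Other"   # anything the table does not cover


print(record_format("00000nam a2200000 a 4500"))
```

Any leader that matches none of the rows falls through to Other, which corresponds to the Other line in the Section 1 table.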

Section 2: Field Distribution

The second section deals entirely with Step 2 of the Planning Guide, Bibliographic Validation. It lists all of the possible tags that could be affected by any changes made during the implementation of Step 2's processing.

At first glance, it may be a little confusing to work out what each of the columns represents. Let's look at a sample from this section:

Tag  0 Fields  1 Field  2+ Fields  Total Fields  Avg Fields  Max Fields  Changed Fields  % Changed
010       269      273          0           273        0.50           1             272       99.6
020        84       11        447          1089        2.01          12               0        0.0
...       ...      ...        ...           ...         ...         ...             ...        ...
100       170      372          0           372        0.69           1             192       51.6

The first tag in this sample is the LCCN (010). The next column, 0 Fields, lets us know that there are 269 bibs with no 010 field. Then there are 273 bibs with exactly one 010 field, and zero bibs with more than one 010 field. So right away we can see that, out of our sample of 542 records, 273 records have one and only one 010 field for processing.

The Max Fields column gives the largest number of times the field appears in any single record. Since we already know that no record has more than one 010 field, Max Fields for the 010 can only be 1. The Avg Fields column is Total Fields divided by the number of records in the file (273 / 542 = 0.50).

The Changed Fields column tells us exactly how many of the 273 LCCN fields were changed in some way during the Step 2 processing. In this case, 272 of the 273 LCCN fields were modified, which gives us a Percent Changed of 99.6%. Many of these particular changes are most likely corrections to prefix spacing or to incorrectly formatted LCCNs (hyphens, etc.).

When we next look at the entry for the ISBN (020), we immediately notice that nearly all of the columns contain data. While 84 records do not have an 020 field, just 11 records have exactly one 020 field, and a substantially larger number (447) have at least two. In fact, looking at Max Fields, we can see that at least one record has 12 ISBN fields. Since Changed Fields is zero, no changes were made to any of the 1,089 total 020 fields.
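To make the relationship between the columns concrete, the sketch below rebuilds the 010 row from a list holding the number of 010 fields in each record. The function name `field_distribution` and its parameters are hypothetical, and the rounding is an assumption inferred from the sample values. It is not the MARS 2.0 implementation.

```python
def field_distribution(counts_per_record, changed_fields):
    """Summarize one tag; counts_per_record holds that tag's field count per record."""
    records = len(counts_per_record)
    total = sum(counts_per_record)
    return {
        "0 Fields": sum(1 for c in counts_per_record if c == 0),
        "1 Field": sum(1 for c in counts_per_record if c == 1),
        "2+ Fields": sum(1 for c in counts_per_record if c >= 2),
        "Total Fields": total,
        # Average is taken over all records, including those without the field.
        "Avg Fields": round(total / records, 2) if records else 0.0,
        "Max Fields": max(counts_per_record, default=0),
        "Changed Fields": changed_fields,
        # Percent changed is relative to the number of fields, not records.
        "% Changed": round(100 * changed_fields / total, 1) if total else 0.0,
    }


# The 010 row of the sample: 269 records with no 010 field, 273 with exactly one.
counts_010 = [0] * 269 + [1] * 273
row_010 = field_distribution(counts_010, changed_fields=272)
print(row_010)
```

Note that the first three columns count records, while Total Fields, Changed Fields, and % Changed count fields. This is why the 020 row can show 1,089 fields in a file of only 542 records.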

Field Distribution

Section 2 of the Statistical Summary is further subdivided according to the type of fields found in the bibliographic records. Here are the sections and some of the more significant fields listed within each section:

  • Control Fields - 0XX
    • 001, 005, 006, 007, 008
    • 010, 020, 022
    • 040, 041, 050, 090
  • Descriptive Fields - 2XX, 3XX
    • 240, 245, 246
    • 260, 300
  • Main & Added Entries - 1XX, 4XX, 7XX, 8XX
    • 100, 110, 111, 130
    • 440, 490
    • 700, 710, 711, 730, 740
    • 780, 785
    • 800, 810, 811, 830
    • 880
  • Subject Access Fields - 6XX
    • 600, 610, 611
    • 630
    • 650, 651
    • 655
  • Notes & Local Fields - 5XX, 9XX
    • 500, 501, 502, 504, 505, 510, 520, 521, 533
    • 910, 949, 987


Again, it is worth noting that even though a particular field may be listed (e.g., a 9XX field) within Section 2 of the Statistical Summary, that does not necessarily mean any changes were made to that field during processing. One of the purposes of Section 2 is to list every possible bibliographic field as well as any actual changes made to that field.
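The grouping of tags into the five subdivisions listed above follows the first digit of the tag, so it can be sketched as a simple lookup. The name `tag_group` is hypothetical, and the "Unknown" fallback for non-numeric tags is an assumption added for this example.

```python
# First digit of the tag -> subdivision of Section 2, as listed above.
GROUPS = {
    "0": "Control Fields",
    "1": "Main & Added Entries",
    "2": "Descriptive Fields",
    "3": "Descriptive Fields",
    "4": "Main & Added Entries",
    "5": "Notes & Local Fields",
    "6": "Subject Access Fields",
    "7": "Main & Added Entries",
    "8": "Main & Added Entries",
    "9": "Notes & Local Fields",
}


def tag_group(tag: str) -> str:
    """Return the Section 2 subdivision for a three-character MARC tag."""
    return GROUPS.get(tag[:1], "Unknown")


for tag in ("010", "245", "650", "880", "949"):
    print(tag, tag_group(tag))
```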