Database Licensing


Most Recent File Format Changes

On July 14, 2004, ProQuest made minor technical changes to files distributed in DCIMARC and SGML/ODF formats. This improvement allowed us to include valuable data elements provided by Micromedia ProQuest.

Some records in our DCIMARC format began to include new fields:

Tag 205--Author Role Indicator(s)
Tag 210--Corporate Author(s)
Tag 215--Corporate Author Role Indicator(s)
Tag 727--CanCorp Number(s)

NOTE: Dissertation Abstracts files were not affected by these changes.

Records in SGML/ODF format were restructured, and began to include new tags:

<SPECFS>...</SPECFS>: a Special Features "container", surrounding one or more <SPECF>...</SPECF> tag pairs, each of which contains a term indicating the presence of some Special Feature (such as "Graphs") in the original document.
<DTY>Company CanCorp</DTY>: a new Index Term Type appearing under the <ITY>Company Names</ITY> section heading.
<AUNAME>...</AUNAME>: Personal Author Name.
<AURL>...</AURL>: Personal Author Role Indicator.
<CORPAUS>...</CORPAUS>: Corporate Author section.
<AUCORP>...</AUCORP>: (individual) Corporate Author information.
<AUCORPNAME>...</AUCORPNAME>: Corporate Author Name.
<AUCORPRL>...</AUCORPRL>: Corporate Author Role Indicator.

In addition, <AU>...</AU> tag pairs no longer directly contained author names--instead, they began to enclose the new <AUNAME>...</AUNAME> and <AURL>...</AURL> tag pairs.
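
Systems that must handle records produced both before and after this change could branch on the presence of <AUNAME> inside each <AU> pair. The following Python sketch illustrates one way to do so. The two sample fragments, the role value "Editor", and the function name extract_authors are hypothetical and are not taken from ProQuest files or documentation; real records may differ in whitespace and layout, and the sketch assumes at most one name and one role per <AU> pair.

```python
import re

# Hypothetical fragments for illustration only; not copied from ProQuest data.
OLD_STYLE = "<AU>Smith, Jane</AU>"
NEW_STYLE = "<AU><AUNAME>Smith, Jane</AUNAME><AURL>Editor</AURL></AU>"

# "<AU>" includes the closing bracket, so it cannot match <AUNAME> or <AUCORP>.
AU_RE = re.compile(r"<AU>(.*?)</AU>", re.DOTALL)
AUNAME_RE = re.compile(r"<AUNAME>(.*?)</AUNAME>", re.DOTALL)
AURL_RE = re.compile(r"<AURL>(.*?)</AURL>", re.DOTALL)


def extract_authors(record_text):
    """Return a list of (name, role) tuples; role is None when absent."""
    authors = []
    for match in AU_RE.finditer(record_text):
        body = match.group(1)
        name = AUNAME_RE.search(body)
        if name is None:
            # Old structure: the author name sits directly inside <AU>.
            authors.append((body.strip(), None))
        else:
            # New structure: name and optional role are in their own tag pairs.
            role = AURL_RE.search(body)
            authors.append(
                (name.group(1).strip(), role.group(1).strip() if role else None)
            )
    return authors
```

A regular expression is used here only to keep the sketch short; a production system would more likely extend whatever SGML parser it already uses.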

Links to file samples appear below.  Prior to the change, customers were asked to examine and test the file(s) corresponding to the format(s) of the data they received, and to contact us regarding their state of readiness to process and to display such records. Both files contain 153 records, and include several occurrences of the new or changed fields listed above.

Other Sample Files and Documentation

ISO-8859-1 Character Support

On October 19, 2003, ProQuest made a technical change to files distributed in DCIMARC and SGML/ODF formats.  This improvement made possible a more faithful representation of text appearing entirely in French or Spanish, as well as those terms/phrases (e.g., "El Niño") that sometimes appear in documents consisting primarily of English text.

NOTE: Dissertation Abstracts files were not affected by this change.

Prior to this change, data in ProQuest's Database Licensing files included only members of the US-ASCII character repertoire.  We now distribute data containing members of an expanded character repertoire: ISO-8859-1.  This expanded repertoire--a superset of US-ASCII--allows our data to include diacritics that regularly appear in French, Spanish, German, and other languages of European origin that are written in Latin script.  We are also able to include a few other special characters, such as symbols for the pound ("£") or the yen ("¥").

At present, our data includes the following characters, each encoded as a single 8-bit value (the rows below cover hex 20 through 7E and hex A0 through FF). Some characters below (such as the "soft hyphen"--hex AD) might not be properly displayed or printed by some browsers:

 !"#$%&'()*+,-./       [ the character in the first column is an ordinary space (hex 20) ]

0123456789:;<=>?

@ABCDEFGHIJKLMNO

PQRSTUVWXYZ[\]^_

`abcdefghijklmno

pqrstuvwxyz{|}~

 ¡¢£¤¥¦§¨©ª«¬­®¯        [ the character in the first column is a non-breaking space (hex A0) ]

°±²³´µ¶·¸¹º»¼½¾¿

ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏ

ÐÑÒÓÔÕÖרÙÚÛÜÝÞß

àáâãäåæçèéêëìíîï

ðñòóôõö÷øùúûüýþÿ

Links to small handcrafted file samples appear below.  Prospective customers should examine and test the file(s) corresponding to their preferred data format, in order to determine their readiness to process and to display such records.  Both of the samples contain at least one occurrence of every one of the "new" characters resulting from our support of ISO-8859-1.  In the samples, these characters appear in the fields most likely to be populated by diacritics and special characters.  In general, the characters have been placed at the beginning of fields.  The resulting records don't make a great deal of sense in terms of their information content, but they do convey--in a concentrated form--the essence of what is new to our data.
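
When testing a sample file, it can help to list which byte values it actually contains. The Python sketch below is one possible check, not a ProQuest tool: it reports bytes that fall outside the two ranges shown in the table above (hex 20-7E and hex A0-FF) and lists the distinct "new" characters found. The function names are hypothetical, and the default set of permitted extra bytes (line feed and carriage return) is an assumption; formats that use other control bytes as delimiters would need those values added.

```python
def unexpected_bytes(data, extra_allowed=(0x0A, 0x0D)):
    """Return sorted byte values outside hex 20-7E and A0-FF.

    extra_allowed defaults to LF and CR; this default is an assumption,
    not something stated in the ProQuest documentation.
    """
    allowed = set(range(0x20, 0x7F)) | set(range(0xA0, 0x100)) | set(extra_allowed)
    return sorted(set(data) - allowed)


def high_characters(data):
    """Return the distinct characters in hex A0-FF found in data,
    decoded as ISO-8859-1 and sorted by code point."""
    return sorted({bytes([b]).decode("iso-8859-1") for b in data if b >= 0xA0})
```

For example, passing the result of open(path, "rb").read() to both functions shows whether a file stays within the documented repertoire and which diacritics or symbols a display routine will need to handle.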

At the moment there is no definite plan or timetable for ProQuest support of multiple-octet character encoding such as UTF-8 or ISO/IEC 10646 (related to the Unicode standard). For the time being, characters outside of the ISO-8859-1 repertoire are conveyed in ProQuest data by means of HTML entities or numeric character references.

Correction Record Processing

On July 30, 2001, ProQuest resumed the distribution of correction records.  These corrections are of two basic kinds.

Some corrections have the effect of removing full text from selected records distributed previously.  Although we generally strive to give our customers "more" rather than "less", we are legally bound to disable access to certain articles whenever we lose the rights to redistribute them.  Meeting these legal obligations is our first priority, and we are prepared to work closely with our customers to ensure that these corrections are processed correctly on their systems.

The other kind consists of what people would ordinarily understand as "corrections":  records that fix errors in authors' names and in other bibliographic, indexing, and text elements.

Customers should process both kinds of correction records in the same way.  Whenever they receive correction records, customers should identify the corresponding records that already exist in their systems.  Those earlier records should then be replaced by the correction records.  In relatively rare cases, customers may receive correction records for which no prior records exist in their systems.  In such cases, the correction records should simply be dropped.  Additional details on the structure and processing of correction records can be found at various points in the Database Licensing Reference for Vault Feeds:

  • for correction records in SGML/ODF format, see p. 5.
  • for correction records in DCIMARC format, see pp. 18 and 27.
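
The replace-or-drop rule described above can be sketched in Python as follows. This is a minimal illustration, not ProQuest's implementation: it assumes each record can be matched by some unique identifier (the field that serves this purpose is not named here and should be taken from the Reference), and it models the customer's system as a plain dictionary. The names apply_corrections, store, and record_id are hypothetical.

```python
def apply_corrections(store, corrections):
    """Apply correction records to an existing store.

    store:       dict mapping record identifier -> record (modified in place)
    corrections: iterable of (record_id, record) pairs
    Returns a (replaced, dropped) tuple of counts.
    """
    replaced = dropped = 0
    for record_id, record in corrections:
        if record_id in store:
            # A prior record exists: the correction replaces it entirely.
            store[record_id] = record
            replaced += 1
        else:
            # No prior record exists: the correction is simply dropped.
            dropped += 1
    return replaced, dropped
```

Because a correction replaces the whole earlier record, the same routine covers both kinds of correction, including those that remove full text. Returning the two counts gives operators a simple figure to log for each corrections file.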

Whenever sets of correction records are ready for distribution to customers, they will be delivered according to the same schedule (monthly, weekly, or daily) as normal product updates.  These files of correction records will be given distinctive names, enabling them to be segregated and processed separately from normal product updates.  Correction record files will have names beginning with the letter "c".  Most customers receive product update files that have names beginning with the letter "f".  (Those customers whose files are given customized names need to contact us to negotiate the file naming scheme to be employed for corrections.)

For the vast majority of our customers, corrections file naming will take place as in the following example.  Consider the case of Newspaper Abstracts.  A regular update file might be named "f4830717.zip", where "f483" represents our internal name for the product, and "0717" represents a date (July 17).

If any correction records files were generated on that date, those files would have been given (in order) the following names:
c483_1_0717.zip
c483_2_0717.zip
c483_3_0717.zip
c483_4_0717.zip
c483_5_0717.zip
c483_6_0717.zip
c483_7_0717.zip
c483_8_0717.zip
c483_9_0717.zip

For any given product, there can be as many as nine individual correction records files distributed on any given day.  Each of these corrections files can contain as many as 2,500 records.  These are strict system limits.  We'll generally send fewer than nine files; on some days there won't be any correction records files distributed at all.
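
A receiving system could separate the two kinds of files by name before processing. The Python sketch below follows the example names above. It rests on assumptions that go beyond this page: that the characters after the leading "f" or "c" identify the product, that the final four digits are always a month and day, and that no customized naming scheme is in use. The function name classify and the tuple layout are hypothetical.

```python
import re

# Update files, e.g. "f4830717.zip": letter f, product code, four-digit date.
UPDATE_RE = re.compile(r"f(?P<product>\w+)(?P<mmdd>\d{4})\.zip")
# Correction files, e.g. "c483_3_0717.zip": sequence number limited to 1-9.
CORRECTION_RE = re.compile(r"c(?P<product>[^_]+)_(?P<seq>[1-9])_(?P<mmdd>\d{4})\.zip")


def classify(filename):
    """Return (kind, product, sequence, mmdd), or None if the name is unrecognized.

    sequence is an int from 1 to 9 for corrections and None for updates.
    """
    match = CORRECTION_RE.fullmatch(filename)
    if match:
        return ("correction", match["product"], int(match["seq"]), match["mmdd"])
    match = UPDATE_RE.fullmatch(filename)
    if match:
        return ("update", match["product"], None, match["mmdd"])
    return None
```

Sorting the correction files for one product and date by the sequence field reproduces the order in which they were generated.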

Systems and programming staff can use some of the following files to develop and test the handling of correction records.  In the links below, "old format" refers to the structure of product updates sent prior to December 1999.  "New format" refers to the structure of product updates sent since December 1999.  The sample correction records files, if processed correctly, will replace approximately one half of the records in the corresponding sample product update files.

This web page and its links will continue to be updated as needed, or whenever documentation is revised.

If you have questions regarding the documentation, sample data, or any other items discussed above, please contact Gerard Jendras, Senior Database Analyst and Special Projects Coordinator, Content Control (Gerard.Jendras@proquest.com or 1-800-521-0600, ext. 4044).

©2008, ProQuest LLC. All rights reserved.