
Mapping and Converting Essential Federal Geographic Data Committee (FGDC) Metadata into MARC21 and Dublin Core: Towards an Alternative to the FGDC Clearinghouse

Appendix A: NBII Data

These data are presented in nine worksheets in a Microsoft Excel 97 file. The data are available for download as a single zipped spreadsheet file containing nine separate worksheet tabs (each tab is described below). For more detailed information, please contact the authors. [ download 147 KB ]

The key to understanding the worksheet tabs is the SGML list of 444 FGDC elements, which are identified by line number in the tab labeled "Key." For example, line 14 is the Abstract (FGDC 1.2.1) in the Identification Information section.

Excel and other spreadsheets have a limit of about 240 columns, so we had to divide the 444 elements or line numbers into three parts: A) counts17 (sections 1 and 7) or lines 1-97 and 414-444, a division chosen because it corresponds to the two mandatory sections of FGDC; B) counts4 (section 4), or lines 169-332, chosen because it is the longest single section; and C) counts2356 (sections 2, 3, 5, and 6), or lines 98-168 and 333-413. Thus, the three parts have 128 + 164 + 152 = 444 lines.
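
The arithmetic of this three-way split can be checked with a short sketch. The names `PARTS` and `part_sizes` are illustrative only and do not appear in the spreadsheet; the line ranges are the ones stated above.

```python
# Inclusive line ranges for each of the three parts, as stated in the text.
# The dictionary and function names are hypothetical, chosen for illustration.
PARTS = {
    "counts17": [(1, 97), (414, 444)],
    "counts4": [(169, 332)],
    "counts2356": [(98, 168), (333, 413)],
}


def part_sizes(parts):
    """Return the number of element lines covered by each part."""
    return {
        name: sum(hi - lo + 1 for lo, hi in ranges)
        for name, ranges in parts.items()
    }


sizes = part_sizes(PARTS)

# Every line number covered by any part, in ascending order.
covered = sorted(
    n
    for ranges in PARTS.values()
    for lo, hi in ranges
    for n in range(lo, hi + 1)
)
```

If the ranges are correct, `covered` should contain each line number from 1 to 444 exactly once, which confirms that the parts neither overlap nor leave gaps.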

The nine tables are interpreted as follows:

Key tab: two fields: the first is the element number, the second is the SGML element path in the input record.

Tagcounts tab: two columns: column A = the 444 FGDC elements; column B = frequency of use in the data set. For example, element 14 (Abstract) was used in all 466 FGDC records in the data set.

Stts tab: statistical summary; five columns: column A = file names of the 466 FGDC records; B = list of line numbers (elements) that are used in that record; C = record size (characters, bytes); D = length of the longest field in that record (characters, bytes); E = line number of the longest field in that record (e.g., 14 = Abstract). Average, median, and maximum for record size and longest field are given at the end of the table. This is the most useful arrangement of the data for our purposes.
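
As a rough illustration of how a row of this table and its closing summary could be derived, the following sketch works on a toy data set. Everything here is an assumption made for the example: the file names, the field texts, the function name `summarize_record`, and the simplification that record size is the sum of the field lengths (the actual records may also count markup).

```python
from statistics import mean, median

# Hypothetical toy data: file name -> {line number: field text}.
records = {
    "a.sgml": {8: "short", 14: "a considerably longer field"},
    "b.sgml": {14: "brief", 15: "a medium field"},
}


def summarize_record(fields):
    """Return (lines used, record size, longest field length, its line number).

    Assumes record size is the total length of all field texts.
    """
    lines_used = sorted(fields)
    record_size = sum(len(text) for text in fields.values())
    longest_line = max(fields, key=lambda n: len(fields[n]))
    return lines_used, record_size, len(fields[longest_line]), longest_line


rows = {name: summarize_record(fields) for name, fields in records.items()}

# Summary statistics corresponding to the end of the table.
record_sizes = [row[1] for row in rows.values()]
longest_fields = [row[2] for row in rows.values()]
summary = {
    "record size": (mean(record_sizes), median(record_sizes), max(record_sizes)),
    "longest field": (mean(longest_fields), median(longest_fields), max(longest_fields)),
}
```

Each entry in `rows` mirrors columns B through E for one file, and `summary` holds the average, median, and maximum for the two measured quantities.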

Counts17 tab: Identification and Metadata Reference sections (1 and 7): column A = file names of the 466 FGDC records; columns B-DY = lines 1-97 and 414-444.

Counts2356 tab: Data Quality, Spatial Data Organization, Entity and Attribute, and Distribution sections (2, 3, 5, and 6): column A = file names of the 466 FGDC records; columns B-EW = lines 98-168 and 333-413.

Counts4 tab: Spatial Reference section (4): column A = file names of the 466 FGDC records; columns B-FI = lines 169-332.
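
To relate a spreadsheet column in one of the three counts tabs to an FGDC line number, a small helper along the following lines may be useful. It is a sketch under one assumption not stated in the text: that within each tab the columns run in ascending line order, starting at column B, with the ranges in the order listed above. The names `column_index`, `TAB_LINES`, and `line_for_column` are hypothetical.

```python
def column_index(letters):
    """Convert spreadsheet column letters to a 1-based index (A=1, Z=26, AA=27)."""
    index = 0
    for ch in letters.upper():
        index = index * 26 + (ord(ch) - ord("A") + 1)
    return index


# Inclusive line ranges held by each counts tab, as described in the text.
TAB_LINES = {
    "Counts17": [(1, 97), (414, 444)],
    "Counts2356": [(98, 168), (333, 413)],
    "Counts4": [(169, 332)],
}


def line_for_column(tab, letters):
    """Return the FGDC line number assumed to be stored in the given column."""
    offset = column_index(letters) - 2  # column B holds the tab's first line
    if offset < 0:
        raise ValueError("column A holds file names, not an element line")
    for lo, hi in TAB_LINES[tab]:
        size = hi - lo + 1
        if offset < size:
            return lo + offset
        offset -= size
    raise ValueError(f"column {letters} is beyond the last line of {tab}")
```

Under that assumption, the last columns named in the text (DY, EW, and FI) map to lines 444, 413, and 332 respectively, which matches the stated ranges.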

Maxs17 tab: sections 1 and 7; contains the maximum value for all the elements in that section for each record.

Maxs2356 tab: sections 2, 3, 5 and 6; contains the maximum value for all the elements in that section for each record.

Maxs4 tab: section 4; contains the maximum value for all the elements in that section for each record.

Contact Information

Adam Chandler
Systems Librarian
Energy and Environmental Information Resources Center
University of Louisiana at Lafayette
700 Cajundome Blvd.
Lafayette, LA 70506
web: <http://eeirc.nwrc.gov>
email: [email protected]
tel: 318-266-8697

Dan Foley
Metadata Librarian
Energy and Environmental Information Resources Center
University of Louisiana at Lafayette
700 Cajundome Blvd.
Lafayette, LA 70506
web: <http://eeirc.nwrc.gov>
email: [email protected]
tel: 318-266-8539

Alaaeldin M. Hafez
Research Scientist
Center for Advanced Computer Studies
University of Louisiana at Lafayette
P.O. Box 44330
Conference Center Room 459
Lafayette, LA 70504-4330 USA
web: <http://www.cacs.usl.edu/Departments/CACS/>
email: [email protected]
tel: 318-482-5791

This work is supported in part by a grant from the U.S. Department of Energy (under grant No. DE-FG02-97ER1220).
