1990 U.S. Census LOOKUP

Mining a Mountain of Data

Deane W. Merrill
Nathan G. Parker
Harvard H. Holmes
Lawrence Berkeley National Laboratory
University of California
Berkeley CA 94720
[dwmerrill, ngparker, hhholmes]@lbl.gov

Chris Stuber
Valerie J. Gregg
U.S. Bureau of the Census
Washington DC 20233
[chris.stuber, vgregg]@census.gov

D-Lib Magazine, March 1996

ISSN 1082-9873


Introduction

For those of us who are parents, it is rare and gratifying to be able to pull off a stunt that our children actually think is clever and useful. All the more so if we are engaged in intellectual research far removed from our children's interests. One of us (DM) recently had such an experience.

My daughter grew up in Holland (her mother and I were divorced when she was three). She recently arrived in the United States with her Chilean fiance, seeking a life style far from big cities - but with a racially diverse and youthful population, cheap land, and good job opportunities. She had a good atlas, maps from the AAA, and Chamber of Commerce brochures. From the Web, she had gotten information from local governments and commercial organizations. She and her fiance were ready to check out places firsthand, but they could not afford a lengthy Odyssey; they needed to quickly find a good place to live and work.

The day before their departure, she asked me for help. "We know the general area we want, but we don't know what route to take. The Web pages only talk about the tourist attractions and fabulous hotels where we can stay for $100 a night. Every brochure shows gorgeous scenery and smiling people saying their own city is the best place in the universe. The atlas tells me about total population and climate but nothing else."

I told her about 1990 U.S. Census LOOKUP. "I'm sorry I can't help you today, but you can figure it out yourself. Go to the LOOKUP home page at http://cedr.lbl.gov/cdrom/doc/lookup_doc.html. Summary Tape Files (STF) 3A and 1A have the data you need. Choose a state and then a few towns (places), using your atlas as a guide. Get the tables about age, race, and (Hispanic) ethnicity. LOOKUP can't tell you about current land values or salaries, but you can get a general idea from housing values and median rents in 1990, and from education and occupation."

I arrived home that evening too late to help her, but I was gratified to find pages of printouts that she had gleaned from LOOKUP. Penciled on the map was a route through the most interesting towns. The next morning, she and her fiance were on their way.

The preceding story, though simplified for brevity, is true. With no more than a Web browser and the guidelines I gave my daughter, one can negotiate LOOKUP and obtain similar information for any region of the United States.


Data in LOOKUP

Other excellent Web-based tools exist ([Gazetteer], [Datamap], [ORST]) which provide simple access to profiles of selected data. Compared with those tools, LOOKUP is more difficult to use, but for good reason: only LOOKUP provides access to nationwide U.S. Census data below the county level. The data are derived from 86 CD-ROMs (about 43 gigabytes) which can be purchased from the U.S. Bureau of the Census. LOOKUP provides Web access to almost two billion numeric data values, as described in Table 1.


An example

To illustrate the use of LOOKUP in detail, we obtain racial characteristics and median housing value for the census tracts of a small rural county (Trinity County, California). Such information might interest, for example, persons involved in political redistricting, commercial marketing, real estate sales, or demographic research.

  1. With your Web browser, go to the LOOKUP home page at http://cedr.lbl.gov/cdrom/doc/lookup_doc.html. Select any one of the available LOOKUP browsers, for example the one at the U.S. Bureau of the Census.

  2. Select Summary Tape File 1A (STF1A), which contains basic demographic variables from the "100% sample" or "short form" census tabulations, for small geographic areas including census tracts. (Other variables, not needed in this example, are in STF3A.)

  3. In the Census, tracts are organized by county. From the list of states, select "California" and "level State--County" and then "Submit." Then select "Trinity County" and "level State--County--Census Tract" and then "Submit." You will see the list of (four) census tracts in Trinity County, California. Choose "Select/retrieve all of the areas below" and then "Submit." Then "Choose TABLES to retrieve (population, race breakdowns, etc.)" and "Submit."

  4. You will see a list of the 98 tables that are available in STF1A. Select the following three tables, which contain a total of seven variables:
                 
    P1.	Persons(1)	Universe: Persons             
    P6.	Race(5)		Universe: Persons             
    H23B.	Median Value(1)	Universe: Specified owner-occupied housing units             
    
    and then "Submit."

  5. Select "HTML format" (HyperText Markup Language) and "Submit." The data for the four tracts are shown in Table 2. Yes, those really are counts of single individuals, not thousands! For example, in 1990 there were only 53 African Americans (blacks) in Trinity County. All but five of them lived in Tract 1.

  6. Using your Web browser, you can "mouse" the information into your word processor, print the information, save the information in a disk file, or mail the information to your Internet address. Note: Some Web browsers have bugs which may prevent you from saving LOOKUP output data in a file. The known browser bugs, with some workarounds, are described in http://cedr.lbl.gov/cdrom/doc/lookup/bugs_2.html.

  7. For downloading into a spreadsheet or database, you can get the same data in a more convenient tab-delimited format. Return to the page where you selected "HTML format." This time select "Tab-delimited format" and "Submit." Table 3 shows the resulting data ready for downloading. Note: Some browsers and e-mail systems convert tabs into spaces, in which case the file may require editing before being loaded into your spreadsheet.
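
The tab-delimited output of step 7 can also be read by a short program instead of a spreadsheet. The following Python sketch is only an illustration: the function name parse_lookup_table, the column names, and the sample values are invented here and are not actual LOOKUP output. It assumes the first line is a header row, and it falls back to splitting on runs of two or more spaces when a browser or mail system has converted the tabs, which works only if no field itself contains a double space.

```python
import csv
import io
import re

def parse_lookup_table(text):
    """Parse tab-delimited text into a list of dicts keyed by the header row."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    if not any("\t" in ln for ln in lines):
        # Tabs were converted to spaces: treat each run of 2+ spaces as one tab.
        lines = [re.sub(r" {2,}", "\t", ln.strip()) for ln in lines]
    reader = csv.DictReader(io.StringIO("\n".join(lines)), delimiter="\t")
    return list(reader)

# Hypothetical sample with made-up names and numbers, NOT real census data.
SAMPLE = "Area\tPersons\tMedianValue\nTract A\t100\t50000\nTract B\t200\t60000\n"

rows = parse_lookup_table(SAMPLE)
print(sum(int(r["Persons"]) for r in rows))  # prints 300
```

Every value is returned as a string, so numeric columns must be converted explicitly, as in the last line.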

History

LOOKUP grew out of earlier projects at Lawrence Berkeley National Laboratory (LBNL), a Department of Energy installation. Between 1970 and 1985, the Socio-Economic Environmental Demographic Information System (SEEDIS) acquired and integrated 1970 and 1980 Census data for the Department of Labor. Between 1978 and 1995, the Populations at Risk to Environmental Pollution (PAREP) project added health data and other socioeconomic data for studies of geographic disease distributions.

In 1990, LBNL began acquiring CD-ROM juke boxes and providing Internet access to 1990 Census CD-ROMs, most of which were provided by the University of California Berkeley (UCB) Libraries. University of California Data Archive and Technical Assistance (UC DATA) installed additional juke boxes and provided user support. In 1992, Hiroaki Katayama, a visiting LBNL guest scientist, developed a PC menu system [Menu] which provides integrated access to the CD-ROM information system ([CD-ROM], [Merrill]).

In 1993, Nathan Parker, an LBNL summer student, developed DBUTIL subroutines for accessing the Census dBase files. In 1994, he completed LOOKUP, which Chris Stuber installed at the Census Bureau. Beginning in 1995, Valerie Gregg obtained Census Bureau funding for additional hardware and software development and for support of LBNL. In 1996, all the STF1 and STF3 CD-ROMs used by LOOKUP were copied to disk drives at the Census Bureau. New Web-based data access tools include the TIGER Mapping Service (TMS), a thematic mapping system (Themapit), data maps and profiles for states and counties (DataMap), and the U.S. Gazetteer, which provides integrated access to LOOKUP and TMS.


Usage statistics

Figure 1 (GIF format)
Figure 1 (PostScript format)

Figure 1 shows LOOKUP usage by month since August 1994. The lighter and darker bars represent, respectively, the number of different clients which in each month accessed (a) the LOOKUP server at the Census Bureau and (b) the two LOOKUP servers at LBNL. The numbers above the bars represent thousands of clients. About two-thirds of the usage is at the Census Bureau's LOOKUP server.

As of February 1996, LOOKUP was being accessed each month by about 28,000 different client computers. Usage had steadily increased by about 10 percent each month, with two exceptions which are evident in Figure 1:

Figure 2 (GIF format)
Figure 2 (PostScript format)

Detailed log files of every LOOKUP session to date are archived at LBNL. To preserve the confidentiality of the users, these are not publicly available. Some summary statistics, including Figures 1 and 2, are maintained at http://bigsur.lbl.gov/pub/lookup/stats/summary.html. Other statistics, not all on line, provide the following information:


How to obtain and install LOOKUP

To run LOOKUP as a user, no installation is required. You need only a World Wide Web (WWW) browser capable of handling forms. Follow the instructions under "An example."

However, as a system manager of a Web server, you can improve performance for your users by locally installing LOOKUP. This is strongly encouraged, for faster response and for availability when other LOOKUP servers are unavailable. The remainder of this section is intended for UNIX system managers who are familiar with Web concepts.

A HyperText Transfer Protocol (HTTP) server must be running on the computer where LOOKUP is to be installed. LOOKUP is a Common Gateway Interface (CGI) program written in C. Installation instructions, and source code for SunOS and Solaris 2, are available at http://cedr.lbl.gov/cdrom/doc/install/lookup.html.

The 86 mount points described in Table 1 must be locally mounted with Network File Services (NFS), preferably with an automounter (either amd or Sun's automounter). The necessary automount maps are publicly exported from LBNL, as described in http://cedr.lbl.gov/cdrom/doc/nfs.html. The local system manager must obtain a current copy of the public automount map and provide it to the local automounter.

The NFS locations change rather frequently. When a particular diskN mount point is temporarily unavailable, its entry is moved from the "online" files to the "offline" files described below. At the same time, the automount maps which are exported from LBNL are changed. To avoid the need for constant attention, the local system manager should arrange for the local copy of the automount map to be periodically updated automatically.
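
One way to arrange the periodic update is a small script run from cron. The Python sketch below covers only the local half of the job, namely installing a freshly obtained map without ever leaving a half-written file for the automounter to read. It is an illustration under stated assumptions: the function name install_map and the destination path are hypothetical, and how the new map text is fetched from LBNL is left to the caller, since the retrieval details are in the NFS documentation cited above.

```python
import os
import tempfile

def install_map(new_text, dest_path):
    """Replace dest_path with new_text if it differs; return True when updated."""
    if not new_text.strip():
        # A failed download should never wipe out a working map.
        raise ValueError("refusing to install an empty automount map")
    try:
        with open(dest_path, "r", encoding="utf-8") as f:
            if f.read() == new_text:
                return False  # already current, nothing to do
    except FileNotFoundError:
        pass  # first installation
    dest_dir = os.path.dirname(os.path.abspath(dest_path))
    # Write to a temporary file in the same directory, then rename it into
    # place, so that readers see either the old map or the new one.
    fd, tmp_path = tempfile.mkstemp(dir=dest_dir, prefix=".map.")
    with os.fdopen(fd, "w", encoding="utf-8") as f:
        f.write(new_text)
    os.chmod(tmp_path, 0o644)
    os.replace(tmp_path, dest_path)
    return True
```

A cron job might call install_map(fetched_text, "/etc/auto.cdrom"), where the path is again only an example, and signal the automounter only when the function returns True.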


How to configure the local version of LOOKUP

For efficiency and redundancy, duplicate public copies of the 1990 Census data are available. Sharing of the multiple copies by different LOOKUP servers is accomplished with three sets of files, which we call the "STF files," the "contents file," and the "NFS file." These are available at the following locations:

STF files (local LBNL version)
http://cedr.lbl.gov/data1/stf/ (see the files */*.loc)

contents file (database order, online)
http://cedr.lbl.gov/mpub/cdrom/install/contents.online.html
contents file (database order, offline)
http://cedr.lbl.gov/mpub/cdrom/install/contents.offline.html

contents file (diskN order, online)
http://cedr.lbl.gov/mpub/cdrom/install/contentu.online.html
contents file (diskN order, offline)
http://cedr.lbl.gov/mpub/cdrom/install/contentu.offline.html

NFS file (diskN order, online)
http://cedr.lbl.gov/mpub/cdrom/install/nfsloc.online.html
NFS file (diskN order, offline)
http://cedr.lbl.gov/mpub/cdrom/install/nfsloc.offline.html

The contents file and the NFS file provide information for choosing the best configuration of the STF files. The possible choices can be tested to see which choice gives consistently the fastest and most reliable response. It may be useful to communicate with the system managers of the individual data servers.
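
Testing the possible choices can be as simple as timing a read from each candidate mount point. The Python sketch below is one hypothetical way to do so and is not part of LOOKUP: the function names and the idea of ranking by worst-case time are assumptions made here. Because the operating system may cache data after the first read, repeated trials on the same file can look faster than a cold access, so the results are only a rough guide.

```python
import time

def time_read(path, nbytes=65536):
    """Return the seconds taken to read the first nbytes of path, or None on failure."""
    start = time.perf_counter()
    try:
        with open(path, "rb") as f:
            f.read(nbytes)
    except OSError:
        return None
    return time.perf_counter() - start

def rank_candidates(paths, trials=3):
    """Return the reachable paths, sorted by their worst read time over the trials."""
    results = []
    for path in paths:
        times = [time_read(path) for _ in range(trials)]
        if None in times:
            continue  # any failed trial marks the candidate as unreliable
        results.append((max(times), path))
    return [path for _, path in sorted(results)]
```

The paths passed to rank_candidates would be the same data file as seen through each registered copy, for example stf300nc.dbf under the local mount points for disk173, disk389, and disk295. The local directory under which those mount points appear depends on the installation.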

The contents file lists all the publicly available mount points. The diskN tag is permanently assigned when a new mount point (either CD-ROM or disk) is registered at LBNL. For example, three copies of North Carolina STF3A data were registered as of February 1996: (#1) CD-ROMs exported from LBNL, (#2) disk files exported from the Census Bureau, and (#3) disk files exported from North Carolina State University (NCSU).

             
1990 Census: STF 3A       |NC(Alamance-Jackson)    (#1)|disk173|             
1990 Census: STF 3A       |NC(Johnston-Yancey)     (#1)|disk174|             
1990 Census: STF 3A       |NC(Alamance-Jackson)    (#2)|disk389|             
1990 Census: STF 3A       |NC(Johnston-Yancey)     (#2)|disk390|             
1990 Census: STF 3A       |NC(Alamance-Jackson)    (#3)|disk295|             
1990 Census: STF 3A       |NC(Johnston-Yancey)     (#3)|disk296|             
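
A listing in this form is easy to process by program. The following Python sketch groups the registered copies of each data set by copy number. It assumes that every entry has exactly the vertical-bar layout shown above. The published contents file is an HTML page and may carry additional markup or columns, so the function parse_contents is an illustration, not a parser for the real file.

```python
import re
from collections import defaultdict

CONTENTS = """\
1990 Census: STF 3A       |NC(Alamance-Jackson)    (#1)|disk173|
1990 Census: STF 3A       |NC(Johnston-Yancey)     (#1)|disk174|
1990 Census: STF 3A       |NC(Alamance-Jackson)    (#2)|disk389|
1990 Census: STF 3A       |NC(Johnston-Yancey)     (#2)|disk390|
1990 Census: STF 3A       |NC(Alamance-Jackson)    (#3)|disk295|
1990 Census: STF 3A       |NC(Johnston-Yancey)     (#3)|disk296|
"""

def parse_contents(text):
    """Map (data set, coverage) -> {copy number: diskN tag}."""
    copies = defaultdict(dict)
    for line in text.splitlines():
        fields = [f.strip() for f in line.split("|")]
        if len(fields) < 3 or not fields[2]:
            continue  # skip blank or malformed lines
        # Separate the coverage, e.g. "NC(Alamance-Jackson)", from "(#1)".
        m = re.match(r"(.*?)\s*\(#(\d+)\)$", fields[1])
        if not m:
            continue
        copies[(fields[0], m.group(1))][int(m.group(2))] = fields[2]
    return dict(copies)

copies = parse_contents(CONTENTS)
print(copies[("1990 Census: STF 3A", "NC(Johnston-Yancey)")])
# prints {1: 'disk174', 2: 'disk390', 3: 'disk296'}
```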

The NFS file, which corresponds to the automount maps exported from LBNL, specifies the current NFS location of each mount point. As of February 1996 these were:

             
disk173|hibana.lbl.gov        |/export/cdrom/cd029                  
disk174|hibana.lbl.gov        |/export/cdrom/cd030                  
disk389|mercury.census.gov    |/disk6/cd903a41                 
disk390|mercury.census.gov    |/disk6/cd903a42                 
disk295|amani.ces.ncsu.edu    |/info/www/census90/stf3a/41                    
disk296|amani.ces.ncsu.edu    |/info/www/census90/stf3a/42                    
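
The correspondence between the NFS file and an automount map can be sketched in a few lines of Python. The parser below assumes the three-field vertical-bar layout shown above. The function automount_entry formats each location in the general style of a Sun automounter indirect map (key, options, host:path). That output format, the read-only option, and both function names are assumptions for illustration; the maps actually exported from LBNL may differ in detail.

```python
NFS_FILE = """\
disk173|hibana.lbl.gov        |/export/cdrom/cd029
disk174|hibana.lbl.gov        |/export/cdrom/cd030
disk389|mercury.census.gov    |/disk6/cd903a41
disk390|mercury.census.gov    |/disk6/cd903a42
disk295|amani.ces.ncsu.edu    |/info/www/census90/stf3a/41
disk296|amani.ces.ncsu.edu    |/info/www/census90/stf3a/42
"""

def parse_nfs_file(text):
    """Map each diskN tag to its (host, exported path) pair."""
    locations = {}
    for line in text.splitlines():
        fields = [f.strip() for f in line.split("|")]
        if len(fields) != 3 or not all(fields):
            continue  # skip blank or malformed lines
        tag, host, path = fields
        locations[tag] = (host, path)
    return locations

def automount_entry(tag, host, path, options="-ro"):
    """Format one map line as 'key options host:path' (assumed Sun-style layout)."""
    return f"{tag} {options} {host}:{path}"

locations = parse_nfs_file(NFS_FILE)
for tag in ("disk389", "disk390"):
    print(automount_entry(tag, *locations[tag]))
# prints:
# disk389 -ro mercury.census.gov:/disk6/cd903a41
# disk390 -ro mercury.census.gov:/disk6/cd903a42
```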

The STF files $STF_INFO_DIR/stf/*/*.loc are created locally when LOOKUP is installed, and should be modified by the local system manager. The STF files need to be changed only when LOOKUP is installed, or when a new mount point is registered, or when a mount point will be off line for an extended period. System problems and anticipated shutdowns are announced in http://cedr.lbl.gov/cdrom/doc/lookup/status.html. Upon request, you can be notified by e-mail whenever this file is updated.

After you have installed LOOKUP, the local file $STF_INFO_DIR/stf/c90stf3a/nc.loc may contain

             
stf3a nc 37 ( 001 - 099 ) disk173 stf300nc.dbf
stf3a nc 37 ( 101 - 199 ) disk174 stf300nc.dbf

which instructs the local LOOKUP server to use the mount points disk173 and disk174 (i.e., CD-ROMs at LBNL) for North Carolina STF3A data. You should check the contents file and NFS file to see what mount points are currently registered. If disk389 and disk390 (the disk files exported from the Census Bureau) are currently on line, you can improve performance by changing the file $STF_INFO_DIR/stf/c90stf3a/nc.loc as follows:

stf3a nc 37 ( 001 - 099 ) disk389 stf300nc.dbf
stf3a nc 37 ( 101 - 199 ) disk390 stf300nc.dbf

If disk295 and disk296 (the disk files exported from NCSU) are currently on line, you can change the file $STF_INFO_DIR/stf/c90stf3a/nc.loc as follows:

stf3a nc 37 ( 001 - 099 ) disk295 stf300nc.dbf
stf3a nc 37 ( 101 - 199 ) disk296 stf300nc.dbf

This would be appropriate for LOOKUP servers which are close to NCSU.

For even better performance, selected CD-ROMs can be copied to a local disk, and the corresponding STF files changed appropriately.
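
Changes to a .loc file such as those described above can be scripted, so that a mount point is switched only when its replacement is known to be on line. The Python sketch below is hypothetical: the function name, the regular expression, and the interpretation of the fields (data set, state, state code, a numeric range, diskN tag, dBase file name) are inferred from the North Carolina example and are not taken from the LOOKUP source code.

```python
import re

# One .loc line, e.g. "stf3a nc 37 ( 001 - 099 ) disk173 stf300nc.dbf".
# The group names reflect a guess at what each field means.
LOC_LINE = re.compile(
    r"^(?P<file>\S+)\s+(?P<state>\S+)\s+(?P<code>\d+)\s+"
    r"\(\s*(?P<lo>\d+)\s*-\s*(?P<hi>\d+)\s*\)\s+"
    r"(?P<disk>disk\d+)\s+(?P<dbf>\S+)\s*$"
)

def switch_mount_points(loc_text, replacements, online):
    """Swap diskN tags per `replacements`, but only when the new tag is on line."""
    out = []
    for line in loc_text.splitlines():
        m = LOC_LINE.match(line)
        new = replacements.get(m.group("disk")) if m else None
        if new and new in online:
            # Replace only the diskN field; leave the rest of the line untouched.
            line = line[:m.start("disk")] + new + line[m.end("disk"):]
        out.append(line)
    return "\n".join(out) + "\n"

NC_LOC = (
    "stf3a nc 37 ( 001 - 099 ) disk173 stf300nc.dbf\n"
    "stf3a nc 37 ( 101 - 199 ) disk174 stf300nc.dbf\n"
)
print(switch_mount_points(
    NC_LOC,
    {"disk173": "disk389", "disk174": "disk390"},
    online={"disk389", "disk390"},
), end="")
# prints:
# stf3a nc 37 ( 001 - 099 ) disk389 stf300nc.dbf
# stf3a nc 37 ( 101 - 199 ) disk390 stf300nc.dbf
```

The set of on-line tags would come from the current contents file and NFS file. Lines that do not match the expected layout are passed through unchanged.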


A plea for cooperation

The contents file and NFS file contain all the mount points which data servers have chosen to register with LBNL for shared public access. For a list of data servers, see "Acknowledgments." Over 400 mount points were registered as of February 1996, including the 86 used by LOOKUP. Many of these are being used for experimental development of new Web-based data servers.

The advantage of a central registry is that one can conveniently develop software (e.g. LOOKUP or the PC menu system) which simultaneously accesses data from dozens or even hundreds of mount points. The incremental cost of publicly exporting a mount point is small, compared with the costs of acquiring and maintaining the necessary hardware.

The authors wish to encourage additional sites to publicly NFS-export their non-restricted data files, and to publicly register them with LBNL. If this occurs, the sites now exporting data will not become overloaded, and the entire Internet community can participate in developing the next generation of integrated public data servers.


Acknowledgments

LOOKUP is supported by the U.S. Bureau of the Census under an amendment to Interagency Agreement IA-5-26, project 36-00-50-4755-00-259, Deane Merrill, Principal Investigator. Earlier development was supported by the Office of Epidemiologic Studies; Office of the Deputy Assistant Secretary for Health Studies; Office of Environment, Safety and Health; U.S. Department of Energy under Contract Number DE-AC03-76SF00098.

The data and resources used by LOOKUP were provided by the following organizations. The names of the principal contacts are indicated.

We are grateful to outside reviewers who have increased public awareness of LOOKUP: Michael Batty, Ryan Bernard, excite NetDirectory, McKinley Magellan, and Larry Schankman.

The content of this report is the sole responsibility of the authors and does not necessarily reflect the positions or the policies of the organizations listed above. No official endorsement should be inferred.


References

[Batty] Michael Batty. "The Computable City." Keynote Address: Fourth International Conference on Computers in Urban Planning and Urban Management Melbourne, Australia, July 11th - 14th, 1995. Available at http://www.geog.buffalo.edu/Geo666/batty/melbourne.html .

[Bernard] Ryan Bernard. "Premiere Site" review in Internet Business 500: The Top 500 Essential Sites for Business, Ventana Press, November 1995. Available at http://www.vmedia.com/cat/press/store/business/ .

[CD-ROM] The University of California CD-ROM Information System. Available at http://cedr.lbl.gov/cdrom/doc/cdrom.html .

[Datamap] DataMap, U.S. Bureau of the Census. Data maps and profiles for states and counties. Available at http://www.census.gov/ftp/pub/statab/www/profile.html .

[Davis] Glenn Davis. Cool Site of the Day, June 21, 1995: "1990 Census Data." Available at http://cool.infi.net/9506.html .

[Excite] Review in excite NetDirectory: "1990 U.S. Census LOOKUP." Available at http://www.excite.com/Subject/News_and_Reference/Libraries_and_Reference/General_Reference/ .

[Gazetteer] U.S. Gazetteer, U.S. Bureau of the Census. Available at http://www.census.gov/cgi-bin/gazetteer .

[Kahn] Jeffery Kahn. "Lab's Census Database Has Day of Fame." LBNL Currents, July 14, 1995. Available at http://www.lbl.gov/Publications/Currents/Archive/Jul-14-1995.html#RTFToC22 .

[LOOKUP] 1990 Census LOOKUP and its associated documentation are available at http://cedr.lbl.gov/cdrom/doc/lookup_doc.html .

[McKinley] McKinley Magellan three-star review: "1990 Census LOOKUP." Available at http://www.mckinley.com .

[Menu] Hiroaki Katayama, PC-based Menu System for Accessing CD-ROM Data. Available at http://cedr.lbl.gov/cdrom/doc/menu.html .

[Merrill] Deane Merrill, Nathan Parker, Fredric Gey, and Chris Stuber. "The University of California CD-ROM Information System." Communications of the ACM 38(4) pp.51-52 (1995). Copyright © 1995 by ACM, Inc. Available at http://cedr.lbl.gov/mdocs/cacm9504/final.html .

[ORST] The Government Information Sharing Project, Oregon State University. Available at http://govinfo.kerr.orst.edu .

[Schankman] Larry Schankman. "Rest of the Best in Demographics: Census Data at the Lawrence Berkeley Laboratory." Available at http://www.clark.net/pub/lschank/web/census.html#rest .


Copyright © 1996 Deane W. Merrill, Nathan G. Parker, Harvard H. Holmes, Chris Stuber, Valerie J. Gregg

This story was revised on April 14, 1997 by the Editor. At the request of one of the authors, personal names have been deleted to protect the privacy of the individuals concerned. The research substance of the story remains unchanged since initial release in March 1996.



hdl://cnri.dlib/march96-merrill
http://cedr.lbl.gov/mdocs/dlib/03merrill.html 3/7/96