Deane W. Merrill
Nathan G. Parker
Harvard H. Holmes
Lawrence Berkeley National Laboratory
University of California
Berkeley CA 94720
[dwmerrill, ngparker, hhholmes]@lbl.gov
Valerie J. Gregg
U.S. Bureau of the Census
Washington DC 20233
D-Lib Magazine, March 1996
My daughter grew up in Holland (her mother and I were divorced when she was three). She recently arrived in the United States with her Chilean fiance, seeking a life style far from big cities - but with a racially diverse and youthful population, cheap land, and good job opportunities. She had a good atlas, maps from the AAA, and Chamber of Commerce brochures. From the Web, she had gotten information from local governments and commercial organizations. She and her fiance were ready to check out places firsthand, but they could not afford a lengthy Odyssey; they needed to quickly find a good place to live and work.
The day before their departure, she asked me for help. "We know the general area we want, but we don't know what route to take. The Web pages only talk about the tourist attractions and fabulous hotels where we can stay for $100 a night. Every brochure shows gorgeous scenery and smiling people saying their own city is the best place in the universe. The atlas tells me about total population and climate but nothing else."
I told her about 1990 U.S. Census LOOKUP. "I'm sorry I can't help you today, but you can figure it out yourself. Go to the LOOKUP home page at http://cedr.lbl.gov/cdrom/doc/lookup_doc.html. Summary Tape Files (STF) 3A and 1A have the data you need. Choose a state and then a few towns (places), using your atlas as a guide. Get the tables about age, race, and (Hispanic) ethnicity. LOOKUP can't tell you about current land values or salaries, but you can get a general idea from housing values and median rents in 1990, and from education and occupation."
I arrived home that evening too late to help her, but I was gratified to find pages of printouts that she had gleaned from LOOKUP. Penciled on the map was a route through the most interesting towns. The next morning, she and her fiance were on their way.
The preceding story, though simplified for brevity, is true. With no more than a Web browser and the guidelines I gave my daughter, one can negotiate LOOKUP and obtain similar information for any region of the United States.
P1. Persons(1) Universe: Persons
P6. Race(5) Universe: Persons
H23B. Median Value(1) Universe: Specified owner-occupied housing units
and then "Submit."
In 1990, LBNL began acquiring CD-ROM juke boxes and providing Internet access to 1990 Census CD-ROMs, most of which were provided by the University of California Berkeley (UCB) Libraries. University of California Data Archive and Technical Assistance (UC DATA) installed additional juke boxes and provided user support. In 1992, Hiroaki Katayama, a visiting LBNL guest scientist, developed a PC menu system [Menu] which provides integrated access to the CD-ROM information system ([CD-ROM], [Merrill]).
In 1993, Nathan Parker, an LBNL summer student, developed DBUTIL subroutines for accessing the Census dBase files. In 1994, he completed LOOKUP, which Chris Stuber installed at the Census Bureau. Beginning in 1995, Valerie Gregg obtained Census Bureau funding for additional hardware and software development and for support of LBNL. In 1996, all the STF1 and STF3 CD-ROMs used by LOOKUP were copied to disk drives at the Census Bureau. New Web-based data access tools include the TIGER Mapping Service (TMS), a thematic mapping system (Themapit), data maps and profiles for states and counties (DataMap), and the U.S. Gazetteer, which provides integrated access to LOOKUP and TMS.
Figure 1 shows LOOKUP usage by month since August 1994. The lighter and darker bars represent, respectively, the number of different clients that accessed, in each month, (a) the LOOKUP server at the Census Bureau and (b) the two LOOKUP servers at LBNL. The numbers above the bars represent thousands of clients. About two-thirds of the usage is at the Census Bureau's LOOKUP server.
As of February 1996, LOOKUP was being accessed each month by about 28,000 different client computers. Usage had steadily increased by about 10 percent each month, with two exceptions which are evident in Figure 1:
Figure 2 (GIF format)
Figure 2 (PostScript format)
Detailed log files of every LOOKUP session to date are archived at LBNL. To preserve the confidentiality of the users, these are not publicly available. Some summary statistics, including Figures 1 and 2, are maintained at http://bigsur.lbl.gov/pub/lookup/stats/summary.html. Other statistics, not all on line, provide the following information:
However, as a system manager of a Web server, you can improve performance for your users by locally installing LOOKUP. This is strongly encouraged, for faster response and for availability when other LOOKUP servers are unavailable. The remainder of this section is intended for UNIX system managers who are familiar with Web concepts.
A HyperText Transfer Protocol (HTTP) server must be running on the computer where LOOKUP is to be installed. LOOKUP is a Common Gateway Interface (CGI) program written in C. Installation instructions, and source code for SunOS and Solaris 2, are available at http://cedr.lbl.gov/cdrom/doc/install/lookup.html.
The 86 mount points described in Figure 1 must be locally mounted with the Network File System (NFS), preferably with an automounter (either amd or Sun's automounter). The necessary automount maps are publicly exported from LBNL, as described in http://cedr.lbl.gov/cdrom/doc/nfs.html. The local system manager must obtain a current copy of the public automount map and provide it to the local automounter.
The NFS locations change rather frequently. When a particular diskN mount point is temporarily unavailable, its entry is moved from the "online" files to the "offline" files described below. At the same time, the automount maps which are exported from LBNL are changed. To avoid the need for constant attention, the local system manager should arrange for the local copy of the automount map to be periodically updated automatically.
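One way to keep the local copy current is a small script run from a scheduler such as cron. The following is only a sketch under stated assumptions: the function name install_map_if_changed is hypothetical, the step that fetches the current map from LBNL is deliberately left out because its mechanism is not described here, and the local file path is whatever the local automounter has been configured to read. The sketch replaces the local copy only when the fetched text differs, and does so atomically so that the automounter never sees a half-written map.

```python
import os
import tempfile


def install_map_if_changed(new_text, local_path):
    """Install new_text at local_path if it differs from the current copy.

    Returns True if the file was replaced, False if it was already current.
    The name and behavior are illustrative; they are not part of LOOKUP.
    """
    # Refuse to install an empty map, which usually signals a failed fetch.
    if not new_text.strip():
        raise ValueError("fetched automount map is empty")

    try:
        with open(local_path, "r") as handle:
            if handle.read() == new_text:
                return False
    except FileNotFoundError:
        pass  # first installation: no local copy yet

    # Write to a temporary file in the same directory, then rename it into
    # place so readers see either the old map or the new one, never a mix.
    directory = os.path.dirname(os.path.abspath(local_path))
    fd, temp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as handle:
        handle.write(new_text)
    os.replace(temp_path, local_path)
    return True
```

Whether the automounter must be signaled or restarted after the map changes depends on the automounter in use and is not handled here.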
For efficiency and redundancy, duplicate public copies of the 1990 Census data are available. Sharing of the multiple copies by different LOOKUP servers is accomplished with three sets of files, which we call the "STF files," the "contents file," and the "NFS file." These are available at the following locations:
The contents file and the NFS file provide information for choosing the best configuration of the STF files. The possible choices can be tested to see which choice gives consistently the fastest and most reliable response. It may be useful to communicate with the system managers of the individual data servers.
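A rough way to compare candidate mount points is to time a small read from each one several times and rank them by the median. The sketch below is illustrative only: the function names are hypothetical, the paths passed in are assumed to be files under locally mounted diskN directories, and a read of 64 KB is an arbitrary choice. Repeated reads may be served from the local cache, so the numbers are indicative rather than definitive; measurements spread over several days give a better picture of consistency.

```python
import time


def time_read(path, nbytes=65536):
    """Return seconds taken to read nbytes from path, or None on failure."""
    start = time.perf_counter()
    try:
        with open(path, "rb") as handle:
            handle.read(nbytes)
    except OSError:
        return None
    return time.perf_counter() - start


def median_read_time(path, trials=5):
    """Median of several timed reads; None if any single read failed."""
    samples = [time_read(path) for _ in range(trials)]
    if any(sample is None for sample in samples):
        return None
    samples.sort()
    return samples[len(samples) // 2]


def rank_candidates(paths, trials=5):
    """Return (reachable paths, fastest first; unreachable paths)."""
    results = {path: median_read_time(path, trials) for path in paths}
    reachable = sorted(
        (path for path in paths if results[path] is not None),
        key=lambda path: results[path],
    )
    unreachable = [path for path in paths if results[path] is None]
    return reachable, unreachable
```

Treating a single failed read as "unreachable" is a deliberately strict choice, since reliability matters as much as speed when choosing among copies.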
The contents file lists all the publicly available mount points. The diskN tag is permanently assigned when a new mount point (either CD-ROM or disk) is registered at LBNL. For example, three copies of North Carolina STF3A data were registered as of February 1996: (#1) CD-ROMs exported from LBNL (#2) disk files exported from the Census Bureau (#3) disk files exported from North Carolina State University (NCSU).
1990 Census: STF 3A |NC(Alamance-Jackson) (#1)|disk173|
1990 Census: STF 3A |NC(Johnston-Yancey) (#1)|disk174|
1990 Census: STF 3A |NC(Alamance-Jackson) (#2)|disk389|
1990 Census: STF 3A |NC(Johnston-Yancey) (#2)|disk390|
1990 Census: STF 3A |NC(Alamance-Jackson) (#3)|disk295|
1990 Census: STF 3A |NC(Johnston-Yancey) (#3)|disk296|
The NFS file, which corresponds to the automount maps exported from LBNL, specifies the current NFS location of each mount point. As of February 1996 these were:
disk173|hibana.lbl.gov |/export/cdrom/cd029
disk174|hibana.lbl.gov |/export/cdrom/cd030
disk389|mercury.census.gov |/disk6/cd903a41
disk390|mercury.census.gov |/disk6/cd903a42
disk295|amani.ces.ncsu.edu |/info/www/census90/stf3a/41
disk296|amani.ces.ncsu.edu |/info/www/census90/stf3a/42
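To see which copies of a data set are registered and where each one currently lives, the two files can be joined on the diskN tag. The following sketch assumes that both files consist of one record per line with fields separated by "|", as in the North Carolina records shown above; the function names are hypothetical and the sample text is copied from those records rather than read from the real files.

```python
CONTENTS_SAMPLE = """\
1990 Census: STF 3A |NC(Alamance-Jackson) (#1)|disk173|
1990 Census: STF 3A |NC(Johnston-Yancey) (#1)|disk174|
1990 Census: STF 3A |NC(Alamance-Jackson) (#2)|disk389|
1990 Census: STF 3A |NC(Johnston-Yancey) (#2)|disk390|
1990 Census: STF 3A |NC(Alamance-Jackson) (#3)|disk295|
1990 Census: STF 3A |NC(Johnston-Yancey) (#3)|disk296|
"""

NFS_SAMPLE = """\
disk173|hibana.lbl.gov |/export/cdrom/cd029
disk174|hibana.lbl.gov |/export/cdrom/cd030
disk389|mercury.census.gov |/disk6/cd903a41
disk390|mercury.census.gov |/disk6/cd903a42
disk295|amani.ces.ncsu.edu |/info/www/census90/stf3a/41
disk296|amani.ces.ncsu.edu |/info/www/census90/stf3a/42
"""


def parse_contents(text):
    """Map each diskN tag to (title, description) from contents-file text."""
    entries = {}
    for line in text.splitlines():
        fields = [field.strip() for field in line.split("|")]
        if len(fields) < 3 or not fields[2]:
            continue  # skip blank or malformed lines
        entries[fields[2]] = (fields[0], fields[1])
    return entries


def parse_nfs(text):
    """Map each diskN tag to (host, exported path) from NFS-file text."""
    locations = {}
    for line in text.splitlines():
        fields = [field.strip() for field in line.split("|")]
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        locations[fields[0]] = (fields[1], fields[2])
    return locations


def resolve(contents, nfs):
    """List (description, tag, host:path) for tags present in both files."""
    return [
        (contents[tag][1], tag, "%s:%s" % nfs[tag])
        for tag in sorted(contents)
        if tag in nfs
    ]


for description, tag, location in resolve(
    parse_contents(CONTENTS_SAMPLE), parse_nfs(NFS_SAMPLE)
):
    print(tag, description, location)
```

A tag that appears in the contents file but not in the NFS file is silently omitted by resolve, which matches the case of a mount point that is registered but has no current location.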
The STF files $STF_INFO_DIR/stf/*/*.loc are created locally when LOOKUP is installed, and should be modified by the local system manager. The STF files need to be changed only when LOOKUP is installed, or when a new mount point is registered, or when a mount point will be off line for an extended period. System problems and anticipated shutdowns are announced in http://cedr.lbl.gov/cdrom/doc/lookup/status.html. Upon request, you can be notified by e-mail whenever this file is updated.
After you have installed LOOKUP, the local file $STF_INFO_DIR/stf/c90stf3a/nc.loc may contain
stf3a nc 37 ( 001 - 099 ) disk173 stf300nc.dbf
stf3a nc 37 ( 101 - 199 ) disk174 stf300nc.dbf
which instructs the local LOOKUP server to use the mount points disk173 and disk174 (i.e., CD-ROMs at LBNL) for North Carolina STF3A data. You should check the contents file and NFS file to see what mount points are currently registered. If disk389 and disk390 (the disk files exported from the Census Bureau) are currently on line, you can improve performance by changing the file $STF_INFO_DIR/stf/c90stf3a/nc.loc as follows:
stf3a nc 37 ( 001 - 099 ) disk389 stf300nc.dbf
stf3a nc 37 ( 101 - 199 ) disk390 stf300nc.dbf
If disk295 and disk296 (the disk files exported from NCSU) are currently on line, you can change the file $STF_INFO_DIR/stf/c90stf3a/nc.loc as follows:
stf3a nc 37 ( 001 - 099 ) disk295 stf300nc.dbf
stf3a nc 37 ( 101 - 199 ) disk296 stf300nc.dbf
This would be appropriate for LOOKUP servers which are close to NCSU.
For even better performance, selected CD-ROMs can be copied to a local disk, and the corresponding STF files changed appropriately.
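The changes to nc.loc described above amount to substituting one diskN tag for another on each record. The sketch below shows that substitution on the text of an STF file. It assumes that each record occupies one line and that the mount point appears as a whole word of the form diskN; the function name swap_mount_points is hypothetical, and reading and writing the actual file under $STF_INFO_DIR is left to the system manager.

```python
import re


def swap_mount_points(loc_text, replacements):
    """Return loc_text with each diskN tag replaced per the given mapping.

    replacements maps old tags to new tags, e.g. {"disk173": "disk389"}.
    Tags that are not in the mapping are left unchanged.
    """
    def substitute(match):
        tag = match.group(0)
        return replacements.get(tag, tag)

    # Match whole diskN tokens only, so that disk17 never rewrites disk173.
    return re.sub(r"\bdisk\d+\b", substitute, loc_text)


original = (
    "stf3a nc 37 ( 001 - 099 ) disk173 stf300nc.dbf\n"
    "stf3a nc 37 ( 101 - 199 ) disk174 stf300nc.dbf\n"
)

# Switch from the CD-ROMs at LBNL to the disk files at the Census Bureau.
print(swap_mount_points(original, {"disk173": "disk389", "disk174": "disk390"}))
```

Before installing the result, it is worth confirming against the contents file and the NFS file that the new mount points are registered and currently on line.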
The advantage of a central registry is that one can conveniently develop software (e.g. LOOKUP or the PC menu system) which simultaneously accesses data from dozens or even hundreds of mount points. The incremental cost of publicly exporting a mount point is small, compared with the costs of acquiring and maintaining the necessary hardware.
The authors wish to encourage additional sites to publicly NFS-export their non-restricted data files, and to publicly register them with LBNL. If this occurs, the sites now exporting data will not become overloaded, and the entire Internet community can participate in developing the next generation of integrated public data servers.
The data and resources used by LOOKUP were provided by the following organizations. The names of the principal contacts are indicated.
We are grateful to outside reviewers who have increased public awareness of LOOKUP: Michael Batty, Ryan Bernard, excite NetDirectory, McKinley Magellan, and Larry Schankman.
The content of this report is the sole responsibility of the authors and does not necessarily reflect the positions or the policies of the organizations listed above. No official endorsement should be inferred.
[Batty] Michael Batty. "The Computable City." Keynote Address: Fourth International Conference on Computers in Urban Planning and Urban Management, Melbourne, Australia, July 11th - 14th, 1995. Available at http://www.geog.buffalo.edu/Geo666/batty/melbourne.html .
[Bernard] Ryan Bernard. "Premiere Site" review in Internet Business 500: The Top 500 Essential Sites for Business, Ventana Press, November 1995. Available at http://www.vmedia.com/cat/press/store/business/ .
[CD-ROM] The University of California CD-ROM Information System. Available at http://cedr.lbl.gov/cdrom/doc/cdrom.html .
[Datamap] DataMap, U.S. Bureau of the Census. Data maps and profiles for states and counties. Available at http://www.census.gov/ftp/pub/statab/www/profile.html .
[Davis] Glenn Davis. Cool Site of the Day, June 21, 1995: "1990 Census Data." Available at http://cool.infi.net/9506.html .
[Excite] Review in excite NetDirectory: "1990 U.S. Census LOOKUP." Available at http://www.excite.com/Subject/News_and_Reference/Libraries_and_Reference/General_Reference/ .
[Gazetteer] U.S. Gazetteer, U.S. Bureau of the Census. Available at http://www.census.gov/cgi-bin/gazetteer .
[Kahn] Jeffery Kahn. "Lab's Census Database Has Day of Fame." LBNL Currents, July 14, 1995. Available at http://www.lbl.gov/Publications/Currents/Archive/Jul-14-1995.html#RTFToC22 .
[LOOKUP] 1990 Census LOOKUP and its associated documentation are available at http://cedr.lbl.gov/cdrom/doc/lookup_doc.html .
[McKinley] McKinley Magellan three-star review: "1990 Census LOOKUP." Available at http://www.mckinley.com .
[Menu] Hiroaki Katayama, PC-based Menu System for Accessing CD-ROM Data. Available at http://cedr.lbl.gov/cdrom/doc/menu.html .
[Merrill] Deane Merrill, Nathan Parker, Fredric Gey, and Chris Stuber (1995). "The University of California CD-ROM Information System." Communications of the ACM 38(4) pp.51-52 (1995). Copyright © 1995 by ACM, Inc. Available at http://cedr.lbl.gov/mdocs/cacm9504/final.html .
[ORST] The Government Information Sharing Project, Oregon State University. Available at http://govinfo.kerr.orst.edu .
[Schankman] Larry Schankman. "Rest of the Best in Demographics: Census Data at the Lawrence Berkeley Laboratory." Available at http://www.clark.net/pub/lschank/web/census.html#rest .
This story was revised on April 14, 1997 by the Editor. At the request of one of the authors, personal names have been deleted to protect the privacy of the individuals concerned. The research substance of the story remains unchanged since initial release in March 1996.