Alan R. Tupek
Cathryn S. Dippo
D-Lib Magazine, December 1997
In May 1997, a one-stop shop for federal statistics, FedStats - http://www.fedstats.gov - was released to the public by the Office of Management and Budget. Official statistics from more than 70 federal agencies are now much easier and faster to find. Internet users can locate the statistics they need without having to know in advance the agency, or in some cases the agencies, that produce the data. The long-range goal for FedStats is to encourage the appropriate use of federal statistics, thereby helping to improve the quantitative literacy of the general public.
The development of FedStats has spawned several new research initiatives. For example, library science researchers have developed a multifaceted set of methodologies to investigate design improvements for FedStats (Hert and Marchionini 1997) - http://www.glue.umd.edu/~dlrg/blsreport/mainbls.html. Also, an interdisciplinary-applied research program is under development to encourage collaborative efforts that federal statistical agencies can use to improve the collection, analysis, and dissemination of statistical information (NSF, forthcoming).
Over 240,000 unique visitors made over 440,000 visits to FedStats during its first six months of availability. The number of visits to FedStats exceeded that of all but the largest federal statistical agencies. FedStats has been widely praised by the media, the Internet community, and many of the visitors to the site. A task force of the Interagency Council on Statistical Policy (ICSP) undertook the development of FedStats. Staff at the Bureau of the Census designed and developed the website and continue to maintain and enhance it. The Bureau of Labor Statistics (BLS) has supported usability testing of the site by library science researchers. The National Science Foundation (NSF) provided initial funds for setting up the site and chaired the FedStats task force. The fifteen agencies represented on the ICSP agreed to share the ongoing expenses associated with maintaining and enhancing the site. Also, during the past year, representatives of the federal statistical agencies have developed a framework for interdisciplinary-applied research for collecting, analyzing, and disseminating statistics under the Foundation's new Digital Government initiative described below.
FedStats' major features include:
A table of contents, called Subjects A to Z, that provides access to the wide range of statistics available from federal agencies. Subjects A to Z allows users to see the various sources of statistics for each subject area. For example, the user looking for information on income can now see that several types of income data are available from various agencies, including the Bureau of Labor Statistics, the Social Security Administration, and the Bureau of Economic Analysis.
A Keyword Search that allows users to search for statistics by keywords across the statistical agencies. All pages of federal agencies that pertain to statistics are indexed and catalogued by FedStats. Users can readily see and link to the pages with statistical information that contain their keywords.
Other features of FedStats include:
Regional Statistics - A collection of agency World Wide Web sites that provide easy access to state, metropolitan area, and other geographical statistical information. Most agencies provide a clickable map approach to access these data.
Agencies that Provide Statistics - A list of all agencies that provide statistics with links to the respective Home Pages and statistical subjects that are provided by each agency.
Statistical Programs of the U.S. Government - Adapted from the Office of Management and Budget report with the same name, this feature provides links to statistics available from federal agencies in fourteen broad topical areas.
Fast Facts - Includes links to the Federal Statistics Briefing Rooms and to the Statistical Abstract of the United States. The Federal Statistics Briefing Rooms on the White House World Wide Web site provide easy access to about 100 key current economic and social statistics from several federal agencies. The entire Statistical Abstract can be retrieved with an "Adobe Acrobat" reader, or the user can browse through frequently requested tables, State rankings, and USA statistics in brief.
Subject Matter Contacts - A collection of agency World Wide Web sites that provide contact names, telephone numbers, and e-mail addresses for questions about statistics.
Statistical Press Releases - A collection of agency World Wide Web sites that provide the latest statistical news releases.
Statistical Policy - Includes links to federal budget documents, statistical policy working papers, and selected Federal Register notices.
FedStats also provides Additional Links to government statistical agencies outside the United States and to other statistical resources.
A Feedback form permits users to comment on the site or to make suggestions.
Articles on FedStats have appeared in the Washington Post, the Wall Street Journal, and many local newspapers, including this praise from the Palm Beach Post:
"It's such a disappointment when you discover someone in Washington has actually been working to produce something useful. Hating the government is so much easier when using blind generalities. The FedStats site, which takes statistics from 70 agencies and compiles them at one location, works because it is well organized and easy to use..."
On the day after the public release of FedStats, C-SPAN conducted a nine-minute live interview with Sally Katzen of the Office of Management and Budget. She stated, "Today, a high school student has better access to key statistics than top government officials had five years ago." FedStats has gained recognition from several Internet organizations as one of the top sites on the World Wide Web. Lycos recently rated FedStats 96% for content, 90% for design, and 94% overall, making it one of the top-rated federal government sites and among the highest rated sites of all kinds.
Before FedStats was made available to the public, two library science researchers, Carol A. Hert, Indiana University, and Gary Marchionini, University of Maryland, conducted a study of several federal statistical websites, including FedStats. Their research was supported through an agreement with the Bureau of Labor Statistics. The main objectives of the study were to determine who would be expected to use FedStats, what types of tasks they would bring to the site, and what strategies they would use for finding statistical information. Hert and Marchionini used a multifaceted set of methodologies to investigate the usability of the site, including reviews of literature and existing websites, site mapping, document analysis, interviews with agency staff, focus groups with intermediaries, content analysis of e-mail requests, usability tests with potential end users, and transaction log analyses (Hert and Marchionini 1997) - http://www.glue.umd.edu/~dlrg/blsreport/mainbls.html. As a result of these studies, some changes have been made to the FedStats site and others are still in progress. Hert and Marchionini are continuing their study of FedStats through a new agreement with the Bureau of Labor Statistics.
Digital Government Initiative
FedStats has made it easier to access U.S. federal statistical information. However, federal agency sites vary considerably in the way they are organized, in data formats, in methods for data retrieval, and in the amounts and types of available metadata. Much remains to be done if we are to build an infrastructure that appears unified to the general public. Moreover, putting statistical data in the hands of those who do not know how to use them appropriately could lead to misuse of the information (Dippo and Tupek 1997).
Fortunately, the statistical and survey research community enthusiastically responded to the call for white papers for a recent workshop, "Towards the Digital Government of the 21st Century." (http://www.isi.edu/nsf/final.html) The workshop, sponsored by the Applications Council of the National Science and Technology Committee's Subcommittee on Computing, Information, and Communications, was designed to identify and define a research agenda which could help to bridge the gap between information technology researchers and federal information services. As a result of the workshop, numerous computer and information science research areas were identified which could benefit the efforts of the federal statistical agencies to increase public access to and encourage appropriate use of existing statistical information.
Databases filled with numbers can provide accurate and useful information only if the following three conditions hold true:
Whatever the context, without metadata, a number has no meaning. In the case of sample survey data, an extensive set of metadata is needed, including sample design (e.g., sample size, stratification or not, clustered or not), data collection procedures (e.g., personal visit vs. mail, computer-assisted or not, self or proxy), weighting methods (e.g., nonresponse adjustment, post-stratification, use of ancillary data), and quality measures (e.g., response rate, variance estimates, response error measures), as well as the specific questions used to obtain the information. While conceptually associating metadata with data appears straightforward, implementation is not. The complexities of providing useful metadata to users of varying technical expertise, who wish to put the information to a variety of uses, require research and the development of new metaphors for the integration and presentation of quantitative and qualitative information.
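To make the idea of associating metadata with data concrete, the following is a minimal sketch in Python of a record that carries an estimate together with the kinds of metadata listed above. The class names, field names, and all values are illustrative assumptions, not an existing agency schema or real statistics.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class SampleDesign:
    # Hypothetical fields mirroring "sample size, stratification, clustering".
    sample_size: int
    stratified: bool
    clustered: bool


@dataclass(frozen=True)
class SurveyMetadata:
    # Hypothetical fields mirroring the metadata categories named in the text.
    design: SampleDesign
    collection_mode: str            # e.g. "personal visit" or "mail"
    computer_assisted: bool
    proxy_response_allowed: bool
    weighting_methods: Tuple[str, ...]
    response_rate: float            # proportion between 0 and 1
    question_text: str


@dataclass(frozen=True)
class Estimate:
    label: str
    value: float
    unit: str
    standard_error: float
    metadata: SurveyMetadata


def describe(est: Estimate) -> str:
    """Return a one-line summary that keeps the number tied to its metadata."""
    m = est.metadata
    return (
        f"{est.label}: {est.value:g} {est.unit} "
        f"(SE {est.standard_error:g}; n={m.design.sample_size}; "
        f"{m.collection_mode}; response rate {m.response_rate:.0%})"
    )


# Illustrative values only; these are not real survey results.
example = Estimate(
    label="Example rate",
    value=5.2,
    unit="percent",
    standard_error=0.2,
    metadata=SurveyMetadata(
        design=SampleDesign(sample_size=50000, stratified=True, clustered=True),
        collection_mode="personal visit",
        computer_assisted=True,
        proxy_response_allowed=True,
        weighting_methods=("nonresponse adjustment", "post-stratification"),
        response_rate=0.93,
        question_text="(hypothetical question wording)",
    ),
)
print(describe(example))
```

Keeping the metadata on the same record as the value means a display layer can decide how much of it to show to a given audience, which is the presentation problem the paragraph above describes.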
Tools for analyzing numerical data can range from basic arithmetic within spreadsheet software to complex statistical analysis packages like SAS or SPSS. Until very recently, public use files containing data provided by individual respondents were used only by researchers with the facilities and skills necessary to access magnetic tapes. While today many such datasets are available on CD-ROM and some can be downloaded via the WWW, users, in general, must still have some type of analytical software and know how to use it with the data. Several database tabulation tools are being independently developed by federal agencies, including:
CASPAR (Computer-aided Science Policy and Research - a database that integrates data from several surveys of academic institutions (http://caspar.nsf.gov/webcaspar))
CDC-Wonder (Center for Disease Control's database of health information (http://wonder.cdc.gov))
DADS (Data Access and Dissemination System - a database of the decennial census, forthcoming)
FERRET (Federal Electronic Research and Review Extraction Tool - a database of Current Population Survey supplements (http://ferret.bls.census.gov))
SESTAT (Scientists and Engineers Statistics - a database of U.S. scientists and engineers (http://srsstats.sbe.nsf.gov))
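As a rough illustration of what a tabulation tool does with respondent-level data, the sketch below sums survey weights into a two-way table. It is written in Python under stated assumptions: the function name, the record layout, and the sample records are hypothetical, and it does not represent how any of the tools listed above are implemented.

```python
from collections import defaultdict


def weighted_table(records, row_key, col_key, weight_key="weight"):
    """Sum survey weights into a two-way table keyed by row and column value.

    Each record is a dict holding one respondent's answers plus a weight
    that says how many people in the population the respondent represents.
    """
    table = defaultdict(lambda: defaultdict(float))
    for rec in records:
        table[rec[row_key]][rec[col_key]] += rec[weight_key]
    # Convert nested defaultdicts to plain dicts for predictable output.
    return {row: dict(cols) for row, cols in table.items()}


# Hypothetical microdata; the values are illustrative, not real survey data.
records = [
    {"region": "Northeast", "employed": "yes", "weight": 1500.0},
    {"region": "Northeast", "employed": "no", "weight": 1200.0},
    {"region": "South", "employed": "yes", "weight": 1800.0},
    {"region": "South", "employed": "yes", "weight": 1700.0},
]

table = weighted_table(records, "region", "employed")
for region, counts in table.items():
    print(region, counts)
```

Even in this small form, the result is only interpretable if the user knows what the weights mean, which is why such tools depend on the metadata discussed earlier.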
Creating analytical tools for the general public, with varying levels of statistical literacy, is a fertile area for research. The current availability of such tools is minimal at best. There has been some work on artificial intelligence-based software for various regression methods, but even these systems are designed for professionals having at least an acquaintance with the statistical procedure, not the general public. Similarly, research efforts on data mining and knowledge acquisition are not focused on the general public as user.
Two types of tools that are focused on the general user are the hypermedia guided data tour and basic data visualization. The first type of tool is very much in its infancy. While statistical graphics has received some attention for many years, there is still much to be done to focus on the needs of the general public, particularly in the areas of how to communicate uncertainty and other types of metadata.
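One simple way to communicate uncertainty to a general audience is to convert an estimate and its standard error into a plain-language range. The Python sketch below assumes a normal approximation and a 95 percent confidence level (multiplier 1.96); both choices, the function names, and the sample values are assumptions made for illustration, not a method prescribed by any agency.

```python
def confidence_interval(estimate, standard_error, z=1.96):
    """Return (low, high) under a normal approximation.

    z=1.96 corresponds to a 95 percent level; this default is an
    assumption of the sketch, and other levels use other multipliers.
    """
    margin = z * standard_error
    return estimate - margin, estimate + margin


def plain_language(estimate, standard_error, unit, z=1.96):
    """Phrase an estimate and its margin of error for a non-technical reader."""
    low, high = confidence_interval(estimate, standard_error, z)
    margin = z * standard_error
    return (
        f"about {estimate:.1f} {unit}, give or take {margin:.1f} "
        f"(likely between {low:.1f} and {high:.1f})"
    )


# Illustrative values only; not a real published statistic.
sentence = plain_language(5.2, 0.2, "percent")
print(sentence)
```

Whether phrasing of this kind is actually understood by general users is exactly the sort of question that usability research would need to answer.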
Interdisciplinary research is necessary to achieve the three conditions discussed above for turning numbers into information. Examples of relevant potential pilot projects that could fall under the Digital Government initiative are:
The public release of FedStats in May 1997 has been a phenomenal success by providing one-stop shopping for all federal statistics. Continued interdisciplinary research may not only permit users to find the statistics more quickly and easily but may also help users understand how best to use the statistics. FedStats may, therefore, become an icon for improving the quantitative literacy of all Americans.
1 Any opinions, findings, conclusions, or recommendations expressed in this paper are those of the authors and do not necessarily reflect the views of the National Science Foundation or the Bureau of Labor Statistics. The authors would like to thank all participants from the fifteen federal agencies who helped create FedStats. Special thanks are due to Rachael Taylor, Bureau of the Census, for developing and maintaining the site.
Dippo, Cathryn S., and Tupek, Alan R. "Creating a National Statistical Information Infrastructure," Bulletin of the International Statistical Institute, forthcoming.
Hert, Carol A., and Marchionini, Gary. "Seeking Statistical Information in Federal Websites: Users, Tasks, Strategies, and Design Recommendations," Final Report to the Bureau of Labor Statistics, 1997. (http://www.glue.umd.edu/~dlrg/blsreport/mainbls.html)
Alan R. Tupek is Deputy Director, Division of Science Resources Studies, National Science Foundation and Chair of the interagency task force that developed FedStats.
Cathryn S. Dippo is the Assistant Commissioner for Survey Methods Research, Bureau of Labor Statistics and Chair of the Digital Government Initiative's FedStats Working Group.