D-Lib Magazine
The Magazine of Digital Library Research

July/August 2015
Volume 21, Number 7/8

 

Data Stewardship in the Earth Sciences

Robert R. Downs
Columbia University
rdowns@ciesin.columbia.edu

Ruth Duerr
University of Colorado at Boulder
rduerr@nsidc.org

Denise J. Hills
Geological Survey of Alabama
dhills@gsa.state.al.us

H. K. Ramapriyan
Science Systems and Applications, Inc. and NASA Goddard Space Flight Center
hampapuram.ramapriya@ssaihq.com

DOI: 10.1045/july2015-downs

 


 

Abstract

Science data collection and documentation practices have changed radically over the last several hundred years, most importantly since the advent of the digital age. Data centers, as repositories for those data, only had their genesis in the early twentieth century, yet have in excess of 50 years of experience managing digital science data. In the Earth Sciences, for the past 15+ years, the Federation of Earth Science Information Partners (ESIP) has been working to make Earth science data more discoverable, accessible, and usable by more people. As a part of this effort, the ESIP Data Stewardship Committee has worked on a variety of recommendations, best practices, and guidelines that have significantly moved data stewardship in the Earth sciences forward, with impacts ranging from influencing how data management is done within government agencies and by other data stewards, to providing guidelines for citation of Earth science data used by publishers in the Earth sciences. Completed and ongoing activities of the committee are described. Interested readers are invited to join our community.

 

1 Introduction

The evolution of practices for recording and reporting scientific observations that has occurred over the centuries provides insight into the evolution of science data management practices. Early practices of explorers for recording observations of the Earth and its life forms evolved over time to produce expedition reports, log books, journals, notes, notebooks, drawings, and maps—materials that could be and were referenced in the literature.

In the early seventeenth century, Sir Francis Bacon recommended collecting data by recording observations on both sea and land "for succinct display in tables, under topics, that permitted the mind to begin comparative judgments" [1]. The practice of recording data was adopted for record keeping, accounting for merchandise, and describing observations of natural philosophers [2]. While logbooks, in the form of tables, were used previously for accounting and other practices, logbooks were adopted and used routinely during the seventeenth century to record observations of sea voyages [3]. These early data collection instruments included a "daily record about distant geography, climate patterns, and geopolitics" [4] and evolved "into a methodical, concise, and adaptable document, validated by daily use" [4]. During expeditions of discovery, logbooks and notebooks were used to record measurements and observations, including observations obtained while on land [5], [3]. Journals, notes, and reports were authored by individual naturalists to record their own scientific observations on land as well as to record the accounts of others, in tables, notes, drawings, and illustrations [6], [2]. Like mariners and accountants, naturalists adopted the practice of organizing notes into columns to facilitate organization and enable calculations.

Prior to sponsored travel for scientific research, instructions for recording observations during voyages on sea and land were provided to travelers and published in the Transactions of the Royal Society [7]. The practice of recording and preserving observations at sea became a routine, yet important, practice that evolved over time as "seasoned captains took pains to preserve their personal notebooks, government officials soon required copies to be deposited in a central archive, and naval administrators incorporated such documents into the training regimes of junior sailors" [4]. The Royal Society also requested that travel observations be deposited for perusal by Society members in the "navigators' licensing office" [4]. Considered the public record of an expedition, logbooks and journals were often collected by the museum, library, or archive of the government entity, organization or individual that sponsored the expedition [5], [2]. Logbooks were recognized as valuable records that routinely contained "not just longitude columns and geographic data but notes about climate, tides and currents, and significant shipboard events, from encounters with other vessels to deaths" [4].

 

2 Science Data Centers

While scientific data centers and archives have evolved in various forms, formal recognition and international collaboration started with the establishment of the World Data Centers in 1957-58. The International Council for Science, formerly the International Council of Scientific Unions (ICSU), established the World Data Centers while planning for the International Geophysical Year (IGY) [8]. The IGY was a coordinated scientific effort to facilitate international observations of the Earth. The World Data Center (WDC) system was established to enable accessibility and long-term preservation of data collected for each Earth science discipline. Moreover, to ensure the long-term availability of the data, a series of WDCs was established on different continents for each discipline, where each disciplinary WDC contained copies of the holdings of its sibling WDCs [9]. Perhaps this practice of redundant data storage represents an early version of the Reich and Rosenthal principle, "Lots of Copies Keeps Stuff Safe," known as LOCKSS [10]. As the WDC system evolved, the number of WDCs expanded to include more science disciplines, yet funding for each WDC varied dramatically by country. In 2008, ICSU recommended replacing the WDCs with the World Data System (WDS), with the purpose of providing "a coordinated, professional approach to the management of scientific data and the production of services based on the data" [11].

While called Data Centers, the WDCs initially served as the libraries for their respective scientific disciplines. The content that was housed in the WDCs included the logbooks, reports, and other material gathered by researchers who participated in the discipline that was served. As such, these materials were routinely referenced in the literature — despite the fact that technically they were data. Often these data were 'published' in the form of books, consisting of several to a few tens of pages of descriptive text (i.e., metadata), followed by possibly hundreds of pages of tables, charts or diagrams.

The development of computer technology and its adoption by archives and data centers improved the capabilities of these facilities as well as the capabilities of their users. During the 1960s, data archives and data centers began converting and formatting data to enable analysis using computing technology. Social science and ecological archival facilities and data centers began adopting emerging standards for managing and disseminating data and documentation to facilitate data analysis with computers [12]. It is not clear why the simple act of transforming data from written products (e.g., logbooks and other gray literature) into digital materials caused the scientific community to cease routinely citing the sources it used. Despite the best efforts of data centers, most of which have recommended data citations for decades, routine citation of source materials, irrespective of form, is only now returning to normal use.

In the meantime, data centers have continued to evolve, taking advantage of available technologies to improve capabilities for managing and enabling access to their data holdings. For example, during the 1980s and 1990s, data centers began distributing data on CD-ROM disks, via File Transfer Protocol (FTP), and by email, enabling recipients to order, process, and analyze data on local systems, workstations, and desktop computers. With the development of the World Wide Web, data centers have disseminated data as online products and services to enable access and analysis using desktop and laptop computers as well as other electronic devices, including handheld computers and other mobile devices. National and international committees have developed standards to offer guidance for improving data stewardship practices at data centers, archives, and other repositories [13], [14], [15].

Today, a wide variety of Earth science datasets from data centers around the world are discoverable using search capabilities offered by the Global Change Master Directory (GCMD), the U.S. node of the International Directory Network; the Global Earth Observation System of Systems (GEOSS); and other global efforts. There is considerable and ongoing emphasis on interoperability among data centers worldwide through a variety of forums, including the Research Data Alliance and the ICSU Committee on Data for Science and Technology (CODATA).

Data centers and other organizations, such as CODATA, also have been involved in data rescue activities to enable access to datasets that otherwise would be inaccessible or lost if not recovered. For example, Levitus [16] describes the efforts of data rescue projects conducted to recover oceanographic data that were collected prior to 1992.

 

3 National Data Centers in the U.S.

Several federal agencies in the United States (US) have data centers that are responsible for the management of data from their respective scientific programs. Generally, these are discipline-based (e.g., oceanography, climate studies, and solid Earth), and often they have been collocated with one of the existing WDCs. Notable among the agencies managing large quantities and varieties of Earth science data are the National Aeronautics and Space Administration (NASA), the National Oceanic and Atmospheric Administration (NOAA), and the US Geological Survey (USGS). For instance, NASA established a set of Distributed Active Archive Centers (DAACs) as a part of the Earth Observing System (EOS) Data and Information System (EOSDIS) at the beginning of the EOS Program in the early 1990s. The DAACs, now 12 in number, manage most of NASA's multiple petabytes of Earth science data, providing stewardship for Earth science data throughout the entire data lifecycle "from data collection, to management of active data sets, to long-term archive" [17].

Similarly, NOAA maintains three major data centers, collectively termed the National Centers for Environmental Information (NCEI), as well as a host of smaller centers of data, while the largest USGS data center, the Earth Resources Observation Systems (EROS) Data Center, is hosted in South Dakota. In all three cases, as with many other federal agencies, the majority of the data held by these organizations are available openly and freely to the public.

 

4 Open Data

During recent decades, the benefits of sharing scientific data have been recognized within various disciplines, including those that study Earth systems. For instance, NASA's Earth Science Data and Information Policy, calling for full and open data sharing, has been in effect since the early 1990s [18]. The Federation of Earth Science Information Partners (ESIP) data stewardship principles (described in the next section) also call for data sharing [19]. Data sharing practices have improved in the environmental sciences in light of ethical principles and the potential for comparative analysis [20]. During the past decade, the international scientific community has agreed on principles for sharing scientific data on Earth observations [21]. Similarly, scientific journals are adopting policies to ensure that the data described or used in a published report are available for use by others [22]. The US federal government's emphasis on open sharing of data can be seen through several directives from the White House and the Office of Management and Budget [23], [24].

Data sharing and citation, however, have not always been the norm. Through the years, restrictions have been placed on some Earth science data for different reasons, limiting their distribution, discoverability, accessibility, or use. National security concerns limited access to meteorological data during the Cold War [8]. Scientific data that contain the locations of protected species and indigenous populations must be restricted to ensure that their locations are not disclosed to those who might harm them or destroy their habitat. Similarly, measures need to be deployed to protect the confidentiality of data containing personally identifiable information and data describing the locations of research subjects to reduce potential risks from disclosure or inappropriate use [25]. Likewise, data management efforts must protect the confidentiality of personal health information and locations of human research subjects and health care patients [26]. Such protective actions also need to consider the potential for security breaches and for efforts to combine data from multiple sources or reengineer data to identify individuals and locations that must remain confidential [27]. Data protection issues present an ethical challenge to data managers. On the one hand, they have to provide sufficient protection for confidential data about research subjects and other vulnerable populations and interests. On the other hand, they need to enable the scientific community to utilize previously collected research data for subsequent studies.

Proprietary concerns also may influence decisions to release data or to embargo data until sufficient time has passed to enable initial use by those who have invested in their collection and preparation. Producers of scientific data may be motivated to limit the redistribution of their data until after they have had sufficient opportunities to publish reports based on their data [28]. Some data producers also are reluctant to release their data in a timely manner, if ever, due to the amount of effort that would be required to prepare the data for use by others [29], [30]. Against such concerns, studies examining the benefits of releasing scientific data have found that public release is associated with increased citation of the articles that describe the studies that collected the data [31], [32]. A culture change is needed for the scientific community to adopt the practice of properly sharing and citing scientific data [33], [34], [35]. That change is currently under way, with as yet undetermined results, driven by increasing mandates from governments, funding agencies, and publishers that encourage or require citation and sharing of all source materials, be they digital data, computer software, or traditional publications. In many cases, the new infrastructure required to support such practices is currently being developed [36] through the efforts of groups such as the Future of Research Communication and e-scholarship 2011 (FORCE11) Data Citation Implementation Group (DCIG) [37].

 

5 ESIP Federation

The ESIP Federation is an open, networked community of data, science, and technology practitioners who have been working over the past 15 years to make Earth science data more discoverable, accessible, and useful. With over 150 members in four membership categories, ranging from applications developers to data and service providers, data repositories, and funding agencies, ESIP is an entirely volunteer, community-driven organization that acts as a neutral forum where the community gathers to move the field forward.

Most of the activities undertaken by ESIP groups begin when a group of individuals decides that a particular topic is ripe for advancement and declares that they are forming a cluster. While many such ESIP activities are discipline or technology focused, for the past several years there has been significant interest in the ESIP community in moving the data management field forward. In fact there has been enough interest that what started simply as a cluster has now been chartered as the ESIP Data Stewardship Committee — a part of the management infrastructure by which ESIP members govern themselves.

Over the years, the Data Stewardship Committee has tackled many topics, producing a wide variety of products and outcomes. The first topic tackled by the group was an in-depth study of the wide variety of technologies used as identifiers, in an effort to assess their efficacy for use with Earth science data. That work culminated in the publication of an assessment of each technology and a set of recommendations for identifying Earth science data [34]. This set of recommendations, which includes the use of Digital Object Identifiers (DOIs) at the data set or collection level, is currently being implemented by US agencies such as NASA, NOAA, DOE, and USGS.

In addition, the committee developed a set of data stewardship principles and practices for data creators, data intermediaries, and data users [19] that were approved by the general assembly at the annual winter meeting. The assembly also approved citation guidelines during that meeting [38], guidelines which are now the recommended form for citing published data sets in all American Geophysical Union publications.

Among the ongoing activities of the group is the establishment of a standard for the provenance and context content that must be preserved in order for Earth science data to be reused for climate change studies. The background and current status of this work are described in the following sections.

 

5.1 The ESIP PCCS — Background

In 1998, the US Global Change Research Program (USGCRP) held a workshop to discuss the Global Change Science Requirements for Long-Term Archiving (LTA) [39]. The workshop, sponsored by NASA and NOAA and attended by several practicing scientists and data managers, considered several use cases involving attempts to reuse data. In some of the cases, data preservation along with appropriate metadata and documentation had helped in long-term analyses. In others, insufficient attention to preservation of associated content had reduced the utility of data. Lessons from experiences in using archived data led to the identification of data and documentation to be preserved. Some of the key observations from the workshop included the following:

  • "Increase in scientific understanding of the Earth's system is the result of an active, growing community of research scientists
    • Their work requires wide variety of global-scale measurements sustained over long periods
    • Extension, cross-calibration and validation of datasets require combining measurements and derived products from multiple sources
    • This will only be possible if the data, products and full documentation are preserved by the Long-Term Archive program and each scientist has easy access to the particular combination of data and information services they require" [39]

This workshop report also identified a number of categories of content that must be preserved along with data to facilitate their long-term usability.

 

5.2 The ESIP PCCS — Current Efforts

In January 2011, members of the Data Stewardship Committee of ESIP proposed that a "Provenance and Context Content Standard (PCCS)" be developed enumerating all content items required to completely capture the provenance and context of the data products resulting from Earth science missions. The sources of content items considered in the enumeration were:

  • The 1999 USGCRP Workshop Report,
  • Interaction with several NASA instrument teams, and
  • NOAA's Satellite Products and Services Review Board (SPSRB) Documentation Guidelines.

For each of the content items, the following attributes were included:

  • Content Item Name,
  • Descriptive Definition,
  • Rationale (why a given item is needed),
  • Criteria (how good the content should be),
  • Priority,
  • User community (who would most likely use the item),
  • Source,
  • Project Phase for Capture,
  • Representation (word files, numeric files, pointers, etc.) and
  • Distribution Restrictions (e.g., proprietary concerns).

Eight categories of content items were identified (see Table 1). The content items defined within each of these categories, along with their attributes, constitute a PCCS Matrix [40]. At present, this matrix is considered a good starting point for developing a standard to offer guidance for data producers, data managers, and others. In particular, NASA used the PCCS matrix to develop a preservation content specification for its Earth science data [41].

| Category | Example Content |
| --- | --- |
| Preflight/Pre-Operations Calibration | Instrument Description; radiometric and geometric information from pre-flight measurements |
| Science Data Products | Level 4 data products (e.g., model outputs); Discovery Metadata |
| Science Data Product Documentation | Processing history; Quality Assessment; Algorithm Theoretical Basis, References |
| Mission Data Calibration | In situ measurement environment; Calibration software |
| Science Data Product Software | Output data set description; programming and procedural considerations |
| Science Data Product Algorithm Inputs | Algorithm input documentation; Algorithm input data |
| Science Data Product Validation | Validation record; validation data sets |
| Science Data Software Tools | Software to read and display data products |

Table 1: PCCS Categories & examples of the type of information in each category
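
As an illustration only, the attributes and categories described above could be represented as a simple record type grouped by category. The following is a minimal sketch, not the PCCS matrix itself: the class name `ContentItem`, the function `build_matrix`, the field names, and every example value are hypothetical choices made for this sketch, and only the ten attribute names and eight category names come from the text.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple

# The eight category names from Table 1.
PCCS_CATEGORIES: Tuple[str, ...] = (
    "Preflight/Pre-Operations Calibration",
    "Science Data Products",
    "Science Data Product Documentation",
    "Mission Data Calibration",
    "Science Data Product Software",
    "Science Data Product Algorithm Inputs",
    "Science Data Product Validation",
    "Science Data Software Tools",
)


@dataclass(frozen=True)
class ContentItem:
    """One row of a PCCS-style matrix (hypothetical field names).

    Each field corresponds to one of the ten attributes listed in the text.
    """
    name: str                       # Content Item Name
    definition: str                 # Descriptive Definition
    rationale: str                  # why the item is needed
    criteria: str                   # how good the content should be
    priority: str                   # Priority
    user_community: str             # who would most likely use the item
    source: str                     # Source
    capture_phase: str              # Project Phase for Capture
    representation: str             # word files, numeric files, pointers, etc.
    distribution_restrictions: str  # e.g., proprietary concerns


def build_matrix(
    entries: Iterable[Tuple[str, ContentItem]]
) -> Dict[str, List[ContentItem]]:
    """Group content items under the eight categories, rejecting unknown ones."""
    matrix: Dict[str, List[ContentItem]] = {c: [] for c in PCCS_CATEGORIES}
    for category, item in entries:
        if category not in matrix:
            raise ValueError(f"Unknown PCCS category: {category!r}")
        matrix[category].append(item)
    return matrix


# Illustrative values only; none of these are taken from the actual matrix.
example = ContentItem(
    name="Instrument Description",
    definition="Placeholder description of the instrument",
    rationale="Placeholder rationale",
    criteria="Placeholder criteria",
    priority="Placeholder priority",
    user_community="Placeholder user community",
    source="Placeholder source",
    capture_phase="Placeholder phase",
    representation="Placeholder representation",
    distribution_restrictions="Placeholder restrictions",
)
matrix = build_matrix([("Preflight/Pre-Operations Calibration", example)])
```

Keeping the categories as a fixed tuple means a misspelled category is rejected when the matrix is built, which is one way a repository might keep its preservation records consistent with a shared content standard.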

In parallel with the above efforts in the US, the European Space Agency (ESA) has been active in its Long Term Data Preservation Program (LTDP). Pursuing a common approach for its Member States, ESA has developed the LTDP Common Guidelines as a basic reference for application by space data and archive owners. The guidelines first distinguish requirements by type of platform, e.g., spacecraft, aircraft, and balloon. Data to be preserved are selected in accordance with these mission classification categories. Spacecraft-based subcategories include Synthetic Aperture Radar (SAR), Optical, Land/Ocean, and Atmospheric Chemistry satellite systems. A list of general mission-related documentation is offered as a guide for content to be archived with the data. ESA's Preserved Data Set Composition document [42] further defines content by type of mission in the Earth science context during all mission phases. At this level, there is significant overlap, and often one-to-one correspondence, between the content descriptions and the eight content categories in the NASA specification. The similarity is sufficient to warrant a global effort to create a standard, as described below.

 

5.3 Expanding the PCCS to other types of data and disciplines

Recently, the Geological Survey of Alabama (GSA) applied the PCCS to a collection of physical core samples (a common type of data set within Earth sciences — one that is significantly different from the remote sensing data for which the PCCS was originally designed) [43], [44]. They mapped the high-level categories to equivalents that one might find in the records associated with a collection of physical objects (Table 2). While not perfect, the PCCS provides a substantial foundation for stewardship of this very different sort of Earth science data set.

| PCCS Category | Physical Object Mapped Category |
| --- | --- |
| Preflight/Pre-Operations | Site selection/permitting |
| Product Data and Metadata | Product Data |
| Documentation | Documentation and Metadata |
| Calibration | Recovery information |
| Product Software | Not applicable |
| Algorithm input | Conventions |
| Validation | Not applicable |
| Software Tools | Not applicable |

Table 2: Mapping of Physical Object information to high-level PCCS categories (Derived from [43]).
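
The mapping in Table 2 can also be expressed as a small lookup table, for example to check which mapped categories still lack supporting records in a physical collection. This is a sketch under stated assumptions: the names `PHYSICAL_OBJECT_MAPPING`, `applicable_categories`, and `missing_records`, and the sample file name, are hypothetical and are not part of the GSA work. Only the category pairs come from Table 2, with "Not applicable" represented as `None`.

```python
from typing import Dict, List, Optional

# High-level PCCS category -> physical-object equivalent (from Table 2).
# None stands for "Not applicable".
PHYSICAL_OBJECT_MAPPING: Dict[str, Optional[str]] = {
    "Preflight/Pre-Operations": "Site selection/permitting",
    "Product Data and Metadata": "Product Data",
    "Documentation": "Documentation and Metadata",
    "Calibration": "Recovery information",
    "Product Software": None,
    "Algorithm input": "Conventions",
    "Validation": None,
    "Software Tools": None,
}


def applicable_categories(mapping: Dict[str, Optional[str]]) -> List[str]:
    """Return the PCCS categories that have a physical-object equivalent."""
    return [pccs for pccs, target in mapping.items() if target is not None]


def missing_records(
    mapping: Dict[str, Optional[str]], records: Dict[str, List[str]]
) -> List[str]:
    """Return mapped categories for which a collection holds no records."""
    return [
        target
        for target in mapping.values()
        if target is not None and not records.get(target)
    ]


# Hypothetical collection inventory: only product data has been recorded.
inventory = {"Product Data": ["core_log.csv"]}
gaps = missing_records(PHYSICAL_OBJECT_MAPPING, inventory)
```

Here `gaps` lists the four mapped categories other than "Product Data", which is the kind of gap report a steward might use when assessing a collection against the PCCS.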

The next step with the PCCS is for the community to move forward with generating an international standard for provenance and contextual content. The major issue at the moment is choosing between two approaches. One is to begin with a base document describing in general what is required, perhaps as a subsidiary document to the Reference Model for an Open Archival Information System [15], followed by a series of documents specific to disciplines and data types. The other is to make a standard that applies strictly to remote sensing data the first in a series of standards specific to disciplines or data types. Members of the ESIP Data Preservation Committee are currently working with the International Organization for Standardization (ISO) Technical Committee on Geographic information/Geomatics (ISO/TC 211), proposing to develop one or more ISO standards (or specifications) addressing preservation contents covering broad discipline areas.

In addition, the ESIP community would like to apply the PCCS to other types of data and would welcome participation by organizations or people with Earth Science data to assess. Furthermore, the PCCS is being used as the foundation for a profile or extension of the W3C PROV ontology. Tentatively entitled PROV-ES, for PROV Earth Science, this extension to PROV provides the linkages and entities needed to encompass the entire collection of materials required to preserve Earth Science data for future use [45].

 

5.4 Future Work

In the meantime, the ESIP Data Stewardship Committee continues to work towards advancing the state of the art in data stewardship through the establishment of a series of liaisons with other groups working in this area. For example, several of the Research Data Alliance (RDA) Working Groups and Interest Groups have members who act as liaisons between those RDA communities and ESIP. In addition, ESIP has members on the FORCE11 DCIG, on CODATA groups, and on other groups that are conducting related activities.

ESIP teams also are contributing to the development of standards and best practices within the Earth science community. For example, we have started developing example data citation guidelines for reviewers, editors, and authors to help journals and scientific organizations that are considering updating their guidelines to include data related topics.

To characterize the state of preservation of Earth Science data, we are beginning to test the Data Stewardship Maturity Matrix, originally developed by Ge Peng, of NOAA [46], on data that are external to the NOAA environment. The hope is that the Matrix can be generalized enough to work for Earth Science data more broadly, so that the Matrix can be applied in a uniform fashion by data centers and other data-holding organizations to express the state of any particular data set within their care. In assessing such tools, resources that offer similar utility also will need to be considered.

ESIP community discussions are open to individuals and organizational representatives. Individuals may join discussions by perusing the ESIP wiki and attending any of the listed teleconferences or by adding their names to the posted mailing lists. Organizations are encouraged to join the ESIP Federation by completing a membership application. Organizational membership is free, subject to review of application and approval by the Federation Assembly.

 

6 Summary

Science data management has evolved with science and technology. Data centers began to appear in the late 1950s, initially as libraries holding scientific data in the form of gray literature. As technology developed, the data were collected in digital form or subsequently digitized to become machine-readable. With the prevalence of digital data, the old scientific practice of citing all the source materials that were used for a publication waned, but it is now showing a resurgence that is coupled with the continuing maturity of practices for scientific data management.

In the Earth Sciences, much of the impetus for improvements in data management across U.S. organizations has come from the ESIP Federation, an open community of researchers, data managers, funding agencies, and others, that has been actively working to improve the state of data access, interoperability, and stewardship for almost two decades. As a community, ESIP has developed a series of papers, white papers, best practices, and other products, which have been broadly adopted and used by agencies and scientific data centers. ESIP continues to be actively engaged worldwide with related efforts to improve the creation, management, distribution, use, and citation of scientific data. As part of its commitment to open initiatives that enable the use of Earth science data, the ESIP community invites contributions from individuals and organizations with similar interests to improve practices for enabling the development and use of science data.

 

Acknowledgements

The authors' names appear in alphabetical order. This work was completed as an activity of the Federation of Earth Science Information Partners (ESIP) Data Stewardship Committee. The authors appreciate the review and comments on a draft of this paper by Justin Goldstein and Joe Hourclé. Support for Robert Downs was provided under NASA Contract NNG13HQ04C for the Continued Operation of the Socioeconomic Data and Applications Center (SEDAC). The National Snow and Ice Data Center and NSF Grants ARC-1231638, ARC-1231638, and ICER 1343802 provided support for Ruth Duerr. Denise Hills worked on this paper as part of her official duties as an employee of the State of Alabama. H.K. Ramapriyan worked on this paper first as part of his official duties as a U.S. Government employee and later under NASA Contract NNG12HP08C.

 

References

[1] Yeo, Richard. 2007. "Between Memory and Paperbooks: Baconianism and Natural History in Seventeenth-Century England." History of Science 45 (March): 1—46. Also available here.

[2] Soll, Jacob. 2010. "From Note-Taking to Data Banks: Personal and Institutional Information Management in Early Modern Europe." Intellectual History Review 20 (3): 355—75. http://doi.org/10.1080/17496977.2010.492615

[3] Sankey, Margaret. 2010. "Writing the Voyage of Scientific Exploration: The Logbooks, Journals and Notes of the Baudin Expedition (1800—1804)." Intellectual History Review 20 (3): 401—13. http://doi.org/10.1080/17496977.2010.492618

[4] Schotte, Margaret. 2013. "Expert Records: Nautical Logbooks from Columbus to Cook." Information & Culture 48 (3): 281—322. http://doi.org/10.7560/IC48301. Also available here.

[5] Bourguet, Marie-Noëlle. 2010. "A Portable World: The Notebooks of European Travellers (Eighteenth to Nineteenth Centuries)." Intellectual History Review 20 (3): 377—400. http://doi.org/10.1080/17496977.2010.492617

[6] Nelles, Paul. 2010. "Seeing and Writing: The Art of Observation in the Early Jesuit Missions." Intellectual History Review 20 (3): 317—33. http://doi.org/10.1080/17496977.2010.492612

[7] Carey, Daniel. 1997. "Compiling Nature's History: Travellers and Travel Narratives in the Early Royal Society." Annals of Science 54 (3): 269—92. http://doi.org/10.1080/00033799700200211

[8] Ruttenberg, Stan. 1992. "The ICSU World Data Centers." Eos, Transactions American Geophysical Union 73 (46): 494—95. http://doi.org/10.1029/91EO00365

[9] Dieminger, Professor Dr Walter, Professor Dr Gerd K. Hartmann, and Professor Dr Reinhart Leitinger. 1996. "The World Data Center System, International Data Exchange, and New ICSU Programs." In The Upper Atmosphere, edited by Professor Dr Walter Dieminger, Professor Dr Gerd K. Hartmann, and Professor Dr Reinhart Leitinger, 921—41. Springer Berlin Heidelberg.

[10] Reich, Vicky, and David S. H. Rosenthal. 2001. "LOCKSS: A Permanent Web Publishing and Access System." D-Lib Magazine 7 (6). http://doi.org/10.1045/june2001-reich

[11] Fox, Peter, and Ray Harris. 2013. "ICSU and the Challanges of Data and Information Management for International Science." Data Science Journal 12: WDS1—12. http://doi.org/10.2481/dsj.WDS-001

[12] Bisco, Ralph L. 1966. "Social Science Data Archives: A Review of Developments." American Political Science Review 60 (01): 93–109. http://doi.org/10.2307/1953810

[13] National Academy of Sciences. 2015. Ensuring the Integrity, Accessibility, and Stewardship of Research Data in the Digital Age. Washington, DC: National Academies Press.

[14] CCSDS. 2011. Audit and Certification of Trustworthy Digital Repositories. Recommendation for Space Data System Practices, CCSDS 652.0-M-1. Washington, DC: CCSDS.

[15] Consultative Committee for Space Data Systems (CCSDS). 2012. Reference Model for an Open Archival Information System (OAIS). Recommended Practice, CCSDS 650.0-M-2. Washington, DC: CCSDS.

[16] Levitus, S. 2012. "The UNESCO-IOC-IODE 'Global Oceanographic Data Archeology and Rescue' (GODAR) Project and 'World Ocean Database' Project." Data Science Journal 11: 46–71. http://doi.org/10.2481/dsj.012-014

[17] Committee on Geophysical and Environmental Data, National Research Council. 1998. Review of NASA's Distributed Active Archive Centers. Washington, DC: National Academies Press.

[18] NASA. 2011. Data and Information Policy. NASA Science.

[19] Data Stewardship Committee. 2012a. Interagency Data Stewardship Principles. ESIP Commons. http://doi.org/10.7269/P3862DC8

[20] Michener, William K., John Porter, Mark Servilla, and Kristin Vanderbilt. 2011. "Long Term Ecological Research and Information Management." Ecological Informatics, Special Issue: 5th Anniversary, 6 (1): 13–24. http://doi.org/10.1016/j.ecoinf.2010.11.005

[21] Uhlir, Paul F., Robert S. Chen, Joanne Irene Gabrynowicz, and Katleen Janssen. 2009. "Toward Implementation of the Global Earth Observation System of Systems Data Sharing Principles." Data Science Journal 8: GEO1–91. http://doi.org/10.2481/dsj.35JSL201

[22] Whitlock, Michael C., Mark A. McPeek, Mark D. Rausher, Loren Rieseberg, and Allen J. Moore. 2010. "Data Archiving." The American Naturalist 175 (2): 145–46. http://doi.org/10.1086/650340

[23] Obama, Barack. 2013. "Executive Order – Making Open and Machine Readable the New Default for Government Information." The White House. May 9.

[24] Burwell, Sylvia, Steven VanRoekel, Todd Park, and Dominic Mancini. 2013. "Open Data Policy – Managing Information as an Asset." Office of Management and Budget, Executive Office of the President.

[25] Hartter, Joel, Sadie J. Ryan, Catrina A. MacKenzie, John N. Parker, and Carly A. Strasser. 2013. "Spatially Explicit Data: Stewardship and Ethical Challenges in Science." PLoS Biol 11 (9): e1001634. http://doi.org/10.1371/journal.pbio.1001634

[26] Curtis, Andrew, Jacqueline W. Mills, Loraine Agustin, and Myles Cockburn. 2011. "Confidentiality Risks in Fine Scale Aggregations of Health Data." Computers, Environment and Urban Systems 35 (1): 57—64. http://doi.org/10.1016/j.compenvurbsys.2010.08.002

[27] Boulos, Maged NK, Andrew J. Curtis, and Philip AbdelMalik. 2009. "Musings on Privacy Issues in Health Research Involving Disaggregate Geographic Data about Individuals." International Journal of Health Geographics 8 (1): 46. http://doi.org/10.1186/1476-072X-8-46

[28] Roche, Dominique G., Robert Lanfear, Sandra A. Binning, Tonya M. Haff, Lisa E. Schwanz, Kristal E. Cain, Hanna Kokko, Michael D. Jennions, and Loeske E. B. Kruuk. 2014. "Troubleshooting Public Data Archiving: Suggestions to Increase Participation." PLoS Biol 12 (1): e1001779. http://doi.org/10.1371/journal.pbio.1001779

[29] Borgman, Christine L., Jillian C. Wallis, and Noel Enyedy. 2007. "Little Science Confronts the Data Deluge: Habitat Ecology, Embedded Sensor Networks, and Digital Libraries." International Journal on Digital Libraries 7 (1–2): 17–30. http://doi.org/10.1007/s00799-007-0022-9

[30] Tenopir, Carol, Suzie Allard, Kimberly Douglass, Arsev Umur Aydinoglu, Lei Wu, Eleanor Read, Maribeth Manoff, and Mike Frame. 2011. "Data Sharing by Scientists: Practices and Perceptions." PLoS ONE 6 (6): e21101. http://doi.org/10.1371/journal.pone.0021101

[31] Piwowar, Heather A., and Todd J. Vision. 2013. "Data Reuse and the Open Data Citation Advantage." PeerJ 1 (October): e175. http://doi.org/10.7717/peerj.175

[32] Piwowar, Heather A., Roger S. Day, and Douglas B. Fridsma. 2007. "Sharing Detailed Research Data Is Associated with Increased Citation Rate." PLoS ONE 2 (3): e308. http://doi.org/10.1371/journal.pone.0000308

[33] Altman, Micah, and Gary King. 2007. "A Proposed Standard for the Scholarly Citation of Quantitative Data." D-Lib Magazine 13 (3/4). http://doi.org/10.1045/march2007-altman

[34] Duerr, Ruth E., Robert R. Downs, Curt Tilmes, Bruce Barkstrom, W. Christopher Lenhardt, Joseph Glassy, Luis E. Bermudez, and Peter Slaughter. 2011. "On the Utility of Identification Schemes for Digital Earth Science Data: An Assessment and Recommendations." Earth Science Informatics 4 (3): 139–60. http://doi.org/10.1007/s12145-011-0083-6

[35] Parsons, Mark A., Ruth Duerr, and Jean-Bernard Minster. 2010. "Data Citation and Peer Review." Eos, Transactions American Geophysical Union 91 (34): 297–98. http://doi.org/10.1029/2010EO340001

[36] CODATA-ICSTI Task Group on Data Citation Standards and Practices. 2013. "Out of Cite, Out of Mind: The Current State of Practice, Policy, and Technology for the Citation of Data." Data Science Journal 12: CIDCR1–75. http://doi.org/10.2481/dsj.OSOM13-043

[37] Starr, Joan, Eleni Castro, Mercè Crosas, Michel Dumontier, Robert R. Downs, Ruth Duerr, Laurel L. Haak, Melissa Haendel, Ivan Herman, Simon Hodson, Joe Hourclé, John Ernest Kratz, Jennifer Lin, Lars Holm Nielsen, Amy Nurnberger, Stefan Proell, Andreas Rauber, Simone Sacchi, Arthur Smith, Mike Taylor, and Tim Clark. 2015. "Achieving Human and Machine Accessibility of Cited Data in Scholarly Publications." PeerJ Computer Science 1: e1. http://doi.org/10.7717/peerj-cs.1

[38] Data Stewardship Committee. 2012b. Data Citation Guidelines for Data Providers and Archives. ESIP Commons. http://doi.org/10.7269/P34F1NNJ

[39] Hunolt, Greg. 1999. Global Change Science Requirements for Long-Term Archiving. Report of the Workshop, Oct 28-30, 1998. USGCRP Program Office. http://doi.org/10.7930/J0CZ353N

[40] Ramapriyan, H., J. Moses, and R. Duerr. 2012. "Preservation of Data for Earth System Science – Towards a Content Standard." In Geoscience and Remote Sensing Symposium (IGARSS), 2012 IEEE International, 5304–7. http://doi.org/10.1109/IGARSS.2012.6352411

[41] ESDIS Project. 2013. "NASA Earth Science Data Preservation Content Specification." NASA/GSFC.

[42] LTDP Working Group. 2012. "Long Term Data Preservation – Earth Observation Preserved Data Set Content." European Space Agency.

[43] Hills, D. J., S. Ramdeen, and H. K. Ramapriyan. 2013. "Applying the Emerging Provenance and Context Content Standard to Physical Objects in a Core Repository: A Use Case to Demonstrate Validity of Broader Community Adaptation." Abstract IN11D-07 presented at 2013 Fall Meeting, AGU, San Francisco, Calif., 9-13 Dec. (Also available on SlideShare.)

[44] Ramdeen, S., and D. Hills. 2013. "ESIP's Emerging Provenance and Context Content Standard Use Cases: Developing Examples and Models for Data Stewardship." Abstract IN53C-1578 presented at 2013 Fall Meeting, AGU, San Francisco, Calif., 9-13 Dec.

[45] Hua, H., C. Tilmes, H. K. Ramapriyan, B. Duggan, B. Wilson, and G. J. Manipon. 2014. "Provenance for Earth Science Data Systems." Abstract IN31-C-3730 presented at 2014 Fall Meeting, AGU, San Francisco, Calif., 15-19 Dec.

[46] Peng, Ge, Jeffrey L. Privette, Edward J. Kearns, Nancy A. Ritchey, and Steve Ansari. 2015. "A Unified Framework for Measuring Stewardship Practices Applied to Digital Environmental Datasets." Data Science Journal advpub. http://doi.org/10.2481/dsj.14-049

 

About the Authors


Robert R. Downs is the senior digital archivist and acting head of cyberinfrastructure and informatics research and development at CIESIN, the Center for International Earth Science Information Network, a research and data center of the Earth Institute of Columbia University. He also serves as a scientific member and as the Vice-Chair of the Columbia University Morningside Institutional Review Board (IRB) and on the Board of Directors of the Foundation for Earth Science. He earned the PhD in Information Management from the Stevens Institute of Technology. His research focuses on the development, management, use, and evaluation of systems.

 

Ruth Duerr is currently retiring from her position as Senior Associate Scientist and Data Stewardship program manager at the National Snow and Ice Data Center and is in transition to her new position as Research Scholar at the Ronin Institute for Independent Scholarship, where she will be the Principal Investigator, co-Investigator, or Project Manager for several data management and cyberinfrastructure projects. She has interests in a broad range of fields including science data management, digital archives, records management, digital library science, and software and system engineering. She was the first chair of the Federation of Earth Science Information Partners (ESIP) Preservation and Stewardship cluster and is currently President-elect of the American Geophysical Union's Earth and Space Science Informatics Focus Group. Ms. Duerr has an M.S. in Astronomy from the University of Arizona and a Graduate Certificate in Science and Technology Policy from the University of Colorado at Boulder.

 

Denise Hills is Director of the Energy Investigations Program at the Geological Survey of Alabama, Tuscaloosa, Alabama. Ms. Hills received her B.S. in Geology from the College of William and Mary (1995) and her M.S. in Geology and Geophysics from the University of Delaware (1998). Her principal expertise is in energy resource assessment and geophysical data interpretation. Her projects have ranged from conventional oil and gas reservoir estimates and carbon capture and storage planning to legacy data rescue related to geothermal resources. Recently, she has been involved with ESIP's Data Stewardship Committee in order to address the issues surrounding geoscience data access, management, and preservation that she has confronted while conducting her research.

 

Hampapuram K. "Rama" Ramapriyan is the Chief Science Research Advisor at Science Systems and Applications, Inc. Until August 2014, he was the Assistant Project Manager of the Earth Science Data and Information System (ESDIS) Project at NASA's Goddard Space Flight Center, which is responsible for archiving and distribution of most of NASA's Earth science data. He has over 40 years of managerial and technical experience in science data systems development, image processing, remote sensing, parallel processing, algorithm development, science data processing, archiving and distribution. His most recent focus is on data preservation and stewardship. He led the development of the Provenance and Context Content Standard (PCCS) in the ESIP Data Stewardship Committee and NASA's Earth Science Data Preservation Content Specification. He is a member of the Board of Directors of the Foundation for Earth Science. He is a Senior Member of the Institute of Electrical and Electronics Engineers (IEEE), and a Distinguished Lecturer of the IEEE Geoscience and Remote Sensing Society. He holds a Ph.D. in Electrical Engineering from the University of Minnesota.

 