May/June 2013
Developing Cyberinfrastructure for Earth Science: an Opportunity for Collaboration

Sarah Ramdeen
Providing access to digital resources enables scientists to ask and answer questions in ways they could not in the past. There is increasing interest and research in how to create the infrastructure necessary to support science data and its use, and the field of Earth Science is joining the conversation. As part of a series of domain-specific workshops hosted by EarthCube, a National Science Foundation program, the Cyberinfrastructure for Sedimentary Geology workshop was held on March 25 and 26, 2013 in Salt Lake City, Utah. Representatives from the sedimentary community gathered to discuss cyberinfrastructure issues relating to Earth Science data and the future development of the EarthCube program. During the workshop, participants discussed challenges to conducting scientific research in this domain; identified current resources; and discussed the potential impact of EarthCube on the future of research and pedagogy. Workshops such as this one are important to the field of information and library science, as they offer opportunities for interdisciplinary collaboration. Library professionals have expertise and experience to add to the conversation as EarthCube moves forward.



EarthCube, a community-driven program funded by the National Science Foundation (NSF) and established in 2011, focuses on Earth Science data management in an interdisciplinary context. Since the fall of 2012, EarthCube has hosted a series of domain-specific workshops (twenty-four workshops will ultimately take place) to discern the diverse cyberinfrastructure needs of the Earth Science community. The Society for Sedimentary Geology (SEPM) and the Sedimentary Geology Division of the Geological Society of America (GSA) partnered to organize the most recent workshop, soliciting feedback from the sedimentary community on major current issues in Earth Science data and on the community's goals for EarthCube in particular.

The EarthCube End-User Workshop for Sedimentary Geology sought first to identify the cyberinfrastructure needs of the sedimentary community and second to consider optimal applications to address those needs. Participants identified scientific challenges EarthCube might address; outlined challenges to sharing and use of data within the domain; described relevant resources, workflows, tools and legacy data; discussed data sets, repositories, and software and tools needed for collaboration; and evaluated the pedagogical possibilities of EarthCube.

In January 2011, D-Lib Magazine published a special issue focused on providing access to research data. Many of the concepts and concerns highlighted in those articles reflect the efforts needed within the Earth Sciences, as described by the attendees of this workshop. Topics addressed in the issue included scientists' concerns about the quality of data published without peer review and the importance of unique identifiers such as DOIs. Additionally, DataONE's work closely parallels the goals of EarthCube; however, EarthCube focuses specifically on the domain of Earth Science and is still in its development stages.



The "Cyberinfrastructure for Sedimentary Geology" workshop, held March 25 and 26, 2013, was organized by David Budd (University of Colorado-Boulder) and Marjorie Chan (University of Utah). The workshop welcomed 57 participants from a range of institutions and organizations. The vast majority of participants work in the domain of sedimentary geology; in addition, the Cyberinfrastructure and Information and Library Science communities were represented. While participants came predominantly from U.S. universities, attendees also represented state geological surveys, the United States Geological Survey, and the oil industry, and one attendee came from an international associate center of the NASA Astrobiology Center. To kick off the workshop, Chan shared a video created by her students to help attendees conceptualize EarthCube and how it might apply to their work in the future.

While providing an overview of EarthCube, Lisa Park-Boush of the National Science Foundation characterized sedimentary geologists as part of the "long tail" of data. She predicted a future of data-enabled, community-driven science. More specifically, EarthCube does not intend to recreate existing tools, such as the work by IEDA with SESAR, the System for Earth Sample Registration, which creates unique identifiers for samples similar to DOIs for publications. Instead, it aims to connect data across systems to enable new and interdisciplinary research and to involve stakeholders outside of academia.

Next, Lee Allison of the Arizona Geological Survey described EarthCube's governance model. EarthCube encourages scientists to focus on doing science, to embrace interdisciplinarity, and to develop more robust models. Although they examined a variety of governance models, EarthCube's advisors decided ultimately to suggest a higher-level roadmap for future development as opposed to supporting a specific model. Allison suggested that EarthCube should mimic DARPA's role in the development of the Internet — which is now a self-sustaining system with no direct leadership.

Illya Zaslavsky of the San Diego Supercomputer Center (SDSC) discussed the EarthCube stakeholder alignment survey and compared feedback collected from this workshop's participants with that of past domain workshops. Preliminary results highlighted participants' concerns about sharing and stealing of data and the difficulties they encountered in accessing data. Moreover, participants struggled to adapt models and use data originating in other communities. Finally, they stressed the need for both improved provenance tracking and better documentation, and voiced concerns about current levels of standards compliance. Zaslavsky also demoed a new feature of the EarthCube website, Member Connections, a place where researchers can network and search for collaborators based on a variety of facets developed from their profiles.

Prefacing the breakout sessions, MYRES representative Liz Hajek of Penn State University addressed the concerns of early career researchers. She discussed tools such as GeoScienceWorld, a database that allows researchers to search for publications using a map, an affordance most geologists would like to see in other systems.

After the morning presentations, the workshop transitioned into six breakout sessions. The sessions allowed participants to discuss cyberinfrastructure needs based on specialized topics in small groups. After each session, the groups reported back on their various findings, which will be aggregated into the final workshop report.


Session 1: Science drivers

Participants formed groups based on topics such as energy, geobiology, and paleoclimate issues. Groups discussed pressing scientific and educational "grand challenges" in these subfields. For example, sustainability constitutes an essential science driver. It bears on managing resources, informing the decisions of policy makers, and understanding sedimentary processes such as dynamic depositional systems (e.g., deltas). Systems like deltas can have a pivotal societal impact but are tremendously vulnerable to environmental impacts and are often hard to explain to the layperson. Attendees were encouraged to describe their scientific work in terms understandable to the average citizen.


Session 2: Impediments

Participants were charged with refining, expanding, clarifying, and communicating key challenges, concentrating especially on cyberinfrastructural issues. Many challenges dealt with people, namely the silo structure of science. Many participants also underlined challenges related to data management, data curation, or data capture. Other impediments related to a lack of documentation for data sets and a lack of venues in which to publish not only final results but also the data on which those results were based. This session also included discussions of semantic data and of enabling extensive metadata capture during initial data collection to help facilitate these processes, but the scientists were not sure how best to solve these issues.


Session 3: Current resources

During the last session of the day, participants collaboratively developed lists of current resources related to their specific group topics from the first session. Participants described the type of each resource and the data or services it provided. Developing these lists highlighted the need for more federated collections, as many attendees benefited from the knowledge shared by their colleagues. It is also worth noting that while the National Snow and Ice Data Center and the EarthChem Library were mentioned, the California Digital Library's Portal for Earth Science Data Exploration and other valuable resources were overlooked in the final list. This suggests there is an opportunity for librarians to work with the community to develop subject resource guides or library guides, training workshops, or other information portals that might raise awareness of available resources specific to this user community. These library resources should be expanded to include data, software, and other tools beyond traditional sources such as books, websites, and journals.


Session 4: Needs

Participants described measures necessary to achieve research and education goals suggested by previous sessions. Perhaps unsurprisingly, participants reported a wide range of suggestions, including: systems whose structure mimics the ways scientists actually do work; systems that enable the capture of the provenance of data; training on data curation, data management, and policy issues; innovative educational programs that merge computer science and geoscience; and bibliographic search tools. EarthCube members should look to the work done by DataONE for guidance on how to begin addressing these needs. DataONE offers a number of community engagement and outreach opportunities for the environmental science field, including cyberinfrastructure development.


Session 5: Impact on teaching and training

In the penultimate session, participants discussed EarthCube's potential pedagogical impact on the next generation of scientists. In particular, participants addressed questions such as: what could you do that you cannot do today? How would you do it? Why would you do it? Outcomes included facilitating virtual field trips, conducting labs across institutions, promoting distance learning, and aggregating resources for teachers and students. For instance, what type(s) of resource(s) would help teach students to sort through data themselves? In other words, how might one present data warehouses to students and instruct them to find (and evaluate) what they need? Some of these needs are addressed by sites such as the Digital Library for Earth System Education but there is still room for growth and future development.


Session 6: Next steps

The final session focused on developing proposals in response to the NSF EarthCube calls and on facilitating coordination with other domains. In 11 small groups, participants brainstormed potential proposal topics for either a Building Block grant or a Research Coordination Networks grant. These ideas were distilled into 5 significant potential topics. Participants regrouped around these topics and began drafting proposals. Examples included data integration within a specific genre (e.g., subsurface data), a tool to enable digital field data capture, and a project to create virtual access to geoscience-related images.

The digital field data capture group discussed how currently available tools do not meet their particular needs as scientists. They would like to see more usability and flexibility in metadata capture. A focused discussion addressed standardized metadata fields in a digital field book: many scientists have their own customized process for note taking, and they were not sure how best to standardize these processes. Strikingly, many groups voiced interest in having an archivist, librarian, or curator involved in their final process; the digital data capture group, in particular, understood the importance of an information science professional's assistance in determining how to develop metadata structures and standards.



During the closing of the workshop, Budd and Chan encouraged students, and early-career faculty in particular, to participate in EarthCube. The outcomes of these workshops will take years to materialize, and these researchers represent the future of the field. More broadly, the workshop fostered collaboration with other domains as well as within the sedimentary geology community. The EarthCube website was highlighted as especially useful for those working on NSF proposals and for those interested in the outcomes of other domain workshops. The discussions and questions raised in this workshop highlight an opportunity for researchers in information and library science fields to reach out to the various communities within the Earth Sciences to develop future research collaborations and to engage with these communities' data management needs.

Instead of developing cyberinfrastructure for the Earth Sciences, the goal should be developing data curation infrastructure. Cyberinfrastructure is a limiting term that does not describe the focus on research data and on supporting future research efforts (Mayernik et al., 2012). As Mayernik et al. note, the idea of a data archive may be unfamiliar to scientists, and the authors suggest skills librarians may offer to scientists: the reference interview to determine a researcher's data management needs; the ability to identify gaps in current data management plans and to help scientists understand the options currently available; and the ability to facilitate researchers' efforts to create their own data management plans. In particular, researchers need assistance with developing data repositories, establishing methods and processes for data capture, and addressing issues of metadata and of semantics and interoperability among data sets. While the leadership of EarthCube specifically points to a need for computer scientists and experts in cyberinfrastructure, it should also highlight the need for library and information science professionals, who are the experts in data management.


About the Author

Sarah Ramdeen is a doctoral student at the School of Information and Library Science at the University of North Carolina at Chapel Hill. Her research interests include the information seeking behavior of geologists, understanding the needs of internal vs. external searchers on data management, and the information challenges related to physical items which cannot be replaced by digital surrogates. She previously worked for the Florida Geological Survey and has a BS in Geology and an MLIS, both from Florida State University.

