Semantic Search: Magnet for the Needle in the Search Haystack
Report on the 2012 Joint NKOS/CENDI Workshop
Marcia Lei Zeng
Experts in semantic search and related technologies, users, implementers, academic researchers, and others interested in the use of knowledge organization systems in networked environments attended the NKOS/CENDI Workshop, "Magnet for the Needle in a Search Haystack", held at the U.S. Department of Transportation Media Center on December 6, 2012. It was the tenth workshop on Networked Knowledge Organization Systems/Services sponsored by the NKOS group in the U.S. since 1998.
What is Semantic Search? What user requirements does it seek to address? How is Semantic Search being implemented? How can Semantic Search technologies be evaluated? What results have we seen thus far, and what areas of research may bring future improvements? These questions and related topics were addressed at the workshop by experts in semantic search and related technologies, users, implementers, and academic researchers.
NKOS is a community of over 300 practitioners from more than 10 countries who are interested in the use of knowledge organization systems (e.g., classifications, gazetteers, lexical databases, ontologies, taxonomies, and thesauri) in networked environments. The workshop's co-sponsor, CENDI, is an interagency working group of senior scientific and technical information managers from 13 U.S. federal agencies. CENDI's mission is to help improve the productivity of federal science- and technology-based programs through effective scientific, technical, and related information support systems. CENDI also co-sponsored the 2008 and 2009 Joint NKOS/CENDI Workshops.
Over 70 colleagues attended this year's full-day workshop. Approximately ten colleagues participated through an online-conferencing system. Eighteen invited speakers represented government, academic, and commercial organizations from a variety of disciplines. The organizers of this year's workshop were Gail Hodge (Chair), Denise Bedford, Joseph Busch, Michael Crandall, Jane Greenberg, Marjorie Hlava, Michael Pendleton, Amanda Wilson, Shewan Workneh, and Marcia Zeng. The workshop organizers were especially grateful to Amanda Wilson, Director of the National Transportation Library, U.S. Department of Transportation, for her instrumental role in hosting the workshop and coordinating the meeting facilities and participant access. All presentation materials (some with recordings) are available on the workshop website.
The workshop started with the session, "Setting the Landscape", moderated by Michael Pendleton, Linked Open Data Manager at the United States Environmental Protection Agency. Dean Allemang, Chief Technology Officer and Chief Data Scientist at Open Data Registry (previously of TopQuadrant), gave a keynote speech, "The Semantic Web Landscape", in which he placed semantic search in the semantic web landscape. He emphasized that the semantic web does not itself do any of the problem solving that we currently do on the web (search, comparison, route planning, diagnosis, measurement, broadcast, etc.); what it does is allow data to be shared, thereby facilitating those functions and more. He explained in detail how data is shared on the semantic web, pointing out strategies for moving from "life in the data cathedral" to "living in the data wilderness".
Denise Bedford, Goodyear Professor of Knowledge Management at Kent State University, discussed the various views of semantic search. Her presentation, "The 11 Views of Semantic Search," focused on four dimensions: Long-term vision of semantic search; Semantic search as a radical transformation; Semantic search as incremental improvement; and Status of semantic search R&D. She presented two different (but not inconsistent) ways to characterize semantic search. The first view is a radical transformation of search brought about by disruptive technologies, leading to a radically different future semantic environment. The second view is an incremental transformation of our current search environment through the addition of targeted semantics. At the end she posed eleven questions for general discussion.
"The User Perspective" was the second session, moderated by Gail Rayburn, Taxonomist at Johns Hopkins University Applied Physics Laboratory. Six invited speakers shared their views at a user panel, "The Value Proposition for Semantic Search".
Each user panel presentation addressed three general questions: 1) What is the background or context of your community or project relevant to Semantic Search? 2) What are the search needs of the users in your community or project that may be solved by semantic search, focusing on unique or high-priority requirements? and 3) What challenges, problems, or issues have you or members of the community encountered as you've tried to identify, evaluate, or implement semantic search in this context?
The first two afternoon sessions addressed linked data issues and applications. Marcia Lei Zeng, Professor at Kent State University, and Shewan Workneh, Information Architect at the International Monetary Fund, each moderated a session.
Tom Baker, CIO of Dublin Core Metadata Initiative (DCMI), talked about Knowledge Organization and Semantic Search. After illustrating various semantic searches, he shared two paradigms to help in defining semantic search and how knowledge organization systems can support them: 1) Concept-based vs. ontology-based search, and 2) SKOS concept schemes vs. OWL ontologies.
Xia Lin, Professor at Drexel University, gave a richly illustrated presentation titled "Visualization and Semantic Search". His talk focused on the motivations for and challenges of information visualization for semantic search, and on what makes visualization useful for search. He presented two major research projects he had led to demonstrate what he considered meaningful, useful KOS-based visualization for search and discovery.
"Semantic Search: Discovering Relevant Information in a Digital Golden Age" was the title of the third presentation, by Bernadette Hyland, CEO & co-founder of 3 Round Stones, Inc. and co-chair of the W3C Government Linked Data Working Group. She made three points: 1) There is no "one size fits all": simple, complex, and legacy data require different approaches; 2) Search, discovery, and access are coming together, and, as an open standard, Linked Data improves all three; 3) Along with HTML5 for improved user experience and mobile access, Linked Data will provide the biggest win for government and the public, through ease of combining data sets. She ended by introducing the mission of the Government Linked Data (GLD) Working Group: to provide standards and other information that help governments around the world publish their data as effective and usable Linked Data using Semantic Web technologies.
Joel Richard, Web Developer at the Smithsonian Institution Libraries, presented a real case of "Using Linked Data in the Biodiversity and Systematic Taxonomy Communities." His talk, based on experience making two authoritative works available as Linked Data, illustrated things to keep in mind when creating one's own data sets. The first publication, Taxonomic Literature II, is a fifteen-volume guide to the literature of systematic botany published between 1753 and 1940. The second, Index Animalium, published in the late 1800s and early 1900s, contains 430,000 species names for 7,000 scientific volumes published between 1758 and 1840. By publishing the content as Linked Data, the Smithsonian is in a position to be the authoritative source for this information. His examples also showed how Linked Data allows the content to be easily reused and shared.
"Students Lightning Talks" by students or recent graduates working in this area provided a look at the future. Amalia Levi, College of Information Studies, University of Maryland, College Park, gave a presentation titled "Through the Eye of the Needle: Making Sense of Humanities Scholarship with Linked Open Data". Jake Spiegler and Thomas Burdick, both from the Information Architecture and Knowledge Management program, Kent State University, gave a talk, "When is Semantic Search Really Semantic Search?" Finally, Bryan Schneider, Enterprise Information Architecture, International Monetary Fund, provided insights on "Knowledge Architecture to Support Search Visualization."
The last presentation, by Tamas Doszkocs, President of Weblib LLC, echoed the opening speech by Dean Allemang. "Where We Are and Where We Are Going: The Future of Semantic Search" brought the workshop to a close by focusing on semantic technologies, a diverse family of technologies that seek to derive meaning and knowledge from data. Key technologies he listed included natural language processing, data and text mining, artificial intelligence and expert systems, database management systems, information retrieval, search engines and semantic search engines, cloud computing, and visualization. He also described Weblib's semantic search technology and its implementation in the Green Energy database for the U.S. Department of Energy's Office of Scientific and Technical Information.
The European NKOS Workshop series is related to the NKOS Workshops held in the U.S. The 11th European NKOS workshop was held at TPDL (International Conference on Theory and Practice of Digital Libraries) in Paphos, Cyprus on September 27, 2012.
Information about NKOS events and publications, instructions for subscribing to the NKOS list, and the archives of previous programs and presentation materials from the U.S. and European workshops are available at the international NKOS website.
About the Authors