D-Lib Magazine
May/June 2007

Volume 13 Number 5/6

ISSN 1082-9873

Using Wikipedia to Extend Digital Collections

 

Ann M. Lally
University of Washington Libraries
<alally@u.washington.edu>

Carolyn E. Dunford
University of Washington Libraries
<cedunford@gmail.com>

Introduction

In May 2006, the University of Washington Libraries Digital Initiatives unit began a project to integrate the UW Libraries Digital Collections into the information workflow of our students by inserting links into the online encyclopedia Wikipedia. The idea for this project grew out of our reading of OCLC's 2005 report Perceptions of Libraries and Information Resources [1], which states that only 2% of college and university students begin searching for information at a library web site. It is therefore incumbent upon librarians to look for new ways to reach our users where they begin their information search.

The explosive growth of Wikipedia made it a prime candidate for our efforts at pushing information about the Libraries out to where users conduct their research. It should be noted here that our digital collections are already harvested and heavily used by people all over the world; in fact, Google and its affiliates are the top referrers of people to our collections. However, Wikipedia is fast becoming one of the top reference resources for many who are searching for information on a particular topic, and it is often one of the first references in a search results list. In fact, Wikipedia receives 54% of its traffic from Google [2]. Furthermore, referring to Wikipedia as "one of the poster children for Web 2.0", the Pew Internet & American Life Project researchers have noted a sharp increase in the use of Wikipedia in contrast to the "sluggish growth" of Encarta [3].

Peter Morville, an information architecture and findability consultant, offers us a possible explanation for this phenomenon in a recent blog post in which he discusses how the perceived authority of Wikipedia is derived from its information architecture, visual design, governance, branding, "and from widespread faith in intellectual honesty and the power of collective intelligence" [4]. Morville argues that these structural and social aspects of Wikipedia make it more findable, and when combined with certain psychological aspects of decision making (anchoring bias and confirmation), boost Wikipedia's perceived authority [5].

This article will describe the UW Libraries Digital Collections and the phenomenon known as Wikipedia. We will also describe the process of adding links to Wikipedia articles as well as the outcomes from the University of Washington Libraries project.

Before we move on, however, we wish to note that it is not our intention to endorse or evaluate the content of Wikipedia articles. Rather, we acknowledge the increasing prominence of this resource in our patrons' workflows and wish to highlight our success with this project.

The University of Washington Libraries Digital Collections

The University of Washington Libraries Digital Collections comprise over 120,000 images, texts, and audio files covering a wide range of topics and resources. Most of the information is about the people and places of the Pacific Northwest; however, we also have collections about architecture, chemistry, marine life, and a diverse assortment of additional topics. The information in our collections comes from the Libraries Special Collections unit, as well as from faculty, students, and local and regional partners; the content has grown through internal digitization efforts as well as projects and partnerships. These collections average 3,600 visits a day, with visitors viewing approximately 12 pages per visit.

What is Wikipedia?

Wikipedia is a free, online encyclopedia collaboratively written by its users. Anyone may add, edit, or remove content using a simple markup language. Wikipedia has over 1.5 million English-language articles and nearly 3 million registered users [6]. The articles are well organized; they generally have an introduction and a table of contents. Sub-headings are encouraged in order to help users find the information they are looking for within an article. As with traditional web pages, Wikipedia articles include hyperlinks to direct users to additional information; hyperlinks can be internal, pointing to other Wikipedia articles, or external, pointing to outside resources. Articles are also placed in categories, which provide a taxonomic structure to Wikipedia as a whole. Wikipedia employs a familiar layout: main navigational links are located on the left sidebar, with tabs pertaining to the article located along the top of the page. Each article has its own discussion forum where content may be discussed and a history page where all changes to that page are recorded.

Detailed policies and guidelines govern the creation of content in Wikipedia. These policies and guidelines (which have been collaboratively written by users) provide standards for writing articles and rules of conduct for working with other users. Key standards state that an article must have a neutral point of view, include only verifiable information, cite sources, and must not include original research.

The Project: Adding links in Wikipedia to UW Collections

A. Analyzing the University of Washington Libraries Digital Collections

For each of the University of Washington Libraries digital collections, we first needed to determine what subjects were represented by the materials and the relative weight of these subjects in that particular collection. In some cases, the time frame of the pictured event was very important. For example, in Wikipedia, articles on the history of Seattle are segmented by particular time periods; therefore, any external links to our collections needed to be relevant to that same time period. Consideration was also given to the number of items in the online collection itself. This was critical because external links are reviewed by the Wikipedia community quite rapidly, and the relevance of a newly added link to an article must be obvious to the reader. If either the description we wrote for the external link or the description provided on the main page of the digital collection is not obviously relevant to other editors, the link would probably be challenged by another editor or simply deleted within minutes.

Our digital image collections use the CONTENTdm Software Suite, which allows browsing within a collection and searching within one or a combination of collections, using keyword and pre-defined searches. Pre-defined searches are available via pull-down menus and other user interfaces, such as photo montages, graphical menus, and maps. Our process for collection analysis was straightforward: we reviewed the summary and description of each collection, then explored the collection by using the various pre-defined searches for a particular collection. A spreadsheet was used to note possible subjects in Wikipedia in order of their weight in the collection.

Once subject analysis was completed, Wikipedia was searched for relevant articles. Articles were found by entering keywords or phrases in the search box located on the left side of each Wikipedia page. When a good match was found, the Wikipedia article would be edited to add the external link pointing to our online collection; then, a note including the article name and the date the link was added was recorded on a spreadsheet.

B. Adding Links to Wikipedia

Wikipedia guidelines for external links recommend providing "a good summary of the site's contents, and the reasons why this specific website is relevant to the article in question" [7]. The norm in Wikipedia for such descriptive summaries is that they should be quite brief, often just a phrase or a sentence. Given that the titles of our collections often did not convey all of the subjects represented within a particular collection, we needed to create a concise summary for each collection. These summaries were based upon the existing description of the collection; additional subjects were added or emphasized as appropriate. A standard format for these external links and descriptions was established and maintained.
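As an illustration only, the following Python sketch shows one way a consistent external-link entry could be generated in wiki markup. It is not the format we actually used: the function name, the layout of the entry, and the URL path are all hypothetical. The only syntax it relies on is the standard wikitext form for a bulleted external link, `* [URL label]`.

```python
def external_link_entry(url: str, title: str, summary: str) -> str:
    """Build one bulleted wikitext line for an article's "External links" section.

    The "* [URL label]: summary." layout is an assumed house format,
    not one prescribed by Wikipedia or taken from the project itself.
    """
    # In wikitext the URL ends at the first space, so it must not contain any.
    if any(ch.isspace() for ch in url):
        raise ValueError("URL must not contain whitespace")
    # Normalize the summary so every entry ends with exactly one period.
    summary = summary.strip().rstrip(".")
    return f"* [{url} {title.strip()}]: {summary}."


entry = external_link_entry(
    "http://content.lib.washington.edu/example-collection/",  # hypothetical path
    "J. Willis Sayre Photographs",
    "photographs of theatrical and vaudeville performers in Seattle.",
)
print(entry)
```

Generating entries from a single function keeps the wording and punctuation uniform across many articles, which makes each link's relevance easier for other editors to judge at a glance.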

C. Creating Articles in Wikipedia

In a few instances, there were no articles in Wikipedia that corresponded to the subject matter of a collection, or the subject was not adequately covered in Wikipedia, so it was necessary to write a new article. This was the case with the J. Willis Sayre Photographs Digital Collection, which consists of a selection of 9,856 images from more than 24,000 photographs of theatrical and vaudeville performers, musicians, and entertainers who performed in Seattle between about 1900 and 1955. A link to our digital collection of Sayre photographs was added to the Vaudeville article. However, as a drama critic, journalist, promoter, and historian, Sayre was an influential figure in writing and conserving the history of theatre in Seattle, and we felt that his history and work warranted the creation of an article in Wikipedia wherein we could also add a link to point readers to our collection.

As the article about James Willis Sayre was being drafted, a list of related subjects was created. This list was used to find related articles within Wikipedia in which content could be added that would point back, using internal links, to the Sayre article. In addition to internal links, the process of discovery in Wikipedia is enhanced by the provision of a "What links here" link in the left navigation bar, where one may find Wikipedia articles that link to the current article. For example, Sayre set a world record in 1903 for circumnavigating the globe using public transportation; this information was added to the article Around the World in Eighty Days, with a link to the James Willis Sayre article. Clicking "What links here" from the Sayre article lists the Around the World in Eighty Days article and all other articles that link to the Sayre article (Figure 1).

Figure 1: Wikipedia pages linking to the James Willis Sayre article

In addition to adding these internal links, category links were added to the end of the article. Wikipedia provides categorical indices to help people browse through the topics covered in Wikipedia. Categorical indices such as "History and Events" and "People and Self", to name just two, link to categories on a wide variety of topics. Using standard wiki markup language, inserting the name of the category provides users with a way to browse content via multiple taxonomies and automatically creates a link within the named categorical indices back to your article. For example, the James Willis Sayre article was automatically added to the American journalists, American theater critics, and People from Seattle categorical indices by adding the appropriate category tag to the end of the article [Figure 2].

Figure 2: Examples of Category links in a Wikipedia article
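For readers unfamiliar with wiki markup, the short Python sketch below prints the kind of tags described above. The helper names are hypothetical and the sketch is only illustrative; the markup itself is standard wikitext, in which `[[Category:Name]]` places an article in a category and `[[Article name]]` creates an internal link.

```python
def category_tags(categories):
    """Return one [[Category:...]] tag per line, ready to paste at the end of an article."""
    return "\n".join(f"[[Category:{name}]]" for name in categories)


def internal_link(article, label=None):
    """Return a wikitext internal link, optionally displayed with a different label."""
    return f"[[{article}|{label}]]" if label else f"[[{article}]]"


# The three categories named in the text for the James Willis Sayre article.
tags = category_tags(
    ["American journalists", "American theater critics", "People from Seattle"]
)
print(tags)

# An internal link such as the one added to another article to point at the Sayre article.
print(internal_link("James Willis Sayre"))
```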

"Lists" in Wikipedia offer another structure or mechanism for helping users discover articles and provide a way to list topics that may warrant an article, but do not yet have one. Entries are added by editing the particular list article; for example we added a link to the James Willis Sayre article to the "List of People from Seattle" page.

Working in Wikipedia

At the outset of this project, we did not create a user account in Wikipedia, and our external links were added as anonymous (unregistered) users. This, coupled with the rapid addition of our links, turned out to be a red flag for those monitoring pages for vandalism, as the majority of external link spam is added by unregistered users [8]. Subsequently, a note was left on the User talk page for our IP address suggesting that we create a registered Wikipedia user account. In Wikipedia, a User talk page may be created for registered users and unregistered users (who are identified only by their IP address). These talk pages provide a way for users to communicate with each other. If a User talk page does not already exist, another user may create one for that person in order to leave a message for them, as happened in our case.

The benefits of being a registered user include the ability to place articles on a personal "watchlist" (called My watchlist), which is essential for tracking changes to pages of interest. We were also able to easily access a list of all our contributions and customize some aspects of our account settings such as our user profile, editing interface, and watch list preferences.

Modification or deletion of external links does occur due to content disputes, mistakes, or vandalism, so it is necessary to check the article occasionally. Maintenance of our links involves routinely monitoring our watchlist; occasionally checking articles for changes; and responding to users' queries and comments on article discussion pages and user talk pages [9]. We have found that some deletions or problematic edits to our links have been corrected before we have even become aware of them [10]. Some Wikipedia users have created guidelines and tools to help other users find and revert vandalism. This effort includes the Recent Changes patrol, a project to ensure that recent changes to Wikipedia articles are reviewed in a timely manner.

Results

Analysis of server statistics indicates that Wikipedia is indeed driving more traffic to our site. The University of Washington Libraries uses the Urchin web statistics software package to help us make sense of the data being collected by our servers. For the purposes of this project, we analyzed the referrals to our site. Urchin shows a ranking by number of visits of the top sites that direct people to our collections [Figure 3].

Figure 3: Referrals to content.lib.washington.edu October 2005 - September 2006

In addition, the software package allows for referral drilldown. Not only can we see that we are receiving referrals from en.wikipedia.org, we can also see specifically which articles are responsible for the traffic and how much traffic is generated by the links in each article [Figure 4].

Figure 4: Referral drilldown for en.wikipedia.org October 2005 - September 2006
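The idea behind a referral ranking and drilldown can be sketched in a few lines of Python. This is not how Urchin works internally, and the sample referrer URLs are invented for illustration; the sketch simply assumes a list of referrer URLs has already been extracted from the server logs, then counts visits per referring host and, within each host, per referring page.

```python
from collections import Counter
from urllib.parse import urlsplit, unquote


def referral_drilldown(referrers):
    """Count referrals per host and, for each host, per referring page path."""
    by_host = Counter()
    by_page = {}
    for ref in referrers:
        parts = urlsplit(ref)
        host = parts.netloc.lower()
        if not host:
            # Skip empty or placeholder referrers such as "-".
            continue
        by_host[host] += 1
        # Decode percent-escapes so article titles are readable.
        by_page.setdefault(host, Counter())[unquote(parts.path)] += 1
    return by_host, by_page


# Invented sample data; real input would come from the web server logs.
sample = [
    "http://en.wikipedia.org/wiki/Vaudeville",
    "http://en.wikipedia.org/wiki/Vaudeville",
    "http://en.wikipedia.org/wiki/James_Willis_Sayre",
    "http://www.google.com/search?q=seattle+vaudeville",
    "-",
]
by_host, by_page = referral_drilldown(sample)
print(by_host.most_common())                      # ranking of referring sites
print(by_page["en.wikipedia.org"].most_common())  # drilldown for one site
```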

For the purposes of this article, we looked at data from October 2005 through September 2006. The aggregate of all referrals to content.lib.washington.edu shows sustained use for the months of October through February, then a sharp upward spike March through May, followed by a slight decline during the summer months, when the majority of students in the United States are on summer break [Figure 5].

Figure 5. Overall referrals to UW Libraries Digital Collections, October 2005 - September 2006

During this same year, referrals from Google.com [not including Google Images, Google Scholar or any of the country-specific Google search engines] show a somewhat similar trend, particularly in the March through May upward spike and subsequent summer vacation drop-off [Figure 6].

Figure 6. Google referrals to UW Libraries Digital Collections, October 2005 - September 2006

What is significant, then, is the steady increase in usage we have seen from Wikipedia. The months of June, July and August saw no downward trend, but rather the continuation of an upward trend, which began in February of 2006 (before we started adding links). This trend continued its upward climb in the months of September and October [Figure 7].

Figure 7. Wikipedia referrals to UW Libraries Digital Collections, October 2005 - September 2006

Another notable trend is Wikipedia's ranking in the list of sites that refer users to us. In February 2006, Wikipedia was ranked 12th, after sites such as images.google.com, search.yahoo.com and search.msn.com; by September 2006 it was 4th, after only Google.com, images.google.com and lib.washington.edu.

Because Wikipedia's content is licensed under the GNU Free Documentation License (GNU FDL), its content may be copied and distributed freely, commercially or non-commercially, provided that no other licensing restrictions are added and Wikipedia is credited. It has been noted that multiple versions of Wikipedia content are turning up on the Web [11]. Indeed, we found that the Wikipedia article James Willis Sayre, which was added to Wikipedia on August 8, 2006, now appears as mirrored content (with our links intact) in several other online dictionaries and encyclopedias, including Answers.com, TheFreeDictionary.com, Lycos.com, About.com, Reference.com, and Avoo.com. In addition, content may be translated and added to non-English language Wikipedias. So far, only one article that references James Willis Sayre, Around the World in Eighty Days, has been added to another Wikipedia, in this case the Indonesian-language Wikipedia. All this has the potential to amplify our efforts to extend the reach of our collections, and we will watch for this pattern in our referrals.

Conclusion

Recently, the MIT Libraries conducted a needs assessment for determining what improvements they should make to their online tools. The assessment states explicitly "Since the students often started their information seeking outside of the Libraries' web space, it would make sense to continue to find ways to put links, tools and MIT Libraries metadata in widely popular web sites, search engines, and databases that lead our community back to resources available to them in the Libraries..." [12]. Librarians must continue to be proactive in their attempts to reach students and other researchers where they begin their search. In the MIT study, which looked at sixteen undergraduate and sixteen graduate students, students began only 23% of their tasks at the library and consulted a library resource only 36% of the time. While the sample size was admittedly small, it nevertheless illustrates a trend that shows no signs of reversing.

Web 2.0 technologies offer librarians a great opportunity to enhance the authority of resources that students use on a daily basis, and to push their knowledge and expertise beyond the traditional boundaries of the library. We now consider Wikipedia an essential tool for getting our digital collections out to our users at the point of their information need. We view this as a very low cost way to enhance access to our collections, as well as an effective way to participate in the creation of resources that are used by millions around the world. We will continue to explore how we can take advantage of the opportunities that Web 2.0 technologies offer us when marketing our digital and physical collections.

Notes and References

[1] "Perceptions of Libraries and Information Resources: An OCLC Report to the Membership." OCLC Online Computer Library Center, Inc, 2005. <http://www.oclc.org/reports/2005perceptions.htm>. (Accessed April 22, 2007).

[2] Hitwise, "Wikipedia - Where do People Go After Visiting Wikipedia?" <http://weblogs.hitwise.com/heather-hopkins/2006/10/wikipedia_where_do_people_go_a_1.html>. (Accessed April 22, 2007).

[3] Madden, M. & S. Fox. "Riding the Waves of 'Web 2.0.'" Pew Internet & American Life Project, <http://www.pewinternet.org/PPF/r/189/report_display.asp>. (Accessed April 22, 2007).

[4] Morville, P. "How Findability Determines Authority Online: The Wikipedia Phenomenon." <http://www.masternewmedia.org/news/2005/10/17/how_findability_determines_authority_online.htm>. (Accessed April 22, 2007).

[5] For a more detailed discussion see Morville, "How Findability Determines Authority Online: The Wikipedia Phenomenon".

[6] As of December 19, 2006.

[7] Wikipedia.org. "External links." <http://en.wikipedia.org/wiki/Wikipedia:External_links>. (Accessed April 22, 2007).

[8] There is an ongoing and lively discussion related to the development of guidelines for the use of External links in Wikipedia. A major point of concern in the Wikipedia community is what is known as "external link spamming" – adding links to commercial sites, adding links for search engine optimization purposes, or adding excessive or otherwise inappropriate links.

[9] Communication and negotiations with other Wikipedia editors can occur in several places: on article discussion pages, user talk pages, and at the community portal.

[10] Wikipedia quickly deals with vandalism through a number of mechanisms including the Counter-vandalism unit and a variety of counter-vandalism tools. For more information on vandalism and Wikipedia see <http://en.wikipedia.org/wiki/Wikipedia:Vandalism>. (Accessed April 23, 2007).

[11] Rosenzweig, R. "Can History be Open Source? Wikipedia and the Future of the Past." Center for History and New Media, June 2006, <http://chnm.gmu.edu/resources/essays/d/42>. (Accessed April 22, 2007).

[12] Bartley, M., et al. "User Needs Assessment of Information Seeking Activities of MIT Students - Spring 2006." SFX/Verde Group. <http://macfadden.mit.edu/webgroup/userneeds/userneeds-report.pdf>. (Accessed April 22, 2007).

Copyright © 2007 Ann M. Lally and Carolyn E. Dunford

doi:10.1045/may2007-lally