Exploring Social Curation
Michael Zarro and Catherine Hall
This work investigates social curating activities on the website Pinterest, and relates them to digital libraries. Pinterest is a social curation site that combines features such as sharing, liking, following and commenting with the information management characteristics of successful data curation. Effectively combining social media techniques and data curation practices will result in new ways of interacting with Web users, providing insight into the development of effective social media efforts by libraries, archives, and museums, as well as commercial organizations.
Curating collections of digital objects found on the Web is a popular way of storing resources for future use. While practices like bookmarking, tagging, and downloading support collecting, the popular adoption of high-speed Internet and Web 2.0 technologies enables collecting to become a social activity. The website Pinterest, currently estimated to be the third most popular social media website in the United States (Experian Marketing Services, 2012), allows users to easily "organize and share" objects they encounter on the Web by curating digital collections on a virtual "pinboard". Pinterest is a social curation site, combining the social features of sharing, liking, following, and commenting with the curating capabilities of bookmarking, tagging, and personal information management (Jones, 2007). Users of social curating sites create "ad hoc" categories (Barsalou, 1983) that reflect their personal notions. The combination of these social and curating qualities points towards new ways of interacting with Web users, providing insight for the social media efforts of commercial organizations as well as libraries, archives, and museums. This work investigates social curating activities on the website Pinterest.com, and relates them to digital libraries.
Social Media and LAM
Social curating offers a way for digital libraries to get a "social life" (Marshall & Bly, 2004). While previous efforts to use social media, like those on The Commons on Flickr, have some social elements, they remain organized and controlled by Library, Archive and Museum (LAM) professionals. In contrast, Pinterest users curate collections and add annotations with no supervision or institutional control. Users follow one another and individual collections in a Twitter-like following model, forming networks based on shared interests. Digital libraries can learn from these unsupervised collecting activities, and this model may provide a way to extend the reach of the library to patrons who may never otherwise visit that library's physical or digital collections.
Past LAM social media projects have seen success. Steve: The Museum Social Tagging Project (Trant, 2009) incorporates user-contributed tags into descriptions of collections at several museum websites and can be seen in use at the Indianapolis Museum of Art. The Library of Congress received an "overwhelming" response to its images on Flickr Commons (Springer, et al., 2008), collecting user annotations and increasing public access. The Smithsonian Institution decided to "go where visitors are" by adding its images to Flickr Commons rather than "requiring them to come to us" (Kalfatovic, et al., 2009).
Pinterest's growing popularity makes it a prime candidate for the attention of LAM professionals. We suggest that Pinterest and similar social curation sites can be used to expand the reach of LAM collections and gather user annotations. Social curators create ad hoc categories that are "perspective or context dependent, [and] therefore show a wide range of concepts" (Barsalou, 1983). These categories are created on the fly and represent a multitude of user thoughts, opinions, and judgments that can supplement traditional cataloging.
Digital objects are collected on Pinterest in two ways. First, organizations create an account and add their own content. Pinterest is currently used by a number of LAMs, non-profits, and commercial organizations to deliver information, promote brand awareness, and engage with their customers. LAM organizations that currently have Pinterest accounts include the New York Public Library, the Smithsonian, and the Philadelphia Museum of Art. Second (and more frequently), as discussed below, Pinterest users add an organization's content to their personal ad hoc collections, because social curating allows users to reuse others' materials (Marshall & Shipman, 2011).
The design of Pinterest as a "lightweight shared place" (Marshall & Bly, 2004) is a likely explanation for its success in comparison to independent efforts on LAM websites (Marty, 2011). Pinterest supports collecting by letting users add pins (images, text descriptions, and sources) copied from external websites to pinboards (collections). The Pinterest website has a simple grid-based layout (see Figure 1) that supports searching, browsing, and serendipitous resource discovery. Web 2.0 tools, including a browser bookmarklet and "Pin This" buttons, enable seamless collecting from almost any website. Following are terms that describe Pinterest activities, and explanations from the Pinterest help page:
Figure 1: A pinboard from the website Pinterest.
Figure 2: A pin included in the Pinboard in Figure 1.
Exploring the Pinterest website and the literature discussed above led to the following exploratory research questions.
Social Curation on Pinterest
Between February 15, 2012 and March 15, 2012, we used the Pinterest API (since discontinued) to collect the top 25 pins in the Pinterest system every five to ten minutes, resulting in a total of 291,125 pins. This popular dataset represents 78,261 unique users, 24,952 unique source domains, and 79,768 unique source URLs. Descriptions range from one to 7,420 characters, with a mean of 29 characters. Repins ranged from zero to 80,914 (mean of 1,710), comments from zero to 469 (mean of nine), and likes from zero to 23,167 (mean of 244).
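Because the Pinterest API has been discontinued, its exact record format is not reproduced here. The sketch below assumes each pin is a Python dict with hypothetical keys (`user`, `source_url`, `description`, `repins`, `comments`, `likes`). It shows how summary figures like those above could be computed from such records: counts of unique users, domains, and URLs, plus the minimum, maximum, and mean of description length and of each social metric.

```python
from statistics import mean
from urllib.parse import urlparse


def summarize(pins):
    """Summarize pin records (dicts with hypothetical keys; see lead-in)."""
    users = {p["user"] for p in pins}
    # Uploaded pins have no source URL, so they are skipped for URL/domain counts.
    urls = {p["source_url"] for p in pins if p.get("source_url")}
    domains = {urlparse(u).netloc.lower() for u in urls}
    summary = {
        "pins": len(pins),
        "unique_users": len(users),
        "unique_domains": len(domains),
        "unique_urls": len(urls),
    }
    # (min, max, mean) for description length and for each social metric.
    lengths = [len(p["description"]) for p in pins]
    summary["description_chars"] = (min(lengths), max(lengths), mean(lengths))
    for metric in ("repins", "comments", "likes"):
        values = [p[metric] for p in pins]
        summary[metric] = (min(values), max(values), mean(values))
    return summary
```

Counting domains by full host name treats each subdomain (for example, each subdomain.blogspot.com) as a separate domain.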
From the popular dataset, we removed duplicate postings, removed any pin whose description had fewer than three characters, and then randomly selected 1,000 pins for analysis. The resulting sample contains 946 unique users, 632 unique source domains, and 904 unique source URLs. Eighty-eight pins in this sample were uploaded by users from their computers or mobile devices and have no source domain or source URL. Descriptions in this sample range from three to 615 characters (mean of 28 characters). Repins ranged from two to 22,897 (mean of 470), comments from one to 112 (mean of four), and likes from zero to 3,352 (mean of 75). Some domains are related; for example, blogspot.com has many subdomains (subdomain.blogspot.com) that contribute to both datasets. The data collected for each pin in this study (shown in Table 1) consist of user-contributed text, in the form of descriptions and board names, and social metrics, in the form of repin, like, and comment counts.
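The sampling steps described above (remove duplicate postings, drop descriptions shorter than three characters, draw 1,000 pins at random) might be sketched as follows. The `pin_id` key is a hypothetical unique identifier. Stripping surrounding whitespace before measuring a description is our assumption, because the paper does not specify either detail.

```python
import random


def sample_pins(pins, k=1000, min_desc_chars=3, seed=None):
    """Deduplicate pins, drop short descriptions, then draw a simple random sample."""
    # Polling the top 25 every few minutes returns the same pin many times;
    # keep only the first occurrence of each (hypothetical) pin_id.
    seen = set()
    unique = []
    for pin in pins:
        if pin["pin_id"] not in seen:
            seen.add(pin["pin_id"])
            unique.append(pin)
    # Assumption: surrounding whitespace does not count toward description length.
    eligible = [p for p in unique if len(p["description"].strip()) >= min_desc_chars]
    # Sample without replacement; a fixed seed makes the draw reproducible.
    rng = random.Random(seed)
    return rng.sample(eligible, min(k, len(eligible)))
```

Passing a fixed `seed` lets another researcher reproduce the same 1,000-pin sample from the same deduplicated data.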
Table 1. Data Collected from Pinterest API used in this study.
Popular Domain Types
To determine the most popular sources of pins in our sample, we extracted the most frequent domains from the sample dataset, keeping only those that contributed three or more pins. Using an iterative coding process, we grouped these popular domain names into the categories in Table 2.
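A minimal sketch of the frequency step uses the three-pin threshold from the text. It assumes plain source URLs as input and strips a leading "www." so that www.example.com and example.com count together. The optional `collapse` parameter merges subdomains such as subdomain.blogspot.com into their parent domain. This is a hypothetical option, because the paper notes related domains but does not say whether they were merged.

```python
from collections import Counter
from urllib.parse import urlparse


def frequent_domains(source_urls, min_pins=3, collapse=()):
    """Count pins per source domain and keep domains with at least `min_pins` pins.

    `collapse` optionally lists parent domains (e.g. "blogspot.com") whose
    subdomains are merged into one entry; this merge is an illustrative option.
    """
    counts = Counter()
    for url in source_urls:
        if not url:
            continue  # uploaded pins have no source domain
        host = urlparse(url).netloc.lower()
        if host.startswith("www."):
            host = host[len("www."):]
        for parent in collapse:
            if host.endswith("." + parent):
                host = parent
        counts[host] += 1
    # most_common() orders by count; ties keep first-seen order.
    return [(domain, n) for domain, n in counts.most_common() if n >= min_pins]
```

The coding of the resulting domains into the categories of Table 2 was done by hand and is not modeled here.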
Table 2. Domain categories based on the most frequent domains (% of all pins).
Boards and Ad Hoc Categories
Guided by the Pinterest community's top-level groupings, we manually assigned each board in our sample dataset to a category (see Table 3). We found many ad hoc collections in our sample. A substantial number of pinboards are personally relevant categories or relate to home, do-it-yourself (DIY), entertaining, and fashion. Board names such as Places I'd Like to Go, For the Home, and Birthday Party Ideas give a clear sense of the collection. Some Pinterest board names, however, exhibit the "folksonomic flaw": they are "often ambiguous, overly personalised and inexact" (Guy & Tonkin, 2006). For example, the most frequent category of pinboards is "Other," exemplified by boards named precious, stuff, or Rob. The second most popular category, "My Life," includes the boards Love This, Good Stuff, and A Girl Can Dream Can't She.
Next, we searched for Library of Congress Subject Headings (LCSH) that exactly match each board name, using tools available at http://id.loc.gov. We found that 150 pinboard names (15%) matched an LCSH using this method. All of these matches are single-word board names, such as Architecture, Cats, and Shoes. Pinboard names that did not return a match include more complex or personal terms like Favorite Places and Spaces, Dahling ... you look FABULOUS, and eCards I found on the floor. Descriptions of individual pins fare even worse, with just 91 of 1,000 (9%) matching an LCSH. For both board names and descriptions, all matches were for single-word terms, while board names average 2.4 words. Previous studies of social media sites showed a greater overlap between user contributions and LCSH (Stvilia & Jörgensen, 2010; Heymann & Garcia-Molina, 2009). However, those studies investigated tags, which are generally one-word terms, in controlled settings. The low overlap in our data suggests that ad hoc categories express concepts not represented in LCSH. Ad hoc categorization might lead to new indexing schemes for LAM by revealing terms for user-centered indexing (Fidel, 1994) and by providing richer descriptions of resources.
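The overlap measure above can be illustrated with a local lookup. This sketch does not reproduce the study's method, which used the tools at http://id.loc.gov. It assumes a hypothetical local collection of LCSH labels and treats "exact match" as a case-insensitive comparison after trimming whitespace, which is our assumption. It returns the matching names, the share of names that match, and the mean number of words per name, the quantities reported in the paragraph.

```python
def lcsh_overlap(names, headings):
    """Return names that exactly match a heading, the match share, and mean word count.

    `headings` is a hypothetical local collection of LCSH labels. Matching is
    case-insensitive after trimming whitespace, which is an assumption.
    """
    normalized = {h.strip().casefold() for h in headings}
    matches = [n for n in names if n.strip().casefold() in normalized]
    share = len(matches) / len(names)
    mean_words = sum(len(n.split()) for n in names) / len(names)
    return matches, share, mean_words
```

With the study's board names, 150 matches out of 1,000 names corresponds to a share of 0.15. A multi-word name such as Favorite Places and Spaces can only match if an identically worded heading exists, which is one reason exact matching favors single-word names.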
Based on our analysis, the topics of interest to Pinterest users often come from non-scholarly sources (blogs, online magazines) and relate to personal interests (home décor, DIY and crafts). One might deem these topics, and thus Pinterest itself, uninteresting to digital library professionals because they are not academic materials. We disagree, for two reasons. First, the technology used on Pinterest has attracted a massive user base in a short period of time, and it likely holds lessons that LAM professionals can use to improve their social media services. Second, these topics are part of everyday life information seeking (ELIS) (Savolainen, 1995; McKenzie, 2003) and are relevant to personal information management (Jones, 2007). On Pinterest, shared interests serve as discovery tools for the general public, much as ArtStor's efforts do for art scholars at member institutions. Similarly, Wikimedia Commons and Flickr support the creation of crowd-sourced material that is available to the general public, but they lack the simple collecting capabilities of a social curation site.
Table 3. Pinboard Categorization
Adherence to Pinterest Guidelines
Lesk defined digital libraries as "organized collections of digital information" (1997, p. ixx). Pinterest is forming a bottom-up digital library created by users and enhanced by shared interests and social connections. However, as shown above, creating a digital library with no top-down structure can be difficult. To address this issue, the site operators provide the following instructions for creating useful pins on the Pinterest Getting Started page:
To make Pinterest the most useful to yourself and others, follow best practices when pinning: 1. Pin from the original source. 2. Pin from permalinks. 3. Give credit and include a thoughtful pin description.
Our analysis takes its cue in part from these instructions, as both we and the site operators are interested in useful pins. However, the popular dataset shows that many users add descriptions that are neither thoughtful nor descriptive, for example, a description consisting of a single letter. These "non-descriptive" descriptions did not prevent one single-letter pin from appearing among the top 25 postings on a highly popular social media site. This lends credence to the notion that the image alone, as a cue for finding or re-finding (Teevan, et al., 2009), may be largely responsible for the usefulness of a pin.
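Pinterest states its best practices in prose, not as testable rules. As an illustration only, this sketch encodes two rough heuristics that are assumptions for this example. A description with fewer words than a hypothetical minimum is flagged as non-descriptive. A source URL whose path is empty or "/" (for example, a blog's home page rather than a specific post) is flagged as not being a permalink. Neither threshold comes from Pinterest or from the study, and pin records are assumed to be dicts with hypothetical `description` and `source_url` keys.

```python
from urllib.parse import urlparse


def guideline_flags(pin, min_desc_words=2):
    """Flag likely departures from two of Pinterest's pinning best practices.

    Both checks are illustrative heuristics, not rules published by Pinterest.
    """
    flags = []
    # "Include a thoughtful pin description": flag very short descriptions.
    if len(pin.get("description", "").split()) < min_desc_words:
        flags.append("non-descriptive description")
    url = pin.get("source_url")
    if not url:
        # Pins uploaded from a computer or phone carry no source at all.
        flags.append("no source URL (uploaded)")
    elif urlparse(url).path in ("", "/"):
        # "Pin from permalinks": a bare home-page URL points at changing content.
        flags.append("not a permalink")
    return flags
```

A word-count threshold cannot judge whether a description is thoughtful. It only catches the most extreme cases, such as the single-letter descriptions noted above.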
Library of Congress and Smithsonian Pins
The prevalence of image-sharing and blogging sites in our study suggests that many pins are not collected from the original source but are instead copies of originals. To investigate this question specifically in relation to digital libraries, we studied additional pins of resources from the Library of Congress and the Smithsonian. Using the Pinterest search tool, we searched for the terms "library of congress" and "smithsonian" and, for each search, selected 25 pins from the results that represented the institution's holdings. Our examination of these 50 pins found many images not posted from the original source. Twenty pins were pinned from the official Library of Congress or Smithsonian sites (including an image from the official Smithsonian store). Thirteen Flickr images were pinned from the Library of Congress or Smithsonian photostreams and from a photo group on Flickr. The remaining 17 came from blogs, online magazines, e-commerce sites, or the Pinterest upload tool.
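The three-way breakdown above (official site, Flickr, other) could be tallied automatically once the pins' source URLs are known. This sketch classifies a URL by domain suffix. The suffix lists are assumptions: an official store hosted on a different domain would need to be added. Any flickr.com URL counts as Flickr regardless of which account posted the image, whereas the study specifically identified the institutions' photostreams and a photo group.

```python
from collections import Counter
from urllib.parse import urlparse

# Assumptions: official sources are identified by these domain suffixes, and any
# flickr.com URL counts as Flickr regardless of which account posted it.
OFFICIAL_SUFFIXES = ("loc.gov", "si.edu")
FLICKR_SUFFIXES = ("flickr.com",)


def _host_matches(host, suffixes):
    """True if host equals a suffix or is a subdomain of it."""
    return any(host == s or host.endswith("." + s) for s in suffixes)


def source_type(url):
    """Classify a pin's source URL as 'official', 'flickr', or 'other'."""
    if not url:
        return "other"  # pins added with the Pinterest upload tool have no source URL
    host = urlparse(url).netloc.lower()
    if _host_matches(host, OFFICIAL_SUFFIXES):
        return "official"
    if _host_matches(host, FLICKR_SUFFIXES):
        return "flickr"
    return "other"


def tally_sources(urls):
    """Count how many source URLs fall into each category."""
    return Counter(source_type(u) for u in urls)
```

For the study's 50 pins, the reported counts are 20 official, 13 Flickr, and 17 other (20 + 13 + 17 = 50), so 20 / 50 = 40% came directly from the institutions' own sites.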
Fewer than half (20 of 50) of the Library of Congress and Smithsonian pins were sourced directly from the original website. This suggests that images are often copied and re-copied across the Web. Figure 3 shows the path of one Smithsonian image we investigated that propagated across several sites and within Pinterest itself. A Pinterest user wishing to find the original would have to traverse many levels before reaching her goal.
Figure 3. Graph of Pinterest sources for a unique image that may be copied several times across the social web.
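The propagation path in Figure 3 can be modeled as a chain of "copied from" links. This sketch assumes those links have already been collected by hand into a dictionary. Both the `came_from` mapping and the item names in the usage example are hypothetical. The `trace_to_origin` helper walks the chain from a pin back to its earliest known source. The number of hops gives a rough measure of how many levels a user would have to traverse.

```python
def trace_to_origin(start, came_from):
    """Follow 'copied from' links from a pin back to the earliest known source.

    `came_from` maps each copy (pin, repin, or web page) to the item it was
    copied from. Returns the path with the original last.
    """
    path = [start]
    seen = {start}
    while path[-1] in came_from:
        parent = came_from[path[-1]]
        if parent in seen:
            # Guard against malformed data in which copies point at each other.
            raise ValueError(f"cycle detected at {parent!r}")
        seen.add(parent)
        path.append(parent)
    return path
```

For example, with `came_from = {"repin-2": "repin-1", "repin-1": "pin-A", "pin-A": "blog.example/post", "blog.example/post": "si.edu/original"}`, tracing from `"repin-2"` reaches `"si.edu/original"` after four hops. Real propagation graphs branch, because one image is copied to many sites. Giving each item a single parent therefore simplifies the full graph in Figure 3.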
Across the 50 Library of Congress and Smithsonian pins we downloaded, we counted a total of one comment, 19 likes, and 150 repins. While this does not match the level of interaction found in the most popular pins, it represents at least 170 social interactions that Pinterest users have had with these LAM materials on the site. In the popular dataset we found three unique pins from the Smithsonian's domain, si.edu, and no pins from the Library of Congress's domain, loc.gov. One pin came from the Library's Flickr.com photostream. No pins came from the Smithsonian's Flickr photostream; however, three pins were added from the Smithsonian magazine website.
This work looks at subsets of pins and pinboards on Pinterest. Given our methods, we make no claim that our data are representative of the Pinterest community as a whole. Nevertheless, we believe this research reveals interesting social curating behaviors and points to future research and development for the benefit of digital libraries and user/patron collecting tools.
Web users curate social collections in ways made possible only by recent innovations in Web 2.0 technologies. Previous research shows tools created by LAM are not popular among website visitors (Filippini-Fantoni & Bowen, 2007). Tools provided by Pinterest.com make it easy for a web user to add content without interrupting their information seeking process, fulfilling Marshall and Bly's (2004) suggestion to digital librarians that "there is opportunity for innovation and refinement in ... the facilities we give readers for interacting with interesting material." Implementing techniques used by Pinterest may help spur user adoption of social and curating tools created for LAM websites.
This work is part of a larger research project investigating collecting and sharing on social sites. Some LAMs already have an organizational presence on Pinterest, following the Smithsonian's "go where they are" approach (Kalfatovic, et al., 2009). The variety of sources is an opportunity for digital libraries to expand their collections with online-only resources identified by the Pinterest community. Meanwhile, the low overlap of board names with LCSH and the curation practices of Pinterest users provide curatorial insight for LAM, ELIS, and personal information management. The ad hoc categories we observed could inform user-centered indexing practices in digital libraries. The level of social interaction shown by repins, likes, and comments implies substantial interest in resources on Pinterest. We recommend further study of Pinterest by LAM professionals who are creating Web 2.0 tools and social media strategies.
The authors extend a special thanks to Xia Lin, Andrea Forte, and Joan Beaudoin for their timely feedback and advice. Doctoral studies of both authors have been supported by IMLS research fellowships.
Barsalou, L. W. (1983). Ad hoc categories. Memory & Cognition, 11(3), 211–227.
Experian Marketing Services. (2012). The 2012 Digital Marketer Trend and Benchmark Report.
Fidel, R. (1994). User-centered indexing. Journal of the American Society for Information Science, 45(8), 572–576.
Filippini-Fantoni, S., & Bowen, J. (2007). Bookmarking in museums: extending the museum experience beyond the visit? Museums and the Web (Vol. 7).
Heymann, P., & Garcia-Molina, H. (2009). Contrasting Controlled Vocabulary and Tagging. Proceedings of the Second ACM International Conference on Web Search and Data Mining, WSDM 2009.
Kalfatovic, M. R., Kapsalis, E., Spiess, K. P., Van Camp, A., & Edson, M. (2009). Smithsonian team Flickr: A library, archives, and museums collaboration in web 2.0 space. Archival Science, 8(4), 267–277.
Lesk, M. (1997). Practical digital libraries: Books, bytes, and bucks. Morgan Kaufmann.
Marshall, C. C., & Bly, S. (2004). Sharing encountered information: digital libraries get a social life. Proceedings of the 2004 Joint ACM/IEEE Conference on Digital Libraries (pp. 218–227). IEEE.
Marshall, C. C., & Shipman, F. M. (2011). The ownership and reuse of visual media. Proceedings of the 11th Annual International ACM/IEEE Joint Conference on Digital Libraries (pp. 157–166). ACM.
Marty, P. F. (2011). My lost museum: User expectations and motivations for creating personal digital collections on museum websites. Library & Information Science Research, 33(3), 211–219. http://dx.doi.org/10.1016/j.lisr.2010.11.003
McKenzie, P. J. (2003). A model of information practices in accounts of everyday-life information seeking. Journal of Documentation, 59(1), 19–40.
Savolainen, R. (1995). Everyday life information seeking: Approaching information seeking in the context of "way of life." Library & Information Science Research, 17(3), 259–294.
Springer, M., Dulabahn, B., Michel, P., Natanson, B., Reser, D., Woodward, D., & Zinkham, H. (2008). For the Common Good: The Library of Congress Flickr Pilot Project. Library of Congress.
Stvilia, B., & Jörgensen, C. (2010). Member activities and quality of tags in a collection of historical photographs in Flickr. Journal of the American Society for Information Science and Technology, 61, 2477–2489. http://dx.doi.org/10.1002/asi.v61:12
Teevan, J., Cutrell, E., Fisher, D., Drucker, S. M., Ramos, G., André, P., & Hu, C. (2009). Visual snippets: summarizing web pages for search and revisitation. Proceedings of the 27th International Conference on Human Factors in Computing Systems, CHI '09 (pp. 2023–2032). New York, NY, USA: ACM. http://dx.doi.org/10.1145/1518701.1519008
Trant, J. (2009). Studying social tagging and folksonomy: A review and framework. Journal of Digital Information, 10(1).
About the Authors