D-Lib Magazine
The Magazine of Digital Library Research

March/April 2014
Volume 20, Number 3/4


BitTorrent and Libraries: Cooperative Data Publishing, Management and Discovery

Chris Markman, Clark University

Constantine Zavras





(This Opinion piece presents the opinions of the authors. It does not necessarily reflect the views of D-Lib Magazine, its publisher, the Corporation for National Research Initiatives, or the D-Lib Alliance.)



The evolution of Open Data depends on the use of new technologies that not only allow equal access to information, but equal access to the distribution and redistribution of public knowledge. An open API offers only the illusion of transparency; for data to truly be free, librarians must look towards their audience as digital collaborators, rather than simply end users. Thankfully, the tools to create a global, decentralized, peer-to-peer information network for massive amounts of data have been hiding under our noses the entire time. In this opinion piece we explore the opportunities afforded by the BitTorrent protocol. We also discuss what happens when libraries adopt a distributed, grassroots approach to data management that saves money and lays the groundwork for online community.


1. Introduction

The evolution of Open Data depends on the use of new technologies that not only allow equal access to information, but also to the distribution and redistribution of public knowledge. An open API offers only the illusion of transparency and does not replace the need for direct access to data by humans acting as digital collaborators rather than end users. In this opinion piece we will explore why we believe this is true, as well as several opportunities afforded by the BitTorrent protocol your library could be implementing today—not tomorrow.

To see how this technology could benefit libraries, a few terms and concepts need to be clear. At the most basic level, BitTorrent is a "peer-to-peer" (P2P) communication protocol that allows a file to be transferred from multiple sources at once. This type of file sharing protocol allows any computer, known as a "peer," to act as both a client and a server for other computers via the Internet. What this means is that files can be shared directly between computers without the need for a central file server. All that is required to join one of these systems is a connection to the Internet and peer-to-peer software, a very low bar for implementing the technology.

[Detail image of infographic designed by E.J. Fox, Visual.ly 1]

BitTorrent is one of the fastest and most efficient of the peer-to-peer systems. It differs from some other P2P networks in that it traditionally relies on a central server, called a tracker, to coordinate the transfer. The tracker does not store any part of the file itself; it keeps a list of the computers currently sharing it. Anyone looking for a file using BitTorrent merely has to click on a link to a small .torrent file in a browser. The BitTorrent software then communicates with the tracker in order to find other computers that are running BitTorrent and that have either the complete file or a portion of it. The computers with the complete file are known as "seeds," and those with a portion of the file are typically other peers in the process of downloading it.
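As a rough illustration of the tracker's coordinating role, the Python sketch below keeps an in-memory list of peers for each torrent. The class name ToyTracker, its methods, and the addresses are hypothetical; real trackers answer announce requests over HTTP or UDP and record considerably more state.

```python
from collections import defaultdict


class ToyTracker:
    """Hypothetical in-memory stand-in for a BitTorrent tracker."""

    def __init__(self):
        # info_hash -> {peer address: True if the peer holds the complete file}
        self._swarms = defaultdict(dict)

    def announce(self, info_hash, peer, complete):
        """Register a peer and return the other peers already in the swarm."""
        swarm = self._swarms[info_hash]
        others = [p for p in swarm if p != peer]
        swarm[peer] = complete
        return others

    def seeds(self, info_hash):
        """Peers that reported holding the complete file."""
        return [p for p, done in self._swarms[info_hash].items() if done]


tracker = ToyTracker()
tracker.announce("abc123", "10.0.0.1:6881", complete=True)   # the library's seed
print(tracker.announce("abc123", "10.0.0.2:6881", complete=False))  # a patron joins
```

Note that the tracker only hands out addresses; the file data itself moves directly between the peers.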

[Detail image of infographic designed by E.J. Fox, Visual.ly 2]

The tracker then identifies all the computers that have pieces of the file; together these computers are called the swarm. Pieces are traded among the connected computers, allowing you to receive multiple pieces of the file simultaneously. By downloading the file in segments from multiple systems the overall speed is greatly improved, and in general the more computers are involved, the faster the file is downloaded. If many libraries, or many computers within a library, were running BitTorrent, moving large amounts of data would become much easier.
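The piece-by-piece exchange can be sketched in a few lines of Python. This is a simplified model, not a working client: the function names, the tiny piece length, and the "offers" dictionary standing in for the swarm are all assumptions made for illustration. What it does reflect is that the original BitTorrent format publishes a SHA-1 hash for every piece, so a downloader can accept pieces from any peer and still verify each one.

```python
import hashlib

PIECE_LENGTH = 4  # tiny, for illustration only; real torrents use much larger pieces


def split_into_pieces(data: bytes, piece_length: int = PIECE_LENGTH) -> list[bytes]:
    """Cut the file into fixed-size pieces; the last piece may be shorter."""
    return [data[i:i + piece_length] for i in range(0, len(data), piece_length)]


def piece_hashes(pieces: list[bytes]) -> list[bytes]:
    """SHA-1 digest of every piece, as published in the torrent metadata."""
    return [hashlib.sha1(p).digest() for p in pieces]


def assemble(expected: list[bytes], offers: dict[int, list[bytes]]) -> bytes:
    """Rebuild the file from pieces offered by peers, keeping only verified ones."""
    result = []
    for index, digest in enumerate(expected):
        candidates = offers.get(index, [])
        good = next((p for p in candidates if hashlib.sha1(p).digest() == digest), None)
        if good is None:
            raise ValueError(f"no valid copy of piece {index} in the swarm")
        result.append(good)
    return b"".join(result)


original = b"library data set"
pieces = split_into_pieces(original)
hashes = piece_hashes(pieces)

# Hypothetical swarm: piece 1 is offered twice, once corrupted.
offers = {0: [pieces[0]], 1: [b"XXXX", pieces[1]], 2: [pieces[2]], 3: [pieces[3]]}
rebuilt = assemble(hashes, offers)
```

Because every piece is checked against its published hash, a corrupted or malicious copy from one peer is simply discarded in favor of a good copy from another.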

With such an effective system of transferring information, why haven't libraries looked into it yet as a way to manage data? A major reason has been the recent rise of negative publicity in social media and technology circles. This has become particularly relevant due to the spotlight on The Pirate Bay and new anti-piracy programs such as the "Six Strikes System," and other similar initiatives.3


2. The P2P Controversy

With the roll-out of the "Six Strikes" initiative, which went live in mid-February of 2013, ISPs are cracking down on illegal file-sharing more than ever, primarily when it utilizes BitTorrent. Programs such as this are problematic because BitTorrent has many legitimate uses for the dissemination of information. Many legal trackers exist, which we will discuss later, but this kind of program stifles the use of the technology no matter what the purpose.

The Pirate Bay is the most recognized purveyor of BitTorrent trackers, legal and illegal. Many of its founders are currently embroiled in legal issues on a global scale4. The sensationalism that has been traveling the news circuit has given rise to a stigma associated with conversations about BitTorrent. While The Pirate Bay may be the face of BitTorrent for the general public, the technology behind it is older and has many applications beyond the sharing of illegal files. Separating the technology from the misinformation that has come to surround it is important when assessing the uses BitTorrent can have in libraries.

Beyond the technical capabilities of the BitTorrent protocol, which prove to be robust, there is another dimension to P2P exchange that has yet to be explored. That is, the fundamental shift which takes place between library and patron in this data exchange—the "invisible" transaction taking place between the end user's computer hardware, monthly Internet and electric bill, and your collection's total bandwidth capability. Introducing BitTorrent into your library's information ecosystem is not only a potential cost saver, but the first step toward building an online data community as well.


3. Content and Discovery

The web-based BitTorrent tracker interface has much in common with a typical online library catalog. Be aware that the adoption of BitTorrent should not be viewed as a replacement for a catalog but rather as an inherently multiplatform, TCP/IP-based data service that operates at the "application layer"5. It does not change anything you know and love about digital libraries and online collections. It just delivers content faster and more efficiently by bypassing network bottlenecks through the utilization of "client" systems6.

Adapting BitTorrent technology to suit the needs of libraries provides several opportunities outside this context (discussed in the next section of this article), but the important point to consider in all cases is BitTorrent's ability to handle all types of data in a network-enabled environment. Not only that, but as the amount of digital content grows, BitTorrent is a scalable way of pushing out content. Also consider the concept of "small data" as opposed to "big data" in this context7. While it may seem that BitTorrent is more conducive to sharing extremely large data sets in the hundreds of gigabytes, it can just as easily be used to pass small data around quickly to millions of people.

While it makes sense for the initial setup and ongoing maintenance of your BitTorrent tracker to fall under the responsibilities of your neighborhood systems librarian, there are some benefits in joining a preexisting public tracker because an online community is most likely already there. This is important to consider because high-traffic public trackers often have faster download speeds (exposing your content to a larger pool of potential seeds and peers means more available bandwidth).

There are, of course, many reasons to go the opposite route and deploy a dedicated BitTorrent tracker for your library as well8. An often overlooked feature embedded within a tracker allows it to easily double as a discovery system for librarians and patrons alike. That is, BitTorrent "swarm" statistics, which measure individual user upload/download ratios and/or the "popularity" of a given torrent file, are often captured automatically by the tracker software, functioning as a sort of "Google Analytics" for your digital library collection9. With the recent addition of "magnet links" to the BitTorrent toolset, an enterprising systems librarian wouldn't even need to set up a tracker at all10. Such an implementation could easily be automated to insert magnet links into a given set of catalog records.
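A minimal Python sketch of that automation might look like the following. It assumes the torrent metadata (the "info" dictionary) is already available and that catalog records are simple dictionaries; the record fields, file name, and helper names are hypothetical. The magnet link itself follows the common form, in which the identifier is the SHA-1 hash of the bencoded info dictionary.

```python
import hashlib
from urllib.parse import quote


def bencode(value) -> bytes:
    """Encode ints, strings, bytes, lists and dicts in BitTorrent's bencode format."""
    if isinstance(value, int):
        return b"i%de" % value
    if isinstance(value, str):
        value = value.encode("utf-8")
    if isinstance(value, bytes):
        return b"%d:%s" % (len(value), value)
    if isinstance(value, list):
        return b"l" + b"".join(bencode(v) for v in value) + b"e"
    if isinstance(value, dict):
        # Keys are byte strings and must appear in sorted order.
        items = sorted(
            ((k.encode("utf-8") if isinstance(k, str) else k, v) for k, v in value.items()),
            key=lambda kv: kv[0],
        )
        return b"d" + b"".join(bencode(k) + bencode(v) for k, v in items) + b"e"
    raise TypeError(f"cannot bencode {type(value).__name__}")


def magnet_link(info: dict, display_name: str) -> str:
    """Build a magnet URI from the SHA-1 hash of the bencoded info dictionary."""
    info_hash = hashlib.sha1(bencode(info)).hexdigest()
    return f"magnet:?xt=urn:btih:{info_hash}&dn={quote(display_name)}"


# Hypothetical single-file torrent metadata for a 16-byte file in 4-byte pieces.
data = b"library data set"
info = {
    "name": "dataset.txt",
    "length": len(data),
    "piece length": 4,
    "pieces": b"".join(hashlib.sha1(data[i:i + 4]).digest() for i in range(0, len(data), 4)),
}

# Hypothetical catalog records; a real system would use its own schema.
records = [{"id": "rec-001", "title": "Library Data Set"}]
for record in records:
    record["magnet"] = magnet_link(info, record["title"])
```

A link built this way names no tracker, so clients would have to locate peers through the distributed hash table that most BitTorrent software supports; a tracker address can be appended as an extra parameter if one is available.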

This is not to say you should immediately begin uploading files to The Pirate Bay, even though it is arguably the epicenter of legal and illegal BitTorrent activity online. There are in fact many BitTorrent trackers that deal exclusively with legal content. One popular example, eTree.org11, deals exclusively with live concert recordings (when the artist or musical group permits it). Another, Clearbits.net12, trades only "high quality, open-licensed (Creative Commons) digital media, datasets, and artwork for Content Creators"13.

[The most notable example of BitTorrent and libraries currently exists in the Digital Public Library of America's index of material from the Internet Archive.]

4. Opportunities for Libraries

While the benefits of BitTorrent have long been known to its users (they are, after all, both Ps in P2P), there has been little discussion in the IT world about the possible benefits of utilizing the unique characteristics of the BitTorrent protocol in heterogeneous information ecosystems. The fact that BitTorrent software is both multi-platform and open source supports its long-term sustainability. As a technology with over a decade of development behind it, BitTorrent has reached a maturity level ripe for specialization in library systems, and when viewed through the lens of a librarian, it also offers many unique opportunities14.


Virtual Teams

Online collaborations, especially those taking place between members of different institutions, can be difficult to maintain efficiently in terms of data management. As your files grow larger, free online services like Dropbox.com or Google Drive that can handle them become harder to find. Alternatively, with BitTorrent your entire team could be sharing the same data in a few clicks, while the software actively utilizes your collective computing power and bandwidth.

There are currently two BitTorrent side projects in development under the "BitTorrent Labs" banner at BitTorrent.com that aim to do precisely this. SoShare streamlines the BitTorrent sharing process to the point where users can instantly share files with a single click from their desktop through the use of a browser plug-in. This is very useful for one-off file sharing needs, and its developers claim to support files as large as a terabyte15. Similarly, "BitTorrent Sync", which was recently launched as a public beta, can "automatically sync files between computers via secure, distributed technology"16. The major difference between BitTorrent Sync and traditional BitTorrent sharing is the use of shared "secrets", similar to PGP encryption17.


Software Development and Deployment

Milliseconds matter a great deal in the world of Twitter. That's why Murder, its BitTorrent-based deployment system, was created18. Essentially, Murder utilizes BitTorrent by treating code as data19. Although the speed of Murder is a notable advantage, that's not why libraries and archives might find this BitTorrent use interesting. Murder is not only completely customizable and open source20, but it is also designed to make the job of systems maintenance easier and is optimized, by design, for local area network architecture.

[Screenshot of slide image from http://vimeo.com/11280885.]


The "Lots of Copies Keep Stuff Safe" (LOCKSS) mantra is another way of saying that too many backups are better than too few21. With BitTorrent any institution can grow its own LOCKSS system with minimal up-front cost through crowd-sourcing or the strategic purchase of VPN servers (on multiple continents no less). Coincidentally, the same systems that have enabled media piracy to evade copyright law could also enable cultural institutions to implement data disaster planning for a fraction of the cost.

BitTorrent tracker software also offers the opportunity for libraries to reward their more "dedicated" seeds/users. This can be done by tracking download/upload ratios, creating the potential for online community building, and marketing—a decentralized, organic, "virtual timeshare" system that runs entirely on donated bandwidth and hardware from users.
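To make the ratio idea concrete, here is a small Python sketch of how a library might identify its most generous seeds from tracker statistics. The PeerStats record, the user names, the byte counts, and the minimum ratio of 1.0 are all assumptions for illustration; actual tracker software exposes these figures in its own formats.

```python
from dataclasses import dataclass


@dataclass
class PeerStats:
    """Hypothetical per-user totals as a tracker might report them (in bytes)."""
    user: str
    uploaded: int
    downloaded: int

    @property
    def ratio(self) -> float:
        """Upload/download ratio; a user who only uploads gets an infinite ratio."""
        if self.downloaded == 0:
            return float("inf") if self.uploaded > 0 else 0.0
        return self.uploaded / self.downloaded


def top_seeders(stats: list[PeerStats], minimum_ratio: float = 1.0) -> list[PeerStats]:
    """Users who gave back at least as much as they took, biggest uploaders first."""
    qualifying = [s for s in stats if s.ratio >= minimum_ratio]
    return sorted(qualifying, key=lambda s: s.uploaded, reverse=True)


stats = [
    PeerStats("patron_a", uploaded=5_000, downloaded=1_000),
    PeerStats("patron_b", uploaded=200, downloaded=1_000),
    PeerStats("mirror_site", uploaded=90_000, downloaded=0),
]
for s in top_seeders(stats):
    print(s.user, s.ratio)
```

The threshold and the ranking rule are policy choices; a library could just as easily reward time spent seeding or the number of rare items kept available.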


Streaming Live Video

BitTorrent Live, a new feature as of 2013, enables any seeder to broadcast video streams across the world in a way that avoids the pitfalls of many competing online streaming systems. Where other systems might buckle under a heavy user load, BitTorrent Live scales instantly because each new peer is also a seed. BitTorrent Live is still very new, but it will undoubtedly play a critical role in conference events and library programming in the future.

[Screenshot of live traffic cam from http://live.bittorrent.com/.]

Streaming Old Video

Although not legal under US Copyright Law, libraries in other parts of the world could easily create a Netflix-like online streaming video service using BitTorrent Live. This is great news for public and academic libraries that have already invested in large DVD collections, but do not want to negotiate new streaming licenses with video distributors.

This is unlike the previous example of self-publishing because the number of "live users" could be limited by software to never exceed the total number of DVD copies of a particular title owned by a given library, similar to other online reservation systems for digital objects like OverDrive. It's important to note that while Section 108 currently grants libraries and archives permission to break DVD encryption under special circumstances, this does not cover the process of format shifting an entire work for online consumption22.
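The cap on simultaneous viewers could be enforced with very little logic. The Python sketch below is one hypothetical way to model it, assuming the streaming service asks for permission before each stream starts; the class name, method names, and patron identifiers are invented for illustration and do not describe any existing system.

```python
class TitleLicense:
    """Hypothetical one-copy-one-viewer limit for a single title."""

    def __init__(self, title: str, copies_owned: int):
        self.title = title
        self.copies_owned = copies_owned
        self.viewers: set[str] = set()

    def start_stream(self, patron_id: str) -> bool:
        """Admit the patron only while fewer streams are active than copies owned."""
        if patron_id in self.viewers:
            return True  # already watching; do not count the patron twice
        if len(self.viewers) >= self.copies_owned:
            return False  # every owned copy is "checked out"
        self.viewers.add(patron_id)
        return True

    def stop_stream(self, patron_id: str) -> None:
        """Release the copy so that another patron can start watching."""
        self.viewers.discard(patron_id)


film = TitleLicense("Example Documentary", copies_owned=2)
print(film.start_stream("patron-1"))
print(film.start_stream("patron-2"))
print(film.start_stream("patron-3"))  # refused until someone stops watching
```

A production system would also need timeouts for abandoned sessions, which this sketch leaves out.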


Something Completely Different

The greatest opportunity BitTorrent presents for libraries may in fact be a unique combination of all the features and functionality previously discussed, with LOCKSS, self-publishing, and live video as one example. These features could easily be combined to create a self-sustaining video archive that scales automatically to create a global broadcast hub, all in a way that takes full advantage of the inherently decentralized infrastructure of the web.


5. Conclusion

In 2012 the Internet Archive began serving up over a million files from its collection via BitTorrent. Internet Archive founder Brewster Kahle notes in this initial blog post that the "distributed nature of BitTorrent swarms and their ability to retrieve Torrents from local peers may be of particular value to patrons with slower access to the Archive", especially for "those outside the United States or inside institutions with slow connections"23.

This opinion piece has explored a number of ways that BitTorrent could be useful to libraries. The integration of BitTorrent, both internally and externally, could be a huge boon. Many libraries currently belong to networks such as college circuits, public library consortiums, and other insular groups. The infrastructure and content are already in place, and BitTorrent could help bridge the informational and geographical gap between various libraries without extra cost to the institutions.

Many of the controversies that surround BitTorrent result from its portrayal in the media and its illegal use, combined with the public's lack of understanding of the protocol. The "Six Strikes" Initiative, and others, have treated the BitTorrent protocol as a single application used solely for piracy in the way Napster was, and disregard the fact that there are many legal uses that are being curtailed. The truth is that the technology itself is sound and has myriad legal applications in the library world, as has been discussed above.

BitTorrent is useful in areas of low and high technology. It allows the transfer of information slowly in places with limited resources, and the transfer of large quantities of data in high tech places that need quick and efficient movement of information. Even in developing countries and parts of the US that do not have fast Internet connections or large resources for libraries, BitTorrent can be used to send information effectively. The US is rich in repositories of information, as are many other nations. Using BitTorrent in partnership with foreign institutions could help disseminate information to parts of the world with less access to cutting edge computer technology. Patrons' computers could build their own virtual networks without the need for pre-existing setups. The global ramifications of this technology are staggering and it could begin a new era of information literacy for places with no national libraries or information sharing framework.

BitTorrent could provide libraries a way to quickly serve information to the public as well as make their infrastructure and operations more efficient. The ability to roll out updates, transfer information, and back up materials would prove invaluable. All that is needed is for the library community to embrace the technology and all the good that could come with it.

The BitTorrent protocol is over a decade old and yet the Internet Archive is the only high profile library currently utilizing this distribution technology. Libraries that adopt BitTorrent can not only improve download speeds for patrons but in doing so cut bandwidth costs and enrich the online community at large. Only when more institutions with vast digital collections and busy IT staff become willing to take the time to transform this once stigmatized protocol will we be able to comprehend its full potential. The latest developments from BitTorrent Labs both confirm this and point towards several new applications that fully leverage the potential of decentralized network computing power.


6. Notes

1 A History of BitTorrent, E.J. Fox, designer, visual.ly, May 2011.

2 Ibid.

3 The popular BitTorrent news blog, TorrentFreak, is an excellent source for news on the US's "Six Strikes" policy and in June 2013 published the "copyright alert" materials sent to ISP customers.

4 The Pirate Bay's legal battle(s) are the subject of a crowd-funded 85 minute documentary available to view for free online.

5 See OSI Model, Wikipedia, the free encyclopedia, 8 February 2014

6 This is of course a misnomer because client systems in the BitTorrent information ecosystem are also servers—the basis of any "P2P" system.

7 Dr. Rufus Pollock of the Centre for Intellectual Property and Information Law at the University of Cambridge talks about the meaning of "small data" in this post on the Open Knowledge Foundation blog.

8 The wide range of BitTorrent tracker software and features can be viewed here.

9 To read more about Google Analytics and the software's capabilities see Introduction to Google Analytics.

10 A magnet link is basically a link containing the hash value of a torrent file, originally used as a way to subvert online anti-piracy measures which often first target illegal BitTorrent tracker websites (when a tracker is shut down, its .torrent files no longer work). For a basic overview see Lifehacker, What Are Magnet Links, and How Do I Use Them to Download Torrents?.

11 eTree's hosting is provided by ibiblio, a familiar name in the world of libraries. ibiblio also manages another BitTorrent tracker called Terasaur in collaboration with the School of Information Science, the School of Journalism and Mass Communication, and Information Technology Services at the University of North Carolina at Chapel Hill.

12 Formerly known as LegalTorrents.com. There is an interesting article on their blog that explains why they made the name change.

13 As of January 23, 2014, while this article was in preparation for publication, Clearbits.net shut down permanently.

14 Many of the following examples echo network security professional Joe Stewart's paper titled 'BitTorrent and the Legitimate Use of P2P', which was presented in a panel discussion held by the Forum on Technology & Innovation in Washington, DC. Stewart offers similar findings in a more general context. A version of the presentation is available on his personal website.

15 SoShare is no longer being actively developed, and many of its features have been replaced by BitTorrent Sync and another project, Paddle Over.

16 See BitTorrent Sync.

17 What is PGP? See Wikipedia's Pretty Good Privacy description, and the BitTorrent Sync FAQ for more information. It's also worth noting this is different than the traffic encryption already supported by most BitTorrent client software.

18 "Murder" being an uncommon term for a flock of crows.

19 For the gory IT details on the project look no further than Twitter infrastructure engineer Larry Gadea's presentation in 2010 at the Canadian University Software Conference.

20 The code and documentation are freely available on their GitHub project page.

21 For those looking for a more official definition, see the Society of American Archivists website.

22 US Copyright Office. "Revising Section 108: Copyright Exceptions for Libraries and Archives". February 2013.

23 See Internet Archive Blogs. "Over 1,000,000 Torrents of Downloadable Books, Music, and Movies". August 2012.


7. References

[1] "Comparison of BitTorrent Tracker Software." Wikipedia, the free encyclopedia. Web, accessed March 2014.

[2] Fox, E.J. "A History of BitTorrent." Visual.ly. Web, accessed March 2014.

[3] Kahle, Brewster. "Over 1,000,000 Torrents of Downloadable Books, Music, and Movies." Internet Archive Blogs, 7 August 2012.

[4] Klose, Simon. TPB AFK: The Pirate Bay Away from Keyboard. Film, 2013.

[5] "OSI Model." Wikipedia, the free encyclopedia. Web, accessed March 2014.

[6] Pollock, Rufus. "What Do We Mean By Small Data." Open Knowledge Foundation. 23 April 2013.

[7] Stewart, Joe. "BitTorrent and the Legitimate Use of P2P." Washington, D.C., 2004. Web, accessed March 2014.

[8] Twitter — Murder Bittorrent Deploy System. Film, 2010.

[9] "What Are Magnet Links, and How Do I Use Them to Download Torrents?" Lifehacker. Web, accessed March 2014.


About the Authors

Photo of Chris Markman

Chris Markman is the Resource Library Coordinator for the Visual & Performing Arts Department at Clark University where he manages a multimedia research collection. His interests include 3D printing, intellectual property law, experimental video, and digital curation.

Photo of Constantine Zavras

Constantine Zavras is a freelance technical writer, editor, and data specialist. He worked on data aggregation and cataloging projects at ITA Software as a Data Engineer and Domain Team Lead. His interests include electronics, open source advocacy, library digitization, and information literacy.
