BitTorrent

Peer-to-peer content delivery

Paul Krzyzanowski

November 20, 2023

Introduction

A peer-to-peer model is an application architecture that removes the need for dedicated servers and enables each host to participate in providing a service. Because every system can both access and provide the service, the systems are called peers. In this discussion, we will focus on BitTorrent, a peer-to-peer architecture designed for scalable content delivery.

BitTorrent

Similar to Akamai, the design of BitTorrent was motivated by the flash crowd problem. How do you design a file-sharing service that will scale as a huge number of users want to download a specific file? Content delivery networks rely on a massive infrastructure of servers distributed throughout the world and at multiple ISPs to achieve their scale. BitTorrent seeks to accomplish this via a peer-to-peer model, although focusing on file downloads specifically rather than downloading web content.

Earlier peer-to-peer systems such as Napster, Gnutella, and Kazaa all serve their content from the peer that hosts it. If many users try to download a popular file at the same time, all of them will have to share the bandwidth that is available to the peer hosting that content and will see their performance plummet (not to mention the load placed on that hosting peer).

The idea behind BitTorrent is to turn each peer that is downloading content into a server of that content. BitTorrent only focuses on the download problem and does not handle the mechanism for locating the content.

To offer content, the content owner creates a .torrent file. This file contains metadata, or information, about the file. This includes the name, the size of the file, and a SHA-256 hash of the file that will allow a client to validate the integrity of the complete file. BitTorrent downloads files in pieces, and the torrent file also contains the piece size and a list of hashes, one for each piece of the content. This list of hashes allows a downloading peer to validate that each piece has been downloaded without errors or modifications. Finally, the .torrent file contains the URL of a tracker.
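The contents of such a metadata file can be sketched in a few lines. This is a minimal illustration, not the actual .torrent format: real torrent files use their own encoding, and the function name `build_metadata`, the dictionary keys, and the tracker URL below are all hypothetical. SHA-256 is used only because it is the hash named above.

```python
import hashlib

def build_metadata(name: str, data: bytes, piece_size: int, tracker_url: str) -> dict:
    """Build a torrent-like metadata dictionary for a file held in memory.

    The key names are illustrative assumptions, not the real .torrent fields.
    """
    # Split the content into fixed-size pieces; the last piece may be shorter.
    pieces = [data[i:i + piece_size] for i in range(0, len(data), piece_size)]
    return {
        "name": name,
        "length": len(data),
        "file_hash": hashlib.sha256(data).hexdigest(),
        "piece_size": piece_size,
        "piece_hashes": [hashlib.sha256(p).hexdigest() for p in pieces],
        "tracker": tracker_url,
    }

meta = build_metadata("example.bin", b"abcdefghij", piece_size=4,
                      tracker_url="http://tracker.example/announce")
print(len(meta["piece_hashes"]))  # 3 pieces: 4 + 4 + 2 bytes
```

Note that the metadata is small relative to the content: it holds one fixed-size hash per piece, so a larger piece size means a shorter hash list.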

The tracker is a server running a process that manages downloads for a set of .torrent files. When a peer wants to download a specific file, it opens the .torrent file and connects via HTTP to the tracker specified within the file. The tracker is responsible for keeping track of which peers have all or some of the pieces of the file. There could be many trackers, each responsible for different torrents. The tracker responds to the client with a list of IP addresses of peers that have pieces of the file.

A seeder is a peer that has the entire file available for download by other peers. Seeders register themselves with trackers so that trackers can direct downloading peers to them.
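The tracker's bookkeeping role can be sketched as a small in-memory class. This is a toy model under stated assumptions: the class name `Tracker`, the `announce` method, the torrent identifier strings, and the addresses are all hypothetical, and a real tracker is reached over HTTP rather than by a direct method call.

```python
from collections import defaultdict

class Tracker:
    """Toy in-memory tracker: maps a torrent identifier to the peers in its swarm."""

    def __init__(self) -> None:
        # torrent_id -> {peer address -> True if the peer has the complete file}
        self.swarms = defaultdict(dict)

    def announce(self, torrent_id: str, peer_addr: str, complete: bool) -> list[str]:
        """Register (or update) a peer and return the other known peers."""
        swarm = self.swarms[torrent_id]
        swarm[peer_addr] = complete
        return [addr for addr in swarm if addr != peer_addr]

    def seeders(self, torrent_id: str) -> list[str]:
        """Return the peers that reported having the entire file."""
        return [addr for addr, done in self.swarms[torrent_id].items() if done]

tracker = Tracker()
tracker.announce("file-a", "10.0.0.1:6881", complete=True)    # a seeder registers
peers = tracker.announce("file-a", "10.0.0.2:6881", complete=False)
print(peers)  # ['10.0.0.1:6881']
```

The tracker never touches the file content. It only hands out addresses, which is why a single tracker can manage many torrents.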

Downloading content

Using the list of addresses obtained from the tracker, the peer that is downloading the file picks any peer in the list and connects to it. The other peer returns a bitfield message, a compact representation of the pieces of the file that it has. Each bit in the bitfield corresponds to a piece of the file, with a bit of 1 indicating the piece is available. The downloading peer can then request specific pieces of the file.
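As an illustration of the bitfield, the sketch below packs a set of piece indices into bytes and unpacks it again. It assumes that piece 0 maps to the most significant bit of the first byte and that unused trailing bits are zero; the function names are hypothetical.

```python
def encode_bitfield(have: set[int], num_pieces: int) -> bytes:
    """Pack piece indices into a bitfield, one bit per piece."""
    bits = bytearray((num_pieces + 7) // 8)  # round up to whole bytes
    for index in have:
        # Assumption: piece 0 is the high bit of byte 0.
        bits[index // 8] |= 0x80 >> (index % 8)
    return bytes(bits)

def decode_bitfield(bitfield: bytes, num_pieces: int) -> set[int]:
    """Return the set of piece indices whose bit is 1."""
    return {i for i in range(num_pieces)
            if bitfield[i // 8] & (0x80 >> (i % 8))}

field = encode_bitfield({0, 3, 9}, num_pieces=10)
print(field.hex())                        # 9040
print(sorted(decode_bitfield(field, 10)))  # [0, 3, 9]
```

Ten pieces fit in two bytes here, which shows why the bitfield is a compact way to advertise a peer's holdings even for files with thousands of pieces.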

For each piece that the peer downloads, it computes a SHA-256 hash of the piece and compares it against the hash in the .torrent file to ensure that the piece it downloaded is not corrupt.
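The check can be sketched as follows. The function name `piece_is_valid` and the representation of the hash list as hexadecimal strings are assumptions made for illustration.

```python
import hashlib

def piece_is_valid(piece: bytes, index: int, piece_hashes: list[str]) -> bool:
    """Compare the SHA-256 hash of a downloaded piece with the expected hash."""
    return hashlib.sha256(piece).hexdigest() == piece_hashes[index]

# Expected hashes, as they would be read from the torrent metadata.
piece_hashes = [hashlib.sha256(b"abcd").hexdigest(),
                hashlib.sha256(b"efgh").hexdigest()]

print(piece_is_valid(b"abcd", 0, piece_hashes))  # True
print(piece_is_valid(b"abcX", 0, piece_hashes))  # False: discard and re-request
```

Because each piece is checked independently, a corrupt or tampered piece costs only one piece-sized re-download, and the peer can fetch it from a different member of the swarm.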

The peer repeats this process, connecting to random peers and downloading random pieces that it doesn’t have until it has all the pieces of the file. A peer may connect to multiple peers at the same time and perform concurrent downloads. It also periodically re-downloads an updated peer list from the tracker.
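The download loop described here can be simulated without any networking. In this sketch, `peer_pieces` stands in for the bitfields collected from connected peers, and "downloading" a piece simply adds its index to a set. The names `choose_request` and `download_all` are hypothetical, and the purely random policy follows the description above.

```python
import random
from typing import Optional

def choose_request(have: set[int], peer_pieces: dict[str, set[int]],
                   rng: random.Random) -> Optional[tuple[str, int]]:
    """Pick a random peer holding something we need, then a random needed piece."""
    useful = {peer: pieces - have
              for peer, pieces in peer_pieces.items() if pieces - have}
    if not useful:
        return None  # no known peer has a piece we are missing
    peer = rng.choice(sorted(useful))
    piece = rng.choice(sorted(useful[peer]))
    return peer, piece

def download_all(num_pieces: int, peer_pieces: dict[str, set[int]], seed: int = 0):
    """Repeat random requests until every piece has been obtained."""
    rng = random.Random(seed)
    have: set[int] = set()
    log = []
    while len(have) < num_pieces:
        choice = choose_request(have, peer_pieces, rng)
        if choice is None:
            break  # would ask the tracker for an updated peer list here
        peer, piece = choice
        have.add(piece)  # stands in for request, transfer, and hash check
        log.append((peer, piece))
    return have, log

have, log = download_all(4, {"peer-a": {0, 1}, "peer-b": {1, 2, 3}})
print(sorted(have))  # [0, 1, 2, 3]
```

The `break` branch marks the point where a real client would fall back on the refreshed peer list mentioned above.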

Scaling with more peers

As soon as the peer downloads any piece of a file, it can serve it to other peers. A leecher is the term for a peer that has parts of the file but not the entire file. The tracker registered the leecher's IP address when the leecher first contacted it to get the list of peers that have pieces of the file, that is, the list of seeders and leechers. The collection of seeders and leechers for a particular file is called a swarm. Periodically, each peer in the swarm will report its status to the tracker, which updates its list of seeders and leechers. This report includes information on which pieces of the file the peer has, allowing the tracker to send appropriate peer lists to other participants. When a peer needs an updated list of peers, it can re-contact the tracker to get the latest information.

This mechanism is the key to BitTorrent’s ability to scale its effective bandwidth. As more peers download a file, they become leechers and make the swarm bigger, increasing the set of available machines that everyone in the swarm can contact to get new pieces. Each new peer increases the overall upload capacity of the swarm by offering the pieces of the file that it has already downloaded. Once a file is fully downloaded, the leecher has the option of turning itself into a seeder and continuing to allow other peers to download pieces of the file.
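A rough calculation shows why this matters. The sketch below uses a simplified lower-bound model commonly found in networking textbooks, not anything specific to BitTorrent: it assumes every peer has the same upload rate, ignores protocol overhead, and assumes pieces can be redistributed perfectly. All of the numbers are made up for illustration.

```python
def client_server_time(file_bits: float, n_peers: int,
                       server_up: float, min_down: float) -> float:
    """Lower bound on the time to deliver the file to every peer from one server."""
    return max(n_peers * file_bits / server_up, file_bits / min_down)

def p2p_time(file_bits: float, n_peers: int, server_up: float,
             peer_up: float, min_down: float) -> float:
    """Lower bound when every peer also uploads what it has received."""
    total_up = server_up + n_peers * peer_up
    return max(file_bits / server_up,   # the seeder must send each bit once
               file_bits / min_down,    # the slowest downloader must receive it
               n_peers * file_bits / total_up)

# Hypothetical scenario: 1 Gbit file, 1000 peers, 100 Mbps seeder uplink,
# 10 Mbps peer uplinks, 50 Mbps peer downlinks.
F, N = 10**9, 1000
print(client_server_time(F, N, server_up=10**8, min_down=5 * 10**7))  # 10000.0
print(round(p2p_time(F, N, 10**8, 10**7, 5 * 10**7), 1))              # 99.0
```

Under these assumptions the single-source time grows linearly with the number of peers, while the peer-to-peer bound stays nearly flat because each new peer adds upload capacity along with demand.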

Last modified November 25, 2023.