Friday, January 27, 2012

BitTorrent and Death by Heavy Tails

While studying the network structure of the internet and the web-graph, it is hard to ignore P2P(peer-to-peer) file transfer - one study in 2010 suggested that 50-70% of internet traffic is generated by P2P applications. Consequently, a large number of papers have been published on modeling and optimizing P2P file transfer. Most of these studies focus on the well-known BitTorrent protocol, although some have considered alternatives optimized for high-volume flow on relatively small networks. However, as BitTorrent is the most widespread P2P sharing protocol, it makes the most sense to focus our efforts on just the torrent network.[3]

BitTorrent relies on an initial "seed" content provider and a torrent file of meta-tags, detailing the packet structure of the seeded content, the tracker that is responsible for keeping track of the packet distribution on the network, and encryption hash tags used to verify the authenticity of transferred packets. Peers, other users who want to download the seeded content, download this torrent file and then query for pieces of the files described in the torrent to all other peers on the network and the original seed. On top of this basic framework, the BitTorrent protocol employs several important optimizations to attempt to improve the fairness of packet distribution and incentivize higher upload rates from the members of its network. However, even with the incentive system in place, users often encounter the phenomenon of quickly downloading a file to 99% and then seeing their download speed slow to a grinding halt almost indefinitely. This phenomenon is a classic example of a heavy-tailed behavior - in this case, packet availability in the P2P network is very heavy-tailed, such that the most common packets can be found easily and downloaded from the majority of peers, while the rarest packets are extremely scarce.[1] This result is easily derived by assuming that packets are shared between peers with a uniform probability - the population of the packets that are shared first rises exponentially, drowning out the packets that were distributed by the seed later. A number of studies have proposed solutions to this heavy-tail problem, two of which I would like to discuss here.

The first paper focuses on the behavior of the torrent over time and the incentive system built into the protocol.[1] At the most basic level, a peer can be incetivized to upload more by offering a higher download speed in return. However, this system fails to provide incentive to peers to continue uploading downloaded files after completing the download. Most mathematical models of this process assume that the peer arrival rate is a Poisson process, with the peer arrival rate decaying exponentially with time, and that the time a peer remains in the network is constant. In this model, the lifespan of a particular torrent - the time in which a newly arrived peer can download the entire file - is quite low. In real networks, this time is 8.89 days on average, with a download failure rate of approximately 10%.
A study by L. Guo et. al. proposes an improved incentive system to reduce the download failure rate in the torrent network. The study finds that if torrents are allowed to interact with one another, and determine download rates for a peer based on upload performance across all active torrents, the download failure rate in the network falls off 6-fold since peers are encouraged to come back to completed torrents. Thus, if seeds are convinced to stay 10 times longer, in a traditional torrent network, the download failure rate is expected to fall off by a factor of 10, while in the multi-torrent incentive system, this reduction is by a factor of 10^6.[1]

The second study approaches the problem of heavy-tails in packet availability more directly by proposing a clustering mechanism, in which the rarest packets are distributed to all members of a particular cluster preferentially to increase the overall availability of these packets. For example, if cluster 1 notices that cluster 2 lacks a particular packet that a member of cluster 1 has, this packet is preferentially distributed to all members of cluster 1, such that when the clusters are reorganized, the packet distribution across clusters is fairly even. The authors of this study do not provide the expected packet distribution under this algorithm, but do show that in their simulations, the speed of packet distribution is significantly greater, suggesting that the packet availability becomes to some degree less heavy-tailed.[2]

BitTorrent networks are some of the most prevalent networks on the internet and as such, are extremely interesting from both a networking and a game-theoretic perspective. After all, seeding vs. leeching on a torrent is a classical example of a Prisoner's dilemma. The statistical outcomes of this pattern are quite interesting - one of them is the heavy tailed distribution of file packet availability described above. Surely, there are numerous other characteristics of this network that could be analyzed, perhaps from a game-theoretic standpoint rather than a probabilistic one, but this question is a subject for another time.

[1] Lei Guo, Songqing Chen, Zhen Xiao, Enhua Tan, Xiaoning Ding, and Xiaodong Zhang. 2005. Measurements, analysis, and modeling of BitTorrent-like systems. In Proceedings of the 5th ACM SIGCOMM conference on Internet Measurement (IMC '05). USENIX Association, Berkeley, CA, USA, 4-4. Available at http://dl.acm.org/citation.cfm?id=1251090. Date accessed: 27 Jan, 2012
[2] WEI, G., LING, Y., GU, Y., GE, Y.. A Dependable Cluster Based Topology in P2P Networks. Journal of Communications, North America, 5, jan. 2010. Available at: http://ojs.academypublisher.com/index.php/jcm/article/view/2393. Date accessed: 27 Jan. 2012.
[3] Chaojiong Wang; Ning Wang; Howarth, M.; Pavlou, G.; , "A dynamic Peer-to-Peer traffic limiting policy for ISP networks," Network Operations and Management Symposium (NOMS), 2010 IEEE , vol., no., pp.317-324, 19-23 April 2010. Available at http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=5488483&isnumber=5488304. Date accessed: 27 Jan, 2012

2 comments:

  1. I really like the phrase "death by heavy-tails"...

    ReplyDelete
  2. Fantastic post! I really like the idea of managing incentives in networks. The game theorist in me shouts that BitTorrent would be so much more successful if they could add a reputation element to users. Maybe they do, I don't know.

    ReplyDelete