TY - GEN
T1 - Sampling Bias in BitTorrent Measurements
AU - Zhang, Boxun
AU - Iosup, Alexandru
AU - Pouwelse, Johan A.
AU - Epema, Dick H.J.
AU - Sips, Henk J.
PY - 2010
Y1 - 2010
N2 - Real-world measurements play an important role in understanding the characteristics and in improving the operation of BitTorrent, which is currently a popular Internet application. Much like measuring the Internet, the complexity and scale of the BitTorrent network make a single, complete measurement impractical. While a large number of measurements have already employed diverse sampling techniques to study parts of the BitTorrent network, until now there exists no investigation of their sampling bias, that is, of their ability to objectively represent the characteristics of BitTorrent. In this work we present the first study of the sampling bias in BitTorrent measurements. We first introduce a novel taxonomy of sources of sampling bias in BitTorrent measurements. We then investigate the sampling among fifteen long-term BitTorrent measurements completed between 2004 and 2009, and find that different data sources and measurement techniques can lead to significantly different measurement results. Last, we formulate three recommendations to improve the design of future BitTorrent measurements, and estimate the cost of using these recommendations in practice. © 2010 Springer-Verlag.
UR - http://www.scopus.com/inward/record.url?scp=78349302656&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=78349302656&partnerID=8YFLogxK
U2 - 10.1007/978-3-642-15277-1_46
DO - 10.1007/978-3-642-15277-1_46
M3 - Conference contribution
SN - 3642152767
SN - 9783642152764
VL - 6271 LNCS
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 484
EP - 496
BT - Euro-Par 2010 - Parallel Processing, 16th International Euro-Par Conference, Ischia, Italy, August 31 - September 3, 2010, Proceedings, Part I
T2 - 16th International Euro-Par Conference on Parallel Processing, Euro-Par 2010
Y2 - 31 August 2010 through 3 September 2010
ER -