
Optimal piece size in BitTorrent

May 19, 2025

There’s an interesting aspect of BitTorrent that isn’t studied much: optimal piece size.

  • A file distributed via torrent is split into fixed-size chunks called pieces.
    • why? because BitTorrent is a p2p protocol, we want to verify the integrity of the file we download from untrusted peers
    • we could hash the entire file and use that, but then we couldn’t verify anything until the whole file is downloaded
    • instead, we split the file into pieces and distribute the list of per-piece hashes in the .torrent file. that way we can verify every piece as soon as it arrives.
  • You can choose the piece size. A 1 GiB file can be split into 4096 pieces (256 KiB each) or 1024 pieces (1 MiB each).
  • There is a common but unexplained heuristic for optimal piece size: aim for roughly 3000 pieces.
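The hash-list scheme above can be sketched in a few lines. This is a minimal sketch, not a real client: BitTorrent v1 does use SHA-1 over fixed-size pieces, but the piece size and the toy in-memory "file" here are arbitrary choices for illustration.

```python
import hashlib

PIECE_SIZE = 256 * 1024  # 256 KiB, chosen for illustration


def piece_hashes(data: bytes, piece_size: int = PIECE_SIZE) -> list[bytes]:
    """Split data into fixed-size pieces and SHA-1 each one (as in BitTorrent v1)."""
    return [
        hashlib.sha1(data[i:i + piece_size]).digest()
        for i in range(0, len(data), piece_size)
    ]


def verify_piece(piece: bytes, index: int, hashes: list[bytes]) -> bool:
    """Check a downloaded piece against the hash list shipped in the .torrent."""
    return hashlib.sha1(piece).digest() == hashes[index]


data = b"x" * (1024 * 1024)          # a toy 1 MiB "file"
hashes = piece_hashes(data)          # 1 MiB / 256 KiB = 4 pieces
print(len(hashes))                   # → 4
print(verify_piece(data[:PIECE_SIZE], 0, hashes))  # → True
```

The point of the scheme: a peer can verify (and start re-uploading) each piece the moment it arrives, instead of waiting for the whole file.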

What does optimal piece sizing do?

It increases “swarm utilization”:

  • 1 GiB file @ 256 KiB → 4096 pieces
  • 1 node seeds 4096 pieces
  • 3 slower nodes download
  • now there is more swarm upload capacity, since the 3 slower nodes can re-share pieces among themselves
  • smaller piece size = faster dissemination, but higher overhead
  • upload/download capacity varies widely across peers, probably following a power law
  • big nodes have high upload and download capacity, small nodes have much less
  • the faster pieces are distributed, the higher the “swarm utilization”: more nodes hold pieces they can upload to others, and hence the more total capacity the swarm has
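The dissemination effect above can be made concrete with a toy round-based model. This is my own simplification, not the real protocol: one seed plus three equal-capacity leechers, each peer uploads at most one piece and downloads at most one piece per round, a round lasts one piece-transfer, and scheduling is naive first-fit.

```python
def distribution_time(num_pieces: int, num_leechers: int = 3) -> float:
    """Time (in units of one full-file upload) until every leecher has the file.

    Toy model: peer 0 is the seed, each peer has one upload slot and each
    leecher has one download slot per round; a round lasts piece_size/bandwidth,
    so total time = rounds * (1 / num_pieces).
    """
    have = [set(range(num_pieces))] + [set() for _ in range(num_leechers)]
    rounds = 0
    while any(len(h) < num_pieces for h in have[1:]):
        rounds += 1
        received: dict[int, int] = {}  # leecher -> piece arriving this round
        for src in range(len(have)):          # each peer: one upload per round
            for dst in range(1, len(have)):   # each leecher: one download per round
                if dst == src or dst in received:
                    continue
                missing = have[src] - have[dst]
                if missing:
                    received[dst] = min(missing)  # naive first-fit piece choice
                    break
        for dst, piece in received.items():  # arrivals are re-uploadable next round
            have[dst].add(piece)
    return rounds * (1.0 / num_pieces)


for n in (1, 4, 16, 64):
    print(f"{n:>3} pieces -> {distribution_time(n):.3f} full-file transfer times")
```

Even in this crude model the trend shows up: with a single piece the swarm spends most of its time with idle uploaders, while with many pieces leechers start re-uploading almost immediately and total time approaches one full-file transfer.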

Why a massive piece size is bad:

  • Fewer pieces → fewer chunks to share → poor parallelism
  • Slower swarm spread → fewer peers can upload at a time
  • If a piece fails (bad hash/peer drop), you re-download a large chunk (e.g. 4 MiB wasted)
  • Slower initial seeding → fewer entry points for others
  • Wasted capacity → many peers sit idle with nothing to upload
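The re-download cost in the third bullet is just arithmetic: a failed hash check discards the whole piece, so waste scales linearly with piece size. A quick back-of-envelope (the failure count here is purely illustrative):

```python
KiB, MiB = 1024, 1024 * 1024


def wasted_bytes(piece_size: int, failed_pieces: int) -> int:
    """Bytes thrown away when pieces fail their hash check and must be re-fetched."""
    return failed_pieces * piece_size


# illustrative: ten bad pieces over the course of a download
for piece_size in (256 * KiB, 1 * MiB, 4 * MiB):
    print(f"{piece_size // KiB:>5} KiB pieces: "
          f"{wasted_bytes(piece_size, 10) / MiB:.1f} MiB re-downloaded")
```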

Why a tiny piece size is bad:

  • Too many pieces → huge .torrent file (metadata bloat)
  • More Have messages + bigger bitfields → higher network chatter
  • Higher CPU and disk overhead → more seeks, more hash checks
  • Less efficient pipelines → too shallow to saturate bandwidth
  • Diminishing returns → overhead grows faster than benefits
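The metadata bloat in the first bullet is easy to quantify: a v1 .torrent stores one 20-byte SHA-1 digest per piece in its "pieces" field, so the hash list grows inversely with piece size (per-peer bitfields and Have traffic scale with piece count the same way).

```python
HASH_LEN = 20                 # SHA-1 digest length, per BitTorrent v1
KiB, GiB = 1 << 10, 1 << 30


def metadata_bytes(file_size: int, piece_size: int) -> int:
    """Size of the concatenated per-piece hash list in the .torrent 'pieces' field."""
    num_pieces = -(-file_size // piece_size)  # ceiling division
    return num_pieces * HASH_LEN


for piece_size in (16 * KiB, 256 * KiB, 1024 * KiB, 4096 * KiB):
    print(f"{piece_size // KiB:>5} KiB pieces -> "
          f"{metadata_bytes(1 * GiB, piece_size) / KiB:>7.1f} KiB of hashes")
```

For a 1 GiB file, 16 KiB pieces mean 65,536 hashes (1.25 MiB of metadata), while 256 KiB pieces need only 80 KiB — one reason the piece-count heuristic above pulls toward a few thousand pieces rather than tens of thousands.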