Optimal piece size in BitTorrent
May 19, 2025
There’s a very interesting aspect of BitTorrent that isn’t studied much: optimal piece size.
- A file distributed via torrent is split into fixed-size chunks called pieces.
- Why pieces? Because BitTorrent is a p2p protocol, we need to verify the integrity of the data we download from untrusted peers.
- We could hash the entire file and distribute that one hash, but then nothing can be verified until the entire file is downloaded.
- Instead, we split the file into chunks and distribute a list of hashes, one per chunk. That way we can verify every piece as soon as it arrives.
- You can choose the piece size. A 1 GiB file, say, can be split into 4096 pieces (256 KiB each) or 1024 pieces (1 MiB each).
- There is a common but rarely explained heuristic for optimal piece size - aim for roughly 3000 pieces.
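The per-piece hash list above can be sketched in a few lines. BitTorrent v1 really does store one 20-byte SHA-1 digest per piece (concatenated into the `pieces` field of the .torrent); the function names here are mine for illustration:

```python
import hashlib

PIECE_SIZE = 256 * 1024  # 256 KiB, one common choice

def piece_hashes(data: bytes, piece_size: int = PIECE_SIZE) -> list[bytes]:
    """Split data into fixed-size pieces and SHA-1 each one,
    like the `pieces` field of a v1 .torrent file."""
    return [hashlib.sha1(data[i:i + piece_size]).digest()
            for i in range(0, len(data), piece_size)]

def verify_piece(piece: bytes, index: int, hashes: list[bytes]) -> bool:
    """Check a downloaded piece against the published hash list."""
    return hashlib.sha1(piece).digest() == hashes[index]
```

A client can call `verify_piece` the moment a piece finishes downloading, instead of waiting for the whole file.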
What does optimal piece sizing do?
It increases “swarm utilization”:
- 1 GiB file @ 256 KiB → 4096 pieces
- 1 node seeds 4096 pieces
- 3 slower nodes download
- now pieces are shared among the 3 slower nodes, so their upload capacity adds to the swarm’s total
- smaller piece size = faster dissemination, but higher overhead
- Download/upload capacity is unevenly distributed across peers, probably following a power law.
- Big nodes have high upload and download bandwidth; small nodes have much less of both.
- The faster pieces are disseminated, the higher the “swarm utilization”: more peers have something to upload, so more of the swarm’s total capacity is actually put to use.
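To make “swarm utilization” concrete, here’s a toy round-based model (my own sketch, not real BitTorrent scheduling - `rounds_to_finish` is a made-up name): every node may upload one piece per round and receive one piece per round, and a piece can only be re-shared in the round after it fully arrives.

```python
def rounds_to_finish(n_pieces: int, n_leechers: int) -> int:
    """Greedy dissemination model: node 0 is the seed; each round every
    node uploads at most one piece to a peer that is missing it, and
    each leecher receives at most one piece per round."""
    have = [set(range(n_pieces))] + [set() for _ in range(n_leechers)]
    rounds = 0
    while any(len(h) < n_pieces for h in have[1:]):
        rounds += 1
        receiving, deliveries = set(), []
        for src, pieces in enumerate(have):
            for dst in range(1, len(have)):  # the seed (node 0) never downloads
                if dst == src or dst in receiving:
                    continue
                missing = pieces - have[dst]
                if missing:
                    # prefer the globally rarest piece to spread coverage
                    piece = min(missing, key=lambda p: sum(p in h for h in have))
                    deliveries.append((dst, piece))
                    receiving.add(dst)
                    break  # one upload per node per round
        for dst, piece in deliveries:  # apply after the round ends
            have[dst].add(piece)
    return rounds
```

With 3 leechers, a 1-piece file takes 2 rounds of full-file transfers (2 file-time units), while 8 pieces take 10 rounds at 1/8 the size each (1.25 file-time units): the leechers’ upload capacity joins the swarm sooner. The per-round gain shrinks as the piece count grows, which hints at the diminishing returns discussed below.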
Why a massive piece size is bad:
- Fewer pieces → fewer chunks to share → poor parallelism
- Slower swarm spread → fewer peers can upload at a time
- If a piece fails (bad hash/peer drop), you re-download a large chunk (e.g. 4 MiB wasted)
- Slower initial seeding → fewer entry points for others
- Wasted capacity → many peers sit idle with nothing to upload
Why a tiny piece size is bad:
- Too many pieces → huge .torrent file (metadata bloat)
- More Have messages + bigger bitfields → higher network chatter
- Higher CPU and disk overhead → more seeks, more hash checks
- Less efficient pipelines → too shallow to saturate bandwidth
- Diminishing returns → overhead grows faster than benefits
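Putting the trade-offs together, the ~3000-piece heuristic suggests a simple chooser: the smallest power-of-two piece size that keeps the piece count at or below the target. This is my own sketch; the 16 KiB floor and 16 MiB cap are assumptions loosely modeled on common client behavior, not a spec requirement:

```python
def pick_piece_size(file_size: int, target_pieces: int = 3000) -> int:
    """Smallest power-of-two piece size giving <= target_pieces pieces.
    Floor of 16 KiB and cap of 16 MiB are assumed client conventions;
    huge files may exceed the target once the cap is hit."""
    size = 16 * 1024
    while file_size / size > target_pieces and size < 16 * 1024 * 1024:
        size *= 2
    return size
```

For a 1 GiB file this picks 512 KiB (2048 pieces). The metadata-bloat point above is easy to quantify: at 20 bytes of SHA-1 per piece, the same 1 GiB file at 16 KiB pieces needs 65,536 hashes (1.25 MiB of .torrent metadata) versus 40 KiB of hashes at 512 KiB pieces.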