Notes on B-Trees
Although BSTs and B-Trees share the same asymptotic complexity of O(log n) for all operations, B-Trees are often more performant in practice. By specialising their order (the number of keys per node, and hence the number of children per node) to fit the block size of the underlying media (sectors for HDDs, pages and block-based storage for SSDs), they reap the benefits of fewer disk reads per lookup and better use of caching mechanisms. For a real-world analysis, see this post from RethinkDB, a database.
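As a rough illustration (not from the RethinkDB post), here is a back-of-the-envelope sketch of how the order of a node might be derived from a block size. It assumes a 4 KiB block, 8-byte keys, and 8-byte child pointers, and ignores the headers, values, and checksums that real systems also pack into a page.

```c
/* Minimal sketch: derive a B-Tree node's order from the block size.
 * All sizes below are assumptions, not taken from any real system. */
#include <stdio.h>

int main(void) {
    const int block_size = 4096;  /* one assumed disk sector / SSD page */
    const int key_size   = 8;     /* e.g. a 64-bit integer key */
    const int child_size = 8;     /* e.g. a 64-bit child pointer / page id */

    /* A node with n keys has n + 1 children, so solve
       n*key_size + (n+1)*child_size <= block_size for n. */
    int max_keys = (block_size - child_size) / (key_size + child_size);

    printf("keys per node: %d, children per node: %d\n",
           max_keys, max_keys + 1);  /* ~255 keys, ~256 children here */
    return 0;
}
```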
Keys are clustered, and this data locality brings better performance at multiple levels (disk caching, L2/L3 CPU caches, RAM page caching).
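To make the locality point concrete, a hypothetical node layout might look like the following: all of a node's keys live in one contiguous, sorted array, so a single block read (and a handful of cache lines) serves the entire within-node search. The struct and the MAX_KEYS value are illustrative assumptions, not from any particular implementation.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define MAX_KEYS 255  /* from the block-size arithmetic sketched above */

struct node {
    int          nkeys;
    int64_t      keys[MAX_KEYS];          /* sorted and contiguous */
    struct node *children[MAX_KEYS + 1];  /* used by interior nodes */
};

/* Binary search inside one node: every probe touches memory the block
 * read (or cache fill) already brought in. Returns the index of the
 * child to descend into, and sets *found if the key is in this node. */
static int find_child(const struct node *n, int64_t key, bool *found) {
    int lo = 0, hi = n->nkeys;
    while (lo < hi) {
        int mid = lo + (hi - lo) / 2;
        if (n->keys[mid] == key) { *found = true; return mid; }
        if (n->keys[mid] < key)  lo = mid + 1;
        else                     hi = mid;
    }
    *found = false;
    return lo;
}

int main(void) {
    struct node n = { .nkeys = 4, .keys = {10, 20, 30, 40} };
    bool found;
    int idx = find_child(&n, 30, &found);
    printf("found=%d at index %d\n", found, idx);
    return 0;
}
```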
The engineering ideas behind B-Trees don't just apply to software on a single machine. You can think of a distributed database as a B-Tree whose nodes are network nodes. Read about Google Bigtable as an example of a B-Tree over networked nodes.
A binary tree splits the keyspace into two equal-sized workloads; a B-Tree splits it into N equal-sized workloads, where N is chosen to maximise read/write efficiency. That optimisation can mean working around the mechanics of hard-drive platters, where each write incurs an expensive seek, so you want to write as much data as possible in one go, but it can also mean minimising the round-trips of expensive consensus protocols like Paxos. In the case of Bigtable, nodes are sized at roughly 128 MB so that the root of the tree fits inside Chubby, a strongly-consistent service.
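A small sketch of the effect of choosing N: the number of block reads per lookup is roughly log base N of the total key count, so a larger fanout means a shallower tree and fewer expensive reads (or network round-trips). The key count and fanouts below are illustrative assumptions.

```c
/* Sketch: lookups cost about log_N(total keys) block reads.
 * Compile with -lm. Numbers are illustrative, not from the notes. */
#include <math.h>
#include <stdio.h>

int main(void) {
    double total_keys = 1e9;         /* a billion keys, assumed */
    int fanouts[] = {2, 256, 1024};  /* binary tree vs. two B-Tree orders */

    for (int i = 0; i < 3; i++) {
        double height = log(total_keys) / log((double)fanouts[i]);
        printf("fanout %4d -> about %.0f block reads per lookup\n",
               fanouts[i], ceil(height));  /* prints ~30, ~4, ~3 */
    }
    return 0;
}
```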