
Notes on Google File System.

Paper, Presentation to FTF group

Overview.

Google File System (GFS) was described in a paper published at SOSP in 2003.

It follows the Jeff Dean design philosophy: split the data into a single unit of load balancing (here, the chunk) and have a central master allocate those units to workers.

The file system index is stored at a master server, in-memory.

Each file is split into fixed-size chunks (64 MB in the paper), which are distributed across chunkservers.

The system scales horizontally by adding nodes called chunkservers; the master assigns each chunk to the chunkservers that store it.
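Because chunks are fixed-size, mapping a byte range of a file to chunks is plain integer arithmetic. A minimal sketch, assuming the 64 MB chunk size from the paper; the function name `chunk_spans` and its return shape are made up for these notes and are not a GFS API.

```python
CHUNK_SIZE = 64 * 1024 * 1024  # 64 MB, the chunk size given in the GFS paper


def chunk_spans(offset, length, chunk_size=CHUNK_SIZE):
    """Split the byte range [offset, offset + length) into per-chunk pieces.

    Returns a list of (chunk_index, start_in_chunk, end_in_chunk) tuples,
    where end_in_chunk is exclusive. Hypothetical helper, not a GFS API.
    """
    if offset < 0 or length < 0:
        raise ValueError("offset and length must be non-negative")
    spans = []
    pos = offset
    end = offset + length
    while pos < end:
        index = pos // chunk_size
        start_in_chunk = pos % chunk_size
        # Stop at the end of this chunk or the end of the request,
        # whichever comes first.
        end_in_chunk = min(chunk_size, start_in_chunk + (end - pos))
        spans.append((index, start_in_chunk, end_in_chunk))
        pos += end_in_chunk - start_in_chunk
    return spans
```

A read that straddles a chunk boundary turns into one piece per chunk, so a client would ask the master for the locations of each chunk index it touches.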

Each file has configurable replication. By default, the replication factor is r=3, meaning each chunk is replicated on 3 different chunkservers.

The master handles file metadata creation, chunk creation, chunk allocation, and chunk reallocation when chunkservers die.
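The master's bookkeeping can be sketched as two in-memory maps: file path to chunk handles, and chunk handle to replica locations. This is a toy model under several assumptions: the class `ToyMaster`, its method names, and the "least-loaded server first" placement rule are invented for illustration, and it leaves out everything else the real master does (leases, the operation log, rack-aware placement, garbage collection).

```python
import itertools


class ToyMaster:
    """In-memory metadata only; no real I/O, leases, or persistence."""

    def __init__(self, chunkservers, replication=3):
        self.replication = replication
        self.live = set(chunkservers)
        self.files = {}       # path -> ordered list of chunk handles
        self.locations = {}   # chunk handle -> set of chunkserver names
        self._next_handle = itertools.count(1)

    def _load(self, server):
        # Number of chunks currently stored on a server.
        return sum(1 for reps in self.locations.values() if server in reps)

    def _pick(self, count, exclude=()):
        # Assumed policy: least-loaded live servers first, ties broken by name.
        candidates = sorted(self.live - set(exclude),
                            key=lambda s: (self._load(s), s))
        if len(candidates) < count:
            raise RuntimeError("not enough live chunkservers")
        return candidates[:count]

    def create_file(self, path):
        # File metadata creation: a new, empty list of chunks.
        if path in self.files:
            raise FileExistsError(path)
        self.files[path] = []

    def add_chunk(self, path):
        # Chunk creation and allocation to `replication` chunkservers.
        handle = next(self._next_handle)
        self.locations[handle] = set(self._pick(self.replication))
        self.files[path].append(handle)
        return handle

    def chunkserver_died(self, server):
        # Chunk reallocation: restore the replica count of affected chunks.
        self.live.discard(server)
        for replicas in self.locations.values():
            if server in replicas:
                replicas.discard(server)
                missing = self.replication - len(replicas)
                replicas.update(self._pick(missing, exclude=replicas))
```

With four chunkservers and r=3, killing one server leaves every chunk with three replicas on the remaining three servers. In the real system the copy itself is done between chunkservers; the master only decides where replicas should live.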

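The consistency model of GFS is fairly complex and is not covered here; see the paper.

In GFS v2 (Colossus), redundancy is provided by Reed-Solomon erasure codes instead of whole-chunk replication, as is reportedly done in Amazon S3. Data is written in stripes of data blocks plus parity blocks, which is why this is referred to as striping.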
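The appeal of Reed-Solomon codes over plain replication is storage overhead. A small arithmetic sketch: the (6 data, 3 parity) layout below is only an illustrative parameter choice, not something these notes confirm for Colossus or S3, and the function names are made up.

```python
def replication_overhead(r):
    """Bytes stored per byte of user data with r full copies."""
    return float(r)


def erasure_overhead(data_blocks, parity_blocks):
    """Bytes stored per byte of user data for a (data, parity) erasure code."""
    return (data_blocks + parity_blocks) / data_blocks


# r=3 replication stores 3 bytes per user byte and survives 2 lost copies.
print(replication_overhead(3))  # 3.0

# An assumed (6, 3) Reed-Solomon stripe stores 1.5 bytes per user byte
# and survives the loss of any 3 of its 9 blocks.
print(erasure_overhead(6, 3))   # 1.5
```

The trade-off is that rebuilding a lost block requires reading several surviving blocks of the stripe, whereas a lost replica is restored by copying one surviving replica.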

Design.

Modelling a file system.

Scaling a file system - chunkservers.

Consistency model.

Mutation model.

Master server roles.

Software architecture.

Media.

Screenshot 2024-10-11 at 4.02.04 pm.png

Diagrams.

Screenshot 2024-10-22 at 8.35.00 pm.png

Screenshot 2024-10-22 at 7.56.32 pm.png

Screenshot 2024-10-22 at 8.35.45 pm.png

Screenshot 2024-10-22 at 8.35.50 pm.png

Screenshot 2024-10-22 at 8.36.54 pm.png

Screenshot 2024-10-22 at 8.41.54 pm.png

Screenshot 2024-10-22 at 8.36.59 pm.png

Readings.

https://static.googleusercontent.com/media/research.google.com/en//archive/gfs-sosp2003.pdf

https://static.googleusercontent.com/media/research.google.com/en//people/jeff/Stanford-DL-Nov-2010.pdf

http://sghose.me/talks/storage systems/2015/11/23/GFS-Talk/

https://github.com/CodeBear801/tech_summary/blob/master/tech-summary/papers/colossus.md

https://www.cnblogs.com/dhcn/p/7389645.html