Programming
Articles.
Readings.
- Async - What color is your function?
- Schema evolution in Avro, Protocol Buffers and Thrift
- gRPC Motivation and Design Principles
- Microservices and the First Law of Distributed Objects
- What happens when you type google.com into your browser and press enter?
- Production Twitter on One Machine? 100Gbps NICs and NVMe are fast
- Latency Numbers Every Programmer Should Know
- Browser engineering - Life of a Pixel
- The Philosophy of Computer Science - some great insights here on the nature of abstraction in software engineering. Particularly - abstractions are rarely “pure” or “free” as they are in mathematics.
- Consider binary search. In theory it has $O(\log N)$ complexity, but in practice its concrete performance depends on the language and runtime you use to implement it. This is unlike mathematics, where an abstraction is necessarily pure - it has no underlying runtime that changes its cost.
- See Notes on abstraction in mathematics versus computer science.
- Reflections on Trusting Trust
- The Roots of Lisp
- Original source codes of major projects (bitcoin, bittorrent, pdf, www)
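The binary search point in the reading notes above can be made concrete. This is a minimal sketch, assuming a sorted Python list; the `steps` counter is added purely for illustration. The step count follows the $O(\log N)$ bound, while the wall-clock cost of each step still depends on the interpreter running it.

```python
def binary_search(items, target):
    """Return (index, steps); index is -1 when target is absent."""
    lo, hi = 0, len(items) - 1
    steps = 0
    while lo <= hi:
        steps += 1
        mid = (lo + hi) // 2
        if items[mid] == target:
            return mid, steps
        if items[mid] < target:
            lo = mid + 1   # target can only be in the upper half
        else:
            hi = mid - 1   # target can only be in the lower half
    return -1, steps


data = list(range(1_000_000))
index, steps = binary_search(data, 999_999)
missing_index, _ = binary_search(data, -5)
```

For a million sorted items the loop runs at most 20 times, whatever the runtime.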
Languages.
Favourite language: Go.
Things I like:
- Python. Best fit: ML, data analysis.
- Natural bigint support.
- Easy to convert between number bases - `hex`, `bin`, etc.
- Go. Best fit: dumb systems engineering (API’s, blockchains, vm’s, vpn’s).
- Rust. Best fit: strongly safe code - cryptography, concurrency.
- My favourite flavor of Rust is close to C - mainly functions and few structs. You don’t necessarily need explicit lifetimes.
- JS: Best fit: frontends.
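The two Python points above (native bigints and easy base conversion) look roughly like this. The specific values are arbitrary illustrations.

```python
# Integers grow as needed; there is no overflow at 64 bits.
big = 2**200 + 1

# Built-in conversions to other bases return prefixed strings.
as_hex = hex(big)
as_bin = bin(big)

# int() parses them back when given the matching base.
round_trip = int(as_hex, 16)
small = int("101", 2)
```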
Things I enjoy:
- C and C++. Low-level is fun.
- C# and Unity. Microsoft API’s are super well designed, e.g. `Task<T>` for async, instead of `Promise`.
- Ruby. It’s really pretty.
- Bytecode and VM’s.
- Lisp, Brainfuck. :head_explode:.
- Some notes on Lisp
Things I don’t enjoy a lot:
- Solidity.
- Safely composing contract logic is nigh-impossible, because upgradeable contracts generally require `call` (cannot share storage) or `delegatecall` (can share storage, but requires storage pointers and namespacing where the tooling is complete shit).
- Very small standard library. Not that bad, but a gripe. OpenZeppelin is poorly readable code.
Concepts.
- Problem solving and logic
- Divide-and-conquer
- Closely related: binary partitioning. This comes up a lot.
- Fixed-size workloads (BigTable, GFS).
- Software engineering
- Practices
- Software reflects communication structures - Conway’s law.
- Encapsulation. Layering.
- Observability. The missing part most programmers don’t focus on. Data about your system is the essential thing you can use to improve it.
- Make it work. Make it simple. Make it efficient (fast, performant).
- Upgradability.
- Google’s protocol buffers / refactoring techniques are an interesting example.
- Everything is design and validating design (testing) in software.
- Proof-of-concepts - fleshing out design.
- Unit testing.
- Integration testing.
- Smoke/sanity testing.
- E2E testing.
- Continuous integration.
- Release patterns.
- Feature flags.
- Canary release. Limited rollout to subset of production.
- Progressive rollouts. Gradually increasing scope of rollout.
- Shadow testing. Running a change in parallel with production traffic without impacting real users.
- Patterns.
- Dependency injection.
- Event emission.
- Context.
- Logs.
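The release patterns listed above (feature flags, canary releases, progressive rollouts) can share one mechanism: deterministically bucketing users and comparing the bucket to a rollout percentage. A minimal sketch, assuming string user IDs; `rollout_bucket` and `is_enabled` are hypothetical names, not taken from any particular feature-flag library.

```python
import hashlib


def rollout_bucket(user_id: str, feature: str) -> int:
    """Map a (feature, user) pair to a stable bucket in the range 0..99."""
    digest = hashlib.sha256(f"{feature}:{user_id}".encode()).digest()
    return int.from_bytes(digest[:8], "big") % 100


def is_enabled(user_id: str, feature: str, percent: int) -> bool:
    """True when the user's bucket falls inside the rollout percentage."""
    return rollout_bucket(user_id, feature) < percent


# Canary: a small fixed percentage. Progressive rollout: raise it over time.
canary_users = [u for u in ("alice", "bob", "carol") if is_enabled(u, "new-ui", 5)]
```

Because the bucket is a pure function of the inputs, a user who is enabled at 5% stays enabled as the percentage grows.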
- Distributed systems networking.
- Schema’s, serialisation.
- Interface definition (IDL’s).
- Application binary interface (ABI’s).
- Schema evolution in Avro, Protocol Buffers and Thrift
- RPC.
- Virtual machines.
- Concurrency.
- Data races are a subset of memory safety problems. See Rust guide.
- Why do we async?
- Event loops.
- Node.js intro talk.
- Computation.
- Space-time tradeoff.
- Database index - conserve time, spend space.
- Optimal block sizing - consider the underlying “chunk size” of your medium. e.g.
- Networks: MTU
- HDD’s: disk sectors.
- This is the insight behind B Tree efficiency.
- P2P networks: BitTorrent
- Data locality / colocation.
- Google’s Bigtable colocates data with small VM runtimes, to colocate compute with data.
- Mathematical/computational patterns:
- One-way flow of state. Immutability, see Rust, React, Kubernetes.
- Map
- Reduce
- Filter
- Fold
- State machines, replicated state machines, prover-verifier computation.
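The map, filter, reduce and fold patterns above can be sketched with Python built-ins. The `orders` data is made up for illustration, and `functools.reduce` with an initial value plays the role of a fold here.

```python
from functools import reduce

# (item, quantity) pairs; the input is never mutated, only transformed.
orders = [("apple", 3), ("pear", 0), ("fig", 5)]

in_stock = filter(lambda order: order[1] > 0, orders)       # drop empty orders
quantities = map(lambda order: order[1], in_stock)           # keep the quantity
total = reduce(lambda acc, qty: acc + qty, quantities, 0)    # fold into one value
```

State flows one way: each stage reads the previous stage's output and produces a new value.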
- Security.
- Fuzzing.
- Capability vs. roles. (see Fuchsia)
- Optimization and compilers.
- AST’s
- SSA
- Symbolic execution.
- Hardware.
- Moore’s law.
- L1, L2, L3 caches.
- CPU.
- Bitwise logic - powers of two are cheaper to compute with bit shifts than with general exponentiation, etc.
- Instructions for JavaScript - FJCVTZS
- RAM.
- GPU’s.
- GPU’s are their own computer (ie. completely separate motherboard, RAM, instruction set - PTX).
- FPGA’s.
- ASIC’s.
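The bitwise logic note above can be illustrated in two ways: shifts as multiplication or division by powers of two, and square-and-multiply exponentiation that walks the exponent's bits. A minimal sketch; `power` is a hypothetical helper, and in Python itself the built-in `**` is what you would normally use.

```python
def power(base: int, exp: int) -> int:
    """Square-and-multiply: one loop iteration per bit of the exponent."""
    result = 1
    while exp > 0:
        if exp & 1:        # lowest bit set: this power of base contributes
            result *= base
        base *= base       # move to the next power: base, base^2, base^4, ...
        exp >>= 1          # shift the exponent right by one bit
    return result


x = 37
times_eight = x << 3       # same as x * 2**3
quarter = x >> 2           # same as x // 2**2
two_to_the_ten = 1 << 10   # same as 2**10
cube_chain = power(3, 13)
```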
- Networking.
- TCP.
- Gaffer On Games - implementing reliable networking for video games.
- UDP.
- Remote Direct Memory Access.
- Networks:
- Packet-switching.
- Tier 1, 2 networks.
- One of the major inventions of the internet was flow control. Distributed bandwidth allocation.
- Q: how does this work in terms of economic allocation / mechanism design?
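One concrete mechanism behind the distributed bandwidth allocation mentioned above is TCP's additive-increase/multiplicative-decrease (AIMD) rule. This is a toy sketch, assuming two flows on one shared link, a made-up capacity of 60 units, and both flows seeing every loss signal at the same time. Real TCP is considerably messier.

```python
def aimd(rate_a: float, rate_b: float, capacity: float, rounds: int):
    """Simulate two flows sharing one link under AIMD."""
    for _ in range(rounds):
        if rate_a + rate_b > capacity:
            # Link overloaded: both flows back off multiplicatively.
            rate_a /= 2
            rate_b /= 2
        else:
            # Spare capacity: both flows probe for more, additively.
            rate_a += 1
            rate_b += 1
    return rate_a, rate_b


a, b = aimd(1.0, 50.0, capacity=60.0, rounds=500)
gap = abs(a - b)
```

Additive increase leaves the gap between the flows unchanged and every halving cuts it in half, so the two rates drift toward an equal share without any central coordinator.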
Tools.
- iTerm and zsh.
- SQLite.
- Protobufs.
- Docker.
- React and Flux but not Redux. Next.js is okay.
- Django.
- Rails.
- EVM. Generally pretty amazing.
- Keras. Tensorflow. Pytorch to some extent.
- Google Docs, Google Sheets. They just work.
Things I want to learn.
Given a project:
- Erlang.
- real-time (20,000 compute units per task iteration; reminds me of a better EVM)
- supervisor trees
- ML.
- K (array-based programming) and Kdb+
Open-source.
Some crazy open-source things:
- TiledMapPlus. When I was 14, I was really into gamedev and wrote a small Java library for the Slick2D ecosystem that’s now embedded in like 20 games.
- Wikipedia. I submitted a PR to Wikipedia.
- I somehow contributed to a ZK prover, despite not really knowing Rust or having any formal training in cryptography.
Software design.
From What Is Software Design?:
The overwhelming problem with software development is that everything is part of the design process. Coding is design, testing and debugging are part of design, and what we typically call software design is still part of design. Software may be cheap to build, but it is incredibly expensive to design. Software is so complex that there are plenty of different design aspects and their resulting design views. The problem is that all the different aspects interrelate (just like they do in hardware engineering). It would be nice if top level designers could ignore the details of module algorithm design. Likewise, it would be nice if programmers did not have to worry about top level design issues when designing the internal algorithms of a module. Unfortunately, the aspects of one design layer intrude into the others. The choice of algorithms for a given module can be as important to the overall success of the software system as any of the higher level design aspects. There is no hierarchy of importance among the different aspects of a software design. An incorrect design at the lowest module level can be as fatal as a mistake at the highest level. A software design must be complete and correct in all its aspects, or all software builds based on the design will be erroneous.
In order to deal with the complexity, software is designed in layers. When a programmer is worrying about the detailed design of one module, there are probably hundreds of other modules and thousands of other details that he can not possibly worry about at the same time. For example, there are important aspects of software design that do not fall cleanly into the categories of data structures and algorithms. Ideally, programmers should not have to worry about these other aspects of a design when designing code.
Funny source code lore.
- Bitcoin
- The original client, in order to discover peers, would ping random IPv4 addresses. The funny thing is this isn’t as harebrained as it seems: you can query the entire IPv4 address space in 16 minutes.
- Linux
- Linux includes a link to the Wikipedia article for the Aussie bin chicken, the ibis
- BitTorrent
- BitTorrent’s wire format (BEncode) is based on Lisp S-Expressions
- ZK-SNARK’s.
- The core elliptic curves used for recursive ZK-SNARK’s were discovered computationally. They had to literally search for the parameters of two elliptic curves - MNT4 and MNT6 - which took over 610,000 compute hours.
- GPL
- Steve Jobs was the first person to violate the GPL license (cite)
- Byzantine Fault tolerance
- Haha, did you know this was originally called the Albanian Generals problem (and hence Albanian Fault Tolerance) before they realised that might be a bit touchy?
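The 16-minute figure in the Bitcoin entry above implies a concrete probe rate. This is a back-of-the-envelope sketch that takes the 16 minutes from the text and assumes one probe per address with no retries.

```python
ADDRESS_SPACE = 2**32      # every possible IPv4 address
SCAN_SECONDS = 16 * 60     # the 16-minute figure quoted above

probes_per_second = ADDRESS_SPACE / SCAN_SECONDS
print(f"{probes_per_second:,.0f} probes per second")
```

That works out to roughly 4.5 million probes per second.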
Invented by Australians
- Google Maps
- rsync
- Sublime Text
- JIRA (kill me)