Cryptography and blockchain
My projects.
- Built a full Nakamoto blockchain. [github]
- Built a Rust implementation of Tendermint. [github]
- Built a Go implementation of Pod. [github]
- Worked on prediction markets for content curation with the Ethereum Foundation.
- Launched Dappnet to decentralize the DeFi frontend.
- Launched Mergeswap with some friends, storage proof bridge of ETHW to Eth mainnet, $100k volume.
- Reverse-engineered StarkNet traces, contributed to first open-source ZK prover. (pullreq)
- Papers (self-published):
- Articles:
Smart, interesting people to follow.
- Starkware
- LambdaClass / Fede
- Paradigm / Gakonst
- Vitalik
- Dan Boneh
- Remco Bloemen
- Alex Obadia
- Michael Saylor - see his bitcoin thesis.
- Arthur Hayes.
Blockchains.
Blockchains are byzantine fault tolerant (BFT) replicated state machines.
They usually consist of:
- state, whether its a mapping of accounts to balances, or more complex things like positions in lending protocols
- a virtual machine (VM) paired with an execution environment, that runs smart contracts (programs which modify state and have built-in API’s)
- users creating transactions, which are cryptographically signed intents to execute functions on smart contracts on the VM.
- a cryptocurrency for paying for the fees of running the network
- a resource metering system like gas, which measures the cost of one transaction.
- a consensus protocol which decides on the order of transactions applied, which is usually byzantine-fault-tolerant (BFT)
- a regular interval at which transactions are applied, termed a block
- the space inside one block, termed blockspace, where transactions can be filled. This space is paid for via fees. And fees are priced according to an automated market maker for gas.
- a set of incentives (block rewards) for nodes to (1) store the data (state and history), (2) execute transactions and (3) continue providing blockspace via consensus.
A blockchain’s state can be derived by getting every transaction in an ordered sequence $T$, and applying the state transition function serially.
Concepts:
- Blockspace.
- Gas markets.
- Mempools.
- Progressive gas auction (PGA) (first-price auction).
- Congestion pricing.
- Vitalik’s Position paper on resource pricing
- EIP1559.
- Multidimensional resource pricing
- Data availability.
- L1 as a DA layer. Blobspace.
- Data Availability Sampling: From Basics to Open Problems
- Celestia (paper)
- EigenDA.
- Data availability committees.
- Blockchain state models:
- UXTO - Bitcoin.
- Account-based - Ethereum, Solana.
- Private UXTO (privacy systems - Aztec, Zcash).
- Comparison with legal systems:
- UXTO: deeds
- Account-based: Torrens titles
- State growth and history.
- State is the up-to-date balances of the blockchain, history is the full set of all blocks and transactions.
- History is necessary for verification in replicated state machines (but not ZK), as well as historical data analysis.
- Articles:
- Execution.
- EVM.
- Cosmos.
- SVM.
- zkVM.
- Settlement.
- Sequencing.
- Sequencing refers to the algorithm to derive the transaction sequence. In Bitcoin and Ethereum, this is just the consensus algorithm.
- In other cases (L2’s), sequencing can be centralized.
- Merklelization.
- Bitcoin and Ethereum make use of merkle trees to commit state, which makes it possible to proof individual state leaves in places outside of the chain (e.g. storage proofs).
- Models for integrity:
- Replicated state machine. $O(N)$ where $N$ is number of transactions
- ZK proof. $O(log(N))$ to $O(1)$, depending on scheme.
- Scaling.
- Optimistic rollups.
- Interactive binary search for fraudulent opcode.
- ZK rollups.
- Use L1 as the DA layer for L2 transactions, post proof and state diff.
- Based rollups.
- Using Ethereum L1 as a base layer, and enshrining a Lisp-style
eval
opcode.
- Using Ethereum L1 as a base layer, and enshrining a Lisp-style
- Optimistic rollups.
- Security.
- Proof-of-work (PoW).
- Objectivity w/ checkpointing in early days.
- Proof-of-stake (PoS).
- Weak subjectivity and this.
- Restaking.
- Proof-of-work (PoW).
- Tokenomics.
- Coinbase / block reward mechanism.
- Issuance curve.
- Deflationary vs. inflationary.
- Bridging / interoperability.
- Vitalik’s R3 paper on chain interop
- Hash-lock time contracts (HLTC’s).
- MEV
Bitcoin.
- Bitcoin whitepaper
- Readings:
- History:
- Philosophy.
Ethereum / EVM.
- Ethereum whitepaper
- Ethereum yellowpaper (technical).
- Eth 1.0 (POW)
- Eth 2.0 (POS)
- Readings
Cosmos.
- Cosmos chains are based on CometBFT (aka. Tendermint).
- Execution.
- There is no execution environment - smart contracts are simply written in Go, and call out to standard Cosmos API’s for storage.
- There is no gas metering - Cosmos chains define their own fees for each smart contract function.
- Bridging is standard, thanks to the IBC protocol.
Solana.
- Solana whitepaper
- Proof of history.
- Validators run hashes recursively, that is, H(0 ++ t0), H(H(t0) ++ t1), H(H(H(t0) ++ t1) ++ t2).
- Each hash’s input is the last hash concatenated with current state.
- Transactions are inserted into
t
as they are received. - The output of
H
functions as a logical clock, that defines a forward transaction sequence.
Rogue.
Blockchain-specific devices.
- Smart contracts.
- The execution model of blockchains. Small pieces of code which have their own persistent state, can emit events, can call other contracts (subject to call depth limitations) and can receive tokens just like any other externally owned account (operated by users).
- First published by Nick Szabo.
- Oracles.
- Data feeds from the external world, usually involving prices of assets (USD, BTC) or in prediction markets, adjudications of outcomes.
- Centralized:
- ChainLink.
- Token-based.
- Kleros.
- SchellingCoin.
- Bridges.
- Keepers.
- Small agents which “keep” systems up to date.
- Examples: liquidation keepers.
- Indexers.
- Although blockchains are databases, the indexing is not enshrined at the blockchain layer. Instead, external third-party indexers allow you to query the chain.
MEV.
MEV is known as Miner Extractible Value / Max Extractible Value.
The basic idea - you can manipulate the transaction sequence (include, uninclude, reorder) on a blockchain in order to make profit.
- A model for Bitcoin’s security and the declining block subsidy
- Flash Boys 2.0: Frontrunning, Transaction Reordering, and Consensus Instability in Decentralized Exchanges
- Builder-Proposer separation, relayers, bundles.
- Sandwich attacks, salmonella, sybolic re-execution, multi-block MEV, MEV unbundling attacks.
- Flashbots research collective
- MEV and me
- Time, slots, and the ordering of events in Ethereum Proof-of-Stake
- Priority is All You Need
- Concurrent Block Proposers
- Leaderless Auctions
Consensus protocols.
- Nakamoto (paper)
- Basis of Bitcoin
- Proof-of-work = difficulty readjustment + hashcash.
- Stateless SPV proofs
- Ethereum 1.0 (POW).
- Ethereum 2.0 (POS)
- Tendermint (paper)
- Pod (paper)
- DAG-based mempools / consensus.
- Insight: you can separate transaction ordering from transaction dissemination.
- Narwhal and Tusk: A DAG-based Mempool and Efficient BFT Consensus
- Cosmos without Tendermint: Exploring Narwhal and Bullshark
Markets
Markets are where buyers and sellers transact. Markets are usually priced in pairs, e.g. BTC/USD refers to a market with BTC and USD in it. Market makers are organisations that keep the price steady, or otherwise facilitate liquidity such that people can buy and sell when they want to. Market manipulation refers to when people commit short-term trades in order to move the price, and frighten other buyers into selling (and some more complex strategies).
Markets are implemented in a few ways on-chain:
- orderbooks (e.g. CLOB) + matching engines (compute mid-point, fill orders).
- automated market makers
Automated market makers.
- Constant product $xy=k$ (CPPM)
- Stablecoin-efficient curves.
- StableSwap (used in Curve)
- Uniswap v3
- Synthetix simulated liquidity AMM
- Research
Perpetual futures.
Crypto is the first place perpetual futures were widely implemented (on BitMEX). For a good engineering rundown, I suggest reading the Synthetix futures protocol proposal, which details the mathematics.
Prediction markets.
Prediction markets are markets where you bet on the outcome of an event.
For example, “president of USA in 2025” is the event, and “Donald J Trump”, “Kamala Harris” etc. are the outcomes. Each outcome is tokenised, and sold via a market maker. For example, “Donald J Trump” might trade at 60c. The price reflects the probability of the outcome - meaning 60c is a 60% odds. All outcomes prices sum to $1, following the law that all probabilities for mutually exclusive events must sum to 1.
The market has defined resolution critiera, whereby the outcome is defined according to some data source plus additional conditions. Participants can buy/sell outcome tokens at any time. In Polymarket, they use a CLOB to fill trades. Prices reflect supply/demand tension and are filled at midpoint.
- Types:
- Discrete.
- Conditional tokens (Gnosis)
- Basis of Polymarket (which uses a centralized CLOB matcher ?).
- Continuous.
- Discrete.
Stablecoins.
- Algorithmic, backed by ETH, priced in USD.
- DAI v1.
- RAI.
- GHO.
- Algorithmic, backed by BTC, priced in BTC.
- NakaDollar, which is the basis of Ethena.
- Backed by 1x short 0.5 BTC + 0.5 BTC
- NakaDollar, which is the basis of Ethena.
- Centralized.
- Centralized stablecoins are minted by centralized issuers like Tether and Circle. Their business model is that of a bank - they take customer deposits and generally-speaking buy treasury bills for risk-free yield. Ironically we have seen a bank run on DAI v2 (due to being collateralised by USDC) but not on Tether - their proprietary allocation has been defensive against rumours, whereas MakerDAO’s is open to the public to see.
- USDC (Circle USDC).
- USDT (Tether).
Privacy protocols.
- Protocols:
- Zcash
- Aztec
- Tornado Cash
- Mimblewimble / Grin
- Articles:
Zero Knowledge Proofs (ZK).
ZK allows you to create proofs of computation that you can verify faster than you can run them. For a relation $F(x)=y$, you can run $F(x)$ and generate a proof $P$. The proof $P$ can be verified, $\mathrm{verify}(P)$, which shows $\mathrm{verify}(P)=F(x)$. Interestingly, while $F(x)$ is $O(N)$ to evaluate, $\mathrm{verify}(P)$ is $O(log(N))$ or even $O(1)$ in some schemes. $^{1}$
These are referred to as succinct arguments of knowledge. The ZK stands for zero knowledge. Schemes that employ the ZK aspect mean that your input to $F(x)$, $x$, remains encrypted, such that anyone reading your proof $P$ cannot know what $x$ is. This means you could for example, prove that your age is over 18, without revealing your ID. But most times you see ZK proofs, it’s only referring to the succinct part.
Not only that, but $F(x)$ can of course implement $\mathrm{verify(x)}$, whereby the function we are proving is in itself verifying proofs. This is referred to as recursive proofing, and sometimes associated with proof aggregation.
Using ZK proofs, you can compress the entire bitcoin verification work down to a single 20kB proof. The big-O notation can refer to time and space - some schemes are fast to prove, but generate larger proofs (STARK’s). Some schemes generate very small proofs (groth16) that are only 3 field multiplications to verify. You can even mix and match schemes using recursive proofing to get the benefits of both.
$^{1}$ this explanation is simplified. You are in fact proving the relation, $F(x) = y$ and $\mathrm{verify}(P)$ technically only returns a 1 or a 0. In order to get a return value, typically we wrap the execution inside a bootloader of sorts, which shows what program we’re proving and takes as input the hash of the return data. Also - generating a proof is not running $F(x)$ directly, rather converting $F(x)$ to a form like an arithmetic circuit or an algebraic intermediate representation (AIR).
- Articles.
- Concepts.
- Prover-verifier distinction.
- Satisfiability.
- Commitment scheme.
- STARK’s.
- Trace - history of CPU registers and operations.
- Arithmetization - transform trace to set of linear equations.
- Interpolation - transform trace to set of polynomial equations.
- Composition polynomial.
- Proving low-degreeness.
- Reed-Solomun encoding.
- Interactive oracle proofs of proximity.
- SNARK’s.
- Groth16.
- Arithmetic circuit.
- Quadratic Arithmetic Program (QAP).
- Arithmetic circuit.
- Groth16.
- Foundational papers.
- Machine Learning proofs.
- Further reading:
- Books.
- Open-source.
- Reverse-engineered StarkNet traces, contributed to first open-source ZK prover. (pullreq)
Multi-party-computation (MPC) and FHE.
Multi-party computation is akin to a shared encrypted CPU, where you upload a program, run it with inputs, and get an output. It can be used for example to do data analysis without revealing data to third parties. Imagine a program designed to detect how old you are - you encrypt your genome, send it to a remote server, it does MPC, and you get back the result - without them ever seeing your data. That’s the magic.
- Threshold ECDSA wallets.
- SPDZ
- Renegade encrypted DEX (whitepaper)
- Renegade constructs a non-custodial dark pool - a CLOB-style DEX where the amounts are encrypted.
- Uses:
- MPC to perform order matching and transfers (SPDZ-style maliciously-secure MPC-with-abort).
- coSNARK’s to generate proofs of the MPC between trading parties.
- Blockchain to settle ZK proofs, which advance the state machine.
Cryptographic techniques.
- New Directions in Cryptography - a seminal paper by Diffie and Helman.
- Accumulators and commitment schemes.
- Merkle trees.
- Sparse merkle trees (dicts/maps) (paper)
- Merkle mountain ranges.
- BLS signatures.
- BLS signatures can be aggregated, as in Ethereum 2.0 consensus, hence they qualify as accumulators for specifically signatures.
- Vector commitments
- Commit-reveal.
- Used in SchellingCoin, and atomic swaps.
- Bloom filters.
- Skip ratchets.
- Used in Signal, Matrix.
- How does a skip ratchet work?
- Probabilistic sampling.
- Consensus.
- Bitcoin, Ethereum probabilistically elect a leader per-round using a seed.
- In Bitcoin, this is based on hashpower, adjusted in realtime using difficulty.
- In Ethereum, this is based on the RANDAO hash, which is formed from block proposers contributing randomness through BLS signatures.
- Arweave.
- Arweave uses probabilistic sampling to test if nodes are storing content. Each block, a set of data roots is randomly sampled as a challenge to storage providers that they are storing that content.
- Bitcoin, Ethereum probabilistically elect a leader per-round using a seed.
- ZK proofs.
- STARK proofs are based on a verifier probabilistically sampling random points on the curve given by the prover, the curve being constructed from a computational trace, representing the program’s execution.
- Consensus.
- ZK proofs.
- SNARK’s
- Groth16
- STARK’s
- Collaborative SNARK’s (coSNARK’s) (paper, presentation): construct a SNARK proof between $N$ mutually distrusting parties.
- SNARK’s
- Multi-party computation.
- Public key schemes.
- RSA.
- ECDSA.
- Compressed public keys / recovery trick.
- BLS signatures
- Schnorr signatures
- Ring signatures
- Merkle puzzles
- This is pre-RSA and very interesting.
- Curves.
- secp256k1, ed25519
- pairing-friendly 2 cycles: mnt4, mnt6
- pairing-friendly curves: bn254
- Distributed Hash Tables (DHT’s).
- Fuzzy Message Detection
- Problem: in a private cryptocurrency like Zcash, how do you check your balance without revealing your address to the node?
- Penumbra implementation
- Verifiable Delay Functions (VDF’s)
- Esoteric inventions.
- Verifiable Encryption under Committed Key (VECK)
- Enables fair data exchange - atomic swaps of data for $.
- Blog post
- Rateless Invertible Bloom Filters
- Short-lived zero-knowledge proofs and signatures
- Low Overhead Broadcast Encryption from Multilinear Maps
- Secret Sharing with Certified Deletion (quantum)
- Proof of Latency Using a Verifiable Delay Function
- Verifiable Encryption under Committed Key (VECK)
Security / trust models.
- Trusted Execution Environments (TEE’s)
- Liveness
- Market manipulation
- Censorship resistance
- Capture resistance
- Slashing.
- Ethereum - automatic.
- Solana - manual - social slashing.
- Malicious M-of-N / byzantine faults
- Bitcoin: 51% attack
- Tendermint: $3F+1$
- Ethereum: $\frac{n}{3}$ accountable safety
- Collusion
- Technical
- DNS attacks
- BGP attacks
- Mechanism design
Data availability.
Data availability schemes are ways to ensure block data (transactions) can be scaled. They basically redundantly extend data using Reed-Solomun, distribute shards of that data widely to full nodes, and light nodes sample these data shares in order to check it’s available. Reed-Solomun allows us to do $M$ of $N$ reconstruction of the data, meaning we don’t require all $N$ nodes.
- Articles:
- Papers
P2P networks.
- Distributed Hash Tables
- See distributed systems.
- Ethereum uses a Kademlia DHT.
- Articles.
Reputation systems.
- Projects.
- Evidence-Based Subjective Logic (cited by the EU)
- Eigentrust / PageRank.
- BitTorrent tit-for-tat (paper).
Product-market fit.
- Article: Crypto-product market fit
Misc blog posts.
- “Applied economics”, mechanism design
- The blockchain folk theorem - reasonable mathematics-heavy paper that talks about Bitcoin and proof-of-work in terms of game theory.
- https://blog.ethereum.org/2014/11/25/proof-stake-learned-love-weak-subjectivity/
- https://blog.ethereum.org/2015/01/23/superrationality-daos/
- https://blog.ethereum.org/2015/02/14/subjectivity-exploitability-tradeoff/
- https://blog.ethereum.org/2014/09/02/software-bounded-rationality/
- https://blog.ethereum.org/2015/01/28/p-epsilon-attack/
- https://blog.ethereum.org/2014/06/19/mining/
- https://vitalik.ca/general/2019/04/03/collusion.html