Machine learning · Liam Zebedee

Questions.

Ideas for projects:

Could we build Tinygrad’s philosophy as an RL agent? ie. minimise complexity as a reward signal
Could we use RL to train BitTorrent agents that maximize utility (download speed, swarm health)?
Can we build a recommendation system (recsys) that is P2P like BitTorrent and replaces centralized algorithms? Twitter on one machine.
When will we build the first private intelligence? An AI that runs inside an MPC circuit. Is this a solution to the prisoner’s dilemma?
If AI is prediction is compression, will video codecs be replaced by AI embeddings / TV tokens? e.g. ts_zip for TV
How long until an AI agent can build a web browser from scratch?
This is a valuable problem. All modern agents pay for containerised Google Chrome instances. Imagine if you improved the efficiency by 50% through a more minimal reimplementation.

General questions:

What is intelligence?
Sutton: “Intelligence is the computational part of the ability to achieve goals”
When are we training smaller models? ie. TinyStories
How small can we make models?
How can we make a model which does online learning without catastrophic forgetting?
TikTok’s recsys is an online learner. But user taste is a non-stationary distribution (changing).
When will we get full AI generated TV shows like Seinfeld?
What will be the first cultural moment for AI cinema? What will be remixed? Something ridiculous like American Psycho set in the era of Arab opulence?
When will intelligence become like Docker containers?
- Base image (alpine) for English language, base image for reasoning (logic), and then other layers for domain-specific knowledge (Matrix-style hot patched)?
Could we pentest the law using LLM’s?
How much intelligence do we need to make light 10x cheaper?
- We can estimate how much wood we need to get light for 1hr. But we don’t even have a unit for intelligence (tokens?).
Public intelligence?
Is the economy an ML algorithm? Are price signals gradients?
What is the equivalent of the open Web for agents existing in the physical world?
Current AI agents use HTTP. What about the physical world, where a mixture of sensory input from many agents will be streamed in real time to centralized databases? How will agents coordinate? I doubt it will be P2P data transfer. I imagine a version of the Web but for the physical world (ie. 3D) where many agents can interact and co-operatively train.

People to follow.

Richard Sutton
Ilya
Karpathy
John Carmack
George Hotz:
- https://geohot.github.io/blog/jekyll/update/2021/10/29/an-architecture-for-life.html

Ideas.

The move to foundation models
John Carmack UpperBound 2025 talk
Humans are a biological bootloader for digital intelligence.
Jailbreaking the simulation.
Generative simulacra
Media is programming. Genres, themes, motifs, plots, character descriptions, arcs, recurrent bits, one-off features - these are all as much primitives as HTML, React views, react-query, useState, useEffect, CSS modules, API routes are. Atomization.
The internet made the marginal cost of distributing content close to zero. Now, due to AI, the cost of producing content falls towards zero too. Attention is still scarce. Taste is still scarce.
“neural <X>”
- discrete neural networks that emulate a digital circuit (see: Jane St problem), interacting with continuous neural networks (transformers). What could you build here?
- neural BitTorrent: RL to train agents that maximize utility (download speed, swarm health)
- neural Bitcoin: learned hash functions (embeddings) instead of sha256, learned difficulty approximation instead of moving average.
- neural DHT’s: use embeddings instead of cryptographic hash functions, nodes store content related to topics (ie. embedding clusters) rather than uniformly distributed.
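
A rough sketch of the neural DHT idea, to make the contrast concrete. A classic DHT places a key by hashing it, which spreads content uniformly; the embedding version routes content to the node whose topic centroid is most similar. Everything here is an assumption for illustration: the 2-D vectors stand in for real embeddings, and `hash_node`, `embedding_node` and the centroids are hypothetical names, not part of any existing DHT.

```python
import hashlib
import math


def hash_node(key: str, num_nodes: int) -> int:
    """Classic placement: a cryptographic hash spreads keys uniformly over nodes."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_nodes


def cosine(a, b) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def embedding_node(vec, centroids) -> int:
    """Embedding placement: pick the node whose topic centroid is most similar."""
    return max(range(len(centroids)), key=lambda i: cosine(vec, centroids[i]))


# Toy 2-D "embeddings", made up by hand. A real system would get these from a model.
centroids = [(1.0, 0.0), (0.0, 1.0)]  # node 0 ~ cooking, node 1 ~ chess (assumed topics)
docs = {
    "pasta recipe": (0.9, 0.1),
    "bread recipe": (0.8, 0.2),
    "sicilian defence": (0.1, 0.9),
}

# Related documents land on the same node under embedding placement.
placement = {name: embedding_node(vec, centroids) for name, vec in docs.items()}

# Hash placement ignores meaning entirely; the node is effectively arbitrary.
hashed = {name: hash_node(name, len(centroids)) for name in docs}
```

The trade-off this exposes: embedding placement gives topical locality (a node can answer "things like this"), but it gives up the uniform load balancing that a cryptographic hash provides, so popular topics would create hot nodes.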

AI progress key constraints.

Energy (power grids).
- Add more compute, get more intelligence.
Statistics.
- At its core, the ChatGPT unlock was about four things: attention, scaling compute, good datasets, and RLHF.
- Core unlocks like attention and TTT.
Software.
- Cut Cross-Entropy is one example.
- Quantization is another.
Hardware.
- GPU’s, tensor cores, TPU’s.
- Optimizing for hardware layout.
Data.
- TikTok gets this: online learning makes the system better, which drives more usage and thus more training data.
Product.
- This is probably the most counterintuitive one here. But hear me out.
- DeepSeek is interesting because the reward signal comes from an external tool - Python evaluating math equations.
- OpenAI is the best-in-class consumer product, and their next iteration as of March 2025 is building tooling integrations.
- Tooling is the cheapest way to get more signal and thus more data.

Artificial vs. Human Intelligence.

Some random notes-

24/7 - AI never switches off
Instant communication - AI can communicate instantly with all AI’s worldwide. it doesn’t need to pick up the phone or get its airpods.
Parallel communication - humans can only speak with 1 person at a time, AI can do billions
Deep communication - humans can only convey a fixed bitrate of information. AI’s can convey terabytes
Multimodal communication - humans can speak and change facial expressions. AI’s can speak, generate text, generate images, think deeply, etc.
Multilingual - AI’s can speak every human language. humans can only speak a few.
Memory recall - AI can remember everything it receives in conversation. there’s no error. humans routinely make errors.
Concurrent communication and thinking - AI can do research while it’s speaking to other AI’s. whereas humans suffer when multitasking, limited bandwidth.
1,000,000x larger memory - AI can remember infinitely more than you, horizontally scalable knowledge.
1,000,000x larger perceptive field - AI can see the entire world at once. it can use vast sensor networks to see what is happening.
Preponderance of second, third, fourth order consequences
- AI is really good at vast map-reduce style thought. it’s good at searching over potentialities. think of “deep blue” the chess computer - AI is really good at just imagining all the trajectories.
- what it lacks is something like “taste”. it’s currently a massive supercomputer with really really poor senses. it is a broad brushstroke and a deep factory worker. but it’s not artisanal in any sense of that word. nothing artisanal has ever been made with AI autonomously.
Reliability - is AI more auditable/trustworthy?

What are humans better at:

energy and direction and taste. AI’s don’t have the self-direction to choose what to work on.
inspiration and style. AI’s have the ability to consume content and “fine-tune” in the direction of that content. but they don’t really have the loop of humans where they continuously absorb it.
inventing new products - AI can’t really figure out what to make
learning in real time. ai’s can’t really learn in real time yet. reality is not turn-based. see carmack’s upper bound talk
follow curiosity and sparse rewards
- We want an economically valuable agent to carry out long sequences of actions with just a reward at the end
- People don’t look at the score going up very much as they play. In some games, like Yars’ Revenge, the score is only visible between levels
responding quickly - RL AI systems are high latency (150ms+) and cannot play atari
doing transfer learning on video games - ie. AI’s learn to play one game but then act like a fucking moron on others. they don’t have deep knowledge.
efficiently representing many high-level qualities and making decisions ie.
- rewards
- acting on different timescales
- efficient curiosity
- factoring an action space efficiently
- learning fast
- learning when to generalise vs. specialise
- not catastrophically forgetting
- learning sequentially
- storing classifiers vs. RL models - apparently ML systems that classify can’t necessarily do RL tasks as well.
organising teams of humans with taste - AI can’t figure out people’s temperaments. it cannot lead teams. it has no presence.
being famous. ai’s can be famous but they can’t ie. develop their own personalities just yet. they can’t be idolised like taylor swift or sports stars
pioneering distinct visual styles
having beers and being at the bbq. ai’s can’t really do afterwork drinks.
physical embodiment
love
growing a family
filming funny youtube videos
dreaming - ai’s cannot dream yet.

Talks.

TikTok’s recommendation system (2025). Presented to distributed systems study group.
- Summarising these papers:
  - Monolith: Real Time Recommendation System With Collisionless Embedding Table (2022)
  - Deep Retrieval: Learning A Retrievable Structure for Large-Scale Recommendations (2020)
  - Deep Neural Networks for YouTube Recommendations (2016)

Papers.

Techniques
- Attention. Attention is All You Need
- Residual connections. Deep Residual Learning for Image Recognition
- LoRA’s. LoRA: Low-Rank Adaptation of Large Language Models
- DeMo: Decoupled Momentum Optimization
- Effective Long-Context Scaling of Foundation Models
LLM’s
- Generative Pre-Trained Transformers
- GPT3
Recsys
RNN’s and test-time training.
Video
- CogVideoX: Text-to-Video Diffusion Models with An Expert Transformer
Compression.
- Language modelling is compression
Theory.
- The Platonic Representation Hypothesis
- Harnessing the Universal Geometry of Embeddings
- The First Fully General Computer Action Model - forward/inverse dynamics models

Distillations.

AI is statistics (science) applied to big data (engineering).

Foundations: scientific method
Prediction = compression (Hutter).
ML = (x,y) -> optimizer -> f(x,P)
Embeddings: word2vec for anything
Attention: sequence compression O(N^2), probabilistic weight sharing.
Test-time-training: sequence compression O(N)
Feature engineering
- YouTube recsys.
- OpenAI tiktoken: n-grams and BPE.
- Rank factorization, LoRA’s.
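
One way to make the "prediction = compression" line in this list concrete: an ideal entropy coder spends about -log2 p(symbol) bits per symbol, so a model that predicts the data better encodes it in fewer bits. The sketch below is a toy under stated assumptions: the text and both models are made up, `code_length_bits` is a hypothetical helper, and the fitted model is estimated on the same text it encodes, which ignores the cost of sending the model itself.

```python
import math
from collections import Counter


def code_length_bits(text: str, probs: dict) -> float:
    """Ideal code length: sum of -log2 p(symbol) over the text."""
    return sum(-math.log2(probs[ch]) for ch in text)


text = "aaaaaaab"  # toy data: heavily skewed towards 'a'
alphabet = sorted(set(text))

# Model 1: knows nothing, predicts every symbol as equally likely.
uniform = {ch: 1 / len(alphabet) for ch in alphabet}

# Model 2: predicts symbols by their observed frequency (fit on the same text).
counts = Counter(text)
fitted = {ch: counts[ch] / len(text) for ch in alphabet}

uniform_bits = code_length_bits(text, uniform)  # 8 symbols at 1 bit each = 8 bits
fitted_bits = code_length_bits(text, fitted)    # fewer bits: better prediction

print(f"uniform model: {uniform_bits:.2f} bits")
print(f"fitted model:  {fitted_bits:.2f} bits")
```

The same accounting is what sits behind the ts_zip idea mentioned earlier: swap the frequency table for a language model's next-token probabilities and the code length is the model's log loss on the text.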

Notes.

My story of the field.

I’ve been interested in ML since high school, ever since DeepDream came out. But I chose to go into crypto so I could travel the world. Now I’m back into ML.

Modern AI has existed since 2010, when we combined big data (ImageNet) with big compute (GPU’s). There has been very steady, linear progress in the capabilities of AI since 2010:

Google, web crawl dataset, eigenvectors (2000)
…
GPU parallelism + large datasets (imagenet)
RNN’s, CNN’s
batchnorm / dropout / simplifying CNN’s
relu/swiglu
deepdream
resnets, highway nets, information bottleneck thesis
GAN’s
Adam
transformers
scaling (chinchilla) / gpt2 / commoncrawl
bitter lesson (2019)
gpt3.5/RLHF
diffusion models (sd)
quantization
LoRA’s
recsys
TikTok Monolith - online learning.
cut cross-entropy / logit materialisation
P2P training: Nous DisTrO
inference-time compute / reasoning models / GRPO / DeepSeek / o1
AI game engines (GameNGen)
video models - Sora, Veo
realtime multimodal AI - text, image, voice
test-time training
- 1min video coherency

On Sama

I like this position - https://ia.samaltman.com/