Machine learning
Questions.
Ideas for projects:
- Could we build tinygrad’s philosophy into an RL agent? i.e. minimise complexity as a reward signal
- Could we use RL to train BitTorrent agents that maximize utility (download speed, swarm health)?
- Can we build a recommendation system (recsys) that is P2P, like BitTorrent, and replaces centralized algorithms? Twitter on one machine.
- When will we build the first private intelligence? An AI that runs inside an MPC circuit. Is this a solution to the prisoner’s dilemma?
- If prediction is compression, will video codecs be replaced by AI embeddings / TV tokens? e.g. ts_zip for TV
- How long until an AI agent can build a web browser from scratch?
- This is a valuable problem. All modern agents pay for containerised Google Chrome instances. Imagine improving efficiency by 50% through a more minimal reimplementation.
General questions:
- What is intelligence?
- Sutton: “Intelligence is the computational part of the ability to achieve goals”
- When are we training smaller models? e.g. TinyStories
- How small can we make models?
- How can we make a model which does online learning without catastrophic forgetting?
- TikTok’s recsys is an online learner. But user taste is a non-stationary distribution (changing).
- When will we get full AI generated TV shows like Seinfeld?
- What will be the first cultural moment for AI cinema? What will be remixed? Something ridiculous like American Psycho set in the era of Arab opulence?
- When will intelligence become like Docker containers?
- Base image (alpine) for English language, base image for reasoning (logic), and then other layers for domain-specific knowledge (Matrix-style hot patched)?
- Could we pentest the law using LLMs?
- How much intelligence do we need to make light 10x cheaper?
- We can estimate how much wood we need to get light for 1 hour. But we don’t even have a unit for intelligence (tokens?).
- Public intelligence?
- Is the economy an ML algorithm? Are price signals gradients?
- What is the equivalent of the open Web for agents existing in the physical world?
- Current AI agents use HTTP. What about the physical world, where a mixture of sensory input from many agents will be streamed in real time to centralized databases? How will agents coordinate? I doubt it will be P2P data transfer. I imagine a version of the Web but for the physical world (i.e. 3D) where many agents can interact and co-operatively train.
People to follow.
- Richard Sutton
- Ilya
- Karpathy
- John Carmack
- George Hotz
Ideas.
- The move to foundation models
- John Carmack Upper Bound 2025 talk
- Humans are a biological bootloader for digital intelligence.
- Jailbreaking the simulation.
- Generative simulacra
- Media is programming. Genres, themes, motifs, plots, character descriptions, arcs, recurrent bits, one-off features - these are all as much primitives as HTML, React views, react-query, useState, useEffect, CSS modules, API routes are. Atomization.
- The internet made the cost of distributing content marginal. Now due to AI, the cost of producing content falls to zero. Attention is still scarce. Taste is still scarce.
- “neural <X>”
- discrete neural networks that emulate a digital circuit (see: the Jane Street problem), interacting with continuous neural networks (transformers). What could you build here?
- neural BitTorrent: RL to train agents that maximize utility (download speed, swarm health)
- neural Bitcoin: learned hash functions (embeddings) instead of SHA-256, learned difficulty approximation instead of a moving average.
- neural DHTs: use embeddings instead of cryptographic hash functions; nodes store content related to topics (i.e. embedding clusters) rather than uniformly distributed keys.
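The neural DHT idea can be sketched as greedy semantic routing: instead of XOR distance over cryptographic hashes (as in Kademlia), each node carries a topic embedding and forwards a query to whichever neighbour is closest to it in embedding space. A minimal sketch only; the overlay topology, embeddings, and node names below are all made up for illustration.

```python
import math

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

# Hypothetical overlay: each node has a topic embedding and a few neighbours.
nodes = {
    "movies":  [1.0, 0.1, 0.0],
    "music":   [0.8, 0.6, 0.0],
    "sports":  [0.0, 1.0, 0.2],
    "science": [0.0, 0.2, 1.0],
}
neighbours = {
    "movies": ["music"],
    "music": ["movies", "sports"],
    "sports": ["music", "science"],
    "science": ["sports"],
}

def route(start, query):
    """Greedy routing: hop to the neighbour most similar to the query,
    stopping at a local optimum (the node that stores that cluster)."""
    current = start
    while True:
        best = max(neighbours[current], key=lambda n: cosine(nodes[n], query))
        if cosine(nodes[best], query) <= cosine(nodes[current], query):
            return current
        current = best

# A physics-flavoured query routed from "movies" lands on "science".
print(route("movies", [0.0, 0.1, 1.0]))  # science
```

Greedy routing over embedding space inherits the known failure modes of semantic overlays (local optima, no uniform load balancing), which is exactly the trade the idea proposes against hash-based DHTs.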
AI progress key constraints.
- Energy (power grids).
- Add more compute, get more intelligence.
- Statistics.
- At its core, the ChatGPT unlock was about four things: attention, scaling compute, a good dataset, and RLHF.
- Core unlocks like attention and test-time training (TTT).
- Software.
- Cut-Cross Entropy is one example.
- Quantization is another.
- Hardware.
- GPUs, tensor cores, TPUs.
- Optimizing for hardware layout.
- Data.
- TikTok gets this: online learning makes the system better, which drives more usage and thus more training data.
- Product.
- This is probably the most counterintuitive one here. But hear me out.
- DeepSeek is interesting because the reward signal comes from an external tool - Python evaluating math equations.
- OpenAI is the best-in-class consumer product, and their next iteration as of March 2025 is building tooling integrations.
- Tooling is the cheapest route to more signal and thus more data.
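The “attention” unlock named above reduces to a few lines of math: each query scores every key (the O(N^2) pairwise term), the scores are softmaxed into weights, and the output is a weighted mix of the values. A toy pure-Python sketch of scaled dot-product attention, with made-up 2-dimensional vectors:

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def attention(Q, K, V):
    """Scaled dot-product attention: softmax(QK^T / sqrt(d)) V."""
    d = len(K[0])
    out = []
    for q in Q:
        # Every query scores every key: this is where the O(N^2) cost lives.
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in K]
        w = softmax(scores)
        out.append([sum(wi * v[j] for wi, v in zip(w, V)) for j in range(len(V[0]))])
    return out

# Toy example: 3 tokens, dimension 2 (values chosen arbitrarily).
Q = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
K = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
V = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
out = attention(Q, K, V)
```

Each output row is a convex combination of the value rows, which is the “probabilistic weight sharing” framing: the model learns where to look rather than fixed connections.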
Artificial vs. Human Intelligence.
Some random notes:
- 24/7 - AI never switches off
- Instant communication - AI can communicate instantly with all AIs worldwide. It doesn’t need to pick up the phone or put in its AirPods.
- Parallel communication - humans can only speak with one person at a time; AI can speak with billions.
- Deep communication - humans can only convey a fixed bitrate of information; AIs can convey terabytes.
- Multimodal communication - humans can speak and change facial expressions; AIs can speak, generate text, generate images, think deeply, etc.
- Multilingual - AIs can speak every human language; humans can only speak a few.
- Memory recall - AI can remember everything it receives in conversation with no error; humans routinely make errors.
- Concurrent communication and thinking - AI can do research while it’s speaking to other AIs, whereas humans suffer when multitasking due to limited bandwidth.
- 1,000,000x larger memory - AI can remember infinitely more than you, horizontally scalable knowledge.
- 1,000,000x larger perceptive field - AI can see the entire world at once. it can use vast sensor networks to see what is happening.
- Preponderance of second-, third-, and fourth-order consequences
- AI is really good at vast map-reduce-style thought. It’s good at searching over potentialities. Think of Deep Blue, the chess computer - AI is really good at imagining all the trajectories.
- What it lacks is something like “taste”. It’s currently a massive supercomputer with really, really poor senses. It is a broad brushstroke and a deep factory worker, but it’s not artisanal in any sense of the word. Nothing artisanal has ever been made by AI autonomously.
- Reliability - is AI more auditable/trustworthy?
What are humans better at:
- energy, direction, and taste. AIs don’t have the self-direction to choose what to work on.
- inspiration and style. AIs can consume content and “fine-tune” in the direction of that content, but they don’t have the continuous absorption loop humans do.
- inventing new products - AI can’t really figure out what to make.
- learning in real time. AIs can’t really learn in real time yet. Reality is not turn-based. See Carmack’s Upper Bound talk.
- following curiosity and sparse rewards
- We want an economically valuable agent to carry out long sequences of actions with just a reward at the end.
- People don’t actually look much at the score going up as they play. In some games, like Yars’ Revenge, the score is only visible between levels.
- responding quickly - RL systems are high latency (150ms+) and cannot play Atari in real time.
- doing transfer learning on video games - i.e. AIs learn to play one game but then act like a fucking moron on others. They don’t have deep knowledge.
- efficiently representing many high-level qualities and making decisions, i.e.:
- rewards
- acting on different timescales
- efficient curiosity
- factoring an action space efficiently
- learning fast
- learning when to generalise vs. specialise
- not categorically forgetting
- learning sequentially
- storing classifiers vs. RL models - apparently ML systems that classify can’t necessarily do RL tasks as well.
- organising teams of humans with taste - AI can’t figure out people’s temperaments. It cannot lead teams. It has no presence.
- being famous. AIs can be famous, but they can’t develop their own personalities just yet. They can’t be idolised like Taylor Swift or sports stars.
- pioneering distinct visual styles
- having beers and being at the BBQ. AIs can’t really do after-work drinks.
- physical embodiment
- love
- growing a family
- filming funny YouTube videos
- dreaming - AIs cannot dream yet.
Talks.
- TikTok’s recommendation system (2025). Presented to distributed systems study group. (slides)
- Summarising these papers:
- Monolith: Real Time Recommendation System With Collisionless Embedding Table (2022)
- Deep Retrieval: Learning A Retrievable Structure for Large-Scale Recommendations (2020)
- Deep Neural Networks for YouTube Recommendations (2016)
Papers.
- Techniques
- LLMs
- Recsys
- RNNs and test-time training.
- Video
- Compression.
- Theory.
Distillations.
AI is statistics (science) applied to big data (engineering).
- Foundations: scientific method
- Prediction = compression (Hutter).
- ML = (x,y) -> optimizer -> f(x,P)
- Embeddings: word2vec for anything
- Attention: sequence compression O(N^2), probabilistic weight sharing.
- Test-time-training: sequence compression O(N)
- Feature engineering
- YouTube recsys.
- OpenAI tiktoken: n-grams and BPE.
- Rank factorization, LoRAs.
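The “prediction = compression” distillation can be made concrete: an ideal arithmetic coder spends -log2 p(symbol) bits per symbol under a model, so a better predictor yields a strictly shorter encoding of the same text. A toy sketch, with both models made up for illustration:

```python
import math

text = "abababababababab"

def code_length_bits(text, prob):
    """Total bits an ideal arithmetic coder would spend encoding `text`
    under model `prob(context, next_char)`."""
    return sum(-math.log2(prob(text[:i], c)) for i, c in enumerate(text))

# Model 1: uniform over {a, b} - knows nothing about the text.
uniform = lambda ctx, c: 0.5

# Model 2: bigram - predicts the next char from the previous one.
def bigram(ctx, c):
    if not ctx:
        return 0.5
    # In this text, each char is almost always followed by the other.
    return 0.9 if c != ctx[-1] else 0.1

print(code_length_bits(text, uniform))  # 16.0 bits
print(code_length_bits(text, bigram))   # far fewer bits: better prediction = better compression
```

This is the same quantity a language model minimises as cross-entropy loss, which is why tools like ts_zip can use an LLM directly as a text compressor.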
Notes.
My story of the field.
I’ve been interested in ML since high school, ever since DeepDream came out. But I chose to go into crypto so I could travel the world. Now I’m back into ML.
Modern AI has existed since 2010, when we combined big data (ImageNet) with big compute (GPUs). There has been very steady linear progress in the capabilities of AI since 2010:
- Google, web crawl dataset, eigenvectors (2000)
- …
- GPU parallelism + large datasets (ImageNet)
- RNNs, CNNs
- batchnorm / dropout / simplifying CNNs
- ReLU / SwiGLU
- DeepDream
- ResNets, highway nets, information bottleneck thesis
- GANs
- Adam
- transformers
- scaling laws (Chinchilla) / GPT-2 / Common Crawl
- bitter lesson (2019)
- GPT-3.5 / RLHF
- diffusion models (Stable Diffusion)
- quantization
- LoRAs
- recsys
- TikTok Monolith - online learning
- cut-cross entropy / logit materialisation
- P2P training: Nous DisTrO
- inference-time compute / reasoning models / GRPO / DeepSeek / o1
- AI game engines (GameNGen)
- video models - Sora, Veo
- realtime multimodal AI - text, image, voice
- test-time training
- 1-minute video coherency
On Sama
I like this position - https://ia.samaltman.com/