
Machine learning

Questions.

Ideas for projects:

  • Could we build Tinygrad’s philosophy as an RL agent? ie. minimise complexity as a reward signal
  • Could we use RL to train BitTorrent agents that maximize utility (download speed, swarm health)?
  • Can we build a recommendation system (recsys) that is P2P like BitTorrent and replaces centralized algorithms? Twitter on one machine.
  • When will we build the first private intelligence? An AI that runs inside an MPC circuit. Is this a solution to the prisoner’s dilemma?
  • If AI is prediction, and prediction is compression, will video codecs be replaced by AI embeddings / TV tokens? e.g. ts_zip for TV
  • How long until an AI agent can build a web browser from scratch?
  • This is a valuable problem. All modern agents pay for containerised Google Chrome instances. Imagine if you improved the efficiency by 50% through a more minimal reimplementation.

General questions:

  • What is intelligence?
  • Sutton: “Intelligence is the computational part of the ability to achieve goals”
  • When are we training smaller models? ie. TinyStories
  • How small can we make models?
  • How can we make a model which does online learning without catastrophic forgetting?
  • TikTok’s recsys is an online learner. But user taste is a non-stationary distribution (changing).
  • When will we get full AI generated TV shows like Seinfeld?
  • What will be the first cultural moment for AI cinema? What will be remixed? Something ridiculous like American Psycho set in the era of Arab opulence?
  • When will intelligence become like Docker containers?
    • Base image (alpine) for English language, base image for reasoning (logic), and then other layers for domain-specific knowledge (Matrix-style hot patched)?
  • Could we pentest the law using LLM’s?
  • How much intelligence do we need to make light 10x cheaper?
    • We can estimate how much wood we need to get light for 1hr. But we don’t even have a unit for intelligence (tokens?).
  • Public intelligence?
  • Is the economy an ML algorithm? Are price signals gradients?
  • What is the equivalent of the open Web for agents existing in the physical world?
  • Current AI agents use HTTP. What about the physical world, where a mixture of sensory input from many agents will be streamed in real-time to centralized databases? How will agents coordinate? I doubt it will be a P2P data transfer. I imagine a version of the Web but for the physical world (ie. 3D) where many agents can interact and co-operatively train.
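
The online-learning question above can be made concrete with a toy sketch. This is not TikTok's system: the `track` function, the made-up `signal`, and the step size of 0.1 are all assumptions for illustration. It compares a plain running average (which assumes a stationary distribution) with a constant step-size update (which weights recent observations more).

```python
def track(signal, step_size=None):
    """Online estimate of a drifting value, one observation at a time.

    step_size=None uses 1/t, i.e. a plain running average, which treats
    every observation as equally relevant (a stationary assumption).
    A constant step_size weights recent observations more heavily, so the
    estimate can follow drift, at the price of forgetting older data.
    """
    estimate = 0.0
    for t, x in enumerate(signal, start=1):
        alpha = step_size if step_size is not None else 1.0 / t
        estimate += alpha * (x - estimate)
    return estimate


# Hypothetical taste signal: the user's preference sits at 0.0, then moves to 1.0.
signal = [0.0] * 500 + [1.0] * 500

running_average = track(signal)         # ends up halfway between old and new taste
recency_weighted = track(signal, 0.1)   # ends up close to the current taste

print(running_average, recency_weighted)
```

The trade-off is the interesting part: the update that tracks the new taste does so by discarding the old one, which is a small-scale version of the forgetting problem raised above.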

Ideas.

  • The move to foundation models
  • John Carmack UpperBound 2025 talk
  • Humans are a biological bootloader for digital intelligence.
  • Jailbreaking the simulation.
  • Generative simulacra
  • Media is programming. Genres, themes, motifs, plots, character descriptions, arcs, recurrent bits, one-off features - these are all as much primitives as HTML, React views, react-query, useState, useEffect, CSS modules, API routes are. Atomization.
  • The internet made the cost of distributing content marginal. Now due to AI, the cost of producing content falls to zero. Attention is still scarce. Taste is still scarce.
  • “neural <X>”
    • discrete neural networks that emulate a digital circuit (see: Jane St problem), interacting with continuous neural networks (transformers). What could you build here?
    • neural BitTorrent: RL to train agents that maximize utility (download speed, swarm health)
    • neural Bitcoin: learned hash functions (embeddings) instead of sha256, learned difficulty approximation instead of moving average.
    • neural DHT’s: use embeddings instead of cryptographic hash functions, nodes store content related to topics (ie. embedding clusters) rather than uniformly distributed.
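
A rough sketch of the neural DHT idea, under heavy assumptions: the 2-d vectors, node names, and items are invented, and `hash_node` is a simplified stand-in for hash-based placement (real DHTs such as Kademlia route by XOR distance over the key space rather than a modulo). It only shows the difference in placement: a hash spreads related content across arbitrary nodes, while an embedding sends it to the node whose topic centroid is nearest.

```python
import hashlib
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def hash_node(key, node_ids):
    """Hash-based placement: related keys land on unrelated nodes."""
    digest = hashlib.sha256(key.encode()).digest()
    return node_ids[int.from_bytes(digest[:8], "big") % len(node_ids)]


def embedding_node(vector, centroids):
    """Embedding-based placement: pick the node with the closest topic centroid."""
    return max(centroids, key=lambda node: cosine(vector, centroids[node]))


# Hypothetical topic centroids and item embeddings; a real system would
# get these vectors from a learned embedding model.
centroids = {"node-cats": [1.0, 0.0], "node-cars": [0.0, 1.0]}
items = {
    "kitten video": [0.9, 0.1],
    "lion photo": [0.8, 0.3],
    "engine review": [0.1, 0.95],
}

placement = {name: embedding_node(vec, centroids) for name, vec in items.items()}
print(placement)
```

With embedding placement, a lookup for "things like this item" only needs to visit one node, which is the property the uniform hash deliberately throws away.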

AI progress key constraints.

  1. Energy (power grids).
    • Add more compute, get more intelligence.
  2. Statistics.
    • At its core, the ChatGPT unlock was about four things: attention, scaling compute, good dataset, and RLHF.
    • Core unlocks like attention and test-time training (TTT).
  3. Software.
    • Cut-Cross Entropy is one example.
    • Quantization is another.
  4. Hardware.
    • GPU’s, tensor cores, TPU’s.
    • Optimizing for hardware layout.
  5. Data.
    • TikTok gets this: online learning makes the system better, thus more usage, thus more training data.
  6. Product.
    • This is probably the most counterintuitive one here. But hear me out.
    • DeepSeek is interesting because the reward signal comes from an external tool - Python evaluating math equations.
    • OpenAI is the best-in-class consumer product, and their next iteration as of March 2025 is building tooling integrations.
    • Tooling is the cheapest way to more signal and thus more data.

Artificial vs. Human Intelligence.

Some random notes-

  • 24/7 - AI never switches off
  • Instant communication - AI can communicate instantly with all AI’s worldwide. it doesn’t need to pick up the phone or get its AirPods.
  • Parallel communication - humans can only speak with 1 person at a time, AI can do billions
  • Deep communication - humans can only convey a fixed bitrate of information. AI’s can convey terabytes
  • Multimodal communication - humans can speak and change facial expressions. AI’s can speak, generate text, generate images, think deeply, etc.
  • Multilingual - AI’s can speak every human language. humans can only speak a few.
  • Memory recall - AI can remember everything it receives in conversation. there’s no error. humans routinely make errors.
  • Concurrent communication and thinking - AI can do research while it’s speaking to other AI’s. whereas humans suffer when multitasking, limited bandwidth.
  • 1,000,000x larger memory - AI can remember infinitely more than you, horizontally scalable knowledge.
  • 1,000,000x larger perceptive field - AI can see the entire world at once. it can use vast sensor networks to see what is happening.
  • Preponderance of second, third, fourth order consequences
    • AI is really good at vast map-reduce style thought. it’s good at searching over potentialities. think of “deep blue” the chess computer - AI is really good at just imagining all the trajectories.
    • what it lacks is something like “taste”. it’s currently a massive supercomputer with really really poor senses. it is a broad brushstroke and a deep factory worker. but it’s not artisanal in any sense of that word. nothing artisanal has ever been made with AI autonomously.
  • Reliability - is AI more auditable/trustworthy?

What are humans better at:

  • energy and direction and taste. AI’s don’t have the self-direction to choose what to work on.
  • inspiration and style. AI’s have the ability to consume content and “fine-tune” in the direction of that content. but they don’t really have the loop of humans where they continuously absorb it.
  • inventing new products - AI can’t really figure out what to make
  • learning in real time. ai’s can’t really learn in real time yet. reality is not turn-based. see carmack’s upper bound talk
  • follow curiosity and sparse rewards
    • We want an economically valuable agent to carry out long sequences of actions with just a reward at the end
    • People don’t actually look at the scores going up very much as they play. In some games like Yars’ Revenge, the score is only visible between levels
  • responding quickly - RL AI systems are high latency (150ms+) and cannot play atari
  • doing transfer learning on video games - ie. AI’s learn to play one game but then act like a fucking moron on others. they don’t have deep knowledge.
  • efficiently representing many high-level qualities and making decisions ie.
    • rewards
    • acting on different timescales
    • efficient curiosity
    • factoring an action space efficiently
    • learning fast
    • learning when to generalise vs. specialise
    • not catastrophically forgetting
    • learning sequentially
    • storing classifiers vs. RL models - apparently ML systems that classify can’t necessarily do RL tasks as well.
  • organising teams of humans with taste - AI can’t figure out people’s temperaments. it cannot lead teams. it has no presence.
  • being famous. ai’s can be famous but they can’t ie. develop their own personalities just yet. they can’t be idolised like taylor swift or sports stars
  • pioneering distinct visual styles
  • having beers and being at the bbq. ai’s can’t really do afterwork drinks.
  • physical embodiment
  • love
  • growing a family
  • filming funny youtube videos
  • dreaming - ai’s cannot dream yet.

Talks.

  • TikTok’s recommendation system (2025). Presented to distributed systems study group.
    • Summarising these papers:
      • Monolith: Real Time Recommendation System With Collisionless Embedding Table (2022)
      • Deep Retrieval: Learning A Retrievable Structure for Large-Scale Recommendations (2020)
      • Deep Neural Networks for YouTube Recommendations (2016)

Distillations.

AI is statistics (science) applied to big data (engineering).

  • Foundations: scientific method
  • Prediction = compression (Hutter).
  • ML = (x,y) -> optimizer -> f(x,P)
  • Embeddings: word2vec for anything
  • Attention: sequence compression O(N^2), probabilistic weight sharing.
  • Test-time-training: sequence compression O(N)
  • Feature engineering
    • YouTube recsys.
    • OpenAI tiktoken: n-grams and BPE.
    • Rank factorization, LoRA’s.
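
Two of the distillations above can be sketched in a few lines of plain Python. Both are toy illustrations under assumptions, not anyone's production code: `fit` instantiates "(x,y) -> optimizer -> f(x,P)" with a linear model and gradient descent (the learning rate, step count, and data are made up), and `attention` is single-head scaled dot-product attention without batching, masking, or learned projections, written so the N x N score computation behind the O(N^2) cost is visible.

```python
import math


def f(x, params):
    """The learned function f(x, P), here a line with P = (w, b)."""
    w, b = params
    return w * x + b


def fit(data, steps=2000, lr=0.05):
    """(x, y) pairs -> optimizer -> parameters P, via gradient descent on MSE."""
    w, b = 0.0, 0.0
    n = len(data)
    for _ in range(steps):
        # Gradients of the mean squared error with respect to w and b.
        grad_w = sum(2 * (f(x, (w, b)) - y) * x for x, y in data) / n
        grad_b = sum(2 * (f(x, (w, b)) - y) for x, y in data) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


def attention(queries, keys, values):
    """Scaled dot-product attention: every query scores every key (N x N)."""
    d = len(keys[0])
    outputs = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        peak = max(scores)  # subtract the max for a numerically stable softmax
        weights = [math.exp(s - peak) for s in scores]
        total = sum(weights)
        weights = [w / total for w in weights]
        # Output is the softmax-weighted average of the value vectors.
        outputs.append(
            [sum(w * v[j] for w, v in zip(weights, values)) for j in range(len(values[0]))]
        )
    return outputs


# Hypothetical data generated from y = 3x + 1.
data = [(x, 3.0 * x + 1.0) for x in [0.0, 1.0, 2.0, 3.0, 4.0]]
P = fit(data)
print(P, f(10.0, P))

# Self-attention over three 2-d tokens: 3 queries x 3 keys = 9 scores.
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
print(attention(tokens, tokens, tokens))
```

Doubling the number of tokens quadruples the number of scores computed in `attention`, which is the cost the O(N) test-time-training line is contrasted against.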

My story of the field.

I’ve been interested in ML since high school, ever since DeepDream came out. But I chose to go into crypto so I could travel the world. Now I’m back into ML.

Modern AI has existed since 2010, when we combined big data (ImageNet) with big compute (GPU’s). There has been a very steady linear progress in the capabilities of AI since 2010:

  • Google, web crawl dataset, eigenvectors (2000)
  • GPU parallelism + large datasets (imagenet)
  • RNN’s, CNN’s
  • batchnorm / dropout / simplifying CNN’s
  • relu/swiglu
  • deepdream
  • resnets, highway nets, information bottleneck thesis
  • GAN’s
  • Adam
  • transformers
  • scaling (chinchilla) / gpt2 / commoncrawl
  • bitter lesson (2019)
  • gpt3.5/RLHF
  • diffusion models (sd)
  • quantization
  • LoRA’s
  • recsys
  • TikTok Monolith - online learning.
  • cut-cross entropy / logit materialisation
  • P2P training: Nous DiSTrO
  • inference-time compute / reasoning models / GRPO / DeepSeek / o1
  • AI game engines (gamengen)
  • video models - Sora, Veo
  • realtime multimodal AI - text, image, voice
  • test-time training
    • 1min video coherency

On Sama

I like this position - https://ia.samaltman.com/

 