
The Moat Is Having Questions (On Ralph)

Ralph Loops and AGI: The State of the Art in Software Engineering

A presentation on autonomous development loops, the future of work, and why questions matter more than answers.



Introduction

I’ve been in my cave for about seven days now, and this is me coming out. I need to push this information out so I can be a Zen Buddhist again and focus.

So what am I talking about? How do we do software engineering? I’m going to tell you how you’re supposed to do it now.

You write an idea in a markdown document, link a directory called Ralph—this framework—and then it just generates specs and runs a loop. That’s it.


The Five Levels of Software Engineering Evolution

What is the fastest way to generate value? Let me walk you through the levels.

The Five Levels of Value Generation

Level 1: Manual Coding

The state of the art used to be a team of engineers writing code by hand. Humans typed everything; AI wasn't in the picture. Then researchers figured out how to tune a model into an assistant rather than a raw next-token predictor, and all of a sudden AI could generate code solutions.

The bottleneck: How quickly could I copy and paste these solutions?

Level 2: Augmentation

Now we’re generating code and copy-pasting. The state of the art became ChatGPT. The innovation was context—suddenly I could get an AI to actually be alongside me in the codebase.

The bottleneck: Just hitting tab on my computer and navigating around the files as I’m building features.

Level 3: Delegated Labor

The state of the art is tools like Copilot. Then we figured out how to give AI tools. Now they’re not just autocompleting—they’re actually doing their own work in the codebase. They’re listing files, editing them, running build commands.

We’re using these agents to build features. You orchestrate and supervise the agents.

The bottleneck: How quickly can I translate my intent into prompts?

Level 4: Orchestrated Labor (Designing Workers)

At this level, we start building prompts for actual roles that would typically be an employee—marketing, sales, whatever. We’re using AIs to run programs that represent these people’s roles. The loops self-improve and self-iterate.

The bottleneck: Orchestration and intent.

Level 5: Extended Imagination

Level five is where you just think of ideas and agents start building them.

Here’s what this looks like: You’re going to have Neuralink in your brain. You’re going to think “I’m thirsty right now” and a beer is going to appear in your hand. And as you’re visualizing this beer, as you’re seeing this Great Northern, you’re having a disgust reaction because you’re actually in the south—you’re in Victoria. And it’s immediately going to redesign the beer and replace it in your hand with a cool Victorian warm ale instead of a cold Northern Queensland lager.

This is what’s going to happen.


The Loop: How It Actually Works

Thinking in Bottlenecks

I always think about this in terms of bottlenecks: the gist of where we’re at, the state of the art, the innovation that unlocks something, and then the new bottleneck.

The Bottleneck Table

Right now, the state of the art is some people out there with seven tabs open, each tab running Claude Code, each instance working on tasks. The innovation I want to talk about is loops—specifically this thing called Ralph.

Ralph’s unlock is that AI can build on autopilot. The way to think about it: you type something into your computer, go to sleep, and 12 hours later you come back and your computer has built stuff. It hasn’t gotten stuck anywhere. It has continued working.

That’s what you can’t do with Claude right now. You have to come back at least every 30 minutes. But with Ralph, you can get it to run for eight or nine hours.

The new bottleneck: How quickly can I pump stuff into this loop that seems reliable enough to build features?

The Kubernetes Analogy

What is a loop? Basically, it’s like Kubernetes.

Kubernetes started with this idea: how do we continuously drive the current state of the system—which in Kubernetes was infrastructure, but in our example it’s source code—toward our desired state? Our desired state is made up of specifications in markdown files.
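That reconcile idea fits in a few lines. A schematic sketch, with every name illustrative rather than any real Kubernetes API:

```python
def reconcile(current, desired, apply_one_step):
    """Kubernetes-style control loop: keep comparing the current state to the
    desired state and apply one corrective step until they match."""
    while current != desired:
        current = apply_one_step(current, desired)
    return current

# For Ralph: current = the source code, desired = the specs,
# apply_one_step = one agent iteration working off the plan.
```

The whole rest of the talk is instantiations of this one function with different states and different steps.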

What’s a Ralph Loop?

The loop looks like four pretty simple things:

  1. Idea → 2. Specs → 3. Plan → 4. Build Loop

A Concrete Example

Let me walk through this presentation as an example.

Step 1: The Idea

The idea.md Example

I go nano idea.md and start writing:

  • “I’m going to give a presentation to the OS Builders about Ralph”
  • I paste 20,000 lines from the Ralph repo
  • I paste 20,000 lines from Steve Jobs’ “The Art of Presenting”
  • I add whatever else seems relevant

Step 2: Generate Specs

I go to Claude with the prompt that generates specs, plus the idea file. Claude spends a minute and writes the specs.

What do these specs look like? The spec defines the ideal state:

  • The presentation is engaging, funny, and visual
  • It’s in English
  • It’s short and fast
  • We don’t care if it’s not safe for work—it’s going to be funny

Another spec covers what’s actually in it: it’s about software engineering and Ralph.

The key insight: Claude is going to write something really good, way better than if I wrote it by hand, because it actually thinks about the complete picture. The model's latent intelligence fills in the parts of the picture you didn't know to ask for; it doesn't require you to hold the whole thing in your head.

Step 3: Generate the Plan

The plan.md Example

From the specs, we call Claude again. The plan command knows to scan these spec directories and starts up a bunch of sub-agents—maybe 500 of them—and generates a plan.

What is a plan? It’s pretty simple: a to-do list. But Claude thinks of it in terms of sprint planning. It divides the work into chunks and schedules it. Things that unlock other things come first. It understands the dependencies.

The to-do list doesn’t have to be perfectly concrete, but it’s concrete enough that the agent won’t get stuck doing deep R&D. It knows: “I just have to make some slides. I have to flesh out some stuff.”

Step 4: The Build Loop

This is the interesting part. We’ve interacted with Claude and been at the computer the entire time. We couldn’t have gotten Claude to automatically do all this—we were there typing.

Now we run a command called loop. It iterates. Each iteration, it checks the plan and ticks off items.

Iteration one: bam. Iteration two: bam. Iteration three: bam.

The Simplicity of the Code

How does it do this? The code is really simple. It just loops while true and pipes in a prompt to Claude. The prompt is doing all the work.

The loop.sh Code

What is the prompt? Conceptually:

The Build Loop Prompt

Before you do anything, read all the specs, read the plan, and read the code. Pick one item. Search through the code, get up to scratch, and implement it. Run tests. Fix gaps. Update the plan with anything you notice. If all the tests pass, commit and push. Done.

You might expect us to feed each step in one at a time and check the output. No: we hand over the whole prompt, because that's enough. The models are smart. They can figure out what to do from just these four steps.

We run this in a loop and it actually works. That’s the crazy thing.
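loop.sh itself is a few lines of bash, but the shape translates directly. Here is a Python sketch of the same driver with the model call stubbed out; the claude CLI invocation and its print-mode flag are assumptions about your setup, and the prompt file name is illustrative:

```python
import subprocess

def call_claude(prompt: str) -> None:
    """One iteration: hand the entire prompt to the model and let it work.
    Assumes a `claude` CLI with a -p print mode; swap in any agent runner."""
    subprocess.run(["claude", "-p", prompt], check=False)  # a crash just ends this iteration

def build_loop(prompt_path: str = "PROMPT.md", iterations: int = 0,
               agent=call_claude) -> int:
    """Re-read the prompt file and invoke the agent forever (iterations=0),
    or a bounded number of times. The loop itself holds no state: the specs,
    the plan, and the code on disk are the only memory."""
    count = 0
    while iterations == 0 or count < iterations:
        with open(prompt_path) as f:
            agent(f.read())
        count += 1
    return count
```

Because the loop holds no state of its own, killing it and restarting it costs you at most one iteration.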

My Overnight Test

I did this overnight. I ran a loop. Started with “make a to-do list app,” generated specs from that, generated a plan from the specs, and ran the loop.

The first night it crashed because I ran out of Claude tokens. The next night I ran it, and I woke up in the morning with a to-do list app. It had done all this stuff, deployed it, run it.

This is all possible today. That’s incredible.


Why the Loop is Idempotent

Going back to the original definition: we’re continuously driving the current state toward a desired state.

The Loop Mapping Diagram

Let’s map this:

  • Continuously = loop.sh
  • Drive = the prompt for the build loop
  • Current state = source code
  • Towards = the plan (we keep track of our plan)
  • Desired state = specs

When the loop crashes, it’s idempotent—meaning it can start up again and just work. It doesn’t get corrupted. You restart the loop. It rereads the prompt. It studies the current state, the plan, and where we’re going. It picks an item off the plan and uses its intelligence. It just continues working.

That’s amazing because you can add things to the specs as you’re building. You can add things to the plan as you’re building. The loop runs in the terminal, and every time it starts a new iteration, it reads the state from the system. If you’ve added new items to the plan or new specs, it picks them up automatically.


Loops Beyond Development

The Bug Fix Loop

Who cares about to-do list apps? Well, this is interesting because you can automate a substantial part of your work. But what’s more interesting is that the loop doesn’t end there.

I was building a markdown editor and had a bunch of bugs. I wondered if the loop could fix the bugs too. Then I realized: this bug-fixing thing is just a loop as well.


It’s a different type of loop because there’s no specification for a bug. It’s more like: reproduce the bug, then dive in and fix it.

The bug fix prompt:

  1. Write a unit test to reproduce the bug
  2. Run the test
  3. Identify the root cause
  4. Implement fixes
  5. Verify with the test
  6. If it passes, you fixed it; if not, you haven’t

So I wrote a prompt for this. Instead of loop build, it’s loop bug fix. The loop now continuously drives toward a desired state where there are no more bugs in the bugs folder—they’re all resolved.
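The same driver shape works, with one addition: a termination condition. The loop is done when the bugs folder is empty. A sketch; the bugs/open directory layout is an illustrative convention:

```python
from pathlib import Path

def bugfix_loop(bugs_dir: str, agent, prompt: str = "fix one open bug") -> int:
    """Drive toward the desired state 'no open bugs'. Each iteration, the agent
    is expected to reproduce one bug with a test, fix it, and move its file out
    of bugs/open. `agent` is any callable that runs one model iteration."""
    open_dir = Path(bugs_dir) / "open"
    iterations = 0
    while any(open_dir.glob("*.md")):
        agent(prompt)
        iterations += 1
    return iterations
```

Same reconcile shape as the build loop, just with "the open-bugs folder is empty" as the desired state instead of "the specs are implemented."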

And this works too.

Nested Loops: Claude Supervising Claude

I started thinking: if that works, what else could be a loop?

Sometimes the bug fix loop crashes because it runs out of tokens, or it gets into its own loops where it can’t break out because the architecture is messy. It might fix one bug in a hacky way that causes another.

So I realized: I could run this loop inside Claude and let Claude supervise the loop. Claude watches it looping and goes, “I can see it’s making this dumb error of just fixing things that aren’t actually the cause.”

You can do that. You can actually do that. And it can work.

If the model can build the application, the model can probably solve whatever stupid stuff it’s doing.
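"Claude supervising Claude" is just an outer loop wrapped around the inner one, feeding failure notes back in as context. A schematic sketch, where run_inner_loop(notes) stands in for any inner loop runner:

```python
def supervise(run_inner_loop, max_attempts: int = 5) -> list[str]:
    """Outer loop: run the inner loop; when it dies (token limit, hacky-fix
    spiral, messy architecture), record what went wrong so the next attempt
    starts with that context instead of repeating the same dumb error."""
    notes: list[str] = []
    for _ in range(max_attempts):
        try:
            run_inner_loop(notes)
            return notes              # inner loop reached its desired state
        except Exception as err:
            notes.append(f"previous attempt failed: {err}")
    return notes
```

In the talk's version the supervisor is itself a model watching the transcript, not an exception handler; this only shows the nesting structure.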

The Marketing Loop

What else could the model be capable of? Could we loop marketing?

Marketing nowadays is probably making a bunch of TikTok brain-rot videos. But think about it: I’ve got a changelog. I’ve got completed features. Marketing is making people aware that you have this solution and it’s improving.

What would that look like?

  1. Read the changelog
  2. Create briefs—“this is the creative thing we want to sell”
  3. Loop on generating ideas until some criterion is met (the idea is good enough according to an audience)
  4. Generate TikToks for those ideas
  5. Schedule and publish
  6. Measure engagement
  7. Store feedback in a folder

All of these steps can be automated, just like the process we already went through: idea → specification → plan → build loop.

Marketing could be the same thing. It’s not that different a task. You’re generating stuff, you’re thinking, you require intelligence—and we have intelligence.
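One pass of that marketing loop, sketched as chained model calls with a quality gate in the middle. All of the instruction strings are illustrative, and agent(instruction, payload) -> str stands in for whatever model runner you use:

```python
def marketing_pass(changelog: str, agent, max_revisions: int = 3) -> str:
    """One iteration of the marketing loop: changelog -> brief -> idea ->
    (revise until good enough) -> script -> publish -> engagement summary."""
    brief = agent("write a creative brief for these shipped features", changelog)
    idea = agent("propose a short-form video idea", brief)
    for _ in range(max_revisions):      # loop on ideas until one passes the gate
        if agent("is this idea good enough? answer yes or no", idea) == "yes":
            break
        idea = agent("improve this idea using past audience feedback", idea)
    script = agent("write the video script", idea)
    return agent("publish, measure engagement, and summarize the results", script)
```

Structurally it is the same pipeline as idea, specs, plan, build: generation steps chained together, with a loop wherever quality matters.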

Learning from Feedback

And then what about analyzing the feedback? Are you smoking crack? Are you actually smoking crack right now? What? What do you mean “humans manually analyze the feedback”? No. We use Claude to do that as well.

Obviously.

The Marketing Feedback Loop

“Hey ChatGPT, this video where I intentionally sparked rage at my viewers because of some political topic of the day, plus two OnlyFans models—why does this work? How does this work better than the video where it’s just a LinkedIn seminar?”

And ChatGPT will be like, “Obviously, sex sells and people want to feel angry. Adding this to agents.md.”

What that means is it’s going to learn. It’ll add it to a markdown file. The next time we do the marketing loop, it’ll remember: “People are really engaged by low-quality brain-rot content.” And that’s actually improving the marketing. It’s getting views.

This is 2026. We can do that. It’s not outside the realm of possibility that a model can look at what you’ve built, what you’ve put out there, and think about it and analyze it.


The Real Value: Taste and Orchestration

What’s hard is taste. A lot of other people are doing this. You’re going to have to think of ways to put your taste into this process.

The actual work of generating a video? Easy. Putting in actors? Easy. Voiceovers, speeches, audio? All generated. Even the process of generating it can be generated.

It’s not hard to make code that makes videos now. It’s not hard to make things that make code that makes videos.

It’s all about orchestration.

It’s 2026. Get with the program.


Evidence of AI Capability

An AI Cloned an Entire OS from Scratch

This AI cloned an entire operating system from scratch. From scratch. It took pictures of an old version of Mac OS and just cloned it. Cloned it.

If you think marketing is more complex than building an operating system, you’re wrong. Half of this was just looking at pictures and implementing functionality and building parts.

Marketing is going to have complicated aspects, but marketers often get paid less than the people who built this operating system. We have models that are intelligent now. If it can do something as complicated as this, it can probably do something as complicated as marketing and learn from it.

Neural Network Example

My computer literally built me a neural network that taught geometric 3D models how to walk. Then I asked it: “Can you give the models vision? Can they perceive other models in the game?”

And it gave them eyes.

Then I said, “Can you build neural networks for each one of these things so they learn?”

And it did. It made half of them predators and half of them prey. The predators learned to hunt and chase. The prey learned to run away and built a model of the predators’ dynamics.

Then it trained it. And it did it. It just did that. All I had to do was steer it.


The Trajectory of Progress

We literally live in the future.

I was never that good at mathematics. I’m pretty all right, but I never really wanted to practice it that deeply. And now I don’t actually have to. To build 80% of the ideas I have, I can just ask Claude to do it. I can use the right language and it will do it and I can steer the model.

Sure, if there’s fundamental research to be made, I don’t have the talents for that. But that’s not what I ever really wanted to do. I was only learning that stuff because there was no other way. Now the computer is doing the magic for me.

The Timeline

We probably don’t even understand how fast this is moving.

The Timeline of Progress

  • 5 years ago: I was writing code by hand. I can’t even remember how simple that used to be.
  • 4 years ago: I was only at the stage of copy-pasting snippets of code.
  • 3 years ago: I was just tabbing (autocomplete).
  • 2 years ago: I didn’t even have Claude. I was still copy-pasting from GPT, impressed that it could handle something complicated like a consensus algorithm.
  • 1 year ago: I was using Claude to write files.
  • Today: I literally go to sleep and when I wake up, my agents have built software.

That’s wild.

And then you think: what are we looking at in the future? “Can an agent do marketing?” Look at what’s happened. Look at the past 5 years.

It just cloned an operating system and it half works. We’re talking about operating systems—things that have existed for 20 years and been iterated on. This wasn’t possible a year ago, and now it’s possible.

At the end of 2026, we’re probably going to look at this and go, “Man, how could I not see the future coming? All the data points were there.”


The Economics of Speed

Let me give you a metaphor.

$7 trillion—that’s what Sam Altman wanted to raise. I put it to you that maybe that was reasonable.

The GDP growth from 1975 to 1985 was $7 trillion. Now think about this:

  • 2020: Building a moderately complex SaaS app might take 3 months.
  • Now: It might take a day or so.
  • In 5 years: It might genuinely take 30 seconds.

Keep in mind: the cost goes down and the speed goes up, and they’re independent variables.

If you had access to this technology back in 2020—if you could build products and sell them literally two and a half million times faster and at pennies—how much money could you have made that year as a contractor?

That’s similar to how much GDP increased in that 10-year gap. And that’s just for one person. Multiply that by everyone.

$7 trillion? I don’t know exactly how much money you could make. Probably a lot.


The New Way to Do Software Engineering

So how do you do software engineering now?

You basically just keep asking: What’s the bottleneck?

How to Do SWE in JAN26

  • What can be made parallel?
  • What’s the best bang for buck in terms of tokens to value delivered?
  • You only get 20 million tokens a week—how do you use them?

You start thinking about these things:

  • As the build loop is running, another loop is producing specs to build.
  • As the bug fix loop is running, another loop is studying the program interactions and testing to find bugs, adding them to the bug fix queue.
  • As the software engineering loop is running, another loop is marketing features on TikTok and Shorts.
  • As the marketing loop is running, another loop is studying the world and ideating on new product capabilities.

Keep in mind: Twitter announces all these new models. Every model has latent capability to do things with your product that you hadn’t thought of—generate speech, generate features, do different things.

You don’t need to be looking at those tweets. You can have an agent looking at them for you. Another agent generating ideas. Something taking those ideas and translating them to product specs. Something building them. Something taking the changelog and building marketing videos.

Loops on loops on loops.
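"Loops on loops" concretely just means several independent drivers running at once. A threading sketch; each loop body is a callable that performs one iteration, and the loop names in the comments come from the list above:

```python
import threading

def run_in_parallel(loop_bodies, stop: threading.Event):
    """Start one thread per loop (build, bugfix, marketing, research...).
    Each thread repeats its one-iteration body until `stop` is set; the
    loops share nothing except the files on disk."""
    def forever(body):
        while not stop.is_set():
            body()
    threads = [threading.Thread(target=forever, args=(body,), daemon=True)
               for body in loop_bodies]
    for t in threads:
        t.start()
    return threads
```

In practice you'd more likely run each loop in its own terminal or tmux pane than in threads; the point is only that they're independent and coordinate through files.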

You can do this with your computer. You don’t actually have to hire people necessarily. If you can’t do it now, you can probably do it in 12 or 24 months because everything is moving in real time.


The Scarce Resource

As you’re doing this, you realize the scarce resource is actually talking to the computer.

Right now it’s orchestrating loops. But in a couple of years, the scarce thing is just talking. The computer is listening. As it listens, it gets cheaper and cheaper to prototype your thoughts. It reads your intent better. As you’re talking, it’s realizing prototypes, thousands at once, and presenting them to you at the end. You look at them, and it reads your reactions as preferences.

But right now, it’s just this person with seven agents open. And another person who’s one step ahead, looping.

It goes so much further.


Where the Loop Fails

The bottleneck is: where does the loop fail?

That’s what we’re figuring out. But there’s a long trajectory of: damn, this goes really fast.

We might be at L3 (delegating labor), but L4 is designing workers—orchestrating them and just providing intent. And L5 is extended imagination—you just think of ideas and AI pops up and does stuff.


What Happens to Work?

There are a bunch of questions about what happens to work. Here’s the speed-run:


  • “The AI can work autonomously.” → It works while I’m asleep.
  • “It’s kind of like investing.”
  • “I can make my own agents for different domains.”
  • “I can automate the work of different roles like marketing and sales.”
  • “I still need to be a human in the loop, but for how long?”
  • “Wait, is it only 6 months? I can design another loop to automatically adjust the loop when the next model comes out.”
  • “What other roles can AI automate? Can it do policy? Can it do construction?”
  • “All the data is locked up. When are we going to get the data in one place?”
  • “How do I differentiate myself when everyone else is learning and doing this stuff?” → It’s probably going to be the people with audience.
  • “Do I need to write prompts? Can I just ask the agent to watch and observe what I’m doing? How long until it can learn from what I’m doing?”

And then you start asking bigger questions:

Why is everything so slow? If I can build this SaaS app in a day and I look at government—government takes six months to write a report. Does it need to take six months?

You ask the model. You talk to the model. “Can you write me some policy? Can you reflect on this?” The model is intelligent enough to think of a lot of stuff. Can the model do that job fast?

Then you look at jobs on Seek.com. Why is 20% of jobs just “data entry specialists for AIs”? What does that even mean? The models are getting really intelligent because we’re hiring people to train them.

And then: are you telling me the AIs aren’t actually dumb, but the people who say they’re dumb are just dumb themselves? They don’t actually know how to use them?

Why do I even need to read the news? Can’t I just get an agent to read it for me?

What if agents were drafting stuff in advance for humans to review—faster than humans can tell the agents what to do? Why do humans need to tell the agents at all?

Say it’s policy or regulation. Government creates regulations to protect people. Why are humans reading the news? The agent could read the news. The agent knows what a policy is. Why can’t the agent read the news and draft new policy ideas, and then humans just meet once a week and go, “Here’s what’s changed in the world, here’s this draft of this policy, does it look good?”

We don’t even have to think. The agent can do the thinking and just come with ideas to us.


Tactics for Staying Sane

How do you stay sane when things change this fast?


Tactic 1: Focus on Problems, Not Solutions

Tell yourself: “My responsibility is to solve problems.”

Before, I solved problems of there not being code for features. Now I solve problems of the agents not having specifications yet—they’re running on an empty loop. So now I’m the person who solves the problem of how to get specifications faster.

To do this, I just learn the tools to solve problems.

Tactic 2: Update Monthly

Things are going to change faster than you have time to catch up. That’s okay. Just catch up once a month. Look at what the best thing is, and update your approach.

You still have to be doing stuff. It’s still working and generating value. You just update your approach as it comes.

At the end of the day, you’re steering a loop. You’re steering tools. Before it was “I need to buy a new MacBook every 5 years.” Now it’s “I need to write a better orchestration for something to do the job.”

Tactic 3: Be Part of Something

The biggest tactic for staying sane is being part of a company, having a job, or starting a startup—being part of something.

In this age, attention is scarce, but distribution is where you need to be. There are going to be a billion different voice recording apps for Mac, and people aren’t going to find them easily. They’re going to look to whoever’s the trusted person or community that tells them what to use.

You’re still going to be competing against everyone else because everything’s getting automated. The best moat is not holding on to ideas.


Questions Over Ideas: The Mental Framework

Ideas of how the world works decay faster than you can absorb the new way the world works. To update those ideas, you need to find information, and it’s just tiring.

So instead, hold on to questions.

Questions give you the frame. The question might be: “What’s the bottleneck in what I’m doing right now?” or “How can I get this done faster?”

That question is always going to stay the same. The answer will change, but the question stays constant.

You can hold mental peace and clarity by focusing on refining your question-asking ability rather than refining ideas. Whenever you reenter this mode of personal development, you have the same question. The answer is always different because things are getting better. But that’s good—you just record the answer every time.

You’re not always trying to look at your current answer and go, “How do I update this?” You’re looking at your current question and what you had as the answer last month, and you’re just updating that answer.

That’s how you operate. That’s the most peaceful way of operating.

My Current Questions

The questions I’m asking right now:

  1. Can the model automate this?
  2. At what point does it stop being able to do that reliably?
  3. When does that point change?

For example: Can the model automate building features for me? Yes. At what point does that stop being reliable? Depends on the codebase, depends on good architecture. At what point does that change? Either when the model can come back and tell you it can’t do the job, or when you build a loop that does refactoring work before feature work.

Write down those questions. Come back 6 months later. Ask the question again. “Oh, the new model’s come out and it can actually solve this.” Okay cool. What’s the next level up?

The interesting thing is the question can also be run by an AI. Technically, you don’t even have to do that work—you can get an AI to ask these questions for you. Write them down in a little area of your computer. But maybe that’s a level too far.


The Elephant and the Chain

Think about it the other way. If you just hold on to ideas ("a policy takes 6 months to develop," "an AI model can't currently do this"), then even as the new models come out, you're already constraining yourself. You're like the elephant with the chain: it grows up chained, they take the chain off, and it still acts as if it's chained.

Questions help you learn. They give you the aperture on your lens of information to see what’s possible. And they make you feel good, because holding on to ideas of how things work is just going to get you stuck.

You’re going to be like the engineers who say, “AI can’t write code. It can’t generate a function because it just generates bad code.”

The reason they think that is because they don’t actually know how to use the model. They’re not asking “Can this write code?” They’re saying, “The way I talk to this magic we’ve invented is not generating the output I want, and thus it must not be capable of it.”

They don’t realize that as the AI gets 50% through its working memory—its context window—it starts getting dumber because of biases in the architecture. Once you realize that, you see that all these people saying it “just gets dumb” don’t understand how it actually works.

We’re all playing and experimenting. Everything has uncertainty. Questions allow you to learn and reduce your uncertainty. Ideas held statically don’t.


Conclusion: Where Is the Moat?

This is what I’ve been thinking about.

It started with Ralph. It started with looping. But when I got the loop working and walked away from it, I realized: I can just make more loops. And I thought, “Damn, that’s crazy.” And then, “This gives me so much anxiety. Where is the moat? If the loop can copy and clone any app from any PNG, and the agent can learn how to use the operating system—what is the moat?”

The moat is having questions.

How do you out-compete? You stay mentally stable. Stay at peace. You know that how you're approaching the problem is the most effective way. You continue asking your questions and refining them. Over time you get better questions. You won't be rattled by every new development. You'll be able to actually see the information in front of you. You won't be lying to yourself by asking biased questions.

As you do this, you’re leveling up the stack. You keep asking: what’s the next bottleneck? What can I do to build it better? Deliver the same value with less cost and faster?

You’ve got to be a problem solver. You’re solving problems and asking questions. How can I solve these problems better? You see tools out there, so you learn them. What’s the limit of these tools? What’s the bottleneck?

The limit is: you need a human in the loop now because we don’t exactly understand the limitations. But the loops are pretty good and the models are pretty good. There are people smarter than me saying, “This is the state of the art. The old way is old and it’s dead now.”

And you just continue with that.


Congratulations. Ralph Wiggum made it all this way. The Simpsons Movie was great. Thank you very much.