Loop Engineering

Software engineering keeps changing names for the same old problems. Before 2023, change arrived with some level of stability; now it happens so fast that we do not really understand the problems anymore. We have seen this before with agile/scrum, microservices, and DevOps, but the loop has never been this fast.

First, we had scripting. Then automation. Then CI/CD. Then DevOps. Then Platform Engineering. Then AI Agents. Now we have Loop Engineering. Meanwhile, companies still have barely functional CI/CD due to a lack of incentives, poor vision, poor management, and other dysfunctions. With Loop Engineering, the name is new. The problem is not. The problem is: how do you make a machine do useful engineering work without babysitting every step? That’s it. But "solution" and "waste" manifest in the same way, using the same tools.

Loop Engineering is not just about prompting with a fancy jacket. It is not “write a better prompt.” It is not “make no mistakes.” It is not “let the agent run forever and pray.” Loop Engineering is designing the system that keeps the agent moving, checking, remembering, failing, recovering, and stopping.

The stopping part is the one people forget.

From Ralph to Loop Engineering

The scrappy but honest origin is the Ralph Loop. In “Ralph Wiggum as a software engineer”, Geoffrey Huntley described Ralph in the most brutally simple way possible: in its purest form, Ralph is a Bash loop. The canonical version is basically:

while :; do cat PROMPT.md | claude-code ; done

That’s it.

Not a distributed system. Not a PhD in architecture. Not a multi-agent mesh with six dashboards and a diagram that looks like Kubernetes got drunk. A loop. That is what makes it interesting, brilliant, and dumb all at once. The trick is not the bash. The trick is context reset. Which also kills caching.

A long chat session with an agent starts clean and then slowly becomes polluted. Failed attempts. Logs. Wrong assumptions. Half-fixed bugs. The model starts carrying the smell of its previous mistakes. You get context rot. Ralph says: kill the session. Start fresh. Read the repo. Read the plan. Continue from disk.

That is a very important idea.

In Huntley’s broader post, “everything is a Ralph Loop”, the idea becomes bigger than one bash command. The loop is a way to give the agent a durable goal, external state, and repeated execution without carrying all the conversational garbage from previous attempts.

  • The model is not the memory.
  • The repo is the memory.
  • Git is the memory.
  • The markdown file is the memory.
  • The ticket is the memory.
  • The test result is the memory.
  • The LLM is the processor. 
  • The filesystem is the durable state.
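A minimal sketch of that idea in Python. Assumptions: `run_agent` is a hypothetical callable standing in for whatever CLI or API you invoke, and `PROGRESS.md` is a file name invented for illustration (only `PROMPT.md` comes from the Ralph loop). Each pass builds its input from disk only, so no conversation survives between iterations.

```python
from pathlib import Path
from typing import Callable


def build_context(workdir: Path) -> str:
    """Assemble the agent input from files only; no chat history is carried over."""
    prompt = (workdir / "PROMPT.md").read_text()
    progress_file = workdir / "PROGRESS.md"  # hypothetical notes file
    progress = progress_file.read_text() if progress_file.exists() else ""
    return f"{prompt}\n\n# Progress so far\n{progress}"


def ralph_iteration(workdir: Path, run_agent: Callable[[str], str]) -> None:
    """One fresh-context pass: read state from disk, run, append the result to disk."""
    note = run_agent(build_context(workdir))
    with (workdir / "PROGRESS.md").open("a") as f:
        f.write(note.rstrip() + "\n")
```

The second pass sees what the first one wrote only because it is on disk, not because the model remembers it.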

Old computer science, rediscovered because LLMs are expensive goldfish. Remember that every problem can be solved with another layer of indirection? Remember people complaining that Kubernetes was complex? Well, now we will put loops on top of it :-)

The Rebrand

Then Ralph got a suit.


Tangent: This is a long-standing trend among AI labs: they take OSS ideas and projects and turn them into products. I have to admit, with a good level of success. Think openClaw and Claude Tag, Cursor threads, and literally all agentic non-CLI apps.

Addy Osmani’s post, “Loop Engineering”, gave the practice a cleaner vocabulary: instead of being the person who manually prompts the agent, you design the system that prompts, checks, retries, remembers, and stops. That framing is useful because it shifts the conversation away from “better prompt” and toward “better system.” The prompt is not the product anymore. The loop is the product.

Addy’s anatomy is also useful: automations, worktrees, skills, plugins/connectors, sub-agents, and external memory. In other words, the loop is not just the prompt. It is the runtime around the prompt.

Paddo captured the social and economic side of the rename very well in “We Stopped Calling It Ralph Wiggum”: the ugly “while true” hack became respectable, but the invoice, the verifier weakness, and the comprehension problem did not magically disappear. That is what naming does. “Ralph Wiggum” sounds like a hack. “Loop Engineering” sounds like a job title. Same animal. Better LinkedIn posture. Amazing take by Paddo here; I can only clap.

What is a Loop?

A loop is a success-condition machine. It runs until “done” evaluates true. That sounds powerful because it is powerful. It also sounds dangerous because it is dangerous.

A good loop has:

  • A clear goal
  • A durable state
  • An execution environment
  • A verifier
  • A budget
  • A stopping condition
  • A human escape hatch

Without those things, you don’t have loop engineering. You have autonomous token burning.

The basic shape is simple:

  • Read the goal
  • Read the current state
  • Pick the next task
  • Change the code
  • Run checks
  • Save progress
  • Commit or rollback
  • Repeat
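The steps above can be sketched as a small control loop. This is a sketch under assumptions, not a reference implementation: `apply_change` and `run_checks` are hypothetical callables for the agent and the verifier, and the state lives in memory here where a real loop would persist it to disk or git.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class LoopState:
    """Durable state; in practice this would be persisted to disk or git."""
    pending: List[str]
    done: List[str] = field(default_factory=list)
    rolled_back: List[str] = field(default_factory=list)


def run_loop(state: LoopState,
             apply_change: Callable[[str], None],
             run_checks: Callable[[str], bool],
             max_iterations: int = 10) -> LoopState:
    """Pick a task, change, check, then commit or roll back; repeat within a budget."""
    for _ in range(max_iterations):
        if not state.pending:               # goal reached: nothing left to do
            break
        task = state.pending.pop(0)         # pick the next task
        apply_change(task)                  # change the code
        if run_checks(task):                # run checks
            state.done.append(task)         # commit
        else:
            state.rolled_back.append(task)  # roll back and record the failure
    return state
```

Note that the iteration cap is part of the signature, not an afterthought: even the toy version has a way to stop.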

This is not magic. This is control systems applied to software work. The agent is not the point. The loop is the point. However, if the agent is weak, I would argue the loop can keep spending money forever, or at least waste it very fast and in a very distributed way once hundreds or thousands of people are running loops like this.

Loop Engineering is Harness Engineering in Motion

Loop Engineering is a subcategory of Harness Engineering. Harness Engineering is the discipline of building the environment around the model: tools, permissions, memory, prompts, policies, sandbox, observability, and human approval. Loop Engineering is what happens when that harness runs repeatedly toward a goal. The harness is the body. The loop is the heartbeat. This distinction matters because people keep talking like the breakthrough is the agent. It is not. The model matters, of course. Better models make the loop better. But the difference between a toy and a production workflow is usually not the model. It is the harness.

Meaning:

  • Can it run tests?
  • Can it isolate branches?
  • Can it avoid secrets?
  • Can it stop?
  • Can it explain what changed?
  • Can it rollback?
  • Can it ask for help?
  • Can it stay inside a budget?
  • Can it avoid deleting production?

That is engineering.

The Five Pieces

Addy’s five primitives are a good map, but I would frame them slightly differently.

1. Automation

Something needs to trigger the loop. Could be cron. Could be a webhook. Could be a failed CI job. Could be a Linear ticket. Could be a GitHub issue. Could be a human pressing “go.” The trigger matters because it defines the shape of the work. A scheduled loop is good for discovery and maintenance. A PR-triggered loop is good for review and repair. A ticket-triggered loop is good for bounded implementation. A random “go fix everything” loop is how you get garbage at scale.

2. Worktrees

Parallel agents need isolation. Running multiple agents in the same checkout is like letting five interns edit the same file over SSH. Maybe it works once. It won’t work as a system. Git worktrees are the boring primitive that makes this practical. Each loop gets its own branch, its own directory, its own mess. Isolation is not optional. Without isolation, multi-agent becomes multi-chaos.
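A small sketch of one worktree per loop, driven from Python. The `git worktree add -b <branch> <path> <commit-ish>` form is standard git; the `loop/<task>` branch naming and the sibling-directory layout are assumptions made up for this example.

```python
import subprocess
from pathlib import Path
from typing import List


def worktree_add_cmd(repo: Path, task_id: str, base: str = "main") -> List[str]:
    """Build `git worktree add -b <branch> <path> <base>` for one loop."""
    branch = f"loop/{task_id}"                     # hypothetical naming scheme
    path = repo.parent / f"{repo.name}-{task_id}"  # one sibling directory per loop
    return ["git", "-C", str(repo), "worktree", "add", "-b", branch, str(path), base]


def create_worktree(repo: Path, task_id: str, base: str = "main") -> None:
    """Run the command; raises CalledProcessError if git refuses."""
    subprocess.run(worktree_add_cmd(repo, task_id, base), check=True)
```

Building the command separately from running it keeps the naming rule easy to inspect and test without touching a real repository.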

3. Skills

Skills reduce intent debt. Intent debt is all the stuff you keep explaining to the agent because your organization never wrote it down.

Example guidance that goes in a skill:

  • “We use Java 26.”
  • “Don’t use Lombok.”
  • “Tests live here.”
  • “This service owns this boundary.”
  • “Don’t add another abstraction.”
  • “Run this script before claiming success.”
  • “This API is weird because of legacy clients.”

That knowledge should not live in your head. It should not live in a chat transcript. It should live in files the agent can read. CLAUDE.md, AGENTS.md, SKILL.md, README, docs, scripts, whatever. The exact filename is not the point.

The point is: if you need to repeat it, encode it.
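One way to make "encode it" concrete: a sketch that gathers whichever guidance files exist and hands them to the agent as one block. The file list mirrors the names mentioned above; `README.md` (with the extension) and the `## <name>` section headers are assumptions for illustration.

```python
from pathlib import Path
from typing import Iterable

GUIDANCE_FILES = ("AGENTS.md", "CLAUDE.md", "SKILL.md", "README.md")


def load_guidance(root: Path, names: Iterable[str] = GUIDANCE_FILES) -> str:
    """Concatenate the guidance files that exist, each under a header naming its source."""
    sections = []
    for name in names:
        path = root / name
        if path.is_file():  # missing files are skipped, not an error
            sections.append(f"## {name}\n{path.read_text().strip()}")
    return "\n\n".join(sections)
```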

4. Connectors

A useful loop needs tools. GitHub. Jira. Linear. Slack. CI. Logs. Browser. Databases. Feature flags. Docs. Cloud APIs. This is where power becomes risk. A loop that can only edit files is dangerous enough. A loop that can open PRs, comment on tickets, deploy code, touch cloud resources, and read Slack is a different category of animal.

What you need to be careful of is:

  • Connectors turn the agent from a writer into an actor.
  • Actors need permissions.
  • Permissions need limits.
  • Limits need logs.
  • Logs need humans.
  • Insufficient testing is a liability in the AI world.

This is not bureaucracy. This is blast-radius management: proper guardrails and due diligence.
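The chain "actors need permissions, permissions need limits, limits need logs" can be sketched as a tiny gate in front of every connector call. `ConnectorGate` and the connector and action names are hypothetical; the point is only that every decision, allowed or denied, leaves a line a human can read.

```python
from dataclasses import dataclass, field
from typing import List, Set, Tuple


@dataclass
class ConnectorGate:
    """Allowlist of (connector, action) pairs plus an audit trail for humans."""
    allowed: Set[Tuple[str, str]]
    audit_log: List[str] = field(default_factory=list)

    def authorize(self, connector: str, action: str) -> bool:
        """Deny by default; record every decision, not just the denials."""
        ok = (connector, action) in self.allowed
        self.audit_log.append(f"{'ALLOW' if ok else 'DENY'} {connector}.{action}")
        return ok
```

Usage: a loop that may open PRs but not deploy would be built with `ConnectorGate(allowed={("github", "open_pr")})`, and anything else is refused and logged.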

5. Sub-agents

The maker should not grade its own homework. One agent writes. Another agent checks. Maybe another one searches. Maybe another one runs security review. Maybe another one summarizes the diff for humans. This is useful.

But don’t over-romanticize it. AI reviewing AI is not the same as correctness. Two agents can share the same blind spot. Two agents can agree on the same wrong abstraction. Two agents can produce a very confident hallucination with better formatting.

Maker/checker is a good pattern. It is not a substitute for ownership.

The Non-Obvious Good Part

The best thing about Loop Engineering is not speed. Speed is the obvious part. The demo part. The Twitter part. The non-obvious part is that loops force engineering discipline.

Discipline is manifested as:

  • A loop needs tests. So suddenly tests matter.
  • A loop needs clear goals. So suddenly specs matter.
  • A loop needs memory. So suddenly documentation matters.
  • A loop needs repeatable commands. So suddenly build scripts matter.
  • A loop needs safe rollback. So suddenly deployment hygiene matters.
  • A loop needs a verifier. So suddenly “works on my machine” is not enough.

This is the funniest part of AI coding. People tried to skip engineering with AI, and AI dragged them back to engineering. A weak engineering culture gets punished by loops. A strong engineering culture gets amplified by loops.

That’s the real leverage.

The Non-Obvious Bad Part

The loop does not optimize your intent. The loop optimizes your verifier. This is the core problem.

If “done” means tests pass, the loop will make tests pass. That might mean solving the problem.

It might also mean weakening the test, hardcoding the output, swallowing the exception, deleting the edge case, or changing the assertion. This is the part Paddo nails in the rename-and-invoice critique: the loop cannot tell the difference between solving your problem and satisfying your check. Those are the same event to a loop. That sentence should scare people. Because most companies do not have strong verifiers. They have partial tests. Flaky CI. Weak specs. Missing product assertions. Security checks that run later. Observability gaps. Review processes based on vibes.

A loop will not fix that. A loop will exploit that. Not because it is evil.

Because optimization is literal.
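One cheap defense is to make "done" stricter than "tests pass". The sketch below treats a green run as done only if the loop did not edit test files to get there. The path conventions (`test`/`tests` directories, `test_` prefix, `_test` or `Test` suffix) are assumptions and a heuristic, not a guarantee; a change that legitimately adds tests would be flagged too and should go to a human.

```python
from pathlib import PurePosixPath
from typing import Iterable, List


def touched_tests(changed_paths: Iterable[str]) -> List[str]:
    """Return changed paths that look like tests, by directory or filename convention."""
    hits = []
    for raw in changed_paths:
        p = PurePosixPath(raw)
        in_test_dir = any(part in ("test", "tests") for part in p.parts[:-1])
        test_name = p.name.startswith("test_") or p.stem.endswith(("_test", "Test"))
        if in_test_dir or test_name:
            hits.append(raw)
    return hits


def is_done(tests_passed: bool, changed_paths: Iterable[str]) -> bool:
    """Green tests only count if the loop did not edit the tests to get there."""
    return tests_passed and not touched_tests(changed_paths)
```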

The Invoice

The second problem is cost.

Manual prompting has a natural governor: human attention. You type. You wait. You read. You think. You type again. That is slow, but it rate-limits the bill. A loop removes that governor. I'm not saying that approving every single step is efficient or even makes sense, but to Paddo's point, it is a rate limiter for sure. Every retry costs money. Every failed attempt costs money. Every sub-agent costs money. Every verification pass costs money. Every “let me inspect the codebase again” costs money.

Again, this is why Paddo’s piece is useful. It takes the discussion away from magic and into invoices.

A loop does not just automate prompting. It automates spending.

This is why “better loop” does not automatically mean “cheaper loop”:

  • A better loop might verify more.
  • A better loop might retry more.
  • A better loop might spawn more checkers.
  • A better loop might run longer.
  • A better loop might call stronger models.

That might be worth it. But it needs to be designed, measured, and budgeted. Otherwise, your architecture diagram has a hidden line item called “oops.”
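To put a number on "oops", here is a back-of-the-envelope estimator. It assumes the worst case where every agent call in every iteration uses the full token counts; the prices are parameters because they vary by model and provider, and the figures in the usage note are made up.

```python
def estimate_loop_cost(iterations: int,
                       agents_per_iteration: int,
                       input_tokens: int,
                       output_tokens: int,
                       usd_per_million_input: float,
                       usd_per_million_output: float) -> float:
    """Worst-case spend: iterations x agents x cost of one full-size call."""
    per_call = (input_tokens * usd_per_million_input
                + output_tokens * usd_per_million_output) / 1_000_000
    return iterations * agents_per_iteration * per_call
```

With hypothetical prices of 3 USD per million input tokens and 15 USD per million output tokens, 20 iterations of 3 agents, each reading 100,000 tokens and writing 5,000, come to 60 calls at 0.375 USD, or 22.50 USD for one run of one loop. Multiply by the number of loops and the number of people running them.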

Comprehension Debt

Technical debt is when the system becomes harder to change. Let's face it: companies have never paid much attention to technical debt. Over the years, even the ones that really cared, like Meta, stopped caring. So, no, this is not an AI problem; it's a human problem rooted in a lack of vision and a lack of lean principles. However, AI can accelerate the cure or the destruction, depending on what you choose to do with it.

Comprehension debt is when the system becomes harder to understand. Loops can create comprehension debt very fast. Imagine you wake up, and the loop opened 12 PRs. CI is green. Tests pass. The descriptions look good. The code looks plausible. Do you understand the system better? Maybe not. Maybe the codebase moved faster than your mental model. That is dangerous. Velocity can go up while ownership goes down. DORA metrics can look fine while comprehension collapses.

This is why I don’t buy the naive “AI will replace engineers” take. The more code agents produce, the more important judgment becomes. The bottleneck moves from generation to review, architecture, verification, and governance.

We automated typing. We did not automate understanding. We did not automate value!

Overbaking

A loop without a stop condition is a junior engineer with infinite coffee and your credit card. It will keep going. And because LLMs are trained to be useful, they will find more usefulness to perform. Refactor this. Improve that. Add docs. Add abstractions. Add support for things nobody asked for. This is overbaking. It happens because the loop does not naturally know “enough.” Humans are lazy in useful ways. Humans get tired. Humans ask “why are we doing this?” Humans stop. Loops don’t stop unless you design stopping.

To mitigate this, design the stop explicitly:

  • Max iterations.
  • Max cost.
  • Max time.
  • Max files changed.
  • Max diff size.
  • Max retry count.
  • Max permission level.
  • Stop on ambiguity.
  • Stop on repeated failure.
  • Stop on unclear ownership.
  • Stop before production.

The stop condition is not a detail. It is the product.
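A few of the limits above, written down as a policy object the loop consults before every iteration. The class name, field names, and default numbers are all hypothetical; pick limits that match your own budget and risk.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StopPolicy:
    max_iterations: int = 20
    max_cost_usd: float = 10.0
    max_seconds: float = 3600.0
    max_files_changed: int = 30
    max_consecutive_failures: int = 3

    def reason_to_stop(self, iterations: int, cost_usd: float, elapsed_seconds: float,
                       files_changed: int, consecutive_failures: int) -> Optional[str]:
        """Return the first limit that was hit, or None if the loop may continue."""
        if iterations >= self.max_iterations:
            return "max iterations"
        if cost_usd >= self.max_cost_usd:
            return "max cost"
        if elapsed_seconds >= self.max_seconds:
            return "max time"
        if files_changed > self.max_files_changed:
            return "max files changed"
        if consecutive_failures >= self.max_consecutive_failures:
            return "repeated failure"
        return None
```

Returning a reason instead of a bare boolean means the loop can tell the human why it stopped, which is half of the escape hatch.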

Security

Loop Engineering is also an infosec problem. Maybe mostly an infosec problem. A normal LLM chat can hallucinate dangerous advice. Bad, but bounded. An agent with tools can execute dangerous actions. A loop with tools can execute dangerous actions repeatedly.

Now add MCP. Add Slack. Add GitHub. Add browser. Add shell. Add package managers. Add cloud credentials. Add issue trackers full of untrusted text. Add dependencies with README files that agents read.

Congratulations, you built a prompt-injection supply chain. What we can learn is:

  • This is why sandboxing matters.
  • This is why least privilege matters.
  • This is why secrets isolation matters.
  • This is why allowlists matter.
  • This is why human approval matters.
  • This is why “just trust the model” is not an engineering strategy.

Do not ask the LLM to be safe (prompt: “Make no mistakes”). Build a system where it has less room to be unsafe.
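A sketch of "less room to be unsafe" for one narrow case, secrets isolation: tools launched by the agent get an allowlisted environment and no shell. `SAFE_VARS` is a hypothetical allowlist; this covers environment variables only and is not a sandbox by itself.

```python
import os
import subprocess
from typing import Dict, Iterable, Mapping, Sequence

SAFE_VARS = ("PATH", "HOME", "LANG")  # hypothetical allowlist


def scrubbed_env(env: Mapping[str, str], allow: Iterable[str] = SAFE_VARS) -> Dict[str, str]:
    """Keep only allowlisted variables so tokens and keys never reach the tool."""
    allowed = set(allow)
    return {k: v for k, v in env.items() if k in allowed}


def run_tool(argv: Sequence[str],
             env: Mapping[str, str] = os.environ) -> subprocess.CompletedProcess:
    """Run a tool from an argument list (no shell) with the scrubbed environment."""
    return subprocess.run(list(argv), env=scrubbed_env(env),
                          capture_output=True, text=True)
```

Passing an argument list instead of a shell string also means injected text cannot smuggle in `;` or `&&` as extra commands.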

When Loops Work

Loops are great when the work is:

  • Bounded
  • Repetitive
  • Testable
  • Low-risk
  • Easy to rollback
  • Easy to verify
  • Annoying for humans
  • Valuable when done many times

Good examples:

  • Fix flaky tests
  • Upgrade dependencies
  • Add missing tests
  • Port mechanical code
  • Run lint repair
  • Triage CI failures
  • Update docs from source
  • Open PRs for simple bugs
  • Apply known refactoring patterns
  • Add observability to known paths

This is where loops shine. Not because the agent is a genius. Because the environment gives the agent rails.

When Loops Suck

Loops are bad when the work is:

  • Ambiguous
  • Political
  • Product-heavy
  • Architecture-heavy
  • High-risk
  • Poorly tested
  • Hard to rollback
  • Dependent on taste
  • Dependent on user judgment

Bad prompts:

  • “Make the product better”
  • “Improve the UX”
  • “Refactor the system”
  • “Make it scalable”
  • “Fix all tech debt”
  • “Redesign checkout”
  • “Modernize the architecture”
  • “Make this enterprise grade”

Those are not loop goals. Those are executive wishes dressed as prompts.

The loop will still produce something. That’s the danger. It will produce something with confidence, commits, and a nice summary. Let's all remember: output is not outcome.

The Real Job

Loop Engineering does not eliminate engineers. It reduces some typing and increases system design.

The engineer now has to design:

  • The goal
  • The verifier
  • The memory
  • The permissions
  • The sandbox
  • The budget
  • The stop condition
  • The review path
  • The rollback path
  • The human escalation path

That is not less engineering. That is more engineering concentrated into the harness. This is the part I think many people miss. The future is not “AI writes code and humans disappear.” The future is “humans design bounded execution systems, and agents operate inside them.”

That is a very different claim.

Final Thoughts

Loop Engineering is here and trending. Huntley proved the primitive. Ralph showed the dumb thing that works. Addy gave the practice a clean vocabulary. Paddo gave the useful critique: the rename legitimizes the pattern, but the invoice, verifier, and comprehension problem do not go away.

My view is simple:

  • Loop Engineering is Harness Engineering in motion.
  • The loop is not the magic.
  • The model is not the magic.

The magic, when it works, is the boring engineering around it: tests, state, isolation, permissions, budgets, logs, and humans who still understand the system.

Good engineering gets amplified. Bad engineering gets amplified too. AI doesn't fix broken engineering culture. That is the whole game. So yes, experiment with loops. But like an engineer. Not like someone trying to avoid engineering.

Huge shoutout to Addy Osmani and Paddo for shaping the industry conversation around this. Addy gave us the technical primitives, while Paddo brought the essential, unvarnished truth about the invoices and comprehension debt—thank you both for the inspiration behind this piece.

Cheers,

Diego Pacheco
