Harness Engineering

Harness Engineering is pretty trendy at the moment. Harness engineering is a way to better drive or operationalize an LLM model. The idea is that you are renting an LLM model as a service, but you own the harness. The harness is a way to be less dependent on the model (LLM). LLMs are not deterministic at all, and they are not general intelligence; they are pretty limited to their training data. Inference cost is very expensive, and the era of subsidizing is over.  AI was sold as a promise to solve engineering and elevate us to a new level of abstraction, and so far thats far from being a reality. I keep hearing people say "wait two years" every year. More and more engineers spend more time trying to drive LLMs to produce the right code. When that is well executed, we can see 10-30 % productivity gains across our industry. Such good execution results from proper testing, robust CI/CD, great automation, amazing observability, and attention to technical excellence. When that is poorly executed due to a lack of proper technical due diligence, we see slop disasters even in big tech. Harness engineering is not the fix to AI Slop or lack of determinism, but so far, it is the best tool we have to make it less wrong. 

What is a Harness? 

Let me answer that with another question. What is an AI Agent? I would say it's a set of instructions to achieve a particular task. When those instructions get more elaborate, or you have code, you get a harness. A harness is how complex agents get implemented. 

The reason I say complex is that a harness is far beyond a simple 2-line prompt and more advanced than context engineering; it's not just about how to properly use and manage the context window. It's about engineering, orchestration, integration, and the addition of determinism. Adding determinism does not mean we make LLM deterministic, but when the harness calls a tool thats engineering and therefore deterministic.  

Harnesses can vary in engineering level and complexity. But we need to keep in mind that a 2-10 line prompt is not a harness. We need more rules, structure, design, and even architecture, I would argue, to call something a harness.
The simple form of a harness could be a skill, which is a markdown file that can actually be used across many harnesses, like Claude code, Codex, Opencode, etc. Engineering can be added on demand, starting with pre-backed scripts and instructing the LLM to call the scripts. We can also implement a full-blown harness using programming languages such as Java, Scala, or even Rust. Rust is a great language to write a harness. Funny fact that most of the harnesses today are written with Typescript and Ink(A form of React for the terminal). IMHO, the wrong tool for the job. 

Programming languages are great for writing orchestrators (another term for a harness). How you solve the problem makes all the difference, and that's design and architecture. No matter how good the harness is, it cannot be 100% slop proof because they use LLMs at the end of the day.

Common Popular Harness 

There are a lot of harnesses out there. Claude Code and Codex (You might have heard them be called AI Coding Agents as well) are the most popular and are used massively nowadays. More and more companies are building their own harness. 

Common harness is a way to make LLMs more efficient than traditional chat applications like ChatGPT or even the first generation of tools like GitHub Copilot. You can consider such a harness like Claude Code or Codex, Harness of Operational Systems for Agents. 

More and more companies are creating their own harness, not only to compete but also to better steer the LLMs. LLMs cannot do it all by themselves; they need rules, scripts, and applications. Such applications and rules are often engineered inside the harness. 

Harness Anatomy

Harness can do whatever you want if you use a coding language. Considering the popular harness, they are not so different and have lots of common elements like for instance: 

LLM: All harness remote LLM model APIs or use a local model with Ollama or another form of local AI. LLM is never embedded in the harness. It's either remote or running locally on your machine, but not embedded in the harness. 

The Core Loop: Harness is an orchestration machine. So they have a main loop where they read instructions for the user, call the LLM, integrate and facilitate tools calls. Parse outputs and results; format data to send back and forth to the LMM. IF you are curious about how this works, you can check my Claudio Coda POC, where I implemented a harness from scratch in Rust with tools.

Tools: Harness has tools support, the most common tools are bash execution related tools like creating files, reading files, executing scripts. Here is where we absolutely need engineering: LLMs are text-based and not compilers, so we need a programming language to do all that. LLMs can generate a bash script, but you need to create the file in FS, execute it, and send the result back to the LLM. 

Memory: Harness needs to store data; usually, this is done with plaintext files like .txt or .markdown. Common memory files are CLAUDE.MD, AGENTS.md, or even whatever custom MD file you create. Memory can be long-term and short-term. RAG is a form of long-term memory. Vector Databases could be another form of persistent long-term memory. 

State Storage: Harness can use the file system and simply write json files there as a form of communication with subagents and state persistence. There are a bunch of projects trying to provide a virtual file system interface and, in the end, persist to a traditional relational database or even a NoSQL database. Like My Redis FS POC.

Sandbox: This is usually optional, but Harness can also run in a sandbox, providing additional protection since security should be layered. Most of the harnesses are not super strong on this one, and this is usually handled by a 3rd-party solution or even an AI Agent Gateway like Portkey or LiteLLM.

Harness Patterns

There are many patterns popping up for building a coding harness and advanced skills. For instance: 

Progressive Disclosure: Instead of feeding a lot of text to the LLM up front, you give the model a pointer. For instance, if you need a JavaScript linter, read linter-js.md file. This way, MD files or system prompts get very lean and LLM load files on demand as needed. Such a pattern is key for proper context window management. 

The Advisor Pattern: This pattern is good because it separates decision-making from execution. You could spin a smaller model to do a simple task, and then the result could be orchestrated by a big model and aggregated into a more complex task. Which, again, is a good way to manage the proper context window. The big model plans a trip to France, and the small model finds 5-10 restaurants near the tour location.

Scape Hatch: LLMs are people-pleasers, and that is not good. Senior engineers are annoying and ask Why all the time and push back. If you ask the LLM to do something, it will do it, even when it's the wrong thing to do. So you need to give the LLM a way to escape, that's the escape hatch. Imagine a code review skill where the agent offers options: accept the agent-produced review or let the user write their own. 

IF you are interested in Harness Engineering patterns, I'm building a catalog. I have 46 Patterns now; check them out here

Complexity

Harnesses are getting bigger and more complex every day. There are many analyses like this on the internet. When the harness is closed source, it's much harder to debug, troubleshoot, and even know what's going on. Harnesses are becoming big monoliths. 

Beyond the complexity of the harness itself, there is the need to ensure the model does the right thing. Engineers are spending more and more time driving the harness in the right direction. Code produced by harness gets a lot of SLOP if you are not reading and not thinking about how to use it. AI isn't a genie you can just say, "Build a great system with no mistakes, and it's done." AI might be the biggest con artist in history. It can only do things humans can do; it's faster-ish but not better. There is still a lot to learn, but one thing is true: token usage is getting out of hand, and we need to remember that token maxing can also be wasteful. Turns out AI is a tool. Engineers are good at doing engineering. Harness engineering is a way to do more engineering and get more sanity.

Cheers,

Diego Pacheco

Popular posts from this blog

Cool Retro Terminal

GIT based wiki with Gollum

Having fun with Zig Language