Posts

Harness Engineering

Harness Engineering is pretty trendy at the moment. Harness engineering is a way to better drive or operationalize an LLM. The idea is that you are renting an LLM as a service, but you own the harness. The harness is a way to be less dependent on the model. LLMs are not deterministic, and they are not general intelligence; they are largely limited to their training data. Inference is very expensive, and the era of subsidizing is over. AI was sold as a promise to solve engineering and elevate us to a new level of abstraction, and so far that's far from being a reality. I keep hearing people say "wait two years" every year. More and more engineers spend more time trying to drive LLMs to produce the right code. When that is well executed, we can see 10-30% productivity gains across our industry. Such good execution results from proper testing, robust CI/CD, great automation, amazing observability, and attention to technical excellence. When that is...

Multi-Agent Systems and AI Transformations

AI everywhere, agents everywhere. We just finished the first quarter of 2026, and a lot happened in those first three months. It feels like three years have passed, not three months. Opus 4.6 really changed the game, but software engineering and distributed systems are not solved by agents; we saw hype on top of hype, with huge, unrealistic expectations that need to be dialed back and properly adapted. The less you know about AI and agents, the more impressed you are, and no, you cannot get rid of all engineers. AI alone does nothing and cannot self-verify to the point that systems of systems can be completely automated and run hands-off. Maybe we will get there one day, but we are not there, and no matter what people say, no one can predict when this will happen. It could be 30 years or even more. Karpathy already said it: zero to demo is easy. Demos are not impressive anymore. Zero to production is still a very different story. Multi-Agent Systems: Process as Agents Pad...

You can't fix code review with code review

Engineers never liked doing code reviews, and the more files a change had, the fewer reviews you got. That was so 2023. Today, in 2026, AI coding agents generate most of the code, and the code review problem is much worse than it ever was. Paddo captures very well the disaster that is Amazon Kiro and Spec-Driven Development. Everybody believes that code review is a bottleneck; let's be honest about the anti-pattern of vibe coding: when speed beats safety, bad things happen. Many companies are seeing twice as many incidents, including Microsoft GitHub and many others. Safety needs to come first and speed next, not the other way around. Industrial Logic got that right years ago with Modern Agile. Modern Agile was a second take on the agile movement with the addition of modern concepts. That is not new; in fact, Modern Agile was created back in 2016. One of its principles was "Make safety a prerequisite". More AI: Means more things to review Many companies and people b...

Agent Skill in Multi-Agent Systems

People building agents today are mostly doing one-shot, meaning they write the agent once and that's it. Yesterday, I was watching the YC Lightcone podcast "Inside Claude Code With Its Creator Boris Cherny", and one of the things Boris, creator of Claude Code and head of Claude Code at Anthropic, said is that they delete the CLAUDE.md a lot because they want the new models to take over. That insight tells us that we cannot just settle for whatever prompts we have. Besides that, depending on how we write the prompt, we might use more or fewer tokens; there are ways to better structure agents, workflows, and skills. For this blog post, I will cover some lessons learned while building and improving agents, workflows, and skills. I did a bunch of experiments; in fact, I wrote 7 incarnations of my agent skill. To test the agent's skill, I asked the agent to build a Twitter-like application so I could evaluate the quality of the code and solution as a proxy for the agent's s...

AI coding Agents Evolution

AI coding agents like Claude Code, OpenAI Codex, and Gemini CLI have disrupted how software engineering is done. IMHO, the most disruptive agents are Claude Code and Codex. A lot has already happened: some progress has been made, and the space keeps evolving. We saw the birth of custom agents and subagents to avoid passing the whole context window down, and custom commands to gain more control over a workflow or over when a specific task is executed. Hooks add more determinism and make sure tests and linters are executed as part of the guardrails. From the explosion of MCPs to Multi-Agent Systems, many interesting changes have happened; we learned some things, while others are still to be learned. For this blog post, I will cover some of the evolution in AI coding agents (mainly around Claude Code). I did a lot of POCs with agents, 74 agent-related POCs at the moment. One thing I keep saying is that POCs are getting expensive, now not ...

AI Agent Infrastructure

One does not simply use AI agents in production. Before using AI agents in production, we need to understand that LLMs are token prediction machines and by nature are non-deterministic. No matter how good your specs are, AI will drop packages and make mistakes. Lack of determinism is just one aspect we need to keep in mind. We also need to keep in mind that it's very easy to jailbreak the models. Exposing a chatbot directly to customers has dangers, not only in a security sense but also in terms of misuse and potential legal problems. Even if that is all somehow managed and risk is minimized with proper guarantees, one still does not just use agents in production. 15-20 years ago, we would not just deploy APIs to production; we would use an API gateway. Considering agents and LLMs, we need the same: an AI gateway infrastructure. What happens if your API provider (Anthropic, Google, or OpenAI, for instance) is down? Is your business down?
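
To make the "provider is down" question concrete, here is a minimal sketch of the failover part of an AI gateway. Everything in it is an assumption for illustration: the `Provider` and `AIGateway` names are hypothetical, and the `down` and `echo` functions are stand-ins for real provider clients, not any vendor's API. A real gateway would also handle things like rate limiting, auth, and logging, which are left out here.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


class AllProvidersFailed(RuntimeError):
    """Raised when every configured provider fails for a request."""


@dataclass
class Provider:
    name: str                       # label only, e.g. "primary"
    complete: Callable[[str], str]  # hypothetical: prompt in, completion out


class AIGateway:
    """Single entry point that tries providers in priority order."""

    def __init__(self, providers: Sequence[Provider]) -> None:
        if not providers:
            raise ValueError("at least one provider is required")
        self.providers = list(providers)

    def complete(self, prompt: str) -> tuple[str, str]:
        """Return (provider_name, completion) from the first provider that succeeds."""
        errors: list[str] = []
        for provider in self.providers:
            try:
                return provider.name, provider.complete(prompt)
            except Exception as exc:  # a real gateway would catch narrower errors
                errors.append(f"{provider.name}: {exc}")
        raise AllProvidersFailed("; ".join(errors))


def down(prompt: str) -> str:
    # Stand-in for a provider outage.
    raise ConnectionError("provider unavailable")


def echo(prompt: str) -> str:
    # Stand-in for a healthy provider.
    return f"echo: {prompt}"


gateway = AIGateway([Provider("primary", down), Provider("fallback", echo)])
used, answer = gateway.complete("hello")
```

With this setup the request is served by the fallback provider even though the primary one is down, so the caller never talks to a single vendor directly.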

State Induction

Imagine you are coaching a basketball team. You want to train your team to be good at 2-point shooting from the inside. Now imagine that, for some weird reason, you can't practice that directly, and you need to play a whole four-quarter basketball game in order to maybe, with a lot of luck, score 2 points. That would suck, right? It sounds insane, because we all know we can skip the whole game and just train 2-point shots from the inside, right? Well, what if I told you that the basketball game is often how people test software: they cannot train exact scenarios (State Induction), so they actually need to test the whole thing (expensive E2E testing). What if we could write tests in a very different way, one that would allow massive parallelism, where multiple people could test the same thing at the same time and it would still work?
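
As a rough illustration of the analogy, and only one possible reading of State Induction, the sketch below contrasts reaching a state by playing the whole flow with starting a test directly in that state. The `Order` class and its statuses are hypothetical and exist only for this example.

```python
from dataclasses import dataclass, field


@dataclass
class Order:
    # Hypothetical domain object; the names here are illustrative only.
    items: list[str] = field(default_factory=list)
    status: str = "NEW"

    def add_item(self, item: str) -> None:
        self.items.append(item)

    def checkout(self) -> None:
        if not self.items:
            raise ValueError("cannot check out an empty order")
        self.status = "CHECKED_OUT"

    def pay(self) -> None:
        if self.status != "CHECKED_OUT":
            raise ValueError("order must be checked out before payment")
        self.status = "PAID"

    def ship(self) -> None:
        if self.status != "PAID":
            raise ValueError("only paid orders can be shipped")
        self.status = "SHIPPED"


def test_ship_via_full_flow() -> None:
    # "Play the whole game": walk every step just to reach the PAID state.
    order = Order()
    order.add_item("book")
    order.checkout()
    order.pay()
    order.ship()
    assert order.status == "SHIPPED"


def test_ship_with_induced_state() -> None:
    # "Train the shot": start directly in the state under test.
    order = Order(items=["book"], status="PAID")
    order.ship()
    assert order.status == "SHIPPED"


test_ship_via_full_flow()
test_ship_with_induced_state()
```

Because the second test builds its own isolated state and shares nothing with other tests, many such tests could in principle run at the same time without stepping on each other.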