Hardening Production

Successful companies often grow significantly, and that growth shows up in structure: more people, more departments, more managers, more coordination, and often more overlap. Enterprise companies always end up with some form of governance issue. I remember that at the beginning of the SOA era, "Governance" was a very bad word, and there was abuse. After three waves of Agile, we have much less "Governance", yet it looks like we still have plenty of abuse. Abuse can also be called WASTE (a Lean concept). Some rules really can promote best practices and better software, for example: do not share your internal data stores, and expose data via common interfaces such as HTTP or gRPC.

Systems tend to grow and get more complex as companies grow with them. Every single improvement (refactoring) often means more investment ($$$). You could call this business entropy: as time passes, it only gets worse (more complex and more expensive). When we analyze refactorings individually (by business impact), they might not make sense, but that cannot be the only measure; otherwise, entropy will kill you in the long run. So there is a complex game of chess going on. Bad code and bad design are not the only sources of entropy; tests can be big offenders too.

Unit Tests Can Have Waste

People often value unit tests as if they were the best kind of testing technique. IMHO that happens for a couple of reasons. One is that legacy applications, often legacy web apps and desktop apps, lacked unit tests, so it's natural to say "unit test all the things." For the frontend, there are better ways to test and get value, such as doubling down on linters and static typing (TypeScript, for instance). Also for the frontend, you can double down on Snapshot Testing with tools like Jest. On the backend, however, unit tests have more value and are more important. We need to understand that unit tests can have waste: flaky tests that break all the time because of bad practices, coupling to external things like database IDs, or plain mocking abuse. One thing companies always want to do is raise the bar, which makes sense; however, that bar often means more coverage.
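To make the database-ID coupling concrete, here is a minimal sketch in Python, assuming a hypothetical `UserRepository` backed by an in-memory SQLite database (the class and the `check_*` functions are illustrative names, not from any real codebase). The brittle check asserts on the generated ID, so it breaks as soon as something else inserts a row first; the robust check asserts on behavior instead.

```python
import sqlite3
from typing import Optional


class UserRepository:
    """Hypothetical repository used only for illustration."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self.conn = conn
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS users ("
            "id INTEGER PRIMARY KEY AUTOINCREMENT, name TEXT NOT NULL)"
        )

    def add(self, name: str) -> int:
        cur = self.conn.execute("INSERT INTO users (name) VALUES (?)", (name,))
        return cur.lastrowid

    def find_name(self, user_id: int) -> Optional[str]:
        row = self.conn.execute(
            "SELECT name FROM users WHERE id = ?", (user_id,)
        ).fetchone()
        return row[0] if row else None


def check_add_user_brittle(repo: UserRepository) -> None:
    # Wasteful: assumes the database hands out id 1. It flakes as soon as
    # another test, a seed script, or a different test order inserts first.
    user_id = repo.add("alice")
    assert user_id == 1


def check_add_user_robust(repo: UserRepository) -> None:
    # Better: assert on behavior (a round trip), not on the generated id.
    user_id = repo.add("alice")
    assert repo.find_name(user_id) == "alice"


if __name__ == "__main__":
    repo = UserRepository(sqlite3.connect(":memory:"))
    repo.add("seed")             # simulates a row left behind by another test
    check_add_user_robust(repo)  # passes regardless of prior data
    try:
        check_add_user_brittle(repo)
    except AssertionError:
        print("brittle check failed: it depended on the generated id")
```

The robust check keeps passing no matter what data already exists, which is exactly the property you want from a test that runs in a shared environment.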

All metrics can be gamed 

That's the issue: as you ask for more coverage, the goal becomes producing more tests, which does not necessarily mean producing better tests or making sure there is no waste in them. Whatever number you target (100%, 90%, 80%, 70%), you might have WASTE, but higher numbers have a higher chance of hiding it. Should we have no metrics for tests, then? Well, that's a hard problem. Since test quality is subjective, it is really hard to measure, and really hard to get a clear, safe, and universal signal.
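A small illustration of how coverage can be gamed, assuming a hypothetical `apply_discount` business rule. Both tests below execute every line and both branches, so a coverage tool would report the same number for either one, but only one of them would catch a bug.

```python
def apply_discount(price: float, is_member: bool) -> float:
    """Hypothetical business rule: members get 10% off."""
    if is_member:
        return round(price * 0.9, 2)
    return price


def test_gamed_coverage() -> None:
    # Runs every line and both branches, so line and branch coverage hit
    # 100%, yet nothing is asserted: a wrong discount would still pass.
    apply_discount(100.0, True)
    apply_discount(100.0, False)


def test_meaningful() -> None:
    # Identical coverage, but the assertions pin down the actual behavior.
    assert apply_discount(100.0, True) == 90.0
    assert apply_discount(100.0, False) == 100.0
```

If someone changed `0.9` to `0.5`, `test_gamed_coverage` would still pass while `test_meaningful` would fail. The coverage number cannot tell these two tests apart.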

The Issue with Mocks

The issue is that people end up abusing mocks. Either you use the real thing, in which case you are doing a real integration or E2E test, or you want to test something else, and mocks are a way to isolate your code from other components so you can test it in isolation. That is fine; however, most of the time people end up testing the mocks themselves. There is no value in testing mocks. Mocks are also a form of coupling: if you mock a shared library, those mocks can become a huge obstacle when you need to migrate or patch that library.
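The difference between testing a mock and using a mock for isolation, sketched with Python's standard `unittest.mock`. `CheckoutService` and the gateway's `charge` method are hypothetical names chosen for illustration.

```python
from unittest.mock import Mock


class CheckoutService:
    """Hypothetical service that depends on an external payment gateway."""

    def __init__(self, gateway) -> None:
        self.gateway = gateway

    def checkout(self, amount_cents: int) -> str:
        if amount_cents <= 0:
            raise ValueError("amount must be positive")
        receipt = self.gateway.charge(amount_cents)
        return f"paid:{receipt}"


def test_that_only_tests_the_mock() -> None:
    # Waste: configures the mock, then asserts the mock returns what it was
    # told to return. CheckoutService is never exercised at all.
    gateway = Mock()
    gateway.charge.return_value = "r-1"
    assert gateway.charge(500) == "r-1"


def test_that_uses_the_mock_for_isolation() -> None:
    # The mock only stands in for the external gateway; the assertions
    # target CheckoutService's own logic and its interaction with the gateway.
    gateway = Mock()
    gateway.charge.return_value = "r-1"
    service = CheckoutService(gateway)
    assert service.checkout(500) == "paid:r-1"
    gateway.charge.assert_called_once_with(500)


def test_rejects_invalid_amount() -> None:
    # Our own validation rule: the gateway must never be called.
    gateway = Mock()
    service = CheckoutService(gateway)
    try:
        service.checkout(0)
    except ValueError:
        pass
    else:
        raise AssertionError("expected ValueError")
    gateway.charge.assert_not_called()
```

The first test would keep passing even if `CheckoutService` were deleted, which is a good smell check for mock abuse.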

More Diversity (CI/CD, Observability, More types of tests) 

IMHO there are other things we can do besides adding more coverage. One is having a working CI/CD pipeline, which increases ownership: when the code can potentially be deployed at any time (readiness), you improve the health of the codebase, because the build and the tests must pass at all times. The issue with release trains or release calendars is that there are often components you are not deploying, so people end up not caring, and the result is broken builds and tests that do not pass. With CI/CD there is no such thing. Not having CI/CD means you will tolerate much more waste, which is bad.

Observability, the ability to understand what's going on, is also a huge improvement, and observability is a form of testing too. So it's another dimension that can be explored. Finally, I would say you can look for more test diversity, such as Mutation Testing, Property-Based Testing, Snapshot Testing, Chaos Engineering, and Stress Testing. These other forms of testing allow you to do more with less and to uncover issues you might not catch with regular unit or integration tests.
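A minimal property-based testing sketch. Real projects would usually reach for a library such as Hypothesis (Python) or QuickCheck (Haskell), which also shrink failing inputs to a minimal counterexample; to stay self-contained, this version hand-rolls a tiny random-input checker with no shrinking. The run-length `encode`/`decode` pair and `check_property` are hypothetical, chosen only because they have easy-to-state properties.

```python
import random
from typing import Callable, List, Tuple


def encode(s: str) -> List[Tuple[str, int]]:
    """Run-length encoding: 'aaab' -> [('a', 3), ('b', 1)]."""
    out: List[Tuple[str, int]] = []
    for ch in s:
        if out and out[-1][0] == ch:
            out[-1] = (ch, out[-1][1] + 1)
        else:
            out.append((ch, 1))
    return out


def decode(pairs: List[Tuple[str, int]]) -> str:
    return "".join(ch * n for ch, n in pairs)


def check_property(prop: Callable[[str], bool], runs: int = 500, seed: int = 42) -> None:
    """Feed random strings to `prop`; raise on the first counterexample."""
    rng = random.Random(seed)  # fixed seed keeps the check reproducible
    for _ in range(runs):
        s = "".join(rng.choice("abc") for _ in range(rng.randint(0, 20)))
        if not prop(s):
            raise AssertionError(f"property failed for input {s!r}")


def round_trips(s: str) -> bool:
    # Property 1: decoding an encoding returns the original string.
    return decode(encode(s)) == s


def no_adjacent_duplicate_runs(s: str) -> bool:
    # Property 2: two neighbouring runs never share the same character.
    runs = encode(s)
    return all(a[0] != b[0] for a, b in zip(runs, runs[1:]))


check_property(round_trips)
check_property(no_adjacent_duplicate_runs)
```

Instead of hand-picking a few examples, each property is checked against hundreds of generated inputs, including the empty string and long repeated runs that people rarely write by hand.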

Hardening Production: The Whole Point

Finally, I would say it does not matter if it works on your machine. The whole point is for it to work in production. Working in production means testing in production: being able to deploy software there and test it with real traffic and real production hardware without impacting the user experience. So we need isolation and automation mechanisms in production to deal with that. We need to understand that production is a shared responsibility and everyone should be held accountable for it. We want to harden production because that is where the value is and where we deliver value to customers. Production is the best place to invest in improvements.
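One common isolation mechanism for testing in production is a canary release: route a small, stable slice of real traffic to the new version and roll back automatically if it misbehaves. The sketch below assumes user-ID-based bucketing and a simple error-rate comparison; `bucket`, `route`, `should_rollback`, and the 1% tolerance are hypothetical choices, not any specific product's API.

```python
import hashlib


def bucket(user_id: str, buckets: int = 100) -> int:
    """Map a user id to a stable bucket in [0, buckets).

    SHA-256 is used because Python's built-in hash() is randomized per
    process for strings, which would reshuffle users on every restart.
    """
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % buckets


def route(user_id: str, canary_percent: int) -> str:
    """Send roughly canary_percent% of users to the new version.

    The same user always lands on the same side, so their experience stays
    consistent, and raising the percentage only ever adds users to the canary.
    """
    if not 0 <= canary_percent <= 100:
        raise ValueError("canary_percent must be between 0 and 100")
    return "canary" if bucket(user_id) < canary_percent else "stable"


def should_rollback(canary_errors: int, canary_requests: int,
                    stable_errors: int, stable_requests: int,
                    tolerance: float = 0.01) -> bool:
    """Automated guard: roll back if the canary's error rate exceeds the
    stable version's error rate by more than `tolerance`."""
    if canary_requests == 0 or stable_requests == 0:
        return False  # not enough data to decide yet
    canary_rate = canary_errors / canary_requests
    stable_rate = stable_errors / stable_requests
    return canary_rate - stable_rate > tolerance


if __name__ == "__main__":
    users = [f"user-{i}" for i in range(10_000)]
    share = sum(route(u, 10) == "canary" for u in users) / len(users)
    print(f"canary share at 10%: {share:.3f}")  # close to 0.10, not exact
```

Comparing against the stable version's live error rate, rather than a fixed threshold, is one way to avoid rolling back because of background noise that affects both versions equally.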

Cheers,
Diego Pacheco
