Going Faster with Testing

Traditional management says that quality changes as we increase or decrease scope. That idea comes from the traditional management triangle (Scope, Time, and Price, with Quality in the middle). This idea is completely wrong. Quality does not make us go slower. Actually it is the opposite: it makes us go much faster. There are 2 important questions: A) Why do we need to go faster? B) What does faster mean?

Companies want to innovate, grow, and increase revenue and market share. Most companies today have realized they won't achieve that with their current mindsets, skill sets, and technology. So there is a rush for acquisition. Companies need to acquire: 1) New Business Capabilities 2) Talent Density 3) Better ways to build digital products and innovate. However, there is something wrong. For some unknown reason, we are often stuck in old mindsets like: only deliver features, follow plans, keep the current process and values.

Great, but what is the relation between the business mindset and Testing? Testing is at the heart of everything. When you run a Test (also known as an experiment) you can have 2 outcomes: Success or Failure. Failure is fine and normal as long as you learn from it. The question is: what are we learning? What is our goal? What does going faster mean? What kind of Testing do we need?

Agile View on Engineering Speed

     Tweet Link: https://twitter.com/allenholub/status/1233137308418945030

The heart of agile is Feedback. Why feedback? Because we need to learn. Going faster the right way means learning faster via more deploys to production, MVPs, and a mindset shift. Just following a plan does not make us go faster, no matter how many user stories we deliver per week. Even if we end up delivering 300+ user stories per week, we need to ask ourselves:

1) Are we making an impact on users? Are we getting the right outcomes?
2) Are we piling up debt that will kill our productivity soon?
3) Are we getting more organic growth, or are we just following plans?

So in order to go faster, we need to learn faster. How do we learn faster?

1) We run experiments, we fail fast, we discuss ideas, and we reduce the backlog.
2) We run user experiments via UX usability testing, landing pages, surveys, A/B testing, and MVTs (Minimum Viable Testing).
3) We adapt to new variables and variation, and we respond to change by learning in a blameless culture (DevOps) and a Psychological Safety Environment (Google principle).
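To make the A/B testing item above a bit more concrete, here is a minimal sketch of how the result of such an experiment could be evaluated. It assumes a simple two-proportion z-test, and the function name and the sample numbers are made up for illustration, not taken from any real experiment.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Compare conversion rates of variant A and variant B.

    conv_* is the number of converted users, n_* the number of exposed users.
    Returns (z, two_sided_p_value). Hypothetical helper, for illustration only.
    """
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate under the assumption that both variants behave the same.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / std_err
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Made-up numbers: 1000 users per variant, 100 vs 150 conversions.
z, p = two_proportion_z_test(100, 1000, 150, 1000)
print(f"z={z:.2f} p={p:.4f}")
```

A small p-value suggests the difference is unlikely to be noise, which is the kind of fast learning the list above is asking for. A real experiment would also need a sample size and stopping rule decided up front.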

In other words, we need more testing. Lean Startup is about Testing. Testing is not only for engineering but also for Discovery and for Goals (OKRs). We need to test more and learn from our tests (experiments). Testing is not only a Discovery matter but also a Delivery one. Today there are a lot of techniques to consider in order to make sure engineering is also building the product the right way.

Testing Pyramid and Sustainability

Testing is an old practice in engineering. However, we often see products with poor coverage and a lack of proper test balance (how many unit tests vs. integration tests should I write?).

The classical Test Pyramid covers from 3 levels (Unit, Integration, UI) to 10 levels (considering E2E, Acceptance, Regression, Stress, etc.). However, is this enough for 2020? I don't think it is. There are other ways to do testing. I'm not saying that the classical Testing pyramid is wrong. I'm saying: We need more.

Why do we need more? There are several reasons, like:

1) Software has become more distributed, e.g. Cloud, Edge, IoT.
2) There is more specialization in Engineering: Frontend, Backend, IoT, DevOps Engineers...
3) Testing can be expensive (unit testing is not a silver bullet) due to the explosion of options.
4) It is difficult to replicate the REAL prod environment.
5) Multiple teams, cultures, countries, and timezones.
6) Constant software change and Technology evolution

Considering this scenario, it's clear that Unit Testing is not enough, because it happens BEFORE PRODUCTION. We need to start doing what Silicon Valley (SV) is doing: Testing in production. Even within the classical Testing pyramid, there are other testing approaches that can be used, like:

* Property Testing
* Mutation Testing
* Snapshot Testing
* Static Analysis Testing (Cobertura / Security Checks)
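As a rough illustration of the first item in this list, here is a hand-rolled sketch of Property Testing. Instead of checking a few hand-picked inputs, it generates many random inputs and checks a property that must always hold (here: decoding an encoded string gives back the original). The run-length encoder is a made-up example, and in a real project you would probably use a dedicated property-testing library rather than this loop.

```python
import random


def rle_encode(text):
    """Run-length encode a string into a list of (char, count) pairs."""
    pairs = []
    for ch in text:
        if pairs and pairs[-1][0] == ch:
            pairs[-1] = (ch, pairs[-1][1] + 1)
        else:
            pairs.append((ch, 1))
    return pairs


def rle_decode(pairs):
    """Inverse of rle_encode."""
    return "".join(ch * count for ch, count in pairs)


def check_roundtrip(trials=500, seed=0):
    """Property: decode(encode(s)) == s for every generated string s."""
    rng = random.Random(seed)  # fixed seed keeps failures reproducible
    for _ in range(trials):
        length = rng.randint(0, 20)
        text = "".join(rng.choice("ab") for _ in range(length))
        assert rle_decode(rle_encode(text)) == text, text
    return trials


print(check_roundtrip(), "random cases passed")
```

The small alphabet ("ab") is a deliberate choice: it makes long runs of repeated characters likely, which is exactly where an encoder like this tends to break.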

Let's talk about production now. Testing in Production: Oh Boy! 

Why It's Not Enough? Brand New Testing Approaches

I know, you think I'm crazy. Often when the words Production and Testing come together they are associated with the following issues:

1) Lack of Maturity
2) Risk
3) Damage to the brand and final user
4) Risk of creating outages
5) Lack of Professional Behavior
6) "You are nuts, we are not SV" feelings

True. Poorly executed Testing in production will create these issues. However, not Testing in production also creates issues. Do you trust your Dev, QA, and Staging environments? Could the QA/Staging environment be the new "works on my machine" mantra? Lower environments are often not like production. Lower envs often lack:

A) Real Traffic
B) Real cluster sizes and machines
C) Real Configurations
D) Real Users
E) Real Observability

So if QA/Staging is lacking all that, what's the point of testing in QA/Staging? I would say the main reason is that it is how we have always done it. The second is fear of creating damage, and last but not least there is the lack of proper automation, observability, and isolation in Production. "Great, so don't test in Production." But not testing in production does not give you feedback, so are we going faster or slower?

The KEY thing is the difference between Deployments and Releases. For most companies, they are the same thing. For smart SV companies, to DEPLOY is to add things to production without exposing them to users. This difference is the key to safety and to reducing the blast radius. Netflix has been running Chaos Monkey in production for many years and no customer impact is noticed. So they do test and they do learn. If you do not test, you will learn when outages happen, which is much more expensive and much more damaging than testing in production.
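One common way to separate deploy from release is a feature flag. The sketch below is only an illustration under my own assumptions: the `FeatureFlag` class, the `checkout` function, and the flag name are hypothetical, and a real system would load flags from a config service instead of keeping them in memory.

```python
import hashlib


class FeatureFlag:
    """Hypothetical in-memory flag used to decouple deploy from release."""

    def __init__(self, name, rollout_percent=0, allow_users=()):
        self.name = name
        self.rollout_percent = rollout_percent  # 0 = deployed but dark
        self.allow_users = set(allow_users)     # e.g. internal testers

    def is_enabled(self, user_id):
        if user_id in self.allow_users:
            return True
        # Stable hash so the same user always lands in the same bucket.
        digest = hashlib.sha256(f"{self.name}:{user_id}".encode()).hexdigest()
        bucket = int(digest, 16) % 100
        return bucket < self.rollout_percent


def checkout(user_id, flag):
    """Both code paths are deployed; the flag decides which one is released."""
    if flag.is_enabled(user_id):
        return "new-checkout"
    return "old-checkout"


# Deployed to production, released to nobody except one internal tester.
dark = FeatureFlag("new-checkout", rollout_percent=0, allow_users={"qa-1"})
print(checkout("qa-1", dark), checkout("customer-42", dark))
```

Raising `rollout_percent` step by step is what keeps the blast radius small: if something goes wrong, setting it back to 0 un-releases the feature without a new deploy.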

The previous image shows testing approaches classified as Pre-production and In-Production. There are many well-known ideas like Chaos Engineering, Stress Testing, and Observability (Logs, Traces, Profiling, Auditing), but there is also new stuff like Shadowing, which means recording prod traffic and replaying it against a service, or using a Service Mesh (e.g. Envoy) to isolate the call in a canary.
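The record-and-replay flavor of Shadowing can be sketched in a few lines. This is a simplified illustration, not how any specific service mesh does it: the two pricing functions and the recorded requests are hypothetical stand-ins for the current service, the new version, and captured prod traffic.

```python
def shadow_compare(recorded_requests, primary, candidate):
    """Replay recorded requests against a candidate and report differences.

    Only the primary's answer would ever reach users; the candidate's
    answer is compared and then thrown away.
    """
    mismatches = []
    for request in recorded_requests:
        expected = primary(request)
        try:
            actual = candidate(request)
        except Exception as exc:  # a crash in the shadow must not hurt users
            mismatches.append((request, expected, repr(exc)))
            continue
        if actual != expected:
            mismatches.append((request, expected, actual))
    return mismatches


def price_v1(request):
    """Hypothetical current production logic: prices in cents."""
    return request["quantity"] * request["unit_cents"]


def price_v2(request):
    """Hypothetical new version with a bug for zero-quantity orders."""
    if request["quantity"] == 0:
        raise ValueError("empty order")
    return request["quantity"] * request["unit_cents"]


recorded = [
    {"quantity": 2, "unit_cents": 500},
    {"quantity": 0, "unit_cents": 500},
]
for mismatch in shadow_compare(recorded, price_v1, price_v2):
    print("mismatch:", mismatch)
```

The point is that the bug shows up against real request shapes before any user sees the new version.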

Testing in production requires:

1) Automation
2) Canary
3) Observability
4) Isolation
5) Proper Design (Rollback, metadata isolation, different DB cluster, etc.)
6) Ability to perform Rollback
7) Caution
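To show how Automation, Canary, Observability, and Rollback from this list fit together, here is a minimal sketch of an automated canary check. The `Stats` class, the thresholds, and the sample numbers are assumptions chosen for illustration. A real setup would read these numbers from your metrics system and would likely look at latency and other signals too.

```python
from dataclasses import dataclass


@dataclass
class Stats:
    """Request and error counts observed for one group of servers."""
    requests: int
    errors: int

    @property
    def error_rate(self):
        return self.errors / self.requests if self.requests else 0.0


def canary_decision(baseline, canary, min_requests=500, tolerance=0.01):
    """Return "wait", "rollback", or "promote" for a canary deployment.

    min_requests and tolerance are illustrative defaults, not recommendations.
    """
    if canary.requests < min_requests:
        return "wait"  # not enough traffic to judge yet
    if canary.error_rate > baseline.error_rate + tolerance:
        return "rollback"  # canary is clearly worse than the baseline
    return "promote"


baseline = Stats(requests=10_000, errors=50)  # 0.5% errors
print(canary_decision(baseline, Stats(requests=100, errors=0)))
print(canary_decision(baseline, Stats(requests=1_000, errors=40)))
print(canary_decision(baseline, Stats(requests=1_000, errors=6)))
```

Because the decision is a plain function of observed metrics, it can run automatically after every deploy, which is where the "Caution" item gets enforced by code rather than by hope.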

However, it is a very powerful tool to make sure our product works and that we deliver the best user experience we can in the cloud. By doing that we get super important feedback. With that feedback we learn, and then we go faster.

The Way Forward

A great starting point is to explore other ways of pre-production testing like Mutation, Property, Snapshot, and Static Analysis testing. As you increase your Observability and Automation, you really should consider testing in production, because you will get the right feedback and you can also save money on lower environments and by avoiding outages.
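If Mutation Testing is new to you, the idea fits in a few lines: change the code on purpose (a "mutant") and check whether your tests notice. The sketch below does this by hand with made-up functions. Real mutation testing tools generate and run the mutants for you.

```python
def is_adult(age):
    """Original code under test."""
    return age >= 18


def is_adult_mutant(age):
    """Mutant: the '>=' operator was replaced by '>'."""
    return age > 18


def weak_suite(fn):
    """Passes for both versions because it never checks the boundary."""
    return fn(30) is True and fn(5) is False


def strong_suite(fn):
    """Adds the boundary case, which is where the mutant differs."""
    return weak_suite(fn) and fn(18) is True


def mutant_killed(suite, original, mutant):
    """A mutant is killed when the suite passes on the original
    and fails on the mutant."""
    return suite(original) and not suite(mutant)


print("weak suite kills mutant:", mutant_killed(weak_suite, is_adult, is_adult_mutant))
print("strong suite kills mutant:", mutant_killed(strong_suite, is_adult, is_adult_mutant))
```

A surviving mutant tells you something line coverage cannot: both suites execute every line of `is_adult`, but only one of them would catch the bug.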

Proper execution requires maturity and breaking the taboo that Production is something holy and untouchable. Production is the most important environment. However, with the right set of mechanisms we can take much more advantage of it for the business.

Diego Pacheco
