State Induction

Imagine you are coaching a basketball team. You want to train your team to be good at 2-point shooting from the inside. Now imagine that, for some weird reason, you can't practice that in isolation, and you need to play a whole 4-quarter basketball game in order to maybe, with a lot of luck, score those 2 points. That would suck, right? It sounds insane because we all know we can skip the whole game and just practice 2-point shots from the inside, right? Well, what IF I told you the basketball game is often how people test software: they cannot set up the exact scenarios they care about (State Induction), so they need to test the whole thing (expensive E2E testing). What if we could write tests in a very different way, one that allows massive parallelism, so that multiple people could test the same thing at the same time, and it would work?

The problem is not the environment 

In one of my previous posts, called Environments and People, I discussed why shared environments do not work. Basically, you can have as many non-shared environments as you want, and you will have the exact same issues, because the problem isn't the environment, it's how people behave in it. When you add new environments, people behave the same; therefore, you have the exact same issues. In that post, I described a series of principles that people need to follow and behave according to in order to achieve better results.

The bottom line is simple (the actual work is pretty hard): we need to change people's behavior, which can only happen with training and a lot of support. IF you stop to think about it, the production environment never had these problems. There are maybe thousands of different users "living" in production, and it all works without issues. People also cannot touch production, which is one of the reasons why it works and non-production environments do not. One simple option could be to remove people's access to non-production environments and force them to only mess around in their local environment, maybe using containers. However, that's not enough; we also need to change how tests are written.

What about production data?

But what if we make production data safe and anonymous, and then refresh it in non-production from time to time? Maybe it's all synthetic data; it doesn't even need to be real data.


Sounds like a good idea, right? Only on the surface. When you hand people such a bucket of data and say, "Come here, pick what you need," it still does not avoid collisions. Unless that happens in their local environments inside a container; but one day you will need to move to a shared environment in non-prod, and that's where you will have trouble. We need a different approach.

Easy and Hard Tests

Some tests are very easy to write; you don't need a lot of support or even a lot of changes. However, some tests need an approach that is fundamentally different from how we usually test. This leads us to a spectrum:



IF you have a read operation, meaning you are just reading data, then you do not need to worry; you simply write your test, and it will all be fine (as long as you are not depending on a hard-coded ID). IF your test inserts the data it needs and deletes it at the end (setup and tear down), you are gold; easy peasy lemon squeezy. However, if your test needs to mutate existing data (delete or update), that's where we need to do other things; we can't just simply test.

IF you simply test, you will very likely end up with a flaky test, which will give you random headaches all the time in your shared environment. These scenarios are hard to test, but that's ok: hard things can be tested. They just require a bit more work, but they are still doable, and we can and will do them.

No, you don't need production data...

There is one interesting thing about production data: it covers a variety of testing scenarios.
Very often, when people ask for production data, they want to see which scenarios they need to consider for testing. That's fair, but I would still argue you don't need production data. What you need is a Testing Contract; you need to write your software in a way that works without testing against all the data in production.
For instance, let's look at the USA power outlet standard. I'm pretty sure that when manufacturers build power outlets, they do not and will not test them against every house in America. That would be insane and impractical. Instead, they have a contract, a set of rules on what they support, and they test adherence to those rules. You don't need to test with all the data in production, but you do need to get your testing contract right.
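To make the "contract over production data" idea more concrete, here is a minimal sketch in Java (with JUnit 5) of what a testing contract could look like. Everything here is an assumption for illustration: the PaymentRequest record, the supported currencies, and the limits are hypothetical; the point is that the rules of the contract, not copies of production data, drive the test.

```java
import static org.junit.jupiter.api.Assertions.*;

import java.util.List;
import org.junit.jupiter.api.Test;

// Hypothetical contract: the rules our component promises to support,
// analogous to the power outlet spec (voltage, plug shape), written as code.
record PaymentRequest(String currency, long amountCents, String country) {

  static final List<String> SUPPORTED_CURRENCIES = List.of("USD", "EUR", "BRL");

  // The "testing contract": what we accept, expressed as assertable rules.
  boolean satisfiesContract() {
    return SUPPORTED_CURRENCIES.contains(currency)
        && amountCents > 0
        && amountCents <= 100_000_00L // assumed upper limit from the contract
        && country != null && country.length() == 2;
  }
}

class PaymentContractTest {

  @Test
  void acceptsRequestsThatFollowTheContract() {
    assertTrue(new PaymentRequest("USD", 25_00, "US").satisfiesContract());
  }

  @Test
  void rejectsRequestsOutsideTheContract() {
    assertFalse(new PaymentRequest("XYZ", 25_00, "US").satisfiesContract());
  }
}
```

The tests exercise representative inputs derived from the contract's rules, which is the same reason the outlet manufacturer does not need to visit every house.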

By doing so, we will need much less E2E testing, and you will be able to test components in isolation. Again, we don't need the production data; we need the "State" induced, so we can guarantee the component will work when it faces real production data.

We have 2 mindset shifts going on here. The first one is that instead of needing production data, you need "State", and you need to be able to "induce" that state fast. Remember the basketball metaphor from the beginning of this post? We don't want to play the whole game (E2E testing) just to test 2-point shots from the inside (which can be done with state induction). The second mindset shift is that we are now building the tests very differently, to make them reliable and scalable in a massive, shared environment.

State Induction Patterns

2 years ago, I wrote a post about induction and testing hard things like batch jobs and queues. Here are the same ideas, now presented in the form of patterns. These are 3 patterns that will allow us to perform tests in a way that aligns with the principles I described earlier.

1. DB Pattern: This pattern works with all kinds of databases, including Oracle, MySQL, PostgreSQL, Cassandra, and DynamoDB. Even in-memory systems like Redis or Memcached can be covered by the same pattern. The idea here is pretty simple: we insert data at the beginning of the test, call what we need, perform assertions, and then delete the data we created. This way the test is self-contained; it never shares IDs and will always work in all environments.
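Here is a minimal sketch of the DB Pattern as a JUnit 5 test in Java. The Customer record and the repository interface are hypothetical; any of the databases above would sit behind them (the in-memory implementation is just a stand-in so the sketch runs). What matters is the shape: create your own data under a unique ID, exercise the system, assert, and clean up, so many people can run the same test in the same environment at the same time.

```java
import static org.junit.jupiter.api.Assertions.*;

import java.util.HashMap;
import java.util.Map;
import java.util.UUID;
import org.junit.jupiter.api.AfterEach;
import org.junit.jupiter.api.BeforeEach;
import org.junit.jupiter.api.Test;

// Hypothetical domain object and repository.
record Customer(String id, String name, String plan) {}

interface CustomerRepository {
  void insert(Customer c);
  Customer findById(String id);
  void deleteById(String id);
}

// Stand-in so the sketch runs; a real test would point at the shared DB.
class InMemoryCustomerRepository implements CustomerRepository {
  private final Map<String, Customer> data = new HashMap<>();
  public void insert(Customer c) { data.put(c.id(), c); }
  public Customer findById(String id) { return data.get(id); }
  public void deleteById(String id) { data.remove(id); }
}

class CustomerPlanTest {

  CustomerRepository repository = new InMemoryCustomerRepository();
  String id;

  @BeforeEach
  void induceState() {
    // Setup: the test creates the exact state it needs, under its own unique ID.
    id = UUID.randomUUID().toString();
    repository.insert(new Customer(id, "test-" + id, "FREE"));
  }

  @Test
  void readsBackOnlyTheDataThisTestCreated() {
    // Assert against the data this test created, never a shared or hard-coded ID.
    assertEquals("FREE", repository.findById(id).plan());
  }

  @AfterEach
  void cleanUp() {
    // Teardown: remove only what this test created.
    repository.deleteById(id);
  }
}
```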

However, we cannot just "insert data"; there is an order of precedence we must follow:


1. API: IF there is an API, you must call the API. The API is the best level of abstraction. IF the API does what you need, that's the way to go. IF some internal detail changes, the API keeps working. The API is the right level of abstraction, and when available, we should always use it.

2. Testing Interface: Testing interfaces are APIs that do not exist in production, because we do not want them there. They do exist in non-production, are maintained by engineers, and allow all sorts of complicated test scenarios. For example, in an e-commerce application, you may need to simulate a user who did not pay with a credit card (see the sketch after this list).

3. Via UI: UI automation on the web with a tool like MS Playwright, or some desktop RPA, should only happen if we don't have an API and we can't change the code to create a testing interface. This will be the case with a proprietary SaaS application where you have no control.

4. Via Database: As a last resort, and it truly needs to be the last resort, we would use the database to change data. The issue with this approach is obvious: it's fragile and can create distributed monoliths if done wrong.
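As an illustration of option 2, here is a minimal sketch of a testing interface for the e-commerce scenario above, written as a Spring controller. The path, the controller, and the OrderTestingService are hypothetical; the two properties that matter are that it never ships to production (guarded here by a Spring profile) and that it lets a test induce an otherwise hard-to-reach state.

```java
import org.springframework.context.annotation.Profile;
import org.springframework.web.bind.annotation.PathVariable;
import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RestController;

// Hypothetical service that flips an order into the desired state using the
// same domain code production uses, not raw SQL against the database.
interface OrderTestingService {
  void markPaymentFailed(String orderId);
}

// Hypothetical testing interface: an API that only exists outside production.
@RestController
@Profile("!production")
class OrderTestingController {

  private final OrderTestingService service;

  OrderTestingController(OrderTestingService service) {
    this.service = service;
  }

  // Induces a hard-to-reach state: an order whose credit card payment failed.
  @PostMapping("/testing/orders/{orderId}/simulate-failed-payment")
  void simulateFailedPayment(@PathVariable String orderId) {
    service.markPaymentFailed(orderId);
  }
}
```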

But what if we don't have a database, there is only an API, and the API is limited? Then we would need to go outside of the pyramid and do mocks. But we can't do the standard mocks; we need a very advanced kind of mock, which, at this point, I don't think we should even call a mock.

2. Queue Pattern: Queues are very hard to test because of the nature of FIFO (First In, First Out); it's so easy to have your message consumed by somebody else. However, queues can be tested. It requires some engineering inside the component that uses the queue. Here are some options (a sketch of the adapter/header approach follows the list):

* Multiple Queues: Imagine there is a queue for each developer; in that case, the system can route the message based on a header ID and use the appropriate queue, providing isolation and massive scalability.

* HashMap in Memory: Another option is to not use a queue at all and bypass the entire queue system; the software (service or component) would have an internal HashMap that supports high concurrency, so each developer can have their own ID inside the map. The obvious problem with this approach is that you are not testing the queue.

* Store in the DB: Another option would be to bypass the queue and store the data in the database. We can easily create one ID per developer and therefore provide isolation. We still have the same issue as the previous option: we are skipping the queue.

* Adapter / Header: IF we provide a level of indirection, meaning that instead of putting the message directly on the queue, you go through an adapter object or add a header, then you can attach enough metadata so that each developer has their own ID, and we can isolate messages.
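Here is a minimal sketch of the Adapter / Header option in Java. The QueueClient interface and the header name are assumptions; the same idea maps onto SQS message attributes, Kafka headers, JMS properties, and so on. Producers go through the adapter, which stamps every non-prod message with an isolation ID, and consumers ignore anything that is not theirs.

```java
import java.util.Map;
import java.util.UUID;

// Hypothetical minimal queue abstraction wrapping SQS, Kafka, RabbitMQ, etc.
interface QueueClient {
  void publish(String payload, Map<String, String> headers);
}

// Adapter: the only way the application publishes messages.
class IsolatingQueueAdapter {

  static final String ISOLATION_HEADER = "x-test-isolation-id"; // assumed header name

  private final QueueClient client;
  private final boolean production;
  private final String isolationId; // one per developer or test run

  IsolatingQueueAdapter(QueueClient client, boolean production, String isolationId) {
    this.client = client;
    this.production = production;
    this.isolationId = isolationId;
  }

  void publish(String payload) {
    Map<String, String> headers =
        production ? Map.of() : Map.of(ISOLATION_HEADER, isolationId);
    client.publish(payload, headers);
  }

  // Consumer side: in non-prod, only handle messages carrying my ID, so another
  // developer's test never steals (or is broken by) my message.
  boolean shouldConsume(Map<String, String> headers) {
    return production || isolationId.equals(headers.get(ISOLATION_HEADER));
  }
}

// Example wiring for a developer's run (the ID could come from an env var).
class AdapterExample {
  public static void main(String[] args) {
    QueueClient client = (payload, headers) -> System.out.println(headers + " -> " + payload);
    var adapter = new IsolatingQueueAdapter(client, false, UUID.randomUUID().toString());
    adapter.publish("{\"orderId\":\"123\"}");
  }
}
```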

IMHO, Queues are among the hardest things to test properly. Hard but possible with some engineering. 

3. Batch Pattern: Usually, batch systems read from and write to a database, so we can use the same DB Pattern I described before. The issue is that, by nature, batch jobs only run at a scheduled time, and that can be once a week or once every 2 years. We don't want to wait to test. The solution here is to decouple the execution of the batch code (let's call it the batch service) from the scheduling. That way, in non-prod, we can trigger the batch job like any other service, anytime we want.
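A minimal sketch of that decoupling in Java: all the batch logic lives in a plain service class, the production scheduler only decides when to call it, and in non-prod a test (or a testing interface, as above) can call the very same method directly, at any time. The class names and the weekly schedule are assumptions.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical batch service: owns the batch logic, knows nothing about scheduling.
class BillingBatchService {
  void run() {
    // read pending invoices, charge them, write results back to the DB...
    System.out.println("billing batch executed");
  }
}

// Production wiring: the scheduler decides *when*; the service decides *what*.
class ProductionScheduler {
  public static void main(String[] args) {
    var service = new BillingBatchService();
    ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
    scheduler.scheduleAtFixedRate(service::run, 0, 7, TimeUnit.DAYS); // e.g., weekly
  }
}

// Non-prod / test wiring: no waiting for the schedule, trigger it on demand and
// then assert on the data it produced (using the DB Pattern above).
class BillingBatchTrigger {
  public static void main(String[] args) {
    new BillingBatchService().run();
  }
}
```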

The hard part

The hard part is: what about those proprietary SaaS systems that we don't have any control over? How can we simulate several different states? Remember the "mocks" outside of the pyramid? For that, we need a platform that can induce state in complex scenarios, something like this:


The idea here is that you would have a git repository with a bunch of JSON files. These JSON files would be profiles with a whole graph of decisions. A decision can be: call the real component, mock the whole component, or, for something more nuanced, call the real thing but change only these 2 fields in the return.

Imagine you have all these profiles (JSON files) attached to unique IDs. There could be a server that returns a profile to you based on an ID. Consider a simple SDK with which you would instrument all your REST clients. The SDK would be very lean and lightweight. It would detect if you are running in production, and in that case, it would do nothing but call the services. However, if it detects it's running in non-prod, it would look for HTTP headers and, based on those headers, fetch a profile and apply its transformations.
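Here is a rough sketch of what that SDK could look like in Java. Everything is an assumption: the header name, the Profile shape, and the profile server interface are hypothetical. The point is the decision flow: do nothing in production, and in non-prod use the incoming header to fetch a profile and decide whether to call the real service, mock it entirely, or call the real thing and override a couple of fields.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
import java.util.function.Supplier;

// Hypothetical profile fetched from the profile server. The JSON file in git
// might look like: { "serviceName": "payments", "mode": "OVERRIDE_FIELDS",
//                    "overrides": { "status": "DECLINED" } }
record Profile(String serviceName, Mode mode, Map<String, String> overrides) {}

enum Mode { CALL_REAL, MOCK_WHOLE, OVERRIDE_FIELDS }

// Hypothetical client for the server that stores profiles keyed by ID.
interface ProfileServer {
  Optional<Profile> fetch(String profileId, String serviceName);
}

// The "SDK": wraps every outbound REST call. In production it is a pass-through;
// in non-prod it applies the profile's transformation.
class StateInductionSdk {

  static final String PROFILE_HEADER = "x-state-profile-id"; // assumed header name

  private final boolean production;
  private final ProfileServer profileServer;

  StateInductionSdk(boolean production, ProfileServer profileServer) {
    this.production = production;
    this.profileServer = profileServer;
  }

  Map<String, String> call(String serviceName,
                           Map<String, String> requestHeaders,
                           Supplier<Map<String, String>> realCall) {
    if (production) return realCall.get();          // prod: never interfere

    String profileId = requestHeaders.get(PROFILE_HEADER);
    if (profileId == null) return realCall.get();   // no header: behave normally

    return profileServer.fetch(profileId, serviceName)
        .map(profile -> switch (profile.mode()) {
          case CALL_REAL -> realCall.get();
          case MOCK_WHOLE -> profile.overrides();    // whole response from the profile
          case OVERRIDE_FIELDS -> {
            Map<String, String> response = new HashMap<>(realCall.get());
            response.putAll(profile.overrides());    // real call, a few fields changed
            yield response;
          }
        })
        .orElseGet(realCall);
  }
}
```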

Such a tool enables testing many hard scenarios that would never be tested otherwise. It would also allow proper Stress and Chaos testing. Given enough data, we could easily use AI to help us generate the profile (JSON) files.

From Principles to Practice

Proper testing requires discipline, training people, and having the right principles. Proper testing must be done from within because, as Lean teaches us, quality must be built in. Avoiding flaky tests requires proper discipline and the right testing approach; otherwise, tests will always be flaky, no matter how many environments there are and how much data there is. Hard problems are not impossible; they only require more discipline and more education.

Cheers,
Diego Pacheco