Re-write or Strangler

In The Matrix (a classic and awesome movie), there is a famous scene where Morpheus challenges Neo's understanding of reality and offers him 2 pills: the blue one, which would keep the status quo, and the red one, which could be painful but would show reality. If you have a successful company with software, that choice is the reality of the industry. In other terms, the discussion can be framed as: should we re-write the software, or should we strangle it? Both re-writes and strangler patterns have cases of success and failure. Before we get into the trade-offs of this complicated dilemma, let us understand the forces that create the problems and opportunities around this discussion. Let's talk more about the context.

The Context

Engineering means making mistakes, no matter if the tech is new or old, and no matter if it is mature and has plenty of documentation. That is true because people come and go, and it is impossible to have only senior staff at all times.

Good technical solutions depend on a variety of disciplines such as Code, Design, Analysis, Tests, Operations, Security, Observability, Reliability, Frontend, Backend, Mobile, Business, Research, UX, Discovery, and many more. Mastering all of them is difficult, and some disciplines are more established in the IT industry than others, like Coding vs Design or Architecture vs Security. This is an important factor that might lead to debt or problems in our solutions.

To make it worse, every 5-10 years there is a massive migration going on, changing the OS, the language, the platform (i.e. Cloud, or web vs mobile), or all of them at the same time, which puts us back into the learning space. Should we stop using the right tool for the job and stop doing these migrations? Absolutely not.

Do we need to be more careful about our tech choices? Yes, especially at scale. At the same time, we need to move fast and unblock innovation, so how can we do both? Netflix keeps talking about Talent Density: having more talent means better performance and less process, true. IMHO we need ISOLATION: if we can isolate solutions under explicit contracts (the SOA way), we can create a safe space for us to make mistakes and change tech under the hood.

Tech projects can deliver a good impact on the business when they are not measured by scope, time, and budget only (Allan Kelly - NoProjects ideas). However, the reality is that not all projects end well. By not ending well I don't mean the waterfall ones that never deliver, but the ones that deliver and are used. Just because "solutions" were shipped and are "in use" does not mean companies are in a better place; it could actually be the opposite.

Tony Saldanha nails it in Why Digital Transformations Fail: lack of discipline. That is easily the answer to why Agile, DevOps, and <<Future Movement which will fail here>> fail. Lack of well-disciplined engineering practices ends up producing legacies. Now the 1M dollar question: what do we do? Do we re-write it or do we strangle it?

Before we analyze the trade-offs between the 2 approaches, here are two interesting examples. The first is the classic post from Joel Spolsky on why you should never rewrite your legacy. A great counterexample is Evernote's bold move of holding features for 18+ months during a big cloud migration and re-write.

Strangler Pros/Cons

Strangler in a nutshell: have a contract (a service interface, i.e. REST or gRPC), start migrating code to use that interface, and stop accessing database tables directly. It sounds simple, but in practice it is hard, especially in the following scenarios:

* You have lots of code and scale, other problems, and also other priorities going on

* There is a team erosion problem or a lack of established owners

* You have several active systems that are reading/writing to the same tables (the whole strangle will take a long time) - you cannot be sure whether you missed an application, so you will need to do proper monitoring for a while.
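
The idea of a contract in front of the legacy tables can be sketched as below. This is a minimal illustration, not a definitive implementation: the names CustomerContract, LegacyCustomers, NewCustomers, StranglerFacade, and the EMAIL_ADDR column are all hypothetical, and plain dicts stand in for the database and for the real REST/gRPC transport. The call counter is only a stand-in for the monitoring mentioned above.

```python
from __future__ import annotations

from typing import Protocol


class CustomerContract(Protocol):
    """The explicit contract: consumers call this instead of reading tables."""

    def get_email(self, customer_id: int) -> str: ...


class LegacyCustomers:
    """Legacy side. A dict stands in for the shared database table."""

    def __init__(self, table: dict[int, dict[str, str]]) -> None:
        self._table = table

    def get_email(self, customer_id: int) -> str:
        return self._table[customer_id]["EMAIL_ADDR"]


class NewCustomers:
    """New side, with its own private storage behind the same contract."""

    def __init__(self, store: dict[int, str]) -> None:
        self._store = store

    def get_email(self, customer_id: int) -> str:
        return self._store[customer_id]


class StranglerFacade:
    """Routes each call: migrated ids go to the new side, the rest to legacy."""

    def __init__(
        self,
        legacy: CustomerContract,
        new: CustomerContract,
        migrated_ids: set[int],
    ) -> None:
        self._legacy = legacy
        self._new = new
        self._migrated_ids = migrated_ids
        # Simple counters so we can watch traffic move away from legacy.
        self.calls = {"legacy": 0, "new": 0}

    def get_email(self, customer_id: int) -> str:
        if customer_id in self._migrated_ids:
            self.calls["new"] += 1
            return self._new.get_email(customer_id)
        self.calls["legacy"] += 1
        return self._legacy.get_email(customer_id)


facade = StranglerFacade(
    legacy=LegacyCustomers({1: {"EMAIL_ADDR": "ana@example.com"}}),
    new=NewCustomers({2: "bob@example.com"}),
    migrated_ids={2},
)
print(facade.get_email(1), facade.get_email(2))
```

Consumers only ever see get_email, so the set of migrated ids can grow incrementally until the legacy side receives no more calls.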

Completely isolating a sub-domain could take a long time. The good news is that we can do things incrementally with stranglers. The bad news is that, besides being time-consuming, it is case by case and requires special converter/adapter logic for each consumer. Rollback plans might be difficult as well, since we are talking about a case-by-case basis.
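
To make the per-consumer converter and the rollback concern more concrete, here is a small sketch under stated assumptions: legacy_read, contract_read, billing_adapter, and ConsumerRoute are hypothetical names, the column names and values are made up, and the two read functions return fixed data instead of hitting a real table or service. Each consumer gets its own adapter and its own switch, so migrating or rolling back one consumer does not affect the others.

```python
from dataclasses import dataclass
from typing import Callable


def legacy_read(order_id: int) -> dict:
    """Hypothetical direct table read, using the legacy column names."""
    return {"ORD_ID": order_id, "TOT_CENTS": 1250}


def contract_read(order_id: int) -> dict:
    """Hypothetical call to the new service contract."""
    return {"id": order_id, "total": "12.50"}


def billing_adapter(response: dict) -> dict:
    """Converter for one consumer (billing) that still expects the legacy shape."""
    units, cents = response["total"].split(".")
    return {"ORD_ID": response["id"], "TOT_CENTS": int(units) * 100 + int(cents)}


@dataclass
class ConsumerRoute:
    name: str
    adapter: Callable[[dict], dict]
    use_contract: bool = False  # True = migrated, False = rolled back to legacy

    def read(self, order_id: int) -> dict:
        if self.use_contract:
            return self.adapter(contract_read(order_id))
        return legacy_read(order_id)


billing = ConsumerRoute(name="billing", adapter=billing_adapter)

before = billing.read(7)      # legacy path
billing.use_contract = True   # migrate this consumer only
after = billing.read(7)       # contract path, converted by the adapter
billing.use_contract = False  # rollback is a flag flip for this consumer
```

Comparing before and after for the same input is a cheap way to check an adapter before leaving the switch on.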

What about the Data Structure or Format? Well, that could be another big challenge. I wrote some thoughts about that in this and this blog post.

Re-write Pros/Cons

A re-write gives you unique opportunities. In some sense, it can be much easier, since you are not trapped by past tech debt and have a chance to approach problems with different languages, frameworks, and libs, and to have better architecture and infrastructure. A re-write is also a big talent magnet. However, the business might not think the same way.

The issue is justifying the re-write for the sake of the re-write. You must have the appetite and strong winds blowing in the right direction, for instance:

 * That part of the business is mission-critical now

 * There is big growth targeted in that direction, with lots of investment coming

 * There is significant cost reduction (i.e. cloud migrations)

 * There are killer new capabilities for the business which will make a huge impact on customers

 * The future of the company might depend on that

 * The software or technology is too old, too hard to hire people for, too inefficient, or unstable

Those reasons might end up being the right ones for a re-write. But this choice is difficult. This is a governance problem with variables that keep changing. The big issue with a re-write is that it will potentially take time to get all your old features back; for sure, you might not need all of them. However, some customers might not be able to use your system if it lacks a specific feature, and since that can be customer-specific, this might be a big challenge.

However, not all rewrites are the same. For instance, in one scenario you might just be looking to improve the underground pipes: a modern language, better database design, better code, more tests, etc., without necessarily having a completely different UX or a completely different product.

The other story is when you are changing from one platform to another. You cannot just PORT your software as it is from the Desktop to the Cloud, or from the Web to Mobile, for instance. Lots of people believe Apple's success with the iPhone, compared with the Windows fiasco on phones, came from the fact that Microsoft tried to port what it already had, while Apple started from scratch and fundamentally re-thought the whole thing from the ground up.

How to Move Forward

This is complicated, but I believe there are actions you can take to move forward and navigate these complex and turbulent waters:

 * Have a Governance Map or Inventory (classify your solutions as Active, Dead, to-be-killed, etc.)

 * Embrace ISOLATION and SOA and have explicit contracts as much as possible.

 * Build the Discipline for Healthy Engineering (Testing, Architecture, Design, etc.)

 * Actively talk with the business and understand where the wind is blowing

 * Build new capabilities and services outside of the monolith, but with true isolation - otherwise keep them in the monolith, it will be better.

 * When it is strategic, re-write and decommission legacy systems.
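
As a rough idea of what a governance inventory could look like in its simplest form, here is a sketch. Everything in it is an assumption for illustration: the Status values mirror the classification above, and the system names and owners are made up. In practice this would likely live in a catalog tool or a spreadsheet rather than in code.

```python
from collections import defaultdict
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Status(Enum):
    ACTIVE = "active"
    DEAD = "dead"
    TO_BE_KILLED = "to-be-killed"


@dataclass(frozen=True)
class Solution:
    name: str
    owner: Optional[str]  # None means no established owner
    status: Status


def group_by_status(inventory: list) -> dict:
    """Return solution names grouped by their status."""
    groups = defaultdict(list)
    for solution in inventory:
        groups[solution.status].append(solution.name)
    return dict(groups)


def without_owner(inventory: list) -> list:
    """List solutions that are not dead yet have nobody responsible for them."""
    return [
        s.name
        for s in inventory
        if s.owner is None and s.status is not Status.DEAD
    ]


# Hypothetical inventory entries.
inventory = [
    Solution("billing", "payments-team", Status.ACTIVE),
    Solution("old-reports", None, Status.TO_BE_KILLED),
    Solution("fax-gateway", None, Status.DEAD),
]

by_status = group_by_status(inventory)
orphans = without_owner(inventory)
```

Even a list this small makes the ownerless, still-running systems visible, which is where strangling or re-writing decisions tend to get stuck.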

Be careful with your moves: sometimes doing nothing is better, and other times a change makes things worse. It is always a good idea to do some analysis to understand the context and see if the change will be for the best or not.

I hope you guys like it - take care! 


Diego Pacheco
