Blameless Feature Reviews

Have you ever wondered if what you build has the right impact on customers? Engineering is often expected to deliver on time and be cost-effective. Bugs and incidents are known for disrupting the customer experience. The fewer bugs, the better; the fewer incidents, the better. So how do we reduce bugs and incidents? DevOps has an interesting practice called Blameless Incident Reviews (BIR). Considering the DevOps culture and movement, blameless incident reviews are great because they drive the right culture shift, from fear and blame to sharing and understanding. BIRs are often pull-based, happening when we have a number of production bugs worth sharing so the lessons learned reach the whole org. DevOps is all about better ways of building and operating software. We cannot do better if we are stuck with the same practices all the time; practices need to keep changing and evolving as a way to keep us fresh and learning at all times.

How does a Blameless Incident Review work?

Usually, there is a classification of severity; imagine a scale from 1 to 5, for instance. S1 would be the most severe bugs and incidents, which should always be reviewed. S5 would be the least severe, which we may not need to worry about so much. How do we tell S1 from S5? Customer disruption, financial loss, and Mean Time to Recover (MTTR) are good criteria.
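
As a concrete illustration, here is a minimal Java sketch of what such a severity scale could look like; the enum name and the descriptions are my own assumptions, just to show the idea:

    // A minimal sketch, assuming the 1-to-5 scale described above; the
    // descriptions are illustrative, not an industry standard.
    public enum Severity {
        S1("Major customer disruption or financial loss; review is mandatory"),
        S2("Significant disruption for many customers; review strongly recommended"),
        S3("Degraded experience for some customers; review if time allows"),
        S4("Minor impact with a known workaround"),
        S5("Cosmetic or negligible impact; usually no review needed");

        private final String description;

        Severity(String description) {
            this.description = description;
        }

        public String description() {
            return description;
        }
    }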

Once there is a certain volume of incidents or critical bugs, let's say 3-5, for instance, then we can have a meeting where we talk about those issues. Note that this meeting does not need to happen at a fixed cadence (a push model). It can be pull-based (happening on demand).

Then, someone presents what happened, usually in the form of a timeline of events. The most important part of this practice is twofold. First, we need to drive lessons learned. Second, we need to take action to improve things.

Let me say this very clearly to get it out of the way. Blameless incident reviews are RETROSPECTIVES. You cannot have effective retrospectives without good facilitation, and lessons learned need to be driven. What did we learn from incident X? BIR can go bad if we just create tickets and do not drive lessons learned. Plus, we need actions; we need to start reviewing previous incidents to see what we did wrong and whether we actually made things better. Otherwise, you just have a meeting to present tickets.

Done right, BIR can be amazing. It helps the company learn and really improves the product and the process. This is one of the best practices that the DevOps movement gifted us. Now think about this: if BIRs are great for learning from bugs and incidents and making sure they do not happen again, what about features? What do we do with features?

What about Features?

Features are massive sources of investment in software. People are expensive; engineers are expensive. Product cares a lot about features. So how do we know we are being effective with features? There are several things we can do in modern product development:
  • Proper Discovery: via a discovery process, personas, user interviews, user research, prototypes, and rapid experimentation.
  • Metrics: what we call observability in engineering. How much growth are we having, and what percentage of user retention are we getting? Net Promoter Score (NPS) and other proxy metrics. A great, simple metric is a counter: how many times does the user click on a screen or button? (See the sketch after this list.)
  • Feedback: for mobile apps, app store reviews, and review websites like Yelp, Google, and others. Direct customer feedback via support and many other sources.
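
To make the counter idea concrete, here is a minimal Java sketch using the Micrometer metrics library (my choice for the example, not something this practice requires); the metric and tag names are illustrative:

    import io.micrometer.core.instrument.Counter;
    import io.micrometer.core.instrument.MeterRegistry;
    import io.micrometer.core.instrument.simple.SimpleMeterRegistry;

    public class ButtonClickMetrics {
        public static void main(String[] args) {
            // In production this would be a registry backed by Prometheus,
            // Datadog, etc.; SimpleMeterRegistry keeps the sketch self-contained.
            MeterRegistry registry = new SimpleMeterRegistry();

            // One counter per screen/button pair, tagged so we can slice by feature.
            Counter checkoutClicks = Counter.builder("ui.button.clicks")
                    .tag("screen", "checkout")
                    .tag("button", "buy-now")
                    .register(registry);

            // Increment on every user click; this is the raw signal behind
            // "how many times does the user click on the button?".
            checkoutClicks.increment();

            System.out.println("clicks so far: " + checkoutClicks.count());
        }
    }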
Why does this matter? Because it costs a lot of money to make software, so we need to know whether we are going in the right direction or not. When we produce software, there is a chance it works and a chance it does not work and turns out to be a bug. Bugs can cause incidents, and incidents can happen because of bugs or some other mistake.

Incidents and bugs affect the user experience and need to be minimized. That's why we use blameless incident reviews to learn and improve. Now, we can do a lot for features with discovery, metrics, and feedback, but is that enough?

Blameless Feature Review (BFR)

Think about this idea. What if, at a regular cadence, once per month or after every 10 features delivered, we sit together (in person or virtually) and review how the features are going? By definition, product management is, or is supposed to be, doing that already. However, are engineers involved?

IMHO, they should be because some features are also sources of technical debt. Being able to delete features means deleting technical debt. Software needs a counterforce, where we reduce complexity. Adding features all the time means adding complexity all the time. The easiest way to reduce technical debt is by decommissioning software, but you cannot decommission software that is being used. 

But what if the software is not being used? What if the software does not drive the results we want? Would that be a low-hanging-fruit opportunity for a future cleanup? Plus, why did we build something the users did not want? Should the builders know about it, or should only product know? I would argue that the product is everybody's responsibility. Imagine someone presenting feature X and saying (see the sketch after this list):
  • This is feature X
  • It cost $$$
  • It took X months to be done
  • It has Y bugs associated with it
  • It has Z many page views per day
  • It requires services A, B, and C
  • Customers are saying XYZ in the Apple App Store and XYZ in the Google Play Store
  • We applied this discovery process: bla bla bla
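
As a sketch of how this data could be captured, here is a minimal Java record with one field per bullet above; every name here is an illustrative assumption, not a prescribed schema:

    import java.math.BigDecimal;
    import java.util.List;

    // A sketch of the data someone could bring to a Blameless Feature Review.
    // One field per bullet above; all names are illustrative assumptions.
    public record FeatureReview(
            String featureName,
            BigDecimal cost,               // how much $$$ it cost
            int monthsToDeliver,           // how long it took to be done
            int associatedBugs,            // bugs linked to this feature
            long pageViewsPerDay,          // usage signal
            List<String> requiredServices, // services A, B, and C
            List<String> storeFeedback,    // Apple App Store / Google Play quotes
            String discoveryNotes          // how discovery was done
    ) {}

Sorting a list of these records by pageViewsPerDay would make the two buckets below trivial to produce.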
Product view and engineering view all together. Imagine two buckets: the top 5 most used features and the top 5 least used features. It would be great to compare them and see what we can learn. Building better products is a cross-functional sport and requires engineering to know what works and what does not work for the users. So we need a review process; why not start doing Blameless Feature Reviews?

Cheers,
Diego Pacheco
