Team Erosion
This is not a geography post. However, software architecture and code also suffer from erosion. People come and go, and teams get created and destroyed by the project mindset. The result is an ownership problem. An ownership problem is just another form of Governance issue; Governance is a bad word, often associated with bureaucracy, old ideas, and bad stuff. Ownership issues exist because teams touch a large number of resources, such as Code, Database Schemas, Dashboards, Alerts, Wiki pages, S3 Buckets, Kubernetes clusters, EC2 instances, and many more assets. So why are AWS Tags not enough?
The issues with Tags
Tags are awesome, don't get me wrong. However, tags work only for AWS resources, and the reality is that we have many more resources, and several of those live outside of AWS, such as Dashboards, Alerts, Traces, Logs, Source Code, Jenkins Jobs, and much more. You should be using tags, but that's not enough. We need more.
Some people think this could be solved with a monorepo. Monorepos can make sense within a single domain, and for sure they reduce software proliferation, but this is also not enough to fix the team erosion problem.
The need for a lifecycle
New engineers are joining your organization. What software should they use? What software should they avoid at all costs? Which software do they need to maintain? It could be very tricky and noisy to answer those questions. People coming and going (the erosion effect) can easily create orphans; which team should inherit the assets? A lifecycle is about effective communication.
This is a very simple lifecycle example; for sure, you can have a more complicated one. But with these 3 states, we can provide a strong and effective communication pattern, meaning:
ACTIVE: Live and Well - Feel free to keep growing and doing PRs here.
DEPRECATED: We want to decommission this; we are in maintenance mode, do PRs if you must.
DEAD: We archived it, and it is close to being burned :-) So don't do PRs here. This is not in production anymore.
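The three states above could be modeled in code roughly like this. This is only a minimal sketch: the names Lifecycle, PR_POLICY, accepts_prs, and parse_state are made up for illustration, and the policy strings simply paraphrase the list above.

```python
from enum import Enum


class Lifecycle(Enum):
    """The three lifecycle states described above."""
    ACTIVE = "active"
    DEPRECATED = "deprecated"
    DEAD = "dead"


# Hypothetical one-line policy per state, paraphrasing the list above.
PR_POLICY = {
    Lifecycle.ACTIVE: "PRs welcome",
    Lifecycle.DEPRECATED: "maintenance mode: PRs only if you must",
    Lifecycle.DEAD: "archived: no PRs",
}


def accepts_prs(state: Lifecycle) -> bool:
    """DEAD is the only state that refuses PRs outright."""
    return state is not Lifecycle.DEAD


def parse_state(raw: str) -> Lifecycle:
    """Parse a state name case-insensitively, e.g. ' Deprecated '."""
    return Lifecycle[raw.strip().upper()]
```

Keeping the states in one enum means a typo such as "depricated" fails loudly with a KeyError instead of silently creating a fourth state.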
Git/GitHub has commit hooks, which we could use to check the metadata of an asset before allowing a commit to go through; that could also be an effective way to communicate with teams.
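A hook like that could look roughly like the sketch below. It assumes a lot: that each repository keeps a METADATA.json file at its root with a "lifecycle" key (the file name and key are invented here, not a standard), and that the script is wired in as a Git pre-commit hook, where a non-zero exit status aborts the commit.

```python
import json
from pathlib import Path

# Assumption: a METADATA.json file at the repository root holds a
# "lifecycle" key set to "active", "deprecated", or "dead".
METADATA_FILE = "METADATA.json"


def check_lifecycle(metadata_text: str) -> tuple[int, str]:
    """Return (exit_code, message); a non-zero code should block the commit."""
    try:
        metadata = json.loads(metadata_text)
    except json.JSONDecodeError:
        return 1, "metadata file is not valid JSON"
    if not isinstance(metadata, dict):
        return 1, "metadata file must contain a JSON object"
    state = str(metadata.get("lifecycle", "")).lower()
    if state == "dead":
        return 1, "this asset is DEAD: commits are not accepted"
    if state == "deprecated":
        return 0, "warning: this asset is DEPRECATED, commit only if you must"
    if state == "active":
        return 0, "ok"
    return 1, f"unknown or missing lifecycle state: {state!r}"


def main(repo_root: str = ".") -> int:
    """Read the metadata file under repo_root and report the result."""
    path = Path(repo_root) / METADATA_FILE
    if not path.exists():
        print(f"{METADATA_FILE} not found")
        return 1
    code, message = check_lifecycle(path.read_text(encoding="utf-8"))
    print(message)
    return code
```

To use it as a hook, the script would be saved as an executable .git/hooks/pre-commit with a Python shebang line and a final `raise SystemExit(main())`. A DEPRECATED asset only gets a warning here, matching the "do PRs if you must" rule; blocking it instead would be a one-line change.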
The need for metadata
Lifecycle is cool, but is that enough? No. We need more. We also need metadata. Metadata will help us increase the understanding of the systems. Metadata can be a single file in each GitHub repository and/or tags on resources. What other forms of metadata might be useful?
* Kind: Is it a Service, Shared Lib, or Lambda?
* Owner: Could be a Team, an Architect, or an EM. Ideally, email and Slack handles here as well.
* Lifecycle State: (Active|Deprecated|Dead) so we know what the state of affairs is.
* Links: Links to other resources owned by that team: Dashboards, Alerts, etc.
Netflix has an OSSMETADATA file to tell the community the state of affairs. You can do a similar thing in your GitHub repositories, or you can build a more complex system like Yelp did.
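To make the fields listed above concrete, here is a small sketch of a per-repository metadata file and a validator for it. The JSON layout, the field names, and the allowed values in KINDS are assumptions for illustration only; this is not the format Netflix or Yelp use.

```python
import json
from dataclasses import dataclass, field

# Hypothetical allowed values, taken from the Kind and Lifecycle bullets above.
KINDS = {"service", "shared-lib", "lambda"}
STATES = {"active", "deprecated", "dead"}

# Example metadata file content (all values are made up).
EXAMPLE = """
{
  "kind": "service",
  "owner": "payments-team",
  "lifecycle": "active",
  "slack": "#payments",
  "links": {"dashboard": "https://example.com/d/1"}
}
"""


@dataclass
class AssetMetadata:
    kind: str
    owner: str  # a team, an architect, or an EM
    lifecycle: str
    email: str = ""
    slack: str = ""
    links: dict = field(default_factory=dict)  # dashboards, alerts, etc.

    def problems(self) -> list:
        """Return a list of human-readable validation problems."""
        issues = []
        if self.kind.lower() not in KINDS:
            issues.append(f"unknown kind: {self.kind}")
        if not self.owner.strip():
            issues.append("owner is empty")
        if self.lifecycle.lower() not in STATES:
            issues.append(f"unknown lifecycle state: {self.lifecycle}")
        return issues


def load_metadata(text: str) -> AssetMetadata:
    """Build AssetMetadata from the JSON text of a metadata file."""
    return AssetMetadata(**json.loads(text))
```

Calling `load_metadata(EXAMPLE).problems()` returns an empty list for the example above, while a file with an unknown kind or an empty owner gets one message per problem.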
Automation and Accountability
Once you have the data in place, automation can be created to show a unified Dashboard or page. Alerts can also be created to remind teams to update their assets, for instance. Having this metadata in place also allows us to have a BOT that can POKE people to remind them to update their important assets, such as library versions, broken dashboards, unused instances in AWS, etc.
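The core of such a BOT could be as small as the sketch below. It only builds the reminder messages; actually sending them (Slack, email) is left out. The Asset fields, the last_reviewed date, and the 90-day threshold are all assumptions for illustration.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Asset:
    name: str
    owner: str           # team name taken from the asset's metadata
    lifecycle: str       # "active", "deprecated", or "dead"
    last_reviewed: date  # hypothetical field: last time the owner checked it


def reminders(assets, active_teams, today, max_age_days=90):
    """Build one reminder per orphaned or stale asset; DEAD assets are skipped."""
    cutoff = today - timedelta(days=max_age_days)
    messages = []
    for asset in assets:
        if asset.lifecycle == "dead":
            continue
        if asset.owner not in active_teams:
            # The owning team no longer exists: this is an orphan.
            messages.append(
                f"ORPHAN: {asset.name} has no active owner (was {asset.owner})"
            )
        elif asset.last_reviewed < cutoff:
            messages.append(
                f"@{asset.owner}: please review {asset.name}, "
                f"last reviewed {asset.last_reviewed.isoformat()}"
            )
    return messages
```

Orphans are reported separately from stale assets on purpose: a stale asset still has someone to poke, while an orphan needs a decision about who inherits it.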
Accountability can be the next step. Either by committee or open forum, you will need to decide who inherits orphans. This is important because otherwise, the system would lose its purpose. Having healthy software could also be part of coaching/management evaluation criteria.
In the end, it's all about having a product mentality with long-term thinking and understanding that projects and people can come and go, but the code and other assets might be there forever, so we need a different way to think about and approach the Governance of these resources.
Cheers,
Diego Pacheco