charliekendall.co.uk | Quality is a shared responsibility, not a job title

📅 27 August 2024

quality assurance, platform engineering, automation, culture, QA

Before getting into it, I want to be clear that I mean no disrespect whatsoever to anybody who is employed in a QA focused role. The vast majority of QA folk I have had the pleasure of meeting are bright people who want to create high quality products and care very much about their work - exactly the type of people I love to work with! They are the ballast keeping the ship stable in rough seas, doing their utmost to fend off poor user experiences, buggy interfaces, and nonsensical features. But therein lies the problem: they are most often a band aid applied to stem the flow of chaos whilst inadequacies in other areas of the business run rife. Employing QA people rarely addresses the root cause and can, counterintuitively, amplify the problems that it masks, allowing them to grow into something much larger and more challenging to rectify in the long run.

The moment you put up that job ad for a QA person, you’re sending a strong and undeniable message to the rest of your team: we do not trust your output to be of an acceptable quality.

For many, it only gets worse from here. A dev team workflow might then involve a developer ‘passing’ their work over to QA before it can be released to the users. When they do so, the developer must switch contexts and carry out other work whilst QA examine what has been sent their way, only to switch back once more when questions are asked and bugs are raised. The pace of delivery slows and people become frustrated, feeling like everything is in progress and nothing is being completed. For the developers, quality is no longer their responsibility and there now exists a QA safety net between them and the users, so why should they spend as much time and energy worrying about it? Divides emerge within teams, and the lines of responsibility become blurred leading to an inevitable drop in morale. A bleak picture, yet not an uncommon one.

But… how can we ensure quality without a QA team?!

Well, that’s the million dollar question! Every organisation is different and, as is usually the case with these sorts of things, there is no one-size-fits-all strategy for delivering high quality software efficiently. Having said this, there are a number of principles that many of the successful strategies share and, spoiler alert, “hire loads of QA people” has not made the shortlist. This is certainly not to say these roles are never appropriate; there are many contexts (e.g. game development) in which QA folk are all but essential! However, if you’re building a deterministic system with a well bounded set of inputs then you could be shooting yourself in the foot by having a QA team as the enforcers of quality rather than instilling a desire to produce high quality work into your development teams. In the end, it boils down to empowering people with the tools, culture, and direction in such a way that it’s easy for them to do the right thing, and hard for them to do the wrong thing.

I’ll reiterate that final part, as it really is the bottom line of the value that high quality affords you: it makes it easy to do the right thing, and hard to do the wrong thing. Whether it’s building a seamless user experience, ensuring pre-prod environments are all but identical to production, or maintaining runbooks for important tasks you have yet to automate, each of these makes it easier for key groups of people to do the right thing, or harder for them to do the wrong thing. To emphasise this further, let’s look at some examples of scenarios where quality shines through in different ways:

“We need to tweak all form buttons in the application to improve accessibility!” 👌 Do you have a well-factored codebase with good automated test coverage and a healthy CI/CD pipeline? No problem - should be a quick job for one of the junior devs.

“The third party stock prices API we depend on is down!” ⚠️ Got a circuit breaker policy that automatically fails over to a different API? Nice - there’s no user impact, and it’s self-healing, so let’s get rid of that alert and avoid the noise in the future.

“Some integration tests are failing in the pre-production environment!” 🤔 Can you easily see which commit broke them and whether it’s safely revertible? If so, get the revert done and block deployments to production in the meantime. Incident avoided - at least we had a representative pre-prod environment and integration tests! We should consider running a session to brainstorm what could be done to avoid similar issues making it as far as pre-production in the future.
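To make the second scenario a little more concrete, here’s a minimal sketch of a circuit breaker with failover, written in Python. It’s an illustration under assumptions rather than a production policy: the `CircuitBreaker` class, its thresholds, and the `primary`/`fallback` callables are all hypothetical names of my own, and a real system would more likely lean on an established resilience library.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class CircuitBreaker:
    """Hypothetical breaker: opens after `max_failures` consecutive failures,
    then lets a trial call through to the primary after `reset_after` seconds."""

    def __init__(
        self,
        max_failures: int = 3,
        reset_after: float = 30.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.max_failures = max_failures
        self.reset_after = reset_after
        self.clock = clock  # injectable so the behaviour can be tested without sleeping
        self.failures = 0
        self.opened_at: Optional[float] = None

    def is_open(self) -> bool:
        """True while the primary should be skipped entirely."""
        if self.opened_at is None:
            return False
        # Once the cool-down has elapsed, allow a trial call (the "half-open" state).
        return self.clock() - self.opened_at < self.reset_after

    def call(self, primary: Callable[[], T], fallback: Callable[[], T]) -> T:
        """Use the primary when healthy, otherwise fail over to the fallback."""
        if self.is_open():
            return fallback()
        try:
            result = primary()
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = self.clock()  # (re)open the circuit
            return fallback()
        # A success closes the circuit and clears the failure count.
        self.failures = 0
        self.opened_at = None
        return result
```

In use, `primary` would wrap the call to the main stock prices API and `fallback` the call to the alternative one. The self-healing part is the cool-down: once it elapses, the next request quietly tries the primary again, with no human involved.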

Hopefully you get the point… if quality wasn’t present in the scenarios above then they could have easily resulted in downtime, bugs, and slow delivery. Quality is far more than verifying a feature meets the acceptance criteria, or running a sparse suite of unit tests against pull requests; it is a mindset that needs to be firmly embedded throughout your culture. So, here’s my take on three principles that can help you to get there:

Invest heavily in your platform

Your platform is your bedrock, made up of all the dimensions common to the various pieces of your solutions. Deployment pipelines, telemetry instrumentation, infrastructure provisioning and maintenance, endpoint security, and even feature flag management… the list goes on! Appropriately abstracting and ensuring centralised ownership of these concerns is absolutely necessary in order to provide teams with a powerful arsenal that enables them to move both swiftly and confidently. With a great platform, product teams are able to allocate much more of their time to directly delivering high quality changes and delighting users instead of wrestling the YAML for their CI pipelines into submission. If the platform is left behind, teams inevitably waste time reinventing wheels and accruing technical debt, and quality is the first thing to suffer when that debt comes due.

I won’t dive further into the weeds of platform engineering right now as I plan to write more about it in the future. In the meantime, Microsoft have a pretty handy set of articles on this topic that I’d highly recommend taking a look at if you’re interested. Instead, I will leave you with an analogy I like to use for why investing in your platform is worthwhile:

Imagine you own a building company and are embarking on a project to build a new row of terraced housing. You could hire a couple of structural engineers (platform engineers) at considerable expense to work out the blueprints for the foundations, but why would you when your builders (product engineers) are the ones doing the labour and they’ve poured foundations a hundred times before? Without the structural engineers, you’ll save money and end up with something that just about looks like a row of houses to sell - a huge win! The project is eventually completed later than planned, and you take a look at what your teams have produced. A quarter of the houses on the terrace look great, because one of the builders had a keen interest in structural engineering and worked overtime to make sure the foundations were designed to the best of her ability. A few houses are subtly subsiding already, and the rest are somewhere in between - nothing the average person would notice though. All the houses are eventually sold; however, over the following five years you end up spending 10x the cost of the structural engineers fighting lawsuits and rectifying the various issues you are contractually obligated to fix.

The lesson? Although it’s a cost that might seem difficult to justify at first, hiring those structural (*cough* platform *cough*) engineers would have saved you a lot of time, reputational damage, and money in the long run. Platform engineering tackles a difficult set of problems, so give those engineers space to address them properly in a centralised way rather than expecting each dev team to solve them independently. It’s an investment, but one that’s likely to be among the best you’ll ever make.

🥡 The takeaway: Ensure your platform is well abstracted and has clear ownership.

Minimise the distance between developers and users

Every additional intermediary between a person using software and a person creating software adds complexity, and that complexity compounds as the chain gets longer. With each one added comes a new set of voices that must be heard and expectations that must be set - so as you can imagine, lengthening this chain is a great way to encourage misaligned expectations and inappropriate solutions. It’s not uncommon for devs to be the final link in the chain, only having their input sought at the point at which most requirements have been agreed upon. These requirements are often presented in the form of incomplete acceptance criteria and/or UI designs that fail to take into account complexities arising from technical constraints and the system architecture. This usually ends up going one of two ways. Either developers ask questions that should have been asked long ago, resulting in delayed delivery of a partial set of requirements and consequently some rather unimpressed stakeholders, or they don’t ask questions and do their best to hack together whatever has been asked of them, resulting in a glut of technical debt, bugs, and unscalable systems.

As is the case with any other role, developers bring their own perspective and ask questions that others may not, so it only makes sense to include them in the conversation from the start – and I’m not talking about a “solutions architect” or some other higher level technical person, it needs to be the folk that will be writing the code and building the thing! Bringing developers in for inception, refinement, research, or design sessions not only presents an opportunity for these questions to be asked, but also shares invaluable context relating to why decisions are made whilst simultaneously building up their domain knowledge and team cohesion. The result you’ll get is supercharged teams who are capable of making smarter decisions and building products that meet your users’ needs far more effectively, and much faster!

🥡 The takeaway: Involve all team members early and often in the end-to-end software lifecycle, from inception to release.

Empower teams with all the data

I believe that quality is always quantifiable. If a statement relating to quality is not quantifiable, then it’s nothing more than an opinion 😉 Data is what gives you the power to quantify, so it should be made readily available in a variety of forms. It’s easy to say “we build high quality software!”, but what metrics are you using to back up this claim? It could be that your APIs have 99.999% uptime, or maybe you’ve got an impressively high uptake of new features, or it could even be a speedy ‘mean time to resolution’ of bugs. Regardless of the metrics you use to gauge quality, it is essential that measuring data is easy and analysis is accessible in order to build a culture where quality has a front row seat. In other words, it should be simple to add/change the data that is being captured and your analytics tooling should be accessible to everybody, rather than being locked behind a deep knowledge of data structures and query languages. If these criteria aren’t sufficiently met, most people simply won’t look at the data and you lose out on the majority of the power it holds.
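To put numbers on a couple of those metrics, here’s a small sketch in Python. The function names and the idea of passing incidents in as (opened, resolved) timestamp pairs are assumptions of mine for illustration, not the interface of any particular monitoring tool.

```python
from datetime import datetime, timedelta
from typing import Sequence, Tuple


def downtime_budget(availability_pct: float,
                    period: timedelta = timedelta(days=365)) -> timedelta:
    """How much downtime an availability target allows over a given period."""
    return period * (1 - availability_pct / 100)


def mean_time_to_resolution(
    incidents: Sequence[Tuple[datetime, datetime]],
) -> timedelta:
    """Average of (resolved - opened) across a non-empty list of incidents."""
    if not incidents:
        raise ValueError("need at least one incident to compute a mean")
    total = sum((resolved - opened for opened, resolved in incidents), timedelta())
    return total / len(incidents)


# 99.999% over a 365-day year leaves roughly 5 minutes 15 seconds of downtime.
print(downtime_budget(99.999))
```

Seen this way, “five nines” stops being a slogan and becomes a budget of about five minutes a year, which is a much easier thing for a team to reason about and track.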

Hopefully it goes without saying that teams should have access to application logs/telemetry data and that this can massively reduce the time taken to address bugs and performance issues; however, this is far from the full picture. Product metrics and analytics play just as important a role when it comes to keeping the quality bar high. These are able to give you a powerful view into the minds of your users which can, with proper analysis, be used to build them features and improvements they didn’t even know they needed. Conversely, a lack of appropriate quantitative analysis can lead to a culture dominated by ‘the loudest voice wins’, with software being bloated out by unused functionality and low impact work being highly prioritised. High quality data is incredibly powerful, and a relatively small amount of analysis can go a very long way towards directing your strategy and optimising your time.

There’s nothing like a good example to hammer a point home, so here’s a real life one that a friend shared with me recently: her company set a quarterly product objective to increase the number of weekly sign-ups by 15%. Being a smart product-led company, they had an analytics framework instrumented throughout their application. Instead of diving right in and solutionising, my friend put together a funnel for the sign-up flow to see if there was any low hanging fruit when it comes to improving conversion. A 30% user drop off on the second sign-up step looked suspicious and, after digging a little deeper, it became apparent that the majority of the drop offs were iPhone Safari user agents. As it turned out, there was a bug with one of their form components that was preventing certain browsers from completing the flow altogether. Two hours later, a fix was deployed resulting in an immediate reduction in the drop off from 30% to 12% and a net increase in sign-ups by more than 20% 🤯

You won’t get big wins like this every day, but it’s a great example of how solid data and tooling helps to keep quality high. You get to fix a major bug, hit a quarterly objective, save a tonne of time, and avoid wasting money on email campaigns… all in a single day! What’s not to love?
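For the curious, the funnel analysis in that story can be sketched in a few lines of Python. This is a simplified illustration rather than what my friend’s team actually ran: the event shape (pairs of user ID and step name), the step names, and both function names are assumptions of mine.

```python
from typing import Dict, Hashable, Iterable, Sequence, Set, Tuple


def funnel_drop_off(
    events: Iterable[Tuple[Hashable, str]],
    steps: Sequence[str],
) -> Dict[str, float]:
    """Fraction of users lost between each consecutive pair of funnel steps.

    `events` is an iterable of (user_id, step_name) pairs.
    """
    reached: Dict[str, Set[Hashable]] = {step: set() for step in steps}
    for user_id, step in events:
        if step in reached:
            reached[step].add(user_id)

    drop_off: Dict[str, float] = {}
    for previous, following in zip(steps, steps[1:]):
        entered = reached[previous]
        if not entered:
            continue  # nobody got this far, so there is nothing to measure
        completed = entered & reached[following]
        drop_off[f"{previous} -> {following}"] = 1 - len(completed) / len(entered)
    return drop_off


def relative_gain(drop_before: float, drop_after: float) -> float:
    """Relative increase in users passing a step when its drop off changes,
    assuming the rest of the funnel behaves exactly as before."""
    return (1 - drop_after) / (1 - drop_before) - 1
```

Plugging in the numbers from the story, `relative_gain(0.30, 0.12)` comes out at roughly 0.257: going from 70% to 88% of users getting past that step is about a 26% uplift at that point in the funnel, which sits comfortably with the “more than 20%” net increase in sign-ups. Running the same drop off calculation per browser or user agent is what would surface a Safari-only problem like this one.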

🥡 The takeaway: Data is your superpower, so make sure you are capturing it and using it properly.

At the end of the day…

… every company is different, and strategies that work for one are not guaranteed to work for the next. The one undeniable truth is that high quality tech is only ever built by high quality companies and individuals, where it is deeply embedded within their personalities and culture. This is not to say that you need high quality software to be successful; there are countless profitable companies out there who are shipping low quality tech, but consider how much more profitable they could be if they were better able to meet the needs of their users and stakeholders. And on the employee level, working in a high quality environment is infinitely more rewarding than constantly fighting fires in a hellish feature factory! Of course there will always be stresses, but the stress caused by taking your new feature GA during a live conference talk is very different to the stress caused by being woken up at 2am to deal with your third P1 incident that week.

When quality is taken seriously, everybody reaps the rewards.