Firefighting

Firefighting is one of the biggest and most common causes of lost productivity and morale in software development. It usually manifests as developers being pulled off their project to urgently fix something - the live site is spewing 500 errors, there's a button in the wrong place, or a senior director just asked for a report that doesn't exist. They abandon what they were doing, grumble their way through fixing the issue, and return to their original work in a poor state of mind with their flow disrupted.

We all know this if we've worked in software for more than a few minutes. But let's look at it in a bit more detail. Firstly:

What is a fire?

A fire is an unexpected problem that, if left unchecked, will burn your business down.

From this, we can see that most firefighting isn't really dealing with "fires" at all. An undiscovered bug that causes customers to frequently lose their shopping cart and have to start again? That's a fire - you didn't expect it and you're going to lose a lot of business if it's not fixed right away. However, something the wrong shade of green? Probably not going to destroy your business. Someone doesn't have a report because you dropped it to make a deadline? Hardly unexpected; you should have dealt with that in the project.

This is a fairly typical firefighting profile in an organisation. A few real fires, mixed in with minor bug fixes and feature requests. At its worst, you're looking at a highly disruptive unofficial channel for getting new features at the cost of actual product development.
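The definition above is really a two-part test: a fire has to be both unexpected and a threat to the business. Here is a minimal sketch of that test in Python. The `Request` type, its field names and the flag values given to the three examples are illustrative assumptions, and deciding each flag is still a human judgement call.

```python
from dataclasses import dataclass


@dataclass
class Request:
    """An incoming piece of work (hypothetical structure, for illustration)."""
    title: str
    unexpected: bool          # did nobody see this coming?
    threatens_business: bool  # will it cost serious business if left unchecked?


def is_fire(request: Request) -> bool:
    # A fire must be BOTH unexpected and a threat to the business.
    return request.unexpected and request.threatens_business


# The three examples from the text, with assumed flag values.
requests = [
    Request("Customers keep losing their shopping cart", True, True),
    Request("Something is the wrong shade of green", True, False),
    Request("Report dropped to make a deadline", False, False),
]

fires = [r.title for r in requests if is_fire(r)]
print(fires)
```

Only the shopping cart bug passes both parts of the test; the other two are ordinary bug fixes or feature requests that arrived through the firefighting channel.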

How do we deal with this?

Visibility

A good agile team is well-versed in the principles of transparency and visibility. You need a board. Not just cards on an individual project's board, but a dedicated firefighting board. One consequence I see time and again is that stakeholders start saying, "I didn't realise everyone else had all these problems, I thought I was the only one". (That doesn't in itself solve all your issues, but more on this later.)

Once you have the board, you need to keep it prioritised, which means getting people to talk to each other. At SDWS we had the Monday morning horse trading session - getting all the business unit heads together to argue that finance's report is more important than marketing getting some new site images, but neither is as urgent as fixing the bug which is affecting website sales. The key is that those stakeholders are talking to each other: understanding the context and what's important to the business as a whole rather than to the individual.

Ringfencing

Having the board is of limited use if you then go and use it as a vehicle to drag developers off their projects and demand they fix everything. You need a throughput limit and some way of figuring out who's going to work on the cards.

The most effective way to do this is a dedicated team. To start with, this is usually made up of developers who'd otherwise be on the bench between projects, but over time you find people who enjoy the nature of the work and build the core of the team around them. That way you're not interfering with the agile commitment that a team should be free to concentrate on their project work, or giving the illusion that firefighting can be sped up by disrupting more of everything else. Developers on a product team may have to give the firefighting team support at first, but this reduces quickly as the dedicated team build up their own store of knowledge.

The other advantage of a dedicated team is that you don't really need any rules over what goes on the firefighting board other than "prioritise it". It may end up full of little nice-to-haves, but as long as there are genuine fires to fight, those nice-to-haves aren't going to get done. Because you have the board, and it's very visible how long things have been sitting in "To Do", it's easy to get an idea of whether the team is too small (fires being left ablaze) or too large (too many low-value nice-to-haves hitting production).
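As a rough illustration, the board can be read for a team-size signal along these lines. This is only a sketch: the `Card` fields, the one-day limit on fires waiting in "To Do" and the 50% ceiling on nice-to-haves are assumptions made for the example, not figures from any real team, and you would tune them to your own board.

```python
from dataclasses import dataclass


@dataclass
class Card:
    """A card on the firefighting board (hypothetical structure)."""
    title: str
    is_fire: bool
    days_in_todo: int


def team_size_signal(todo, done, max_fire_wait_days=1, max_nice_to_have_share=0.5):
    """Return 'too small', 'too large' or 'about right'.

    todo: cards currently waiting in "To Do".
    done: cards recently shipped to production.
    Both thresholds are illustrative assumptions.
    """
    # Fires left ablaze: a genuine fire has been waiting too long.
    if any(c.is_fire and c.days_in_todo > max_fire_wait_days for c in todo):
        return "too small"
    # Too many low-value nice-to-haves hitting production.
    if done:
        nice_to_have_share = sum(1 for c in done if not c.is_fire) / len(done)
        if nice_to_have_share > max_nice_to_have_share:
            return "too large"
    return "about right"


todo = [Card("Checkout throws 500 errors", is_fire=True, days_in_todo=3)]
print(team_size_signal(todo, done=[]))  # too small
```

The point of the sketch is that both signals come straight off the board, so the sizing conversation can start from what is visibly waiting rather than from who shouts loudest.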

This is where we come to the downside of the board and ringfencing. Most of the time, you'll have one or two people in the company who are used to getting everything they ask for done immediately, no matter how capricious or irrelevant the request may be. They might be the dictator screaming at the nearest team to get it done, or the networker with a mate in development who's always on the hook for their every whim, but either way they've got a direct line into development. From their point of view, ringfencing is far less efficient, because they've gone from getting everything done to only getting the important things done. The important thing in this situation is to cut off those unofficial channels. Make developers feel confident in saying no!

Again, the all-stakeholder meeting helps a lot here. It's much easier to tell someone they can't have every little problem dealt with at top priority any more when they can see first-hand how many other people in the business also have a ton of problems to fix, many of them much more urgent. That's why you need to be quite forceful in making people engage with the process to get things done - involvement is the most powerful tool to convince them a dedicated team is the best approach for the business as a whole, even if it involves some individual sacrifice.

Ultimately, you want to be in a position where all of the firefighting is being brought to a ringfenced team, and you're making a decision based on the amount of work as to how big the team should be. Unofficial channels reduce your ability to do that - you can't measure how much water you've poured into a bucket if the bucket has a leak.

Pushing to projects (or not)

Because your board is governed by priority, anything can be put into it - and you'll often see small feature requests as well as fires. Remember the point I started with: most firefighting is not an actual fire. But that doesn't mean we should ignore those little tasks. All we need to do is make sure we put the actual fires out first, and don't disrupt our investment in new products and technical capability.

There's nothing to say that if there are no fires to extinguish, our ringfenced team can't work on some of those features. This is, ultimately, how you deal with the morale issue of being stuck on support; by getting to a steady state where work is divided between crunching support tasks (dull) and working on quick business wins (fun). The key there is "quick". What you don't want is your firefighting team picking up project-sized pieces of work, only to abandon them as new fires come in and never get anything completed.

If something is genuinely project-sized, it should be dealt with as such: either by creating a project, or by dividing up the tasks and putting them on the backlog of the team working on that product. To know when to do this, I quite like a simple 2/2 rule: if it's more than 2 days of work, or it can wait more than 2 iterations to be done, it belongs in the relevant project's backlog (or a new project) rather than on the firefighting board.
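The 2/2 rule is simple enough to write down as a check. The function and parameter names in this sketch are made up for illustration, and the effort estimate and the "how long can it wait" figure still have to come from a conversation with the requester.

```python
MAX_DAYS_OF_WORK = 2       # the first "2" in the 2/2 rule
MAX_ITERATIONS_TO_WAIT = 2  # the second "2" in the 2/2 rule


def belongs_on_firefighting_board(estimated_days: float, iterations_it_can_wait: int) -> bool:
    """Apply the 2/2 rule to one request (hypothetical helper)."""
    too_big = estimated_days > MAX_DAYS_OF_WORK
    not_urgent = iterations_it_can_wait > MAX_ITERATIONS_TO_WAIT
    # Either condition alone is enough to send the work to a project backlog.
    return not (too_big or not_urgent)


def route(estimated_days: float, iterations_it_can_wait: int) -> str:
    if belongs_on_firefighting_board(estimated_days, iterations_it_can_wait):
        return "firefighting board"
    return "project backlog"


print(route(0.5, 0))  # firefighting board
print(route(5, 0))    # project backlog
print(route(1, 4))    # project backlog
```

Note that the two conditions are joined by "or": a half-day job that can happily wait a few iterations still goes to the backlog, which keeps the firefighting board for work that is both small and needed soon.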

The other thing that's worth pushing back to its originating product team is a bug. One of the risks of dedicated support is that less diligent product teams feel they can throw any old bug-ridden pile of junk over the wall and it becomes support's problem. Your firefighting team needs to be able to push back code quality issues. In extreme cases I'd argue for slowing the velocity of a team throwing bug-ridden code into production and rotating some of its members into the firefighting team so they have to clean up some of their own mess. My experience is that people come back to their teams with a renewed focus on quality after a stint in firefighting!

Improvement

The final piece of the puzzle is that the firefighting team needs ownership of its process. (It feels like a long time since I talked about ownership - it remains important.) This comes down to three main things:

  • The ability to put cards into their own board to fix recurring issues, automate effort-intensive tasks, and deal with technical debt.
  • The ability to "stop the line" and exchange a low-quality band-aid for a more involved solution that fixes the underlying problem.
  • The ability to participate in the prioritisation process and move up cards that are important or worthwhile to the team.

Ownership is where you really change the support team's life from punishing drudgery to rewarding work. Because the nature of fires is bursty (you'll go weeks without an incident, then everything will explode at once), the team can spend its slack periods building new features, fixing technical debt, or building components that solve the problems they commonly see.

That's the secret to this, and why I'm telling you to build a ringfenced team with control over their process. You can extinguish fires by adding more people, more appliances and more effort - but the biggest advances come when you identify how those fires came about and understand how to build something that's not as flammable to start with. A good firefighting team don't just fight your fires. They build shared components and services to prevent fires and give warning at the first sign of smoke. They write articles and educate people on common mistakes and how to avoid them.

It's about outcomes. And when you have a team whose outcome is, "prevent things being on fire", and they know what all of those things are, they're going to deliver far better results than a harangued developer whose ideal outcome is getting back to their project as quickly as possible.

Image: public domain