The most common challenge teams face with bug fixing is the triaging process.
It's not whether teams allocate time to fix bugs.
Every company has to spend time fixing bugs. Most of them spend too much or too little.
Only companies with a clear triaging system in place succeed.
If your incoming bug triaging system is unclear or non-existent, you're basically set up to fail over time.
Many systems I tried have failed.
But one stuck over time.
The general guidance is as follows:
- A rotating "on-call" schedule for bug triaging is key
- Bugs are assessed for impact and triaged as they come in, not investigated or fixed
- The triaging exercise should take less than 30 seconds
- The ultimate objective is to ensure a seamless and quick process
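The rotating schedule from the first point can be sketched in a few lines of Python. This is only an illustration under assumptions: the function name `on_call_for`, the weekly shift length, and the team names are all hypothetical, and a real team might lean on a calendar or paging tool instead.

```python
from datetime import date


def on_call_for(day: date, engineers: list[str], start: date, shift_days: int = 7) -> str:
    """Return who is on triage duty on `day`, rotating every `shift_days` days.

    Hypothetical helper: the weekly default is an assumption, not a rule.
    """
    if not engineers:
        raise ValueError("rotation needs at least one engineer")
    if day < start:
        raise ValueError("day is before the rotation start")
    shifts_elapsed = (day - start).days // shift_days
    return engineers[shifts_elapsed % len(engineers)]


# Hypothetical team and start date, for illustration only.
team = ["ana", "bo", "chris"]
rotation_start = date(2024, 1, 1)
print(on_call_for(date(2024, 1, 10), team, rotation_start))  # second shift, so "bo"
```

Including the engineering manager is just a matter of adding them to the list.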
Practically, it could look like:
- Each incoming bug is posted somewhere visible to the entire team (I like using Slack or any IM platform)
- The on-call engineer (who can be the engineering manager) is responsible for quickly assessing the impact. If the impact is high, the bug gets an immediate response. Otherwise, it is triaged and moved into a future sprint of work.
- They react to the message once the bug is triaged (for visibility to the rest of the team)
- They update the bug/ticket in the respective ticket system based on the assessment
- They go back to their other responsibilities
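The steps above boil down to one quick decision per bug. Here is a minimal Python sketch of that decision, assuming a two-level impact scale and made-up status names (`respond_now`, `future_sprint`). The `Bug` class and `triage` function are hypothetical; posting to the chat channel, reacting to the message, and syncing the ticket system are deliberately left out because they depend on the tools a team uses.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Impact(Enum):
    # Assumed two-level scale; a team may want more levels.
    HIGH = "high"
    LOW = "low"


@dataclass
class Bug:
    title: str
    impact: Optional[Impact] = None
    status: str = "new"
    triaged_by: Optional[str] = None


def triage(bug: Bug, impact: Impact, on_call: str) -> Bug:
    """Record the impact assessment and route the bug. No investigation happens here."""
    bug.impact = impact
    bug.triaged_by = on_call
    # High impact gets an immediate response; everything else waits for a future sprint.
    bug.status = "respond_now" if impact is Impact.HIGH else "future_sprint"
    return bug


checkout_bug = triage(Bug("Checkout button does nothing"), Impact.HIGH, on_call="ana")
typo_bug = triage(Bug("Typo on settings page"), Impact.LOW, on_call="ana")
print(checkout_bug.status, typo_bug.status)
```

Note that `triage` takes the impact as an input rather than computing it. The judgment call stays with the on-call engineer, and the function only records and routes.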
For the success of this system, the triaging exercise absolutely needs to be about triaging.
Not about investigating the root cause. Nor about trying to reproduce the bug.
An effective system is as frictionless as possible.