Critical ops ranking system

Without well-defined severity levels, it’s easy to waste vital time defining and explaining an incident’s urgency instead of resolving it.

At first glance, incident severity seems like it would be the same as incident priority. After all, a severe incident with dire consequences should be dealt with before a less-severe incident, right?

But the truth is, for most businesses, it’s more complicated than that.

Severity is a measurement of impact. How much impact does an incident have on users? Does it take down their whole system? Keep them from completing a vital task? Or perhaps just irritate them and make tasks harder?

Priority, on the other hand, is a measurement of urgency. How quickly do we need to fix this issue? Which issue needs to be fixed first?

Sometimes the two measurements align perfectly. A high-severity incident that takes down the entire company is also probably the highest priority for DevOps and IT teams to focus on.

But sometimes priority and severity don’t align. For example: Let’s say there’s a typo in the headline on your website’s homepage. This is a low-severity issue because it doesn’t actually impact the function of the website. Users can still do whatever they need to do. Employees can still do what they need to do.

But the business might see the fix as high priority for brand standard reasons or because it causes confusion or simply because it makes them look bad. And so this incident could be low-severity and high-priority.

Similarly, let’s say there’s an incident that’s causing your app to crash. It’s high severity because it keeps users from doing what they need to do. But let’s also say that incident is only affecting a small subset of users. If there are other incidents with wider impact, an issue like this might not be the top priority, even though generally the words “system is crashing” would mean all hands on deck.

Both measurements matter in incident management, but it’s important to recognize when they align and when they don’t.

Because priority is more actionable than severity, it’s the primary measurement we use in Opsgenie. High severity doesn’t automatically push something to the top of the priority list and high priority doesn’t always mean a system is down. And because severity is often a key factor that drives priority, we’ve set clear definitions in our incident handbook for our own incident management practice.

Not all incidents are created equal and not all organizations handle them the same way. When setting severity levels and the processes and expectations around them, you’ll need to factor in more than the impact of an incident alone.
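To make the distinction between severity and priority concrete, here is a minimal Python sketch. The `Incident` class, the `work_queue` helper, and the numeric values are all hypothetical illustrations and not part of Opsgenie or any Atlassian API. It assumes both scales run from 1 (most severe, most urgent) upward, and it orders work by priority, using severity only to break ties.

```python
from dataclasses import dataclass


@dataclass
class Incident:
    title: str
    severity: int  # impact: 1 = highest impact (illustrative scale)
    priority: int  # urgency: 1 = fix first (illustrative scale)


def work_queue(incidents: list[Incident]) -> list[Incident]:
    """Order work by priority (urgency); severity only breaks ties."""
    return sorted(incidents, key=lambda i: (i.priority, i.severity))


# Illustrative values only: the business decides the actual priority.
incidents = [
    Incident("App crash affecting a small subset of users", severity=1, priority=2),
    Incident("Typo in homepage headline", severity=3, priority=1),
    Incident("Customer-facing service down for all customers", severity=1, priority=1),
]

for incident in work_queue(incidents):
    print(f"P{incident.priority} / SEV{incident.severity}: {incident.title}")
```

With these assumed values, the low-severity typo is worked on before the high-severity crash, because the two fields are tracked separately and the queue is driven by priority.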

Severity levels are useful for understanding impact quickly and setting priorities for the IT and DevOps teams. The more well-defined your SEV levels are, the more likely it is that your team will be on the same page and able to react quickly and appropriately when incidents happen.

Incident severity levels are a measurement of the impact an incident has on the business. Typically, the lower the severity number, the more impactful the incident.

For example: At Atlassian, we define a SEV (severity) 1 incident as “a critical incident with very high impact.” This could include a customer data loss, a security breach, or when a client-facing service is down for all customers.

A SEV 2 incident is a “major incident with significant impact,” including when a client-facing service is down for a sub-set of customers or a critical function within a system is not functioning.

And a SEV 3 incident is “a minor incident with low impact,” such as a system glitch that is causing customers slight inconvenience.

At Atlassian, SEV 3 incidents can be handled during daytime/working hours, while SEV 1 and SEV 2 incidents generate an alert for on-call professionals for an immediate fix no matter the time of day.

Severity | Description | Examples
1 | A critical incident with very high impact | A customer-facing service, like Jira Cloud, is down for all customers.
2 | A major incident with significant impact | A customer-facing service is unavailable for a subset of customers. A critical function (for example git push, issue create) is significantly impacted.
3 | A minor incident with low impact | A minor inconvenience to customers, workaround available.
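The severity levels and the on-call rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Severity` enum and the `pages_on_call` and `response_policy` helpers are hypothetical names, not an Atlassian or Opsgenie API.

```python
from enum import IntEnum


class Severity(IntEnum):
    """Lower number means higher impact, as in the definitions above."""
    SEV1 = 1  # critical incident with very high impact
    SEV2 = 2  # major incident with significant impact
    SEV3 = 3  # minor incident with low impact


def pages_on_call(severity: Severity) -> bool:
    """SEV 1 and SEV 2 alert on-call staff at any hour; SEV 3 does not."""
    return severity <= Severity.SEV2


def response_policy(severity: Severity) -> str:
    """Return a human-readable response expectation for a severity level."""
    if pages_on_call(severity):
        return "page on-call immediately"
    return "handle during working hours"


for level in Severity:
    print(f"{level.name}: {response_policy(level)}")
```

Using an integer-backed enum keeps the "lower number, higher impact" convention explicit, so the paging rule is a single comparison.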