
Level 1 - Critical alert coverage scorecard rule

Critical alert coverage measures the balance between critical and warning alerts in your monitoring strategy. This scorecard rule helps you avoid alert fatigue by ensuring you're not over-relying on critical alerts for every issue.

About this scorecard rule

This critical alert coverage rule is part of Level 1 (Reactive) in the business uptime maturity model. It evaluates whether your alert strategy includes an appropriate mix of critical and warning alert conditions.

Why this matters: Too many critical alerts can lead to alert fatigue, where teams become desensitized to urgent notifications. A balanced alerting strategy helps teams respond appropriately to different severity levels.

How this rule works

This rule analyzes a 7-day sample of alert incidents to calculate what percentage are triggered by critical alert conditions versus warning alert conditions. It measures the ratio across all monitored entities in your account.
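The calculation can be sketched in a few lines. This is an illustrative sketch, not the platform's implementation: the `Incident` class and the sample data are assumptions standing in for the incidents your monitoring platform would return for the 7-day window.

```python
# Sketch of the ratio this rule computes, using hypothetical incident data.
from dataclasses import dataclass

@dataclass
class Incident:
    entity: str
    priority: str  # "critical" or "warning"

def critical_alert_percentage(incidents: list[Incident]) -> float:
    """Percentage of incidents triggered by critical alert conditions."""
    if not incidents:
        return 0.0
    critical = sum(1 for i in incidents if i.priority == "critical")
    return 100.0 * critical / len(incidents)

def passes_rule(incidents: list[Incident], threshold: float = 25.0) -> bool:
    """Pass (green) when critical alerts are at or below the 25% threshold."""
    return critical_alert_percentage(incidents) <= threshold
```

For example, a sample with one critical incident out of four gives 25%, which still passes; two out of four gives 50%, which fails.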

Understanding your score

  • Pass (Green): 25% or fewer of your alerts are classified as critical
  • Fail (Red): More than 25% of your alerts are classified as critical
  • Target: Maintain a balanced alert strategy where critical alerts represent true emergencies

What this means:

  • Passing score: You have a well-balanced alerting strategy with appropriate escalation levels
  • Failing score: You may be over-using critical alerts, which can lead to alert fatigue and reduced response effectiveness

Building a balanced alert strategy

A well-designed alerting strategy should include three types of alerts:

Immediately actionable alerts (Critical)

  • Purpose: Indicate business-impacting events requiring immediate response
  • Examples: Service outages, critical system failures, security breaches
  • Response time: Within minutes
  • Who responds: On-call engineer or incident response team

Anticipatory alerts (Warning)

  • Purpose: Signal conditions that aren't immediately business-impacting but may require future action
  • Examples: Rising error rates, approaching capacity limits, performance degradation
  • Response time: Within hours or during business hours
  • Who responds: Development team or system administrators

Retrospective alerts (Informational)

  • Purpose: Provide data for periodic analysis and long-term system optimization
  • Examples: Weekly performance summaries, capacity planning metrics, trend analysis
  • Response time: During scheduled review periods
  • Who responds: Operations team during planned analysis sessions

How to improve your critical alert coverage

If your score indicates too many critical alerts, follow these steps to rebalance your strategy:

1. Audit your current alerts

  1. Review all critical alerts: List every alert condition currently set to critical
  2. Assess business impact: For each critical alert, ask: "Does this require immediate response to prevent business impact?"
  3. Identify candidates for downgrade: Look for alerts that could be warnings instead

2. Reclassify alerts appropriately

Downgrade to warning when:

  • The issue doesn't immediately affect customers
  • Response can wait until business hours
  • The alert provides early warning of potential problems
  • Manual intervention isn't urgently required

Keep as critical when:

  • Customer-facing services are unavailable
  • Data loss or security incidents occur
  • Revenue-generating systems fail
  • Immediate action prevents cascading failures
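The reclassification criteria above can be expressed as a simple decision helper. This is a hypothetical sketch: the boolean fields on `AlertCondition` are assumptions about what you can assess for each alert, not fields from any real alerting API.

```python
# Hypothetical helper applying the "keep as critical" criteria;
# anything that matches none of them is a candidate for downgrade.
from dataclasses import dataclass

@dataclass
class AlertCondition:
    name: str
    customer_facing_outage: bool = False
    data_loss_or_security: bool = False
    revenue_system_failure: bool = False
    prevents_cascading_failure: bool = False

def recommended_severity(alert: AlertCondition) -> str:
    """Keep critical only when at least one 'keep as critical' criterion holds."""
    if (alert.customer_facing_outage
            or alert.data_loss_or_security
            or alert.revenue_system_failure
            or alert.prevents_cascading_failure):
        return "critical"
    # Doesn't require immediate response: downgrade to warning.
    return "warning"
```

Running every existing critical alert through a check like this makes the audit in step 1 repeatable rather than a one-off judgment call.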

3. Implement progressive alerting

Create alert escalation paths:

  1. Warning alert fires first when metrics approach concerning levels
  2. Critical alert follows if conditions worsen or persist
  3. Use time-based escalation to allow teams to respond before escalating

Example escalation:

  • Warning: Response time > 2 seconds for 5 minutes
  • Critical: Response time > 5 seconds for 2 minutes, OR warning persists for 30 minutes
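The example escalation above can be sketched as an evaluation function. This is an illustrative simplification, assuming one response-time sample per minute over the last 30 minutes; a real alerting engine evaluates sliding windows continuously.

```python
# Sketch of the example escalation: warning at >2s for 5 minutes,
# critical at >5s for 2 minutes OR when the warning persists 30 minutes.

def evaluate(samples_last_30m: list[float]) -> str:
    """Return 'critical', 'warning', or 'ok' for per-minute response-time
    samples (in seconds) covering the last 30 minutes, oldest first."""
    def sustained(threshold: float, minutes: int) -> bool:
        recent = samples_last_30m[-minutes:]
        return len(recent) == minutes and all(s > threshold for s in recent)

    if sustained(5.0, 2):      # response time > 5 s for 2 minutes
        return "critical"
    if sustained(2.0, 30):     # warning condition persisted for 30 minutes
        return "critical"
    if sustained(2.0, 5):      # response time > 2 s for 5 minutes
        return "warning"
    return "ok"
```

Note the ordering: the critical checks run first, so a worsening condition escalates even while the warning condition is also true.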

4. Validate your changes

After reclassifying alerts:

  1. Monitor for missed issues: Ensure important problems are still detected
  2. Measure response times: Verify teams respond appropriately to different severity levels
  3. Gather team feedback: Ask responders if the new classification feels appropriate

Measuring improvement

Track these metrics to verify your alert rebalancing efforts:

  • Critical alert percentage: Should decrease toward the 25% target
  • Response effectiveness: Teams should respond faster to critical alerts when they're truly urgent
  • Alert fatigue reduction: Survey team members about confidence in alert classification
  • Incident detection coverage: Ensure you're still catching important issues early
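For the first metric, a lightweight trend check can confirm the percentage is actually moving toward the target. This is a hypothetical sketch; `weekly_pcts` is assumed to hold the critical-alert percentage from each weekly scorecard check, oldest first.

```python
# Hypothetical tracker: is the weekly critical-alert percentage
# at the 25% target, or at least trending downward?

def is_improving(weekly_pcts: list[float], target: float = 25.0) -> bool:
    """True when the latest week meets the target or beats the prior average."""
    if not weekly_pcts:
        return False
    if weekly_pcts[-1] <= target:
        return True
    if len(weekly_pcts) < 2:
        return False
    # Compare the latest week against the average of all preceding weeks.
    prior_avg = sum(weekly_pcts[:-1]) / (len(weekly_pcts) - 1)
    return weekly_pcts[-1] < prior_avg
```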

Common scenarios and solutions

Everything marked as critical:

  • Problem: Teams mark all alerts as critical to ensure attention
  • Solution: Establish clear criteria for critical vs. warning classification and train teams on appropriate usage

Fear of missing important issues:

  • Problem: Teams worry that warning alerts will be ignored
  • Solution: Create processes for regular warning alert review and establish SLAs for different severity levels

Legacy alert configurations:

  • Problem: Old alerts were set up without consideration for severity levels
  • Solution: Conduct a systematic audit of all existing alerts and reclassify based on current business impact

When to adjust the 25% threshold

The default 25% threshold works for most organizations, but you may need to adjust it if:

  • Higher percentage acceptable: Your organization primarily monitors critical production systems
  • Lower percentage needed: You have extensive monitoring including development and staging environments
  • Industry requirements: Regulatory or compliance requirements dictate different alerting strategies

Important considerations

  • Business context matters: Critical alerts should align with your business priorities and customer impact
  • Team capacity: Consider your team's ability to respond to different alert volumes and severities
  • Escalation procedures: Ensure clear escalation paths exist for different alert types
  • Regular review: Alert classifications should evolve as your systems and business priorities change

Next steps

  1. Immediate action: Review and reclassify any alerts currently contributing to a failing score
  2. Ongoing monitoring: Check this scorecard rule weekly to maintain balanced alerting
  3. Advance to Level 2: Once alert coverage is optimized, focus on proactive monitoring practices

For comprehensive guidance on alert strategy, see our Alert Quality Management implementation guide.

Copyright © 2025 New Relic Inc.
