
Level 2 - Service level coverage scorecard rule

Service level coverage measures whether your critical services have defined Service Level Indicators (SLIs) that track performance from your users' perspective. SLIs help you understand service health, set reliability targets, and make data-driven decisions about improvements.

About this scorecard rule

This service level coverage rule is part of Level 2 (Proactive) in the business uptime maturity model. It evaluates whether your services have SLIs defined, indicating a proactive approach to reliability management.

Why this matters: SLIs provide objective measurement of service quality from the user's perspective. Without SLIs, teams rely on subjective assessments of service health and may miss performance issues that impact user experience.

How this rule works

This rule examines the latest entity harvest to determine which entities have Service Level Indicators (SLIs) defined. It evaluates all monitored entities that could benefit from service level measurement.

Understanding your score

  • Pass (Green): Critical services have SLIs defined to measure user-facing performance
  • Fail (Red): Important services lack SLIs, making it difficult to measure service quality objectively
  • Target: Complete SLI coverage for business-critical and user-facing services

What this means:

  • Passing score: Your team can measure service reliability from the user perspective and make data-driven improvement decisions
  • Failing score: You're missing objective measures of service quality, potentially leading to blind spots in service performance

Understanding Service Level Indicators (SLIs)

SLIs are specific metrics that measure service performance from the user's perspective. Good SLIs should be:

User-focused

  • Measure what users experience: Response times, error rates, availability
  • Reflect business value: Metrics that directly impact customer satisfaction and business outcomes
  • Observable and measurable: Based on telemetry you actually collect, not on estimates or assumptions

Common SLI types

Availability SLIs:

  • Definition: Percentage of requests that result in successful responses
  • Example: 99.9% of HTTP requests return non-error status codes
  • Good for: Critical user-facing services, APIs, websites

Latency SLIs:

  • Definition: Percentage of requests completed within acceptable time thresholds
  • Example: 95% of requests complete within 200ms
  • Good for: Interactive applications, real-time services, mobile apps

Quality SLIs:

  • Definition: Percentage of outputs that meet quality standards
  • Example: 99% of search results return relevant content
  • Good for: Data processing, content delivery, recommendation systems

Freshness SLIs:

  • Definition: Percentage of data that meets recency requirements
  • Example: 95% of dashboard data is less than 5 minutes old
  • Good for: Analytics platforms, reporting systems, monitoring dashboards
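
Every SLI type above follows the same pattern: the number of good events divided by the number of valid events, expressed as a percentage. The sketch below applies that pattern to availability and latency. The `Request` record, the 200 ms threshold, and the choice to count only 5xx responses as errors are assumptions for illustration, not New Relic's implementation.

```python
from dataclasses import dataclass


@dataclass
class Request:
    status: int          # HTTP status code
    duration_ms: float   # response time in milliseconds


def availability_sli(requests: list[Request]) -> float:
    """Percentage of requests that did not return a server error (5xx)."""
    if not requests:
        raise ValueError("no valid events in the window")
    good = sum(1 for r in requests if r.status < 500)
    return 100.0 * good / len(requests)


def latency_sli(requests: list[Request], threshold_ms: float = 200.0) -> float:
    """Percentage of requests that completed within threshold_ms."""
    if not requests:
        raise ValueError("no valid events in the window")
    good = sum(1 for r in requests if r.duration_ms <= threshold_ms)
    return 100.0 * good / len(requests)


# Hypothetical sample data.
sample = [
    Request(200, 120.0),
    Request(200, 180.0),
    Request(500, 950.0),
    Request(404, 90.0),
]
print(availability_sli(sample))  # 75.0
print(latency_sli(sample))       # 75.0
```

Quality and freshness SLIs fit the same shape: only the definition of a "good" event changes.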

How to implement service level coverage

Follow these steps to establish comprehensive SLI coverage:

1. Identify services requiring SLIs

Prioritize by business impact:

  1. Customer-facing services: Applications that directly serve end users
  2. Revenue-critical systems: Services that impact business revenue if they fail
  3. Dependency services: Internal services that support multiple customer-facing applications
  4. Compliance-critical systems: Services required for regulatory or security compliance

Consider service characteristics:

  • Complexity: Services with multiple components or dependencies
  • User expectations: Services where performance directly affects user experience
  • Business criticality: Services that support core business functions
  • Change frequency: Services that are frequently updated or modified
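
One way to turn these criteria into a work queue is to rank the services that still lack an SLI. This is a minimal sketch: the `Service` record, the impact labels, and the tie-breaking rule (more applicable characteristics first) are hypothetical and should be adapted to your own inventory.

```python
from dataclasses import dataclass

# Order mirrors the business-impact list above; lower number = higher priority.
IMPACT_RANK = {
    "customer-facing": 1,
    "revenue-critical": 2,
    "dependency": 3,
    "compliance-critical": 4,
}


@dataclass
class Service:
    name: str
    impact: str               # one of the IMPACT_RANK keys
    has_sli: bool
    characteristics: int = 0  # how many of the four characteristics apply (0-4)


def sli_backlog(services: list[Service]) -> list[Service]:
    """Services without an SLI, highest priority first."""
    missing = [s for s in services if not s.has_sli]
    return sorted(
        missing,
        key=lambda s: (IMPACT_RANK[s.impact], -s.characteristics, s.name),
    )


# Hypothetical service inventory.
services = [
    Service("report-export", "compliance-critical", False, 1),
    Service("checkout", "customer-facing", False, 3),
    Service("auth", "dependency", False, 4),
    Service("search", "customer-facing", True, 2),
    Service("payments", "revenue-critical", False, 2),
]
backlog_names = [s.name for s in sli_backlog(services)]
print(backlog_names)  # ['checkout', 'payments', 'auth', 'report-export']
```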

2. Define meaningful SLIs

Choose the right metrics:

  • Start with user journeys: Map critical user paths and identify measurement points
  • Focus on outcomes: Measure what matters to users, not just technical metrics
  • Use existing data: Leverage telemetry you're already collecting
  • Keep it simple: Start with basic availability and latency SLIs

Set appropriate measurement windows:

  • Short windows (1-5 minutes): For real-time services requiring immediate response
  • Medium windows (1-24 hours): For most web applications and APIs
  • Long windows (weekly/monthly): For batch processing or analytical services
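
The choice of window changes the value an SLI reports for the same underlying events. The sketch below evaluates one event stream over three windows. The `(timestamp, is_good)` event shape and the `windowed_sli` helper are assumptions for illustration.

```python
from datetime import datetime, timedelta


def windowed_sli(events, now, window):
    """SLI (percent good) over events with now - window < timestamp <= now.

    events: iterable of (timestamp, is_good) pairs.
    Returns None when the window contains no events.
    """
    start = now - window
    in_window = [good for ts, good in events if start < ts <= now]
    if not in_window:
        return None
    return 100.0 * sum(in_window) / len(in_window)


# Hypothetical event stream.
now = datetime(2025, 1, 1, 12, 0)
events = [
    (now - timedelta(minutes=1), True),
    (now - timedelta(minutes=2), False),
    (now - timedelta(minutes=30), True),
    (now - timedelta(hours=2), True),
]
print(windowed_sli(events, now, timedelta(minutes=5)))           # 50.0
print(round(windowed_sli(events, now, timedelta(hours=1)), 2))   # 66.67
print(windowed_sli(events, now, timedelta(hours=24)))            # 75.0
```

A single failure dominates the short window and fades in the long one, which is why short windows suit services that need an immediate response and long windows suit batch workloads.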

3. Implement SLIs systematically

Use New Relic's SLI features:

  1. Navigate to Service Levels: Access the service levels section in New Relic
  2. Select your service: Choose the entity you want to create an SLI for
  3. Define SLI criteria: Set up the specific metrics and thresholds
  4. Configure alerting: Set up notifications when SLIs are not being met

Best practices for implementation:

  • Start small: Begin with one or two critical services
  • Iterate and improve: Refine SLI definitions based on real-world data
  • Document decisions: Keep records of why specific SLIs were chosen
  • Train your team: Ensure everyone understands how to interpret and act on SLI data

Measuring improvement

Track these metrics to verify your service level coverage improvements:

  • SLI coverage percentage: Aim for 100% coverage of business-critical services
  • SLI relevance: Ensure SLIs correlate with actual user experience and business impact
  • Actionability: Measure how often SLI data leads to meaningful improvements
  • Team adoption: Track how frequently teams reference SLI data in decision-making
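
The coverage percentage can be computed from a simple inventory of how many SLIs each business-critical service has. This is a sketch under the assumption that a service counts as covered once it has at least one SLI. The service names and counts are hypothetical, and the scorecard rule itself may calculate coverage differently.

```python
def sli_coverage(sli_counts: dict[str, int]) -> tuple[float, list[str]]:
    """Return (coverage percentage, sorted names of services with no SLI)."""
    if not sli_counts:
        raise ValueError("no services to evaluate")
    uncovered = sorted(name for name, count in sli_counts.items() if count == 0)
    covered = len(sli_counts) - len(uncovered)
    return 100.0 * covered / len(sli_counts), uncovered


# Hypothetical inventory: service name -> number of SLIs defined.
inventory = {"checkout": 2, "search": 1, "billing": 0, "auth": 0}
coverage, gaps = sli_coverage(inventory)
print(coverage)  # 50.0
print(gaps)      # ['auth', 'billing']
```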

Common scenarios and solutions

Too many services to cover:

  • Problem: Large service portfolios make complete coverage overwhelming
  • Solution: Start with tier-1 services and expand coverage gradually based on business priority

Difficulty defining user-focused metrics:

  • Problem: Internal services don't have obvious user-facing metrics
  • Solution: Treat the services and teams that depend on the internal service as its users, and define SLIs on what they experience, such as the success rate and latency of their requests

Legacy services without modern instrumentation:

  • Problem: Older applications may lack detailed telemetry for meaningful SLIs
  • Solution: Start with basic availability SLIs using synthetic monitoring or log-based metrics
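
A log-based availability SLI can be derived from nothing more than access logs. The sketch below assumes Common Log Format style lines, where the status code follows the quoted request, and counts only 5xx responses as failures. Both are assumptions to adjust for your own log format.

```python
import re

# Matches the 3-digit status code that follows the closing quote of the request.
STATUS_RE = re.compile(r'"\s(\d{3})\s')


def availability_from_logs(lines):
    """Percent of parseable log lines with a non-5xx status, or None if none parse."""
    statuses = [int(m.group(1)) for line in lines if (m := STATUS_RE.search(line))]
    if not statuses:
        return None
    good = sum(1 for s in statuses if s < 500)
    return 100.0 * good / len(statuses)


# Hypothetical log excerpt; the malformed line is skipped.
log_lines = [
    '10.0.0.1 - - [01/Jan/2025:12:00:00 +0000] "GET /orders HTTP/1.1" 200 512',
    '10.0.0.2 - - [01/Jan/2025:12:00:01 +0000] "GET /orders HTTP/1.1" 503 0',
    '10.0.0.3 - - [01/Jan/2025:12:00:02 +0000] "POST /orders HTTP/1.1" 201 64',
    "malformed line",
]
print(round(availability_from_logs(log_lines), 2))  # 66.67
```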

Services with variable performance requirements:

  • Problem: Some services have different performance expectations at different times
  • Solution: Use time-based SLIs or create separate SLIs for different usage patterns

Advanced SLI strategies

Multi-dimensional SLIs

  • Geographic segmentation: Different SLIs for different regions
  • User segmentation: Separate SLIs for different user types (free vs. paid, mobile vs. web)
  • Feature-based: SLIs for specific features or user journeys
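
Segmentation amounts to computing the same good-over-valid ratio once per group. The sketch below groups events by an arbitrary attribute. The event dictionaries and the `region` and `plan` keys are hypothetical.

```python
from collections import defaultdict


def sli_by_segment(events, key):
    """Percent of good events per distinct value of events[i][key]."""
    totals = defaultdict(lambda: [0, 0])  # segment -> [good, valid]
    for event in events:
        counts = totals[event[key]]
        counts[1] += 1
        if event["good"]:
            counts[0] += 1
    return {segment: 100.0 * good / valid for segment, (good, valid) in totals.items()}


# Hypothetical events tagged with region and plan.
events = [
    {"region": "us", "plan": "paid", "good": True},
    {"region": "eu", "plan": "free", "good": True},
    {"region": "us", "plan": "free", "good": True},
    {"region": "eu", "plan": "paid", "good": False},
    {"region": "us", "plan": "paid", "good": True},
]
print(sli_by_segment(events, "region"))  # {'us': 100.0, 'eu': 50.0}
```

An aggregate SLI over these five events would report 80%, hiding the fact that one region is at 50%.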

Composite SLIs

  • End-to-end measurement: SLIs that span multiple services for complete user journeys
  • Weighted averages: Combine multiple metrics based on business importance
  • Dependency-aware: SLIs that account for upstream service health
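
Two of these composites reduce to short formulas. For an end-to-end journey that succeeds only when every step succeeds, and assuming step failures are independent, the journey SLI is the product of the step SLIs. A weighted average instead blends SLIs by business importance. The function names, step values, and weights below are illustrative assumptions.

```python
import math


def end_to_end_sli(step_slis: list[float]) -> float:
    """Journey SLI (percent) for serial steps, assuming independent failures."""
    return 100.0 * math.prod(s / 100.0 for s in step_slis)


def weighted_sli(slis: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted average of SLIs (percent); weights need not sum to 1."""
    total_weight = sum(weights[name] for name in slis)
    return sum(slis[name] * weights[name] for name in slis) / total_weight


# Hypothetical three-step checkout journey.
steps = {"cart": 99.9, "payment": 99.5, "confirmation": 99.0}
weights = {"cart": 3, "payment": 2, "confirmation": 1}

print(round(end_to_end_sli(list(steps.values())), 3))  # 98.406
print(round(weighted_sli(steps, weights), 2))          # 99.62
```

The end-to-end value is lower than any single step, which is why journeys spanning many services need stricter per-service SLIs than the journey-level number suggests.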

Adaptive SLIs

  • Dynamic thresholds: SLIs that adjust based on traffic patterns or seasonal variations
  • Learning systems: SLIs that evolve based on user behavior analysis
  • Context-aware: Different SLI targets for different operational contexts
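
The simplest form of an adaptive SLI is a threshold that depends on operational context. The sketch below applies a looser latency threshold during peak hours than off-peak. The hour range and both thresholds are hypothetical values chosen for illustration.

```python
PEAK_HOURS = range(9, 18)  # 09:00-17:59, hypothetical peak period


def threshold_ms(hour: int) -> float:
    """Latency threshold that applies to a request made in the given hour."""
    return 300.0 if hour in PEAK_HOURS else 200.0


def context_aware_latency_sli(events):
    """Percent of (hour, duration_ms) events within their hour's threshold."""
    events = list(events)
    if not events:
        return None
    good = sum(1 for hour, duration in events if duration <= threshold_ms(hour))
    return 100.0 * good / len(events)


# Hypothetical requests: (hour of day, duration in ms).
requests = [(10, 250.0), (10, 350.0), (2, 250.0), (2, 150.0)]
print(context_aware_latency_sli(requests))  # 50.0
```

The same 250 ms response counts as good at 10:00 and bad at 02:00, so document context-dependent thresholds carefully to keep the SLI interpretable.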

Building a service level management program

Establish governance

  • SLI standards: Create organization-wide standards for SLI definition and measurement
  • Review processes: Regular evaluation of SLI relevance and accuracy
  • Ownership model: Clear responsibility for maintaining and acting on SLIs

Enable team adoption

  • Training programs: Educate teams on SLI concepts and implementation
  • Tools and automation: Provide easy-to-use tools for SLI creation and management
  • Success stories: Share examples of how SLIs have driven improvements

Continuous improvement

  • Regular review cycles: Quarterly or semi-annual SLI assessment and refinement
  • Feedback loops: Mechanisms to capture when SLIs don't reflect real user experience
  • Evolution strategy: Plan for how SLIs will mature as services and business needs change

Important considerations

  • Quality over quantity: Focus on meaningful SLIs rather than maximizing coverage numbers
  • User perspective: Always prioritize what users experience over internal technical metrics
  • Business alignment: Ensure SLIs support business objectives and customer satisfaction goals
  • Actionable insights: SLIs should lead to concrete actions when thresholds aren't met

Next steps

  1. Immediate action: Identify your most critical services and create basic availability SLIs
  2. Expand coverage: Gradually add SLIs for additional services based on business priority
  3. Refine definitions: Improve SLI accuracy based on real-world usage and feedback
  4. Set objectives: Progress to defining Service Level Objectives (SLOs) based on your SLIs
  5. Advance to Level 3: Once SLI coverage is established, focus on service level attainment

For comprehensive guidance on service level management, see our Service Level Management implementation guide.

Copyright © 2025 New Relic Inc.
