Level 2 - Service level coverage scorecard rule

Service level coverage measures whether your critical services have defined Service Level Indicators (SLIs) that track performance from your users' perspective. SLIs help you understand service health, set reliability targets, and make data-driven decisions about improvements.

About this scorecard rule

This service level coverage rule is part of Level 2 (Proactive) in the business uptime maturity model. It evaluates whether your services have SLIs defined. Defined SLIs indicate a proactive approach to reliability management.

Why this matters: SLIs provide objective measurement of service quality from the user's perspective. Without SLIs, teams rely on subjective assessments of service health and may miss performance issues that impact user experience.

How this rule works

This rule examines the latest entity harvest to determine which entities have Service Level Indicators (SLIs) defined. It evaluates all monitored entities that could benefit from service level measurement.
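The following is a minimal sketch of the kind of coverage check described above. It is not the scorecard's actual implementation: the `Entity` class, its `sli_count` field, and the sample service names are hypothetical stand-ins for the entity data the rule examines.

```python
from dataclasses import dataclass


@dataclass
class Entity:
    """Hypothetical stand-in for a monitored entity; not a New Relic type."""
    name: str
    sli_count: int  # number of SLIs defined for this entity (assumed field)


def sli_coverage(entities: list[Entity]) -> tuple[float, list[str]]:
    """Return (coverage percentage, names of entities with no SLI)."""
    if not entities:
        return 0.0, []
    uncovered = [e.name for e in entities if e.sli_count == 0]
    covered = len(entities) - len(uncovered)
    return 100.0 * covered / len(entities), uncovered


# Illustrative data only.
entities = [
    Entity("checkout-api", 2),
    Entity("search-service", 1),
    Entity("legacy-billing", 0),
    Entity("auth-service", 0),
]
coverage, missing = sli_coverage(entities)
print(f"SLI coverage: {coverage:.0f}%")          # SLI coverage: 50%
print("Missing SLIs:", ", ".join(missing))       # legacy-billing, auth-service
```

Listing the uncovered entities alongside the percentage gives you a concrete worklist rather than only a score.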

Understanding your score

Pass (Green): Critical services have SLIs defined to measure user-facing performance
Fail (Red): Critical services lack SLIs, making it difficult to measure service quality objectively
Target: Complete SLI coverage for business-critical and user-facing services

What this means:

Passing score: Your team can measure service reliability from the user perspective and make data-driven improvement decisions
Failing score: You're missing objective measures of service quality, potentially leading to blind spots in service performance

Understanding Service Level Indicators (SLIs)

SLIs are specific metrics that measure service performance from the user's perspective. Good SLIs should be:

User-focused

Measure what users experience: Response times, error rates, availability
Reflect business value: Metrics that directly impact customer satisfaction and business outcomes
Observable and measurable: Based on real telemetry data rather than estimates or assumptions

Common SLI types

Availability SLIs:

Definition: Percentage of requests that result in successful responses
Example: 99.9% of HTTP requests return non-error status codes
Good for: Critical user-facing services, APIs, websites
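As a worked example, an availability SLI is the share of requests that succeeded. The sketch below assumes you already have request and failure counts for the measurement window; the function name and the sample numbers are illustrative.

```python
def availability_sli(total_requests: int, failed_requests: int) -> float:
    """Percentage of requests that returned a successful response."""
    if total_requests <= 0:
        raise ValueError("total_requests must be positive")
    if not 0 <= failed_requests <= total_requests:
        raise ValueError("failed_requests must be between 0 and total_requests")
    return 100.0 * (total_requests - failed_requests) / total_requests


# Illustrative numbers: 800 failures out of 1,000,000 requests.
sli = availability_sli(total_requests=1_000_000, failed_requests=800)
print(f"Availability: {sli:.2f}%")            # Availability: 99.92%
print("Meets 99.9% target:", sli >= 99.9)     # Meets 99.9% target: True
```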

Latency SLIs:

Definition: Percentage of requests completed within acceptable time thresholds
Example: 95% of requests complete within 200ms
Good for: Interactive applications, real-time services, mobile apps
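A latency SLI of this kind counts how many requests finished within the threshold, rather than averaging durations. The sketch below assumes request durations are available as a list of millisecond values; the sample data is made up for illustration.

```python
def latency_sli(durations_ms: list[float], threshold_ms: float = 200.0) -> float:
    """Percentage of requests that completed within threshold_ms."""
    if not durations_ms:
        raise ValueError("at least one request duration is required")
    fast = sum(1 for d in durations_ms if d <= threshold_ms)
    return 100.0 * fast / len(durations_ms)


# Illustrative sample: 8 of 10 requests finish within 200 ms.
samples = [120, 95, 180, 210, 150, 450, 199, 130, 175, 160]
print(f"Latency SLI: {latency_sli(samples):.1f}%")   # Latency SLI: 80.0%
```

Counting requests against a threshold keeps a single very slow request from distorting the result the way it would distort an average.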

Quality SLIs:

Definition: Percentage of outputs that meet quality standards
Example: 99% of search queries return relevant results
Good for: Data processing, content delivery, recommendation systems

Freshness SLIs:

Definition: Percentage of data that meets recency requirements
Example: 95% of dashboard data is less than 5 minutes old
Good for: Analytics platforms, reporting systems, monitoring dashboards
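A freshness SLI can be computed the same way as the other types: count the items that meet the recency requirement and divide by the total. The sketch below assumes each data item carries a last-updated timestamp; the function name and sample values are illustrative.

```python
from datetime import datetime, timedelta, timezone


def freshness_sli(
    last_updated: list[datetime],
    now: datetime,
    max_age: timedelta = timedelta(minutes=5),
) -> float:
    """Percentage of items updated less than max_age ago."""
    if not last_updated:
        raise ValueError("at least one timestamp is required")
    fresh = sum(1 for ts in last_updated if now - ts < max_age)
    return 100.0 * fresh / len(last_updated)


# Illustrative data: four dashboard widgets updated 1, 2, 4, and 7 minutes ago.
now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
timestamps = [now - timedelta(minutes=m) for m in (1, 2, 4, 7)]
print(f"Freshness SLI: {freshness_sli(timestamps, now):.1f}%")   # Freshness SLI: 75.0%
```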

How to implement service level coverage

Follow these steps to establish comprehensive SLI coverage:

1. Identify services requiring SLIs

Prioritize by business impact:

Customer-facing services: Applications that directly serve end users
Revenue-critical systems: Services that impact business revenue if they fail
Dependency services: Internal services that support multiple customer-facing applications
Compliance-critical systems: Services required for regulatory or security compliance

Consider service characteristics:

Complexity: Services with multiple components or dependencies
User expectations: Services where performance directly affects user experience
Business criticality: Services that support core business functions
Change frequency: Services that are frequently updated or modified

2. Define meaningful SLIs

Choose the right metrics:

Start with user journeys: Map critical user paths and identify measurement points
Focus on outcomes: Measure what matters to users, not just technical metrics
Use existing data: Leverage telemetry you're already collecting
Keep it simple: Start with basic availability and latency SLIs

Set appropriate measurement windows:

Short windows (1-5 minutes): For real-time services requiring immediate response
Medium windows (1-24 hours): For most web applications and APIs
Long windows (weekly/monthly): For batch processing or analytical services
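To see how window length changes what an SLI reports, the sketch below keeps per-minute counts of good and total requests and computes availability over only the most recent buckets. The `RollingSli` class and the sample counts are hypothetical and only illustrate the idea.

```python
from collections import deque
from typing import Optional


class RollingSli:
    """Availability over the most recent `window` buckets (for example, minutes)."""

    def __init__(self, window: int) -> None:
        if window <= 0:
            raise ValueError("window must be positive")
        # deque drops the oldest bucket automatically once it is full.
        self._buckets: deque = deque(maxlen=window)

    def record(self, good: int, total: int) -> None:
        self._buckets.append((good, total))

    def value(self) -> Optional[float]:
        total = sum(t for _, t in self._buckets)
        if total == 0:
            return None  # no traffic in the window
        good = sum(g for g, _ in self._buckets)
        return 100.0 * good / total


# Illustrative per-minute (good, total) counts; the last minute degrades sharply.
minutes = [(100, 100), (90, 100), (100, 100), (100, 100), (100, 100), (50, 100)]
short, long_ = RollingSli(window=5), RollingSli(window=60)
for good, total in minutes:
    short.record(good, total)
    long_.record(good, total)

print(short.value())   # 88.0 (last 5 minutes only)
print(long_.value())   # 90.0 (all 6 minutes fit in the 60-minute window)
```

The shorter window reacts more strongly to the most recent degradation, while a longer window smooths it out.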

3. Implement SLIs systematically

Use New Relic's SLI features:

Navigate to Service Levels: Access the service levels section in New Relic
Select your service: Choose the entity you want to create an SLI for
Define SLI criteria: Set up the specific metrics and thresholds
Configure alerting: Set up notifications when SLIs are not being met

Best practices for implementation:

Start small: Begin with one or two critical services
Iterate and improve: Refine SLI definitions based on real-world data
Document decisions: Keep records of why specific SLIs were chosen
Train your team: Ensure everyone understands how to interpret and act on SLI data

Measuring improvement

Track these metrics to verify your service level coverage improvements:

SLI coverage percentage: Aim for 100% coverage of business-critical services
SLI relevance: Ensure SLIs correlate with actual user experience and business impact
Actionability: Measure how often SLI data leads to meaningful improvements
Team adoption: Track how frequently teams reference SLI data in decision-making

Common scenarios and solutions

Too many services to cover:

Problem: Large service portfolios make complete coverage overwhelming
Solution: Start with tier-1 services and expand coverage gradually based on business priority

Difficulty defining user-focused metrics:

Problem: Internal services don't have obvious user-facing metrics
Solution: Treat the services and teams that depend on the internal service as its users, and define SLIs around what they need from it, such as successful responses and acceptable response times

Legacy services without modern instrumentation:

Problem: Older applications may lack detailed telemetry for meaningful SLIs
Solution: Start with basic availability SLIs using synthetic monitoring or log-based metrics

Services with variable performance requirements:

Problem: Some services have different performance expectations at different times
Solution: Use time-based SLIs or create separate SLIs for different usage patterns, such as peak and off-peak hours
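One way to handle variable performance requirements is to compute a separate latency SLI per time period, each with its own threshold. In the sketch below, the peak hours and the thresholds are assumptions chosen for illustration; substitute values that match your service.

```python
from datetime import datetime

PEAK_HOURS = range(9, 18)                            # 09:00-17:59, assumed
THRESHOLDS_MS = {"peak": 200.0, "off_peak": 500.0}   # hypothetical thresholds


def latency_sli_by_period(requests: list[tuple[datetime, float]]) -> dict[str, float]:
    """Return a latency SLI (percent within threshold) for each period with traffic."""
    counts = {"peak": [0, 0], "off_peak": [0, 0]}    # period -> [good, total]
    for timestamp, duration_ms in requests:
        period = "peak" if timestamp.hour in PEAK_HOURS else "off_peak"
        counts[period][1] += 1
        if duration_ms <= THRESHOLDS_MS[period]:
            counts[period][0] += 1
    return {p: 100.0 * good / total for p, (good, total) in counts.items() if total > 0}


# Illustrative requests: (timestamp, duration in ms).
requests = [
    (datetime(2024, 1, 1, 10, 0), 150.0),   # peak, within 200 ms
    (datetime(2024, 1, 1, 11, 0), 250.0),   # peak, too slow
    (datetime(2024, 1, 1, 2, 0), 450.0),    # off-peak, within 500 ms
    (datetime(2024, 1, 1, 3, 0), 300.0),    # off-peak, within 500 ms
]
print(latency_sli_by_period(requests))   # {'peak': 50.0, 'off_peak': 100.0}
```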

Advanced SLI strategies

Multi-dimensional SLIs

Geographic segmentation: Different SLIs for different regions
User segmentation: Separate SLIs for different user types (free vs. paid, mobile vs. web)
Feature-based: SLIs for specific features or user journeys
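The sketch below shows why segmentation can matter: an overall SLI can look acceptable while one segment performs noticeably worse. The region names and request counts are made up, and the input shape (a segment label plus a success flag per request) is an assumption.

```python
from collections import defaultdict
from typing import Iterable


def availability_by_segment(requests: Iterable[tuple[str, bool]]) -> dict[str, float]:
    """Return availability (percent successful) for each segment."""
    totals: dict[str, list[int]] = defaultdict(lambda: [0, 0])   # segment -> [good, total]
    for segment, succeeded in requests:
        totals[segment][1] += 1
        if succeeded:
            totals[segment][0] += 1
    return {seg: 100.0 * good / total for seg, (good, total) in totals.items()}


# Illustrative traffic: one region is healthy, the other is not.
requests = (
    [("us-east", True)] * 99 + [("us-east", False)] * 1
    + [("eu-west", True)] * 90 + [("eu-west", False)] * 10
)
per_region = availability_by_segment(requests)
overall = 100.0 * sum(ok for _, ok in requests) / len(requests)

print(per_region)   # {'us-east': 99.0, 'eu-west': 90.0}
print(overall)      # 94.5
```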

Composite SLIs

End-to-end measurement: SLIs that span multiple services for complete user journeys
Weighted averages: Combine multiple metrics based on business importance
Dependency-aware: SLIs that account for upstream service health
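Two simple ways to build a composite SLI are sketched below: a weighted average across services, and an end-to-end availability for a journey that passes through several services in sequence. The weights and sample values are hypothetical, and the end-to-end calculation assumes failures in each step are independent, which is a simplification.

```python
def weighted_composite(slis: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted average of SLI percentages; weights reflect business importance."""
    total_weight = sum(weights[name] for name in slis)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(slis[name] * weights[name] for name in slis) / total_weight


def end_to_end_availability(step_slis: list[float]) -> float:
    """Availability of a journey that needs every step to succeed.

    Assumes step failures are independent, so success fractions multiply.
    """
    result = 1.0
    for pct in step_slis:
        result *= pct / 100.0
    return 100.0 * result


# Illustrative values only.
slis = {"checkout": 99.0, "search": 98.0, "recommendations": 90.0}
weights = {"checkout": 5, "search": 3, "recommendations": 2}
print(f"{weighted_composite(slis, weights):.1f}")               # 96.9
print(f"{end_to_end_availability([99.9, 99.9, 99.9]):.4f}")     # 99.7003
```

The end-to-end result is lower than any single step, which is why a journey spanning several services can miss a target that each service meets on its own.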

Adaptive SLIs

Dynamic thresholds: SLIs that adjust based on traffic patterns or seasonal variations
Learning systems: SLIs that evolve based on user behavior analysis
Context-aware: Different SLI targets for different operational contexts

Building a service level management program

Establish governance

SLI standards: Create organization-wide standards for SLI definition and measurement
Review processes: Regular evaluation of SLI relevance and accuracy
Ownership model: Clear responsibility for maintaining and acting on SLIs

Enable team adoption

Training programs: Educate teams on SLI concepts and implementation
Tools and automation: Provide easy-to-use tools for SLI creation and management
Success stories: Share examples of how SLIs have driven improvements

Continuous improvement

Regular review cycles: Quarterly or semi-annual SLI assessment and refinement
Feedback loops: Mechanisms to capture when SLIs don't reflect real user experience
Evolution strategy: Plan for how SLIs will mature as services and business needs change

Important considerations

Quality over quantity: Focus on meaningful SLIs rather than maximizing coverage numbers
User perspective: Always prioritize what users experience over internal technical metrics
Business alignment: Ensure SLIs support business objectives and customer satisfaction goals
Actionable insights: SLIs should lead to concrete actions when thresholds aren't met

Next steps

Immediate action: Identify your most critical services and create basic availability SLIs
Expand coverage: Gradually add SLIs for additional services based on business priority
Refine definitions: Improve SLI accuracy based on real-world usage and feedback
Set objectives: Progress to defining Service Level Objectives (SLOs) based on your SLIs
Advance to Level 3: Once SLI coverage is established, focus on service level attainment
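When you move from SLIs to SLOs, a common next calculation is how much failure the objective still allows, often called an error budget. The sketch below is a generic illustration with made-up numbers and a hypothetical function name; it does not describe how New Relic computes this.

```python
def error_budget_remaining(slo_target_pct: float, total_events: int, bad_events: int) -> float:
    """Percentage of the allowed failures that is still unused."""
    allowed_bad = total_events * (100.0 - slo_target_pct) / 100.0
    if allowed_bad <= 0:
        raise ValueError("the SLO must allow at least some failures")
    return 100.0 * (1.0 - bad_events / allowed_bad)


# Illustrative: a 99.9% SLO over 1,000,000 requests allows about 1,000 failures.
remaining = error_budget_remaining(99.9, total_events=1_000_000, bad_events=800)
print(f"Budget remaining: {remaining:.1f}%")          # Budget remaining: 20.0%

# The same 99.9% target expressed as downtime over a 30-day window.
allowed_minutes = 30 * 24 * 60 * (100.0 - 99.9) / 100.0
print(f"Allowed downtime: {allowed_minutes:.1f} min")  # Allowed downtime: 43.2 min
```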

For comprehensive guidance on service level management, see our Service Level Management implementation guide.