
Improve your stack with alerts

As time goes on, your number of alerts will grow. This can lead to problems for your organization if they're not managed correctly. Additionally, your alerts will give you crucial information you can use to improve your system, and if you don't take advantage of that information, you won't be using your alerts to their fullest potential.

Follow the process below to learn how to manage the quality of your alerts and prevent problems like alert fatigue, and how to use your alerts to gather data and drive positive impact for your organization.

A screenshot displaying a view of an AQM dashboard in New Relic

Optimize your alerts

Reducing unnecessary alerts helps ensure the alerts you receive are the most relevant ones. We've created an alert quality management (AQM) dashboard to make that easier. Essentially, you'll install the dashboard, gather information, then make changes based on what you've gathered. We've outlined each step in this process to make it easier to get the results you want from your alerts.

A diagram displaying how to use alerts to improve your system

To get started optimizing your alerts, you need to do the following:

Analyze your KPIs

The dashboard will help you understand how you're doing using four KPIs (key performance indicators):

  • Incident count: alerts with a high number of incidents

  • Accumulated incident time: alerts with high cumulative durations

  • Mean time to close: the average amount of time it takes until incidents are closed

  • Percent under 5 minutes: the percentage of incidents open for less than 5 minutes

    The Alerting Count by Policy pane in the dashboard helps you identify these alert policies and determine any relevant patterns.
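
To make the four KPIs concrete, here is a minimal Python sketch that computes them per policy from a list of closed incidents. The `Incident` record, its field names, and the sample data are assumptions made for illustration; they are not the dashboard's actual data model, and the dashboard calculates these values for you.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Incident:
    # Hypothetical record shape, not New Relic's actual schema.
    policy: str        # name of the alert policy that opened the incident
    opened_at: float   # open time, in seconds
    closed_at: float   # close time, in seconds


def policy_kpis(incidents):
    """Return {policy: KPI dict} for a list of closed incidents."""
    durations = defaultdict(list)
    for inc in incidents:
        durations[inc.policy].append(inc.closed_at - inc.opened_at)

    kpis = {}
    for policy, secs in durations.items():
        count = len(secs)
        total = sum(secs)
        short = sum(1 for s in secs if s < 300)  # open for less than 5 minutes
        kpis[policy] = {
            "incident_count": count,
            "accumulated_seconds": total,
            "mean_time_to_close_seconds": total / count,
            "percent_under_5_minutes": 100.0 * short / count,
        }
    return kpis


# Made-up sample data for illustration only.
sample = [
    Incident("High CPU", 0, 120),
    Incident("High CPU", 1000, 1240),
    Incident("High CPU", 5000, 5900),
    Incident("Disk full", 0, 7200),
]

for policy, kpi in policy_kpis(sample).items():
    print(policy, kpi)
```

In this sample, the "High CPU" policy has many short incidents, which is a typical sign of a noisy alert, while "Disk full" has a single long-running one.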

Establish your baselines

The AQM dashboard gives you a baseline of KPIs that you can use to begin the improvement process. You (and anyone on your team) can review the most active policies from the previous step to reduce alert noise. Ask yourself questions about what the data is telling you and how you can fix the problems it reveals, such as:

  1. Are the alerts telling us something about a resource that needs to be fixed? If so, then fix the problem and see if the alert volume decreases.

  2. Are the alerts telling us about something that actually requires an immediate response? If not, then adjust or disable the policy.

  3. Are the policy thresholds set properly? If not, then consider adjusting the thresholds.
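
The three questions above can be read as a simple checklist. The sketch below encodes them as a hypothetical `triage` helper; the function name, its parameters, and the "no change needed" fallback are assumptions for illustration, not part of any New Relic API.

```python
from typing import List


def triage(resource_needs_fix: bool,
           needs_immediate_response: bool,
           thresholds_ok: bool) -> List[str]:
    """Map answers to the three review questions to suggested actions."""
    actions = []
    if resource_needs_fix:
        actions.append("fix the resource, then see if the alert volume decreases")
    if not needs_immediate_response:
        actions.append("adjust or disable the policy")
    if not thresholds_ok:
        actions.append("consider adjusting the thresholds")
    # Assumed fallback: the source doesn't name an action for this case.
    return actions or ["no change needed"]


# Example: a healthy resource, but the alert isn't urgent.
print(triage(resource_needs_fix=False,
             needs_immediate_response=False,
             thresholds_ok=True))
```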

    After establishing your baselines, handle incident alerts using the following guidelines:

  • If you look at an alert and decide to take any sort of further investigative action, acknowledge the alert.
  • If you typically close an alert without doing anything else, don't acknowledge the alert.
  • If the incident alert is always on, don't close or acknowledge it.

Gather your data

It takes some time for alert data to accumulate in the dashboard. You should wait at least two weeks to gather this data, but check regularly to ensure that the incident responders for your alerts are following the guidelines outlined in the previous step.

Check your data against your baselines

After two weeks, you should have enough data to analyze and begin your alert improvement process. To improve your system using the alert data, follow the steps below:

  1. Analyze the week-over-week trends in your KPIs. Find the areas that may need fixing so you can begin looking for ways to improve them.
  2. Use the data to map the current quality of your alerts. You can identify areas where improvement has positively impacted the business and areas where problems have resulted in negative outcomes.
  3. Use the dashboard to identify the noisiest incident policies.
  4. Review the policies identified in the previous step. For each policy, try to determine whether the alert is relevant, whether it is properly configured, and what it tells you about problems that you may need to address.
  5. Identify what areas you can work on to improve the policies you reviewed. This should be a technical analysis, and should end with recommendations on how to fix problems in your system that trigger the alert, how to tune policies that need improvement, or how to fix any gaps in your instrumentation.
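
As a rough illustration of steps 1 and 3, the sketch below computes the week-over-week change in incident count per policy and ranks the noisiest policies. The function names, policy names, and counts are made up for this example; in practice you would read these numbers from the dashboard.

```python
def week_over_week(previous: dict, current: dict) -> dict:
    """Percent change in incident count per policy between two weeks."""
    changes = {}
    for policy in sorted(set(previous) | set(current)):
        before = previous.get(policy, 0)
        after = current.get(policy, 0)
        # Percent change is undefined without a prior-week count, so use None.
        changes[policy] = None if before == 0 else 100.0 * (after - before) / before
    return changes


def noisiest(counts: dict, top_n: int = 3) -> list:
    """Return the top_n (policy, incident count) pairs, highest count first."""
    return sorted(counts.items(), key=lambda kv: kv[1], reverse=True)[:top_n]


# Hypothetical incident counts per policy for two consecutive weeks.
last_week = {"High CPU": 40, "Disk full": 5, "Error rate": 10}
this_week = {"High CPU": 30, "Disk full": 5, "Error rate": 15, "Latency": 2}

print(week_over_week(last_week, this_week))
print(noisiest(this_week, top_n=2))
```

A policy that is both near the top of the ranking and trending upward is a reasonable first candidate for the review in step 4.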

After completing the procedure above, you're well on your way to using your alerts to improve your system and provide a positive impact to your organization. This is only the beginning, though: there are many more possibilities for using alerts than what we've covered here. For more detailed information on alert quality and KPIs, see our Alert quality management docs.


Copyright © 2024 New Relic株式会社。
