preview
We're still working on this feature, but we'd love for you to try it out!
This feature is currently provided as part of a preview program pursuant to our pre-release policies.
This document covers the steps to troubleshoot common issues when installing or operating Agent Control.
Enable debug logging
To diagnose errors during the installation process, you can increase the log level for Agent Control by adding the following setting in your values-newrelic.yaml
file:
agent-control-deployment: config: agentControl: content: log: level: trace
- Default log level:
info
. - Other supported log levels:
debug
andtrace
. - OTel collector logs: To enable debug logs in the OpenTelemetry collector, add
verboseLog: true
.
To inspect the Agent Control logs, run the following command, replacing agent-control-***
with the name of your Agent Control pod:
$kubectl logs agent-control-*** -n agent-control
Status endpoint
Agent Control exposes a local status endpoint you can use to check the health of Agent Control and its managed agents. This endpoint is enabled by default on port 52100
. Follow these steps to query the cluster status:
Forward a local port to the main agent-control pod:
$kubectl port-forward <pod-name> 51200:51200
Request the agent status:
$curl localhost:51200/status
Helm release failure
Agent Control requires a valid authentication credential to securely connect to Fleet Control. Initially, this credential is automatically generated through the Agent Control installation UI and is represented by the identityClientId
and identityClientSecret
fields in the values file.
For security reasons, the credential necessary for installing Agent Control expires after 12 hours. If the installation fails with the error message below, follow these troubleshooting steps:
$Error: UPGRADE FAILED: pre-upgrade hooks failed: job failed: BackoffLimitExceeded
Check the logs of the Kubernetes job responsible for setting up the Agent Control system identity.
First, identify the job’s pods:
$kubectl describe job agent-control-system-identity -n <your-namespace>
In the Events section, look for entries for the specific pods, as follows:
$Events:$ Type Reason Age From Message$ ---- ------ ---- ---- -------$ Normal SuccessfulCreate 88s job-controller Created pod: agent-control-system-identity-installer-jr6cg$ Normal SuccessfulCreate 73s job-controller Created pod: agent-control-system-identity-installer-wnx2v$ Normal SuccessfulCreate 50s job-controller Created pod: agent-control-system-identity-installer-8zsqd$ Normal SuccessfulCreate 7s job-controller Created pod: agent-control-system-identity-installer-btqh7$ Warning BackoffLimitExceeded 1s job-controller Job has reached the specified backoff limit
View the logs of the failing pods:
$kubectl logs <pod-name> -n <your-namespace>
Example:
$kubectl logs agent-control-system-identity-installer-btqh7 -n newrelic
After reviewing the logs, retry the installation using Helm while watching for specific error messages and checking the logs for potential problems. Below are some known issues and how to interpret them:
- Invalid identityClientId:
Error getting system identity auth token. The API endpoint returned 404: Failed to find Identity: <identityClientId-value>
- Invalid identityClientSecret:
Error getting system identity auth token. The API endpoint returned 400: Bad client secret.
- Identity expired:
Error getting system identity auth token. The API endpoint returned 400: Expired client secret.
- Missing required permissions:
Failed to create a New Relic System Identity for Fleet Control communication authentication. Please verify that your User Key is valid and that your Account Organization has the necessary permissions to create a System Identity: Exception while fetching data (/create) : Not authorized to perform this action or the entity is not found.
Invalid New Relic license
If you see an error message like the one below in the logs of the OpenTelemetry collector deployment pod, it may indicate an invalid New Relic license key. This prevents the collector from being able to export telemetry data to New Relic:
$2024-06-13T13:46:05.898Z error exporterhelper/retry_sender.go:126 Exporting failed. The error is not retryable. Dropping data. {"kind": "exporter", "data_type": "metrics", "name": "otlphttp/newrelic", "error": "Permanent error: error exporting items, request to https://otlp.nr-dat ││ go.opentelemetry.io/collector/exporter/exporterhelper.(*retrySender).send
Solution
Confirm that you're using a valid New Relic license key in your configuration.
OTel collector pods not created
If no OTel collector pods are being created, there may be an issue with the HelmRelease Custom Resource Definition (CRD).
Check the status of the Helm release:
$kubectl get helmrelease open-telemetry -n agent-control
A successful and healthy release should show READY: True
and STATUS: InstallSucceeded
.
If the release failed, the STATUS
and READY
fields will indicate the problem. Depending on the type of error, the root problem might not be fully reflected in the status field. To get more details, use kubectl
to describe the HelmRelease resource:
$kubectl describe helmrelease open-telemetry -n agent-control
Additional support
If you encounter issues not covered in this section, contact New Relic support for further assistance.