Observability and Monitoring in llm-d

Please join SIG-Observability to contribute to monitoring and observability topics within llm-d.

Enable Metrics Collection in llm-d Deployments

Prometheus HTTPS/TLS Support

By default, Prometheus is installed with HTTP-only access. For production environments or when integrating with autoscalers that require HTTPS, you can enable TLS:

# Install Prometheus with HTTPS/TLS enabled
./scripts/install-prometheus-grafana.sh --enable-tls

This will:

  1. Generate self-signed TLS certificates valid for 10 years
  2. Create Kubernetes secrets with the certificates
  3. Configure Prometheus to serve its API over HTTPS
  4. Update Grafana datasource to use HTTPS

Accessing Prometheus with TLS:

  • Internal cluster access: https://llmd-kube-prometheus-stack-prometheus.llm-d-monitoring.svc.cluster.local:9090
  • Port-forward access: kubectl port-forward -n llm-d-monitoring svc/llmd-kube-prometheus-stack-prometheus 9090:9090 then access via https://localhost:9090

For clients that need the CA certificate:

kubectl get configmap prometheus-web-tls-ca -n llm-d-monitoring -o jsonpath='{.data.ca\.crt}' > prometheus-ca.crt
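
As a rough sketch of how a client might use the extracted CA file, the helper below runs an instant query against the `/api/v1/query` endpoint with `curl --cacert`. The function names `prom_query_url` and `prom_query` are illustrative and not part of llm-d, and the default CA path assumes `prometheus-ca.crt` sits in the current directory. Verification only succeeds if the certificate lists the hostname you connect to; whether `localhost` is included is not stated here, so a hostname mismatch over the port-forward is possible.

```shell
# Build the instant-query URL from a Prometheus base URL (trailing slash tolerated).
prom_query_url() {
  printf '%s/api/v1/query\n' "${1%/}"
}

# Run an instant PromQL query, verifying the server certificate against the CA file.
# Arguments: BASE_URL PROMQL [CA_FILE]  (CA_FILE defaults to ./prometheus-ca.crt)
prom_query() {
  curl --silent --show-error --fail \
    --cacert "${3:-prometheus-ca.crt}" \
    --get --data-urlencode "query=$2" \
    "$(prom_query_url "$1")"
}

# Example, with the port-forward running in another terminal:
#   prom_query https://localhost:9090 'up'
```

Using `--cacert` rather than `--insecure` keeps certificate verification on, which is usually what a production client wants.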

Certificate Management:

  • Certificates are stored in the prometheus-web-tls secret
  • CA certificate is also available in the prometheus-web-tls-ca ConfigMap for client use
  • To regenerate certificates: delete the secret and run the installation script again with --enable-tls

Platform-Specific

Helmfile Integration

All llm-d guides have monitoring enabled by default and support multiple monitoring stacks depending on the environment. We provide out-of-the-box monitoring configurations for scraping Endpoint Picker (EPP) metrics and vLLM metrics.

See the vLLM Metrics and EPP Metrics sections below for how to further configure or disable monitoring.

vLLM Metrics

vLLM metrics collection is enabled by default with:

# In your ms-*/values.yaml files
decode:
  monitoring:
    podmonitor:
      enabled: true

prefill:
  monitoring:
    podmonitor:
      enabled: true

Upon installation, view prefill and/or decode podmonitors with:

kubectl get podmonitors -n my-llm-d-namespace

The vLLM metrics from prefill and decode pods will be visible from the Prometheus and/or Grafana user interface.
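
One way to confirm that vLLM series are arriving is to query a known vLLM gauge per pod. The sketch below only builds the PromQL string. `vllm_per_pod_query` is a hypothetical helper, and it assumes the `namespace` and `pod` labels that the Prometheus Operator normally attaches to PodMonitor targets. `vllm:num_requests_waiting` serves as the example metric; the exact set of metric names depends on the vLLM version.

```shell
# Print a PromQL expression that sums a vLLM metric per pod within one namespace.
# Arguments: METRIC NAMESPACE
vllm_per_pod_query() {
  printf 'sum by (pod) (%s{namespace="%s"})\n' "$1" "$2"
}

# Example:
#   vllm_per_pod_query vllm:num_requests_waiting my-llm-d-namespace
# prints:
#   sum by (pod) (vllm:num_requests_waiting{namespace="my-llm-d-namespace"})
```

The resulting expression can be pasted into the Prometheus or Grafana query editor. An empty result suggests that the PodMonitor is not matching the pods or that the targets are not being scraped yet.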

EPP (Endpoint Picker) Metrics

EPP provides additional metrics for request routing, scheduling latency, and plugin performance. EPP metrics collection is enabled by default with:

  • For self-installed Prometheus,

    # In your gaie-*/values.yaml files
    inferenceExtension:
      monitoring:
        prometheus:
          enabled: true

    Upon installation, view EPP servicemonitors with:

    kubectl get servicemonitors -n my-llm-d-namespace
  • For GKE managed Prometheus,

    # In your gaie-*/values.yaml files
    inferenceExtension:
      monitoring:
        gke:
          enabled: true

EPP metrics include request rates, error rates, scheduling latency, and plugin processing times, providing insights into the inference routing and scheduling performance.
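
To see which EPP metrics a given installation actually exposes, one option is to list metric names through the Prometheus `/api/v1/label/__name__/values` endpoint and filter by prefix. The sketch below is a rough text filter rather than a real JSON parser. `filter_metric_names` is a hypothetical helper, and the `inference_` prefix in the usage comment is an assumption about how EPP metrics are named in your version.

```shell
# Read a /api/v1/label/__name__/values JSON response on stdin and print
# the quoted strings that start with PREFIX, one per line.
# Arguments: PREFIX
filter_metric_names() {
  grep -o '"[^"]*"' | tr -d '"' | grep -- "^$1"
}

# Example, assuming the CA file and port-forward described earlier:
#   curl --silent --cacert prometheus-ca.crt \
#     https://localhost:9090/api/v1/label/__name__/values \
#     | filter_metric_names inference_
```

Because the filter matches any quoted string in the response, pick a prefix that cannot collide with envelope fields such as `status` or `data`.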

Dashboards

Grafana dashboard raw JSON files can be imported manually into a Grafana UI. Here is a current list of community dashboards:

PromQL Query Examples

For specific PromQL queries to monitor llm-d deployments, see:

Load Testing and Error Generation

To populate metrics (especially error metrics) for testing and monitoring validation:

Troubleshooting

Autoscaler "http: server gave HTTP response to HTTPS client" Error

If your autoscaler is configured to connect to Prometheus via HTTPS but Prometheus is serving HTTP, you'll see this error:

Post "https://llmd-kube-prometheus-stack-prometheus.llm-d-monitoring.svc.cluster.local:9090/api/v1/query":
http: server gave HTTP response to HTTPS client
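
Before changing the installation, it can help to confirm which scheme Prometheus is actually answering on. The sketch below probes the standard Prometheus `/-/ready` endpoint over HTTPS first and then over HTTP. `probe_prometheus_scheme` is a hypothetical helper name, and the check assumes `curl` is available wherever you run it (for example, against a local port-forward).

```shell
# Report which scheme a Prometheus endpoint answers on: https, http, or unreachable.
# Arguments: HOST:PORT
probe_prometheus_scheme() {
  # --insecure: only the TLS handshake matters here, not who signed the certificate.
  if curl --silent --insecure --noproxy '*' --max-time 5 \
      --output /dev/null "https://$1/-/ready"; then
    echo https
  elif curl --silent --noproxy '*' --max-time 5 \
      --output /dev/null "http://$1/-/ready"; then
    echo http
  else
    echo unreachable
  fi
}

# Example, with a port-forward to the Prometheus service running:
#   probe_prometheus_scheme localhost:9090
```

A result of `http` while the autoscaler is configured with an `https://` URL matches the error shown above.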

Solution: Enable TLS on your Prometheus installation:

# Reinstall with TLS enabled
./scripts/install-prometheus-grafana.sh --uninstall
./scripts/install-prometheus-grafana.sh --enable-tls

Or manually generate certificates and upgrade:

# Generate certificates
./scripts/generate-prometheus-tls-certs.sh

# Upgrade existing installation
helm upgrade llmd prometheus-community/kube-prometheus-stack \
  -n llm-d-monitoring \
  -f /tmp/prometheus-values-with-tls.yaml

After enabling TLS, ensure your autoscaler:

  1. Uses https:// instead of http:// in the Prometheus URL
  2. Has access to the CA certificate (available in the prometheus-web-tls-ca ConfigMap)
  3. Either verifies the server certificate against that CA or is explicitly configured to skip TLS verification

Documentation Version

This documentation corresponds to llm-d v0.4.0, the latest public release. For the most current development changes, see this file on main.

Source: docs/monitoring/README.md