
Health Checks

Use these endpoints to run health checks against all LLMs defined in your config.yaml.

When to Use Each Endpoint

| Endpoint | Use Case | Purpose |
|----------|----------|---------|
| /health/liveliness | Container liveness probes | Basic alive check - use for container restart decisions |
| /health/readiness | Load balancer health checks | Ready to accept traffic - includes DB connection status |
| /health | Model health monitoring | Comprehensive LLM model health - makes actual API calls |
| /health/services | Service debugging | Check specific integrations (datadog, langfuse, etc.) |
| /health/shared-status | Multi-pod coordination | Monitor shared health check state across pods |

Summary​

The proxy exposes:

  • a /health endpoint which returns the health of the LLM APIs
  • a /health/readiness endpoint for returning if the proxy is ready to accept requests
  • a /health/liveliness endpoint for returning if the proxy is alive
  • a /health/shared-status endpoint for monitoring shared health check coordination across pods

Shared Health Check State​

When running multiple LiteLLM proxy pods, you can enable shared health check state to coordinate health checks across pods and avoid duplicate API calls. This is especially beneficial for expensive models like Gemini 2.5-pro.

Key Benefits:

  • Reduces duplicate health checks across pods
  • Saves costs on expensive model API calls
  • Reduces monitoring noise and logging
  • Improves resource efficiency

Requirements:

  • Redis for shared state coordination
  • Background health checks enabled
  • Multiple proxy pods

For detailed configuration and usage, see Shared Health Check State.
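As a rough sketch of what such a setup involves (the Redis keys below follow LiteLLM's standard cache settings; confirm the exact configuration against the Shared Health Check State page):

```yaml
# Illustrative sketch only - verify keys against the Shared Health Check State docs.
general_settings:
  background_health_checks: true # required for shared health check state
  health_check_interval: 300     # seconds between health check cycles

# Redis provides the shared state that coordinates health checks across pods
litellm_settings:
  cache: true
  cache_params:
    type: redis
    host: os.environ/REDIS_HOST
    port: os.environ/REDIS_PORT
    password: os.environ/REDIS_PASSWORD
```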

/health​

Request​

Make a GET Request to /health on the proxy

info

This endpoint makes an LLM API call to each model to check if it is healthy.

curl --location 'http://0.0.0.0:4000/health' -H "Authorization: Bearer sk-1234"

You can also run litellm --health, which makes a GET request to http://0.0.0.0:4000/health for you:

litellm --health

Response​

{
    "healthy_endpoints": [
        {
            "model": "azure/gpt-35-turbo",
            "api_base": "https://my-endpoint-canada-berri992.openai.azure.com/"
        },
        {
            "model": "azure/gpt-35-turbo",
            "api_base": "https://my-endpoint-europe-berri-992.openai.azure.com/"
        }
    ],
    "unhealthy_endpoints": [
        {
            "model": "azure/gpt-35-turbo",
            "api_base": "https://openai-france-1234.openai.azure.com/"
        }
    ]
}
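Since the response is plain JSON, you can script alerting around it. A minimal sketch using jq (assumes the proxy listens on 0.0.0.0:4000 with master key sk-1234):

```shell
# Count unhealthy endpoints from /health and alert when any are found.
# Falls back to '{}' if the proxy is unreachable, so the script degrades quietly.
response=$(curl -s 'http://0.0.0.0:4000/health' \
  -H 'Authorization: Bearer sk-1234' || echo '{}')
unhealthy=$(echo "$response" | jq '.unhealthy_endpoints // [] | length')
if [ "$unhealthy" -gt 0 ]; then
  echo "ALERT: $unhealthy unhealthy endpoint(s)"
fi
```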

Embedding Models​

To run embedding health checks, specify the mode as "embedding" in your config for the relevant model.

model_list:
  - model_name: azure-embedding-model
    litellm_params:
      model: azure/azure-embedding-model
      api_base: os.environ/AZURE_API_BASE
      api_key: os.environ/AZURE_API_KEY
      api_version: "2023-07-01-preview"
    model_info:
      mode: embedding # 👈 ADD THIS

Image Generation Models​

To run image generation health checks, specify the mode as "image_generation" in your config for the relevant model.

model_list:
  - model_name: dall-e-3
    litellm_params:
      model: azure/dall-e-3
      api_base: os.environ/AZURE_API_BASE
      api_key: os.environ/AZURE_API_KEY
      api_version: "2023-07-01-preview"
    model_info:
      mode: image_generation # 👈 ADD THIS

Custom Health Check Prompt​

By default, health checks use the prompt "test from litellm". You can customize this prompt globally by setting an environment variable, or per-model via config:

DEFAULT_HEALTH_CHECK_PROMPT="this is a test prompt"
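For a per-model override, the prompt can be set in the model's model_info. The key name below is an assumption for illustration - verify it against your LiteLLM version's model_info reference before relying on it:

```yaml
model_list:
  - model_name: openai/gpt-4o
    litellm_params:
      model: openai/gpt-4o
      api_key: os.environ/OPENAI_API_KEY
    model_info:
      health_check_prompt: "this is a test prompt" # assumed key name - verify for your version
```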

Text Completion Models​

To run /completions health checks, specify the mode as "completion" in your config for the relevant model.

model_list:
  - model_name: azure-text-completion
    litellm_params:
      model: azure/text-davinci-003
      api_base: os.environ/AZURE_API_BASE
      api_key: os.environ/AZURE_API_KEY
      api_version: "2023-07-01-preview"
    model_info:
      mode: completion # 👈 ADD THIS

Speech to Text Models

To run speech to text health checks, specify the mode as "audio_transcription" in your config for the relevant model.

model_list:
  - model_name: whisper
    litellm_params:
      model: whisper-1
      api_key: os.environ/OPENAI_API_KEY
    model_info:
      mode: audio_transcription

Text to Speech Models

To run text to speech health checks, specify the mode as "audio_speech" in your config for the relevant model.

# OpenAI Text to Speech Models
model_list:
  - model_name: tts
    litellm_params:
      model: openai/tts-1
      api_key: "os.environ/OPENAI_API_KEY"
    model_info:
      mode: audio_speech
      health_check_voice: alloy

You can specify a health_check_voice if you need to use a voice other than "alloy".

Rerank Models​

To run rerank health checks, specify the mode as "rerank" in your config for the relevant model.

model_list:
  - model_name: rerank-english-v3.0
    litellm_params:
      model: cohere/rerank-english-v3.0
      api_key: os.environ/COHERE_API_KEY
    model_info:
      mode: rerank

Batch Models (Azure Only)​

For Azure models deployed as 'batch' models, set mode: batch.

model_list:
  - model_name: "batch-gpt-4o-mini"
    litellm_params:
      model: "azure/batch-gpt-4o-mini"
      api_key: os.environ/AZURE_API_KEY
      api_base: os.environ/AZURE_API_BASE
    model_info:
      mode: batch

Expected Response

{
    "healthy_endpoints": [
        {
            "api_base": "https://...",
            "model": "azure/gpt-4o-mini",
            "x-ms-region": "East US"
        }
    ],
    "unhealthy_endpoints": [],
    "healthy_count": 1,
    "unhealthy_count": 0
}

Realtime Models​

To run realtime health checks, specify the mode as "realtime" in your config for the relevant model.

model_list:
  - model_name: openai/gpt-4o-realtime-audio
    litellm_params:
      model: openai/gpt-4o-realtime-audio
      api_key: os.environ/OPENAI_API_KEY
    model_info:
      mode: realtime

OCR Models​

To run OCR health checks, specify the mode as "ocr" in your config for the relevant model.

model_list:
  - model_name: mistral/mistral-ocr-latest
    litellm_params:
      model: mistral/mistral-ocr-latest
      api_key: os.environ/MISTRAL_API_KEY
    model_info:
      mode: ocr

Wildcard Routes​

For wildcard routes, you can specify a health_check_model in your config.yaml. This model will be used for health checks for that wildcard route.

In this example, when running a health check for openai/*, the health check will make a /chat/completions request to openai/gpt-4o-mini.

model_list:
  - model_name: openai/*
    litellm_params:
      model: openai/*
      api_key: os.environ/OPENAI_API_KEY
    model_info:
      health_check_model: openai/gpt-4o-mini
  - model_name: anthropic/*
    litellm_params:
      model: anthropic/*
      api_key: os.environ/ANTHROPIC_API_KEY
    model_info:
      health_check_model: anthropic/claude-3-5-sonnet-20240620

Background Health Checks​

You can run model health checks in the background, so that /health serves cached results instead of querying every model on each request.

info

This makes an LLM API call to each model to check if it is healthy.

Here's how to use it:

1. In the config.yaml, add:

   general_settings:
     background_health_checks: True # enable background health checks
     health_check_interval: 300 # frequency of background health checks

2. Start the server:

   $ litellm --config /path/to/config.yaml

3. Query the health endpoint:

   curl --location 'http://0.0.0.0:4000/health'

Disable Background Health Checks For Specific Models​

Use this if you want to disable background health checks for specific models.

If background_health_checks is enabled you can skip individual models by setting disable_background_health_check: true in the model's model_info.

model_list:
  - model_name: openai/gpt-4o
    litellm_params:
      model: openai/gpt-4o
      api_key: os.environ/OPENAI_API_KEY
    model_info:
      disable_background_health_check: true

Hide details​

The health check response contains details like endpoint URLs, error messages, and other LiteLLM params. While this is useful for debugging, it can be problematic when exposing the proxy server to a broad audience.

You can hide these details by setting the health_check_details setting to False.

general_settings:
  health_check_details: False

Health Check Driven Routing​

By default, background health checks are observability-only — they populate the /health endpoint but don't affect routing. Unhealthy deployments still receive traffic until request failures trigger cooldown.

With enable_health_check_routing: true, the router excludes deployments that failed their last background health check before selecting a candidate. This gives you proactive failover instead of reactive cooldown.

How it works​

  1. Background health checks run on their configured interval
  2. After each cycle, every deployment is marked healthy or unhealthy
  3. On each incoming request, the router filters out unhealthy deployments before cooldown filtering and load balancing
  4. If all deployments are unhealthy, the filter is bypassed (safety net — never causes a total outage)
  5. If health state is stale (older than health_check_staleness_threshold), it is ignored

Quick start​

model_list:
  - model_name: gpt-4
    litellm_params:
      model: openai/gpt-4
      api_key: os.environ/OPENAI_API_KEY
  - model_name: gpt-4
    litellm_params:
      model: openai/gpt-4
      api_key: os.environ/OPENAI_API_KEY_SECONDARY

general_settings:
  background_health_checks: true
  health_check_interval: 60
  enable_health_check_routing: true

Configuration

| Setting | Where | Default | Description |
|---------|-------|---------|-------------|
| enable_health_check_routing | general_settings | false | Enable/disable health-check-driven routing |
| health_check_staleness_threshold | general_settings | health_check_interval * 2 | Seconds before health state is considered stale and ignored |
| background_health_checks | general_settings | false | Must be true for health check routing to work |
| health_check_interval | general_settings | 300 | Seconds between health check cycles |

Interaction with cooldown​

Health check filtering and cooldown are additive. A deployment can be excluded by either mechanism:

  • Health check filter — proactive, runs on the configured interval, excludes deployments that failed the last check
  • Cooldown — reactive, triggered by request failures, excludes deployments for a short TTL

This means request failures still provide fast detection between health check intervals.

Staleness​

If a health check result is older than health_check_staleness_threshold, it is ignored and the deployment is treated as eligible. This prevents stale data from permanently excluding a deployment if the health check loop stops or slows down.

The default staleness threshold is health_check_interval * 2. For a 60s interval, health state expires after 120s.

Example: custom staleness​

general_settings:
  background_health_checks: true
  health_check_interval: 30
  enable_health_check_routing: true
  health_check_staleness_threshold: 90 # ignore health state older than 90s

Debugging​

Run the proxy with --detailed_debug and look for:

health_check_routing_state_updated healthy=3 unhealthy=1

This is logged after each health check cycle when routing state is written.

If the safety net triggers (all deployments unhealthy), you'll see:

All deployments marked unhealthy by health checks, bypassing health filter

Health Check Timeout​

The health check timeout is set in litellm/constants.py and defaults to 60 seconds.

This can be overridden in the config.yaml by setting health_check_timeout in the model_info section.

model_list:
  - model_name: openai/gpt-4o
    litellm_params:
      model: openai/gpt-4o
      api_key: os.environ/OPENAI_API_KEY
    model_info:
      health_check_timeout: 10 # 👈 OVERRIDE HEALTH CHECK TIMEOUT

Health Check Max Tokens​

By default, health checks use max_tokens=1 to minimize cost and latency. For wildcard models, the default is max_tokens=10.

You can override this per-model by setting health_check_max_tokens in the model_info section of your config.yaml.

model_list:
  - model_name: openai/gpt-4o
    litellm_params:
      model: openai/gpt-4o
      api_key: os.environ/OPENAI_API_KEY
    model_info:
      health_check_max_tokens: 5 # 👈 OVERRIDE HEALTH CHECK MAX TOKENS

/health/readiness​

Unprotected endpoint for checking if proxy is ready to accept requests

Example Request:

curl http://0.0.0.0:4000/health/readiness

Example Response:

{
    "status": "connected",
    "db": "connected",
    "cache": null,
    "litellm_version": "1.40.21",
    "success_callbacks": [
        "langfuse",
        "_PROXY_track_cost_callback",
        "response_taking_too_long_callback",
        "_PROXY_MaxParallelRequestsHandler",
        "_PROXY_MaxBudgetLimiter",
        "_PROXY_CacheControlCheck",
        "ServiceLogging"
    ],
    "last_updated": "2024-07-10T18:59:10.616968"
}

If the proxy is not connected to a database, then the "db" field will be "Not connected" instead of "connected" and the "last_updated" field will not be present.

/health/liveliness​

Unprotected endpoint for checking if proxy is alive

Example Request:

curl -X 'GET' \
  'http://0.0.0.0:4000/health/liveliness' \
  -H 'accept: application/json'

Example Response:

"I'm alive!"

/health/services​

Use this admin-only endpoint to check if a connected service (datadog/slack/langfuse/etc.) is healthy.

curl -L -X GET 'http://0.0.0.0:4000/health/services?service=datadog' \
  -H 'Authorization: Bearer sk-1234'


Advanced - Call specific models​

To check health of specific models, here's how to call them:

1. Get model id via /model/info​

curl -X GET 'http://0.0.0.0:4000/v1/model/info' \
  --header 'Authorization: Bearer sk-1234'

Expected Response

{
    "model_name": "bedrock-anthropic-claude-3",
    "litellm_params": {
        "model": "anthropic.claude-3-sonnet-20240229-v1:0"
    },
    "model_info": {
        "id": "634b87c444.." # 👈 UNIQUE MODEL ID
    }
}

2. Call specific model via /chat/completions​

curl -X POST 'http://localhost:4000/chat/completions' \
  -H 'Content-Type: application/json' \
  -H 'Authorization: Bearer sk-1234' \
  -d '{
    "model": "634b87c444..", # 👈 UNIQUE MODEL ID
    "messages": [
      {
        "role": "user",
        "content": "ping"
      }
    ]
  }'
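The two steps can be combined in a small script. This is a sketch that assumes jq is installed and simply picks the first deployment returned by /v1/model/info (adjust the jq path to target a specific model):

```shell
#!/bin/sh
# Sketch: look up a deployment's unique model id, then send it a one-word ping.
PROXY="${PROXY:-http://localhost:4000}"
KEY="${KEY:-sk-1234}"

# 1. Extract the first deployment's unique model id
id=$(curl -s "$PROXY/v1/model/info" -H "Authorization: Bearer $KEY" \
  | jq -r '.data[0].model_info.id' 2>/dev/null)

# 2. Call that exact deployment via /chat/completions
curl -s "$PROXY/chat/completions" \
  -H 'Content-Type: application/json' \
  -H "Authorization: Bearer $KEY" \
  -d "{\"model\": \"$id\", \"messages\": [{\"role\": \"user\", \"content\": \"ping\"}]}" \
  || echo "proxy not reachable"
```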