AI Development

Health Check Endpoints for AI Agents: Best Practices for /health

Clarvia Team
Feb 8, 2026
7 min read

Dead services waste AI agents' time. Your API might have flawless documentation, a well-structured agent card, and a comprehensive service catalog -- but an agent that hits a downed endpoint burns tokens, breaks chains, and moves on to your competitor. The fix is embarrassingly simple: a single URL at /health that proves you are alive and ready to work.

As autonomous agents increasingly orchestrate multi-service workflows -- routing traffic, chaining API calls, making split-second decisions about which service to invoke -- a reliable health endpoint is not a nice-to-have. It is the heartbeat of your AI discoverability.

Why AI Agents Depend on Health Checks

Here is what most teams get wrong: they think health checks are for DevOps dashboards. They were. An engineer would glance at Grafana, a load balancer would pull an unhealthy node out of rotation, and the feedback loop ran through humans who could tolerate a 30-second delay.

AI agents operate on a different clock. An orchestrator running a multi-agent framework makes routing decisions in under 100 milliseconds. No tickets. No Slack threads. No waiting. If your payment processing agent is unresponsive, traffic reroutes to a backup before a human even notices the blip.

That speed creates a brutal filter. Services without health endpoints do not get a second chance -- they get skipped.

Here is what /health unlocks in the AI agent ecosystem:

  • Service discovery validation -- An agent finds your service through an A2A agent card, then pings /health to confirm you are actually reachable before adding you to its working set. No healthy response, no inclusion.
  • Routing decisions -- Orchestrators check health status before every downstream call. The fastest healthy service wins the request.
  • Graceful degradation -- When a dependency fails, agents fall back to alternative services instead of crashing the entire workflow.
  • Trust scoring -- Some agent frameworks track health check history to build reliability scores over time, preferring services with 99.9%+ uptime. Your reputation compounds with every successful ping.
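The discovery-then-verify flow in the first bullet can be sketched in a few lines. This is a minimal illustration, not any specific agent framework's API: the `probe_health` and `build_working_set` names are made up for this example, and the probe is injectable so the health logic can be tested without a network.

```python
import json
import urllib.request


def probe_health(base_url, timeout=2.0):
    """Ping a service's /health endpoint; healthy means HTTP 200 plus status "ok"."""
    try:
        with urllib.request.urlopen(base_url.rstrip("/") + "/health", timeout=timeout) as resp:
            if resp.status != 200:
                return False
            body = json.loads(resp.read().decode("utf-8"))
            return body.get("status") == "ok"
    except (OSError, ValueError):
        # Unreachable, timed-out, or non-JSON services are simply skipped.
        return False


def build_working_set(candidates, probe=probe_health):
    """Keep only candidates whose health check passes: no healthy response, no inclusion."""
    return [url for url in candidates if probe(url)]
```

Note that the agent never retries or files a ticket; an unhealthy service is silently dropped, which is exactly the brutal filter described above.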

The Three Levels of Health Checks

A single /health endpoint is good. Three tiered endpoints are production-grade. Each level answers a different question, and the distinction matters more than most teams realize.

Level 1: Liveness (Am I Running?)

The liveness check does one thing: confirm the process is alive and the HTTP server can respond. No database queries. No cache lookups. No downstream calls. Just a pulse.

GET /health/live

{ "status": "ok" }

Target: under 10 milliseconds. If this endpoint is slow, something is fundamentally broken -- not degraded, broken.
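One way to check you are near that target is to time the probe end to end. The sketch below spins up a throwaway liveness handler with Python's standard library purely so the timing is measurable; the handler and port choice are illustrative, not a production setup.

```python
import json
import threading
import time
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class LivenessHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == "/health/live":
            # No database, no cache, no downstream calls -- just a pulse.
            body = json.dumps({"status": "ok"}).encode("utf-8")
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_response(404)
            self.end_headers()

    def log_message(self, *args):
        pass  # keep the probe quiet


server = HTTPServer(("127.0.0.1", 0), LivenessHandler)  # port 0: pick a free port
threading.Thread(target=server.serve_forever, daemon=True).start()

url = f"http://127.0.0.1:{server.server_port}/health/live"
start = time.perf_counter()
with urllib.request.urlopen(url) as resp:
    payload = json.loads(resp.read())
elapsed_ms = (time.perf_counter() - start) * 1000
print(payload, f"{elapsed_ms:.1f}ms")
server.shutdown()
```

On localhost this round-trip is typically single-digit milliseconds; anything slower at this level points at the process itself, not a dependency.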

Level 2: Readiness (Can I Accept Traffic?)

Being alive is not the same as being ready. A service that just booted is running but may still be warming caches, establishing database pools, or loading ML models. The readiness check verifies that essential dependencies are connected and the service can handle real requests.

GET /health/ready

{
  "status": "ok",
  "checks": {
    "database": "connected",
    "cache": "connected"
  }
}
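The aggregation behind a readiness response can be expressed framework-agnostically: run every essential dependency check, and report ready only if all of them pass. The `readiness` function below is an illustrative sketch; the check callables stand in for real probes like a `SELECT 1` query or a cache PING.

```python
def readiness(checks):
    """Run each named dependency check; the service is ready only if all pass.

    `checks` maps a dependency name to a zero-argument callable that raises
    on failure. Returns a (body, http_status_code) pair.
    """
    results = {}
    ready = True
    for name, check in checks.items():
        try:
            check()
            results[name] = "connected"
        except Exception as exc:
            results[name] = f"error: {exc}"
            ready = False
    status_code = 200 if ready else 503
    return {"status": "ok" if ready else "error", "checks": results}, status_code
```

Wire this into the handler for GET /health/ready and the status code does the talking: 200 while all dependencies answer, 503 the moment one does not.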

Level 3: Deep Health (What Is My Full Status?)

This is the endpoint AI agents and monitoring systems use to make nuanced decisions. It returns version info, uptime, and per-dependency latency -- everything an orchestrator needs to rank your service against alternatives.

GET /health

{
  "status": "ok",
  "version": "2.4.1",
  "uptime": 84923,
  "timestamp": "2026-02-18T12:34:56Z",
  "checks": {
    "database": { "status": "ok", "latency_ms": 3 },
    "redis": { "status": "ok", "latency_ms": 1 },
    "openai_api": { "status": "ok", "latency_ms": 142 }
  }
}

Notice the 142ms on the OpenAI API check versus 3ms on the database. That kind of granularity lets agents detect bottlenecks before they cascade.
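One plausible way an orchestrator could use that granularity is to rank healthy candidates by their slowest dependency. This is a sketch under the assumption that each service's parsed /health payload has the shape shown above; the scoring rule itself is illustrative, not a documented framework behavior.

```python
def rank_services(health_reports):
    """Rank candidate services: unhealthy ones dropped, the rest ordered by
    their worst per-dependency latency (the slowest dependency bounds the
    service's effective response time).

    `health_reports` maps a service URL to its parsed /health payload.
    """
    def worst_latency(payload):
        return max(
            (check.get("latency_ms", 0) for check in payload.get("checks", {}).values()),
            default=0,
        )

    healthy = {url: p for url, p in health_reports.items() if p.get("status") == "ok"}
    return sorted(healthy, key=lambda url: worst_latency(healthy[url]))
```

Under this rule, a service whose database answers in 3ms but whose upstream API takes 142ms ranks behind a competitor whose worst dependency sits at 5ms.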


JSON Response Format Best Practices

Field     | Type   | Purpose
status    | string | "ok" or "degraded" or "error"
version   | string | Deployed version or git SHA
uptime    | number | Seconds since process started
timestamp | string | Current server time in ISO 8601
checks    | object | Individual dependency statuses

Three non-negotiable rules:

  • Return HTTP 200 for healthy and 503 for unhealthy -- agents check status codes first, not response bodies.
  • Always return valid JSON with Content-Type: application/json.
  • Include a version field so agents can detect stale deployments -- a service running last month's code might be alive but dangerously outdated.

Complete Implementation Examples

Express.js (Node.js)

const startTime = Date.now();

app.get('/health', async (req, res) => {
  const checks = {};
  let overall = 'ok';

  // Check database
  try {
    const dbStart = Date.now();
    await db.query('SELECT 1');
    checks.database = { status: 'ok', latency_ms: Date.now() - dbStart };
  } catch (err) {
    checks.database = { status: 'error', message: err.message };
    overall = 'degraded';
  }

  // Check Redis
  try {
    const redisStart = Date.now();
    await redis.ping();
    checks.redis = { status: 'ok', latency_ms: Date.now() - redisStart };
  } catch (err) {
    checks.redis = { status: 'error', message: err.message };
    overall = 'degraded';
  }

  const statusCode = overall === 'ok' ? 200 : 503;
  res.status(statusCode).json({
    status: overall,
    version: process.env.APP_VERSION || '1.0.0',
    uptime: Math.floor((Date.now() - startTime) / 1000),
    timestamp: new Date().toISOString(),
    checks,
  });
});

// Lightweight liveness probe
app.get('/health/live', (req, res) => {
  res.json({ status: 'ok' });
});

Python Flask

import os
import time
from datetime import datetime

from flask import Flask, jsonify
from sqlalchemy import text

app = Flask(__name__)
START_TIME = time.time()

@app.route('/health')
def health_check():
    checks = {}
    overall = 'ok'

    # Check database (assumes a Flask-SQLAlchemy `db` object configured elsewhere)
    try:
        start = time.time()
        db.session.execute(text('SELECT 1'))
        checks['database'] = {
            'status': 'ok',
            'latency_ms': round((time.time() - start) * 1000)
        }
    except Exception as e:
        checks['database'] = {'status': 'error', 'message': str(e)}
        overall = 'degraded'

    status_code = 200 if overall == 'ok' else 503
    return jsonify({
        'status': overall,
        'version': os.getenv('APP_VERSION', '1.0.0'),
        'uptime': int(time.time() - START_TIME),
        'timestamp': datetime.utcnow().isoformat() + 'Z',
        'checks': checks,
    }), status_code

@app.route('/health/live')
def liveness():
    return jsonify({'status': 'ok'})

What the Clarvia GEO Checker Checks

The Clarvia GEO Checker includes a dedicated health endpoint layer. When the audit scans your domain, it evaluates five critical signals:

  • Endpoint existence -- Does /health resolve, or does it return a 404? A missing health endpoint is the single most common AI discoverability failure we see.
  • HTTP 200 status -- A healthy service must respond with a 200 status code, not a redirect, not a generic page.
  • Valid JSON response -- The response body must parse as valid JSON. Plain text and HTML responses are invisible to agent parsers.
  • Required fields -- The audit checks for a status field at minimum, and flags missing fields like version, uptime, and checks that agents rely on for routing decisions.
  • Response time -- A health endpoint that takes more than 2 seconds to respond suggests deeper performance problems. Most production-grade services respond in under 50ms.
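The five signals above can be approximated with a short audit function. This is an illustrative sketch, not the Clarvia checker's actual implementation: the thresholds mirror the ones stated in the list, and the function takes the raw status code, response body, and measured response time so it stays testable offline.

```python
import json


def audit_health_response(status_code, body, elapsed_seconds):
    """Evaluate a /health response against the five audit signals.

    Returns a list of human-readable failures; an empty list means all pass.
    """
    if status_code == 404:
        # A missing endpoint makes the remaining checks moot.
        return ["endpoint missing: /health returned 404"]

    failures = []
    if status_code != 200:
        failures.append(f"expected HTTP 200, got {status_code}")

    try:
        payload = json.loads(body)
    except ValueError:
        # Plain text and HTML are invisible to agent parsers.
        return failures + ["response body is not valid JSON"]

    if "status" not in payload:
        failures.append("missing required field: status")
    for field in ("version", "uptime", "checks"):
        if field not in payload:
            failures.append(f"recommended field absent: {field}")

    if elapsed_seconds > 2.0:
        failures.append(f"slow response: {elapsed_seconds:.2f}s (limit 2s)")
    return failures
```

Running this against your own endpoint before an agent does is a cheap way to catch the most common failure, the 404, along with the subtler ones.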

Kubernetes Integration

If you are running on Kubernetes, mapping these three levels to probes is straightforward. The key insight: liveness and readiness must be separate endpoints with different failure thresholds.

livenessProbe:
  httpGet:
    path: /health/live
    port: 8080
  initialDelaySeconds: 5
  periodSeconds: 10
  failureThreshold: 3

readinessProbe:
  httpGet:
    path: /health/ready
    port: 8080
  initialDelaySeconds: 10
  periodSeconds: 5
  failureThreshold: 2
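Those thresholds translate directly into detection time: a pod must miss failureThreshold consecutive probes, each periodSeconds apart, before Kubernetes acts. A quick sanity check on the numbers above:

```python
def worst_case_detection_seconds(period_seconds, failure_threshold):
    """Roughly how long a failing pod goes unnoticed before Kubernetes acts
    (ignoring up to one extra period of probe-phase offset)."""
    return period_seconds * failure_threshold


# Liveness above: a wedged pod is restarted after ~30s of consecutive failures.
assert worst_case_detection_seconds(10, 3) == 30
# Readiness above: an unready pod leaves rotation after ~10s.
assert worst_case_detection_seconds(5, 2) == 10
```

The asymmetry is deliberate: pulling a pod out of rotation is cheap and fast; restarting it is disruptive, so liveness gets the longer fuse.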

Common Mistakes to Avoid

  • Running heavyweight queries in health checks. Five-second health checks are not health checks. They are load tests. Use SELECT 1 or PING, not SELECT COUNT(*) FROM users.

  • Checking too much in liveness probes. If your liveness probe queries the database and the database is slow, Kubernetes restarts the pod -- which adds more load to the already-struggling database. A cascading failure triggered by the very system meant to prevent failures. Keep liveness minimal.
  • Not separating liveness from readiness. A service that is starting up is alive but not ready. Without separate endpoints, you get unnecessary restarts during deployment rollouts -- exactly when your service is most vulnerable.
  • Forgetting proper HTTP status codes. Returning 200 with {"status": "error"} in the body defeats the purpose entirely. Agents and load balancers check the HTTP status code first. Many never parse the body at all.
  • Exposing sensitive information. Never include credentials, internal IPs, or detailed error stack traces in health responses. A health endpoint is public by design -- treat it that way.

Your /health endpoint is the first thing an AI agent checks and the last thing most teams implement. Run the free GEO Checker to see how your health endpoint scores across all five checks -- plus six other AI discoverability layers. If you need help implementing production-grade health checks or making your services AI-agent-ready, get in touch -- we build this every day.


