06 -- Rate Limiting

Prerequisite: 05 -- Load Balancing You will need: Running Hangar with a load-balanced group from recipe 05 Time: 5 minutes Adds: Protect MCP servers from request overload

The Problem

A runaway client sends hundreds of requests per second. Your MCP servers can handle 10 concurrent calls each. Without limits, they queue up, timeout, and cascade into health check failures.

The Config

# config.yaml -- Recipe 06: Rate Limiting
mcp_servers:
  my-mcp:
    mode: remote
    endpoint: "http://localhost:8080"
    health_check_interval_s: 10
    max_consecutive_failures: 3

  my-mcp-backup:
    mode: remote
    endpoint: "http://localhost:8081"
    health_check_interval_s: 10
    max_consecutive_failures: 3

  my-mcp-3:
    mode: remote
    endpoint: "http://localhost:8082"
    health_check_interval_s: 10
    max_consecutive_failures: 3

  my-mcp-group:
    mode: group
    strategy: round_robin
    min_healthy: 1
    members:
      - id: my-mcp
        weight: 1
      - id: my-mcp-backup
        weight: 1
      - id: my-mcp-3
        weight: 1

Rate limiting is configured via environment variables:

export MCP_RATE_LIMIT_RPS=1          # NEW: 1 request per second steady-state
export MCP_RATE_LIMIT_BURST=10       # NEW: allow short bursts up to 10

Try It

Start Hangar with rate limiting configured:

MCP_RATE_LIMIT_RPS=1 MCP_RATE_LIMIT_BURST=10 \
  mcp-hangar serve --http --port 8000

Send requests within the limit:

for i in $(seq 1 5); do
  curl -s http://localhost:8000/api/mcp_servers | jq .mcp_servers[0].state
done

All 5 requests succeed.

Flood the API to trigger rate limiting:

for i in $(seq 1 100); do
  curl -s -o /dev/null -w "%{http_code}\n" http://localhost:8000/api/mcp_servers
done

After the burst limit, you see 429 responses:

Wait 60 seconds and verify the limit resets:

sleep 60
curl -s -o /dev/null -w "%{http_code}\n" http://localhost:8000/api/mcp_servers

What Just Happened

Rate limiting is enforced per-tool via check_rate_limit(). When the rate exceeds MCP_RATE_LIMIT_RPS (requests per second), subsequent requests receive 429 Too Many Requests. The MCP_RATE_LIMIT_BURST setting allows short spikes above the steady-state rate.

Rate limiting applies to both MCP tool calls and REST API requests.

Key Config Reference

Environment Variable	Type	Default	Description
`MCP_RATE_LIMIT_RPS`	float	`10`	Requests per second steady-state limit
`MCP_RATE_LIMIT_BURST`	int	`20`	Maximum burst above the rate limit

What's Next

Congratulations -- you've completed the sequential path. Your setup has health checks, circuit breakers, failover, load balancing, and rate limiting.

The remaining recipes are standalone. Start with 07 -- Observability: Metrics to add Prometheus and Grafana monitoring.