AI Gateway

The AI Gateway tab shows LLM traffic flowing through Zuplo's AI Gateway: request volume, token usage, estimated cost, model and provider distribution, latency, cache effectiveness, and blocked-request reasons. It's visible when the project type is ai.

When to use this

Audit AI spend by model or provider.
Compare cache hit rate before and after enabling caching.
Investigate why requests are being blocked by your guardrails.

Summary KPIs

Name	What it measures
Total Requests	All AI gateway requests in the window.
Total Tokens	Sum of tokens across all requests in the window. A secondary value shows the prompt / completion split.
Estimated Cost	Computed from model pricing × token usage.
Median Latency	P50 across all AI gateway requests.

Charts

Request Time Series. Three series in one chart: requests, tokens, and cost over the window.

Model Usage. Stacked bars by model with a sidebar legend showing top models by share. Click a model in the legend to highlight it; the others fade.

Token Breakdown. A donut split of prompt / completion / embedding tokens, plus a time series of the same.

Provider Breakdown. A donut and time series by provider, plus a top-providers list.

Latency Distribution. Histogram of P10, P50, P90, P95, P99.

Latency Over Time. P50, P95, P99 lines.
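
To make the percentile labels concrete, here is a minimal sketch of how P50, P95, and P99 can be derived from raw request latencies. It uses the nearest-rank method, which is an assumption: this page does not state which percentile method the dashboard uses, and the sample latencies are invented for illustration.

```python
import math


def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample with at least p% of
    all samples at or below it. (Assumed method, not confirmed by the docs.)"""
    if not samples:
        raise ValueError("percentile() needs at least one sample")
    if not 0 < p <= 100:
        raise ValueError("p must be in the range (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(p / 100 * len(ordered))  # 1-based rank
    return ordered[rank - 1]


# Hypothetical request latencies in milliseconds.
latencies_ms = [120, 135, 150, 180, 210, 260, 340, 480, 900, 2400]

summary = {f"P{p}": percentile(latencies_ms, p) for p in (50, 95, 99)}
print(summary)
```

With a small sample like this one, a single slow request sets both P95 and P99, which is why the tail lines can sit far above the P50 line.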

Cache Hit Rate. Hits vs misses over time, with a summary hit rate. What to look for: a stable hit rate above your target after enabling caching means semantic caching is working as configured.
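
As a rough illustration of the summary hit rate and the "stable above target" check, the sketch below computes hits / (hits + misses) per time bucket. The bucket counts, the 30% target, and the function names are hypothetical; the page does not specify how the dashboard aggregates buckets.

```python
def hit_rate(hits, misses):
    """Fraction of requests served from cache; 0.0 when there is no traffic."""
    total = hits + misses
    return hits / total if total else 0.0


def stable_above(rates, target):
    """True when every bucket's hit rate meets or exceeds the target."""
    return bool(rates) and all(rate >= target for rate in rates)


# Hypothetical (hits, misses) per hour after enabling caching.
buckets_after = [(410, 590), (430, 570), (420, 580)]
rates_after = [hit_rate(h, m) for h, m in buckets_after]

# Overall rate for the window, weighted by traffic in each bucket.
total_hits = sum(h for h, _ in buckets_after)
total_misses = sum(m for _, m in buckets_after)
overall = hit_rate(total_hits, total_misses)

TARGET = 0.30  # hypothetical target hit rate
print(f"overall={overall:.2%} stable={stable_above(rates_after, TARGET)}")
```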

Blocked Requests. Donut and time series by block reason type. Use it to see which guardrails or quota policies are blocking requests, and how often.

Filters

The shared filter bar applies to this tab. See Shared controls.

Troubleshooting

The AI Gateway tab is empty. No AI Gateway traffic has been recorded in the selected window. Start proxying requests through the AI Gateway and the charts populate automatically.

Estimated cost doesn't match my provider bill. Estimated cost is computed from token usage and published pricing. It excludes discounts and credits. See Metrics glossary.
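
The gap can be sketched as follows. This is an illustrative calculation only: the model name, per-million-token prices, and the 10% discount are all hypothetical, and the exact pricing table Zuplo uses is not described on this page.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Price:
    """List price in USD per one million tokens (hypothetical values)."""
    prompt_per_million: float
    completion_per_million: float


# Hypothetical published pricing; not real provider prices.
PRICES = {"example-model": Price(prompt_per_million=2.00, completion_per_million=8.00)}


def estimated_cost(model, prompt_tokens, completion_tokens):
    """Model pricing x token usage, with no discounts or credits applied."""
    price = PRICES[model]
    return (
        prompt_tokens * price.prompt_per_million
        + completion_tokens * price.completion_per_million
    ) / 1_000_000


list_cost = estimated_cost("example-model", 1_500_000, 250_000)

# A negotiated discount (hypothetical 10%) lowers the invoice but is not
# part of the estimate, so the two figures differ.
billed = list_cost * (1 - 0.10)
print(f"estimated={list_cost:.2f} billed={billed:.2f}")
```

Treat the estimate as a list-price figure for comparing models and providers, not as a prediction of the invoice.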

Cache hit rate is 0%. Either caching isn't enabled on the route, or every request was unique enough that no entry matched. Check your AI Gateway cache configuration.

Last modified on May 15, 2026
