AI Gateway
The AI Gateway tab shows LLM traffic flowing through Zuplo's AI Gateway: request volume, token usage, estimated cost, model and provider distribution, latency, cache effectiveness, and blocked-request reasons. It's visible when the project type is ai.
When to use this
- Audit AI spend by model or provider.
- Compare cache hit rate before and after enabling caching.
- Investigate why requests are being blocked by your guardrails.
Summary KPIs
| Name | What it measures |
|---|---|
| Total Requests | All AI gateway requests in the window. |
| Total Tokens | Sum across requests. Secondary: prompt / completion split. |
| Estimated Cost | Computed from model pricing × token usage. |
| Median Latency | P50 across all AI gateway requests. |
Charts
Request Time Series. Three series in one chart: requests, tokens, and cost over the window.
Model Usage. Stacked bars by model with a sidebar legend showing top models by share. Click a model in the legend to highlight it; the others fade.
Token Breakdown. A donut split of prompt / completion / embedding tokens, plus a time series of the same.
Provider Breakdown. A donut and time series by provider, plus a top-providers list.
Latency Distribution. Histogram of P10, P50, P90, P95, P99.
Latency Over Time. P50, P95, P99 lines.
Cache Hit Rate. Hits vs misses over time, with a summary hit rate. What to look for: a stable hit rate above your target after enabling caching means semantic caching is working as configured.
Blocked Requests. Donut and time series by block reason type. Useful when guardrails or quota policies are doing meaningful work.
Filters
The filter bar applies. See Shared controls.
Troubleshooting
The AI Gateway tab is empty. No AI Gateway traffic has been recorded in the selected window. Start proxying requests through the AI Gateway and the charts populate automatically.
Estimated cost doesn't match my provider bill. Estimated cost is computed from token usage and published pricing. It excludes discounts and credits. See Metrics glossary.
Cache hit rate is 0%. Either caching isn't enabled on the route, or every request was unique enough that no entry matched. Check your AI Gateway cache configuration.