One API forbetter model routing.
Lumina gives developers an OpenAI-compatible provider surface with automatic model routing, resilient fallback, streaming responses, usage controls, and request telemetry.
- No credit card
- Usage controls
- Provider fallback
Routing selected Lumina Auto, streamed the response, recorded usage, and attached a request trace for debugging and billing review
Model routing
Lumina chooses the best route for the task.
Instead of making you pick from a model menu, Lumina classifies the request and automatically routes it to the right path for speed, balanced quality, or deeper reasoning.
One prompt. Multiple perspectives. One fused answer.
Model Fusion is the next step in Lumina routing: for high-value tasks, Lumina will run a panel of model routes, compare their reasoning, and synthesize the strongest answer instead of stopping at the first response.
Fusion trace
Planned quality pass
One high-stakes request
Multiple Lumina model routes
Compare strengths and disagreement
One synthesized answer
Panel-based reasoning
Send important work through multiple Lumina routes instead of trusting a single model perspective.
Disagreement analysis
Surface where routes agree, where they conflict, and which evidence should influence the final answer.
Fused final answer
Synthesize the strongest reasoning, context, and sources into one cleaner response for decisions.
Quality controls
Reserve fusion for research, strategy, code review, and other tasks where extra quality is worth the cost.
Platform
Model access should not be a black box.
Lumina adds the provider layer between your app and the model: routing, fallback, usage control, telemetry, and quality evaluation.
Adaptive routing
Right model, right moment
Normalize each request, then route to the Lumina model path that fits the work.
- •Task-aware model selection
- •Lumina Auto, Fast, and Reasoning paths
- •Legacy model aliases for compatibility
- •Streaming and non-streaming responses
Provider surface
A clean developer API
Expose a predictable OpenAI-compatible API instead of hiding app behavior inside requests.
- •/v1/chat/completions
- •/v1/models
- •/v1/usage
- •Customer API keys and project usage
Usage controls
Metered from the first request
Track usage and credits where provider customers expect to see them.
- •Credits charged per completion
- •Token usage capture
- •Rate-limit aware accounting
- •Usage logs by API key
Provider observability
Every route is measurable
Measure model calls, fallback behavior, latency, usage, and quality signals.
- •Request and provider traces
- •Provider latency and error tracking
- •Usage logs by customer API key
- •Credits and billing-ready metering
Capabilities
A provider layer built for model traffic.
Lumina is not another chat wrapper. It is the model gateway layer that accepts requests, chooses the right model path, handles fallback, meters usage, and makes provider behavior traceable.
Provider API
Expose chat completions, model listing, usage reporting, and API-key authentication through one developer surface.
Automatic model routing
Route each request to fast, balanced, or reasoning-heavy Lumina paths using explicit provider policy.
Provider fallback
Fail over by provider health, model availability, and compatibility without changing the client request shape.
Usage metering
Charge credits, record token usage, and expose customer usage logs from the same execution path.
Streaming responses
Support streaming and non-streaming completions while keeping provider errors and usage accounting consistent.
OpenAI-compatible shape
Keep request and response contracts familiar so teams can swap providers with minimal client changes.
Provider telemetry
Capture route choice, upstream latency, fallback behavior, usage, and model quality signals.
Evaluation loop
Feed production traces and eval results back into routing policy so model quality can improve over time.
See progress as the answer forms.
Route by task, cost, latency, and availability.
Expose the stages that shaped the response.
Trust posture
Production-ready means no black boxes.
Credibility starts with product behavior teams can inspect: routing, usage, provider fallback, request traces, and quality evaluation before badges or metrics.
Routing is explainable
Lumina is built around visible provider traces, so model selection and fallback are not treated as a black box.
Add-ons stay explicit
Retrieval, tools, and memory belong in explicit product flows rather than hidden inside default model calls.
Usage is measurable
Requests, model routes, customer API keys, and credit consumption are built to be visible from day one.
Model routing is transparent
Lumina selects the model route automatically for each task, keeping routing behavior part of the product experience.
How it works
Provider-grade routing in four moments.
The backend can stay modular underneath, but the product experience is simple: accept, route, meter, and return a traceable model response.
Accept the request
A client sends an OpenAI-compatible completion request with a model, messages, stream preference, and API key.
Route the model path
Lumina selects the best fast, balanced, reasoning, or auto route using provider policy and request signals.
Fallback and meter
Provider health, fallback rules, credits, token usage, latency, and rate limits are applied in one execution path.
Stream and trace
The response streams back while the route, provider events, usage, and quality signals stay inspectable.
Route every model call with confidence.
Join the private beta to help shape a model provider that gives developers routing, fallback, streaming, usage metering, and request telemetry behind one API.
Questions before you join?
Straight answers about the beta, the pipeline, and what Lumina is built to do.
Lumina is an AI model provider and routing platform for teams building production AI products. It exposes an OpenAI-compatible API, routes each request to a Lumina model path, applies resilient fallback, streams responses, and records usage telemetry.
A normal model API usually binds your request to one model. Lumina adds provider-grade orchestration: model aliases, routing policy, fallback, usage metering, request traces, and quality telemetry while keeping the public API predictable.
The public catalog exposes Lumina Auto, Lumina Fast, and Lumina Reasoning. Legacy versioned IDs are accepted as aliases so older clients can keep working while new clients use the provider catalog.
Model Fusion is a roadmap feature for high-value work. The goal is to let Lumina run multiple model routes, compare their reasoning, detect disagreement, and synthesize one stronger final answer.
No. The provider API should be predictable: a chat completion request is routed, metered, streamed, and traced. Retrieval, tools, memory, or web context should be explicit add-ons, not hidden behavior in the default provider path.
Yes. The app pipeline can support tools, web context, and memory where the product explicitly asks for them. The provider-first API keeps those capabilities separate from the default model routing path.
The beta focuses on AI model-provider quality: routing, streaming, provider fallback, usage metering, customer API keys, request logs, admin telemetry, and eval-driven Model Fusion. The waitlist helps us invite users in waves while those surfaces are hardened.
Lumina is in private beta. Join the waitlist to request early access; invites are sent in waves while the hosted product is hardened for broader availability.
Request access to Lumina
Join the waitlist for model provider infrastructure with automatic routing, resilient fallback, streaming, usage metering, and Model Fusion telemetry.
Join the private beta
No credit card. We’ll email when your invite is ready.
Private Beta Access
Invites roll out in waves while the hosted product is hardened.
Product Updates
Get concise notes as routing, fallback, telemetry, and Model Fusion features ship.
No Commitment
Free to request access. No credit card or sales call required.