Private beta · model provider infrastructure

One API forbetter model routing.

Lumina gives developers an OpenAI-compatible provider surface with automatic model routing, resilient fallback, streaming responses, usage controls, and request telemetry.

  • No credit card
  • Usage controls
  • Provider fallback
01
Provider API
OpenAI-compatible requests and streams
02
Provider fallback
Availability, latency, and usage signals
03
Telemetry
Trace, meter, and evaluate every route

Model routing

Lumina chooses the best route for the task.

Instead of making you pick from a model menu, Lumina classifies the request and automatically routes it to the right path for speed, balanced quality, or deeper reasoning.

Lumina Fast
Short, lightweight tasks
Lumina Balanced
Everyday analysis and writing
Lumina Thinking
Reasoning-heavy work
Task classifier
Understands the request first
Web context
Fresh external signals
Provider fallback
Availability-aware routing
Memory context
Useful preferences and history
Tool execution
Approved actions when needed
Built so Lumina can improve routing without changing your workspace
Roadmap preview · Model Fusion

One prompt. Multiple perspectives. One fused answer.

Model Fusion is the next step in Lumina routing: for high-value tasks, Lumina will run a panel of model routes, compare their reasoning, and synthesize the strongest answer instead of stopping at the first response.

Deep research briefsStrategic decision memosArchitecture and code reviewsProvider comparison
Help shape Model Fusion

Fusion trace

Planned quality pass

In development
1
Prompt

One high-stakes request

2
Panel

Multiple Lumina model routes

3
Review

Compare strengths and disagreement

4
Fuse

One synthesized answer

Panel-based reasoning

Send important work through multiple Lumina routes instead of trusting a single model perspective.

Disagreement analysis

Surface where routes agree, where they conflict, and which evidence should influence the final answer.

Fused final answer

Synthesize the strongest reasoning, context, and sources into one cleaner response for decisions.

Quality controls

Reserve fusion for research, strategy, code review, and other tasks where extra quality is worth the cost.

Platform

Model access should not be a black box.

Lumina adds the provider layer between your app and the model: routing, fallback, usage control, telemetry, and quality evaluation.

Adaptive routing

Right model, right moment

Normalize each request, then route to the Lumina model path that fits the work.

  • Task-aware model selection
  • Lumina Auto, Fast, and Reasoning paths
  • Legacy model aliases for compatibility
  • Streaming and non-streaming responses

Provider surface

A clean developer API

Expose a predictable OpenAI-compatible API instead of hiding app behavior inside requests.

  • /v1/chat/completions
  • /v1/models
  • /v1/usage
  • Customer API keys and project usage

Usage controls

Metered from the first request

Track usage and credits where provider customers expect to see them.

  • Credits charged per completion
  • Token usage capture
  • Rate-limit aware accounting
  • Usage logs by API key

Provider observability

Every route is measurable

Measure model calls, fallback behavior, latency, usage, and quality signals.

  • Request and provider traces
  • Provider latency and error tracking
  • Usage logs by customer API key
  • Credits and billing-ready metering

Capabilities

A provider layer built for model traffic.

Lumina is not another chat wrapper. It is the model gateway layer that accepts requests, chooses the right model path, handles fallback, meters usage, and makes provider behavior traceable.

Provider API

Expose chat completions, model listing, usage reporting, and API-key authentication through one developer surface.

Automatic model routing

Route each request to fast, balanced, or reasoning-heavy Lumina paths using explicit provider policy.

Provider fallback

Fail over by provider health, model availability, and compatibility without changing the client request shape.

Usage metering

Charge credits, record token usage, and expose customer usage logs from the same execution path.

Streaming responses

Support streaming and non-streaming completions while keeping provider errors and usage accounting consistent.

OpenAI-compatible shape

Keep request and response contracts familiar so teams can swap providers with minimal client changes.

Provider telemetry

Capture route choice, upstream latency, fallback behavior, usage, and model quality signals.

Evaluation loop

Feed production traces and eval results back into routing policy so model quality can improve over time.

Streaming by default

See progress as the answer forms.

Provider-aware routing

Route by task, cost, latency, and availability.

Inspectable execution

Expose the stages that shaped the response.

Trust posture

Production-ready means no black boxes.

Credibility starts with product behavior teams can inspect: routing, usage, provider fallback, request traces, and quality evaluation before badges or metrics.

Routing is explainable

Lumina is built around visible provider traces, so model selection and fallback are not treated as a black box.

Add-ons stay explicit

Retrieval, tools, and memory belong in explicit product flows rather than hidden inside default model calls.

Usage is measurable

Requests, model routes, customer API keys, and credit consumption are built to be visible from day one.

Model routing is transparent

Lumina selects the model route automatically for each task, keeping routing behavior part of the product experience.

How it works

Provider-grade routing in four moments.

The backend can stay modular underneath, but the product experience is simple: accept, route, meter, and return a traceable model response.

1

Accept the request

A client sends an OpenAI-compatible completion request with a model, messages, stream preference, and API key.

2

Route the model path

Lumina selects the best fast, balanced, reasoning, or auto route using provider policy and request signals.

3

Fallback and meter

Provider health, fallback rules, credits, token usage, latency, and rate limits are applied in one execution path.

4

Stream and trace

The response streams back while the route, provider events, usage, and quality signals stay inspectable.

Route every model call with confidence.

Join the private beta to help shape a model provider that gives developers routing, fallback, streaming, usage metering, and request telemetry behind one API.

Questions before you join?

Straight answers about the beta, the pipeline, and what Lumina is built to do.

Lumina is an AI model provider and routing platform for teams building production AI products. It exposes an OpenAI-compatible API, routes each request to a Lumina model path, applies resilient fallback, streams responses, and records usage telemetry.

A normal model API usually binds your request to one model. Lumina adds provider-grade orchestration: model aliases, routing policy, fallback, usage metering, request traces, and quality telemetry while keeping the public API predictable.

The public catalog exposes Lumina Auto, Lumina Fast, and Lumina Reasoning. Legacy versioned IDs are accepted as aliases so older clients can keep working while new clients use the provider catalog.

Model Fusion is a roadmap feature for high-value work. The goal is to let Lumina run multiple model routes, compare their reasoning, detect disagreement, and synthesize one stronger final answer.

No. The provider API should be predictable: a chat completion request is routed, metered, streamed, and traced. Retrieval, tools, memory, or web context should be explicit add-ons, not hidden behavior in the default provider path.

Yes. The app pipeline can support tools, web context, and memory where the product explicitly asks for them. The provider-first API keeps those capabilities separate from the default model routing path.

The beta focuses on AI model-provider quality: routing, streaming, provider fallback, usage metering, customer API keys, request logs, admin telemetry, and eval-driven Model Fusion. The waitlist helps us invite users in waves while those surfaces are hardened.

Lumina is in private beta. Join the waitlist to request early access; invites are sent in waves while the hosted product is hardened for broader availability.

Private beta · Inviting in waves

Request access to Lumina

Join the waitlist for model provider infrastructure with automatic routing, resilient fallback, streaming, usage metering, and Model Fusion telemetry.

Join the private beta

No credit card. We’ll email when your invite is ready.

Free to requestNo spam, ever

Private Beta Access

Invites roll out in waves while the hosted product is hardened.

Product Updates

Get concise notes as routing, fallback, telemetry, and Model Fusion features ship.

No Commitment

Free to request access. No credit card or sales call required.