Demo · Model Routing & Local Compute

LLM Balancer.

A unified abstraction layer that puts every LLM provider behind a single API — and a desktop agent that lets any spare laptop or workstation join the local inference pool over WebSocket.

Video: LLM Balancer admin walkthrough (internal, balancer.gysho.com/admin · ~38s loop · muted)
Video: LLM Balancer local agent app (desktop app, Gysho LLM Balancer Agent · ~18s loop · muted)

The Problem

Vendor lock-in, idle hardware, opaque costs

Single-vendor LLM strategies mean betting on one roadmap and one pricing model. Meanwhile, on-prem GPUs and laptop NPUs sit unused while cloud bills climb. Multi-provider integration is custom plumbing for every new agent.

What We Built

One API + a local-pool agent

The Balancer normalises requests across Anthropic, OpenAI, Azure, OpenRouter, and any local model, and routes each one by cost, latency, data sovereignty, or policy. The desktop Agent lets any spare laptop or workstation register itself as a local-inference node over WebSocket and advertise the MLX models it can serve.
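
To make the idea concrete, here is a minimal sketch of how a node might announce itself and how a policy might pick a backend. Everything in it is an assumption made for illustration: the `Backend` and `Policy` types, the `registration_message` and `choose_backend` functions, the JSON field names, and all prices and latencies are hypothetical placeholders, not the Balancer's actual API or wire format.

```python
import json
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Backend:
    name: str                  # hypothetical identifier for a provider or node
    usd_per_1k_tokens: float   # placeholder price, not a real quote
    typical_latency_ms: float  # placeholder latency estimate
    on_prem: bool              # True if requests stay on local hardware


@dataclass(frozen=True)
class Policy:
    optimise_for: str = "cost"                     # "cost" or "latency"
    require_on_prem: bool = False                  # sovereignty constraint
    max_usd_per_1k_tokens: Optional[float] = None  # optional price ceiling


def registration_message(node_id: str, mlx_models: List[str]) -> str:
    """Build the JSON text an agent might send after opening its WebSocket.

    The field names are assumed for this sketch.
    """
    return json.dumps({
        "type": "register",
        "node_id": node_id,
        "runtime": "mlx",
        "models": sorted(mlx_models),
    })


def choose_backend(backends: List[Backend], policy: Policy) -> Tuple[Backend, str]:
    """Filter backends by the policy's constraints, then rank the survivors."""
    if policy.optimise_for not in ("cost", "latency"):
        raise ValueError(f"unknown objective: {policy.optimise_for!r}")

    eligible = [
        b for b in backends
        if (b.on_prem or not policy.require_on_prem)
        and (policy.max_usd_per_1k_tokens is None
             or b.usd_per_1k_tokens <= policy.max_usd_per_1k_tokens)
    ]
    if not eligible:
        raise LookupError("no backend satisfies the policy")

    if policy.optimise_for == "cost":
        best = min(eligible, key=lambda b: b.usd_per_1k_tokens)
    else:
        best = min(eligible, key=lambda b: b.typical_latency_ms)

    reason = f"lowest {policy.optimise_for} of {len(eligible)} eligible backend(s)"
    return best, reason


# Example pool with made-up numbers.
POOL = [
    Backend("cloud-a", 0.0030, 400.0, on_prem=False),
    Backend("cloud-b", 0.0010, 900.0, on_prem=False),
    Backend("laptop-mlx", 0.0, 1500.0, on_prem=True),
]
```

With these placeholder numbers, the default cost-first policy selects `laptop-mlx`, a latency-first policy selects `cloud-a`, and adding `require_on_prem=True` keeps the request on `laptop-mlx` whatever the objective. The returned reason string is the kind of value a dispatch log could record.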

The Outcome

Swap models. Reclaim hardware. See the bill.

Provider switches are config, not code. Policy can shift workloads onto local, on-prem capacity. Every dispatch is logged with cost, latency, and routing reason, so AI spend becomes a number you can govern, not a surprise on the invoice.
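
As an illustration of what a per-dispatch log entry could look like, the sketch below records cost, latency, and routing reason for each request and totals spend per backend. The `DispatchRecord` fields, the function names, and the flat per-1k-token price are assumptions for this example, not the Balancer's real schema or pricing model.

```python
import json
from dataclasses import asdict, dataclass
from typing import Dict, Iterable


@dataclass(frozen=True)
class DispatchRecord:
    request_id: str
    backend: str         # hypothetical backend name
    routing_reason: str  # why this backend was chosen
    total_tokens: int
    latency_ms: float
    cost_usd: float


def record_dispatch(request_id: str, backend: str, routing_reason: str,
                    input_tokens: int, output_tokens: int,
                    started_at: float, finished_at: float,
                    usd_per_1k_tokens: float) -> DispatchRecord:
    """Derive latency and cost for one dispatch.

    Timestamps are in seconds; the price is an assumed flat rate.
    """
    total = input_tokens + output_tokens
    return DispatchRecord(
        request_id=request_id,
        backend=backend,
        routing_reason=routing_reason,
        total_tokens=total,
        latency_ms=round((finished_at - started_at) * 1000, 3),
        cost_usd=round(total / 1000 * usd_per_1k_tokens, 6),
    )


def to_json_line(record: DispatchRecord) -> str:
    """Serialise a record as one JSON line, suitable for an append-only log."""
    return json.dumps(asdict(record), sort_keys=True)


def spend_by_backend(records: Iterable[DispatchRecord]) -> Dict[str, float]:
    """Sum logged cost per backend so spend can be reviewed."""
    totals: Dict[str, float] = {}
    for r in records:
        totals[r.backend] = totals.get(r.backend, 0.0) + r.cost_usd
    return totals
```

A caller would take timestamps around the provider call (for example with `time.monotonic()`), build one record per dispatch, and append `to_json_line(record)` to a log that a dashboard or budget check can aggregate.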

LLM Balancer · Desktop Agent (MLX) · WebSocket Pool · Anthropic / OpenAI / Azure / OpenRouter · Policy Routing · Cost & Latency Observability

Want vendor-independent LLM infrastructure for your stack?

Talk to us →