AIVory Smart Inference
Cheaper inference. One URL. No code changes.
Smart Inference optimizes expenditures by dynamically routing every request to the most cost-effective provider for a given model. Key aspects include:
• Real-time price optimization across multiple providers
• OpenAI-compatible API for seamless integration
• Support for over 50 models, including open-weight options
• Pay-as-you-go pricing with no credit expiration or subscriptions
• Option to self-host on spot GPUs with one-click setup
This system acts as a routing proxy, sitting between your application and over ten inference providers. For each API call, it scores available endpoints based on cost, latency, and availability, then forwards the request to the most economical option meeting quality standards. It continuously monitors live pricing from providers like Together AI, DeepInfra, Fireworks, Groq, Cerebras, and AWS Bedrock, along with spot GPU capacity from RunPod, Vast.ai, Crusoe Cloud, and Azure Spot, reacting to price changes within seconds.
The service is fully compatible with OpenAI's API, supporting `/v1/chat/completions` with features like streaming, tool calling, JSON mode, and vision. This means you can retain your current SDK (Python, TypeScript, or curl), model names, and prompts. The only required change is updating your `base_url` to `https://smart.aivory.net/v1`. Users typically experience median savings of approximately 30%, with potential savings up to 89% on open-weight models.
Ideal for developers, engineering teams, and businesses utilizing large language models who seek to minimize operational costs without compromising performance or requiring complex code changes. It simplifies cost management and ensures requests are always handled by the most budget-friendly option.