Chat widgets that go silent during outages lose trust

Frontier model APIs go down more often than their status pages admit. A 500 here, a 30-second timeout there, a quiet rate limit on a Friday afternoon. Most chatbots treat any failure as fatal and either show a generic 'something went wrong' or spin forever. Visitors close the tab and the chance to convert is gone, with no log of what they were going to ask.

SleekAI lets you configure a fallback model per bot. Set GPT-4o as primary and Claude 3.5 Sonnet as fallback, or Gemini 1.5 Pro as primary and an OpenRouter mix as fallback. When the primary call returns an HTTP 5xx error, hits a rate limit, or exceeds your timeout, SleekAI retries the same request against the fallback. The conversation state, the resolved variables, and the user message all carry over. The visitor sees a reply that is slower than usual (the retry typically adds 800-1500 ms on top of however long the primary took to fail), with no error to dismiss.

Generic chatbots are wired to a single provider and a single API key. When that provider blips, the whole widget is dead. Self-hosted retry logic is possible in theory but the code to do it correctly across different SDKs and error formats is more work than most teams want to maintain. SleekAI handles the cross-provider translation so the fallback just works.

Workflow

How fallback routing works in practice

1. Configure primary and fallback

In the bot's model settings, pick a primary and a fallback. Both can be from different providers. SleekAI stores both API keys and tracks which one handled each turn. The fallback inherits the same temperature, system prompt, and conversation state automatically.

2. Send the primary request

Every chat turn first goes to the primary model. SleekAI watches for HTTP errors, rate limits, and timeouts. If the response comes back clean, it gets delivered and logged with the primary model name. The fallback never triggers and costs nothing.

3. Retry on failure

If the primary returns 5xx, times out, or is rate-limited, SleekAI translates the same request into the fallback provider's format and retries. The fallback sees the full context, generates a reply, and SleekAI delivers it. The user-facing chat continues without interruption.

4. Log which model handled it

Each conversation row records the model that produced the reply. Filter the log by model to see fallback firing patterns. A spike in fallback usage indicates a primary outage, which is useful even when the fallback masked the impact from end users.
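The four steps above can be sketched in a few lines of Python. This is an illustration only, not SleekAI's internal code: `BotConfig`, `TurnLog`, `handle_turn`, `RetryableProviderError`, and the `call_model` callback are all hypothetical names, and the sketch assumes the provider call raises `RetryableProviderError` for a 5xx, a timeout, or a rate limit.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


class RetryableProviderError(Exception):
    """Hypothetical: raised by call_model on a 5xx, timeout, or rate limit."""


@dataclass
class BotConfig:
    # Step 1: one primary and one fallback per bot (field names are assumed).
    primary: str
    fallback: str
    system_prompt: str = ""
    temperature: float = 0.7
    fallback_enabled: bool = True


@dataclass
class TurnLog:
    rows: List[Dict[str, str]] = field(default_factory=list)

    def record(self, model: str, reply: str) -> None:
        # Step 4: store which model produced each reply.
        self.rows.append({"model": model, "reply": reply})

    def share_handled_by(self, model: str) -> float:
        # Fraction of logged turns answered by the given model.
        if not self.rows:
            return 0.0
        return sum(row["model"] == model for row in self.rows) / len(self.rows)


ModelCall = Callable[[str, dict], str]


def handle_turn(config: BotConfig, history: List[Dict[str, str]],
                call_model: ModelCall, log: TurnLog) -> str:
    # The same request (prompt, temperature, history) is reused for both attempts.
    request = {
        "system": config.system_prompt,
        "temperature": config.temperature,
        "messages": list(history),
    }
    model = config.primary
    try:
        # Step 2: every turn goes to the primary first.
        reply = call_model(model, request)
    except RetryableProviderError:
        if not config.fallback_enabled:
            raise
        # Step 3: retry the identical request against the fallback.
        model = config.fallback
        reply = call_model(model, request)
    log.record(model, reply)
    return reply
```

Calling `log.share_handled_by(config.fallback)` over a recent window gives the fallback share. A sudden rise in that number is the kind of spike that points to a primary outage.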

A typical fallback-in-action chat

The primary model rate-limits mid-conversation and the fallback finishes the answer without the user noticing.

Comparison

Generic chatbot vs SleekAI for LLM fallback

Generic chatbot

Wired to a single provider with no retry logic
Shows an error message when the API blips
Loses conversation context on any failure
Cannot fail over from OpenAI to Anthropic or Google
Requires custom code to handle different SDK errors

SleekAI chatbot

Primary and fallback configured per bot
Cross-provider fallback: OpenAI to Anthropic, etc.
Conversation state preserved through the swap
Triggers on 5xx, timeout, or rate limit errors
Logs which model handled each turn

Features

What SleekAI gives you for chatbots with a fallback LLM

Cross-provider failover

Set OpenAI as primary and Anthropic as fallback. Or Google as primary and OpenRouter as fallback. SleekAI translates the request shape between SDKs internally, so the same prompt and conversation history work against either provider without you writing any glue code.
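To give a feel for what "translating the request shape" involves, here is a minimal sketch that converts an OpenAI-style chat payload into an Anthropic-style one. It relies on two publicly documented differences: the Anthropic Messages API takes the system prompt as a top-level `system` field rather than as a message, and it requires `max_tokens`. The function name, the default `max_tokens`, and the temperature clamp are assumptions made for illustration. This is not SleekAI's own translation layer.

```python
from typing import Any, Dict, List


def to_anthropic_payload(openai_payload: Dict[str, Any], fallback_model: str,
                         default_max_tokens: int = 1024) -> Dict[str, Any]:
    """Hypothetical helper: reshape an OpenAI-style chat request for Anthropic."""
    system_parts: List[str] = []
    messages: List[Dict[str, str]] = []
    for msg in openai_payload["messages"]:
        if msg["role"] == "system":
            # Anthropic expects system text outside the messages list.
            system_parts.append(msg["content"])
        else:
            messages.append({"role": msg["role"], "content": msg["content"]})

    payload: Dict[str, Any] = {
        "model": fallback_model,
        # max_tokens is optional for OpenAI but required for Anthropic.
        "max_tokens": openai_payload.get("max_tokens", default_max_tokens),
        "messages": messages,
    }
    if system_parts:
        payload["system"] = "\n\n".join(system_parts)
    if "temperature" in openai_payload:
        # OpenAI allows 0-2, Anthropic 0-1, so clamp (an assumed policy).
        payload["temperature"] = min(openai_payload["temperature"], 1.0)
    return payload
```

A real translation layer also has to map error formats, streaming events, and tool calls, which is where most of the maintenance work sits.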

Smart trigger conditions

Fallback triggers on HTTP 5xx errors, request timeouts past your configured limit, and explicit rate-limit responses (429). It does not trigger on user-induced errors like a malformed prompt, which would just fail again on the fallback and waste tokens.
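The trigger rule described above is small enough to write down. This is a sketch under the stated conditions only; `should_fallback` and its parameters are hypothetical names, not part of SleekAI's API.

```python
from typing import Optional


def should_fallback(status: Optional[int], timed_out: bool = False) -> bool:
    """Return True when a failed primary call should be retried on the fallback."""
    if timed_out:
        # The request ran past the configured timeout.
        return True
    if status is None:
        return False
    if status == 429:
        # Explicit rate-limit response.
        return True
    # Any 5xx is a provider-side error. 400, 401, and 403 fall through to
    # False because they would fail the same way on the fallback.
    return 500 <= status <= 599
```

For example, `should_fallback(503)` and `should_fallback(429)` return `True`, while `should_fallback(400)` returns `False`.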

Single-second swap

When the primary fails, the retry against the fallback typically adds 800-1500 ms on top of however long the primary took to fail. From the visitor's perspective the reply is slower than usual but still arrives. The chat does not die, the conversation does not reset, and the experience holds up.
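The latency arithmetic is simple: time spent waiting for the primary to fail, plus the retry overhead. A small sketch, with a hypothetical function name and the 800-1500 ms range from above as its default:

```python
from typing import Tuple


def fallback_reply_window(primary_failure_s: float,
                          retry_overhead_ms: Tuple[int, int] = (800, 1500)
                          ) -> Tuple[float, float]:
    """Earliest and latest expected arrival of the fallback reply, in seconds."""
    low_ms, high_ms = retry_overhead_ms
    return (primary_failure_s + low_ms / 1000,
            primary_failure_s + high_ms / 1000)


# A primary that fails fast (say a 503 after 0.3 s) versus one that
# runs into a 10 s timeout before the fallback is tried.
fast_failure = fallback_reply_window(0.3)
slow_timeout = fallback_reply_window(10.0)
```

With a 10-second timeout the reply lands roughly 10.8 to 11.5 seconds in, so a shorter timeout on the primary is the main lever for keeping the swap quick.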

Use cases

When fallback earns the second key

OpenAI outage Tuesdays

ChatGPT and the OpenAI API have correlated incidents that sometimes last 30+ minutes. With Anthropic as fallback, the chatbot stays available through the entire window with no manual intervention from your team.

Black Friday traffic spikes

Inbound chat volume can 10x on launch days or sale events. Even with raised rate limits, the primary provider may throttle bursts. The fallback catches throttled requests and keeps the conversion path intact.

Cost-control overflow

Pair an expensive primary like GPT-4o with a cheaper fallback like Claude 3.5 Haiku or Gemini Flash. When the primary is rate-limited or its capacity is constrained, the cheaper fallback picks up the slack, so overflow traffic is served at the lower price without sacrificing availability.

The bigger picture

Why a chatbot needs a second key

Treating any AI chatbot as critical infrastructure means accepting that single-provider deployment is fragile. Frontier providers have outages, throttling, capacity issues during launches, and occasional bad deploys that turn one of their models flaky for hours. None of this is unusual.

What is unusual is how few sites have actually planned for it. Most just wait out the outage and lose whatever conversions and support conversations would have happened during that window. The cost of running with a fallback is essentially zero until it fires.

The configuration is a five-minute setting in the bot dashboard. You may already hold both API keys if you use more than one provider for cost reasons or for testing. There is no separate monitoring service, no infrastructure to maintain, no integration code to write.

The fallback model uses the same prompt and the same conversation state, so quality remains consistent. The risk of running without a fallback is asymmetric: you save nothing in normal operation and you take the full hit during outages. The risk of running with a fallback is bounded: you spend a fraction more on tokens during the rare retries and gain availability the rest of the time.

Questions

Common questions about SleekAI chatbots with a fallback LLM

How do I set up a fallback model?

In the bot settings, pick a primary model and a fallback model from the model picker. You can mix providers: GPT-4o primary, Claude 3.5 Sonnet fallback. Both API keys are stored in SleekAI. The fallback inherits the same prompt, variables, and conversation state automatically.

Which errors trigger the fallback?

HTTP 5xx errors from the provider, request timeouts past the configured limit, and explicit rate-limit responses (429). Authentication errors (401, 403) and bad-request errors (400) do not trigger fallback, since they will fail on any model and indicate a configuration issue rather than an outage.

Will visitors notice a delay when the fallback fires?

Usually not. The retry typically adds 800-1500 ms on top of however long the primary took to fail. If the primary timed out at 10 seconds, the fallback reply arrives around 11-12 seconds in. That is slower than ideal but worlds better than a dead chat widget showing 'something went wrong'.

Does the conversation context carry over to the fallback?

Yes. The conversation state, system instruction, user message, and resolved WordPress variables all carry over to the fallback request. The fallback model sees exactly what the primary saw and produces a coherent continuation, not a fresh conversation.

Can I configure more than one fallback?

The standard configuration is primary plus one fallback. For deeper cascades, see the multi-LLM fallback feature, which supports a list of models tried in order. Most sites are well served by primary plus one fallback, since correlated outages across two providers are rare.

How does the fallback affect token cost?

Token cost is whatever the fallback provider charges. If your fallback is cheaper (e.g. Claude Haiku as fallback for a GPT-4o primary), an outage actually saves money. If it is similar (Sonnet as fallback for GPT-4o), cost is roughly the same. SleekAI logs which model handled each turn so cost breakdowns stay accurate.

Can I see how often the fallback fires?

Yes. Every conversation turn records which model handled it. The admin log filters by model, so you can see how often the fallback fired in the last week. A sudden spike means the primary had a bad day, which is a useful signal even when the fallback masked the user-facing impact.

Can I turn fallback off for a specific bot?

Yes. Each bot has a toggle for fallback. It is useful during testing, when you want to confirm primary errors are surfacing correctly, or during an incident at the fallback provider, when you would rather show an error than pile retries onto a struggling provider. The toggle is per-bot, not global.

Pricing

More than 1,000 happy customers

Explore our flexible licensing options tailored to your needs. Upgrade your license anytime to access more features, or opt for a lifetime license for ongoing value, including lifetime updates and lifetime support. Our hassle-free upgrade process ensures that our platform can grow with you, starting from whichever plan you choose.

Starter

€79 per year

3 websites
1 year of updates
1 year of support

Pro

€149 per year

Unlimited websites
1 year of updates
1 year of support

Lifetime ♾️

The Bundle (unlimited sites)

Pay once, own it forever

Elevate your WordPress site with our exclusive plugin bundle that includes all of our premium plugins in one package. Enjoy lifetime updates and lifetime support. Save significantly compared to buying plugins individually.

What’s included

SleekAI
SleekByte
SleekMotion
SleekPixel
SleekRank
SleekView

€749

