Keep the model and the prompts on your own boxes

Most WordPress AI plugins force you through their hosted relay or a hard-coded list of providers. That means the prompt, the visitor question, and the answer all transit a third-party server before they reach the model you actually wanted to use. For regulated industries, internal portals, and any team that already runs vLLM, Text Generation Inference, or llama.cpp behind a load balancer, that detour is a non-starter.

SleekAI treats any OpenAI-compatible endpoint as a first-class provider. Drop the base URL of your inference server into the provider settings, paste the API key you minted on your own gateway, pick a deployed model name like llama-3.1-70b-instruct or qwen2.5-72b, and the bot starts answering. The chat widget, postmeta wiring, display conditions, and logs all behave exactly the same as they would on OpenAI or Anthropic.

Because the request hits your server directly from WordPress, no Sleek-hosted proxy sits in between. Conversations land in wp_sleek_ai_chats on the same database that owns the rest of your site, and logs record the exact provider URL and model that produced each reply. Auditors get a single source of truth and your inference bill stays on the cloud account you already have a contract with.

Workflow

Wire SleekAI to your own LLM in four steps

1

Stand up the endpoint

Deploy vLLM, TGI, llama.cpp, or LiteLLM in front of your model on a box you control. Confirm /v1/chat/completions returns a valid OpenAI-shaped response with a curl probe before touching WordPress.

2

Add the provider

In SleekAI provider settings choose OpenAI-compatible, paste the base URL like https://llm.internal.example.com/v1, and save the API key your gateway expects in the Authorization header.

3

Create a bot

Pick the new provider, set the model name your gateway routes to, write a system prompt, and choose which post types or fields flow into the context. Multibot lets each bot use a different model on the same install.

4

Watch the logs

Every reply lands in wp_sleek_ai_chats with the provider URL, model, token counts, and originating page. Filter by failed responses to catch gateway issues before customers do.

Try it now

Ask the self-hosted demo bot

This bot is wired to a hypothetical internal LLM behind a customer's VPC. Ask how SleekAI handles custom base URLs, model names, and on-prem deployments.

Comparison

Generic chatbot vs SleekAI for self-hosted LLMs

Generic chatbot

Locks you into the vendor's hosted model list
Routes prompts through a third-party relay you cannot audit
No way to set a custom OpenAI-compatible base URL
Cannot pass per-tenant headers or auth to your gateway
Logs live on the vendor's server, not your database

SleekAI chatbot

Custom base URL for any OpenAI-compatible endpoint
Works with vLLM, TGI, llama.cpp, Ollama, LM Studio
Conversations logged in wp_sleek_ai_chats on your own DB
Bring your own API key, no Sleek-hosted proxy in the path
Per-bot model selection so prod and staging can differ

Features

What SleekAI gives you for Self-Hosted LLM

OpenAI-compatible base URL

Paste the base URL of your vLLM, TGI, or llama.cpp deployment, add the gateway key, and SleekAI talks to it directly. The provider field accepts any HTTPS endpoint that speaks the OpenAI chat completions schema.

No vendor relay

Every chat request goes from WordPress straight to your inference server. Prompts, system instructions, and conversation history never touch a Sleek-controlled host, which keeps the audit trail short and clean.

Per-bot model overrides

Run a Llama 3.1 70B bot on docs, a smaller 8B on the marketing site, and a fine-tuned variant in staging - all pointing at different deployment names on the same self-hosted gateway from one WordPress install.

Use cases

Where teams point SleekAI at their own LLM

Regulated industries

Finance, health, and government sites that cannot send prompts to a US-hosted SaaS run inference on a self-hosted box and use SleekAI as the WordPress front door.

ML platform teams

Internal platform teams already run vLLM or TGI for other apps. SleekAI lets the WordPress marketing site reuse the same gateway instead of standing up a separate provider account.

Model evaluation

Swap base URLs to compare a fine-tuned model against a stock checkpoint on real visitor traffic. Logs make it trivial to A/B answers without rebuilding the chatbot.

The bigger picture

Why self-hosted matters for WordPress chat

Self-hosting an LLM used to mean running a research notebook and praying. In the last two years vLLM, Text Generation Inference, and LiteLLM turned that into a normal piece of infrastructure that a platform team can deploy in an afternoon. The result is that more and more companies have a perfectly good internal model serving other apps while their WordPress marketing site is still sending visitor questions to a SaaS chatbot.

That is a sourcing mismatch, a security mismatch, and a cost mismatch all at once. The model your CISO already approved is sitting idle while a separate vendor is metering your traffic. SleekAI fixes that gap by treating any OpenAI-compatible endpoint as a normal provider.

The same WordPress install that runs your blog and your pricing page can talk to the inference cluster your platform team already operates. Prompts and conversations stay inside the perimeter, billing consolidates, and the audit trail is one database table instead of three vendor dashboards. For regulated industries this is often the only path to shipping a chatbot at all.

For everyone else it is just cleaner architecture.

Questions

Common questions about SleekAI for Self-Hosted LLM

No. When you configure a custom base URL, WordPress sends the chat completion request directly to that URL. There is no Sleek-hosted relay, proxy, or telemetry hop between your visitor and your inference server. The only outbound dependency is whatever endpoint you choose to point at.

Anything that exposes an OpenAI-compatible /v1/chat/completions endpoint. That includes vLLM, Text Generation Inference, llama.cpp server, LM Studio, LocalAI, Ollama, and most inference gateways like LiteLLM, Portkey, and Helicone in proxy mode.

Yes. Multibot lets you create multiple chatbots in one WordPress site, each with its own provider, base URL, model name, system prompt, and display conditions. A 70B model can serve docs while an 8B handles the marketing pages.

SleekAI sends a standard Authorization: Bearer header with the API key you saved per provider. If your gateway accepts that header, you are done. Most teams put LiteLLM or Portkey in front of their cluster and mint per-tenant keys there.

Never. Every conversation is written to the wp_sleek_ai_chats table in your own WordPress database. You control retention, exports, and deletion through standard WP-CLI or SQL. Sleek has no telemetry pipeline that ingests your chats.

The OpenAI Files integration is specific to OpenAI's hosted assistants. For self-hosted setups you can map a custom table or postmeta field instead, or run a self-hosted embeddings endpoint and store vectors in your own retrieval store referenced by mapped variables.

Provider settings expose the base URL and bearer key out of the box. For extra headers like x-tenant-id, the recommended pattern is to put LiteLLM, Cloudflare Workers, or an Nginx layer in front of the model server and inject the headers there.

SleekAI surfaces the upstream error in the chat log and shows a graceful fallback message to the visitor. You can configure a secondary provider per bot, so an outage on your primary gateway can transparently fail over to a cloud provider until the box is back.

Other chatbots SleekAI builds well

AI Chatbot With SMS Fallback for WordPress

SleekAI answers questions in chat using WordPress data and sends the transcript as an SMS to the visitor or your on-call number when they...

AI Chatbot with Fireworks AI: fast open-model inference

SleekAI treats Fireworks AI like any other OpenAI-compatible provider. Paste the Fireworks base URL, drop in the API key you minted in yo...

AI Chatbot with Streaming Responses: Word by Word UX

SleekAI streams from OpenAI, Anthropic, Google, and OpenRouter directly into your WordPress widget, so visitors see the first token in un...

AI Chatbot with Mistral on WordPress

SleekAI plugs into Mistral La Plateforme via its OpenAI-compatible chat completions endpoint, so a chatbot grounded in your WordPress con...

Lightweight AI Chatbot for WordPress: Small JS, Fast Load

SleekAI's widget weighs in around 35 KB gzipped, loads from your own domain through the WordPress enqueue system, runs after first paint,...

AI Chatbot with Citations: Cite every answer with a source

SleekAI grounds replies in your actual posts, products, and custom fields, then attaches a citation with the post title and permalink so ...

Pricing

More than 1000+
happy customers

Explore our flexible licensing options tailored to your needs. Upgrade your license anytime to access more features, or opt for a lifetime license for ongoing value, including lifetime updates and lifetime support. Our hassle-free upgrade process ensures that our platform can grow with you, starting from whichever plan you choose.

Starter

€79

EUR

per year

Get started

3 websites
1 year of updates
1 year of support

Pro

€149

EUR

per year

Get started

Unlimited websites
1 year of updates
1 year of support

Lifetime ♾️

The Bundle (unlimited sites)

Pay once, own it forever

Elevate your WordPress site with our exclusive plugin bundle that includes all of our premium plugins in one package. Enjoy lifetime updates and lifetime support. Save significantly compared to buying plugins individually.

What’s included

SleekAI
SleekByte
SleekMotion
SleekPixel
SleekRank
SleekView

€749

Continue to checkout

Browse more

Plugin Integration

Content Types

Meta Ai

Industry Health

AI Chatbot with a Self-Hosted LLM

Keep the model and the prompts on your own boxes