AI Chatbot with a Self-Hosted LLM
SleekAI talks to any OpenAI-compatible endpoint, so a vLLM, TGI, or llama.cpp server on your VPC becomes the brain of the bot. You bring the model and the API key, SleekAI handles the WordPress wiring, prompts, display conditions, and logs.
♾️ Lifetime License available
Keep the model and the prompts on your own boxes
Most WordPress AI plugins force you through their hosted relay or a hard-coded list of providers. That means the prompt, the visitor question, and the answer all transit a third-party server before they reach the model you actually wanted to use. For regulated industries, internal portals, and any team that already runs vLLM, Text Generation Inference, or llama.cpp behind a load balancer, that detour is a non-starter.
SleekAI treats any OpenAI-compatible endpoint as a first-class provider. Drop the base URL of your inference server into the provider settings, paste the API key you minted on your own gateway, pick a deployed model name like llama-3.1-70b-instruct or qwen2.5-72b, and the bot starts answering. The chat widget, postmeta wiring, display conditions, and logs all behave exactly the same as they would on OpenAI or Anthropic.
Because the request hits your server directly from WordPress, no Sleek-hosted proxy sits in between. Conversations land in wp_sleek_ai_chats on the same database that owns the rest of your site, and logs record the exact provider URL and model that produced each reply. Auditors get a single source of truth and your inference bill stays on the cloud account you already have a contract with.
Workflow
Wire SleekAI to your own LLM in four steps
Stand up the endpoint
Add the provider
Create a bot
Watch the logs
Try it now
Ask the self-hosted demo bot
Comparison
Generic chatbot vs SleekAI for self-hosted LLMs
Generic chatbot
- Locks you into the vendor's hosted model list
- Routes prompts through a third-party relay you cannot audit
- No way to set a custom OpenAI-compatible base URL
- Cannot pass per-tenant headers or auth to your gateway
- Logs live on the vendor's server, not your database
SleekAI chatbot
- Custom base URL for any OpenAI-compatible endpoint
- Works with vLLM, TGI, llama.cpp, Ollama, LM Studio
- Conversations logged in wp_sleek_ai_chats on your own DB
- Bring your own API key, no Sleek-hosted proxy in the path
- Per-bot model selection so prod and staging can differ
Features
What SleekAI gives you for Self-Hosted LLM
OpenAI-compatible base URL
Paste the base URL of your vLLM, TGI, or llama.cpp deployment, add the gateway key, and SleekAI talks to it directly. The provider field accepts any HTTPS endpoint that speaks the OpenAI chat completions schema.
No vendor relay
Every chat request goes from WordPress straight to your inference server. Prompts, system instructions, and conversation history never touch a Sleek-controlled host, which keeps the audit trail short and clean.
Per-bot model overrides
Run a Llama 3.1 70B bot on docs, a smaller 8B on the marketing site, and a fine-tuned variant in staging - all pointing at different deployment names on the same self-hosted gateway from one WordPress install.
Use cases
Where teams point SleekAI at their own LLM
Regulated industries
Finance, health, and government sites that cannot send prompts to a US-hosted SaaS run inference on a self-hosted box and use SleekAI as the WordPress front door.
ML platform teams
Internal platform teams already run vLLM or TGI for other apps. SleekAI lets the WordPress marketing site reuse the same gateway instead of standing up a separate provider account.
Model evaluation
Swap base URLs to compare a fine-tuned model against a stock checkpoint on real visitor traffic. Logs make it trivial to A/B answers without rebuilding the chatbot.
The bigger picture
Why self-hosted matters for WordPress chat
Self-hosting an LLM used to mean running a research notebook and praying. In the last two years vLLM, Text Generation Inference, and LiteLLM turned that into a normal piece of infrastructure that a platform team can deploy in an afternoon. The result is that more and more companies have a perfectly good internal model serving other apps while their WordPress marketing site is still sending visitor questions to a SaaS chatbot.
That is a sourcing mismatch, a security mismatch, and a cost mismatch all at once. The model your CISO already approved is sitting idle while a separate vendor is metering your traffic. SleekAI fixes that gap by treating any OpenAI-compatible endpoint as a normal provider.
The same WordPress install that runs your blog and your pricing page can talk to the inference cluster your platform team already operates. Prompts and conversations stay inside the perimeter, billing consolidates, and the audit trail is one database table instead of three vendor dashboards. For regulated industries this is often the only path to shipping a chatbot at all.
For everyone else it is just cleaner architecture.
Questions
Common questions about SleekAI for Self-Hosted LLM
No. When you configure a custom base URL, WordPress sends the chat completion request directly to that URL. There is no Sleek-hosted relay, proxy, or telemetry hop between your visitor and your inference server. The only outbound dependency is whatever endpoint you choose to point at.
 Anything that exposes an OpenAI-compatible /v1/chat/completions endpoint. That includes vLLM, Text Generation Inference, llama.cpp server, LM Studio, LocalAI, Ollama, and most inference gateways like LiteLLM, Portkey, and Helicone in proxy mode.
 Yes. Multibot lets you create multiple chatbots in one WordPress site, each with its own provider, base URL, model name, system prompt, and display conditions. A 70B model can serve docs while an 8B handles the marketing pages.
 SleekAI sends a standard Authorization: Bearer header with the API key you saved per provider. If your gateway accepts that header, you are done. Most teams put LiteLLM or Portkey in front of their cluster and mint per-tenant keys there.
 Never. Every conversation is written to the wp_sleek_ai_chats table in your own WordPress database. You control retention, exports, and deletion through standard WP-CLI or SQL. Sleek has no telemetry pipeline that ingests your chats.
 The OpenAI Files integration is specific to OpenAI's hosted assistants. For self-hosted setups you can map a custom table or postmeta field instead, or run a self-hosted embeddings endpoint and store vectors in your own retrieval store referenced by mapped variables.
 Provider settings expose the base URL and bearer key out of the box. For extra headers like x-tenant-id, the recommended pattern is to put LiteLLM, Cloudflare Workers, or an Nginx layer in front of the model server and inject the headers there.
 SleekAI surfaces the upstream error in the chat log and shows a graceful fallback message to the visitor. You can configure a secondary provider per bot, so an outage on your primary gateway can transparently fail over to a cloud provider until the box is back.
 Pricing
More than 1000+
happy customers
Explore our flexible licensing options tailored to your needs. Upgrade your license anytime to access more features, or opt for a lifetime license for ongoing value, including lifetime updates and lifetime support. Our hassle-free upgrade process ensures that our platform can grow with you, starting from whichever plan you choose.
Lifetime ♾️
Most popular
EUR
once
- Unlimited websites
- Lifetime updates
- Lifetime support
...or get the Bundle Deal
and save €250 🎁
The Bundle (unlimited sites)
Pay once, own it forever
Elevate your WordPress site with our exclusive plugin bundle that includes all of our premium plugins in one package. Enjoy lifetime updates and lifetime support. Save significantly compared to buying plugins individually.
What’s included
-
SleekAI
-
SleekByte
-
SleekMotion
-
SleekPixel
-
SleekRank
-
SleekView
€749
Continue to checkoutBrowse more
- Savings Calculator Chatbot
- KYC Onboarding
- Bug Report
- Shipping Rate Quote Chatbot
- Cookie Preferences Chatbot
- Visitor Check-In Chatbot
- Demo Replay Chatbot
- Wait Time Chatbot
- Size Guide
- Shipping Tracker Chatbot
- affiliate program pages
- Whitepaper Delivery Chatbot
- Password Reset
- Policy Explainer Chatbot
- Onboarding Walkthrough Chatbot
- Chatbot With SMS Fallback
- Fireworks AI
- Chatbot with Streaming Responses
- Mistral
- Lightweight Chatbot
- Chatbot with Citations
- Chatbot With Role-Based Access
- Knowledge Base Search Bots
- Chatbot With Usage Quota
- SOC 2 Compliant Chatbot
- Chatbot with Source Links
- Chatbot with Conversation History
- Chatbot With RAG
- Support Ticket Deflection
- Fast Chatbot
- Hyperbaric Oxygen Clinics
- Gynecologists
- Neurologists
- Pediatric Speech Therapy
- Imaging Centers
- ABA Therapy Providers
- Executive Physical Clinics
- LASIK and refractive surgery clinics
- Vampire Facial Clinics
- Hospice Care Providers
- Telehealth Providers
- Spine Surgery Centers
- plastic surgeons
- General surgeons
- Assisted Living Facilities