Bring your own fine-tuned model on Replicate

Replicate's appeal has always been the long tail. Beyond hosting popular open models like Llama 3 and Mistral, it lets you fine-tune a model on your own data and serve it behind a stable URL with a simple API token. For WordPress sites that want a chatbot tuned to their specific tone, jargon, or product names without standing up an inference cluster from scratch, that pattern is hard to beat.

Replicate exposes an OpenAI-compatible chat completions endpoint at https://api.replicate.com/v1. SleekAI's OpenAI-compatible provider drops onto it: paste the base URL, your Replicate API token in the key field, and pick the model identifier for the deployment - either a built-in like meta/meta-llama-3.1-70b-instruct or your own fine-tune under your-username/your-model-name. Streaming and function calling behave the same way they do on other providers.

Because Replicate bills by compute time rather than per token, the economics work well for bursty WordPress traffic. A docs site that does 5,000 chats one week and 50,000 the next pays for exactly the seconds of GPU time used. Conversations land in wp_sleek_ai_chats with the Replicate model name logged per reply, which makes it easy to track which fine-tune ran on which page over time. For sites that want a chatbot in the company's voice without managing GPUs, the Replicate + SleekAI combination is a sensible default.

Workflow

Wire SleekAI to Replicate in four steps

1

Train or pick a model

Either pick a public model on Replicate like meta/meta-llama-3.1-70b-instruct, or fine-tune your own on training data you provide. Once the fine-tune finishes, note the stable model identifier your-username/your-model-name.

2

Create an API token

Sign in at replicate.com, open Account Settings, and create an API token. The same token authenticates against both public and private fine-tuned model endpoints under your Replicate account.

3

Configure the provider in SleekAI

Choose OpenAI-compatible, set the base URL to https://api.replicate.com/v1, paste the API token, and save. Pick the model identifier per bot - public model or your fine-tune - in each bot configuration screen.

4

Watch logs and tune

Conversations in wp_sleek_ai_chats record the Replicate model name, token counts, and origin page. Cross-check against the Replicate usage dashboard and decide where the fine-tune earns its keep versus a cheaper public model.

Try it now

Ask the Replicate demo bot

This bot is wired to a hypothetical fine-tuned Llama on Replicate for a niche developer tools site. Ask how SleekAI handles the Replicate endpoint and fine-tunes.

Comparison

Generic chatbot vs SleekAI for Replicate

Generic chatbot

Locked to closed-weight providers, no Replicate option
Cannot run user-owned fine-tunes from WordPress
Per-token billing assumed, no compute-time pricing supported
Routes traffic through a vendor relay you cannot audit
No way to mix Replicate fine-tunes with other providers per bot

SleekAI chatbot

Native Replicate via OpenAI-compatible chat completions
Use Llama, Mistral, or your own fine-tune from Replicate
Compute-time billing, friendly to bursty WordPress traffic
Bring your own Replicate API token, no Sleek-hosted relay
Logs Replicate model name per chat for cost reconciliation

Features

What SleekAI gives you for Replicate

Your fine-tune, your bot

Fine-tune a model on Replicate using your product docs, brand voice, and FAQ archive, then point SleekAI at the resulting model string. The WordPress chatbot starts answering in the tone you tuned for, with no extra deployment work.

Wide public model library

Replicate also hosts a long list of public open-weight models. Llama 3.1 70B, Mistral, and others are available under stable namespaces, ready to use through the same OpenAI-compatible chat endpoint and the same API token.

Compute-time billing

Replicate bills by seconds of GPU time used per request. Bursty WordPress traffic - documentation launches, news spikes, seasonal sales - ends up cheaper than per-token pricing on most fine-tune setups, since you only pay for what ran.

Use cases

Where Replicate plus SleekAI fits

Brand-voice chatbots

Sites with a distinctive tone fine-tune a small model on their own posts, scripts, and FAQs on Replicate, then use SleekAI to wire that fine-tune into the WordPress chat widget.

Research and niche models

Academic sites, developer tools, and vertical SaaS deploy specialized models on Replicate - code, biomedicine, legal - and surface them as a chatbot grounded in their published content.

Bursty traffic patterns

Product launches, ticket releases, and viral content moments drive chat volume spikes. Replicate's compute-time pricing handles those bursts more linearly than fixed-tier per-token pricing.

The bigger picture

Why Replicate plus WordPress fits brand voice

Off-the-shelf frontier models are very good at the average chat task and only okay at sounding like a specific brand. A SaaS that prides itself on a punchy, jargon-light tone, a magazine that has spent twenty years polishing its house style, or a developer tools company with strong opinions about how to explain a CLI - none of these get the full benefit of a chatbot that defaults to a generic helpful-assistant register. Replicate gives those teams a path to a model that actually sounds like them.

Upload a corpus of past posts, tickets, and product copy, fine-tune a small open-weight base on Replicate's training pipeline, and the resulting model writes in the voice the brand already established. SleekAI closes the loop on the WordPress side. Drop the fine-tuned model identifier into a bot configuration, pick which post types and meta keys flow into the prompt, set display conditions per template, and the chatbot answering on the site sounds like the rest of the site.

Compute-time billing on Replicate makes the economics work for bursty traffic, conversations land in wp_sleek_ai_chats for review, and Multibot lets the team keep a public model as a fallback for topics outside the fine-tune's strength. The chatbot stops feeling like a bolt-on and starts feeling like part of the publication.

Questions

Common questions about SleekAI for Replicate

No. Replicate's classic prediction API is asynchronous and request-specific. SleekAI uses Replicate's OpenAI-compatible chat completions endpoint at https://api.replicate.com/v1, which speaks the same shape as OpenAI and works with streaming and function calling out of the box.

Once your fine-tune finishes training on Replicate, it gets a stable model identifier of the form your-username/your-model-name. Paste that into the SleekAI bot configuration as the model string, save, and the chatbot starts using the fine-tune on the next visitor message.

Sign in at replicate.com, open Account Settings, and create an API token. Paste it into SleekAI's provider settings as the API key under the OpenAI-compatible adapter with the base URL https://api.replicate.com/v1. The same token works for both public and private fine-tuned models.

meta/meta-llama-3.1-70b-instruct is a solid default for grounded long answers. Smaller Llama and Mistral variants are good for high-volume support chat. For code-heavy bots, the various coder-tuned open models hosted on Replicate work well through the same endpoint.

Replicate bills based on the GPU class your model runs on, multiplied by the wall-clock seconds the prediction took. Some models also have cold-start time, which counts toward billing. For chat use cases, keeping the model warm with steady traffic is the cheapest pattern overall.

Yes. Multibot lets each chatbot pick its own Replicate model string. A common pattern is a fine-tuned brand voice model on the marketing site and a public Llama 3.1 70B on the docs site, both authenticated with the same Replicate API token.

Conversations are written to the wp_sleek_ai_chats table in your own WordPress database, with the Replicate model name logged per reply. The chat request itself goes from WordPress to api.replicate.com and back. Sleek does not log, proxy, or ingest any of that traffic.

Function calling support depends on the underlying base model. Most modern instruct-tuned Llama and Mistral fine-tunes handle OpenAI-compatible tool calls through Replicate's chat completions endpoint. Test with a simple tool first to confirm the specific model you fine-tuned behaves as expected.

Other chatbots SleekAI builds well

SOC 2-friendly AI chatbot for enterprise WordPress sites

SleekAI keeps conversation logs in your own WordPress database, calls the model provider directly with your key, and supports audit-frien...

Knowledge Base Chatbot for WordPress Docs and Help Centers

SleekAI reads your docs, kb_article, or helpie post types directly from wp_posts and ...

AI Chatbot With Staging Environment for WordPress

SleekAI lets you stage prompt and config changes on a sandbox bot or staging site, run real conversations against the draft, and publish ...

AI Chatbot With Role-Based Access for WordPress

SleekAI ties every chatbot edit, key reveal, and log view to a WordPress capability, so editors can tweak prompts while only admins touch...

AI Chatbot for Internal Helpdesk Use Cases

SleekAI reads your runbooks, HR policies, and IT how-tos from a private WordPress install, answers staff questions with the right citatio...

AI Chatbot on a Budget: Cheapest Setup for WordPress

SleekAI is a one-time plugin purchase that runs unlimited conversations on your WordPress install; you bring an OpenAI, Anthropic, Google...

Pricing

More than 1000+
happy customers

Explore our flexible licensing options tailored to your needs. Upgrade your license anytime to access more features, or opt for a lifetime license for ongoing value, including lifetime updates and lifetime support. Our hassle-free upgrade process ensures that our platform can grow with you, starting from whichever plan you choose.

Starter

€79

EUR

per year

Get started

3 websites
1 year of updates
1 year of support

Pro

€149

EUR

per year

Get started

Unlimited websites
1 year of updates
1 year of support

Lifetime ♾️

The Bundle (unlimited sites)

Pay once, own it forever

Elevate your WordPress site with our exclusive plugin bundle that includes all of our premium plugins in one package. Enjoy lifetime updates and lifetime support. Save significantly compared to buying plugins individually.

What’s included

SleekAI
SleekByte
SleekMotion
SleekPixel
SleekRank
SleekView

€749

Continue to checkout

Browse more

Plugin Integration

Content Types

Meta Ai

Industry Health

AI Chatbot with Replicate on WordPress

Bring your own fine-tuned model on Replicate