Token streaming, not request-response

A non-streaming chatbot waits for the model to finish, then drops the entire answer into the widget. On a Llama 3.1 70B reply that runs three paragraphs, the visitor stares at a spinner for four to eight seconds before anything shows up. That is the difference between a chatbot that feels alive and one that feels broken, and most WordPress AI plugins still ship the spinner version.

SleekAI streams. The PHP side opens a server-sent events connection to OpenAI, Anthropic, Fireworks, or any OpenAI-compatible endpoint that supports stream: true. As tokens come back, the widget appends them character by character, so the visitor sees the first word in roughly the time the model needs to emit it (typically 200 to 500 ms on a hosted provider).

The streaming layer is invisible to the rest of the bot. Mapped variables still inject postmeta and taxonomies into the system prompt, display conditions still scope the bot to the right post types and roles, and the final completed message lands in wp_sleek_ai_chats with full token counts. Only the rendering changes, and visitors immediately feel the difference.

Workflow

Stream tokens through SleekAI in four steps

1

Pick a streaming-capable provider

OpenAI, Anthropic, Fireworks, Together, Groq, and most self-hosted vLLM or TGI deployments support stream: true. Confirm by sending one curl request with that flag and verifying you get back a stream of event chunks rather than a single JSON blob.

2

Enable streaming on the bot

Open the chatbot settings, flip the streaming toggle on, and save. SleekAI will start sending stream: true on every chat completion request and the widget will switch from spinner mode to token-by-token rendering automatically.

3

Tune the proxy layer

If WordPress sits behind Nginx, Cloudflare, or a host WAF, allow the chat endpoint to bypass response buffering. SleekAI sets X-Accel-Buffering: no and the right cache headers, but some stacks need an explicit rule for SSE routes.

4

Audit through wp_sleek_ai_chats

Every streamed reply ends up in the conversation log with prompt and completion tokens, model, and provider URL. Partial responses get flagged so you can filter for visitors who disconnected and adjust streaming UX if disconnects spike.

Try it now

Watch the streaming demo bot

This bot demonstrates how token streaming feels in the SleekAI widget. Ask anything and the reply appears word by word as the model produces it.

Comparison

Generic chatbot vs SleekAI for streaming tokens

Generic chatbot

Waits for the full model response, then renders it all at once
Visitor sees a spinner for four to eight seconds on long answers
No server-sent events, just plain XHR or fetch with json body
Cannot cancel an in-flight request when the visitor leaves
Token counts only logged on success, partial responses lost

SleekAI chatbot

Server-sent events stream tokens as the model produces them
First token typically lands in 200 to 500 ms on hosted providers
Works with any OpenAI-compatible provider that supports stream: true
Aborts upstream cleanly when the visitor leaves the page
Partial responses still logged with consumed token counts

Features

What SleekAI gives you for Streaming Tokens

SSE-based streaming

SleekAI opens a server-sent events channel from PHP to the upstream provider, then pipes the chunks straight to the chat widget. The visitor sees the reply build word by word instead of waiting for a full JSON response to land.

Sub-second first token

Perceived latency on a streaming bot is the time to first token, not total completion. On hosted providers that is typically 200 to 500 ms, which makes even a 600-word Llama 70B answer feel instant in the widget.

Cancel on disconnect

When a visitor closes the tab or navigates away, SleekAI tears down the upstream connection so you stop paying for tokens the visitor will never see. The partial response is still logged with the tokens that were consumed.

Use cases

Where streaming changes the chat experience

Long documentation answers

Docs questions often need three or four paragraphs of explanation. Streaming makes those answers feel responsive instead of broken, because the visitor starts reading the first sentence while the model is still finishing the last.

Product recommendations

When a shopper asks for the right SKU, the bot can stream a tailored recommendation built from postmeta and product taxonomies. The first line appears in under a second, which keeps the visitor engaged through the rest of the explanation.

Tutoring and learning

Educational bots need step-by-step explanations. Streaming lets the learner read step one while the model is still composing step two, which feels closer to a real tutor than a request-response transcript ever does.

The bigger picture

Why streaming is the baseline for modern chat

Visitors have been trained by ChatGPT, Claude, and Gemini to expect words to appear as the model is still thinking. When a WordPress chatbot reverts to a spinner and dumps the whole answer at once, the experience instantly feels years out of date even when the underlying model is the same. The fix is not faster inference, it is streaming.

Server-sent events let the server push tokens to the browser the moment the model emits them, which collapses perceived latency to whatever the first-token latency of the upstream provider happens to be. On hosted providers that number is usually well under a second. SleekAI ships streaming as the default, not an upsell.

Each chatbot has a toggle, the widget renders tokens as they arrive, partial responses are logged for audit, and the upstream connection is torn down cleanly when the visitor leaves the page. The result is a chatbot that feels like the tools your visitors already use, on a stack you fully own, without writing a single line of SSE plumbing yourself. That alignment with modern UX expectations matters more for engagement than almost any other implementation detail in a chatbot.

Questions

Common questions about SleekAI for Streaming Tokens

SleekAI opens a server-sent events connection from the PHP request to the upstream provider with stream: true in the request body. PHP flushes each chunk to the browser as it arrives. The chat widget consumes the EventSource on the client and appends tokens to the active assistant bubble until the [DONE] sentinel arrives.

Yes. Anthropic exposes streaming on the messages API with stream: true, and SleekAI handles the slightly different event shape (content_block_delta versus chat.completion.chunk) transparently. From the WordPress side the configuration is identical to a streaming OpenAI bot.

Some WordPress hosts run a WAF or reverse proxy that buffers responses and breaks SSE. SleekAI sends the standard headers (Content-Type: text/event-stream, Cache-Control: no-cache, X-Accel-Buffering: no) to tell Nginx and most CDNs to pass chunks through. If your stack still buffers, disable buffering on the chat endpoint route.

Yes. Each chatbot has a streaming toggle. Turn it off when the upstream provider does not support SSE, when you are debugging a structured output use case, or when a downstream tool expects a full JSON blob rather than a chunked response.

Most streaming providers send usage data in the final chunk before the [DONE] marker. SleekAI captures that and writes prompt and completion token counts into wp_sleek_ai_chats alongside the full reply text. On providers that omit usage, SleekAI estimates with the encoding for the configured model.

If the visitor disconnects mid-stream, SleekAI still writes a row to wp_sleek_ai_chats with whatever text was streamed so far and the tokens consumed up to that point. The row is marked as a partial response so you can filter for it when auditing failed conversations.

Object caching is fine, but full-page cache plugins can intercept the chat endpoint and cache the SSE response, which obviously breaks streaming. SleekAI registers the chat endpoint as non-cacheable through standard WP filters, and the docs include an exclude list snippet for WP Rocket, W3 Total Cache, and Litespeed.

Yes. The chat widget shows a typing indicator from the moment the request fires until the first token arrives, then switches to the streaming bubble. The indicator returns briefly during long inference pauses if the upstream stops sending tokens for more than two seconds.

Other chatbots SleekAI builds well

AI Chatbot with Analytics: See Every Conversation, Token, and Page

SleekAI logs every conversation with model name, prompt and completion tokens, response time, origin URL, and visitor session in a normal...

Fullscreen AI Chatbot for WordPress: Immersive Chat Page

SleekAI's fullscreen mode renders the chatbot as an immersive, full-page surface (think ChatGPT, not a corner bubble) backed by your Word...

AI Chatbot for Customer Success Use Cases

SleekAI reads your help docs, release notes, and user meta directly from WordPress, answers adoption questions in plain language, and rou...

AI Chatbot With Conversation Export for WordPress

SleekAI stores every conversation in your WordPress database and ships with one-click CSV or JSON export from the admin, plus a WP-CLI co...

AI Chat Summarizer for WordPress

SleekAI logs every chat with the transcript, then summarizes it into a structured record: intent, key fields, action items, and next step...

AI Chatbot for Agencies: White Label and Multi Client

SleekAI installs on each client's WordPress install, reads their posts, products, and ACF fields, and runs on the client's own OpenAI, An...

Pricing

More than 1000+
happy customers

Explore our flexible licensing options tailored to your needs. Upgrade your license anytime to access more features, or opt for a lifetime license for ongoing value, including lifetime updates and lifetime support. Our hassle-free upgrade process ensures that our platform can grow with you, starting from whichever plan you choose.

Starter

€79

EUR

per year

Get started

3 websites
1 year of updates
1 year of support

Pro

€149

EUR

per year

Get started

Unlimited websites
1 year of updates
1 year of support

Lifetime ♾️

The Bundle (unlimited sites)

Pay once, own it forever

Elevate your WordPress site with our exclusive plugin bundle that includes all of our premium plugins in one package. Enjoy lifetime updates and lifetime support. Save significantly compared to buying plugins individually.

What’s included

SleekAI
SleekByte
SleekMotion
SleekPixel
SleekRank
SleekView

€749

Continue to checkout

Browse more

Plugin Integration

Content Types

Meta Ai

Industry Health

AI Chatbot with Streaming Tokens for instant replies

Token streaming, not request-response