✨ New Plugin Alert ✨ SleekRank is now available with €50 launch discount
✨ New Plugin Alert ✨ SleekRank is now available with €50 launch discount
✨ New Plugin Alert ✨ SleekRank is now available with €50 launch discount
✨ New Plugin Alert ✨ SleekRank is now available with €50 launch discount
✨ New Plugin Alert ✨ SleekRank is now available with €50 launch discount
✨ New Plugin Alert ✨ SleekRank is now available with €50 launch discount
✨ New Plugin Alert ✨ SleekRank is now available with €50 launch discount
✨ New Plugin Alert ✨ SleekRank is now available with €50 launch discount
✨ New Plugin Alert ✨ SleekRank is now available with €50 launch discount
✨ New Plugin Alert ✨ SleekRank is now available with €50 launch discount

AI Chatbot with Streaming Tokens for instant replies

SleekAI uses server-sent events to push every token from your provider to the visitor as soon as the model produces it. The chat widget renders the reply as it streams, which cuts perceived latency to under a second even on long answers. Bring your own OpenAI, Anthropic, or compatible API key.

♾️ Lifetime License available

SleekAI chatbot for Streaming Tokens

Token streaming, not request-response

A non-streaming chatbot waits for the model to finish, then drops the entire answer into the widget. On a Llama 3.1 70B reply that runs three paragraphs, the visitor stares at a spinner for four to eight seconds before anything shows up. That is the difference between a chatbot that feels alive and one that feels broken, and most WordPress AI plugins still ship the spinner version.

SleekAI streams. The PHP side opens a server-sent events connection to OpenAI, Anthropic, Fireworks, or any OpenAI-compatible endpoint that supports stream: true. As tokens come back, the widget appends them character by character, so the visitor sees the first word in roughly the time the model needs to emit it (typically 200 to 500 ms on a hosted provider).

The streaming layer is invisible to the rest of the bot. Mapped variables still inject postmeta and taxonomies into the system prompt, display conditions still scope the bot to the right post types and roles, and the final completed message lands in wp_sleek_ai_chats with full token counts. Only the rendering changes, and visitors immediately feel the difference.

Workflow

Stream tokens through SleekAI in four steps

1

Pick a streaming-capable provider

OpenAI, Anthropic, Fireworks, Together, Groq, and most self-hosted vLLM or TGI deployments support stream: true. Confirm by sending one curl request with that flag and verifying you get back a stream of event chunks rather than a single JSON blob.
2

Enable streaming on the bot

Open the chatbot settings, flip the streaming toggle on, and save. SleekAI will start sending stream: true on every chat completion request and the widget will switch from spinner mode to token-by-token rendering automatically.
3

Tune the proxy layer

If WordPress sits behind Nginx, Cloudflare, or a host WAF, allow the chat endpoint to bypass response buffering. SleekAI sets X-Accel-Buffering: no and the right cache headers, but some stacks need an explicit rule for SSE routes.
4

Audit through wp_sleek_ai_chats

Every streamed reply ends up in the conversation log with prompt and completion tokens, model, and provider URL. Partial responses get flagged so you can filter for visitors who disconnected and adjust streaming UX if disconnects spike.

Try it now

Watch the streaming demo bot

This bot demonstrates how token streaming feels in the SleekAI widget. Ask anything and the reply appears word by word as the model produces it.

Comparison

Generic chatbot vs SleekAI for streaming tokens

Generic chatbot

  • Waits for the full model response, then renders it all at once
  • Visitor sees a spinner for four to eight seconds on long answers
  • No server-sent events, just plain XHR or fetch with json body
  • Cannot cancel an in-flight request when the visitor leaves
  • Token counts only logged on success, partial responses lost

SleekAI chatbot

  • Server-sent events stream tokens as the model produces them
  • First token typically lands in 200 to 500 ms on hosted providers
  • Works with any OpenAI-compatible provider that supports stream: true
  • Aborts upstream cleanly when the visitor leaves the page
  • Partial responses still logged with consumed token counts

Features

What SleekAI gives you for Streaming Tokens

SSE-based streaming

SleekAI opens a server-sent events channel from PHP to the upstream provider, then pipes the chunks straight to the chat widget. The visitor sees the reply build word by word instead of waiting for a full JSON response to land.

Sub-second first token

Perceived latency on a streaming bot is the time to first token, not total completion. On hosted providers that is typically 200 to 500 ms, which makes even a 600-word Llama 70B answer feel instant in the widget.

Cancel on disconnect

When a visitor closes the tab or navigates away, SleekAI tears down the upstream connection so you stop paying for tokens the visitor will never see. The partial response is still logged with the tokens that were consumed.

Use cases

Where streaming changes the chat experience

Long documentation answers

Docs questions often need three or four paragraphs of explanation. Streaming makes those answers feel responsive instead of broken, because the visitor starts reading the first sentence while the model is still finishing the last.

Product recommendations

When a shopper asks for the right SKU, the bot can stream a tailored recommendation built from postmeta and product taxonomies. The first line appears in under a second, which keeps the visitor engaged through the rest of the explanation.

Tutoring and learning

Educational bots need step-by-step explanations. Streaming lets the learner read step one while the model is still composing step two, which feels closer to a real tutor than a request-response transcript ever does.

The bigger picture

Why streaming is the baseline for modern chat

Visitors have been trained by ChatGPT, Claude, and Gemini to expect words to appear as the model is still thinking. When a WordPress chatbot reverts to a spinner and dumps the whole answer at once, the experience instantly feels years out of date even when the underlying model is the same. The fix is not faster inference, it is streaming.

Server-sent events let the server push tokens to the browser the moment the model emits them, which collapses perceived latency to whatever the first-token latency of the upstream provider happens to be. On hosted providers that number is usually well under a second. SleekAI ships streaming as the default, not an upsell.

Each chatbot has a toggle, the widget renders tokens as they arrive, partial responses are logged for audit, and the upstream connection is torn down cleanly when the visitor leaves the page. The result is a chatbot that feels like the tools your visitors already use, on a stack you fully own, without writing a single line of SSE plumbing yourself. That alignment with modern UX expectations matters more for engagement than almost any other implementation detail in a chatbot.

Questions

Common questions about SleekAI for Streaming Tokens

SleekAI opens a server-sent events connection from the PHP request to the upstream provider with stream: true in the request body. PHP flushes each chunk to the browser as it arrives. The chat widget consumes the EventSource on the client and appends tokens to the active assistant bubble until the [DONE] sentinel arrives.

 

Yes. Anthropic exposes streaming on the messages API with stream: true, and SleekAI handles the slightly different event shape (content_block_delta versus chat.completion.chunk) transparently. From the WordPress side the configuration is identical to a streaming OpenAI bot.

 

Some WordPress hosts run a WAF or reverse proxy that buffers responses and breaks SSE. SleekAI sends the standard headers (Content-Type: text/event-stream, Cache-Control: no-cache, X-Accel-Buffering: no) to tell Nginx and most CDNs to pass chunks through. If your stack still buffers, disable buffering on the chat endpoint route.

 

Yes. Each chatbot has a streaming toggle. Turn it off when the upstream provider does not support SSE, when you are debugging a structured output use case, or when a downstream tool expects a full JSON blob rather than a chunked response.

 

Most streaming providers send usage data in the final chunk before the [DONE] marker. SleekAI captures that and writes prompt and completion token counts into wp_sleek_ai_chats alongside the full reply text. On providers that omit usage, SleekAI estimates with the encoding for the configured model.

 

If the visitor disconnects mid-stream, SleekAI still writes a row to wp_sleek_ai_chats with whatever text was streamed so far and the tokens consumed up to that point. The row is marked as a partial response so you can filter for it when auditing failed conversations.

 

Object caching is fine, but full-page cache plugins can intercept the chat endpoint and cache the SSE response, which obviously breaks streaming. SleekAI registers the chat endpoint as non-cacheable through standard WP filters, and the docs include an exclude list snippet for WP Rocket, W3 Total Cache, and Litespeed.

 

Yes. The chat widget shows a typing indicator from the moment the request fires until the first token arrives, then switches to the streaming bubble. The indicator returns briefly during long inference pauses if the upstream stops sending tokens for more than two seconds.

 

Pricing

More than 1000+
happy customers

Explore our flexible licensing options tailored to your needs. Upgrade your license anytime to access more features, or opt for a lifetime license for ongoing value, including lifetime updates and lifetime support. Our hassle-free upgrade process ensures that our platform can grow with you, starting from whichever plan you choose.

Starter

€79

EUR

per year

  • 3 websites
  • 1 year of updates
  • 1 year of support

Pro

€149

EUR

per year

  • Unlimited websites
  • 1 year of updates
  • 1 year of support

Lifetime ♾️

Most popular

€249

EUR

once

  • Unlimited websites
  • Lifetime updates
  • Lifetime support

...or get the Bundle Deal
and save €250 🎁

The Bundle (unlimited sites)

Pay once, own it forever

Elevate your WordPress site with our exclusive plugin bundle that includes all of our premium plugins in one package. Enjoy lifetime updates and lifetime support. Save significantly compared to buying plugins individually.

What’s included

  • SleekAI

  • SleekByte

  • SleekMotion

  • SleekPixel

  • SleekRank

  • SleekView