Document-Trained AI Chatbot for WordPress (PDF and DOCX)
Drop PDFs, DOCX files, and plain text into SleekAI. The plugin extracts and chunks them, stores embeddings in your WordPress database, and the chatbot retrieves passages at chat time using your OpenAI, Anthropic, Google, or OpenRouter key.
♾️ Lifetime License available
Half of your useful knowledge sits in files, not posts
Almost every business has a document library that is more useful than its website: a product manual, a sales playbook, a benefits PDF, a research paper, a compliance booklet. These files do not have URLs, they do not get indexed by search engines, and they certainly do not get read by chatbots crawling your sitemap. Making them queryable by a chatbot used to mean uploading to a vendor's vector store with its own dashboard, its own billing, and its own data residency story to argue about with security.
SleekAI's document mode keeps the documents and the index inside WordPress. Upload a PDF or DOCX through the plugin's Documents tab and it is parsed, chunked, embedded with your configured embedding model, and stored in a SleekAI table alongside source filename, page number, and chunk position. At chat time the retriever ranks chunks against the user's question and the top ones are inserted into the system prompt with their source markers. Citations link back to the document and the specific page, so when the bot says it answered from page 14 of the benefits guide, the user can verify.
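The retrieval step described above can be sketched in a few lines. This is a minimal illustration, not SleekAI's actual code: the `Chunk` fields, the function names, and the use of cosine similarity as the ranking measure are all assumptions made for the example.

```python
import math
from dataclasses import dataclass


@dataclass
class Chunk:
    # Hypothetical record shape: one stored chunk plus its source metadata.
    filename: str
    page: int
    position: int
    text: str
    embedding: list[float]


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_chunks(query_embedding: list[float], chunks: list[Chunk], k: int = 3) -> list[Chunk]:
    """Return the k chunks most similar to the question embedding."""
    ranked = sorted(chunks, key=lambda c: cosine(query_embedding, c.embedding), reverse=True)
    return ranked[:k]


def build_context(chunks: list[Chunk]) -> str:
    """Join chunks into prompt context, each prefixed with a source marker."""
    return "\n\n".join(
        f"[source: {c.filename}, page {c.page}]\n{c.text}" for c in chunks
    )
```

The string returned by `build_context` is what would be placed in the system prompt, so the model sees each passage together with the filename and page it came from.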
The non-obvious part is metadata. SleekAI lets you tag each document with categories, restrict it by user role, and assign it to specific bots. A benefits PDF can be available only to logged-in employees through an internal bot; a public product brochure can be shared with the marketing bot. The same plugin runs both because document scope is part of the bot configuration, not a separate platform. When you replace a document with a new version, the old chunks are removed and the new ones reindexed without touching the bots that referenced the file.
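As a rough sketch of how that scoping could work, the filter below decides which documents a given bot may read for a given user. The `Document` fields and the rule that an empty role allowlist means "no role restriction" are assumptions for illustration, not the plugin's documented behavior.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Document:
    # Hypothetical metadata attached to each uploaded file.
    filename: str
    categories: frozenset
    allowed_roles: frozenset  # assumption: empty set means any visitor may read it
    bots: frozenset


def readable_documents(documents, bot_id, user_roles):
    """Keep documents assigned to this bot whose role allowlist admits the user."""
    roles = set(user_roles)
    return [
        doc
        for doc in documents
        if bot_id in doc.bots
        and (not doc.allowed_roles or bool(roles & doc.allowed_roles))
    ]


benefits = Document(
    "benefits-2026.pdf", frozenset({"hr"}), frozenset({"employee"}), frozenset({"internal"})
)
brochure = Document(
    "brochure.pdf", frozenset({"marketing"}), frozenset(), frozenset({"marketing"})
)
library = [benefits, brochure]
```

With this library, an anonymous visitor talking to the `marketing` bot can reach only the brochure, while the benefits PDF is returned only for the `internal` bot and a user holding the `employee` role.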
Workflow
How document training runs end to end
Upload and tag
Extract and chunk
Embed and store
Retrieve and answer
Comparison
Generic chatbot vs SleekAI for document training
Generic chatbot
- Documents must be uploaded to a third-party vendor's store
- No fine-grained scope per document, per bot, or per role
- Citations point to a vendor URL, not your file and page
- Replacing a document leaves stale chunks in the index
- Compliance teams have to approve another data processor
SleekAI chatbot
- PDF, DOCX, TXT, and Markdown ingestion built into the plugin
- Chunks stored in WordPress with page number and filename metadata
- Per-document role and bot scope, not just account-wide access
- Reupload replaces chunks atomically, no stale fragments left
- Citations include filename and page number, linkable to the source
Features
What SleekAI gives you for a document-trained chatbot
Native file ingestion
Drop PDFs, DOCX, TXT, and Markdown into the Documents tab. SleekAI extracts text, chunks by configurable size and overlap, and stores chunks alongside filename, page number, and section heading metadata for retrieval.
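To make "configurable size and overlap" concrete, here is a minimal word-based chunker. It is a sketch under stated assumptions: the function name, the default values, and counting in words rather than tokens or characters are all choices made for the example, since the unit SleekAI uses is not specified here.

```python
def chunk_text(text: str, size: int = 200, overlap: int = 40) -> list[tuple[int, str]]:
    """Split text into windows of `size` words, each sharing `overlap` words
    with the previous window. Returns (position, chunk_text) pairs."""
    if size <= 0:
        raise ValueError("size must be positive")
    if not 0 <= overlap < size:
        raise ValueError("overlap must be at least 0 and smaller than size")

    words = text.split()
    step = size - overlap  # how far the window advances each time
    chunks = []
    for position, start in enumerate(range(0, len(words), step)):
        chunks.append((position, " ".join(words[start:start + size])))
        if start + size >= len(words):
            break  # this window already reached the end of the text
    return chunks
```

The overlap exists so that a sentence straddling a chunk boundary still appears whole in at least one chunk; a larger overlap improves recall at the cost of more embeddings to store.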
Per-document scoping
Each document carries a category, a role allowlist, and a list of bots that can read it. A benefits PDF can stay invisible to public visitors and only feed the internal employee bot, without juggling two vector stores.
Atomic reupload
Upload a new version of the same document and SleekAI removes the previous chunks and embeddings before inserting the new ones. Citations always reflect the live version. No half-updated index, no stale facts.
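One common way to get this all-or-nothing behavior is to run the delete and the inserts inside a single database transaction. The sketch below uses Python's built-in `sqlite3` as a stand-in for the WordPress database; the `chunks` table, its columns, and the function names are hypothetical and chosen only to show the pattern.

```python
import sqlite3


def create_schema(conn: sqlite3.Connection) -> None:
    # Hypothetical table layout for the example.
    conn.execute(
        "CREATE TABLE chunks ("
        "document_slug TEXT NOT NULL, page INTEGER NOT NULL, "
        "position INTEGER NOT NULL, text TEXT NOT NULL)"
    )


def replace_document_chunks(conn, document_slug, new_chunks):
    """Swap a document's chunks in one transaction.

    new_chunks is a list of (page, position, text) tuples. If any insert
    fails, the delete is rolled back and the old chunks stay in place.
    """
    with conn:  # commits on success, rolls back if an exception is raised
        conn.execute("DELETE FROM chunks WHERE document_slug = ?", (document_slug,))
        conn.executemany(
            "INSERT INTO chunks (document_slug, page, position, text) VALUES (?, ?, ?, ?)",
            [(document_slug, page, position, text) for page, position, text in new_chunks],
        )
```

Because readers only ever see the state before or after the commit, a chat turn that runs during a reupload retrieves either the complete old version or the complete new one, never a mix.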
Use cases
Where document training earns its budget
HR and benefits questions
Employees ask the same benefits and policy questions every year. A bot trained on the benefits PDF answers them instantly with the page number for verification.
Product manual lookups
Customers ask install or troubleshooting questions answered in the 200-page manual. The bot finds the right page and quotes it instead of asking them to download and search the PDF.
Research and reports
Consultancies upload past research deliverables and let staff query them in natural language. Citations point back to the original report and page so the lineage is clear.
The bigger picture
Why document training belongs on your own server
A document is often a more honest source than a web page. It is the version that legal signed off on, the manual the engineers actually maintain, the playbook the sales team prints out before the quarterly offsite. When a chatbot answers from those documents, the answer carries the weight of the source.
When a chatbot answers from a vendor's crawl of your marketing site, the answer carries the weight of whoever last updated that page. The strongest case for document training is content that intentionally does not live on the public web: internal benefits guides, NDA-bound research, paid product manuals, and regulated compliance booklets.
Sending those to a vendor's vector store invites a long compliance review and an annual security questionnaire. Keeping them inside WordPress lets you reuse the access controls and the backup story you already have. Roles and capabilities already determine who sees which content; documents inherit the same model.
Backups already cover wp-content and the database; document chunks and embeddings are inside both. The chatbot becomes a feature of the site rather than a separate service to govern. That is the difference between a tool you can deploy in a week and one that lives in legal review for a quarter.
Questions
Common questions about SleekAI for a document-trained chatbot
PDF (text and scanned with OCR), DOCX, DOC, TXT, Markdown, and HTML are supported out of the box. CSV is supported for tabular data with optional column-aware chunking. Scanned PDFs go through OCR before chunking, which is configurable per upload to balance speed against accuracy.
 
Documents are stored in wp-content/uploads/sleek-ai/documents with WordPress's standard private-file protections. Chunks and embeddings are stored in dedicated SleekAI tables in your WordPress database. Optionally, embeddings can be pushed to an external vector store (Pinecone, Qdrant) but the default keeps everything inside WP.
There is no hard limit, but practical performance starts to degrade past a few thousand pages per document because chunking and embedding take time. SleekAI runs the ingestion job in the background via WP-Cron or an Action Scheduler queue, so large uploads do not block the admin.
 Yes. Each bot's settings include a document scope: all documents, a category, an explicit allowlist, or a tag-based filter. A common pattern is a public marketing bot scoped to brochures and a private employee bot scoped to HR and policy documents on the same site.
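The four scope types named above can be modeled as a small configuration object plus a membership check. This is an illustrative sketch only: the `Scope` and `Doc` classes, the mode strings, and the field names are assumptions, not SleekAI's settings schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Doc:
    # Hypothetical document record.
    slug: str
    category: str
    tags: frozenset = frozenset()


@dataclass(frozen=True)
class Scope:
    # mode is one of: "all", "category", "allowlist", "tag" (assumed names)
    mode: str
    values: frozenset = frozenset()


def in_scope(doc: Doc, scope: Scope) -> bool:
    """Return True if the bot's scope setting admits this document."""
    if scope.mode == "all":
        return True
    if scope.mode == "category":
        return doc.category in scope.values
    if scope.mode == "allowlist":
        return doc.slug in scope.values
    if scope.mode == "tag":
        return bool(doc.tags & scope.values)
    raise ValueError(f"unknown scope mode: {scope.mode}")


public_bot = Scope("category", frozenset({"brochures"}))
employee_bot = Scope("tag", frozenset({"hr", "policy"}))
```

Applying `in_scope` before retrieval, rather than after, keeps out-of-scope chunks from ever being ranked or placed in the prompt.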
 Upload a new file with the same identifier (filename or document slug) and SleekAI replaces the chunks and embeddings atomically. The bot starts citing the new version on the next chat turn. There is no separate reindex step to remember; the upload action triggers it.
 Yes. Every chunk carries filename, page number, and section heading metadata. The system prompt instructs the model to cite the source inline like 'source: benefits-2026.pdf, page 12' and the widget renders a Sources block under each reply. Clicking the citation downloads the file at the relevant page when the browser supports PDF page anchors.
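A sketch of how such a citation could be built from chunk metadata follows. The function names and the base URL are hypothetical; the `#page=N` fragment is the standard PDF open parameter that most browser PDF viewers honor, which is what makes the "relevant page" link possible.

```python
from urllib.parse import quote


def cite_inline(filename: str, page: int) -> str:
    """Inline citation text in the format quoted above."""
    return f"source: {filename}, page {page}"


def citation_url(base_url: str, filename: str, page: int) -> str:
    """Link to the file with a page anchor. base_url is a placeholder for
    wherever the site serves the document from."""
    return f"{base_url.rstrip('/')}/{quote(filename)}#page={page}"


def sources_block(citations: list[tuple[str, int]]) -> list[str]:
    """De-duplicate (filename, page) pairs while keeping first-seen order."""
    seen = set()
    lines = []
    for filename, page in citations:
        if (filename, page) not in seen:
            seen.add((filename, page))
            lines.append(cite_inline(filename, page))
    return lines
```

Viewers that do not support page anchors simply open the file at page one, so the link degrades gracefully.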
 The system prompt instructs the model to say it cannot find the answer in the uploaded documents rather than guessing. You can configure the fallback per bot: a generic 'contact support' message, a handoff to a contact form, or a soft retry that searches related categories before giving up.
 Yes. SleekAI uses Tesseract by default for OCR with options to swap in a different OCR backend through a filter. Per-upload settings let you choose whether to OCR the file, which language model to use, and whether to keep both the OCR text and the original scanned image bytes accessible from the document's metadata page.
 Pricing
More than 1,000
happy customers
Explore our flexible licensing options tailored to your needs. Upgrade your license anytime to access more features, or opt for a lifetime license for ongoing value, including lifetime updates and lifetime support. Our hassle-free upgrade process ensures that our platform can grow with you, starting from whichever plan you choose.
Lifetime ♾️
Most popular
EUR
once
- Unlimited websites
- Lifetime updates
- Lifetime support
...or get the Bundle Deal
and save €250 🎁
The Bundle (unlimited sites)
Pay once, own it forever
Elevate your WordPress site with our exclusive plugin bundle that includes all of our premium plugins in one package. Enjoy lifetime updates and lifetime support. Save significantly compared to buying plugins individually.
What’s included
- SleekAI
- SleekByte
- SleekMotion
- SleekPixel
- SleekRank
- SleekView
€749