A chatbot connected to an unvetted knowledge base fabricates answers 52% of the time. The same chatbot, given curated content, drops that to near zero. Same model. Same retrieval pipeline. Different data.

These figures come from a 2026 IEEE study out of the University of Illinois Chicago that tested what happens when you feed a RAG system ungoverned documentation: a 52% fabrication rate on unvetted content, near zero on curated, certified content.

The difference was entirely in what the chatbot was reading.

The Model Isn't the Problem

Most business owners I talk to think chatbot quality is a model problem. Pick the wrong LLM and you get bad answers. Switch to GPT-5 or Claude 4 or whatever ships next week and the answers get better.

That's not how modern chatbots work.

A RAG (Retrieval-Augmented Generation) chatbot doesn't answer from its training data alone. It searches your knowledge base, pulls the most relevant passages, and generates an answer from them. The LLM is a skilled reader: it summarises what it finds. If the source is wrong, the answer is wrong. Confidently wrong.

The model has no way to know the document it retrieved was written two years ago. It cannot verify whether the button it describes still exists. It reads what's there and generates accordingly.

Prompt engineering cannot fix this. A better model cannot fix this. You can tune tone, length, and format through prompts. You cannot prompt your way to accurate answers from bad source documents. The accuracy ceiling is set by the knowledge base, not the model.
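
To make the retrieve-then-generate flow concrete, here is a minimal sketch. It assumes a toy two-document knowledge base and naive word-overlap retrieval in place of a real vector search; the document contents and the `retrieve` and `build_prompt` helpers are hypothetical. The point is that whatever gets retrieved, stale or not, becomes the context the model reads.

```python
import re

# Hypothetical knowledge base. Assume the first entry describes a button
# that was removed in a later redesign but never archived.
KNOWLEDGE_BASE = [
    {"id": "export-2023", "text": "To export a report, click the Export button in the top toolbar."},
    {"id": "billing", "text": "Invoices are emailed on the first day of each month."},
]

def tokenize(text: str) -> set[str]:
    """Lowercase the text and return its set of words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def retrieve(question: str, docs: list[dict]) -> dict:
    """Return the document sharing the most words with the question."""
    q = tokenize(question)
    return max(docs, key=lambda d: len(q & tokenize(d["text"])))

def build_prompt(question: str, doc: dict) -> str:
    """Assemble what the LLM would receive: retrieved context plus question."""
    return (
        "Answer using only the context below.\n"
        f"Context: {doc['text']}\n"
        f"Question: {question}"
    )

question = "How do I export a report?"
doc = retrieve(question, KNOWLEDGE_BASE)
prompt = build_prompt(question, doc)
print(prompt)
```

Nothing in the prompt tells the model that the retrieved text is out of date, so no instruction added to it can recover the correct answer.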

The Four Failure Modes

The same IEEE study identified four dimensions that determine whether a knowledge base produces trustworthy answers or confident fabrications.

Accuracy. Is the source factually reliable? When your knowledge base contains outdated pricing, deprecated workflows, or instructions for features that were redesigned last quarter, the chatbot amplifies those errors at scale. A human agent might know to ignore the old FAQ page. The chatbot doesn't.

Freshness. Is the knowledge current? Documentation decays silently. A product ships a UI change. The corresponding help article stays in the knowledge base. The chatbot keeps retrieving and generating instructions for an interface that no longer exists. You don't notice until customers start complaining.

Completeness. Does the context cover what the LLM needs? A 2,000-word article covering five features gets broken into chunks by the retrieval system. Each chunk is a separate search unit. When those chunks contain mixed topics, the chatbot retrieves a fragment that's only partially relevant and fills in the gaps with whatever the model guesses.
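
A small sketch of how chunking can mix topics, assuming a fixed-size word splitter (real retrieval systems use a range of chunking strategies; `chunk_words` and the sample article are hypothetical):

```python
def chunk_words(text: str, size: int) -> list[str]:
    """Split text into consecutive chunks of at most `size` words."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

# A short article that covers two unrelated features.
article = (
    "Exporting reports: open the Reports tab and choose Export. "
    "Resetting passwords: open Settings and choose Reset Password."
)

chunks = chunk_words(article, size=6)
for number, chunk in enumerate(chunks):
    print(number, repr(chunk))
```

The middle chunk contains the tail of the export instructions and the head of the password instructions. A question about either feature can retrieve it, and the model has to guess at the missing half.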

Classification. Should this document be retrieved at all? Internal training docs, draft policies, and legacy content mixed into a knowledge base produce answers that cite the wrong authority. The same chunking that fragments articles also strips access context. The chatbot doesn't know a document was only intended for staff.

The study found that fixing these four dimensions through metadata enrichment alone lifted RAG precision from 73.3% to 82.5%. No changes to the retrieval algorithm. No model swap. Just governed data.
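
As an illustration of what governed metadata can look like in practice, the sketch below filters documents before they ever reach retrieval. The `audience` and `status` fields and the sample documents are assumptions made for this example, not the schema used in the study.

```python
from dataclasses import dataclass

@dataclass
class Doc:
    doc_id: str
    text: str
    audience: str  # assumed values: "public" or "staff"
    status: str    # assumed values: "published", "draft", or "archived"

def retrievable(doc: Doc) -> bool:
    """Only published, customer-facing documents may be retrieved."""
    return doc.audience == "public" and doc.status == "published"

docs = [
    Doc("refund-policy", "Refunds are issued within 14 days.", "public", "published"),
    Doc("refund-policy-draft", "Refunds are issued within 30 days.", "public", "draft"),
    Doc("agent-playbook", "Offer a goodwill credit after two complaints.", "staff", "published"),
]

eligible = [d.doc_id for d in docs if retrievable(d)]
print(eligible)
```

The draft policy and the staff playbook both contain text that would match a refund question. Without the metadata, the retriever has no basis for leaving them out.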

What You've Been Calling "Chatbot Issues" Is Content Debt

There's a term for this now: content debt. It's the gap between what your organisation knows and what its AI can reliably retrieve.

Content debt existed before you deployed the chatbot. It was invisible because humans hid it.

A customer service rep encounters a contradiction between the policy doc and the website. The rep knows to trust the policy doc. The customer never sees the conflict. A loan officer knows that the SharePoint version of the income verification form is the one to use, not the old binder under the desk. Human judgment filters the mess.

The chatbot removes that error-correction layer. It retrieves everything that matches, treats all sources as equally valid, and synthesises an answer from whatever it finds. The contradictions that humans used to buffer are now visible to every user.

The chatbot exposed a problem that was already there.

This is why 60% of AI projects are expected to be abandoned through 2026. Not because the models aren't good enough. Because the data feeding them isn't ready.

What This Means for Your Business

If you run a chatbot on your website, or you're thinking about it, here's what actually determines whether it works:

  • Your knowledge base needs an owner. Someone who decides what goes in, what gets removed, and when content is stale enough to archive. Not a committee. One person.
  • Single-topic documents outperform long articles. A focused document covering exactly one process retrieves precisely. A 3,000-word product manual retrieves fragments that confuse the model. Break it up.
  • Answer-first structure matters. Language models weight early content more heavily. Put the answer in the first 60 words. Background goes after.
  • Freshness SLAs are not optional. Every document needs a last-verified date and a review cycle. For high-velocity businesses, quarterly is too slow.
  • Archiving is more important than creating. Old content that stays in the knowledge base is an accuracy liability. Sunset it. Don't just leave it.
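
The freshness rule in the list above can be checked mechanically. This is a minimal sketch, assuming each document record carries a `last_verified` date and a `review_days` cycle; both field names, the sample records, and the fixed `today` value are hypothetical.

```python
from datetime import date, timedelta

def is_stale(last_verified: date, review_days: int, today: date) -> bool:
    """A document is stale once its review cycle has elapsed."""
    return today - last_verified > timedelta(days=review_days)

# Hypothetical records: pricing is reviewed monthly, returns quarterly.
docs = [
    {"id": "pricing", "last_verified": date(2025, 1, 10), "review_days": 30},
    {"id": "returns", "last_verified": date(2025, 3, 1), "review_days": 90},
]

today = date(2025, 3, 15)  # fixed date so the example is reproducible
stale = [d["id"] for d in docs if is_stale(d["last_verified"], d["review_days"], today)]
print(stale)
```

A report like this gives the knowledge base owner a concrete list of what to re-verify or archive, rather than waiting for customer complaints to surface the problem.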

The Opportunity

The businesses that get real value from AI chatbots are not the ones using the most sophisticated models. They are the ones that treat their knowledge base as infrastructure. Governed, maintained, and fed by a continuous improvement loop.

That level of curation takes work. But the alternative is a chatbot that sounds confident, looks professional, and gives wrong answers to everyone who trusts it.

We build chatbots at Kern. We also build the knowledge infrastructure that makes them accurate. If you want a chatbot that actually answers your customers' questions, not one that fabricates answers 52% of the time, start here.

The model is ready. The question is whether your data is.


Armin Marxer

Founder of Kern, CoolMinds, and MFTPlus. 30 years building systems that don't have off-the-shelf answers. Writes at zeroclue.dev.

Frequently Asked Questions

Why does my chatbot give wrong answers if I'm using a good AI model?

Because the answers come from your knowledge base, not the model's training data. A RAG chatbot retrieves documents and generates answers from them. If the documents are outdated, conflicting, or incomplete, the model will faithfully reproduce those problems. A better model doesn't fix bad source data.

How often should I update my chatbot's knowledge base?

Every time your business changes. New pricing, updated policies, discontinued services, or product changes all need to be reflected in the knowledge base. For most businesses, a quarterly review is a baseline. For high-velocity businesses such as ecommerce and real estate, monthly is safer. Stale knowledge is the most common cause of chatbot complaints.

What's the difference between a scripted chatbot and an AI chatbot?

A scripted chatbot follows decision trees you build manually. An AI chatbot (RAG) searches your knowledge base and generates answers from what it finds. The AI chatbot is more flexible and handles complex questions, but it's only as accurate as the documents it retrieves. The scripted chatbot is limited but predictable. Both fail if the underlying content is wrong.

How many documents does my knowledge base need?

Fewer than you think. A focused set of 30-50 well-written, single-topic documents outperforms a library of 500 poorly organised ones. Depth and clarity matter more than volume. Every additional document that isn't governed makes retrieval slightly worse, not better.