It’s Time to Bury the Document

By Jarrod Davis | Date Published: July 08, 2024 - Last Updated September 26, 2024

Vectorization, vector search and LLMs are not just disrupting Knowledge Management as we know it; they may be the final nails in its coffin. I’ve previously gone into detail about how and why this is the case, so here I’d like to look toward a future in which the basic building block of KM systems is no longer the document but the “chunk” (as beautiful a name as it is).

The document is such a foundational piece of both the analog and digital world that it goes completely unquestioned. Yet the digital document simply carried an analog concept and technology over into a new medium, without considering the differences in capability, format and delivery, or the new possibilities those differences open up.

Why Documents Suck

When we learned to write essays in school, we were taught to structure them clearly into sections, arguments and evidence. The idea that knowledge has to be organized and that it relates to other pieces of information is clear. But we’re still thinking about that in terms of a piece of paper.

Consider how we consume information today: in tweets, FAQs, snippets and short posts.

What’s the Purpose of a Document?

An essay or book is a coherent, long-form piece of information meant to be read in its entirety from beginning to end. Customer support information such as a return policy, product technical specifications or a compatibility chart is not. It’s meant to be queried with a specific question that has a clear and usually short answer.

Usability Limitations

Like mold in a humid environment, documents grow quickly, and both usually stink. Because we can always keep writing more, digital documents tend to suffer in length, readability and accuracy. Long documents are difficult to navigate, particularly when you are trying to find the answer to a specific question. Yes, we have CTRL-F, but what happens when you search using different words than the document uses, even though you mean the same thing? I can’t be the only one who has nightmares about hunting for a critical answer in a massive PDF!

Documents are simply an inefficient way to store information that should be actionable and is not meant to be consumed linearly or in its entirety.

FAQs were one step in this direction, as are HTML-formatted documents, which add better navigation options.

Consider your last Google search. Google moved from simply returning full documents (web pages) to showing snippets that directly answer your question. Raise your hand if you are hoping to find a massive wall of text when you search for something. I won’t even wait for you to react, because literally nobody wants that, especially not your customers and agents.

Goodbye Documents, Hello Snippets & Chunks

So, documents suck and we’re supposed to use these so-called chunks instead. What are they exactly? A chunk is a unit of knowledge extracted from a knowledge source like a Word document, PDF, PowerPoint or similar source. Chunks are small, self-contained pieces of information that can be processed and managed effectively. They’re designed to be consumed in small portions, perhaps along with another chunk or two, but not as an entire “document”. Think of documents as a massive main course and chunks as tapas.

For instance, a chunk can represent a single paragraph, a sentence, or an even smaller unit of text from a document. Dividing content into chunks gives you finer granularity: knowledge can be analyzed more easily and quickly, so the right piece can be identified and delivered to answer customer queries more efficiently. The same goes for agent-assist scenarios, where a human agent needs instant answers instead of “Please hold (while I switch to another tab and depressingly scroll through a Great Wall of text)”.

From a technical perspective, managing knowledge in chunks improves the system's ability to match the right information with customer questions, resulting in more accurate and contextually appropriate responses. Chunks can have associated metadata such as a title, topic tag, source and more.
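To make the idea concrete, here is a minimal Python sketch, assuming plain-text input in which blank lines separate paragraphs. It splits a document into paragraph-level chunks, attaches the title and source metadata mentioned above, and ranks chunks against a customer question. The names `Chunk`, `chunk_document`, `vectorize` and `top_chunks` are hypothetical. The word-count cosine similarity is a deliberately simple stand-in for the learned embeddings and vector search a production system would use; unlike embeddings, it only matches questions that share words with a chunk, so it does not solve the CTRL-F problem of searching with different words.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass, field
from typing import List


@dataclass
class Chunk:
    """One self-contained unit of knowledge plus its metadata (hypothetical shape)."""
    text: str
    title: str
    source: str
    tags: List[str] = field(default_factory=list)


def chunk_document(doc_text: str, title: str, source: str) -> List[Chunk]:
    """Split a document into paragraph-level chunks; a blank line marks a boundary."""
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", doc_text) if p.strip()]
    return [Chunk(text=p, title=title, source=source) for p in paragraphs]


def vectorize(text: str) -> Counter:
    """Toy stand-in for an embedding: lowercase word counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_chunks(query: str, chunks: List[Chunk], k: int = 3) -> List[Chunk]:
    """Return the k chunks most similar to the query, best match first."""
    q = vectorize(query)
    return sorted(chunks, key=lambda c: cosine(q, vectorize(c.text)), reverse=True)[:k]


if __name__ == "__main__":
    policy = (
        "Returns are accepted within 30 days of delivery.\n\n"
        "Refunds are issued to the original payment method within 5 business days.\n\n"
        "Items marked final sale cannot be returned."
    )
    chunks = chunk_document(policy, title="Return Policy", source="returns.pdf")
    best = top_chunks("How many days are returns accepted for?", chunks, k=1)[0]
    # prints: Return Policy | Returns are accepted within 30 days of delivery.
    print(best.title, "|", best.text)
```

Instead of an entire return policy, the question gets back one short, answerable chunk, still tagged with the document it came from.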

A document is to a chunk as a cow is to a steak. Which one do you want to be served at a restaurant?

Time to Align the Input with the Output

Hence, the next generation of knowledge management needs to finally align the format knowledge is delivered in with the structure and format it’s created and stored in. To its credit, some KM software has already moved in this direction, but we’ve got a long way to go.

Implications & Questions for Knowledge Managers

We’ll still need knowledge managers regardless of what technology comes and goes. The most sci-fi search technology in the world will only spew garbage if the data quality is bad (whether in documents or chunks). Editors, knowledge managers and similar roles therefore remain crucial to maintaining quality, designing and implementing approval processes, following brand guidelines and ensuring accuracy. Here are just a few points worth considering:

How will chunk-based approval processes differ from document-based?

How will you manage chunks that have expiration dates or need regular review?

How will the change to chunks vs. documents affect the role of your content or KM team?

How will you balance machine-readable vs. human usable text?

What impact will graph-based information organization have compared with traditional folder hierarchies?
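Some of these questions can be answered partly in the data model. A minimal Python sketch, assuming each chunk carries its own approval status, last-review date, review interval and optional expiration date (all hypothetical fields, not taken from any particular KM product), might look like this:

```python
from dataclasses import dataclass
from datetime import date, timedelta
from enum import Enum
from typing import List, Optional


class Status(Enum):
    DRAFT = "draft"
    APPROVED = "approved"
    RETIRED = "retired"


@dataclass
class ManagedChunk:
    chunk_id: str
    text: str
    status: Status
    last_reviewed: date
    review_every_days: int = 180        # hypothetical review cadence
    expires_on: Optional[date] = None   # e.g. a time-limited offer

    def is_deliverable(self, today: date) -> bool:
        """Only approved, unexpired chunks should reach customers or agents."""
        if self.status is not Status.APPROVED:
            return False
        return self.expires_on is None or today < self.expires_on

    def review_due(self, today: date) -> bool:
        """True once the review interval has elapsed since the last review."""
        return today >= self.last_reviewed + timedelta(days=self.review_every_days)


def review_queue(chunks: List[ManagedChunk], today: date) -> List[str]:
    """IDs of approved chunks whose periodic review is due."""
    return [c.chunk_id for c in chunks if c.status is Status.APPROVED and c.review_due(today)]
```

Because approval and review live on each chunk rather than on a whole document, a single outdated paragraph can be pulled from delivery or queued for review without touching the rest of its source document.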

The next generation of knowledge management software will therefore separate management from delivery, most likely using completely different solutions for each, which may not even come from the same vendor. On the delivery side, we’ll use a combination of NLU, LLMs and vectorization. However, these delivery technologies currently lack the management features that are so critical to ensuring accuracy, quality and usability.
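One way to picture that separation is a narrow sync boundary: the management side decides what is approved, and the delivery side only receives the result. The Python sketch below assumes two hypothetical interfaces, `ManagementSystem` and `DeliveryIndex`, defined with `typing.Protocol` so that each side could in principle come from a different vendor; `InMemoryIndex` is a toy stand-in for whatever vector store a real delivery platform uses.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Protocol, Set


@dataclass(frozen=True)
class PublishedChunk:
    """The only thing that crosses the boundary: an approved chunk."""
    chunk_id: str
    text: str
    source: str


class ManagementSystem(Protocol):
    """Hypothetical management-side interface (authoring, approval, review)."""
    def approved_chunks(self) -> Iterable[PublishedChunk]: ...


class DeliveryIndex(Protocol):
    """Hypothetical delivery-side interface (search, bots, agent assist)."""
    def upsert(self, chunk: PublishedChunk) -> None: ...
    def delete(self, chunk_id: str) -> None: ...


def sync(source: ManagementSystem, index: DeliveryIndex, indexed_ids: Set[str]) -> Set[str]:
    """Push every approved chunk to delivery and remove chunks no longer approved.

    Returns the set of chunk IDs the delivery index should now contain.
    """
    current = {c.chunk_id: c for c in source.approved_chunks()}
    for chunk in current.values():
        index.upsert(chunk)
    for stale_id in indexed_ids - set(current):
        index.delete(stale_id)
    return set(current)


class InMemoryIndex:
    """Toy DeliveryIndex; a real one would also embed the text for vector search."""

    def __init__(self) -> None:
        self.store: Dict[str, PublishedChunk] = {}

    def upsert(self, chunk: PublishedChunk) -> None:
        self.store[chunk.chunk_id] = chunk

    def delete(self, chunk_id: str) -> None:
        self.store.pop(chunk_id, None)
```

Neither side needs to know how the other works internally; swapping the delivery platform only means providing another object with `upsert` and `delete`.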

Traditional KM is the equivalent of a condemned building. You may manage to keep living in it for a while, but at some point it’s going to collapse. Knowledge delivery technology has leapt so far ahead that the management side has yet to catch up, and new management solutions have yet to emerge. But the writing is on the wall: the future of KM lies in leveraging AI to deliver precise, context-aware, and personalized knowledge in real time, fundamentally changing how support information is both managed and delivered.

Jarrod Davis’s goal in life is being able to talk to multiple people in support and never having to repeat himself. When not evangelizing conversational AI, he can usually be found living in southern Germany as an American expat. He enjoys barbecuing (decently), fly fishing (poorly), and reading about history. He currently works at Cognigy in product marketing.

Topics: Customer Experience
