RFC: Query a knowledge base from Ketty
Context
Booksprints (a Ketty Community member sponsoring this feature) facilitate writing a book in 5 days. The organisations that they work with have existing content that they access and use during the writing process. It would be beneficial to access this knowledge base (KB) from within the relevant chapter and query it via prompts during the writing process.
A knowledge base may include various kinds of structured and unstructured data: text-based materials such as manuals and books, as well as images, audio, and video content.
Here's a basic example prompt and response, assuming I'm writing a company handbook for an organisation that manufactures cars:
- Prompt: How many models of cars does XYZ company make?
- Response: XYZ company makes 20 models of cars.
Beyond Booksprints' use case, any author relies on primary and secondary sources of information at some stage of writing, so this proposal applies to many other use cases as well.
Proposal
Within Ketty:
- The KB is associated at the book level.
- The book owner controls use of the KB by granting collaborators who have edit-access to the book the same rights to its associated KB.
- The book owner and collaborators with edit-access interact with the KB via the existing AI assistant writing tool (see docs here).
Design
User flows
Implementation (if applicable)
Since KB approaches are new and quickly evolving, we'll iterate fast outside of Ketty (in Coko's AI design studio), and then integrate into Ketty.
- OpenAI's Embeddings API to create vectors from documents.
- The pgvector extension to add resources to the knowledge base and retrieve them via cosine similarity search. We may need to use raw() to perform vector operations with knex.
- HNSW for indexing, to avoid querying the whole KB.
- After retrieving the text (in the case of docx or pdf files), we should split it into chunks using JS and then store them. The embeddings input limit is 8,191 tokens for models such as text-embedding-ada-002, which is roughly 30,000 characters or 10 full pages.
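The chunking and embedding steps above could be sketched as follows. This is a minimal sketch under stated assumptions, not the agreed implementation: `chunkText`, `embedChunks`, the 500-word chunk size, the 50-word overlap, and the `text-embedding-3-small` model name are illustrative choices. Word counts are only a rough proxy for the token limit the Embeddings API actually enforces.

```javascript
// Hypothetical helper: split extracted text into overlapping word-based chunks.
// maxWords and overlap are illustrative values, not tuned settings.
function chunkText(text, maxWords = 500, overlap = 50) {
  if (overlap >= maxWords) {
    throw new Error('overlap must be smaller than maxWords')
  }
  const words = text.split(/\s+/).filter(Boolean)
  const chunks = []
  const step = maxWords - overlap
  for (let start = 0; start < words.length; start += step) {
    chunks.push(words.slice(start, start + maxWords).join(' '))
    // Stop once this chunk has reached the end of the text.
    if (start + maxWords >= words.length) break
  }
  return chunks
}

// Hypothetical helper: request one embedding per chunk from OpenAI's
// embeddings endpoint. fetchImpl is injectable so the call can be stubbed.
async function embedChunks(
  chunks,
  apiKey,
  model = 'text-embedding-3-small', // assumed model, swap as needed
  fetchImpl = fetch,
) {
  const response = await fetchImpl('https://api.openai.com/v1/embeddings', {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      Authorization: `Bearer ${apiKey}`,
    },
    body: JSON.stringify({ model, input: chunks }),
  })
  if (!response.ok) {
    throw new Error(`Embeddings request failed: ${response.status}`)
  }
  const { data } = await response.json()
  // Each result carries the index of its input, so order by it before pairing.
  return [...data]
    .sort((a, b) => a.index - b.index)
    .map((item, i) => ({ text: chunks[i], embedding: item.embedding }))
}
```

The overlap keeps a sentence that straddles a chunk boundary retrievable from either side. A token-based splitter would track the API limit more precisely if that becomes a concern.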
Some useful links:
- pgvector: repo, with useful docs including integration with the Embeddings API
- HNSW: video
- Possible open-source alternatives: video
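Following the pgvector docs, the storage, indexing, and retrieval items in the implementation list might look like the sketch below. The `kb_chunks` table, its columns, the 1536-dimension vector size, and the helper names are hypothetical. Only `knex.raw`, the `vector` type, the `hnsw` index method with `vector_cosine_ops`, and the `<=>` cosine-distance operator come from knex and pgvector themselves.

```javascript
// Hypothetical schema: one row per chunk, scoped to a book.
// 1536 assumes an embedding model that returns 1536-dimensional vectors.
const SETUP_SQL = [
  'CREATE EXTENSION IF NOT EXISTS vector',
  `CREATE TABLE IF NOT EXISTS kb_chunks (
     id bigserial PRIMARY KEY,
     book_id uuid NOT NULL,
     content text NOT NULL,
     embedding vector(1536) NOT NULL
   )`,
  // HNSW index so similarity search does not scan the whole KB.
  `CREATE INDEX IF NOT EXISTS kb_chunks_embedding_idx
     ON kb_chunks USING hnsw (embedding vector_cosine_ops)`,
]

// pgvector accepts vectors as text literals such as '[0.1,0.2,0.3]'.
const toVectorLiteral = embedding => `[${embedding.join(',')}]`

// knex is an already configured knex instance connected to Postgres.
async function setupKnowledgeBase(knex) {
  for (const statement of SETUP_SQL) {
    await knex.raw(statement)
  }
}

async function addChunk(knex, bookId, content, embedding) {
  await knex.raw(
    'INSERT INTO kb_chunks (book_id, content, embedding) VALUES (?, ?, ?::vector)',
    [bookId, content, toVectorLiteral(embedding)],
  )
}

// <=> is pgvector's cosine distance, so similarity = 1 - distance.
async function queryKnowledgeBase(knex, bookId, queryEmbedding, limit = 5) {
  const vector = toVectorLiteral(queryEmbedding)
  const result = await knex.raw(
    `SELECT content, 1 - (embedding <=> ?::vector) AS similarity
       FROM kb_chunks
      WHERE book_id = ?
      ORDER BY embedding <=> ?::vector
      LIMIT ?`,
    [vector, bookId, vector, limit],
  )
  return result.rows
}
```

Ordering by the raw `<=>` expression is what lets Postgres use the HNSW index. Because that index is approximate and the `book_id` filter is applied to its candidates, a query may return fewer than `limit` rows, which is worth checking during the design-studio iterations.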
Tasks:
- enable and disable KB flow -- @victormutai
- add and delete files in KB -- @MRdevTagg
- query KB -- @MRdevTagg
Alternative approaches (if applicable)
- Set up a knowledge base via Llama Index (https://www.llamaindex.ai/) or similar.
- Interaction with the knowledge base could also happen via a chatbot, possibly in a later iteration: users ask the chatbot questions about the knowledge base, then copy and paste relevant responses into the chapter on the Producer page.
Open issues (if applicable)
[Links to and a discussion of related issues, if applicable.]