AI Engineering

How I Built an Enterprise RAG Knowledge Intelligence Platform

A practical architecture for combining GitHub and Confluence knowledge using LangChain, Neo4j, and hybrid retrieval to reduce hallucination and improve developer productivity.

March 10, 2026 · 7 min read
RAG
LangChain
Neo4j
OpenAI
FastAPI

The problem

Engineering teams lose substantial time hunting for information scattered across code repositories, documentation systems, and tribal-knowledge channels. In practice, the question is rarely 'where is the file?' but rather 'what was the reasoning behind this module, and who approved it?'.

Traditional keyword search misses intent, while pure vector search can return contextually similar but operationally irrelevant results. That mismatch becomes expensive when the answer needs to be trusted enough for a production decision.

The platform needed to answer technical questions across GitHub and Confluence without forcing developers to bounce between tools or read every document manually.

Architecture highlights

I designed a hybrid RAG stack with BM25 sparse retrieval plus MMR-based dense retrieval to maximize both precision and coverage. BM25 handles exact terminology well, while MMR helps avoid returning five nearly identical chunks that all say the same thing.
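To make the idea concrete, here is a minimal sketch of how such a hybrid retriever can be wired with LangChain. The documents, weights, and MMR parameters are illustrative placeholders, not the production configuration.

```python
# Sketch of a hybrid retriever: BM25 for exact terminology, MMR-based dense
# retrieval for diverse semantic matches. Corpus contents and weights are
# illustrative only.
from langchain_community.retrievers import BM25Retriever
from langchain_community.vectorstores import FAISS
from langchain_core.documents import Document
from langchain_openai import OpenAIEmbeddings
from langchain.retrievers import EnsembleRetriever

docs = [
    Document(page_content="Auth module summary: validates JWTs at the gateway ..."),
    Document(page_content="Confluence page: decision record for the payments split ..."),
]

# Sparse side: classic BM25 over the same corpus, good at exact identifiers.
bm25 = BM25Retriever.from_documents(docs)
bm25.k = 5

# Dense side: vector store retriever configured for MMR so near-duplicate
# chunks are penalized in favor of diverse context.
vectorstore = FAISS.from_documents(docs, OpenAIEmbeddings())
dense = vectorstore.as_retriever(
    search_type="mmr",
    search_kwargs={"k": 5, "fetch_k": 20, "lambda_mult": 0.5},
)

# Blend the two result lists; the weights are a tuning knob, not a fixed answer.
hybrid = EnsembleRetriever(retrievers=[bm25, dense], weights=[0.4, 0.6])
results = hybrid.invoke("why does the auth module re-validate tokens?")
```

The useful property is that a query containing an exact class name still hits via BM25 even when the embedding is ambiguous, while conceptual questions lean on the dense side.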

Instead of embedding raw code blindly, I introduced an LLM summarization layer for functions and modules before indexing, which reduced token overhead and improved retrieval quality. This was especially useful for large repositories where raw source files can overwhelm context windows.
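A rough sketch of that summarization step, assuming an OpenAI chat model behind LangChain; the prompt wording, model name, and helper function are illustrative:

```python
# Sketch: summarize each function/module with an LLM before indexing, so the
# vector store holds concise natural-language descriptions instead of raw code.
from langchain_core.documents import Document
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)

prompt = ChatPromptTemplate.from_messages([
    ("system", "Summarize this source code for retrieval: purpose, inputs, "
               "outputs, and notable dependencies. Be concise."),
    ("human", "{code}"),
])
summarize = prompt | llm

def summarize_module(path: str, source: str) -> Document:
    # The summary becomes the searchable text; the raw source stays in
    # metadata so the answer chain can still surface real code when needed.
    summary = summarize.invoke({"code": source}).content
    return Document(page_content=summary, metadata={"path": path, "source_code": source})
```

Keeping the original source in metadata rather than in the embedded text is what keeps token overhead down without losing the ability to cite actual code.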

A Neo4j knowledge graph models links between modules, authors, docs, and higher-level topics, enabling graph traversal for semantically related context rather than relying on embeddings alone.
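The traversal itself is plain Cypher. This sketch assumes node labels like Module, Doc, Person, and Topic and relationship types like DOCUMENTED_IN or AUTHORED_BY; the real schema may differ, but the shape of the query is the point.

```python
# Sketch of graph-assisted context expansion with the official neo4j driver.
# Labels and relationship types are assumptions about the schema, not the
# exact production graph.
from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

RELATED_CONTEXT = """
MATCH (m:Module {name: $module})
OPTIONAL MATCH (m)-[:DOCUMENTED_IN]->(d:Doc)
OPTIONAL MATCH (m)-[:AUTHORED_BY]->(p:Person)
OPTIONAL MATCH (m)-[:COVERS]->(t:Topic)<-[:COVERS]-(related:Module)
RETURN d.title AS doc, p.name AS author,
       collect(DISTINCT related.name) AS related_modules
"""

def related_context(module_name: str) -> list[dict]:
    # Pull the docs, authors, and sibling modules that share a topic, then
    # feed those names back into retrieval as additional context.
    with driver.session() as session:
        return [record.data() for record in session.run(RELATED_CONTEXT, module=module_name)]
```

The payoff is that a question about one module can pull in the design doc and the author's decision record even when neither is lexically or semantically close to the query text.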

The retrieval chain also included query rewriting and context compression so a long, messy question from a developer could still map to a concise, answerable retrieval request.
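A compact sketch of those two stages, reusing the `hybrid` retriever from the earlier snippet; the prompt and example question are illustrative:

```python
# Sketch: rewrite a long developer question into a focused retrieval query,
# then compress retrieved chunks so only the relevant sentences reach the
# final prompt. `hybrid` is the retriever from the earlier sketch.
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI
from langchain.retrievers import ContextualCompressionRetriever
from langchain.retrievers.document_compressors import LLMChainExtractor

llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)

# Stage 1: query rewriting.
rewrite = (
    ChatPromptTemplate.from_template(
        "Rewrite this developer question as a short, keyword-rich search query:\n{question}"
    )
    | llm
    | StrOutputParser()
)

# Stage 2: context compression. Each retrieved chunk is trimmed to the parts
# relevant to the query before it reaches the answer prompt.
compressor = LLMChainExtractor.from_llm(llm)
compressed_retriever = ContextualCompressionRetriever(
    base_compressor=compressor, base_retriever=hybrid
)

question = "we keep getting 401s after the gateway refactor, who decided to re-validate tokens twice?"
docs = compressed_retriever.invoke(rewrite.invoke({"question": question}))
```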

What went wrong

The first prototype used too much raw code in the retrieval corpus and produced noisy answers. It looked impressive in isolated tests, but real questions degraded quickly because the model had to sift through too much low-signal context.

Another early issue was trusting similarity search too much. Two snippets can be semantically similar while one is an implementation detail and the other is the decision-maker's note. That distinction mattered in almost every technical query.

Trade-offs and outcome

The trade-off was complexity: adding summarization and graph traversal made the pipeline more sophisticated, but it also gave us control over retrieval quality. The extra moving parts were worth it because the system had to be dependable, not just clever.

A FastAPI backend plus a Next.js developer portal created a clean workflow for search, Q&A, and context-aware exploration. The end result was a system that reduced manual navigation across documentation and source code while giving engineers a place to ask natural-language questions.
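The API surface the portal talks to is small. This is a hedged sketch: the route name, request/response models, and the answer_question helper are hypothetical stand-ins for the full chain described above.

```python
# Sketch of the question-answering endpoint the Next.js portal calls.
# Route names, models, and the answer_question helper are illustrative.
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI(title="Knowledge Intelligence API")

class Ask(BaseModel):
    question: str

class Answer(BaseModel):
    answer: str
    sources: list[str]

def answer_question(question: str) -> Answer:
    # Hypothetical helper standing in for the full chain: query rewriting,
    # hybrid retrieval, graph expansion, then answer generation.
    return Answer(answer="(generated answer)", sources=["github://repo/auth/module.py"])

@app.post("/ask", response_model=Answer)
def ask(payload: Ask) -> Answer:
    # The portal posts a natural-language question and renders the answer
    # alongside its GitHub/Confluence sources so engineers can verify it.
    return answer_question(payload.question)
```

Returning sources with every answer was the simplest way to keep trust high: engineers can click through to the repository or Confluence page instead of taking the model's word for it.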

The main lesson: retrieval quality is not a single model problem. It is the product of corpus design, query rewriting, graph structure, prompt orchestration, and a willingness to remove bad context instead of stuffing more into the window.