Architecture
Documents → Chunker → Embedder → Pinecone
↓
User Query → Embedder → Pinecone Search → Reranker → LLM → Answer
Chunking Strategy
from langchain.text_splitter import RecursiveCharacterTextSplitter
splitter = RecursiveCharacterTextSplitter(
chunk_size=512,
chunk_overlap=64,
separators=['\n\n', '\n', '. ', ' '],
)
Hybrid Search (BM25 + Dense)
BM25 for keyword match + cosine similarity for semantic — combine with RRF:
rrf_score = sum(1/(k + rank_i) for rank_i in [bm25_rank, dense_rank])
Hallucination Mitigation
- Cite source chunks in the answer
- Set temperature=0 for factual queries
- Check if answer tokens overlap with context (RAGAS faithfulness metric)