Building a Production RAG System with LangChain and Pinecone
Architecture and code for a production RAG system — chunking strategies, embedding models, hybrid search, reranking, and hallucination mitigation.
Deep-dive articles on machine learning, AI engineering, and production ML systems
How I built a production WhatsApp AI agent for a Moroccan e-commerce business — architecture, conversation memory, product catalog Q&A, and order tracking.
Chain-of-thought, few-shot, system prompts, JSON mode, and 5 more patterns with real examples from production LLM applications.
Orchestrator-worker, peer-to-peer, and hierarchical multi-agent architectures — when to use each, communication patterns, and failure recovery.
Setting up Ollama for production use — model selection, API integration, performance tuning, and running Llama 3.1 on-premise for data privacy.
A practical benchmark of the top vector databases — indexing speed, query latency, filtering, scalability, and when to use each for RAG applications.
I build custom ML models, AI agents, computer vision, and automation — from idea to production.