Content provided by Keith Bourne. All podcast content, including episodes, graphics, and podcast descriptions, is uploaded and provided directly by Keith Bourne or their podcast platform partner. If you believe someone is using your copyrighted work without permission, you can follow the process described at https://hi.player.fm/legal

RAG Deep Dive: Building AI Systems That Actually Know Your Data (Chapter 1-3)

20:45
Manage episode 523808526 series 3705596

In this episode, we take a deep technical dive into Retrieval-Augmented Generation (RAG), drawing heavily from Keith Bourne's book Unlocking Data with Generative AI and RAG. We explore why RAG has become indispensable for enterprise AI systems, break down the core architecture, and share practical implementation guidance for engineers building production-grade pipelines.

What We Cover

The Problem RAG Solves

No matter how advanced LLMs become—GPT, Llama, Gemini, Claude—they fundamentally lack access to your private, proprietary, or real-time data. RAG bridges this gap by combining LLM reasoning with dynamic retrieval of relevant information.

Why RAG Is Exploding Now

  • Context windows have grown dramatically (Llama 4 Scout handles up to 10M tokens)
  • The ecosystem has matured—LangChain alone hit 70M monthly downloads in May 2025
  • Infrastructure for vector storage and retrieval is production-ready

The Three-Stage Architecture

  1. Indexing: Convert documents into vector embeddings and store in a vector database
  2. Retrieval: Embed user queries and perform similarity search to find relevant chunks
  3. Generation: Feed retrieved context into an LLM prompt to generate grounded responses
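
The three stages above can be sketched end to end in plain Python. This is a toy illustration, not the book's code: the bag-of-words "embedding", the example documents, and the query are stand-ins for a learned embedding model and a real corpus.

```python
import math
import re
from collections import Counter

def embed(text: str) -> Counter:
    # Toy "embedding": term counts over lowercase word tokens. A real
    # pipeline would call an embedding model (e.g. OpenAI Embeddings).
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse term-count vectors.
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# 1. Indexing: embed each document and store the pairs (a stand-in for
#    writing vectors to Chroma DB, Pinecone, or Weaviate).
docs = [
    "The refund policy allows returns within 30 days of purchase.",
    "Support is available by email around the clock.",
]
index = [(doc, embed(doc)) for doc in docs]

# 2. Retrieval: embed the query and rank stored chunks by similarity.
query = "What does the refund policy say about returns?"
query_vec = embed(query)
best_doc, _ = max(index, key=lambda pair: cosine(query_vec, pair[1]))

# 3. Generation: splice the retrieved context into the LLM prompt so the
#    answer is grounded in your data rather than the model's weights.
prompt = f"Answer using only this context:\n{best_doc}\n\nQuestion: {query}"
```

Swapping in a real embedding model and vector store changes the machinery but not the shape: index, retrieve, then generate with retrieved context in the prompt.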

RAG vs. Fine-Tuning

We compare trade-offs between augmenting at inference time versus modifying model weights, and discuss hybrid approaches that combine both.

Implementation Deep Dive

  • Data ingestion and preprocessing strategies
  • Chunking with RecursiveCharacterTextSplitter (1,000 tokens, 200 overlap)
  • Embedding models and vector databases (Chroma DB, Pinecone, Weaviate)
  • Pipeline orchestration with LangChain Expression Language (LCEL)
  • Source citation patterns for compliance and auditability
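
The chunking step can be illustrated with a simplified splitter. Two caveats on this sketch: LangChain's RecursiveCharacterTextSplitter prefers paragraph and sentence separators before falling back to hard cuts, and it measures size in characters unless given a token counter. The code below only shows the sliding window that the 1,000-size / 200-overlap setting configures.

```python
def chunk(text: str, size: int = 1000, overlap: int = 200) -> list[str]:
    # Sliding-window chunking: each chunk repeats the last `overlap`
    # characters of the previous one, so context isn't cut mid-thought.
    if overlap >= size:
        raise ValueError("overlap must be smaller than chunk size")
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]
```

RecursiveCharacterTextSplitter adds one more idea on top of this window: it tries to break on "\n\n", then "\n", then spaces, so chunk boundaries land on natural seams whenever possible.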

Real-World Applications

Customer support chatbots, financial advisory systems, healthcare recommendations, e-commerce personalization, and internal knowledge bases.

Open Challenges

  • "Lost in the middle" effect with long contexts
  • The multiple-needles problem (answers spread across several retrieved chunks)
  • Hallucination verification
  • Unstructured data preprocessing complexity
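
One partial mitigation for the hallucination-verification challenge is the source-citation pattern from the implementation section: tag every retrieved chunk with an identifier in the prompt so each claim can be audited back to a document. A hypothetical sketch (the function name and prompt wording are illustrative, not from the book):

```python
def build_cited_prompt(question: str, chunks: list[tuple[str, str]]) -> str:
    # chunks: (source_id, text) pairs as returned by the retriever.
    context = "\n".join(f"[{sid}] {text}" for sid, text in chunks)
    return (
        "Answer the question using only the sources below. Cite the "
        "source id in brackets after each claim.\n\n"
        f"Sources:\n{context}\n\n"
        f"Question: {question}"
    )
```

With identifiers in both the prompt and the model's answer, a compliance layer can verify that every cited id exists in the retrieved set before showing the response.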

Tools & Technologies Mentioned

  • LangChain & LlamaIndex
  • Chroma DB, Pinecone, Weaviate
  • OpenAI Embeddings
  • NumPy, Beautiful Soup
  • Meta Llama, Google Gemini, Anthropic Claude, OpenAI GPT

Book Reference

Unlocking Data with Generative AI and RAG (2nd Edition) by Keith Bourne — available on Amazon. The book includes detailed diagrams, thorough explanations, and hands-on code labs for building production RAG systems.

Find Keith Bourne on LinkedIn.

Brought to You By

Memriq — An AI content studio building practical resources for AI practitioners. Visit Memriq.ai for more engineering-focused AI content.

22 episodes
