# RAG and LangChain: A Complete Guide to Retrieval-Augmented Generation
Large language models (LLMs) such as GPT-4 and Claude are extraordinarily powerful, but they suffer from a fundamental limitation: their knowledge is frozen at training time. **Retrieval-Augmented Generation (RAG)** addresses exactly this problem by combining the generative power of LLMs with the ability to retrieve information from external sources.

## The Problem: LLM Limitations

1. **Static knowledge**: An LLM only knows what it saw during training.
2. **Hallucinations**: When an LLM does not know the answer, it tends to fabricate one.
3. **No access to private data**: A generic LLM has no access to your company's internal documentation.

## What Is RAG?

RAG is an architecture that enriches the prompt sent to an LLM with information retrieved from an external knowledge base.

```mermaid
graph LR
    User["User"] -- "Question" --> Retriever
    Retriever -- "Search relevant\ndocuments" --> VectorStore["Vector Store"]
    VectorStore -- "Relevant\ndocuments" --> Retriever
    Retriever -- "Context + Question" --> LLM
    LLM -- "Grounded\nresponse" --> User
```

## How RAG Works in Detail

### Phase 1: Indexing (Document Ingestion)

```mermaid
graph TD
    A["Documents\n(PDF, HTML, MD, DB)"] --> B["Document Loader"]
    B --> C["Text Splitter"]
    C --> D["Text Chunks"]
    D --> E["Embedding Model"]
    E --> F["Numerical Vectors"]
    F --> G["Vector Store\n(ChromaDB, Pinecone, FAISS)"]
```

### Phase 2: Retrieval + Generation

1. The question is turned into an embedding.
2. The vector store finds the most similar chunks.
3. The retrieved chunks are inserted into the prompt as context.
4. The LLM generates a response grounded in that context.
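The retrieval step above boils down to vector similarity. Here is a dependency-free toy sketch of Phase 2: the hand-made three-dimensional vectors and the chunk texts are invented for illustration and stand in for a real embedding model, but the cosine-similarity ranking is the same operation a vector store performs.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy "indexed" chunks: text mapped to a pretend embedding vector.
chunks = {
    "Auth uses OAuth2 bearer tokens.": [0.9, 0.1, 0.0],
    "Billing runs nightly at 02:00.":  [0.1, 0.8, 0.3],
    "Tokens expire after 60 minutes.": [0.8, 0.2, 0.1],
}

# Pretend embedding of the question "How does auth work?"
query_vec = [0.85, 0.15, 0.05]

# Rank chunks by similarity to the query; the top-k become the prompt context.
ranked = sorted(chunks, key=lambda text: cosine(chunks[text], query_vec), reverse=True)
context = "\n".join(ranked[:2])
print(context)
```

With these toy vectors, the two auth-related chunks rank above the billing one, which is exactly the behavior a real embedding model is trained to produce at scale.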
## Building a RAG Pipeline with LangChain

```bash
pip install langchain langchain-openai langchain-community chromadb rank_bm25
```

### Step 1: Loading Documents

```python
from langchain_community.document_loaders import (
    PyPDFLoader,
    WebBaseLoader,
    DirectoryLoader,
    TextLoader,
)

# Load a PDF, a web page, and a directory of Markdown files.
pdf_loader = PyPDFLoader("docs/manual.pdf")
pdf_docs = pdf_loader.load()

web_loader = WebBaseLoader("https://docs.example.com/guide")
web_docs = web_loader.load()

dir_loader = DirectoryLoader("./knowledge_base", glob="**/*.md", loader_cls=TextLoader)
md_docs = dir_loader.load()

all_docs = pdf_docs + web_docs + md_docs
```

### Step 2: Splitting Documents into Chunks

```python
from langchain.text_splitter import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,    # target chunk size, in characters
    chunk_overlap=200,  # overlap between consecutive chunks preserves context
    separators=["\n\n", "\n", ". ", " ", ""],
)

chunks = text_splitter.split_documents(all_docs)
```

### Step 3: Creating Embeddings and the Vector Store

```python
from langchain_openai import OpenAIEmbeddings
from langchain_community.vectorstores import Chroma

embedding_model = OpenAIEmbeddings(model="text-embedding-3-small")

vectorstore = Chroma.from_documents(
    documents=chunks,
    embedding=embedding_model,
    persist_directory="./chroma_db",
)
```

### Step 4: Creating the Retriever

```python
retriever = vectorstore.as_retriever(
    search_type="similarity",
    search_kwargs={"k": 4},  # return the 4 most similar chunks
)
```

### Step 5: Building the RAG Chain

```python
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough
from langchain_core.output_parsers import StrOutputParser

llm = ChatOpenAI(model="gpt-4o", temperature=0)

prompt = ChatPromptTemplate.from_template("""
Answer the question based only on the provided context.
If the context does not contain enough information, say you don't know.

Context:
{context}

Question: {question}

Answer:
""")

def format_docs(docs):
    """Join retrieved Documents into a single context string."""
    return "\n\n".join(doc.page_content for doc in docs)

rag_chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)

response = rag_chain.invoke("How does authentication work in the system?")
print(response)
```

## Advanced RAG Techniques

### Multi-Query Retrieval

```python
from langchain.retrievers import MultiQueryRetriever

# Generates several rephrasings of the user's question with the LLM
# and merges the retrieval results.
multi_retriever = MultiQueryRetriever.from_llm(
    retriever=vectorstore.as_retriever(),
    llm=llm,
)
```

### Contextual Compression

```python
from langchain.retrievers import ContextualCompressionRetriever
from langchain.retrievers.document_compressors import LLMChainExtractor

# Uses the LLM to strip irrelevant passages from each retrieved chunk.
compressor = LLMChainExtractor.from_llm(llm)
compression_retriever = ContextualCompressionRetriever(
    base_compressor=compressor,
    base_retriever=retriever,
)
```

### Hybrid Search

```python
from langchain.retrievers import EnsembleRetriever
from langchain_community.retrievers import BM25Retriever

# Keyword-based retrieval (BM25) ...
bm25_retriever = BM25Retriever.from_documents(chunks)
bm25_retriever.k = 4

# ... combined with semantic retrieval, weighted 40/60.
semantic_retriever = vectorstore.as_retriever(search_kwargs={"k": 4})

hybrid_retriever = EnsembleRetriever(
    retrievers=[bm25_retriever, semantic_retriever],
    weights=[0.4, 0.6],
)
```

## Best Practices

1. **Choose the right chunk size**: Experiment with different sizes (500-1500 tokens).
2. **Use document metadata**: Add the source, date, and category as metadata.
3. **Evaluate quality**: Use frameworks such as [RAGAS](https://docs.ragas.io/).
4. **Handle document updates**: Implement a re-ingestion pipeline.
5. **Add a re-ranker**: After initial retrieval, use a re-ranking model.
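To make the re-ranking practice concrete, here is a minimal, dependency-free sketch of the retrieve-broadly-then-rerank pattern. The candidate sentences are invented, and the naive word-overlap score is only a stand-in for a real re-ranking model (a production system would use a cross-encoder, e.g. via sentence-transformers); the point is the two-stage shape: a cheap first-pass retrieval over-fetches, then a more precise scorer keeps only the best few chunks.

```python
def overlap_score(query: str, chunk: str) -> float:
    """Fraction of query words that appear in the chunk (toy relevance score)."""
    q_words = set(query.lower().split())
    c_words = set(chunk.lower().split())
    return len(q_words & c_words) / len(q_words)

def rerank(query: str, candidates: list[str], top_n: int = 2) -> list[str]:
    """Second stage: keep only the top_n candidates by the (stand-in) score."""
    return sorted(candidates, key=lambda ch: overlap_score(query, ch), reverse=True)[:top_n]

# Imagine these came back from the vector store with a deliberately large k.
candidates = [
    "Deployment is handled by the CI pipeline.",
    "Authentication tokens are issued by the auth service.",
    "The auth service validates every request token.",
]

best = rerank("how does the auth service check a token", candidates)
```

Only `best` would then be pasted into the prompt as context, which keeps the context window small while improving precision.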
## Conclusion

RAG has become the standard architecture for building AI applications that need access to specific, up-to-date knowledge. LangChain simplifies the implementation significantly.

**Next steps:**
- **Experiment locally**: Start with ChromaDB and a few documents.
- **Explore LangSmith**: Use [LangSmith](https://smith.langchain.com/) for monitoring.
- **Try different embedding models**: Compare models such as `text-embedding-3-small` and `text-embedding-3-large`.
- **Read the docs**: The [LangChain documentation](https://python.langchain.com/docs/) is excellent.