Large language models (LLMs) such as GPT-4 and Claude are extraordinarily powerful, but they suffer from a fundamental limitation: their knowledge is frozen at training time. **Retrieval-Augmented Generation (RAG)** addresses exactly this problem by combining the generative power of LLMs with the ability to retrieve information from external sources.

## The Problem: LLM Limitations

1. **Static knowledge**: An LLM knows only what it saw during training.
2. **Hallucinations**: When an LLM does not know the answer, it tends to fabricate one.
3. **No access to private data**: A generic LLM cannot see your company's internal documentation.

## What Is RAG?

RAG is an architecture that enriches the prompt sent to an LLM with information retrieved from an external knowledge base.

```mermaid
graph LR
    User["User"] -- "Question" --> Retriever
    Retriever -- "Search relevant\ndocuments" --> VectorStore["Vector Store"]
    VectorStore -- "Relevant\ndocuments" --> Retriever
    Retriever -- "Context + Question" --> LLM
    LLM -- "Grounded\nresponse" --> User
```

## How RAG Works in Detail

### Phase 1: Indexing (Document Ingestion)

```mermaid
graph TD
    A["Documents\n(PDF, HTML, MD, DB)"] --> B["Document Loader"]
    B --> C["Text Splitter"]
    C --> D["Text Chunks"]
    D --> E["Embedding Model"]
    E --> F["Numerical Vectors"]
    F --> G["Vector Store\n(ChromaDB, Pinecone, FAISS)"]
```

### Phase 2: Retrieval + Generation

1. The question is converted into an embedding.
2. The vector store finds the most similar chunks.
3. The retrieved chunks are inserted into the prompt as context.
4. The LLM generates a response grounded in that context.

## Building a RAG Pipeline with LangChain

```bash
pip install langchain langchain-openai langchain-community chromadb
```

### Step 1: Loading Documents

```python
from langchain_community.document_loaders import (
    PyPDFLoader,
    WebBaseLoader,
    DirectoryLoader,
    TextLoader,
)

# Load a PDF manual
pdf_loader = PyPDFLoader("docs/manual.pdf")
pdf_docs = pdf_loader.load()

# Load a web page
web_loader = WebBaseLoader("https://docs.example.com/guide")
web_docs = web_loader.load()

# Load all markdown files from a directory
dir_loader = DirectoryLoader("./knowledge_base", glob="**/*.md", loader_cls=TextLoader)
md_docs = dir_loader.load()

all_docs = pdf_docs + web_docs + md_docs
```

### Step 2: Splitting Documents into Chunks

```python
from langchain.text_splitter import RecursiveCharacterTextSplitter

# Overlapping chunks preserve context across chunk boundaries
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,
    chunk_overlap=200,
    separators=["\n\n", "\n", ". ", " ", ""],
)

chunks = text_splitter.split_documents(all_docs)
```

### Step 3: Creating Embeddings and the Vector Store

```python
from langchain_openai import OpenAIEmbeddings
from langchain_community.vectorstores import Chroma

embedding_model = OpenAIEmbeddings(model="text-embedding-3-small")

# Embed every chunk and persist the index to disk
vectorstore = Chroma.from_documents(
    documents=chunks,
    embedding=embedding_model,
    persist_directory="./chroma_db",
)
```

### Step 4: Creating the Retriever

```python
retriever = vectorstore.as_retriever(
    search_type="similarity",
    search_kwargs={"k": 4},  # return the 4 most similar chunks
)
```

### Step 5: Building the RAG Chain

```python
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough
from langchain_core.output_parsers import StrOutputParser

llm = ChatOpenAI(model="gpt-4o", temperature=0)

prompt = ChatPromptTemplate.from_template("""
Answer the question based only on the
provided context.
If the context does not contain enough information, say you don't know.

Context:
{context}

Question: {question}

Answer:
""")

def format_docs(docs):
    # Join retrieved chunks into a single context string
    return "\n\n".join(doc.page_content for doc in docs)

rag_chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)

response = rag_chain.invoke("How does authentication work in the system?")
print(response)
```

## Advanced RAG Techniques

### Multi-Query Retrieval

```python
from langchain.retrievers import MultiQueryRetriever

# The LLM rewrites the question into several variants and merges the results
multi_retriever = MultiQueryRetriever.from_llm(
    retriever=vectorstore.as_retriever(),
    llm=llm,
)
```

### Contextual Compression

```python
from langchain.retrievers import ContextualCompressionRetriever
from langchain.retrievers.document_compressors import LLMChainExtractor

# Keep only the passages of each retrieved chunk that are relevant to the query
compressor = LLMChainExtractor.from_llm(llm)
compression_retriever = ContextualCompressionRetriever(
    base_compressor=compressor,
    base_retriever=retriever,
)
```

### Hybrid Search

```python
from langchain.retrievers import EnsembleRetriever
from langchain_community.retrievers import BM25Retriever

# Keyword-based retrieval (exact term matches)
bm25_retriever = BM25Retriever.from_documents(chunks)
bm25_retriever.k = 4

# Semantic retrieval (embedding similarity)
semantic_retriever = vectorstore.as_retriever(search_kwargs={"k": 4})

# Combine both signals with weights
hybrid_retriever = EnsembleRetriever(
    retrievers=[bm25_retriever, semantic_retriever],
    weights=[0.4, 0.6],
)
```

## Best Practices

1. **Choose the right chunk size**: Experiment with different sizes (500-1500 tokens).
2. **Use document metadata**: Add source, date, and category as metadata.
3. **Evaluate quality**: Use frameworks such as [RAGAS](https://docs.ragas.io/).
4. **Handle document updates**: Implement a re-ingestion pipeline.
5. **Add a re-ranker**: After the initial retrieval, use a re-ranking model to reorder the candidates.

## Conclusion

RAG has become the standard architecture for building AI applications that need access to specific, up-to-date knowledge, and LangChain significantly simplifies the implementation.

**Next steps:**
- **Experiment locally**: Start with ChromaDB and a handful of documents.
- **Explore LangSmith**: Use [LangSmith](https://smith.langchain.com/) for monitoring.
- **Try different embedding models**: Compare models such as `text-embedding-3-small` and `text-embedding-3-large`.
- **Read the documentation**: The [LangChain documentation](https://python.langchain.com/docs/) is excellent.
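To see the core retrieval idea with no API key or vector database at all, here is a toy, self-contained sketch of the similarity search a vector store performs. The bag-of-words "embedding" and all names (`embed`, `cosine`, `retrieve`) are illustrative stand-ins, not LangChain APIs — real pipelines use learned dense embeddings instead.

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    """Toy 'embedding': a sparse bag-of-words term-frequency vector."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k chunks most similar to the query -- the 'R' in RAG."""
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]

chunks = [
    "Authentication uses OAuth2 bearer tokens issued by the identity service.",
    "The billing module exports monthly invoices as PDF files.",
    "Tokens expire after one hour and must be refreshed via the auth endpoint.",
]

# The two token-related chunks outrank the billing chunk
for chunk in retrieve("How does authentication with tokens work?", chunks):
    print(chunk)
```

In a real pipeline the generation step would then paste these top-k chunks into the prompt as context, exactly as the chain in Step 5 does.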