# RAG and LangChain: A Complete Guide to Retrieval-Augmented Generation
Large language models (LLMs) such as GPT-4 and Claude are extraordinarily powerful, but they suffer from a fundamental limitation: their knowledge is frozen at training time. **Retrieval-Augmented Generation (RAG)** addresses exactly this problem by combining the generative power of LLMs with the ability to retrieve information from external sources.

## The Problem: LLM Limitations

1. **Static knowledge**: An LLM only knows what it saw during training.
2. **Hallucinations**: When an LLM does not know the answer, it tends to fabricate one.
3. **No access to private data**: A generic LLM has no access to your company's internal documentation.

## What Is RAG?

RAG is an architecture that enriches the prompt sent to an LLM with information retrieved from an external knowledge base.

```mermaid
graph LR
    User["User"] -- "Question" --> Retriever
    Retriever -- "Search relevant\ndocuments" --> VectorStore["Vector Store"]
    VectorStore -- "Relevant\ndocuments" --> Retriever
    Retriever -- "Context + Question" --> LLM
    LLM -- "Grounded\nresponse" --> User
```

## How RAG Works in Detail

### Phase 1: Indexing (Document Ingestion)

```mermaid
graph TD
    A["Documents\n(PDF, HTML, MD, DB)"] --> B["Document Loader"]
    B --> C["Text Splitter"]
    C --> D["Text Chunks"]
    D --> E["Embedding Model"]
    E --> F["Numerical Vectors"]
    F --> G["Vector Store\n(ChromaDB, Pinecone, FAISS)"]
```

### Phase 2: Retrieval + Generation

1. The question is turned into an embedding.
2. The vector store finds the most similar chunks.
3. The retrieved chunks are inserted into the prompt as context.
4. The LLM generates a response grounded in that context.
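The retrieval step above boils down to vector similarity. Here is a dependency-free toy sketch of Phase 2: the hand-made three-dimensional vectors and the chunk texts are invented for illustration and stand in for a real embedding model, but the cosine-similarity ranking is the same operation a vector store performs.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy "indexed" chunks: text mapped to a pretend embedding vector.
chunks = {
    "Auth uses OAuth2 bearer tokens.": [0.9, 0.1, 0.0],
    "Billing runs nightly at 02:00.":  [0.1, 0.8, 0.3],
    "Tokens expire after 60 minutes.": [0.8, 0.2, 0.1],
}

# Pretend embedding of the question "How does auth work?"
query_vec = [0.85, 0.15, 0.05]

# Rank chunks by similarity to the query; the top-k become the prompt context.
ranked = sorted(chunks, key=lambda text: cosine(chunks[text], query_vec), reverse=True)
context = "\n".join(ranked[:2])
print(context)
```

With these toy vectors, the two auth-related chunks rank above the billing one, which is exactly the behavior a real embedding model is trained to produce at scale.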
## Building a RAG Pipeline with LangChain

```bash
pip install langchain langchain-openai langchain-community chromadb rank_bm25
```

### Step 1: Loading Documents

```python
from langchain_community.document_loaders import (
    PyPDFLoader,
    WebBaseLoader,
    DirectoryLoader,
    TextLoader,
)

# Load a PDF, a web page, and a directory of Markdown files.
pdf_loader = PyPDFLoader("docs/manual.pdf")
pdf_docs = pdf_loader.load()

web_loader = WebBaseLoader("https://docs.example.com/guide")
web_docs = web_loader.load()

dir_loader = DirectoryLoader("./knowledge_base", glob="**/*.md", loader_cls=TextLoader)
md_docs = dir_loader.load()

all_docs = pdf_docs + web_docs + md_docs
```

### Step 2: Splitting Documents into Chunks

```python
from langchain.text_splitter import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,    # target chunk size, in characters
    chunk_overlap=200,  # overlap between consecutive chunks preserves context
    separators=["\n\n", "\n", ". ", " ", ""],
)

chunks = text_splitter.split_documents(all_docs)
```

### Step 3: Creating Embeddings and the Vector Store

```python
from langchain_openai import OpenAIEmbeddings
from langchain_community.vectorstores import Chroma

embedding_model = OpenAIEmbeddings(model="text-embedding-3-small")

vectorstore = Chroma.from_documents(
    documents=chunks,
    embedding=embedding_model,
    persist_directory="./chroma_db",
)
```

### Step 4: Creating the Retriever

```python
retriever = vectorstore.as_retriever(
    search_type="similarity",
    search_kwargs={"k": 4},  # return the 4 most similar chunks
)
```

### Step 5: Building the RAG Chain

```python
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough
from langchain_core.output_parsers import StrOutputParser

llm = ChatOpenAI(model="gpt-4o", temperature=0)

prompt = ChatPromptTemplate.from_template("""
Answer the question based only on the provided context.
If the context does not contain enough information, say you don't know.

Context:
{context}

Question: {question}

Answer:
""")

def format_docs(docs):
    """Join retrieved Documents into a single context string."""
    return "\n\n".join(doc.page_content for doc in docs)

rag_chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)

response = rag_chain.invoke("How does authentication work in the system?")
print(response)
```

## Advanced RAG Techniques

### Multi-Query Retrieval

```python
from langchain.retrievers import MultiQueryRetriever

# Generates several rephrasings of the user's question with the LLM
# and merges the retrieval results.
multi_retriever = MultiQueryRetriever.from_llm(
    retriever=vectorstore.as_retriever(),
    llm=llm,
)
```

### Contextual Compression

```python
from langchain.retrievers import ContextualCompressionRetriever
from langchain.retrievers.document_compressors import LLMChainExtractor

# Uses the LLM to strip irrelevant passages from each retrieved chunk.
compressor = LLMChainExtractor.from_llm(llm)
compression_retriever = ContextualCompressionRetriever(
    base_compressor=compressor,
    base_retriever=retriever,
)
```

### Hybrid Search

```python
from langchain.retrievers import EnsembleRetriever
from langchain_community.retrievers import BM25Retriever

# Keyword-based retrieval (BM25) ...
bm25_retriever = BM25Retriever.from_documents(chunks)
bm25_retriever.k = 4

# ... combined with semantic retrieval, weighted 40/60.
semantic_retriever = vectorstore.as_retriever(search_kwargs={"k": 4})

hybrid_retriever = EnsembleRetriever(
    retrievers=[bm25_retriever, semantic_retriever],
    weights=[0.4, 0.6],
)
```

## Best Practices

1. **Choose the right chunk size**: Experiment with different sizes (500-1500 tokens).
2. **Use document metadata**: Add the source, date, and category as metadata.
3. **Evaluate quality**: Use frameworks such as [RAGAS](https://docs.ragas.io/).
4. **Handle document updates**: Implement a re-ingestion pipeline.
5. **Add a re-ranker**: After initial retrieval, use a re-ranking model.
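To make the re-ranking practice concrete, here is a minimal, dependency-free sketch of the retrieve-broadly-then-rerank pattern. The candidate sentences are invented, and the naive word-overlap score is only a stand-in for a real re-ranking model (a production system would use a cross-encoder, e.g. via sentence-transformers); the point is the two-stage shape: a cheap first-pass retrieval over-fetches, then a more precise scorer keeps only the best few chunks.

```python
def overlap_score(query: str, chunk: str) -> float:
    """Fraction of query words that appear in the chunk (toy relevance score)."""
    q_words = set(query.lower().split())
    c_words = set(chunk.lower().split())
    return len(q_words & c_words) / len(q_words)

def rerank(query: str, candidates: list[str], top_n: int = 2) -> list[str]:
    """Second stage: keep only the top_n candidates by the (stand-in) score."""
    return sorted(candidates, key=lambda ch: overlap_score(query, ch), reverse=True)[:top_n]

# Imagine these came back from the vector store with a deliberately large k.
candidates = [
    "Deployment is handled by the CI pipeline.",
    "Authentication tokens are issued by the auth service.",
    "The auth service validates every request token.",
]

best = rerank("how does the auth service check a token", candidates)
```

Only `best` would then be pasted into the prompt as context, which keeps the context window small while improving precision.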
## Conclusion

RAG has become the standard architecture for building AI applications that need access to specific, up-to-date knowledge. LangChain simplifies the implementation significantly.

**Next steps:**
- **Experiment locally**: Start with ChromaDB and a few documents.
- **Explore LangSmith**: Use [LangSmith](https://smith.langchain.com/) for monitoring.
- **Try different embedding models**: Compare models such as `text-embedding-3-small` and `text-embedding-3-large`.
- **Read the docs**: The [LangChain documentation](https://python.langchain.com/docs/) is excellent.