Large language models (LLMs) such as GPT-4 and Claude are extraordinarily powerful, but they suffer from a fundamental limitation: their knowledge is frozen at training time. Retrieval-Augmented Generation (RAG) addresses exactly this problem by combining the generative power of LLMs with the ability to retrieve information from external sources.
The problem: LLM limitations
- Static knowledge: an LLM only knows what it saw during training.
- Hallucinations: when an LLM does not know the answer, it tends to fabricate one.
- No access to private data: a generic LLM cannot see your company's internal documentation.
What is RAG?
RAG is an architecture that enriches the prompt sent to an LLM with information retrieved from an external knowledge base.
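At its core, the "augmentation" is just prompt assembly. A minimal, library-free sketch of the idea (the function name and sample strings are invented for illustration):

```python
def build_augmented_prompt(question: str, retrieved_chunks: list[str]) -> str:
    # Join the retrieved fragments into a context block and instruct
    # the model to answer only from that context.
    context = "\n\n".join(retrieved_chunks)
    return (
        "Answer the question based only on the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )

prompt = build_augmented_prompt(
    "What is the refund policy?",
    ["Refunds are issued within 14 days.", "Contact support to open a ticket."],
)
```

The model never needs to have seen these documents at training time; they travel inside the prompt.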
How RAG works in detail
Phase 1: Indexing (document ingestion)
Documents are loaded, split into fragments, converted into embeddings, and stored in a vector store.
Phase 2: Retrieval + generation
- The question is converted into an embedding.
- The vector store finds the most similar fragments.
- The retrieved fragments are inserted into the prompt as context.
- The LLM generates an answer grounded in that context.
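The retrieval step above can be sketched without any library: embeddings are just vectors, and "most similar" typically means highest cosine similarity. The toy 3-dimensional vectors below are purely illustrative:

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def top_k(query_vec, indexed_chunks, k=2):
    # indexed_chunks: list of (text, embedding) pairs
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in indexed_chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored[:k]]

chunks = [
    ("Auth uses OAuth 2.0 tokens.", [0.9, 0.1, 0.0]),
    ("The UI theme is dark blue.", [0.0, 0.2, 0.9]),
    ("Tokens expire after one hour.", [0.8, 0.3, 0.1]),
]
print(top_k([1.0, 0.0, 0.0], chunks))  # the two auth-related chunks rank first
```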
Building a RAG pipeline with LangChain

```shell
pip install langchain langchain-openai langchain-community chromadb
```

Step 1: Loading the documents
```python
from langchain_community.document_loaders import (
    PyPDFLoader,
    WebBaseLoader,
    DirectoryLoader,
    TextLoader,
)

# Load a PDF manual
pdf_loader = PyPDFLoader("docs/manual.pdf")
pdf_docs = pdf_loader.load()

# Load a web page
web_loader = WebBaseLoader("https://docs.example.com/guide")
web_docs = web_loader.load()

# Load all Markdown files from a directory
dir_loader = DirectoryLoader("./knowledge_base", glob="**/*.md", loader_cls=TextLoader)
md_docs = dir_loader.load()

all_docs = pdf_docs + web_docs + md_docs
```
Step 2: Splitting the documents into chunks

```python
from langchain.text_splitter import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,    # target characters per chunk
    chunk_overlap=200,  # shared characters between neighboring chunks
    separators=["\n\n", "\n", ". ", " ", ""],
)
chunks = text_splitter.split_documents(all_docs)
```
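The core idea behind `chunk_overlap` can be shown in a few library-free lines. This is a naive fixed-size version, not LangChain's actual recursive algorithm:

```python
def split_with_overlap(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    # Slide a fixed window; each step advances by chunk_size - chunk_overlap,
    # so consecutive chunks share chunk_overlap characters of context.
    step = chunk_size - chunk_overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

pieces = split_with_overlap("abcdefghijklmnopqrstuvwxy", chunk_size=10, chunk_overlap=5)
# pieces[0][-5:] == pieces[1][:5] — the shared overlap
```

The overlap ensures that a sentence cut at a chunk boundary still appears intact in at least one chunk.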
Step 3: Creating the embeddings and the vector store

```python
from langchain_openai import OpenAIEmbeddings
from langchain_community.vectorstores import Chroma

embedding_model = OpenAIEmbeddings(model="text-embedding-3-small")

# Embed every chunk and persist the index to disk
vectorstore = Chroma.from_documents(
    documents=chunks,
    embedding=embedding_model,
    persist_directory="./chroma_db",
)
```
Step 4: Creating the retriever

```python
retriever = vectorstore.as_retriever(
    search_type="similarity",
    search_kwargs={"k": 4},  # return the 4 most similar chunks
)
```
Step 5: Building the RAG chain

```python
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough
from langchain_core.output_parsers import StrOutputParser

llm = ChatOpenAI(model="gpt-4o", temperature=0)

prompt = ChatPromptTemplate.from_template("""
Answer the question based only on the provided context.
If the context does not contain enough information, say you don't know.

Context:
{context}

Question: {question}

Answer:
""")

def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)

rag_chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)

response = rag_chain.invoke("How does authentication work in the system?")
print(response)
```
Advanced RAG techniques
Multi-query retrieval

```python
from langchain.retrievers import MultiQueryRetriever

multi_retriever = MultiQueryRetriever.from_llm(
    retriever=vectorstore.as_retriever(),
    llm=llm,
)
```
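MultiQueryRetriever asks the LLM to rephrase the question in several ways, retrieves for each variant, and returns the deduplicated union of results. The merge step can be sketched as follows (the result lists are invented examples):

```python
def merge_unique(result_lists: list[list[str]]) -> list[str]:
    # Union of all retrieved documents, keeping first-seen order.
    seen, merged = set(), []
    for results in result_lists:
        for doc in results:
            if doc not in seen:
                seen.add(doc)
                merged.append(doc)
    return merged

merged = merge_unique([
    ["doc_a", "doc_b"],  # results for "How does login work?"
    ["doc_b", "doc_c"],  # results for "Explain the authentication flow"
])
```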
Contextual compression

```python
from langchain.retrievers import ContextualCompressionRetriever
from langchain.retrievers.document_compressors import LLMChainExtractor

compressor = LLMChainExtractor.from_llm(llm)
compression_retriever = ContextualCompressionRetriever(
    base_compressor=compressor,
    base_retriever=retriever,
)
```
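LLMChainExtractor uses the LLM itself to keep only the passages of each retrieved document that are relevant to the query. A crude keyword-overlap stand-in illustrates the shape of the operation (the real compressor is far smarter; all strings below are invented):

```python
def compress(query: str, doc: str) -> str:
    # Keep only sentences that share at least one word with the query.
    # (The real compressor asks an LLM to extract the relevant spans.)
    query_words = set(query.lower().split())
    kept = [
        sentence for sentence in doc.split(". ")
        if query_words & set(sentence.lower().split())
    ]
    return ". ".join(kept)

doc = "Tokens expire hourly. The logo is red. Tokens are signed"
result = compress("how do tokens work", doc)
```

Compression shortens the context, which cuts token costs and reduces noise in the prompt.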
Hybrid search

```python
from langchain.retrievers import EnsembleRetriever
from langchain_community.retrievers import BM25Retriever

# Keyword-based retriever (BM25) over the same chunks
bm25_retriever = BM25Retriever.from_documents(chunks)
bm25_retriever.k = 4

# Embedding-based retriever
semantic_retriever = vectorstore.as_retriever(search_kwargs={"k": 4})

# Combine the two, weighting semantic search slightly higher
hybrid_retriever = EnsembleRetriever(
    retrievers=[bm25_retriever, semantic_retriever],
    weights=[0.4, 0.6],
)
```
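To fuse the two ranked lists, EnsembleRetriever uses weighted reciprocal rank fusion. A minimal sketch of that scoring scheme (document IDs are invented; `c=60` is a commonly used smoothing constant):

```python
def weighted_rrf(ranked_lists, weights, c=60):
    # score(doc) = sum over lists of weight / (c + rank), with rank starting at 1
    scores = {}
    for results, weight in zip(ranked_lists, weights):
        for rank, doc in enumerate(results, start=1):
            scores[doc] = scores.get(doc, 0.0) + weight / (c + rank)
    return sorted(scores, key=scores.get, reverse=True)

fused = weighted_rrf(
    [["d1", "d2", "d3"], ["d2", "d4", "d1"]],  # BM25 ranking, semantic ranking
    weights=[0.4, 0.6],
)
```

Documents that appear high in both lists (like `d2`) end up ranked first, even if neither retriever alone put them at the top.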
Best practices
- Choose the right chunk size: experiment with different sizes (500-1500 tokens).
- Use document metadata: attach the source, date, and category as metadata.
- Evaluate quality: use frameworks such as RAGAS.
- Handle document updates: implement a re-ingestion pipeline.
- Add a re-ranker: after the initial retrieval, use a re-ranking model to reorder the results.
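The metadata advice above is cheap to apply at ingestion time, and it enables filtering before or after vector search. A library-free sketch with plain dicts (in LangChain you would set the `metadata` field on each `Document`; all values below are invented):

```python
chunks_with_metadata = [
    {
        "page_content": "Refunds are issued within 14 days.",
        "metadata": {"source": "policies/refunds.md", "date": "2024-05-01", "category": "policy"},
    },
    {
        "page_content": "The API rate limit is 100 req/min.",
        "metadata": {"source": "api/limits.md", "date": "2024-06-12", "category": "api"},
    },
]

def filter_by_category(chunks, category):
    # Narrow the search space using metadata instead of embeddings alone.
    return [c for c in chunks if c["metadata"]["category"] == category]

policy_chunks = filter_by_category(chunks_with_metadata, "policy")
```

Metadata also lets you cite sources in the final answer, which makes hallucinations easier to spot.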
Conclusion
RAG has become the standard architecture for building AI applications that need access to specific, up-to-date knowledge. LangChain significantly simplifies the implementation.
Next steps:
- Experiment locally: start with ChromaDB and a few documents.
- Explore LangSmith: use LangSmith for monitoring.
- Try different embedding models: compare models such as text-embedding-3-small and text-embedding-3-large.
- Read the documentation: the LangChain documentation is excellent.