Large language models (LLMs) such as GPT-4 and Claude are extraordinarily capable, but they suffer from a fundamental limitation: their knowledge is frozen at training time. Retrieval-Augmented Generation (RAG) addresses exactly this problem by combining the generative power of an LLM with the ability to retrieve information from external sources.
The Problem: LLM Limitations
- Static knowledge: an LLM knows only what it saw during training, so anything published afterwards is invisible to it.
- Hallucinations: when an LLM does not know the answer, it tends to make one up, often convincingly.
- No access to private data: a generic LLM cannot see your company's internal documentation.
What Is RAG?

RAG is an architecture that, before generating an answer, first retrieves relevant documents from an external knowledge base and passes them to the LLM as context. The model then answers grounded in the retrieved material rather than in its training data alone.
How RAG Works in Detail
Phase 1: Indexing

Documents are loaded, split into overlapping chunks, embedded, and stored in a vector database. This happens offline, before any user query arrives.
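The core of the indexing phase is splitting documents into overlapping chunks before embedding them. A minimal sketch in plain Python (no LangChain; the sizes here are arbitrary illustrative values):

```python
def split_text(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Split text into fixed-size chunks that overlap by `chunk_overlap` characters."""
    step = chunk_size - chunk_overlap
    return [text[i:i + chunk_size]
            for i in range(0, max(len(text) - chunk_overlap, 1), step)]

chunks = split_text(
    "Retrieval-Augmented Generation combines search with generation.",
    chunk_size=30,
    chunk_overlap=10,
)
# Consecutive chunks share a 10-character overlap, so content that falls
# on a chunk boundary is never lost entirely.
```

The overlap is what real splitters like RecursiveCharacterTextSplitter provide too: it keeps sentences that straddle a boundary retrievable from at least one chunk.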
Phase 2: Retrieval + Generation

At query time, the user's question is embedded, the most similar chunks are retrieved from the vector store, and the LLM generates an answer grounded in them.
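The retrieval step can be illustrated without any framework: rank stored chunks by cosine similarity to the query, then paste the top matches into the prompt as context. A toy sketch using bag-of-words term counts as stand-in "embeddings" (real systems use a trained embedding model):

```python
import math

def embed(text: str) -> dict[str, float]:
    """Toy 'embedding': term counts (a stand-in for a real embedding model)."""
    words = text.lower().replace(".", "").split()
    return {w: float(words.count(w)) for w in set(words)}

def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    dot = sum(v * b.get(w, 0.0) for w, v in a.items())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

chunks = [
    "RAG combines retrieval with generation.",
    "Vector stores index embeddings for similarity search.",
    "Bananas are rich in potassium.",
]
query = "how does similarity search over embeddings work"
query_vec = embed(query)

# Retrieve the top-2 chunks and build the context for the LLM prompt.
top_k = sorted(chunks, key=lambda c: cosine(query_vec, embed(c)), reverse=True)[:2]
context = "\n\n".join(top_k)
```

The chunk about similarity search ranks first; the unrelated chunk about bananas never reaches the prompt, which is exactly how RAG keeps the model grounded.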
Building a RAG Pipeline with LangChain
```bash
pip install langchain langchain-openai langchain-community chromadb
```
```python
# Step 1: Load documents (langchain_community also offers WebBaseLoader,
# DirectoryLoader, TextLoader, and many others)
from langchain_community.document_loaders import PyPDFLoader

pdf_loader = PyPDFLoader("docs/manual.pdf")
pdf_docs = pdf_loader.load()
all_docs = pdf_docs
```
```python
# Step 2: Split the documents into overlapping chunks
from langchain.text_splitter import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,
    chunk_overlap=200,
)
chunks = text_splitter.split_documents(all_docs)
```
```python
# Step 3: Embed the chunks and store them in a vector database
from langchain_openai import OpenAIEmbeddings
from langchain_community.vectorstores import Chroma

embedding_model = OpenAIEmbeddings(model="text-embedding-3-small")
vectorstore = Chroma.from_documents(
    documents=chunks,
    embedding=embedding_model,
    persist_directory="./chroma_db",
)
```
```python
# Step 4: Wire up the retrieval + generation chain (LCEL)
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough
from langchain_core.output_parsers import StrOutputParser

llm = ChatOpenAI(model="gpt-4o", temperature=0)
retriever = vectorstore.as_retriever(
    search_type="similarity",
    search_kwargs={"k": 4},
)

prompt = ChatPromptTemplate.from_template("""
Answer the question based only on the provided context.
If the context does not contain enough information, say you don't know.

Context: {context}

Question: {question}

Answer: """)

def format_docs(docs):
    # Concatenate the retrieved chunks into a single context string
    return "\n\n".join(doc.page_content for doc in docs)

rag_chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)

# Usage (example question):
# answer = rag_chain.invoke("How do I reset the device?")
```
Advanced RAG Techniques
These include Multi-Query Retrieval, contextual compression, hybrid search, and conversational RAG with memory.
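Hybrid search typically merges a keyword ranking (e.g. BM25) with a vector-similarity ranking. One common way to merge them is Reciprocal Rank Fusion (RRF); a minimal sketch, assuming the two rankings have already been computed (the document IDs below are hypothetical):

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge several ranked lists: each document scores sum(1 / (k + rank))."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

keyword_ranking = ["doc_a", "doc_b", "doc_c"]  # hypothetical BM25 results
vector_ranking = ["doc_c", "doc_a", "doc_d"]   # hypothetical embedding results
fused = reciprocal_rank_fusion([keyword_ranking, vector_ranking])
# doc_a (ranks 1 and 2) ends up first; doc_c (ranks 3 and 1) second.
```

RRF needs only ranks, not raw scores, which is why it works even when the two retrievers score on incomparable scales.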
Best Practices
- Experiment with chunk size (500-1500 tokens is a common starting range).
- Use document metadata (source, section, date) to filter and attribute results.
- Evaluate retrieval and answer quality with frameworks such as RAGAS.
- Implement a pipeline for keeping indexed documents up to date.
- Add a re-ranker after the initial retrieval step.
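On the last point: a re-ranker re-scores the initial candidates against the query with a slower but more precise model, usually a cross-encoder in production. A stand-in sketch that re-orders candidates by query-term overlap (illustrative only; the candidate texts are invented):

```python
def rerank(query: str, candidates: list[str], top_n: int = 2) -> list[str]:
    """Re-score retrieved candidates by query-term overlap (cross-encoder stand-in)."""
    q_terms = set(query.lower().split())

    def score(doc: str) -> int:
        return len(q_terms & set(doc.lower().split()))

    return sorted(candidates, key=score, reverse=True)[:top_n]

candidates = [
    "Chunk about unrelated billing policies",
    "Chunk explaining vector similarity search in detail",
    "Chunk mentioning search logs",
]
best = rerank("vector similarity search", candidates)
```

The pattern is the same with a real cross-encoder: retrieve broadly and cheaply first, then spend the expensive scoring only on a handful of candidates.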
Conclusion
RAG has become the standard architecture for building AI applications that need access to specific knowledge. LangChain significantly simplifies the implementation.