Latest Research Papers
2025-01-23
arXiv
A RAG-Based Institutional Assistant
This paper introduces a RAG-based virtual assistant for the University of São Paulo, which integrates relevant document fragments to improve LLM performance. The system's accuracy significantly increases when provided with correct document chunks, highlighting the importance of database access and the limitations of current semantic search methods.
Although large language models (LLMs) demonstrate strong text generation
capabilities, they struggle in scenarios requiring access to structured
knowledge bases or specific documents, limiting their effectiveness in
knowledge-intensive tasks. To address this limitation, retrieval-augmented
generation (RAG) models have been developed, enabling generative models to
incorporate relevant document fragments into their inputs. In this paper, we
design and evaluate a RAG-based virtual assistant specifically tailored for the
University of São Paulo. Our system architecture comprises two key modules: a
retriever and a generative model. We experiment with different types of models
for both components, adjusting hyperparameters such as chunk size and the
number of retrieved documents. Our optimal retriever model achieves a Top-5
accuracy of 30%, while our most effective generative model scores 22.04%
against ground truth answers. Notably, when the correct document chunks are
supplied to the LLMs, accuracy significantly improves to 54.02%, an increase of
over 30 percentage points. Conversely, without contextual input, performance
declines to 13.68%. These findings highlight the critical role of database
access in enhancing LLM performance. They also reveal the limitations of
current semantic search methods in accurately identifying relevant documents
and underscore the ongoing challenges LLMs face in generating precise
responses.
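
The two-module architecture the abstract describes (a retriever feeding document fragments to a generative model) can be sketched as a minimal pipeline. The sketch below is an illustration, not the paper's implementation: the embedding model name, the chunk size, and the top-k value are assumptions standing in for the hyperparameters the authors actually tune.

    # Minimal RAG sketch: chunk -> embed -> retrieve top-k -> build the prompt.
    # The model name and hyperparameter values are illustrative assumptions,
    # not the paper's actual configuration.
    from sentence_transformers import SentenceTransformer, util

    CHUNK_SIZE = 512  # characters per chunk (a hyperparameter the paper tunes)
    TOP_K = 5         # number of retrieved chunks (also tuned in the paper)

    embedder = SentenceTransformer("all-MiniLM-L6-v2")  # assumed retriever

    def chunk(text, size=CHUNK_SIZE):
        # Naive fixed-size chunking; the paper experiments with chunk size.
        return [text[i:i + size] for i in range(0, len(text), size)]

    def retrieve(query, chunks, k=TOP_K):
        # Dense retrieval: rank chunks by embedding similarity to the query.
        chunk_emb = embedder.encode(chunks, convert_to_tensor=True)
        query_emb = embedder.encode(query, convert_to_tensor=True)
        hits = util.semantic_search(query_emb, chunk_emb, top_k=k)[0]
        return [chunks[hit["corpus_id"]] for hit in hits]

    def build_prompt(query, context):
        # Incorporate the retrieved fragments into the generator's input.
        joined = "\n---\n".join(context)
        return (f"Answer using only the context below.\n\n"
                f"Context:\n{joined}\n\nQuestion: {query}\nAnswer:")

    docs = ["...institutional documents from the knowledge base..."]
    chunks = [c for d in docs for c in chunk(d)]
    prompt = build_prompt("When does enrollment open?",
                          retrieve("When does enrollment open?", chunks))
    # `prompt` is then passed to whichever generative model is being evaluated.

A Top-5 accuracy metric like the one reported above would then simply check whether the gold chunk appears among the five retrieved ones.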
2024-09-12
arXiv
Enhancing Q&A Text Retrieval with Ranking Models: Benchmarking, fine-tuning and deploying Rerankers for RAG
This paper benchmarks and evaluates various ranking models for enhancing the accuracy of text retrieval in question-answering tasks, introducing a new model, NV-RerankQA-Mistral-4B-v3, that significantly improves accuracy. It also discusses the trade-offs between model size, accuracy, and system requirements in real-world applications.
Ranking models play a crucial role in enhancing the overall accuracy of text
retrieval systems. These multi-stage systems typically use either dense
embedding models or sparse lexical indices to retrieve relevant passages for a
given query, followed by ranking models that refine the ordering of the
candidate passages by their relevance to the query.
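
As a hedged illustration of the retrieve-then-rerank pattern this paragraph describes, the sketch below wires a dense first-stage retriever to a cross-encoder reranker. The two model checkpoints named here are small public ones chosen for the example; they are not the models benchmarked in the paper.

    # Two-stage retrieval sketch: dense first-stage recall, then a
    # cross-encoder reranker refines the ordering of the candidates.
    # Model checkpoints are illustrative, not the paper's models.
    from sentence_transformers import CrossEncoder, SentenceTransformer, util

    retriever = SentenceTransformer("all-MiniLM-L6-v2")              # stage 1
    reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")  # stage 2

    def search(query, passages, recall_k=50, final_k=5):
        # Stage 1: cheap dense retrieval casts a wide net of candidates.
        p_emb = retriever.encode(passages, convert_to_tensor=True)
        q_emb = retriever.encode(query, convert_to_tensor=True)
        hits = util.semantic_search(q_emb, p_emb, top_k=recall_k)[0]
        candidates = [passages[h["corpus_id"]] for h in hits]

        # Stage 2: the cross-encoder scores each (query, passage) pair
        # jointly, which is slower but typically more accurate.
        scores = reranker.predict([(query, p) for p in candidates])
        ranked = sorted(zip(scores, candidates), reverse=True,
                        key=lambda x: x[0])
        return [p for _, p in ranked[:final_k]]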
This paper benchmarks various publicly available ranking models and examines
their impact on ranking accuracy. We focus on text retrieval for
question-answering tasks, a common use case for Retrieval-Augmented Generation
systems. Our evaluation covers several models, some of which are commercially
viable for industrial applications.
We introduce a state-of-the-art ranking model, NV-RerankQA-Mistral-4B-v3,
which achieves a significant accuracy increase of ~14% compared to pipelines
with other rerankers. We also provide an ablation study comparing the
fine-tuning of ranking models with different sizes, losses and self-attention
mechanisms.
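
For readers unfamiliar with what fine-tuning a reranker involves, the sketch below shows a generic cross-encoder fine-tuning loop on binary relevance labels, using the classic sentence-transformers fit API. It is a stand-in for illustration only, not the paper's actual recipe for NV-RerankQA-Mistral-4B-v3, whose losses and attention variants are what the ablation studies.

    # Generic cross-encoder fine-tuning sketch (not the paper's recipe):
    # train on (query, passage) pairs labeled 1.0 (relevant) or 0.0 (not).
    from torch.utils.data import DataLoader
    from sentence_transformers import CrossEncoder, InputExample

    train_samples = [
        InputExample(texts=["what is rag?",
                            "RAG augments LLM inputs with retrieved text."],
                     label=1.0),
        InputExample(texts=["what is rag?",
                            "The weather in Lisbon is mild in spring."],
                     label=0.0),
        # ...real training data would come from a QA retrieval dataset
    ]

    model = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2", num_labels=1)
    loader = DataLoader(train_samples, shuffle=True, batch_size=2)

    # With num_labels=1 the default objective is binary cross-entropy over
    # pair scores; the paper's ablation swaps in different losses, model
    # sizes, and self-attention mechanisms.
    model.fit(train_dataloader=loader, epochs=1, warmup_steps=10)
    model.save("reranker-finetuned")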
Finally, we discuss the challenges that text retrieval pipelines with ranking
models face in real-world industry applications, in particular the trade-offs
among model size, ranking accuracy, and system requirements such as indexing
and serving latency/throughput.