Latest Research Papers
2024-08-13
arXiv
Agent Q: Advanced Reasoning and Learning for Autonomous AI Agents
The paper introduces a framework that combines guided Monte Carlo Tree Search (MCTS) with a self-critique mechanism and iterative fine-tuning, improving LLMs' performance in complex, multi-step reasoning tasks. This method significantly enhances the zero-shot performance of LLMs in real-world scenarios, such as web navigation and booking, outperforming existing baselines.
Large Language Models (LLMs) have shown remarkable capabilities in natural
language tasks requiring complex reasoning, yet their application in agentic,
multi-step reasoning within interactive environments remains a difficult
challenge. Traditional supervised pre-training on static datasets falls short
in enabling autonomous agent capabilities needed to perform complex
decision-making in dynamic settings like web navigation. Previous attempts to
bridge this gap, through supervised fine-tuning on curated expert
demonstrations, often suffer from compounding errors and limited exploration
data, resulting in sub-optimal policy outcomes. To overcome these challenges,
we propose a framework that combines guided Monte Carlo Tree Search (MCTS)
with a self-critique mechanism and iterative fine-tuning on agent
interactions using an off-policy variant of the Direct Preference Optimization
(DPO) algorithm. Our method allows LLM agents to learn effectively from both
successful and unsuccessful trajectories, thereby improving their
generalization in complex, multi-step reasoning tasks. We validate our approach
in the WebShop environment, a simulated e-commerce platform, where it
consistently outperforms behavior cloning and reinforced fine-tuning baselines,
and beats average human performance when equipped with the capability to do
online search. In real-world booking scenarios, our methodology boosts the
Llama-3 70B model's zero-shot performance from 18.6% to 81.7% success rate (a 340%
relative increase) after a single day of data collection and further to 95.4%
with online search. We believe this represents a substantial leap forward in
the capabilities of autonomous agents, paving the way for more sophisticated
and reliable decision-making in real-world settings.
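The abstract names Direct Preference Optimization (DPO) as the fine-tuning objective. As a minimal sketch, the standard DPO loss for a single preference pair can be written as below. This is the textbook objective, not the off-policy variant the authors describe; the function name, the `beta` default, and the example log-probabilities are illustrative assumptions.

```python
import math


def dpo_loss(policy_logp_chosen, policy_logp_rejected,
             ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """Standard DPO loss for one (chosen, rejected) trajectory pair.

    Each argument is the total log-probability of a trajectory under the
    policy being trained or under the frozen reference model.
    """
    chosen_margin = policy_logp_chosen - ref_logp_chosen
    rejected_margin = policy_logp_rejected - ref_logp_rejected
    logits = beta * (chosen_margin - rejected_margin)
    # -log(sigmoid(x)) equals log(1 + exp(-x)); guard against overflow.
    if logits < -30:
        return -logits
    return math.log1p(math.exp(-logits))


# Hypothetical log-probabilities, chosen only for illustration.
neutral = dpo_loss(-5.0, -7.0, -5.0, -7.0)    # policy identical to reference
improved = dpo_loss(-4.0, -8.0, -5.0, -7.0)   # policy favors the chosen path more
worsened = dpo_loss(-6.0, -6.0, -5.0, -7.0)   # policy favors the rejected path more
```

When the policy matches the reference the loss equals log 2; it falls as the policy shifts probability toward the preferred trajectory and rises when it shifts the other way, which is how both successful and unsuccessful trajectories contribute a training signal.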
2024-07-01
arXiv
Searching for Best Practices in Retrieval-Augmented Generation
The paper investigates different RAG approaches to identify optimal practices that balance performance and efficiency, and shows that multimodal retrieval techniques can enhance question-answering and content generation.
Retrieval-augmented generation (RAG) techniques have proven to be effective
in integrating up-to-date information, mitigating hallucinations, and enhancing
response quality, particularly in specialized domains. While many RAG
approaches have been proposed to enhance large language models through
query-dependent retrievals, these approaches still suffer from their complex
implementation and prolonged response times. Typically, a RAG workflow involves
multiple processing steps, each of which can be executed in various ways. Here,
we investigate existing RAG approaches and their potential combinations to
identify optimal RAG practices. Through extensive experiments, we suggest
several strategies for deploying RAG that balance both performance and
efficiency. Moreover, we demonstrate that multimodal retrieval techniques can
significantly enhance question-answering capabilities about visual inputs and
accelerate the generation of multimodal content using a "retrieval as
generation" strategy.
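To make the "multiple processing steps" of a RAG workflow concrete, here is a toy sketch of two of them: retrieval and prompt construction. It uses bag-of-words cosine similarity in place of a real embedding model and stops before any LLM call; all function names and the sample documents are assumptions made for illustration, not the configurations evaluated in the paper.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenization (a stand-in for a real tokenizer)."""
    return text.lower().split()


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, docs, k=2):
    """Return the k documents most similar to the query."""
    q = Counter(tokenize(query))
    ranked = sorted(docs, key=lambda d: cosine(q, Counter(tokenize(d))),
                    reverse=True)
    return ranked[:k]


def build_prompt(query, passages):
    """Place retrieved passages ahead of the question for the generator."""
    context = "\n".join(f"- {p}" for p in passages)
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"


docs = [
    "paris is the capital of france",
    "the mitochondria is the powerhouse of the cell",
    "berlin is the capital of germany",
]
top = retrieve("capital of france", docs, k=2)
prompt = build_prompt("capital of france", top)
```

Each function is one place where a RAG pipeline can be varied (tokenizer, scoring function, value of `k`, prompt template).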
2024-04-25
arXiv
A Survey of Generative Search and Recommendation in the Era of Large Language Models
The paper surveys the emerging paradigm of generative search and recommendation driven by large language models, providing a unified framework to categorize and analyze existing works. It highlights unique challenges, open problems, and future directions in this field.
With the information explosion on the Web, search and recommendation are
foundational infrastructures for satisfying users' information needs. As the two
sides of the same coin, both revolve around the same core research problem,
matching queries with documents or users with items. In recent decades,
search and recommendation have experienced synchronous technological paradigm
shifts, including machine learning-based and deep learning-based paradigms.
Recently, the superintelligent generative large language models have sparked a
new paradigm in search and recommendation, i.e., generative search (retrieval)
and recommendation, which aims to address the matching problem in a generative
manner. In this paper, we provide a comprehensive survey of the emerging
paradigm in information systems and summarize the developments in generative
search and recommendation from a unified perspective. Rather than simply
categorizing existing works, we abstract a unified framework for the generative
paradigm and break down the existing works into different stages within this
framework to highlight their strengths and weaknesses. We then distinguish
generative search and recommendation by their unique challenges, identify
open problems and future directions, and envision the next information-seeking
paradigm.
2024-03-08
arXiv
DeepSeek-VL: Towards Real-World Vision-Language Understanding
DeepSeek-VL, an open-source Vision-Language Model, is designed for real-world applications with a focus on diverse data, efficient processing, and strong language capabilities. The model, available in 1.3B and 7B versions, demonstrates superior performance in practical applications and benchmarks.
We present DeepSeek-VL, an open-source Vision-Language (VL) Model designed
for real-world vision and language understanding applications. Our approach is
structured around three key dimensions:
We strive to ensure our data is diverse, scalable, and extensively covers
real-world scenarios including web screenshots, PDFs, OCR, charts, and
knowledge-based content, aiming for a comprehensive representation of practical
contexts. Further, we create a use case taxonomy from real user scenarios and
construct an instruction tuning dataset accordingly. The fine-tuning with this
dataset substantially improves the model's user experience in practical
applications. Considering efficiency and the demands of most real-world
scenarios, DeepSeek-VL incorporates a hybrid vision encoder that efficiently
processes high-resolution images (1024 x 1024), while maintaining a relatively
low computational overhead. This design choice ensures the model's ability to
capture critical semantic and detailed information across various visual tasks.
We posit that a proficient Vision-Language Model should, foremost, possess
strong language abilities. To ensure the preservation of LLM capabilities
during pretraining, we investigate an effective VL pretraining strategy by
integrating LLM training from the beginning and carefully managing the
competitive dynamics observed between vision and language modalities.
The DeepSeek-VL family (both 1.3B and 7B models) showcases superior user
experiences as a vision-language chatbot in real-world applications, achieving
state-of-the-art or competitive performance across a wide range of
visual-language benchmarks at the same model size while maintaining robust
performance on language-centric benchmarks. We have made both 1.3B and 7B
models publicly accessible to foster innovations based on this foundation
model.
2024-03-04
arXiv
Wukong: Towards a Scaling Law for Large-Scale Recommendation
This paper introduces Wukong, a network architecture based on stacked factorization machines and an upscaling strategy, to establish a scaling law for recommendation models. Wukong effectively captures diverse interactions and outperforms state-of-the-art models in quality and scalability. The results show that Wukong maintains its superiority across a wide range of model complexities.
Scaling laws play an instrumental role in the sustainable improvement in
model quality. Unfortunately, recommendation models to date do not exhibit such
laws similar to those observed in the domain of large language models, due to
the inefficiencies of their upscaling mechanisms. This limitation poses
significant challenges in adapting these models to increasingly more complex
real-world datasets. In this paper, we propose an effective network
architecture based purely on stacked factorization machines, and a synergistic
upscaling strategy, collectively dubbed Wukong, to establish a scaling law in
the domain of recommendation. Wukong's unique design makes it possible to
capture diverse, any-order of interactions simply through taller and wider
layers. We conducted extensive evaluations on six public datasets, and our
results demonstrate that Wukong consistently outperforms state-of-the-art
models quality-wise. Further, we assessed Wukong's scalability on an internal,
large-scale dataset. The results show that Wukong retains its superiority in
quality over state-of-the-art models, while holding the scaling law across two
orders of magnitude in model complexity, extending beyond 100 GFLOP/example,
where prior arts fall short.
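The abstract describes Wukong as built from stacked factorization machines. As background, the sketch below shows the pairwise interaction term of a single classic second-order factorization machine, computed both naively and with the well-known linear-time identity. It is not Wukong's layer design; the function names and the example inputs are assumptions for illustration.

```python
def fm_pairwise_naive(x, V):
    """Sum over i < j of <V[i], V[j]> * x[i] * x[j], in O(n^2 * k)."""
    total = 0.0
    n = len(x)
    for i in range(n):
        for j in range(i + 1, n):
            dot = sum(a * b for a, b in zip(V[i], V[j]))
            total += dot * x[i] * x[j]
    return total


def fm_pairwise_fast(x, V):
    """Same quantity in O(n * k).

    Uses 0.5 * sum_f [ (sum_i V[i][f] x[i])^2 - sum_i (V[i][f] x[i])^2 ].
    """
    k = len(V[0])
    total = 0.0
    for f in range(k):
        weighted = [V[i][f] * x[i] for i in range(len(x))]
        square_of_sum = sum(weighted) ** 2
        sum_of_squares = sum(w * w for w in weighted)
        total += square_of_sum - sum_of_squares
    return 0.5 * total


# Hypothetical feature vector x and per-feature embeddings V (k = 2).
x = [1.0, 0.0, 2.0]
V = [[1.0, 2.0], [3.0, 4.0], [0.5, -1.0]]
naive = fm_pairwise_naive(x, V)
fast = fm_pairwise_fast(x, V)
```

Both functions return the same value; the second avoids the explicit double loop over feature pairs.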
2024-02-15
arXiv
Chain-of-Thought Reasoning Without Prompting
This paper explores a method to elicit chain-of-thought (CoT) reasoning from pre-trained LLMs without the need for prompt engineering, by altering the decoding process. It shows that CoT paths are often inherent in top-k alternative tokens, and their presence correlates with higher confidence in the model's answers. The approach effectively reveals the intrinsic reasoning abilities of LLMs.
In enhancing the reasoning capabilities of large language models (LLMs),
prior research primarily focuses on specific prompting techniques such as
few-shot or zero-shot chain-of-thought (CoT) prompting. These methods, while
effective, often involve manually intensive prompt engineering. Our study takes
a novel approach by asking: Can LLMs reason effectively without prompting? Our
findings reveal that, intriguingly, CoT reasoning paths can be elicited from
pre-trained LLMs by simply altering the decoding process. Rather than
conventional greedy decoding, we investigate the top-k alternative tokens,
uncovering that CoT paths are frequently inherent in these sequences. This
approach not only bypasses the confounders of prompting but also allows us to
assess the LLMs' intrinsic reasoning abilities. Moreover, we observe
that the presence of a CoT in the decoding path correlates with a higher
confidence in the model's decoded answer. This confidence metric effectively
differentiates between CoT and non-CoT paths. Extensive empirical studies on
various reasoning benchmarks show that the proposed CoT-decoding effectively
elicits reasoning capabilities from language models, which were previously
obscured by standard greedy decoding.
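The selection step can be sketched as follows: decode one continuation per top-k first token, score each by how confident the model is in its answer tokens, and keep the most confident path. The abstract does not define the confidence metric, so the measure used here (the mean gap between the two highest token probabilities over the answer tokens) is an assumption, and the paths and probabilities are made up rather than produced by a real model.

```python
def answer_confidence(answer_token_probs):
    """Mean gap between the top two probabilities across answer tokens.

    `answer_token_probs` holds one probability list per answer token;
    each list must contain at least two entries.
    """
    gaps = []
    for probs in answer_token_probs:
        first, second = sorted(probs, reverse=True)[:2]
        gaps.append(first - second)
    return sum(gaps) / len(gaps)


def pick_path(paths):
    """Return the decoding path whose answer tokens are most confident."""
    return max(paths, key=lambda p: answer_confidence(p["answer_token_probs"]))


# Hypothetical decoding paths branching from different first tokens.
paths = [
    {"text": "5", "answer_token_probs": [[0.5, 0.4, 0.1]]},
    {"text": "3 + 5 = 8, so the answer is 8",
     "answer_token_probs": [[0.9, 0.05, 0.05], [0.8, 0.1, 0.1]]},
]
best = pick_path(paths)
```

In this toy data the path that contains intermediate reasoning also has the larger probability gap, so it is the one selected.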
2023-12-18
arXiv
Retrieval-Augmented Generation for Large Language Models: A Survey
The paper reviews Retrieval-Augmented Generation (RAG) for Large Language Models, which integrates external knowledge to improve accuracy and credibility. It covers the evolution of RAG paradigms and their components, and introduces a new evaluation framework. The paper also discusses current challenges and future research directions.
Large Language Models (LLMs) showcase impressive capabilities but encounter
challenges like hallucination, outdated knowledge, and non-transparent,
untraceable reasoning processes. Retrieval-Augmented Generation (RAG) has
emerged as a promising solution by incorporating knowledge from external
databases. This enhances the accuracy and credibility of the generation,
particularly for knowledge-intensive tasks, and allows for continuous knowledge
updates and integration of domain-specific information. RAG synergistically
merges LLMs' intrinsic knowledge with the vast, dynamic repositories of
external databases. This comprehensive review paper offers a detailed
examination of the progression of RAG paradigms, encompassing the Naive RAG,
the Advanced RAG, and the Modular RAG. It meticulously scrutinizes the
tripartite foundation of RAG frameworks, which includes the retrieval, the
generation and the augmentation techniques. The paper highlights the
state-of-the-art technologies embedded in each of these critical components,
providing a profound understanding of the advancements in RAG systems.
Furthermore, this paper introduces an up-to-date evaluation framework and
benchmark. Finally, this article delineates the challenges currently faced
and points out prospective avenues for research and development.
2023-09-03
arXiv
Large Language Models for Generative Recommendation: A Survey and Visionary Discussions
This survey explores the potential of large language models (LLMs) in revolutionizing recommender systems by simplifying the recommendation process to a single stage, focusing on direct generation of recommendations. It examines the concept, necessity, and implementation of LLM-based generative recommendation for various tasks.
Large language models (LLMs) have not only revolutionized the field of natural
language processing (NLP) but also have the potential to reshape many other
fields, e.g., recommender systems (RS). However, most of the related work
treats an LLM as a component of the conventional recommendation pipeline (e.g.,
as a feature extractor), which may not be able to fully leverage the generative
power of LLMs. Instead of separating the recommendation process into multiple
stages, such as score computation and re-ranking, this process can be
simplified to one stage with an LLM: directly generating recommendations from the
complete pool of items. This survey reviews the progress, methods, and future
directions of LLM-based generative recommendation by examining three questions:
1) What generative recommendation is, 2) Why RS should advance to generative
recommendation, and 3) How to implement LLM-based generative recommendation for
various RS tasks. We hope that this survey can provide the context and guidance
needed to explore this interesting and emerging topic.
2023-05-17
arXiv
Tree of Thoughts: Deliberate Problem Solving with Large Language Models
The paper introduces Tree of Thoughts (ToT), a new framework for language model inference that enhances problem-solving by enabling exploration and strategic decision-making. ToT allows LMs to consider multiple reasoning paths and self-evaluate choices, improving performance on tasks requiring planning or search. Experiments show significant improvements in problem-solving abilities, such as increasing the success rate in the Game of 24 from 4% to 74%.
Language models are increasingly being deployed for general problem solving
across a wide range of tasks, but are still confined to token-level,
left-to-right decision-making processes during inference. This means they can
fall short in tasks that require exploration, strategic lookahead, or where
initial decisions play a pivotal role. To surmount these challenges, we
introduce a new framework for language model inference, Tree of Thoughts (ToT),
which generalizes over the popular Chain of Thought approach to prompting
language models, and enables exploration over coherent units of text (thoughts)
that serve as intermediate steps toward problem solving. ToT allows LMs to
perform deliberate decision making by considering multiple different reasoning
paths and self-evaluating choices to decide the next course of action, as well
as looking ahead or backtracking when necessary to make global choices. Our
experiments show that ToT significantly enhances language models'
problem-solving abilities on three novel tasks requiring non-trivial planning
or search: Game of 24, Creative Writing, and Mini Crosswords. For instance, in
Game of 24, while GPT-4 with chain-of-thought prompting only solved 4% of
tasks, our method achieved a success rate of 74%. Code repo with all prompts:
https://github.com/princeton-nlp/tree-of-thought-llm.
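The search loop described above can be sketched as a breadth-first search that proposes several next thoughts per state, scores them, and keeps only the best few. In ToT the proposal and scoring steps are language model calls; here plain functions on a toy arithmetic puzzle (reach 14 from 1 using "+3" and "*2") stand in for them. All names and the puzzle itself are assumptions for illustration, not code from the linked repository.

```python
def tot_bfs(root, propose, score, is_goal, beam_width=2, max_depth=4):
    """Breadth-first search over partial solutions with pruning."""
    frontier = [root]
    for _ in range(max_depth):
        # Expand every kept state into its candidate next thoughts.
        candidates = [child for state in frontier for child in propose(state)]
        if not candidates:
            return None
        for child in candidates:
            if is_goal(child):
                return child
        # Evaluation step: keep only the most promising candidates.
        frontier = sorted(candidates, key=score, reverse=True)[:beam_width]
    return None


TARGET = 14


def propose(state):
    """Generate the two possible next thoughts from a (value, steps) state."""
    value, steps = state
    return [(value + 3, steps + ("+3",)), (value * 2, steps + ("*2",))]


def score(state):
    """Higher is better: negative distance to the target."""
    return -abs(TARGET - state[0])


def is_goal(state):
    return state[0] == TARGET


solution = tot_bfs((1, ()), propose, score, is_goal)
```

This sketch covers only the breadth-first variant with pruning; it does not implement explicit backtracking.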
2022-09-16
arXiv
Monolith: Real Time Recommendation System With Collisionless Embedding Table
This paper introduces Monolith, a real-time recommendation system designed for online training with a collisionless embedding table. It optimizes memory usage and provides a fault-tolerant architecture, enabling real-time learning by integrating customer feedback.
Building a scalable and real-time recommendation system is vital for many
businesses driven by time-sensitive customer feedback, such as short-video
ranking or online ads. Despite the ubiquitous adoption of production-scale deep
learning frameworks like TensorFlow or PyTorch, these general-purpose
frameworks fall short of business demands in recommendation scenarios for
various reasons: on one hand, tweaking systems based on static parameters and
dense computations for recommendation with dynamic and sparse features is
detrimental to model quality; on the other hand, such frameworks are designed
with the batch-training stage and serving stage completely separated, preventing
the model from interacting with customer feedback in real-time. These issues
led us to reexamine traditional approaches and explore radically different
design choices. In this paper, we present Monolith, a system tailored for
online training. Our design has been driven by observations of our application
workloads and production environment that reflect a marked departure from
other recommendation systems. Our contributions are manifold: first, we
crafted a collisionless embedding table with optimizations such as expirable
embeddings and frequency filtering to reduce its memory footprint; second, we
provide a production-ready online training architecture with high
fault-tolerance; finally, we proved that system reliability could be traded off
for real-time learning. Monolith has successfully landed in the BytePlus
Recommend product.
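The three embedding-table ideas named in the abstract can be illustrated with a small in-memory sketch: rows are keyed by the raw feature ID so distinct IDs never share a row, an ID is admitted only after it has been seen a minimum number of times (frequency filtering), and rows not touched within a time-to-live window are evicted (expirable embeddings). This is a toy model under stated assumptions, not Monolith's implementation; the class name, parameters, and thresholds are hypothetical.

```python
import random


class EmbeddingTable:
    """Toy collisionless embedding table keyed by raw feature ID."""

    def __init__(self, dim, min_count=2, ttl=100, seed=0):
        self.dim = dim
        self.min_count = min_count  # occurrences required before admission
        self.ttl = ttl              # steps a row may stay untouched
        self.rows = {}              # feature id -> embedding vector
        self.counts = {}            # feature id -> occurrences so far
        self.last_seen = {}         # feature id -> step of last lookup
        self.rng = random.Random(seed)

    def lookup(self, fid, step):
        """Return the embedding for fid, or None if it is not yet admitted."""
        if fid in self.rows:
            self.last_seen[fid] = step
            return self.rows[fid]
        self.counts[fid] = self.counts.get(fid, 0) + 1
        if self.counts[fid] < self.min_count:
            return None  # caller may fall back to a default vector
        # Frequent enough: allocate a dedicated row for this ID.
        del self.counts[fid]
        self.rows[fid] = [self.rng.uniform(-0.01, 0.01)
                          for _ in range(self.dim)]
        self.last_seen[fid] = step
        return self.rows[fid]

    def expire(self, step):
        """Evict rows untouched for more than ttl steps; return the count."""
        stale = [f for f, t in self.last_seen.items() if step - t > self.ttl]
        for fid in stale:
            del self.rows[fid]
            del self.last_seen[fid]
        return len(stale)


table = EmbeddingTable(dim=4, min_count=2, ttl=10)
first = table.lookup("user_1", step=0)    # first sighting: not admitted
second = table.lookup("user_1", step=1)   # second sighting: row allocated
rare = table.lookup("user_2", step=1)     # seen once: filtered out
evicted_early = table.expire(step=5)      # nothing is stale yet
evicted_late = table.expire(step=20)      # user_1 untouched for 19 steps
```

Filtering rare IDs and expiring stale ones both bound memory, which is the stated purpose of these optimizations.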