2024-08-13
arXiv

Agent Q: Advanced Reasoning and Learning for Autonomous AI Agents

Pranav Putta , Edmund Mills , Naman Garg , Sumeet Motwani , Chelsea Finn
The paper introduces a framework that combines guided Monte Carlo Tree Search (MCTS) with a self-critique mechanism and iterative fine-tuning, improving LLMs' performance in complex, multi-step reasoning tasks. This method significantly enhances the zero-shot performance of LLMs in real-world scenarios, such as web navigation and booking, outperforming existing baselines.
Large Language Models (LLMs) have shown remarkable capabilities in natural language tasks requiring complex reasoning, yet their application in agentic, multi-step reasoning within interactive environments remains a difficult challenge. Traditional supervised pre-training on static datasets falls short in enabling the autonomous agent capabilities needed to perform complex decision-making in dynamic settings like web navigation. Previous attempts to bridge this gap through supervised fine-tuning on curated expert demonstrations often suffer from compounding errors and limited exploration data, resulting in sub-optimal policy outcomes. To overcome these challenges, we propose a framework that combines guided Monte Carlo Tree Search (MCTS) with a self-critique mechanism and iterative fine-tuning on agent interactions using an off-policy variant of the Direct Preference Optimization (DPO) algorithm. Our method allows LLM agents to learn effectively from both successful and unsuccessful trajectories, thereby improving their generalization in complex, multi-step reasoning tasks. We validate our approach in the WebShop environment, a simulated e-commerce platform, where it consistently outperforms behavior cloning and reinforced fine-tuning baselines and beats average human performance when equipped with the capability to do online search. In real-world booking scenarios, our methodology boosts the Llama-3 70B model's zero-shot success rate from 18.6% to 81.7% (a 340% relative increase) after a single day of data collection, and further to 95.4% with online search. We believe this represents a substantial leap forward in the capabilities of autonomous agents, paving the way for more sophisticated and reliable decision-making in real-world settings.
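The preference-learning step can be illustrated with the standard DPO objective for a single preference pair. This is a minimal sketch of plain DPO, not the off-policy variant the paper uses; the function name `dpo_loss`, the default `beta=0.1`, and the convention of passing in pre-computed sequence log-probabilities are assumptions made for illustration. In the Agent Q setting, the chosen and rejected responses would be drawn from successful and unsuccessful trajectories collected during search.

```python
import math

def dpo_loss(policy_chosen: float, policy_rejected: float,
             ref_chosen: float, ref_rejected: float, beta: float = 0.1) -> float:
    """Standard DPO loss for one (chosen, rejected) pair.

    Each argument is a summed log-probability log p(y | x) of a full response
    under the trainable policy or the frozen reference model.
    """
    # Implicit reward margin: how much more the policy (relative to the
    # reference) prefers the chosen response over the rejected one.
    margin = (policy_chosen - ref_chosen) - (policy_rejected - ref_rejected)
    z = beta * margin
    # -log(sigmoid(z)), computed stably for both signs of z
    if z >= 0:
        return math.log1p(math.exp(-z))
    return -z + math.log1p(math.exp(z))

# When the policy equals the reference, the loss is log(2), about 0.693.
print(dpo_loss(-12.0, -15.0, -12.0, -15.0))
# Raising the chosen response's log-probability relative to the reference lowers the loss.
print(dpo_loss(-10.0, -15.0, -12.0, -15.0))
```

Minimizing this loss pushes the policy to widen the log-probability gap between preferred and dispreferred responses, while the reference terms keep the update anchored to the starting model.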
2024-07-01
arXiv

Searching for Best Practices in Retrieval-Augmented Generation

Xiaohua Wang , Zhenghua Wang , Xuan Gao , Feiran Zhang , Yixin Wu
The paper investigates different RAG approaches to identify optimal practices that balance performance and efficiency, and shows that multimodal retrieval techniques can enhance question-answering about visual inputs and accelerate multimodal content generation.
Retrieval-augmented generation (RAG) techniques have proven to be effective in integrating up-to-date information, mitigating hallucinations, and enhancing response quality, particularly in specialized domains. While many RAG approaches have been proposed to enhance large language models through query-dependent retrievals, these approaches still suffer from complex implementations and prolonged response times. Typically, a RAG workflow involves multiple processing steps, each of which can be executed in various ways. Here, we investigate existing RAG approaches and their potential combinations to identify optimal RAG practices. Through extensive experiments, we suggest several strategies for deploying RAG that balance both performance and efficiency. Moreover, we demonstrate that multimodal retrieval techniques can significantly enhance question-answering capabilities about visual inputs and accelerate the generation of multimodal content using a "retrieval as generation" strategy.
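Because each RAG step can be executed in several ways, the pipeline can be treated as a search space over combinations. The sketch below assumes a user-supplied `evaluate` function that returns (quality, latency) for a configuration, scores every combination exhaustively, and trades quality against latency with a hypothetical weight `latency_weight`. The step names, options, and numbers in `TOY` are made up for illustration and are not results from the paper; the paper's own modules and evaluation protocol may differ.

```python
from itertools import product
from typing import Callable, Dict, List, Tuple

Config = Dict[str, str]

def search_pipelines(options: Dict[str, List[str]],
                     evaluate: Callable[[Config], Tuple[float, float]],
                     latency_weight: float = 0.1) -> Tuple[Config, float]:
    """Score every combination of per-step choices.

    evaluate(config) -> (quality, latency_seconds);
    objective = quality - latency_weight * latency.
    """
    steps = list(options)
    best, best_obj = None, float("-inf")
    for choice in product(*(options[s] for s in steps)):
        config = dict(zip(steps, choice))
        quality, latency = evaluate(config)
        obj = quality - latency_weight * latency
        if obj > best_obj:
            best, best_obj = config, obj
    return best, best_obj

# Hypothetical pipeline steps and made-up (quality gain, latency) per option.
OPTIONS = {
    "retrieval": ["bm25", "dense", "hybrid"],
    "reranking": ["none", "cross_encoder"],
    "summarization": ["none", "extractive"],
}
TOY = {"bm25": (0.50, 0.1), "dense": (0.55, 0.3), "hybrid": (0.60, 0.4),
       "none": (0.0, 0.0), "cross_encoder": (0.08, 1.0), "extractive": (0.02, 0.5)}

def toy_evaluate(config: Config) -> Tuple[float, float]:
    # Toy additive model; a real evaluator would run the pipeline on a dev set.
    quality = sum(TOY[c][0] for c in config.values())
    latency = sum(TOY[c][1] for c in config.values())
    return quality, latency

print(search_pipelines(OPTIONS, toy_evaluate, latency_weight=0.1))
```

Exhaustive enumeration grows multiplicatively with the number of options per step, so for larger spaces a cheaper alternative is to tune one step at a time while holding the others fixed.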
2024-04-25
arXiv

A Survey of Generative Search and Recommendation in the Era of Large Language Models

Yongqi Li , Xinyu Lin , Wenjie Wang , Fuli Feng , Liang Pang
The paper surveys the emerging paradigm of generative search and recommendation driven by large language models, providing a unified framework to categorize and analyze existing works. It highlights unique challenges, open problems, and future directions in this field.
With the information explosion on the Web, search and recommendation are foundational infrastructure for satisfying users' information needs. As two sides of the same coin, both revolve around the same core research problem: matching queries with documents or users with items. Over the past few decades, search and recommendation have undergone synchronous technological paradigm shifts, including machine learning-based and deep learning-based paradigms. Recently, the superintelligent generative large language models have sparked a new paradigm in search and recommendation, i.e., generative search (retrieval) and recommendation, which aims to address the matching problem in a generative manner. In this paper, we provide a comprehensive survey of the emerging paradigm in information systems and summarize the developments in generative search and recommendation from a unified perspective. Rather than simply categorizing existing works, we abstract a unified framework for the generative paradigm and break down the existing works into different stages within this framework to highlight their strengths and weaknesses. We then distinguish generative search from generative recommendation by their unique challenges, identify open problems and future directions, and envision the next information-seeking paradigm.
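One common way to address matching "in a generative manner" is to have a model generate an identifier token by token while restricting each step to prefixes of identifiers that actually exist. The sketch below assumes identifiers are short token sequences stored in a prefix trie, and uses a toy `score(prefix, token)` function as a stand-in for a language model's next-token log-probability. It illustrates constrained decoding in general, not a specific method from the survey; all names are hypothetical.

```python
from typing import Callable, Dict, List, Sequence

EOS = "<eos>"

class Trie:
    """Prefix tree over valid identifiers (document IDs or item IDs)."""
    def __init__(self) -> None:
        self.children: Dict[str, "Trie"] = {}
        self.is_end = False

    def insert(self, tokens: Sequence[str]) -> None:
        node = self
        for tok in tokens:
            node = node.children.setdefault(tok, Trie())
        node.is_end = True

def allowed_next(trie: Trie, prefix: Sequence[str]) -> List[str]:
    """Tokens that keep `prefix` on a path to some valid identifier."""
    node = trie
    for tok in prefix:
        node = node.children[tok]
    allowed = list(node.children)
    if node.is_end:
        allowed.append(EOS)  # a complete identifier may stop here
    return allowed

def constrained_greedy(trie: Trie, score: Callable[[List[str], str], float],
                       max_len: int = 16) -> List[str]:
    """Greedy decoding where tokens outside the trie are masked at every step."""
    prefix: List[str] = []
    for _ in range(max_len):
        best = max(allowed_next(trie, prefix), key=lambda tok: score(prefix, tok))
        if best == EOS:
            break
        prefix.append(best)
    return prefix

# Toy corpus: three identifiers, each a sequence of tokens.
trie = Trie()
for doc_id in (["3", "1", "4"], ["3", "1", "5"], ["2", "7"]):
    trie.insert(doc_id)

# Toy scorer: it prefers "9", which starts no valid identifier, so "9" is never emitted.
PREFS = {"9": 5.0, "3": 2.0, "1": 1.0, "5": 1.5, "4": 0.5, "2": 0.1, "7": 0.1}
print(constrained_greedy(trie, lambda prefix, tok: PREFS.get(tok, 0.0)))  # ['3', '1', '5']
```

The constraint guarantees that every decoded sequence is a real identifier, which is what lets generation replace the score-every-candidate step of discriminative matching.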
2024-03-08
arXiv

DeepSeek-VL: Towards Real-World Vision-Language Understanding

Wen Liu , Bingxuan Wang , Zhenda Xie , Yaofeng Sun , Kai Dong
DeepSeek-VL, an open-source Vision-Language Model, is designed for real-world applications with a focus on diverse data, efficient processing, and strong language capabilities. The model, available in 1.3B and 7B versions, delivers a strong user experience in practical applications and state-of-the-art or competitive benchmark performance at the same model size.
We present DeepSeek-VL, an open-source Vision-Language (VL) Model designed for real-world vision and language understanding applications. Our approach is structured around three key dimensions: We strive to ensure our data is diverse, scalable, and extensively covers real-world scenarios including web screenshots, PDFs, OCR, charts, and knowledge-based content, aiming for a comprehensive representation of practical contexts. Further, we create a use case taxonomy from real user scenarios and construct an instruction tuning dataset accordingly. The fine-tuning with this dataset substantially improves the model's user experience in practical applications. Considering efficiency and the demands of most real-world scenarios, DeepSeek-VL incorporates a hybrid vision encoder that efficiently processes high-resolution images (1024 x 1024), while maintaining a relatively low computational overhead. This design choice ensures the model's ability to capture critical semantic and detailed information across various visual tasks. We posit that a proficient Vision-Language Model should, foremost, possess strong language abilities. To ensure the preservation of LLM capabilities during pretraining, we investigate an effective VL pretraining strategy by integrating LLM training from the beginning and carefully managing the competitive dynamics observed between vision and language modalities. The DeepSeek-VL family (both 1.3B and 7B models) showcases superior user experiences as a vision-language chatbot in real-world applications, achieving state-of-the-art or competitive performance across a wide range of visual-language benchmarks at the same model size while maintaining robust performance on language-centric benchmarks. We have made both 1.3B and 7B models publicly accessible to foster innovations based on this foundation model.
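Why high resolution is expensive can be seen with simple arithmetic. Assuming, purely for illustration, a plain ViT-style encoder that splits an image into non-overlapping 16 × 16 patches (the patch size and encoder are assumptions, not DeepSeek-VL's configuration), the number of visual tokens grows with the square of the image side, and self-attention cost grows with the square of the token count.

```python
def patch_tokens(side: int, patch: int = 16) -> int:
    """Visual tokens a plain ViT produces for a square side x side image."""
    if side % patch:
        raise ValueError("image side must be a multiple of the patch size")
    return (side // patch) ** 2

low, high = patch_tokens(384), patch_tokens(1024)
print(low, high)  # 576 4096
# Self-attention cost scales with tokens**2, so roughly a 50x increase here.
print((high / low) ** 2)
```

This quadratic growth is the kind of overhead the abstract's hybrid encoder is meant to avoid while still capturing detail from 1024 x 1024 inputs.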
2024-03-04
arXiv

Wukong: Towards a Scaling Law for Large-Scale Recommendation

Buyun Zhang , Liang Luo , Yuxin Chen , Jade Nie , Xi Liu
This paper introduces Wukong, a network architecture based on stacked factorization machines and an upscaling strategy, to establish a scaling law for recommendation models. Wukong effectively captures diverse interactions and outperforms state-of-the-art models in quality and scalability. The results show that Wukong maintains its superiority across a wide range of model complexities.
Scaling laws play an instrumental role in the sustainable improvement in model quality. Unfortunately, recommendation models to date do not exhibit scaling laws similar to those observed in the domain of large language models, due to the inefficiencies of their upscaling mechanisms. This limitation poses significant challenges in adapting these models to increasingly complex real-world datasets. In this paper, we propose an effective network architecture based purely on stacked factorization machines, and a synergistic upscaling strategy, collectively dubbed Wukong, to establish a scaling law in the domain of recommendation. Wukong's unique design makes it possible to capture diverse interactions of any order simply through taller and wider layers. We conducted extensive evaluations on six public datasets, and our results demonstrate that Wukong consistently outperforms state-of-the-art models in quality. Further, we assessed Wukong's scalability on an internal, large-scale dataset. The results show that Wukong retains its superiority in quality over state-of-the-art models while holding the scaling law across two orders of magnitude in model complexity, extending beyond 100 GFLOP/example, where prior art falls short.
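The claim that stacking interaction layers captures higher-order interactions can be sketched in NumPy. This is a simplified stand-in: each layer forms weighted sums of pairwise element-wise products of the current embeddings. Wukong's actual factorization machine block, its compression of the interaction matrix, and its upscaling strategy are not reproduced, and the names `interaction_layer` and `stacked_interactions` are hypothetical. Because each layer is quadratic in its input and a residual connection keeps lower-order terms, L stacked layers can represent interaction terms of order up to 2^L.

```python
import numpy as np
from typing import List

def interaction_layer(x: np.ndarray, w: np.ndarray) -> np.ndarray:
    """x: (n, d) embeddings; w: (m, n, n) mixing weights -> (m, d).

    Output k is sum_{i,j} w[k, i, j] * (x_i * x_j): every term is a
    second-order (pairwise) interaction of the input embeddings.
    """
    pairwise = x[:, None, :] * x[None, :, :]      # (n, n, d) element-wise products
    return np.einsum("kij,ijd->kd", w, pairwise)  # mix pairs into m new embeddings

def stacked_interactions(x: np.ndarray, weights: List[np.ndarray]) -> np.ndarray:
    """Residual stack: each layer can double the maximum interaction order."""
    for w in weights:
        x = x + interaction_layer(x, w)  # residual connection requires m == n
    return x

rng = np.random.default_rng(0)
n, d, depth = 4, 8, 3
x = rng.normal(size=(n, d))
weights = [rng.normal(scale=0.1, size=(n, n, n)) for _ in range(depth)]
out = stacked_interactions(x, weights)
print(out.shape)  # (4, 8); may contain interaction terms up to order 2**3 = 8
```

Depth raises the reachable interaction order without enumerating high-order feature crosses explicitly, which is the property the abstract ties to "taller" layers.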
2024-02-15
arXiv

Chain-of-Thought Reasoning Without Prompting

Xuezhi Wang (Google Research) , Denny Zhou (Google Research)
This paper explores a method to elicit chain-of-thought (CoT) reasoning from pre-trained LLMs without the need for prompt engineering, by altering the decoding process. It shows that CoT paths are often inherent in top-k alternative tokens, and their presence correlates with higher confidence in the model's answers. The approach effectively reveals the intrinsic reasoning abilities of LLMs.
In enhancing the reasoning capabilities of large language models (LLMs), prior research primarily focuses on specific prompting techniques such as few-shot or zero-shot chain-of-thought (CoT) prompting. These methods, while effective, often involve manually intensive prompt engineering. Our study takes a novel approach by asking: Can LLMs reason effectively without prompting? Our findings reveal that, intriguingly, CoT reasoning paths can be elicited from pre-trained LLMs by simply altering the decoding process. Rather than conventional greedy decoding, we investigate the top-k alternative tokens, uncovering that CoT paths are frequently inherent in these sequences. This approach not only bypasses the confounders of prompting but also allows us to assess the LLMs' intrinsic reasoning abilities. Moreover, we observe that the presence of a CoT in the decoding path correlates with a higher confidence in the model's decoded answer. This confidence metric effectively differentiates between CoT and non-CoT paths. Extensive empirical studies on various reasoning benchmarks show that the proposed CoT-decoding effectively elicits reasoning capabilities from language models, which were previously obscured by standard greedy decoding.
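The branching procedure described above can be sketched over a toy model. Assumptions: a "model" is any function from a token prefix to a next-token distribution (here a hand-written lookup table, not a real LLM); decoding branches on the top-k candidates for the first token and continues each branch greedily; and confidence is the gap between the top-1 and top-2 probabilities at the answer token, with the answer taken to be the last numeric token. This margin is one plausible reading of the confidence metric the abstract mentions, simplified to a single answer token; all function names are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Prefix = Tuple[str, ...]
Model = Callable[[Prefix], Dict[str, float]]  # token prefix -> next-token distribution
EOS = "<eos>"

def top2_margin(dist: Dict[str, float]) -> float:
    """p(top-1) - p(top-2) at one decoding step (runner-up counts as 0 if absent)."""
    probs = sorted(dist.values(), reverse=True) + [0.0]
    return probs[0] - probs[1]

def greedy_continue(model: Model, prefix: List[str], max_new: int):
    """Greedy decoding from `prefix`; returns new tokens and each step's margin."""
    tokens: List[str] = list(prefix)
    margins: List[float] = []
    for _ in range(max_new):
        dist = model(tuple(tokens))
        tok = max(dist, key=dist.get)
        margins.append(top2_margin(dist))
        tokens.append(tok)
        if tok == EOS:
            break
    return tokens[len(prefix):], margins

def cot_decode(model: Model, prompt: List[str], k: int = 3, max_new: int = 32):
    """Branch on the top-k first tokens, continue each branch greedily, score paths."""
    first = model(tuple(prompt))
    paths = []
    for branch in sorted(first, key=first.get, reverse=True)[:k]:
        new_tokens, margins = greedy_continue(model, prompt + [branch], max_new)
        # Assumption: the answer is the last numeric token of the greedy continuation.
        numeric = [(t, m) for t, m in zip(new_tokens, margins) if t.isdigit()]
        answer, conf = numeric[-1] if numeric else (None, 0.0)
        paths.append({"tokens": [branch] + new_tokens, "answer": answer, "confidence": conf})
    return max(paths, key=lambda p: p["confidence"]), paths

# Hand-written toy "model"; "Q" stands in for the question. Unknown prefixes end the text.
TABLE = {
    ("Q",): {"Answer:": 0.6, "I": 0.3, "We": 0.1},
    ("Q", "Answer:"): {"5": 0.4, "8": 0.35, EOS: 0.25},
    ("Q", "I"): {"have": 0.9, "had": 0.1},
    ("Q", "I", "have"): {"3": 0.5, "2": 0.3, "4": 0.2},
    ("Q", "I", "have", "3"): {"+": 0.9, "-": 0.1},
    ("Q", "I", "have", "3", "+"): {"5": 0.95, "6": 0.05},
    ("Q", "I", "have", "3", "+", "5"): {"=": 1.0},
    ("Q", "I", "have", "3", "+", "5", "="): {"8": 0.97, "9": 0.03},
}

def toy_model(prefix: Prefix) -> Dict[str, float]:
    return TABLE.get(prefix, {EOS: 1.0})

best, paths = cot_decode(toy_model, ["Q"], k=3)
print(best["answer"], [p["answer"] for p in paths])  # 8 ['5', '8', None]
```

Plain greedy decoding would follow "Answer:" and return 5 with a margin of only 0.05; branching on the second-ranked first token exposes a step-by-step path whose answer, 8, is decoded with a much larger margin.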
2023-12-18
arXiv

Retrieval-Augmented Generation for Large Language Models: A Survey

Yun Xiong , Yunfan Gao , Xinyu Gao , Kangxiang Jia , Jinliu Pan
The paper reviews Retrieval-Augmented Generation (RAG) for Large Language Models, which integrates external knowledge to improve accuracy and credibility. It covers the evolution of RAG paradigms and their components, and introduces a new evaluation framework. The paper also discusses current challenges and future research directions.
Large Language Models (LLMs) showcase impressive capabilities but encounter challenges like hallucination, outdated knowledge, and non-transparent, untraceable reasoning processes. Retrieval-Augmented Generation (RAG) has emerged as a promising solution by incorporating knowledge from external databases. This enhances the accuracy and credibility of the generation, particularly for knowledge-intensive tasks, and allows for continuous knowledge updates and integration of domain-specific information. RAG synergistically merges LLMs' intrinsic knowledge with the vast, dynamic repositories of external databases. This comprehensive review paper offers a detailed examination of the progression of RAG paradigms, encompassing the Naive RAG, the Advanced RAG, and the Modular RAG. It meticulously scrutinizes the tripartite foundation of RAG frameworks, which includes the retrieval, the generation, and the augmentation techniques. The paper highlights the state-of-the-art technologies embedded in each of these critical components, providing a profound understanding of the advancements in RAG systems. Furthermore, this paper introduces an up-to-date evaluation framework and benchmark. Finally, it delineates the challenges currently faced and points out prospective avenues for research and development.
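The three components the review names (retrieval, augmentation, generation) can be illustrated with a Naive RAG loop. This is a minimal sketch, assuming bag-of-words cosine similarity in place of a dense retriever and stopping at the prompt that would be sent to an LLM; `retrieve` and `augment` are hypothetical helper names, and the corpus is a toy example.

```python
import math
import re
from collections import Counter
from typing import List

def tokenize(text: str) -> List[str]:
    return re.findall(r"[a-z0-9]+", text.lower())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(count * b[tok] for tok, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(query: str, corpus: List[str], k: int = 2) -> List[str]:
    """Retrieval: rank passages by bag-of-words cosine similarity to the query."""
    q = Counter(tokenize(query))
    return sorted(corpus, key=lambda doc: cosine(q, Counter(tokenize(doc))), reverse=True)[:k]

def augment(query: str, passages: List[str]) -> str:
    """Augmentation: pack the retrieved passages into the prompt."""
    context = "\n".join(f"[{i}] {p}" for i, p in enumerate(passages, start=1))
    return f"Answer using only the context below.\n{context}\n\nQuestion: {query}\nAnswer:"

corpus = [
    "Monolith is a real-time recommendation system with a collisionless embedding table.",
    "DeepSeek-VL is an open-source vision-language model.",
    "Tree of Thoughts explores multiple reasoning paths.",
]
query = "Which system uses a collisionless embedding table?"
prompt = augment(query, retrieve(query, corpus, k=1))
print(prompt)  # Generation: this prompt would be sent to an LLM (call omitted)
```

Advanced and Modular RAG, as the review describes them, refine these same stages (for example by rewriting the query or reranking retrieved passages) rather than replacing the overall retrieve-augment-generate flow.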
2023-09-03
arXiv

Large Language Models for Generative Recommendation: A Survey and Visionary Discussions

Lei Li , Yongfeng Zhang , Dugang Liu , Li Chen
This survey explores the potential of large language models (LLMs) in revolutionizing recommender systems by simplifying the recommendation process to a single stage, focusing on direct generation of recommendations. It examines the concept, necessity, and implementation of LLM-based generative recommendation for various tasks.
Large language models (LLMs) have not only revolutionized the field of natural language processing (NLP) but also have the potential to reshape many other fields, e.g., recommender systems (RS). However, most of the related work treats an LLM as a component of the conventional recommendation pipeline (e.g., as a feature extractor), which may not fully leverage the generative power of LLMs. Instead of separating the recommendation process into multiple stages, such as score computation and re-ranking, this process can be simplified to one stage with an LLM: directly generating recommendations from the complete pool of items. This survey reviews the progress, methods, and future directions of LLM-based generative recommendation by examining three questions: 1) what generative recommendation is, 2) why RS should advance to generative recommendation, and 3) how to implement LLM-based generative recommendation for various RS tasks. We hope that this survey can provide the context and guidance needed to explore this interesting and emerging topic.
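When an LLM generates recommendations directly as free text, its output still has to be mapped back onto items that exist in the pool. The sketch below assumes items are identified by titles and uses fuzzy string matching from Python's standard `difflib` as the grounding step. The prompt wording, `build_prompt`, `ground`, and the pretend LLM output are all hypothetical, and the LLM call itself is omitted.

```python
import difflib
from typing import List

def build_prompt(history: List[str], k: int) -> str:
    """One-stage formulation: ask the model to name items directly."""
    lines = "\n".join(f"- {title}" for title in history)
    return (f"The user recently interacted with:\n{lines}\n"
            f"Recommend {k} other items, one title per line.")

def ground(generated: str, pool: List[str], cutoff: float = 0.6) -> List[str]:
    """Map each generated line to its closest title in the item pool.

    Lines with no sufficiently similar item (hallucinated items) are dropped,
    and duplicates are removed while preserving order.
    """
    results: List[str] = []
    for line in generated.splitlines():
        title = line.strip().lstrip("-").strip()
        match = difflib.get_close_matches(title, pool, n=1, cutoff=cutoff)
        if match and match[0] not in results:
            results.append(match[0])
    return results

pool = ["The Matrix", "Inception", "Interstellar", "Toy Story"]
prompt = build_prompt(["The Matrix", "Inception"], k=3)  # would be sent to the LLM
# Pretend LLM output: one misspelled title, one exact title, one invented title.
fake_output = "- Interstelar\n- Toy Story\n- A Movie That Does Not Exist"
print(ground(fake_output, pool))  # ['Interstellar', 'Toy Story']
```

String matching is the simplest grounding choice; identifier-based generation with constrained decoding avoids invalid outputs altogether, at the cost of having to design item identifiers.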
2023-05-17
arXiv

Tree of Thoughts: Deliberate Problem Solving with Large Language Models

Shunyu Yao (Princeton University) , Dian Yu (Google DeepMind) , Jeffrey Zhao (Princeton University) , Izhak Shafran (Google DeepMind) , Thomas L. Griffiths (Princeton University)
The paper introduces Tree of Thoughts (ToT), a new framework for language model inference that enhances problem-solving by enabling exploration and strategic decision-making. ToT allows LMs to consider multiple reasoning paths and self-evaluate choices, improving performance on tasks requiring planning or search. Experiments show significant improvements in problem-solving abilities, such as increasing the success rate in the Game of 24 from 4% to 74%.
Language models are increasingly being deployed for general problem solving across a wide range of tasks, but are still confined to token-level, left-to-right decision-making processes during inference. This means they can fall short in tasks that require exploration, strategic lookahead, or where initial decisions play a pivotal role. To surmount these challenges, we introduce a new framework for language model inference, Tree of Thoughts (ToT), which generalizes over the popular Chain of Thought approach to prompting language models, and enables exploration over coherent units of text (thoughts) that serve as intermediate steps toward problem solving. ToT allows LMs to perform deliberate decision making by considering multiple different reasoning paths and self-evaluating choices to decide the next course of action, as well as looking ahead or backtracking when necessary to make global choices. Our experiments show that ToT significantly enhances language models' problem-solving abilities on three novel tasks requiring non-trivial planning or search: Game of 24, Creative Writing, and Mini Crosswords. For instance, in Game of 24, while GPT-4 with chain-of-thought prompting only solved 4% of tasks, our method achieved a success rate of 74%. Code repo with all prompts: https://github.com/princeton-nlp/tree-of-thought-llm.
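The deliberate search described above can be sketched as breadth-first search over partial solutions of Game of 24. Assumptions: the thought generator enumerates every way to combine two remaining numbers with one arithmetic operation, and the evaluator is an exact brute-force solvability check standing in for the LM's self-evaluation (in ToT both would be LM prompts); `tot_bfs` and `breadth` are hypothetical names. Only the search skeleton is the point: generate thoughts, evaluate states, keep the best b.

```python
from fractions import Fraction
from itertools import combinations
from typing import List, Optional, Sequence, Tuple

State = Tuple[Tuple[Fraction, ...], Tuple[str, ...]]  # remaining numbers, steps so far

def propose(state: State) -> List[State]:
    """Thought generator: combine any two remaining numbers with one operation."""
    nums, steps = state
    out: List[State] = []
    for i, j in combinations(range(len(nums)), 2):
        a, b = nums[i], nums[j]
        rest = tuple(n for idx, n in enumerate(nums) if idx not in (i, j))
        results = [(a + b, f"{a}+{b}"), (a * b, f"{a}*{b}"),
                   (a - b, f"{a}-{b}"), (b - a, f"{b}-{a}")]
        if b != 0:
            results.append((a / b, f"{a}/{b}"))
        if a != 0:
            results.append((b / a, f"{b}/{a}"))
        for value, expr in results:
            out.append((rest + (value,), steps + (f"{expr}={value}",)))
    return out

def solvable(nums: Tuple[Fraction, ...]) -> bool:
    if len(nums) == 1:
        return nums[0] == 24
    return any(solvable(child[0]) for child in propose((nums, ())))

def evaluate(state: State) -> float:
    """Stand-in for LM self-evaluation: 1.0 if 24 is still reachable, else 0.0."""
    return 1.0 if solvable(state[0]) else 0.0

def tot_bfs(numbers: Sequence[int], breadth: int = 5) -> Optional[List[str]]:
    frontier: List[State] = [(tuple(Fraction(n) for n in numbers), ())]
    for _ in range(len(numbers) - 1):            # one thought per combination step
        candidates = [c for s in frontier for c in propose(s)]
        candidates.sort(key=evaluate, reverse=True)
        frontier = candidates[:breadth]          # keep the b most promising states
    for nums, steps in frontier:
        if nums[0] == 24:
            return list(steps)
    return None

print(tot_bfs([4, 9, 10, 13]))
```

With a perfect evaluator the search cannot fail on solvable inputs; with an LM evaluator the breadth b is what buys robustness to misjudged intermediate states.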
2022-09-16
arXiv

Monolith: Real Time Recommendation System With Collisionless Embedding Table

Zhuoran Liu , Leqi Zou , Xuan Zou , Caihua Wang , Biao Zhang
This paper introduces Monolith, a real-time recommendation system designed for online training with a collisionless embedding table. It optimizes memory usage and provides a fault-tolerant architecture, enabling real-time learning by integrating customer feedback.
Building a scalable and real-time recommendation system is vital for many businesses driven by time-sensitive customer feedback, such as short-video ranking or online ads. Despite the ubiquitous adoption of production-scale deep learning frameworks like TensorFlow or PyTorch, these general-purpose frameworks fall short of business demands in recommendation scenarios for various reasons: on one hand, tweaking systems based on static parameters and dense computations for recommendation with dynamic and sparse features is detrimental to model quality; on the other hand, such frameworks are designed with the batch-training and serving stages completely separated, preventing the model from interacting with customer feedback in real time. These issues led us to reexamine traditional approaches and explore radically different design choices. In this paper, we present Monolith, a system tailored for online training. Our design has been driven by observations of our application workloads and production environment that reflect a marked departure from other recommendation systems. Our contributions are manifold: first, we crafted a collisionless embedding table with optimizations such as expirable embeddings and frequency filtering to reduce its memory footprint; second, we provide a production-ready online training architecture with high fault tolerance; finally, we proved that system reliability could be traded off for real-time learning. Monolith has successfully landed in the BytePlus Recommend product.