I am Tommy Chien (Hongjin Qian), a postdoctoral researcher specializing in Natural Language Processing (NLP), jointly affiliated with Peking University and the Beijing Academy of Artificial Intelligence (BAAI). I earned my PhD in 2024 from the Gaoling School of Artificial Intelligence (GSAI) at Renmin University of China, under the supervision of Prof. Zhicheng Dou and Prof. Ji-Rong Wen. I hold a Master’s degree from the University of Sydney (2019) and a Bachelor’s degree from Nankai University (2017). My experience includes research internships at Huawei and WeChat Group, as well as contributions to AI startups.
Oct 2024 - Present, Beijing, China
Nov 2023 - Present, Beijing, China
Beijing Academy of Artificial Intelligence (BAAI) is a non-profit research institute dedicated to promoting collaboration between academia and industry, fostering top AI talent, and pursuing long-term research on the fundamentals of AI technology.
Jun 2023 - Oct 2023, Beijing, China
Apr 2022 - May 2023, Beijing, China
Sept 2020 - Jun 2024, Beijing, China
The Gaoling School of Artificial Intelligence (GSAI) at Renmin University of China (RUC) is a prestigious institution dedicated to shaping the future of AI. According to CSRankings, GSAI ranked first worldwide in Information Retrieval every year from 2022 to 2024.
Elensdata is a start-up offering high-calibre data science and AI solutions for businesses in media, finance, and other sectors.
23 Papers in Total
15 Conference Papers
10 First-Author Papers
20 Patents in Total
18 Granted Patents
6 First-Inventor Patents
Reviewer:
NeurIPS, ICLR, ACL, EMNLP,
EACL, ACL ARR, SIGKDD, TheWebConf, TOIS
This paper explores two fine-tuning phenomena in Large Language Models (LLMs): the superior performance of optimizing only the Q and K matrices over full parameter optimization, and the benefit of distinct learning rates for faster convergence. Through theoretical and empirical analysis, the authors propose a new, efficient fine-tuning strategy that enhances generalization, memory efficiency, and optimization speed, validated on benchmark datasets.
This survey examines recent advancements in conversational search, a next-generation paradigm that uses natural language dialogue and LLMs to enable intuitive, multi-turn information retrieval, highlighting critical modules, challenges, and future directions for enhancing user experience and system intelligence in search engines.
This survey introduces a unified framework to evaluate the trustworthiness of Retrieval-Augmented Generation (RAG) systems across six key dimensions—factuality, robustness, fairness, transparency, accountability, and privacy—offering a structured benchmark and comprehensive evaluations to guide future research and enhance RAG reliability in real-world applications.
RAG-Studio is an efficient self-aligned training framework that adapts general RAG models to specialized domains solely through synthetic data, producing a domain-specific RAG system that outperforms the use of human-annotated data for fine-tuning.
MemoRAG is a novel retrieval-augmented generation framework that incorporates long-term memory to handle tasks with ambiguous information needs, which standard RAG systems struggle with. By using a dual-system architecture to form global memory and generate draft answers for guiding retrieval, MemoRAG outperforms conventional RAG in both complex and straightforward tasks.
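The dual-system idea can be illustrated with a toy sketch. This is not the released MemoRAG code: every model call below is a stub invented for illustration, standing in for the lightweight memory model (which drafts clue answers over the full context) and the full-strength generator (which answers from retrieved evidence).

```python
# Illustrative sketch of MemoRAG's dual-system pipeline (all functions are
# toy stand-ins, not the actual implementation).

def memory_model_draft(question, full_context):
    """Stub for the long-range memory model: returns draft 'clue' strings.
    A real system runs a compressed-memory LLM over the whole corpus."""
    clues = []
    for sentence in full_context.split("."):
        if any(w in sentence.lower() for w in question.lower().split()):
            clues.append(sentence.strip())
    return clues or [question]

def retrieve(clues, corpus, k=2):
    """Stub retriever: rank passages by word overlap with the clues."""
    clue_words = set(" ".join(clues).lower().split())
    def score(passage):
        return len(clue_words & set(passage.lower().split()))
    return sorted(corpus, key=score, reverse=True)[:k]

def generate(question, evidence):
    """Stub standing in for the full-strength generator LLM."""
    return f"Answer to '{question}' based on: " + " | ".join(evidence)

corpus = [
    "The report discusses revenue growth in 2023.",
    "Weather patterns shifted in the northern region.",
    "Revenue grew because of new subscription products.",
]
question = "Why did revenue grow?"
clues = memory_model_draft(question, " ".join(corpus))
evidence = retrieve(clues, corpus)
print(generate(question, evidence))
```

The key design point the sketch captures: retrieval is guided by the memory model's draft clues rather than by the raw (possibly ambiguous) question.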
This work challenges the necessity of long-LLMs for long-context tasks by introducing LC-Boost, a framework that enables short-LLMs to effectively handle long-context tasks by adaptively accessing and utilizing relevant context, achieving improved performance with fewer resources.
We extended the context length of Llama-3-8B-Instruct from 8K to 80K using QLoRA fine-tuning, achieving superior performance across various long-context tasks while preserving short-context capabilities. The entire process completed in just 8 hours on an 8xA800 GPU machine and was driven by only 3.5K synthetic samples from GPT-4, highlighting the underexplored potential of LLMs for context extension.
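A common ingredient in context extension (alongside fine-tuning on long synthetic data) is enlarging the RoPE rotary base, which stretches every rotary frequency's wavelength so distant positions stay distinguishable. The toy calculation below illustrates that effect only; the base values are illustrative and are not the paper's exact settings.

```python
import math

def rope_wavelengths(base, dim=128):
    """Wavelength (in tokens) of each RoPE frequency pair: 2*pi*base^(2i/d)."""
    return [2 * math.pi * base ** (2 * i / dim) for i in range(dim // 2)]

short = rope_wavelengths(base=10_000)    # illustrative "short-context" base
long_ = rope_wavelengths(base=500_000)   # illustrative enlarged base

# Enlarging the base grows every non-trivial wavelength, so positional
# rotations cycle more slowly and cover longer sequences.
assert all(w_l >= w_s for w_s, w_l in zip(short, long_))
print(f"longest wavelength: {short[-1]:.0f} -> {long_[-1]:.0f} tokens")
```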
This paper introduces a Chunking-Free In-Context (CFIC) retrieval method for Retrieval-Augmented Generation (RAG) systems, improving evidence retrieval accuracy and efficiency by eliminating the need for document chunking and utilizing advanced decoding strategies.
DKGen is a novel framework that improves factual accuracy in text generation by iteratively generating short text segments, dynamically selecting relevant references to avoid knowledge mix-up, and leveraging cross-attention distribution for better use of external knowledge, outperforming baseline models in experiments.
This research explores techniques for improving conditional question answering by learning from structured documents.
EdiRCS is a highly efficient conversational query rewriting model that enhances search performance by selecting most of the rewrite tokens directly from the dialogue and generating only a few new tokens, supplemented by search-oriented training objectives; it outperforms state-of-the-art models on benchmarks with low latency and robustness to varied dialogues.
LLM4CS is a prompting framework that leverages large language models to enhance conversational search by generating multiple query rewrites and hypothetical responses, integrating them into a robust representation of users’ contextual search intent, and significantly outperforming existing methods on key benchmarks.
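The aggregation step in LLM4CS can be sketched with stdlib Python. This is not the released code: the embedding function below is a toy bag-of-words stand-in for a real dense encoder, and the rewrites and hypothetical responses are invented examples. It only illustrates mean-pooling several generated texts into one intent vector.

```python
from collections import Counter

# Tiny fixed vocabulary for the toy embedding (illustration only).
VOCAB = ["jazz", "history", "origin", "music", "new", "orleans"]

def embed(text):
    """Toy embedding: word counts over a tiny fixed vocabulary."""
    counts = Counter(text.lower().split())
    return [counts[w] for w in VOCAB]

def aggregate(texts):
    """Mean-pool the embeddings of all rewrites/responses into one vector."""
    vectors = [embed(t) for t in texts]
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(VOCAB))]

# Hypothetical LLM outputs for the turn "where did it start?" in a
# conversation about jazz (invented for illustration).
rewrites = ["where did jazz music originate", "origin of jazz"]
hypothetical_responses = ["jazz music began in new orleans"]

intent_vector = aggregate(rewrites + hypothetical_responses)
print(intent_vector)
```

Pooling over multiple rewrites and responses makes the final representation less sensitive to any single flawed generation, which is the robustness argument behind the framework.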
This paper introduces WebBrain, a new NLP task focused on generating short, factually-correct articles with references by mining supporting evidence from the web, and presents ReGen, a framework that enhances factual accuracy through improved evidence retrieval and task-specific pre-training, outperforming state-of-the-art methods on newly constructed large-scale datasets.
LeCoRE is a sparse lexical-based conversational retriever that enhances conversational search by generating denoised and interpretable session representations through knowledge distillation and external query rewrites, significantly improving performance on public datasets compared to existing methods.
TopReC is a topic-enhanced personalized retrieval-based chatbot that deconstructs long and noisy dialogue histories into topic-dependent segments, filtering out irrelevant data to learn a more accurate and consistent user personality, significantly outperforming previous state-of-the-art methods on large datasets.
CRDR is a unified framework for conversational search that combines query rewriting and context modeling, enhancing accuracy and efficiency by making minimal modifications to the original query and improving contextualized query embeddings through explicit term highlighting, outperforming baseline models in experiments on TREC CAsT-19 and CAsT-20 datasets.
ConvTrans is a data augmentation method that automatically transforms web search sessions into conversational search sessions, addressing the data scarcity problem in conversational dense retrieval by bridging the gaps in session quality and query form between the two settings; models trained on ConvTrans-generated data achieve performance comparable to those trained on expensive, manually-created datasets.
The MSP model refines and utilizes extensive dialogue history to enhance personalized response generation by extracting key information and leveraging data from similar users, outperforming existing methods in generating more informative and personalized responses.
This paper introduces a pre-training approach that leverages the hierarchical structure of HTML web pages and their DOM trees to enhance language models for information retrieval, demonstrating significant improvements in ranking performance over traditional pre-trained models by incorporating structured web data.
COTED is a novel framework for few-shot conversational dense retrieval that enhances context denoising through curriculum contrastive learning, progressively training the model to filter out noisy conversational turns, and demonstrating superior performance on CAsT-19 and CAsT-20 datasets compared to state-of-the-art baselines.
IMPChat is a retrieval-based personalized chatbot model that learns an implicit user profile by separately modeling the user’s personalized language style and preferences, dynamically weighting context-relevant history, and fusing these signals for response ranking, outperforming baseline models in experiments on large datasets.
Pchatbot is a large-scale Chinese dialogue dataset, significantly larger than existing datasets, that has been meticulously normalized and includes anonymized user IDs and timestamps, enabling the development of personalized dialogue models that learn implicit user personality from dialogue history, with preliminary benchmarks provided for future comparisons.
The Initiative-Imitate model addresses the challenge of overly proactive dialogue agents by balancing the chatbot’s role between speaker and listener, enhancing conversational fluency and engagement through adaptive initiative, and showing competitive results in both automatic and manual evaluations.
This patent proposes a semantic parsing method that combines rules and learning-based approaches.
MemoRAG is a next-generation retrieval-augmented generation system with long-term memory, enabling superior context-aware information retrieval and enhanced performance on complex tasks where traditional RAG systems struggle.