Researcher @ BAAI
My research sits at the intersection of information retrieval and large language models — specifically, how AI systems can actively search, reason over, and synthesize knowledge to answer complex, real-world questions.
Recent work focuses on LLM-based deep research agents that autonomously decompose multi-step tasks, retrieve heterogeneous evidence, and produce grounded answers — including reward-driven search (InfoFlow), deep research for reasoning models (WebThinker), and agentic memory (MemoBrain). This builds on a foundation in retrieval-augmented generation: memory-augmented architectures (MemoRAG) and information-foraging-guided reasoning (Scent of Knowledge).
Agent
2025-Present
13 publications
AI agents represent the next evolution of LLMs, moving from passive conversation to active task execution.
Retrieval-Augmented Generation
2022-Present
12 publications
Retrieval-Augmented Generation (RAG) is a method that first retrieves relevant information from an external knowledge source and then combines it with the model's input to generate more accurate and informative responses.
Conversational Search
2021-Present
9 publications
Conversational search is an interactive search paradigm in which users and systems engage in a dialogue, using queries, clarifications, and refinements across multiple turns to iteratively reach more accurate, context-aware results.
Others
2020-Present
10 publications
Dialogue systems, QA systems, ranking, retrieval, theory, etc.
Jun 2023 - Oct 2023, Beijing, China
Elensdata is a start-up offering high-calibre data science and AI solutions to businesses in media, finance, and other sectors.
This project explores techniques for expanding the scale of knowledge and memory available to LLMs at the input stage. The goal is to overcome current LLMs' limitations in complex knowledge reasoning, knowledge memorization, and global knowledge understanding. The approach is to build a hierarchical memory mechanism that supports scaling, memorizing, and dynamically coordinating retrieval over multi-source, heterogeneous knowledge.
MemoRAG is a next-generation retrieval-augmented generation system with long-term memory, enabling more context-aware information retrieval and better performance on complex tasks where conventional RAG systems struggle.
Informatica is a comprehensive collection of systematic research projects focused on deep research systems. Our mission is to provide open-source, scalable frameworks, datasets, data synthesis methods, models, and demonstrations.