Research

Research Highlights

Jeonghye Kim and Sojeong Rhee (Prof. Youngchul Sung’s Lab) Achieve State-of-the-Art Performance in Large Language Model Agents

<(from left) Ph.D. Candidate Jeonghye Kim, M.S. Student Sojeong Rhee, and Prof. Youngchul Sung>

With the emergence of OpenAI’s ChatGPT at the end of 2022, generative Large Language Models (LLMs) have become one of the central research areas in artificial intelligence. Today, LLMs are evolving beyond simply understanding prompts and providing answers into LLM Agents capable of interacting with their environments, engaging in multiple rounds of action, observation, and reasoning to accomplish assigned tasks. Such Agentic AI represents a key direction for future development, in which AI systems autonomously interact with the environment to execute tasks without human intervention.


For example, when a household robot is given the task of “cooking soybean paste stew,” it must independently identify and gather the necessary ingredients, prepare them, place them in a pot, put the pot on the stove, turn on the heat, cook the stew, and finally turn off the stove once the dish is complete. A human cannot spell out every one of these steps individually: the robot must act, observe the results, reason about them, and determine its next action autonomously.


A representative LLM Agent model for this purpose is ReAct, introduced in 2023 by Google Brain and Princeton University. ReAct decides on actions step by step, interleaving reasoning about future plans with actions and observations. However, ReAct suffers from a key limitation: it often hallucinates or produces actions disconnected from the actual state of the environment.
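The ReAct-style loop can be pictured as the model emitting a free-form reasoning trace ("Thought") followed by an environment action, then receiving an observation that is appended to the context for the next step. A minimal sketch of this loop follows; the prompt format, the `llm` stub, and all function names here are illustrative assumptions, not the authors' actual implementation:

```python
# Minimal sketch of a ReAct-style agent loop (illustrative only; not the
# paper's implementation). `llm` stands in for any text-completion model.

def llm(prompt: str) -> str:
    # Placeholder model: in practice this would call a real LLM.
    return "Thought: I should look for ingredients.\nAction: open fridge"

def react_step(task: str, history: list[str]) -> tuple[str, str]:
    """Ask the model for a reasoning trace and the next action."""
    prompt = f"Task: {task}\n" + "\n".join(history) + "\nThought:"
    output = llm(prompt)
    # Split the completion into its "Thought" and "Action" parts.
    thought = output.split("Action:")[0].replace("Thought:", "").strip()
    action = output.split("Action:")[-1].strip()
    return thought, action

# One iteration: think, act, then record the environment's observation.
history: list[str] = []
thought, action = react_step("cook soybean paste stew", history)
observation = "fridge contains soybean paste and tofu"  # returned by the environment
history += [f"Thought: {thought}", f"Action: {action}", f"Observation: {observation}"]
```

The loop repeats, with the growing history giving the model its memory of past steps, until the task is judged complete.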


To overcome these shortcomings, Ph.D. candidate Jeonghye Kim and M.S. student Sojeong Rhee from Professor Youngchul Sung’s lab, in collaboration with Professor Kyomin Jung’s lab at Seoul National University, proposed a new LLM Agent model called ReflAct. At each step, ReflAct considers both the ultimate goal and the current situation simultaneously, significantly reducing hallucination and enabling the agent to recognize its own mistakes. When combined with state-of-the-art reasoning LLMs, ReflAct achieved an impressive 93.3% task success rate on the ALFWorld benchmark, a simulated household environment.
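Conceptually, the change from ReAct lies in what the model is asked to produce at each intermediate step: rather than a forward-looking plan, it is prompted to reflect on how its current state relates to the overall goal before choosing an action. A rough sketch of that prompting difference is below; the instruction wording and helper names are assumptions for illustration, not the paper's exact prompts:

```python
# Rough sketch contrasting the intermediate step of ReAct vs. ReflAct
# (illustrative prompt wording; not taken from the paper).

REACT_INSTRUCTION = (
    "Think about what to do next, then choose an action."
)
REFLACT_INSTRUCTION = (
    "First restate the overall goal and describe your current state "
    "relative to it, then choose an action grounded in that reflection."
)

def build_prompt(task: str, history: list[str], reflect: bool) -> str:
    """Assemble the per-step prompt; reflect=True gives the ReflAct-style variant."""
    instruction = REFLACT_INSTRUCTION if reflect else REACT_INSTRUCTION
    return "\n".join([f"Task: {task}", *history, instruction])

p = build_prompt(
    "cook soybean paste stew",
    ["Observation: you are in the kitchen"],
    reflect=True,
)
```

Grounding each step in the goal and the observed state, rather than in a speculative plan, is what keeps the agent's reasoning from drifting away from reality.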


This result is expected to accelerate the development of Agentic AI not only for household assistants but also for diverse applications such as scientific exploration and military operations. The work will be presented at the Main Conference of EMNLP 2025 this coming November.


Paper link: https://arxiv.org/pdf/2505.15182