U-M team reaches next phase of Amazon Alexa Prize SimBot Challenge
As virtual assistants grow more ubiquitous in personal and household devices, a new generation of artificial intelligence research races to keep up with demands for innovation. In a move to involve academia in industry progress, Amazon launched the Alexa Prize SimBot Challenge, a competition focused on “helping advance development of next-generation virtual assistants that will assist humans in completing real-world tasks.” After a preliminary round in March 2022, a University of Michigan team led by doctoral student Yichi Zhang and advised by Professor Joyce Chai has advanced to the next phase, in which their bot will interact with real human users.
Team SEAGULL, which consists of students from Chai’s Situated Language and Embodied Dialogue Lab, envisions a bot “capable of attending to users’ needs, following users’ instructions, collaborating with users, and continuously improving itself through interaction with users.”
Participants in the challenge have been tasked with building machine-learning models for natural language understanding and human-agent interaction. Each bot entry is evaluated on its ability to respond to user commands and multimodal sensory inputs in order to execute game-like tasks on Amazon Echo Show devices. The bots operate within a virtual environment, and users can watch the effect their inputs have.
“The objective is to work with agents over a virtual interface and see if they are able to follow natural language commands to complete a task,” says Zhang.
The team designed SEAGULL to address two key shortcomings in recent learning agents: poor symbolic reasoning and an inability to deal with failure. Symbolic reasoning, Zhang says, is one of the key capabilities enabling humans to plan, adapt, and communicate.
“Despite a lot of advances in deep learning,” he says, “agents still lack the ability to reason over symbols. Humans can use concepts in our minds to come up with very efficient and adaptive plans, so agents will need these same capabilities to work alongside us.”
SEAGULL draws on past work in Chai’s lab to explicitly represent the structures of a task that can be mapped to natural language during training.
“That’s why we call our system a neural-symbolic system,” Zhang explains. “The neural network deals with perception, and our symbolic system deals with reasoning.”
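As a rough illustration of the division of labor Zhang describes, the sketch below shows a neural perception step that grounds raw observations into symbolic facts, and a symbolic planner that reasons over those facts. All names and interfaces here are illustrative assumptions, not SEAGULL's actual code.

```python
from dataclasses import dataclass

# Hypothetical sketch of a neural-symbolic split; not SEAGULL internals.

@dataclass
class Observation:
    image: bytes          # raw egocentric frame from the virtual environment
    instruction: str      # the user's natural language command

def neural_perception(obs: Observation) -> dict:
    """Neural side: map raw pixels and text to symbolic facts.

    A real system would run vision and language models here; we stub
    the output as a set of grounded predicates.
    """
    return {
        "objects": [("mug", "on", "counter")],   # detected object relations
        "goal": ("mug", "in", "sink"),           # parsed from the instruction
    }

def symbolic_planner(state: dict) -> list[str]:
    """Symbolic side: reason over the predicates to produce an action plan."""
    obj, _, target = state["goal"]
    return [f"goto({obj})", f"pickup({obj})",
            f"goto({target})", f"place({obj}, {target})"]

# Perception grounds the scene into symbols; the planner reasons over them.
plan = symbolic_planner(neural_perception(Observation(b"", "put the mug in the sink")))
print(plan)  # ['goto(mug)', 'pickup(mug)', 'goto(sink)', 'place(mug, sink)']
```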
The team is also working to make SEAGULL more adaptive by learning from failures. The SimBot Challenge demands that a bot deal with several problems at once: natural language understanding, dialogue, action planning, and incremental knowledge acquisition.
“So we need to have a modular tool to weave all this together,” says Zhang. He adds that this approach will enable agent designers to more efficiently track down failure points when the agent runs into trouble.
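A minimal sketch of what that modular, failure-aware design might look like follows; the module names and error-handling scheme are assumptions for illustration only, not a description of SEAGULL's implementation.

```python
# Illustrative-only sketch of a modular agent loop with per-module failure
# tracking, in the spirit of the design Zhang outlines.

class ModuleError(Exception):
    """An error tagged with the module it originated in."""
    def __init__(self, module: str, detail: str):
        super().__init__(f"[{module}] {detail}")
        self.module = module

def understand(utterance: str) -> dict:
    # Stubbed language module: a real one would parse the utterance.
    if not utterance.strip():
        raise ModuleError("language", "empty instruction")
    return {"intent": "move", "object": "mug"}

def plan(intent: dict) -> list[str]:
    # Stubbed planning module.
    if "object" not in intent:
        raise ModuleError("planner", "no object to act on")
    return [f"pickup({intent['object']})"]

def act(step: str) -> None:
    print(f"executing {step}")

def run(utterance: str) -> None:
    """Run the pipeline; a failure names the module that caused it."""
    try:
        for step in plan(understand(utterance)):
            act(step)
    except ModuleError as err:
        # Knowing which module failed makes debugging and recovery targeted.
        print(f"recovering from failure in '{err.module}' module: {err}")

run("pick up the mug")
run("")  # triggers a traceable failure in the language module
```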
Chai commented that projects like SEAGULL are the first step in developing embodied physical agents that can communicate and collaborate with humans.
“The ultimate goal is to work with physical robots,” she says, “but there are many limitations there.” These limitations have led much of the AI community to develop virtual simulated environments to do data collection, training, and evaluation.
According to Chai, “A virtual environment really gets us to start innovating, because otherwise it would be very hard to collect data and look at what the fundamental problems are. The simulated environment provides a nice platform for us to address some of these problems head-on.”
“This competition serves as an excellent opportunity to develop something that can really interact with real humans,” says Zhang.
As the team moves into the second round of evaluation with human users, they look to incorporate efficiency, adaptability, and commonsense communication into SEAGULL.
“This is a really exciting time for embodied AI research,” says Chai. “We’re finally at this stage that we have a lot of promising progress in NLP, machine learning, and embodied agents, and we can look at them holistically with this project.”