As autonomous agents become increasingly woven into the fabric of society—from self-driving cars to personal robot manipulators to AI assistants—our lab aims to ensure their seamless interaction with people. However, integrating these systems into human-centered environments in a way that aligns with human expectations is a formidable challenge. Specifying human objectives to robots is difficult because these objectives are complex, context-dependent, and inherently subjective. Without the right objectives, autonomous systems may exhibit unexpected or even dangerous behaviors.

Learning these objectives (for instance, as reward functions) has emerged as a popular alternative to manual specification, but it comes with its own set of difficulties: 1) getting the right data to supervise the learning is hard because humans are imperfect, not infinitely queryable, and have unique and changing preferences; 2) the representations we choose to mathematically express human objectives may themselves be wrong, preventing us from ever capturing the desired behaviors; 3) reliably quantifying misalignment, i.e. discrepancies from expected behavior, remains underexplored even though it is essential for system safety.
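To make the reward-learning framing concrete, below is a minimal sketch of one standard formulation from the literature (illustrative, not a description of any particular system of ours): the human is modeled as Boltzmann-rational, choosing among candidate trajectories with probability proportional to the exponentiated reward, and the robot maintains a posterior over candidate reward weights. All feature names, weights, and numbers are assumptions made for the example.

    import numpy as np

    rng = np.random.default_rng(0)

    # Each candidate trajectory is summarized by a feature vector, e.g.
    # [efficiency, cup_orientation]; the reward is a linear combination of
    # these features with unknown weights.
    features = rng.uniform(0.0, 1.0, size=(5, 2))   # 5 candidate trajectories
    true_w = np.array([0.3, 0.7])                   # hidden human preference
    beta = 5.0                                      # rationality coefficient

    # Hypothesis space: a grid of possible reward weights (w1 + w2 = 1).
    w1 = np.linspace(0.0, 1.0, 101)
    hypotheses = np.stack([w1, 1.0 - w1], axis=1)
    posterior = np.full(len(hypotheses), 1.0 / len(hypotheses))

    def boltzmann_choice_probs(w, feats, beta):
        """P(human picks each candidate | reward weights w)."""
        rewards = feats @ w
        exp_r = np.exp(beta * (rewards - rewards.max()))
        return exp_r / exp_r.sum()

    # Simulate a few noisy human choices and update the posterior over weights.
    for _ in range(10):
        probs = boltzmann_choice_probs(true_w, features, beta)
        choice = rng.choice(len(features), p=probs)
        likelihoods = np.array(
            [boltzmann_choice_probs(w, features, beta)[choice] for w in hypotheses])
        posterior *= likelihoods
        posterior /= posterior.sum()

    print("Most likely reward weights:", hypotheses[posterior.argmax()])

The difficulties above surface directly even in this toy model: noisy or scarce choices slow the posterior down, and if the person's true preference depends on a feature outside the hypothesis space, no amount of data recovers it.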

Our goal is to develop autonomous agents whose behavior aligns with human expectations—whether the human is an expert system designer, a novice end-user, or another AI stakeholder. Our research combines expertise from robotics, deep learning, cognitive psychology, and probabilistic reasoning to develop more aligned, generalizable, and robust learning algorithms.

Asking for the Right Data

Typical methods that learn from human feedback (e.g. RLHF) treat humans as infinitely queryable oracles. However, individual humans have unique and evolving preferences, objectives, and biases that may not be fully reflected in canned internet data. Our research explores ways to effectively learn human objectives from noisy, incomplete, or inconsistent data. We focus on designing algorithms that can extract meaningful information from limited interactions, using structure, simulation, and powerful priors. This allows autonomous systems to better understand and anticipate human needs.
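As one concrete illustration of making a small query budget count, the sketch below greedily selects the pairwise comparison whose answer is expected to reduce the robot's uncertainty over reward weights the most, an information-gain style heuristic. It reuses the toy Boltzmann-rational preference model from the sketch above; the features, hypothesis grid, and rationality coefficient are illustrative assumptions rather than our deployed algorithm.

    import numpy as np

    rng = np.random.default_rng(1)

    def preference_prob(w, phi_a, phi_b, beta=5.0):
        """P(human prefers trajectory A over B | reward weights w)."""
        return 1.0 / (1.0 + np.exp(-beta * (phi_a - phi_b) @ w))

    def entropy(p):
        p = np.clip(p, 1e-12, 1.0)
        return -np.sum(p * np.log(p))

    def expected_posterior_entropy(posterior, hypotheses, phi_a, phi_b):
        """Expected entropy of the belief over weights after asking 'A or B?'."""
        p_a_given_w = np.array([preference_prob(w, phi_a, phi_b) for w in hypotheses])
        p_a = np.sum(posterior * p_a_given_w)              # marginal P(answer is A)
        post_if_a = posterior * p_a_given_w / max(p_a, 1e-12)
        post_if_b = posterior * (1.0 - p_a_given_w) / max(1.0 - p_a, 1e-12)
        return p_a * entropy(post_if_a) + (1.0 - p_a) * entropy(post_if_b)

    # Candidate trajectories (feature vectors) and a uniform prior over weights.
    features = rng.uniform(0.0, 1.0, size=(6, 2))
    w1 = np.linspace(0.0, 1.0, 51)
    hypotheses = np.stack([w1, 1.0 - w1], axis=1)
    posterior = np.full(len(hypotheses), 1.0 / len(hypotheses))

    # With a tiny query budget, greedily ask the comparison whose answer is
    # expected to shrink the robot's uncertainty the most.
    pairs = [(i, j) for i in range(len(features)) for j in range(i + 1, len(features))]
    best = min(pairs, key=lambda ij: expected_posterior_entropy(
        posterior, hypotheses, features[ij[0]], features[ij[1]]))
    print("Most informative query: compare trajectories", best)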

Interactively Arriving at Shared Task Representations

To act in the world, robots rely on a representation of salient task features: for example, to hand over a cup of coffee, the robot may consider efficiency and cup orientation in its behavior. But if we want robots to act for and with people, their representations must be not only functional but also reflective of what humans care about, i.e. they must be aligned with human ones. If they are not, misalignment can lead to unintended and potentially harmful behavior; for example, we saw a robot arm move a coffee cup inches away from a person's face because it lacked an understanding of personal space. Our research focuses on aligning robot representations with humans' via interactive processes in which robots and humans can find shared task representations.
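As a toy illustration of expanding a robot's representation to include something a person cares about, the sketch below learns a "personal space" feature from a handful of human labels and adds it to the robot's existing feature set. The labels, featurization, and training loop are illustrative assumptions, not our actual pipeline.

    import numpy as np

    rng = np.random.default_rng(2)

    # Robot end-effector positions and the position of a person in the workspace.
    person = np.array([0.5, 0.5, 0.5])
    states = rng.uniform(0.0, 1.0, size=(200, 3))

    # The human labels a state as uncomfortable (1) when the robot gets too close.
    labels = (np.linalg.norm(states - person, axis=1) < 0.3).astype(float)

    def featurize(s):
        """Raw state plus squared terms, so a linear model can capture a sphere."""
        s = np.atleast_2d(s)
        return np.hstack([s, s**2, np.ones((len(s), 1))])

    # Fit a small logistic-regression "personal space" feature to the labels;
    # in practice this could be a neural network over richer observations.
    X = featurize(states)
    w = np.zeros(X.shape[1])
    for _ in range(5000):
        p = 1.0 / (1.0 + np.exp(-X @ w))
        w -= 0.5 * X.T @ (p - labels) / len(X)

    def personal_space(state):
        """Learned feature in [0, 1]: how strongly a state violates personal space."""
        logit = (featurize(state) @ w).item()
        return 1.0 / (1.0 + np.exp(-logit))

    # The robot's representation now includes the new feature next to its original
    # ones (e.g., efficiency), so its reward can penalize uncomfortable motion.
    def task_features(state):
        efficiency = -float(np.linalg.norm(state))   # placeholder original feature
        return np.array([efficiency, personal_space(state)])

    print(task_features(np.array([0.55, 0.50, 0.50])))   # very close to the person
    print(task_features(np.array([0.95, 0.10, 0.90])))   # far from the person

Once the new feature is part of the representation, the reward-learning machinery sketched earlier can weigh it against the original features.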

Reliably Quantifying Misalignment

A key component of ensuring reliable autonomous systems is the ability to quantify how well a system's behavior aligns with human expectations. An autonomous agent should know when it doesn't know enough, and either ask for help or learn in proportion to how confident it is in its model. Our research aims to develop metrics and methods to detect and correct misalignment, ensuring that autonomous systems behave predictably and safely in diverse situations. This includes exploring probabilistic reasoning and cognitive psychology to understand and mitigate the risks associated with misalignment.
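One simple way to operationalize "learn in proportion to confidence" is sketched below: the robot checks how well any hypothesis in its model explains the human's input and tempers, or skips, its update when the input looks unexplainable. It reuses the toy Boltzmann preference model from the earlier sketches; the confidence measure and threshold are illustrative assumptions.

    import numpy as np

    rng = np.random.default_rng(3)

    beta = 5.0
    features = rng.uniform(0.0, 1.0, size=(5, 2))     # candidate trajectories
    w1 = np.linspace(0.0, 1.0, 101)
    hypotheses = np.stack([w1, 1.0 - w1], axis=1)     # robot's reward hypotheses
    posterior = np.full(len(hypotheses), 1.0 / len(hypotheses))

    def choice_probs(w):
        r = features @ w
        e = np.exp(beta * (r - r.max()))
        return e / e.sum()

    def confidence(choice):
        """How much better than chance the hypothesis space explains this choice."""
        best = max(choice_probs(w)[choice] for w in hypotheses)
        chance = 1.0 / len(features)
        return float(np.clip((best - chance) / (1.0 - chance), 0.0, 1.0))

    def confidence_aware_update(posterior, choice, threshold=0.2):
        """Tempered Bayesian update: the less confident, the weaker the update."""
        c = confidence(choice)
        if c < threshold:
            print("Low confidence: likely misalignment, deferring to the human.")
            return posterior
        likelihoods = np.array([choice_probs(w)[choice] for w in hypotheses])
        updated = posterior * likelihoods**c          # exponent tempers the evidence
        return updated / updated.sum()

    # Simulate a human whose preference the hypothesis space CAN represent.
    true_w = np.array([0.3, 0.7])
    choice = int(rng.choice(len(features), p=choice_probs(true_w)))
    print("Confidence in this choice:", round(confidence(choice), 2))
    posterior = confidence_aware_update(posterior, choice)
    print("Posterior peak:", hypotheses[posterior.argmax()])

In this framing, low confidence is a signal to ask the person for help or to expand the hypothesis space, rather than to force an update the model cannot support.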

Check out our TEDxMIT talk on why robots aren't superhuman in our human world to get a sense of our research philosophy!