Action Segmentation

Action Segmentation and Causal Variable Discovery
Collaborators: Tom Griffiths, Dillon Plunkett, Alison Gopnik, Dare Baldwin

This line of work uses a combination of experimental and computational approaches to investigate how the ability to segment action and to infer its causal structure functions and develops. Many, if not most, of the causal outcomes people witness are the result of intentional human action. Both children and adults must be able to distinguish the individual actions they see other people performing and recognize their effects in order to understand the reasons behind others’ behavior, and in order to potentially bring about those effects themselves. But before we can interpret actions, we first must parse a continuous stream of motion into meaningful behavior. What cues do we use to do this? How might infants and young children begin to break into the behavior stream in order to identify intentional, goal-directed actions? Could the causal relationships between actions and their outcomes in the world help children understand action structure itself? How might children identify reaching, grasping, and turning, and then group them into the action “opening the door”? In this work, we are examining the relationship between statistical action segmentation and causal inference, and whether the two may in fact be jointly learned.
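To make the idea of statistical action segmentation concrete, here is a minimal sketch in the spirit of transitional-probability learning, applied to a stream of motion elements rather than syllables. Everything here is illustrative: the action inventory, the boundary rule (a local dip in transitional probability), and all names are assumptions for the example, not the actual experiments or models.

```python
import random
from collections import Counter

def transitional_probabilities(seq):
    """Estimate P(next | current) from bigram counts in the stream."""
    bigrams = Counter(zip(seq, seq[1:]))
    unigrams = Counter(seq[:-1])
    return {(a, b): c / unigrams[a] for (a, b), c in bigrams.items()}

def segment(seq, tp):
    """Place a boundary wherever transitional probability dips below
    both neighboring transitions -- the classic statistical-learning cue."""
    tps = [tp[(a, b)] for a, b in zip(seq, seq[1:])]
    chunks, start = [], 0
    for i in range(1, len(tps) - 1):
        if tps[i] < tps[i - 1] and tps[i] < tps[i + 1]:
            chunks.append(seq[start:i + 1])
            start = i + 1
    chunks.append(seq[start:])
    return chunks

# Hypothetical corpus: two "actions" repeated in random order, so
# within-action transitions are deterministic while between-action
# transitions are unpredictable (probability ~0.5).
random.seed(1)
actions = [["reach", "grasp", "turn"], ["push", "slide"]]
stream = [m for _ in range(200) for m in random.choice(actions)]

tp = transitional_probabilities(stream)
chunks = segment(stream, tp)
```

Because within-action transitions (e.g. reach→grasp) always occur while between-action transitions are random, the dips in transitional probability recover the action units exactly, illustrating how a learner could chunk a motion stream without any prior notion of goals.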

Segmenting and Recognizing Human Action from Video
Collaborators: Kevin Canini and Tom Griffiths

Infants are able to parse dynamic human action well before they are thought to have a fully developed theory of mind. This suggests that there may be low-level cues to intentional action structure available in human motion, an idea supported by a variety of recent work. In this line of work, I am developing a series of computational models that make very few representational assumptions about what is observed when watching videos of human action, in order to explore how much action structure can be inferred from low-level changes in pixel values alone, without knowledge of human body structure, higher-level goals and intentions, or even foreground/background distinctions. To the extent that these models correspond to human segmentation judgments and correctly recognize actions, we will know that there are cues in surface-level image changes that can be used to both segment and identify human behavior.
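A minimal sketch of what "segmenting from low-level pixel changes" can mean: compute a motion-energy signal from raw frame differences (no body model, no foreground/background separation) and treat pauses, i.e. local minima in that signal, as candidate action boundaries. The synthetic "video," the threshold, and all function names are assumptions for illustration, not the actual models.

```python
import numpy as np

def motion_energy(frames):
    """Sum of squared pixel changes between consecutive frames.

    `frames` has shape (T, H, W); only raw intensity changes are used,
    with no knowledge of bodies, goals, or scene structure.
    """
    diffs = np.diff(frames.astype(float), axis=0)
    return (diffs ** 2).sum(axis=(1, 2))

def segment_boundaries(energy, rel_threshold=0.2):
    """Candidate boundaries: local minima of motion energy that also
    fall below a fraction of the mean energy (an illustrative rule)."""
    bounds = []
    for t in range(1, len(energy) - 1):
        if (energy[t] < energy[t - 1] and energy[t] < energy[t + 1]
                and energy[t] < rel_threshold * energy.mean()):
            bounds.append(t + 1)  # boundary between frames t and t+1
    return bounds

# Synthetic "video": two bursts of motion separated by a near-still pause.
rng = np.random.default_rng(0)
T, H, W = 60, 8, 8
frames = np.zeros((T, H, W))
for t in range(T):
    if 25 <= t < 32:   # pause: almost no pixel change
        frames[t] = frames[t - 1] + 0.001 * rng.standard_normal((H, W))
    else:              # motion: large pixel changes
        frames[t] = rng.standard_normal((H, W))

energy = motion_energy(frames)
boundaries = segment_boundaries(energy)
```

The detected boundaries fall inside the pause, showing that a segmentation signal can be read off surface-level image change alone, which is the empirical question the models are built to test against human judgments.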