Intelligence Through the Lens of Interaction
Author
Ehsani, Kiana
Abstract
In this thesis, I discuss the problem of acquiring visual intelligence from interaction, focusing on two aspects of visual understanding: (1) visual perception and (2) embodied intelligence. To address the first aspect, I designed experiments to learn visual representations by observing animals and humans interact with the visual world. Further, I investigated the idea of learning perception from hands-on interaction: acquiring generalizable physical understanding by predicting the forces applied in an observed video and trying to replicate the observed motion in simulation, with no additional supervision. To address the second aspect, I discuss our findings on training intelligent embodied agents using interaction from two perspectives. First, I designed a training paradigm that enables learning-to-learn from interactions; this training regime allows learning from interactions to continue even at inference time. Second, I introduce a visually rich object manipulation framework, ManipulaTHOR, which makes it possible to directly train embodied agents to interact intelligently in a physically realistic environment via low-level object manipulation and navigation.