Time | Title | Speaker
---|---|---
09:00-09:05am | Session: Opening Remarks |
09:05-09:35am | Keynote Talk: "Embodied AI: Machine Learning to Learning Machines"<br>Abstract: Machine learning (including deep learning) has changed the paradigm of AI from rule-based “manual” programming to data-driven “automatic” programming. However, the current paradigm of machine learning requires some external system to provide the data, which limits its scalability. Here we argue that the learner can feed itself the data autonomously if it is embodied, i.e. equipped with sensors and actuators. Through the perception-action cycle, an embodied AI can continually learn to solve problems in a self-teaching way by taking new actions, observing their outcomes, and correcting its own predictions, as humans and animals do. In this talk, I will show some of our studies in this direction of “(embodied) learning machine” research and discuss its implications for achieving truly human-level general AI. |
09:35-10:05am | Invited Talk 1: "Following Instructions and Asking Questions"<br>Abstract: As we move towards the creation of embodied agents that understand natural language, several new challenges and complexities arise for grounding (e.g. complex state-spaces), planning (e.g. long horizons), and social interaction (e.g. asking for help or clarifications). In this talk, I'll discuss several recent results, both on improvements to embodied instruction following within ALFRED and on initial steps towards building agents that ask questions or model theory of mind. |
10:05-10:35am | Invited Talk 2: "Scaling Robot Learning by Understanding Videos"<br>Abstract: The true gains of machine learning in AI sub-fields such as computer vision and natural language processing have come from the use of large-scale, diverse datasets for learning. In this talk, I will discuss whether and how we can leverage large-scale, diverse data in the form of egocentric videos (first-person videos of humans performing different tasks) to similarly scale up policy learning for robots. I will discuss the challenges this presents and some of our initial efforts towards tackling them. In particular, I will describe techniques to acquire low-level visuomotor subroutines, high-level value functions, and an interactive understanding of objects from in-the-wild egocentric videos. |
10:35-10:45am | 10-min Coffee Break / Social Networking |
10:45-11:55am | 3-min teaser talks for poster presentations / Poster Session<br>Paper 1: ECLAIR: Event-Cognizant Language Interaction Embodied Robots (Jinyeon Kim, Byeonghwi Kim, Cheolhong Min, Yuyeong Kim, Taewoong Kim and Jonghyun Choi)<br>Paper 2: GVCCI: Lifelong Learning of Visual Grounding for Language-Guided Robotic Manipulation (Junghyun Kim, Gi-Cheon Kang, Jaein Kim, Suyeon Shin and Byoung-Tak Zhang)<br>Paper 3: Continual Fine-tuning using Linearized Deep Neural Networks (Hyounguk Shon and Junmo Kim)<br>Paper 4: The Dialog Must Go On: Improving Visual Dialog via Generative Self-Training (Gi-Cheon Kang, Sungdong Kim, Jin-Hwa Kim, Donghyun Kwak and Byoung-Tak Zhang)<br>Paper 5: Towards Robust Robot Perception: Enhancing Offline Reinforcement Learning with Adversarial Attacks and Defenses (Thanh Nguyen, Tung Luu and Chang Dong Yoo)<br>Paper 6: CLIP is Also a Good Teacher in the Presence of Noisy Labels (Yeonguk Yu, Minhwan Ko and Kyoobin Lee)<br>Paper 7: Leveraging Class-agnostic Object Proposals for Open-world Object Detection (Assefa Seyoum Wahd, Minsu Jang and Seung-Ik Lee)<br>Paper 8: A Dataset Design for Question Generation Based on Human Cognition (Minjung Shin, Minsu Chang, Miyoung Cho and Jeh-Kwang Ryu)<br>Paper 9: Reducing Object Hallucination for Image Captioning using Large Vision-Language Models with Reinforcement Learning (Hee Suk Yoon, Eunseop Yoon and Chang D. Yoo)<br>Paper 10: A Data Set for Clarifying Ambiguous Questions with Intermediate Questions in Visual Question Answering (Gyu-Min Park and Seong-Bae Park)<br>Paper 11: Spatio-Temporal Graph Random Walks to Understand Long Video Semantic Structure (Eun-Sol Kim, Hoyoung Yoon, Minseo Kim and Kyuyoung Lee) |
11:55-12:00pm | Closing Remarks |