ELLIS Delft Talk by Hongyang Li & Michael Milford

18 July 2024 12:00 till 13:00 - Location: Zoom & Building 34 (ME) Lecture Hall F - By: ELLIS Delft

This meeting is open to all interested researchers, and we particularly want to emphasize that all PhD students and postdocs associated with the unit are very welcome! If you have ideas for making these talks more engaging, please let us know.

We hope that the majority of you will join us in person, but those who are unable to may join online.

Physical
Building 34 (Mechanical Engineering)
ME-Lecture Hall F - Simon Stevin, 34.A-0-610


Zoom
https://tudelft.zoom.us/j/93958722644?pwd=WnQ5ZFZvZGlNRnRidndIVlhBTUsvdz09
Meeting ID: 939 5872 2644
Passcode: 296739

Speaker 1: Dr. Hongyang Li from Shanghai AI Lab and the University of Hong Kong

Title: What are Good Pre-training Representations for Robotic Manipulation?

Abstract
Representation learning approaches for robot manipulation have boomed in recent years. Because in-domain robot data are scarce, prevailing methods leverage large-scale human video datasets to extract generalizable features for visuomotor policy learning. Despite this progress, prior work disregards the interactive dynamics that capture the patterns of behavior and physical interaction during manipulation, resulting in an inadequate understanding of the relationship between objects and the environment. To this end, we propose a general pre-training pipeline that learns Manipulation by Predicting the Interaction (MPI) and enhances the visual representation. Given a pair of key frames representing the initial and final states, along with a language instruction, our algorithm predicts the transition frame and detects the interaction object. These two learning objectives yield a superior comprehension of "how to interact" and "where to interact". We conduct a comprehensive evaluation across four challenging robotic tasks. The experimental results demonstrate that MPI improves on the previous state of the art by 10% to 64%, on real-world robot platforms as well as in simulation environments.
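
To make the two objectives concrete, below is a minimal PyTorch sketch of an MPI-style pre-training step. It is an illustration under stated assumptions only: the class and head names (InteractionPretrainer, frame_head, box_head), the 64x64 frame size, and the tiny CNN and mean-pooled text encoder are placeholders chosen for brevity, not the architecture used in the paper.

import torch
import torch.nn as nn
import torch.nn.functional as F

class InteractionPretrainer(nn.Module):
    """Hypothetical two-objective pre-trainer: from initial/final key
    frames plus a language instruction, predict the transition frame
    ("how to interact") and the interaction-object box ("where to
    interact")."""

    def __init__(self, embed_dim=256, vocab_size=1000):
        super().__init__()
        # Shared visual encoder for key frames (assumption: a tiny CNN;
        # the actual work presumably uses a transformer backbone).
        self.visual = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=4, stride=4), nn.ReLU(),
            nn.Conv2d(32, embed_dim, kernel_size=4, stride=4), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        # Language instruction encoder (assumption: mean-pooled embeddings).
        self.text = nn.EmbeddingBag(vocab_size, embed_dim, mode="mean")
        self.fuse = nn.Linear(3 * embed_dim, embed_dim)
        # Objective 1: reconstruct the 64x64 transition frame.
        self.frame_head = nn.Sequential(
            nn.Linear(embed_dim, 3 * 64 * 64),
            nn.Unflatten(1, (3, 64, 64)),
        )
        # Objective 2: regress a normalized (x, y, w, h) interaction box.
        self.box_head = nn.Sequential(
            nn.Linear(embed_dim, 64), nn.ReLU(),
            nn.Linear(64, 4), nn.Sigmoid(),
        )

    def forward(self, frame_init, frame_final, instruction_tokens):
        # Fuse both key-frame embeddings with the instruction embedding.
        z = self.fuse(torch.cat([
            self.visual(frame_init),
            self.visual(frame_final),
            self.text(instruction_tokens),
        ], dim=-1))
        return self.frame_head(z), self.box_head(z)

# One pre-training step on dummy data (batch of 2, 64x64 frames).
model = InteractionPretrainer()
f_init = torch.randn(2, 3, 64, 64)        # initial-state key frames
f_final = torch.randn(2, 3, 64, 64)       # final-state key frames
tokens = torch.randint(0, 1000, (2, 8))   # tokenized instructions
pred_frame, pred_box = model(f_init, f_final, tokens)
# Joint loss over the two learning objectives (dummy targets here).
loss = F.mse_loss(pred_frame, torch.randn(2, 3, 64, 64)) \
     + F.l1_loss(pred_box, torch.rand(2, 4))
loss.backward()

The sketch only shows the shape of the joint objective; the actual work trains on large-scale human video data with far stronger backbones.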

Biography
Hongyang Li is an Assistant Professor at the University of Hong Kong and a Research Scientist at OpenDriveLab, Shanghai AI Lab. His research focuses on autonomous driving and embodied AI. He led the end-to-end autonomous driving project UniAD, which won the IEEE CVPR 2023 Best Paper Award. UniAD has had a large impact in both academia and industry, including the recent rollout to customers by Tesla in FSD V12. He proposed the bird’s-eye-view perception work BEVFormer, which was selected among the Top 100 AI Papers of 2022 and was explicitly recognized by Jensen Huang, CEO of NVIDIA, and Prof. Shashua, CEO of Mobileye, in public keynote talks. He has served as Area Chair for CVPR 2023 and 2024, NeurIPS 2023 (Notable AC) and 2024, and ACM MM 2024, and as a referee for Nature Communications. He will serve as Workshop Chair for CVPR 2026. He is the Working Group Chair for IEEE Standards under the Vehicular Technology Society and a Senior Member of IEEE.


Speaker 2: Prof. Dr. Michael Milford from Queensland University of Technology

Title: Safe, collaboration-friendly, user-acceptable and performant perception and localization for autonomous systems


Abstract
For robots, autonomous vehicles and general technology platforms to ever be deployed ubiquitously in the world around us, they must meet certain requirements. Firstly, they must be performant: this has received the vast majority of research attention, covering levels of performance as well as generality and robustness. Secondly, most robot deployments will be in some manner collaborative or supervised, sitting midway between the traditional human-only model and the speculative fully autonomous approach. Collaboration requires key capabilities from autonomous systems, most notably introspective capability, so that they can work and interact seamlessly with people. Thirdly, they must operate in a manner that is acceptable to end-users; a prime example is complying with legal and social expectations around privacy in the case of perception systems. Finally, they must be safe and fit for purpose, and at least some of the metrics we use to measure research performance should ideally be directly predictive of these properties. In this talk I'll highlight challenges and limitations in all of these areas and, using both applied industry and fundamental research projects as examples, showcase work we've done to address them.

Biography
Professor Milford conducts interdisciplinary research at the boundary between robotics, neuroscience, computer vision and machine learning, and is a multi-award-winning educational entrepreneur. His research models the neural mechanisms in the brain underlying tasks like navigation and perception to develop new technologies in challenging application domains such as all-weather, anytime positioning for autonomous vehicles. From 2022 to 2027 he is leading a large research team combining bio-inspired and computer-science-based approaches to provide a ubiquitous alternative to GPS that does not rely on satellites. He is also one of Australia’s most in-demand experts in technologies including self-driving cars, robotics and artificial intelligence, and is a passionate science communicator. He is currently Director of the QUT Centre for Robotics, an Australian Research Council Laureate Fellow, Professor at the Queensland University of Technology, a Microsoft Research Faculty Fellow, and a Fellow of the Australian Academy of Technology and Engineering.