Draft Title: Visual Recognition from One or More Images
Draft Abstract: Building machines that see and recognize from visual inputs is not easy. Over the last decade, tremendous effort has gone into visual recognition, producing new ways to process image inputs, whether as single images or as streams. These advances have allowed us to build systems that can recognize objects in 2D and 3D, identify their class and pose, and track them over time. While not perfect, these models work robustly enough that we can rely on them in everyday life, for example to assist us at home or on our smartphones. In this talk, I will give an overview of state-of-the-art visual recognition models I have worked on, covering a range of visual tasks such as object detection, human-object interaction, human pose tracking, and 3D object understanding. I will also discuss the computational needs of running such models, which are particularly relevant for distributed smart camera applications.