PhD Thesis Proposal - Video Relocalization with 3D Convolutional Neural Networks

Who: Yoshimasa Kubo

Title: Video Relocalization with 3D Convolutional Neural Networks

Examining Committee:

Dr. Thomas Trappenberg - Faculty of Computer Science (Supervisor)
Dr. Evangelos Milios - Faculty of Computer Science (Reader)
Dr. Malcolm Heywood - Faculty of Computer Science (Reader)
Dr. Mae Seto - Faculty of Engineering (External Examiner)



Recognizing where we are is easy for humans, but it is not easy for robots. In robotics, this "where are we" question is called the localization problem. A common approach is to use Bayes filters, such as Kalman filters or particle filters, to combine sensor data with prior knowledge. In contrast, in this thesis we propose to solve the problem with machine learning techniques applied to camera input. This is an end-to-end solution that does not require explicit knowledge such as a sensor model. We study possible architectures for different sensor configurations, such as single images, multi-resolution systems, and video. Specifically, we have applied convolutional neural networks (CNNs) to various localization data sets and studied how different techniques influence localization ability, including the difference between single images and multiple views. We found that CNNs given views from multiple sides outperform CNNs given a view from a single side. We have also started to apply such techniques to video input, and we further studied how different convolution techniques can be applied. In image processing it is common to use convolution in 2D together with a weighting over the color space. In addition to the effect of multiple views, we studied applying 3D convolutions to video analysis. We found that 3D CNNs outperform 2D CNNs on video classification tasks as well as on localization data sets.
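The difference between the two convolution types mentioned in the abstract can be illustrated with a small sketch. This is not the model from the thesis: the function names, the toy clips, and the kernel weights below are assumptions chosen only for illustration, and the code uses plain Python lists instead of a deep learning library. As is usual in CNNs, "convolution" here means cross-correlation without padding. The 2D version slides a kernel over one frame and weights the color channels; the 3D version also slides over time, so it can respond to change between frames.

```python
def conv2d(frame, kernel):
    """2D convolution of one frame with a weighted sum over color channels.

    frame[c][y][x] and kernel[c][i][j] are nested lists (hypothetical layout).
    """
    channels = len(frame)
    height, width = len(frame[0]), len(frame[0][0])
    kh, kw = len(kernel[0]), len(kernel[0][0])
    return [[sum(frame[c][y + i][x + j] * kernel[c][i][j]
                 for c in range(channels)
                 for i in range(kh)
                 for j in range(kw))
             for x in range(width - kw + 1)]
            for y in range(height - kh + 1)]


def conv3d(video, kernel):
    """3D convolution: the kernel additionally extends over time.

    video[t][c][y][x] and kernel[dt][c][i][j]; each output frame is the sum of
    2D convolutions of consecutive input frames with the kernel's time slices.
    """
    kt = len(kernel)
    out = []
    for t in range(len(video) - kt + 1):
        maps = [conv2d(video[t + dt], kernel[dt]) for dt in range(kt)]
        out.append([[sum(m[y][x] for m in maps)
                     for x in range(len(maps[0][0]))]
                    for y in range(len(maps[0]))])
    return out


def make_frame(value, channels=3, size=2):
    """Toy frame in which every pixel of every channel has the same value."""
    return [[[value] * size for _ in range(size)] for _ in range(channels)]


# Assumed color weights (exact in binary floating point) and 1x1 spatial kernels.
weights = [0.25, 0.5, 0.25]
kernel2d = [[[w]] for w in weights]
# Temporal-difference kernel: next frame minus current frame.
kernel3d = [[[[-w]] for w in weights], [[[w]] for w in weights]]

static_clip = [make_frame(1.0) for _ in range(3)]
brightening_clip = [make_frame(float(t)) for t in range(3)]

per_frame = conv2d(static_clip[0], kernel2d)
static_response = conv3d(static_clip, kernel3d)
brightening_response = conv3d(brightening_clip, kernel3d)
```

In this toy setting the 3D kernel gives zero everywhere for the static clip and a constant positive response for the brightening clip, which is information a per-frame 2D kernel cannot extract on its own.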



FCS Room 142