Big Data Seminar: Piotr Wójcik - Random Projection in Deep Neural Networks

Speaker: Piotr Wójcik (AGH University of Science and Technology)

Title: Random Projection in Deep Neural Networks

Date: Tuesday December 12, 2017

This dissertation investigates the ways in which deep learning methods can benefit from Random Projection (RP), a classic linear dimensionality reduction method. In particular, we focus on two areas where, as we have found, employing RP techniques can enhance deep models.

In the first application of RP, we make use of its original purpose, i.e., reducing the dimensionality of the input data. We show how this can be useful in training deep networks on data that is sparse, unstructured and extremely high-dimensional. This type of data often arises in areas such as social media, web crawling, gene sequencing or biomedical analysis. Currently, learning in such applications is reserved for fast linear methods that can efficiently process sparse high-dimensional feature vectors, e.g., support vector machines or logistic regression classifiers. More complex non-linear approaches are usually not viable, mostly because of their high computational cost. With the assistance of Random Projection, we narrow this gap and enable deep networks to be trained on this problematic type of data.

Training deep networks on high-dimensional data that has no exploitable structure has proved to be practically infeasible. In fact, the dimensionality of the input data in most modern neural network applications is relatively low. For example, networks trained for speech recognition tasks employ input vectors with size on the order of hundreds of dimensions. Training on data with higher input dimensionality typically requires some structure in the data. This is the case in Convolutional Neural Networks (CNNs), which can work with up to a hundred thousand input pixels. CNNs take advantage of the spatial structure of images by exploiting the local pixel connectivity and sharing the weights between spatial locations, which greatly reduces the number of learnable parameters. However, training deep neural networks on multi-million-dimensional data with no exploitable structure poses a major computational problem. It implies a network architecture with a huge input layer, which greatly increases the number of weights, often making the training infeasible.

We show that this problem can be solved by incorporating RP into the network architecture. In particular, we propose to prepend the network with an input layer whose weights are initialized to elements of an RP matrix. We study cases where the weights of this RP layer are either fixed during training or fine-tuned with error backpropagation. We propose several architecture and training regime modifications that make fine-tuning the weights in the RP layers feasible, even on large-scale datasets. Our results demonstrate that, in comparison to the state-of-the-art methods, neural networks with an RP layer achieve competitive performance on extremely high-dimensional real-world datasets.

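
As a rough illustration of the idea of prepending an RP layer to a network, here is a minimal NumPy sketch. It is not the speaker's implementation: the Gaussian RP matrix, the layer sizes, and the names `rp_layer` and `forward` are all assumptions made for this example, and the abstract does not say which RP schemes were actually used.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes, chosen only to keep the example small.
d_in, d_rp, d_hidden = 20_000, 128, 32

# Assumed RP scheme: Gaussian entries drawn from N(0, 1/d_rp).
R = rng.normal(0.0, 1.0 / np.sqrt(d_rp), size=(d_in, d_rp))

# An ordinary dense hidden layer placed after the RP layer.
W1 = rng.normal(0.0, np.sqrt(2.0 / d_rp), size=(d_rp, d_hidden))
b1 = np.zeros(d_hidden)

def rp_layer(indices, values, R):
    """Compute x @ R for a sparse vector x given by its nonzero entries.

    Only the rows of R that correspond to nonzero features are read,
    so the cost depends on the number of nonzeros, not on d_in.
    """
    return values @ R[indices]

def forward(indices, values):
    """RP layer followed by one ReLU hidden layer."""
    z = rp_layer(indices, values, R)
    return np.maximum(z @ W1 + b1, 0.0)

# One sparse example with 50 nonzero features out of d_in.
indices = rng.choice(d_in, size=50, replace=False)
values = rng.random(50)
h = forward(indices, values)
```

Keeping `R` constant corresponds to the fixed RP layer case described above. In the fine-tuned case, `R` would instead be treated as a trainable parameter and updated by backpropagation along with `W1` and `b1`.
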
The second, less conventional area where we found the application of RP techniques to be beneficial for training deep models is weight initialization. Specifically, we studied setting the initial weights in deep networks to various RP matrices instead of drawing them from a scaled normal distribution, as is done in the current state-of-the-art initialization technique. Such RP initialization enabled us to train deep networks to higher levels of performance: our experiments suggest that deep CNNs in particular can benefit from the introduced method.
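
To make the contrast between the two initialization styles concrete, the sketch below builds one weight matrix each way. This is an illustration only. The abstract does not specify which RP matrices or scalings were studied, so the sparse sign-based matrix (entries in {-c, 0, +c} with probabilities 1/6, 2/3, 1/6), the variance target of 2/fan_in, and the function names are assumptions.

```python
import numpy as np

def scaled_normal_init(fan_in, fan_out, rng):
    """Baseline: weights drawn from N(0, 2 / fan_in)."""
    std = np.sqrt(2.0 / fan_in)
    return rng.normal(0.0, std, size=(fan_in, fan_out))

def sparse_rp_init(fan_in, fan_out, rng):
    """Assumed RP-style init: a sparse random sign matrix.

    Entries are -1, 0, +1 with probabilities 1/6, 2/3, 1/6.
    Multiplying by sqrt(3) gives unit variance per entry; the extra
    factor rescales to the same 2 / fan_in variance as the baseline.
    """
    signs = rng.choice([-1.0, 0.0, 1.0], size=(fan_in, fan_out),
                       p=[1 / 6, 2 / 3, 1 / 6])
    return np.sqrt(3.0) * np.sqrt(2.0 / fan_in) * signs

rng = np.random.default_rng(0)
fan_in, fan_out = 512, 256          # hypothetical layer shape
W_normal = scaled_normal_init(fan_in, fan_out, rng)
W_rp = sparse_rp_init(fan_in, fan_out, rng)

# Both matrices target the same per-entry variance.
target_var = 2.0 / fan_in
print(W_normal.var(), W_rp.var(), target_var)
```

Both matrices share the same target variance, so any difference in training behavior would come from the structure of the matrix (here, sparsity and discrete values) rather than from its scale.
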

Brief Bio:
Piotr Iwo Wójcik received an M.Sc. in Computer Science from the AGH University of Science and Technology, where he is currently a Ph.D. candidate at the Department of Computer Science. His research interests focus on machine learning methods and bioinformatics.

  Stan Matwin (stan@cs.dal.ca)



Room 127, Goldberg Computer Science Building