Random Projection in Deep Neural Networks


This dissertation investigates the ways in which deep learning methods
can benefit from Random Projection (RP), a classic linear dimensionality
reduction method. In particular, we focus on two areas where, as we
have found, employing RP techniques can enhance deep models.

In the first application of RP, we make use of its original purpose,
i.e., reducing the dimensionality of the input data. We show how this
can be useful in training deep networks on data that is sparse,
unstructured and extremely high-dimensional. This type of data often
arises in areas such as social media, web crawling, gene sequencing or
biomedical analysis. Currently, learning in similar applications is
reserved for fast linear methods that can efficiently process sparse
high-dimensional feature vectors, e.g., support vector machines or
logistic regression classifiers. More complex non-linear approaches are
usually not viable mostly because of their high computational cost. With
the assistance of Random Projection, we narrow this gap and enable deep
networks to be trained on this problematic type of data. Training deep
networks on high-dimensional data that has no exploitable structure has
proved to be practically infeasible. In fact, the dimensionality of the
input data in most modern neural network applications is relatively low.
For example, networks trained for speech recognition tasks employ input
vectors with size on the order of hundreds of dimensions. Training on
data with higher input dimensionality typically requires some structure
in the data. This is the case in Convolutional Neural Networks (CNNs),
which can work with up to a hundred thousand input pixels. CNNs take
advantage of the spatial structure of images by exploiting the local
pixel connectivity and sharing the weights between spatial locations,
which greatly reduces the number of learnable parameters. However,
training deep neural networks on multi-million-dimensional data with no
exploitable structure poses a major computational problem. It implies a
network architecture with a huge input layer, which greatly increases
the number of weights, often making the training infeasible. We show
that this problem can be solved by incorporating RP into the network
architecture. In particular, we propose to prepend the network with an
input layer whose weights are initialized to elements of an RP matrix.
We study cases where the weights of this RP layer are either fixed
during training or finetuned with error backpropagation. We propose
several architecture and training regime modifications that make
finetuning the weights in the RP layers feasible, even on large-scale
datasets. Our results demonstrate that, in comparison to the
state-of-the-art methods, neural networks with an RP layer achieve
competitive performance on extremely high-dimensional real-world
datasets.
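
The idea of prepending a fixed RP layer to a network can be illustrated
with a minimal NumPy sketch. It is not the architecture from the
dissertation: the abstract does not say which RP construction is used, so
the sparse Achlioptas-style matrix below, the helper names
(achlioptas_matrix, rp_layer_forward) and all sizes are assumptions
chosen only for illustration.

```python
import numpy as np

def achlioptas_matrix(d, k, rng):
    """Return a d x k sparse random projection matrix (assumed construction)."""
    # Entries are +1, 0, -1 with probabilities 1/6, 2/3, 1/6, scaled by
    # sqrt(3 / k) so that squared norms are preserved in expectation.
    signs = rng.choice([1.0, 0.0, -1.0], size=(d, k), p=[1 / 6, 2 / 3, 1 / 6])
    return signs * np.sqrt(3.0 / k)

def rp_layer_forward(x, R, W, b):
    """Apply the RP layer R, then one dense ReLU layer (W, b)."""
    z = x @ R                            # RP layer: d -> k, R is kept fixed here
    return np.maximum(0.0, z @ W + b)    # first trainable layer works on k inputs

rng = np.random.default_rng(0)
d, k, h = 10_000, 256, 64                # illustrative sizes, not from the source

R = achlioptas_matrix(d, k, rng)
W = rng.normal(0.0, np.sqrt(2.0 / k), size=(k, h))
b = np.zeros(h)

# A small batch of sparse binary inputs (at most 10 non-zeros per row).
x = np.zeros((8, d))
cols = rng.integers(0, d, size=(8, 10))
x[np.arange(8)[:, None], cols] = 1.0

hidden = rp_layer_forward(x, R, W, b)
print(hidden.shape)
```

With R fixed, the first trainable weight matrix has k x h entries instead
of d x h, which is where the saving in learnable parameters comes from.
Finetuning R, as the abstract also considers, would make those d x k
entries trainable again.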

The second, less conventional area where we found the application of
RP techniques to be beneficial for training deep models is weight
initialization. Specifically, we studied setting the initial weights in
deep networks to various RP matrices instead of drawing them from a
scaled normal distribution as is done in the current state-of-the-art
initialization technique. Such RP initialization enabled us to train
deep networks to higher levels of performance: our experiments suggest
that particularly deep CNNs can benefit from the introduced method.
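
As a rough sketch of what RP initialization could look like, the snippet
below draws a weight matrix either from a scaled normal distribution or
from a sparse RP construction rescaled to the same variance. The choice
of an Achlioptas-style matrix, the 2 / fan_in variance target and the
function names are assumptions made for illustration; the abstract does
not specify the RP matrices or the scaling actually studied.

```python
import numpy as np

def scaled_normal_init(fan_in, fan_out, rng):
    """Baseline: zero-mean normal weights with variance 2 / fan_in."""
    return rng.normal(0.0, np.sqrt(2.0 / fan_in), size=(fan_in, fan_out))

def rp_init(fan_in, fan_out, rng):
    """Sparse RP-style weights rescaled to variance 2 / fan_in (assumed scaling)."""
    # sqrt(3) * {+1, 0, -1} with probabilities 1/6, 2/3, 1/6 has unit variance.
    signs = rng.choice([1.0, 0.0, -1.0], size=(fan_in, fan_out),
                       p=[1 / 6, 2 / 3, 1 / 6])
    return signs * np.sqrt(3.0) * np.sqrt(2.0 / fan_in)

rng = np.random.default_rng(0)
W_normal = scaled_normal_init(512, 256, rng)
W_rp = rp_init(512, 256, rng)

# Both matrices have roughly the same variance, but about two thirds of
# the RP-initialized weights start at exactly zero.
print(W_normal.var(), W_rp.var(), np.mean(W_rp == 0.0))
```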

Speaker Bio:

Piotr Iwo Wójcik received an M.Sc. in Computer Science from the AGH
University of Science and Technology, where he is currently a Ph.D.
candidate at the Department of Computer Science. His research interests
focus on machine learning methods and bioinformatics.


Lectures, Seminars



Computer Science Auditorium (#127), Goldberg Computer Science Building




David Langstroth, dll@cs.dal.ca