Introduction to deep learning based on Google's TensorFlow framework. These tutorials are direct ports of Newmu's Theano Tutorials.
Topics
- Simple Multiplication
- Linear Regression
- Logistic Regression
- Feedforward Neural Network (Multilayer Perceptron)
- Deep Feedforward Neural Network (Multilayer Perceptron with 2 Hidden Layers O.o)
- Convolutional Neural Network
Dependencies
- TensorFlow
- Numpy
Convolutional network flow:
- Get image
- Create filters (learned during training)
- Do convolution -> spatial convolution (like Gabor filters, polarization)
- Apply Tanh & Abs (the activation function)
- Subsampling (decreases the output size), spatial subsampling:
  - Compute the average (e.g. of 2x2 pixels)
  - Multiply it by a trainable coefficient
  - Add a trainable bias
- Convolution map: convolve smaller images with filters of the same size; this yields scalar numbers. Symmetry breaking: convolve not all of the images but a randomly chosen subset -> training can explore different features
- Linear classification: `y = Ax + b` (thresholded to produce 0 or 1)
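The final linear classification step can be sketched in NumPy; the weights, bias, and threshold at zero below are illustrative assumptions, not taken from the tutorials:

```python
import numpy as np

def linear_classify(x, A, b):
    """Linear classifier y = Ax + b, thresholded to a 0/1 decision."""
    score = A @ x + b           # raw score (a scalar here)
    return 1 if score > 0 else 0

# toy example: 3 input features, hand-picked weights
A = np.array([0.5, -1.0, 2.0])
b = -0.1
print(linear_classify(np.array([1.0, 0.0, 0.0]), A, b))  # score 0.4 -> 1
```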
Things one should keep in mind: Disadvantage: a filter only finds features of its defined size. Solution: downsampling and the pyramid method.
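The flow above can be sketched in plain NumPy. The filter values, the 6x6 input, and the fixed coefficient/bias in the subsampling step are illustrative stand-ins for learned parameters:

```python
import numpy as np

def conv2d_valid(img, filt):
    """Valid 2-D convolution (cross-correlation, as in most DL code)."""
    fh, fw = filt.shape
    oh, ow = img.shape[0] - fh + 1, img.shape[1] - fw + 1
    out = np.empty((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(img[i:i+fh, j:j+fw] * filt)
    return out

def subsample(fmap, coeff=1.0, bias=0.0):
    """Average 2x2 blocks, multiply by a trainable coefficient, add a trainable bias."""
    h, w = fmap.shape[0] // 2 * 2, fmap.shape[1] // 2 * 2
    blocks = fmap[:h, :w].reshape(h // 2, 2, w // 2, 2)
    return coeff * blocks.mean(axis=(1, 3)) + bias

img = np.arange(36, dtype=float).reshape(6, 6)
filt = np.ones((3, 3)) / 9.0                     # stand-in for a learned filter
fmap = np.abs(np.tanh(conv2d_valid(img, filt)))  # the "Tanh & Abs" activation
print(subsample(fmap).shape)                     # (2, 2): 6x6 -> 4x4 conv -> 2x2 pool
```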
Quick thoughts / Cites
- Monitor fraction of "dead" neurons (?)
- "To give you some context, modern Convolutional Networks contain on orders of 100 million parameters and are usually made up of approximately 10-20 layers (hence deep learning)."
- As an aside, in practice it is often the case that 3-layer neural networks will outperform 2-layer nets, but going even deeper (4,5,6-layer) rarely helps much more. This is in stark contrast to Convolutional Networks, where depth has been found to be an extremely important component for a good recognition system (e.g. on order of 10 learnable layers).
- Based on our discussion above, it seems that smaller neural networks can be preferred if the data is not complex enough to prevent overfitting. However, this is incorrect - there are many other preferred ways to prevent overfitting in Neural Networks that we will discuss later (such as L2 regularization, dropout, input noise). In practice, it is always better to use these methods to control overfitting instead of the number of neurons.
- A more recent paper on this topic, Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification by He et al., derives an initialization specifically for ReLU neurons, reaching the conclusion that the variance of neurons in the network should be 2.0/n. This gives the initialization `w = np.random.randn(n) * sqrt(2.0/n)`, and is the current recommendation for use in practice.
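The recommended He initialization in context; n is the fan-in, and the variance check at the end is just an illustrative sanity test:

```python
import numpy as np

n = 1000                          # fan-in: number of inputs to the neuron
rng = np.random.default_rng(0)
w = rng.standard_normal(n) * np.sqrt(2.0 / n)   # He initialization for ReLU

# the empirical variance should be close to the target 2/n
print(abs(w.var() - 2.0 / n) < 1e-3)
```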
Layers of Convolutional NNs
- INPUT: Input layer
- CONV: Convolutional layer. Parameters:
  - Accepts a volume of size `W1 x H1 x D1`
  - Requires 4 hyper-parameters: `K`, the number of filters (aka depth); `F`, their spatial extent; `S`, the stride; `P`, the amount of zero padding
  - Produces a volume of size `W2 x H2 x D2`, where: `W2 = (W1 - F + 2P)/S + 1`, `H2 = (H1 - F + 2P)/S + 1` (symmetry), `D2 = K`
  - With parameter sharing, it introduces `F x F x D1` weights per filter, for a total of `(F x F x D1) x K` weights and `K` biases
  - In the output volume, the `d`-th depth slice (of size `W2 x H2`) is the result of performing a valid convolution of the `d`-th filter over the input volume with a stride of `S`, offset by the `d`-th bias
  - see here for a demonstration
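Plugging numbers into the CONV output-size and parameter formulas; the 227x227x3 input with 96 filters of size 11, stride 4 mirrors AlexNet's first layer and is used here purely as an example:

```python
def conv_output(W1, H1, D1, K, F, S, P):
    """Output volume and parameter count of a CONV layer with parameter sharing."""
    W2 = (W1 - F + 2 * P) // S + 1
    H2 = (H1 - F + 2 * P) // S + 1
    D2 = K
    params = (F * F * D1) * K + K    # shared weights plus one bias per filter
    return (W2, H2, D2), params

shape, params = conv_output(W1=227, H1=227, D1=3, K=96, F=11, S=4, P=0)
print(shape, params)   # (55, 55, 96) 34944
```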
- RELU: Applies the ReLU activation function elementwise
- POOL: Pooling (subsampling) layer
  - Accepts a volume of size `W1 x H1 x D1`
  - Requires 2 hyper-parameters: `F`, their spatial extent; `S`, the stride
  - Produces a volume of size `W2 x H2 x D2`, where: `W2 = (W1 - F)/S + 1`, `H2 = (H1 - F)/S + 1` (symmetry), `D2 = D1`
  - Introduces zero parameters since it computes a fixed function of the input
  - Most commonly: `F = 2, S = 2`, function: `f(xi) = max(xi)` (max pooling)
  - Note: zero-padding is not common in pooling layers
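The pooling formulas with the common `F = 2, S = 2` setting, which halves width and height while leaving depth untouched (the 224x224x64 input is an arbitrary example):

```python
def pool_output(W1, H1, D1, F, S):
    """Output volume of a pooling layer; no parameters are introduced."""
    W2 = (W1 - F) // S + 1
    H2 = (H1 - F) // S + 1
    return W2, H2, D1      # depth is unchanged

print(pool_output(W1=224, H1=224, D1=64, F=2, S=2))   # (112, 112, 64)
```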
- FC: Fully connected layer
- Layer patterns: `INPUT -> [[CONV -> RELU]*N -> POOL?]*M -> [FC -> RELU]*K -> FC` with `N >= 0` and `N <= 3`, `M >= 0`, `K >= 0` and `K < 3`. Examples:
  - `INPUT -> [CONV -> RELU -> POOL]*2 -> FC -> RELU -> FC`
  - `INPUT -> [CONV -> RELU -> CONV -> RELU -> POOL]*3 -> [FC -> RELU]*2 -> FC`
- Hint: Prefer a stack of small-filter CONV layers to one CONV layer with a large receptive field
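A quick parameter count illustrates the hint: three stacked 3x3 CONV layers cover the same 7x7 receptive field as a single 7x7 layer, but with fewer weights (assuming `C` channels in and out throughout, biases ignored; `C = 64` is an arbitrary example):

```python
def conv_weights(F, C_in, C_out):
    """Weight count of one CONV layer with filter size F (biases ignored)."""
    return F * F * C_in * C_out

C = 64
stacked = 3 * conv_weights(3, C, C)   # three stacked 3x3 layers
single = conv_weights(7, C, C)        # one 7x7 layer, same receptive field
print(stacked, single)                # 110592 200704
```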
Useful links:

