A Survey of Deep Learning Techniques Applied to Trading
Deep learning has been getting a lot of attention lately with breakthroughs in image classification and speech recognition. However, its application to finance doesn’t yet seem to be commonplace. This survey covers what I’ve found so far that is relevant to systematic trading. Please tell me if you know of some research I’ve missed.
Acronyms:
DBN = Deep Belief Network
LSTM = Long Short-Term Memory
MLP = Multi-layer Perceptron
RBM = Restricted Boltzmann Machine
ReLU = Rectified Linear Units
CNN = Convolutional Neural Network
Limit Order Book Modeling
Sirignano (2016) predicts changes in limit order books. He has developed a “spatial neural network” that can take advantage of local spatial structure, is more interpretable, and is more computationally efficient than a standard neural network for this purpose. He models the joint distribution of the best bid and ask at the time of the next state change, and he also models the joint distribution of the best bid and ask prices upon a change in either of them.
Architecture – Each neural network has 4 layers. The standard neural network has 250 neurons per hidden layer, and the spatial neural network has 50. He uses the tanh activation function on the hidden layer neurons.
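To make that concrete, here is a minimal PyTorch sketch of what the standard network could look like, assuming four tanh hidden layers of 250 units each; the input and output sizes are placeholders of mine, not values from the paper:

```python
import torch.nn as nn

# Minimal sketch of the "standard" network described above, assuming four
# tanh hidden layers of 250 units each. n_features and n_classes are
# placeholders, not values from the paper.
def make_standard_net(n_features: int, n_classes: int, width: int = 250) -> nn.Sequential:
    layers, in_dim = [], n_features
    for _ in range(4):
        layers += [nn.Linear(in_dim, width), nn.Tanh()]
        in_dim = width
    layers.append(nn.Linear(in_dim, n_classes))
    return nn.Sequential(*layers)
```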
Training – He trained and tested on order books from 489 stocks from 2014 to 2015 (a separate model for each stock). He uses Level III limit order book data from the NASDAQ with event times having nanosecond decimal precision. Training involved 50TB of data and used a cluster with 50 GPUs. He includes 200 features: the price and size of the limit order book across the first 50 non-zero bid and ask levels. He uses dropout to prevent overfitting. He uses batch normalization between each hidden layer to prevent internal covariate shift. Training is done with the RMSProp algorithm. RMSProp is similar to stochastic gradient descent with momentum, but it normalizes the gradient by a running average of the magnitudes of recent gradients. He uses an adaptive learning rate where the learning rate is decreased by a constant factor whenever the training error increases over a training epoch. He uses early stopping, imposed via a validation set, to reduce overfitting. He also includes an l^2 penalty when training in order to reduce overfitting.
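Most of these ingredients map directly onto standard deep learning tooling. Here is a hedged PyTorch sketch of roughly that setup (dropout, batch normalization between hidden layers, RMSProp with weight decay standing in for the l^2 penalty, and a learning-rate cut when training error rises); the hidden width, dropout rate, learning rate, weight decay, decay factor, and output size are my own placeholders, not the paper's values:

```python
import torch
import torch.nn as nn

# Illustrative only: widths, dropout rate, learning rate, weight decay, and
# the 3-class output below are assumptions, not values from the paper.
net = nn.Sequential(
    nn.Linear(200, 250), nn.BatchNorm1d(250), nn.Tanh(), nn.Dropout(0.1),
    nn.Linear(250, 250), nn.BatchNorm1d(250), nn.Tanh(), nn.Dropout(0.1),
    nn.Linear(250, 3),
)

# RMSProp with weight decay plays the role of the l^2 penalty.
optimizer = torch.optim.RMSprop(net.parameters(), lr=1e-3, weight_decay=1e-5)

def maybe_decay_lr(prev_epoch_error: float, epoch_error: float, factor: float = 0.5) -> None:
    """Cut the learning rate by a constant factor if training error increased."""
    if epoch_error > prev_epoch_error:
        for group in optimizer.param_groups:
            group["lr"] *= factor
```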
Results – He shows that limit order books exhibit some degree of local spatial structure. He predicts the order book 1 second ahead and also at the time of the next bid/ask change. The spatial neural network outperforms the standard neural network and logistic regression with non-linear features. Both neural networks have 10% lower error than logistic regression.
Price-based Classification Models
Dixon et al. (2016) use a deep neural network to predict the sign of the price change over the next 5 minutes for 43 commodity and forex futures.
Architecture – Their input layer has 9,896 neurons for input features made up of lagged price differences and co-movements between contracts. There are 5 learned fully-connected layers. The first of the four hidden layers contains 1,000 neurons, and each subsequent hidden layer tapers by 100 neurons. The output layer has 135 neurons (one for each of the 3 classes {-1, 0, 1} for each of the 43 contracts).
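Here is a rough PyTorch sketch of that topology; the ReLU activations are an assumption on my part, since the activation function isn't specified above:

```python
import torch.nn as nn

# Sketch of the feed-forward topology described above: 9,896 inputs, four
# hidden layers tapering 1000 -> 900 -> 800 -> 700, and a 135-unit output
# (3 classes x 43 contracts). The ReLU activations are an assumption.
dixon_net = nn.Sequential(
    nn.Linear(9896, 1000), nn.ReLU(),
    nn.Linear(1000, 900), nn.ReLU(),
    nn.Linear(900, 800), nn.ReLU(),
    nn.Linear(800, 700), nn.ReLU(),
    nn.Linear(700, 135),  # logits, reshaped to (43 contracts, 3 classes)
)
```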
Training – They use standard back-propagation with stochastic gradient descent. They speed up training with mini-batching (computing the gradient on several training examples at once rather than one at a time). Rather than an NVIDIA GPU, they use an Intel Xeon Phi co-processor.
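For readers unfamiliar with the term, mini-batching just means each gradient step averages the loss over a small batch of examples rather than a single one; a toy Python sketch (batch size is arbitrary here):

```python
# Toy illustration of mini-batch SGD: one parameter update per batch of
# examples instead of per example. The batch size is a placeholder.
def sgd_epoch(net, loss_fn, optimizer, X, y, batch_size=256):
    for start in range(0, len(X), batch_size):
        xb, yb = X[start:start + batch_size], y[start:start + batch_size]
        optimizer.zero_grad()
        loss = loss_fn(net(xb), yb)  # loss averaged over the whole mini-batch
        loss.backward()
        optimizer.step()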
Results – They report an overall accuracy of 42% for three-class classification. They do walk-forward training instead of a traditional backtest. Their boxplot shows generally positive Sharpe ratios from the mini-backtests for each contract. They did not include transaction costs or the cost of crossing the bid-ask spread. All their predictions and features were based on the mid-price at the end of each 5-minute time period.
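Walk-forward training refits the model on a rolling window of past data, evaluates on the slice immediately after it, then rolls forward and repeats. A generic sketch of the index bookkeeping (the window lengths are placeholders, not theirs):

```python
# Generic walk-forward scheme: train on a trailing window, test on the next
# slice, then roll forward. Window sizes here are placeholders.
def walk_forward_splits(n_samples, train_len=5000, test_len=1000):
    start = 0
    while start + train_len + test_len <= n_samples:
        train_idx = range(start, start + train_len)
        test_idx = range(start + train_len, start + train_len + test_len)
        yield train_idx, test_idx
        start += test_len
```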
Takeuchi and Lee (2013) look to enhance the momentum effect by predicting which stocks will have higher or lower monthly returns than the median.
Architecture – They use an auto-encoder composed of stacked RBMs to extract features from stock prices, which they then pass to a feed-forward neural network classifier. Each RBM consists of one layer of visible units and one layer of hidden units connected by symmetric links. The first layer has 33 units for input features from one stock at a time. For every month t, the features include the 12 monthly returns for months t-2 through t-13 and the 20 daily returns approximately corresponding to month t. They normalize each return feature by calculating the z-score relative to the cross-section of all stocks for each month or day. The number of hidden units in the final layer of the encoder is sharply reduced, forcing dimensionality reduction. The output layer has 2 units, corresponding to whether the stock ended up above or below the median return for the month. The final network has layer sizes of 33-40-4-50-2.
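The cross-sectional normalization is worth spelling out: each feature is z-scored against all stocks in the same month (or day), not against a stock's own history. A small NumPy sketch, assuming a (stocks × features) array of returns:

```python
import numpy as np

# Cross-sectional z-scoring as described above: each return feature is
# normalized relative to all stocks in the same month (or day).
# `returns` is assumed to have shape (n_stocks, n_features).
def cross_sectional_zscore(returns: np.ndarray) -> np.ndarray:
    mean = returns.mean(axis=0, keepdims=True)  # per-feature mean across stocks
    std = returns.std(axis=0, keepdims=True)    # per-feature std across stocks
    return (returns - mean) / (std + 1e-12)
```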
Training – During pre-training, they split the dataset into smaller, non-overlapping mini-batches. Afterwards, they unroll the RBMs to form an encoder-decoder, which is fine-tuned using back-propagation. They consider all stocks trading on the NYSE, AMEX, or NASDAQ with a price greater than $5. They train on data from 1965 to 1989 (848,000 stock-month samples) and test on data from 1990 to 2009 (924,300 stock-month samples). Some of the training data is held out for validation to choose the number of layers and the number of units per layer.
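Putting the pieces together, here is a hedged PyTorch sketch of the fine-tuning stage with the 33-40-4-50-2 layer sizes mentioned above; the RBM pre-training itself is omitted (in practice the encoder weights would be initialized from the pre-trained RBMs), and the sigmoid activations are my assumption:

```python
import torch.nn as nn

# Sketch of the fine-tuned classifier only: a 33-40-4 encoder (weights would
# be initialized from the pre-trained RBMs) followed by a 50-unit layer and a
# 2-class output. Sigmoid activations are an assumption.
takeuchi_net = nn.Sequential(
    nn.Linear(33, 40), nn.Sigmoid(),
    nn.Linear(40, 4), nn.Sigmoid(),   # bottleneck: compressed representation
    nn.Linear(4, 50), nn.Sigmoid(),
    nn.Linear(50, 2),                 # above/below the median monthly return
)
```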
Results – Their overall accuracy is around 53%. When they consider the difference between the top decile and the bottom decile of predictions, they get 3.35% per month, or a 45.93% annualized return.