How many epochs to train? (ensemble ANN)

17Sep10

When we eliminated the validation sets from the training samples, we opted to use a fixed number of epochs for training. An epoch is one step of the training in which all the available training samples are shown to the NN. We have already written about the optimal number of epochs for a standalone ANN; see one of the previous posts. This time, we ran an experiment for the ensemble voters version of the ANN. In this version we have a committee of 20 ANNs; we created this group to mitigate randomness. We train each member for N epochs, N = 1..7.
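A minimal sketch of the committee idea, assuming a simple sign vote as the aggregation rule (the post does not spell this out) and the classic Matlab Neural Network Toolbox API; P and T are the training inputs and targets, Pnext is the next day’s input, and the network size is illustrative:

```matlab
nMembers = 20;
f = zeros(nMembers, 1);
for i = 1:nMembers
    net = newff(P, T, 2);                 % small feedforward net; size illustrative
    net.divideFcn = 'dividetrain';        % no validation split: fixed-epoch training
    net.trainParam.epochs = N;            % N = 1..7 in the experiments
    net.trainParam.showWindow = false;    % suppress the training GUI
    net = train(net, P, T);               % each member starts from random weights
    f(i) = sim(net, Pnext);               % member forecast for the next day
end
committeeVote = sign(sum(sign(f)));       % majority direction of the 20 voters
```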

We use a 200-day lookback period. Also note that we normalize the targets (the target is the next day’s %gain) in the sample: we calculate the mean of the last 200 days and subtract it from the targets before feeding them to the ANN. After the ANN makes its forecast, we should add this 200-day mean back. However, we think that the current market trend average is better estimated from only the most recent 100 days (half of the training samples). So we add this 100-day average to the forecasted value to get an estimate of the next day’s %gain.
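In code, the normalization looks something like this (a sketch; ‘gains’ is a hypothetical vector of daily %gains, most recent last):

```matlab
mean200 = mean(gains(end-199:end));    % mean over the full 200-day window
T = gains(end-199:end) - mean200;      % de-meaned targets fed to the ANN
% ... train the ANN on T; let 'raw' be its forecast for the next day ...
mean100 = mean(gains(end-99:end));     % recent-trend estimate from 100 days
nextDayGain = raw + mean100;           % add the 100-day mean back
```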

The aim of this study is to gain some insight into how many epochs are worth training. For each epoch number (1…7), we ran 5 experiments (that is 7*5 = 35 backtests, which took 24 hours to run). We calculate the mean and variance of 3 statistics (a sketch of how they can be computed follows the list):
D_stat(%) (directional accuracy statistic): the percentage of winning bets vs. losing bets.
estimated CAGR% (Compound Annual Growth Rate, based on the arithmetic mean of the daily returns projected to 250 trading days). This is the additive return: we invest a fixed amount (e.g. 100 USD) every day and harvest the return instantly on that day, so we don’t let the returns accumulate.
TR% (total return over the period, the geometric compounding of the daily returns). Multiplicative return: the daily returns are multiplied together.
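A minimal sketch of how the three statistics can be computed from a vector r of daily strategy returns in percent (the handling of zero-return days in D_stat is my assumption):

```matlab
D_stat = 100 * sum(r > 0) / sum(r ~= 0);   % winning vs. losing bets, in %
CAGR   = 250 * mean(r);                    % additive: arithmetic daily mean * 250
TR     = 100 * (prod(1 + r/100) - 1);      % multiplicative: compounded total return
```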

Note that after a while the D_stat(%), CAGR% and TR% don’t improve, but get worse. That is rather strange to me: we have only 2 neurons (2 weights to optimize) and 200 training samples. I don’t see how the ANN can overtrain, yet as we train for more epochs, we get worse results. Does somebody have an idea that explains this?
Ok, let’s suppose the 2-neuron ANN can overtrain. What can we do? We can do something similar to the default behavior of the Matlab ANN: we run the training, increasing the epochs, and after every epoch we calculate the RMSE error of the ANN on the training set (not on the validation set, as the default Matlab behaviour does). If it doesn’t improve, we stop the training. This is very similar to the default case. What we propose here is not to separate the input samples into training (80%) and validation (20%) sets, because that introduces too much randomness: we have 200 input samples, so only 20% = 40 samples would fall into the validation set, which is too few to be reliable. So, what can we do in practice in Matlab? Instead of setting a random 20% validation proportion, we can fill the validation set manually, and we set the validation set to be the full input set (a sketch follows below). If the RMSE error doesn’t improve, the learning is stopped. This method would adapt to the problem we attack: instead of a fixed number of epochs, it would train for as long as it finds useful. Also, if we control the training process, I would specify a minimum number of epochs: at least 2 or 3.
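A minimal sketch of this Matlab configuration, assuming the classic Neural Network Toolbox API (the epoch cap and max_fail values are illustrative; whether the toolbox minds the training and validation indices overlapping is worth verifying):

```matlab
net = newff(P, T, 2);                  % small net, as above
net.divideFcn = 'divideind';           % assign the data splits by index
net.divideParam.trainInd = 1:200;      % train on all 200 samples...
net.divideParam.valInd   = 1:200;      % ...and validate on the same full set
net.divideParam.testInd  = [];         % no test set
net.trainParam.epochs    = 50;         % upper bound; validation stop ends it earlier
net.trainParam.max_fail  = 1;          % stop as soon as the error stops improving
[net, tr] = train(net, P, T);
```

As far as I know the toolbox has no minimum-epochs parameter, so enforcing the 2-3 epoch minimum mentioned above would need a small custom training loop.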
Until this process is worked out, we will use a fixed number of 4 epochs for ANN training.
