How many epochs to train? (standalone NN)

15Sep10

When we eliminated the validation sets from the training samples, we opted to train for a fixed number of epochs. An epoch is one pass of the training in which all the available training samples are shown to the NN. It can be done incrementally, one sample at a time (MATLAB: adapt() method), or in one bulk as batch learning (MATLAB: train() method). The advantage of batch learning is efficiency; the advantage of incremental learning is that it is reported to be proven that, with that training, the NN can approximate any nonlinear function in the limit case, while this is not guaranteed in the batch learning case.
Nevertheless, we use the batch learning.
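
To make the difference between the two modes concrete, here is a minimal sketch in Python. It uses plain gradient descent on a one-input linear model, which is an assumption made for brevity: it is not what MATLAB's adapt() or train() do internally, and the function names, the toy data and the learning rate are all made up for illustration.

```python
def epoch_incremental(w, b, samples, lr):
    """One epoch, updating the weights after every single sample."""
    for x, t in samples:
        err = (w * x + b) - t      # prediction error on this sample
        w -= lr * err * x
        b -= lr * err
    return w, b


def epoch_batch(w, b, samples, lr):
    """One epoch: average the gradient over all samples, then update once."""
    n = len(samples)
    grad_w = sum(((w * x + b) - t) * x for x, t in samples) / n
    grad_b = sum(((w * x + b) - t) for x, t in samples) / n
    return w - lr * grad_w, b - lr * grad_b


# Toy data (hypothetical): t = 2x + 1
samples = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0)]
w_inc, b_inc = epoch_incremental(0.0, 0.0, samples, lr=0.1)
w_bat, b_bat = epoch_batch(0.0, 0.0, samples, lr=0.1)
```

Both functions see every sample exactly once, so both count as one epoch, but the incremental version has already made three weight updates while the batch version has made only one.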

We repeat two illustrations here from a previous post.


The first image shows the 10 cases of the NN surface after 1 epoch of training. The first little stamp image shows the unprocessed output that you have already seen in the last post: the average next day %gains for the days Mon…Fri.
In the second stamp image, the same can be seen, but with the mean of the period subtracted from the columns.
We expect the NN to learn an inverted bell shape. However, note that in the 1 epoch case, only 4 out of the 10 NNs could learn the inverted bell shape. Training the NN for only 1 epoch is clearly not enough. Unfortunately (as I realize now), this happened many times in our former experiments. Imagine that the validation set (20% of the samples) was used for termination, and by chance an outlier was put into the validation set. We start the learning epochs. The training set RMSE improves gradually, but the validation RMSE increases at every step. The default Matlab behavior is that after the validation error has increased for 6 consecutive epochs, it rolls back the 6 epochs of learning and returns the NN that existed 6 epochs before. If the validation RMSE increases from the very start, from the 1st epoch to the 7th epoch, Matlab rolls back to the 1st epoch and returns that as the output net of the training. So, in many cases in our past, the final NN was equivalent to a NN trained for only 1 epoch.
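
The rollback described above can be illustrated with a small Python sketch. It is a simplified model of the stopping rule as described in this post, not Matlab's actual code; the function name and the error sequence are hypothetical.

```python
def early_stop_epoch(val_errors, patience=6):
    """Return the 1-based epoch whose network would be kept.

    Training stops once the validation error has failed to improve on its
    best value for `patience` consecutive epochs; the best epoch so far is
    the one that is returned (the "rollback").
    """
    best_epoch, best_err, fails = 1, val_errors[0], 0
    for epoch, err in enumerate(val_errors[1:], start=2):
        if err < best_err:
            best_epoch, best_err, fails = epoch, err, 0
        else:
            fails += 1
            if fails >= patience:
                break
    return best_epoch


# Hypothetical validation RMSE that rises from the very first epoch,
# e.g. because an outlier landed in the validation set.
rising = [1.0, 1.1, 1.2, 1.3, 1.4, 1.5, 1.6]
kept = early_stop_epoch(rising)
```

With a validation error that rises from the start, `kept` is 1: the returned network is the one that existed after a single epoch.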

The second image is the same, but after 2 epochs of training. Note that 9 out of 10 NNs could learn the inverted bell shape.

We don’t know the optimal number of epochs in advance. Usually, the optimal nEpoch should increase as the input dimension or the complexity of the function increases.
It is more difficult to climb in a 20 dimensional hyperspace than in a 2 dimensional space. In our case we have 2 neurons, so the state space is 2 dimensional. Articles dealing with 20 dimensional weight spaces report that 1500 epochs were needed. We mention that, ultimately, a better way is to stop training at a specified error threshold, but that threshold also depends on the task we try to solve.

In the experiment of this post, we get rid of the ensemble idea, so every day we train only 1 NN to produce a forecast. We use a 200 day lookback period. Also note that we normalize the targets in the sample: we calculate the mean of the last 200 days, subtract it from the targets, and feed the result to the NN. After the NN makes its forecast, we would normally add this 200 day mean back. However, we think that the mean of the current market trend is better estimated from only the most recent 100 days of data (half of the training samples). So we add this 100 day mean to the forecasted value to get an estimate for the next day %gain.
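
The normalization step can be sketched in Python as follows. This is only an illustration of the arithmetic described above: `predict` is a hypothetical stand-in for "train the NN on the centered targets and return its raw forecast", and all names are made up.

```python
def forecast_next_gain(targets, predict, train_len=200, trend_len=100):
    """Center the targets on the 200 day mean, add back the 100 day mean."""
    window = targets[-train_len:]                 # lookback period
    train_mean = sum(window) / len(window)
    centered = [t - train_mean for t in window]   # what the NN is fed

    raw_forecast = predict(centered)              # hypothetical NN step

    recent = window[-trend_len:]                  # most recent half
    trend_mean = sum(recent) / len(recent)
    return raw_forecast + trend_mean


# Toy example: flat first 100 days, then 0.2 %gain on each of the last 100.
toy_targets = [0.0] * 100 + [0.2] * 100
estimate = forecast_next_gain(toy_targets, predict=lambda centered: 0.05)
```

In the toy example the 200 day mean is 0.1 but the 100 day mean is 0.2, so the estimate is 0.05 + 0.2 rather than 0.05 + 0.1.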

The aim of this study is to gain some insight into what number of epochs is worth picking. For each epoch count (1…9), we ran 21 experiments (I wanted 20, but made a mistake that I don’t regret). We calculate the mean and the variance of 3 statistics:
– D_stat(%) (directional accuracy statistic): the winning bets vs. the losing bets, in percentage.
– estimated CAGR% (Compounded Annual Growth Rate, based on the arithmetic mean of the daily returns projected for 250 days). This is the additive return: we invest a fixed amount (like 100 USD) every day and harvest the return instantly, so we don’t let the returns accumulate.
– TR% (Total Return in the period, based on the geometric mean of the daily returns). This is the multiplicative return: daily returns are accumulated.
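
As a rough sketch, the three statistics could be computed like this in Python. Some details are assumptions, not taken from the backtest code: daily returns are given as fractions (0.01 means 1%), D_stat is taken as the share of winning bets among all bets with a nonzero return, and the function name is made up.

```python
def backtest_stats(daily_returns, days_per_year=250):
    """Return (D_stat %, estimated CAGR %, TR %) for a list of daily returns."""
    wins = sum(1 for r in daily_returns if r > 0)
    losses = sum(1 for r in daily_returns if r < 0)
    decided = wins + losses
    # Assumption: days with exactly zero return count as neither win nor loss.
    d_stat = 100.0 * wins / decided if decided else float("nan")

    # Additive: arithmetic mean of daily returns projected to 250 days.
    mean_daily = sum(daily_returns) / len(daily_returns)
    cagr = 100.0 * days_per_year * mean_daily

    # Multiplicative: returns are compounded over the whole period.
    growth = 1.0
    for r in daily_returns:
        growth *= 1.0 + r
    total_return = 100.0 * (growth - 1.0)

    return d_stat, cagr, total_return


d_stat, cagr, tr = backtest_stats([0.01, -0.01, 0.02, 0.0])
```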


Note that after a while the D_stat(%) doesn’t improve. However, the CAGR improves nicely. Our conclusion is that, in principle, it is worth increasing the number of epochs without limit: as nEpoch increases, we get more and more accurate results. However, for practical reasons, we have to limit it.
For the D_stat(%), 2 epochs are not enough, but 3 epochs are enough.
For the CAGR%, 5 epochs is optimal looking at the mean of the CAGR%, but the standard deviation chart is more important: we seek to make the variance of the backtest as low as we can. The CAGR% std chart suggests 6 epochs as the optimum. The TR% charts are similar to the CAGR% ones. That is not a surprise, since one is based on the arithmetic mean and the other on the geometric mean of the daily returns.
In future studies we will probably use 5 epochs for the day of the week anomaly problem. For other tasks, like the RSI2 based NN forecast, we should probably repeat this experiment, and we will probably find a different optimal epoch count.
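
For anyone repeating the experiment on another task, the comparison by mean and by standard deviation could be sketched as below. The numbers in `toy_runs` are made-up placeholders that only show the shape of the data, not results from this study, and the function name is hypothetical.

```python
from statistics import mean, stdev


def summarize_runs(results_by_epoch):
    """Map each epoch count to (mean, std) and pick the two candidates."""
    summary = {n: (mean(v), stdev(v)) for n, v in results_by_epoch.items()}
    best_mean = max(summary, key=lambda n: summary[n][0])   # highest mean
    lowest_std = min(summary, key=lambda n: summary[n][1])  # most stable
    return summary, best_mean, lowest_std


# Placeholder CAGR% values per epoch count (3 repeated runs each).
toy_runs = {
    4: [10.0, 14.0, 12.0],
    5: [15.0, 19.0, 17.0],
    6: [15.5, 16.5, 16.0],
}
summary, best_mean, lowest_std = summarize_runs(toy_runs)
```

When the two criteria disagree, as they do in this toy data, the choice comes down to how much average return one is willing to give up for a more repeatable backtest.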
