Does increasing nEpoch decrease randomness?

01Dec10

Our input is only the CurrDayChange in the continuous case. We used to have 10 members in the ensemble, but this time we used an ensemble with 50 members. That is quite extreme, and running the backtests took a long time (about 5 hours each). We ran 4 experiments with nEpoch=5 and nEpoch=10.
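For reference, one experiment looks roughly like the MATLAB sketch below. The hidden-layer size and the variable names (currDayChange, nextDayChange) are illustrative assumptions, not the exact setup; the essential points are the 50 independently initialized members and the small epoch budget.

    % Sketch of one ensemble experiment (hidden-layer size is a guess).
    nMembers = 50;                 % ensemble size used in this test
    nEpoch   = 5;                  % compare against nEpoch = 10
    x = currDayChange;             % 1-by-Q input: today's change
    t = nextDayChange;             % 1-by-Q target: next day's change

    preds = zeros(nMembers, size(x, 2));
    for i = 1:nMembers
        net = feedforwardnet(10);
        net.trainFcn = 'trainlm';
        net.trainParam.epochs = nEpoch;
        net.trainParam.showWindow = false; % no GUI during the backtest
        net = train(net, x, t);            % random init differs per member
        preds(i, :) = net(x);
    end
    ensembleForecast = mean(preds, 1);     % aggregate the 50 members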

– We contend that increasing the number of training epochs from 5 to 10 was good for both performance and randomness. We only expected to see less randomness, but performance improved as well.

The D_stat increased from 51.4% to 51.9%, and the CAGR increased from 8.84% to 12.65%.
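Assuming D_stat is the directional hit rate (the fraction of days where the forecast sign matches the realized sign) and CAGR is annualized from the daily backtest equity curve, the two statistics could be computed like this:

    % Hedged sketch: assumes D_stat = directional hit rate and CAGR is
    % annualized from the daily equity curve (252 trading days/year).
    hit    = sign(forecast) == sign(realized);  % both 1-by-Q vectors
    D_stat = mean(hit);                         % e.g. 0.514 -> 51.4%

    dailyRet = sign(forecast) .* realized;      % long/short the next day
    equity   = cumprod(1 + dailyRet);
    nYears   = numel(dailyRet) / 252;
    CAGR     = equity(end)^(1 / nYears) - 1;    % e.g. 0.0884 -> 8.84%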

By doubling the number of epochs, the STD, our measure of randomness, was halved.

In backtests, we will use 5 epochs (to save precious computation power), but in the production environment, where we need to train the network only once (for today), we will use at least 10 epochs.
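In the toolbox this is a one-line difference in the training setup; a minimal sketch:

    % Backtest: the net is retrained many times, so keep it cheap.
    net.trainParam.epochs = 5;
    % Production: the net is trained only once per day, so spend more.
    net.trainParam.epochs = 10;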

– Note, however, that increasing the epoch count further was not always advantageous in the continuous input case. I share a weird thing that I don't fully understand yet.
When I manually inspected both the histogram of the input and the predicted output function (the ANN surface), I was surprised to find that
– nEpoch = 5: there are many different random versions of the ANN function (no constant function); we do not overtrain.
– nEpoch = 250: the NNSurface is always the same useless constant function.
So as we increase nEpoch to 250, we actually get a worse function approximation. That is something to investigate later in the Synthetic Random Function case.
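With a single continuous input, the "ANN surface" is just a curve, so the constant-function collapse is easy to spot by eye. A sketch of that manual inspection (the grid range is an assumption):

    % Inspect the learned 1-D function: a flat line means the net has
    % collapsed to the useless constant solution.
    xGrid = linspace(min(x), max(x), 200); % cover the observed input range
    figure; plot(xGrid, net(xGrid));
    xlabel('CurrDayChange'); ylabel('predicted output');
    % And the input histogram: heavy tails / outliers are the suspects
    % discussed below.
    figure; hist(x, 50)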
That is very weird. Another interesting thing: even when I set nEpoch to 250, the system very frequently stops after about 100 epochs with the message trainlm.stop = "Minimum gradient reached.". The reason why this happens:

The purpose of training is to minimize the objective function, not achieving zero error (or some other specified low value). When minimization occurs, theoretically, the gradient is zero. When a computer taking finite steps gets sufficiently near a local min, the gradient will be less than some small value. The program suggests 1e-10 is sufficiently small for believing that you are sufficiently near a local min. I agree, assuming inputs, initial weights and other learning parameters are properly scaled.
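The stop reason and the gradient threshold are both accessible from the toolbox training record, so the premature stop can be confirmed; a minimal sketch (the tr.stop field and the 1e-10 value follow the quote above; lowering min_grad is just an experiment, not a recommendation):

    % Confirm why training stopped and inspect the gradient threshold.
    [net, tr] = train(net, x, t);
    disp(tr.stop)                   % e.g. 'Minimum gradient reached.'
    disp(net.trainParam.min_grad)   % the small-gradient threshold (1e-10 here)
    % Tightening the threshold forces more epochs, though per the quote
    % above the net is probably already sitting at a local minimum.
    net.trainParam.min_grad = 1e-12;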

So, the system is stuck in another local minimum. Should I try online training (not batch training)?
Batch training is not guaranteed to find the minimum. Use adapt() instead.
Tried adapt(): the system doesn't get stuck in that stupid constant-function solution. The result is different every time; even with nEpoch=2500, it doesn't converge.
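For reference, adapt() does incremental (online) learning: one weight update per sample, one pass per call, with the data supplied as cell-array sequences. A sketch, assuming the toolbox's default incremental learning functions; the per-sample updates are consistent with the "different result every time" behavior above.

    % Online training: weights are updated after every single sample,
    % so the data keep perturbing the net instead of letting it settle
    % into the constant-function local minimum.
    xSeq = con2seq(x);              % matrix -> cell array of samples
    tSeq = con2seq(t);
    for pass = 1:nEpoch             % adapt() performs one pass per call
        [net, y, e] = adapt(net, xSeq, tSeq);
    end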

With batch training, after 250 epochs it always gives the same constant function. What we suspect is that predicting the Gaussian is the problem: there are some outliers that still kill the learning.
We suggest testing it with the discretized case (2 bins) instead of the continuous case. If, after 250 epochs, the discretized case doesn't stop prematurely and doesn't give the useless constant function, we pronounce the discrete case the winner.
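A minimal sketch of that discretized test; the 2-bin encoding (up day vs. down day) is our assumption about how the bins would be defined:

    % Discretize the continuous target into 2 bins (up / down), so an
    % outlier contributes the same bounded target as any other day.
    tBins = double([t > 0; t <= 0]);   % 2-by-Q one-hot: up / down
    net = feedforwardnet(10);
    net.trainParam.epochs = 250;
    [net, tr] = train(net, x, tBins);
    disp(tr.stop)                      % does it still stop prematurely?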

To summarize the current post: can we decrease volatility and randomness by increasing nEpoch? Yes, and the good news is that both randomness and performance improve. The bad news is that while we can increase nEpoch from 5 to 10, we cannot increase it too much (to 250), because of some weirdness in learning the Gaussian random function.
