### Adaptive NN and partial elimination of randomness

**1. Adaptive**

The previous post used all the available past values to forecast whether a given day of the week would be bullish or bearish.

In the backtest, the first prediction started in 2005.

That prediction used all the previous data, from 1998 to 2005, to learn. Using all the data that is available seems a good idea for an equilibrium problem. However, financial markets are far from equilibrium processes. The mean of the SP500 (for example the SMA(200)) changes all the time. There are bullish periods that last 5 years, then bearish periods that last 2 years. There are periods of high volatility. There are periods when Mondays are usually bearish, but there are also periods when Mondays are bullish (spring 2010). Sometimes the bearish Monday effect shifts by a day, so Mondays become bullish and Tuesdays become bearish instead. We call these periods regimes, and we expect a regime to last quite a long time (years).

So, in the case of financial time series, teaching the NN from all the available past is not as good an idea as it first seems. In this post, we restrict the available past data to the last 200 (or 300) days only. This makes our algorithm adaptive: it adapts to the latest regime.
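The restriction to the most recent days can be sketched as a rolling window. This is a minimal Python illustration, not the MATLAB code used for the backtest; the function name `rolling_training_windows` and the toy return series are assumptions made up for the example.

```python
from typing import Iterator, List, Sequence, Tuple


def rolling_training_windows(
    samples: Sequence[float], lookback: int = 200
) -> Iterator[Tuple[List[float], int]]:
    """Yield (training_window, forecast_index) pairs.

    For each day t that has at least `lookback` earlier samples, the
    training window holds only samples[t - lookback : t]. Anything older
    is dropped, so the learner only ever sees the latest regime.
    """
    if lookback <= 0:
        raise ValueError("lookback must be positive")
    for t in range(lookback, len(samples)):
        yield list(samples[t - lookback:t]), t


# Toy usage: 10 made-up daily returns and a 3-day lookback
# (the post uses 200 or 300 days).
returns = [0.1, -0.2, 0.3, 0.0, 0.5, -0.4, 0.2, 0.1, -0.1, 0.3]
windows = list(rolling_training_windows(returns, lookback=3))
first_window, first_index = windows[0]
```

With a lookback of 200, the forecast for day t is trained on days t-200 to t-1 only, so data from an older regime falls out of the window automatically.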

**2. Randomness**

The previous post showed how different the NN decision surfaces can be. For example, in the nNeurons = 2 case, the supposedly ‘final’ surface can slope up, slope down or be bell-shaped. This doesn’t look like a robust method. This variability stems from 2 sources of randomness.

**A. On the one hand**, randomness arises because the NN input weights, layer weights and biases are initialized randomly. This is the default MATLAB behaviour, meant to ensure robustness. I was tempted to eliminate this source of randomness, but so far I have let it be. For those who are interested in how to initialize those weights to zero, here are the MATLAB settings:

- Set `net.initFcn` to `'initlay'`. (`net.initParam` will automatically become `initlay`'s default parameters.)
- Set `net.layers{i}.initFcn` to `'initwb'`.
- Set each `net.inputWeights{i,j}.initFcn` to `'initzero'`.
- Set each `net.layerWeights{i,j}.initFcn` to `'initzero'`.
- Set each `net.biases{i}.initFcn` to `'initzero'`.
- To initialize the network, call `init`.

**B. On the other hand**, randomness stems from the fact that the past data is randomly divided between the training and validation sets. Previously in my research, I used 20% of the past data for validation. This source of randomness can be huge. Imagine that an outlier sample lands in the training bin on one run, but not on the next. The options for eliminating this randomness are:

– divide the past data deterministically between training and validation.

This does not introduce randomness in the sense that if I repeat the experiment, I get the same result. Nevertheless, it still introduces a kind of randomness, because an outlier is ‘accidentally’ (even if deterministically) placed in the training set on one day and in the validation set on the following day.

– don’t reserve validation data at all. Use all the past samples for training.

OK, but then what criterion should we use to stop the learning?

Recall that the validation set is used to keep the NN from overtraining. After each epoch, the RMSE is measured on the training set and on the validation set. If the RMSE on the validation set increases for 6 consecutive epochs, that suggests we have overtrained, so we stop training. This is a robust default MATLAB behaviour. It works very well in many cases.
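The validation-based stopping rule can be sketched in a few lines. This is a simplified Python illustration rather than MATLAB's own implementation: it assumes a "failure" means the validation RMSE did not improve on its best value so far, and the name `early_stopping_epoch` and the RMSE series are invented for the example.

```python
from typing import Sequence


def early_stopping_epoch(val_rmse: Sequence[float], max_fail: int = 6) -> int:
    """Return the 1-based epoch at which training would stop.

    Training stops once the validation RMSE has failed to improve on its
    best value for `max_fail` consecutive epochs. If that never happens,
    the number of available epochs is returned.
    """
    best = float("inf")
    fails = 0
    for epoch, rmse in enumerate(val_rmse, start=1):
        if rmse < best:
            best = rmse
            fails = 0  # a new best resets the failure counter
        else:
            fails += 1
            if fails >= max_fail:
                return epoch
    return len(val_rmse)


# Made-up validation curve: best value at epoch 3, then six worse epochs.
val_curve = [1.0, 0.9, 0.8, 0.85, 0.86, 0.87, 0.88, 0.89, 0.9, 0.7]
stop_epoch = early_stopping_epoch(val_curve)
```

In this toy curve the rule stops at epoch 9 and never reaches the better value at epoch 10. A single outlier in the validation set can trigger the same kind of premature stop.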

BUT

Is there a danger for us to overtrain the NN?

Absolutely not. Note that the number of neurons is 2, and in this adaptive case we show the NN the 200 previous data samples. Can we overtrain the NN if it has only 2 neurons (and therefore only a few parameters to estimate)? No.
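The size of the model can be checked by counting its free parameters. The sketch below assumes a plain feed-forward network with one hidden layer, a bias on every neuron and a single output neuron; the post does not spell out the architecture, so treat the exact count as an assumption. The function name `mlp_parameter_count` is hypothetical.

```python
def mlp_parameter_count(n_inputs: int, n_hidden: int, n_outputs: int = 1) -> int:
    """Count weights and biases of a one-hidden-layer feed-forward network."""
    hidden = n_inputs * n_hidden + n_hidden      # input weights + hidden biases
    output = n_hidden * n_outputs + n_outputs    # layer weights + output biases
    return hidden + output


# Day-of-the-week case: 1 input, 2 hidden neurons, 1 output (assumed layout).
n_params = mlp_parameter_count(1, 2, 1)
samples_per_param = 200 / n_params
```

Under these assumptions the network has 7 free parameters, so 200 samples give roughly 28 samples per parameter. That is still a very small model relative to the data, which is the point of the argument.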

Having concluded that we don’t risk overtraining, we face the problem of when to stop the training.

There are at least 2 possibilities used in the literature:

– stop the training after a fixed number X of epochs (parameter X)

– stop the training once the RMSE on the training set decreases to a threshold (parameter threshold)

Unfortunately, both of these options rely on trial and error to determine the parameter.

We have to determine the ‘optimal’ number of epochs, or the optimal RMSE threshold, for each specific case. We lose the nice automatic termination decision of the validation set version. But that is the price I am willing to pay for eliminating most of the randomness from the process.
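The two validation-free stopping rules can be sketched as one loop. This is an illustrative Python sketch, not the MATLAB training routine: `train_until` and `make_toy_trainer` are hypothetical names, and the toy trainer simply halves the training RMSE each epoch instead of training a real network.

```python
from typing import Callable, Optional, Tuple


def train_until(
    run_epoch: Callable[[], float],
    max_epochs: Optional[int] = None,
    rmse_threshold: Optional[float] = None,
    hard_cap: int = 10_000,
) -> Tuple[int, float]:
    """Run epochs until a stopping rule fires.

    run_epoch performs one training epoch and returns the training RMSE.
    Stops after max_epochs epochs, or as soon as the training RMSE drops
    to rmse_threshold, whichever comes first. hard_cap is a safety limit
    for the threshold-only case. Returns (epochs_run, last_training_rmse).
    """
    if max_epochs is None and rmse_threshold is None:
        raise ValueError("set max_epochs, rmse_threshold, or both")
    limit = hard_cap if max_epochs is None else min(max_epochs, hard_cap)
    rmse = float("inf")
    epochs = 0
    while epochs < limit:
        rmse = run_epoch()
        epochs += 1
        if rmse_threshold is not None and rmse <= rmse_threshold:
            break
    return epochs, rmse


def make_toy_trainer(start_rmse: float = 1.0, decay: float = 0.5) -> Callable[[], float]:
    """Hypothetical stand-in for a real trainer: RMSE shrinks by `decay` per epoch."""
    state = {"rmse": start_rmse}

    def run_epoch() -> float:
        state["rmse"] *= decay
        return state["rmse"]

    return run_epoch


fixed_epochs_result = train_until(make_toy_trainer(), max_epochs=3)
threshold_result = train_until(make_toy_trainer(), rmse_threshold=0.1)
```

Neither rule needs a validation set, so no samples are randomly held out. The cost is that `max_epochs` or `rmse_threshold` has to be tuned by hand for each problem.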

So, we will stop training after a fixed number of epochs. That fixed number is 2 or 3; we will determine it later. Note also that the number of epochs necessary really depends on the problem at hand. In our current case, the day-of-the-week anomaly, we have only a 1-dimensional input (1 = Monday, … 5 = Friday). With this 1-dimensional input we have 2 neurons (plus the bias). So imagine that for the 2 neurons we have w_1, w_2 as the weights the learning algorithm wants to optimize. 2 or 3 epochs are really enough to surf this 2-dimensional space, especially if the function to approximate is smooth and has a clear global minimum or maximum.

However, imagine a case with 20 neurons. We cannot normally expect to find the minimum in 2 or 3 epochs (training sessions) when surfing a 20-dimensional space. I have seen in the literature that, in a 20-neuron case, people trained the network for 1500 epochs.

So, ultimately, I expect that instead of a fixed number of epochs I will set a fixed RMSE threshold. But for the next studies, I use the fixed number of epochs as a remedy for randomness.

We show two illustrations here.

The first image shows the 10 cases of the NN surface after 1 epoch of training. The first little stamp image shows the unprocessed output that you have already seen in the last post: the average next-day % gains for the days Mon…Fri.

In the second stamp image, the same can be seen, but the mean of the period is subtracted from the columns.
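The two stamp images correspond to a simple calculation, sketched here in Python. It assumes "the mean of the period" is the average gain over all samples in the window; the function names and the toy data are made up for illustration.

```python
from typing import Dict, List, Sequence, Tuple

Sample = Tuple[int, float]  # (weekday 1..5, next-day % gain)


def weekday_mean_gains(samples: Sequence[Sample]) -> Dict[int, float]:
    """Average next-day % gain per weekday (1 = Monday ... 5 = Friday)."""
    buckets: Dict[int, List[float]] = {}
    for weekday, gain in samples:
        buckets.setdefault(weekday, []).append(gain)
    return {day: sum(g) / len(g) for day, g in sorted(buckets.items())}


def demeaned_weekday_gains(samples: Sequence[Sample]) -> Dict[int, float]:
    """Same averages, with the mean gain of the whole period subtracted."""
    period_mean = sum(gain for _, gain in samples) / len(samples)
    return {
        day: avg - period_mean
        for day, avg in weekday_mean_gains(samples).items()
    }


# Made-up sample: 8 (weekday, gain) pairs.
toy = [(1, -1.0), (2, 0.5), (3, 1.0), (4, 0.5),
       (5, -1.0), (1, -1.0), (3, 1.0), (5, 2.0)]
raw = weekday_mean_gains(toy)
centered = demeaned_weekday_gains(toy)
```

Subtracting the period mean removes the overall drift of the window, so what remains is the weekday effect relative to an average day.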

We expect that the NN should learn an inverted bell shape. However, note that in the 1-epoch case only 4 out of the 10 NNs could learn the inverted bell shape. Training the NN for only 1 epoch is clearly not enough. Unfortunately, this happened many times in the former setup, when the validation set (20% of the samples) was used for termination (for example, when an outlier was put into the validation set, the validation RMSE increased even though the training set RMSE was improving).

The second image is the same, but after 2 epochs of training. Note that 9 out of 10 NNs could learn the inverted bell shape.

The important thing we achieved today is that we reduced the variability of the trained NN compared to our past experiments. See the previous post. Note that **we still have some randomness, because the NN weights are initialized randomly**. But **with randomness reduction**, our forecasts become more robust and our **backtests are more reliable**.
