### Combined effect of a homogeneous ensemble, SMA(30)/SMA(45) vs. SMA(20)/SMA(180) again

In this post, I introduce a new concept. In the current implementation, instead of relying on the decision of one NN, I run many NNs and base the forecast on the vote of the committee.

In theory, the aggregate decision of a committee can be derived in various ways. The simplest one is the **average vote of the members of the group**. However, more sophisticated selection is also possible, such as the median vote, or the vote of the member who was the best forecaster in the previous X weeks.
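As a minimal sketch of these three aggregation rules (in Python, which is not necessarily the language of the original implementation), assume each member outputs one numeric forecast and that we track each member's error over the previous X weeks. The function names, the sample votes and the error values are all made up for illustration.

```python
from statistics import mean, median

def aggregate_mean(votes):
    """Average vote of all committee members."""
    return mean(votes)

def aggregate_median(votes):
    """Median vote; less sensitive to one extreme member."""
    return median(votes)

def aggregate_best_recent(votes, recent_errors):
    """Vote of the member with the smallest error over the previous X weeks."""
    best = min(range(len(votes)), key=lambda i: recent_errors[i])
    return votes[best]

# Hypothetical forecasts (next-week % change) from a 5-member committee.
votes = [0.4, -0.1, 0.3, 0.2, 1.5]
# Hypothetical forecast errors of the same members over the previous X weeks.
recent_errors = [0.9, 0.5, 0.7, 1.1, 2.0]

print(aggregate_mean(votes))
print(aggregate_median(votes))
print(aggregate_best_recent(votes, recent_errors))
```

Note how the single outlier (1.5) pulls the average up, while the median ignores it.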

Note also that **in this implementation, the members of the group are the same NN**, with the same parameters. In theory (and in the future), I would have an ensemble of heterogeneous NNs. But for now, let’s stick to the simple case. However, if my group is homogeneous (every member is the same), what is the point of having a group?

Note that the **main disadvantage of NN** usage (as discussed in the literature) is that **it tends to get stuck in local minima**. Let’s pause here for a moment, because this is not as trivial as it seems at first sight.

It is not the function f(x) that the NN estimates that has local minima.

No.

It is perfectly possible that we have a linear monotone function f(x) = 5x + 2 to forecast, which has no local minima, and still the NN (by design) can have the problem that the learning algorithm gets stuck in a local minimum…

**In a local minimum of the weight space** of the NN.

Imagine the NN has many weights (w_1, … w_n). The main task of backpropagation (or any other NN learning algorithm based on gradient descent) is to find these weights. The problem is that this search of the weight space tends to stop at a set of weights (w_1, … w_n) that is sub-optimal.

If gradient descent crawls into a local minimum, it can never jump out of that hole, because there are no ‘random jumps’ in the domain of the weights. If such random jumps are required, a genetic algorithm (random mutation) or reinforcement learning (trial-and-error weight setting) can be suggested.
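To make the ‘hole’ concrete, here is a toy sketch in Python. It does not train a real NN: it assumes a single weight w and a made-up loss curve, loss(w) = w^4 - 3w^2 + w, which has a shallow local minimum near w = 1.13 and a deeper global minimum near w = -1.30. The learning rate and step count are arbitrary choices.

```python
def loss(w):
    # Made-up one-weight loss surface with two minima (illustration only).
    return w**4 - 3 * w**2 + w

def grad(w):
    # Derivative of the loss above.
    return 4 * w**3 - 6 * w + 1

def descend(w, lr=0.01, steps=2000):
    """Plain gradient descent: always step downhill, never jump."""
    for _ in range(steps):
        w -= lr * grad(w)
    return w

w_stuck = descend(2.0)    # starts on the right-hand slope
w_good = descend(-2.0)    # starts on the left-hand slope

print(f"start  2.0 -> w = {w_stuck:.2f}, loss = {loss(w_stuck):.2f}")
print(f"start -2.0 -> w = {w_good:.2f}, loss = {loss(w_good):.2f}")
```

The run that starts at 2.0 settles in the shallow minimum and stays there, even though a much lower loss exists on the other side of the hill.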

I realize I cannot be deeply proficient in every AI method (GA, RL, SVM, ARMA, etc.), so for now I am not searching for algorithms that can overcome this problem. I keep the NN as my favorite and try to get as much as I can out of the NN concept.

OK, **how can we mitigate the problem of getting stuck in local minima**?

Usually, we can start the search of the weight space from different random points and **let different NNs learn from different random initial conditions**. **Many of them get stuck in local minima; a few find the global minimum. Then aggregate their knowledge.**
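The random-restart idea can be sketched with a toy example in Python. The assumptions are loud ones: a single weight, a made-up loss loss(w) = w^4 - 3w^2 + w (local minimum near w = 1.13, global minimum near w = -1.30), 20 random starting points, and a fixed seed so the run is repeatable. None of this is the actual NN training code.

```python
import random

def loss(w):
    # Made-up one-weight loss surface with a local and a global minimum.
    return w**4 - 3 * w**2 + w

def grad(w):
    return 4 * w**3 - 6 * w + 1

def descend(w, lr=0.01, steps=2000):
    """Plain gradient descent from a given starting weight."""
    for _ in range(steps):
        w -= lr * grad(w)
    return w

rng = random.Random(0)                       # fixed seed, repeatable demo
starts = [rng.uniform(-2.0, 2.0) for _ in range(20)]
finals = [descend(w0) for w0 in starts]

# On this toy curve the global minimum is the one with negative w.
n_global = sum(1 for w in finals if w < 0)
best = min(finals, key=loss)

print(f"{n_global} of {len(finals)} runs reached the global minimum")
print(f"best weight found: {best:.2f}")
```

Here the aggregation is simply "keep the best run"; a committee instead keeps every run and combines their forecasts.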

This is exactly what we do now. Note that we randomly divide the teaching set into (train, validation) sets. We use 20% of the teaching set for validation. Because the division is random, if we train 20 NNs on the same inputs, we get 20 different NNs with 20 different sets of weights.

So, all members of our committee have the same structure. They differ only because the random selection of the validation samples drives them along different learning routes.
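A minimal sketch of how such random (train, validation) divisions could be generated, one per committee member. This is Python for illustration only; the helper name `split_teaching_set`, the teaching-set size of 100 and the use of the member number as the random seed are assumptions, not the original code.

```python
import random

def split_teaching_set(n_samples, validation_ratio=0.20, seed=None):
    """Randomly divide sample indices into (train, validation) index lists."""
    rng = random.Random(seed)
    indices = list(range(n_samples))
    rng.shuffle(indices)
    n_val = int(round(n_samples * validation_ratio))
    return sorted(indices[n_val:]), sorted(indices[:n_val])

n_members = 20   # size of the committee
n_samples = 100  # hypothetical size of the teaching set

# One independent random split per member.
splits = [split_teaching_set(n_samples, seed=member) for member in range(n_members)]

train_idx, val_idx = splits[0]
print(len(train_idx), len(val_idx))

# Count how many distinct validation sets the committee ended up with.
distinct = {tuple(val) for _, val in splits}
print(len(distinct))
```

Each member would then be trained on its own `train_idx` and stopped using its own `val_idx`, which is what sends identical networks along different learning routes.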

The aim of this voting is to make the forecasted value and the performance more stable.

Also note that we made another change to the code.

In previous implementations, we used all the samples (even samples from the future) for teaching. (I mentioned this.) Of course, the test sample was not in the teaching set, but using future samples biased our results to the upside. That was not correct. From now on, we use only samples from the past that were available at the time of the forecast. In the current test, we use all the previously available data. So, for example, for the last week, the 20 NNs are trained on 10 years of data.
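The "only samples from the past" rule amounts to an expanding-window (walk-forward) split. Here is a minimal Python sketch, assuming the samples are ordered in time and identified by their index; the helper name `walk_forward_splits` and the tiny sample count are hypothetical.

```python
def walk_forward_splits(n_samples, first_test_index, step=1):
    """Yield (train_indices, test_index) pairs that use only past samples.

    For each forecast at position t, the teaching set is every sample
    with an index strictly smaller than t.
    """
    for test_index in range(first_test_index, n_samples, step):
        yield list(range(test_index)), test_index

# Toy example: 6 samples in time order, first forecast made for sample 3.
for train_idx, test_idx in walk_forward_splits(n_samples=6, first_test_index=3):
    print(train_idx, "->", test_idx)
```

The teaching set grows with every forecast, so the last forecast is trained on the longest history and no future sample ever leaks in.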

I don’t show tests here demonstrating that the forecast of the ensemble is better than the forecast of a solitary NN. It was better in our tests: the winLose percentage is higher and less volatile.

However, I present a test that again compares the different SMA parameters.

With the previous framework (when the NN was taught on future samples as well), we compared the SMA(30)/SMA(45) with a longer-term SMA(20)/SMA(180). We concluded that the SMA(30)/SMA(45) was as good as the SMA(20)/SMA(180). That was a surprise then. In the new framework, we repeat the same test.

Note that **the committee has 20 members now.**

Parameters:

nNeurons = 5;

testSamplesRatio = 0.50;

nRebalanceDays = 5;

startDate = '2000-01-02';

nShortTermMA = 30;

nLongTermMA = 45;

nRepeatLearning = 20;
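The nShortTermMA and nLongTermMA parameters set the windows of the two simple moving averages. How the SMAs are fed into the NN is not shown in this post, so the Python sketch below only covers the SMA calculation itself. The `sma` helper, the made-up price series and the tiny windows (3 and 5 instead of 30 and 45) are assumptions for illustration.

```python
def sma(prices, window):
    """Simple moving average; None until `window` prices are available."""
    averages = []
    running_sum = 0.0
    for i, price in enumerate(prices):
        running_sum += price
        if i >= window:
            running_sum -= prices[i - window]  # drop the price that left the window
        averages.append(running_sum / window if i >= window - 1 else None)
    return averages

# Made-up daily closing prices, not real market data.
prices = [float(p) for p in range(1, 11)]

short_ma = sma(prices, 3)  # stands in for nShortTermMA
long_ma = sma(prices, 5)   # stands in for nLongTermMA

print(short_ma)
print(long_ma)
```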

**Test with 30/45 SMA**

Test 1: winLose: 53.77%, avgWeeklyGainP: 0.08%, projectedCAGR: 4.18%, periodGain: 8.04%

Test 2: winLose: 56.07%, avgWeeklyGainP: 0.46%, projectedCAGR: 26.63%, periodGain: 240.20%

Test 3: winLose: 49.34%, avgWeeklyGainP: 0.06%, projectedCAGR: 2.97%, periodGain: -4.97%

Test 4: winLose: 51.99%, avgWeeklyGainP: -0.11%, projectedCAGR: -5.49%, periodGain: -41.09%

Test 5: winLose: 48.68%, avgWeeklyGainP: -0.09%, projectedCAGR: -4.81%, periodGain: -38.34%

**Average: winLose: 51.97%; avgWeeklyGainP: 0.08%**

**Test with 20/180 SMA**

Test 1: winLose: 53.26%, avgWeeklyGainP: 0.18%, projectedCAGR: 9.88%, periodGain: 37.31%

Test 2: winLose: 55.36%, avgWeeklyGainP: 0.27%, projectedCAGR: 15.20%, periodGain: 82.77%

Test 3: winLose: 53.61%, avgWeeklyGainP: 0.16%, projectedCAGR: 8.93%, periodGain: 33.40%

Test 4: winLose: 56.01%, avgWeeklyGainP: 0.04%, projectedCAGR: 2.18%, periodGain: -3.95%

Test 5: winLose: 53.61%, avgWeeklyGainP: 0.13%, projectedCAGR: 7.02%, periodGain: 24.51%

**Average: winLose: 54.37%; avgWeeklyGainP: 0.16%**

Interestingly, in this context, the SMA(30)/SMA(45) is not as good as the SMA(20)/SMA(180): the average winLose is 51.97% vs. 54.37%, and the average weekly gain is 0.08% vs. 0.16%.

So, in future tests, I would again like to use a longer-term average, like 180 to 200 days.
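The two averages quoted in the results can be re-derived from the five runs of each setup. The short Python check below uses only the winLose and avgWeeklyGainP figures listed in this post; the variable and function names are made up.

```python
# (winLose %, avgWeeklyGainP %) for the five runs of each setup, copied from the results.
results_30_45 = [(53.77, 0.08), (56.07, 0.46), (49.34, 0.06), (51.99, -0.11), (48.68, -0.09)]
results_20_180 = [(53.26, 0.18), (55.36, 0.27), (53.61, 0.16), (56.01, 0.04), (53.61, 0.13)]

def average(results):
    """Arithmetic mean of both columns, rounded to two decimals."""
    n = len(results)
    win_lose = sum(r[0] for r in results) / n
    weekly_gain = sum(r[1] for r in results) / n
    return round(win_lose, 2), round(weekly_gain, 2)

print("SMA(30)/SMA(45): ", average(results_30_45))   # (51.97, 0.08)
print("SMA(20)/SMA(180):", average(results_20_180))  # (54.37, 0.16)
```

With only five runs per setup, the gap between the two averages is an indication rather than a statistically solid result.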
