Increasing the number of weekly samples 5 fold

17May10

The test performed in the last post had a start date of 1997-11-28 and an end date of 2010-03. That is about 12 years. Using every week as one sample resulted in a sample count of about 12 * 52 = 600. Basically, we used the Monday/next-Monday samples. The NN likes as many samples as possible, so the idea is that we can also use the Tuesday/next-Tuesday pairs, Wednesday/next-Wednesday pairs, etc.
We use all one-week pairs in our sampling data, which effectively multiplies the number of training samples by 5. So, instead of 600 samples (600 weeks), we now have 3000 samples.
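The overlapping-pair idea can be sketched as follows; this is an illustrative reconstruction, not the original sampling code, and the 5-trading-day offset and function name are assumptions:

```python
# Build overlapping one-week sample pairs from a daily close series.
# Using every weekday as a start point (not just Mondays) yields
# roughly 5x as many one-week pairs.
def weekly_pairs(prices):
    """Return (price[t], price[t+5]) for every trading day t,
    i.e. one pair per weekday, one trading week apart."""
    return [(prices[t], prices[t + 5]) for t in range(len(prices) - 5)]

daily = list(range(20))       # 4 weeks of dummy daily closes
pairs = weekly_pairs(daily)
print(len(pairs))             # 15 overlapping one-week pairs
print(pairs[0])               # (0, 5)
```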

In the previous post, I introduced two types of test.
In the first, 40% of the samples were held out for out-of-sample testing and for calculating the performance measures.
In the second, 5% of the samples were used for testing.
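The two holdout setups amount to a random split with a different test fraction; a minimal sketch, where the function name and fixed seed are assumptions for illustration:

```python
import random

# Randomly hold out a fraction of the samples for out-of-sample
# testing (40% in the first setup, 5% in the second).
def split_samples(samples, test_fraction, seed=0):
    rng = random.Random(seed)
    shuffled = samples[:]
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]   # (train, test)

train, test = split_samples(list(range(600)), 0.40)
print(len(train), len(test))   # 360 240
```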

Note that besides the winLoseRatio (which measures directional accuracy), we also introduced the avgWeeklyGainPercent measure.
From the avgWeeklyGainPercent, we calculate the projCAGR (projected Compound Annual Growth Rate) as the 52-week compounded gain: ((1+avgWeeklyGainPercent)^52-1)*100.
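The projCAGR formula above is simple compounding of the average weekly gain over 52 weeks; here is a minimal sketch, where the gain is passed as a fraction (0.0016 means 0.16%):

```python
# Compound the average weekly gain over 52 weeks, as percent.
def proj_cagr(avg_weekly_gain):
    return ((1.0 + avg_weekly_gain) ** 52 - 1.0) * 100.0

# A 0.16% average weekly gain compounds to roughly 8.7% annually,
# in line with the ~8.61% figure computed from the unrounded mean.
print(round(proj_cagr(0.0016), 2))
```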

A.
Having the old 600 samples, we had these results:

5% testSet case
Neurons: 2: winLoseRatios Arithmetic Mean: 52.88%, stdev: 9.24%, avgWeeklyGainPercent mean: 0.08%, stdev: 0.61%, projCAGR: 4.08%
Neurons: 3: winLoseRatios Arithmetic Mean: 53.39%, stdev: 9.05%, avgWeeklyGainPercent mean: 0.08%, stdev: 0.63%, projCAGR: 4.02%
Neurons: 4: winLoseRatios Arithmetic Mean: 52.01%, stdev: 9.11%, avgWeeklyGainPercent mean: -0.01%, stdev: 0.61%, projCAGR: -0.73%
Neurons: 5: winLoseRatios Arithmetic Mean: 51.94%, stdev: 9.35%, avgWeeklyGainPercent mean: -0.03%, stdev: 0.62%, projCAGR: -1.36%

40% testSet case
Neurons: 3: winLoseRatios Arithmetic Mean: 52.68%, stdev: 3.49%, avgWeeklyGainPercent mean: 0.05%, stdev: 0.20%, projCAGR: 2.70%
Neurons: 4: winLoseRatios Arithmetic Mean: 51.60%, stdev: 3.59%, avgWeeklyGainPercent mean: -0.01%, stdev: 0.20%, projCAGR: -0.59%
Neurons: 5: winLoseRatios Arithmetic Mean: 51.35%, stdev: 3.50%, avgWeeklyGainPercent mean: -0.03%, stdev: 0.20%, projCAGR: -1.37%

B.
Having the new 3000 samples (5x more, from 600 to 3000), we have these results:
5% testSet case
Neurons: 2: winLoseRatios Arithmetic Mean: 54.42%, stdev: 4.23%, avgWeeklyGainPercent mean: 0.15%, stdev: 0.29%, projCAGR: 8.20%
Neurons: 3: winLoseRatios Arithmetic Mean: 53.99%, stdev: 4.16%, avgWeeklyGainPercent mean: 0.13%, stdev: 0.28%, projCAGR: 6.81%
Neurons: 4: winLoseRatios Arithmetic Mean: 54.46%, stdev: 4.05%, avgWeeklyGainPercent mean: 0.16%, stdev: 0.28%, projCAGR: 8.50%
Neurons: 5: winLoseRatios Arithmetic Mean: 54.51%, stdev: 4.11%, avgWeeklyGainPercent mean: 0.16%, stdev: 0.28%, projCAGR: 8.61%
Neurons: 6: winLoseRatios Arithmetic Mean: 53.77%, stdev: 4.25%, avgWeeklyGainPercent mean: 0.12%, stdev: 0.28%, projCAGR: 6.33%

40% testSet case
Neurons: 3: winLoseRatios Arithmetic Mean: 53.68%, stdev: 2.30%, avgWeeklyGainPercent mean: 0.12%, stdev: 0.11%, projCAGR: 6.41%
Neurons: 4: winLoseRatios Arithmetic Mean: 54.06%, stdev: 1.81%, avgWeeklyGainPercent mean: 0.14%, stdev: 0.10%, projCAGR: 7.36%
Neurons: 5: winLoseRatios Arithmetic Mean: 53.92%, stdev: 1.99%, avgWeeklyGainPercent mean: 0.13%, stdev: 0.10%, projCAGR: 7.19%
Neurons: 6: winLoseRatios Arithmetic Mean: 53.24%, stdev: 2.17%, avgWeeklyGainPercent mean: 0.10%, stdev: 0.10%, projCAGR: 5.36%

Note that these seem to be much better results.
The best case uses only 5% of the samples for out-of-sample testing, with 5 neurons.
In that case, we have a 54.51% chance of forecasting the direction correctly, and the avgWeeklyGain is 0.16%, which is about 8.61% projected CAGR.

But I fear that because we include Monday/next-Monday, Tuesday/next-Tuesday, etc. samples, the precondition that the samples should be independent doesn't hold.
For example, if the learning set contains a Monday/next-Monday pair, and the overlapping Tuesday/next-Tuesday pair is in the test set, the latter is now easier to forecast.
However, note that the NN uses validation samples, so it stops learning before it would overtrain.
An obvious test that we should do is to backtest the strategy with windowing: when estimating the forecast for a given date, we shouldn't use any samples later than that date.
Currently, the train/validation/test sets are drawn randomly from all 3000 samples, so there is no notion of time in the current backtesting program.
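The windowed backtest described above could be skeletonized like this; the model here is a deliberately trivial stub, and all names are illustrative assumptions rather than the original program:

```python
# Walk-forward backtest skeleton: when forecasting sample t, train
# only on samples strictly before t, so no future data leaks in.
def walk_forward(samples, train_window, fit, predict):
    forecasts = []
    for t in range(train_window, len(samples)):
        model = fit(samples[t - train_window:t])   # past data only
        forecasts.append(predict(model, samples[t]))
    return forecasts

# Dummy usage: the "model" is just the mean of the training window.
fit = lambda window: sum(window) / len(window)
predict = lambda model, sample: model
out = walk_forward(list(range(10)), train_window=4, fit=fit, predict=predict)
print(len(out))   # 6 forecasts, one per sample after the first window
```

The key property is that `fit` never sees `samples[t]` or anything later, which is exactly what the random splitting over all 3000 samples fails to guarantee.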

Note that the projected CAGR we achieved is about 8% annually. A wishful-thinking approach would estimate that with double-leveraged ETFs we can achieve double that, a 16% annual gain. However, this is still too small, because transaction costs and slippage can eat more than this if we consider the potential 52 trades per year.
I would be happy with fewer than 20 trades per year.
We have to backtest it more precisely.
