Forecasting artificial random series

10Jun10

The subject is an oxymoron. If something is totally random, there is nothing we can do to forecast it. I have always been fascinated that researchers and academic paper authors usually don't try their learning algorithms on artificial random series in addition to real-world financial time series.
On the one hand, it is sensible not to use random inputs, because nobody expects any learning algorithm to ever forecast a random time series; on the other hand, since I want to understand every detail and behavior of my learning algorithm, I think it is a necessary test.
What if a system designed carefully by its author can predict a totally random time series?
I would say that means there is a bug in the system. So, trying our system on random data proves nothing valuable per se, but it can reveal potential mistakes and bugs in the implementation.
I use this parameter space for this test:
nNeurons = 5;
testSamplesRatio = 0.50;
nRebalanceDays = 5;
nShortTermMA = 30;
nLongTermMA = 45;
nRepeatLearning = 5;
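
For context, here is a minimal sketch of how these parameters might plug into the training/testing pipeline. This is only my reading: the original training and trading code is not shown in this post, and the feedforwardnet/train calls (Neural Network Toolbox) are my assumptions.

% Hypothetical skeleton using the parameters above (assumed, not the original code).
% X: NN inputs (features x samples), T: targets (e.g. next week's %gain), built elsewhere.
nTest = round(testSamplesRatio * size(X, 2));  % assumed: the last 50% of samples are out-of-sample
nTrain = size(X, 2) - nTest;
net = feedforwardnet(nNeurons);                % 5 hidden neurons (newff in older releases)
net = train(net, X(:, 1:nTrain), T(1:nTrain));
forecast = net(X(:, nTrain+1:end));            % forecasts for the out-of-sample period
% The position is re-evaluated every nRebalanceDays (= 5 trading days, i.e. weekly),
% and the whole learning is repeated nRepeatLearning times (Test 1..5 in the tables below).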

1.
Note that I usually use RUT (Russell 2000) index data from 1997 to 2010 as the input time series.

The test results are these:

15 runs:

****Test 1: winLose: 56.07%, avgWeeklyGainP: 0.18%, projectedCAGR: 9.69%, periodGain: 46.20%
****Test 2: winLose: 55.08%, avgWeeklyGainP: 0.28%, projectedCAGR: 15.87%, periodGain: 101.36%
****Test 3: winLose: 52.30%, avgWeeklyGainP: 0.28%, projectedCAGR: 7.54%, periodGain: 22.45%
****Test 4: winLose: 52.65%, avgWeeklyGainP: 0.28%, projectedCAGR: 4.71%, periodGain: 7.57%
****Test 5: winLose: 48.68%, avgWeeklyGainP: -0.07%, projectedCAGR: -3.45%, periodGain: -32.98%
****Test 1: winLose: 56.67%, avgWeeklyGainP: 0.27%, projectedCAGR: 14.82%, periodGain: 6.75%
****Test 2: winLose: 50.00%, avgWeeklyGainP: -0.16%, projectedCAGR: -8.23%, periodGain: -6.23%
****Test 3: winLose: 63.33%, avgWeeklyGainP: 0.37%, projectedCAGR: 21.27%, periodGain: 10.42%
****Test 4: winLose: 60.00%, avgWeeklyGainP: 0.56%, projectedCAGR: 33.46%, periodGain: 16.79%
****Test 5: winLose: 65.52%, avgWeeklyGainP: 0.36%, projectedCAGR: 20.31%, periodGain: 9.13%
****Test 1: winLose: 53.33%, avgWeeklyGainP: -0.41%, projectedCAGR: -19.44%, periodGain: -13.00%
****Test 2: winLose: 50.00%, avgWeeklyGainP: -0.29%, projectedCAGR: -13.82%, periodGain: -9.58%
****Test 3: winLose: 70.00%, avgWeeklyGainP: 0.95%, projectedCAGR: 63.53%, periodGain: 31.37%
****Test 4: winLose: 60.00%, avgWeeklyGainP: 0.49%, projectedCAGR: 28.68%, periodGain: 14.35%
****Test 5: winLose: 55.17%, avgWeeklyGainP: -0.12%, projectedCAGR: -5.96%, periodGain: -4.90%

Average of the runs:
winLose: 56.5%
avgWeeklyGainP: 0.198%

2.
Let's generate a random time series with the following code:

randPercChange = random('Normal', 1, 0.01, length(indexCloses), 1); % daily %changes: mean 1, 1% std. dev.
randSeries = ones(length(indexCloses), 1);
randSeries(1) = indexCloses(1);
for i = 2:length(indexCloses)
    randSeries(i) = randSeries(i-1) * randPercChange(i-1);
end

Note that randPercChange has a mean of 1 and a std. dev. of 0.01, i.e. it is drawn from N(1, 0.01).
That means that roughly 95% of the daily changes fall within 2 standard deviations of the mean, i.e. within 2 * 1% = 2%.
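
As a quick sanity check (not part of the original post), the 2-sigma claim can be verified empirically on the generated daily changes:

% Roughly 95% of draws from N(1, 0.01) lie within +/-2% of 1 (the 2-sigma band).
insideBand = mean(abs(randPercChange - 1) <= 0.02); % expected to be about 0.95
fprintf('Fraction of daily changes within +/-2%%: %.3f\n', insideBand);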

For example, the generated time series look like these:

The NN then learns this time series and tries to forecast it.
Result:
random('normal')

****Test 1: winLose: 54.10%, avgWeeklyGainP: 0.24%, projectedCAGR: 13.36%, periodGain: 94.91%
****Test 2: winLose: 55.41%, avgWeeklyGainP: 0.30%, projectedCAGR: 16.56%, periodGain: 130.00%
****Test 3: winLose: 51.64%, avgWeeklyGainP: 0.18%, projectedCAGR: 9.92%, periodGain: 60.85%
****Test 4: winLose: 49.67%, avgWeeklyGainP: 0.16%, projectedCAGR: 8.46%, periodGain: 49.54%
****Test 5: winLose: 50.99%, avgWeeklyGainP: 0.16%, projectedCAGR: 8.82%, periodGain: 52.86%
****Test 1: winLose: 56.67%, avgWeeklyGainP: 0.43%, projectedCAGR: 25.24%, periodGain: 13.04%
****Test 2: winLose: 56.67%, avgWeeklyGainP: 0.40%, projectedCAGR: 23.20%, periodGain: 12.04%
****Test 3: winLose: 56.67%, avgWeeklyGainP: 0.56%, projectedCAGR: 33.70%, periodGain: 17.34%
****Test 4: winLose: 60.00%, avgWeeklyGainP: 0.07%, projectedCAGR: 3.49%, periodGain: 1.20%
****Test 5: winLose: 65.52%, avgWeeklyGainP: 0.34%, projectedCAGR: 19.31%, periodGain: 9.50%
****Test 1: winLose: 73.33%, avgWeeklyGainP: 0.60%, projectedCAGR: 36.16%, periodGain: 18.62%
****Test 2: winLose: 43.33%, avgWeeklyGainP: -0.45%, projectedCAGR: -20.71%, periodGain: -13.18%
****Test 3: winLose: 53.33%, avgWeeklyGainP: 0.48%, projectedCAGR: 28.24%, periodGain: 14.72%
****Test 4: winLose: 63.33%, avgWeeklyGainP: 0.18%, projectedCAGR: 9.76%, periodGain: 4.92%
****Test 5: winLose: 58.62%, avgWeeklyGainP: 0.34%, projectedCAGR: 19.14%, periodGain: 9.64%

Average of the runs:
winLose: 56.62%
avgWeeklyGainP: 0.27%

Wow! Our NN could forecast the random time series! 🙂 That is amazing.
I know that the NN is a universal approximator, but forecasting pure randomness is far beyond its capability.
So, there should be a bug somewhere in the implementation… shouldn't there?
Not so fast…
I reckon there is no bug. I will explain later.
But first, let’s see another Test.

3.
Instead of using the normal distribution to generate the random daily %changes, let's use the uniform distribution with this code:
randPercChange = random('unif', 1-0.03, 1+0.03, length(indexCloses), 1);
The min is 0.97, the max is 1.03, so this generates %changes in the range of -3%..+3% with a uniform distribution.

For example, the generated time series look like these:

The NN then learns this time series and tries to forecast it.
Result:
random('unif')

****Test 1: winLose: 49.18%, avgWeeklyGainP: 0.05%, projectedCAGR: 2.78%, periodGain: -5.91%
****Test 2: winLose: 51.80%, avgWeeklyGainP: 0.12%, projectedCAGR: 6.63%, periodGain: 13.99%
****Test 3: winLose: 50.66%, avgWeeklyGainP: 0.02%, projectedCAGR: 1.19%, periodGain: -16.55%
****Test 4: winLose: 49.01%, avgWeeklyGainP: -0.09%, projectedCAGR: -4.68%, periodGain: -40.70%
****Test 5: winLose: 50.00%, avgWeeklyGainP: 0.24%, projectedCAGR: 13.47%, periodGain: 67.77%

****Test 1: winLose: 56.67%, avgWeeklyGainP: 0.71%, projectedCAGR: 44.72%, periodGain: 21.03%
****Test 2: winLose: 43.33%, avgWeeklyGainP: 0.48%, projectedCAGR: 28.35%, periodGain: 13.17%
****Test 3: winLose: 63.33%, avgWeeklyGainP: 1.18%, projectedCAGR: 84.07%, periodGain: 39.38%
****Test 4: winLose: 50.00%, avgWeeklyGainP: 0.76%, projectedCAGR: 48.45%, periodGain: 22.84%
****Test 5: winLose: 55.17%, avgWeeklyGainP: -0.24%, projectedCAGR: -11.74%, periodGain: -8.81%

****Test 1: winLose: 46.67%, avgWeeklyGainP: -0.28%, projectedCAGR: -13.40%, periodGain: -9.15%
****Test 2: winLose: 53.33%, avgWeeklyGainP: -0.19%, projectedCAGR: -9.32%, periodGain: -7.26%
****Test 3: winLose: 56.67%, avgWeeklyGainP: -0.28%, projectedCAGR: -13.75%, periodGain: -10.19%
****Test 4: winLose: 50.00%, avgWeeklyGainP: -0.44%, projectedCAGR: -20.54%, periodGain: -14.36%
****Test 5: winLose: 48.28%, avgWeeklyGainP: -0.21%, projectedCAGR: -10.51%, periodGain: -8.11%

Average of the runs:
winLose: 50.77%
avgWeeklyGainP: 0.05%

Great, just as we expected: there is no edge, no predictive power on the uniformly random %change generated time series.
But then the question arises: why does there seem to be predictive power on the normally distributed random %change generated time series?

Our theory is this.
1. The good news is that my implementation is not buggy, because the uniform random series cannot be predicted.
2. The normal distribution series can be predicted.
This is a consequence of my input design.
Note that my 2 inputs are the %difference of the time series from its SMA(30) (or SMA(45)); see the code sketch after point 5.
So, my inputs are a preprocessed version of the raw time series data.
For this kind of input, it is sensible to expect the following behavior:
Large %changes occur rarely (normal distribution), but when one occurs, the %difference of the time series from its SMA(30) jumps to a very high value.
What happens next: the following %changes tend to be small, so the SMA(30) ultimately catches up with the time series.
So, the normal distribution makes the system behave like a spring or a rubber band: when it is stretched far from the equilibrium point (the SMA(30)), it tends to snap back. This behavior is predictable, and it is present even in the random time series.
3.
The uniform distribution doesn't work like a spring. After a large %change, another large %change is just as likely as a small one.
Therefore the uniform series cannot be predicted.
4.
Stock market %changes behave roughly like a normal distribution (in reality it is closer to a power-law distribution, but for the purposes of this article accept that it is more like a normal distribution than a uniform one). Therefore, since our system is built around these special inputs (%difference from the SMA), we shouldn't be surprised that the prediction accuracy on a random normal series is similar to the prediction accuracy on the real RUT series.
5.
The similarity is no surprise, but what strikes me is that the RUT prediction accuracy is not much better than the random normal (Gaussian) series prediction.
(The directional accuracy is about the same: 56.5% on RUT vs. 56.6% on the Gaussian series, and the weekly gain percent is also similar: 0.198% vs. 0.27%; the random Gaussian is even slightly better.)
That is the slightly sad news of this experiment. It means that our NN could learn and exploit the normal-distribution feature of the RUT series, but it couldn't extract any further useful information from the data.
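
To make the input design of point 2 concrete, here is a minimal sketch of computing the %difference-from-SMA inputs. The exact preprocessing of the original system is not published in this post, so the smoothing and alignment details below are my assumptions.

% Sketch of the 2 NN inputs: %difference of the price series from its SMA(30) and SMA(45).
% (Assumed reconstruction; the original feature code is not shown in the post.)
nShortTermMA = 30;
nLongTermMA = 45;
smaShort = filter(ones(nShortTermMA, 1) / nShortTermMA, 1, indexCloses); % simple moving average
smaLong = filter(ones(nLongTermMA, 1) / nLongTermMA, 1, indexCloses);
pctDiffShort = indexCloses ./ smaShort - 1; % %difference from SMA(30)
pctDiffLong = indexCloses ./ smaLong - 1;   % %difference from SMA(45)
% The first nLongTermMA-1 values come from incomplete windows and would be discarded.
X = [pctDiffShort(nLongTermMA:end), pctDiffLong(nLongTermMA:end)]'; % features x samples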

One reason it is not better is that in real life the daily %change distribution is not normal, but more like a power-law distribution.
After a big down move, another big down move easily occurs (fat-tailed distribution). Under an ideal Gaussian distribution this is very unlikely, but in real life it happens with higher probability than the ideal Gaussian suggests.
Since the NN is a smooth function approximator, it is better suited to an ideal Gaussian distribution, and the wild crashes of real life make its job more difficult.
Compared to the Gaussian, real life is a little more like the uniform distribution: after a big down move, another big down move is about equally likely.
And as the NN proved bad at predicting the uniform distribution, it is also worse at predicting real life than the ideal Gaussian.
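
This fat-tail argument could be tested the same way as the Gaussian and uniform cases. A possible extension (not in the original experiments) is a heavy-tailed generator, for example a Student-t with few degrees of freedom, scaled to roughly a 1% daily std. dev.:

% Heavy-tailed alternative to the Gaussian generator (hypothetical extra test, not from the post).
nu = 3;                              % low degrees of freedom => fat tails
tScale = 0.01 / sqrt(nu / (nu - 2)); % scale the t draws so their std. dev. is ~1%
randPercChange = 1 + tScale * trnd(nu, length(indexCloses), 1);
randSeries = ones(length(indexCloses), 1);
randSeries(1) = indexCloses(1);
for i = 2:length(indexCloses)
    randSeries(i) = randSeries(i-1) * randPercChange(i-1);
end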

Another reason may be that I picked too large a std. dev. for the normal distribution. I picked a std. dev. of 1%, so with 5% probability the daily change is greater than 2% (which happens roughly once a month).
If that is an overstatement compared to real life, then this could be a reason.
Let's suppose we overstated the std. dev. to be 5% daily.
If we have an edge in direction prediction (say 60% correct direction prediction), then this inflated std. dev. can be the reason the overall weekly %gain is better than in real life.
However, I reckon a std. dev. of 1% is quite realistic and not an exaggeration.
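
This assumption is easy to check against the actual data; a quick sanity check (not in the original post) on the real index closes:

% Empirical daily %change std. dev. of the real index series, to compare with the assumed 1%.
dailyPctChange = indexCloses(2:end) ./ indexCloses(1:end-1) - 1;
fprintf('Daily %%change std. dev.: %.2f%%\n', 100 * std(dailyPctChange));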

We shouldn't be too sad about this relative underperformance compared to the ideal case, because the system still forecasts with about 56% directional accuracy and about 10% CAGR (= 0.2% weekly), which may be playable in the future. At least, it is a good base to start with.
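
For reference on how the ~0.2% weekly figure relates to the ~10% CAGR: if projectedCAGR is simply the average weekly gain compounded over 52 weeks (my assumption; the post does not define the metric), the two numbers are consistent:

% 0.2% average weekly gain compounded over 52 weeks (assumed definition of projectedCAGR).
avgWeeklyGainP = 0.002;
projectedCAGR = (1 + avgWeeklyGainP)^52 - 1; % about 0.11, i.e. roughly 10-11% per year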
