Artificial patterns to learn

02May11

Can the Encog NN learn deterministic, or non-deterministic but predictable, patterns? It should, but to be sure, we investigate this question in this post.
When we test these artificially created patterns, there is no point in using the gCAGR as a performance measure (it can go to infinity in the good case), so we use the D_stat, the directional accuracy, to check whether the ANN learns the pattern or not. (We could use the RMS error too, but let’s stick to the dStat in this post. We despise the RMS anyway.)
Just for comparison, predicting the RUT (Russell 2000) index, we achieved 56% directional accuracy.
We usually contend that an ANN training is successful if the D_stat significantly differs from 50%.
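
As a reference, here is a minimal sketch of how the dStat can be computed (the class and method names are ours, not from any library):

public final class Stats {
    // D_stat: the percentage of periods where the sign of the prediction
    // matches the sign of the realized change; 50% corresponds to a coin flip.
    // (Convention in this sketch: a 0% change matches only an exact 0% prediction.)
    public static double dStat(double[] predicted, double[] actual) {
        int hits = 0;
        for (int i = 0; i < predicted.length; i++) {
            if (Math.signum(predicted[i]) == Math.signum(actual[i])) {
                hits++;
            }
        }
        return 100.0 * hits / predicted.length;
    }
}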

We know that a completely random time series, by its nature, cannot be predicted.
From logic theory, we use ‘Modus tollens’ to express why we do this experiment. (We have a falsifiable case; let’s try to prove it wrong.)
If “A -> B” is a true rule, then its contrapositive “^B -> ^A” is also a true rule.
Let’s fill A and B with concrete statements:
A -> B: “if (the Encog NN works properly) -> (it cannot predict the random time series)”.
This induces the contrapositive (“^B -> ^A”):
“if not(it cannot predict the random time series) -> not(the Encog NN works properly)”.
Since not(not(f)) equals f, this simplifies to:
“if (it can predict the random time series) -> (the Encog NN doesn’t work properly).”
That is why it is so important to check this: whether it can predict the random time series or not.
If it can, that would be evidence that there is a bug somewhere in our implementation or in the Encog framework.
Note also that even if we find that it cannot predict the random time series, that is not a proof that it works properly.
Proving that it works properly is impossible. Only proving that it doesn’t work properly is possible.

And we will try to prove it wrong in this experiment. (Look up ‘falsification’ on Wikipedia if you wish.)

This chart shows all our measured directional accuracies (dStat). Each cell is the average of 5 tests.

On the vertical axis, we increased the number of NN neurons from 1 to 5.
As we expected, we see that, in general, increasing the number of neurons increases the prediction accuracy.

0. RUT prediction, nNeurons dependence.
Apart from the dStat, we measured the Portfolio Value (PV) too. The PV conveys no meaning for the artificial patterns, but it is worth looking at in the RUT index prediction case.

At first sight, it seems that using only 1 neuron is better for the PV; however, note that the STDev for the 1-neuron case is the highest of all.
So this PV = 586 (the average of 5 experiments) contains an outlier, and in practice the situation is not as rosy.
The same can be seen on the RUT dStat SD chart (not shown here). Therefore, on second look, we doubt that the nNeurons=1 case would be the one we would use.
Let’s say for now that this war is not yet decided.
We will repeat this trial-and-error experiment later, but for the time being let’s suppose we use the nNeurons=2 case in the future for RUT prediction,
even though nNeurons=1 seems to be better (we reckon that this is again just a trick of randomness: the high randomness shows its footprint in the high SD).

1. Artificial patterns: random

Let’s try random artificial input patterns with two distributions:
– uniform(-1%..+1%)
– Gaussian(0% mean, 1% STD)
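
A minimal sketch of how these two kinds of series can be generated in Java (our illustration, not the exact code used in the tests):

import java.util.Random;

public final class RandomSeries {
    private static final Random RNG = new Random();

    // Uniform daily %-changes in (-1%, +1%)
    public static double[] uniformSeries(int n) {
        double[] s = new double[n];
        for (int i = 0; i < n; i++) {
            s[i] = 2.0 * RNG.nextDouble() - 1.0;
        }
        return s;
    }

    // Gaussian daily %-changes with 0% mean and 1% standard deviation
    public static double[] gaussianSeries(int n) {
        double[] s = new double[n];
        for (int i = 0; i < n; i++) {
            s[i] = RNG.nextGaussian();  // nextGaussian(): mean 0, std 1
        }
        return s;
    }
}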

We are happy to see that the ANN couldn’t predict these random patterns. So, we cannot prove that the ANN doesn’t work properly.
(However, we didn’t prove that the ANN works properly either.)

2. Artificial patterns: deterministic
We need some patterns that are deterministic.
We picked 3 different versions:
– 2 period: 1%, 2%, -1%, -2% (repeat this pattern)
(The ‘2 period’ here means that, in theory, an ANN that can look back to the last 1 period could synthesize the rules: F(t-1) always determines F(t). So these are 2-period patterns.)
The pattern the ANN has to learn is this. The ideal function is the green one;
however, it is evident that with 1 neuron it is impossible to learn the ideal function.
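
To make the setup concrete, here is a minimal sketch of the training pairs for this pattern (the class and method names are ours): the previous day’s change is the only input, and the next day’s change is the target.

public final class TwoPeriodPattern {
    // The repeating deterministic pattern, in %. Every value uniquely
    // determines its successor, so a lookback of 1 period is enough.
    static final double[] PATTERN = { 1.0, 2.0, -1.0, -2.0 };

    // Builds n (input, target) pairs: pairs[i][0] = F(t-1), pairs[i][1] = F(t).
    public static double[][] trainingPairs(int n) {
        double[][] pairs = new double[n][2];
        for (int i = 0; i < n; i++) {
            pairs[i][0] = PATTERN[i % PATTERN.length];
            pairs[i][1] = PATTERN[(i + 1) % PATTERN.length];
        }
        return pairs;
    }
}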

The question is what kind of ANN surface we get after training on this pattern.
Here we show one example each for the nNeurons = 1, 2, 3, 4, 5 cases.

We observe that with 1 neuron the surface is quite linear. This also suggests that, even though the RUT prediction case gave a better result with 1 neuron than with more,
we shouldn’t use the 1-neuron ANN in real life. It is simply too simple to have much predictive power.

Note also that as nNeurons is increased, the ANN can predict this function almost perfectly (the nNeurons=5 case reached 93% accuracy, but we could still increase nNeurons, increase the number of epochs, or increase the ensemble members from 1 to 21 to improve on this 93%). So this artificial pattern is predictable, and the ANN does a good job.
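
For reference, this is roughly how such a 1-input, nNeurons-hidden-neuron, 1-output network can be built and trained with Encog 3’s Java API. It is only a sketch: the activation function, training method and epoch budget are our assumptions, not necessarily the exact configuration used in these tests.

import org.encog.engine.network.activation.ActivationTANH;
import org.encog.ml.data.MLDataSet;
import org.encog.ml.data.basic.BasicMLDataSet;
import org.encog.neural.networks.BasicNetwork;
import org.encog.neural.networks.layers.BasicLayer;
import org.encog.neural.networks.training.propagation.resilient.ResilientPropagation;

public final class TrainSketch {
    public static BasicNetwork train(double[][] input, double[][] ideal, int nNeurons) {
        // 1 input (previous %-change), nNeurons hidden neurons, 1 output (next %-change)
        BasicNetwork network = new BasicNetwork();
        network.addLayer(new BasicLayer(null, true, 1));
        network.addLayer(new BasicLayer(new ActivationTANH(), true, nNeurons));
        network.addLayer(new BasicLayer(new ActivationTANH(), false, 1));
        network.getStructure().finalizeStructure();
        network.reset();  // random initial weights: every training run differs

        MLDataSet trainingSet = new BasicMLDataSet(input, ideal);
        ResilientPropagation train = new ResilientPropagation(network, trainingSet);
        for (int epoch = 0; epoch < 1000; epoch++) {  // fixed epoch budget for the sketch
            train.iteration();
        }
        train.finishTraining();
        return network;
    }
}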

– 3 period (with 2 confusions): 1%, 0.5%, 1%, -1%, -0.5%, -1% (repeat this pattern) // 2 confusions in it: at -1% and +1%

Note that this is an impossible task for this kind of ANN. We have a 6-day pattern that repeats, and we feed the ANN only the previous day’s change%.
However, without the confusions we placed into it, the ANN would be able to learn it completely, with 100% accuracy.
But even in this impossible-to-solve case, the ANN fares reasonably well.
This is similar to how humans operate in the world. We make predictions based on only part of the information that is necessary to make a correct forecast.
We are human; we cannot have all the information that affects a dynamic system.

– 3 period (with 3 confusions): 1%, 0%, 1%, -1%, 0%, -1% (repeat this pattern) // 3 confusions in it: at -1%, +1% and 0%

It is the same as the previous one, but we complicated it even more with an extra confusion.
Interestingly, the ANN could learn this one better than the 2-confusions case.
That is understandable if you compare the green lines, the ideal functions: in this case, the green line is smoother and simpler than in the 2-confusions case.

So we learnt that the ANN is more successful if the function to estimate is simpler.
Unfortunately, in real life this is rarely the case. Financial markets have complex relationships and complex dynamics, very far from simple.
Note that it is impossible to achieve 100% accuracy for this pattern: for example, for the 0% input the output can be +1%, but it can also be -1%.
If the ANN trains a surface that is positive at 0, it will predict +1%, and it will fail when the true output is -1%.
(The same problem occurs if the ANN function is negative at zero.)

– The D_stat difference between the 2-confusions and 3-confusions cases is exactly 1 out of the 6 different values (75 - 58.3 ≈ 16.7, which is 1/6 of 100). That is understandable: one confusion more or less changes the outcome on one value out of the six.
It is also interesting to see that 75% = 4.5 × (100/6), so the 3-confusions case misses 1.5 out of 6 steps.

3. Artificial patterns: semi-random
Random, but predictable.
A sketch of the generator code follows the description below.

This pattern is 10 periods long, but it is built from 2-period relationships: every value in the cycle is distinct, so F(t-1) determines F(t).
It repeats the following pattern: 1%, 2%, 3%, 4%, 5%, -1%, -2%, -3%, -4%, -5% (repeat).
At each step, a uniformly generated random number is added to this deterministic pattern.
We generate the randomness with various amplitudes: 0%, 0.5%, 1%, 2%, 4%.
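
A minimal sketch of the generator as described above (the class name and the randomAmplitude parameter are ours, not necessarily the original code):

import java.util.Random;

public final class SemiRandomSeries {
    // The deterministic 10-period base pattern, in %
    static final double[] BASE = { 1, 2, 3, 4, 5, -1, -2, -3, -4, -5 };
    private static final Random RNG = new Random();

    // randomAmplitude is 0, 0.5, 1, 2 or 4 (in %): uniform noise in
    // [-randomAmplitude, +randomAmplitude] is added to the base pattern.
    public static double[] generate(int n, double randomAmplitude) {
        double[] s = new double[n];
        for (int i = 0; i < n; i++) {
            double noise = randomAmplitude * (2.0 * RNG.nextDouble() - 1.0);
            s[i] = BASE[i % BASE.length] + noise;
        }
        return s;
    }
}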

Note in the dStat table that when we increase the randomness to 4%, the series becomes hardly predictable. That is expected too.

Overall, the dStat table shows encouraging results: as we increase the randomness, the prediction accuracy diminishes. Well done!

Discussion:
– Overall, we like the outcome of the experiment: the ANN couldn’t predict the completely random time series, but
it could predict the deterministic and semi-deterministic patterns.
– As we increased the number of neurons, the ANN could predict the deterministic patterns better (see the SemiRan_r0 column), but
– as we increased the number of neurons, it predicted the randomized versions worse (the SemiRan_r4 and RUT columns).
Therefore, in real life, when the time series are quite random, we should prefer to use as few neurons as possible.
– Even if it is a deterministic pattern that we want to train our system on, one for which humans in theory can find a deterministic rule, a 100% sure and accurate algorithm (D_stat=100%), the NN approach is still better and advised, because it is like human thinking: stochastic and non-deterministic, as life is.
When you are an amateur chess player or an amateur trader, you like rules; you like a 100% deterministic algorithm. When you are a grandmaster chess player or a guru trader, you cannot make rules about your decisions. (We call it intuition, insight, creative thought, etc. That is a nice word for the pattern matching that the human mind does.) You just look at the chessboard, do the pattern matching (based on past experience) and instantly make a decision. You feel that you have a more than 50% chance to win if you make this move. That is the epiphany, the revelation. Only after you get that insight do you try to consciously reason out your previously unconscious decision. The ANN works the same way. It learns from all past experiences (preferring the recent ones). From them, the ANN makes a decision (sometimes a non-deterministic one, as humans do). It doesn’t care about reasoning out the decision.

The randomness that we experience with the ANN is also present in the human mind. Sometimes we miss an important piece (a random piece) on the chessboard when we match the input pattern. If the grandmaster chess player runs another experiment with the same input, he may decide differently. Maybe he (by pure chance) remembers something about a previous event in his memory.

The human chess player runs different random experiments concurrently in his mind, in the background. One of the processors (threads) will win and gain the focus of the conscious mind. That solution will be selected by the grandmaster, but when there are many parallel PUs (Processing Units) running in the background, it is quite random which one will be selected. (This is the ensembling mechanism of the ANN.)

Most of the time the global minimum (best solution) is not found, but the solution that is found is good enough that the grandmaster wins in the long term. Human experts work exactly like parallel ANN processors with an aggregation (ensembling) mechanism that picks one solution from the candidate function approximators.

So we shouldn’t worry about the fact that the ANN is non-deterministic and random, or that the aggregation of the ensembles poses a problem. The human mind does the same. In the long term, if enough experience is learned, all candidate PUs will cast reasonably good estimates.
