Normalization revisited.

07Feb10

I haven’t posted anything in the last week, but I have tried many things in the code.

1. A backtesting framework.

One of the important things is that I developed a little framework that can run backtests indefinitely and log the winLoseRatios and avgDaily gains. Therefore I don’t have to run every backtest manually: I can set the framework to run the whole night, and in the morning I have the results.

Note that if I test over the last 260 days, one backtest can take from 10 minutes to 2 hours, depending on the number of neurons used (10 neurons or 200 neurons). So this framework is a valuable asset.

Each backtest is different. Why?

Because the neural network in Matlab uses a validation mechanism that randomly divides my training samples into 3 groups: the training, validation, and test sets.

So every backtest I run produces a different, differently trained NN, and therefore different results.

In spite of this randomness, over the long term, running many backtests, the average and stdDev of the %gain should tell us whether we have found a good NN.

iTest = 0;
while (true)
    iTest = iTest + 1;
    % <inputs> stands for the remaining NN parameters; logFile is assumed to be an fopen() handle
    [winLoseRatio, avgDailyGainPercent] = NNTestSubrutine(iTest, nNeurons, nDaysInTest, <inputs>, logFile);
    fprintf(logFile, 'Test %d: winLoseRatio: %.2f%%, avgDailyGain: %.4f%%\n', iTest, winLoseRatio, avgDailyGainPercent);
end
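Where does the randomness come from? Here is a minimal sketch of the relevant toolbox settings, assuming the 2010-era newff/train API (the ratios and variable names are illustrative, not copied from my actual code):

% 'dividerand' splits the samples randomly into training/validation/test sets,
% so every train() call sees a different split (on top of different initial weights).
net = newff(inputs, targets, nNeurons);   % feed-forward net, one hidden layer
net.divideFcn = 'dividerand';             % random division of the samples
net.divideParam.trainRatio = 0.70;
net.divideParam.valRatio   = 0.15;
net.divideParam.testRatio  = 0.15;
net = train(net, inputs, targets);        % a differently trained NN on every run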

2. Normalization revisited.

Another development is that I have rethought the normalization.

How naive I was before to think that the NN could produce anything useful without proper normalization.

My previous normalization was this simple:

normalizedClosePrices = (closes - minCloses) ./ (maxCloses - minCloses);

However, as I noted at the time, this doesn’t de-trend the data. And if we don’t de-trend the data, the samples are not independent.

For example, in 2001 an SPX close of 800 can appear in the input as the previous day’s value; in 2008, a close of 1500 can appear.

The poor NN has no chance to discover this connection from the input alone: the trend drove the SPX from 800 to 1500, but the NN looks at only the last 5 days of input.

What we really need in the input is the percentage change compared to the previous day. At least this measurement doesn’t contain that awful trend.

So the current input of the NN is filled not with the absolute value of the SPX, but with this:

% day-over-day change as a fraction: closePricesChange(i) = closePrices(i+1)/closePrices(i) - 1
closePricesChange = closePrices(2:nDaysInSpData) ./ closePrices(1:nDaysInSpData-1) - 1;
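A tiny worked example of the de-trending effect (the numbers are made up for illustration):

closes = [800 808 1500 1515];          % made-up closes from two different eras
changes = closes(2:end) ./ closes(1:end-1) - 1
% changes = 0.0100  0.8564  0.0100
% A +1% day produces the same input value at the 800 level as at the 1500 level,
% while min-max normalization would have mapped 808 and 1515 to very different values.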

I tested various versions:

I changed the default validation error limit from 6 to 12 (see the sketch below): the performance was worse (almost the same).

I changed the input so that it contains not 5 days of data, but only 2 or 1: it didn’t help.
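My assumption is that this “validation error limit” is the toolbox’s max_fail parameter, the number of consecutive epochs the validation error may rise before early stopping; its default is 6, which matches the number above:

% Assumption: the "validation error from 6 to 12" refers to net.trainParam.max_fail,
% the count of consecutive validation failures allowed before early stopping (default: 6).
net.trainParam.max_fail = 12;   % more patience before stopping the training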

The typical results I got, after 29 tests:

****Test: 29.
winLoseRatios:
WinLoseRatios Arithmetic Mean: 48.36%, stdev: 7.24%
CAGR (annual %gain):
CAGR Arithmetic Mean: 3.68%, stdev: 24.46%

So, the case for the NN having predictive power is not yet proven.

A side note: I realized why other researchers don’t like to use the SPX data.

If you go to Yahoo Finance:

http://finance.yahoo.com/q/hp?s=^GSPC&a=00&b=3&c=2000&d=00&e=7&f=2000&g=d

you can see that the Jan-4 open price equals the Jan-3 close price, and it even equals the Jan-4 high price.

Of course, this is faulty data. But this erroneous behavior is present in the SPX index reported by Yahoo Finance until about 2004.
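A quick way to check how widespread the error is; a sketch, assuming opens and closes are column vectors already loaded from the downloaded data (exact equality works here because the faulty opens are literal copies of the previous closes):

% Count the days where the open is just a copy of the previous day's close
faultyDays = find(opens(2:end) == closes(1:end-1)) + 1;
fprintf('%d of %d days have open == previous close\n', numel(faultyDays), numel(opens) - 1);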

Therefore, we either have to find a better source for our SPX data, or, as others often do, use the SPY price data instead, which doesn’t contain this error. (Warning! Use the adjusted price.)

Until then, there is no point in using the open/high/low price data.
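For when we switch, here is a sketch of pulling SPY’s adjusted closes from the same Yahoo Finance CSV interface the link above uses (the URL parameters and column order are my assumption based on that era’s CSV format):

% Download SPY daily data as CSV (same parameter scheme as the ^GSPC link above:
% a = start month - 1, b = start day, c = start year, d/e/f = end date, g=d daily).
url = 'http://ichart.finance.yahoo.com/table.csv?s=SPY&a=00&b=1&c=2000&d=01&e=7&f=2010&g=d';
csvText = urlread(url);
% Assumed columns: Date,Open,High,Low,Close,Volume,Adj Close
data = textscan(csvText, '%s %f %f %f %f %f %f', 'Delimiter', ',', 'HeaderLines', 1);
adjClose = flipud(data{7});   % Yahoo lists newest first; flip to chronological order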
