Target normalization: ‘RUT next day %gain’ forecast

22Oct10

In this post, we try different scaling methods for mapping the ANN target. In contrast to the previous post, we try to solve the function approximation task, not the classification task: we predict not only the direction of the next day's move, but its magnitude as well.
In half of the experiments, we used only a simple scaling of the target:

nnTarget = 'nextDay%gain' * Multiplier;

with various Multipliers.
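
As a minimal sketch of this first family (the loop is my own illustration; the Multiplier values 1, 25 and 100 are the ones used in the experiments, and nnTargetWithoutOutlier is the outlier-clipped target from the Matlab snippet further below):

for Multiplier = [1 25 100]
    % Simple scaling: multiply the next-day %gain by a fixed constant.
    % With the 4% outlier threshold, Multiplier = 25 maps the clipped target
    % roughly into -1..+1; Multiplier = 100 stretches it to about -4..+4.
    nnTargetNormalized = nnTargetWithoutOutlier * Multiplier;
    % ... train and evaluate the ANN on nnTargetNormalized ...
end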
In the other half of the experiments, we used different scaling mechanisms that we call squashing methods.
Our theory is that it is best to squash the target range into -1..+1, but there are different techniques to achieve this.
For example, we can center it around zero (zero is mapped to zero), or we can center it around the average (average is mapped to zero).
Or we can choose not to center at all when squashing to the -1..+1 range.
Note the code from the Matlab file that defines the squashing functions -1 to -7 (selected by p_targetMultiplier):

targetMin = min(nnTargetWithoutOutlier);
targetMax = max(nnTargetWithoutOutlier);
targetMaxAbs = max(abs(targetMin), abs(targetMax));
targetMean = mean(nnTargetWithoutOutlier);
targetMeanMaxAbs = max(abs(min(nnTargetWithoutOutlier-targetMean)), abs(max(nnTargetWithoutOutlier-targetMean)));

if (p_targetMultiplier == -1) % min-max scaling, not centered around zero or the mean
    nnTargetNormalized = ((nnTargetWithoutOutlier - targetMin)/(targetMax - targetMin) - 0.5) * 2 * 0.9; % to -0.9..+0.9
elseif (p_targetMultiplier == -2) % min-max scaling, then tansig
    nnTargetNormalized = ((nnTargetWithoutOutlier - targetMin)/(targetMax - targetMin) - 0.5) * 2; % to -1..+1
    nnTargetNormalized = tansig(nnTargetNormalized*4); % squashes the bottom 20% (those under -0.4) into the bottom 5% (those under -0.9)
elseif (p_targetMultiplier == -3) % centering around zero, but not around the mean
    nnTargetNormalized = (nnTargetWithoutOutlier + targetMaxAbs)/targetMaxAbs - 1; % to -1..+1
elseif (p_targetMultiplier == -4) % centering around zero, then tansig
    nnTargetNormalized = (nnTargetWithoutOutlier + targetMaxAbs)/targetMaxAbs - 1; % to -1..+1
    nnTargetNormalized = tansig(nnTargetNormalized*4); % squashes the bottom 20% (those under -0.4) into the bottom 5% (those under -0.9)
elseif (p_targetMultiplier == -5) % centering around the mean
    nnTargetNormalized = (nnTargetWithoutOutlier - targetMean + targetMeanMaxAbs)/targetMeanMaxAbs - 1; % to -1..+1
elseif (p_targetMultiplier == -6) % centering around the mean, then tansig
    nnTargetNormalized = (nnTargetWithoutOutlier - targetMean + targetMeanMaxAbs)/targetMeanMaxAbs - 1; % to -1..+1
    nnTargetNormalized = tansig(nnTargetNormalized*4); % squashes the extremes toward -1..+1
elseif (p_targetMultiplier == -7) % centering around zero, scaled up by 100
    nnTargetNormalized = ((nnTargetWithoutOutlier + targetMaxAbs)/targetMaxAbs - 1)*100; % to -100..+100
end

[Results table of the multiplier and squashing experiments; image in the original post.]

Conclusions:
1. As our outlier threshold is 4%, we expected the Multiplier = 25 case to give the best result, because that is the one that maps the clipped output range closest to -1..+1.
However, it seems that Multiplier = 100 is the best. This may be pure luck: note Experiment 3 in the Multiplier = 100 case, which achieved more than 500% TR%. That single run is probably just luck, yet it contributes heavily to the average, so the fact that Multiplier = 100 is the winner may be nothing more than randomness.

2. There is no question that the original Multiplier = 1 case is not optimal. This study revealed that we gain more with either of the larger multipliers, 25 or 100.

3. From the squashing function experiments, we conclude that applying tansig() as a second preprocessing step always worsens the result.
The idea came from here:

Target normalization
Why target normalization? Because building a model between the data elements and their associated target is easier when the set of values to predict is rather compact. So when the distribution of the target variable is skewed, that is, there are many lower values and a few higher values (e.g. the distribution of income: income is non-negative, most people earn around the average, and a few people make much more), it is preferable to transform the variable toward a normal one by computing its logarithm. Then the distribution becomes more even.
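
As an illustrative sketch of that logarithm idea (the income figures are made up, not from any experiment):

% Illustrative only: log-transforming a skewed, non-negative variable
% (like income) compresses the long right tail and evens out the distribution.
income    = [20 25 30 35 40 45 50 120 300 1000];  % hypothetical, right-skewed
logIncome = log(income);                          % large values are pulled in toward the bulk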

The tansig squashing function would make the distribution more even: we have a few outliers at the edges, while the bulk of the samples is crowded near the mean, near zero. For example, tansig(nnTargetNormalized*4) squashes the bottom 20%, that is those under -0.4, into the bottom 5%, that is those under -0.9. It seemed a good idea, but our measurements do not confirm it. For example, the Squashing-2 function was so bad that we tried to debug it.
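
A quick numeric check of that squash (tansig is the Neural Network Toolbox name for the hyperbolic tangent):

% tansig(x) == tanh(x); multiplying by 4 first steepens the squash.
tansig(-0.4 * 4)   % ans is about -0.92: a value at -0.4 lands below -0.9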

Our note:
the first-day forecast is negative, even though the input (= day 4) is the most bullish day of the week, with a hugely positive %return.
This is because the minValue is too low: -3.7 (the maxValue is +3.0). The ANN forecasts something near the middle (an average), and when we de-normalize, the asymmetric -3.7 minimum pulls that middle below zero. That shift is huge.
Imagine a situation where our range is -10..+2. The normalized forecast will be near zero, and de-normalizing zero gives back the midpoint of the raw range, about -4, because the -10 minimum is added back.
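
A minimal sketch of this de-normalization effect (my own illustration, not from the original Matlab file), using the hypothetical -10..+2 range from above and the min-max mapping of Squashing-1/-2:

rawMin = -10; rawMax = 2;                                       % asymmetric raw target range
normalize   = @(x) ((x - rawMin)/(rawMax - rawMin) - 0.5) * 2;  % to -1..+1, as in Squashing-2
denormalize = @(y) (y/2 + 0.5) * (rawMax - rawMin) + rawMin;    % inverse mapping back to raw units
annForecast = 0;                    % the ANN tends to forecast near the middle of the range
denormalize(annForecast)            % ans = -4: a "neutral" forecast becomes strongly negative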

4. From the squashing function experiments, it turns out that re-centering the distribution around the mean (and rescaling after re-centering) instead of around zero is not a good idea.

5. In the future, we will use
nnTargetNormalized = nnTargetWithoutOutlier*25;
or
nnTargetNormalized = nnTargetWithoutOutlier*100;
or the
Squashing-3:
nnTargetNormalized = (nnTargetWithoutOutlier + targetMaxAbs)/targetMaxAbs - 1; % to -1..+1

Interestingly, this Squashing-3 looks like a re-centering function, but in fact it is not:
(nnTargetWithoutOutlier + targetMaxAbs)/targetMaxAbs - 1
equals
nnTargetWithoutOutlier/targetMaxAbs

For the first 200 training days, targetMaxAbs = 0.037, so the multiplier was 1/0.037 ≈ 27. So it is usually a little higher than 25.

We prefer Squashing-3, because it is adaptive to the range; the multiplier is not fixed. For example, if targetMaxAbs is only 0.01 (not 0.04), it can stretch the range more; if targetMaxAbs = 0.02, the multiplier becomes 50, as the sketch below shows.
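
A minimal sketch of that adaptivity (the anonymous function is my shorthand for Squashing-3, not the original code; the targetMaxAbs values 0.037, 0.02 and 0.01 are the illustrative ones from above):

squash3 = @(x, targetMaxAbs) (x + targetMaxAbs)/targetMaxAbs - 1;  % same as x/targetMaxAbs
squash3(0.01, 0.037)   % ~0.27, i.e. an effective multiplier of about 27
squash3(0.01, 0.02)    % 0.50, i.e. an effective multiplier of 50
squash3(0.01, 0.01)    % 1.00, i.e. an effective multiplier of 100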
Note also that Squashing-3 has a very good D_stat, even if its TR% is not the best. This can be due to randomness. Of the 3 performance measurements, I prefer D_stat; it tells the most about the prediction power. A good TR% and CAGR% can be the result of only a few outlier days.
