Comparing Linear Regression vs. Classification, VXX, 2D

01Nov11

Continuing the previous posts, we have at least one free parameter: the number of lookback days. By doing a sensitivity analysis on this parameter, we can compare the regression and classification methods.

 

In more detail, we are going to compare the following three methods (a minimal code sketch of all three setups follows the list):

1. Linear regression

– based on the normal equation, a deterministic evaluation

– it is not iterative, so it finds the exact optimum instantly; therefore the iteration count is not a parameter

– requires no normalization of inputs or outputs (so there is no Normalization parameter)

– has only 1 parameter: lookback days

 

2. Logistic regression (the name shouldn't mislead you: it is a classification method), binary

– 2 categories: Buy or Sell (these categories are defined by a %change threshold of 0%)

– gradient descent iteration parameter: 400 (probably enough, because in linear tasks the cost function is convex, so there is only one, global minimum)

– in theory, Normalization is a parameter, because we iterate with gradient descent, and normalization would help gradient descent converge faster. However, with this very simple convex cost function, and because our range of -0.2..0.2 (that is, -20% to +20%) doesn't differ too much from the ideal -1..1 range, we could normalize by a x5 multiplier, but that wouldn't speed up gradient descent by much. So we accept that we lose a very small amount of gradient descent speed, which is not significant.

 

3. Logistic regression (classification), 3 categories

– 3 categories: Buy, Cash, Sell signals

– these 3 categories are defined by the %change thresholds of -1.5%..+1.5% (if the Y output falls in that range, we regard it as Cash)

– gradient descent iteration parameter: 400 (as before)

– we chose not to normalize the input, as in the previous case. That is also a parameter: "Normalization: OFF", from a possible basket of normalizations { mean normalization, mean and min-max range normalization, mean and std normalization, only std normalization, etc. }
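
Below is a minimal sketch of the three setups, assuming X is an m x lookback matrix of past daily %changes and y holds the next-day %changes. The function names, the learning rate and the one-vs-all treatment of the 3-category case are illustrative assumptions rather than the exact implementation; the 400 iterations and the ±1.5% Cash band are the settings described above.

```python
import numpy as np

def add_bias(X):
    """Prepend a column of ones for the intercept term."""
    return np.hstack([np.ones((X.shape[0], 1)), X])

def fit_linear(X, y):
    """1) Linear regression via the normal equation: theta = (X'X)^-1 X'y.
    Deterministic and non-iterative; no normalization needed."""
    Xb = add_bias(X)
    return np.linalg.solve(Xb.T @ Xb, Xb.T @ y)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic(X, y01, iters=400, alpha=0.1):
    """2) Binary logistic regression, 400 gradient descent steps.
    y01 is 1 for Buy (next-day %change > 0) and 0 for Sell."""
    Xb = add_bias(X)
    theta = np.zeros(Xb.shape[1])
    for _ in range(iters):
        grad = Xb.T @ (sigmoid(Xb @ theta) - y01) / len(y01)
        theta -= alpha * grad
    return theta

def labels_3cat(pct_change, band=0.015):
    """3) Map the next-day %change to Sell (0), Cash (1) or Buy (2),
    using the +-1.5% thresholds."""
    return np.where(pct_change > band, 2, np.where(pct_change < -band, 0, 1))

def fit_logistic_3cat(X, y3, iters=400, alpha=0.1):
    """3-category case as one-vs-all: one binary classifier per category;
    at prediction time, the class with the highest sigmoid score wins."""
    return [fit_logistic(X, (y3 == k).astype(float), iters, alpha)
            for k in range(3)]
```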

 

 

We run a sensitivity analysis over the number of lookback days:

<Sensitivity analysis chart: final PV as a function of lookback days, for all three methods>

 

What we can observe in general is that the classification solutions achieve a lower final PV (Portfolio Value).

Why is that?

The reason lies in how the methods handle the training samples.

For classification, all training samples are equal, irrespective of their Y magnitude, since it takes only the Sign() of the %change (+1, 0, -1). A VIX spike of +40% in a day is treated the same as another day with a +2% VIX change. This has advantages and disadvantages: in particular, classification is less sensitive to outliers. However, it turns out that exactly these outliers are very important in our problem. When VIX increased by 50% in a single day on 1st August 2011, a huge move, it instantly shifted the regression-based (non-classification) solution to be positively biased. This single outlier had a great effect for the following weeks and months: afterwards, all the predictions were upside biased, i.e. the model was more likely to forecast Up %changes than Down %changes.
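
To make this concrete, here is a tiny illustration (using the example numbers from above) of how the two methods encode the same days:

```python
# How regression and binary classification see a +2% day and a +50% outlier
# day: regression keeps the magnitudes, classification collapses both to +1.
import numpy as np

pct_changes = np.array([0.02, 0.50])          # +2% day, +50% outlier day
regression_targets = pct_changes              # [0.02, 0.5]: magnitudes kept
classification_labels = np.sign(pct_changes)  # [1., 1.]: outlier info lost
print(regression_targets, classification_labels)
```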

However, for classification, this huge 50% change was only another sample with a +1 output value. It took a long time until the classification methods realized that we were in a new regime: one in which Up days are more likely than Down days.

In that sense, regression is more agile: it adapts more quickly than classification to a regime change signalled by an outlier. In this problem, that turns out to be an advantage.

In another problem (forecasting house prices, for example), this outlier sensitivity would be counterproductive.

 

This is very well illustrated in the Ultimate predictor PV chart.

The Ultimate predictor aggregates the individual lookback predictors, from 75 to 110 lookback days, and takes a majority vote. Here is its PV (Portfolio Value) chart:

<PV chart of the Ultimate predictor: regression vs. binary vs. 3-category classification>
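
A minimal sketch of this majority vote, assuming each of the 36 lookback predictors (75..110 days) has already emitted today's signal in {-1 Sell, 0 Cash, +1 Buy}; the function name and data layout are illustrative:

```python
# Majority vote over the daily signals of the individual lookback predictors.
import numpy as np

def ultimate_signal(daily_signals):
    """Return the signal that received the most votes today."""
    values, counts = np.unique(daily_signals, return_counts=True)
    return int(values[np.argmax(counts)])

# e.g. 36 predictors: 20 vote Buy, 10 Cash, 6 Sell -> Buy
print(ultimate_signal([1] * 20 + [0] * 10 + [-1] * 6))  # prints 1
```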

 

 

The two occasions when linear regression outperformed classification were when the low-VIX regime changed to a high-VIX regime: in the summer of 2010 and in August 2011. In both cases, regression was quicker to adapt.

 

The 3-category classifier case has the lowest drawdown in the PV chart, but the lowest profit too. This is a kind of trade-off: it can go to Cash sometimes, which obviously decreases the drawdown, but as we don't participate in the market in these less certain times, we leave profit on the table. However, that can be good for a conservative, non-aggressive version of the strategy.
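
For reference, here is one way to compute the maximum drawdown of a PV series; this is the standard definition of the metric, not code from the original study:

```python
# Maximum drawdown: the largest peak-to-trough fall of the PV series,
# expressed as a fraction of the running peak.
import numpy as np

def max_drawdown(pv):
    pv = np.asarray(pv, dtype=float)
    running_peak = np.maximum.accumulate(pv)
    return float(np.max(1.0 - pv / running_peak))

print(max_drawdown([1.0, 1.2, 0.9, 1.5, 1.1]))  # 0.2667: the 1.5 -> 1.1 fall
```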

 

Observe also in the sensitivity analysis chart that the 3-category classifier achieves the lowest PV. That is somewhat expected, because it is in Cash about 30% of the time. It probably has a smaller drawdown too.

It remains unexplained, however, why at the far end of the sensitivity chart (more than 150 lookback days) the binary classifier performs so poorly (it goes back to the PV = 1 line, with no profit in 2 years), while the 3-category classifier (which is in Cash 30% of the time) reaches PV = 2 in this region of the sensitivity analysis chart.

 

Conclusion:

We compared regression and classification. In our prediction problem, regression was better, because it does not suppress the effect of outliers.

The binary and the 3-category classifiers perform similarly to each other: their final PVs are about equal (in the Ultimate version), albeit the 3-category version has a lower drawdown, which makes it suitable for a conservative implementation.

 
