Linear Regression learning for VXX, OLS estimator, 1D


An advantage of attending a university course is that it broadens your knowledge. Even more importantly, it adds new, usable tools to your toolbox.

The Stanford University Machine Learning course mentioned in the previous blog post is not only theoretical, but very practical indeed. I would say it is even more practical than theoretical, which is bad news for theoretical mathematicians, but good news for applied scientists and programmers. The course forces students to write homework programs every week. The suggested language is Octave, a free, open source alternative to Matlab. One of the topics last week was Multivariate Linear Regression and two approaches to solving it: the Normal Equation and Gradient Descent.

So far on this blog, we have pursued Neural Network based solutions to the problem, but for this post, let’s just solve the matrix equations.

In this post, let’s assume we want to forecast the next day’s %change of VXX as the output variable, based on today’s %change of VXX.

The linear equation looks like this:

Y = beta0 + beta1*X, where


X = today %change,

Y = next day %change.
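To make the X and Y definitions concrete, here is a minimal sketch of how the sample pairs could be built from a list of daily closing prices. The function names and the prices are made up for illustration; they are not taken from the original Octave code.

```python
def pct_changes(closes):
    """Daily % changes: (close[t] / close[t-1] - 1) * 100."""
    return [(closes[t] / closes[t - 1] - 1.0) * 100.0
            for t in range(1, len(closes))]


def make_samples(closes):
    """Pair each day's %change (X) with the following day's %change (Y)."""
    changes = pct_changes(closes)
    x = changes[:-1]  # today %change
    y = changes[1:]   # next day %change
    return x, y


# Made-up closing prices, for illustration only (not real VXX data).
closes = [100.0, 102.0, 99.96, 101.9592]
x, y = make_samples(closes)
print(x, y)
```

Each x[i] is paired with y[i], the change observed one trading day later, so N closing prices yield N-2 sample pairs.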

Linear regression finds the coefficients of the line that best fits the data.

I usually say that from the sample points we regress back the line (we determine it, we guess it) that is most likely to have generated those sample points.

The unknowns are beta0 and beta1. We want to determine (learn) them.

Suppose we learn them by looking back D days in the history, where D can be 20, 50, 100 or 200 days.

beta0, beta1 = ?

How to solve it?

The solution is the OLS estimator, where OLS stands for Ordinary Least Squares: Beta = (X’X)^-1 * X’y.

In a nutshell, you have to evaluate this equation, which is pretty straightforward using Octave/Matlab matrix operations.

For the geeks, see the details here:

I would like to pause here for a moment and look at the equation: Beta = (X’X)^-1 * X’y.

Why is this the equation? The derivation is fairly straightforward.

Consider the original equation:

X*Beta = y.

Try to determine Beta = ?

We cannot multiply both sides by X^-1. Why? Because X is not a square matrix: it has one row per sample, but only one column per coefficient. A non-square matrix has no inverse.

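So, multiply both sides by X’ first (X transpose) to get

(X’X)*Beta = X’y

Now (X’X) is a square matrix, so it can have an inverse (it does whenever the columns of X are linearly independent). Let’s multiply both sides by this inverse.

(X’X)^-1 * (X’X)*Beta = (X’X)^-1 * X’y

which simplifies to:

Beta = (X’X)^-1 * X’y

Strictly speaking, with more samples than coefficients there is usually no Beta that satisfies X*Beta = y exactly. The Beta obtained this way is the one that minimizes the sum of squared errors, hence the name Least Squares.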
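For the 1D case, the matrix equation can be written out by hand, because X has just two columns (a column of ones for beta0 and the input column for beta1), so X’X is only a 2x2 matrix. The following is a minimal sketch in plain Python rather than Octave; the function name `ols_1d` is made up for illustration.

```python
def ols_1d(x, y):
    """Evaluate Beta = (X'X)^-1 X'y for X = [ones, x]; returns (beta0, beta1)."""
    n = len(x)
    sx = sum(x)
    sxx = sum(v * v for v in x)
    sy = sum(y)
    sxy = sum(a * b for a, b in zip(x, y))
    # X'X = [[n, sx], [sx, sxx]]   and   X'y = [sy, sxy]
    det = n * sxx - sx * sx
    if det == 0:
        raise ValueError("X'X is singular: all x values are identical")
    # Inverse of the 2x2 matrix X'X is (1/det) * [[sxx, -sx], [-sx, n]]
    beta0 = (sxx * sy - sx * sxy) / det
    beta1 = (n * sxy - sx * sy) / det
    return beta0, beta1


# Points lying exactly on the line y = 1 + 2x are recovered exactly.
print(ols_1d([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0]))  # (1.0, 2.0)
```

The singular case corresponds to X’X having no inverse, which happens in 1D only when every x in the lookback window is identical.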



The advantages of the OLS method compared to the Neural Network or Gradient Descent are that it is:

- deterministic. All the Neural Network solutions are randomized and therefore require many random runs for backtesting. In contrast, OLS requires only 1 backtest.

- easy to compute (it takes half a second).

- free of sample normalization: OLS doesn’t require normalizing the samples.

- simple to configure: the whole method has only 1 parameter, the lookback days. That is in contrast to the Neural Network based solution, which has many more: lookbackdays, outlier threshold, numberOfRandomRuns, weighting of the decision of the neural network, and normalization parameters (SD or min-max normalization, range normalization or mean normalization too?).

- less exposed to tuning bias: having only 1 parameter significantly reduces the parameter fine-tuning bias that distorts the results of many backtests.

The disadvantage of OLS is that it can capture only linear relations between the inputs and the output. In contrast, a Neural Network can approximate any continuous function.

In our concrete example, we took the VXX close prices from its inception. That is about the beginning of 2009.

We ran the algorithm with lookback days = 20, 50, 100 and 200.
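The rolling lookback can be sketched as a walk-forward loop: on each day, fit the line on the last D sample pairs, forecast the next day's %change, and take a position. The post does not spell out the trading rule, so the one used here (long if the forecast is positive, short otherwise, no costs) is an assumption, and all names are made up for illustration.

```python
def ols_1d(x, y):
    """Closed-form 1D OLS; returns (beta0, beta1)."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxx = sum(v * v for v in x)
    sxy = sum(a * b for a, b in zip(x, y))
    det = n * sxx - sx * sx
    if det == 0:
        # Degenerate window (all x identical): fall back to the mean of y.
        return sy / n, 0.0
    return (sxx * sy - sx * sxy) / det, (n * sxy - sx * sy) / det


def backtest(changes, lookback):
    """Walk-forward equity curve, starting at 1.0; `changes` are daily % changes."""
    equity = [1.0]
    for t in range(lookback + 1, len(changes)):
        # Training pairs (X, Y) end at (changes[t-2], changes[t-1]),
        # so nothing from day t leaks into the fit.
        xs = changes[t - lookback - 1:t - 1]
        ys = changes[t - lookback:t]
        beta0, beta1 = ols_1d(xs, ys)
        forecast = beta0 + beta1 * changes[t - 1]
        # Assumed rule: always in the market, long or short by forecast sign.
        position = 1.0 if forecast > 0 else -1.0
        equity.append(equity[-1] * (1.0 + position * changes[t] / 100.0))
    return equity


# Synthetic, perfectly mean-reverting series (not real VXX data).
changes = [1.0, -1.0] * 5
for lookback in (4, 6):
    print(lookback, backtest(changes, lookback)[-1])
```

On real data the loop would be run once per lookback value (20, 50, 100, 200), each run producing one equity curve.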

We also plot the SMA70 of the strategy’s equity curve (as a means of applying a ‘playing the equity curve’ technique).
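The SMA70 is simply the average of the last 70 points of the equity curve. A minimal sketch follows, with made-up function names; the `above_sma` filter is one possible reading of "playing the equity curve", not necessarily the exact rule used for the charts.

```python
def sma(values, window):
    """Simple moving average; None until `window` values are available."""
    out = []
    running = 0.0
    for i, v in enumerate(values):
        running += v
        if i >= window:
            running -= values[i - window]  # drop the value leaving the window
        out.append(running / window if i >= window - 1 else None)
    return out


def above_sma(equity, window=70):
    """True on days when the equity curve is above its own moving average."""
    avg = sma(equity, window)
    return [a is not None and e > a for e, a in zip(equity, avg)]


# Tiny made-up curve with a short window, just to show the shape of the output.
print(sma([1.0, 2.0, 3.0, 4.0, 5.0], 3))        # [None, None, 2.0, 3.0, 4.0]
print(above_sma([1.0, 2.0, 3.0, 4.0, 5.0], 3))  # [False, False, True, True, True]
```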

The return curves of the strategy look like this:


What can we realistically say? The charts are similar.

- For the 200 days lookback, the curve went from 1 to 3 in about 2 years. That is roughly 70% CAGR. Not bad.

- However, the maxDD was -50% (summer 2010), which is pretty high.

- The best performer was the 50 days lookback (probably that is the one to play in real life). It multiplied the initial deposit by 10 in 2.5 years. That is about 150% CAGR, but we consider this performance an outlier. Also note how volatile it was in August 2011 (albeit volatile in the favoured direction).

- Someone could start the strategy when the profit curve is above its SMA70, as it is now (as a means of money management).

- Someone could start the strategy when the profit curve is higher than its previous highest high (maybe it is safer: less whipsaw).
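The CAGR figures quoted in the list above can be checked with a couple of lines of arithmetic, and the maxDD is just the worst peak-to-trough fall of the equity curve. A small sketch, with made-up helper names and a made-up curve for the drawdown example:

```python
def cagr(multiple, years):
    """Compound annual growth rate for growing `multiple`-fold in `years`."""
    return multiple ** (1.0 / years) - 1.0


def max_drawdown(equity):
    """Worst peak-to-trough decline as a negative fraction (e.g. -0.5 = -50%)."""
    peak = equity[0]
    worst = 0.0
    for v in equity:
        peak = max(peak, v)
        worst = min(worst, v / peak - 1.0)
    return worst


print(round(cagr(3.0, 2.0) * 100))   # 1 -> 3 in about 2 years: 73
print(round(cagr(10.0, 2.5) * 100))  # x10 in 2.5 years: 151
print(max_drawdown([1.0, 2.0, 1.0, 1.5]))  # made-up curve: -0.5
```

So "70%" and "150%" are rounded versions of roughly 73% and 151%.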

On the other hand, it is worth mentioning that these are only theoretical results. Real life can be harsher than this: sometimes because of the parameter fine-tuning bias, sometimes because real-life order execution is not perfect (bid-ask spread, commissions, short sale orders that are not executed because there are not enough shares available to borrow, etc.).

In future posts, we will examine the 2D input case, and we will also do some Sensitivity Analysis on the ‘lookbackdays’ variable.

