### Visualize your data (x,y) and the prediction model f(x)

**1. Visualize your data**

Our quest in supervised learning is to find a function f(x) that is likely to have generated the training set. In the training set, each input X has an output label Y attached to it. One thing you learn quickly is the importance of analysing your data. Visualizing it, however, poses some problems.

On the one hand, there is dimensionality: multidimensional data (10, 20 or more dimensions) is very common, yet we Earth people are very poor at visualizing anything with more than 3 dimensions.

On the other hand, if the data contains a lot of noise, it is difficult to see any meaningful structure in it.

Luckily, in our experiments we try to keep the dimensionality low, mostly to mitigate overfitting.

We showed (2 posts ago) that 2-dimensional time series prediction worked better for VXX than the 1-dimensional one.

Therefore, we continue with the 2-dimensional case.

Our x1 dimension (horizontal axis) is today's %change, and x2 (vertical axis) is yesterday's %change.
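A minimal sketch of how such (x1, x2) samples could be built from daily closing prices, assuming %change means the close-to-close percentage change and that the target y is tomorrow's %change. The names `pct_changes` and `build_samples` are hypothetical, and the prices below are made-up numbers, not VXX data.

```python
def pct_changes(closes):
    """Daily %changes in percent: 100 * (close[t] / close[t-1] - 1)."""
    return [100.0 * (closes[t] / closes[t - 1] - 1.0) for t in range(1, len(closes))]


def build_samples(closes):
    """Return (x1, x2, y) triples.

    x1 = today's %change, x2 = yesterday's %change,
    y  = tomorrow's %change (the value the model should predict).
    """
    r = pct_changes(closes)
    samples = []
    # Each sample needs r[t - 1] (yesterday), r[t] (today) and r[t + 1] (tomorrow).
    for t in range(1, len(r) - 1):
        samples.append((r[t], r[t - 1], r[t + 1]))
    return samples


closes = [20.0, 22.0, 19.8, 19.8, 21.78]  # made-up closing prices
for x1, x2, y in build_samples(closes):
    print(f"today={x1:+.1f}%  yesterday={x2:+.1f}%  tomorrow={y:+.1f}%")
```

The first and last days are dropped because they lack a "yesterday" or a "tomorrow" value.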

Having 3 years of historical data, let’s look at it:

This plot marks each day by tomorrow's %change: positive (green + sign) or negative (red o sign).
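The two groups in such a plot can be separated with a few lines of Python. This is a sketch, assuming samples are `(x1, x2, y)` triples in percent (y = tomorrow's %change) and that a flat 0% day is drawn red, a choice the post does not specify. `split_up_down` and `plot_up_down` are hypothetical helpers; matplotlib is imported only inside the plotting function.

```python
def split_up_down(samples):
    """Split (x1, x2, y) samples into the two scatter-plot groups.

    Green '+': tomorrow's %change > 0.  Red 'o': tomorrow's %change <= 0
    (assumption: flat days are drawn red).
    """
    up = [(x1, x2) for x1, x2, y in samples if y > 0]
    down = [(x1, x2) for x1, x2, y in samples if y <= 0]
    return up, down


def plot_up_down(samples):
    """Scatter plot: x1 = %change today (horizontal), x2 = %change yesterday (vertical)."""
    import matplotlib.pyplot as plt  # imported here so split_up_down() needs no matplotlib

    up, down = split_up_down(samples)
    for points, color, marker, name in ((up, "green", "+", "tomorrow up"),
                                        (down, "red", "o", "tomorrow down")):
        if points:
            xs, ys = zip(*points)
            plt.scatter(xs, ys, c=color, marker=marker, label=name)
    plt.xlabel("%change today (x1)")
    plt.ylabel("%change yesterday (x2)")
    plt.legend()
    plt.show()
```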

Do you see any meaningful structure?

Not easily, because there is a lot of noise (and, unfortunately, it is not white noise).

Some things can be concluded though:

– there seem to be more red dots overall (expected): more VXX down days.

– green dots (VXX up days) are likely when either today or yesterday had a large up move (expected: volatility brings more volatility)

– a down VXX day is likely when the market is calm (small up or down moves in the last 2 days)
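These eyeballed observations can be checked by counting. A minimal sketch, assuming the same `(x1, x2, y)` triples in percent; `up_rate` is a hypothetical helper, and the thresholds `BIG_UP` and `QUIET` are hypothetical stand-ins for "highly" and "small", which the post leaves undefined.

```python
BIG_UP = 5.0  # hypothetical: a "large" up move, in percent
QUIET = 2.0   # hypothetical: a "small" move, in percent


def up_rate(samples, condition=lambda x1, x2: True):
    """Share of samples with tomorrow's %change > 0 among those where condition holds.

    Returns (rate, count); rate is None when no sample satisfies the condition.
    """
    selected = [y for x1, x2, y in samples if condition(x1, x2)]
    if not selected:
        return None, 0
    ups = sum(1 for y in selected if y > 0)
    return ups / len(selected), len(selected)


def big_up(x1, x2):
    """Today or yesterday had a large up move."""
    return x1 > BIG_UP or x2 > BIG_UP


def quiet(x1, x2):
    """Both of the last 2 days moved only a little."""
    return abs(x1) < QUIET and abs(x2) < QUIET
```

For example, `up_rate(samples)` gives the overall share of green dots, while `up_rate(samples, big_up)` and `up_rate(samples, quiet)` give the share inside each region, so each region can be compared against the overall baseline. The returned count matters: a rate computed from a handful of points says little.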

But overall, the plot **looks so random that it is difficult to imagine how we could separate the two groups: the positive days from the negative ones.**

Obviously there is no linear separator.

This plot is useful if we do **classification into 2 groups (Up, Down),** but what if we would like to do **classification into 3 groups:**

**Bullish days, Bearish days and Cash days. Cash means that the %change was mild: between -1% and +1%.**
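A sketch of this 3-class labeling, assuming the ±1% band applies to tomorrow's %change and that both ends are inclusive (so exactly +1% counts as Cash). `three_class_label` and `MARKERS` are hypothetical names; black diamonds for Cash follow the plot description, while keeping green + and red o for the other two classes is an assumption.

```python
def three_class_label(tomorrow_pct, band=1.0):
    """Bullish / Bearish / Cash label from tomorrow's %change (in percent).

    Cash (Neutral) means a mild move inside [-band, +band]; the post uses 1%.
    """
    if tomorrow_pct > band:
        return "Bullish"
    if tomorrow_pct < -band:
        return "Bearish"
    return "Cash"


# (color, matplotlib marker) per class; "D" is matplotlib's diamond marker.
MARKERS = {
    "Bullish": ("green", "+"),
    "Bearish": ("red", "o"),
    "Cash": ("black", "D"),
}

for pct in (2.5, 0.4, -1.0, -3.2):
    label = three_class_label(pct)
    print(pct, label, MARKERS[label])
```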

Let’s make a plot. The black diamonds represent the Neutral (Cash) days.

More or less the same observations hold, and a few extra conclusions can be drawn:

– there are no black dots (Neutral days) if today's or yesterday's %change is extreme (so Neutral days usually happen in a less volatile regime)

– if today's VXX %gain was higher than +20% (2 cases), it was followed by another VXX increase

– when the VXX %change was negative both today and yesterday, it is likely to be negative tomorrow as well (VXX has daily follow-through, i.e. momentum)
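The new observations can again be checked by counting tomorrow's class inside a region of the (x1, x2) plane. A minimal sketch, reusing the ±1% band; `label`, `class_counts_after`, `extreme` and `both_down` are hypothetical helpers, and the `EXTREME` threshold of 5% is a hypothetical choice, since the post does not say what "extreme" means.

```python
from collections import Counter

EXTREME = 5.0  # hypothetical: |%change| above this counts as an extreme move


def label(y, band=1.0):
    """Bullish / Bearish / Cash from tomorrow's %change (Cash inside [-band, +band])."""
    if y > band:
        return "Bullish"
    if y < -band:
        return "Bearish"
    return "Cash"


def class_counts_after(samples, condition, band=1.0):
    """Count tomorrow's class among (x1, x2, y) samples whose (x1, x2) satisfies condition."""
    return Counter(label(y, band) for x1, x2, y in samples if condition(x1, x2))


def extreme(x1, x2):
    """Today or yesterday moved by more than EXTREME percent in either direction."""
    return abs(x1) > EXTREME or abs(x2) > EXTREME


def both_down(x1, x2):
    """Both today and yesterday were down days."""
    return x1 < 0 and x2 < 0
```

If the Neutral-day observation holds, `class_counts_after(samples, extreme)["Cash"]` should be 0 or close to it; for the momentum observation, the Bearish count under `both_down` should clearly outweigh the Bullish count.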

**2. Visualize your final fitted prediction model f(x)**

Let’s suppose we use the Linear Regression learning described in the previous posts.

What does the decision surface look like?
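The previous posts' exact setup is not repeated here, so this is only a sketch under assumptions: an ordinary least squares fit of tomorrow's %change on today's and yesterday's %change plus an intercept, f(x) = w0 + w1*x1 + w2*x2, trained on the most recent `window` samples (93 later in this post). `fit_linear` and `solve3` are hypothetical names; the 3x3 normal equations are solved by hand to keep the example dependency-free.

```python
def solve3(m, v):
    """Solve the 3x3 system m @ w = v by Gaussian elimination with partial pivoting."""
    n = 3
    a = [list(m[i]) + [v[i]] for i in range(n)]  # augmented matrix [m | v]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            raise ValueError("singular system: collinear features or too few samples")
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n + 1):
                a[r][c] -= factor * a[col][c]
    w = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        w[r] = (a[r][n] - sum(a[r][c] * w[c] for c in range(r + 1, n))) / a[r][r]
    return tuple(w)


def fit_linear(samples, window=None):
    """OLS fit of y ~ w0 + w1*x1 + w2*x2 on (x1, x2, y) samples; returns (w0, w1, w2).

    If window is given, only the most recent `window` samples are used.
    """
    if window is not None:
        samples = samples[-window:]
    # Normal equations (A^T A) w = A^T y, where each row of A is [1, x1, x2].
    ata = [[0.0] * 3 for _ in range(3)]
    aty = [0.0] * 3
    for x1, x2, y in samples:
        row = (1.0, x1, x2)
        for i in range(3):
            aty[i] += row[i] * y
            for j in range(3):
                ata[i][j] += row[i] * row[j]
    return solve3(ata, aty)
```

Usage would be along the lines of `w0, w1, w2 = fit_linear(samples, window=93)`. In practice a library routine such as NumPy's `np.linalg.lstsq` is the more robust choice; the hand-written solver only makes the normal equations visible.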

It looks something like this:

We draw the decision boundary as a black dotted line. It is the set of (x1, x2) points where f(x) is zero, and it separates the up forecasts from the down forecasts. The plot is dated 2011-10-28.
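If f is linear in the two inputs, f(x) = w0 + w1*x1 + w2*x2 (an assumption; the previous posts may use a richer feature set, which would bend the boundary), the decision boundary is a straight line whose position follows directly from the weights:

$$
f(x_1, x_2) = w_0 + w_1 x_1 + w_2 x_2
$$

$$
f(x_1, x_2) = 0 \iff x_2 = -\frac{w_0 + w_1 x_1}{w_2} \qquad (w_2 \neq 0)
$$

When $w_2 > 0$, points above this line have $f > 0$ (an up forecast) and points below it have $f < 0$; when $w_2 < 0$ the two sides swap.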

The prediction can be read manually from the plot if we know yesterday's %change (vertical axis) and today's %change (horizontal axis). For example, if both are 0%, the point falls in the yellowish (upper) area, so the prediction is a positive %change tomorrow.

Note that **observing f(0,0) is a good way to evaluate whether the current model is Upside biased or Downside biased.** Because the model was trained on samples from the last 93 trading days, and we have been in a very volatile period since August 2011, it is no surprise that f(0,0) is positive, so the model mostly predicts positive values (it is positively biased).
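Reading the plot by hand amounts to evaluating the sign of f at (today, yesterday), and for a linear model the bias check is just the intercept, since f(0,0) = w0. A minimal sketch, assuming the linear form f(x) = w0 + w1*x1 + w2*x2; `f`, `forecast` and `bias` are hypothetical helpers, the weights in the example are made up (not the fitted 2011-10-28 model), and a value of exactly 0 is treated as a Down forecast.

```python
def f(w, x1, x2):
    """Evaluate the linear model f(x) = w0 + w1*x1 + w2*x2 (inputs in percent)."""
    w0, w1, w2 = w
    return w0 + w1 * x1 + w2 * x2


def forecast(w, today_pct, yesterday_pct):
    """Up if f > 0 at (today, yesterday), otherwise Down."""
    return "Up" if f(w, today_pct, yesterday_pct) > 0 else "Down"


def bias(w):
    """Sign of f(0, 0), i.e. the prediction after two flat days."""
    value = f(w, 0.0, 0.0)
    if value > 0:
        return "Upside biased"
    if value < 0:
        return "Downside biased"
    return "Unbiased"


w = (0.3, 0.05, -0.02)  # hypothetical weights, for illustration only
print(forecast(w, 0.0, 0.0), bias(w))  # Up Upside biased
```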

As we anticipate negative VXX changes in the upcoming December (Xmas) season, we would not advise starting to trade the strategy right now.
