### Conceptual Framework : “Machine learning based, non-objective probability-model for regular and extreme values of stock returns”

11Nov12

Now and then everybody needs a conceptual framework.

Our conceptual framework is different from the link above.

Here is the outline of a system that can help in stock market decisions. One of the best ways to illustrate a concept is a flow chart, which shows the viewer the flow of information or the sequence of steps.

Here is our future conceptual framework:

The system is general enough that it can work in the prediction of the SPX, the RUT, the VIX, the EUR/USD ratio or anything else.

Let’s detail the bubbles in the flow chart a little bit more:

1. Learning probability distributions from historical data. Here we use the term Machine Learning in a general sense. We want to extract some useful prediction from historical data using the machine, the computer.

Let’s imagine a simple system that we usually don’t regard as a Machine Learning system.

Imagine that today the SPX is above its 200-day Simple Moving Average, SMA(200). A technical trader wants to place a bet on tomorrow’s market direction. How does he decide whether to go short or long?

He looks back at the last 20 or 100 years of history and calculates that whenever the spot SPX was above the SMA(200), its next-day return was 0.1% on average, and whenever it was under the SMA(200), its next-day return was -0.2%. (These numbers are for illustration only.)

So our technical trader ‘learns from the past samples’ that the expected profit for tomorrow is positive, so he goes long and buys SPX futures.

This is a very simple ‘machine learning’ system that anybody can ‘calculate’ in Excel in about 1 hour.
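The SMA(200) rule described above can be sketched in a few lines. This is only an illustration under stated assumptions: the price series is a synthetic random walk (not real SPX data), and the function name `sma_conditional_returns` is a hypothetical helper, not part of any library.

```python
import random
from statistics import mean

def sma_conditional_returns(prices, window=200):
    """Average next-day return, split by whether the close is above or below its SMA."""
    above, below = [], []
    running_sum = sum(prices[:window])  # sum of the first full window
    for t in range(window - 1, len(prices) - 1):
        if t >= window:
            # Slide the window forward by one day.
            running_sum += prices[t] - prices[t - window]
        sma = running_sum / window
        next_ret = prices[t + 1] / prices[t] - 1.0
        (above if prices[t] > sma else below).append(next_ret)
    return (mean(above) if above else 0.0, mean(below) if below else 0.0)

# Synthetic prices (an assumption for the demo, not market data).
rng = random.Random(0)
prices = [100.0]
for _ in range(5000):
    prices.append(prices[-1] * (1.0 + rng.gauss(0.0003, 0.01)))

avg_above, avg_below = sma_conditional_returns(prices)
today_above = prices[-1] > sum(prices[-200:]) / 200
expected = avg_above if today_above else avg_below
print("long" if expected > 0 else "short")
```

The trader’s ‘learning’ is just the two conditional averages; the bet for tomorrow follows from the sign of the one that applies today.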

Our conceptual framework is general enough that we didn’t fix the machine learning method. It can be a simple Excel calculation as mentioned before, or it can be a Linear Regression, a Neural Network, an SVM (Support Vector Machine), a genetic algorithm, or anything else.

The different learning algorithms create different mathematical models.

We try to use a machine learning method that is deterministic, so the same result can be reproduced across backtests, but determinism is not a must.

We prefer machine learning algorithms that learn a probability distribution function (PDF), and the Excel calculation is not like that. The reason is that in one of the next steps we need not only the mean return (we prefer the median return anyway), but also the expected volatility, the standard deviation.

Therefore we prefer to work with probabilities.

We think the biggest mistake researchers generally make is that they research only the mean return, the expected profit or loss, but they fail to determine the expected volatility. Take the previously mentioned SMA(200) crossover method: we obtained that the profit is positive above the SMA(200), but we know nothing about the volatility. We contend that forcing volatility down can even increase the profit in the long term. This contradicts the conventional risk-return view associated with efficient market theory, which says that the more profit we expect, the more volatility we should suffer.

2.

The ‘new’ classical probability approach (by Ronald Fisher) uses only objective information coming from historical observations. That is the model that we have built in step 1. However, long before the Fisher method of probability, others, like Bayes more than a century earlier, held that probabilities should be defined subjectively, based on prior beliefs.

Consider Nassim Taleb’s turkey example. A farmer feeds the turkey nicely every day for 1000 days. The turkey is very happy with his friend, the farmer. However, on day 1001 it is Xmas time and the farmer comes with a knife instead of food. A mathematician following Fisher would build the turkey’s probability model using only the historical data, the last 1000 days of observation.

Fisher wouldn’t use other ‘fundamental information’, like the ‘general knowledge’ (belief) that

– farmer’s turkeys are eaten at the end anyway (almost without exception), as that is the purpose of raising turkeys.

– as Xmas day is coming, there is a higher and higher probability that the farmer brings the knife instead of the food.

This fundamental knowledge doesn’t fit into the Fisher approach, but Bayes and Pascal would happily use this information too to build up the PDF (Probability Distribution Function).

Plato had no idea about PDFs, but even to him, this would have seemed the better approach. Probability distributions are eternal objects. They exist irrespective of the observations. No matter how many observations we make, we cannot get to know the probability distribution completely this way.

This is especially true for extreme values, outliers, power law distributions, in which extreme events occur very rarely, so observing them is quite difficult or impossible.

We can call this ‘new’ (rather old) approach belief based, subjective or non-objective probability. We tend to prefer the term ‘non-objective’, but it is only personal taste; they mean the same.

What do we mean by subjectivity (non-objectivity) in the context of stock market prediction?

Things that affect the PDF, but that cannot be observed through the historical samples of the last 3 years. Examples:

– Events like the USA presidential election next week (because the last 3 years of data don’t contain the previous one)

– our general belief (by reading news and media) that the iPhone sales are ‘probably’ very good, because there were long queues in front of Apple shops.

– Mario Draghi makes a speech saying that he is willing to do everything to save the EU currency, and we believe he will

– our belief that Cloud computing will be a big success in the future, so all cloud companies will perform better than other technology companies.

Because these fundamental things cannot be expressed through historical observations in step 1, we include these effects into our mathematical model here in step 2.

But how? It is not easy.

In step 1 we synthesized a probability distribution based on historical samples. We can use Gaussian distributions, log-normal distributions, Levy stable distributions, etc. If we use a Gaussian distribution, it can be described by 2 parameters, Mean and StDev, therefore our belief in step 2 can modify these parameter values. For example, our bullish (bearish) belief can increase (decrease) the Mean. If we expect higher volatility (in the case of a coming USA election), our belief increases the StDev. If we expect lower volatility (because the ECB starts to buy Southern country bonds), we decrease the StDev. Unfortunately, we prefer to work with log-normal and Levy stable distributions. Those have more obscure parameters, and therefore it is not so easy to express our belief as described here.

There is another question too: how much should we change these parameters?

There is no general, formalized answer for this.

We suggest modifying them a little bit, running 100K simulations based on the new parameters, and calculating the CAGR, maxDD and StDev of the PV to see the effect of the modifications.

3.

Using our non-objectively modified probability distribution we generate 1 million samples for the next day return.

4.

We determine Mean, Median, StDev and other statistics for the next day return from the simulated samples. You don’t have to do it for the Gaussian distribution, but you have to do it for all general probability distributions.
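Steps 2 to 4 can be sketched for the Gaussian case as follows. This is a minimal illustration, not the actual system: the historical mean and SD are the real-world SPY figures quoted elsewhere on this blog, while the belief adjustments (`mean_shift`, `sd_scale`) and the function name are hypothetical. The sample count is reduced from 1 million to 100,000 only to keep the run short.

```python
import random
from statistics import mean, median, stdev

def simulate_next_day(hist_mean, hist_sd, mean_shift=0.0, sd_scale=1.0,
                      n=100_000, seed=1):
    """Sample next-day returns from a belief-adjusted Gaussian and summarize them."""
    rng = random.Random(seed)
    mu = hist_mean + mean_shift      # step 2: a bullish belief raises the Mean
    sigma = hist_sd * sd_scale       # step 2: expected turbulence raises the StDev
    samples = [rng.gauss(mu, sigma) for _ in range(n)]   # step 3: simulate
    # Step 4: statistics measured on the simulated samples.
    return {"mean": mean(samples), "median": median(samples), "stdev": stdev(samples)}

# Hypothetical belief: slightly bullish, 25% more volatile than history.
print(simulate_next_day(0.000384, 0.0124, mean_shift=0.0002, sd_scale=1.25))
```

For a Gaussian the statistics could be read directly from the parameters; measuring them on samples is what carries over unchanged to distributions without closed-form moments.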

5.

Because we work with time series and we place bets every day, even if we have a positive expected next day return, it doesn’t mean we should place a bet.

In short, if the volatility is high, it is better to stay in cash, even if the expected profit is positive.

In step 5, based on the StDev we determine the minimum threshold for the Mean. If the simulated Mean/Median is smaller than this threshold, we stay in cash.

For example: with 4.5% daily StDev, the volatility drag is 26% annual, so the threshold is 0.1% daily.

It means if the simulated Mean/Median is positive, but less than 0.1%, we stay in cash, and don’t go Long. Similarly, if the simulated Mean/Median is negative, but more than -0.1%, we stay in cash, and don’t go Short.

For determining these thresholds we can use the already generated 1M simulation samples, assuming that these samples form one long time series.
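The decision rule of step 5 can be sketched as below. One assumption is made explicit: the threshold is taken as the usual second-order approximation of the volatility drag, StDev²/2 per day, which reproduces the 0.1% daily figure for a 4.5% StDev. The text does not state that this is the exact formula used, and the function names are hypothetical.

```python
def cash_threshold(daily_sd):
    """Approximate daily volatility drag (assumed formula: sd^2 / 2)."""
    return daily_sd ** 2 / 2.0

def decide(expected_return, daily_sd):
    """Go long or short only if the expected return beats the drag; else stay in cash."""
    threshold = cash_threshold(daily_sd)
    if expected_return > threshold:
        return "long"
    if expected_return < -threshold:
        return "short"
    return "cash"

print(f"threshold at 4.5% StDev: {cash_threshold(0.045):.2%}")
print(decide(0.0005, 0.045))   # positive expectation, but below the threshold
print(decide(0.0020, 0.045))
print(decide(-0.0020, 0.045))
```

In the simulation-based version, `expected_return` and `daily_sd` would be the Mean/Median and StDev measured on the generated samples.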

This system can be called Conceptual Framework, but we prefer to call it “Machine learning based, non-objective probability-model for regular and extreme values of stock returns”.

We build up and use a probability model that can use non-Gaussian heavy tail distributions. Generating 1 million simulations or more — because this system is simulation based — we can model not only Gaussian, but extreme long tail stock market moves too.

### The power of Cash position, Part 2: A toy model with known probabilities.

05Nov12

A brief addendum to Part 1.

Can the volatility drag be quantified by the simulation that was performed in the previous post?

Let’s construct the toy SPY model in a way that the Expected %change is a positive constant every day, but very, very close to zero. (For example = 0.00001)

A naive observer would say that in this case, the Buy & Hold strategy would be profitable, since every day has a positive expected outcome, so it is worth taking a Long position in equities.

That is not the case.

And we show here that the outcome largely depends on the SD.

Assuming Gaussian distribution, the real world SPY has a mean 0.000384, that is 0.0384% and a Standard Deviation (SD) of 0.0124, that is 1.24%.

Let’s run our toy SPY generation process:

Assuming an SD of 4.5% (which mimics the SPY Triple ETFs), let’s construct a time series 100 times and average the results.

The CAGR of the toy SPY is -23%.

It means that if we bet randomly on the outcome and our daily expected profit is zero, we should expect a -23% capital loss every year.

Assuming an SD of 3% (which mimics the SPY Ultra (double) ETFs), let’s construct a time series 100 times and average the results.

The CAGR is -11%.

It means that if we bet randomly on the outcome and our daily expected profit is zero, we should expect a -11% capital loss every year.

Assuming an SD of 1.5% (which mimics the non-leveraged SPY ETF), let’s construct a time series 100 times and average the results.

The CAGR is -2.7%.

The loss strongly depends on the SD. However, there is good news here.

If we have an instrument that is not very volatile, with an SD of less than 1.5% (as for the SPY), we don’t have to worry too much about the cash position. This simulation shows that if the probabilities are for us (and not against us), that is, if the Expected profit is above zero, even just slightly above zero, we can go into the position (long or short) full size. Not going to cash is forgivable, because the most we can lose is the -2.7% annual loss of the volatility drag.

However, with the triple ETFs the situation is different. The expected profit should compensate for the -23% annual loss of the volatility drag.

As the currently popular VIX volatility products (ETFs, futures) can have a Beta of 2 or 3 compared to the SPY, the cash position should be a frequent position of any strategy that plays the VIX.

If the expected profit on the next day is not greater than 23%/250 ≈ 0.1%, the strategy should favour the cash position for the sake of CAGR.

If our utility (goodness) function is not the CAGR, but the Sharpe, in which the volatility counts too, we would say that this threshold should be even greater than 0.1% (maybe 0.15% or 0.2%) for a strategy to dislike the cash position.
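The toy runs above can be approximated with a short simulation. This is a sketch under assumptions: it uses one long series per SD instead of averaging 100 shorter ones, 250 trading days per year, and a hypothetical function name `toy_cagr`. The results should land near the CAGR figures quoted above, but they are random and will not match exactly.

```python
import math
import random

def toy_cagr(daily_mean, daily_sd, n_days=250_000, days_per_year=250, seed=7):
    """Annualized growth of a series whose daily %change is N(daily_mean, daily_sd)."""
    rng = random.Random(seed)
    log_growth = 0.0
    for _ in range(n_days):
        # Daily multiplication of (1 + change) becomes a sum of logs.
        log_growth += math.log1p(rng.gauss(daily_mean, daily_sd))
    return math.exp(log_growth / n_days * days_per_year) - 1.0

for sd in (0.045, 0.03, 0.015):
    print(f"SD {sd:.1%}: CAGR {toy_cagr(0.00001, sd):.1%}")
```

Working in log space avoids overflow and makes the drag visible: the average log return is negative even though the average %change is slightly positive.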

### The power of Cash position, Part 1: A toy model with known probabilities.

05Nov12

A long time ago MarketSci had some strategies called Scotty and YK.

These strategies are currently retired, but audited performance results can be obtained from here and here.

They were ‘by and large’ daily MR (Mean Reversion) strategies. To be honest, that is a crude simplification, because YK was a learning algorithm, but because daily MR was very successful in 2008, the YK strategy played mostly that.

A debate arose about how to play a Mean Reversion signal correctly.

If SPY on a given day drops -5%, it is clear that a Mean Reversion strategy would go long at the end of the day to prepare for the bounce tomorrow.

However, if SPY drops only 0.10% our human intuition says that it is not a strong signal to go long tomorrow.

Should we go to Cash if the MR signal is weak?

Michael admitted that with a weak signal, a daily change close to zero such as 0.1%, the expected profit for the next day is quite small, but he insisted that we should go long even in this case, because the Expected Value of the profit is still positive.

In this post, we construct an artificial SPY example with known probabilities. Our mathematical model will be a stochastic process, so the outcome of every simulation is not deterministic, but random. However all probabilities are known.

We will play a daily Mean Reversion (MR) on this artificial SPY stock that we construct.

Our aim is to refute that claim: we show that even if the expected profit is positive, we had better stay in Cash on a weak MR signal.

At first let’s construct our SPY stock.

Assuming Gaussian distribution, the real world SPY has a mean 0.000384, that is 0.0384% and a Standard Deviation (SD) of 0.0124, that is 1.24%.

In our mockup, we create an SPY with SD = 3% (assuming Beta = 2 compared to the real world SPY; the YK strategy played the Ultra ETFs anyway) and with a mean that is not fixed, but varies.

We want to play daily MR on this stock, so we construct an SPY that has the following mean as %change for the next day:

Note that the mean is known every day, but the proper next day outcome is not known, so we still have a random process. After the Mean is determined by this function, the actual next day %change is determined by a Gaussian process with this Mean.

Note the strong Mean Reversion feature of the generated time series: when previous day %change is negative, we generate a new day with a positive expected value.

Let f() be a stochastic function that transforms the %change of the previous day into the %change of the next day for SPY. Then:

E(f(x) | x < 0) > 0, which means that negative days imply a positive Expected Value for the next day’s change

and

E(f(x) | x > 0) < 0, which means that positive days imply a negative Expected Value for the next day’s change

Perfect instrument for playing daily MR.

Our next day SPY price is calculated by

SPY = SPY * (1 + f(x)),

where f(x) = N(mean for next day, SD);

where N(mean for next day, SD) is a Normal distribution process with the specified mean and SD.

We have a threshold of -1% for the strong mean reversion. If today’s %change is less than that, we have a strong MR on the next day, because our Expected %change for the next day is +0.3%.

The whole f(x) function is similar to a -x function, except that between -0.5% and +0.5% it is very, very close to zero.

We say we have weak MR signal when the %change of the previous day is in this range.

Michael can argue that even in this region, the expected profit for next day is positive with MR strategy, but we will show it is not the case.

The expected profit is positive; yes; but it doesn’t imply we should take a position other than Cash.

(the Expected profit of the next day is positive, yes, but the Expected profit of the MR strategy will not be)

We generate two strategies: one (Strategy1) is a pure MR that goes long if the previous day was negative and goes short otherwise.

The other strategy (Strategy2) goes to cash in case of a weak MR signal, when the previous day’s %change is between -0.5% and 0.5%.

We generated 100,000 days of data, which equals 400 years of stock market days.

One run of the simulation is charted here:

The outcome of the stochastic simulation is random; therefore we repeated the simulation 100 times to get reliable (not too random) results. The presented statistics are the averages of those 100 simulations.

The MR+Cash position strategy is in cash 13% of the time. This is with the default 3% SD of the SPY generation.

(In case we use 2% SD for SPY generation, we are in cash 20% of the time.)
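The experiment can be sketched as follows. The exact shape of the mean function is not reproduced in the text, so `next_day_mean` below is a hypothetical stand-in that only respects the stated properties: about -0.3 times the previous change, capped at ±0.3% beyond the ±1% threshold, and very close to zero inside ±0.5%. Because of that, the CAGR figures it prints will differ from the ones reported here; the time spent in cash depends mainly on the SD and should come out close to the quoted value.

```python
import math
import random

def next_day_mean(prev_change):
    """Hypothetical mean-reversion curve for the next day's expected %change."""
    sign = 1.0 if prev_change < 0 else -1.0
    size = abs(prev_change)
    if size < 0.005:
        return sign * 0.00001          # weak signal: tiny but nonzero edge
    return sign * min(0.3 * size, 0.003)

def simulate(n_days=200_000, sd=0.03, weak_band=0.005, days_per_year=250, seed=11):
    rng = random.Random(seed)
    prev = 0.0
    rets = {"MR": [], "MR+Cash": []}
    cash_days = 0
    for _ in range(n_days):
        change = rng.gauss(next_day_mean(prev), sd)
        position = 1.0 if prev < 0 else -1.0    # long after a down day, short otherwise
        mr_return = position * change
        rets["MR"].append(mr_return)
        if abs(prev) < weak_band:               # Strategy2 sits out weak signals
            cash_days += 1
            rets["MR+Cash"].append(0.0)
        else:
            rets["MR+Cash"].append(mr_return)
        prev = change
    report = {"cash_fraction": cash_days / n_days}
    for name, series in rets.items():
        avg = sum(series) / n_days
        sd_daily = math.sqrt(sum((r - avg) ** 2 for r in series) / n_days)
        log_growth = sum(math.log1p(r) for r in series) / n_days
        report[name] = {"cagr": math.exp(log_growth * days_per_year) - 1.0,
                        "sd": sd_daily}
    return report

print(simulate())
```

A single run is noisy; as in the post, averaging many runs with different seeds gives steadier numbers.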

Conclusion:

– Because of the cash position, we are not surprised too much that the 2.8% SD of the MR+Cash strategy is less than the 3.0% for the pure MR strategy.

What is really surprising is that the profit, the annual CAGR, is higher too (37.01% instead of 35.04%).

It is higher in spite of the fact that the Expected Profit was positive on those days that we replaced by the cash position.

So we missed some positive profit in the MR+Cash strategy compared to the MR, and still got a better profit.

How is it possible that even with those missed profits, the CAGR is higher in the MR+Cash strategy?

The answer lies in the volatility, and in the fact that we are talking about a time series, a.k.a. a sequence of discrete days.

If our job were to bet on the MR outcome on a weak MR signal just once, only on a single day in our lifetime, we would bet on playing the MR strategy (and no Cash), because the expected outcome is always positive if we play the MR.

However, now we are dealing with a time series, and with the daily aggregation (multiplication) of the outcomes of every day. In this case, not only the expected value, but the SD of the time series matters too.

In this case, we would omit using the MR strategy on weak MR signals, and we would stay in cash.

As the simulation shows, this decreases the volatility and increases the profit too.

– With both better CAGR and better SD, no wonder the Sharpe increased from 11.68 to 13.23.

– The answer to the mystery is that the time series profit is decreased by the volatility drag of the time series.

– When we bet on the outcome of the next day, the expected profit should be higher than a threshold to compensate for the volatility drag. And this threshold for the expected profit should be significantly higher than zero.

– Our world is not simply black and white. If it isn’t worth shorting SPY, that doesn’t imply we should go long SPY. There is a band between the two where neither a short nor a long position is worth taking, and we had better stay in cash. The higher the volatility, the larger the region around the decision boundary of the expected profit where we should stay in cash.

– The cash position is generally preferred when we are uncertain about the outcome. The cash position is a good risk mitigation tool too.

### Do stock prices live in Mediocristan? An Apple case study 2: Levy alpha stable distribution

20Jul12

We continue the previous post here that analysed the distribution of the Apple stock price daily %changes. We concluded that the distribution (if it is a static distribution) cannot be Gaussian. What can the probability distribution be then?

There is a distribution called Levy distribution

http://en.wikipedia.org/wiki/L%C3%A9vy_distribution

which has 2 parameters and it is not really what we are looking for.

A generalization of it is the Levy alpha stable distribution:

http://en.wikipedia.org/wiki/Stable_distributions

A quote from the wiki page that is relevant to our case:

“It was the seeming departure from normality along with the demand for a self-similar model for financial data that led Benoît Mandelbrot to propose that cotton prices follow an alpha-stable distribution with α equal to 1.7. Lévy distributions are frequently found in analysis of critical behavior and financial data (Voit 2003, § 5.4.3).”

The Levy alpha stable distribution has the following 4 parameters:

alpha = 1.5;    % characteristic exponent, and describes the tail of the distribution
beta = 0;       % skewness, asymmetry
gamma = 1;      % scale, c, (almost like variance),
delta = 0;      % location, (almost like a mean),

A good summary is here:

http://math.bu.edu/people/mveillet/html/alphastablepub.html

that shows that the Gaussian, Cauchy and simple Levy distributions are all special cases of the Levy alpha stable distribution.

Also, that link shows a package that in theory could be used to calculate the parameters from the samples, or calculate the PDF, CDF from the parameters. Unfortunately, that Matlab code is buggy, so we couldn’t use it to estimate the parameters.

However, we later used it to generate the PDF and CDF from the parameters.

Luckily, we found another software package that seems to work for generating the 4 parameters from the samples:

http://www.mathworks.com/matlabcentral/fileexchange/34783-estimation-of-alpha-stable-distribution-parameters-using-a-quantile-method

1.

Let’s see how to code it in Matlab:

1.1. Generating the parameters:

```
aMean = mean(pChanges);
stDev = std(pChanges);  % it uses the n-1 as a denominator
params = alpha_loglik(pChanges);
% %g prints fractional values compactly (%d is meant for integers)
fprintf('The optimizing value of alpha is: %g\n', params.alph);
fprintf('The optimizing value of beta is:  %g\n', params.bet);
fprintf('The optimizing value of gamma is: %g\n', params.gamm);
fprintf('The optimizing value of delta is: %g\n', params.delt);
```

1.2 Plotting the PDF:

```
x = -0.59:0.01:0.39;
yGauss = gaussmf(x, [stDev aMean]);
plotGauss = plot(x, min(yGauss, 5.2));
set(plotGauss, 'Color', 'green', 'LineWidth', 2)
yAlphaLevy = stblpdf(x, params.alph, params.bet, params.gamm, params.delt, 1e-12);
plotLevy = plot(x, min(5.2, yAlphaLevy ./ max(yAlphaLevy)));  % normalize maximum to 1
set(plotLevy, 'Color', 'red', 'LineWidth', 2)
xlabel('Gaussian vs. Alpha Stable Levy')
```

1.3 Calculating the CDF:

```
cdfGauss = normcdf(x, aMean, stDev);
cdfLevy = stblcdf(x, params.alph, params.bet, params.gamm, params.delt, 1e-12);
```

2.

The question is: what are the estimated parameters of the Levy alpha stable distribution for the AAPL daily %change? Here they are:

params =
alph: 1.6228 % characteristic exponent, and describes the tail of the distribution
bet: 0.20171 % skewness, asymmetry
gamm: 0.016158 % scale, c, (almost like variance)
delt: 0.0015028  % location, (almost like a mean),

Alpha is 1.62. So it has a long tail. It is comparable to the cotton price alpha of 1.7 that was calculated by Mandelbrot.

Beta is 0.2, so there is some positive skew, asymmetry. No wonder, since Apple stock prices trended up mostly, and in general, as the stock market trends up, there are more up days than down days.

The Delta is 0.0015. That is not exactly an arithmetic mean, but you can interpret it as the daily %change being about +0.15% (a positive number). Again: it is a location parameter, not a sample mean! Alpha stable distributions with alpha below 2 have infinite variance, and for alpha of 1 or less (the Cauchy distribution has alpha = 1) even the mean is undefined. With our alpha of 1.62 a theoretical mean exists, but whether it coincides with delta depends on the parameterization used, and sample means converge slowly under such heavy tails. (We can safely talk about the median, though.)
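To get a feel for what these parameters imply, one can draw samples from the fitted distribution. The sketch below uses the Chambers-Mallows-Stuck method restricted to the symmetric case (beta = 0), which is a deliberate simplification: the fitted beta of 0.2 is ignored, so the left and right tails come out equal. The function names are hypothetical and this is not the Matlab package used in the post.

```python
import math
import random

def sample_symmetric_stable(alpha, gamma, delta, rng):
    """One draw from a symmetric (beta = 0) alpha-stable law, Chambers-Mallows-Stuck."""
    u = rng.uniform(-math.pi / 2, math.pi / 2)
    w = rng.expovariate(1.0)
    if alpha == 1.0:
        x = math.tan(u)                       # Cauchy special case
    else:
        x = (math.sin(alpha * u) / math.cos(u) ** (1.0 / alpha)
             * (math.cos((1.0 - alpha) * u) / w) ** ((1.0 - alpha) / alpha))
    return delta + gamma * x                  # apply scale and location

def fraction_below(threshold, alpha, gamma, delta, n=200_000, seed=3):
    """Share of simulated daily %changes that fall below the threshold."""
    rng = random.Random(seed)
    hits = sum(sample_symmetric_stable(alpha, gamma, delta, rng) < threshold
               for _ in range(n))
    return hits / n

# Fitted AAPL parameters from above, with the skew (beta) set to zero.
share = fraction_below(-0.10, alpha=1.6228, gamma=0.016158, delta=0.0015028)
print(f"simulated share of days with a drop beyond -10%: {share:.2%}")
```

With alpha = 2 the same sampler reduces to a Gaussian with variance 2·gamma², which is a handy sanity check.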

3.

Let’s see visually how the Levy alpha stable distribution fits to the real life samples. So, plot the PDF of the samples (blue bars), the Gaussian (green line) and the Levy alpha stable (red).

It is amazing how nicely the Levy version fits the samples. In contrast the Gaussian estimation looks clumsy.

It seems that in the center part of the plot, the Levy is under the Gaussian, however, we know that at the tails, the Levy should be above the Gaussian, since Levy correctly estimates the ‘fat tails’ of the distribution. So, let’s zoom to 0-0.2 range to see when the two distributions cross each other.

4.

As an illustration, let’s see the difference in probabilities at the tail when using Levy vs. Gaussian.

For example, let’s go back to the day, when AAPL dropped -52% on a single day.

The PDF at -0.52 is:

Gaussian: 1E-60

Levy: 0.0016 = 1.6E-3

That is quite a difference.

Note, it is the PDF! (not the CDF), so don’t use it for calculating chances. It only illustrates the difference between the two, and that the Gaussian PDF is so small that no integration of those small values can result in a significant probability (CDF) at that level.

5.

We have to confess that in the previous post, we used the PDF for probabilities calculation. That was wrong, but after recalculating those numbers, the main message is still the same. We partially amend that in this post. Now, we correctly use the CDF for probability calculation.

• p(-10% drop):
Gauss: 0.04%; every 2500 trading days, every 10 years
Levy: 0.7%; every 140 trading days, about 2 times per year. Fundamentally, that is possible, because there are 4 earnings dates per year.
• p(-20% drop):
Gauss: 1.3e-11; about once in every 1e+11 days. As Earth is 10^12 days old, it can happen 10 times in the lifetime of the Earth.
Levy: 0.2%; every 500 trading days, every 2 years
• p(-52% drop):
Gauss: 5.6e-67; the waiting time is longer than the lifetime of the known Universe
Levy: 0.045%; every 2200 trading days, every 9 years
• p(+33% gain):
Gauss: 1.07e-26; about once in every 1e+26 days
Levy: 0.15%; every 666 trading days, every 3 years. Maybe that is an exaggeration.

We let the reader decide which mathematical model (Gaussian or alpha stable Levy) fits the real life data better.
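The Gaussian side of this comparison is easy to reproduce. The sketch below evaluates the normal CDF with the rounded AAPL mean (0.12%) and stDev (3.02%) from the first post of this case study; because the inputs are rounded, the results should be close to, but not identical with, the Gauss figures listed above. The function name is a hypothetical helper.

```python
import math

def gaussian_tail_below(x, mu, sigma):
    """P(X <= x) for a normal distribution; erfc stays accurate in the far tail."""
    return 0.5 * math.erfc((mu - x) / (sigma * math.sqrt(2.0)))

MU, SIGMA = 0.0012, 0.0302   # rounded AAPL daily mean and stDev (assumed inputs)

for drop in (-0.10, -0.20, -0.52):
    p = gaussian_tail_below(drop, MU, SIGMA)
    print(f"{drop:+.0%}: p = {p:.2e}, about once every {1.0 / p:.1e} trading days")
```

Using `erfc` rather than `1 + erf` matters here: at 17 standard deviations the latter rounds to exactly zero in double precision.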

Just for curiosity, according to Levy, every day

– there is 0.74% chance of a -10% drop

– there is 1.1% chance of a 10% gain (a strange asymmetry). One would expect the chance of the same percent gain to be smaller, because drops are more violent.

That is true, but in general AAPL trended up, therefore the whole distribution is skewed to the right: more samples show gains than losses. That is the reason.

Obviously, if a stock goes up a little on 99 out of every 100 days, the distribution is skewed to the right.

The same thing in other words:

On every single day:

– there is 1% chance of a -8.5% drop

– there is 1% chance of a 11% gain                            // there are more gains than drops

(1% chance: realistically happens every 100 trading days = 5 months)

It means that any good Risk Management strategy should consider that

– a -10% drop can occur twice per year (the Gaussian model thinks it happens every 10 years),

– and a -20% drop can occur every 2 years (the Gaussian model thinks it is impossible).

Conclusion

This post tries to be as eye-opening about AAPL price changes as Mandelbrot’s book ‘The (Mis)Behaviour of Markets’ is about many real life events.

http://www.amazon.co.uk/The-Mis-Behaviour-Markets-Fractal/dp/1846682622/ref=sr_1_1?ie=UTF8&qid=1342045407&sr=8-1

We showed how useless Gaussian based risk estimations and Gaussian based probability and likelihood calculations are for real life stock price estimations (Apple). A much better estimation is based on the Levy alpha stable distribution.

### Do stock prices live in Mediocristan?: An Apple case study.

08Jul12

In his famous book The Black Swan, Nassim Taleb introduced the concepts of Extremistan and Mediocristan (as two countries). He uses them as guides to define how predictable the environment under study is. Mediocristan environments can safely use the Gaussian distribution. In Extremistan environments, a Gaussian distribution is used at one’s peril. There are big fat-tailed distributions there.

In this case study, we looked at a specific stock, Apple (ticker: AAPL).

See the historical Apple price chart here:

We took the daily historical adjusted close prices, and then we calculated the daily %changes from it.

Let’s see how the distribution of the daily price %changes fits the Gaussian curve.

Since the Apple IPO, we have 7000 days of data, which is 26 years.

The MATLAB code is not too difficult:

pChanges = closePrices(2:end) ./ closePrices(1:end-1) - 1;
aMean = mean(pChanges);
stDev = std(pChanges);  % it uses the n-1 as a denominator
figure;
hold on;  % plot 2 time series on each other
[nInBins, xout] = hist(pChanges, 600);
nInBins = nInBins ./ max(nInBins) .* 2.0 ;      % convert the max to 2
bar(xout, nInBins);
x=-0.59:0.01:0.39;
y=gaussmf(x,[stDev aMean]);   % generate Gaussian
plot(x,y)

The produced chart is here:

What can we observe?

The mean %change is 0.12%. The stDev is 3.02%.

That looks like quite a large standard deviation. It would mean that

– the price of AAPL changes by more than 3% about 31% of the time (every 3rd day, ZScore = 1), and

– the price of AAPL changes by more than 6% about 5% of the time (every 20th day, ZScore = 2).
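These two frequencies follow directly from the Gaussian model and can be checked in a couple of lines. The sketch below computes the two-sided probability of a move larger than k standard deviations; the function name is a hypothetical helper.

```python
import math

def prob_outside(k_sigma):
    """Two-sided Gaussian probability of a move larger than k standard deviations."""
    return math.erfc(k_sigma / math.sqrt(2.0))

for k in (1, 2):
    p = prob_outside(k)
    print(f"|ZScore| > {k}: {p:.1%} of days, about one day in {1 / p:.0f}")
```

With a stDev of about 3%, ZScore = 1 corresponds to a 3% move and ZScore = 2 to a 6% move.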

So, someone can argue that the stDev number: 3% shows that it is very volatile.

However, even this seemingly high volatility model cannot explain the AAPL real life price behaviour over the years.

Specifically, it cannot explain a -52% drop in a single day for example.

Let’s see some historical events in the Apple stock price:

1.

Worst day: 2000-09-29:  -52% single day loss.

Apple had a grim earnings report on that day and it triggered many downgrades.

It was a brutal day for Apple:

Shares of the Cupertino, Calif.-based company fell \$27.75, or nearly 52 percent, to \$25.75. Volume topped 132 million shares, more than 26 times the stock’s average daily volume of about 5 million shares. Analysts at nearly a dozen financial institutions downgraded Apple and penned scathing reports on the company.

The generated Gaussian function (that fits to that mean and stDev) says that the probability of this is

P(-52% daily loss) = 2.4 * 10^(-65). (In a scientific notation it is: 2.4e-65).

It is a very, very small value.

On average, this loss should occur once every 1/(2.4*10^(-65)) days. Let’s say it realistically occurs every 10^65 days.

Just to illustrate how big this value is:

How many days old is the earth?

Earth is 4.5B years old that is 4,500,000,000 x 365 days = 4.5*10^9*365= 1.6*10^12 days.

So, Earth is about 10^12 days old, and the event that Apple stock price drops -52% should occur every 10^65 days.

It shouldn’t have occurred in the lifetime of the Earth!

Do you think there is a problem with the Gaussian mathematical model to describe financial data, or do you think the Gaussian function properly models real life events?

2.

Second worst day: 1987-10-19, -25% single day loss.

This was the famous Black Monday (1987) day when the Dow Jones dropped -22% on that single day. In itself, it was a Black Swan event.

The generated Gaussian function (that fits to that mean and stDev) says that the probability of this is

P(-25% daily loss) = 9.7 * 10^(-16). (in a scientific notation it is: 9.7e-16).

On average, this loss should occur once every 1/(9.7*10^(-16)) days, that is, roughly every 10^15 days.

(Again: Earth is 10^12 days old).

3.

Best day: 1997-08-06, 33% single day gain.

The event for the day was the following:

1997: Microsoft rescues one-time and future nemesis Apple with a \$150 million investment that breathes new life into a struggling Silicon Alley icon

The generated Gaussian function (that fits to that mean and stDev) says that the probability of this is

P(33% daily gain) = 1.9 * 10^(-26). (in a scientific notation it is: 1.9e-26).

On average, this gain should occur once every 1/(1.9*10^(-26)) days. Let’s say it realistically occurs every 10^26 days.

(Again: Earth is 10^12 days old).

4.

Second Best day: 1997-12-31, 24% single day gain.

The generated Gaussian function (that fits to that mean and stDev) says that the probability of this is

P(24% daily gain) = 2.7 * 10^(-14). (in a scientific notation it is: 2.7e-14).

On average, this gain should occur once every 1/(2.7*10^(-14)) days. Let’s say it realistically occurs every 10^14 days.

(Again: Earth is 10^12 days old).

In other words, even if Earth’s lifetime were 100 times longer than it is, this event should occur only once, on a single day.

Conclusion:

After this data, we contend that the price time series of stocks don’t fit the Gaussian model.

Financial time series don’t belong to the world of Mediocristan. Unfortunately, the general mathematical models of risk that are used by banks, hedge funds or regulators are based on the Gaussian distribution. We conclude that real life price series don’t behave according to that mathematical model.

We would urge the investigation of other risk models: Lévy distributions, power laws or Mandelbrot’s fractals, which we reckon would better fit real life data.

### Playing the equity curve: when it doesn’t help too much

04Jun12

I continue my developer diary with a post that illustrates that sometimes even a seemingly sensible Wall Street idea doesn’t help. One important piece of investment advice is to contain the losses. One way of doing this is the well-known stop loss (fixed percent or trailing stop loss). The problem with the stop loss technique is that it tells you when to exit the position, but it doesn’t tell you when to enter the position again once it was stopped out. (Wait 2 months or what?).

There is one idea that helps similarly to a stop loss, but it also tells the trader exactly when to enter the position again. It is a trailing indicator called ‘playing the equity curve’ (it has various other names in other terminologies). The ‘equity curve’ is the Portfolio Value curve of the strategy. It can be any strategy (simple or complex, it doesn’t matter).

The basic idea is to play the strategy if its equity curve is above its 200-day moving average, and go to cash or play the inverse strategy otherwise.

The method follows not one, but three portfolio value (PV) charts.

– Original PV

– EMA PV (Exponential Moving Average of the original PV)

– Played PV (the one that is played in real life)

It is essentially a trend following method. When our strategy has strength, we play it; when it shows weakness, we stop it out. All with a lagging indicator.

It only has 1 additional parameter: the lookback days of the EMA. (SMA can be used too).
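The three curves can be sketched in a few lines. This is only a minimal illustration, not the code behind the charts in this post: the function names, the `inverse_below` flag, and the choice to decide today’s position from yesterday’s values (to avoid lookahead) are my assumptions. The inverse strategy is simplified to the negated daily return, ignoring costs.

```python
def ema(values, lookback):
    """Exponential moving average with the common alpha = 2 / (lookback + 1)."""
    alpha = 2.0 / (lookback + 1)
    out = [values[0]]
    for v in values[1:]:
        out.append(alpha * v + (1 - alpha) * out[-1])
    return out


def play_equity_curve(original_pv, lookback, inverse_below=False):
    """Return (EMA PV, Played PV) for a given Original PV series.

    Above the EMA we play the strategy; below it we sit in cash, or play
    the inverse strategy when inverse_below is True.
    """
    ema_pv = ema(original_pv, lookback)
    played_pv = [original_pv[0]]
    for t in range(1, len(original_pv)):
        daily_return = original_pv[t] / original_pv[t - 1] - 1.0
        # Decide using yesterday's values only, so there is no lookahead.
        if original_pv[t - 1] >= ema_pv[t - 1]:
            position = 1.0
        elif inverse_below:
            position = -1.0
        else:
            position = 0.0
        played_pv.append(played_pv[-1] * (1.0 + position * daily_return))
    return ema_pv, played_pv
```

Calling `play_equity_curve(pv, 75)` gives the cash variant, and `play_equity_curve(pv, 75, inverse_below=True)` the inverse variant.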

Here is the original equity curve of a strategy over 8 years. It is a special volatility strategy, but in the context of this blog post, it doesn’t matter.

The human eye can ‘clearly’ recognize some patterns: the strategy worked in the first 2.5 years, then it stopped working for 3 years, then it worked again for 2.5 years.

We see 3 market regimes accordingly. In regime 1 and regime 3, we should play the strategy, and we should sidestep regime 2.

The ‘playing the equity curve’ technique will be a great help. Won’t it?

Version 1:

Be in Cash on the downside.

Let’s see what happens with EMA lookbacks of 50, 75 and 250 days, when we are in cash whenever the equity curve is under its EMA.


Did it help with the drawdown (DD)? Yes.

Did it help with the profit? Not really.

Just imagine that you started this strategy and in half a year it doubled your investment, then it didn’t give any profit for 6 years. Would you play this strategy? No.

The problem is that regime 2 is too ‘neutral’ a territory for our strategy. In regime 2 our strategy was not a winner, but not a loser either. The original curve flatlined, and the EMA curve fitted onto it. Using a 75-day EMA parameter, the EMA curve reached our original equity curve in 75 days. In the next 4 years, we treaded water.

Version 2:

Let’s see the same EMA parameters, but instead of being in Cash under the EMA, we play the inverse strategy.

We have huge drawdowns.

In theory, it seems that a long term EMA like 250 helps a little more, because there are fewer whipsaws. That is because the smooth EMA250 line only slowly reaches the equity curve of the original strategy in the problematic regime 2.

There are 2 problems with the feeling that the parameter 250 is the best and that we should use it in the future. One is that selecting the historically optimal parameter is a kind of parameter overtuning. It may be only by random chance that this parameter turned out better.

Another one is that there is no guarantee that future bad (neutral) regimes will last 3-4 years (as regime 2 did), therefore it is unlikely that the EMA250 will be the best parameter in the future.

Conclusion:

I don’t have the magic solution right now.

The ‘playing the equity curve’ technique helped a little on the profit, a little on the volatility, but it was far from the success I expected.

One of the main problems is that investors would very likely stop the strategy after 4 years of treading water. I expected more from this technique. Maybe I expected too much.

### Comparing Linear Regression vs. Classification, VXX, 2D

01Nov11

As we continue the previous posts, we have at least one parameter: the number of lookback days. Doing sensitivity analysis on this parameter, we hope to compare regression and classification methods.

In further detail, we are going to compare

1.

Linear regression

– based on the normal equation, deterministic evaluation

– it is not iterative, so it instantly finds the exact optimum; so the iteration number is not a parameter

– requires no normalization of inputs, outputs (so there is no Normalization as a parameter)

– has only 1 parameter: lookback days

2.

Logistic Regression (the name shouldn’t mislead you, it is a classification method), binary

– 2 categories: Buy or Sell (these categories are defined by the %change threshold of 0%)

– gradient descent iteration parameter: 400 (probably, it is enough, because in linear tasks, the Cost function is convex, so there is only one, global minimum)

– in theory, normalization is a parameter, because we do gradient descent iteration. Normalization would help the gradient descent to converge faster.

However, in this case the Cost function is very simple and convex, and the input range doesn’t differ too much from the ideal -1..1 range: our range is -0.2..0.2 (that is, -20% to +20%). We could normalize by a x5 multiplier, but that wouldn’t make the gradient descent much faster. So, we accept that we lose a little gradient descent speed. This is not significant.

3.

Logistic Regression (classification), 3 categories,

– 3 categories: Buy, Cash, Sell signals;

– these 3 categories are defined by the %thresholds of: -1.5%.. +1.5% (so, if Y output is in that range, we regard Y output as Cash)

– gradient descent iteration parameter: 400 (as before)

– we chose not to normalize the input, as in the previous case. That is also a parameter: “Normalization: OFF” from a possible basket of normalizations { mean normalization, mean and min-max range normalization, mean and std normalization, only std normalization, etc. }
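To make the three setups concrete, here is a minimal sketch under stated assumptions. It uses a single hypothetical input feature (the real inputs are not described in this post), solves the normal equation in its closed form for one feature plus an intercept, and shows how the Buy/Sell and Buy/Cash/Sell labels could be derived from the %change thresholds. The function names, the learning rate and the treatment of an exactly 0% change as Sell are my assumptions; only the 0% and ±1.5% thresholds and the 400 iterations come from the list above.

```python
import math


def fit_linear_closed_form(xs, ys):
    """Least-squares slope and intercept for one feature.

    This is the normal-equation solution for a single feature plus an
    intercept: the exact optimum in one step, with no iteration count.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def binary_label(pct_change):
    """Buy (1) or Sell (0), split at the 0% threshold."""
    return 1 if pct_change > 0.0 else 0


def three_way_label(pct_change, threshold=0.015):
    """Buy (+1), Cash (0) or Sell (-1) using the -1.5%..+1.5% band."""
    if pct_change > threshold:
        return 1
    if pct_change < -threshold:
        return -1
    return 0


def fit_logistic_gd(xs, labels, iterations=400, learning_rate=1.0):
    """Binary logistic regression on one unnormalized feature.

    Plain batch gradient descent; learning_rate is an assumed value.
    """
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(iterations):
        grad_w = grad_b = 0.0
        for x, y in zip(xs, labels):
            p = 1.0 / (1.0 + math.exp(-(w * x + b)))
            grad_w += (p - y) * x / n
            grad_b += (p - y) / n
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b
```

Note how the labels throw away the magnitude of the %change, while the closed-form regression uses it in full.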

We do sensitivity analysis for lookback days:


What we can observe in general is that the classification solutions achieve a lower final PV (Portfolio Value).

Why is that?

The reason lies in how the methods handle the training samples.

For Classification, all training samples are equal, irrespective of their Y magnitude, because it takes only the Sign() of the %change (+1, 0, -1). If we have a VIX spike of +40% on a day, that is treated equally to another day that has a +2% VIX change (in the classification case). This has some advantages and disadvantages. In particular, classification is less sensitive to outliers. However, it turns out that exactly these outliers are very important in our problem. When VIX increased by 50% in a single day on 1st August 2011, that huge increase instantly made the regression-based (non-classification) solution positively biased. Just one such outlier could have a great effect for the following weeks and months. Afterward, all the predictions were Upside biased: it was more likely to forecast Up %changes than Down %changes.

However, for classification, this huge 50% %change was only another sample with +1 output value. It took a long time until the Classification methods realized that we were in a new regime: a regime where Up days are more likely than Down days.

In that sense, Regression is more agile: it adapts more quickly than Classification to a regime change that is signaled by an outlier. And it turns out that in this problem case, that is better.

In another problem case (forecasting the price of houses) this outlier sensitivity would be counterproductive.
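A toy example shows the difference. This is only an illustration with made-up numbers, and it simplifies both methods to their crudest form: the ‘regression view’ is the plain average of the %changes (an intercept-only least-squares fit), and the ‘classification view’ is the average of their signs. Neither is the actual predictor used in this post.

```python
def mean(values):
    return sum(values) / len(values)


def sign(x):
    """+1, 0 or -1, like the Sign() of the %change."""
    return (x > 0) - (x < 0)


def regression_view(window):
    """Intercept-only least squares: the average %change."""
    return mean(window)


def classification_view(window):
    """What a sign-based learner sees: the average of the signs."""
    return mean([sign(r) for r in window])


# Hypothetical lookback window: 100 calm days alternating +1% / -1%.
calm = [0.01 if i % 2 == 0 else -0.01 for i in range(100)]

# The same window followed by either a +2% day or a +50% spike.
with_small_up = calm + [0.02]
with_spike = calm + [0.50]

for name, window in [("+2% day", with_small_up), ("+50% spike", with_spike)]:
    print(name,
          "regression view:", regression_view(window),
          "classification view:", classification_view(window))
```

The classification view is identical in both cases, while the regression view becomes 25 times more Up biased after the spike.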

This is very well illustrated in the Ultimate predictor PV chart.

The Ultimate predictor aggregates the different lookback predictors from 75 to 110 lookback days and does a majority vote. This is its PV (Portfolio Value) chart:

The 2 occasions when the Linear Regression outperformed the Classification are when the low VIX regime changed to a high VIX regime: in summer 2010 and August 2011. In both cases, regression was quicker to adapt.
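The majority vote of the Ultimate predictor can be sketched like this. It is a minimal illustration: `predict_for_lookback` is a hypothetical callback that returns +1 (Up), -1 (Down) or 0 (Cash) for one lookback value, and using every integer lookback from 75 to 110 as well as resolving ties to 0 are my assumptions.

```python
def majority_vote(signals):
    """Combine +1 / 0 / -1 signals; the side with more votes wins, ties give 0."""
    total = sum(signals)
    return (total > 0) - (total < 0)


def ultimate_signal(predict_for_lookback, lookbacks=range(75, 111)):
    """Ask one predictor per lookback (75..110 days) and take the majority."""
    return majority_vote(predict_for_lookback(lb) for lb in lookbacks)
```

For example, if the predictors with a lookback under 100 days say Up and the rest say Down, the vote is 25 against 11 and the aggregated signal is Up.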

The 3-category classifier has the lowest drawdown in the PV chart, but the lowest profit too. This is a kind of trade-off. We can go to Cash sometimes. This obviously decreases the drawdown, but as we don’t participate in the market in these less certain times, we leave profit on the table. However, that can be good for a conservative, non-aggressive version of the strategy.

Observe also in the Sensitivity Analysis chart that the 3-category classifier achieves the lowest PV. That is somewhat expected, because it is in Cash about 30% of the time. It probably has less drawdown too.

It is unexplained, however, why at the far end of the sensitivity chart (more than 150 lookback days) the 2-bin classifier performs so poorly (it goes back to the PV = 1 line, having no profit in 2 years), while the 3-bin classifier (that is in cash 30% of the time) has PV = 2 in this region of the sensitivity analysis chart.

Conclusion:

We compared regression and classification. In our prediction problem, regression was better, because it doesn’t suppress the effect of outliers.

The binary and the 3 categories classifier perform similarly to each other. That means their PVs are equal (Ultimate version), albeit the 3 bins version has lower drawdown, suitable for conservative implementation.