Encog quality

06Mar11

The previous post compared the speed of Encog and the Matlab Neural Network (NN) Toolbox. This post compares Encog and Matlab qualitatively: can Encog predict with the same power that Matlab showed for us in the past?

Before answering that question, we amend our previous post. There we stated that a test in Matlab ran for 35 minutes. That was true, but we also mentioned that the Matlab program did a lot of extra work, such as outlier elimination and handling the VIX and EUR data. For this post, we weeded out the unnecessary parts of the Matlab code, so the Matlab and Encog programs are now functionally identical.
For the speed comparison, we ran exactly the same task in Matlab and Encog this time: the 2bins input, 2bins output case. This is a different task than the one tested previously; it is much simpler and quicker to execute, which is why even the current Encog measurements differ from the previous ones. The comparison is more faithful now. For this simpler prediction task, using 1 random sample in the ensemble:
– Matlab: 11 minutes (instead of the 35 minutes stated earlier) = 660 seconds (it uses only 1 core, as verified in Task Manager)
– Encog (single thread): 8 seconds (in theory, the training algorithm is multithreaded)
– Encog (days run in parallel on a 4-core PC): 3 seconds

In practice, we always run the Encog backtests in parallel mode, so the improvement from 660 seconds to 3 seconds is still a roughly 200-fold speedup. Thank you, Encog!
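
For the curious, this is roughly how the day-parallel mode can be organized in Java. It is only a minimal sketch: trainAndPredictOneDay() is a hypothetical stand-in for the real Encog train-and-forecast step for a single test day.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelBacktest {

    // Hypothetical helper: trains a network on the 200-day lookback window
    // ending at dayIndex and returns the forecast for the next day.
    static double trainAndPredictOneDay(int dayIndex) {
        return 0.0; // placeholder for the real Encog train + compute logic
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(4); // 4-core PC
        List<Future<Double>> forecasts = new ArrayList<>();

        // Each test day's train-and-predict task is independent of the
        // others, so the days can be submitted to the pool in parallel.
        for (int day = 200; day < 6000; day++) {
            final int d = day;
            Callable<Double> task = () -> trainAndPredictOneDay(d);
            forecasts.add(pool.submit(task));
        }

        // Collect the forecasts in day order and run the portfolio simulation.
        for (Future<Double> f : forecasts) {
            double forecast = f.get();
            // ... update the portfolio value with this day's forecast ...
        }
        pool.shutdown();
    }
}
```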

Let’s see the different strategies we ran over the last 23 years, with a lookback of 200 days.
– The Buy&Hold strategy forecasts an up day for all days.
– The Deterministic MR (Mean Reversion) strategy forecasts an up day if the current day is a down day (and vice versa).
– The Deterministic FT (Follow Through) strategy forecasts a down day if the current day is a down day (and vice versa).
– The Naive Learner uses the last 200 days as input. It calculates the distribution of the input by binning it into 2 or 4 bins, and gives back the average outcome observed in that bin (a minimal sketch follows this list).
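
Here is a minimal sketch of the 2bins naive learner as described above; the helper names are ours, not taken from the actual backtester.

```java
public class NaiveBinLearner {
    // changes: daily % changes of the lookback window (e.g. the last 200 days).
    // 2bins case: bin each day by its sign, average the *next* day's change
    // per bin, then predict the average of the bin the current day falls into.
    static double forecast(double[] changes, double todayChange) {
        double sumAfterUp = 0, sumAfterDown = 0;
        int numAfterUp = 0, numAfterDown = 0;
        for (int i = 0; i < changes.length - 1; i++) {
            if (changes[i] >= 0) { sumAfterUp += changes[i + 1]; numAfterUp++; }
            else                 { sumAfterDown += changes[i + 1]; numAfterDown++; }
        }
        if (todayChange >= 0)
            return numAfterUp > 0 ? sumAfterUp / numAfterUp : 0.0;
        return numAfterDown > 0 ? sumAfterDown / numAfterDown : 0.0;
    }
}
```
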
For Encog and Matlab, we tested 3 different cases:
– the unnormalized input/output case
– 2bins: the Sign() discrete inputs; this basically converts the input and output to -1 or +1, which is a kind of basic ‘normalization’
– the normalization of the input and output to the [-1..+1] continuous range
(Sometimes we boosted the normalization with a constant boost value; with boost = 100, the range becomes [-100..+100]. See the sketch below.)
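
A minimal sketch of this normalization (the max-abs scaling is assumed here just for illustration):

```java
// Normalize a raw input window to [-1..+1], then apply a boost factor.
// With boost = 100 the resulting range becomes [-100..+100].
static double[] normalizeWithBoost(double[] input, double boost) {
    double maxAbs = 1e-9; // guards against division by zero
    for (double v : input) maxAbs = Math.max(maxAbs, Math.abs(v));
    double[] out = new double[input.length];
    for (int i = 0; i < input.length; i++)
        out[i] = input[i] / maxAbs * boost; // assumed max-abs scaling
    return out;
}
```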

We show some portfolio value charts.
See the RUT Buy&Hold chart for the last 23-year period.

The deterministic MR chart:

Note the last period from 1998:

Here is the deterministic FT chart. Note the -90% drawdown:

Note the last period from 2010:

The naive learner, 2bins input case:

The naive learner, 4bins input case:

The Encog NN learner with 2bins input 2bins output:

The Encog NN learner with normalization boost 100:

And we show some performance numbers. The meaning of the columns:
- CAGR: Compound Annual Growth Rate
- TR: Total Return
- D_Stat: directional accuracy of the forecasts
You have to click the images to zoom in and see them properly.
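
For reference, TR and CAGR are related as follows; a quick helper of ours that reproduces the numbers quoted below:

```java
public class PerfStats {
    // CAGR from total return: TR = 8,246% means the capital grew to
    // 1 + 82.46 = 83.46x of the initial value.
    // cagr(8246, 23) is about 0.21, i.e. the ~21% CAGR quoted below.
    static double cagr(double totalReturnPct, double years) {
        double multiplier = 1.0 + totalReturnPct / 100.0;
        return Math.pow(multiplier, 1.0 / years) - 1.0;
    }
}
```
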
For the non-NN (Neural Network) strategies:

Matlab:

Encog:

During our quest we learnt some differences between Matlab and Encog:
- Encog doesn't auto-normalize the input or output, while Matlab does (that is why the Matlab prediction is quite good even without any normalization).
- There is no tansig() activation function in Encog (that is the default in Matlab), but there is a similar one, TANH. (Note from the Matlab documentation: ‘tansig() is mathematically equivalent to tanh(N). It differs in that it runs faster than the MATLAB® implementation of tanh, but the results can have very small numerical differences.’ Matlab computes tansig(n) as 2/(1 + e^(-2n)) - 1, which is algebraically tanh(n), so for us the two are the same.)
- There are different learning algorithms: Matlab uses trainlm (Levenberg-Marquardt) by default, and Encog uses resilient propagation (the ResilientPropagation class) by default. Encog doesn't have trainlm. (A training sketch follows list B below.)
- It seems that Encog gives back the correct value of the estimation. For example, if directions (+1, -1) were learnt, Encog gives +1 or -1 (or very close to that) as a prediction, while Matlab gives the averages, like 0.04 and -0.04.
- Encog's default 3-layer backprop network has extra bias neurons; it differs from Matlab. More specifically:
A.
The newff() network in Matlab:
1. The input is not a layer: no activation function, no bias. The network has only 2 layers (the middle and the output).
2. The middle layer has a bias and a tansig transfer function.
3. The output is a layer, having a bias (we checked), but with linear activation in the default case; in the Matlab book there are examples with tansig output layers too.

B.
The default FeedForwardPattern().Generate() in Encog gives:
1. All 3 layers have TanH() activation. We found this weird; only the middle (or maybe the output) layer should have an activation function.
2. The last layer has no bias, but the first and second do.
3. The middle layer has 2 biases (the biasWeight is 2-dimensional) in the 2-neuron case.
4. The biasWeights are initialized randomly. Correct; that is expected.
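
To make the structural difference concrete, here is how a Matlab-like newff() topology can be built manually in Encog instead of using FeedForwardPattern. This is a minimal sketch against the Encog 3-style API (our Encog version may differ slightly); note that in Encog a layer's bias flag adds a bias neuron feeding the next layer, which is why the output layer is created with bias = false:

```java
import org.encog.engine.network.activation.ActivationLinear;
import org.encog.engine.network.activation.ActivationTANH;
import org.encog.neural.networks.BasicNetwork;
import org.encog.neural.networks.layers.BasicLayer;

public class MatlabLikeNetwork {
    // Mimic Matlab's newff() defaults: pass-through input, tanh hidden
    // layer with a bias, linear output layer with a bias.
    static BasicNetwork build(int inputCount, int hiddenCount, int outputCount) {
        BasicNetwork network = new BasicNetwork();
        network.addLayer(new BasicLayer(null, true, inputCount));                     // input: no activation; its bias feeds the hidden layer
        network.addLayer(new BasicLayer(new ActivationTANH(), true, hiddenCount));    // hidden: tanh; its bias feeds the output layer
        network.addLayer(new BasicLayer(new ActivationLinear(), false, outputCount)); // output: linear, no trailing bias
        network.getStructure().finalizeStructure();
        network.reset(); // random weight/bias initialization, as noted in B.4
        return network;
    }
}
```

And the RPROP training loop referenced earlier, again only a sketch (the error threshold and epoch cap are arbitrary placeholders):

```java
import org.encog.ml.data.MLData;
import org.encog.ml.data.MLDataSet;
import org.encog.ml.data.basic.BasicMLData;
import org.encog.ml.data.basic.BasicMLDataSet;
import org.encog.neural.networks.BasicNetwork;
import org.encog.neural.networks.training.propagation.resilient.ResilientPropagation;

public class RpropExample {
    // Train with Encog's default resilient propagation (RPROP) trainer,
    // then query a single forecast.
    static double trainAndPredict(BasicNetwork network,
                                  double[][] input, double[][] ideal,
                                  double[] query) {
        MLDataSet trainingSet = new BasicMLDataSet(input, ideal);
        ResilientPropagation train = new ResilientPropagation(network, trainingSet);
        int epoch = 0;
        do {
            train.iteration(); // one RPROP pass over the training set
            epoch++;
        } while (train.getError() > 0.01 && epoch < 1000);
        train.finishTraining();

        MLData output = network.compute(new BasicMLData(query));
        return output.getData(0); // came back as ~+1 or ~-1 in our direction tests
    }
}
```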

Conclusions after studying the performance numbers:
– the Buy&Hold gives 430% profit in 23 years. That is about 7.96% CAGR; it multiplied your initial 1 dollar by about 5 in 23 years.
– the deterministic MR is a disaster; it worked only from 1998 onward.
– the deterministic FT gives 18% CAGR (TR = 3,657%), but the maximum DD (drawdown) of -90% makes this method a disaster too (it lost 90% from 1998).
– the 2bins-case naive learner is quite good: 29.5% CAGR (TR = 30,000%); it multiplied your initial 1 dollar by 300 in 23 years. It is also a learning algorithm, i.e. adaptive, so even without the power of NNs (neural networks) it is worth considering.
– the 4bins naive learner (TR = 20,000%) was worse than the 2bins naive learner (TR = 30,000%).
– the NN-based machine learning algorithm (TR = 45,000%) could beat the naive learner. That is a very good message for us: it is worth using the NN.

– the non-normalized, non-binned Encog NN is a disaster, because the input values are very small: a 0.1% change of the RUT induces an input of 0.001. These tiny values lead to numerical errors during training.
– the 2bins-case NN achieved on average about TR = 8,246% (it multiplied your initial 1 dollar by about 82). That is about 20.94% CAGR.
– however, the properly normalized and normalization-boosted Encog NN shines. It was even better than the Matlab NN. Note that the average TR = 45,000% (it multiplied your initial 1 dollar by 450). That is about 32% CAGR, with a directional accuracy of 56.5%. Over such a long period (23 years), this is our best prediction and best result so far.
– Compare it to the Buy&Hold approach: Buy&Hold multiplied the initial capital by about 5 in 23 years, while the Encog NN multiplied it by 450. That is roughly 90 times more.

– To refresh our minds: these charts show again why it is worth using an adaptive learning algorithm instead of a static one (like DV2, RSI2, overbought signals, etc.). The deterministic algorithms (daily MR, FT) cannot cope with changes in the world. Any fixed, static algorithm (like the daily FT) that was a winner before 1998 had a 12-year losing period after 1998. Something happened in 1998. We don't know what, but the financial world changed: daily follow-through was replaced by daily mean reversion. A rigid, static strategy would fail. In contrast, any learning algorithm (naive, NN, SVM, etc.) could adapt very well to the changed world and could be a winning strategy even after 1998. That is why we have to keep using machine learning: to adapt to the changing world.

– Note also that this input (currDayChange) alone is not sufficient for us. It shows 32% CAGR, which is very high, but too large a portion of the gains comes from 2008, a black-swan year. This is the year in which, because of the financial turmoil, any daily MR strategy beat the market. Because of this, we wouldn't use this input or this strategy alone; we would combine it with other inputs to obtain a more reliable (not necessarily more profitable) strategy.

The most important conclusion of this post:
Encog and Matlab behaved about the same. When one had predictive power, so did the other, and the magnitudes were similar (in the 2bins case, Matlab gave 8,842% TR on average, Encog gave 8,246% TR). Therefore, we contend that the prediction power of the Encog NN, its training algorithm, and the whole Encog framework is reliable. We will use Encog in the future. One warning only: be very careful and normalize the data. We learnt that small-range input is unacceptable and that Encog doesn't do automatic normalization.
