Abandoning MATLAB for greater good



Happy new year. This is our first post in 2011.

A serious obstacle in our research last year was backtesting time. All our code was written in Matlab. Backtesting just 10 years takes approximately 10 minutes in Matlab with only 1 random sample. Instead of 1 random forecast, however, we need about 5 or 10 for a reliable projection, so every single backtest run takes about 100 minutes (with 10 samples). When we want to know how a parameter change affects performance, we test 10 different parameter values, which means 10 different backtests, or 10 times 100 minutes. That is 1000 minutes, about 16.7 hours, to fine-tune a single parameter. It took me days and nights to run the backtests, and usually I couldn’t backtest 20 years, only 5. Not to mention that I often had to wait 1 or 2 days to evaluate the result of a new idea. That is simply annoying.
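The arithmetic above can be checked with a few lines of Python. This is only a sketch of the time budget: the function name `backtest_minutes` and its parameters are made up for illustration, and the inputs are the round figures quoted in this post (10 minutes per sample, 10 samples, 10 parameter values).

```python
def backtest_minutes(minutes_per_sample, samples, parameter_values):
    """Total wall-clock minutes for a sweep, assuming runs execute one after another."""
    return minutes_per_sample * samples * parameter_values


# One backtest run: 10 minutes per random sample, 10 samples.
single_run = backtest_minutes(10, samples=10, parameter_values=1)

# Fine-tuning one parameter: the same run repeated for 10 parameter values.
sweep = backtest_minutes(10, samples=10, parameter_values=10)

print(single_run)            # 100 minutes
print(round(sweep / 60, 1))  # 16.7 hours
```

The cost grows multiplicatively, so every extra sample or parameter value adds a full 10-minute pass over the data.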

I stumbled upon a promising machine learning framework called Encog that has native Java and C# implementations.

This post is about comparing the speed of Matlab and Encog.

I believe Matlab is one of the best platforms for mathematicians. Compared to other packages, Matlab is quite fast (see here). It is at least 2 times faster than Octave, the open-source, Matlab-compatible alternative.

Matlab speaks the language of mathematicians. However, it is an interpreted language, and it is inherently slow compared to compiled or JIT-compiled languages such as C++, C# or Java.
Advantages of Matlab:
– very well written. It is a paid product, so the authors have some responsibility to keep their code fast and bug free.
– it contains many useful math functions that are ready to use, tested and reliable
– it has out-of-the-box charting capabilities
– the final source code is very concise and easier to read
Disadvantages of Matlab:
– a significant drawback of using Matlab is the execution speed
– debugging (watching the variables in real time) is not very sophisticated

However, I am a natural-born programmer, so using C# instead of the Matlab scripting language shouldn’t worry me, even though expressing the same idea takes many more lines of source code in C# than in Matlab.

Here are the times required for a backtest that used only 1 random sample per day, covering 1987 to 2011 (about 23 years).
The lookback period is 200 days. MaxEpoch is 5 in Matlab and 20 in C#.
Time measurements (for the last 23 years, which equals 5700 forecasts):
Matlab: 35 minutes = 2100 seconds
Encog (single thread): 12 seconds (in theory, the training algorithm is multithreaded)
Encog (days run in parallel on a 4-core PC): 6 seconds

Note that the Matlab version also calculates CAGR and TR and does some minor extra calculations (outlier elimination) that are not implemented in the C# version.
The C# version is really very simple. However, I don’t think this seriously distorts the result.
The timing measurements say that Matlab neural network training runs about 175 times slower than single-threaded Encog (2100 seconds vs. 12 seconds), so roughly 200 times slower.
Let’s conservatively assume a 100x speedup if the Matlab and Encog versions calculated exactly the same things. That would mean a backtest that previously took 1 day in Matlab would run in only about 15 minutes in C#. What a relief.
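For reference, here is the same calculation as a small Python sketch. It uses only the timings listed above. The helper names `speedup` and `projected_minutes` are made up for illustration, and the 100x factor is the cautious assumption from the text, not a measured value.

```python
MATLAB_SECONDS = 35 * 60        # 35 minutes, as measured above
ENCOG_SINGLE_SECONDS = 12       # Encog, single thread
ENCOG_PARALLEL_SECONDS = 6      # Encog, days run in parallel on 4 cores


def speedup(slow_seconds, fast_seconds):
    """How many times faster the fast run is than the slow run."""
    return slow_seconds / fast_seconds


def projected_minutes(matlab_minutes, assumed_speedup):
    """Estimated runtime after porting, for an assumed speedup factor."""
    return matlab_minutes / assumed_speedup


single = speedup(MATLAB_SECONDS, ENCOG_SINGLE_SECONDS)
parallel = speedup(MATLAB_SECONDS, ENCOG_PARALLEL_SECONDS)
one_day_backtest = projected_minutes(24 * 60, assumed_speedup=100)

print(single)            # 175.0
print(parallel)          # 350.0
print(one_day_backtest)  # 14.4 minutes
```

Keep in mind that the two versions do not do identical work (different MaxEpoch, no outlier elimination in C#), so these ratios are only a rough guide.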

Note also the possibility of running Encog in the Rackspace cloud.
A Rackspace ‘small machine’ costs only 1 cent per hour to rent. It is very cheap to run long backtests in the cloud, and definitely worth considering in the future.

In speed, Encog easily beats Matlab: it is about 100x to 200x faster.
But this post is only about speed. Training efficiency and correctness are another issue. We haven’t yet checked the CAGR, TR or other financial performance of the Encog predictions. Our feeling is that Matlab is much better written than the open-source Encog, and so may be more correct in training the neural network. We have to run some tests to make sure that Encog makes forecasts as good as Matlab’s.


2 Responses to “Abandoning MATLAB for greater good”

  1. tromso

    Hi, I’m curious what kind of data you use for backtesting in MATLAB. We’re using intraday data with every available tick. This results in ~60 million price points in the smallest month, which we have to go through sequentially, and it’s nowhere near 1 year in 10 minutes.
