Input and output in log scale


Some articles do not feed the raw time series to the network; they first transform the input with a log function.
I haven’t seen much justification for it, but I thought I would try it myself.
How can I use this idea?
My input1 is ‘RUT/SMA(30) – 1’, that is, the relative difference between the RUT and its SMA(30).
So, for example, instead of using input1 = ‘RUT/SMA(30) – 1’, should I use
‘log(RUT/SMA(30) – 1)’?
Hardly. Note that input1 is in the range [-0.3..0.3] (that is, the RUT is within [-30%..+30%] of its SMA(30)).
The log of zero or of a negative number is undefined, so the transformed input would be invalid for every sample where the RUT is at or below its SMA(30).
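To make the problem concrete, here is a minimal Python sketch. The prices are made up (not real RUT data), the window is 3 instead of 30 to keep the example short, and the helper names `sma`, `pct_diff_from_sma` and `safe_log` are my own for illustration.

```python
import math

def sma(values, window):
    """Simple moving average; one value per full window."""
    return [sum(values[i - window + 1:i + 1]) / window
            for i in range(window - 1, len(values))]

def pct_diff_from_sma(prices, window):
    """price / SMA(window) - 1 for every day that has a full window."""
    averages = sma(prices, window)
    return [p / a - 1.0 for p, a in zip(prices[window - 1:], averages)]

def safe_log(x):
    """Return log(x), or None where the real logarithm is undefined (x <= 0)."""
    return math.log(x) if x > 0 else None

# Hypothetical prices, window of 3 instead of 30.
prices = [100.0, 102.0, 104.0, 97.0, 99.0]
input1 = pct_diff_from_sma(prices, 3)

# Every sample at or below the moving average has no logarithm.
logged = [safe_log(x) for x in input1]
print(input1)
print(logged)
```

Only the first sample (price above its average) gets a log value; the other two come back as None.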


I was scratching my head over how to use it, so I decided to investigate why researchers use it at all.
1. Output
General usage.
Note this quote from the article “Guidelines for Financial Forecasting with Neural Networks” from the National University of Singapore:

Select inputs from available information. Inputs and
targets also need to be carefully selected. Traditionally,
only changes are processed to predict targets as the
return or changes are the main concerns of fund
managers. Three types of changes have been used in
previous research: X_i – X_(i-1), log X_i – log X_(i-1), (X_i – X_(i-1))/X_(i-1)

As I already use the third one (the percentage change) as output, I don’t need the second one, the log version, for the output.
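The three change types from the quote can be sketched in Python like this. The function name `changes` and the sample series are made up for illustration.

```python
import math

def changes(series):
    """Return the three change types from the quote for a list of positive values."""
    pairs = list(zip(series, series[1:]))  # (previous, current)
    diff = [cur - prev for prev, cur in pairs]                          # X_i - X_(i-1)
    log_diff = [math.log(cur) - math.log(prev) for prev, cur in pairs]  # log X_i - log X_(i-1)
    pct = [(cur - prev) / prev for prev, cur in pairs]                  # (X_i - X_(i-1)) / X_(i-1)
    return diff, log_diff, pct

# Hypothetical series, not real index data.
series = [100.0, 101.0, 99.0]
diff, log_diff, pct = changes(series)
print(diff, log_diff, pct)
```

For small daily moves the log change and the percentage change are almost the same number, which is another reason why I don't expect much from switching between them.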

2. Inputs
Note this from the article “Artificial Neural Network Models for Forecasting Stock Price Index in the Bombay Stock Exchange”:

the error during training dropped by almost 60 per cent when
input ranges were narrowed down to 2,500–6,000 from the initial range of
0–10,000. Specifying an input range which is much larger than the possible
values of inputs has a huge penalty in terms of accuracy and reliability of
the network as gradient changes become much smaller due to the larger input
range and network performance becomes poorer in line with the increasing range

That makes sense. So using a log scale for the input is one way of narrowing the input range.

In intelligent NN frameworks (like Matlab), the range is narrowed automatically.
According to the documentation, the [Min..Max] range of the training set is mapped to a fixed small range (with Matlab's mapminmax the default target is [-1..1]).
It is a pity if the authors of the Bombay article had software that doesn’t do this automatically.
But it is trivial to do by hand.
And for me, Matlab does it automatically.
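For reference, this kind of range mapping is only a few lines of code. Below is a minimal sketch in Python, not Matlab's actual implementation; the function name `min_max_scale` and its parameters are my own, and the sample values just reuse the 2,500–6,000 range from the Bombay quote.

```python
def min_max_scale(values, lo, hi):
    """Linearly map values from [min(values)..max(values)] to [lo..hi]."""
    vmin, vmax = min(values), max(values)
    if vmax == vmin:
        raise ValueError("cannot scale a constant series")
    factor = (hi - lo) / (vmax - vmin)
    return [lo + (v - vmin) * factor for v in values]

# Hypothetical index levels spanning the range from the quote.
levels = [2500.0, 4250.0, 6000.0]
print(min_max_scale(levels, 0.0, 1.0))   # target range [0..1]
print(min_max_scale(levels, -1.0, 1.0))  # target range [-1..1]
```

In practice the min and max have to come from the training set only and then be reused for the test samples; the sketch leaves that out.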

If the input has a non-uniform distribution, for example:
there are many input values in the [1..10] range, very few in the [10..100] range, and a couple of inputs
in the [100..1M] range, then the samples are spread very unevenly over the input domain.
In that case it makes sense to use a log scale, so that the very high inputs (1M) are converted to much lower values.
This can really improve the learning precision.
However, in my case, the typical input values are in the range [-0.3..0.3].
On the negative part of this range, the log() function is undefined.
I can shift this range to [0.7..1.3] by adding 1 (that is, use log(RUT/SMA(30))), and log() is defined there,
but on this range log() is very nearly a linear function.
Transforming something with a nearly linear function makes no sense: it will not improve the distribution of the input samples (nor the range).
So we have a special case, and therefore there is no point in converting our inputs or outputs with the log() function.
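One rough way to check the "nearly linear" claim is to measure how well a straight line describes log(x) on each range. The sketch below uses the Pearson correlation between x and log(x) as that measure; the helper `pearson` and the sample grids are my own choices for illustration, not something from the cited articles.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# My case: inputs shifted to [0.7..1.3], sampled in steps of 0.05.
narrow = [0.70 + 0.05 * i for i in range(13)]
# Skewed case: inputs spanning [1..1M], one sample per decade.
wide = [10.0 ** k for k in range(7)]

r_narrow = pearson(narrow, [math.log(x) for x in narrow])
r_wide = pearson(wide, [math.log(x) for x in wide])
print(r_narrow, r_wide)
```

A correlation close to 1 on the narrow range means log() there is practically a straight line, so it changes nothing that a linear rescaling would not; on the wide range the correlation is clearly lower, which is where the log really reshapes the input.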

At least, that is what I reckon now.

