### Input and output in log scale

Some articles don’t use the original raw time series as input; instead, they transform the input with a log function.

I haven’t seen much justification for it, but I thought I would try it myself.

How can I use this idea?

My input1 is ‘RUT/SMA(30) – 1’, that is, the percentage difference between RUT and its SMA(30).

So, for example, instead of using input1 = ‘RUT/SMA(30) – 1’, should I use

‘log(RUT/SMA(30) – 1)’?

Hardly. Note, for example, that input1 is in the range [-0.3..0.3] (that is, the RUT is within -30%..+30% of its SMA(30)).

log(-0.3)?

🙂

That is invalid.

I was scratching my head over how to use it, so I decided to investigate why researchers use it at all.

**1. Output**

General usage.

Note this quote from the article “Guidelines for Financial Forecasting with Neural Networks” from the National University of Singapore.

”
Select inputs from available information. Inputs and targets also need to be carefully selected. Traditionally, only changes are processed to predict targets as the return or changes are the main concerns of fund managers. Three types of changes have been used in previous research: X_i – X_{i-1}, log X_i – log X_{i-1}, (X_i – X_{i-1})/X_{i-1}
”

As I already use the 3rd one as the output, I don’t need the 2nd one, the log version, for the output.
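To make the three change types concrete, here is a small sketch (in Python, with made-up prices chosen by me for illustration) computing all three. It also shows why skipping the log version is reasonable: for small moves, the log difference and the percentage change are nearly equal.

```python
import math

# Hypothetical price series X_1..X_5 (made-up numbers for illustration)
X = [100.0, 102.0, 101.0, 104.0, 103.5]

for i in range(1, len(X)):
    abs_change = X[i] - X[i - 1]                      # X_i - X_{i-1}
    log_change = math.log(X[i]) - math.log(X[i - 1])  # log X_i - log X_{i-1}
    pct_change = (X[i] - X[i - 1]) / X[i - 1]         # (X_i - X_{i-1}) / X_{i-1}
    # For small moves, log_change is close to pct_change, since log(1+r) ~= r
    print(f"{abs_change:+.2f}  {log_change:+.5f}  {pct_change:+.5f}")
```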

**2. Inputs**

Note this from the article “Artificial Neural Network Models for Forecasting Stock Price Index in the Bombay Stock Exchange”

”
the error during training dropped by almost 60 per cent when input ranges were narrowed down to 2,500–6,000 from the initial range of 0–10,000. Specifying an input range which is much larger than the possible values of inputs has a huge penalty in terms of accuracy and reliability of the network as gradient changes become much smaller due to the larger input range and network performance becomes poorer in line with the increasing range
”

That makes sense. So, using a log scale for the input is good for narrowing the input range.

However:

2.1:

In intelligent NN frameworks (like Matlab), narrowing the range is done automatically.

It is documented that the training set’s [Min..Max] range is converted to the [0..1] range.

It is a pity if the authors of the Bombay article had software that didn’t do it automatically.

But this step is trivial.

And for me, Matlab does it automatically.
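What such frameworks do is simply a linear min-max rescaling. A minimal sketch of the idea (the function name `minmax_scale` and the sample values are mine, not Matlab’s; the values mimic the Bombay article’s 2,500–6,000 range):

```python
def minmax_scale(values):
    """Map the training set's [min..max] range linearly onto [0..1]."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

# Inputs in 2,500..6,000 instead of the over-wide initial range of 0..10,000
inputs = [2500.0, 3000.0, 4250.0, 6000.0]
print(minmax_scale(inputs))  # -> [0.0, 0.142857..., 0.5, 1.0]
```

Note that this is a purely linear operation: it fixes the range, but it cannot change the shape of the input distribution, which is where the log transform differs.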

2.2:

If the input has a non-uniform distribution over its range, like this:

there are many input values in the [1..10] range, very few in the [10..100] range, but a couple of inputs in the [100..1M] range; that is, a non-uniform distribution of inputs over the input domain.

For such inputs, it makes sense to use a log scale, so the very high inputs (1M) are converted to much lower values.

http://en.wikipedia.org/wiki/Logarithm

This can really improve the learning precision.
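A quick illustration of that compression (the values are mine, chosen to mimic the skewed distribution above): in log10 scale, the [1..1M] span collapses to [0..6], so the rare huge inputs no longer dominate the range after the subsequent min-max scaling.

```python
import math

# Hypothetical skewed inputs: mostly small values, one huge outlier
raw = [1.0, 3.0, 7.0, 9.0, 50.0, 1_000_000.0]

scaled = [math.log10(v) for v in raw]
# 1_000_000 maps to 6.0 while 1.0 maps to 0.0; the small values
# now occupy a meaningful fraction of the range instead of being
# squashed against zero by the outlier.
print(scaled)
```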

However, in my case, the typical input values are in the range [-0.3..0.3].

On the negative part of this range, the log() function is undefined.

I can shift this range to a **range of [0.7..1.3]** (that is, take log(1 + input)), on which log() is defined,

but on this range, **the log() is very nearly a linear function**.

Transforming the input with an (almost) linear function makes no sense: it will not improve the distribution of the input samples (nor improve the range).
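The near-linearity is easy to check numerically: over [0.7..1.3], i.e. log(1 + x) for x in [-0.3..0.3], the log deviates from the straight line y = x by at most about 0.057 (at x = -0.3), so the transform barely changes the spacing of the input samples.

```python
import math

# Compare log(1 + x) with the linear approximation y = x on x in [-0.3..0.3]
xs = [-0.3, -0.15, 0.0, 0.15, 0.3]
for x in xs:
    y = math.log1p(x)  # log(1 + x), accurate near zero
    print(f"x={x:+.2f}  log(1+x)={y:+.5f}  deviation={y - x:+.5f}")
# The worst-case deviation is about -0.057 at x = -0.3 -- nearly linear,
# so it neither reshapes the input distribution nor usefully narrows the range.
```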

So, **we have a special case and therefore there is no point converting our inputs or outputs with the log() function**.

At least, that is what I reckon now.
