First Heterogeneous Ensemble

01Nov10

One reason to use an ensemble is to decrease randomness, especially in the case of a homogeneous ensemble: the average of the backtests remains the same as in the standalone case, but the variance decreases. Another motivation for the ensemble forecast is to increase performance, that is, to raise the average itself. Many articles report this effect.
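
The variance argument can be illustrated with a small simulation. This is only a sketch in Python/NumPy, not the code used for the backtests in this post. It assumes, hypothetically, that each member's backtest result is a common true mean plus independent Gaussian noise; the values of true_mean, noise_std and n_runs are made up.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the sketch is repeatable

n_runs = 20_000   # hypothetical number of repeated backtests
n_members = 10    # ensemble size used in this post

# Assumed toy model: every member has the same expected result, and the
# run-to-run differences are independent noise (e.g. random initial weights).
true_mean, noise_std = 0.10, 0.05
members = rng.normal(true_mean, noise_std, size=(n_runs, n_members))

standalone = members[:, 0]        # result of a single network per run
ensemble = members.mean(axis=1)   # average of the 10 members per run

print("mean  standalone / ensemble:", standalone.mean(), ensemble.mean())
print("stdev standalone / ensemble:", standalone.std(), ensemble.std())
```

Under the independence assumption the standard deviation shrinks by roughly sqrt(10). Networks trained on the same data are correlated, so the reduction in practice is likely to be smaller.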

In the previous two posts we performed directional prediction, that is, the classification task (predicting only sign(next%return)), and the function approximation task (predicting the ‘next%return’ value). Generally, function approximation worked better. Note that in our previous classification test we did not remove the outlier days and we used 1-dimensional encoding; in this test we remove outliers and use 5-dimensional encoding. Even so, classification was not bad in those earlier tests. (That has changed now.)

We used nEpoch = 5 for all FF (feed-forward network) tests. GRNN has no nEpoch parameter; there we used the default spread = 1.
We compare the performance of homogeneous 10-member ensembles with that of heterogeneous 10-member ensembles. Each ensemble is written as [a b c], meaning a FF(%return) members, b FF(sign(return)) members and c GRNN(%return) members:
E1. Homogeneous (GRNN(%return)) ensemble: [0 0 1], equivalent to [0 0 10]
E2. Homogeneous (FF(%return)) ensemble: [10 0 0]
E3. Homogeneous (FF(sign(return))) ensemble: [0 10 0]
E4. Heterogeneous 8-1-1 ensemble: [8 1 1]
E5. Heterogeneous 6-2-2 ensemble: [6 2 2]
E6. Heterogeneous 4-4-2 ensemble: [4 4 2]
For example, [8 1 1] means 8 FF(%return) members, 1 FF(sign(return)) member and 1 GRNN(%return) member.

Note that GRNN is not a randomized algorithm: every GRNN member produces the same forecast, so its 10-member ensemble is equivalent to the standalone version.

There are many approaches to aggregating the votes of the members.
The following is a quote from the Kin Keung Forex prediction book:

Typically, majority voting, ranking and weighted
averaging are three popular decision fusion approaches.

Majority voting is
the most widely used fusion strategy for classification problems due to
its easy implementation. Ensemble members’ voting determines the final
decision. Usually, it takes over half the ensemble to agree a result for it to
be accepted as the final output of the ensemble regardless of the diversity
and accuracy of each network’s generalization. Majority voting ignores the
fact some neural network that lie in a minority sometimes do produce the
correct results. At the stage of integration, it ignores the existence of diversity
that is the motivation for ensembles (Yang and Browne, 2004). In
addition, majority voting is only a class of integration strategy at the
abstract level.

Ranking is where the members of an ensemble are called low level classifiers
and they produce not only a single result but a list of choices ranked
in terms of their likelihood. Then the high level classifier chooses from this
set of classes using additional information that is not usually available to
or well represented in a single low level classifier (Yang and Browne,
2004). However, ranking strategy is a class of fusion strategy at the rank
level, as earlier mentioned.

Weighted averaging is where the final ensemble decision is calculated in
terms of individual ensemble members’ performances and a weight attached
to each member’s output. The gross weight is one and each ensemble
member is entitled to a portion of this gross weight based on their performances
or diversity (Yang and Browne, 2004).

Generally, there are two ensemble
strategies: linear ensemble and nonlinear ensemble strategies.

A. Linear ensemble strategy
Typically, linear ensemble strategies include two approaches: the simple
averaging (Tumer and Ghosh, 1995; Lincoln and Skrzypek, 1990) approach
and the weighted averaging (Burges, 1998) approach. There are
three types of weighted averaging: the simple mean squared error (MSE)
approach (Benediktsson et al., 1997), stacked regression (modified MSE)
approach (Breiman, 1996a) and variance-based weighted approach (Tresp
and Taniguchi, 1995).

Simple averaging is one of the most frequently used ensemble approaches.
After selecting the members of the ensemble, the final prediction
can be obtained by averaging the sum of each forecaster’s prediction of ensemble
members. Some experiments (Hansen and Salamon, 1990; Breiman,
1994) have shown that simple averaging is an effective approach to improve
neural network performance. It is more useful when the local minima of
ensemble members are different, i.e., when the local minima of ensemble
networks are different. Different local minima mean that ensemble members
are diverse. Thus averaging can reduce the ensemble variance. However,
this approach treats each member equally, i.e., it does not stress
ensemble members that can make more contribution to the final generalization.
If the variances of ensemble networks are very different, we do not
expect to obtain a better result using simple averaging (Ueda, 2000).

Weighted averaging is where the final ensemble prediction result is
calculated based upon individual members’ performances with a weight attached
to each individual member’s prediction. The gross weight is one
and each member of a ensemble is entitled to a portion of this gross weight
according to their performance or diversity. There are three methods used
to calculate weights: the simple MSE approach (Benediktsson et al., 1997),
stacked regression approach (Breiman, 1996a) and variance-based weighted
approach (Tresp and Taniguchi, 1995).

B. Nonlinear ensemble strategy
The nonlinear ensemble method is a promising approach for determining
the optimal weight of neural ensemble predictor. The literature only mentions
one nonlinear ensemble approach: the neural network-based nonlinear
ensemble method (Huang et al., 1995; Yu et al., 2005c). This approach
uses “meta” neural networks for ensemble purposes (Lai et al., 2006a).
Experiment results obtained show that the neural network-based
nonlinear ensemble approach consistently outperforms the other ensemble
approach.
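
(End of quote. The following illustration is ours, not part of the quoted book.) Simple averaging and weighted averaging, as described in the quote, can be sketched in Python/NumPy as below. The inverse-MSE weighting is an assumption: it is one simple way to give better members a larger share of a total weight of one, and is not necessarily the exact formula of any of the cited papers. All function names and numbers are hypothetical.

```python
import numpy as np

def simple_average(forecasts):
    """Equal-weight mean; forecasts has shape (n_members, n_days)."""
    return np.asarray(forecasts, dtype=float).mean(axis=0)

def inverse_mse_weights(forecasts, targets, eps=1e-12):
    """Weights proportional to 1/MSE on validation data, normalised to sum to one.

    eps only guards against division by zero for a perfect member.
    """
    forecasts = np.asarray(forecasts, dtype=float)
    targets = np.asarray(targets, dtype=float)
    mse = ((forecasts - targets) ** 2).mean(axis=1)  # one MSE per member
    inv = 1.0 / (mse + eps)
    return inv / inv.sum()

def weighted_average(forecasts, weights):
    """Weighted sum over members; weights are expected to sum to one."""
    return np.asarray(weights, dtype=float) @ np.asarray(forecasts, dtype=float)

# Made-up validation data: two members, three days, returns as fractions.
val_forecasts = [[0.010, -0.020, 0.005],    # member 0: close to the targets
                 [0.030, 0.010, -0.010]]    # member 1: far from the targets
val_targets = [0.012, -0.018, 0.004]

w = inverse_mse_weights(val_forecasts, val_targets)
print("weights:", w)
print("simple  :", simple_average(val_forecasts))
print("weighted:", weighted_average(val_forecasts, w))
```

With equal weights the weighted average reduces to the simple average, which is why simple averaging cannot stress the members that contribute more.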

Note that we cannot average the votes here, because the FF(sign(return)) network predicts directions only, and it would not be fair to combine these with actual %return forecasts.
Therefore, in this study we used the majority vote to aggregate the votes of the members, that is:

resultForecast = sum(sign(forecasts));
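
The same majority vote can be sketched in Python/NumPy for one day of a [8 1 1] ensemble. The member forecasts below are made up for illustration, and the function name majority_vote is hypothetical; only the sum-of-signs rule comes from the line above.

```python
import numpy as np

def majority_vote(forecasts):
    """Sum of the signs of the member forecasts.

    Positive: more bullish than bearish members; negative: the opposite;
    zero: a tie.
    """
    return int(np.sum(np.sign(forecasts)))

# Hypothetical one-day forecasts from a [8 1 1] ensemble.
ff_return = [0.4, -0.1, 0.2, 0.3, -0.2, 0.1, 0.5, -0.3]  # 8 FF(%return), in %
ff_sign = [-1.0]                                          # 1 FF(sign(return))
grnn = [0.05]                                             # 1 GRNN(%return), in %

forecasts = np.array(ff_return + ff_sign + grnn)
result_forecast = majority_vote(forecasts)
print(result_forecast)  # 2: six bullish votes minus four bearish votes
```

Because only the sign of each forecast is counted, the %return members and the sign(return) member can vote together even though their outputs are on different scales. How a tie (a result of 0) should be traded is left open in this sketch.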

1.
It is strange that the FF(%return) network returns such a good CAGR while having such a bad D_stat. Because of this, D_stat is not an advisable performance measure for comparing these algorithms. We rely on the CAGR and TR measurements instead.
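
A toy example shows how a low D_stat can coexist with a positive return. The Python/NumPy sketch below makes several assumptions that are not stated in this post: D_stat is taken as the fraction of days with a correct direction, TR as the compounded total return of trading long or short by the forecast sign, and CAGR is annualised with 252 trading days. The data are made up.

```python
import numpy as np

def d_stat(forecast, actual):
    """Fraction of days on which the forecast sign matches the realised sign."""
    return float(np.mean(np.sign(forecast) == np.sign(actual)))

def total_return(forecast, actual):
    """Compounded return of going long/short by the forecast sign.

    actual holds daily returns as fractions (0.01 means 1%).
    """
    daily = np.sign(forecast) * np.asarray(actual, dtype=float)
    return float(np.prod(1.0 + daily) - 1.0)

def cagr(forecast, actual, days_per_year=252):
    """Annualised growth rate; 252 trading days per year is an assumption."""
    growth = 1.0 + total_return(forecast, actual)
    return float(growth ** (days_per_year / len(actual)) - 1.0)

# Made-up data: wrong on three small days, right on two large days.
actual = np.array([0.002, -0.001, 0.003, -0.040, 0.050])
forecast = np.array([-1.0, 1.0, -1.0, -1.0, 1.0])

print("D_stat:", d_stat(forecast, actual))
print("TR    :", total_return(forecast, actual))
print("CAGR  :", cagr(forecast, actual))
```

Here the direction is right on only 40% of the days, yet the total return is positive, because the correct calls fall on the days with the largest moves. A pattern like this is one possible explanation for a good CAGR alongside a bad D_stat.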

2.
Based on the performance measurements, blindly aggregating the member forecasts in the [4 4 2] case gives no improvement. In this case the weights of the three algorithm types are 4, 4 and 2, which is very close to the equal-weight scheme of 3.3, 3.3, 3.3. With this heterogeneous ensemble, using majority voting in which every member gets an equal vote, we got a bad result, because:

A.
The FF(sign(return)) and GRNN standalone algorithms do not work: they gave negative TR. It is no surprise that combining them is not very good. The combination is better than the worst standalone algorithm, but the [4 4 2] combination is not better than the best standalone, the FF(%return) ANN.
(Note: in our previous studies, the FF(sign(return)) and GRNN worked only in the 1-dimensional case without outlier elimination; we had not tested them in the 5-dimensional case.)

B.
This shows only that equal-weight majority voting does not work. In the future we may try other voting mechanisms, such as averaging the forecasts or some confidence-based weighting, for example giving more weight to the FF(%return) ANN, which performs very well as a standalone algorithm.

3.
For CAGR and TR, the best performance is obtained by the heterogeneous [8 1 1] ensemble. This shows us that a blind equal aggregation of the members with weight w=1 does not work, but when a good predictor has a higher weight (here FF(%return) has 80% of the votes), the overall prediction improves, even if the other 20% of the members are generally losers. What happens is that when the FF(%return) members are almost equally bullish and bearish (say 4 bullish votes and 4 bearish votes), it helps to bring a new player, a new strategist, into the picture. The main point of this study is this: when ensembling networks, never use the equal-weight approach. The winning standalone strategies should be given higher weights.
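
The tie-breaking effect in the [8 1 1] ensemble can be sketched as follows, in Python/NumPy with made-up member forecasts. The function name sign_vote is hypothetical, and the rule is the same sum-of-signs majority vote used in this study.

```python
import numpy as np

def sign_vote(forecasts):
    """Sum of the signs of the member forecasts (majority vote)."""
    return int(np.sum(np.sign(forecasts)))

# Case 1: the 8 FF(%return) members are split 4 bullish vs 4 bearish.
ff_return_split = [0.2, 0.1, 0.3, 0.1, -0.2, -0.1, -0.3, -0.1]
ff_sign = [1.0]   # 1 FF(sign(return)) member, bullish
grnn = [0.05]     # 1 GRNN(%return) member, bullish

print(sign_vote(ff_return_split))                   # 0: undecided on their own
print(sign_vote(ff_return_split + ff_sign + grnn))  # 2: minority members decide

# Case 2: the FF(%return) members agree 7 to 1; two bearish minority
# members cannot overturn them.
ff_return_strong = [0.2] * 7 + [-0.1]
print(sign_vote(ff_return_strong + [-1.0] + [-0.05]))  # 4: still bullish
```

In this configuration the two weaker members matter only when the strong members are close to a tie, which is consistent with the improvement of [8 1 1] over the homogeneous [10 0 0] ensemble.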

The heterogeneous ensemble can be a better predictor than the homogeneous one, if the weights of the members are selected according to their underlying performance.
