18697768.txt
31 KB
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
BIOINFORMATICS ORIGINAL PAPER
Edward Wijaya1 ,2 , Siu-Ming Yiu3 , Ngo Thanh Son1 , Rajaraman Kanagasabai2 and Wing-Kin Sung1 ,4 , ∗ 1School of Computing , National University of Singapore , Singapore 119260 , 2Institute for Infocomm Research , 21 Heng Mui Keng Terrace , Singapore 119613 , 3Department of Computer Science , The University of Hong Kong , Pokfulam Road , Hong Kong and 4Genome Institute of Singapore , 60 Biopolis Street , # 02-01 Genome , Singapore 138672 Received on May 9 , 2008 ; revised on August 3 , 2008 ; accepted on August 7 , 2008 Advance Access publication August 12 , 2008 Associate Editor : Alex Bateman
ABSTRACT
Motivation : Locating transcription factor binding sites ( motifs ) is a key step in understanding gene regulation .
Based on Tompa 's benchmark study , the performance of current de novo motif finders is far from satisfactory ( with sensitivity ≤ 0.222 and precision ≤ 0.307 ) .
The same study also shows that no motif finder performs consistently well over all datasets .
Hence , it is not clear which finder one should use for a given dataset .
To address this issue , a class of algorithms called ensemble methods have been proposed .
Though the existing ensemble methods overall perform better than stand-alone motif finders , the improvement gained is not substantial .
Our study reveals that these methods do not fully exploit the information obtained from the results of individual finders , resulting in minor improvement in sensitivity and poor precision .
Results : In this article , we identify several key observations on how to utilize the results from individual finders and design a novel ensemble method , MotifVoter , to predict the motifs and binding sites .
Evaluations on 186 datasets show that MotifVoter can locate more than 95 % of the binding sites found by its component motif finders .
In terms of sensitivity and precision , MotifVoter outperforms stand-alone motif finders and ensemble methods significantly on Tompa 's benchmark , Escherichia coli , and ChIP-Chip datasets .
MotifVoter is available online via a web server with several biologist-friendly features .
Availability : http://www.comp.nus.edu.sg/∼bioinfo/MotifVoter Contact : ksung@comp.nus.edu.sg supplementary information : Supplementary data are available at Bioinformatics online .
1 INTRODUCTION
Understanding the regulatory mechanism of genes is one of the major challenges faced by biologists today .
Central to this problem is the identification of recurring patterns ( motifs ) in regulatory sequences , which represent binding sites of a transcription factor .
In general , one would like to identify transcription factors whose binding sites are statistically over-represented in the promoter of a set of co-regulated genes .
Many motif finders have been proposed to undertake this problem using different approaches , such as profilebased methods ( Nimwegen , 2007 ) and consensus-based methods ( Eskin and Pevzner , 2002 ; Pavesi et al. , 2001 ; Wijaya et al. , 2007 ) .
However the existing tools are still not effective for discovering motifs ( Das and Dai , 2007 ; Tompa et al. , 2005 ) .
For example , as shown in Tompa et al. 's evaluation , even the best performing algorithm has sensitivity < 13 % and precision < 35 % ( sensitivity is percentage of true nucleotides that are predicted and precision is the percentage of predicted nucleotides that are true ) .
To deal with this issue , some literatures ( Harbison et al. , 2004 ; Hu and Kihara , 2005 ; MacIsaac and Fraenkel , 2006 ) hinted that assembling the results of multiple motif finders can provide better results for motif finding .
For example , Harbison et al. ( 2004 ) observed different motif finders have different strengths .
They successfully identified more binding sites by combining results of six motif finders compared to using only single finder .
In fact , the benchmark datasets from Tompa et al. ( 2005 ) also support this .
By simply taking the union of all binding sites predicted by 10 selected motif finders , the sensitivity can be increased by more than double over each selected motif finder .
However , the union of all predicted sites could contain a lot of noise .
It is not trivial to distinguish the real binding sites from the noise .
Six ensemble methods have been developed .
SCOPE ( Chakravarty et al. , 2007 ) and BEST ( Dongsheng et al. , 2005 ; Jensen and Liu , 2006 ) rerank all motifs predicted by individual finders based on a certain scoring function and report the top motif .
WebMotifs ( Gordon et al. , 2005 ; Romer et al. , 2007 ) , MultiFinder ( Huber and Bulyk , 2006 ) and RGSMiner ( Huang et al. , 2004 ) assume motifs given by the consensus of several motif finders are likely to be the real motifs .
They cluster the motifs and report one motif from the best cluster .
All above methods select one representative motif among the predicted motifs from the individual finders .
If none of the motifs from the finders can capture the binding sites accurately , the performance of these methods will suffer .
EMD ( Hu et al. , 2006 ) goes a step further and considers the binding sites predicted by the motifs .
Motifs of the same rank from different component motif finders are grouped together .
For each group , based on the binding sites predicted by all the motifs in the group , new motifs are derived .
Though these ensemble methods indeed help to improve the performance of motif finding , the improvement is not significant .
For example , in Tompa 's benchmark and Escherichia coli datasets , the average sensitivity is only improved by 62 % but the average precision is reduced by 15 % .
We believe that some important aspects have been overlooked by existing ensemble methods :
These observations are also supported by Hu and Kihara ( 2005 ) and MacIsaac and Fraenkel ( 2006 ) .
In this article , we show how to formalize these observations and derive two criteria , namely discriminative and consensus criteria , to combine results from multiple motif finders .
Extensive evaluation shows that our proposed ensemble method , MotifVoter , can extract almost all information from multiple motif finders .
On 186 datasets for Tompa 's benchmark ( Tompa et al. , 2005 ) and E.coli ( Salgado et al. , 2004 ) , MotifVoter retains more than 95 % of true binding sites predicted by at least one individual motif finder .
This implies that MotifVoter improves the sensitivity by 120 % and precision by 77 % when compared with the best individual motif finder and it also shows an improvement of 90 % in sensitivity and 135 % in precision when compared to the existing ensemble methods .
Applying MotifVoter on 140 yeast and mammalian ChIP-Chip datasets , we demonstrate that MotifVoter is scalable .
Furthermore , in these ChIP-Chip datasets , we notice that the motifs reported by MotifVoter are more similar to true motifs when compared with those reported by existing stand-alone motif finders and ensemble methods .
It may be noted that , unlike other ensemble methods that only work on the predicted motifs , MotifVoter and EMD work on both the predicted motifs and binding sites .
MotifVoter differs from EMD in the discriminative and consensus criteria used for grouping ( clustering ) the motifs .
First , the discriminative criterion requires the selected cluster to of motif share as many binding sites as possible , while requiring that motifs outside the cluster share none or few binding sites .
Second , the consensus criterion requires the motifs in the selected cluster to be contributed by as many motif finders as possible .
We propose a scoring function that captures the two criteria and derive an efficient algorithm to select a cluster that optimizes this scoring function .
In addition to these two criteria , MotifVoter selects only those binding sites of high confidence to construct the new motif , rather than using all binding sites predicted by motifs in the same group ( cluster ) .
Compared to existing ensemble schemes , MotifVoter has the following advantages :
3 METHODS
MotifVoter uses 10 basic motif finders .
The first group consists of three motif finders based on ( l , d ) model , namely MITRA ( Eskin and Pevzner , 2002 ) , Weeder ( Pavesi et al. , 2001 ) , and SPACE ( Wijaya et al. , 2007 ) .
The second group consists of seven motif finders based on PWM model , namely AlignACE ( Hughes et al. , 2000 ) , ANN-Spec ( Workman and Stormo , 2000 ) , BioProspector ( Liu et al. , 2001 ) , Improbizer ( Ao et al. , 2004 ) , MDScan ( Liu et al. , 2002 ) , MEME ( Bailey and Elkan , 1995 ) and MotifSampler ( Thijs et al. , 2001 ) .
The details of these component motif finders and parameter descriptions can be found in Supplementary Material 10 .
Section 3.1 formally describes the discriminative and consensus criteria used in the motif filtering step .
Then , in Section 3.2 , we present the heuristic algorithm for finding a cluster of motifs that satisfy the two criteria .
The details of the sites extraction step are given in Section 3.3 .
Finally , Section 3.4 describes the performance evaluation measures we used for benchmarking MotifVoter against other existing methods .
3.1 The discriminative and consensus criteria
Given two motifs x and y , we first describe how to measure their similarity based on the binding sites defined by these motifs .
Let I ( x ) be the set of binding sites defined by motif x. Let I ( x ) ∩ I ( y ) be the set of regions covered by at least one site in x and one site in y. Let I ( x ) ∪ I ( y ) be the set of regions covered by any site of x or y .
We denote the total number of nucleotides of all the regions in I ( x ) ∩ I ( y ) and I ( x ) ∪ I ( y ) , as | I ( x ) ∩ I ( y ) | and | I ( x ) ∪ I ( y ) | , respectively .
The similarity of x and y , denoted sim ( x , y ) , is expressed as | I ( x ) ∩ I ( y ) | / | I ( x ) ∪ I ( y ) | .
Note that 0 ≤ sim ( x , y ) ≤ 1 and sim ( x , x ) = 1 .
Now , we define the scoring function to capture the discriminative criterion .
Given m motif finders and each motif finder reports its top-n candidate motifs , there will be a set P of mn candidate motifs .
Among all the candidate motifs in P , some of them will approximate the real motif while the other will not .
We would like to identify the subset X of P such that the candidate motifs in X are likely to approximate the real motif .
The principle idea is that if the candidate motifs in X can model the real motif , motifs in X should be highly similar while motifs outside X should be distant from one another ( discriminative criteria ) .
Let X be some subset of candidate motifs of P .
The mean similarity among the candidate motifs in X , denoted as sim ( X ) , is defined as : ∑
2 ( ( sim ( x , y ) − sim ( X ) ) x , y ∈ X The function w ( X ) measures the degree of similarity among the candidate motifs in X .
If many of the candidate motifs in X approximate the real motif , we should expect to have a high w ( X ) .
On the other hand , we expect the complement of X , that is P − X , should have a low w ( P − X ) .
Thus , w ( P − X ) constitutes the discriminative criterion in the clustering procedure .
In other words , if X is the set of candidate motifs which approximates the real motif , we expect to have a high A ( X ) score , where : w ( P − X ) Note that there may be multiple sets of X with the same A ( X ) score .
Among those X 's , we would select one that contains maximum number of motif finders .
The latter constitutes the consensus criterion .
In summary , this stage aims to find X ⊆ P which maximizes A ( X ) while X contains the candidate motifs predicted by maximum number of motif finders .
The naive method to identify X is to enumerate all possible X and check if they satisfy the above two criteria .
However , this approach is computationally infeasible .
In the next section , we describe our proposed heuristics to identify X to overcome this difficulty .
3.2 Heuristics for motif filtering
In this subsection , we describe the heuristic algorithm for identifying X , the set of similar candidate motifs in the first stage .
Let P be all motifs found by m motif finders , where each motif finder returns n motifs .
Steps 1 -- 3 compute the pairwise similarity scores for all pairs of motifs .
Based on these similarity scores , we apply the following heuristics approach to find X ( Steps 4 -- 9 ) .
Instead of enumerating all possible subsets in P which takes exponential time , we only consider subsets Xz , j = { z , p1 , ... , pj } for every z ∈ P and for every 1 ≤ j ≤ | P | − 1 where sim ( z , pi ) > sim ( z , pi +1 ) and pi ∈ P .
The rationale behind examining only these subsets is as follows .
For each z ∈ P , let a and b be two other motifs such that sim ( a , z ) > sim ( b , z ) .
That is , a is more similar to z than b. Then , it is likely that A ( { z , a , b } ) > A ( { z , b } ) .
By including pj in the subset , we also include all pi with i < j. For each X , we compute the value of A ( X ) and the number of motif z , j z , j finders contributing to Xz , j , denoted as Q ( Xz , j ) .
Finally , we set X to Xz , j with maximum A ( Xz , j ) value .
If there are more than one such subset , pick the one with the largest Q ( Xz , j ) .
In case of a tie again , we pick one randomly .
The time complexity of heuristics can be shown to be O ( m2n2 ) as follows .
There are | P | = mn motifs .
For each motif z , we consider mn different
Output : PWM of the aligned sites 1 : for each z , y ∈ P do 2 : compute sim ( z , y ) 3 : end for 4 : for each z ∈ P do 5 : sort p1 , p2 , ... , pk such that sim ( z , pi ) ≥ sim ( z , pi +1 ) ; ∀ i = 1 . .
k 6 : consider sets X = { z , p , ... , p } ; ∀ j = 1 , ... , k z , j 1 j 7 : compute A ( Xz , j ) for all such Xz , j 8 : set Q ( Xz , j ) = number of motif finders contributing to Xz , j 9 : end for = 10 : set X Xz , j with the maximum A ( Xz , j ) score , if there are two such Xz , j 's , pick the one with the largest Q ( Xz , j ) 11 : extract and align sites of motifs in X 12 : construct PWM
Xz , j subsets .
For each subset , we need to compute the value of A ( Xz , j ) for all j. Note that we can obtain the value of A ( Xz , j ) from the value of A ( Xz , j − 1 ) in constant time .
So , for each motif z , to compute all values of A ( Xz , j ) , it takes O ( mn ) time .
Therefore , the overall time complexity is O ( m2n2 ) .
For the actual running time of the heuristics , please refer to Supplementary Material 5 .
3.3 Sites extraction
From X , we obtain the list of sites using two requirements .
First , we accept all regions which are covered by sites defined by at least two motifs x and y in X , where x and y are predicted by two different motif finders .
The reason behind is that it is unlikely that several motif finders predict the same spurious binding sites .
Second , we accept all the sites of the motif x in X with the highest confidence in approximating real motif .
The confidence score is defined as follows .
Let B ( x ) be the total number of nucleotides covered by the sites defined by motif x. Let O ( x ) be the total number of nucleotides covered by the sites defined by motif x and all other motifs in X .
The confident score of motif x is defined as O ( x ) / B ( x ) .
For the selected sites that are covered by more than one motif finder , we further apply a post-processing procedure to refine each site by removing the nucleotides that are only covered by a single finder to increase the precision of our prediction as these nucleotides are likely to be noise .
Given all the sites predicted by MotifVoter , we generate a PWM motif to model the motif .
To achieve this , we first align these sites using MUSCLE ( Edgar , 2004 ) , then a PWM is generated from this alignment to model the motif .
Figure 1c provides an illustration of this process .
3.4 Performance evaluation measures
For performance evaluation , we use the same four measures proposed in ( Tompa et al. , 2005 ) namely , sensitivity ( nSN ) , positive predictive value1 ( nPPV ) , performance coefficient ( nPC ) and correlation coefficient ( nCC ) .
Index n is used to denote that the assessment is done at the nucleotide level instead of site level .
The definitions of these performance measures can be found in Supplementary Material 4 .
For ChIP-Chip datasets which do not have sites position information , we use sum of squared distance ( SSD ) to evaluate the performance of motif finders .
It is one of the most widely used methods for comparing PWMs ( Mahony et al. , 2007 ) .
Let L be length of two aligned motifs X and Y , SSD is defined as :
1Throughout the article , we use the term precision instead of positive predictive value .
where pxi ( b ) and pyi ( b ) are the probabilities of base b occurring at position i in motif X and Y , respectively .
Note that 0 ≤ SSD ( X , Y ) ≤ 2 and SSD ( X , X ) = 2 .
If the motifs are of different length , we use the shorter motif to align different regions of the longer motif and obtain the maximum SSD value .
4 EXPERIMENTAL RESULTS
There is no universal benchmark dataset for evaluating a motif finder .
We thus performed an extensive evaluation on MotifVoter over 326 datasets from Tompa 's benchmark , E.coli and ChIP-Chip datasets .
In our experiments , we consider the top-30 ( i.e. n = 30 ) motifs reported by each individual motif finder .
For the effect of different values of n , please refer to Supplementary Material 6 .
4.1 Comparison using Tompa’s benchmark dataset
This section compares MotifVoter with existing methods using Tompa 's benchmark datasets .
The datasets are constructed based on real transcription factor binding sites drawn from four different organisms ( human , fruitfly , mouse and yeast ) .
The background sequences are based on three categories : ( 1 ) real upstream sequences ( real ) , ( 2 ) randomly chosen upstream sequences with binding sites inserted ( generic ) and ( 3 ) sequences randomly generated by a Markov chain with binding sites inserted ( markov ) .
It consists of 56 datasets .
The number of sequences per dataset ranges from 1 to 35 and the sequence lengths are up to 3000 bp ( Tompa et al. , 2005 ) .
Such diverse characteristics of the benchmark make it a good candidate for evaluating the robustness of a motif finder .
The graph in Figure 2a shows the average performance of all stand-alone and ensemble motif finders based on four performance evaluation measures [ see also Table 1 for the actual figures of sensitivity ( nSN ) and precision ( nPPV ) ] using Tompa 's benchmark dataset .
Among the stand-alone motif finders , Weeder and SPACE perform somewhat similar and outperform the other 15 stand-alone motif finders by all four measures .
However , integration of motif finders enables MotifVoter to further enhance the performance .
The improvement gained over SPACE is 215 % in sensitivity and 45 % in precision .
We believe that MotifVoter is able to piece together different subregions of sites identified by the stand-alone motif finders , thus yielding significant improvement in sensitivity .
Furthermore , since MotifVoter imposes strict discriminative and consensus criteria on the clustering procedure , we get significant improvement in precision also .
For ensemble methods , MotifVoter outperforms the sensitivity of BEST and precision of SCOPE by 54.1 % and 226.2 % , respectively .
We can not directly evaluate RGSMiner and MultiFinder because of their limited applicability .
For WebMotifs , it only takes probe names as input , so we only evaluate WebMotifs on ChIP-Chip dataset ( see Section 4.3 ) .
In principle , these three ensemble methods select one motif out of all motifs reported by the component motif finders .
Although we are not able to run these finders on the dataset , to predict the upper bound of the performance of these finders , we can select the most sensitive and precise motif for each dataset .
We found that even if we do so , the sensitivity is at least 48.6 % lower than MotifVoter and the precision is at least 41.2 % lower .
For EMD , a direct evaluation is done in the next subsection on E.coli dataset .
However , in this benchmark if we cluster the motifs of the same rank given by 10 motif finders and pick the most sensitive cluster , the highest possible sensitivity is 21.3 % lower than MotifVoter .
The main reason why MotifVoter yields better performance is because apart from the ability to capture true sites from various motif finders , it also includes true sites that may come from different ranks .
4.2 Comparison using E.coli dataset
Despite its comprehensive assortments of datasets , Tompa 's benchmark does not fully represent real biological settings since it still contains synthetic sequences ( markov ) and mixture of real and synthetic ( generic ) sequences .
Here we evaluate MotifVoter using real E.coli datasets .
This species is not included in Tompa 's benchmark .
The datasets are taken from RegulonDB ( Salgado et al. , 2004 ) ( http://regulondb.ccg.unam.mx/ ) , and the sequences are generated from the intergenic regions of E.coli genome .
In total , there are 62 datasets .
The average number of sequences per dataset is 12 and the average sequence length is 300 bp .
The graph in Figure 2b shows the average performance of all stand-alone and ensemble motif finders based on four performance evaluation measures [ see also Table 1 for the actual figures of
The nSN and nPPV for stand-alone motif finders in Tompa 's benchmark are taken directly from Tompa et al. ( 2005 ) .
For E.coli dataset , we are unable to obtain the result of a few stand-alone motif finders ( marked with dash ) , because either these programs are not available or the output does not give binding sites for us to evaluate the results ( e.g YMF ) , during the preparation of this article .
Motif finders marked with asterisks ( * ) are ensemble methods .
Note that for some ensemble methods , we are not able to execute on the given datasets , we only estimate their upper bound ( see Section 4.1 for details on how we obtain the upper bound ) .
sensitivity ( nSN ) and precision ( nPPV ) ] using the E.coli dataset .
MotifVoter consistently outperforms the stand-alone motif finders by about 168 % in terms of sensitivity .
For precision , although the improvement is not so much as for sensitivity , MotifVoter still yields a significant improvement over stand-alone motif finders exceeding the highest precision by 80.9 % .
Three ensemble methods ( SCOPE , BEST , EMD ) evaluated on this dataset outperform the stand-alone algorithms in terms of sensitivity .
However , even for the most sensitive ensemble method ( EMD ) among the three , the improvement is not significant .
On the other hand , MotifVoter outperforms the EMD ensemble method by 130.2 % in sensitivity .
As for precision , MotifVoter also consistently performs better with a 45.9 % improvement over the best ensemble method ( SCOPE ) .
4.3 Application on ChIP-Chip datasets
In this section , we evaluate MotifVoter on yeast and mammalian ChIP-Chip datasets .
These datasets provide an ideal platform for assessing the scalability and applicability of MotifVoter to the entire genome .
The yeast ChIP-Chip datasets and TF profiles are obtained from Harbison et al. ( 2004 ) ( http://fraenkel.mit.edu/Harbison/ ) .
For mammalian datasets , we evaluate nine transcription factors : CREB ( Zhang et al. , 2005 ) , E2F ( Ren et al. , 2002 ) , HNF4/HNF6 ( Odom et al. , 2004 ) , MYOD/MYOG ( Cao et al. , 2006 ) , NFKB ( Schrieber et al. , 2006 ) , NOTCH ( Palomero et al. , 2006 ) , SOX ( Boyer et al. , 2005 ) .
Further details about the yeast and mammalian datasets sources can be found in Supplementary Material 1 .
Out of 65 cases , MotifVoter can identify 56 motifs .
As for the missing ones , we found that the individual finders do not perform well .
Figure 3 shows the evaluation using SSD values of the stand-alone and ensemble motif finders based on the 56 successful cases .
Note that SSD values can give an estimation of how close the prediction of one motif finder is to the real PWM in each dataset .
Table 2 shows the average SSD performance of each motif finder .
On yeast ChIP-Chip datasets , among all stand-alone and ensemble motif finders , Improbizer has the highest average SSD value ( 1.639 ) .
MotifVoter outperforms Improbizer with an average SSD value of 1.919 .
Weblogos of the actual motifs predicted by MotifVoter on these datasets can be found in Supplementary Material 3 .
Figure 4 shows the nine PWMs found by MotifVoter in agreement with canonical motifs in mammalian datasets .
Evaluation on MotifVoter shows that it has an average SSD value of 1.824 , while the best performing existing motif finder achieves an average SSD value of 1.530 .
SSD of MotifVoter on mammalian datasets is lower than that on yeast since the datasets of mammalian genome are much more complex than yeast .
Note that the SSD score is at most 2.
5 DISCUSSION
This section further investigates the effectiveness of MotifVoter in terms of the following aspects : ( 1 ) the optimality of MotifVoter ; ( 2 ) the effects of the discriminative and consensus criteria used in MotifVoter and ( 3 ) the robustness of MotifVoter and the running time of MotifVoter .
5.1 Optimality of MotifVoter
In general , the performance of any ensemble method is bounded by its component motif finders .
The total number of true binding sites that are covered by any of its component motif finders can be regarded as a rough upper bound on the performance of an ensemble method .
We say an ensemble motif finder is - optimal if it can find fraction of the true sites covered by at least one of its component motif finders .
Evaluation on Tompa 's benchmark shows that the highest possible sensitivity that can be achieved all by the component motif finders is 0.440 .
The sensitivity of MotifVoter is 0.419 which is 95.2 % of this upper bound .
In E.coli dataset , the highest possible sensitivity is 0.643 and MotifVoter achieves 95.7 % .
This empirical study shows that MotifVoter is a near optimal ensemble method .
5.2 Analysis of the effects of MotifVoter’s criteria
The discriminative criterion of MotifVoter makes use of the information provided by the false positive ( spurious ) motifs/sites to guide the clustering of true motifs/sites .
This information is not fully utilized by existing ensemble methods .
The consensus criterion in MotifVoter requires the cluster to be globally contributed by as many motif finders as possible .
This can enhance the confidence that the cluster contains good motifs/sites .
These two key ideas distinguish our ensemble method from the existing ones .
This subsection experimentally shows that both criteria in MotifVoter are equally important in their own right .
Figure 5a shows the evaluation results on Tompa 's benchmark datasets .
Sensitivity of MotifVoter without both criteria ( Case IV ) is 80 % lower than BEST and precision is 15 % lower than Weeder .
Similarly in Figure 5b for E.coli , when neither discriminative nor consensus criteria are implemented , the precision of MotifVoter is lower than that of SCOPE .
In contrast , by including both criteria MotifVoter can improve the sensitivity by 163.2 % and precision by 45.9 % .
5.3 Robustness and time complexity MotifVoter
We also evaluate the performance of MotifVoter on various background sequences and various species in Tompa 's benchmark .
MotifVoter achieved the highest nSN and nPPV on all three backgrounds .
The major sensitivity improvement is on real datasets ( 275 % ) , followed by generic datasets ( 128 % ) .
As for evaluation on three species , MotifVoter made major sensitivity improvement on human dataset ( 314 % ) followed by fruitfly ( 263 % ) , while the least improvement was made on yeast datasets ( 84 % ) .
MotifVoter relies on its component motif finders .
Thus , a natural question is whether the performance of MotifVoter will degrade if we include some motif finders with poor performance .
The performance of MotifVoter does degrade as more random motif finders ( representing motif finders with poor performance ) are included .
However , even if we include five random motif finders ( that is half of the real motif finders we used ) , the performance of MotifVoter is still greater than that of the best individual motif finder by 183 % in sensitivity and by 10.4 % in precision .
These results imply that MotifVoter is robust even if some of the component motif finders perform badly .
We remark that it is possible that MotifVoter may predict wrong motifs if almost every individual finder reports similar wrong motifs .
However , MotifVoter can reduce the chance of finding such wrong motifs significantly in comparison to individual finders because the chance of having many individual finders to report the same ( similar ) spurious motif is low since every finder has its own scheme of predicting the motifs .
Running all 10 component motif finders for MotifVoter is not always practical .
We show that the sensitivity and precision of MotifVoter are still significantly better than the best motif finder when we run only the five fastest motif finders .
The details of the above robustness evaluations are shown in Supplementary Material 5 .
In Supplementary Material 8 , we evaluate and discuss the characteristics of binding sites missed by MotifVoter .
We also show that MotifVoter can identify motifs in motif modules that contain multiple motifs work in groups , provided that these motifs have signals strong enough to be detected by the individual motif finders used in MotifVoter .
Details can be found in Supplementary Material 9 .
6 CONCLUSIONS
We have presented MotifVoter , a simple yet effective ensemble method that utilizes both positive and negative information in the motifs returned by the basic motif finders and achieves near-optimal sensitivity .
In extensive evaluations on many ChIP-Chip and benchmark datasets , we observed that MotifVoter outperforms all stand-alone and existing ensemble motif finders significantly .
It is also robust with respect to the inclusion of random motif finders and number of component motif finders .
A key advantage of MotifVoter is its flexibility in incorporating new algorithms .
That is , if a novel superior motif finding algorithm is made available , it can be readily incorporated into the MotifVoter platform .
In the website version , we provide an option for users to provide results from the component motif finders that are not included in our main list .
Furthermore , MotifVoter , as a web application , has several biologist friendly features such as a parameter-free interface , ability for users to choose their preferred component motif finders and option for speedier return of results using the fastest motif finders .
ACKNOWLEDGEMENTS
We are grateful to the authors of the component motif finders used by MotifVoter for making their tools available for public use .
We thank C. Dongsheng for providing command line version of BEST .
Finally , the authors would like to thank the reviewers for all their useful and constructive comments .
Funding : National University of Singapore ( grant R-252-000-326-112 ) ; Research Output Prize ( Faculty of Engineering ) of the University of HongKong to S.M.Y.