18697768.txt 31 KB

Raw Blame History Permalink

BIOINFORMATICS ORIGINAL PAPER
Edward Wijaya1 ,2 , Siu-Ming Yiu3 , Ngo Thanh Son1 , Rajaraman Kanagasabai2 and Wing-Kin Sung1 ,4 , ∗ 1School of Computing , National University of Singapore , Singapore 119260 , 2Institute for Infocomm Research , 21 Heng Mui Keng Terrace , Singapore 119613 , 3Department of Computer Science , The University of Hong Kong , Pokfulam Road , Hong Kong and 4Genome Institute of Singapore , 60 Biopolis Street , # 02-01 Genome , Singapore 138672 Received on May 9 , 2008 ; revised on August 3 , 2008 ; accepted on August 7 , 2008 Advance Access publication August 12 , 2008 Associate Editor : Alex Bateman
ABSTRACT
Motivation : Locating transcription factor binding sites ( motifs ) is a key step in understanding gene regulation .
Based on Tompa 's benchmark study , the performance of current de novo motif ﬁnders is far from satisfactory ( with sensitivity ≤ 0.222 and precision ≤ 0.307 ) .
The same study also shows that no motif ﬁnder performs consistently well over all datasets .
Hence , it is not clear which ﬁnder one should use for a given dataset .
To address this issue , a class of algorithms called ensemble methods have been proposed .
Though the existing ensemble methods overall perform better than stand-alone motif ﬁnders , the improvement gained is not substantial .
Our study reveals that these methods do not fully exploit the information obtained from the results of individual ﬁnders , resulting in minor improvement in sensitivity and poor precision .
Results : In this article , we identify several key observations on how to utilize the results from individual ﬁnders and design a novel ensemble method , MotifVoter , to predict the motifs and binding sites .
Evaluations on 186 datasets show that MotifVoter can locate more than 95 % of the binding sites found by its component motif ﬁnders .
In terms of sensitivity and precision , MotifVoter outperforms stand-alone motif ﬁnders and ensemble methods signiﬁcantly on Tompa 's benchmark , Escherichia coli , and ChIP-Chip datasets .
MotifVoter is available online via a web server with several biologist-friendly features .
Availability : http://www.comp.nus.edu.sg/∼bioinfo/MotifVoter Contact : ksung@comp.nus.edu.sg supplementary information : Supplementary data are available at Bioinformatics online .
1 INTRODUCTION
Understanding the regulatory mechanism of genes is one of the major challenges faced by biologists today .
Central to this problem is the identiﬁcation of recurring patterns ( motifs ) in regulatory sequences , which represent binding sites of a transcription factor .
In general , one would like to identify transcription factors whose binding sites are statistically over-represented in the promoter of a set of co-regulated genes .
Many motif ﬁnders have been proposed to undertake this problem using different approaches , such as proﬁlebased methods ( Nimwegen , 2007 ) and consensus-based methods ( Eskin and Pevzner , 2002 ; Pavesi et al. , 2001 ; Wijaya et al. , 2007 ) .
However the existing tools are still not effective for discovering motifs ( Das and Dai , 2007 ; Tompa et al. , 2005 ) .
For example , as shown in Tompa et al. 's evaluation , even the best performing algorithm has sensitivity < 13 % and precision < 35 % ( sensitivity is percentage of true nucleotides that are predicted and precision is the percentage of predicted nucleotides that are true ) .
To deal with this issue , some literatures ( Harbison et al. , 2004 ; Hu and Kihara , 2005 ; MacIsaac and Fraenkel , 2006 ) hinted that assembling the results of multiple motif ﬁnders can provide better results for motif ﬁnding .
For example , Harbison et al. ( 2004 ) observed different motif ﬁnders have different strengths .
They successfully identiﬁed more binding sites by combining results of six motif ﬁnders compared to using only single ﬁnder .
In fact , the benchmark datasets from Tompa et al. ( 2005 ) also support this .
By simply taking the union of all binding sites predicted by 10 selected motif ﬁnders , the sensitivity can be increased by more than double over each selected motif ﬁnder .
However , the union of all predicted sites could contain a lot of noise .
It is not trivial to distinguish the real binding sites from the noise .
Six ensemble methods have been developed .
SCOPE ( Chakravarty et al. , 2007 ) and BEST ( Dongsheng et al. , 2005 ; Jensen and Liu , 2006 ) rerank all motifs predicted by individual ﬁnders based on a certain scoring function and report the top motif .
WebMotifs ( Gordon et al. , 2005 ; Romer et al. , 2007 ) , MultiFinder ( Huber and Bulyk , 2006 ) and RGSMiner ( Huang et al. , 2004 ) assume motifs given by the consensus of several motif ﬁnders are likely to be the real motifs .
They cluster the motifs and report one motif from the best cluster .
All above methods select one representative motif among the predicted motifs from the individual ﬁnders .
If none of the motifs from the ﬁnders can capture the binding sites accurately , the performance of these methods will suffer .
EMD ( Hu et al. , 2006 ) goes a step further and considers the binding sites predicted by the motifs .
Motifs of the same rank from different component motif ﬁnders are grouped together .
For each group , based on the binding sites predicted by all the motifs in the group , new motifs are derived .
Though these ensemble methods indeed help to improve the performance of motif ﬁnding , the improvement is not signiﬁcant .
For example , in Tompa 's benchmark and Escherichia coli datasets , the average sensitivity is only improved by 62 % but the average precision is reduced by 15 % .
We believe that some important aspects have been overlooked by existing ensemble methods :
These observations are also supported by Hu and Kihara ( 2005 ) and MacIsaac and Fraenkel ( 2006 ) .
In this article , we show how to formalize these observations and derive two criteria , namely discriminative and consensus criteria , to combine results from multiple motif ﬁnders .
Extensive evaluation shows that our proposed ensemble method , MotifVoter , can extract almost all information from multiple motif ﬁnders .
On 186 datasets for Tompa 's benchmark ( Tompa et al. , 2005 ) and E.coli ( Salgado et al. , 2004 ) , MotifVoter retains more than 95 % of true binding sites predicted by at least one individual motif ﬁnder .
This implies that MotifVoter improves the sensitivity by 120 % and precision by 77 % when compared with the best individual motif ﬁnder and it also shows an improvement of 90 % in sensitivity and 135 % in precision when compared to the existing ensemble methods .
Applying MotifVoter on 140 yeast and mammalian ChIP-Chip datasets , we demonstrate that MotifVoter is scalable .
Furthermore , in these ChIP-Chip datasets , we notice that the motifs reported by MotifVoter are more similar to true motifs when compared with those reported by existing stand-alone motif ﬁnders and ensemble methods .
It may be noted that , unlike other ensemble methods that only work on the predicted motifs , MotifVoter and EMD work on both the predicted motifs and binding sites .
MotifVoter differs from EMD in the discriminative and consensus criteria used for grouping ( clustering ) the motifs .
First , the discriminative criterion requires the selected cluster to of motif share as many binding sites as possible , while requiring that motifs outside the cluster share none or few binding sites .
Second , the consensus criterion requires the motifs in the selected cluster to be contributed by as many motif ﬁnders as possible .
We propose a scoring function that captures the two criteria and derive an efﬁcient algorithm to select a cluster that optimizes this scoring function .
In addition to these two criteria , MotifVoter selects only those binding sites of high conﬁdence to construct the new motif , rather than using all binding sites predicted by motifs in the same group ( cluster ) .
Compared to existing ensemble schemes , MotifVoter has the following advantages :
3 METHODS
MotifVoter uses 10 basic motif ﬁnders .
The ﬁrst group consists of three motif ﬁnders based on ( l , d ) model , namely MITRA ( Eskin and Pevzner , 2002 ) , Weeder ( Pavesi et al. , 2001 ) , and SPACE ( Wijaya et al. , 2007 ) .
The second group consists of seven motif ﬁnders based on PWM model , namely AlignACE ( Hughes et al. , 2000 ) , ANN-Spec ( Workman and Stormo , 2000 ) , BioProspector ( Liu et al. , 2001 ) , Improbizer ( Ao et al. , 2004 ) , MDScan ( Liu et al. , 2002 ) , MEME ( Bailey and Elkan , 1995 ) and MotifSampler ( Thijs et al. , 2001 ) .
The details of these component motif ﬁnders and parameter descriptions can be found in Supplementary Material 10 .
Section 3.1 formally describes the discriminative and consensus criteria used in the motif ﬁltering step .
Then , in Section 3.2 , we present the heuristic algorithm for ﬁnding a cluster of motifs that satisfy the two criteria .
The details of the sites extraction step are given in Section 3.3 .
Finally , Section 3.4 describes the performance evaluation measures we used for benchmarking MotifVoter against other existing methods .
3.1 The discriminative and consensus criteria
Given two motifs x and y , we ﬁrst describe how to measure their similarity based on the binding sites deﬁned by these motifs .
Let I ( x ) be the set of binding sites deﬁned by motif x. Let I ( x ) ∩ I ( y ) be the set of regions covered by at least one site in x and one site in y. Let I ( x ) ∪ I ( y ) be the set of regions covered by any site of x or y .
We denote the total number of nucleotides of all the regions in I ( x ) ∩ I ( y ) and I ( x ) ∪ I ( y ) , as | I ( x ) ∩ I ( y ) | and | I ( x ) ∪ I ( y ) | , respectively .
The similarity of x and y , denoted sim ( x , y ) , is expressed as | I ( x ) ∩ I ( y ) | / | I ( x ) ∪ I ( y ) | .
Note that 0 ≤ sim ( x , y ) ≤ 1 and sim ( x , x ) = 1 .
Now , we deﬁne the scoring function to capture the discriminative criterion .
Given m motif ﬁnders and each motif ﬁnder reports its top-n candidate motifs , there will be a set P of mn candidate motifs .
Among all the candidate motifs in P , some of them will approximate the real motif while the other will not .
We would like to identify the subset X of P such that the candidate motifs in X are likely to approximate the real motif .
The principle idea is that if the candidate motifs in X can model the real motif , motifs in X should be highly similar while motifs outside X should be distant from one another ( discriminative criteria ) .
Let X be some subset of candidate motifs of P .
The mean similarity among the candidate motifs in X , denoted as sim ( X ) , is deﬁned as : ∑
2 ( ( sim ( x , y ) − sim ( X ) ) x , y ∈ X The function w ( X ) measures the degree of similarity among the candidate motifs in X .
If many of the candidate motifs in X approximate the real motif , we should expect to have a high w ( X ) .
On the other hand , we expect the complement of X , that is P − X , should have a low w ( P − X ) .
Thus , w ( P − X ) constitutes the discriminative criterion in the clustering procedure .
In other words , if X is the set of candidate motifs which approximates the real motif , we expect to have a high A ( X ) score , where : w ( P − X ) Note that there may be multiple sets of X with the same A ( X ) score .
Among those X 's , we would select one that contains maximum number of motif ﬁnders .
The latter constitutes the consensus criterion .
In summary , this stage aims to ﬁnd X ⊆ P which maximizes A ( X ) while X contains the candidate motifs predicted by maximum number of motif ﬁnders .
The naive method to identify X is to enumerate all possible X and check if they satisfy the above two criteria .
However , this approach is computationally infeasible .
In the next section , we describe our proposed heuristics to identify X to overcome this difﬁculty .
3.2 Heuristics for motif ﬁltering
In this subsection , we describe the heuristic algorithm for identifying X , the set of similar candidate motifs in the ﬁrst stage .
Let P be all motifs found by m motif ﬁnders , where each motif ﬁnder returns n motifs .
Steps 1 -- 3 compute the pairwise similarity scores for all pairs of motifs .
Based on these similarity scores , we apply the following heuristics approach to ﬁnd X ( Steps 4 -- 9 ) .
Instead of enumerating all possible subsets in P which takes exponential time , we only consider subsets Xz , j = { z , p1 , ... , pj } for every z ∈ P and for every 1 ≤ j ≤ | P | − 1 where sim ( z , pi ) > sim ( z , pi +1 ) and pi ∈ P .
The rationale behind examining only these subsets is as follows .
For each z ∈ P , let a and b be two other motifs such that sim ( a , z ) > sim ( b , z ) .
That is , a is more similar to z than b. Then , it is likely that A ( { z , a , b } ) > A ( { z , b } ) .
By including pj in the subset , we also include all pi with i < j. For each X , we compute the value of A ( X ) and the number of motif z , j z , j ﬁnders contributing to Xz , j , denoted as Q ( Xz , j ) .
Finally , we set X to Xz , j with maximum A ( Xz , j ) value .
If there are more than one such subset , pick the one with the largest Q ( Xz , j ) .
In case of a tie again , we pick one randomly .
The time complexity of heuristics can be shown to be O ( m2n2 ) as follows .
There are | P | = mn motifs .
For each motif z , we consider mn different
Output : PWM of the aligned sites 1 : for each z , y ∈ P do 2 : compute sim ( z , y ) 3 : end for 4 : for each z ∈ P do 5 : sort p1 , p2 , ... , pk such that sim ( z , pi ) ≥ sim ( z , pi +1 ) ; ∀ i = 1 . .
k 6 : consider sets X = { z , p , ... , p } ; ∀ j = 1 , ... , k z , j 1 j 7 : compute A ( Xz , j ) for all such Xz , j 8 : set Q ( Xz , j ) = number of motif ﬁnders contributing to Xz , j 9 : end for = 10 : set X Xz , j with the maximum A ( Xz , j ) score , if there are two such Xz , j 's , pick the one with the largest Q ( Xz , j ) 11 : extract and align sites of motifs in X 12 : construct PWM
Xz , j subsets .
For each subset , we need to compute the value of A ( Xz , j ) for all j. Note that we can obtain the value of A ( Xz , j ) from the value of A ( Xz , j − 1 ) in constant time .
So , for each motif z , to compute all values of A ( Xz , j ) , it takes O ( mn ) time .
Therefore , the overall time complexity is O ( m2n2 ) .
For the actual running time of the heuristics , please refer to Supplementary Material 5 .
3.3 Sites extraction
From X , we obtain the list of sites using two requirements .
First , we accept all regions which are covered by sites deﬁned by at least two motifs x and y in X , where x and y are predicted by two different motif ﬁnders .
The reason behind is that it is unlikely that several motif ﬁnders predict the same spurious binding sites .
Second , we accept all the sites of the motif x in X with the highest conﬁdence in approximating real motif .
The conﬁdence score is deﬁned as follows .
Let B ( x ) be the total number of nucleotides covered by the sites deﬁned by motif x. Let O ( x ) be the total number of nucleotides covered by the sites deﬁned by motif x and all other motifs in X .
The conﬁdent score of motif x is deﬁned as O ( x ) / B ( x ) .
For the selected sites that are covered by more than one motif ﬁnder , we further apply a post-processing procedure to reﬁne each site by removing the nucleotides that are only covered by a single ﬁnder to increase the precision of our prediction as these nucleotides are likely to be noise .
Given all the sites predicted by MotifVoter , we generate a PWM motif to model the motif .
To achieve this , we ﬁrst align these sites using MUSCLE ( Edgar , 2004 ) , then a PWM is generated from this alignment to model the motif .
Figure 1c provides an illustration of this process .
3.4 Performance evaluation measures
For performance evaluation , we use the same four measures proposed in ( Tompa et al. , 2005 ) namely , sensitivity ( nSN ) , positive predictive value1 ( nPPV ) , performance coefﬁcient ( nPC ) and correlation coefﬁcient ( nCC ) .
Index n is used to denote that the assessment is done at the nucleotide level instead of site level .
The deﬁnitions of these performance measures can be found in Supplementary Material 4 .
For ChIP-Chip datasets which do not have sites position information , we use sum of squared distance ( SSD ) to evaluate the performance of motif ﬁnders .
It is one of the most widely used methods for comparing PWMs ( Mahony et al. , 2007 ) .
Let L be length of two aligned motifs X and Y , SSD is deﬁned as :
1Throughout the article , we use the term precision instead of positive predictive value .
where pxi ( b ) and pyi ( b ) are the probabilities of base b occurring at position i in motif X and Y , respectively .
Note that 0 ≤ SSD ( X , Y ) ≤ 2 and SSD ( X , X ) = 2 .
If the motifs are of different length , we use the shorter motif to align different regions of the longer motif and obtain the maximum SSD value .
4 EXPERIMENTAL RESULTS
There is no universal benchmark dataset for evaluating a motif ﬁnder .
We thus performed an extensive evaluation on MotifVoter over 326 datasets from Tompa 's benchmark , E.coli and ChIP-Chip datasets .
In our experiments , we consider the top-30 ( i.e. n = 30 ) motifs reported by each individual motif ﬁnder .
For the effect of different values of n , please refer to Supplementary Material 6 .
4.1 Comparison using Tompa’s benchmark dataset
This section compares MotifVoter with existing methods using Tompa 's benchmark datasets .
The datasets are constructed based on real transcription factor binding sites drawn from four different organisms ( human , fruitﬂy , mouse and yeast ) .
The background sequences are based on three categories : ( 1 ) real upstream sequences ( real ) , ( 2 ) randomly chosen upstream sequences with binding sites inserted ( generic ) and ( 3 ) sequences randomly generated by a Markov chain with binding sites inserted ( markov ) .
It consists of 56 datasets .
The number of sequences per dataset ranges from 1 to 35 and the sequence lengths are up to 3000 bp ( Tompa et al. , 2005 ) .
Such diverse characteristics of the benchmark make it a good candidate for evaluating the robustness of a motif ﬁnder .
The graph in Figure 2a shows the average performance of all stand-alone and ensemble motif ﬁnders based on four performance evaluation measures [ see also Table 1 for the actual ﬁgures of sensitivity ( nSN ) and precision ( nPPV ) ] using Tompa 's benchmark dataset .
Among the stand-alone motif ﬁnders , Weeder and SPACE perform somewhat similar and outperform the other 15 stand-alone motif ﬁnders by all four measures .
However , integration of motif ﬁnders enables MotifVoter to further enhance the performance .
The improvement gained over SPACE is 215 % in sensitivity and 45 % in precision .
We believe that MotifVoter is able to piece together different subregions of sites identiﬁed by the stand-alone motif ﬁnders , thus yielding signiﬁcant improvement in sensitivity .
Furthermore , since MotifVoter imposes strict discriminative and consensus criteria on the clustering procedure , we get signiﬁcant improvement in precision also .
For ensemble methods , MotifVoter outperforms the sensitivity of BEST and precision of SCOPE by 54.1 % and 226.2 % , respectively .
We can not directly evaluate RGSMiner and MultiFinder because of their limited applicability .
For WebMotifs , it only takes probe names as input , so we only evaluate WebMotifs on ChIP-Chip dataset ( see Section 4.3 ) .
In principle , these three ensemble methods select one motif out of all motifs reported by the component motif ﬁnders .
Although we are not able to run these ﬁnders on the dataset , to predict the upper bound of the performance of these ﬁnders , we can select the most sensitive and precise motif for each dataset .
We found that even if we do so , the sensitivity is at least 48.6 % lower than MotifVoter and the precision is at least 41.2 % lower .
For EMD , a direct evaluation is done in the next subsection on E.coli dataset .
However , in this benchmark if we cluster the motifs of the same rank given by 10 motif ﬁnders and pick the most sensitive cluster , the highest possible sensitivity is 21.3 % lower than MotifVoter .
The main reason why MotifVoter yields better performance is because apart from the ability to capture true sites from various motif ﬁnders , it also includes true sites that may come from different ranks .
4.2 Comparison using E.coli dataset
Despite its comprehensive assortments of datasets , Tompa 's benchmark does not fully represent real biological settings since it still contains synthetic sequences ( markov ) and mixture of real and synthetic ( generic ) sequences .
Here we evaluate MotifVoter using real E.coli datasets .
This species is not included in Tompa 's benchmark .
The datasets are taken from RegulonDB ( Salgado et al. , 2004 ) ( http://regulondb.ccg.unam.mx/ ) , and the sequences are generated from the intergenic regions of E.coli genome .
In total , there are 62 datasets .
The average number of sequences per dataset is 12 and the average sequence length is 300 bp .
The graph in Figure 2b shows the average performance of all stand-alone and ensemble motif ﬁnders based on four performance evaluation measures [ see also Table 1 for the actual ﬁgures of
The nSN and nPPV for stand-alone motif ﬁnders in Tompa 's benchmark are taken directly from Tompa et al. ( 2005 ) .
For E.coli dataset , we are unable to obtain the result of a few stand-alone motif ﬁnders ( marked with dash ) , because either these programs are not available or the output does not give binding sites for us to evaluate the results ( e.g YMF ) , during the preparation of this article .
Motif ﬁnders marked with asterisks ( * ) are ensemble methods .
Note that for some ensemble methods , we are not able to execute on the given datasets , we only estimate their upper bound ( see Section 4.1 for details on how we obtain the upper bound ) .
sensitivity ( nSN ) and precision ( nPPV ) ] using the E.coli dataset .
MotifVoter consistently outperforms the stand-alone motif ﬁnders by about 168 % in terms of sensitivity .
For precision , although the improvement is not so much as for sensitivity , MotifVoter still yields a signiﬁcant improvement over stand-alone motif ﬁnders exceeding the highest precision by 80.9 % .
Three ensemble methods ( SCOPE , BEST , EMD ) evaluated on this dataset outperform the stand-alone algorithms in terms of sensitivity .
However , even for the most sensitive ensemble method ( EMD ) among the three , the improvement is not signiﬁcant .
On the other hand , MotifVoter outperforms the EMD ensemble method by 130.2 % in sensitivity .
As for precision , MotifVoter also consistently performs better with a 45.9 % improvement over the best ensemble method ( SCOPE ) .
4.3 Application on ChIP-Chip datasets
In this section , we evaluate MotifVoter on yeast and mammalian ChIP-Chip datasets .
These datasets provide an ideal platform for assessing the scalability and applicability of MotifVoter to the entire genome .
The yeast ChIP-Chip datasets and TF proﬁles are obtained from Harbison et al. ( 2004 ) ( http://fraenkel.mit.edu/Harbison/ ) .
For mammalian datasets , we evaluate nine transcription factors : CREB ( Zhang et al. , 2005 ) , E2F ( Ren et al. , 2002 ) , HNF4/HNF6 ( Odom et al. , 2004 ) , MYOD/MYOG ( Cao et al. , 2006 ) , NFKB ( Schrieber et al. , 2006 ) , NOTCH ( Palomero et al. , 2006 ) , SOX ( Boyer et al. , 2005 ) .
Further details about the yeast and mammalian datasets sources can be found in Supplementary Material 1 .
Out of 65 cases , MotifVoter can identify 56 motifs .
As for the missing ones , we found that the individual ﬁnders do not perform well .
Figure 3 shows the evaluation using SSD values of the stand-alone and ensemble motif ﬁnders based on the 56 successful cases .
Note that SSD values can give an estimation of how close the prediction of one motif ﬁnder is to the real PWM in each dataset .
Table 2 shows the average SSD performance of each motif ﬁnder .
On yeast ChIP-Chip datasets , among all stand-alone and ensemble motif ﬁnders , Improbizer has the highest average SSD value ( 1.639 ) .
MotifVoter outperforms Improbizer with an average SSD value of 1.919 .
Weblogos of the actual motifs predicted by MotifVoter on these datasets can be found in Supplementary Material 3 .
Figure 4 shows the nine PWMs found by MotifVoter in agreement with canonical motifs in mammalian datasets .
Evaluation on MotifVoter shows that it has an average SSD value of 1.824 , while the best performing existing motif ﬁnder achieves an average SSD value of 1.530 .
SSD of MotifVoter on mammalian datasets is lower than that on yeast since the datasets of mammalian genome are much more complex than yeast .
Note that the SSD score is at most 2.
5 DISCUSSION
This section further investigates the effectiveness of MotifVoter in terms of the following aspects : ( 1 ) the optimality of MotifVoter ; ( 2 ) the effects of the discriminative and consensus criteria used in MotifVoter and ( 3 ) the robustness of MotifVoter and the running time of MotifVoter .
5.1 Optimality of MotifVoter
In general , the performance of any ensemble method is bounded by its component motif ﬁnders .
The total number of true binding sites that are covered by any of its component motif ﬁnders can be regarded as a rough upper bound on the performance of an ensemble method .
We say an ensemble motif ﬁnder is - optimal if it can ﬁnd fraction of the true sites covered by at least one of its component motif ﬁnders .
Evaluation on Tompa 's benchmark shows that the highest possible sensitivity that can be achieved all by the component motif ﬁnders is 0.440 .
The sensitivity of MotifVoter is 0.419 which is 95.2 % of this upper bound .
In E.coli dataset , the highest possible sensitivity is 0.643 and MotifVoter achieves 95.7 % .
This empirical study shows that MotifVoter is a near optimal ensemble method .
5.2 Analysis of the effects of MotifVoter’s criteria
The discriminative criterion of MotifVoter makes use of the information provided by the false positive ( spurious ) motifs/sites to guide the clustering of true motifs/sites .
This information is not fully utilized by existing ensemble methods .
The consensus criterion in MotifVoter requires the cluster to be globally contributed by as many motif ﬁnders as possible .
This can enhance the conﬁdence that the cluster contains good motifs/sites .
These two key ideas distinguish our ensemble method from the existing ones .
This subsection experimentally shows that both criteria in MotifVoter are equally important in their own right .
Figure 5a shows the evaluation results on Tompa 's benchmark datasets .
Sensitivity of MotifVoter without both criteria ( Case IV ) is 80 % lower than BEST and precision is 15 % lower than Weeder .
Similarly in Figure 5b for E.coli , when neither discriminative nor consensus criteria are implemented , the precision of MotifVoter is lower than that of SCOPE .
In contrast , by including both criteria MotifVoter can improve the sensitivity by 163.2 % and precision by 45.9 % .
5.3 Robustness and time complexity MotifVoter
We also evaluate the performance of MotifVoter on various background sequences and various species in Tompa 's benchmark .
MotifVoter achieved the highest nSN and nPPV on all three backgrounds .
The major sensitivity improvement is on real datasets ( 275 % ) , followed by generic datasets ( 128 % ) .
As for evaluation on three species , MotifVoter made major sensitivity improvement on human dataset ( 314 % ) followed by fruitﬂy ( 263 % ) , while the least improvement was made on yeast datasets ( 84 % ) .
MotifVoter relies on its component motif ﬁnders .
Thus , a natural question is whether the performance of MotifVoter will degrade if we include some motif ﬁnders with poor performance .
The performance of MotifVoter does degrade as more random motif ﬁnders ( representing motif ﬁnders with poor performance ) are included .
However , even if we include ﬁve random motif ﬁnders ( that is half of the real motif ﬁnders we used ) , the performance of MotifVoter is still greater than that of the best individual motif ﬁnder by 183 % in sensitivity and by 10.4 % in precision .
These results imply that MotifVoter is robust even if some of the component motif ﬁnders perform badly .
We remark that it is possible that MotifVoter may predict wrong motifs if almost every individual ﬁnder reports similar wrong motifs .
However , MotifVoter can reduce the chance of ﬁnding such wrong motifs signiﬁcantly in comparison to individual ﬁnders because the chance of having many individual ﬁnders to report the same ( similar ) spurious motif is low since every ﬁnder has its own scheme of predicting the motifs .
Running all 10 component motif ﬁnders for MotifVoter is not always practical .
We show that the sensitivity and precision of MotifVoter are still signiﬁcantly better than the best motif ﬁnder when we run only the ﬁve fastest motif ﬁnders .
The details of the above robustness evaluations are shown in Supplementary Material 5 .
In Supplementary Material 8 , we evaluate and discuss the characteristics of binding sites missed by MotifVoter .
We also show that MotifVoter can identify motifs in motif modules that contain multiple motifs work in groups , provided that these motifs have signals strong enough to be detected by the individual motif ﬁnders used in MotifVoter .
Details can be found in Supplementary Material 9 .
6 CONCLUSIONS
We have presented MotifVoter , a simple yet effective ensemble method that utilizes both positive and negative information in the motifs returned by the basic motif ﬁnders and achieves near-optimal sensitivity .
In extensive evaluations on many ChIP-Chip and benchmark datasets , we observed that MotifVoter outperforms all stand-alone and existing ensemble motif ﬁnders signiﬁcantly .
It is also robust with respect to the inclusion of random motif ﬁnders and number of component motif ﬁnders .
A key advantage of MotifVoter is its ﬂexibility in incorporating new algorithms .
That is , if a novel superior motif ﬁnding algorithm is made available , it can be readily incorporated into the MotifVoter platform .
In the website version , we provide an option for users to provide results from the component motif ﬁnders that are not included in our main list .
Furthermore , MotifVoter , as a web application , has several biologist friendly features such as a parameter-free interface , ability for users to choose their preferred component motif ﬁnders and option for speedier return of results using the fastest motif ﬁnders .
ACKNOWLEDGEMENTS
We are grateful to the authors of the component motif ﬁnders used by MotifVoter for making their tools available for public use .
We thank C. Dongsheng for providing command line version of BEST .
Finally , the authors would like to thank the reviewers for all their useful and constructive comments .
Funding : National University of Singapore ( grant R-252-000-326-112 ) ; Research Output Prize ( Faculty of Engineering ) of the University of HongKong to S.M.Y.