Retrospective Application of Transposon-Directed Insertion Site Sequencing to a Library of Signature-Tagged Mini-Tn5Km2 Mutants of Escherichia coli O157:H7 Screened in Cattle†
Sabine E. Eckert,1‡ Francis Dziva,2‡ Roy R. Chaudhuri,3‡ Gemma C. Langridge,1‡ Daniel J. Turner,1§ Derek J. Pickard,1 Duncan J. Maskell,3 Nicholas R. Thomson,1 and Mark P. Stevens4*
The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, United Kingdom1; Enteric Bacterial Pathogens Laboratory, Institute for Animal Health, Compton, Berkshire RG20 7NN, United Kingdom2; Department of Veterinary Medicine, University of Cambridge, Madingley Road, Cambridge CB3 0ES, United Kingdom3; and Roslin Institute and Royal (Dick) School of Veterinary Studies, University of Edinburgh, Bush Farm Road, Roslin, Midlothian EH25 9RG, United Kingdom4
* Corresponding author. Mailing address: Roslin Institute and Royal (Dick) School of Veterinary Studies, University of Edinburgh, Bush Farm Road, Roslin, Midlothian EH25 9RG, United Kingdom. Phone: 44 131 527 4200. Fax: 44 131 440 0434. E-mail: Mark.Stevens@roslin.ed.ac.uk.
§ Present address: Oxford Nanopore Technologies, 4 Robert Robinson Way, Magdalen Science Park, Oxford OX4 4GA, United Kingdom.
‡ Contributed equally to the study.
† Supplemental material for this article may be found at http://jb.asm.org/.
Published ahead of print on 28 January 2011.
Enterohemorrhagic Escherichia coli (EHEC) strains comprise a subset of Shiga toxin-producing E. coli strains that cause acute enteritis in humans (2).
Infections may be complicated by severe sequelae and are frequently acquired via contact with ruminant feces.
The molecular mechanisms underlying colonization of the ruminant intestines by EHEC are incompletely understood.
Previously, we screened a library of 1,900 EHEC O157:H7 mutants for their ability to colonize bovine intestines by signature-tagged mutagenesis (STM) (6).
STM relies on a panel of transposons harboring unique oligonucleotide tags.
The tags can be detected by amplification and hybridization, enabling the composition of complex pools to be analyzed before and after inoculation of animals.
Mutants that are negatively selected in vivo relative to the inoculum are inferred to lack a gene required for colonization or survival, which can be identified by isolation and sequencing of transposon-flanking regions (16).
Our analysis focused on the prototype E. coli O157:H7 strain EDL933, for which the chromosome and plasmid sequences are known (1, 18).
Of the 1,900 signature-tagged mutants screened, 101 were underrepresented in pools recovered from feces 5 days postinoculation of calves (6).
The transposon insertion site could be mapped in 79 such mutants, identifying 59 different genes influencing colonization (6).
Thirteen attenuating mutations were mapped to the locus of enterocyte effacement (LEE), which encodes a type III secretion system (T3SS) required for the formation of "attaching and effacing" lesions.
The role of T3SS components in intestinal colonization was subsequently confirmed with defined mutants (6, 17) and by screening of 480 signature-tagged mutants of EHEC O26:H− from calves (27).
STM also detected attenuating mutations in genes encoding secreted substrates of the T3SS (espD, map, and nleD) (6).
Though STM has provided valuable insights into the genetic basis of virulence of microbes, it is limited by the number of unique tags and the effort required to construct libraries and map attenuating mutations.
Moreover, only negatively selected mutants tend to be investigated and subjective judgments are required to compare signal intensities relative to the input and coscreened mutants.
Functional annotation of the E. coli O157:H7 genome in reservoir hosts is further hindered by the cost of using large animals at a high level of disease containment.
Recently, several protocols have been described that permit the simultaneous assignment of the genotype and fitness score for mutants screened in pools.
Transposon-directed insertion-site sequencing (TraDIS) exploits Illumina sequencing to obtain the sequence flanking each transposon insertion (11).
The massively parallel nature of such sequencing permits comparison of the number of specific reads derived from inocula and output pools recovered from animals, providing a numerical measure of the extent to which mutants were selected in vivo.
TraDIS obviates the need to construct and array uniquely tagged mutants and to subclone and sequence attenuating mutations, yielding substantial time and cost savings.
TraDIS-like methods have defined the essential gene complement of Salmonella enterica serovar Typhi (11) and Streptococcus pneumoniae (28) and have identified genes influencing Haemophilus influenzae pathogenesis (7) and survival of the gut symbiont Bacteroides thetaiotaomicron (8).
We retrospectively applied TraDIS to assign the genotype and fitness score of EDL933 mutants previously screened in calves.
This required the massively parallel sequencing of transposon-flanking regions in the input and output pools of EDL933 mini-Tn5Km2 mutants obtained by Dziva et al. (6), as schematically shown in Fig. 1.
Adequate genomic DNA was retrieved for 19 of the mutant pools screened, comprising a total of 1,805 mutants.
Genomic DNA from each input and output sample was quantified with a Nanodrop ND-1000 spectrophotometer (Thermo Fisher, Loughborough, United Kingdom).
Equal amounts (1 μg) from all input and all output samples were pooled, and input and output pools were separately fragmented by ultrasonication with a Covaris adaptive focused acoustics instrument, to an average of 200 bp (19).
Fragment libraries were prepared with the Illumina paired-end DNA sample preparation kit (PE-102-1001; Illumina, Little Chesterford, United Kingdom), according to the manufacturer's instructions, and quantified on an Agilent DNA1000 chip (Agilent, South Queensferry, United Kingdom).
To form double-strand adapters, oligonucleotides Ind_Ad_T (5′-ACACTCTTTCCCTACACGACGCTCTTCCGATC*T-3′, where the asterisk represents phosphorothioate) and Ind_Ad_B (5′-pGATCGGAAGAGCGGTTCAGCAGGAATGCCGAGACCGATCTC-3′) were annealed.
The input and output DNA was ligated to the double-strand adapters and then quantified by quantitative PCR (qPCR) using the primers Ad_T_qPCR1 (5′-CTTTCCCTACACGACGCTCTTC-3′) and Ad_B_qPCR2 (5′-ATTCCTGCTGAACCGCTCTTC-3′) and SYBR green (Applied Biosystems, Warrington, United Kingdom).
Two hundred nanograms of adapter-ligated fragments was used to specifically amplify transposon insertion sites.
Twenty-four cycles of PCR were performed with transposon-specific forward primer MiniTn5-P5-3pr-3 (5′-AATGATACGGCGACCACCGAGATCTACACCTAGGCTGCGGCTGCACTTGTG-3′), which contains the Illumina P5 end for attachment to the flow cell, and reverse primer RInV3.3 (5′-CAAGCAGAAGACGGCATACGAGATCGGTACACTCTTTCCCTACACGACGCTCTTCCGATCT-3′, containing the Illumina P7 end).
PCR products were size separated on an agarose gel, and fragments of 350 to 450 bp were excised and recovered with QiaExII gel extraction columns (Qiagen, Crawley, United Kingdom) following the manufacturer's instructions, but without heating (19).
DNA was eluted in 30 μl of elution buffer and quantified by qPCR with standards of known concentration, using primers Syb_FP5 (5′-ATGATACGGCGACCACCGAG-3′) and Syb_RP7 (5′-CAAGCAGAAGACGGCATACGAG-3′) (19).
The DNA fragment libraries were sequenced for 37 cycles according to the manufacturer's instructions on single-end flow cells by an Illumina GAII sequencer, using the custom sequencing primer MiniTn5-3pr-seq3 (5′-TAGGCTGCGGCTGCACTTGTGTA-3′), which binds 10 bp from the transposon end.
There were 12.6 and 13.3 million reads obtained for the input and output pools, respectively (European Nucleotide Archive accession no. ERP000368).
Totals of 12.1 million (96.3%) of the input reads and 12.4 million (93.7%) of the output reads contained perfect matches to the 3′ end of mini-Tn5Km2 (3), and these reads were included in downstream analyses.
Transposon-derived sequence was removed from each read with a custom Perl script available from the authors.
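The read-filtering and trimming step can be illustrated with a minimal Python sketch. It is not the authors' Perl script. It assumes that the transposon-derived sequence sits at the start of each read and has a fixed length; the 10-base string TN_END is a hypothetical placeholder, not the real mini-Tn5Km2 end sequence, and strip_transposon is a name introduced here for illustration.

```python
from typing import Optional

# Hypothetical placeholder for the transposon end expected at the start of
# each read; the real mini-Tn5Km2 sequence is not given in the text.
TN_END = "ACGTACGTAC"

def strip_transposon(read: str, tn_end: str = TN_END) -> Optional[str]:
    """Return the genomic part of a read, or None when the read does not
    begin with a perfect match to the transposon end."""
    if not read.startswith(tn_end):
        return None
    return read[len(tn_end):]

# Toy reads: one perfect match, one with a mismatch, one unrelated.
reads = ["ACGTACGTACGGATTCCA", "ACGTACGAACGGATTCCA", "TTTT"]
genomic = [g for g in (strip_transposon(r) for r in reads) if g]
print(genomic)
```

Only reads passing the perfect-match test contribute a trimmed sequence for mapping; the others are dropped, mirroring the inclusion rule described above.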
The remainder of each sequence read was mapped to the EDL933 chromosome and pO157 with NovoAlign (Novocraft Technologies Sdn Bhd, Selangor, Malaysia).
Totals of 9.9 million input reads (78.4%) and 10.7 million output reads (80.6%) were mapped to unique positions in the EDL933 genome.
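Once reads are mapped, the quantity used downstream is the number of uniquely mapped reads at each insertion location. A minimal sketch of that tally follows; the record layout (replicon, position, strand, uniqueness flag) is an assumption made for illustration and is not the NovoAlign output format, and count_insertions is a hypothetical helper.

```python
from collections import Counter

# Hypothetical alignment records: (replicon, position, strand, is_unique).
alignments = [
    ("chromosome", 1200, "+", True),
    ("chromosome", 1200, "+", True),
    ("chromosome", 5400, "-", True),
    ("pO157", 310, "+", False),  # maps to several positions, so it is skipped
    ("pO157", 820, "+", True),
]

def count_insertions(records):
    """Count uniquely mapped reads per (replicon, position, strand)."""
    counts = Counter()
    for replicon, position, strand, is_unique in records:
        if is_unique:
            counts[(replicon, position, strand)] += 1
    return counts

site_counts = count_insertions(alignments)
total_mapped = sum(site_counts.values())  # plays the role of n below
```

Running the same tally separately on the input and output pools gives the per-site counts and the totals of mapped reads that the fitness calculation compares.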
Subsequent analyses were performed with R, version 2.8.0 (R Foundation for Statistical Computing, Vienna, Austria).
To quantify changes in the number of reads arising from specific insertions between the input and output, we adopted an approach suggested for RNA-Seq data analysis (15).
The number of reads at each insertion location (x) was treated as a proportion of the total number of mapped reads (n), and a variance-stabilizing arcsine-root transformation was applied, converting each value of x to n·arcsin(√(x/n)).
The transformed output values were divided by the equivalent input values to determine the fold change.
To avoid infinite values derived from taking the log of 0, sequence counts of 0 were replaced with an arbitrary value of 0.5.
Log2 fold change values were calculated to represent the difference in abundance of each mutant in the output pools relative to the input and provide a measure of fitness.
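The fitness calculation described above can be sketched in a few lines of Python. The original analysis was performed in R; the function names, the order in which the zero substitution and the transformation are applied, and the example counts are assumptions made for illustration.

```python
import math

def arcsine_root(x: float, n: float) -> float:
    """Variance-stabilizing transform of a count x out of n mapped reads."""
    return n * math.asin(math.sqrt(x / n))

def log2_fold_change(x_in, n_in, x_out, n_out, zero_value=0.5):
    """log2 of transformed output over transformed input for one site.
    Counts of 0 are replaced with zero_value to keep the result finite."""
    x_in = x_in if x_in > 0 else zero_value
    x_out = x_out if x_out > 0 else zero_value
    return math.log2(arcsine_root(x_out, n_out) / arcsine_root(x_in, n_in))

# Example with made-up counts: a site that loses most of its reads in vivo.
n_input, n_output = 9_900_000, 10_700_000
print(log2_fold_change(800, n_input, 40, n_output))
# A site absent from the output still yields a finite, negative score.
print(log2_fold_change(800, n_input, 0, n_output))
```

Negative values indicate that an insertion is less abundant in the output pool than in the inoculum; values near 0 indicate no detectable selection.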
In our experience, TraDIS may overpredict the number of insertion sites due to a low-level background signal derived from incorrectly mapped or chimeric reads.
To distinguish genuine inserts from this background signal, predicted insertion sites with fewer than 2^5 (i.e., 32) mapped reads were removed from the data set (see Fig. S1 in the supplemental material).
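As a small illustration of this background filter, the sketch below keeps only sites with at least 32 mapped reads. The text does not state whether the threshold was applied to input counts, output counts, or both, so the single dictionary of counts used here is an assumption, as are the site labels and the name filter_sites.

```python
MIN_READS = 2 ** 5  # 32 mapped reads, the cutoff quoted in the text

def filter_sites(site_counts, min_reads=MIN_READS):
    """Drop predicted insertion sites supported by fewer than min_reads reads."""
    return {site: n for site, n in site_counts.items() if n >= min_reads}

# Hypothetical per-site read counts.
counts = {"site_a": 3, "site_b": 31, "site_c": 32, "site_d": 4500}
kept = filter_sites(counts)
print(sorted(kept))
```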
Of the 1,805 EDL933 mutants screened, TraDIS unambiguously assigned the insertion site and fitness score for 1,645, representing 855 different genes.
Importantly, we assigned the genotype and fitness scores to 91.1% of the mutants analyzed, where previously we only identified the insertion site in 4.2% of mutants owing to the constraints of STM (6).
Insertions were in general well distributed, although there are AT-rich regions where insertions are overrepresented (Fig. 2), as may be expected as mini-Tn5Km2 preferentially inserts at TA dinucleotides.
Table S1 in the supplemental material lists the insertion site and log2 fold change relative to input for each mutation.
Figure S2 in the supplemental material shows a histogram of log2 fold change values obtained for all the mutants.
This distribution was modeled by fitting a bimodal normal distribution using the R package mixdist (13) (Fig. S2).
This model represents the mutants as a mixture of two distinct populations.
Most of the mutants show no attenuation, with no clear change in abundance relative to the input pool and a normal distribution of log2 fold change values with a mean close to 0.
Attenuated mutants show lower log2 fold change values, with a mean of approximately −3.
The model suggests that a log2 fold change of −1 (equivalent to a 2-fold decrease in the abundance of the mutant in the output pool relative to the input) is a suitable cutoff value to identify most of the attenuated mutants while restricting the number of false positives to an acceptable level.
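To make the idea of a two-population model concrete, the sketch below fits a two-component normal mixture with a plain expectation-maximization loop. This is a swapped-in technique for illustration, not the mixdist procedure used in the study, and it runs on synthetic values rather than the real fitness scores. The starting values, component spreads, sample sizes, and the name fit_two_normals are all assumptions.

```python
import math
import random

def normal_pdf(x, mu, sigma):
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def fit_two_normals(values, mu=(-3.0, 0.0), sigma=(1.0, 1.0), weight=0.3,
                    iterations=200):
    """Fit a mixture of two normals by expectation-maximization.
    weight is the mixing proportion of the first (attenuated) component."""
    mu1, mu2 = mu
    s1, s2 = sigma
    w = weight
    n = len(values)
    for _ in range(iterations):
        # E-step: probability that each value belongs to component 1.
        resp = []
        for x in values:
            a = w * normal_pdf(x, mu1, s1)
            b = (1.0 - w) * normal_pdf(x, mu2, s2)
            resp.append(a / (a + b))
        # M-step: re-estimate the parameters from the weighted values.
        r1 = sum(resp)
        r2 = n - r1
        mu1 = sum(r * x for r, x in zip(resp, values)) / r1
        mu2 = sum((1.0 - r) * x for r, x in zip(resp, values)) / r2
        s1 = math.sqrt(sum(r * (x - mu1) ** 2 for r, x in zip(resp, values)) / r1)
        s2 = math.sqrt(sum((1.0 - r) * (x - mu2) ** 2 for r, x in zip(resp, values)) / r2)
        w = r1 / n
    return {"weight": w, "mu": (mu1, mu2), "sigma": (s1, s2)}

# Synthetic log2 fold changes: a minority centered near -3, a majority near 0.
rng = random.Random(0)
values = ([rng.gauss(-3.0, 1.0) for _ in range(300)]
          + [rng.gauss(0.0, 0.6) for _ in range(700)])
fit = fit_two_normals(values)
print(fit)
```

With a fitted mixture in hand, a cutoff such as −1 can be judged by how much of each component lies below it, which is the trade-off between sensitivity and false positives described above.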
Seventy-two insertions were detected by both STM and TraDIS, 86.1% of which were negatively selected in both cases and 72.2% of which showed a decrease of at least 1 log2 fold change by TraDIS (see Table S1 in the supplemental material).
Though STM screening of EDL933 mutants in calves identified 13 attenuating mutations in LEE genes (6) and was considered exhaustive at the time, TraDIS identified 54 insertions in the LEE in 21 different genes.
By TraDIS, all LEE mutants were negatively selected, except those with insertions in rorf1 or the region between ler and espG (Fig. 3).
Insertions in the LEE-flanking regions were not attenuating.
Mutations in predicted T3SS structural components were strongly negatively selected, with the exception of a single insertion in a gene of unknown function (rorf8).
Several LEE genes were disrupted many times, producing comparable fitness scores.
Variance in the scores for a given gene may reflect differences in competition dynamics in the pools in which the mutants were screened.
TraDIS found 5 attenuating mutations in eae, encoding intimin, and 3 mutations in tir, encoding the translocated intimin receptor.
These were missed by STM, even though intimin and Tir play key roles in intestinal colonization of cattle by E. coli O157:H7 (22, 29).
TraDIS also identified mutations in 29 of the 39 type III secreted effectors of E. coli O157:H7 verified by Tobe et al. (26) (see Table S2 in the supplemental material).
Mutants with insertions in several LEE-encoded effectors (EspF, EspB, Tir, Map, EspH, and EspZ) were all negatively selected, consistent with the role of such effectors in intestinal persistence of Citrobacter rodentium in mice (4) and E. coli O157:H7 in rabbits (20).
Of the non-LEE-encoded effectors, several appeared to play little or no role (e.g., NleG, NleH, EspY1, and EspY4) (Table S2), whereas mutations in the genes coding for the others were attenuating.
Among the latter was z1829, encoding EspK, an effector missed by STM but which influences persistence of EHEC in calves (27, 30).
Though several effector phenotypes have been independently verified, we caution that some attenuating mutations identified by STM could not be reproduced when mutants were tested in isolation (e.g., map) (6) or by coinfection with the parent strain (e.g., nleD) (14), possibly due to the distinct selection pressure exerted by combining 95 mutants during the library screen.
Analysis of signature-tagged mutants of EHEC O26:H− in calves indicated that the cytotoxins EspP and enterohemolysin may promote intestinal colonization (27).
Though mutants with defects in these genes were not detected in the EDL933 STM screen (6), TraDIS revealed that several such mutants were represented in the library and were generally negatively selected in calves.
Three of four EDL933 espP mutants were attenuated by TraDIS (see Table S1 in the supplemental material), consistent with the modest attenuation of a defined espP mutant in calves (5).
Nine of 11 mutants with defects in the enterohemolysin (EHEC-hly) operon were negatively selected by TraDIS, supporting the attenuation of an ehxA mutant of EHEC O26:H− in calves (27).
EhxA appears not to play a significant role in rectal colonization in steers (22); however, the latter study involved rectal application of the mutant to ruminant steers, without passage through the intestines.
Eight insertions were detected in l7031/tagA, which encodes a zinc metalloprotease that cleaves C1-esterase inhibitor (StcE) (12), promotes adherence (9), and modulates neutrophil function (25).
StcE mutants were generally underrepresented in calves, as were mutants with insertions in the EtpCD type II secretion system required for StcE secretion, consistent with the role of this system in colonization of rabbits (10).
Seventeen mutations were detected in the gene encoding the large clostridial toxin homolog L7095/ToxB, though only 7 were negatively selected by greater than 1 log2 fold change.
This relatively weak phenotype is consistent with the phenotype of a defined E. coli O157:H7 toxB mutant in calves (24).
Other genes carried by pO157 that were missed by STM but putatively linked to colonization by TraDIS include katP (catalase-peroxidase), l7029/msbB (lipid A myristoyl transferase), and a gene of the linked ecf operon (l7026).
TraDIS faithfully reproduced the fitness defect of mutants detected by STM that are impaired in O-antigen biosynthesis (e.g., manC, per, wbdP, and wzy), consistent with the phenotype of an E. coli O157:H7 perosamine synthetase (per) mutant in steers (23).
It also identified other attenuating mutations missed by STM that affect this process, as well as other pathways implicated in bacterial survival in vivo, such as aromatic amino acid biosynthesis (aroA) and iron storage (ftn).
Of further interest, TraDIS identified an attenuating mutation in the catalytic subunit of Shiga toxin 1 (stx1A).
Previously, STM identified an attenuating mutation downstream of the toxin genes in prophage CP-933V but upstream of those involved in bacterial lysis.
The attenuation of the stx1A mutant supports the finding that Stx1 promotes intestinal colonization of mice by E. coli O157:H7 (21).
In common with other methods for screening pools of random mutants, TraDIS describes single gene-phenotype relationships and does not account for functional redundancy.
Rarely, mutants may also contain more than one transposon insertion, harbor a secondary mutation of another kind, or possess polar insertions affecting the expression of nearby genes.
These limitations impose a formal requirement to confirm mutant phenotypes via the evaluation of nonpolar mutant and repaired or trans-complemented strains.
The number of mutants that can be simultaneously screened will also be constrained by the requirement to obtain an output pool of an adequate size at a time postinoculation sufficient for attenuation to be evident.
It is estimated that if 100 mutants are screened, the output pool must comprise at least 10,000 colonies in order to state with 95% confidence that specific mutants are absent due to attenuation as opposed to chance (6).
Moreover, at high pool complexities, stochastic loss of mutants may occur if the number of mutants exceeds a "bottleneck" above which individual mutants in the population no longer have an equal chance of establishing themselves in the host.
Such limitations are balanced by the ability of massively parallel sequencing of mutant libraries to derive vastly richer functional annotation of pathogen genomes than can be obtained by earlier methods.
In conclusion, TraDIS validated and substantially extended our analysis of signature-tagged E. coli O157:H7 mutants in cattle.
It described the genotype and fitness score for 91.1% of mutants screened, unlocking hundreds of novel phenotypes with no further animal use.
It represents a significant advance toward the principles of reduction, refinement, and replacement of animals in research and is relatively inexpensive to apply de novo or retrospectively.
The procedures described herein relate to transposons that have been extensively used in other microbes (reviewed in reference 16) and can therefore be widely applied to derive quantitative data for functional annotation of microbial genomes.
We gratefully acknowledge the support of DEFRA (grant OZ0707), the BBSRC (grants D017556 and D017947), and the Wellcome Trust.