Devon M. Fitzgerald 1 , Carol Smith 2, Pascal Lapierre 2, and Joseph T. Wade 1,2,3
This article has been accepted for publication and undergone full peer review but has not been through the copyediting , typesetting , pagination and proofreading process which may lead to differences between this version and the Version of Record .
Please cite this article as an ` Accepted Article ' , doi : 10.1111/mmi.13941 .
This article is protected by copyright . All rights reserved .
Recent findings have identified thousands of bacterial promoters in unexpected locations , such as inside genes .
Here , we investigate the functions of intragenic promoters for the flagellar sigma factor FliA .
Our data suggest that most of these promoters are not functional , but that one intragenic FliA promoter is broadly conserved , and constrains evolution of the overlapping protein-coding gene .
Our data suggest that intragenic regulatory
ABSTRACT
In Escherichia coli , one Sigma factor recognizes the majority of promoters , and six `` alternative '' Sigma factors recognize specific subsets of promoters .
The alternative Sigma factor FliA ( σ28 ) recognizes promoters upstream of many flagellar genes .
We previously showed that most E. coli FliA binding sites are located inside genes .
However , it was unclear whether these intragenic binding sites represent active promoters .
Here , we construct and assay transcriptional promoter-lacZ fusions for all 52 putative FliA promoters previously identified by ChIP-seq .
These experiments , coupled with integrative analysis of published genome-scale transcriptional datasets , strongly suggest that most intragenic FliA binding sites are active promoters that transcribe highly unstable RNAs .
Additionally , we show that widespread intragenic FliA-dependent transcription may be a conserved phenomenon , but that specific promoters are not themselves conserved .
We conclude that intragenic FliA-dependent promoters and the resulting RNAs are unlikely to have important regulatory functions .
Nonetheless , one intragenic FliA promoter is broadly conserved , and constrains evolution of the overlapping protein-coding gene .
Thus , our data indicate that intragenic regulatory elements can influence bacterial protein evolution , and suggest that the impact of intragenic regulatory sequences on genome evolution should be
INTRODUCTION
In bacteria , RNA polymerase ( RNAP ) requires a transcription initiation factor , σ , to recognize promoter elements and initiate transcription .
Bacteria encode one housekeeping σ factor that functions at most promoters , and multiple `` alternative '' σ factors that each recognize smaller sets of promoters .
Historically , promoters were thought to be located solely upstream of annotated genes .
However , widespread transcription initiation from inside genes has now been described in Escherichia coli and many other species ( reviewed in Lybecker et al. , 2014 ; Wade and Grainger , 2014 ) .
Consistent with these observations , the E. coli housekeeping σ factor , σ70 , has been shown to bind many intragenic sites ( Singh et al. , 2014 ) .
Similar findings have been reported for alternative σ factors , e.g. 40 % of Mycobacterium tuberculosis SigF binding sites , 25 % of E. coli σ32 binding et al. , 2013 ; Bonocora et al. , 2015 ) .
The high degree of pervasive transcription involving multiple σ factors
Like σ factors , DNA-binding transcription factors often bind extensively within genes ( Shimada et al. , 2008 ; J. Galagan et al. , 2013 ; J. E. Galagan et al. , 2013 ; Bonocora et al. , 2013 ; Wade and Grainger , 2014 ; Grainger , 2016 ) .
The regulons of most transcription factors have not been mapped , even for E. coli , suggesting that most intragenic binding sites remain to be identified .
Indeed , a study of 51 transcription factors in Mycobacterium tuberculosis suggests that a typical bacterial genome contains > 10,000 intragenic binding sites ( J. E. Galagan et al. , 2013 ) .
The transcriptional activities of most intragenic transcription / σ factor binding sites have not been extensively studied , but many are likely to be functional ( J. E. Galagan et al. , 2013 ) .
Although transcription regulatory networks evolve rapidly , individual regulatory interactions are often maintained by purifying selection ( Lozada-Chávez et al. , 2006 ; Perez and Groisman , 2009 ; Stringer et al. , 2014 ) .
Hence , many intragenic transcription / σ factor binding sites may be functional , and thus are likely to be conserved .
A previous study suggested that purifying selection on intragenic transcription / σ factor binding sites in human cells constrains the evolution of overlapping protein-coding genes ( Stergachis et al. , 2013 ) .
The impact of bacterial
FliA ( σ28 ) is an alternative σ factor involved in transcription of genes associated with flagellar motility and chemotaxis ( reviewed in Paget , 2015 ) .
FliA also initiates transcription of some non-flagellar genes in E. coli ( Fitzgerald et al. , 2014 ) , and is encoded by some non-motile bacteria , such as Chlamydia ( Yu and Tan , 2003 ) , suggesting additional non-flagellar roles .
Recently , we reported that over half of E. coli FliA binding sites are located inside genes , often far from gene starts ( Fitzgerald et al. , 2014 ) .
These intragenic sites were split approximately evenly between those occurring in the sense and antisense orientations , with respect to the overlapping gene .
Most intragenic FliA binding sites were not associated with detectable FliA-dependent RNAs , so it is unclear whether they represent functional promoters .
Notably , FliA is the most highly and broadly conserved alternative σ factor ( Feklístov et al. , 2014 ; Paget , 2015 ) .
The interactions between FliA , RNA polymerase , and promoter DNA are so highly conserved that the Bacillus subtilis homolog , σD , can complement an E. coli ΔfliA strain ( Chen and Helmann , 1992 ) .
Like many alternative σ factors , FliA has a decreased ability to melt DNA as compared to housekeeping σ factors ( Koo , Rhodius , Nonaka , et al. , 2009 ; Feklístov et al. , 2014 ) .
Thus , FliA-dependent transcription initiation requires a stringent match to its consensus promoter sequence ( Koo , Rhodius , Campbell , et al. , 2009 ) .
Together , the high conservation and readily
In this study , we evaluate the promoter activity of intragenic FliA binding sites in E. coli .
We also assess the conservation of intragenic FliA promoters and map the Salmonella FliA regulon .
We conclude that most intragenic FliA binding sites represent bona fide promoters that transcribe unstable intragenic RNAs .
We show that extensive intragenic transcription by FliA is likely to be a conserved phenomenon , but the genetic locations of intragenic FliA promoters are generally not conserved .
Nonetheless , we show that a single intragenic FliA promoter is under strong selective pressure that constrains the evolution of the FlhC protein .
This is the first documented example of intragenic regulatory sequence impacting evolution of the overlapping protein-coding gene in a bacterium , and suggests that selective pressure on intragenic binding sites for σ factors and
RESULTS
Most intragenic FliA binding sites represent transcriptionally active promoters
To test whether FliA binding sites previously identified by ChIP-seq ( Fitzgerald et al. , 2014 ) represent active promoters , we generated transcriptional fusions of potential promoters to the lacZ reporter gene .
For each of the 52 putative FliA promoters , the region from approximately -200 to +10 was cloned upstream of lacZ on a single-copy plasmid ( Figure 1A ) .
We chose to include 200 bp upstream sequence because at least one FliA promoter is regulated by a transcription factor binding upstream ( Hollands et al. , 2010 ) .
Plasmids were transformed into a motile strain of E. coli MG1655 ( i.e. expressing FliA ) , or an isogenic ΔfliA derivative , and assayed for β-galactosidase activity .
Of the 20 intergenic promoters , 15 displayed significant FliA-dependent activity ( t-test , p ≤ 0.05 ; Figure 1B ) .
Of the 30 intragenic promoters , 10 out of 16 sense- and 7 out of 14 antisense-orientation putative intragenic promoters showed significant FliA-dependent activity ( t-test , p ≤ 0.05 ; with transcription of stable RNAs ( ( flhC ) motAB-cheAW , ( yafY ) ykfB , ( yjdA ) yjcZ , ( uhpT ) , and antisense ( hypD ) , where genes in parentheses indicate those with an internal FliA promoter .
One of the two putative promoters located in convergent intergenic regions also showed significant FliA-dependent activity ( t-test , p ≤ 0.05 ; Figure 1C ) .
It should be noted that some fusions had very high levels of background activity , which may have prevented the detection of lower levels of FliA-dependent transcription from these promoter fusions .
Of note , no FliA-dependent activity was detected for the well-characterized promoters upstream of fliA , fliD , and fliL , likely due to overwhelming transcriptional activity from the strong , σ70-dependent , FlhDC-activated promoters known to be immediately upstream ( Liu and Matsumura , 1996 ; Stafford et al. , 2005 ; Fitzgerald et al. , 2014 ) .
High β-galactosidase activity associated with the lacZ fusions for pntA , cvrA , glyA , proK , and insB-4 / cspH suggests they are also likely to include σ70 promoters that may preclude identification of FliA-dependent
We previously identified FliA-regulated transcripts using RNA-seq , although most intragenic FliA sites were not associated with a detectable RNA ( Fitzgerald et al. , 2014 ) .
However , this method often fails to detect unstable RNAs .
To independently assess whether intragenic FliA binding sites act as promoters , we analyzed two published datasets generated from motile E. coli strains : ( i ) genome-wide transcription start site ( TSS ) mapping by differential RNA-seq ( dRNA-seq ) ( Thomason et al. , 2015 ) , and ( ii ) Nascent Elongating Transcript sequencing ( NET-seq ) ( Larson et al. , 2014 ) .
dRNA-seq identifies TSSs by selectively degrading processed transcripts bearing a 5 ' monophosphate , and then preparing a library from the remaining 5 ' triphosphate-bearing primary transcripts ( Sharma and Vogel , 2014 ) .
By focusing reads to the 5 ' ends of transcripts , this technique is more sensitive than standard RNA-seq , and can distinguish intragenic RNAs from overlapping mRNAs .
NET-seq isolates nascent RNA still bound to RNAP , facilitating detection of unstable transcripts prior to degradation
To compare FliA binding site location to TSS mapping data , we determined the distance from the predicted FliA promoter sequence associated with each FliA binding site ( Fitzgerald et al. , 2014 ) to all downstream TSSs within 500 bp ( Figure 2A ) .
For most well-characterized FliA-dependent promoters for flagellar genes , the distance between the center of the promoter sequence and TSS was between 18 and 22 bp .
For other FliA binding sites , we observed a strong enrichment for TSSs between 18 and 23 bp downstream of FliA motif centers .
In total , 38 of the 52 FliA binding sites have a TSS located 18-23 bp downstream of the center of their predicted promoter .
This positional enrichment is highly significant when compared to the same analysis performed with a randomized TSS dataset ; only one random TSS was between 18-23 bp downstream of a FliA
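The positional analysis described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' pipeline: the function name `count_sites_with_tss`, the strand-aware coordinate convention, and the toy coordinates are all hypothetical. The sketch counts motif centers that have at least one TSS 18-23 bp downstream.

```python
from bisect import bisect_left, bisect_right


def count_sites_with_tss(motif_centers, tss_positions, lo=18, hi=23):
    """Count motif centers with at least one TSS lo..hi bp downstream.

    motif_centers: list of (coordinate, strand) tuples, strand is '+' or '-'.
    tss_positions: dict mapping strand -> sorted list of TSS coordinates.
    """
    hits = 0
    for center, strand in motif_centers:
        tss = tss_positions[strand]
        if strand == "+":
            start, end = center + lo, center + hi
        else:
            # Downstream on the minus strand means lower coordinates.
            start, end = center - hi, center - lo
        # Number of TSSs with start <= coordinate <= end.
        if bisect_right(tss, end) - bisect_left(tss, start) > 0:
            hits += 1
    return hits


# Toy data: hypothetical coordinates, not taken from the paper.
centers = [(100, "+"), (500, "+"), (900, "-"), (1500, "-")]
tss = {"+": [120, 700], "-": [880, 1490]}
print(count_sites_with_tss(centers, tss))  # 2
```

Running the same function on a shuffled or randomly placed TSS list would give the kind of background count that the observed count is compared against.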
To systematically assess whether FliA binding sites are associated with signal in the NET-seq dataset , the sequence read coverage upstream and downstream of FliA binding sites was determined .
For FliA binding sites associated with a TSS , the read coverage at each position from -100 to +100 was determined relative to the TSS .
For all other FliA binding sites , a TSS was predicted at 20 bp downstream of the predicted promoter sequence center ( average position of other TSSs ) , and coverage was determined from -100 to +100 relative to this position .
The coverage profile for each binding site was normalized to the minimum and maximum coverage in the region and plotted as a heatmap ( Figure 2B ) .
There is a clear trend of higher NET-seq read coverage downstream of FliA binding sites , compared to the regions immediately upstream .
To quantify this trend , the ratio of NET-seq read coverage upstream and downstream of the TSS was calculated for each putative FliA-dependent promoter .
In total , 44 out of the 52 putative promoters showed at least 2-fold higher coverage in the region 100 bp downstream of the TSS than in the region 100 bp upstream of the TSS .
These 44 putative
FliA binding sites with transcriptional activity detected by NET-seq and those detected by TSS association
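The NET-seq coverage analysis described above (min-max normalization of each profile for the heatmap, then a downstream-to-upstream coverage ratio) can be sketched as follows. This is a minimal illustration, not the authors' code: the function names, the pseudocount used to avoid division by zero, and the toy coverage vector are assumptions.

```python
def minmax_normalize(profile):
    """Scale a coverage profile to the 0-1 range using its own min and max."""
    lo, hi = min(profile), max(profile)
    if hi == lo:
        # Flat profile: no dynamic range to display.
        return [0.0] * len(profile)
    return [(x - lo) / (hi - lo) for x in profile]


def downstream_upstream_ratio(coverage, tss_index, window=100, pseudocount=1.0):
    """Ratio of summed coverage downstream vs upstream of the TSS.

    The pseudocount is an assumption added here so that regions with
    zero upstream coverage do not cause a division by zero.
    """
    upstream = coverage[max(0, tss_index - window):tss_index]
    downstream = coverage[tss_index:tss_index + window]
    return (sum(downstream) + pseudocount) / (sum(upstream) + pseudocount)


# Toy profile: 100 bp of low upstream signal, 100 bp of higher downstream signal.
toy_coverage = [1] * 100 + [5] * 100
ratio = downstream_upstream_ratio(toy_coverage, tss_index=100)
print(ratio >= 2)  # True: this toy site would pass a 2-fold cutoff
```

A site would be counted as showing NET-seq evidence of promoter activity when this ratio is at least 2, matching the threshold stated in the text.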
In total , 26 of the 30 intragenic FliA binding sites , and one of the two FliA sites in a convergent intergenic region , show evidence of promoter activity from at least one assay .
Table 1 summarizes the existing evidence for these sites .
It should be noted that neither the TSS nor NET-seq datasets have matched ΔfliA controls , so it is formally possible that TSSs/transcripts are associated with FliA-independent promoters .
However , this is highly unlikely given the position of putative TSSs and the position of NET-seq signal with respect to the predicted FliA promoter sequences .
Overall , there is substantial overlap between the sets of putative intragenic promoters that display FliA-dependent activity in promoter fusion assays , those with appropriately positioned TSSs , and
Most intragenic FliA promoters are not conserved across species
To assess whether intragenic FliA promoters and binding sites are likely to be functionally important , we determined conservation of these sites bioinformatically .
The sequence surrounding each of the 52 FliA binding sites previously identified by ChIP-seq ( Fitzgerald et al. , 2014 ) was extracted and used as a BLAST query to search genomes from 24 γ-proteobacterial genera ( Table S1 ) .
All genomes queried encode FliA , except for those of Klebsiella and Raoultella , which were included as controls .
If a homologous region was identified , it was scored against the previously determined E. coli FliA position-weight matrix ( Fitzgerald et al. , 2014 ) .
These scores are depicted as a heatmap in Figure 3A , where yellow represents the highest-scoring sites and blue the lowest-scoring .
Sites are categorized by location and orientation , and then ranked by total degree of conservation within each category , from left to right .
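Scoring a homologous region against a position-weight matrix can be illustrated with a short sketch. The matrix below is a hypothetical 4-bp toy motif, not the E. coli FliA matrix from Fitzgerald et al. (2014), and the log-odds construction (uniform background, pseudocount of 1) is one common convention assumed here for illustration.

```python
import math


def build_log_odds(counts, background=0.25, pseudocount=1.0):
    """Convert per-position base counts into log2-odds scores."""
    pwm = []
    for column in counts:
        total = sum(column.values()) + 4 * pseudocount
        pwm.append({
            base: math.log2(((column.get(base, 0) + pseudocount) / total) / background)
            for base in "ACGT"
        })
    return pwm


def best_pwm_score(sequence, pwm):
    """Return the highest PWM score over all windows of the sequence."""
    width = len(pwm)
    best = float("-inf")
    for i in range(len(sequence) - width + 1):
        window = sequence[i:i + width]
        if any(base not in "ACGT" for base in window):
            continue  # skip windows containing ambiguous bases
        score = sum(pwm[j][base] for j, base in enumerate(window))
        best = max(best, score)
    return best


# Hypothetical toy motif "GATA" built from 10 aligned example sites.
toy_counts = [{"G": 10}, {"A": 10}, {"T": 10}, {"A": 10}]
toy_pwm = build_log_odds(toy_counts)
with_motif = best_pwm_score("CCGATACC", toy_pwm)
without_motif = best_pwm_score("CCCCCCCC", toy_pwm)
print(with_motif > without_motif)  # True
```

In a conservation heatmap of the kind described, each cell would hold the best score for one site in one genome; a real analysis would also scan the reverse complement, which this sketch omits.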
The well-characterized FliA-dependent promoter inside flhC , which drives transcription of the downstream motAB-cheAW operon , was the most highly conserved .
All other well-characterized , flagellar-related FliA promoters were well-conserved at the sequence level , with the exception of the promoter upstream of the fliLMNOPQR operon , which is also transcribed by σ70 in E. coli .
Most novel intergenic and intragenic FliA binding sites showed no evidence of conservation , even in close relatives such as Salmonella .
It should be noted that a few intragenic FliA binding sites , such as those inside hslU , glyA , and ybhK , appear conserved , but score equally well in species that lack fliA ( Klebsiella and Raoultella ) , suggesting they are maintained for reasons independent of their ability to bind FliA , most likely because of high levels of conservation for these protein-coding genes .
A few other intragenic promoters , such as those inside uhpC , hypD , metF , and speA , show possible sequence conservation in Salmonella , but not in more
Intragenic FliA promoters are not conserved across E. coli strains
Previous studies suggest that while intragenic promoters may not be conserved between species ( Raghavan et al. , 2012 ) , they may be conserved within strains of the same species ( Shao et al. , 2014 ) .
Hence , we bioinformatically determined the conservation of all FliA sites across 9,432 E. coli strains for which a genome sequence is available ( Table S2 ) .
The sequence surrounding each of the 52 FliA binding sites previously identified by ChIP-seq ( Fitzgerald et al. , 2014 ) was extracted and used as a BLAST query to search each E. coli genome contig .
If a homologous region was identified , we determined whether each position in each E. coli K-12 FliA site is conserved .
We then determined the proportion of strains with a homologous region in which each position of each FliA site is conserved .
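The per-position calculation described above can be sketched as follows. This is an illustration only: it assumes that the homologous regions have already been extracted and aligned without gaps to the K-12 site, and the function name and toy sequences are hypothetical.

```python
def positional_conservation(reference_site, homologous_sites):
    """Fraction of strains matching the reference base at each position.

    reference_site: the site sequence from the reference strain.
    homologous_sites: equal-length, ungapped sequences from other strains.
    """
    if not homologous_sites:
        raise ValueError("at least one homologous site is required")
    n = len(homologous_sites)
    return [
        sum(1 for site in homologous_sites if site[i] == ref_base) / n
        for i, ref_base in enumerate(reference_site)
    ]


# Toy example with four hypothetical strains.
reference = "TAAA"
strains = ["TAAA", "TAAG", "TCAA", "TAAA"]
print(positional_conservation(reference, strains))  # [1.0, 0.75, 1.0, 0.75]
```

Averaging these per-position values across a class of sites gives a profile in which functionally constrained positions stand out from unconstrained ones.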
Figure 3B shows the level of conservation of each position of FliA sites divided into two classes : ( i ) sites that represent promoters of mRNAs ( based on our
The second class includes most of the intragenic FliA sites .
FliA sites that represent promoters of mRNAs are with the lack of sequence requirements in the spacer region for FliA binding .
By contrast , FliA sites that do not conservation between these regions and the spacer .
We conclude that , as a group , FliA binding sites that do not
Genome-wide mapping of the Salmonella Typhimurium FliA regulon
Salmonella enterica and E. coli diverged approximately 100 million years ago and exhibit substantial drift at wobble positions ( Gordienko et al. , 2013 ) .
As an independent , empirical test of FliA binding site conservation , we determined the genome-wide binding profile of S. enterica serovar Typhimurium FliA using ChIP-seq of a C-terminally tagged derivative expressed from its native locus .
To facilitate comparison with E. coli ChIP-seq data , we grew cells under conditions similar to those used in our previous study of E. coli FliA ( Fitzgerald et al. , 2014 ) .
A total of 23 high-confidence FliA binding sites were identified ( Table 2 , Figure 4A ) .
Of these 23 sites , three are inside genes but within 300 bp of a gene start ( 13 % ; Figure 4B ) , and five are inside genes and far from a gene start ( 22 % ) .
No equivalent ChIP-seq peaks were identified using a control , untagged strain of S. Typhimurium .
All 23 S. Typhimurium FliA binding sites are associated with a match to the consensus FliA motif ( Figure 4C ; MEME , E-value = 7.4e-49 ) , and motif positions were enriched in the region ~ 25 bp upstream of peak centers , as previously described for FliA binding sites in E. coli ( Fitzgerald et al. , 2014 ) .
As predicted by the sequence conservation analysis ( Figure 3A ) , FliA-dependent promoters upstream of key flagellar operons were conserved in S. Typhimurium .
However , with the exception of the motA promoter that is located inside flhC , no intragenic FliA binding sites were found to be conserved between E. coli and S. Typhimurium .
RNA-seq was used to assess FliA-dependent changes in gene expression by comparing wild-type and ΔfliA strains of S. Typhimurium ( Figure 5 ) .
As for the ChIP-seq experiment , cells were grown under similar significantly differentially expressed between the two strains ( q-value ≤ 0.01 , fold-change ≥ 2 ) , of which 36 were downstream of FliA binding sites identified by ChIP-seq ( Table 2 ) .
The intragenic FliA binding sites within flhC , STM14_3340 , and STM14_3895 were associated with FliA-dependent regulation of the downstream genes , all of which are known flagellar genes .
The other intragenic binding sites were not
The motA promoter within flhC constrains evolution of the FlhC protein
Although most intragenic FliA promoters in E. coli are not well conserved in other species , the motA promoter , located inside flhC , is highly conserved ( Figure 3A ) .
However , it is unclear whether this conservation is due to selective pressure on the promoter or on the amino acid sequence of FlhC , which is encoded by the same DNA .
As expected given the conservation of the motA promoter inside flhC , the two FlhC amino acids , Ala177-Asp178 , that are encoded by sequence overlapping the -10 region , are highly conserved among γ-proteobacteria ( Figure 6A ) , leading us to hypothesize that the Ala-Asp motif is conserved due to selective pressure on the motA promoter , rather than on the amino acids themselves .
To test this hypothesis , we determined whether Asp178 is required for FlhC function .
We created a strain of motile E. coli MG1655 in which the flhDC promoter is transcriptionally active , but flhC is replaced with a cassette containing thyA under the control of a constitutive σ70 promoter .
Thus , this strain lacks the motA promoter , but we reasoned that motA would be co-transcribed with thyA ( Figure 6B ) .
We then introduced either wild-type FlhC or D178A FlhC from a plasmid , or an empty vector control .
Cells containing the empty vector control were non-motile , as expected given that they lack FlhC expressing D178A FlhC were also fully motile ( mean motility level relative to wild-type FlhC of 0.97 ± s.d. 0.09 , n = 3 ; Figure 6B ) .
We conclude that the conserved Asp178 is likely not required for FlhC function .
To further investigate the conservation of the Ala-Asp motif in FlhC , we aligned the sequences of FlhC homologues from 98 different proteobacterial species , each from a different genus in which motA is positioned immediately downstream of flhC ( Table S4 ) .
Although Ala177 and Asp178 are well conserved across these conserved ( Table S4 ) .
We reasoned that if Asp178 is broadly conserved due to selective pressure on the overlapping motA promoter , species in which Asp178 is not conserved are likely to have repositioned the motA promoter .
To test this hypothesis , we extracted the intergenic sequences between flhC and motA for each of the 43 species where Asp178 is not conserved ( Figure S1 ) .
Consistent with our hypothesis , we identified a strongly enriched sequence motif in 19 species ( MEME E-value = 1.5e-32 ) corresponding to a consensus FliA promoter
S1 ) , we did not observe enrichment of a FliA promoter motif in the flhC-motA intergenic region .
Having a FliA promoter for motA within flhC is likely to be the ancestral state , since the position of FliA promoters in flhC-motA intergenic regions differs extensively between species , as do the sequences flanking these promoters .
We also compared the length of the flhC-motA intergenic region in ( i ) the 19 species where FlhC Asp178 is not conserved and for which we identified a likely intergenic FliA promoter , and ( ii ) the 55 species where FlhC Asp178 is conserved .
Intergenic regions in group ( i ) are significantly longer ( median length 207 bp ) than those in group ( ii ) ( median length 131 bp ; Mann-Whitney U Test p = 4.0e-7 ) .
We conclude that the selective pressure on Asp178 is lost in species that reposition the motA promoter to the flhC-motA intergenic region , and
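The Mann-Whitney U comparison of intergenic lengths can be illustrated with a dependency-free sketch of the U statistic. The lengths below are hypothetical toy values, not the measured flhC-motA distances, and the sketch stops at the statistic itself; a statistics package (for example `scipy.stats.mannwhitneyu`) would be the usual way to obtain a p-value.

```python
def mann_whitney_u(sample_a, sample_b):
    """Return (U_a, U_b) for two samples, using average ranks for ties."""
    pooled = sorted([(v, 0) for v in sample_a] + [(v, 1) for v in sample_b])
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        # Find the run of tied values starting at index i.
        j = i
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        average_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[k] = average_rank
        i = j + 1
    rank_sum_a = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    n_a, n_b = len(sample_a), len(sample_b)
    u_a = rank_sum_a - n_a * (n_a + 1) / 2
    u_b = n_a * n_b - u_a
    return u_a, u_b


# Hypothetical intergenic lengths (bp) for two groups of species.
group_i = [200, 210, 250]
group_ii = [120, 130, 140, 205]
print(mann_whitney_u(group_i, group_ii))  # (11.0, 1.0)
```

Here U_a equals the number of (group i, group ii) pairs in which the group (i) length is larger, so a value close to the maximum of n_a × n_b indicates that group (i) regions are consistently longer.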
DISCUSSION
Most FliA Binding Sites are Active Promoters for Unstable RNAs
Most FliA binding sites identified by ChIP-seq display FliA-dependent promoter activity when fused upstream of the lacZ reporter gene ( Figure 1 ) .
Many of these FliA binding sites , and some additional sites that had inactive lacZ fusions , are associated with correctly positioned TSSs and NET-seq signal from published studies ( Larson et al. , 2014 ; Thomason et al. , 2015 ) .
Together , these data suggest that almost all FliA binding sites represent transcriptionally active FliA-dependent promoters , regardless of their location relative to protein-coding genes .
The small subset of FliA binding sites that appear to be transcriptionally inert were amongst the most weakly bound sites detected by ChIP-seq ( Fitzgerald et al. , 2014 ) .
Three of these sites have at least one mismatch to key -10 region residues ( Koo , Rhodius , Campbell , et al. , 2009 ) , suggesting that the sites are unlikely to be active promoters , or are so weakly transcribed that their activity is undetectable using standard
Although most intragenic FliA binding sites are likely to represent active promoters , they are not associated with the transcription of stable RNAs , since we previously detected very few such RNAs using standard RNA-seq ( Fitzgerald et al. , 2014 ) .
We conclude that most intragenic FliA promoters drive transcription of unstable RNAs .
This is consistent with the previously described phenomenon of `` pervasive transcription '' that generates large numbers of short , unstable transcripts , primarily from promoters within genes ( Lybecker et al. , 2014 ; Wade and Grainger , 2014 ) .
Intragenic promoters typically drive transcription of non-coding RNAs .
Transcription of these RNAs is rapidly terminated by Rho ( Peters et al. , 2012 ) , and the transcripts are rapidly
Limited conservation of the FliA regulon outside of core flagellar genes
Evolutionary conservation of DNA sequences is due to purifying selection , and suggests that the sequence has a beneficial function .
As expected , most flagella-associated FliA promoters are highly conserved at the sequence level ( Figure 3 ) .
Of the intragenic FliA binding sites , only those that drive transcription of an mRNA for a downstream gene appear to be at all functionally conserved .
A few intragenic promoters , such as those within hslU , glyA , and ybhK , are conserved at the sequence level between E. coli and many species ( Figure 3A ) .
However , the fact that these sites are also conserved in two genera not encoding fliA -- Klebsiella and Raoultella -- suggests that the DNA sequences are maintained for reasons independent of FliA , most likely purifying
To experimentally validate the sequence-based conservation predictions , we performed ChIP-seq on S. Typhimurium FliA .
As predicted based on sequence conservation , all key flagellar promoters were functionally conserved , except the one upstream of fliLMNOPQR .
In E. coli , this operon is primarily transcribed from a σ70 promoter that is activated by FlhDC ( Liu and Matsumura , 1996 ; Stafford et al. , 2005 ; Fitzgerald et al. , 2014 ) .
Conservation of the σ70 promoter and FlhDC regulation would ensure that these genes are coordinately regulated with other flagellar genes in S. Typhimurium , potentially relieving the selective pressure to maintain the FliA promoter .
Our ChIP-seq data indicate the only intragenic FliA promoter functionally conserved between E. coli and S. Typhimurium is that within flhC .
While specific intragenic FliA binding sites were not conserved , S. Typhimurium FliA binds multiple intragenic sites .
This suggests that the factors affecting FliA specificity , or lack thereof , are similar between E. coli and S. Typhimurium , and that the phenomenon of intragenic FliA promoters is conserved , even if the specific promoters are not .
Note that we identified fewer intragenic FliA sites in S. Typhimurium than we previously identified in E. coli ( Fitzgerald et al. , 2014 ) , but this is likely due to the data for S. Typhimurium having slightly lower signal-to-noise ratios ( compare ChIP-seq
It should be noted that lack of conservation of specific promoters does not necessarily indicate a lack of functional importance , but could instead reflect lineage-specific evolution .
Indeed , regulatory small RNAs are often poorly conserved , even between closely related species ( Toffano-Nioche et al. , 2012 ; Beauregard et al. , 2013 ; Patenge et al. , 2015 ) .
However , our analysis of conservation within E. coli suggests that most intragenic FliA promoters are not conserved even within the species , although this multi-promoter analysis does not rule out the possibility that a small proportion of the intragenic promoters are functional .
Indeed , one of the two stable , FliA-transcribed non-coding RNAs -- that transcribed from within uhpT -- is likely a functional regulator .
A recent study detected numerous Hfq-mediated interactions between mRNAs and RNA originating from the 3 ' end of uhpT ( Melamed et al. , 2016 ) .
Although the uhpT sequences from these interactions map to locations downstream of the sRNA predicted by RNA-seq ( Fitzgerald et al. , 2014 ) , an earlier microarray study and NET-seq data suggest that the FliA-transcribed sRNA extends further downstream ( Reppas et al. , 2006 ; Larson et al. , 2014 ) .
The other stable , FliA-transcribed non-coding RNA -- that transcribed from within hypD -- was not detected in any sRNA : mRNA interactions ( Melamed et al. , 2016 ) , suggesting that it is not functional .
Unstable FliA-transcribed non-coding RNAs are also unlikely to be functional , given their transient nature , and the lack
Intragenic FliA promoters likely arise as a result of sequence drift during evolution , although the likelihood of creating a FliA promoter as a result of a base substitution is lower than for some other σ factors , since FliA promoters require a more stringent match to the consensus sequence .
Nonetheless , we estimate that there are 474 possible single base substitutions in the E. coli genome that would create a new FliA promoter ( see Methods ) .
Strikingly , this number is similar to the number of single base substitutions that we predict would destroy an existing FliA site , based on the number of actual FliA sites and the information content of the binding motif .
We propose that the number of intragenic FliA sites in E. coli is in equilibrium , but that nonfunctional sites turn over relatively frequently .
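The idea of counting single base substitutions that would create a new site can be illustrated with a toy sketch. The authors' actual procedure is described in their Methods and is not reproduced here; the function below, the exact-match site test, and the short example sequence are all assumptions made for illustration, and only one strand is scanned.

```python
def count_creating_substitutions(genome, is_site, width):
    """Count single-base substitutions that turn a non-site window into a site.

    genome: DNA string (one strand only in this sketch).
    is_site: callable returning True if a window of length `width` is a site.
    """
    count = 0
    for pos, original in enumerate(genome):
        # Only windows overlapping `pos` can be affected by a substitution there.
        first = max(0, pos - width + 1)
        last = min(pos, len(genome) - width)
        region = genome[first:last + width]
        offset = pos - first
        for base in "ACGT":
            if base == original:
                continue
            mutated = region[:offset] + base + region[offset + 1:]
            for k in range(len(region) - width + 1):
                if is_site(mutated[k:k + width]) and not is_site(region[k:k + width]):
                    count += 1
                    break  # count each substitution once
    return count


# Toy example: hypothetical 4-bp motif with an exact-match site test.
toy_genome = "GATTGCTA"
print(count_creating_substitutions(toy_genome, lambda w: w == "GATA", 4))  # 2
```

In a realistic setting, `is_site` would be a thresholded position-weight-matrix score applied to both strands, and the analogous count of substitutions that destroy existing sites would use the same loop with the two conditions swapped.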
The prevalence of intragenic FliA promoters in E. coli and S. Typhimurium suggests that they do not substantially impact expression of the overlapping genes .
Consistent with this , we detected significant FliA-dependent regulation of only three S. Typhimurium genes that have an internal FliA site ( Figure 5 ; Table 2 ) ; one of these genes ( STM14_3340 ) is immediately upstream of a FliA-transcribed flagellar gene , and another ( motB ) is a downstream gene in a FliA-transcribed operon .
While most intragenic FliA promoters are unlikely to be individually functional , the phenomenon of widespread intragenic FliA sites may be functional .
For example , intragenic FliA sites could titrate cellular FliA , thereby sensitizing could reduce stochasticity in effective FliA levels , by requiring that FliA levels be maintained at higher levels .
These functions would be independent of the specific locations of FliA promoters , and more dependent on the number and strength of promoters .
Spontaneous creation of FliA binding sites by genetic drift may also provide a source of novel , functional FliA promoters , e.g. if there is a selective advantage of coordinately regulating the
The motA promoter inside flhC constrains the evolution of FlhC
conserved of all FliA promoters .
This promoter has been described previously , and drives transcription of the require a stringent match to the consensus promoter sequence ( Koo , Rhodius , Campbell , et al. , 2009 ) , and this is reflected by the high information content in the sequence motif associated with FliA binding , especially in the
-10 region ( Figure 4C ) ( Fitzgerald et al. , 2014 ) .
Hence , conservation of an intragenic FliA promoter is likely to result in conservation of the amino acid sequence for the overlapping codons .
The -10 region of the FliA promoter in flhC corresponds to an Ala-Asp motif in the FlhC protein .
This motif is broadly conserved .
Multiple independent lines of evidence support the idea that the Ala-Asp sequence motif is conserved due to selective pressure on the intragenic FliA promoter and not on the amino acids themselves : ( i ) amino acids close to the Ala-Asp motif that are not associated with FliA promoter elements are poorly conserved ( Figure 6A ) ; ( ii ) the Ala-Asp motif is not present in the X-ray crystal structure of FlhDC ( Wang et al. , 2006 ) , suggesting that it is in a disordered region ; ( iii ) Asp178 does not detectably contribute to FlhC function ( Figure 6B ) ; and ( iv ) in proteobacterial species where flhC and motA are adjacent genes but FlhC Asp178 is not conserved , an alternative FliA promoter is often located in the intergenic region between flhC and motA ( Figure 6C ) .
Thus , even in cases where the specific FliA promoter inside flhC is not conserved , the presence of a FliA promoter upstream of motA is conserved .
If the FliA promoter inside flhC were conserved because of selective pressure on the Ala-Asp motif , we would expect that ( i ) surrounding amino acids would also be conserved , regardless of whether they are encoded in sequence overlapping key FliA promoter elements , ( ii ) the Ala-Asp motif would be part of an important structural motif , ( iii ) Asp178 would be required for motility , and ( iv ) in species where
Asp178 is not conserved , there would be no selective pressure to acquire an alternative FliA promoter for motA .
We therefore conclude that the amino acid sequence of FlhC is constrained by the internal promoter for motA .
Thus , the evolution of FlhC protein sequence is directly impacted by the function of the downstream gene .
The potential for an abundance of bacterial regulatory sequences that constrain protein evolution
A recent study reported large numbers of putative transcription factor binding sites in the coding sequences of the human genome , and suggested that these sequences are under selective pressure for both their regulatory and coding functions ( Stergachis et al. , 2013 ) .
While the specific findings of that study have been questioned
( Xing and He , 2015 ) , the FliA promoter inside flhC is clearly analogous .
We propose that conservation of intragenic sequences due to selective pressure on their regulatory function is likely to occur far more frequently in bacteria than in eukaryotes .
The compact nature of bacterial genomes causes them to be gene-dense , greatly limiting the non-coding sequence space ; in E. coli , ~ 90 % of the genome is protein-coding , in stark contrast to the human genome , which is < 2 % protein-coding .
Consistent with the paucity of non-coding sequence in bacterial genomes , numerous intragenic binding sites have been identified for transcription factors and σ factors
( Wade et al. , 2006 ; Shimada et al. , 2008 ; Hartkoorn et al. , 2012 ; J. Galagan et al. , 2013 ; J. E. Galagan et al. ,
2013 ; Bonocora et al. , 2013 ; Wade and Grainger , 2014 ; Bonocora et al. , 2015 ; Grainger , 2016 ) .
In some cases , low stringency in the DNA sequence requirements for binding may allow for sequence changes that change encoded amino acids while maintaining regulatory function .
For example , there are many intragenic σ70 consensus ( Singh et al. , 2014 ) .
Hence , even if an intragenic σ70 promoter is under selective pressure , it could acquire mutations that alter the overlapping coding potential without affecting promoter strength .
However , bacterial transcription factors and some alternative σ factors tend to have high information content binding sites , especially compared to their eukaryotic equivalents ( Wade et al. , 2005 ; Wunderlich and Mirny , 2009 ) .
This suggests that functional conservation of intragenic transcription / σ factor binding sites in bacteria will often
Identification of regulatory sequences that constrain protein evolution requires further investigation of intragenic regulatory sites .
Although numerous intragenic binding sites have been identified , their regulatory capacity is often unclear , and their conservation has not been extensively analyzed .
Intragenic promoters have been reported in numerous bacterial species ( Lybecker et al. , 2014 ; Wade and Grainger , 2014 ) .
Limited evolutionary analysis suggests that most promoters for antisense RNAs are not conserved ( Raghavan et al. ,
2012 ) , although there is evidence for lineage-specific conservation ( Shao et al. , 2014 ) .
Importantly , there are specific examples of intragenic σ factor binding that likely constrain evolution of the amino acid sequence encoded by the overlapping protein-coding gene .
First , an intragenic promoter for the alternative σ factor , σ24 , is conserved both at the sequence level and functionally ( Guo et al. , 2014 ; Li et al. , 2015 ) .
This promoter drives transcription of a non-coding , regulatory RNA , MicL , that is also conserved ( Guo et al. , 2014 ) .
Hence , both the promoter and non-coding RNA might represent dual-usage sequence .
Second , an alternative σ factor , σ54 , binds many intragenic sites in E. coli and S. Typhimurium that are conserved both at the sequence level and functionally ( Bonocora et al. , 2015 ; Bono et al. , 2017 ) , suggesting that they may constrain protein evolution .
Since conserved intragenic σ54 binding sites are likely to be promoters for downstream genes ( Bonocora et al. ,
2015 ) , evolution of the amino acid sequence of proteins encoded by genes containing σ54 promoters may often
Extrapolating from our data for FliA , the majority of intragenic transcription / σ factor binding sites are likely to be non-functional , and hence not under selective pressure .
These sites would therefore not impact protein evolution .
Even though the complete regulons of most E. coli transcription / σ factors remain to be mapped , thousands of intragenic sites have already been identified , implying that there are thousands more sites yet to be discovered .
Even if only a small fraction of intragenic sites are under selection , this would indicate the existence of many such sequences that constrain protein evolution .
Hence , our data suggest that the evolutionary impact of intragenic regulatory sequences should be considered more broadly , as it is likely to be
MATERIALS AND METHODS
Strains, plasmids, and growth conditions
All bacterial strains and plasmids used in this study are listed in Table 3 .
All oligonucleotides used in this study are listed in Table S5 .
All E. coli strains are derivatives of the motile MG1655 strain ( DMF36 ) described previously ( Fitzgerald et al. , 2014 ) .
To construct strains used for β-galactosidase assays , the native lacZ gene of
DMF36 , or the isogenic ΔfliA strain ( DMF40 ) ( Fitzgerald et al. , 2014 ) was replaced by thyA using FRUIT recombineering ( Stringer et al. , 2012 ) with oligonucleotides JW5397 and JW5398 , generating strains DMF122 and DMF123 , respectively .
flhC and 106 bp of downstream sequence were replaced with thyA in DMF36 using
FRUIT recombineering ( Stringer et al. , 2012 ) to generate strain CDS105 .
Salmonella strains are derivatives of
S. enterica serovar Typhimurium 14028s ( Jarvik et al. , 2010 ) .
S. Typhimurium FliA was N-terminally epitope tagged with a 3x-FLAG tag at the native chromosomal locus using FRUIT recombineering ( Stringer et al. ,
2012 ) , generating strain DMF087 .
The S. Typhimurium ΔfliA strain , DMF088 , was constructed using FRUIT
Wild-type flhC was PCR-amplified using oligonucleotides JW8879 and JW8880 , and cloned into the SacI and
SalI restriction sites of pBAD30 ( Guzman et al. , 1995 ) using the In-Fusion method ( Clontech ) to generate pCDS043 .
D178A mutant flhC was PCR-amplified using oligonucleotides JW8879 and JW8881 , and cloned as described for wild-type flhC , to generate pCDS044 .
Transcriptional fusions of putative FliA promoters to lacZ were constructed in plasmid pAMD-BA-lacZ ( Stringer et al. , 2014 ) .
Putative promoter regions ( nucleotide positions -200 to +10 , relative to the predicted TSS ) were PCR-amplified from MG1655 cells .
PCR products were cloned into pAMD-BA-lacZ cut with SphI and NheI using the In-Fusion method ( Clontech ) .
Oligonucleotides used for the plasmid cloning are listed in Table 3.
For all experiments involving liquid growth , subcultures were grown in LB at 37 °C , with aeration , to OD600
Transcriptional lacZ promoter fusion plasmids were transformed into ΔlacZ strains with ( DMF122 ) or without fliA ( DMF123 ) .
Promoter activity was assessed by β-galactosidase assay , as previously described ( Stringer et
Analysis of published TSS data
To determine whether FliA binding sites were associated with TSSs , a published list of TSS locations derived from dRNA-seq was used ( Thomason et al. , 2015 ) .
Orientation of putative FliA promoters was determined based on associated motifs .
For each putative FliA promoter , the distance from the motif center to each downstream TSS on the correct strand was calculated .
All pairwise distances < 500 bp are plotted in Figure 2A .
As a control , a randomized TSS dataset was generated with the same total number and distribution ( with respect to strand and being intragenic/intergenic ) as the experimental dataset .
The analysis was repeated with this
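The distance calculation described above can be illustrated with a minimal Python sketch. It is not the authors' code: the function names, the tuple-based input format, and the toy coordinates are assumptions made for illustration. It collects the distance from each motif center to every downstream TSS on the same strand within 500 bp, then counts distances in 10 bp bins.

```python
from collections import Counter

def downstream_tss_distances(sites, tss_list, max_dist=500):
    """Distances from each motif center to every downstream, same-strand TSS
    closer than max_dist bp.

    sites:    iterable of (motif_center, strand), strand is '+' or '-'
    tss_list: iterable of (position, strand)
    (Both input layouts are hypothetical, chosen for this sketch.)
    """
    distances = []
    for center, strand in sites:
        for pos, tss_strand in tss_list:
            if tss_strand != strand:
                continue
            # Downstream means higher coordinates on '+', lower on '-'.
            d = pos - center if strand == '+' else center - pos
            if 0 < d < max_dist:
                distances.append(d)
    return distances

def bin_distances(distances, bin_size=10):
    """Count distances in fixed-width bins keyed by each bin's lower edge."""
    return Counter((d // bin_size) * bin_size for d in distances)

# Toy example with made-up coordinates.
sites = [(1000, '+'), (5000, '-')]
tss = [(1020, '+'), (1600, '+'), (4975, '-'), (5030, '-'), (1010, '-')]
dists = downstream_tss_distances(sites, tss)
hist = bin_distances(dists)
```

The same two functions could be applied unchanged to a randomized TSS list to obtain a control distribution of the kind described above.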
Analysis of published NET-seq data
Raw sequencing data files from NET-seq experiments ( Larson et al. , 2014 ) were obtained and mapped to the E. coli MG1655 genome using CLC Genomics Workbench .
Sequence read depths at positions surrounding putative FliA promoters were calculated using a custom Python script .
For FliA binding sites associated with a
TSS , the NET-seq read coverage was calculated at every position from -100 to +100 relative to the TSS .
For
FliA binding sites not associated with a TSS , a TSS was predicted to be located 20 bp downstream of the motif center , and NET-seq read coverage was calculated from -100 to +100 relative to this position .
For each region ,
NET-seq read coverage was normalized to local minimum and maximum values .
Normalized read coverage
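As a rough illustration of the windowing and normalization steps described above, the following Python sketch extracts coverage from -100 to +100 around a known or predicted TSS and rescales it to the local minimum and maximum. The custom script used in the study is not shown in the text, so the function names and the dictionary-based coverage format here are assumptions.

```python
def predicted_tss(motif_center, strand, offset=20):
    """Predicted TSS located `offset` bp downstream of the motif center."""
    return motif_center + offset if strand == '+' else motif_center - offset

def window_coverage(coverage, tss, strand, flank=100):
    """Coverage at every position from -flank to +flank relative to the TSS,
    ordered 5' to 3' on the promoter strand.

    coverage: dict mapping genome position -> read depth (hypothetical format).
    """
    if strand == '+':
        positions = range(tss - flank, tss + flank + 1)
    else:
        positions = range(tss + flank, tss - flank - 1, -1)
    return [coverage.get(p, 0) for p in positions]

def min_max_normalize(values):
    """Rescale values to the 0-1 range using the local minimum and maximum."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)  # flat region: no signal to rescale
    return [(v - lo) / (hi - lo) for v in values]

# Toy example: reads begin exactly at the predicted TSS.
coverage = {p: 5 for p in range(1020, 1060)}
tss = predicted_tss(1000, '+')
window = window_coverage(coverage, tss, '+', flank=3)
normalized = min_max_normalize(window)
```

Min-max scaling per region makes sites with very different absolute read depths comparable in a single heat-map, at the cost of discarding information about absolute signal strength.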
The locations of all E. coli FliA binding sites described previously ( Fitzgerald et al. , 2014 ) were used to identify homologous sequences in 24 other species ( Table S1 ) .
A Position Specific Scoring Matrix ( PSSM ) was derived from the identified FliA binding sites in E. coli ( Fitzgerald et al. , 2014 ) , as described previously ( Bonocora et al. , 2015 ) .
We then took a 300 bp sequence surrounding each FliA site in E. coli MG1655 .
For sites within
ORFs we used BLASTX ( Altschul et al. , 1990 ) to search for homologous protein sequences in the selected bacterial species ( BLAST E-value cut-off of 1e-04 , low-complexity filter turned off ) .
Using the PSSM , we scored the top-scoring BLAST hit for each species , searching within 100 bp of the position corresponding to the binding site in E. coli .
For sites within intergenic regions , we used BLASTN to search for regions homologous to each of the 300 bp sequences in each of the selected species ( BLAST E-value cut-off of 1e-04 , low-complexity filter turned off ) , and extracted 100 bp on either side of the position corresponding to the position of the site in E. coli .
If no hits were found , we took the sequence of the downstream gene in E. coli and used
BLASTX to search for homologues in the selected species ( BLAST E-value cut-off of 1e-04 , low-complexity filter turned off ) .
For each top BLAST hit , we used the position of the binding site in E. coli relative to the downstream gene to determine the predicted site of binding , and extracted 100 bp on either side .
We calculated
PSSM scores for all sequences in each of the selected regions .
The best score for each region tested was
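The scoring step described above can be sketched in Python as follows. The PSSM in the study was derived as described previously (Bonocora et al., 2015); the log-odds construction, pseudocount, and uniform background used here are assumptions chosen only to make the example self-contained, and the toy sites are not real FliA sites. The sketch scans one strand of a region and reports the best-scoring window.

```python
import math

BASES = "ACGT"

def build_pssm(sites, pseudocount=1.0, background=0.25):
    """Log2-odds matrix from equal-length aligned sites (assumed scheme)."""
    width = len(sites[0])
    total = len(sites) + 4 * pseudocount
    pssm = []
    for i in range(width):
        column = [s[i] for s in sites]
        pssm.append({
            b: math.log2(((column.count(b) + pseudocount) / total) / background)
            for b in BASES
        })
    return pssm

def score(pssm, seq):
    """Sum of per-position log-odds values for a sequence of motif width."""
    return sum(col[b] for col, b in zip(pssm, seq))

def best_score_in_region(pssm, region):
    """Return (best_score, start) over all windows in region, or None."""
    width = len(pssm)
    best = None
    for start in range(len(region) - width + 1):
        window = region[start:start + width]
        if any(b not in BASES for b in window):
            continue  # skip windows containing ambiguous bases
        s = score(pssm, window)
        if best is None or s > best[0]:
            best = (s, start)
    return best

# Toy motif and region, for illustration only.
toy_sites = ["TAAA", "TAAA", "TAAG"]
toy_pssm = build_pssm(toy_sites)
best = best_score_in_region(toy_pssm, "CCCCTAAACCCC")
```

In practice the region passed to `best_score_in_region` would be the 200 bp extracted around the position homologous to the E. coli site, in the orientation of the putative promoter.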
FliA binding site conservation in E. coli strains
All complete or partial genome sequences for E. coli ( 9432 genomes or contigs ; Table S2 ) were downloaded directly from NCBI and individually scored for the presence of FliA sites using the method described above for
ChIP-seq of S. Typhimurium FliA
ChIP-seq was performed with strains DMF087 ( FliA-FLAG3 ) or 14028s ( untagged control ) as previously described ( Stringer et al. , 2014 ) .
Sequence reads were mapped to the S. Typhimurium 14028s genome using
CLC Genomics Workbench ( Version 8 ) .
Peaks were called using a previously described analysis pipeline
( Fitzgerald et al. , 2014 ) .
Three peaks with a FAT score of 1 were identified in the control dataset ; these peaks
RNA-seq
RNA-seq was performed with strains 14028s and DMF088 , as previously described ( Stringer et al. , 2014 ) .
Read mapping and differential expression analysis were performed using Rockhopper ( McClure et al. , 2013 ) .
The normalized expression values and indicators of statistical significance in Table 2 were generated using
Rockhopper.
Analysis of FlhC sequence conservation
We used the RSAT `` Comparative Genomics/Get Orthologs '' tool ( default parameters , except we required 50 % amino acid sequence identity ; ( Medina-Rivera et al. , 2015 ) ) to identify 52 FlhC homologues from γ-proteobacterial species , each from a different genus .
We aligned protein sequences using MUSCLE ( v3.8 , default parameters ; ( Edgar , 2004 ) ; Table S3 ) , and for each FlhC homologue we counted matches at each amino
Identification of enriched sequence motifs in flhC-motA intergenic regions
We used the RSAT `` Comparative Genomics/Get Orthologs '' tool ( default parameters , except we required 40 % amino acid sequence identity ; ( Medina-Rivera et al. , 2015 ) ) to identify 130 FlhC homologues from proteobacterial species , each from a different genus .
We aligned these protein sequences using MUSCLE ( v3.8 , default parameters ; ( Edgar , 2004 ) ; Table S4 ) .
To determine whether the flhC and motA genes are adjacent in each of the 131 species selected , we first used the RSAT `` Comparative Genomics/Get Orthologs '' tool ( default parameters except required 40 % amino acid sequence identity ; ( Medina-Rivera et al. , 2015 ) ) to extract 100 bp of sequence immediately downstream of the end of the intergenic region following flhC for each species .
We then searched for open reading frames similar to that of E. coli K-12 motA using BLASTX ( v2.2.3 , hosted on
EcoGene 3.0 , default parameters , searching against the E. coli annotated proteome ; ( Altschul et al. , 1997 ; Zhou and Rudd , 2013 ) ) .
We discarded 32 FlhC sequences for which there was no BLASTX match to MotA with the corresponding sequence downstream of flhC .
For each of the 98 remaining FlhC homologues , using the
MUSCLE alignment described above ( Table S4 ) , we determined whether E. coli K-12 Asp178 is conserved .
We used the RSAT `` Comparative Genomics/Get Orthologs '' tool ( Medina-Rivera et al. , 2015 ) to extract intergenic sequence downstream of flhC for the 98 FlhC homologues from genomes where flhC and motA are adjacent genes .
We discarded intergenic sequences < 50 bp .
We used MEME ( v4.12.0 , default settings , except we selected the `` look on given strand only '' option ; ( Bailey and Elkan , 1994 ) ) to identify enriched sequence motifs in intergenic regions from species where FlhC Asp178 is conserved ( n = 55 ) or is not conserved ( n = 43 ) ,
Motility assays were performed as previously described (Fitzgerald et al., 2014).
Estimating the number of single base substitutions that would create a new FliA site in E. coli
We used the E. coli FliA PSSM ( Fitzgerald et al. , 2014 ) to calculate motif scores for all 27mer sequences in the
E. coli MG1655 genome .
For each score window between integer values ( e.g. scores between 10 and 11 , scores between 11 and 12 , etc. ) , we determined the frequency of sequences that represent actual FliA binding sites , as determined previously by ChIP-seq ( Fitzgerald et al. , 2014 ) .
We then calculated motif scores for every 27mer in the genome with every possible single base substitution ( i.e. 81 scores for each sequence ) .
We binned scores in whole integer windows ( e.g. a bin for scores between 10 and 11 , a bin for scores between 11 and 12 , etc. ) and used the frequencies calculated for actual sites to estimate the number of mutated 27mers that would represent
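The estimation procedure described above can be sketched in Python as below. This is a simplified illustration, not the authors' implementation: the function names, the `score_fn` callback, and the toy scoring rule are assumptions, and the sketch does not exclude k-mers that already correspond to known sites. For each k-mer it enumerates every single base substitution (81 variants for a 27mer), assigns each variant to an integer score bin, and sums the empirical fraction of real sites observed in that bin.

```python
import math

def single_substitutions(seq):
    """Yield every sequence that differs from seq at exactly one position."""
    for i, ref in enumerate(seq):
        for alt in "ACGT":
            if alt != ref:
                yield seq[:i] + alt + seq[i + 1:]

def expected_new_sites(genome, width, score_fn, site_fraction_by_bin):
    """Expected number of substituted k-mers that would be real sites.

    score_fn:             callable returning a motif score for a k-mer
                          (stands in for PSSM scoring; hypothetical).
    site_fraction_by_bin: dict mapping integer score bin (floor of score)
                          to the fraction of sequences in that bin that
                          are actual binding sites.
    """
    expected = 0.0
    for start in range(len(genome) - width + 1):
        kmer = genome[start:start + width]
        for variant in single_substitutions(kmer):
            score_bin = math.floor(score_fn(variant))
            expected += site_fraction_by_bin.get(score_bin, 0.0)
    return expected

# Toy example: the "score" is simply the number of A residues.
toy_score = lambda s: s.count("A")
toy_fractions = {3: 0.5, 2: 0.1}
estimate = expected_new_sites("AAC", 3, toy_score, toy_fractions)
n_variants = len(list(single_substitutions("A" * 27)))
```

Weighting each variant by the bin-specific fraction of true sites, rather than applying a single score cut-off, reflects the fact that only some high-scoring sequences are actually bound.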
Raw ChIP-seq and RNA-seq data are available from the EBI ArrayExpress repository using accession numbers
This work was funded by the National Institutes of Health through the NIH Director 's New Innovator Award
Program , 1DP2OD007188 ( JTW ) and through grant 5R01GM114812 ( JTW ) .
This material is based on work supported by the National Science Foundation Graduate Research Fellowship under grant number DGE-1060277 ( DMF ) .
DMF was also supported by National Institutes of Health training grant T32AI055429 .
The funders had no role in study design , data collection and interpretation , or the decision to submit the work for
ACKNOWLEDGEMENTS
We thank the Applied Genomic Technologies Core Facility for Sanger sequencing , the University at Buffalo
Next Generation Sequencing Core Facility for Illumina sequencing , and the Wadsworth Center Media and
Glassware Core facilities for media and glassware .
We thank David Grainger , Keith Derbyshire , and members of the Wade group for helpful discussions .
We thank the anonymous reviewers for their suggestions and
Mol Biol 215: 403–410.
Altschul , S.F. , Madden , T.L. , Schaffer , A.A. , Zhang , J. , Zhang , Z. , Miller , W. , and Lipman , D.J. ( 1997 ) Gapped
BLAST and PSI-BLAST : a new generation of protein database search programs .
Nucleic Acids Res 25: 3389–3402.
Bailey , T.L. , and Elkan , C. ( 1994 ) Fitting a mixture model by expectation maximization to discover motifs in
Beauregard , A. , Smith , E.A. , Petrone , B.L. , Singh , N. , Karch , C. , McDonough , K.A. , and Wade , J.T. ( 2013 )
Identification and characterization of small RNAs in Yersinia pestis. RNA Biol 10: 397–405.
Bono , A.C. , Hartman , C.E. , Solaimanpour , S. , Tong , H. , Porwollik , S. , McClelland , M. , et al. ( 2017 ) Novel
DNA Binding and Regulatory Activities for σ ( 54 ) ( RpoN ) in Salmonella enterica Serovar Typhimurium
14028s. J Bacteriol 199.
Bonocora , R.P. , Fitzgerald , D.M. , Stringer , A.M. , and Wade , J.T. ( 2013 ) Non-canonical protein-DNA
Bonocora , R.P. , Smith , C. , Lapierre , P. , and Wade , J.T. ( 2015 ) Genome-Scale Mapping of Escherichia coli σ54
Reveals Widespread, Conserved Intragenic Binding. PLoS Genet 11: e1005552.
Brewster , R.C. , Weinert , F.M. , Garcia , H.G. , Song , D. , Rydenfelt , M. , and Phillips , R. ( 2014 ) The transcription
Chen , Y.F. , and Helmann , J.D. ( 1992 ) Restoration of motility to an Escherichia coli fliA flagellar mutant by a
Churchman , L.S. , and Weissman , J.S. ( 2011 ) Nascent transcript sequencing visualizes transcription at
Edgar , R.C. ( 2004 ) MUSCLE : multiple sequence alignment with high accuracy and high throughput .
Nucleic
Acids Res 32: 1792–1797.
Feklístov , A. , Sharon , B.D. , Darst , S.A. , and Gross , C.A. ( 2014 ) Bacterial sigma factors : a historical , structural ,
Fitzgerald , D.M. , Bonocora , R.P. , and Wade , J.T. ( 2014 ) Comprehensive Mapping of the Escherichia coli
Flagellar Regulatory Network. PLOS Genet 10: e1004649.
Galagan , J. , Lyubetskaya , A. , and Gomes , A. ( 2013 ) ChIP-Seq and the Complexity of Bacterial Transcriptional
Regulation. Curr Top Microbiol Immunol 363: 43–68.
Galagan , J.E. , Minch , K. , Peterson , M. , Lyubetskaya , A. , Azizi , E. , Sweet , L. , et al. ( 2013 ) The Mycobacterium
Gordienko , E.N. , Kazanov , M.D. , and Gelfand , M.S. ( 2013 ) Evolution of pan-genomes of Escherichia coli ,
Shigella spp., and Salmonella enterica. J Bacteriol 195: 2786–2792.
Grainger , D.C. ( 2016 ) The unexpected complexity of bacterial genomes .
Microbiol Read Engl 162: 1167–1172.
Guo , M.S. , Updegrove , T.B. , Gogol , E.B. , Shabalina , S.A. , Gross , C.A. , and Storz , G. ( 2014 ) MicL , a new σE-dependent sRNA , combats envelope stress by repressing synthesis of Lpp , the major outer membrane
Guzman , L.M. , Belin , D. , Carson , M.J. , and Beckwith , J. ( 1995 ) Tight regulation , modulation , and high-level
Hartkoorn , R.C. , Sala , C. , Uplekar , S. , Busso , P. , Rougemont , J. , and Cole , S.T. ( 2012 ) Genome-wide definition in Escherichia coli by the cyclic AMP receptor protein requires an unusual promoter organization .
Mol
Microbiol 75: 1098–1111.
Ide , N. , Ikebe , T. , and Kutsukake , K. ( 1999 ) Reevaluation of the promoter structure of the class 3 flagellar
Jarvik , T. , Smillie , C. , Groisman , E.A. , and Ochman , H. ( 2010 ) Short-term signatures of evolutionary change in
Koo , B.-M. , Rhodius , V.A. , Campbell , E.A. , and Gross , C.A. ( 2009 ) Mutational analysis of Escherichia coli sigma28 and its target promoters reveals recognition of a composite -10 region , comprised of an `` extended -10 ''
Koo , B.-M. , Rhodius , V.A. , Nonaka , G. , deHaseth , P.L. , and Gross , C.A. ( 2009 ) Reduced capacity of alternative sigmas to melt promoters ensures stringent promoter recognition .
Genes Dev 23: 2426–2436.
Larson , M.H. , Mooney , R.A. , Peters , J.M. , Windgassen , T. , Nayak , D. , Gross , C.A. , et al. ( 2014 ) A pause sequence enriched at translation start sites drives transcription dynamics in vivo .
Science 344: 1042–1047.
Li , J. , Overall , C.C. , Johnson , R.C. , Jones , M.B. , McDermott , J.E. , Heffron , F. , et al. ( 2015 ) ChIP-Seq Analysis of the σE Regulon of Salmonella enterica Serovar Typhimurium Reveals New Genes Implicated in Heat Shock
Liu , X. , and Matsumura , P. ( 1996 ) Differential regulation of multiple overlapping promoters in flagellar class II
Lozada-Chávez , I. , Janga , S.C. , and Collado-Vides , J. ( 2006 ) Bacterial regulatory networks are extremely
McClure, R., Balasubramanian, D., Sun, Y., Bobrovskyy, M., Sumby, P., Genco, C.A., et al. (2013)
Computational analysis of bacterial RNA-Seq data. Nucleic Acids Res 41: e140.
Medina-Rivera , A. , Defrance , M. , Sand , O. , Herrmann , C. , Castro-Mondragon , J.A. , Delerce , J. , et al. ( 2015 )
RSAT 2015: Regulatory Sequence Analysis Tools. Nucleic Acids Res 43: W50-56.
Melamed , S. , Peer , A. , Faigenbaum-Romm , R. , Gatt , Y.E. , Reiss , N. , Bar , A. , et al. ( 2016 ) Global Mapping of
Small RNA-Target Interactions in Bacteria. Mol Cell 63: 884–897.
Paget , M.S. ( 2015 ) Bacterial Sigma Factors and Anti-Sigma Factors : Structure , Function and Distribution .
Biomolecules 5: 1245–1265.
Park , K. , Choi , S. , Ko , M. , and Park , C. ( 2001 ) Novel sigmaF-dependent genes of Escherichia coli found using
Patenge , N. , Pappesch , R. , Khani , A. , and Kreikemeyer , B. ( 2015 ) Genome-wide analyses of small non-coding
RNAs in streptococci. Front Genet 6: 189.
Perez , J.C. , and Groisman , E.A. ( 2009 ) Evolution of transcriptional regulatory circuits in bacteria .
Cell 138 :
233–244.
Peters , J.M. , Mooney , R.A. , Grass , J.A. , Jessen , E.D. , Tran , F. , and Landick , R. ( 2012 ) Rho and NusG suppress
Raghavan , R. , Sloan , D.B. , and Ochman , H. ( 2012 ) Antisense Transcription Is Pervasive but Rarely Conserved
Reppas , N.B. , Wade , J.T. , Church , G. , and Struhl , K. ( 2006 ) The transition between transcriptional initiation
Transcription Start Sites within Genes across a Bacterial Genus. mBio 5: e01398-14.
Sharma , C.M. , and Vogel , J. ( 2014 ) Differential RNA-seq : the approach behind and the biological insight
Shimada , T. , Ishihama , A. , Busby , S.J. , and Grainger , D.C. ( 2008 ) The Escherichia coli RutR transcription
Singh , S.S. , Singh , N. , Bonocora , R.P. , Fitzgerald , D.M. , Wade , J.T. , and Grainger , D.C. ( 2014 ) Widespread
Stafford , G.P. , Ogi , T. , and Hughes , C. ( 2005 ) Binding and transcriptional activation of non-flagellar genes by
Stergachis , A.B. , Haugen , E. , Shafer , A. , Fu , W. , Vernot , B. , Reynolds , A. , et al. ( 2013 ) Exonic transcription
Stringer , A.M. , Currenti , S.A. , Bonocora , R.P. , Petrone , B.L. , Palumbo , M.J. , Reilly , A.E. , et al. ( 2014 )
Genome-Scale Analyses of Escherichia coli and Salmonella enterica AraC Reveal Non-Canonical Targets and
Stringer , A.M. , Singh , N. , Yermakova , A. , Petrone , B.L. , Amarasinghe , J.J. , Reyes-Diaz , L. , et al. ( 2012 )
FRUIT , a scar-free system for targeted chromosomal mutagenesis , epitope tagging , and promoter replacement
Thomason , M.K. , Bischler , T. , Eisenbart , S.K. , Förstner , K.U. , Zhang , A. , Herbig , A. , et al. ( 2015 ) Global transcriptional start site mapping using differential RNA sequencing reveals novel antisense RNAs in
Toffano-Nioche , C. , Nguyen , A.N. , Kuchly , C. , Ott , A. , Gautheret , D. , Bouloc , P. , and Jacq , A. ( 2012 )
Transcriptomic profiling of the oyster pathogen Vibrio splendidus opens a window on the evolutionary
Wade , J.T. , and Grainger , D.C. ( 2014 ) Pervasive transcription : illuminating the dark matter of bacterial
Wade , J.T. , Reppas , N.B. , Church , G.M. , and Struhl , K. ( 2005 ) Genomic analysis of LexA binding reveals the permissive nature of the Escherichia coli genome and identifies unconventional target sites .
Genes Dev 2619 --
2630.
Wade , J.T. , Roa , D.C. , Grainger , D.C. , Hurd , D. , Busby , S.J.W. , Struhl , K. , and Nudler , E. ( 2006 ) Extensive
Wang , S. , Fleming , R.T. , Westbrook , E.M. , Matsumura , P. , and McKay , D.B. ( 2006 ) Structure of the
Escherichia coli FlhDC complex , a prokaryotic heteromeric regulator of transcription .
J Mol Biol 355: 798–808.
Wunderlich , Z. , and Mirny , L.A. ( 2009 ) Different gene regulation strategies revealed by analysis of binding
Xing , K. , and He , X. ( 2015 ) Reassessing the `` duon '' hypothesis of protein evolution .
Mol Biol Evol 32 : 1056 --
Yu , H.H.Y. , and Tan , M. ( 2003 ) σ28 RNA polymerase regulates hctB , a late developmental gene in Chlamydia .
( speA ) ( ybhK ) ( yqjA ) ( holA ) ( otsA ) ( rmuC ) ( hslU ) ( uhpC ) ( ydcU ) ( yjiN ) - - ✓ ( serT ) hyaA - -- Intergenic ( between convergent genes ) tsr/yjiZ - ✓ ✓ insB-4 / cspH - --
1 Genes associated with FliA binding sites .
Genes in parentheses have an internal FliA binding site ; genes not in parentheses start < 300 bp downstream of a FliA binding site and are orientated in the same direction as the putative promoter .
Asterisks indicate FliA binding sites previously reported to be associated with transcription of an mRNA ( Fitzgerald et al. , 2014 ) .
2 Check marks indicate a significant difference in β-galactosidase activity between + fliA and ΔfliA cells for the corresponding lacZ transcriptional fusion ( Figure 1 ) .
3 Check marks indicate association with a nearby TSS .
4 Check marks indicate a downstream : upstream ( relative to the putative TSS ) coverage ratio ≥ 2 .
5 Check marks indicate regulation of the corresponding gene ( s ) , as determined using RNA-seq ( Fitzgerald et al. , 2014 ) .
flgM flgK trg ycgR cheM ( motB ) ( flhC ) motA fliA fliC fliD STM14_2852 ( pepB ) ( STM14_3340 ) TAAAGTTTATGCCTCAAGTGTCGATAAC ( 280 ) 1954 fljBA STM14_3817 STM14_3893 ( STM14_3895 ) TAAAGATAAATAGATTAGCGCCGAAATA aer 3504766 ( arcB ) 3559655 ( yhdA ) 3801092 yhjH 4531425 ( nrfB ) 4802894 tsr
1 Genome coordinate of the ChIP-seq peak center .
Coordinates are relative to the 14028s chromosomal reference sequence ( NC_003198.1 ) .
2 Fold Above Threshold ( FAT ) score , a measure of relative ChIP-seq enrichment .
3 Genome coordinate of the sequence motif identified using MEME .
Coordinates are relative to the 14028s chromosomal reference sequence ( NC_003198.1 ) .
4 Genomic strand of the sequence motif identified using MEME .
5 Sequence of the motif identified using MEME .
6 For intergenic FliA binding sites , the downstream gene is listed .
Genes containing intragenic FliA binding sites are listed in parentheses .
Underlining indicates that the putative promoter is in the antisense orientation relative to the overlapping gene .
If a gene start is located within 300 bp of a putative intragenic FliA promoter , that gene name is listed as well .
7 Normalized expression values for the indicated genes , as determined by RNA-seq .
8 Asterisks indicate significant differential expression between wild-type and ΔfliA cells ( q < 0.01 ) .
Schematic of transcriptional fusions of potential FliA promoters to the lacZ reporter gene .
For all FliA binding sites identified in a previous study , transcriptional fusions to lacZ were constructed using positions -200 to +10 relative to the predicted TSS based on the previously identified FliA binding motif ( Fitzgerald et al. , 2014 ) .
( B ) β-galactosidase activity for transcriptional fusions for FliA binding sites in intergenic regions upstream of genes , for wild-type ( wt ; DMF122 ; green bars ) and ΔfliA ( DMF123 ; gray bars ) cells .
Reporter fusions that showed significantly lower β-galactosidase activity in ΔfliA cells than wild-type cells ( t-test p < 0.05 ) are indicated .
The genes downstream of the FliA binding sites are listed on the x-axis .
( C ) As above , but for FliA binding sites within genes or between convergently transcribed genes .
Genes containing FliA binding sites are listed on the x-axis in parentheses .
Genes not in parentheses are downstream of the corresponding FliA binding transcriptome datasets .
( A ) For each FliA binding site identified previously ( Fitzgerald et al. , 2014 ) , we determined the distance to each downstream TSS identified previously ( Thomason et al. , 2015 ) within a 500 bp range .
The frequencies of these distances are plotted in 10 bp bins ( green line ) , with the inset showing the frequency of binding sites 10-30 bp upstream of TSSs with a bin size of 1 bp .
The gray line shows the frequency of distances from FliA binding sites to a control , randomized TSS dataset ( see Methods ) .
( B )
Normalized sequence read coverage from published NET-seq data ( Larson et al. , 2014 ) ( see Methods ) for each previously identified FliA binding site ( Fitzgerald et al. , 2014 ) , plotted 100 bp upstream and downstream of the known/predicted TSS .
Predicted TSSs are indicated by the dashed vertical line .
Darker green indicates higher
Heat-map depicting the match to the FliA consensus binding site for regions in the genomes of a range of bacterial species , where the region analyzed is homologous to a region surrounding a FliA binding site in E. coli .
Genera are listed on the left .
E. coli genes associated with the binding sites are listed across the top of the heat-map .
FliA binding sites are grouped by location/orientation category , as indicated by category labels across the bottom of the heat-map .
Genes containing FliA binding sites are listed in parentheses .
Genes not in parentheses are downstream of the corresponding FliA binding site .
The color scale indicating the strength of the sequence match is shown next to the heat-map .
Empty squares in the heat-map indicate that the corresponding genomic region in E. coli is not sufficiently conserved in the species being analyzed .
( B ) Conservation of FliA sites across 9,432 E. coli strains .
For each site from E. coli K-12 , conservation was determined at each position within the site for all strains of E. coli where the surrounding sequence is conserved .
Thus , the fraction of genomes in which each base is conserved was calculated .
Values plotted represent the average ( mean ) level of conservation for ( i ) 18 FliA sites that represent promoters for mRNAs ( filled circles ; Table 1 ) , and ( ii ) the remaining 34 FliA sites ( empty circles ) .
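The per-position conservation described for panel ( B ) can be illustrated with a short sketch. It assumes that each strain's copy of a site has already been extracted and aligned to the E. coli K-12 site without gaps. The function names and toy sequences are hypothetical, and this is not the authors' implementation.

```python
def per_position_conservation(reference_site, strain_sites):
    """Fraction of strains matching the reference base at each position.

    reference_site: site sequence from the reference strain.
    strain_sites: equal-length, ungapped sequences from other strains
    (assumed to be pre-filtered for conserved surrounding sequence).
    """
    if not strain_sites:
        raise ValueError("at least one strain sequence is required")
    n = len(strain_sites)
    fractions = []
    for i, ref_base in enumerate(reference_site):
        matches = sum(1 for seq in strain_sites if seq[i] == ref_base)
        fractions.append(matches / n)
    return fractions


def mean_conservation(per_site_fractions):
    """Mean conservation at each position across sites of equal length."""
    n_sites = len(per_site_fractions)
    return [sum(column) / n_sites for column in zip(*per_site_fractions)]


# Toy example with two hypothetical 4-bp sites.
site_a = per_position_conservation("TAAA", ["TAAA", "TAAG", "TCAA", "TAAA"])
site_b = per_position_conservation("GCCG", ["GCCG", "GCCG"])
average = mean_conservation([site_a, site_b])
```

Running `mean_conservation` separately on each group of sites would yield the two series plotted as filled and empty circles.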
The FliA binding motif is shown [...]
[...] read coverage across the S. Typhimurium genome for a FliA ChIP-seq dataset .
Annotated genes are indicated by gray bars .
The green graph shows relative sequence read coverage , with `` spikes '' corresponding to sites of FliA association .
( B ) Pie-chart showing the distribution of identified FliA binding sites relative to genes .
`` Inside '' = FliA binding within a gene .
`` Upstream '' = FliA binding upstream of a gene .
`` Inside + us '' = FliA binding within a gene but within 300 bp of a downstream gene start .
( C ) Enriched sequence motif associated with FliA binding sites identified by ChIP-seq .
( D ) Distribution of motifs relative to ChIP-seq peak centers for all FliA binding sites identified by ChIP-seq .
Motifs are enriched in the region ~ 25 bp upstream of the peak [...]
[...] normalized expression ( see Methods ) for each gene in S. Typhimurium for wild-type cells ( 14028s ; x-axis ) or ΔfliA cells ( DMF088 ; y-axis ) .
Gray dots represent genes that are not associated with a FliA binding site and are not significantly differentially expressed between wild-type and ΔfliA cells .
Black dots represent genes that are not associated with a FliA binding site and are significantly differentially expressed between wild-type and ΔfliA cells .
Green circles represent genes that are associated with an upstream FliA binding site .
Green triangles represent genes that are associated with an internal FliA binding site .
Filled green circles/triangles indicate genes that are significantly differentially expressed between wild-type and ΔfliA cells .
Empty green circles/triangles represent genes that are not differentially expressed between wild-type and ΔfliA cells .
conservation of FlhC amino acid sequence between E. coli and 51 other γ-proteobacterial species .
The graph indicates the level of identity across all species analyzed for each amino acid in FlhC ; data for Ala177 and Asp178 are highlighted in red .
The nucleotide sequence of flhC in the motA promoter region is indicated , aligned with the previously reported FliA binding motif logo ( Fitzgerald et al. , 2014 ) .
Codons 177 and 178 are shown in red .
( B ) Motility assay for ΔflhC::thyA E. coli ( CDS105 ) containing either empty vector ( pBAD30 ) , or a plasmid expressing wild-type FlhC ( pCDS043 ) or D178A mutant FlhC ( pCDS044 ) .
Dashed red circles indicate the inoculation sites .
Plates were incubated for 7 hours .
The schematic to the left of the plate image shows how the strain was constructed .
( C ) Enriched sequence motif found in the flhC-motA intergenic regions of species in which FlhC Asp178 is not conserved .
This motif is a close match to the known FliA binding site consensus .
sequences between flhC and motA for selected proteobacterial species where Asp178 of FlhC is conserved/not conserved .
Putative FliA promoters identified by MEME for species where Asp178 of FlhC is not conserved are