Genome-Wide Transcriptional
1 Laboratory of Molecular Biology , National Institutes of Health , National Cancer Institute , Bethesda , MD , USA , 2 Microbiomics and Immunity Research Center , Korea Research Institute of Bioscience and Biotechnology , Daejeon , Korea , 3 Laboratory of Metabolism , National Institutes of Health , National Cancer Institute , Bethesda , MD , USA , 4 Wadsworth Center , New York State Department of Health , Albany , NY , USA , 5 Department of Biomedical Sciences , School of Public Health , University of Albany , Albany , NY , USA , 6 Gene Regulation and Chromosome Biology Laboratory , National Institutes of Health , National Cancer Institute , Center for Cancer Research , Frederick , MD , USA , 7 DNASTAR , Inc. , Madison , WI , USA
Keywords: GalR regulon, mega-loop, ChIP-chip, nucleoid, DNA superhelicity
INTRODUCTION
The 4.6 Mb Escherichia coli chromosomal DNA is packaged into a small volume ( 0.2 -- 0.5 µm³ ) to fit inside a cell volume of 0.5 -- 5 µm³ ( Loferer-Krossbacher et al. , 1998 ; Skoko et al. , 2006 ; Luijsterburg et al. , 2008 ) .
It has been suggested that a bacterial chromosome has a 3-D structure that dictates the entire chromosome 's gene expression pattern ( Kar et al. , 2005 ; Macvanin and Adhya , 2012 ) .
The chromosome structure and the associated volume are defined and environment-dependent .
The compaction of the DNA into a structured chromosome ( nucleoid ) is facilitated by several architectural proteins , often called `` nucleoid-associated proteins '' ( NAPs ) .
NAPs are well-characterized bacterial histone-like proteins such as HU , H-NS , Fis , and Dps ( Ishihama , 2009 ) .
For example , deletion of the gene encoding the NAP HU leads to substantial changes in cell volume and in the global transcription profile , presumably due to changes in chromosome architecture ( Kar et al. , 2005 ; Oberto et al. , 2009 ; Priyadarshini et al. , 2013 ) .
A recent and surprising addition to the list of NAPs in E. coli is the sequence-specific DNA-binding transcription regulatory protein GalR ( Qian et al. , 2012 ) .
In contrast , related DNA-binding proteins PurR , MalT , FruR , and TyrR do not appear to affect the chromosome structure ( Qian et al. , 2012 ) .
Here , we discuss experimental results that led us to explore the idea that GalR also regulates transcription at a global scale through DNA architectural changes .
GalR regulates transcription of the galETKM , galP , galR , galS , and mglBAC transcripts ( Figure 1 ) .
These genes all encode proteins involved in the transport and metabolism of D-galactose .
Moreover , GalR controls expression of the chiPQ operon , which encodes proteins involved in the transport of chitosugar .
The galETKM operon ( Figure 1 ) is transcribed as a polycistronic mRNA from two overlapping promoters , P1 ( +1 ) and P2 ( − 5 ) ( Musso et al. , 1977 ; Aiba et al. , 1981 ) .
GalR regulates P1 and P2 promoters differentially .
GalR binds two operators , OE , located at position − 60.5 , and OI , located at +53.5 ( Irani et al. , 1983 ; Majumdar and Adhya , 1984 , 1987 ) .
Binding of GalR to OE represses P1 and activates P2 by arresting RNA polymerase , and facilitating the step of RNA polymerase isomerization , respectively ( Roy et al. , 2004 ) .
When GalR binds to both OE and OI , which are 113 bp apart and do not overlap with the two promoters , it prevents transcription initiation from both P1 and P2 ( Aki et al. , 1996 ; Aki and Adhya , 1997 ; Semsey et al. , 2002 ; Roy et al. , 2005 ) .
Mechanistically , two DNA-bound GalR dimers transiently associate , creating a loop in the intervening promoter DNA segment .
Kinking at the apex of the loop facilitates binding of HU , which in turn stabilizes the loop ( Figure 2 ; Kar and Adhya , 2001 ) .
The DNA structure in the looped form is topologically closed and binds RNA polymerase , but does not allow isomerization into an actively transcribing complex ( Choy et al. , 1995 ) .
Following the example of GalR-mediated DNA loop formation by interaction of GalR bound to two operators in the galE operon , and considering the fact that GalR operators in the galP , mglB , galS , galR , and chiP promoters are scattered around the chromosome , we hypothesized that GalR may oligomerize while bound to distal sites , thereby forming much larger DNA loops ( `` mega-loops '' ) .
We employed the Chromosome Conformation Capture ( 3C ) method to investigate interactions between distal GalR operators ( Dekker et al. , 2002 ) .
Thus , we showed that GalR does indeed oligomerize over long distances , resulting in the formation of mega-loops .
Moreover , our data suggested the existence of other unidentified GalR binding sites around the chromosome , with these novel sites also participating in long-distance interactions ( Qian et al. , 2012 ) .
Figure 3 shows , in cartoon form , the demonstrable GalR-mediated DNA-DNA connections listed in Table 1 .
Although we originally proposed that DNA-bound GalR-mediated mega-loops may serve to increase the local concentrations of GalR around their binding sites for regulation of the adjacent promoters ( Oehler and Muller-Hill , 2010 ) , global regulation of gene expression due to a change in chromosome structure may be another consequence of mega-loop formation .
We propose that GalR-mediated mega-loop formation results in the formation of topologically independent DNA domains , with the level of superhelicity in each domain influencing transcription of the local promoters .
Bacterial and Bacteriophage Strains
Bacteriophage P1 lysates of galR : : kanR ( from the Keio collection ; Baba et al. , 2006 ) were made , and E. coli K-12 MG1655 galR deletion strains were constructed from MG1655 by bacteriophage P1 transduction using the lysate .
Cells were then grown in 125 ml Corning flasks ( Corning® 430421 ) containing 30 ml of M63 minimal medium plus D-fructose ( final concentration 0.3 % ) at 37 °C with shaking at 230 rpm .
At OD600 0.6 , cell cultures were separated into two flasks .
Subsequently , D-galactose ( final concentration 0.3 % ) or water was added and cells were cultivated for an additional 1.5 h at 37 °C .
3072949 3072964 O ( F25-1 ) CTTAAATCGATTGCCG
3072989 3073004 O ( F25-2 ) TTTGAAGCGATTGCGG
Connections were detected among these sites except galEE and galEI by 3C assays .
The first seven operators that showed connections by 3C were known before .
The ones named as F were discovered during the 3C studies ( Qian et al. , 2012 ) .
E. coli MG1655 galR-TAP ( AMD032 ) was constructed by bacteriophage P1 transduction of the kanR-linked TAP tag cassette from DY330 galR-TAP ( Butland et al. , 2005 ) .
The kanR cassette was removed using pCP20 , as described previously ( Datsenko and Wanner , 2000 ) .
E. coli MG1655 galR-FLAG3 ( AMD188 ) was constructed using FRUIT ( Stringer et al. , 2012 ) .
RNA Isolation
Cell cultures were placed on ice and RNAprotect™ Bacteria Reagent ( Qiagen® 76506 ) was added to stabilize the RNA ( Lee et al. , 2014 ) .
Cells were harvested for RNA purification with the RNeasy® Mini Kit ( Qiagen® 74104 ) following the manufacturer 's recommendations .
RNA concentrations and purity were measured using a Thermo Scientific NanoDrop™ 1000 .
Further sample processing was performed according to the Affymetrix GeneChip® Expression Analysis Technical Manual , Section 3 : Prokaryotic Sample and Array Processing ( 701029 Rev. 4 ) .
Isolated RNA ( 10 µg ) was used for Random Primer cDNA synthesis using SuperScript II™ Reverse Transcriptase ( Invitrogen Life Technologies 18064-071 ) .
The reaction mixture was treated with 1N NaOH to degrade any remaining RNA and then with 1N HCl to neutralize the NaOH .
Synthesized cDNA was then purified using MinElute® PCR Purification columns ( Qiagen® 28004 ) .
Purified cDNA concentration and purity were measured using a Thermo Scientific NanoDrop™ 1000 .
Purified cDNA was fragmented to between 50 and 200 bp by 0.6 U / µg of DNase I ( Amersham Biosciences 27-0514-01 ) for 10 min at 37 °C in 1X One-Phor-All buffer ( Amersham Biosciences 27-0901-02 ) .
Heat inactivation of the DNase I enzyme was performed at 98 °C for 10 min .
Fragmented cDNA was then biotin-labeled at the 3′ termini using the GeneChip® DNA Labeling Reagent ( Affymetrix 900542 ) and 60 U of Terminal Deoxynucleotidyl Transferase ( Promega M1875 ) at 37 °C for 60 min .
The labeling reaction was then stopped by the addition of 0.5 M EDTA .
Microarray Hybridization
Labeled cDNA fragments ( 3 µg ) were then hybridized for 16 h ( 60 rpm ) at 45 °C to tiling array chips ( Ecoli_Tab520346F ) purchased from Affymetrix ( Santa Clara , CA ) .
Each chip carries 1,159,908 probes in a 1.4 cm × 1.4 cm area .
The probes are 25-mers designed to cover the whole E. coli genome on both strands , with each probe 8 bp apart from the next probe on the same strand .
In addition , probes on one strand overlap probes on the other strand by 4 bp .
Microarray: Washing and Staining
The chips were then washed with Wash Buffer A ( Non-Stringent Wash Buffer : 6X SSPE , 0.01 % Tween-20 ) and Wash Buffer B ( 100 mM MES , 0.1 M [ Na+ ] , and 0.01 % Tween-20 ) , and stained with Streptavidin Phycoerythrin ( Molecular Probes S-866 ) and biotinylated goat anti-streptavidin antibody ( Vector Laboratories BA-0500 ) on a Genechip Fluidics Station 450 ( Affymetrix ) according to the washing and staining protocol ProkGE-WS2_450 .
Microarray: Scanning and Data Analysis
Hybridized , washed , and stained microarrays were scanned using a Genechip Scanner 3000 ( Affymetrix ) .
Standardized signals for each probe in the arrays were generated using the MAT analysis software , which provides a model-based , sequence-specific background correction for each sample ( Johnson et al. , 2006 ) .
A gene specific score was then calculated for each gene by averaging all MAT scores ( natural log ) for all probes under the annotated gene coordinates .
Gene annotation was from the ASAP database at the University of Wisconsin-Madison , for E. coli K-12 MG1655 version m56 ( Glasner et al. , 2003 ) .
Data were graphed with ArrayStar® , version 2.1 ( DNASTAR , Madison , WI ) .
The tiling array data were submitted to NCBI Gene Expression Omnibus .
The accession number is GSE85334 .
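The per-gene scoring step described above (averaging the MAT scores of all probes that fall under a gene's annotated coordinates) can be sketched as follows. This is a minimal illustration, not the pipeline used in the study: the data layout, the rule that a probe is assigned by a single genome position, and the toy values are all assumptions.

```python
from statistics import mean

def gene_scores(probes, genes):
    """Average per-probe scores over each annotated gene.

    probes: list of (genome_position, mat_score) tuples (hypothetical layout;
            each probe is represented by one position, which is an assumption).
    genes:  dict mapping gene name -> (start, end), inclusive coordinates.
    Returns a dict gene name -> mean score; genes with no probes are skipped.
    """
    scores = {}
    for name, (start, end) in genes.items():
        inside = [score for pos, score in probes if start <= pos <= end]
        if inside:
            scores[name] = mean(inside)
    return scores

# Toy data, not taken from the arrays described here.
probes = [(100, 1.0), (108, 2.0), (116, 3.0), (300, -1.0)]
genes = {"geneA": (100, 120), "geneB": (290, 310), "geneC": (500, 600)}
print(gene_scores(probes, genes))
```

For a genome-scale probe set, sorting the probes once and using binary search per gene would avoid the repeated full scan used here for clarity.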
ChIP-Chip Assays
MG1655 galR-TAP ( AMD032 ) cells were grown in LB at 37 °C to an OD600 of ∼ 0.6 .
ChIP-chip was performed as described previously ( Stringer et al. , 2014 ) .
Data analysis was performed as described previously except that probes were ignored only if they had a score of < 100 pixels , indicating regions that are likely missing from the genome ( Stringer et al. , 2014 ) .
Adjacent probes scoring above the threshold for GalR-bound regions were merged , and the highest-scoring probe in each merged region was selected as the `` peak position . ''
The closely spaced peaks upstream of mglB and galS were manually separated .
The ChIP-chip data were submitted to the EBI Array Express repository .
The accession number is E-MTAB-4903 .
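The region-merging and peak-picking step described in this section can be illustrated with a short sketch. It assumes that "adjacent" means consecutive probes in genome order and that the threshold is a simple score cut-off; the function name, data layout, and toy values are hypothetical and are not taken from the published analysis.

```python
def call_peaks(probes, threshold):
    """Merge runs of consecutive above-threshold probes; pick one peak per run.

    probes: list of (position, score) tuples sorted by genome position.
    Returns a list of (region_start, region_end, peak_position) tuples.
    """
    regions = []
    run = []
    for pos, score in probes:
        if score > threshold:
            run.append((pos, score))      # extend the current bound region
        elif run:
            regions.append(run)           # a below-threshold probe closes it
            run = []
    if run:
        regions.append(run)
    return [
        (r[0][0], r[-1][0], max(r, key=lambda p: p[1])[0])
        for r in regions
    ]

# Toy probe scores (hypothetical).
toy = [(0, 1), (10, 5), (20, 9), (30, 6), (40, 1), (50, 7), (60, 2)]
print(call_peaks(toy, threshold=4))
```

Closely spaced sites, such as those upstream of mglB and galS, would fall into a single merged region under this rule, which is consistent with the manual separation mentioned above.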
Identification of an Enriched Sequence Motif from ChIP-Seq Data
For each peak position , we extracted genomic DNA sequence using the following formulae to determine the upstream and downstream coordinates : upstream coordinate : $U_P - ( ( U_P - U_{P-1} ) \times ( S_{P-1} / S_P ) )$ ; downstream coordinate : $D_P - ( ( D_{P+1} - D_P ) \times ( S_{P+1} / S_P ) )$ ; where S = probe score , U = genome coordinate corresponding to the upstream end of a probe , D = genome coordinate corresponding to the downstream end of a probe , P = peak probe , P − 1 = probe upstream of peak , and P + 1 = probe downstream of peak .
We used MEME ( version 4.11.2 , default parameters except any number of motif repetitions was allowed ) to identify an enriched sequence motif ( Bailey and Elkan , 1994 ) .
ChIP-qPCR
MG1655 galR-FLAG3 ( AMD188 ) cells were grown in LB at 37 °C to an OD600 of 0.6 -- 0.8 .
ChIP-qPCR was performed as described previously ( Stringer et al. , 2014 ) .
The motifs in bold letters are also present in Table S2.
RESULTS
In silico Identification of Novel GalR Target Genes in E. coli
A consensus sequence of GalR binding sites from the 9 previously known functional operators in the gal regulon ( galE , galP , mglB , galS , and galR promoters ; Figure 1 ) appears to be a 16-bp hyphenated dyad symmetry sequence , numbered 1 to 16 , with the center between positions 8 and 9 : GTGNAANC.GNTTNCAC ( with N being any nucleotide ; Weickert and Adhya , 1993a ) .
Genetic analysis showed that mutations at any of the positions 3 , 5 , 9 , and 15 ( labeled in bold ) create a functionally defective operator ( Adhya and Miller , 1979 ) .
Therefore , we used a motif in which nucleotides at positions 3 , 5 , 9 , and 15 were fixed to search through the whole genome of E. coli ( NC_000913.3 ) ( Baba et al. , 2006 ) for putative GalR operators , allowing two mismatches at other non-N positions as described ( Qian et al. , 2012 ) .
Thus , we found 165 potential GalR operators distributed across the genome ( Table S1 ) .
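The search rule just described (consensus positions 3, 5, 9, and 15 fixed, N positions free, and at most two mismatches at the remaining positions) can be sketched as below. This is an illustrative reading of the rule, not the program used for Table S1; scanning only the given strand is an assumption, and the function names are hypothetical.

```python
CONSENSUS = "GTGNAANCGNTTNCAC"   # 16-bp consensus from the text, dot removed
FIXED = {3, 5, 9, 15}            # 1-based positions that must match exactly
MAX_MISMATCHES = 2               # allowed at the remaining non-N positions

def is_candidate(site):
    """Return True if a 16-mer satisfies the search rule described above."""
    if len(site) != len(CONSENSUS):
        return False
    mismatches = 0
    for i, (c, s) in enumerate(zip(CONSENSUS, site.upper()), start=1):
        if c == "N":
            continue              # any nucleotide is accepted here
        if s != c:
            if i in FIXED:
                return False      # a critical position is altered
            mismatches += 1
    return mismatches <= MAX_MISMATCHES

def scan(sequence):
    """Yield (1-based start, site) for every candidate on the given strand."""
    n = len(CONSENSUS)
    for start in range(len(sequence) - n + 1):
        site = sequence[start:start + n]
        if is_candidate(site):
            yield start + 1, site

# Toy sequence containing one perfect consensus match (hypothetical).
print(list(scan("TT" + "GTGAAAACGATTACAC" + "TT")))
```

Because the fixed positions are not symmetric about the dyad center, a search of both strands would need the reverse complement scanned separately.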
We further analyzed the information content of the original 9 GalR-target operator sequences ( Figure 1 ; Schneider and Mastronarde , 1996 ) .
A unique alignment of 42 bp length was obtained ; the information content of the optimally aligned sites was Rsequence = 16.1 ± 0.7 bits/site for the 42 bp sequence range ( Shannon , 1948 ; Pierce , 1980 ; Schneider et al. , 1986 ) .
The information content needed to find these 9 sites in the 4,641,652 bp E. coli genome ( NC_000913.3 ) is Rfrequency = 18.98 bits/site .
Because Rsequence/Rfrequency = 0.85 ± 0.04 , the binding sites do not carry sufficient information for them to be located in the genome ( Schneider et al. , 1986 ; Schneider , 2000 ) .
This result implies that there could be 66 ± 32 sites in the genome .
As shown in Figure 4 , the sequence logo of the binding sites covers the DNase I protection segment ( Majumdar and Adhya , 1987 ; Schneider and Stephens , 1990 ) .
There may be additional conservation near a DNase I-hypersensitive site in a major groove one helical turn from the central two major grooves bound by GalR ( − 16 and +17 ; Figure 4 ) .
The sequence conservation in the center of the site at bases 0 and 1 exceeds the sine wave , indicating that GalR binds to non-B-form DNA ( Schneider , 2001 ) as was previously suggested ( Majumdar and Adhya , 1989 ) .
An individual information weight matrix corresponding to positions − 20 to +21 of the logo in Figure 4 was created and scanned across the E. coli genome ( Schneider , 1997 ) .
Sixty sites were identified that contain more than 9.4 bits , the lowest information content of the biochemically proven sites .
The sequences of novel GalR predicted sites corresponding to the logo are summarized in Table 2 .
Rfrequency for these sites in the genome is 16.24 bits/site , which is close to the observed 16.3 ± 0.1 bits/site from all the predicted genomic sites .
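The information-theory figures quoted in this section can be checked with a few lines of arithmetic. The sketch assumes Rfrequency = log2(G / n), with G the genome length in bp and n the number of sites, and that the implied number of sites is G / 2^Rsequence; these forms reproduce the values in the text but are stated here as assumptions rather than as the authors' exact calculation.

```python
from math import log2

GENOME_BP = 4_641_652          # E. coli K-12 MG1655 genome length quoted above

def r_frequency(genome_bp, n_sites):
    """Bits needed to pick n_sites out of genome_bp possible positions."""
    return log2(genome_bp / n_sites)

def implied_sites(genome_bp, r_sequence):
    """Number of sites for which r_sequence bits would be just sufficient."""
    return genome_bp / 2 ** r_sequence

print(round(r_frequency(GENOME_BP, 9), 2))          # 18.98 bits/site for 9 sites
print(round(16.1 / r_frequency(GENOME_BP, 9), 2))   # 0.85, Rsequence/Rfrequency
print(round(implied_sites(GENOME_BP, 16.1)))        # 66 sites
print(round(r_frequency(GENOME_BP, 60), 2))         # 16.24 bits/site for 60 sites
```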
Functional Analysis of the Putative GalR Binding Sites Using ChIP-chip Assays
For the functional analysis of the putative binding sites , a ChIP-chip assay was performed to detect GalR target sequences genome-wide in vivo ( Collas , 2010 ; Wade , 2015 ) .
In this ChIP-chip assay the binding of C-terminally TAP ( tandem affinity purification ) - tagged GalR ( tagged at its native locus in an unmarked strain ) was mapped across the E. coli genome .
The experimental data resulting from ChIP-chip analysis were validated by quantitative real-time PCR ( ChIP/qPCR ) .
To demonstrate that the ChIP signal was not an artifact of the TAP tag , we constructed an unmarked derivative of E. coli MG1655 that expressed a C-terminally FLAG3-tagged GalR from its native locus .
We selected six ( ytfQ , galE , purR , talB , cyaA , and chiP ) sites for validation , including ytfQ , talB , and cyaA that had not been described or predicted previously .
In all cases , we detected significant signal of GalR binding indicating that these are genuine sites of GalR binding ( Figure 5 ) .
The inferred binding sites from ChIP-chip assays are listed in Table 3 .
We identified 15 GalR-bound regions , four of which contain two operators .
These include 8 known operators ( in galE , galP , galS , galR , chiP , and mglB ; Weickert and Adhya , 1993b ; Plumbridge et al. , 2014 ) .
Thirteen of the 15 putative GalR-bound regions overlap an intergenic region upstream of a gene start .
This is a strong enrichment over the number expected by chance ( only ∼ 12 % of the genome is intergenic ) .
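One rough way to put a number on this enrichment is a binomial tail probability. The sketch below treats each of the 15 regions as an independent draw with a 12 % chance of landing in intergenic DNA; that model is a simplification introduced here for illustration and is not a test reported in the study.

```python
from math import comb

def binomial_tail(n, k, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

# 13 or more of 15 regions intergenic, with ~12% of the genome intergenic.
p_value = binomial_tail(15, 13, 0.12)
print(f"{p_value:.1e}")
```

Under these assumptions the probability is far below conventional significance thresholds, in line with the statement that the enrichment is strong.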
Global Transcription Profile in the Presence and Absence of GalR
Since both in silico investigation and ChIP-chip assays suggested that the regulatory role of GalR goes beyond D-galactose metabolism , we used transcriptome profiling to gain further insight into the impact of GalR on genome-wide transcription .
To evaluate the effect of galR deletion on global gene expression patterns , we compared the ratio of RNA isolated from a ∆ galR mutant to that isolated from wild-type cells , using DNA tiling microarrays ( Tokeson et al. , 1991 ) .
The results of the transcriptional analysis are displayed in the MAT plot shown in Figure 6 .
For all analyses , we arbitrarily selected a stringent ratio cut-off of 3 .
We identified 238 genes with values exceeding this cut-off ( Table S2 ) .
These 238 genes are transcribed from 158 promoters .
Three transcripts ( 5 genes ) of the 158 promoters are up-regulated ( GalR acting as a repressor ) and 155 transcripts ( 233 genes ) are down-regulated ( GalR acting as an activator ; Table S2 ) .
Interestingly , several genes , including mglB , are dysregulated by GalR but fall outside of the cut-off range .
All three ( galP , galP1 , and galP2 ) of the up-regulated promoters have adjacent operators .
Of the 155 down-regulated promoters , 4 promoters contain adjacent operators and the remaining 151 do not .
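The cut-off logic used in this section can be sketched as a simple classifier. The sketch assumes that the cut-off of 3 is applied symmetrically to a linear ∆galR / wild-type ratio (at least 3 for up-regulation, at most 1/3 for down-regulation); the text does not spell out these details, and the gene names and ratios below are hypothetical.

```python
from collections import Counter

CUTOFF = 3.0   # ratio cut-off quoted in the text

def classify(ratio, cutoff=CUTOFF):
    """Classify a mutant/wild-type expression ratio against a fold cut-off."""
    if ratio >= cutoff:
        return "up"          # higher without GalR: GalR acts as a repressor
    if ratio <= 1 / cutoff:
        return "down"        # lower without GalR: GalR acts as an activator
    return "unchanged"

# Hypothetical ratios for illustration only.
ratios = {"gene1": 4.0, "gene2": 0.2, "gene3": 1.5, "gene4": 0.1}
calls = {gene: classify(r) for gene, r in ratios.items()}
print(calls)
print(Counter(calls.values()))
```

A gene such as mglB, described above as dysregulated but outside the cut-off range, would be labeled "unchanged" by this rule, which shows how a stringent threshold can hide genuine targets.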
DISCUSSION
Using a combination of bioinformatic and experimental approaches we identified many putative novel GalR operators in the E. coli genome .
As expected , several of these putative operators were identified by both information theory and ChIP-chip assays , demonstrating that they represent genuine GalR binding sites .
Thus , we have substantially expanded the known GalR regulon .
Surprisingly , our data suggest that GalR , a regulator of D-galactose metabolism , also regulates the expression of genes involved in other cellular processes .
Interestingly , three of the putative novel GalR target genes -- cytR , purR , and adiY -- encode transcription factors , suggesting that GalR may be part of a more complex regulatory network .
Moreover , putative GalR operators upstream of cytR and purR overlap with operators for CytR and PurR , respectively , indicating combinatorial regulation of these genes ( Meng et al. , 1990 ; Rolfes and Zalkin , 1990 ; Mengeritsky et al. , 1993 ) .
Despite our identification of GalR operators with high confidence upstream of genes mentioned above , our expression microarray data show little or no regulation of these genes by GalR .
We propose that regulation of these genes by GalR is condition-specific , requiring input from additional regulatory factors .
Role of GalR in Gene Regulation
DNA tiling array analysis revealed that the transcription of a surprisingly large number of promoters ( 158 ) in E. coli is dysregulated by deletion of the galR gene .
On the other hand , we identified 165 established or potential GalR operators in the chromosome , 76 of which are located between − 200 and +400 bp from the transcription start point ( tsp ) of a ( cognate ) promoter ; the other 89 operators are not ( Table S1 ) .
We called the former group of operators , `` Gene Regulatory Sites '' ( GRS , listed in Table 4 ) .
Consistent with a previous proposal ( Macvanin and Adhya , 2012 ) , we believe that the 89 non-cognate operators around the chromosome play an architectural role in chromosome organization .
We refer to these unattached operators as `` Chromosome Anchoring Sites '' ( CAS ) .
Some of the sites may serve as both GRS and CAS .
The 76 ( 46 % ) GRS and 89 ( 54 % ) CAS are shown in Table S1 .
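The GRS / CAS split described above can be sketched as a distance test against promoter start points. The sketch assumes that each operator is represented by a single coordinate and that the − 200 to +400 bp window is measured in the direction of transcription; the function name, data layout, and toy coordinates are hypothetical.

```python
def classify_operator(op_pos, promoters, upstream=200, downstream=400):
    """Label an operator GRS if it lies within -upstream..+downstream of a tsp.

    op_pos:    genome coordinate of the operator (assumed single-point convention).
    promoters: list of (tsp_coordinate, strand) with strand '+' or '-'.
    """
    for tsp, strand in promoters:
        # Signed offset measured in the direction of transcription.
        offset = op_pos - tsp if strand == "+" else tsp - op_pos
        if -upstream <= offset <= downstream:
            return "GRS"
    return "CAS"

# Toy promoters (hypothetical coordinates).
promoters = [(1000, "+"), (5000, "-")]
for pos in (900, 1500, 5150, 4500):
    print(pos, classify_operator(pos, promoters))

# Percentages quoted in the text: 76 GRS and 89 CAS out of 165 operators.
print(round(100 * 76 / 165), round(100 * 89 / 165))   # 46 54
```

An operator could satisfy the window for one promoter while also anchoring a mega-loop, which matches the remark that some sites may serve as both GRS and CAS.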
Seventy-six GRS include 9 previously known operators of the gal regulon ( see Figure 1 ) ; the other 67 , which control promoters , were not known previously .
The discovery of new GRS indicates that GalR , a well-known regulator of D-galactose metabolism , also regulates the expression of other genes .
Among the new GRS , 3 ( in yaaJ , purR , and ytfQ promoters ) were confirmed by in vivo DNA-binding ( ChIP-chip assays ) as shown in Table 3 .
The salient features of our findings presented in this paper are shown schematically in Figure 7 .
Although we identified 158 transcripts whose expression was regulated by GalR , very few of these are associated with a putative GalR operator identified in silico and/or ChIP-chip assays , strongly suggesting that the majority of regulation by GalR occurs indirectly .
Based on our earlier observation that GalR mediates mega-loop formation , we propose that long-range oligomerization of GalR indirectly regulates transcription by altering chromosome structure .
There are at least three possible mechanisms for such regulation : indirect control , enhancer activity , and modulation of DNA superhelicity .
In the indirect control model , GalR directly regulates another regulator , such as PurR or CytR , and the downstream regulator directly regulates other genes .
The regulation by GalR is indirect , but occurs by a classical regulatory mechanism .
In the enhancer activity model , GalR stimulates transcription of some target genes by binding to a distal site and forming an enhancer-loop with a protein bound to the promoter region .
Examples of enhancer activity have been described before for some prokaryotic and many eukaryotic promoters ( Rombel et al. , 1998 ; Schaffner , 2015 ) .
In the DNA superhelicity modulation model , GalR creates DNA topological domains by mega-loop formation and defines local chromosomal superhelicity by GalR-GalR interactions between distally bound dimers .
The strength of a promoter usually depends on the superhelical state of the DNA ( Pruss and Drlica , 1989 ; Lim et al. , 2003 ) .
We propose that GalR entraps different amounts of superhelicity in different topological domains and thus controls transcription of the constituent promoters .
In the absence of GalR , such domains are not formed , resulting in a change in local DNA superhelicity and thus a change in the strength of the constituent promoters .
In this model , GalR protein indirectly regulates gene transcription as an architectural protein .
We are currently studying the regional superhelicities in the entire chromosome in the presence and absence of GalR as well as the implication of genes affected by GalR , but independent of D-galactose metabolism ( Lal et al. , 2016 ) .
AUTHOR CONTRIBUTIONS
ZQ : designed genome-wide sequence analysis , interpreted sequence analysis data and tiling array data ; AT and SL : executed tiling array experiments and data analysis ; XH : executed genome-wide sequence analysis ; TD : integrated tiling array and genome-wide sequence data ; AS and JW : executed ChIP-chip and ChIP-qPCR experiments and data analysis ; DL : data analysis ; TS : executed Information Theory and data analysis ; SA : organized and designed experiments , and data analysis .
All authors contributed to the manuscript preparation .
ACKNOWLEDGMENTS
This work was supported by the Intramural Research Program of the National Institutes of Health , the National Cancer Institute , and the Center for Cancer Research .
The authors have no conflict of interest to declare .
We thank the Wadsworth Center Applied Genomic Technologies Core Facility for assistance with microarrays for ChIP-chip assays .
SUPPLEMENTARY MATERIAL
The Supplementary Material for this article can be found online at : http://journal.frontiersin.org/article/10.3389/fmolb.2016.00074/full#supplementary-material