Genome-Wide Transcriptional
1 Laboratory of Molecular Biology , National Institutes of Health , National Cancer Institute , Bethesda , MD , USA , 2 Microbiomics and Immunity Research Center , Korea Research Institute of Bioscience and Biotechnology , Daejeon , Korea , 3 Laboratory of Metabolism , National Institutes of Health , National Cancer Institute , Bethesda , MD , USA , 4 Wadsworth Center , New York State Department of Health , Albany , NY , USA , 5 Department of Biomedical Sciences , School of Public Health , University of Albany , Albany , NY , USA , 6 Gene Regulation and Chromosome Biology Laboratory , National Institutes of Health , National Cancer Institute , Center for Cancer Research , Frederick , MD , USA , 7 DNASTAR , Inc. , Madison , WI , USA 
Keywords: GalR regulon, mega-loop, ChIP-chip, nucleoid, DNA superhelicity
INTRODUCTION
The 4.6 Mb Escherichia coli chromosomal DNA is packaged into a small volume ( 0.2 -- 0.5 µm3 ) so that it fits inside a cell volume of 0.5 -- 5 µm3 ( Loferer-Krossbacher et al. , 1998 ; Skoko et al. , 2006 ; Luijsterburg et al. , 2008 ) . 
It has been suggested that a bacterial chromosome has a 3-D structure that dictates the entire chromosome 's gene expression pattern ( Kar et al. , 2005 ; Macvanin and Adhya , 2012 ) . 
The chromosome structure and the associated volume are defined and environment-dependent . 
The compaction of the DNA into a structured chromosome ( nucleoid ) is facilitated by several architectural proteins , often called `` nucleoid-associated proteins '' ( NAPs ) . 
NAPs are well-characterized bacterial histone-like proteins such as HU , H-NS , Fis , and Dps ( Ishihama , 2009 ) . 
For example , deletion of the gene encoding the NAP HU leads to substantial changes in cell volume and in the global transcription profile , presumably due to changes in chromosome architecture ( Kar et al. , 2005 ; Oberto et al. , 2009 ; Priyadarshini et al. , 2013 ) . 
A recent and surprising addition to the list of NAPs in E. coli is the sequence-specific DNA-binding transcription regulatory protein , GalR ( Qian et al. , 2012 ) . 
In contrast , related DNA-binding proteins PurR , MalT , FruR , and TyrR do not appear to affect the chromosome structure ( Qian et al. , 2012 ) . 
Here , we discuss experimental results that led us to explore the idea that GalR also regulates transcription at a global scale through DNA architectural changes . 
GalR regulates transcription of the galETKM , galP , galR , galS , and mglBAC transcripts ( Figure 1 ) . 
These genes all encode proteins involved in the transport and metabolism of D-galactose . 
Moreover , GalR controls expression of the chiPQ operon , which encodes genes involved in the transport of chitosugar . 
The galETKM operon ( Figure 1 ) is transcribed as a polycistronic mRNA from two overlapping promoters , P1 ( +1 ) and P2 ( − 5 ) ( Musso et al. , 1977 ; Aiba et al. , 1981 ) . 
GalR regulates P1 and P2 promoters differentially . 
GalR binds two operators , OE , located at position − 60.5 , and OI , located at +53.5 ( Irani et al. , 1983 ; Majumdar and Adhya , 1984 , 1987 ) . 
Binding of GalR to OE represses P1 and activates P2 by arresting RNA polymerase , and facilitating the step of RNA polymerase isomerization , respectively ( Roy et al. , 2004 ) . 
When GalR binds to both OE and OI , which are 113 bp apart and do not overlap with the two promoters , it prevents transcription initiation from both P1 and P2 ( Aki et al. , 1996 ; Aki and Adhya , 1997 ; Semsey et al. , 2002 ; Roy et al. , 2005 ) . 
Mechanistically , two DNA-bound GalR dimers transiently associate , creating a loop in the intervening promoter DNA segment . 
Kinking at the apex of the loop facilitates binding of HU , which in turn stabilizes the loop ( Figure 2 ; Kar and Adhya , 2001 ) . 
The DNA structure in the looped form is topologically closed and binds RNA polymerase , but does not allow isomerization into an actively transcribing complex ( Choy et al. , 1995 ) . 
Following the example of GalR-mediated DNA loop formation by interaction of GalR bound to two operators in the galE operon , and considering the fact that GalR operators in the galP , mglB , galS , galR , and chiP promoters are scattered around the chromosome , we hypothesized that GalR may oligomerize while bound to distal sites , thereby forming much larger DNA loops ( `` mega-loops '' ) . 
We employed the Chromosome Conformation Capture ( 3C ) method to investigate interactions between distal GalR operators ( Dekker et al. , 2002 ) . 
Thus , we showed that GalR does indeed oligomerize over long distances , resulting in the formation of mega-loops . 
Moreover , our data suggested the existence of other unidentified GalR binding sites around the chromosome , with these novel sites also participating in long-distance interactions ( Qian et al. , 2012 ) . 
Figure 3 shows , in cartoon form , the demonstrable GalR-mediated DNA-DNA connections listed in Table 1 . 
Although we originally proposed that DNA-bound GalR-mediated mega-loops may serve to increase the local concentrations of GalR around their binding sites for regulation of the adjacent promoters ( Oehler and Muller-Hill , 2010 ) , global regulation of gene expression due to change in chromosome structure may be another consequence of mega-loop formation . 
We propose that GalR-mediated mega-loop formation results in the formation of topologically independent DNA domains , with the level of superhelicity in each domain influencing transcription of the local promoters . 
Bacterial and Bacteriophage Strains
Bacteriophage P1 lysates of galR::kanR ( from the Keio collection ; Baba et al. , 2006 ) were made , and E. coli K-12 MG1655 galR deletion strains were constructed from MG1655 by bacteriophage P1 transduction using the lysate . 
Cells were then grown in 125 ml Corning flasks ( Corning® 430421 ) containing 30 ml of M63 minimal medium plus D-fructose ( final concentration 0.3 % ) at 37 °C with shaking at 230 rpm . 
At OD600 0.6 , cell cultures were separated into two flasks . 
Subsequently , D-galactose ( final concentration 0.3 % ) or water was added and cells were cultivated for an additional 1.5 h at 37 °C . 
E. coli MG1655 galR-TAP ( AMD032 ) was constructed by bacteriophage P1 transduction of the kanR-linked TAP tag cassette from DY330 galR-TAP ( Butland et al. , 2005 ) . 
The kanR cassette was removed using pCP20 , as described previously ( Datsenko and Wanner , 2000 ) . 
E. coli MG1655 galR-FLAG3 ( AMD188 ) was constructed using FRUIT ( Stringer et al. , 2012 ) . 
Table 1 ( partial ; columns : start coordinate , end coordinate , operator , sequence ) 
3072949 3072964 O ( F25-1 ) CTTAAATCGATTGCCG 
3072989 3073004 O ( F25-2 ) TTTGAAGCGATTGCGG 
Connections were detected among these sites except galEE and galEI by 3C assays . 
The first seven operators that showed connections by 3C were known before . 
The ones named as F were discovered during the 3C studies ( Qian et al. , 2012 ) . 
RNA Isolation
Cell cultures were placed on ice and RNAprotect™ Bacteria Reagent ( Qiagen® 76506 ) was added to stabilize the RNA ( Lee et al. , 2014 ) . 
Cells were harvested for RNA purification by RNeasy® Mini Kit ( Qiagen® 74104 ) following the manufacturer 's recommendations . 
RNA concentrations and purity were measured using a Thermo Scientific NanoDrop™ 1000 . 
Further sample processing was performed according to the Affymetrix GeneChip® Expression Analysis Technical Manual , Section 3 : Prokaryotic Sample and Array Processing ( 701029 Rev. 4 ) . 
Isolated RNA ( 10 µg ) was used for Random Primer cDNA synthesis using SuperScript II™ Reverse Transcriptase ( Invitrogen Life Technologies 18064-071 ) . 
The reaction mixture was treated with 1N NaOH to degrade any remaining RNA and treated with 1N HCl to neutralize the NaOH . 
Synthesized cDNA was then purified using MinElute® PCR Purification columns ( Qiagen® 28004 ) . 
Purified cDNA concentration and purity were measured using a Thermo Scientific NanoDrop™ 1000 . 
Purified cDNA was fragmented to between 50 and 200 bp by 0.6 U / µg of DNase I ( Amersham Biosciences 27-0514-01 ) for 10 min at 37 °C in 1X One-Phor-All buffer ( Amersham Biosciences 27-0901-02 ) . 
Heat inactivation of the DNase I enzyme was performed at 98 °C for 10 min . 
Fragmented cDNA was then biotin-labeled at the 3′ termini using the GeneChip® DNA Labeling Reagent ( Affymetrix 900542 ) and 60 U of Terminal Deoxynucleotidyl Transferase ( Promega M1875 ) at 37 °C for 60 min . 
The labeling reaction was then stopped by the addition of 0.5 M EDTA . 
Microarray Hybridization
Labeled cDNA fragments ( 3 µg ) were then hybridized for 16 h ( 60 rpm ) at 45 °C to tiling array chips ( Ecoli_Tab520346F ) purchased from Affymetrix ( Santa Clara , CA ) . 
Each chip carries 1,159,908 probes in a 1.4 cm × 1.4 cm area , with a 25-mer probe every 8 bp on both strands , so that the probes cover the whole E. coli genome . 
In addition , the probes on the two strands are offset by 4 bp relative to each other . 
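The probe count quoted above can be sanity-checked against the stated tiling density. The sketch below is only an illustration: it assumes the array was designed against a genome of 4,641,652 bp (the MG1655 length quoted in the Results) and that every strand is tiled at exactly 8 bp, neither of which is stated for the array itself.

```python
# Back-of-the-envelope check of the tiling density (illustrative assumptions only).
GENOME_BP = 4_641_652       # assumed design genome length (MG1655, NC_000913.3)
PROBE_SPACING_BP = 8        # one probe start every 8 bp on each strand
STRANDS = 2
REPORTED_PROBES = 1_159_908 # probe count quoted in the text


def expected_probe_count(genome_bp: int, spacing_bp: int, strands: int) -> int:
    """Number of probe start positions if each strand is tiled at a fixed spacing."""
    return strands * (genome_bp // spacing_bp)


if __name__ == "__main__":
    expected = expected_probe_count(GENOME_BP, PROBE_SPACING_BP, STRANDS)
    print(expected, expected - REPORTED_PROBES)
```

Under these assumptions the expected count differs from the reported one by a few hundred probes, which is consistent with whole-genome coverage at 8 bp spacing per strand.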
Microarray: Washing and Staining
The chips were then washed with Wash Buffer A ( Non-Stringent Wash Buffer : 6X SSPE , 0.01 % Tween-20 ) and Wash Buffer B ( 100 mM MES , 0.1 M [ Na+ ] , and 0.01 % Tween-20 ) , and stained with Streptavidin Phycoerythrin ( Molecular Probes S-866 ) and biotinylated goat anti-streptavidin antibody ( Vector Laboratories BA-0500 ) on a GeneChip Fluidics Station 450 ( Affymetrix ) according to the washing and staining protocol ProkGE-WS2_450 . 
Microarray: Scanning and Data Analysis
Hybridized , washed , and stained microarrays were scanned using a Genechip Scanner 3000 ( Affymetrix ) . 
Standardized signals , for each probe in the arrays , were generated using the MAT analysis software , which provides a model-based , sequence-specific , background correction for each sample ( Johnson et al. , 2006 ) . 
A gene-specific score was then calculated for each gene by averaging all MAT scores ( natural log ) for all probes under the annotated gene coordinates . 
Gene annotation was from the ASAP database at the University of Wisconsin-Madison , for E. coli K-12 MG1655 version m56 ( Glasner et al. , 2003 ) . 
Data were graphed with ArrayStar® , version 2.1 ( DNASTAR , Madison , WI ) . 
The tiling array data were submitted to the NCBI Gene Expression Omnibus . 
The accession number is GSE85334 . 
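The gene-specific score described above (the mean of the MAT scores of all probes lying within the annotated gene coordinates) can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function name, the data layout, and the choice to represent each probe by a single coordinate and to ignore strand are assumptions.

```python
from statistics import mean


def gene_scores(probes, genes):
    """Average per-probe MAT scores over each annotated gene.

    probes: list of (position, mat_score) tuples (hypothetical layout).
    genes:  dict mapping gene name -> (start, end), inclusive coordinates.
    Genes without any probe inside their coordinates are omitted.
    """
    scores = {}
    for name, (start, end) in genes.items():
        inside = [score for pos, score in probes if start <= pos <= end]
        if inside:
            scores[name] = mean(inside)
    return scores


if __name__ == "__main__":
    # Hypothetical probes and gene coordinates, for illustration only.
    demo_probes = [(10, 1.0), (18, 3.0), (26, 5.0), (100, -2.0)]
    demo_genes = {"geneA": (5, 30), "geneB": (90, 110), "geneC": (200, 300)}
    print(gene_scores(demo_probes, demo_genes))
```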
ChIP-Chip Assays
MG1655 galR-TAP ( AMD032 ) cells were grown in LB at 37 °C to an OD600 of ∼ 0.6 . 
ChIP-chip was performed as described previously ( Stringer et al. , 2014 ) . 
Data analysis was performed as described previously except that probes were ignored only if they had a score of < 100 pixels , indicating regions that are likely missing from the genome ( Stringer et al. , 2014 ) . 
Adjacent probes scoring above the threshold for being called GalR-bound were merged into a single region , and the highest-scoring probe in each region was selected as the `` peak position . '' 
The closely spaced peaks upstream of mglB and galS were manually separated . 
The ChIP-chip data were submitted to the EBI Array Express repository . 
The accession number is E-MTAB-4903 . 
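The merging of adjacent above-threshold probes into bound regions, followed by selection of the highest-scoring probe as the peak, can be sketched as below. This is an illustrative simplification: the threshold value, the data layout, and the interpretation of "adjacent" as consecutive probes in genome order are assumptions, and the full procedure of Stringer et al. (2014) is not reproduced.

```python
def call_peaks(probes, threshold):
    """Merge runs of consecutive above-threshold probes and pick each run's peak.

    probes: list of (position, score) tuples sorted by position (hypothetical layout).
    Returns one dict per merged region with its boundaries and peak probe.
    """
    regions, current = [], []
    for pos, score in probes:
        if score > threshold:
            current.append((pos, score))      # extend the current bound region
        elif current:
            regions.append(current)           # a below-threshold probe closes the region
            current = []
    if current:
        regions.append(current)

    peaks = []
    for region in regions:
        peak_pos, peak_score = max(region, key=lambda p: p[1])
        peaks.append({
            "start": region[0][0],
            "end": region[-1][0],
            "peak": peak_pos,
            "score": peak_score,
        })
    return peaks
```

Closely spaced sites, such as those upstream of mglB and galS, would fall into one merged region under this scheme, which is why a manual separation step is needed.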
Identification of an Enriched Sequence Motif from ChIP-chip Data
For each peak position , we extracted genomic DNA sequence using the following formulae to determine the upstream and downstream coordinates : upstream coordinate : $U_P - ( ( U_P - U_{P-1} ) \times ( S_{P-1} / S_P ) )$ ; downstream coordinate : $D_P - ( ( D_{P+1} - D_P ) \times ( S_{P+1} / S_P ) )$ ; where S = probe score , U = genome coordinate corresponding to the upstream end of a probe , D = genome coordinate corresponding to the downstream end of a probe , P = peak probe , P − 1 = probe upstream of peak , and P + 1 = probe downstream of peak . 
We used MEME ( version 4.11.2 , default parameters except any number of motif repetitions was allowed ) to identify an enriched sequence motif ( Bailey and Elkan , 1994 ) . 
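The coordinate formulae above can be written out as a small function. The sketch transcribes the two expressions literally as printed, including the minus sign in the downstream expression; the function name and the list-based data layout are assumptions made for illustration.

```python
def motif_search_window(up, down, score, p):
    """Upstream and downstream coordinates of the sequence extracted around a peak.

    up[i], down[i]: genome coordinates of the upstream and downstream ends of probe i.
    score[i]:       score of probe i.
    p:              index of the peak probe (must have a neighbor on each side).
    Both expressions follow the formulae in the text as printed.
    """
    upstream = up[p] - ((up[p] - up[p - 1]) * (score[p - 1] / score[p]))
    downstream = down[p] - ((down[p + 1] - down[p]) * (score[p + 1] / score[p]))
    return upstream, downstream


if __name__ == "__main__":
    # Hypothetical probe coordinates and scores, for illustration only.
    print(motif_search_window([100, 108, 116], [124, 132, 140], [2.0, 4.0, 1.0], 1))
```

In both expressions the coordinate is shifted by a fraction of the probe spacing equal to the ratio of the neighboring probe's score to the peak probe's score.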
ChIP-qPCR
MG1655 galR-FLAG3 ( AMD188 ) cells were grown in LB at 37 °C to an OD600 of 0.6 -- 0.8 . 
ChIP-qPCR was performed as described previously ( Stringer et al. , 2014 ) . 
The motifs in bold letters are also present in Table S2.
RESULTS
In silico Identification of Novel GalR Target Genes in E. coli
A consensus sequence of GalR binding sites from the nine previously known functional operators in the gal regulon ( galE , galP , mglB , galS , and galR promoters ; Figure 1 ) appears to be a 16-bp hyphenated dyad symmetry sequence with the center between positions 8 and 9 : GTGNAANC.GNTTNCAC ( positions 1 -- 16 ; N is any nucleotide ; Weickert and Adhya , 1993a ) . 
Genetic analysis showed that mutations at any of the positions 3 , 5 , 9 , and 15 ( labeled in bold ) create a functionally defective operator ( Adhya and Miller , 1979 ) . 
Therefore , we used a motif in which the nucleotides at positions 3 , 5 , 9 , and 15 were fixed to search the whole genome of E. coli ( NC_000913.3 ) ( Baba et al. , 2006 ) for putative GalR operators , allowing two mismatches at the other non-N positions , as described ( Qian et al. , 2012 ) . 
Thus , we found 165 potential GalR operators distributed across the genome ( Table S1 ) . 
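The search rule described above (positions 3, 5, 9, and 15 fixed, up to two mismatches at the remaining non-N positions) can be sketched as a simple sliding-window scan. This is a minimal illustration rather than the procedure of Qian et al. (2012): the function name is hypothetical, and scanning only the given strand is an assumption.

```python
CONSENSUS = "GTGNAANCGNTTNCAC"   # 16-bp consensus from the text, central dot removed
FIXED_POSITIONS = (3, 5, 9, 15)  # 1-based positions that must match exactly
MAX_MISMATCHES = 2               # allowed at the other non-N positions


def find_putative_operators(genome, consensus=CONSENSUS,
                            fixed=FIXED_POSITIONS, max_mismatches=MAX_MISMATCHES):
    """Return (1-based start, matched sequence, mismatch count) for each hit.

    Only the strand supplied in `genome` is scanned (an assumption of this sketch).
    """
    genome = genome.upper()
    width = len(consensus)
    fixed_idx = {p - 1 for p in fixed}
    hits = []
    for start in range(len(genome) - width + 1):
        window = genome[start:start + width]
        mismatches = 0
        accepted = True
        for i, (cons_base, base) in enumerate(zip(consensus, window)):
            if cons_base == "N" or cons_base == base:
                continue
            if i in fixed_idx:
                accepted = False      # a mismatch at a fixed position is never tolerated
                break
            mismatches += 1
            if mismatches > max_mismatches:
                accepted = False
                break
        if accepted:
            hits.append((start + 1, window, mismatches))
    return hits
```

Running such a scan on the reverse-complemented genome as well would cover both strands.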
We further analyzed the information content of the sequences of the original nine GalR operators ( Figure 1 ; Schneider and Mastronarde , 1996 ) . 
A unique alignment of 42 bp length was obtained ; the information content of the optimally aligned sites was Rsequence = 16.1 ± 0.7 bits/site for the 42 bp sequence range ( Shannon , 1948 ; Pierce , 1980 ; Schneider et al. , 1986 ) . 
The information content needed to find these 9 sites in the 4,641,652 bp E. coli genome ( NC_000913.3 ) is Rfrequency = 18.98 bits/site . 
Because Rsequence/Rfrequency = 0.85 ± 0.04 , the binding sites do not contain sufficient information for them to be located in the genome ( Schneider et al. , 1986 ; Schneider , 2000 ) . 
This result implies that there could be 66 ± 32 sites in the genome . 
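The numbers in this paragraph can be reproduced with a few lines of arithmetic. The sketch below assumes the standard definition Rfrequency = log2(G / γ), with G the genome length and γ the number of sites, and inverts it to estimate how many sites an information content of Rsequence would imply. The first-order error propagation used for the uncertainties is an assumption; it is not stated in the text, although it gives the same rounded values.

```python
import math

GENOME_BP = 4_641_652   # genome length quoted in the text
N_SITES = 9             # known GalR operators
R_SEQUENCE = 16.1       # bits/site, from the text
R_SEQUENCE_SD = 0.7     # bits/site, from the text


def r_frequency(genome_bp, n_sites):
    """Information (bits/site) needed to locate n_sites in a genome of genome_bp."""
    return math.log2(genome_bp / n_sites)


def predicted_site_count(genome_bp, r_sequence):
    """Number of sites implied by the information content of the sites."""
    return genome_bp / 2 ** r_sequence


def predicted_site_count_sd(genome_bp, r_sequence, r_sequence_sd):
    """First-order propagation (assumed): |d(count)/dR| = ln(2) * count."""
    return math.log(2) * predicted_site_count(genome_bp, r_sequence) * r_sequence_sd


if __name__ == "__main__":
    rf = r_frequency(GENOME_BP, N_SITES)
    print(round(rf, 2))                                    # 18.98
    print(round(R_SEQUENCE / rf, 2), round(R_SEQUENCE_SD / rf, 2))  # 0.85 0.04
    print(round(predicted_site_count(GENOME_BP, R_SEQUENCE)))       # 66
    print(round(predicted_site_count_sd(GENOME_BP, R_SEQUENCE, R_SEQUENCE_SD)))  # 32
```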
As shown in Figure 4 , the sequence logo of the binding sites covers the DNase I protection segment ( Majumdar and Adhya , 1987 ; Schneider and Stephens , 1990 ) . 
There may be additional conservation near a DNase I-hypersensitive site in a major groove one helical turn from the central two major grooves bound by GalR ( − 16 and +17 ; Figure 4 ) . 
The sequence conservation in the center of the site at bases 0 and 1 exceeds the sine wave , indicating that GalR binds to non-B-form DNA ( Schneider , 2001 ) as was previously suggested ( Majumdar and Adhya , 1989 ) . 
An individual information weight matrix corresponding to positions − 20 to +21 of the logo in Figure 4 was created and scanned across the E. coli genome ( Schneider , 1997 ) . 
Sixty sites were identified that contain more than 9.4 bits , the lowest information content of the biochemically proven sites . 
The sequences of novel GalR predicted sites corresponding to the logo are summarized in Table 2 . 
Rfrequency for these sites in the genome is 16.24 bits/site , which is close to the observed 16.3 ± 0.1 bits/site from all the predicted genomic sites . 
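The genome scan with an individual information weight matrix can be illustrated with a simplified sketch. Each matrix entry is taken as 2 + log2 f(b, l), where f(b, l) is the frequency of base b at position l in the aligned sites, and a window is reported when its summed score exceeds a cut-off (9.4 bits in the text). The small-sample correction of Schneider (1997) is omitted and an add-one pseudocount is used to avoid log(0); both are simplifying assumptions, so the scores will not match the published values.

```python
import math

BASES = "ACGT"


def build_weight_matrix(aligned_sites, pseudocount=1.0):
    """Per-position weights 2 + log2(f(b, l)) from equal-length aligned sites."""
    n = len(aligned_sites)
    width = len(aligned_sites[0])
    matrix = []
    for l in range(width):
        column = [site[l] for site in aligned_sites]
        weights = {}
        for b in BASES:
            freq = (column.count(b) + pseudocount) / (n + 4 * pseudocount)
            weights[b] = 2.0 + math.log2(freq)
        matrix.append(weights)
    return matrix


def score_site(matrix, seq):
    """Individual information (bits) of one candidate site."""
    return sum(col[b] for col, b in zip(matrix, seq))


def scan(matrix, genome, min_bits):
    """Return (1-based start, sequence, bits) for windows scoring above min_bits."""
    width = len(matrix)
    hits = []
    for start in range(len(genome) - width + 1):
        window = genome[start:start + width]
        if any(b not in BASES for b in window):
            continue                      # skip windows with ambiguous bases
        bits = score_site(matrix, window)
        if bits > min_bits:
            hits.append((start + 1, window, bits))
    return hits
```

For the study itself, the matrix would be built from the 42-bp alignment of the known operators and the cut-off set to the lowest-scoring biochemically proven site.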
Functional Analysis of the Putative GalR Binding Sites Using ChIP-chip Assays
For the functional analysis of the putative binding sites , a ChIP-chip assay was performed to detect GalR target sequences genome-wide in vivo ( Collas , 2010 ; Wade , 2015 ) . 
In this ChIP-chip assay , the binding of C-terminally TAP-tagged GalR ( TAP , tandem affinity purification ; tagged at its native locus in an unmarked strain ) was mapped across the E. coli genome . 
The experimental data resulting from ChIP-chip analysis were validated by quantitative real-time PCR ( ChIP/qPCR ) . 
To demonstrate that the ChIP signal was not an artifact of the TAP tag , we constructed an unmarked derivative of E. coli MG1655 that expressed a C-terminally FLAG3-tagged GalR from its native locus . 
We selected six sites ( ytfQ , galE , purR , talB , cyaA , and chiP ) for validation , including ytfQ , talB , and cyaA , which had not been described or predicted previously . 
In all cases , we detected significant signal of GalR binding indicating that these are genuine sites of GalR binding ( Figure 5 ) . 
The inferred binding sites from ChIP-chip assays are listed in Table 3 . 
We identified 15 GalR-bound regions , four of which contain two operators . 
These include 8 known operators ( in galE , galP , galS , galR , chiP , and mglB ; Weickert and Adhya , 1993b ; Plumbridge et al. , 2014 ) . 
Thirteen of the 15 putative GalR-bound regions overlap an intergenic region upstream of a gene start . 
This is a strong enrichment over the number expected by chance ( only ∼ 12 % of the genome is intergenic ) . 
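The strength of this enrichment can be gauged with a simple binomial calculation. The sketch below is an illustration only, not an analysis reported by the authors: it assumes that each of the 15 regions independently has a 12 % chance of overlapping an intergenic region, which ignores region width and gene orientation.

```python
from math import comb


def binomial_tail(k_min, n, p):
    """Probability of at least k_min successes in n independent trials."""
    return sum(comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(k_min, n + 1))


if __name__ == "__main__":
    # 13 or more of 15 regions intergenic, with a 12 % chance each (assumed null model).
    print(binomial_tail(13, 15, 0.12))
```

Under this crude null model the probability is on the order of 1e-10, consistent with the description of the enrichment as strong.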
Global Transcription Profile in the Presence and Absence of GalR
Since both in silico investigation and ChIP-chip assays suggested that the regulatory role of GalR goes beyond D-galactose metabolism , we used transcriptome profiling to gain further insight into the impact of GalR on genome-wide transcription . 
To evaluate the effect of galR deletion on global gene expression patterns , we determined the ratio of RNA levels in a ∆galR mutant to those in wild-type cells , using DNA tiling microarrays ( Tokeson et al. , 1991 ) . 
The results of the transcriptional analysis are displayed in the MAT plot shown in Figure 6 . 
For all analyses , we arbitrarily selected a stringent ratio cut-off of 3 . 
We identified 238 genes with values exceeding this cut-off ( Table S2 ) . 
These 238 genes are transcribed from 158 promoters . 
Three transcripts ( 5 genes ) of the 158 promoters are up-regulated ( GalR acting as a repressor ) and 155 transcripts ( 233 genes ) are down-regulated ( GalR acting as an activator ; Table S2 ) . 
Interestingly , several genes including mglB are dysregulated by GalR but fall outside of the cut-off range . 
All three ( galP , galP1 , and galP2 ) of the up-regulated promoters have adjacent operators . 
Of the 155 down-regulated promoters , 4 promoters contain adjacent operators and the remaining 151 do not . 
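The classification by ratio cut-off can be sketched as below. This is an illustration under stated assumptions: the ratios are taken to be ∆galR / wild type on a linear scale, the cut-off is applied symmetrically (above 3 or below 1/3), and the gene names in the usage example are hypothetical. If the per-gene values are kept on a natural-log scale, the equivalent thresholds would be ±ln(3).

```python
def classify_by_ratio(ratios, cutoff=3.0):
    """Split genes into those up- or down-regulated in the galR deletion strain.

    ratios: dict mapping gene name -> expression ratio (mutant / wild type, linear scale).
    Up-regulated in the mutant suggests repression by GalR;
    down-regulated in the mutant suggests activation by GalR.
    """
    up, down = [], []
    for gene, ratio in ratios.items():
        if ratio > cutoff:
            up.append(gene)
        elif ratio < 1.0 / cutoff:
            down.append(gene)
    return sorted(up), sorted(down)


if __name__ == "__main__":
    # Hypothetical ratios, for illustration only.
    demo = {"geneA": 4.2, "geneB": 0.2, "geneC": 1.1, "geneD": 3.0, "geneE": 0.5}
    print(classify_by_ratio(demo))
```

Genes with modest changes, such as mglB in the text, would fall between the two thresholds and be left unclassified.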
DISCUSSION
Using a combination of bioinformatic and experimental approaches we identified many putative novel GalR operators in the E. coli genome . 
As expected , several of these putative operators were identified by both information theory and ChIP-chip assays , demonstrating that they represent genuine GalR binding sites . 
Thus , we have substantially expanded the known GalR regulon . 
Surprisingly , our data suggest that GalR , a regulator of D-galactose metabolism , also regulates the expression of genes involved in other cellular processes . 
Interestingly , three of the putative novel GalR target genes -- cytR , purR , and adiY -- encode transcription factors , suggesting that GalR may be part of a more complex regulatory network . 
Moreover , putative GalR operators upstream of cytR and purR overlap with operators for CytR and PurR , respectively , indicating combinatorial regulation of these genes ( Meng et al. , 1990 ; Rolfes and Zalkin , 1990 ; Mengeritsky et al. , 1993 ) . 
Despite our identification of GalR operators with high confidence upstream of genes mentioned above , our expression microarray data show little or no regulation of these genes by GalR . 
We propose that regulation of these genes by GalR is condition-specific , requiring input from additional regulatory factors . 
Role of GalR in Gene Regulation
DNA tiling array analysis revealed that the transcription of a surprisingly large number of promoters ( 158 ) in E. coli is dysregulated by deletion of the galR gene . 
On the other hand , we identified 165 established or potential GalR operators in the chromosome , 76 of which are located between −200 and +400 bp from the transcription start point ( tsp ) of a promoter ( cognate operators ) , whereas the other 89 are not ( Table S1 ) . 
We called the former group of operators , `` Gene Regulatory Sites '' ( GRS , listed in Table 4 ) . 
Consistent with a previous proposal ( Macvanin and Adhya , 2012 ) , we believe that the 89 non-cognate operators around the chromosome play an architectural role in chromosome organization . 
We refer to these unattached operators as `` Chromosome Anchoring Sites '' ( CAS ) . 
Some of the sites may serve as both GRS and CAS . 
The 76 ( 46 % ) GRS and 89 ( 54 % ) CAS are shown in Table S1 . 
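The GRS/CAS distinction amounts to a distance test against the nearest promoters. The sketch below assumes that the −200 to +400 bp window is measured relative to the tsp in the direction of transcription and that each operator is represented by a single coordinate; the function name and data layout are hypothetical.

```python
def classify_operator(operator_pos, tsp_list, upstream=200, downstream=400):
    """Label an operator 'GRS' if it lies within the window of any tsp, else 'CAS'.

    tsp_list: list of (tsp_position, strand) tuples, strand being '+' or '-'.
    Positions are converted to coordinates relative to each tsp, with positive
    values pointing in the direction of transcription.
    """
    for tsp, strand in tsp_list:
        relative = operator_pos - tsp if strand == "+" else tsp - operator_pos
        if -upstream <= relative <= downstream:
            return "GRS"
    return "CAS"


if __name__ == "__main__":
    # Hypothetical promoters, for illustration only.
    promoters = [(1000, "+"), (5000, "-")]
    for position in (900, 1401, 5100, 4500):
        print(position, classify_operator(position, promoters))
    # Proportions quoted in the text: 76 GRS and 89 CAS out of 165 operators.
    print(round(100 * 76 / 165), round(100 * 89 / 165))  # 46 54
```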
The 76 GRS include 9 previously known operators of the gal regulon ( see Figure 1 ) ; the other 67 , which control promoters , were not known previously . 
The discovery of new GRS indicates that GalR , a well-known regulator of D-galactose metabolism , also regulates the expression of other genes . 
Among the new GRS , 3 ( in yaaJ , purR , and ytfQ promoters ) were confirmed by in vivo DNA-binding ( ChIP-chip assays ) as shown in Table 3 . 
The salient features of our findings presented in this paper are shown schematically in Figure 7 . 
Although we identified 158 transcripts whose expression was regulated by GalR , very few of these are associated with a putative GalR operator identified in silico and/or by ChIP-chip assays , strongly suggesting that the majority of regulation by GalR occurs indirectly . 
Based on our earlier observation that GalR mediates mega-loop formation , we propose that long-range oligomerization of GalR indirectly regulates transcription by altering chromosome structure . 
There are at least three possible mechanisms for such regulation : indirect control , enhancer activity , and modulation of DNA superhelicity . 
In the indirect control model , GalR directly regulates another regulator , such as PurR or CytR , and the downstream regulator directly regulates other genes . 
The regulation by GalR is indirect , but occurs by a classical regulatory mechanism . 
In the enhancer activity model , GalR stimulates transcription of some target genes by binding to a distal site and forming an enhancer-loop with a protein bound to the promoter region . 
Examples of enhancer activity have been described before for some prokaryotic and many eukaryotic promoters ( Rombel et al. , 1998 ; Schaffner , 2015 ) . 
In the DNA superhelicity modulation model , GalR creates DNA topological domains by mega-loop formation and defines local chromosomal superhelicity by GalR-GalR interactions between distally bound dimers . 
The strength of a promoter usually depends on the superhelical state of the DNA ( Pruss and Drlica , 1989 ; Lim et al. , 2003 ) . 
We propose that GalR entraps different amounts of superhelicity in different topological domains and thus controls transcription of the constituent promoters . 
In the absence of GalR , such domains are not formed , resulting in a change in local DNA superhelicity and thus a change in the strength of the constituent promoters . 
In this model , GalR protein indirectly regulates gene transcription as an architectural protein . 
We are currently studying the regional superhelicities across the entire chromosome in the presence and absence of GalR , as well as the implications of genes that are affected by GalR but are independent of D-galactose metabolism ( Lal et al. , 2016 ) . 
AUTHOR CONTRIBUTIONS
ZQ : designed genome-wide sequence analysis , interpreted sequence analysis data and tiling array data ; AT and SL : executed tiling array experiments and data analysis ; XH : executed genome-wide sequence analysis ; TD : integrated tiling array and genome-wide sequence data ; AS and JW : executed ChIP-chip and ChIP-qPCR experiments and data analysis ; DL : data analysis ; TS : executed Information Theory and data analysis ; SA : organized and designed experiments , and data analysis . 
All authors contributed to the manuscript preparation . 
ACKNOWLEDGMENTS
This work was supported by the Intramural Research Program of the National Institutes of Health , the National Cancer Institute , and the Center for Cancer Research . 
The authors have no conflict of interest to declare . 
We thank the Wadsworth Center Applied Genomic Technologies Core Facility for assistance with microarrays for ChIP-chip assays . 
SUPPLEMENTARY MATERIAL
The Supplementary Material for this article can be found online at : http://journal.frontiersin.org/article/10.3389/fmolb.2016.00074/full#supplementary-material