The architecture of ArgR-DNA complexes at the
1Department of Biological Sciences , Korea Advanced Institute of Science and Technology , Daejeon 305-701 , Republic of Korea , 2KI for the BioCentury , Korea Advanced Institute of Science and Technology , Daejeon 305-701 , Republic of Korea , 3Department of Chemical and Biochemical Engineering , Dongguk University-Seoul , Seoul 100-715 , Republic of Korea , 4Department of Bioengineering , University of California , San Diego , La Jolla , CA , USA , 5Department of Pediatrics , University of California , San Diego , La Jolla , CA , USA and 6Center for Biosustainability , Technical University of Denmark , Hørsholm , Denmark
Received September 23, 2014; Revised February 12, 2015; Accepted February 13, 2015
ABSTRACT
DNA-binding motifs that are recognized by transcription factors ( TFs ) have been well studied ; however , challenges remain in determining the in vivo architecture of TF-DNA complexes on a genome-scale .
Here , we determined the in vivo architecture of Escherichia coli arginine repressor ( ArgR ) - DNA complexes using high-throughput sequencing of exonuclease-treated chromatin-immunoprecipitated DNA ( ChIP-exo ) .
The ChIP-exo data show a unique peak-pair pattern indicating the 5′ and 3′ ends of each ArgR-binding region.
We identified 62 ArgR-binding loci , which were classified into three groups , comprising single , double and triple peak-pairs .
Each peak-pair has a unique 93 base pair (bp)-long (±2 bp) ArgR-binding sequence containing two ARG boxes (39 bp) and residual sequences.
Moreover , the three ArgR-binding modes defined by the position of the two ARG boxes indicate that DNA bends centered between the pair of ARG boxes facilitate the non-specific contacts between ArgR subunits and the residual sequences .
Additionally , our approach may also reveal other fundamental structural features of TF-DNA interactions that have implications for studying genome-scale transcriptional regulatory networks .
Nucleic Acids Research, 2015, Vol. 43, No. 6 3079–3088 doi: 10.1093/nar/gkv150
INTRODUCTION
Transcription factors (TFs) are ubiquitous regulatory proteins found across all domains of life that determine gene expression by controlling the distribution of RNA polymerase (RNAP) molecules on promoter sites (1).
TFs recognize and bind to specific DNA sequences in response to various environmental conditions and govern transcriptional activation or repression of the genes via promoter-associated RNAP (2).
Therefore, the determination of TF-binding sites (TFBSs) with consensus DNA sequence motifs is critical to understanding the regulatory mechanism and role of TFs in transcription (3).
In bacterial genomes , the TF-binding consensus sequences are generally between 12 and 30 base pairs ( bp ) in length , and are often structured as direct repeats or palindromes spaced with a fixed number of random nucleotides ( 4,5 ) .
Furthermore , the location of the TFBS determines whether the TFs interfere with or support the association of RNAP to a particular promoter .
For example , TFBS in the vicinity of the core promoter elements , the start of the coding region , or the activator-binding site can inhibit transcription by preventing the access of RNAP to those genomic regions ( 3 ) .
Interestingly , TFs often exert regulatory functions such as transcriptional activation and repression even at distal locations by causing topological changes in the structures of the genome such as DNA looping or bending ( 6 -- 8 ) .
Among the bacterial TFs , cAMP receptor protein ( CRP ) and arginine repressor ( ArgR ) are particularly interesting from a DNA structure point of view .
CRP bends the DNA by at least 90° at the site of interaction with DNA, thereby contributing to transcriptional regulation.
The association of the hexameric ArgR complex induces DNA bending with an angle of ∼70–90°, apparently centered at its binding motif (9–11).
Genome-scale studies for mapping of TFBS have been performed using chromatin immunoprecipitation ( IP ) coupled with microarray ( ChIP-chip ) or sequencing ( ChIP-seq ) for various bacterial TFs ( 7,12 -- 18 ) .
These studies , however , have not revealed the broad changes in genome topology and motif recognition mechanism by ArgR in vivo .
Here, we describe the in vivo architecture of how DNA wraps around the hexameric ArgR complex on a genome-scale.
The comprehensive determination of ArgR target genes by analysis of the unique peak-pair pattern of ChIP-exo demonstrates that the sharp DNA bending (70–90°) at the TFBS facilitates the non-specific contacts between ArgR subunits and residual sequences of the TFBS.
This approach provides a foundation to determine direct regulon members and in vivo architecture of TFs and DNA complexes to elucidate a mechanistic understanding of transcriptional regulatory networks .
MATERIALS AND METHODS
Bacterial strains and growth
All strains used are Escherichia coli K-12 MG1655 and its derivatives .
The strain harboring ArgR-8myc was constructed as described previously with the tagging primers AACGGTTTCACAGTCAAAGACCTGTACGAAGCGATTTTAGAGCTGTTCGACCAGGAGCTTGTCGGATCCAGTCTTCGTGAT and GCAGGGGGTTGAGAGGGATAAGCAACATTTTCCCCGCCGTCAGAAACGACGGGGCAGAGAAATTCCGGGGATCCGTCGACC (19).
A glycerol stock of the strain was inoculated into 3 ml of Luria broth supplemented with 150 μg kanamycin and cultured overnight at 37°C with constant agitation.
The cultured cells were inoculated at a 1:100 dilution into 50 ml of fresh M9 medium containing 2 g/l-glucose in either the presence or absence of 1 g/l-arginine and grown at 37°C until reaching an appropriate cell density (OD600 ≈ 0.5).
ChIP-exo
Cultured cells ( 50 ml ) were cross-linked with 1 % formaldehyde at room temperature for 30 min .
2 ml of 2.5 M glycine was added to quench the unused formaldehyde .
After washing three times with 50 ml of ice-cold Tris-buffered saline (TBS), the washed cells were resuspended in 0.5 ml of lysis buffer composed of 50 mM Tris-HCl (pH 7.5), 100 mM NaCl, 1 mM EDTA, 1 μg/ml RNaseA, protease inhibitor cocktail and 1 kU Ready-Lyse lysozyme (Epicentre, Madison, WI, USA), and then incubated at 37°C for 30 min (20).
The cells were then treated with 0.5 ml of 2 × immunoprecipitation ( IP ) buffer ( 100 mM Tris-HCl ( pH 7.5 ) , 100 mM NaCl , 1 mM EDTA , 2 % ( v/v ) Triton X-100 and protease inhibitor cocktail ) , followed by incubation on ice for 30 min .
The lysate was sonicated in an ice bath using Sonic Dismembrator Model 500 ( four times for 20 s each , output level , 2.5 W ) .
Size distribution of the fragmented DNAs was confirmed using agarose gel electrophoresis ( 200 -- 400 bp ) after removing cell debris by centrifugation .
The cross-linked DNA-ArgR complexes in the supernatant were then subjected to IP by adding 10 μl of anti-myc antibody (9E10) (Santa Cruz, Dallas, TX, USA).
For the mock-IP control, 2 μg of normal mouse IgG (Santa Cruz) was added to the supernatant in parallel.
The samples were then incubated overnight at 4°C with constant rotation.
The cross-linked DNA-protein and antibody complexes were selectively captured by adding 50 μl of Dynabeads Pan Mouse IgG magnetic beads (Invitrogen, Grand Island, NY, USA).
Next, DNAs were end-polished using T4 DNA polymerase (NEB, Ipswich, MA, USA), ligated with the annealed adaptor 1 (5′-Phospho-AACTGCCCCGGGTTGCTCTTCCGATCT and 5′-OH-AGATCGGAAGAGC-OH), nick-repaired using phi29 polymerase (NEB), and digested with exonuclease (NEB) as illustrated in Supplementary Figure S1 (21).
Then, protein-DNA complexes were reverse-cross-linked by heating at 65°C overnight and proteins were degraded by 8 μg of protease K (Invitrogen).
The purified DNAs were denatured at 95°C and extended by P1 primer (5′-OH-GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT), and further ligated with the annealed adaptor 2 (5′-OH-ACACTCTTTCCCTACACGACGCTCTTCCGATCT and 5′-OH-AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTAG).
The ligated DNA products were purified using a Qiagen polymerase chain reaction (PCR) purification kit and were PCR-amplified by P2 primer (5′-OH-AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT) and P3 primer (5′-OH-CAAGCAGAAGACGGCATACGAGATNNNNNNGTGACTGGAGTTCAGACGTGT).
The degenerate sequence (the six Ns) in the P3 primer indicates the index sequence for Illumina next-generation sequencing (Illumina, San Diego, CA, USA).
The PCR-amplified DNA products were separated on a 2 % agarose gel and the amplicons were excised from the gel and extracted using QIAquick gel purification columns .
Real-time quantitative PCR
To measure the enrichment of the ArgR-binding DNA in chromatin IP samples , real-time quantitative PCR ( qPCR ) was performed .
1 μl of IP or mock-IP DNA was used with primers specific to the previously identified ArgR-binding region (gltB promoter) and non-binding region (aroH gene) (17).
The primer sequences for gltB were 5′-AAGCTTGCCATTTGACCTGT and 5′-TCCTTTTCGCATCGGTTAAT, and those for aroH were 5′-TCCTCTCGCCAGACAAAAAT and 5′-TCAAACTCGTGCAGCGTATC.
A reaction mixture of 1 μl of IP or mock-IP DNA, 1 μl of 10 μM primers for each region, 15 μl of SYBR mix (Bio-Rad, Hercules, CA, USA) and 13 μl of ddH2O was prepared on ice.
All real-time qPCR reactions were conducted in triplicate.
The samples were cycled at 94°C for 15 s, 54°C for 30 s and 72°C for 30 s (40 cycles in total) in a thermal cycler (Bio-Rad).
The threshold cycle ( Ct ) values were calculated automatically by the iCycler iQ optical system software ( Bio-Rad ) .
Normalized Ct (ΔCt) values for each sample were calculated by subtracting the Ct value obtained for the mock-IP DNA from the Ct value for the IP-DNA (ΔCt = Ct,IP − Ct,mock).
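The normalization described above can be sketched in Python as follows. The Ct values are hypothetical, and the conversion of the normalized Ct to a fold enrichment as 2 to the power of the negated value assumes a perfect doubling per PCR cycle, which the text does not state.

```python
from statistics import mean


def delta_ct(ct_ip, ct_mock):
    """Normalized Ct: mean Ct of the IP-DNA minus mean Ct of the mock-IP DNA."""
    return mean(ct_ip) - mean(ct_mock)


def fold_enrichment(d_ct):
    """Assumption: 100% PCR efficiency, so one cycle equals a 2-fold change."""
    return 2.0 ** (-d_ct)


# Hypothetical triplicate Ct values for a single primer pair.
ct_ip = [22.1, 22.0, 21.9]
ct_mock = [28.0, 28.1, 27.9]

d_ct = delta_ct(ct_ip, ct_mock)
print(f"delta Ct = {d_ct:.2f}, fold enrichment = {fold_enrichment(d_ct):.1f}")
```

A more negative normalized Ct means the target region is detected earlier in the IP sample than in the mock-IP control, that is, it is more strongly enriched.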
Next-generation sequencing
Prior to the high-throughput sequencing , the sequencing libraries for ChIP-exo were cloned into TOPO vector ( Invitrogen ) and several colonies were subjected to Sanger sequencing to confirm the adapter sequences and inserted DNA length of the sequencing library .
Then, the sequencing libraries were quantified using a Qubit® 2.0 fluorometer (Invitrogen) and Experion™ system (Bio-Rad), and sequenced using Illumina MiSeq® V2 (Supplementary Figure S2).
Read mapping and data processing
All sequencing reads from ChIP-exo experiments were mapped to the E. coli MG1655 reference genome (NC_000913) using CLC Genomics Workbench 5 with a length fraction of 0.9 and a similarity of 0.99 (Supplementary Table S1).
To capture target protein binding sites from the ChIP-exo data, the number of mapped read start positions (MRSP) at each genomic position was counted and stored for visual inspection using in-house scripts.
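The in-house scripts are not described, so the following Python sketch only illustrates the counting step under stated assumptions: reads are given as hypothetical `(start, end, strand)` tuples with 1-based inclusive coordinates, and counts are kept separately for each strand.

```python
from collections import Counter


def count_read_starts(reads):
    """Count 5' start positions of mapped reads, separately per strand.

    reads: iterable of (start, end, strand) tuples, 1-based inclusive.
    The 5' end of a '+' read is its start; the 5' end of a '-' read is its end.
    """
    counts = {"+": Counter(), "-": Counter()}
    for start, end, strand in reads:
        five_prime = start if strand == "+" else end
        counts[strand][five_prime] += 1
    return counts


# Hypothetical mapped reads, not taken from the data set.
reads = [(100, 149, "+"), (100, 140, "+"), (150, 193, "-"), (160, 193, "-")]
mrsp = count_read_starts(reads)
print(mrsp["+"][100], mrsp["-"][193])
```

Keeping the two strands apart is what allows a forward-strand peak and a reverse-strand peak to be read as the two borders of a protected region.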
Motif searching
The motif search and sequence logo generation were completed using BioProspector, MEME Suite ver. 4.9.128 and WebLogo 3.
Raw experimental data
All raw data files can be downloaded from Gene Expression Omnibus through accession number GSE60546 .
RESULTS
Immunoprecipitation (IP) of ArgR-DNA complexes
ArgR is a transcription factor involved in arginine biosynthesis and metabolism in E. coli .
The high concentration of cellular arginine enhances ArgR affinity for specific genomic regions and concurrently modulates the transcription of the related genes .
Cellular arginine facilitates the formation of the ArgR hexamer .
Consequently , the presence of arginine is essential for ArgR hexamer to bind its binding sites with high affinity for the transcriptional regulation of its regulon members ( 22 ) .
We used the genome-wide ChIP-exo method on the E. coli K-12 MG1655 strain harboring myc-tagged ArgR protein to probe the ArgR-binding sites at single nucleotide resolution in vivo ( 17,21 ) .
Since ArgR responds to the concentration of exogenous L-arginine , the cells were grown in M9 minimal media either in the presence ( + ARG ) or absence ( − ARG ) of the amino acid .
Prior to the genome-wide ChIP-exo assay , we first examined the enrichment of ArgR proteins on the promoter of gltBDF operon in the IP ArgR-DNA complexes under the experimental conditions ( Figure 1a ) .
A cross-linking experiment was performed at mid-log phase , followed by lysis , DNA shearing , and IP using anti-myc antibody and then purification of DNA fragments .
Quantitative PCR was performed to confirm the enrichment of ArgR-binding regions in the immunoprecipitated DNA ( IP-DNA ) samples by using primers that amplified the previously known ArgR-binding region .
ArgR negatively regulates the gltBDF operon , which encodes one of the two main ammonia assimilation pathways in E. coli ( 23 ) .
As a negative control , we examined the level of ArgR enrichment on the promoter region of aroH , which is involved in the biosynthesis of aromatic amino acids ( 24 ) .
The occupancy level of ArgR at the promoter region of gltBDF operon was ∼ 60-fold higher than aroH under both + ARG and − ARG growth conditions ( Figure 1a ) .
This result is in good agreement with the previous ChIP-chip results (17), demonstrating that ArgR-bound DNA fragments were selectively enriched under the experimental conditions.
Determination of genome-wide ArgR-binding loci using ChIP-exo
The direct analysis of in vivo ArgR-binding across the E. coli genome , previously described using ChIP-chip experiments , revealed a total of 61 unique ArgR-binding regions .
This study demonstrated that integration of the ChIP-chip with transcriptome analysis determines the ArgR regulon along with its transcriptional regulatory network overarching the amino acid metabolism ( 17 ) .
Although a partially conserved 18-bp-long imperfect palindrome sequence was inferred as the consensus ArgR-binding motif from the previous ChIP-chip study , we were unable to elucidate the interaction between ArgR hexamer and the neighboring sequences of the ArgR-binding motif due to the limitation of peak resolution .
Therefore, we employed the ChIP-exo assay (Supplementary Figure S1), which sequentially performs exonuclease trimming, end polishing, blunt-ending and nick repair of the IP-DNA followed by high-throughput sequencing (Figure 1b) (21).
To this end , we modified the ChIP-exo method for the Illumina sequencing platforms .
The high-quality sequencing reads from the +ARG and −ARG samples were uniquely mapped to the E. coli reference genome (NC_000913), separately, resulting in identification of ArgR-binding sites in the genome-wide landscape (Figure 1c).
In the +ARG sample, ArgR-binding occupancy was increased at over 90% of the identified binding loci in comparison to the −ARG sample (Supplementary Figure S3), which is consistent with the previous ChIP-chip result (17).
Overall, the genome-wide ChIP-exo profile exhibits a pattern similar to the ChIP-chip profile; however, we observed a ∼100-fold higher signal-to-noise (S/N) ratio with the ChIP-exo profile.
The ChIP-exo method enabled the identification of the precise location of the ArgR-binding genomic regions , which are represented by the two peaks ( hereafter , referred to as a peak-pair ) , one from the top strand and the other from the bottom strand ( Figure 1b ) .
The additional exonuclease treatment digested the ArgR-bound DNA up to the first nucleotide point of cross-linking between DNA and ArgR in the 5 ′ to 3 ′ direction .
Thus , these peak-pairs allowed us to identify ArgR-binding locations , which are strand-specific for the interaction between DNA and ArgR .
From this data set , a total of 62 unique ArgR-binding locations were identified ( Supplementary Table S2 ) .
The ChIP-exo profiles represented complete coverage of the 15 ArgR-binding regions , which had been characterized by in vitro DNA-binding experiments and in vivo mutational analysis ( 25 ) .
The previous ChIP-chip assays determined a total of 64 ArgR-binding regions , including two divergent promoter regions ( 17 ) .
From the comparative analysis of the ChIP-chip data with the ChIP-exo data, a majority of the regions (90%) were identified by both methods; however, a few exceptions were observed, such as the asnT, yoeI, yqaE, plsC, atpI and phnN promoters (Figure 1d).
These exceptions were attributed to the low occupancy level (∼1.10) measured by ChIP-chip, which was significantly lower than that of other regions (∼2.78) (Supplementary Table S2).
Thus, exonuclease treatment may eliminate contaminating non-specific DNA fragments that are not bound by ArgR, along with the detection of DNA fragments that are only weakly bound by ArgR (21).
Additionally , ChIP-exo profiles exhibited four new ArgR-associations from the upstream regions of proV , mltA , yhcC and ygaW , which encode a subunit of glycinebetaine/proline ABC transporter , one of six methionine tRNAs , predicted Fe-S oxidoreductase and L-alanine exporter , respectively ( Supplementary Table S2 ) .
All newly identified ArgR-binding regions were confirmed by electrophoretic mobility shift assays ( EMSA ) ( Supplementary Figures S4 and S5 ) .
The average distance between peaks at the extremities was 116 bp , which indicates a better peak resolution than ChIP-chip analysis ( Figure 1e ) .
The high resolution of ArgR-binding location led us to infer its mode of regulation .
Because 84% of ArgR-binding peaks were located upstream of a translation start codon and 76% were within ±100 bp of a transcription start site, ArgR appears to regulate most of the genes in its regulon at the transcriptional level (Figure 1f and g).
Taken together , ChIP-exo profiles show low background and enhanced signals , leading to the attainment of bona fide ArgR-binding locations with high resolution .
Analysis of unique ArgR-binding peak-pair pattern
We found that the ArgR-binding signals are often composed of multiple peak-pairs using ChIP-exo analysis .
The presence of such multiple peaks indicates that the interaction between ArgR and the cognate DNA sequence is more complicated than previously thought, when it was assumed to be based upon a simple DNA-binding motif composed of a pair of palindromic sequences (9,11,26).
For quantitative analysis of the ChIP-exo profiles , we determined 5 ′ end positions of mapped reads ( MRSPexo ) at each genomic position .
The MRSPexo provides strand-specific first point of cross-linking site between DNA and the ArgR at top and bottom strands , which may directly provide structural information of the complex .
For instance , we found single , double and triple peak-pairs from the promoter regions of hisJ , aroP and argD , which are responsible for the ATP-dependent histidine transport , active transport of three aromatic amino acids across E. coli inner membrane and amination steps in lysine , ornithine and arginine biosynthesis , respectively ( Figure 1h ) ( 27 -- 29 ) .
We sought to analyze the characteristics of the different multiplicities of ArgR at different binding sites .
First , to analyze genome-wide multiple peak-pair patterns , the MRSPexo signals of individual ArgR-binding regions were visualized as heatmaps using the values ranging from − 150 to +150 bp from the center position .
The heatmaps were categorized into three classes of ArgR-binding regions based on the number of peak-pairs ( Figure 2a , Supplementary Table S3 ) .
From the 63 unique ArgR-binding loci , we identified 21 sites ( ∼ 33 % ) with a single peak-pair .
Significant portions of ArgR-binding loci ( ∼ 67 % ) were composed of double ( 25 sites ) and triple peak-pairs ( 17 sites ) ( Figure 2b , Supplementary Table S3 ) .
MRSPexo at the single peak were enriched between − 150 and +150 bp from the center of forward and reverse single peak-pair ( F1-R1 ) .
Double and triple peak-pairs are composed of F1-R1 and F2-R2 ; and F1-R1 , F2-R2 and F3-R3 , respectively ( Figure 2c ) .
In cases of double and triple peak-pairs , the signals were enriched from the center of F1-R2 and F1-R3 between − 150 and +150 bp , respectively .
Thus , the complex interaction between ArgR and the cognate DNA is a genome-wide pattern .
Next , we calculated the distance between forward and reverse peaks from each peak-pair category .
Surprisingly , the pitch had a uniform distance of 93 bp ( ± 2 ) between symmetrically arranged peaks of the peak-pair ( F1-R1 , F2-R2 and F3-R3 ) , regardless of the number of the peak-pair ( Figure 2d ) .
In addition , the distance between each peak-pair was approximately 20 bp ( Figure 2e ) , suggesting that the ArgR binds to the cognate DNA in similar manner ( i.e. sequence specific binding ) but different conformation according to the number of binding events between ArgR and DNA .
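The pairing of forward and reverse peaks described above can be sketched in Python. This is a minimal illustration, not the authors' procedure: the greedy matching, the function names and the peak coordinates are all assumptions, and only the 93 bp pitch and ±2 bp tolerance come from the text.

```python
def pair_peaks(forward, reverse, pitch=93, tol=2):
    """Greedily pair forward- and reverse-strand peak positions.

    A forward peak f and a reverse peak r form a peak-pair when the
    distance r - f is within `tol` bp of `pitch`.
    """
    unused = sorted(reverse)
    pairs = []
    for f in sorted(forward):
        for r in unused:
            if abs((r - f) - pitch) <= tol:
                pairs.append((f, r))
                unused.remove(r)  # each reverse peak is used at most once
                break
    return pairs


def classify_locus(pairs):
    """Name a locus by its number of peak-pairs."""
    return {1: "single", 2: "double", 3: "triple"}.get(len(pairs), "unclassified")


# Hypothetical peak positions, with neighboring peak-pairs ~20 bp apart.
forward = [1000, 1020, 1041]
reverse = [1093, 1112, 1134]
pairs = pair_peaks(forward, reverse)
print(pairs, classify_locus(pairs))
```

Because the distance between neighboring peak-pairs (about 20 bp) is much larger than the tolerance, a forward peak cannot be matched to the reverse peak of an adjacent pair under these settings.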
We next examined if the number of peak-pairs show direct correlation at the loci with the ArgR-binding occupancy in the ChIP-chip data ( 17 ) .
Indeed , we observed an increase in occupancy between single , double and triple peak-pairs , whose median values were 1.56 , 3.34 and 4.08 , respectively , indicating a positive correlation due to the number of cross-linking sites between ArgR protein and DNA sequence ( Figure 2f ) .
The ChIP-chip or ChIP-seq signal intensities at the ArgR-binding sites serve as a good indicator of the different binding occupancies of ArgR ( 30 ) .
Furthermore , the multiple peak-pairs are a direct consequence of various topological structures of ArgR-DNA complexes .
It was proposed that the association of the hexameric ArgR complex induces a sharp DNA bend of ∼70–90° (9–11), and that the complex covers a region of approximately four helical turns through only one side of the DNA helix (26,31).
Despite in vitro experimental evidence supporting such a steric-hindrance model, our results argue that the bending angle and the region covered by the ArgR complex in vivo are variable.
In vivo organization of the ArgR-DNA complexes
The hexameric ArgR complex binds to a specific DNA motif composed of a pair of imperfect palindromic sequences that are connected by a fixed-length spacer sequence (2 or 3 bp) (26).
To examine whether the multiple peak-pairs are the consequence of the presence of multiple ArgR-binding motifs, we inferred a de novo position-specific weight matrix (PSWM) for ArgR using MEME, which is a bioinformatics tool that identifies overrepresented motifs in multiple unaligned sequences (32).
The DNA motifs were screened from the sequences of the peak-pairs in the three categories.
All peak-pairs contained the 39-bp-long ArgR-binding motif comprising two 18-bp palindromic sequences with three nucleotides as a spacer; however, multiple ArgR-binding motifs were not observed in double and triple peak-pairs (Figure 3a).
Thus, we speculated that the multiple peak-pairs in our ChIP-exo profiles did not originate from the interaction of ArgR subunits with multiple binding motifs.
Instead , we hypothesize that the multiple peak-pairs are the consequence of the single binding motif serving as an anchor for the confined non-specific interaction with neighboring sequences by the ArgR subunits .
This hypothesis is further supported by the fact that the distance between forward and reverse peak ( ∼ 93 bp ) is longer than the 39-bp long ArgR-binding motif .
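MEME discovers motifs de novo from unaligned sequences, which is beyond a short example. The Python sketch below illustrates only what a position-specific weight matrix represents, by building a log-odds matrix from already-aligned sites and scoring candidate sequences. The toy sites, the pseudocount and the uniform background are assumptions and are not taken from the ArgR data.

```python
import math


def build_pswm(sites, pseudocount=1.0, background=0.25):
    """Log2-odds position-specific weight matrix from equal-length aligned sites."""
    length = len(sites[0])
    if any(len(s) != length for s in sites):
        raise ValueError("all sites must have the same length")
    total = len(sites) + 4 * pseudocount
    pswm = []
    for i in range(length):
        column = [s[i] for s in sites]
        pswm.append({
            base: math.log2(((column.count(base) + pseudocount) / total) / background)
            for base in "ACGT"
        })
    return pswm


def score(pswm, seq):
    """Sum of per-position log-odds for a sequence of the motif length."""
    return sum(col[base] for col, base in zip(pswm, seq))


# Hypothetical aligned sites (6 bp for brevity; an ARG box is 18 bp).
sites = ["TGAATA", "TGCATA", "TGAATT", "TGAATA"]
pswm = build_pswm(sites)
print(score(pswm, "TGAATA") > score(pswm, "CCCGGG"))
```

Scanning a peak-pair sequence with such a matrix and finding only one high-scoring window would be one way to express the observation that double and triple peak-pairs do not contain additional motifs.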
To investigate this hypothesis, we determined the location of the ArgR-binding motif (i.e. two ARG boxes connected by a 3-bp spacer) between each paired peak.
A total of 122 individual peak-pairs were identified from the 63 ArgR-binding loci ( Figure 3b , Supplementary Table S4 ) .
Interestingly , these peak-pairs were classified into three groups based upon the location of the two ARG boxes in the DNA sequence between forward and reverse peak ( i.e. left , middle and right position ) .
In the first group (34 peak-pairs), the two ARG boxes were located at 6.7 bp on average from the left end of the DNA sequence.
In the second (47 peak-pairs) and third groups (41 peak-pairs), the two ARG boxes were located at 26.9 and 47.3 bp from the left end, respectively.
The respective distances between the left ends of consecutive groups were 20.2 and 20.4 bp.
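The arithmetic behind these three groups can be checked with a short Python sketch. The mean offsets, the 93 bp span and the 39 bp motif length come from the text; the nearest-mean assignment rule and the function names are assumptions made for illustration.

```python
SPAN_BP = 93    # nominal length of the sequence between paired peaks
MOTIF_BP = 39   # two ARG boxes plus the spacer

# Mean motif offsets (bp from the left end) reported for the three groups.
GROUP_OFFSETS = {"left": 6.7, "middle": 26.9, "right": 47.3}


def classify_offset(offset):
    """Assign a peak-pair to the group whose mean offset is nearest."""
    return min(GROUP_OFFSETS, key=lambda g: abs(GROUP_OFFSETS[g] - offset))


def residual_lengths(offset):
    """Residual (non-motif) DNA to the left and to the right of the motif."""
    return offset, SPAN_BP - MOTIF_BP - offset


for group, offset in GROUP_OFFSETS.items():
    left, right = residual_lengths(offset)
    print(f"{group}: left residual {left:.1f} bp, right residual {right:.1f} bp")
```

With these nominal values the residual DNA totals 54 bp per peak-pair, and the left and right groups are close to mirror images of each other (6.7 and 47.3 bp on either side of the motif), while the middle group splits the residual DNA almost evenly.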
These unique peak-pair patterns suggest that the crosslinking positions detected from ChIP-exo are correlated with the interaction between a multimeric ArgR complex and its binding region .
It is known that two monomeric ArgR subunits bind one ARG box .
Thus , two ARG boxes of 39-bp in length are occupied by four monomeric ArgR subunits through interaction with only one side of the DNA helix that is equivalent to a region of about four helical turns ( 31 ) .
Note that a hexameric ArgR complex, which is the functionally active form for regulating the target genes, is composed of two trimeric ArgR complexes depending on the allosteric effect of arginine (33,34).
However, our data show a difference between the length of the ArgR-binding region determined by in vitro experiments (∼39 bp) and the length of the region protected in the in vivo ChIP-exo experiment (∼93 bp).
Thus , we propose three ArgR-binding modes based upon the participation of the remaining two monomeric ArgR subunits in the interaction with the residual DNA region ( Figure 3c ) .
For the modes in which the two ARG boxes lie at the left or right position, four monomeric ArgR subunits at the extreme left or right bind to the two ARG boxes, and the remaining two monomeric ArgR subunits interact non-specifically with the residual DNA (Figure 3c).
The interaction between two ARG boxes and four monomeric ArgR subunits, which bends the DNA by an angle of ∼70–90° (9–11), may permit the contact of two monomeric ArgR subunits with the residual DNA.
For the mode in which the ARG boxes lie at the middle position, four monomeric ArgR subunits at the center position hold the ArgR-binding motif by bending DNA.
Each ArgR subunit at the extreme left and right positions interacts with the residual DNA sequences non-specifically (Figure 3c), which does not require an additional binding motif or a sequence of identical length to the ARG box.
Furthermore , the N-terminal domain of ArgR carries a basic charge that interacts with the negatively charged DNA ( 35 ) .
To test this hypothesis , we screened the additional motif or a single ARG box from the DNA sequences of nonspecific contact region using the MEME tool .
No significant DNA motifs were found in the residual sequences of any of the three modes.
For example , the upstream region of hisJQMP operon containing ARG boxes participates in binding and stabilizing ArgR interaction ( 36 ) .
This site is positioned ∼90 bp away from the ARG boxes (37).
Thus , the binding of four monomeric ArgR subunits to ARG boxes facilitates DNA-bending that mediates non-specific contacts between ArgR subunits and the ArgR-binding region .
Next , we elucidated the structural difference between single , double and triple peak-pairs .
The previous gel-retardation experiments suggested that one ArgR hexamer binds to the two palindromic ARG boxes (31).
Consistent with this, our data imply that each ArgR-binding region can be bound in one of the three modes (Figure 3d).
Thus, the number of peak-pairs can be determined by the binding accessibility of ArgR to the ARG boxes, which in turn regulates the bending angle (∼70–90°).
For example , the higher ArgR-binding accessibility can induce the lower bending angle , resulting in a greater chance of non-specific contact for generating the multiple peak-pairs .
These diverse binding patterns agree well with the fact that the imperfect ArgR consensus sequences are important for increasing the range of the arginine concentration in vivo to regulate genes in a large regulon ( 38 ) .
Interaction between ArgR and RNA polymerase
In general , the ArgR represses transcription by steric exclusion of RNAP from the promoter regions ( 26,29,39 ) .
To determine this interaction , we compared the ArgR-binding sites with the − 10 and − 35 promoter elements occupied by RNAP .
We classified the interactions between ArgR and RNAP into three unique modes based on their binding locations .
For instance , ArgR binds to the promoter region of the hisJQMP operon , which is occupied by RNAP for transcriptional initiation ( 36 ) .
34 genes showed overlap of binding location of ArgR with RNAP , henceforth referred to as the overlapped mode ( O ) ( Figure 4a ) .
In the genes of aroP and yaaU , which encode an aromatic amino acid permease and an uncharacterized member of the major facilitator superfamily ( MFS ) of transporters , the ArgR-binding loci were determined at the upstream ( U ) and downstream ( D ) sites from RNAP-binding region , respectively ( Figure 4b and c ) .
We determined 11 such genes as having the upstream and downstream modes , respectively ( Figure 4d ) .
The relative binding locations of ArgR to the TSS positions ( upstream , downstream and overlapped ) were not directly correlated with the number of peak-pairs and transcriptional activity ( 17 ) ( Figure 4d ) .
Altogether , the binding of ArgR does not simply exclude the RNAP for the transcriptional repression , but instead the transcriptional regulation by ArgR is likely mediated by the combinatorial effect of DNA-bending at the ARG boxes , the ArgR-binding positions , the interaction with other TFs , and the number of peak-pairs ( 23,37 ) .
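The three modes described in this section (overlapped, upstream and downstream) amount to an interval comparison, sketched below in Python. The function name and coordinates are hypothetical, positions are assumed to increase in the direction of transcription, and the promoter interval is assumed to span only the −35 and −10 elements, which may be narrower than the RNAP-binding region used in the study.

```python
def classify_binding_mode(argr, promoter):
    """Classify an ArgR-binding region relative to a promoter interval.

    Both arguments are inclusive (start, end) tuples expressed in
    coordinates that increase in the direction of transcription.
    """
    a_start, a_end = argr
    p_start, p_end = promoter
    if a_end < p_start:
        return "U"  # ArgR region lies entirely upstream
    if a_start > p_end:
        return "D"  # ArgR region lies entirely downstream
    return "O"      # any overlap with the promoter interval


promoter = (-35, -10)  # hypothetical span of the -35 and -10 elements
print(classify_binding_mode((-60, 33), promoter))
```

Treating any overlap as the overlapped mode is a design choice of this sketch; a stricter rule, such as requiring a minimum number of shared base pairs, would shift borderline loci into the upstream or downstream modes.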
DISCUSSION
In conclusion , we describe in vivo DNA-wrapping modes around the hexameric ArgR complex induced by DNA-bending at the ARG boxes and non-specific contacts on a genome-wide scale .
ArgR is a hexameric transcriptional regulator, which controls the transcription of genes involved in arginine biosynthesis, utilization and transport, as well as histidine transport (17,36).
In the presence of L-arginine , the hexameric ArgR complex binds to specific DNA sequences called ARG boxes , which consist of a pair of imperfect palindromic sequences .
The two palindromes are connected by a fixed-length spacer sequence ( 2 or 3 bp ) , resulting in the ArgR-binding site totaling 39 bp in length ( 26 ) .
It has been proposed that the association of the hexameric ArgR complex with two ARG boxes bends DNA by an angle of ∼70–90°, apparently centered between the pair of palindromes (9–11).
Additionally , it was postulated that the hexameric ArgR complex covers a region of about four helical turns through only one side of the DNA helix ( 26,31 ) .
Despite in vitro experimental evidence supporting such a steric-hindrance model , the mode of interaction of hexameric ArgR-DNA complex in vivo is unclear .
Our ChIP-exo data indicated comprehensive ArgR-DNA interactions at high-resolution with successful removal of false positives , resulting in a clearer snapshot of in vivo ArgR-binding events than in a previous study ( 17 ) .
The ArgR-binding data showing the unique DNA sequences ( 93 ± 2 bp ) defined by peak-pairs were classified into three modes comprising multiple peak-pairs ( 93 bp-long for each peak-pair and 20-bp-long interval between peak-pairs ) .
Moreover , we discovered that 67 % of ArgR-binding regions contain multiple peak-pairs where one broad peak was shown in the previous ArgR ChIP-chip data ( 17 ) .
Furthermore , the peak-pairs were grouped into three modes defined by the location of the two ARG boxes ( left , middle , right ) .
The sharp DNA bending (70–90°) can be induced by specific interaction between four monomeric ArgR subunits and two ARG boxes.
Subsequently , the interaction facilitates non-specific contacts between residual monomeric ArgR subunits and DNA sequences .
These findings, along with the results for RNAP-binding loci, suggest that the transcriptional regulation by the hexameric ArgR complex is likely mediated by the combinatorial effect of DNA-bending at the ARG boxes, the ArgR-binding positions, the interaction with other TFs and the non-specific contacts between ArgR and neighboring sequences.
ChIP-exo data significantly contributed to elucidating protein-DNA binding mechanisms at the genome-scale through the recognition of accurate protein-binding sites .
In the future, this technology will provide fundamental information on various transcription factors for understanding the bacterial transcriptional regulatory network.
Supplementary Data are available at NAR Online.
FUNDING
Intelligent Synthetic Biology Center of Global Frontier Project [2011-0031957 to B.-K.C.]; Basic Science Research Program [NRF-2013R1A1A3010819 to S.C.] through the National Research Foundation of Korea (NRF) funded by the Ministry of Science, ICT and Future Planning.
Funding for open access charge : Intelligent Synthetic Biology Center [ 2011-0031957 to B.-K.C. ] .
Conflict of interest statement .
None declared .