Identification of b-catenin binding regions in colon
ABSTRACT
Nucleic Acids Research , 2010 , Vol . 38 , No. 17 , 5735 -- 5745 , doi : 10.1093/nar/gkq363
Deregulation of the Wnt/b-catenin signaling pathway is a hallmark of colon cancer .
Mutations in the adenomatous polyposis coli ( APC ) gene occur in the vast majority of colorectal cancers and are an initiating event in cellular transformation .
Cells harboring mutant APC contain elevated levels of the b-catenin transcription coactivator in the nucleus , which leads to abnormal expression of genes controlled by b-catenin/T-cell factor 4 ( TCF4 ) complexes .
Here , we use chromatin immunoprecipitation coupled with massively parallel sequencing ( ChIP-Seq ) to identify b-catenin binding regions in HCT116 human colon cancer cells .
We localized 2168 b-catenin enriched regions using a concordance approach for integrating the output from multiple peak alignment algorithms .
Motif discovery algorithms found a core TCF4 motif ( T/A -- T/A -- C -- A -- A -- A -- G ) , an extended TCF4 motif ( A/T/G -- C/G -- T/A -- T/A -- C -- A -- A -- A -- G ) and an AP-1 motif ( T -- G -- A -- C/T -- T -- C -- A ) to be significantly represented in b-catenin enriched regions .
Furthermore , 417 regions contained both TCF4 and AP-1 motifs .
Genes associated with TCF4 and AP-1 motifs bound b-catenin , TCF4 and c-Jun in vivo and were activated by Wnt signaling and serum growth factors .
Our work provides evidence that Wnt / b-catenin and mitogen signaling pathways intersect directly to regulate a defined set of target genes .
Oregon Clinical and Translational Research Institute , Oregon Health and Science University , Portland , OR ; The Department of Biochemistry and Molecular Biology , The Pennsylvania State University College of Medicine , Hershey , PA 17033 ; Department of Medical Informatics and Clinical Epidemiology , Oregon Health and Science University , Portland , OR ; Knight Cancer Institute , Division of Biostatistics in the Department of Public Health and Preventative Medicine , Oregon Health and Science University , Portland , OR ; Program in Cellular and Molecular Biology and The Pennsylvania State Hershey Cancer Institute , The Pennsylvania State University College of Medicine , Hershey , PA 17033 , USA
INTRODUCTION
The Wnt/b-catenin signaling pathway is required for homeostasis in the gastrointestinal ( GI ) tract ( 1 ) .
The GI tract is lined with small invaginations , or crypts , which are composed of discrete zones of proliferating and differentiated cells ( 1 ) .
Wnt signaling maintains the proliferative compartment of the crypt .
The b-catenin transcriptional coactivator controls downstream genetic programs elicited by Wnt signaling and its cellular levels are tightly regulated ( 2,3 ) .
When cells are not exposed to Wnt , cytosolic b-catenin associates with a multi-protein complex that contains the adenomatous polyposis coli ( APC ) protein .
APC functions as a scaffold to coordinate b-catenin phosphorylation and degradation by the proteasome .
Under these conditions , Wnt/b-catenin target genes are silenced by corepressor complexes that are tethered to Wnt responsive DNA enhancers ( WREs ) through interactions with the T-cell factor ( TCF ) family of transcription factors ( 4 ) .
TCF4 is a predominant TCF family member in colon cancer cells ( 5 ) .
When cells are exposed to Wnt , b-catenin escapes proteasomal degradation , and is chaperoned to the nucleus by APC .
There , it occupies TCF4 bound WREs and displaces the corepressors .
b-catenin then recruits chromatin-modifying complexes and Wnt/b-catenin target genes are expressed ( 4 ) .
Deregulation of the Wnt/b-catenin pathway is associated with colon carcinogenesis ( 2,3 ) .
In virtually all cases of colon cancer , mutations target components of the Wnt signaling pathway ( 6 ) .
The most common lesions localize to APC and lead to production of a truncated APC protein that can no longer effectively coordinate b-catenin degradation .
This mutation occurs at the earliest stages of carcinogenesis when normal colonic epithelial cells are transformed into aberrant crypt foci ( 6 ) .
Inherited APC mutations give rise to familial adenomatous polyposis , a disease where afflicted individuals are burdened by thousands of intestinal polyps early in adulthood ( 7,8 ) .
In the rare cancer cases where APC is wild-type , mutations instead are found in CTNNB1 ( 3,9,10 ) .
CTNNB1 is the gene that encodes b-catenin , and the cancer causing lesions map to positions near the 5' portion of the gene .
These mutations give rise to a b-catenin pool that is resistant to proteasomal degradation ( 11 ) .
In each instance , mutations that target APC or CTNNB1 lead to high levels of b-catenin in the nucleus and abnormal expression of genes regulated by b-catenin / TCF4 complexes ( 5,9 ) .
Therefore identifying target genes directly controlled by b-catenin/TCF4 is required to understand the pathogenesis of this disease .
To identify direct b-catenin/TCF4 target genes it is first necessary to map binding sites for these factors across the genome .
Previously , we used an unbiased and genome-wide screen termed serial analysis of chromatin occupancy ( SACO ) to localize 412 high confidence b-catenin binding sites in the human colorectal cancer cell line , HCT116 ( 12 ) .
Approximately half of the binding sites were near ( < 2.5 kb ) or within protein-coding gene boundaries .
These b-catenin binding sites were located in 5' promoter regions , intragenic regions and 3' untranslated regions .
b-Catenin binding to 3' positions relative to E2F4 and MYC genes identified functional WREs .
For E2F4 , the downstream enhancer drove expression of an antisense and non-coding transcript that decreased E2F4 protein levels ( 13 ) .
For MYC , b-catenin occupancy of the downstream enhancer initiated a chromatin loop that integrated the 5' WRE to coordinate MYC expression in response to Wnt/b-catenin and mitogen signaling pathways ( 14,15 ) .
Recently , Hatzis et al. ( 16 ) used chromatin immunoprecipitation ( ChIP ) coupled with microarrays ( ChIP-chip ) to localize 6868 TCF4 binding sites in LS174T colorectal cancer cells .
As was the case for b-catenin , TCF4 binding was also found throughout protein coding gene boundaries .
Together these studies indicate that Wnt activation of target gene expression is mechanistically more intricate than the simple model involving recruitment of b-catenin to TCF4-bound 5' promoter regions .
While SACO was a pioneering technique used to identify transcription factor binding sites and was a viable alternative to the ChIP-chip approach , it did , like most methodologies , suffer from some limitations .
First , SACO libraries were constructed in plasmid vectors and then sequenced using high-throughput and Sanger-based sequencing .
This was both laborious and costly .
In addition , large DNA fragments were included in construction of the earliest SACO libraries .
While most DNA was in the 500 -- 700 bp range , fragments as large as 2.5 kb were included in the b-catenin SACO library .
This impinged upon the resolution of the technique and hindered de novo motif discovery within b-catenin bound loci .
A successor to SACO is the recently described ChIP coupled with massively parallel sequencing ( ChIP-Seq ) approach ( 17 -- 19 ) .
In this technique , DNA purified from immunoprecipitated chromatin is size-selected and then sequenced using one of the next-generation sequencing platforms such as the Illumina genome analyzer ( 18,19 ) .
The robustness , cost , resolution and relative ease of library construction have made ChIP-Seq , rather than SACO , a current method of choice for genome-wide localization of transcription factor binding sites .
ChIP-Seq has been used to map numerous histone modifications ( 20 ) and binding sites for several transcription factors including , but not limited to , NRSF/REST , GATA1 , SRF , E2F4 , E2F6 and STAT1 ( 17 ) .
In addition , ChIP-Seq has been used recently to identify TCF4 enriched binding regions in human colon cancer cell lines ( 21,22 ) .
Tuupanen et al. ( 22 ) identified 10 TCF4-site containing regions in LoVo cells using a combination of ChIP-Seq and the enhancer element locator analysis .
Blahnik et al. ( 21 ) used ChIP-Seq to identify 21 102 TCF4 binding sites in HCT116 cells .
In this report we used ChIP-Seq to identify b-catenin binding regions in HCT116 human colon cancer cells .
We chose this high-resolution and genome-wide approach because we were interested in using de novo motif analysis to identify transcription factors that putatively cooperate with b-catenin/TCF4 .
Many algorithms exist to map the enriched genomic regions identified in a ChIP-Seq experiment ( 23 ) .
Because each approach varies in computational strategy and can produce dramatically different numbers of enriched regions for a given false discovery rate ( FDR ) cutoff , there is some debate as to which is the preferred algorithm ( 18 ) .
In this report , we used CisGenome ( 24 ) , SISSRs ( 25 ) and WTD ( 26 ) to initially identify enriched regions from the b-catenin ChIP-Seq library .
Based on the intersection of regions found in common with each algorithm , we identified 2168 b-catenin enriched regions .
Consistent with our previous report ( 12 ) , we found over-representation of the core and evolutionarily conserved TCF4 consensus motifs within enriched regions .
In addition , we found that consensus AP-1 motifs were also over-represented in a large subset of the enriched regions with the majority of these motifs co-occurring with a TCF4 motif .
Finally , we show that serum mitogens and Wnt signaling agonists cooperatively activate expression of some target genes in proximity to b-catenin bound loci that contain AP-1 and TCF4 motifs .
These findings indicate that a discrete subset of b-catenin target genes are activated by mitogen and Wnt signaling in colon cancer cells , and that this regulation likely occurs through consensus AP-1 and TCF4 sites , respectively .
MATERIALS AND METHODS
Cell culture
HCT116 human colorectal cancer cells ( ATCC number CCL-247 ) were cultured as previously described ( 14 ) .
ChIP
Antibodies used in ChIP assays included : 3 µg of anti-b-catenin ( BD transduction , 610154 ) , 3 µg of anti-TCF ( Millipore , 05-511 ) , 2 µl of anti-c-Jun ( Millipore , 06-225 ) and 6 µg rabbit anti mouse IgG ( Jackson Immunoresearch , 315-005-003 ) .
b-Catenin ChIP DNA for the ChIP-Seq library was prepared using the Chromatin Immunoprecipitation Assay Kit ( Millipore , 17-295 ) according to the instructions .
To assess b-catenin , TCF4 and c-Jun binding to ChIP-Seq peak loci , ChIP assays contained 5 -- 10 × 10^6 cells and were conducted as previously reported ( 13 ) .
Chromatin in formaldehyde fixed cell lysates was sonicated to an average size of 500 -- 700 bp using a Misonix Ultrasonic XL-2000 Liquid Processor ( 5 × 20 s , output wattage 7 , with 45 s rest intervals on ice between pulses ) .
Real time PCR was used to detect isolated ChIP fragments and samples contained 10 µl of 2× iQ SYBR Green Supermix ( Bio-rad , 170-882 ) , 0.25 µM of each primer and 3 µl of purified ChIP DNA .
Reactions were processed for one cycle at 94 °C for 3 min , then 45 cycles at 94 °C for 10 s and at 68 °C for 40 s using a MyIQ Single Color Real-Time PCR machine ( Bio-rad ) .
Primers were designed , using Primer3 software , to a 600 bp DNA segment that was centered on the b-catenin ChIP-Seq coverage region .
Primers used in this study are listed in Supplementary Table S7 .
Real time PCR data are presented as fold levels over control .
The control is a distal region that is 5 kb upstream from the MYC transcript start site that does not bind significant levels of b-catenin , TCF4 or c-Jun ( 14 ) .
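The fold-over-control representation can be illustrated with a short calculation. The sketch below assumes the common 2^-dCt convention with perfect (two-fold per cycle) amplification efficiency; the text does not state the exact formula that was used, and the function name and Ct values are hypothetical.

```python
def fold_over_control(ct_target: float, ct_control: float) -> float:
    """Fold level of a target region relative to a control region.

    Assumes 100% PCR efficiency, so one cycle of difference corresponds to a
    two-fold difference in template (an assumption, not a formula taken from
    the text).
    """
    return 2.0 ** (ct_control - ct_target)


# Hypothetical Ct values: the target crosses threshold 3 cycles before the control.
example_fold = fold_over_control(ct_target=24.0, ct_control=27.0)
print(example_fold)
```

Under this convention a region that amplifies at the same cycle as the control has a fold level of 1, which is why a non-binding distal region makes a convenient baseline.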
Construction of the ChIP-Seq library
b-Catenin precipitated and purified ChIP DNA ( 350 ng ) was processed using the ChIP-Seq DNA Sample Preparation Kit ( Illumina , 1003473 ) according to instructions provided by the manufacturer .
Prior to sequencing , DNAs were re-quantified using a NanoDrop 1000 Spectrophotometer and the quality of DNA was assessed using a Bioanalyzer DNA 1000 ( Agilent ) .
Samples were diluted to 10 nM and 54-nt reads were obtained from one lane of sequencing on an Illumina GA II sequencer .
The High Throughput Sequencing Facility at the University of Oregon ( http://htseq.uoregon.edu ) sequenced the library .
Raw sequence data was submitted to the sequence read archive ( SRA ) under accession number SRA012054 .
Realignment of ChIP-Seq reads
A total of 9 322 654 reads were sequenced and , of these , 8 456 287 passed the quality filter as assessed using ELAND software ( Illumina ) .
We then used ELAND to align the 8 456 287 reads to the repeat masked NCBI 36/hg18 build of the human genome .
We assigned unique positions for 6 576 033 reads allowing up to two mismatches in the first 32 bases of the read sequence .
This set of reads was retained for downstream computational analysis .
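The read counts in this section can be cross-checked with simple arithmetic. The sketch below only uses the three counts reported in the text; the variable and function names are illustrative. It makes explicit that the fraction of uniquely aligned reads depends on whether the denominator is all sequenced reads or only the reads that passed the quality filter.

```python
TOTAL_READS = 9_322_654        # reads sequenced
PASSED_FILTER = 8_456_287      # reads passing the quality filter
UNIQUELY_ALIGNED = 6_576_033   # reads assigned a unique genomic position


def percent(part: int, whole: int) -> float:
    """Percentage of `whole` represented by `part`, rounded to one decimal."""
    return round(100.0 * part / whole, 1)


pct_passed = percent(PASSED_FILTER, TOTAL_READS)
pct_unique_of_passed = percent(UNIQUELY_ALIGNED, PASSED_FILTER)
pct_unique_of_total = percent(UNIQUELY_ALIGNED, TOTAL_READS)

print(pct_passed, pct_unique_of_passed, pct_unique_of_total)
```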
Identification of b-catenin enriched regions
We utilized three peak calling programs to define a set of putative binding regions : CisGenome ( 24 ) , SISSRs ( 25 ) and WTD ( 26 ) .
Each method implements variations on a sliding window approach to identify regions of higher read depth , referred to as peaks , relative to a background distribution .
Based on the particular algorithm , background distributions are derived using reads from a negative control experiment , through Monte Carlo procedures , or through statistical models ( 23 ) .
CisGenome and SISSRs rely on statistical model fitting while the WTD method uses a randomization approach .
SISSRs was run using a window size of 20 bases with the FDR set at 0.01 .
WTD was run with the window size estimated from the binding characteristics .
Any local tag anomalies were removed and an FDR cutoff of 0.01 was assessed using 10 randomization procedures .
CisGenome was run using a window size of 100 and a read cutoff of 7 reads .
We then identified the midpoints of each of the regions and extended 299 bp upstream and 300 bp downstream so that there was a total of 600 bp identifying each putative binding region .
We chose 600 bases because it was twice what we considered to be the largest size of DNA fragments submitted for sequencing ( Figure 1B ) .
Using these criteria , CisGenome , SISSRs and WTD called 100 372 , 80 733 and 2940 peak regions , respectively .
Peaks called in common by the three algorithms yielded 2168 putative b-catenin binding regions and this set was used for further computational analysis ( see Supplementary Table S1 for a summary of peak overlaps ) .
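The concordance step can be sketched in a few lines. The example below is a minimal illustration, not the pipeline used in the study: it assumes that "called in common" means a 600-bp window from one caller overlaps at least one window from each of the other two, and all coordinates are hypothetical toy values.

```python
from typing import List, Tuple

Region = Tuple[str, int, int]  # (chromosome, start, end), inclusive coordinates


def fixed_window(region: Region, upstream: int = 299, downstream: int = 300) -> Region:
    """Re-centre a called peak on its midpoint and return a 600-bp window."""
    chrom, start, end = region
    mid = (start + end) // 2
    return (chrom, mid - upstream, mid + downstream)


def overlaps(a: Region, b: Region) -> bool:
    """True when two regions share a chromosome and at least one base."""
    return a[0] == b[0] and a[1] <= b[2] and b[1] <= a[2]


def concordant(reference: List[Region], *others: List[Region]) -> List[Region]:
    """Keep reference windows that overlap a window from every other caller."""
    return [r for r in reference
            if all(any(overlaps(r, o) for o in other) for other in others)]


# Toy peak calls standing in for the output of three peak callers.
wtd = [fixed_window(r) for r in [("chr8", 1000, 1200), ("chr5", 500, 700)]]
cisgenome = [fixed_window(r) for r in [("chr8", 1100, 1300), ("chr1", 5000, 5100)]]
sissrs = [fixed_window(r) for r in [("chr8", 900, 1050)]]

shared = concordant(wtd, cisgenome, sissrs)
print(shared)
```

Using the caller with the fewest peaks as the reference keeps the pairwise comparison small; for genome-scale inputs an interval tree or sorted sweep would replace the nested loops.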
De novo motif analysis
The genomic sequence ( 600 bp ) encompassing each enriched region was isolated and the regions were separated into two sets based upon whether they had at least one instance of a canonical ( T/A -- T/A -- C -- A -- A -- A -- G ) or evolutionarily conserved ( A -- C/G -- T/A -- T -- C -- A -- A -- A -- G ) TCF4 motif within the boundaries of the region ( 16 ) .
The reverse complements of these sequences were also included .
The sequence from each region was repeat masked and used as input into the Gibbs sampler motif finding program provided by CisGenome .
The motif finder was run searching for motifs of 7 , 11 and 15 bp using 5000 MCMC iterations and a score was produced for each motif .
Control sequences were picked based on the strategy of Ji et al. ( 27 ) .
Briefly , sequences were chosen to match the underlying characteristics of the enriched regions .
Each control sequence was picked randomly such that it was of the same size and was in the same position relative to the nearest RefSeq transcript ( 28 ) as a given enriched region .
Five sets of control sequences were chosen in this manner for both sets of enriched regions .
Motifs found from the de novo search were mapped back to both the b-catenin enriched regions and the control sequences using the motif mapping tool from CisGenome .
The number of matches was based on a likelihood ratio cutoff of 500 and a background model consisting of a third-order Markov chain , in accordance with Ji et al. ( 27 ) .
The relative enrichment was computed as described ( 27 ) .
Motifs that had a relative enrichment score > 2 were determined to be over-represented in the b-catenin enriched regions .
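The initial partition of regions by TCF4 consensus can be illustrated with a simple pattern search on both strands. This is a sketch under stated assumptions: it treats the two consensus strings given above as exact degenerate patterns and uses regular expressions, which is not necessarily how the original screen was implemented; the function names and test sequences are hypothetical.

```python
import re

# Degenerate consensus strings from the text, written as regular expressions.
CORE_TCF4 = "[TA][TA]CAAAG"        # T/A-T/A-C-A-A-A-G
CONSERVED_TCF4 = "A[CG][TA]TCAAAG"  # A-C/G-T/A-T-C-A-A-A-G

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def has_motif(seq: str, pattern: str) -> bool:
    """True if the pattern occurs on either strand of the sequence."""
    seq = seq.upper()
    return bool(re.search(pattern, seq) or re.search(pattern, reverse_complement(seq)))


print(has_motif("GGGATCAAAGGG", CORE_TCF4))       # forward-strand match
print(has_motif("CCCTTTGATCC", CORE_TCF4))        # match on the reverse strand
print(has_motif("GGACTTCAAAGCC", CONSERVED_TCF4))  # longer conserved variant
```

Every match to the conserved pattern also matches the core pattern, so the conserved set is a subset of the core set.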
Chromatin conformation capture
Chromatin conformation capture ( 3C ) assays were conducted as described ( 15 ) with minor modifications .
Formaldehyde cross linked chromatin was digested overnight with 40 µl ( 800 U ) of XbaI ( New England Biolabs ) .
XbaI was then heat-inactivated at 65 °C for 20 min prior to ligation reactions .
After proteinase K treatment , the samples were extracted in phenol/chloroform three times , followed by three back extractions with chloroform .
The chromatin loop at CXXC5 was detected by PCR using primers C51 , GTACGTAGTCGTTTTAGCC and C56 , GCACCCAGCCTCTCAAACCC and the conditions previously described ( 15 ) .
To control for loading , parallel samples were amplified by PCR with the tubulin specific primers GGGGCTGGGTAAATGGCAAA and TGGCACTGGCTCTGGGTTCG .
Products were analyzed on a 1 % agarose gel by electrophoresis , purified and sequenced .
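As a consistency check on the primer sequences, the sketch below compares C51 and C56 with the two ends of the 3C product sequence shown in Figure 3D. Only the first and last stretches of that sequence are copied here; the variable names are illustrative, and the check says nothing about primer specificity in the genome.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


C51 = "GTACGTAGTCGTTTTAGCC"    # forward primer
C56 = "GCACCCAGCCTCTCAAACCC"   # reverse primer

# Ends of the 3C product as printed in Figure 3D.
PRODUCT_START = "GTACGTAGTCGTTTTAGCCCCGGGACTCAAGAG"
PRODUCT_END = "CACAGAACGGTGGGGGTTTGAGAGGCTGGGTGC"
XBAI_SITE = "TCTAGA"

forward_matches = PRODUCT_START.startswith(C51)
reverse_matches = PRODUCT_END.endswith(reverse_complement(C56))
site_is_palindromic = reverse_complement(XBAI_SITE) == XBAI_SITE

print(forward_matches, reverse_matches, site_is_palindromic)
```

The reverse primer anneals to the opposite strand, so it is its reverse complement that should appear at the 3' end of the printed product.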
Serum and LiCl stimulation of HCT116 cells
HCT116 cells were synchronized in the cell cycle as previously described ( 14 ) .
For ChIP experiments , G0/G1 cells grown in a 10-cm tissue culture dish were stimulated with medium containing 10 % fetal bovine serum for 1 or 2 h prior to formaldehyde fixation .
For expression analysis , G0/G1 cells were grown in a 6-well plate prior to stimulation with medium containing serum with and without 10 mM LiCl for 1 or 4 h as indicated .
Reverse transcription/real time PCR
RNA was isolated using TRIzol reagent ( Invitrogen , 15596-018 ) according to the instructions .
cDNA was synthesized using 500 ng of total RNA and the iScript cDNA Synthesis Kit ( Bio-rad , 170-8890 ) according to the instructions .
cDNA was diluted to 1 : 150 before quantification by real-time PCR .
Real-time PCR was conducted as outlined under the ChIP section , except 3 µl of diluted cDNA was used as the template .
Primers were designed using Primer3 software and their sequences are included in Supplementary Table S7 .
RESULTS
Construction of the b-catenin ChIP-Seq library
We were interested in using ChIP coupled with massively parallel sequencing ( ChIP-Seq ) to identify b-catenin binding regions in HCT116 human colon cancer cells .
Prior to constructing the library , we tested the efficacy of our ChIP protocol to identify bona fide b-catenin targets in this cell line .
b-catenin strongly associated with a WRE located 1.4 kb downstream of the transcription stop site of the c-Myc gene ( MYC ) as we have reported previously ( Figure 1A ) ( 14,15 ) .
Furthermore , insignificant levels of b-catenin were detected at a control element located 5 kb upstream of the MYC transcription start site that did not associate with either b-catenin or TCF4 ( Figure 1A ) ( 14 ) .
Size-selected b-catenin ChIP DNA was then processed according to the Illumina sample preparation protocol and minimally amplified by PCR .
Most amplified fragments were in the range of 175 -- 225 bp and were produced in samples containing b-catenin ChIP DNA whereas these products were absent in the control sample ( Figure 1B ) .
A total of 9 322 654 reads were generated from one lane of sequencing using an Illumina GA II high throughput sequencer .
Of these , 90.7 % ( 8 456 287 ) passed the quality filter and 77.8 % of the filtered reads ( 6 576 033 ) were assigned a unique position in the human genome .
The set of 6 576 033 reads was then subjected to additional computational analysis .
As outlined in the 'Materials and Methods' section , we compared three computational algorithms and we considered the peaks called in common to demarcate putative b-catenin binding regions .
This approach yielded 2168 peaks that we termed b-catenin enriched regions .
The genomic boundaries of these regions are provided in Supplementary Table S2 .
To determine whether our approach identified bona fide Wnt/b-catenin target genes , we searched for representation of the MYC gene .
b-catenin enriched regions coincided with the 5' , 3' and distal WREs previously shown to regulate MYC expression ( Figure 1C ) ( 15,22,29 -- 31 ) .
Thus , our approach identified b-catenin associated WREs in colon cancer cells .
Computational analysis of b-catenin enriched regions
We next localized the b-catenin enriched regions relative to transcripts deposited in the reference sequence database ( RefSeq ) ( 28 ) .
Of the 2168 b-catenin enriched regions , 1562 ( 72 % ) were within 50 kb of a RefSeq transcript .
Upon further analysis , we found that 1219 ( 56 % ) were within 10 kb and 1090 ( 50 % ) were within 2.5 kb ( Figure 2A ) .
With respect to protein-coding genes and in agreement with our previous findings ( 12 ) , we found that b-catenin preferentially localized to internal positions or those positions that are downstream from the transcription start site and upstream from the transcription stop site ( Figure 2B ) .
There was a tendency for b-catenin enriched regions within 2.5 kb of the 5' gene boundary to cluster around transcriptional start sites ( Figure 2C ) .
The genes containing b-catenin enriched regions near ( < 2.5 kb ) or within gene boundaries are listed in Supplementary Table S3 .
Overall , the 1090 b-catenin enriched regions are near or within 988 genes .
It was recently shown that a distal WRE interacted with MYC through a large chromatin loop ( 30,31 ) .
This was the first demonstration indicating that a WRE positioned hundreds of kilobases away from its target gene functioned as a transcriptional enhancer .
To further explore the relationship of b-catenin enriched regions and annotated protein-coding transcripts , we determined the empirical cumulative distribution function ( CDF ) of the distance from each b-catenin enriched region to the nearest transcript .
This analysis found that 80 % of enriched regions were within 100 kb of an annotated transcript and that 95 % were within 450 kb ( Figure 2D ) .
Together these findings indicate that while most b-catenin regions are near or within protein-coding genes , 28 % localized at a distance of > 50 kb away .
Furthermore , localization of b-catenin enriched regions with gene boundaries was statistically significant when compared to localization of a control set of regions with gene boundaries ( Supplementary Figure S1 ) .
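The distance analysis can be sketched as follows. This is a simplified illustration with hypothetical coordinates: transcripts are modeled as intervals on a single chromosome, a region inside a transcript is given a distance of zero, and the empirical CDF at a threshold is the fraction of regions whose distance does not exceed it. The last lines recompute the reported percentages from the counts given in the text.

```python
from bisect import bisect_right
from typing import List, Tuple


def distance_to_nearest_transcript(position: int, transcripts: List[Tuple[int, int]]) -> int:
    """Distance in bp from a position to the closest transcript interval (0 if inside)."""
    return min(
        0 if start <= position <= end else min(abs(position - start), abs(position - end))
        for start, end in transcripts
    )


def ecdf(values: List[int], threshold: int) -> float:
    """Empirical CDF: fraction of values less than or equal to the threshold."""
    ordered = sorted(values)
    return bisect_right(ordered, threshold) / len(ordered)


# Hypothetical transcripts and region midpoints on one chromosome.
transcripts = [(10_000, 20_000), (500_000, 520_000)]
region_midpoints = [15_000, 22_000, 80_000, 300_000]

distances = [distance_to_nearest_transcript(p, transcripts) for p in region_midpoints]
print(distances)
print(ecdf(distances, 2_500), ecdf(distances, 100_000))

# Percentages reported in the text, recomputed from the region counts.
TOTAL_REGIONS = 2168
for count in (1562, 1219, 1090):
    print(count, round(100 * count / TOTAL_REGIONS))
```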
b-Catenin occupancy of 5' and 3' regions at CXXC5 identified a chromatin loop
Recently we described a b-catenin and TCF4-coordinated chromatin loop at MYC that integrated 5' and 3' proximal WREs ( 15 ) .
To identify targets that may be likewise regulated , we searched for genes that contained both 5' and 3' b-catenin enriched regions .
In addition to MYC , we found two genes , CXXC5 and FXR2 , that contained b-catenin enriched regions within 2.5 kb of both transcript boundaries ( Supplementary Table S4 and Figure 3A ) .
If the range was expanded to include regions 10 kb from the 5' and 3' ends , 11 loci were identified .
This number increased to 111 loci if the range was further expanded to 50 kb .
We first used ChIP and real-time PCR to determine whether b-catenin occupied the identified regions relative to CXXC5 .
b-catenin precipitated higher levels of the 5' and 3' CXXC5 enriched regions relative to control ( Figure 3B ) .
We then used chromatin conformation capture ( 3C ) to determine whether a chromatin loop containing the 5' and 3' b-catenin associated regions formed at CXXC5 ( 32 ) .
Figure 3A depicts the positions of the XbaI restriction endonuclease sites and PCR primer locations used to interrogate CXXC5 in 3C assays .
A PCR product of the correct size was generated with forward primer C51 and reverse primer C56 , and its production was dependent upon the addition of XbaI and DNA ligase to the reaction ( Figure 3C ) .
This 341 bp fragment was sequenced and confirmed to be the correct CXXC5 product ( Figure 3D ) .
This analysis indicated that a chromatin loop containing b-catenin bound 5' and 3' WREs is present at CXXC5 in human colon cancer cells .
Motif analysis of b-catenin enriched regions
Genome-wide binding analysis has indicated that most b-catenin recruitment to chromatin in colon cancer cells likely occurred through interactions with TCF4 ( 12 ) .
Therefore , we first determined whether the b-catenin enriched regions contained a canonical TCF4 motif ( T/A -- T/A -- C -- A -- A -- A -- G ) or the evolutionarily conserved TCF4 motif ( A -- C/G -- T/A -- T -- C -- A -- A -- A -- G ) ( 16 ) .
Figure 3 . A chromatin loop containing 5' and 3' b-catenin enriched regions is detected at the CXXC5 gene .
( A ) Schematic of the CXXC5 locus with untranslated regions as thin rectangles , introns as thin lines , exons as thick rectangles and an arrow demarcating the transcription start site .
The peak density plots below the gene represent b-catenin enriched regions identified in the ChIP-Seq library .
The triangles and stunted arrows identify the XbaI sites and PCR primers , respectively , used in the chromatin conformation capture ( 3C ) assay depicted in ( C ) .
( B ) Real time PCR analysis of b-catenin ChIP assays performed in HCT116 cells .
Specific oligonucleotides were used to detect b-catenin binding to enriched regions depicted by gray rectangles in ( A ) .
5' is the upstream site and 3' is the downstream site .
A distal upstream region of the MYC gene was used as a negative control ( Ctrl ) .
Error bars are SEM .
( C ) Agarose gel of PCR products generated from a 3C analysis of CXXC5 in HCT116 cells .
Generation of the 3C product ( CXXC5 5'3' ) with primers C51 and C56 required the addition of XbaI and ligase to the reactions .
LC is a loading control and S is a DNA standard .
( D ) DNA sequence of the 3C product .
Arrows denote primer sequences C51 and C56 and the XbaI site is boxed .
Sequence shown in ( D ) , with the XbaI site ( TCTAGA ) set apart : GTACGTAGTCGTTTTAGCCCCGGGACTCAAGAG TTGAGGCTGATGCCTGCCTGAGAGATAAAATATCCTTTCTCGGAT CAGTTTCCTCACCTGAGAAATGGGAACGGGAATCTCCGCCCCTT TTCTCCCGGGGCCCTAGTGCCCACTGAATCCATTAAGGAGCTCT TGGAAGGGTGGGGTCTTGGAACACGCGTCTACCTCCCAGGACC CTCGACTAGGAATCTCTGGCCCGCCGCGCACCTGAGCTGGGGG GCGCGGCCAAATTCTCCCTCCCGGTCCTCGGAGCTTCTGGCCC CGC TCTAGA CACAGAACGGTGGGGGTTTGAGAGGCTGGGTGC
Of the 2168 enriched regions , 1026 ( 47 % ) contained at least one TCF4 motif .
A fraction of these , 192 ( 9 % ) , resembled the longer and evolutionarily conserved variant .
We then performed de novo motif analysis on these populations using a Gibbs sampler algorithm ( 24,27 ) .
Over-representation of motifs was determined by computing the relative enrichment measure as described in the 'Materials and Methods' section ( 27 ) .
Using the 1026 enriched-regions that contained TCF4 consensus sequences , we successfully identified an over-representation of both the core and evolutionarily conserved TCF4 motifs .
This indicated that our de novo search approach was valid .
Upon further analysis , we found a striking co-enrichment of AP-1 motifs with the consensus TCF4 motifs .
Examples of all three motifs , along with their scores and enrichment values relative to control sequences , are shown in Figure 4A .
Overall , 417 b-catenin enriched regions contained a TCF4 and an AP-1 motif ( Figure 4B and Supplementary Table S5 ) .
The coupling of AP-1 and TCF4 motifs in 417 ( 19 % ) b-catenin enriched regions suggested that AP-1 , TCF4 and b-catenin may co-regulate target gene expression .
To address this hypothesis , we used the ChIP assay to determine whether these factors bound regions containing AP-1 and TCF4 motifs .
We first tested b-catenin binding to a selected subset of regions associated with 23 protein-coding genes .
For this set of genes , associated regions were those that localized within 2.5 kb from gene boundaries .
b-catenin occupied 19 sites in asynchronously growing HCT116 cells ( Figure 5A ) .
We then assayed the same regions for TCF4 binding using TCF4 specific antibodies in the ChIP assay .
TCF4 bound the same 19 targets as b-catenin ( Figure 5B ) .
We concluded from this analysis that b-catenin and TCF4 co-occupied target genes containing TCF4 and AP-1 consensus motifs .
Next , we determined whether AP-1 bound to these selected regions .
AP-1 is a heterodimeric complex comprised of Fos and Jun transcription factors ( 33,34 ) .
AP-1 regulates key cellular processes such as proliferation , differentiation and apoptosis ( 35 ) .
Several groups have shown that c-Jun associates with AP-1 consensus motifs in colon cancer cells ( 14,36 -- 39 ) .
We therefore tested whether c-Jun occupied b-catenin enriched regions containing AP-1 and TCF4 motifs .
Using c-Jun antibodies in ChIP assays conducted in asynchronously growing HCT116 cells , we found that c-Jun associated with 14 of the 19 regions that bound b-catenin and TCF4 ( Figure 5C ) .
Serum mitogens elicit signal transduction pathways that stimulate c-Jun binding to chromatin .
We have previously shown that c-Jun occupancy of the MYC 3' enhancer increased as quiescent cells re-entered the cell cycle in response to serum ( 14 ) .
We therefore determined whether treatment of quiescent cells with serum would stimulate c-Jun association with the five targets that lacked binding in asynchronous cells .
HCT116 cells were grown to confluency in serum-depleted medium for two days , which caused these cells to enter the G0/G1 stage of the cell cycle ( 14,39 ) .
Cells were then treated with medium containing serum for 1 or 2 h and c-Jun ChIP assays were conducted .
In line with previous findings , higher levels of c-Jun were found at the MYC 3' enhancer when synchronized cells were exposed to serum for 1 h as compared to levels detected in quiescent cells media .
We then added medium containing serum with or without 10 mM LiCl for 1 or 4 h. LiCl is a well-established agonist of the Wnt/b-catenin pathway as it inhibits GSK3b and stimulates nuclear b-catenin accumulation ( 13,40,41 ) .
We and others have shown that LiCl increased b-catenin levels in HCT116 cells ( 13,42,43 ) .
Therefore , we predicted that if mitogen and Wnt/b-catenin signaling pathways converged to regulate gene expression , treatment with serum and LiCl would result in increased transcript levels when compared to treatment with serum alone .
LiCl increased mitogen-induced expression of MYC , PDE4B , DDR2 , CTBP2 , EGFR , DNAJB1 , WISP1 and PINX1 ( Figure 6B ) .
HDAC4 , PCDH7 and HABP4 were activated by serum alone , and MMP20 , CYP39A1 and YAP1 genes were not induced above levels seen in serum-deprived cells ( Figure 6B ) .
Together this analysis indicated that Wnt/b-catenin and mitogen-signaling pathways directly activate a subset of b-catenin target genes in colon cancer cells .
DISCUSSION
Gene expression is rarely controlled by the association of a single transcription factor with an enhancer element embedded in the proximal promoter .
Rather , association of multiple transcription factors within an enhancer allows for precise and specific regulation of gene expression in response to environmental stimuli .
Moreover , enhancers can occupy regions over 100 kb from their target gene ( 44,45 ) .
Genome-wide profiling of transcription factor binding sites has emerged as one method to localize composite enhancer elements that integrate upstream signal transduction pathways ( 17,45 ) .
In this report , we used ChIP-Seq to identify b-catenin enriched regions in human colon cancer cells .
Through an integrated approach involving bioinformatics , ChIP and expression analyses , we provide evidence that a population of b-catenin target genes is directly regulated by b-catenin , TCF4 and AP-1 transcription factors .
The nature of ChIP-Seq data provides many challenges for analysis ( 18 ) .
Algorithms have been designed to assign a presence or absence prediction for occupancy at any non-repetitive region of the genome .
A common approach for many algorithms is to use sliding window methods that identify regions of high read depth ( relative to a background distribution ) by traversing the genome in windows of a predetermined size .
CisGenome , SISSRs and WTD algorithms exemplify this approach ( 24 -- 26 ) .
It follows that although each algorithm finds disparate numbers of enriched regions , increased confidence can be assigned to regions that have been found by all three .
This approach resulted in 2168 enriched b-catenin binding regions identified in our b-catenin ChIP-Seq library .
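The concordance idea described above can be illustrated with a minimal sketch. It assumes that "concordant" means genomic positions covered by a peak from every caller, and that each caller reports non-overlapping peaks as half-open ( chrom , start , end ) tuples. The function name `intersect_two` and the example coordinates are hypothetical, and the merging rules actually applied to the CisGenome , SISSRs and WTD outputs may differ.

```python
from collections import defaultdict

def intersect_two(a, b):
    """Return the intervals covered by both peak lists.

    Each list holds half-open (chrom, start, end) tuples and is assumed
    to contain no overlapping peaks within itself.
    """
    by_chrom = defaultdict(lambda: ([], []))
    for chrom, start, end in a:
        by_chrom[chrom][0].append((start, end))
    for chrom, start, end in b:
        by_chrom[chrom][1].append((start, end))

    shared = []
    for chrom in sorted(by_chrom):
        xs, ys = (sorted(v) for v in by_chrom[chrom])
        i = j = 0
        while i < len(xs) and j < len(ys):
            start = max(xs[i][0], ys[j][0])
            end = min(xs[i][1], ys[j][1])
            if start < end:
                shared.append((chrom, start, end))
            # Advance whichever interval ends first.
            if xs[i][1] < ys[j][1]:
                i += 1
            else:
                j += 1
    return shared

# Hypothetical peak calls from three algorithms (illustrative coordinates only).
cisgenome = [("chr8", 100, 700), ("chr8", 5000, 5600)]
sissrs = [("chr8", 300, 900), ("chr1", 10, 50)]
wtd = [("chr8", 250, 650), ("chr8", 5100, 5300)]

# Keep only positions supported by all three callers.
concordant = intersect_two(intersect_two(cisgenome, sissrs), wtd)
print(concordant)
```

Requiring agreement from every caller trades sensitivity for confidence : a region reported by only one or two algorithms is dropped even if it is a genuine binding site .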
In addition to the three WREs that control MYC expression ( 14,22,29 ) , 30 Wnt/b-catenin target genes listed on the Wnt homepage ( http://www.stanford.edu/~rnusse/pathways/targets.html ) were in proximity to b-catenin enriched regions ( Supplementary Table S5 ) .
Furthermore , when considering all genomic regions identified and assayed for b-catenin and TCF4 binding in this report , we found that 28 of 32 ( 87.5 % ) bound both factors .
Together , these findings indicate that the b-catenin ChIP-Seq library identified bona fide and direct Wnt/b-catenin target genes .
We previously localized b-catenin binding sites in colon cancer cells using an unbiased and genome-wide approach termed SACO ( 14 ) .
In that study , we found that 84 % of high confidence b-catenin binding regions contained at least one TCF4 consensus core motif ( T/A -- T/A -- C -- A -- A -- A -- G ) .
In this report , we performed de novo motif analysis on the 2168 b-catenin enriched regions and found that 47 % contained a core TCF4 consensus motif .
This discrepancy is likely attributable to methodological and computational differences in the generation and analysis of each data set .
For the SACO study , a 5 kb interval surrounding the mean position of the enriched region was used in the analysis .
For the ChIP-Seq analysis , we examined a much smaller interval enveloping each b-catenin enriched region ( 600 bp ) .
Therefore , based on our findings using ChIP-Seq and due to the resolution of this technique , the 47 % association rate of TCF4 consensus motifs found in b-catenin enriched regions likely reflects the landscape in vivo .
Results gleaned from our current analysis are consistent with TCF4 being a predominant factor that directly recruits b-catenin to enhancers in colon cancer cells , but also suggest that the portion of targets that rely on other factors to recruit b-catenin is substantial .
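To make the interval-size argument concrete , the following is a minimal sketch of scanning a region for the core TCF4 consensus ( T/A -- T/A -- C -- A -- A -- A -- G ) on both strands , together with a rough estimate of how many chance matches a window of a given length would contain . The function names and example sequences are hypothetical , the chance estimate assumes uniform and independent base composition , and this simple scan is not the de novo motif analysis used in the study .

```python
import re

# Core TCF4 consensus as written in the text: (T/A)(T/A)CAAAG.
FORWARD = re.compile(r"[AT][AT]CAAAG")
# Reverse complement of the same pattern, to catch sites on the other strand.
REVERSE = re.compile(r"CTTTG[AT][AT]")

MOTIF_LENGTH = 7

def has_tcf4_core(seq):
    """Return True if the sequence contains the core motif on either strand."""
    seq = seq.upper()
    return bool(FORWARD.search(seq) or REVERSE.search(seq))

def fraction_with_motif(seqs):
    """Fraction of sequences that contain at least one core motif."""
    return sum(has_tcf4_core(s) for s in seqs) / len(seqs)

def expected_chance_hits(window_length):
    """Expected number of motif matches in random sequence of this length.

    Assumes each base is A, C, G or T with probability 1/4, independently.
    Four of the 4**7 possible 7-mers match the pattern on each strand.
    """
    positions = window_length - MOTIF_LENGTH + 1
    return 2 * positions * 4 / 4 ** MOTIF_LENGTH

# Hypothetical short regions (illustrative only).
regions = ["GGCTTCAAAGGC", "GGCCTTTGATGG", "GGGCCCGGGCCC"]
print(fraction_with_motif(regions))

# Compare the two interval sizes discussed in the text.
print(expected_chance_hits(600), expected_chance_hits(5000))
```

Under these assumptions a 5 kb window is expected to contain roughly 2.4 chance matches , against roughly 0.3 for a 600 bp window , which is in keeping with the higher motif frequency seen with the wider SACO intervals .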
Two recent studies localized TCF4 binding regions in human colon cancer cells using ChIP-Seq ( 21,22 ) .
We therefore searched for representation of our b-catenin enriched regions in the reported TCF4 libraries .
Of the 10 conserved TCF4 binding peak regions identified by Tuupanen et al. ( 22 ) , three were identified in our b-catenin ChIP-Seq library .
This included the peak that identified the WRE located 335 kb upstream from the MYC transcription start site .
Upon analysis of the TCF4 ChIP-Seq data sets reported by Blahnik et al. ( 21 ) and the ENCODE Project Consortium , we found that 786 ( 36.3 % ) of our b-catenin peak regions overlapped a TCF4 peak region .
There are several possibilities for why a greater percentage of our b-catenin peak regions are not represented in the aforementioned ChIP-Seq libraries .
Methodological variations , algorithms chosen to assign peak regions , and cell-type differences aside , TCF4 is bound to both transcriptionally active and repressed genes .
As b-catenin is thought to primarily associate with transcribed genes , a partial overlap of the b-catenin enriched regions identified in our study with the TCF4 enriched regions is expected .
Furthermore , as mentioned above , our analysis suggests that many genes likely recruit b-catenin independently of TCF4 .
Most of these targets are not represented in the TCF4 ChIP-Seq libraries .
Finally , the amount of sequencing required to identify all of the binding sites represented in a ChIP-Seq library is debatable ( 18 ) .
Therefore , we anticipate that additional sequencing of our library would identify more of the reported TCF4 peak regions .
Overall , however , the concordance of TCF4 and b-catenin peaks identified by three separate groups independently validates ChIP-Seq as a methodology to identify direct Wnt/b-catenin target genes .
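A comparison of this kind , the share of peak regions in one library that overlap a peak in another , can be sketched as follows . This is a minimal illustration that assumes half-open ( chrom , start , end ) intervals and counts any shared base as an overlap ; the function names and coordinates are hypothetical , and the overlap criterion used for the reported comparison is not specified here .

```python
from bisect import bisect_right
from collections import defaultdict

def merge(intervals):
    """Merge overlapping or touching (start, end) intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged

def fraction_overlapping(query, reference):
    """Fraction of query peaks that share at least one base with a reference peak."""
    by_chrom = defaultdict(list)
    for chrom, start, end in reference:
        by_chrom[chrom].append((start, end))
    merged = {chrom: merge(v) for chrom, v in by_chrom.items()}
    ends = {chrom: [end for _, end in m] for chrom, m in merged.items()}

    hits = 0
    for chrom, start, end in query:
        if chrom not in merged:
            continue
        # First merged reference interval whose end lies beyond the query start.
        k = bisect_right(ends[chrom], start)
        if k < len(merged[chrom]) and merged[chrom][k][0] < end:
            hits += 1
    return hits / len(query)

# Hypothetical peak sets (illustrative coordinates only).
bcat_peaks = [("chr8", 100, 700), ("chr8", 5000, 5600),
              ("chr5", 10, 400), ("chr1", 1, 50)]
tcf4_peaks = [("chr8", 650, 900), ("chr5", 400, 800), ("chr8", 5200, 5250)]

print(fraction_overlapping(bcat_peaks, tcf4_peaks))
```

Applied to real peak files , a calculation of this form yields figures such as the 786 of 2168 regions ( 36.3 % ) reported above ; the result depends on how strictly overlap is defined .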
Upon mapping the b-catenin enriched regions relative to RefSeq gene boundaries , we found that binding sites are dispersed through the 5′ , intragenic , and 3′ ends of gene boundaries .
This finding is in line with previous genome-wide localization studies of b-catenin and TCF4 binding sites ( 12,16 ) .
We were intrigued by the observation that several targets contained b-catenin enriched regions that localized to both 5′ and 3′ gene boundaries .
Based on our previous work with MYC ( 15 ) , we tested whether a chromatin loop was present at two targets that contained 5′ and 3′ b-catenin enriched regions in the library , CXXC5 and GRHL3 .
CXXC5 is a zinc finger-containing protein that inhibits canonical Wnt signaling in response to bone morphogenetic protein signaling in neural stem cells ( 46 ) .
GRHL3 encodes a Grainyhead factor that plays a role in epidermal barrier formation in the bladder ( 47 ) .
Substantial levels of b-catenin binding to the 5′ and 3′ regions of CXXC5 and GRHL3 were detected by ChIP analysis ( Figure 3B and Supplementary Figure S2 ) .
Using the 3C technique , we found that a chromatin loop accompanied b-catenin binding to 5′ and 3′ regions of CXXC5 .
However , we were unable to detect a chromatin loop at GRHL3 .
This finding suggests that while looping between separated enhancers may be a prevalent mechanism to coordinate Wnt/b-catenin gene expression , b-catenin binding to 5′ and 3′ sites alone is not sufficient for this interaction .
We anticipate that 3C coupled with high throughput sequencing techniques will facilitate identification of target genes that are regulated by distal WREs via chromatin loops ( 48,49 ) .
Through de novo motif analysis , we found that nearly half of the b-catenin enriched regions that contained a TCF4 consensus motif also contained an AP-1 motif .
It is noted here that AP-1 motifs were also over-represented in TCF4 bound regions identified by recent ChIP-Seq and ChIP-chip analysis ( 16,21 ) .
However , our current study is the first to report over-representation of AP-1 motifs in b-catenin bound regions .
While b-catenin/TCF4 and AP-1 have been shown by others and our group to co-regulate target gene expression ( 36,38,39,50 ) , our findings here suggest that target genes associated with 417 b-catenin enriched regions may be likewise regulated .
Our ChIP analysis indicated that nearly every region assayed ( 95 % ) containing a TCF4 and AP-1 motif bound c-Jun , a component of the AP-1 complex .
The majority of these loci showed an additive increase in expression upon the addition of both LiCl and serum .
This analysis suggests that mitogen and Wnt signaling pathways likely converge through AP-1 and b-catenin/TCF4 to co-regulate target gene expression .
However , LiCl treatment failed to enhance mitogen-activated gene expression for several targets .
It is possible that pre-treatment of quiescent cells with LiCl prior to serum stimulation would sensitize the system to facilitate detection of pathway cooperation .
Alternatively , AP-1 binding may function to regulate gene expression in response to a different stimulus such as cytokine signaling or the apoptotic stress response ( 35 ) .
The application of sequence-based methods to identify transcription factor binding sites genome-wide is likely to persist as the methodology of choice .
Next generation sequencing technologies , such as those using the Illumina platform , allow increased resolution and increased output .
These attributes have facilitated the replacement of SACO with massively parallel sequencing approaches to map transcription factor binding regions isolated by ChIP .
Through our ChIP-Seq screen of b-catenin binding regions in asynchronous HCT116 cells , we uncovered evidence for a functional interplay between b-catenin/TCF4 and AP-1 .
Because cells that initiate colon carcinogenesis contain pathogenic levels of b-catenin in the nucleus and are bathed in serum mitogens , our findings here suggest that misexpression of target genes containing AP-1 and TCF4 motifs might represent the pathogenically relevant set .
Supplementary Data are available at NAR Online.
ACKNOWLEDGEMENTS
We would like to thank Dr Laura Carrel and Dr Faoud Ishmael ( Penn State University College of Medicine ) for critically reading this manuscript and providing helpful comments .
We would like to thank Doug Turnbull and the High Throughput Sequencing Facility in the Molecular Biology Institute at the University of Oregon for sequencing the library .
We would like to thank Dr Richard Goodman and Dr Gail Mandel ( Oregon Health and Science University ) for support during the initiation of this project .
FUNDING
National Institutes of Health ( grant number R01DK080805 to G.S.Y. ) ; start-up research funds from the Pennsylvania State University College of Medicine ( to G.S.Y. ) ; National Institutes of Health , National Center for Research Resources ( 5UL1RR024140 to S.K.M. ) ; National Institutes of Health , National Cancer Institute ( 5 P30 CA069533-13 to S.K.M. ) .
Funding for open access charge : National Institutes of Health ( grant number R01DK080805 to G.S.Y. ) .