Identification of b-catenin binding regions in colon
ABSTRACT
Nucleic Acids Research , 2010 , Vol. 38 , No. 17 , 5735 -- 5745 , doi:10.1093/nar/gkq363
Deregulation of the Wnt/b-catenin signaling pathway is a hallmark of colon cancer .
Mutations in the adenomatous polyposis coli ( APC ) gene occur in the vast majority of colorectal cancers and are an initiating event in cellular transformation .
Cells harboring mutant APC contain elevated levels of the b-catenin transcription coactivator in the nucleus which leads to abnormal expression of genes controlled by b-catenin/T-cell factor 4 ( TCF4 ) complexes .
Here , we use chromatin immunoprecipitation coupled with massively parallel sequencing ( ChIP-Seq ) to identify b-catenin binding regions in HCT116 human colon cancer cells .
We localized 2168 b-catenin enriched regions using a concordance approach for integrating the output from multiple peak alignment algorithms .
Motif discovery algorithms found a core TCF4 motif ( T/A -- T/A -- C -- A -- A -- A -- G ) , an extended TCF4 motif ( A/T/G -- C/G -- T/A -- T/A -- C -- A -- A -- A -- G ) and an AP-1 motif ( T -- G -- A -- C/T -- T -- C -- A ) to be significantly represented in b-catenin enriched regions .
Furthermore , 417 regions contained both TCF4 and AP-1 motifs .
Genes associated with TCF4 and AP-1 motifs bound b-catenin , TCF4 and c-Jun in vivo and were activated by Wnt signaling and serum growth factors .
Our work provides evidence that Wnt / b-catenin and mitogen signaling pathways intersect directly to regulate a defined set of target genes .
1 Oregon Clinical and Translational Research Institute , Oregon Health and Science University , Portland , OR , 2 The Department of Biochemistry and Molecular Biology , The Pennsylvania State University College of Medicine , Hershey , PA 17033 , 3 Department of Medical Informatics and Clinical Epidemiology , Oregon Health and Science University , Portland , OR , 4 Knight Cancer Institute , 5 Division of Biostatistics in the Department of Public Health and Preventative Medicine , Oregon Health and Science University , Portland , OR , 6 Program in Cellular and Molecular Biology and 7 The Pennsylvania State Hershey Cancer Institute , The Pennsylvania State University College of Medicine , Hershey , PA 17033 , USA
INTRODUCTION
The Wnt/b-catenin signaling pathway is required for homeostasis in the gastrointestinal ( GI ) tract ( 1 ) .
The GI tract is coated with small invaginations , or crypts , which are comprised of discrete zones of proliferating and differentiated cells ( 1 ) .
Wnt signaling maintains the proliferative compartment of the crypt .
The b-catenin transcriptional coactivator controls downstream genetic programs elicited by Wnt signaling and its cellular levels are tightly regulated ( 2,3 ) .
When cells are not exposed to Wnt , cytosolic b-catenin associates with a multi-protein complex that contains the adenomatous polyposis coli ( APC ) protein .
APC functions as a scaffold to coordinate b-catenin phosphorylation and degradation by the proteasome .
Under these conditions , Wnt/b-catenin target genes are silenced by corepressor complexes that are tethered to Wnt responsive DNA enhancers ( WREs ) through interactions with the T-cell factor ( TCF ) family of transcription factors ( 4 ) .
TCF4 is a predominant TCF family member in colon cancer cells ( 5 ) .
When cells are exposed to Wnt , b-catenin escapes proteasomal degradation , and is chaperoned to the nucleus by APC .
There , it occupies TCF4 bound WREs and displaces the corepressors .
b-catenin then recruits chromatin-modifying complexes and Wnt/b-catenin target genes are expressed ( 4 ) .
Deregulation of the Wnt/b-catenin pathway is associated with colon carcinogenesis ( 2,3 ) .
In virtually all cases of colon cancer , mutations target components of the Wnt signaling pathway ( 6 ) .
The most common lesions localize to APC and lead to production of a truncated APC protein that can no longer effectively coordinate b-catenin degradation .
This mutation occurs at the earliest stages of carcinogenesis when normal colonic epithelial cells are transformed into aberrant crypt foci ( 6 ) .
Inherited APC mutations give rise to familial adenomatous polyposis , a disease where aﬄicted individuals are burdened by thousands of intestinal polyps early in adulthood ( 7,8 ) .
In the rare cancer cases where APC is wild-type , mutations instead are found in CTNNB1 ( 3,9,10 ) .
CTNNB1 is the gene that encodes b-catenin , and the cancer causing lesions map to positions near the 5′ portion of the gene .
These mutations give rise to a b-catenin pool that is resistant to proteasomal degradation ( 11 ) .
In each instance , mutations that target APC or CTNNB1 lead to high levels of b-catenin in the nucleus and abnormal expression of genes regulated by b-catenin / TCF4 complexes ( 5,9 ) .
Therefore identifying target genes directly controlled by b-catenin/TCF4 is required to understand the pathogenesis of this disease .
To identify direct b-catenin/TCF4 target genes it is ﬁrst necessary to map binding sites for these factors across the genome .
Previously , we used an unbiased and genome-wide screen termed serial analysis of chromatin occupancy ( SACO ) to localize 412 high conﬁdence b-catenin binding sites in the human colorectal cancer cell line , HCT116 ( 12 ) .
Approximately half of the binding sites were near ( < 2.5 kb ) or within protein-coding gene boundaries .
These b-catenin binding sites were located in 5′ promoter regions , intragenic regions and 3′ untranslated regions .
b-Catenin binding to 3′ positions relative to E2F4 and MYC genes identiﬁed functional WREs .
For E2F4 , the downstream enhancer drove expression of an antisense and non-coding transcript that decreased E2F4 protein levels ( 13 ) .
For MYC , b-catenin occupancy of the downstream enhancer initiated a chromatin loop that integrated the 5′ WRE to coordinate MYC expression in response to Wnt/b-catenin and mitogen signaling pathways ( 14,15 ) .
Recently , Hatzis et al. ( 16 ) used chromatin immunoprecipitation ( ChIP ) coupled with microarrays ( ChIP-chip ) to localize 6868 TCF4 binding sites in LS174T colorectal cancer cells .
As was the case for b-catenin , TCF4 binding was also found throughout protein coding gene boundaries .
Together these studies indicate that Wnt activation of target gene expression is mechanistically more intricate than the simple model involving recruitment of b-catenin to TCF4-bound 5′ promoter regions .
While SACO was a pioneering technique used to identify transcription factor binding sites and was a viable alternative to the ChIP-chip approach , it did , like most methodologies , suffer from some limitations .
First , SACO libraries were constructed in plasmid vectors and then sequenced using high-throughput and Sanger-based sequencing .
This was both laborious and costly .
In addition , large DNA fragments were included in construction of the earliest SACO libraries .
While most DNA was in the 500 -- 700 bp range , fragments as large as 2.5 kb were included in the b-catenin SACO library .
This impinged upon the resolution of the technique and hindered de novo motif discovery within b-catenin bound loci .
A successor to SACO is the recently described ChIP coupled with massively parallel sequencing ( ChIP-Seq ) approach ( 17 -- 19 ) .
In this technique , DNA puriﬁed from immunoprecipitated chromatin is size-selected and then sequenced using one of the next-generation sequencing platforms such as the Illumina genome analyzer ( 18,19 ) .
The robustness , cost , resolution and relative ease in library construction have made ChIP-Seq , rather than SACO , a current method of choice for genome-wide localization of transcription factor binding sites .
ChIP-Seq has been used to map numerous histone modiﬁcations ( 20 ) and binding sites for several transcription factors including , but not limited to , NRSF/REST , GATA1 , SRF , E2F4 , E2F6 and STAT1 ( 17 ) .
In addition , ChIP-Seq has been used recently to identify TCF4 enriched binding regions in human colon cancer cell lines ( 21,22 ) .
Tuupanen et al. ( 22 ) identiﬁed 10 TCF4-site containing regions in LoVo cells using a combination of ChIP-Seq and the enhancer element locator analysis .
Blahnik et al. ( 21 ) used ChIP-Seq to identify 21 102 TCF4 binding sites in HCT116 cells .
In this report we used ChIP-Seq to identify b-catenin binding regions in HCT116 human colon cancer cells .
We chose this high-resolution and genome-wide approach because we were interested in using de novo motif analysis to identify transcription factors that putatively cooperate with b-catenin/TCF4 .
Many algorithms exist to map the enriched genomic regions identiﬁed in a ChIP-Seq experiment ( 23 ) .
Because each approach varies in computational strategy and can produce dramatically different numbers of enriched regions for a given false discovery rate ( FDR ) cutoff , there is some debate as to which is the preferred algorithm ( 18 ) .
In this report , we used CisGenome ( 24 ) , SISSRs ( 25 ) and WTD ( 26 ) to initially identify enriched regions from the b-catenin ChIP-Seq library .
Based on the intersection of regions found in common with each algorithm , we identiﬁed 2168 b-catenin enriched regions .
Consistent with our previous report ( 12 ) , we found over-representation of the core and evolutionarily conserved TCF4 consensus motifs within enriched regions .
In addition , we found that consensus AP-1 motifs were also over-represented in a large subset of the enriched regions with the majority of these motifs co-occurring with a TCF4 motif .
Finally , we show that serum mitogens and Wnt signaling agonists cooperatively activate expression of some target genes in proximity to b-catenin bound loci that contain AP-1 and TCF4 motifs .
These ﬁndings indicate that a discrete subset of b-catenin target genes are activated by mitogen and Wnt signaling in colon cancer cells , and that this regulation likely occurs through consensus AP-1 and TCF4 sites , respectively .
MATERIALS AND METHODS
Cell culture
HCT116 human colorectal cancer cells ( ATCC number CCL-247 ) were cultured as previously described ( 14 ) .
ChIP
Antibodies used in ChIP assays included : 3 µg of anti-b-catenin ( BD transduction , 610154 ) , 3 µg of anti-TCF4 ( Millipore , 05-511 ) , 2 µl of anti-c-Jun ( Millipore , 06-225 ) and 6 µg rabbit anti mouse IgG ( Jackson Immunoresearch , 315-005-003 ) .
b-Catenin ChIP DNA for the ChIP-Seq library was prepared using the Chromatin Immunoprecipitation Assay Kit ( Millipore , 17-295 ) according to the instructions .
To assess b-catenin , TCF4 and c-Jun binding to ChIP-Seq peak loci , ChIP assays contained 5 -- 10 × 10^6 cells and were conducted as previously reported ( 13 ) .
Chromatin in formaldehyde ﬁxed cell lysates was sonicated to an average size of 500 -- 700 bp using a Misonix Ultrasonic XL-2000 Liquid Processor ( 5 × 20 s , output wattage 7 , with 45 s rest intervals on ice between pulses ) .
Real time PCR was used to detect isolated ChIP fragments and samples contained 10 µl of 2× iQ SYBR Green Supermix ( Bio-rad , 170-882 ) , 0.25 µM of each primer and 3 µl of puriﬁed ChIP DNA .
Reactions were processed for one cycle at 94 °C for 3 min , then 45 cycles at 94 °C for 10 s and at 68 °C for 40 s using a MyIQ Single Color Real-Time PCR machine ( Bio-rad ) .
Primers were designed , using Primer3 software , to a 600 bp DNA segment that was centered on the b-catenin ChIP-Seq coverage region .
Primers used in this study are listed in Supplementary Table S7 .
Real time data is represented as fold levels over control .
The control is a distal region that is 5 kb upstream from the MYC transcript start site that does not bind signiﬁcant levels of b-catenin , TCF4 or c-Jun ( 14 ) .
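The text does not state how the fold levels over control were derived from the real-time PCR data. The sketch below assumes the common 2^ΔCt calculation with near-perfect amplification efficiency, which may differ from what was actually done; the function name and the Ct values are hypothetical.

```python
def fold_over_control(ct_target: float, ct_control: float,
                      efficiency: float = 2.0) -> float:
    """Fold level of a target region over the control region.

    Assumption (not stated in the source): signal doubles every cycle
    (efficiency = 2.0), so a region detected `d` cycles earlier than the
    control is `efficiency ** d` times more abundant in the ChIP DNA.
    """
    return efficiency ** (ct_control - ct_target)


# Hypothetical Ct values: the target crosses threshold 3 cycles before
# the control region upstream of MYC.
print(fold_over_control(ct_target=25.0, ct_control=28.0))
```

In practice such values are often also normalized to input chromatin; that step is omitted here because the source does not describe it.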
Construction of the ChIP-Seq library
b-Catenin precipitated and puriﬁed ChIP DNA ( 350 ng ) was processed using the ChIP-Seq DNA Sample Preparation Kit ( Illumina , 1003473 ) according to instructions provided by the manufacturer .
Prior to sequencing , DNAs were re-quantiﬁed using a NanoDrop 1000 Spectrophotometer and the quality of DNA was assessed using a Bioanalyzer DNA 1000 ( Agilent ) .
Samples were diluted to 10 nM and 54-nt reads were obtained from one lane of sequencing on an Illumina GA II sequencer .
The High Throughput Sequencing Facility at the University of Oregon ( http://htseq.uoregon.edu ) sequenced the library .
Raw sequence data was submitted to the sequence read archive ( SRA ) under accession number SRA012054 .
Realignment of ChIP-Seq reads
Reads ( 9 322 654 ) were sequenced and of these , 8 456 287 passed the quality ﬁlter as assessed using ELAND software ( Illumina ) .
We then used ELAND to align the 8 456 287 reads to the repeat masked NCBI 36/hg18 build of the human genome .
We assigned unique positions for 6 576 033 reads allowing up to two mismatches in the ﬁrst 32 bases of the read sequence .
This set of reads was retained for downstream computational analysis .
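As a quick arithmetic check on the read counts reported above, the following sketch computes the fraction of reads surviving each step. The variable and function names are illustrative only.

```python
total_reads = 9_322_654      # reads sequenced
passed_filter = 8_456_287    # reads that passed the quality filter
unique_aligned = 6_576_033   # reads assigned a unique genomic position


def percent(part: int, whole: int) -> float:
    """Percentage of `whole` represented by `part`, to one decimal place."""
    return round(100.0 * part / whole, 1)


pct_passed = percent(passed_filter, total_reads)
pct_unique_of_passed = percent(unique_aligned, passed_filter)
pct_unique_of_total = percent(unique_aligned, total_reads)

print(pct_passed, pct_unique_of_passed, pct_unique_of_total)
```

Note that the uniquely aligned fraction depends on the denominator: it is larger relative to the filtered reads than relative to all sequenced reads.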
Identiﬁcation of b-catenin enriched regions
We utilized three peak calling programs to deﬁne a set of putative binding regions : CisGenome ( 24 ) , SISSRs ( 25 ) and WTD ( 26 ) .
Each method implements variations on a sliding window approach to identify regions of higher read depth , referred to as peaks , relative to a background distribution .
Based on the particular algorithm , background distributions are derived using reads from a negative control experiment , through Monte Carlo procedures , or through statistical models ( 23 ) .
CisGenome and SISSRs rely on statistical model ﬁtting while the WTD method uses a randomization approach .
SISSRs was run using a window size of 20 bases with the FDR set at 0.01 .
WTD was run with the window size estimated from the binding characteristics .
Any local tag anomalies were removed and an FDR cutoff of 0.01 was assessed using 10 randomization procedures .
CisGenome was run using a window size of 100 and a read cutoff of 7 reads .
We then identiﬁed the midpoints of each of the regions and extended 299 bp upstream and 300 bp downstream so that there was a total of 600 bp identifying each putative binding region .
We chose 600 bases because it was twice what we considered to be the largest size of DNA fragments submitted for sequencing ( Figure 1B ) .
Using these criteria , CisGenome , SISSRs and WTD called 100 372 , 80 733 and 2940 peak regions , respectively .
Peaks called in common by the three algorithms yielded 2168 putative b-catenin binding regions and this set was used for further computational analysis ( See Supplementary Table S1 for a summary of peak overlaps ) .
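The window construction and the concordance step described above can be sketched as follows. This is a minimal illustration, not the pipeline used in the study: the text does not specify how "in common" was defined, so the sketch assumes that any overlap between 600 bp windows counts, and all names and coordinates are hypothetical.

```python
from typing import List, NamedTuple


class Region(NamedTuple):
    chrom: str
    start: int  # inclusive
    end: int    # inclusive


def window_around_midpoint(peak: Region, upstream: int = 299,
                           downstream: int = 300) -> Region:
    """Replace a called peak by a fixed window centred on its midpoint.

    299 bp upstream + the midpoint base + 300 bp downstream = 600 bp.
    """
    mid = (peak.start + peak.end) // 2
    return Region(peak.chrom, mid - upstream, mid + downstream)


def overlaps(a: Region, b: Region) -> bool:
    """True if two regions on the same chromosome share at least one base."""
    return a.chrom == b.chrom and a.start <= b.end and b.start <= a.end


def concordant_regions(reference: List[Region],
                       *others: List[Region]) -> List[Region]:
    """Keep reference windows that overlap a window from every other caller.

    Assumption: 'called in common' means any overlap. The scan is quadratic,
    which is fine for a sketch but not for hundreds of thousands of peaks.
    """
    return [r for r in reference
            if all(any(overlaps(r, o) for o in caller) for caller in others)]


# Hypothetical peaks from three callers.
caller_a = [Region("chr8", 1000, 1100), Region("chr8", 90000, 90100)]
caller_b = [Region("chr8", 1200, 1300)]
caller_c = [Region("chr8", 900, 950), Region("chr5", 1000, 1100)]

win_a, win_b, win_c = ([window_around_midpoint(p) for p in peaks]
                       for peaks in (caller_a, caller_b, caller_c))
common = concordant_regions(win_a, win_b, win_c)
print(common)
```

For real data, an interval-based tool or index would be a more practical choice than the pairwise scan shown here.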
De novo motif analysis
The genomic sequence ( 600 bp ) encompassing each enriched region was isolated and the regions were separated into two sets based upon whether they had at least one instance of a canonical ( T/A -- T/A -- C -- A -- A -- A -- G ) or evolutionarily conserved ( A -- C/G -- T/A -- T -- C -- A -- A -- A -- G ) TCF4 motif within the boundaries of the region ( 16 ) .
The reverse complements of these sequences were also included .
The sequence from each region was repeat masked and used as input into the Gibbs sampler motif ﬁnding program provided by CisGenome .
The motif ﬁnder was run searching for motifs of 7 , 11 and 15 bp using 5000 MCMC iterations and a score was produced for each motif .
Control sequences were picked based on the strategy of Ji et al. ( 27 ) .
Brieﬂy , sequences were chosen to match the underlying characteristics of the enriched regions .
Each control sequence was picked randomly such that it was of the same size and was in the same position relative to the nearest RefSeq transcript ( 28 ) as a given enriched region .
Five sets of control sequences were chosen in this manner for both sets of enriched regions .
Motifs found from the de novo search were mapped back to both the b-catenin enriched regions and the control sequences using the motif mapping tool from CisGenome .
The number of matches was based on a likelihood ratio cutoff of 500 and a background model consisting of a third-order Markov chain , in accordance with Ji et al. ( 27 ) .
The relative enrichment was computed as described ( 27 ) .
Motifs that had a relative enrichment score > 2 were determined to be over-represented in the b-catenin enriched regions .
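The initial partition of regions by TCF4 motif content, described at the start of this section, amounts to a degenerate pattern search on both strands. The sketch below expresses the two motifs quoted in the text as regular expressions; it illustrates only this pre-sorting step, not the Gibbs sampler or the enrichment measure of Ji et al. ( 27 ), and the function names and example sequences are hypothetical.

```python
import re

# Motifs as written in the text, translated to character classes.
CORE_TCF4 = "[AT][AT]CAAAG"           # T/A-T/A-C-A-A-A-G
CONSERVED_TCF4 = "A[CG][AT]TCAAAG"    # A-C/G-T/A-T-C-A-A-A-G

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an upper-case DNA string (N is left as N)."""
    return seq.translate(_COMPLEMENT)[::-1]


def has_motif(seq: str, pattern: str) -> bool:
    """True if the motif occurs on either strand of `seq`.

    Repeat-masked bases written as N never match, because the patterns
    only contain A, C, G and T.
    """
    seq = seq.upper()
    rx = re.compile(pattern)
    return bool(rx.search(seq) or rx.search(reverse_complement(seq)))


# Hypothetical sequences.
print(has_motif("GGGATCAAAGGG", CORE_TCF4))        # forward-strand match
print(has_motif("CCCTTTGATCCC", CORE_TCF4))        # reverse-strand match
print(has_motif("GGGACATCAAAGGG", CONSERVED_TCF4))
```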
Chromatin conformation capture
Chromatin conformation capture ( 3C ) assays were conducted as described ( 15 ) with minor modiﬁcations .
Formaldehyde cross linked chromatin was digested overnight with 40 µl ( 800 U ) of XbaI ( New England Biolabs ) .
XbaI was then heat-inactivated at 65 °C for 20 min prior to ligation reactions .
After proteinase K treatment , the samples were extracted in phenol/chloroform three times , followed by three back extractions with chloroform .
The chromatin loop at CXXC5 was detected by PCR using primers C51 , GTACGTAGTCGTTTTAGCC and C56 , GCACCCAGCCTCTCAAACCC and the conditions previously described ( 15 ) .
To control for loading , parallel samples were ampliﬁed by PCR with the tubulin speciﬁc primers GGGGCTGGGTAAATGGCAAA and TGGCACTGGCTCTGGGTTCG .
Products were analyzed on a 1 % agarose gel by electrophoresis , puriﬁed and sequenced .
Serum and LiCl stimulation of HCT116 cells
HCT116 cells were synchronized in the cell cycle as previously described ( 14 ) .
For ChIP experiments , G0/G1 cells grown in a 10-cm tissue culture dish were stimulated with medium containing 10 % fetal bovine serum for 1 or 2 h prior to formaldehyde ﬁxation .
For expression analysis , G0/G1 cells were grown in a 6-well plate prior to stimulation with medium containing serum with and without 10 mM LiCl for 1 or 4 h as indicated .
Reverse transcription/real time PCR
RNA was isolated using TRIzol reagent ( Invitrogen , 15596-018 ) according to the instructions .
cDNA was synthesized using 500 ng of total RNA and the iScript cDNA Synthesis Kit ( Bio-rad , 170-8890 ) according to the instructions .
cDNA was diluted to 1 : 150 before quantiﬁcation by real-time PCR .
Real-time PCR was conducted as outlined under the ChIP section , except 3 µl of diluted cDNA was used as the template .
Primers were designed using Primer3 software and their sequences are included in Supplementary Table S7 .
RESULTS
Construction of the b-catenin ChIP-Seq library
We were interested in using ChIP coupled with massively parallel sequencing ( ChIP-Seq ) to identify b-catenin binding regions in HCT116 human colon cancer cells .
Prior to constructing the library , we tested the efficacy of our ChIP protocol to identify bona ﬁde b-catenin targets in this cell line .
b-catenin strongly associated with a WRE located 1.4 kb downstream of the transcription stop site of the c-Myc gene ( MYC ) as we have reported previously ( Figure 1A ) ( 14,15 ) .
Furthermore , insigniﬁcant levels of b-catenin were detected at a control element located 5 kb upstream of the MYC transcription start site that did not associate with either b-catenin or TCF4 ( Figure 1A ) ( 14 ) .
Size-selected b-catenin ChIP DNA was then processed according to the Illumina sample preparation protocol and minimally ampliﬁed by PCR .
Most ampliﬁed fragments were in the range of 175 -- 225 bp and were produced in samples containing b-catenin ChIP DNA whereas these products were absent in the control sample ( Figure 1B ) .
A total of 9 322 654 reads were generated from one lane of sequencing using an Illumina GA II high throughput sequencer .
Of these , 90.7 % ( 8 456 287 ) passed the quality ﬁlter and 77.8 % ( 6 576 033 ) reads were assigned a unique position in the human genome .
The set of 6 576 033 reads was then subjected to additional computational analysis .
As outlined in the ` Materials and Methods ' section , we compared three computational algorithms and we considered the peaks called in common to demarcate putative b-catenin binding regions .
This approach yielded 2168 peaks that we termed b-catenin enriched regions .
The genomic boundaries of these regions are provided in Supplementary Table S2 .
To determine whether our approach identiﬁed bona ﬁde Wnt/b-catenin target genes , we searched for representation of the MYC gene .
b-catenin enriched regions coincided with the 5′ , 3′ and distal WREs previously shown to regulate MYC expression ( Figure 1C ) ( 15,22,29 -- 31 ) .
Thus , our approach identiﬁed b-catenin associated WREs in colon cancer cells .
Computational analysis of b-catenin enriched regions
We next localized the b-catenin enriched regions relative to transcripts deposited in the reference sequence database ( RefSeq ) ( 28 ) .
Of the 2168 b-catenin enriched regions , 1562 ( 72 % ) were within 50 kb of a RefSeq transcript .
Upon further analysis , we found that 1219 ( 56 % ) were within 10 kb and 1090 ( 50 % ) were within 2.5 kb ( Figure 2A ) .
With respect to protein-coding genes and in agreement with our previous ﬁndings ( 12 ) , we found that b-catenin preferentially localized to internal positions or those positions that are downstream from the transcription start site and upstream from the transcription stop site ( Figure 2B ) .
There was a tendency for b-catenin enriched regions within 2.5 kb of the 5′ gene boundary to cluster around transcriptional start sites ( Figure 2C ) .
The genes containing b-catenin enriched regions near ( < 2.5 kb ) or within gene boundaries are listed in Supplementary Table S3 .
Overall , the 1090 b-catenin enriched regions are near or within 988 genes .
It was recently shown that a distal WRE interacted with MYC through a large chromatin loop ( 30,31 ) .
This was the ﬁrst demonstration indicating that a WRE positioned hundreds of kilobases away from its target gene functioned as a transcriptional enhancer .
To further explore the relationship of b-catenin enriched regions and annotated protein-coding transcripts , we determined the empirical cumulative distribution function ( CDF ) of the distance from each b-catenin enriched region to the nearest transcript .
This analysis found that 80 % of enriched regions were within 100 kb of an annotated transcript and that 95 % were within 450 kb ( Figure 2D ) .
Together these ﬁndings indicate that while most b-catenin regions are near or within protein-coding genes , 28 % localized at a distance of > 50 kb away .
Furthermore , localization of b-catenin enriched regions with gene boundaries was statistically signiﬁcant when compared to localization of a control set of regions with gene boundaries ( Supplementary Figure S1 ) .
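The empirical CDF used above is simply the fraction of regions whose distance to the nearest transcript is at or below a given value. A minimal sketch with hypothetical distances (in kb) follows; the function name is illustrative and the numbers are not data from the study.

```python
from bisect import bisect_right
from typing import Callable, Iterable


def empirical_cdf(distances: Iterable[float]) -> Callable[[float], float]:
    """Return F(x) = fraction of observations that are <= x."""
    ordered = sorted(distances)
    n = len(ordered)
    if n == 0:
        raise ValueError("need at least one distance")

    def cdf(x: float) -> float:
        # bisect_right counts how many sorted values are <= x.
        return bisect_right(ordered, x) / n

    return cdf


# Hypothetical distances (kb) from five regions to their nearest transcript.
cdf = empirical_cdf([1, 5, 20, 80, 400])
print(cdf(50), cdf(100), cdf(450))
```

Statements such as "80 % of enriched regions were within 100 kb" correspond to reading F(100) off this curve.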
b-Catenin occupancy of 5′ and 3′ regions at CXXC5 identiﬁed a chromatin loop
Recently we described a b-catenin and TCF4-coordinated chromatin loop at MYC that integrated 5′ and 3′ proximal WREs ( 15 ) .
To identify targets that may be likewise regulated , we searched for genes that contained both 5′ and 3′ b-catenin enriched regions .
In addition to MYC , we found two genes , CXXC5 and FXR2 , that contained b-catenin enriched regions within 2.5 kb of both transcript boundaries ( Supplementary Table S4 and Figure 3A ) .
If the range was expanded to include regions 10 kb from the 5′ and 3′ ends , 11 loci were identiﬁed .
This number increased to 111 loci if the range was further expanded to 50 kb .
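The search for genes flanked by enriched regions at both transcript boundaries can be sketched as below. This is an illustration under stated assumptions: each region is reduced to its midpoint, distance is measured to the transcript start and end coordinates, and two distinct regions are required. The function name and coordinates are hypothetical.

```python
from typing import Sequence


def has_regions_at_both_ends(tx_start: int, tx_end: int,
                             region_midpoints: Sequence[int],
                             max_dist: int = 2500) -> bool:
    """True if distinct enriched regions lie near each transcript boundary.

    The test is symmetric, so strand (which end is 5' and which is 3')
    does not affect the result.
    """
    near_start = {i for i, m in enumerate(region_midpoints)
                  if abs(m - tx_start) <= max_dist}
    near_end = {i for i, m in enumerate(region_midpoints)
                if abs(m - tx_end) <= max_dist}
    # Require two different regions so a single region near a very short
    # transcript is not counted for both boundaries.
    return any(i != j for i in near_start for j in near_end)


# Hypothetical transcript spanning 10,000-50,000 with two nearby regions.
print(has_regions_at_both_ends(10_000, 50_000, [9_000, 51_000]))
# Widening the window mirrors the 10 kb and 50 kb ranges in the text.
print(has_regions_at_both_ends(10_000, 50_000, [2_000, 58_000], max_dist=10_000))
```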
We ﬁrst used ChIP and real-time PCR to determine whether b-catenin occupied the identiﬁed regions relative to CXXC5 .
b-catenin precipitated higher levels of the 5′ and 3′ CXXC5 enriched regions relative to control ( Figure 3B ) .
We then used chromatin conformation capture ( 3C ) to determine whether a chromatin loop containing the 5′ and 3′ b-catenin associated regions formed at CXXC5 ( 32 ) .
Figure 3A depicts the positions of the XbaI restriction endonuclease sites and PCR primer locations used to interrogate CXXC5 in 3C assays .
A PCR product of the correct size was generated with forward primer C51 and reverse primer C56 , and its production was dependent upon the addition of XbaI and DNA ligase to the reaction ( Figure 3C ) .
This 341 bp fragment was sequenced and conﬁrmed to be the correct CXXC5 product ( Figure 3D ) .
This analysis indicated that a chromatin loop containing b-catenin bound 5′ and 3′ WREs is present at CXXC5 in human colon cancer cells .
Figure 3 . A chromatin loop containing 5′ and 3′ b-catenin enriched regions is detected at the CXXC5 gene .
( A ) Schematic of the CXXC5 locus with untranslated regions as thin rectangles , introns as thin lines , exons as thick rectangles and an arrow demarcating the transcription start site .
The peak density plots below the gene represent b-catenin enriched regions identiﬁed in the ChIP-Seq library .
The triangles and stunted arrows identify the XbaI sites and PCR primers , respectively , used in the chromatin conformation capture ( 3C ) assay depicted in ( C ) .
( B ) Real time PCR analysis of b-catenin ChIP assays performed in HCT116 cells .
Speciﬁc oligonucleotides were used to detect b-catenin binding to enriched regions depicted by gray rectangles in ( A ) .
5′ is the upstream site and 3′ is the downstream site .
A distal upstream region of the MYC gene was used as a negative control ( Ctrl ) .
Error bars are SEM .
( C ) Agarose gel of PCR products generated from a 3C analysis of CXXC5 in HCT116 cells .
Generation of the 3C product ( CXXC5 5′ 3′ ) with primers C51 and C56 required the addition of XbaI and ligase to the reactions .
LC is a loading control and S is a DNA standard .
( D ) DNA sequence of the 3C product .
Arrows denote primer sequences C51 and C56 and the XbaI site is boxed .
Sequence shown in ( D ) , with the XbaI site in brackets : GTACGTAGTCGTTTTAGCCCCGGGACTCAAGAG TTGAGGCTGATGCCTGCCTGAGAGATAAAATATCCTTTCTCGGAT CAGTTTCCTCACCTGAGAAATGGGAACGGGAATCTCCGCCCCTT TTCTCCCGGGGCCCTAGTGCCCACTGAATCCATTAAGGAGCTCT TGGAAGGGTGGGGTCTTGGAACACGCGTCTACCTCCCAGGACC CTCGACTAGGAATCTCTGGCCCGCCGCGCACCTGAGCTGGGGG GCGCGGCCAAATTCTCCCTCCCGGTCCTCGGAGCTTCTGGCCC CGC [ TCTAGA ] CACAGAACGGTGGGGGTTTGAGAGGCTGGGTGC
Motif analysis of b-catenin enriched regions
Genome-wide binding analysis has indicated that most b-catenin recruitment to chromatin in colon cancer cells likely occurred through interactions with TCF4 ( 12 ) .
Therefore , we ﬁrst determined whether the b-catenin enriched regions contained a canonical TCF4 motif ( T/A -- T/A -- C -- A -- A -- A -- G ) or the evolutionarily conserved TCF4 motif ( A -- C/G -- T/A -- T -- C -- A -- A -- A -- G ) ( 16 ) .
Of the 2168 enriched regions , 1026 ( 47 % ) contained at least one TCF4 motif .
A fraction of these , 192 ( 9 % ) , resembled the longer and evolutionarily conserved variant .
We then performed de novo motif analysis on these populations using a Gibbs sampler algorithm ( 24,27 ) .
Over-representation of motifs was determined by computing the relative enrichment measure as described in the ` Materials and Methods ' section ( 27 ) .
Using the 1026 enriched-regions that contained TCF4 consensus sequences , we successfully identiﬁed an over-representation of both the core and evolutionarily conserved TCF4 motifs .
This indicated that our de novo search approach was valid .
Upon further analysis , we found a striking co-enrichment of AP-1 motifs with the consensus TCF4 motifs .
Examples of all three motifs , along with their scores and enrichment values relative to control sequences , are shown in Figure 4A .
Overall , 417 b-catenin enriched regions contained a TCF4 and an AP-1 motif ( Figure 4B and Supplementary Table S5 ) .
The coupling of AP-1 and TCF4 motifs in 417 ( 19 % ) b-catenin enriched regions suggested that AP-1 , TCF4 and b-catenin may co-regulate target gene expression .
To address this hypothesis , we used the ChIP assay to determine whether these factors bound regions containing AP-1 and TCF4 motifs .
We ﬁrst tested b-catenin binding to a selected subset of regions associated with 23 protein-coding genes .
For this set of genes , associated regions were those that localized within 2.5 kb from gene boundaries .
b-catenin occupied 19 sites in asynchronously growing HCT116 cells ( Figure 5A ) .
We then assayed the same regions for TCF4 binding using TCF4 speciﬁc antibodies in the ChIP assay .
TCF4 bound the same 19 targets as b-catenin ( Figure 5B ) .
We concluded from this analysis that b-catenin and TCF4 co-occupied target genes containing TCF4 and AP-1 consensus motifs .
Next , we determined whether AP-1 bound to these selected regions .
AP-1 is a heterodimeric complex comprised of Fos and Jun transcription factors ( 33,34 ) .
AP-1 regulates key cellular processes such as proliferation , differentiation and apoptosis ( 35 ) .
Several groups have shown that c-Jun associates with AP-1 consensus motifs in colon cancer cells ( 14,36 -- 39 ) .
We therefore tested whether c-Jun occupied b-catenin enriched regions containing AP-1 and TCF4 motifs .
Using c-Jun antibodies in ChIP assays conducted in asynchronously growing HCT116 cells , we found that c-Jun associated with 14 of the 19 regions that bound b-catenin and TCF4 ( Figure 5C ) .
Serum mitogens elicit signal transduction pathways that stimulate c-Jun binding to chromatin .
We have previously shown that c-Jun occupancy of the MYC 3′ enhancer increased as quiescent cells re-entered the cell cycle in response to serum ( 14 ) .
We therefore determined whether treatment of quiescent cells with serum would stimulate c-Jun association with the ﬁve targets that lacked binding in asynchronous cells .
HCT116 cells were grown to conﬂuency in serum-depleted medium for two days , which caused these cells to enter the G0/G1 stage of the cell cycle ( 14,39 ) .
Cells were then treated with medium containing serum for 1 or 2 h and c-Jun ChIP assays were conducted .
In line with previous ﬁndings , higher levels of c-Jun were found at the MYC 3′ enhancer when synchronized cells were exposed to serum for 1 h as compared to levels detected in quiescent cells .
We then added medium containing serum with or without 10 mM LiCl for 1 or 4 h .
LiCl is a well-established agonist of the Wnt/b-catenin pathway as it inhibits GSK3b and stimulates nuclear b-catenin accumulation ( 13,40,41 ) .
We and others have shown that LiCl increased b-catenin levels in HCT116 cells ( 13,42,43 ) .
Therefore , we predicted that if mitogen and Wnt/b-catenin signaling pathways converged to regulate gene expression , treatment with serum and LiCl would result in increased transcript levels when compared to treatment with serum alone .
LiCl increased mitogen-induced expression of MYC , PDE4B , DDR2 , CTBP2 , EGFR , DNAJB1 , WISP1 and PINX1 ( Figure 6B ) .
HDAC4 , PCDH7 and HABP4 were activated by serum alone , and MMP20 , CYP39A1 and YAP1 genes were not induced above levels seen in serum-deprived cells ( Figure 6B ) .
Together this analysis indicated that Wnt/b-catenin and mitogen-signaling pathways directly activate a subset of b-catenin target genes in colon cancer cells .
DISCUSSION
Gene expression is rarely controlled by the association of a single transcription factor with an enhancer element embedded in the proximal promoter .
Rather , association of multiple transcription factors within an enhancer allows for precise and speciﬁc regulation of gene expression in response to environmental stimuli .
Moreover , enhancers can occupy regions over 100 kb from their target gene ( 44,45 ) .
Genome-wide proﬁling of transcription factor binding sites has emerged as one method to localize composite enhancer elements that integrate upstream signal transduction pathways ( 17,45 ) .
In this report , we used ChIP-Seq to identify b-catenin enriched regions in human colon cancer cells .
Through an integrated approach involving bioinformatics , ChIP and expression analyses , we provide evidence that a population of b-catenin target genes is directly regulated by b-catenin , TCF4 and AP-1 transcription factors .
The nature of ChIP-Seq data provides many challenges for analysis ( 18 ) .
Algorithms have been designed to assign a presence or absence prediction for occupancy at any non-repetitive region of the genome .
A common approach for many algorithms is to use sliding window methods that identify regions of high read depth ( relative to a background distribution ) by traversing the genome in windows of a predetermined size .
CisGenome , SISSRs and WTD algorithms exemplify this approach ( 24 -- 26 ) .
It follows that although each algorithm ﬁnds disparate numbers of enriched regions , increased conﬁdence can be assigned to regions that have been found by all three .
This approach resulted in 2168 enriched b-catenin binding regions identiﬁed in our b-catenin ChIP-Seq library .
In addition to the three WREs that control MYC expression ( 14,22,29 ) , 30 Wnt/b-catenin target genes listed on the Wnt homepage ( http://www.stanford.edu/~rnusse/pathways/targets.html ) were in proximity to b-catenin enriched regions ( Supplementary Table S5 ) .
Furthermore , when considering all genomic regions identified and assayed for b-catenin and TCF4 binding in this report , we found that 28 of 32 ( 87.5 % ) bound both factors .
Together , these ﬁndings indicate that the b-catenin ChIP-Seq library identiﬁed bona ﬁde and direct Wnt/b-catenin target genes .
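The concordance idea described above can be sketched as an interval intersection that keeps only the genomic segments reported by every peak caller. This is a minimal illustration and not the pipeline used in this study. It assumes that each algorithm's output has been reduced to (chromosome, start, end) tuples with half-open coordinates, and the function names are hypothetical.

```python
def intersect_two(a, b):
    """Return the segments shared by two lists of (chrom, start, end) regions.

    Coordinates are treated as half-open intervals [start, end).
    """
    by_chrom = {}
    for chrom, start, end in b:
        by_chrom.setdefault(chrom, []).append((start, end))
    shared = set()
    for chrom, start, end in a:
        for b_start, b_end in by_chrom.get(chrom, ()):
            lo, hi = max(start, b_start), min(end, b_end)
            if lo < hi:  # the two regions share at least one base
                shared.add((chrom, lo, hi))
    return sorted(shared)


def concordant_regions(*call_sets):
    """Keep only the segments present in every caller's output."""
    if not call_sets:
        return []
    result = sorted(set(call_sets[0]))
    for calls in call_sets[1:]:
        result = intersect_two(result, calls)
    return result
```

For example, calls of chr8:100-700, chr8:400-900 and chr8:300-650 from three callers reduce to the single concordant segment chr8:400-650. The pairwise comparison is quadratic per chromosome, which is adequate for a sketch but would be replaced by a sorted sweep for genome-scale call sets.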
We previously localized b-catenin binding sites in colon cancer cells using an unbiased and genome-wide approach termed SACO ( 14 ) .
In that study , we found that 84 % of high conﬁdence b-catenin binding regions contained at least one TCF4 consensus core motif ( T/A -- T/A -- C -- A -- A -- A -- G ) .
In this report , we performed de novo motif analysis on the 2168 b-catenin enriched regions and found that 47 % contained a core TCF4 consensus motif .
This discrepancy is likely attributable to methodological and computational differences in the generation and analysis of each data set .
For the SACO study , a 5 kb interval surrounding the mean position of the enriched region was used in the analysis .
For the ChIP-Seq analysis , we examined a much smaller interval enveloping each b-catenin enriched region ( 600 bp ) .
Therefore , based on our ﬁndings using ChIP-Seq and due to the resolution of this technique , the 47 % association rate of TCF4 consensus motifs found in b-catenin enriched regions likely reﬂects the landscape in vivo .
Results gleaned from our current analysis are consistent with TCF4 being a predominant factor that directly recruits b-catenin to enhancers in colon cancer cells , but also suggest that the portion of targets that rely on other factors to recruit b-catenin is substantial .
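The influence of interval size on motif recovery can be illustrated by scanning both strands for the core TCF4 motif and estimating how often a match would arise by chance. The sketch below is not the motif discovery procedure used in this study. It assumes independent bases with uniform composition, which genomic sequence does not follow, and the function names are hypothetical.

```python
import re

# Core TCF4 motif T/A-T/A-C-A-A-A-G and its reverse complement.
# Lookaheads allow overlapping matches to be reported.
CORE_FWD = re.compile(r"(?=[AT][AT]CAAAG)")
CORE_REV = re.compile(r"(?=CTTTG[AT][AT])")
MOTIF_LEN = 7


def core_motif_positions(seq):
    """Return sorted 0-based start positions of the core motif on either strand."""
    seq = seq.upper()
    hits = [m.start() for m in CORE_FWD.finditer(seq)]
    hits += [m.start() for m in CORE_REV.finditer(seq)]
    return sorted(hits)


def chance_of_motif(window_bp):
    """Approximate probability of at least one core motif in random sequence.

    Assumes each base occurs independently with probability 1/4 and treats
    start positions as independent, so the value is a rough guide only.
    """
    positions = window_bp - MOTIF_LEN + 1
    if positions <= 0:
        return 0.0
    # 2 * 2 matching 7-mers per strand out of 4**7, counted on two strands.
    p_site = 2 * (2 * 2) / 4 ** MOTIF_LEN
    return 1.0 - (1.0 - p_site) ** positions
```

Under these assumptions, a 5 kb interval has roughly a 0.9 probability of containing a chance match, compared with roughly 0.25 for a 600 bp interval. This is consistent with the view that interval size contributes to the difference between the 84 % and 47 % association rates.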
Two recent studies localized TCF4 binding regions in human colon cancer cells using ChIP-Seq ( 21,22 ) .
We therefore searched for representation of our b-catenin enriched regions in the reported TCF4 libraries .
Of the 10 conserved TCF4 binding peak regions identiﬁed by Tuupanen et al. ( 22 ) , three were identiﬁed in our b-catenin ChIP-Seq library .
This included the peak that identiﬁed the WRE located 335 kb upstream from the MYC transcription start site .
Upon analysis of the TCF4 ChIP-Seq data sets reported by Blahnik et al. ( 21 ) and the ENCODE Project Consortium , we found that 786 ( 36.3 % ) of our b-catenin peak regions overlapped a TCF4 peak region .
There are several possibilities for why a greater percentage of our b-catenin peak regions are not represented in the aforementioned ChIP-Seq libraries .
Methodological variations , algorithms chosen to assign peak regions , and cell-type differences aside , TCF4 is bound to both transcriptionally active and repressed genes .
As b-catenin is thought to primarily associate with transcribed genes , a partial overlap of the b-catenin enriched regions identiﬁed in our study with the TCF4 enriched regions is expected .
Furthermore , as mentioned above , our analysis suggests that many genes likely recruit b-catenin independently of TCF4 .
Most of these targets are not represented in the TCF4 ChIP-Seq libraries .
Finally , the amount of sequencing required to identify all of the binding sites represented in a ChIP-Seq library is debatable ( 18 ) .
Therefore , we anticipate that additional sequencing of our library would identify more of the reported TCF4 peak regions .
Overall , however , the concordance of TCF4 and b-catenin peaks identiﬁed by three separate groups independently validates ChIP-Seq as a methodology to identify direct Wnt/b-catenin target genes .
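The overlap statistic reported above ( 786 of 2168 regions , 36.3 % ) is the fraction of one set of regions that intersects at least one region in another set. A minimal sketch of that calculation follows. It assumes a criterion of at least one shared base and half-open (chromosome, start, end) coordinates. The overlap criterion actually applied is not stated in this section, and the function names are hypothetical.

```python
def overlaps_any(region, others):
    """True if region shares at least one base with any region in others."""
    chrom, start, end = region
    return any(
        chrom == o_chrom and start < o_end and o_start < end
        for o_chrom, o_start, o_end in others
    )


def overlap_fraction(query, reference):
    """Return (count, fraction) of query regions overlapping the reference set."""
    if not query:
        return 0, 0.0
    n = sum(overlaps_any(region, reference) for region in query)
    return n, n / len(query)
```

Applied with the b-catenin regions as the query and the pooled TCF4 peaks as the reference, a count of 786 from 2168 query regions corresponds to the 36.3 % quoted above.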
Upon mapping the b-catenin enriched regions relative to RefSeq gene boundaries , we found that binding sites are dispersed through the 5' , intragenic , and 3' ends of gene boundaries .
This ﬁnding is in line with previous genome-wide localization studies of b-catenin and TCF4 binding sites ( 12,16 ) .
We were intrigued by the observation that several targets contained b-catenin enriched regions that localized to both 5' and 3' gene boundaries .
Based on our previous work with MYC ( 15 ) , we tested whether a chromatin loop was present at two targets that contained 5' and 3' b-catenin enriched regions in the library , CXXC5 and GRHL3 .
CXXC5 is a zinc finger containing protein that inhibits canonical Wnt signaling in response to bone morphogenetic protein signaling in neural stem cells ( 46 ) .
GRHL3 encodes a Grainyhead factor that plays a role in epidermal barrier formation in the bladder ( 47 ) .
Substantial levels of b-catenin binding to the 5' and 3' regions of CXXC5 and GRHL3 were detected by ChIP analysis ( Figure 3B and Supplementary Figure S2 ) .
Using the 3C technique , we found that a chromatin loop accompanied b-catenin binding to 5' and 3' regions of CXXC5 .
However , we were unable to detect a chromatin loop at GRHL3 .
This finding suggests that while looping between separated enhancers may be a prevalent mechanism to coordinate Wnt/b-catenin gene expression , b-catenin binding to 5' and 3' sites alone is not sufficient for this interaction .
We anticipate that 3C coupled with high throughput sequencing techniques will facilitate identiﬁcation of target genes that are regulated by distal WREs via chromatin loops ( 48,49 ) .
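The mapping of enriched regions relative to gene boundaries described above can be sketched as a strand-aware classification. This is an illustration under stated assumptions rather than the rule used in this study. The region midpoint decides the class, coordinates are given on the forward strand, and the function name is hypothetical.

```python
def classify_region(region_start, region_end, gene_start, gene_end, strand):
    """Place a region relative to a gene as "5'", "intragenic" or "3'".

    Coordinates are on the forward strand with gene_start < gene_end;
    strand is "+" or "-". The region midpoint decides the class.
    """
    if strand not in ("+", "-"):
        raise ValueError("strand must be '+' or '-'")
    midpoint = (region_start + region_end) / 2
    if gene_start <= midpoint <= gene_end:
        return "intragenic"
    left_of_gene = midpoint < gene_start
    # On the minus strand the gene reads right to left, so the sides swap.
    upstream = left_of_gene if strand == "+" else not left_of_gene
    return "5'" if upstream else "3'"
```

A gene with regions in both the 5' and 3' classes would then be a candidate for testing by 3C, as was done here for CXXC5 and GRHL3.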
Through de novo motif analysis , we found that nearly half of the b-catenin enriched regions that contained a TCF4 consensus motif also contained an AP-1 motif .
It is noted here that AP-1 motifs were also over-represented in TCF4 bound regions identiﬁed by recent ChIP-Seq and ChIP-chip analysis ( 16,21 ) .
However , our current study is the ﬁrst to report over-representation of AP-1 motifs in b-catenin bound regions .
While b-catenin/TCF4 and AP-1 have been shown by others and our group to regulate target gene expression ( 36,38,39,50 ) , our ﬁndings here suggest that target genes regulated by 417 b-catenin enriched regions may be likewise regulated .
Our ChIP analysis indicated that nearly every region assayed ( 95 % ) containing a TCF4 and AP-1 motif bound c-Jun , a component of the AP-1 complex .
The majority of these loci showed an additive increase in expression upon the addition of both LiCl and serum .
This analysis suggests that mitogen and Wnt signaling pathways likely converge through AP-1 and b-catenin/TCF4 to co-regulate target gene expression .
However , LiCl treatment failed to enhance mitogen-activated gene expression for several targets .
It is possible that pre-treatment of quiescent cells with LiCl prior to serum stimulation would sensitize the system to facilitate detection of pathway cooperation .
Alternatively , AP-1 binding may function to regulate gene expression in response to a different stimulus such as cytokine signaling or the apoptotic stress response ( 35 ) .
The application of sequence-based methods to identify transcription factor binding sites genome-wide is likely to persist as the methodology of choice .
Next generation sequencing technologies , such as those using the Illumina platform , allow increased resolution and increased output .
These attributes have facilitated the replacement of SACO with massively parallel sequencing approaches to map transcription factor binding regions isolated by ChIP .
Through our ChIP-Seq screen of b-catenin binding regions in asynchronous HCT116 cells , we uncovered evidence for a functional interplay between b-catenin/TCF4 and AP-1 .
Because cells that initiate colon carcinogenesis contain pathogenic levels of b-catenin in the nucleus and are bathed in serum mitogens , our findings here suggest that misexpression of target genes containing AP-1 and TCF4 motifs might represent the pathogenically relevant set .
Supplementary Data are available at NAR Online.
ACKNOWLEDGEMENTS
We would like to thank Dr Laura Carrel and Dr Faoud Ishmael ( Penn State University College of Medicine ) for critically reading this manuscript and providing helpful comments .
We would like to thank Doug Turnbull and the High Throughput Sequencing Facility in the Molecular Biology Institute at the University of Oregon for sequencing the library .
We would like to thank Dr Richard Goodman and Dr Gail Mandel ( Oregon Health and Science University ) for support during the initiation of this project .
FUNDING
National Institutes of Health ( grant number R01DK080805 to G.S.Y. ) ; start-up research funds from the Pennsylvania State University College of Medicine ( to G.S.Y. ) ; National Institutes of Health , National Center for Research Resources ( 5UL1RR024140 to S.K.M. ) ; National Institutes of Health , National Cancer Institute ( 5 P30 CA069533-13 to S.K.M. ) .
Funding for open access charge : National Institutes of Health ( grant number R01DK080805 to G.S.Y. ) .