Identification of β-catenin binding regions in colon
Nucleic Acids Research, 2010, Vol. 38, No. 17, 5735-5745, doi:10.1093/nar/gkq363
ABSTRACT
Deregulation of the Wnt/β-catenin signaling pathway is a hallmark of colon cancer.
Mutations in the adenomatous polyposis coli (APC) gene occur in the vast majority of colorectal cancers and are an initiating event in cellular transformation.
Cells harboring mutant APC contain elevated levels of the β-catenin transcriptional coactivator in the nucleus, which leads to abnormal expression of genes controlled by β-catenin/T-cell factor 4 (TCF4) complexes.
Here, we use chromatin immunoprecipitation coupled with massively parallel sequencing (ChIP-Seq) to identify β-catenin binding regions in HCT116 human colon cancer cells.
We localized 2168 β-catenin enriched regions using a concordance approach that integrates the output from multiple peak-calling algorithms.
Motif discovery algorithms found a core TCF4 motif (T/A-T/A-C-A-A-A-G), an extended TCF4 motif (A/T/G-C/G-T/A-T/A-C-A-A-A-G) and an AP-1 motif (T-G-A-C/T-T-C-A) to be significantly over-represented in β-catenin enriched regions.
Furthermore, 417 regions contained both TCF4 and AP-1 motifs.
Genes associated with TCF4 and AP-1 motifs bound β-catenin, TCF4 and c-Jun in vivo and were activated by Wnt signaling and serum growth factors.
Our work provides evidence that Wnt/β-catenin and mitogen signaling pathways intersect directly to regulate a defined set of target genes.
Oregon Clinical and Translational Research Institute, Oregon Health and Science University, Portland, OR; The Department of Biochemistry and Molecular Biology, The Pennsylvania State University College of Medicine, Hershey, PA 17033; Department of Medical Informatics and Clinical Epidemiology, Oregon Health and Science University, Portland, OR; Knight Cancer Institute, Division of Biostatistics in the Department of Public Health and Preventative Medicine, Oregon Health and Science University, Portland, OR; Program in Cellular and Molecular Biology and The Pennsylvania State Hershey Cancer Institute, The Pennsylvania State University College of Medicine, Hershey, PA 17033, USA
INTRODUCTION
The Wnt/β-catenin signaling pathway is required for homeostasis in the gastrointestinal (GI) tract (1).
The GI tract is lined with small invaginations, or crypts, which are composed of discrete zones of proliferating and differentiated cells (1).
Wnt signaling maintains the proliferative compartment of the crypt.
The β-catenin transcriptional coactivator controls downstream genetic programs elicited by Wnt signaling, and its cellular levels are tightly regulated (2,3).
When cells are not exposed to Wnt, cytosolic β-catenin associates with a multi-protein complex that contains the adenomatous polyposis coli (APC) protein.
APC functions as a scaffold to coordinate β-catenin phosphorylation and degradation by the proteasome.
Under these conditions, Wnt/β-catenin target genes are silenced by corepressor complexes that are tethered to Wnt-responsive DNA enhancers (WREs) through interactions with the T-cell factor (TCF) family of transcription factors (4).
TCF4 is a predominant TCF family member in colon cancer cells (5).
When cells are exposed to Wnt, β-catenin escapes proteasomal degradation and is chaperoned to the nucleus by APC.
There, it occupies TCF4-bound WREs and displaces the corepressors.
β-Catenin then recruits chromatin-modifying complexes, and Wnt/β-catenin target genes are expressed (4).
Deregulation of the Wnt/β-catenin pathway is associated with colon carcinogenesis (2,3).
In virtually all cases of colon cancer, mutations target components of the Wnt signaling pathway (6).
The most common lesions localize to APC and lead to production of a truncated APC protein that can no longer effectively coordinate β-catenin degradation.
This mutation occurs at the earliest stages of carcinogenesis, when normal colonic epithelial cells are transformed into aberrant crypt foci (6).
Inherited APC mutations give rise to familial adenomatous polyposis, a disease in which afflicted individuals develop thousands of intestinal polyps early in adulthood (7,8).
In the rare cancer cases where APC is wild-type, mutations are instead found in CTNNB1 (3,9,10).
CTNNB1 is the gene that encodes β-catenin, and the cancer-causing lesions map near the 5′ portion of the gene.
These mutations give rise to a β-catenin pool that is resistant to proteasomal degradation (11).
In each instance, mutations that target APC or CTNNB1 lead to high levels of β-catenin in the nucleus and abnormal expression of genes regulated by β-catenin/TCF4 complexes (5,9).
Therefore, identifying the target genes directly controlled by β-catenin/TCF4 is required to understand the pathogenesis of this disease.
To identify direct β-catenin/TCF4 target genes, it is first necessary to map binding sites for these factors across the genome.
Previously, we used an unbiased, genome-wide screen termed serial analysis of chromatin occupancy (SACO) to localize 412 high-confidence β-catenin binding sites in the human colorectal cancer cell line HCT116 (12).
Approximately half of the binding sites were near (<2.5 kb) or within protein-coding gene boundaries.
These β-catenin binding sites were located in 5′ promoter regions, intragenic regions and 3′ untranslated regions.
β-Catenin binding to positions 3′ of the E2F4 and MYC genes identified functional WREs.
For E2F4, the downstream enhancer drove expression of an antisense, non-coding transcript that decreased E2F4 protein levels (13).
For MYC, β-catenin occupancy of the downstream enhancer initiated a chromatin loop that integrated the 5′ WRE to coordinate MYC expression in response to Wnt/β-catenin and mitogen signaling pathways (14,15).
Recently, Hatzis et al. (16) used chromatin immunoprecipitation (ChIP) coupled with microarrays (ChIP-chip) to localize 6868 TCF4 binding sites in LS174T colorectal cancer cells.
As was the case for β-catenin, TCF4 binding was also found throughout protein-coding gene boundaries.
Together, these studies indicate that Wnt activation of target gene expression is mechanistically more intricate than the simple model involving recruitment of β-catenin to TCF4-bound 5′ promoter regions.
While SACO was a pioneering technique for identifying transcription factor binding sites and a viable alternative to the ChIP-chip approach, it did, like most methodologies, suffer from some limitations.
First, SACO libraries were constructed in plasmid vectors and then sequenced using high-throughput and Sanger-based sequencing.
This was both laborious and costly.
In addition, large DNA fragments were included in construction of the earliest SACO libraries.
While most DNA was in the 500-700 bp range, fragments as large as 2.5 kb were included in the β-catenin SACO library.
This impinged upon the resolution of the technique and hindered de novo motif discovery within β-catenin-bound loci.
A successor to SACO is the recently described ChIP coupled with massively parallel sequencing (ChIP-Seq) approach (17-19).
In this technique, DNA purified from immunoprecipitated chromatin is size-selected and then sequenced on one of the next-generation sequencing platforms, such as the Illumina Genome Analyzer (18,19).
The robustness, cost, resolution and relative ease of library construction have made ChIP-Seq, rather than SACO, a current method of choice for genome-wide localization of transcription factor binding sites.
ChIP-Seq has been used to map numerous histone modifications (20) and binding sites for several transcription factors including, but not limited to, NRSF/REST, GATA1, SRF, E2F4, E2F6 and STAT1 (17).
In addition, ChIP-Seq has recently been used to identify TCF4-enriched binding regions in human colon cancer cell lines (21,22).
Tuupanen et al. (22) identified 10 TCF4-site-containing regions in LoVo cells using a combination of ChIP-Seq and enhancer element locator analysis.
Blahnik et al. (21) used ChIP-Seq to identify 21 102 TCF4 binding sites in HCT116 cells.
In this report, we used ChIP-Seq to identify β-catenin binding regions in HCT116 human colon cancer cells.
We chose this high-resolution, genome-wide approach because we were interested in using de novo motif analysis to identify transcription factors that putatively cooperate with β-catenin/TCF4.
Many algorithms exist to map the enriched genomic regions identified in a ChIP-Seq experiment (23).
Because each approach varies in computational strategy and can produce dramatically different numbers of enriched regions for a given false discovery rate (FDR) cutoff, there is some debate as to which algorithm is preferable (18).
In this report, we used CisGenome (24), SISSRs (25) and WTD (26) to initially identify enriched regions from the β-catenin ChIP-Seq library.
Based on the regions called in common by all three algorithms, we identified 2168 β-catenin enriched regions.
Consistent with our previous report (12), we found over-representation of the core and evolutionarily conserved TCF4 consensus motifs within enriched regions.
In addition, we found that consensus AP-1 motifs were also over-represented in a large subset of the enriched regions, with the majority of these motifs co-occurring with a TCF4 motif.
Finally, we show that serum mitogens and Wnt signaling agonists cooperatively activate expression of some target genes in proximity to β-catenin-bound loci that contain AP-1 and TCF4 motifs.
These findings indicate that a discrete subset of β-catenin target genes is activated by mitogen and Wnt signaling in colon cancer cells, and that this regulation likely occurs through consensus AP-1 and TCF4 sites, respectively.
MATERIALS AND METHODS
Cell culture
HCT116 human colorectal cancer cells (ATCC number CCL-247) were cultured as previously described (14).
ChIP
Antibodies used in ChIP assays included: 3 μg of anti-β-catenin (BD Transduction, 610154), 3 μg of anti-TCF4 (Millipore, 05-511), 2 μl of anti-c-Jun (Millipore, 06-225) and 6 μg of rabbit anti-mouse IgG (Jackson Immunoresearch, 315-005-003).
β-Catenin ChIP DNA for the ChIP-Seq library was prepared using the Chromatin Immunoprecipitation Assay Kit (Millipore, 17-295) according to the manufacturer's instructions.
To assess β-catenin, TCF4 and c-Jun binding to ChIP-Seq peak loci, ChIP assays contained 5-10 × 10⁶ cells and were conducted as previously reported (13).
Chromatin in formaldehyde-fixed cell lysates was sonicated to an average size of 500-700 bp using a Misonix Ultrasonic XL-2000 Liquid Processor (5 × 20 s pulses, output wattage 7, with 45 s rest intervals on ice between pulses).
Real-time PCR was used to detect isolated ChIP fragments; samples contained 10 μl of 2× iQ SYBR Green Supermix (Bio-Rad, 170-882), 0.25 μM of each primer and 3 μl of purified ChIP DNA.
Reactions were processed for one cycle at 94°C for 3 min, then 45 cycles at 94°C for 10 s and 68°C for 40 s, using a MyiQ Single Color Real-Time PCR machine (Bio-Rad).
Primers were designed with Primer3 software to amplify within a 600 bp DNA segment centered on the β-catenin ChIP-Seq coverage region.
Primers used in this study are listed in Supplementary Table S7.
Real-time PCR data are presented as fold enrichment over control.
The control is a distal region 5 kb upstream of the MYC transcription start site that does not bind significant levels of β-catenin, TCF4 or c-Jun (14).
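The text does not say how fold levels over control were calculated. A minimal sketch, assuming the comparative Ct approach with roughly 100% amplification efficiency, and assuming the target and the MYC -5 kb control amplicons are measured in the same ChIP DNA; `fold_over_control` and the Ct values are hypothetical:

```python
from statistics import mean


def fold_over_control(target_cts, control_cts, efficiency=2.0):
    """Fold enrichment of a target amplicon relative to a control amplicon.

    Assumes both amplicons were measured in the same ChIP DNA sample and that
    product increases by `efficiency` per cycle (2.0 = 100% efficiency), so a
    one-cycle difference in Ct corresponds to a two-fold difference in template.
    """
    delta_ct = mean(control_cts) - mean(target_cts)  # lower Ct = more template
    return efficiency ** delta_ct


# Hypothetical triplicate Ct values: a peak region versus the MYC -5 kb control.
print(round(fold_over_control([24.1, 24.0, 24.2], [28.1, 28.0, 28.2]), 2))  # 16.0
```

A target that amplifies four cycles earlier than the control therefore reads as roughly 16-fold enriched; equal Ct values give a fold of 1.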
Construction of the ChIP-Seq library
β-Catenin-precipitated and purified ChIP DNA (350 ng) was processed using the ChIP-Seq DNA Sample Preparation Kit (Illumina, 1003473) according to the manufacturer's instructions.
Prior to sequencing, DNA was re-quantified using a NanoDrop 1000 spectrophotometer, and its quality was assessed using a Bioanalyzer DNA 1000 (Agilent).
Samples were diluted to 10 nM, and 54-nt reads were obtained from one lane of sequencing on an Illumina GA II sequencer.
The High Throughput Sequencing Facility at the University of Oregon (http://htseq.uoregon.edu) sequenced the library.
Raw sequence data were submitted to the Sequence Read Archive (SRA) under accession number SRA012054.
Realignment of ChIP-Seq reads
Of the 9 322 654 reads sequenced, 8 456 287 passed the quality filter as assessed using ELAND software (Illumina).
We then used ELAND to align the 8 456 287 reads to the repeat-masked NCBI 36/hg18 build of the human genome.
We assigned unique positions to 6 576 033 reads, allowing up to two mismatches in the first 32 bases of the read sequence.
This set of reads was retained for downstream computational analysis.
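The retention rule above can be illustrated with a small, hypothetical filter. ELAND applies its own criteria internally; this sketch only assumes that each candidate alignment is given as the read plus the aligned reference bases, ignores strand and indels, and keeps a read when exactly one candidate has at most two mismatches in its first 32 bases:

```python
def seed_mismatches(read: str, ref: str, seed: int = 32) -> int:
    """Mismatches between a read and its aligned reference within the first `seed` bases."""
    return sum(1 for r, g in zip(read[:seed], ref[:seed]) if r != g)


def keep_read(hits, seed: int = 32, max_mm: int = 2) -> bool:
    """Decide whether a read is retained.

    `hits` is a list of (read_sequence, aligned_reference_sequence) pairs, one per
    candidate genomic position. The read is kept only if exactly one candidate
    has <= max_mm mismatches in the seed, i.e. it maps to a unique position.
    """
    passing = [h for h in hits if seed_mismatches(h[0], h[1], seed) <= max_mm]
    return len(passing) == 1


read = "ACGT" * 13 + "AC"              # hypothetical 54-nt read
exact = read                           # perfect match
late_mm = read[:40] + "T" * 14         # mismatches only after base 32
seed_mm = "TTT" + read[3:]             # 3 mismatches inside the seed
print(keep_read([(read, exact)]))                   # True: one passing hit
print(keep_read([(read, exact), (read, late_mm)]))  # False: two passing hits
```

Mismatches beyond base 32 are ignored, so a read can still be discarded as non-unique if a second locus matches its seed well enough.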
Identification of β-catenin enriched regions
We used three peak-calling programs to define a set of putative binding regions: CisGenome (24), SISSRs (25) and WTD (26).
Each method implements a variation on a sliding-window approach to identify regions of higher read depth, referred to as peaks, relative to a background distribution.
Depending on the algorithm, background distributions are derived from reads in a negative control experiment, through Monte Carlo procedures or through statistical models (23).
CisGenome and SISSRs rely on statistical model fitting, while the WTD method uses a randomization approach.
SISSRs was run using a window size of 20 bases with the FDR set at 0.01.
WTD was run with the window size estimated from the binding characteristics.
Local tag anomalies were removed, and an FDR cutoff of 0.01 was assessed using 10 randomization procedures.
CisGenome was run using a window size of 100 bases and a cutoff of 7 reads.
We then identified the midpoint of each region and extended it 299 bp upstream and 300 bp downstream, so that each putative binding region spanned a total of 600 bp.
We chose 600 bases because it was twice what we considered to be the largest size of DNA fragments submitted for sequencing (Figure 1B).
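A minimal sketch of the re-centering step, assuming 1-based inclusive coordinates and an integer midpoint (the rounding convention is not stated in the text); `standardize` is a hypothetical helper:

```python
def standardize(start: int, end: int, up: int = 299, down: int = 300):
    """Re-center a called peak as a fixed-width window around its midpoint.

    Coordinates are treated as 1-based and inclusive, so the window
    (mid - 299) .. (mid + 300) covers 600 positions including the midpoint.
    """
    mid = (start + end) // 2  # integer midpoint of the caller's peak
    return mid - up, mid + down


s, e = standardize(1000, 1100)
print(s, e, e - s + 1)  # 751 1350 600
```

Because every caller's output is reduced to windows of identical width, peaks from the three programs become directly comparable regardless of how each one reports peak boundaries. Windows near a chromosome start would need clipping, which this sketch omits.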
Using these criteria, CisGenome, SISSRs and WTD called 100 372, 80 733 and 2940 peak regions, respectively.
Peaks called in common by all three algorithms yielded 2168 putative β-catenin binding regions, and this set was used for further computational analysis (see Supplementary Table S1 for a summary of peak overlaps).
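The text does not specify how overlapping calls were reconciled. One plausible reading is sketched below: a region from a reference caller (hypothetically WTD, the smallest set) is kept if it overlaps at least one region from each of the other two callers. Because all windows share the same 600 bp width, overlap reduces to a range query on sorted start positions; all names and coordinates are hypothetical:

```python
from bisect import bisect_left, bisect_right
from collections import defaultdict

WIDTH = 600  # all regions are fixed 600 bp windows after re-centering


def index_by_chrom(regions):
    """Map chromosome -> sorted list of window start positions."""
    index = defaultdict(list)
    for chrom, start, _end in regions:
        index[chrom].append(start)
    for starts in index.values():
        starts.sort()
    return index


def overlaps_any(chrom, start, end, index, width=WIDTH):
    """True if any indexed fixed-width window overlaps [start, end] (inclusive)."""
    starts = index.get(chrom, [])
    # A window [s, s + width - 1] overlaps [start, end] iff start - width + 1 <= s <= end.
    lo = bisect_left(starts, start - width + 1)
    hi = bisect_right(starts, end)
    return hi > lo


def common_regions(reference, other1, other2):
    """Regions from `reference` that overlap at least one region from each other caller."""
    idx1, idx2 = index_by_chrom(other1), index_by_chrom(other2)
    return [r for r in reference if overlaps_any(*r, idx1) and overlaps_any(*r, idx2)]
```

The final count depends on which caller serves as the reference, since one reference region may overlap several regions from another caller.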
De novo motif analysis
The genomic sequence (600 bp) encompassing each enriched region was isolated, and the regions were separated into two sets based on whether they contained at least one instance of a canonical (T/A-T/A-C-A-A-A-G) or evolutionarily conserved (A-C/G-T/A-T-C-A-A-A-G) TCF4 motif within the boundaries of the region (16).
The reverse complements of these sequences were also included.
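The partition into motif-containing and motif-lacking regions can be sketched with regular expressions that encode the two consensus sequences, scanning both the forward sequence and its reverse complement. This is an illustration rather than CisGenome code; the helper names are hypothetical, and soft-masked (lower-case) bases are simply upper-cased here:

```python
import re

# Consensus sequences from the text, written as regular expressions.
TCF4_CORE = re.compile(r"[AT][AT]CAAAG")         # T/A-T/A-C-A-A-A-G
TCF4_CONSERVED = re.compile(r"A[CG][AT]TCAAAG")  # A-C/G-T/A-T-C-A-A-A-G
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    return seq.upper().translate(_COMPLEMENT)[::-1]


def has_motif(seq: str, motif: re.Pattern) -> bool:
    """True if the motif occurs on either strand of seq."""
    seq = seq.upper()
    return bool(motif.search(seq) or motif.search(reverse_complement(seq)))


def split_by_tcf4(regions: dict):
    """Partition region IDs by presence of a core or conserved TCF4 motif."""
    with_motif, without = [], []
    for rid, seq in regions.items():
        hit = has_motif(seq, TCF4_CORE) or has_motif(seq, TCF4_CONSERVED)
        (with_motif if hit else without).append(rid)
    return with_motif, without
```

Because the conserved motif ends in [AT]TCAAAG, every conserved match also satisfies the core pattern, so the second test only matters if the core pattern is tightened.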
The sequence from each region was repeat-masked and used as input to the Gibbs sampler motif-finding program provided by CisGenome.
The motif finder was run searching for motifs of 7, 11 and 15 bp using 5000 MCMC iterations, and a score was produced for each motif.
Control sequences were picked based on the strategy of Ji et al. (27).
Briefly, sequences were chosen to match the underlying characteristics of the enriched regions.
Each control sequence was picked randomly such that it was of the same size and in the same position relative to the nearest RefSeq transcript (28) as a given enriched region.
Five sets of control sequences were chosen in this manner for both sets of enriched regions.
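One way to implement position-matched controls is sketched below, assuming transcripts are represented by their TSS and strand, and that "same position" means the same strand-aware offset of the region's leading edge from the TSS of a randomly chosen transcript. This is an illustrative reading of the strategy in (27), not its implementation; all names are hypothetical:

```python
import random


def nearest_tss(chrom, pos, transcripts):
    """Transcript (chrom, tss, strand) on `chrom` whose TSS is closest to `pos`."""
    return min((t for t in transcripts if t[0] == chrom), key=lambda t: abs(t[1] - pos))


def matched_controls(regions, transcripts, n_sets=5, seed=0):
    """Draw `n_sets` control sets, one control per region, each the same length
    and at the same strand-aware offset from a randomly chosen transcript's TSS
    as the region is from its own nearest TSS."""
    rng = random.Random(seed)
    control_sets = []
    for _ in range(n_sets):
        controls = []
        for chrom, start, end in regions:
            length = end - start + 1
            _, tss, strand = nearest_tss(chrom, (start + end) // 2, transcripts)
            # Offset of the region's leading edge (in transcript orientation) from the TSS.
            offset = start - tss if strand == "+" else tss - end
            t_chrom, t_tss, t_strand = rng.choice(transcripts)
            if t_strand == "+":
                c_start = t_tss + offset
            else:
                c_start = t_tss - offset - length + 1
            controls.append((t_chrom, c_start, c_start + length - 1))
        control_sets.append(controls)
    return control_sets
```

The sketch does not check chromosome ends or overlap with enriched regions, and `nearest_tss` raises `ValueError` if a chromosome has no transcripts.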
Motifs found in the de novo search were mapped back to both the β-catenin enriched regions and the control sequences using the motif mapping tool from CisGenome.
The number of matches was based on a likelihood ratio cutoff of 500 and a background model consisting of a third-order Markov chain, in accordance with Ji et al. (27).
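CisGenome performs this mapping internally. The sketch below only illustrates what a likelihood ratio against a third-order Markov background means, assuming a position weight matrix given as per-column base probabilities; the function names and the toy PWM are hypothetical, and CisGenome's exact scoring details may differ:

```python
from collections import defaultdict
from itertools import product

BASES = "ACGT"


def fit_markov3(background: str, pseudo: float = 1.0):
    """Estimate P(base | preceding 3 bases) from background sequence with pseudocounts."""
    counts = defaultdict(lambda: defaultdict(float))
    bg = background.upper()
    for i in range(3, len(bg)):
        ctx, base = bg[i - 3:i], bg[i]
        if all(ch in BASES for ch in ctx + base):  # skip N / masked positions
            counts[ctx][base] += 1
    model = {}
    for ctx in ("".join(p) for p in product(BASES, repeat=3)):
        total = sum(counts[ctx].values()) + 4 * pseudo
        model[ctx] = {b: (counts[ctx][b] + pseudo) / total for b in BASES}
    return model


def likelihood_ratio(seq: str, pos: int, pwm, model) -> float:
    """P(site | PWM) / P(site | Markov background) for seq[pos:pos + len(pwm)].

    Requires pos >= 3 so every site base has a 3-base context.
    """
    seq = seq.upper()
    ratio = 1.0
    for j, column in enumerate(pwm):
        i = pos + j
        base, ctx = seq[i], seq[i - 3:i]
        ratio *= column[base] / model[ctx][base]
    return ratio


def consensus_pwm(consensus: str, p: float = 0.97):
    """Toy PWM that puts probability p on each consensus base."""
    other = (1 - p) / 3
    return [{b: (p if b == c else other) for b in BASES} for c in consensus]
```

With an uninformative background (every conditional probability 0.25), a perfect 7-bp match to this toy PWM scores (0.97/0.25)^7, well above a cutoff of 500; a background rich in the motif's bases would lower the ratio.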
The relative enrichment was computed as described (27).
Motifs with a relative enrichment score >2 were considered over-represented in the β-catenin enriched regions.
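Reference (27) defines the relative enrichment. Assuming it is the ratio of motif-match density (matches per kb) in the enriched regions to that in the control sequences, and treating the averaging across the five control sets as a further assumption, the >2 rule can be sketched as:

```python
def site_rate(n_matches: int, total_bp: int) -> float:
    """Motif matches per kilobase of sequence."""
    return n_matches / (total_bp / 1000)


def relative_enrichment(target, controls) -> float:
    """Ratio of the motif-match rate in enriched regions to the mean rate
    across control sets. `target` is (matches, bp); `controls` is a list of
    (matches, bp) tuples, one per control set."""
    control_rate = sum(site_rate(m, bp) for m, bp in controls) / len(controls)
    return site_rate(*target) / control_rate


def over_represented(target, controls, threshold: float = 2.0) -> bool:
    return relative_enrichment(target, controls) > threshold
```

For example, 50 matches in 100 kb of enriched sequence against 20 matches per 100 kb in each control set gives a relative enrichment of 2.5. A control set with no matches would make the ratio undefined, which a real implementation would need to handle.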
Chromatin conformation capture
Chromatin conformation capture (3C) assays were conducted as described (15), with minor modifications.
Formaldehyde-cross-linked chromatin was digested overnight with 40 μl (800 U) of XbaI (New England Biolabs).
XbaI was then heat-inactivated at 65°C for 20 min prior to ligation reactions.
After proteinase K treatment, the samples were extracted three times with phenol/chloroform, followed by three back-extractions with chloroform.
The chromatin loop at CXXC5 was detected by PCR using primers C51 (GTACGTAGTCGTTTTAGCC) and C56 (GCACCCAGCCTCTCAAACCC) under the conditions previously described (15).
To control for loading, parallel samples were amplified by PCR with the tubulin-specific primers GGGGCTGGGTAAATGGCAAA and TGGCACTGGCTCTGGGTTCG.
Products were analyzed by electrophoresis on a 1% agarose gel, purified and sequenced.
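The sequence check on the 3C product can be expressed as a small, hypothetical helper. A correct product should begin with primer C51, end with the reverse complement of primer C56 (the reverse primer anneals to the opposite strand), and contain the XbaI recognition site TCTAGA at the ligation junction in between:

```python
C51 = "GTACGTAGTCGTTTTAGCC"
C56 = "GCACCCAGCCTCTCAAACCC"
XBAI = "TCTAGA"  # XbaI recognition site
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


def is_expected_3c_product(seq: str) -> bool:
    """Check that a sequenced product reads C51 ... XbaI site ... revcomp(C56)."""
    seq = seq.upper()
    tail = reverse_complement(C56)
    if not (seq.startswith(C51) and seq.endswith(tail)):
        return False
    # The ligation junction should contain a reconstituted XbaI site between the primers.
    return XBAI in seq[len(C51):len(seq) - len(tail)]
```

A length check against the expected 341 bp product could be added in the same way.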
Serum and LiCl stimulation of HCT116 cells
HCT116 cells were synchronized in the cell cycle as previously described (14).
For ChIP experiments, G0/G1 cells grown in a 10-cm tissue culture dish were stimulated with medium containing 10% fetal bovine serum for 1 or 2 h prior to formaldehyde fixation.
For expression analysis, G0/G1 cells were grown in 6-well plates prior to stimulation with serum-containing medium with or without 10 mM LiCl for 1 or 4 h, as indicated.
Reverse transcription/real time PCR
RNA was isolated using TRIzol reagent (Invitrogen, 15596-018) according to the manufacturer's instructions.
cDNA was synthesized from 500 ng of total RNA using the iScript cDNA Synthesis Kit (Bio-Rad, 170-8890) according to the manufacturer's instructions.
cDNA was diluted 1:150 before quantification by real-time PCR.
Real-time PCR was conducted as outlined in the ChIP section, except that 3 μl of diluted cDNA was used as the template.
Primers were designed using Primer3 software, and their sequences are included in Supplementary Table S7.
RESULTS
Construction of the β-catenin ChIP-Seq library
We were interested in using ChIP coupled with massively parallel sequencing (ChIP-Seq) to identify β-catenin binding regions in HCT116 human colon cancer cells.
Prior to constructing the library, we tested the efficacy of our ChIP protocol in identifying bona fide β-catenin targets in this cell line.
β-Catenin strongly associated with a WRE located 1.4 kb downstream of the transcription stop site of the c-Myc gene (MYC), as we have reported previously (Figure 1A) (14,15).
Furthermore, negligible levels of β-catenin were detected at a control element located 5 kb upstream of the MYC transcription start site that did not associate with either β-catenin or TCF4 (Figure 1A) (14).
Size-selected β-catenin ChIP DNA was then processed according to the Illumina sample preparation protocol and minimally amplified by PCR.
Most amplified fragments were in the range of 175-225 bp; these products were produced in samples containing β-catenin ChIP DNA but were absent from the control sample (Figure 1B).
A total of 9 322 654 reads were generated from one lane of sequencing on an Illumina GA II high-throughput sequencer.
Of these, 90.7% (8 456 287) passed the quality filter, and 77.8% of the filtered reads (6 576 033) were assigned a unique position in the human genome.
The set of 6 576 033 reads was then subjected to additional computational analysis.
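The read-count percentages reported here use different denominators; the arithmetic below makes this explicit using only the counts given in the text:

```python
total_reads = 9_322_654
passed_filter = 8_456_287
uniquely_placed = 6_576_033

pct_passed = 100 * passed_filter / total_reads                # of all reads
pct_unique_of_passed = 100 * uniquely_placed / passed_filter  # of filtered reads
pct_unique_of_total = 100 * uniquely_placed / total_reads     # of all reads

print(f"{pct_passed:.1f}% passed; {pct_unique_of_passed:.1f}% of those placed uniquely "
      f"({pct_unique_of_total:.1f}% of all reads)")
# 90.7% passed; 77.8% of those placed uniquely (70.5% of all reads)
```

The 77.8% figure is therefore relative to the reads that passed the quality filter; relative to all sequenced reads, about 70.5% were uniquely placed.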
As outlined in the 'Materials and Methods' section, we compared three computational algorithms and considered the peaks called in common to demarcate putative β-catenin binding regions.
This approach yielded 2168 peaks that we termed β-catenin enriched regions.
The genomic boundaries of these regions are provided in Supplementary Table S2.
To determine whether our approach identified bona fide Wnt/β-catenin target genes, we searched for representation of the MYC gene.
β-Catenin enriched regions coincided with the 5′, 3′ and distal WREs previously shown to regulate MYC expression (Figure 1C) (15,22,29-31).
Thus, our approach identified β-catenin-associated WREs in colon cancer cells.
Computational analysis of β-catenin enriched regions
We next localized the β-catenin enriched regions relative to transcripts deposited in the Reference Sequence database (RefSeq) (28).
Of the 2168 β-catenin enriched regions, 1562 (72%) were within 50 kb of a RefSeq transcript.
Upon further analysis, we found that 1219 (56%) were within 10 kb and 1090 (50%) were within 2.5 kb (Figure 2A).
With respect to protein-coding genes, and in agreement with our previous findings (12), we found that β-catenin preferentially localized to internal positions, that is, positions downstream of the transcription start site and upstream of the transcription stop site (Figure 2B).
β-Catenin enriched regions within 2.5 kb of the 5′ gene boundary tended to cluster around transcription start sites (Figure 2C).
The genes containing β-catenin enriched regions near (<2.5 kb) or within gene boundaries are listed in Supplementary Table S3.
Overall, the 1090 β-catenin enriched regions are near or within 988 genes.
It was recently shown that a distal WRE interacts with MYC through a large chromatin loop (30,31).
This was the first demonstration that a WRE positioned hundreds of kilobases away from its target gene can function as a transcriptional enhancer.
To further explore the relationship between β-catenin enriched regions and annotated protein-coding transcripts, we determined the empirical cumulative distribution function (CDF) of the distance from each β-catenin enriched region to the nearest transcript.
This analysis found that 80% of enriched regions were within 100 kb of an annotated transcript and that 95% were within 450 kb (Figure 2D).
Together, these findings indicate that while most β-catenin enriched regions are near or within protein-coding genes, 28% localize more than 50 kb away.
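The distance and CDF calculations can be sketched as follows, assuming inclusive genomic intervals and taking the distance as the gap between a region and a transcript (0 when they overlap). The text does not specify whether distances were measured from region edges or midpoints, so this convention is an assumption; the names and coordinates are hypothetical:

```python
from bisect import bisect_right


def interval_distance(a_start, a_end, b_start, b_end):
    """Gap in bp between two inclusive intervals; 0 if they overlap."""
    return max(0, b_start - a_end, a_start - b_end)


def nearest_transcript_distance(region, transcripts):
    """Distance from a region to the closest transcript on the same chromosome."""
    chrom, start, end = region
    return min(interval_distance(start, end, s, e)
               for c, s, e in transcripts if c == chrom)


def ecdf(values):
    """Return F(x), the fraction of values <= x."""
    ordered = sorted(values)
    n = len(ordered)
    return lambda x: bisect_right(ordered, x) / n


tx = [("chr1", 1000, 5000), ("chr1", 100000, 120000)]
regions = [("chr1", 4000, 4599), ("chr1", 6000, 6599),
           ("chr1", 60000, 60599), ("chr1", 300000, 300599)]
F = ecdf([nearest_transcript_distance(r, tx) for r in regions])
print(F(2500), F(50000), F(450000))  # 0.5 0.75 1.0
```

Evaluating F at 2.5, 10 and 50 kb gives the kind of cumulative fractions reported above, and 1 - F(50 kb) is the fraction lying more than 50 kb from any transcript.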
Furthermore, localization of β-catenin enriched regions within gene boundaries was statistically significant when compared with localization of a control set of regions (Supplementary Figure S1).
β-Catenin occupancy of 5′ and 3′ regions at CXXC5 identified a chromatin loop
Recently, we described a β-catenin- and TCF4-coordinated chromatin loop at MYC that integrated 5′ and 3′ proximal WREs (15).
To identify targets that may be similarly regulated, we searched for genes that contained both 5′ and 3′ β-catenin enriched regions.
In addition to MYC, we found two genes, CXXC5 and FXR2, that contained β-catenin enriched regions within 2.5 kb of both transcript boundaries (Supplementary Table S4 and Figure 3A).
When the range was expanded to include regions within 10 kb of the 5′ and 3′ ends, 11 loci were identified.
This number increased to 111 loci when the range was further expanded to 50 kb.
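The search for genes flanked by enriched regions can be sketched as below, with `window` set to 2.5, 10 or 50 kb. The 5′ and 3′ ends are taken from the strand, and requiring two distinct regions (so that a single region near both ends of a short gene does not count) is an assumption not stated in the text; the function and toy coordinates are hypothetical:

```python
def flanked_genes(genes, regions, window):
    """Names of genes with enriched regions within `window` bp of both ends.

    genes: (name, chrom, start, end, strand); regions: (chrom, start, end).
    Requires two distinct regions, one near each end.
    """
    def near(chrom, pos):
        # Indices of regions whose distance to `pos` is at most `window` bp.
        return {i for i, (c, s, e) in enumerate(regions)
                if c == chrom and s - window <= pos <= e + window}

    hits = []
    for name, chrom, start, end, strand in genes:
        five, three = (start, end) if strand == "+" else (end, start)
        near5, near3 = near(chrom, five), near(chrom, three)
        # Both ends covered, and not by the same single region.
        if near5 and near3 and len(near5 | near3) >= 2:
            hits.append(name)
    return hits


genes = [("GENE1", "chr1", 10000, 20000, "+"), ("GENE2", "chr1", 50000, 52000, "-")]
regions = [("chr1", 8000, 8599), ("chr1", 21000, 21599), ("chr1", 51000, 51599)]
print(flanked_genes(genes, regions, 2500))    # ['GENE1']
print(flanked_genes(genes, regions, 50_000))  # ['GENE1', 'GENE2']
```

As in the text, widening the window admits more loci, because regions farther from the gene ends begin to qualify.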
We first used ChIP and real-time PCR to determine whether β-catenin occupied the identified regions at CXXC5.
β-Catenin precipitated higher levels of the 5′ and 3′ CXXC5 enriched regions relative to control (Figure 3B).
We then used chromatin conformation capture (3C) to determine whether a chromatin loop containing the 5′ and 3′ β-catenin-associated regions formed at CXXC5 (32).
Figure 3A depicts the positions of the XbaI restriction endonuclease sites and PCR primer locations used to interrogate CXXC5 in 3C assays.
A PCR product of the correct size was generated with forward primer C51 and reverse primer C56, and its production was dependent upon the addition of XbaI and DNA ligase to the reaction (Figure 3C).
This 341 bp fragment was sequenced and confirmed to be the correct CXXC5 product (Figure 3D).
This analysis indicated that a chromatin loop containing β-catenin-bound 5′ and 3′ WREs is present at CXXC5 in human colon cancer cells.
Figure 3. A chromatin loop containing 5′ and 3′ β-catenin enriched regions is detected at the CXXC5 gene.
(A) Schematic of the CXXC5 locus, with untranslated regions as thin rectangles, introns as thin lines, exons as thick rectangles and an arrow demarcating the transcription start site.
The peak density plots below the gene represent β-catenin enriched regions identified in the ChIP-Seq library.
The triangles and stunted arrows identify the XbaI sites and PCR primers, respectively, used in the chromatin conformation capture (3C) assay depicted in (C).
(B) Real-time PCR analysis of β-catenin ChIP assays performed in HCT116 cells.
Specific oligonucleotides were used to detect β-catenin binding to enriched regions depicted by gray rectangles in (A).
5′ is the upstream site and 3′ is the downstream site.
A distal upstream region of the MYC gene was used as a negative control (Ctrl).
Error bars are SEM.
(C) Agarose gel of PCR products generated from a 3C analysis of CXXC5 in HCT116 cells.
Generation of the 3C product (CXXC5 5′-3′) with primers C51 and C56 required the addition of XbaI and ligase to the reactions.
LC is a loading control and S is a DNA standard.
(D) DNA sequence of the 3C product.
Arrows denote primer sequences C51 and C56; the XbaI site is boxed.
Motif analysis of β-catenin enriched regions
Genome-wide binding analysis has indicated that most β-catenin recruitment to chromatin in colon cancer cells likely occurs through interactions with TCF4 (12).
Therefore, we first determined whether the β-catenin enriched regions contained a canonical TCF4 motif (T/A-T/A-C-A-A-A-G) or the evolutionarily conserved TCF4 motif (A-C/G-T/A-T-C-A-A-A-G) (16).
Of the 2168 enriched regions, 1026 (47%) contained at least one TCF4 motif.
A fraction of these, 192 (9% of all enriched regions), resembled the longer, evolutionarily conserved variant.
We then performed de novo motif analysis on these populations using a Gibbs sampler algorithm (24,27).
Over-representation of motifs was determined by computing the relative enrichment measure as described in the 'Materials and Methods' section (27).
Using the 1026 enriched regions that contained TCF4 consensus sequences, we identified an over-representation of both the core and evolutionarily conserved TCF4 motifs.
This indicated that our de novo search approach could recover known motifs.
Upon further analysis, we found a striking co-enrichment of AP-1 motifs with the consensus TCF4 motifs.
Examples of all three motifs, along with their scores and enrichment values relative to control sequences, are shown in Figure 4A.
Overall, 417 β-catenin enriched regions contained both a TCF4 and an AP-1 motif (Figure 4B and Supplementary Table S5).
The coupling of AP-1 and TCF4 motifs in 417 (19%) β-catenin enriched regions suggested that AP-1, TCF4 and β-catenin may co-regulate target gene expression.
To address this hypothesis, we used ChIP assays to determine whether these factors bound regions containing AP-1 and TCF4 motifs.
We first tested β-catenin binding to a selected subset of regions associated with 23 protein-coding genes.
For this set of genes, associated regions were those located within 2.5 kb of gene boundaries.
β-Catenin occupied 19 sites in asynchronously growing HCT116 cells (Figure 5A).
We then assayed the same regions for TCF4 binding using TCF4-specific antibodies in the ChIP assay.
TCF4 bound the same 19 targets as β-catenin (Figure 5B).
We concluded from this analysis that β-catenin and TCF4 co-occupy target genes containing TCF4 and AP-1 consensus motifs.
Next, we determined whether AP-1 bound these selected regions.
AP-1 is a heterodimeric complex composed of Fos and Jun transcription factors (33,34).
AP-1 regulates key cellular processes such as proliferation, differentiation and apoptosis (35).
Several groups have shown that c-Jun associates with AP-1 consensus motifs in colon cancer cells (14,36-39).
We therefore tested whether c-Jun occupied β-catenin enriched regions containing AP-1 and TCF4 motifs.
Using c-Jun antibodies in ChIP assays conducted in asynchronously growing HCT116 cells, we found that c-Jun associated with 14 of the 19 regions that bound β-catenin and TCF4 (Figure 5C).
Serum mitogens elicit signal transduction pathways that stimulate c-Jun binding to chromatin.
We have previously shown that c-Jun occupancy of the MYC 3′ enhancer increased as quiescent cells re-entered the cell cycle in response to serum (14).
We therefore determined whether treatment of quiescent cells with serum would stimulate c-Jun association with the five targets that lacked binding in asynchronous cells.
HCT116 cells were grown to confluence in serum-depleted medium for two days, which caused them to enter the G0/G1 stage of the cell cycle (14,39).
Cells were then treated with medium containing serum for 1 or 2 h, and c-Jun ChIP assays were conducted.
In line with previous findings, higher levels of c-Jun were found at the MYC 3′ enhancer when synchronized cells were exposed to serum for 1 h than in quiescent cells.
We then added medium containing serum with or without 10 mM LiCl for 1 or 4 h.
LiCl is a well-established agonist of the Wnt/β-catenin pathway, as it inhibits GSK3β and stimulates nuclear β-catenin accumulation (13,40,41).
We and others have shown that LiCl increases β-catenin levels in HCT116 cells (13,42,43).
Therefore, we predicted that if mitogen and Wnt/β-catenin signaling pathways converged to regulate gene expression, treatment with serum and LiCl would result in higher transcript levels than treatment with serum alone.
LiCl increased mitogen-induced expression of MYC, PDE4B, DDR2, CTBP2, EGFR, DNAJB1, WISP1 and PINX1 (Figure 6B).
HDAC4, PCDH7 and HABP4 were activated by serum alone, whereas MMP20, CYP39A1 and YAP1 were not induced above levels seen in serum-deprived cells (Figure 6B).
Together, this analysis indicates that Wnt/β-catenin and mitogen signaling pathways directly activate a subset of β-catenin target genes in colon cancer cells.
DISCUSSION
Gene expression is rarely controlled by the association of a single transcription factor with an enhancer element embedded in the proximal promoter.
Rather, association of multiple transcription factors within an enhancer allows precise and specific regulation of gene expression in response to environmental stimuli.
Moreover, enhancers can lie more than 100 kb from their target genes (44,45).
Genome-wide profiling of transcription factor binding sites has emerged as one method to localize composite enhancer elements that integrate upstream signal transduction pathways (17,45).
In this report, we used ChIP-Seq to identify β-catenin enriched regions in human colon cancer cells.
Through an integrated approach involving bioinformatics, ChIP and expression analyses, we provide evidence that a population of β-catenin target genes is directly regulated by β-catenin, TCF4 and AP-1 transcription factors.
The nature of ChIP-Seq data provides many challenges for analysis ( 18 ) . 
Algorithms have been designed to assign a presence or absence prediction for occupancy at any non-repetitive region of the genome . 
A common approach for many algorithms is to use sliding window methods that identify regions of high read depth ( relative to a background distribution ) by traversing the genome in windows of a predetermined size . 
CisGenome , SISSRs and WTD algorithms exemplify this approach ( 24 -- 26 ) . 
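The sliding-window idea can be illustrated with a minimal sketch. It is a generic illustration only, not the CisGenome , SISSRs or WTD implementation. The window size , step , uniform Poisson background and p-value cutoff are all illustrative assumptions. The sketch counts reads in fixed windows along one chromosome , flags windows whose counts are improbable under the background , and merges overlapping flagged windows into enriched regions .

```python
import math
from typing import List, Tuple


def poisson_sf(k: int, lam: float) -> float:
    """Return P(X >= k) for X ~ Poisson(lam)."""
    if k <= 0:
        return 1.0
    term = math.exp(-lam)  # P(X = 0)
    cdf = term
    for i in range(1, k):
        term *= lam / i  # P(X = i) derived from P(X = i - 1)
        cdf += term
    return max(0.0, 1.0 - cdf)


def sliding_window_peaks(read_starts: List[int], chrom_len: int, window: int = 200,
                         step: int = 50, p_cutoff: float = 1e-5) -> List[Tuple[int, int]]:
    """Scan one chromosome; return merged [start, end) windows enriched over a uniform background."""
    starts = sorted(read_starts)
    n = len(starts)
    lam = n * window / chrom_len  # expected reads per window if reads were spread uniformly
    significant = []
    lo = hi = 0
    for w_start in range(0, chrom_len - window + 1, step):
        w_end = w_start + window
        while lo < n and starts[lo] < w_start:  # drop reads left of the window
            lo += 1
        while hi < n and starts[hi] < w_end:  # admit reads inside the window
            hi += 1
        if poisson_sf(hi - lo, lam) < p_cutoff:
            significant.append((w_start, w_end))
    merged: List[Tuple[int, int]] = []
    for s, e in significant:  # fuse overlapping or adjacent significant windows
        if merged and s <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], e))
        else:
            merged.append((s, e))
    return merged
```

Real peak callers also model the local background and control libraries , and they account for the strand-specific shift of reads around the binding site . These refinements are omitted here .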
Because each algorithm finds a different number of enriched regions , greater confidence can be assigned to regions identified by all three . 
This approach resulted in 2168 enriched b-catenin binding regions identified in our b-catenin ChIP-Seq library . 
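The concordance step can be sketched as an interval intersection across the three call sets . The exact matching rule used by the authors is not stated in this section , so this sketch assumes that a region counts as found by all three algorithms when a call from caller A overlaps at least one call from each of callers B and C by 1 bp or more . The names `calls_a` , `calls_b` and `calls_c` are hypothetical placeholders for the three algorithms' outputs .

```python
from typing import List, Tuple

Interval = Tuple[str, int, int]  # (chromosome, start, end), 0-based half-open


def overlaps(a: Interval, b: Interval) -> bool:
    """True when two half-open intervals on the same chromosome share at least one base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def concordant_regions(calls_a: List[Interval], calls_b: List[Interval],
                       calls_c: List[Interval]) -> List[Interval]:
    """Keep regions from caller A that are also supported by a call from B and a call from C."""
    kept = []
    for region in calls_a:
        in_b = any(overlaps(region, other) for other in calls_b)
        in_c = any(overlaps(region, other) for other in calls_c)
        if in_b and in_c:
            kept.append(region)
    return kept
```

The scan is quadratic , which is acceptable for a sketch . Anchoring on one caller also makes the output coordinates depend on which caller is chosen as A .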
In addition to the three WREs that control MYC expression ( 14,22,29 ) , 30 Wnt/b-catenin target genes listed on the Wnt homepage ( http://www.stanford.edu/~rnusse/pathways/targets.html ) were in proximity to b-catenin enriched regions ( Supplementary Table S5 ) . 
Furthermore , of all genomic regions identified and assayed for b-catenin and TCF4 binding in this report , 28 of 32 ( 87.5 % ) bound both factors . 
Together , these findings indicate that the b-catenin ChIP-Seq library identified bona fide and direct Wnt/b-catenin target genes . 
We previously localized b-catenin binding sites in colon cancer cells using an unbiased and genome-wide approach termed SACO ( 14 ) . 
In that study , we found that 84 % of high confidence b-catenin binding regions contained at least one TCF4 consensus core motif ( T/A -- T/A -- C -- A -- A -- A -- G ) . 
In this report , we performed de novo motif analysis on the 2168 b-catenin enriched regions and found that 47 % contained a core TCF4 consensus motif . 
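Whether a region "contains a core TCF4 consensus motif" can be checked with a simple exact-match scan . The sketch below is a stand-in and not the de novo motif discovery used in this study . It searches for the core motif given in the text ( T/A -- T/A -- C -- A -- A -- A -- G ) on the forward strand and for its reverse complement ( C -- T -- T -- T -- G -- A/T -- A/T ) , which is how a match on the opposite strand appears in forward-strand sequence . Position-weight-matrix scanning would be the more usual approach .

```python
import re
from typing import List, Tuple

# Core TCF4 motif from the text, T/A-T/A-C-A-A-A-G, and its reverse complement C-T-T-T-G-A/T-A/T.
CORE_TCF4_PLUS = "[AT][AT]CAAAG"
CORE_TCF4_MINUS = "CTTTG[AT][AT]"


def find_core_tcf4(seq: str) -> List[Tuple[int, str]]:
    """Return (0-based start, strand) for every exact core TCF4 match, overlapping hits included."""
    seq = seq.upper()
    hits = []
    for strand, pattern in (("+", CORE_TCF4_PLUS), ("-", CORE_TCF4_MINUS)):
        # A zero-width lookahead lets re.finditer report overlapping occurrences.
        for m in re.finditer(f"(?={pattern})", seq):
            hits.append((m.start(), strand))
    return sorted(hits)


def fraction_with_core_tcf4(regions: List[str]) -> float:
    """Fraction of region sequences containing at least one core TCF4 match on either strand."""
    if not regions:
        return 0.0
    return sum(1 for r in regions if find_core_tcf4(r)) / len(regions)
```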
This discrepancy is likely attributable to methodological and computational differences in the generation and analysis of each data set . 
For the SACO study , a 5 kb interval surrounding the mean position of the enriched region was used in the analysis . 
For the ChIP-Seq analysis , we examined a much smaller interval enveloping each b-catenin enriched region ( 600 bp ) . 
Therefore , given the higher resolution of ChIP-Seq , the 47 % association rate of TCF4 consensus motifs in b-catenin enriched regions likely better reflects the landscape in vivo . 
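A back-of-the-envelope calculation shows why interval width matters for this comparison . The calculation rests on a simplifying assumption: independent , equiprobable bases , which real genomic sequence does not follow . Under that assumption , the chance that a 7 bp core TCF4 motif matches at a given position on one strand is 0.5 × 0.5 × 0.25^5 = 1/4096 . Both strands are counted , because the motif is not its own reverse complement . A Poisson approximation then gives the probability of at least one chance match in a window . This is an illustration of the resolution argument , not a model of the SACO or ChIP-Seq data .

```python
import math
from typing import Dict, List, Optional

# Core TCF4 motif written as the set of allowed bases at each position (T/A-T/A-C-A-A-A-G).
CORE_TCF4 = ["AT", "AT", "C", "A", "A", "A", "G"]


def match_probability(motif: List[str], base_freq: Optional[Dict[str, float]] = None) -> float:
    """Chance that one position on one strand matches, assuming independent bases."""
    freq = base_freq or {b: 0.25 for b in "ACGT"}
    p = 1.0
    for allowed in motif:
        p *= sum(freq[b] for b in allowed)
    return p


def expected_chance_hits(window_bp: int, motif: List[str] = CORE_TCF4) -> float:
    """Expected number of chance matches in a window, counting both strands."""
    positions = max(window_bp - len(motif) + 1, 0)
    return 2 * positions * match_probability(motif)


def prob_at_least_one(window_bp: int, motif: List[str] = CORE_TCF4) -> float:
    """Poisson approximation to P(at least one chance match in the window)."""
    return 1.0 - math.exp(-expected_chance_hits(window_bp, motif))


for width in (600, 5000):
    print(width, round(expected_chance_hits(width), 2), round(prob_at_least_one(width), 2))
```

Under these assumptions , a 600 bp interval contains a chance match with a probability of roughly 0.25 , whereas a 5 kb interval does so with a probability of roughly 0.91 . A wider interval therefore inflates the apparent motif association rate .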
Results from our current analysis are consistent with TCF4 being a predominant factor that directly recruits b-catenin to enhancers in colon cancer cells . However , they also suggest that a substantial portion of targets rely on other factors to recruit b-catenin . 
Two recent studies localized TCF4 binding regions in human colon cancer cells using ChIP-Seq ( 21,22 ) . 
We therefore searched for representation of our b-catenin enriched regions in the reported TCF4 libraries . 
Of the 10 conserved TCF4 binding peak regions identified by Tuupanen et al. ( 22 ) , three were identified in our b-catenin ChIP-Seq library . 
This included the peak that identified the WRE located 335 kb upstream from the MYC transcription start site . 
Upon analysis of the TCF4 ChIP-Seq data sets reported by Blahnik et al. ( 21 ) and the ENCODE Project Consortium , we found that 786 ( 36.3 % ) of our b-catenin peak regions overlapped a TCF4 peak region . 
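Overlap figures such as 786 of 2168 ( 36.3 % ) come from asking , for each b-catenin peak , whether any TCF4 peak shares at least one base with it . The following minimal sketch performs that test with sorted starts and a running maximum of ends per chromosome . It assumes 0-based , half-open coordinates ; the authors' actual tooling is not described here , and dedicated tools such as BEDTools `intersect` are the usual choice at genome scale .

```python
from bisect import bisect_left
from collections import defaultdict
from typing import Dict, List, Tuple

Interval = Tuple[str, int, int]  # (chromosome, start, end), 0-based half-open


def build_index(reference: List[Interval]) -> Dict[str, Tuple[List[int], List[int]]]:
    """Per chromosome: sorted starts plus a running maximum of ends."""
    by_chrom: Dict[str, List[Tuple[int, int]]] = defaultdict(list)
    for chrom, start, end in reference:
        by_chrom[chrom].append((start, end))
    index = {}
    for chrom, ivs in by_chrom.items():
        ivs.sort()
        starts, max_ends, running = [], [], float("-inf")
        for start, end in ivs:
            running = max(running, end)
            starts.append(start)
            max_ends.append(running)
        index[chrom] = (starts, max_ends)
    return index


def count_overlapping(query: List[Interval], reference: List[Interval]) -> Tuple[int, float]:
    """Number and fraction of query intervals overlapping at least one reference interval."""
    index = build_index(reference)
    hits = 0
    for chrom, q_start, q_end in query:
        if chrom not in index:
            continue
        starts, max_ends = index[chrom]
        i = bisect_left(starts, q_end)  # reference intervals starts[:i] begin before q_end
        if i > 0 and max_ends[i - 1] > q_start:  # ...and at least one of them ends after q_start
            hits += 1
    return hits, (hits / len(query) if query else 0.0)
```

For the counts reported above , 786 / 2168 = 0.3625 , which rounds to 36.3 % .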
There are several possibilities for why a greater percentage of our b-catenin peak regions are not represented in the aforementioned ChIP-Seq libraries . 
Setting aside methodological variation , the choice of peak-calling algorithm and cell-type differences , one explanation is that TCF4 binds both transcriptionally active and repressed genes . 
As b-catenin is thought to primarily associate with transcribed genes , a partial overlap of the b-catenin enriched regions identified in our study with the TCF4 enriched regions is expected . 
Furthermore , as mentioned above , our analysis suggests that many genes recruit b-catenin independently of TCF4 . 
Most of these targets are not represented in the TCF4 ChIP-Seq libraries . 
Finally , the amount of sequencing required to identify all of the binding sites represented in a ChIP-Seq library is debatable ( 18 ) . 
Therefore , we anticipate that additional sequencing of our library would identify more of the reported TCF4 peak regions . 
Overall , however , the concordance of TCF4 and b-catenin peaks identified by three separate groups independently validates ChIP-Seq as a methodology to identify direct Wnt/b-catenin target genes . 
Upon mapping the b-catenin enriched regions relative to RefSeq gene boundaries , we found that binding sites are dispersed across the 5′ , intragenic and 3′ regions of genes . 
This finding is in line with previous genome-wide localization studies of b-catenin and TCF4 binding sites ( 12,16 ) . 
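Assigning a peak to the 5′ , intragenic or 3′ region of a gene requires taking the gene's strand into account . This minimal sketch makes two assumptions : gene coordinates are 0-based and half-open , and a peak more than 10 kb outside the gene body is left unassigned . The 10 kb flank is a hypothetical cutoff , not the one used in this study . The helper `genes_bound_at_both_ends` flags genes that carry both 5′ and 3′ peaks , the pattern examined below for CXXC5 and GRHL3 . The gene names in the usage example are placeholders .

```python
from typing import Dict, List, Optional, Tuple


def classify_peak(center: int, gene_start: int, gene_end: int, strand: str,
                  flank: int = 10_000) -> Optional[str]:
    """Return "5'", "intragenic", "3'" or None (farther than `flank` bp from the gene)."""
    if gene_start <= center < gene_end:
        return "intragenic"
    if center < gene_start - flank or center >= gene_end + flank:
        return None
    before_gene = center < gene_start  # in plus-strand (genomic) coordinates
    if strand == "+":
        return "5'" if before_gene else "3'"
    if strand == "-":
        return "3'" if before_gene else "5'"
    raise ValueError(f"unknown strand: {strand!r}")


def genes_bound_at_both_ends(genes: Dict[str, Tuple[str, int, int, str]],
                             peaks: List[Tuple[str, int]], flank: int = 10_000) -> List[str]:
    """genes: name -> (chrom, start, end, strand); peaks: (chrom, center) pairs."""
    hits = []
    for name, (chrom, start, end, strand) in genes.items():
        labels = {classify_peak(c, start, end, strand, flank) for ch, c in peaks if ch == chrom}
        if {"5'", "3'"} <= labels:
            hits.append(name)
    return sorted(hits)


genes = {"GENE_A": ("chr5", 1000, 5000, "+"), "GENE_B": ("chr1", 1000, 5000, "-")}
peaks = [("chr5", 500), ("chr5", 6000), ("chr1", 500), ("chr1", 3000)]
print(genes_bound_at_both_ends(genes, peaks))
```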
We were intrigued by the observation that several targets contained b-catenin enriched regions that localized to both 5′ and 3′ gene boundaries . 
Based on our previous work with MYC ( 15 ) , we tested whether a chromatin loop was present at two targets that contained 5′ and 3′ b-catenin enriched regions in the library , CXXC5 and GRHL3 . 
CXXC5 is a zinc finger-containing protein that inhibits canonical Wnt signaling in response to bone morphogenetic protein signaling in neural stem cells ( 46 ) . 
GRHL3 encodes a Grainyhead factor that plays a role in epidermal barrier formation in the bladder ( 47 ) . 
Substantial levels of b-catenin binding to the 5′ and 3′ regions of CXXC5 and GRHL3 were detected by ChIP analysis ( Figure 3B and Supplementary Figure S2 ) . 
Using the 3C technique , we found that a chromatin loop accompanied b-catenin binding to the 5′ and 3′ regions of CXXC5 . 
However , we were unable to detect a chromatin loop at GRHL3 . 
This finding suggests that while looping between separated enhancers may be a prevalent mechanism for coordinating Wnt/b-catenin target gene expression , b-catenin binding to 5′ and 3′ sites alone is not sufficient for this interaction . 
We anticipate that 3C coupled with high throughput sequencing techniques will facilitate identification of target genes that are regulated by distal WREs via chromatin loops ( 48,49 ) . 
Through de novo motif analysis , we found that nearly half of the b-catenin enriched regions that contained a TCF4 consensus motif also contained an AP-1 motif . 
Notably , AP-1 motifs were also over-represented in TCF4-bound regions identified by recent ChIP-Seq and ChIP-chip analyses ( 16,21 ) . 
However , our current study is the first to report over-representation of AP-1 motifs in b-catenin bound regions . 
While b-catenin/TCF4 and AP-1 have been shown by others and by our group to regulate target gene expression ( 36,38,39,50 ) , our findings suggest that the target genes associated with the 417 b-catenin enriched regions containing both motifs may be regulated by both factors in a similar manner . 
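Sorting regions by motif co-occurrence can be sketched with exact consensus matches . The sketch takes the core TCF4 motif ( T/A -- T/A -- C -- A -- A -- A -- G ) and the AP-1 motif ( T -- G -- A -- C/T -- T -- C -- A ) from the text and pairs each with its reverse complement so that both strands are searched . It is not the de novo motif discovery used to obtain the 417 regions . The region identifiers and sequences in the test are made up .

```python
import re
from typing import Dict

# Consensus motifs from the text, each paired with its reverse complement so both strands are searched.
TCF4_CORE = re.compile(r"[AT][AT]CAAAG|CTTTG[AT][AT]")  # T/A-T/A-C-A-A-A-G
AP1 = re.compile(r"TGA[CT]TCA|TGA[AG]TCA")  # T-G-A-C/T-T-C-A


def tally_motif_combinations(regions: Dict[str, str]) -> Dict[str, int]:
    """Count regions by which consensus motifs (exact matches) they contain."""
    counts = {"TCF4+AP-1": 0, "TCF4 only": 0, "AP-1 only": 0, "neither": 0}
    for seq in regions.values():
        seq = seq.upper()
        has_tcf4 = TCF4_CORE.search(seq) is not None
        has_ap1 = AP1.search(seq) is not None
        if has_tcf4 and has_ap1:
            counts["TCF4+AP-1"] += 1
        elif has_tcf4:
            counts["TCF4 only"] += 1
        elif has_ap1:
            counts["AP-1 only"] += 1
        else:
            counts["neither"] += 1
    return counts
```

Exact consensus matching is stricter than matrix-based scanning in some positions and looser in others , so counts from this sketch would not be expected to reproduce the reported figures .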
Our ChIP analysis indicated that nearly every region assayed ( 95 % ) containing a TCF4 and AP-1 motif bound c-Jun , a component of the AP-1 complex . 
The majority of these loci showed an additive increase in expression upon the addition of both LiCl and serum . 
This analysis suggests that mitogen and Wnt signaling pathways converge through AP-1 and b-catenin/TCF4 to co-regulate target gene expression . 
However , LiCl treatment failed to enhance mitogen-activated gene expression for several targets . 
It is possible that pre-treatment of quiescent cells with LiCl prior to serum stimulation would sensitize the system to facilitate detection of pathway cooperation . 
Alternatively , AP-1 binding may function to regulate gene expression in response to a different stimulus such as cytokine signaling or the apoptotic stress response ( 35 ) . 
The application of sequence-based methods to identify transcription factor binding sites genome-wide is likely to persist as the methodology of choice . 
Next-generation sequencing technologies , such as those using the Illumina platform , offer increased resolution and output . 
These attributes have facilitated the replacement of SACO with massively parallel sequencing approaches to map transcription factor binding regions isolated by ChIP . 
Through our ChIP-Seq screen of b-catenin binding regions in asynchronous HCT116 cells , we uncovered evidence for a functional interplay between b-catenin/TCF4 and AP-1 . 
Because cells that initiate colon carcinogenesis contain pathogenic levels of b-catenin in the nucleus and are bathed in serum mitogens , our findings suggest that misexpression of target genes containing AP-1 and TCF4 motifs might represent the pathogenically relevant set . 
Supplementary Data are available at NAR Online.
ACKNOWLEDGEMENTS
We would like to thank Dr Laura Carrel and Dr Faoud Ishmael ( Penn State University College of Medicine ) for critically reading this manuscript and providing helpful comments . 
We would like to thank Doug Turnbull and the High Throughput Sequencing Facility in the Molecular Biology Institute at the University of Oregon for sequencing the library . 
We would like to thank Dr Richard Goodman and Dr Gail Mandel ( Oregon Health and Science University ) for support during the initiation of this project . 
FUNDING
National Institutes of Health ( grant number R01DK080805 to G.S.Y. ) ; start-up research funds from the Pennsylvania State University College of Medicine ( to G.S.Y. ) ; National Institutes of Health , National Center for Research Resources ( 5UL1RR024140 to S.K.M. ) ; National Institutes of Health , National Cancer Institute ( 5 P30 CA069533-13 to S.K.M. ) . 
Funding for open access charge : National Institutes of Health ( grant number R01DK080805 to G.S.Y. ) .