DamID-seq: Genome-wide Mapping of Protein-DNA Interactions b
Abstract 
The DNA adenine methyltransferase identification ( DamID ) assay is a powerful method to detect protein-DNA interactions both locally and genome-wide . 
It is an alternative approach to chromatin immunoprecipitation ( ChIP ) . 
An expressed fusion protein consisting of the protein of interest and the E. coli DNA adenine methyltransferase can methylate the adenine base in GATC motifs near the sites of protein-DNA interactions . 
Adenine-methylated DNA fragments can then be specifically amplified and detected . 
The original DamID assay detects the genomic locations of methylated DNA fragments by hybridization to DNA microarrays , which is limited by the availability of microarrays and the density of predetermined probes . 
In this paper , we report the detailed protocol of integrating high throughput DNA sequencing into DamID ( DamID-seq ) . 
The large number of short reads generated from DamID-seq enables detecting and localizing protein-DNA interactions genome-wide with high precision and sensitivity . 
We have used the DamID-seq assay to study genome-nuclear lamina ( NL ) interactions in mammalian cells , and have found that DamID-seq provides high resolution and a wide dynamic range in detecting genome-NL interactions . 
The DamID-seq approach enables probing NL associations within gene structures and allows comparing genome-NL interaction maps with other functional genomic data , such as ChIP-seq and RNA-seq . 
Introduction
DNA adenine methyltransferase identification ( DamID ) 1,2 is a method to detect protein-DNA interactions in vivo and is an alternative approach to chromatin immunoprecipitation ( ChIP ) 3 . 
It requires a relatively small number of cells and does not require chemical cross-linking of protein with DNA or a highly specific antibody . 
The latter is particularly helpful when the target protein is loosely or indirectly associated with DNA . 
DamID has been successfully used to map the binding sites of a variety of proteins including nuclear envelope proteins 4-10 , chromatin associated proteins 11-13 , chromatin modifying enzymes 14 , transcription factors and co-factors 15-18 and RNAi machineries 19 . 
The method is applicable in multiple organisms including S. cerevisiae 13 , S. pombe 7 , C. elegans 9,17 , D. melanogaster 5,11,18,20 , A. thaliana 21,22 as well as mouse and human cell lines 6,8,10,23,24 . 
The development of the DamID assay was based on the specific detection of adenine-methylated DNA fragments in eukaryotic cells that lack endogenous adenine methylation 2 . 
An expressed fusion protein , consisting of the DNA-binding protein of interest and E. coli DNA adenine methyltransferase ( Dam ) , can methylate the adenine base in GATC sequences that are in spatial proximity ( most significantly within 1 kb and up to roughly 5 kb ) to the binding sites of the protein in the genome 2 . 
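The proximity rule described above can be illustrated with a minimal sketch that lists the GATC motifs lying within a chosen distance of a hypothetical binding position. The function names, the toy sequence and the binding coordinate are assumptions made for illustration only; the 1 kb and 5 kb distances are the ranges quoted in the text, not fitted parameters.

```python
import re


def gatc_sites(seq):
    """Return 0-based start positions of every GATC motif in seq.

    GATC is its own reverse complement, so scanning one strand is enough.
    """
    return [m.start() for m in re.finditer("GATC", seq.upper())]


def sites_near(seq, binding_pos, max_dist=1000):
    """Return GATC start positions within max_dist bp of binding_pos."""
    return [p for p in gatc_sites(seq) if abs(p - binding_pos) <= max_dist]


# Toy sequence (hypothetical): one GATC at position 2, another at 2008.
seq = "AAGATCTT" + "A" * 2000 + "GATC"
print(gatc_sites(seq))             # [2, 2008]
print(sites_near(seq, 100))        # [2]
print(sites_near(seq, 100, 5000))  # [2, 2008]
```

In a real genome the same scan would be run per chromosome; the distance cut-off is only a rough guide, because methylation efficiency falls off gradually rather than at a sharp boundary.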
The modified DNA fragments can be specifically amplified and hybridized to microarrays to detect the genomic binding sites of the protein of interest 1,25,26 . 
This original DamID method was limited by the availability of microarrays and the density of predetermined probes . 
We have therefore integrated high throughput sequencing into DamID 10 and designated the method as DamID-seq . 
The large number of short reads generated from DamID-seq enables precise localization of protein-DNA interactions genome-wide . 
We found that DamID-seq provided a higher resolution and a wider dynamic range than DamID by microarray for studying genome-nuclear lamina ( NL ) associations 10 . 
This improved method allows probing NL associations within gene structures 10 and facilitates comparisons with other high throughput sequencing data , such as ChIP-seq and RNA-seq . 
The DamID-seq protocol described here was initially developed for mapping genome-NL associations 10 . 
We generated a fusion protein by tethering mouse or human Lamin B1 to E. coli DNA adenine methyltransferase and tested the protocol in 3T3 mouse embryonic fibroblasts , C2C12 mouse myoblasts 10 and IMR90 human fetal lung fibroblasts ( data not published ) . 
In this protocol , we start with constructing vectors and expressing Dam-tethered fusion proteins by lentiviral infection in mammalian cells 24 . 
Next , we describe the detailed protocols of amplifying adenine-methylated DNA fragments and preparing sequencing libraries that should be applicable in other organisms . 
The Dam-V5-LmnB1 fusion protein was verified to be co-localized with the endogenous Lamin B protein by immunofluorescence staining ( Figure 1 ) . 
The successful PCR amplification of adenine-methylated DNA fragments is a key step for DamID-seq . 
The experimental samples should amplify a smear of 0.2 - 2 kb while the negative controls ( without DpnI , without ligase or without PCR template ) should result in no amplification or clearly less amplification ( Figure 2 ) . 
The methylated DNA fragments are in the range from 0.2 to 2 kb , while the desired insert size for an NGS library is from 200 to 300 bp . 
Therefore , it is essential to fragment the methyl PCR products into the suitable size range . 
Nonetheless , it was found to be impractical to simultaneously break larger DNA fragments down to suitable sizes and keep the majority of smaller DNA fragments intact in a single fragmentation duration . 
Therefore , time course experiments were performed to determine the minimal time ( T0.2kb ) needed to fragment 1 µg DNA to a smear centered at 200 bp ( Figure 3 ) . 
Then 6 time durations in equal increments were selected between 5 min and T0.2kb for the actual fragmentation . 
The enzymatic activity of double strand DNA Fragmentase may vary from batch to batch and may decrease over time , so it is recommended to repeat this step for a new batch of Fragmentase or after storage for a period of time . 
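The choice of fragmentation durations can be written down as a small calculation. The sketch below assumes that both endpoints (5 min and the empirically determined minimal time, written T0.2kb here) are included among the 6 durations; the text does not state this explicitly. The value of 30 min used in the example is hypothetical and must be replaced by the result of the time course experiment.

```python
def fragmentation_times(t_max_min, t_min_min=5.0, n=6):
    """Return n equally spaced durations (minutes) from t_min_min to t_max_min.

    Assumption: both endpoints are included in the series.
    """
    if n < 2:
        raise ValueError("need at least two time points")
    if t_max_min <= t_min_min:
        raise ValueError("t_max_min must be greater than t_min_min")
    step = (t_max_min - t_min_min) / (n - 1)
    return [round(t_min_min + i * step, 2) for i in range(n)]


# Hypothetical example: the time course gave a minimal time of 30 min.
print(fragmentation_times(30))  # [5.0, 10.0, 15.0, 20.0, 25.0, 30.0]
```

Because Fragmentase activity may differ between batches, the upper limit should be re-measured rather than reused from an earlier experiment.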
The desired insert size is between 200 and 300 bp corresponding to DNA fragments between 300 and 400 bp (including 121 bp sequencing adaptors) on the agarose gel. Three thin slices within this range were excised from each experimental sample to narrow the size range of a library and increase the possibility of obtaining at least one qualified sequencing library (Figure 4).
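The size arithmetic behind the gel excision can be checked with a short sketch. Adding the stated 121 bp of adaptor sequence to a 200 to 300 bp insert gives 321 to 421 bp, which is consistent with the approximate 300 to 400 bp range quoted for the gel. The function names are illustrative assumptions.

```python
ADAPTOR_BP = 121  # total adaptor length stated in the text


def gel_window(insert_min=200, insert_max=300, adaptor_bp=ADAPTOR_BP):
    """Return the (low, high) fragment sizes on the gel for a given insert range."""
    return insert_min + adaptor_bp, insert_max + adaptor_bp


def insert_size(band_bp, adaptor_bp=ADAPTOR_BP):
    """Return the insert size implied by an adaptor-ligated band of band_bp."""
    return band_bp - adaptor_bp


print(gel_window())      # (321, 421)
print(insert_size(350))  # 229
```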
An aliquot of 5 µl of each amplified DNA library was analyzed on the agarose gel to determine which library may qualify for sequencing . 
As shown in Figure 5A , a clear single band of the same size as the excised gel slice should be visible on the agarose gel ( step 3.7.4 ) . 
Next , selected libraries were examined by a Bioanalyzer ( Figure 5B ) to determine the exact size range and concentrations prior to sequencing . 
If desired , amplified DNA libraries can be directly examined by a Bioanalyzer without gel analysis . 
When multiple libraries are of good quality , it is recommended to sequence libraries of similar size ranges for a pair of experimental ( cells expressing Dam-V5-POI ) and control ( cells expressing V5-Dam ) samples . 
The short reads generated by sequencing systems were first mapped back to the corresponding genome . 
Uniquely aligned reads were then passed to subsequent analyses . 
A pipeline to process short reads , construct a genome-NL interaction map and analyze gene-NL associations was described in detail in our previous work 10 . 
Representative results are shown in Figure 6 . 
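The published pipeline is described in reference 10 and is not reproduced here. As a rough illustration only, the sketch below shows one common way of turning uniquely aligned reads from an experimental (Dam-fusion) and a control (Dam-only) sample into a binned log2 ratio track. The bin size, the pseudocount, the reads-per-million scaling and all function names are assumptions and may differ from the authors' method.

```python
import math
from collections import Counter


def bin_counts(read_positions, bin_size):
    """Count reads per fixed-size bin; read_positions holds (chrom, pos) tuples."""
    counts = Counter()
    for chrom, pos in read_positions:
        counts[(chrom, pos // bin_size)] += 1
    return counts


def log2_ratio(exp_counts, ctl_counts, pseudocount=1.0):
    """Return log2(experimental / control) per bin after per-million scaling."""
    exp_total = sum(exp_counts.values())
    ctl_total = sum(ctl_counts.values())
    if exp_total == 0 or ctl_total == 0:
        raise ValueError("both samples need at least one read")
    ratios = {}
    for b in set(exp_counts) | set(ctl_counts):
        # The pseudocount avoids division by zero in bins with no control reads.
        e = (exp_counts.get(b, 0) + pseudocount) / exp_total * 1e6
        c = (ctl_counts.get(b, 0) + pseudocount) / ctl_total * 1e6
        ratios[b] = math.log2(e / c)
    return ratios


# Toy data (hypothetical): two 100 kb bins on chr1.
exp_reads = [("chr1", 100)] * 3 + [("chr1", 150000)]
ctl_reads = [("chr1", 200)] + [("chr1", 160000)] * 3
track = log2_ratio(bin_counts(exp_reads, 100000), bin_counts(ctl_reads, 100000))
print(sorted(track.items()))
```

Positive values indicate bins enriched in the Dam-fusion sample relative to the Dam-only control, and negative values indicate depletion.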
Discussion
Whether Dam-tagged proteins retain the functions of endogenous proteins should be examined before a DamID-seq experiment . 
The subcellular localization of Dam-tagged nuclear envelope proteins should always be determined and compared with that of the endogenous proteins . 
For studying transcription factors , it is suggested to examine whether the Dam-fusion protein can rescue the functions of the endogenous protein in regulating gene expression . 
This functional test can be performed in organisms in which knockout mutants of endogenous DNA-binding proteins are available . 
Because advances in genome engineering have potentially allowed knocking out any endogenous gene of interest , functions of Dam-tagged DNA-binding proteins can be examined in cultured mammalian cells . 
The critical step in this protocol is to successfully fragment the DpnII-digested DamID PCR products to around 200 bp . 
This step is designed to render the amplified adenine-methylated fragments to a narrow size range for sequencing and to randomize the starting nucleotides of the DNA fragments in a sequencing library . 
Inefficient fragmentation will leave the majority of the DNA fragments starting with GATC ( the 5'-overhang from the second DpnII digestion ) , and will result in a much lower performance and yield or even a failure in Illumina sequencing . 
Other DNA fragmentation methods may be used as an alternative approach . 
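One simple way to check whether fragmentation has randomized the read starts is to measure the fraction of reads that begin with GATC. The sketch below is an illustrative quality check, not a step in the published protocol; the function name and the toy reads are assumptions, and no acceptance threshold is implied.

```python
def gatc_start_fraction(reads):
    """Return the fraction of non-empty read sequences that start with GATC."""
    reads = [r for r in reads if r]
    if not reads:
        raise ValueError("no reads supplied")
    n_gatc = sum(1 for r in reads if r.upper().startswith("GATC"))
    return n_gatc / len(reads)


# Toy reads (hypothetical): two of the four start with GATC.
print(gatc_start_fraction(["GATCAA", "ACGT", "gatcTT", "TTTT"]))  # 0.5
```

A fraction close to 1 would suggest that most fragments still carry the DpnII-derived end, whereas a well-fragmented library is expected to show a much lower value.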
The resolution of DamID ( and DamID-seq described here ) is limited by the frequency of GATCs in the genome to be studied . 
Moreover , even with high throughput sequencing , the genomic localizations of a DNA-binding protein can only be mapped within two consecutive GATCs rather than to the actual DNA-binding sites . 
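The resolution limit can be made concrete with a small sketch that takes a sorted list of GATC coordinates and reports the pair of consecutive GATCs bracketing a position of interest, together with the distribution of inter-GATC distances. The coordinates are hypothetical and the function names are assumptions for illustration.

```python
import bisect


def flanking_interval(sites, pos):
    """Return the consecutive GATC pair (left, right) with left <= pos < right.

    sites must be sorted in ascending order. Returns None when pos lies
    before the first site or at or beyond the last site.
    """
    i = bisect.bisect_right(sites, pos)
    if i == 0 or i == len(sites):
        return None
    return sites[i - 1], sites[i]


def fragment_lengths(sites):
    """Return distances between consecutive GATC sites."""
    return [b - a for a, b in zip(sites, sites[1:])]


# Hypothetical GATC coordinates on one chromosome.
sites = [2, 8, 20]
print(flanking_interval(sites, 10))  # (8, 20)
print(flanking_interval(sites, 1))   # None
print(fragment_lengths(sites))       # [6, 12]
```

A binding event can therefore be assigned only to the interval returned by `flanking_interval`, and the list from `fragment_lengths` gives a direct view of how coarse that assignment is in a given genome.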
Despite these limitations , the DamID assay has important advantages . 
Because DamID does not require highly-specific antibodies , it can be used to detect a subset of nuclear proteins that could be difficult to assay by ChIP ( such as the nuclear envelope proteins ) . 
To study how these proteins regulate genome functions , it is important to integrate and cross-analyze their genome-wide localization data with the current epigenomic mapping data ( such as data from the ENCODE and NIH Roadmap Epigenomics Projects 30,31 ) . 
The DamID-seq approach provides both higher resolution and higher sensitivity than DamID by microarray and enables detecting differential NL-associations within gene structures 10 . 
A combinatorial analysis of DamID-seq data , ChIP-seq data 32 and gene expression data has identified a class of NL-associated genes with distinct epigenetic and transcriptional features ( data not published ) . 
Another advantage of DamID is that it only requires a small number of cells . 
In recent years , there has been an explosion in single cell analysis of gene regulation 33,34 . 
Although genome sequence 35 , genome-wide gene expression 36 and chromatin conformation 37 can be assayed in a single cell , there has not been an available approach for detecting protein-DNA interactions genome-wide in a single cell . 
DamID-seq is a highly promising approach for this goal , and may complement the single cell imaging approach in detecting the dynamics of genome-NL interactions 38 . 
One complication is that because the Dam-fusion protein is expressed at a much lower level than the endogenous protein in the DamID assay , it is possible that the Dam-fusion protein may only occupy a subset of genomic binding sites as compared to the endogenous protein . 
The DamID assay has mostly been used in cultured animal cells to detect protein-DNA interactions . 
Notably , developmental biologists have applied this assay in detecting protein-DNA interactions in specific cell types in vivo . 
For example , Dam-tagged RNA polymerase II was expressed specifically in Drosophila neural stem cells to detect its genome-wide occupancy without cell isolation 39 . 
DamID-seq will be highly useful to study the genome-wide localizations of nuclear envelope proteins , transcription factors and chromatin regulators during development in animal models . 
Acknowledgements