Add TEXT_FILES

carlosmendeznlp
Commit 27818a9ad50f2ce38a88629917e769e730896292 27818a9a 1 parent 1e051ed3
Showing 116 changed files with 35269 additions and 0 deletions
data/TEXT_FILES/notuseful_txt/18460200.txt
data/TEXT_FILES/notuseful_txt/18697768.txt
data/TEXT_FILES/notuseful_txt/18974181.txt
data/TEXT_FILES/notuseful_txt/19843227.txt
data/TEXT_FILES/notuseful_txt/20460455.txt
data/TEXT_FILES/notuseful_txt/20639326.txt
data/TEXT_FILES/notuseful_txt/20817769.txt
data/TEXT_FILES/notuseful_txt/21051353.txt
data/TEXT_FILES/notuseful_txt/21124945.txt
data/TEXT_FILES/notuseful_txt/21278291.txt
data/TEXT_FILES/notuseful_txt/21515770.txt
data/TEXT_FILES/notuseful_txt/22555467.txt
data/TEXT_FILES/notuseful_txt/22890136.txt
data/TEXT_FILES/notuseful_txt/22923524.txt
data/TEXT_FILES/notuseful_txt/23190111.txt
data/TEXT_FILES/notuseful_txt/23232715.txt
data/TEXT_FILES/notuseful_txt/23275538.txt
data/TEXT_FILES/notuseful_txt/23470992.txt
data/TEXT_FILES/notuseful_txt/23511241.txt
data/TEXT_FILES/notuseful_txt/23580539.txt
--- a/data/TEXT_FILES/notuseful_txt/18460200.txt 0 → 100644
+++ b/data/TEXT_FILES/notuseful_txt/18460200.txt 0 → 100644
+ BMC Bioinformatics
+ Address : 1Centre INRIA Rennes Bretagne Atlantique , IRISA , Rennes , France , 2Université de Rennes 1 , IRISA , Rennes , France , 3Université de Rennes 1 , IRMAR , Rennes , France and 4CNRS , UMR 6074 , IRISA , Rennes , France Email : Philippe Veber * - philippe.veber@googlemail.com; Carito Guziolowski - cvargas@irisa.fr; Michel Le Borgne - michel.leborgne@irisa.fr; Ovidiu Radulescu - ovidiu.radulescu@irisa.fr; Anne Siegel - anne.siegel@irisa.fr * Corresponding author 
+ Abstract
+ Background : Expression profiles obtained from multiple perturbation experiments are increasingly used to reconstruct transcriptional regulatory networks , from well studied , simple organisms up to higher eukaryotes . 
+ Admittedly , a key ingredient in developing a reconstruction method is its ability to integrate heterogeneous sources of information , as well as to comply with practical observability issues : measurements can be scarce or noisy . 
+ In this work , we show how to combine a network of genetic regulations with a set of expression profiles , in order to infer the functional effect of the regulations , as inducer or repressor . 
+ Our approach is based on a consistency rule between a network and the signs of variation given by expression arrays . 
+ Results : We evaluate our approach in several settings of increasing complexity . 
+ First , we generate artificial expression data on a transcriptional network of E. coli extracted from the literature ( 1529 nodes and 3802 edges ) , and we estimate that 30 % of the regulations can be annotated with about 30 profiles . 
+ We additionally prove that at most 40.8 % of the network can be inferred using our approach . 
+ Second , we use this network in order to validate the predictions obtained with a compendium of real expression profiles . 
+ We describe a filtering algorithm that generates particularly reliable predictions . 
+ Finally , we apply our inference approach to the S. cerevisiae transcriptional network ( 2419 nodes and 4344 interactions ) , by combining ChIP-chip data and 15 expression profiles . 
+ We are able to detect and isolate inconsistencies between the expression profiles and a significant portion of the model ( 15 % of all the interactions ) . 
+ In addition , we report predictions for 14.5 % of all interactions . 
+ Conclusion : Our approach requires neither accurate expression levels nor time series . 
+ Nevertheless , we show on both real and artificial data that a relatively small number of perturbation experiments is enough to determine a significant portion of regulatory effects . 
+ This is a key practical asset compared to statistical methods for network reconstruction . 
+ We demonstrate that our approach is able to provide accurate predictions , even when the network is incomplete and the data is noisy . 
+ Background
+ A central problem in molecular genetics is to understand the transcriptional regulation of gene expression . 
+ A transcription factor ( TF ) is a protein that binds to a typical domain on the DNA and influences transcription . 
+ The effect of this TF can be either a repression or an activation of transcription depending on the type of binding site , the distance to coding regions , or on the presence of other molecules . 
+ Finding which gene is controlled by which TF is a reverse engineering problem , usually named network reconstruction . 
+ This question has been approached over the past years by various groups . 
+ A first approach to achieve this task is to collect the information spread in the primary literature . 
+ Following this idea , a large number of databases that take protein and regulatory interactions from the literature and curate them have been developed [ 1-5 ] . 
+ For the bacterium E. coli , RegulonDB is a dedicated database that contains experimentally verified regulatory interactions [ 6 ] . 
+ For the budding yeast ( S. cerevisiae ) , the Yeast Proteome Database contains a large amount of regulatory information [ 7 ] . 
+ In this latter case , however , the amount of available information is not sufficient to build a reasonably accurate model of transcriptional regulation . 
+ Databases with regulatory knowledge extracted from the literature are , nevertheless , an unavoidable starting point for network reconstruction . 
+ The alternative to a literature-curated approach is a data-driven approach . 
+ This approach is supported by the availability of high-throughput experimental data including microarray expression analysis of deletion mutants ( simple or more rarely double non-lethal knockouts ) , over expression of TF-encoding genes , protein-protein interactions , protein localisation , or ChIP-chip experiments coupled with promoter sequence analysis . 
+ We may cite several classes of methods that use these kinds of data , such as correlation , mutual information or causality studies , Bayesian networks , path analysis , information-theoretic approaches , and ordinary differential equations [ 8-10 ] . 
+ In short , most available approaches so far are based on a probabilistic framework which defines a probability distribution over the set of models . 
+ The reconstructed network is then defined as the most likely model given the data . 
+ Such an optimization problem is usually non-convex , and finding a global optimum cannot be guaranteed in practice . 
+ Existing algorithms report a local optimum which should be interpreted with care : errors can appear and no consensual model may be produced . 
+ As an illustration , special attention has been paid to the reconstruction of S. cerevisiae network from ChIP-chip data and protein-protein interaction networks [ 11 ] . 
+ A first regulatory network was obtained with promoter sequence analysis methods [ 12,13 ] , yet some undetected transcriptional regulatory motifs were proposed using non-parametric causality tests [ 14 ] . 
+ Moreover , Bayesian analysis also identified new regulatory modules for this network [ 15,16 ] . 
+ Thus , the results obtained with the different methods do not coincide and a fully data-driven search is in general subject to over-fitting and not fully reliable [ 17 ] . 
+ In regulatory networks , an important and non-trivial piece of physiological information is the regulatory role of TFs as inducer or repressor , also called the sign of the interaction . 
+ This information is needed if one wants to know , for instance , the physiological effect of a change caused by external conditions or the effect of a perturbation on the TF . 
+ While this can be achieved for one gene at a time with ( long and expensive ) dedicated experiments , probabilistic methods such as Bayesian models [ 18 ] or path analysis [ 19,20 ] are capable of proposing models from high-throughput experimental data . 
+ However , as for the network reconstruction task , these methods are based on optimization algorithms that compute an optimal solution with respect to an interaction model . 
+ In this paper , we apply formal methods to compute the sign of interactions in networks that have an available topology . 
+ By doing so , we also validate the topology of the network . 
+ Roughly , we use expression profiles to constrain the possible regulatory roles of TFs , and we report those regulations that are assigned the same role in all feasible models . 
+ Thus , we over-approximate the set of feasible models , and then look for invariants in this set . 
+ A similar idea was applied in [ 21 ] to check the consistency of gene expression assays . 
+ However , we use a deeper formalisation and stronger algorithmic methods to achieve the inference task . 
+ Different sources of large-scale data are exploited in this study : gene expression arrays , which provide information on the interaction signs ; and ChIP-chip experiments , which provide the topology of the regulatory network when not available . 
+ The main tasks we address are the following:
+ 1 . Building a formal model of regulation for a set of genes that integrates information from ChIP-chip data , sequence analysis , and literature annotations . 
+ 2 . Checking its consistency with expression profiles on perturbation assays . 
+ 3 . Inferring the regulatory role of TFs as inducer or repressor if the model is consistent with expression profiles . 
+ 4 . Isolating ambiguous pieces of information if it is not . 
+ The Results section is organised as follows . 
+ We first introduce the mathematical framework which is used to define and to test the consistency between expression profiles and transcriptional networks . 
+ Then , we apply our algorithms to address three main issues : 
+ • Analysis of the dependence between the number of available observations and the number of inferred regulations . 
+ In the case where all genes are observed , we prove that at most 40.8 % of the E. coli network can be inferred and that 30 perturbation experiments are enough to infer 30 % of the network on average . 
+ In the case of missing observations , we estimate how the proportion of unobserved genes affects the number of inferred regulations . 
+ • Illustration and validation of our method on the transcriptional network of E. coli , obtained from RegulonDB [ 6 ] , with a compendium of expression profiles [ 9,22 ] . 
+ • Execution of our inference algorithms over the S. cerevisiae transcriptional network . 
+ We inferred , for small scale subnetworks , more than 20 % of the roles of regulations . 
+ For more complex networks , we detected and isolated inconsistencies ( ambiguities ) between expression profiles and a significant part of the model ( 15 % of all the interactions ) . 
+ Results
+ Detecting the role of a regulation and validating a model
+ Our goal is to determine the regulatory role of a TF on its target genes by using expression profiles . 
+ Let us illustrate our purpose with a simple example . 
+ We suppose that we are given the topology of a network ( this topology can be obtained from ChIP-chip data or any computational network inference method ) . 
+ In this network , let us consider a node A with a single predecessor . 
+ In other words , the model tells us that the protein B acts on the expression of the gene coding for A and no other protein acts on A. 
+ Independently , we suppose that we have several gene expression arrays at our disposal . 
+ One of these arrays indicates that A and B simultaneously increase during a steady state shift experiment . 
+ Then , common sense tells us that B must have been an activator of A during the experiment . 
+ More precisely , protein B can not have inhibited gene A since they both have increased . 
+ Consequently , we say that the model predicts the sign of the interaction from B to A as positive ( see Fig. 1 ) . 
+ This naive rule is actually used in a large class of models ; we will call it the naive inference rule . 
+ Figure 1 : Illustration of the simple inference rule . 
+ When several expression profiles are available , the predictions of the different profiles can be compared . 
+ If two expression profiles predict different signs for a given interaction , there is an ambiguity or inconsistency between data and model ( see Fig. 2 ) . 
+ Then , the ambiguity of the regulatory role can be attributed to three factors : ( 1 ) a complex mechanism of regulation , where the role of the interaction depends on the state of the system ; ( 2 ) a missing interaction in the model ; ( 3 ) an error in the experimental source . 
+ This simple strategy is implemented in Algorithm 1 . 
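+ The single-predecessor case described above can be sketched in a few lines of Python . 
+ This is only an illustration under stated assumptions , not the authors ' Algorithm 1 : signs are encoded as the integers +1 , -1 and 0 , the name naive_inference is hypothetical , and profiles in which either gene is unobserved or unchanged are simply skipped . 

```python
from typing import Dict, List, Set, Tuple

Edge = Tuple[str, str]  # (regulator, target)


def naive_inference(
    edges: List[Edge], profiles: List[Dict[str, int]]
) -> Tuple[Dict[Edge, int], Set[Edge]]:
    """Naive rule for targets with exactly one predecessor (illustrative sketch).

    Signs are +1 (increase / activation), -1 (decrease / repression), 0 (no change).
    Returns the inferred edge signs and the set of ambiguous edges.
    """
    # Group regulators by target so single-predecessor nodes can be found.
    preds: Dict[str, List[str]] = {}
    for src, dst in edges:
        preds.setdefault(dst, []).append(src)

    inferred: Dict[Edge, int] = {}
    ambiguous: Set[Edge] = set()
    for dst, srcs in preds.items():
        if len(srcs) != 1:
            continue  # the naive rule only applies to a single predecessor
        edge = (srcs[0], dst)
        for profile in profiles:
            x_src = profile.get(srcs[0], 0)
            x_dst = profile.get(dst, 0)
            if x_src == 0 or x_dst == 0:
                continue  # simplification: uninformative profiles are skipped
            sign = x_src * x_dst  # same direction -> +1, opposite -> -1
            if inferred.setdefault(edge, sign) != sign:
                ambiguous.add(edge)  # two profiles disagree on this edge

    # Ambiguous edges are reported separately rather than predicted.
    for edge in ambiguous:
        inferred.pop(edge, None)
    return inferred, ambiguous


# Example: B is the only regulator of A, and both increase in one experiment.
signs, conflicts = naive_inference([("B", "A")], [{"A": 1, "B": 1}])
```

+ Keeping the conflicting edges in a separate set mirrors the distinction made in the text between a prediction and an ambiguity between data and model . 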
+ Let us now consider the case when A is activated by two proteins B and C . 
+ No such direct deduction can be made when A and B increase during an experiment , since the influence of C must be taken into account . 
+ A model of interactions between A , B , and C has to be proposed . 
+ Probabilistic methods estimate the most probable signs of regulations that fit with the theoretical model [ 18,23 ] . 
+ Our point of view is different ; we introduce a basic rule that shall be checked by each interaction in the model . 
+ This rule tells us that any variation of A must be explained by the variation of at least one of its predecessors . 
+ In previous papers , we introduced a formal framework to justify this basic rule under some reasonable assumptions . 
+ We also tested the consistency between expression profiles and a graphical model of cellular interactions . 
+ This formalism will be introduced here in an informal way ; its full justification and extensions can be found in the references [ 24-27 ] . 
+ In our example , the basic rule means that if B and C activate A , and both ( B and C ) are known to decrease during a steady state experiment , A can not be observed as increasing . 
+ Then A is predicted to decrease ( see Fig. 3 ) . 
+ More generally , we apply the rule as a constraint for the model : we write constraints for all the nodes of the model , and we use several approaches in order to solve the system of constraints . 
+ From the study of the set of solutions , we deduce which signs are surely determined by these rules . 
+ Then , we obtain necessary conditions on the signs instead of the most probable signs given by probabilistic methods . 
+ A formal approach
+ Consider a system of n chemical species { 1 , ... , n } . 
+ These species interact with each other and we model these interactions using an interaction graph G = ( V , E ) . 
+ The set of nodes is denoted by V = { 1 , ... , n } . 
+ There is an edge j → i ∈ E if the level of species j influences the production rate of species i . 
+ Edges are labelled by a sign in { + , - } which indicates whether j activates or represses the production of i . 
+ In a typical stress perturbation experiment a system leaves an initial steady state following a change in control parameters . 
+ After waiting long enough , the system may reach a new steady state . 
+ In genetic perturbation experiments , a gene of the cell is either knocked-out or overexpressed ; perturbed cells are then compared to the reference . 
+ Our approach relies on the signs of the variations in expression or activity of the species in the network . 
+ Let us denote by sign ( Xi ) ∈ { + , - , 0 } the sign of the variation of species i during a given perturbation experiment , and by sign ( j → i ) ∈ { + , - } the sign of the edge j → i in the interaction graph . 
+ Let us fix species i such that there is no positive self-regulating action on i . 
+ For every predecessor j of i , sign ( j → i ) * sign ( Xj ) provides the sign of the influence of j on the species i . 
+ Then , we can write a constraint on the variation to interpret the rule that was previously stated : the variation of species i is explained by the variation of at least one of its predecessors in the graph . 
+ When the experiment is a genetic perturbation , the same equation holds for every node that was not genetically perturbed during the experiment and such that all its predecessors were not genetically perturbed . 
+ If a predecessor XM of the node was knocked-out , the equation becomes 
+ The same holds with + sign ( M → i ) when the predecessor XM was over-expressed . 
+ There is no equation for the genetically perturbed node . 
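+ The displayed equations did not survive in this extraction . 
+ As a hedged reconstruction from the surrounding description ( the influence of j on i is sign ( j → i ) * sign ( Xj ) , a knocked-out predecessor M contributes - sign ( M → i ) , and an over-expressed one contributes + sign ( M → i ) ) , constraints ( 1 ) and ( 2 ) can plausibly be written as follows ; the exact typeset form used by the authors is an assumption . 

```latex
\begin{align}
% (1): neither i nor any of its predecessors is genetically perturbed
\operatorname{sign}(X_i) &\approx \sum_{j \to i} \operatorname{sign}(j \to i)\,\operatorname{sign}(X_j) \tag{1} \\
% (2): predecessor M of i was knocked out
\operatorname{sign}(X_i) &\approx -\operatorname{sign}(M \to i) + \sum_{\substack{j \to i \\ j \neq M}} \operatorname{sign}(j \to i)\,\operatorname{sign}(X_j) \tag{2}
\end{align}
```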
+ The sign algebra is the suitable framework for reading these equations [ 26 ] . 
+ It is defined as the set { + , - , ? , 0 } , provided with a sign consistency relation ≈ , and arithmetic operations + and × . 
+ The following tables describe this algebra : 
+ For a given interaction graph G , we will refer to the qualitative system associated with G as the set made up by applying constraint ( 1 ) for each node in G . 
+ We say that node variations Xi ∈ { + , - , 0 } are consistent with the graph G when they satisfy all the constraints associated with G using the sign consistency relation ≈ . 
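+ The tables of the sign algebra are missing from this extraction . 
+ The sketch below encodes the usual conventions of qualitative sign algebras , which we assume match the authors ' tables : the sum of opposite signs is the indeterminate value ? , zero is neutral for + and absorbing for × , and ? is consistent with every sign . 
+ All function names are hypothetical . 

```python
from typing import Dict, List, Tuple


def sign_add(a: str, b: str) -> str:
    """Qualitative sum: opposite signs give the indeterminate value '?'."""
    if a == "?" or b == "?":
        return "?"
    if a == "0":
        return b
    if b == "0":
        return a
    return a if a == b else "?"


def sign_mul(a: str, b: str) -> str:
    """Qualitative product: zero is absorbing, otherwise the usual rule of signs."""
    if a == "0" or b == "0":
        return "0"
    if a == "?" or b == "?":
        return "?"
    return "+" if a == b else "-"


def sign_consistent(a: str, b: str) -> bool:
    """Consistency relation: '?' agrees with everything, otherwise signs must match."""
    return a == "?" or b == "?" or a == b


def node_is_consistent(
    node: str,
    variation: Dict[str, str],
    signed_preds: Dict[str, List[Tuple[str, str]]],
) -> bool:
    """Check the constraint for one node: its variation must be consistent with
    the qualitative sum of the influences of its predecessors."""
    total = "0"
    for pred, edge_sign in signed_preds[node]:
        total = sign_add(total, sign_mul(edge_sign, variation[pred]))
    return sign_consistent(variation[node], total)


# Example from the text: B and C both activate A and both decrease.
preds_of = {"A": [("B", "+"), ("C", "+")]}
a_increases = node_is_consistent("A", {"A": "+", "B": "-", "C": "-"}, preds_of)
a_decreases = node_is_consistent("A", {"A": "-", "B": "-", "C": "-"}, preds_of)
```

+ With these conventions , an observed increase of A is rejected and an observed decrease is accepted , in line with the example of activators B and C given earlier . 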
+ With this material at hand , let us come back to our original problem , namely to infer the regulatory role of TFs from the combination of heterogeneous data . 
+ In the following we assume that : 
+ • The interaction graph is either given by a model to be validated , or built from ChIP-chip data and TF binding site search in promoter sequences . 
+ Thus , as soon as a TF j binds to the promoter sequence of gene i , j is assumed to regulate i . 
+ This is represented by an arrow j → i in the interaction graph . 
+ • The regulatory role of a TF j on a gene i ( as inducer or repressor ) is represented by the variable Sji , which is constrained by Eqs . ( 1 ) or ( 2 ) . 
+ • Expression profiles provide the sign of variation of the gene expression for a set of r steady-state perturbation , mutant , or over-expression experiments . 
+ In the following , x_i^k will stand for the sign of the observed variation of gene i in experiment k . 
+ Our inference problem can now be stated as finding values in { + , - } for Sji , subject to the constraints : 
+ Most of the time , this inference problem has a huge number of solutions . 
+ However , some variables Sji may be assigned the same value in all solutions of the system . 
+ Then , the recurrent value assigned to Sji is a logical consequence of the constraints ( 3 ) , and a prediction of the model . 
+ We will refer to these inferred interaction signs as predictions of the qualitative system , that is , sign variables Sji that have the same value in all solutions of a qualitative system ( 3 ) . 
+ When the inference problem has no solution , we say that the model and the data are inconsistent or ambiguous . 
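+ For a single motif ( one gene with its predecessors ) , the notion of prediction can be illustrated by brute-force enumeration . 
+ This is a minimal sketch that assumes complete observations and no genetic perturbation ; it is not the authors ' solver , which relies on decision diagrams and constraint solvers , and the name infer_motif is hypothetical . 

```python
from itertools import product
from typing import Dict, List, Optional, Tuple


def influence(edge_sign: str, variation: str) -> str:
    """Sign of the influence of a predecessor: edge sign times its variation."""
    if variation == "0":
        return "0"
    return "+" if edge_sign == variation else "-"


def explained(x_target: str, influences: List[str]) -> bool:
    """True when the target variation is consistent with the sum of influences."""
    has_pos = "+" in influences
    has_neg = "-" in influences
    if x_target == "+":
        return has_pos
    if x_target == "-":
        return has_neg
    # No change: either nothing acts on the target or opposite influences cancel.
    return has_pos == has_neg


def infer_motif(
    target: str, preds: List[str], profiles: List[Dict[str, str]]
) -> Optional[Dict[Tuple[str, str], str]]:
    """Enumerate all sign assignments for the edges preds -> target.

    Returns the edges whose sign is identical in every consistent assignment,
    or None when no assignment satisfies all profiles (inconsistent motif).
    """
    solutions = [
        signs
        for signs in product("+-", repeat=len(preds))
        if all(
            explained(p[target], [influence(s, p[j]) for s, j in zip(signs, preds)])
            for p in profiles
        )
    ]
    if not solutions:
        return None
    predictions = {}
    for idx, pred in enumerate(preds):
        values = {sol[idx] for sol in solutions}
        if len(values) == 1:  # same sign in all solutions: a prediction
            predictions[(pred, target)] = values.pop()
    return predictions


# Two experiments on a motif where B and C both regulate A.
experiments = [
    {"A": "+", "B": "+", "C": "-"},
    {"A": "-", "B": "+", "C": "+"},
]
result = infer_motif("A", ["B", "C"], experiments)
```

+ In this example only the edge from C to A takes the same sign in every solution , so it is the only prediction ; the enumeration is exponential in the number of predecessors and is meant for small motifs only . 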
+ Let us illustrate this formulation with a very simple ( yet informative ) example . 
+ Suppose that we have a system of three genes A , B , C , where B and C influence A , as given in Fig. 4 . 
+ Let us say that for this interaction graph we obtained six experiments , and in each of them the variation of all products in the graph was observed . 
+ Using some or all of the experiments provided will lead us to different qualitative systems , as shown in Table 1 , hence to different inference results . 
+ Figure 4 : Interaction graph of three genes A , B , C , whose changes in expression were observed in six stress perturbation experiments ( e1 to e6 ) . 
+ Algorithmic procedure
+ When the signs on edges of the interaction graph are known ( i.e. fixed values of Sji ) , finding consistent node variations Xi is an NP-complete problem [ 26 ] . 
+ When the node variations are known ( i.e. fixed values of Xi ) , finding the signs of edges Sji from Xi can be proven NP-complete in a very similar way . 
+ However , we have been able to design algorithms that perform efficiently on a wide class of regulatory networks . 
+ These algorithms predict signs of the edges when the network topology and the expression profiles are consistent . 
+ In case of inconsistency , though , they identify ambiguous motifs and propose predictions on parts of the network that are not concerned with ambiguities . 
+ The general process flow is as follows ( see the Methods section for details ) : 
+ Step 1 Sign Inference
+ Divide the graph into motifs ( each node with its predecessors ) . 
+ For each motif , find sign valuations ( see Algorithm 1 in the Appendix section ) that are consistent with all expression profiles . 
+ If there are no solutions , call the motif Multiple Behaviours Module ( MBM ) and remove it from the network . 
+ Solve again the remaining equations and determine the edge signs that are fixed to the same value in all the solutions . 
+ The edges carrying these fixed signs are called predicted edges and represent our predictions . 
+ Step 2 Global test/correction of the inferred signs
+ Solutions at the previous step are not guaranteed to be global . 
+ Indeed , two node motifs at step 1 can each be consistent separately , but not jointly ( with respect to all expression profiles ) . 
+ This step checks global consistency by solving the equations for each expression profile . 
+ New Multiple Behaviours Modules can be found and removed from the system . 
+ Step 3 Extending the original set of observations
+ Once all conflicts have been removed , we get a set of solutions in which signs are assigned to both nodes and edges . 
+ Predicted nodes , representing inferred gene variations , can be found in the same way as we did for edges . 
+ We add the new variations to the set of observations and return to step 1 . 
+ The algorithm is iterated until no new signs are inferred . 
+ Step 4 Filtering predictions
+ In the inconsistent case , the validity of the predictions depends on the accuracy of the model and on the correct identification of the MBMs . 
+ The model can be incomplete ( missing interactions ) , and MBMs are not always identifiable in a unique way . 
+ Thus , it is useful to sort predictions according to their reliability . 
+ Our filtering parameter is a positive integer k representing the number of different experiments with which the predicted sign is consistent . 
+ For a filtering value k , all the predictions that are consistent with fewer than k profiles are rejected . 
+ The inference process then generates three results:
+ 1 . A set of MBMs , containing interactions whose role was unclear and generated inconsistencies . 
+ We have identified several types of MBMs : 
+ • Modules of Type I are composed of several direct regulations towards the same gene . 
+ They are detected in Step 1 of the algorithm , and most of them are composed of only one edge , as illustrated in Fig. 5 , but bigger examples exist . 
+ • Modules of Type II , III , IV are detected in Steps 2 or 3 , hence they contain either direct regulations coming from the same protein or indirect regulations and/or loops . 
+ Each of these regulations represents a consensus of all the experiments , but when we attempt to assess them globally , they lead to contradictions . 
+ The indices II-IV have no topological meaning ; they label the most frequent situations and are illustrated in Fig. 5 . 
+ 2 . A set of inferred signs , meaning that the expression profiles fix the signs of certain interactions in a unique way . 
+ 3 . A reliability ranking of inferred signs . 
+ The filtering parameter k used for ranking is the number of different expression profiles that validate a given sign . 
+ On a computational level , the division between Step 1 ( which considers each small motif with all profiles together ) and Step 2 ( which considers the whole network with each profile separately ) is necessary to overcome the memory complexity of the search for solutions . 
+ To handle large scale systems we combine decision diagrams and constraint solvers ( see details in the Methods section ) . 
+ Figure 5 : Classification of the Multiple Behaviours Modules ( MBM ) found in S. cerevisiae transcriptional network . 
+ Green and red interactions correspond to inferred activations and repressions respectively . 
+ Significant differentially expressed genes of the MBM , during one experimental condition , are coloured green ( up-regulated ) , or red ( down-regulated ) . 
+ ( a ) Type I modules are composed by regulations towards the same gene . 
+ Regulations in this module were found to be inconsistent in at least 2 experiments . 
+ ( b ) Type II are composed by genes regulated by the same direct predecessor . 
+ Explanation : The interaction among Sum1 and YFL040W is inferred at the Step 1 of the algorithm as an activation , while among Sum1 and DIT2 as an inhibition . 
+ During the correction step ( Step 2 ) , expression profiles related to one experiment showed that the expression of these two genes ( YFL040W and DIT2 ) is up-regulated . 
+ As it is impossible to state if SUM1 is up or down-regulated ( inconsistency ) , we mark this module as MBM . 
+ ( c ) Type III are composed by coloured genes that share a predecessor . 
+ ( d ) Type IV are composed by coloured genes sharing the same predecessor or successor . 
+ Since our basic rule is a weak constraint , we expect it to produce very robust predictions . 
+ On the other hand , there are theoretical limits to this approach . 
+ For certain interaction graphs , not a single sign may be inferred even with a high number of experiments . 
+ In the next paragraphs , we comment on the maximum number of signs that can be inferred from a given graph . 
+ In perturbation experiments , gene responses are observed following changes of external conditions ( temperature , nutritional stress , etc. ) , gene inactivations , knock-outs , or over-expression . 
+ When one expression profile is available for all the genes in the network we say that we have a complete profile , otherwise the profile is partial ( data is missing ) . 
+ In the following paragraphs we describe the results we obtained . 
+ First of all , in order to validate our formal approach , we evaluated the percentage of the E. coli network recovered from a reasonable number of artificial randomly generated perturbation experiments . 
+ Secondly , we combined real perturbation experiments with the E. coli network and computed the percentage of the recovered network . 
+ Finally , we performed the same previous analysis in a real setting of the S. cerevisiae network obtained from ChIP-chip data . 
+ On a computational level , we checked that our algorithms were able to handle large scale data , as produced by high-throughput measurement techniques ( expression arrays , ChIP-chip data ) . 
+ This is demonstrated in the following by considering networks of thousands of genes . 
+ Stress perturbation experiments : how many do you need ? 
+ For any given network topology , even when considering all possible experimental profiles , there are signs that can not be determined ( see Table 1 ) . 
+ Sign inference has thus a theoretical limit , referred to here as theoretical percentage of recovered signs , that is unique for a given network topology . 
+ If only some perturbation experiments are available , and/or data is missing , the percentage of inferred signs will be lower . 
+ For a given number N of available expression profiles , the average percentage of recovered signs is defined over all sets of N different expression profiles consistent with the qualitative constraints Eqs . ( 1 ) and ( 2 ) . 
+ In order to calculate the theoretical and the average percentages of recovered signs for the transcriptional network of E. coli , we modelled the network as an interaction graph using the public database RegulonDB [ 6 ] . 
+ For each transcriptional regulation A → B we added the corresponding arrow between genes A and B in the interaction graph . 
+ This graph will be referred to as the unsigned interaction graph . 
+ From the unsigned interaction graph of E. coli , we build the signed interaction graph by annotating the edges with a sign . 
+ Most of the time , the regulatory role of a TF is available in RegulonDB ; however , when it is unknown or depends on the TF level , we arbitrarily choose the value + for this regulation . 
+ This provides a graph with 1529 nodes and 3802 edges , all of them signed . 
+ The signed interaction graph is used to generate complete expression profiles that simulate the effect of perturbations . 
+ More precisely , a perturbation experiment is represented by a set of gene expression variations { Xi } i = 1 , ... , n that are not entirely random , for they are constrained by Eqs . ( 1 ) and ( 2 ) . 
+ Then , we forget the signs of the network edges and compute the qualitative system with the signs of regulations as unknown . 
+ The theoretical maximum percentage of inference is given by the number of signs that can be recovered assuming that complete expression profiles of all conceivable perturbation experiments are available . 
+ We computed this maximum percentage using constraint solvers ( see Algorithm 2 in the Appendix section ) . 
+ We found that at most 40.8 % of the signs in the network can be inferred , corresponding to Mmax = 1551 edges . 
+ However , this maximum can be obtained only if all conceivable ( more than 250 ) perturbation experiments are done , which is in practice not possible . 
+ We performed computations to understand the influence of the number of experiments ( N ) on the inference . 
+ For each value of N ( from 5 to 200 ) , we generated 100 sets of N complete random expression profiles and performed our algorithm for each set . 
+ Then , the percentage of inference was calculated as a function of N . 
+ The resulting statistics are shown in Fig. 6 . 
+ We can obtain a theoretical formula explaining the saturation aspect of the curve in Fig. 6 . 
+ Let us suppose that the network contains M1 single incoming regulations . 
+ These can be inferred with probability one from only one experiment , using the naive algorithm ( see Algorithm 1 ) . 
+ Let us suppose a second category of interactions , whose signs are inferred with probability p ( 0 < p < 1 ) on average , per experiment . 
+ This implies that the average number of inferred signs for one experiment is M ( 1 ) = M1 + pM2 , where M2 is the number of interactions in the second category . 
+ Supposing now that inference failures are independent for different experiments , we obtain the average number of inferred signs for N experiments : M ( N ) = M1 + M2 ( 1 - ( 1 - p ) ^ N ) . 
+ In general , we have M1 + M2 < E ( E is the total number of edges ) , meaning that there are edges whose signs can not be inferred . 
+ In our example , the value M1 = 609 corresponds to the average number of signs inferred by the naive algorithm . 
+ Surprisingly , by using our method we can significantly improve the naive inference with little effort . 
+ For the whole E. coli network it appears that a few expression profiles are enough to infer a significant percentage of the network . 
+ More precisely , 30 different expression profiles may be enough to infer one third of the network ( 1267 regulatory roles ) . 
+ Adding more expression profiles continuously increases the percentage of inferred signs . 
+ For N > 100 we are practically on the plateau close to 37.3 % ( this corresponds to M = 1420 signed regulations ) . 
+ According to our estimates the position of the plateau is M = M1 + M2 = 1420 , which is smaller than the theoretical maximum M < Mmax . 
+ The difference , although negligible in practice ( to obtain Mmax one has to perform N > 250 experiments ) , suggests that the plateau has a very weak slope . 
+ This means that contributions of different experiments to sign inference are weakly dependent . 
+ The values of M1 , M2 , p estimate the efficiency of our method : large p , M1 , M2 mean small number of expression profiles needed for inference . 
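+ The saturation formula can be explored numerically . The sketch below is a minimal illustration , not the authors' code : the function names are hypothetical , M1 = 609 and M1 + M2 = 1420 are taken from the text , and p is back-calculated here from the reported point M ( 30 ) = 1267 rather than being an estimate given by the authors . 

```python
def expected_inferred(n, m1, m2, p):
    """Average number of inferred signs after n experiments: M1 + M2 (1 - (1 - p)^n)."""
    return m1 + m2 * (1.0 - (1.0 - p) ** n)


def solve_p(n, m_n, m1, m2):
    """Invert the formula: the p for which expected_inferred(n, ...) equals m_n."""
    not_yet_inferred = 1.0 - (m_n - m1) / m2  # this quantity equals (1 - p)^n
    return 1.0 - not_yet_inferred ** (1.0 / n)


M1 = 609         # signs inferred by the naive algorithm (value from the text)
M2 = 1420 - M1   # plateau M1 + M2 = 1420 (value from the text)

# Assumption: p is back-calculated from the reported M(30) = 1267.
p_est = solve_p(30, 1267, M1, M2)

for n in (1, 30, 100):
    print(n, round(expected_inferred(n, M1, M2, p_est)))
```

+ Under these assumptions p comes out at roughly 0.05 per experiment , and the curve is already within a few signs of the plateau at N = 100 , in line with the saturation described above . 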
+ Inferring the core of the network
+ Obviously , not all interactions play the same role in the network . 
+ The core is a subnetwork that naturally appears for computational purposes and plays an important role in the system . 
+ It consists of all oriented loops and of all oriented chains leading to loops . 
+ All oriented chains leaving the core without returning are discarded when reducing the network to its core . 
+ Acyclic graphs and in particular trees have no core . 
+ The main property of the core is that if a system of qualitative equations has no solution , neither has the reduced system built from its core . 
+ Hence it corresponds to the most difficult part of the constraints to solve . 
+ It is obtained by reduction techniques that are very similar to those used in [ 28 ] ( see details in the Methods section ) . 
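+ The definition of the core can be sketched as an iterative pruning : an edge is dropped whenever its target has no outgoing edge left , until nothing changes . This is an assumption about how the reduction could be implemented ( the exact procedure is in the Methods section and in [ 28 ] ) ; the function name and the toy graph are hypothetical . 

```python
def network_core(edges):
    """Return the edges lying on oriented loops or on chains leading to loops.

    edges: iterable of (source, target) pairs.
    An edge survives only if its target still has an outgoing edge,
    so chains that leave the loops without returning are removed.
    """
    edges = set(edges)
    while True:
        sources = {u for u, _ in edges}
        kept = {(u, v) for (u, v) in edges if v in sources}
        if kept == edges:
            return edges
        edges = kept


# Toy graph: loop b <-> c, chain e -> a -> b leading to it, dead end c -> d.
toy = {("a", "b"), ("b", "c"), ("c", "b"), ("c", "d"), ("e", "a")}
print(sorted(network_core(toy)))
```

+ On an acyclic graph the pruning removes every edge , which matches the remark that trees have no core . 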
+ As an example , the core of E. coli network ( shown in Fig. 7 ) only has 28 nodes and 57 edges . 
+ We applied to this graph the same inference process as in the previous section . 
+ Not surprisingly , we noticed a rather different behaviour when inferring signs on a core graph than on a whole graph as demonstrated in Fig. 6 . 
+ In the former case , we needed many more experiments for the inference since the sets of expression profiles contained from N = 50 to 2000 random profiles . 
+ Two observations may be concluded . 
+ First , a greater number of experiments is required to reach a comparable percentage of inference ; the value of p is smaller than for the whole network . 
+ This confirms that the core is more difficult to infer than the rest of the network . 
+ Figure 7 : Core of E. coli network . 
+ It consists of all oriented loops and of all oriented chains leading to loops . 
+ The core contains the dynamical information of the network , hence sign edges are more difficult to infer . 
+ Second , Fig. 6 displays a much less continuous behaviour for the core . 
+ More precisely , when using the core , different perturbation experiments have a strongly variable impact on sign inference . 
+ For instance , the experimental maximum percentage of inference ( 27 signs over 58 ) can be obtained already from about 400 expression profiles , yet , most of the datasets with 400 profiles infer only 22 signs . 
+ This suggests that not only the core of the network is more difficult to infer , but also that a brute force approach ( multiplying the number of experiments ) may fail as well . 
+ This situation encourages us to apply experiment design and planning , that is , computational methods to minimise the number of perturbation experiments while inferring a maximal number of regulatory roles . 
+ This also illustrates why our approach is complementary to dynamical modelling . 
+ In the case of large scale networks , when an interaction stands outside the core of the graph , an inference approach is suitable for inferring the sign of the interaction . 
+ However , when an interaction belongs to the core of the network , more complex behaviours occur ( e.g. influences that depend on activation thresholds ) ; thus , a precise modelling of the dynamical behaviour of this part of the network should be performed [ 29 ] . 
+ Influence of missing data
+ In the previous paragraphs , we assumed that all products in the network were observed . 
+ That is , in each experiment each node is assigned a value in { + , 0 , - } . 
+ However , in real measurement devices , such as expression profiles , a part of the values is discarded due to technical reasons . 
+ A practical method for network inference should cope with missing data . 
+ We studied the impact of missing values on the percentage of inference . 
+ For this , we have considered a fixed number of expression profiles ( N = 30 for the whole E. coli network , N = 30 and N = 200 for its core ) . 
+ Then , we have randomly discarded a growing percentage of observed products in the profiles , and computed the percentage of inferred regulations . 
+ The resulting statistics are shown in Fig. 8 . 
+ In both cases ( whole network and core ) , the dependency between the average percentage of inference and the percentage of missing values is qualitatively linear . 
+ Simple arguments allow us to find an analytic dependency . 
+ If not observing one node of the network implies losing information on d interaction signs , we obtain the following linear dependency : M_i = M_i^{max} - d * f * M_{total} , where M_i is the number of inferred interactions with missing values , M_i^{max} is the number of inferred interactions for complete expression profiles ( no missing values ) , f is the fraction of unobserved nodes , and M_{total} is the total number of nodes . 
+ In order to keep Mi non negative , d must decrease with f . 
+ Our numerical results imply that the constancy of d and the linearity of the above dependency extend to rather large values of f . 
+ This indicates that our qualitative inference method is robust enough for practical use . 
+ For the whole network we estimated d = 0.35 , meaning that on average we lose one interaction sign for about 2.9 missing values . 
+ However , for the same number of expression profiles , the core of the network is more sensitive to missing data ( the value of d is larger , it corresponds to losing one sign for about 2.3 missing values ) . 
+ For the core , increasing the number of expression profiles increases d and hence the sensitivity to missing data . 
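+ The linear dependency and the reading of d can be checked with a few lines . This is only a sketch : the function names are hypothetical , and the node and sign counts in the example call are made-up placeholders , not values from the study . 

```python
def inferred_with_missing(m_max, d, f, m_total):
    """Linear model: signs still inferred when a fraction f of the nodes is unobserved."""
    return m_max - d * f * m_total


def missing_values_per_lost_sign(d):
    """If each unobserved node costs d signs, one sign is lost every 1/d missing values."""
    return 1.0 / d


# d = 0.35 is the estimate reported for the whole network.
print(round(missing_values_per_lost_sign(0.35), 1))

# Placeholder numbers (assumptions): 100 signs with complete data, 200 nodes, 10 % missing.
print(inferred_with_missing(m_max=100, d=0.35, f=0.10, m_total=200))
```

+ The same reading applies to the core : losing one sign for about 2.3 missing values corresponds to d of about 0.43 . 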
+ Application to E. coli network with a real compendium of expression profiles
+ We validated our method on the transcriptional E. coli network using the compendium of expression profiles publicly available in [ 9 ] and [ 22 ] . 
+ This time the network was composed of 1418 nodes and 2888 edges . 
+ The difference from the previous model lies in the sigma-factor -- gene interactions . 
+ Several profiles were available , including a reference condition . 
+ We grouped together the different profiles corresponding to the same experiment ; for each gene we calculated its average variation in the group of profiles . 
+ When profiles were time series , we considered that each time series ends with steady state and we used the last state in the time series . 
+ Then , we sorted the measured genes in four classes : 2-fold up-regulated , 2-fold down-regulated , non-observed , and zero variation ; this last class corresponds to non significantly ( 2-fold ) expressed genes . 
+ Only the first two classes were used in the algorithm . 
+ Therefore , there will be missing data : for some edges , neither the input nor the output are observed . 
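+ The discretisation into four classes can be sketched as follows . This is a minimal illustration under stated assumptions : each profile is taken to be a mapping from gene to expression ratio ( perturbed over reference ) , a missing measurement is encoded as None , and the gene names are hypothetical . 

```python
def classify(ratio, fold=2.0):
    """Map an expression ratio (perturbed / reference) to one of four classes."""
    if ratio is None:
        return "non-observed"
    if ratio >= fold:
        return "+"   # at least 2-fold up-regulated
    if ratio <= 1.0 / fold:
        return "-"   # at least 2-fold down-regulated
    return "0"       # no significant variation


def usable_observations(profile, fold=2.0):
    """Keep only the two classes used by the inference: '+' and '-'."""
    signs = {gene: classify(ratio, fold) for gene, ratio in profile.items()}
    return {gene: sign for gene, sign in signs.items() if sign in ("+", "-")}


# Hypothetical profile with one gene in each class.
profile = {"g1": 3.1, "g2": 0.4, "g3": 1.2, "g4": None}
print(usable_observations(profile))
```

+ Genes falling in the zero-variation or non-observed classes are simply absent from the result , which is how missing data arise for the edges attached to them . 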
+ Altogether , we have processed 226 sets of expression profiles corresponding to 61 different experiments ( over-expression , gene-deletion , and stress perturbation ) . 
+ We verified , for all the experiments , that they correspond to the comparison between one perturbed condition against a control condition with identical levels in all chemical components except for the one altered in the perturbed condition . 
+ We applied our inference algorithm twice : the first time we used the signed network in a pre-processing step , in order to clean the expression data . 
+ It appears that the signed network is consistent with only 31 of the 61 selected experiments . 
+ After discarding the inconsistent motifs from each experiment ( deleting observations that caused conflicts ) , we were left with 61 experiments which only contained the data consistent with the signed network . 
+ In these 61 experiments , on average 12.62 % of the network nodes were observed . 
+ When summing up all the observations , we obtained that 6.5 % ( 190 ) of the edges ( input and output ) were observed in at least one expression profile ; these represent the maximal set of signs that can be inferred at Steps 1 and 2 of our inference algorithm . 
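+ Counting the edges whose input and output are both observed in at least one profile can be sketched as below . The representation ( one set of observed nodes per profile ) and all names are assumptions made for illustration . 

```python
def observed_edges(edges, profiles):
    """Edges whose source and target are both observed in the same profile, for at least one profile.

    edges: iterable of (source, target) pairs.
    profiles: list of sets, each holding the nodes observed in one expression profile.
    """
    return {
        (u, v)
        for (u, v) in edges
        if any(u in observed and v in observed for observed in profiles)
    }


# Hypothetical toy data: three edges, two profiles.
edges = [("tf1", "g1"), ("tf1", "g2"), ("tf2", "g2")]
profiles = [{"tf1", "g1"}, {"tf2", "g1"}]
covered = observed_edges(edges, profiles)
print(len(covered) / len(edges))
```

+ Only the edges returned by such a count can receive a sign at Steps 1 and 2 , which is why this set bounds the number of possible predictions . 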
+ In order to test our algorithm we wiped out the information on edge signs and then tried to recover it . 
+ Since the profiles and network were consistent , our algorithm found no ambiguity and predicted 38 signs , i.e. 20 % of the edges observed at least once ( input and output ) . 
+ The naive inference algorithm inferred 31 signs . 
+ Hence , 18 % of the total of our predictions could not be obtained by the naive algorithm . 
+ Afterwards , we tested our algorithm with the full set of observations , no data being discarded . 
+ Conflicts appeared and we filtered our inference with different parameters on the full set of 61 experiments including inconsistencies . 
+ This time 12.9 % of the network products were observed on average . 
+ When summing all the observations , 17.2 % ( 497 ) of the edges ( input and output ) were observed in at least one expression profile . 
+ Several values of the filtering parameter k were used from k = 1 to k = 5 . 
+ Without filtering we predicted 152 signs of the network ( 30 % of the edges observed at least once ) , among them , 41.4 % were not inferred by the naive algorithm . 
+ We compared the predictions to the known interaction signs : 28.3 % of the predictions were false predictions . 
+ Sources of errors may lie in non-modelled interactions ( possibly effects of sigma-factors ) , or in using experiments on different E. coli strains . 
+ Filtering improves our score allowing us to retain only reliable predictions . 
+ Thus , for k = 5 , we inferred 41 signs , of which only 1 was an incorrect prediction ( 2.5 % of false prediction ) . 
+ We conclude that filtering is a good way to strengthen our predictions even when the model is not precise enough . 
+ We illustrated the effect of the filtering process in Fig. 9 . 
+ It should be noted that we obtained very similar results either by cleaning the data thanks to the signed network , or by using our filtering procedure . 
+ This is a particularly clear indication that this filtering procedure is an effective strategy to produce robust predictions . 
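+ The idea behind the filter can be sketched as a vote over experiments . This is a simplified illustration and an assumption about the procedure , not the authors' implementation : each experiment is reduced to a mapping from edge to predicted sign , and a prediction is kept only if at least k experiments support it and none contradicts it . All names are hypothetical . 

```python
from collections import defaultdict


def filter_predictions(per_experiment, k):
    """Keep the sign predictions confirmed by at least k experiments, without conflict.

    per_experiment: list of dicts mapping an edge to the sign ('+' or '-') it predicts.
    """
    support = defaultdict(lambda: defaultdict(int))
    for predictions in per_experiment:
        for edge, sign in predictions.items():
            support[edge][sign] += 1

    kept = {}
    for edge, counts in support.items():
        if len(counts) == 1:                      # no experiment disagrees
            (sign, n_experiments), = counts.items()
            if n_experiments >= k:
                kept[edge] = sign
    return kept


# Hypothetical predictions from three experiments.
runs = [
    {("tf1", "g1"): "+", ("tf2", "g1"): "-"},
    {("tf1", "g1"): "+", ("tf2", "g1"): "+"},
    {("tf1", "g1"): "+"},
]
print(filter_predictions(runs, k=3))
```

+ Raising k trades the number of predictions for their reliability , which is the behaviour reported above when going from no filtering to k = 5 . 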
+ Our algorithm also detected ambiguous modules in the network . 
+ There are seven MBM of Type I ( i.e. single incoming interactions ) ; four of them are also stated as ambiguous by the naive algorithm . 
+ In addition , there are 4 MBM of Type II that are not detected by the naive inference algorithm . 
+ All the ambiguities are shown in Fig. 10 . 
+ A list of experimental assays that yield ambiguities on each interaction is given in the Supplementary Web site . 
+ This analysis shows that there exist non-modelled interactions that balance the effects on the targets in the MBM detected . 
+ Figure 9 : Results of the inference algorithm applied to E. coli network with a compendium of 61 experiments not globally coherent . 
+ The dark and light regions of the bars correspond to false positive and validated predictions , respectively . 
+ Without filtering , there are 28.3 % of false positives . 
+ With filtering -- keeping only the sign predictions confirmed by k different experiments -- the rate of false positives decreases to 2.5 % . 
+ A real case : inference of signs in S. cerevisiae transcriptional regulatory network
+ We applied our inference algorithm to the transcriptional regulatory network of the budding yeast S. cerevisiae . 
+ Let us here briefly review the available sources that can be used to build the unsigned regulatory network . 
+ The experimental dataset proposed by Lee et al. [ 11 ] is widely used in the network reconstruction literature . 
+ It is a study conducted under nutrient rich conditions , and it consists of an extensive ChIP-chip screening of 106 TFs . 
+ Estimations regarding the number of yeast TFs that are likely to regulate specific groups of genes by direct binding to the DNA vary from 141 to 209 , depending on the selection criteria . 
+ Figure 10 : Interactions in the regulatory network of E. coli that are ambiguous with a compendium data of expression profiles . 
+ For each interaction , there exist at least two expression profiles that do not predict the same sign on the interaction . 
+ Dotted and filled lines represent the MBM of Type I and Type II , respectively . 
+ In follow-up papers of this work , the ChIP-chip analysis was extended to 203 yeast TFs in rich media conditions and 84 of these regulators in at least one environmental perturbation [ 12 ] . 
+ Analysis methods were refined in 2005 by MacIsaac et al. [ 13 ] . 
+ Other studies continued to work in this network using different approaches [ 10,14-16 ] . 
+ Here we selected two of these sources . 
+ All networks are provided in the Supplementary Web site . 
+ ( A ) The first network consists of the core of the transcriptional ChIP-chip regulatory network produced in [ 11 ] . 
+ Starting from the full network with a p-value of 0.005 , we reduced it to the set of nodes that have at least one output edge . 
+ This network was already studied in [ 28 ] . 
+ It contains 31 nodes and 52 interactions . 
+ ( B ) The second network contains all the transcriptional interactions between TFs shown by [ 11 ] with a p-value below 0.001 . 
+ It contains 70 nodes and 96 interactions . 
+ ( C ) The third network is the set of interactions among TFs as inferred in [ 13 ] from sequence comparisons . 
+ We have considered the network corresponding to a p-value of 0.001 and 2 bindings ( 83 nodes , 131 interactions ) . 
+ ( D ) The last network contains all the transcriptional interactions among genes and regulators shown by [ 11 ] with a p-value below 0.001 . 
+ It contains 2419 nodes and 4344 interactions . 
+ Inference process with gene-deletion expression profiles
+ We first applied our inference algorithm to the large scale network ( D ) using a panel of expression profiles for 210 gene-deletion experiments [ 30 ] . 
+ The information given by this panel is quite small , since 1.6 % of all the products in the network is on average observed , and 12 % of the edges ( input and output ) of the network are observed in at least one expression profile . 
+ Using these data , we inferred 162 regulatory roles . 
+ We validated our prediction with a literature-curated network on Yeast [ 31 ] . 
+ We found that among the 162 sign predictions , 12 were referenced with a known interaction in the database , and 9 of these had the correct sign . 
+ Gene-deletion expression profiles were used in order to compare our results to path analysis methods [ 20,23 ] since the latter can only be applied to knock-out data . 
+ Other sign-regulation inference methods needed either other sources of gene-regulatory information ( promoter binding information , protein-protein information ) , or time-series data to be performed [ 10,15,18 ] . 
+ First , we tested the consistency between the inferred network obtained from path analysis methods with the 210 gene-deletion experiments . 
+ We obtained that the network was inconsistent with 28 of the 210 experiments . 
+ Second , we compared the inference results for both methods , our approach and the path analysis method , obtaining in the latter that 234 roles of widely connected paths were inferred ; whereas with our method 162 roles were inferred , mainly localised in the branches of the network . 
+ Both results intersected on 17 interactions and no contradiction in the inferred role was reported . 
+ An illustration of these results is given in the Supplementary Web site . 
+ This suggests that our approach is complementary to path analysis methods . 
+ Our explanation is as follows : in [ 20,23 ] , network inference algorithms identify probable paths of physical interactions connecting a gene knockout to genes that are differentially expressed as a result of that knock-out . 
+ This leads to a search for the smallest number of interactions that carry the largest information in the network . 
+ Hence , inferred interactions are located near the core of the network , but not exactly in the core . 
+ On the contrary , as we already mentioned , the combinatorics of interactions in the core of the network are too intricate to be determined from a few hundreds of expression profiles with our algorithm , thus , we concentrate on interactions around the core . 
+ Inference with stress perturbation expression profiles
+ To overcome the problem posed by the small amount of information contained in [ 30 ] , we have used stress perturbation experiments . 
+ These data correspond to curated information available in SGD ( Saccharomyces Genome Database ) [ 32 ] . 
+ When time series profiles were available , we selected the last time expression array . 
+ Therefore , we collected and treated 15 experiments described in Table 2 . 
+ For each expression array , we sorted the measured genes in four classes : 2-fold up-regulated , 2-fold down-regulated , non-observed , and zero variation . 
+ Full datasets are available in the Supplementary Web site . 
+ As in the case of E. coli , it appeared that all the networks ( A ) , ( B ) , ( C ) , and ( D ) were not consistent with the whole set of expression arrays . 
+ Thus , when executing our algorithms we identified motifs that held ambiguities , and we marked them as MBM of type I-IV ( as described in our inference algorithm ) . 
+ We also generated a set of inferred signs and applied the filtered algorithm ( with filter k = 3 ) to the large scale network ( D ) . 
+ We obtained our total inference rate by adding the number of inferred signs fixed in a unique way to the number of non-repeated interactions in the MBM detected , and dividing it by the total number of edges in the network . 
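+ The rate described here is a simple ratio ; the sketch below spells it out . The data layout ( a mapping for uniquely inferred signs , one set of edges per MBM ) and the toy values are assumptions made for illustration . 

```python
def inference_rate(unique_signs, mbms, total_edges):
    """(uniquely inferred signs + distinct interactions found in MBM) / total edges.

    unique_signs: dict mapping an edge to its inferred sign.
    mbms: list of sets of edges, one set per detected MBM; an edge shared by
          several MBM is counted once.
    """
    mbm_edges = set()
    for module in mbms:
        mbm_edges.update(module)
    return (len(unique_signs) + len(mbm_edges)) / total_edges


# Hypothetical toy values: 1 unique sign, 2 overlapping MBM covering 3 distinct edges, 10 edges.
unique = {("a", "b"): "+"}
modules = [{("c", "d"), ("e", "d")}, {("e", "d"), ("f", "g")}]
print(inference_rate(unique, modules, total_edges=10))
```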
+ In Table 3 we illustrate the inference rate obtained for each of the networks . 
+ Depending on the network , the inference rate varies from 19 % to 37 % ; thus , they are similar to the theoretical rates obtained for E. coli network even with a small number of perturbation experiments ( 14 or 15 ) . 
+ All experiments contain information on steady state shift and their curated data is available in SGD (Saccharomyces Genome Database) [32].
+ We validated the inferred interactions comparing them to the literature-curated network published in [ 31 ] . 
+ We obtained 631 predictions when no filtering is applied . 
+ Furthermore , among the 198 interactions predicted with a filter parameter k = 3 , 19 were referenced with a known interaction in the database , and only 1 prediction had a wrong sign . 
+ As in the case of E. coli , we conclude that filtering is a good way to produce extremely robust predictions . 
+ Additionally , we compared our predictions to the naive inference algorithm finding that the naive algorithm usually predicts half of the signs that we obtain . 
+ In Fig. 11 we illustrate the inferred interactions for Network ( B ) . 
+ As already mentioned , the algorithm identified a large number of ambiguities . 
+ The exhaustive list of MBM is given in the Supplementary Web site and the Type I modules of size 2 found for the networks ( A ) , ( B ) , and ( C ) are detailed in Table 4 . 
+ We noticed that the MBM of Type I were detected in the four networks , whereas the MBM of Type II-IV were only detected , in large numbers , for Network ( D ) , Type II MBM being the most numerous ( 85.4 % ) . 
+ For each MBM , a precise biological study of the species should make it possible to understand the origin of the ambiguity : erroneous expression data , missing interactions in the model , or context-dependent regulations . 
+ Contribution of expression profiles to the inference
+ Analysing only the sign inference process on the global network ( D ) , we wish to estimate how the 14 experiments used influence the signs inferred in a unique way ( + or - ) . 
+ On that account we address the following question : assuming that all the roles inferred in Step 1 of our inference algorithm are correct , which experiment marks the most inferred roles as inconsistent ( i.e. generates the most MBM ) ? 
+ Therefore , we classified the 14 experiments according to the MBM of Type II-IV generated per experiment . 
+ MBM of Type I are not included in this computation , for they are inferred in Step 1 of the algorithm . 
+ The results of this classification are shown in Fig. 12 . 
+ The fourth chart illustrates that the real contribution of each expression profile does not depend on the amount of observed genes it contains . 
+ Discussion
+ Predicting from a `` small '' number of expression profiles
+ In principle , inferring the functional effect of regulations could be done using general reconstruction methods . 
+ The most outstanding approaches in this domain include Bayesian networks [ 33 ] , linear ordinary differential equations ( ODE ) [ 34,35 ] and correlation/causal networks [ 14,16,36 ] ( see [ 10 ] for a review , and a comparison on several datasets ) . 
+ These are quantitative methods which are carefully designed to cope with the high level of noise that is generally observed in expression data . 
+ They rely either on an explicit parametric modelling of noise distribution ( like in Bayesian networks ) , or on robust statistical estimators for the network and its kinetic parameters . 
+ Figure 12 : Classification of the 14 experiments used in the sign-inference process for the global transcriptional network ( 2419 nodes , 4344 edges ) . 
+ The experiments are represented by their identifier ( see Table 2 ) . 
+ Each experiment has a twofold contribution : it spots inconsistent modules ( MBM that are further excluded from inference ) and it predicts interaction roles . 
+ Some experiments have more predictive power , just because they include more genes . 
+ In order to normalise the predictive power , we divided the percentage of predictions by the percentage of observed nodes . 
+ For each experiment we have estimated : ( A ) Number of significant ( 2-fold ) up/down-regulated genes . 
+ ( B ) Percentage of edges in the spotted MBMs of type II-IV divided by the percentage of observed genes . 
+ ( C ) Percentage of inferred signs divided by the percentage of observed genes . 
+ ( D ) Real contribution of each experiment , calculated by subtracting C ( inference ) from B ( eliminated inconsistency ) ; negative values correspond to experiments whose main role is to spot ambiguities . 
+ The main limitation of these approaches is the number of independent samples they require in order to be properly used . 
+ It is often stated [ 10,36 ] that a minimum of 100 to 300 expression profiles are needed for the estimation procedure . 
+ While there exist a few datasets of such size , the usual number of available profiles for a given biological system is much smaller . 
+ Our approach is meant to be used when the number of profiles ranges from 1 to a couple of hundreds , and should thus be seen as complementary to quantitative methods . 
+ Indeed our simulations on E. coli network show that one can characterise about 30 % of the regulations from 30 expression profiles . 
+ We additionally showed that this is close to the theoretical limit of our approach . 
+ This result was confirmed using expression data on the same network : we infer 20 % of the regulations whose input and output are simultaneously observed in at least one experiment , using 61 expression profiles . 
+ Generating accurate predictions
+ The problem of inferring functional effect of transcription factors was specifically addressed by Yeang and colleagues [ 20,23 ] , using a probabilistic discrete model . 
+ In this approach , one identifies probable paths of physical interactions connecting a gene knock-out to genes that are differentially expressed as a result of that knock-out . 
+ Predictions correspond to the signs found in models of maximum likelihood . 
+ More generally , most reconstruction methods are based on computing an `` optimal '' model with respect to the data . 
+ This raises two main issues . 
+ First , the underlying optimization problems are often non convex , and finding a global optimum is a very difficult computational task . 
+ In practice , most algorithms only guarantee to find a local optimum , which should be cautiously examined before being reported as a prediction . 
+ Second , even if a global optimum is found , it is important ( but computationally difficult ) to check that there is no slightly sub-optimal model that yields very different predictions . 
+ In other terms , it is necessary to evaluate the robustness of the predictions . 
+ In our approach , we describe the ( possibly huge ) set of models that are consistent with the data , then look for invariants in this set . 
+ This means that our predictions are compatible with all feasible models . 
+ In order to cope with experimental noise , we combine this strategy with a filtering procedure , which selects predictions that agree with a minimal number of expression profiles . 
+ This led us to very accurate predictions , as it was shown on data from E. coli and yeast . 
+ We compared our inference approach to the path analysis method by Yeang and colleagues [ 20,23 ] . 
+ We found that both algorithms infer a comparable number of regulations ( 234 for path analysis , 162 for our method ) , and that the predictions agree on the 17 interactions inferred by both . 
+ We noticed that the predictions are located in different parts of the network , depending on the algorithm : path analysis tends to infer signs in highly connected regions , while our approach infers signs on regulations acting on small in-degree nodes . 
+ Another difference is that path analysis requires expression profiles from gene-deletion experiments , whereas our method gives better results with stress perturbation experiments ( though it can be applied to both types of experiment ) . 
+ Sign inference and network topology
+ Using simulations , we evaluated the dependence between the number of available expression profiles and the number of signs that can be inferred from them . 
+ Not surprisingly , we noticed that the topology of the regulatory network has a strong influence on the estimated relationship . 
+ This was illustrated by computing statistics on both a complete regulatory network and its core . 
+ The complete network is characterised by an over-representation of feedback-free regulatory cascades , which are controlled by a small number of TFs . 
+ In this setting , the number of inferred signs grows almost continuously with the number of observations . 
+ In contrast , the core network does not obey the simple law `` the more you observe , the better '' , some expression profiles being clearly more informative than others . 
+ Additionally , in these core networks an unfeasible number of experiments is necessary to infer a small number of signs with high probability . 
+ For these core networks , two different strategies may be adopted . 
+ First , to build a more accurate model for these restricted subnetworks using dynamic modelling techniques ( see [ 29 ] for a review ) . 
+ Second , to develop experiment planning in our qualitative framework : given some control parameters , how to find the most informative experiments while keeping their number as low as possible ? 
+ Conclusion
+ In this work we proposed a discrete approach for a particular case of reconstruction problem : given a set of regulations between genes , and a set of expression profiles , determine the functional effect of each regulation , as activation or inhibition . 
+ Our approach is based on a qualitative modelling framework , that was initially introduced to check the consistency between a regulatory network and expression data [ 24,25 ] . 
+ This framework is based on a rule , which basically says that if the expression of a gene varies between two conditions , then this should be accounted for by the variation of at least one of its predecessors . 
+ Here we applied this approach to predict the functional effect of transcription factors on their target genes . 
+ While intuitive and simple , the qualitative rule we propose can be used to infer a significant number of regulatory effects from a reasonable number of expression profiles . 
+ As shown using data on E. coli and yeast , the predictions are particularly reliable , especially when they are validated with our filtering procedure . 
+ Furthermore , our algorithms can handle datasets of realistic size . 
+ It should be noted that computing the predictions presented in this work requires solving thousands of NP-hard problems ( more precisely , constraints with variables on a finite domain ) . 
+ Each of these problems has several thousands of variables . 
+ Nevertheless , our algorithms are exact and compute the predictions in no more than an hour using a standard desktop PC . 
+ This means that they are able to cope with system-wide data in a fairly reasonable amount of time . 
+ Due to the structure of the algorithms , we are confident that they can handle even larger datasets in less time , by distributing the computations on several machines . 
+ From our results on yeast , it appears that a significant proportion of the network -- as given by ChIP-chip data -- is not compatible with the available expression profiles . 
+ As explained in the Results section , these data are discarded from the analysis , in order to compute safe predictions -- but at the expense of a loss of information . 
+ The subject of our current work is to develop an improved notion of prediction , that copes better with inconsistent network and data . 
+ The goal is to include inconsistent data in the inference process , while preserving the reliability of the predictions . 
+ where X_i^k stands for the sign of the variation of species i in experiment k , and S_ji the sign of the influence of species j on species i . 
+ Recall that the graph G itself comes from ChIP-chip experiments or sequence analysis . 
+ Using expression arrays , we obtain an experimental value for some variables X_i^k , which will be denoted x_i^k ; more generally uppercase ( resp. lowercase ) letters will stand for variables of the systems ( resp. constants + , - or 0 ) . 
+ A single equation in the system ( 4 ) can be viewed as a predicate P_{i,k} ( X , S ) where i denotes a node in the graph and k one of the r available experiments . 
+ If the value for some variables in the equation is known , the predicate resulting from their instantiation will be denoted P_{i,k} ( X , S ) [ x^k , s ] . 
+ Our problem can now be stated as follows : given a set of expression profiles x1 , ... , xr , decide if the predicate : 
+ Decision diagram encoding
+ In a previous work [ 26 ] , we showed how the set of solutions of a qualitative system can be computed as a decision diagram [ 37 ] . 
+ A decision diagram is a data structure meant to represent functions on finite domains ; it is widely used for the verification of circuits or network protocols . 
+ Using such a compact representation of the set of solutions , we proposed efficient algorithms for computing solutions of the systems , hard components , and other properties of a qualitative system . 
+ Back to our problem : in order to predict the regulatory role of TFs on their target genes , it is enough to compute the decision diagram representing the predicate ( 5 ) , and compute its hard components as proposed in [ 26 ] . 
+ This approach is suitable for systems of at most a couple of hundred variables . 
+ Above this limit , the decision diagram is too large in memory complexity . 
+ In our case however , we consider systems of about 4000 variables at most , which is far too large for the above mentioned algorithms . 
+ In order to cope with the size of the problem , we propose to investigate a particular case , when all species are observed , in all experiments . 
+ In this case , i ≠ j implies that Pi , k ( X , S ) [ xk ] and Pj , k ( X , S ) [ xk ] share no variables . 
+ This means that P may be satisfied if and only if each predicate may be satisfied . 
+ As a consequence , a variable Sji is a hard component of P if and only if it is a hard component of Pi , · , where Pi , · denotes the constraints which relate species i to its predecessors in G for all experiments . 
+ The number of variables in Pi , · is exactly the in-degree of species i in G , which is at most 10 -- 20 in biological networks . 
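+ The fully observed case can be sketched as a brute-force enumeration per node , which is affordable because the in-degree is small . The sketch below is only an illustration : it assumes that the qualitative equation of a node states that the observed sign must agree with the sign-algebra sum of its signed inputs ( opposite inputs give an undetermined sum that matches anything ) , and the names node_consistent and hard_components_of_node are hypothetical . 

```python
from itertools import product


def node_consistent(x_i, terms):
    """Assumed sign-algebra check: x_i must agree with the sum of `terms`.

    All values are in {-1, 0, +1}. A sum mixing '+' and '-' is undetermined
    and therefore compatible with any observed x_i.
    """
    has_pos = any(t > 0 for t in terms)
    has_neg = any(t < 0 for t in terms)
    if has_pos and has_neg:
        return True
    if has_pos:
        return x_i == 1
    if has_neg:
        return x_i == -1
    return x_i == 0  # every input is unchanged


def hard_components_of_node(i, preds, experiments):
    """Return {j: sign} for the in-edges j -> i whose sign is identical in
    every sign assignment consistent with all (fully observed) experiments.
    Returns None when no assignment is consistent.
    """
    solutions = []
    for signs in product((1, -1), repeat=len(preds)):
        if all(
            node_consistent(x[i], [s * x[j] for s, j in zip(signs, preds)])
            for x in experiments
        ):
            solutions.append(signs)
    if not solutions:
        return None
    hard = {}
    for pos, j in enumerate(preds):
        values = {sol[pos] for sol in solutions}
        if len(values) == 1:  # same sign in all solutions: hard component
            hard[j] = values.pop()
    return hard
```

+ Each experiment is a dictionary mapping a species name to its observed variation ; calling the function once per node mirrors the independence argument above . 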
+ As soon as some species are not observed in some experiment , the predicates Pi , · share some variables , and studying them separately is no longer guaranteed to find all hard components . 
+ A brief investigation showed ( data not shown ) that , due to the topology of the graph , most of the equations are no longer independent , even with few missing nodes . 
+ Note however that any hard component of Pi , · is still a hard component of P . 
+ The same statement holds for 
+ In practice , this algorithm is very effective in terms of computation time and number of hard components found . 
+ However , as already stated , it is not guaranteed to find all hard components of P . 
+ This is what motivates the technique described in the next paragraph . 
+ Solving with Answer Set Programming
+ In order to solve large qualitative systems , we also tried to encode the problem as a logic program , in the setting of answer set programming ( ASP ) . 
+ While decision diagrams represent the set of all solutions , finding a model for a logic program provides one solution . 
+ In order to find hard components , it is enough to check for each variable V , if there exists a solution such that V = + and another solution such that V = - . 
+ The ASP program we used in order to solve the qualitative system is given in supplementary materials . 
+ In the following we will denote by asp_solve ( P ) the call to the ASP solver on the predicate P . 
+ The returned value is an admissible valuation if there is one , or ⊥ otherwise . 
+ The complete algorithm is reported below 
+ We use clasp for solving ASP programs [ 38 ] , which performs astonishingly well on our data . 
+ The procedure described in Algorithm 3 is particularly efficient in finding non hard components : generating one solution may be enough to prove non hardness of many variables at a time . 
+ To sum up , in order to solve a system of qualitative equations ( 4 ) with only partial observations , we use Algorithm 2 first and thus determine most ( if not all ) hard components . 
+ Then , Algorithm 3 is used for the remaining components , which are nearly all non hard . 
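+ The solver-based search for hard components can be sketched as follows . This is a minimal illustration and not the authors ' Algorithm 3 : the callback solve plays the role of asp_solve , and make_bruteforce_solver is a hypothetical stand-in that enumerates assignments instead of calling clasp . 

```python
from itertools import product


def hard_components(variables, solve):
    """Find the variables that take the same sign in every solution.

    `solve(fixed)` must return a dict variable -> +1/-1 satisfying the system
    with the assignments in `fixed` imposed, or None if there is none.
    """
    first = solve({})
    if first is None:
        return None  # the system itself is unsatisfiable
    candidates = set(variables)
    hard = {}
    while candidates:
        v = candidates.pop()
        other = solve({v: -first[v]})
        if other is None:
            hard[v] = first[v]  # no solution with the opposite sign
        else:
            # One extra solution rules out every candidate that differs in it.
            candidates -= {u for u in candidates if other[u] != first[u]}
    return hard


def make_bruteforce_solver(variables, is_admissible):
    """Hypothetical stand-in for an ASP solver: exhaustive enumeration."""

    def solve(fixed):
        for values in product((1, -1), repeat=len(variables)):
            s = dict(zip(variables, values))
            if all(s[k] == val for k, val in fixed.items()) and is_admissible(s):
                return s
        return None

    return solve
```

+ The pruning step reflects the remark above that a single solution may prove the non hardness of many variables at once . 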
+ Reduction technique
+ As mentioned in the Result section , interaction graphs may be reduced in a way that preserves the satisfiability of the associated qualitative system . 
+ Consider a graph G with defined signs on its edges . 
+ If some node n has no successor , then delete it from G . Note that any solution of the qualitative system associated to the new graph can be extended to a solution of the system associated to G . 
+ The same statement holds if one iteratively deletes all nodes in the graph with no successor . 
+ The result of this procedure is the subgraph of G such that any node is either on a cycle , or has a cycle downstream . 
+ We refer to it as the core of the interaction graph . 
+ The core of an interaction graph corresponds to the most difficult part to solve , because extending a solution for the core to the entire graph can be done in polynomial time , using a breadth-first traversal . 
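+ The reduction to the core can be sketched in a few lines . The function name interaction_graph_core and the edge-list representation ( pairs of source and target ) are assumptions made for this illustration ; edge signs are omitted because they do not affect the reduction . 

```python
def interaction_graph_core(edges):
    """Iteratively delete nodes without successors.

    `edges` is an iterable of (source, target) pairs. Returns the node set
    and edge set of the core: every remaining node lies on a cycle or has a
    cycle downstream.
    """
    remaining = set(edges)
    nodes = {n for edge in remaining for n in edge}
    while True:
        with_successor = {src for src, _ in remaining}
        leaves = nodes - with_successor
        if not leaves:
            return nodes, remaining
        nodes -= leaves
        # Edges pointing to a deleted node disappear with it.
        remaining = {(s, t) for s, t in remaining if t not in leaves}
```

+ An acyclic graph reduces to the empty core , in which case the whole system can be solved by propagation alone . 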
+ Diagnosis for noisy data
+ When working with real-life data , it may happen that the predicate P defined in Eq . ( 5 ) can not be satisfied . 
+ This may be due to three ( non exclusive ) reasons : 
+ • the sign on an edge depends on the state of the system 
+ In the third case , the conditions for deriving Eq . ( 1 ) are not fulfilled for one node and its qualitative equation should be discarded . 
+ This , however , does not affect the validity of the remaining equations . 
+ In all cases , isolating the cause of the problem is a hard task . 
+ We propose the following diagnosis approach : as P is a conjunction of smaller predicates , it might happen that some subsets of these predicates are already unsatisfiable on their own . 
+ Our strategy is then to find `` small '' subsets of predicates which can not be satisfied . 
+ A particularly interesting feature of this approach is that , by selecting subsets of the Pi , · predicates , the result might directly be interpreted and visualised as a subgraph of the original model . 
+ How to determine if a sign can be inferred
+ In the Results section , we have seen some examples showing that even when all feasible observations are available , it might not be possible to infer all signs in the interaction graph . 
+ Whether or not a sign can be inferred depends on the topology of the graph , and also on the actual signs on interactions . 
+ In practice , it is thus impossible to tell from the unsigned graph only if a sign can be recovered . 
+ However , it is still interesting to evaluate on fully signed interaction networks which part can be inferred . 
+ A trivial algorithm for this consists in explicitly generating all feasible observations and using the algorithms described above . 
+ This is unfeasible due to the number of observations . 
+ With the notations introduced above , consider an observation X and sign variables S for an interaction graph . 
+ Pi ( X , S ) denotes the constraints that link the variation of a node i to that of its predecessors given the signs of the interactions . 
+ Moreover , the real signs in the graph are denoted by s. For each node i , we build the predicate giving the feasible observations on node i and its predecessors , given the rest of the graph and the real signs s 
+ Oi(X) = ∃Xj , j ∉ {i} ∪ pred(i) : ∧_{1 ≤ l ≤ n} Pl(X, s)
+ Then , the constraint that we can derive on S variables is : for any observation X that is feasible Pi ( X , S ) should hold . 
+ This constraint is more formally defined by 
+ Ci(S) = ∀XOi(X) ⇒Pi(X, S)
+ Finally , the hard components of Ci are exactly the signs that can be inferred using all feasible observations . 
+ Let us sum up the procedure : 
+ 1. compute P(X, S) = ∧_{1 ≤ i ≤ n} Pi(X, S)
+ 2. compute Oi from P and the actual signs s
+ 3. compute Ci , the constraints on signs given all feasible observations
+ 4. compute the hard components of Ci , which are exactly the signs that can be inferred . 
+ If it is not possible to compute P ( X , S ) ( mainly because the interaction graph is too large ) , we use a more sophisticated approach based on a modular decomposition of the interaction graph . 
+ The resulting algorithm , as well as all inference algorithms , experimental data , and the results obtained for the S. cerevisiae and E. coli regulatory networks can be found at : http://www.irisa.fr/symbiose/interactionNetworks/supplementaryInference.html . 
+ Authors' contributions
+ PV participated in designing the algorithms described in the Methods section and in performing the simulations . 
+ CG designed the algorithms described in the Results section , and performed the analysis on E. coli and yeast data . 
+ OR made the statistical modeling in the Results section . 
+ MLB participated in implementing the algorithms on decision diagrams . 
+ All authors participated in writing the manuscript , read and approved its final state . 
+ Appendix
+ Algorithm 2
+ Heuristic for finding hard components in large interaction networks with many expression profiles . 
+ Algorithm 3
+ Exact algorithm for finding the set of hard components of P , based on logic programming . 
+ Acknowledgements
+ The authors are particularly grateful to B. Kauffman , M. Gebser , and T. Schaub from the University of Potsdam for their help on CLASP software . 
+ They also wish to thank the referees for their interesting and constructive remarks .
\ No newline at end of file
--- a/data/TEXT_FILES/notuseful_txt/18697768.txt 0 → 100644
+++ b/data/TEXT_FILES/notuseful_txt/18697768.txt 0 → 100644
+ BIOINFORMATICS ORIGINAL PAPER
+ Edward Wijaya1 ,2 , Siu-Ming Yiu3 , Ngo Thanh Son1 , Rajaraman Kanagasabai2 and Wing-Kin Sung1 ,4 , ∗ 1School of Computing , National University of Singapore , Singapore 119260 , 2Institute for Infocomm Research , 21 Heng Mui Keng Terrace , Singapore 119613 , 3Department of Computer Science , The University of Hong Kong , Pokfulam Road , Hong Kong and 4Genome Institute of Singapore , 60 Biopolis Street , # 02-01 Genome , Singapore 138672 Received on May 9 , 2008 ; revised on August 3 , 2008 ; accepted on August 7 , 2008 Advance Access publication August 12 , 2008 Associate Editor : Alex Bateman 
+ ABSTRACT
+ Motivation : Locating transcription factor binding sites ( motifs ) is a key step in understanding gene regulation . 
+ Based on Tompa 's benchmark study , the performance of current de novo motif ﬁnders is far from satisfactory ( with sensitivity ≤ 0.222 and precision ≤ 0.307 ) . 
+ The same study also shows that no motif ﬁnder performs consistently well over all datasets . 
+ Hence , it is not clear which ﬁnder one should use for a given dataset . 
+ To address this issue , a class of algorithms called ensemble methods have been proposed . 
+ Though the existing ensemble methods overall perform better than stand-alone motif ﬁnders , the improvement gained is not substantial . 
+ Our study reveals that these methods do not fully exploit the information obtained from the results of individual ﬁnders , resulting in minor improvement in sensitivity and poor precision . 
+ Results : In this article , we identify several key observations on how to utilize the results from individual ﬁnders and design a novel ensemble method , MotifVoter , to predict the motifs and binding sites . 
+ Evaluations on 186 datasets show that MotifVoter can locate more than 95 % of the binding sites found by its component motif ﬁnders . 
+ In terms of sensitivity and precision , MotifVoter outperforms stand-alone motif ﬁnders and ensemble methods signiﬁcantly on Tompa 's benchmark , Escherichia coli , and ChIP-Chip datasets . 
+ MotifVoter is available online via a web server with several biologist-friendly features . 
+ Availability : http://www.comp.nus.edu.sg/~bioinfo/MotifVoter Contact : ksung@comp.nus.edu.sg Supplementary information : Supplementary data are available at Bioinformatics online . 
+ 1 INTRODUCTION
+ Understanding the regulatory mechanism of genes is one of the major challenges faced by biologists today . 
+ Central to this problem is the identiﬁcation of recurring patterns ( motifs ) in regulatory sequences , which represent binding sites of a transcription factor . 
+ In general , one would like to identify transcription factors whose binding sites are statistically over-represented in the promoter of a set of co-regulated genes . 
+ Many motif finders have been proposed to tackle this problem using different approaches , such as profile-based methods ( Nimwegen , 2007 ) and consensus-based methods ( Eskin and Pevzner , 2002 ; Pavesi et al. , 2001 ; Wijaya et al. , 2007 ) . 
+ However the existing tools are still not effective for discovering motifs ( Das and Dai , 2007 ; Tompa et al. , 2005 ) . 
+ For example , as shown in Tompa et al. 's evaluation , even the best performing algorithm has sensitivity < 13 % and precision < 35 % ( sensitivity is percentage of true nucleotides that are predicted and precision is the percentage of predicted nucleotides that are true ) . 
+ To deal with this issue , several studies ( Harbison et al. , 2004 ; Hu and Kihara , 2005 ; MacIsaac and Fraenkel , 2006 ) have suggested that combining the results of multiple motif finders can provide better results for motif finding . 
+ For example , Harbison et al. ( 2004 ) observed that different motif finders have different strengths . 
+ They successfully identified more binding sites by combining the results of six motif finders than by using only a single finder . 
+ In fact , the benchmark datasets from Tompa et al. ( 2005 ) also support this . 
+ By simply taking the union of all binding sites predicted by 10 selected motif finders , the sensitivity more than doubles relative to each selected motif finder . 
+ However , the union of all predicted sites could contain a lot of noise . 
+ It is not trivial to distinguish the real binding sites from the noise . 
+ Six ensemble methods have been developed . 
+ SCOPE ( Chakravarty et al. , 2007 ) and BEST ( Dongsheng et al. , 2005 ; Jensen and Liu , 2006 ) rerank all motifs predicted by individual ﬁnders based on a certain scoring function and report the top motif . 
+ WebMotifs ( Gordon et al. , 2005 ; Romer et al. , 2007 ) , MultiFinder ( Huber and Bulyk , 2006 ) and RGSMiner ( Huang et al. , 2004 ) assume motifs given by the consensus of several motif ﬁnders are likely to be the real motifs . 
+ They cluster the motifs and report one motif from the best cluster . 
+ All above methods select one representative motif among the predicted motifs from the individual ﬁnders . 
+ If none of the motifs from the ﬁnders can capture the binding sites accurately , the performance of these methods will suffer . 
+ EMD ( Hu et al. , 2006 ) goes a step further and considers the binding sites predicted by the motifs . 
+ Motifs of the same rank from different component motif ﬁnders are grouped together . 
+ For each group , based on the binding sites predicted by all the motifs in the group , new motifs are derived . 
+ Though these ensemble methods indeed help to improve the performance of motif ﬁnding , the improvement is not signiﬁcant . 
+ For example , in Tompa 's benchmark and Escherichia coli datasets , the average sensitivity is only improved by 62 % but the average precision is reduced by 15 % . 
+ We believe that some important aspects have been overlooked by existing ensemble methods : 
+ These observations are also supported by Hu and Kihara ( 2005 ) and MacIsaac and Fraenkel ( 2006 ) . 
+ In this article , we show how to formalize these observations and derive two criteria , namely discriminative and consensus criteria , to combine results from multiple motif ﬁnders . 
+ Extensive evaluation shows that our proposed ensemble method , MotifVoter , can extract almost all information from multiple motif ﬁnders . 
+ On 186 datasets for Tompa 's benchmark ( Tompa et al. , 2005 ) and E.coli ( Salgado et al. , 2004 ) , MotifVoter retains more than 95 % of true binding sites predicted by at least one individual motif ﬁnder . 
+ This implies that MotifVoter improves the sensitivity by 120 % and precision by 77 % when compared with the best individual motif ﬁnder and it also shows an improvement of 90 % in sensitivity and 135 % in precision when compared to the existing ensemble methods . 
+ Applying MotifVoter on 140 yeast and mammalian ChIP-Chip datasets , we demonstrate that MotifVoter is scalable . 
+ Furthermore , in these ChIP-Chip datasets , we notice that the motifs reported by MotifVoter are more similar to true motifs when compared with those reported by existing stand-alone motif ﬁnders and ensemble methods . 
+ It may be noted that , unlike other ensemble methods that only work on the predicted motifs , MotifVoter and EMD work on both the predicted motifs and binding sites . 
+ MotifVoter differs from EMD in the discriminative and consensus criteria used for grouping ( clustering ) the motifs . 
+ First , the discriminative criterion requires the motifs in the selected cluster to share as many binding sites as possible , while requiring that motifs outside the cluster share none or few binding sites . 
+ Second , the consensus criterion requires the motifs in the selected cluster to be contributed by as many motif ﬁnders as possible . 
+ We propose a scoring function that captures the two criteria and derive an efﬁcient algorithm to select a cluster that optimizes this scoring function . 
+ In addition to these two criteria , MotifVoter selects only those binding sites of high conﬁdence to construct the new motif , rather than using all binding sites predicted by motifs in the same group ( cluster ) . 
+ Compared to existing ensemble schemes , MotifVoter has the following advantages : 
+ 3 METHODS
+ MotifVoter uses 10 basic motif ﬁnders . 
+ The ﬁrst group consists of three motif ﬁnders based on ( l , d ) model , namely MITRA ( Eskin and Pevzner , 2002 ) , Weeder ( Pavesi et al. , 2001 ) , and SPACE ( Wijaya et al. , 2007 ) . 
+ The second group consists of seven motif ﬁnders based on PWM model , namely AlignACE ( Hughes et al. , 2000 ) , ANN-Spec ( Workman and Stormo , 2000 ) , BioProspector ( Liu et al. , 2001 ) , Improbizer ( Ao et al. , 2004 ) , MDScan ( Liu et al. , 2002 ) , MEME ( Bailey and Elkan , 1995 ) and MotifSampler ( Thijs et al. , 2001 ) . 
+ The details of these component motif ﬁnders and parameter descriptions can be found in Supplementary Material 10 . 
+ Section 3.1 formally describes the discriminative and consensus criteria used in the motif ﬁltering step . 
+ Then , in Section 3.2 , we present the heuristic algorithm for ﬁnding a cluster of motifs that satisfy the two criteria . 
+ The details of the sites extraction step are given in Section 3.3 . 
+ Finally , Section 3.4 describes the performance evaluation measures we used for benchmarking MotifVoter against other existing methods . 
+ 3.1 The discriminative and consensus criteria
+ Given two motifs x and y , we ﬁrst describe how to measure their similarity based on the binding sites deﬁned by these motifs . 
+ Let I ( x ) be the set of binding sites deﬁned by motif x. Let I ( x ) ∩ I ( y ) be the set of regions covered by at least one site in x and one site in y. Let I ( x ) ∪ I ( y ) be the set of regions covered by any site of x or y . 
+ We denote the total number of nucleotides of all the regions in I ( x ) ∩ I ( y ) and I ( x ) ∪ I ( y ) , as | I ( x ) ∩ I ( y ) | and | I ( x ) ∪ I ( y ) | , respectively . 
+ The similarity of x and y , denoted sim ( x , y ) , is expressed as | I ( x ) ∩ I ( y ) | / | I ( x ) ∪ I ( y ) | . 
+ Note that 0 ≤ sim ( x , y ) ≤ 1 and sim ( x , x ) = 1 . 
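+ The similarity measure can be illustrated with a short sketch at the nucleotide level . The representation of a site as a triple ( sequence identifier , start , end ) with an exclusive end , and the helper name covered_positions , are assumptions made for this example . 

```python
def covered_positions(sites):
    """Set of (sequence_id, position) pairs covered by the given sites.

    Each site is (sequence_id, start, end) with `end` exclusive.
    """
    return {(seq, pos) for seq, start, end in sites for pos in range(start, end)}


def sim(sites_x, sites_y):
    """|I(x) ∩ I(y)| / |I(x) ∪ I(y)| counted in nucleotides."""
    a = covered_positions(sites_x)
    b = covered_positions(sites_y)
    union = a | b
    return len(a & b) / len(union) if union else 0.0
```

+ Two motifs whose sites overlap on 5 of the 15 nucleotides they jointly cover obtain a similarity of 1/3 , and a motif compared with itself obtains 1 . 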
+ Now , we deﬁne the scoring function to capture the discriminative criterion . 
+ Given m motif ﬁnders and each motif ﬁnder reports its top-n candidate motifs , there will be a set P of mn candidate motifs . 
+ Among all the candidate motifs in P , some of them will approximate the real motif while the others will not . 
+ We would like to identify the subset X of P such that the candidate motifs in X are likely to approximate the real motif . 
+ The principal idea is that if the candidate motifs in X can model the real motif , motifs in X should be highly similar while motifs outside X should be distant from one another ( discriminative criterion ) . 
+ Let X be some subset of candidate motifs of P . 
+ The mean similarity among the candidate motifs in X , denoted as sim ( X ) , is the average of sim ( x , y ) over all pairs x , y ∈ X . 
+ The function w ( X ) , which is built from sim ( X ) and the squared deviations ( sim ( x , y ) − sim ( X ) )^2 summed over x , y ∈ X , measures the degree of similarity among the candidate motifs in X . 
+ If many of the candidate motifs in X approximate the real motif , we should expect to have a high w ( X ) . 
+ On the other hand , we expect the complement of X , that is P − X , should have a low w ( P − X ) . 
+ Thus , w ( P − X ) constitutes the discriminative criterion in the clustering procedure . 
+ In other words , if X is the set of candidate motifs which approximates the real motif , we expect to have a high A ( X ) score , where A ( X ) = w ( X ) / w ( P − X ) . 
+ Note that there may be multiple sets X with the same A ( X ) score . 
+ Among those X 's , we would select the one whose motifs are contributed by the maximum number of motif finders . 
+ The latter constitutes the consensus criterion . 
+ In summary , this stage aims to ﬁnd X ⊆ P which maximizes A ( X ) while X contains the candidate motifs predicted by maximum number of motif ﬁnders . 
+ The naive method to identify X is to enumerate all possible X and check if they satisfy the above two criteria . 
+ However , this approach is computationally infeasible . 
+ In the next section , we describe our proposed heuristics to identify X to overcome this difﬁculty . 
+ 3.2 Heuristics for motif ﬁltering
+ In this subsection , we describe the heuristic algorithm for identifying X , the set of similar candidate motifs in the ﬁrst stage . 
+ Let P be all motifs found by m motif ﬁnders , where each motif ﬁnder returns n motifs . 
+ Steps 1 -- 3 compute the pairwise similarity scores for all pairs of motifs . 
+ Based on these similarity scores , we apply the following heuristics approach to ﬁnd X ( Steps 4 -- 9 ) . 
+ Instead of enumerating all possible subsets in P which takes exponential time , we only consider subsets Xz , j = { z , p1 , ... , pj } for every z ∈ P and for every 1 ≤ j ≤ | P | − 1 where sim ( z , pi ) > sim ( z , pi +1 ) and pi ∈ P . 
+ The rationale behind examining only these subsets is as follows . 
+ For each z ∈ P , let a and b be two other motifs such that sim ( a , z ) > sim ( b , z ) . 
+ That is , a is more similar to z than b. Then , it is likely that A ( { z , a , b } ) > A ( { z , b } ) . 
+ By including pj in the subset , we also include all pi with i < j . For each Xz , j , we compute the value of A ( Xz , j ) and the number of motif finders contributing to Xz , j , denoted as Q ( Xz , j ) . 
+ Finally , we set X to the Xz , j with the maximum A ( Xz , j ) value . 
+ If there is more than one such subset , we pick the one with the largest Q ( Xz , j ) . 
+ In case of a tie again , we pick one randomly . 
+ The time complexity of the heuristics can be shown to be O ( m^2 n^2 ) as follows . 
+ There are | P | = mn motifs . 
+ For each motif z , we consider mn different Xz , j subsets . 
+ Output : PWM of the aligned sites 
+ 1 : for each z , y ∈ P do 
+ 2 : compute sim ( z , y ) 
+ 3 : end for 
+ 4 : for each z ∈ P do 
+ 5 : sort p1 , p2 , ... , pk such that sim ( z , pi ) ≥ sim ( z , pi+1 ) ; ∀ i = 1 .. k 
+ 6 : consider sets Xz , j = { z , p1 , ... , pj } ; ∀ j = 1 , ... , k 
+ 7 : compute A ( Xz , j ) for all such Xz , j 
+ 8 : set Q ( Xz , j ) = number of motif finders contributing to Xz , j 
+ 9 : end for 
+ 10 : set X = Xz , j with the maximum A ( Xz , j ) score ; if there are two such Xz , j 's , pick the one with the largest Q ( Xz , j ) 
+ 11 : extract and align sites of motifs in X 
+ 12 : construct PWM 
+ For each subset , we need to compute the value of A ( Xz , j ) for all j. Note that we can obtain the value of A ( Xz , j ) from the value of A ( Xz , j − 1 ) in constant time . 
+ So , for each motif z , to compute all values of A ( Xz , j ) , it takes O ( mn ) time . 
+ Therefore , the overall time complexity is O ( m^2 n^2 ) . 
+ For the actual running time of the heuristics , please refer to Supplementary Material 5 . 
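+ The enumeration of the candidate subsets Xz , j can be sketched as below . This is an illustration under loud assumptions , not the authors ' implementation : assumed_weight is a stand-in for w ( X ) ( mean pairwise similarity divided by the root of the summed squared deviations , with a small eps guarding against a zero denominator ) , the subset X = P is skipped so that P − X is never empty , ties are broken by first occurrence rather than randomly , and each weight is recomputed from scratch , so the sketch does not reach the O ( m^2 n^2 ) bound discussed above . 

```python
import math


def assumed_weight(group, sim, eps=1e-9):
    """ASSUMED stand-in for w(X); the exact definition may differ.

    `sim` is a nested dict with sim[x][y] in [0, 1] and sim[x][x] == 1.
    """
    values = [sim[x][y] for x in group for y in group]
    mean = sum(values) / len(values)
    spread = math.sqrt(sum((v - mean) ** 2 for v in values))
    return mean / (spread + eps)


def select_cluster(motifs, finder_of, sim, weight=assumed_weight):
    """Pick the subset X_{z,j} maximising A(X) = w(X) / w(P - X).

    Ties on A are broken by Q(X), the number of distinct motif finders
    contributing to X. Returns (best_set, (A, Q)).
    """
    best_key, best_set = None, None
    for z in motifs:
        # Rank the other motifs by decreasing similarity to z.
        ranked = sorted((p for p in motifs if p != z),
                        key=lambda p: sim[z][p], reverse=True)
        # Stop before X_{z,j} = P so that the complement is never empty.
        for j in range(1, len(ranked)):
            inside = [z] + ranked[:j]
            outside = ranked[j:]
            a_score = weight(inside, sim) / weight(outside, sim)
            q_score = len({finder_of[m] for m in inside})
            key = (a_score, q_score)
            if best_key is None or key > best_key:
                best_key, best_set = key, set(inside)
    return best_set, best_key
```

+ Passing a different weight function is enough to try another definition of w ( X ) without touching the enumeration itself . 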
+ 3.3 Sites extraction
+ From X , we obtain the list of sites using two requirements . 
+ First , we accept all regions which are covered by sites deﬁned by at least two motifs x and y in X , where x and y are predicted by two different motif ﬁnders . 
+ The reason is that it is unlikely that several motif finders predict the same spurious binding sites . 
+ Second , we accept all the sites of the motif x in X with the highest conﬁdence in approximating real motif . 
+ The conﬁdence score is deﬁned as follows . 
+ Let B ( x ) be the total number of nucleotides covered by the sites deﬁned by motif x. Let O ( x ) be the total number of nucleotides covered by the sites deﬁned by motif x and all other motifs in X . 
+ The confidence score of motif x is defined as O ( x ) / B ( x ) . 
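+ A possible reading of the confidence score is sketched below . It assumes that O ( x ) counts the nucleotides of the sites of x that are also covered by at least one other motif in X , and it reuses the hypothetical site representation ( sequence identifier , start , exclusive end ) . 

```python
def covered_positions(sites):
    """Set of (sequence_id, position) pairs covered by the given sites."""
    return {(seq, pos) for seq, start, end in sites for pos in range(start, end)}


def confidence(x, cluster_sites):
    """O(x) / B(x) for motif `x`, given a dict motif -> list of sites.

    B(x): nucleotides covered by the sites of x.
    O(x): those nucleotides that some other motif of the cluster also covers
          (an assumed interpretation of the definition).
    """
    own = covered_positions(cluster_sites[x])
    others = set()
    for motif, sites in cluster_sites.items():
        if motif != x:
            others |= covered_positions(sites)
    return len(own & others) / len(own) if own else 0.0
```

+ The motif with the highest value of confidence is the one whose sites are all accepted by the second requirement . 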
+ For the selected sites that are covered by more than one motif ﬁnder , we further apply a post-processing procedure to reﬁne each site by removing the nucleotides that are only covered by a single ﬁnder to increase the precision of our prediction as these nucleotides are likely to be noise . 
+ Given all the sites predicted by MotifVoter , we generate a PWM motif to model the motif . 
+ To achieve this , we ﬁrst align these sites using MUSCLE ( Edgar , 2004 ) , then a PWM is generated from this alignment to model the motif . 
+ Figure 1c provides an illustration of this process . 
+ 3.4 Performance evaluation measures
+ For performance evaluation , we use the same four measures proposed in ( Tompa et al. , 2005 ) namely , sensitivity ( nSN ) , positive predictive value1 ( nPPV ) , performance coefﬁcient ( nPC ) and correlation coefﬁcient ( nCC ) . 
+ Index n is used to denote that the assessment is done at the nucleotide level instead of site level . 
+ The deﬁnitions of these performance measures can be found in Supplementary Material 4 . 
+ For ChIP-Chip datasets which do not have sites position information , we use sum of squared distance ( SSD ) to evaluate the performance of motif ﬁnders . 
+ It is one of the most widely used methods for comparing PWMs ( Mahony et al. , 2007 ) . 
+ Let L be the length of two aligned motifs X and Y , SSD is defined as : 
+ where pxi ( b ) and pyi ( b ) are the probabilities of base b occurring at position i in motif X and Y , respectively . 
+ 1Throughout the article , we use the term precision instead of positive predictive value . 
+ Note that 0 ≤ SSD ( X , Y ) ≤ 2 and SSD ( X , X ) = 2 . 
+ If the motifs are of different length , we use the shorter motif to align different regions of the longer motif and obtain the maximum SSD value . 
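+ The SSD comparison can be sketched as follows , assuming the per-column form SSD ( X , Y ) = ( 1 / L ) Σ_i ( 2 − Σ_b ( pxi ( b ) − pyi ( b ) )^2 ) , which is consistent with the bounds stated above but should be checked against the original definition . A PWM is represented here as a list of dictionaries mapping each base to its probability , and the function names are hypothetical . 

```python
BASES = "ACGT"


def ssd_aligned(pwm_x, pwm_y):
    """SSD of two PWMs of equal length (assumed per-column form)."""
    length = len(pwm_x)
    total = 0.0
    for col_x, col_y in zip(pwm_x, pwm_y):
        total += 2.0 - sum((col_x[b] - col_y[b]) ** 2 for b in BASES)
    return total / length


def ssd(pwm_x, pwm_y):
    """Slide the shorter PWM along the longer one and keep the maximum SSD."""
    short, long_ = sorted((pwm_x, pwm_y), key=len)
    return max(
        ssd_aligned(short, long_[offset:offset + len(short)])
        for offset in range(len(long_) - len(short) + 1)
    )
```

+ Under this form , identical motifs score 2 and motifs that place all their probability on different bases at every position score 0 . 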
+ 4 EXPERIMENTAL RESULTS
+ There is no universal benchmark dataset for evaluating a motif ﬁnder . 
+ We thus performed an extensive evaluation on MotifVoter over 326 datasets from Tompa 's benchmark , E.coli and ChIP-Chip datasets . 
+ In our experiments , we consider the top-30 ( i.e. n = 30 ) motifs reported by each individual motif ﬁnder . 
+ For the effect of different values of n , please refer to Supplementary Material 6 . 
+ 4.1 Comparison using Tompa’s benchmark dataset
+ This section compares MotifVoter with existing methods using Tompa 's benchmark datasets . 
+ The datasets are constructed based on real transcription factor binding sites drawn from four different organisms ( human , fruitﬂy , mouse and yeast ) . 
+ The background sequences are based on three categories : ( 1 ) real upstream sequences ( real ) , ( 2 ) randomly chosen upstream sequences with binding sites inserted ( generic ) and ( 3 ) sequences randomly generated by a Markov chain with binding sites inserted ( markov ) . 
+ It consists of 56 datasets . 
+ The number of sequences per dataset ranges from 1 to 35 and the sequence lengths are up to 3000 bp ( Tompa et al. , 2005 ) . 
+ Such diverse characteristics of the benchmark make it a good candidate for evaluating the robustness of a motif ﬁnder . 
+ The graph in Figure 2a shows the average performance of all stand-alone and ensemble motif ﬁnders based on four performance evaluation measures [ see also Table 1 for the actual ﬁgures of sensitivity ( nSN ) and precision ( nPPV ) ] using Tompa 's benchmark dataset . 
+ Among the stand-alone motif finders , Weeder and SPACE perform similarly and outperform the other 15 stand-alone motif finders by all four measures . 
+ However , integration of motif ﬁnders enables MotifVoter to further enhance the performance . 
+ The improvement gained over SPACE is 215 % in sensitivity and 45 % in precision . 
+ We believe that MotifVoter is able to piece together different subregions of sites identiﬁed by the stand-alone motif ﬁnders , thus yielding signiﬁcant improvement in sensitivity . 
+ Furthermore , since MotifVoter imposes strict discriminative and consensus criteria on the clustering procedure , we get signiﬁcant improvement in precision also . 
+ For ensemble methods , MotifVoter outperforms the sensitivity of BEST and precision of SCOPE by 54.1 % and 226.2 % , respectively . 
+ We can not directly evaluate RGSMiner and MultiFinder because of their limited applicability . 
+ For WebMotifs , it only takes probe names as input , so we only evaluate WebMotifs on ChIP-Chip dataset ( see Section 4.3 ) . 
+ In principle , these three ensemble methods select one motif out of all motifs reported by the component motif ﬁnders . 
+ Although we are not able to run these ﬁnders on the dataset , to predict the upper bound of the performance of these ﬁnders , we can select the most sensitive and precise motif for each dataset . 
+ We found that even if we do so , the sensitivity is at least 48.6 % lower than MotifVoter and the precision is at least 41.2 % lower . 
+ For EMD , a direct evaluation is done in the next subsection on E.coli dataset . 
+ However , in this benchmark if we cluster the motifs of the same rank given by 10 motif ﬁnders and pick the most sensitive cluster , the highest possible sensitivity is 21.3 % lower than MotifVoter . 
+ The main reason why MotifVoter yields better performance is that , apart from its ability to capture true sites from various motif finders , it also includes true sites that may come from different ranks . 
+ 4.2 Comparison using E.coli dataset
+ Despite its comprehensive assortment of datasets , Tompa 's benchmark does not fully represent real biological settings since it still contains synthetic sequences ( markov ) and mixtures of real and synthetic ( generic ) sequences . 
+ Here we evaluate MotifVoter using real E.coli datasets . 
+ This species is not included in Tompa 's benchmark . 
+ The datasets are taken from RegulonDB ( Salgado et al. , 2004 ) ( http://regulondb.ccg.unam.mx/ ) , and the sequences are generated from the intergenic regions of E.coli genome . 
+ In total , there are 62 datasets . 
+ The average number of sequences per dataset is 12 and the average sequence length is 300 bp . 
+ Table 1 note : The nSN and nPPV for stand-alone motif finders in Tompa 's benchmark are taken directly from Tompa et al. ( 2005 ) . 
+ For the E.coli dataset , we were unable to obtain the results of a few stand-alone motif finders ( marked with a dash ) , because at the time of preparing this article either these programs were not available or their output did not give binding sites for us to evaluate ( e.g. YMF ) . 
+ Motif finders marked with asterisks ( * ) are ensemble methods . 
+ Note that some ensemble methods could not be executed on the given datasets , so we only estimate their upper bound ( see Section 4.1 for details on how we obtain the upper bound ) . 
+ The graph in Figure 2b shows the average performance of all stand-alone and ensemble motif finders based on four performance evaluation measures [ see also Table 1 for the actual figures of sensitivity ( nSN ) and precision ( nPPV ) ] using the E.coli dataset . 
+ MotifVoter consistently outperforms the stand-alone motif ﬁnders by about 168 % in terms of sensitivity . 
+ For precision , although the improvement is not as large as for sensitivity , MotifVoter still yields a significant improvement over stand-alone motif finders , exceeding the highest precision by 80.9 % . 
+ Three ensemble methods ( SCOPE , BEST , EMD ) evaluated on this dataset outperform the stand-alone algorithms in terms of sensitivity . 
+ However , even for the most sensitive ensemble method ( EMD ) among the three , the improvement is not signiﬁcant . 
+ On the other hand , MotifVoter outperforms the EMD ensemble method by 130.2 % in sensitivity . 
+ As for precision , MotifVoter also consistently performs better with a 45.9 % improvement over the best ensemble method ( SCOPE ) . 
+ 4.3 Application on ChIP-Chip datasets
+ In this section , we evaluate MotifVoter on yeast and mammalian ChIP-Chip datasets . 
+ These datasets provide an ideal platform for assessing the scalability and applicability of MotifVoter to the entire genome . 
+ The yeast ChIP-Chip datasets and TF proﬁles are obtained from Harbison et al. ( 2004 ) ( http://fraenkel.mit.edu/Harbison/ ) . 
+ For mammalian datasets , we evaluate nine transcription factors : CREB ( Zhang et al. , 2005 ) , E2F ( Ren et al. , 2002 ) , HNF4/HNF6 ( Odom et al. , 2004 ) , MYOD/MYOG ( Cao et al. , 2006 ) , NFKB ( Schrieber et al. , 2006 ) , NOTCH ( Palomero et al. , 2006 ) and SOX ( Boyer et al. , 2005 ) . 
+ Further details about the yeast and mammalian datasets sources can be found in Supplementary Material 1 . 
+ Out of 65 cases , MotifVoter can identify 56 motifs . 
+ As for the missing ones , we found that the individual ﬁnders do not perform well . 
+ Figure 3 shows the evaluation using SSD values of the stand-alone and ensemble motif ﬁnders based on the 56 successful cases . 
+ Note that SSD values can give an estimation of how close the prediction of one motif ﬁnder is to the real PWM in each dataset . 
+ Table 2 shows the average SSD performance of each motif ﬁnder . 
+ On yeast ChIP-Chip datasets , among all stand-alone and ensemble motif ﬁnders , Improbizer has the highest average SSD value ( 1.639 ) . 
+ MotifVoter outperforms Improbizer with an average SSD value of 1.919 . 
+ Weblogos of the actual motifs predicted by MotifVoter on these datasets can be found in Supplementary Material 3 . 
+ Figure 4 shows the nine PWMs found by MotifVoter in agreement with canonical motifs in mammalian datasets . 
+ Evaluation on MotifVoter shows that it has an average SSD value of 1.824 , while the best performing existing motif ﬁnder achieves an average SSD value of 1.530 . 
+ SSD of MotifVoter on mammalian datasets is lower than that on yeast since the datasets of mammalian genome are much more complex than yeast . 
+ Note that the SSD score is at most 2.
+ 5 DISCUSSION
+ This section further investigates the effectiveness of MotifVoter in terms of the following aspects : ( 1 ) the optimality of MotifVoter ; ( 2 ) the effects of the discriminative and consensus criteria used in MotifVoter and ( 3 ) the robustness of MotifVoter and the running time of MotifVoter . 
+ 5.1 Optimality of MotifVoter
+ In general , the performance of any ensemble method is bounded by its component motif ﬁnders . 
+ The total number of true binding sites that are covered by any of its component motif ﬁnders can be regarded as a rough upper bound on the performance of an ensemble method . 
+ We say an ensemble motif finder is α-optimal if it can find a fraction α of the true sites covered by at least one of its component motif finders . 
+ Evaluation on Tompa 's benchmark shows that the highest possible sensitivity that can be achieved by all the component motif finders is 0.440 . 
+ The sensitivity of MotifVoter is 0.419 which is 95.2 % of this upper bound . 
+ On the E. coli dataset , the highest possible sensitivity is 0.643 and MotifVoter achieves 95.7 % of it . 
+ This empirical study shows that MotifVoter is a near optimal ensemble method . 
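The upper bound and the attained fraction described above can be illustrated with a minimal sketch. It works at the level of whole sites rather than nucleotides, and the site identifiers, finder names and function names are hypothetical; only the figures 0.419 and 0.440 come from the text.

```python
def upper_bound_sensitivity(true_sites, finder_predictions):
    """Fraction of true sites covered by at least one component finder."""
    covered = set()
    for predicted in finder_predictions.values():
        covered |= true_sites & set(predicted)
    return len(covered) / len(true_sites)


def attained_fraction(ensemble_sensitivity, upper_bound):
    """Fraction of the upper bound that the ensemble actually attains."""
    return ensemble_sensitivity / upper_bound


# Toy data: site identifiers and finder names are hypothetical.
true_sites = {"s1", "s2", "s3", "s4", "s5"}
finder_predictions = {
    "finder_A": {"s1", "s2", "x9"},  # x9 is a spurious prediction
    "finder_B": {"s2", "s3"},
}
toy_bound = upper_bound_sensitivity(true_sites, finder_predictions)

# Figures reported in the text for Tompa's benchmark.
tompa_fraction = attained_fraction(0.419, 0.440)
print(f"toy upper bound = {toy_bound:.2f}, Tompa fraction = {tompa_fraction:.1%}")
```

In the toy data three of the five true sites are covered by some finder, so no ensemble built on these two finders could exceed a sensitivity of 0.6.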
+ 5.2 Analysis of the effects of MotifVoter’s criteria
+ The discriminative criterion of MotifVoter makes use of the information provided by the false positive ( spurious ) motifs/sites to guide the clustering of true motifs/sites . 
+ This information is not fully utilized by existing ensemble methods . 
+ The consensus criterion in MotifVoter requires the cluster to be globally contributed by as many motif ﬁnders as possible . 
+ This can enhance the conﬁdence that the cluster contains good motifs/sites . 
+ These two key ideas distinguish our ensemble method from the existing ones . 
+ This subsection experimentally shows that both criteria in MotifVoter are equally important in their own right . 
+ Figure 5a shows the evaluation results on Tompa 's benchmark datasets . 
+ Sensitivity of MotifVoter without both criteria ( Case IV ) is 80 % lower than BEST and precision is 15 % lower than Weeder . 
+ Similarly in Figure 5b for E.coli , when neither discriminative nor consensus criteria are implemented , the precision of MotifVoter is lower than that of SCOPE . 
+ In contrast , by including both criteria MotifVoter can improve the sensitivity by 163.2 % and precision by 45.9 % . 
+ 5.3 Robustness and time complexity of MotifVoter
+ We also evaluate the performance of MotifVoter on various background sequences and various species in Tompa 's benchmark . 
+ MotifVoter achieved the highest nSN and nPPV on all three backgrounds . 
+ The major sensitivity improvement is on real datasets ( 275 % ) , followed by generic datasets ( 128 % ) . 
+ As for evaluation on three species , MotifVoter made major sensitivity improvement on human dataset ( 314 % ) followed by fruitﬂy ( 263 % ) , while the least improvement was made on yeast datasets ( 84 % ) . 
+ MotifVoter relies on its component motif ﬁnders . 
+ Thus , a natural question is whether the performance of MotifVoter will degrade if we include some motif ﬁnders with poor performance . 
+ The performance of MotifVoter does degrade as more random motif ﬁnders ( representing motif ﬁnders with poor performance ) are included . 
+ However , even if we include ﬁve random motif ﬁnders ( that is half of the real motif ﬁnders we used ) , the performance of MotifVoter is still greater than that of the best individual motif ﬁnder by 183 % in sensitivity and by 10.4 % in precision . 
+ These results imply that MotifVoter is robust even if some of the component motif ﬁnders perform badly . 
+ We remark that it is possible that MotifVoter may predict wrong motifs if almost every individual ﬁnder reports similar wrong motifs . 
+ However , MotifVoter significantly reduces the chance of finding such wrong motifs in comparison to individual finders : since every finder has its own scheme for predicting motifs , the chance that many individual finders report the same ( or a similar ) spurious motif is low . 
+ Running all 10 component motif ﬁnders for MotifVoter is not always practical . 
+ We show that the sensitivity and precision of MotifVoter are still signiﬁcantly better than the best motif ﬁnder when we run only the ﬁve fastest motif ﬁnders . 
+ The details of the above robustness evaluations are shown in Supplementary Material 5 . 
+ In Supplementary Material 8 , we evaluate and discuss the characteristics of binding sites missed by MotifVoter . 
+ We also show that MotifVoter can identify motifs in motif modules that contain multiple motifs working in groups , provided that these motifs have signals strong enough to be detected by the individual motif finders used in MotifVoter . 
+ Details can be found in Supplementary Material 9 . 
+ 6 CONCLUSIONS
+ We have presented MotifVoter , a simple yet effective ensemble method that utilizes both positive and negative information in the motifs returned by the basic motif ﬁnders and achieves near-optimal sensitivity . 
+ In extensive evaluations on many ChIP-Chip and benchmark datasets , we observed that MotifVoter outperforms all stand-alone and existing ensemble motif ﬁnders signiﬁcantly . 
+ It is also robust with respect to the inclusion of random motif ﬁnders and number of component motif ﬁnders . 
+ A key advantage of MotifVoter is its ﬂexibility in incorporating new algorithms . 
+ That is , if a novel superior motif ﬁnding algorithm is made available , it can be readily incorporated into the MotifVoter platform . 
+ In the website version , we provide an option for users to provide results from the component motif ﬁnders that are not included in our main list . 
+ Furthermore , MotifVoter , as a web application , has several biologist friendly features such as a parameter-free interface , ability for users to choose their preferred component motif ﬁnders and option for speedier return of results using the fastest motif ﬁnders . 
+ ACKNOWLEDGEMENTS
+ We are grateful to the authors of the component motif ﬁnders used by MotifVoter for making their tools available for public use . 
+ We thank C. Dongsheng for providing a command line version of BEST . 
+ Finally , the authors would like to thank the reviewers for all their useful and constructive comments . 
+ Funding : National University of Singapore ( grant R-252-000-326-112 ) ; Research Output Prize ( Faculty of Engineering ) of the University of Hong Kong to S.M.Y.
\ No newline at end of file
--- a/data/TEXT_FILES/notuseful_txt/18974181.txt 0 → 100644
View file @27818a9
+++ b/data/TEXT_FILES/notuseful_txt/18974181.txt 0 → 100644
View file @27818a9
+ EcoCyc: A comprehensive view of Escherichia coli
+ ABSTRACT 
+ EcoCyc ( http://EcoCyc.org ) provides a comprehensive encyclopedia of Escherichia coli biology . 
+ EcoCyc integrates information about the genome , genes and gene products ; the metabolic network ; and the regulatory network of E. coli . 
+ Recent EcoCyc developments include a new initiative to represent and curate all types of E. coli regulatory processes such as attenuation and regulation by small RNAs . 
+ EcoCyc has started to curate Gene Ontology ( GO ) terms for E. coli and has made a dataset of E. coli GO terms available through the GO Web site . 
+ The curation and visualization of electron transfer processes has been significantly improved . 
+ Other software and Web site enhancements include the addition of tracks to the EcoCyc genome browser , in particular a type of track designed for the display of ChIP-chip datasets , and the development of a comparative genome browser . 
+ A new Genome Omics Viewer enables users to paint omics datasets onto the full E. coli genome for analysis . 
+ A new advanced query page guides users in interactively constructing complex database queries against EcoCyc . 
+ A Macintosh version of EcoCyc is now available . 
+ A series of Webinars is available to instruct users in the use of EcoCyc . 
+ SRI International , 333 Ravenswood Ave. , Menlo Park , CA 94025 , Program of Computational Genomics , Centro de Ciencias Genómicas , Universidad Nacional Autónoma de México , AP 565-A , Cuernavaca , Morelos 62100 , Mexico , Department of Microbiology , Immunology , and Molecular Genetics and the Molecular 3 Biology Institute , University of California , Los Angeles , CA 90095 , J. Craig Venter Institute , Rockville , 4 MD 20850 , USA and Department of Chemistry and Biomolecular Sciences , Macquarie University , Sydney , NSW , 5 Australia 2109 
+ OVERVIEW OF EcoCyc CONTENT
+ Since the last NAR Database Issue publication on EcoCyc four years ago ( 1 ) , signiﬁcant additions and changes to the content and features of EcoCyc have occurred . 
+ EcoCyc staff perform an ongoing literature-based curation of the Escherichia coli genome , whose methodology and results were described in detail in 2007 ( 2 ) . 
+ The EcoCyc curators edit gene names and functions , and write mini-reviews about each E. coli gene product and multimeric complex . 
+ These mini-reviews include extensive citations to the experimental literature . 
+ In mid-2006 , EcoCyc reached an important milestone when EcoCyc curators had performed literature searches for every E. coli gene and had written mini-reviews for every gene for which experimental literature was found . 
+ In EcoCyc 12.5 , released during fall 2008 , 2650 ( 59.3 % ) E. coli genes have experimentally deﬁned functions . 
+ Table 1 provides an overview of the current contents of EcoCyc . 
+ RECENT INITIATIVES
+ Electron transfer enzymes and associated pathways in the membrane
+ Previously , curation of electron transfer reactions in EcoCyc was limited to brief written summaries of the gene products and protein complexes . 
+ This approach did not provide for a visual representation of the electron transfer enzymes in the membrane , nor did it indicate known or potential roles in cellular electron transfer and proton movement relative to the cell compartments . 
+ To address these issues , we have extended the Pathway Tools software that underlies EcoCyc in two respects : First , it can now visually depict electron transfer enzyme complexes and their associated balanced oxidation/reduction reactions ( Figure 1 ) . 
+ Reaction displays now show enzyme membrane localization , the ﬂow of all substrates and products , and the fate of the protons associated with the overall reactions . 
+ Second , the software can now depict electron transfer pathways that consist of coupled systems of electron transfer enzymes ( Figure 2 ) . 
+ E. coli possesses more than 25 enzymes and enzyme complexes that participate in the oxidation of primary electron donors or in the reduction of terminal electron acceptors during different cell culture conditions . 
+ The literature-based curation for approximately 15 electron transfer enzymes and enzyme complexes has been updated , and associated membrane depictions and balanced reactions are available . 
+ Electron transfer pathways have been generated and curated for 10 sets of electron donor/acceptor pairs . 
+ An example of a membrane depiction is shown in Figure 1 for the E. coli enzyme NADH dehydrogenase I , encoded by the nuoABCDEFGHIJKLMN operon . 
+ Herein , the oxidation of NADH is shown to occur at the cytoplasmic face of the enzyme with electron transfer within the enzyme to the physiological electron acceptors , ubiquinone ( UQ ) or menaquinone ( MQ ) . 
+ Combining the oxidation reactions for a physiological electron donor and an acceptor yields an electron transport pathway . 
+ For example , in Figure 2 the NADH dehydrogenase I enzyme shown in Figure 1 is combined with cytochrome bo oxidase ( cyoABCD ) to represent the transfer of electrons from NADH to molecular oxygen ( O2 ) . 
+ Net movement of protons across the membrane by each enzyme complex provides , in part , the proton motive force ( PMF ) needed for ATP synthesis . 
+ Updates to regulation of transcription initiation
+ Curation of transcriptional regulation is performed by the RegulonDB group at the Center for Genomic Sciences , Universidad Nacional Autónoma de México . 
+ Curation of older literature on transcriptional regulation was completed in December 2006 and since then , data from new literature is consistently added to EcoCyc shortly after publication . 
+ After reports of differences and apparent inconsistencies between the transcriptional regulatory networks of EcoCyc and RegulonDB appeared ( 3,4 ) , we undertook detailed curation that led to fully synchronized content and releases in both databases ( 5 ) . 
+ Other systematic curation efforts included the sigmulons of σ54 ( RpoN ) , σ28 ( FliA ) , σ19 ( FecI ) , σ24 ( RpoE ) , σ32 ( RpoH ) , and σ38 ( RpoS ) ; various metabolic and motility regulons ; and representations of the binding sites for the ArcA and NarL transcription factors . 
+ In addition , we have developed guidelines for transcription factor summaries to include relevant physiological data found in the literature that can not be easily added as database objects . 
+ Many summaries have been updated according to these guidelines . 
+ To facilitate the tracking and querying of data based on the quality of the evidence , we have classiﬁed the types of evidence used to annotate regulatory objects as ` strong ' or ` weak ' . 
+ Strong evidence corresponds to experiments -- irrespective of methodology -- that provide direct physical evidence . 
+ Examples of strong evidence include the experimental mapping of transcription start sites and DNA binding of puriﬁed transcription factors . 
+ Evidence such as that from gene expression analyses that provide only indirect evidence is considered weak . 
+ Strong and weak evidence types are graphically distinguished by using solid or dashed lines for the corresponding objects ( such as promoter arrows ) . 
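The strong/weak classification and its graphical rendering can be sketched as a small lookup. This is an illustration only, not EcoCyc's schema: the evidence-type labels are hypothetical, the pairing of solid lines with strong evidence is assumed from the order in which the text lists them, and treating an object as strong when any of its evidence is strong is likewise an assumption.

```python
from enum import Enum


class Strength(Enum):
    STRONG = "strong"
    WEAK = "weak"


# Hypothetical evidence-type labels; the grouping follows the examples in the text.
EVIDENCE_STRENGTH = {
    "transcription-start-site-mapping": Strength.STRONG,
    "purified-protein-dna-binding": Strength.STRONG,
    "gene-expression-analysis": Strength.WEAK,
}

# Assumed pairing: solid lines for strong evidence, dashed lines for weak evidence.
LINE_STYLE = {Strength.STRONG: "solid", Strength.WEAK: "dashed"}


def classify(evidence_types):
    """Treat an object as strong if any of its evidence is strong (an assumption)."""
    strengths = {EVIDENCE_STRENGTH[e] for e in evidence_types}
    return Strength.STRONG if Strength.STRONG in strengths else Strength.WEAK


def line_style(evidence_types):
    """Return the line style used to draw an object such as a promoter arrow."""
    return LINE_STYLE[classify(evidence_types)]


print(line_style(["gene-expression-analysis"]))
print(line_style(["gene-expression-analysis", "purified-protein-dna-binding"]))
```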
+ To expand the information about transcription regulation of E. coli , the RegulonDB group has incorporated various new types of experimental and predicted data into EcoCyc . 
+ A collection of 259 new transcription start sites , which resulted from a high-throughput experimental modiﬁed RACE approach , was added ( 6 ) . 
+ Promoters and DNA binding sites with evidence from at least two types of high-throughput data ( such as computational predictions , microarrays and ChIP-chip experiments ) have been added to EcoCyc . 
+ Examples include a collection of 54 σ32 promoters experimentally identified by ChIP-chip and by gene expression assays ( 7 ) ; 45 σ32 promoters identified by microarray analysis , transcription initiation mapping and computational analysis ( 8 ) ; and 45 Fur DNA binding sites identified by computational prediction and binding of purified protein ( 9 ) . 
+ Beyond regulation of transcription initiation
+ EcoCyc has included information about the regulation of both transcription initiation and enzyme activity for many years . 
+ A major new EcoCyc initiative is to expand the database schema and content to include other types of regulation , such as attenuation and regulation of translation by small RNAs ( sRNAs ) . 
+ For example , the EcoCyc schema can now represent all six known types of regulation by attenuation of transcription , each of which involves slightly different database ﬁelds to capture aspects such as the regulatory ligand , protein and RNA regions involved . 
+ This initiative will provide both more complete information about E. coli regulation and the regulatory datasets that can be used by bioinformaticians to develop predictors for a broader diversity of regulatory interactions from genome datasets . 
+ All known examples of ribosome-mediated attenuation in the pathways of amino acid biosynthesis have been added to EcoCyc in release 12.5 . 
+ For example , Figure 3 shows regulation of the thrLABC operon by attenuation , which is modulated by the availability of charged isoleucyl- and threonyl-tRNA . 
+ In this example of attenuation , translation of the thrL leader peptide open reading frame inﬂuences the formation of an attenuator structure . 
+ When charged isoleucyl- and threonyl-tRNAs are abundant , unobstructed translation by the ribosome enables the formation of a secondary structure that acts as a terminator , releasing RNA polymerase and halting transcription of the operon . 
+ On the EcoCyc display , the charged tRNAs are represented as rods . 
+ Their role in modulating termination at the attenuator is indicated by their red color and the ` X ' near the terminator structure ; this shows at a glance that a charged tRNA leads to premature termination . 
+ Curation of other attenuation systems is ongoing . 
+ An example of the representation of regulation by sRNAs is shown in Figure 4 . 
+ The transcription unit that encompasses the glmUS operon is shown . 
+ Expression of this operon is regulated at the level of transcription initiation by the transcription factor NagC ( 10 ) , whose binding sites are shown as green boxes upstream of the glmUS transcription start site . 
+ In addition , the sRNA GlmZ was recently shown to regulate translation of the second open reading frame , glmS ( 11,12 ) . 
+ glmS encodes L-glutamine:D-fructose-6-phosphate aminotransferase , the enzyme that catalyzes the first step in the biosynthesis of UDP-N-acetylglucosamine , which is used as the precursor for the synthesis of peptidoglycan , lipid A and the enterobacterial common antigen . 
+ Genetic experiments suggest that full-length GlmZ interacts directly with the 5' UTR of glmS , unmasking the ribosome binding site and thus activating translation ( 11,12 ) . 
+ The interaction of GlmZ with the glmUS mRNA is shown by a bar ( representing GlmZ ) that is connected with lines to glmUS , suggesting base-pairing at the position indicated . 
+ The 12.5 release of EcoCyc contains 19 examples of attenuation and 15 examples of regulation by mechanisms other than transcription initiation , attenuation , or regulation of enzyme activity . 
+ We are actively expanding both the curation of the preceding regulatory mechanisms and the ability of the Pathway Tools software to handle additional regulatory mechanisms . 
+ Annotation of EcoCyc gene products with Gene Ontology terms
+ Gene Ontology ( GO ) is an accepted standard for ontological annotation of gene products ( www.GeneOntology.org ) . 
+ The EcoCyc project has been annotating E. coli genes with GO terms for the past two years . 
+ Overall , the more than 38 000 GO terms present in EcoCyc have been derived from four sources : ( i ) GO terms were inferred from a mapping from the original MultiFun ( 13 ) ontology annotations within EcoCyc to GO terms ; ( ii ) GO terms were inferred from a mapping from the Enzyme Commission ( EC ) numbers present within EcoCyc to GO terms ; ( iii ) GO term assignments are manually curated by EcoCyc curators on an ongoing basis ; and ( iv ) many GO terms were imported into EcoCyc from UniProt . 
+ EcoCyc and the EcoliWiki project ( www.EcoliWiki.net ) are jointly producing an official data file of E. coli GO terms that we regularly submit to the GO project , and that is available from the GO Web site at http://www.geneontology.org/GO.current.annotations.shtml . 
+ GO terms are found on EcoCyc gene and gene product pages and provide a useful way of ﬁnding all E. coli genes with a common function . 
+ For example , rsmD encodes an rRNA methyltransferase and is annotated with the GO process term for rRNA methylation , GO:0031167 . 
+ Clicking that GO term navigates the user to a page that both provides the deﬁnition of that GO term and lists all other gene products within EcoCyc that have been annotated with that GO term . 
+ The GO term annotations within EcoCyc should be considered incomplete , as manual curation of GO terms is ongoing . 
+ Updates to metabolic pathways
+ Although EcoCyc has now expanded far beyond its initial role , EcoCyc began as a database of E. coli metabolism , primarily describing metabolic enzymes and pathways . 
+ Therefore , annotations for many metabolic enzymes are among the oldest entries in EcoCyc . 
+ During the past decade , signiﬁcant progress has been made in understanding E. coli metabolic pathways and their enzymes . 
+ Therefore , we have begun to systematically re-annotate these pathways ; in release 12.5 , 41 pathways that were entered into EcoCyc more than ten years ago , as well as 19 more recently added pathways , have been updated . 
+ As part of this effort , the curation of more than 180 metabolic enzymes has already been updated to reﬂect the latest state of knowledge . 
+ NEW SOFTWARE AND WEB SITE FEATURES
+ Genome browser tracks
+ The EcoCyc genome browser now supports a track mechanism to aid users in visually analyzing positional datasets with respect to genome features such as the positions of genes , promoters and transcription factor binding sites . 
+ Examples include datasets of predicted promoters , predicted transcription factor binding sites and ChIP-chip datasets . 
+ Datasets encoded as GFF-format ﬁles ( http://www.sanger.ac.uk/Software/formats/GFF/ ) can be loaded into the desktop or Web versions of EcoCyc . 
+ Figure 5 shows a type of track speciﬁcally designed for the visualization of ChIP-chip data called a graph track . 
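As a rough illustration of what a positional dataset in GFF format contains , the sketch below parses a single tab-separated GFF record into named fields . The column order follows the general GFF layout ( sequence name , source , feature , start , end , score , strand , frame , attributes ) ; the example record , its coordinates and its score are made up , and nothing here reflects how EcoCyc itself loads tracks .

```python
from typing import NamedTuple, Optional


class GffFeature(NamedTuple):
    seqname: str
    source: str
    feature: str
    start: int
    end: int
    score: Optional[float]
    strand: str
    frame: str
    attributes: str


def parse_gff_line(line):
    """Parse one tab-separated GFF record; return None for comments and blank lines."""
    line = line.rstrip("\n")
    if not line or line.startswith("#"):
        return None
    fields = line.split("\t")
    if len(fields) < 8:
        raise ValueError(f"expected at least 8 tab-separated fields, got {len(fields)}")
    # A score of "." means that no score is given for this feature.
    score = None if fields[5] == "." else float(fields[5])
    attributes = fields[8] if len(fields) > 8 else ""
    return GffFeature(fields[0], fields[1], fields[2], int(fields[3]),
                      int(fields[4]), score, fields[6], fields[7], attributes)


# Hypothetical ChIP-chip record: the names, coordinates and score are illustrative only.
example = "chr\tchip_chip\tbinding_site\t1200\t1250\t3.7\t+\t.\tprobe_42"
feature = parse_gff_line(example)
print(feature.start, feature.end, feature.score)
```

A graph-style track could then plot each record 's score against its start and end coordinates .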
+ Multi-genome browser
+ Users of EcoCyc include both researchers who study the biology of E. coli and those who use E. coli , and thus EcoCyc , as a reference for their research in other organisms . 
+ To support both types of users , we have added several comparative tools to EcoCyc . 
+ The comparative genome browser is accessible from every gene page , and allows a user to select organisms from the hundreds that are available via the BioCyc database collection at BioCyc.org ( 14 ) and to then examine the ortholog of the starting gene in its local context within each selected organism . 
+ For example , Figure 6 shows the E. coli gene thrA aligned with its orthologs in several other organisms . 
+ The starting gene is marked with hash marks and aligned across the displays . 
+ Note that the other orthologs present are marked with the same color . 
+ For example , the adjacent gene thrB has an ortholog present in each organism displayed . 
+ The tool also indicates at the bottom of the page when no ortholog could be found . 
+ Using the multi-genome browser , users can query a broad range of organisms in search of orthologs and then can see the extent to which those orthologs have maintained their genetic context relative to E. coli . 
+ The Genome Omics viewer
+ Many users come to EcoCyc with large-scale datasets that include gene expression , proteomic and metabolomic data . 
+ As described in our earlier report on the EcoCyc database , these datasets can be viewed in the context of the E. coli metabolic network via the Cellular Omics Viewer , which is a tool that enables users to ` paint ' the results from these datasets onto the Cellular Overview diagram . 
+ To this tool , we have recently added the Genome Omics Viewer . 
+ This new viewing tool enables the display of large-scale gene-related datasets on the full E. coli genome , providing a valuable additional tool for the interpretation of high-throughput data . 
+ As shown in Figure 7 , the Genome Omics Viewer differs from the EcoCyc Genome Browser both by providing a schematic rather than a ` to-scale ' view of the genome and by placing an emphasis on operon membership and adjacent genes . 
+ In combination , the Genome and Cellular Omics Viewers enable interpretation of large datasets in both the metabolic and genomic contexts . 
+ Advanced query page
+ The new EcoCyc advanced query page is accessible by clicking the ` Advanced Query Form ' button located at the bottom of most EcoCyc data pages . 
+ The resulting page enables users to interactively construct complicated , multicriteria searches against EcoCyc . 
+ Example queries include ` Find all proteins of E. coli K-12 for which the DNA-FOOTPRINT-SIZE is smaller than 10 ' and ` Find all proteins of E. coli K-12 containing more than one subunit and that catalyze a reaction in which pyruvate is a substrate ' . 
+ Instructions for the advanced query page are available at http://www.biocyc.org/webQueryDoc.html . 
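The two example queries amount to combining simple predicates over protein records . The sketch below mimics that logic on toy data ; it is not the EcoCyc query interface , and the record fields , protein names and values are all hypothetical .

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Protein:
    # Hypothetical record layout; field names are not EcoCyc's slot names.
    name: str
    dna_footprint_size: Optional[int] = None
    subunit_count: int = 1
    substrates: frozenset = field(default_factory=frozenset)


def small_footprint(p):
    """First example query: DNA footprint size smaller than 10."""
    return p.dna_footprint_size is not None and p.dna_footprint_size < 10


def multimeric_pyruvate_enzyme(p):
    """Second example query: more than one subunit and pyruvate as a substrate."""
    return p.subunit_count > 1 and "pyruvate" in p.substrates


# Toy records, made up for illustration.
proteins = [
    Protein("protein_A", dna_footprint_size=8),
    Protein("protein_B", dna_footprint_size=24),
    Protein("protein_C", subunit_count=4, substrates=frozenset({"pyruvate", "NAD+"})),
]

query1 = [p.name for p in proteins if small_footprint(p)]
query2 = [p.name for p in proteins if multimeric_pyruvate_enzyme(p)]
print(query1, query2)
```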
+ For many years we have provided a version of EcoCyc that runs as an application on a user 's local laptop or workstation computer . 
+ This form of EcoCyc access is highly recommended for frequent EcoCyc users because it provides faster execution and more capabilities than the Web version of EcoCyc . 
+ Scientists who use either the omics data analysis facilities or the genome browser tracks will ﬁnd this version 's faster speeds particularly useful . 
+ Differences between the desktop and Web versions of EcoCyc are summarized at http://www.biocyc.org/desktop-vs-web-mode.shtml . 
+ In early 2008 , we adapted the desktop EcoCyc software to run on the Macintosh , adding one more personal computer option to the existing PC/Windows and PC/Linux platforms . 
+ The EcoCyc Web site now allows users to create accounts through which they can customize the appearance of EcoCyc pages , store organism sets for comparative operations , conﬁgure default settings for the Omics Viewers , and register to receive periodic email updates about EcoCyc . 
+ See the ` Create New Account ' link in the upper right corner of most EcoCyc Web pages . 
+ We have produced several video tutorials that walk users through the basic and advanced use of the EcoCyc and BioCyc Web sites , and that cover the unique features of the desktop software . 
+ These videos are available in several formats directly from the BioCyc site ( http://www.biocyc.org/webinar.shtml ) , and as podcasts via either iTunes ( search for ` BioCyc ' in the podcasts section of the iTunes Store ) or the video-sharing site YouTube ( http://www.youtube.com/user/SRIBRG ) . 
+ Flat ﬁles that contain the EcoCyc data are freely available for download at http://www.biocyc.org/download.shtml . 
+ The Pathway Tools software/database bundle is freely available to academic researchers . 
+ ACKNOWLEDGEMENTS
+ We thank Dr Robert Landick for suggesting the graphtrack display . 
+ National Institutes of Health ( grants GM077678 and GM75742 to P.D.K. , GM071962 to J.C.-V . ) . 
+ Funding for open access charge : NIH grant GM077678 . 
+ Conﬂict of interest statement . 
+ SRI authors beneﬁt from a commercial licensing program for Pathway Tools .
\ No newline at end of file
--- a/data/TEXT_FILES/notuseful_txt/19843227.txt 0 → 100644
View file @27818a9
+++ b/data/TEXT_FILES/notuseful_txt/19843227.txt 0 → 100644
View file @27818a9
+ during exponential growth of Salmonella Typhimurium
+ StpA is a paralogue of the nucleoid-associated protein H-NS that is conserved in a range of enteric bacteria and had no known function in Salmonella Typhimurium . 
+ We show that 5 % of the Salmonella genome is regulated by StpA , which contrasts with the situation in Escherichia coli where deletion of stpA only had minor effects on gene expression . 
+ The StpA-dependent genes of S. Typhimurium are a specific subset of the H-NS regulon that are predominantly under the positive control of σ38 ( RpoS ) , CRP-cAMP and PhoP . 
+ Regulation by StpA varied with growth phase ; StpA controlled σ38 levels at mid-exponential phase by preventing inappropriate activation of σ38 during rapid bacterial growth . 
+ In contrast , StpA only activated the CRP-cAMP regulon during late exponential phase . 
+ ChIP-chip analysis revealed that StpA binds to PhoP-dependent genes but not to most genes of the CRP-cAMP and σ38 regulons . 
+ In fact , StpA indirectly regulates σ38-dependent genes by enhancing σ38 turnover by repressing the anti-adaptor protein rssC . 
+ We discovered that StpA is essential for the dynamic regulation of σ38 in response to increased glucose levels . 
+ Our findings identify StpA as a novel growth phase-specific regulator that plays an important physiological role by linking σ38 levels to nutrient availability . 
+ Accepted 11 October , 2009 . 
+ * For correspondence . 
+ E-mail sacha.lucchini@bbsrc.ac.uk; Tel. ( +44 ) 1603 255000 ; Fax ( +44 ) 1603 255288 . 
+ The challenge of compacting the chromosome into the limited size of the bacterial cell is met by a number of mechanisms including macromolecular crowding and DNA supercoiling . 
+ The chromosomal DNA is further condensed by small nucleoid-associated proteins that influence local DNA topology ( Luijsterburg et al. , 2006 ) . 
+ The architectural role of these proteins also plays a major role in the transcriptional regulation of genes that respond to environmental changes ( McLeod and Johnson , 2001 ; Fang and Rimsky , 2008 ) . 
+ The protein composition of the bacterial nucleoid is not static and reflects varying cellular requirements as it adapts to changing environmental conditions or growth phase ( Azam and Ishihama , 1999 ) . 
+ The 15.5 kDa proteins StpA ( suppressor of td mutant phenotype A ) and H-NS ( histone-like nucleoid structuring protein ) are major components of the Escherichia coli and Salmonella enterica serovar Typhimurium nucleoid ( Luijsterburg et al. , 2006 ) . 
+ The StpA and H-NS proteins are closely related and share 52 % identity . 
+ Conserved amino acids are predominantly located in the dimerization and DNA-binding domains . 
+ The close relatedness of the dimerization domains allows StpA and H-NS to interact to form heteromers ( Williams et al. , 1996 ; Johansson and Uhlin , 1999 ; Leonard et al. , 2009 ) . 
+ Both StpA and H-NS bind DNA nonspecifically . 
+ They display a slight preference for curved DNA , such as the AT-rich segments found close to bacterial promoters ( Yamada et al. , 1990 ; Owen-Hughes et al. , 1992 ; Sonnenfield et al. , 2001 ) . 
+ Consequently , both StpA and H-NS can repress transcription from a synthetic gal promoter containing an upstream curved sequence in bacterial cells ( Zhang et al. , 1996 ) . 
+ More recently , an AT-rich ( 78 % ) consensus sequence has been proposed as a high-affinity H-NS binding site in proteobacteria ( Lang et al. , 2007 ; Sette et al. , 2009 ) . 
+ Mutations in the hns structural gene cause pleiotropic phenotypes in both Salmonella and E. coli , many of which are linked to adaptation to environmental changes such as increased resistance to osmotic and cold shock in Salmonella ( Hinton et al. , 1992 ) , carbon source utilization ( Atlung and Ingmer , 1997 ) , homologous recombination and genome stability in E. coli ( Lejeune and Danchin , 1990 ; Dri et al. , 1992 ) . 
+ In contrast , mutations in stpA have no obvious phenotype in E. coli , although StpA often has a ( minor ) role in processes in which H-NS is involved ( Bertin et al. , 2001 ) . 
+ This minor role was conﬁrmed by the fact that deletion of stpA has no effects at the proteomic or transcriptomic levels in E. coli , as deﬁned by two-dimensional protein gel electrophoresis and global gene expression analysis ( Zhang et al. , 1996 ; Mueller et al. , 2006 ) . 
+ In fact , inactivation of stpA only has phenotypic effects in the absence of hns , indicating that the deletion of stpA is fully compensated by H-NS in E. coli . 
+ The observed phenotypic differences between stpA and hns deletions might be partially explained by the fact that intracellular H-NS levels are higher than those of StpA ( Zhang et al. , 1996 ; Sonnenﬁeld et al. , 2001 ) . 
+ This would be supported by the observation that multicopy expression of stpA rescues several hns mutant phenotypes ( Bertin et al. , 2001 ) . 
+ One mechanism that maintains the imbalance in levels of the two proteins is mediated by the negative cross-regulation that both proteins exert on each other ; H-NS represses stpA transcription more strongly than StpA controls hns ( Zhang et al. , 1996 ) . 
+ Additionally , StpA is susceptible to degradation by the Lon protease , while H-NS is not . 
+ StpA degradation by Lon increases in the absence of H-NS , suggesting that the formation of H-NS-StpA hetero-oligomers protects StpA from Lon and that only low levels of StpA homo-oligomers are found in bacterial cells ( Johansson and Uhlin , 1999 ) . 
+ Although it is clear that StpA can mediate multicopy suppression of many H-NS-dependent genes , the two proteins are not functionally identical . 
+ StpA was discovered by virtue of its ability to suppress the splicing defect of a mutant phage T4 thymidylate-synthase gene , and the protein was shown to promote RNA splicing in vitro ( Zhang et al. , 1995 ) . 
+ In fact , StpA has a stronger effect on splicing than H-NS in vivo ( Zhang et al. , 1996 ) . 
+ The involvement of StpA in RNA splicing probably reﬂects its ability to promote RNA annealing , strand displacement and folding of RNA ( Zhang et al. , 1996 ; Cusick and Belfort , 1998 ; Mayer et al. , 2007 ) . 
+ In addition to its RNA splicing role , StpA induces OmpF expression in E. coli by destabilizing the micF antisense RNA . 
+ In contrast , H-NS positively regulates OmpF levels by transcriptionally repressing micF ( Deighan et al. , 2000 ) . 
+ This suggests that under speciﬁc conditions StpA has a distinct function as an RNA chaperone that cannot be fulﬁlled by H-NS . 
+ More recently , the H-NS regulon of S. Typhimurium was deﬁned by transcriptomic analysis . 
+ Consistent with the pleiotropic effects of hns mutations , 1439 genes were regulated by H-NS ( twofold cut-off , FDR 0.05 ) in S. Typhimurium ( Ono et al. , 2005 ; Lucchini et al. , 2006 ; Navarre et al. , 2006 ) . 
+ H-NS is responsible for silencing the expression of horizontally acquired DNA ( Lucchini et al. , 2006 ; Navarre et al. , 2006 ) . 
+ We have now determined the regulatory role of StpA by assessing the impact of deleting stpA upon S. Typhimurium global gene expression at different stages of growth . 
+ We show that , unlike in E. coli , StpA regulates a large number of genes in S. Typhimurium . 
+ StpA plays an important role in the growth phase-speciﬁc regulation of gene expression ; during mid-exponential phase , StpA prevents the premature expression of the s38 regulon whereas , during late-exponential growth , StpA is required for full expression of the CRP-cAMP regulon . 
+ Results
+ Growth phase-dependent expression of StpA
+ The transcription of stpA is known to be restricted to a short period of the exponential growth phase during growth of E. coli in rich medium ( Free and Dorman , 1997 ) . 
+ This suggested that the cellular requirement for StpA could change in Salmonella during batch culture . 
+ To determine whether the levels of StpA were growth phase-dependent in S. Typhimurium SL1344 , we monitored stpA transcription and protein levels throughout growth in rich medium . 
+ A stpA::gfp+ transcriptional fusion showed that expression of stpA peaks during the early stages of exponential growth , as observed in E. coli ( Fig. 1A ) . 
+ As StpA is regulated at the post-translational level by Lon protease in E. coli ( Johansson and Uhlin , 1999 ) , we monitored StpA protein levels . 
+ To achieve this we ﬁrst constructed strain JH3573 that expresses a version of StpA with a C-terminal 3×FLAG epitope from the native stpA locus . 
+ Transcriptomic analysis of the stpA-3×FLAG strain JH3573 showed that no genes were differentially expressed during exponential growth compared with SL1344 wild type , conﬁrming that the addition of the epitope did not compromise StpA function ( data not shown ) . 
+ This strain was used to detect StpA protein at all growth stages by Western blotting , with levels peaking at the entry to exponential growth ( Fig. 1B ) . 
+ This conﬁrms that , unlike in E. coli , stpA continues to be expressed at a reduced level throughout exponential and stationary growth in S. Typhimurium . 
+ Identiﬁcation of the StpA regulon
+ The growth phase-dependent expression of StpA prompted us to use a transcriptomic approach to deﬁne the StpA regulon at four stages of growth , corresponding to different levels of StpA expression ( Fig. 1A ) : 60 ′ ( early exponential phase , EEP ) , 160 ′ ( mid-exponential phase , MEP ) , 250 ′ ( late exponential phase , LEP ) and 22 h post-inoculation ( late stationary phase , LSP ) . 
+ Bacterial RNA was extracted at each of the four time points , labelled and hybridized against SALSA microarrays ( see Experimental procedures ) . 
+ The transcriptome of the SL1344 parent and a strain lacking stpA ( JH3003 ) was compared under identical growth conditions , and revealed that the StpA regulon varied with growth phase ( Table S1 ) . 
+ The absence of StpA did not affect gene expression in EEP or LSP ( 1 and 22 h ) . 
+ In contrast , expression of up to 5 % of the S. Typhimurium genome was signiﬁcantly altered in JH3003 compared with SL1344 during exponential growth ( MEP and LEP ) , and we deﬁne these genes as being StpA-dependent . 
+ The number of StpA-dependent genes ( twofold change and a false discovery rate ( FDR ) 0.05 ) at any time point was 183 . 
+ There were 96 StpA-dependent genes at MEP and 129 at LEP . 
+ Most of the StpA-dependent genes were derepressed , with 92 % ( MEP ) and 81 % ( LEP ) genes being upregulated in the absence of StpA ( Table S1 ) . 
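+ The selection rule described above ( a twofold change combined with an FDR threshold ) can be sketched in Python as follows . This is only an illustration : the gene names , ratios and FDR values are invented , the function name is_dependent is hypothetical , and the cutoff is assumed to be inclusive ( FDR at or below 0.05 ) , which the text does not state . 

```python
import math

# Hypothetical records: (gene, mutant/wild-type expression ratio, FDR).
# These numbers are invented for illustration and are not taken from Table S1.
records = [
    ("geneA", 4.2, 0.001),   # strongly up in the mutant
    ("geneB", 0.4, 0.010),   # down in the mutant
    ("geneC", 1.6, 0.001),   # significant but below twofold
    ("geneD", 3.0, 0.200),   # above twofold but not significant
]


def is_dependent(ratio, fdr, min_fold=2.0, max_fdr=0.05):
    """Return True when the change is at least min_fold in either direction
    and the FDR does not exceed max_fdr (inclusive cutoff is an assumption)."""
    return abs(math.log2(ratio)) >= math.log2(min_fold) and fdr <= max_fdr


# Genes passing both criteria, then the subset that is up in the mutant.
dependent = [(gene, ratio) for gene, ratio, fdr in records if is_dependent(ratio, fdr)]
derepressed = [gene for gene, ratio in dependent if ratio > 1.0]
fraction_up = len(derepressed) / len(dependent)
```

+ With these made-up records , two genes pass both criteria and one of them is derepressed ; a gene with a 1.6-fold change fails the fold criterion even when its FDR is very low . 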
+ It has been reported that StpA can negatively regulate expression of a reporter gene under the control of the hns promoter in E. coli ( Zhang et al. , 1996 ) ; although this could only be observed in the absence of hns , this raised the possibility that StpA-dependent gene expression could simply reﬂect changes in the level of H-NS . 
+ Because StpA potentially represses hns , we expected such genes to be upregulated in an hns mutant and downregulated in a ΔstpA strain . 
+ After comparing the list of StpA-dependent genes with the previously published H-NS regulon of S. Typhimurium ( Ono et al. , 2005 ) , only one gene was identiﬁed that displayed such behaviour ( STM3138 ) . 
+ This indicates that under the conditions tested , StpA does not regulate gene expression by modulation of H-NS levels , and is consistent with the transcriptomic data , which do not show signiﬁcant StpA-dependent changes in hns gene expression in S. Typhimurium JH3003 at any time point . 
+ We used transcriptional gene fusions and RT-PCR to conﬁrm the transcriptomic data for selected genes ( Fig. S1 ) . 
+ StpA represses genes required for cell envelope modiﬁcation and resistance to cationic peptides
+ Among the genes that were derepressed in the stpA deletion mutant were mig-14 , virK and ugtL , which are required for resistance to the cationic antimicrobial peptide polymyxin B ( PmB ) ( Brodsky et al. , 2002 ; Detweiler et al. , 2003 ; Shi et al. , 2004 ) . 
+ Interestingly , more than half ( 54 % ) of the genes that were repressed at any time point by StpA were inducible by PmB ( Bader et al. , 2003 ) , suggesting that StpA could be involved in the resistance of Salmonella to antimicrobial peptides . 
+ To test the phenotypic consequences of StpA-dependent gene regulation , we analysed the effects of deleting or overexpressing stpA on survival of PmB challenge . 
+ Moderate overexpression of stpA was achieved by cloning the stpA gene into the low-copy-number vector pWKS30 [ six to eight copies per cell ; ( Wang and Kushner , 1991 ) ] to generate pMDH20 . 
+ A signiﬁcant increase in resistance to PmB was seen in JH3003 ( ΔstpA ) , and we discovered that overexpression of StpA caused a strong PmB-sensitive phenotype ( Fig. 2A ) . 
+ Increased resistance of Salmonella to PmB involves alteration of the cell envelope via modiﬁcation of the lipid A component of LPS . 
+ It has been demonstrated that ugtL encodes a protein that mediates the formation of mono-phosphorylated lipid A ( Shi et al. , 2004 ) and reduces the negative charge of the lipid A . 
+ This decreases the interaction of the cationic PmB , leading to reduced plasma membrane destabilization and increased PmB resistance ( Ernst et al. , 2001 ) . 
+ Our data show that ugtL is StpA-dependent , as are other genes involved in LPS modiﬁcation including pagC and pagP . 
+ These genes are under the control of the PhoPQ two-component system ( Belden and Miller , 1994 ; Soncini et al. , 1996 ; Zwir et al. , 2005 ) and have been shown to control the permeability of the outer membrane to a variety of molecules , including the bile component deoxycholate ( Murata et al. , 2007 ) . 
+ We tested the effect of deoxycholate on Salmonella survival in response to stpA deletion and overexpression , and discovered that deoxycholate and PmB showed similar sensitivity patterns . 
+ Deletion of stpA resulted in a marginal effect , and overexpression of StpA caused a deoxycholate-sensitive phenotype ( data not shown ) , supporting a role of StpA in the regulation of LPS modiﬁcation . 
+ Analysis of the StpA regulon ( Table S1 ) suggests that the role of StpA in altering the composition of cell envelope is not restricted to modulation of LPS components . 
+ StpA also represses a variety of genes thought to be involved in the synthesis of cell surface structures such as pili ( stdAB , safA , ﬁmA ) , capsule ( wca operon ) and cell membrane components ( ybhO ) . 
+ StpA represses s38-activated genes
+ More than one third of the StpA-dependent genes that are derepressed in JH3003 ( ΔstpA ) during MEP are activated by the alternative sigma factor s38 in Salmonella ( 39 % , Table S1 ) . 
+ s38 is encoded by rpoS and is required for the cellular reprogramming associated with entry into stationary phase or adverse growth conditions known as the general stress response ( Ishihama , 2000 ) . 
+ The transcription of a large number of stationary phase-associated genes is activated by s38 , and confers increased resistance to a number of physical and chemical stresses , including acid , heat and oxidative agents ( Klauck et al. , 2007 ) . 
+ The list of s38-dependent genes that are derepressed in JH3003 included genes involved in resistance to salt shock ( otsAB ) and oxidative stress ( katN and katE ) . 
+ Two of the most highly upregulated s38-dependent genes in JH3003 , osmY and yciE , code for proteins that are induced by acid in E. coli ( Audia et al. , 2001 ) . 
+ This suggested novel phenotypes that could be regulated by StpA in S. Typhimurium . 
+ We determined the level of resistance of the stpA mutants to stationary phase-relevant stresses , namely salt , acid and peroxide killing . 
+ The absence of StpA caused increased stress resistance at MEP ( Fig. 2B , D and E ) . 
+ This role was conﬁrmed by overexpression of StpA , which increased the sensitivity to salt and peroxide , compared with the wild-type strain ( Fig. 2B and D ) . 
+ We discovered that the deletion of stpA caused a far less pronounced effect on the transcription of s38-dependent genes at LEP compared with MEP ( Table S1 ) . 
+ This is exempliﬁed by the osmY and otsAB genes , which were repressed by StpA at the MEP stage of growth , but not at LEP . 
+ This effect had phenotypic consequences as the stpA deletion mutant JH3003 displayed increased resistance to salt at MEP , but not at LEP ( Fig. 2C ) . 
+ StpA prevents s38-dependent gene expression during mid-exponential growth
+ During the early stages of exponential growth in rich medium , several mechanisms ensure that s38 is maintained at low levels ( Klauck et al. , 2007 ) . 
+ Consequently , s38 inactivation has a mild effect on gene expression during exponential growth in E. coli ( Dong et al. , 2008 ) . 
+ Consistent with this , comparison of the transcriptional proﬁle of S. Typhimurium SL1344 with that of an rpoS-deletion mutant revealed few differences at MEP . 
+ Only eight genes showed minor changes in gene expression ( Table S3 ) , leading us to consider the possibility that the activation of s38-dependent genes in the absence of StpA could be mediated by other factors . 
+ It may be relevant that the s38-dependent genes osmC and katE have been shown to possess a second s38-independent promoter that is activated by the RcsC-YojN-RcsB phosphorelay ( Tanaka et al. , 1997 ; Bouvier et al. , 1998 ; Davalos-Garcia et al. , 2001 ; Majdalani and Gottesman , 2005 ) . 
+ We needed to know if the increased gene expression observed in JH3003 in the absence of the repressive activity of StpA was dependent on s38 or not . 
+ We used a genetic strategy to determine whether deletion of stpA caused upregulation of gene expression in the absence of s38 at MEP . 
+ We compared the expression proﬁles of rpoS and stpA/rpoS mutant strains , and found that the majority of the 88 genes derepressed in JH3003 , including osmC and katE , were no longer derepressed in response to stpA deletion in the absence of s38 at MEP ( Fig. 3 ; Table S1 ) . 
+ The data show that s38-activation is responsible for the derepression observed in JH3003 . 
+ We conclude that StpA prevents s38-dependent gene expression during exponential growth of S. Typhimurium . 
+ StpA is essential for linking s38 expression to glucose availability
+ As well as the large number of genes that are directly or indirectly repressed by StpA , 32 genes were downregulated in the absence of StpA . 
+ Many of these genes , including mglAB , aspA and glpF , are involved in the catabolism of carbohydrates and are activated by CRP-cAMP in E. coli ( Gosset et al. , 2004 ) ( Table S1 ) . 
+ This raised the possibility that StpA could directly induce the expression of crp at LEP , as had been observed in E. coli ( Johansson et al. , 1998 ; 2000 ) . 
+ Indeed , a StpA-dependent decrease of crp expression was observed in our experiment ( 1.6-fold ) . 
+ This change was less than twofold , and therefore did not meet one of the criteria for detecting differentially expressed genes . 
+ However , the StpA-dependent reduction of the crp mRNA was highly statistically signiﬁcant ( FDR 0.001 ) . 
+ A similar low-level StpA-dependent decrease in crp was seen in E. coli ( Johansson et al. , 2000 ) . 
+ Interestingly , the expression of the CRP-cAMP regulon was only StpA-dependent at LEP ( Fig. 4 ) . 
+ In E. coli , CRP-cAMP represses rpoS transcription during logarithmic growth ( Lange and Hengge-Aronis , 1994 ) . 
+ However , there are indications that CRP-cAMP also plays a different role as an activator of rpoS expression during the late stages of growth ( Hengge-Aronis , 2002 ) . 
+ As StpA regulates both s38 and CRP , we hypothesized that StpA might link the expression of s38 with sugar availability . 
+ As CRP requires cAMP for its activity , we determined the effect on s38 of glucose concentrations that lower intracellular cAMP levels ( Deutscher et al. , 2006 ) . 
+ Growth in the presence of glucose caused a decrease of s38 in wild-type S. Typhimurium , and these changes in s38 levels were StpA-dependent ( Fig. 5 ) . 
+ Importantly , the non-PTS sugar maltose had a reduced effect on s38 levels in wild-type S. Typhimurium . 
+ To rule out an indirect effect on s38 through a change in medium osmolarity , we looked at the effect of the addition of lactose , a sugar that cannot be utilized by Salmonella . 
+ The addition of lactose did not alter s38 levels . 
+ While this does not prove a mechanistic link between CRP-cAMP and StpA in the regulation of s38 , these results do show that the modulation of s38 by glucose requires functional StpA . 
+ StpA-regulated genes are also H-NS-dependent
+ To determine whether StpA and H-NS regulate the expression of similar genes , we compared the regulons of StpA and H-NS . 
+ During exponential growth , 1439 genes are regulated by H-NS in S. Typhimurium ( twofold , FDR 0.05 ) ( Ono et al. , 2005 ) . 
+ We found that 90 % of the 162 StpA-dependent genes are regulated , to varying extents , by H-NS . 
+ Further analysis revealed that StpA regulates a deﬁned subset of the H-NS regulon , largely consisting of the PhoP and s38 regulons ( Fig. 6A ) . 
+ These ﬁndings show a limited but distinct role for StpA in the control of global gene expression . 
+ Identiﬁcation of StpA-binding sites on the Salmonella chromosome
+ Because StpA regulates a subset of the H-NS regulon , and we and others have shown that H-NS directly silences expression of ~ 250 S. Typhimurium genes ( Lucchini et al. , 2006 ; Navarre et al. , 2006 ) , it was important to compare the binding of H-NS and StpA to the chromosome in S. Typhimurium . 
+ As StpA and H-NS form hetero-oligomeric complexes ( Williams et al. , 1996 ; Leonard et al. , 2009 ) , we determined whether StpA and H-NS colocalize in vivo and could cooperate to regulate gene expression . 
+ We used chromatin-immunoprecipitation ( ChIP-on-chip ) to identify StpA and H-NS DNA-binding sites on the genome of S. Typhimurium SL1344 . 
+ ChIP-on-chip was performed on strain JH3573 that expresses a 3×FLAG-tagged version of StpA . 
+ The parental SL1344 strain was used as negative control . 
+ We identiﬁed 285 chromosomal regions that bind StpA , ranging in size from 400 to 3500 base pairs and containing 572 genes . 
+ Comparison of the ChIP data revealed that StpA displayed an identical binding proﬁle to H-NS ; all genomic regions associated with StpA were also bound by H-NS ( Fig. 6B and C ) . 
+ As previously seen for H-NS ( Lucchini et al. , 2006 ; Navarre et al. , 2006 ) , StpA displayed a slight preference for AT-rich DNA ( Fig. S2 ) . 
+ Of the 88 StpA-dependent genes that were derepressed in JH3003 during mid-exponential growth , only 25 were bound by StpA in the wild-type strain ( Table S1 ) . 
+ This interesting ﬁnding shows that StpA generally regulates gene expression indirectly . 
+ We investigated the s38 regulon and found that only a small minority of StpA-dependent genes that were activated by s38 were bound by StpA ( 6/35 ) . 
+ In contrast , the majority of PhoP-dependent genes were bound by StpA ( 8/9 ) ( Table S1 ) . 
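+ The proportions quoted above can be recomputed directly from the counts in the text . The short Python sketch below does only that arithmetic ; the dictionary and function names are illustrative and no additional data are assumed . 

```python
# Counts quoted in the text: (genes bound by StpA, genes in the group).
bound_counts = {
    "derepressed at MEP": (25, 88),
    "s38-activated": (6, 35),
    "PhoP-dependent": (8, 9),
}


def percent_bound(bound, total):
    """Percentage of a gene group that is bound by StpA, to one decimal place."""
    return round(100.0 * bound / total, 1)


percentages = {name: percent_bound(b, t) for name, (b, t) in bound_counts.items()}
for name, pct in percentages.items():
    print(f"{name}: {pct}% bound by StpA")
```

+ Roughly 28 % of the genes derepressed at MEP and 17 % of the s38-activated genes are bound , compared with about 89 % of the PhoP-dependent genes , which is the contrast the text draws between indirect and direct regulation . 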
+ StpA modulates s38 stability
+ Our observation that the regulation of s38-dependent genes by StpA was largely indirect was intriguing . 
+ It was possible that StpA controlled the s38 regulon by modulating levels of s38 at the transcriptional level . 
+ However , transcriptomic data showed that the rpoS mRNA levels did not vary in JH3003 at MEP , compared with wild type ( Table S1 ) , and this was conﬁrmed by real-time PCR ( Fig. 7C ) . 
+ As post-transcriptional and post-translational regulation of s38 play an important role in E. coli ( Hengge-Aronis , 2002 ) , we used Western blotting to analyse s38 levels and discovered a clear increase in s38 in the stpA deletion strain at MEP ( Fig. 7A ) . 
+ This suggested that StpA regulates s38 by a post-transcriptional or post-translational mechanism at MEP . 
+ In contrast , StpA did not signiﬁcantly modulate s38 protein levels at LEP ( Fig. 7A ) . 
+ Consistent with this observation , deletion of stpA had little effect on the transcription of s38-dependent genes at LEP compared with MEP ( Table S1 ) . 
+ This was exempliﬁed by the expression proﬁle of the s38-dependent gene osmY ; the expression of osmY is StpA-dependent ( i.e. altered in the stpA deletion strain ) during mid-exponential growth ( Fig. 7C ) . 
+ However , despite the fact that the deletion of stpA did not affect s38 protein levels at LEP , our data showed that rpoS mRNA levels were signiﬁcantly lower in the absence of StpA ( in strain JH3003 ) at LEP compared with wild type ( Table S1 , Fig. 7C ) . 
+ This suggests that during the later stages of exponential growth , the StpA dependence of the rpoS mRNA is compensated by another mechanism that increases the level of s38 protein . 
+ During exponential growth in LB , s38 is maintained at low levels by a series of post-translational mechanisms , including ClpXP-mediated protein degradation ( Hengge-Aronis , 2002 ) . 
+ We tested a role for StpA in the post-translational control of s38 by monitoring s38 stability after blocking protein synthesis with spectinomycin . 
+ The LEP growth phase was chosen because s38 levels are similar at this time point in wild type and JH3003 , and any observed difference in stability would not simply reﬂect the concentration of s38 . 
+ The results show that deletion of stpA leads to the stabilization of s38 , while overexpression of StpA reduces the half-life of s38 ( Fig. 7D ) , revealing that StpA is involved in the control of s38 protein turnover . 
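+ One common way to turn such a chase experiment into a half-life is to assume ﬁrst-order decay and ﬁt a straight line to the logarithm of the signal . The Python sketch below illustrates that calculation ; it is not necessarily the procedure used for Fig. 7D , and the time points and band intensities are invented . 

```python
import math


def half_life(times_min, signal):
    """Least-squares fit of ln(signal) = ln(s0) - k * t; returns ln(2) / k.

    Assumes first-order decay and a strictly decreasing, positive signal.
    """
    logs = [math.log(s) for s in signal]
    n = len(times_min)
    mean_t = sum(times_min) / n
    mean_y = sum(logs) / n
    slope = (
        sum((t - mean_t) * (y - mean_y) for t, y in zip(times_min, logs))
        / sum((t - mean_t) ** 2 for t in times_min)
    )
    return math.log(2) / -slope


# Invented band intensities (arbitrary units) after translation is blocked.
times = [0, 2, 4, 8]                                   # minutes
fast_decay = [100.0, 50.0, 25.0, 6.25]                 # halves every 2 min
slow_decay = [100.0 * 2 ** (-t / 10.0) for t in times]  # halves every 10 min

print(half_life(times, fast_decay), half_life(times, slow_decay))
```

+ Comparing the two ﬁtted values gives the fold change in stability between strains ; with real blots the ﬁt would also need replicate measurements and a check that the decay is in fact log-linear . 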
+ StpA represses expression of the RssC anti-adaptor protein
+ Degradation of s38 requires the binding of the adaptor protein RssB to target s38 to the ClpXP proteolytic system . 
+ Anti-adaptor proteins are an important component of this pathway and interfere with the binding of RssB to the sigma factor to stabilize s38 . 
+ The RssB inhibitor IraP has been shown to be dependent on PhoP ( Bougdour et al. , 2006 ; Tu et al. , 2006 ) . 
+ As StpA repressed several PhoP-dependent genes , we speculated that StpA might regulate s38 stability via modulation of IraP activity . 
+ However , deletion of stpA in a ΔiraP background resulted in the same ﬁvefold increase in s38 half-life seen in a ΔstpA iraP+ background ( data not shown ) , showing that StpA does not directly modulate s38 stability via IraP . 
+ A screen for genes that increased s38 activity in S. Typhimurium identiﬁed the rssC gene ( STM1110 ) . 
+ Overexpression of rssC was reported to cause a 10-fold increase in the stability of s38 ( Bougdour et al. , 2008 ) . 
+ Based upon its similarity to the E. coli IraM protein , RssC is predicted to be a RssB anti-adaptor ( Bougdour et al. , 2008 ) . 
+ Our data show that the rssC gene is highly upregulated in the stpA deletion mutant at LEP ( Table S1 ) , raising the possibility that StpA could indirectly modulate s38 by controlling the transcription of rssC . 
+ To conﬁrm the role of RssC , we conducted an experiment at LEP and discovered that the stabilization of s38 caused by deletion of stpA was RssC-dependent ( Fig. 7E ) . 
+ It should be noted that StpA also regulates s38 stability via another mechanism , because s38 levels still show a partial StpA-dependence in the absence of rssC . 
+ We also determined whether rssC was necessary for the StpA-dependent modulation of s38 levels at MEP . 
+ The Western blot data clearly showed that the stabilization of s38 caused by deletion of stpA is also RssC-dependent at MEP ( Fig. 7B ) . 
+ In summary , StpA controls s38 stability during MEP and LEP by modulating the level of RssC-mediated degradation of s38 . 
+ Discussion
+ We characterized the role of an H-NS paralogue in S. Typhimurium during growth in rich medium , and discovered that StpA is a true global regulator of gene expression that directly and indirectly regulates 183 genes . 
+ The majority of these genes are repressed by StpA . 
+ Many of the StpA-repressed genes were involved in the synthesis/modiﬁcation of the cell envelope and response to various stresses such as osmotic shock , oxidative stress and resistance to cationic antimicrobial peptides . 
+ A large proportion of the StpA-dependent genes belong to the s38 , CRP and PhoP regulons . 
+ To validate the transcriptomic data at the phenotypic level , we showed that the stpA mutant exhibited increased resistance to various environmental stresses and antimicrobial peptides . 
+ The majority of the StpA-dependent genes belong to a well-deﬁned subset of the H-NS regulon . 
+ Most of the overlap between the two regulons involved s38 and PhoP-dependent genes . 
+ However , StpA does not regulate the Salmonella pathogenicity island 2 or the motility system , which are strongly repressed by H-NS . 
+ Therefore , StpA does not simply share the same biological properties of H-NS , but has a distinct and speciﬁc role . 
+ Importantly , gene regulation by StpA was found to be growth phase-dependent . 
+ Repression of the s38 regulon by StpA was restricted to the exponential phase of growth , while activation of CRP-dependent genes was limited to LEP . 
+ This identiﬁes StpA as an important factor in the growth phase-dependent control of gene expression in Salmonella . 
+ StpA prevents premature expression of s38-dependent genes and potentiates the expression of genes involved in nutrient acquisition during the late stages of exponential growth , when nutrient availability becomes a limiting factor for cellular replication . 
+ To differentiate between direct and indirect regulation of gene expression by StpA , we used chromatin immunoprecipitation ( ChIP-chip ) to identify StpA binding sites on the Salmonella genome in living bacterial cells . 
+ We found StpA coimmunoprecipitated with the majority of PhoP-dependent genes . 
+ Two StpA-repressed PhoP-dependent genes bound by StpA , pagC and ugtL , are also directly repressed by H-NS ( Perez et al. , 2008 ) . 
+ Interestingly , these genes require SlyA to alleviate H-NS repression . 
+ As SlyA is not able to directly activate transcription in vitro , its role appeared to be limited to counteracting the repressive action of H-NS on genes that were recently acquired by horizontal gene transfer ( Perez et al. , 2008 ) . 
+ Our ﬁnding that a large proportion of SlyA-activated genes are repressed and bound by StpA ( Table S1 ) suggests that StpA regulates these genes in a similar manner to H-NS , and that SlyA can counteract the effect of both H-NS- and StpA-mediated repression . 
+ In contrast to the binding of StpA to members of the PhoP regulon , only a minority of s38-dependent genes were bound by StpA . 
+ This led us to determine whether StpA regulated s38-dependent genes indirectly by modulation of s38 levels . 
+ We discovered that s38 protein levels were much higher in the stpA mutant than in the parental strain during exponential growth , and that StpA modulates s38 stability via repression of rssC . 
+ H-NS has also been shown to modulate s38 levels in E. coli by regulating translation initiation and s38 stability through mechanisms that still require clariﬁcation ( Hengge-Aronis , 2002 ; Zhou and Gottesman , 2006 ) . 
+ No regulatory function has been attributed to StpA in E. coli ( Hengge-Aronis , 2002 ) . 
+ Strikingly , we observed that deletion of stpA does not affect s38 levels during late exponential growth , even though StpA does modulate s38 stability during this growth phase . 
+ The stpA deletion mutant displayed lower rpoS mRNA levels at LEP , indicating that the negative effect of StpA on s38 stability may be compensated at the transcriptional level under conditions when higher s38 amounts are required ( Fig. 8 ) . 
+ We speculate that the dual role played by StpA in both the transcriptional and post-transcriptional control of s38 makes the RpoS system particularly sensitive to environmental conditions . 
+ The expression of genes required for the resistance against environmental challenges is costly for the bacterial cell . 
+ This is exempliﬁed by the fact that rpoS mutants of Salmonella display a ﬁtness advantage in the absence of stress ( Robbe-Saule et al. , 2003 ) . 
+ Moreover , natural isolates of E. coli that displayed higher levels of s38 activity were shown to metabolize fewer carbon sources and to be less able to compete for low nutrient concentrations ( King et al. , 2004 ) . 
+ These ﬁndings led to the hypothesis that bacteria need to balance self-preservation and nutritional capacity ( SPANC ; Ferenci , 2005 ) . 
+ The ﬁtness advantage derived from reduced s38 levels under low stress conditions has been linked to sigma factor competition ( Farewell et al. , 1998 ) . 
+ It is postulated that there is a limiting amount of RNA polymerase ( RNAP ) and that s38 and s70 compete for binding to the RNAP . 
+ High s38 levels therefore allow less s70 binding to RNAP leading to lower expression of s70-dependent genes , many of which are linked to metabolism and cellular growth ( Gruber and Gross , 2003 ) . 
+ Consequently , bacterial cells must tightly regulate the expression levels of key global regulators and this is exempliﬁed by the complex regulatory network that determines s38 levels . 
+ Because StpA represses the s38 regulon while stimulating expression of the CRP-cAMP regulon , we propose that StpA plays a role in promoting an appropriate SPANC balance by linking stress response to nutrient availability in S. Typhimurium . 
+ This hypothesis is supported by the observation that the glucose-mediated decrease of s38 levels is StpA-dependent . 
+ Analysis of the ChIP-chip data revealed that StpA colocalizes with H-NS on the Salmonella genome . 
+ Together with our ﬁnding that all StpA-dependent genes are also regulated by H-NS , the localization results suggest that StpA and H-NS cooperate to regulate gene expression by forming hetero-oligomers . 
+ This is reminiscent of the H-NS-like proteins MvaT and MvaU of Pseudomonas aeruginosa that regulate gene expression in a synergistic fashion ( Castang et al. , 2008 ) . 
+ It is important to note that although H-NS and StpA bind to the same genes , their effect on gene expression is not equivalent . 
+ For example , the SPI2 genes , which are strongly derepressed in a Δhns mutant ( Lucchini et al. , 2006 ) , do not respond to the deletion of stpA despite the fact that StpA and H-NS share a similar binding proﬁle over the entire locus ( Fig. 6C ) . 
+ Our data ﬁt with other observations that binding of StpA to a particular locus is not sufficient to exert repression . 
+ At the bgl operon in E. coli , StpA can compensate for the deletion of the H-NS DNA-binding domain , but StpA alone is not capable of signiﬁcant repression ( Free et al. , 1998 ) . 
+ The fact that StpA can compensate for the lack of the H-NS DNA-binding domain implies that the functional differences between these two proteins are associated with the dimerization and oligomerization domains . 
+ Consistent with this , recent work in S. Typhimurium and E. coli shows that StpA and H-NS have different dimerization and oligomerization properties ; StpA homodimers are more thermostable than H-NS homodimers ( Leonard et al. , 2009 ) . 
+ In addition , H-NS-StpA hetero-oligomers are more stable than homo-oligomers of either StpA or H-NS . 
+ There is growing evidence that interactions between H-NS and other regulatory proteins are an essential component of H-NS-mediated repression . 
+ Hha and YdgT are homologues of the H-NS dimerization and oligomerization domains that can interact with both H-NS and StpA ( Nieto et al. , 2002 ; Paytubi et al. , 2004 ) . 
+ Deletion of Hha and YdgT induces derepression of many more H-NS-dependent genes than we observed in the absence of StpA . 
+ These ﬁndings suggest that the complexes that H-NS forms with Hha/YdgT have distinct properties compared with H-NS homo-oligomers or H-NS-StpA hetero-oligomers , possibly due to a modulation of the activity of H-NS ( Vivero et al. , 2008 ) . 
+ Distinctions between the regulatory roles of H-NS and StpA may therefore lie in their differing abilities to form homo-oligomers , and to interact with other proteins , with concomitant effects upon local DNA conformation and gene expression . 
+ Overall , our data reveal a novel role for StpA in the transcriptional regulatory network of S. Typhimurium as a node that connects the CRP-cAMP , PhoP and σ38 regulons ( Fig. 8 ) . 
+ This contrasts with the situation described for E. coli , where inactivation of stpA only signiﬁcantly affects gene or protein expression in the absence of hns ( Zhang et al. , 1996 ; Mueller et al. , 2006 ) . 
+ The fact that StpA plays a more limited role in gene regulation in E. coli was also observed at the phenotypic level , as deletion of stpA in E. coli does not show the clear effects we observed in Salmonella ( Atlung and Ingmer , 1997 ; Bertin et al. , 2001 ) . 
+ Our findings correlate with the fact that hns is an essential gene in S. Typhimurium ( Navarre et al. , 2006 ) , while deletion of hns only has a moderate effect on growth rate in E. coli ( Zhang et al. , 1996 ) . 
+ The differing roles of StpA in the closely related organisms S. Typhimurium and E. coli could reﬂect the different requirements and the evolution of an intracellular pathogen . 
+ Analysis of the conservation of the E. coli transcriptional network in 175 prokaryotic genomes revealed that transcription factors evolve faster than their target genes ( Babu et al. , 2006 ) . 
+ It has also been shown that bacterial adaptation cannot be achieved by gene acquisition and/or loss alone but also requires changes at the level of gene regulation ( Winfield and Groisman , 2004 ) . 
+ Our discovery that StpA represses the σ38 regulon during exponential growth of S. Typhimurium , but not in E. coli , is consistent with the function of orthologous regulatory proteins rapidly diverging to allow adaptation to new environmental niches . 
+ Experimental procedures
+ Bacterial strains and growth conditions
+ The S. enterica serovar Typhimurium strain SL1344 was provided by Catherine Lee ( Hoiseth and Stocker , 1981 ) and is the same isolate used in previous transcriptomic studies from the Hinton laboratory ( Eriksson et al. , 2003 ; Mangan et al. , 2006 ; Nagy et al. , 2006 ; Hautefort et al. , 2008 ) . 
+ Where necessary , antibiotics were used at the following concentrations : ampicillin ( 100 µg ml-1 ) , chloramphenicol ( 12.5 µg ml-1 ) , kanamycin ( 35 µg ml-1 ) . 
+ Cultures were grown in LB broth ( Sambrook and Russell , 2001 ) under aeration ( 250 r.p.m. ) at 37 °C and harvested at four different growth phases : EEP ( A600 = 0.005 -- 0.010 ) , MEP ( A600 = 0.12 -- 0.15 ) , LEP ( A600 = 1.0 -- 1.2 ) and LSP ( A600 = 3.7 -- 3.8 ) . 
+ Bacterial strains and plasmids used in this study are shown in Table 1 . 
+ Strain construction and DNA manipulation
+ To obtain a strain overexpressing StpA , we used the low-copy plasmid pWKS30 ( Wang and Kushner , 1991 ) . 
+ First , we PCR-ampliﬁed the stpA gene including 779 bp upstream of the structural gene using primers stpA-FO2 and stpA-RO2 , which carry the restriction sites XbaI and HindIII respectively . 
+ The resulting PCR product and plasmid pWKS30 were digested with XbaI and HindIII , puriﬁed by agarose gel electrophoresis and ligated to generate plasmid pMDH20 . 
+ This plasmid was then transferred into S. Typhimurium SL1344 to generate strain JH3750 . 
+ Overexpression of StpA from pMDH20 was conﬁrmed at both RNA and protein levels ( data not shown ) . 
+ To generate a stpA deletion derivative of S. Typhimurium SL1344 we first constructed a plasmid derivative of pMDH20 , where the stpA ORF was replaced by the Campylobacter coli cat gene from the plasmid pAV35 ( van Vliet et al. , 1998 ) . 
+ To do so , we used primers StpA-Ri and StpA-Fi2 carrying a BamHI site to perform an inverse PCR on pMDH20 . 
+ These primers prime outward from the stpA structural gene and do not amplify it . 
+ The PCR product was then ligated to generate plasmid pMDH21 . 
+ The pAV35 cat gene was then inserted into the BamHI site of pMDH21 . 
+ This generated a new plasmid , designated pMDH22 , which contains the cat gene ﬂanked by approximately 780 and 450 base pairs of the regions upstream and downstream of the stpA gene respectively . 
+ The DNA fragment containing the cat ﬂanked by the stpA ﬂanking regions was excised from pMDH22 by a XbaI and HindIII restriction digestion . 
+ This DNA fragment was then electroporated into S. Typhimurium SL1344 expressing the λ Red recombinase ( Datsenko and Wanner , 2000 ) . 
+ The other deletion derivatives of S. Typhimurium SL1344 generated for this study ( Table 1 ) were obtained using the unmodified λ Red method ( Datsenko and Wanner , 2000 ) . 
+ After mutagenesis , all mutations were P22-transduced to a clean background using phage P22 HT105/1 int-201 . 
+ EBU plates were used to select for nonlysogens ( Bochner , 1984 ) . 
+ P22-transduction was also used to combine mutations for the generation of double mutants . 
+ All mutant strains were veriﬁed by PCR and DNA sequencing . 
+ Strains carrying the deletion of rpoS and/or stpA were additionally validated by Western blot . 
+ The C-terminal tagging of StpA with the 3×FLAG epitope was performed using a modified λ Red method ( Uzzau et al. , 2001 ) . 
+ The DNA fragment to be recombined into the chromosome was PCR-amplified from the pSUB11 plasmid . 
+ The successful fusion of StpA with the 3×FLAG epitope was confirmed by sequencing and Western blot . 
+ To obtain a stpA::gfp+ transcriptional fusion we used a derivative of the λ Red system ( Hautefort et al. , 2003 ) ; we used primers stpA-gfp_F and stpA-gfp_R to amplify from plasmid pZEP07 a DNA fragment containing a promoterless gfp+ gene followed by a chloramphenicol resistance cassette . 
+ The primers were designed to direct integration of the gfp+ and cat genes in the Salmonella chromosome 20 bp downstream of the stpA ORF . 
+ All primers used to generate the different constructs are shown in Table S2 . 
+ Flow cytometric analysis
+ For measurement of GFP+ expression , samples were immediately fixed for 1 min at room temperature in 3.7 % ( w/v ) formalin , diluted in 1 -- 2 ml of PBS to obtain a maximum of approximately 10^6 particles ml-1 and analysed with a FACSCalibur flow cytometer ( Becton Dickinson , Franklin Lakes , N.J. ) equipped with a 15 mW air-cooled argon ion laser as the excitation light source ( 488 nm ) as previously described ( Hautefort et al. , 2003 ) . 
+ RNA extraction for transcriptomic experiments
+ Overnight cultures were diluted 1000-fold into 250 ml ﬂasks containing 25 ml of LB-broth ( Sambrook and Russell , 2001 ) . 
+ These cultures were incubated at 37 °C for 60 min , 160 min , 250 min or 22 h in a water bath ( New Brunswick Innova 4000 ) under agitation ( 250 r.p.m. ) . 
+ Synthesis and degradation of RNA were then blocked by adding 1/5 volume of stop-solution ( 90 % ethanol/10 % phenol ) ( Tedin and Blasi , 1996 ) . 
+ RNA was prepared from each culture ﬂask using the Promega SV total RNA puriﬁcation kit according to the manufacturer 's instructions . 
+ The quality of RNA was checked using the RNA nanochip ( Labchip , on an Agilent 2100 Bioanalyser ) and quantiﬁed by measuring the absorbance at 260 nm on a Nanodrop 1000 spectrophotometer . 
+ Template labelling and hybridization
+ The ` Common reference ' experimental design used S. Typhimurium genomic DNA as the cohybridized control for one channel on all microarrays . 
+ This method has the advantage of allowing the direct comparison between multiple samples , and is ideal for time-course experiments ( Eriksson et al. , 2003 ; Mangan et al. , 2006 ) . 
+ Total RNA and chromosomal DNA were labelled by random priming according to the protocols described at http://www.ifr.ac.uk/safety/microarrays/#protocols . 
+ Briefly , a total of 16 µg RNA was reverse-transcribed and labelled with Cy3-conjugated dCTP ( Pharmacia ) using 200 U of Stratascript ( Stratagene ) and random primers ( Invitrogen ) . 
+ Chromosomal DNA ( 400 ng ) was labelled with Cy5-dCTP using the Klenow fragment . 
+ After labelling , each Cy5-labelled cDNA sample was combined with Cy3-labelled chromosomal DNA and hybridized to a microarray overnight at 65 °C . 
+ Data acquisition and transcriptomic data analysis
+ After hybridization , slides were washed and scanned using a GenePix 4000A scanner ( Axon Instruments ) . 
+ Fluorescent spots and the local background intensities were quantiﬁed using GenePix 5.0 software ( Axon ) . 
+ To compensate for unequal dye incorporation , data centring was performed bringing the median Ln ( Red/Green ) for each block to zero ( one block being deﬁned as the group of spots printed by the same pin ) . 
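+ The block-wise centring described above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function name `centre_log_ratios` and the `(block, red, green)` tuple layout are assumptions made for the example, and the intensities are assumed to be background-corrected and strictly positive.

```python
import math
from collections import defaultdict
from statistics import median


def centre_log_ratios(spots):
    """Centre Ln(Red/Green) so that the median of every print block is zero.

    spots: list of (block_id, red_intensity, green_intensity) tuples
           (hypothetical layout; intensities must be > 0).
    Returns a list of (block_id, centred_ln_ratio) in the input order.
    """
    # Collect the natural-log ratios of each block (spots printed by one pin).
    ratios_by_block = defaultdict(list)
    for block, red, green in spots:
        ratios_by_block[block].append(math.log(red / green))

    # One median per block; subtracting it shifts that block's median to zero.
    block_median = {b: median(v) for b, v in ratios_by_block.items()}

    return [
        (block, math.log(red / green) - block_median[block])
        for block, red, green in spots
    ]
```

+ Centring each block separately, rather than the whole slide at once, also removes pin-specific intensity offsets, which is why the block is defined by the printing pin.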
+ Data visualization and data mining were performed using GeneSpring 7.1 ( Silicon Genetics ) . 
+ The complete dataset is available at GEO ( Accession Number GSE18452 ) . 
+ Real-time PCR
+ To determine the levels of rpoS mRNA in wild type and ΔstpA strains through growth in batch-culture , overnight cultures were diluted 1:1000 in LB and cells were grown as for the transcriptomic experiment at 37 °C under aeration ( 250 r.p.m. ) . 
+ Aliquots were removed at regular intervals and treated with 0.2 vol. of stop solution ( 90 % EtOH ; 10 % water-saturated phenol ) . 
+ Total RNA was isolated as described above , using the Promega SV total RNA purification kit and RNA concentrations were determined on a Nanodrop machine ( NanoDrop Technologies ) . 
+ The RNA ( 5 µg ) was reverse-transcribed in 25 µl of Stratascript first-strand buffer in the presence of 0.5 mM dNTPs , 1 µg of random hexamers and 50 U Stratascript ( Stratagene ) . 
+ The relative amounts of target mRNA were then determined by real-time PCR using SYBR Green JumpStart Taq ReadyMix following the manufacturer 's instructions ( Sigma ) . 
+ The real-time PCR was performed using gene-specific primer pairs ( Table S2 ) designed in silico ( http://frodo.wi.mit.edu/primer3/ ) to generate amplicons in the 100 -- 120 bp range . 
+ ampD was used as an internal standard as it generally displays little variation in the transcriptional studies performed in our lab . 
+ In all the transcriptomic data presented in this article , ampD did not change by > 1.3-fold . 
+ Environmental stress survival
+ The ability to survive various environmental challenges was measured on S. Typhimurium bacterial cultures grown in LB until MEP ( A600 = 0.12 -- 0.15 ) or LEP ( A600 = 1.0 -- 1.2 ) . 
+ At the appropriate time , hydrogen peroxide ( 20 mM final concentration ) or polymyxin B ( 4 µg ml-1 ) was added . 
+ Survival at low pH was tested by diluting MEP cultures 1:100 into LB poised to pH 3.0 with HCl . 
+ Resistance to salt was examined by diluting MEP or LEP cultures 1:10 into LB containing 3.3 M NaCl . 
+ Samples were taken at various time points and spread on LB agar , except for the samples subjected to acid shock that were spread on Tryptone Soya Agar ( Oxoid CM0131 ) to promote optimal recovery . 
+ Colony forming units were enumerated and the relative survival of each strain was determined at each time point . 
+ For each test , at least three biological replicates were quantiﬁed . 
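+ As a rough illustration of the survival calculation, the sketch below expresses each colony count as a percentage of the count at time zero and averages the replicates. The text does not state the exact formula, so the normalization to the time-zero count, the function names and the dictionary layout are all assumptions made for this example.

```python
from statistics import mean


def relative_survival(cfu_by_time):
    """Percentage survival at each time point relative to time zero.

    cfu_by_time: dict mapping time (min) -> colony forming units per ml;
                 must contain the key 0 (assumed reference point).
    """
    baseline = cfu_by_time[0]
    return {t: 100.0 * cfu / baseline for t, cfu in sorted(cfu_by_time.items())}


def mean_relative_survival(replicates):
    """Average the per-replicate relative survival at each shared time point.

    replicates: list of dicts as accepted by relative_survival
                (the text reports at least three biological replicates).
    """
    per_replicate = [relative_survival(r) for r in replicates]
    # Only time points sampled in every replicate are averaged.
    shared_times = set.intersection(*(set(r) for r in per_replicate))
    return {t: mean(r[t] for r in per_replicate) for t in sorted(shared_times)}
```

+ Normalizing within each replicate before averaging keeps differences in starting inoculum from dominating the mean.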
+ To determine the levels of StpA3×FLAG or σ38 protein , cells were lysed by sonication ( 10 µm amplitude ; MSE Soniprep 150 ) . 
+ Total protein amounts were then determined using the bicinchoninic acid assay ( Sigma , Cat. BCA-1 ) . 
+ For each sample , 15 µg of total protein were re-suspended in SDS sample buffer ( Sigma , Cat. S3401 ) and run on NuPAGE 12 % Bis-Tris gels ( Invitrogen ) . 
+ Transfer to PVDF membrane was performed as described by the manufacturer ( Invitrogen ) and the membranes were then blocked for 45 min in 10 % Marvel Milk prepared in TBS-Tween 20 ( 25 mM Tris ; 0.8 % NaCl ; 0.02 % KCl ; 0.05 % Tween 20 ; pH 7.5 ) . 
+ The membranes were subsequently probed for 2 h with 1:1000 anti-σ38 ( Neoclone , Cat. W0009 ) or 10 µg ml-1 anti-FLAG M2 antibody ( Sigma , Cat. F3165 ) serum in 10 % Marvel Milk prepared in TBS-Tween 20 . 
+ Washes and chemiluminescent immunodetection were performed as described previously ( Hautefort et al. , 2008 ) . 
+ Spectinomycin assay of protein stability
+ The S. Typhimurium SL1344 cultures were grown to LEP , at which point spectinomycin was added to a 100 µg ml-1 concentration to stop protein synthesis ( Zhou and Gottesman , 2006 ) . 
+ Bacterial samples were taken at regular intervals and immediately precipitated by adding 1/6 volume 30 % TCA . 
+ The samples were then left on ice for 20 min and centrifuged at 20 000 g for 20 min at 4 °C . 
+ Pellets were washed with cold acetone and centrifuged again at 20 000 r.p.m. for 20 min . 
+ After air-drying , the pellets were re-suspended in SDS sample buffer for Western blot analysis . 
+ Chromatin immunoprecipitation
+ Overnight cultures of S. Typhimurium stpA3×FLAG or S. Typhimurium SL1344 were diluted 1000-fold into 250 ml flasks containing 25 ml of LB and grown at 37 °C under aeration . 
+ After reaching mid-exponential growth ( OD600 0.12 ) , formaldehyde ( Sigma ) was added to reach a 1 % final concentration and the cultures incubated for 15 min at 37 °C . 
+ The cross-linking was then stopped by adding 1/4 vol. 1 M glycine . 
+ Cells were washed three times in ice-cold PBS ( pH 7.4 ) , re-suspended in 500 µl lysis buffer ( 10 mM Tris pH 8.0 ; 50 mM NaCl ; 10 mM EDTA ; 20 % sucrose ; 20 mg ml-1 lysozyme ) and incubated 45 min at 37 °C at which point 500 µl 2× RIPA ( 100 mM Tris pH 8.0 ; 300 mM NaCl ; 2 % Nonidet P40 ; 1 % sodium deoxycholate ; 0.2 % SDS ) was added . 
+ Chromatin was solubilized by sonication ( 10 µm amplitude ; MSE Soniprep 150 ) until DNA fragments were between 300 and 750 bp , and the lysate centrifuged at 12000 g for 10 min to remove debris . 
+ To determine the binding of StpA3×FLAG to the S. Typhimurium genome , the cell extract from S. Typhimurium stpA3×FLAG was incubated with anti-FLAG M2 antibody for 4 h at 4 °C . 
+ Chromatin immuno-precipitation with the anti-FLAG M2 antibody on the SL1344 wild-type strain provided the negative control . 
+ Co-immunoprecipitation of H-NS and DNA was performed by incubating S. Typhimurium SL1344 lysate with H113 monoclonal anti-H-NS antibody ( Sonnenﬁeld et al. , 2001 ) . 
+ As a negative control , immunoprecipitation was performed on cell extract incubated without the addition of any antibody . 
+ After the incubation of the cell extracts with or without antibody , 50 µl protein G beads ( Sigma , E3405 ) were added and left for 16 h at 4 °C . 
+ The protein G beads were then washed twice in 1× RIPA , twice in wash solution ( 10 mM Tris pH 8.0 ; 250 mM LiCl ; 1 mM EDTA ; 0.5 % Nonidet P40 ; 0.5 % sodium deoxycholate ) and twice in TE ( 10 mM Tris pH 8.0 ; 1 mM EDTA ) . 
+ The immunoprecipitate was eluted in 150 µl of elution buffer ( 50 mM Tris pH 8.0 ; 10 mM EDTA ; 1 % SDS ) prewarmed at 65 °C . 
+ Cross-linking was reversed by incubating the eluate in 0.5× elution buffer containing 0.8 mg ml-1 pronase ( Sigma ) and DNA purified using the Qiagen PCR purification kit . 
+ The StpA and H-NS ChIP data are available at GEO ( Accession Number GSE18452 ) . 
+ Acknowledgements
+ We thank Ida Porcelli and Gary Rowley for useful discussions , and Martin Goldberg for assistance with the construction of plasmid pMDH20 . 
+ We are indebted to Lionello Bossi for providing plasmid pSUB11 for the construction of the epitope-tagged StpA . 
+ We are grateful to Roy Bongaerts and Isabelle Hautefort for their help with the analysis of the stpA-gfp+ transcriptional fusion . 
+ We also thank Fran Mulholland and Nigel Belshaw for technical assistance . 
+ We acknowledge funding from the BBSRC Core Strategic Grant to J.H. . 
+ Supporting information
+ Additional supporting information may be found in the online version of this article . 
+ Please note : Wiley-Blackwell are not responsible for the content or functionality of any supporting materials supplied by the authors . 
+ Any queries ( other than missing material ) should be directed to the corresponding author for the article .
\ No newline at end of file
--- a/data/TEXT_FILES/notuseful_txt/20460455.txt 0 → 100644
+++ b/data/TEXT_FILES/notuseful_txt/20460455.txt 0 → 100644
+ Identification of β-catenin binding regions in colon
+ ABSTRACT 
+ Nucleic Acids Research , 2010 , Vol. 38 , No. 17 , 5735 -- 5745 , doi : 10.1093/nar/gkq363 
+ Deregulation of the Wnt/β-catenin signaling pathway is a hallmark of colon cancer . 
+ Mutations in the adenomatous polyposis coli ( APC ) gene occur in the vast majority of colorectal cancers and are an initiating event in cellular transformation . 
+ Cells harboring mutant APC contain elevated levels of the β-catenin transcription coactivator in the nucleus which leads to abnormal expression of genes controlled by β-catenin/T-cell factor 4 ( TCF4 ) complexes . 
+ Here , we use chromatin immunoprecipitation coupled with massively parallel sequencing ( ChIP-Seq ) to identify β-catenin binding regions in HCT116 human colon cancer cells . 
+ We localized 2168 β-catenin enriched regions using a concordance approach for integrating the output from multiple peak alignment algorithms . 
+ Motif discovery algorithms found a core TCF4 motif ( T/A -- T/A -- C -- A -- A -- A -- G ) , an extended TCF4 motif ( A/T/G -- C/G -- T/A -- T/A -- C -- A -- A -- A -- G ) and an AP-1 motif ( T -- G -- A -- C/T -- T -- C -- A ) to be significantly represented in β-catenin enriched regions . 
+ Furthermore , 417 regions contained both TCF4 and AP-1 motifs . 
+ Genes associated with TCF4 and AP-1 motifs bound β-catenin , TCF4 and c-Jun in vivo and were activated by Wnt signaling and serum growth factors . 
+ Our work provides evidence that Wnt/β-catenin and mitogen signaling pathways intersect directly to regulate a defined set of target genes . 
+ 1Oregon Clinical and Translational Research Institute , Oregon Health and Science University , Portland , OR , 2The Department of Biochemistry and Molecular Biology , The Pennsylvania State University College of Medicine , Hershey , PA 17033 , 3Department of Medical Informatics and Clinical Epidemiology , Oregon Health and Science University , Portland , OR , 4Knight Cancer Institute , 5Division of Biostatistics in the Department of Public Health and Preventative Medicine , Oregon Health and Science University , Portland , OR , 6Program in Cellular and Molecular Biology and 7The Pennsylvania State Hershey Cancer Institute , The Pennsylvania State University College of Medicine , Hershey , PA 17033 , USA 
+ INTRODUCTION
+ The Wnt/β-catenin signaling pathway is required for homeostasis in the gastrointestinal ( GI ) tract ( 1 ) . 
+ The GI tract is coated with small invaginations , or crypts , which are comprised of discrete zones of proliferating and differentiated cells ( 1 ) . 
+ Wnt signaling maintains the proliferative compartment of the crypt . 
+ The β-catenin transcriptional coactivator controls downstream genetic programs elicited by Wnt signaling and its cellular levels are tightly regulated ( 2,3 ) . 
+ When cells are not exposed to Wnt , cytosolic β-catenin associates with a multi-protein complex that contains the adenomatous polyposis coli ( APC ) protein . 
+ APC functions as a scaffold to coordinate β-catenin phosphorylation and degradation by the proteasome . 
+ Under these conditions , Wnt/β-catenin target genes are silenced by corepressor complexes that are tethered to Wnt responsive DNA enhancers ( WREs ) through interactions with the T-cell factor ( TCF ) family of transcription factors ( 4 ) . 
+ TCF4 is a predominant TCF family member in colon cancer cells ( 5 ) . 
+ When cells are exposed to Wnt , β-catenin escapes proteasomal degradation , and is chaperoned to the nucleus by APC . 
+ There , it occupies TCF4 bound WREs and displaces the corepressors . 
+ β-catenin then recruits chromatin-modifying complexes and Wnt/β-catenin target genes are expressed ( 4 ) . 
+ Deregulation of the Wnt/β-catenin pathway is associated with colon carcinogenesis ( 2,3 ) . 
+ In virtually all cases of colon cancer , mutations target components of the Wnt signaling pathway ( 6 ) . 
+ The most common lesions localize to APC and lead to production of a truncated APC protein that can no longer effectively coordinate β-catenin degradation . 
+ This mutation occurs at the earliest stages of carcinogenesis when normal colonic epithelial cells are transformed into aberrant crypt foci ( 6 ) . 
+ Inherited APC mutations give rise to familial adenomatous polyposis , a disease where afflicted individuals are burdened by thousands of intestinal polyps early in adulthood ( 7,8 ) . 
+ In the rare cancer cases where APC is wild-type , mutations instead are found in CTNNB1 ( 3,9,10 ) . 
+ CTNNB1 is the gene that encodes β-catenin , and the cancer causing lesions map to positions near the 5′ portion of the gene . 
+ These mutations give rise to a β-catenin pool that is resistant to proteasomal degradation ( 11 ) . 
+ In each instance , mutations that target APC or CTNNB1 lead to high levels of β-catenin in the nucleus and abnormal expression of genes regulated by β-catenin/TCF4 complexes ( 5,9 ) . 
+ Therefore identifying target genes directly controlled by β-catenin/TCF4 is required to understand the pathogenesis of this disease . 
+ To identify direct β-catenin/TCF4 target genes it is first necessary to map binding sites for these factors across the genome . 
+ Previously , we used an unbiased and genome-wide screen termed serial analysis of chromatin occupancy ( SACO ) to localize 412 high confidence β-catenin binding sites in the human colorectal cancer cell line , HCT116 ( 12 ) . 
+ Approximately half of the binding sites were near ( < 2.5 kb ) or within protein-coding gene boundaries . 
+ These β-catenin binding sites were located in 5′ promoter regions , intragenic regions and 3′ untranslated regions . 
+ β-Catenin binding to 3′ positions relative to E2F4 and MYC genes identified functional WREs . 
+ For E2F4 , the downstream enhancer drove expression of an antisense and non-coding transcript that decreased E2F4 protein levels ( 13 ) . 
+ For MYC , β-catenin occupancy of the downstream enhancer initiated a chromatin loop that integrated the 5′ WRE to coordinate MYC expression in response to Wnt/β-catenin and mitogen signaling pathways ( 14,15 ) . 
+ Recently , Hatzis et al. ( 16 ) used chromatin immunoprecipitation ( ChIP ) coupled with microarrays ( ChIP-chip ) to localize 6868 TCF4 binding sites in L171 colorectal cancer cells ( 16 ) . 
+ As was the case for β-catenin , TCF4 binding was also found throughout protein coding gene boundaries . 
+ Together these studies indicate that Wnt activation of target gene expression is mechanistically more intricate than the simple model involving recruitment of β-catenin to TCF4-bound 5′ promoter regions . 
+ While SACO was a pioneering technique used to identify transcription factor binding sites and was a viable alternative to the ChIP-chip approach , it did , like most methodologies , suffer from some limitations . 
+ First , SACO libraries were constructed in plasmid vectors and then sequenced using high-throughput and Sanger-based sequencing . 
+ This was both laborious and costly . 
+ In addition , large DNA fragments were included in construction of the earliest SACO libraries . 
+ While most DNA was in the 500 -- 700 bp range , fragments as large as 2.5 kb were included in the β-catenin SACO library . 
+ This impinged upon the resolution of the technique and hindered de novo motif discovery within β-catenin bound loci . 
+ A successor to SACO is the recently described ChIP coupled with massively parallel sequencing ( ChIP-Seq ) approach ( 17 -- 19 ) . 
+ In this technique , DNA puriﬁed from immunoprecipitated chromatin is size-selected and then sequenced using one of the next-generation sequencing platforms such as the Illumina genome analyzer ( 18,19 ) . 
+ The robustness , cost , resolution and relative ease in library construction have made ChIP-Seq , rather than SACO , a current method of choice for genome-wide localization of transcription factor binding sites . 
+ ChIP-Seq has been used to map numerous histone modiﬁcations ( 20 ) and binding sites for several transcription factors including , but not limited to , NRSF/REST , GATA1 , SRF , E2F4 , E2F6 and STAT1 ( 17 ) . 
+ In addition , ChIP-Seq has been used recently to identify TCF4 enriched binding regions in human colon cancer cell lines ( 21,22 ) . 
+ Tuupanen et al. ( 22 ) identiﬁed 10 TCF4-site containing regions in LoVo cells using a combination of ChIP-Seq and the enhancer element locator analysis . 
+ Blahnik et al. ( 21 ) used ChIP-Seq to identify 21 102 TCF4 binding sites in HCT116 cells . 
+ In this report we used ChIP-Seq to identify β-catenin binding regions in HCT116 human colon cancer cells . 
+ We chose this high-resolution and genome-wide approach because we were interested in using de novo motif analysis to identify transcription factors that putatively cooperate with β-catenin/TCF4 . 
+ Many algorithms exist to map the enriched genomic regions identiﬁed in a ChIP-Seq experiment ( 23 ) . 
+ Because each approach varies in computational strategy and can produce dramatically different numbers of enriched regions for a given false discovery rate ( FDR ) cutoff , there is some debate as to which is the preferred algorithm ( 18 ) . 
+ In this report , we used CisGenome ( 24 ) , SISSRs ( 25 ) and WTD ( 26 ) to initially identify enriched regions from the β-catenin ChIP-Seq library . 
+ Based on the intersection of regions found in common with each algorithm , we identified 2168 β-catenin enriched regions . 
+ Consistent with our previous report ( 12 ) , we found over-representation of the core and evolutionarily conserved TCF4 consensus motifs within enriched regions . 
+ In addition , we found that consensus AP-1 motifs were also over-represented in a large subset of the enriched regions with the majority of these motifs co-occurring with a TCF4 motif . 
+ Finally , we show that serum mitogens and Wnt signaling agonists cooperatively activate expression of some target genes in proximity to β-catenin bound loci that contain AP-1 and TCF4 motifs . 
+ These findings indicate that a discrete subset of β-catenin target genes is activated by mitogen and Wnt signaling in colon cancer cells , and that this regulation likely occurs through consensus AP-1 and TCF4 sites , respectively . 
+ MATERIALS AND METHODS Cell culture
+ HCT116 human colorectal cancer cells ( ATCC number CCL-247 ) were cultured as previously described ( 14 ) . 
+ ChIP
+ Antibodies used in ChIP assays included : 3 µg of anti-β-catenin ( BD transduction , 610154 ) , 3 µg of anti-TCF ( Millipore , 05-511 ) , 2 µl of anti-c-Jun ( Millipore , 06-225 ) and 6 µg rabbit anti mouse IgG ( Jackson Immunoresearch , 315-005-003 ) . 
+ β-Catenin ChIP DNA for the ChIP-Seq library was prepared using the Chromatin Immunoprecipitation Assay Kit ( Millipore , 17-295 ) according to the instructions . 
+ To assess β-catenin , TCF4 and c-Jun binding to ChIP-Seq peak loci , ChIP assays contained 5 -- 10 × 10^6 cells and were conducted as previously reported ( 13 ) . 
+ Chromatin in formaldehyde fixed cell lysates was sonicated to an average size of 500 -- 700 bp using a Misonix Ultrasonic XL-2000 Liquid Processor ( 5 × 20 s , output wattage 7 , with 45 s rest intervals on ice between pulses ) . 
+ Real time PCR was used to detect isolated ChIP fragments and samples contained 10 µl of 2× iQ SYBR Green Supermix ( Bio-rad , 170-882 ) , 0.25 µM of each primer and 3 µl of purified ChIP DNA . 
+ Reactions were processed for one cycle at 94 °C for 3 min , then 45 cycles at 94 °C for 10 s and at 68 °C for 40 s using a MyIQ Single Color Real-Time PCR machine ( Bio-rad ) . 
+ Primers were designed , using Primer3 software , to a 600 bp DNA segment that was centered on the β-catenin ChIP-Seq coverage region . 
+ Primers used in this study are listed in Supplementary Table S7 . 
+ Real time data is represented as fold levels over control . 
+ The control is a distal region that is 5 kb upstream from the MYC transcript start site that does not bind significant levels of β-catenin , TCF4 or c-Jun ( 14 ) . 
+ Construction of the ChIP-Seq library
+ β-Catenin precipitated and purified ChIP DNA ( 350 ng ) was processed using the ChIP-Seq DNA Sample Preparation Kit ( Illumina , 1003473 ) according to instructions provided by the manufacturer . 
+ Prior to sequencing , DNAs were re-quantiﬁed using a NanoDrop 1000 Spectrophotometer and the quality of DNA was assessed using a Bioanalyzer DNA 1000 ( Agilent ) . 
+ Samples were diluted to 10 nM and 54-nt reads were obtained from one lane of sequencing on an Illumina GA II sequencer . 
+ The High Throughput Sequencing Facility at the University of Oregon ( http://htseq.uoregon.edu ) sequenced the library . 
+ Raw sequence data was submitted to the sequence read archive ( SRA ) under accession number SRA012054 . 
+ Realignment of ChIP-Seq reads
+ Reads ( 9 322 654 ) were sequenced and of these , 8 456 287 passed the quality ﬁlter as assessed using ELAND software ( Illumina ) . 
+ We then used ELAND to align the 8 456 267 reads to the repeat masked NCBI 36/hg18 build of the human genome . 
+ We assigned unique positions for 6 576 033 reads allowing up to two mismatches in the ﬁrst 32 bases of the read sequence . 
+ This set of reads was retained for downstream computational analysis . 
+ Identification of β-catenin enriched regions
+ We utilized three peak calling programs to deﬁne a set of putative binding regions : CisGenome ( 24 ) , SISSRs ( 25 ) and WTD ( 26 ) . 
+ Each method implements variations on a sliding window approach to identify regions of higher read depth , referred to as peaks , relative to a background distribution . 
+ Based on the particular algorithm , background distributions are derived using reads from a negative control experiment , through Monte Carlo procedures , or through statistical models ( 23 ) . 
+ CisGenome and SISSRs rely on statistical model ﬁtting while the WTD method uses a randomization approach . 
+ SISSRs was run using a window size of 20 bases with the FDR set at 0.01 . 
+ WTD was run with the window size estimated from the binding characteristics . 
+ Any local tag anomalies were removed and an FDR cutoff of 0.01 was assessed using 10 randomization procedures . 
+ CisGenome was run using a window size of 100 and a read cutoff of 7 reads . 
+ We then identiﬁed the midpoints of each of the regions and extended 299 bp upstream and 300 bp downstream so that there was a total of 600 bp identifying each putative binding region . 
+ We chose 600 bases because it was twice what we considered to be the largest size of DNA fragments submitted for sequencing ( Figure 1B ) . 
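+ The window construction can be sketched as below. This is an illustrative assumption of how the step might be coded, not the authors' script: the function name `fixed_window`, the use of 1-based inclusive coordinates and the integer midpoint are choices made for the example.

```python
def fixed_window(start, end, upstream=299, downstream=300):
    """Return a fixed-size region centred on the midpoint of a called peak.

    start, end: inclusive coordinates of the peak reported by a peak caller
                (coordinate convention assumed for this sketch).
    With the defaults the result spans 299 + 1 + 300 = 600 bases.
    """
    midpoint = (start + end) // 2  # integer midpoint of the called region
    return midpoint - upstream, midpoint + downstream


def window_length(window):
    """Length of an inclusive (start, end) window."""
    start, end = window
    return end - start + 1
```

+ The sketch does not clamp windows at chromosome ends; a peak within 299 bases of a chromosome start would need extra handling.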
+ Using these criteria , CisGenome , SISSRs and WTD called 100 372 , 80 733 and 2940 peak regions , respectively . 
+ Peaks called in common by the three algorithms yielded 2168 putative β-catenin binding regions and this set was used for further computational analysis ( See Supplementary Table S1 for a summary of peak overlaps ) . 
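+ One way to compute such a concordance set is sketched below. The text does not state the overlap criterion, so this example assumes that two regions agree when their fixed 600 bp windows share at least one base on the same chromosome; the function names and the `(chrom, start, end)` tuple layout are likewise assumptions.

```python
from bisect import bisect_left
from collections import defaultdict

WINDOW = 600  # every region is assumed to be a fixed 600 bp inclusive window


def index_starts(regions):
    """Map chromosome -> sorted list of window start positions."""
    index = defaultdict(list)
    for chrom, start, _end in regions:
        index[chrom].append(start)
    for starts in index.values():
        starts.sort()
    return index


def has_overlap(index, chrom, start, end):
    """True if any indexed window on chrom shares a base with [start, end]."""
    starts = index.get(chrom, [])
    # A window beginning at s2 covers [s2, s2 + WINDOW - 1]; it overlaps
    # [start, end] exactly when start - WINDOW + 1 <= s2 <= end.
    i = bisect_left(starts, start - WINDOW + 1)
    return i < len(starts) and starts[i] <= end


def concordant_regions(reference, *other_sets):
    """Keep reference regions that overlap a region in every other set."""
    indexes = [index_starts(regions) for regions in other_sets]
    return [
        region
        for region in reference
        if all(has_overlap(index, *region) for index in indexes)
    ]
```

+ Using the smallest call set as the reference keeps the scan short, since a region absent from that set cannot be in the three-way intersection.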
+ De novo motif analysis
+ The genomic sequence ( 600 bp ) encompassing each enriched region was isolated and the regions were separated into two sets based upon whether they had at least one instance of a canonical ( T/A -- T/A -- C -- A -- A -- A -- G ) or evolutionarily conserved ( A -- C/G -- T/A -- T -- C -- A -- A -- A -- G ) TCF4 motif within the boundaries of the region ( 16 ) . 
+ The reverse complements of these sequences were also included . 
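+ The partition into motif-containing and motif-free regions can be illustrated with a short regular-expression scan. This is a sketch under stated assumptions rather than the authors' code: the motif strings are transcribed from the consensus sequences given in this section, and the function names and the name-to-sequence dictionary are hypothetical.

```python
import re

# Consensus motifs from the text, written as regular expressions.
CORE_TCF4 = "[AT][AT]CAAAG"           # T/A -- T/A -- C -- A -- A -- A -- G
CONSERVED_TCF4 = "A[CG][AT]TCAAAG"    # A -- C/G -- T/A -- T -- C -- A -- A -- A -- G
TCF4_PATTERN = re.compile(f"{CORE_TCF4}|{CONSERVED_TCF4}")

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an upper-case DNA string (N is left unchanged)."""
    return seq.translate(_COMPLEMENT)[::-1]


def has_tcf4_motif(seq):
    """True if either strand of seq contains a core or conserved TCF4 motif."""
    seq = seq.upper()
    return bool(
        TCF4_PATTERN.search(seq) or TCF4_PATTERN.search(reverse_complement(seq))
    )


def split_by_motif(regions):
    """Split {name: sequence} into (with_motif, without_motif) dictionaries."""
    with_motif = {n: s for n, s in regions.items() if has_tcf4_motif(s)}
    without_motif = {n: s for n, s in regions.items() if n not in with_motif}
    return with_motif, without_motif
```

+ Scanning the reverse complement of the sequence is equivalent to searching the forward strand for the reverse-complemented motif, which is how the text describes the step.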
+ The sequence from each region was repeat masked and used as input into the Gibbs sampler motif ﬁnding program provided by CisGenome . 
+ The motif ﬁnder was run searching for motifs of 7 , 11 and 15 bp using 5000 MCMC iterations and a score was produced for each motif . 
+ Control sequences were picked based on the strategy of Ji et al. ( 27 ) . 
+ Brieﬂy , sequences were chosen to match the underlying characteristics of the enriched regions . 
+ Each control sequence was picked randomly such that it was of the same size and was in the same position relative to the nearest RefSeq transcript ( 28 ) as a given enriched region . 
+ Five sets of control sequences were chosen in this manner for both sets of enriched regions . 
+ Motifs found from the de novo search were mapped back to both the b-catenin enriched regions and the control sequences using the motif mapping tool from CisGenome . 
+ The number of matches was based on a likelihood ratio cutoff of 500 and a background model consisting of a third-order Markov chain , in accordance with Ji et al. ( 27 ) . 
+ The relative enrichment was computed as described ( 27 ) . 
+ Motifs that had a relative enrichment score > 2 were determined to be over-represented in the b-catenin enriched regions . 
+ Chromatin conformation capture
+ Chromatin conformation capture ( 3C ) assays were conducted as described ( 15 ) with minor modiﬁcations . 
+ Formaldehyde cross-linked chromatin was digested overnight with 40 µl ( 800 U ) of XbaI ( New England Biolabs ) . 
+ XbaI was then heat-inactivated at 65 °C for 20 min prior to ligation reactions . 
+ After proteinase K treatment , the samples were extracted in phenol/chloroform three times , followed by three back extractions with chloroform . 
+ The chromatin loop at CXXC5 was detected by PCR using primers C51 , GTACGTAGTCGTTTTAGCC and C56 , GCACCCAGCCTCTCAAACCC and the conditions previously described ( 15 ) . 
+ To control for loading , parallel samples were ampliﬁed by PCR with the tubulin speciﬁc primers GGGGCTGGGTAAATGGCAAA and TGGCACTGGCTCTGGGTTCG . 
+ Products were analyzed on a 1 % agarose gel by electrophoresis , puriﬁed and sequenced . 
+ Serum and LiCl stimulation of HCT116 cells
+ HCT116 cells were synchronized in the cell cycle as previously described ( 14 ) . 
+ For ChIP experiments , G0/G1 cells grown in a 10-cm tissue culture dish were stimulated with medium containing 10 % fetal bovine serum for 1 or 2 h prior to formaldehyde ﬁxation . 
+ For expression analysis , G0/G1 cells were grown in a 6-well plate prior to stimulation with medium containing serum with and without 10 mM LiCl for 1 or 4 h as indicated . 
+ Reverse transcription/real time PCR
+ RNA was isolated using TRIZol reagent ( Invitrogen , 15596-018 ) according to the instructions . 
+ cDNA was synthesized using 500 ng of total RNA and the iScript cDNA Synthesis Kit ( Bio-rad , 170-8890 ) according to the instructions . 
+ cDNA was diluted 1 : 150 before quantiﬁcation by real-time PCR . 
+ Real-time PCR was conducted as outlined under the ChIP section , except that 3 µl of diluted cDNA was used as the template . 
+ Primers were designed using Primer3 software and their sequences are included in Supplementary Table S7 . 
+ RESULTS
+ Construction of the b-catenin ChIP-Seq library
+ We were interested in using ChIP coupled with massively parallel sequencing ( ChIP-Seq ) to identify b-catenin binding regions in HCT116 human colon cancer cells . 
+ Prior to constructing the library , we tested the efficacy of our ChIP protocol to identify bona ﬁde b-catenin targets in this cell line . 
+ b-catenin strongly associated with a WRE located 1.4 kb downstream of the transcription stop site of the c-Myc gene ( MYC ) as we have reported previously ( Figure 1A ) ( 14,15 ) . 
+ Furthermore , insigniﬁcant levels of b-catenin were detected at a control element located 5 kb upstream of the MYC transcription start site that did not associate with either b-catenin or TCF4 ( Figure 1A ) ( 14 ) . 
+ Size-selected b-catenin ChIP DNA was then processed according to the Illumina sample preparation protocol and minimally ampliﬁed by PCR . 
+ Most ampliﬁed fragments were in the range of 175 -- 225 bp and were produced in samples containing b-catenin ChIP DNA whereas these products were absent in the control sample ( Figure 1B ) . 
+ A total of 9 322 654 reads were generated from one lane of sequencing using an Illumina GA II high throughput sequencer . 
+ Of these , 90.7 % ( 8 456 287 ) passed the quality ﬁlter and 77.8 % ( 6 576 033 ) reads were assigned a unique position in the human genome . 
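A quick check of the read counts shows which denominators the two percentages use. The variable names below are illustrative; the counts are those quoted in the text.

```python
total_reads = 9_322_654
passed_filter = 8_456_287
uniquely_mapped = 6_576_033

# Percentage of all reads that passed the quality filter.
pct_passed = round(100 * passed_filter / total_reads, 1)
# Percentage of filtered reads that mapped to a unique genomic position.
pct_mapped_of_passed = round(100 * uniquely_mapped / passed_filter, 1)
# The same mapped reads expressed relative to all sequenced reads.
pct_mapped_of_total = round(100 * uniquely_mapped / total_reads, 1)

print(pct_passed, pct_mapped_of_passed, pct_mapped_of_total)
```

The 77.8 % ﬁgure is therefore relative to the reads that passed the ﬁlter; relative to all sequenced reads the uniquely mapped fraction is about 70.5 %.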
+ The set of 6 576 033 reads was then subjected to additional computational analysis . 
+ As outlined in the ` Materials and Methods ' section , we compared three computational algorithms and we considered the peaks called in common to demarcate putative b-catenin binding regions . 
+ This approach yielded 2168 peaks that we termed b-catenin enriched regions . 
+ The genomic boundaries of these regions are provided in Supplementary Table S2 . 
+ To determine whether our approach identiﬁed bona ﬁde Wnt/b-catenin target genes , we searched for representation of the MYC gene . 
+ b-catenin enriched regions coincided with the 5′ , 3′ and distal WREs previously shown to regulate MYC expression ( Figure 1C ) ( 15,22,29 -- 31 ) . 
+ Thus , our approach identiﬁed b-catenin associated WREs in colon cancer cells . 
+ Computational analysis of b-catenin enriched regions
+ We next localized the b-catenin enriched regions relative to transcripts deposited in the reference sequence database ( RefSeq ) ( 28 ) . 
+ Of the 2168 b-catenin enriched regions , 1562 ( 72 % ) were within 50 kb of a RefSeq transcript . 
+ Upon further analysis , we found that 1219 ( 56 % ) were within 10 kb and 1090 ( 50 % ) were within 2.5 kb ( Figure 2A ) . 
+ With respect to protein-coding genes and in agreement with our previous ﬁndings ( 12 ) , we found that b-catenin preferentially localized to internal positions or those positions that are downstream from the transcription start site and upstream from the transcription stop site ( Figure 2B ) . 
+ There was a tendency for b-catenin enriched regions within 2.5 kb of the 5′ gene boundary to cluster around transcriptional start sites ( Figure 2C ) . 
+ The genes containing b-catenin enriched regions near ( < 2.5 kb ) or within gene boundaries are listed in Supplementary Table S3 . 
+ Overall , the 1090 b-catenin enriched regions are near or within 988 genes . 
+ It was recently shown that a distal WRE interacted with MYC through a large chromatin loop ( 30,31 ) . 
+ This was the ﬁrst demonstration that a WRE positioned hundreds of kilobases away from its target gene functioned as a transcriptional enhancer . 
+ To further explore the relationship of b-catenin enriched regions and annotated protein-coding transcripts , we determined the empirical cumulative distribution function ( CDF ) of the distance from each b-catenin enriched region to the nearest transcript . 
+ This analysis found that 80 % of enriched regions were within 100 kb of an annotated transcript and that 95 % were within 450 kb ( Figure 2D ) . 
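The empirical CDF used here is simply the fraction of regions whose distance to the nearest transcript is at or below a given threshold. A minimal sketch follows; the function name and the toy distances are hypothetical and are not the paper's data.

```python
from typing import Sequence


def empirical_cdf(distances: Sequence[float], threshold: float) -> float:
    """Fraction of observed distances that are <= threshold."""
    if not distances:
        raise ValueError("at least one distance is required")
    return sum(d <= threshold for d in distances) / len(distances)


# Toy distances in bp from five regions to their nearest transcript.
toy_distances = [0, 1_200, 8_000, 45_000, 120_000]
print(empirical_cdf(toy_distances, 50_000))
print(empirical_cdf(toy_distances, 450_000))
```

Evaluating the function at 100 kb and 450 kb on the real distances would give the 80 % and 95 % values read off Figure 2D.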
+ Together these ﬁndings indicate that while most b-catenin regions are near or within protein-coding genes , 28 % localized at a distance of > 50 kb away . 
+ Furthermore , localization of b-catenin enriched regions with gene boundaries was statistically signiﬁcant when compared to localization of a control set of regions with gene boundaries ( Supplementary Figure S1 ) . 
+ b-Catenin occupancy of 5′ and 3′ regions at CXXC5 identiﬁed a chromatin loop
+ Recently we described a b-catenin and TCF4-coordinated chromatin loop at MYC that integrated 5′ and 3′ proximal WREs ( 15 ) . 
+ To identify targets that may be likewise regulated , we searched for genes that contained both 5′ and 3′ b-catenin enriched regions . 
+ In addition to MYC , we found two genes , CXXC5 and FXR2 , that contained b-catenin enriched regions within 2.5 kb of both transcript boundaries ( Supplementary Table S4 and Figure 3A ) . 
+ If the range was expanded to include regions 10 kb from the 5′ and 3′ ends , 11 loci were identiﬁed . 
+ This number increased to 111 loci if the range was further expanded to 50 kb . 
+ We ﬁrst used ChIP and real-time PCR to determine whether b-catenin occupied the identiﬁed regions relative to CXXC5 . 
+ b-catenin precipitated higher levels of the 5′ and 3′ CXXC5 enriched regions relative to control ( Figure 3B ) . 
+ We then used chromatin conformation capture ( 3C ) to determine whether a chromatin loop containing the 5′ and 3′ b-catenin associated regions formed at CXXC5 ( 32 ) . 
+ Figure 3A depicts the positions of the XbaI restriction endonuclease sites and PCR primer locations used to interrogate CXXC5 in 3C assays . 
+ A PCR product of the correct size was generated with forward primer C51 and reverse primer C56 , and its production was dependent upon the addition of XbaI and DNA ligase to the reaction ( Figure 3C ) . 
+ This 341 bp fragment was sequenced and conﬁrmed to be the correct CXXC5 product ( Figure 3D ) . 
+ This analysis indicated that a chromatin loop containing b-catenin bound 5′ and 3′ WREs is present at CXXC5 in human colon cancer cells . 
+ Motif analysis of b-catenin enriched regions
+ Genome-wide binding analysis has indicated that most b-catenin recruitment to chromatin in colon cancer cells likely occurred through interactions with TCF4 ( 12 ) . 
+ Therefore , we ﬁrst determined whether the b-catenin enriched regions contained a canonical TCF4 motif ( T/A -- T/A -- C -- A -- A -- A -- G ) or the evolutionarily conserved TCF4 motif ( A -- C/G -- T/A -- T -- C -- A -- A -- A -- G ) ( 16 ) . 
+ Figure 3 . A chromatin loop containing 5′ and 3′ b-catenin enriched regions is detected at the CXXC5 gene . 
+ ( A ) Schematic of the CXXC5 locus with untranslated regions as thin rectangles , introns as thin lines , exons as thick rectangles and an arrow demarcating the transcription start site . 
+ The peak density plots below the gene represent b-catenin enriched regions identiﬁed in the ChIP-Seq library . 
+ The triangles and stunted arrows identify the XbaI sites and PCR primers , respectively , used in the chromatin conformation capture ( 3C ) assay depicted in ( C ) . 
+ ( B ) Real time PCR analysis of b-catenin ChIP assays performed in HCT116 cells . 
+ Speciﬁc oligonucleotides were used to detect b-catenin binding to enriched regions depicted by gray rectangles in ( A ) . 
+ 5′ is the upstream site and 3′ is the downstream site . 
+ A distal upstream region of the MYC gene was used as a negative control ( Ctrl ) . 
+ Error bars are SEM . 
+ ( C ) Agarose gel of PCR products generated from a 3C analysis of CXXC5 in HCT116 cells . 
+ Generation of the 3C product ( CXXC5 5′ 3′ ) with primers C51 and C56 required the addition of XbaI and ligase to the reactions . 
+ LC is a loading control and S is a DNA standard . 
+ ( D ) DNA sequence of the 3C product . 
+ Arrows denote primer sequences C51 and C56 , and the XbaI site is boxed . 
+ GTACGTAGTCGTTTTAGCCCCGGGACTCAAGAG TTGAGGCTGATGCCTGCCTGAGAGATAAAATATCCTTTCTCGGAT CAGTTTCCTCACCTGAGAAATGGGAACGGGAATCTCCGCCCCTT TTCTCCCGGGGCCCTAGTGCCCACTGAATCCATTAAGGAGCTCT TGGAAGGGTGGGGTCTTGGAACACGCGTCTACCTCCCAGGACC CTCGACTAGGAATCTCTGGCCCGCCGCGCACCTGAGCTGGGGG GCGCGGCCAAATTCTCCCTCCCGGTCCTCGGAGCTTCTGGCCC CGC TCTAGA CACAGAACGGTGGGGGTTTGAGAGGCTGGGTGC 
+ Of the 2168 enriched regions , 1026 ( 47 % ) contained at least one TCF4 motif . 
+ A fraction of these , 192 ( 9 % ) , resembled the longer and evolutionarily conserved variant . 
+ We then performed de novo motif analysis on these populations using a Gibbs sampler algorithm ( 24,27 ) . 
+ Over-representation of motifs was determined by computing the relative enrichment measure as described in the ` Materials and Methods ' section ( 27 ) . 
+ Using the 1026 enriched-regions that contained TCF4 consensus sequences , we successfully identiﬁed an over-representation of both the core and evolutionarily conserved TCF4 motifs . 
+ This indicated that our de novo search approach was valid . 
+ Upon further analysis , we found a striking co-enrichment of AP-1 motifs with the consensus TCF4 motifs . 
+ Examples of all three motifs , along with their scores and enrichment values relative to control sequences , are shown in Figure 4A . 
+ Overall , 417 b-catenin enriched regions contained a TCF4 and an AP-1 motif ( Figure 4B and Supplementary Table S5 ) . 
+ The coupling of AP-1 and TCF4 motifs in 417 ( 19 % ) b-catenin enriched regions suggested that AP-1 , TCF4 and b-catenin may co-regulate target gene expression . 
+ To address this hypothesis , we used the ChIP assay to determine whether these factors bound regions containing AP-1 and TCF4 motifs . 
+ We ﬁrst tested b-catenin binding to a selected subset of regions associated with 23 protein-coding genes . 
+ For this set of genes , associated regions were those that localized within 2.5 kb from gene boundaries . 
+ b-catenin occupied 19 sites in asynchronously growing HCT116 cells ( Figure 5A ) . 
+ We then assayed the same regions for TCF4 binding using TCF4 speciﬁc antibodies in the ChIP assay . 
+ TCF4 bound the same 19 targets as b-catenin ( Figure 5B ) . 
+ We concluded from this analysis that b-catenin and TCF4 co-occupied target genes containing TCF4 and AP-1 consensus motifs . 
+ Next , we determined whether AP-1 bound to these selected regions . 
+ AP-1 is a heterodimeric complex comprised of Fos and Jun transcription factors ( 33,34 ) . 
+ AP-1 regulates key cellular processes such as proliferation , differentiation and apoptosis ( 35 ) . 
+ Several groups have shown that c-Jun associates with AP-1 consensus motifs in colon cancer cells ( 14,36 -- 39 ) . 
+ We therefore tested whether c-Jun occupied b-catenin enriched regions containing AP-1 and TCF4 motifs . 
+ Using c-Jun antibodies in ChIP assays conducted in asynchronously growing HCT116 cells , we found that c-Jun associated with 14 of the 19 regions that bound b-catenin and TCF4 ( Figure 5C ) . 
+ Serum mitogens elicit signal transduction pathways that stimulate c-Jun binding to chromatin . 
+ We have previously shown that c-Jun occupancy of the MYC 3′ enhancer increased as quiescent cells re-entered the cell cycle in response to serum ( 14 ) . 
+ We therefore determined whether treatment of quiescent cells with serum would stimulate c-Jun association with the ﬁve targets that lacked binding in asynchronous cells . 
+ HCT116 cells were grown to conﬂuency in serum-depleted medium for two days , which caused these cells to enter the G0/G1 stage of the cell cycle ( 14,39 ) . 
+ Cells were then treated with medium containing serum for 1 or 2 h and c-Jun ChIP assays were conducted . 
+ In line with previous ﬁndings , higher levels of c-Jun were found at the MYC 3′ enhancer when synchronized cells were exposed to serum for 1 h as compared to levels detected in quiescent cells . 
+ We then added medium containing serum with or without 10 mM LiCl for 1 or 4 h . LiCl is a well-established agonist of the Wnt/b-catenin pathway as it inhibits GSK3b and stimulates nuclear b-catenin accumulation ( 13,40,41 ) . 
+ We and others have shown that LiCl increased b-catenin levels in HCT116 cells ( 13,42,43 ) . 
+ Therefore , we predicted that if mitogen and Wnt/b-catenin signaling pathways converged to regulate gene expression , treatment with serum and LiCl would result in increased transcript levels when compared to treatment with serum alone . 
+ LiCl increased mitogen-induced expression of MYC , PDE4B , DDR2 , CTBP2 , EGFR , DNAJB1 , WISP1 and PINX1 ( Figure 6B ) . 
+ HDAC4 , PCDH7 and HABP4 were activated by serum alone , and MMP20 , CYP39A1 and YAP1 genes were not induced above levels seen in serum-deprived cells ( Figure 6B ) . 
+ Together this analysis indicated that Wnt/b-catenin and mitogen-signaling pathways directly activate a subset of b-catenin target genes in colon cancer cells . 
+ DISCUSSION
+ Gene expression is rarely controlled by the association of a single transcription factor with an enhancer element embedded in the proximal promoter . 
+ Rather , association of multiple transcription factors within an enhancer allows for precise and speciﬁc regulation of gene expression in response to environmental stimuli . 
+ Moreover , enhancers can occupy regions over 100 kb from their target gene ( 44,45 ) . 
+ Genome-wide proﬁling of transcription factor binding sites has emerged as one method to localize composite enhancer elements that integrate upstream signal transduction pathways ( 17,45 ) . 
+ In this report , we used ChIP-Seq to identify b-catenin enriched regions in human colon cancer cells . 
+ Through an integrated approach involving bioinformatics , ChIP and expression analyses , we provide evidence that a population of b-catenin target genes is directly regulated by b-catenin , TCF4 and AP-1 transcription factors . 
+ The nature of ChIP-Seq data provides many challenges for analysis ( 18 ) . 
+ Algorithms have been designed to assign a presence or absence prediction for occupancy at any non-repetitive region of the genome . 
+ A common approach for many algorithms is to use sliding window methods that identify regions of high read depth ( relative to a background distribution ) by traversing the genome in windows of a predetermined size . 
+ CisGenome , SISSRs and WTD algorithms exemplify this approach ( 24 -- 26 ) . 
+ It follows that although each algorithm ﬁnds disparate numbers of enriched regions , increased conﬁdence can be assigned to regions that have been found by all three . 
+ This approach resulted in 2168 enriched b-catenin binding regions identiﬁed in our b-catenin ChIP-Seq library . 
+ In addition to the three WREs that control MYC expression ( 14,22,29 ) , 30 Wnt/b-catenin target genes listed on the Wnt homepage ( http://www.stanford.edu/~rnusse/pathways/targets.html ) were in proximity to b-catenin enriched regions ( Supplementary Table S5 ) . 
+ Furthermore , when considering all genomic regions identiﬁed and assayed for b-catenin and TCF4 binding in this report , we found that 28 of 32 ( 87.5 % ) bound both factors . 
+ Together , these ﬁndings indicate that the b-catenin ChIP-Seq library identiﬁed bona ﬁde and direct Wnt/b-catenin target genes . 
+ We previously localized b-catenin binding sites in colon cancer cells using an unbiased and genome-wide approach termed SACO ( 14 ) . 
+ In that study , we found that 84 % of high conﬁdence b-catenin binding regions contained at least one TCF4 consensus core motif ( T/A -- T/A -- C -- A -- A -- A -- G ) . 
+ In this report , we performed de novo motif analysis on the 2168 b-catenin enriched regions and found that 47 % contained a core TCF4 consensus motif . 
+ This discrepancy is likely attributable to methodological and computational differences in the generation and analysis of each data set . 
+ For the SACO study , a 5 kb interval surrounding the mean position of the enriched region was used in the analysis . 
+ For the ChIP-Seq analysis , we examined a much smaller interval enveloping each b-catenin enriched region ( 600 bp ) . 
+ Therefore , based on our ﬁndings using ChIP-Seq and due to the resolution of this technique , the 47 % association rate of TCF4 consensus motifs found in b-catenin enriched regions likely reﬂects the landscape in vivo . 
+ Results gleaned from our current analysis are consistent with TCF4 being a predominant factor that directly recruits b-catenin to enhancers in colon cancer cells , but also suggest that the portion of targets that rely on other factors to recruit b-catenin is substantial . 
+ Two recent studies localized TCF4 binding regions in human colon cancer cells using ChIP-Seq ( 21,22 ) . 
+ We therefore searched for representation of our b-catenin enriched regions in the reported TCF4 libraries . 
+ Of the 10 conserved TCF4 binding peak regions identiﬁed by Tuupanen et al. ( 22 ) , three were identiﬁed in our b-catenin ChIP-Seq library . 
+ This included the peak that identiﬁed the WRE located 335 kb upstream from the MYC transcription start site . 
+ Upon analysis of the TCF4 ChIP-Seq data sets reported by Blahnik et al. ( 21 ) and the ENCODE Project Consortium , we found that 786 ( 36.3 % ) of our b-catenin peak regions overlapped a TCF4 peak region . 
+ There are several possibilities for why a greater percentage of our b-catenin peak regions are not represented in the aforementioned ChIP-Seq libraries . 
+ Methodological variations , algorithms chosen to assign peak regions , and cell-type differences aside , TCF4 is bound to both transcriptionally active and repressed genes . 
+ As b-catenin is thought to primarily associate with transcribed genes , a partial overlap of the b-catenin enriched regions identiﬁed in our study with the TCF4 enriched regions is expected . 
+ Furthermore , as mentioned above , our analysis suggests that many genes likely recruit b-catenin independently of TCF4 . 
+ Most of these targets are not represented in the TCF4 ChIP-Seq libraries . 
+ Finally , the amount of sequencing required to identify all of the binding sites represented in a ChIP-Seq library is debatable ( 18 ) . 
+ Therefore , we would anticipate that additional sequencing of our library would undoubtedly identify more of the reported TCF4 peak regions . 
+ Overall , however , the concordance of TCF4 and b-catenin peaks identiﬁed by three separate groups independently validates ChIP-Seq as a methodology to identify direct Wnt/b-catenin target genes . 
+ Upon mapping the b-catenin enriched regions relative to RefSeq gene boundaries , we found that binding sites are dispersed through the 5′ , intragenic , and 3′ regions of genes . 
+ This ﬁnding is in line with previous genome-wide localization studies of b-catenin and TCF4 binding sites ( 12,16 ) . 
+ We were intrigued by the observation that several targets contained b-catenin enriched regions that localized to both 5′ and 3′ gene boundaries . 
+ Based on our previous work with MYC ( 15 ) , we tested whether a chromatin loop was present at two targets that contained 5′ and 3′ b-catenin enriched regions in the library , CXXC5 and GRHL3 . 
+ CXXC5 is a zinc ﬁnger containing protein that inhibits canonical Wnt signaling in response to bone morphogenetic protein signaling in neural stem cells ( 46 ) . 
+ GRHL3 encodes a Grainyhead factor that plays a role in epidermal barrier formation in the bladder ( 47 ) . 
+ Substantial levels of b-catenin binding to the 5′ and 3′ regions of CXXC5 and GRHL3 were detected by ChIP analysis ( Figure 3B and Supplementary Figure S2 ) . 
+ Using the 3C technique , we found that a chromatin loop accompanied b-catenin binding to 5′ and 3′ regions of CXXC5 . 
+ However , we were unable to detect a chromatin loop at GRHL3 . 
+ This ﬁnding suggests that while looping between separated enhancers may be a prevalent mechanism to coordinate Wnt/b-catenin gene expression , b-catenin binding to 5′ and 3′ sites alone is not sufficient for this interaction . 
+ We anticipate that 3C coupled with high throughput sequencing techniques will facilitate identiﬁcation of target genes that are regulated by distal WREs via chromatin loops ( 48,49 ) . 
+ Through de novo motif analysis , we found that nearly half of the b-catenin enriched regions that contained a TCF4 consensus motif also contained an AP-1 motif . 
+ It is noted here that AP-1 motifs were also over-represented in TCF4 bound regions identiﬁed by recent ChIP-Seq and ChIP-chip analysis ( 16,21 ) . 
+ However , our current study is the ﬁrst to report over-representation of AP-1 motifs in b-catenin bound regions . 
+ While b-catenin/TCF4 and AP-1 have been shown by others and our group to regulate target gene expression ( 36,38,39,50 ) , our ﬁndings here suggest that target genes regulated by 417 b-catenin enriched regions may be likewise regulated . 
+ Our ChIP analysis indicated that nearly every region assayed ( 95 % ) containing a TCF4 and AP-1 motif bound c-Jun , a component of the AP-1 complex . 
+ The majority of these loci showed an additive increase in expression upon the addition of both LiCl and serum . 
+ This analysis suggests that mitogen and Wnt signaling pathways likely converge through AP-1 and b-catenin/TCF4 to co-regulate target gene expression . 
+ However , LiCl treatment failed to enhance mitogen-activated gene expression for several targets . 
+ It is possible that pre-treatment of quiescent cells with LiCl prior to serum stimulation would sensitize the system to facilitate detection of pathway cooperation . 
+ Alternatively , AP-1 binding may function to regulate gene expression in response to a different stimulus such as cytokine signaling or the apoptotic stress response ( 35 ) . 
+ The application of sequence-based methods to identify transcription factor binding sites genome-wide is likely to persist as the methodology of choice . 
+ Next generation sequencing technologies , such as those using the Illumina platform , allow increased resolution and increased output . 
+ These attributes have facilitated the replacement of SACO with massively parallel sequencing approaches to map transcription factor binding regions isolated by ChIP . 
+ Through our ChIP-Seq screen of b-catenin binding regions in asynchronous HCT116 cells , we uncovered evidence for a functional interplay between b-catenin/TCF4 and AP-1 . 
+ Because cells that initiate colon carcinogenesis contain pathogenic levels of b-catenin in the nucleus and are bathed in serum mitogens , our ﬁndings here suggest that misexpression of target genes containing AP-1 and TCF4 motifs might represent the pathogenically relevant set . 
+ Supplementary Data are available at NAR Online.
+ ACKNOWLEDGEMENTS
+ We would like to thank Dr Laura Carrel and Dr Faoud Ishmael ( Penn State University College of Medicine ) for critically reading this manuscript and providing helpful comments . 
+ We would like to thank Doug Turnbull and the High Throughput Sequencing Facility in the Molecular Biology Institute at the University of Oregon for sequencing the library . 
+ We would like to thank Dr Richard Goodman and Dr Gail Mandel ( Oregon Health and Science University ) for support during the initiation of this project . 
+ FUNDING
+ National Institutes of Health ( grant number R01DK080805 to G.S.Y. ) ; start-up research funds from the Pennsylvania State University College of Medicine ( to G.S.Y. ) ; National Institutes of Health , National Center for Research Resources ( 5UL1RR024140 to S.K.M. ) ; National Institutes of Health , National Cancer Institute ( 5 P30 CA069533-13 to S.K.M. ) . 
+ Funding for open access charge : National Institutes of Health ( grant number R01DK080805 to G.S.Y. ) .
\ No newline at end of file
--- a/data/TEXT_FILES/notuseful_txt/20639326.txt 0 → 100644
View file @27818a9
+++ b/data/TEXT_FILES/notuseful_txt/20639326.txt 0 → 100644
View file @27818a9
+ Pmr, a Histone-Like Protein H1 (H-NS) Family Protein Encoded by the IncP-7 Plasmid pCAR1, Is a Key Global Regulator
+ Histone-like protein H1 ( H-NS ) family proteins are nucleoid-associated proteins ( NAPs ) conserved among many bacterial species . 
+ The IncP-7 plasmid pCAR1 is transmissible among various Pseudomonas strains and carries a gene encoding the H-NS family protein , Pmr . 
+ Pseudomonas putida KT2440 is a host of pCAR1 , which harbors ﬁve genes encoding the H-NS family proteins PP_1366 ( TurA ) , PP_3765 ( TurB ) , PP_0017 ( TurC ) , PP_3693 ( TurD ) , and PP_2947 ( TurE ) . 
+ Quantitative reverse transcription-PCR ( qRT-PCR ) demonstrated that the presence of pCAR1 does not affect the transcription of these ﬁve genes and that only pmr , turA , and turB were primarily transcribed in KT2440 ( pCAR1 ) . 
+ In vitro pull-down assays revealed that Pmr strongly interacted with itself and with TurA , TurB , and TurE . 
+ Transcriptome comparisons of the pmr disruptant , KT2440 , and KT2440 ( pCAR1 ) strains indicated that pmr disruption had greater effects on the host transcriptome than did pCAR1 carriage . 
+ The transcriptional levels of some genes that increased with pCAR1 carriage , such as the mexEF-oprN efﬂux pump genes and parI , reverted with pmr disruption to levels in pCAR1-free KT2440 . 
+ Transcriptional levels of putative horizontally acquired host genes were not altered by pCAR1 carriage but were altered by pmr disruption . 
+ Identiﬁcation of genome-wide Pmr binding sites by ChAP-chip ( chromatin afﬁnity puriﬁcation coupled with high-density tiling chip ) analysis demonstrated that Pmr preferentially binds to horizontally acquired DNA regions . 
+ The Pmr binding sites overlapped well with the location of the genes differentially transcribed following pmr disruption on both the plasmid and the chromosome . 
+ Our ﬁndings indicate that Pmr is a key factor in optimizing gene transcription on pCAR1 and the host chromosome . 
+ Nucleoid-associated proteins ( NAPs ) have architectural and regulatory functions in bacterial cells . 
+ Bacterial chromosomal DNA is folded into a compact nucleoid body by NAPs ( 9 , 11 ) . 
+ Because of their DNA-binding ability , NAPs can also inﬂuence the expression of genes ( 9 , 11 ) . 
+ Histone-like protein H1 ( H-NS ) , a NAP family member , is an oligomeric DNA-binding protein identiﬁed in Escherichia coli because of its effect on transcription in vitro ( 13 , 16 ) . 
+ H-NS acts as a global repressor and binds to horizontally acquired DNA regions ( 28 ) . 
+ Plasmid-encoded H-NS can function as a `` stealth '' protein to switch off gene expression on chromosomes or plasmids and to maintain host cell ﬁtness ( 15 ) . 
+ H-NS also interacts with paralogous proteins , such as StpA and Hfp in E. coli , or other NAPs ( 12 , 16 , 27 ) . 
+ * Corresponding author . 
+ Mailing address : Biotechnology Research Center , University of Tokyo , 1-1-1 Yayoi , Bunkyo-ku , Tokyo 113-8657 , Japan . 
+ Phone : 81-3-5841-3064 . 
+ Fax : 81-3-5841-8030 . 
+ E-mail : anojiri@mail.ecc.u-tokyo.ac.jp . 
+ † C.-S.Y. and C.S. contributed equally to this work . 
+ ‡ Present address : Department of Environmental Life Sciences , Graduate School of Life Sciences , Tohoku University , Sendai 980-8577 , Japan . 
+ § Present address : Japan Collection of Microorganisms , Microbe Division , RIKEN BioResource Center , 2-1 Hirosawa , Wako , Saitama 351-0198 , Japan . 
+ ¶ Supplemental material for this article may be found at http://jb.asm.org/ . 
+ Published ahead of print on 16 July 2010 . 
+ Tendeng et al. ( 39 ) suggested that conserved MvaT proteins from Pseudomonas bacteria belong to the H-NS family , despite their limited sequence similarity with H-NS . 
+ Recently MvaT and MvaU from Pseudomonas aeruginosa PAO1 , functional homologous H-NS proteins from Pseudomonas bacteria , were shown to interact with each other ( 44 ) . 
+ Castang et al. ( 5 ) reported that these two H-NS family proteins bind to the same chromosomal regions and that they function coordinately . 
+ Interestingly , P. putida KT2440 has ﬁve genes encoding H-NS family proteins , and recently Renzi et al. ( 30 ) named them as follows : PP_1366 ( turA ) , PP_3765 ( turB ) , PP_0017 ( turC ) , PP_3693 ( turD ) , and PP_2947 ( turE ) . 
+ TurA and TurB were copuriﬁed as the TOL plasmid ( pWW0 ) upper operon repressors A and B , respectively , and both bound to the Pu promoter ( a σ54-dependent promoter of the operon encoding enzymes for the upper pathway of toluene degradation in pWW0 ) , suggesting that these two proteins could interact with each other ( 31 ) . 
+ Renzi et al. ( 30 ) proposed that TurA and TurB belonged to groups I and II , respectively , and that these groups contained orthologous H-NS family proteins present in all Pseudomonadaceae species . 
+ Conversely , TurC , TurD , and TurE belonged to group III , which contained species-speciﬁc H-NS family proteins ( 30 ) . 
+ The self-transmissible pCAR1 , an IncP-7 archetypal plasmid , endows the host strain with carbazole-degrading ability ( 23 , 36 , 38 ) . 
+ pCAR1 carries the pmr gene , encoding the H-NS family protein designated Pmr ( plasmid-encoded MvaT-like regulator ) ( 25 ) and belonging to the above-mentioned group III . 
+ The effect of plasmid carriage on host strains may change in different hosts , and therefore , we performed transcriptome comparisons between pCAR1-free and pCAR1-containing KT2440 strains ( 25 , 35 ) . 
+ Based on the comparisons , pCAR1 carriage affected the iron acquisition system of the host KT2440 strain , enhanced resistance to chloramphenicol by inducing the mexEF-oprN operon , and induced the transcription of PP_3700 ( parI ) ( 35 ) . 
+ We also discovered that pmr was transcribed in four distinct Pseudomonas host bacterial strains ( 26 , 35 ) . 
+ These data suggest that Pmr could interact with other H-NS family proteins , such as TurA , TurB , TurC , TurD , and TurE , encoded on the KT2440 chromosome . 
+ In the present study , we assessed the in vivo transcriptional proﬁles of genes encoding H-NS family proteins on both pCAR1 and the KT2440 chromosome . 
+ Additionally , we investigated the in vitro interaction of Pmr with itself and with other H-NS family proteins . 
+ Furthermore , we assessed the effect of pmr disruption on the transcriptome of the host strain and identiﬁed genome-wide Pmr-binding sites . 
+ Taken together , we clariﬁed the role of Pmr as a horizontally acquired H-NS family protein . 
+ MATERIALS AND METHODS
+ Bacterial strains and plasmids . 
+ The bacterial strains and plasmids used in this study are listed in Table 1 . 
+ E. coli strains for cloning and expression of genes were grown in L broth ( LB ) ( 32 ) at 37 °C or 25 °C , and the Pseudomonas strains were cultivated with LB at 30 °C . 
+ Ampicillin ( Ap ) ( 50 µg/ml ) , chloramphenicol ( Cm ) ( 30 µg/ml ) , kanamycin ( Km ) ( 50 µg/ml ) , gentamicin ( Gm ) ( 120 µg/ml ) , rifampin ( Rif ) ( 250 µg/ml ) , streptomycin ( Sm ) ( 450 µg/ml ) , or tetracycline ( Tc ) ( 12.5 µg/ml ) was added to the selective medium . 
+ For plate cultures , the above media were solidiﬁed with 1.6 % agar ( wt/vol ) . 
+ DNA manipulations . 
+ Plasmid DNA extraction from E. coli was performed using the alkaline lysis method ( 32 ) , and total DNA from Pseudomonas strains was extracted using hexadecyltrimethylammonium bromide as described previously ( 1 ) . 
+ Restriction enzymes ( New England Biolabs , Ipswich , MA ; Toyobo , Tokyo , Japan ) and the Ligation High reagent ( Toyobo ) were used according to the manufacturers ' instructions . 
+ DNA fragments were extracted from agarose gels using the Ezna gel extraction kit ( Omega Bio-Tek , Norcross , GA ) according to the manufacturer 's instructions . 
+ PCR was performed with Ex Taq Hot Start polymerase ( Takara Bio , Shiga , Japan ) according to the manufacturer 's instructions . 
+ All other experiments were performed according to standard methods ( 32 ) . 
+ All primers used are presented in Table S1 in the supplemental material . 
+ RNA extraction . 
+ RNA extractions from strain KT2440 , KT2440 ( pCAR1 ) , or KT2440 ( pCAR1 pmr ) were performed as follows : an overnight culture of each strain in LB was washed and transferred into 100 ml NMM-4 buffer ( 37 ) supplemented with 0.1 % succinate by adjusting the turbidity to 0.05 at 600 nm and then incubated at 30 °C in a rotating shaker at 120 rpm . 
+ At early log phase growth ( turbidity of 0.15 to 0.20 at 600 nm ) , we used the RNAprotect bacterial reagent ( Qiagen , Valencia , CA ) to stabilize the total RNA in the bacterial cultures , and subsequently , RNA extraction was performed using the RNeasy Midi kit ( Qiagen ) or Nucleospin RNA II ( Macherey-Nagel GmbH & Co. KG , Düren , Germany ) according to the manufacturers ' instructions . 
+ The eluted RNA was treated with RQ1 RNase-free DNase ( Promega , Fitchburg , WI ) at 37 °C for 30 min . 
+ Following inactivation of the DNase by the addition of the stop reagent and subsequent incubation at 65 °C for 10 min , RNA samples were repurified with the RNeasy Mini column ( Qiagen ) or Nucleospin RNA binding column ( Macherey-Nagel ) according to each manufacturer 's RNA cleanup protocol . 
+ Primer extension and pmr disruption . 
+ We identiﬁed the transcription start point ( tsp ) of pmr by primer extension analysis , performed as described previously ( 25 ) . 
+ We used the IRD800-labeled primer PMR-R ( Aloka Co. , Ltd. , Tokyo , Japan ) ( see Table S1 in the supplemental material ) , which anneals to the coding region of pmr from 218 to 237 ( 77782 to 77763 on pCAR1 ; see Table S1 ) . 
+ The extension reaction was performed with 4 µl of 5× First Strand buffer containing 10 µg of total RNA , 2 pmol of the labeled primer , 100 U of SuperScript III reverse transcriptase ( Invitrogen , Carlsbad , CA ) , 40 U of RNaseOUT ( Invitrogen ) , 10 mM dithiothreitol ( DTT ) , and 0.5 mM deoxynucleoside triphosphates ( dNTPs ) ( Toyobo ) . 
+ After denaturation of the RNA and the labeled primer at 65 °C for 5 min , the remaining reagents were added , and then the mixture was incubated at 50 °C for 30 min . 
+ The extended product was purified by phenol-chloroform extraction and ethanol precipitation and then dissolved in 2 µl of H2O and 1 µl of IR2 stop solution ( Li-Cor Inc. , Lincoln , NE ) . 
+ The solution was then denatured at 95 °C for 2 min and subjected to electrophoresis using a Li-Cor model 4200L-2 automated DNA sequencer ( Li-Cor ) . 
+ A sequence ladder was obtained using the same primer and the template plasmid pUB11 ( Table 1 ) . 
+ pmr disruption in pCAR1 was designed by removing the region containing the tsp ( 77486 to 77909 ) . 
+ The 3.8-kb EcoRI-PstI fragment from 75681 to 79457 in pCAR1 ( GenBank/EMBL/DDBJ accession number AB0088420 ) was inserted into pK19mobsacB ( 33 ) , and then the SmaI fragment containing the nonpolar Gm resistance cassette of pSJ12 ( 21 ) was inserted into blunt-ended SalI-SacI sites from 77486 to 77909 in the opposite direction to yield pK19mobsacBpmrGm . 
+ Using a method described previously ( 29 ) , pK19mobsacBpmrGm was introduced into KT2440 ( pCAR1 ) by filter mating with E. coli S17-1 ( λpir ) transformants , and subsequently , double-crossover recombinants were screened . 
+ Quantitative RT-PCR . 
+ Quantitative reverse transcription-PCR ( qRT-PCR ) was performed using the ABI 7300 real-time PCR system ( Applied Biosystems , Foster City , CA ) as described previously ( 25 ) . 
+ The primers used for qRT-PCR are shown in Table S1 in the supplemental material , and all of the products were between 100 and 150 bp in length . 
+ 16S rRNA was used as an internal normalization standard . 
+ All of the reactions were carried out at least in triplicate , and the data were normalized using the average of the internal standard . 
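The normalization step described above can be sketched in a few lines. This is only an illustration under stated assumptions: the function name, the use of linear-scale quantities, and the example numbers are hypothetical and are not taken from the study, which refers to an earlier report ( 25 ) for the actual procedure.

```python
from statistics import mean


def normalize_to_internal_standard(target_quantities, standard_quantities):
    """Divide each target measurement by the mean of the internal standard.

    Both arguments are linear-scale quantities (arbitrary units) from
    replicate reactions; the internal standard here plays the role of 16S rRNA.
    """
    if not standard_quantities:
        raise ValueError("need at least one internal-standard measurement")
    reference = mean(standard_quantities)
    if reference <= 0:
        raise ValueError("internal-standard mean must be positive")
    return [q / reference for q in target_quantities]


# Hypothetical triplicate values, not data from the study.
pmr_relative = normalize_to_internal_standard([2.0, 2.2, 1.8], [100.0, 110.0, 90.0])
```

Dividing by the replicate average of the standard, rather than pairing individual wells, mirrors the wording "normalized using the average of the internal standard".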
+ Preparation of a KT2440 ( pCAR1 ) derivative containing a gene encoding the His-tagged Pmr protein . 
+ The construction of the KT2440 ( pCAR1 ) derivative strain expressing Pmr containing six histidine ( His ) residues at the C terminus was performed using a homologous recombination-based gene replacement system with suicide vectors , antibiotic resistance selection , and sucrose counterselection ( 33 ) . 
+ The preparation of the DNA region to replace the pmr gene with a modiﬁed gene that expresses His-tagged Pmr was performed by overlap extension PCR as described by Choi and Schweizer ( 6 ) . 
+ Brieﬂy , the primers Pmr-His01 and Pmr-His02 were used to amplify the His-tagged pmr gene . 
+ The primers Pmr-His03 and Pmr-His04 were used to amplify the downstream region of the untagged pmr gene . 
+ Simultaneously , the primers Gm-F and Gm-R were used to amplify the Gm cassette ﬂanked by ﬂippase recognition target ( FRT ) sites from pPS856 ( 20 ) . 
+ The primers used are listed in Table S1 in the supplemental material . 
+ These three partially overlapping DNA fragments were ampliﬁed and then spliced together by in vitro overlap extension PCR . 
+ The resulting DNA fragment was cloned into the pT7Blue T vector . 
+ After veriﬁcation of the inserted sequence , the fragment was excised and then recombined into the suicide vector pK19mobsacB to yield pK19mobsacBpmrHis . 
+ The pK19mobsacBpmrHis construct was introduced into KT2440 by filter mating with E. coli S17-1 ( λpir ) transformants , and double-crossover recombinants were subsequently screened by sucrose counterselection to yield the KT2440 ( pCAR1 ) derivative , replacing the pmr gene with a gene encoding the His-tagged Pmr protein . 
+ Finally , the Gm resistance gene was removed by site-specific recombination of FRT sites with Flp recombinase supplied from E. coli S17-1 ( λpir ) transformants containing pFLP2Km . 
+ The plasmid pFLP2Km was constructed by insertion of the EcoRV fragment of pTKm ( 47 ) containing the Km resistance gene cassette into the ScaI site of pFLP2 ( 20 ) . 
+ PCR analyses were performed to conﬁrm the ﬁnal construction of the derivative strain . 
+ Western blot analysis for growth phase-dependent expression of Pmr . 
+ Cell lysates for Western blot analyses were prepared using the B-Per reagent ( Pierce Biotechnology , Inc. , Rockford , IL ) according to the manufacturer 's instructions . 
+ The protein samples were quantified using the bicinchoninic acid ( BCA ) protein assay reagent kit ( Pierce ) , and 40 µg of protein sample for Pmr or 5 µg of protein sample for the RNA polymerase subunit was loaded in each lane . 
+ Proteins were separated on a 15 % SDS-polyacrylamide gel and transferred to a Sequi-Blot polyvinylidene diﬂuoride ( PVDF ) membrane ( Bio-Rad , Foster City , CA ) . 
+ Anti-His antibody ( GE Healthcare Bio-Sciences , Piscataway , NJ ) or anti-RNA polymerase subunit ( NeoClone , Madison , WI ) was used as the primary antibody , and enhanced chemiluminescence ( ECL ) peroxidase-labeled anti-mouse antibody ( GE Healthcare Bio-Sciences ) was used as the secondary antibody . 
+ Proteins were detected using the Immobilon Western chemiluminescent horseradish peroxidase ( HRP ) substrate ( Millipore , Billerica , MA ) , and LAS1000 plus ( Fujifilm , Tokyo , Japan ) was used for imaging analyses . 
+ Overexpression of Pmr and other H-NS family proteins in E. coli cells . 
+ To construct the C-terminal-His-tagged Pmr expression plasmid , the pET-26b(+) vector ( Novagen , San Mateo , CA ) was used . 
+ The insert was amplified by PCR using the pCAR1-covered clone pUB11 as template DNA and the primer set with artificial NdeI and XhoI sites at the 5′ and 3′ ends of the pmr gene . 
+ The nucleotide sequence of the insert was conﬁrmed , and the resultant expression plasmid was designated pET-C-His-pmr . 
+ To express each C-terminal-FLAG-tagged H-NS family protein ( Pmr , PP_0017 [ TurC ] , PP_1366 [ TurA ] , PP_2947 [ TurE ] , PP_3693 [ TurD ] , PP_3765 [ TurB ] ) , pFLAG-CTC ( Sigma-Aldrich , St. Louis , MO ) was used as a vector . 
+ Each insert was amplified by PCR using the primer set with artificial NdeI and SalI sites at the 5′ and 3′ ends of each gene and pUB11 ( for pmr ) or total DNA of the P. putida strain KT2440 ( for others ) as a template . 
+ The resulting expression plasmids were designated pFLAGpmr , pFLAG0017 , pFLAG1366 , pFLAG2947 , pFLAG3693 , and pFLAG3765 ; they expressed FLAG-tagged forms of Pmr , PP_0017 ( TurC ) , PP_1366 ( TurA ) , PP_2947 ( TurE ) , PP_3693 ( TurD ) , and PP_3765 ( TurB ) , respectively . 
+ Transformed E. coli BL21 ( DE3 ) harboring each expression plasmid of H-NS family proteins was grown at 25 °C to a cell turbidity at 600 nm of 0.6 to 0.8 and was induced overnight by the addition of isopropyl-β-D-thiogalactoside ( IPTG ) at a final concentration of 0.5 mM . 
+ The expression level of each protein was confirmed by Tricine-SDS-PAGE ( 34 ) . 
+ Pull-down assays . 
+ Pull-down assays were performed using the MagneHis protein puriﬁcation system ( Promega ) . 
+ Cells expressing His-tagged or FLAG-tagged H-NS family proteins were harvested by centrifugation and washed twice with 25 mM Tris-HCl ( pH 8.0 , 4 °C ) containing 2 mM EDTA and 10 % glycerol . 
+ Cells were then resuspended in 700 µl of MagneHis binding/wash buffer and broken by ultrasonication , and crude extracts were obtained by centrifugation ( 17,000 × g , 15 min , 4 °C ) . 
+ Protein concentrations were estimated with the Bio-Rad protein assay reagent ( Bio-Rad ) according to the manufacturer 's instructions . 
+ Crude extract ( 200 µg ) containing His-tagged Pmr was mixed with the following amounts of crude extract containing FLAG-tagged proteins , according to each protein expression level : Pmr , 225 µg ; TurC , 900 µg ; TurA , 450 µg ; TurE , 225 µg ; TurD , 1350 µg ; and TurB , 225 µg . 
+ After the addition of 30 µl of MagneHis Ni particles ( Promega ) , the protein mixture was incubated at 4 °C and centrifuged ( 10 rpm , 1 h ) . 
+ Elution of His-tagged Pmr was done according to the manufacturer 's instructions . 
+ Each protein sample was separated by Tricine-SDS-PAGE and transferred to a PVDF membrane ( iBlot gel transfer stack , PVDF , regular ; Invitrogen ) using the iBlot gel transfer system ( Invitrogen ) according to the manufacturer 's instructions . 
+ Anti-His antibody ( GE Healthcare Bio-Sciences ) or monoclonal anti-FLAG M2 antibody ( Sigma-Aldrich ) was used as the primary antibody , and ECL peroxidase-labeled anti-mouse antibody ( GE Healthcare Bio-Sciences ) was used as the secondary antibody . 
+ Detection of the proteins was performed similarly to that described above for Western blot analyses . 
+ Phenotype MicroArray ( PM ) analyses . 
+ Phenotypic differences between KT2440 ( pCAR1 ) and KT2440 ( pCAR1 pmr ) in carbon metabolism were compared for cell respiration of each strain using 96-well plate microarrays ( Biolog PM1 and PM2 ; Biolog , Hayward , CA ) ( 4 ) . 
+ Each plate well contained deﬁned medium with a unique carbon compound plus indicator dye for cell respiration , and each medium was made at Biolog . 
+ Excluding carbon-free wells ( negative controls ) , the PM1 and PM2 Biolog assays can assess the ability to use 190 carbon compounds as the sole carbon source . 
+ Experiments were performed in duplicate , according to the manufacturer 's instructions , except that the strains were precultured on R2A plates ( 1.5 % agar ) and data collection was performed manually using the Biolog MicroLog MicroStation system . 
+ Tiling array transcriptome analyses of pCAR1 and the KT2440 chromosome . 
+ Transcriptome analyses with our custom-made tiling arrays were performed as described previously ( 26 , 35 ) . 
+ Briefly , total RNA was extracted in parallel from samples of each host culture ( 1 × 10^9 cells from two exponential-phase cultures [ the turbidity of each culture was 0.15 to 0.20 at 600 nm ] derived from two independent precultures ) . 
+ cDNAs reverse transcribed from these RNAs were hybridized individually with each microarray chip using the GeneChip hybridization oven 640 ( Affymetrix , Inc. , Santa Clara , CA ) at 60 rpm and at 50 °C for 16 h with the KT2440 chromosomal tiling array or at 45 °C for 16 h with the pCAR1 tiling array . 
+ After washing , staining , and scanning of the chips , the signal intensities for each probe were computed using the Affymetrix Tiling Analysis Software program , v. 1.1 ( TAS ) . 
+ We used the median signal intensities of the probes located within each gene as an indicator of the expression level . 
+ Comparisons between two conditions were performed using each of the biologically duplicated data , and we identified upregulated and downregulated open reading frames ( ORFs ) with fold changes of 1.5 in the four data comparisons ( between replicate 1 of KT2440 ( pCAR1 ) and replicate 1 of KT2440 ( pCAR1 pmr ) and between replicate 1 of KT2440 ( pCAR1 ) and replicate 2 of KT2440 ( pCAR1 pmr ) ; see Tables S2 to S4 in the supplemental material ) . 
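The gene-level summary and the replicate-wise comparison described above can be sketched as follows. This is a minimal illustration, not the TAS workflow itself: the function names are hypothetical, signals are assumed to be on a linear scale, and the 1.5-fold criterion is assumed to be inclusive and to apply to every pairing of the two replicates per condition ( four comparisons ).

```python
from itertools import product
from statistics import median


def gene_level(probe_signals):
    """Summarize a gene by the median signal of the probes located within it."""
    return median(probe_signals)


def call_change(cond_a_reps, cond_b_reps, threshold=1.5):
    """Classify condition B relative to condition A.

    Each argument holds one gene-level signal per biological replicate.
    A gene is called 'up' or 'down' only if every replicate pairing
    passes the fold-change threshold in the same direction.
    """
    ratios = [b / a for a, b in product(cond_a_reps, cond_b_reps)]
    if all(r >= threshold for r in ratios):
        return "up"
    if all(r <= 1.0 / threshold for r in ratios):
        return "down"
    return "unchanged"


# Hypothetical probe signals for one gene in two replicates per strain.
parent = [gene_level([100, 120, 110]), gene_level([90, 105, 100])]
disruptant = [gene_level([50, 60, 55]), gene_level([48, 52, 50])]
result = call_change(parent, disruptant)
```

Requiring agreement across all replicate pairings is a conservative choice: a single pairing that falls short of the threshold leaves the gene classified as unchanged.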
+ The data were visualized using the IGB software package ( Affymetrix ) . 
+ The fold change of pmr expression levels between KT2440 ( pCAR1 ) and KT2440 ( pCAR1 pmr ) was only 4.5 to 5.5 ( see Table S2 ) because the Gm resistance gene introduced into the pmr gene was transcribed in the counterdirection to the pmr gene , and the read through from the Gm resistance gene was detected ( see Fig. S2 ) . 
+ Chromatin afﬁnity puriﬁcation coupled with high density tiling chip ( ChAP-chip ) analysis . 
+ An overnight culture of the KT2440 ( pCAR1 ) derivative expressing 6-His-tagged Pmr in LB at 30 °C was inoculated into 200 ml NMM-4 supplemented with 0.1 % ( wt/vol ) succinate to obtain an initial turbidity at 600 nm of 0.05 and then incubated at 30 °C in a rotating shaker at 120 rpm for 4 h to a turbidity at 600 nm of 0.20 to 0.30 . 
+ The His-tagged Pmr and DNA in the cells were in vivo cross-linked by the addition of formaldehyde to a ﬁnal concentration of 1 % for 15 min with shaking at 30 °C . 
+ The cross-linking reaction was quenched by the addition of glycine to a ﬁnal concentration of 125 mM for 5 min , and then the cells were washed twice with chilled Tris-EDTA ( TE ) buffer ( pH 8.0 ) . 
+ The resulting harvested cells were disrupted by sonication on ice in 2.4 ml of QuickPick Imac wash buffer ( Bio-Nobile , Turku , Finland ) . 
+ After centrifugation ( 17,000 × g , 20 min ) , the supernatant was affinity purified using the QuickPick Imac metal affinity kit ( Bio-Nobile ) according to the manufacturer 's instructions to yield 6-His-tagged Pmr . 
+ Cross-links were dissociated by heating at 65 °C for 4 h , and the resulting DNA was puriﬁed using the Qiaquick kit ( Qiagen ) according to the manufacturer 's instructions . 
+ Terminal labeling of the puriﬁed DNA fragments and hybridization to the pCAR1 and KT2440 chromosomal tiling arrays were performed as described above . 
+ Signal intensities of DNA hybridization on the arrays were computed to identify protein-binding sites using TAS , which uses nonparametric quantile normalization and a Hodges-Lehmann estimator for fold enrichment ( Affymetrix Tiling Array Software v1.1 User 's Guide ) with the biologically duplicated affinity-purified fractions ( treatment DNA ) and those of DNA isolated from the biologically duplicated whole-cell extract fractions before purification ( control DNA ) . 
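For readers unfamiliar with the Hodges-Lehmann estimator named above, the two-sample form is the median of all pairwise differences between treatment and control values. The sketch below illustrates only that estimator; it is not the TAS implementation, it omits quantile normalization and the sliding probe window, and the function name and the log2 example values are hypothetical.

```python
from statistics import median


def hodges_lehmann_shift(treatment, control):
    """Two-sample Hodges-Lehmann estimate: median of all pairwise differences."""
    if not treatment or not control:
        raise ValueError("both samples must be non-empty")
    differences = [t - c for t in treatment for c in control]
    return median(differences)


# Hypothetical log2 probe signals, not data from the study.
treatment_log2 = [3.0, 3.5, 4.0]   # affinity-purified fraction
control_log2 = [1.0, 1.5, 2.0]     # whole-cell extract fraction
shift = hodges_lehmann_shift(treatment_log2, control_log2)
fold_enrichment = 2 ** shift       # back-transform from log2 to a fold value
```

Because it is built from a median, the estimate is less sensitive to a few outlying probes than a difference of means would be.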
+ Microarray data accession number . 
+ The array data reported in this article have been deposited in the Gene Expression Omnibus ( GEO ) of the National Center for Biotechnology Information ( NCBI ) ( GEO ; http://www.ncbi.nlm.nih.gov/geo/ ) under the GEO Series accession no. GSE21968 . 
+ RESULTS AND DISCUSSION
+ Transcriptional proﬁles of pmr and other H-NS family genes in P. putida KT2440 . 
+ Transcriptional levels of H-NS family proteins change in the presence or absence of other H-NS homologous proteins , and they are not always transcribed under the same growth condition ( 7 , 27 , 44 ) . 
+ First , we determined the transcription start point ( tsp ) of pmr to construct a pmr disruptant strain by extinguishing its transcription ( see Materials and Methods ) . 
+ The tsp of pmr ( +1 ) was located 69 bp upstream of the annotated start codon of Pmr ( nucleotide at 77571 of pCAR1 ; see Fig. S1 in the supplemental material ) , corroborating our findings using a previous tiling array analysis ( 26 ) . 
+ To clarify the transcriptional proﬁles of the H-NS family genes , qRT-PCR analyses were performed for KT2440 , KT2440 ( pCAR1 ) , and KT2440 ( pCAR1 pmr ) , along their growth curves . 
+ As demonstrated in Fig. 1A and B , the transcriptional levels of turA ( PP_1366 ) , turC ( PP_0017 ) , and turD ( PP_3693 ) in early log-phase growth were higher than those in the stationary phase , whereas turB ( PP_3765 ) and turE ( PP_2947 ) were transcribed in the late log and stationary growth phases , compared with the early log phase growth in KT2440 , conﬁrming a previous report ( 48 ) . 
+ In KT2440 ( pCAR1 ) , pmr was transcribed in early log phase growth ( the cell turbidity was about 0.18 at 600 nm ) , and its transcription was reduced in the stationary phase ( the cell turbidity was 0.56 ) ( Fig. 1A and B ) . 
+ The transcriptional proﬁles of other H-NS-encoding genes did not change with pCAR1 carriage or with pmr disruption ( Fig. 1A and B ) . 
+ Similar results were also obtained by transcriptome analysis using tiling arrays with these three strains in early log phase growth : the signal intensities of the H-NS family proteins did not change with pCAR1 carriage or with pmr disruption ( Fig. 1C ) . 
+ Taken together with the results of the transcriptional proﬁles of pmr , turA , turB , turC , turD , and turE , pmr and turA were the primary transcribed genes in the early log phase growth , whereas turB was transcribed in the late log and stationary growth phases in KT2440 ( pCAR1 ) ( Fig. 1B and C ) . 
+ Translational proﬁles of Pmr in P. putida KT2440 ( pCAR1 ) . 
+ Because previous reports indicated that the translational profiles of some H-NS family proteins were different from their transcriptional profiles ( 7 , 44 ) , we confirmed the translational profiles of Pmr . 
+ Western blot analysis was performed with the crude extract from KT2440 ( pCAR1 ) cells in the growth phase that expressed C-terminal-6-His-tagged Pmr . 
+ Pmr signals in KT2440 ( pCAR1 ) were detected throughout the growth phase , and translational levels of Pmr were higher in the late log and stationary growth phases than in early log phase growth ( Fig. 1A and D ) . 
+ Notably , the translational proﬁle of Pmr differed from the transcriptional proﬁle ( Fig. 1B and D ) and from those of other previously reported H-NS family proteins ( 7 , 44 ) . 
+ At present , we cannot explain the physiological meaning ( s ) of the discrepancy between pmr transcription and translation . 
+ Reciprocal transcription and translation of a gene encoding an H-NS-like protein , Sfh of pSF-R27 , have been investigated in detail before ( 14 ) . 
+ Those authors showed that a blockade of sfh mRNA translation occurred in early exponential growth and was relieved at the onset of stationary phase , which accounted for the expression pattern of Sfh ( 14 ) . 
+ They proposed that confinement of Sfh expression may ensure that the conjugative plasmid pSF-R27 carrying sfh minimizes the disruption on the physiology of the host cell ( 14 ) . 
+ It is therefore possible that Pmr translation may have been regulated in a similar manner to reduce effects on the host cell ; however , further investigations are still necessary to clearly explain the Pmr translation mechanism . 
+ Pmr interacts with itself and with three other H-NS family proteins . 
+ Many reports have indicated that H-NS family proteins can interact with themselves and with paralogous proteins , such as StpA , Hfp , or MvaU ( 7 , 22 , 27 , 44 ) . 
+ Thus , Pmr may interact with itself or other H-NS family proteins expressed from the host chromosome . 
+ To assess this possibility , we performed pull-down assays followed by Western blot analyses to clarify whether Pmr interacted with itself and/or other H-NS family proteins . 
+ As revealed in Fig. 2B ( lane 1 of each sample ) , we detected anti-FLAG signals from each crude extract , indicating that each H-NS family protein was expressed in E. coli . 
+ Anti-His signals were also detected in each eluant after the pull-down assays , indicating that His-tagged Pmr existed in each eluant ( data not shown ) . 
+ In contrast , anti-FLAG signals in the eluants were detected only in the mixtures of His-tagged Pmr with FLAG-tagged Pmr , TurA , TurB , and TurE , whereas those with FLAG-tagged TurC and TurD were not detected ( Fig. 2B , lane 2 of each sample ) . 
+ This result indicates that the strength of the interactions between Pmr and itself or between Pmr and TurA , TurB , or TurE is higher than those between Pmr and TurC or TurD . 
+ One important feature of H-NS family proteins is their modular structure ( 10 ) . 
+ Additionally , KT2440 proteins have putative structures similar to that of H-NS : a well-conserved amino-terminal oligomerization domain ( see Fig. S3A , blue box , in the supplemental material ) , a conserved carboxyl-terminal nucleic acid-binding domain ( see Fig. S3A , red box ) , and a poorly conserved flexible linker that connects the two aforementioned domains ( see Fig. S3A ) . 
+ When the amino acid sequences of the H-NS family proteins of KT2440 were aligned , their putative oligomerization domains at the N-terminal regions were well conserved ( see Fig. S3A ) , although the identity between H-NS family proteins of KT2440 , including Pmr and the H-NS protein of E. coli , was low ( see Fig. S3B ) . 
+ Although it was difﬁcult to predict why Pmr could have heteromeric interactions with three H-NS family proteins but not with two other H-NS family proteins , some residues from the latter may be important for the interaction . 
+ Notably , the homologous proteins of TurA and TurB are conserved in all Pseudomonadaceae species , but TurC , TurD , and TurE are species-speciﬁc proteins ( 30 ) encoded in the putative horizontally acquired DNA region ( 24 ) . 
+ Taken together with the result that turA and turB were transcribed primarily in the early log and late log growth phases , respectively , Pmr may primarily interact with TurA and TurB , although the functional signiﬁcance of TurE is presently unclear . 
+ Considering the reciprocal transcription and translation of Pmr ( Fig. 1B and D ) , it is necessary to analyze the translational levels of Tur proteins in vivo . 
+ Phenotypic alteration by pmr disruption . 
+ To assess the effects of pmr disruption on the phenotypes of KT2440 ( pCAR1 ) , comparisons of the catabolic abilities of KT2440 ( pCAR1 ) and KT2440 ( pCAR1 pmr ) were performed using Biolog PM analyses by measuring the absorbance of colored cultures derived from a tetrazolium dye used as a reporter of cell respiration . 
+ From the comparisons for each of the 190 substrates as a sole carbon source , reproducible reductions of the maximum absorbance of the color were observed in the KT2440 ( pCAR1 pmr ) culture , compared with results for the KT2440 ( pCAR1 ) culture , with nine compounds ( D-fructose , L-serine , L-valine , saccharic acid , D-malic acid , pyruvic acid , methyl pyruvate , D-ribono-1,4-lactone , and inosine ; see Fig. S4 in the supplemental material ) . 
+ We did not detect any difference between the two strains in the cultures using the other carbon sources ( for example , the result with D-glucose is shown in Fig. S4 ) . 
+ These results indicated that pmr disruption affected the catabolic abilities of KT2440 ( pCAR1 ) with several carbon sources , suggesting that Pmr may function as a global regulator of many genes . 
+ Transcriptome alteration by pmr disruption . 
+ To conﬁrm the effects of pmr disruption on the host cells , we performed transcriptome comparisons between KT2440 ( pCAR1 ) and KT2440 ( pCAR1 pmr ) using custom-made tiling arrays of genome sequences of pCAR1 and the KT2440 chromosome ( 26 , 35 ) . 
+ To evaluate the transcriptional and translational proﬁles of pmr ( Fig. 1 ) , transcriptome comparisons were performed for cells in early log phase growth . 
+ Overview . 
+ We found that the transcription of 31 genes on pCAR1 and 159 genes on the KT2440 chromosome was altered by pmr disruption , with a fold change of 1.5 ( see Materials and Methods ; see also Tables S2 and S3 in the supplemental material ) . 
+ We identiﬁed 2 and 19 upregulated genes on pCAR1 and the KT2440 chromosome , respectively , and 29 and 140 downregulated genes on pCAR1 and the KT2440 chromosome , respectively . 
+ Based on our previous study ( 35 ) , we identiﬁed 112 genes altered by pCAR1 carriage with a fold change of 1.5 in both of the duplicate data ( see Table S3 ) . 
+ Notably , the number of downregulated genes following pmr disruption was larger than that with pCAR1 carriage ( see Fig. S5 ) , suggesting that Pmr may play an important role in mediating the transcription of the chromosomal genes of the host KT2440 by pCAR1 carriage . 
+ The comparison of the transcriptome changes with pCAR1 carriage with those with pmr disruption enabled us to classify 5,398 genes of KT2440 ( after rRNA and tRNA removal ) based on their transcriptional patterns . 
+ First , the transcription of 5,146 genes was not affected by pCAR1 carriage or by pmr disruption . 
+ Among the remaining 252 genes , 43 ( group A ) or 50 ( group B ) were upregulated or downregulated by pCAR1 carriage , respectively , but neither of their transcription levels was affected by pmr disruption ( Fig. 3 ; see also Table S4 ) . 
+ Only one gene ( group D ) was downregulated by both pCAR1 carriage and pmr disruption ( no gene was classiﬁed into group C ) ( Fig. 3 and Table 2 ; see also Table S4 ) . 
+ Seventeen genes ( group E ) were upregulated by pCAR1 carriage but downregulated by pmr disruption , and one gene ( group F ) was the reverse ( Fig. 3 ; see also Table S4 ) . 
+ In total , 122 genes ( group G ) or 18 genes ( group H ) were downregulated or upregulated by pmr disruption , respectively , but neither group was affected by pCAR1 carriage ( Fig. 3 ; see also Table S4 ) . 
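The grouping scheme and its totals can be cross-checked with a short sketch. Two points are assumptions made for this illustration rather than statements from the text: group C is taken to mean genes upregulated under both conditions ( the text reports no gene in it ) , and groups G and H are taken as downregulated and upregulated by pmr disruption , respectively , because that assignment reproduces the 19 upregulated and 140 downregulated chromosomal genes reported in the Overview.

```python
# Map (effect of pCAR1 carriage, effect of pmr disruption) to a group label.
GROUPS = {
    ("up", "none"): "A",
    ("down", "none"): "B",
    ("up", "up"): "C",      # assumed definition; no gene falls in this group
    ("down", "down"): "D",
    ("up", "down"): "E",
    ("down", "up"): "F",
    ("none", "down"): "G",  # direction assumed from the reported totals
    ("none", "up"): "H",    # direction assumed from the reported totals
}


def classify(carriage_effect, disruption_effect):
    """Return the group label, or 'unaffected' if neither condition changed the gene."""
    return GROUPS.get((carriage_effect, disruption_effect), "unaffected")


# Group sizes as reported in the text.
sizes = {"A": 43, "B": 50, "C": 0, "D": 1, "E": 17, "F": 1, "G": 122, "H": 18}

affected_by_carriage = sum(sizes[g] for g in "ABCDEF")
affected_by_disruption = sum(sizes[g] for g in "CDEFGH")
down_by_disruption = sum(sizes[g] for (_, d), g in GROUPS.items() if d == "down")
up_by_disruption = sum(sizes[g] for (_, d), g in GROUPS.items() if d == "up")
unaffected = 5398 - sum(sizes.values())
```

With these sizes the sketch gives 112 genes affected by pCAR1 carriage , 159 affected by pmr disruption , and 5,146 unaffected genes , matching the figures quoted in the text.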
+ Doyle et al. ( 15 ) proposed that these H-NS proteins encoded on plasmids have `` stealth '' functions to minimize the effect on host strain ﬁtness , comparing the number of genes whose transcriptional levels were altered in the presence or absence of the H-NS protein . 
+ Regarding KT2440 ( pCAR1 ) , the number of differentially transcribed genes with pmr disruption ( 159 genes on the chromosome ) was larger than that with pCAR1 carriage ( 112 genes ) . 
+ Additionally , 88 % of them ( belonging to group G or H ) were altered only by the absence of Pmr , suggesting that Pmr had a `` stealth '' function , as mentioned above . 
+ The transcription levels of only 12 % of the differentially transcribed genes with pmr disruption ( 18 genes ) reverted to levels similar to those in pCAR1-free KT2440 ( groups E and F in Fig. 3 ; see also Table S4 ) , suggesting that these genes were regulated primarily by Pmr itself , directly or indirectly . 
+ These results suggest that Pmr is a key global regulator of many genes , both on pCAR1 and on the host chromosome . 
+ Martins dos Santos et al. ( 24 ) demonstrated that KT2440 had many putative horizontally acquired DNA regions . 
+ These regions include 1,105 ORFs , corresponding to about 20 % of the total ORFs in KT2440 . 
+ Because H-NS family proteins bind to horizontally acquired DNA regions ( 16 , 28 ) , we calculated the ratio of ORFs in the regions in the above differentially transcribed genes ( Fig. 3 ) . 
+ Of the 112 genes ( Fig. 3 , groups A to F ) differentially transcribed by pCAR1 carriage , 23 ( 21 % ) were located in the putative horizontally acquired DNA region ( Table 2 ; see also Table S4 in the supplemental material ) . 
+ Conversely , 56 ( 35 % ) of 159 genes differentially transcribed by pmr disruption ( Fig. 3 , groups C to H ) were in this region ( Table 2 ; see also Table S4 ) . 
+ Notably , the proportions of groups B , G , and H were high : 28 % , 39 % , and 28 % , respectively ( Table 2 ; see also Table S4 ) . 
+ The average G+C content of pCAR1 and the KT2440 chromosome is 56.3 % and 61.6 % , respectively . 
+ We then calculated the G+C content in the 500 bp upstream of each ORF . 
+ The G+C content upstream of the pCAR1-borne genes differentially transcribed by pmr disruption was significantly below the average : for the upstream regions of 30 of the 31 affected ORFs ( Table 3 ) , it was below 61.6 % , and for those of 27 ORFs , including the car or parAB genes ( see Table S2 ) , it was even below 56.3 % ( Table 3 ) . 
+ Concerning the ORFs on the KT2440 chromosome , the G+C content of the upstream regions of 92 ( 58 % ) among 159 ORFs was below 61.6 % , and that for 37 ORFs was below 56.3 % ( Table 3 ) . 
+ As revealed in Table 3 , the ratio of these ORFs to the total affected ORFs was higher ( 87 % in pCAR1 and 23 % in the KT2440 chromosome ) than the ratio of the ORFs whose upstream regions were low in G+C content ( below 56.3 % ) to the total ORFs ( 64 % in pCAR1 and 17 % in the KT2440 chromosome ) . 
+ Notably , among the ORFs affected by pCAR1 carriage ( 112 ORFs ) , the proportion with a G+C content below 56.3 % in the upstream region ( 16 ORFs ) was 14 % ( Table 3 ) . 
+ Thus , some ORFs with low-G+C regions may be specifically regulated by Pmr . 
+ Downregulated genes on pCAR1 with pmr disruption . 
+ The transcription levels of the genes on the car operon , involved in carbazole degradation , were downregulated ( Fig. 4A ; see also Table S2 in the supplemental material ) . 
+ When KT2440 ( pCAR1 ) is grown with succinate , the car operon is constitutively transcribed from the PcarAa promoter ( 26 , 35 ) , and it is induced by anthranilate , an intermediate of the carbazole degradation pathway , from the Pant promoter , further upstream ( 42 ) . 
+ Thus , the constitutively expressed carbazole-degrading enzymes will be required to produce anthranilate . 
+ This suggests that the downregulation of the constitutive transcription levels of car genes may have caused the growth delay with carbazole . 
+ In fact , the growth of KT2440 ( pCAR1 pmr ) was delayed compared with that of KT2440 ( pCAR1 ) in NMM-4 buffer with carbazole as a sole carbon source ( data not shown ) . 
+ The transcriptional levels of the parAB genes were also reduced in the pmr disruptants ( Fig. 4B ; see also Table S2 ) . 
+ The parAB genes are required for the stable maintenance of pCAR1 in the host strain ( 36 ) , and thus , the downregulation of these genes may cause instability of pCAR1 . 
+ However , we did not detect changes in the stability of pCAR1 or pCAR1 pmr in KT2440 cells ( data not shown ) , suggesting that the effects of the downregulation of the parAB genes on plasmid stability may be insigniﬁcant . 
+ It is also possible that the chromosomally encoded ParAB system ( ParABKT2440 ) for the partition of the KT2440 chromosome may have been involved in plasmid partition ; however , the transcriptional levels of these genes were unaltered in the pmr disruptant ( data not shown ) . 
+ Additionally , the cis-acting centromere-like parS sequence is indispensable for the function of ParABKT2440 ; however , the 16-nucleotide ( nt ) parS sequence of P. putida KT2440 ( 5′-TGTTNCACGTGAAACA-3′ ) ( 3 , 18 ) was not found in the pCAR1 sequence ( data not shown ) . 
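A search for the 16-nt parS consensus on both strands can be sketched with a regular expression in which N matches any base. This is only an illustration of the idea; the function names are hypothetical, and the source does not state how the authors searched the pCAR1 sequence.

```python
import re

# parS consensus as quoted in the text; N stands for any nucleotide.
PARS = "TGTTNCACGTGAAACA"

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq: str) -> str:
    """Reverse complement of an upper-case sequence that may contain N."""
    return seq.translate(_COMPLEMENT)[::-1]


def motif_to_regex(motif: str) -> str:
    """Turn a motif with N wildcards into a regular expression."""
    return "".join("[ACGT]" if base == "N" else base for base in motif)


def find_motif(seq: str, motif: str = PARS):
    """Return sorted (0-based start, strand) hits of `motif` on both strands."""
    seq = seq.upper()
    hits = []
    for strand, m in (("+", motif), ("-", revcomp(motif))):
        # A lookahead lets overlapping occurrences be reported as well.
        pattern = re.compile("(?=(" + motif_to_regex(m) + "))")
        hits.extend((match.start(), strand) for match in pattern.finditer(seq))
    return sorted(hits)
```

An empty result for a plasmid sequence would correspond to the "not found" outcome reported above.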
+ The reason pCAR1 pmr was stable in the host strain was not clear . 
+ Notably , the transcriptional levels of the car and parAB genes were altered in different host strains ( 26 , 35 ) , and the transcription of these genes may be related to the Pmr concentration . 
+ pmr disruption alters chromosomal gene transcription that is upregulated by pCAR1 carriage . 
+ The mexEF-oprN operon , encoding the efﬂux pump , was upregulated in KT2440 ( pCAR1 ) and downregulated in KT2440 ( pCAR1 pmr ) ( Fig. 5A ; see also group E of Table S4 in the supplemental material ) . 
+ In our previous study , these gene products enhanced the chloramphenicol ( Cm ) resistance of the host strain ; KT2440 ( pCAR1 ) showed resistance to concentrations of Cm higher than 300 μg/ml , although KT2440 was not able to grow at that concentration ( 35 ) . 
+ Therefore , we assessed the Cm resistance ( 300 μg/ml ) of the pmr disruptants . 
+ Cm resistance reverted to the levels of pCAR1-free KT2440 , indicating that the downregulation of the mexEF-oprN operon may be associated with the loss of resistance at that concentration . 
+ Westfall et al. ( 45 ) reported that the transcription of mexEF-oprN orthologous genes in P. aeruginosa PAO1 ( normally untranscribed ) was induced in the mvaT ( PA4315 ) mutant on the PAO1 chromosome . 
+ Thus , H-NS family proteins may be involved in the transcriptional regulation of these genes in PAO1 . 
+ Although our case contrasted with the PAO1 case , i.e. , pmr disruption caused the downregulation of the mexEF-oprN operon , Pmr may have contributed to the transcriptional regulation of these genes . 
+ Herrera et al. ( 19 ) recently reported that PhhR ( PP_4489 ) , a transcriptional regulator of phenylalanine hydroxylase phhAB genes , modulates the level of expression of mexEF-oprN together with MexT ( PP_2826 ) . 
+ Notably , the transcriptional levels of both phhR and mexT were not changed by pCAR1 carriage or by pmr disruption ( data not shown ) , suggesting that Pmr may be the third element for the regulation of the mexEF-oprN operon . 
+ The parI gene encodes a putative ParA-like ATPase containing an N-terminal DNA-binding motif , and its transcription was upregulated in KT2440 ( pCAR1 ) but downregulated in KT2440 ( pCAR1 pmr ) ( Fig. 5B ; see also group E of Table S4 in the supplemental material ) . 
+ This corroborated our previous results that the parI promoter was activated in the presence of pCAR1 because of the parA product from pCAR1 ( 25 ) . 
+ Therefore , the decrease in the parI transcriptional level in KT2440 ( pCAR1 pmr ) was caused by the reduced parA transcription ( Fig. 4B ; see also Table S2 ) , although the reasons for the parA gene downregulation in KT2440 ( pCAR1 pmr ) remain unclear . 
+ Pmr preferentially binds to foreign DNA and low-G+C regions of the host chromosome . 
+ Because the transcription of many genes was affected by pCAR1 carriage and by pmr disruption , we identiﬁed genome-wide Pmr-binding DNA regions on both pCAR1 and the KT2440 chromosome . 
+ We performed ChAP-chip analyses to identify the Pmr-binding sites on the KT2440 chromosome and in pCAR1 in cells in early log phase , as in the transcriptome analyses , although the translational levels of Pmr were higher in the late log and stationary growth phases than in early log phase growth ( Fig. 1D ) . 
+ Consequently , 241 and 26 Pmr-binding sites were detected ( with a P value of 0.01 ) on the KT2440 chromosome and in pCAR1 , respectively ( see Table S5 in the supplemental material ) . 
+ First , we calculated the G+C content of the regions identiﬁed . 
+ The average G+C content of the 241 Pmr-binding regions in the KT2440 chromosome was signiﬁcantly lower ( 52.5 % ) than that of the entire KT2440 chromosome ( 61.6 % ) . 
+ The 26 Pmr-binding regions in pCAR1 also demonstrated an average G+C content ( 52.5 % ) that was lower than that of the entire pCAR1 plasmid ( 56.3 % ) . 
+ Indeed , a high association was found between the Pmr-binding sites on the KT2440 chromosome and the putative foreign DNA region ( Fig. 6A ) . 
+ Notably , 73 % of the Pmr-binding sites in the KT2440 chromosome were located in foreign DNA regions . 
+ Interestingly , many Pmr binding sites in pCAR1 overlapped with the localization of the differentially transcribed genes with pmr disruption ( Fig. 6B ) . 
+ Similarly , many binding sites in the KT2440 chromosome were also found near regions where the differentially transcribed genes localized , although not every gene near a binding-site region was affected by pmr disruption ( Fig. 6A ) . 
+ These data indicate that Pmr regulates the transcription of many genes by binding to intergenic or intragenic regions of target genes . 
+ To determine the relative positions of the Pmr-binding sites to each intergenic or intragenic region of the ORFs , distribution analyses were performed for the ChAP chip analysis data ( Fig. 7 ) . 
+ The Pmr binding site number peaked at around 200 bp upstream from the translational start point and at around 300 bp downstream from the translational endpoint ( Fig. 7 ) , which was similar in the ChAP-chip analysis when different P-value thresholds were used ( Fig. 7 ) . 
+ This analysis indicated that Pmr may bind preferentially to intergenic regions rather than to intragenic regions of ORFs , conﬁrming many reports that H-NS family proteins regulate gene expression by binding to target promoter regions ( 10 , 16 ) . 
+ Our ChAP-chip analysis demonstrated that Pmr bound preferentially to DNA with a low G+C content in KT2440 ( pCAR1 ) and that Pmr bound to intergenic regions and regulated the transcription of genes in the ﬂanking regions of the binding sites . 
+ We also performed ChAP-chip analysis to identify the binding sites of the TurA and TurB proteins , which are encoded on the KT2440 chromosome . 
+ However , the detected TurA- and TurB-binding sites were almost identical to those of Pmr , and most of them were DNA regions with low G+C content ( data not shown ) . 
+ These results were similar to those observed with E. coli or P. aeruginosa PAO1 , in which the two H-NS family proteins ( H-NS and StpA or MvaT and MvaU ) bound to the same regions of the chromosome ( 5 , 43 ) . 
+ Recently Dillon et al. ( 8 ) reported that the DNA binding sites of plasmid-encoded Sfh of Salmonella overlapped with those of H-NS . 
+ Sfh does not bind uniquely to any site , and the number of Sfh binding sites is smaller than that of H-NS . 
+ Although Sfh binding sites are located within those of H-NS , the Sfh binding sites greatly expand in the absence of H-NS , suggesting that Sfh may play a `` backup '' role for H-NS ( 8 ) . 
+ These facts suggest that the three proteins Pmr , TurA , and TurB may function coordinately as global regulators in the cells and that Pmr may also perform `` backup '' functions for the other proteins , different from those of H-NS and Sfh , because the binding sites of Pmr , TurA , and TurB are identical . 
+ However , previous genome-wide analyses of the binding sites of H-NS family proteins , including ours , did not necessarily take into account the in vivo protein-protein interaction ( s ) . 
+ In other words , the detected sites do not necessarily show how these proteins bind to the DNA sequences as homo- or heteromultimers of the H-NS family proteins in vivo . 
+ Considering the coordinate functions for DNA binding and transcriptional regulation by Pmr and TurA to TurE , analyses from protein structure viewpoints will be necessary to understand how they form homo- or heteromultimers in vivo in the presence or absence of target DNA . 
+ Conclusions . 
+ In this study , we demonstrated that the plasmid-encoded H-NS family protein Pmr forms homomeric and heteromeric oligomers in vitro and that pmr , turA , and turB are the primary transcribed genes at different growth phases . 
+ We also revealed that pmr disruption affected the carbon catabolism of KT2440 ( pCAR1 ) and that Pmr is a key factor that regulates the transcription of genes on both pCAR1 and the host chromosome in two ways : ( i ) Pmr may alter the transcriptional levels of genes in group E or F , such as mexEF-oprN and parI , and ( ii ) Pmr may minimize the effect of the transcription of many genes in group G or H , such as those in the putative foreign DNA regions . 
+ The identiﬁcation of genome-wide binding sites of Pmr by ChAP-chip analysis indicated that Pmr binds to putative foreign DNA regions with low G+C content . 
+ Additionally , Pmr binds preferentially to intergenic regions and may regulate many genes in the ﬂanking regions of the binding sites . 
+ These ﬁndings indicate that Pmr is involved in the regulation of the expression of many genes , directly or indirectly , and that this regulation may be closely related to its DNA-binding regions and its interaction with other H-NS family proteins , primarily TurA and TurB . 
+ Recently three H-NS family proteins in pathogenic E. coli , the endogenous hns and stpA genes and the horizontally acquired hfp gene , were shown to be differentially transcribed at distinct temperatures ( 27 ) . 
+ Thus , we must further analyze the transcriptional and translational levels of Pmr , TurA , TurB , TurC , TurD , and TurE under conditions other than those we have used to date . 
+ Notably , the pmr gene is conserved in another IncP-7 plasmid , pWW53 ( 46 ) , indicating that Pmr is an important protein for IncP-7 plasmids . 
+ Moreover , H-NS family proteins are expressed from other plasmids , such as H-NS from R27 ( IncHI ) ( 17 ) , Sfh from pSf-R27 ( IncHI ) ( 15 ) , Orf4 from R446 ( IncM ) ( 41 ) , or an undeposited ORF from pQBR103 ( IncP-3 ) ( 40 ) . 
+ A key function of H-NS expressed from mobile genetic elements is maintaining host cell ﬁtness ( 15 ) . 
+ H-NS is a member of the NAP family , and its coordinate functions with other NAPs are also important for host cell ﬁtness maintenance ( 9 , 11 ) . 
+ In the case of the R27 studies , the Hha-like protein , a protein-protein modulator of H-NS activity , is encoded on the plasmid ( 17 ) ; however , no candidates for Pmr modulator-encoding genes are found on pCAR1 . 
+ Interestingly , pCAR1 harbors two other genes encoding putative NAPs other than Pmr , although their transcriptional levels were unaltered by pmr disruption . 
+ Analyses of the function ( s ) of these gene products are necessary to understand how these H-NS family proteins behave when pCAR1 is introduced into the host cell by conjugative transfer . 
+ Such information would help to explain the adaptive and evolutionary mechanisms of bacteria acquiring foreign genes by horizontal gene transfer . 
+ ACKNOWLEDGMENTS
+ We thank Akira Yokota of the Institute of Molecular and Cellular Biosciences , the University of Tokyo , for use of his Biolog MicroLog MicroStation system . 
+ This study was supported by the Program for Promotion of Basic Research Activities for Innovative Biosciences ( PROBRAIN ) in Japan . 
+ Y.T. was supported by research fellowships from the Japan Society for the Promotion of Science ( JSPS ) for Young Scientists .
\ No newline at end of file
--- a/data/TEXT_FILES/notuseful_txt/20817769.txt 0 → 100644
View file @27818a9
+++ b/data/TEXT_FILES/notuseful_txt/20817769.txt 0 → 100644
View file @27818a9
+ RNA Polymerase Trafﬁcking in Bacillus subtilis Cells †
+ To obtain insight into the in vivo dynamics of RNA polymerase ( RNAP ) on the Bacillus subtilis genome , we analyzed the distribution of the σA and β′ subunits of RNAP and the NusA elongation factor on the genome in exponentially growing cells using chromatin afﬁnity precipitation coupled with gene chip mapping ( ChAP-chip ) . 
+ In contrast to Escherichia coli RNAP , which often accumulates at the promoter-proximal region , B. subtilis RNAP is evenly distributed from the promoter to the coding sequences . 
+ This ﬁnding suggests that , in general , B. subtilis RNAP recruited to the promoter promptly translocates away from the promoter to form the elongation complex and proceeds without intragenic transcription attenuation . 
+ We detected RNAP accumulation in the promoter-proximal regions of some genes , most of which can be identiﬁed as transcription attenuation systems in the leader region . 
+ Our ﬁndings suggest that the differences in RNAP behavior between E. coli and B. subtilis during initiation and elongation steps might result in distinct strategies for postinitiation control of transcription . 
+ The E. coli mechanism involves trapping at the promoter and promoter-proximal pausing of RNAP in addition to transcription attenuation , whereas transcription attenuation in leader sequences is mainly employed in B. subtilis . 
+ Biochemical studies on RNA polymerase ( RNAP ) show that transcriptional initiation involves three steps ( 20 ) : ( i ) recruitment of RNAP to the promoter via recognition of consensus elements within the promoter sequence to form a closed complex ; ( ii ) isomerization of the closed complex to an open complex , with opening of double-stranded DNA ; and ( iii ) promoter clearance , coupled with formation of the elongation complex ( EC ) and initiation of transcriptional elongation . 
+ The efﬁciency of transcriptional initiation depends on the rate of each step . 
+ While recruitment of RNAP to promoters is the main target for control of gene expression , transitions to the open complex and the subsequent elongation complex are also proposed regulatory targets ( 29 ) . 
+ Recently , in vivo trafﬁcking of RNAP during transcriptional initiation has been explored for several organisms using chromatin immunoprecipitation coupled with the gene chip mapping method ( ChIP-chip ) . 
+ ChIP-chip analyses of RNAP distribution on the Escherichia coli genome reveal typically higher RNAP association at promoter regions than within coding sequences ( 21 , 26 ) . 
+ Initially , it was proposed that promoter-proximal peaks reﬂected RNAPs trapped at promoters , designated `` poised RNAP , '' since about 23 % of RNAP peaks at promoters are not associated with transcripts ( 26 ) . 
+ Consistent with this hypothesis , RNAP trapping at the promoter has been reported for a number of E. coli genes . 
+ The GalR repressor suppresses the galP1 promoter by trapping RNAP in an intermediate state between the closed and open complexes ( 3 ) . 
+ * Corresponding author . 
+ Mailing address : Graduate School of Information Science , Nara Institute of Science and Technology , 8916-5 , Takayama , Ikoma , Nara 630-0192 , Japan . 
+ Phone : 81-743-72-5430 . 
+ Fax : 81-743-72-5439 . 
+ E-mail : nogasawa@bs.naist.jp . 
+ † Supplemental material for this article may be found at http://jb.asm.org/ . 
+ ‡ These authors contributed equally . 
+ Published ahead of print on 3 September 2010 . 
+ Moreover , the ArgP activator bound to lysine restrains RNAP at the argO promoter in a step following open complex formation that precedes productive transcription , and the complex is altered to a transcriptionally active state upon arginine addition ( 18 ) . 
+ In vivo and in vitro footprinting analyses additionally suggest that 10 to 20 % of transcriptional complexes in E. coli stall at the promoter or promoter-proximal regions , depending on the −10 and/or −10-like sequences ( 11 ) . 
+ However , Mooney et al. ( 21 ) proposed that promoter-proximal RNAP peaks result from transcriptional attenuation rather than the presence of poised RNAP at the promoter regions , since the average RNAP peak ( subunit peak ) is offset from the average σ70 peak in the direction of transcription and coincides with that of NusA , which associates with the elongating complex ( 21 ) . 
+ Promoter-proximal RNAP peaks are common for human and Drosophila genes , where they appear to be correlated with developmentally regulated rather than housekeeping genes ( 10 , 22 , 34 ) . 
+ In Saccharomyces cerevisiae , RNAPs are located upstream of several hundred genes prior to their transcriptional activation during the stationary phase , possibly allowing immediate adaptation to changes in environmental conditions ( 25 ) . 
+ Here , we report in vivo RNAP trafﬁcking in the Gram-positive bacterium Bacillus subtilis . 
+ The major vegetative RNAP holoenzyme of B. subtilis is EσA , the counterpart of Eσ70 in E. coli . 
+ Both RNAPs have the same core subunit composition , comprising 2 , and holoenzymes recognize the same consensus sequences at the −35 ( TTGACA ) and −10 ( TATAAT ) regions of the promoter . 
+ However , biochemical analyses show characteristic differences in their activities . 
+ E. coli RNAP forms a stable open complex , evident from the protection of DNA downstream of the promoter region from DNase I digestion upon binding of RNAP and formation of a heparin-resistant RNAP-promoter complex ( 23 , 30 ) . 
+ In contrast , B. subtilis RNAP is unable to protect DNA downstream of the promoter , indicating lower activity of this polymerase in forming a stable open complex than E. coli RNAP ( 1 , 6 , 23 , 27 , 30 ) . 
+ Interestingly , chromatin afﬁnity precipitation ( ChAP ) - chip analysis , a modiﬁed ChIP-chip method developed in our laboratory ( 16 ) , showed that B. subtilis RNAP is distributed evenly from the promoter to the coding sequences in the majority of transcriptional units ( TUs ) . 
+ We detected some promoter-proximal peaks of RNAP , which were mostly attributed to transcription attenuation mechanisms . 
+ Accordingly , we conclude that RNAP mainly initiates transcription without trapping in a poised or paused state at the promoter or promoter-proximal sites and proceeds without intragenic transcription attenuation in B. subtilis cells . 
+ These ﬁndings suggest that the differences in RNAP behavior during initiation and elongation steps in E. coli and B. subtilis might result in different strategies for the postinitiation control of transcription . 
+ The E. coli mechanism involves trapping at the promoter region and promoter-proximal pausing of RNAP in addition to transcription attenuation , whereas transcription attenuation in leader sequences is mainly observed in B. subtilis . 
+ MATERIALS AND METHODS
+ Construction of B. subtilis strains . 
+ To construct B. subtilis strains expressing σA , β′ , and NusA fused to histidine tags , 500-bp fragments encompassing the 3′ portions of the corresponding genes , except the stop codons , were ampliﬁed from B. subtilis 168 genomic DNA by PCR using the primer sets rpoC.f-rpoC.r , sigA.f-sigA.r , and nusA.f-nusA.r , respectively ( see Table S3 in the supplemental material ) , and cloned between the HindIII or EcoRI and XhoI sites of pMUTinHis ( 15 ) . 
+ The resultant plasmids were integrated into the B. subtilis genome via single crossing-over to obtain strains designated 168rpoCHis , 168sigAHis , and 168nusAHis , respectively . 
+ ChAP-chip analysis . 
+ The 168rpoCHis , 168sigAHis , and 168nusAHis strains were cultivated in LB at 37 °C under aerobic conditions and harvested at log phase ( optical density at 600 nm [ OD600 ] 0.4 ) . 
+ Protein-DNA cross-linking with formaldehyde , puriﬁcation of protein-DNA complexes with the Ni column , and mapping of copuriﬁed DNA fragments using a custom Affymetrix tiling chip were performed as described previously ( 15 ) . 
+ Protein binding signals were analyzed and visualized using the software package In Silico Molecular Cloning , array edition ( In Silico Biology ) . 
+ Protein binding signal intensities were estimated by dividing the signal intensities of DNA in the afﬁnity-puriﬁed fraction ( ChAP DNA ) by control DNA intensity ( enrichment factor ) as previously described ( 32 ) . 
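The enrichment factor described above is a per-probe ratio of the ChAP DNA signal to the control DNA signal. A minimal sketch, assuming two equal-length lists of probe intensities and a hypothetical function name (normalization and background handling in the original software are not described in the text and are not modeled here):

```python
import math


def enrichment_factors(chap, control):
    """Divide each ChAP probe intensity by the matching control intensity.

    Probes with a non-positive control intensity yield NaN instead of
    raising, so that they can be filtered out downstream.
    """
    if len(chap) != len(control):
        raise ValueError("chap and control must have the same number of probes")
    return [c / k if k > 0 else math.nan for c, k in zip(chap, control)]
```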
+ The whole-cell extract fraction before puriﬁcation of σA ( for σA and β′ ChAP-chip ) or NusA ( for NusA ChAP-chip ) was used as control DNA for ChAP-chip analyses . 
+ Distribution of protein binding signals along the genome coordinate in 2 independent ChAP-chip analyses for each protein is shown in Fig. S1 in the supplemental material . 
+ Transcriptome analysis . 
+ Total RNA was puriﬁed from wild-type cells cultured under conditions similar to those employed for ChAP-chip experiments . 
+ Synthesis and terminal labeling of cDNA , hybridization to the tiling chip , and data processing , including normalization of hybridization intensities of cDNA by those of genomic DNA , were performed according to previous protocols ( 16 ) . 
+ The distribution of transcription signals along the genome coordinate was visualized with the In Silico Molecular Cloning program , array edition ( In Silico Biology ) . 
+ Extraction of TUs . 
+ We extracted transcriptional units ( TUs ) using the average transcription signal intensities of each probe from three independent experiments . 
+ To eliminate the effects of artiﬁcial spikes and inefﬁcient hybridization signals , sliding-window methods were usually applied for transcriptome analysis using a tiling chip , whereby the average values of contiguous probes in a window are determined to extract transcriptionally active regions with signals above a threshold value . 
+ However , symmetrical sliding-window methods tend to produce biased estimates of the starting points and endpoints of transcribed regions depending on the signal levels compared to background signals ( 13 ) . 
+ To precisely determine the initiation and termination sites of transcripts , we designed a novel procedure applying asymmetrical sliding windows . 
+ In addition to the extraction of transcriptionally active regions using a 550-bp symmetrical sliding window , we applied two asymmetrical sliding windows , one comprising the regions 150 bp upstream and 400 bp downstream from each probe and the other comprising the regions 400 bp upstream and 150 bp downstream . 
+ In cases where the signal intensities of more than half of the probes in the upstream and downstream windows were higher than the threshold value , the probe was judged `` transcription positive . '' 
+ We set the threshold value as 0.5 in these analyses ( see Fig. S5A in the supplemental material ) . 
+ In addition , if the distance between neighboring TUs was determined to be less than 150 bp ( corresponding to a maximum of two probes in the coding regions and 6 probes in the intergenic regions ) , the TUs were concatenated . 
+ Finally , we combined the positive probes determined from the 3 windows , and 2,073 TUs ( deﬁned as a region containing positive probes determined by at least one window ) were obtained . 
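The window-based calling of transcription-positive probes and the concatenation of nearby TUs can be sketched as below. Several points are assumptions made for illustration: the 550-bp symmetrical window is taken as 275 bp on each side of the probe, the "more than half" rule is applied to all probes pooled from the upstream and downstream parts of a window (including the probe itself), and the function names are hypothetical.

```python
from bisect import bisect_left, bisect_right

# (bp upstream, bp downstream) for the symmetrical and two asymmetrical windows.
WINDOWS = [(275, 275), (150, 400), (400, 150)]


def positive_probes(positions, signals, threshold=0.5, windows=WINDOWS):
    """Flag probes that are positive in at least one window.

    `positions` must be sorted in ascending order; `signals` are the
    averaged transcription signal intensities of the same probes.
    """
    flags = [False] * len(positions)
    for i, pos in enumerate(positions):
        for up, down in windows:
            lo = bisect_left(positions, pos - up)
            hi = bisect_right(positions, pos + down)
            window = signals[lo:hi]
            above = sum(1 for s in window if s > threshold)
            if 2 * above > len(window):  # more than half exceed the threshold
                flags[i] = True
                break
    return flags


def call_units(positions, flags, min_gap=150):
    """Group consecutive positive probes into units, then join units < min_gap apart."""
    runs, start = [], None
    for i, flag in enumerate(flags):
        if flag and start is None:
            start = i
        elif not flag and start is not None:
            runs.append((positions[start], positions[i - 1]))
            start = None
    if start is not None:
        runs.append((positions[start], positions[-1]))

    merged = []
    for left, right in runs:
        if merged and left - merged[-1][1] < min_gap:
            merged[-1] = (merged[-1][0], right)
        else:
            merged.append((left, right))
    return merged
```

Running `positive_probes` with a stricter threshold (for example 0.75) and comparing the resulting unit start positions is one way to mimic the two-threshold comparison described in the following paragraph.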
+ To evaluate the general behavior of the B. subtilis RNAP , we extracted TUs suitable for the analysis using criteria similar to those adopted for E. coli studies ( 21 , 26 ) . 
+ If TUs have multiple promoters , transcriptional signal intensities tend to increase stepwise at initiation sites . 
+ These TUs were removed by extraction using a higher threshold value of 0.75 . 
+ In total , 1,258 TUs displaying coincident transcription start sites ( TSSs ) determined using both thresholds were selected ( see Fig. S5B in the supplemental material ) . 
+ Next , we extracted 741 TUs of more than 200 bp . 
+ Among these , we extracted 344 TUs located more than 500 bp from neighboring TUs to avoid overlapping protein binding and transcription signals . 
+ Finally , we visually selected 180 TUs with clear single TSSs , designated `` high-quality TUs '' in this study ( see Table S1 in the supplemental material ) . 
+ Detection of σA binding sites . 
+ The σA binding peaks were computationally detected by searching more than 2 consecutive probes in noncoding regions with σA binding signals higher than the threshold value ( determined as 4.0 for experiment 1 and 2.0 for experiment 2 ) , depending on their background levels ( see Fig. S4 in the supplemental material ) . 
+ Rough spacing of probes in the coding regions on our tiling chip did not allow precise detection of the σA binding sites in the coding regions . 
+ Next , we selected binding sites whose center points of peaks were located within 100 bp ( covered by maximally 8 probes ) in two experiments , excluding promoters of rRNA and tRNA operons . 
+ The average median position was assigned as the binding position of σA , yielding 571 sites . 
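The peak-calling rule (more than 2 consecutive probes above an experiment-specific threshold, then agreement of peak centers within 100 bp between two experiments) can be sketched as follows. The midpoint of a run is used as its center and the mean of the two matched centers as the site position; both choices, and the function names, are assumptions for illustration. Restricting probes to noncoding regions and excluding rRNA and tRNA promoters are left to the caller.

```python
def call_peaks(positions, signals, threshold, min_run=3):
    """Return centers of runs of at least `min_run` consecutive probes above threshold."""
    centers, run = [], []
    for pos, sig in zip(positions, signals):
        if sig > threshold:
            run.append(pos)
            continue
        if len(run) >= min_run:
            centers.append((run[0] + run[-1]) / 2)
        run = []
    if len(run) >= min_run:
        centers.append((run[0] + run[-1]) / 2)
    return centers


def reproducible_sites(centers_1, centers_2, max_dist=100):
    """Keep peaks of experiment 1 with a peak of experiment 2 within `max_dist` bp."""
    sites = []
    for a in centers_1:
        nearby = [b for b in centers_2 if abs(a - b) <= max_dist]
        if nearby:
            closest = min(nearby, key=lambda b: abs(a - b))
            sites.append((a + closest) / 2)
    return sites
```

With the thresholds quoted above, experiment 1 would be called with `threshold=4.0` and experiment 2 with `threshold=2.0` before the two lists of centers are intersected.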
+ Estimation of the maximum enrichment position of the protein binding proﬁle . 
+ We estimated positions where σA , β′ , and NusA binding signals reach the maximum level ( peak ) in each of the 180 high-quality TUs based on difference of average values between upstream and downstream windows . 
+ We set a pair of 200-bp sliding windows and calculated differences in average probe intensities between the upstream window and the downstream window . 
+ The window was moved at 50-bp intervals from bp −200 to bp +600 relative to the TSS . 
+ The position where the difference between the values for the up - and downstream windows changes from positive to negative was computationally detected as a peak . 
+ When multiple peaks were detected , the most upstream position was deﬁned as the peak position . 
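The paired-window peak estimate can be sketched as below. The text does not state the sign convention of the difference; here it is taken as downstream mean minus upstream mean, which is positive while the signal rises and negative once it falls, matching the "positive to negative" change. The scan range of −200 to +600 bp relative to the TSS, the coordinate convention (positions given relative to the TSS), and the function names are assumptions.

```python
def _mean_in(positions, signals, lo, hi):
    """Mean signal of probes with lo <= position < hi, or None if there are none."""
    values = [s for p, s in zip(positions, signals) if lo <= p < hi]
    return sum(values) / len(values) if values else None


def peak_position(positions, signals, start=-200, stop=600, step=50, width=200):
    """Return the most upstream boundary where the window difference turns non-positive.

    `positions` are probe coordinates relative to the TSS. Returns None
    when no positive-to-negative change is found in the scanned range.
    """
    previous = None
    for x in range(start, stop + 1, step):
        upstream = _mean_in(positions, signals, x - width, x)
        downstream = _mean_in(positions, signals, x, x + width)
        if upstream is None or downstream is None:
            previous = None
            continue
        diff = downstream - upstream
        if previous is not None and previous > 0 and diff <= 0:
            return x  # first (most upstream) sign change
        previous = diff
    return None
```

Because the windows move in 50-bp steps, the returned position is only accurate to about one step relative to the true maximum.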
+ Extraction of genes for calculation of the traveling ratio ( TR ) . 
+ Genes immediately downstream of the 571 σA binding sites were initially extracted . 
+ However , genes located divergently from the single σA binding sites were excluded from further analysis , since it was difﬁcult to establish the precise relationship between σA binding and regulated genes . 
+ Promoters of rRNA and tRNA operons were additionally excluded . 
+ Genes for which binding signals from upstream TUs overlapped in the promoter regions were removed via visual inspection , leading to a ﬁnal selection of 416 genes ( see Table S2 in the supplemental material ) . 
+ Microarray data accession number . 
+ Raw data ( CEL format ) from the ChAP-chip experiments described here have been deposited in the ArrayExpress database under accession number E-MEXP-2649 . 
+ RESULTS
+ ChAP-chip analysis of RNAP distribution on the B. subtilis genome . 
+ σA , β′ , and NusA fused with histidine tags were employed for ChAP-chip analysis . 
+ For this purpose , the coding sequence of the His tag was fused to the 3′ end of each gene at the authentic locus on the B. subtilis genome . 
+ Under the growth conditions used ( LB at 37 °C under aerobic conditions ) , the resultant strains apparently grow normally and the expression levels of His-tagged σA , β′ , and NusA were similar to those of untagged proteins in wild-type cells ( data not shown ) . 
+ Puriﬁcation of protein-DNA complexes with a Ni column , mapping of copuriﬁed DNA fragments using a custom Affymetrix tiling chip , and quantitative analysis and visualization of protein-binding signals were performed as described previously ( 16 ) . 
+ In parallel , transcripts were mapped in wild-type cells grown under similar conditions using the tiling chip . 
+ Two biologically independent analyses were performed for each ChAP-chip experiment , and typical examples of the distribution of protein binding and transcription signals are shown in computationally detected TUs , and signals from neighboring TUs often overlapped . 
+ However , many of the TSSs clearly distinguishable from the neighboring TU are associated with σA binding signals ( Fig. 1 ; see Fig. S1 in the supplemental material ) . 
+ TSSs without a σA binding signal might correspond to promoters transcribed by alternative σ factors . 
+ Detailed analysis of the correlation of σA binding peaks and promoter sequences is under way . 
+ As expected , RNAP ( represented by β′ ) binding signals are observed along the transcribed regions ( Fig. 1 ; see Fig. S1 in the supplemental material ) . 
+ NusA binding signals are also detected in most of the transcribed regions . 
+ Indeed , a scatter plot of the RNAP and NusA signals of each probe in the coding regions demonstrated their high genome-wide correlation ( r = 0.76 ) ( Fig. 2A ) , strongly suggesting that NusA is a general transcription factor included in elongation complexes of RNAP , as reported for E. coli NusA ( 7 , 8 , 21 ) . 
+ In E. coli , although signal intensity is low , a positive correlation of σ70 binding signals with those of RNAP in coding regions has been reported ( 21 ) . 
+ We also observed a similar statistically signiﬁcant correlation ( r = 0.48 ) ( Fig. 2B ) , suggesting interaction between σA and the elongation complex of RNAP and/or increased nonspeciﬁc binding of σA-containing RNAP in transcribed regions , as previously suggested ( 21 ) . 
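The probe-wise correlation coefficients quoted here are presumably Pearson's r, although the text does not name the statistic. Under that assumption, a minimal sketch for two equal-length lists of per-probe signals (the function name is hypothetical):

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / sqrt(sxx * syy)
```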
+ Visualization of general RNAP trafﬁcking proﬁle in B. subtilis . 
+ Notably , in contrast to E. coli RNAP , which often accumulates at promoter-proximal regions in vivo ( 21 , 26 ) , B. subtilis RNAP appeared evenly distributed from the promoter to the coding regions ( Fig. 1 ; see Fig. S1 in the supplemental material ) . 
+ To evaluate the general behavior of RNAP from initiation to elongation steps on the B. subtilis genome , we searched TUs suitable for the analysis of the distribution of σA , β′ , and NusA on individual TUs , as described in Materials and Methods . 
+ As a result , we selected 180 TUs ( high-quality TUs ) ( see Table S1 in the supplemental material ) that are clearly separated from neighboring TUs , are associated with a single TSS , and have enough length ( more than 200 bp ) for the analysis of elongating RNAP . 
+ Then , the relative ratios of protein binding and transcription signal intensities to those at TSS were calculated for probes located from bp 800 to bp 1000 relative to TSS in each of 180 TUs and plotted as a function of distance ( bp ) from TSS ( Fig. 3A to D , red lines ) . 
+ Finally , as the relative positions of the probes in each TU were different , we calculated the average proﬁles of transcriptional and σA , β′ , and NusA binding signal intensities from promoters to coding regions using the locally weighted scatter plot smoothing ( LOWESS ) method ( 5 ) ; these proﬁles were plotted ( green lines in Fig. 3A to D ) and overlaid ( Fig. 3E ) . 
+ Average proﬁles of protein binding signals were evaluated using data from two independent experiments , and the results from one experiment are presented in Fig. 3B , C , and D . 
+ In Fig. 3E , we also included average profiles obtained from other data sets ( see Fig. S2 in the supplemental material ) to demonstrate the reproducibility of the results . 
+ The average σA profile reveals a symmetric distribution centered at TSS , indicating that σA is rapidly released from the RNAP core after promoter clearance ( Fig. 3A ) , similar to what is found for E. coli ( 21 , 26 ) . 
+ A signiﬁcant level of the average signal of the subunit is observed at the TSS , and the signal reaches its maximum level immediately downstream of TSS , with relatively constant binding signals in the downstream transcribed regions ( Fig. 3C ) . 
+ These results conﬁrmed an even distribution of RNAP from the promoter to the coding regions , at least in the high-quality TUs , in B. subtilis . 
+ In contrast , Reppas et al. ( 26 ) reported that , in E. coli , the median ratio of the signal 800 bp downstream of the promoter relative to the peak value at the promoter was 0.43 for 59 high-quality TUs similar to ours , and Mooney and coworkers ( 21 ) classified 29 TUs as having promoter-proximal peaks among 42 high-quality and highly transcribed TUs . 
+ Constant distribution was also observed for NusA ( Fig. 3D ) . 
+ However , the average NusA binding signal appeared to reach a plateau downstream of the position where binding reaches the maximum level , consistent with the exchange of σA with NusA on the RNAP core after promoter clearance . 
+ Resolution of our ChAP-chip data does not exclude the possibility of NusA binding to RNAP before σA release . 
+ However , an in vivo pulldown assay of σA and NusA in B. subtilis cells showed that σA was not copurified with NusA and vice versa , strongly suggesting that NusA is an exclusive component of the elongation complex of RNAP through exchange with σA bound to the initiation complex ( IC ) ( our unpublished data ) . 
+ Additionally , we verified that average profiles obtained by the LOWESS smoothing method indeed reflect the characteristics of the selected TUs by analyzing the data sets with another method . 
+ We estimated positions where σA , , and NusA binding signals reach the maximum level ( peak ) in each of the 180 high-quality TUs as described in Materials and Methods . 
+ The distribution of peak positions of σA , , and NusA in 180 TUs is presented in Fig. 3F . 
+ The median values of the peak position of σA , , and NusA are 0 , 100 , and 150 bp from TSS , respectively , and statistical evaluation of these results revealed that there are significant differences between estimated maximal enrichment points of σA and NusA ( P = 2.186e-14 ) , σA and ( P = 1.423e-08 ) , and and NusA ( P = 0.026 ) , supporting statistically significant differences of average binding profiles of σA , , and NusA . 
+ Characterization of RNAP distribution based on the TR . 
+ For selection of high-quality TUs analyzed in the previous section , we adopted the criterion that contiguous signals of transcription and binding are distributed in sequences of greater than 200 bp . 
+ This speciﬁcation would eliminate short transcripts associated with RNAP paused at the promoter or promoter-proximal sites or transcriptional termination by attenuation signals . 
+ Next , we attempted to characterize the transition of RNAP occupancy from the promoter to the coding region by calculating the relative ratio of RNAP occupancy ( traveling ratio [ TR ] ) ( 26 ) at 571 automatically detected σA binding sites as described in Materials and Methods . 
+ We then selected 416 genes ( see Table S2 in the supplemental material ) located immediately downstream of the σA binding site with a binding signal in the promoter region that did not overlap with those of neighboring TUs . 
+ Genes located divergently from the single σA binding site were excluded , since the precise relationships between σA binding and genes regulated were difficult to establish . 
+ We calculated the average binding signal intensities of probes in the 100-bp region centered at the σA binding site ( IC signal intensity ) and the coding region ( EC signal intensity ) ( Fig. 1 ) . 
+ The TR value for each gene , obtained by dividing EC signal intensity by IC signal intensity ( see Table S2 in the supplemental material ) is presented as a histogram in Fig. 4A . 
+ If RNAP is distributed evenly from a promoter to a coding region , TR values are expected to give a normal distribution , with the average value near 1.0 . 
+ In fact , the distribution of TR values is apparently similar to the normal distribution , with an average of 0.85 ( Fig. 4A ) . 
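The TR calculation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `traveling_ratio` and the toy probe intensities are hypothetical, and the sketch simply divides the mean coding-region (EC) signal by the mean promoter-region (IC) signal, as the text describes.

```python
from statistics import mean


def traveling_ratio(ic_signals, ec_signals):
    """Return TR = mean EC signal / mean IC signal for one gene.

    ic_signals: probe intensities in the 100-bp window at the binding site
    ec_signals: probe intensities in the coding region
    (Both inputs are hypothetical toy data, not values from the paper.)
    """
    ic = mean(ic_signals)
    ec = mean(ec_signals)
    if ic <= 0:
        raise ValueError("IC signal intensity must be positive")
    return ec / ic


# An evenly distributed RNAP profile gives a TR close to 1.0.
even = traveling_ratio([2.0, 2.2, 1.8], [2.1, 1.9, 2.0])
# A promoter-proximal peak gives a TR well below 1.0.
peaked = traveling_ratio([4.0, 4.4, 3.6], [1.0, 1.2, 0.8])
```

A gene with RNAP accumulated near its promoter therefore shows up as a low TR, which is the property used to pick out candidates in the following section.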
+ We estimated the false discovery rate ( FDR ; possibility of existence of genes that do not actually belong to the normal distribution centered at a TR of 1.0 ) . 
+ The q value of the FDR was determined for each gene by a one-sided t test ( 0.05 ) with the null hypothesis of a log2 TR of 0 , followed by the Benjamini-Hochberg ( BH ) method ( 2 ) ( see Table S2 in the supplemental material ) . 
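As a rough illustration of the BH step, the following Python sketch converts a list of p values into BH-adjusted q values. It assumes the per-gene p values have already been obtained from the t test; the function name `benjamini_hochberg` and the example p values are hypothetical and do not come from the paper.

```python
def benjamini_hochberg(p_values):
    """Return BH-adjusted q values in the same order as the input p values."""
    m = len(p_values)
    # Indices of the p values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    q_values = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down, enforcing monotonic q values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        q_values[i] = running_min
    return q_values


# Hypothetical p values for four genes.
q = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
```

Genes whose q value falls below the chosen FDR threshold would be reported as deviating from the null hypothesis.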
+ The results did not suggest the existence of a signiﬁcant number of genes with signiﬁcantly low TR values . 
+ These results again demonstrate that the majority of RNAPs are relatively evenly distributed from the promoter to coding regions in B. subtilis . 
+ However , we visually found a limited number of low-TR genes in which RNAP accumulates at the promoter or promoter-proximal regions , as discussed below . 
+ Characterization of the promoter-proximal peaks of RNAP . 
+ Next , we focused on 40 genes with TR values of 0.5 in two independent experiments ( Table 1 ; see Fig. S3 in the supplemental material ) . 
+ Typical examples of peaks at the promoter or promoter-proximal regions , together with distribution of σA , NusA , and transcripts , are shown in Fig. 4B , b to d . 
+ The peaks overlapped with the NusA signals , except in the cases of 8 genes ( thdF , yerA , yusL , yktD , ylxS , ymcB , yfjO , and mutS ) , indicating that the majority of promoter or promoter-proximal RNAP peaks contain elongating RNAPs . 
+ B. subtilis often uses transcriptional attenuation for regulation of gene expression at the postinitiation step ( 19 ) . 
+ These transcription attenuation sequences are located between the promoters and coding sequences , and transcribed RNA molecules form two distinct structures for either termination before the coding region or read-through toward the coding region , depending on association of protein , tRNA , and metabolites ( riboswitch ) . 
+ In B. subtilis , 59 attenuation sequences have been reported , and the behavior of RNAP at these regions is summarized in Table S3 in the supplemental material . 
+ Twenty-five genes presented in Table S3 in the supplemental material were excluded from this study , since σA binding at the promoter regions was undetected ( 18 genes ; yvrC , ribD , ypaA , cysH , metE , metI , yitJ , yxjG , yxjH , ykoF , ylmB , yueJ , ktrA , hutH , bglS , pyrP , sacB , and sacX ) or the σA peak was shared by divergent genes ( 7 genes ; gcvT , metK , yoaD , yxkD , yybP , sacP , and yqjO ) . 
+ In the remaining 34 genes , 25 genes showed TR values of 0.5 in two independent experiments ( Table 1 , known attenuators ) . 
+ In addition , visual inspection of 6 genes ( pbuE , xpt , purE , mtnK , yczA , and serS ) showed clear accumulation of RNAP at the promoter-proximal regions ( average TR values of duplicate experiments are 0.53 to 0.82 ) ( see Table S3 in the supplemental material ) . 
+ Thus , most genes regulated by attenuation sequences ( 31 of 34 genes , 91 % ) displayed a promoter-proximal peak of RNAP . 
+ The remaining 3 genes ( glmS , pbuG , and yxjA ) displayed a constant distribution of binding and transcription signals , implying that transcriptional attenuation in these cases is not signiﬁcant under our experimental conditions . 
+ It should also be noted that , although σA binding was not assigned to ribD in our algorithm because of the existence of a small open reading frame ( ORF ) , ypuE , upstream of it ( see Table S3 in the supplemental material ) , the TU also showed significant accumulation of RNAP at the promoter-proximal region ( see Fig. S1 in the supplemental material ) . 
+ In vitro experiments indicate that RNAP pauses at the leader regions of trpE ( protein-mediated attenuation ) , glyQS ( tRNA-mediated attenuation ) , and ribD ( riboswitch ) , probably to provide sufficient time for proper folding of the RNA molecule and effective ligand binding ( 9 , 31 , 33 ) . 
+ Since this pause occurs regardless of the binding of ligands to attenuation structures , RNAP accumulation at the leader region should be observed , even without transcription termination . 
+ We detected short leader transcripts , possibly resulting from pausing and/or premature termination of transcription , for 15 genes . 
+ The failure to detect leader transcripts may be attributed to several reasons . 
+ The transcripts may be very unstable , as reported for S-adenosylmethionine ( SAM )-dependent riboswitches ( 28 ) , and/or too short to act as templates for cDNA synthesis during the preparation of hybridization probes for transcript mapping . 
+ It is additionally possible that completed transcripts accumulating in the cytoplasm mask the signals of ongoing transcription . 
+ While the remaining 15 genes listed as `` candidates '' in Table 1 are not known to be regulated by attenuators , we identified possible Rho-independent terminator sequences upstream of the coding regions for 8 genes ( pheS , yyaE , ndhF , ybgE , ypbR , yrhG , yusL , and ylxS ) , which may act as part of an unknown attenuation system . 
+ In fact , short leader transcripts were detected for two of the sequences . 
+ The molecular mechanisms underlying RNAP accumulation at the other 7 promoters are unclear at present . 
+ Notably , 8 genes ( thdF , yerA , yusL , yktD , ylxS , ymcB , yfjO , and mutS ) in Table 1 showed no or weak distribution of NusA binding signals , while signiﬁcant transcription signals were observed for 6 genes but not for yktD and yusL ( Fig. 4B , c ) . 
+ These profiles might indicate a nonuniform population of RNAP trafficking . 
+ In the majority fraction , RNAPs might be trapped at the promoter or promoter-proximal region without forming elongation complexes with factors , including NusA , but might initiate elongation with RNA synthesis in a minor fraction . 
+ Two genes , yktD and yusL , were exceptions , since the promoter-proximal peaks of σA and were not associated with significant NusA binding and transcription signals ( Fig. 4B , d ) . 
+ These genes are candidates for regulation by poised RNAP in B. subtilis . 
+ DISCUSSION
+ We report here that the B. subtilis RNAP is evenly distributed from the promoter region to the coding sequences in the majority of TUs and that the promoter-proximal peaks mainly result from attenuation mechanisms . 
+ Most of the known attenuator sequences analyzed disclose accumulation of RNAP with NusA at the promoter-proximal regions , regardless of leader transcript detection . 
+ Interestingly , the RNAP signals distributed from promoter to attenuation sequences are significant and constant . 
+ This proﬁle is indicative of the high density of the elongation complexes in the leader region . 
+ It was observed that , in contrast to the distribution of the B. subtilis RNAP , that of the E. coli RNAP is generally higher at the promoter region than within the coding sequence ( 21 , 26 ) . 
+ Three mechanisms of RNAP accumulation at the promoter or promoter-proximal regions occur in E. coli , speciﬁcally , trapping at the promoter , promoter-proximal pausing , and attenuation of transcription ( 21 ) . 
+ Reppas et al. ( 26 ) suggest that E. coli RNAP is often trapped or `` poised '' at the promoter region , based on the observation that 23 % of Eσ70s at promoters are not associated with transcripts . 
+ In addition , RNAP trapping at the promoter and pausing at promoter-proximal regions are reported for a number of E. coli genes ( 3 , 11 , 18 ) . 
+ However , Mooney and colleagues ( 21 ) propose that the majority of the promoter-proximal RNAP peaks reﬂect an elongating complex ( transcription attenuation ) rather than an initiating complex ( RNAP poised at the promoter ) , based on the observation that the RNAP peaks overlap with those of NusA . 
+ While the average RNAP peak is located 130 bp downstream of σ70 , the NusA peak is further downstream ( 190 bp from the σ70 peak ) in 29 high-quality TUs analyzed by Mooney et al. ( 21 ) . 
+ These differences may indicate that promoter-proximal peaks of E. coli RNAP contain multiple types of RNAP , including some fraction of RNAP-σ70 trapped at promoters or promoter-proximal sites and RNAP-NusA paused and terminated at further downstream promoter-proximal or attenuated sites . 
+ In contrast , significant accumulation of RNAP was not observed for the majority of genes in B. subtilis , indicating that RNAP recruited to the promoter promptly leaves to form an elongation complex without trapping or pausing at the promoter-proximal site . 
+ This observation is consistent with the different in vitro properties of RNAPs of the two bacteria . 
+ E. coli RNAP forms a stable open complex , which is not observed for B. subtilis RNAP ( 1 , 6 , 23 , 27 , 30 ) . 
+ Additionally , constant distribution of RNAP on TUs might be related to low activity of Rho-dependent termination in B. subtilis . 
+ In E. coli , the Rho factor is an essential abundant protein and responsible for 20 to 50 % of termination of mRNA synthesis . 
+ Additionally , it has been proposed that Rho factor is involved in intragenic transcriptional termination , which occurs when translation efﬁciency of the mRNA is reduced ( 12 , 24 ) . 
+ In contrast , Rho is a weakly expressed dispensable protein in B. subtilis , and only the trp operon and the rho gene are its known targets ( 14 ) . 
+ Furthermore , it has been demonstrated that two additional factors , NusG and NusA , participate in Rho-dependent termination by coupling Rho to the elongation complex ( 4 ) . 
+ Interestingly , NusA is essential in B. subtilis but not in E. coli ( although only in rho mutants with reduced termination efﬁciency ) , and NusG is essential in E. coli but not in B. subtilis ( 4 ) . 
+ These differences might be also related to different distributions of RNAP on TUs in both bacteria . 
+ Different biochemical characteristics of RNAP and its interactions with transcription factors in B. subtilis and E. coli may result in distinct strategies for postinitiation control of transcription . 
+ Speciﬁcally , E. coli often employs promoter trapping or promoter-proximal pausing of RNAP in addition to transcription attenuation in leader sequences , whereas B. subtilis mainly employs transcription attenuation . 
+ Further examination of the molecular basis for promoter-proximal RNAP peaks and their involvement in regulation of gene expression in both bacterial types is necessary to prove our hypothesis . 
+ In addition , studies on the dynamics of RNAP trafﬁcking in other bacteria are essential to elucidate the relationships between the biochemical characteristics of RNAP and evolution of regulatory strategies of gene expression during the postinitiation steps of transcription . 
+ ACKNOWLEDGMENTS
+ We are grateful to Jon Hobman for the critical reading of the manuscript and Hiroki Takahashi and Hiroshi Mori for the critical discussion about the statistical analysis . 
+ This work was supported by a KAKENHI grant-in-aid for scientiﬁc research in the Priority Area `` Systems Genomics '' from the Ministry of Education , Culture , Sports , Science , and Technology of Japan . 
+ REFERENCES
+ 1 . Artsimovitch , I. , V. Svetlov , L. Anthony , R. R. Burgess , and R. Landick . 2000 . RNA polymerases from Bacillus subtilis and Escherichia coli differ in recognition of regulatory signals in vitro . J. Bacteriol. 182:6027 -- 6035 . 
+ 2 . Benjamini , Y. , and Y. Hochberg . 1995 . Controlling the false discovery rate : a practical and powerful approach to multiple testing . J. R. Stat. Soc. Ser. B 57:289 -- 300 . 
+ 3 . Choy , H. E. , R. R. Hanger , T. Aki , M. Mahoney , K. Murakami , A. Ishihama , and S. Adhya . 1997 . Repression and activation of promoter-bound RNA polymerase activity by Gal repressor . J. Mol. Biol. 272:293 -- 300 . 
+ 4 . Ciampi , M. S. 2006 . Rho-dependent terminators and transcription termination . Microbiology 152:2515 -- 2528 . 
+ 5 . Cleveland , W. S. 1979 . Robust locally weighted regression and smoothing scatterplots . J. Am. Stat. Assoc. 74:829 -- 836 . 
+ 6 . Dobinson , K. F. , and G. B. Spiegelman . 1987 . Effect of the delta subunit of Bacillus subtilis RNA polymerase on initiation of RNA synthesis at two bacteriophage phi 29 promoters . Biochemistry 26:8206 -- 8213 . 
+ 7 . Gill , S. C. , S. E. Weitzel , and P. H. von Hippel . 1991 . Escherichia coli sigma 70 and NusA proteins . I. Binding interactions with core RNA polymerase in solution and within the transcription complex . J. Mol. Biol. 220:307 -- 324 . 
+ 8 . Greenblatt , J. , and J. Li . 1981 . Interaction of the sigma factor and the nusA gene protein of E. coli with RNA polymerase in the initiation-termination cycle of transcription . Cell 24:421 -- 428 . 
+ 9 . Grundy , F. J. , and T. M. Henkin . 2004 . Kinetic analysis of tRNA-directed transcription antitermination of the Bacillus subtilis glyQS gene in vitro . J. Bacteriol. 186:5392 -- 5399 . 
+ 10 . Guenther , M. G. , S. S. Levine , L. A. Boyer , R. Jaenisch , and R. A. Young . 2007 . A chromatin landmark and transcription initiation at most promoters in human cells . Cell 130:77 -- 88 . 
+ 11 . Hatoum , A. , and J. Roberts . 2008 . Prevalence of RNA polymerase stalling at Escherichia coli promoters after open complex formation . Mol. Microbiol. 68:17 -- 28 . 
+ 12 . Henkin , T. M. , and C. Yanofsky . 2002 . Regulation by transcription attenuation in bacteria : how RNA provides instructions for transcription termination/antitermination decisions . Bioessays 24:700 -- 707 . 
+ 13 . Huber , W. , J. Toedling , and L. M. Steinmetz . 2006 . Transcript mapping with high-density oligonucleotide tiling arrays . Bioinformatics 22:1963 -- 1970 . 
+ 14 . Ingham , C. J. , J. Dennis , and P. A. Furneaux . 1999 . Autogenous regulation of transcription termination factor Rho and the requirement for Nus factors in Bacillus subtilis . Mol. Microbiol. 31:651 -- 663 . 
+ 15 . Ishikawa , S. , Y. Kawai , K. Hiramatsu , M. Kuwano , and N. Ogasawara . 2006 . A new FtsZ-interacting protein , YlmF , complements the activity of FtsA during progression of cell division in Bacillus subtilis . Mol. Microbiol. 60:1364 -- 1380 . 
+ 16 . Ishikawa , S. , Y. Ogura , M. Yoshimura , H. Okumura , E. Cho , Y. Kawai , K. Kurokawa , T. Oshima , and N. Ogasawara . 2007 . Distribution of stable DnaA-binding sites on the Bacillus subtilis genome detected using a modified ChIP-chip method . DNA Res. 14:155 -- 168 . 
+ 17 . Kunst , F. , N. Ogasawara , I. Moszer , A. M. Albertini , G. Alloni , V. Azevedo , M. G. Bertero , P. Bessieres , A. Bolotin , S. Borchert , R. Borriss , L. Boursier , A. Brans , M. Braun , S. C. Brignell , S. Bron , S. Brouillet , C. V. Bruschi , B. Caldwell , V. Capuano , N. M. Carter , S. K. Choi , J. J. Codani , I. F. Connerton , A. Danchin , et al. 1997 . The complete genome sequence of the gram-positive bacterium Bacillus subtilis . Nature 390:249 -- 256 . 
+ 18 . Laishram , R. S. , and J. Gowrishankar . 2007 . Environmental regulation operating at the promoter clearance step of bacterial transcription . Genes Dev. 21:1258 -- 1272 . 
+ 19 . Mandal , M. , B. Boese , J. E. Barrick , W. C. Winkler , and R. R. Breaker . 2003 . Riboswitches control fundamental biochemical pathways in Bacillus subtilis and other bacteria . Cell 113:577 -- 586 . 
+ 20 . McClure , W. R. 1985 . Mechanism and control of transcription initiation in prokaryotes . Annu. Rev. Biochem. 54:171 -- 204 . 
+ 21 . Mooney , R. A. , S. E. Davis , J. M. Peters , J. L. Rowland , A. Z. Ansari , and R. Landick . 2009 . Regulator trafficking on bacterial transcription units in vivo . Mol. Cell 33:97 -- 108 . 
+ 22 . Muse , G. W. , D. A. Gilchrist , S. Nechaev , R. Shah , J. S. Parker , S. F. Grissom , J. Zeitlinger , and K. Adelman . 2007 . RNA polymerase is poised for activation across the genome . Nat. Genet. 39:1507 -- 1511 . 
+ 23 . Nechaev , S. , M. Chlenov , and K. Severinov . 2000 . Dissection of two hallmarks of the open promoter complex by mutation in an RNA polymerase core subunit . J. Biol. Chem. 275:25516 -- 25522 . 
+ 24 . Peters , J. M. , R. A. Mooney , P. F. Kuan , J. L. Rowland , S. Keles , and R. Landick . 2009 . Rho directs widespread termination of intragenic and stable RNA transcription . Proc. Natl. Acad. Sci. U. S. A. 106:15406 -- 15411 . 
+ 25 . Radonjic , M. , J. C. Andrau , P. Lijnzaad , P. Kemmeren , T. T. Kockelkorn , D. van Leenen , N. L. van Berkum , and F. C. Holstege . 2005 . Genome-wide analyses reveal RNA polymerase II located upstream of genes poised for rapid response upon S. cerevisiae stationary phase exit . Mol. Cell 18:171 -- 183 . 
+ 26 . Reppas , N. B. , J. T. Wade , G. M. Church , and K. Struhl . 2006 . The transition between transcriptional initiation and elongation in E. coli is highly variable and often rate limiting . Mol. Cell 24:747 -- 757 . 
+ 27 . Rojo , F. , B. Nuez , M. Mencia , and M. Salas . 1993 . The main early and late promoters of Bacillus subtilis phage phi 29 form unstable open complexes with sigma A-RNA polymerase that are stabilized by DNA supercoiling . Nucleic Acids Res. 21:935 -- 940 . 
+ 28 . Shahbabian , K. , A. Jamalli , L. Zig , and H. Putzer . 2009 . RNase Y , a novel endoribonuclease , initiates riboswitch turnover in Bacillus subtilis . EMBO J. 28:3523 -- 3533 . 
+ 29 . Wade , J. T. , and K. Struhl . 2008 . The transition from transcriptional initiation to elongation . Curr. Opin. Genet. Dev. 18:130 -- 136 . 
+ 30 . Whipple , F. W. , and A. L. Sonenshein . 1992 . Mechanism of initiation of transcription by Bacillus subtilis RNA polymerase at several promoters . J. Mol. Biol. 223:399 -- 414 . 
+ 31 . Wickiser , J. K. , W. C. Winkler , R. R. Breaker , and D. M. Crothers . 2005 . The speed of RNA transcription and metabolite binding kinetics operate an FMN riboswitch . Mol. Cell 18:49 -- 60 . 
+ 32 . Wu , L. J. , S. Ishikawa , Y. Kawai , T. Oshima , N. Ogasawara , and J. Errington . 2009 . Noc protein binds to specific DNA sequences to coordinate cell division with chromosome segregation . EMBO J. 28:1940 -- 1952 . 
+ 33 . Yakhnin , A. V. , and P. Babitzke . 2002 . NusA-stimulated RNA polymerase pausing and termination participates in the Bacillus subtilis trp operon attenuation mechanism in vitro . Proc. Natl. Acad. Sci. U. S. A. 99:11067 -- 11072 . 
+ 34 . Zeitlinger , J. , A. Stark , M. Kellis , J. W. Hong , S. Nechaev , K. Adelman , M. Levine , and R. A. Young . 2007 . RNA polymerase stalling at developmental control genes in the Drosophila melanogaster embryo . Nat. Genet. 39:1512 -- 1516
\ No newline at end of file
--- a/data/TEXT_FILES/notuseful_txt/21051353.txt 0 → 100644
View file @27818a9
+++ b/data/TEXT_FILES/notuseful_txt/21051353.txt 0 → 100644
View file @27818a9
+ Sensitive and accurate identification of protein–DNA
+ ABSTRACT 
+ Immuno-precipitation of protein -- DNA complexes followed by microarray hybridization is a powerful and cost-effective technology for discovering protein -- DNA binding events at the genome scale . 
+ It is still an unresolved challenge to comprehensively , accurately and sensitively extract binding event information from the produced data . 
+ We have developed a novel strategy composed of an information-preserving signal-smoothing procedure , higher order derivative analysis and application of the principle of maximum entropy to address this challenge . 
+ Importantly , our method does not require any input parameters to be specified by the user . 
+ Using genome-scale binding data of two Escherichia coli global transcription regulators for which a relatively large number of experimentally supported sites are known , we show that 90 % of known sites were resolved to within four probes , or 88 bp . 
+ Over half of the sites were resolved to within two probes , or 38 bp . 
+ Furthermore , we demonstrate that our strategy delivers significant quantitative and qualitative performance gains over available methods . 
+ Such accurate and sensitive binding site resolution has important consequences for accurately reconstructing transcriptional regulatory networks , for motif discovery , for furthering our understanding of local and non-local factors in protein -- DNA interactions and for extending the usefulness horizon of the ChIP-chip platform . 
+ INTRODUCTION
+ Protein -- DNA interactions are fundamental for cellular function . 
+ Comprehensive and accurate knowledge of protein-binding locations on a chromosome is a prerequisite for understanding transcriptional regulation , resolving the role of proteins in structuring the bacterial nucleoid and eukaryotic chromatin and revealing the dynamics of protein binding or translocations . 
+ The biological signiﬁcance of in vivo protein -- DNA interactions has been remarkably enhanced by the advent of the combination of chromatin immuno-precipitation with DNA microarrays ( ChIP-chip ) ( 1 ) . 
+ In this technological framework , the DNA in proximity to binding events is obtained by protein -- DNA complex fragmentation and immunoprecipitation . 
+ Hybridization of this DNA to a tiled DNA microarray produces an enrichment signal at particular locations of the chromosome . 
+ The data from a ChIP-chip experiment is information rich in that it is a report on quasi-digital protein -- DNA binding events , but these binding event signals are shrouded in an analog signal due to the fact that the DNA ﬂanking the actual binding event is also hybridized to the microarray . 
+ Furthermore , probe-level noise inherent in the microarray platform has a signiﬁcant negative impact on the signal-to-noise ratio . 
+ The challenge , then , in ChIP-chip data analysis is to identify all protein -- DNA binding events and to do so with high accuracy . 
+ A number of methods , discussed elsewhere ( 2 ) , have been developed to analyze ChIP-chip data sets . 
+ Many methods only aim to identify the broad regions of enrichment and not the precise location of binding events . 
+ ChIP-chip is a high-throughput technology , and to fully leverage its capabilities requires statistical signiﬁcance calculations to be included with binding event information . 
+ Few methods provide this information . 
+ Furthermore , all available methods require user-speciﬁed parameters -- such as window sizes and cutoff values -- that are difﬁcult for users to optimally set . 
+ As yet , there is no available method that identifies the locations of protein -- DNA binding events with high accuracy , is sensitive to weak signals and to closely spaced binding events , can associate statistical significance values to the identified binding events and learns needed parameters from each individual ChIP-chip data set instead of requiring them as user input . 
+ Higher order derivative analysis has a long history in the analytical chemical sciences ( 3 -- 6 ) , having been applied to a large number of spectroscopic techniques ( 7 ) whose principal commonality is that their output is a curved spectrum comprising a single peak or , more typically , a number of overlapping peaks . 
+ Derivative analysis of zero-order spectra is a powerful technique for identifying weak peak signals from background noise and for resolving essentially hidden peaks in a spectrum that is composed of closely spaced peaks of different magnitudes . 
+ The power of derivative analysis resides in the fact that faint changes in the slope of a signal are revealed as separate , easily identiﬁable peaks in the signal 's higher derivatives . 
+ Herein , we report on the development of a method for applying higher order derivative analysis ( i.e. employing derivatives greater than two ) for the ﬁrst time to ChIP-chip data for the discovery of protein -- DNA binding events . 
+ We evaluate the method by applying it to ChIP-chip data sets of two global regulators in Escherichia coli , for which a large number of experimentally supported binding sites ( ESBSs ) are known , and by comparison to widely used methods . 
+ In so doing , we demonstrate an approach , called DECODE ( binding event discovery using derivatives ) , which accurately and sensitively identiﬁes binding site locations without the need for user-speciﬁed parameter settings and which delivers a signiﬁcant quantitative and qualitative performance gain over available methods . 
+ MATERIALS AND METHODS
+ Defining ESBSs
+ We downloaded protein -- DNA binding event data in the form of a table from RegulonDB and extracted only those entries involving the proteins Lrp and Fis . 
+ We then retained only those interactions whose support for existence included ` Binding of purified protein ' . 
+ Input data preparation
+ The data utilized in this work were from Nimblegen arrays with 50 bp probes that are overlapped by 25 bp . 
+ There are no issues that would preclude the use of other array platforms , provided that the array data is prepared ( as described below ) in a similar manner . 
+ The control channel corresponds to the probe signal intensities when only genomic DNA is hybridized to the array and the experiment channel is the probe-signal intensities when the immuno-precipitated DNA is hybridized to the array . 
+ For an experimental replicate , then , the two channel signals were normalized to have the same sum of signal intensities -- a correction necessary to reflect the fact that the same amount of DNA was used for each channel 's hybridization . 
+ All replicates ' control channel signals were then quantile normalized together , as were the experiment channel signals . 
+ Each replicate 's experiment channel signal was then quantile normalized with its corresponding control channel signal , and a ﬁnal enrichment signal formed from the ratio of the experiment and control channel signals . 
+ The probe ratio values were not logarithmically transformed . 
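The preparation steps for a single replicate can be sketched as follows. This is a simplified illustration, not the authors' implementation: it covers sum normalization, quantile normalization of the experiment channel with its control channel, and ratio formation, and it omits the cross-replicate quantile normalization. The function names and toy intensities are hypothetical, and ties are handled naively.

```python
def scale_to_same_sum(control, experiment):
    """Rescale the experiment channel so both channels have the same total."""
    factor = sum(control) / sum(experiment)
    return [x * factor for x in experiment]


def quantile_normalize(columns):
    """Give every column the same distribution (mean of the sorted columns)."""
    n = len(columns[0])
    sorted_cols = [sorted(col) for col in columns]
    rank_means = [sum(col[k] for col in sorted_cols) / len(columns)
                  for k in range(n)]
    normalized = []
    for col in columns:
        order = sorted(range(n), key=lambda i: col[i])
        out = [0.0] * n
        for rank, i in enumerate(order):
            out[i] = rank_means[rank]  # ties are not averaged in this sketch
        normalized.append(out)
    return normalized


def enrichment(control, experiment):
    """Probe-wise ratio of experiment to control (no log transform)."""
    return [e / c for e, c in zip(experiment, control)]


control = [100.0, 200.0, 300.0]          # hypothetical genomic DNA channel
experiment = [25.0, 200.0, 75.0]         # hypothetical IP channel
scaled = scale_to_same_sum(control, experiment)
norm_control, norm_experiment = quantile_normalize([control, scaled])
ratios = enrichment(norm_control, norm_experiment)
```

With these toy values the second probe ends up with a ratio of 2.0, meaning it ranks higher in the experiment channel than in the control channel.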
+ Cross-replicate equalization
+ We equalized the baseline signal across all replicates by , for each replicate in turn , histogramming the probe ratio values ( in bin sizes of 0.01 ) and identifying the bin value corresponding to the histogram maximum -- or the average value of the background noise distribution . 
+ We then computed a replicate-speciﬁc offset by subtracting the histogram maximum bin value from 1.0 . 
+ A replicate enrichment signal was then baseline corrected by adding the offset to all probe ratio values . 
+ The effect of this procedure was to make the value of 1.0 correspond to the average background noise value in all replicates , making them all directly comparable . 
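A minimal Python sketch of this baseline equalization is shown below. The text does not say which point of the winning bin is used as the "bin value", so taking the bin center is an assumption here; the function names and the toy ratios are likewise hypothetical.

```python
import math
from collections import Counter


def baseline_offset(ratios, bin_size=0.01):
    """Return 1.0 minus the value of the most populated histogram bin."""
    counts = Counter(math.floor(r / bin_size) for r in ratios)
    mode_bin = max(counts, key=counts.get)
    # Assumption: represent the bin by its center.
    mode_value = (mode_bin + 0.5) * bin_size
    return 1.0 - mode_value


def equalize(ratios, bin_size=0.01):
    """Shift all probe ratios so the background mode sits at 1.0."""
    offset = baseline_offset(ratios, bin_size)
    return [r + offset for r in ratios]


# Hypothetical replicate whose background noise clusters near 0.855.
replicate = [0.853, 0.855, 0.857, 1.204, 2.5, 0.611]
offset = baseline_offset(replicate)
shifted = equalize(replicate)
```

After the shift, the probe that sat in the middle of the background bin has a value of about 1.0, which is what makes replicates directly comparable.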
+ Potential binding region identiﬁcation
+ Potential binding site regions were identiﬁed as those contiguous regions at least 400 bp in length wherein all probes had an enrichment value greater than 1.0 . 
+ This cutoff value corresponds to a region size that is much smaller than the size of any positive signal that would be due to immuno-precipitated DNA in any ChIP-chip experiment , and so represents a very liberal criterion for identifying regions that might contain a binding event . 
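The region criterion can be illustrated with a short Python sketch. It assumes probes are sorted by start coordinate and tile the sequence without gaps (50-bp probes starting every 25 bp, as on the arrays described above); the function name `find_candidate_regions` and the toy data are hypothetical.

```python
def find_candidate_regions(starts, values, probe_len=50, min_len=400, cutoff=1.0):
    """Return (start, end) spans where every probe exceeds the cutoff
    and the span covers at least min_len bp."""
    regions = []
    run_start = None
    run_end = None
    for pos, value in zip(starts, values):
        if value > cutoff:
            if run_start is None:
                run_start = pos          # open a new run
            run_end = pos + probe_len    # extend the run to this probe's end
        else:
            if run_start is not None and run_end - run_start >= min_len:
                regions.append((run_start, run_end))
            run_start = None             # close the run
    if run_start is not None and run_end - run_start >= min_len:
        regions.append((run_start, run_end))
    return regions


starts = list(range(0, 1000, 25))        # 40 overlapping probes
values = [0.9] * len(starts)
for i in range(4, 19):                   # long enriched run: 100..500 bp
    values[i] = 1.5
for i in range(25, 31):                  # short enriched run: 625..800 bp
    values[i] = 1.5
regions = find_candidate_regions(starts, values)
```

Only the long run passes the length filter; the 175-bp run is discarded even though all of its probes exceed 1.0.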
+ Enrichment signal smoothing
+ The enrichment signal in potential binding site regions was smoothed in a two-step manner . 
+ In step one , we removed spikes in the raw signal using an approach based on Poincaré maps ( 8,9 ) . 
+ As suggested ( 9 ) , we utilized Chauvenet 's criterion and the median of the absolute deviations from the sample median to calculate the Universal Threshold . 
+ Because ChIP-chip signals vary over a greater range than the type of signal for which the Poincaré map procedure was originally intended , we modiﬁed the procedure to identify spikes in the enrichment signal , s , of a potential binding site region using a surrogate signal , s * . 
+ For each probe i in s , we computed the value for probe i in s * as $s^*_i = \mathrm{abs}(s_i - m_i) / m_i$ . 
+ By normalizing each probe value in this way , we effectively removed the magnitude of the underlying signal while retaining the spike behavior -- rendering all probes directly comparable . 
+ We then applied the Poincaré map procedure to s * and , additionally , computed a weight , wi , for each probe i : 
+ A probe i was considered to be a spike if it was outside of the ellipse of the Poincaré map procedure . 
+ We used the weights , wi , to replace the signal value for each probe considered to be a spike using the weighted average of it as well as its two neighboring probes : $s'_i = (w_{i-1} s_{i-1} + w_i s_i + w_{i+1} s_{i+1}) / (w_{i-1} + w_i + w_{i+1})$ . 
+ Finally , we computed the percent change in the sum of the values of the signals s and s' . 
+ By substituting s' for s , the entire spike-removal procedure could be iterated , which we did until the percent change converged . 
+ In practice , convergence corresponded to a percent change of 0.1 % . 
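The replacement and convergence check in the spike-removal step can be sketched as below. The spike flags and the weights are taken as inputs, because the Poincaré map test and the weight formula are not recoverable from this text. Leaving the first and last probes unchanged is also an assumption.

```python
def replace_spikes(signal, weights, is_spike):
    """Replace each flagged interior probe by the weighted average of itself
    and its two neighbors, using values from the input signal."""
    smoothed = list(signal)
    for i in range(1, len(signal) - 1):
        if is_spike[i]:
            window = (i - 1, i, i + 1)
            numerator = sum(weights[j] * signal[j] for j in window)
            denominator = sum(weights[j] for j in window)
            smoothed[i] = numerator / denominator
    return smoothed


def percent_change(old, new):
    """Percent change in the summed signal between two iterations."""
    return 100.0 * abs(sum(new) - sum(old)) / sum(old)
```

An outer loop would recompute the flags and weights from the new signal and stop once `percent_change` falls to the convergence level reported in the text.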
+ In the second step of the smoothing procedure , the signal was smoothed using the Savitzky -- Golay ﬁlter ( 10 ) with a symmetric smoothing window whose half-width was optimally computed using a modiﬁcation of the Durbin-Watson criterion ( 11 ) . 
+ The output of the smoothing procedure was a smoothed enrichment signal S. 
+ Potential binding site identiﬁcation
+ The ﬁrst three derivatives of S were calculated using the differential quotients derivative method -- which simply computes the derivative at a point as the average of the slopes between it and its two adjacent neighboring points . 
+ Negative second derivative regions greater than ﬁve probes in length were then identiﬁed , and all positive-to-negative zero crossings of the third derivative within these regions were identiﬁed as local maxima positions . 
+ The local maxima so identiﬁed were the locations of apices of peaks that could be due to either bona ﬁde binding events or to noise . 
+ We deﬁned the set of all such apex locations as L. 
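The building blocks of the derivative analysis can be sketched as follows. This is an illustration only: the handling of the two end probes (a single one-sided slope) is an assumption, and the helper names are hypothetical. Higher derivatives are obtained by applying `diff_quotient` repeatedly.

```python
def diff_quotient(x, y):
    """Derivative at each point as the average of the slopes to its two neighbors.

    End points have only one neighbor, so the single available slope is used.
    """
    derivative = []
    last = len(y) - 1
    for i in range(len(y)):
        slopes = []
        if i > 0:
            slopes.append((y[i] - y[i - 1]) / (x[i] - x[i - 1]))
        if i < last:
            slopes.append((y[i + 1] - y[i]) / (x[i + 1] - x[i]))
        derivative.append(sum(slopes) / len(slopes))
    return derivative


def negative_runs(values, min_probes=6):
    """Inclusive index ranges where `values` stays negative for at least
    `min_probes` probes (6 corresponds to 'greater than five')."""
    runs, start = [], None
    for i, v in enumerate(values):
        if v < 0 and start is None:
            start = i
        elif v >= 0 and start is not None:
            if i - start >= min_probes:
                runs.append((start, i - 1))
            start = None
    if start is not None and len(values) - start >= min_probes:
        runs.append((start, len(values) - 1))
    return runs


def pos_to_neg_crossings(values, lo, hi):
    """Indices in (lo, hi] where `values` goes from positive to non-positive."""
    return [i + 1 for i in range(lo, hi) if values[i] > 0 and values[i + 1] <= 0]
```

Following the text, `negative_runs` would be applied to the second derivative and `pos_to_neg_crossings` to the third derivative inside each returned run.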
+ Peak estimation
+ Peak estimation is the process of simultaneously estimating the shape and size of the peaks at all of the peak apex positions in L . 
+ We estimated these peaks using two objective functions concurrently . 
+ First , we required that the estimated peaks , when summed to form a reconstructed signal R , minimized the difference , D , between R and S . 
+ That is , we sought to estimate peaks that , when summed , reconstructed the original enrichment signal as closely as possible . 
+ We computed D as $D = \sqrt{\sum_p (S_p - R_p)^2}$ , where p is the index over all probes in S . 
+ Second , we required that the estimated peaks maximize the entropy , E , over all of the probes in S . ( The deﬁnition of E and how D and E were balanced are detailed below . ) 
+ This second requirement is known as the principle of maximum entropy ( 12 ) , and it states that the only justiﬁable ( frequency ) distribution that can be constructed from incomplete information is the one that has maximum uncertainty , subject to any constraints . 
+ As constraints , we required estimated peaks to be unimodal and symmetric . 
+ It is necessary to explain how a binding region signal was reconstructed before describing how the entropy of R was calculated . 
+ The binding region probe values are real numbers . 
+ We converted the probe values to integers by multiplying them by a large integer ( 100 ) and then rounding to integers . 
+ In so doing , reconstruction of the binding region signal could be accomplished by incrementally adding or subtracting ` counts ' to probes in an ongoing signal reconstruction . 
+ Since adding or subtracting counts to a probe was done in the context of estimating some peak , the counts had an explicitly attached peak label l from the set L. Thus the counts for a probe had a frequency distribution fl over the different peak labels . 
+ The Shannon entropy for a probe p could then be computed as $E_p = -\sum_{l \in L} f_l \log f_l$ . 
+ The total entropy for all probes in a region was then computed as $E = \sum_p E_p$ . 
+ In regards to how the two mathematical objective functions governing the peak-estimation process were utilized simultaneously , counts were only added for a probe if either D decreased , or D remained unchanged and E increased . 
+ The two constraints were enforced in regards to how counts could be added to estimate a peak . 
+ The ﬁrst constraint was that probe values on either side of the peak maximum had to be decreasing with increasing distance from the peak maximum ( to ensure a unimodal peak ) . 
+ The second constraint was that the count values for the symmetric probes about the probe position of the peak maximum had to be the same . 
+ In a ﬁnal step , the probe values were re-scaled by division with the same large integer used above in order to transform the probe intensity values back to real numbers . 
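The count bookkeeping in the peak-estimation step can be illustrated with a few helpers. The entropy here is the standard Shannon entropy of the label frequencies at one probe. The natural logarithm and the summation over probes to obtain the total are assumptions, since the formulas did not survive in this text, and all function names are hypothetical.

```python
import math


def to_counts(values, scale=100):
    """Convert real-valued probe signals to integer counts."""
    return [round(v * scale) for v in values]


def probe_entropy(label_counts):
    """Shannon entropy of the peak-label frequency distribution at one probe.

    `label_counts` maps a peak label to the number of counts carrying that label.
    """
    total = sum(label_counts.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log(c / total)
                for c in label_counts.values() if c > 0)


def total_entropy(per_probe_label_counts):
    """Total entropy of a region, taken here as the sum over its probes (assumed)."""
    return sum(probe_entropy(counts) for counts in per_probe_label_counts)


def accept_update(d_old, d_new, e_old, e_new):
    """Acceptance rule from the text: D decreases, or D is unchanged and E increases."""
    return d_new < d_old or (d_new == d_old and e_new > e_old)
```

A probe whose counts all carry one peak label has zero entropy, while a probe shared evenly between two peaks has entropy log 2, so maximizing E favors spreading a probe's signal across overlapping peaks when the fit D does not distinguish the alternatives.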
+ Identifying peaks due to noise
+ We ﬁrst identiﬁed the complementary regions to the potential binding site regions by identifying those regions that were at least 400 bp in length wherein all probe values were less than 1.0 . 
+ ( That is , we identiﬁed regions whose signal was below the average background noise level and so could conﬁdently be assumed to not contain binding events . ) 
+ We then inverted these regions about the probe value of 1.0 to create fake enrichment signals and proceeded with our algorithm to identify peak apex positions and perform peak estimation . 
+ We termed the identiﬁed peaks ` noise peaks ' , as they could assuredly be assumed to not be due to bona ﬁde binding events . 
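Inverting a below-baseline region about 1.0 amounts to mirroring each probe value across the baseline, as in this small sketch (the function name is illustrative):

```python
def reflect_about_baseline(values, baseline=1.0):
    """Mirror probe values across the baseline: a value d below it becomes d above it."""
    return [2.0 * baseline - v for v in values]
```

A probe at 0.8, for example, maps to 1.2, so dips in the background become peaks of the same size that can be passed through the same peak-identification steps.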
+ Peak signiﬁcance values
+ From each ChIP-chip replicate , we ﬁt a gamma distribution to the distribution of the noise peak heights . 
+ The parameters of the gamma distribution were then used to calculate the P-value of peaks identiﬁed in the enrichment signal . 
+ Once all peaks had been identiﬁed in all potential binding event regions in a ChIP-chip replicate and their associated P-values calculated , we applied the local false discovery rate ( FDR ) ( 13 ) to distinguish binding event peaks from noise peaks . 
+ We used a local FDR value of 0.01 . 
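A sketch of the significance calculation, under stated assumptions: the text does not say how the distribution was fit, so a method-of-moments fit is used here, and the upper-tail probability is computed from the standard series expansion of the regularized lower incomplete gamma function. The local FDR step is not shown.

```python
import math
import statistics


def fit_gamma_moments(heights):
    """Method-of-moments estimate of (shape, scale); the fitting method is an assumption."""
    mean = statistics.mean(heights)
    var = statistics.pvariance(heights)
    return mean * mean / var, var / mean


def gamma_sf(x, shape, scale):
    """P(X >= x) for a gamma distribution, via the series for the lower incomplete gamma.

    Very small tail probabilities lose precision because the result is 1 - CDF.
    """
    if x <= 0:
        return 1.0
    z = x / scale
    term = 1.0 / shape
    total = term
    n = 1
    while abs(term) > 1e-15 * abs(total):
        term *= z / (shape + n)
        total += term
        n += 1
    log_prefix = shape * math.log(z) - z - math.lgamma(shape)
    cdf = math.exp(log_prefix) * total
    return max(0.0, 1.0 - cdf)


def peak_p_value(height, noise_heights):
    """P-value of a peak height under the gamma null fitted to noise peak heights."""
    shape, scale = fit_gamma_moments(noise_heights)
    return gamma_sf(height, shape, scale)
```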
+ Comparison to other methods
+ We utilized the following algorithms for comparative evaluation : Mpeak ( 14 ) , Nimblegen 's windowed threshold-detection algorithm that is a component of their SignalMap software ( 15 ) , MA2C ( 16 ) , Chipotle ( 17 ) and TAMALPAIS ( 18 ) . 
+ All algorithms were run with their default parameters . 
+ For TAMALPAIS , we used T02P05 predictions . 
+ For the algorithms that only predicted binding event regions , we used the center of the regions as the location of their binding event predictions . 
+ RESULTS
+ We present as results the major aspects of our algorithm and an evaluation of its ability to identify ESBS for the global regulators Fis and Lrp in E. coli . 
+ Algorithm
+ ChIP-chip experiments are usually performed using multiple replicates , and it is common to average these replicates to produce one enrichment signal that is then analyzed for binding event information . 
+ We ﬁnd that different replicates can reﬂect non-trivial differences in molecular binding activity and that averaging can abolish strong enrichment signals or indicate binding event locations that are not supported by any individual replicate . 
+ Because our method is designed for high-resolution binding site identiﬁcation , we did not average replicates but instead analyzed each on its own ( 18 ) . 
+ Replicates , though , still need to be directly comparable . 
+ So , after normalizing replicates , ﬁrst individually and then altogether , we computed and applied a baseline correction in the form of an offset for each replicate such that an enrichment signal of 1.0 corresponded to the average value of the background noise distribution ( i.e. mean value of the non-enriched probes ) ( 19 ) . 
+ The most prominent characteristic of ChIP-chip data is the non-smooth variation of signal between adjacent probes ( Figure 1A ) . 
+ Derivative analysis is at the heart of our method , but it can not be applied directly to unsmoothed data because it would indicate a derivative change between essentially all adjacent probes and would in effect be useless . 
+ The challenge , then , in applying derivative analysis is to reveal the underlying smoothly varying enrichment signal while simultaneously minimizing the loss of binding event information contained in the raw ChIP-chip signal . 
+ To address this problem , we developed a procedure involving Poincaré maps and the Savitzky -- Golay smoothing ﬁlter that transforms a raw enrichment signal into one that varies smoothly over its entire domain while retaining subtle features ( Figure 1B ) . 
+ Derivative analysis identiﬁes the apex positions of the constituent peaks underlying , and together composing , a ChIP-chip signal spanning a contiguous stretch of a chromosome . 
+ We utilized the second and third derivatives to precisely locate local maxima in each replicate 's smoothed enrichment signal ( Figure 1C and D ) . 
+ As a result of this strategy , local maxima positions corresponded to the apex positions of underlying peaks that were due to both bona ﬁde protein-binding events and to noise . 
+ To discern between those maxima corresponding to noise and those corresponding to binding events , it was necessary to estimate the shape and size of the associated underlying peaks . 
+ We applied the principle of maximum entropy to resolve the shape and size of the underlying peaks , subject to the constraints that the peaks be unimodal and symmetric . 
+ The result of this step is the resolution of a smoothed enrichment signal into its underlying peaks ( Figure 1E ) . 
+ In order to quantify the probability that a peak is due to noise and not immuno-precipitated DNA , it was necessary to identify a large number of peaks that are with certainty due to noise from which noise peak statistics could be computed . 
+ Noise in this context is the background variation in signal that occurs among probes to which no immuno-precipitated DNA is complementarily bound . 
+ This noise is symmetrically distributed about the average non-enriched probe value ( 19 ) , and because of how we baseline corrected each data replicate , this noise is symmetrically distributed about the enrichment signal value of 1.0 . 
+ While probe values greater than 1.0 may be due in part to immuno-precipitated DNA , we can be certain that probe values less than 1.0 are due only to noise . 
+ We located regions wherein all probe values were less than 1.0 and reﬂected these about the value 1.0 . 
+ This reﬂection effectively created false ` enrichment signals ' . 
+ We then applied the peak-identiﬁcation algorithm to these false enrichment signals and , with the identiﬁed peaks , computed null distribution statistics . 
+ This procedure is depicted in Figure 2 . 
+ From these statistics we computed a P-value for every identiﬁed peak in a ChIP-chip replicate and used the local FDR to identify the peaks corresponding to bona ﬁde binding events . 
+ Performance evaluation of the algorithm
+ We have previously performed ChIP-chip analysis for the global regulators Fis ( 20 ) and Lrp ( 21 ) in E. coli . 
+ We used the large number of ESBS for these two DNA-binding proteins that are contained in the EcoCyc ( 22 ) and RegulonDB ( 23 ) databases to assess the sensitivity and accuracy of our method and its performance relative to other available methods . 
+ We discriminated protein-binding events from noise using the local FDR values , which as can be seen in Figure 3A are strongly distributed toward the extreme values that the local FDR can assume ( i.e. 0.0 and 1.0 ) . 
+ These plots are for a representative single replicate , but are qualitatively very similar to the results for all replicates . 
+ The very clear split between peaks identiﬁed as being due to noise and to immuno-precipitated DNA means that the composition of the set of binding events is not very sensitive to the exact value of the local FDR cutoff value . 
+ This clear distinction implies that noise peaks have very different characteristics than bona ﬁde binding event peaks . 
+ It also indicates that leveraging the symmetric nature of the background variation in non-enriched probe signals was an effective way to quantitatively discover these differentiating characteristics . 
+ The plots also indicate that peaks due to noise are much more numerous than predicted binding event peaks , underscoring the noisy nature of ChIP-chip data and the need for appropriate measures of signiﬁcance . 
+ For each ESBS data set ( i.e. Fis or Lrp ) , we evaluated the accuracy of our method by tabulating the number of ESBS that were identiﬁed as a function of distance from the closest predicted binding site . 
+ Since we had multiple replicates , we deﬁned the distance to the closest predicted binding site as the median distance between one of the ESBS and the closest predicted binding site in each of the replicates . 
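The distance measure defined above can be written compactly. In this sketch, sites and predictions are genomic coordinates in bp, and the function names are illustrative.

```python
from statistics import median


def closest_distance(site, predictions):
    """Distance in bp from a known site to the nearest predicted binding site."""
    return min(abs(site - p) for p in predictions)


def median_closest_distance(site, predictions_per_replicate):
    """Median, over replicates, of the distance to each replicate's closest prediction."""
    return median(closest_distance(site, preds) for preds in predictions_per_replicate)
```

For a site at position 1000 and three replicates whose closest predictions lie 100, 10 and 40 bp away, the reported distance is 40 bp, so a single replicate with a distant or missing nearby prediction does not dominate the result.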
+ The results for the Fis and Lrp data sets are shown by the light-colored bars in Figure 3B . 
+ The results show that , while a majority of the ESBS is predicted within a few probes , there were a large number of sites whose closest predicted binding event was relatively distant . 
+ We found that for these cases there was no evidence in our ChIP-chip data for the existence of these ESBS . 
+ A typical example is shown in Figure 4A , where the arrows indicate the location of four Fis ESBS in the rnpB promoter . 
+ ( For the remainder of this study , we did not utilize these cases in any further analysis involving DECODE or other algorithms used for performance comparison . ) 
+ We re-performed our accuracy assessment without cases like these , and display the results depicted by the black bars in Figure 3B . 
+ This reassessment signiﬁcantly improves the accuracy for both Fis and Lrp data , especially the latter . 
+ Based on the results from these two data sets , we calculated that our method accurately identiﬁes 90 % of the ESBS within 88 bp . 
+ The majority of predicted sites were within 38 bp of the ESBS . 
+ In terms of the number of probes on our tile array platform , these numbers correspond to four and two probes , respectively . 
+ Sensitivity was a key performance goal in the development of our method , where sensitivity refers to the ability to identify weak and closely spaced binding events . 
+ Histograms of the median probe-enrichment values ( across all replicates ) on which the ESBS were located are shown in Figure 3D . 
+ The Fis histogram shows that 24 % of the ESBS were on probes whose signal values were less than twice the average background noise signal . 
+ That a signiﬁcant number of the ESBS with such weak signal were identiﬁed attests to our method 's ability to work close to the background noise level . 
+ To demonstrate the ability to resolve closely spaced binding events , we show two examples . 
+ Note that , in each example , for clarity we show only the closest predicted binding events to the ESBS . 
+ The ESBS for Fis in the osmE promoter are shown in Figure 4B . 
+ While the larger peak is inaccurate by four probes from predicting the leftmost binding site , the smaller peak exactly identiﬁes the probe of the two rightmost binding events . 
+ Nonetheless , these are difﬁcult binding events to resolve since they occur on the shoulder of a larger enrichment signal . 
+ The example in Figure 4C is the Lrp enrichment peak at the invertible ﬁm switch ( 24 ) . 
+ The left arrow identiﬁes the Lrp binding site that is cataloged in RegulonDB and EcoCyc , and as can be seen the underlying predicted peak exactly locates this non-obvious binding site . 
+ The ﬁm switch is a 314 bp , invertible DNA element . 
+ The invertible nature of this stretch of DNA means that it has two orientations in a population of cells , and so the Lrp binding site will actually be located in two chromosomal locations in a population . 
+ The inverted position of the Lrp binding site is marked by the starred arrow , which is also exactly located by a predicted peak . 
+ To assess the performance of DECODE relative to other methods that are available , we performed the same analysis as above with widely used available methods . 
+ This comparative analysis , as shown in Figure 5 , reveals that DECODE provides a marked improvement over all of the available methods in both accuracy and comprehensiveness . 
+ Furthermore , the performance of DECODE does not vary depending on the binding characteristics of the protein of interest -- in distinction to all of the other methods . 
+ This performance difference is due to the fact that Fis binding signals are rarely isolated , unimodal and symmetric peaks [ because of the Fis protein 's propensity to oligomerize along the DNA into extended binding regions ( 25 ) ] and that the other methods do not handle complicated enrichment signals as well as they do unimodal enrichment signals . 
+ The latter result highlights a strength of the derivative-based approach employed by DECODE . 
+ DISCUSSION
+ We have developed a method to address the difﬁcult challenge of extracting all of the information about protein -- DNA interactions from a ChIP-chip data set . 
+ The difﬁculties in this challenge are due to the chemistry-based background hybridization noise inherent in the tiled array platform , to the incompletely fragmented DNA that ﬂanks protein -- DNA binding events that is also immunoprecipitated , and to the ambiguity of deﬁning a binding event location due to the sometimes complicated biological interaction of proteins and DNA . 
+ These sources of difﬁculty together confuse the apex positions of enrichment peaks of isolated binding events , obscure weak binding event enrichment signals and mask closely spaced binding events . 
+ Our method contains a number of novel components . 
+ Principal among these is the use of higher order derivative analysis for identifying local maxima -- which correspond to underlying peaks -- in a ChIP-chip signal . 
+ Second , we developed an information-preserving smoothing procedure that allowed us to apply derivative analysis to ChIP-chip signals . 
+ Third , we applied the principle of maximum entropy for discovering the underlying peaks , as opposed to deconvolution using a functional form as in most other approaches . 
+ Fourth , we leveraged the symmetric nature of the background noise to learn noise peak characteristics , allowing us to quantify the signiﬁcance of the underlying peaks and to discriminate peaks due to binding events from noise peaks while controlling for false discovery rates . 
+ This combination of novel components results in high accuracy and sensitivity for a number of reasons . 
+ Our strategy to resolve a signal into peaks -- without identifying whether the peaks are due to noise or to enrichment from immuno-precipitated DNA -- has two important consequences . 
+ First , it allows us to use liberal thresholds in delineating regions that ` might ' contain binding events , and so with high certainty we do not miss any weak bona ﬁde binding events . 
+ Second , it allows us to use signiﬁcance testing to discriminate noise peaks from enrichment peaks . 
+ Furthermore , since our method is applied on a per-replicate basis , peak identiﬁcation is based on the learned noise statistics of each individual experimental replicate . 
+ Consequently , parameters are optimally learned and set and are not required as input from the user . 
+ Since multiple replicates are unnecessary , our method is appropriate to use for exploratory ChIP-chip experiments . 
+ We evaluated our method using ChIP-chip data sets of two DNA-binding proteins for which a relatively large number of ESBS are known ( Figure 4 ) . 
+ We demonstrated accuracy by showing that 90 % of the sites could be identiﬁed within four probes , and the majority could be identiﬁed within two probes . 
+ We demonstrated sensitivity by showing that 24 % of the identiﬁed Fis ESBS were located on probes whose enrichment signal was < 2-fold the background noise signal . 
+ We found that all of the ESBS that we did not closely predict did not have associated ChIP-chip signal enrichment to support the claim of their existence . 
+ These results demonstrate that our method was able to identify the local regions that had even very weak signals . 
+ Furthermore , they call into question a number of ` experimentally validated ' binding sites that are cataloged in the literature -- although the discrepancies could be due to different experimental conditions . 
+ We also evaluated our method through a performance comparison involving widely used available methods ( Figure 5 ) . 
+ We found that DECODE is distinguished from the other methods both in its ability to accurately identify binding events and to comprehensively identify all of them . 
+ These accuracy and comprehensiveness characteristics were very similar for both of the qualitatively different ChIP-chip data sets utilized -- stability not observed in the other methods . 
+ An important characteristic of a binding event discovery algorithm is that its performance does not vary with the protein under study , for such performance variance increases the uncertainty associated with all results that such an algorithm produces . 
+ The ability to resolve binding event locations with high resolution and with associated statistical signiﬁcance is important for many reasons . 
+ First , in genomic regions with a high density of genes or other sequence features , accurate localization helps disambiguate to which features the binding events are functionally related . 
+ Such accuracy is critical for accurate transcriptional regulatory network reconstruction , for instance . 
+ Second , it can dramatically improve the signal-to-noise ratio for motif discovery by identifying regions that are most likely to contain functional motifs . 
+ Third , it facilitates studies aimed at discovering principles of promoter architecture . 
+ Fourth , the statistical signiﬁcance that DECODE associates with each predicted binding event ( i.e. P-values ) gives users the ability to integrate binding event predictions with other high-throughput data types ( 26 ) . 
+ And ﬁfth , the ability to localize binding events with improved accuracy and sensitivity will extend the usefulness horizon of the ChIP-chip platform , especially given that bacterial genomes can now be completely tiled with 1 -- 5 bp resolution and that for bacterial genomes ChIP-chip still has cost and usability advantages over ChIP-seq . 
+ A user of the DECODE software , which is freely available upon request , must bear in mind some caveats when interpreting the output . 
+ Our goal was to develop a method that could discover where binding events were occurring with a minimum number of errors , and localize the binding events more accurately than is possible with other methods . 
+ Realistically , though , one must recognize that deﬁning a binding location entails ambiguity that will be a function of the size and overlap of the chip probes and of the nature of the interaction between the particular protein of interest and the DNA . 
+ So while some binding events can be expected to occur on a predicted probe , it would be most appropriate to work with a narrow region around a predicted binding event location . 
+ The application regime of DECODE encompasses both the resolution afforded by the chip tiling density and the range of genomes to which it can be applied . 
+ Given the high resolution and low cost of chip technology today , we did not design the algorithm for low-resolution arrays . 
+ Nonetheless , we demonstrated that our method is sensitive to weak enrichment signals , and so would be advantageous for discerning weak signals in low-resolution arrays . 
+ For low-resolution arrays whereon probes are widely spaced along the chromosome , the high-resolution advantages of our method are likely to be muted . 
+ Our method is not limited to bacterial genomes and would be appropriate for eukaryotic genomes since there are no genome-speciﬁc parameters in the software . 
+ The only issue that will arise in applying the method to eukaryotic genomes is the increased running time . 
+ Our method is ` embarrassingly parallel ' , though , so it could easily be run simultaneously on different portions of a eukaryotic ChIP-chip data set . 
+ CONCLUSIONS
+ In this work we have applied higher order derivative analysis to ChIP-chip data for the ﬁrst time , and in so doing have extended the application regime of a powerful analytical technique . 
+ We limited our method to utilize only the third derivative , which may likely be the useful derivative limit given the signal-to-noise ratio of ChIP-chip data . 
+ Higher derivatives can be used for additional information gain , such as for resolving closely spaced binding events . 
+ Resolving more , and more closely spaced , binding events requires that the enrichment signal actually contain such discernable information , such as could be provided by chips with highly overlapping probes or by ChIP-seq ( 27,28 ) . 
+ ChIP-seq data is of a fundamentally different nature than ChIP-chip data , so application of our method to ChIP-seq would require changes to the raw data-processing aspect of our algorithm . 
+ Otherwise , the core elements of our method can be adapted to ChIP-seq data , and so could offer a consistent framework for maximizing the information gain from contemporary protein -- DNA binding assay technologies . 
+ FUNDING
+ National Institutes of Health Grant GM062791 ; the Ofﬁce of Science-Biological and Environmental Research , U.S. Department of Energy , cooperative agreement DE-FC02-02ER63446 ; National Health Research Institutes-Taiwan Grant UCSD2008-3183R ; National Institutes of Health Grant DE-AC05-76RL01830 . 
+ Funding for open access charge : National Institutes of Health Grant GM062791 .
\ No newline at end of file
--- a/data/TEXT_FILES/notuseful_txt/21124945.txt 0 → 100644
View file @27818a9
+++ b/data/TEXT_FILES/notuseful_txt/21124945.txt 0 → 100644
View file @27818a9
+ Using Sequence-Specific Chemical and Structural
+ Abstract 
+ An important step in understanding gene regulation is to identify the DNA binding sites recognized by each transcription factor ( TF ) . 
+ Conventional approaches to prediction of TF binding sites involve the definition of consensus sequences or position-specific weight matrices and rely on statistical analysis of DNA sequences of known binding sites . 
+ Here , we present a method called SiteSleuth in which DNA structure prediction , computational chemistry , and machine learning are applied to develop models for TF binding sites . 
+ In this approach , binary classifiers are trained to discriminate between true and false binding sites based on the sequence-specific chemical and structural features of DNA . 
+ These features are determined via molecular dynamics calculations in which we consider each base in different local neighborhoods . 
+ For each of 54 TFs in Escherichia coli , for which at least five DNA binding sites are documented in RegulonDB , the TF binding sites and portions of the non-coding genome sequence are mapped to feature vectors and used in training . 
+ According to cross-validation analysis and a comparison of computational predictions against ChIP-chip data available for the TF Fis , SiteSleuth outperforms three conventional approaches : Match , MATRIX SEARCH , and the method of Berg and von Hippel . 
+ SiteSleuth also outperforms QPMEME , a method similar to SiteSleuth in that it involves a learning algorithm . 
+ The main advantage of SiteSleuth is a lower false positive rate . 
+ Introduction
+ An important step in characterizing the genetic regulatory network of a cell is to identify the DNA binding sites recognized by each transcription factor ( TF ) protein encoded in the genome . 
+ A TF typically activates and/or represses genes by associating with specific DNA sequences . 
+ Although other factors , such as metabolite binding partners and protein-protein interactions ( for example , between a TF and RNA polymerase or a second TF ) , can affect gene expression [ 1 ] , it is important to identify the sequences directly recognized by TFs to the best of our ability to understand which genes are controlled by which TFs . 
+ A better understanding of gene regulation , which plays a central role in cellular responses to environmental changes , is a key to manipulating cellular behavior for a variety of useful purposes , as in metabolic engineering applications [ 2 ] . 
+ A number of computational methods have been developed for predicting TF binding sites given a set of known binding sites [ 3 -- 10 ] . 
+ Commonly used methods involve the definition of a consensus sequence or the construction of a position-specific weight matrix ( PWM ) , where DNA binding sites are represented as letter sequences from the alphabet { A , T , C , G } . 
+ More sophisticated approaches further constrain the set of potential binding sites for a given TF by considering , in addition to PWMs , the contribution of each nucleotide to the free energy of protein binding [ 3 ] and additional biologically relevant information , such as nucleotide correlation between different positions of a sequence [ 8 ] or sequence-specific binding energies [ 6 ] . 
+ Perhaps not as widely used as sequence analysis , the idea of employing structural data for predicting TF binding sites has been considered [ 11 -- 15 ] . 
+ Most of these methods use protein-DNA structures rather than DNA by itself . 
+ Acquiring training sets large enough to be useful is problematic for even well-studied TFs , for which only small sets of known binding sites ( on the order of 10 sites ) are typically available [ 8 ] . 
+ New high-throughput technologies have been used to identify large numbers of binding sites for particular TFs [ 16 -- 18 ] , but there remains a need for methods that predict TF binding sites given a small number of positive examples . 
+ Such methods can be used , for example , to complement analysis of high-throughput data . 
+ Binding sites detected by high-throughput in vitro methods , such as protein-binding microarrays [ 16 ] , can be compared with predicted binding sites to prioritize studies aimed at confirming the importance of sites in regulating gene expression in vivo . 
+ The fine three-dimensional ( 3D ) structure of DNA is sequence dependent and TF-DNA interactions depend on various physicochemical parameters , such as contacts between nucleotides and amino acid residues and base pair geometry [ 19 ] . 
+ These parameters are not accounted for by conventional methods for predicting TF binding sites , which rely on sequence information alone . 
+ Letter representations of DNA sequences do not capture the biophysics underlying TF-DNA interactions . 
+ Given that a TF does not read off letters from a DNA sequence , but interacts with a particular sequence because of its chemical and structural features , we hypothesized that better predictions of TF binding sites might be generated by explicitly accounting for these features in an algorithm for predicting TF binding sites . 
+ The mechanisms by which TFs recognize DNA sequences can be divided into two classes : indirect readout and direct readout [ 19 ] . 
+ For indirect readout , a TF recognizes a DNA sequence via the conformation of the sequence , which is determined by the local geometry of base pair steps , the distortion flexibility of the DNA sequence , and ( water-mediated ) protein-DNA interactions [ 20,21 ] . 
+ For direct readout , a TF recognizes a DNA sequence through direct contacts between specific bases of the sequence and amino acid residues of the TF [ 22,23 ] . 
+ These two classes of recognition mechanisms are not mutually exclusive . 
+ In this study , we introduce a method , SiteSleuth , for predicting TF binding sites on the basis of sequence-dependent structural and chemical features of short DNA sequences . 
+ By using molecular dynamics ( MD ) methods to calculate these features , we can map a set of known or potential binding sites for a given TF to vectors of structural and chemical features . 
+ We use features of positive and negative examples of TF binding sites to train a support vector machine ( SVM ) to discriminate between true and false binding sites . 
+ Negative examples are derived from randomly selected non-coding DNA sequences . 
+ Positive examples are taken from RegulonDB [ 24 ] , which collects information about TFs in Escherichia coli ( E. coli ) . 
+ Classifiers for TFs developed through the SiteSleuth approach are evaluated by cross validation , and the classifier for Fis is tested against chromatin immunoprecipitation-chip ( ChIP-chip ) assays of Fis binding sites [ 17 ] . 
+ Combining ChIP with microarray technology , ChIP-chip assays provide information about DNA-protein binding in vivo on a genome-wide scale [ 25 ] . 
+ We also evaluate the performance of SiteSleuth against four other computational methods : the method of Berg and von Hippel ( BvH ) [ 3 ] , MATRIX SEARCH [ 5 ] , Match [ 7 ] , and QPMEME [ 6 ] . 
+ The BvH , MATRIX SEARCH , and Match methods rely on the PWM approach to capture TF preferences for binding sites . 
+ The QPMEME method is similar to SiteSleuth in that it employs a learning algorithm . 
+ In the case of Fis , we show that SiteSleuth generates significantly fewer estimated false positives and provides higher prediction accuracy than the other computational approaches . 
+ Methods
+ Our supervised learning approach , which we call SiteSleuth , involves training a linear SVM classifier to distinguish TF binding sites documented in RegulonDB from randomly selected noncoding DNA sequences , which we take to represent negative examples of TF binding sites . 
+ Briefly , a linear SVM classifier is an $(n-1)$-dimensional hyperplane in an n-dimensional feature space that maximally separates positive and negative training examples , if possible . 
+ When the training data can be separated by a hyperplane ( $w^T x + d = 0$ ) , two parallel hyperplanes , given by $w^T x + d = \pm 1$ , mark the boundaries that maximize the distance between positive and negative examples ( $2/\|w\|$ ) . 
+ The quantity x is a vector of features , w is a weight vector of length n , and $\|w\|^2 = w^T w$ . 
+ A larger distance $2/\|w\|$ results in a lower generalization error of the classifier . 
+ Positive examples lie on the positive side of $w^T x + d = 1$ and negative examples lie on the negative side of $w^T x + d = -1$ . 
+ The parameters w and d of a classifier are determined by solving an optimization problem [ 26 ] . 
+ On the other hand , if no hyperplane exists that completely separates positive and negative examples , which is generally the case here , w and d can be determined using a soft margin method [ 26 ] , which finds a hyperplane that achieves the largest separation distance possible with the smallest error penalty imposed by non-zero slack variables , $\xi_k$ ( k = 1 , ... , N ) , where N is the number of training examples , both positive and negative . 
+ The soft margin method trades off separation and misclassification . 
+ Another way to deal with training examples that can not be fully separated is to use a nonlinear SVM . 
+ Because the computational cost of using a nonlinear SVM for our purposes would be high , we opted to use a linear SVM with slack variables . 
+ The method of finding 
+ Classifier training
+ Let us use $X = \{ x_1 , \ldots , x_N \}$ to represent the set of training data , where $x_k$ ( k = 1 , ... , N ) is a real-valued n-dimensional feature vector that characterizes the $k$th training example and n is the number of features considered . 
+ The features considered are described below . 
+ Given input $x_k$ and scalar output $y_k \in \{ -1 , 1 \}$ , which identifies a training example as a positive or negative example of a binding site , classifier training produces an $(n-1)$-dimensional hyperplane in the space of features that satisfies the equation $w^T x + d = 0$ and a set of linear inequality constraints , each involving a slack variable . 
+ The parameters w and d and the slack variables $\xi_k$ ( k = 1 , ... , N ) are found by solving the minimization problem of Eq. 1a subject to the constraints of Eq. 1b , where $C_+$ and $C_-$ are penalty parameters [ 27 ] . 
+ These parameters are introduced to balance the contributions of negative and positive training examples to the objective function ( Eq. 1a ) , as we typically have available many more negative examples than positive examples . 
+ The penalty parameters are determined for each TF via a grid search over ranges of $C_-$ and $C_+$ values as part of a 3-fold cross-validation procedure for each classifier . 
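+ For reference , a class-weighted soft-margin problem of the kind solved by libsvm [ 27 ] can be written as below . This is a sketch of the standard formulation in the notation used here , with the slack variables written as $\xi_k$ ; it is assumed , not confirmed , to correspond to Eqs. 1a and 1b . 

```latex
\begin{align}
\min_{w,\,d,\,\xi}\quad & \tfrac{1}{2}\, w^T w
   + C_+ \sum_{k:\,y_k = 1} \xi_k
   + C_- \sum_{k:\,y_k = -1} \xi_k \\
\text{subject to}\quad & y_k \left( w^T x_k + d \right) \ge 1 - \xi_k ,
   \qquad \xi_k \ge 0 , \qquad k = 1, \ldots, N
\end{align}
```

+ With many more negative than positive examples , choosing $C_+$ larger than $C_-$ makes a misclassified positive example cost more than a misclassified negative one . 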
+ In 3-fold cross validation , we randomly divide the training set into three subsets of roughly equal size . 
+ One subset is then used to test the accuracy of the classifier trained on the remaining two subsets until each subset has been used in testing . 
+ We used the F-measure to assess accuracy . 
+ The F-measure is the harmonic mean of precision ( p ) and recall ( r ) : 
+ $F = 2pr / (p + r)$
+ Precision is the fraction of predicted binding sites that are actually binding sites and recall is the fraction of actual binding sites predicted to be binding sites : $p = TP / (TP + FP)$ and $r = TP / (TP + FN)$ , where TP , FP , and FN represent true positives , false positives and false negatives from 3-fold cross validation . 
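+ The cross-validation bookkeeping described above can be sketched as follows . This is a minimal illustration , not the authors ' code : the function names and the fixed random seed are assumptions , and the F-measure is taken to be 0 when there are no true positives . 

```python
import random


def three_fold_indices(n_examples, seed=0):
    """Randomly split example indices into three folds of roughly equal size."""
    idx = list(range(n_examples))
    random.Random(seed).shuffle(idx)
    # Striding by 3 gives fold sizes that differ by at most one.
    return [idx[i::3] for i in range(3)]


def f_measure(tp, fp, fn):
    """Harmonic mean of precision and recall from pooled counts."""
    if tp == 0:
        # Precision and recall are both 0 (or undefined); report 0.
        return 0.0
    p = tp / (tp + fp)  # precision
    r = tp / (tp + fn)  # recall
    return 2 * p * r / (p + r)
```

+ Each fold serves once as the test set while the other two are used for training ; TP , FP , and FN are then pooled over the three test folds before the F-measure is computed . 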
+ To find values of $C_-$ and $C_+$ that maximize the F-measure , we first performed a coarse grid search over the following grid points : $C_- = [ 2^{-5} , 2^{-3} , \ldots , 2^{15} ]$ and $C_+ = [ 2^{-5} , 2^{-3} , \ldots , 2^{15} ]$ . 
+ We then performed fine grid searches using progressively smaller grid spacing ( $2^{0.5}$ , $2^{0.25}$ , $2^{0.125}$ , ... ) around 
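+ A coarse grid search of this kind can be sketched as follows . The callback `evaluate` is hypothetical : it stands for training a classifier with the given penalty pair and returning the cross-validated F-measure . The default exponents assume the grid runs from $2^{-5}$ to $2^{15}$ in steps of $2^{2}$ . 

```python
from itertools import product


def coarse_grid(lo=-5, hi=15, step=2):
    """Powers of two from 2**lo to 2**hi, stepping the exponent by `step`."""
    return [2.0 ** e for e in range(lo, hi + 1, step)]


def grid_search(evaluate, c_neg_values, c_pos_values):
    """Return (best_score, c_neg, c_pos) over all pairs of penalty values.

    `evaluate(c_neg, c_pos)` is a user-supplied (hypothetical) function that
    returns the score to maximize, e.g. a cross-validated F-measure.
    """
    best = None
    for c_neg, c_pos in product(c_neg_values, c_pos_values):
        score = evaluate(c_neg, c_pos)
        if best is None or score > best[0]:
            best = (score, c_neg, c_pos)
    return best
```

+ A finer search would repeat `grid_search` on a narrower list of values centered on the best pair found here . 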
+ SiteSleuth prediction
+ Once trained , a classifier for a TF , taken to recognize binding sites of length L , is used for prediction as follows . 
+ The classifier is used to scan an organism 's genome for binding sites of length L . 
+ Given a feature vector $x_m$ for a potential binding site m , we calculate the quantity $w^T x_m + d$ . 
+ The decision function of the classifier is the sign of $w^T x_m + d$ . 
+ Thus , if the sign of this quantity is positive , then site m is predicted to be a TF binding site . 
+ Conversely , a negative quantity indicates that m is not a binding site . 
+ This step is repeated for all non-coding sequences of length L in the E. coli genome . 
+ The length L was chosen for each TF based on information in RegulonDB [ 24 ] . 
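+ The scanning step can be sketched as a sliding window over a sequence . The function `featurize` is a hypothetical stand-in for the table look-up that maps a site of length L to its feature vector ; `w` and `d` are the trained weight vector and offset . 

```python
def decision_value(w, x, d):
    """Compute w^T x + d for plain Python sequences of equal length."""
    return sum(wi * xi for wi, xi in zip(w, x)) + d


def scan(sequence, L, featurize, w, d):
    """Return (start, site) for every window of length L with w^T x + d > 0."""
    hits = []
    for start in range(len(sequence) - L + 1):
        site = sequence[start:start + L]
        if decision_value(w, featurize(site), d) > 0:
            hits.append((start, site))
    return hits
```

+ In practice the same scan would be applied to each non-coding segment of the genome in turn . 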
+ Structural and chemical features
+ Structural and chemical features of short DNA sequences were defined based on the predicted 3D structures of these DNA sequences , which were determined via MD simulations . 
+ MD simulations of solvated nucleic acids have been performed for almost three decades [ 28,29 ] . 
+ Simulations of DNA oligomers have been studied systematically and results have been discussed in multiple publications [ 30 -- 32 ] . 
+ Our approach is similar to that used in Refs. [ 30 -- 32 ] and is described below . 
+ Because the available experimental data are incomplete ( i.e. , structures are not available for every 4-mer , at least in the Nucleic Acid Database [ 33 ] ) and available structures have been determined under various experimental conditions ( e.g. , free or bound to protein ) , we used simulated structures rather than experimentally determined structures for determining structural and chemical features . 
+ Predicted structures were obtained for a common condition in a uniform manner . 
+ Structural features . 
+ For an indirect readout mechanism , a TF recognizes DNA conformation , the local structure of DNA . 
+ To calculate structural features of base pairs , we considered all possible 3-mers and 4-mers of DNA . 
+ Each of the 3-mers ( 4-mers ) was embedded within flanking GC nucleotide pairs to generate 7-mers ( 8-mers ) . 
+ Flanking nucleotide pairs were added to eliminate edge effects of 3-mers or 4-mers of DNA . 
+ We chose to cap both ends with GC nucleotide pairs , which is a common choice for reasons of rigidity and symmetry [ 30 -- 32 ] . 
+ For each 7-mer or 8-mer , its initial 3D structure was generated using the 3DNA software [ 34 ] . 
+ The structure produced by 3DNA is based on the Watson and Crick DNA structure . 
+ The 3D DNA fragments were solvated and ionized to balance the negative charges of the DNA backbone . 
+ Final structures were obtained using the NAMD software tool [ 35 ] for MD simulations with the CHARMM27 force field parameters [ 36 ] . 
+ Other MD software packages could also have been used to obtain 3D DNA structures , but NAMD was a convenient choice for us because of our familiarity with this package . 
+ For each NAMD simulation , we performed 3 picoseconds ( ps ) of minimization , 7 ps of heating to 300 K , 30 ps of relaxation , and 50 ps of equilibration , followed by 1 nanosecond ( ns ) of production , or post-equilibrium , simulation . 
+ Each simulation was carried out using the isothermal-isobaric ( NPT ) ensemble ( P = 1 atm , T = 300 K ) . 
+ During the production simulation , the DNA structures were recorded every picosecond for a total of 1000 frames of DNA structures . 
+ For each 7-mer and 8-mer , these 1000 frames were aligned to calculate the average DNA structure . 
+ From the average structure , we performed normal mode analysis [ 37 ] using the 3DNA software tool [ 34 ] to estimate six base parameters for the middle base pairs of 3-mers , and six step parameters for the middle base pairs of 4-mers . 
+ The six base parameters are shear , buckle , stretch , propeller , stagger and opening , and the six step parameters are shift , tilt , slide , roll , rise and twist [ 37 ] . 
+ Chemical features . 
+ A TF can recognize specific DNA sequences based on direct contact between nucleotides and amino acids through electrostatic and hydrophobic interactions . 
+ These molecular interactions , and therefore the interaction field features of a nucleotide , depend on nearby bases . 
+ Considering nucleotides beyond the first nearest neighbor bases did not result in significantly different values for interaction field features ( results not shown ) , but it was significantly more computationally expensive . 
+ Thus , we considered only the influence of immediately adjacent bases in calculations of the molecular interaction field features of a nucleotide . 
+ Let b be a middle nucleotide of a 3-mer as shown in Figure 1 . 
+ To characterize the sequence-dependent molecular interaction field around b , we used the average structure for the 3-mer obtained from MD simulations and defined V as the volume around the base b constrained by four planes ( A , B , C , and D ) as shown in Figure 1 . 
+ Within V , we systematically placed a small probe at different locations and computed the interaction energy between the DNA and the probe using the molecular force field encoded in GRID [ 38 ] , a software tool designed for this purpose . 
+ We considered 31 probes available in GRID , such as an alkyl hydroxyl group , a methyl group , and an aliphatic neutral amide group ( Table S1 ) . 
+ The distance between planes C and D , which bound V , is 20 Å . 
+ This distance was chosen to capture all interactions between a probe and the DNA sequence that produce energy less than $-0.001$ kcal/mol , which is the largest negative energy reported by GRID . 
+ For each probe $i \in \{ 1 , \ldots , 31 \}$ , using the GRID software tool [ 38 ] , we calculated and recorded the minimum interaction energy , $P_i$ ( Eq. 2a ) , where $W(r)$ is the potential at point $r$ . 
+ We also calculated the interaction score , $Q_i$ ( Eq. 2b ) , where the integration is performed over the volume V . 
+ We integrated over all points in V where the interaction energy was less than $-0.001$ kcal/mol . 
+ The interaction field features for all middle bases in all of the 64 possible 3-mers were calculated and stored for use in defining chemical features as described below ( Figures 1 and 2 ) . 
+ For probe i , the interaction score , $Q_i$ , is a measure of the energy stored in the field of the DNA sequence in the volume V . 
+ Note that we defined the volume for each nucleotide separately rather than for a base pair to capture more information about DNA structure , such as major groove and minor groove effects . 
+ A middle base of a 3-mer is associated with 62 molecular interaction field features : a minimum interaction energy given by Eq. 2a for each of the 31 probes and an interaction score given by Eq. 2b for each of the 31 probes . 
+ We found that some of these features are correlated . 
+ To identify a smaller set of uncorrelated features , we used principal component analysis ( PCA ) . 
+ PCA generates a list of uncorrelated variables , or principal components , that are described by the eigenvectors of the correlation matrix of a dataset . 
+ The variability in the dataset is captured by the eigenvalues that correspond to the eigenvectors . 
+ For each probe and each of the 64 possible 3-mers , the values of the 62 molecular interaction field features for each base in the middle base pair were normalized to mean 0 and standard deviation 1 and organized in a $64 \times 62$ matrix . 
+ PCA was performed on this matrix . 
+ We arbitrarily chose the first eight eigenvalues , which capture 93 % of the variance , and used the eigenvectors associated with these first eight eigenvalues as the chemical features to be used in training . 
+ Thus , for each middle base in a 3-mer , its chemical features are the corresponding elements from the first eight principal components , or eigenvectors , from PCA of the 
+ Mapping of DNA sequences to feature vectors . 
+ For a given TF that recognizes binding sites of length L in a genome , DNA sequences of length L are mapped to feature vectors as follows . 
+ For each of the L bases in a DNA sequence , we determine six geometrical base parameters and eight chemical features . 
+ These features are those that were calculated as described above for a 3-mer with the base of interest at the middle position . 
+ Recall that the eight chemical features are derived from the principal components of 62 molecular interaction field features . 
+ We also determine six geometrical step parameters for the middle two bases of all possible 4-mers . 
+ For efficiency , the features of a sequence are determined by table look up . 
+ In other words , the features of all possible 3- and 4-mers were calculated before assigning features to known and potential TF binding sites and saved in a table . 
+ Recall that structural features of 3- and 4-mers were determined in the context of flanking GC sequences . 
+ Figure 2 illustrates how feature vectors are obtained for a particular DNA sequence . 
+ The features associated with a sequence depend on the flanking nucleotides . 
+ As shown in Step 1 of Figure 2 , for each of the ten nucleotides in the DNA sequence GACCTCTAGA , starting with G , we determined the chemical features of the 3-mer in which this nucleotide is centered . 
+ Since DNA is double stranded , both strands were mapped to chemical features . 
+ For example , G within AGA and its complement taken in reverse , C within TCT , were mapped to chemical features . 
+ Then , shifting one base to the right , the next triplet GAC and its reverse complement GTC were mapped to chemical features . 
+ This process continues until the last base in the sequence , A , is reached . 
+ The ten possible 3-mers for this example are AGA , GAC , ACC , CCT , CTC , TCT , CTA , TAG , AGA , and GAT . 
+ The corresponding reverse complements are TCT , GTC , GGT , AGG , GAG , AGA , TAG , CTA , TCT , and ATC . 
+ In Step 2 of Figure 2 , we mapped each middle base pair in the ten possible 3-mers in the sequence to six geometrical base features . 
+ Similarly , in Step 3 , we mapped the two middle base pairs for each of the nine possible 4-mers in the sequence to six geometrical step features , starting with GA in AGAC . 
+ The nine possible 4-mers for this example are AGAC , GACC , ACCT , CCTC , CTCT , TCTA , CTAG , TAGA , and AGAT . 
+ For this example sequence , there are ten triplets and nine quadruplets , which result in 160 chemical features ( 10 triplets * 8 PCA-derived features per base * 2 bases in the middle base pair of each triplet ) , 60 structural base features ( 10 triplets * 6 structural base features per triplet ) , and 54 structural step features ( 9 quadruplets * 6 structural step features per quadruplet ) , for a total of 274 feature vector components ( n = 274 ) . 
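+ The enumeration in this worked example can be reproduced with a short sketch . The flanking bases A ( 5 ' side ) and T ( 3 ' side ) are inferred from the triplets and quadruplets listed above ; the function names are illustrative , and the per-k-mer feature values themselves ( which come from the precomputed table ) are not modeled here . 

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Complement each base, then reverse the string."""
    return seq.translate(COMPLEMENT)[::-1]


def kmers(seq, k):
    """All overlapping k-mers of seq, left to right."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def feature_count(site_length, n_chem=8, n_base=6, n_step=6):
    """Feature vector length for a site with one flanking base on each side."""
    triplets = site_length          # one 3-mer centered on each base
    quadruplets = site_length - 1   # one 4-mer centered on each base-pair step
    # Chemical features are taken for both bases of each middle base pair.
    return triplets * n_chem * 2 + triplets * n_base + quadruplets * n_step


site = "GACCTCTAGA"
flanked = "A" + site + "T"  # flanks assumed from the worked example
triplets = kmers(flanked, 3)
quadruplets = kmers(flanked, 4)
```

+ Because every feature depends only on a 3-mer or 4-mer , a table with 64 + 256 entries is enough to featurize any site . 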
+ The structural and chemical features associated with AGA are given in Table S2 for reference . 
+ Sources of negative and positive examples for training . 
+ The E. coli genome was downloaded from KEGG [ 39 ] . 
+ The E. coli open reading frames ( ORFs ) were identified in KEGG . 
+ For each E. coli TF , its documented binding sites were downloaded from RegulonDB 5.6 . 
+ We decided to consider only E. coli TFs with at least five known binding sites . 
+ There are 54 such TFs in RegulonDB . 
+ The DNA sequences for the set of known binding sites for a given TF were mapped to feature vectors , and these vectors were used in training . 
+ To obtain negative examples for training , we first removed the ORFs from the genome . 
+ The remaining non-coding portions of the genome were taken to be negative examples of TF binding sites . 
+ We randomly selected 10,000 non-coding sequences to serve as negative examples for each TF , and mapped these sequences to feature vectors . 
+ We also obtained positive training data from DPInteract [ 40 ] . 
+ The source of training data did not affect the main qualitative findings of our method comparisons reported in the Results section . 
+ Namely , we find that the performance of SiteSleuth is better than that of the other methods tested . 
+ Results based on DPInteract training data are given in Table S5 of the Supplemental Material . 
+ These results are not discussed further because DPInteract has not been updated for some time and more binding sites are documented in RegulonDB . 
+ To build a SiteSleuth model for a TF , we need known binding sites for the TF ( positive examples ) , 10,000 randomly selected non-coding sequences ( negative examples ) , and the structural and chemical features of short DNA sequences . 
+ It is time consuming to generate the structural and chemical features of short DNA sequences because they require MD simulations and molecular interaction energy calculations . 
+ However , the MD simulations are performed only once and the structural and chemical features of short DNA sequences are tabulated . 
+ SiteSleuth classifiers are defined by a vector ( $w^T$ , d ) , whose determination requires SVM training by solving the minimization problem defined in Eq. 1a subject to the constraints defined in Eq. 1b for the positive and negative examples . 
+ We used libsvm [ 27 ] for training . 
+ A single training run takes less than 1 minute . 
+ For a potential binding site m , we used the tabulated structural and chemical features to calculate feature vector $x_m$ and the prediction value $w^T x_m + d$ . 
+ Once this is done , using the SiteSleuth model to scan the E. coli genome requires several minutes for each TF . 
+ Implementation of other TF binding site prediction methods
+ For comparison , we implemented four other computational TF binding site prediction methods : the method of Berg and von Hippel ( BvH ) [ 3 ] , Match [ 7 ] , MATRIX SEARCH [ 5 ] , and QPMEME [ 6 ] . 
+ These methods were implemented as described in the cited papers and , for the 54 TFs studied , a list of binding sites predicted by each method can be found online at http://cellsignaling.lanl.gov/EcoliTFs/SiteSleuth/ . 
+ For completeness , each method is briefly presented below . 
+ To discuss these methods we will need to first introduce a few quantities . 
+ For a set of N DNA binding sites of a particular TF , the length of each binding site is denoted by L . 
+ The value of L is set equal to the length of binding sites reported in RegulonDB for a given TF . 
+ In the case of Fis , we set L = 21 . 
+ We define $n_j(b)$ to be the number of times base b appears in the $j$th position in the sequences of the binding sites , and $f_j(b)$ to be the corresponding frequency . 
+ We denote the overall background frequency of base b by $\tilde{f}(b)$ . 
+ We use S to denote a potential TF binding site of length L and we use $S_j$ ( j = 1 , ... , L ) to denote the $j$th base of sequence S . 
+ For the BvH method , we denote the number of occurrences of the most common base in position j of the set of binding sites by $n_j(0)$ . 
+ Using a training set of N binding sites , the BvH method calculates the score of each binding site as the summation over every position of the log-odds score of observing a base of S versus the most frequent base in the corresponding position of the sequence . 
+ Thus , the score is given by 
+ A pseudocount of 0.5 is used in the formula [ 3 ] . 
+ A cutoff threshold is defined as the mean score of the N positive training examples . 
+ To evaluate whether a new sequence S is a binding site , the score of S is calculated based on the above formula and compared with the cutoff threshold . 
+ If the score of sequence S is greater than the cutoff threshold , it is predicted to be a binding site . 
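+ The BvH scoring rule described above can be sketched as follows . This is an illustration under stated assumptions : the pseudocount of 0.5 is assumed to be added to both the count of the observed base and the count of the most common base , and the function names are hypothetical . 

```python
import math


def count_matrix(sites):
    """Per-position base counts n_j(b) for equal-length training sites."""
    length = len(sites[0])
    return [{b: sum(s[j] == b for s in sites) for b in "ACGT"}
            for j in range(length)]


def bvh_score(seq, counts, pseudocount=0.5):
    """Sum over positions of ln[(n_j(S_j) + pc) / (n_j(0) + pc)]."""
    score = 0.0
    for j, base in enumerate(seq):
        n_max = max(counts[j].values())  # n_j(0): count of most common base
        score += math.log((counts[j][base] + pseudocount) / (n_max + pseudocount))
    return score


def bvh_threshold(sites, counts):
    """Cutoff: mean score of the positive training examples."""
    return sum(bvh_score(s, counts) for s in sites) / len(sites)
```

+ Under this form the consensus sequence scores 0 and every other sequence scores below 0 , so the mean-score cutoff is never positive . 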
+ For the Match method , a set of N training examples is used to define an information vector I ( j ) , which describes the conservation of the position j in a binding site from the training set : 
+ $I(j) = \sum_{b \in \{A,T,C,G\}} f_j(b) \ln ( 4 f_j(b) )$
+ The information vector is used to evaluate whether a new sequence S is a binding site or not by calculating a score defined as and min and max are calculated using the lowest and highest nucleotide frequency in each position , respectively . 
+ A cutoff threshold is defined as the mean score of the N positive training examples . 
+ If the score for a new sequence S is larger than the cutoff threshold , S is predicted to be a binding site . 
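+ The information vector I ( j ) can be computed directly from per-position base frequencies , as in the sketch below . The input layout ( one dictionary of frequencies per position ) is an assumption made for illustration , and terms with zero frequency are taken to contribute 0 . 

```python
import math


def information_vector(freqs):
    """I(j) = sum_b f_j(b) * ln(4 * f_j(b)) for each position j.

    `freqs` is a list with one dict per position mapping base -> frequency.
    Bases with zero frequency contribute nothing to the sum.
    """
    return [sum(p * math.log(4 * p) for p in f.values() if p > 0)
            for f in freqs]
```

+ A position with uniform base frequencies gives I ( j ) = 0 , and a fully conserved position gives the maximum value , ln 4 . 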
+ Using a set of N binding sites as training examples , the MATRIX SEARCH method calculates the score of each binding site S as the summation over every position of the log-odds score of observing a base in S versus the overall background frequency of that base in the corresponding position of the sequences . 
+ Thus , the score is given by 
+ A pseudocount of 0.01 is used in the formula [ 5 ] . 
+ A cutoff threshold is determined as the mean of the N scores calculated from the training data . 
+ A new sequence S is predicted to be a binding site if its score is greater than the cutoff threshold . 
+ The QPMEME ( Quadratic Programming Method of Energy Matrix Estimation ) method defines a weight $e_j(b)$ for each base b at position j in S . 
+ The score for a sequence S is defined as 
+ The weight $e_j(b)$ is estimated via a learning algorithm that only uses positive examples . 
+ The learning algorithm minimizes the variance $e^2$ subject to the constraint that the score for each known binding site is less than a predefined cutoff value . 
+ Consistent with the Methods section of Djordjevic et al. [ 6 ] , we used $-1$ for the cutoff value in our implementation of QPMEME , which constrains all known binding sites to one side of a hyperplane . 
+ Mathematically , the learning algorithm is described by 
+ Comparison of methods
+ Cross-validation . 
+ SiteSleuth was implemented for 54 TFs , which each have at least five known binding sites in E. coli according to RegulonDB ( Table S3 ) . 
+ A complete list of binding sites predicted by SiteSleuth for each TF can be found online at http://cellsignaling.lanl.gov/EcoliTFs/SiteSleuth/ . 
+ A linear SVM served as the classification model for each TF . 
+ The classification models were used to scan the entire non-coding portion of the DNA sequence to predict new binding sites . 
+ For BvH , Match , and MATRIX SEARCH , as described above , the cutoff thresholds for classifying potential binding sites as true binding sites were defined to be the mean scores of the positive training examples . 
+ The cutoff threshold used for QPMEME was $-1$ [ 6 ] . 
+ The cutoff threshold for SiteSleuth was $w^T x + d > 0$ . 
+ Each model relies on a set of parameters , some of which are fixed and some of which are free parameters that must be estimated . 
+ More complex models have more free parameters , but these free parameters increase the chance of overfitting the data . 
+ It is possible that complex models will be able to fit the training data well but that the model 's ability to accurately predict new TF binding sites may be low . 
+ Thus , to address the question of possible overfitting and to evaluate each model 's prediction capability we performed 3-fold cross-validation . 
+ For each TF , training and testing were performed ten times to estimate the mean cross-validation value for the positive examples . 
+ The cross-validation score , V , is the fraction of positive examples predicted to be true binding sites . 
+ One measure used to compare classifiers is the area under a receiver operating characteristics ( ROC ) curve . 
+ A ROC curve is a two-dimensional plot of the true positive rate ( sensitivity ) against the false positive rate ( 1 - specificity ) . 
+ Each data point on this plot is generated by changing the cutoff value of a classifier , and the area under the resulting ROC curve ( AUC ) is then calculated . 
+ The AUC is always between 0 and 1 . 
+ A perfect classifier will have an AUC of 1 and a random classifier will have an AUC of 0.5 . 
+ We implemented an algorithm for generating ROC curves and for calculating the AUC , which ranks testing examples according to their classifier scores [ 41 ] . 
+ Positive examples for a given TF are chosen by randomly dividing the training set data into 2/3 positive training examples and 1/3 positive testing examples . 
+ The non-coding portions of the E. coli genome were used to generate all possible negative examples of TF binding sites . 
+ We built models using the training examples for the five methods . 
+ The models were used to calculate scores for positive testing examples and negative examples . 
+ An ROC curve and the corresponding AUC were estimated . 
+ For each TF , we performed the above procedure ten times to estimate ten AUCs for each method , and we reported the average value and standard deviation of AUC . 
+ For n positive testing examples , we can generate n points to draw the ROC curve . 
+ Having only a few positive testing examples can produce large uncertainty in the AUC calculation . 
+ Thus , we performed AUC analysis only for TFs in RegulonDB with at least 20 known binding sites . 
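+ The AUC can be illustrated with the pairwise ( Mann-Whitney ) form below , which gives the same value as the area under the ROC curve . It is not necessarily the ranking algorithm of Ref. [ 41 ] ; it is a simple quadratic-time sketch that would be slow for genome-scale sets of negative examples . 

```python
def auc(pos_scores, neg_scores):
    """Probability that a random positive outscores a random negative.

    Ties count as one half. Equals the area under the ROC curve.
    """
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))
```

+ A value of 1 means every positive testing example scores above every negative example , and 0.5 corresponds to random ranking . 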
+ Comparison with experimental data . 
+ We further interrogated the performance of these methods against SiteSleuth by comparing predictions against experimental data for Fis binding to E. coli DNA [ 17 ] . 
+ Cho et al. [ 17 ] identified 894 Fis-associated binding regions in ChIP-chip experiments . 
+ For each computational method , its list of predicted Fis binding sites , 21 base pairs ( bp ) in length , was compared to these 894 binding regions . 
+ Comparisons were made by scanning the binding region in the forward and reverse directions . 
+ A match was recorded if the complete predicted binding site or its complement was found within the experimentally determined binding region . 
+ False positives were computed by subtracting the number of matches from the total number of predicted binding sites . 
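+ The matching rule can be sketched as follows . Two assumptions are made here : sequences are compared as plain strings , and " its complement " is read as the reverse complement , in line with scanning each region in the forward and reverse directions . 

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Complement each base, then reverse the string."""
    return seq.translate(_COMPLEMENT)[::-1]


def count_matches(predicted_sites, regions):
    """Return (matches, estimated_false_positives).

    A predicted site is a match if it, or its reverse complement, lies
    entirely within at least one experimentally determined region.
    """
    matches = 0
    for site in predicted_sites:
        rc = reverse_complement(site)
        if any(site in region or rc in region for region in regions):
            matches += 1
    return matches, len(predicted_sites) - matches
```

+ The second returned value is the estimated number of false positives , that is , the total number of predictions minus the number of matches . 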
+ Results
+ Local structural features of DNA depend on nucleotide environment
+ To make a preliminary assessment of our hypothesis that we can produce better predictions if we consider the chemical and structural features of sequence-specific DNA , we examined the features of various sequences and found that the same base in the same position in a sequence can have different chemical and structural features depending on its environment . 
+ We illustrate this finding in Figure 3 , which shows sequence-specific DNA structures . 
+ From the structures , one can see the context-dependent variation in the twist angle between the center two base planes . 
+ The center base pair is the same in each structure , but the twist angle for the left structure of Figure 3A is 220.4 ° , whereas the twist angle for the right structure of Figure 3A is 24.3 ° . 
+ Figure 3A demonstrates that different local structural features may characterize the same nucleotide at the same position in a sequence . 
+ The feature vectors for TGG and AGA are given in Table S2 . 
+ Similarly , Figure 3B demonstrates that different nucleotides in the same position may be characterized by the same local structural features . 
+ The twist angles of the middle base pairs of the two structures in Figure 3B are the same , even though the base pairs are different . 
+ These observations suggested to us that chemical and structural features may capture sequence correlations relevant for TF-DNA interactions that are not apparent from sequence data alone and encouraged us to build classifiers that separate negative and positive examples of TF binding sites based on their positions in chemical and structural feature space . 
+ This approach , which we call the SiteSleuth method , combines DNA structure prediction , computational chemistry and machine learning . 
+ To demonstrate the reliability of MD simulations for prediction of structural features of DNA oligomers , we calculated the propeller feature using 1 ) available experimental structural data ( obtained from the Nucleic Acid Database [ 33 ] ) and 2 ) predicted structures obtained via MD simulations , and we found significant correlation ( about 0.8 ) . 
+ The results are shown in Figure S2 . 
+ Classifiers
+ As described in the Methods section , binary SiteSleuth classifiers were developed to identify and predict the binding sites of 54 TFs based on TF binding sites documented in RegulonDB . 
+ The input to a classifier is a vector of structural and chemical features generated from DNA sequences , each labeled as either a positive or negative example . 
+ Negative examples were taken from randomly chosen non-coding sequences of the E. coli genome . 
+ The classifiers were then used to scan both strands of non-coding sequences in the E. coli genome from 5 ' to 3 ' to identify potential TF binding sites . 
+ For comparison , we also considered four other computational TF binding site prediction methods : BvH [ 3 ] , MATRIX SEARCH [ 5 ] , Match [ 7 ] , and QPMEME [ 6 ] . 
+ These methods are each briefly described in the Methods section . 
+ Cross-validation of classifiers
+ The accuracy of predictions of each method was evaluated through a 3-fold cross-validation procedure , described in the Methods section . 
+ For each method , the mean cross-validation scores , V , for the 54 TFs considered are listed in Table S4 , and classifier accuracy is summarized in Figure 4 . 
+ Recall that V is the fraction of positive examples predicted to be true binding sites in the cross-validation procedure . 
+ Figure 4 is a heat map showing the cross-validation score , $0 \le V \le 1$ , produced by each of the five computational methods . 
+ Brighter red indicates a higher cross-validation score and black represents V = 0 . 
+ A cross-validation score of V = 1 indicates perfect prediction , whereas a cross-validation score of zero indicates that the method fails to predict any TF binding sites correctly . 
+ Of the 54 TFs studied , SiteSleuth outperforms all the other methods in 28 cases , equals the next best method in 11 cases , and performs more poorly in 15 cases . 
+ Based on the number of times a method outperformed all the other methods , SiteSleuth ( 28 ) performed better than QPMEME ( 8 ) , which performed better than MATRIX SEARCH ( 2 ) , which equaled the performance of BvH ( 2 ) , which performed better than Match ( 0 ) . 
+ In one case , IcsR , SiteSleuth is the only method for which V = 0 . 
+ The data used to construct Figure 4 are given in Table S4 . 
+ Interestingly , Figure 4 reveals that all methods give cross-validation scores of zero for several TFs : CysB , GcvA , OxyR , RcsAB , and Rob . 
+ This observation suggests that methods that rely on DNA sequence information , including SiteSleuth , are insufficiently equipped to predict the binding sites for these TFs . 
+ Some of these TFs , such as GcvA [ 42 ] , may perhaps recognize DNA indirectly via interaction with a second protein that recognizes DNA directly . 
+ Another explanation could be that some of these TFs , such as Rob [ 43 ] , may be recognizing very short sequences . 
+ The total number of TF binding sites predicted by each computational method is given in Table S3 . 
+ For most TFs , QPMEME and Match both predict a large number of TF binding sites in the E. coli genome . 
+ The BvH and MATRIX SEARCH methods predict fewer binding sites , but still more than the number of predictions generated by SiteSleuth . 
+ In Figure 5 , we show the performance of SiteSleuth relative to that of BvH for the TFs with five or more known binding sites . 
+ The relative performance ( RP ) score shown in Figure 5 is defined as the number of TF binding sites predicted by BvH divided by the number of TF binding sites predicted by SiteSleuth . 
+ This score indicates how many times more TF binding sites are predicted by BvH than by SiteSleuth . 
+ For example , BvH predicts 23 times more TF binding sites for MetJ than does SiteSleuth . 
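+ The RP score defined above is a plain ratio of prediction counts. A minimal sketch follows; the function name and the counts in the usage line are hypothetical and chosen only to illustrate the ratio, not taken from Table S3.

```python
def relative_performance(n_reference: int, n_sitesleuth: int) -> float:
    """RP = sites predicted by the reference method / sites predicted by SiteSleuth."""
    if n_sitesleuth <= 0:
        raise ValueError("SiteSleuth prediction count must be positive")
    return n_reference / n_sitesleuth


# Hypothetical counts (not from the paper): the reference method predicts
# 2300 sites where SiteSleuth predicts 100.
rp = relative_performance(n_reference=2300, n_sitesleuth=100)
print(rp)
```

+ An RP above 1 means the reference method predicts more sites than SiteSleuth; by itself it does not say whether the extra predictions are false positives.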
+ For reference , the log transformed number of TF binding sites predicted by SiteSleuth is also indicated in Figure 5 and a solid line is drawn at RP = 1 . 
+ As can be seen in Figure 5 , 41 TFs have RP > 1 and 13 TFs have RP < 1 . 
+ Thus , there is a large class of TFs for which SiteSleuth predicts fewer binding sites than BvH ( RP > 1 ) and , by extension , the other computational methods . 
+ From these results alone , it is not clear whether fewer predictions are a result of fewer false positives or more false negatives . 
+ To examine this question , we considered ChIP-chip data for Fis binding to DNA [ 17 ] , which , as shown in Figure 5 , has RP > 1 . 
+ Our findings are discussed in the next section . 
+ As described in the Methods section , we also generated ROC curves and calculated AUC to compare classifiers . 
+ For each of the five computational methods and for TFs in RegulonDB with 20 or more known binding sites , the AUC values are tabulated in Table S6 . 
+ We find that SiteSleuth had the largest AUC for 60 % of the TFs tested , BvH had the largest AUC for 25 % of the TFs , MATRIX SEARCH had the largest AUC for 10 % of the TFs tested , QPMEME had the largest AUC for 5 % of the TFs tested , and Match had the largest AUC for 0 % of the TFs tested . 
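+ For readers unfamiliar with AUC, it can be read as the probability that a randomly chosen positive example scores higher than a randomly chosen negative one. The sketch below computes it by direct pairwise comparison; it is a generic illustration with made-up scores, not the procedure from the Methods section.

```python
def auc(pos_scores, neg_scores):
    """Area under the ROC curve via pairwise comparison (ties count as 0.5)."""
    if not pos_scores or not neg_scores:
        raise ValueError("need at least one positive and one negative score")
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# Hypothetical classifier scores for known sites and for negative examples.
print(auc([0.9, 0.3], [0.5, 0.1]))
```

+ A value of 0.5 corresponds to random ranking and 1.0 to perfect separation of positives from negatives.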
+ Validation against ChIP-chip data
+ ChIP-chip assays have identified 894 DNA sequences that bind Fis in E. coli [ 17 ] , which we used to validate the Fis binding sites predicted by each method . 
+ Looking at SiteSleuth results for Fis , SiteSleuth predicted 129,150 binding sites for Fis from a positive training set of 133 binding sites published in RegulonDB ( Table S3 ) , the second largest training set available for the 54 TFs we studied . 
+ The relative performance of SiteSleuth for Fis binding site prediction is close to one for three of the other methods under consideration ( RP_BvH = 1.56 , RP_Match = 2.03 , RP_MATRIX SEARCH = 1.55 , and RP_QPMEME = 11.67 ) . 
+ SiteSleuth 's cross-validation score for Fis ( V = 0.33 ) is low ( Table S4 ) . 
+ The availability of empirical data on Fis binding , including a larger number of known binding sites in RegulonDB for training , and the indirect recognition mechanisms of Fis binding to DNA [ 33 ] suggested that Fis may provide a good example to test whether SiteSleuth , which accounts for DNA structure , performs better than the other methods , despite its low cross-validation score . 
+ Predictions of Fis binding sites from each computational method are compared to experimentally identified DNA sequences that bind Fis in E. coli in ChIP-chip assays [ 17 ] . 
+ We assume that the sequences found in this study contain , to a first approximation , the complete set of Fis binding sites . 
+ For each method , the approximate number of false positives was determined by subtracting the number of predictions that matched experimentally defined Fis binding sequences from the total number of predictions made by the method . 
+ Figure 6 shows the number of false positives generated by each computational method ( black bars ) . 
+ As can be seen , the QPMEME method produced more than 1.5 million estimated false positives . 
+ Match generated approximately 261,000 false positives and BvH and MATRIX SEARCH both generated roughly 200,000 false positives . 
+ SiteSleuth produced the fewest false positives , over 70,000 fewer than the next best method , a reduction of 35 % in the estimated false positive rate . 
+ In absolute terms , QPMEME predicted a binding site within 889 of the 894 experimentally defined Fis binding sequences ( 99.44 % ) . 
+ However , the predictions are not practically useful , since they are hidden within over 1.5 million estimated false positive results . 
+ The gray bars in Figure 6 report the percentage of TF binding sites correctly predicted by the five computational methods normalized by the total number of predictions . 
+ After normalization , QPMEME was the lowest performer for Fis . 
+ The BvH , Match , and MATRIX SEARCH methods gave approximately equivalent results . 
+ SiteSleuth outperformed these methods , showing a 41 % improvement over MATRIX SEARCH , the next best method . 
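+ The two quantities plotted in Figure 6 follow directly from the definitions above: estimated false positives are total predictions minus predictions matching a ChIP-chip sequence, and the normalized score is the matching predictions as a percentage of all predictions. A minimal sketch, with hypothetical function names and illustrative counts that are not the paper's data:

```python
def estimated_false_positives(total_predictions: int, matched_predictions: int) -> int:
    """Predictions that do not fall in any experimentally defined binding sequence."""
    if not 0 <= matched_predictions <= total_predictions:
        raise ValueError("matched predictions must lie between 0 and the total")
    return total_predictions - matched_predictions


def percent_correct(total_predictions: int, matched_predictions: int) -> float:
    """Matched predictions as a percentage of all predictions made by the method."""
    if total_predictions <= 0:
        raise ValueError("total predictions must be positive")
    return 100.0 * matched_predictions / total_predictions


# Illustrative counts only: 1000 predictions, 4 of which match ChIP-chip sequences.
print(estimated_false_positives(1000, 4), percent_correct(1000, 4))
```

+ Note that this estimate treats the ChIP-chip sequences as the complete set of true sites, as assumed in the text.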
+ Discussion
+ We postulated that a better TF binding site prediction method could be developed on the basis of chemical and structural features , instead of letter sequences . 
+ To test this hypothesis , we developed the SiteSleuth method , in which potential TF binding sites are associated with DNA sequence-specific structural and chemical features . 
+ These features are then used to build classification models for and to predict TF binding sites . 
+ Compared to the other computational methods we tested , including the three methods that use a PWM representation of TF binding sites ( BvH , Match , and MATRIX SEARCH ) , our method provides a higher cross-validation accuracy . 
+ For 72 % of the TFs studied , SiteSleuth cross-validation accuracy is as high as or higher than any other method ( Table S4 ) . 
+ SiteSleuth also generates 35 % fewer estimated false positive results ( Figure 6 ) , and gives more accurate predictions ( 41 % improvement over the next best method ) for TF binding sites ( Figure 6 ) . 
+ In addition , the four other methods considered here each rely on the additivity assumption , which states that each nucleotide in a DNA binding site contributes to binding affinity in an independent fashion . 
+ In the study of Benos et al. [ 44 ] , the additivity assumption was tested . 
+ In general , the additivity assumption holds rather well as shown by ddG measurements of mutated DNA sites in several protein-DNA complexes [ 44 ] . 
+ However , it was shown that additivity is a poor assumption for some cases [ 44 ] . 
+ SiteSleuth does not rely on the additivity assumption , which may partially explain its better performance . 
+ It must be noted that none of the methods for predicting TF binding sites considered here can be deemed reliable when used alone . 
+ In Figure 6 , although SiteSleuth indeed produces the highest fraction of correct predictions , the fraction of correct predictions is still small at 0.4 % . 
+ Nonetheless , SiteSleuth constitutes an advance over existing methods and the approach warrants further investigation . 
+ The chemical and structural features we have considered are crude and additional determinants of specificity and other biologically relevant features , such as amino acid side chain interaction energy with DNA , could be incorporated into the SiteSleuth approach in the future . 
+ It may also be possible to incorporate experimental measurements of short DNA sequence properties into the SiteSleuth framework . 
+ A mechanistic understanding of TF binding to DNA could guide the design of novel model features . 
+ For example , a recent study of Fis showed that the shape of the DNA minor groove affects Fis-DNA binding [ 45 ] . 
+ This property is hard to capture using only DNA letter sequences , but could be captured by defining a new feature in SiteSleuth based on the available structural data . 
+ Presently , the features defined in SiteSleuth are unable to capture the effects of the minor groove on Fis binding , which may account for SiteSleuth 's poor performance in absolute terms . 
+ The QPMEME method is similar to the SVM-based approach of SiteSleuth . 
+ Both methods involve a quadratic programming minimization procedure with linear inequality constraints . 
+ QPMEME maps sequences of L bases into a 4 × L-dimensional space with an energy term for each dimension and constructs a hyperplane such that all positive examples are located on one side of the plane . 
+ This quadratic optimization procedure defines a separating hyperplane by minimizing the variance of energies in an energy matrix so as to minimize the number of random sequences lying on the side of the plane that contains the positive examples . 
+ In contrast , the separating hyperplane of an SVM divides true binding sites from nonbinding sites with maximum margin . 
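+ The 4 × L representation mentioned above can be pictured as one indicator per base per position, with a sequence's energy given by the dot product with a vector of per-dimension energy terms. The sketch below assumes the base order A, C, G, T and uses made-up weights; it illustrates the encoding only and is not the QPMEME or SiteSleuth implementation.

```python
BASES = "ACGT"  # assumed ordering of the four indicator dimensions per position


def one_hot(seq: str) -> list:
    """Encode a sequence of L bases as a flat 4*L vector of 0/1 indicators."""
    vec = []
    for base in seq.upper():
        if base not in BASES:
            raise ValueError(f"unexpected base: {base!r}")
        vec.extend(1 if base == b else 0 for b in BASES)
    return vec


def energy(seq: str, weights: list) -> float:
    """Additive energy: sum of the per-dimension terms selected by the encoding."""
    vec = one_hot(seq)
    if len(vec) != len(weights):
        raise ValueError("weights must have 4 * len(seq) entries")
    return sum(w * v for w, v in zip(weights, vec))


# Hypothetical energy terms for a 2-base site.
print(energy("AC", [0.5, 0.0, 0.0, 0.0, 0.0, 2.0, 0.0, 0.0]))
```

+ Because each position contributes independently, this form embodies the additivity assumption discussed earlier; a separating hyperplane in this space is a threshold on such an energy.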
+ The distinction between random sequences , considered in QPMEME , and negative examples , considered in SiteSleuth , is important because sequences do not appear with equal probability in the E. coli genome , as is shown in Figure S1 . 
+ SiteSleuth used negative examples directly sampled from non-coding regions of the E. coli genome . 
+ In the report of Djordjevic et al. [ 6 ] , the QPMEME method is applied to non-ORF regions of the E. coli genome to predict binding sites for 34 TFs , including Fis . 
+ For Fis , Table 1 of Ref. [ 6 ] indicates that QPMEME predicts 255 Fis binding sites , compared to the 1.5 million found with QPMEME in our hands ( Table S3 ) . 
+ To ensure that our implementation was correct , we applied QPMEME using the same training data set used by Djordjevic et al. [ 6 ] from DPInteract and were able to reproduce their weight matrix [ 6 ] . 
+ For Fis , RegulonDB reports 133 binding sites , compared to only 19 reported Fis binding sites in DPInteract . 
+ This difference in the size of the training data set ( 19 versus 133 positive examples of Fis binding sites ) may be responsible for the difference in number of predicted binding sites ( 255 vs. 1.5 million ) . 
+ As can be seen by comparing the common entries in Table 1 of Ref. [ 6 ] and in Table S3 , Fis is not an isolated example of QPMEME predicting a larger number of TF binding sites when the number of positive training examples is larger . 
+ It is also the case for the TFs ArcA , ArgR , CRP , CytR , DnaA , FadR , FarR , Fnr , FruR , GalR , GlpR , H-NS , IHF , LexA , LRP , MetJ , NagC , NarL , OmpR , SoxS , and TyrR . 
+ The QPMEME method may perform poorly for TFs with relatively large numbers of known binding sites because QPMEME requires that all positive examples be located on one side of a hyperplane in the space spanned by an energy matrix [ 6 ] ( see Methods section ) . 
+ Thus , known binding sites that are outliers in this space may potentially expand the range of sequences considered to be binding sites , such that recall is maximized at the expense of precision . 
+ We have not systematically investigated the reasons underlying our observation that QPMEME performs poorly for the TFs identified above when using positive training data from RegulonDB , as such an investigation was beyond the intended scope of our study . 
+ In summary , how TFs selectively bind to DNA is one of the least understood aspects of TF-mediated regulation of gene expression . 
+ An ability to better predict TF binding sites from small training data sets may advance our understanding of TF-DNA binding , and may reveal important insights into TF binding specificity , regulation and coordination of gene expression , and ultimately into gene function . 
+ A long-standing problem has been how to identify new TF binding sites given known binding sites . 
+ The accuracy and usefulness of computational methods for genome-wide TF binding site prediction has been limited by the inability to validate , verify , and inform these methods . 
+ Only recently has technology matured to the point that we can assay for TF binding sites on a genome-wide scale . 
+ This capability should allow us to critically evaluate predictions from computational methods and to develop methods that are more predictive than those currently available . 
+ Toward this end , the work presented here provides a starting point for future investigations of how TF binding site prediction can be improved by considering the physical and chemical aspects of TF-DNA binding . 
+ Supporting Information
+ seen , the non-coding genome sequence is not random , i.e. , the assumption that sequences appear with equal probability is invalid . 
+ Found at : doi:10.1371/journal.pcbi.1001007.s001 ( 3.93 MB TIF ) 
+ Author Contributions
+ Conceived and designed the experiments : WSH PJU FM . 
+ Performed the experiments : ALB FM . 
+ Analyzed the data : ALB FM . 
+ Wrote the paper : ALB WSH FM . 
+ 6 . Djordjevic M , Sengupta AM , Shraiman BI ( 2003 ) A Biophysical Approach to Transcription Factor Binding Site Discovery . Genome Res 13 : 2381 -- 2390 . 
+ 7 . Kel AE , Gössling E , Reuter I , Cheremushkin E , Kel-Margoulis OV , et al. ( 2003 ) MATCH : a tool for searching transcription factor binding sites in DNA sequences . Nucleic Acids Res 31 : 3576 -- 3579 . 
+ 8 . Osada R , Zaslavsky E , Singh M ( 2004 ) Comparative analysis of methods for representing and searching for transcription factor binding sites . Bioinformatics 20 : 3516 -- 3525 .
\ No newline at end of file
--- a/data/TEXT_FILES/notuseful_txt/21278291.txt 0 → 100644
View file @27818a9
+++ b/data/TEXT_FILES/notuseful_txt/21278291.txt 0 → 100644
View file @27818a9
+ Retrospective Application of Transposon-Directed Insertion Site Sequencing to a Library of Signature-Tagged Mini-Tn5Km2 Mutants of Escherichia coli O157 : H7 Screened in Cattle † 
+ Sabine E. Eckert ,1 ‡ Francis Dziva ,2 ‡ Roy R. Chaudhuri ,3 ‡ Gemma C. Langridge ,1 ‡ Daniel J. Turner ,1 § Derek J. Pickard ,1 Duncan J. Maskell ,3 Nicholas R. Thomson ,1 and Mark P. Stevens4 * 
+ The Wellcome Trust Sanger Institute , Wellcome Trust Genome Campus , Hinxton , Cambridge CB10 1SA , United Kingdom1 ; Enteric Bacterial Pathogens Laboratory , Institute for Animal Health , Compton , Berkshire RG20 7NN , United Kingdom2 ; Department of Veterinary Medicine , University of Cambridge , Madingley Road , Cambridge CB3 0ES , United Kingdom3 ; and Roslin Institute and Royal ( Dick ) School of Veterinary Studies , University of Edinburgh , Bush Farm Road , Roslin , Midlothian EH25 9RG , United Kingdom4 
+ Enterohemorrhagic Escherichia coli ( EHEC ) strains comprise a subset of Shiga toxin-producing E. coli strains that cause acute enteritis in humans ( 2 ) . 
+ Infections may be complicated by severe sequelae and are frequently acquired via contact with ruminant feces . 
+ The molecular mechanisms underlying colonization of the ruminant intestines by EHEC are incompletely understood . 
+ Previously , we screened a library of 1,900 EHEC O157 : H7 mutants for their ability to colonize bovine intestines by signature-tagged mutagenesis ( STM ) ( 6 ) . 
+ STM relies on a panel of transposons harboring unique oligonucleotide tags . 
+ The tags can be detected by amplification and hybridization , enabling the composition of complex pools to be analyzed before and after inoculation of animals . 
+ Mutants that are negatively selected in vivo relative to the inoculum are inferred to lack a gene required for colonization or survival , which can be identified by isolation and sequencing of transposon-flanking regions ( 16 ) . 
+ Our analysis focused on the prototype E. coli O157 : H7 strain EDL933 , for which the chromosome and plasmid sequences are known ( 1 , 18 ) . 
+ Of the 1,900 signature-tagged mutants screened , 101 were underrepresented in pools recovered from feces 5 days postinoculation of calves ( 6 ) . 
+ The transposon insertion site could be mapped in 79 such mutants , identifying 59 different genes inﬂuencing colonization ( 6 ) . 
+ * Corresponding author . 
+ Mailing address : Roslin Institute and Royal ( Dick ) School of Veterinary Studies , University of Edinburgh , Bush Farm Road , Roslin , Midlothian EH25 9RG , United Kingdom . 
+ Phone : 44 131 527 4200 . 
+ Fax : 44 131 440 0434 . 
+ E-mail : Mark.Stevens@roslin.ed.ac.uk . 
+ § Present address : Oxford Nanopore Technologies , 4 Robert Robinson Way , Magdalen Science Park , Oxford OX4 4GA , United Kingdom . 
+ ‡ Contributed equally to the study . 
+ † Supplemental material for this article may be found at http://jb.asm.org/ . 
+ Published ahead of print on 28 January 2011 . 
+ Thirteen attenuating mutations were mapped to the locus of enterocyte effacement ( LEE ) , which encodes a type III secretion system ( T3SS ) required for the formation of `` attaching and effacing '' lesions . 
+ The role of T3SS components in intestinal colonization was subsequently confirmed with defined mutants ( 6 , 17 ) and by screening of 480 signature-tagged mutants of EHEC O26 : H- in calves ( 27 ) . 
+ STM also detected attenuating mutations in genes encoding secreted substrates of the T3SS ( espD , map , and nleD ) ( 6 ) . 
+ Though STM has provided valuable insights into the genetic basis of virulence of microbes , it is limited by the number of unique tags and the effort required to construct libraries and map attenuating mutations . 
+ Moreover , only negatively selected mutants tend to be investigated and subjective judgments are required to compare signal intensities relative to the input and coscreened mutants . 
+ Functional annotation of the E. coli O157 : H7 genome in reservoir hosts is further hindered by the cost of using large animals at a high level of disease containment . 
+ Recently , several protocols have been described that permit the simultaneous assignment of the genotype and ﬁtness score for mutants screened in pools . 
+ Transposon-directed insertion-site sequencing ( TraDIS ) exploits Illumina sequencing to obtain the sequence flanking each transposon insertion ( 11 ) . 
+ The massively parallel nature of such sequencing permits comparison of the number of speciﬁc reads derived from inocula and output pools recovered from animals , providing a numerical measure of the extent to which mutants were selected in vivo . 
+ TraDIS obviates the need to construct and array uniquely tagged mutants and to subclone and sequence attenuating mutations , yielding substantial time and cost savings . 
+ TraDIS-like methods have defined the essential gene complement of Salmonella enterica serovar Typhi ( 11 ) and Streptococcus pneumoniae ( 28 ) and have identified genes influencing Haemophilus influenzae pathogenesis ( 7 ) and survival of the gut symbiont Bacteroides thetaiotaomicron ( 8 ) . 
+ We retrospectively applied TraDIS to assign the genotype and ﬁtness score of EDL933 mutants previously screened in calves . 
+ This required the massively parallel sequencing of transposon-flanking regions in the input and output pools of EDL933 mini-Tn5Km2 mutants obtained by Dziva et al. ( 6 ) , as schematically shown in Fig. 1 . 
+ Adequate genomic DNA was retrieved for 19 of the mutant pools screened , comprising a total of 1,805 mutants . 
+ Genomic DNA from each input and output sample was quantiﬁed with a Nanodrop ND-1000 spectrophotometer ( Thermo Fisher , Loughborough , United Kingdom ) . 
+ Equal amounts ( 1 µg ) from all input and all output samples were pooled , and input and output pools were separately fragmented by ultrasonication with a Covaris adaptive focused acoustics instrument , to an average of 200 bp ( 19 ) . 
+ Fragment libraries were prepared with the Illumina paired-end DNA sample preparation kit ( PE-102-1001 ; Illumina , Little Chesterford , United Kingdom ) , according to the manufacturer 's instructions , and quantiﬁed on an Agilent DNA1000 chip ( Agilent , South Queensferry , United Kingdom ) . 
+ To form double-strand adapters , oligonucleotides Ind_Ad_T ( 5'-ACACTCTTTCCCTACACGACGCTCTTCCGATC*T-3' , where the asterisk represents phosphorothioate ) and Ind_Ad_B ( 5'-pGATCGGAAGAGCGGTTCAGCAGGAATGCCGAGACCGATCTC-3' ) were annealed . 
+ The input and output DNA was ligated to the double-strand adapters and then quantified by quantitative PCR ( qPCR ) using the primers Ad_T_qPCR1 ( 5'-CTTTCCCTACACGACGCTCTTC-3' ) and Ad_B_qPCR2 ( 5'-ATTCCTGCTGAACCGCTCTTC-3' ) and SYBR green ( Applied Biosystems , Warrington , United Kingdom ) . 
+ Two hundred nanograms of adaptor-ligated fragments was used to speciﬁcally amplify transposon insertion sites . 
+ Twenty-four cycles of PCR were performed with transposon-specific forward primer MiniTn5-P5-3pr-3 ( 5'-AATGATACGGCGACCACCGAGATCTACACCTAGGCTGCGGCTGCACTTGTG-3' ) , which contains the Illumina P5 end for attachment to the flow cell , and reverse primer RInV3.3 ( 5'-CAAGCAGAAGACGGCATACGAGATCGGTACACTCTTTCCCTACACGACGCTCTTCCGATCT-3' , containing the Illumina P7 end ) . 
+ PCR products were size separated on an agarose gel , and fragments of 350 to 450 bp were excised and recovered with QiaExII gel extraction columns ( Qiagen , Crawley , United Kingdom ) following the manufacturer 's instructions , but without heating ( 19 ) . 
+ DNA was eluted in 30 µl of elution buffer , and quantified by qPCR with standards of known concentration , using primers Syb_FP5 ( 5'-ATGATACGGCGACCACCGAG-3' ) and Syb_RP7 ( 5'-CAAGCAGAAGACGGCATACGAG-3' ) ( 19 ) . 
+ The DNA fragment libraries were sequenced for 37 cycles according to the manufacturer 's instructions on single-end flow cells by an Illumina GAII sequencer , using the custom sequencing primer MiniTn5-3pr-seq3 ( 5'-TAGGCTGCGGCTGCACTTGTGTA-3' ) , which binds 10 bp from the transposon end . 
+ There were 12.6 and 13.3 million reads obtained for the input and output pools , respectively ( European Nucleotide Archive accession no. ERP000368 ) . 
+ Totals of 12.1 million ( 96.3 % ) of the input reads and 12.4 million ( 93.7 % ) of the output reads contained perfect matches to the 3' end of mini-Tn5Km2 ( 3 ) , and these reads were included in downstream analyses . 
+ Transposon-derived sequence was removed from each read with a custom Perl script available from the authors . 
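+ The authors' Perl script is not reproduced in the text. As a rough illustration of the filtering and trimming step, the Python sketch below keeps only reads that begin with a perfect match to a transposon-end tag and returns the remaining flanking sequence. The tag value, the assumption that the tag sits at the very start of the read, and the function name are all hypothetical.

```python
def trim_transposon(read: str, tn_end: str):
    """Return the genomic flank if the read starts with the transposon tag, else None."""
    if read.startswith(tn_end):
        return read[len(tn_end):]
    return None  # no perfect match: the read is excluded from downstream analysis


# Placeholder tag and reads, for illustration only (not the mini-Tn5Km2 sequence).
TN_END = "AAAA"
reads = ["AAAACCGGTT", "TTTTCCGGTT"]
flanks = [f for f in (trim_transposon(r, TN_END) for r in reads) if f is not None]
print(flanks)
```

+ The trimmed flanks are what would then be passed to a read mapper.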
+ The remainder of each sequence read was mapped to the EDL933 chromosome and pO157 with NovoAlign ( Novocraft Technologies Sdn Bhd , Selangor , Malaysia ) . 
+ Totals of 9.9 million input reads ( 78.4 % ) and 10.7 million output reads ( 80.6 % ) were mapped to unique positions in the EDL933 genome . 
+ Subsequent analyses were performed with R , version 2.8.0 ( R Foundation for Statistical Computing , Vienna , Austria ) . 
+ To quantify changes in the number of reads arising from speciﬁc insertions between the input and output , we adopted an approach suggested for RNA-Seq data analysis ( 15 ) . 
+ The number of reads at each insertion location ( x ) was treated as a proportion of the total number of mapped reads ( n ) , and a variance-stabilizing arcsine-root transformation was applied , converting each value of x to $n \arcsin\left(\sqrt{x/n}\right)$ . 
+ The transformed output values were divided by the equivalent input values to determine the fold change . 
+ To avoid inﬁnite values derived from taking the log of 0 , sequence counts of 0 were replaced with an arbitrary value of 0.5 . 
+ Log2 fold change values were calculated to represent the difference in abundance of each mutant in the output pools relative to the input and provide a measure of ﬁtness . 
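+ The fitness calculation described above can be sketched in a few lines. The original analysis was done in R; this Python version is only an illustration. The formula in the source text is garbled, so the prefactor n on the arcsine-root term is an assumption (it cancels whenever the input and output totals are equal), and the function names and example counts are hypothetical.

```python
import math


def arcsine_root(x: float, n: float) -> float:
    """Arcsine-root transform of a read count x out of n mapped reads.

    The leading factor n is assumed from the (garbled) source formula.
    """
    return n * math.asin(math.sqrt(x / n))


def log2_fold_change(x_in, n_in, x_out, n_out, zero_value=0.5):
    """log2 of transformed output over transformed input, with zeros replaced by 0.5."""
    x_in = x_in if x_in > 0 else zero_value
    x_out = x_out if x_out > 0 else zero_value
    return math.log2(arcsine_root(x_out, n_out) / arcsine_root(x_in, n_in))


# Hypothetical counts: a mutant with 100 input reads that vanishes from the output.
print(log2_fold_change(x_in=100, n_in=10**6, x_out=0, n_out=10**6))
```

+ Negative values indicate mutants that are less abundant in the output pool than in the input.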
+ In our experience , TraDIS may overpredict the number of insertion sites due to a low-level background signal derived from incorrectly mapped or chimeric reads . 
+ To distinguish genuine inserts from this background signal , predicted insertion sites with fewer than 2^5 ( i.e. , 32 ) mapped reads were removed from the data set ( see Fig. S1 in the supplemental material ) . 
+ Of the 1,805 EDL933 mutants screened , TraDIS unambiguously assigned the insertion site and fitness score for 1,645 , representing 855 different genes . 
+ Importantly , we assigned the genotype and ﬁtness scores to 91.1 % of the mutants analyzed , where previously we only identiﬁed the insertion site in 4.2 % of mutants owing to the constraints of STM ( 6 ) . 
+ Insertions were in general well distributed , although there are AT-rich regions where insertions are overrepresented ( Fig. 2 ) , as may be expected as mini-Tn5Km2 preferentially inserts at TA dinucleotides . 
+ Table S1 in the supplemental material lists the insertion site and log2 fold change relative to input for each mutation . 
+ Figure S2 in the supplemental material shows a histogram of log2 fold change values obtained for all the mutants . 
+ This distribution was modeled by fitting a bimodal normal distribution using the R package mixdist ( 13 ) ( Fig. S2 ) . 
+ This model represents the mutants as a mixture of two distinct populations . 
+ Most of the mutants show no attenuation , with no clear change in abundance relative to the input pool and a normal distribution of log2 fold change values with a mean close to 0 . 
+ Attenuated mutants show lower log2 fold change values , with a mean of approximately -3 . 
+ The model suggests that a log2 fold change of -1 ( equivalent to a 2-fold decrease in the abundance of the mutant in the output pool relative to the input ) is a suitable cutoff value to identify most of the attenuated mutants while restricting the number of false positives to an acceptable level . 
+ Seventy-two insertions were detected by both STM and TraDIS , 86.1 % of which were negatively selected in both cases and 72.2 % of which showed a log2 fold change of -1 or lower by TraDIS ( see Table S1 in the supplemental material ) . 
+ Though STM screening of EDL933 mutants in calves identified 13 attenuating mutations in LEE genes ( 6 ) and was considered exhaustive at the time , TraDIS identified 54 insertions in the LEE in 21 different genes . 
+ By TraDIS , all LEE mutants were negatively selected , except those with insertions in rorf1 or the region between ler and espG ( Fig. 3 ) . 
+ Insertions in the LEE-ﬂanking regions were not attenuating . 
+ Mutations in predicted T3SS structural components were strongly negatively selected , with the exception of a single insertion in a gene of unknown function ( rorf8 ) . 
+ Several LEE genes were disrupted many times , producing comparable ﬁtness scores . 
+ Variance in the scores for a given gene may reﬂect differences in competition dynamics in the pools in which the mutants were screened . 
+ TraDIS found 5 attenuating mutations in eae , encoding intimin , and 3 mutations in tir , encoding the translocated intimin receptor . 
+ These were missed by STM , even though intimin and Tir play key roles in intestinal colonization of cattle by E. coli O157 : H7 ( 22 , 29 ) . 
+ TraDIS also identiﬁed mutations in 29 of the 39 type III secreted effectors of E. coli O157 : H7 veriﬁed by Tobe et al. ( 26 ) ( see Table S2 in the supplemental material ) . 
+ Mutants with insertions in several LEE-encoded effectors ( EspF , EspB , Tir , Map , EspH , and EspZ ) were all negatively selected , consistent with the role of such effectors in intestinal persistence of Citrobacter rodentium in mice ( 4 ) and E. coli O157 : H7 in rabbits ( 20 ) . 
+ Of the non-LEE-encoded effectors , several appeared to play little or no role ( e.g. , NleG , NleH , EspY1 , and EspY4 ) ( Table S2 ) , whereas mutations in the genes coding for the others were attenuating . 
+ Among the latter was z1829 , encoding EspK , an effector missed by STM but which inﬂuences persistence of EHEC in calves ( 27 , 30 ) . 
+ Though several effector phenotypes have been independently veriﬁed , we caution that some attenuating mutations identiﬁed by STM could not be reproduced when mutants were tested in isolation ( e.g. , map ) ( 6 ) or by coinfection with the parent strain ( e.g. , nleD ) ( 14 ) , possibly due to the distinct selection pressure exerted by combining 95 mutants during the library screen . 
+ Analysis of signature-tagged mutants of EHEC O26 : H- in calves indicated that the cytotoxins EspP and enterohemolysin may promote intestinal colonization ( 27 ) . 
+ Though mutants with defects in these genes were not detected in the EDL933 STM screen ( 6 ) , TraDIS revealed that several such mutants were represented in the library and were generally negatively selected in calves . 
+ Three of four EDL933 espP mutants were attenuated by TraDIS ( see Table S1 in the supplemental material ) , consistent with the modest attenuation of a deﬁned espP mutant in calves ( 5 ) . 
+ Nine of 11 mutants with defects in the enterohemolysin ( EHEC-hly ) operon were negatively selected by TraDIS , supporting the attenuation of an ehxA mutant of EHEC O26 : H- in calves ( 27 ) . 
+ EhxA appears not to play a signiﬁcant role in rectal colonization in steers ( 22 ) ; however , the latter study involved rectal application of the mutant to ruminant steers , without passage through the intestines . 
+ Eight insertions were detected in l7031/tagA , which encodes a zinc metalloprotease that cleaves C1-esterase inhibitor ( StcE ) ( 12 ) , promotes adherence ( 9 ) , and modulates neutrophil function ( 25 ) . 
+ StcE mutants were generally underrepresented in calves , as were mutants with insertions in the EtpCD type II secretion system required for StcE secretion , consistent with the role of this system in colonization of rabbits ( 10 ) . 
+ Seventeen mutations were detected in the gene encoding the large clostridial toxin homolog L7095/ToxB , though only 7 were negatively selected by greater than 1 log2 fold change . 
+ This relatively weak phenotype is consistent with the phenotype of a deﬁned E. coli O157 : H7 toxB mutant in calves ( 24 ) . 
+ Other genes carried by pO157 that were missed by STM but putatively linked to colonization by TraDIS include katP ( catalase-peroxidase ) , l7029/msbB ( lipid A myristoyl transferase ) , and a gene of the linked ecf operon ( l7026 ) . 
+ TraDIS faithfully reproduced the ﬁtness defect of mutants detected by STM that are impaired in O-antigen biosynthesis ( e.g. , manC , per , wbdP , and wzy ) , consistent with the phenotype of an E. coli O157 : H7 perosamine synthetase ( per ) mutant in steers ( 23 ) . 
+ It also identiﬁed other attenuating mutations missed by STM that affect this process , as well as other pathways implicated in bacterial survival in vivo , such as aromatic amino acid biosynthesis ( aroA ) and iron storage ( ftn ) . 
+ Of further interest , TraDIS identiﬁed an attenuating mutation in the catalytic subunit of Shiga toxin 1 ( stx1A ) . 
+ Previously , STM identiﬁed an attenuating mutation downstream of the toxin genes in prophage CP-933V but upstream of those involved in bacterial lysis . 
+ The attenuation of the stx1A mutant supports the ﬁnding that Stx1 promotes intestinal colonization of mice by E. coli O157 : H7 ( 21 ) . 
+ In common with other methods for screening pools of random mutants , TraDIS describes single gene-phenotype relationships and does not account for functional redundancy . 
+ Rarely , mutants may also contain more than one transposon insertion , harbor a secondary mutation of another kind , or possess polar insertions affecting the expression of nearby genes . 
+ These limitations impose a formal requirement to conﬁrm mutant phenotypes via the evaluation of nonpolar mutant and repaired or trans-complemented strains . 
+ The number of mutants that can be simultaneously screened will also be constrained by the requirement to obtain an output pool of an adequate size at a time postinoculation sufﬁcient for attenuation to be evident . 
+ It is estimated that if 100 mutants are screened , the output pool must comprise at least 10,000 colonies in order to state at the 95 % conﬁdence interval that speciﬁc mutants are absent due to attenuation as opposed to chance ( 6 ) . 
+ Moreover , at high pool complexities , stochastic loss of mutants may occur if the number of mutants exceeds a `` bottleneck '' above which individual mutants in the population no longer have an equal chance of establishing themselves in the host . 
+ Such limitations are balanced by the ability of massively parallel sequencing of mutant libraries to derive vastly richer functional annotation of pathogen genomes than can be obtained by earlier methods . 
+ In conclusion , TraDIS validated and substantially extended our analysis of signature-tagged E. coli O157 : H7 mutants in cattle . 
+ It described the genotype and ﬁtness score for 91.1 % of mutants screened , unlocking hundreds of novel phenotypes with no further animal use . 
+ It represents a signiﬁcant advance toward the principles of reduction , reﬁnement , and replacement of animals in research and is relatively inexpensive to apply de novo or retrospectively . 
+ The procedures described herein relate to transposons that have been extensively used in other microbes ( reviewed in reference 16 ) and can therefore be widely applied to derive quantitative data for functional annotation of microbial genomes . 
+ We gratefully acknowledge the support of DEFRA ( grant OZ0707 ) , the BBSRC ( grants D017556 and D017947 ) , and the Wellcome Trust
\ No newline at end of file
--- a/data/TEXT_FILES/notuseful_txt/21515770.txt 0 → 100644
+++ b/data/TEXT_FILES/notuseful_txt/21515770.txt 0 → 100644
+ Transcription Factor GreA Contributes to Resolving Promoter-Proximal Pausing of RNA Polymerase in Bacillus subtilis Cells †
+ Yoko Kusuya ,1 Ken Kurokawa ,2 Shu Ishikawa ,1 Naotake Ogasawara ,1 and Taku Oshima1 * 
+ Graduate School of Information Science , Nara Institute of Science and Technology , 8916-5 Takayama , Ikoma , Nara 630-0192 , Japan ,1 and Graduate School of Bioscience and Biotechnology , Tokyo Institute of Technology , 4259 Nagatsuta , Midori , Yokohama , Kanagawa 226-8501 , Japan2 
+ * Corresponding author . 
+ Mailing address : Graduate School of Information Science , Nara Institute of Science and Technology , 8916-5 , Takayama , Ikoma , Nara 630-0192 , Japan . 
+ Phone : 81-743-72-5430 . 
+ Fax : 81-743-72-5439 . 
+ E-mail : taku@bs.naist.jp . 
+ † Supplemental material for this article may be found at http://jb.asm.org/ . 
+ Published ahead of print on 22 April 2011 . 
+ Bacterial Gre factors associate with RNA polymerase ( RNAP ) and stimulate intrinsic cleavage of the nascent transcript at the active site of the enzyme ( 12 ) . 
+ In eukaryotic cells , the transcription factor , TFIIS , exerts similar activity ( 9 ) , indicating that Gre function is evolutionarily conserved in multisubunit RNAPs ( 9 ) . 
+ Gre factors consist of an N-terminal extended coiled-coil domain ( NTD ) and C-terminal globular domain ( CTD ) ( 19 , 32 ) . 
+ Escherichia coli possesses two highly homologous Gre factors : GreA and GreB . 
+ A structural study on the RNAP-GreB complex further revealed that CTD binds to the rim of the secondary channel of RNAP through which substrate nucleoside triphosphates for RNA synthesis enter the catalytic site ( 18 , 25 , 38 ) , while NTD extends into the secondary channel and the tip reaches the catalytic center ( 28 ) . 
+ Two acidic residues , D41 and E44 , located at the tip of NTD , are conserved in Gre factors , including those of Bacillus subtilis , and are proposed to assist RNAP function by coordinating the Mg2+ ion and water molecule required for catalysis of RNA hydrolysis ( 20 , 28 , 31 ) . 
+ During the elongation process of transcription , roadblocks generated by DNA-binding proteins or speciﬁc DNA sequences induce RNAP to slide backward along the template ( backtrack ) , resulting in extrusion of the 3′ terminus of nascent RNA through the RNAP secondary channel ( 9 ) . 
+ Several biochemical and genetic studies have conﬁrmed that the Gre factor facilitates endonucleolytic cleavage of extruded RNA to generate a new terminus that can be extended by RNAP , thus preventing transcription arrest during elongation and enhancing transcription ﬁdelity ( 3 , 8 , 21 , 27 , 36 ) . 
+ Furthermore , the Gre factor participates in the stimulation of promoter escape and the suppression of promoter-proximal pausing during the beginning of RNA synthesis in E. coli ( 1 , 11 , 13 , 21 , 33 -- 35 ) . 
+ A fraction of RNAP is anchored to the promoter after initiation of RNA synthesis via persistent binding of σ70 to the core promoter sequence in some E. coli promoters , and Gre factors upregulate transcription initiation from these promoters . 
+ In addition , E. coli RNAP often binds and stalls at −10-like sequences located downstream of the core sequence after promoter escape ( 6 , 11 , 21 ) . 
+ The data obtained from in vivo KMnO4 mapping suggest that E. coli RNAP stalls at the promoter-proximal regions in 10 to 20 % of promoters , and GreA reduces the duration of stalling at these regions for several genes ( 11 ) . 
+ Consistent with these ﬁndings on Gre involvement in initiation of transcription , recent microarray analyses revealed that GreA activates transcriptional initiation of 19 genes under normal growth conditions and an even larger number of genes upon overexpression ( 34 ) . 
+ The intracellular level of GreA increases upon SigE overexpression in E. coli , indicating that GreA is important for transcriptional regulation under stress , rather than normal growth conditions ( 34 ) . 
+ Although Gre factors are universally conserved in bacteria , current knowledge of their functions in bacterial species other than E. coli is limited . 
+ The gre gene is essential in Mycoplasma pneumoniae ( 14 ) . 
+ Gre factors are important for osmotolerance in Rhizobium tropici and Sinorhizobium meliloti ( 26 , 39 ) . 
+ Bacillus subtilis possesses one Gre factor , designated GreA . 
+ Table 1 . Bacterial strains used in the present study
+ Strain | Genotype a | Source or reference
+ 168 | trpC2 | Pasteur stock
+ 168rpoCHis | 168 rpoC::pMUTinHis rpoC | 15
+ 168sigAHis | 168 sigA::pMUTinHis sigA | 15
+ 168nusAHis | 168 nusA::pMUTinHis nusA | 15
+ YK02 | 168 greA::pMUTinHis greA | This study
+ YK03 | 168 ΔgreA::spec | This study
+ YK04 | 168 ΔgreA::spec rpoC::pMUTinHis rpoC | This study
+ YK05 | 168 greA ( D44A )::cat | This study
+ YK06 | 168 greA ( D44A )::cat rpoC::pMUTinHis rpoC | This study
+ a spec , spectinomycin resistance gene ; cat , chloramphenicol resistance gene . 
+ The transcription elongation factors NusA , NusB , and NusG are concentrated in speciﬁc regions of the nucleoid termed transcription foci , which represent major sites of rRNA synthesis in B. subtilis cells . 
+ In contrast , B. subtilis GreA localizes uniformly throughout the nucleoid , suggesting its constant association with RNAP synthesizing mRNA ( 5 , 7 ) . 
+ Recent studies have explored the trafﬁcking of core RNAP and the transcription factors , i.e. , the main sigma factor ( E. coli σ70 and B. subtilis σA ) and elongation factor NusA , on the chromosomes of E. coli and B. subtilis using ChIP-chip and ChAP-chip ( chromatin afﬁnity precipitation coupled with DNA microarray ) methods ( 15 , 22 , 29a ) . 
+ The results suggest that the sigma factor in the initiation complex of RNAP is replaced with NusA upon transition to the elongation complex . 
+ Furthermore , our group demonstrated that in contrast to E. coli RNAP , which often accumulates at the promoter-proximal region , B. subtilis RNAP is evenly distributed from the promoter to coding sequences , indicating that B. subtilis RNAP recruited to the promoter promptly leaves the promoter-proximal region without trapping or pausing to form the elongation complex ( 15 ) . 
+ In the present study , we extended the ChAP-chip analysis to visualize the distribution of B. subtilis GreA on the chromosome and examined the effects of GreA inactivation on trafﬁcking of core RNAP . 
+ Our data indicate that GreA is uniformly distributed throughout the transcribed region ( from promoters to coding regions ) in association with core RNAP , and its inactivation induces accumulation of RNAP at many promoter or promoter-proximal regions . 
+ Accordingly , we propose that GreA is constantly associated with core RNAP during transcriptional initiation and elongation and resolves its stalling at the promoter or promoter-proximal regions , resulting in even distribution of the polymerase throughout the transcribed region in B. subtilis cells . 
+ MATERIALS AND METHODS
+ Bacterial strains and plasmids . 
+ The bacterial strains and primers used in the present study are listed in Table 1 and in Table S1 in the supplemental material , respectively . 
+ To create a B. subtilis strain expressing C-terminal 12×His-tagged GreA ( YK02 ) , a fragment encompassing the 3′ region of the greA gene ( except the stop codon ) was ampliﬁed by PCR from B. subtilis 168 chromosomal DNA using the greA.f-greA.r primer set and cloned between the HindIII and XhoI sites of pMUTinHis ( 17 ) . 
+ The resultant plasmid was integrated into the B. subtilis chromosome via single crossover to generate the YK02 strain . 
+ Western blot analysis conﬁrmed a similar level of expression as that of wild-type GreA ( see Fig. S1 in the supplemental material ) . 
+ It was difﬁcult to determine the functionality of GreA-His , since the greA-deleted mutant showed no apparent phenotype . 
+ However , our results indicate that His-tagged GreA minimally retains binding ability to core RNAP ( see Fig. 4A ) . 
+ To generate a greA deletion mutant ( ΔgreA , YK03 ) , the spec resistance gene , including the promoter region , was ampliﬁed from the pJL62 plasmid ( 16 ) by using the specF and specR primers , and the 5′ and 3′ ﬂanking regions of greA were ampliﬁed from B. subtilis chromosomal DNA by using the primer sets greAF1-greAR1 spec and greAF2 spec-greAR2 , respectively . 
+ The primers greAR1 spec and greAF2 spec contained a 20-bp sequence complementary to the specF and specR primer sequences , respectively , at the 5′ end . 
+ The three resulting fragments were fused via PCR by using the greAF1-greAR2 primer set and integrated into the B. subtilis chromosome via homologous recombination through the 5′ and 3′ ﬂanking regions . 
+ A strain expressing C-terminal histidine-tagged RpoC ( 168rpoCHis ) was transformed with chromosomal DNA of YK03 to obtain YK04 ( rpoC-his ΔgreA ) . 
+ The GreA-D44A strain [ greA ( D44A ) ] harboring a point mutation altering Asp44 of GreA to Ala ( YK05 ) was constructed by using PCR ( shown schematically in Fig. 1 ) . 
+ The chloramphenicol resistance gene with the terminator region was obtained from the pDLT3 plasmid ( 23 ) by using the rPCR-CmF2 and rPCR-CmR2 primer sets . 
+ The greA gene and its downstream region were ampliﬁed from B. subtilis 168 chromosomal DNA as two fragments by using the primer sets D44AgreA1F-D44greA1R and D44greA2F Cm-D44AgreA2R , respectively . 
+ Primers D44AgreA1F and D44AgreA2R introduced substitutions of several bases to give one amino acid change ( Asp44 to Ala ) of GreA and the recognition site of the restriction enzyme , ApaLI , used for the conﬁrmation of the substitution in the greA gene . 
+ The D44greA2F Cm primer contained a 22-bp sequence complementary to the rPCR-CmF2 primer at the 5′ end . 
+ The region upstream of greA was ampliﬁed from B. subtilis 168 chromosomal DNA by using the primers D44greA3F and D44greA3R Cm . 
+ The D44greA3R Cm primer contained a 22-bp sequence complementary to the rPCR-CmR2 primer at the 5′ end . 
+ The resulting four fragments were fused by PCR using the D44greA3F-D44greA1R primer set and integrated into the B. subtilis chromosome via homologous recombination with selection for chloramphenicol resistance . 
+ A strain expressing C-terminal histidine-tagged RpoC ( 168rpoCHis ) was transformed with chromosomal DNA of YK05 to generate the YK06 strain [ rpoC-his greA ( D44A ) ] . 
+ Pulldown puriﬁcation of RNAP complexes . 
+ B. subtilis strains expressing histidine-tagged protein -- 168rpoCHis , 168sigAHis , 168nusAHis , and YK02 ( expressing His-tagged GreA ) -- were grown in 400 ml of Luria-Bertani ( LB ) medium containing erythromycin ( 0.5 μg/ml ) under aerobic conditions at 37 °C until cultures reached an optical density at 600 nm ( OD600 ) of 0.4 . 
+ Each culture was treated with formaldehyde ( 1 % ﬁnal concentration ) for 30 min at 37 °C . 
+ Cells were washed with Tris-buffered saline buffer ( pH 7.5 ) and stored at −80 °C . 
+ Afﬁnity puriﬁcation of RNAP complexes was performed according to a previously described procedure for ChAP-chip experiments ( 17 ) with the following modiﬁcations . 
+ Dithiothreitol was removed from the UT buffer , Dynabead Talon ( 50 μl ; Invitrogen ) was used instead of MagneHis beads , and elution of complexes from Dynabeads was performed twice with 400 μl of elution buffer . 
+ Recovered RNAP complexes were heated at 95 °C for 30 min to remove cross-linking , and the appropriate amounts of proteins were separated by using a 5 to 20 % SDS-PAGE gradient gel , followed by transfer to polyvinylidene diﬂuoride membrane ( GE Healthcare ) via electroblotting at 100 V for 1.5 h ( RpoC , SigA , and NusA ) or 4 h ( GreA ) . 
+ Western blotting was performed according to the instructions of the Amersham ECL Plus Western blotting detection system ( GE Healthcare ) using horseradish peroxidase-conjugated goat anti-rabbit or anti-mouse IgG ( Bio-Rad ) . 
+ Mouse polyclonal anti-RpoB antibody was obtained from Neoclone , and rabbit polyclonal anti-σA antibody was kindly provided by Fujio Kawamura ( 24 ) . 
+ Rabbit polyclonal anti-NusA and anti-GreA antibodies were prepared as described below . 
+ Preparation of anti-GreA or NusA peptide antibody . 
+ Peptides corresponding to residues 21 to 37 ( EGKQKLEQELEYLKTVK ) , 40 to 55 ( EVVERIKIARSFGDLS ) , and 141 to 157 ( TVQTPGGEMLVKIVKIS ) of GreA and to residues 57 to 73 ( RVFARKDVVDEVYDQRL ) , 228 to 244 ( EAGDRSKISVRTDDPDV ) , and 355 to 371 ( EDDEPLFTEPETAESDE ) of NusA were synthesized , and mixtures of three peptides were used to raise antisera against GreA or NusA in rabbits ( Sigma Genosys , Japan ) . 
+ Anti-GreA and anti-NusA peptide antibodies were subsequently puriﬁed from antiserum by using peptide afﬁnity column chromatography ( Sigma Genosys ) . 
+ ChAP-chip analysis . 
+ The strains used for ChAP-chip analysis were cultivated in 400 ml of LB medium containing the appropriate antibiotic ( s ) -- speciﬁcally , erythromycin ( 0.25 or 0.5 μg/ml ) , spectinomycin ( 50 μg/ml ) , and chloramphenicol ( 2.5 μg/ml ) -- under aerobic conditions at 37 °C until cultures reached an OD600 of 0.4 . 
+ The procedure for ChAP fraction preparation was similar to that for pulldown puriﬁcation of the RNAP complex , and the ﬁnal volume of the elution fraction was 40 μl . 
+ Cross-linked whole-cell extract fractions before puriﬁcation of RNAP in each experiment were used to prepare control DNA for ChAP-chip analysis . 
+ Protein-DNA cross-links were dissociated by heating overnight at 65 °C . 
+ DNA was subsequently puriﬁed by using QiaQuick ( Qiagen ) and eluted with 50 μl of nuclease-free water ( Ambion ) . 
+ Random ampliﬁcation and terminal labeling of DNA in whole-cell extracts or afﬁnity-puriﬁed fractions and hybridization to the custom Affymetrix tiling chip were performed as described previously ( 17 ) . 
+ The signal intensities of DNA isolated from the afﬁnity puriﬁcation and whole-cell extract fractions before puriﬁcation ( control DNA ) were adjusted to confer a signal average of 500 . 
+ The signal intensities of DNA in the afﬁnity-puriﬁed fraction were divided by those of control DNA for quantitative estimation of the enrichment of DNA fragments by afﬁnity puriﬁcation ( 37 ) . 
+ The binding signals represented by the enrichment values were visualized along the genome coordinate by using the In Silico Molecular Cloning Program , Array Edition ( In Silico Biology , Japan ) . 
+ All experiments were performed in duplicate . 
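The scaling and enrichment steps described above can be sketched in a few lines. This is only an illustration under stated assumptions: the function names `scale_to_mean` and `enrichment` and the toy probe values are hypothetical, and the sketch assumes that "adjusted to confer a signal average of 500" means a simple multiplicative rescaling of each array, which the text does not spell out.

```python
def scale_to_mean(signals, target_mean=500.0):
    """Rescale probe intensities so that their average equals target_mean.

    Assumption: a single multiplicative factor per array is used.
    """
    current_mean = sum(signals) / len(signals)
    factor = target_mean / current_mean
    return [s * factor for s in signals]


def enrichment(chap_signals, control_signals, target_mean=500.0):
    """Per-probe enrichment: scaled affinity-purified signal divided by
    scaled control (whole-cell extract) signal."""
    chap = scale_to_mean(chap_signals, target_mean)
    control = scale_to_mean(control_signals, target_mean)
    return [c / w for c, w in zip(chap, control)]


# Toy data (hypothetical): four probes along the genome.
chap_raw = [120.0, 480.0, 900.0, 100.0]
control_raw = [200.0, 210.0, 190.0, 200.0]
ratios = enrichment(chap_raw, control_raw)
for position, ratio in enumerate(ratios):
    print(position, round(ratio, 2))
```

Dividing by the control array is what removes probe-specific hybridization bias; probes with a ratio well above 1 are the ones enriched by the pulldown.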
+ Analysis of the TR of RNAP . 
+ The binding peaks were automatically detected ( 15 ) , with the threshold value set as 2.0 . 
+ We selected genes positioned immediately downstream of the σA binding sites , removing those located divergently and sharing the same σA binding sites . 
+ Consequently , we selected 268 genes with sufﬁcient RNAP signal intensities ( 0.95 ) and lengths ( 150 bp ) for traveling ratio ( TR ) calculation ( 15 , 29a ) . 
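The text does not define the traveling ratio itself; the exact procedure is in the cited references ( 15 , 29a ). The sketch below therefore rests on an explicit assumption: TR is taken as the mean RNAP enrichment over the gene body divided by the mean enrichment over a promoter-proximal window, so that accumulation near the promoter lowers the value. The function name `traveling_ratio`, the `promoter_window` parameter, and the toy profiles are all hypothetical.

```python
def traveling_ratio(signal, promoter_window):
    """Ratio of mean gene-body signal to mean promoter-proximal signal.

    signal: enrichment values ordered from the transcription start site
        into the gene.
    promoter_window: number of leading probes treated as promoter-proximal
        (a hypothetical parameter, not taken from the paper).
    """
    if not 0 < promoter_window < len(signal):
        raise ValueError("promoter_window must split the signal in two")
    proximal = signal[:promoter_window]
    body = signal[promoter_window:]
    return (sum(body) / len(body)) / (sum(proximal) / len(proximal))


# Toy profiles (hypothetical): even distribution versus promoter accumulation.
even_profile = [3.0, 3.0, 3.0, 3.0, 3.0, 3.0]
stalled_profile = [6.0, 6.0, 2.0, 2.0, 2.0, 2.0]
print(traveling_ratio(even_profile, promoter_window=2))
print(traveling_ratio(stalled_profile, promoter_window=2))
```

Under this convention an evenly distributed polymerase gives a TR near 1, and a polymerase that piles up near the promoter gives a TR well below 1.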
+ Transcriptome analysis . 
+ Total RNA was puriﬁed from wild-type , YK03 , and YK05 strains cultured in 200 ml of LB medium at 37 °C under aerobic conditions to an OD600 of 0.4 . 
+ Synthesis of cDNA , terminal labeling , and hybridization to the custom Affymetrix tiling chip were performed as described previously ( 4 ) . 
+ The signal intensities of perfectly matched probes ( only ) were used in this analysis and were adjusted to confer a signal average of 500 . 
+ Data visualization was performed by the In Silico Molecular Cloning Program . 
+ The average signal intensities of probes in each gene were calculated , and 2,824 genes with average signal intensities of 100 in wild-type , ΔgreA , and GreA-D44A cells were used to search for genes that were up- or downregulated upon inactivation of GreA . 
+ Comparison of transcriptome between the wild type and each greA mutant was performed by four different combinations using duplicate data for each strain . 
+ Array data . 
+ Raw data ( CEL format ) from ChAP-chip and transcriptome experiments have been deposited in ArrayExpress under accession numbers E-MEXP-3056 and E-MEXP-3055 , respectively . 
+ RESULTS
+ Distribution of GreA on the B. subtilis chromosome . 
+ To visualize the genome-wide association of GreA with RNAP , we created a strain expressing GreA tagged with 12 histidines at the C terminus under the control of the original promoter on the chromosome . 
+ GreA-His-expressing cells were cultivated in LB medium under aerobic conditions and harvested at an OD600 of 0.4 , followed by ChAP-chip analysis , as described earlier ( 15 ) . 
+ In parallel , we performed ChAP-chip analysis of the core RNAP ( β′ subunit ) , σA , and NusA , as well as transcriptome analysis using cells cultured under similar conditions . 
+ Typical distributions of protein-binding and transcription signals are shown in Fig. 2 , and the complete data set is presented in Fig. S2 in the supplemental material . 
+ The core RNAP binding signals started from the transcription start site ( 5′ edge of contiguous transcription signals ; gray line ) and were evenly distributed along the transcribed region ( Fig. 2A and B ) . 
+ The σA signals were observed symmetrically at the transcription start site ( Fig. 2C ) , while the NusA signals started slightly downstream of the transcription start site and were distributed throughout the transcribed regions ( Fig. 2D ) . 
+ These features are consistent with our previous ﬁndings ( 15 ) . 
+ We observed that GreA signals were distributed along the transcribed regions ( Fig. 2E ) , a ﬁnding similar to those for core RNAP and NusA . 
+ However , absolute signal intensities were lower and background signals were higher than binding signals of other proteins , probably because of indirect interaction of GreA with DNA and/or lower accessibility of His tag in the GreA-RNAP complex . 
+ In addition , we found no regions where GreA signals are observed without RNAP signals . 
+ Furthermore , we detected genome-wide positive correlation between the RNAP and GreA binding signals ( r = 0.86 , Fig. 3A ) in the coding regions , similar to NusA binding signals ( r = 0.94 , Fig. 3B ) . 
+ These results suggest that GreA is constantly associated with the majority of core RNAP during transcription elongation in B. subtilis cells , which is consistent with the overlapping localization of RNAP-green ﬂuorescent protein ( GFP ) and GreA-GFP ﬂuorescence ( 7 ) . 
+ The reduced correlation of signal intensities between GreA and RNAP in the low-signal-intensity region in Fig. 3A would be caused by the higher background signals of GreA . 
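The correlation reported above can be illustrated with a short calculation. The text gives only the values of r, so this sketch assumes they are Pearson correlation coefficients computed on per-gene binding signals; the function name `pearson_r` and the toy signal values are hypothetical and are not data from the study.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return covariance / (spread_x * spread_y)


# Toy per-gene binding signals (hypothetical values).
rnap_signal = [1.0, 2.0, 3.0, 4.0, 5.0]
grea_signal = [1.1, 1.9, 3.2, 3.8, 5.1]
print(round(pearson_r(rnap_signal, grea_signal), 3))
```

A higher, roughly constant background in one of the two signals weakens the correlation mainly among low-intensity genes, which fits the explanation given for the low-signal region of Fig. 3A.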
+ GreA is involved in the initiation and elongation of RNAP complexes . 
+ Several biochemical and structural studies have established that sigma factor and NusA compete for the same binding surface of core RNAP ( 10 , 41 ) , while GreA associates with core RNAP at a different site ( secondary channel ) . 
+ This ﬁnding suggests that , in addition to association with the elongation complex of RNAP , Gre factor also interacts with the initiation complex of RNAP . 
+ In support of this hypothesis , start sites of the GreA binding signals appeared to shift to transcription start sites , compared to those of NusA signals ( Fig. 2 and see Fig. S2 in the supplemental material ) . 
+ To conﬁrm GreA association with the RNAP initiation complex , we analyzed the composition of RNAP complexes with the pulldown assay using His-tagged RpoC , GreA , NusA , or σA as bait . 
+ Strains expressing 12×His-tagged GreA , NusA , σA , and RpoC were cultivated to an OD600 of 0.4 , and cellular proteins were cross-linked with formaldehyde . 
+ Subsequently , cross-linked protein complexes were puriﬁed with nickel magnetic beads , and proteins included within the puriﬁed complexes were fractionated by SDS-PAGE after the removal of cross-linking by heat treatment . 
+ For precise comparison of the amounts of components within each complex , protein mixtures containing similar concentrations of RpoB ( representing the amount of core RNAP ) were subjected to SDS-PAGE ( see [ ... ]
+ [ ... ] ΔgreA ( Fig. 5E ) and GreA-D44A strains ( Fig. 5F ) disclose decreased TR values in the majority of genes examined and not speciﬁc genes . 
+ Our ﬁndings suggest that GreA inactivation results in the stalling of RNAP at the promoter or promoter-proximal region , supporting its general involvement in stimulation of promoter escape or suppression of promoter-proximal pausing in B. subtilis cells . 
+ Notably , the GreA-D44A mutation exerted a more signiﬁcant effect on RNAP trafﬁcking than GreA deletion . 
+ Thus , it appears that GreA activity in assisting nucleolytic cleavage activity of RNAP is essential to resolve stalling at the promoter or promoter-proximal regions . 
+ Furthermore , it is possible that the GreA-D44A protein retains the ability to bind RNAP , similar to the E. coli mutant protein , which interferes with the intrinsic nucleolytic activity of active RNAP ( 20 ) , although no direct evidence to support this theory has been obtained . 
+ GreA contributes to resolving the stall of σA-RNAP . 
+ Next , we attempted to characterize the stalled RNAP complexes in GreA-inactivated cells , focusing on genes for which core RNAP peaks appeared clearly at the promoter or promoter-proximal regions in ΔgreA and GreA-D44A cells . 
+ We selected genes whose TR values were reduced by more than 0.20 , followed by visual inspection to strictly deﬁne core RNAP accumulation in greA mutants . 
+ As a result , 13 genes were identiﬁed in ΔgreA cells ( see Fig. S6 in the supplemental material , genes 1 to 13 ) and 34 genes in GreA-D44A cells ( see Fig. S6 in the supplemental material , genes 2 to 35 ) . 
+ Among these , 1 and 22 genes were reproducibly detected only in ΔgreA and GreA-D44A cells , respectively , and 12 genes were reproducibly detected in both strains . 
+ We further examined whether these stalled complexes were σA-RNAP or NusA-RNAP by searching for genes in which σA and NusA peaks could be discriminated . 
+ We identiﬁed 10 genes ( marked by asterisks in Fig. S6 in the supplemental material ) in which σA and NusA peaks overlapped to a lesser extent . 
+ In most of these genes , accumulated RNAP peaks overlapped with σA peaks ( Fig. 6 ) , suggesting that the majority of RNAP peaks induced by GreA inactivation constitute the σA-RNAP complex . 
+ The greA deletion and D44A substitution have little impact on the transcriptome . 
+ Finally , we investigated the impact of GreA inactivation on genome-wide transcriptional regulation in B. subtilis cells . 
+ Total RNA was prepared from wild-type , ΔgreA , and GreA-D44A cells cultivated in LB medium under aerobic conditions and harvested at an OD600 of 0.4 , and transcriptome proﬁles were obtained by using the tiling chip used for the ChAP-chip experiments , as described earlier ( 4 ) . 
+ We selected 2,824 genes with average signal intensities of 100 in the coding regions in all three strains and generated scatter plots of their transcription signal intensities , as shown in Fig. 7 . 
+ Next , we searched for genes that are up- or downregulated by 2.8-fold ( i.e. , log2 1.5 ) in ΔgreA and GreA-D44A cells , compared to wild-type cells , with P values ( Student t test ) lower than 0.05 . 
+ As a result , 28 upregulated and 35 downregulated genes were identiﬁed ( see Table S3 in the supplemental material ) . 
+ Among the 28 upregulated genes , 17 genes were upregulated in both mutant strains , and 24 and 21 genes were upregulated in greA and GreA-D44A cells , respectively . 
+ Similarly , 15 genes were downregulated in both mutant strains , and 24 and 26 genes were downregulated in greA and GreA-D44A , respectively . 
+ Furthermore , we observed no correlation between changes in the transcription level and RNAP accumulation ( Fig. 7 ) . 
+ These results indicate that inactivation of GreA has a limited impact on the transcriptome , and these effects are not directly related to RNAP accumulation in the promoter or promoter-proximal regions . 
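The fold-change cutoff and the gene counts in this section can be checked with a little arithmetic. The sketch below is an illustration only: the helper names `passes_fold_change` and `union_size` are hypothetical, the cutoff is assumed to be inclusive, and the Student t test filter ( P < 0.05 ) is deliberately left out because the text does not say how the four pairwise comparisons were combined.

```python
import math


def passes_fold_change(mutant_mean, wild_type_mean, log2_cutoff=1.5):
    """True if the absolute log2 ratio reaches the cutoff (up or down).

    Assumption: the cutoff is inclusive (>=).
    """
    log2_ratio = math.log2(mutant_mean / wild_type_mean)
    return abs(log2_ratio) >= log2_cutoff


def union_size(count_a, count_b, count_both):
    """Inclusion-exclusion: genes found in strain A or strain B."""
    return count_a + count_b - count_both


# A log2 cutoff of 1.5 corresponds to about a 2.8-fold change.
print(round(2 ** 1.5, 2))

# Hypothetical mean intensities: a 3-fold change passes, a 2.5-fold one does not.
print(passes_fold_change(300.0, 100.0))
print(passes_fold_change(250.0, 100.0))

# Reported counts: upregulated and downregulated genes in the two mutants.
print(union_size(24, 21, 17))
print(union_size(24, 26, 15))
```

The same inclusion-exclusion step reproduces the RNAP accumulation counts given earlier: 13 genes in the deletion strain and 34 in the D44A strain, with 12 shared, cover 35 genes in total.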
+ DISCUSSION
+ To our knowledge , this is the ﬁrst report on genome-wide distribution analysis of the bacterial elongation factor , Gre . 
+ The cellular level of B. subtilis GreA is twice that of RNAP , and the majority of GreA associates with RNAP ( 7 ) . 
+ We have shown here that GreA is evenly distributed from the promoter to coding regions and overlaps with RNAP engaged in transcription in B. subtilis ( Fig. 2 and 3 and see Fig. S2 in the supplemental material ) . 
+ Gre factors were previously proposed to transiently associate with stalled RNAP ( 9 ) . 
+ However , our data strongly suggest that GreA is not speciﬁcally recruited to stalled RNAP . 
+ In addition , pulldown assays of the components of RNAP complexes demonstrated that GreA associates not only with the elongation complex of RNAP ( NusA-RNAP ) but also with the initiation complex ( σA-RNAP ) ( Fig. 4 ) . 
+ However , although the copuriﬁcation analysis suggests that His-tagged GreA minimally retains binding ability to core RNAP ( Fig. 4A ) , it is possible that the His tag addition affects some GreA function and/or its binding afﬁnity to RNAP , and this requires further investigation . 
+ GreA inactivation had no clear effects on the distribution of elongating RNAPs but induced a genome-wide shift in TR values , a ﬁnding indicative of RNAP pausing at promoter or promoter-proximal regions ( Fig. 5 ) . 
+ Clear RNAP peaks were detected at the promoter or promoter-proximal regions of 35 genes in GreA-inactivated cells ( Fig. 6 and see Fig. S6 in the supplemental material ) . 
+ Furthermore , the majority of the induced RNAP peaks colocalized with σA peaks , suggesting the accumulation of σA-RNAP . 
+ In E. coli cells , Gre factors enhance promoter escape and suppress promoter-proximal pausing of σA-RNAP ( 11 , 13 , 21 , 34 , 35 ) . 
+ Based on these ﬁndings , we propose that B. subtilis Gre factor plays a similar role during the initiation of RNA synthesis in many promoters or promoter-proximal regions . 
+ Although the resolution of our ChAP-chip analysis did not permit discrimination of RNAP accumulation at promoters or promoter-proximal regions , we favor promoter-proximal pausing as the site of accumulation , since B. subtilis RNAP is known to form unstable open complexes and synthesize smaller amounts of abortive transcripts than E. coli RNAP ( 2 , 40 ) . 
+ Recently , it was reported that the pausing of RNAP in E. coli is induced by direct and sequence-speciﬁc interactions of RNAP with promoter-like sequences ( 6 , 29 ) . 
+ However , we have not yet found any correlation between RNAP stalling and promoter-like sequences at the promoter-proximal regions in B. subtilis . 
+ Further in vitro analysis of the effects of GreA on transcription initiation by B. subtilis RNAP and bioinformatics studies on the signals inducing RNAP pausing at promoter-proximal sites are required to elucidate the molecular mechanism of RNAP accumulation in greA mutants . 
+ The RNAP accumulation observed in ΔgreA cells was also detected in GreA-D44A cells , supporting the hypothesis that Asp44 of the B. subtilis GreA is essential to resolve the pausing of RNAP through stimulation of nucleolytic cleavage activity of RNAP . 
+ Interestingly , the effects of the GreA-D44A mutation on RNAP pausing were more extensive than those of the greA mutation . 
+ The E. coli Gre protein mutated at D41 ( corresponding to D44 in B. subtilis GreA ) inhibits elongation of transcription in vitro ( 20 ) . 
+ Recently , overexpression of a yeast TFIIS mutant harboring substitutions of two amino acids that stimulate intrinsic nucleolytic activity of RNAP was found to be lethal in yeast cells ( 30 ) . 
+ Based on the collective results , we propose that the B. subtilis GreA-D44A protein retains the ability to bind RNAP , and this binding interferes with the intrinsic nucleolytic activity of RNAP , which remains active , even without stimulation by Gre . 
+ However , in contrast to the data obtained with yeast , growth defects were not observed upon expression of the GreA-D44A protein , suggesting that the pausing of RNAP during the initiation and elongation of transcription does not occur frequently in B. subtilis cells , and the problem of stalling may be resolved by ways other than cleavage of the extruded 3′ terminus of nascent RNA such as , for example , collapse of the association of RNAP with the DNA template . 
+ In E. coli cells , GreA inactivation has direct and negative effects on the transcription initiation frequencies of a number of genes ( 34 ) . 
+ However , we could not establish a direct impact of GreA inactivation on the transcriptome in B. subtilis cells under normal growth ( LB medium and aerobic ) conditions , even though RNAP trapping or pausing at promoters or promoter-proximal regions was induced in many genes in mutant strains . 
+ These observations suggest that the trapping or pausing frequency is lower in B. subtilis cells , compared to that in E. coli , probably due to differences in the biochemical properties of the two RNAP types . 
+ Initiation complexes of B. subtilis RNAP are efﬁciently converted to elongation complexes in vitro and in vivo , compared to E. coli RNAP ( 2 , 15 , 29a , 40 ) . 
+ As in several other bacteria , GreA function may be essential for B. subtilis growth under stress conditions that induce frequent pausing of RNAP . 
+ However , the phenotypes of greA mutants under various conditions are yet to be established , and further studies are required to understand the biological importance of GreA in B. subtilis cells . 
+ ACKNOWLEDGMENTS
+ We are grateful to Hiroki Takahashi for the suggestion of the statistical analysis . 
+ This study was supported by a KAKENHI grant-in-aid for scientiﬁc research in the Priority Area `` Systems Genomics '' from the Ministry of Education , Culture , Sports , Science , and Technology of Japan .
\ No newline at end of file
--- a/data/TEXT_FILES/notuseful_txt/22555467.txt 0 → 100644
View file @27818a9
+++ b/data/TEXT_FILES/notuseful_txt/22555467.txt 0 → 100644
View file @27818a9
+ Signature tagged mutagenesis in the functional
+ Keywords : signature tagged mutagenesis , virulence , gastrointestinal , pathogenesis , pathogen , gut , GI tract 
+ Introduction
+ PCR . 
+ The original tag consisted of a short DNA sequence of 40 bp that was flanked by two invariant arms of 20 bp . 
+ The region between the variable and invariable region contained a restriction enzyme site that could be used to release the arms from the central regions following amplification and labeling therefore allowing for tag specific probes to be generated . 
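+ The tag layout described above can be sketched in a few lines of Python . This is only an illustration under stated assumptions : the arm sequences are hypothetical placeholders ( the real invariant arms are not given in the text ) , the restriction site is omitted , and the names make_tag , variable_region and make_pool are invented for this example . 

```python
import random

# Hypothetical placeholder arms (20 bp each); the real arm sequences
# are not given in the text.
LEFT_ARM = "ACGT" * 5
RIGHT_ARM = "TGCA" * 5
VARIABLE_LENGTH = 40  # length of the variable central region, in bp


def make_tag(rng):
    """Build one tag: invariant arm + random 40 bp region + invariant arm."""
    variable = "".join(rng.choice("ACGT") for _ in range(VARIABLE_LENGTH))
    return LEFT_ARM + variable + RIGHT_ARM


def variable_region(tag):
    """Return the central variable region, i.e. the tag without its arms."""
    return tag[len(LEFT_ARM):len(tag) - len(RIGHT_ARM)]


def make_pool(size, seed=0):
    """Generate a pool of `size` distinct tags (one tag per mutant)."""
    rng = random.Random(seed)
    tags = set()
    while len(tags) < size:
        tags.add(make_tag(rng))
    return sorted(tags)
```

+ Calling make_pool ( 96 ) would give one distinct 80 bp tag per mutant for a pool at the upper end of the sizes mentioned in this review . 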
+ The original method to detect the signature tags was by dot blot but over the years there have been many methods for detection of signature tags ( PCR , polymorphic tag-length transposon mutagenesis ) . 
+ STM was initially developed to identify virulence genes in Salmonella enterica serovar Typhimurium but has subsequently been used in many screens in other bacterial species as well as in the yeast Saccharomyces cerevisiae , the fungus Cryptococcus neoformans and the parasite Toxoplasma gondii.1-10 
+ For any bacterial pathogen there are several critical parameters that must be followed in order to ensure an efficient STM screen in vivo . 
+ First , the transposon chosen should insert randomly into the chromosome , a property which varies depending on the transposon . 
+ It was previously noted that the Tn917 transposon system used in Listeria monocytogenes has a tendency for hot-spots , while this is not the case with a recently developed mariner transposon system pJZ037.11-14 Second , the pool size must be determined with respect to the inoculum dose ( normally this ranges between 48 -- 96 mutants per pool ) . 
+ Finally , the route of administration and the infecting dose must be established , along with the best time frame in which to evaluate a possible attenuation in virulence . 
+ The main advantage of this system compared with other classical gene inactivation methods ( targeted or random ) is that STM is a negative selection screen allowing the discovery of virulence genes without prior knowledge of their nature or function .15 Secondly , as a large number of mutants can be screened in tandem ( up to 96 ) , this method is in principle much faster and more exhaustive in identifying virulence factors compared with standard transposon systems . 
+ The disadvantage of this system is that it is limited to finding non-essential genes ( i.e. , genes not required for growth in broth ) . 
+ Furthermore some DNA tags are unable to be amplified from chromosomal DNA of bacteria after recovery from the animal host . 
+ The reason for this is not known but it can result in loss of reproducibility and the identification of false negatives . 
+ This can be overcome by re-organizing the candidates in a new pool that is then re-screened in the animal . 
+ While this may be time-consuming it reduces the number of false-negative candidates . 
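+ The negative selection logic and the re-screening step can be sketched as simple set operations . This is a minimal illustration , not the procedure used in any of the cited screens ; the tag names and the helpers missing_tags and confirmed_candidates are hypothetical . 

```python
def missing_tags(input_pool, output_pool):
    """Tags present in the inoculum but not detected after recovery from the host."""
    return sorted(set(input_pool) - set(output_pool))


def confirmed_candidates(first_screen, rescreen):
    """Keep only candidates that were missing in both the first screen and the re-screen."""
    return sorted(set(first_screen) & set(rescreen))


# Hypothetical example data: five tagged mutants in the input pool.
input_pool = ["tag01", "tag02", "tag03", "tag04", "tag05"]
output_first = ["tag01", "tag03"]  # tags detected after the first passage

# Candidates from the first screen are re-organized into a new pool.
candidates = missing_tags(input_pool, output_first)

# In the re-screen tag04 is detected, so its earlier absence was probably
# a detection failure rather than attenuation.
output_rescreen = ["tag04"]
confirmed = confirmed_candidates(candidates, missing_tags(candidates, output_rescreen))
```

+ Only mutants whose tags are absent in both rounds are carried forward , which is the reason the re-screen lowers the number of spurious candidates at the cost of extra animal work . 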
+ Another drawback encountered with infections , particularly diarrheal disease , are still the third most common cause of death in children under 5 years of age .19 An example of such a disease is Salmonella enterica serovar Typhi infection ( typhoid fever ) , which results in more than 2 million infections a year leading to approximately 200,000 deaths .20 Bacterial pathogens have developed several intricate systems to evade detection by the immune response and to circumnavigate the stresses they may encounter in the GI tract ( pH , osmotic stress , bile and acid stress ) .21 -23 Following ingestion , the first physical stress encountered by the bacterium is the low pH of the stomach ( pH 2 ) , followed by the increased osmolarity of the upper small intestine ( equivalent to 0.3 M NaCl ) and in the duodenum , the antimicrobial activity of the biological detergent bile ( 1 L of bile is produced in the liver , stored interdigestively in the gall bladder and secreted into the duodenum each day ) .24 which functions as a Ca2 + dependent metallo-protease with collagenolytic activity . 
+ The mutant strain produced half as much proteolytic activity as the wild-type and was unable to degrade collagen .26 Collagen type I and III are important components of the extracellular matrix of the stomach epithelium . 
+ Furthermore type I collagen is present in the area around gastric ulcers and is important for the process of ulcer healing .27 Kavermann and colleagues suggest that the secretion of a collagen degrading enzyme by H. pylori could be responsible for the persistence of gastric or duodenal ulcerogenesis and the delayed healing process . 
+ Vibrio cholerae
+ Cholera is an acute diarrheal disease which is characterized by discharge of voluminous rice water stool caused by toxigenic Vibrio cholerae strains .28 The pathogen enters the host through the oral route of infection , transits the gastric acid barrier of the stomach and colonizes the small intestine . 
+ Once established within this niche the bacteria begin to produce the cholera toxin ( CT ) , which is responsible for the diarrheal disease that is characteristic of cholera .28 The ensuing diarrheal disease can lead to death by dehydration within hours of infection . 
+ V. cholerae O1 and O139 serogroups producing cholera toxin ( CT ) are mainly responsible for cholera outbreaks that can cause havoc in highly populated regions in Asia , Africa and Latin America .28 accumulation of intracellular K + . 
+ However this screen was the first to link reduced colonization and decreased survival in organic acids to a gshB mutation in V. cholerae . 
+ An additional virulence factor shown to be important in colonization and acid tolerance response was a HepA homolog . 
+ HepA was originally identified in E. coli as a protein that co-purified with RNA polymerase and inhibited the binding of sigma 70.38 A mutation in E. coli in this gene leads to increased sensitivity to UV damage .38 It is thought that acid stress causes DNA damage and if HepA is required for synthesis of the DNA repair enzymes it would establish a link between the two functions . 
+ Both of these mutations resulted in a 1000 fold decrease in colonization signifying the importance of both of these genes in the infant mouse model .37 An interesting finding from this screen is that over 20 % of the genes identified in the screen play a role in energy metabolism . 
+ This indicates that the environment in the small intestine of the suckling mouse is nutrient limited . 
+ Therefore to survive within such an environment V. cholerae must actively employ multiple pathways for energy source acquisition and survival . 
+ E. coli is a common member of the commensal bacteria of the large intestine ; however , certain strains have the ability to cause disease . 
+ Enterohemorrhagic E. coli ( EHEC ) is known to cause disease in humans associated with diarrhea and hemorrhagic colitis.39-41 EHEC serotype O157 : H7 has emerged as a major cause of severe diarrhea worldwide and EHEC is the leading predecessor to pediatric acute renal failure in many countries .42,43 Healthy ruminants are the principal reservoir of EHEC and human infections occur by ingestion of meat or dairy products contaminated with ruminant feces .44,45 To increase the knowledge of EHEC factors associated with bovine colonization an STM mutant bank was created in an O157 : H7 background .46 A total of 1900 mutants were screened by oral inoculation of 10 -- 14-day old calves with recovery of the output pool 5 days post-infection . 
+ 79 mutants were identified to be absent or poorly represented in the output pool . 
+ All the genes were grouped according to their function and consisted of genes involved in TTSS , surface structure , O-islands , regulatory genes , genes involved in central intermediary metabolism and hypothetical genes . 
+ Thirteen transposons were inserted into genes on the locus of enterocyte effacement ( LEE ) . 
+ LEE encodes a TTSS required for formation of attaching and effacing lesions on the intestinal epithelia . 
+ Their data demonstrated that the structural component of TTSS escC plays a vital role in colonization of the calves as this mutant was highly attenuated following oral infection of calves . 
+ This was the first time that the structural components of the TTSS were implicated in colonization of the calf intestine .46 Furthermore this work demonstrated that colonization of the bovine intestines requires multiple elements not associated with LEE . 
+ Their evidence suggests that a novel fimbrial locus ( z2199-z2206 ) plays an important function in intestinal colonization .46 This was the first comprehensive study to elucidate the genes required by E. coli O157 : H7 for infection of bovine intestine and this new information can be used to facilitate the grouped into two classes , with class B pilins being associated with intestinal infections .53 Mundy and colleagues suggest that disruption of cfcH is likely to prevent the assembly and/or secretion of CFC pili .50 This would result in C. rodentium cells being unable to adhere to colonic epithelia and therefore unable to establish an infection .50 Another gene identified in this screen encodes a novel type III secreted protein , EspI , which is encoded outside the LEE region and is present in the sequenced A/E EHEC and EPEC pathogens .49 The function of this gene has not been elucidated but it has been shown to play an important role in both bacterial colonization of colonic epithelium of infected mice and induction of hyperplasia in the colonic epithelium of infected mice .49 The second STM screen performed in C. rodentium utilized the C57BL/6 mouse model . 
+ The study examined 576 mutants of which 19 were attenuated for survival at 5 -- 7 d post-infection .48 Several insertions corresponded to previously identified virulence genes , including the gene cluster cfc and the espI . 
+ However one interesting finding was an insertion in the gene encoding a putative translocation effector of A/E pathogens , NleB .48,54 An nleB deletion mutant was constructed and tested for its ability to colonize the mouse . 
+ As with the transposon mutant the nleB deletion mutant was outcompeted by the wild-type in mixed infections . 
+ In addition , in single infections it was also shown to be essential for colonization and virulence .48 NleB is also present in the EHEC O157 : H7 strain indicating that C. rodentium is an invaluable small animal model to represent other A/E pathogens and to test the role of new effector proteins in disease .48 
+ Campylobacter jejuni
+ Campylobacter jejuni is the most common bacterial cause of food-borne disease in the developed world , with estimates that it infects 1 out of 100 individuals in the United States and United Kingdom .55,56 In the developed world , campylobacteriosis is common in neonates and young adults resulting in mild bloody diarrhea , abdominal cramps and the presence of fecal leukocytes .57,58 Although the vast majority of cases are self-limiting , campylobacter can cause severe post-infection complications , such as bacteraemia and polyneuropathies such as Guillain-Barré and Miller-Fisher syndrome .59 C. jejuni usually infects the avian gastrointestinal ( GI ) tract particularly of chickens .60 During the slaughtering process , the GI contents may contaminate the meat products and ingestion and handling of contaminated meats are a main cause of sporadic cases of C. jejuni disease .55 STM has been used in both early and late chicken models of infection with varying degrees of success . 
+ The first STM screen in C. jejuni analyzed cecal colonization of chicks in a 1-d old chick model of commensalism .61 In total 1550 C. jejuni mutants were screened of which 29 were attenuated for colonization representing 22 different genes required for wild type levels of infection .61 The vast majority of the mutants ( 17 ) exhibited a non-motile phenotype or displayed altered flagellar motility .61 It was previously known that motility of C. jejuni is required for wild-type levels of cecal colonization therefore validating the efficacy of the screen .62,63 Of the remaining mutants two were of particular interest Cj0019c and Cj0020c chosen for proof-in-principle of STM because of its excellent genetic systems and well validated animal models .1,73 The original STM screen identified a novel pathogenicity island , SPI-2 74 . 
+ SPI-2 was only identified due to its role in disseminated infection . 
+ SPI-2 encodes a type three secretion system ( TTSS ) which is required for intracellular replication and systemic infection .74 Furthermore , SPI-2 genes have been shown to be involved in the survival of Salmonella within the macrophages and play a role in avoidance of NADPH oxidase-dependent killing.75-77 Recently SPI-2 mutants have been included in live attenuated vaccines in S. typhi and S. typhimurium indicating how STM has led from proof-of-principle all the way to potential clinical applications .78,79 As stated earlier S. typhimurium can colonize a range of different hosts and STM has been used to try to elucidate both species specific virulence factors and common colonization factors to allow a better understanding of how this bacterium is able to infect such a wide variety of different niches . 
+ A previous mini-Tn5Km2 STM bank was used to screen mutants for attenuated virulence in both calves and chicks .80 S. typhimurium infection in calves results in enterocolitis followed by systemic infection , while infection of 2-week old chicks results in asymptomatic cecal colonization . 
+ The STM screen in the calves recovered mutants 3 -- 5 days after infection from homogenized ileal mucosa while in the chicks the mutants were recovered from the ceca at 4 d post-infection .80 In total 1045 mutants were screened in both hosts . 
+ Of the screened mutants 75 were associated with attenuation in the calves , 61 were associated with attenuation in chicks alone and 52 mutants were attenuated in both species .80 A large proportion ( n = 40 ) of the mutated genes were within Salmonella pathogenicity islands ( SPIs 1 -- 5 ) . 
+ All of the mutants with a transposon insertion in SPI-1 or SPI-2 were deficient for colonization of the calf model but only 3/32 of these SPI mutants resulted in poor colonization of the chick ceca .80 This suggests that S. typhimurium is much less dependent on TTSS-1 and TTSS-2 to colonize the intestines of chicks compared with calves .80 Furthermore SPI-4 was found to be required for colonization of calf ileum but not for cecal colonization of chickens .80 Several genes required for production of LPS were identified by this in vivo screen as being attenuated in both calf and chick colonization models .80 LPS is widely considered to play a role in protecting the bacteria against host defense mechanisms such as bile salts , gastric acidity and phagocytes . 
+ The precise role of LPS in Salmonella virulence is not yet known but it clearly has a major function in colonization of the intestinal sites of different hosts . 
+ Of the genes associated with attenuation within the chick model several mutants had transposon insertions in genes required for production of six different fimbriae .80 Fimbriae are used by bacteria to adhere to each other as well as host surfaces , and this data indicates that fimbriae are important for colonization within the intestine of the chicken . 
+ The same STM mutant library was used to identify novel genes associated with virulence in the intestinal colonization of pigs .81 This screen identified 119 mutants attenuated for virulence in the porcine model of infection . 
+ Of these 119 mutants , the transposon insertion site of 79 had been identified in the previous screen .80 The remaining 40 transposon mutants were associated appropriate doses to be efficiently used to identify Salmonella mutants with altered fitness in vivo .86 Furthermore it is the first step toward a more complete description of Salmonella genes involved in systemic infection , in particular genes that may have a milder phenotype that are difficult to detect by the older STM methods .86 
+ Listeria monocytogenes
+ Listeria monocytogenes is a Gram-positive food-borne pathogen responsible for life-threatening infections in humans and animals . 
+ It is a facultative intracellular pathogen capable of entering a wide variety of host cells , including epithelial cells , hepatocytes , fibroblasts , endothelial cells and macrophages.91-94 Infection occurs in a step-wise manner consisting of entry into the host , lysis of the phagosomal vacuole , multiplication in the cytosol and direct cell to cell spread using actin based motility .95 Each step is dependent on virulence factors which are located in a cluster of genes encoding a regulatory protein ( PrfA ) , a phosphatidylinositol specific phospholipase C ( PlcA ) , the hemolysin listeriolysin O ( LLO ) , a metalloprotease ( Mpl ) , an actin recruiting protein ( ActA ) and a lecithinase ( PlcB ) .95 A second locus encodes two proteins involved with invasion , InlA and InlB . 
+ Expression of these virulence genes is controlled by the pleiotropic regulator PrfA . 
+ A STM approach was implemented in the L. monocytogenes EGDe background using a Tn917 derivative transposon . 
+ Mutants were screened in vivo for reduced colonization of the spleen and liver at 72 h post-infection .96 From this screen the response regulator VirR was identified . 
+ VirR ( Virulence Regulator ) has high homology to the OmpR-PhoB family of regulators .96 It is part of a seven gene operon , which also contains the sensor kinase , VirS . 
+ The VirRS two-component system ( TCS ) is novel in that unlike most TCS the constituent genes are not adjacent to each other but are separated by three other loci .96 The results obtained from this study demonstrated that the ΔvirR strain had a reduced ability to grow and multiply within both liver and spleen even after 24 h indicating a crucial role for VirR in the establishment of successful L. monocytogenes infection .96 This STM screen also identified another novel virulence factor in L. monocytogenes designated FbpA . 
+ This gene has strong homology to atypical fibronectin-binding proteins such as PavA of Streptococcus pneumoniae , Fbp54 of S. pyogenes and FbpA of S. gordonii.97-99 Fibronectin is a dimeric glycoprotein that has a critical role in eukaryotic cellular processes such as adhesion , migration and differentiation .100 However , many bacteria such as S. pneumoniae and Staphylococcus aureus utilize fibronectin to facilitate their internalization into epithelial cells .101 Mutation of fbpA in L. monocytogenes resulted in a 100-fold decrease in bacterial counts in the intestine and liver in orally infected mice when compared with the wild-type strain after 72 h .100 Furthermore , there was a 10-fold decrease in bacterial counts in the mesenteric lymph between mutant and wild-type but no difference in the splenic bacterial counts .100 These data indicated that FbpA is involved in the hepatic phases of listeriosis and represents a novel virulence factor . 
+ scores for 1,645 mutants which represented insertions in 855 different genes . 
+ This represented 91.1 % of mutants analyzed while the previous STM bank only identified insertion sites in 4.2 % of mutants .107 Furthermore the STM screen in O157 : H7 identified 13 attenuating mutations in LEE genes but sequencing identified 54 insertions in the LEE region which corresponded to 21 different genes .107 Analysis of the STM bank of EHEC O26 : H - demonstrated a role for cytotoxins ( EhxA and PssA ) during pathogenesis but these genes were not identified in the EHEC O157 : H7 screen .46,47 Parallel sequencing revealed that several such mutants were represented in the library and were generally negatively selected in calves .107 A similar massively parallel sequencing approach called INSeq ( insertion sequencing ) was developed independently to analyze transposon insertion sites in the gut commensal organism Bacteroides thetaiotaomicron .108 The authors utilized this approach to analyze genes required for colonization of conventional and germ-free or mono-colonized gnotobiotic mice . 
+ The work revealed how colonization by Bacteroides is influenced by existing populations in the gut and competition for key nutrients in this environment . 
+ STM is a powerful genetic tool that allows identification of genes that are important for different facets of pathogenesis and is well suited for analysis of elements required for gut colonization and localized pathogenesis . 
+ Recent technical advances in the screening , choice and identification of negative selection screens have broadened its applicability and versatility . 
+ One such technical advance is the development of a positive STM screen .109 This was used to screen for patho-adaptive Pseudomonas aeruginosa mutants promoting survival in the cystic fibrosis lung .109 This novel approach could be applied to other pathogens that enhance fitness in the host through patho-adaptive mutations and could provide a basis for a more comprehensive understanding of chronic infectious disease .109 Overall the STM tool is an important method for better understanding the behavior of microbes in the gut and other environments and in conjunction with other genome wide techniques ( microarray technology , in vivo expression technology , TraDIS ) can be used to fully understand the multi-faceted nature of bacterial pathogenesis . 
+ It is expected that the results from these STM screens can be used to help develop vaccines or drugs to prevent additional infections and decrease the economic burden associated with such infections . 
+ Acknowledgments
+ Joanne Cummins is supported by funding from the Science Foundation Ireland Research Frontiers Programme ( 08-RFP-Gen1320 ) . 
+ We acknowledge the support of funding from the Alimentary Pharmabiotic Centre , University College Cork under the Science Foundation Ireland Centres for Science Engineering and Technology ( CSET ) program .
\ No newline at end of file
--- a/data/TEXT_FILES/notuseful_txt/22890136.txt 0 → 100644
View file @27818a9
+++ b/data/TEXT_FILES/notuseful_txt/22890136.txt 0 → 100644
View file @27818a9
+ Under-representation of intrinsic terminators across bacterial genomic islands: Rho
+ Horizontal gene transfer Intrinsic termination Hairpin RNA polymerase Pathogenicity 
+ 1. Introduction
+ Transcription involves synthesis of RNA by RNA polymerase ( RNAP ) on a DNA template and is functionally divided into initiation , elongation and termination ( von Hippel , 1998 ) . 
+ The last step i.e. termination involves stopping of elongation , release of the RNA and dissociation of the RNAP machinery ( Richardson and Greenblatt , 1996 ) . 
+ In bacteria , termination functions by two mechanisms -- intrinsic and factor-dependent ( Peters et al. , 2011 ; Santangelo and Artsimovitch , 2011 ) . 
+ At intrinsic terminators ( ITs ) , termination is effected by the sequence and structural features of the hairpin and the U-trail of the nascent RNA ( Epshtein et al. , 2007 ) . 
+ In contrast , factor-dependent termination predominantly involves the Rho protein which shows little preference for any speciﬁc sequence or structure on the RNA and the template DNA for its activity ( Ciampi , 2006 ; Richardson , 2002 ) . 
+ Rho seems to be the major termination factor for genes that do not have an IT downstream . 
+ It has been speculated that several genes are likely targets for Rho in vivo , although only few have been characterized ( Ciampi , 2006 ) . 
+ Historically , termination has received relatively less attention than the ﬁrst two steps of transcription . 
+ In the recent post-genomics era its regulatory importance in the context of the whole cell is being understood ( Cardinale et al. , 2008 ; Peters et al. , 2009 ) . 
+ Abbreviations : GI , genomic islands ; IT , intrinsic terminator ; HGT , horizontal gene transfer ; RNAP , RNA polymerase . 
+ Corresponding author at : Department of Microbiology and Cell Biology , Indian Institute of Science , Bangalore-560012 , India . 
+ Tel. : +91 80 22932598 ; fax : +91 80 23600668 . 
+ E-mail address : vraj@mcbl.iisc.ernet.in ( V. Nagaraja ) . 
+ An outcome of the large-scale sequencing and annotation of genomes is the `` pangenome '' concept . 
+ It is now understood that horizontal gene transfer has played a pivotal role in the evolution of prokaryotes ( Boto , 2010 ; Boyd et al. , 2009 ; Juhas et al. , 2009 ; Ochman et al. , 2000 ) . 
+ In a nutshell , horizontal gene transfer ( HGT ) is the acquisition of DNA from the environment and its integration into the genome of the recipient species . 
+ The genes would be inherited by the daughter cells , even though they were not transmitted `` vertically '' . 
+ Such genes or gene clusters ( henceforth , generically referred to as genomic islands ( GI ) ) , code for various protein ( s ) with myriad functions . 
+ Their acquisition can result in `` quantum leaps '' by bacterial genomes ( Boto , 2010 ; Nakamura et al. , 2004 ) . 
+ However , un-concerted expression of any recently-acquired gene ( s ) or expression of toxic proteins from bacteriophages ( Canchaya et al. , 2004 ; Casjens , 2003 ) can have disastrous effects on cellular homeostasis of the host . 
+ Hence , after entering a genome , most GIs are repressed by silencing mechanisms that act at different stages of the gene expression process ( Navarre et al. , 2007 ) . 
+ Although mechanisms that control initiation and repression of transcription in GIs have been studied , the importance of transcription termination at genomic islands was noticed only recently . 
+ In Escherichia coli , the global regulatory role of Rho has emerged from two studies using either microarray or ChIP-chip approaches ( Cardinale et al. , 2008 ; Peters et al. , 2009 ) . 
+ Such studies have unambiguously shown that in E. coli Rho-dependent termination is important in suppressing aberrant expression from the genomic islands including prophages . 
+ In this manuscript , we propose that the suppression of transcription in GIs could be a universally conserved function of Rho . 
+ We show that Rho-dependent termination indeed seems to play a similar role in regulating expression at GIs across diverse bacterial phyla . 
+ Furthermore , based on the experimental understanding about the mechanism of interactions of Rho with the nascent RNA and RNAP ( Dutta et al. , 2008 ) , we suggest that the lesser density of ITs in GI could actually facilitate Rho-dependent termination . 
+ 2. Materials and methods
+ The program GeSTer can recognize both canonical and non-canonical intrinsic terminators . 
+ The mode of action of GeSTer has been described earlier ( Mitra et al. , 2009 , 2010 ; Unniraman et al. , 2002 ) . 
+ All genome sequences were downloaded in their GenBank format from NCBI ( ftp://ftp.ncbi.nih.gov/genbank/genomes/Bacteria/ ) . 
+ Sequence information about GIs was obtained from available literature , and from the NCBI lists for individual genomes . 
+ Once GeSTer had identiﬁed all the ITs for a genome , we analyzed the intrinsic terminator-content in the GIs of that genome . 
+ For a given GI , the density of ITs ( DIT ) was calculated as the [ ( number of ITs identiﬁed ) / ( number of genes ) ] × 100 . 
+ Similarly , the genomic DIT = [ ( number of ITs identiﬁed in genome ) / ( number of genes in genome ) ] × 100 . 
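+ The DIT definition above translates directly into a small helper . This is a minimal sketch ; the function name density_of_its and the example counts are hypothetical and are not taken from GeSTer or from the data in this study . 

```python
def density_of_its(num_its, num_genes):
    """Density of intrinsic terminators (DIT), in percent.

    DIT = (number of ITs identified / number of genes) * 100
    The same formula is applied to a single genomic island or to a whole genome.
    """
    if num_genes <= 0:
        raise ValueError("num_genes must be a positive integer")
    return num_its / num_genes * 100


# Hypothetical counts, for illustration only.
island_dit = density_of_its(30, 120)
```

+ The same function serves for both the island and the genomic DIT ; only the counts passed in differ . 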
+ To ascertain which transcription units ( multigenic operon or single-gene ) had an IT downstream , the gene at the 3 ′ end of a multigenic operon was identiﬁed from the DOOR database , and the GeSTer results for that genome were analyzed to see if that 3 ′ terminal gene had an IT after its stop codon . 
+ The HGT ( IT/TU ) % was calculated as ( number of ITs in the GI ) / ( number of transcription units in the GI ) . 
+ The total number of transcription units in the genome was calculated from the genome-speciﬁc statistics available at the DOOR site ( http://csbl1.bmb.uga.edu/OperonDB_10142009/displayspecies.php ) . 
+ Genomic ( IT/TU ) % is calculated as ( number of ITs in the genome ) / ( number of transcription units in the genome ) . 
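+ The IT/TU calculation can be sketched as follows . Several points are assumptions made for this illustration : transcription units are represented as lists of gene identifiers in transcription order ( a stand-in for the DOOR operon tables ) , the GeSTer output is reduced to a set of genes with an IT after their stop codon , only the 3 ′ terminal gene of each unit is checked , and the ratio is multiplied by 100 because the quantity is reported as a percentage . The function name it_per_tu_percent is hypothetical . 

```python
def it_per_tu_percent(transcription_units, genes_with_it):
    """Percentage of transcription units whose 3' terminal gene is followed by an IT.

    transcription_units: list of lists of gene IDs, each in transcription order
                         (a single-gene unit is a one-element list).
    genes_with_it:       set of gene IDs that have an IT after their stop codon.
    """
    if not transcription_units:
        raise ValueError("at least one transcription unit is required")
    terminated = sum(1 for tu in transcription_units if tu[-1] in genes_with_it)
    return terminated / len(transcription_units) * 100


# Hypothetical island with four transcription units.
units = [["a", "b", "c"], ["d"], ["e", "f"], ["g"]]
# Gene "b" has an IT but is internal to its operon, so it does not count here.
its = {"b", "d", "f"}
island_it_tu = it_per_tu_percent(units, its)
```

+ Counting per transcription unit rather than per gene avoids penalizing long operons , in which internal genes are not expected to carry a terminator . 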
+ 3. Results and discussion
+ 3.1. Rationale for the experimental design
+ A salient result of the microarray studies in E. coli K-12 is that , when Rho action was inhibited by the antibiotic Bicyclomycin , the transcription of several GIs ( known as K-islands in E. coli K-12 MG1655 ) signiﬁcantly increased ( Cardinale et al. , 2008 ) . 
+ These studies also revealed an under-representation of ITs in the same K-islands . 
+ In yet another study , treatment of E. coli K-12 MG1655 with a sublethal dosage of Bicyclomycin followed by ChIP-chip analysis revealed several regions on the chromosome where RNAP could localize only in the presence of Bicyclomycin ( Peters et al. , 2009 ) . 
+ The inference was that Bicyclomycin speciﬁcally inhibited Rho in these cells , thus allowing RNAP to transcribe into regions where Rho would have caused termination in absence of the antibiotic ( Peters et al. , 2009 ) . 
+ These genomic regions , named Bicyclomycin Sensitive Regions ( BSRs ) , are thus sites where Rho-dependent termination would normally occur . 
+ The study identiﬁed 23 BSRs which were downstream of K-12-speciﬁc genes ( belonging to K-islands ) or prophage DNA . 
+ We analyzed the IT proﬁle of these BSRs and found that they have an under-representation of ITs and hairpins . 
+ Of the 23 BSRs that are downstream of the GIs , there was not a single IT or even a stable hairpin-forming sequence in 16 ( 70 % ) of them ( Supplementary Table S1 ) . 
+ Thus , ITs are under-represented in those regions of E. coli genome where Rho is functioning . 
+ In fact , the scarcity of ITs seems to have been compensated by the action of Rho ( Cardinale et al. , 2008 ) . 
+ Hence , Rho is most likely to terminate transcription at the ends of genes where ITs are absent , as intrinsic and Rho-dependent termination are the only mechanisms of termination known in bacteria . 
+ This would mean that the intrinsic DIT of the GIs of any genome could be a pointer of Rho activity at such genomic islands . 
+ In other words , if the DIT of GI ( s ) is lower than the DIT of the whole genome , then Rho-dependent termination is probably an important mode of regulation in these GI ( s ) . 
+ Hence , we selected representative genomes from different phyla and classes , for which information about GIs was available , and analyzed their IT proﬁles using the algorithm , GeSTer , which detects both canonical and non-canonical ITs ( Mitra et al. , 2009 ; Mitra et al. , 2010 ; Unniraman et al. , 2002 ) . 
+ If the assumption that GIs across bacteria have extensive Rho-dependent termination is correct , we should observe a consistent trend of decreased presence of ITs in GIs in different species . 
+ Our sample included well characterized prophages , cryptic phages and other kinds of GIs . 
+ 3.2. GIs of other E. coli strains are poor in ITs
+ The importance of Rho-dependent termination in GIs of E. coli was based primarily on experiments in E. coli K-12 MG1655 . 
+ In particular , the paucity of ITs in GIs was shown only for the K-islands of E. coli K-12 ( Blattner et al. , 1997 ; Cardinale et al. , 2008 ) . 
+ First , we veriﬁed that the results reported for the K-islands of E. coli K-12 could also be obtained using GeSTer . 
+ Tabulation of the ITs in 42 K-islands ( Cardinale et al. , 2008 ) showed that there was indeed a ~ 50 % reduction in DIT . 
+ DIT in these GIs was only 21.9 % as compared to the whole genomic DIT of E. coli K-12 of 41.7 % . 
+ Thus , although we had used a different algorithm , these results were consistent with the previous study . 
+ Next , we considered another `` model '' strain , E. coli O157 : H7 EDL933 , which also houses several GIs , collectively called O-islands ( OIs ) . 
+ As with E. coli K-12 , the OIs of this genome also show enhanced transcription after bicyclomycin treatment ( Cardinale et al. , 2008 ) . 
+ Hence , the IT proﬁle of 11 OIs -- OI-7 , 8 , 9 , 35 , 36 , 43 , 44 , 45 , 47 , 48 and 50 ( consisting of a total of 616 genes , i.e. 11.8 % of the genome ) -- of E. coli O157 : H7 EDL933 was analyzed . 
+ The major criterion for selecting these OIs was that they all were relatively large GIs . 
+ The largest among them , OI-43 , encodes 106 genes while the smallest , OI-35 , contains 15 genes . 
+ Additionally , in order to assess the regions annotated as resident phages , we selected a prophage ( OI-45 ) and four representative cryptic phages . 
+ Out of 616 genes from these 11 O-islands , only 135 genes have an IT immediately downstream . 
+ Thus , as observed in E. coli K-12 , the number of ITs is distinctly lower ( DIT = 21.9 % ) in these islands as compared to the genomic DIT of 36.6 % ( Fig. 1A ) . 
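+ The DIT arithmetic above can be sketched as follows . This is a minimal illustration that assumes DIT is the percentage of genes with an IT immediately downstream , which is what the quoted counts ( 135 of 616 genes ) imply ; the function name dit_percent is hypothetical and is not part of GeSTer . 

```python
def dit_percent(genes_with_it: int, total_genes: int) -> float:
    """Percentage of genes that have an intrinsic terminator (IT)
    immediately downstream. Assumed definition of DIT, inferred from
    the counts quoted in the text."""
    if total_genes <= 0:
        raise ValueError("total_genes must be positive")
    if not 0 <= genes_with_it <= total_genes:
        raise ValueError("genes_with_it must lie between 0 and total_genes")
    return 100.0 * genes_with_it / total_genes


# Counts quoted in the text for the 11 O-islands of E. coli O157:H7 EDL933.
island_dit = dit_percent(135, 616)

# Genomic DIT quoted in the text (taken as given, not recomputed here).
genomic_dit = 36.6

# Ratio of island DIT to genomic DIT; values below 1 mean ITs are
# under-represented in the islands relative to the whole genome.
relative = island_dit / genomic_dit

print(f"island DIT = {island_dit:.1f} %")
print(f"island / genome = {relative:.2f}")
```

+ Expressing the island DIT as a fraction of the genomic DIT makes islands from genomes with very different baseline IT densities comparable . 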
+ A closer examination into the IT proﬁles of the individual OIs showed that large stretches of genes are devoid of any ITs . 
+ Also , as reported in E. coli K-12 , we note that many genes occur in series on the same strand and most of these genes , including the gene at the 3 ′ end of the series , often lack ITs ( Cardinale et al. , 2008 ) . 
+ If these serial gene clusters are operons , then it seems likely that they lack an IT downstream . 
+ In addition , ITs are absent for most of the genes that are at the 5 ′ or 3 ′ ends of the OIs . 
+ Lack of identiﬁable ITs hints at the possibility that Rho-dependent termination is probably the major termination mechanism in these OIs . 
+ The genomes of two other strains of E. coli -- enteropathogenic E. coli 234869 ( Iguchi et al. , 2009 ) and uropathogenic E. coli CFT073 ( Lloyd et al. , 2007 ) -- code for several experimentally characterized pathogenicity islands . 
+ The total number of GI genes identiﬁed in E. coli 234869 is 493 . 
+ Besides prophages , these GIs also include the LEE island that has been implicated in virulence . 
+ Similarly , the CFT073 strain houses the well-known islands -- PAI-II , PAI-III and PAI-CFT073-serX -- that encode a total of 299 genes ( Lloyd et al. , 2007 , 2009 ) . 
+ The DITs of these islands show a similar decrease in the abundance of ITs . 
+ The DITs of the islands were 19.9 % and 20.1 % for strains 234869 and CFT073 , respectively , i.e. between 50 and 58 % of the genomic values ( Fig. 1B , Supplementary Fig. 1A ) . 
+ A detailed analysis of two islands -- PAI-II from strain CFT073 and LEE from strain 234869 -- for the presence of ITs in relation to the genomic organization reafﬁrms these observations . 
+ PAI-II has 74 genes and can be considered divisible into 17 gene clusters ( Fig. 1C ) . 
+ Of these , ITs are totally absent in 11 clusters while only three clusters have ITs at the ends . 
+ The IT proﬁle of the LEE island is analogous . 
+ Of the nine gene clusters in the LEE island , six clusters have no IT at all ( Supplementary Fig S2 ) . 
+ This includes the clusters that are at the two ends of the LEE island , suggesting that Rho-dependent termination may also prevent read-through transcription into and out of the island . 
+ Thus , both the PAI-II and LEE islands are poor in ITs , with skewed distribution suggesting that Rho-dependent termination is a signiﬁcant regulator of these GIs . 
+ Thus , it seems that GIs across various E. coli strains are poor in ITs , and , as experiments have shown in E. coli K-12 , are most likely `` hotspots '' for Rho-dependent termination . 
+ 3.3. GIs in other γ-proteobacteria have a dearth of ITs
+ Salmonella enterica serovar Typhimurium ( LT2 isolate ) has a genome that encodes several prophages , pathogenicity islands and phage remnants . 
+ We considered two pathogenicity islands SPI-I ( Lostroh and Lee , 2001 ) and SPI-II ( Hensel et al. , 1997 ) , four prophages and a phage remnant region ( 4422192 -- 4438335 bp ) . 
+ Similar to the results described above with the E. coli strains , both SPI-I and SPI-II showed very low DIT of 10.4 % and 6.8 % respectively , compared to the genomic DIT of 37.2 % ( Fig. 2A ) . 
+ Analysis of four representative prophage regions -- Gifsy-1 , Gifsy-2 , Fels-1 and Fels-2 also showed that the DIT values are consistently lower -- between 40 and 60 % of the genomic value . 
+ To ascertain that the paucity of ITs was observable only within the GIs , and was not a general feature of that part of the genome , we resorted to a `` neighboring region '' approach . 
+ We analyzed the DIT ( Density of Intrinsic Terminators ) value of genomic stretches immediately adjacent to a GI . 
+ The stretch considered was very similar in total number of genes to the GI , but was part of the `` core genome '' . 
+ Thus , it served as a `` control experiment '' for the in silico analysis . 
+ The abundance of ITs in a 42-gene stretch ( STM2644-2693 ) ( DIT = 42.8 % ) was in sharp contrast to the neighboring 46-gene Fels-2 prophage , which has a DIT of 17.4 % . 
+ Such `` neighborhood analysis '' revealed similar trends in other genomes . 
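+ The `` neighboring region '' control described above can be sketched as follows . This is only an illustration under stated assumptions : genes are represented as a list of per-gene flags in genome order ( True if an IT lies immediately downstream ) , the control window has exactly the same number of genes as the island , and all names and the toy data are hypothetical . 

```python
from typing import List, Tuple


def dit_percent(has_it: List[bool]) -> float:
    """Percentage of genes flagged as having an IT immediately downstream."""
    if not has_it:
        raise ValueError("empty gene window")
    return 100.0 * sum(has_it) / len(has_it)


def island_vs_neighbor(has_it: List[bool], start: int, end: int,
                       upstream: bool = True) -> Tuple[float, float]:
    """Return (island DIT, neighbor DIT).

    has_it   -- per-gene IT flags in genome order (hypothetical input)
    start    -- index of the first island gene
    end      -- index one past the last island gene
    upstream -- take the control window before (True) or after the island
    """
    if not 0 <= start < end <= len(has_it):
        raise ValueError("island coordinates out of range")
    size = end - start
    if upstream:
        lo, hi = max(0, start - size), start
    else:
        lo, hi = end, min(len(has_it), end + size)
    return dit_percent(has_it[start:end]), dit_percent(has_it[lo:hi])


# Toy genome: 10 core genes (every second one has an IT) followed by a
# 10-gene island in which only the first gene has an IT.
flags = [True, False] * 5 + [True] + [False] * 9
island, neighbor = island_vs_neighbor(flags, 10, 20)
print(f"island DIT = {island:.1f} %, neighbor DIT = {neighbor:.1f} %")
```

+ Matching the control window to the island in gene count keeps the two percentages comparable , which mirrors the choice of a 42-gene stretch next to the 46-gene Fels-2 prophage . 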
+ Another γ-proteobacterium , Pseudomonas aeruginosa PA-14 , harbors the large PAP-1 island ( Battle et al. , 2009 ) , shown to be important for virulence . 
+ GeSTer analysis showed that the PAP-1 island has only 11 ITs although it consists of 114 genes . 
+ This means that its DIT is only ~ 34 % of the genomic value . 
+ A closer inspection of the PAP-1 island ( Fig. 3A ) indicated the absence of ITs at either end of the island . 
+ Also , only two of the 16 gene-clusters in the PAP-1 island are probably terminated with an IT . 
+ Thus , Rho seems to be the major effector of transcription termination in this GI . 
+ The results are in congruence with the initial report in E. coli K-12 and suggest that Rho-dependent termination is indeed a strong regulator at GIs in γ-proteobacteria . 
+ 3.4. IT proﬁles of α-, β- and ε-proteobacteria species
+ Next , extending the study beyond γ-proteobacteria , the genomic islands of 3 representative proteobacteria -- Bordetella petrii ( β-proteobacteria ) , Helicobacter pylori ( ε-proteobacteria ) and Brucella melitensis ( α-proteobacteria ) -- were analyzed . 
+ B. petrii , an environmental Bordetella species ( Lechner et al. , 2009 ) , has 7 large genomic islands ( GI-1 to GI-7 ) and 2 prophages , encoding a total of 1150 genes . 
+ In line with the previous results , the GIs of B. petrii also have a lesser number of ITs ( ~ 43 % of genomic average ) . 
+ If a `` control '' region of 90 genes just upstream of GI-7 ( encodes 87 genes ) is considered , it has a DIT of 33.3 % , in sharp contrast to GI-7 's DIT of 11.5 % ( Fig. 2B ) . 
+ Also , the two prophage regions in B. petrii have the lowest DIT . 
+ H. pylori 26695 strain has a DIT of 14.7 % ( Mitra et al. , 2009 ) . 
+ However , the DIT of the 27-gene encoding cag island ( Blomstergren et al. , 2004 ) is 7.4 % i.e. 50 % of the genomic value ( Figs. 2C , 3B ) . 
+ It is noteworthy that although absolute values of ITs are lower in this species when compared to others analyzed , the trend of lower DIT in GIs is consistent across distant species . 
+ In contrast , a `` control '' region of 35 genes immediately upstream of cag island showed a DIT of 13.9 % , very close to the genomic average . 
+ In the case of the pathogenic α-proteobacterium B. melitensis , a comparison between the genomic DIT ( 27.9 % ) and that of the genomic islands ( 16 % ) ( Supplementary Fig. 1B ) revealed a trend consistent with that observed in other proteobacterial species . 
+ 3.5. ITs in the genomic islands in other bacterial phyla
+ Several actinobacteria genomes sequenced so far also have their share of GIs . 
+ A functional Rho homologue has been reported for Micrococcus luteus ( Nowatzke et al. , 1997 ) , Streptomyces lividans and Mycobacterium tuberculosis ( Kalarickal et al. , 2009 ) . 
+ Thus , it is possible that Rho could play a similar `` silencing of xenogenic DNA '' role in actinobacteria . 
+ For the present analysis , three genomes were selected -- M. tuberculosis , Mycobacterium abscessus and Corynebacterium diphtheriae . 
+ GIs have been recently identiﬁed in the M. tuberculosis H37Rv genome ( Becq et al. , 2007 ) . 
+ The search identiﬁed only 36 ITs immediately downstream of the 454 GIs ' genes in M. tuberculosis H37Rv ( Supplementary Fig. 1C ) . 
+ Also , several `` large islands '' , notably , Rv739-750 , Rv2954-2961 , Rv3081-3089 , Rv3108-3227 and Rv298-303 did not have a single ﬂanking or internal IT . 
+ The islands Rv0057-0080 ( Fig. 3C ) and Rv0595-0614 had several gene clusters with no IT at their 3 ′ ends . 
+ The experimentally characterized genomic island Rv0986-0988 ( Rosas-Magallanes et al. , 2006 ) had no IT either . 
+ Thus , even for a bacterium which has a distinctly low abundance of ITs ( DIT = 11.9 % ) , the GIs , which comprise ~ 10 % of the genome , show further decrease in the IT content ( DIT = 7.7 % ) . 
+ In M. abscessus ( Ripoll et al. , 2009 ) , the causative agent of Buruli ulcer , a similar pattern is observed ( Fig. 2D ) . 
+ Furthermore , if the GIs of M. abscessus are divided into prophage and non-prophage regions , then the three prophages show a further reduction in the number of ITs . 
+ In contrast , a `` control '' region consisting of a similar number of genes ( MAB0198-0220 ) immediately upstream of the prophage , MAB0221-0242 , has a much larger number of ITs . 
+ An analogous situation is seen in the case of the non-mycobacterial actinomycete , C. diphtheriae 13129 ( Cerdeno-Tarraga et al. , 2003 ) . 
+ The two known prophages of C. diphtheriae are the poorest with respect to the number of ITs ( Supplementary Fig. 1D ) . 
+ 3.6. Fewer operons within GIs have an IT downstream
+ It can be argued that the differences in DIT between GIs and `` core '' regions of a given genome are a function of their operonic content . 
+ To check this possible scenario , we used DOOR ( Mao et al. , 2009 ) , which is considered a reliable database for operon prediction ( Brouwer et al. , 2008 ) . 
+ For any genome , the complete set of transcription units ( TUs ) , which includes both multigenic operons and single-gene TUs , can be obtained from DOOR . 
+ This data allowed us to ascertain the number of TUs in some of the GIs analyzed from diverse species and also how many of those TUs have an IT downstream . 
+ Although the results in this case are obtained from two prediction systems , DOOR and GeSTer are among the most reliable databases available , and so errors are likely to be minimal . 
+ The results show that fewer TUs belonging to GIs have an IT downstream , as compared to the genomic estimates ( Supplementary Table 2 ) . 
+ The lack of ITs at the 3 ′ ends of many TUs in these GIs indicates that these TUs are employing Rho-dependent termination . 
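+ The TU-level tally described in this section can be sketched as follows . This is a hedged illustration , not the authors ' pipeline : it assumes each TU is given as a list of gene identiﬁers in transcription order ( so the last element is the 3 ′ -most gene ) and that a separate set lists the genes with a predicted IT immediately downstream . The function name and the toy identiﬁers are hypothetical and do not reproduce the DOOR or GeSTer output formats . 

```python
from typing import List, Set


def tu_percent_with_it(tus: List[List[str]], genes_with_it: Set[str]) -> float:
    """Percentage of transcription units (TUs) whose 3'-most gene has an
    intrinsic terminator (IT) immediately downstream.

    tus           -- each TU as gene IDs in transcription order (assumed)
    genes_with_it -- IDs of genes with a predicted IT downstream (assumed)
    """
    if not tus:
        raise ValueError("no transcription units supplied")
    if any(not tu for tu in tus):
        raise ValueError("every TU must contain at least one gene")
    # Only the last gene of a TU counts: an IT after an internal gene
    # does not terminate the unit as a whole.
    terminated = sum(1 for tu in tus if tu[-1] in genes_with_it)
    return 100.0 * terminated / len(tus)


# Toy island with four TUs; g2 has an IT but sits inside an operon,
# so only the single-gene TU ["g4"] is counted as IT-terminated.
toy_tus = [["g1", "g2", "g3"], ["g4"], ["g5", "g6"], ["g7"]]
toy_its = {"g2", "g4"}
island_tu_percent = tu_percent_with_it(toy_tus, toy_its)
print(f"TUs with a downstream IT: {island_tu_percent:.1f} %")
```

+ Counting per TU rather than per gene removes the effect of operon size , which is the concern raised at the start of this section . 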
+ 3.7. ITs in the GIs of Bacillus subtilis and Staphylococcus aureus , two species with low expression of Rho
+ B. subtilis , a ﬁrmicute , has been shown experimentally to have very low intracellular levels of Rho , constituting about 0.004 % of the total cellular soluble protein ( Ingham et al. , 1999 ) . 
+ In contrast , the level of Rho is ~ 0.15 % of the total protein in E. coli , and even higher levels of expression of Rho are seen in mycobacteria ( Mitra et al. , unpublished observations ) . 
+ The non-essential nature of Rho in B. subtilis has been demonstrated by the fact that B. subtilis grows well in the presence of bicyclomycin , the speciﬁc inhibitor of Rho , although the antibiotic inhibits all known Rho homologues , including B. subtilis Rho , in vitro . 
+ Moreover , in other ﬁrmicutes such as S. aureus and Streptococcus species , Rho is non-essential ( Washburn et al. , 2001 ) or has been lost ( Mitra et al. , 2009 ) , illustrating the limited importance of Rho in ﬁrmicutes . 
+ Not surprisingly , the ﬁrmicutes are the species with the highest incidence of ITs ( de Hoon et al. , 2005 ; Mitra et al. , 2009 ) . 
+ Since Rho action seems to be predominant wherever there is a lack of ITs , as a corollary , in species with decreased levels of Rho , not only `` core genome '' regions but also genomic islands would employ a larger number of ITs for regulation of gene expression . 
+ Indeed , searching the GIs of B. subtilis 168 genome ( Westers et al. , 2003 ) with GeSTer conﬁrms this hypothesis . 
+ The DIT for GIs is 32.7 % while the genomic DIT is 41.3 % . 
+ Similarly , the genomic DIT of S. aureus is 33.3 % , while that of three representative GIs ( including a prophage and TSST-pathogenicity island ) is 25 % . 
+ Thus of all the species analyzed , the GIs of B. subtilis and S. aureus have the highest number of ITs . 
+ It is most likely that the overall increased dependence of B. subtilis and S. aureus on intrinsic termination is mirrored in their GIs , as there is insufﬁcient Rho for efﬁcient inter- and intragenic termination . 
+ Such species could be employing other mechanisms such as R-M systems or nucleoid-associated proteins to silence spurious expression of GIs . 
+ For example , the sequences of GIs often have a GC-content that is lower than that of the host genome , allowing NAPs to selectively silence such regions ( Gordon et al. , 2010 ; Navarre et al. , 2006 ) . 
+ 3.8. Dearth of ITs in GIs may facilitate Rho-dependent termination
+ A simple explanation for this observation , consistent with experimental data , is that these GIs are `` hotspots '' for Rho-dependent termination ( Cardinale et al. , 2008 ; Peters et al. , 2009 ) . 
+ A trans factor such as Rho is probably advantageous over cis-acting intrinsic termination in the context of GIs . 
+ Rho initiates termination by ﬁrst loading onto a stretch of nascent RNA called the rut ( Rho utilization ) site ( Richardson , 2003 ; Richardson and Richardson , 1996 ) . 
+ In the few Rho-dependent terminators that have been experimentally characterized ( Ciampi , 2006 ) the rut site is C-rich but has no consensus sequence . 
+ Thus , C-rich sequences can not be termed as speciﬁc sites and not all C-rich sequences are Rho sites . 
+ However , Rho binds to other RNAs as well , and recent genome-wide studies on Rho have not identiﬁed any degenerate sequence at Rho-dependent termination sites . 
+ In fact , the lack of a conserved sequence in the rut site could well enhance Rho 's ability to carry out both intergenic and intragenic termination of any gene , provided it has access to a sufﬁcient length of naked RNA ( Ciampi , 2006 ; Faus and Richardson , 1990 ; Gowrishankar and Harinarayanan , 2004 ) ( Fig. 4 ) . 
+ Additionally , many of these xenogenic genes have a relatively poor codon adaptation index . 
+ Hence , it is likely that the leading ribosome actually lags far behind the transcribing RNAP allowing Rho to bind to the naked RNA in between the RNAP and the ribosome and cause termination ( Richardson , 2006 ) ( Fig. 4 ) . 
+ Thus , Rho is uniquely suited to be the primary mediator for prematurely terminating transcription of genes encoded in GIs . 
+ Since Rho is functioning , there is no evolutionary constraint in favor of ITs at these GIs . 
+ However , there are two `` limitations '' to Rho 's mechanism , and both are based on its limited ability to bypass a hairpin structure on the transcript . 
+ An intervening double stranded RNA stem can prevent E. coli Rho 's ability to translocate along the RNA towards the RNAP ( Steinmetz et al. , 1990 ) . 
+ Additionally , Rho can not terminate RNAP when the latter has been paused by a class I pause hairpin ( Dutta et al. , 2008 ) . 
+ The exact mechanism of how a pause hairpin inhibits Rho-dependent termination is unclear . 
+ However , it has also been shown recently that Rho employs an allosteric mechanism to cause termination ( Epshtein et al. , 2010 ) . 
+ Rho interacts with the lid and other domains in the exit channel region of RNAP to transmit an inhibitory signal to the active center of RNAP . 
+ β ′ domains extending from the lid and other neighboring parts of the β ′ clamp probably mediate signals from Rho ( or Rho-RNA complex ) to the catalytic site . 
+ The pause hairpin formed in the exit channel uses the β and β ′ domains contacted by Rho to transmit a pause-inducing signal to the active site of RNAP ( Toulokhonov and Landick , 2003 ; Toulokhonov et al. , 2001 , 2007 ) . 
+ Thus , when Rho encounters RNAP paused at a hairpin , it is most likely that the RNAP domains that Rho would have used to transduce a terminating signal are either occluded by the hairpin or , alternatively , are in a conformation that is unresponsive to the factor ( Supplementary Fig S3 ) . 
+ The paused RNAP would , however , resume elongation after a speciﬁc time . 
+ But , the already formed hairpin that is now extruded from the RNAP exit channel could still impede translocation of Rho . 
+ Thus , presence of sequences which have potential to form hairpins would effectively reduce the efﬁciency of Rho-dependent termination and increase the probability of RNAP completing the transcription of toxic or unnecessary genes of GIs . 
+ Since Rho-dependent termination seems to be a GI-silencing mechanism it is possible that there could be progressive selection against hairpin-encoding sequences to facilitate Rho action . 
+ In other words , Rho 's silencing action at the various GIs may be facilitated by the lack of structured RNA moieties like intrinsic terminators and hairpins which makes the RNA unstructured and more suitable as a substrate for Rho . 
+ Moreover , since these regions are not part of the core genome it is easier to select against them . 
+ However , selection against hairpins by substitution or deletion is also likely to delete a signiﬁcant fraction of the ITs since all of them consist of a hairpin . 
+ In this scenario , their removal is not detrimental as these stretches of GIs are now regions where there is efﬁcient Rho-dependent termination . 
+ In effect , Rho would have functionally replaced ITs in these regions . 
+ Two pieces of evidence -- both of which focus on the mutual exclusiveness of Rho-dependent terminators and ITs/hairpins -- seem to corroborate the above model . 
+ Firstly , as described earlier , the Bicyclomycin-sensitive regions ( BSRs ) of the E. coli K-12 MG1655 genome ( Peters et al. , 2009 ) have an under-representation of ITs and potential hairpins , especially the BSRs that are downstream of K-12 speciﬁc and prophage DNA ( Supplementary Table S1 ) . 
+ Secondly , there is an inverse correlation between genomic GC content and the prevalence of ITs in any genome . 
+ Additionally , Rho seems to become more indispensable in species as genomic GC content increases . 
+ Since experiments have shown that Rho action may be facilitated by the lack of hairpins ( Dutta et al. , 2008 ) , it is possible that genomes that predominantly rely on Rho for termination could have an overall under-representation of hairpins -- both ITs and pause hairpins -- in the regions downstream of the genes . 
+ This would happen in both GIs and in core genomic regions , as Rho is a global regulator . 
+ To assay this , we determined the total number of hairpins for a sample of bacterial genomes ( n = 27 ) and computed the genomic ( hairpins/genes ) ratio for these genomes ( Fig. 5 ) . 
+ The results show that as the genomic GC content increases , the genomic ( hairpins/genes ) value tends to decrease . 
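+ The relationship described above can be sketched as follows . The text reports only a trend and does not name a statistic , so the use of a Pearson correlation coefﬁcient here is an assumption made for illustration , and the four ( GC % , hairpins , genes ) rows are made-up numbers rather than values from the 27 genomes analyzed . 

```python
import math
from typing import List


def pearson_r(xs: List[float], ys: List[float]) -> float:
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two samples of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("a sample with zero variance has no correlation")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical rows: (genomic GC content in %, hairpin count, gene count).
toy_genomes = [
    (30.0, 2000, 1000),
    (40.0, 3000, 2000),
    (50.0, 4000, 4000),
    (65.0, 2000, 4000),
]

gc_content = [row[0] for row in toy_genomes]
# Genomic (hairpins/genes) ratio, as defined in the text.
hairpins_per_gene = [row[1] / row[2] for row in toy_genomes]

r = pearson_r(gc_content, hairpins_per_gene)
print(f"Pearson r between GC content and hairpins/genes: {r:.2f}")
```

+ A negative coefﬁcient corresponds to the trend described in the text ; normalizing hairpin counts by gene counts keeps genomes of different sizes comparable . 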
+ The results are in harmony with the fact that most bacteria which lack Rho have AT-rich genomes ( e.g. , mycoplasma , many streptococci ) ( Mitra et al. , 2009 ) , while Rho seems to be indispensable in species with high GC content ( e.g. , Caulobacter crescentus , M. luteus , M. tuberculosis , Streptomyces sp . ) . 
+ In other words , in bacteria where Rho-dependent termination is more important , the absolute number of stable hairpins in intergenic regions decreases across the entire genome , possibly to favor Rho-dependent termination . 
+ Such a situation would also be consistent with the lack of ITs in GIs across different genomes . 
+ 3.9. Intergenic , but not intragenic , Rho-dependent termination could function efﬁciently in expressing GIs
+ As mentioned earlier , Rho can effect intragenic termination within the coding region of a poorly translated gene because transcription -- translation coupling is inefﬁcient , allowing the factor to access nascent RNA and RNAP ( Fig. 4 ) . 
+ However , if a GI gene that has been silenced in the past by intragenic Rho-dependent termination is now incorporated into the cellular machinery , selection is likely to ensure that the gene 's codon adaptation index is similar to that of the cell . 
+ In that case , transcription -- translation coupling would now function efﬁciently preventing intragenic termination by Rho , and ensure gene expression . 
+ However , Rho-dependent termination would still continue to be the preferred mode of termination , once the stop codon has been crossed ( Fig. 5 ) . 
+ Thus , Rho is likely to be the default mode of intergenic termination for most GIs , irrespective of whether they are silenced or expressed across species . 
+ This could explain why ITs are rare even in genomic islands that are known to express and carry out deﬁned functions such as SPI-1 , SPI-2 , LEE , PAI-II , and cag . 
+ 4. Conclusion
+ Once a genomic island gets integrated into a genome , multiple checkpoints are likely to ensure that expression from its genes is silenced or stringently regulated to prevent any toxicity . 
+ Cis factors like ITs are of limited effectiveness in such situations as they can only function when sequences that encode them are `` strategically '' inserted into the GI . 
+ In contrast , a trans factor like Rho protein is more effective in bringing about termination as it has fewer sequence constraints and can effectively sense uncoupling of transcription and translation . 
+ Hence , Rho-dependent termination is likely to be more effective in regulating transcription from any xenogenic DNA that enters the genome . 
+ Consequently , in stretches of the genome where there is active Rho-dependent termination ( such as GIs ) , ITs not only become functionally redundant , but experimental evidence hints that they may also hinder efﬁcient Rho-dependent termination . 
+ Hence , over evolutionary timescales , these regions could undergo a selection against such RNA hairpins . 
+ Since our analysis is a snapshot in an evolutionary time-scale , a uniform decrease of ITs in GIs across different phyla is unlikely to be observed for various reasons . 
+ Both coding regions and non-coding regulatory elements of GIs are likely to be subjected to differential selection pressures . 
+ Individual GIs could have initially `` entered '' the host genome with different cohorts of ITs at different time points and varied time spans would have elapsed since their genomic integration . 
+ However , the genome analysis across diverse species reinforces the experimental evidence that Rho is indeed an important genome sentinel along with restriction -- modiﬁcation systems , Nucleoid Associated Proteins , transcription repressors and other factors ( proteins , small RNAs ) that act at different stages to silence expression of foreign DNA . 
+ Rho 's ability to interact without exquisite sequence speciﬁcity , coupled to its property of translocating along RNA and interacting with RNAP , has resulted in a versatile component of the cellular `` immunity surveillance '' mechanism . 
+ Supplementary data to this article can be found online at http://dx.doi.org/10.1016/j.gene.2012.07.064 . 
+ Acknowledgments
+ V. N. is a recipient of the J. C. Bose fellowship of the Department of Science and Technology , Government of India . 
+ The work is supported by the Centre of Excellence for Mycobacterial Research Grant , Government of India .
\ No newline at end of file
--- a/data/TEXT_FILES/notuseful_txt/22923524.txt 0 → 100644
+++ b/data/TEXT_FILES/notuseful_txt/22923524.txt 0 → 100644
+ Improved predictions of transcription factor binding
+ ABSTRACT 
+ Typical approaches for predicting transcription factor binding sites ( TFBSs ) involve use of a position-specific weight matrix ( PWM ) to statistically characterize the sequences of the known sites . 
+ Recently , an alternative physicochemical approach , called SiteSleuth , was proposed . 
+ In this approach , a linear support vector machine ( SVM ) classifier is trained to distinguish TFBSs from background sequences based on local chemical and structural features of DNA . 
+ SiteSleuth appears to generally perform better than PWM-based methods . 
+ Here , we improve the SiteSleuth approach by considering both new physicochemical features and algorithmic modifications . 
+ New features are derived from Gibbs energies of amino acid -- DNA interactions and hydroxyl radical cleavage profiles of DNA . 
+ Algorithmic modifications consist of inclusion of a feature selection step , use of a nonlinear kernel in the SVM classifier , and use of a consensus-based post-processing step for predictions . 
+ We also considered SVM classification based on letter features alone to distinguish performance gains from use of SVM-based models versus use of physicochemical features . 
+ The accuracy of each of the variant methods considered was assessed by cross validation using data available in the RegulonDB database for 54 Escherichia coli TFs , as well as by experimental validation using published ChIP-chip data available for Fis and Lrp . 
+ Mark Maienschein-Cline 1 , Aaron R. Dinner 1 , William S. Hlavacek 2,3 and Fangping Mu 2,3,*
+ 1 Department of Chemistry , University of Chicago , Chicago , IL 60637 , 2 Theoretical Biology and Biophysics Group , Theoretical Division , Los Alamos National Laboratory , Los Alamos , NM 87545 and 3 Department of Biology , University of New Mexico , Albuquerque , NM 87131 , USA 
+ INTRODUCTION
+ Transcription factors ( TFs ) are key molecular components of gene regulatory networks that modulate gene expression by binding to DNA and affecting the ability of RNA polymerase to transcribe genes . 
+ Thus , methods for identifying TF binding sites ( TFBSs ) in DNA can provide important insights into cell biology and may in the future help to enable exquisite manipulation of cellular behavior through synthetic and systems biology approaches ( 1,2 ) . 
+ A large number of binding sites for diverse TFs have been characterized through targeted low-throughput experimental approaches . 
+ Additionally , several high-throughput methods , such as chromatin immunoprecipitation coupled to sequencing ( ChIP-seq ) and protein binding microarray ( PBM ) assays , are now available for large-volume detection of binding sites ( 3 -- 5 ) . 
+ TFBSs discovered through both low- and high-throughput approaches are documented in databases such as RegulonDB ( 6 ) and JASPAR ( 7 ) . 
+ These methods and the increasing catalog of TFBSs are providing new insights into the general nature of TF -- DNA interactions and promise to elucidate how TF binding speciﬁcity is achieved ( 8 ) . 
+ Experimental approaches for characterizing TFBSs are complemented by computational approaches , which can provide a level of detail inaccessible experimentally . 
+ For example , ChIP-seq binding sites are limited to a precision of a couple hundred base-pairs ( bp ) ( 9 ) , which is much longer than actual TFBSs . 
+ Computational methods typically aim to model sets of TFBSs as ( sequence ) motifs ( 5,10 ) , built on the basis of a set of training data . 
+ A motif model can be used to summarize data , to more precisely localize a binding site within a region of DNA known to associate with a TF , to design experiments , or to predict the effect of a mutation on a known TFBS . 
+ It can also provide insights into the features of DNA sequences important for TF -- DNA recognition . 
+ A common approach to motif representation or motif modeling involves the construction of a position-speciﬁc weight matrix ( PWM ) or a consensus sequence ( 10 ) . 
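+ A PWM of the kind mentioned above can be sketched as follows . This is a generic textbook-style log-odds construction offered for illustration , not the speciﬁc PWM method benchmarked in this article ; the pseudocount of 1 , the uniform background , the function names and the toy sites are all assumptions . Sequences are assumed to be uppercase and to contain only A , C , G and T . 

```python
import math
from collections import Counter
from typing import Dict, List, Optional, Tuple

BASES = "ACGT"


def build_pwm(sites: List[str], pseudocount: float = 1.0,
              background: Optional[Dict[str, float]] = None) -> List[Dict[str, float]]:
    """Build a log-odds PWM from aligned binding sites of equal length."""
    if not sites:
        raise ValueError("no sites supplied")
    length = len(sites[0])
    if any(len(site) != length for site in sites):
        raise ValueError("all sites must have the same length")
    background = background or {base: 0.25 for base in BASES}
    total = len(sites) + pseudocount * len(BASES)
    pwm = []
    for pos in range(length):
        counts = Counter(site[pos] for site in sites)
        # log2 of (smoothed base frequency / background frequency)
        pwm.append({base: math.log2(((counts[base] + pseudocount) / total)
                                    / background[base])
                    for base in BASES})
    return pwm


def score(pwm: List[Dict[str, float]], seq: str) -> float:
    """Sum of per-position log-odds weights for a sequence of PWM length."""
    if len(seq) != len(pwm):
        raise ValueError("sequence length must match PWM length")
    return sum(column[base] for column, base in zip(pwm, seq))


def best_window(pwm: List[Dict[str, float]], dna: str) -> Tuple[float, int]:
    """Return (best score, start index) over all windows of one strand."""
    width = len(pwm)
    if len(dna) < width:
        raise ValueError("DNA shorter than the motif")
    return max((score(pwm, dna[i:i + width]), i)
               for i in range(len(dna) - width + 1))


# Hypothetical aligned sites for illustration only.
toy_sites = ["TGACA", "TGACA", "TGATA", "TGACA"]
toy_pwm = build_pwm(toy_sites)
best_score, best_start = best_window(toy_pwm, "AAATGACAGG")
print(f"best window starts at {best_start} with score {best_score:.2f}")
```

+ Because each position contributes independently to the score , a model of this form treats the sequence purely as letters , which is the assumption questioned in the paragraphs that follow . 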
+ There are many methods , as well as software tools , for modeling TFBSs in terms of PWMs ( 11 -- 17 ) , as well as more advanced techniques that consider dependencies between nucleotides in different positions ( 18 ) , but the vast majority are based on the assumption that letter representations of DNA sequences suitably capture the physicochemical properties of DNA ( and proteins ) that govern the speciﬁcity of protein -- DNA interactions . 
+ However , the general validity of this assumption is questionable ( 19 ) . 
+ The three-dimensional ( 3D ) structure of DNA is sequence dependent ( 20,21 ) , and shape readout is an important mode of recognition used by a large class of TFs ( 22 -- 25 ) . 
+ However , letter sequence similarity does not guarantee structure similarity and vice versa : DNA sequences can diverge at the level of letter representation but share a similar structure , and conversely , DNA sequences can differ in only one or two bases but have distinct local structures ( 20,22,26 ) . 
+ There is strong evidence that TFs that recognize one particular sequence can also recognize different sequences if the sequences have similar structural properties , and more generally that some TFs interact with multiple classes of DNA sequences at the level of letter representation ( 21 -- 24,27 -- 29 ) . 
+ Interestingly , it was recently reported that Hoogsteen base pairs , which are characterized by a pattern of hydrogen bonding that differs from that of Watson -- Crick base pairs , are present in free DNA in equilibrium with Watson -- Crick base pairs ( 30 ) . 
+ These discoveries imply that TF-DNA binding speciﬁcity is extremely unlikely to be described by a simple linear code ( 31 ) . 
+ The simple reason is that , with this newfound understanding of the plasticity of DNA structure , a letter code for a DNA sequence can no longer be taken to have an unambiguous structural interpretation . 
+ Moving beyond analysis of letter codes , researchers have made a number of attempts to use structural data to predict TFBSs ( 32 -- 42 ) . 
+ Some approaches focus on shape readout , although this is relevant only for some TFs . 
+ A second important mode of recognition is base readout ( 22 ) , which involves direct contacts between nucleotides and amino acid residues . 
+ In other words , some TFs scan the chemical signatures of DNA sequences , not their shapes alone . 
+ Methods that rely on atomic structures of protein -- DNA complexes can address these cases but are computationally more expensive and depend strongly on the quality of experimental structures ( 35 ) . 
+ In some cases homology structure predictions can provide some of these details , but these calculations still require a degree of expertise in structural modeling . 
+ Although promising results have been obtained by all-atom approaches in some cases , there is a need for methods that consider details at an intermediate resolution , between the ﬁne resolution of atomic characterization of macromolecular complexes and the coarse resolution of letter representations of DNA . 
+ In particular , we believe that a method requiring only the sequences of known binding sites ( i.e. the same inputs as standard PWM approaches ) but using physical properties of DNA to construct a TFBS model could begin to approach the accuracy of structure-based models while retaining the accessibility of the usual PWM models . 
+ Recently , we reported a motif modeling approach based on local structural and chemical features of DNA ( 26 ) . 
+ This approach , which we called SiteSleuth , maps DNA sequences to physicochemical features and uses a support vector machine ( SVM ) classiﬁer that discriminates between known TFBSs and genome background sequences . 
+ The features considered include structural features , which characterize the local conformation of a DNA sequence , and chemical features , which characterize the thermodynamics of interactions between small functional group probes and a DNA sequence . 
+ The SiteSleuth method typically performs better than commonly used PWM-based methods ( 26 ) . 
+ Here , we report an improvement of the SiteSleuth method obtained by considering both new physicochemical features and algorithmic modiﬁcations ( i.e. variations on the machine learning approach ) . 
+ We examine each improvement by implementing them one by one into distinct motif models , and by comparing to a standard PWM-based algorithm . 
+ In all , we compare six methods . 
+ To evaluate the different motif modeling approaches , we focused on 54 TFs in Escherichia coli and their binding sites documented in RegulonDB ( 6 ) , measuring the accuracy of each model through cross validation . 
+ We also used ChIP-chip binding data available for the E. coli TFs Fis and Lrp ( 43,44 ) . 
+ MATERIALS AND METHODS
+ Our physicochemical motif modeling approach is based on two essential ingredients : physicochemical features of DNA , and supervised machine learning , in particular the use of SVMs to discriminate known TFBSs from background genome sequences . 
+ In this section , we ﬁrst describe calculation of the various features used in our motif models and show how DNA sequences are mapped to those features . 
+ We consider two main classes of physicochemical features : structural features , which characterize the conformational rigidity and steric properties of DNA , and chemical features , which characterize the electrostatic proﬁle around DNA . 
+ We also introduce letter features , which make the information used in training an SVM the same as that used in standard PWM-based approaches . 
+ We distinguish the new models using physicochemical features or letter features by including PMM or LMM ( for physical motif model or letter-based motif model ) , respectively , in their name . 
+ Second , we describe the details of the training and predicting aspects of the machine learning approach that we use , which is based on SVMs . 
+ We discuss optimization of SVM parameters through grid search , the differences between the linear and radial basis function ( RBF ) SVM kernels , improvements in the training step through feature selection , and improvements in the prediction step through consensus-based post-processing of the positive predictions . 
+ Finally , we discuss the sources of data used for training and testing . 
+ Deﬁnition and use of feature sets
+ Structural features
+ Our structural features are based on free DNA properties . 
+ Because structural correlations have been observed between free and bound TFBSs ( 45 ) , we expect that these properties will be relevant for TFBS prediction . 
+ First , parameters describing the geometry of base pairs and bp steps were derived from duplex structures of short DNA sequences ( all possible 3- and 4-mers embedded between ﬂanking GC dinucleotides ) , which were found via molecular dynamics ( MD ) simulations as described previously ( 26 ) . 
+ Brieﬂy , for each duplex , the initial structure was taken to be the standard Watson and Crick structure for B-DNA . 
+ The NAMD program ( 46 ) and the CHARMM27 force ﬁeld ( 47 -- 50 ) were then used to produce an equilibrium average structure . 
+ For the middle base pair in each of the 64 possible 3-mers , we used normal mode analysis of the corresponding average structure to calculate six base-pairing parameters : shear , stretch , stagger , buckle , propeller and opening . 
+ Similarly , for the middle 2 bp in each of the 256 possible 4-mers , we used normal mode analysis of the corresponding average structure to calculate six bp step parameters : shift , tilt , slide , roll , rise and twist . 
+ Features derived from simulated structures have been shown to correlate with features derived from experimentally determined structures ( 26 ) . 
+ In addition to the geometric parameters described above , which were considered in earlier work ( 26 ) , we also considered a structural proﬁle deﬁned on the basis of hydroxyl radical cleavage of DNA ( 20,53,54 ) . 
+ This proﬁle has been shown to correlate with various aspects of DNA structure ( 54 ) , as the global structure of a DNA sequence imposes localized steric constraints on hydroxyl radical cleavage propensity ( 20 ) . 
+ The ORChID ( OH Radical Cleavage Intensity Database ) resource provides tools for predicting the hydroxyl radical cleavage proﬁle of a given DNA sequence ( 53 ) . 
+ The proﬁle is calculated by sliding a 4 bp window across the sequence and averaging over a database of experimentally measured cleavage proﬁles to generate a cleavage propensity at each nucleotide . 
+ Within the ORChID tool , the cleavage propensity at each position is completely determined by the three ﬂanking nucleotides on each side of a central nucleotide , so our hydroxyl radical cleavage feature list consists of the calculated structural proﬁles for all possible 7-mers ( $4^7 = 16\,384$ ) . 
+ Each of these 7-mers is associated with two structural features : the cleavage propensities of the central nucleotides of the forward and reverse strands . 
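+ As a rough illustration of the sizes of the precomputed lookup tables, the following Python sketch enumerates all N-mers over the DNA alphabet. The function name `all_kmers` is our own and is not part of ORChID or any other tool mentioned here.

```python
from itertools import product

ALPHABET = "ACGT"


def all_kmers(k):
    """Return every DNA sequence of length k, in lexicographic order."""
    return ["".join(letters) for letters in product(ALPHABET, repeat=k)]


# Number of table entries for the 3-mer, 4-mer and 7-mer feature sets.
for k in (3, 4, 7):
    print(k, len(all_kmers(k)))
```

+ The 7-mer table is the largest of the three, but it is still small enough to be computed once and stored.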
+ Chemical features
+ Structural features characterize the conformation of free DNA . 
+ Here , we introduce chemical features to characterize the electrostatic proﬁle around DNA , which can be expected to inﬂuence site-speciﬁc protein -- DNA interactions . 
+ In earlier work , 31 small functional groups were used as probes , and thermodynamic parameters were calculated to characterize an array of probe and DNA conﬁgurations ( 26 ) . 
+ Here , we consider the 20 common amino acids as probes of the DNA electrostatic proﬁle . 
+ For a given probe , different spatial conﬁgurations of the probe and a DNA duplex are generated . 
+ Thermodynamic parameters are calculated for each conﬁguration and an average over the conﬁgurations is determined . 
+ Features based on functional group probes were calculated as described previously ( 26 ) . 
+ For amino acid probes , a similar approach is followed , except sampling of conﬁgurations is now more extensive because amino acids are relatively large , and as a result , thermodynamic parameters are more sensitive to conﬁgurational aspects of probe -- DNA interaction . 
+ These features , which are introduced in this study , are determined as described below . 
+ Initial structures of amino acids capped by an acetyl group at the N-terminus and an N-methylamide group at the C-terminus were obtained using CHARMM34b1 ( 55 ) . 
+ These structures were paired with equilibrated , average structures of DNA 3-mers ( with ﬂanking GC dinucleotides ) , calculated as described above . 
+ For each of the $20 \times 64$ amino acid-DNA duplex pairs , we considered the following spatial conﬁgurations of the two molecules . 
+ As illustrated in Figure 1 , for each of the two central nucleotides in the DNA duplex , we considered a $6 \times 6 \times 3$ grid ﬁlling a rectangular box . 
+ For each grid , the α-carbon of the amino acid probe was placed at each of the 108 grid points . 
+ The initial orientation of the amino acid was arbitrary but consistent across grid points . 
+ At each grid point , we considered 81 distinct whole-molecule rotations . 
+ Each of these rotations was a composition of a rotation of angle $\theta \in [ -\pi/2 , -7\pi/18 , -5\pi/18 , \ldots , \pi/2 ]$ around the x-axis and a rotation of angle $\phi \in [ -\pi , -8\pi/9 , -7\pi/9 , \ldots , \pi ]$ around the z-axis ( 9 rotations each ) . 
+ At each grid point , we also considered ﬁfteen side-chain rotamers for all amino acid probes except alanine , glycine and proline . 
+ Rotamers of an amino acid were generated by rotating the side chain around the bond between the α- and β-carbons ( $\chi_1$ dihedral ; additional rotations around $\chi_2$ , $\chi_3$ etc. were not considered ) . 
+ The angles of rotations were integer multiples of $2\pi/15$ ( see Figure S1 of the Supplementary Materials for more details on how the angle increments were chosen ) . 
+ Thus , for each side of a DNA duplex , we considered a total of 2 256 984 probe-duplex conﬁgurations ( $108 \times 81 \times ( 17 \times 15 + 3 )$ ) . 
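+ The configuration count can be checked with a few lines of Python. The constant names below are illustrative. The counts themselves (108 grid points, 81 rotations, 15 rotamers for 17 of the 20 amino acids) are taken from the description above.

```python
GRID_POINTS = 6 * 6 * 3   # grid points per side of the duplex
ROTATIONS = 9 * 9         # whole-molecule rotations at each grid point
ROTAMERS = 15             # side-chain rotamers per flexible amino acid
N_AMINO_ACIDS = 20
N_WITHOUT_ROTAMERS = 3    # alanine, glycine and proline


def configurations_per_side():
    """Total probe-duplex configurations for one side of a DNA duplex."""
    side_chain_states = (N_AMINO_ACIDS - N_WITHOUT_ROTAMERS) * ROTAMERS + N_WITHOUT_ROTAMERS
    return GRID_POINTS * ROTATIONS * side_chain_states


print(configurations_per_side())  # 2256984
```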
+ For each amino acid-DNA duplex pair ( Figure 1 ) , we estimated the Gibbs free energy at each grid point p using the expression $G_p = -k_B T \ln Z_p$ , where $k_B$ is the Boltzmann constant , T is the absolute temperature , which we took to be 298 K , and $Z_p$ is the partition function $\sum_q \exp ( -E_{pq} / k_B T )$ . 
+ In the partition function , q is an index for the elements of the set of all whole-molecule rotations and all side-chain rotations ( if any ) , and $E_{pq}$ is the total energy given by NAMD and the CHARMM27 force ﬁeld with the probe at grid point p in orientation q . 
+ The change in Gibbs free energy caused by interaction between the probe at a grid point and the DNA duplex , $\Delta G_p$ , was found by subtracting the Gibbs free energy obtained when the probe and DNA duplex were separated by a large distance , $G_\infty$ . 
+ In other words , $\Delta G_p = G_p - G_\infty$ . 
+ To deﬁne chemical features , we ﬁrst split up the grid into three sub-grids : two $3 \times 3 \times 3$ grids in the minor and major grooves of the DNA and a $6 \times 3 \times 3$ grid outside the DNA ( orange dots , red dots and purple dots in Figure 1 , respectively ) . 
+ For each sub-grid , we computed the average ( over all favorable grid points with $\Delta G < 0$ ) , $\Delta G_{avg}$ , and minimum , $\Delta G_{min}$ . 
+ This procedure resulted in 120 values for each side of a DNA duplex ( minimum and average $\Delta G$ for each of three sub-grids and 20 amino acids ) . 
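+ A minimal Python sketch of the free energy calculation is given below, assuming the standard Boltzmann-weighted form in which the free energy at a grid point is minus $k_B T$ times the logarithm of the sum of Boltzmann factors over orientations. The function names, the energy unit (kcal/mol) and the value returned when no grid point is favorable are our assumptions, and the input energies would come from the force-field calculations, which are not reproduced here.

```python
import math

K_B = 0.0019872041  # Boltzmann constant in kcal/(mol K); the unit is an assumption
T = 298.0           # absolute temperature in K, as stated in the text


def free_energy(energies):
    """Free energy of one grid point from the energies of all its orientations."""
    kt = K_B * T
    # Shift by the lowest energy so that the exponentials cannot overflow.
    e_min = min(energies)
    z_shifted = sum(math.exp(-(e - e_min) / kt) for e in energies)
    return e_min - kt * math.log(z_shifted)


def interaction_free_energy(energies_near, energies_separated):
    """Change in free energy relative to a well-separated probe and duplex."""
    return free_energy(energies_near) - free_energy(energies_separated)


def sub_grid_summary(delta_g_values):
    """Average over favorable points (negative values) and overall minimum."""
    favorable = [g for g in delta_g_values if g < 0]
    # Returning 0.0 when nothing is favorable is a choice made for this sketch.
    average = sum(favorable) / len(favorable) if favorable else 0.0
    return average, min(delta_g_values)
```

+ Applying `sub_grid_summary` to each of the three sub-grids for each of the 20 amino acid probes would give the 120 values per side of the duplex.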
+ Many amino acids may have similar interaction proﬁles with the various 3-mers , so many of these 120 dimensions are highly correlated across different DNA sequences . 
+ To eliminate correlated feature dimensions while retaining the essential information about the electrostatic proﬁle encapsulated in the chemical features , we used principal component analysis ( PCA ) to generate orthogonal vectors that capture the variability of the original feature set ; we normalized each of the 120 dimensions ( across all 64 possible 3-mers ) to have mean 0 and standard deviation 1 prior to performing PCA . 
+ We chose the top 20 principal components as an abbreviated feature list , which captured 90.5 % of the variance . 
+ The ﬁnal feature list was obtained by concatenating the features for each side of the DNA duplex . 
+ Thus , the chemical features consist of 40 values ( 20 for each strand ) associated with the center nucleotide of each DNA 3-mer . 
+ Letter features
+ The structural and chemical features discussed above are one facet of SVM-PMM and SiteSleuth that distinguishes these methods from other TFBSs prediction algorithms , such as PWM-based methods and other methods based on letter representations of DNA sequences ( 11 -- 14 ) . 
+ The other main difference is the use of an SVM classiﬁer . 
+ Because it was previously found that SiteSleuth outperforms other TFBSs prediction algorithms ( 26 ) , we wanted to measure how much of this improvement can be attributed to the use of SVM-based classiﬁcation versus the use of physicochemical features . 
+ To this end , we created an LMM that uses features designed to encode letter sequences : orthonormal 4D vectors ( 1,0,0,0 ) , ( 0,1,0,0 ) , ( 0,0,1,0 ) and ( 0,0,0,1 ) are designated as feature vectors of the single nucleotides A , C , G and T , respectively . 
+ Using letter features , we can independently assess the effects of using SVM-based classiﬁcation and physicochemical features in PMMs . 
+ We will use ` SVM-LMM ' to refer to TFBSs models that take letter features as input for training . 
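+ The letter encoding can be sketched in Python as follows. The function name `letter_features` is ours, and the ordering A, C, G, T follows the assignment of unit vectors given above.

```python
ONE_HOT = {
    "A": (1, 0, 0, 0),
    "C": (0, 1, 0, 0),
    "G": (0, 0, 1, 0),
    "T": (0, 0, 0, 1),
}


def letter_features(sequence):
    """Concatenate the 4D unit vectors of the nucleotides in a sequence."""
    vector = []
    for base in sequence.upper():
        vector.extend(ONE_HOT[base])
    return vector


print(letter_features("AC"))  # [1, 0, 0, 0, 0, 1, 0, 0]
```

+ A binding site of length L therefore maps to a vector of 4L components, exactly L of which are equal to 1.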
+ Mapping DNA sequences to feature vectors
+ To associate DNA sequences of known or potential TFBSs with positions in a space of features , in which negative and positive examples can be separated using the SVM approach to classiﬁcation , we map each sequence to a feature vector of real numbers . 
+ Each scalar component of a feature vector corresponds to a letter , structural or chemical feature . 
+ Although it is time consuming to calculate the structural and chemical features of ( short ) DNA sequences , the mapping procedure described below allows us to pre-calculate and store sets of features and then to efﬁciently determine the physicochemical features of any new given DNA sequence . 
+ The procedure for mapping a given DNA sequence to a feature vector is illustrated in Figure 2 . 
+ Before starting this procedure , we select a known or potential binding site sequence ( Figure 2A ) , we add ﬂanking nucleotides at each end ( lower case ) in accordance with the genome sequence , and we identify the sets of pre-calculated features for short sequences that will be considered ( Figure 2B ) . 
+ In SVM-PMM , four feature sets are considered : ( i ) amino acid -- DNA chemical features ( $\alpha_i$ ) ; ( ii ) structural base-pairing features ( $\beta_i$ ) ; ( iii ) structural bp step features ( $\gamma_i$ ) ; and ( iv ) structural hydroxyl radical cleavage features ( $\delta_i$ ) . 
+ Within a feature set , K features are associated with each of the possible DNA sequences of length N . Set ( i ) associates 20 features with each of the possible 3-mers ( i.e. $\alpha_i \in \mathbb{R}^{20}$ , i = 1 , ... , 64 ) ; Set ( ii ) associates six features with each of the possible 3-mers ; Set ( iii ) associates six features with each of the possible 4-mers ; and Set ( iv ) associates two features with each of the possible 7-mers . 
+ In the case of letter features ( not considered in Figure 2 ) , four features ( three 0 's and one 1 ) are associated with each of the four 1-mers . 
+ To map a given DNA sequence to features , we start with the ﬁrst nucleotide of the sequence proper ( e.g. the ﬁrst capital letter in Figure 2A ) . 
+ For each feature set of interest , we consider the appropriate length N-mer sliding window across the sequence , illustrated in Figure 2B . 
+ Thus , for Set ( i ) or ( ii ) , for which N = 3 , we consider the ﬁrst nucleotide and its closest neighbors . 
+ The features associated with this N-mer are then concatenated to the feature vector for the sequence , as shown in Figure 2C . 
+ The example of Figure 2 is speciﬁc to SVM-PMM . 
+ The feature sets considered depend on the motif model under consideration , and motif models can incorporate different feature sets . 
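+ The sliding-window mapping can be sketched in Python as follows. This is only an illustration under stated assumptions: `toy_table` is a hypothetical stand-in for a precomputed feature table (it stores a single made-up feature, the GC count, per N-mer), and the sketch handles only odd window lengths centered on each nucleotide, because the placement of even-length windows is not spelled out here.

```python
from itertools import product


def toy_table(n):
    """Hypothetical lookup table: one feature (GC count) for every N-mer."""
    return {
        "".join(letters): (float(sum(base in "GC" for base in letters)),)
        for letters in product("ACGT", repeat=n)
    }


def map_sequence(flanked_seq, n_flank, table, n):
    """Concatenate table features for the window centered on each site nucleotide.

    flanked_seq holds the binding site plus n_flank nucleotides on each side.
    """
    if n % 2 == 0:
        raise ValueError("this sketch handles odd window lengths only")
    half = n // 2
    if half > n_flank:
        raise ValueError("not enough flanking nucleotides for this window")
    seq = flanked_seq.upper()
    vector = []
    for center in range(n_flank, len(seq) - n_flank):
        window = seq[center - half:center + half + 1]
        vector.extend(table[window])
    return vector


print(map_sequence("aACGt", 1, toy_table(3), 3))  # [1.0, 2.0, 2.0]
```

+ Vectors obtained from different feature sets for the same sequence can then be concatenated to form the full feature vector of a motif model.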
+ For a given set of sequences to be used in SVM training , we linearly scale each feature associated with these sequences such that the numerical values associated with a given feature across all sequences lie between −1 and 1 . 
+ The purpose of this normalization step is to avoid differences in magnitude between feature dimensions overwhelming the differences within a feature dimension ( 56 ) . 
+ To scale the features , once all training sequences are mapped to feature vectors , we examine $x_{i,j}$ ( i over all sequences , j over all features ) and determine $M_j = \max_i | x_{i,j} |$ . 
+ We use $x_{i,j} / M_j$ as the components of feature vectors in SVM training . 
+ The values of Mj are saved and used to normalize components of feature vectors of test sequences . 
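+ A small Python sketch of this scaling step is shown below. The function names are ours, and replacing a zero maximum by 1.0 (to avoid division by zero for a feature that is zero in every training sequence) is an assumption that is not discussed in the text.

```python
def fit_scale(train_vectors):
    """Largest absolute value of each feature over the training vectors."""
    n_features = len(train_vectors[0])
    # A feature that is zero everywhere gets scale 1.0 (assumption of this sketch).
    return [
        max(abs(row[j]) for row in train_vectors) or 1.0
        for j in range(n_features)
    ]


def apply_scale(vectors, scales):
    """Divide each component by the scale saved from the training data."""
    return [[x / m for x, m in zip(row, scales)] for row in vectors]


train = [[2.0, -4.0], [-1.0, 1.0]]
scales = fit_scale(train)
print(apply_scale(train, scales))  # [[1.0, -1.0], [-0.5, 0.25]]
```

+ Because the scales are computed from training data only, a test vector can fall slightly outside the interval from −1 to 1 after scaling.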
+ Algorithmic details
+ Machine learning algorithm
+ Using LIBSVM ( 56 ) , we train SVM classiﬁers to discriminate features of TFBSs from features of background genome sequences . 
+ The process is described below . 
+ We are given m feature vectors $\{ x_1 , \ldots , x_m \}$ , where $x_i \in \mathbb{R}^n$ and n is the number of features captured in each vector ( recall that each feature corresponds to a single scalar quantity ) . 
+ We are also given the corresponding m classiﬁcation values $\{ y_1 , \ldots , y_m \}$ , where $y_i \in \{ -1 , 1 \}$ . 
+ The feature vector $x_i$ is mapped to a higher dimensional space through a function $\phi$ , the form of which depends on the form of a kernel function . 
+ The kernel function $k ( z_i , z_j ) = \phi ( z_i ) \cdot \phi ( z_j )$ deﬁnes a similarity measure between two points $z_i$ and $z_j$ . 
+ For a linear SVM , 
+ $$k ( z_1 , z_2 ) = z_1 \cdot z_2 , \qquad ( 1 )$$ 
+ which reﬂects use of a hyperplane to separate positive and negative examples . 
+ The RBF kernel is 
+ $$k ( z_1 , z_2 ) = \exp ( -\gamma \| z_1 - z_2 \|^2 ) , \qquad ( 2 )$$ 
+ where $\gamma$ is a constant . 
+ Because positive training examples are often tightly clustered in feature space , whereas negative training examples tend to be more broadly distributed , the RBF kernel often gives more accurate results . 
+ However , training an SVM classiﬁer with an RBF kernel is signiﬁcantly more computationally expensive ( 56 ) . 
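+ For reference, the two kernels described above can be written in a few lines of Python. This is a plain-Python sketch for illustration, not the LIBSVM implementation.

```python
import math


def linear_kernel(z1, z2):
    """Dot product of two feature vectors."""
    return sum(a * b for a, b in zip(z1, z2))


def rbf_kernel(z1, z2, gamma):
    """exp(-gamma * squared Euclidean distance between the two vectors)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(z1, z2))
    return math.exp(-gamma * squared_distance)


print(linear_kernel([1.0, 2.0], [3.0, 4.0]))    # 11.0
print(rbf_kernel([1.0, 2.0], [1.0, 2.0], 0.5))  # 1.0
```

+ The RBF kernel equals 1 for identical vectors and decays toward 0 as the vectors move apart, at a rate set by the constant passed as `gamma`.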
+ Below , we use ` SVM ' in the name of a method to denote use of a linear kernel , and we use ` SVMR ' in the name of a method to denote use of an RBF kernel . 
+ In training an SVM , we ﬁnd a surface with weight vector w and offset d that separates the positive examples ( i.e. the examples for which $y_i = 1$ ) from the negative examples ( i.e. the examples for which $y_i = -1$ ) . 
+ This task is accomplished by solving the minimization problem 
+ $$\min_{w , d , \xi} \; \frac{1}{2} w^T w + C_+ \sum_{i : y_i = 1} \xi_i + C_- \sum_{i : y_i = -1} \xi_i \qquad ( 3 )$$ 
+ subject to 
+ $$y_i ( w \cdot \phi ( x_i ) + d ) \geq 1 - \xi_i \quad \text{and} \quad \xi_i \geq 0 . \qquad ( 4 )$$ 
+ The adjustable parameters $\xi_i$ ( $i \in [ 1 , \ldots , m ]$ ) are slack variables that are introduced to account for the fact that it is generally not possible to perfectly separate the training data . 
+ The $C_+$ and $C_-$ parameters , which are called penalty parameters , are taken to have ﬁxed values . 
+ The minimization problem is solved using quadratic programming techniques . 
+ The solution can be expressed as 
+ $$w = \sum_i \alpha_i y_i \phi ( x_i ) , \qquad ( 5 )$$ 
+ where each $\alpha_i$ is a Lagrange multiplier . 
+ The separating surface can be represented as 
+ $$w \cdot \phi ( x ) + d = \sum_i \alpha_i y_i k ( x_i , x ) + d = 0 \qquad ( 6 )$$ 
+ for feature vector x . 
+ Thus , given values for $C_+$ , $C_-$ and $\gamma$ ( if the RBF kernel is being used ) , the other SVM parameters ( w , d and $\xi_i$ for i = 1 , ... , m ) are uniquely determined by the solution of the minimization problem described above . 
+ The penalty parameters $C_+$ and $C_-$ are introduced to balance the inﬂuences of positive and negative training data , which is important because we always have available many more negative examples than positive examples . 
+ Each of the $C_+$ , $C_-$ and $\gamma$ ( if the RBF kernel is being used ) parameters affects the accuracy of a classiﬁer and should be optimized for best results . 
+ Optimization of these parameters is performed as follows . 
+ For an SVM with a linear kernel , we optimize $C_+$ and $C_-$ . 
+ For an SVM with an RBF kernel , we set $C_+ = C_- = C$ and optimize C and $\gamma$ . 
+ In both cases , optimization is performed through a 2D grid search . 
+ In this search , the optimality of a grid point is assessed using a cross-validation procedure , which is described below . 
+ This approach to SVM parameter optimization is an adaptation of a method recommended in the LIBSVM guide ( 56 ) . 
+ The search starts out over a coarse grid of points : $C_+ , C_- = \{ 2^{-5} , 2^{-3} , \ldots , 2^{11} \}$ ( in the case of a linear kernel ) , or $C = \{ 2^{-5} , 2^{-3} , \ldots , 2^{15} \}$ and $\gamma = \{ 2^{-15} , 2^{-13} , \ldots , 2^{5} \}$ ( in the case of a radial kernel ) . 
+ The optimization is reﬁned over two progressively ﬁner grids in smaller increments around the best grid point from the previous grid ( as assessed by the cross-validation procedure ) . 
+ For example , consider a linear kernel . 
+ If $( C_+ , C_- ) = ( 2^{5} , 2^{-1} )$ is the optimal result from the ﬁrst grid search , the second grid search would be over $C_+ = \{ 2^{3} , 2^{4} , 2^{5} , 2^{6} , 2^{7} \}$ and $C_- = \{ 2^{-3} , 2^{-2} , 2^{-1} , 2^{0} , 2^{1} \}$ . 
+ If $( C_+ , C_- ) = ( 2^{4} , 2^{0} )$ is the optimal result from the second grid search , the third grid search would be over $C_+ = \{ 2^{3} , 2^{3.5} , 2^{4} , 2^{4.5} , 2^{5} \}$ and $C_- = \{ 2^{-1} , 2^{-0.5} , 2^{0} , 2^{0.5} , 2^{1} \}$ . 
+ Reﬁnement stops after the third grid search . 
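+ The coarse-to-fine search can be sketched in Python by working with base-2 exponents. The function names are ours. The exponent range used for the coarse grid (−5 to 11 in steps of 2 for a linear kernel) and the five-point refinements with steps of 1 and then 0.5 reflect our reading of the procedure above. Scoring of grid points by cross-validation is left out.

```python
def coarse_exponents(lowest, highest, step=2):
    """Exponents of the initial coarse grid for one parameter."""
    return list(range(lowest, highest + 1, step))


def refine_exponents(best, step, points_each_side=2):
    """Exponents of a finer grid centered on the best exponent so far."""
    return [best + k * step for k in range(-points_each_side, points_each_side + 1)]


def parameter_grid(exponents_a, exponents_b):
    """All parameter pairs (2**a, 2**b) for a 2D grid search."""
    return [(2.0 ** a, 2.0 ** b) for a in exponents_a for b in exponents_b]


first = parameter_grid(coarse_exponents(-5, 11), coarse_exponents(-5, 11))
print(len(first))                # 81 pairs in the coarse grid
print(refine_exponents(5, 1))    # [3, 4, 5, 6, 7]
print(refine_exponents(0, 0.5))  # [-1.0, -0.5, 0.0, 0.5, 1.0]
```

+ Each of the two refinement rounds evaluates only 25 parameter pairs, so most of the search cost lies in the coarse grid.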
+ Each grid point deﬁnes a pair of parameters . 
+ For each pair , we perform 3-fold cross-validation : the available training data are randomly split into three sets ( as equal in size as possible ) and each set is used to assess the prediction accuracy of an SVM classiﬁer trained , as described above , on the other two sets . 
+ The accuracy of the classiﬁer is quantiﬁed by the F-measure : 
+ $$F = \frac{2 p r}{p + r} ,$$ 
+ where p and r are precision and recall , respectively . 
+ These quantities are deﬁned as 
+ $$p = \frac{TP}{TP + FP} , \qquad r = \frac{TP}{TP + FN} ,$$ 
+ where TP , FP and FN are counts of true positives , false positives and false negatives from the cross-validation procedure . 
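+ As a worked example, the F-measure can be computed from the three counts as follows, assuming the standard definition of the F-measure as the harmonic mean of precision and recall. Returning 0.0 when there are no true positives is a convention chosen for this sketch.

```python
def f_measure(tp, fp, fn):
    """Harmonic mean of precision and recall from cross-validation counts."""
    if tp == 0:
        # Precision and recall are both zero (or undefined) in this case.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# 5 true positives, no false positives, 5 false negatives:
# precision = 1.0, recall = 0.5, F = 2/3.
print(f_measure(5, 0, 5))
```

+ Because true negatives do not enter the formula, the F-measure is not inflated by the large number of negative examples.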
+ Once optimal values for C + and C ( or C and g ) are obtained through the process described above , these par-ameter values and all available training data are used to determine the remaining SVM parameters by solving the minimization problem described above . 
+ The optimal F-measure obtained after the third grid search ( but before ﬁnal determination of w , d and $\xi_k$ ) is taken to represent the accuracy of an SVM classiﬁer . 
+ Once all SVM parameters are determined , an SVM classiﬁer can be used for prediction : a test sequence is mapped to a feature vector z and $w \cdot \phi ( z ) + d$ is evaluated ( left-hand side of Equation ( 6 ) ) . 
+ If this value is positive ( i.e. z is on the same side of the separating surface as the positive examples ) , the sequence is considered to be a binding site . 
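+ The prediction rule can be sketched in Python using the kernel expansion of the weight vector, in which the decision value is a weighted sum of kernel evaluations against the training vectors plus the offset d. All names below are illustrative, and the multipliers and offset in the example are made-up numbers rather than the output of a trained model.

```python
def decision_value(z, train_vectors, multipliers, labels, offset, kernel):
    """Weighted sum of kernel evaluations against training vectors, plus offset."""
    total = sum(
        a * y * kernel(x, z)
        for a, y, x in zip(multipliers, labels, train_vectors)
    )
    return total + offset


def is_binding_site(z, train_vectors, multipliers, labels, offset, kernel):
    """A sequence is called a binding site when the decision value is positive."""
    return decision_value(z, train_vectors, multipliers, labels, offset, kernel) > 0


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


# Toy model with one positive and one negative training vector.
vectors = [[1.0, 0.0], [0.0, 1.0]]
print(is_binding_site([2.0, 0.0], vectors, [1.0, 1.0], [1, -1], 0.0, dot))  # True
print(is_binding_site([0.0, 2.0], vectors, [1.0, 1.0], [1, -1], 0.0, dot))  # False
```

+ Only training vectors with nonzero multipliers (the support vectors) contribute to the sum, so the others can be dropped after training.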
+ Feature selection
+ We evaluated a simple feature selection step to reduce the dimensionality of feature vectors , which results in decreased computation time and improved accuracy . 
+ Feature selection is done separately for each TF , with the aim of selecting the subset of features that most aptly describes DNA binding for that TF . 
+ Our feature selection step is performed prior to cross-validation and training of the SVM , so it depends only on the training data and the feature set used . 
+ The procedure is illustrated schematically in Figure S2A of the Supplementary Materials . 
+ First , we compute the mutual information MIj ( j = 1 , ... , n ) between the j-th dimension of the training data , xi , j , and the training classiﬁcations yi ; note that as j indexes the entire feature vector , it covers both the length of the binding site and all the features for each nucleotide of the binding site . 
+ Mutual information measures the dependence between two variables based on their probability distributions : a large value indicates greater dependence . 
+ Finding the distribution of $y_i$ is straightforward , as it takes only the values −1 and 1 . 
+ After normalization , $x_{i,j}$ is a continuous variable between −1 and 1 . 
+ We approximate its distribution by a discrete histogram with 20 bins of width 0.1 , which we denote as $B_l = [ -1 + ( l - 1 ) / 10 , -1 + l / 10 )$ , for l = 1 , ... , 20 . 
+ The quantity $MI_j$ is then computed as 
+ $$MI_j = \sum_{l=1}^{20} \sum_{m \in \{ -1 , 1 \}} P ( x_{i,j} \in B_l , y_i = m ) \log_2 \frac{P ( x_{i,j} \in B_l , y_i = m )}{P ( x_{i,j} \in B_l ) \, P ( y_i = m )} ,$$ 
+ where $P ( x_{i,j} \in B_l )$ and $P ( y_i = m )$ are the marginal distributions of $x_{i,j}$ and $y_i$ ( i.e. distributions computed over the training data , indexed by i ) and $P ( x_{i,j} \in B_l , y_i = m )$ is the joint distribution . 
+ After $MI_j$ has been computed for each feature dimension , we select the minimal subset of feature dimensions such that at least 90 % of the total mutual information is retained . 
+ In other words , we ` turn on ' dimensions one at a time , starting with the dimension with the largest $MI_j$ ﬁrst , until the sum of all the ` on ' $MI_j$ is at least 90 % of the total ( the sum over all j ) . 
+ The list of ` on ' dimensions is determined from the training data and saved so that the same features are retained in the test data . 
+ We will indicate the use of the feature selection step in a method by adding ` FS ' to its name . 
+ Figures S2B and C of the Supplementary Materials give statistics about the overall percentage of features retained by feature selection , and how that retention breaks down over the different types of features , respectively . 
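+ The feature selection step can be sketched in Python as follows. The function names are ours. The sketch assumes that feature values have already been scaled to lie between −1 and 1, and it places a value of exactly 1 in the last of the 20 bins, a boundary convention that is our assumption.

```python
import math
from collections import Counter


def mutual_information(values, labels, n_bins=20):
    """Mutual information (in bits) between one binned feature and the labels."""
    n = len(values)
    # Map [-1, 1] onto bin indices 0..n_bins-1; x == 1 goes to the last bin.
    bins = [min(int((x + 1.0) * n_bins / 2.0), n_bins - 1) for x in values]
    joint = Counter(zip(bins, labels))
    bin_counts = Counter(bins)
    label_counts = Counter(labels)
    mi = 0.0
    for (b, y), count in joint.items():
        p_joint = count / n
        p_product = (bin_counts[b] / n) * (label_counts[y] / n)
        mi += p_joint * math.log2(p_joint / p_product)
    return mi


def select_features(mi_values, fraction=0.9):
    """Indices of the highest-MI features that together retain the given fraction."""
    total = sum(mi_values)
    order = sorted(range(len(mi_values)), key=lambda j: mi_values[j], reverse=True)
    kept, running = [], 0.0
    for j in order:
        if running >= fraction * total:
            break
        kept.append(j)
        running += mi_values[j]
    return sorted(kept)


print(mutual_information([-0.95, -0.95, 0.95, 0.95], [-1, -1, 1, 1]))  # 1.0
print(select_features([0.6, 0.05, 0.33, 0.02]))                        # [0, 2]
```

+ The list returned by `select_features` is what would be saved from training and reused to drop the same dimensions from test vectors.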
+ SVM parameter optimization by cross-validation as described above results in selection of values for $( C_+ , C_- )$ or $( C , \gamma )$ that yield good discrimination of negative and positive training examples . 
+ However , in some cases , there are many combinations of parameter values that yield approximately the same discrimination . 
+ For an example , compare Figures S3A and B of the Supplementary Materials , which show the accuracy of SVM-PMM at different $( C_+ , C_- )$ pairs for DnaA and NanR , respectively . 
+ Because there is a degree of stochasticity in the cross-validation step introduced by the random three-way splitting of the training data , the best parameter pair can change from one training run to the next . 
+ For this reason , we repeated all training and prediction runs ﬁve times to assess the robustness of the SVM parameter settings that result from the training procedure . 
+ We can use the extra information from multiple training and prediction runs by considering the combined results . 
+ Multiple training runs on the same training data generate similar but different models , because parameter settings depend on the random splits of training data used in the cross-validation procedure . 
+ We can combine the results of prediction runs for these related models and focus our attention on predictions that are made by all or a speciﬁed fraction of the models , thereby ﬁltering out predictions that are sensitive to degenerate SVM parameter settings . 
+ Thus , in addition to examining the accuracy of the prediction steps individually , we also use a post-processing consensus approach to identify higher conﬁdence binding sites by requiring that a TFBS be predicted in all ( ﬁve ) prediction runs . 
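+ Consensus-based post-processing amounts to counting how many prediction runs report each site. A minimal Python sketch follows. Representing a predicted site by its genome position is our assumption, as are the function and parameter names.

```python
import math
from collections import Counter


def consensus_sites(runs, min_fraction=1.0):
    """Sites predicted in at least min_fraction of the prediction runs."""
    # Use set(run) so that a site is counted at most once per run.
    counts = Counter(site for run in runs for site in set(run))
    needed = math.ceil(min_fraction * len(runs))
    return {site for site, count in counts.items() if count >= needed}


# Five hypothetical prediction runs, each a list of predicted site positions.
runs = [
    [100, 250, 900],
    [100, 250],
    [100, 250, 400],
    [100, 900, 250],
    [250, 100],
]
print(sorted(consensus_sites(runs)))  # [100, 250]
```

+ With the default `min_fraction=1.0`, a site must appear in every run, which corresponds to the all-five requirement described above. A smaller value relaxes the filter to a specified fraction of the models.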
+ Data for training and testing
+ Training data were obtained and used as described previously ( 26 ) , except ﬂanking sequences were extended to 3 nucleotides instead of 1 . 
+ Brieﬂy , we considered binding sites of 54 TFs documented in RegulonDB ( 57 ) . 
+ Each of these TFs is associated with at least ﬁve TFBSs in RegulonDB . 
+ Binding sites of a given TF are all taken to have the same length , which is sufﬁciently long to encompass the binding sites documented in RegulonDB . 
+ Sequences used as positive training examples included the TFBSs sequences from RegulonDB as well as ﬂanking nucleotides . 
+ Flanking nucleotides were added in accordance with the E. coli genome sequence , which was obtained from KEGG ( 58 ) . 
+ Sequences used as negative training examples consisted of randomly selected non-coding sequences of the E. coli genome ( i.e. sequences were selected from regions not annotated to contain open reading frames ) ; the length of the negative training sequence was taken to be the same as that of a positive training sequence for that TF , so that the feature dimensions were equal for all training data . 
+ A total of 10 000 negative examples were considered for each TF ; known TFBSs were excluded from the negative examples . 
+ To assess the accuracy of motif models , we used the F-measure obtained from the cross-validation analysis described above . 
+ To further validate motif modeling approaches , we used published ChIP-chip data for Fis and Lrp ( 43,44 ) . 
+ These data include 894 sequences that putatively contain at least one binding site for Fis and 138 sequences that putatively contain at least one binding site for Lrp . 
+ Data from RegulonDB include 133 binding sites of Fis and 84 binding sites of Lrp . 
+ The sequences from RegulonDB were used for training ( as described above ) , whereas the sequences from the ChIP-chip data were used only for validation . 
+ Although the number of training sequences for Lrp is similar to the number of Lrp-bound sites detected by ChIP-chip assay , only 11 of the ChIP-chip sequences contain a binding site documented in RegulonDB . 
+ We deﬁne the accuracy of a Fis or Lrp motif model as the number of ChIP-chip sequences containing a predicted binding site divided by the total number of predicted binding sites . 
+ RESULTS
+ As detailed in the Materials and Methods section , we developed motif models based on physicochemical features of DNA . 
+ Using MD simulations , we generated a tabulated set of sequence-dependent structural and chemical features of short DNA sequences . 
+ We also obtained empirical structural features from hydroxyl radical cleavage proﬁles of DNA using the ORChID resource . 
+ Known or potential binding sites for a given TF are mapped to vectors of these structural and chemical features , and feature vectors of positive and negative examples of TFBSs are used to train an SVM classiﬁer to discriminate between true and false binding sites . 
+ Original physicochemical features were introduced with the SiteSleuth method (26): structural bp features, structural bp step features and chemical features derived from functional group–DNA interaction energies. Expanded physicochemical features include: structural bp features, structural bp step features, hydroxyl radical cleavage features and chemical features derived from amino acid–DNA interaction energies.
+ Our results span a series of six methods and corresponding motif models : a standard PWM-based method , BvH , plus ﬁve SVM-based methods listed in Table 1 in order of increasing complexity and accuracy . 
+ The different methods are distinguished by the classiﬁer used to identify binding sites , the information/features input into the classiﬁer to describe potential binding sites , and additional algorithmic steps . 
+ The advances we discuss in the Results come from improvements in all three areas : the use of the radial SVM versus linear SVM classiﬁer ( and the improvement of both over the PWM classiﬁer ) , the introduction of new physicochemical features , and the mutual information-based feature selection step . 
+ Finally , we see additional improvements in the predicted TFBSs with use of consensus-based post-processing . 
+ Training results assessed using data in RegulonDB for 54 TFs in E. coli
+ Using F-measure averaged over ﬁve independent training runs and all 54 TFs to assess accuracy ( Figure 3 ) , we see signiﬁcant effects from both the training algorithm and the features used in each method ; average F-measures and training times ( as well as number of positive training sequences ) are given in Table S1 of the Supplementary Materials . 
+ We see steady improvements in accuracy from left to right in Figure 3A . 
+ First , although BvH and SVM-LMM use only the DNA letter sequence information ( i.e. the same features ) , the average F-measure for SVM-LMM is 67 % larger than that for BvH . 
+ Thus , solely the use of the SVM framework for training and predicting binding sites constitutes a substantial improvement over standard PWM-based methods . 
+ However , just as much improvement is observed with the introduction of physicochemical features in SiteSleuth , where the average F-measure increases further to 0.38 ( 138 % increase over BvH ) . 
+ Further improvement with the introduction of new physicochemical features is observed in SVM-PMM , where the average F-measure increases to 0.39 . 
+ Finally , additional improvements from algorithmic changes ( rather than changes of feature sets ) are obtained with the introduction of feature selection ( in SVM-PMM-FS ) , average F = 0.40 , and with use of a radial kernel ( in SVMR-PMM-FS ) , average F = 0.43 . 
+ Although the incremental gains are not always large , there are signiﬁcant , repeatable improvements when the 54 TFs are considered as a whole . 
+ By examining the improvements for each TF individually in Figure 3B ( and studying Table S1 of the Supplementary Materials ) we can gain additional understanding for the averaged improvements in Figure 3A . 
+ Some TFs show very marked improvement with the ﬁrst introduction of physical features ( i.e. SVM-LMM versus SiteSleuth ) and only small improvements thereafter . 
+ For example , the F-measure for MalT increases from 0.079 to 0.529 from SVM-LMM to SiteSleuth , with only a slight additional increase to 0.606 for SVMR-PMM-FS . 
+ Other TFs are apparently equally well described by the DNA letter sequence as by physicochemical features : Fis has an F-measure of 0.29 for SVM-LMM ( 56 % higher than BvH ) , but actually a decrease in F-measure for the linear physicochemical SVM methods ( its F-measure for SVMR-PMM-FS , 0.33 , is slightly higher ) . 
+ Although SVMR-PMM-FS is the most accurate method overall , the wide distribution of trends across individual TFs means that different aspects of this method are important for accurate predictions for different TFs : in some cases , the important change is the radial kernel , whereas in other cases the important change is the physicochemical features . 
+ The different SVM-based models have different computational requirements , as can be seen in Supplementary Figure S4A . 
+ Reported training times include ( i ) mapping training sequences to feature vectors ; ( ii ) parameter optimization by grid search ; and ( iii ) ﬁnal training of the model with optimal parameters . 
+ Step ( ii ) dominates the overall run time , as it requires training the SVM three times per parameter pair ; the time for training the SVM scales with the length of the feature vector ( which in turn scales with the number of features and the length of the binding sites ) , the number of training examples , and the SVM kernel . 
+ The latter is the main reason for the large increase in times for SVMR-PMM-FS ( although the initial coarse grid for that method is slightly larger as well ) : optimizing Equation ( 3 ) is much more computationally expensive for the RBF kernel . 
+ However , the increased time for SVMR-PMM-FS highlights the importance of the feature selection step , which in addition to a small increase in accuracy also results in a decrease in training time , about a 32 % speed-up on average . 
+ The distribution of speed-ups , which in some cases exceed 60 % , is shown in Figure S4B of the Supplementary Materials . 
+ The training time for BvH is not indicated in Figure 3 , as this time is just the time required to compute a PWM , which is insigniﬁcant compared with SVM training times . 
+ In contrast to the wide distribution of trends in Figure 3B , changes in training time for different TFs in Figure S4A of the Supplementary Materials are fairly consistent and mirror the trends of the average training time in Figure 3A . 
+ We provide additional comparisons of each method against the most accurate overall method , SVMR-PMM-FS , in Figure 4 . 
+ In each panel of Figure 4 , we plot the cross-validation accuracy of TFBSs models obtained via SVMR-PMM-FS against those of models obtained via one of ﬁve other methods ( SVM-PMM-FS , SVM-PMM , SiteSleuth , SVM-LMM or BvH ) . 
+ Each point in a scatterplot corresponds to one of the 54 E. coli TFs under consideration . 
+ A point on the diagonal line in a panel is a point at which the accuracies of two models being compared would be exactly equal ; SVMR-PMM-FS is the more accurate model for points below the diagonal line . 
+ As algorithmic complexity increases from Figure 4A ( comparing to BvH ) to Figure 4E ( comparing to SVM-PMM-FS ) , fewer points fall far below the diagonal . 
+ Moreover , note that in each panel of Figure 4 , any points above the diagonal still tend to be close to the diagonal . 
+ That is , when SVMR-PMM-FS is less accurate than the other method being considered , it is only slightly less accurate . 
+ On the other hand , a number of points are always quite far below the diagonal in each panel , so there are TFs for which SVMR-PMM-FS performs much better . 
+ It should be noted that there are six TFs for which all methods fail ( i.e. for which F = 0 ) . 
+ These TFs are CysB , GcvA , OxyR , PspF , RcsAB and Rob . 
+ In these cases , the likely cause for the poor performance is that the binding sites for these TFs can be fairly diverse sequences , but too few positive training sequences were available for any classiﬁer to adequately construct a model . 
+ GcvA , PspF and RcsAB have only ﬁve positive training sequences ( the minimum number for inclusion in our study ) , Rob has only six , CysB has only eight and OxyR has only nine . 
+ A small positive training set alone does not necessarily mean low F-measure , as GadE and UxuR ( both with only ﬁve positive training sequences ) have F-measures of 0.89 and 0.58 under SVMR-PMM-FS , respectively ; indeed , the Pearson correlation between number of positive training sequences and F-measure for SVMR-PMM-FS over all 54 TFs is only 0.12 . 
+ The fact that some TFs had only 5 positive training sequences was also our reason for using 3-fold cross validation in the training procedure . 
+ To see if any improvement can be obtained from more splits , we tested 5 - and 10-fold cross validation for the 7 TFs with at least 80 positive examples , training SVM-PMM-FS . 
+ The results of this analysis are shown in Figure S5 of the Supplementary Materials . 
+ For ﬁve of the TFs there was essentially no change in F-measure from 3 - to 10-fold cross validation . 
+ Two of the TFs , Lrp and IHF , showed slight increases . 
+ Prediction results assessed using ChIP-chip data for Fis and Lrp
+ Individual prediction runs
+ We used the trained models for Fis and Lrp to predict TFBSs across the entire E. coli genome and compared the predictions with binding regions from ChIP-chip experiments ( 43,44 ) . 
+ For these data , we deﬁned the accuracy of a motif model as the number of predicted TFBSs in ChIP-chip regions divided by the total number of predicted TFBSs . 
+ This approach also allowed us to test the consensus-based approach for identifying predicted TFBSs , wherein we compare the predicted TFBSs from ﬁve independently trained models and retain only those sites that are predicted positive by each model ; there is no variability in the training procedure for BvH , so the consensus analysis is not performed for this method . 
+ Figure 5A gives the accuracy from each method for Fis and Lrp , and Figure 5B gives the number of predicted binding sites from each method . 
+ Note that the F-measures for Fis and Lrp are indicated by the boxed dots and circled dots , respectively , in Figure 4 . 
+ We do not report prediction times for the different methods under consideration , as prediction time is typically dominated by the mapping of test sequences to feature vectors , which is I/O intensive and therefore platform dependent . 
+ We also give the DNA sequence logos for Fis and Lrp , generated by WebLogo ( 59,60 ) from the positive training examples in Figure S6 of the Supplementary Materials for the reader 's reference . 
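+ As an illustration of the accuracy measure and the consensus rule described above , the following Python sketch may help . 
+ It is not the code used in the study : the function names , the representation of a site as a ( start , end ) tuple , the use of any overlap as the test for a site being in a ChIP-chip region , and the matching of sites between runs by exact coordinates are all assumptions made for the example . 

```python
from collections import Counter


def chip_accuracy(predicted_sites, chip_regions):
    """Fraction of predicted sites that fall in a ChIP-chip region.

    Sites and regions are (start, end) tuples with inclusive ends.
    Assumption: a site counts as "in" a region if the two overlap at all.
    With no predictions the accuracy is reported as zero.
    """
    if not predicted_sites:
        return 0.0
    hits = sum(
        1
        for (start, end) in predicted_sites
        if any(start <= r_end and end >= r_start for (r_start, r_end) in chip_regions)
    )
    return hits / len(predicted_sites)


def consensus_sites(runs, min_agree=None):
    """Keep the sites predicted by at least `min_agree` runs.

    `runs` is a list of site lists, one per independently trained model.
    By default every run must agree (five-way consensus for five runs).
    Assumption: two runs agree on a site only if coordinates match exactly.
    """
    if min_agree is None:
        min_agree = len(runs)
    # Count each site once per run, even if a run lists it twice.
    counts = Counter(site for run in runs for site in set(run))
    return sorted(site for site, n in counts.items() if n >= min_agree)
```

+ A smaller value of min_agree relaxes the rule so that a site needs support from only some of the runs . 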
+ For Fis , in accordance with cross-validation results ( Figure 4 ) , we see a signiﬁcant improvement in accuracy from BvH to SVM-LMM . 
+ The improved accuracy can be attributed to the use of SVM-based classiﬁcation ( and negative training examples ) . 
+ However , SVM-LMM performs similarly to SiteSleuth , SVM-PMM and SVM-PMM-FS ; it is only signiﬁcantly outperformed by SVMR-PMM-FS ( Figure 5A , left set of bars ) . 
+ Correspondingly , when comparing F-measures for these methods , only SVMR-PMM-FS outperforms SVM-LMM . 
+ These results suggest that the physicochemical features considered here may not provide substantially more information than that found in the letter-based representations of Fis binding sites . 
+ The accuracy results of Figure 5A are mirrored in Figure 5B : as accuracy increases , the number of predicted TFBSs decreases . 
+ Thus , improvements in accuracy ( e.g. from BvH to SVM-LMM ) seem to be obtained by reductions of the false positive rate . 
+ Starkly different results are obtained for Lrp ( Figure 5 , right set of bars ) . 
+ The accuracies of BvH and SVM-LMM are close to zero , and the accuracy of SiteSleuth is identically zero . 
+ In fact , SiteSleuth was unable to predict any TFBSs for Lrp ; we were surprised by this result , since the SiteSleuth F-measure for Lrp is greater than the values for BvH and SVM-LMM . 
+ We see comparable accuracy between SVM-PMM and SVM-PMM-FS , and then signiﬁcant improvement in SVMR-PMM-FS . 
+ The latter method reaches > 10 % accuracy on average . 
+ As in the case of Fis , improvements in accuracy accompany decreases in the number of predicted TFBSs ( cf. panels A and B in Figure 5 ) . 
+ The high ( average ) accuracy of SVMR-PMM-FS is achieved because of prediction runs with very small numbers of predicted TFBSs . 
+ In 4 of the 5 runs , a few hundred TFBSs were predicted at an accuracy of about 3 % , and in the ﬁfth run only 3 TFBSs were predicted , 2 of which were correct ( 67 % accurate ) . 
+ Other than the lack of SiteSleuth predicted TFBSs , the trend of increasing accuracy from BvH to SVMR-PMM-FS ( Figure 5 , right set of bars ) is consistent with crossvalidation results ( Figure 4 ) . 
+ We computed a P-value for each predicted TFBS by comparing its SVM score ( computed from Equation ( 6 ) ) to the distribution of SVM scores from 10 000 randomly generated DNA sequences with the same GC content as the E. coli genome ; higher ( more positive ) scores indicate stronger predictions , so the P-value for a predicted TFBS is P ( random score > w f ( z ) + d ) . 
+ The P-value distributions are given in Figure S7 of the Supplementary Materials . 
+ The predictions for SVMR-PMM-FS are clearly the strongest for both Fis and Lrp . 
+ The fact that predictions for Fis from SVM-LMM tend to be slightly more signiﬁcant than those from SiteSleuth , SVM-PMM or SVM-PMM-FS is consistent with the fact that SVM-LMM has a larger F-measure and higher accuracy ( before consensus ﬁltering ) than these methods , as discussed above . 
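+ The P-value estimate described above can be sketched in Python as follows . 
+ This is a minimal illustration under stated assumptions , not the released implementation : score_fn is a hypothetical stand-in for the trained model 's SVM score , the GC fraction is passed in by the caller , and random sequences are drawn one base at a time so that only the expected GC content matches the genome . 

```python
import random


def random_sequence(length, gc_content, rng):
    """Draw a DNA sequence whose expected GC fraction is `gc_content`."""
    at = (1.0 - gc_content) / 2.0  # probability of A, and of T
    gc = gc_content / 2.0          # probability of C, and of G
    return "".join(rng.choices("ACGT", weights=[at, gc, gc, at], k=length))


def background_scores(score_fn, length, gc_content, n=10_000, seed=0):
    """Score `n` random sequences with `score_fn` (hypothetical model score)."""
    rng = random.Random(seed)
    return [score_fn(random_sequence(length, gc_content, rng)) for _ in range(n)]


def empirical_p_value(site_score, background):
    """Fraction of background scores strictly greater than the site's score."""
    greater = sum(1 for s in background if s > site_score)
    return greater / len(background)
```

+ With 10 000 background sequences the smallest non-zero P-value this estimate can return is 0.0001 . 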
+ Consensus-based predictions
+ The effects of adding ﬁve-way consensus ﬁltering of TFBSs predictions for Fis and Lrp models are indicated by the thick horizontal lines in Figure 5 . 
+ For Fis models incorporating physicochemical features ( SiteSleuth through SVMR-PMM-FS ) , there is a fairly consistent increase in accuracy . 
+ In fact , although the individual runs of SVM-PMM and SVM-PMM-FS were no better than those runs for SVM-LMM , the consensus predictions for the PMMs do outperform the consensus prediction for SVM-LMM . 
+ Improvements in accuracy obtained through the consensus procedure are mirrored by drops in the number of predicted TFBSs . 
+ For Lrp models , consensus ﬁltering results in a large accuracy improvement for SVM-PMM . 
+ The loss of accuracy for the SVM-PMM-FS Lrp model is reﬂected in the lower F-measure obtained when the feature selection step is used for Lrp ( one of a minority of TFs where feature selection did not improve accuracy ) . 
+ On the other hand , consensus ﬁltering reduces accuracy ( to zero ) for SVM-LMM and SVMR-PMM-FS : for SVM-LMM because three training runs produced a model that predicted no TFBSs , and for SVMR-PMM-FS because one run only predicted three binding sites , which did not overlap with the predictions of the other four runs . 
+ If we replace the ﬁve-way consensus requirement with a less stringent two-way consensus , accuracy of SVM-LMM increases to 0.019 % ( from 0.007 % for the case without consensus ﬁltering ) , and 3.2 % accuracy of SVMR-PMM-FS is obtained , consistent with the results of four of the runs . 
+ We assessed the effect of multiple training and prediction runs on the consensus analysis by performing an additional 15 training and prediction runs for Fis and Lrp using SVM-PMM-FS , for a total of 20 . 
+ We used a bootstrapping procedure to analyze the effect of the number of runs in the consensus analysis on the accuracy and number of predicted TFBSs : for a given number of runs n ( allowing n to vary from 2 to 20 ) , we randomly sampled ( with replacement ) n sets of TFBSs predictions out of the 20 total runs . 
+ We then performed the consensus-based post-processing analysis , requiring all n runs to agree on each prediction . 
+ This process was repeated 10 times for each n , and we computed the mean and standard deviation of the accuracy and number of TFBSs . 
+ These results are plotted in Figure S8 of the Supplementary Materials . 
+ In the bootstrapping results , we see a steady trend toward greater accuracy and fewer predicted TFBSs as the number of runs considered increases , although the standard deviations are generally fairly large with respect to the increases in accuracy . 
+ We also considered easing the consensus rule , allowing TFBSs predictions to pass if fewer than n prediction runs agree , as was necessary for Lrp predictions from SVM-LMM and SVMR-PMM-FS . 
+ However , for our extra analysis of SVM-PMM-FS in Figure S8 of the Supplementary Materials , we found that the best accuracy was almost always obtained by the most stringent consensus requirement ( results not shown ) . 
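+ The bootstrapping procedure described above can be sketched as follows . 
+ The sketch assumes that each run is a collection of hashable site identifiers and that accuracy_fn is a caller-supplied ( hypothetical ) function mapping a set of consensus sites to an accuracy ; neither detail is taken from the study 's code . 

```python
import random
import statistics


def bootstrap_consensus(runs, n, accuracy_fn, repeats=10, seed=0):
    """Resample `n` runs with replacement and require all of them to agree.

    Returns the mean and standard deviation, over `repeats` resamples, of
    the consensus accuracy and of the number of consensus sites.
    `repeats` must be at least 2 for the standard deviation to be defined.
    """
    rng = random.Random(seed)
    accuracies, counts = [], []
    for _ in range(repeats):
        sample = rng.choices(runs, k=n)  # sampling with replacement
        # A site survives only if every sampled run predicted it.
        agreed = set(sample[0]).intersection(*(set(r) for r in sample[1:]))
        accuracies.append(accuracy_fn(agreed))
        counts.append(len(agreed))
    return {
        "mean_accuracy": statistics.mean(accuracies),
        "sd_accuracy": statistics.stdev(accuracies),
        "mean_count": statistics.mean(counts),
        "sd_count": statistics.stdev(counts),
    }
```

+ Because sampling is with replacement , the same run can be drawn more than once , so the number of distinct runs that must agree can be smaller than n . 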
+ DISCUSSION
+ We have extended a recently proposed motif modeling paradigm ( 26 ) wherein physicochemical features of DNA -- protein interactions are used to discriminate TFBSs from background genome sequences . 
+ This approach constitutes a fundamental change from typical PWM-based motif modeling approaches , which consider only letter representations of DNA sequences . 
+ Here , we advance the physicochemical motif modeling ( PMM ) approach by considering new physicochemical features of DNA and algorithmic improvements . 
+ We implemented modiﬁcations of the motif modeling approach one by one to illustrate the effect of each modiﬁcation on accuracy . 
+ The PMM that incorporates all improvements considered here ( both new features and new algorithmic steps ) , SVMR-PMM-FS , was found to be the most accurate . 
+ The source code for the software used to generate our results is freely available at http://dinner-group.uchicago.edu/downloads.html . 
+ The foremost difference between PMMs and PWM-based methods is the use of physicochemical features to directly capture the important aspects of protein -- DNA interactions . 
+ In Figure S9 of the Supplementary Materials , we compare distances between random DNA sequences in letter sequence space and feature space to illustrate the fact that DNA letter sequence alone is not a complete predictor of the structural and chemical properties of DNA : DNA sequences may correlate ( or anti-correlate ) in ways too subtle to be detectable in the discrete letter-sequence space . 
+ We attempted to recover these correlations through our chemical features , which should capture base readout mechanisms of TF binding , and our structural features ( bp , bp step and hydroxyl radical cleavage ) , which should capture shape readout mechanisms . 
+ Our chemical features calculations depended on the evaluation of amino acid-DNA energies for different amino acid rotamers . 
+ Our strategy for generating these rotamers involved a ﬁxed angle rotation around χ1 . 
+ We wanted to examine if there would be any substantial effect of an approach that considered other side chain rotations as well ( e.g. χ2 , χ3 etc. ) . 
+ To sample additional side chains , we downloaded the Backbone-Dependent Rotamer Library of Shapovalov and Dunbrack ( 61 ) , which contains joint probabilities for different χn combinations as a function of the backbone φ-ψ angles . 
+ We then re-computed interaction ΔG values for arginine , glutamic acid and tyrosine around the GGG 3-mer : selecting the appropriate joint distribution based on the backbone angles for each amino acid , for each of the 108 grid points shown in Figure 1 , we randomly sampled 50 times from this distribution and performed whole-molecule rotations and energy calculations as described in the Materials and Methods . 
+ In Figure S10A of the Supplementary Materials we plot the free energies from the rotamer library and the ﬁxed χ1 rotation as a function of the 108 grid points . 
+ We see good agreement for arginine and tyrosine ; although free energies for glutamic acid do not always have the same trend across the grid , they still vary within the same range . 
+ Following the analysis described in the Chemical Features section of the Materials and Methods , we re-computed the minimum and average ΔG for the minor groove , major groove and outside DNA sub-grids . 
+ Remaining differences between rotamer sampling strategies were typically much smaller than differences between sub-grids or amino acids after this step , see Figure S10B of the Supplementary Materials . 
+ The second principal difference between PMMs and PWM-based methods is the classiﬁer algorithm used to determine binding sites . 
+ Our use of the SVM is a direct result of the introduction of physicochemical features , as discrete ( ACGT ) DNA sequences are mapped to continuous physical feature vectors . 
+ However , improvements may be due to either aspect of the PMMs , so we quantiﬁed the relative improvement due to the physicochemical features versus the SVM classiﬁer by introducing a novel LMM in SVM-LMM . 
+ Interestingly , we found a signiﬁcant improvement when comparing this method to a standard PWM method , BvH . 
+ This essentially demonstrates that the SVM does a better job of integrating the information in the training data into a predictive model than the PWM . 
+ The SVM framework leads to additional potential algorithmic improvements . 
+ In this study , we investigated a simple feature selection step and use of an RBF versus linear kernel in the SVM . 
+ We found improvements in accuracy and training time with the feature selection step . 
+ Training with the RBF kernel resulted in substantially more accurate predictions at the cost of signiﬁcantly more training time . 
+ We also were able to take advantage of the stochasticity in the grid search parameter optimization step by training multiple models and selecting binding sites by consensus . 
+ The choices we made for these algorithmic changes are not necessarily unique , and others could be explored in the future . 
+ For instance , there are many possible non-linear kernels for the SVM ( 56 ) , but from our understanding of their differences we decided that the RBF would be the most appropriate for TFBSs prediction . 
+ Alternative feature selection strategies ( 62,63 ) could also be explored . 
+ In principle , feature selection can provide some physical details about the nature of TF-DNA binding . 
+ However , in practice we see that our routine does not eliminate a large percentage of features : about 75 -- 85 % of all features are retained for most TFs ( Figure S2B ) ; CRP was an outlier , with 68 % retention ( and a corresponding 60 % speed-up with feature selection ) . 
+ We did not see any distinct patterns distinguishing binding modes from the features that were selected . 
+ Because our feature selection step chooses the minimal subset of features that retains 90 % of the total mutual information , at most 90 % of features can be selected . 
+ Since actual selection percentages are not much smaller than this maximum , we are limited in our ability to conclusively distinguish different modes of TF binding via our feature selection procedure . 
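+ The selection rule discussed above , and the reason it caps the retained fraction near 90 % , can be illustrated with a short Python sketch . 
+ It assumes that a mutual information score has already been computed for each feature and that scores are simply summed ; the function name and the greedy ordering are illustrative choices rather than a description of the study 's routine . 

```python
def select_by_mutual_information(mi_scores, keep_fraction=0.90):
    """Return indices of the smallest feature subset whose summed mutual
    information reaches `keep_fraction` of the total.

    Assumption: per-feature scores are non-negative and additive, so taking
    features in order of decreasing score gives the minimal subset.
    """
    target = keep_fraction * sum(mi_scores)
    order = sorted(range(len(mi_scores)), key=lambda i: mi_scores[i], reverse=True)
    selected, running = [], 0.0
    for i in order:
        if running >= target:
            break  # enough information retained; drop the remaining features
        selected.append(i)
        running += mi_scores[i]
    return sorted(selected)
```

+ When every feature carries the same score , ten features yield nine selected , which is the 90 % ceiling ; any unevenness in the scores lets fewer features reach the target . 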
+ Nevertheless , we present a short analysis of the features retained by feature selection . 
+ Figure S2C of the Supplementary Materials gives the frequency of feature selection for different types of features ; frequencies are computed over the length of the training sequence , as each feature type is mapped to each nucleotide in the training sequence . 
+ The gray curve shows the average and standard deviation of feature selection frequency over all 54 TFs . 
+ Note that the effect of PCA on the chemical features can be seen by the high consistency with which features 1 , 2 , 21 and 22 are selected : these are the ﬁrst principal components for the forward and reverse strands , respectively , and together capture 42 % of the variance . 
+ We do not ﬁnd any distinct patterns among the structural features , other than the lower selection frequency for the hydroxyl radical cleavage features . 
+ Thus far we have not discussed the typical application of TFBSs prediction algorithms , where a speciﬁc set of genomic regions , such as promoters or other cis regulatory sites ( 64 ) , are examined and over-representation of predicted TFBSs in those regions is used to infer probable binding ( 65 ) . 
+ This typically involves computation of a P-value for the over-representation of TFBSs in the surveyed regions versus background regions . 
+ However , when we performed this analysis for the Fis and Lrp predictions , we always found signiﬁcant over-representation of TFBSs in the ChIP regions for all six of the methods compared , including BvH and SVM-LMM . 
+ We chose to omit these results because they fail to clearly distinguish a method that predicts a large number of false positives , but still more true positives than would be expected by chance , from a more discriminating method that predicts far fewer false positives and obtains higher accuracy . 
+ In addition to considering the P-values for overrepresentation of TFBSs in particular genomic regions , we also considered P-values for each predicted TFBS , which measure the quality of each site individually . 
+ The distribution of P-values for each method is given in Figure S7 of the Supplementary Materials , where we see substantially more highly signiﬁcant predictions for SVMR-PMM-FS compared with the other methods . 
+ It should be noted that our use of randomly generated DNA sequences tends to generate less conservative estimates of P-values than other approaches , as has been discussed by Frith et al. ( 65 ) . 
+ A good alternative is to use a large genomic region as background ; however , since we used our models to make genome-wide predictions , all genomic regions are already tested , so we felt that randomly generated sequences constituted the simplest available approach for estimating P-values . 
+ Nevertheless , our results still demonstrate the relative quality of predictions from different models , and we include the option to use background sequences for P-value estimation in the source code available online . 
+ Standard TFBSs prediction approaches use a library of motif models to identify a subset of TFs that may preferentially bind to the genomic regions of interest . 
+ In this regard , the development of an online database of PMMs , like JASPAR , RegulonDB or TRANSFAC ( 6,7,66 ) for PWM-based motif models , would be a valuable resource for researchers interested in applying PMMs to their own data . 
+ Such a database would contain multiple SVMR-PMM-FS trained models for each TF to account for variability in the parameter optimization step and for consensus-based predictions . 
+ Although SVMR-PMM-FS was the most computationally expensive model to train , the model for a TF only needs to be trained once , as the same output can be used to predict an unlimited number of test sequences . 
+ We see evidence among our results that , for some TFs , our PMM features do not necessarily capture any more information about binding speciﬁcity than DNA letter sequences do . 
+ For example , predictions for Fis ( before consensus ﬁltering ) by SVM-LMM are just as accurate as the ( linear ) PMMs . 
+ Studies suggest that Fis speciﬁcity is likely to depend on both direct points-of-contact ( direct readout ) and the non-local mechanical properties of its DNA binding sites ( indirect readout ) : point mutations have a strong effect on binding afﬁnity ( 43,67 -- 69 ) , but Fis-DNA structures also show a distinct bent DNA structure ( 69,70 ) and mutations that are known to affect DNA structure are among those affecting Fis binding ( 71 ) . 
+ Properties like ﬂexibility , thermodynamic softness , and large-scale curvature could be included in a new PMM and might yield more accurate predictions of Fis TFBSs . 
+ There are a number of additional ways that PMMs could be further improved . 
+ We have focused on encoding the chemical and structural nature of DNA and protein -- DNA interactions , but there are likely many other relevant pieces of information in that regard . 
+ Epigenetic structure can play a key role in selecting TFBSs beyond just DNA sequences . 
+ In eukaryotic genomes , histone markers have been widely linked with promoter and enhancer regions ( 64,72 -- 77 ) ; experimental data detailing the relationship between DNA sequence and histone positioning and modiﬁcations could be translated into chromatin structure features ( 78 ) . 
+ Also , in many cases , the TFBSs are known to be dependent on cellular conditions or cooperation with other TFs ( 79,80 ) . 
+ This information is difﬁcult to include in existing static motif models , but could possibly be accounted for by deﬁning cell state-dependent features and grafting those onto the existing features . 
+ Ultimately , the ﬂexible nature of our SVM-based framework allows different features to be substituted easily , which makes it possible for different researchers to compute and test features independently . 
+ Reﬁnements in the quality of ( positive ) training data could also greatly improve accuracy . 
+ Although these binding sites are veriﬁed by high quality individual experiments , the precision of the experiment may not be to the level of a single bp . 
+ Even shifts of 1 or 2 bp could greatly affect the agreement between different positive training sequences , in either the space of DNA letter sequences or our mapped features . 
+ Besides stronger quality controls in the determination of positive training examples , adding an alignment step before training could also enhance agreement among positive training sequences in feature space , and thus the quality of the predictions . 
+ The nature of TF -- DNA interactions is one of the most important features of gene regulation but remains poorly understood , in that predictions of TFBSs tend to have a high false positive rate . 
+ We have presented a TFBSs prediction method with greatly improved predictive capability , and we believe that this tool constitutes an important step in the advancement of accurate TFBSs prediction . 
+ We have clearly demonstrated that improvements are gained through the use of physicochemical features , and that higher quality features yield higher quality results . 
+ Importantly , because the space of possible physical features is practically limitless , there is much room for further improvements . 
+ SUPPLEMENTARY DATA
+ Supplementary Data are available at NAR Online : Supplementary Table 1 and Supplementary Figures 1 -- 10 . 
+ ACKNOWLEDGEMENTS
+ The authors thank A. L. Bauer for helpful discussions and the Center for Nonlinear Studies for use of ofﬁce space and computational resources during the visit of MMC to Los Alamos . 
+ We also acknowledge support from NIH 
+ US Department of Energy through the Computational Science Graduate Fellowship program and contract DE-AC52-06NA25396 ; National Institutes of Health ( NIH ) [ RR018754 , GM085273 , GM081892 ] . 
+ Funding for open access charge : NIH [ GM085273 ] . 
+ Conﬂict of interest statement. None declared.
\ No newline at end of file
--- a/data/TEXT_FILES/notuseful_txt/23190111.txt 0 → 100644
View file @27818a9
+++ b/data/TEXT_FILES/notuseful_txt/23190111.txt 0 → 100644
View file @27818a9
+ Salmonella enterica serovars Typhi and Typhimurium reveals
+ Thomas Wileman ,1 Keith James ,1 Thomas Keane ,1 Duncan Maskell ,3 Jay C. D. Hinton ,4 Gordon Dougan1 and Robert A. Kingsley1 * 1The Wellcome Trust Sanger Institute , The Wellcome Trust Genome Campus , Hinxton , Cambridge CB10 1SA , UK . 
+ 2School of Biological Sciences , University of East Anglia , Norwich NR4 7TJ , UK . 
+ 3Department of Veterinary Medicine , University of Cambridge , Madingley Road , Cambridge CB3 0ES , UK . 
+ 4Institute of Integrative Biology , University of Liverpool , Crown Street , Liverpool L69 7ZB , UK . 
+ Summary
+ OmpR is a multifunctional DNA binding regulator with orthologues in many enteric bacteria that exhibits classical regulator activity as well as nucleoid-associated protein-like characteristics . 
+ In the enteric pathogen Salmonella enterica , using chromatin immunoprecipitation of OmpR : FLAG and nucleotide sequencing , 43 putative OmpR binding sites were identiﬁed in S. enterica serovar Typhi , 22 of which were associated with OmpR-regulated genes . 
+ Mutation of a sequence motif ( TGTWACAW ) that was associated with the putative OmpR binding sites abrogated binding of OmpR :6 × His to the tviA upstream region . 
+ A core set of 31 orthologous genes were found to exhibit OmpR-dependent expression in both S. Typhi and S. Typhimurium . 
+ S. Typhimurium-encoded orthologues of two divergently transcribed OmpR-regulated operons ( SL1068 -- 71 and SL1066 -- 67 ) had a putative OmpR binding site in the inter-operon region in S. Typhi , and were characterized using in vitro and in vivo assays . 
+ These operons are widely distributed within S. enterica but absent from the closely related Escherichia coli . 
+ SL1066 and SL1067 were required for growth on N-acetylmuramic acid as a sole carbon source . 
+ SL1068 -- 71 exhibited sequence similarity to sialic acid uptake systems and contributed to colonization of the ileum and caecum in the streptomycin-pretreated mouse model of colitis . 
+ Accepted 19 November , 2012 . 
+ * For correspondence . 
+ E-mail rak@sanger.ac.uk ; Tel. ( +44 ) 1223495391 ; Fax ( +44 ) 1223494919 . 
+ Introduction
+ OmpR is a DNA binding protein that , with the cognate sensor EnvZ , co-ordinates transcriptional response to environmental factors including osmotic stress in many enteric bacteria ( Forst and Roberts , 1994 ) . 
+ OmpR/EnvZ are central to the adaptive response to the intestinal environment ( Giraud et al. , 2008 ) , in part because of the distinct osmolyte composition of the lumen . 
+ As many as 125 genes in Escherichia coli ( Oshima et al. , 2002 ) and 305 genes in Salmonella Typhi ( Perkins et al. , 2009 ) have been implicated in OmpR/EnvZ-dependent expression . 
+ The OmpR regulon includes genes from the ` ancestral ' core genome shared with many enteric bacteria as well as genes of the accessory genome . 
+ The latter include virulence-associated loci such as the viaB locus that encodes Vi polysaccharide biosynthesis genes , and genes encoded on Salmonella pathogenicity island 2 ( SPI-2 ) via its regulation of ssrAB ( Pickard et al. , 1994 ; Feng et al. , 2003 ; Perkins et al. , 2009 ) . 
+ OmpR-regulated orthologues in diverse enteric bacteria deﬁne the ancestral regulon and include porin genes such as ompF and ompC . 
+ However , the OmpR regulon exhibits considerable plasticity and can include genes of the ancillary genome acquired by horizontal gene transfer , many of which are involved in host -- pathogen interactions . 
+ The acquisition of such genes and the ability to express them appropriately on moving from the intestinal lumen to the intracellular compartment were likely key features in the evolution of Salmonella ( Bäumler , 1997 ; Groisman and Ochman , 1997 ) . 
+ The genus Salmonella consists of more than 2500 serotypes that exhibit diverse host range and pathogenicity ( Bäumler et al. , 1997 ; Popoff et al. , 2004 ) . 
+ Most of the > 2500 serovars of Salmonella enterica have a relatively broad host range and are typically associated with gastroenteritis in humans ( Santos et al. , 2001 ) . 
+ In contrast , S. enterica serovar Typhi ( S. Typhi ) is highly host-adapted to cause the systemic disease typhoid speciﬁcally in humans . 
+ S. Typhi can invade the intestinal mucosa but colonization of the intestine is relatively transient and rapid systemic dissemination can follow leading to typhoid . 
+ This distinct pathogenesis is driven at least in part by horizontally acquired genes that are required for virulence , including the OmpR-regulated viaB , encoding the Vi polysaccharide antigen ( Pickard et al. , 1994 ) . 
+ The OmpR regulon includes both Salmonella pathogenicity island 1 ( SPI-1 ) and SPI-2 , mediated through ssrAB expression . 
+ The integration of such horizontally acquired genes into existing regulons is a recurring theme in the evolution of pathogenesis . 
+ The mechanism by which OmpR regulates gene expression is not fully understood . 
+ It has been proposed that OmpR has only weak speciﬁcity for DNA binding ( Head et al. , 1998 ; Rhee et al. , 2008 ) and that it may have both a classical site-speciﬁc impact on gene expression through recruitment of RNA polymerase and additional nucleoid-associated protein ( NAP ) -like properties that may also impact global gene expression ( Cameron and Dorman , 2012 ) . 
+ In this study we combine RNA-seq and ChIP-seq together with in vitro and in vivo phenotyping to deﬁne the interaction of OmpR with the chromosome and characterize two novel OmpR-regulated operons that are part of the S. enterica ancillary genome . 
+ Results
+ Identiﬁcation of candidate S. Typhi genes regulated by OmpR using ChIP-seq
+ We recently characterized the OmpR regulon of S. Typhi BRD948 using DNA microarray and RNA-seq ( Perkins et al. , 2009 ) ( Table S1 ) identifying 208 genes by microarray and 305 genes by RNA-seq , that exhibited OmpR-dependent transcription during mid-log phase culture in rich media . 
+ In order to further characterize the OmpR regulon we used ChIP-seq to identify candidate genome regions that are preferentially associated with the OmpR protein in the S. Typhi genome . 
+ To this end a S. Typhi BRD948 derivative TT53 .8 was constructed in which the 3 ′ end of ompR harboured an in-frame fusion with sequence encoding three repeats of the FLAG epitope ( 3 × FLAG tag ) . 
+ TT53 .8 expressed the fusion protein ( OmpR : :3 × FLAG ) in place of the wild-type OmpR protein from the native chromosomal location at single copy . 
+ To assess if this fusion protein had comparable function to wild-type OmpR , we indirectly monitored the expression of the ompR-dependent viaB locus in TT53 ( Pickard et al. , 1994 ) . 
+ Agglutination of S. Typhi TT53 with anti-Vi antiserum was indistinguishable from that of S. Typhi BRD948 in low-salt and high-salt culture media ( data not shown ) . 
+ Salmonella Typhi TT53 .8 or BRD948 were grown to mid-log phase ( OD600 = 0.6 ) and ChIP-seq was performed on DNA precipitated by anti-FLAG antibody . 
+ The normalized sequence depth at each base of the reference genome sequence was plotted as the number of standard deviations from the mean ( z-score ) to identify regions of signiﬁcantly enriched sequence coverage ( z-score > 3 ) and 43 ChIP-enriched peaks that were within intergenic regions were studied further ( 15 lay within annotated coding sequence and were excluded from further analysis ) ( Fig. 1 , Table S2 ) . 
+ Twenty-two of the genes with a sequence enrichment peak in their upstream region also exhibited OmpR-dependent expression as determined by RNA-seq or microarray analysis ( Perkins et al. , 2009 ) ( Fig. 1 , Table 1 ) . 
+ These included many previously identiﬁed OmpR-regulated genes such as tviA and ompS1 ( Fig. 2 ) . 
+ Some genes associated with aerobic lifestyle such as citrate synthase ( gltA ) and succinate dehydrogenase C ( sdhC ) also contained enrichment peaks . 
+ A ChIP peak was identiﬁed in the intergenic region of the divergently transcribed operons encoding stdA and dppA . 
+ Another peak was identiﬁed within the intergenic region of two divergently transcribed putative operons encoding genes t1787 -- 1790 and t1791 -- 93 ( Table 2 ) . 
+ Surprisingly , statistically signiﬁcant peaks with a z-score > 3 were not identiﬁed in the well-characterized OmpR-regulated genes ompF and ompC , although a minor peak that fell just short of the statistical cut-off , mapped extensively to motifs ( C1 -- 3 ) implicated in OmpR binding ( Fig. 2C ) . 
+ To determine if the C-terminal FLAG tag of OmpR impacted binding to the ompC or ompF promoter region we compared expression of these genes in the wild-type ( BRD948 ) and ompR : : FLAG strains ( TT53 .8 ) ( Fig. S1 ) . 
+ Expression of tviB and ompF was similar in these two strains but ompC was expressed at a signiﬁcantly lower level in TT53 .8 . 
+ The degree to which ompC expression was decreased in TT53 .8 compared to wild-type BRD948 was not as pronounced as that in a strain in which ompR was deleted ( TT10 ) suggesting that some OmpR activity for the ompC promoter was retained in the epitope-tagged protein . 
+ Identiﬁcation of nucleotide sequence motifs associated with OmpR binding
+ To identify sequence motifs within the 43 intergenic ChIP-enriched sequence coverage peaks that may be involved in OmpR binding , nucleotide sequences were compared using the YMF algorithm ( Sinha and Tompa , 2003 ) , which identiﬁes candidate binding sites by searching for statistically overrepresented motifs . 
+ Five eight-nucleotide motifs ( z-score > 9.8 , Table S2 ) were identiﬁed in 14 separate loci , with some loci containing multiple motifs . 
+ The motif TGTWACAW occurred 21 times , in 12 ChIP-enriched sequences including the viaB locus ( 5 ′ tviA ) where it precisely coincided with the peak of sequence enrichment . 
+ This motif was present in two copies in the upstream region of seven genes : ompS1 , csgD , sdhC , galP , dppA , pckA and t4357 . 
+ t4357 encodes a putative integrase encoded on a prophage ~ 3.6 kbp upstream of tviA . 
+ A second motif AYGGCCTA was present in single copy in the upstream region of four loci : t0528 , t1320 , dppA and tviA . 
+ There was also a signiﬁcant difference in the magnitude of ChIP sequence peaks containing motif AYGGCCTA compared with those with no identiﬁable motif ( Student 's t-test , P < 0.0001 ) ( Fig. S2 ) , suggesting a link between this sequence and the avidity of OmpR binding . 
+ None of the sequence motifs are present in the previously described C1 -- 3 or F1 -- 4 OmpR binding sites in ompC and ompF promoter regions respectively . 
+ The largest enrichment peak ( z-score = 16.23 ) was found upstream of tviA of the viaB locus , encoding four different candidate motifs : TGTWACAW , CTAGACTA , AYGGCCTA and AACTAACW ( Table S2a ) . 
+ To ﬁnd if the TGTWACAW motif was involved in binding of OmpR to the tviA upstream region , we used an electrophoretic mobility shift assay ( EMSA ) with phosphorylated recombinant OmpR : :6 × His protein and oligonucleotide probes . 
+ The probes comprised either tviA -133 to -460 or tviA -303 to -377 of the tviA upstream region . 
+ Arbitrary mutation of the TGTTACAA motif at the -341 to -348 position to GCTCGGAC resulted in abrogation of OmpR : :6 × His binding to either probe ( Fig. 3 ) . 
+ No binding of OmpR : :6 × His was observed with a probe containing the mutant motif sequence , suggesting that this motif is important for OmpR binding . 
+ Signiﬁcantly , the TGTTACAA motif in the tviA upstream region also coincides with the genome sequence most highly overrepresented following ChIP enrichment ( z-score = 30 , Fig. 2 ) . 
+ The OmpR regulons of S. Typhi and S. Typhimurium contain a core set of shared orthologous genes
+ We next compared the previously unreported OmpR regulon of the broad host range S. Typhimurium strain SL1344 with that of the human restricted S. Typhi Ty2 ( Perkins et al. , 2009 ) to identify previously uncharacterized genes , controlled by OmpR in both pathogens . 
+ A total of 208 genes and 329 genes were expressed in an OmpR-dependent manner in S. Typhi and S. Typhimurium respectively . 
+ Of these , 31 orthologous genes were expressed in an OmpR-dependent manner in both serotypes ( Table S3 ) . 
+ OmpR-dependent expression levels of genes that were OmpR-regulated in both S. Typhi and S. Typhimurium showed a high degree of correlation ( Fig. 4 ; R2 = 0.73 ) indicating strongly conserved regulation between the two serovars in this cohort of genes . 
+ OmpR-dependent genes found in both serotypes included ompS1 , ompC , sprB and ompR , all of which showed decreased expression in the absence of OmpR . 
+ A number of genes were upregulated in the absence of OmpR , including the succinate dehydrogenase genes sdhCDA ( Cunningham and Guest , 1998 ) , fatty acid dehydrogenase genes fadABI ( Campbell et al. , 2003 ) , narK ( Rowe et al. , 1994 ) , required for nitrite extrusion , and the nitrite reductase gene nrfA ( Clarke et al. , 2008 ) . 
+ Potentially important differences in the expression of SPI-1 , SPI-2 and ﬂagellin secretion apparatus in S. Typhi and S. Typhimurium were also revealed by the transcriptomic data in S. Typhimurium ; 28 SPI-1-associated genes exhibited decreased expression in the absence of OmpR , including the sprB gene . 
+ Furthermore , several genes associated with the ﬂagella type III secretion system ( ﬂiGHJLMOPR ) and 10 apparatus genes encoded on SPI-2 ( ssaA , ssaB , ssaGHIJKLT and STM1410 ) also exhibited up to ﬁvefold greater expression in the absence of a functional OmpR ( Table S4 ) . 
+ In contrast , the only known SPI-1 gene that was OmpR-dependent in S. Typhi was sprB , and this encodes a transcriptional regulator ( Golubeva et al. , 2012 ) . 
+ However , it is important to note that the culture conditions employed in our studies are known to result in low expression of SPI-1 and SPI-2 genes and therefore the biological impact of the observed differences in OmpR regulation in these conditions is not clear . 
+ We next characterized the function of t1787 -- t1790 and t1791 -- 1793 using a number of in vitro and in vivo assays . 
+ In S. Typhimurium SL1344 , genes SL1071 -- SL1068 ( STM1133 -- STM1130 ) and SL1067 -- SL1066 ( STM1129 -- STM1128 ) are orthologues of the S. Typhi t1787 -- t1790 and t1791 -- 1793 genes respectively ( Fig. 5 ) . 
+ However , t1792 and t1793 of S. Typhi are present as a single open reading frame ( SL1066 ) in S. Typhimurium , suggesting that these genes may represent fragments of a pseudogene in S. Typhi . 
+ We considered that these operons may be involved in host -- pathogen interactions since they were absent from the closely related species E. coli ( strain K12 ) ( Blattner et al. , 1997 ) but were highly conserved within S. enterica serotypes exhibiting > 99 % identity at the amino acid level in these pathogens ( Fig. 5 ) . 
+ The transcriptomic data showed that expression of at least two of the S. Typhimurium orthologues was OmpR-dependent , namely SL1068 ( 4.75-fold increase , P < 0.05 , orthologue of t1791 ) and SL1069 ( 9.59-fold increase , P < 0.05 , orthologue of t1789 , Fig. 4 ) . 
+ Proteins encoded by several of the genes in these operons exhibit similarity to proteins that have previously been implicated in sialic acid uptake and metabolism . 
+ An orthologue of SL1066 encoded by S. Typhimurium LT2 ( STM1128 ) is a sodium solute symporter ( SSS ) family transporter of sialic acid and shares 44 -- 48 % identity at the amino acid level with similar transport systems in Lactobacillus spp. and Staphylococcus spp. ( Severi et al. , 2010 ) . 
+ The STM1128 gene complemented a mutant E. coli lacking the sialic acid transporter NanT for growth on sialic acid as the sole source of carbon ( Severi et al. , 2010 ) . 
+ SL1067 is a homologue of nanE ( SL3309 ) that is encoded elsewhere on the SL1344 chromosome , sharing 69 % identity at the amino acid level . 
+ SL1068 to SL1071 have previously been proposed to be orthologues of genes encoded by E. coli K12 ( nanM , nanC , yjhB and yjhC respectively ) , some of which have been implicated in sialic acid metabolism ( Severi et al. , 2008 ) . 
+ However , the genomic context for these genes is quite different in E. coli compared to S. Typhimurium and amino acid sequence identity ranges from just 22 % for NanC to 62 % for YjhC , considerably less than the approximately 90 % observed for orthologous proteins of E. coli and S. Typhimurium ( Parkhill et al. , 2001 ) . 
+ We therefore tested the hypothesis that genes within these operons were involved in utilization of acetylated amino sugars including sialic acid as a sole carbon source , and colonization of the murine host . 
+ The surface of the agar was inoculated with each strain cultured on LB agar and incubated for 48 h at 37 °C . 
+ Growth was assessed and recorded as no growth ( - ) , growth retarded relative to that with glucose as a sole carbon source ( + ) and comparable growth to that with glucose as a sole carbon source ( + + ) . 
+ To this end , we determined the ability of S. Typhimurium SL1344 and the isogenic mutant derivatives RAK103 ments in the streptomycin-pretreated mouse model of colitis ( Hapfelmeier et al. , 2004 ; Hapfelmeier and Hardt , 2005 ) . 
+ Groups of streptomycin-pretreated mice were inoculated with 1 × 10^3 cfu of an equal mixture of S. Typhimurium RAK103 or RAK105 and RAK113 . 
+ Four days post inoculation RAK103 was present in similar proportion to RAK113 ( Table 4 ) . 
+ However , RAK113 was present in approximately threefold greater numbers in the caecum of mice compared with RAK105 ( Table 4 ) . 
+ This decrease in ﬁtness speciﬁcally in the inﬂamed gut was statistically signiﬁcant , and reproducible . 
+ Furthermore , when the SL1067/SL1066 genes were reintroduced into RAK105 by phage-mediated transduction giving rise to strain SW771 , the ability to compete successfully with the wild-type RAK113 in colonization of the caecum in streptomycin-pretreated mice was restored ( Table 4 ) . 
+ Discussion
+ Transcriptional regulons have been deﬁned using DNA microarrays and more recently by RNA-seq approaches . 
+ Observed changes in transcript abundance can be directly or indirectly related to a regulator protein binding either within an operator or at a secondary regulatory site . 
+ We have combined measurement of transcript abundance with a direct assay of OmpR binding using ChIP-seq to gain a more complete understanding of the regulon and identify novel genes within this network . 
+ Using this approach genome sequences that were enriched included many previously described OmpR-regulated genes ( Fig. 1 ) . 
+ The most highly enriched sequences were upstream of the viaB locus ( tviA ) ( Fig. 2A ) . 
+ Furthermore , there was considerable enrichment in the 5 ′ UTR of ompS1 ( Fig. 2B ) , ompR and between the divergently transcribed csg operons . 
+ These observations provided proof of principle that the ChIP-seq approach identiﬁed well-known OmpR-regulated genes . 
+ Perhaps surprisingly , substantial enrichment peaks were not observed in the 5 ′ UTR of the ompC and ompF genes , even though these are known to be regulated by OmpR ( Rhee et al. , 2008 ) . 
+ A minor peak that did map to previously identiﬁed OmpR binding sites ( C1 -- 3 ) was present , but fell below the criteria used for peak identiﬁcation . 
+ The reason for the lack of enrichment peaks associated with the ompC and ompF genes is not known , but may be related to the speciﬁc culture conditions used in this study resulting in incomplete phosphorylation of OmpR or due to interference from the C-terminal FLAG epitope tag . 
+ The presence of a C-terminal FLAG epitope had little impact on expression of the tviB and ompF genes but the ompC gene exhibited decreased expression , suggesting that the epitope may impact binding sites differently . 
+ Therefore , it is possible that not all OmpR binding sites were identiﬁed in this study . 
+ Speciﬁc binding of OmpR is thought to depend at least in part on short nucleotide sequence motifs in the 5 ′ UTR region of genes within the regulon ( Huang et al. , 1994 ; Harlocker et al. , 1995 ; Rhee et al. , 2008 ) , although the dependence on speciﬁc sequence is markedly less pronounced than for another two-component regulator of Salmonella , PhoP ( Harari et al. , 2010 ) . 
+ Speciﬁc recognition of these motifs by OmpR is dependent on the phosphorylation state of the regulator and subsequent positive regulation of transcription results from direct interaction with RNA polymerase . 
+ A number of motifs have been proposed based on sequence similarity in the 5 ′ UTR of the ompC and ompF genes of E. coli , and DNAase footprint analysis . 
+ However , the lack of speciﬁcity for OmpR binding to these motifs is shown by their absence from the upstream sequences of other OmpR-regulated genes . 
+ We used the YMF algorithm to identify sequences that were statistically overrepresented within enriched sequences following immunoprecipitation . 
+ While no such motifs were identiﬁed in 28 of 43 enrichment peaks using this approach , the motif ( TGTWACAW ) was present in 12 enrichment peaks and appeared in multiple copies ( two or three copies ) in seven of these regions . 
+ The motif TGTTACAA was present precisely at the point of greatest ChIP enrichment in the tviA upstream region determined by sequencing . 
+ Furthermore , this sequence was critical for binding of recombinant OmpR -- 6 × His in vitro using an EMSA approach . 
+ A total of four additional motifs were also identiﬁed and generally where these were present they were in the 5 ′ UTR of genes that also contained the common motif TGTWACAW . 
+ This suggested there may be a functional link between these sequences . 
+ A total of 31 orthologous pairs of genes showed OmpR-dependent expression in both S. Typhi and S. Typhimurium . 
+ Many more genes were OmpR-dependent in either S. Typhi or S. Typhimurium . 
+ The reason for this distinction is not clear but may be related to differences in the phosphorylation state of EnvZ , the OmpR cognate sensor kinase , that have been reported between these two serotypes ( Oropeza and Calva , 2009 ) . 
+ OmpR has pleiotropic effects on the homeostasis of the bacterial cell and these may manifest differently in Typhi and Typhimurium due to the overall differences in genotype . 
+ Facultative anaerobic bacteria such as E. coli and Salmonella are thought to occupy a niche in the mucus layer close to the intestinal epithelium . 
+ Here they scavenge monosaccharides produced from the hydrolysis of complex polysaccharides and dietary ﬁbre by anaerobic bacterial members of the microbiota ( Chang et al. , 2004 ) . 
+ However , pathogenic bacteria such as Salmonella ( Stecher et al. , 2007 ) can induce a strong inﬂammatory response that results in a decrease in the population of many components of the microbiota that not only alters the available nutrients ( Stecher et al. , 2008 ) but also available respiratory electron acceptors ( Winter et al. , 2010 ) . 
+ Two divergently transcribed operons that were differentially expressed on inactivation of the ompR gene and contained a candidate OmpR binding site were predicted to be involved in scavenging and transport of alternative carbon sources . 
+ The predicted product of t1787 -- t1790 ( SL1071 -- SL1068 ) had sequence similarity to sialic acid transport and metabolism systems . 
+ However , genetic deletion of SL1071 -- SL1068 did not impact on the utilization of N-acetylneuraminic acid ( sialic acid ) as a sole carbon source under the conditions tested , probably due to the presence of other sialic acid transport systems , such as NanA/NanT in S. Typhimurium ( Plumbridge and Vimr , 1999 ) . 
+ The proteins encoded by t1791 -- 3 ( SL1067 -- 1066 ) are also predicted to be involved in sialic acid metabolism . 
+ SL1067 exhibited homology with NanE , an N-acetylmannosamine-6-phosphate epimerase , and an SL1066 orthologue has been reported to complement a nanT mutant of E. coli for growth in sialic acid as a sole source of carbon ( Severi et al. , 2010 ) . 
+ However , deletion of these genes did not detectably impact utilization of sialic acid in vitro , presumably because of the presence of nanT and nanE . 
+ However , deletion of these genes resulted in the inability to use a related acetylated carbon compound , N-acetylmuramic acid , as a sole source of carbon during culture in vitro . 
+ Furthermore , although SL1068 -- SL1071 were not obviously required for colonization of the murine host in conventional mixed inoculum experiments , in the colitis model there was a reproducible and statistically signiﬁcant decrease in the ability of RAK105 ΔSL1068 -- SL1071 to colonize the caecum of streptomycin-pretreated mice in competition with the wild-type parent . 
+ S. Typhimurium RAK103 was indistinguishable from the RAK105 SL1067 -- SL1066 locus in the ability to colonize the murine host . 
+ Sialic acid has several potential impacts on host -- pathogen interactions . 
+ It can be utilized as a carbon or nitrogen source , and is used by Haemophilus inﬂuenzae to modify LPS in order to evade detection by the host immune system ( Severi et al. , 2005 ; 2007 ) , although this has not been reported in enteric pathogens to date . 
+ Nutrient content of the intestine is impacted by the microbial community because of the complex interplay in catabolism of complex nutrients in the luminal contents ( Bertin et al. , 2012 ) . 
+ However , it is likely that nutrient availability is altered as a result of the inﬂammatory response induced by Salmonella during infection , concomitant with disturbance of the normal microbiota ( Stecher et al. , 2007 ) . 
+ Indeed , it was recently reported that Salmonella can use host-derived ethanolamine as a carbon source and respiratory electron acceptor following the switch to anaerobic respiration in the inﬂamed intestine ( Thiennimitr et al. , 2011 ) . 
+ Our ﬁndings suggest that additional OmpR-regulated genes may contribute to nutrient scavenging in the inﬂamed intestine . 
+ Experimental procedures
+ Bacterial culture and strains
+ Salmonella Typhi was cultured routinely in LB broth with aromatic amino acids and pABA supplements as described previously ( Lowe et al. , 1999 ) . 
+ Growth media were supplemented with antibiotics as appropriate at a ﬁnal concentration of 0.05 mg l-1 kanamycin or 0.03 mg l-1 chloramphenicol . 
+ A strain in which the ompR gene is replaced by the aph gene encoding kanamycin resistance has been described previously ( Kingsley et al. , 2003 ) . 
+ To construct a strain chromosomally encoding ompR : :3 × FLAG , overlap extension PCR was employed to create a sequence encoding an in-frame 3 × FLAG peptide at the C-terminus of the ompR gene . 
+ This was complicated by the overlapping start -- stop codon of the ompB locus ( Parkhill et al. , 2001 ) . 
+ The Shine -- Dalgarno sequence of the envZ gene , predicted to be encoded in the ompR ORF , was encoded downstream of the stop codon after the 3 × FLAG sequence . 
+ This sequence was cloned into the suicide vector pWT12 and the strain TT53 made by allelic exchange ( Turner et al. , 2006 ) . 
+ Primers used were as follows : CGTCAGGCAAACGAACTGCC , 5 ′ to 3 ′ ompR bases ( 364.383 ) , CCGTCATGGTCTTTGTAGTCTGCTTTA GAACCGTCCGGTA ( full reverse primer sequence 5 ′ to 3 ′ ) , GACTACAAAGACCATGACGGTGATTATAAAGATCATGATA TCGATTACAAGGATGACGATGACAAGTAGGTACCGGACG GTTCTAAAGC [ concatenated primers are 5 ′ to 3 ′ forward ( 1:69 FLAG + 1:20 ) ] , CGAAACGCAGGCGGCACG [ reverse for envZ is 5 ′ to 3 ′ ( 213:230 ) ] . 
+ A strain designated RAK105 in which the SL1068 -- SL1071 genes of SL1344 were replaced by the aph gene was constructed using oligonucleotides 5 ′ accataagatcactaatgatgaagctttactccaattgtatttcttcgcTGTG TAGGCTGGAGCTGCTTC 3 ′ and 5 ′ cataagcgcagcgccaccg gccaataacaccaccatccggctttcaattCATATGAATATCCTCCTTAG 3 ′ to amplify with the pKD4 plasmid template . 
+ A strain designated RAK103 in which the SL1067 -- SL1066 genes of SL1344 were replaced by the aph gene was constructed using oligonucleotides 5 ′ cgcgttggcgtcaccgtatgctgtgtcggtatagcgtggtatcatgaaaTGTGTAGGCTGGAGCTGCTTCG 3 ′ and 5 ′ agacataacataaaacggagcaaaacttcaaatatataaggcgga actggCATATGAATATCCTCCTTAG 3 ′ to amplify with the pKD4 plasmid template . 
+ In all cases the mutation was retransduced into S. Typhimurium SL1344 using bacteriophage P22 in order to decrease the chances of the accumulation of unlinked mutations during the passaging of bacteria during mutation construction . 
+ Strains ( SW738 and SW771 ) in which the wild-type copy of genes SL1066 and SL1067 or SL1068 -- 1071 was replaced in strains RAK103 ( ΔSL1066 -- 1067 : : aph ) and RAK105 ( ΔSL1068 -- 1071 : : aph ) , respectively , were constructed using phage-mediated transduction . 
+ In order to select for transductants in this region a cat gene was introduced in the intergenic region of SL1071 and SL1072 using oligonucleotide primers 5 ′ cgcaaagtaaaactcactgaaattcttggctaaaattgaaagccgGTGTAGGCTGGAGCTGCTTCG 3 ′ and 5 ′ ccggtctacataagcgcagcgccaccggccaataacaccaccatc CATATGAATATCCTCCTTAG 3 ′ . 
+ The cat gene was then introduced into S. Typhimurium strain RAK105 by P22 transduction and chloramphenicol-resistant transductants selected on LB + Cm culture medium . 
+ Transductants that were resistant to chloramphenicol but sensitive to kanamycin were identiﬁed by replica plating on culture media containing the appropriate antibiotics . 
+ One such transductant was designated SW771 and the replacement of the aph gene with the wild-type SL1068 -- 1071 conﬁrmed by PCR ampliﬁcation . 
+ Expression analyses using microarray data and RNA-seq
+ Bacterial strains were cultured to OD600 = 0.6 and immediately ﬁxed with RNAprotect ( Qiagen ) and harvested . 
+ The pellet was dried and RNA isolated using SV RNA isolation kit ( Promega ) according to manufacturer 's instructions ; however , elutions were performed using DEPC-treated water ( Ambion ) . 
+ Dye incorporation , microarray design and analysis were performed as described previously ( Kelly et al. , 2004 ) . 
+ RNA-seq data were described previously ( Perkins et al. , 2009 ) . 
+ For S. Typhimurium microarrays , strain SL1344 and variants were cultured shaking at 250 r.p.m. in a New Brunswick Innova 3100 water bath at 37 °C in 25 ml of fresh LB medium inoculated with a 1:100 dilution from an overnight bacterial culture . 
+ Three biological replicates were performed for each strain , and RNA was extracted at an optical density at 600 nm of 0.6 ( mid-exponential phase ) . 
+ RNA was extracted using Promega 's SV 96 total RNA puriﬁcation kit . 
+ RNA quality was assessed on an Agilent 2100 Bioanalyser . 
+ Transcriptomic analyses were performed on a SALSA microarray that contained the 5000 open reading frames ( ORFs ) identiﬁed from the sequence of S. enterica serovar Typhimurium SL1344 , as described previously ( Balbontin et al. , 2006 ) . 
+ Hybridization , microarray scanning and data analysis were all performed as described previously ( Kelly et al. , 2004 ) , using a false-discovery rate of 0.05 . 
+ The expression data have been deposited in the NCBI GeneExpression Omnibus ( http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?token=pbkdfwskomsowpq&acc=GSE35938 ) and are accessible through GEO Series Accession Number GSE35938 . 
+ All microarray data are MIAME-compliant . 
+ ChIP-seq
+ Salmonella Typhi ompR : :3 × FLAG ( strain TT53 ) and S. Typhi BRD948 were cultured in LB broth to OD600 = 0.6 , lysed , incubated with 1 % formaldehyde at 37 °C for 20 min to cross-link DNA with protein then quenched with glycine ( pH 7 ) to a ﬁnal concentration of 0.5 M. Cells were harvested and washed twice in TBS and lysed by osmotic shock . 
+ Genomic DNA was then sheared by sonication to an average size of 300 bp and immunoprecipitated using anti-3 × FLAG monoclonal antibody ( Sigma , F3165 ) as previously described ( Pfeiffer et al. , 2007 ) using the Protein G Immunoprecipitation kit ( Sigma ) . 
+ Eluates were then treated with pronase ( 0.8 mg ml-1 , Sigma ) at 65 °C overnight . 
+ The nucleotide sequence of genomic DNA fragments was determined by Illumina GAII paired-end sequencing with read length 36 bp and mapped to the S. Typhi Ty2 whole genome sequence ( AE014613 ) . 
+ Sequence data were mapped to the S. Typhi Ty2 genome using the same parameters as previously described ( Perkins et al. , 2009 ) , without assigning the sequence reads to each strand . 
+ Plots were z-score-normalized , in order to indicate the number of standard deviations above or below the mean for each datum point , and the differences between the untagged S. Typhi Ty2 and ompR : :3 × FLAG-tagged associated DNA sequences determined . 
+ Plots were then read into the genome browser tool Artemis ( Rutherford et al. , 2000 ) . 
+ The Peakﬁnder function was used to determine enrichment for OmpR : :3 × FLAG bound DNA sequences . 
+ The Peakﬁnder function ( 36 bp window and z-score cut-off score set to 3 ) identiﬁed 58 peaks . 
+ Due to the background noise of the mapped sequence data plots and low stringency of the Peakﬁnder conditions , identiﬁed peaks were then ﬁltered manually . 
+ Sites of DNA enrichment present within a predicted or known CDS were ignored unless there were multiple similar sites nearby , reducing the total number of analysed peaks to 43 . 
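+ The z-score normalization , control subtraction and threshold-based peak calling described above can be sketched as follows . This is a minimal illustration only , not the Artemis Peakﬁnder implementation : the function names , the toy coverage values and the simple overlap rule for coding sequences are assumptions made for the example , and the 36 bp window is not modelled . 

```python
from statistics import mean, pstdev


def zscores(depth):
    """Express per-base coverage as standard deviations from the mean."""
    mu = mean(depth)
    sigma = pstdev(depth)
    if sigma == 0:
        return [0.0] * len(depth)
    return [(d - mu) / sigma for d in depth]


def enrichment(tagged_depth, control_depth):
    """Difference between z-scored tagged and untagged (control) coverage."""
    tagged = zscores(tagged_depth)
    control = zscores(control_depth)
    return [t - c for t, c in zip(tagged, control)]


def call_peaks(scores, cutoff=3.0):
    """Return half-open (start, end) intervals where the score exceeds cutoff."""
    peaks, start = [], None
    for i, value in enumerate(scores):
        if value > cutoff and start is None:
            start = i
        elif value <= cutoff and start is not None:
            peaks.append((start, i))
            start = None
    if start is not None:
        peaks.append((start, len(scores)))
    return peaks


def drop_coding_peaks(peaks, coding_regions):
    """Keep only peaks that overlap no annotated coding region (hypothetical rule)."""
    def overlaps(peak, cds):
        return peak[0] < cds[1] and cds[0] < peak[1]
    return [p for p in peaks if not any(overlaps(p, c) for c in coding_regions)]


# Toy data: flat background of 10 reads with a 5 bp spike of 200 reads.
tagged = [10] * 40 + [200] * 5 + [10] * 55
control = [10] * 100
peaks = call_peaks(enrichment(tagged, control))
intergenic = drop_coding_peaks(peaks, coding_regions=[(0, 30), (60, 90)])
```

+ In this toy example the spike is the only interval above the cut-off of 3 and lies outside both hypothetical coding regions , so it is retained . The manual ﬁltering step in the text ( keeping intragenic sites with multiple similar sites nearby ) is not reproduced here . 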
+ Enriched sequences were then input to the motif ﬁnding algorithm YMF with the length of motif set to eight nucleotides and with a maximum of two redundant bases ( Sinha and Tompa , 2003 ) . 
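+ Once a degenerate motif such as TGTWACAW has been reported , its occurrences in a sequence can be located by expanding the IUPAC codes ( W = A or T , Y = C or T ) . The sketch below only scans for a given motif on one strand ; it does not implement the YMF enumeration or its z-score statistics , and the helper names are hypothetical . The test sequences are short fragments taken from the EMSA probe and the GCTCGGAC mutant described in this paper . 

```python
import re

# IUPAC nucleotide codes expanded to regular-expression character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "N": "[ACGT]",
}


def degenerate_count(motif):
    """Number of redundant (non-ACGT) positions in a motif."""
    return sum(1 for base in motif.upper() if base not in "ACGT")


def find_motif(sequence, motif):
    """0-based start positions of all matches, including overlapping ones."""
    body = "".join(IUPAC[base] for base in motif.upper())
    pattern = re.compile("(?=(" + body + "))")  # lookahead allows overlaps
    return [m.start() for m in pattern.finditer(sequence.upper())]


wild_type = "AATCTGTTACAAATTT"   # fragment of the tviA probe containing TGTTACAA
mutant = "AATCGCTCGGACATTT"      # same fragment with the motif mutated to GCTCGGAC

wt_hits = find_motif(wild_type, "TGTWACAW")
mutant_hits = find_motif(mutant, "TGTWACAW")
```

+ Scanning both strands would additionally require searching the reverse complement of the sequence , which is omitted here for brevity . 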
+ RNA extraction, reverse transcription-PCR (RT-PCR) and real-time PCR
+ RNA was extracted from S. Typhi using a Fast RNA Blue Kit ( MP Biomedicals ) according to the instructions of the manufacturer . 
+ RNA samples ( 40 mg ) were DNase I ( Thermo Scientiﬁc ) treated in a 100 ml volume and diluted to 100 ng ml-1 . 
+ RNA samples were reverse transcribed and used as the template for Real-Time PCR with Express One-Step SYBR GreenER ( Invitrogen ) in a 20 ml total reaction volume . 
+ Real-Time PCR was performed using a StepOnePlus Real-Time PCR System ( Applied Biosystems ) with the oligonucleotides ( Sigma ) ATATGTTGGGCTTCCTCTGG and TTCAGATAAC GAGCCTCACG ( tviB ) , TTGATGGCCTGCACTACTTC and TGGTTGCCCTGAATCTGATA ( ompC ) , GAAACGCAGAT TAACACCGA and ACTTCCGCGTATTTCAAACC ( ompF ) and TACCTGCTGGCGGAGATTA and ATACCATGCTGAT GCAGAGAA ( waaY ) . 
+ Data were analysed by using the comparative CT method where target gene transcription of each sample was normalized to the CT of the waaY transcript . 
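+ The comparative CT calculation can be illustrated with a short sketch . It assumes the common 2^-ΔΔCT form with perfect doubling per cycle , which the text does not state explicitly ; the function name and all CT values are hypothetical and are not data from this study . 

```python
def relative_expression(ct_target, ct_reference, ct_target_cal, ct_reference_cal):
    """Fold change of a target gene relative to a calibrator sample.

    Each target CT is first normalized to the reference transcript
    (waaY in the text), then the calibrator (e.g. wild type) is subtracted.
    Assumes 100 % amplification efficiency (a doubling per cycle).
    """
    delta_sample = ct_target - ct_reference
    delta_calibrator = ct_target_cal - ct_reference_cal
    return 2.0 ** -(delta_sample - delta_calibrator)


# Hypothetical CT values: the target appears two cycles later in the test
# strain than in the calibrator, with the reference transcript unchanged.
fold = relative_expression(ct_target=24.0, ct_reference=18.0,
                           ct_target_cal=22.0, ct_reference_cal=18.0)
```

+ With these illustrative numbers the target transcript is a quarter as abundant in the test strain as in the calibrator . 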
+ Electrophoretic mobility shift assay (EMSA)
+ For preparation of recombinant OmpR -- 6 × His , S. Typhi genomic DNA was PCR-ampliﬁed using oligonucleotide primers 5 ′ CATGCCATGGaagagaattataagattctgg 3 ′ and 5 ′ CCGCTCGAGtgctttagaaccgtccggtac 3 ′ . 
+ The ampliﬁed DNA was cloned into pET28 vector into the NcoI and XhoI restriction sites giving rise to pTW1 . 
+ One litre of E. coli BL21 pTW1 was cultured in Luria -- Bertani broth containing 1 mM IPTG at 25 °C to OD600 of 0.6 . 
+ Cells were disrupted using a constant cell disruptor ( Constant Systems ) , centrifuged at 23500 rcf and OmpR -- 6 × His puriﬁed from the supernatant by affinity chromatography using nickel-resin chromatography . 
+ OmpR was phosphorylated with lithium potassium acetyl phosphate as previously described ( Kenney et al. , 1995 ) . 
+ Double-stranded DNA probes were either PCR-ampliﬁed from S. Typhi genomic DNA using primers 5 ′ 6-FAM -- AAC GGGATTTTTACACAACAGAG 3 ′ and 5 ′ 6-FAM -- AGTC ATTATCCATATCTTTAATTTG 3 ′ ( probe 1 ) , or generated by annealing the oligonucleotides 5 ′ 6-FAM -- TCAAAATAAGAATATT CCTAATCGTATTTGAAATAATCTGTTACAAATTTAATTGTTT GCACCTTTGGGGTTAAA 3 ′ and 5 ′ 6-FAM -- TTTAA CCCCAAAGGTGCAAACAATTAAATTTGTAACAGATTATTT CAAATACGATTAGGAATATTCTTATTTTGA 3 ′ . 
+ Probes with mutated putative binding motif were generated by overlap extension PCR using the oligonucleotide primers 5 ′ cat agaaaaggtacaagcaatatc 3 ′ , 5 ′ caattaaatgctcggacgattatttcaaatacgattaggaatattc 3 ′ , 5 ′ agtatcacccactacccagg 3 ′ and 5 ′ gaaataatcgtccgagcatttaattgtttgcacctttggg 3 ′ , and subsequent ampliﬁcation with 6-FAM labelled primers above . 
+ EMSA binding assay was performed in 10 mM Tris.HCl pH 7.2 , 50 mM KCl , 5 mM MgCl2 , 1 mM EDTA , 1 mM DTT , 1 mg ml-1 BSA , 0.001 mg of poly ( dIdC ) and 5 % glycerol . 
+ 10 nM OmpR-6×His was incubated with various concentrations of 6-carboxyﬂuorescein ( 6-FAM )-labelled probe DNA , with shaking , for 35 min at 30 °C . 
+ Samples were separated on a 10 % TBE polyacrylamide gel ( Biorad ) and 6-FAM-labelled nucleic acid imaged using a Typhoon 9200 ( Amersham ) . 
+ Animal experiments
+ In all mouse experiments female , 7 -- 8 week-old C57BL/6 mice ( Charles River ) were inoculated orally by gavage with S. Typhimurium suspended in PBS pH 7.4 . 
+ For mixed inoculum experiments in order to distinguish the wild-type strain from the mutant test strains , a cat ( chloramphenicol acetyltransferase , chloramphenicol resistance gene ) was inserted in the S. Typhimurium SL1344 chromosome in a position that has previously been described to have no effect on colonization of the murine host ( Kingsley et al. , 2003 ; Winter et al. , 2010 ) ( phoN locus , strain RAK113 ) . 
+ Groups of ﬁve mice were inoculated orally with a 1:1 mixture ( log10 = 0 ) of approximately 1 × 10^8 cfu of strain RAK113 and the test strain . 
+ When mice were moribund ( less than 80 % body weight compared with day of inoculation ) or on day 5 post inoculation , mice were culled and cfu of each strain in homogenized mesenteric lymph nodes ( MLN ) , caecum , ileum , spleen and liver was determined by serial dilution in PBS pH 7.4 and plating on LB agar containing chloramphenicol and LB agar containing kanamycin . 
+ Serial 10-fold dilutions were plated on LB + Cm or LB + Km agar , as appropriate , to determine cfu per organ . 
+ The ratio of wild-type ( strain RAK113 ) to test strain was transformed to log10 , and whether these values were signiﬁcantly different from the log10 of the input ratio ( input ratio log10 = 0 ) was determined using the two-tailed Student 's t-test in Prism 4 software version 4.0c ( GraphPad ) . 
+ P values < 0.05 were considered signiﬁcant . 
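+ The competitive-index analysis above can be sketched as follows . This is an illustration only : it assumes a one-sample t-test of the log10 ratios against the input value of 0 , compares the statistic with the tabulated two-tailed critical value for ﬁve mice ( 4 degrees of freedom ) , and uses hypothetical cfu counts and function names rather than the Prism workflow . 

```python
import math
from statistics import mean, stdev


def log_ratios(wild_type_cfu, test_cfu):
    """log10 of the wild-type : test-strain cfu ratio for each mouse."""
    return [math.log10(w / t) for w, t in zip(wild_type_cfu, test_cfu)]


def one_sample_t(values, mu=0.0):
    """t statistic for H0: mean(values) == mu (mu = 0 is the log10 input ratio)."""
    n = len(values)
    return (mean(values) - mu) / (stdev(values) / math.sqrt(n))


# Two-tailed critical t for alpha = 0.05 with 4 degrees of freedom (five mice).
T_CRIT_DF4 = 2.776

# Hypothetical cfu counts per organ for five mice.
wild_type = [1.0e5, 2.0e5, 1.5e5, 3.0e5, 1.0e5]
test_strain = [1.0e3, 4.0e3, 1.0e3, 2.0e3, 2.0e3]

ratios = log_ratios(wild_type, test_strain)
t_stat = one_sample_t(ratios)
significant = abs(t_stat) > T_CRIT_DF4
print(ratios, t_stat, significant)
```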
+ Minimal media growth assays
+ Salmonella Typhimurium SL1344 and isogenic mutant strains were grown overnight and washed three times in PBS and then plated onto M9 minimal media supplemented with L-histidine ( for SL1344 growth ) and 1 % agar . 
+ The glucose carbon source was substituted with amino sugars . 
+ Wild-type controls were grown concomitantly on separate plates made from the same agar mix . 
+ Ethics statement
+ All animal procedures were performed in accordance with the United Kingdom Home Office Inspectorate under the Animals ( Scientiﬁc Procedures ) Act 1986 . 
+ The Wellcome Trust Sanger Institute Ethical Review Committee granted ethical approval for these procedures . 
+ Acknowledgements
+ The work described here was funded by the Wellcome Trust through core funding for the Sanger Institute Pathogen Variation Group . 
+ The funders had no role in study design , data collection and analysis , decision to publish or preparation of the manuscript . 
+ The authors have no conﬂicts of interest to declare . 
+ Supporting information
+ Additional supporting information may be found in the online version of this article .
\ No newline at end of file
--- a/data/TEXT_FILES/notuseful_txt/23232715.txt 0 → 100644
View file @27818a9
+++ b/data/TEXT_FILES/notuseful_txt/23232715.txt 0 → 100644
View file @27818a9
+ Crp Is a Global Regulator of Antibiotic Production in Streptomyces
+ ABSTRACT Cyclic AMP receptor protein ( Crp ) is a transcription regulator controlling diverse cellular processes in many bacteria . 
+ In Streptomyces coelicolor , it is well established that Crp plays a critical role in spore germination and colony development . 
+ Here , we demonstrate that Crp is a key regulator of secondary metabolism and antibiotic production in S. coelicolor and show that it may additionally coordinate precursor ﬂux from primary to secondary metabolism . 
+ We found that crp deletion adversely affected the synthesis of three well-characterized antibiotics in S. coelicolor : actinorhodin ( Act ) , undecylprodigiosin ( Red ) , and calcium-dependent antibiotic ( CDA ) . 
+ Using chromatin immunoprecipitation-microarray ( ChIP-chip ) assays , we determined that eight ( out of 22 ) secondary metabolic clusters encoded by S. coelicolor contained Crp-associated sites . 
+ We followed the effect of Crp induction using transcription proﬁling analyses and found secondary metabolic genes to be signiﬁcantly affected : included in this Crp-dependent group were genes from six of the clusters identiﬁed in the ChIP-chip experiments . 
+ Overexpressing Crp in a panel of Streptomyces species led to enhanced antibiotic synthesis and new metabolite production , suggesting that Crp control over secondary metabolism is broadly conserved in the streptomycetes and that Crp overexpression could serve as a powerful tool for unlocking the chemical potential of these organisms . 
+ IMPORTANCE Streptomyces produces a remarkably diverse array of secondary metabolites , including many antibiotics . 
+ In recent years , genome sequencing has revealed that these products represent only a small proportion of the total secondary metabolite potential of Streptomyces . 
+ There is , therefore , considerable interest in discovering ways to stimulate the production of new metabolites . 
+ Here , we show that Crp ( the classical regulator of carbon catabolite repression in Escherichia coli ) is a master regulator of secondary metabolism in Streptomyces . 
+ It binds to eight of 22 secondary metabolic gene clusters in the Streptomyces coelicolor genome and directly affects the expression of six of these . 
+ Deletion of crp in S. coelicolor leads to dramatic reductions in antibiotic levels , while Crp overexpression enhances antibiotic production . 
+ We ﬁnd that the antibiotic-stimulatory capacity of Crp extends to other streptomycetes , where its overexpression activates the production of `` cryptic '' metabolites that are not otherwise seen in the corresponding wild-type strain . 
+ Streptomyces bacteria are an important source of bioactive compounds , with their products including two-thirds of clinically prescribed antibiotics , as well as immunosuppressants , anticancer agents , and antiparasitic molecules . 
+ The model streptomycete Streptomyces coelicolor has long been known to produce four chemically distinct antibiotics : actinorhodin ( Act ) ( 1 ) , undecylprodigiosin ( Red ) ( 2 , 3 ) , calcium-dependent antibiotic ( CDA ) ( 4 ) , and the plasmid-encoded methylenomycin ( Mmy ) ( 5 , 6 ) , although Mmy is not produced by the sequenced , plasmid-free S. coelicolor strain M145 . 
+ Act and Red are blue and red pigmented , respectively , and serve as outstanding markers for following the effects of genetic manipulation on antibiotic production . 
+ Recently , the characterized antibiotic repertoire of S. coelicolor has expanded to include a yellow-pigmented polyketide ( yCPK ) ( 7 ) . 
+ Notably , S. coelicolor has the genetic capacity to produce far more secondary metabolites than have been detected in the lab , encoding 22 predicted secondary metabolic gene clusters . 
+ This plethora of clusters specifying unknown molecules is a characteristic shared with all streptomycetes whose genomes have been sequenced to date . 
+ These `` cryptic '' clusters are of considerable interest , as they represent a vast reservoir of potentially novel bioactive molecules . 
+ The genes mediating antibiotic synthesis are usually arranged in contiguous clusters that range in size from a few kilobases to over 100 kb ( 8 ) . 
+ These clusters include genes encoding biosynthetic enzymes , resistance determinants , and regulatory proteins ( 8 ) . 
+ The pathway-speciﬁc regulators for Act ( ActII-ORF4 ) , Red ( RedD and RedZ ) , CDA ( CdaR ) , and yCPK ( CpkO ) activate the synthesis of their respective antibiotics through interactions with promoter regions within their individual clusters ( 8 , 9 ) . 
+ Expression of these activators is in turn controlled by disparately encoded regulators that affect the production of one or more antibiotics . 
+ More than 15 of these `` global '' antibiotic regulators have been identiﬁed on the basis of their effects on antibiotic production ( 10 ) ; however , direct regulatory connections have been established for only a few of these proteins . 
+ In the `` activator '' class , only the TetR-like regulator AtrA has been characterized biochemically , binding directly to the actII-orf4 promoter region and stimulating its expression and the subsequent production of Act ( 11 ) . 
+ This activity is in contrast to that of the pleiotropic regulator DasR , which inhibits both Act and Red biosynthesis through its binding to sites overlapping the promoters of actII-orf4 and redZ ( 12 ) . 
+ The AbsA1/A2 two-component system also adversely impacts Act , Red , and CDA production , with phosphorylated AbsA2 repressing the expression of the pathway-speciﬁc regulators for each gene cluster ( 13 ) . 
+ Mechanistic insight into the roles played by global regulators -- and particularly the global activators -- is thus a critical missing component in the regulatory networks underpinning antibiotic production , as such activators could provide a key to unlocking the reservoirs of cryptic secondary metabolites encoded within Streptomyces genomes . 
+ Rigorous and multifaceted control of metabolism is a phenomenon common to all organisms . 
+ One broadly conserved regulator of bacterial metabolism is the cyclic AMP ( cAMP ) receptor protein ( Crp ) . 
+ Crp is found throughout Gram-negative and -positive bacteria , although it is absent in Bacillus and the other Firmicutes ( 14 ) . 
+ Crp has been best studied in Escherichia coli , where it mediates carbon catabolite repression in conjunction with its effector molecule cAMP ( 14 , 15 ) . 
+ In E. coli , Crp binds tightly to ~ 70 different genetic regions and affects the expression of hundreds of genes ( 16 ) . 
+ In the actinomycetes , including Streptomyces , Crp also has important global regulatory roles , although it does not seem to function in carbon catabolite repression ( 17 -- 19 ) . 
+ Previous work on Crp in S. coelicolor has conﬁrmed its ability to interact with cAMP ( 20 ) , while functional studies have primarily focused on its role in morphological development , as crp mutants have very distinct developmental defects ( reduced and delayed germination , small colonies , and accelerated sporulation ) ( 19 , 21 ) . 
+ Here , we probe the function of Crp in controlling secondary metabolism and show that Crp contributes directly to the regulation of multiple antibiotics in S. coelicolor and stimulates secondary metabolism more broadly in the streptomycetes . 
+ Furthermore , we show that Crp directly affects the expression of enzymes needed for precursor synthesis , suggesting an ability to inﬂuence precursor ﬂux into secondary metabolism and a role for Crp at the interface of primary and secondary metabolism . 
+ RESULTS
+ Crp deletion affects antibiotic production in S. coelicolor . 
+ It has been noted previously that S. coelicolor crp mutants produce reduced levels of the blue-pigmented antibiotic actinorhodin ( 19 , 21 ) . 
+ We constructed a crp mutant in wild-type S. coelicolor strain M145 and observed a similar defect in Act production ( Fig. 1A ) . 
+ We set out to examine the antibiotic production potential of the crp mutant strain more broadly and compared the levels of Act , Red , and CDA produced by the mutant relative to its wild-type parent . 
+ Total Act ( actinorhodin and γ-actinorhodin ) was assessed over a 7-day time course during growth in rich liquid medium . 
+ Act levels in the wild-type strain increased sharply between days 2 and 4 , after which levels remained high through day 7 . 
+ A crp mutant strain , however , produced barely detectable levels of Act throughout the same time course ( Fig. 1B ) . 
+ A similar phenomenon was observed for CDA , where a plate-based bioassay revealed a complete abrogation of CDA production by a crp mutant ( Fig. 1C ) . 
+ In contrast , Red production proﬁles during growth in rich liquid medium were similar in both wild-type and crp mutant strains , although a reproducible lag of ~ 24 h was observed for the crp mutant ( Fig. 1D ) . 
+ In all cases , both the abundance and timing of antibiotic production could be restored to near-wild-type levels by complementing the crp deletion mutant with a construct carrying crp expressed from its native promoter , conﬁrming that the phenotypes were due to crp deletion ( Fig. 1 ) . 
+ These data suggest that Crp has a global inﬂuence on secondary metabolite production . 
+ Crp associates with multiple secondary metabolic gene clusters . 
+ Given the dramatic secondary metabolic defects exhibited by a crp mutant , we wanted to determine the targets of Crp activity in the cell . 
+ As a ﬁrst step , we monitored Crp transcript and protein levels over a 48-h time course in liquid culture ( prior to the onset of signiﬁcant actinorhodin production ) to determine when Crp was expressed . 
+ We found that crp was most highly expressed up until 20 h , after which transcripts decreased to levels barely detectable by 36 h . 
+ In contrast , Crp protein levels were relatively constant throughout the same 48-h period ( see Fig. S1 in the supplemental material ) . 
+ We next examined cAMP levels over the time when crp was most highly expressed ( 12 to 20 h ) , as cAMP is presumed to be the effector molecule for Crp , based on studies in other bacteria ( 22 -- 24 ) . 
+ Extracellular cAMP levels were highest at 12 and 16 h , before dropping signiﬁcantly at 20 h ; intracellular levels were too low to be detected , consistent with previously published results ( 25 ) ( see Fig. S1 ) . 
+ Interestingly , cAMP levels were more than an order of magnitude higher in the crp mutant than in the wild-type strain ( see Fig. S1 ) ; enhanced cAMP production has been previously observed for crp mutants in both E. coli ( 26 ) and Salmonella enterica serovar Typhimurium ( 27 ) . 
+ We tested the effect of high levels of exogenous cAMP ( 2 mM ) on the behavior of the wild-type strain and found there to be no obvious phenotypic difference between this strain and the one grown without supplementation ( data not shown ) , suggesting that the phenotype of the crp mutant stems from loss of Crp and not from heightened cAMP production . 
+ Consequently , we pursued investigations into Crp targets after growth for 16 h , using chromatin immunoprecipitation assays with puriﬁed Crp-speciﬁc polyclonal antibodies , together with microarray analyses of the precipitated DNA ( ChIP-chip ) . 
+ As a negative control , parallel assays were conducted using crp mutant cultures . 
+ We considered a sequence to be Crp associated if the log2 ( wild-type/mutant signal ratio ) was greater than 3 times the standard deviation above the median ratio ( 1.73 ) and if at least one adjacent probe sequence also met this criterion . 
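+ The calling rule above can be sketched as follows . This is a minimal illustration that assumes the standard deviation is taken over all probe log2 ratios and that probes are listed in genomic order ; the function name and the toy data are hypothetical . 

```python
from statistics import median, stdev


def crp_associated(log2_ratios, n_sd=3.0):
    """Indices of probes called Crp associated.

    A probe is kept when its log2(wild-type/mutant) ratio exceeds
    median + n_sd * SD and at least one neighbouring probe does too.
    """
    cutoff = median(log2_ratios) + n_sd * stdev(log2_ratios)
    above = [ratio > cutoff for ratio in log2_ratios]
    hits = []
    for i, flag in enumerate(above):
        left = i > 0 and above[i - 1]
        right = i + 1 < len(above) and above[i + 1]
        if flag and (left or right):
            hits.append(i)
    return hits


# Toy data: background noise, one pair of adjacent enriched probes (10, 11)
# and one isolated enriched probe (40) that the adjacency rule rejects.
ratios = [0.1 if i % 2 else -0.1 for i in range(60)]
ratios[10] = ratios[11] = 5.0
ratios[40] = 5.0
hits = crp_associated(ratios)
print(hits)
```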
+ We found 393 Crp-associated sequences , distributed relatively evenly throughout the genome ( Fig. 2A ) . 
+ Candidate target genes were classiﬁed according to their predicted -- or demonstrated -- functions , as described in the literature or as annotated in the Streptomyces database StrepDB ( see Table S1 in the supplemental material ) . 
+ Among the genes with assigned functions , the most abundant functional groups were transcriptional regulators ( 9.9 % of targets ) and proteins involved in metabolism ( 17.6 % of targets ) , of which one-third were predicted , or demonstrated , to participate in secondary metabolism ( 5.1 % ) ( see Table S1 ) . 
+ Notably , eight out of the 22 predicted secondary metabolic clusters in S. coelicolor were associated with Crp binding sites ( Fig. 2A ; Table 1 ) ( 28 ) . 
+ Of the characterized clusters , Crp coimmunoprecipitated with at least two sites in , or upstream of , the coding regions of pathway-speciﬁc regulatory genes for the Act ( SCO5085 ; actII-ORF4 ) , Red ( SCO5881 ; redZ ) , and CDA ( SCO3217 ; cdaR ) biosynthetic gene clusters ( Table 1 ) . 
+ Multiple Crp binding sites were also associated with the biosynthetic genes for yCPK , speciﬁcally , upstream and within cpkA , which encodes a polyketide synthase ( Table 1 ) . 
+ The other four metabolic clusters associated with Crp binding are predicted to code for a nonribosomal peptide synthetase ( NRPS ) ( SCO6429-6438 ) , the sesquiterpene antibiotic albaﬂavenone ( SCO5222-5223 ) , a type II fatty acid synthase ( SCO1265-1273 ) , and a deoxysugar synthase/glycosyltransferase ( SCO0381-0401 ) ( Table 1 ) . 
+ These four clusters all lack obvious pathway-speciﬁc regulatory genes , and each is arranged such that they could be expressed as a single transcriptional unit . 
+ Intriguingly , the Crp-associated sequences for each of these clusters correspond to positions upstream and/or within the ﬁrst gene of each cluster . 
+ This suggests that there is potential for Crp to speciﬁcally regulate the expression of the entire cluster , possibly serving as a `` pathway-speciﬁc regulator '' for those clusters that lack one . 
+ To begin to validate Crp association with select sequences , we constructed a thiostrepton-inducible crp construct and introduced this plasmid into the crp mutant strain . 
+ We conducted chromatin immunoprecipitation assays prior to induction ( time zero ) and after induction for 15 and 45 min ; immunoprecipitated and total DNA samples were then used as the templates for quantitative PCR ( qPCR ) ampliﬁcation of the pathway-speciﬁc regulator-associated sequences for Act ( SCO5085 ; actII-ORF4 ) , CDA ( SCO3217 ; cdaR ) , and Red ( SCO5881 ; redZ ) . 
+ As a negative control , a sequence from SCO4662 ( tuf-1 ) was also subjected to qPCR ampliﬁcation , as this sequence was not identiﬁed as a Crp binding target in our initial ChIP-chip analyses . 
+ All target sequences , apart from the negative control , were enriched in the immunoprecipitated DNA within 15 min and , more signiﬁcantly , after 45 min of Crp induction ( Fig. 2B ) . 
+ This experiment indirectly conﬁrmed SCO5085 ( actII-ORF4 ) , SCO3217 ( cdaR ) , and SCO5881 ( redZ ) as Crp targets . 
+ Electrophoretic mobility shift assays ( EMSAs ) using select Crp-associated sequences failed to yield traditional shifts , a phenomenon that has been noted in previous studies ( 19 , 20 ) and may be due to the unusually low pI ( 5.8 ) of the Streptomyces Crp , relative to its counterpart in other bacteria . 
+ We therefore pursued DNase I footprinting assays on several of the Crp-associated sequences that gave an unusual `` downshift '' in our initial EMSA trials , in an effort to identify a consensus binding sequence ( see Fig. S2 in the supplemental material ) . 
+ We mapped sites upstream of crp itself , SCO4561 and SCO2977 , and identiﬁed a consensus binding sequence [ GTG(N)6GNCAC ] ; derivatives of this motif could be found in all of the secondary metabolism-associated target sequences , although notably , one-half of the palindrome seemed to be better conserved than the other [ GTG(N)6GNGAN ] ( Fig. 2C ; Table 1 ) . 
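+ A search for the full consensus motif can be sketched as follows . This is an illustration only : it scans one strand for exact matches to GTG , six arbitrary bases , G , one arbitrary base , CAC , and does not model the degenerate derivatives mentioned above ; the function name and the example sequence are hypothetical . 

```python
import re

# GTG, six arbitrary bases, G, one arbitrary base, CAC.
# The lookahead lets overlapping candidate sites be reported as well.
CRP_MOTIF = re.compile(r"(?=(GTG[ACGT]{6}G[ACGT]CAC))")


def find_crp_sites(sequence):
    """Return (0-based start, matched site) for each consensus match."""
    seq = sequence.upper()
    return [(m.start(), m.group(1)) for m in CRP_MOTIF.finditer(seq)]


# Hypothetical sequence containing a single consensus site.
example = "ttacGTGaacctgGtCACgg"
sites = find_crp_sites(example)
print(sites)
```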
+ Crp induction affects the expression of secondary metabolic gene clusters . 
+ Since both phenotypic investigations and ChIP-chip assays had suggested a role for Crp in secondary metabolite regulation , transcriptome proﬁling was conducted to gain further insight into the Crp control of these genes/clusters . 
+ We opted to follow Crp-dependent effects using an inducible system , where crp was expressed from a thiostrepton-inducible promoter , rather than simply comparing expression patterns of wild-type and mutant strains , as these strains grow very differently ( the crp mutant is signiﬁcantly delayed in germination relative to the wild-type strain ) . 
+ RNA samples were prepared from thiostrepton-inducible crp and empty-plasmid control strains , before and after thiostrepton induction , and were analyzed using Affymetrix-based microarrays . 
+ Genes showing at least a 2-fold change in their expression following induction in the crp-containing samples , but not in the negative control , were regarded as potential targets . 
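+ The selection rule above can be sketched as follows . This is a minimal illustration that assumes linear-scale expression values held in dictionaries keyed by gene ; the function name , the gene names and the values are hypothetical . 

```python
def crp_responsive(before, after, ctrl_before, ctrl_after, min_fold=2.0):
    """Genes changing at least min_fold after induction in the crp strain
    but not in the empty-plasmid control."""

    def changed(fold):
        return fold >= min_fold or fold <= 1.0 / min_fold

    targets = {}
    for gene in before:
        fold = after[gene] / before[gene]
        ctrl_fold = ctrl_after[gene] / ctrl_before[gene]
        if changed(fold) and not changed(ctrl_fold):
            targets[gene] = "activated" if fold >= min_fold else "repressed"
    return targets


# Hypothetical linear-scale signals for three genes.
before = {"geneA": 100.0, "geneB": 100.0, "geneC": 100.0}
after = {"geneA": 400.0, "geneB": 40.0, "geneC": 300.0}
ctrl_before = {"geneA": 100.0, "geneB": 100.0, "geneC": 100.0}
ctrl_after = {"geneA": 110.0, "geneB": 95.0, "geneC": 250.0}

targets = crp_responsive(before, after, ctrl_before, ctrl_after)
print(targets)
```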
+ Overall , we found the expression of 360 genes to be activated and that of 91 genes to be repressed following Crp induction ( Fig. 2A ; see Table S2 in the supplemental material ) . 
+ Consistent with the ChIP-chip assay results , functional classiﬁcation of the Crp-affected genes supported a central role for Crp in governing secondary metabolism , with nearly 20 % of all differentially expressed genes encoding products involved in secondary metabolite biosynthesis ( see Table S2 in the supplemental material ) . 
+ Notably , genes within the Act , Red , CDA , and yCPK clusters were signiﬁcantly upregulated in response to Crp induction ( Fig. 3 ) . 
+ Expression of the NRPS gene cluster ( SCO6429-38 ) that contained a Crp association sequence was activated as well , whereas the albaﬂavenone biosynthetic genes ( SCO5222-23 ) were repressed ( Table 2 ) . 
+ As a further test , we used reverse transcription-qPCR ( RT-qPCR ) to examine the transcription proﬁles of select genes , including those from the Act ( actVA4 , actII-ORF4 ) , Red ( redD , redX ) , CDA ( cdaR , cdaPSI ) , yCPK ( cpkA , scF ) , and albaﬂavenone ( eizA ) biosynthetic clusters ( see Fig. S3 in the supplemental material ) . 
+ In every case , the RT-qPCR proﬁles matched our microarray results , effectively validating our array data . 
+ When comparing Crp-associated DNA targets from our ChIP-chip experiments with the differentially expressed genes identiﬁed in our microarray experiments , we found overlap not only of key secondary metabolic genes but also of genes encoding key primary metabolic enzymes that make important contributions to secondary metabolism . 
+ These included genes involved in the synthesis of acetyl coenzyme A ( acetyl-CoA ) ( pckA/SCO4979 ; SCO5261 ) , as well as those needed to synthesize malonyl-CoA ( accA1/SCO6271 ; accA2/SCO4921 ) , both of which are used as precursors by polyketide enzymes in the synthesis of antibiotics and other secondary metabolites ( 29 ) . 
+ Also identiﬁed were genes required for the synthesis of cofactors like ﬂavin mononucleotide ( FMN ) ( e.g. , riboﬂavin biosynthesis , SCO1443-1439 ) , which is needed in the later stages of Act biosynthesis ( Tables 1 and 2 ) ( 30 ) . 
+ These results suggest that Crp activity plays a central role in promoting secondary metabolite production in S. coelicolor , integrating multiple regulatory nodes that include the direct control of antibiotic production via the pathway-speciﬁc regulators and the modulation of primary metabolic pathways feeding into secondary metabolism . 
+ The impact of Crp overexpression on secondary metabolism of Streptomyces . 
+ Crp is well conserved across the streptomycetes , with alignments revealing 90 % amino acid sequence identity shared between different Crp orthologs ( see Fig. S4 in the supplemental material ) . 
+ Given the importance of Crp to secondary metabolism in S. coelicolor , we tested whether Crp overexpression could enhance antibiotic production in this organism . 
+ We cloned the crp gene behind a strong constitutive promoter ( ermE * ) on an integrating plasmid vector whose target integration sequence is found in all sequenced Streptomyces species examined to date . 
+ The Crp overexpression construct , along with an empty-plasmid control , was then conjugated into S. coelicolor , and antibiotic production was analyzed . 
+ Signiﬁcant upregulation of the blue-pigmented Act antibiotic was obvious in surface-grown cultures of the Crp-overexpressing strain ( Fig. 1A ) , and this was further conﬁrmed through quantitative assays of liquid medium-grown cultures ( Fig. 1B ) . 
+ CDA production was also increased ( Fig. 1C ) , while Red production initiated at a higher level than in the control strain ( Fig. 1D ) . 
+ To determine whether the antibiotic-stimulatory effects of Crp were more universal , we introduced the Crp overexpression construct into a number of different Streptomyces species , including both sequenced strains and wild Streptomyces isolates ( see Table S3 in the supplemental material ) . 
+ Using immunoblotting , we conﬁrmed that Crp was overexpressed in these strains , relative to controls bearing the empty-plasmid vector , and veriﬁed that similar total protein levels were being compared using Coomassie blue staining ( see Fig. S5 ) . 
+ We initially conducted bioassays to compare the antimicrobial production capabilities of these different Streptomyces species carrying either the ermE * - crp construct or the empty vector , using an array of indicator strains ( Escherichia coli , Staphylococcus aureus , and Bacillus subtilis ) . 
+ Crp overexpression appeared to stimulate antibiotic production in the wild isolate Streptomyces sp. strain WAC4988 , as determined by the enhanced zones of clearing observed for S. aureus and B. subtilis indicator strains ( Fig. 4A ) . 
+ We also followed secondary metabolite production using liquid chromatography coupled with mass spectrometry ( LC-MS ) , to determine whether Crp overexpression induced any signiﬁcant secondary metabolic changes in strains that did not show increased antimicrobial activity . 
+ Some of the most striking changes were observed in Streptomyces sp. strain SPB74 , where levels of several metabolites were dramatically enhanced in the overexpression strain relative to the control . 
+ For example , molecules with m/z values of 620.189 and 638 were increased by 22-fold ( day 3 ) and ~ 33-fold ( day 7 ) , respectively , in the overexpression strain relative to the control ( Fig. 4B ) . 
+ These ﬁndings support a role for Crp as a global activator of secondary metabolism throughout the streptomycetes . 
+ DISCUSSION
+ Crp is a founding member of the cAMP receptor protein ( Crp ) / fumarate-nitrate-reductase ( FNR ) family of regulators and predominantly functions as a transcriptional activator ( 14 ) . 
+ In addition to regulating the catabolite repression pathway in E. coli , Crp also controls a much broader range of cellular functions , including primary metabolism , stress resistance , cell motility , and pathogenesis ( 31 ) . 
+ In S. coelicolor , the function of Crp in spore germination and morphological development has been well documented ( 19 -- 21 ) . 
+ Here , we extend the role of Crp , revealing it to be a central regulator capable of coordinating primary and secondary metabolism and demonstrating that its activity can be coopted to enhance antibiotic production in diverse Streptomyces species . 
+ Overexpressing regulators to activate secondary metabolism is a strategy with a history of success . 
+ Indeed , most classical `` global '' antibiotic regulators in S. coelicolor were initially identiﬁed through their actinorhodin-stimulatory effects following overexpression ( 32 -- 34 ) . 
+ More recently , activation of `` cryptic '' antibiotic clusters has been achieved using both directed approaches involving pathway-speciﬁc regulator overexpression ( e.g. , stambomycin activation in Streptomyces ambofaciens [ 35 ] ) and more-global approaches ( e.g. , overexpressing a mutant allele of the S. coelicolor antibiotic repressor AbsA1 stimulated new antimicrobial activity in Streptomyces ﬂavopersicus [ 36 ] ) . 
+ Crp is highly conserved among streptomycetes and can inﬂuence precursor abundance , the activity of pathway-speciﬁc regulators , and the expression of metabolic clusters lacking cognate regulators . 
+ We found that its overexpression led to increased production of the secondary metabolites Act , Red , and CDA . 
+ These data , and the extent of Crp interactions across the genome , indicate that Crp overexpression has the potential to be a powerful , multifaceted avenue for new secondary metabolite production . 
+ Crp induction has broad effects on both primary and secondary metabolism . 
+ These processes are necessarily intertwined in the streptomycetes , as the precursors and cofactors required for secondary metabolite assembly are supplied by the primary metabolic pathways . 
+ Glycolysis leads to the production of acetyl-CoA , which can be directed into the citric acid cycle , or into any number of other biosynthetic pathways , including polyketide synthesis . 
+ Here , we ﬁnd that Crp directly controls the expression of several enzymes contributing to acetyl-CoA accumulation ( SCO4921 and SCO5261 ) . 
+ The preferred substrate for many polyketide synthase enzymes , however , is malonyl-CoA , whose synthesis requires the activity of an acetyl-CoA carboxylase enzyme complex ( ACCase ) ( 37 ) , whose subunits were either directly ( accA1 and accA2 ) or indirectly ( accBE ) regulated by Crp . 
+ Previous genetic studies have implicated both ACCase ( 38 ) and the malic enzyme encoded by SCO5261 ( 37 ) in actinorhodin production , and recent chemical genetic studies have demonstrated that precursor supply is one factor that limits antibiotic yields ( 39 ) . 
+ In addition to carbon ﬂux , phosphate and nitrogen levels have also been tightly correlated with antibiotic production ( 40 ) . 
+ Phosphate homeostasis in the cell is controlled by the response regulator PhoP , and recent work has shown intriguing cross regulation between PhoP and AfsR ( 41 , 42 ) , where AfsR is a transcription factor that broadly inﬂuences antibiotic production in S. coelicolor through its activation of afsS , a small sigma factor-like protein of unknown function ( 43 ) . 
+ We ﬁnd here that Crp adds an additional dimension to this regulatory interplay : comparisons of previous transcriptomic studies ( 44 , 45 ) revealed the expression of 24 genes to be affected by both AfsS and PhoP , and of these , nearly half were also inﬂuenced by Crp induction in our transcriptomic studies . 
+ A further 25 AfsS and 38 PhoP regulon-speciﬁc members were also affected by Crp activity ( see Fig. S6 in the supplemental material ) . 
+ These ﬁndings highlight the complex interplay between nutrient availability and antibiotic production and effectively illustrate the integrated nature of these disparate metabolic processes . 
+ An important future goal will be to deﬁne the conditions that stimulate Crp activity and to fully elucidate the regulatory networks connecting primary metabolism with secondary metabolism . 
+ The ability of Crp to modulate fundamental aspects of primary metabolism while at the same time directly governing the expression of secondary metabolic gene clusters is reminiscent of DasR activity in S. coelicolor . 
+ DasR is a GntR-like regulator that directly controls both antibiotic production and N-acetylglucosamine uptake via the phosphotransferase system ( 12 ) . 
+ The critical difference between Crp and DasR lies at the heart of their regulatory behavior : Crp functions predominantly as an activator , while DasR acts as a repressor ( 12 , 46 ) . 
+ Crp induction led to increased expression for the majority of biosynthetic genes in the Red , CDA , yCPK , and SCO6429-38 gene clusters ( Fig. 3 ) . 
+ It did not have the same extensive effect on the Act cluster , but this is likely due to the nature of Act cluster organization and regulation . 
+ There are three operons under ActII-ORF4 control ( actVI/actVA , actIII , and actI / actVII/VI/VB ) , with the highest-afﬁnity binding site being upstream of actVI ( 47 ) . 
+ Expression from the actVI operon was activated after 60 min of Crp induction ; this was the ﬁnal time point examined in our transcriptome proﬁling experiments , and it is likely that expression of the remaining genes would have been upregulated after this time . 
+ It is worth noting that Crp also has repressor activity , as seen for the sesquiterpene antibiotic albaﬂavenone-encoding genes . 
+ Interestingly , Crp induction also led to repression of a gene encoding a related terpene synthase responsible for geosmin production ( 48 ) , although this effect appeared to be indirect . 
+ The increased cAMP levels observed in a crp mutant also suggested a repressive role for Crp in cAMP accumulation ; however , this also appears to be indirect , as Crp did not associate with sequences near the cya ( adenylate cyclase-encoding ) gene , nor did Crp induction impact cya expression in our transcriptomic experiments . 
+ In S. coelicolor , Crp exerts its regulatory inﬂuence by associating with sequences similar to those identiﬁed for Crp in other bacteria ( 22 , 23 ) . 
+ In E. coli , these binding sites are typically found immediately upstream of or overlapping the −35 promoter element , where Crp binding facilitates RNA polymerase recruitment ( 31 ) . 
+ Here , Crp frequently bound multiple sites within any given region , including at least one intragenic site ; intergenic sites were often signiﬁcantly upstream of any mapped promoter , as was seen for the majority of secondary metabolic clusters shown in Table 1 . 
+ This unexpected coding sequence association was not restricted to secondary metabolite gene regulation ; more than 50 % of all Crp-associated sequences were within open reading frames . 
+ Crp in E. coli can bind within coding regions ; however , these are primarily low-afﬁnity binding sites ( 16 ) , whereas here , seven of the top 10 interaction scores for Crp were intragenic , and the highest-afﬁnity sites associated with most secondary metabolic genes were within coding regions . 
+ Collectively , this suggests a very different mechanism of Crp-mediated gene activation in S. coelicolor than that described for E. coli , as none of these intragenic binding sites appear to be associated with internal promoters , as determined by RNA Seq analyses ( M. J. Moody and M. A. Elliot , unpublished data ) . 
+ Intragenic binding is increasingly being observed for transcription factors throughout bacteria : in Salmonella , nearly half of SsrB binding sites are coding region associated ( 49 ) , and a similar situation has been seen for AbrB and Abh in Bacillus subtilis ( 50 ) , while in Pseudomonas syringae , the Crp-related Fur protein associates with intragenic sequences with an afﬁnity comparable to that for intergenic sites ( 51 ) . 
+ A major difference in the intragenic binding by these transcription factors , and that of Crp in S. coelicolor , however , is that the intragenic Crp sites were frequently associated with transcriptional effects ( both activation and repression ) , whereas for the other regulators , such effects were not commonly seen ( 49 -- 51 ) . 
+ Our work here reveals an important new role for the well-studied Crp regulator in the control of antibiotic production . 
+ To date , Crp is one of the few global antibiotic regulators for which direct regulatory connections to a broad range of secondary metabolic pathways have been established . 
+ Furthermore , we have shown that its ability to stimulate secondary metabolite production is not limited to S. coelicolor , and our results suggest that Crp overexpression is a useful strategy for accessing the previously untapped reservoirs of Streptomyces antibiotics and other natural products . 
+ MATERIALS AND METHODS
+ Bacterial strains , plasmids , and culture conditions . 
+ Streptomyces strains , Escherichia coli strains , and all plasmids/cosmids used in this study are summarized in Table S3 in the supplemental material . 
+ Streptomyces strains were grown at 30 °C on solid Difco nutrient agar or MS ( soy flour-mannitol ) , R2YE ( rich ) , or R5 ( rich ) agar media or in liquid R5 medium as described previously ( 52 ) . 
+ E. coli strains were grown at 37 °C on or in LB ( Luria-Bertani ) medium or in liquid 2× YT ( yeast-tryptone ) broth ( 52 ) . 
+ Antibiotics were added to maintain plasmids when necessary . 
+ Strain and plasmid construction . 
+ An in-frame deletion of crp was created using REDIRECT technology ( 48 ) , and mutants were conﬁrmed by PCR . 
+ The crp mutant strain was complemented using the wild-type crp gene , with extended upstream ( 273-bp ) and downstream ( 284-bp ) sequences , cloned into the integrating plasmid vector pIJ82 ( see Table S3 in the supplemental material ) . 
+ To create a crp-inducible construct , the crp gene was PCR ampliﬁed and cloned into the pCR2.1-TOPO vector ( Invitrogen ) before being subcloned downstream of the tipA promoter in the integrating Streptomyces vector pIJ6902 ( see Table S3 ) . 
+ A constitutive crp overexpression plasmid was made by cloning the crp gene and its downstream sequence immediately downstream of the ermE* promoter in the pMC500 vector ( see Table S3 ) , before excising ermE*-crp and inserting it into pIJ82 . 
+ Plasmids were introduced into Streptomyces strains via conjugation from the nonmethylating E. coli strain ET12567 containing the conjugation `` helper '' plasmid pUZ8002 ( 52 ) . 
+ All DNA oligonucleotides used in this study are summarized in Table S4 . 
+ Crp overexpression , puriﬁcation , and antibody generation . 
+ To create a Crp overexpression plasmid , the crp coding sequence was PCR ampliﬁed ( see Table S4 in the supplemental material ) and ligated into pET15b ( see Table S3 ) . 
+ The integrity of the resulting construct was confirmed by sequencing before the construct was introduced into E. coli BL21 ( DE3 ) ( Novagen ) ( see Table S3 ) . 
+ His6-Crp expression was induced overnight at 26 °C with 0.5 mM isopropyl-β-D-thiogalactopyranoside ( IPTG ) , before the cells were collected and lysed using a French press . 
+ The protein was puriﬁed from the resulting cell extract using nickel-nitrilotriacetic acid ( Ni-NTA ) afﬁnity chromatography and was eluted using increasing concentrations of imidazole ( 100 mM to 500 mM ) . 
+ Puriﬁed His6-Crp was used to generate polyclonal antibodies ( Cedarlane Labs ) . 
+ To remove His6-tag-reactive species from the crude antiserum , an independent His6-tagged protein ( His6-VirB8 protein from Brucella suis ) was used . 
+ Brieﬂy , the His6-VirB8 protein was immobilized on an Ni-NTA agarose column . 
+ The column was washed ﬁve times with equilibration buffer ( 150 mM NaCl , 50 mM Tris-Cl , pH 7.4 ) , after which anti-Crp antiserum was passed through the column , and the ﬂowthrough was collected as the precleared antiserum . 
+ The precleared antiserum was then further affinity purified using Ni-NTA-immobilized His6-Crp and was eluted with 2 ml of a high-salt ( 4 M MgCl2 ) buffer , before buffer exchange into phosphate-buffered saline ( PBS ) . 
+ Cell extract preparation , SDS-PAGE , and immunoblotting . 
+ Cell extracts were prepared from Streptomyces cells grown in liquid R5 medium , and Bradford assays were conducted to measure total protein concentrations . 
+ The protein extracts were separated using SDS-PAGE and either were stained with Coomassie brilliant blue R-250 ( to ensure equivalent protein concentrations in all samples ) or were subjected to immunoblotting with anti-Crp polyclonal antibodies ( 1:2,000 ) and anti-rabbit IgG horseradish peroxidase ( HRP ) - conjugated secondary antibodies ( 1:3,000 ; Cell Signaling ) . 
+ ChIP and microarray assays . 
+ Wild-type strain M145 was grown in liquid R5 medium for 16 h before formaldehyde was added to a ﬁnal concentration of 1 % ( vol/vol ) . 
+ To ensure that we were working with cultures grown to similar optical densities ( OD ) , the crp mutant strain was grown for 64 h ( this strain exhibits signiﬁcant delays in germination and very slow vegetative growth ) before cross-linking . 
+ Cultures were cross-linked at 30 °C for 25 min before glycine was added to a concentration of 125 mM to stop the cross-linking . 
+ Immunoprecipitation was then carried out as described in reference 53 . 
+ DNA labeling , hybridization , and microarray scanning were performed by Oxford Gene Technology ( OGT ) according to their standard protocols . 
+ Microarrays consisted of 44,000 60-mer oligonucleotide probes covering the entire genome of S. coelicolor ( Oxford Gene Technology , Oxford , United Kingdom ) , and each strain was examined in duplicate . 
+ For each array , the signals of all probes were normalized to the median channel signal for the respective array to correct for any systematic errors . 
+ Signal ratios between immunoprecipitated DNA and total reference DNA were obtained for both the wild-type and the mutant strain experiments . 
+ A ﬁnal interaction score was calculated by taking the log2 value of the ratio between the wild-type and the mutant values for each probe . 
+ A probe was considered to contain a binding site only when it , and at least one adjacent probe , showed an interaction score at least 3 standard deviations above the median interaction score ( 1.7 ) . 
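+ The scoring and thresholding steps described above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function names are hypothetical, each channel is assumed to be normalized by dividing by its own median, probes are assumed to be listed in genome order, and the sample standard deviation is assumed for the cutoff.

```python
import math
import statistics


def median_normalize(signals):
    """Divide every probe signal by the median signal of its channel (assumed scheme)."""
    med = statistics.median(signals)
    return [s / med for s in signals]


def interaction_scores(wt_ip, wt_total, mut_ip, mut_total):
    """log2 of (wild-type IP/total ratio) over (mutant IP/total ratio), per probe."""
    wt_ip, wt_total, mut_ip, mut_total = (
        median_normalize(ch) for ch in (wt_ip, wt_total, mut_ip, mut_total)
    )
    scores = []
    for a, b, c, d in zip(wt_ip, wt_total, mut_ip, mut_total):
        wt_ratio = a / b    # enrichment in the wild-type strain
        mut_ratio = c / d   # background enrichment in the crp mutant
        scores.append(math.log2(wt_ratio / mut_ratio))
    return scores


def call_binding_probes(scores, n_sd=3.0):
    """Return indices of probes above median + n_sd * SD that have a neighbour above it too."""
    cutoff = statistics.median(scores) + n_sd * statistics.stdev(scores)
    above = [s > cutoff for s in scores]
    called = []
    for i, hit in enumerate(above):
        left = i > 0 and above[i - 1]
        right = i + 1 < len(above) and above[i + 1]
        if hit and (left or right):
            called.append(i)
    return called
```

+ Requiring an adjacent probe to pass the same cutoff filters out isolated high-scoring probes, which are more likely to reflect hybridization noise than a genuine binding site.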
+ For temporal ChIP experiments , cultures of the Δcrp ( pIJ6902crp ) strain were grown in liquid R5 medium for 16 h , before thiostrepton was added ( to a final concentration of 50 µg/ml ) to induce crp expression . 
+ Immunoprecipitation was carried out before induction and after 15 and 45 min , as described above . 
+ Immunoprecipitated DNA was analyzed using qPCR , as described below , and the threshold cycle ( CT ) value was normalized with the total-DNA CT value . 
+ The uninduced sample was used to assess the fold change of the DNA levels in the 15- and 45-min samples . 
+ Three independent cultures were set up for the ChIP experiments , and qPCRs ( reactions described below ) for each were done in triplicate . 
+ Analyses of variance ( ANOVAs ) were performed using SPSS v17.0 to test the statistical significance ( P value < 0.05 ) of the results . 
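+ The ChIP-qPCR normalization described above can be illustrated with a short sketch. The text does not state the exact formula, so this assumes the common comparative-CT form, in which the CT of the immunoprecipitated DNA is corrected by the CT of total DNA and expressed relative to the uninduced sample, with perfect doubling per cycle; the function name and the numbers in the usage note are hypothetical.

```python
import statistics


def chip_fold_change(ct_ip, ct_total, ct_ip_uninduced, ct_total_uninduced, efficiency=2.0):
    """Fold change in immunoprecipitated DNA relative to the uninduced sample.

    Each argument is a list of technical-replicate CT values; replicates are
    averaged before the comparison (an assumption, not stated in the text).
    """
    # Normalize IP signal to total DNA for the time point of interest.
    delta_ct = statistics.mean(ct_ip) - statistics.mean(ct_total)
    # Same normalization for the uninduced (baseline) sample.
    delta_ct_ref = statistics.mean(ct_ip_uninduced) - statistics.mean(ct_total_uninduced)
    # A lower normalized CT means more template, hence the negative exponent.
    return efficiency ** -(delta_ct - delta_ct_ref)
```

+ For example , with mean CT values of 24 ( IP ) and 20 ( total ) after induction and 26 ( IP ) and 20 ( total ) before induction , the function returns a 4-fold enrichment .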
+ DNase footprinting . 
+ DNA probes were prepared by PCR amplifying the intergenic regions of SCO3570-3571 , SCO2976-2977 , and SCO4561-4562 using oligonucleotides end labeled with T4 polynucleotide kinase and [ γ-32P ] ATP . 
+ In each binding reaction , 0 , 27 , or 81 µM Crp protein was incubated with 15,000 cpm of DNA probe at 30 °C for 15 min in the presence of 20 mM Tris-Cl ( pH 7.8 ) , 5 mM MgCl2 , 50 mM KCl , 1 mM dithiothreitol ( DTT ) , 0.1 mM EDTA , 5 % glycerol , 0.5 mg/ml bovine serum albumin ( BSA ) , 1 µg poly ( dI-dC ) , and 50 µM cAMP . 
+ This was followed by digestion using 0.01 U DNase I ( Invitrogen ) in a volume of 40 µl at room temperature for 30 s . 
+ The digestion buffer contained 10 mM Tris-Cl ( pH 7.8 ) , 5 mM MgCl2 , and 1 mM CaCl2 . 
+ One hundred sixty microliters of stop buffer ( 200 mM NaCl , 30 mM EDTA , 1 % SDS ) was added to terminate each reaction . 
+ Samples were phenol-chloroform extracted and precipitated . 
+ Each pellet was dissolved in 13 µl loading dye ( 80 % [ vol/vol ] formamide , 1 mM EDTA [ pH 8.0 ] , 10 mM NaOH , 0.1 % [ wt/vol ] bromophenol blue , 0.1 % [ wt/vol ] xylene cyanol FF ) and heated to 95 °C for 5 min prior to loading 6 µl on a 6 % denaturing polyacrylamide gel . 
+ Sequencing reactions were prepared as described in reference 54 , except with the PCR-ampliﬁed probe sequences as the template . 
+ RNA isolation , RT , and qPCR . 
+ Cultures of Δcrp ( pIJ6902crp ) and Δcrp ( pIJ6902 ) strains were grown as described for the temporal ChIP experiments . 
+ RNA was harvested from cell aliquots before induction ( time zero ) and at 15 , 30 , 45 , and 60 min following induction with thiostrepton ( 50 µg/ml final concentration ) . 
+ Total RNA was harvested as described previously ( 55 ) , followed by passage through an RNeasy minicolumn ( Qiagen ) . 
+ Reverse transcription ( RT ) reactions were performed as described in reference 56 , except with 2 µg of total RNA as the template . 
+ Semiquantitative PCRs were also conducted as described in reference 56 , and we optimized the number of cycles to ensure that ampliﬁcation was occurring within the linear range of the reaction ( 28 cycles for crp and 15 cycles for 16S rRNA ) . 
+ For qPCRs , 1 µl of cDNA was used for each 25-µl qPCR mixture , together with 1× PCR buffer , 2 mM MgSO4 , 0.2 mM deoxynucleoside triphosphate ( dNTP ) , 1 mM ( each ) gene-specific primer , 7.5 % dimethyl sulfoxide ( DMSO ) , 0.5 µl SYBR green I dye ( 50× in DMSO ) ( Invitrogen ) , and 1.25 U Taq DNA polymerase ( Norgen ) , using a CFX96 qPCR detection system ( Bio-Rad ) . 
+ The cycling conditions used were 95 °C for 5 min , 95 °C for 30 s , 58 or 60 °C for 1 min ( annealing ) , 72 °C for 30 s ( extension ) , and 72 °C for 10 min , with steps 2 to 4 repeated for 40 cycles . 
+ All reactions were performed in triplicate . 
+ Transcriptome proﬁling . 
+ RNA samples were prepared as described above in duplicate and were processed and analyzed at the London Regional Genomics Center . 
+ cDNA samples were created by reverse transcription and were then biotinylated and fragmented before hybridization to custom-designed Affymetrix GeneChip arrays , as described in reference 40 . 
+ The hybridized arrays were stained and washed using an Affymetrix Fluidics station 450 and were scanned with an Affymetrix Scanner 3000 7G . 
+ Data were analyzed using the Partek Genomics Suite . 
+ The log2 values of the signals were normalized to the median value of the respective arrays . 
+ The transcriptional fold change of each gene was calculated as the ratio between the induced and the uninduced sample . 
+ Selected targets were validated with RT-qPCR , as described above . 
+ In addition to the genes of interest , 16S rRNA was included as a reference . 
+ For each time point , the CT of a target gene was normalized to the CT of 16S rRNA , which was obtained from the same cDNA . 
+ The uninduced ( time zero ) sample was used to establish a baseline expression level and to determine the fold change in transcript levels at each subsequent point in the time course . 
+ ANOVAs were performed using SPSS v17.0 to determine whether the results ( microarray and RT-qPCR ) were statistically significant ( P value < 0.05 ) . 
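+ The array normalization and fold-change calculation described in this section can be sketched as below. This is an illustration under stated assumptions rather than the Partek workflow itself: "normalized to the median" is taken to mean subtracting the array's median log2 signal, and the function names are hypothetical.

```python
import math
import statistics


def normalized_log2(signals):
    """Return log2 signals centred on the median log2 signal of the array."""
    logs = [math.log2(s) for s in signals]
    med = statistics.median(logs)
    return [v - med for v in logs]


def fold_changes(induced, uninduced):
    """Per-gene fold change (induced / uninduced) after median normalization.

    Both lists hold raw signals for the same genes in the same order.
    """
    ind = normalized_log2(induced)
    unind = normalized_log2(uninduced)
    # A difference of log2 values corresponds to a ratio on the linear scale.
    return [2 ** (a - b) for a, b in zip(ind, unind)]
```

+ Centring each array on its own median removes array-wide intensity differences , so a gene only shows a fold change when it moves relative to the bulk of the transcriptome .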
+ Antibiotic production assays . 
+ Act and Red production for S. coelicolor M145 ( pIJ82 ) , Δcrp ( pIJ82 ) , Δcrp ( pIJ82crp ) , and M145 ( pIJ82ermE*crp ) strains , grown in liquid R5 medium for 7 days , was quantified spectrophotometrically as described previously ( 13 , 57 ) . 
+ Three independent cultures were set up for each strain , and duplicate aliquots from each culture were tested . 
+ CDA production bioassays were conducted as outlined in reference 13 , except that Streptomyces strains were grown in liquid R5 medium for 48 h. Four replicates were conducted for each strain . 
+ CDA production was quantiﬁed by measuring the diameter of the inhibition zones . 
+ The levels of all antibiotics were normalized relative to the biomass of the mycelia from which the antibiotics were extracted . 
+ Antibiotic production by S. coelicolor M145 , Streptomyces venezuelae ATCC 10712 , Streptomyces pristinaespiralis ATCC 25486 , Streptomyces sp. strain SPB74 , Streptomyces sp. strain WAC4657 , and Streptomyces sp. strain WAC4988 ( see Table S3 in the supplemental material ) , containing pIJ82 or pIJ82ermE*crp ( crp overexpression construct ) , was tested against the following indicator strains : E. coli , Staphylococcus aureus , and Bacillus subtilis . 
+ Approximately 10^6 spores ( in 5 µl sterile distilled water ) of each strain were spotted on DNA or R2YE agar plates and incubated at 30 °C for 48 , 72 , 96 , 120 , and 144 h before being overlaid with soft agar ( 1:1 DNA plus Difco nutrient broth ) containing a 100-fold dilution of indicator strain overnight culture in liquid LB medium . 
+ The plates were incubated overnight at 37 °C before measuring the size of the inhibition zone ( distance from the outer edge of each Streptomyces circular patch to the edge of the zone of clearing ) . 
+ Each experiment included four replicates for each strain and was performed three times . 
+ Secondary metabolite extraction and analysis . 
+ crp-overexpressing Streptomyces strains and their vector-alone-containing controls were spread on R5 agar medium and incubated for 3 and 7 days . 
+ The cultures , along with the agar , were diced , soaked in 25 ml n-butanol , and sonicated in a Branson 2520 tabletop ultrasonic cleaner for 3 min before being macerated at room temperature overnight . 
+ The mixture was filtered through Whatman filter paper and lyophilized in an HT-4X centrifugal vacuum evaporator ( Genevac ) , followed by reconstitution in 500 µl acetonitrile-distilled water ( dH2O ) ( 1:1 , high-pressure liquid chromatography [ HPLC ] grade ) . 
+ R5 agar alone was processed in parallel as a negative control . 
+ Each sample was prepared in quadruplicate . 
+ LC-MS analysis was performed on an Agilent 1200 series analytical HPLC system equipped with a reverse-phase C18 column ( 2.1 by 100 mm , 2.6 µm , 100 Å ) ( Kinetex ) coupled to a benchtop time-of-flight spectrometer ( Bruker MicroTOF II ; Bruker Daltonics ) . 
+ The samples were separated using a gradient of 5 % to 95 % acetonitrile ( 0.1 % [ vol/vol ] formic acid ) at 50 °C over 22 min , with a flow rate of 0.2 ml/min . 
+ Positive electrospray ionization was performed at 4.5 kV , and the ions were scanned over a mass range of 200 to 1,700 m/z . 
+ Data were analyzed using MZmine 2 software ( 58 ) . 
+ cAMP concentration measurement . 
+ Spores of the wild-type , Δcrp , Δcrp ( pIJ6902crp ) , and Δcrp ( pIJ6902 ) strains were pregerminated and cultured in liquid R5 medium . 
+ For the wild-type strain , samples were harvested at 12 , 16 , 20 , and 24 h , while for the Δcrp ( pIJ6902crp ) strain , cultures were induced at 16 h and samples were harvested at 16 ( preinduction ) , 18 , 20 , and 24 h . 
+ Cultures of the negative controls , the Δcrp and Δcrp ( pIJ6902 ) strains , were set up 48 h ahead of the wild-type and Δcrp ( pIJ6902crp ) strains , respectively , and were then followed using the same time course . 
+ At each time point , 7 ml of culture was extracted and cells were pelleted . 
+ For determining extracellular cAMP levels , the supernatant was heated at 95 °C for 5 min and then diluted 10-fold ( M145 ) or 40-fold [ Δcrp ( pIJ6902crp ) ] in work buffer ( BTI ; Biomedical Technologies ) . 
+ Samples were assayed using a cAMP enzyme immunoassay ( EIA ) kit ( BTI ) that allows cAMP quantiﬁcation in the range of 0.5 to 100 pmol/ml . 
+ For quantiﬁcation of intracellular cAMP , the cell pellets were washed in an equal volume of phosphate-buffered saline ( PBS ) ( 0.8 % NaCl , 0.02 % KCl , 0.15 % Na2HPO4 , 0.024 % KH2PO4 , pH 7.4 ) and resuspended in 1 ml of work buffer ( BTI ) . 
+ The mycelia were sonicated on ice and then centrifuged . 
+ The cell extract supernatant was heated at 95 °C for 5 min before being assayed using the kit . 
+ Each strain was examined in duplicate , and the concentrations were normalized relative to the biomass of the mycelium pellets . 
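+ The back-calculation implied by the dilution and biomass normalization steps above can be sketched as follows. The function name, the biomass unit ( mg ) , and the example numbers are assumptions for illustration ; only the 7-ml sample volume , the 10- or 40-fold dilutions , and the 0.5 to 100 pmol/ml assay range come from the text .

```python
def camp_per_biomass(assay_pmol_per_ml, dilution_factor, volume_ml, biomass_mg):
    """Convert a kit reading into pmol cAMP per mg biomass (hypothetical units).

    assay_pmol_per_ml: concentration read from the immunoassay (diluted sample)
    dilution_factor:   fold dilution applied before the assay (e.g. 10 or 40)
    volume_ml:         volume of culture supernatant or extract represented
    biomass_mg:        biomass of the mycelial pellet from the same sample
    """
    # Readings outside the kit's quantifiable range cannot be trusted.
    if not 0.5 <= assay_pmol_per_ml <= 100:
        raise ValueError("reading outside the 0.5 to 100 pmol/ml assay range")
    # Undo the dilution, then scale to the whole sample volume.
    total_pmol = assay_pmol_per_ml * dilution_factor * volume_ml
    return total_pmol / biomass_mg
```

+ Choosing the dilution so that the reading falls inside the assay range is presumably why the higher-producing strain was diluted 40-fold rather than 10-fold .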
+ SUPPLEMENTAL MATERIAL
+ Supplemental material for this article may be found at http://mbio.asm.org/lookup/suppl/doi:10.1128/mBio.00407-12/-/DCSupplemental . 
+ Figure S1 , PDF file , 0.1 MB . 
+ Figure S2 , PDF file , 0.1 MB . 
+ Figure S3 , PDF file , 0.2 MB . 
+ Figure S4 , PDF file , 0.2 MB . 
+ Figure S5 , PDF file , 0.1 MB . 
+ Figure S6 , PDF file , 0.2 MB . 
+ Table S1 , PDF file , 0.1 MB . 
+ Table S2 , PDF file , 0.1 MB . 
+ Table S3 , PDF file , 0.1 MB . 
+ Table S4 , PDF file , 0.1 MB . 
+ We thank Sheila Pimentel-Elardo , Alison Berzins , and Chris Hanke for technical assistance and Mark Buttner and Justin Nodwell for helpful comments and discussions . 
+ This work was funded by Cystic Fibrosis Canada .
\ No newline at end of file
--- a/data/TEXT_FILES/notuseful_txt/23275538.txt 0 → 100644
View file @27818a9
+++ b/data/TEXT_FILES/notuseful_txt/23275538.txt 0 → 100644
View file @27818a9
+ micro-array analysis for genome-wide identification
+ ABSTRACT 
+ Nanobodies are single-domain antibody fragments derived from camelid heavy-chain antibodies . 
+ Because of their small size , straightforward production in Escherichia coli , easy tailoring , high affinity , specificity , stability and solubility , nanobodies have been exploited in various biotechnological applications . 
+ A major challenge in the post-genomics and post-proteomics era is the identification of regulatory networks involving nucleic acid -- protein and protein -- protein interactions . 
+ Here , we apply a nanobody in chromatin immunoprecipitation followed by DNA microarray hybridization ( ChIP-chip ) for genome-wide identification of DNA -- protein interactions . 
+ The Lrp-like regulator Ss-LrpB , arguably one of the best-studied specific transcription factors of the hyperthermophilic archaeon Sulfolobus solfataricus , was chosen for this proof-of-principle nanobody-assisted ChIP . 
+ Three distinct Ss-LrpB-specific nanobodies , each interacting with a different epitope , were generated for ChIP . 
+ Genome-wide ChIP-chip with one of these nanobodies identified the well-established Ss-LrpB binding sites and revealed several unknown target sequences . 
+ Furthermore , these ChIP-chip profiles revealed auxiliary operator sites in the open reading frame of Ss-lrpB . 
+ Our work introduces nanobodies as a novel class of affinity reagents for ChIP . 
+ Taking into account the unique characteristics of nanobodies , in particular , their short generation time , nanobody-based ChIP is expected to further streamline ChIP-chip and ChIP-Seq experiments , especially in organisms with no ( or limited ) possibility of genetic manipulation . 
+ INTRODUCTION
+ Chromatin immunoprecipitation ( ChIP ) is a widely used technique to measure DNA-binding events of transcription factors in vivo . 
+ ChIP , combined with DNA microarray analysis ( ChIP-chip ) or high-throughput sequencing ( ChIP-Seq ) , allows genome-wide mapping of all locations where a factor is associated , through protein -- DNA or protein -- protein interactions ( 1,2 ) . 
+ Transcriptomics and proteomics measure the consequences of regulatory interactions ( changes in RNA or protein levels ) , which may result from either direct or indirect ( cascade ) effects ; in contrast , ChIP-chip and ChIP-Seq provide information on the regulatory interactions themselves and are , therefore , the most direct ways to define regulons . 
+ An additional advantage of ChIP-chip and ChIP-Seq is that the analysis can be performed in a wild-type strain ; there is no need for a gene disruption mutant or a strain that overexpresses a tagged regulatory DNA-binding protein . 
+ The ChIP-chip procedures have been established for different organisms , ranging from prokaryotes and yeasts to higher eukaryotes , including mammals ( 3 -- 9 ) . 
+ In bacteria , ChIP-chip has been applied mainly to Escherichia coli ( 10,11 ) . 
+ The use of ChIP in archaea has been lagging behind and , to our knowledge , has only been applied to Halobacterium salinarum NRC-1 ( 12 -- 14 ) . 
+ A ChIP-chip assay consists of multiple sequential experimental steps . 
+ Living cells are ﬁrst treated with formaldehyde , resulting in covalent cross-linking of DNA-associated proteins to DNA . 
+ Subsequently , nucleoprotein is extracted and sheared into shorter DNA fragments , usually by sonication . 
+ This preparation is then subjected to immunoprecipitation using an antibody speciﬁc for the protein of interest . 
+ After ChIP , the enriched nucleoprotein complexes are treated to hydrolyse the cross-linked complexes , and DNA is puriﬁed . 
+ Generally , the yield of ChIP DNA is too low and needs to be ampliﬁed before array hybridization . 
+ Given the large number of experimental parameters in a ChIP-chip experiment , it is not surprising that there is a wide variation in the design of different studies . 
+ One of the most critical determinants of a successful ChIP-based approach is the antibody ( 5,11,15,16 ) . 
+ ChIP antibodies should be capable of capturing speciﬁcally one single protein of a vast pool of DNA-binding proteins . 
+ It should also be considered that DNA binding and DNA -- protein cross-linking might provoke conformational changes in the nucleoprotein complexes that lead to epitope masking , causing false-negative outcomes , whereas cross-reactivity of the antibodies to non-cognate targets could generate false-positive outcomes . 
+ Effects of epitope masking can be minimized by using polyclonal antibodies ( pAbs ) ( 17 ) . 
+ However , pAbs increase the frequency of false-positive outcomes , their production requires regular immunization and they exhibit batch to batch variability ( 18,19 ) . 
+ In comparison with pAbs , monoclonal antibodies ( mAbs ) suffer less from the aforementioned problems . 
+ However , the availability of high-quality ChIP-grade mAbs is apparently limited ( 11,20 ) . 
+ Epitope tagging , by homologous recombination-mediated knock-in of the tagged genes , could circumvent the lack of ChIP-grade mAbs . 
+ Although this technology is relatively straightforward for some well-established model organisms , such as Saccharomyces cerevisiae and E. coli ( 7,8,14,21 -- 23 ) , genetic tools to achieve this in many organisms such as Sulfolobus , one of the archaeal model organisms , are still limited ( 24,25 ) . 
+ Moreover , it cannot be excluded that the characteristics ( e.g. stability , folding efficiency , hydrophobicity ) of a tagged protein may differ from those of the wild-type . 
+ Evidently , such potential differences can affect the outcome of the ChIP experiment . 
+ Monospeciﬁc antigen-binding domains can also be produced by microorganisms at a fraction of the cost of mAbs , and they might constitute a novel and valuable resource of ChIP-grade antibodies . 
+ In particular , recombinant single-domain antigen-binding fragments , such as nanobodies , appear attractive for ChIP . 
+ Remarkably , the antibody repertoire of camelids contains , in addition to conventional antibodies , a novel class of antibodies comprising heavy chains only ( 26 ) . 
+ These antibodies , referred to as heavy-chain antibodies , bind their cognate antigen by virtue of one single variable domain , termed VHH or nanobody . 
+ In contrast , the antigen binding by conventional antibodies relies on variable regions of both heavy and light chains ( VH and VL , respectively ) . 
+ Therefore , construction of libraries of antigen-binding domains of conventional antibodies involves random association of VHs and VLs . 
+ Consequently , large libraries are required to restore all possible VH -- VL combinations , of which some may represent the original VH -- VL pairing as it was afﬁnity matured in vivo during immunization with antigen . 
+ As camelid heavy-chain antibodies bind their target antigens by only one single domain , construction of large immune libraries to trap antigen-speciﬁc nanobodies has proven unnecessary ( 27,28 ) . 
+ Construction of libraries of antigen-binding repertoire of conventional antibodies is also complicated by the existence of multiple VH and VL gene families , whereas the vast majority of VHHs belong to one single sub-family ( 28 ) . 
+ The aforementioned technological advantages of constructing ` immune ' nanobody libraries , together with small size , recognition of unique epitopes , high afﬁnity , high solubility , high expression yield in heterologous expression systems and easy tailoring , make nanobodies an interesting class of afﬁnity reagents for various applications ( 27,29,30 ) . 
+ Here , we demonstrate the use of target-speciﬁc nanobodies in ChIP experiments . 
+ As a model system , we chose the well-characterized transcription regulator Ss-LrpB from the hyperthermoacidophilic archaeon Sulfolobus solfataricus ( 31 ) . 
+ Ss-LrpB belongs to the leucine-responsive regulatory protein ( Lrp ) family , a wide-spread and abundant family of regulators in prokaryotes , both bacteria and archaea ( 32,33 ) . 
+ Several regulatory targets of Ss-LrpB have already been identiﬁed by in vitro binding experiments and by in vivo gene expression analysis ( 34 ) . 
+ These targets include the regulator gene itself and a gene cluster juxtaposed to it , encoding a putative ferredoxin oxidoreductase and two permeases . 
+ In this work , different Ss-LrpB-specific nanobodies were generated and assessed for their capacity to capture specifically the regulator , either free or bound to DNA . 
+ We then developed a nanobody-based ChIP protocol for S. solfataricus . 
+ The genome-wide application of nanobody-based ChIP for Ss-LrpB is demonstrated by implementation of the Roche NimbleGen™ microarray platform . 
+ The results presented here demonstrate the utility and speciﬁcity of nanobodies as a novel class of afﬁnity reagents for ChIP . 
+ MATERIALS AND METHODS
+ Protein purifications
+ Full-length non-tagged Ss-LrpB protein was produced recombinantly in E. coli and was puriﬁed by heat treatment and ion exchange chromatography , as previously described ( 35 ) . 
+ The His-tagged C-terminal domain of Ss-LrpB was purified by Ni2+ affinity chromatography ( 36 ) . 
+ LysM and Ss-Lrp proteins were produced and puriﬁed by the same procedure as the Ss-LrpB puriﬁcation . 
+ For LysM , E. coli BL21 ( DE3 ) was ﬁrst transformed with construct pLUW632 ( 37 ) . 
+ After purification , the Ss-LrpB and Ss-Lrp preparations were dialysed against 20 mM of Tris -- HCl ( pH 8.0 ) , 50 mM of NaCl , 0.4 mM of ethylenediaminetetraacetic acid ( EDTA ) , 0.1 mM of DTT and 12.5 % of glycerol , and the LysM preparation was dialysed against 20 mM of Tris -- HCl ( pH 8.0 ) and 20 % of glycerol . 
+ After identiﬁcation as described later in the text , the Ss-LrpB-speciﬁc VHH ( nanobody ) genes were cloned into the pHEN6c vector , which allows expression of nanobodies in fusion with His6 tag ( 38 ) . 
+ Expression and puriﬁcation of nanobodies were performed as previously described ( 39 ) . 
+ Protein concentrations in the case of Ss-LrpB expressed in monomeric units were determined by ultraviolet absorption at 280 nm and by densitometric analysis of Coomassie stained sodium dodecyl sulphate ( SDS ) -- polyacrylamide gel ( PAG ) . 
+ Generation of Ss-LrpB-speciﬁc nanobodies
+ Ss-LrpB-specific nanobodies were generated by immunizing an alpaca ( Vicugna pacos ) with purified full-length Ss-LrpB . 
+ Using peripheral blood lymphocytes of the animal , a VHH library was constructed , and speciﬁc nanobodies were selected according to published methods ( 38 ) . 
+ Surface plasmon resonance
+ Surface plasmon resonance ( SPR ) measurements of the interactions of Ss-LrpB-speciﬁc nanobodies with their antigen were performed with a Biacore T200 instrument . 
+ All measurements were performed in phosphate-buffered saline ( PBS ) at 25 °C . 
+ CM5 chips ( GE Healthcare ) were used to covalently couple Ss-LrpB via its primary amines of lysine residues . 
+ Ss-LrpB was immobilized onto the chip until the signal reached 500 resonance units ( RUs ) . 
+ Measurements were performed by applying various concentrations of nanobodies ( between 3 and 500 nM ) as analyte to the chip , at a flow rate of 30 µl/min . 
+ An association phase of 150 s was followed by a dissociation phase of 600 s . 
+ Regeneration was achieved by washing the chip with 10 mM of glycine hydrochloride ( pH 2.0 ) for 20 s , at a flow rate of 60 µl/min . 
+ The association and dissociation curves of the sensorgrams were analysed with the Biacore Evaluation software , version 2.0 , yielding kinetic and equilibrium binding constants . 
+ Epitope analysis was done with 250 , 500 or 750 nM of each nanobody , either alone or combined , also at a flow rate of 30 µl/min . 
+ Each association phase lasted 200 s. 
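+ The relationship between the kinetic and equilibrium constants obtained from sensorgrams can be sketched with a simple 1:1 ( Langmuir ) binding model. The text does not state which model the evaluation software fitted , so the model choice , the function names and the rate constants in the usage note are assumptions for illustration only .

```python
import math


def equilibrium_kd(ka, kd):
    """Equilibrium dissociation constant K_D (M) from ka (1/(M*s)) and kd (1/s)."""
    return kd / ka


def association_response(t, conc, ka, kd, rmax):
    """Response (RU) at time t (s) during association for analyte at conc (M).

    1:1 model: R(t) = R_eq * (1 - exp(-(ka*conc + kd) * t)),
    with R_eq = rmax * conc / (conc + K_D).
    """
    k_obs = ka * conc + kd                      # observed rate constant
    r_eq = rmax * conc / (conc + kd / ka)       # plateau response at this concentration
    return r_eq * (1.0 - math.exp(-k_obs * t))
```

+ With hypothetical constants ka = 1e5 per M per s and kd = 1e-3 per s , the model gives a K_D of about 10 nM , and an analyte concentration equal to K_D plateaus at half of the maximal response .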
+ Immunoprecipitation (pull-down) assays with crude cell extracts
+ Escherichia coli BL21 ( DE3 ) crude cell extracts containing one of the three Lrp-like transcription factors from S. solfataricus ( Ss-LrpB , LysM or Ss-Lrp ) , expressed from recombinant pET24 vectors , were used for these experiments . 
+ Crude extracts from BL21 ( DE3 ) containing an empty pET24 vector served as negative control . 
+ Cell pellets from 20 ml cultures were resuspended in 1 ml of IP buffer [ 150 mM of NaCl , 50 mM of Tris -- HCl ( pH 8.0 ) , 1 % of Triton X-100 , 0.5 % of NP-40 , 1 % of deoxycholate ] , sonicated and centrifuged . 
+ Aliquots of 200 µl of the supernatants were incubated with different amounts of His-tagged nanobodies for 20 min at room temperature . 
+ Subsequently , the pull-down was performed using Nickel-NTA magnetic particles ( Bio-Nobile ) following the supplier 's recommendations . 
+ The nanobody -- antigen complexes were eluted with 100 µl of PBS containing 250 mM of imidazole , and the eluted proteins were analysed by 12 % SDS -- polyacrylamide gel electrophoresis ( SDS -- PAGE ) . 
+ Native protein polyacrylamide gel electrophoresis
+ Native protein gel electrophoresis was performed with a 10 % Tris -- glycine gel ( Invitrogen ) . 
+ Each reaction mixture , with a total volume of 20 µl , contained 20 µg of Nb9 and/or 10 µg of the respective Lrp-like protein . 
+ Reaction mixtures were incubated for 30 min at room temperature before gel analysis . 
+ The electrophoresis was performed in Tris -- glycine ( pH 8.5 ) electrophoresis buffer at 125 V for 4 -- 5 h . 
+ The gel was stained with Coomassie blue . 
+ Electrophoretic mobility shift assays
+ Electrophoretic mobility shift assays ( EMSAs ) were performed with labelled DNA prepared by polymerase chain reaction ( PCR ) . 
+ One of the two oligonucleotides was 5′-end labelled with 32P using [ γ-32P ] - adenosine triphosphate ( Perkin Elmer ) and T4 polynucleotide kinase ( Fermentas ) . 
+ The PCR mixtures contained Taq DNA polymerase ( Ready Mix , Sigma-Aldrich ) , the labelled primer , a second non-labelled primer and the recombinant vector pUC18p/oSs-lrpB or pBendBox1 as template ( 31 ) . 
+ Primer sequences are given in Supplementary Table S1 . 
+ Labelled DNA fragments were purified from polyacrylamide gel . 
+ EMSA experiments were performed as previously described ( 40 ) . 
+ Binding reactions were allowed to equilibrate for 20 min at 37 °C in Lrp-binding buffer [ 20 mM of Tris -- HCl ( pH 8.0 ) , 1 mM of MgCl2 , 50 mM of NaCl , 0.4 mM of EDTA , 0.1 mM of DTT , 12.5 % of glycerol ] . 
+ For binding reactions in which Ss-LrpB and nanobodies were combined , Ss-LrpB was pre-incubated with DNA for 20 min before addition of nanobody . 
+ Cell culture and formaldehyde cross-linking
+ Sulfolobus solfataricus P2 ( DSMZ 1617 ) was cultured aerobically by shaking at 80 °C in Brock medium ( 41 ) supplemented with 0.1 % of tryptone as carbon and nitrogen source . 
+ Depending on the downstream application , 50 ml or 200 ml cultures were grown . 
+ When cells were in mid-exponential growth phase [ at an optical density ( OD ) at 600 nm of 0.5 ] , the cultures were cooled to 37 °C , and formaldehyde was added to a final concentration of 1 % , while shaking for 5 min , unless otherwise noted . 
+ The cross-linking reaction was quenched by adding glycine to a final concentration of 125 mM , followed by an additional incubation of 5 min at 37 °C . 
+ To find the optimal cross-linking time , 50 ml cultures were formaldehyde-treated for different periods , centrifuged and sonicated for 5 min ( see later in the text for sonication details ) . 
+ After this treatment , 200 µl aliquots were subjected to phenol extraction to separate protein-free from complexed DNA . 
+ This extraction was done by mixing with 2 volumes of phenol/chloroform/isoamyl alcohol ( 25:24:1 ) followed by 5 min of centrifugation at 20 817 g . 
+ The protein-free DNA fractions were recovered from the upper aqueous phase by ethanol precipitation . 
+ Finally , the extracted DNA was treated with 16.5 ng RNase ( Invitrogen ) for 2 h at 37 °C and column-purified . 
+ The degree of cross-linking was assessed by quantitative PCR ( qPCR ) analysis of the protein-free DNA samples relative to DNA from a non -- cross-linked sample , which represented total DNA , and was prepared by extraction and puriﬁcation as described for cross-linked samples . 
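+ The qPCR comparison described above can be sketched as a small calculation . This is an illustration only , not the authors ' analysis script : it assumes that both samples were processed from equal starting volumes and that amplification efficiency is perfect ( a doubling per cycle ) . The function name and the Cq values in the usage note are hypothetical .

```python
def crosslinked_fraction(cq_protein_free: float, cq_total: float,
                         efficiency: float = 2.0) -> float:
    """Estimate the fraction of DNA cross-linked to protein from two Cq values.

    cq_protein_free: Cq of the protein-free DNA recovered after phenol extraction.
    cq_total:        Cq of DNA from a non-cross-linked sample (total DNA).
    efficiency:      assumed amplification factor per cycle (2.0 = perfect doubling).
    """
    # A higher Cq means less template, so the protein-free share of total DNA
    # is efficiency ** (cq_total - cq_protein_free).
    protein_free_share = efficiency ** (cq_total - cq_protein_free)
    # Whatever is not protein-free was retained with protein, i.e. cross-linked.
    return 1.0 - protein_free_share
```

+ For example , with hypothetical Cq values of 24 for the protein-free sample and 20 for total DNA , the protein-free share is 1/16 and the estimated cross-linked fraction is about 94 % .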
+ Sonication
+ Cross-linked cells from either 50 ml or 200 ml cultures were centrifuged at 3220g for 10 min ; cell pellets were washed twice with PBS and resuspended in 3 ml IP buffer . 
+ Sonication was performed with a Bioblock Scientiﬁc-Vibracell sonicator at 20 % of the maximal amplitude , in a pulsed operating mode with 9 s rest in between each 3 s of operation . 
+ The total operating time was 9 min , unless otherwise stated . 
+ Cells were continuously cooled during sonication . 
+ After sonication , the samples were centrifuged at 21 000g for 15 min , and the supernatants were used for ChIP as described later in the text . 
+ For small-scale sonication tests , samples from 50 ml cultures were cross-linked for 5 min and sonicated with total operation times between 3 and 30 min . 
+ For each sample , a 200 µl aliquot was purified by phenol extraction , and a 200 µl aliquot was de-cross-linked , as described later in the text . 
+ These two samples , corresponding to protein-free DNA and total genomic DNA , respectively , were analysed by qPCR to calculate the ratio of cross-linked protein -- DNA complexes versus total DNA . 
+ The DNA size distribution was analysed by 1 % agarose gel electrophoresis of de-cross-linked samples . 
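+ The pulsed sonication settings ( 3 s of operation separated by 9 s of rest ) imply a wall-clock time well above the stated operating time . The sketch below works this out ; it assumes that no rest follows the final pulse , which the text does not specify , and the function name is hypothetical .

```python
import math

def sonication_schedule(total_on_s: float, pulse_s: float = 3.0,
                        rest_s: float = 9.0) -> tuple[int, float]:
    """Return (number of pulses, elapsed seconds) for a pulsed sonication run."""
    # Number of pulses needed to accumulate the requested operating time.
    pulses = math.ceil(total_on_s / pulse_s)
    # Rest periods sit between pulses, so there is one fewer rest than pulses.
    elapsed_s = total_on_s + rest_s * (pulses - 1)
    return pulses, elapsed_s

# 9 min of total operating time, the setting chosen for the ChIP experiments.
pulses, elapsed_s = sonication_schedule(9 * 60)
print(pulses, elapsed_s / 60)
```

+ Under these assumptions , 9 min of operating time corresponds to 180 pulses spread over roughly 36 min , which is relevant when planning continuous cooling of the sample .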
+ Chromatin immunoprecipitation
+ For each ChIP assay , 0.5 mg of puriﬁed nanobody was added to 3 ml cross-linked sonicated sample obtained as previously described after centrifugation . 
+ The mixtures were incubated overnight at 4 °C . 
+ In parallel , 1 ml of His-Select Nickel Affinity Gel suspension ( Sigma-Aldrich ) was blocked overnight at 4 °C with IP buffer containing 0.5 % of bovine serum albumin , and it was added the next day to the mixtures of nanobody and cross-linked , sonicated sample . 
+ After 2 h incubation at room temperature , the gel pellets were washed three times with 4 ml of IP buffer each . 
+ The ChIP-enriched fractions were then eluted from the gel pellet by the addition of 400 µl of elution buffer [ 50 mM of Tris -- HCl ( pH 8.0 ) , 1 % of Triton X-100 , 0.5 % of NP-40 , 1 % of deoxycholate , 1 % of SDS , 300 mM of NaCl , 250 mM of imidazole ] and a further incubation of the gel mixture at room temperature for 1 h . 
+ Subsequently , 400 µl samples were subjected to de-cross-linking by incubation at 55 °C for 16 h , followed by addition of 1 volume of protein lysis buffer [ 10 mM of Tris -- HCl ( pH 8.0 ) , 1 mM of EDTA , 31 nM of proteinase K , 0.9 mg/ml of glycogen ] and incubation at 37 °C for 2 h . 
+ DNA was recovered from the mixture by phenol extraction , followed by a treatment with 50 µl of RNase A solution ( 33 ng/ml ) at 37 °C for 2 h and by column purification ( Qiagen ) . 
+ Input DNA , sampled after sonication , was also de-cross-linked and puriﬁed as aforementioned . 
+ Finally , all ChIP samples were ampliﬁed by whole-genome ampliﬁcation using the WGA-2 Kit ( Sigma-Aldrich ) following the manufacturer 's instructions for ChIP-chip samples in which the heat-induced fragmentation step is omitted . 
+ Mock immunoprecipitations were performed with a BcII β-lactamase-specific nanobody ( here referred to as NbX ) ( 38 ) . 
+ For the spiking immunoprecipitation experiment , covalently cross-linked Ss-LrpB -- DNA complexes were prepared as follows : formaldehyde was added ( 1 % , ﬁnal concentration ) to a mixture of 7.7 pM of pUC18p/oSs-lrpB plasmid DNA ( 31 ) and 1.5 nM Ss-LrpB protein , and it was incubated at room temperature for 10 min . 
+ The reaction was then quenched by adding glycine ( final concentration 125 mM ) and by incubating for 5 min at 37 °C . 
+ The mixture was sonicated as previously described . 
+ For all ChIP samples , enrichment was evaluated by qPCR relative to input DNA . 
+ Microarray design and data analysis
+ For DNA microarray analysis , a custom 385 K high-density tiling array was designed and manufactured by NimbleGen ( Madison , WI , USA ; www.nimblegen.com ) . 
+ The probes ( 50 -- 75 bases , with an average tiling interval of 14 bases ) were designed based on the S. solfataricus P2 genome sequence ( 42 ) . 
+ Each probe occurred twice on each array . 
+ Sample labelling , hybridization and array processing were performed at NimbleGen . 
+ The ChIP input and output samples were labelled with Cy3 and Cy5 , respectively . 
+ The Ringo package ( 43 ) was applied to analyse the raw data sets , including removal of unreliable signals , normalization and smoothing of the data and assignment of ChIP-enriched regions ( chers ) . 
+ Venn diagrams were generated using ChIPpeakAnno ( 44 ) . 
+ All microarray data are available in Supplementary Material . 
+ Quantitative real-time PCR
+ qPCR reactions were performed with a My-iQ Single Colour Real-time PCR system ( Bio-Rad ) . 
+ Ampliﬁcation and detection were achieved using SYBR Green Master Mix ( Bio-Rad ) . 
+ Each 25 µl PCR reaction contained 10 ng of template DNA and 200 nM of each primer . 
+ Cycling conditions ( 10 min at 94 °C and 40 cycles of 30 s at 94 °C , 30 s at 60 °C ) were followed by melt curve analysis . 
+ Amplicon sizes were between 100 and 250 bp ; all primers are listed in Supplementary Table S1 . 
+ Quantification cycles ( Cq ) were determined by My-iQ software ( Bio-Rad ) , and relative quantitative analysis was done using the 2^-ΔΔCt method ( 45 ) . 
+ All measurements were normalized to reference DNA , a non-related sequence fragment amplified by PCR from E. coli gDNA , and spiked at 30 ng/sample before sonication . 
+ Experiments were performed at least in duplicate . 
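+ The relative quantification can be illustrated with a short sketch of the standard 2^-ΔΔCt calculation , using the spiked E. coli fragment as reference and input DNA as calibrator . This is a generic illustration that assumes perfect amplification efficiency ; the function name and the Cq values in the usage note are hypothetical and are not taken from the study .

```python
import math

def fold_enrichment(cq_target_chip: float, cq_ref_chip: float,
                    cq_target_input: float, cq_ref_input: float) -> float:
    """Relative enrichment of a target in ChIP DNA versus input DNA (2^-ddCt)."""
    # Normalize the target to the spiked reference DNA within each sample.
    delta_chip = cq_target_chip - cq_ref_chip
    delta_input = cq_target_input - cq_ref_input
    # Compare the normalized ChIP sample with the normalized input sample.
    delta_delta = delta_chip - delta_input
    return 2.0 ** (-delta_delta)

# Hypothetical Cq values, chosen only to make the arithmetic easy to follow.
fold = fold_enrichment(22.0, 25.0, 26.0, 24.0)
print(fold, math.log2(fold))
```

+ With these example values the target is 32-fold enriched , a log2 value of 5 , which is the scale used for the enrichments reported in the Results .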
+ RESULTS
+ Generation , afﬁnity determination and epitope mapping of Ss-LrpB-speciﬁc nanobodies 
+ An alpaca was immunized by six injections at weekly intervals , each time with 200 µg of purified full-length recombinant Ss-LrpB . 
+ The plasma obtained 4 days after the last injection showed an end-titre of 10^5 [ the plasma dilution that still gives an antigen-specific enzyme-linked immunosorbent assay ( ELISA ) signal 3-fold above background ] . 
+ Subsequently , an immune VHH library comprising 10^8 independent transformants was constructed from the peripheral blood lymphocytes taken from the immunized animal 4 days after the last immunization . 
+ The library was subjected to two distinct bio-panning experiments , against either full-length Ss-LrpB or the C-terminal domain of Ss-LrpB ( ` Ss-LrpB C-Term ' ) ( 36 ) . 
+ We included the selection against Ss-LrpB C-Term because the N-terminal domain of Lrp-like transcription regulators contains the DNA-binding helix-turn-helix motif . 
+ By selecting binders that recognize epitopes located in the C-terminal oligomerization and effector binding domain , we aimed to increase the chances of obtaining nanobodies interacting with regions of the regulator that remain accessible on DNA binding . 
+ After three rounds of selection , > 200 clones were randomly chosen to assess the ability of their nanobody to recognize Ss-LrpB in an ELISA . 
+ The nanobody nucleotide sequence of 75 ELISA-positive clones ( 19 and 56 clones from panning against full-length Ss-LrpB and Ss-LrpB C-Term , respectively ) was determined , resulting in 47 different genes ( 10 and 37 against full-length Ss-LrpB and Ss-LrpB C-Term , respectively ) , encoding proteins that differ from each other in at least 1 amino acid . 
+ Several of the nanobody sequences possess a nearly identical CDR3 ( the third hypervariable antigen-binding loop ) . 
+ Such binders are known to interact with the same epitope , although sequence differences in other antigen-binding loops ( CDR1 and CDR2 ) might affect the affinity for the antigen ( 46 ) . 
+ Remarkably , 24 different CDR3 sequences could be discerned for the Ss-LrpB C-Term target , and only two different CDR3 sequences were obtained for the full-length target . 
+ Twelve nanobody genes ( 2 for the full-length and 10 for the C-Term ) were re-cloned in an expression vector in fusion with a His6 tag and expressed , and the recombinant proteins were purified to homogeneity . 
+ These proteins were used to determine the affinity and to identify binders that recognize a unique epitope , by both surface plasmon resonance ( SPR ) and ELISA . 
+ Based on these criteria , it was decided to continue with three nanobodies , designated Nb1 , Nb11 and Nb9 ( Figure 1A ) . 
+ The first two nanobodies originated from the pannings on the full-length Ss-LrpB , whereas Nb9 was retrieved from the C-Term selections . 
+ The affinity ( KD ) as obtained from SPR ranged from 40 nM ( Nb11 ) to 1 nM ( Nb9 ) ( Table 1 ) . 
+ The high affinity of Nb9 is attributed to a high association rate constant ( kon ) , which is 6- and 64-fold higher than the kon of Nb1 and Nb11 , respectively ( Table 1 ) . 
+ The koff rate ( 10^-3 s^-1 ) is similar for all three nanobodies . 
+ Therefore , Nb9 is the best candidate nanobody for ChIP in terms of afﬁnity . 
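+ The reasoning above rests on the standard relation KD = koff / kon for a 1:1 binding model : when koff is similar , a higher kon translates directly into a lower KD . A minimal sketch follows ; the rate constants in the example are round illustrative numbers , not the values of Table 1 , and the function name is hypothetical .

```python
import math

def equilibrium_kd(kon_per_m_s: float, koff_per_s: float) -> float:
    """Equilibrium dissociation constant (M) for a 1:1 interaction."""
    # KD is the ratio of the dissociation and association rate constants.
    return koff_per_s / kon_per_m_s

# Illustrative values only: an identical koff and a 10-fold difference in kon.
kd_fast = equilibrium_kd(kon_per_m_s=1e6, koff_per_s=1e-3)  # about 1 nM
kd_slow = equilibrium_kd(kon_per_m_s=1e5, koff_per_s=1e-3)  # about 10 nM
print(kd_fast, kd_slow, kd_slow / kd_fast)
```

+ In this illustration the binder with the 10-fold higher kon has a 10-fold lower KD , which is the pattern invoked for Nb9 .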
+ SPR experiments , involving the sequential injection of two nanobodies at target saturating concentrations on the immobilized Ss-LrpB , further demonstrated that Nb1 , Nb11 and Nb9 indeed bind to independent sites on the regulator ( Figure 1B ) . 
+ This ﬁgure shows that Nb1 and Nb9 bind concomitantly to the immobilized Ss-LrpB protein : a second injection of the same nanobody did not result in a signiﬁcant RU change , indicating saturation of the ﬁrst occupied epitope . 
+ However , a second injection of the counterpart nanobody resulted in a similar RU change as observed after the ﬁrst injection . 
+ Similar results were obtained for Nb1/Nb11 and Nb9/Nb11 combinations ( data not shown ) . 
+ The same epitope grouping whereby the three nanobodies recognize three independent epitopes on the same antigen was further conﬁrmed by ELISA ( data not shown ) . 
+ Speciﬁcity of the Ss-LrpB–nanobody interaction
+ Thus far , all interaction analyses were performed with puriﬁed immobilized Ss-LrpB and , therefore , do not address the speciﬁcity of the nanobodies for their cognate target in a complex mixture . 
+ Moreover , immobilization of antigen can lead to ( partial ) denaturation , thereby exposing epitopes that are not present or accessible in the native soluble protein . 
+ To evaluate the capacity of nanobodies to capture Ss-LrpB in solution and to provide further information on their speciﬁcity , we performed pull-down assays with the three selected nanobodies , using total protein extracts from E. coli cells expressing recombinant Ss-LrpB ( Figure 2A ) . 
+ Different amounts of Nb1 ( from 6.5 to 25 µg ) specifically capture Ss-LrpB from crude cell extract , and E. coli endogenous proteins are not observed after pull-down . 
+ Similar results were obtained in pull-down assays with Nb9 and Nb11 ( data not shown ) . 
+ The control nanobody , NbX with speciﬁcity for an antigen that is not expressed by E. coli , fails to capture Ss-LrpB or any other protein , thereby demonstrating its suitability as a negative control in ChIP ( Figure 2A , lower panel ) . 
+ However , the lack of cross-reactive antigens in E. coli does not guarantee that proteins displaying homology with Ss-LrpB are absent in S. solfataricus . 
+ In particular , S. solfataricus Lrp family members , other than Ss-LrpB , may cross-react with Ss-LrpB-speciﬁc nanobodies . 
+ Two such Lrp family members , Ss-Lrp ( encoded by Sso0606 ) and LysM ( encoded by Sso0157 ) , have already been reported to exhibit 31 and 25 % sequence identity and 60 and 52 % sequence homology to Ss-LrpB , respectively ( 32,47 ) . 
+ In addition , these Lrp-like transcription factors exhibit large structural homologies ( 33 ) . 
+ Nevertheless , physical interactions between Nb9 and LysM could be excluded from the results of a native protein PAGE with mixtures of the two proteins ( Figure 2B ) . 
+ The Lrp-like regulators migrate out of the gel towards the cathode because of their high isoelectric point , whereas the nanobodies migrate into the gel because of their overall negative charge at pH 8.5 . 
+ Complexes between Lrp proteins and nanobodies enter into the gel , but with a slower migration velocity than the nanobodies alone . 
+ Nb9 forms a stable complex with Ss-LrpB , whereas this type of complex is not observed with LysM ( Figure 2B ) . 
+ Pull-down assays with total protein extracts from E. coli cells overexpressing Ss-Lrp or LysM further demonstrate the inability of Nb9 to capture these proteins ( Figure 2C ) . 
+ Similar results were obtained with Nb1 and Nb11 ( data not shown ) . 
+ Interaction between nanobodies and DNA-bound Ss-LrpB
+ DNA binding might inﬂuence the interaction between antibody and transcription factor because of epitope masking and/or conformational changes . 
+ Here , we used EMSAs ( Figure 3 ) to investigate the possible occurrence of supershifts as readout for nanobodies associating in vitro with Ss-LrpB in complex with its cognate target DNA , and as an indicator for the suitability of nanobodies for ChIP . 
+ Using a DNA fragment containing a single binding site for Ss-LrpB , stable complexes are formed , in which the semi-palindromic binding site is bound by an Ss-LrpB dimer ( Figure 3A ) . 
+ The nanobodies alone do not provoke a band shift of the free DNA in an EMSA ( last lane ) . 
+ However , the addition of Nb1 or Nb11 to pre-equilibrated Ss-LrpB -- DNA complexes shifts the protein -- DNA equilibrium towards dissociation of the preformed complexes . 
+ The Nb1-induced dissociation occurs at a lower nanobody concentration than with Nb11 , indicating that the extent of dissociation increases with the affinity of the nanobody -- Ss-LrpB interaction . 
+ This dissociation can be explained as resulting either from a direct association of the nanobody with the DNA-binding face of Ss-LrpB , so that the nanobody competes effectively with the DNA for the Ss-LrpB protein , or from a nanobody-induced conformational change in Ss-LrpB that affects its DNA binding allosterically in a negative fashion . 
+ The former explanation would discourage the use of Nb1 and Nb11 in ChIP , as their Ss-LrpB epitopes are probably unavailable in cross-linked nucleoprotein . 
+ Conversely , Nb9 , with speciﬁcity to the C-terminal domain of Ss-LrpB , binds to the Ss-LrpB -- DNA complex as evidenced by supershifting ( Figure 3A , middle panel ) . 
+ This suggests that Nb9 recognizes an epitope that is not directly involved in , or affected by , DNA binding . 
+ Therefore , Nb9 is deﬁnitely the best candidate for ChIP . 
+ Ss-LrpB interacts with its main DNA targets [ the control regions of its own gene and of the neighbouring pyruvate ferredoxin oxidoreductase ( porDAB ) operon ] by binding cooperatively to three regularly spaced semi-palindromic binding sites ( 31,34 ) . 
+ On binding all three sites of the Ss-lrpB control region , the three protein dimers closely interact and are assumed to wrap the DNA , causing large conformational changes with respect to the nucleoprotein complex with a single binding site ( 35 ) . 
+ Besides the formation of three speciﬁc complexes , nonspeciﬁc binding is observed on adding larger amounts of Ss-LrpB , visible as a complex ( annotated ` NS ' ) of variable relative mobility in gel and dependent on the protein concentration ( 31 ) . 
+ To analyse Nb9 interaction with nucleoprotein complexes involving three Ss-LrpB binding sites , which are expected to be prevalent in vivo , an EMSA was performed with a tripartite operator-containing DNA fragment ( Figure 3B ) . 
+ At the highest Ss-LrpB concentration used , the addition of Nb9 causes supershifting and the disappearance of the triple bound Ss-LrpB -- DNA complex ( C3 ) , indicating the recognition of these complexes by Nb9 . 
+ At the lowest Ss-LrpB concentration used , with all three distinct complexes ( C1 , C2 and C3 ) being present , Nb9 interacts only with complexes having two and three Ss-LrpB dimers bound ( C2 and C3 ) . 
+ These data , in conjunction with those presented in Figure 3A ( middle panel ) , suggest that although Nb9 binds all three complexes , it preferentially interacts with complexes involving two and three Ss-LrpB dimers . 
+ Optimization of the nanobody-assisted ChIP assay for S. solfataricus
+ Cross-linking conditions
+ The formaldehyde cross-linking of DNA -- protein complexes is a crucial step in ChIP ( 17 ) . 
+ As this process is temperature-dependent and is reversed at high temperatures , it is impossible to perform cross-linking at physiological temperatures of hyperthermophilic organisms . 
+ Although formaldehyde-induced fixation of hyperthermophilic archaeal chromatin works sufficiently well at room temperature ( 48,49 ) , we performed formaldehyde cross-linking at 37 °C , which is closer than room temperature to the physiological temperature of hyperthermophiles . 
+ Cross-linking time is also an important parameter : a time that is too short might lead to insufﬁcient cross-linking , and as a consequence to inability to detect an interaction , whereas excessive cross-linking might lead to epitope unavailability because of epitope masking or aggregation ( 11,17,50 ) . 
+ To optimize the cross-linking time , we performed a time-course experiment in which the efficiency of cross-linking was evaluated by separating cross-linked from non -- cross-linked DNA with phenol extraction followed either by gel electrophoresis analysis ( see Supplementary Figure S1 ) or by qPCR quantification of Ss-LrpB-target and non-target genomic regions ( Figure 4A ) . 
+ For all genomic regions tested , of which two are shown in Figure 4A , the fraction of protein -- cross-linked DNA ( over total DNA ) reached values between 84 and 99 % after 1 min cross-linking . 
+ Moreover , these values did not change signiﬁcantly on increasing the cross-linking time . 
+ Cross-linking efﬁciencies varied somewhat depending on the genomic region , which can be explained by differences in the abundance of genome-associated proteins ( 51 ) . 
+ In conclusion , formaldehyde treatment for 1 min at 37 °C is sufficient to cross-link S. solfataricus chromatin . 
+ This time is considerably shorter than the cross-linking time reported previously in eukaryotic or bacterial ChIP protocols , which varies from 10 min to several hours ( 7,50,52 ) . 
+ Note that we analysed cross-linking globally while individual proteins can have varying cross-linking efﬁciencies . 
+ To ensure successful cross-linking of Ss-LrpB , we decided to perform the cross-linking for 5 min for all further experiments . 
+ Sonication conditions
+ Sonication , to fragment the DNA in appropriate sizes and to solubilize the chromatin , is one of the most variable and critical steps in ChIP , and optimal conditions depend on cell type , cell quantity , chromatin structure and so forth ( 5 ) . 
+ Insufﬁcient sonication might lead to loss of resolution of binding events , whereas over-sonication can result in the disruption of cross-linked protein -- DNA complexes and introduction of noise in the microarray data . 
+ To determine the optimal sonication conditions , we performed small-scale tests for different periods of time ranging from 3 to 30 min and analysed both fragment size distribution and dissociation of protein -- DNA complexes ( Figure 4B and C ) . 
+ Sonication for 6 , 9 and 18 min yielded similar results with DNA size distribution of 0.2 -- 0.6 kb . 
+ On longer sonication ( 18 min ) , cross-linked protein -- DNA complexes tend to dissociate ( Figure 4C ) . 
+ The extent of dissociation was somewhat variable , depending on the genomic region under study . 
+ To ensure both the stability of cross-linked protein -- DNA complexes and an optimal fragment size distribution , we chose to sonicate for 9 min in further experiments . 
+ Minimal amount of cells
+ The number of cells subjected to ChIP is also an important element for a successful assay . 
+ It needs to be sufficiently high to obtain robust results ( 5 ) , whereas it also affects the concentration of targets , so that in combination with the antibody affinity ( and specificity ) it might influence the outcome as well . 
+ Cell counting by plating and by microscopy indicated a cell density of 2 × 10^7 cells/ml for a cell suspension with an OD600 nm of 0.5 ( exponential growth phase ; data not shown ) . 
+ Taking into account that S. solfataricus is a haploid species characterized by a long G2 cell cycle phase ( 53,54 ) , most cells are expected to have two chromosomal copies , and a 50 ml culture at an OD600 nm of 0.5 is expected to harbour 1 -- 2 × 10^9 copies of each genomic binding site . 
+ To determine the minimal amount of cells required for ChIP-based DNA enrichment with Ss-LrpB-speciﬁc nanobodies , a preliminary immunoprecipitation assay was performed with Nb9 ( Figure 4D ) . 
+ Here , different amounts of in vitro prepared cross-linked Ss-LrpB -- DNA complexes , containing the Ss-lrpB operator , were added to a constant amount of cross-linked S. solfataricus cells . 
+ The mixture was subjected to ChIP by Nb9 . 
+ The enrichment of the Ss-lrpB operator , compared with input DNA and normalized to E. coli reference DNA that was added to all samples before sonication , in the immunoprecipitated DNA was analysed by qPCR . 
+ Likewise , an unrelated genomic region , not bound by Ss-LrpB , was analysed as negative control . 
+ No ChIP enrichment was observed using cells from 50 ml culture ( Figure 4D ) . 
+ In contrast , after spiking the samples with different amounts of cross-linked Ss-LrpB -- DNA complexes , enrichments exceeding a 4-fold ratio ( log2 value of 2 ) were observed . 
+ Parallel ChIP with control nanobody NbX showed no enrichment ( data not shown ) . 
+ These experiments suggest that at least 2 -- 3 × 10^9 specific complexes need to be present for detection by qPCR . 
+ Based on this result , 200 ml cultures , corresponding to 4 × 10^9 cells , were used in subsequent ChIP-chip experiments . 
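+ The cell and copy numbers quoted in this subsection follow from simple arithmetic , sketched below . The cell density and the one to two chromosome copies per cell are the figures given in the text ; the function and constant names are hypothetical .

```python
CELLS_PER_ML_AT_OD_0_5 = 2e7  # cell density reported for an OD600 nm of 0.5

def cells_in_culture(volume_ml: float) -> float:
    """Total number of cells in a culture harvested at an OD600 nm of 0.5."""
    return CELLS_PER_ML_AT_OD_0_5 * volume_ml

def binding_site_copies(volume_ml: float, copies_per_cell: int) -> float:
    """Copies of a single-copy genomic binding site in the whole culture."""
    return cells_in_culture(volume_ml) * copies_per_cell

# 50 ml culture: 1e9 cells, hence 1e9 to 2e9 copies of each binding site.
print(cells_in_culture(50), binding_site_copies(50, 1), binding_site_copies(50, 2))
# 200 ml culture: 4e9 cells, above the 2-3e9 complexes needed for detection.
print(cells_in_culture(200))
```

+ This makes the scale-up explicit : a 50 ml culture sits at or below the estimated detection limit , whereas a 200 ml culture exceeds it .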
+ Comparative analysis of ChIP performance of nanobodies using predeﬁned DNA targets
+ Although Nb9 is the most promising nanobody for ChIP in terms of affinity and epitope location , we compared the performance of the three Ss-LrpB-specific nanobodies by evaluating nanobody-mediated enrichment of known Ss-LrpB binding sites by qPCR and ChIP-chip . 
+ To avoid variability introduced by ChIP , the same ChIP DNA was used for both qPCR and ChIP-chip . 
+ In the first approach , enrichment of known target regions was analysed as the evaluation criterion ( Figure 5 ) . 
+ A nontarget genomic region and mock immunoprecipitation with the nanobody NbX were used as negative controls . 
+ By using qPCR as readout , signiﬁcant enrichment was observed with negative controls in ChIP DNA , as compared with the input DNA ( log2 values between 1 and 3 ) . 
+ Although the whole-genome amplification protocol might result in a minimal amplification bias ( 16 ) , we observed a bias towards more efficient amplification of longer DNA molecules ( data not shown ) , probably because longer DNA fragments might anneal to more primers , yielding a larger number of amplification products . 
+ This bias possibly explains the observed background enrichment , as the molecular weight of the reference E. coli DNA is signiﬁcantly lower than the average molecular weight of the chromatin DNA . 
+ Therefore , ampliﬁcation could cause a higher ratio of chromatin DNA/reference DNA in the ChIP DNA as compared with the unampliﬁed input DNA , irrespective of immuno-enrichment ( Figure 5A ) . 
+ Nevertheless , this bias does not affect the assessment of the ChIP enrichments , as they were calculated based on the relative fold enrichment and were compared with the ChIP enrichment of the negative control NbX . 
+ The experiment with Nb9 resulted in 500- and 33-fold enrichment of the Ss-lrpB and porDAB operator DNAs , respectively ( Figure 5A ) . 
+ This difference might reﬂect the difference in the binding afﬁnities of Ss-LrpB for the two operators . 
+ The use of Nb1 and Nb11 enriched the target Ss-lrpB operator 47- and 62-fold , respectively . 
+ Furthermore , the Nb1 and Nb11 enrichment of porDAB operator DNA failed to exceed the background levels . 
+ Next , genome-wide ChIP-chip experiments were performed with DNA prepared with each of the nanobodies , and raw log2 fold-enrichment values were compared for the genomic regions known to bind Ss-LrpB ( Figure 5B ) . 
+ Although the sensitivity of this assay is lower than that of qPCR , the trends of the peaks confirm the relative enrichment ratios observed with qPCR . 
+ For the porDAB operator region , absolute log2 values were somewhat higher , but the ChIP curve obtained with control nanobody NbX coincided with those of Nb1 and Nb11 ( data not shown ) . 
+ The ChIP-chip derived binding peaks for the known autoregulatory binding sites exhibit an unexpected shape , centred over the coding part of the gene rather than over the operator region ( Figure 5B ) . 
+ This observation prompted us to re-investigate autoregulatory binding of Ss-LrpB , and indeed , two additional potential binding sites were predicted in silico in the open reading frame ( ORF ) sequence ( Figure 6A ) . 
+ These sites , tentatively called Box4 and Box5 , are located at the 3′-end of the ORF with a spacing of 26 bp . 
+ Based on sequence similarity with the Ss-LrpB consensus sequence ( 56 ) , both sites are expected to be low-afﬁnity sites ( Figure 6B ) . 
+ To further investigate possible Ss-LrpB binding within the ORF , EMSAs were performed with three fragments : one encompassing the promoter region only , one spanning both the promoter region and the coding region , and one spanning the coding region only ( Figure 6C and D ) . 
+ With the promoter-only fragment , three complexes ( C1 -- C3 ) are formed , whereas a fourth complex ( C4 ) , which migrates more slowly than C1 -- C3 , is clearly present when the DNA fragment comprises both the promoter and the ORF . 
+ Compared with the promoter-only fragment , the third complex ( C3 ) is strongly reduced with the longer fragment , apparently in favour of the new complex , C4 . 
+ This suggests a cooperative binding to additional binding sites within the ORF . 
+ Supplementary DNA deformations ( looping ) and a higher protein stoichiometry may explain the signiﬁcant reduction in the relative mobility of complex C4 . 
+ Furthermore , low-afﬁnity binding to the ORF fragment is inferred ( Figure 6D ) , as two nucleoprotein complexes are detected in these EMSA ( C1 and C2 ) , although they result in smearing , reﬂecting binding instability . 
+ Taken together , these results provide strong evidence , both in vivo and in vitro , that Ss-LrpB binds two additional binding sites in the Ss-lrpB ORF , located 392 bp downstream of Box1 and oriented on the same side of the DNA helix ( with a centre-to-centre distance of four helical turns ) . 
+ Thus far we analysed the ChIP enrichment of two high-affinity Ss-LrpB targets . 
+ Two other known Ss-LrpB targets , the operator regions of Sso2126 and Sso2127 , bind Ss-LrpB at a single site in vitro ( 34 ) , and Ss-LrpB indeed exerts a weak activation effect on gene expression of these targets . 
+ After inspection of the ChIP-chip binding proﬁles , recorded in the growth conditions in which the expression of Sso2126 and Sso2127 genes was analysed , none of the nanobodies enriched these sequences . 
+ Given the weak regulatory effect of Ss-LrpB on these targets under the growth conditions used , the Ss-LrpB binding afﬁnity for these sequences might be low , or the Ss-LrpB is only binding to these recognition sites in a subpopulation of cells , possibly because of the effect of cofactors . 
+ Comparative analysis of ChIP performance of nanobodies using genome-wide data
+ In an alternative approach , genome-wide ChIP-chip binding proﬁles were evaluated to assess the performance of different nanobodies ( Figure 7 ) . 
+ Binding patterns obtained with Nb1 and Nb11 almost completely overlap with the patterns obtained with control NbX and , consequently , fail to reveal any novel potential Ss-LrpB binding sites ( Figure 7A , ﬁrst and third panel ) . 
+ The two sites identified by Nb11 at a cut-off of 1.0 and the site identified by both Nb11 and Nb1 at a cut-off of 0.8 ( Figure 7B ) are considered false positives because the log2 enrichment values of the negative control NbX at these sites are 0.93 , 0.82 and 0.79 , respectively . 
+ The mean log2 enrichment over the whole genome by NbX is 0.22 . 
+ In contrast , the binding proﬁle obtained with Nb9 showed signiﬁcant novel Ss-LrpB binding regions throughout the entire genome , besides the previously known Ss-LrpB target sites ( Figure 7A , middle panel ) . 
+ Depending on the signiﬁcance threshold , ChIP-chip analysis using Nb9 revealed between 36 ( cutoff = 2-fold or 1.0 log2-fold enrichment ) and 181 ( cutoff = 1.5-fold or 0.6 log2-fold enrichment ) novel putative Ss-LrpB binding sites ( Figure 7B ) . 
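The correspondence between the fold-enrichment cut-offs and their log2 values quoted above can be checked with a short sketch. The function names are illustrative, and treating a site as enriched when its value is at or above the cut-off (rather than strictly above) is an assumption, since the text does not say which comparison was used.

```python
import math


def fold_to_log2(fold_enrichment):
    """Convert a linear fold enrichment into a log2 enrichment."""
    return math.log2(fold_enrichment)


def passes_cutoff(log2_enrichment, log2_cutoff):
    """Assumed rule: a site counts as enriched at or above the cut-off."""
    return log2_enrichment >= log2_cutoff


# The two cut-offs quoted in the text.
strict_cutoff = fold_to_log2(2.0)    # 2-fold enrichment
lenient_cutoff = fold_to_log2(1.5)   # 1.5-fold enrichment, about 0.585
```

Under this reading, the NbX background values of 0.93, 0.82 and 0.79 all fall below the 2-fold cut-off but above the 1.5-fold one.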
+ The ChIP-chip signals of most of the newly discovered potential binding sites , called ChIP-enriched regions ( chers ) , were higher than that of the Ss-lrpB operator region . 
+ To further test whether these chers represent genuine novel Ss-LrpB genomic association sites , qPCR analysis was performed for a selection of 13 chers that scored a log2 ChIP-chip value between 1.0 and 2.0 ( Figure 8 ) . 
+ All these chers showed enrichment in qPCR , and for more than half of them , enrichment values far exceeded the background enrichment level . 
+ Therefore , the use of Nb9 leads to the discovery of novel potential targets with ChIP-chip , whereas the use of Nb1 and Nb11 does not , although qPCR analysis shows enrichment of the main target ( p/o Ss-lrpB ) by these latter Nbs ( Figure 5A ) . 
+ A statistically solid identiﬁcation and further analysis of novel Ss-LrpB targets in the context of the physiological function of the transcription factor is beyond the scope of this work and will be published elsewhere . 
+ DISCUSSION
+ Chromatin immunoprecipitation is a valuable technique , especially in combination with deep sequencing or microarray analysis to decipher gene regulatory networks . 
+ However , its success is largely dependent on the quality of the antibodies ( i.e. their speciﬁcity and afﬁnity for the cognate antigen ) ( 5,11,15,16 ) . 
+ Cross-reactivity of the antibody with other non-cognate antigens is an important source of high background signals and false-positive outcomes in genome-wide ChIP assays . 
+ A study with antibodies directed against modiﬁed histones has demonstrated a high level of speciﬁcity problems , as > 20 % of a panel of tested antibodies , including those with a ` ChIP-grade ' label , were shown to fail in ChIP experiments ( 18 ) . 
+ As argued in the ` Introduction ' section , recombinant antibodies constitute an interesting and renewable source of monospeciﬁc antibodies for various applications including ChIP . 
+ What is the problem with pAbs and mAbs in ChIP ? 
+ The polyclonal antibody preparations consist of a mixture of different antibodies , each with a different epitope recognition mode . 
+ Hence , it can be argued that pAbs are to be preferred over mAbs because of lower incidences of epitope masking in the cross-linked chromatin ( 17 ) . 
+ However , pAbs are less suitable for ChIP in terms of speciﬁcity ( 18,19 ) , and their use increases the risk of association with non-cognate antigens , i.e. cross-reaction . 
+ Furthermore , the speciﬁcity of pAbs varies from batch to batch , necessitating a speciﬁcity analysis for each preparation ( 19 ) . 
+ With mAbs , which are a renewable antibody source , most of these problems of non-cognate antigen binding can be avoided , and the antibody that performs best in terms of speciﬁcity and ChIP-efﬁciency can be selected and used repeatedly and reproducibly . 
+ However , as the mAb recognizes , in principle , only one epitope structure , this epitope may be masked during DNA binding , or within the chromatin architecture when interaction occurs with other transcription factors , or by ﬁxation during cross-linking . 
+ The problem of epitope masking can be avoided by careful design of the immunization and antibody selection protocols . 
+ For instance , the use of cross-linked DNA -- protein complexes to screen the mAbs from hybridomas should yield antibodies with greater chance of success in ChIP . 
+ However , this is rarely done . 
+ Finally , irrespective of whether polyclonal or monoclonal antibodies are used , antibodies are complex molecules comprising an Fc part that is recognized by multiple effector molecules , and thus forms a possible source of multiple unwanted binding events in ChIP . 
+ The latter complication is expected to be absent with antibody fragments , such as scFv and nanobodies , which lack the Fc part . 
+ Nanobodies are recombinant single-domain antigen-binding entities derived from unique heavy chain only antibodies naturally occurring in camelids ( 26 ) . 
+ Sharks also possess such heavy chain antibodies , referred to as IgNARs . 
+ However , IgNARs are more ancestral antibodies compared with the camel variant ( 57 ) , and the immunization of sharks might be rather complicated . 
+ The immunization of camelids ( camel , dromedary , alpaca and llama ) is more practical : these animals are routinely vaccinated on farms with optimized adjuvants , and we shortened the immunization time to 6 weeks . 
+ In addition , cloning the nanobody genes from the peripheral blood B-cells of the immunized animal and subsequently identifying recombinant , antigen-speciﬁc nanobodies by phage display has become a fast and straightforward technology ( 27,30,58 ) . 
+ Moreover , nanobodies are well expressed in microbial systems , and with their small size ( MW < 15 000 ) , high robustness and high speciﬁcity for their cognate antigen they are versatile . 
+ Nanobodies seem to suffer minimally from non-speciﬁc antigen capturing in the context of complex proteomes , as illustrated here ( Figure 2 ) . 
+ This seems to be a general property of nanobodies , as they have already been used successfully as highly speciﬁc probes in antigen capturing and intracellular imaging ( 59,60 ) . 
+ Here , we have demonstrated the successful use of an Ss-LrpB-speciﬁc nanobody ( Nb9 ) in ChIP in S. solfataricus . 
+ A high-afﬁnity interaction between the nanobody and its cognate antigen [ KD in the nM range as routinely observed ( 28 ) ] warrants a speciﬁc and efﬁcient immunoprecipitation of the target nucleoprotein complex from the chromatin . 
+ However , it is clear that the exact epitope recognized by the antibody is also of crucial importance , and nanobodies are no exception to this rule . 
+ Indeed the nanobodies , like mAbs , need to be carefully screened . 
+ This is illustrated with two other Ss-LrpB speciﬁc nanobodies ( Nb1 and Nb11 ) with similar good afﬁnity characteristics as the Nb9 ( i.e. KD in low nM range ) but targeting a different epitope . 
+ These two Ss-LrpB-speciﬁc nanobodies failed in ChIP , as their antigen binding provokes a clear dissociation of Ss-LrpB from its DNA ( Figure 3 ) . 
+ Hence , the epitope should be preferentially located not only outside the DNA-binding domain of the protein but also outside the regions used to interact with ( other ) partner chromatin proteins , and these are not always known in advance . 
+ It is possible to increase the chances to retrieve ` ChIP-able ' nanobodies by selecting during phage display pannings on truncated protein constructs lacking the DNA-binding domain as done here or by selecting on cross-linked DNA -- protein complexes . 
+ Finally , the use of nanobodies has the advantage that the vast majority of them are directed to conformational epitopes , which increases the speciﬁcity and decreases the background and false-positive signal after immunoprecipitation . 
+ The chance is indeed higher that a binder to a linear epitope also interacts with a mimetic peptide . 
+ The weakness of the nanobody-based ChIP technology is that a speciﬁc nanobody needs to be identiﬁed for each target . 
+ This can be avoided by using an epitope-tagging approach , where a unique peptide tag ( e.g. GFP , hemagglutinin , GST , myc , FLAG ) for which ChIP-able antibodies are available is knocked-in in the target gene , preferentially by homologous recombination ( 14 ) . 
+ Such a tagging workﬂow is available , for example , in the model organisms Saccharomyces cerevisiae and E. coli ( 61 ) and in the halophilic archaeon Halobacterium salinarum ( 14 ) . 
+ For those systems where a GFP has been introduced as a tag , a GFP binding nanobody could be used for ChIP . 
+ This GFP-speciﬁc nanobody has an excellent track record for intracellular targeting and for immune precipitation from cells expressing ﬂuorescent DNA-binding proteins as well ( 58,62,63 ) . 
+ Although the homologous recombination of tagged genes replacing the endogenous genes avoids the overexpression of recombinant proteins that are naturally of low abundance within the cell , the presence of an unnatural C- or N-terminal tag on the target protein might lead to complications , such as an induced loss of function by mislocalization , or multimerization and aggregation of the GFP-tagged protein . 
+ Therefore , it is probably safer and more relevant to avoid tagging the gene product . 
+ In addition , for higher eukaryotes and many ( extremophilic ) archaeal organisms for which genetic tools have not been developed yet or only work in the hands of specialists , homologous recombination may be less practicable . 
+ The advantage of the method proposed here is that it is applicable to any organism . 
+ We , therefore , prefer the standard ChIP technology using dedicated antibodies , where mAbs have been substituted by nanobodies . 
+ As aforementioned , the generation of antigen-speciﬁc nanobodies is not a bottleneck for high-throughput ChIP experiments , as they are straightforward to generate in a short time . 
+ We use a fast immunization scheme with multiple antigens in one camel or llama . 
+ The subsequent library construction and identiﬁcation of antigen-speciﬁc nanobodies require only 2 weeks each . 
+ Hence , antigen-speciﬁc nanobodies against > 100 different antigens can be isolated by one researcher per year . 
+ The subsequent microarray or deep sequencing and the interpretation of the data are much more time consuming . 
+ Thus , the work presented here paves the way for a more widespread use of nanobodies in ChIP-chip and ChIP-seq approaches to analyse genome-wide binding of any desired chromatin-associated protein in any organism . 
+ Interestingly , the shape of the ChIP-chip proﬁles for the autoregulatory binding of Ss-LrpB , for which three binding sites are present in the promoter region , has led to the identiﬁcation of additional novel low-afﬁnity operator sites in the Ss-lrpB ORF . 
+ Supplementary binding of transcription factors to coding regions is not uncommon and has been previously observed for another archaeal Lrp-like transcription factor from Methanocaldococcus jannaschii called Ptr2 ( 64 ) . 
+ In this case , the additional site was located at the promoter-proximal side of the gene ( position +7 ) at a reasonable distance from the main operator sites . 
+ In contrast , the auxiliary Ss-LrpB sites are located almost 400-bp downstream of the promoter Box1 . 
+ This situation is reminiscent of the E. coli Lac repressor , which binds to a site 401-bp downstream of the main operator site ( 65 ) . 
+ Simultaneous binding of the main operator O1 and the auxiliary operator O2 by a single Lac repressor tetramer induces the formation of a DNA loop and contributes to transcriptional repression . 
+ Possibly , Ss-LrpB binding to both operator regions ( upstream of the ORF and at the 3′-end of the ORF ) also alters the local conformation of the DNA and might even cause DNA looping . 
+ In any case , Box4 and Box5 binding is expected to contribute to autoregulation by increasing the local concentration of the transcription factor . 
+ Thus , the discovery of the novel Ss-LrpB binding sites within the Ss-lrpB ORF illustrates the capacity of nanobodies in ChIP-chip to rapidly identify novel operator sites . 
+ SUPPLEMENTARY DATA
+ Supplementary Data are available at NAR Online : Supplementary Tables 1 and 2 and Supplementary Figure 1 . 
+ ACKNOWLEDGEMENTS
+ The authors are grateful to Dr John van der Oost for the gift of pLUW632 plasmid . 
+ They thank Ningning Song and Amelia Vassart for puriﬁcation of Ss-LysM and Ss-Lrp protein , respectively . 
+ Liesbeth van Oeffelen is greatly acknowledged for assistance with the microarray data analysis . 
+ FUNDING
+ Fonds voor Wetenschappelijk Onderzoek-Vlaanderen ( to E.P. ) ; Vlaams Interuniversiair Instituut ( VIB ) ( to T.N.D. ) . 
+ Funding for open access charge : Onderzoeksfonds ( OZRGOA ) , Vrije Universiteit Brussel . 
+ Conﬂict of interest statement. None declared.
\ No newline at end of file
--- a/data/TEXT_FILES/notuseful_txt/23470992.txt 0 → 100644
View file @27818a9
+++ b/data/TEXT_FILES/notuseful_txt/23470992.txt 0 → 100644
View file @27818a9
+ A comparison of dense transposon insertion
+ ABSTRACT 
+ Nucleic Acids Research , 2013 , Vol. 41 , No. 8 , 4549 -- 4564 , doi : 10.1093/nar/gkt148 
+ Salmonella Typhi and Typhimurium diverged only 50 000 years ago , yet have very different host ranges and pathogenicity . 
+ Despite the availability of multiple whole-genome sequences , the genetic differences that have driven these changes in phenotype are only beginning to be understood . 
+ In this study , we use transposon-directed insertion-site sequencing to probe differences in gene requirements for competitive growth in rich media between these two closely related serovars . 
+ We identify a conserved core of 281 genes that are required for growth in both serovars , 228 of which are essential in Escherichia coli . 
+ We are able to identify active prophage elements through the requirement for their repressors . 
+ We also find distinct differences in requirements for genes involved in cell surface structure biogenesis and iron utilization . 
+ Finally , we demonstrate that transposon-directed insertion-site sequencing is not only applicable to the protein-coding content of the cell but also has sufficient resolution to generate hypotheses regarding the functions of non-coding RNAs ( ncRNAs ) as well . 
+ We are able to assign probable functions to a number of cis-regulatory ncRNA elements , as well as to infer likely differences in trans-acting ncRNA regulatory networks . 
+ Lars Barquist*, Gemma C. Langridge, Daniel J. Turner, Minh-Duy Phan, A. Keith Turner, Alex Bateman, Julian Parkhill, John Wain* and Paul P. Gardner
+ INTRODUCTION
+ Salmonella enterica subspecies enterica serovars Typhi ( S. Typhi ) and Typhimurium ( S. Typhimurium ) are important human pathogens with distinctly different life-styles . 
+ S. Typhi is host-restricted to humans and causes typhoid fever . 
+ This potentially fatal systemic illness affects at least 21 million people annually , primarily in developing countries ( 1 -- 3 ) . 
+ S. Typhi is also capable of colonizing the gall bladder , creating asymptomatic carriers ; such individuals are the primary source of this human-restricted infection , exempliﬁed by the case of ` Typhoid Mary ' ( 4 ) . 
+ S. Typhimurium , conversely , is a generalist , infecting a wide range of mammals and birds in addition to being a leading cause of foodborne gastroenteritis in human populations . 
+ Control of S. Typhimurium infection in livestock destined for the human food chain is of great economic importance , particularly in swine and cattle ( 5,6 ) . 
+ Additionally , S. Typhimurium causes an invasive disease in mice , which has been used extensively as a model for pathogenicity in general and human typhoid fever specifically ( 7 ) . 
+ Despite this long history of investigation , the genomic factors that contribute to these differences in lifestyle remain unclear . 
+ More than 85 % of predicted coding sequences are conserved between the two serovars in sequenced genomes of multiple strains ( 8 -- 11 ) . 
+ The horizontal acquisition of both plasmids and pathogenicity islands during the evolution of the salmonellae is believed to have impacted upon their disease potential . 
+ A 100 kb plasmid , encoding the spv ( Salmonella plasmid virulence ) genes , is found in some S. Typhimurium strains and contributes signiﬁcantly towards systemic infection in animal models ( 12,13 ) . 
+ S. Typhi is known to have harboured IncHI1 plasmids conferring antibiotic resistance since the 1970s ( 14 ) , and there is evidence that these strains present a higher bacterial load in the blood during human infection ( 15 ) . 
+ Similar plasmids have been isolated from S. Typhimurium ( 16 -- 18 ) . 
+ Salmonella pathogenicity islands ( SPI ) -1 and -2 are common to both serovars and are required for invasion of epithelial cells [ reviewed in ( 19 ) ] and survival inside macrophages , respectively ( 20,21 ) . 
+ S. Typhi additionally incorporates SPI-7 and SPI-10 , which contain the Vi surface antigen and a number of other putative virulence factors ( 22 -- 24 ) . 
+ Acquisition of virulence determinants is not the sole explanation for the differing disease phenotypes displayed in humans by S. Typhimurium and S. Typhi ; genome degradation is an important feature of the S. Typhi genome , in common with other host-restricted serovars such as S. Paratyphi A ( humans ) and S. Gallinarum ( chickens ) . 
+ In each of these serovars , pseudogenes account for 4 -- 7 % of the genome ( 9,25 -- 27 ) . 
+ Loss of function has occurred in a number of S. Typhi genes that have been shown to encode intestinal colonization and persistence determinants in S. Typhimurium ( 28 ) . 
+ Numerous sugar transport and degradation pathways have also been interrupted ( 9 ) but remain intact in S. Typhimurium , potentially underlying the restricted host niche occupied by S. Typhi . 
+ In addition to its history as a model organism for pathogenicity , S. Typhimurium has recently served as a model organism for the elucidation of non-coding RNA ( ncRNA ) function ( 29 ) . 
+ These include cis-acting switches , such as RNA-based temperature and magnesium ion sensors ( 30,31 ) , together with a host of predicted metabolite-sensing riboswitches . 
+ Additionally , a large number of trans-acting small RNAs ( sRNAs ) have been identiﬁed within the S. Typhimurium genome ( 32 ) , some with known roles in virulence ( 33 ) . 
+ These sRNAs generally control a regulon of mRNA transcripts through an antisense binding mechanism mediated by the protein Hfq in response to stress . 
+ The functions of these molecules have generally been explored in either S. Typhimurium or Escherichia coli , and it is unknown how stable these functions and regulons are over evolutionary time ( 34 ) . 
+ Transposon mutagenesis has previously been used to assess the requirement of particular genes for cellular viability . 
+ The advent of next-generation sequencing has allowed simultaneous identiﬁcation of all transposon insertion sites within libraries of up to 1 million independent mutants ( 35 -- 38 ) , enabling us to answer the basic question of which genes are required for in vitro growth with extremely ﬁne resolution . 
+ By using transposon mutant libraries of this density , which in S. Typhi represents on average > 80 unique insertions per gene ( 35 ) , shorter regions of the genome can be interrogated , including ncRNAs ( 38 ) . 
+ In addition , once these libraries exist , they can be screened through various selective conditions to further reveal which functions are required for growth / survival . 
+ Using Illumina-based transposon-directed insertion-site sequencing [ TraDIS ( 35 ) ] with large mutant libraries of both S. Typhimurium and S. Typhi , we investigated whether these Salmonellae require the same protein-coding and ncRNA gene sets for competitive growth under laboratory conditions , and whether there are differences that reﬂect intrinsic differences in the pathogenic niches these bacteria inhabit . 
+ MATERIALS AND METHODS
+ Strains
+ S. Typhimurium strain SL3261 was used to generate the large transposon mutant library , and contains a deletion relative to the parent strain SL1344 . 
+ The 2166 bp deletion ranges from 153 bp within aroA ( normally 1284 bp ) to the last 42 bp of cmk , forming two pseudogenes and deleting the intervening gene SL0916 completely . 
+ For comparison , we used our previously generated S. Typhi Ty2 transposon library ( 35 ) . 
+ Annotation
+ For S. Typhimurium strain SL3261 , we used feature annotations drawn from the SL1344 genome ( EMBL-Bank accession FQ312003.1 ) , ignoring the deleted aroA , ycaL and cmk genes . 
+ We re-analysed our S. Typhi Ty2 transposon library with features drawn from an updated genome annotation ( EMBL-Bank accession AE014613.1 ) . 
+ We supplemented the EMBL-Bank annotations with ncRNA annotations drawn from Rfam 10.1 ( 39 ) , Sittka et al. ( 40 ) , Chinni et al. ( 41 ) , Raghavan et al. ( 42 ) and Kröger et al. ( 32 ) . 
+ Selected protein-coding gene annotations were supplemented using the HMMER webserver ( 43 ) and Pfam ( 44 ) . 
+ Creation of S. Typhimurium transposon mutant library
+ S. Typhimurium was mutagenized using a Tn5-derived transposon as described previously ( 35 ) . 
+ Brieﬂy , the transposon was combined with the EZ-Tn5 transposase ( Epicenter , Madison , USA ) and electroporated into S. Typhimurium . 
+ Transformants were selected by plating on LB agar containing 15 µg/ml kanamycin and harvested directly from the plates following overnight incubation . 
+ A typical electroporation experiment generated a batch of between 50 000 and 150 000 individual mutants . 
+ Ten batches were pooled together to create a mutant library comprising 930 000 transposon mutants . 
+ DNA manipulations and sequencing
+ Genomic DNA was extracted from the library pool samples using tip-100 g columns and the genomic DNA buffer set from Qiagen ( Crawley , UK ) . 
+ DNA was prepared for nucleotide sequencing as described previously ( 35 ) . 
+ Before sequencing , a 22-cycle PCR was performed as previously described ( 35 ) . 
+ Sequencing took place on a single-end Illumina ﬂowcell using an Illumina GAII sequencer , for 36 cycles of sequencing , using a custom sequencing primer and 2× Hybridization Buffer ( 35 ) . 
+ The custom primer was designed such that the ﬁrst 10 bp of each read was transposon sequence . 
+ Sequence analysis
+ The Illumina FASTQ sequence ﬁles were parsed for 100 % identity to the 5′ 10 bp of the transposon ( TAAGAGACAG ) . 
+ Sequence reads that matched were stripped of the transposon tag and subsequently mapped to the S. Typhimurium SL1344 or S. Typhi Ty2 chromosomes using Maq version maq-0.6.8 ( 45 ) . 
+ Approximately 12 million sequence reads were generated from the sequencing run , which used two lanes on the Illumina ﬂowcell . 
+ Precise insertion sites were determined using the output from the Maq mapview command , which gives the ﬁrst nucleotide position to which each read mapped . 
+ The number and frequency of insertions mapping to each nucleotide in the appropriate genome was then determined . 
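The tag-matching and site-counting steps described above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function names are hypothetical, the FASTQ input is assumed to use the standard four-line record layout, and the mapping step itself (done with Maq in the text) is not reproduced, so mapped start positions are simply passed in as a list.

```python
from collections import Counter

TRANSPOSON_TAG = "TAAGAGACAG"  # 10 bp transposon end quoted in the text


def strip_transposon_tag(read, tag=TRANSPOSON_TAG):
    """Return the read without its leading tag, or None unless the tag matches exactly."""
    if read.startswith(tag):
        return read[len(tag):]
    return None


def filter_fastq_records(lines, tag=TRANSPOSON_TAG):
    """Yield (header, trimmed_sequence, trimmed_quality) for reads that carry the tag.

    `lines` is an iterable of FASTQ lines in four-line records (assumed layout).
    """
    lines = [line.rstrip("\n") for line in lines]
    for i in range(0, len(lines) - 3, 4):
        header, seq, _plus, qual = lines[i:i + 4]
        trimmed = strip_transposon_tag(seq, tag)
        if trimmed is not None:
            # Trim the quality string by the same amount so it stays aligned.
            yield header, trimmed, qual[len(tag):]


def count_insertion_sites(mapped_start_positions):
    """Count reads per mapped start coordinate; each key is one unique insertion site."""
    return Counter(mapped_start_positions)


# Hypothetical two-read input: only the first read carries the exact tag.
example_fastq = [
    "@read1", "TAAGAGACAGACGTACGTAC", "+", "IIIIIIIIIIIIIIIIIIII",
    "@read2", "TAAGAGACATACGTACGTAC", "+", "IIIIIIIIIIIIIIIIIIII",
]
kept_reads = list(filter_fastq_records(example_fastq))
site_counts = count_insertion_sites([1500, 1500, 1620])
```

Requiring an exact prefix match mirrors the 100 % identity criterion; a tolerant matcher would keep more reads but would also admit reads that do not start at a true transposon junction.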
+ Statistical analysis of required genes
+ The number of insertion sites for any gene is dependent on its length ; therefore , the values were made comparable by dividing the number of insertion sites by the gene length , giving an ` insertion index ' for each gene . 
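The length normalization can be written down directly. In this small sketch the function name and the example numbers are hypothetical; only the definition (insertion sites divided by gene length) comes from the text.

```python
def insertion_index(n_insertion_sites, gene_length_bp):
    """Number of unique insertion sites in a gene divided by its length in bp."""
    if gene_length_bp <= 0:
        raise ValueError("gene length must be positive")
    return n_insertion_sites / gene_length_bp


# Hypothetical genes: same number of insertion sites, different lengths.
short_gene_index = insertion_index(30, 300)    # 0.1
long_gene_index = insertion_index(30, 3000)    # 0.01
```

The example shows why raw counts are not comparable across genes: the same 30 insertion sites give a tenfold lower index in the longer gene.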
+ As before ( 35 ) , the distribution of insertion indices was bimodal , corresponding to the required ( mode at 0 ) and non-required models . 
+ We ﬁtted gamma distributions for the two modes using the R MASS library ( http://www.r-project.org ) . 
+ Log2-likelihood ratios ( LLR ) were calculated between the required and non-required models , and we called a gene required if it had an LLR of less than -2 , indicating it was at least four times more likely according to the required model than the non-required model . 
+ ` Non-required ' genes were assigned for an LLR of > 2 . 
+ Genes falling between the two thresholds were considered ` ambiguous ' for the purpose of this analysis . 
+ This procedure led to genes being called as required in S. Typhimurium when their insertion index was < 0.020 and ambiguous between 0.020 and 0.027 . 
+ The equivalent cut-offs for the S. Typhi library are 0.0147 and 0.0186 , respectively . 
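The calling rule can be illustrated with a small sketch. Several things here are assumptions rather than the authors' code: the LLR is taken to be log2 of the non-required likelihood over the required likelihood (so that strongly negative values favour the required model), the gamma parameters are invented for illustration rather than fitted, and the fitting step done with the R MASS library is not reproduced.

```python
import math


def gamma_logpdf(x, shape, rate):
    """Natural-log density of a gamma distribution with the given shape and rate."""
    if x < 0:
        return -math.inf
    if x == 0:
        # Limit of the density at zero depends on the shape parameter.
        if shape < 1:
            return math.inf
        if shape == 1:
            return math.log(rate)
        return -math.inf
    return (shape * math.log(rate) - math.lgamma(shape)
            + (shape - 1) * math.log(x) - rate * x)


def log2_likelihood_ratio(insertion_index, required, non_required):
    """log2(L_non_required / L_required); sign convention is an assumption."""
    diff = gamma_logpdf(insertion_index, *non_required) - gamma_logpdf(insertion_index, *required)
    return diff / math.log(2)


def call_gene(llr, threshold=2.0):
    """Three-way call using symmetric thresholds at -threshold and +threshold."""
    if llr < -threshold:
        return "required"
    if llr > threshold:
        return "non-required"
    return "ambiguous"


# Hypothetical (shape, rate) pairs: one mode at 0, one broad mode near 0.1.
REQUIRED_MODEL = (1.0, 100.0)
NON_REQUIRED_MODEL = (4.0, 40.0)
```

With these illustrative parameters, a gene with no insertions is called required and a gene with an insertion index of 0.1 is called non-required; the actual cut-offs quoted in the text depend on the fitted parameters, which are not given here.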
+ We calculated a P-value for the observed number of insertion sites per gene using a Poisson approximation with rate R = N/G , where N is the number of unique insertion sites ( 549 086 ) and G is the number of bases in the genome ( 4 878 012 ) . 
+ The P-value for at least X consecutive bases without an insertion site is $e^{-RX}$ , giving a 5 % cut-off at 27 bp and a 1 % cut-off at 41 bp . 
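The two cut-offs follow from the stated rate, taking the P-value for X insertion-free bases to be exp(-RX). The sketch below reproduces the arithmetic; the function names are illustrative, and rounding up to the next whole base is an assumption about how the cut-offs were reported.

```python
import math

N_UNIQUE_INSERTION_SITES = 549_086  # N, from the text
GENOME_LENGTH_BP = 4_878_012        # G, from the text
RATE = N_UNIQUE_INSERTION_SITES / GENOME_LENGTH_BP  # R = N/G, insertions per bp


def insertion_free_p_value(run_length_bp, rate=RATE):
    """Probability of at least `run_length_bp` consecutive bases without an insertion."""
    return math.exp(-rate * run_length_bp)


def min_significant_gap(alpha, rate=RATE):
    """Smallest whole number of bases whose insertion-free P-value is at most alpha."""
    return math.ceil(-math.log(alpha) / rate)


gap_5_percent = min_significant_gap(0.05)
gap_1_percent = min_significant_gap(0.01)
```

Solving exp(-RX) = alpha for X gives X = -ln(alpha)/R, which is about 26.6 bp at the 5 % level and about 40.9 bp at the 1 % level before rounding up.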
+ For every gene g with $n_{g,A}$ reads observed in S. Typhi and $n_{g,B}$ reads observed in S. Typhimurium , we calculated the log2 fold change ratio $S_{g,A,B} = \log_2 [ ( n_{g,A} + 100 ) / ( n_{g,B} + 100 ) ]$ . 
+ The correction of 100 reads smoothes out the high scores for genes with very low numbers of observed reads . 
+ We ﬁtted a normal model to the mode ± 2 sample standard deviations of the distribution of $S_{A,B}$ and calculated P-values for each gene according to the ﬁt . 
+ We considered genes with a P-value of < 0.05 under the normal model to be uniquely required by one serovar . 
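The smoothing effect of the 100-read correction and the conversion of a score into a P-value can be sketched as below. The function names are hypothetical, the fitted mean and standard deviation are treated as given inputs (the fit restricted to the region around the mode is not reproduced), and the two-sided test is an assumption, since the text does not state the sidedness.

```python
import math
from statistics import NormalDist


def log2_fold_change(reads_a, reads_b, pseudocount=100):
    """log2 ratio of read counts after adding the same pseudocount to both."""
    return math.log2((reads_a + pseudocount) / (reads_b + pseudocount))


def two_sided_p_value(score, fitted_mean, fitted_stdev):
    """Two-sided tail probability of a score under a fitted normal model (assumed)."""
    z = abs(score - fitted_mean) / fitted_stdev
    return 2.0 * (1.0 - NormalDist().cdf(z))


# Hypothetical read counts showing the smoothing effect of the pseudocount.
low_count_score = log2_fold_change(0, 4)          # near 0 despite a 0-versus-4 contrast
high_count_score = log2_fold_change(0, 700)       # log2(100 / 800) = -3
```

Without the pseudocount, a gene with 0 reads in one library and 4 in the other would have an undefined or extreme ratio; with it, the score stays close to zero until the absolute difference in reads is large.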
+ RESULTS AND DISCUSSION
+ TraDIS assay of every Salmonella Typhimurium protein-coding gene
+ Approximately 930 000 mutants of S. Typhimurium were generated using a Tn5-derived transposon . 
+ In all , 549 086 unique insertion sites were recovered from the mutant library using short-read sequencing with transposon-speciﬁc primers . 
+ This is a substantially higher density than the 371 775 insertions recovered from S. Typhi previously ( 35 ) . 
+ The S. Typhimurium library contains an average of one insertion every 9 bp or > 100 unique inserts per gene ( Figure 1 ) . 
+ The large number of unique insertion sites allowed every gene to be assayed ; assuming random insertion across the genome , a region of 41 bp without an insertion was statistically signiﬁcant ( P < 0.01 ) . 
+ As previously noted in S. Typhi , the distribution of length-normalized insertions per gene is bimodal ( see Supplementary Figure S1 ) , with one mode at 0 . 
+ We interpret genes falling in to the distribution around this mode as being required for competitive growth within a mixed population under laboratory conditions ( hereafter ` required ' ) . 
+ Of these , 57 contained no insertions whatsoever and were mostly involved in core cellular processes ( see Table 1 , Supplementary Data Set ) . 
+ Protein-coding genes providing fundamental biological functions in S. Typhimurium. Genes in bold are required in S. Typhi (LLR between required and non-required models less than -2; see ‘Materials and Methods’ section). Asterisk indicates genes ambiguous in S. Typhimurium, having a LLR between -2 and 2.
+ There was a bias in the frequency of transposon insertion towards the origin of replication . 
+ This likely occurred as the bacteria were in exponential growth phase immediately before transformation with the transposon . 
+ In this phase of growth , multiple replication forks would have been initiated , meaning genes closer to the origin were in greater copy number and hence more likely to be a target for insertion . 
+ We also observed a bias for transposon insertions in A+T rich regions , as was previously observed in the construction of an S. Typhi mutant library ( 35 ) . 
+ However , the insertion density achieved is sufﬁcient to discriminate between required and non-required genes easily . 
+ As was ﬁrst seen in S. Typhi ( 35 ) , we observed transposon insertions into genes upstream of required genes in the same operon , suggesting that most insertions do not have polar effects leading to the inactivation of downstream genes . 
+ Analysis of the S. Typhimurium mutant library allowed us to identify 353 coding sequences required for growth under laboratory conditions , and 4112 non-required coding sequences ( see Supplementary Data Set ) . 
+ We were unable to assign 65 genes to either the required or non-required category . 
+ Sixty of these genes , which we will refer to as ` ambiguous ' , had LLRs between -2 and 2 . 
+ The ﬁnal ﬁve unassigned genes had lengths < 60 bases , and they were removed from the analysis . 
+ All other genes contained enough insertions or were of sufﬁcient length to generate credible LLR scores . 
+ Thus , every gene was assayed , and we were able to draw conclusions for 98.7 % of the coding genome in a single sequencing run ( Figure 1 ) . 
+ Cross-species comparison of genes required for growth
+ Gene essentiality has previously been assayed in Salmonella using insertion-duplication mutagenesis ( 46 ) . 
+ Knuth et al. estimated 490 genes are essential to growth in clonal populations , though 36 of these have subsequently been successfully deleted ( 47 ) . 
+ Although TraDIS assays gene requirements after a brief period of competitive growth on rich media , we identify a smaller required set than Knuth et al. , approximately 350 genes in each serovar , which is closer to current estimates of approximately 300 essential genes in E. coli ( 48 ) . 
+ To demonstrate that TraDIS does identify genes known to have strong effects on growth , as well as to test our predictive power for determining gene essentiality , we compared our required gene sets in S. Typhimurium and S. Typhi to essential genes determined by systematic single-gene knockouts in the E. coli K-12 Keio collection ( 48 ) . 
+ We identiﬁed orthologous genes in the three data sets by best reciprocal FASTA hits exhibiting > 30 % sequence identity for the amino acid sequences . 
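The best-reciprocal-hit criterion can be sketched on precomputed hit tables. This is an illustration under stated assumptions: the tuple layout, function names and example hits are hypothetical, and running the sequence search itself (FASTA in the text) is outside the sketch.

```python
def best_hits(hits):
    """Map each query to its highest-scoring hit.

    `hits` is an iterable of (query, subject, score, percent_identity) tuples.
    """
    best = {}
    for query, subject, score, identity in hits:
        if query not in best or score > best[query][1]:
            best[query] = (subject, score, identity)
    return best


def reciprocal_best_hits(a_vs_b, b_vs_a, min_identity=30.0):
    """Return sorted (a, b) pairs that are each other's best hit above the identity cut-off."""
    best_ab = best_hits(a_vs_b)
    best_ba = best_hits(b_vs_a)
    pairs = []
    for a, (b, _score, identity) in best_ab.items():
        back = best_ba.get(b)
        if back is not None and back[0] == a and identity > min_identity:
            pairs.append((a, b))
    return sorted(pairs)


# Hypothetical hit tables between genome A and genome B.
hits_a_vs_b = [
    ("a1", "b1", 900.0, 95.0),
    ("a1", "b2", 300.0, 40.0),
    ("a2", "b2", 500.0, 25.0),   # reciprocal, but below the identity cut-off
    ("a3", "b1", 200.0, 50.0),   # b1 prefers a1, so not reciprocal
]
hits_b_vs_a = [
    ("b1", "a1", 900.0, 95.0),
    ("b2", "a2", 500.0, 25.0),
]
orthologue_pairs = reciprocal_best_hits(hits_a_vs_b, hits_b_vs_a)
```

Requiring the hit to be best in both directions guards against paralogues being paired with the wrong partner, at the cost of dropping genes with recent duplications.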
+ Required orthologous genes identiﬁed in this manner share a signiﬁcantly higher average percentage sequence identity with their E. coli counterparts than expected for a random set of orthologues , at 94 % identity as compared with 87 % for all orthologous genes . 
+ In 100 000 randomly chosen gene sets of the same size as our required set , we did not ﬁnd a single set where the average shared identity exceeded 90 % , indicating that required genes identiﬁed by TraDIS are more highly conserved at the nucleotide level than other orthologous protein-coding sequences . 
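The resampling comparison described above amounts to an empirical test on the mean identity of random gene sets. The sketch below shows the idea; the function names, the seed and the toy identity values are hypothetical, and sampling without replacement is an assumption about how the random sets were drawn.

```python
import random
from statistics import mean


def resampled_means(identities, set_size, n_draws, seed=0):
    """Mean identity of `n_draws` random gene sets of `set_size`, drawn without replacement."""
    rng = random.Random(seed)
    return [mean(rng.sample(identities, set_size)) for _ in range(n_draws)]


def empirical_p_value(observed_mean, null_means):
    """Fraction of random sets whose mean is at least as high as the observed mean."""
    exceeding = sum(1 for value in null_means if value >= observed_mean)
    return exceeding / len(null_means)


# Toy identities (percent) for 100 hypothetical orthologue pairs.
toy_identities = [80.0, 85.0, 90.0, 95.0] * 25
null_distribution = resampled_means(toy_identities, set_size=10, n_draws=1000)
```

When no random set reaches the observed mean, as reported in the text for 100 000 draws, the empirical P-value is bounded above by one over the number of draws rather than being exactly zero.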
+ Baba et al. ( 48 ) have deﬁned an essentiality score for each gene in E. coli based on evidence from four experimental techniques for determining gene essentiality : targeted knockouts using λ-red mediated homologous recombination ( 48 ) , genetic footprinting ( 49,50 ) , large-scale chromosomal deletions ( 51 ) and transposon mutagenesis ( 52 ) . 
+ Scores range from -4 to 3 , with negative scores indicating evidence for non-essentiality and positive scores indicating evidence for essentiality . 
+ Comparing the overlap between essential gene sets in E. coli , S. Typhi and S. Typhimurium , we ﬁnd a set of 228 E. coli genes that have a Keio essentiality score of at least 0.5 ( i.e. there is evidence for gene essentiality ; see Figure 2 ) and that have TraDIS-predicted required orthologues in both S. Typhi and S. Typhimurium . 
+ These constitute 85 % of E. coli genes with evidence for essentiality , indicating that gene requirements are largely conserved between these genera . 
+ Including orthologous genes that are only predicted to be essential by TraDIS in S. Typhi or S. Typhimurium raises this ﬁgure to nearly 93 % . 
+ The majority of shared required genes between all three bacteria are responsible for fundamental cell processes , including cell division , transcription and translation . 
+ A number of key metabolic pathways are also represented , such as fatty acid and peptidoglycan biosynthesis ( Table 1 ) . 
+ A recent study in the alphaproteobacterium Caulobacter crescentus reported 210 shared essential genes with E. coli , despite C. crescentus sharing less than a third as many orthologous genes with E. coli as Salmonella serovars ( 38 ) . 
+ This suggests the existence of a shared core of 200 essential proteobacterial genes , with the comparatively rapid turnover of 150 -- 250 ` non-core ' lineage-speciﬁc essential genes . 
+ If we make the simplistic assumption that gene essenti-ality should be conserved between E. coli and Salmonella , we can use the overlap of our predictions with the Keio essential genes to provide an estimate of our TraDIS libraries ' accuracy for predicting that a gene will be required in a clonal population . 
+ Of the 2632 orthologous E. coli genes that have a Keio essentiality score of less than −0.5 ( i.e. there is evidence for gene non-essentiality ) , only 33 are predicted to be required by TraDIS in both Salmonella serovars . 
+ Of the two serovars , S. Typhi contains the larger number of genes that are predicted by TraDIS to be required but whose E. coli orthologues have negative Keio essentiality scores . 
+ However , even if we assume these are all incorrect predictions of gene essentiality , this still gives a gene-wise false positive rate ( FPR ) of 2.7 % ( 81 of 2981 orthologues ) and a positive predictive value ( PPV ) of 75 % ( 247 with essentiality scores ≥ 0.5 of 328 predictions with some Keio essentiality score ) . 
+ Under these same criteria , the S. Typhimurium data set has a lower gene-wise FPR of 1.6 % ( 51 of 3122 orthologues ) and a higher PPV of 82 % ( 234 of 285 predictions as before ) , as we would expect given the library 's higher insertion density . 
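+ The rates quoted above follow directly from the counts given in the text . As a minimal sketch ( the class and field names are illustrative and not part of the original analysis ) , the arithmetic can be reproduced as follows , treating every ` required ' prediction without a Keio score of at least 0.5 as a false positive : 

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Benchmark:
    """Counts quoted in the text for one TraDIS library (names are illustrative)."""
    serovar: str
    orthologues: int      # denominator quoted for the gene-wise FPR
    predictions: int      # 'required' predictions with some Keio essentiality score
    true_positives: int   # predictions whose E. coli orthologue scores >= 0.5

    @property
    def false_positives(self) -> int:
        # Worst case: every remaining prediction is counted as incorrect.
        return self.predictions - self.true_positives

    @property
    def fpr(self) -> float:
        return self.false_positives / self.orthologues

    @property
    def ppv(self) -> float:
        return self.true_positives / self.predictions


BENCHMARKS = [
    Benchmark("S. Typhi", orthologues=2981, predictions=328, true_positives=247),
    Benchmark("S. Typhimurium", orthologues=3122, predictions=285, true_positives=234),
]

for b in BENCHMARKS:
    print(f"{b.serovar}: FPR = {b.fpr:.1%}, PPV = {b.ppv:.0%}")
```

+ The false-positive counts derived this way ( 328 minus 247 , and 285 minus 234 ) match the 81 and 51 quoted for the two serovars . 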
+ In reality , these FPRs and PPVs are only estimates ; genes that are not essential in E. coli may become essential in the different genomic context of Salmonella serovars and vice versa , particularly in the case of S. Typhi where wide-spread pseudogene formation has eliminated potentially redundant pathways ( 26,27 ) . 
+ Additionally , TraDIS will naturally over-predict essentiality in comparison with targeted knockouts , as our library creation protocol necessarily contains a short period of competitive growth between mutants during the recovery from electrotransformation and selection . 
+ As a consequence , genes that cause major growth defects , but not necessarily a complete lack of viability in clonal populations , may be reported as ` required ' . 
+ Serovar-speciﬁc genes required for growth
+ Many of the required genes present in only one serovar encode phage repressors , for instance the cI proteins of Fels-2 / SopE and ST35 ( see Supplementary Tables S2 and S3 ) . 
+ Repressors maintain the lysogenic state of prophage , preventing transcription of early lytic genes ( 53 ) . 
+ Transposon insertions into these genes will relieve this repression and trigger the lytic cycle , resulting in cell death , and consequently mutants are not represented in the sequenced library . 
+ This again broadens the deﬁnition of ` required ' genes ; such repressors may not be required for cellular viability in the traditional sense , but once present in these particular genomes , their maintenance is required for continued viability as long as the rest of the phage remains intact . 
+ Serovars Typhimurium and Typhi both contain eight apparent large phage-derived genomic regions ( 54,55 ) . 
+ We were able to identify required repressors in all the intact lambdoid , P2-like and P22-like prophage in both genomes , including Gifsy-1 , Gifsy-2 and Fels-2 / SopE ( see Supplementary Tables S2 and S3 ) . 
+ With the exception of the SLP203 P22-like prophage in S. Typhimurium , all of these repressors lack the peptidase domain of the classical lambda repressor gene cI . 
+ This implies that the default anti-repression mechanism of Salmonella prophage may be more similar to a trans-acting mechanism recently discovered in Gifsy phage ( 56 ) than to the phage lambda repressor 's RecA-induced self-cleavage mechanism . 
+ We are also able to conﬁrm that most phage remnants and fusions contained no active repressors , with the exception of the SLP281 degenerate P2-like prophage in S. Typhimurium . 
+ This degenerate prophage contains both intact replication and integration genes , but appears to lack tail and head proteins , suggesting it may depend on another phage for production of viral particles . 
+ Both genomes also encode P4-like satellite prophage , which rely on ` helper ' phage for lytic functions and use a complex antisense-RNA based regulation mechanism for decision pathways regarding cell fate ( 57 ) using structural homologs of the IsrK ( 58 ) and C4 ncRNAs ( 59 ) , known as seqA and CI RNA in the P4 literature , respectively . 
+ Although the mechanism of P4 lysogenic maintenance is not known , the IsrK-like ncRNAs of two potentially active P4-like prophage in S. Typhi are required under TraDIS . 
+ This sequence element has previously been shown to be essential for the establishment of the P4 lysogenic state ( 60 ) , and we predict based on our observations that it may be necessary for lysogenic maintenance as well . 
+ The fact that some lambdoid prophage in S. Typhimurium encode non-coding genes structurally similar to the IsrK-C4 immunity system of P4 raises the possibility that these systems may be acting as a defense mechanism of sorts , protecting the prophage from predatory satellite phage capable of co-opting its lytic genes . 
+ In addition to repressors , four prophage cargo genes in S. Typhimurium and one in S. Typhi are required ( see Tables 2 and 3 ; Supplementary Tables S2 and S3 ) . 
+ The S. Typhimurium prophage cargo genes encode a PhoPQ-regulated protein , a protein predicted to be involved in natural transformation , an endodeoxyribonuclease and a hypothetical protein . 
+ The S. Typhi prophage cargo gene encodes a protein containing the DNA-binding HIRAN domain ( 62 ) believed to be involved in the repair of damaged DNA . 
+ These warrant further investigation , as they are genes that have been recently acquired and become necessary for survival in rich media . 
+ To compare differences between requirements for orthologous genes in both serovars , we calculated log-fold read ratios to eliminate genes that were classified differently in S. Typhi and S. Typhimurium but did not have significantly different read densities ( see ` Materials and Methods ' section ) . 
+ Even after this correction , 36 S. Typhimurium genes had a signiﬁcantly lower frequency of transposon insertion compared with the equivalent genes in S. Typhi ( P < 0.05 ) , including four encoding hypothetical proteins ( Table 2 ) . 
+ This indicates that these gene products play a vital role in S. Typhimurium , but not in S. Typhi when grown under laboratory conditions . 
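+ The idea behind the log-fold read ratio filter can be illustrated with a minimal sketch . This is not the procedure from the ` Materials and Methods ' section , which is not reproduced here : the pseudocount , the fixed cut-off used in place of a significance test , the function names and the read counts are all assumptions made for illustration . 

```python
import math


def log2_read_ratio(reads_a: float, reads_b: float, pseudocount: float = 1.0) -> float:
    """Log2 ratio of read counts for one gene in two libraries.

    The pseudocount (an assumption of this sketch) avoids division by zero
    for genes with no mapped reads.
    """
    return math.log2((reads_a + pseudocount) / (reads_b + pseudocount))


def differs_between_serovars(reads_a: float, reads_b: float,
                             min_abs_log2: float = 2.0) -> bool:
    """Crude stand-in for a significance test: keep a gene only if the
    read ratio between the two libraries exceeds a fixed cut-off."""
    return abs(log2_read_ratio(reads_a, reads_b)) >= min_abs_log2


# Hypothetical normalised read counts: (S. Typhimurium, S. Typhi)
examples = {"geneA": (0, 255), "geneB": (12, 15)}

for gene, (typhimurium, typhi) in examples.items():
    ratio = log2_read_ratio(typhimurium, typhi)
    kept = differs_between_serovars(typhimurium, typhi)
    print(f"{gene}: log2 ratio = {ratio:+.2f}, retained = {kept}")
```

+ In this toy example , geneA would be retained as a candidate serovar-specific requirement , whereas geneB would be discarded because its read densities are similar in the two libraries despite any difference in classification . 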
+ A major difference between the two serovars is in the requirement for genes involved in cell wall biosynthesis ( see Figure 3 ) . 
+ A set of four genes ( SL0702 , SL0703 , SL0706 and SL0707 ) in an operonic structure putatively involved in cell wall biogenesis is required in S. Typhimurium , but not in S. Typhi . 
+ The orthologue of SL0706 is a pseudogene in S. Typhi ( Ty2 unique ID : t2152 ) owing to a 1 bp deletion at codon 62 that causes a frameshift ( Figure 4a ) . 
+ This operon contains an additional two pseudogenes in S. Typhi ( t2154 and t2150 ) , as well as a single different pseudogene ( SL0700 ) in S. Typhimurium , indicating that this difference in gene requirements reﬂects the evolutionary adaptation of these serovars to their respective niches . 
+ Similarly , four genes ( rfbV , rfbX , rfbJ and rfbF ) within an O-antigen biosynthetic operon are required by S. Typhimurium , but not S. Typhi . 
+ There appears to have been a shufﬂing of O-antigen biosynthetic genes since the divergence between the two serovars , and rfbJ , encoding a CDP-abequose synthase , has been lost from S. Typhi altogether . 
+ These broader requirements for cell wall-associated biosynthetic and transporter genes suggest that surface structure biogenesis is of greater importance in S. Typhimurium . 
+ We also identified seven genes from the shared pathogenicity island SPI-2 that appear to contain few or no transposon insertions only in S. Typhimurium under laboratory conditions . 
+ These genes ( spiC , sseA and ssaHIJT ) are thought to encode components of the SPI-2 type III secretion system apparatus ( T3SS ) ( 63 ) . 
+ In addition , the effector genes sseJ and sifB , whose products are secreted through the SPI-2-encoded T3SS ( 64,65 ) , also fell into the ` required ' category in S. Typhimurium alone . 
+ All of these genes display high A+T nucleotide content and have been previously shown ( in S. Typhimurium ) to be strongly bound by the nucleoid-associated protein H-NS , encoded by hns ( 61,66 ) . 
+ Therefore , rather than being ` required ' , it is instead possible that access for the transposon was sufﬁciently restricted that very few insertions occurred at these sites . 
+ In further support of this hypothesis , a comparison of the binding pattern of H-NS detected in studies using S. Typhimurium LT2 with the TraDIS results from the SPI-2 locus indicated that regions of high H-NS enrichment correlated well with both the ssa genes described here and with sseJ ( 61,66 ) ( see Supplementary Figure S1 ) . 
+ An earlier study also suggests that high-density DNA-binding proteins can block Mu , Tn5 and Tn10 insertion ( 67 ) ; however , a genome-wide study of the effects of H-NS binding on transposition would be necessary to conﬁrm this effect . 
+ Indeed , the successful generation of null S. Typhimurium mutants in sseJ and sifB , as well as in many other genes at the SPI-2 locus , suggests that these genes are not truly required for growth in this serovar ( 65,68 -- 70 ) . 
+ Although this is a reminder that the interpretation of gene requirement needs to be made with care , the effect of H-NS on transposon insertion is not genome-wide . 
+ If this were the case , there would be an under-representation of transposon mutants in high A+T regions ( known for H-NS binding ) , which is not what we observed . 
+ In total , only 21 required genes fall into the ` hns-repressed ' category described in Navarre et al. ( 61 ) ( see Table 2 ; Supplementary Table S1 ) ; the remainder ( almost 400 ) contained sufﬁcient transposon insertions to conclude they were non-required . 
+ In addition , all SPI-1 genes that encode another T3SS and are of high A+T content were also found to be non-required . 
+ This phenomenon was not observed in S. Typhi , possibly because the strain used harbours the pHCM1 plasmid , which encodes the H-NS-like protein sfh and has been shown to affect H-NS binding ( 71,72 ) . 
+ Twenty-two S. Typhi genes had a signiﬁcantly lower frequency of transposon insertion compared with orthologues in S. Typhimurium ( P < 0.05 ) , indicating that they are required only in S. Typhi for growth under laboratory conditions ( Table 3 ) , including the fepBDGC operon . 
+ This indicates a requirement for ferric [ Fe ( III ) ] rather than ferrous [ Fe ( II ) ] iron . 
+ This can be explained by the presence of Fe ( III ) in the bloodstream , where S. Typhi can be found during typhoid fever ( 15 ) . 
+ These genes function to recover the ferric chelator enterobactin from the periplasm , acting with a number of proteins known to aid the passage of this siderophore through the outer membrane ( 73 ) . 
+ It has long been noted that aroA mutants of S. Typhi , deﬁcient in their ability to synthesize enterobactin , exhibit severe growth defects on complex media , whereas similar mutants of S. Typhimurium grow normally under the same conditions ( 74 ) , though the mechanism has not been clear . 
+ Our results suggest that this difference in growth of aroA mutants is caused by a requirement for iron uptake through the fep system in S. Typhi . 
+ During host adaptation , S. Typhi has accumulated pseudogenes in many iron transport and response systems ( 27 ) , presumably because they are not necessary for survival in the niche S. Typhi occupies in the human host , which may have led to this dependence on fep genes . 
+ In contrast , S. Typhimurium generally causes intestinal rather than systemic infection and is able to use a wider range of iron sources , including Fe ( II ) , a soluble form of iron present under anaerobic conditions such as those found in the intestine ( 75 ) . 
+ TraDIS provides resolution sufﬁcient to evaluate ncRNA contributions to ﬁtness
+ Under a Poisson approximation to the transposon insertion process , a region of 41 ( in S. Typhimurium ) or 60 bases ( in S. Typhi ) has only a 1 % probability of not containing an insertion by chance . 
+ NcRNAs tend to be considerably shorter than their protein-coding counterparts , but this gives us sufﬁcient resolution to assay most of the non-coding complement of the Salmonella genome . 
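+ The Poisson argument above can be made concrete with a short sketch . If insertions occur independently at a uniform rate per base , the probability that a region of a given length contains no insertion is exp ( -rate x length ) . The function names below are illustrative , and the per-base rate is back-calculated from the 41-base figure quoted for S. Typhimurium rather than taken from the source . 

```python
import math


def p_no_insertion(rate_per_base: float, region_length: int) -> float:
    """Poisson probability that a region receives zero insertions."""
    return math.exp(-rate_per_base * region_length)


def implied_insertion_rate(region_length: int, p_empty: float = 0.01) -> float:
    """Per-base insertion rate at which a region of this length is empty
    with probability p_empty (inverse of p_no_insertion)."""
    return -math.log(p_empty) / region_length


def min_region_length(rate_per_base: float, p_empty: float = 0.01) -> float:
    """Shortest region that is empty by chance with probability at most p_empty."""
    return -math.log(p_empty) / rate_per_base


# Back-calculate the rate implied by the 41-base figure for S. Typhimurium.
rate = implied_insertion_rate(41)
print(f"implied rate: {rate:.3f} insertions per base "
      f"(about one every {1 / rate:.1f} bp)")

# At that rate, a 76-base tRNA with no insertions is very unlikely by chance.
print(f"P(no insertion in 76 bases) = {p_no_insertion(rate, 76):.1e}")
```

+ Under this approximation , the chance of an insertion-free region falls off exponentially with length , which is why even short ncRNAs can be assayed once the library is dense enough . 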
+ As a proof of principle , we performed an analysis of the best-understood class of small ncRNAs , the tRNAs . 
+ Francis Crick hypothesized that a single tRNA could recognize more than one codon through wobble recognition ( 76 ) , where a non-canonical G-U base pair is formed between the first ( wobble ) position of the anticodon and the third nucleotide in the codon . 
+ As a result , some codons are covered by multiple tRNAs , whereas others are covered non-redundantly by a single tRNA . 
+ We expect that singleton wobble-capable tRNAs , i.e. wobble tRNAs which recognize a codon uniquely , will be required . 
+ In addition , we inferred the requirement for other tRNAs through the non-redundant coverage of their codons and used this to benchmark our ability to use TraDIS to reliably interrogate short genomic intervals . 
+ The S. Typhi and S. Typhimurium genomes encode 78 and 85 ( plus one pseudogene ) tRNAs , respectively , with 40 anticodons , as identiﬁed by tRNAscan-SE ( 77 ) . 
+ In S. Typhi , 10 of 11 singleton wobble tRNAs are predicted to be required or ambiguous , compared with 16 tRNAs below the ambiguous LLR cut-off overall ( signiﬁcant enrichment at the 0.05 level , two-tailed Fisher 's exact test P-value : 6.4e-08 ) . 
+ Similarly in S. Typhimurium , 9 of 11 singleton wobble tRNAs are required or ambiguous compared with 15 required or ambiguous tRNAs overall , again showing a signiﬁcant enrichment of required tRNAs in this subset ( Fisher 's exact test P-value : 5.2e-07 ) . 
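+ The enrichment tests above can be sketched with a pure-Python two-tailed Fisher 's exact test . The text does not state the exact contingency tables used , so the tables below are assumptions : the background is taken to be all 78 tRNAs in S. Typhi and 86 in S. Typhimurium ( 85 plus the pseudogene ) , and ` required ' includes ambiguous calls . Under these assumptions the P-values come out close to those reported . 

```python
from math import comb


def fisher_exact_two_tailed(a: int, b: int, c: int, d: int) -> float:
    """Two-tailed Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same
    margins that are no more probable than the observed table.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    denom = comb(n, row1)

    def prob(x: int) -> float:
        # Probability that the top-left cell equals x given fixed margins.
        return comb(col1, x) * comb(n - col1, row1 - x) / denom

    lowest, highest = max(0, row1 + col1 - n), min(row1, col1)
    p_observed = prob(a)
    # Small relative tolerance guards against floating-point ties.
    return sum(p for p in map(prob, range(lowest, highest + 1))
               if p <= p_observed * (1 + 1e-9))


# Rows: singleton wobble tRNAs / all other tRNAs.
# Columns: required or ambiguous / not required.
p_typhi = fisher_exact_two_tailed(10, 1, 6, 61)        # assumes 78 tRNAs in total
p_typhimurium = fisher_exact_two_tailed(9, 2, 6, 69)   # assumes 86 tRNAs in total

print(f"S. Typhi:       P = {p_typhi:.1e}")
print(f"S. Typhimurium: P = {p_typhimurium:.1e}")
```

+ Changing the assumed background ( for example , excluding the S. Typhimurium pseudogene ) shifts the P-value slightly but does not alter the conclusion that required tRNAs are strongly enriched among singleton wobble tRNAs . 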
+ The one singleton wobble tRNA that is consistently not required in both serovars is tRNA-Pro ( GGG ) , which occurs within a four-member codon family . 
+ It has previously been shown in S. Typhimurium that tRNA-Pro ( UGG ) can read all four proline codons in vivo owing to a cmo5U34 modification to the anticodon , obviating the need for a functional tRNA-Pro ( GGG ) ( 78 ) and making this tRNA non-required . 
+ The other non-required singleton wobble tRNA in S. Typhimurium , tRNA-Leu ( GAG ) , is similarly a member of a four-member codon family . 
+ We predict tRNA-Leu ( TAG ) may also be capable of recognizing all four leucine codons in this serovar ; such a leucine ` four-way wobble ' has been previously inferred in at least one other bacterial species ( 79,80 ) . 
+ Of the six required non-wobble tRNAs in each serovar , four are shared . 
+ These include two non-wobble singleton tRNAs covering codons uniquely , as well as a tRNA with the ATG anticodon , which is post-transcriptionally modiﬁed by the required protein mesJ/tilS to recognize the isoleucine codon ATA ( 80 ) . 
+ An additional two required tRNAs in both serovars , one shared and one with a differing anticodon , contain Gln anticodons and are part of a polycistronic tRNA operon containing other required tRNAs . 
+ This operon is conserved in E. coli with the exception of an additional tRNA-Gln at the 3′ end that has been lost in the Salmonella lineage . 
+ It is possible that transposon insertions early in the operon may interfere with processing of the polycistronic transcript into mature tRNAs . 
+ Finally , we do not observe insertions in a tRNA-Met and a tRNA-Val in S. Typhi and S. Typhimurium , respectively . 
+ Using this analysis of the tRNAs , we estimate a worst-case PPV for these short molecules ( 76 bases ) at 81 % , in line with our previous estimates for conserved protein-coding genes , and a FPR of < 4 % , higher than for protein-coding genes but still well within the typical tolerance of high-throughput experiments . 
+ This assumes that the ` required ' operonic tRNA-Glns and the serovar-speciﬁc tRNA-Met and tRNA-Val are all false positives ; it is not clear that this is in fact the case . 
+ Surveying the shared required ncRNA content of both serovars ( see Table 4 ) , we ﬁnd that the RNA components of the signal recognition particle ( SRP ) and RNaseP , two universally conserved ncRNAs , are required as expected . 
+ The SRP is an essential component of the cellular secretion machinery , whereas RNaseP is necessary for the maturation of tRNAs . 
+ We also ﬁnd a number of required known and potential cis-regulatory molecules associated with genes required for growth under laboratory conditions in both serovars . 
+ The RFN riboswitch controls ribB , a 3,4-dihydroxy-2-butanone 4-phosphate synthase involved in riboﬂavin biosynthesis , in response to ﬂavin mononucleotide concentrations ( 83 ) . 
+ Additionally , we are able to assign putative functions to a number of previously uncharacterized required noncoding transcripts through their 5′ association with required genes . 
+ SroE , a 90 nt molecule discovered in an early sRNA screen ( 84 ) , is consistently located at the 5′ end of the required hisS gene across its phylogenetic distribution in the Enterobacteriaceae . 
+ Given this consistent association and the function of HisS as a histidyl-tRNA synthetase , we hypothesize that this region may act in a manner similar to a T-box leader , inducing or repressing expression in response to tRNA-His levels . 
+ The thrU leader sequence , recently discovered in a deep-sequencing screen of E. coli ( 42 ) , appears to regulate a polycistronic operon of required singleton wobble tRNAs . 
+ Three additional required cis-regulatory elements , t44 , S15 and StyR-8 , are associated with required ribosomal proteins , highlighting the central role ncRNA elements play in regulating fundamental cellular processes . 
+ The sRNAs required for competitive growth
+ Inferring functions for potential trans-acting ncRNA molecules , such as anti-sense binding sRNAs , from requirement patterns alone is more difficult than for cis-acting elements , as we cannot rely on adjacent genes to provide any information . 
+ It is also important to keep in mind that TraDIS assays requirements after a brief competition within a large library of mutants on permissive media . 
+ This may be particularly important when surveying the bacterial sRNAs , which are known to participate in responses to stress ( 29 ) . 
+ This is demonstrated by two sRNAs involved in the σE-mediated extracytoplasmic stress response , RybB and RseX , both of which can be successfully knocked out in S. Typhimurium ( 101 ) . 
+ In S. Typhi , rpoE is required , as it also is in E. coli ( 48,102 ) . 
+ However , in S. Typhimurium , rpoE tolerates a heavy insertion load , implying that σE mutants are not disadvantaged in competitive growth . 
+ In S. Typhimurium , the sRNA RseX is required . 
+ Overexpression of RseX has previously been shown to compensate for σE essentiality in E. coli by degrading ompA and ompC transcripts ( 95 ) . 
+ This suggests that RseX may also be short-circuiting the σE stress response network in S. Typhimurium ( Figure 4 ) . 
+ To our knowledge , this is the ﬁrst evidence of a native ( i.e. not experimentally induced ) activity of RseX . 
+ S. Typhi on the other hand requires σE along with its activating proteases RseP and DegS and anchoring protein RseA , as well as the σE-dependent sRNA RybB , which also regulates OmpA and OmpC in S. Typhimurium , along with a host of other OMPs ( 103 ) . 
+ It is unclear why the σE response is required in S. Typhi , but not S. Typhimurium , though it may partially be due to the major differences in the cell wall and oute 
+ CONCLUSION
+ The extremely high resolution of TraDIS has allowed us to assay gene requirements in two very closely related Salmonellae with different host ranges . 
+ We found , under laboratory conditions , that 58 genes present in both serovars were required in only one , suggesting that identical gene products do not necessarily have the same phenotypic effects in the two different serovar backgrounds . 
+ Many of these genes occur in genomic regions or metabolic systems , which contain pseudogenes and/or have undergone reorganization since the divergence of S. Typhi and S. Typhimurium , demonstrating the complementarity of TraDIS and phylogenetic analysis . 
+ These changes may , in part , explain differences observed in the pathogenicity and host speciﬁcity of these two serovars . 
+ In particular , S. Typhimurium showed a requirement for cell surface structure biosynthesis genes ; this may be partially explained by the fact that S. Typhi expresses the Vi-antigen , which masks the cell surface , though these genes are not required for survival in our assay . 
+ S. Typhi on the other hand has a requirement for iron uptake through the fep system , which enables ferric enterobactin transport . 
+ This dependence on enterobactin suggests that S. Typhi is highly adapted to the iron-scarce environments it encounters during systemic infections . 
+ Furthermore , this appears to represent a single point of failure in the S. Typhi iron utilization pathways and may 
+ Of the 4500 protein-coding genes present in each serovar , only 350 were sufﬁciently depleted in transposon insertions to be classiﬁed as required for growth in rich media . 
+ This means that > 92 % of the coding genome has sufﬁcient insertion density to be queried in future assays . 
+ Dense transposon mutagenesis libraries have been used to assay gene requirements under conditions relevant for infection , including S. Typhi survival in bile ( 35 ) , Mycobacterium tuberculosis catabolism of chol-esterol ( 108 ) , drug resistance in Pseudomonas aeruginosa ( 109 ) and Haemophilus inﬂuenzae survival in the lung ( 110 ) . 
+ We expect that parallel experiments querying gene requirements under the same conditions in both serovars examined in this study will yield further insights into the differences in the infective process between Typhi and Typhimurium and ultimately the pathways that underlie host-adaptation . 
+ Both serovars possess substantial complements of horizontally acquired DNA . 
+ We have been able to use TraDIS to assay these recently acquired sequences . 
+ In particular , we have been able to identify , on a chromosome-wide scale , active prophage through the requirement for their repressors . 
+ The P4 phage uses an RNA-based system to make decisions regarding cell fate , and structurally similar systems are used by P1 , P7 and N15 phage ( 111,112 ) . 
+ C4-like transcripts have been regarded as the primary repressor of lytic functions , though the IsrK-like sequence is known to be essential to the establishment of lysogeny in P4 and is transcribed in at least two phage types ( 60,112 ) . 
+ Our observations in S. Typhi suggest an important role for the IsrK-like sequence in maintenance of the lysogenic state in P4-like phage , though the mechanism remains unclear . 
+ Recent advances in high-throughput sequencing have greatly enhanced our ability to detect novel transcripts , such as ncRNAs and short open reading frames ( sORFs ) . 
+ Our ability to identify these transcripts now far outstrips our ability to characterize them experimentally . 
+ There have been previous efforts at high-throughput characterization of bacterial sRNAs and sORFs in enteric bacteria ; however , these have relied on labour-intensive directed knockout libraries ( 47,113 ) . 
+ Here , we have demonstrated that TraDIS has sufficient resolution to reliably query genomic regions as short as 60 bases , in agreement with a recent high-throughput transposon mutagenesis study in the alphaproteobacterium Caulobacter crescentus ( 38 ) . 
+ Our method has the major advantage that library construction does not rely on genome annotation , and newly discovered elements can be surveyed with no further laboratory work . 
+ We have been able to assign putative functions to a number of ncRNAs using TraDIS through consideration of their genomic and experimental context . 
+ In addition , ncRNA characterization generally is done in model organisms like E. coli or S. Typhimurium , and it is unclear how stable ncRNA regulatory networks are over evolutionary time . 
+ By assaying two serovars of Salmonella with the same method under the same conditions , we have seen hints that there may be differences in sRNA regulator networks between S. Typhi and S. Typhimurium . 
+ In particular , we have found that under the same experimental conditions , S. Typhi appears to rely on the σE stress response pathway , whereas S. Typhimurium does not ; it is tempting to speculate that this difference in stress response is mediated by the observed difference in requirement for two sRNAs , RybB and RseX . 
+ We believe that this combination of high-throughput transposon mutagenesis with a careful consideration of the systems context of individual genes provides a powerful tool for the generation of functional hypotheses . 
+ We anticipate that the construction of TraDIS libraries in additional organisms , as well as the passing of these libraries through relevant experimental conditions , will provide further insights into the function of bacterial ncRNAs in addition to the protein-coding gene complement . 
+ SUPPLEMENTARY DATA
+ Supplementary Data are available at NAR Online : Supplementary Tables 1 -- 3 , Supplementary Figures 1 -- 2 and Supplementary Data Set . 
+ ACKNOWLEDGEMENTS
+ The authors would like to thank Leopold Parts for assistance with the statistical analysis of required genes under laboratory conditions , Derek Pickard for discussions of phage biology and Amy Cain for comments on the manuscript . 
+ The S. Typhimurium nucleotide sequencing data have been deposited in the European Short Read Archive under accession number ERA000217 . 
+ The S. Typhi data can be accessed at ERA000097 . 
+ FUNDING
+ Wellcome Trust [ WT076964 , WT079643 and WT098051 ] ; Medical Research Council . 
+ Funding for open access charge : Wellcome Trust . 
+ REFERENCES
+ 7 . Santos , R.L. , Zhang , S. , Tsolis , R.M. , Kingsley , R.A. , Adams , L.G. and Baumler , A.J. ( 2001 ) Animal models of Salmonella infections : enteritis versus typhoid fever . Microbes Infect. , 3 , 1335 -- 1344 . 
+ 8 . McClelland , M. , Sanderson , K.E. , Spieth , J. , Clifton , S.W. , Latreille , P. , Courtney , L. , Porwollik , S. , Ali , J. , Dante , M. , Du , F. et al. ( 2001 ) Complete genome sequence of Salmonella enterica serovar Typhimurium LT2 . Nature , 413 , 852 -- 856 . 
+ 9 . Parkhill , J. , Dougan , G. , James , K.D. , Thomson , N.R. , Pickard , D. , Wain , J. , Churcher , C. , Mungall , K.L. , Bentley , S.D. , Holden , M.T. et al. ( 2001 ) Complete genome sequence of a multiple drug resistant Salmonella enterica serovar Typhi CT18 . Nature , 413 , 848 -- 852 . 
+ 10 . Holt , K.E. , Parkhill , J. , Mazzoni , C.J. , Roumagnac , P. , Weill , F.X. , Goodhead , I. , Rance , R. , Baker , S. , Maskell , D.J. , Wain , J. et al. ( 2008 ) High-throughput sequencing provides insights into genome variation and evolution in Salmonella Typhi . Nat. Genet. , 40 , 987 -- 993 . 
+ 11 . Deng , W. , Liou , S.R. , Plunkett , G. 3rd , Mayhew , G.F. , Rose , D.J. , Burland , V. , Kodoyianni , V. , Schwartz , D.C. and Blattner , F.R. ( 2003 ) Comparative genomics of Salmonella enterica serovar Typhi strains Ty2 and CT18 . J. Bacteriol. , 185 , 2330 -- 2337 . 
+ 12 . Gulig , P.A. and Curtiss , R. 3rd ( 1987 ) Plasmid-associated virulence of Salmonella Typhimurium . Infect. Immun. , 55 , 2891 -- 2901 . 
+ 13 . Gulig , P.A. and Doyle , T.J. ( 1993 ) The Salmonella Typhimurium virulence plasmid increases the growth rate of salmonellae in mice . Infect. Immun. , 61 , 504 -- 511 . 
+ 14 . Phan , M.D. , Kidgell , C. , Nair , S. , Holt , K.E. , Turner , A.K. , Hinds , J. , Butcher , P. , Cooke , F.J. , Thomson , N.R. , Titball , R. et al. ( 2009 ) Variation in Salmonella enterica serovar Typhi IncHI1 plasmids during the global spread of resistant typhoid fever . Antimicrob. Agents Chemother. , 53 , 716 -- 727 . 
+ 15 . Wain , J. , Diep , T.S. , Ho , V.A. , Walsh , A.M. , Nguyen , T.T. , Parry , C.M. and White , N.J. ( 1998 ) Quantitation of bacteria in blood of typhoid fever patients and relationship between counts and clinical features , transmissibility , and antibiotic resistance . J. Clin. Microbiol. , 36 , 1683 -- 1687 . 
+ 16 . Datta , N. ( 1962 ) Transmissible drug resistance in an epidemic strain of Salmonella Typhimurium . J. Hyg. ( Lond. ) , 60 , 301 -- 310 . 
+ 17 . Holt , K.E. , Thomson , N.R. , Wain , J. , Phan , M.D. , Nair , S. , Hasan , R. , Bhutta , Z.A. , Quail , M.A. , Norbertczak , H. , Walker , D. et al. ( 2007 ) Multidrug-resistant Salmonella enterica serovar Paratyphi A harbors IncHI1 plasmids similar to those found in serovar Typhi . J. Bacteriol. , 189 , 4257 -- 4264 . 
+ 18 . Cain , A.K. and Hall , R.M. ( 2012 ) Evolution of a multiple antibiotic resistance region in IncHI1 plasmids : reshaping resistance regions in situ . J. Antimicrob. Chemother. , 67 , 2848 -- 2853 . 
+ 19 . Darwin , K.H. and Miller , V.L. ( 1999 ) Molecular basis of the interaction of Salmonella with the intestinal mucosa . Clin. Microbiol. Rev. , 12 , 405 -- 428 . 
+ 20 . Ochman , H. , Soncini , F.C. , Solomon , F. and Groisman , E.A. ( 1996 ) Identification of a pathogenicity island required for Salmonella survival in host cells . Proc. Natl Acad. Sci. USA , 93 , 7800 -- 7804 . 
+ 21 . Shea , J.E. , Hensel , M. , Gleeson , C. and Holden , D.W. ( 1996 ) Identification of a virulence locus encoding a second type III secretion system in Salmonella Typhimurium . Proc. Natl Acad. Sci. USA , 93 , 2593 -- 2597 . 
+ 22 . Pickard , D. , Wain , J. , Baker , S. , Line , A. , Chohan , S. , Fookes , M. , Barron , A. , Gaora , P.O. , Chabalgoity , J.A. , Thanky , N. et al. ( 2003 ) Composition , acquisition , and distribution of the Vi exopolysaccharide-encoding Salmonella enterica pathogenicity island SPI-7 . J. Bacteriol. , 185 , 5055 -- 5065 . 
+ 23 . Seth-Smith , H.M. ( 2008 ) SPI-7 : Salmonella 's Vi-encoding Pathogenicity Island . J. Infect. Dev. Ctries. , 2 , 267 -- 271 . 
+ 24 . Townsend , S.M. , Kramer , N.E. , Edwards , R. , Baker , S. , Hamlin , N. , Simmonds , M. , Stevens , K. , Maloy , S. , Parkhill , J. , Dougan , G. et al. ( 2001 ) Salmonella enterica serovar Typhi possesses a unique repertoire of fimbrial gene sequences . Infect. Immun. , 69 , 2894 -- 2901 . 
+ 25 . Thomson , N.R. , Clayton , D.J. , Windhorst , D. , Vernikos , G. , Davidson , S. , Churcher , C. , Quail , M.A. , Stevens , M. , Jones , M.A. , Watson , M. et al. ( 2008 ) Comparative genome analysis of Salmonella Enteritidis PT4 and Salmonella Gallinarum 287/9 provides insights into evolutionary and host adaptation pathways . Genome Res. , 18 , 1624 -- 1637 . 
+ 26 . Holt , K.E. , Thomson , N.R. , Wain , J. , Langridge , G.C. , Hasan , R. , Bhutta , Z.A. , Quail , M.A. , Norbertczak , H. , Walker , D. , Simmonds , M. et al. ( 2009 ) Pseudogene accumulation in the evolutionary histories of Salmonella enterica serovars Paratyphi A and Typhi . BMC Genomics , 10 , 36 . 
+ 27 . McClelland , M. , Sanderson , K.E. , Clifton , S.W. , Latreille , P. , Porwollik , S. , Sabo , A. , Meyer , R. , Bieri , T. , Ozersky , P. , McLellan , M. et al. ( 2004 ) Comparison of genome degradation in Paratyphi A and Typhi , human-restricted serovars of Salmonella enterica that cause typhoid . Nat. Genet. , 36 , 1268 -- 1274 . 
+ 28 . Kingsley , R.A. , Humphries , A.D. , Weening , E.H. , De Zoete , M.R. , Winter , S. , Papaconstantinopoulou , A. , Dougan , G. and Baumler , A.J. ( 2003 ) Molecular and phenotypic analysis of the CS54 island of Salmonella enterica serotype Typhimurium : identification of intestinal colonization and persistence determinants . Infect. Immun. , 71 , 629 -- 640 . 
+ 29 . Vogel , J. ( 2009 ) A rough guide to the non-coding RNA world of Salmonella . Mol. Microbiol. , 71 , 1 -- 11 . 
+ 30 . Waldminghaus , T. , Heidrich , N. , Brantl , S. and Narberhaus , F. ( 2007 ) FourU : a novel type of RNA thermometer in Salmonella . Mol. Microbiol. , 65 , 413 -- 424 . 
+ 31 . Cromie , M.J. , Shi , Y. , Latifi , T. and Groisman , E.A. ( 2006 ) An RNA sensor for intracellular Mg ( 2 + ) . Cell , 125 , 71 -- 84 . 
+ 32 . Kroger , C. , Dillon , S.C. , Cameron , A.D. , Papenfort , K. , Sivasankaran , S.K. , Hokamp , K. , Chao , Y. , Sittka , A. , Hebrard , M. , Handler , K. et al. ( 2012 ) The transcriptional landscape and small RNAs of Salmonella enterica serovar Typhimurium . Proc. Natl Acad. Sci. USA , 109 , E1277 -- E1286 . 
+ 33 . Hebrard , M. , Kroger , C. , Srikumar , S. , Colgan , A. , Handler , K. and Hinton , J. ( 2012 ) sRNAs and the virulence of Salmonella enterica serovar Typhimurium . RNA Biol. , 9 , 437 -- 445 . 
+ 34 . Richter , A.S. and Backofen , R. ( 2012 ) Accessibility and conservation : General features of bacterial small RNA-mRNA interactions ? RNA Biol. , 9 , 954 -- 965 . 
+ 35 . Langridge , G.C. , Phan , M.D. , Turner , D.J. , Perkins , T.T. , Parts , L. , Haase , J. , Charles , I. , Maskell , D.J. , Peters , S.E. , Dougan , G. et al. ( 2009 ) Simultaneous assay of every Salmonella Typhi gene using one million transposon mutants . Genome Res. , 19 , 2308 -- 2316 . 
+ 36 . Goodman , A.L. , McNulty , N.P. , Zhao , Y. , Leip , D. , Mitra , R.D. , Lozupone , C.A. , Knight , R. and Gordon , J.I. ( 2009 ) Identifying genetic determinants needed to establish a human gut symbiont in its habitat . Cell Host Microbe , 6 , 279 -- 289 . 
+ 37 . van Opijnen , T. , Bodi , K.L. and Camilli , A. ( 2009 ) Tn-seq : high-throughput parallel sequencing for fitness and genetic interaction studies in microorganisms . Nat. Methods , 6 , 767 -- 772 . 
+ 38 . 
+ Christen , B. , Abeliuk , E. , Collier , J.M. , Kalogeraki , V.S. , Passarelli , B. , Coller , J.A. , Fero , M.J. , McAdams , H.H. and Shapiro , L. ( 2011 ) The essential genome of a bacterium . 
+ Mol . 
+ Syst . 
+ Biol. , 7 , 528 . 
+ 39 . 
+ Burge , S.W. , Daub , J. , Eberhardt , R. , Tate , J. , Barquist , L. , Nawrocki , E.P. , Eddy , S.R. , Gardner , P.P. and Bateman , A. ( 2013 ) Rfam 11.0 : 10 years of RNA families . 
+ Nucleic Acids Res. , 41 , D226 -- D232 . 
+ 40 . 
+ Sittka , A. , Sharma , C.M. , Rolle , K. and Vogel , J. ( 2009 ) Deep sequencing of Salmonella RNA associated with heterologous Hfq proteins in vivo reveals small RNAs as a major target class and identiﬁes RNA processing phenotypes . 
+ RNA Biol. , 6 , 266 -- 275 . 
+ 41 . 
+ Chinni , S.V. , Raabe , C.A. , Zakaria , R. , Randau , G. , Hoe , C.H. , Zemann , A. , Brosius , J. , Tang , T.H. and Rozhdestvensky , T.S. ( 2010 ) Experimental identiﬁcation and characterization of 97 novel npcRNA candidates in Salmonella enterica serovar Typhi . 
+ Nucleic Acids Res. , 38 , 5893 -- 5908 . 
+ 42 . 
+ Raghavan , R. , Groisman , E.A. and Ochman , H. ( 2011 ) Genome-wide detection of novel regulatory RNAs in E. coli . 
+ Genome Res. , 21 , 1487 -- 1497 . 
+ 43 . 
+ Finn , R.D. , Clements , J. and Eddy , S.R. ( 2011 ) HMMER web server : interactive sequence similarity searching . 
+ Nucleic Acids Res. , 39 , W29 -- W37 . 
+ 44 . 
+ Punta , M. , Coggill , P.C. , Eberhardt , R.Y. , Mistry , J. , Tate , J. , Boursnell , C. , Pang , N. , Forslund , K. , Ceric , G. , Clements , J. et al. ( 2012 ) The Pfam protein families database . 
+ Nucleic Acids Res. , 40 , D290 -- D301 . 
+ 45 . 
+ Li , H. , Ruan , J. and Durbin , R. ( 2008 ) Mapping short DNA sequencing reads and calling variants using mapping quality scores . 
+ Genome Res. , 18 , 1851 -- 1858 . 
+ 46 . 
+ Knuth , K. , Niesalla , H. , Hueck , C.J. and Fuchs , T.M. ( 2004 ) Large-scale identiﬁcation of essential Salmonella genes by trapping lethal insertions . 
+ Mol . 
+ Microbiol. , 51 , 1729 -- 1744 . 
+ 47 . 
+ Santiviago , C.A. , Reynolds , M.M. , Porwollik , S. , Choi , S.H. , Long , F. , Andrews-Polymenis , H.L. and McClelland , M. ( 2009 ) Analysis of pools of targeted Salmonella deletion mutants identiﬁes novel genes affecting ﬁtness during competitive infection in mice . 
+ PLoS Pathog. , 5 , e1000477 . 
+ 48 . 
+ Baba , T. , Ara , T. , Hasegawa , M. , Takai , Y. , Okumura , Y. , Baba , M. , Datsenko , K.A. , Tomita , M. , Wanner , B.L. and Mori , H. ( 2006 ) Construction of Escherichia coli K-12 in-frame , single-gene knockout mutants : the Keio collection . 
+ Mol . 
+ Syst . 
+ Biol. , 2 , 2006.0008 . 
+ 49 . 
+ Gerdes , S.Y. , Scholle , M.D. , Campbell , J.W. , Balazsi , G. , Ravasz , E. , Daugherty , M.D. , Somera , A.L. , Kyrpides , N.C. , Anderson , I. , Gelfand , M.S. et al. ( 2003 ) Experimental determination and system level analysis of essential genes in Escherichia coli MG1655 . 
+ J. Bacteriol. , 185 , 5673 -- 5684 . 
+ 50 . 
+ Tong , X. , Campbell , J.W. , Balazsi , G. , Kay , K.A. , Wanner , B.L. , Gerdes , S.Y. and Oltvai , Z.N. ( 2004 ) Genome-scale identiﬁcation of conditionally essential genes in E. coli by DNA microarrays . 
+ Biochem . 
+ Biophys . 
+ Res . 
+ Commun. , 322 , 347 -- 354 . 
+ 51 . 
+ Hashimoto , M. , Ichimura , T. , Mizoguchi , H. , Tanaka , K. , Fujimitsu , K. , Keyamura , K. , Ote , T. , Yamakawa , T. , Yamazaki , Y. , Mori , H. et al. ( 2005 ) Cell size and nucleoid organization of engineered Escherichia coli cells with a reduced genome . 
+ Mol . 
+ Microbiol. , 55 , 137 -- 149 . 
+ 52 . 
+ Kang , Y. , Durfee , T. , Glasner , J.D. , Qiu , Y. , Frisch , D. , Winterberg , K.M. and Blattner , F.R. ( 2004 ) Systematic mutagenesis of the Escherichia coli genome . 
+ J. Bacteriol. , 186 , 4921 -- 4930 . 
+ 53 . 
+ Echols , H. and Green , L. ( 1971 ) Establishment and maintenance of repression by bacteriophage lambda : the role of the cI , cII , and c3 proteins . 
+ Proc . 
+ Natl Acad . 
+ Sci . 
+ USA , 68 , 2190 -- 2194 . 
+ 54 . 
+ Thomson , N. , Baker , S. , Pickard , D. , Fookes , M. , Anjum , M. , Hamlin , N. , Wain , J. , House , D. , Bhutta , Z. , Chan , K. et al. ( 2004 ) The role of prophage-like elements in the diversity of Salmonella enterica serovars . 
+ J. Mol . 
+ Biol. , 339 , 279 -- 300 . 
+ 55 . 
+ Kropinski , A.M. , Sulakvelidze , A. , Konczy , P. and Poppe , C. ( 2007 ) Salmonella phages and prophages -- genomics and practical aspects . 
+ Methods Mol . 
+ Biol. , 394 , 133 -- 175 . 
+ 56 . 
+ Lemire , S. , Figueroa-Bossi , N. and Bossi , L. ( 2011 ) Bacteriophage crosstalk : coordination of prophage induction by trans-acting antirepressors . 
+ PLoS Genet. , 7 , e1002149 . 
+ 57 . 
+ Briani , F. , Deho , G. , Forti , F. and Ghisotti , D. ( 2001 ) The plasmid status of satellite bacteriophage P4 . 
+ Plasmid , 45 , 1 -- 17 . 
+ 58 . 
+ Padalon-Brauch , G. , Hershberg , R. , Elgrably-Weiss , M. , Baruch , K. , Rosenshine , I. , Margalit , H. and Altuvia , S. ( 2008 ) Small RNAs encoded within genetic islands of Salmonella Typhimurium show host-induced expression and role in virulence . 
+ Nucleic Acids Res. , 36 , 1913 -- 1927 . 
+ 59 . 
+ Forti , F. , Dragoni , I. , Briani , F. , Deho , G. and Ghisotti , D. ( 2002 ) Characterization of the small antisense CI RNA that regulates bacteriophage P4 immunity . 
+ J. Mol . 
+ Biol. , 315 , 541 -- 549 . 
+ 60 . 
+ Sabbattini , P. , Forti , F. , Ghisotti , D. and Deho , G. ( 1995 ) Control of transcription termination by an RNA factor in bacteriophage P4 immunity : identiﬁcation of the target sites . 
+ J. Bacteriol. , 177 , 1425 -- 1434 . 
+ 61 . 
+ Navarre , W.W. , Porwollik , S. , Wang , Y. , McClelland , M. , Rosen , H. , Libby , S.J. and Fang , F.C. ( 2006 ) Selective silencing of foreign DNA with low GC content by the H-NS protein in Salmonella . 
+ Science , 313 , 236 -- 238 . 
+ 62 . 
+ Iyer , L.M. , Babu , M.M. and Aravind , L. ( 2006 ) The HIRAN domain and recruitment of chromatin remodeling and repair activities to damaged DNA . 
+ Cell Cycle , 5 , 775 -- 782 . 
+ 63 . 
+ Kuhle , V. , Jackel , D. and Hensel , M. ( 2004 ) Effector proteins encoded by Salmonella pathogenicity island 2 interfere with the microtubule cytoskeleton after translocation into host cells . 
+ Trafﬁc , 5 , 356 -- 370 . 
+ 64 . 
+ Miao , E.A. and Miller , S.I. ( 2000 ) A conserved amino acid sequence directing intracellular type III secretion by Salmonella Typhimurium . 
+ Proc . 
+ Natl Acad . 
+ Sci . 
+ USA , 97 , 7539 -- 7544 . 
+ 65 . 
+ Freeman , J.A. , Ohl , M.E. and Miller , S.I. ( 2003 ) The Salmonella enterica serovar Typhimurium translocated effectors SseJ and SifB are targeted to the Salmonella-containing vacuole . 
+ Infect . 
+ Immun. , 71 , 418 -- 427 . 
+ 66 . 
+ Lucchini , S. , Rowley , G. , Goldberg , M.D. , Hurd , D. , Harrison , M. and Hinton , J.C. ( 2006 ) H-NS mediates the silencing of laterally acquired genes in bacteria . 
+ PLoS Pathog. , 2 , e81 . 
+ 67 . 
+ Manna , D. , Porwollik , S. , McClelland , M. , Tan , R. and Higgins , N.P. ( 2007 ) Microarray analysis of Mu transposition in Salmonella enterica , serovar Typhimurium : transposon exclusion by high-density DNA binding proteins . 
+ Mol . 
+ Microbiol. , 66 , 315 -- 328 . 
+ 68 . 
+ Hensel , M. , Shea , J.E. , Raupach , B. , Monack , D. , Falkow , S. , Gleeson , C. , Kubo , T. and Holden , D.W. ( 1997 ) Functional analysis of ssaJ and the ssaK/U operon , 13 genes encoding components of the type III secretion apparatus of Salmonella Pathogenicity Island 2 . 
+ Mol . 
+ Microbiol. , 24 , 155 -- 167 . 
+ 69 . 
+ Hensel , M. , Shea , J.E. , Waterman , S.R. , Mundy , R. , Nikolaus , T. , Banks , G. , Vazquez-Torres , A. , Gleeson , C. , Fang , F.C. and Holden , D.W. ( 1998 ) Genes encoding putative effector proteins of the type III secretion system of Salmonella pathogenicity island 2 are required for bacterial virulence and proliferation in macrophages . 
+ Mol . 
+ Microbiol. , 30 , 163 -- 174 . 
+ 70 . 
+ Ohlson , M.B. , Fluhr , K. , Birmingham , C.L. , Brumell , J.H. and Miller , S.I. ( 2005 ) SseJ deacylase activity by Salmonella enterica serovar Typhimurium promotes virulence in mice . 
+ Infect . 
+ Immun. , 73 , 6249 -- 6259 . 
+ 71 . 
+ Doyle , M. , Fookes , M. , Ivens , A. , Mangan , M.W. , Wain , J. and Dorman , C.J. ( 2007 ) An H-NS-like stealth protein aids horizontal DNA transmission in bacteria . 
+ Science , 315 , 251 -- 252 . 
+ 72 . 
+ Dillon , S.C. , Cameron , A.D. , Hokamp , K. , Lucchini , S. , Hinton , J.C. and Dorman , C.J. ( 2010 ) Genome-wide analysis of the H-NS and Sfh regulatory networks in Salmonella Typhimurium identiﬁes a plasmid-encoded transcription silencing mechanism . 
+ Mol . 
+ Microbiol. , 76 , 1250 -- 1265 . 
+ 73 . 
+ Rabsch , W. , Voigt , W. , Reissbrodt , R. , Tsolis , R.M. and Baumler , A.J. ( 1999 ) Salmonella Typhimurium IroN and FepA proteins mediate uptake of enterobactin but differ in their speciﬁcity for other siderophores . 
+ J. Bacteriol. , 181 , 3610 -- 3612 . 
+ 74 . 
+ Edwards , M.F. and Stocker , B.A. ( 1988 ) Construction of delta aroA his delta pur strains of Salmonella Typhi . 
+ J. Bacteriol. , 170 , 3991 -- 3995 . 
+ 75 . 
+ Tsolis , R.M. , Baumler , A.J. , Heffron , F. and Stojiljkovic , I. ( 1996 ) Contribution of TonB - and Feo-mediated iron uptake to growth of Salmonella Typhimurium in the mouse . 
+ Infect . 
+ Immun. , 64 , 4549 -- 4556 . 
+ 76 . 
+ Crick , F.H. ( 1966 ) Codon -- anticodon pairing : the wobble hypothesis . 
+ J. Mol . 
+ Biol. , 19 , 548 -- 555 . 
+ 77 . 
+ Lowe , T.M. and Eddy , S.R. ( 1997 ) tRNAscan-SE : a program for improved detection of transfer RNA genes in genomic sequence . 
+ Nucleic . 
+ Acids . 
+ Res. , 25 , 955 -- 964 . 
+ 78 . 
+ Nasvall , S.J. , Chen , P. and Bjork , G.R. ( 2004 ) The modiﬁed wobble nucleoside uridine-5-oxyacetic acid in tRNAPro ( cmo5UGG ) promotes reading of all four proline codons in vivo . 
+ RNA , 10 , 1662 -- 1673 . 
+ 79 . 
+ Osawa , S. , Jukes , T.H. , Watanabe , K. and Muto , A. ( 1992 ) Recent evidence for evolution of the genetic code . 
+ Microbiol . 
+ Rev. , 56 , 229 -- 264 . 
+ 80 . 
+ Marck , C. and Grosjean , H. ( 2002 ) tRNomics : analysis of tRNA genes from 50 genomes of Eukarya , Archaea , and Bacteria reveals anticodon-sparing strategies and domain-speciﬁc features . 
+ RNA , 8 , 1189 -- 1232 . 
+ 81 . 
+ Rosenblad , M.A. , Larsen , N. , Samuelsson , T. and Zwieb , C. ( 2009 ) Kinship in the SRP RNA family . 
+ RNA Biol. , 6 , 508 -- 516 . 
+ 82 . 
+ Frank , D.N. and Pace , N.R. ( 1998 ) Ribonuclease P : unity and diversity in a tRNA processing ribozyme . 
+ Annu . 
+ Rev. Biochem. , 67 , 153 -- 180 . 
+ 83 . 
+ Winkler , W. , Nahvi , A. and Breaker , R.R. ( 2002 ) Thiamine derivatives bind messenger RNAs directly to regulate bacterial gene expression . 
+ Nature , 419 , 952 -- 956 . 
+ 84 . 
+ Vogel , J. , Bartels , V. , Tang , T.H. , Churakov , G. , Slagter-Jager , J.G. , Huttenhofer , A. and Wagner , E.G. ( 2003 ) RNomics in Escherichia coli detects new sRNA species and indicates parallel transcriptional output in bacteria . 
+ Nucleic Acids Res. , 31 , 6435 -- 6443 . 
+ 85 . 
+ Tjaden , B. , Saxena , R.M. , Stolyar , S. , Haynor , D.R. , Kolker , E. and Rosenow , C. ( 2002 ) Transcriptome analysis of Escherichia coli using high-density oligonucleotide probe arrays . 
+ Nucleic Acids Res. , 30 , 3732 -- 3738 . 
+ 86 . 
+ Aseev , L.V. , Levandovskaya , A.A. , Tchuﬁstova , L.S. , Scaptsova , N.V. and Boni , I.V. ( 2008 ) A new regulatory circuit in ribosomal protein operons : S2-mediated control of the rpsB-tsf expression in vivo . 
+ RNA , 14 , 1882 -- 1894 . 
+ 87 . 
+ Meyer , M.M. , Ames , T.D. , Smith , D.P. , Weinberg , Z. , Schwalbach , M.S. , Giovannoni , S.J. and Breaker , R.R. ( 2009 ) Identiﬁcation of candidate structured RNAs in the marine organism ` Candidatus Pelagibacter ubique ' . 
+ BMC Genomics , 10 , 268 . 
+ 88 . 
+ Benard , L. , Philippe , C. , Ehresmann , B. , Ehresmann , C. and Portier , C. ( 1996 ) Pseudoknot and translational control in the expression of the S15 ribosomal protein . 
+ Biochimie , 78 , 568 -- 576 . 
+ 89 . 
+ Lease , R.A. , Cusick , M.E. and Belfort , M. ( 1998 ) Riboregulation in Escherichia coli : DsrA RNA acts by RNA : RNA interactions at multiple loci . 
+ Proc . 
+ Natl Acad . 
+ Sci . 
+ USA , 95 , 12456 -- 12461 . 
+ 90 . 
+ Chao , Y. , Papenfort , K. , Reinhardt , R. , Sharma , C.M. and Vogel , J. ( 2012 ) An atlas of Hfq-bound transcripts reveals 3' UTRs as a genomic reservoir of regulatory small RNAs . 
+ EMBO J. , 31 , 4005 -- 4019 . 
+ 91 . 
+ Chen , S. , Lesnik , E.A. , Hall , T.A. , Sampath , R. , Griffey , R.H. , Ecker , D.J. and Blyn , L.B. ( 2002 ) A bioinformatics based approach to discover small RNA genes in the Escherichia coli genome . 
+ Biosystems , 65 , 157 -- 177 . 
+ 92 . 
+ Diwa , A. , Bricker , A.L. , Jain , C. and Belasco , J.G. ( 2000 ) An evolutionarily conserved RNA stem-loop functions as a sensor that directs feedback regulation of RNase E gene expression . 
+ Genes Dev. , 14 , 1249 -- 1260 . 
+ 93 . 
+ Antal , M. , Bordeau , V. , Douchin , V. and Felden , B. ( 2005 ) A small bacterial RNA regulates a putative ABC transporter . 
+ J. Biol . 
+ Chem. , 280 , 7901 -- 7908 . 
+ 94 . 
+ Wassarman , K.M. , Repoila , F. , Rosenow , C. , Storz , G. and Gottesman , S. ( 2001 ) Identiﬁcation of novel small RNAs using comparative genomics and microarrays . 
+ Genes Dev. , 15 , 1637 -- 1651 . 
+ 95 . 
+ Douchin , V. , Bohn , C. and Bouloc , P. ( 2006 ) Down-regulation of porins by a small RNA bypasses the essentiality of the regulated intramembrane proteolysis protease RseP in Escherichia coli . 
+ J. Biol . 
+ Chem. , 281 , 12253 -- 12259 . 
+ 96 . 
+ Rivas , E. and Eddy , S.R. ( 2001 ) Noncoding RNA gene detection using comparative sequence analysis . 
+ BMC Bioinformatics , 2 , 8 . 
+ 97 . 
+ Kawano , M. , Oshima , T. , Kasai , H. and Mori , H. ( 2002 ) Molecular characterization of long direct repeat ( LDR ) sequences expressing a stable mRNA encoding for a 35-amino-acid cell-killing peptide and a cis-encoded small antisense RNA in Escherichia coli . 
+ Mol . 
+ Microbiol. , 45 , 333 -- 349 . 
+ 98 . 
+ Zurawski , G. , Brown , K. , Killingly , D. and Yanofsky , C. ( 1978 ) Nucleotide sequence of the leader region of the phenylalanine operon of Escherichia coli . 
+ Proc . 
+ Natl Acad . 
+ Sci . 
+ USA , 75 , 4271 -- 4275 . 
+ 99 . 
+ Naville , M. and Gautheret , D. ( 2010 ) Premature terminator analysis sheds light on a hidden world of bacterial transcriptional attenuation . 
+ Genome Biol. , 11 , R97 . 
+ 100 . 
+ Urban , J.H. and Vogel , J. ( 2008 ) Two seemingly homologous noncoding RNAs act hierarchically to activate glmS mRNA translation . 
+ PLoS Biol. , 6 , e64 . 
+ 101 . 
+ Papenfort , K. , Pfeiffer , V. , Lucchini , S. , Sonawane , A. , Hinton , J.C. and Vogel , J. ( 2008 ) Systematic deletion of Salmonella small RNA genes identiﬁes CyaR , a conserved CRP-dependent riboregulator of OmpX synthesis . 
+ Mol . 
+ Microbiol. , 68 , 890 -- 906 . 
+ 102 . 
+ De Las Penas , A. , Connolly , L. and Gross , C.A. ( 1997 ) SigmaE is an essential sigma factor in Escherichia coli . 
+ J. Bacteriol. , 179 , 6862 -- 6864 . 
+ 103 . 
+ Papenfort , K. , Pfeiffer , V. , Mika , F. , Lucchini , S. , Hinton , J.C. and Vogel , J. ( 2006 ) SigmaE-dependent small RNAs of Salmonella respond to membrane stress by accelerating global omp mRNA decay . 
+ Mol . 
+ Microbiol. , 62 , 1674 -- 1688 . 
+ 104 . 
+ Santiviago , C.A. , Toro , C.S. , Hidalgo , A.A. , Youderian , P. and Mora , G.C. ( 2003 ) Global regulation of the Salmonella enterica serovar Typhimurium major porin , OmpD . 
+ J. Bacteriol. , 185 , 5901 -- 5905 . 
+ 105 . 
+ Bossi , L. and Figueroa-Bossi , N. ( 2007 ) A small RNA downregulates LamB maltoporin in Salmonella . 
+ Mol . 
+ Microbiol. , 65 , 799 -- 810 . 
+ 106 . 
+ Jones , A.M. , Goodwill , A. and Elliott , T. ( 2006 ) Limited role for the DsrA and RprA regulatory RNAs in rpoS regulation in Salmonella enterica . 
+ J. Bacteriol. , 188 , 5077 -- 5088 . 
+ 107 . 
+ Coornaert , A. , Lu , A. , Mandin , P. , Springer , M. , Gottesman , S. and Guillier , M. ( 2010 ) MicA sRNA links the PhoP regulon to cell envelope stress . 
+ Mol . 
+ Microbiol. , 76 , 467 -- 479 . 
+ 108 . 
+ Grifﬁn , J.E. , Gawronski , J.D. , Dejesus , M.A. , Ioerger , T.R. , Akerley , B.J. and Sassetti , C.M. ( 2011 ) High-resolution phenotypic proﬁling deﬁnes genes essential for mycobacterial growth and cholesterol catabolism . 
+ PLoS Pathog. , 7 , e1002251 . 
+ 109 . 
+ Gallagher , L.A. , Shendure , J. and Manoil , C. ( 2011 ) Genome-scale identiﬁcation of resistance functions in Pseudomonas aeruginosa using Tn-seq . 
+ MBio , 2 , e00315 -- e00310 . 
+ 110 . 
+ Gawronski , J.D. , Wong , S.M. , Giannoukos , G. , Ward , D.V. and Akerley , B.J. ( 2009 ) Tracking insertion mutants within libraries by deep sequencing and a genome-wide screen for Haemophilus genes required in the lung . 
+ Proc . 
+ Natl Acad . 
+ Sci . 
+ USA , 106 , 16422 -- 16427 . 
+ 111 . 
+ Citron , M. and Schuster , H. ( 1990 ) The c4 repressors of bacteriophages P1 and P7 are antisense RNAs . 
+ Cell , 62 , 591 -- 598 . 
+ 112 . 
+ Ravin , N.V. , Svarchevsky , A.N. and Deho , G. ( 1999 ) The anti-immunity system of phage-plasmid N15 : identiﬁcation of the antirepressor gene and its control by a small processed RNA . 
+ Mol . 
+ Microbiol. , 34 , 980 -- 994 . 
+ 113 . 
+ Hobbs , E.C. , Astarita , J.L. and Storz , G. ( 2010 ) Small RNAs and small proteins involved in resistance to cell envelope stress and acid shock in Escherichia coli : analysis of a bar-coded mutant collection . 
+ J. Bacteriol. , 192 , 59 -- 67
\ No newline at end of file
--- a/data/TEXT_FILES/notuseful_txt/23511241.txt 0 → 100644
+++ b/data/TEXT_FILES/notuseful_txt/23511241.txt 0 → 100644
+ Efficient transcription initiation in bacteria: an interplay of protein–DNA interaction parameters†
+ As the first , and usually rate-limiting , step of transcription initiation , bacterial RNA polymerase ( RNAP ) binds to double stranded DNA ( dsDNA ) and subsequently opens the two strands of DNA ( the open complex formation ) . 
+ The rate determining step in the open complex formation is opening of a short ( 6 bp ) DNA segment called the -10 region , which interacts with RNAP in both dsDNA and single stranded ( ssDNA ) forms . 
+ Accordingly , formation of the open complex depends on ( physically independent ) domains of RNAP that interact with ssDNA and dsDNA , as well as on parameters of DNA melting and sequences of 
+ Institute of Physiology and Biochemistry , Faculty of Biology , University of Belgrade , Studentski trg 16 , 11000 Belgrade , Serbia . 
+ E-mail : dmarko@bio.bg.ac.rs ; Fax : +381 11 2639 882 ; Tel : +381 63 1312 976 
+ † Electronic supplementary information ( ESI ) available . See DOI : 10.1039/c3ib20221f 
+ 1 Introduction
+ Transcription initiation is both the first step and a major control point in gene expression . 
+ Transcription can not be initiated by core RNA polymerase alone , so a complex between core RNA polymerase and a σ factor , which is called RNA polymerase holoenzyme ( RNAP ) , is formed .1 Different σ factors interact with double-stranded DNA ( dsDNA ) and single-stranded DNA ( ssDNA ) in a sequence specific manner , and they are responsible for transcription under different conditions .2 In this work we concentrate on σ70 ( the major σ factor in E. coli ) , which is responsible for transcribing housekeeping genes .3 Transcription is initiated from the sequences called core promoters . 
+ The main elements of core promoters in bacteria are the -35 element and the -10 element , where -35 and -10 refer to typical distances of these elements from transcription start sites .4 As the first step of transcription initiation , RNAP reversibly binds to dsDNA of promoter elements , which is called the closed complex formation , and is described by the binding affinity KB . 
+ The binding affinity is , therefore , determined by interactions of σ70 with dsDNA , which are exhibited through interactions of σ70 domain 4.2 with the -35 box , and σ70 domain 2.4 with the -10 box in the dsDNA form .2 This binding of RNAP leads to opening of the two DNA strands ( promoter melting ) , so that a transcription bubble is formed . 
+ This transcription bubble extends from the upstream edge of the -10 element to about two bases downstream of the transcription start site , which roughly corresponds to positions -12 to +2 ( +1 is the transcription start site ) .5 The ( inverse ) time needed to form the transcription bubble ( i.e. to open the two DNA strands ) is described by the transition rate from the closed to open complex ( kf ) . 
+ The transition rate , therefore , crucially depends on interactions of σ70 with -10 element ssDNA , which are exhibited through σ70 domain 2.3 .6 Since almost the entire -10 element is a part of the transcription bubble , this element interacts with σ70 in both dsDNA and ssDNA forms . 
+ While sequences from the downstream edge of the -10 element to the transcription start site are also part of the transcription bubble , mutating these sequences does not affect the bubble formation ,7 and it is considered that these sequences do not interact with σ70 in a sequence specific manner . 
+ Furthermore , both theoretical studies8 and single molecule experiments9 show that opening of the -10 element is the rate limiting step in the transcription bubble formation . 
+ Since the -10 box is a part of both the closed and the open complex , there is a complex interplay of biophysical interactions associated with this element : ( i ) DNA melting energies ,10 since the -10 box dsDNA is opened ( melted ) in the open complex , ( ii ) interaction energies of σ70 with dsDNA through σ70 subdomain 2.4 ,11 and ( iii ) interaction energies of σ70 with ssDNA through σ70 subdomain 2.3 . 
+ These three types of interactions are physically independent , since they either correspond to intrinsic DNA properties ( for melting energies ) or are exhibited through physically distinct σ70 binding domains ( for σ70 -- dsDNA and σ70 -- ssDNA interactions ) .6 Given the complex set of physically independent interactions at the -10 element described above , there is a question of how their mutual relationship leads to efficient transcription . 
+ In particular , the RNAP binding affinity ( KB ) depends on interactions of -10 box dsDNA with σ70 subdomain 2.4 ,6 where the stronger interaction leads to larger binding affinity . 
+ On the other hand , a stronger interaction of σ2.4 with dsDNA of the -10 element leads to a slower transition from the closed to open complex .8 The transition rate ( kf ) also depends on interactions of -10 box ssDNA with σ70 subdomain 2.3 and on the -10 element melting energy , both of which are physically independent of σ2.4 .8,12 Due to this , KB and kf should a priori be negatively correlated , and there may be a large number of sequences in the genome that correspond to high KB but low kf . 
+ We call such sequences where RNAP is strongly bound to dsDNA ( high KB ) , but proceeds to the open complex too slowly to achieve functional transcription ( due to small kf ) , poised promoters ; more generally , the term poised promoter is used for all instances where RNAP is bound strongly to DNA , but fails to proceed to functional transcription .13 Naively , RNAP poising appears particularly detrimental for sequences that should be transcriptionally active ( functional promoters ) , since these sequences should result in efficient transcription . 
+ Given the kinetic issues discussed above , we here aim to understand the following questions : ( i ) what is the extent of RNAP poising in the genome ? 
+ ( ii ) Are binding specificities of σ70 interaction domains , and/or sequences of E. coli intergenic regions , designed to minimize the number of poised promoters ? 
+ ( iii ) Do sequences of functional σ70 promoters ( additionally ) suppress RNAP poising ? 
+ We here concentrate on the intergenic regions , rather than on the whole genome , since these regions are relevant for transcription regulation , i.e. both transcription start sites and regulatory elements are located in the intergenic regions . 
+ The questions posed above are important not only from the point of design of σ70 -- promoter DNA interactions , but also from the point of searches for functional promoters in the genome . 
+ In particular , the most common experimental method to search for core promoters on a genome-wide scale is ChIP-chip14 and its alternatives ( e.g. ChIP-seq15 ) . 
+ However , immunoprecipitation ( ChIP ) detects DNA sequences that are strongly bound by the protein ( RNAP ) , rather than sequences with a high rate of transcription initiation -- which is the parameter that defines a functional promoter . 
+ Consequently , the high number of false positives , which is commonly associated with ChIP-chip experiments aimed at promoter detection ,16 may indicate extensive RNAP poising in the genome . 
+ The goal of this paper is to investigate a relationship between physical interactions at the -10 element and RNAP poising , which provides a basis for better understanding of the nature of false positives in ChIP-chip experiments . 
+ Along the same lines , DNA footprinting experiments detected sequences that are strongly bound by RNAP , but which result in transcriptionally inactive complexes ; these inactive complexes were shown to be due to inefficient formation of the open complex ( i.e. due to RNAP poising ) .17 Such observations seem particularly important from the point of computational searches of transcription start sites ( core promoters ) in the genome , which typically lead to a very high number of false positives . 
+ It was consequently proposed that kinetic effects -- an extreme example of which are poised promoters -- can significantly contribute to accuracy of the weight matrix ( computational ) searches of promoters .18 Furthermore , an understanding of the kinetic effects , which we will achieve in this paper , will motivate their inclusion within more physical methods of TSS recognition . 
+ With regard to this , it was frequently observed that coupling biophysical models with sequence statistics provides a significantly better prediction accuracy compared to simple statistical models .19 In order to analyze how the interplay of different interaction parameters leads to efficient transcription , one must be able to investigate kinetics of transcription initiation on a genome wide scale . 
+ This analysis can not be done through experiments , since KB and kf have to be measured through work-intensive τ-plot measurements ,20 individually for each sequence of interest . 
+ We here instead approach the problem computationally , where we use a recently developed biophysical model of the open complex formation ,8 which allows the calculation of the kinetic parameters ( KB and kf ) for each sequence of interest . 
+ This model showed a very good agreement with both biochemical and genomics data , with no free parameters used in comparing the model with the experimental data .8 We will here show that binding specificities of σ70 DNA interaction domains are designed to prevent extensive RNAP poising in the intergenic regions , but that the number of poised promoters is still sufficient to significantly affect accuracy of core promoter searches . 
+ Surprisingly , we will find that sequences of functional -10 elements increase the extent of RNAP poising ; on the other hand , overall , the sequences in the intergenic regions have no tendency to affect RNAP poising . 
+ Though seemingly counter-intuitive , we will argue that this result fits well within the recently proposed mix-and-match model of promoter recognition .21 
+ 2 Results
+ 2.1 Design of in silico experiments
+ Our goal is to investigate how the interplay of physical interactions at the -10 promoter region provides for efficient transcription . 
+ We , consequently , systematically investigate relations between the kinetic parameters as the -10 element sequence is varied . 
+ To achieve this , we design a number of in silico experiments , where we start from a sequence of the lacUV5 promoter . 
+ This promoter has a consensus -10 element -- which is convenient as a reference for calculating kinetic parameters -- but has an imperfect -35 element as is characteristic for most functional promoters .5 In the analysis/in silico experiments presented in the following subsections , we will substitute the consensus -10 element of lacUV5 promoter with different sets of DNA segments . 
+ The biophysical model of transcription initiation8 allows the calculation of the relevant kinetic parameters for sets of DNA segments at the scale of the entire genome ( see Methods and ESI † ) . 
+ In particular , in the analysis below , we will substitute the consensus -10 element of lacUV5 promoter with : ( i ) all 6 bp long segments from E. coli intergenic regions , ( ii ) all -10 elements that correspond to experimentally detected E. coli transcription start sites , ( iii ) segments that correspond to randomized intergenic regions and randomized -10 elements of experimentally detected promoters ; the computational procedure allows randomizing DNA sequences multiple times , so that statistics of the relevant quantities can be calculated . 
+ In the analysis below , we will also address how relevant σ70 DNA-interaction domains contribute to the kinetic properties that we investigate . 
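The substitution and randomization steps listed above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' pipeline: the function names, the toy region, and the flanking strings are hypothetical placeholders (they are not the real lacUV5 or E. coli sequence), and a plain mononucleotide shuffle is assumed because the randomization scheme is not specified in this passage.

```python
import random

def six_mers(region, k=6):
    """Return every overlapping k-bp window of a DNA string."""
    return [region[i:i + k] for i in range(len(region) - k + 1)]

def substitute_element(upstream, element, downstream):
    """Place a candidate 6-bp segment between fixed flanking sequence."""
    return upstream + element + downstream

def shuffled(seq, rng):
    """Mononucleotide shuffle (assumed scheme): same bases, random order."""
    bases = list(seq)
    rng.shuffle(bases)
    return "".join(bases)

# Hypothetical placeholders, NOT real lacUV5 or E. coli sequence.
UPSTREAM, DOWNSTREAM = "NNNN", "NNNN"
toy_region = "GGTATAATCC"

# (i) every 6-bp segment of the region, each dropped into the template
windows = six_mers(toy_region)
variants = [substitute_element(UPSTREAM, w, DOWNSTREAM) for w in windows]

# (iii) repeated randomization, so that statistics can be collected
rng = random.Random(0)  # fixed seed keeps the shuffles repeatable
randomized_regions = [shuffled(toy_region, rng) for _ in range(3)]
```

Each entry of `variants` would then be passed to whatever routine computes the kinetic parameters; that routine is outside the scope of this sketch.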
+ Experimentally , contributions of different protein domains to the properties of interest would be assessed by mutating amino-acid sequences of these domains . 
+ We will computationally assess contributions of σ70 domains by randomizing interaction specificities of these domains ; similarly as with DNA sequences , we can perform multiple randomizations in order to calculate statistics of the relevant quantities . 
+ Finally , we will also substitute binding specificities of σ70 domains with binding specificities of different E. coli transcription factors , in order to ensure that the reported relationships are not a consequence of generic properties of protein -- DNA interactions . 
+ 2.2 Kinetic properties of E. coli intergenic regions
+ We start from the sequence of the lacUV5 promoter , and substitute its consensus -10 element with all 6 bp long segments from E. coli intergenic regions . 
+ For all these substitutions we calculate the relative binding affinity ( KB ) and the relative transcription initiation rate ( j ) , by using eqn ( 1 ) and ( 3 ) ( see Methods ) . 
+ The relationship between logarithms of KB and j is shown in Fig. 1A , so that the quantities on the two axes correspond to the appropriate interaction energies that determine the relevant kinetic parameters . 
+ Specifically , the horizontal axis ( log ( KB ) ) corresponds to the σ70 -- dsDNA binding energy , while the vertical axis corresponds to a combination of the energy terms that we refer to as the effective energy and which directly determines the transcription initiation rate ( see eqn ( 3 ) and ( 4 ) in Methods ) . 
+ Both KB and j , which are shown in Fig. 1A , are calculated relative to the binding affinity and the transcription initiation rate of the lacUV5 promoter . 
+ Note that we substitute ( vary ) only the -10 element of the lacUV5 promoter , and that the -10 element of this promoter corresponds to the consensus sequence ( TATAAT , positions -12 to -7 ) . 
+ Consequently , zeros on the horizontal and the vertical axes correspond to the consensus -10 element , and stronger interaction energies correspond to larger ( less negative ) values on the two axes . 
+ The horizontal line in Fig. 1 ( transcription rate threshold ) indicates the transcription rate below which transcript levels can not be detected , while the vertical line ( binding threshold ) indicates the binding affinity above which a sequence is considered to be strongly bound by RNAP . 
+ The transcription rate threshold is set based on the estimate that the minimal rate of transcription is 1/400 per second , while the transcription rate of the reference lacUV5 can be estimated at 1/3 per second .22 The binding threshold is set so that it corresponds to the binding affinity of a weak Plac promoter , with sequences of the -35 element and the -10 element that correspond , respectively , to TTTACA ( positions -36 to -31 ) and TATGTT ( positions -12 to -7 ) ; 23 this definition is in accordance with an intuitive notion that strongly bound sequences should have a larger binding affinity than a weak promoter . 
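+ As a quick check of the numbers quoted above, the following sketch expresses the transcription rate threshold relative to lacUV5, which is how the plotted quantities are defined. The variable names are illustrative, and since the base of the logarithm used in Fig. 1 is not stated here, both the natural and the base-10 logarithm are computed.

```python
import math

# Values quoted in the text (per second).
min_detectable_rate = 1 / 400   # minimal rate of transcription
lacuv5_rate = 1 / 3             # estimated rate of the reference lacUV5 promoter

# Threshold expressed relative to lacUV5, as the plotted quantities are.
relative_threshold = min_detectable_rate / lacuv5_rate   # 3/400

# The logarithm base used in the figure is an open assumption, so show both.
threshold_ln = math.log(relative_threshold)
threshold_log10 = math.log10(relative_threshold)
print(relative_threshold, threshold_ln, threshold_log10)
```

+ On either scale the threshold is negative, i.e. it lies below the zero that marks the consensus -10 element.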
+ Fig. 1A shows that there is a high positive correlation ( with a Pearson correlation coefficient of R = 0.85 ) between the transcription activity and the binding affinity for -10 elements derived from E. coli intergenic regions . 
+ One should note that the determinants of binding affinity and transcription activity are physically independent ( see the previous section ) , so the good correlation has to be due to the design of σ70 interaction domains or due to the sequence of DNA intergenic segments , which is further explored in the next subsection . 
+ However , despite this high correlation , a significant fraction of the strongly bound sequences corresponds to poised promoters : in Fig. 1 , the green dots mark strongly bound DNA segments that correspond to the functional promoters ( i.e. to sequences that are above both the binding and the transcription activity threshold ) , while the red dots mark the sequences that correspond to the poised promoters ( i.e. to sequences that are above the binding , but below the transcription activity threshold ) . 
+ One can see that a significant fraction of the strongly bound sequences ( ~30 % ) corresponds to poised promoters . 
+ Such poised promoters can be falsely identified as targets by computational and experimental searches of core promoters , which we will further discuss in the next section . 
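+ The classification described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function names are hypothetical, the inputs are assumed to be relative log values as plotted in Fig. 1, and sequences lying exactly on a threshold are (arbitrarily) counted as above it.

```python
def classify(log_kb, log_j, binding_threshold, rate_threshold):
    """Label one sequence from its relative log binding affinity and log rate."""
    if log_kb < binding_threshold:
        return "weakly bound"
    # Strongly bound: functional if the rate is detectable, poised otherwise.
    return "functional" if log_j >= rate_threshold else "poised"


def poised_fraction(points, binding_threshold, rate_threshold):
    """Fraction of strongly bound sequences that are poised."""
    labels = [classify(kb, j, binding_threshold, rate_threshold) for kb, j in points]
    strongly_bound = [label for label in labels if label != "weakly bound"]
    if not strongly_bound:
        return 0.0
    return strongly_bound.count("poised") / len(strongly_bound)
```

+ Applied to the points of Fig. 1A with the two thresholds defined above, `poised_fraction` would give the quantity reported in the text as roughly 30 %.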
+ 2.3 Analyzing the good correlation between the transcription rate and the binding affinity
+ In this subsection , we concentrate on the properties of σ70 -- DNA interactions that lead to the good correlation between the transcription activity and the binding affinity , which is observed in Fig. 1A . 
+ As discussed above , KB depends on σ70 interactions with -10 element dsDNA , while j depends on interactions of σ70 with -10 box ssDNA and on DNA melting energies .8 Since KB and j are physically independent of each other , the question is why there is a good correlation between the transcription rate and the binding affinity in Fig. 1A . 
+ The first possibility is that this good correlation is due to the sequence of E. coli intergenic regions , i.e. the presence of poised promoters is suppressed in these sequences . 
+ This possibility might be reasonable , since existence of a large number of poised promoters could be detrimental for efficient transcription initiation ( see also Discussion ) . 
+ The second possibility is that the good correlation is due to the design of σ70 DNA interaction domains ( specifically due to the binding specificities of σ70 subunits 2.3 and 2.4 ) . 
+ We test these two possibilities below . 
+ In order to generate an appropriate ensemble to test the possibility that the good correlation is due to the DNA sequence , we next randomize the DNA sequence of E. coli intergenic regions 50 times . 
+ The randomizations are performed so that frequencies of the nucleotides are preserved ( see Methods ) . 
+ We next re-calculate the correlation coefficient between the transcription rate and the binding affinity for each of the 50 randomized sequences , and obtain the mean for these 50 randomizations as R̄ = 0.84 ( the relationship between the transcription rate and the binding affinity for one such randomization is shown in ESI , † Fig. S1 ) . 
+ This value ( R̄ = 0.84 ) is only slightly smaller than the correlation coefficient for the actual E. coli intergenic regions ( R = 0.85 ) . 
+ Consequently , the design of the DNA sequence of the intergenic regions is not the reason for the high correlation between the transcription rate and the binding affinity . 
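+ The comparison above rests on a Pearson coefficient computed once per randomization and then averaged. A minimal sketch of that bookkeeping is given below; the function names are illustrative, and each randomization is assumed to be supplied as a pair of equal-length lists (log binding affinities, log transcription rates).

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def mean_r(randomized_sets):
    """Average Pearson R over a list of (log_kb_values, log_j_values) pairs."""
    rs = [pearson_r(xs, ys) for xs, ys in randomized_sets]
    return sum(rs) / len(rs)
```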
+ As the second possibility , we analyze if the high correlation is due to the design of the binding specificities of σ70 DNA interaction domains . 
+ To test this possibility , we randomize the binding specificities that correspond to σ70 subunit 2.3 ( σ70 -- ssDNA interactions ) and 2.4 ( σ70 -- dsDNA interactions ) and DNA melting energies ( see Methods ) . 
+ We first permute the two parameters that -- in the single nucleotide approximation -- characterize DNA melting ( melting energies of A : T and G : C pairs -- see Methods ) ; the effect of this permutation is shown in Fig. 1B . 
+ In Fig. 1C and D we show the effect of randomization of , respectively , σ70 binding domains 2.3 and 2.4 . 
+ Fig. 1B -- D show that ( separately ) randomizing each of the interaction energies leads to a large decrease in the correlation coefficient , and to a consequent large increase in the fraction of poised promoters ( the red dots in Fig. 1B -- D ) . 
+ In particular , note that not only randomizations of the interaction domain specificities ( Fig. 1C and D ) , but also the permutation of the melting energies ( Fig. 1B ) lead to a significant decrease in the correlation coefficient . 
+ This indicates that the reduction of RNAP poising in the genome depends on an interplay of all the relevant parameters ( i.e. on the mutual relation between ssDNA , dsDNA and melting energy parameters ) . 
+ To test statistical significance of the results , in Fig. 1C and D , we calculate correlation coefficients for 50 randomizations of ssDNA interaction parameters ( σ70 subunit 2.3 ) , and for 50 randomizations of dsDNA interaction parameters ( σ70 subunit 2.4 ) . 
+ The mean values and 95 % confidence intervals for these randomizations are shown in the histogram ( see Fig. 2 ) . 
+ For comparison , the correlation coefficients for the actual ( wild type ) interaction parameters and for the permutation of the melting parameters are also indicated . 
+ We see that all the randomizations indeed lead to a statistically significant ( and large ) decrease in the correlation coefficient . 
+ Consequently , the reduction in the number of poised promoters in the intergenic regions depends on the mutual relationship of all physical parameters that are relevant for opening the -10 element . 
+ Finally , from Fig. 2 one can also note that randomization of dsDNA interaction parameters ( σ70 domain 2.4 ) leads to an almost complete loss of the correlation . 
+ The reason for this loss is that the binding affinity depends exclusively on dsDNA interactions , while the transcription rate depends on dsDNA interactions through only one out of six bases of the -10 element ( base -12 ) ( see eqn ( 1 ) , ( 3 ) and ( 4 ) ) . 
+ Consequently , randomization of dsDNA interactions leads to an almost complete loss of the relation between the binding affinity and the transcription rate . 
+ 2.4 Substitutions of σ70 DNA interaction domains
+ In this subsection , we provide further evidence that the binding specificities of σ70 interaction domains are designed to prevent extensive RNAP poising . 
+ Specifically , while we established that the good correlation is due to the specificities of σ70 DNA-binding domains , it remains to be confirmed that the effect is not an artificial consequence of some generic property of protein -- DNA interactions . 
+ For example , such an artifact would arise if protein -- DNA binding domains would have a general tendency to recognize similar AT rich sequences . 
+ To test this , we substitute specificities of binding domains 2.3 and 2.4 with specificities of different E. coli DNA binding proteins . 
+ Parameters of protein -- DNA interactions are inferred from binding sequences assembled in DPInteract database ,24 by using the QPMEME algorithm .19 b From DPInteract database we can infer , with a high reliability , interaction specificities of 8 E. coli transcription factors ( see Methods ) . 
+ We then substitute specificities of RNAP binding domains 2.3 and 2.4 with these inferred specificities , which makes a total of 56 substitution pairs ; note that we do not allow for the same E. coli transcription factor specificity to substitute both σ70 domains 2.3 and 2.4 . 
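+ The count of 56 follows from choosing ordered pairs of distinct factors out of 8 (8 x 7 = 56). The sketch below enumerates them; the labels TF1 to TF8 are placeholders, since the eight transcription factors are not named at this point in the text.

```python
from itertools import permutations

# Placeholder labels for the 8 transcription factors (names are hypothetical).
factors = [f"TF{i}" for i in range(1, 9)]

# Ordered pairs (specificity used for domain 2.3, specificity used for domain 2.4),
# never using the same factor for both domains.
substitution_pairs = list(permutations(factors, 2))
print(len(substitution_pairs))
```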
+ For each of these substitutions we calculate the correlation between the rates of transcription and binding affinities , as described in the previous subsection . 
+ The distribution of the correlation coefficients for the substitutions is shown in Fig. 3 , and the correlation for the actual σ70 binding domains is also indicated in the figure for comparison . 
+ We see that the correlation in the case of the actual σ70 binding domains is significantly larger compared to all the substitutions , with a very high statistical significance ( P value of ~10^-24 ) . 
+ Therefore , the good correlation is not an artificial consequence of some generic property of protein -- DNA interactions , and interaction domains of RNAP are indeed `` hardwired '' so as to reduce RNAP poising in the genome . 
+ 2.5 Kinetic properties of experimentally detected σ70 promoters
+ We next investigate kinetic properties of -10 elements associated with 342 experimentally confirmed transcription start sites . 
+ Selection of the transcription start sites with experimentally confirmed transcription activity from RegulonDB database ,25 and alignment of -10 elements associated with these transcription start sites , is described in Methods . 
+ We substitute the consensus -10 element of the lacUV5 promoter with these aligned -10 elements , and for each of these substitutions we calculate the transcription rate and the binding affinity ; the obtained relationship between these two quantities is shown in Fig. 4A . 
+ One may expect that RNAP poising at the transcriptionally active sequences should be suppressed to a larger extent compared to the generic segments from the intergenic regions . 
+ However , in contrast to this expectation , we find that the correlation in the case of the transcriptionally active -10 elements is notably smaller than the correlation for the intergenic segments ( 0.75 vs. 0.84 , compare Fig. 4A with Fig. 1A ) ; to further assess this result , we analyze how the correlation changes when functional -10 elements are randomized . 
+ To obtain appropriate statistics , we randomize the set of aligned -10 elements 50 times , and then calculate the correlation coefficient for each randomization . 
+ Consistent with the result obtained above , the mean of the correlation coefficients for these randomizations is notably larger compared to the correlation for the actual -10 elements ( 0.85 vs. 0.75 ) , with a very high statistical significance ( P ~ 10^-39 ) . 
+ Therefore , the DNA sequences of the transcriptionally active -10 elements indeed significantly decrease the correlation between the transcription rate and the binding affinity , and consequently increase the extent of RNAP poising . 
+ Finally , to visualize the effect of -10 element randomization , we show the relationship between the transcription rate and the binding affinity , for one instance of -10 element randomization 
+ 2.6 Extension of the mix-and-match model to kinetic parameters
+ We here establish a connection between the surprising decrease in the correlation coefficient for functional -10 elements and a recently proposed mix-and-match model of promoter recognition .21 The mix-and-match model initially proposed that the strengths of the promoter elements , that interact with dsDNA , complement each other so as to achieve a necessary level of overall binding affinity . 
+ Subsequently , a more detailed statistical analysis showed that promoter elements match each other to achieve a necessary level of total promoter strength .26 We here consider an extension of this model to the kinetic parameters , where we propose that the binding affinity and the transition rate match each other to achieve a necessary level of transcription activity . 
+ To test such extension of the mix-and-match model , we start from the intergenic segments ( analyzed in Fig. 1A ) , and from the transcriptionally active -10 elements ( analyzed in Fig. 4A ) . 
+ From each of these two sets of sequences , we select the following two subsets : ( i ) 30 % of the sequences with the highest value of the transition rate from the closed to open complex ( kf ) and ( ii ) 30 % of the sequences with the lowest value of the transition rate . 
+ The transition rates from the closed to open complex ( kf ) are calculated according to eqn ( 2 ) ( see Methods ) . 
+ We next calculate the distribution of the binding affinities for these two subsets -- i.e. for the sequences with the high and the low values of the transition rate -- by using eqn ( 1 ) ( see Methods ) . 
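+ The subset selection described above can be sketched as follows. This is an illustration under stated assumptions: each sequence is represented by a hypothetical (log kf, log KB) pair, and the 30 % cut is rounded to the nearest whole number of sequences, a detail the text does not specify.

```python
def split_by_kf(records, fraction=0.30):
    """Return (low_kf, high_kf) subsets of (log_kf, log_kb) records."""
    ranked = sorted(records, key=lambda record: record[0])
    n = int(round(fraction * len(ranked)))
    # Slice from the end by explicit index so that n == 0 yields an empty list.
    return ranked[:n], ranked[len(ranked) - n:]


def mean_log_kb(subset):
    """Mean log binding affinity of one subset."""
    return sum(log_kb for _, log_kb in subset) / len(subset)
```

+ Comparing `mean_log_kb` for the two subsets then corresponds to the comparison of the two distributions shown in Fig. 5.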
+ For the intergenic segments , the distributions for the two subsets are shown together in Fig. 5A . 
+ Similarly , the two distributions for transcriptionally active -10 elements are shown together in Fig. 5B . 
+ In Fig. 5A , we see that , for the intergenic segments , the mean binding affinity is significantly smaller for the group with low kf values than for the group with high kf values ( P < 10^-100 ) . 
+ This property decreases the extent of RNAP poising for the intergenic segments , i.e. sequences with low values of the transition rates are generally not characterized by high values of the binding affinities . 
+ Note that this result is directly related to the high value of the correlation between the binding affinity and the transcription rate for the intergenic segments . 
+ On the other hand , for the transcriptionally active -10 elements , the distribution of the binding affinities for the group with low kf is shifted towards the stronger binding affinities , relative to the same distribution for the intergenic segments . 
+ As a consequence , for transcriptionally active -10 elements , the group of promoters with high kf values has smaller mean binding affinities compared to the group with low kf values ( with P < 0.05 ) . 
+ This result is a consequence of the decrease in the correlation coefficient between the transcription rate and the binding affinity for the transcriptionally active -10 elements relative to the intergenic segments ( Fig. 4A vs. Fig. 1A ) , and is analyzed below in terms of the mix-and-match model for promoter recognition . 
+ Though unexpected , the result in Fig. 5B is straightforward to interpret in terms of the extension of the mix-and-match model to kinetic parameters . 
+ This figure shows that KB and kf complement each other , so that lower kf is accompanied by higher KB ; this is notably different from the intergenic regions , where sequences with low kf have a tendency to have low KB . 
+ This match of the kinetic parameters for the transcriptionally active -10 elements allows us to achieve a sufficient level of transcription activity ( which is proportional to the product of KB and kf ) . 
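+ The complementarity can be made explicit on the logarithmic scale used in the figures. Taking only the proportionality stated above as given (the constant C is an unspecified prefactor introduced here for illustration):

```latex
j \propto K_B \, k_f
\quad\Longrightarrow\quad
\log j = \log K_B + \log k_f + C ,
\qquad\text{so at fixed } \log j:\quad
\Delta \log K_B = -\,\Delta \log k_f .
```

+ In other words , a promoter can hold a given transcription activity with a lower transition rate only if its binding affinity is higher by the same factor .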
+ This result , and the extension of the mix-and-match model to kinetic parameters , is further discussed in the next section . 
+ 3 Discussion
+ Interactions of σ70 with -10 promoter elements are crucial for initiation of transcription . 
+ These interactions involve σ70 binding domains that interact with dsDNA and ssDNA , as well as DNA melting energies . 
+ We here analyzed how the interplay of these interactions affects kinetics of transcription initiation . 
+ A prominent example of such kinetic effects are poised promoters , which are sequences where RNAP strongly binds to dsDNA , but transitions too slowly from the closed to the open complex to achieve detectable transcription levels . 
+ Extensive RNAP poising could be detrimental for efficient transcription , since unproductively bound RNAP can disrupt normal transcription regulation -- e.g. note that the bound RNAP molecule protects ~75 bp of DNA , which is often comparable to the size of E. coli intergenic regions .27 Such unproductive binding can also require a significantly larger RNAP production , in order to achieve a sufficiently high RNAP concentration for function of transcriptionally active promoters . 
+ Consequently , it seems plausible that specificities of different interactions and DNA sequences , which are involved in transcription initiation , are somehow tuned to prevent RNAP poising . 
+ We here investigated this possibility and showed that σ70 -- DNA interaction domains , though physically independent , are designed to reduce the extent of RNAP poising in the intergenic regions . 
+ This reduction depends on a mutual relationship between all three types of the interaction parameters ( ssDNA , dsDNA and melting energies ) , which strongly suggests that binding specificities of σ70 -- DNA interaction domains are tuned to evade a large number of poised promoters in the intergenic regions . 
+ As further evidence that reduction of RNAP poising is a major ` design ' constraint on specificities of σ70 -- DNA interaction domains , we found that the actual σ70 binding specificities lead to a much larger correlation between binding affinity and transcription rate compared with substitutions of these domains with specificities of other E. coli transcription factors . 
+ It is interesting that the reduction in the number of poised promoters depends on the binding specificities of σ70 interaction domains , rather than on the sequence of the intergenic regions . 
+ Such design may allow modularity in reduction of RNAP binding through different bacterial species : while binding specificities of σ70 interaction domains are known to be well conserved across different bacteria ,5 DNA sequences of the intergenic regions are widely different . 
+ Therefore , imposing the reduction in the number of poised promoters at the level of ( conserved ) interaction domains , rather than at the level of ( variable ) DNA sequence , provides a straightforward strategy to impose reduction of RNAP poising in diverse bacterial sequences . 
+ Furthermore , there are likely numerous simultaneous constraints on bacterial regulatory ( intergenic ) regions , since these regions must accommodate a number of functional motifs ( e.g. core promoters , transcription factor binding sites , terminators ) . 
+ Due to this , tuning the binding specificities of σ70 interaction domains may be easier than imposing the absence of poised promoters at the level of DNA sequence . 
+ The fact that σ70 interaction domains are designed to reduce the number of poised promoters implies that any DNA sequence will have a tendency for high correlation between the binding affinity and the transcription activity . 
+ Such high correlation was also observed for DNA sequences of transcriptionally active promoters . 
+ However , we found that DNA sequences of these promoters have a tendency to decrease this correlation , i.e. to increase the extent of RNAP poising . 
+ This finding is surprising , since one may expect that transcriptionally active sequences should evade RNAP poising . 
+ To better understand this result , it is useful to discuss it from the point of the recently proposed mix-and-match model of promoter recognition . 
+ This model proposes that strengths of promoter elements mix with each other , and match each other 's strengths , so as to achieve the necessary level of promoter strength .21,26 For example , a weaker -10 element may be complemented by a stronger -35 element , so that a necessary level of transcription activity is achieved .28 Actually , Fig. 4A shows that many substitutions of the -10 element of the lacUV5 promoter with -10 elements that correspond to the experimentally detected TSS fall below either the binding affinity or the transcription rate threshold . 
+ It is likely that , for a substantial number of such -10 elements , the strengths of the other elements within the promoter ( -35 element , spacer ) are adjusted ( ` matched ' ) so that the kinetic parameters for the entire promoter are above the thresholds . 
+ Furthermore , one should note that some of the known promoters depend on transcription factors in order to achieve sufficient binding affinity and transcription rate , so that their basal values of the kinetic parameters are below the relevant thresholds . 
+ We here proposed to extend the mix-and-match model to the kinetic parameters ; consequently , the observed decrease in the correlation between the binding affinity and the transcription activity can be explained by the need to match the lower transition rate from the closed to open complex with higher binding affinity . 
+ Our results show that , though statistically significant , this decrease in the correlation is still small enough not to turn a transcriptionally active promoter into a poised promoter . 
+ That is , the observed increase of RNAP poising at functional promoters is large enough to allow matching of the kinetic parameters , but not large enough to cause dysfunctional transcription . 
+ We here predicted that a significant fraction of the strongly bound sequences correspond to poised promoters . 
+ This prediction may have a direct consequence on experiments that identify transcription start sites by detecting sequences to which RNAP strongly binds , such as ChIP-chip or ChIP-seq experiments ; such measurements provide experimental strategy to detect transcription start sites on a genome-wide scale . 
+ Actually , it is interesting that the number of poised promoters , which is estimated here ( ~30 % of the strongly bound sequences ) , roughly matches the reported number of false positives in ChIP-chip experiments .16 However , care must be taken when literally comparing false positives in ChIP-chip experiments with our in silico results , due to possibly different choices of the binding thresholds . 
+ That is , the binding threshold is to a good degree provisional in ChIP-chip experiments , i.e. it depends on the signal intensity above which the sequences are considered to be targets . 
+ Therefore , the binding threshold is likely different from one ChIP-chip experiment to the other , and may also be different from the choice of binding threshold in our study . 
+ Consequently , it is likely that false positives in ChIP-chip experiments come from both sequences that are poised promoters and from technical issues such as biases in DNA amplification or imperfect immunoprecipitations of DNA fragments cross-linked to protein . 
+ Furthermore , the importance of the kinetic effects strongly suggests that they should be incorporated in bioinformatic methods for TSS detection . 
+ In fact , TSS detection in bacteria is a classic bioinformatic problem , where available methods show poor accuracy .18 b ,24,29 An alternative to current methods , which are based on information theory , is a biophysics method that would detect promoters based on the calculated transcription rate . 
+ A major difficulty in developing such a method is that interactions of σ70 with the -35 element have ( to our knowledge ) not been measured until now . 
+ Note that in our in silico experiments we varied the -10 element , while the sequence of the -35 element remained constant . 
+ While such a design is evidently useful for studying the interplay of physical interactions at the -10 element , it is not convenient for promoter detection , since promoters sample sequences with variable -35 elements . 
+ A solution to this problem may be a mixed bioinformatic and biochemical parameterization , which is the subject of our work currently in progress . 
+ In this work , we investigated kinetic effects of transcription initiation on a genome-wide scale . 
+ Such analysis is , to our knowledge , the first of its kind , since there is currently no high-throughput method for measuring kinetic parameters of transcription initiation for sequences of interest . 
+ Consequently , the kinetic parameters have to be experimentally measured through classical , but time-consuming , τ-plot measurements , individually for each sequence of interest . 
+ To overcome this difficulty , we here used a quantitative model of transcription initiation , which showed a very good agreement with experimental data , and which allows efficient calculation of the kinetic parameters . 
+ The computational procedure also allowed repeatedly altering both specificities of σ70 -- DNA interaction domains , and relevant DNA sequences , which is experimentally not feasible . 
+ We consequently designed a set of in silico experiments , which use a model of the specific biochemical process ( transcription initiation ) , in order to study kinetics of transcription initiation on a much larger ( whole genome ) scale . 
+ Through the in silico experiments we found that the extent of RNAP poising in the genome is highly suppressed , where this suppression is at the level of σ70 interaction domains , rather than the DNA sequence . 
+ However , despite this suppression , a significant fraction of the sequences that are strongly bound by RNAP correspond to poised promoters . 
+ This significant fraction of poised promoters is directly relevant for interpreting results of experimental and computational searches of transcription start sites . 
+ Furthermore , we surprisingly found that sequences of the functional promoters increase the extent of RNAP poising , which we interpreted in terms of the mix-and-match model of promoter recognition . 
+ Overall , the analysis presented here strongly suggests that the kinetic effects are important , and that they should be incorporated in methods for core promoter detection . 
+ It is likely that this will allow both increasing the accuracy of computational predictions and better understanding the results of the experimental searches . 
+ 5 Methods
+ 5.1 Calculation of the kinetic parameters
+ To calculate the relevant kinetic parameters , we use a biophysical model of transcription initiation .8 For completeness , in ESI , † we summarize elements of this model that are directly relevant for the analysis presented here . 
+ Briefly , the model is used to express the rate by which RNAP opens the two DNA strands , in terms of the interactions of σ70 with ssDNA and dsDNA , and DNA melting energies . 
+ To parameterize the model , we use a widely used independent nucleotide approximation ,30 according to which the interaction energies are given by the sum of the terms that correspond to different bases at different positions . 
+ Also , in this study we vary only the sequence of the -10 element , so that the energy terms that are associated with -35 element interactions and spacer lengths do not enter the relevant equations . 
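+ Consequently , the binding affinity KB , the rate of transition from the closed to open complex kf , and the rate of transcription initiation j are given below , respectively , by eqn ( 1 ) , ( 2 ) and ( 3 ) ( see ref. 8 and ESI † ) : 
+ $$ K_B \sim \exp\left( -\sum_{i=1}^{6} \sum_{\alpha=1}^{4} \frac{\Delta G^{(ds)}_{i,\alpha}}{k_B T} \, S_{i,\alpha} \right) \qquad (1) $$
+ [ eqn ( 2 ) and ( 3 ) did not survive text extraction ; see ref. 8 and ESI † ] 
+ where in the last equation we introduced the effective binding energy $\Delta G^{(eff)}_{i,\alpha}$ : 
+ $$ \Delta G^{(eff)}_{i,\alpha} = \begin{cases} \left( \Delta G^{(ss)}_{i,\alpha} + \Delta G^{(m)}_{\alpha} \right) / k_B T & \text{for } i \in (2, 6) \\ \Delta G^{(ds)}_{i,\alpha} / k_B T & \text{for } i = 1 \end{cases} \qquad (4) $$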
+ In the equations above , the index i denotes different positions within the -10 box , so that i = 1 corresponds to the position -12 , while i = 6 corresponds to the position -7 , relative to the transcription start site . 
+ Further , α denotes the four different bases ( A , T , C or G ) , while $S_{i,\alpha}$ is equal to one if base α is present at position i in sequence S , and is equal to zero otherwise . 
+ Furthermore , $\Delta G^{(m)}_{\alpha}$ denotes the melting energies of different bases , $\Delta G^{(ss)}_{i,\alpha}$ denotes the interaction energies of σ with different bases at different positions of the non-template strand in the open complex , and $\Delta G^{(ds)}_{i,\alpha}$ denotes the interaction energies of σ with different bases at different positions of duplex DNA for the -10 box . 
+ Note that the base -12 ( i = 1 ) appears asymmetrically in the expression for the effective energy ( see eqn ( 4 ) ) , since this is the only base of the -10 element that remains double stranded in the open complex .6 Also , note that due to the symmetry of the two DNA strands $\Delta G^{(m)}_{A} = \Delta G^{(m)}_{T}$ and $\Delta G^{(m)}_{C} = \Delta G^{(m)}_{G}$ , so that there are effectively two parameters that determine melting energy in the single nucleotide approximation . 
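+ To make the roles of the three energy matrices concrete, the sketch below evaluates the logarithms of KB and j for a 6 bp sequence in the independent nucleotide approximation. Several things here are assumptions rather than the published model: the exponential form with a minus sign (KB ~ exp of minus the summed dsDNA energies, j ~ exp of minus the summed effective energies), energies expressed in units of kBT, and the numerical values, which are made up for illustration and are not measured parameters.

```python
import random

BASES = "ACGT"


def log_kb(seq, dg_ds):
    """log K_B up to a constant: minus the summed dsDNA energies (units of kBT)."""
    return -sum(dg_ds[i][BASES.index(base)] for i, base in enumerate(seq))


def log_j(seq, dg_ds, dg_ss, dg_m):
    """log j up to a constant, from the effective energy described in the text:
    the dsDNA term at i = 1 (position -12), ssDNA + melting terms at i = 2..6."""
    total = dg_ds[0][BASES.index(seq[0])]
    for i in range(1, 6):
        a = BASES.index(seq[i])
        total += dg_ss[i][a] + dg_m[a]
    return -total


# Made-up numbers for illustration only.
rng = random.Random(0)
dg_ds = [[rng.uniform(0, 2) for _ in BASES] for _ in range(6)]
dg_ss = [[rng.uniform(0, 2) for _ in BASES] for _ in range(6)]
dg_m = [1.0, 2.0, 2.0, 1.0]  # order A, C, G, T; A = T and C = G by strand symmetry

consensus = "TATAAT"
variant = "TATGTT"
rel_log_kb = log_kb(variant, dg_ds) - log_kb(consensus, dg_ds)
rel_log_j = log_j(variant, dg_ds, dg_ss, dg_m) - log_j(consensus, dg_ds, dg_ss, dg_m)
print(rel_log_kb, rel_log_j)
```

+ Both quantities are reported relative to the consensus sequence, mirroring how the axes of Fig. 1 are defined. Note that `log_j` reads the dsDNA matrix at the first position only, which is the asymmetry of base -12 described above.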
+ 5.2 Alignment of -10 promoter elements
+ To align -10 elements , we use the assembly of transcription start sites from RegulonDB database .25 This assembly includes both experimentally verified promoters and computational predictions , and corresponds to both σ70 and alternative σ factors . 
+ For our alignment , we select only experimentally verified σ70 transcription start sites , i.e. we disregard all transcription start sites that are either not experimentally validated , or correspond to alternative σ factors . 
+ This selection results in the total of 342 σ70 transcription start sites , and we use the obtained start sites in order to extract DNA segments that correspond to positions -17 to -2 , relative to the transcription start sites . 
+ These positions were chosen having in mind that the position of the -10 element can deviate by up to 5 bp relative to its canonical position ( -12 to -7 ) .31 
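+ The coordinate bookkeeping above is easy to get wrong by one base, so a small sketch may help. It assumes a forward-strand promoter, a 0-based index for the transcription start site (position +1), and the usual convention that there is no position 0; the function names are illustrative and reverse-strand handling is omitted.

```python
def upstream_segment(genome, tss_index):
    """Bases at positions -17 to -2 for a forward-strand TSS.

    tss_index is the 0-based index of position +1; with no position 0,
    position -k sits at index tss_index - k.
    """
    return genome[tss_index - 17:tss_index - 1]


def candidate_windows(segment, width=6):
    """All 6 bp windows in the segment, keyed by the position of their first base."""
    return [(-17 + offset, segment[offset:offset + width])
            for offset in range(len(segment) - width + 1)]
```

+ The 16 bp segment contains 11 candidate windows, starting at positions -17 through -7; the canonical -10 element is the one starting at -12, with five possible shifts on either side.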
+ To identify the 6 bp long 10 elements within the selected DNA segments , we used the Gibbs sampler .32 The Gibbs sampler implements a version of the Gibbs search algorithm ,33 which is used to perform unsupervised motif alignment . 
+ Only the DNA strand defined by the direction of transcription was searched , since neither the -10 box nor the -35 box motif is palindromic . 
+ The search was done with the initial assumption that one motif element is present in each DNA segment ; however , at the end of the Gibbs sampler search , individual motif elements are added in or taken out , in a single pass of the algorithm , depending upon whether or not their inclusion improves the value of the alignment score . 
+ The last step allows excluding from the alignment those sequences that do not contain -10 box motifs , e.g. due to database misassignments . 
+ The search resulted in the identification of 322 aligned -10 boxes that correspond to the experimentally confirmed σ70 transcription start sites in E. coli ; these aligned -10 elements were used in the further analysis . 
+ 5.3 Randomization of interaction specificities and DNA segments 
+ We aim to randomize the interaction specificities , without changing the overall strength of σ70 -- DNA interactions . 
+ To achieve this , it is useful to visualize the interaction parameters in the form of a matrix , where index i corresponds to different positions within the -10 element , while index a corresponds to the four different bases . 
+ Overall interaction strength for energy matrix $\epsilon_{i,a}$ can be defined as $\sum_{i,a} \epsilon_{i,a}^{2}$ .19b Consequently , to randomize the interaction specificities , we randomly permute elements of the interaction matrix , which randomizes the interaction specificity but does not change $\sum_{i,a} \epsilon_{i,a}^{2}$ . 
+ In order to obtain statistics for quantities of interest , we randomize a given matrix 50 times , according to the procedure described above . 
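The permutation step above can be sketched in a few lines. This is a minimal illustration, not the authors' code: the 6 x 4 matrix values are made up, and the function names `permute_energy_matrix` and `interaction_strength` are hypothetical. It shows that shuffling all matrix entries leaves the sum of squared energies unchanged.

```python
import random


def permute_energy_matrix(matrix, rng):
    """Return a copy of `matrix` with all entries randomly permuted."""
    n_cols = len(matrix[0])
    flat = [value for row in matrix for value in row]
    rng.shuffle(flat)
    # Rebuild rows of the original width from the shuffled entries.
    return [flat[k:k + n_cols] for k in range(0, len(flat), n_cols)]


def interaction_strength(matrix):
    """Overall interaction strength: sum of squared matrix entries."""
    return sum(value ** 2 for row in matrix for value in row)


# Hypothetical energy matrix: 6 positions (rows) x 4 bases (columns).
rng = random.Random(0)
energy = [[rng.uniform(-1.0, 1.0) for _ in range(4)] for _ in range(6)]

# 50 independent randomizations, as in the text.
randomized = [permute_energy_matrix(energy, rng) for _ in range(50)]
```

Because only the positions of the entries change, any quantity that depends on the multiset of entries (here the sum of squares) is preserved up to floating-point rounding.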
+ In order to randomize the interactions corresponding to DNA melting , we simply permute energies that correspond to AT ( $\Delta G^{(m)}_{A} = \Delta G^{(m)}_{T}$ ) and GC base pairs ( $\Delta G^{(m)}_{G} = \Delta G^{(m)}_{C}$ ) . 
+ This procedure results in a single randomization , and is a consequence of the fact that in the single nucleotide approximation there are only two parameters that describe DNA melting ( see above ) . 
+ We randomize DNA sequences , i.e. intergenic regions and -10 elements that correspond to the experimentally confirmed transcription start sites ( see above ) , by randomly permuting the bases within the sequences . 
+ Note that such randomization preserves nucleotide ( GC ) content of the sequences . 
+ Similar to σ70 -- DNA interaction domains , to obtain appropriate statistics we randomize a given DNA sequence 50 times . 
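A short sketch of the sequence randomization described above, under the assumption that "permuting the bases" means a uniform shuffle of the characters in each sequence. The example segment and the name `shuffle_sequence` are illustrative only.

```python
import random
from collections import Counter


def shuffle_sequence(seq, rng):
    """Randomly permute the bases of `seq`; base composition is unchanged."""
    bases = list(seq)
    rng.shuffle(bases)
    return "".join(bases)


rng = random.Random(1)
segment = "TTGACATATAATGCGCGTACG"  # made-up DNA segment, not from RegulonDB

# 50 randomized copies of the same segment, as in the text.
shuffled = [shuffle_sequence(segment, rng) for _ in range(50)]
```

Since a shuffle only reorders characters, the counts of A, C, G and T (and therefore the GC content) are identical in every randomized copy.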
+ 5.4 Interaction parameters for E. coli transcription factors
+ We use protein -- DNA interaction parameters that were obtained in ref. 19b . 
+ These interaction parameters were inferred from E. coli transcription factor binding sites which were assembled in DPInteract database .24 The interaction parameters were inferred from the example binding sites by using the QPMEME ( Quadratic Programming Method of Energy Matrix Estimation ) algorithm . 
+ To ensure a high accuracy of the inferred protein -- DNA interaction parameters , we select those transcription factors ( i.e. their corresponding interaction parameters ) , for which the following two conditions are satisfied : ( i ) the number of the example binding sites assembled in DPInteract database is larger than 10 , ( ii ) over representation for the transcription factor is also larger than 10 . 
+ The first condition ensures that too few example binding sites do not lead to overfitting of the interaction parameters . 
+ The second condition ( over representation ) is related to a measure of significance/functionality of the inferred interaction parameters .19b This procedure results in selection of the interaction parameters for eight E. coli transcription factors . 
+ We then use the inferred interaction parameters for the selected E. coli transcription factors in order to substitute interaction specificities of σ2.3 ( σ70 -- ssDNA interactions ) and σ2.4 ( σ70 -- dsDNA interactions ) binding domains . 
+ A technical difficulty is that the length of σ2.3 and σ2.4 binding sites ( 5 bps and 6 bps , respectively ) is generally different ( shorter ) than the length of binding sites of the selected E. coli transcription factors . 
+ To resolve this difficulty , we select a subset of adjacent positions that correspond to maximal binding specificity within the interaction domain of each transcription factor ; the length of the selected adjacent positions corresponds to the length of σ2.3 or σ2.4 binding positions ( i.e. 5 bps or 6 bps ) . 
+ To select the adjacent positions with maximal specificity , we use a definition of the binding specificity $s_i$ at position i of the energy matrix $\epsilon_{i,a}$ : $s_i = \sum_{a} \epsilon_{i,a}^{2}$ . 
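The window selection can be sketched as follows. This assumes that "maximal binding specificity" of a block of adjacent positions means the largest sum of the per-position specificities s_i; the text does not spell this out, so treat it as an assumption. The toy matrix and the names `position_specificity` and `best_window` are hypothetical.

```python
def position_specificity(matrix):
    """s_i = sum over bases a of (energy at position i, base a) squared."""
    return [sum(e ** 2 for e in row) for row in matrix]


def best_window(matrix, width):
    """Return (start, sub-matrix) of the `width` adjacent positions whose
    summed specificity is largest (assumed selection criterion)."""
    s = position_specificity(matrix)
    if width > len(s):
        raise ValueError("window is longer than the binding site")
    starts = range(len(s) - width + 1)
    # On ties, max() keeps the first (leftmost) window.
    start = max(starts, key=lambda k: sum(s[k:k + width]))
    return start, matrix[start:start + width]


# Toy 5-position matrix; rows are positions, columns are bases A, C, G, T.
toy = [
    [0.1, 0.0, 0.0, 0.0],
    [1.0, 0.0, 0.0, 0.0],
    [2.0, 0.0, 0.0, 0.0],
    [0.5, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
]
start, window = best_window(toy, 2)
```

For a real transcription factor matrix, `width` would be 5 or 6 to match the lengths of the two binding domains named in the text.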
\ No newline at end of file
--- a/data/TEXT_FILES/notuseful_txt/23580539.txt 0 → 100644
View file @27818a9
+++ b/data/TEXT_FILES/notuseful_txt/23580539.txt 0 → 100644
View file @27818a9
+ High-Resolution Mapping of In vivo Genomic Transcription Factor Binding Sites Using In situ DNase I Footprinting and ChIP-seq
+ Graduate School of Biological Sciences , Nara Institute of Science and Technology , 8916-5 , Takayama , Ikoma , Nara 630-0192 , Japan1 ; Department of Life Science and Informatics , Maebashi Institute of Technology , 460-1 , Kamisadori , Maebashi-City , Gunma , Japan2 ; Plant Global Education Project , Graduate School of Biological Sciences , Nara Institute of Science and Technology , 8916-5 , Takayama , Ikoma , Nara 630-0192 , Japan3 and School of Biosciences , The University of Nottingham , Sutton Bonington Campus , Sutton Bonington , Loughborough , Leicestershire LE12 5RD , UK4 
+ * To whom correspondence should be addressed . 
+ Tel. +81-743-72-5431 ( S.I. and T.O. ) . 
+ Fax. +81-743-72-5439 ( S.I. and T.O. ) . 
+ Email : shu@bs.naist.jp ( S.I. ) ; taku@bs.naist.jp ( T.O. ) . 
+ Abstract
+ Accurate identiﬁcation of the DNA-binding sites of transcription factors and other DNA-binding proteins on the genome is crucial to understanding their molecular interactions with DNA . 
+ Here , we describe a new method : Genome Footprinting by high-throughput sequencing ( GeF-seq ) , which combines in vivo DNase I digestion of genomic DNA with ChIP coupled with high-throughput sequencing . 
+ We have determined the in vivo binding sites of a Bacillus subtilis global regulator , AbrB , using GeF-seq . 
+ This method shows that exact DNA-binding sequences , which were protected from in vivo DNase I digestion , were resolved at a comparable resolution to that achieved by in vitro DNase I footprinting , and this was simply attained without the necessity of prediction by peak-calling programs . 
+ Moreover , DNase I digestion of the bacterial nucleoid resolved the closely positioned AbrB-binding sites , which had previously appeared as one peak in ChAP-chip and ChAP-seq experiments . 
+ The high-resolution determination of AbrB-binding sites using GeF-seq enabled us to identify bipartite TGGNA motifs in 96 % of the AbrB-binding sites . 
+ Interestingly , in a thousand binding sites with very low-binding intensities , single TGGNA motifs were also identiﬁed . 
+ Thus , GeF-seq is a powerful method to elucidate the molecular mechanism of target protein binding to its cognate DNA sequences . 
+ Key words : GeF-seq ; ChIP-seq ; AbrB ; Bacillus subtilis 
+ 1. Introduction
+ Genome-wide mapping of the in vivo DNA-binding sites of transcription factors or other DNA-binding proteins , either by Chromatin Immunoprecipitation coupled with microarray ( ChIP-chip ) 1 or by the more recently developed ChIP coupled with high-throughput sequencing ( ChIP-seq ) method , has become a widely used technique in protein -- DNA interaction research .2 -- 5 The resolution of the DNA-binding sites determined by ChIP-seq was a dramatic improvement on the resolution that was possible using ChIP-chip , because of the higher resolution of high-throughput sequencing compared with oligonucleotide arrays . 
+ However , for both techniques , the DNA fragments , co-puriﬁed with the target protein ( ChIP-DNA ) , are generated by sonication and generally fall within the size range of 100 -- 500 bp . 
+ These sonicated fragments are often much longer than the actual protein-binding site and , thus , the sequence tags of the ChIP-DNA distribute in broad regions around the actual binding sites . 
+ In addition , as only the terminal sequences of ChIP-DNA fragments can be obtained by high-throughput sequencing , piled ChIP-seq tags on the forward ( + ) and reverse ( - ) strands usually show bimodal peaks .6,7 To overcome these problems and determine the actual protein-binding sites to within a few 10 bp , algorithms for the processing of ChIP-seq data have been proposed , although the results obtained by them are still predictive .6 -- 9 Thus , more precise experimental mapping methods are required to determine the exact binding sites of DNA-binding proteins using ChIP-seq technology . 
+ Recently , the ChIP-exo method , which trims the 5'-region of the protein-unbound region of ChIP-DNA by the use of 5' -- 3' lambda ( λ ) exonuclease , has been developed , and this method demonstrated an improvement in resolution in determining the DNA-binding sites of target eukaryotic proteins through the determination of the edge positions of protein-bound genomic sequences .10 In contrast to DNA exonucleases , DNase I preferentially cleaves endogenous DNA regions that are not protected by bound proteins and , thus , has been employed for in vitro footprinting to precisely determine the DNA-binding sites of DNA-binding proteins .11 Using DNase I digestion , Vora et al. 12 proposed a method , designated in vivo protein occupancy display ( IPOD ) , which visualizes the in vivo binding profile of total DNA-binding proteins on genomic DNA .12 In this method , genomic DNA cross-linked with total proteins and extracted from formaldehyde-treated cells was digested with DNase I , and the DNase I-resistant DNA fragments were purified by phenol extraction and mapped using a tiling array . 
+ We report here a novel method designated as Genome Footprinting by high-throughput sequencing ( GeF-seq ; in vivo GeF-seq ) . 
+ This method combines in situ DNase I digestion of bacterial genomic DNA with a modiﬁed ChIP-chip method ( ChAP-chip , Chromatin Afﬁnity Precipitation-chip ) we previously developed .13 Unlike IPOD , GeF-seq can visualize the binding proﬁle of a speciﬁc target protein at a resolution seen at the in vitro footprinting level . 
+ We evaluated the resolution achieved using the GeF-seq method by examining the binding proﬁle of the Bacillus subtilis transition state regulator , AbrB , in comparison with results obtained by ChAP-chip and a modiﬁed ChIP-seq method ( ChAP coupled with high-throughput sequencing ) utilizing sonication to fragment the genomic DNA . 
+ AbrB represses the expression of many genes during exponential growth , and we have demonstrated using ChAP-chip that AbrB binds to hundreds of sites throughout the entire B. subtilis genome during exponential growth .14 AbrB is a small protein ( 10.4 kDa ) , having a unique structure . 
+ The N-terminal domains of two AbrB molecules form a single DNA-binding domain , and AbrB forms a tetramer having a stable DNA-binding ability , via both N-terminal and C-terminal interactions . 
+ Structural modelling of AbrB bound to the target sequence indicated that the AbrB tetramer would interact with 20 bp sequences ,15 whereas in vitro footprinting studies detected a wider range of binding regions from 25 to 80 bp , suggesting that a higher order structure of the AbrB tetramer may be involved in DNA binding at some sites on the chromosome .16 -- 18 We previously proposed that AbrB binds to bipartite TGGNA motifs based on the in vivo AbrB-binding regions determined by ChAP-chip analysis ,14 which is in accordance with a motif identified by the in vitro SELEX method .17 However , the consensus sequence was detected in a small number of AbrB-binding regions , and the consensus DNA-binding sequence for AbrB is not completely understood at present . 
+ We demonstrate here that , by mapping the sequences of short DNA fragments co-puriﬁed with AbrB after in situ DNase I digestion of the genomic DNA , the AbrB-binding proﬁle could be visualized with a resolution comparable with that of in vitro footprinting . 
+ Importantly , the BiPad web server for modelling bipartite sequence elements19 automatically detected consensus sequences for AbrB binding in >95 % of the experimentally determined binding sites . 
+ Moreover , highly accurate DNA-binding site information obtained by GeF-seq enabled us to obtain a comprehensive view of the correlation between AbrB-binding signals and cognate recognition sequences ; AbrB not only binds to bipartite motifs in sequences with high binding signals , but also to single-sequence motifs in sequences with low signals . 
+ These results demonstrate the usefulness of the GeF-seq method . 
+ 2. Materials and methods
+ 2.1. Bacterial strain
+ Bacillus subtilis strain OC001 expressing C-terminal 2HC ( 12 histidines plus a chitin-binding domain ) - tagged AbrB ( AbrB-2HC ) was used throughout .14 
+ 2.2. ChAP-chip and ChAP-seq
+ ChAP-chip data for AbrB binding on the B. subtilis genome were taken from our previous report .14 DNA fragments for ChAP-seq analysis were prepared , as previously described .13,14 Construction of the DNA library for Illumina sequencing was as described below except for the size of the DNA fragments used : 250 bp fragments , corresponding to 150 bp DNA fragments isolated by ChAP without adapter sequences , were selected for PCR enrichment . 
+ 2.3. In situ DNase I digestion of genomic DNA
+ The GeF-seq method is schematically illustrated in Fig. 1A . 
+ To cross-link protein -- DNA complexes , 400 ml of OC001 ( abrB-2HC ) cells grown to the exponential phase in Luria-Bertani medium at 37 °C were treated with formaldehyde as previously described .14 To hydrolyze the cell wall without osmotic burst , cells were treated with 5 mg/ml lysozyme in 3 ml of isotonic sucrose-malate-magnesium buffer ( 0.02 M maleic acid , 0.5 M sucrose , and 0.02 M MgCl2 , pH 6.5 adjusted with NaOH ) 20 in the presence of 1 mM phenylmethylsulfonyl fluoride ( PMSF ) . 
+ After 20-min incubation at 37 °C with mixing , cells were collected by centrifugation at 6000 g for 5 min at 4 °C . 
+ Cells were resuspended in 0.5 ml of a buffer containing 0.1 M Tris -- HCl ( pH 7.5 ) , 0.2 M NaCl , 1 % ( v/v ) Triton X-100 , 0.1 % ( w/v ) Na-deoxycholate , 0.2 % ( w/v ) Brij 58 , and 20 % ( v/v ) glycerol . 
+ To determine suitable conditions for in situ DNase I digestion of genomic DNA , four samples of OC001 cells were prepared as described above and mixed with 10 ml of RNase A ( 10 mg/ml ) and 50 ml of a solution containing 100 mM MgCl2 and 50 mM CaCl2 . 
+ DNase I digestion was started with the addition of 0.5 , 0.3 , 0.2 , and 0.1 units ( U ) of DNase I ( corresponding to a final concentration of 1 , 0.6 , 0.4 , and 0.2 U/ml ) ( Takara ) and incubated at 37 °C with shaking ( 230 rpm ) for 30 min . 
+ The reaction was terminated by urea denaturation upon the addition of 3 ml of urea-Triton buffer [ 0.1 M 4-(2-hydroxyethyl)-1-piperazineethanesulfonic acid ( pH 7.5 ) , 0.01 M imidazole , 8 M urea , 0.5 M NaCl , 1 % Triton X-100 , 10 mM β-mercaptoethanol , and 1 mM PMSF ] instead of ethylenediaminetetraacetic acid , which severely inhibits protein purification by Dynabeads TALON ( Invitrogen ) . 
+ The samples were then sonicated on ice using an Astrason Ultrasonic Processor XL ( Misonix ) for 10 min ( 4 s ` on ' and 10 s ` off ' , at output level 5 ) . 
+ After centrifugation to remove cell debris , 30 ml of the supernatant was mixed with 70 ml of M-wash buffer ( 0.1 M Tris -- HCl , pH 7.5 , 1 % sodium dodecyl sulfate , 0.01 M dithiothreitol ) and incubated at 65 °C overnight to reverse the cross-linking . 
+ After the removal of proteins by phenol -- chloroform -- isoamyl alcohol treatment , DNA was recovered by ethanol precipitation in the presence of glycogen , resuspended in 50 ml of nuclease-free water and run on a 2 % agarose gel ( Fig. 1B ) . 
+ Treatment with 0.5 units of DNase I ( 1 U/ml ) generated DNA fragments <100 bp in size , and incubation with higher amounts of DNase I resulted in a decrease in the amount of DNA detected by agarose gel electrophoresis ( data not shown ) . 
+ Thus , we selected 0.5 units ( 1 U/ml ) of DNase I for further analysis . 
+ 2.4. Affinity purification of DNA fragments bound to AbrB
+ AbrB -- DNA complexes were affinity-purified from the clarified DNase I-treated cell lysate , using Dynabeads TALON as described previously ,13,14 but with the following modification : after protein -- DNA complexes were purified and reverse cross-linked by heat treatment at 65 °C overnight , proteins were removed using two phenol -- chloroform -- isoamyl alcohol extractions , and DNA fragments were recovered by ethanol precipitation in the presence of glycogen . 
+ 2.5. Sequencing of DNA fragments co-purified with AbrB
+ The DNA library for sequencing by the Illumina Genome Analyzer IIx ( GAIIx ) was generated using the NEBNext DNA Sample Prep Reagent kit ( New England BioLabs ) according to manufacturer 's instructions for ` Preparing Samples for Sequencing Genomic DNA ' ( Illumina ) with the following modification : after ligation of the adapters to the DNA fragments , the ligated product was run on a 2 % [ Tris-acetate-EDTA ( TAE ) ] low-range agarose gel ( Bio-Rad ) at 50 V for 2.5 h in TAE buffer and the region of the gel 150 bp ( although the DNA was not visible on the gel ) , corresponding to 50 bp fragments without adapter sequences , was excised . 
+ The DNA fragments were then purified using a QIAquick Gel Extraction kit ( Qiagen ) and amplified using 14 cycles of PCR , to obtain at least 1 fmol of DNA library . 
+ The amount of DNA was determined by an Agilent 2100 Bioanalyzer using the High-Sensitivity DNA Kit ( Agilent ) . 
+ The sequence of the library was then determined by 75-bp single-ended sequencing using the Illumina GAIIx sequencer according to the manufacturer 's instructions . 
+ 2.6. Mapping of read sequences and normalization of tag counts
+ A total of 10 369 855 read sequences obtained from the Illumina GAIIx were mapped on the reference genome ( B. subtilis str. 168 , NC_000964.3 ) , and the mapping results were visualized using the mpsmap and psmap software ( http://metalmine.naist.jp/maps/gefseq ) , respectively .21 Because DNA fragments of 50 bp ( without adapter sequences for PCR amplification ) were selected in the sample preparation process to obtain complete sequences of the ChAP-DNA fragments , most of the reads reached into the adapter sequence attached to the 3'-end of ChAP-DNA . 
+ Thus , unlike general Illumina sequencing results obtained by following the instruction manual , most of the read sequences consisted of 50 bp of ChAP-DNA sequence followed by the adapter sequence , and both of these sequences varied in length . 
+ Since mapping of such different lengths of sequence containing the unmappable adapter sequence was not possible using a standard sequence mapping/assembly program , we utilized the property of mpsmap that maps different length sequences to the best chromosomal location , while allowing up to a speciﬁed number of mismatches without a gap . 
+ In this study , the read sequences were initially mapped allowing a maximum of 35 mismatches , and the adapter sequences were ﬁnally removed . 
+ As a result of the ﬁrst mapping , 9 685 519 ( 93 % ) of the read sequences were uniquely mapped to the reference genome . 
+ ( Thus , the genomic regions encoding the 10 rRNA operons were not included in the present analysis . ) 
+ Then , to remove the adapter sequences , the starting positions were assigned to seven or more bases allowing a two-base mismatch matched with the 5'-end of the primer sequence ( AGATCGGAAGAGCTCGTATGCCGTCTTCTGCTGA ) in the 3'-region of the read sequences . 
+ In addition , mapped sequences ( without adapter sequences ) with >2 bp mismatches against the reference sequence were removed , and 8 571 055 ( 83 % ) of the read sequences remained for further analysis . 
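The adapter search described above might look like the following sketch. It is a simplified reading of the text, not the mpsmap implementation: it scans a read from left to right and reports the first position from which the rest of the read (at least seven bases) matches the start of the primer sequence with at most two mismatches. The function names and the toy read are hypothetical.

```python
ADAPTER = "AGATCGGAAGAGCTCGTATGCCGTCTTCTGCTGA"  # primer sequence quoted in the text


def find_adapter_start(read, adapter=ADAPTER, min_overlap=7, max_mismatches=2):
    """Return the first index where the read's 3' part matches the adapter's
    5' end over >= min_overlap bases with <= max_mismatches, else None."""
    for start in range(len(read) - min_overlap + 1):
        overlap = min(len(read) - start, len(adapter))
        mismatches = sum(
            1 for a, b in zip(read[start:start + overlap], adapter) if a != b
        )
        if mismatches <= max_mismatches:
            return start
    return None


def trim_adapter(read, **kwargs):
    """Return the insert, i.e. the read without its adapter-derived 3' part."""
    start = find_adapter_start(read, **kwargs)
    return read if start is None else read[:start]


# Toy 75-base read: a made-up 50 bp insert followed by 25 adapter bases.
insert = "ACGT" * 12 + "AC"
read = insert + ADAPTER[:25]
trimmed = trim_adapter(read)
```

Reads in which no such match exists are returned unchanged, which corresponds to inserts long enough that the read never reaches the adapter.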
+ Finally , in order to normalize the difference in the local copy number of genomic DNA , counts of mapped reads at each nucleotide position along the genome sequence were linearly scaled by using the oriC/terC ratio ( 5.15 ) , estimated by sequencing and mapping of whole genomic DNA fragments digested by DNase I , to define the AbrB-binding signals ( Supplementary Fig. S1 ) . 
+ Results shown in Supplementary Fig. S1 suggested that there was preferential digestion of AT-rich genomic sequences by DNase I. However , mapping results of the distribution of ChAP-DNA sequences suggested that this preference apparently did not affect the quantitative estimation of the AbrB-binding profile ( Supplementary Fig. S2 ) . 
+ 2.7. Detection of protein-binding regions
+ Most of the read sequences were mapped on distinct regions along the genome surrounded by regions where ends of the read sequences accumulated ( Fig. 3C ) . 
+ We used this feature to define the AbrB-binding sites . 
+ To estimate the end points of the genomic sequences in the read sequences more precisely , we reanalysed them so that adapter sequences at the 3'-ends of the sequences could be subtracted from the genomic DNA they had been attached to during generation of the library for sequencing . 
+ We assigned sequences as ` adapter sequences ' when five bases at the 3'-end of the read sequence were identical to the adapter primer sequence and the following sequences matched to the primer sequence with no more than two bases mismatched . 
+ The accumulation profile of the 3'-ends thus determined across the genome sequence was similar to that of 5'-ends , which was defined as the first base of the read sequences ( Supplementary Fig. S3 ) , strongly suggesting that the procedure to estimate the 3'-ends of read sequences was reliable . 
+ Then , the left ends of the read sequences relative to the reference genome sequence were defined as a sum of the 5'-ends of read sequences mapped on the plus strand and the 3'-ends of read sequences mapped on the minus strand , whereas the right ends were defined as a sum of the 3'-ends of read sequences mapped on the plus strand and the 5'-ends of read sequences mapped on the minus strand ( Supplementary Fig. S3 ) . 
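The definition of left and right ends above can be written down directly. In this sketch each mapped insert is represented, as an assumption, by a tuple of its 5'-end coordinate, its 3'-end coordinate and its strand; the function name `end_profiles` and the toy reads are illustrative.

```python
from collections import Counter


def end_profiles(reads):
    """Count left and right ends per genome position.

    `reads` is an iterable of (five_prime, three_prime, strand) tuples in
    reference coordinates, with strand "+" or "-".
    """
    left, right = Counter(), Counter()
    for five_prime, three_prime, strand in reads:
        if strand == "+":
            # Plus strand: the 5'-end is the left end, the 3'-end the right end.
            left[five_prime] += 1
            right[three_prime] += 1
        elif strand == "-":
            # Minus strand: the 3'-end is the left end, the 5'-end the right end.
            left[three_prime] += 1
            right[five_prime] += 1
        else:
            raise ValueError(f"unknown strand: {strand!r}")
    return left, right


# Toy reads: two inserts covering 100..149 (one per strand), one covering 101..150.
toy_reads = [(100, 149, "+"), (149, 100, "-"), (101, 150, "+")]
left_ends, right_ends = end_profiles(toy_reads)
```

With this convention the left end of an insert is always its smallest reference coordinate, regardless of which strand the read mapped to.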
+ We counted the numbers of left and right ends mapped to each nucleotide , and positions with ≥10 ends of read sequences and with the highest number of ends within ±30 bp windows were determined in 1 bp steps , as candidates for the left and the right boundaries of the DNA-binding sequences . 
+ Then , we extracted regions surrounded by a pair of possible left and right boundaries positioned within a range from 25 to 80 bp ( considering the in vitro AbrB footprinting results ) , and regions , where AbrB-binding signal intensities exceeded a threshold value at more than half of the nucleotides between them , were extracted as AbrB-binding sites . 
+ In this study , we first extracted AbrB-binding sequences using the signal intensity corresponding to the top 10th percentile of all nucleotides across the genome as the threshold ( Supplementary Fig. S4 ) . 
+ At some regions , different combinations of boundaries surrounding the overlapping sequences satisﬁed the criteria . 
+ In such cases , the innermost sequences were selected as AbrB-binding sequences for further analysis . 
+ Finally , the average of the AbrB-binding signals within the individual binding sequence was calculated , as the AbrB-binding signal intensity of each binding site . 
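The boundary search and site extraction can be sketched as below. This is a rough illustration under stated assumptions rather than the published pipeline: the minimum of 10 ends and the 30 bp half-window are taken as parameters, ties within a window are all kept as candidates, and the final selection of the innermost sequence among overlapping candidates is omitted. All names and the toy data are hypothetical.

```python
def boundary_candidates(counts, min_ends=10, window=30):
    """Positions with at least `min_ends` read ends that also hold the
    highest end count within `window` bp on either side."""
    candidates = []
    for pos, n in counts.items():
        if n < min_ends:
            continue
        neighbourhood = (counts.get(p, 0) for p in range(pos - window, pos + window + 1))
        if n >= max(neighbourhood):
            candidates.append(pos)
    return sorted(candidates)


def call_sites(left_counts, right_counts, signal, threshold, min_len=25, max_len=80):
    """Pair left and right boundaries 25-80 bp apart and keep regions where
    more than half of the positions exceed `threshold`.

    Returns (left, right, mean_signal) tuples."""
    sites = []
    rights = boundary_candidates(right_counts)
    for left in boundary_candidates(left_counts):
        for right in rights:
            length = right - left + 1
            if not (min_len <= length <= max_len):
                continue
            values = [signal.get(p, 0.0) for p in range(left, right + 1)]
            above = sum(v > threshold for v in values)
            if above > length / 2:
                sites.append((left, right, sum(values) / length))
    return sites


# Toy data: one clear site from position 100 to 140 with a flat signal of 5.0.
toy_left = {100: 20, 110: 12}   # 110 is overshadowed by 100 within 30 bp
toy_right = {140: 25}
toy_signal = {p: 5.0 for p in range(100, 141)}
toy_sites = call_sites(toy_left, toy_right, toy_signal, threshold=1.0)
```

In a genome-wide run, `threshold` would be the signal value at the top 10th percentile of all nucleotide positions, as stated in the text.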
+ 2.8. Motif analysis
+ AbrB-binding DNA motifs were analysed by the BiPad web server ( http://bipad.cmh.edu ) for modelling bipartite sequence elements .19 The BiPad program performs multiple local alignment by entropy minimization and cyclic refinement using a stochastic greedy search strategy , and we used the following settings : left half-site , gap range lengths , right half-site , and the iteration cycles were set to 9 , 0 or 1 , 9 , and 500 , respectively . 
+ To examine whether the AbrB-binding motif could have been discovered by chance , we prepared three sets of data , each consisting of 300 50 bp sequences randomly selected from the B. subtilis 168 genome sequence by the RSA tool ,22 and analysed them with BiPad . 
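For illustration, drawing such control sets could be sketched as follows. The authors used the RSA tool for this step; the code below is only a stand-in that samples windows uniformly from a sequence string, and the toy genome, seed and function name `sample_windows` are assumptions.

```python
import random


def sample_windows(genome, n_windows, length, rng):
    """Draw `n_windows` random substrings of `length` bp from `genome`."""
    if length > len(genome):
        raise ValueError("window is longer than the genome")
    starts = [rng.randrange(len(genome) - length + 1) for _ in range(n_windows)]
    return [genome[s:s + length] for s in starts]


rng = random.Random(42)
# Toy stand-in for the genome sequence; a real run would load NC_000964.3.
toy_genome = "".join(rng.choice("ACGT") for _ in range(5000))

# Three independent control sets of 300 sequences of 50 bp each, as in the text.
control_sets = [sample_windows(toy_genome, 300, 50, rng) for _ in range(3)]
```

Each control set can then be submitted to the same motif search as the experimentally derived binding sequences.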
+ 2.9. Sequencing data
+ Sequencing data in this study have been submitted to the DDBJ Sequence Read Archive ( DRA ) and the BioProject database under accession codes DRA000758 and PRJDB675 , respectively . 
+ 3. Results
+ 3.1. In vivo GeF-seq
+ To improve the resolution of protein-binding site determination by ChIP-seq or ChAP-seq methodologies , we attempted in situ DNase I digestion of the cross-linked bacterial nucleoid to restrict the size of DNA fragments co-puriﬁed with the target protein to directly interacting sequences ( Fig. 1A ) . 
+ We employed B. subtilis AbrB as a model protein , whose binding sites were recently determined by use of the ChAP-chip method to be >600 sites scattered across the genome .14 Exponentially growing B. subtilis OC001 cells expressing C-terminal 2HC ( 12 histidines plus a chitin-binding domain ) - tagged AbrB ( AbrB-2HC ) were treated with formaldehyde to stabilize the protein -- DNA interactions by cross-linking , and the collected cells were treated with lysozyme in isotonic buffer to facilitate an efficient penetration of DNase I into cells . 
+ Then , the genomic DNA was fragmented to <100 bp by the DNase I treatment , followed by the affinity purification of the cross-linked AbrB -- DNA complexes using cobalt-coated magnetic beads . 
+ DNA fragments co-puriﬁed with AbrB ( ChAP-DNA ) were isolated after reversing the cross-linking between proteins and DNA . 
+ As we intended to obtain whole sequences of ChAP-DNA to avoid the bimodal distribution of sequence tags , DNA fragments containing 50 bp of inserted DNA , after ligation of adapter sequences , were selected to prepare the library for high-throughput sequencing by Illumina GAIIx . 
+ It has been demonstrated that AbrB interacts with 20 bp sequences15 and , thus , we also expected that 50 bp fragments would be enough to cover single AbrB-binding sites . 
+ Single-ended 75-bp sequencing by the Illumina GAIIx generated 9 685 519 ( uniquely mapped ) sequence reads . 
+ As expected , most of the read sequences ( 88.5 % ) contained the adapter sequences for PCR amplification at the 3'-end portion , with an average insert size of 50 bp after their removal ( Supplementary Fig. S5 ) , and insert sequences were mapped on distinct sites on the B. subtilis genome . 
+ Then , counts of the mapped reads at each nucleotide position along the genome were normalized for differences in the local copy number of genomic DNA , to deﬁne the AbrB-binding signals . 
+ 3.2. Comparison of the distribution of AbrB-binding signals determined by GeF-seq , ChAP-seq , and ChAP-chip
+ To evaluate the resolution of the GeF-seq method in identifying genomic protein-binding sites , we initially compared the distributions of AbrB-binding signals along the genome as determined by three methods : GeF-seq , ChAP-seq , and ChAP-chip . 
+ The distributions of the AbrB-binding signals on the genome determined by GeF-seq and ChAP-seq in the present study were highly consistent with that of ChAP-chip we reported previously .14 Typical examples of the comparison are presented in Fig. 2 , and the complete profiles of the binding signals across the genome obtained by the three methods are available in Supplementary Fig. S2 . 
+ Close-up views of proﬁles of AbrB-binding signals ( Fig. 3A and B ) indicated that although the ChAP-seq method improved the resolution of detection of the binding regions compared with ChAP-chip , the GeF-seq method dramatically improved the resolution even when compared with ChAP-seq . 
+ Importantly , GeF-seq could resolve the closely positioned binding sites that appear as one peak in the ChAP-seq method , as shown in Fig. 3A . 
+ Using ChAP-seq , binding sites were often detected as two broad peaks on the forward ( + ) and reverse ( - ) strands , as previously reported .6 In contrast , using GeF-seq , the distributions of sequence tags on the plus and minus strands overlapped in the middle of the two ChAP-seq peaks ( Fig. 3B ) . 
+ Thus , the use of short DNA fragments enabled the conclusive determination of protein-bound regions of DNA without the necessity for the bioinformatic prediction of the binding sites . 
+ In addition , AbrB-binding signals at each binding site generally distributed in a trapezoid form , and the ends of the read sequences accumulated at the left and right edges ( Fig. 3C ) . 
+ These observations strongly suggested that in situ DNase I digestion occurred at the boundaries of protein-binding sites , as observed in in vitro DNase I footprinting . 
+ Furthermore , the lengths of sequences protected from DNase I digestion ( 27 -- 80 bp ) suggested that these sequences would be interacting with one to three AbrB tetramer ( s ) . 
+ We used this feature to deﬁne AbrB-binding sequences , as described below . 
+ 3.3. Determination of AbrB-binding sequences
+ To automatically extract DNA sequences bound by AbrB from the GeF-seq results , we developed an analytical pipeline as described in Materials and methods . 
+ Brieﬂy , we ﬁrst surveyed pairs of nucleotide positions showing the highest accumulation of ends of read sequences , as candidates for the borders of the protein-binding regions . 
+ Then , AbrB-binding signals between them were evaluated using a relaxed threshold value , corresponding to the signal intensity at the top 10th percentile of all nucleotide positions across the genome ( Supplementary Fig. S4 ) . 
+ This resulted in 5897 possible AbrB-binding sites being detected ( Supplementary Table S2 ) , which included not only specific , but also non-specific , AbrB-binding sites . 
+ These were extracted and ranked by their average binding signal intensities of nucleotides included in each site . 
+ The peak ID was given from 1 to 5897 by their intensity ranked from high to low , respectively . 
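The ranking and numbering of sites is straightforward to sketch. The tuples and the name `assign_peak_ids` below are illustrative assumptions; each site is given as its left boundary, right boundary and average binding signal.

```python
def assign_peak_ids(sites):
    """Rank (left, right, mean_signal) sites from high to low signal and
    number them from 1, returning (peak_id, left, right, mean_signal)."""
    ranked = sorted(sites, key=lambda site: site[2], reverse=True)
    return [(peak_id, *site) for peak_id, site in enumerate(ranked, start=1)]


# Toy sites with made-up coordinates and signal values.
toy_sites = [(100, 140, 3.2), (900, 950, 8.1), (500, 560, 5.0)]
peaks = assign_peak_ids(toy_sites)

# The strongest sites can then be taken as a slice, e.g. the top two here.
top_two = peaks[:2]
```

Applied to the full list, slicing the first 700 entries would give the high-intensity subset examined first in the text.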
+ The top 700 binding sites accompanied by high-binding signal intensities were first examined , because this number was approximately similar to that obtained by previous ChAP-chip analysis ( 694 binding sites ) .14 The length of the AbrB-binding regions determined by the GeF-seq ranged from 27 to 79 bp ( Supplementary Fig. S6 ) , which was consistent with the results of in vitro footprinting experiments listed in a database of transcriptional regulation in B. subtilis23 and in recent reports .17,18,24 Among 32 AbrB-binding sites previously determined by in vitro DNase I footprinting , our GeF-seq experiment detected 11 AbrB-binding sequences ( Fig. 3C and Supplementary Table S1 ) . 
+ GeF-seq also detected 10 AbrB-binding sequences within the 5897 possible AbrB-binding sites , although binding intensities were lower than those of the top 700 binding sites ( Supplementary Table S1 ) . 
+ Thus , we found that these 21 binding sites matched those obtained by in vitro DNase I footprinting . 
+ These results indicate that our GeF-seq method has the ability to detect protein-bound DNA sequences with a resolution comparable with that of the in vitro footprinting method , although differences in boundaries are observed between our GeF-seq result and the in vitro footprinting result , which may result from differences in conditions between in vivo and in vitro experiments . 
+ 3.4. Identification of consensus sequences for the AbrB binding
+ In previous ChAP-chip analysis ,14 we found a possible consensus sequence for AbrB binding to be TNCCA -- 4 bp -- TGGNA , which is composed of a pair of two AbrB-binding motifs previously identified by the in vitro SELEX method .17 However , those motifs were detected in a limited number of AbrB-bound sequences . 
+ In addition , we found that not only TNCCA -- 4 bp -- TGGNA , but also other bipartite TGGNA motifs , in palindromic or tandem orientation , separated by 4 -- 5 bp were enriched in AbrB-bound DNA sequences on the B. subtilis genome . 
+ In the present GeF-seq analysis , the lengths of automatically extracted AbrB-binding sequences were restricted to an in vitro DNase I footprinting level . 
+ Thus , we expected that the large amount of precise information on AbrB-binding sequences might give us a clear view on the consensus AbrB-binding sequence . 
+ We then utilized the BiPad web server , a web interface to predict sequence elements embedded within unaligned sequences , to analyse the experimentally derived AbrB-binding sequences . 
+ BiPad predicts various pairs of bipartite motifs with different gaps in different orientations as one consensus sequence .19
+ BiPad successfully identified a mixture of bipartite TGGNA motifs in 96 % ( 678 ) of the 700 experimentally identified sequences ( Fig. 4A ) , and we found that we could classify them into six patterns by manually sorting the predicted consensus in each AbrB-binding sequence ( Fig. 4B and Supplementary Table S3 ) . 
+ As a result , consensus sequences were found to be composed of bipartite TGGNA motifs separated by 4 or 5 bp AT-rich sequences arranged in direct , reverse direct , inverted , and everted repeat orientations ( Fig. 4B ) . 
+ Importantly , the location of the consensus sequence was usually close to the middle of the experimentally identified binding sequences ( Fig. 4C ) . 
+ Thus , we not only confirmed that the AbrB-binding consensus sequence we proposed previously was indeed detectable in almost all of the AbrB-binding sequences with high binding signals , but we also demonstrated that the information on the protein-binding sequences automatically extracted by the GeF-seq analysis enabled us to clearly identify a consensus sequence for protein binding , at least in the case of AbrB . 
+ It should also be noted that no clear consensus sequence was detected in the remaining 22 sequences , although a degenerate single TGGNA motif was detected ( data not shown ) . 
+ Since AbrB binding to these sequences was clearly detected with high signal intensity ( Supplementary Fig. S2 and Supplementary Table S2 ) , this result indicates that AbrB also binds to sequences without bipartite motifs by some mechanism , for example , when the DNA sequence forms structure ( s ) to fit the AbrB-binding surface . 
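The bipartite consensus described above (two TGGNA half-sites, each on either strand, separated by a 4 or 5 bp spacer) can be illustrated with a small pattern scan. This is only a sketch and not the BiPad algorithm: the function name `find_bipartite_sites` is hypothetical, the AT-richness of the spacer is not enforced, and the arrangements are labelled by their half-site sequences rather than by the paper's direct/inverted/everted terms, since the mapping between the two is not stated in the text.

```python
import re

# TGGNA and its reverse complement TNCCA, written as regex character classes.
HALF_SITES = {"TGGNA": "TGG[ACGT]A", "TNCCA": "T[ACGT]CCA"}


def find_bipartite_sites(seq, spacers=(4, 5)):
    """Return sorted (start, left_half, spacer_len, right_half, matched_text) tuples.

    Every combination of half-site orientations is tried for each spacer length.
    A lookahead is used so that overlapping candidate sites are all reported.
    """
    seq = seq.upper()
    hits = []
    for left_name, left in HALF_SITES.items():
        for right_name, right in HALF_SITES.items():
            for gap in spacers:
                pattern = re.compile(f"(?=({left}[ACGT]{{{gap}}}{right}))")
                for match in pattern.finditer(seq):
                    hits.append(
                        (match.start(), left_name, gap, right_name, match.group(1))
                    )
    return sorted(hits)


if __name__ == "__main__":
    # Toy sequence, not taken from the B. subtilis genome.
    for hit in find_bipartite_sites("TACCAAATTTGGTA"):
        print(hit)
```

A scan like this only flags exact matches to the pattern; degenerate half-sites such as those reported for weaker binding sites would need a scoring model rather than a regular expression.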
+ 3.5 . Correlation between AbrB-binding signals and motif discovery
+ Here , using 700 AbrB-binding DNA sequences with high GeF-seq AbrB-binding signals , we identified bipartite AbrB-binding motifs across the Bacillus genome arranged in any orientation with a 4 - or 5-bp spacing , which is consistent with our previous ChAP-chip analysis14 and the in vitro SELEX results reported by Xu and Strauch .17
+ These results strongly suggested that , when binding signal intensities are high , the sequences are those specifically recognized by AbrB . 
+ We usually use a threshold value to discriminate ` real ' protein-binding peaks and possible ` artificial ' binding peaks in ChIP-chip and ChIP-seq experiments . 
+ However , these threshold values have been operationally defined by researchers , for example to remove false positives or to remove false negatives , and whether the extracted sequences reflect actual protein binding has rarely been examined thoroughly . 
+ The finding that we could identify AbrB-binding consensus sequences in almost all of the binding sequences accompanied with high binding signals in GeF-seq prompted us to comprehensively examine whether there was conservation of the binding motifs in sequences with lower binding signals . 
+ To this end , we divided the 5897 possible AbrB-binding sequences into 20 sets each containing 300 sequences , according to average AbrB-binding signal intensities ( Supplementary Table S2 and Supplementary Fig. S4C ) , and the consensus sequence for each set was extracted by the BiPad program ( Fig. 5 ) . 
+ In the three datasets with the top AbrB-binding signal intensities ( 1 -- 300 , 301 -- 600 , and 601 -- 900 ) , the bipartite TGGNA motifs with a 4 - or 5-bp spacer sequence were detected in almost all sequences ( 98 , 96 , and 96 % respectively , Fig. 5 ) . 
+ In the next group of datasets with lower binding signal intensities ( 901 -- 1200 , 1201 -- 1500 , and 1501 -- 1800 ) , although the consensus sequences containing bipartite TGGNA motifs with a 4 - or 5-bp spacer sequence could be detected , one half-site became degenerate . 
+ Interestingly , in the following 10 sets ( from 1801 to 4800 ) , only a single TGGNA motif was detected , whereas , in the remaining four sets with the lowest binding signal intensities ( from 4801 to 5897 , Fig. 5 ) , the single motif becomes very degenerate . 
+ We confirmed that the TGGNA motif was not detected by chance because no motif was detected in similar sets of DNA sequences ( 300 50-bp sequences ) that were randomly extracted from the genome sequence of B. subtilis 168 ( data not shown ) . 
+ These results strongly suggested that AbrB not only binds stably , with high experimentally derived binding signals , to bipartite TGGNA motifs , but also interacts with single TGGNA motifs in sequences with lower but significant experimentally derived signal intensities ( Supplementary Fig. S4 ) . 
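The ranking-and-binning step used in section 3.5 can be sketched as follows. The helper name `rank_and_bin` and the `(site_id, mean_signal)` tuple layout are assumptions made for illustration; the authors' actual processing scripts are not described in the text.

```python
def rank_and_bin(sites, bin_size=300):
    """Sort (site_id, mean_signal) pairs by descending signal and cut into bins.

    Each bin holds `bin_size` sites except possibly the last one,
    which receives whatever remains.
    """
    ranked = sorted(sites, key=lambda site: site[1], reverse=True)
    return [ranked[i:i + bin_size] for i in range(0, len(ranked), bin_size)]


if __name__ == "__main__":
    # Synthetic signals only; 5897 is the number of candidate sites in the text.
    synthetic = [(f"site_{i}", float(i)) for i in range(5897)]
    bins = rank_and_bin(synthetic)
    print(len(bins), len(bins[0]), len(bins[-1]))
```

Under this simple slicing, 5897 sites give 19 full sets of 300 plus a final set of 197, which matches the ranges quoted in the text (the last group runs from 4801 to 5897 across four sets).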
+ Discussion
+ We demonstrate here that , by mapping the sequences of 50 bp fragments co-purified with AbrB after in situ DNase I digestion of genomic DNA , in vivo AbrB-binding sites could be determined with a resolution comparable with that of in vitro footprinting . 
+ Furthermore , comprehensive and precise information on the DNA sequences that AbrB binds gave us a clear view of AbrB binding on the B. subtilis genome -- it would stably bind to bipartite TGGNA motifs , but it also interacted with many single TGGNA motifs on the genome . 
+ In vitro DNase I footprinting is currently one of the most widely used methods to determine at high resolution the precise DNA sequences bound by transcription factors and other DNA-binding proteins . 
+ However , this method is laborious and can be performed against only a few DNA sequence targets in one experiment . 
+ In addition , the synthetic conditions under which DNase I footprinting assays are conducted risk producing artifactual results for several reasons , e.g. the use of purified proteins that are not modified as would occur in vivo and may not work in the same way , the low ionic strength of solutions used in in vitro footprinting experiments that may allow non-specific DNA -- protein interactions , the use of short DNA sequences that may lack the secondary structure of DNA found in vivo , experiments conducted at non-physiological temperatures , and the absence of essential effectors , which may impair the specific binding of the protein to the corresponding DNA sequence . 
+ In contrast , in the GeF-seq method , DNA -- protein interactions in the nucleoid are stabilized in the living cells by formaldehyde treatment , and then DNA digestion is carried out in situ to retain the native DNA-binding state of the target protein . 
+ Thus , the GeF-seq method identifies the actual DNA-binding sequences of target proteins across the whole genome simultaneously , with minimal risk of artefacts , and at high resolution , which is comparable with that of in vitro footprinting . 
+ In analysing the resolution of the method , we confirmed that 21 AbrB-binding regions we found using GeF-seq were consistent with in vitro footprinting results that have been reported previously ( Supplementar single TGGNA motifs in possible low-affinity AbrB-binding sites , that would be generally ignored as non-specific , using the binding-site prediction software . 
+ Such low-affinity AbrB binding may not be biologically important , but it is possible that those binding sites may have a role to concentrate AbrB molecules on the nucleoid to increase the chance of finding high-affinity binding sites , which are directly involved in gene regulation .25
+ We usually use a threshold value to discriminate ` real ' protein-binding peaks on the genome and possible ` artificial ' binding peaks in ChIP-chip and ChIP-seq experiments . 
+ However , our results clearly demonstrated that the use of threshold values could discard important information . 
+ Our results suggest that comprehensive and precise information on protein-binding sequences obtained by GeF-seq analysis , in combination with the identification of consensus sequences in them , would give us a clear and comprehensive view of protein binding on the genome . 
+ Specifically , we clearly demonstrate here that the consensus sequence for the high-affinity AbrB binding is composed of bipartite TGGNA motifs gapped by a 4 - or 5-bp AT-rich sequence arranged in direct , reverse direct , inverted , and everted repeat orientations . 
+ This result is consistent with a previous in vitro SELEX study ,17 and our informatics analysis showing that various bipartite motifs are enriched in AbrB-binding regions determined by ChIP-chip .14
+ Thus , the GeF-seq results reported here show , for the first time , the highly flexible proposed consensus sequences , which are actually recognized by AbrB molecules in vivo . 
+ Previous structural modelling of AbrB bound to the target DNA sequence indicated that the AbrB tetramer would interact with 20 bp sequences ,15 whereas in vitro footprinting studies detected a wider range of binding regions from 25 to 80 bp . 
+ In this study , GeF-seq also detected a similar range of AbrB-binding regions from 27 to 80 bp in size . 
+ When the positions of the bipartite motifs within the binding sequences are depicted , the motifs are usually located in the middle of the binding sequences , but some are not centrally located in the long binding sequences ( Fig. 4C , Supplementary Table S3 ) . 
+ Interestingly , we observed that the binding region is generally composed of multiple TGGNA motifs almost covering the full length of the sequenced region ( data not shown ) , suggesting that higher oligomers of AbrB may interact with multiple TGGNA motifs . 
+ Here , we have demonstrated that GeF-seq is a powerful tool for helping to understand the in vivo distribution of DNA-binding proteins on the genome . 
+ However , several issues remain to be explored , in order to fully establish the GeF-seq method . 
+ ( i ) We have not yet examined how DNase I digestion conditions would affect the results , although the results shown in Supplementary Figs S1 and S2 suggest that the GeF-seq results would be robust against changes in DNase I digestion conditions . 
+ ( ii ) We empirically selected criteria to map read sequences and to define protein-binding sites on the genome . 
+ Further improvements in the sequence data processing algorithms are desirable to automate this process . 
+ ( iii ) The BiPad program outputs one consensus sequence for each input sequence , and a method to identify multiple motifs in each sequence is desirable . 
+ ( iv ) GeF-seq data suggest that protein-binding signal intensities to the genome should correlate with protein-binding affinities to the cognate target sequences , but this needs to be shown experimentally . 
+ ( v ) GeF-seq has successfully determined protein-binding sites across a bacterial genome , but examination of whether this method is applicable for much larger genomes of higher organisms is necessary . 
+ Acknowledgements : We are grateful to Dr C. Bi for useful advice on motif analysis using the BiPad web server . 
+ Supplementary data : Supplementary Data are available at www.dnaresearch.oxfordjournals.org . 
+ Funding
+ This work has been supported by the Advanced Low Carbon Technology Research and Development Program ( ALCA ) of the Japan Science and Technology Agency ( JST ) and a UK Royal Society International Joint Project to T.O. and J.L.H. Interactions between the authors ' laboratories have been facilitated by a BBSRC/JST Japan Partnering Award .
\ No newline at end of file
--- a/data/TEXT_FILES/notuseful_txt/23717649.txt 0 → 100644
View file @27818a9
+++ b/data/TEXT_FILES/notuseful_txt/23717649.txt 0 → 100644
View file @27818a9
+ Genome-Wide Analysis of the Salmonella Fis Regulon
+ Abstract 
+ Fis , one of the most important nucleoid-associated proteins , functions as a global regulator of transcription in bacteria that has been comprehensively studied in Escherichia coli K12 . 
+ Fis also influences the virulence of Salmonella enterica and pathogenic E. coli by regulating their virulence genes ; however , the relevant mechanism is unclear . 
+ In this report , using combined RNA-seq and chromatin immunoprecipitation ( ChIP ) - seq technologies , we first identified 1646 Fis-regulated genes and 885 Fis-binding targets in the S. enterica serovar Typhimurium , and found a Fis regulon different from that in E. coli . 
+ Fis has been reported to contribute to the invasion ability of S. enterica . 
+ By using cell infection assays , we found it also enhances the intracellular replication ability of S. enterica within macrophages , which is of central importance for the pathogenesis of infections . 
+ Salmonella pathogenicity islands ( SPI ) -1 and SPI-2 are crucial for the invasion and survival of S. enterica in host cells . 
+ Using mutation and overexpression experiments , real-time PCR analysis , and electrophoretic mobility shift assays , we demonstrated that Fis regulates 63 of the 94 Salmonella pathogenicity island ( SPI ) -1 and SPI-2 genes , by three regulatory modes : i ) binds to SPI regulators in the gene body or in upstream regions ; ii ) binds to SPI genes directly to mediate transcriptional activation of themselves and downstream genes ; iii ) binds to the gene encoding OmpR , which affects SPI gene expression by controlling SPI regulators SsrA and HilD . 
+ Our results provide new insights into the impact of Fis on SPI genes and the pathogenicity of S. enterica . 
+ Introduction
+ Bacterial regulators are broadly classified into two groups , global and local , depending on the number of genes the regulator targets [ 1 ] . 
+ Nucleoid-associated proteins ( NAPs ) are notable among the global regulators . 
+ Most NAPs possess an ability to alter the trajectory of the DNA molecule by bending , wrapping or bridging it , which influences the transcription of numerous genes by changing the global DNA structure [ 2 -- 5 ] . 
+ In addition , some NAPs also regulate specific genes by different mechanisms such as interacting with RNA polymerase and other proteins [ 6 ] . 
+ Fis , one of the best-studied NAPs , was first identified as a stimulator of inversion of the Hin invertible DNA element in Salmonella enterica serovar Typhimurium [ 7 -- 10 ] . 
+ Fis has been studied intensively from the perspective of gene regulation and has been reported to regulate gene expression by modulating the level of DNA supercoiling in the cell and interacting with RNA polymerase at the position of its binding site [ 2,11 -- 14 ] . 
+ The effects of Fis on gene transcription have been mainly studied in E. coli using transcriptomics analysis and chromatin immunoprecipitation ( ChIP ) analysis [ 3,4,15,16 ] . 
+ More than 900 genes were found to be regulated by Fis during the exponential growth stage in E. coli [ 3,4,17 -- 19 ] . 
+ Genes up-regulated by Fis are involved in translation , flagellar biosynthesis and energy metabolism , while down-regulated genes are involved in stress responses , amino acid and nucleotide biosynthesis , and nutrient transport [ 4,20,21 ] . 
+ More than 1000 Fis-binding regions were determined , and several Fis-binding motifs were identified [ 3,4,16,22,23 ] . 
+ By comparing the genes bound by Fis with the genes regulated by Fis in E. coli , it was found that only a small proportion was present in both , indicating that most genes were indirectly influenced by Fis [ 3,4 ] . 
+ A global role for Fis in transcriptional regulation in S. enterica serovar Typhimurium has been studied using microarrays , and Fis was found to influence 291 genes during the exponential stage [ 7 ] . 
+ Fis-binding sites on several genes such as rpoS and gyrB in S. enterica , through which Fis regulates these genes , have also been identified [ 16 ] . 
+ However , a genome-wide analysis of Fis-binding sites in S. enterica has not yet been reported . 
+ It can be speculated that the Fis-binding regions in S. enterica are different from those of E. coli for two major reasons . 
+ First , there are marked differences in the genomes of these two species . 
+ For instance , approximately 29 % of the genes in S. enterica serovar Typhimurium LT2 ( including those of pathogenicity islands , functional prophages , and plasmids , most of which are closely associated with pathogenesis ) , are absent from E. coli K12 [ 24,25 ] . 
+ Furthermore , the genome regions present in both species share on average only 80 -- 85 % identity at the nucleotide level [ 24 ] . 
+ Second , the DNA supercoiling levels differ between the two species , and Fis binds to DNA to play an important role in the homeostasis of supercoiling [ 15,26 -- 28 ] . 
+ Besides its role in global regulation , the roles of Fis in the regulation of virulence properties in E. coli and S. enterica have been previously reported [ 7,29 -- 31 ] . 
+ For example , Fis was reported to influence the transcription of the virulence genes at the locus of enterocyte effacement ( LEE ) in enteropathogenic E. coli and therefore , to affect the invasion ability of the pathogen [ 29 ] . 
+ S. enterica serovar Typhimurium is a facultative intracellular pathogen that infects intestinal epithelial cells , is subsequently internalized by macrophages , and then rapidly disseminates through the bloodstream , accumulating in mesenteric lymph nodes . 
+ It causes food-borne gastroenteritis in millions of people worldwide . 
+ Furthermore , its invasion process is mediated by a type III secretion system ( TTSS ) , which is encoded by Salmonella pathogenicity islands ( SPI ) -1 , and TTSS encoded by SPI-2 is responsible for delivering effector proteins to the host cell , which facilitates S. enterica survival and replication in host cells [ 25,32 -- 34 ] . 
+ Besides SPI-1 and SPI-2 , other SPIs such as SPI-3 , SPI-4 and SPI-5 also contribute to host cell invasion and intracellular pathogenesis [ 35 -- 38 ] . 
+ Most of the genes in SPI-1 , SPI-2 , SPI-4 , SPI-5 , and some genes in SPI-3 have been found to be positively regulated by Fis [ 7 ] , although the underlying regulatory mechanism remains to be clarified . 
+ In this study , we determined the genome-wide distribution of Fis-binding regions in S. enterica LT2 , analyzed the regulation of global gene transcription by Fis , and identified the molecular mechanisms by which Fis acts to influence virulence of LT2 . 
+ A lower degree of concordance ( 23 % ) in the Fis-binding regions of S. enterica LT2 and E. coli K12 was found , and a new Fis-binding motif was identified in LT2 that differed from the K12 form . 
+ A large proportion ( 65 % ) of Fis-binding genes was positively regulated by Fis , which is different from the effect of Fis in E. coli . 
+ In addition , we found that Fis up-regulated cobalamin ( B12 ) biosynthesis genes by controlling the B12 regulator gene , pocR . 
+ Using cell invasion assays , we showed that Fis enhances invasion and intracellular replication ability of LT2 within the host cell . 
+ Combining the results of ChIP-seq , RNA-seq , real-time PCR ( RT-PCR ) , mutation , and cell infection experiments , we showed that Fis influences the expression of 63 of the 94 SPI-1 and SPI-2 genes , which are responsible for the invasion and intracellular replication of LT2 . 
+ Three regulatory modes were characterized by which Fis controls SPI gene transcription : i ) Fis binds to and activates SPI regulator genes ( hilC and ssrA ) ; ii ) Fis binds directly to SPI genes to enhance the transcription of these genes and those downstream ; iii ) Fis enhances the expression of the global regulator gene ompR [ 39 ] , which induces the expression of SPI positive regulator genes ( hilD and ssrA ) . 
+ Materials and Methods
+ Bacterial Strains and General Growth Conditions
+ Strains used in this work are listed in Table S1 . 
+ Luria-Bertani broth and agar ( 15 g/L ) were used for routine growth . 
+ Where necessary , antibiotics were used at the following final concentrations : ampicillin ( 100 µg/mL ) , chloramphenicol ( 15 µg/mL ) , and kanamycin ( 50 µg/mL ) . 
+ Construction of LT2 Mutant and FLAG-tagged Strains
+ The Δfis strain was constructed by substitution of fis with a chloramphenicol acetyltransferase gene ( cat ) using the phage lambda Red recombination system [ 40 ] . 
+ The fis-FLAG strain was constructed by substitution of the fis termination codon with the 3×FLAG epitope and a chloramphenicol resistance cassette amplified from the plasmid pLW1600F [ 41 ] using the same recombination system . 
+ The ompR mutant strains were generated by substitution of ompR with a kanamycin resistance cassette using the Red recombination system in LT2 and Δfis . 
+ For the overexpression of ompR , the ompR PCR product was digested with the restriction enzymes EcoRI and BamHI , and ligated into the low-copy plasmid pWSK129 [ 42 ] . 
+ The plasmid pLW1599 containing the cloned ompR gene was then transferred into Δfis [ 42 ] . 
+ Deletions of Fis-binding sites in genes invE , invC and spaO in LT2 were made by substitutions of the corresponding sites with a kanamycin resistance gene ( kan ) using the Red recombination system . 
+ The relevant controls were constructed by the insertion of kan upstream of the Fis-binding sites on genes invE , invC and spaO , respectively , in LT2 . 
+ Deletions of genes flhD , fruR , fucR , gutM , pocR and prpR in LT2 wild-type and Δfis were generated , respectively , by substitutions of the corresponding sites with kan using the Red recombination system . 
+ All primers designed for deletion and verification tests are shown in Table S2 . 
+ ChIP
+ The S. enterica LT2 fis-FLAG strain was used to perform all ChIP-seq experiments . 
+ Cells were grown aerobically at 37 °C to mid-exponential phase ( OD600 of approximately 0.6 ) . 
+ Formaldehyde was then added to a final concentration of 1 % . 
+ After 25 min of incubation at room temperature , 0.5 M glycine was added for a further 5 min to quench the unused formaldehyde . 
+ Cross-linked cells were harvested and washed three times with ice-cold Tris-buffered saline ( TBS ) . 
+ Cells were resuspended in 1 mL of lysis buffer composed of 50 mM Tris-HCl ( pH 7.5 ) , 100 mM NaCl , 1 mM EDTA , 1 mM protease inhibitor cocktail ( Sigma-Aldrich ) , 20 mg/mL lysozyme and 0.1 mg/mL RNase A . 
+ The cells were incubated for 30 min at 37 °C and 1 mL 2× immunoprecipitation ( IP ) buffer ( 100 mM Tris-HCl ( pH 7.5 ) , 200 mM NaCl , 1 mM EDTA , 2 % ( v/v ) Triton X-100 ) was added . 
+ The lysate was then sonicated ( Hielscher ) to an average size of approximately 250 bp with 20 cycles of 30 s on/off at 95 % amplitude . 
+ Insoluble cell debris was removed by centrifugation at 22,000 RCF for 10 min at 4 °C , and the supernatant was split into two 900 µL aliquots . 
+ The remaining 200 µL was kept to check the size of the DNA fragments . 
+ Each 900 µL aliquot was incubated with 30 µL Dynabeads Protein A ( Invitrogen ) on a rotary shaker for 1 h at room temperature to remove non-specifically binding complexes . 
+ The supernatant was then collected and incubated with 50 µL Protein A , pre-diluted with PBST ( PBS buffer at pH 7.4 , 0.02 % Tween 20 ) , as mock-IP sample . 
+ For the IP sample , FLAG mouse monoclonal antibody ( Sigma-Aldrich ) was added to the supernatant . 
+ Both samples were incubated on a rotary shaker at 4 °C for 4 h , and washed once with IP buffer , once with IP buffer + 500 mM NaCl , once with wash buffer ( 10 mM Tris-HCl buffer at pH 8.0 , 250 mM LiCl , 1 % [ v/v ] Triton X-100 , and 1 mM EDTA ) , and once with TE buffer ( pH 7.5 ) , in that order . 
+ After removing the TE buffer , the beads were resuspended in 200 µL elution buffer ( 50 mM Tris-HCl buffer at pH 8.0 , 10 mM EDTA , and 1 % SDS ) and eluted at 65 °C for 20 min . 
+ DNA was purified and recovered by standard phenol-chloroform extraction and ethanol precipitation with 5 µg of glycogen ( Invitrogen ) . 
+ RNA Extraction
+ To prepare cells for RNA extraction , 100 mL of fresh LB was inoculated from an overnight culture ( 1:200 ) and incubated at 180 rpm at 37 °C . 
+ S. enterica LT2 and the Δfis strain were collected at mid-exponential phase ( OD600 = 0.6 ) . 
+ RNA was extracted using TRIzol Reagent ( Invitrogen ) according to the manufacturer 's protocol . 
+ RNA samples were further purified using the RNeasy Mini Kit ( Invitrogen ) . 
+ The bacterial 23S and 16S rRNA was then depleted using the MicrobExpress Kit ( Invitrogen ) . 
+ RNA quality was determined using a Bioanalyser ( Thermo ) and by visualization following 1 % agarose gel electrophoresis . 
+ RNA was quantified 
+ Library Construction and Solexa Sequencing
+ Library construction of immunoprecipitated DNA samples was carried out using the Next DNA Sample Prep Master Mix Set 1 Kit ( NEB ) following the manufacturer 's instructions . 
+ DNA samples were purified using the QIAquick PCR Purification Kit ( QIAGEN ) and the QIAquick Gel Extraction Kit ( QIAGEN ) after each manipulation step . 
+ Samples were loaded at a concentration of 8 pM . 
+ RNA library construction was carried out using the mRNA-Seq 8-Sample Prep Kit ( Illumina ) according to the manufacturer 's protocol . 
+ Samples were loaded at a concentration of 10 pM . 
+ RT-PCR for ChIP-seq and RNA-seq Validation
+ To measure the enrichment of the Fis-binding targets in the immunoprecipitated DNA samples , RT-PCR was performed using the 7300 Fast Real-Time PCR systems ( Applied Biosystems ) . 
+ IP or mock-IP DNA ( 1 µL ) was used as a template , and the amplifications were performed using specific primers ( Table S2 ) and SYBR mix ( QIAGEN ) . 
+ To measure gene transcription in different strains , RT-PCR was carried out using specific primers based on targeted genes . 
+ Total RNA ( 1.0 µg ) was reverse transcribed to generate cDNA as the template for RT-PCR . 
+ The RT-PCR conditions were as follows : 25 µL SYBR mix ( QIAGEN ) , 1 µL each primer ( 10 pM ) , 1 µL cDNA or DNA , and 22 µL ddH2O . 
+ Three independent technical replicates were carried out for each reaction . 
+ Infection assays using human HeLa epithelial cells ( ATCC CCL-2 ) were performed as described previously [ 43 ] . 
+ HeLa cells ( 1 × 10^5 /well ) were infected ( multiplicity of infection ( moi ) of 10 ) for 30 min with bacteria grown to early exponential phase . 
+ To increase contact between the bacteria and cells , the 6-well plates were centrifuged at 1000 × g for 5 min , and incubated for 40 min at 37 °C in 5 % CO2 . 
+ Macrophages were washed three times in PBS to remove non-invasive bacterial cells , and fresh RPMI-1640 medium containing 50 µg/mL gentamicin was added to kill remaining extracellular bacteria . 
+ After 1 h , the rate of invasion was calculated according to the number of recovered bacterial cells relative to the input number . 
+ Experiments were carried out in triplicate . 
+ Macrophage Infection Assays
+ Infection assays using murine RAW 264.7 macrophage cells ( ATCC TIB-71 ) were performed as previously described [ 43 ] . 
+ RAW 264.7 macrophages were incubated in the RPMI-1640 medium and seeded ( 1 × 10^6 cells/well ) in 6-well plates one day prior to infection . 
+ Bacteria were harvested at the exponential phase and used for infection of RAW 264.7 cells ( moi , 10:1 ) . 
+ Bacterial cells were centrifuged ( 37 °C , 800 × g , 5 min ) onto the macrophages and incubated for 40 min at 37 °C in 5 % CO2 . 
+ Macrophages were washed three times with PBS to remove non-invasive bacterial cells ; this was defined as the 0 h time point . 
+ After washing , fresh RPMI-1640 medium containing 50 µg/mL gentamicin was added to kill remaining non-invasive bacterial cells . 
+ After 1 h , the medium was replaced with RPMI-1640 medium containing 15 µg/mL gentamicin , and the cells were incubated for an additional 28 h . 
+ The number of intracellular bacteria was determined at 0 , 1 , 2 , 4 , 6 , 8 , 12 , 21 , 24 and 28 h. To estimate the amount of intracellular bacteria at each time point , cells were lysed using 0.1 % SDS , and cell lysates were collected and serially diluted 10-fold in PBS , and aliquots were plated onto LB agar to enumerate bacterial colony-forming units ( cfu ) [ 44 ] . 
+ Experiments were carried out in triplicate . 
+ Fis Purification
+ Fis protein was purified using Glutathione Sepharose High Performance ( GE Healthcare ) according to the manufacturer 's protocol . 
+ The amplified fis product was cloned into pGEX4T-1 to generate the plasmid pLW1601 , and then transformed into E. coli BL21 to generate strain H2114 . 
+ Strain H2114 was grown at 37 °C to OD600 = 0.4 , and Fis expression was induced by the addition of 0.1 mM isopropyl-β-D-thiogalactopyranoside ( IPTG ) for 3 h at 30 °C . 
+ Fis purity was assessed by Coomassie stained SDS-polyacrylamide gel electrophoresis ( PAGE ) , and its concentration was quantified by Bradford assay . 
+ Electrophoretic Mobility Shift Assay
+ Gel mobility shift assays were performed by incubating amplified Fis-binding DNA fragments ( 1 nM ) at 25 °C for 20 min with various concentrations of Fis protein ( 0 -- 400 nM ) in a 20 µL solution containing 20 mM Tris-HCl ( pH 7.5 ) , 80 mM NaCl , 0.1 mM EDTA and 1 mM DTT . 
+ The boundaries of the Fis-binding DNA found within intergenic regions or the open reading frame ( ORF ) regions are shown in Table S3 . 
+ DNA fragments containing the dmsA ORF region and the ompR promoter region were PCR amplified as negative and positive controls , respectively [ 4,39 ] . 
+ Samples were loaded with native binding buffer on a 6.0 % polyacrylamide gel in 0.5× TBE . 
+ Gel staining was performed according to the manufacturer 's protocol [ 45 ] . 
+ Analysis of Fis-binding Regions from ChIP-seq Data
+ To identify Fis-binding regions on the LT2 chromosome , we mapped ChIP sequences to the genome . 
+ The sequencing reads were mapped to both strands and the distribution of read counts for each base pair formed a standard plot that ranged between 0 and 200 at the genomic scale . 
+ An in-house Perl script , coupled with the RPKM ( reads per kilobase per million mapped reads ) value [ 46 ] , was used to detect binding peaks . 
+ Several parameters were set up , including : m = 20 ( if the plot value of a base pair reached 20 , the base pair was considered to be the start site of a potential binding region ) ; r = 100 ( a potential binding region was determined as a real binding region only if the plot value of the peak in this region reached 100 ) . 
+ After identification of the potential Fis-binding regions corresponding to these conditions , RPKM values were re-calculated . 
+ False positive binding regions ( RPKM in the mock-IP greater than those in the IP ) were removed . 
+ All the remaining Fis-binding regions were then considered to be effective Fis-binding targets . 
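The peak-detection rules described above can be sketched in a few lines. This is a minimal illustration, not the authors' in-house Perl script, which is not available in the text. The function names are hypothetical, and one rule is an assumption: the text states where a candidate region starts (plot value reaches m = 20) and when it is accepted (peak reaches r = 100), but not where it ends, so the sketch closes a region when the value drops back below m.

```python
def call_regions(depth, m=20, r=100):
    """Return (start, end_exclusive, peak) for runs with depth >= m whose peak >= r.

    Assumption: a region ends at the first position where depth falls below m.
    """
    regions = []
    start, peak = None, 0
    for pos, value in enumerate(depth):
        if value >= m:
            if start is None:
                start, peak = pos, value
            else:
                peak = max(peak, value)
        elif start is not None:
            if peak >= r:
                regions.append((start, pos, peak))
            start = None
    if start is not None and peak >= r:
        regions.append((start, len(depth), peak))
    return regions


def rpkm(read_count, length_bp, total_mapped_reads):
    """Reads per kilobase per million mapped reads."""
    return read_count * 1e9 / (length_bp * total_mapped_reads)


def keep_ip_enriched(regions):
    """Drop regions whose mock-IP RPKM exceeds the IP RPKM.

    `regions` is an iterable of (name, ip_rpkm, mock_rpkm) tuples.
    """
    return [name for name, ip, mock in regions if mock <= ip]


if __name__ == "__main__":
    toy_depth = [0, 25, 60, 120, 30, 5, 0, 22, 40, 10]
    print(call_regions(toy_depth))
```

In the toy profile, the second run of values above 20 is discarded because its peak (40) never reaches r = 100, which is the role the r parameter plays in the description above.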
+ Analysis of Fis-regulated Genes from RNA-seq Data
+ All the raw FastQ files were cleaned by SolexaQA [ 47 ] . 
+ To obtain estimates of transcription levels , TopHat ( v1.3.1 ) [ 48 ] was used to align the trimmed sequencing reads with the LT2 genome . 
+ The genome and gene annotation used in this study were obtained from the NCBI website . 
+ Cufflinks ( v1.1.0 ) [ 49 ] was then used to estimate gene transcription levels based on the same gene model annotations . 
+ To select the genes with significant differential expression , the Cufflinks output was parsed by a Perl script . 
+ To present the gene expression levels , RPKM was used as the normalized metric . 
+ RPKM values [ 46 ] were determined for all genes in each of the samples tested . 
+ In our research , genes showing greater than a twofold change ( ratio of RPKM ) in transcription between cells grown in the presence and absence of the fis gene were identified . 
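The twofold-change criterion on RPKM ratios can be illustrated with a short sketch. The function name, the dictionary inputs and the labels are hypothetical, and the handling of genes with zero RPKM in either strain (they are skipped here) is an assumption, since the text does not say how such genes were treated.

```python
def classify_by_fold_change(rpkm_wt, rpkm_mut, fold=2.0):
    """Label genes whose RPKM ratio between wild type and mutant exceeds `fold`.

    Genes missing from either sample, or with a non-positive RPKM in
    either sample, are skipped (assumption, see lead-in).
    """
    calls = {}
    for gene in rpkm_wt.keys() & rpkm_mut.keys():
        wt, mut = rpkm_wt[gene], rpkm_mut[gene]
        if wt <= 0 or mut <= 0:
            continue
        ratio = wt / mut
        if ratio > fold:
            calls[gene] = "higher_in_wild_type"
        elif ratio < 1.0 / fold:
            calls[gene] = "higher_in_mutant"
    return calls


if __name__ == "__main__":
    # Made-up RPKM values for placeholder gene names.
    wild_type = {"geneA": 10.0, "geneB": 5.0, "geneC": 4.0}
    mutant = {"geneA": 4.0, "geneB": 5.0, "geneC": 9.0}
    print(classify_by_fold_change(wild_type, mutant))
```

A gene that is higher in the wild type than in the fis deletion strain would be a candidate for positive regulation by Fis, and the reverse for negative regulation; a ratio of exactly 2 is not counted because the text specifies a change greater than twofold.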
+ Motif-searching
+ To identify Fis-binding motifs , the sequences of Fis-binding regions obtained from ChIP-seq were analyzed using MEME-ChIP software [ 50 -- 52 ] , with the following parameters : zero or one motif per sequence ; motif width ranging from 6 -- 20 ; searching both strands of the sequences ; using a background distribution file containing the mono-nucleotide frequencies of the LT2 chromosome . 
+ The motif with the lowest E-value was considered as the significant motif . 
+ Results
+ Genome-wide Mapping of Fis-binding Sites
+ Both mock-IP and IP samples of S. enterica serovar Typhimurium strain LT2 were collected in the mid-exponential phase ( OD600 = 0.6 ) , and ChIP-seq analysis was performed . 
+ For mock-IP samples , 6,576,604 reads of 100-nt length were mapped to the LT2 genome , amounting to 120-fold coverage ; for IP samples , 3,350,972 reads were mapped to the genome , amounting to 50-fold coverage ( Table 1 ) . 
+ After removing non-specific binding regions for which read counts from mock-IP samples are greater than those from IP samples , a total of 885 binding regions in 943 genes ( Table S3 ) were detected . 
+ Fis was found to bind to a total of 309 kb ( approximately 8 % ) sequences on the LT2 genome . 
+ The average length of Fis-binding regions is estimated to be 349 bp , and the average length of Fis-binding region intervals is 5.21 kb ssaU , sipB , sipC , invC , accC , ompR , mgtC and metJ ) . 
+ All six previously identified and 14 newly detected Fis-binding regions exhibited enrichment , with log2 ratios ranging from 0.47 to 22.76 , and the two control regions showed no significant enrichment ( Table S4 ) . 
+ These results indicate that the majority of Fis-binding regions identified by ChIP-seq are reliable . 
+ The 885 Fis-binding regions can be classified into four categories according to the relative position between binding regions and related genes : ORF , IG1 , IG2 and IG3 . 
+ The ORF category consists of Fis-binding peaks found within ORF regions . 
+ Among the 885 binding regions , 492 ( 55.59 % ) regions belong to the ORF category . 
+ The IG1 category contains those found in intergenic regions between two genes transcribed in the same direction ; 195 ( 22.03 % ) regions were classified as IG1 . 
+ The IG2 category consists of Fis-binding peaks found in intergenic regions between two divergently transcribed genes ; 63 ( 7.12 % ) regions of this category were detected . 
+ All of the remaining binding regions ( 135 , 15.25 % ) were classified as IG3 , which are located partly in an intergenic region and partly within an ORF ( Figure 1B ) . 
+ The average A+T content of the Fis-binding sites was estimated to be 51.57 % ( Figure 1C ) , which is higher than that of the LT2 chromosome ( 47.78 % ) [ 24 ] . 
+ LT2 contains 937 horizontally acquired genes [ 24 ] , and 207 of the 885 Fis-binding regions are within horizontally acquired genes . 
+ The average A+T content of the Fis-binding sites in horizontally acquired genes is 52.78 % , higher than that of total horizontally acquired genes ( 49.23 % ) . 
+ These data suggest that Fis binds preferentially to regions of higher A+T content in LT2 , in accordance with findings in E. coli K12 [ 3,4,16,53 ] . 
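The A+T comparison above rests on a simple per-sequence calculation, sketched below. The function name and the choice to ignore ambiguous bases such as N are assumptions for illustration; the text does not say how such bases were handled.

```python
def at_content(seq):
    """Percentage of A and T among the unambiguous bases of a DNA sequence."""
    counted = [base for base in seq.upper() if base in "ACGT"]
    if not counted:
        raise ValueError("sequence contains no A, C, G or T")
    at = sum(base in "AT" for base in counted)
    return 100.0 * at / len(counted)
```

Averaging this value over the binding-region sequences and comparing it with the value for the whole chromosome gives the kind of contrast reported above.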
+ The unbiased motif-searching algorithm MEME was used to identify the most significant Fis-binding DNA sequence motif in LT2 . 
+ The motif with the lowest E-value ( 5.2e-039 ) was selected ( 8 nt in length ) , the log likelihood ratio of the motif being 1749 with an information content of 10.6 bits ( Figure 1D ) . 
+ The motif is non-palindromic , consisting mainly of A/T nucleotides , with four consecutive A nucleotides in the center . 
+ The motif is present in 641 of the 885 Fis-binding regions and appears 1592 times in total , twice per binding region on average . 
+ Fis Regulation on Global Gene Transcription
+ Fis regulation at the global level in LT2 in the mid-exponential phase was studied by RNA-seq on the wild-type and Δfis ( Figure 2 ) . 
+ The cDNA reads obtained for the wild-type and Δfis were 13,202,514 and 14,089,439 , with map rates of 95 % and 96 % , respectively ( Table 1 ) . 
+ A total of 1646 genes were found to be differentially transcribed between the wild-type and Δfis , with 657 and 989 genes exhibiting higher and lower levels of transcription , respectively , in Δfis ( Table S5 ) . 
+ The RNA-seq results were confirmed by performing RT-PCR analysis of the wild-type and Δfis under the same culture conditions as the RNA-seq experiments . 
+ RT-PCR analysis targeted 12 genes , including five Fis-independent genes and seven Fis-dependent genes , and the results corresponded well with the RNA-seq data ( Table S6 ) . 
+ RNA-seq results indicated that Fis prefers to positively influence gene transcription in LT2 in the mid-exponential phase . 
+ The genes differentially transcribed were classified into functional categories based on clusters of orthologous groups ( COG ) designations ( www.ncbi.nlm.nih.gov/COG ) , and the percentage of up-regulated and down-regulated genes in Δfis in each COG category was calculated ( Figure 3 ) . 
+ Fis-regulated genes fell within almost all COGs except for genes involved in extracellular structures , RNA processing and modification . 
+ Among 23 COGs , 17 contained more genes activated by Fis than those repressed by Fis . 
+ The COG with the largest proportion of Fis-activated genes was the cell motility class , of which approximately 45 % of genes were up-regulated and 13 % of genes were down-regulated by Fis . 
+ In the other four COGs , there were more genes repressed by Fis than those activated by Fis . 
+ The COG with the lowest proportion of Fis-activated genes was the translation class , of which 22 % of genes were Fis-repressed and 4 % were Fis-activated ( Figure 3 ) . 
+ Among the 1646 genes differentially transcribed between the wild-type and Δfis , 317 were Fis-binding genes , which included 207 genes ( 65.30 % ) with lower transcription in Δfis , and 110 genes ( 34.70 % ) with higher transcription ( Table 2 ) . 
+ It was also found that the activated gene ratio for Fis-binding genes ( 65.30 % ) was higher than that for all Fis-regulated genes ( 60.09 % ) . 
+ These results indicated that , in LT2 , Fis tends to perform an indirect regulatory role , while its direct regulatory role was shown to involve preferential up-regulation of the Fis-binding genes . 
+ Of the 1646 Fis-regulated genes , 1329 were not associated with Fis-binding ( Table S3 and S5 ) . 
+ Of these 1329 genes , 419 are known to be regulated by 55 transcription factors ( TFs ) in LT2 ( RegulonDB ) , including 27 Fis-independent and 28 Fis-dependent TFs . 
+ It is known that the 28 Fis-dependent TFs regulate 303 of those 419 genes ( RegulonDB ) ; thus , it is highly likely that Fis controls the transcription of these 303 genes by regulating corresponding TFs . 
+ We randomly selected five of the 28 TF genes ( fruR , fucR , flhD , gutM and prpR ) [ 54 -- 58 ] to perform gene knockout experiments , and the transcription of 19 genes regulated by these five TFs was compared by RT-PCR between the wild-type and Δfis , and between ΔTF and ΔTFΔfis . 
+ Three of the 19 genes showed 3.22 to 50.00-fold increases and 16 genes showed 1.25 to 26.35-fold reductions in transcript levels in Δfis compared to the wild-type . 
+ However , in a TF mutant background , the three Fis-repressed genes exhibited only 1.06 to 5.00-fold higher transcription caused by the mutation of fis , and the effect of Fis on the 16 Fis-activated genes was attenuated or non-existent ( 0.27 to 7.67-fold ) ( Table 3 ) . 
+ This confirmed that Fis controls the transcription of these 19 genes by regulating the five TFs . 
+ Fis Regulates B12 Biosynthesis Genes by Binding to and Activating PocR
+ B12 is a known cofactor for numerous enzymes mediating methylation , reduction , and intramolecular rearrangements [ 59 -- 61 ] . 
+ B12 biosynthesis genes , which are not present in E. coli , were acquired by horizontal gene transfer ( HGT ) in Salmonella [ 62 ] . 
+ In LT2 , a total of 30 genes are required for B12 biosynthesis , and 25 of these are clustered in the cob operon [ 61 ] . 
+ The transcription of the cob operon is mainly controlled by a trans-acting protein encoded by the pocR gene , which is located upstream of the cob operon [ 61,63 ] . 
+ In this study , none of the 25 cob genes were found to be associated with Fis-binding ( Table S3 ) , yet 16 of them were downregulated upon deletion of fis , indicating that Fis up-regulates these genes ( Table S5 ) . 
+ The pocR gene was found to be activated 5.27-fold by Fis and a Fis-binding site was identified within the gene ( Table S3 and S5 ) . 
+ Evaluation of the transcriptional differences in the 16 cob genes in the wild-type , Δfis , ΔpocR and ΔfisΔpocR by RT-PCR revealed 1.34 to 2.43-fold lower expression of these 16 cob genes in Δfis compared to the wild-type ( Table S7 ) . 
+ Four of the 16 genes showed 1.11 to 1.65-fold decreases , and nine of those 16 genes exhibited 1.03 to 4.44-fold increases in transcription in ΔfisΔpocR compared to ΔpocR , indicating that the positive effect of Fis on these 13 genes is significantly attenuated or absent due to the deletion of pocR ( Figure 4 ) . 
+ The expression of cob genes in pocR-deleted strains may remain dependent on Fis because Fis may also control the expression of B12 biosynthesis genes through other unknown negative regulators . 
+ In addition , the other three of the 16 genes ( cbiH , cbiQ and cbiO ) showed 1.56 , 1.77 and 1.34-fold decreases in transcription , respectively , in ΔfisΔpocR compared to ΔpocR , which is similar to the expression changes ( 1.55 , 1.52 and 1.22-fold ) of those genes between Δfis and the wild-type . 
+ The result showed that Fis activates the transcription of B12 biosynthesis genes mainly through controlling the expression of pocR . 
+ Fis Effect on LT2 Invasion and Fis Regulation on SPI Genes
+ The invasion ability of S. enterica is dependent on its ability to invade intestinal epithelial cells and to survive inside macrophage cells [ 43,64,65 ] . 
+ In a previous study , S. enterica SL1344 Δfis exhibited a 50 to 100-fold decreased ability to invade HEp-2 ( epithelial ) cells , compared to the wild-type strain [ 30 ] . 
+ In this study , the effect of Fis on the invasion of HeLa cells by LT2 was evaluated , and LT2 Δfis also exhibited decreased ability ( 5.02-fold ) ( Figure 5A ) . 
+ We also analyzed the effect of Fis on the survival of LT2 inside murine macrophage cells . 
+ The number of Δfis cells within macrophages was 2.00 to 14.42-fold lower than that of the wild-type strain at each time-point from 0 h to 28 h post-infection ( Figure 4B ) . 
+ These results indicated that Fis enhances invasion and intracellular replication ability of LT2 within the host cell . 
+ It is well known that SPI-1 and SPI-2 genes are mainly responsible for the invasion and intracellular replication of LT2 within host cells ; thus , we further investigated the effect of Fis on the expression of SPI-1 and SPI-2 genes . 
+ In comparisons of the expression profiles of LT2 wild-type and Δfis , 63 of the 94 SPI-1 and SPI-2 genes were found to be regulated by Fis , including 55 Fis up-regulated genes and eight Fis down-regulated genes ( Figure S1 ) . 
+ Of the 63 Fis-regulated genes ( Table S8 ) , only 23 could be directly bound by Fis in ChIP-seq analysis , indicating that these genes are directly regulated by Fis . 
+ We then randomly selected 16 genes , including hilC , sipA , sipB , spaS , sicA , invE , invC , spaO of SPI-1 , and orf242 , ssrA , ssaB , sscA , sscB , sseC , sseF , ssaV of SPI-2 , to perform gel mobility shift assays . 
+ As expected , the result showed that when the concentration of Fis protein ( 0 -- 400 nM ) was increased , more Fis-DNA complex and less free DNA were detected for all 16 genes ( Figure S2 ) . 
+ This confirmed the binding of Fis to these 16 genes . 
+ We also found that eight Fis-activated genes , although not associated with Fis-binding , were proximally located downstream of three Fis-binding genes ( invE , invC , and spaO ) . 
+ These included invA and invB located downstream of invE , invI and invJ located downstream of invC , and spaP , spaQ , spaR and spaS located downstream of spaO . 
+ In Δfis , all eight genes showed a greater than 23.37-fold decrease in transcription . 
+ The Fis-binding sites in the ORF regions of invE , invC , and spaO were substituted by kan in the wild-type . 
+ As a control , kan was also inserted upstream of the binding sites in invE , invC , and spaO in the wild-type . 
+ RT-PCR assays showed that deletion of the corresponding Fis-binding sites led to 1.13 to 5.78-fold transcriptional decrease of these genes compared with corresponding control strains ( Figure 6A ) . 
+ These results indicated that Fis regulates not only genes to which it binds directly , but also genes downstream of the Fis-binding genes ( Table S8 ) . 
+ The other 21 SPI genes , which are not associated with Fis binding , are probably regulated by Fis through control of the expression of global regulator or SPI regulators ( Table S8 ) . 
+ Among the SPI regulators , only hilC and ssrA were both Fis-regulated and Fis-binding genes , indicating that they are regulated by Fis directly . 
+ HilC plays a key role in co-ordinating expression of the SPI-1 genes [ 66 -- 68 ] , and its transcription was found to be decreased 43.82-fold in Δfis . 
+ Eleven Fis-regulated SPI genes have been reported to be under the control of HilC [ 25,32 ] . 
+ SsrA-SsrB is a two-component regulatory system for SPI-2 , which includes SsrA as the predicted integral membrane cognate sensor and SsrB as the response regulator binding to the promoters of all SPI-2 functional gene clusters [ 25,69 ] . 
+ The transcription of ssrA was found to decrease 2.90-fold in Δfis , and 10 Fis-regulated SPI genes have been reported to be under the control of SsrA [ 25,70 ] . 
+ Another SPI regulator , HilD , which is not associated with Fis-binding , acts in an ordered fashion with HilC to coordinately activate expression of the SPI-1 genes . 
+ The transcription of hilD was also found to be decreased 10.03-fold in Δfis . 
+ However , hilD was not found to be bound by Fis . 
+ We proposed that hilD was probably indirectly regulated by Fis through other proteins . 
+ The barA , fur and ompR genes were reported to be positive regulators of hilD [ 32,39,71 ] , and in this study , they were all found to be associated with Fis binding ( Table S3 ) . 
+ However , in Δfis , barA and fur were found to maintain their transcription levels , and only the expression of ompR decreased ( 2.41-fold ) . 
+ OmpR is associated with Fis binding ( Table S5 ) , which is consistent with a previous study [ 39 ] , indicating that ompR is regulated by Fis directly . 
+ OmpR was also reported to bind to the promoter of the SPI-2 gene regulator ssrAB [ 69,72 ] . 
+ In this study , the transcription of 14 SPI-1 and 2 SPI-2 genes was evaluated by RT-PCR in the wild-type , Δfis , ΔompR and ΔfisΔompR . 
+ Obvious decreases ( 3.20 to 21.41-fold ) in the transcription levels of these genes were observed in Δfis compared to the wild-type ( Figure 6B ) . 
+ However , this decrease was less marked ( 1.85 to 10.34-fold ) in the ΔfisΔompR strain compared to the ΔompR strain . 
+ We then overexpressed ompR in Δfis , and found that transcription of the SPI-1 and SPI-2 genes increased ( 1.04 to 6.36-fold ) , thus partly compensating for the effect caused by the fis deletion ( Figure 6C ) . 
+ These results suggest that Fis positively influences the SPI-1 and SPI-2 regulators by binding to and activating ompR . 
+ To confirm the role of ompR in the effect of Fis on LT2 infection , cell invasion assays were carried out . 
+ The discrepancy in the HeLa invasion rate observed between ΔompR and ΔfisΔompR ( 3.83-fold decrease in ΔfisΔompR ) was found to be smaller than that between the wild-type and Δfis ( 6.01-fold decrease in Δfis ) ( Figure 7A ) . 
+ Similar results were also detected in the infection assay using murine RAW 264.7 macrophages ( Figure 7B ) . 
+ Smaller discrepancies were detected for ompR deletion-mutant strains at 0 to 24 h post-infection ( 1.18 to 1.64-fold decrease in ΔfisΔompR ) . 
+ Moreover , overexpression of ompR also increased the invasion ( 2.21-fold ) and survival ability of Δfis ( 1.27 to 2.41-fold ) ( Figure 7C and 7D ) . 
+ These results indicated that ompR plays an important role in Fis regulation of SPI-1 and SPI-2 genes , and influences the infection and replication of LT2 in host cells . 
+ Discussion
+ By using high-throughput sequencing methods , we clarified the genome-wide distribution of binding regions and global regulation pattern of Fis , one of the most important nucleoid-associated regulators , in S. enterica serovar Typhimurium LT2 . 
+ Furthermore , by using cell infection assay , we showed that Fis not only enhances invasion ability , but also intracellular replication ability of LT2 within macrophage cells . 
+ Most importantly , Fis was found to activate SPI genes , which are essential for the virulence of S. enterica . 
+ In this study , the three regulatory modes for Fis on SPI genes were illustrated for the first time . 
+ We identified 885 Fis-binding sites spread over a total of 943 genes in LT2 . 
+ Compared to the reported 894 Fis-binding sites on 1341 genes in K12 [ 4 ] , some new features were observed in LT2 : i ) There is a different global regulation pattern of Fis in LT2 . 
+ Only 145 common genes are bound by Fis in both K12 and LT2 , which is possibly due to the difference in their genome sequence , as 320 of the 943 Fis-binding genes in LT2 are not present in K12 [ 24 ] . 
+ This result is also consistent with the phenomenon that the DNA supercoiling levels , which are controlled by Fis , are different between E. coli and S. enterica [ 26 ] . 
+ ii ) A higher percentage ( 23 % ) of Fis-binding sites was found within HGT genes in LT2 , which was significantly higher than that ( 10 % ) reported in K12 [ 3 ] . 
+ The discrepancy might be because LT2 acquired at least 1,106 genes , mainly by HGT , since its divergence from E. coli [ 24 ] . 
+ iii ) LT2 has a novel Fis-binding motif , which contains an A/T-tract ( similar to that in K12 ) , but has no conserved G/C on either side ( present in K12 ) . 
+ The two E. coli Fis-binding motifs [ 3,4 ] were searched against Fis-binding sequences in LT2 , but no regions of high homology were identified . 
+ Our RNA-seq data show that among the 1646 Fis-regulated genes in LT2 , the expression of 60.09 % of genes is repressed in Δfis . 
+ In contrast , only 33.59 % ( 310 of 923 genes ) of Fis-regulated genes were found to be up-regulated in the mid-exponential phase in K12 [ 4 ] . 
+ This indicates that in LT2 Fis preferentially mediates positive regulation of genes , which is the opposite of that in K12 . 
+ Two hundred and thirty-six genes were found to be regulated by Fis in both K12 and LT2 , of which only 163 are regulated by Fis with the same tendency , including genes for oxidative phosphorylation , secondary metabolism , motility and carbon utilization . 
+ Seventy-three genes were found to be differently regulated by Fis between the two strains , including genes for energy and fatty acid metabolism , membrane transport and signal transduction ( Table 4 ) . 
+ The great differences in Fis transcriptional regulation found between LT2 and K12 may be due to the different Fis-binding sites in the two strains , and to differences in the culture medium and high-throughput technologies used in the two studies . 
+ The effect of Fis on global transcriptional regulation in S. enterica Typhimurium SL1344 has been studied by Kelly et al. using microarrays [ 7 ] ; only 291 genes were found to be influenced by Fis . 
+ Comparison of the data from that study with ours showed that most Fis-regulated genes in SL1344 were consistent with those in LT2 . 
+ For instance , genes involved in motility and SPI were found to be strongly down-regulated in Δfis in both studies . 
+ However , 30 Fis-regulated genes showed opposite regulation tendencies in the two studies . 
+ For instance , Kelly et al. reported that Fis did not affect genes involved in B12 production , whereas our study surprisingly found that these genes were significantly repressed in Δfis . 
+ These differences may be due to the discrepancy in time-points analyzed in the two studies ( 1 hour post-subculture vs. mid-exponential phase after subculturing ) and the low resolution of microarrays . 
+ Fis was reported to influence the expression of many genes on SPI-1 and SPI-2 in S. enterica serovar Typhimurium . 
+ Most previous research has mainly been focused on the effects of Fis on regulating the expression of SPI genes , but the relevant molecular mechanisms are unclear . 
+ In this study , we describe for the first time three modes of Fis regulation of SPI genes and provide a regulatory network of Fis on SPI genes ( Figure 8 ) : i ) Fis binds to and activates the SPI positive regulator genes ; ii ) Fis binds to the ORF region of SPI genes to mediate transcriptional activation of these genes and of downstream genes ; iii ) Fis positively influences the SPI regulators by binding to and activating ompR . 
+ Among these mechanisms , the most interesting finding is that Fis binding within a gene ORF region may affect not only the corresponding gene but also a small group of nearby genes , which has not been reported before . 
+ In most studies about global binding profiles of TFs in bacteria , such as Fis , HNS , RutR and Sfh , TF binding sites have been found to be located at both ORF regions and intergenic regions [ 3,4,13,73 -- 75 ] . 
+ Several studies further indicated that TF binding in ORF regions could also regulate the transcription of corresponding genes [ 3,4,73 ] ; however , the exact molecular mechanism is still unclear [ 76 ] . 
+ We noted that a ChIP-chip analysis to determine the genomic binding profiles of σ70 in E. coli revealed the existence of many σ70-binding sites within ORF regions [ 77 ] , and another study suggested that at least 37 ORF regions bound by Fis in E. coli also have core RNA polymerase ( RNAP ) or σ70 binding sites [ 4 ] . 
+ It has been proposed that Fis regulates transcription by the formation of DNA microloops , which form a separate topological domain [ 78 ] , and in those regions , the RNAP may be trapped to repress transcription or may be recycled to efficiently activate gene transcription [ 4 ] . 
+ Therefore we suggest Fis-binding on SPI genes may recycle RNAP to promote the transcription of corresponding genes and those downstream genes . 
+ In this study , the identified Fis-dependent genes in SPI-1 and SPI-2 include nine genes encoding effector proteins , which are transported via the T3SS into the host cell , 37 genes encoding needle complex structural proteins , six genes encoding SPI regulators , and three genes with unknown functions . 
+ Almost all genes encoding needle complex structural proteins and SPI regulators were activated by Fis . 
+ However , of the 18 genes encoding known effectors , only nine showed decreased expression in Δfis . 
+ It can be speculated that the weak effects of Fis on effectors observed in this study result from the inability of the LT2 growth conditions ( Luria-Bertani broth ) to induce transcription of effectors [ 79,80 ] . 
+ Although most genes on SPI were activated by Fis , a small proportion of SPI genes were down-regulated , such as STM2911 and STM2912 in SPI-1 and ttrB in SPI-2 . 
+ STM2912 was annotated as a putative transcriptional regulator [ 24 ] , and we propose that it is likely to be a negative regulator for SPI genes , which will be the subject of future study . 
+ Supporting Information
+ Acknowledgments
+ The sequencing was supported by Tianjin Biochip Corporation , Tianjin Economic-Technological Development Area , Tianjin . 
+ We gratefully thank Dr. Laurence Van Melderen from Université Libre de Bruxelles for providing us with the plasmid pWSK129 . 
+ Author Contributions
+ Conceived and designed the experiments : LW HW . 
+ Performed the experiments : HW BL . 
+ Analyzed the data : LW HW BL QW . 
+ Contributed reagents/materials/analysis tools : LW . 
+ Wrote the paper : LW HW BL QW . 
+ 81 . 
+ Lobry JR ( 1996 ) Asymmetric substitution patterns in the two DNA strands of bacteria . 
+ Mol Biol Evol 13 : 660 -- 665 .
\ No newline at end of file
--- a/data/TEXT_FILES/notuseful_txt/23865838.txt 0 → 100644
View file @27818a9
+++ b/data/TEXT_FILES/notuseful_txt/23865838.txt 0 → 100644
View file @27818a9
+ A fast weak motif-finding algorithm based on
+ Abstract 
+ Background : Identification of transcription factor binding sites ( also called ` motif discovery ' ) in DNA sequences is a basic step in understanding genetic regulation . 
+ Although many successful programs have been developed , the problem is far from being solved on account of diversity in gene expression/regulation and the low specificity of binding sites . 
+ State-of-the-art algorithms have their own constraints ( e.g. , high time or space complexity for finding long motifs , low precision in identification of weak motifs , or the OOPS constraint : one occurrence of the motif instance per sequence ) which limit their scope of application . 
+ Results : In this paper , we present a novel and fast algorithm we call TFBSGroup . 
+ It is based on community detection from a graph and is used to discover long and weak ( l , d ) motifs under the ZOMOPS constraint ( zero , one or multiple occurrence ( s ) of the motif instance ( s ) per sequence ) , where l is the length of a motif and d is the maximum number of mutations between a motif instance and the motif itself . 
+ Firstly , TFBSGroup transforms the ( l , d ) motif search in sequences to focus on the discovery of dense subgraphs within a graph . 
+ It identifies these subgraphs using a fast community detection method for obtaining coarse-grained candidate motifs . 
+ Next , it greedily refines these candidate motifs towards the true motif within their own communities . 
+ Empirical studies on synthetic ( l , d ) samples have shown that TFBSGroup is very efficient ( e.g. , it can find true ( 18 , 6 ) , ( 24 , 8 ) motifs within 30 seconds ) . 
+ More importantly , the algorithm has succeeded in rapidly identifying motifs in a large data set of prokaryotic promoters generated from the Escherichia coli database RegulonDB . 
+ The algorithm has also accurately identified motifs in ChIP-seq data sets for 12 mouse transcription factors involved in ES cell pluripotency and self-renewal . 
+ Conclusions : Our novel heuristic algorithm , TFBSGroup , is able to quickly identify nearly exact matches for long and weak ( l , d ) motifs in DNA sequences under the ZOMOPS constraint . 
+ It is also capable of finding motifs in real applications . 
+ The source code for TFBSGroup can be obtained from http://bioinformatics.bioengr.uic.edu/TFBSGroup/ . 
+ Background
+ Transcription factors play an irreplaceable role in the activation and repression of gene expression by binding to specific sites within promoter regions of target genes . 
+ Identification of transcription factor binding sites ( TFBSs ) is a basic task for elucidating the molecular mechanisms of transcriptional regulation . 
+ Although traditional footprinting assays can accurately identify the precise binding sites of any factor , this low-throughput method is highly technical and can analyze only a single small region ( < 1 kb ) at a time . 
+ With the development of high-throughput sequencing technologies , a number of experimental techniques such as ChIP-chip and ChIP-seq have been used to identify the location of transcription factor binding sites . 
+ However , these methods are unable to resolve DNA-protein interactions at base pair resolution [ 1 ] . 
+ In silico identification of over-represented DNA motifs from the promoters of co-regulated or homologous genes as well as ChIP-enriched regions plays a significant role in locating binding sites in a high-throughput and high-resolution manner . 
+ Since a DNA motif is usually highly conserved or overrepresented among DNA sequences , there are two main approaches to its representation : ( 1 ) represent a motif by its profile or position-specific scoring matrix ( PSSM ) $[n_{j,k}]_{l \times 4}$ , which records the frequency of base $k$ ( $k \in \{A, C, G, T\}$ ) at position $j$ ( $j = 1, 2, \ldots, l$ ) for all aligned sites [ 2-4 ] or ( 2 ) characterize a motif as an l-length consensus string describing a motif with the most frequent nucleotide in each position of all aligned sites . 
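The two representations can be illustrated with a short Python sketch that builds the per-position base counts from aligned sites and derives the consensus string from them. The function names and the tie-breaking rule (first base in A, C, G, T order) are assumptions made for illustration; the text does not prescribe either.

```python
from collections import Counter

BASES = "ACGT"


def count_matrix(sites):
    """Return an l x 4 matrix: row j holds the counts of A, C, G, T at position j."""
    length = len(sites[0])
    if any(len(site) != length for site in sites):
        raise ValueError("aligned sites must all have the same length")
    matrix = []
    for j in range(length):
        counts = Counter(site[j] for site in sites)
        matrix.append([counts.get(base, 0) for base in BASES])
    return matrix


def consensus(matrix):
    """Most frequent base at each position; ties go to the first base in BASES."""
    return "".join(BASES[row.index(max(row))] for row in matrix)
```

Dividing each row by the number of sites turns the counts into the frequencies that a profile records.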
+ According to these two TFBS models , existing motif-finding algorithms can be divided into two classes . 
+ The first includes algorithms that maximize a statistical or entropy-related score of a PSSM $[n_{j,k}]_{l \times 4}$ . 
+ CONSENSUS [ 5 ] , MEME [ 6 ] , Gibbs Sampler [ 7 ] , AlignACE [ 8 ] , PROJECTION [ 9 ] , and CRMD [ 10 ] belong to this group . 
+ These algorithms use optimization techniques from the fields of statistics and machine learning , including the greedy strategy [ 5 ] , the Expectation-Maximization method [ 6 ] , Gibbs sampling methods [ 7-9 ] , and the clustering method [ 10 ] . 
+ These algorithms usually have a fast run time . 
+ Sometimes , however , they cannot converge to the global optimum , especially for short motifs with high levels of statistical noise or long motifs with a large search space . 
+ The second class of algorithms usually searches for ( l , d ) motifs based on the consensus model [ 11 ] by employing heuristic methods , but in some cases uses optimal techniques . 
+ It is supposed that each sequence contains zero , one , or multiple motif instance ( s ) with up to d mutations within a true motif [ 12 ] . 
+ A large number of algorithms have been proposed to exactly or almost exactly extract ( l , d ) motifs from N input sequences with length L . 
+ SPELLER [ 13 ] , WEEDER [ 14,15 ] , MITRA-count [ 16 ] , Voting [ 17 ] , PMSprune [ 18 ] , WINNOWER [ 11 ] , iTriplet [ 19 ] , VINE [ 20 ] , Stemming [ 21 ] , RecMotif [ 22 ] , and sMCL-WMR [ 23 ] are included in this group . 
+ These algorithms usually have a high time complexity for long motifs . 
+ This limits their application , especially toward prokaryotic promoters [ 24,25 ] . 
+ In this study , we intend to offer a highly efficient algorithm for finding long and weak ( l , d ) motifs and to use this algorithm to identify TFBSs in prokaryotic data sets . 
+ Motif-finding algorithms based on the consensus model can be further divided into two categories : pattern-driven and sample-driven approaches [ 16 ] . 
+ Using a pattern-driven approach , one tries to enumerate all possible $4^l$ l-mer motifs in lexical order . 
+ When applying a sample-driven approach , all possible ( l , d ) motifs generated from the real l-mers of input sequences are often tested . 
+ SPELLER , WEEDER , and MITRA-count are pattern-driven approaches and Voting , PMSprune , WINNOWER , iTriplet , VINE , Stemming , RecMotif , and sMCL-WMR are sample-driven . 
+ In general , pattern-driven approaches can automatically find ( l , d ) motifs in samples without being given the length l . 
+ However , the state-of-the-art sample-driven approaches are usually faster than the state-of-the-art pattern-driven approaches , and thus can be used to extract motifs with a larger l and d . 
+ Recently , the sample-driven approach that transforms the ( l , d ) motif search into extracting the maximum clique or q-cliques ( q ≤ N ) from an N-partite graph has attracted much attention . 
+ In this graph , each vertex is an l-mer . 
+ There is an edge between two l-mers from two different sequences if the Hamming distance between these two l-mers is no more than 2d . 
+ This is because the Hamming distance between each instance of a motif and the motif itself is assumed to be at most d , so any two instances differ in at most 2d positions . 
+ Thus , all instances of a motif must form a maximum clique or q-cliques in the graph . 
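The graph construction described above can be sketched in Python as follows. This is a naive illustration of the N-partite graph only, not the implementation of WINNOWER, TFBSGroup, or any other tool named here; the function names and the (sequence index, position) vertex labels are assumptions, and the all-pairs comparison is quadratic in the number of l-mers.

```python
from itertools import combinations


def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def build_graph(sequences, l, d):
    """Vertices are the l-mers of each sequence, labeled (seq_index, position).

    An edge joins two l-mers from different sequences when their Hamming
    distance is at most 2d.
    """
    vertices = [
        (i, p, seq[p:p + l])
        for i, seq in enumerate(sequences)
        for p in range(len(seq) - l + 1)
    ]
    edges = set()
    for (i, p, u), (j, q, v) in combinations(vertices, 2):
        if i != j and hamming(u, v) <= 2 * d:
            edges.add(((i, p), (j, q)))
    return vertices, edges
```

With `sequences = ["AAAA", "AATT", "CCCC"]`, `l = 4` and `d = 1`, only the first two l-mers are joined, since they differ in two positions while each differs from `CCCC` in four.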
+ This idea was first presented in WINNOWER , which utilizes an extendable mechanism to cut off spurious edges by extending k-cliques ( k = 2 or k = 3 ) to larger ( k + 1 ) - cliques . 
+ However , the accuracy of WINNOWER can not be guaranteed since there is strong background noise in many sequences and true edges may be pruned by its local extension mechanism . 
+ The VINE algorithm , with its rigorous pruning steps , was proposed to speed up and increase the precision rate of WINNOWER . 
+ Similarly , iTriplet was designed to randomly select two reference sequences and identify all triplets ( 3-cliques ) in these as well as each of the remaining N − 2 sequences . 
+ All discovered triplets along with their sequence information are then inserted into hash tables as candidate motifs . 
+ If a table has enough instances ( e.g. , at least q ) , the motif can be identified as ` true ' . 
+ RecMotif was created to extract N-cliques by using reference sequences as well . 
+ It takes the selected reference vertices from the first x ( x = { 1 , 2 , · · · , N } ) reference sequences in order to select new reference vertices in the remaining sequences . 
+ As x is increased , the selection is continued ( x ← x + 1 ) if new reference vertices can be selected from the remaining sequences to obtain x-cliques . 
+ Otherwise , the algorithm backtracks to the first x − 1 reference vertices and finds a substitute . 
+ RecMotif has been shown to be very fast for some ( l , d ) cases ( e.g. , ( 15 , 4 ) , ( 18 , 5 ) and ( 21 , 6 ) ) . 
+ However , in tests performed by Sun et al. [ 22 ] , it failed to find some weak motifs such as ( 15 , 5 ) , ( 18 , 6 ) , and ( 19 , 7 ) . 
+ Additionally , RecMotif operates under the OOPS constraint . 
+ During real applications , some sequences may not contain any instance of a true motif while some may contain multiple instances . 
+ With this work , we offer a more efficient algorithm for extracting long and weak ( l , d ) motifs from N-partite graphs using the more biologically-relevant ZOMOPS constraint . 
+ During our research , we made the following observations : ( 1 ) there may be too many spurious edges in an N-partite graph to extract the q-cliques ( q ≤ N ) needed to identify a weak motif ( e.g. , ( 15 , 4 ) or ( 18 , 6 ) ) and ( 2 ) if we construct the N-partite graph such that there is an edge between two l-mers from two different sequences if the Hamming distance between these two l-mers is no more than x ( d ≤ x ≤ 2d ) instead of exactly 2d , the motif instances in the graph may form a dense subgraph instead of a clique . 
+ Based on this information , we present a new algorithm : TFBSGroup . 
+ It first extracts dense subgraphs , which are groups of candidate instances ( i.e. , TFBSs ) , by the fast community detection algorithm BGLL [ 26 ] . 
+ It then greedily refines these candidate motifs towards the true motif within their own communities . 
+ Our empirical study shows that TFBSGroup can quickly discover long and weak ( l , d ) motifs ( e.g. , ( 18 , 6 ) and ( 24 , 8 ) motifs within 30 seconds ) in synthetic samples under the ZOMOPS constraint . 
+ More importantly , it is able to rapidly identify motifs in a large data set of prokaryotic promoters [ 25 ] generated from the Escherichia coli database RegulonDB [ 27 ] . 
+ It is also able to accurately discover motifs in 12 mouse transcription factor ChIP-seq data sets involved in ES cell pluripotency and self-renewal [ 28 ] . 
+ Results
+ We first tested TFBSGroup on a series of synthetic ( l , d ) samples and compared it with iTriplet ( source code : http://www.rci.rutgers.edu/∼gundersn/iTriplet/ ) and RecMotif ( source code provided by the authors ) . 
+ iTriplet and RecMotif are both sample-driven algorithms which heuristically extract q-cliques from an N-partite graph ( q = N for RecMotif because of the OOPS constraint ) . 
+ Meanwhile , we compared TFBSGroup with the pattern-driven algorithms SPELLER , WEEDER , and MITRA in order to reveal more differences between sample-driven and pattern-driven approaches . 
+ We then used TFBSGroup on a large data set of prokaryotic promoters generated from the Escherichia coli database RegulonDB for the purpose of finding real long and weak motifs . 
+ Also , we showed the results of TFBSGroup in discovering motifs in ChIP-seq data sets for 12 mouse transcription factors involved in ES cell pluripotency and self-renewal [ 28 ] . 
+ All experiments were performed on a computer with an Intel 2.99 GHz processor and 2GB of main memory running Windows XP . 
+ Benchmark data sets
+ Like the previous work of Buhler and Tompa [ 9 ] , the testing samples were generated synthetically using the following steps : 1 ) A parent motif of length l was chosen by randomly picking l bases from the nucleotide set { A , C , G , T } . 
+ 2 ) N i.i.d. background sequences of length L were constructed at random . 
+ 3 ) q ( q ≤ N ) sequences were randomly selected from these N background sequences . 
+ 4 ) The following steps were performed for each of the selected q background sequences : 
+ In our experiments , unless otherwise specified , the number N and the length L of sequences in an ( l , d ) sample are set to 20 and 600 , respectively . 
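+ The generation procedure can be sketched as follows. This is only an illustration under stated assumptions: the per-sequence implanting rule (exactly d substitutions at a random position) and the names `mutate` and `make_sample` are hypothetical and are not taken from the original benchmark code.

```python
import random

BASES = "ACGT"


def mutate(motif, d, rng):
    """Return a copy of motif with exactly d substituted positions (assumed rule)."""
    out = list(motif)
    for p in rng.sample(range(len(motif)), d):
        out[p] = rng.choice([b for b in BASES if b != out[p]])
    return "".join(out)


def make_sample(l, d, n=20, length=600, q=None, seed=0):
    """Build n random sequences and implant a mutated motif copy into q of them."""
    rng = random.Random(seed)
    q = n if q is None else q
    # Step 1: random parent motif of length l.
    motif = "".join(rng.choice(BASES) for _ in range(l))
    # Step 2: n i.i.d. background sequences.
    seqs = ["".join(rng.choice(BASES) for _ in range(length)) for _ in range(n)]
    # Steps 3 and 4: choose q sequences and overwrite one window in each.
    planted = {}
    for i in rng.sample(range(n), q):
        pos = rng.randrange(length - l + 1)
        seqs[i] = seqs[i][:pos] + mutate(motif, d, rng) + seqs[i][pos + l:]
        planted[i] = pos
    return motif, seqs, planted
```

+ Calling `make_sample(15, 4, q=18)` would give a ( 15 , 4 ) sample in which two of the 20 sequences carry no instance, which is the kind of input the ZOMOPS setting allows.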
+ Comparisons between TFBSGroup and state-of-the-art algorithms using ( l , d ) samples
+ Firstly , to show the efficiency of TFBSGroup , we compared it with state-of-the-art algorithms including the pattern-driven SPELLER , WEEDER , and MITRA-count ( MITRA for short ) and the sample-driven iTriplet and RecMotif on the same testing samples ( Table 1 ) . 
+ Secondly , we tested the effect of the maximal length L of DNA sequences [ 20 ] . 
+ The test results are shown in Table 2 . 
+ WEEDER ( q ) indicates the execution time of WEEDER given q , q/f indicates that WEEDER failed to find the true motif for the given value q , TFBSGroup ( x ) indicates the run time of TFBSGroup given x ( d ≤ x ≤ 2d ) , s , m , and h denote seconds , minutes , and hours respectively , and ′ − ′ indicates a run time of over 10h . 
+ To demonstrate the accuracy of our algorithm , we ran TFBSGroup on 100 randomly generated test samples for each ( l , d ) pair and reported the number of samples in which the implanted motifs were correctly reported in the top 1/5/10 , and which were correctly identified but listed below the top 10 ( denoted b10 ) . 
+ We also reported the number of samples in which the implanted motifs were not correctly reported by TFBSGroup ( denoted f ) , since our algorithm TFBSGroup may fail in some cases . 
+ The accuracy of TFBSGroup for different ( l , d ) pairs is shown as 1/5/10 / b10/f after TFBSGroup ( x ) in Tables 1 and 2 . 
+ For example , we correctly found motifs ranked first in 95 samples , motifs ranked within the top 5 in 98 samples , and motifs ranked within the top 10 in 99 samples for ( 15 , 4 ) . 
+ However , we failed on one sample set . 
+ Thus , the accuracy of TFBSGroup on 100 ( 15 , 4 ) samples can be estimated as 95/98/99 / 0/1 , where motifs were ranked by their significance score [ 14,15 ] . 
+ Furthermore , since the run times of TFBSGroup on different samples of the same ( l , d ) pair showed negligible difference ( usually < 1 second ) under the same parameter settings , we did not average the run times of 100 samples for an ( l , d ) pair but instead kept the run times of TFBSGroup on the same sample sets used in the efficiency comparisons with other algorithms . 
+ In addition , Table 1 and Table 2 describe the results of VINE ( Huang et al. [ 20 ] ) and sMCL-WMR ( Boucher and King [ 23 ] , sMCL for short ) in order to compare these sample-driven algorithms , which extract cliques from N-partite graphs , to our work . 
+ ′ ∗ ′ indicates that no result was available from literature . 
+ The experiments using VINE were performed on a PC with an Intel Pentium IV 3.40 GHz processor and 1GB of main memory running Windows . 
+ Those for sMCL-WMR were performed on a Linux machine with a 2.6 GHz processor and 1GB of RAM running Ubuntu Linux . 
+ We also tested iTriplet and found that the run times of two implementations of this algorithm on the same ( l , d ) sample were greatly different due to its random mechanism for selecting two reference sequences and an l-mer within a reference sequence . 
+ Taking five runs of iTriplet on the same ( 15 , 4 ) sample as an example , the minimum execution time was 0.859 seconds and the maximum was 9.156 seconds . 
+ We have reported the average run time of 5 runs of iTriplet . 
+ As shown in Table 1 , the sample-driven algorithms run faster than the pattern-driven variety . 
+ However , except for MITRA , we used only q and d as input in all implementations of the pattern-driven algorithms . 
+ The length l of planted motifs can be predicted by these algorithms while all the sample-driven types tested above and MITRA require l , d and q to be specified in advance . 
+ TFBSGroup can find all long and weak ( l , d ) motifs within 30 seconds under the ZOMOPS constraint . 
+ Most planted motifs in the synthetic samples ( with the exception of ( 10 , 2 ) ) were correctly reported in the top 1/5/10 . 
+ TFBSGroup performed with high accuracy when identifying exact matches for long and weak ( l , d ) motifs such as ( 15 , 4 ) , ( 16 , 5 ) , ( 18 , 6 ) , and ( 19 , 7 ) . 
+ However , TFBSGroup failed to find exact planted motifs in many cases involving conserved motifs ( e.g. , ( 10 , 2 ) ) . 
+ This may be because the networks are too sparse to form good communities . 
+ For hard ( l , d ) motif search problems such as ( 15 , 5 ) , ( 16 , 5 ) , ( 17 , 6 ) , and ( 18 , 6 ) , TFBSGroup is much more efficient than iTriplet , RecMotif , VINE , and sMCL-WMR . 
+ In addition , it is not easy to tune the parameter q in WEEDER since , with the decrease of q , the run time is dramatically increased . 
+ Moreover , according to our experiments , iTriplet can not be guaranteed to find true inserted motifs in all cases because of its random mechanism . 
+ Also , the memory usage of iTriplet is much higher and can freeze the computer during searches for long and weak motifs such as ( 19 , 7 ) . 
+ Mining for transcription factor binding sites in Escherichia coli K-12
+ In order to further evaluate TFBSGroup , we used the algorithm on a large data set ( ECRDB70-10 ) to find the binding sites within promoters of Escherichia coli K-12 DNA . 
+ This data , collected by Hu et al. [ 25 ] , is stored in RegulonDB [ 27 ] and contains groups of sequences with experimentally-determined binding sites in the middle regions of the sequences . 
+ We selected sequence sets with more than five DNA sequences . 
+ We used published motif consensuses in the literature , especially the results in Li et al. [ 30 ] , as a guide for inferring the values of l and d . 
+ We listed the motif consensuses , which were similar to the consensuses published in the literature . 
+ If no exact match or similar result was found in the literature , we listed the top ranked motif consensuses with the most binding locations in the middle regions of the sequences . 
+ The test results ( obtained within 1 minute by running TFBSGroup ) are shown in Figures 1 and 2 , where ` TF ' indicates the name of the transcription factor , ` # ' indicates the number of sequences in the corresponding set , ` Consensus ' indicates the motif consensus of the corresponding TFBSs given a TF , ` Logo ' indicates the sequence logo of all TFBSs for a specified TF ( created using the web-based application tool Weblogo [ 31 ] ) , ` Rank ' indicates the ranking number of the significance score [ 14,15 ] for the motif consensuses , ` Lit . ' indicates that similar motif consensuses have been published in the literature while ` * ' means no similar motif consensus was found in the literature , and ( l , d ) and x are represented in the same way as in Table 1 . 
+ As shown in Figures 1 and 2 , TFBSGroup can efficiently find over-represented long motifs from prokaryotic promoters . 
+ We illustrate this point using the well-studied TFs CRP , FNR , and LexA as examples [ 21 ] . 
+ Binding site data for the CRP protein includes 138 DNA sequences of length 219 with the consensus TGTGAnnnnnnTCACA ( consensus model : ( 18 , 7 ) ) and 138 actual binding sites . 
+ The FNR data includes 50 DNA sequences of length 222 with the consensus TTGATnnnnATCAA ( consensus model : ( 14 , 4 ) ) and 50 actual binding sites . 
+ The LexA data includes 10 DNA sequences of length 222 with the consensus CTGTnnnnnnnnnnCAG ( consensus model : ( 16 , 6 ) ) and 10 actual binding sites . 
+ For all three sets , the expected motifs are ranked first by TFBSGroup in terms of their significance score : CRP is reported to have 121 sites ( 63 true ) , FNR is predicted to have 46 sites ( 23 true ) , and LexA is reported to have 12 sites ( 8 true ) . 
+ The precision on the site level ( precision = TP / ( TP + FP ) ) is close to or greater than 50 % on these three data sets , where TP is the number of true positive sites and FP is the number of false positive sites . 
+ It should be pointed out , however , that some results marked with an ` * ' in Figures 1 and 2 may not be satisfactory due to the low specificity of binding sites for the TFs , an insufficient number of sequences from which to draw a statistical conclusion , or a lack of knowledge of the proper ( l , d ) models . 
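+ As a worked check of the site-level precision, the reported and true site counts quoted above for CRP, FNR, and LexA can be plugged into the definition. The helper name `precision` is only illustrative.

```python
def precision(true_positive, predicted):
    """Site-level precision: TP / (TP + FP), where predicted = TP + FP."""
    return true_positive / predicted


# (true sites, reported sites) as quoted in the text.
sites = {"CRP": (63, 121), "FNR": (23, 46), "LexA": (8, 12)}

for tf, (tp, predicted) in sites.items():
    print(f"{tf}: {precision(tp, predicted):.3f}")
# prints CRP: 0.521, FNR: 0.500, LexA: 0.667
```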
+ Compounding the problem is the fact that the true consensuses in these data sets are unknown , a difficulty which exists for all consensus model-based algorithms . 
+ Motif discovery in 12 mouse ES cell ChIP-seq data sets
+ To further evaluate the accuracy of motifs predicted by TFBSGroup , we analyzed the ChIP-seq data sets for 12 DNA-binding TFs ( CTCF , cMyc , Esrrb , Klf4 , Nanog , nMyc , Oct4 , Smad1 , Sox2 , STAT3 , Tcfcp2l1 , and Zfx ) involved in mouse embryonic stem cell pluripotency and self-renewal [ 28 ] . 
+ To prepare the data sets for use with motif discovery algorithms , we first extracted peak regions from ChIP-seq data using MACS [ 50 ] with a FDR ( false discovery rate ) threshold of 0.2 . 
+ We then mapped the centers of the ChIP-seq peaks to the mouse mm10 assembly and extracted 100-bp of genomic sequence centered around each peak . 
+ To compare motifs identified by TFBSGroup with motifs found in Chen et al. [ 28 ] , we ran TFBSGroup on hundreds of peaks with low p-values . 
+ The results of Chen et al. and TFBSGroup are shown in Figure 3 , where all sequence logos predicted by TFBSGroup , including those in Figure 4 , were also created using the web-based application tool Weblogo [ 31 ] . 
+ We found motifs matching those identified in Chen et al. [ 28 ] . 
+ Specifically , motif logos predicted by Chen et al. [ 28 ] and TFBSGroup for each TF in Figure 3 ( with the exception of Klf4 and Zfx ) are exactly or ` almost ' exactly the same . 
+ However , it is well known that Klf4 is able to recognize GC-rich regions . 
+ ZFX has no known published consensus sequence , but its predicted motif agrees to some extent with the result of Chen et al. and the result predicted by cEMRMIT [ 51 ] . 
+ In [ 28 ] , Chen et al. used WEEDER and then refined and extended the motifs with an Expectation-Maximization method . 
+ This second step was necessary because the supplied version of the WEEDER algorithm limited the motif search to a maximum of 12 bps . 
+ As discussed in the previous sections , WEEDER operated with low efficiency for long motifs and was difficult to tune for the parameter q. On the contrary , TFBSGroup was able to find both long and weak motifs . 
+ We obtained the motifs and their TFBS locations in sequences within 1 minute for all data sets with just one run of TFBSGroup . 
+ In addition , we found alternative motifs for some TFs such as OCT4 , Esrrb and CTCF , which were also reported in a previous study [ 52 ] . 
+ One significant alternative motif for each of the three TFs is shown in Figure 4 . 
+ The TFBS sequences of this alternative motif were complementary to those of the main motif in Figure 3 for each of the three TFs . 
+ Conclusions and discussion
+ In this work , we have developed a novel and efficient algorithm ( TFBSGroup ) for identifying ( l , d ) motifs under the ZOMOPS constraint . 
+ It extracts dense subgraphs from an N-partite graph using a fast community detection algorithm designed for processing large-scale networks ( BGLL ) . 
+ Based on these extracted communities , TFBSGroup heuristically refines candidate motifs and their instances towards the true motifs . 
+ Experimental tests on synthetic samples have shown that TFBSGroup can very quickly discover long and weak ( l , d ) motifs and their instances . 
+ More importantly , TFBSGroup has achieved good performance in rapidly identifying motifs in a large data set of promoters generated from Escherichia coli and in accurately discovering motifs in ChIP-seq data set for 12 mouse transcription factors involved in ES cell pluripotency and self-renewal . 
+ Still , TFBSGroup may not work well in the extreme case that the number of mutations between each motif instance and the motif itself is exactly d , since the graph will be too dense to be partitioned sufficiently . 
+ Fortunately , this case seldom occurs in real applications . 
+ In the future , we plan to improve the algorithm by combining structure - and sequence-based methods in order to address this issue . 
+ Also , we plan to improve the algorithm 's ability to process large-scale ChIP-Seq data sets . 
+ Methods
+ ( l , d ) motif search and dense subgraph extraction
+ Given a set of sequences S = { s1 , s2 , ... , sN } over a symbol set Σ = { A , C , G , T } and positive integers l and d ( | si | ≤ L , 1 ≤ i ≤ N , 1 ≤ l ≤ L and 0 ≤ d < l ) , an ( l , d ) motif search finds a string t ∈ Σ^l such that for at least q ( q ≤ N ) sequences { si1 , si2 , · · · , siq } ⊆ S there exists a substring tij in each sij ( j = 1 , 2 , · · · , q ) with d ( t , tij ) ≤ d , where d ( t , tij ) indicates the Hamming distance between the two strings t and tij . 
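+ The definition can be made concrete with a brute-force check of whether a given string t is an ( l , d ) motif for a set of sequences. This is a minimal sketch for illustration only; the function names are hypothetical and it is not how TFBSGroup searches for motifs.

```python
def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def best_match(t, seq):
    """Smallest Hamming distance between t and any substring of seq of length len(t)."""
    l = len(t)
    return min(hamming(t, seq[i:i + l]) for i in range(len(seq) - l + 1))


def is_ld_motif(t, seqs, d, q):
    """True if at least q sequences contain a substring within distance d of t."""
    hits = sum(1 for s in seqs if len(s) >= len(t) and best_match(t, s) <= d)
    return hits >= q


seqs = ["AAACGTAA", "TTACCTTT", "GGGGGGGG"]
print(is_ld_motif("ACGT", seqs, d=1, q=2))  # True: two sequences hold an instance
print(is_ld_motif("ACGT", seqs, d=1, q=3))  # False: the third sequence has none
```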
+ Since the Hamming distance between each instance of a motif and the motif itself is at most d , the Hamming distance between two instances is no more than 2d and all instances of the true motif must form a q-clique . 
+ Therefore , we can obtain ( l , d ) motifs by extracting q-cliques from an N-partite graph where each vertex is an l-mer in S and there is an edge between two l-mers li and lj ( li and lj are l-mers of si and sj , respectively , i ≠ j ) if the Hamming distance between the two l-mers is no more than 2d . 
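+ A direct way to build such a graph is to compare every pair of l-mers taken from two different sequences. The sketch below is a naive quadratic construction with a hypothetical `build_graph` helper; vertices are written as (sequence index, start position), and the distance threshold is a parameter so that 2d or any other cut-off can be passed in.

```python
from itertools import combinations


def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def lmers(seq, l):
    """All substrings of length l, in order of start position."""
    return [seq[i:i + l] for i in range(len(seq) - l + 1)]


def build_graph(seqs, l, threshold):
    """Edges between l-mers of different sequences whose distance is <= threshold."""
    words = [lmers(s, l) for s in seqs]
    edges = []
    for i, j in combinations(range(len(seqs)), 2):
        for p, a in enumerate(words[i]):
            for q, b in enumerate(words[j]):
                if hamming(a, b) <= threshold:
                    edges.append(((i, p), (j, q)))
    return edges


print(build_graph(["AACGT", "ACGTT", "TTTTT"], l=4, threshold=0))
# [((0, 1), (1, 0))]
```

+ Because l-mers of the same sequence are never compared, the result is N-partite by construction.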
+ As far as synthetic samples randomly generated by the method mentioned in the above section are concerned , the probability of two random l-mers with a maximum distance of x is 
+ $$ p(l, x) = \sum_{i=0}^{x} \binom{l}{i} \left( \frac{3}{4} \right)^{i} \left( \frac{1}{4} \right)^{l-i} . \qquad (1) $$
+ Thus , for a set of N sequences with length L , there are 0.5 × N × ( L − l + 1 ) × ( N − 1 ) × ( L − l + 1 ) × p ( l , 2d ) random edges in the background sequences . 
+ For example , there are an estimated 18.2 million random edges in the background sequences for an ( 18 , 6 ) sample when N = 20 and L = 600 . 
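+ The edge estimate can be reproduced numerically. The sketch below evaluates the probability that two random l-mers differ in at most x positions (each position mismatches with probability 3/4) and multiplies it by the number of l-mer pairs from different sequences; the function names are illustrative.

```python
from math import comb


def p(l, x):
    """Probability that two random l-mers over {A,C,G,T} have Hamming distance <= x."""
    return sum(comb(l, i) * 3 ** i for i in range(x + 1)) / 4 ** l


def expected_random_edges(n, length, l, x):
    """Expected number of background edges for n sequences of the given length."""
    per_seq = length - l + 1  # number of l-mers in one sequence
    return 0.5 * n * per_seq * (n - 1) * per_seq * p(l, x)


# (18, 6) sample with N = 20 and L = 600, threshold 2d = 12.
print(f"{expected_random_edges(20, 600, 18, 12):,.0f}")
```

+ For these settings the value comes out at roughly 18.2 million, in line with the figure quoted in the text.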
+ There are also some spurious edges around the vertices of motif instances , especially for long and weak ( l , d ) motifs , since the neighbor vertices of a motif instance may have links to the neighbor vertices of other motif instances . 
+ Still , only q ∗ ( q − 1 ) / 2 ≤ N ∗ ( N − 1 ) / 2 edges are true positive links forming an expected q-clique . 
+ Suppose there is an edge between two vertices in an N-partite graph if the Hamming distance of the vertices is no more than x ( d ≤ x ≤ 2d ) instead of 2d . 
+ In this case , the number of spurious edges may be dramatically reduced . 
+ For example , if we set x = 7 for a real ( 18 , 6 ) sample , there are only 82,343 edges in the N-partite graph ( there are an estimated 80,335 edges using Eq . 1 ) . 
+ However , the instances of a motif in this situation may not form a q-clique but rather a densely connected subgraph . 
+ We can obtain an ( l , d ) motif by detecting dense subgraphs in an N-partite graph where the distance of two vertices is at most x. 
+ Community detection and dense subgraph identification
+ In recent years , complex network analysis has been highlighted in the research community . 
+ It is a powerful tool used to describe the structure of many complex systems in nature and society and has many potential applications . 
+ A network is usually represented by a graph G = ( V , E ) , where V is a set of n vertices and E is a set of m edges representing relationships between pairs of vertices . 
+ Community structure is one of the most important topological characteristics in a network . 
+ A community structure is a subgraph of a network whose vertices are more highly connected with each other than with vertices outside the subgraph . 
+ Therefore , the problem of community detection requires the partition of a network into communities of densely connected nodes such that ∀ u , v , u ≠ v , Cu ∩ Cv = ∅ and ∪ uCu = V , where Cu ( or Cv ) is one of the partitioned communities . 
+ It should be apparent that community structure is a type of dense subgraph . 
+ The current algorithms for identifying communities in complex networks can be used to find dense subgraphs within graphs . 
+ Many methods to identify community structures in complex networks have been developed [ 53,54 ] . 
+ As mentioned above , however , an N-partite graph of input sequences with a long and weak ( l , d ) motif may be a large network with millions of edges . 
+ Fast community detection methods are required to partition a large-scale graph . 
+ In the field of complex network analysis , algorithms including Infomap [ 55 ] , BGLL [ 26 ] , LPA [ 56 ] , and RG [ 57 ] are designed to efficiently detect communities in very large networks . 
+ In this study , we use the BGLL algorithm [ 26 ] to find dense subgraphs in an N-partite graph . 
+ This algorithm is best for our purposes since we only need a coarse partition and BGLL is very fast and easy to use . 
+ The source code for this software can be obtained from http://findcommunities.googlepages.com/ . 
+ BGLL : a near linear time algorithm for community detection
+ BGLL is a heuristic method for optimizing modularity ( Eq . 2 ) , which measures the difference between the empirical distribution of in-community connections of a partition and the expected distribution of in-community connections of a partition in a randomly generated graph with the same vertex degree distribution [ 58 ] . 
+ $$ Q = \frac{1}{2m} \sum_{i, j \in V} \left[ A_{ij} - \frac{k_i k_j}{2m} \right] \delta(C_i, C_j) , \qquad (2) $$
+ where Aij is the weight of the edge ( i , j ) . 
+ If a network is unweighted , Aij = 1 for ( i , j ) ∈ E , otherwise Aij = 0 . 
+ $k_i = \sum_{j=1}^{n} A_{ij}$ is the sum of the weights of the edges attached to vertex i or the degree of the node i for an unweighted network . 
+ Ci is the community to which the node i is assigned and δ ( · , · ) is the Kronecker function . 
+ A larger Q yields a better partition . 
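+ For an unweighted graph without self-loops, the modularity sum can be regrouped by community: each community contributes its fraction of internal edges minus the squared fraction of edge endpoints it holds. The sketch below uses this regrouped form; the function name and the toy graph are illustrative only.

```python
def modularity(edges, community):
    """Modularity Q of a partition of an unweighted, undirected graph.

    edges: list of (u, v) pairs, each undirected edge listed once.
    community: dict mapping every vertex to its community label.
    """
    m = len(edges)
    degree = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    total = {}   # sum of degrees per community
    inside = {}  # number of internal edges per community
    for v, k in degree.items():
        total[community[v]] = total.get(community[v], 0) + k
    for u, v in edges:
        if community[u] == community[v]:
            inside[community[u]] = inside.get(community[u], 0) + 1
    return sum(inside.get(c, 0) / m - (total[c] / (2 * m)) ** 2 for c in total)


# Two triangles joined by a single bridge edge.
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
split = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
print(round(modularity(edges, split), 4))  # 0.3571, i.e. 5/14
```

+ Putting all six vertices into one community gives Q = 0, so the two-triangle split is the better partition under this measure.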
+ The BGLL algorithm can be divided into two iterative phases . 
+ Firstly , it assigns a different community to each node of a network . 
+ Then , for each node i , it considers the neighbors j of i ( ( i , j ) ∈ E ) and evaluates the gain of modularity ΔQ ( Eq . 3 ) that would take place by removing i from its community and placing it in the community of j . 
+ $$ \Delta Q = \left[ \frac{\Sigma_{in} + 2 k_{i,in}}{2m} - \left( \frac{\Sigma_{tot} + k_i}{2m} \right)^{2} \right] - \left[ \frac{\Sigma_{in}}{2m} - \left( \frac{\Sigma_{tot}}{2m} \right)^{2} - \left( \frac{k_i}{2m} \right)^{2} \right] , \qquad (3) $$
+ where Σin is the sum of the weights of the edges inside a community Cu , Σtot is the sum of the weights of the edges incident to nodes in Cu , and ki,in is the sum of the weights of the edges from i to nodes in Cu . 
+ If the gain is negative , i stays in its original community , otherwise , i is placed in the community which provides maximum gain . 
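+ The gain for moving an isolated node i into a community can be evaluated from four numbers, without recomputing the modularity of the whole graph. The sketch below assumes that the internal weight of a community is counted from both endpoints (so one internal edge contributes 2), which keeps the gain consistent with the double sum in the modularity definition; the function and argument names are illustrative.

```python
def modularity_gain(sigma_in, sigma_tot, k_i, k_i_in, m):
    """Gain in modularity when an isolated node i joins a community.

    sigma_in:  internal weight of the community, each internal edge counted twice
               (an assumed convention, see the lead-in).
    sigma_tot: sum of the degrees of the nodes in the community.
    k_i:       degree of node i.
    k_i_in:    weight of the edges from i to nodes of the community.
    m:         total edge weight of the graph.
    """
    after = (sigma_in + 2 * k_i_in) / (2 * m) - ((sigma_tot + k_i) / (2 * m)) ** 2
    before = sigma_in / (2 * m) - (sigma_tot / (2 * m)) ** 2 - (k_i / (2 * m)) ** 2
    return after - before


# Two triangles {0,1,2} and {3,4,5} joined by the edge (2,3), so m = 7.
# Node 2 (degree 3) is isolated and joins {0, 1}: one internal edge (sigma_in = 2),
# degrees 2 + 2 (sigma_tot = 4), and two edges from node 2 into the community.
print(round(modularity_gain(2, 4, 3, 2, 7), 4))  # 0.1633, i.e. 8/49
```

+ The same value is obtained by computing the modularity of the partition before and after the move and subtracting, which is a useful sanity check when implementing the update.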
+ The second phase of the algorithm involves building a new network whose nodes make up the communities found during the first phase . 
+ The weights of the edges between the new nodes are the sum of the weight of the edges between nodes in the corresponding two communities . 
+ Edges between nodes of the same community lead to self-loops for this community in the new network . 
+ The algorithm naturally incorporates a notion of hierarchy , which results in communities of communities . 
+ The BGLL algorithm is extremely fast and performs in linear time on typical and sparse data , since the gain of modularity is very easy to compute with Eq . 3 and the number of communities decreases drastically after just a few passes . 
+ Most of the run time lies within the first iteration [ 26 ] . 
+ In this study , we took the results of the first iteration only since we are interested in obtaining coarse-grained candidate motifs and their group of instances from dense subgraphs of the original network and are not concerned about the hierarchical structure of communities . 
+ TFBSGroup operates in two phases . 
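+ The first phase (local moving) can be sketched for an unweighted graph as below. This is a simplified illustration written for this text, not the BGLL reference implementation: it visits nodes in a fixed order, uses a gain that is proportional to the modularity gain, and repeats until no node moves. The name `local_moving` is hypothetical.

```python
def local_moving(adj):
    """One first-phase run: move nodes between communities while modularity improves.

    adj: dict mapping each node to the set of its neighbours (undirected, unweighted).
    Returns a dict mapping each node to a community label.
    """
    m2 = sum(len(nb) for nb in adj.values())   # 2m: every edge is seen twice
    comm = {v: v for v in adj}                 # start with singleton communities
    tot = {v: len(adj[v]) for v in adj}        # sum of degrees per community
    improved = True
    while improved:
        improved = False
        for v in adj:
            k = len(adj[v])
            old = comm[v]
            # Count edges from v to each neighbouring community.
            links = {}
            for u in adj[v]:
                if u != v:
                    links[comm[u]] = links.get(comm[u], 0) + 1
            tot[old] -= k                      # take v out of its community
            # Quantity proportional to the modularity gain of joining community c.
            best, best_gain = old, links.get(old, 0) - tot[old] * k / m2
            for c, k_in in links.items():
                gain = k_in - tot[c] * k / m2
                if gain > best_gain:
                    best, best_gain = c, gain
            tot[best] += k
            if best != old:
                comm[v] = best
                improved = True
    return comm


adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
labels = local_moving(adj)
groups = {}
for node, label in labels.items():
    groups.setdefault(label, set()).add(node)
print(sorted(sorted(g) for g in groups.values()))  # [[0, 1, 2], [3, 4, 5]]
```

+ Stopping after this phase, as done in the study, yields the coarse communities directly; the second phase (collapsing communities into super-nodes) is omitted here.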
+ Firstly , we construct an N-partite graph where the distances of pairs of vertices are no more than x ( d ≤ x ≤ 2d ) for a set of input sequences , which is assumed to contain an ( l , d ) motif and at least q ( q ≤ N ) instances . 
+ We then detect all communities with a size of at least t ( the default is q/2 ) in the N-partite graph using the BGLL algorithm to obtain a candidate motif consensus by aligning all l-mers in each community . 
+ Secondly , we greedily refine these candidate motifs toward the true motif using the following three steps : 1 ) For each candidate motif consensus , we look for the vertices in the neighbor set of the current community Cu for which the Hamming distance between the consensus and the corresponding l-mers of these vertices are no more than d in order to form a new community . 
+ A vertex belonging to the new community is in the neighbor set of the current community Cu if the position of the corresponding l-mer of the vertex is in the interval [ max { 0 , posvi − l } , min { posvi + l , L − l } ] . 
+ posvi is the start position of the corresponding l-mer of a vertex vi ( vi ∈ Cu ) in the sequence to which it belongs . 
+ 2 ) We align the new community to obtain a new candidate motif consensus . 
+ We then iteratively execute step 1 and step 2 until each new candidate consensus can not be changed . 
+ 3 ) We shift the corresponding l-mers ( an l-mer corresponds to a vertex vi in the final community ) of the final candidate motif in the interval [ max { 0 , posvi − l/3 } , min { posvi + l/3 , L − l } ] , since the true motif and its instances may be near the final candidate motif consensus and their instances . 
+ Furthermore , since there may be many false positive motifs , we sort all output motifs according to their statistical significance using the method of Pavesi et al. [ 14,15 ] and then delete the duplicates . 
+ Finally , the top k significant motifs and their instances are returned . 
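+ Steps 1 and 2 of the refinement can be sketched as an iteration between taking a column-wise majority consensus and re-collecting the l-mers near the current community that lie within distance d of it. The code is a simplified reading of the description: the neighbor window is applied within the same sequence as each community vertex, the shifting step 3 and the significance ranking are left out, and the iteration cap `max_rounds` and all function names are assumptions made for the example.

```python
from collections import Counter


def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def consensus(lmers):
    """Column-wise most frequent base of a list of aligned l-mers."""
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*lmers))


def refine(seqs, community, l, d, max_rounds=50):
    """Iterate consensus and neighbor selection until the consensus stops changing.

    community: iterable of (sequence index, start position) vertices.
    """
    community = set(community)
    cons = consensus([seqs[i][p:p + l] for i, p in sorted(community)])
    for _ in range(max_rounds):
        new = set()
        for i, p in community:
            lo = max(0, p - l)
            hi = min(p + l, len(seqs[i]) - l)
            for s in range(lo, hi + 1):
                if hamming(cons, seqs[i][s:s + l]) <= d:
                    new.add((i, s))
        if not new:            # nothing matches the consensus: keep the old community
            break
        new_cons = consensus([seqs[i][p:p + l] for i, p in sorted(new)])
        community = new
        if new_cons == cons:   # converged
            break
        cons = new_cons
    return cons, community


seqs = ["TTTTACGTACGTTTTT", "GGGGACGTACGAGGGG", "CCCCACGTTCGTCCCC"]
cons, members = refine(seqs, [(0, 4), (1, 4), (2, 4)], l=8, d=1)
print(cons, sorted(members))  # ACGTACGT [(0, 4), (1, 4), (2, 4)]
```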
+ TFBSGroup is so named because it completes its run after all potential motifs and the groups of their instances ( corresponding to the groups of TFBSs in a set of DNA sequences ) are reported as output . 
+ The details of the TFBSGroup algorithm are shown below , where ( i , j ) indicates an instance starting at the j-th position of the i-th sequence si . 
+ t ( default = q/2 ) is used to filter false positive groups forming small communities during the initial stage . 
+ This will not affect the result or the speed of the algorithm in our simulations because the largest group usually has the highest significance score . 
+ We can set t = 0 to ensure all candidate groups are examined . 
+ Furthermore , the window size l/3 is used to ensure that the predicted TFBSs are within their inserted positions and not around them . 
+ Generally , we can identify true motifs when the window size is less than l/3 . 
+ The source code for TFBSGroup can be obtained from http://bioinformatics.bioengr.uic.edu/TFBSGroup/ or Additional file 1 . 
+ Time and space complexity of TFBSGroup
+ The time complexity of TFBSGroup depends mainly on the first phase of the algorithm , which includes time for constructing an N-partite graph with distance x ( d ≤ x ≤ 2d ) and time for extracting communities from the constructed N-partite graph . 
+ During the second phase , the algorithm searches through candidate motif consensuses and their instances within each of the communities . 
+ There are at most N × ( L − l + 1 ) l-mers for a set of DNA sequences with a length of at most L . Therefore , there are at most N × ( L − l + 1 ) × ( N − 1 ) × ( L − l + 1 ) × l / 2 = O ( N^2 × L^2 × l ) comparisons for constructing an N-partite graph . 
+ During one pass of BGLL , the algorithm computes Q at most t times for each vertex in a network , where t is the maximum number of neighbors of a vertex in the network . 
+ The time complexity of BGLL is bounded by O ( N × L × t × time_Q ) for extracting communities from an N-partite graph because there are at most N × ( L − l + 1 ) vertices in a network , where time_Q is the time complexity for computing Q . 
+ As a result , the time complexity of TFBSGroup for the worst case is $O(t \times N \times L \times time_Q) = O(t \times N \times L \times m) \cong O(p(l, x)^{2} \times N^{4} \times L^{4})$ , where m is the number of edges in a network , m = O ( p ( l , x ) × N^2 × L^2 ) and t = O ( p ( l , x ) × N × L ) , as estimated using Eq . 1 . 
+ However , it should be noted that the above bound can be a substantial overestimate . 
+ The time complexity of TFBSGroup is almost equal to the time complexity of BGLL , which is near linear with respect to m in real applications , especially for sparse networks [ 26 ] . 
+ The space complexity of TFBSGroup is mainly affected by the storage of all l-mers and an N-partite graph , where the distance of two vertices is at most x . Thus , the space complexity of TFBSGroup is O ( m ) = O ( p ( l , x ) × N^2 × L^2 ) , while it is at least O ( p ( l , 2d ) × N^2 × L^2 ) for previous graph-based algorithms . 
+ The time and space complexity of TFBSGroup and several related algorithms ( except for sMCL , since no complexity analysis is available for this algorithm in the literature [ 23 ] ) are shown in Table 3 , where the left half lists pattern-driven algorithms ( labeled ` Pattern-D ' ) and the right half lists sample-driven algorithms ( labeled ` Sample-D ' ) . 
+ $N(l, d) = \sum_{i=0}^{d} \binom{l}{i} 3^{i}$ and w is the word length , which corresponds to the bit length of a processor . 
+ The time complexity of RecMotif relies heavily on the value of p ( l , 2d ) ( see Table one in Sun et al. [ 22 ] ) . 
+ According to Table 3 , each algorithm has its own advantages . 
+ The time complexity of pattern-driven algorithms is higher but they have lower space complexity . 
+ The time complexity of sample-driven algorithms is lower but they generally have higher space complexity . 
+ RecMotif is too sensitive to p ( l , 2d ) and L . 
+ When p ( l , 2d ) is small , RecMotif runs very fast . 
+ However , when p ( l , 2d ) is larger than 0.28 , RecMotif can not produce the results within a reasonable amount of time . 
+ TFBSGroup is a complement to sample-driven algorithms since it makes a reasonable trade-off between speed and accuracy . 
+ The choice of x
+ The key problem with TFBSGroup is the selection of the parameter x . 
+ If x is too large , the N-partite graph may be too dense to define a community containing only the instances of a motif . 
+ If x is too small , the graph is too sparse to form the expected communities and the true group of TFBSs will be missed . 
+ In this study , we use an experimental statistical method to estimate x for a specified ( l , d ) motif search problem . 
+ Firstly , for a given l-mer consensus , we created 500 instances of the consensus with the Hamming distance between the consensus and each instance equal to at most d . 
+ We then computed the Hamming distance for each pair of instances and counted the number n_y of instance pairs with distance y ( y ∈ { 0 , 1 , 2 , · · · , 2d } ) to get the frequency distribution of n_y . 
+ The center of this distribution should be an estimation of x , i.e. , x ≥ max { n_y , y ∈ { 0 , 1 , 2 , · · · , 2d } } or close to this . 
+ Taking ( 18 , 6 ) and ( 19 , 7 ) as examples , the histograms of the frequency ny distribution are shown in Figure 5 . 
+ Using these distributions as a guide , we set x = 7 and x = 8 for ( 18 , 6 ) and ( 19 , 7 ) , respectively . 
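+ The estimation procedure described above can be sketched as follows . This is an illustration under stated assumptions , not the authors ' implementation : the text only says that each instance lies within Hamming distance d of the consensus , so drawing the number of substitutions uniformly from 0 .. d is an assumption , as are the random consensus , the seed , and all function names . 

```python
import random
from collections import Counter
from itertools import combinations

BASES = "ACGT"


def mutate(consensus, d, rng):
    """Return a copy of consensus with at most d substituted positions."""
    k = rng.randint(0, d)  # assumption: mutation count drawn uniformly from 0..d
    seq = list(consensus)
    for pos in rng.sample(range(len(seq)), k):
        seq[pos] = rng.choice([b for b in BASES if b != seq[pos]])
    return "".join(seq)


def hamming(a, b):
    """Hamming distance between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def pairwise_distance_counts(l, d, n_instances=500, seed=0):
    """Count instance pairs n_y for every pairwise Hamming distance y."""
    rng = random.Random(seed)
    consensus = "".join(rng.choice(BASES) for _ in range(l))
    instances = [mutate(consensus, d, rng) for _ in range(n_instances)]
    return Counter(hamming(a, b) for a, b in combinations(instances, 2))


counts = pairwise_distance_counts(18, 6)
mode = max(counts, key=counts.get)  # distance y with the largest n_y
```

+ The resulting histogram depends on how the instances are sampled , so the value of `mode` from this sketch need not match the x chosen in the study . 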
+ Additional file 1: The C++ Version of TFBSGroup (for WindowsXP).
+ Competing interests
+ The authors declare that they have no competing interests.
+ Authors’ contributions
+ CJ initiated the project , analyzed the data , and drafted the manuscript ; MC and JY participated in the study design , discussion , and editing of the manuscript . 
+ All authors read and approved the final manuscript . 
+ Acknowledgements
+ This work was supported in part by the National Nature Science Foundation of China ( Grant No. 60905029 , 61105055 , and 81230086 ) , the Beijing Natural Science Foundation ( Grant No. 4112046 ) , and the Fundamental Research Funds for the Central Universities . 
+ The authors would like to thank Dr. H. Q. Sun for providing the source code for RecMotif . 
+ Received: 22 October 2012 Accepted: 12 July 2013 Published: 17 July 2013
\ No newline at end of file
--- a/data/TEXT_FILES/notuseful_txt/24053571.txt 0 → 100644
+++ b/data/TEXT_FILES/notuseful_txt/24053571.txt 0 → 100644
+ “Non-canonical protein-DNA interactions
+ Abstract 
+ Background : Studies of protein association with DNA on a genome wide scale are possible through methods like ChIP-Chip or ChIP-Seq . 
+ Massive problems with false positive signals in our own experiments motivated us to revise the standard ChIP-Chip protocol . 
+ Analysis of chromosome wide binding of the alternative sigma factor σ32 in Escherichia coli with this new protocol resulted in detection of only a subset of binding sites found in a previous study by Wade and colleagues . 
+ We suggested that the remainder of binding sites detected in the previous study are likely to be false positives . 
+ In a recent article the Wade group claimed that our conclusion is wrong and that the disputed sites are genuine σ32 binding sites . 
+ They further claimed that the non-detection of these sites in our study was due to low data quality . 
+ Results/discussion : We respond to the criticism of Wade and colleagues and discuss some general questions of ChIP-based studies . 
+ We outline why the quality of our data is sufficient to derive meaningful results . 
+ Specific points are : ( i ) the modifications we introduced into the standard ChIP-Chip protocol do not necessarily result in a low dynamic range , ( ii ) correlation between ChIP-Chip replicates should not be calculated based on the whole data set as done in transcript analysis , ( iii ) control experiments are essential for identifying false positives . 
+ Suggestions are made how ChIP-based methods could be further optimized and which alternative approaches can be used to strengthen conclusions . 
+ Background
+ In a recent article in this journal we described our experience with application of the ChIP-Chip method [ 1 ] . 
+ Our focus was on the replication protein SeqA which had been shown to be specific for hemi-methylated GATC-sequences [ 2 ] . 
+ To gain a deeper understanding of the DNA-binding of SeqA we applied a widely used standard ChIP-Chip protocol [ 3 ] . 
+ As a proof that the method works well in our hands we performed ChIP-Chip experiments with RNA-polymerase antibody as published previously [ 4 ] . 
+ To our great surprise the binding sites we detected for SeqA and RNAP were highly similar . 
+ This was absolutely unexpected because many SeqA-bound DNA-regions detected in this experiment did not contain many of the established GATC binding sequences . 
+ One possibility is that these non-canonical protein-DNA interactions could be genuine binding sites and therefore an indication that our understanding of DNA-binding proteins is incomplete . 
+ We considered the alternative possibility that our surprising results might be artifacts . 
+ The key experiment to distinguish between these explanations was a ChIP-Chip using a ΔseqA strain with a SeqA antibody . 
+ In this experiment , too , we detected binding signals , which unambiguously demonstrated that the standard method enriches chromosomal regions unspecifically . 
+ The unspecific signals could be caused by binding of non-target proteins by the antibody . 
+ Indeed , the quality and type of antibody are critical for the quality of ChIP based methods [ 5,6 ] . 
+ However , the antibody turned out not to be the problem in this case . 
+ Evaluation of the ChIP-Chip method led to the identification of four causes for these false signals : i ) non-unique sequences , ii ) incomplete reversion of crosslinks , iii ) inappropriate retention of protein in spincolumns and iv ) insufficient RNase treatment [ 1 ] . 
+ We established a modified ChIP-Chip protocol to minimize the effects of these sources of false positive ChIP peaks and applied it using the SeqA antibody . 
+ The SeqA binding pattern detected with this new protocol was radically different from the standard protocol with almost no overlap . 
+ This means that specific details of a protocol changed the chromosomal binding pattern completely . 
+ The SeqA binding sites we detected with our modified method were exclusively canonical binding sites with binding signals being proportional to the number of GATC sites in the respective regions . 
+ Thus , in the case of SeqA the non-canonical protein-DNA interactions identified with the standard ChIP-Chip method are artifacts . 
+ In 2006 , Wade and colleagues published a ChIP-Chip study on the alternative sigma factor σ32 [ 7 ] . 
+ In addition to 38 known binding sites they surprisingly found 49 new non-canonical binding sites . 
+ These non-canonical sites could be either genuine binding sites or artifacts . 
+ Wade et al. concluded that these sites are genuine σ32 binding sites . 
+ Based on our experience with SeqA described above we considered the possibility that these non-canonical sites might instead be false positives . 
+ This idea was supported by the lack of a control ChIP-Chip experiment in the Wade et al. study and the fact that they refer to the same protocol that gave the enormous false positive rate in our first SeqA attempt [ 1 ] . 
+ In our study , the ΔseqA control strain was crucial for identifying false positives . 
+ We applied our modified ChIP-Chip protocol to analyze σ32 binding on the E. coli chromosome . 
+ We detected almost all of the canonical σ32 binding sites but only very few of the non-canonical sites . 
+ Taken together these findings led to the conclusion that the majority of non-canonical σ32 sites described by Wade et al. are probably not genuine binding sites but instead false positives [ 1 ] . 
+ In a recent article in this journal Wade and colleagues published a new study claiming that our conclusion was wrong and that the noncanonical σ32 sites are in fact genuine binding sites [ 8 ] . 
+ They base their view on ChIP coupled with qPCR analyses of 4 out of the 49 `` Disputed σ32 sites '' ( DSTs ) and the claim that the quality of our ChIP-Chip data is low compared to their study . 
+ In addition they find that the specific ChIP enrichment is reduced because of the increased stringency changes we introduced into the protocol . 
+ Here , we respond to the new study of the Wade group and use this to discuss some critical questions surrounding ChIP-Chip analysis . 
+ What is good data quality in ChIP-Chip studies ? 
+ As with most methods the quality of ChIP-Chip derived data varies . 
+ This might be due to the details of the methodology , the type and quality of the antibody , the biological samples , as well as the performance and experience of the experimenter . 
+ Wade and colleagues reanalyzed their own and our data regarding dynamic range and reproducibility and concluded that both aspects were better in their study . 
+ We appreciate it when other scientists re-analyze our data to come to their own conclusions . 
+ This is why we routinely store our ChIP-Chip data in publicly accessible databases such as the Gene Expression Omnibus ( GEO ) . 
+ The required detailed description of experimental procedures and data processing together with storage of raw as well as processed data is essential for thorough follow-up analysis . 
+ Thus , we recommend the open access storage of genome wide ChIP studies in general . 
+ Unfortunately , the debated data of Wade and colleagues are not easily accessible . 
+ Wade and colleagues might want to consider storage of their data in a public database to facilitate data comparison and analysis by others . 
+ Below , we discuss questions related to the dynamic range and reproducibility of ChIP-Chip derived data . 
+ Dynamic range
+ We accept the claim by Wade and colleagues that the dynamic range of their study is higher than in our data set . 
+ However , the dynamic range is not a suitable quality measurement for inter-platform comparison . 
+ The main reason why the dynamic range in our study is lower is that we used an improved microarray with a higher probe number and density . 
+ With such a higher probe density the ChIP-DNA is distributed across a greater number of probes ( Figure 1 ) . 
+ This would certainly decrease the dynamic range but at the same time greatly increase data quality . 
+ This is because binding site detection can be assisted by comparisons between multiple neighboring probe signals ( Figure 1 ) . 
+ Qi and colleagues tested the relationship between probe density and confidence in binding site detection systematically and came to the conclusion that `` a single high density microarray ( 100-bp probe spacing ) provides better spatial resolution than three experimental replicates using lower density arrays ( 300-bp probe spacing ) '' [ 9 ] . 
+ If the lower dynamic range of our ChIP-Chip data is the reason why we did not detect the `` disputed σ32 sites '' ( DSTs ) , then this would only apply to the targets with the lowest values . 
+ However , the three DSTs with the highest ChIP-score in the Wade et al. paper ( ytfI , ygcI and yghJ ) were not detected in our study . 
+ At the same time we detected known targets that showed a lower score in the Wade et al. study ( for example grpE , yccV , hepA ) . 
+ Furthermore , the question remains if our changes to the ChIP-Chip methodology decrease the dynamic range as suggested by Wade and colleagues and whether such a decrease is relevant to this discussion over the identification of false positives . 
+ We believe that our modifications of the ChIP-Chip protocol do not prohibit necessary dynamic ranges . 
+ Support for this comes from a SeqA ChIP-Chip experiment with synchronized E. coli cells [ 10 ] . 
+ Data was obtained from cells shortly after synchronous initiation of replication ( 5 or 6 min ) using both the standard protocol [ 11 ] and our modified version [ 10 ] . 
+ As Wade and colleagues point out the data from both protocols show similar results ( Figure 2 ) . 
+ However the dynamic range appears to be higher with our modified protocol ( 98.6 ) compared to the study with the standard protocol ( 5.1 ; Figure 2 ) . 
+ The critical point here is that the same antibody , the same E. coli strain and the same microarrays were used for the experiments . 
+ Also for genome wide analyses of SeqA binding in unsynchronized E. coli cells our changed method gave higher dynamic ranges compared to the standard protocol [ 1,11 ] . 
+ Reproducibility
+ For reliable data , experiments need to be reproducible and the data from the replicates should be comparable . 
+ For ChIP-Chip data , a straight-forward analysis of reproducibility is difficult . 
+ This is because most of the data on the microarray can be considered background . 
+ Even with a protein of interest binding some hundred times , this will be only a small fraction compared to the whole genome . 
+ Consequently only some probes are expected to give a relevant signal . 
+ The remaining probes will detect only background DNA . 
+ For calculations of correlation coefficients , as done by Wade and colleagues , this means that one mainly calculates the correlation of the background signal . 
+ Thus we consider the information gain of this number limited . 
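+ The point about whole-array correlation can be illustrated with purely synthetic numbers . In the sketch below , which is an assumption-laden toy model and not an analysis of either group 's data , 50 of 10,000 probes carry a fixed enrichment in both replicates while every probe receives independent Gaussian background noise ; the probe counts , the enrichment value , and the seed are all invented for illustration . 

```python
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def top_indices(values, k):
    """Indices of the k largest values."""
    order = sorted(range(len(values)), key=values.__getitem__, reverse=True)
    return set(order[:k])


rng = random.Random(1)
n_probes, n_targets, enrichment = 10_000, 50, 8.0  # synthetic settings
targets = set(range(n_targets))  # synthetic: the first 50 probes are true sites


def replicate():
    """One simulated replicate: independent noise plus a fixed target signal."""
    return [rng.gauss(0.0, 1.0) + (enrichment if i in targets else 0.0)
            for i in range(n_probes)]


rep1, rep2 = replicate(), replicate()
r_all = pearson(rep1, rep2)  # correlation over all probes
overlap = len(top_indices(rep1, n_targets) & top_indices(rep2, n_targets)) / n_targets
print(f"whole-array r = {r_all:.2f}, top-{n_targets} overlap = {overlap:.2f}")
```

+ In this toy model the whole-array coefficient is governed mostly by the background probes , whereas the overlap of the top-ranked probes speaks directly to the target sites . 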
+ The way we incorporated the reproducibility in our study was to consider only signals as relevant that reached a certain threshold in both replicates , as was the case in the analysis by Wade and colleagues . 
+ Since we have in this way detected in our data almost all known and published σ32 targets , we consider the reproducibility of our data as solid . 
+ There are other ways to assess the reproducibility of ChIP-Chip and ChIP-Seq data . 
+ The critical point is to focus on the target sites . 
+ This can be difficult if one lacks an estimate of the expected number of targets . 
+ One way to deal with this is a stepwise comparison of ranked target-lists and compute the fraction of overlapping targets in the highest 10 , 20 , 30 , ... % . 
+ In our previous study we used the highest 1,000 probes ( out of 40,000 ) to plot Venn diagrams for experiment comparison [ 1 ] . 
+ Such a quantitation of target reproducibility helps the reader in data interpretation and should be provided if possible . 
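+ A stepwise comparison of ranked target lists could look like the sketch below . It is a hedged illustration only : the function name , the dictionary input format , and the toy probe scores are assumptions , and the percentages are passed as integers to avoid rounding surprises . 

```python
def ranked_overlap(scores_a, scores_b, percents=(10, 20, 30)):
    """Fraction of shared probes among the top p% of two replicates.

    scores_a, scores_b: dicts mapping probe id -> enrichment score.
    Returns a dict mapping each percentage to the overlap fraction.
    """
    common = sorted(set(scores_a) & set(scores_b))
    rank_a = sorted(common, key=lambda p: scores_a[p], reverse=True)
    rank_b = sorted(common, key=lambda p: scores_b[p], reverse=True)
    result = {}
    for pct in percents:
        k = max(1, len(common) * pct // 100)  # size of the top-p% list
        result[pct] = len(set(rank_a[:k]) & set(rank_b[:k])) / k
    return result


# Toy example: ten probes, replicate b swaps the scores of probes p1 and p5.
rep_a = {f"p{i}": 10 - i for i in range(10)}
rep_b = dict(rep_a)
rep_b["p1"], rep_b["p5"] = rep_a["p5"], rep_a["p1"]
overlap_by_percent = ranked_overlap(rep_a, rep_b)
```

+ Reporting such overlap fractions at several cut-offs avoids having to fix the expected number of targets in advance . 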
+ Control experiments
+ While discussion about data quality is certainly important , it distracts from the main point of our study . 
+ The erroneous data we got for SeqA using the standard protocol had an excellent dynamic range and reproducibility was high . 
+ In fact we got the highest dynamic range with our control using a ΔseqA strain and the SeqA antibody . 
+ However , all of the detected peaks in this experiment must be false . 
+ This is actually what we consider the most dangerous fact about the false positive peaks we identified . 
+ They appear as wonderful , reproducible hits and not as noise ( Figure 3 ) . 
+ In our view this is why such false positive enrichments could easily be accepted and published as true binding sites . 
+ We have discussed the importance of control experiments as a critical step to identify false positives [ 1 ] . 
+ Our control experiment for the ChIP-Chip detection of the heat shock sigma factor σ32 in heat shocked E. coli cells was a similar experiment using non-heat shocked cells . 
+ It is remarkable that Wade and colleagues did not include any ChIP-Chip control experiment in their σ32 study . 
+ In a recent study , binding of LeuO to the Salmonella enterica genome was analyzed by ChIP-Chip [ 12 ] . 
+ Dillon and colleagues found 261 binding sites using the ChIPOTle peak finding program . 
+ However , they were aware of the possibility of false positives in ChIP-Chip data and performed a mock control experiment . 
+ In this control 83 peaks were detected overlapping with the 261 potential LeuO peaks . 
+ Dillon and colleagues identified them as false-positives and considered only the remaining 178 as likely LeuO binding sites . 
+ The approach of Dillon and colleagues supports our argument that , firstly , false positives are a serious problem in ChIP-Chip studies and secondly , control ChIP-Chip experiments can help to detect and reduce false-positives . 
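+ The control-based filtering described for the LeuO study ( 261 candidate peaks minus 83 peaks also seen in the mock control leaves 178 ) amounts to an interval-overlap filter . The sketch below is a generic illustration with invented coordinates ; it is not the ChIPOTle program and not the procedure of Dillon and colleagues , and the half-open interval convention is an assumption . 

```python
def overlaps(a, b):
    """True if half-open intervals a = (start, end) and b = (start, end) intersect."""
    return a[0] < b[1] and b[0] < a[1]


def remove_control_peaks(sample_peaks, control_peaks):
    """Keep only sample peaks that do not overlap any peak from the control."""
    return [p for p in sample_peaks
            if not any(overlaps(p, c) for c in control_peaks)]


# Hypothetical genomic coordinates, for illustration only.
sample = [(100, 200), (500, 600), (900, 1000)]
control = [(150, 250), (2000, 2100)]
kept = remove_control_peaks(sample, control)
```

+ For genome-scale peak lists a sorted sweep or an interval tree would be faster than this quadratic comparison , but the logic stays the same . 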
+ This is also true for ChIP-Seq where it was shown that peak-scoring algorithms using 2-sample scoring ( scoring sample vs. control experiment ) perform better than single-sample scoring ones [ 13 ] . 
+ Are the new sigma32 targets found by Wade and colleagues real targets or false positives ? 
+ Although Wade and colleagues did not include a ChIP-Chip control experiment in their original study , they performed ChIP-qPCR experiments of 3 selected loci out of the 49 `` disputed σ32 sites '' [ 7 ] . 
+ Notably , here they included a non-heat shock control . 
+ In their new study they analyze 3 more loci [ 8 ] . 
+ The six analyzed loci indeed show temperature dependent association with σ32 . 
+ These results are contradictory to our ChIP-Chip data where no significant temperature dependent association at the respective loci was found . 
+ Further experiments might help to resolve this contradiction . 
+ It is even more important to analyze the 43 remaining DSTs for which no temperature dependent change in σ32 binding has been shown so far . 
+ We suggest that alternative methods are needed for verification . 
+ Temperature dependent induction of mRNAs at the respective regions could be considered additional evidence but was not found for most `` disputed σ32 sites '' [ 7,14,15 ] . 
+ Also sequences resembling the well characterized σ32 target promoter sequence in the debated regions would promote them as genuine binding sites . 
+ However , Wade et al. note that for many of the DSTs no such typical binding sequences could be found [ 7 ] . 
+ They suggest that at these sites σ32 binding is mediated by transcriptional activators that are functional only after heat shock . 
+ Identification of these predicted factors would certainly be important for the discussion about DSTs . 
+ One possibility to find these factors would be mChIP , where proteins co-purified in a ChIP reaction are analyzed [ 16 ] . 
+ What would be an appropriate alternative method to clarify disputed ChIP sites ? 
+ For protein interactions , a popular approach is to complement one pull down experiment with the reverse pull down , meaning both protein partners should be interchangeable as ` bait ' and ` prey ' . 
+ For ChIP experiments the reverse approach would be to use the DNA as bait to catch the proteins which are proposed to bind this motif . 
+ Such methods have actually been developed [ 17,18 ] . 
+ Can the ChIP protocol still be improved?
+ One thing that becomes clear from both our study and the current study by the Wade group is that the experimental details can change the output of ChIP-Chip experiments dramatically [ 1,8 ] . 
+ We did introduce some changes to the method that completely changed the detected SeqA binding pattern towards what we believe to be a more reasonable result . 
+ However , we also believe that there is still room for improvements . 
+ One main point in consideration is the use of Spin-X columns for washing of the IP bound to Protein A agarose beads . 
+ We had found that a problem is the unspecific binding of highly transcribed and consequently highly crosslinked pieces of DNA to the column matrix . 
+ We suggest that omitting such columns solves the problem that these unspecifically bound fragments are washed off the column in the elution step and appear as peaks on the microarray . 
+ Instead of using the columns for collection of the agarose beads we use simple centrifugation and supernatant removal . 
+ Wade and colleagues make the point that the columns are necessary to achieve thorough washing . 
+ Interestingly they actually omit the columns in the first step of the procedure where the beads are separated from the cell extract [ 8 ] . 
+ While in the original method description this is done using the Spin-X columns [ 7 ] , Wade and colleagues collect the beads by centrifugation without columns in this first step just as we suggested to do [ 8 ] . 
+ This first step is probably the point where most unspecific binding to the column occurs and the omission of columns in this step would be expected to greatly facilitate a reduction in false signals . 
+ The following washing steps might be less critical in this respect and the use of Spin-X columns possible or even beneficial . 
+ This is certainly a point for further investigations . 
+ A related potential improvement is the choice of the actual column to be used . 
+ The Spin-X column , for example , is available with various matrix material and pore sizes . 
+ We suspect that if unspecific binding to the column is a problem , then this should vary with the pore size and DNA fragment size . 
+ It is noteworthy that other aspects of ChIP-based methods need to be considered beyond the aspects covered by the current discussion . 
+ Most prominent is the computational part of the process which provides new challenges with the advent of ChIP-Seq [ 9,13 ] . 
+ This computational aspect is certainly important for identifying false positives . 
+ Conclusions
+ ChIP-Chip or ChIP-Seq are wonderful methods to get insights into protein binding to genomes . 
+ We try to promote these methods by optimizing them and alerting other scientists to potential difficulties in data generation and interpretation . 
+ We agree with Wade and colleagues that surprising non-canonical protein-DNA interactions can `` indicate novel functions for well-studied proteins '' . 
+ Examples show that non-canonical binding sites can indeed be functionally relevant [ 19,20 ] . 
+ However , we and many others have detected false positives in ChIP-Chip experiments and it is not unlikely that some false positives have not been recognized as such but interpreted and published as real targets . 
+ Wade and colleagues write in their conclusion that our view that surprising ChIP-Chip results are often artifacts is a `` dogmatic approach '' [ 8 ] . 
+ Our conclusions in that regard were not meant to be taken as dogmatic , but rather a respectful caution against wasteful scientific pursuit that could be based upon erroneous conclusions . 
+ The revision of methods and criticism of published results of peers is not always appreciated , and neither is the prospect of having one 's own conclusions questioned . 
+ However , it is an essential part of scientific progress . 
+ We hope that other scientists examine the results and argumentations published by the Wade group and ourselves and come to their own conclusions . 
+ For the future , we are anticipating new results which we hope will help clarify the debated issues surrounding the ChIP-Chip method . 
+ Abbreviations
+ DST : Disputed σ32 targets ; ChIP : Chromatin immunoprecipitation ; ChIP-Chip : Chromatin immunoprecipitation with microarray technology ; ChIP-Seq : Chromatin immunoprecipitation with next generation sequencing technology ; RNAP : RNA polymerase . 
+ Competing interests
+ The authors declare that they have no competing interests.
+ Authors’ contributions
+ TW wrote the paper with input from DS . 
+ DS prepared the figures with input from TW . 
+ Both authors read and approved the final manuscript . 
+ Acknowledgements
+ We thank Matthew McIntosh , an anonymous reviewer and members of the Waldminghaus lab for helpful comments on the manuscript . 
+ We thank David Grainger and Stephen Busby for stimulating discussions . 
+ This work was supported within the LOEWE program of the State of Hesse . 
+ Response
+ By Richard P. Bonocora1 , Devon M. Fitzgerald2 , Anne M. Stringer1 and Joseph T. Wade1,2† 
+ 1Wadsworth Center , New York State Department of Health , Albany , NY 12208 , USA 
+ 2Department of Biomedical Sciences , University at Albany , Albany , NY 12201 , USA 
+ † Corresponding author : jwade@wadsworth.org 
+ Debate and criticism must be welcomed in any scientific endeavour ; however , such debate and criticism must also be based on solid experimental evidence . 
+ While Schindler and Waldminghaus have responded to our critique [ 14 ] of their earlier paper , we note that their response offers no new experimental evidence or data analysis . 
+ Hence , our opinion is unchanged : the disputed sites of σ32 binding ( DSTs ) are genuine , and noncanonical sites identified by ChIP-chip or ChIP-seq are not artifacts to be disregarded . 
+ We feel that three points require specific attention : 
+ 1. The modifications to the ChIP method proposed by Waldminghaus and colleagues do not improve data quality. 
+ By directly comparing the two methods in targeted , controlled ChIP assays , we have clearly demonstrated that the standard ChIP method is more effective than the modified method at detecting association of σ32 with both well-established targets and DSTs . 
+ Additional experiments with the transcription factor AraC confirm that the general outcome of ChIP experiments is unchanged by the use of Spin-X columns during the wash steps ; if anything , use of Spin-X columns increases signal . 
+ We note that most ChIP studies do not use Spin-X columns until the wash steps , and this modification to the method was applied before the study of Waldminghaus and Skarstad [ 1 ] , e.g. [ 21 ] . 
+ Instead of representing a methodological improvement , we propose that Waldminghaus and Skarstad 's σ32 ChIP-chip experiments simply had reduced sensitivity due to a combination of the ChIP protocol , antibody , and/or growth conditions , all of which differed from our own . 
+ Although we have focused on σ32 , an independent study of SeqA , using the standard ChIP-chip method , yielded almost identical data to those generated by Waldminghaus and colleagues . 
+ This similarity was recently noted by a group who successfully used the standard ChIP method to measure DNA binding by SeqA [ 22 ] . 
+ The data presented by Schindler and Waldminghaus in Figure 2A is misleading because the scales are not comparable for the two datasets , and the time-points after replication initiation are different ( signal at the replication origin is expected to drop rapidly following replication initiation ) . 
+ 2. Most , if not all , DSTs are genuine sites of σ32 association . 
+ Many lines of evidence support this . 
+ First , we tested 6 DSTs using ChIP/qPCR , with an appropriate control ; all 6 targets were confirmed . 
+ Whilst the remaining 43 DSTs were not tested , our data strongly suggest that most are genuine sites of σ32 association . 
+ We find it unrealistic to suggest that every single target identified by a ChIP-chip or ChIP-seq experiment should be validated individually . 
+ Second , while the standard ChIP protocol reliably yields higher enrichment values at DSTs in targeted experiments , the modified method also detects σ32 binding at 2 out of 4 DSTs tested . 
+ Moreover , the ranking of DSTs as a group increases significantly among all genomic regions following heat shock in Waldminghaus and colleagues ' ChIP-chip data ( Mann-Whitney U Test , p = 1e-6 ) , as would be expected for genuine sites of σ32 association . 
+ Both of these findings support the idea that discrepancies between the datasets are related to differences in assay sensitivity , not the validity of targets . 
+ Third , nine of the DSTs were identified in independent transcriptomic studies [ 14,15 ] and we have shown that RNA polymerase levels increase at two of four DSTs tested following overexpression of σ32 . 
+ These two DSTs were not identified by either transcriptomic study , indicating that such experiments are not necessarily sensitive enough to detect all regulated RNAs . 
+ Fourth , DSTs are more likely to be located in intergenic regions than expected by chance ( Binomial Test , p = 0.00033 ) . 
+ Schindler and Waldminghaus suggest that DSTs are not real σ32 binding sites because many of them lack detectable motifs and/or were not detected in transcriptomic studies . 
+ However , since DSTs are generally weakly bound , they would be expected to bind more degenerate motifs and be associated with more subtle changes in transcription . 
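+ The kind of enrichment test cited for intergenic location can be sketched as an exact binomial tail probability . The response reports only the p-value , so the counts used below ( 15 intergenic sites out of 49 DSTs and a background intergenic fraction of 0.12 ) are hypothetical placeholders , and the one-sided form of the test is likewise an assumption . 

```python
from math import comb


def binomial_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p): a one-sided exact binomial test."""
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Hypothetical inputs, NOT taken from the paper: 15 of 49 sites intergenic,
# with 12% of the genome assumed to be intergenic.
p_value = binomial_upper_tail(15, 49, 0.12)
print(f"one-sided p = {p_value:.2g}")
```

+ With the real counts and the real intergenic fraction of the genome , the same function would give the corresponding one-sided p-value . 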
+ The headline conclusion of Waldminghaus and Skarstad 's paper `` ChIP on Chip : surprising results are often artifacts '' , and their subsequent criticism of work from multiple laboratories , is highly misleading . 
+ We and others have identified many non-canonical targets for bacterial DNA-binding proteins that have been validated in controlled experiments . 
+ In fact , 15 of the 47 σ32 targets identified by Waldminghaus and Skarstad are non-canonical ( located inside genes or not associated with detectable regulation in transcriptomic studies ) and , by their own logic , should be mistrusted . 
+ Although ChIP-chip and ChIP-seq studies in bacteria have been limited in number , a recent large-scale ChIP-seq study in Mycobacterium tuberculosis identified hundreds of non-canonical transcription factor binding sites [ 23 ] . 
+ We conclude that non-canonical binding sites for bacterial DNA-binding proteins occur often and should be the subject of further study precisely because they do not conform to the text-book model of transcription regulation . 
+ 1 . 
+ Waldminghaus T , Skarstad K : ChIP on Chip : surprising results are often artifacts . 
+ BMC Genomics 2010 , 11:414 . 
+ 2 . 
+ Waldminghaus T , Skarstad K : The Escherichia coli SeqA protein . 
+ Plasmid 2009 , 61 ( 3 ) :141 -- 150 . 
+ 3 . 
+ Grainger DC , Overton TW , Reppas N , Wade JT , Tamai E , Hobman JL , Constantinidou C , Struhl K , Church G , Busby SJ : Genomic studies with Escherichia coli MelR protein : applications of chromatin immunoprecipitation and microarrays . 
+ J Bacteriol 2004 , 186 ( 20 ) :6938 -- 6943 . 
+ 4 . 
+ Reppas NB , Wade JT , Church GM , Struhl K : The transition between transcriptional initiation and elongation in E. coli is highly variable and often rate limiting . 
+ Mol Cell 2006 , 24 ( 5 ) :747 -- 757 . 
+ 5 . 
+ Kidder BL , Hu G , Zhao K : ChIP-Seq : technical considerations for obtaining high-quality data . 
+ Nat Immunol 2011 , 12 ( 10 ) :918 -- 922 . 
+ 6 . 
+ Aparicio O , Geisberg JV , Struhl K : Chromatin immunoprecipitation for determining the association of proteins with specific genomic sequences in vivo . 
+ Curr Protoc Cell Biol 2004 , Chapter 17 : Unit 17.7 . 
+ 7 . 
+ Wade JT , Castro Roa D , Grainger DC , Hurd D , Busby SJ , Struhl K , Nudler E : Extensive functional overlap between sigma factors in Escherichia coli . 
+ Nat Struct Mol Biol 2006 , 13 ( 9 ) :806 -- 814 . 
+ 8 . 
+ Bonocora RP , Fitzgerald DM , Stringer AM , Wade JT : Non-canonical protein-DNA interactions identified by ChIP are not artifacts . 
+ BMC Genomics 2013 , 14 ( 1 ) :254 . 
+ 9 . 
\ No newline at end of file
--- a/data/TEXT_FILES/notuseful_txt/24098145.txt 0 → 100644
+++ b/data/TEXT_FILES/notuseful_txt/24098145.txt 0 → 100644
+ Multidrug Resistant Uropathogenic Escherichia coli
+ Abstract 
+ Escherichia coli ST131 is a globally disseminated , multidrug resistant clone responsible for a high proportion of urinary tract and bloodstream infections . 
+ The rapid emergence and successful spread of E. coli ST131 is strongly associated with antibiotic resistance ; however , this phenotype alone is unlikely to explain its dominance amongst multidrug resistant uropathogens circulating worldwide in hospitals and the community . 
+ Thus , a greater understanding of the molecular mechanisms that underpin the fitness of E. coli ST131 is required . 
+ In this study , we employed hyper-saturated transposon mutagenesis in combination with multiplexed transposon directed insertion-site sequencing to define the essential genes required for in vitro growth and the serum resistome ( i.e. genes required for resistance to human serum ) of E. coli EC958 , a representative of the predominant E. coli ST131 clonal lineage . 
+ We identified 315 essential genes in E. coli EC958 , 231 ( 73 % ) of which were also essential in E. coli K-12 . 
+ The serum resistome comprised 56 genes , the majority of which encode membrane proteins or factors involved in lipopolysaccharide ( LPS ) biosynthesis . 
+ Targeted mutagenesis confirmed a role in serum resistance for 46 ( 82 % ) of these genes . 
+ The murein lipoprotein Lpp , along with two lipid A-core biosynthesis enzymes WaaP and WaaG , were most strongly associated with serum resistance . 
+ While LPS was the main resistance mechanism defined for E. coli EC958 in serum , the enterobacterial common antigen and colanic acid also impacted on this phenotype . 
+ Our analysis also identified a novel function for two genes , hyxA and hyxR , as minor regulators of O-antigen chain length . 
+ This study offers novel insight into the genetic make-up of E. coli ST131 , and provides a framework for future research on E. coli and other Gram-negative pathogens to define their essential gene repertoire and to dissect the molecular mechanisms that enable them to survive in the bloodstream and cause disease . 
+ Introduction
+ Escherichia coli O25b : H4-ST131 ( E. coli ST131 ) is a recently emerged , globally disseminated clone that is often multidrug resistant and is responsible for a high proportion of community- and nosocomially-acquired urinary tract and bloodstream infections [ 1 -- 6 ] . 
+ E. coli ST131 strains are also capable of causing complicated infections including acute pyelonephritis , osteomyelitis , septic arthritis and septic shock [ 7,8 ] . 
+ E. coli ST131 are commonly associated with production of the CTX-M-15 enzyme , currently the most widespread extended spectrum β-lactamase ( ESBL ) of its type in the world [ 1,9 ] . 
+ In addition to resistance against oxyimino-cephalosporins ( i.e. cefotaxime , ceftazidime ) , and monobactams , E. coli ST131 strains are often co-resistant to fluoroquinolones [ 3,10 ] . 
+ Indeed , most fluoroquinolone-resistant E. coli strains belong to a recently emerged and dominant subgroup of ST131 strains [ 11 ] . 
+ Some E. coli ST131 strains have also been reported to produce carbapenemases [ 12 -- 14 ] , thus severely limiting treatment options that are currently available against this clinically predominant clone [ 15 ] . 
+ E. coli ST131 strains , like many other uropathogenic E. coli ( UPEC ) strains , are derived from phylogenetic group B2 [ 3 ] . 
+ Typically , UPEC strains possess a large and diverse range of virulence factors that contribute to their ability to cause urinary tract and bloodstream infections , including adhesins , toxins , siderophores and protectins [ 15,16 ] . 
+ Several studies have demonstrated that E. coli ST131 strains possess a similar suite of virulence factors and cause invasive disease , leading to the hypothesis that the widespread pathogenic success of E. coli ST131 strains may be in part due to enhanced virulence [ 7,8,17 ] . 
+ However , it has become clear from recent studies that E. coli ST131 strains do not possess a heightened virulence potential compared to other UPEC or B2 E. coli strains in causing invasive infections [ 18 ] or infections in nematodes and zebrafish embryos [ 19 ] . 
+ Thus , other factors such as enhanced metabolic capacity have been proposed to contribute to the fitness and pathogenic success of this dominant clone [ 20,21 ] . 
+ The genome sequence of one of the best-characterized E. coli ST131 strains , EC958 , was recently described [ 22 ] . 
+ E. coli EC958 is a member of the pulsed-field gel electrophoresis ( PFGE ) defined UK epidemic strain A , which represents one of the major pathogenic lineages ( PFGE strains A -- E ) of ESBL producing E. coli causing urinary tract infections ( UTI ) across the UK [ 23 ] . 
+ E. coli EC958 is resistant to eight antibiotic classes , including oxyimino-cephalosporins , fluoroquinolones and sulphonamides . 
+ E. coli EC958 colonizes the bladder of mice in a type 1 fimbriae-dependent manner [ 22 ] , can invade into bladder epithelial cells and form intracellular bacterial communities , can establish both acute and chronic UTI [ 24 ] and can inhibit the contraction of ureters , in vitro [ 25 ] . 
+ The ability to resist the bactericidal activity of serum , and thus survive in the bloodstream , represents an essential virulence trait for UPEC and other extra-intestinal E. coli ( ExPEC ) strains , including E. coli ST131 [ 26 -- 28 ] . 
+ In E. coli , several mechanisms have been shown to contribute to serum resistance . 
+ The importance of O-antigens and K capsules in resistance to serum has been recognized since the 1960s and 1980s , respectively [ 29 -- 31 ] ; and their multiple types , combinations and length contribute differently to serum resistance [ 32 -- 35 ] . 
+ The major outer membrane protein OmpA [ 36 ] , plasmid-encoded proteins TraT [ 37,38 ] and Iss [ 39 ] , and the phage membrane protein Bor [ 40 ] have also been reported to contribute to serum resistance in E. coli . 
+ Notably , each of these resistance mechanisms has been studied in isolation and in different strain backgrounds . 
+ Thus , while serum resistance is clearly a complex phenotype determined by multiple elements , little is known about the combination of factors that contribute to resistance in a single strain . 
+ High-throughput transposon mutagenesis combined with genome-wide targeted sequencing was used recently to study the essential genes in Salmonella enterica serovar Typhi and Caulobacter crescentus [ 41,42 ] . 
+ Langridge et al. also used their transposon directed insertion-site sequencing ( TraDIS ) method to assay every gene for its role in the survival of S. Typhi in the presence of bile salts [ 42 ] . 
+ Similar approaches ( INSeq , HITS , Tn-seq ) have also been applied to a range of organisms to study gene requirements for survival in particular niches [ 43 -- 46 ] . 
+ Here , we adapted TraDIS and designed a multiplexing method to define the essential genes required for in vitro growth ( i.e. Luria-Bertani agar media supplemented with 30 µg/ml Cm at 37 °C ) and the serum resistome ( i.e. genes required for resistance to human serum ) in E. coli EC958 . 
+ We show that the essential gene list of E. coli EC958 comprises 315 genes , 231 of which are shared with E. coli K-12 . 
+ We also define for the first time a comprehensive inventory of genes required for resistance to human serum . 
+ Our study provides a molecular blueprint for understanding the mechanisms employed by E. coli ST131 to survive , grow in the bloodstream and cause disease . 
+ Results
+ Application of multiplexed TraDIS to the E. coli ST131 strain EC958
+ Approximately 1 million mutants were generated in the E. coli ST131 strain EC958 [ 22 ] using an in-house miniTn5 transposon carrying a chloramphenicol ( Cm ) resistance gene derived from the pKD3 plasmid [ 47 ] . 
+ A primer comprising four functional regions was designed to facilitate specific sequencing of the transposon insertion sites on the Illumina HiSeq 2000 platform while allowing for intra-lane multiple sample indexing ( Figure S1 ) . 
+ This primer contained ( 5′-3′ ) : ( i ) the P5 sequence to bind to TruSeq flowcells , ( ii ) the Illumina read 1 sequencing primer binding site , ( iii ) a 6-bp index sequence for multiple sample barcoding and ( iv ) a 25-bp transposon specific sequence designed to amplify the last 12 bp of the transposon and its adjacent genomic sequence . 
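The index and transposon-specific regions of such a primer are what allow pooled reads to be assigned to samples and trimmed down to genomic sequence. The sketch below illustrates that idea only. It assumes that each read starts with the 6-bp index followed directly by a transposon tag; the tag sequence, the index-to-sample table and the function name `split_read` are hypothetical and are not taken from the paper.

```python
INDEX_LEN = 6  # length of the sample index, as stated in the text

def split_read(read, transposon_tag, index_to_sample):
    """Return (sample, genomic_part) or None if the read cannot be assigned.

    Assumed read layout (hypothetical): index + transposon tag + genomic DNA.
    """
    index = read[:INDEX_LEN]
    sample = index_to_sample.get(index)
    rest = read[INDEX_LEN:]
    # Reject reads with an unknown index or without the transposon tag.
    if sample is None or not rest.startswith(transposon_tag):
        return None
    return sample, rest[len(transposon_tag):]

# Hypothetical index table and tag, for illustration only.
samples = {"AAACCC": "input_A", "GGGTTT": "input_B"}
tag = "ACGTACGT"

assigned = split_read("AAACCC" + tag + "GGGTTTAA", tag, samples)
rejected = split_read("CCCCCC" + tag + "GGGTTTAA", tag, samples)
```

Only reads that carry both a known index and the transposon tag are kept, which mirrors the notion of "transposon-tagged" reads used in the text.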
+ Using this custom primer , we successfully sequenced the transposon insertion sites for 6 samples on both TruSeq version 2 and version 3 flowcells ( Figure 1A ) . 
+ Each sample yielded from 6.8 million to 15 million reads that were tagged with transposon specific sequence , 71 % of which were reliably mapped to EC958 draft chromosome ( excluding unscaffolded contigs and plasmids ) ( Table 1 ) . 
+ All experiments were performed in duplicate , with the correlation coefficient for the number of insertions per gene for each pair of samples close to 1 ( R > 0.99 ) , thus demonstrating a high level of reproducibility . 
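The agreement between duplicate samples can be checked with a correlation coefficient over per-gene insertion counts. The sketch below assumes the Pearson coefficient (the text does not name the statistic) and uses made-up counts for six genes.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length count vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two vectors of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return cov / math.sqrt(var_x * var_y)

# Hypothetical insertion counts per gene for two replicates (illustrative only).
replicate_a = [120, 0, 45, 300, 12, 87]
replicate_b = [118, 1, 47, 290, 15, 90]
r = pearson_r(replicate_a, replicate_b)
```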
+ Essential genes in E. coli EC958
+ We initially used our saturated random insertion mutant library to determine the ` essential genes of EC958 ' , defined as the set of genes required for growth on LB agar supplemented with Cm 30 µg/ml . 
+ We extracted genomic DNA ( in duplicates : input A and B ) directly from the library pool and sequenced using our multiplexed TraDIS protocol . 
+ We combined the reads from input A and B to maximize the coverage resulting in 16 million transposon-tagged reads , of which 11 million uniquely mapped to the EC958 chromosome , resulting in 502,068 unique insertion sites . 
+ This equates to an average of one insertion site every 9.92 bp , with a very low probability of having 100 consecutive bp without interruption by chance ( P = 4.2 × 10^-5 ) . 
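The size of this probability can be reproduced with a short calculation. The sketch assumes that insertion sites are spread uniformly and independently along the chromosome, so that the number of sites in a window is approximately Poisson distributed. The paper does not state which model was used, so this is only a plausibility check; under that assumption the result is about 4.2 × 10^-5.

```python
import math

unique_sites = 502_068   # unique insertion sites (from the text)
mean_spacing_bp = 9.92   # average distance between insertion sites (from the text)
gap_bp = 100             # window length considered in the text

# Expected number of insertion sites in a 100-bp window.
expected_sites_in_gap = gap_bp / mean_spacing_bp

# Poisson probability of observing zero sites in that window.
p_no_insertion = math.exp(-expected_sites_in_gap)

# Chromosome length implied by the two figures quoted in the text.
implied_length_bp = unique_sites * mean_spacing_bp
```

The implied length of roughly 5 Mb is in the range expected for an E. coli chromosome, so the two quoted figures are mutually consistent.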
+ The essential gene list was identified using a statistical analysis similar to that described by Langridge et al. , which recognized two distinct distributions of insertion indexes ( number of insertions divided by gene length ) for non-essential genes ( gamma ) and essential genes ( exponential ) and called those with insertion indexes less than or equal to the intercept of the two distributions as essential [ 42 ] . 
+ In our data , an insertion index cut-off of 0.0158 , resulted in the identification of 315 genes as essential ( Table S1 ) . 
+ This cut-off is equivalent to a log2-likelihood-ratio ( LLR ) of -3.6 , which means that our essential genes are at least 12 times more likely to belong to the exponential distribution ( essential ) than the gamma ( non-essential ) distribution . 
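The calling rule described above can be sketched as follows. The insertion index and the 0.0158 cut-off come from the text. The distribution parameters passed to `log2_likelihood_ratio` are hypothetical, since the fitted values are not given here, and the sign convention (gamma density over exponential density) is an assumption chosen so that essential genes receive negative values. An LLR of magnitude 3.6 corresponds to 2^3.6, which is about 12-fold.

```python
import math

def insertion_index(n_insertions, gene_length_bp):
    """Number of insertion sites divided by gene length."""
    return n_insertions / gene_length_bp

def is_essential(n_insertions, gene_length_bp, cutoff=0.0158):
    """Call a gene essential if its insertion index is at or below the cut-off."""
    return insertion_index(n_insertions, gene_length_bp) <= cutoff

def log2_likelihood_ratio(x, rate, shape, scale):
    """log2( gamma pdf / exponential pdf ) at insertion index x > 0.

    rate: exponential rate (essential genes); shape, scale: gamma parameters
    (non-essential genes). All three are hypothetical inputs in this sketch.
    """
    exp_pdf = rate * math.exp(-rate * x)
    gamma_pdf = (x ** (shape - 1) * math.exp(-x / scale)
                 / (math.gamma(shape) * scale ** shape))
    return math.log2(gamma_pdf / exp_pdf)

def fold_from_llr(llr):
    """Convert a log2 likelihood ratio into a fold difference in likelihood."""
    return 2 ** abs(llr)

fold = fold_from_llr(-3.6)
```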
+ The functional category of each gene was identified based on the COG ( Clusters of Orthologous Groups ) numbers from the EC958 annotation ( accession number PRJEA61443 ) . 
+ Figure 2 shows an overview of essential functions in EC958 compared with the total number of genes in each functional category . 
+ Genes involved in translation , ribosomal structure and biogenesis account for 25 % of the total number of essential genes in EC958 , which is 42 % of the total number of genes in this category . 
+ The second most abundant category in the essential gene list comprised genes involved in cell wall/membrane/envelope biogenesis ( 12 % ) , followed by genes involved in coenzyme transport and metabolism . 
+ There were 23 essential genes with functions not identified in the COG database . 
+ To investigate the conservation of EC958 essential proteins among different E. coli pathotypes , we performed tfastx alignment ( FASTA v36 ) between the essential protein sequences and translated DNA from 50 E. coli complete genomes ( Table S2 ) . 
+ There were 270 ( 86 % ) proteins conserved across all genomes investigated . 
+ An additional 17 proteins were also present in more than 90 % of the genomes . 
+ Only 6 proteins were specific for EC958 ( not found in 50 genomes ) ( Table S3 ) . 
+ The serum resistome of E. coli EC958
+ Saturated transposon mutagenesis in combination with next-generation sequencing is a powerful tool for whole genome , high-throughput identification of all candidate genes involved in a particular phenotype . 
+ Here , we used our transposon mutant library in combination with TraDIS to identify genes from EC958 involved in resistance to human serum , thus enabling us to define the serum resistome of EC958 . 
+ We designed a mutant selection procedure in which 1 million mutants were exposed to pooled fresh human serum for 90 minutes and then allowed to grow in LB broth for 4 hours before genomic DNA extraction . 
+ This procedure permitted the growth of serum resistant mutants while eliminating or inhibiting mutants that were sensitive to serum ( Figure 3A ) . 
+ The procedure was performed in parallel with control samples where fresh serum was replaced by inactivated serum that lacked bactericidal activity ( data not shown ) . 
+ The genomic DNA from test and control samples were sequenced using our modified Illumina multiplexed TraDIS procedure ( Figure 3B ) to generate multiple datasets ( Figure 1A ) that were analysed by the Bioconductor package edgeR after filtering out genes identified as essential [ 48 ] . 
+ The serum resistance genes were identified as genes that have significant reduction in read counts in the test samples compared to the control samples ( i.e. fewer mutants survived after serum treatment ) ( Table S4 ) . 
+ A stringent threshold of log2 fold change of read counts ( logFC ) less than -1 and an adjusted p-value less than 0.001 was used to identify significant genes that are involved in serum resistance ( Figure S2 ) . 
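The two-part threshold can be expressed as a simple filter over a table of differential-abundance results. The sketch below does not reproduce the edgeR analysis itself. It only applies the stated cut-offs (logFC below -1 and adjusted p-value below 0.001) to a hypothetical results table with made-up gene names and values.

```python
def select_serum_resistance_genes(results, max_logfc=-1.0, max_adj_p=0.001):
    """Return genes whose logFC and adjusted p-value both pass the thresholds.

    results maps gene name -> (logFC, adjusted p-value).
    """
    return [gene for gene, (logfc, adj_p) in results.items()
            if logfc < max_logfc and adj_p < max_adj_p]

# Hypothetical results table (illustrative values only).
example_results = {
    "geneA": (-10.0, 1e-20),  # strongly depleted, highly significant
    "geneB": (-1.5, 0.01),    # depleted, but not significant enough
    "geneC": (-0.5, 1e-6),    # significant, but the fold change is too small
    "geneD": (-2.3, 5e-4),    # passes both criteria
}
selected = select_serum_resistance_genes(example_results)
```

Requiring both conditions keeps only genes whose depletion is both large and statistically well supported.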
+ Figure 4 shows the names and genomic locations of the 56 genes that satisfied these stringent criteria . 
+ Twenty-two ( 39.3 % ) of the genes belong to three operons responsible for LPS biosynthesis ( including both O-antigen biosynthesis and lipid A-core biosynthesis ) and enterobacterial common antigen ( ECA ) biosynthesis . 
+ This result represents the first layer of evidence demonstrating the importance of the O25 antigen as well as ECA in an E. coli ST131 background for protection from the bactericidal activity of human serum . 
+ Detailed characterisation of the O25 antigen gene cluster is discussed in subsequent sections . 
+ ECA is common to all Enterobacteriaceae and is expressed in both serum resistant ( smooth ) and sensitive ( rough ) strains , except for rough strains that are defective in the shared biosynthesis pathway affecting both O-antigen and ECA [ 49,50 ] . 
+ Seven out of 12 genes in the ECA operon were required for serum resistance as determined by the TraDIS technique . 
+ The remaining 60.7 % of genes in the serum resistome identified by TraDIS included genes encoding lipoprotein , membrane proteins , regulators and hypothetical proteins ( Table 2 ) . 
+ Some of these also affect LPS such as rfaH ( EC958_4322 ) , encoding a known regulator required for LPS biosynthesis [ 51,52 ] and virulence of pathogenic E. coli strains [ 53,54 ] , whilst others represent genes that have not previously been shown to be associated with serum resistance . 
+ The murein lipoprotein gene lpp ( EC958_1897 ) showed the greatest difference between the test and control samples ( logFC of -10 ) , followed by pgm ( EC958_0806 ) , encoding phosphoglucomutase . 
+ Four hypothetical proteins were also identified , two of which ( EC958_0460 and EC958_0461 ) were further characterized in this study ( see below ) . 
+ Validation of serum resistance genes
+ As mentioned above , we employed a stringent threshold combining fold change and statistical significance to define the set of 56 genes in the EC958 serum resistome . 
+ In order to validate these findings , we attempted to test all 56 genes independently for their role in serum resistance . 
+ Using a modified lambda red mediated-homologous recombination approach [ 22,47 ] we successfully generated defined knock-out mutants for 54 genes ( 96.4 % ) in EC958 . 
+ We were unable to obtain mutants for the remaining 2 genes ( acrA and EC958_2373 ) despite multiple attempts ( Table 2 ) . 
+ The 54 defined mutants were subjected to serum susceptibility testing , whereby the number of surviving colonies after a 90-minute exposure to fresh pooled human serum was compared to the number of colonies prior to treatment ( Table 2 ) . 
+ A mutant was defined as susceptible to serum when its log difference was at least 1 ( i.e. 10 fold reduction after exposure to serum ) . 
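The susceptibility criterion can be written as a difference of log10 colony counts. In this sketch the colony-forming unit (CFU) values are made up, and the function names are hypothetical.

```python
import math

def log_difference(cfu_before, cfu_after):
    """log10 reduction in viable count after serum exposure."""
    return math.log10(cfu_before) - math.log10(cfu_after)

def is_serum_susceptible(cfu_before, cfu_after, min_log_difference=1.0):
    """A mutant is called susceptible when the log difference is at least 1."""
    return log_difference(cfu_before, cfu_after) >= min_log_difference

# Hypothetical counts: a 1000-fold drop and a 2-fold drop.
strong_drop = log_difference(1e8, 1e5)
weak_drop_susceptible = is_serum_susceptible(1e8, 5e7)
```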
+ Forty-one genes ( 75.9 % ) contributed to serum resistance in EC958 using this assay . 
+ In the case of the remaining 13 mutants , it is possible that the lack of susceptibility to human serum was a reflection of the assay , suggesting that survival in serum within a mixed population of one million mutants may be very different from survival of a pure population carrying the same defective mutation . 
+ Therefore , to better mimic the condition of TraDIS library serum selection , a competitive assay was devised where EC958 wild-type was mixed equally with a mutant before exposure to serum and the competitive index of the mutant was measured . 
+ Using this competitive assay , five of the twelve mutants were significantly attenuated compared to the wild-type EC958 strain ( Table 2 ) . 
+ Thus , the overall number of validated susceptible mutants was 46 out of 54 tested ( 85.2 % ) . 
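The text does not give the formula used for the competitive index. A common definition, assumed here, is the mutant-to-wild-type ratio after selection divided by the same ratio in the inoculum, so that values below 1 indicate attenuation. The counts in the example are made up. The final lines re-derive the validation tally from the numbers quoted in the text.

```python
def competitive_index(mutant_out, wildtype_out, mutant_in, wildtype_in):
    """(mutant/wild-type after selection) / (mutant/wild-type in the inoculum).

    Assumed definition; the source text does not specify the formula.
    """
    return (mutant_out / wildtype_out) / (mutant_in / wildtype_in)

# Hypothetical 1:1 inoculum in which the mutant is recovered 10-fold less often.
ci = competitive_index(mutant_out=1e5, wildtype_out=1e6,
                       mutant_in=5e6, wildtype_in=5e6)

# Tally from the text: 41 mutants validated in the single-strain assay,
# 5 more in the competitive assay, out of 54 mutants tested.
validated = 41 + 5
validated_percent = round(100 * validated / 54, 1)
```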
+ Additional characteristics of serum resistant mutants
+ We hypothesized that one mechanism associated with enhanced sensitivity to human serum could be due to decreased membrane integrity caused by destabilization of the outer leaflet of the outer membrane . 
+ In order to test this , we examined the survival of the 54 mutants in response to outer membrane stresses ( i.e. SDS ) and osmotic potential ( i.e. NaCl ) . 
+ In total , 50.0 % ( 27/54 ) of the mutants displayed enhanced sensitivity to SDS and 11.1 % ( 6/54 ) of the mutants displayed enhanced sensitivity to NaCl ( Table 2 ) . 
+ A comparative analysis of these phenotypes in the context of serum sensitivity is presented below . 
+ O25b antigen biosynthesis genes conferring resistance to serum 
+ It is well established that O-antigen represents the main determinant for serum resistance in E. coli . 
+ However , there are more than 180 different O-antigens that have been defined in E. coli and these may contribute differently to serum resistance in individual bacterial strains [ 32,33 ] . 
+ Furthermore , direct genetic evidence linking O-antigen biosynthesis to serum resistance is only available for a small number of specific O-antigen types . 
+ With this in mind , a detailed characterization of the O25b biosynthesis genes was performed using sequence comparison for function prediction in combination with analysis of LPS composition to deduce the role of each gene in resistance to serum . 
+ Similar to most E. coli strains , the O-antigen biosynthesis operon is located between the galF and gnd genes in EC958 . 
+ Figure 5A shows a comparison of the EC958 O-antigen operon with the equivalent operon from the K-12 strain MG1655 and the O25 serotype E. coli strain E47a [ 32,55 ] . 
+ Nucleotide sugar biosynthesis genes . 
+ The four genes encoding dTDP-α-L-rhamnose biosynthesis enzymes were highly conserved in the three strains , consistent with the presence of α-L-rhamnose in the O16 and O25 antigen repeat unit expressed by MG1655 and EC958 , respectively [ 56 -- 58 ] . 
+ The dTDP-α-L-rhamnose biosynthesis pathway starts from α-D-glucose-1-phosphate to produce dTDP-α-L-rhamnose via the catalysis of RmlA ( to make dTDP-α-D-glucose ) , RmlB ( to make dTDP-4-dehydro-6-deoxy-α-D-glucose ) , RmlC ( to make dTDP-4-dehydro-6-deoxy-β-mannose ) and RmlD ( to make dTDP-α-L-rhamnose ) , respectively . 
+ LPS gel analysis of EC958 mutants lacking one of these four enzymes showed that RmlC and RmlD were required for the biosynthesis of O-antigen and mutation in either gene resulted in cells with only the lipid A core ( Figure 5C ) . 
+ In contrast , the LPS patterns of rmlA and rmlB mutants showed only a small change in intensity and retained most of the O-antigen chain length distribution , except for the very long chain length band . 
+ This might be explained by the existence of RffH and RffG , two isozymes of RmlA and RmlB , in EC958 . 
+ RffH and RffG , however , were not able to fully compensate for RmlA or RmlB in resistance to serum ( Table 2 ) . 
+ The inability to produce the very long chain length O-antigen in rmlA and rmlB mutants might be a crucial factor in determining their susceptibility to human serum . 
+ Only the rmlC mutant displayed altered sensitivity to SDS , while mutations in rmlABCD did not affect sensitivity to NaCl ( Table 2 ) . 
+ The O-antigen operon in E47a has another set of nucleotide sugar biosynthesis genes ( fnlABC ) for the biosynthesis of UDP-N-acetyl-α-L-fucosamine ( UDP-FucNAc ) from UDP-N-acetyl-α-D-glucosamine ( UDP-GlcNAc ) . 
+ Curiously , EC958 does not have these three genes in the operon or anywhere else in its genome . 
+ O-antigen processing genes . 
+ The O-antigen flippase ( Wzx , EC958_2377 ) and O-antigen polymerase ( Wzy , EC958_2374 ) from EC958 are highly similar to the corresponding genes from E47a . 
+ In contrast , both genes share very low similarity to the corresponding genes from MG1655 . 
+ Mutation of the wzx gene in MG1655 results in the accumulation of high levels of the UndPP-O unit in the cytoplasm [ 59 ] and hypersensitivity to several antibiotics and other agents including nalidixic acid , tetracycline , mitomycin C and hydrogen peroxide [ 60 ] . 
+ Transcriptional analyses of MG1655 responses to the broad-spectrum biocide polyhexamethylene biguanide suggested that Wzx ( also known as RfbX ) might be involved in cellular stress responses [ 61 ] . 
+ In contrast to the previous studies , the wzx gene was defined as essential in this study ( Figure 4 , Table S1 ) . 
+ This might be explained by the fact that Cm was used to select for transposon mutants and the wzx mutants might be hypersensitive to this antibiotic . 
+ The wzy O-antigen polymerase gene ( EC958_2374 ) was also defined as essential in EC958 and thus was not further characterized . 
+ Mutation of the chain length regulator gene ( wzz , EC958_2368 ) resulted in a non-modal distribution of O-antigen chain length in the LPS pattern ( Figure 5C ) , an observation consistent with a previous study describing the role of this gene [ 62 ] . 
+ The reduction of long chain length O-antigen in this mutant is likely to account for the serum sensitivity shown in our assay ( Table 2 ) . 
+ Glycosyltransferase genes . 
+ Glycosyltransferases ( GTs ) are required to form the glycosidic bonds between sugars in an O-antigen repeat unit . 
+ There are now more than 100,000 GTs within 94 families ( http://www.cazy.org ) . 
+ Based on the structure of the O25 repeat unit [ 63,64 ] , we predicted a requirement for 4 GTs in EC958 . 
+ Bioinformatic analysis ( blastp ) confirmed that there are indeed 4 GTs within the O-antigen operon : EC958_2371 , EC958_2372 , EC958_2375 and EC958_2376 . 
+ Surprisingly , two of the predicted GTs in EC958 are very different from those in E47a . 
+ Whether this is related to the different source of α-L-FucNAc in EC958 due to the lack of fnlABC remains to be investigated . 
+ Since the O25 antigen structure is known , we attempted to predict the glycosidic link formed by each GT using a combination of sequence similarity and O-antigen structure comparison as previously described [ 64 ] ; the results of which are shown in Table 3 and Figure 5B . 
+ We were only able to mutate EC958_2371 , while EC958_2372 and EC958_2375 were defined as essential in this study . 
+ Analysis of our EC958_2371 mutant confirmed that EC958_2371 is required for O-antigen biosynthesis ( Figure 5C ) , serum resistance and resistance to SDS ( Table 2 ) . 
+ Other serum resistance mechanisms affecting LPS in EC958
+ LPS gel analysis was performed on all 54 defined mutants to identify genes that contribute to serum resistance by affecting LPS ( Figure S3 ) . 
+ The normal LPS pattern of EC958 consists of 12 bands including a thick bottom band representing the lipid A-core and an 11-band laddering pattern of lipid A-core bound O-antigen polymers , followed by approximately 6 thick bands of very long O-antigen chain length ( Figure 5C ) . 
+ In addition to the 6 genes involved in O-antigen biosynthesis mentioned above , the LPS patterns of 20 mutants were altered in comparison to wild-type EC958 ; 6 of these mutants ( waaLKYJ , wecA and rfaH ) only produced a lipid A-core ( Table 2 and Figure S3 ) . 
+ LPS core biosynthesis genes . 
+ The genes involved in biosynthesis of the LPS outer core in EC958 share strong similarity with those from K-12 MG1655 ( waaL and waaGPBIJYK ) [ reviewed in 65,66 ] , suggesting that the LPS outer core structure in EC958 is the same as that in MG1655 . 
+ As expected from the functions of waaGBIJK in MG1655 , the waaG mutant showed the smallest size of lipid A-core , indicating that its whole outer core was not linked to the lipid A-inner core . 
+ Mutations in waaIJYK also produced an expected LPS pattern of lipid A-core only , as these mutations prevent the linkage of O-antigen to the outer core . 
+ The LPS of the waaB and waaP mutants produced O-antigen laddering patterns containing an abnormal lipid A-core band , indicating that although the core structures were changed , these changes still allow the linking of O-antigen to the outer core . 
+ WaaL is responsible for the ligation of O-antigen to the outer core and mutation of this gene resulted in the synthesis of a lipid A-core without O-antigen . 
+ The EC958 waaL and waaGPBIJYK mutants were sensitive to human serum and SDS ( except for the waaB and waaY mutants ) , while only the waaG , waaI and waaY mutants displayed enhanced sensitivity to NaCl ( Table 2 ) . 
+ ECA biosynthesis genes . 
+ Seven serum resistant candidate genes within the ECA biosynthesis wec operon were identified by TraDIS ( Table 2 ) . 
+ Two of these genes ( wzxE and wzyE ) were not confirmed by serum assays to confer resistance and no changes were observed in the LPS patterns of these two mutants ( Table 2 , Figure S3 ) . 
+ The wzyE mutant ( but not the wzxE mutant ) was , however , more sensitive to SDS ( Table 2 ) . 
+ The LPS patterns of the wzzE mutant revealed changes in the lipid A-core and in the modulation of O-antigen chain length ( Figure S3 ) . 
+ WzxE has previously been shown to preferentially form a protein complex with WzyE and WzzE for the biosynthesis of ECA over Wzy and Wzz [ 67 ] . 
+ However , our results suggested that WzzE might also contribute to the chain length regulation of O-antigen in EC958 . 
+ WecA has been shown previously to be involved in the biosynthesis of O7 , O18 , O75 , and O111 antigen [ 68 ] and our results indicate that the same is also true for O25b , thus explaining the role of WecA in serum resistance in EC958 . 
+ The wecD mutant was highly sensitive to serum , possessed an LPS profile that was altered in both the lipid A-core and the amount of O-antigen , and displayed enhanced sensitivity to SDS ( Table 2 , Figure S3 ) . 
+ Further investigation is required to understand the involvement of WecD in lipid A-core and O-antigen biosynthesis . 
+ Finally , none of the ECA biosynthesis mutants were altered in sensitivity to NaCl . 
+ Other genes . 
+ The remaining 8 mutants with altered LPS patterns were all confirmed as serum sensitive ( Table 2 ) . 
+ They included one mutant , rfaH , which produced only lipid A-core , and 7 mutants with different O-antigen patterns : hyxA ( EC958_0460 ) , hyxR ( EC958_0461 ) , nagA ( N-acetylglucosamine-6-phosphate deacetylase ) , pgm ( phosphoglucomutase ) , galE ( UDP-galactose-4-epimerase ) , EC958_1112 and wcaF ( predicted acetyltransferase involved in colanic acid synthesis [ 69 ] ) . 
+ Some of these genes were also associated with resistance to additional stresses ; EC958 rfaH and wcaF mutants were sensitive to SDS and NaCl , while nagA and pgm mutants were sensitive to SDS ( Table 2 ) . 
+ Novel O-antigen chain length regulators
+ Mutation of the hyxA and hyxR genes in EC958 resulted in the modulation of O-antigen chain length ( Figure 6B ) . 
+ The EC958 hyxA mutant exhibited an increased proportion of O-antigen chain of 2 to 6 units with maximum number at 3 -- 5 units and reduction in very high chain length polymer . 
+ The EC958 hyxR mutant had an increased proportion of O-antigen polymer of 2 -- 4 units . 
+ The hyxA and hyxR genes are located in a pathogenicity island ( PAI-X ) consisting of 4 genes ( fimX and hyxRAB ) as previously described in the UPEC strain UTI89 [ 70 ] . 
+ The hyxB gene ( EC958_0459 ) has also been named upaB due to its function as an autotransporter [ 71 ] , and we prefer to maintain this nomenclature . 
+ This island is present ( in several variations ) in 24 out of 50 E. coli genomes across all pathotypes ( Figure 6A ) and current sequence data suggest that it is exclusive to E. coli . 
+ No functional prediction was found for hyxA , and thus this is the first report of hyxA involvement in serum resistance by regulating O-antigen chain length . 
+ The hyxR gene encodes a LuxR-like response regulator that suppresses the nitrosative stress response and contributes to intracellular survival in macrophages by regulating hmpA , which encodes a nitric oxide-detoxifying flavohaemoglobin [ 70 ] . 
+ The expression of the hyxR gene is regulated through bidirectional phase inversion of its promoter region by the upstream gene fimX , which encodes a tyrosine-like recombinase [ 70 ] . 
+ It is also worth noting that the contribution of hyxA to serum resistance was greater than hyxR , as demonstrated by the 3-log reduction in viability by the hyxA mutant compared to the hyxR mutant . 
+ A serum sensitive phenotype for the hyxR mutant was only observed in mixed competition assays , and neither mutant exhibited altered sensitivity to SDS or NaCl ( Table 2 ) . 
+ Novel serum resistance mechanisms not affecting LPS
+ A major advantage of whole genome approaches such as TraDIS lies in their power of discovery . 
+ Out of 46 genes that define the serum resistome of EC958 , 21 ( 46 % ) genes were confirmed to be required for serum resistance independent of altered LPS patterns ( Table 2 ; Figure S3 ) . 
+ The function of these genes ranged across 10 COG functional categories and included ` Carbohydrate transport and metabolism ' ( 4 genes ) , ` Cell wall / membrane/envelope biogenesis ' ( 3 ) , ` Posttranslational modification , protein turnover , chaperones ' ( 3 ) and many others ( Table 2 ) . 
+ Twelve of these genes were also associated with enhanced sensitivity to SDS and one with enhanced NaCl sensitivity ( Table 2 ) . 
+ Of the non-LPS genes required for serum resistance , the most notable was lpp ( logFC −10 , log difference = 6 ) ( Table 2 ) . 
+ The lpp gene encodes one of the most abundant proteins in E. coli and is responsible for the stabilisation and integrity of the bacterial cell envelope [ 72 ] . 
+ Mutation of the lpp gene results in the formation of outer membrane blebs , leakage of the periplasmic enzyme ribonuclease , decreased growth rate in media of low ionic strength or low osmolarity and hypersensitivity to toxic compounds [ 73,74 ] . 
+ Indeed , the EC958 lpp mutant was more sensitive to SDS , suggesting decreased membrane integrity ( Table 2 ) . 
+ To the best of our knowledge , this study is the first to show a direct link between Lpp and serum resistance . 
+ Another notable set of genes in our TraDIS analysis includes tolQAB , which encode three of the six proteins ( YbgC-YbgF-TolQ-R-A-B-Pal ) that make up the Tol-Pal system of the E. coli cell envelope . 
+ The Tol-Pal system is responsible for maintaining the integrity of the outer membrane . 
+ TolQRA form an inner membrane complex in which TolQR is necessary for its stability [ 75 ] . 
+ TolB , a periplasmic protein , connects the inner membrane complex with the peptidoglycan-associated lipoprotein , Pal , which is anchored to the outer membrane [ 76 ] . 
+ In our study , mutation of the tolA and tolQ genes caused sensitivity to human serum and increased sensitivity to SDS , while the tolB mutant did not ( Table 2 ) . 
+ Our results demonstrate that the Tol-Pal system is important for resistance to human serum , and thus describe a novel function for this important cell wall complex . 
+ BamB is a lipoprotein that is part of the BamABCD complex . 
+ Mutation in bamB results in increased outer membrane permeability , thus enhancing sensitivity to rifampin and dramatically reducing growth on SDS and novobiocin [ 77 ] . 
+ Our data showed that mutation of the bamB gene in EC958 resulted in increased sensitivity to both human serum and SDS ( Table 2 ) . 
+ Our TraDIS experiment also indicated that the modification of lipid A with L-Ara4N was important for serum resistance . 
+ Three ( arnDEF ) of the seven genes involved in the biosynthesis and attachment of L-Ara4N to lipid A-core were identified as part of the serum resistome of EC958 and their role was confirmed by mutagenesis ( Table 2 ) . 
+ This mechanism is known to confer resistance to polymyxin B by preventing its binding to lipid A [ Reviewed in 78,79,80 ] . 
+ ArnD catalyzes a deformylation step to generate UDP-L-Ara4N before it is transported across the inner membrane by ArnEF [ 79,81 ] . 
+ The requirement of ArnDEF for serum resistance indicates that EC958 requires L-Ara4N modification to evade the antimicrobial activity of cationic peptides present in human serum . 
+ Interestingly , only the arnD mutation conferred sensitivity to SDS ( Table 2 ) , which might suggest a role of UDP-L-Ara4N in maintaining membrane integrity . 
+ Further investigation is needed to understand why ArnT , the final enzyme required for transferring the L-Ara4N residue to the 4'-phosphate group of lipid A-core , was not identified in our TraDIS-defined serum resistome . 
+ We also identified three genes encoding catabolic enzymes that contributed to the serum resistance phenotype of EC958 ( gmm , pgi and fbp ) and confirmed their role by mutagenesis ( Table 2 ) . 
+ Of these genes , only the pgi mutant displayed enhanced sensitivity to SDS ( Table 2 ) . 
+ Gmm is a GDP-mannose mannosyl hydrolase capable of hydrolyzing both GDP-mannose and GDP-glucose [ 82 ] . 
+ This enzyme contributes to the biosynthesis of GDP-fucose , a component of colanic acid , possibly by influencing the concentration of GDP-mannose or GDP-glucose in the cell and thus regulating cell wall biosynthesis [ 82 ] . 
+ Both Pgi and Fbp catalyze the production of D-fructose-6-phosphate from β-D-glucose-6-phosphate and fructose-1,6-bisphosphate , respectively [ 83,84 ] . 
+ D-fructose-6-phosphate is a precursor for the biosynthesis of UDP-GlcNAc , which in turn is required for peptidoglycan , lipid A and ECA biosynthesis . 
+ Thus , these three enzymes may catalyse key reactions that , if disrupted , could adversely affect the downstream biosynthesis of cell surface components including colanic acid , peptidoglycan , lipid A and ECA . 
+ An ompA mutant is protected from serum killing when present at a low proportion in a mixed bacterial population
+ Of all the chromosomal genes previously attributed to serum resistance , the only gene that was not identified in our TraDIS screen was ompA . 
+ To examine this further we constructed an EC958 ompA mutant and indeed observed it was sensitive to killing by human serum ( Figure 7 ) . 
+ One way to explain this discrepancy is that the phenotype of an ompA mutant could be complemented in trans by other ompA-intact bacteria in a mixed population such as the mutant library . 
+ In fact , OmpA inhibits serum-mediated killing by binding to C4b-binding protein ( C4BP ) to prevent the activation of C3b via the classical complement pathway [ 85 ] , and OmpA is known to be released from E. coli cells when treated with serum [ 86 ] . 
+ In our mutant library , ompA mutants only accounted for approximately 0.02 % of the total bacterial cells , and thus we hypothesized that the release of OmpA from 99.98 % of the cells , when treated with serum , provided OmpA in trans to complement ompA mutants . 
+ We tested this hypothesis by mixing the ompA mutant with wild-type EC958 at various ratios and indeed showed that ompA mutants were protected from serum killing if the proportion of ompA mutants was less than 15 % ( Figure 7 ) . 
+ This result strongly suggests that in trans complementation of OmpA prevents the identification of ompA as a serum resistance gene in our assay . 
+ Complementation of selected mutants restores serum resistance
+ To further demonstrate the function of non-LPS genes in serum resistance , we selected three orphan genes ( acnB , greA and fbp ) that do not belong to an operon to perform genetic complementation . 
+ For these experiments , the selected gene was amplified by PCR , cloned into the low copy number plasmid pSU2718G , and transformed into the respective mutant strain for complementation . 
+ In each case , the phenotype of the complemented strain exactly matched that of wild-type EC958 ( Table 4 ) . 
+ Taken together , these results confirm the role of acnB , greA and fbp in serum resistance and provide a further layer of evidence to support the use of techniques such as TraDIS in functional gene discovery . 
+ Discussion
+ The rapid advancement of new sequencing technologies has created novel opportunities to interrogate biological systems that were not previously possible . 
+ TraDIS was first described as a method that combined high-density mutagenesis with Illumina next generation sequencing technology to study the essential genes of S. Typhi and the conditional essential genes required for survival in bile [ 42 ] . 
+ The increased data output afforded by next generation DNA sequencing is particularly useful and cost-effective for small bacterial genomes ; however , it presents technical and bioinformatic challenges for applications such as TraDIS that utilize low complexity DNA libraries . 
+ Here we present the application of a modified version of TraDIS that is amenable to multiplexing using the Illumina HiSeq 2000 platform , and we demonstrate its effectiveness by using it to define the essential gene repertoire and the serum resistome of a multidrug resistant strain from the globally disseminated E. coli ST131 lineage . 
+ The multiplexed TraDIS protocol utilizes a newly designed custom oligonucleotide in the library enrichment step of the Illumina library preparation protocol ( Figure S1 ) . 
+ This oligonucleotide incorporates the Illumina sequencing primer-binding site into transposon specific DNA fragments , enabling the use of the standard Illumina sequencing primer and eliminating the need to design and optimize another sequencing primer for each new transposon sequence . 
+ The 6-bp barcode immediately after the sequencing primer-binding site allows 12-sample multiplexing within one lane . 
+ The use of 12 barcodes at the first 6 nucleotides of read 1 increased the complexity of the library compared with the original TraDIS protocol , thus reducing data loss due to mis-identification of clusters [ 87,88 ] . 
+ However , the number of useable reads from our sequencing runs was still low ( 15 -- 20 % of total reads ) . 
+ We believe non-specific amplification in the enrichment step was the main cause , and further optimization of the enrichment PCR conditions is required . 
+ Similar approaches combining transposon mutagenesis with high-throughput sequencing ( Tn-seq [ 45 ] , INSeq [ 43 ] , HITS [ 44 ] ) have also been used to address different scientific questions , including the identification of essential genes and genes associated with enhanced fitness in specific growth conditions [ 41,42,45 ] , determination of niche-specific essential genes [ 43,44,89 ] , identification of genes associated with tolerance to various agents/conditions [ 42,90 ] and many other applications as reviewed elsewhere [ 91,92 ] . 
+ In terms of insertion density , we achieved 502,068 independent insertion sites with a density of 1/10 bp ( i.e. one insertion every 10 bp ) , which is comparable with the work by Christen et al. ( 1/8 bp in C. crescentus ) [ 41 ] , Langridge et al. ( 1/13 bp in Salmonella Typhi ) [ 42 ] and Barquist et al. ( 1/9 bp in Salmonella Typhimurium ) [ 46 ] . 
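+ The density figure quoted above can be checked with a short calculation. The sketch below is illustrative only: the chromosome length of about 5.1 Mb is an assumption that is not stated in this text, and the function name is hypothetical.

```python
def insertion_density(genome_length_bp: int, unique_sites: int) -> float:
    """Return the average number of base pairs per unique insertion site."""
    if unique_sites <= 0:
        raise ValueError("unique_sites must be positive")
    return genome_length_bp / unique_sites


# ASSUMPTION: approximate EC958 chromosome length; not given in the text.
ASSUMED_CHROMOSOME_BP = 5_109_767
# Number of independent insertion sites reported in the text.
REPORTED_SITES = 502_068

bp_per_insertion = insertion_density(ASSUMED_CHROMOSOME_BP, REPORTED_SITES)
print(f"about one insertion every {bp_per_insertion:.1f} bp")
```

+ Under that assumed length the result is close to one insertion every 10 bp , in line with the reported density .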
+ The identification of the essential gene set for a single organism is challenging due to several factors , including the presence of transposon insertion cold spots ( i.e. regions of low transposon insertion frequency ) , the difficulty in distinguishing mutations that prevent growth from those that severely reduce growth rate , preexisting gene duplications and the specific growth conditions used in the experiment [ 93 -- 95 ] . 
+ In this study , we define essential genes as those genes that , when mutated by transposon insertion , either prevent or severely attenuate growth on LB agar media supplemented with 30 µg/ml Cm at 37 °C . 
+ The cut-off value to determine whether a gene is essential was defined as the intercept of two distributions of the insertion index of each gene : the exponential distribution representing essential genes and the gamma distribution representing non-essential genes [ 42 ] . 
+ This means that our essential genes also include those genes that can tolerate insertions but were severely attenuated in the input pool . 
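+ The cut-off described above, the point at which the exponential and gamma densities cross, can be sketched numerically. This is a minimal illustration and not the authors' pipeline: the parameter values are hypothetical, the two densities are left unweighted, and in practice both would be fitted to the observed insertion indices first.

```python
import math


def exponential_pdf(x: float, rate: float) -> float:
    """Density of an exponential distribution (models essential genes)."""
    return rate * math.exp(-rate * x)


def gamma_pdf(x: float, shape: float, scale: float) -> float:
    """Density of a gamma distribution (models non-essential genes)."""
    norm = math.gamma(shape) * scale ** shape
    return x ** (shape - 1) * math.exp(-x / scale) / norm


def essentiality_cutoff(rate: float, shape: float, scale: float,
                        tol: float = 1e-12) -> float:
    """Find the insertion index at which the two densities intersect.

    Bisection is run between zero and the mode of the gamma density,
    where the exponential density is expected to fall below the gamma.
    """
    if shape <= 1:
        raise ValueError("shape must exceed 1 so the gamma density has a mode")

    def diff(x: float) -> float:
        return exponential_pdf(x, rate) - gamma_pdf(x, shape, scale)

    lo, hi = 1e-12, (shape - 1) * scale
    if diff(lo) <= 0 or diff(hi) >= 0:
        raise ValueError("densities do not cross between 0 and the gamma mode")
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if diff(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


# HYPOTHETICAL fitted parameters, chosen only to make the example run.
cutoff = essentiality_cutoff(rate=50.0, shape=5.0, scale=0.02)
print(f"genes with an insertion index below {cutoff:.4f} would be called essential")
```

+ Genes whose insertion index falls below the returned value would be classed as essential under this simplified reading of the method .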
+ Out of 315 essential genes , 64 genes had no transposon insertions , 178 genes had 1 to 5 transposon insertions and 73 genes had more than 5 transposon insertions ( Table S1 ) . 
+ The high density of insertion sites achieved in our study provided reliable data for the identification of essential genes within the EC958 genome with a minimal probability of false positive calls due to transposon insertion cold spots . 
+ The identification of essential genes has previously been performed using several approaches in E. coli K-12 ( strains MG1655 and W3110 ) [ 96,97 ] . 
+ Baba et al. generated null mutations by lambda-red recombination in 3985 E. coli W3110 genes ( the Keio library ) , but were unable to mutate 303 candidate essential genes [ 97 ] . 
+ This set of essential genes was further consolidated by manual literature review on the EcoGene website ( www.ecogene.org ) , which reduced the set to 289 genes . 
+ Of the 315 essential genes identified for EC958 in this study , 231 genes ( 73 % ) matched those previously described in the EcoGene list ( Table S1 ) . 
+ There were 84 essential genes specific for EC958 , twenty-four of which do not have homologs in the MG1655 genome . 
+ In contrast , 58 genes previously identified as essential for E. coli K-12 were either not present in EC958 or not identified in our analysis . 
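+ The gene-set counts reported above are mutually consistent, which a few lines of arithmetic confirm. The variable names are illustrative; all numbers are taken from the text.

```python
# Counts reported in the text.
ec958_essential = 315      # essential genes identified in EC958
ecogene_essential = 289    # curated E. coli K-12 essential genes (EcoGene)
shared = 231               # EC958 essential genes matching the EcoGene list

ec958_specific = ec958_essential - shared     # essential only in EC958
k12_only = ecogene_essential - shared         # K-12 essential, not found here
shared_percent = round(100 * shared / ec958_essential)

print(ec958_specific, k12_only, shared_percent)
```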
+ The majority of essential genes in EC958 are conserved with 91 % of the genes present in more than 90 % of E. coli complete genomes available . 
+ In this study , we provided two layers of evidence for the role of each serum resistance gene : by simultaneously assaying a large mutant library and by generation of defined mutants for independent phenotypic testing . 
+ Indeed , using defined mutagenesis we were able to confirm a role for 46 of the 56 genes identified by TraDIS in serum resistance . 
+ To the best of our knowledge , this represents the first large scale follow-up of TraDIS data in this manner and highlights the effectiveness of the technique in large-scale functional genomics . 
+ Our study also revealed that trans complementation of specific mutants can occur in a large mutant library population , as demonstrated by our findings with an ompA mutant . 
+ We also demonstrated complete complementation of mutants containing deletions in the acnB , greA and fbp genes , corroborating their novel role in serum resistance independent of LPS alterations . 
+ Finally , we provided further insight into the mechanistic action of the serum resistance genes identified in EC958 by examining the survival of the respective mutants to outer membrane stresses that affect antimicrobial access and osmotic potential . 
+ The search for genetic determinants of serum resistance in bacteria has been ongoing since the 1970s [ 98 ] . 
+ Our current understanding of the mechanisms that promote bacterial resistance to human serum includes a role for surface structures such as O antigens , K antigens , outer membrane proteins ( OmpA ) , and plasmid-encoded proteins ( TraT , Iss ) [ 37 -- 39 ] ; notably , however , not all of these mechanisms are required for resistance in a single bacterial strain [ 32,33 ] . 
+ Our study represents the first report to simultaneously investigate the entire serum resistome of one strain . 
+ Our results demonstrated that both the lipid A-core and O25 antigen are crucial for serum resistance in EC958 , while K antigen does not contribute to serum resistance . 
+ Out of 54 defined mutants investigated , half had changes in their LPS gel patterns , all of which resulted in serum sensitivity . 
+ In contrast , none of the K capsular biosynthesis genes were identified in our TraDIS screen . 
+ This result is similar to that reported for the O75 : K5 UPEC strain GR-12 , where alterations in O75 LPS affected serum resistance more than a K5 null mutation [ 34 ] . 
+ It is likely , however , that there are strain-specific differences for the role of O antigen and K capsule in serum resistance , and that this reflects differences in the make-up of these structures . 
+ For example , previous analysis of an E. coli O4 : K54 : H5 blood isolate revealed that the K54 antigen contributes more to serum resistance than the O4 antigen [ 99 ] . 
+ Other K antigens such as the K1 and K2 capsules have also been shown to play an important role in serum resistance [ 100,101 ] . 
+ The K antigen expressed by EC958 has not been typed but genomic analysis shows that EC958 has a group 2 capsular gene cluster that conforms to the conserved structure of this group [ Reviewed in 102 ] . 
+ However , region 2 , which encodes glycosyltransferases specific for each K type , shares such low similarity with available sequences in the GenBank database that deducing its K type in silico was not possible . 
+ The O25 antigen gene cluster was further characterized using sequence analysis , targeted mutation and LPS profiling . 
+ All of the dTDP-α-L-rhamnose biosynthesis genes ( rmlCADB ) were required for serum resistance . 
+ However , EC958 lacks the biosynthesis genes for UDP-FucNAc , a component of the O25 antigen unit . 
+ If , based on the cross reaction of antiserum against the O25 antigen with O25b expressing cells , we assume that EC958 has the same O-antigen repeat unit as the O25 determined from previous studies [ 57,58 ] , then EC958 must possess a novel mechanism for the synthesis or uptake of UDP-FucNAc . 
+ Two additional surface antigens that contribute to serum resistance in EC958 are the enterobacterial common antigen and colanic acid ( M antigen ) . 
+ Mutations in five ECA biosynthesis genes rendered EC958 susceptible to serum . 
+ While mutation of three of these genes ( wecA , wzzE and wecD ) affected LPS and sensitivity to SDS , mutation of wecE and wecF did not change LPS ( although a wecF mutant was more sensitive to SDS ) , suggesting that the ECA may be involved in serum resistance , perhaps indirectly via its role in membrane integrity . 
+ Our data also suggest the involvement of colanic acid in serum resistance . 
+ Three genes encoding for colanic acid biosynthesis ( wcaI , gmm and wcaF ) were identified by TraDIS . 
+ Gmm is most likely to be involved in the biosynthesis of colanic acid [ 82 ] , while mutation of the wcaI gene did not confer serum resistance . 
+ The product of wcaF was predicted to be an acetyltransferase [ 103 ] required for colanic acid production [ 69 ] . 
+ The EC958 wcaF mutant possessed an altered LPS pattern with a reduced amount of O-antigen ( especially very long chain length O-antigen ) and was sensitive to both SDS and high osmolarity . 
+ The enhanced sensitivity of the EC958 wcaF mutant could therefore be explained by a number of factors , including altered LPS , altered colanic acid and reduced overall membrane integrity . 
+ A number of other genes were identified that contributed to serum resistance in an LPS-dependent manner . 
+ The gene nagA encodes N-acetylglucosamine-6-phosphate deacetylase , an enzyme important for the metabolism of N-acetyl-D-glucosamine [ 104 ] . 
+ It catalyzes the first step in producing UDP-GlcNAc , a nucleotide sugar required for ECA , lipid A and peptidoglycan biosynthesis , by deacetylating N-acetylglucosamine-6-phosphate to glucosamine-6-phosphate [ Reviewed in 105 ] . 
+ However , NagA is not solely responsible for the production of UDP-GlcNAc because glucosamine-6-phosphate can also be obtained via GlmS from fructose-6-phosphate or taken up from the environment by ManXYZ [ 105 ] . 
+ Indeed , the LPS banding pattern of the nagA mutant was different to that of the parent strain ( i.e. it possessed thicker second and third bands from the bottom of the gel ; Figure S3 ) , suggesting its enhanced sensitivity phenotype may be associated with a predominantly shorter O antigen . 
+ Pgm is a phosphoglucomutase that catalyses the reversible conversion of glucose-1-phosphate to glucose-6-phosphate , an important step in galactose and maltose catabolism [ 106 ] . 
+ A pgm mutant has several phenotypes : it is defective in swimming and swarming motility [ 107 ] , possesses an aberrant ( shorter and wider ) cell morphology , is sensitive to detergents [ 108 ] and stains blue with iodine when grown in the presence of galactose [ 106 ] . 
+ An EC958 pgm mutant produced little full length O-antigen ; the majority of its LPS condensed into a thick band of incomplete lipid A-core and a thin clear band of lipid A-core plus one unit of O-antigen . 
+ This feature is consistent with the high serum and SDS sensitivity phenotypes observed for this mutant . 
+ GalE is a well-studied enzyme that catalyzes the interconversion of UDP-galactose and UDP-glucose [ Reviewed in 109 ] . 
+ Both nucleotide sugars are required for colanic acid biosynthesis . 
+ Furthermore , UDP-glucose is used in three steps to synthesize the LPS outer core ( catalyzed by WaaG , WaaI and WaaJ ) . 
+ LPS patterns of the galE mutant exhibited a very thick band of lipid A-core , suggesting that the lipid A-core in this strain has multiple sizes . 
+ This may be explained by the limiting effect of UDP-glucose in the three steps involved in its incorporation into the outer core . 
+ UDP-glucose can also be synthesized by GalU from glucose-1-phosphate [ 110 ] , which may explain why an EC958 galE mutant could still make LPS ( Figure S3 ) . 
+ Despite being able to make LPS , however , the galE mutant was sensitive to human serum . 
+ Whether this sensitivity can be attributed to the effect a galE mutation has on LPS or colanic acid remains to be determined . 
+ We have demonstrated the successful application of multiplexed TraDIS for a functional genomics study targeted at E. coli EC958 , a prototype strain from the globally disseminated and multidrug resistant E. coli ST131 lineage . 
+ This approach enabled the first description of an essential gene set from an ExPEC strain . 
+ Our work has also defined the serum resistome in E. coli EC958 . 
+ This comprehensive inventory of E. coli EC958 genes that contribute to this phenotype provides a framework for the future characterization of virulence genes in ExPEC as well as other Gram-negative pathogens that cause systemic infection . 
+ Materials and Methods
+ Ethics statement
+ Approval for the collection of human blood was obtained from the University of Queensland Medical Research Ethics Committee ( 2008001123 ) . 
+ All subjects provided written informed consent . 
+ Bacterial strains and growth conditions
+ E. coli EC958 was isolated from the urine of a patient presenting with community UTI in the Northwest region of England and is a representative member of the UK epidemic strain A ( PFGE type ) , one of the major pathogenic lineages causing UTI across the United Kingdom [ 23 ] . 
+ EC958Δlac , which contained a mutation in the lac operon , was used in competitive assays . 
+ This strain had an identical growth rate to wild-type EC958 . 
+ Strains were routinely cultured at 37 °C on solid or in liquid Luria Broth ( LB ) medium supplemented with the appropriate antibiotics ( Cm 30 µg/ml or gentamicin 20 µg/ml ) unless indicated otherwise . 
+ Generation of miniTn5-Cm mutant library
+ A miniTn5-Cm transposon containing a Cm cassette flanked by Tn5 mosaic ends ( sequence from Epicenter ) was PCR amplified from pKD3 plasmid DNA ( NotI digested ) using primers 2279 5'-CTGTCTCTTATACACATCTcacgtcttgagcgattgtgtagg-3' and 2280 5'-CTGTCTCTTATACACATCTgacatgggaattagccatggtcc-3' . 
+ The PCR reactions were performed using Phusion High-Fidelity DNA polymerase ( New England BioLabs ) . 
+ The amplicon was purified using the QIAGEN MinElute PCR purification kit before being phosphorylated using T4 polynucleotide kinase ( New England BioLabs ) and subjected to the final purification step . 
+ A total of at least 800 ng of this miniTn5-Cm transposon DNA was incubated in an 8 µl reaction containing 4 µl of EZ-Tn5 transposase ( Epicenter Biotechnologies ) at 37 °C for 1 h then stored at −20 °C . 
+ Bacterial cells were prepared for electroporation as previously described [ 42 ] . 
+ Briefly , cells were grown in 2× TY broth to an OD600 of 0.3 -- 0.5 , then harvested and washed three times in 0.5× volume of 10 % cold glycerol before being resuspended in a 1/1000× volume of 10 % cold glycerol and kept on ice . 
+ A volume of 60 µl of cells was mixed with 0.2 µl of transposomes and electroporated in a 2 mm cuvette using a BioRad GenePulser set to 2.5 kV , 25 µF and 200 Ω . 
+ Cells were resuspended in 1 mL SOC medium and incubated at 37 °C for 2 hours , then spread on LB agar plates supplemented with Cm 30 µg/mL . 
+ After incubation overnight at 37 °C , the total number of colonies was estimated by counting a proportion from multiple plates . 
+ Chloramphenicol resistant colonies were resuspended in sterilised LB broth using a bacteriological spreader before sterile glycerol was added to 15 % total volume , and the suspension was stored at −80 °C . 
+ Each batch of mutants contained an estimated 32,000 to 180,000 mutants . 
+ The final library of 1 million mutants was created by pooling 11 mutant batches , resulting in a cell suspension of 2 × 10^11 CFU/ml . 
+ Transposon library screening in human serum
+ Freshly pooled human serum was collected from at least two healthy individuals on the day of the experiment . 
+ Ten milliliters of blood was collected from each person and centrifuged at 4000 rpm for 10 minutes to collect the serum . 
+ Approximately 2 × 10^8 viable mutants were incubated in 1 ml of 50 % freshly pooled human serum in LB broth at 37 °C for 90 minutes . 
+ The control samples were prepared the same way but were incubated with inactivated serum ( Millipore ) instead of fresh serum . 
+ Both control and test experiments were performed in duplicate . 
+ The cells were then washed twice with sterile 1× PBS to remove serum , transferred to 100 ml LB broth and allowed to grow at 37 °C with 250 rpm shaking for 4 hours . 
+ The genomic DNA was then extracted from 5 ml of each culture using Qiagen 100-G genomic tips . 
+ Multiplexed TraDIS
+ Genomic DNA was standardized to 3.6 µg in a volume of 120 µl before being sheared by Covaris S2 according to the Illumina TruSeq Enrichment gel-free method ( TruSeq DNA sample preparation v2 guide ) . 
+ The subsequent steps of DNA end repair , DNA end adenylation and adapter ligation were also done following the Illumina TruSeq v2 instructions . 
+ The adapter-ligated fragments containing transposon insertion sites were enriched using a transposon-specific indexing forward primer and the Illumina reverse primer Index 1 ( Table S5 ) at 500 nM each per reaction . 
+ This 89 bp forward primer binds specifically to miniTn5-Cm transposon ( 25 bp ) and carries 64 bp overhang which includes 6 bp index sequence , 33 bp binding site for Illumina read 1 sequencing primer and 23 bp P5 sequence for binding to the flowcell . 
+ The enrichment step was done using the KAPA Library Amplification kit ( KAPA Biosystems ) at an annealing temperature of 60 °C for 22 cycles . 
+ The KAPA Library Quantification kit was used to measure the concentration of DNA fragments in the enriched library . 
+ Twelve libraries from 12 samples were pooled to equimolar concentrations to achieve a cluster density of 850 K/mm² when 10 nM of library pool was loaded onto the flowcell . 
+ The 12-plex pool was loaded on 3 lanes of TruSeq v2 and 3 lanes of TruSeq v3 flowcells for sequencing using a 100-cycle , paired-end protocol to assess the reproducibility and read quantity among lanes and flowcells . 
+ The data from six samples ( Figure 1A ) were presented in this study . 
+ The TraDIS sequence data from this study were deposited in the Sequence Read Archive ( SRA ) under the BioProject number PRJNA189704 . 
+ Analysis of nucleotide sequence data
+ Sequence reads from the FASTQ files were split according to twelve 6 bp index sequences combined with the 37 bp transposon-specific sequence using fastx_barcode_splitter.pl ( total length of 43 bp as barcodes , allowing for 2 mismatches ) ( FASTX-Toolkit version 0.0.13 , http://hannonlab.cshl.edu/fastx_toolkit/index.html ) . 
+ The barcode-matching reads were trimmed of the 43 bp barcode at the 5' end and 25 bp of potentially low-quality sequence at the 3' end , resulting in high quality sequence reads of 31 bp in length that were mapped to the EC958 chromosome ( PRJEA61443 ) by Maq version 0.7.1 [ 111 ] . 
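+ The splitting and trimming logic described above can be sketched in a few lines. This is a simplified illustration, not a reimplementation of fastx_barcode_splitter.pl : the function names are hypothetical, the toy barcodes are 8 bp rather than 43 bp, and ties between barcodes are resolved by taking the first best match.

```python
from typing import Dict, Optional


def hamming(a: str, b: str) -> int:
    """Count mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def assign_sample(read: str, barcodes: Dict[str, str],
                  max_mismatches: int = 2) -> Optional[str]:
    """Return the sample whose barcode best matches the 5' end of the read.

    Returns None when no barcode matches within max_mismatches.
    """
    best_sample, best_distance = None, max_mismatches + 1
    for sample, barcode in barcodes.items():
        if len(read) < len(barcode):
            continue
        distance = hamming(read[:len(barcode)], barcode)
        if distance < best_distance:
            best_sample, best_distance = sample, distance
    return best_sample


def trim_read(read: str, barcode_len: int, trim_3prime: int) -> str:
    """Remove the 5' barcode and a fixed number of 3' bases."""
    end = len(read) - trim_3prime
    return read[barcode_len:end] if end > barcode_len else ""


# HYPOTHETICAL toy barcodes and read, used only to exercise the functions.
toy_barcodes = {"sample_1": "ACGTACGT", "sample_2": "TTGGCCAA"}
toy_read = "ACGTACGA" + "GGGGCCCCAAAA" + "TTT"  # one mismatch to sample_1

sample = assign_sample(toy_read, toy_barcodes)
insert = trim_read(toy_read, barcode_len=8, trim_3prime=3)
print(sample, insert)
```

+ With the settings described in the text , the calls would use a barcode length of 43 and a 3' trim of 25 .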
+ Subsequent analysis steps were carried out as previously described [ 42 ] to calculate the number of sequence reads ( raw read counts ) and the number of different insertion sites for every gene , which were then used to estimate the threshold to identify essential genes . 
+ The read counts and insertion sites were visualized using Artemis version 13.0 [ 112 ] . 
+ The circular genome diagram was generated by CGView [ 113 ] and linear genetic comparison was illustrated using Easyfig version 2.1 [ 114 ] . 
+ Statistical analyses
+ We identified genes required for survival in human serum by comparing the differences in read abundance of each gene between the inactivated serum control and active serum test samples using the Bioconductor package edgeR ( version 2.6.10 ) [ 48 ] . 
+ The raw read counts from two biological replicates of each treatment were loaded into the edgeR package ( version 2.6.12 ) using the R environment ( version 2.15.1 ) . 
+ Genes that have very low read counts in all the samples ( essential genes ) were removed from further analysis . 
+ The composition bias in each sequence library was normalized using the trimmed mean of M value ( TMM ) method [ 115 ] . 
+ We then used the quantile-adjusted conditional maximum likelihood ( qCML ) for negative binomial models to estimate the dispersions ( biological variation between replicates ) and to carry out the exact tests for determining genes with significantly lower read counts in the test samples compared to the control samples [ 116,117 ] . 
+ Stringent criteria of log fold-change ( logFC ) ≤ −1 and false discovery rate ≤ 0.001 were chosen to define a list of the most significant genes for further investigation by phenotypic assays . 
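+ The final selection step , applying a logFC and false discovery rate cut-off to the per-gene test results , can be illustrated as follows. The statistics themselves come from edgeR in the study ; this sketch only shows the filter , and the gene names and values are hypothetical placeholders rather than results from the paper.

```python
from typing import Dict, List, Union

Row = Dict[str, Union[str, float]]

LOGFC_MAX = -1.0   # keep genes with logFC at or below this value
FDR_MAX = 0.001    # keep genes with FDR at or below this value


def select_candidates(results: List[Row],
                      logfc_max: float = LOGFC_MAX,
                      fdr_max: float = FDR_MAX) -> List[str]:
    """Return gene names passing both the logFC and the FDR threshold."""
    return [
        str(row["gene"])
        for row in results
        if float(row["logFC"]) <= logfc_max and float(row["FDR"]) <= fdr_max
    ]


# HYPOTHETICAL per-gene results; not data from the study.
example_results: List[Row] = [
    {"gene": "geneA", "logFC": -4.2, "FDR": 1e-6},   # passes both criteria
    {"gene": "geneB", "logFC": -0.5, "FDR": 1e-6},   # fold change too small
    {"gene": "geneC", "logFC": -3.0, "FDR": 0.02},   # FDR too high
    {"gene": "geneD", "logFC": 1.5, "FDR": 1e-8},    # enriched, not depleted
]

candidates = select_candidates(example_results)
print(candidates)
```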
+ Molecular methods
+ Chromosomal DNA purification , PCR and DNA sequencing of PCR products was performed as previously described [ 118 ] . 
+ Defined mutations were made using the λ-Red recombinase method with some modifications [ 22,47 ] . 
+ In brief , the final PCR products were fused and amplified from three fragments containing two 500-bp homologous regions flanking the gene of interest and a Cm cassette from pKD3 plasmid ( see Table S5 for list of primers ) . 
+ The fused PCR products were then electroporated into EC958 harbouring a gentamicin resistant plasmid carrying the λ-Red recombinase gene . 
+ Mutants were then selected and confirmed by sequencing . 
+ Complementation was done by cloning the gene of interest into a gentamicin resistant derivative of pSU2718 [ 119 ] at BamHI-XbaI cut sites ( primers listed in Table S5 ) . 
+ The construct was then transformed into the respective mutant and induced using 1 mM IPTG before and during phenotypic assays . 
+ Serum resistance assay
+ Overnight bacterial cultures were washed in phosphate buffered saline ( PBS ) and then standardized to an OD600 of 0.8 . 
+ Equal volumes ( 50 µL ) of standardized cultures and pooled human sera were mixed and incubated for 90 min at 37 °C ( in triplicate ) . 
+ Viable counts were performed to estimate the number of bacterial cells prior to serum treatment ( t = 0 min ) and post serum treatment ( t = 90 min ) . 
+ E. coli MG1655 was used as a control as it is completely killed by serum . 
+ Serum and PBS only samples served as sterility controls . 
+ Competitive serum resistance assays were performed in the same manner , except that a 50:50 mixture of wild-type ( EC958Δlac ) and mutant strains was used . 
+ Viable counts were performed on MacConkey agar , which allowed the differentiation of EC958Δlac ( non-lactose fermenter ) and the mutant strains . 
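+ Viable counts taken before and after serum treatment are commonly summarised as a log10 reduction in viability. The sketch below shows that calculation under stated assumptions : the function name and the example counts are hypothetical , and a zero count at the end point is rejected rather than replaced by a detection limit.

```python
import math


def log10_reduction(cfu_start: float, cfu_end: float) -> float:
    """Return the log10 drop in viable count between two time points.

    A value of 3.0 means a 1000-fold loss of viability.
    """
    if cfu_start <= 0 or cfu_end <= 0:
        raise ValueError("counts must be positive; apply a detection limit first")
    return math.log10(cfu_start / cfu_end)


# HYPOTHETICAL counts (CFU/ml) at t = 0 min and t = 90 min.
reduction = log10_reduction(cfu_start=1e8, cfu_end=1e5)
print(f"{reduction:.1f}-log reduction in viability")
```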
+ SDS and NaCl sensitivity assays (MIC)
+ The MICs of SDS and NaCl were determined by broth microdilution method as previously described [ 120 ] . 
+ We used five concentrations for SDS including 0.125 % , 0.0625 % , 0.031 % , 0.016 % and 0.008 % in LB . 
+ For NaCl , the range of concentration was 0.8 M , 0.6 M , 0.5 M , 0.4 M and 0.3 M. 
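+ Reading an MIC from a dilution series amounts to finding the lowest tested concentration at which no growth is seen , with no growth at any higher concentration. A minimal sketch follows ; the function name and the growth pattern are hypothetical , while the SDS concentrations are those listed above.

```python
from typing import Dict, Optional


def minimum_inhibitory_concentration(growth: Dict[float, bool]) -> Optional[float]:
    """Return the lowest concentration that inhibits growth.

    growth maps each tested concentration to True if growth was observed.
    Returns None if growth occurred even at the highest concentration.
    """
    mic = None
    for concentration in sorted(growth, reverse=True):
        if growth[concentration]:
            break  # growth seen: lower concentrations cannot be the MIC
        mic = concentration
    return mic


# SDS concentrations (%) from the text; the growth pattern is HYPOTHETICAL.
sds_growth = {0.125: False, 0.0625: False, 0.031: False, 0.016: True, 0.008: True}
sds_mic = minimum_inhibitory_concentration(sds_growth)
print(sds_mic)
```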
+ LPS gel assay
+ LPS was extracted from bacterial strains and LPS patterns were determined by Tricine-SDS Polyacrylamide gel electrophoresis ( TSDS-PAGE ) and visualized by silver staining as previously described [ 121,122 ] . 
+ Supporting Information
+ Acknowledgments
+ We would like to thank Gemma Langridge , Keith Turner , Sabine Eckert , Daniel Turner , Philip Hugenholtz , Nicholas West and Brian Forde for expert advice and Leopold Parts for statistical analysis in R. 
+ Author Contributions
+ Conceived and designed the experiments : MDP SS MT SAB MAS . 
+ Performed the experiments : MDP KMP SS SWL LPA MESA VMM . 
+ Analyzed the data : MDP DGM SAB MAS . 
+ Contributed reagents / materials/analysis tools : VMM MU SAB . 
+ Wrote the paper : MDP SS DGM MT MU SAB MAS .
\ No newline at end of file
--- a/data/TEXT_FILES/notuseful_txt/24565265.txt 0 → 100644
View file @27818a9
+++ b/data/TEXT_FILES/notuseful_txt/24565265.txt 0 → 100644
View file @27818a9
+ Inferring functional transcription factor-gene binding
+ Abstract 
+ Background : Chromatin immunoprecipitation ( ChIP ) experiments are now the most comprehensive experimental approaches for mapping the binding of transcription factors ( TFs ) to their target genes . 
+ However , ChIP data alone is insufficient for identifying functional binding target genes of TFs for two reasons . 
+ First , there is an inherent high false positive/negative rate in ChIP-chip or ChIP-seq experiments . 
+ Second , binding signals in the ChIP data do not necessarily imply functionality . 
+ Methods : It is known that ChIP-chip data and TF knockout ( TFKO ) data reveal complementary information on gene regulation . 
+ While ChIP-chip data can provide TF-gene binding pairs , TFKO data can provide TF-gene regulation pairs . 
+ Therefore , we propose a novel network approach for identifying functional TF-gene binding pairs by integrating the ChIP-chip data with the TFKO data . 
+ In our method , a TF-gene binding pair from the ChIP-chip data is regarded as functional if it is also supported by a high-confidence curated TFKO TF-gene regulatory relation or a deduced hypostatic TF-gene regulatory relation . 
+ Results and conclusions : We first validated our method on a gathered ground truth set . 
+ Then we applied our method to the ChIP-chip data to identify functional TF-gene binding pairs . 
+ The biological significance of our identified functional TF-gene binding pairs was shown by assessing their functional enrichment , the prevalence of protein-protein interaction , and expression coherence . 
+ Our results outperformed the results of three existing methods across all measures . 
+ And our identified functional targets of TFs also showed statistical significance over the randomly assigned TF-gene pairs . 
+ We also showed that our method is dataset independent and can apply to ChIP-seq data and the E. coli genome . 
+ Finally , we provided an example showing the biological applicability of our notion . 
+ Background
+ Cellular responses to external stimuli or environmental changes are usually conveyed through cellular regulatory networks consisting of different regulatory pathways [ 1-4 ] . 
+ Transcriptional regulation plays an essential role for construction of such regulatory pathways at the level of transcription . 
+ The binding of specific transcription factors ( TFs ) controls the initialization or the expression level of genes . 
+ Thus , unravelling functional TF-gene binding events is a fundamental first step towards understanding the regulatory mechanisms in cells [ 1 ] . 
+ Chromatin immunoprecipitation experiments ( ChIP-chip or ChIP-seq ) are now the most comprehensive experimental approaches for mapping the binding of TFs to their target genes [ 2,3,5 ] . 
+ However , ChIP data alone are insufficient for identifying functional binding target genes of TFs for two reasons . 
+ First , there is an inherent high false positive/negative rate in ChIP-chip or ChIP-seq experiments [ 6 ] . 
+ Although controlling the level of statistical significance in the analysis can reduce the false positive rate , this approach is prone to producing a large number of false negatives [ 7,8 ] . 
+ Second , binding signals in the ChIP-chip data do not necessarily imply functionality . 
+ The binding of TFs to the promoters of genes may not lead to subsequent transcription activation / repression [ 9,10 ] . 
+ It was suggested that one can improve the confidence of the TF-gene binding pairs by integrating ChIP-chip data with data from other high-throughput technologies [ 10 ] . 
+ Although other high-throughput data may themselves be noisy , the stochastic noise in different data sources is generally assumed to be uncorrelated [ 9-11 ] . 
+ Hence , combining different sources of high-throughput data is a promising way of extracting biologically meaningful information embedded in any noisy high-throughput data . 
+ Previous studies had tried to extract functional binding target genes of TFs by integrating the ChIP-chip data with various kinds of high-throughput data . 
+ By the types of the integrated data , the integration processes could be roughly divided into two categories . 
+ The first type of existing methods relied on stepwise integration of the ChIP-chip data with the expression data and/or the TF binding motif data . 
+ Functionality of the TF-gene binding pairs was confirmed by some gene properties inferred from the mRNA expression profiles . 
+ For example , ChIP positives were classified into functional and non-functional TF-gene binding pairs by the regression analysis of the mRNA expression profiles [ 12 ] . 
+ And others tried to infer functional binding target genes of TFs from the ChIP-chip data by the synergy properties derived from the mRNA expression profiles and the TF binding motif data [ 10 ] . 
+ Finally , another group of researchers developed the CERMT algorithm to refine the possible functional binding target genes of TFs based on covariance of multiple expression time series [ 13 ] . 
+ The other type of existing bioinformatic approaches for extracting functional TF-gene binding pairs combined diverse biological data besides the mRNA expression profile data through the construction of different types of Bayesian classifiers . 
+ Some utilized the framework of probabilistic inference to predict the functional TF-gene binding pairs by TF binding site motifs , evolutionary conservation , regulatory potential , nucleosome data and DNase hypersensitive sites [ 14 ] . 
+ Others constructed a Bayesian classifier from comprehensive sources of yeast high-throughput data such as protein-protein interaction data , the phylogenetic data and the nucleosome data [ 7,8 ] . 
+ Another group specified a hierarchical Bayesian model to augment the protein-DNA binding data with gene expression and sequence data [ 15 ] . 
+ Still others defined and trained a logistic regression classifier based on a mapping of preference scores on gene location information and TF-binding motifs [ 9 ] . 
+ While previous works had combined comprehensive sorts of high-throughput experimental data and biological data , these approaches did not consider the TF-gene regulatory relation when inferring functional TF-gene binding pairs . 
+ Expression data , TF binding motif data and other integrated biological data , such as nucleosome positioning and evolutionary conservation , did not directly provide the TF-gene regulatory relation . 
+ Nowadays , the TF knockout ( TFKO ) data are available for biologists to infer the TF-gene regulatory relation [ 4 ] . 
+ TFKO data convey the experimental results showing the change in the expression of some target gene caused by the deletion or mutation of certain TF-encoding gene , revealing the fact that the TF regulates this target gene via certain mechanisms [ 16 ] . 
+ Since none of previous methods had directly utilized the TF-gene regulatory relation , we propose an alternative to infer functional TF-gene binding pairs based on the integration of ChIP-chip data with TF-gene regulatory relation . 
+ In this study , instead of using the supervised or unsupervised learning tools as in the Bayesian approach and other methodologies , our method uses a network approach on the combination of the ChIP-chip data and the TFKO data to infer the functional TF-gene binding pairs . 
+ A TF-gene binding pair , or a ChIP positive , is called functional if we can also find evidence showing that the TF regulates the expression of the target gene . 
+ While direct overlapping of the ChIP-chip and TFKO datasets could give some possible functional TF-gene binding pairs , this only provided a very small number of such pairs because of the low overlap of the ChIP-chip data and the TFKO data [ 4,17,18 ] . 
+ It was shown that the low overlap between the TFKO data and the binding data partly resulted from knockout epistasis [ 4 ] or backup mechanisms [ 17 ] . 
+ An epistatic regulation cascade , in which an intermediate TF regulates the target gene with higher confidence , is suggested to compensate for the knockout effect of the hypostatic TF-gene pair . 
+ Hence we further considered the possible hypostatically masked ( to the epistatic regulation cascade ) TF-gene regulation relation deduced from the original TFKO data . 
+ The literature-curated TFKO regulation relation and the deduced hypostatic regulation relation for given ChIP positive TF-gene binding pairs were also checked through regulatory confidence scores ( RCS ) . 
+ Finally , a TF-gene binding pair with a confident TF-gene regulation , which may be the curated TFKO regulation or deduced hypostatic TF-gene regulation , was classified to be functional . 
+ We validated the proposed method on a gathered ground truth set and also demonstrated the superior biological significance of our method to three previous methods by testing the results on functional enrichment , the prevalence of protein-protein interaction and target gene co-expression . 
+ Of all three different aspects of biological significance demonstration , our results all showed improvement over the three previous works . 
+ We also showed that our method is dataset independent and can apply to ChIP-seq data and the E. coli genome . 
+ Finally , we provided an example showing the biological applicability of our notion . 
+ Materials and methods
+ ChIP-chip data and TF knockout data
+ Genome-wide in vivo TF-gene binding data of 204 yeast Saccharomyces cerevisiae TFs produced by the ChIP-chip technology were adopted from [ 3 ] . 
+ The TF-gene binding assignments were provided in the form of binding p-values , on the hypothesis that the TF binds to the promoter region of the target gene . 
+ To show the data-independence of our method , we also adopted the ChIP-chip data generated from [ 2 ] . 
+ In their location analysis protocol , a promoter region of a gene is defined as the upstream intergenic region . 
+ The genome wide intergenic regions were obtained and amplified using the Yeast Intergenic Region Primers ( Research Genetics ) [ 19 ] . 
+ In Saccharomyces cerevisiae , transcription factor binding sites are positioned further upstream in the intergenic regions and vary over a wide range in promoters [ 20 ] . 
+ In this study we adopted the same promoter definition and promoter regulation as those used in the study of Harbison et al. [ 3 ] . 
+ The TF knockout data of 156 yeast Saccharomyces cerevisiae TFs were retrieved from the Yeastract Indirect evidence [ 16 ] . 
+ Yeastract has deposited the published data showing the change in the expression of the target genes resulting from the deletion or mutation of certain TF-encoding genes . 
+ This so-called indirect evidence therefore provides the TF-gene regulation information . 
+ We retrieved 21871 TFKO TF-gene regulation pairs for 156 TFs from Yeastract . 
+ Protein-protein interaction data and mRNA expression data
+ Two different datasets were collected for use in the biological validations . 
+ For showing the prevalence of protein-protein interaction , we gathered the physical protein-protein interaction data from the BioGRID database , which had deposited comprehensive collections of protein-protein interactions [ 21 ] . 
+ And for comparing the expression coherence between different methods , we retrieved 40 time series mRNA expression profiles in yeast Saccharomyces cerevisiae from ExpressDB [ 22 ] . 
+ These 40 different expression conditions were obtained as previously suggested [ 10 ] . 
+ Details of the 40 different conditions can be found in the online supplementary files of [ 10 ] . 
+ These conditions represent the natural and perturbed expression profiles , including sporulation in budding yeast [ 23 ] , yeast cell cycle conditions [ 24,25 ] and DNA damage conditions [ 26,27 ] , among others . 
+ Benchmark control sets
+ A set of 484 functional TF-gene binding pairs adopted from [ 7 ] were used as the positive control set . 
+ These literature-curated ground truth functional TF-gene binding pairs were collected from the Incyte YPD Database . 
+ To obtain the negative control set , we generated 1516 random TF-gene pairs . 
+ To enhance the stringency of the negative control set , we further required the random pairs not to belong to the positive control and not to have any literature evidence curated in the Yeastract database [ 16 ] . 
+ A total of 2000 TF-gene pairs were used as the control set . 
+ Finding the hypostatic TF-gene regulation relation
+ We used the literature-curated TF-gene regulation pairs from the TFKO data to construct a regulatory relation network . 
+ An edge from a given TF to its regulatory target gene was added to the regulatory relation network if there is TF-gene regulatory relation from the TFKO data showing that the TF regulates the target gene . 
+ For a given TF-gene binding pair , if they are connected by a path of length of two with an intermediate node TF X in the constructed regulatory relation network , this means that the given TF regulates the TF X and the TF X regulates the given gene . 
+ In this case , we say that there is a deduced hypostatic regulatory relation ( hypostatic to the epistatic regulation cascade through TF X on the target gene ) in the constructed regulatory relation network for this given TF-gene pair ( Case II in Figure 2 ) . 
+ And the knockout effect of this hypostatic regulation relation may thereby be masked . 
+ Epistatic regulation cascade paths of length greater than two can be inferred in a similar manner . 
+ We searched such deduced hypostatic regulatory relation of a TF-gene binding pair by the modified breadth first search ( mBFS ) algorithm [ 18 ] . 
+ The algorithm returned the shortest regulation path between a given TF-gene pair in the regulatory relation network . 
+ To briefly explain the algorithm , two different sets of nodes were kept , one for the visited nodes and one for the discovered nodes . 
+ First , we started out from the given TF and put it in the set of visited nodes . 
+ Then we tried out all of its `` unvisited '' neighbours in the regulatory relation network and put the neighbours in the set of discovered nodes . 
+ This process was repeated for each node in the set of discovered nodes in the `` first-in , first-out '' manner , acting as a new starting node in each round , until we reached the target gene . 
+ The shortest regulation path could be obtained by tracing back the process . 
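+ The search described above can be sketched as a plain breadth-first search . This is only an illustrative sketch : it is not the modified BFS of [ 18 ] , and the function name , the dictionary-based network representation and the toy TF and gene labels are all assumptions made for the example . 

```python
from collections import deque


def shortest_regulation_path(network, tf, gene):
    """Return the shortest directed path from tf to gene, or None.

    network is a hypothetical adjacency mapping: regulator -> list of targets.
    """
    if tf == gene:
        return [tf]
    parent = {tf: None}      # visited nodes, with back-pointers for tracing back
    queue = deque([tf])      # discovered nodes, processed first-in, first-out
    while queue:
        node = queue.popleft()
        for target in network.get(node, ()):
            if target in parent:
                continue     # already visited
            parent[target] = node
            if target == gene:
                # Trace back from the gene to the starting TF.
                path = [gene]
                while parent[path[-1]] is not None:
                    path.append(parent[path[-1]])
                return path[::-1]
            queue.append(target)
    return None


# Toy regulatory relation network (labels are made up for illustration).
toy_network = {
    "TF_A": ["TF_X"],
    "TF_X": ["gene1"],
    "TF_B": ["gene1", "gene2"],
}
path = shortest_regulation_path(toy_network, "TF_A", "gene1")
print(path)
```

+ In this toy network the path from TF_A to gene1 runs through the intermediate TF_X , which corresponds to a cascade of length two . 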
+ Calculating the RCSs for the confidence of the TF-gene regulation
+ The deduced hypostatic TF-gene regulatory relation might be introduced by chance since there is still a large amount of random noise in the original TFKO data due to the inherent uncertainty in high-throughput technologies . 
+ These inherent random noises could cause the over-fitting problem when deducing hypostatic relations from analysing the network paths [ 17 ] . 
+ We avoided stochastically introduced TFKO regulation or epistatic regulation cascades by comparing the paths found in the constructed regulatory relation network with those found in randomly generated networks . 
+ We forced the random networks to preserve the node degrees to mimic the degree distribution of the original TFKO regulatory relation pairs [ 28 ] . 
+ Then we used the Student t-test to test against the null hypothesis that the length of the shortest regulation cascade found in the constructed regulation relation network is statistically equal to the average of the lengths of the shortest paths in the randomly generated regulation network . 
+ Multiple hypotheses test correction was done using the false discovery rate ( FDR ) method . 
+ The regulation confidence score ( RCS ) is then calculated as the negative logarithm of the corrected p-value : 
+ $ \mathrm{RCS} = - \log ( p_{\mathrm{corrected}} ) $
+ The RCS measures the non-stochastic confidence of the given regulation pair . 
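+ As a rough sketch of the scoring step , the code below corrects a list of raw p-values and converts them to scores . The text does not name the FDR procedure or the base of the logarithm , so the Benjamini-Hochberg procedure and base 10 used here are assumptions , as are the function names and the example p-values . 

```python
import math


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values (assumed FDR procedure)."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def regulation_confidence_scores(pvalues):
    """Negative log10 of the corrected p-values (base 10 is an assumption)."""
    # Floor at a tiny positive value so that p = 0 does not raise an error.
    return [-math.log10(max(p, 1e-300)) for p in benjamini_hochberg(pvalues)]


raw_p = [0.001, 0.02, 0.5]          # made-up p-values for three TF-gene pairs
adjusted_p = benjamini_hochberg(raw_p)
scores = regulation_confidence_scores(raw_p)
print(adjusted_p, scores)
```

+ A smaller corrected p-value gives a larger score , so pairs whose regulation path is least likely to arise by chance rank highest . 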
+ To calculate RCSs , we constructed 10000 degree-preserving TFKO random regulation networks . 
+ A sample size of 10000 random networks was chosen to achieve a sampling precision of 95 % confidence within a 1 % margin of error , according to the sampling theorem [ 29 ] . 
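+ One common way to arrive at a figure of this order is the standard sample-size formula for estimating a proportion , n = z^2 p ( 1 - p ) / e^2 . Whether this is the exact calculation behind [ 29 ] is an assumption ; the sketch below only shows that 10000 samples are sufficient under that formula with the most conservative choice p = 0.5 . 

```python
import math
from statistics import NormalDist


def required_sample_size(confidence=0.95, margin=0.01, proportion=0.5):
    """Samples needed to estimate a proportion within +/- margin.

    Uses n = z^2 * p * (1 - p) / margin^2, with p = 0.5 as the worst case.
    """
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)   # two-sided critical value
    return math.ceil(z * z * proportion * (1.0 - proportion) / (margin * margin))


n_required = required_sample_size()
print(n_required)
```

+ Under these assumptions the requirement is a little under 10000 , so rounding up to 10000 random networks leaves a small safety margin . 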
+ To generate the degree preserving random network , first the degree sequences for nodes in the regulatory relation network were generated , including both the in-degree sequence and the out-degree sequence . 
+ Then we expanded the degree sequences into node frequency sequences . 
+ For example , if we have an in-degree sequence of { 3,2,1 } , then we get an in-degree node frequency sequence of { 1,1,1,2,2,3 } , since the first node has three incoming edges , the second node has two and the third node has one . 
+ Then we randomly shuffled the in-degree node frequency sequence and the out-degree node frequency sequence . 
+ Edges in the random network were added iteratively from nodes in the randomly shuffled out-degree node frequency sequence to nodes in the randomly shuffled in-degree node frequency sequence . 
+ This guarantees that the random networks preserve the degree sequences of the original network [ 30 ] . 
+ Details of the degree distribution and the properties of the random networks can be found in [ 30 ] . 
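+ The shuffling procedure described above can be sketched as follows . The function names and the example degree sequences are hypothetical , and this simple version does not try to avoid self-loops or repeated edges , which the construction in [ 30 ] may handle differently . 

```python
import random
from collections import Counter


def expand_degree_sequence(degrees):
    """Expand a degree sequence into a node frequency sequence.

    Nodes are labelled 1, 2, 3, ...; [3, 2, 1] becomes [1, 1, 1, 2, 2, 3].
    """
    return [node for node, degree in enumerate(degrees, start=1)
            for _ in range(degree)]


def random_degree_preserving_network(out_degrees, in_degrees, rng=None):
    """Pair shuffled out-stubs with shuffled in-stubs to form directed edges."""
    rng = rng or random.Random()
    sources = expand_degree_sequence(out_degrees)
    targets = expand_degree_sequence(in_degrees)
    if len(sources) != len(targets):
        raise ValueError("in- and out-degree sequences must have equal sums")
    rng.shuffle(sources)
    rng.shuffle(targets)
    return list(zip(sources, targets))


# Example with three nodes: out-degrees [2, 1, 3] and in-degrees [3, 2, 1].
edges = random_degree_preserving_network([2, 1, 3], [3, 2, 1], random.Random(0))
out_counts = Counter(src for src, _ in edges)
in_counts = Counter(dst for _, dst in edges)
print(edges, out_counts, in_counts)
```

+ Whatever the shuffle produces , every node keeps its original in-degree and out-degree , because each node appears in the frequency sequences exactly as many times as its degree . 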
+ Results and discussion
+ Overview of the approach
+ TFKO data convey the experimental results showing the change in the expression of some target gene resulting from the deletion or mutation of certain TF-encoding gene , and the ChIP-chip data convey the TF-gene binding information ( Figure 1 ) . 
+ To extract the functional TF-gene binding pairs , we used a network approach to combine the ChIP-chip data and the TFKO data . 
+ The overall algorithm is depicted in Figure 2 . 
+ We started out from the ChIP positives as the potential functional TF-gene binding pairs . 
+ As mentioned , a ChIP positive is called functional if we can also find evidence showing that the TF regulates the expression of the target gene . 
+ Hence first we sought two different possible TF-gene regulation relations from the TFKO data : the curated TFKO TF-gene regulation ( Case I in Figure 2 ) and the deduced hypostatic TF-gene regulation ( Case II in Figure 2 ) . 
+ For a given TF-gene binding pair , if there was no literature-curated TFKO TF-gene regulation for it , we then checked whether a possible hypostatic TF-gene regulation exists for it . 
+ It was shown that the low overlap between the binding data and the TFKO data may partly result from knockout epistatic mechanisms and that a single TF knockout effect on a target gene may be compensated by the epistatic regulation cascade through another paralogous partner TF X [ 4,17 ] ( Case II in Figure 2 ) . 
+ Note that TF X may not directly bind the target gene . 
+ This motivated us to look for the possible masked hypostatic TF-gene regulation . 
+ The compensated TF-gene regulation was said to be hypostatic to the epistatic regulation cascade through TF X since the knockout effect of this TF-gene pair may possibly be masked by the epistatic regulation cascade with a more confident regulation of an intermediate TF X on the given target gene . 
+ This meant that there might exist at least an epistatic regulation cascade , or a path from the TF to its target gene through an intermediate TF X in the regulatory relation network , for this TF-gene pair . 
+ Therefore , we constructed a regulatory relation network from the TFKO data and sought the hypostatic TF-gene regulation relation by checking the existence of a regulation path in the regulation relation network for the given ChIP positives . 
+ This was done by a previously published path finding algorithm ( as described in the Methods section ) . 
+ Since there are inherent uncertainties in the high-throughput technologies , the TFKO regulation relation or the deduced hypostatic TF-gene cascade regulation may be introduced by chance . 
+ For this reason , as a second step , the curated TFKO regulation relation or the deduced hypostatic regulation relation for the given TF-gene ChIP positive was also checked by the regulatory confidence score ( RCS ) , which was scored through the comparison with random TFKO data ( see Methods section ) . 
+ A regulation relation with an RCS higher than 1000 was considered confident . 
+ Finally , a TF-gene binding candidate pair was classified as functional if it had confident TF-gene regulation evidence , which may come from the curated TF-gene regulation or the hypostatic TF-gene regulation . 
+ Validation on a literature-proven benchmark TF-gene set
+ First we validated our proposed method on a gathered literature-curated functional TF-gene binding set from [ 7 ] . 
+ The literature-proven functional TF-gene binding pairs were treated as the positive control set and the randomly generated TF-gene pairs were viewed as the negative control set . 
+ Applying our method on the prepared control set , we can generate the receiver operating characteristic ( ROC ) curve by adjusting the regulatory confidence scores ( RCSs ) ( Figure 3 ) . 
+ The RCS is a measurement for the confidence of the curated TFKO TF-gene regulation relation or the deduced hypostatic TF-gene regulation relation as described in the Methods section . 
+ An ROC curve plot is a graphical tool demonstrating the performance of the discriminating algorithm as its discriminating score varies . 
+ The curve plots sensitivity against ( 1-specificity ) . 
+ Specificity is defined as the fraction of true negatives among all actual negatives , and ( 1-specificity ) is also known as the false positive rate . 
+ Sensitivity , also known as the true positive rate , is defined as the fraction of true positives among all actual positives . 
+ In the ROC curve plot of our method , we can see that our method acted as a good classifier for discriminating functional binding pairs from non-functional binding pairs ( area under curve , AUC = 0.78 ) , achieving high true positive rates at low false positive rates ( toward the left of the ROC curve plot ) . 
+ Notice that the jagged region between false positive rates of 0.2 and 0.3 shows that most values of the discriminating score , which is the RCS , resulted in a false positive rate of 0.2 to 0.3 with a true positive rate of about 0.7 to 0.8 . 
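+ To make the ROC construction concrete , the sketch below sweeps a score threshold over a labelled control set and integrates the resulting curve with the trapezoidal rule . The scores and labels are invented for illustration and are not the benchmark data ; the function names are likewise hypothetical . 

```python
def roc_points(scores, labels):
    """Return (false positive rate, true positive rate) points.

    labels: 1 for a positive control pair, 0 for a negative control pair.
    A pair is called positive when its score is >= the current threshold.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    points = [(0.0, 0.0)]
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
        points.append((fp / negatives, tp / positives))
    return points


def area_under_curve(points):
    """Trapezoidal area under a curve given as ordered (x, y) points."""
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


toy_scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4]   # made-up confidence scores
toy_labels = [1, 1, 0, 1, 0, 0]               # made-up control labels
toy_auc = area_under_curve(roc_points(toy_scores, toy_labels))
print(toy_auc)
```

+ An AUC of 0.5 corresponds to random guessing and 1.0 to perfect separation , which is the scale on which the reported AUC of 0.78 should be read . 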
+ Since our method does not rely on any training process , this result is unlikely to suffer from over-fitting . 
+ Hence we conclude that our method can distinguish functional TF-gene binding pairs from nonfunctional binding pairs . 
+ 82 % of the original TF-gene binding pairs suggests functionality
+ In this study , we demonstrated our algorithm on yeast Saccharomyces cerevisiae because of the comprehensiveness and availability of the genome-wide TFKO data source . 
+ Harbison et al. have performed the most comprehensive genome-wide chromatin immunoprecipitation microarray ( ChIP-chip ) experiments on the yeast Saccharomyces cerevisiae [ 3 ] . 
+ And from the experimental results , they reported the binding target genes of 204 TFs . 
+ A p-value threshold of 0.001 was suggested in the original error models to ensure a low false positive rate . 
+ But it has been shown that a TF-gene binding pair might already be functional at the p-value threshold of 0.01 [ 7 ] . 
+ Hence in this study , we took the threshold of 0.01 to obtain the initial ChIP positives for identifying functional binding targets . 
+ Only a subset ( 95 TFs ) of the reported TFs in the study of Harbison et al. could be analysed , because TFKO data were lacking for the others . 
+ After applying our proposed algorithm , we further required that the percentage of the functional binding targets of a TF should reach above 25 % since we observed a ` jump ' from 23 % to 60 % in the percentage distribution of the extracted functional binding target genes ( Figure 4 and Additional File 1 ) . 
+ Since the binding pairs adopted from Harbison et al. have been already restricted to the TF-gene binding pairs that fit into the promoter binding model , this ` jump ' indicated that the low percentage of functional binding targets of certain TFs might also result from the lack of TFKO data . 
+ As a result , there were 72 TFs suitable for our analysis and a total of 7259 functional TF-gene binding pairs were established by our method ( See Additional File 2 ) . 
+ On average , there are about 82 % ( 7259/8904 ) functional TF-gene binding pairs in the original ChIP-chip data for the 72 analysable TFs . 
+ Direct overlapping of the ChIP-chip data and the TFKO data resulted in 1220 functional TF-gene pairs . 
+ Our method thus expanded the number of functional TF-gene binding pairs about 6-fold . 
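+ The two summary figures quoted above follow directly from the reported counts , as the short check below shows . The variable names are chosen for the example ; the counts are the ones given in the text . 

```python
functional_pairs = 7259        # functional TF-gene binding pairs found by the method
chip_positive_pairs = 8904     # ChIP positives for the 72 analysable TFs
direct_overlap_pairs = 1220    # pairs found by direct ChIP-chip / TFKO overlap

fraction_functional = functional_pairs / chip_positive_pairs
fold_expansion = functional_pairs / direct_overlap_pairs

print(f"{fraction_functional:.1%} functional, {fold_expansion:.2f}-fold expansion")
```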
+ We used these 72 analysable TFs with percentages of the functional binding targets above 25 % in the following validation . 
+ Biological significance comparison with previous methods
+ We next compared the biological significance of the functional TF-gene binding pairs identified in this study and by three previously published methods . 
+ Only three approaches on yeast Saccharomyces cerevisiae were selected for our comparison because of data availability . 
+ For the methods of combining diverse biological data sources to extract the functional TF-gene binding pairs , the log likelihood score ( LLS ) method [ 7 ] is available for our comparison . 
+ The LLS method integrated the most comprehensive experimental datasets to train the Bayesian classifier , where ChIP-chip data , TF binding motif data , data of sensu stricto species of Saccharomyces cerevisiae , co-expression clustering , physical protein-protein interaction data and the phylogenetic profiles of gene pairs were used . 
+ For the methods relying on stepwise integration of the ChIP-chip data with the expression data , there are two approaches available for our comparison . 
+ One is the method of using the expression coherence score ( ECS ) and TF binding sites information [ 10 ] and the other is the method of MA-networker ( MA ) algorithm [ 12 ] . 
+ The ECS method was based on the integration of co-expression clustering , TF binding motifs , TF synergistic interactions and the TF co-localization in the promoter regions of target genes , which were mostly evaluated by the EC scores . 
+ And the MA-networker algorithm classified the ChIP positives into functional and non-functional targets based on their expression patterns across different experimental conditions and the transcription factor occupancy data . 
+ In the following sub-sections , we showed that our results conveyed better biological relevance than these three previous works by testing the identified functional binding target genes of TFs on functional enrichment , the prevalence of protein-protein interaction and co-expression . 
+ Details of the following validations can be found in Additional File 3 . 
+ Functional enrichment analysis
+ When several genes are functionally bound by the same TF , one might expect that the gene products of these genes are prone to carry similar cellular functions [ 7,31 ] . 
+ Gene ontology ( GO ) terms provide this sort of characterization . 
+ Following the definition of [ 7 ] , the target gene set of a TF is called functionally enriched if the gene set significantly overlaps with at least one gene ontology category across the three different GO categories ( the biological process ontology , the molecular function ontology and the cellular component ontology ) . 
+ Based on this notion , the functional enrichment test was performed by the web-based tool , GO Term Finder [ 32 ] . 
+ The statistical functional GO term enrichment test was implemented by the one-tailed Fisher exact test in GO Term Finder . 
+ The statistical results then went through FDR correction for multiple hypotheses tests . 
+ For our analysis , we took a p-value threshold of 0.05 . 
+ Of the 62 common TFs between our results and the results of the LLS method , 59 TF functional binding target gene sets ( 95.2 % ) extracted by our method showed significant functional enrichment while only 54 TF functional binding target gene sets ( 87.1 % ) extracted by the LLS method bore significant functional enrichment . 
+ And comparing the 46 common TFs between our results and the results of the ECS method , our results still outperformed the results of the ECS method ( 43 functionally enriched TF functional binding target gene sets compared with 37 functionally enriched TF functional binding target gene sets , i.e. 93.5 % compared with 80.4 % ) . 
+ As for the 18 common TFs between our results and the results of the MA algorithm , our results showed better functional enrichment ( 18 functionally enriched TF functional binding target gene sets compared with 17 functionally enriched TF functional binding target gene sets , i.e. 100 % compared with 95.6 % ) ( Figure 5 ) . 
+ Note that the high percentage of 100 % achieved in the comparison to the results of the MA algorithm is mainly due to the scarce common functional gene target sets of TFs available between our results and those of the MA algorithm . 
+ In summary , our method can extract functional binding target genes of TFs with better functional enrichment than previous approaches . 
+ Prevalence of protein-protein interaction
+ Functionally related genes tend to carry similar cellular functions by forming protein complexes [ 21 ] . 
+ Thus , if the target genes of a TF have statistically significant overlap with a protein complex , this prevalence of protein-protein interaction may imply the trend that the TF-gene pairs are functional [ 31 ] . 
+ As proposed in [ 31 ] , a protein complex is defined by two sets of genes , the core genes and the neighbouring genes . 
+ Core genes are defined by the genes that are both assigned as the target genes and translated to gene products with physical protein-protein interaction . 
+ The set of neighbouring genes gathers the genes that are translated to the gene products having physical protein-protein interaction with the core genes . 
+ A protein complex is formed by the union of the core genes and the neighbouring genes . 
+ Following the above definitions , a set of functional binding targets of a TF showed prevalence of protein-protein interactions if the proportion of the interacting proteins , or the core genes , in this set was significantly higher than the proportion of the protein complex within the whole genome . 
+ By defining the protein complex as described , we then performed the one-tailed Fisher exact test to test the protein complex overlap significance with FDR correction [ 33 ] and a threshold of α = 0.05 . 
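+ A one-tailed Fisher exact test for over-representation is the upper tail of a hypergeometric distribution , which can be sketched with the standard library alone . The mapping of the arguments to target sets , core genes and protein complexes is an assumption made for illustration , and the numbers in the example are invented . 

```python
from math import comb


def fisher_exact_greater(k, n, K, N):
    """One-tailed p-value P(X >= k) under the hypergeometric distribution.

    N: genes in the genome (population size)
    K: genes belonging to the protein complex
    n: genes in the TF's functional target set
    k: target genes that are also in the complex (e.g. core genes)
    """
    total = comb(N, n)
    upper = min(n, K)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / total


# Invented example: 10 genes, 4 in the complex, 3 targets, 2 of them in the complex.
p_value = fisher_exact_greater(k=2, n=3, K=4, N=10)
print(p_value)
```

+ In practice the resulting p-values for all TFs would still need a multiple-testing correction before being compared with the 0.05 threshold . 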
+ Among the 62 common TFs between our results and the results of the LLS method , 49 TF functional binding target gene sets ( 79.0 % ) extracted by our method showed prevalence of protein-protein interaction while only 42 TF functional binding target gene sets ( 67.7 % ) extracted by the LLS method did . 
+ And among the 46 common TFs between our results and the results of the ECS method , 34 TF functional binding target gene sets ( 73.9 % ) extracted by our method showed prevalence of protein-protein interaction , compared with only 23 TF functional binding target gene sets ( 50.0 % ) extracted by the ECS method . 
+ For the 18 common TFs between our results and the results of the MA algorithm , 14 TF functional binding target gene sets ( 77.8 % ) extracted by our method showed prevalence of protein-protein interaction , compared with only 12 TF functional binding target gene sets ( 66.7 % ) extracted by the MA algorithm ( Figure 5 ) . 
+ In summary , our method can extract functional binding target genes of TFs with better protein functional cooperation than previous approaches can . 
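+ The overlap test described above can be sketched in a few lines . This is only an illustration under stated assumptions : the 2 x 2 table layout , the function names `fisher_right_tail` and `benjamini_hochberg` , and the choice of the Benjamini-Hochberg procedure for the FDR step are ours , and the counts are made up rather than taken from the study . 

```python
from math import comb


def fisher_right_tail(a, b, c, d):
    """One-tailed (right) Fisher exact test for the table [[a, b], [c, d]].

    Assumed layout (hypothetical): a = target genes inside the protein
    complex, b = target genes outside it, c = non-target genes inside the
    complex, d = non-target genes outside it.
    Returns P(X >= a) under the hypergeometric null.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    upper = min(row1, col1)
    total = comb(n, col1)
    tail = sum(comb(row1, k) * comb(n - row1, col1 - k)
               for k in range(a, upper + 1))
    return tail / total


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Made-up tables for three hypothetical TFs.
tables = [(3, 0, 0, 3), (8, 2, 10, 80), (1, 9, 10, 80)]
raw = [fisher_right_tail(*t) for t in tables]
significant = [p <= 0.05 for p in benjamini_hochberg(raw)]
```

+ A TF would then be counted as showing prevalence of protein-protein interaction when its adjusted p-value does not exceed the chosen threshold . 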
+ Expression coherence analysis
+ It has been shown that functionally relevant target genes of TFs tend to have similar mRNA expression profiles [ 34 ] . 
+ Using this notion , we calculated the Pearson correlation coefficients from the expression vectors between any two genes [ 18 ] . 
+ It has been pointed out that the TF-gene pairs are usually functional under different cellular conditions [ 10 ] . 
+ Hence , we collected 40 mRNA expression time series profiles under different conditions , as described in the Material section , and verified the expression coherence under these conditions . 
+ Since positive correlation and negative correlation are both functionally relevant , we took the squares of the coefficients as our expression correlation measurement . 
+ Then under different conditions we performed the one-tailed rank sum test on the expression correlation coefficients to compare the expression coherence between two lists of functional binding target genes of TFs from different methods . 
+ We tested two different hypotheses : ( 1 ) the means of the squares of the correlation coefficients for the TF functional targets identified in this study are higher than those generated by other methods ; and ( 2 ) the means of the squares of the correlation coefficients for the TF functional targets identified in this study are lower than those generated by other methods . 
+ Multiple hypothesis test correction was done by FDR correction and a threshold of α = 0.05 was adopted . 
+ In different expression time series conditions , we first counted the percentage of the functional binding target sets of TFs with statistically higher expression coherence . 
+ Note that the percentages of more expression coherent functional binding target sets of TFs in two different methods may not add up to 100 % since some of the TF functional binding gene sets may have statistically invariant average expression correlations between two methods . 
+ Then , when one method yielded more expression coherent functional binding target sets of TFs than the other , we said that this method was more expression coherent than the other under this expression time series condition . 
+ We compared our results with those of LLS , ECS , and MA methods for all 40 different expression profiles . 
+ Compared with the LLS method , our method conveyed better expression coherence under 31 different conditions . 
+ Our results were more correlated in the expression profiles than the results of the LLS method under most of the conditions . 
+ And our method showed more expression coherent pairs than the ECS method did under all 40 different conditions . 
+ Finally , compared with the MA algorithm , our method stood out under 21 conditions while the results of MA algorithm got better expression coherence under 15 conditions ( Figure 6 ) . 
+ Our results were still more correlated in the expression profiles than the results of the MA algorithm . 
+ All in all , our method can extract functional TF-gene binding pairs with better expression coherence . 
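+ As a rough illustration of the comparison used in this section , the sketch below applies a one-tailed rank sum test to two lists of squared correlation coefficients . The function name `rank_sum_right_tail` , the normal approximation without tie or continuity correction , and the toy coefficients are our assumptions and not details given in the text . 

```python
import math


def rank_sum_right_tail(x, y):
    """One-tailed rank sum test of H1: values in x tend to exceed those in y.

    Returns (U, p). U counts the (x, y) pairs with x > y (ties count 0.5);
    p comes from the normal approximation without tie or continuity
    correction, so it is only a rough guide for small samples.
    """
    u = sum(1.0 if xi > yi else 0.5 if xi == yi else 0.0
            for xi in x for yi in y)
    n, m = len(x), len(y)
    mean_u = n * m / 2.0
    sd_u = math.sqrt(n * m * (n + m + 1) / 12.0)
    z = (u - mean_u) / sd_u
    return u, 0.5 * math.erfc(z / math.sqrt(2.0))


# Toy Pearson coefficients for the target set of one hypothetical TF;
# squaring treats positive and negative correlation alike.
ours = [r * r for r in (0.9, -0.8, 0.7)]
other = [r * r for r in (0.1, -0.2, 0.3)]
u_stat, p_value = rank_sum_right_tail(ours, other)
```

+ Swapping the two arguments gives the test of the opposite hypothesis , so running both directions per TF and per condition mirrors the two hypotheses listed above . 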
+ Comparison with random assignments
+ To make statistical assessment of the results in this study , we made simulations against random assignments of functional/non-functional TF-gene pairs . 
+ In our study , we have shown that about 82 % of the original TF-gene binding pairs suggest functionality . 
+ Hence we randomly removed 18 % of the original binding targets from the 72 analysable TFs as the random assignment of non-functional TF-gene binding targets . 
+ We repeated this process 50 times and obtained 50 randomly assigned functional TF-gene binding pair lists . 
+ Then we performed the biological significance validation for the randomly assigned results as the stochastic lower limit of the performance of the validation methods . 
+ After that , for the functional enrichment validation and the prevalence of protein-protein interaction validation , we used the left-tailed one sample student t-test to assess the significance of our result , compared to randomly generated TF-gene assignments for these 72 analysable TFs . 
+ The test was performed on the hypothesis that the average performance of the random results is statistically lower than the results in this study . 
+ As for the expression coherence validation , we performed the paired two sample t-test for our results and the random ones in every expression condition . 
+ In each condition-specific expression profile , we used the rank sum test as described earlier on the two stated hypotheses to compare the expression coherence between the result in this study and the randomly assigned TF-gene lists . 
+ The number of target gene sets satisfying the hypothesis of `` results in this study is better than the random results '' and the number of target sets satisfying the hypothesis of `` results in this study is worse than the random results '' for the comparison of our results to the 50 randomly assigned TF-gene lists formed the testing pairs . 
+ We said that our result is better than random assignments in a specific condition if the right-tailed p-value of the paired t-test on the 50 testing pairs was below 0.05 in this condition . 
+ As shown in Table 1 to Table 3 and Additional File 5 , the overall performance of results in this study is statistically better than the 50 random TF-gene target lists ( with p-value threshold of 0.05 ) . 
+ For the 72 TFs , our results generated 62 functionally enriched functional binding target gene sets of TFs , compared to the performance of random assignments with mean and standard deviation equal to 44.8 and 3.53 , respectively ( one-tailed p-value = 2.76 × 10^-36 ) . 
+ For the validation of prevalence of protein-protein interaction , there were 53 functional binding target gene sets of TFs in this study showing prevalence of protein-protein interaction , while the random lists obtained a performance with mean and standard deviation of 17.68 and 3.78 , respectively ( one-tailed p-value = 7.04 × 10^-50 ) . 
+ Finally for the expression coherence validation , in 39 of the 40 different expression conditions our results were statistically more expression coherent than the 50 random lists . 
+ Detail of the validation results can be found in Additional File 5 and Additional File 6 . 
+ In summary , the results generated by our method are statistically meaningful and outperform mere random assignments . 
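+ The random baseline and the one-sample t statistic can be sketched as follows . The helper names , the pooled ( rather than per-TF ) removal of pairs and the placeholder pair list are assumptions made for illustration ; only the summary numbers 62 , 44.8 , 3.53 and 50 come from the text above . 

```python
import math
import random


def random_functional_lists(binding_pairs, keep_fraction=0.82,
                            n_lists=50, seed=0):
    """Draw random 'functional' lists by discarding a share of the pairs.

    Assumption: pairs are removed from the pooled list; the study may
    instead have removed them separately for each TF.
    """
    rng = random.Random(seed)
    n_keep = round(len(binding_pairs) * keep_fraction)
    return [rng.sample(binding_pairs, n_keep) for _ in range(n_lists)]


def one_sample_t(sample_mean, sample_sd, n, reference):
    """t statistic for H0: the mean of the random results equals reference."""
    return (sample_mean - reference) / (sample_sd / math.sqrt(n))


# Placeholder pairs standing in for the original TF-gene binding pairs.
pairs = [("TF%d" % (i % 5), "gene%d" % i) for i in range(100)]
random_lists = random_functional_lists(pairs)

# Functional enrichment: 62 enriched sets in the study versus
# 44.8 +/- 3.53 over the 50 random lists.
t_enrichment = one_sample_t(44.8, 3.53, 50, 62)
```

+ A strongly negative statistic with 49 degrees of freedom corresponds to a very small left-tailed p-value , in line with the order of magnitude reported above . 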
+ Applicability to different datasets
+ To show that our method for identifying functional TF-gene binding pairs is not dataset-dependent , we also performed our method on the dataset provided by Lee et al. [ 2 ] . 
+ Since the original binding dataset used in the ECS method and MA algorithm was from the experimental analysis of Lee et al. , we compared the biological relevance of this result with those generated by the ECS method and the MA algorithm . 
+ In the applicable biological validations , similar conclusions also held for these comparisons ( See Additional File 4 ) . 
+ For the statistical assessment of the results obtained by applying our method to the dataset of Lee et al. , similar statistical significance also held ( See Table 2 and Table 3 ) . 
+ Detail of the validation results can be found in Additional File 5 and Additional File 6 . 
+ Applicability to ChIP-seq datasets and the E. coli genome
+ The approach described in this study is not restricted to the Yeast genome or to merely ChIP-chip data . 
+ We further demonstrated that our method can be applied to ChIP-seq binding datasets and to the E. coli genome data . 
+ ChIP-seq provides a promising way for identification of transcription factor binding sites , but requires high quality of antibodies to the transcription factors . 
+ Thus the technique is still not scalable to the genome-wide scale of transcription factors [ 35 ] . 
+ While in yeast this is already done by ChIP-chip , no similar work has yet been done repeatedly for ChIP-seq . 
+ Hence we only showed the applicability of our method to the binding target of Ste12 identified by ChIP-seq . 
+ We adopted the ChIP-seq data for genome-wide Ste12 transcription factor binding sites from the work of Lenfrancois et al. [ 36 ] . 
+ In their work , the binding targets were manually curated and provided in the form of binding p-values . 
+ We took the binding targets of Ste12 with the p-value threshold of 0.05 , as suggested in their analysis . 
+ A total of 926 targets of Ste12 were established from their experimental results . 
+ Ste12 is a transcription factor known to be involved in mating and cell fusion [ 37 ] . 
+ Hence we tested the Gene Ontology enrichment of the original binding target list and the functional target list filtered by our method . 
+ Significantly enriched GO terms ( FDR corrected p-value < 0.001 ) related to cell fusion ( GO:0000747 , conjugation with cellular fusion ) and mating ( GO:0019236 , response to pheromone ) were identified in the filtered functional TF-gene binding targets but not in the original target list ( Table 4 ) . 
+ This shows that our method can extract functional binding targets from the original ChIP-seq dataset . 
+ We also demonstrated the applicability of our algorithm to the genome-wide data of E. coli . 
+ Since there is no other similar analysis for E. coli , we gathered a literature-proven benchmark functional TF-gene binding pair set for E. coli and performed our algorithm on this ground truth dataset . 
+ The benchmark set of 338 functional TF-gene binding pairs with at least three different experimental supports was collected from RegulonDB [ 38 ] . 
+ The negative control set was generated as described in the Method Section . 
+ We also collected 3990 TF-gene regulation pairs , which conveyed the same information as the TFKO data , from RegulonDB . 
+ Then we used these TF-gene regulation pairs to construct the E. coli regulatory relation network . 
+ Applying our method on the prepared control set , we can obtain the ROC curve by varying the RCSs . 
+ As shown in Figure 7 , our method acted as a good classifier for discriminating functional binding pairs from non-functional binding pairs with AUC = 0.78 . 
+ Hence our method is well-suited for the E. coli genome as well . 
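+ To make the ROC construction concrete , the sketch below sweeps a threshold over a score and integrates the resulting curve with the trapezoid rule . The function name `roc_auc` and the toy scores are hypothetical ; in the setting above the score would be the RCS and the labels would mark benchmark versus negative control pairs . 

```python
def roc_auc(scores, labels):
    """Area under the ROC curve obtained by sweeping a threshold over scores.

    labels: 1 for a functional (benchmark) pair, 0 for a control pair.
    Pairs with equal scores cross the threshold together.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    ranked = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    tp = fp = 0
    prev_fpr = prev_tpr = 0.0
    area = 0.0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1]:
                tp += 1
            else:
                fp += 1
            i += 1
        fpr, tpr = fp / n_neg, tp / n_pos
        area += (fpr - prev_fpr) * (tpr + prev_tpr) / 2.0  # trapezoid rule
        prev_fpr, prev_tpr = fpr, tpr
    return area


# Toy scores: higher means stronger evidence for a functional pair.
toy_auc = roc_auc([0.9, 0.8, 0.7, 0.6], [1, 0, 1, 0])
```

+ An AUC of 0.5 corresponds to random guessing and 1.0 to perfect separation , which is the scale on which the reported value of 0.78 should be read . 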
+ Biological applicability of our method
+ We have listed the potential epistatic regulation cascade for every functional TF-gene binding pair identified in this study ( Additional File 2 ) . 
+ To demonstrate the biological applicability of our method , we took the literature-proven functional TF-gene binding pair ( Leu3p , BAP2 ) as an example . 
+ BAP2 is a gene encoding a permease in Saccharomyces cerevisiae for the uptake of branched-chain amino acids from media containing nitrogen source [ 39 ] . 
+ Deletion of BAP2 greatly reduced the uptake of leucine , isoleucine and valine . 
+ And Leu3p is a TF in yeast that regulates the transcription of a group of genes involved in leucine biosynthesis [ 40 ] . 
+ In the original binding dataset from the work of Harbison et al. , the promoter region of BAP2 was found to be bound by Leu3p ( with binding p-value of order 10^-7 ) . 
+ But there was no single TF knockout evidence showing the regulation of BAP2 by Leu3p . 
+ In our method , we found that although there was no TFKO evidence for this TF-gene pair , we could find a regulation cascade ( Leu3p → Msn4p → Rpn4p → Yap1p → Stp1p → BAP2 ) in the constructed regulation relation network through the TF Stp1p with RCS bigger than 1000 . 
+ Hence we concluded that the ChIP positive ( Leu3p , BAP2 ) has hypostatic regulation evidence and is a functional TF-gene binding pair . 
+ The Leu3p binding site of BAP2 was established by computer assisted analysis [ 41 ] . 
+ In their work , Nielsen et al. also showed that mutating the Leu3p binding site reduced the transcription level of BAP2 on SC medium and concluded that Leu3p binding is required to obtain full BAP2 promoter activity . 
+ This matches our classification of the functional binding of Leu3p to BAP2 . 
+ They also demonstrated that Stp1p can functionally bind to the promoter of BAP2 independently of the functional binding of Leu3p to BAP2 and is synergistic with Leu3p , suggesting a possible masking effect on the knockout event of Leu3p on BAP2 . 
+ Conclusion
+ Inferring functional TF-gene binding pairs serves as the first step toward understanding the regulatory pathways in cells . 
+ We have demonstrated that by integrating the ChIP-chip data with the TFKO data , our method can infer functional TF-gene binding pairs . 
+ And compared with three previous works , our method generated more biologically relevant results . 
+ Using our identified functional TF-gene binding pairs , it is possible to reconstruct a more reliable cellular transcriptional network , which will be helpful to unravel the unknown cellular mechanisms in future research . 
+ Additional file 6: Validation results of our results compared to 50
+ List of abbreviations used
+ ChIP : chromatin immunoprecipitation ; TF : transcription factor ; TFKO data : transcription factor knockout data ; PPI : protein-protein interaction ; LLS : log likelihood score method ; ECS : expression coherence score method . 
+ Competing interests
+ The authors declare that they have no competing interests.
+ Authors’ contributions
+ WSW conceived the research topic and provided essential guidance . 
+ THY developed the algorithms and wrote the manuscript . 
+ THY performed all the simulations and analysis . 
+ WSW proofread the final manuscript . 
+ Both authors have read and approved the final manuscript . 
+ Acknowledgements
+ This study was supported by National Cheng Kung University and Taiwan National Science Council NSC 99-2628-B-006-015-MY3 . 
+ And we thank Chia-Ming Yeh for helping perform the functional enrichment test on the 50 randomly assigned TF-gene pair lists . 
+ Declarations
+ The full funding for the publication fee came from Taiwan National Science Council and College of Electrical Engineering and Computer Science , National Cheng Kung University . 
+ This article has been published as part of BMC Systems Biology Volume 7 Supplement 6 , 2013 : Selected articles from the 24th International Conference on Genome Informatics ( GIW2013 ) . 
+ The full contents of the supplement are available online at http://www.biomedcentral.com/bmcsystbiol/supplements/7/S6 . 
+ Published: 13 December 2013
\ No newline at end of file
--- a/data/TEXT_FILES/notuseful_txt/24650566.txt 0 → 100644
View file @27818a9
+++ b/data/TEXT_FILES/notuseful_txt/24650566.txt 0 → 100644
View file @27818a9
+ Methods
+ Inferring gene regulatory networks from gene expression data at whole genome level is still an arduous challenge , especially in higher organisms where the number of genes is large but the number of experimental samples is small . 
+ It is reported that the accuracy of current methods at genome scale signiﬁcantly drops from Escherichia coli to Saccharomyces cerevisiae due to the increase in number of genes . 
+ This limits the applicability of current methods to more complex genomes , like human and mouse . 
+ Least absolute shrinkage and selection operator ( LASSO ) is widely used for gene regulatory network inference from gene expression proﬁles . 
+ However , the accuracy of LASSO on large genomes is not satisfactory . 
+ In this study , we apply two extended models of LASSO , L0 and L1/2 regularization models to infer gene regulatory network from both high-throughput gene expression data and transcription factor binding data in mouse embryonic stem cells ( mESCs ) . 
+ We ﬁnd that both the L0 and L1/2 regularization models signiﬁcantly outperform LASSO in network inference . 
+ Incorporating interactions between transcription factors and their targets remarkably improved the prediction accuracy . 
+ Current study demonstrates the efﬁciency and applicability of these two models for gene regulatory network inference from integrative omics data in large genomes . 
+ The applications of the two models will facilitate biologists to study the gene regulation of higher model organisms in a genome-wide scale . 
+ 2014 Elsevier Inc. . 
+ All rights reserved . 
+ 1. Introduction
+ Inferring gene regulatory networks from high-throughput genome-wide data is still a major challenge in systems biology . 
+ The transcriptome , measured by microarray or RNA-seq , describes the expression of all the genes of a genome . 
+ Various methods have been developed to infer gene regulatory networks from such transcriptome data ( reviewed in [ 1 ] ) . 
+ Even though most of the methods perform very well on smaller genomes such as Escherichia coli , very few of them can accurately handle larger genomes , such as human and mouse . 
+ In large genomes , the complexity of gene regulatory system dramatically increases . 
+ Thousands of regulators , such as transcription factors ( TFs ) , communicate in different ways to regulate tens of thousands of target genes in various tissues or biological processes . 
+ ⇑ Corresponding author at : Department of Biochemistry , LKS Faculty of Medicine , The University of Hong Kong , Hong Kong Special Administrative Region , China . 
+ Fax : +852 2855 1254 . 
+ E-mail address : junwen@hku.hk ( J. Wang ) . 
+ 1 These authors contributed equally to this work . 
+ However , for a speciﬁc gene , only a few key TFs collaborate and control its expression change in a speciﬁc cell type or developmental stage . 
+ Thus , the gene regulatory network inference for such large genomes becomes a sparse optimization problem , which is to search a small number of key TFs from a pool of thousands of TFs for tens of thousands of targets based on the dependencies between the expression of TFs and the targets . 
+ The sparsity levels of the gene regulatory networks in large genomes are much higher than those in small genomes . 
+ One of the most popular approaches is to estimate the pair-wise correlation between genes using metrics like Pearson 's correlation coefﬁcient , Spearman 's correlation coefﬁcient , mutual information , partial correlation coefﬁcient or expression alignment , and then ﬁlter for causal relationships to infer TF-target pairs [ 2 -- 6 ] . 
+ For example , ARACNE ( Algorithm for the Reconstruction of Accurate Cellular Networks ) has been shown to achieve a low error rate and to scale up to mammalian systems [ 2,7,8 ] . 
+ Another popular approach is to use the regression-based models to select TFs with target gene-speciﬁc sparse linear-regression [ 1,9,10 ] . 
+ In regression-based models , least absolute shrinkage and selection operator ( LASSO ) is most commonly used for gene regulatory network inference . 
+ In the field of optimization , LASSO is also called the L1 regularization model [ 11 ] . 
+ Many efficient algorithms , such as ISTA ( Iterative Soft Thresholding Algorithm ) , LAR ( Least Angle Regression ) and YALL1 ( Your ALgorithms for L1 ) , have been developed for this model , and some of them are applied to large datasets [ 11 -- 20 ] . 
+ Recently developed methods , such as NARROMI , take advantage of both correlation- and regression-based approaches and achieve improved accuracy [ 10 ] . 
+ However , both approaches suffer from several limitations when dealing with large genomes . 
+ Marbach et al. have shown that all of the 35 methods assessed , including both approaches , have much less precision for gene regulatory network inference in Saccharomyces cerevisiae than those in E. coli . 
+ Because the genome of S. cerevisiae is larger than that of E. coli , and its gene regulatory network is much more complex , only about 2.5 % area under the precision-recall curve ( AUPR ) is achieved by these methods in S. cerevisiae , which is close to random . 
+ The correlation-based approaches need to calculate the correlation of all gene pairs , thus the computation cost increases quadratically with the number of genes . 
+ The sparsity level of the gene regulatory networks in large genomes is much higher than those in small genomes , hence false positives also increase remarkably when a similar correlation cutoff is used to predict the gene regulatory networks [ 2,7,21 ] . 
+ The accuracy of LASSO in large-scale problems with high sparsity is also reported to be not satisfactory [ 22 -- 24 ] . 
+ To improve the performance , current methods require a large number of transcriptome profiles , usually at least as many as the number of regulators [ 1 ] . 
+ However , in most biological studies , sample size is much smaller than the number of regulators due to high experimental cost . 
+ The limitation in sample size impedes performance of both approaches . 
+ In correlation-based methods , limited sample size makes the correlations between genes sensitive to noise , and thus a highly correlated gene pair need not imply a true regulatory relationship . 
+ In large genomes , it is even more difﬁcult to infer true regulatory links from a larger pool of highly correlated gene pairs with smaller sample size . 
+ In LASSO-based methods , when sample size is smaller than the number of regulators , multiple solutions are available , which makes it difﬁcult to determine which solution is more biologically meaningful . 
+ To counter the small sample size problem , heterogeneous datasets from different tissues , biological processes or experimental conditions are usually pooled together before the modeling , which increases prediction accuracy . 
+ However , the inferred network will lose its cell-type or condition speciﬁcity [ 1,25 ] . 
+ Further , heterogeneous datasets may weaken dependencies between the expression of TFs and their targets , since gene regulatory network topology varies among different biological processes , and one TF may regulate different gene sets in different cell states . 
+ Chromatin immunoprecipitation ( ChIP ) coupled with high-throughput techniques such as sequencing or microarray ( ChIP-seq/chip , hereafter referred to as ChIP-X ) produces data , also called the cistrome , that have also been widely used to construct gene regulatory networks in recent years [ 26 -- 28 ] . 
+ However , TF binding sites detected by ChIP-X show only the genomic positions of the TF binding , but could not tell which gene is its target and whether and how the TF binding affects the transcription of its targets . 
+ Recently developed web server ChIP-Array that integrates both ChIP-X and transcriptome data to construct gene regulatory networks takes the advantages of both technologies and provides high accuracy , but it can be used for single TF-centered network only and requires the transcriptome data to be generated under the perturbation of the same TF as that of ChIP-X data . 
+ However , genome-wide gene regulatory networks contain multiple TFs , but only a limited number of TFs have both omics data . 
+ In summary , although several methods have been proposed to infer genome-wide gene regulation networks from transcriptome proﬁles either alone [ 1,29 ] , or in combination with predicted TF binding data [ 30 ] , they are all limited by high computation cost and low accuracy in large genomes . 
+ To tackle these limitations , here we propose to integrate ChIP-X data with transcriptome profiles for gene regulatory network inference and use the Lp ( p < 1 ) regularization model to improve the accuracy of LASSO , hereafter referred to as the L1 regularization model . 
+ It is reported that it is able to achieve more sparse and accurate solutions by virtue of the Lp ( p < 1 ) regularization model , even from small amount of samples [ 22,23,31 ] . 
+ However , the Lp ( p < 1 ) regularization model suffers from its non-convexity and it is in general very difficult to design efficient algorithms to solve it . 
+ Fortunately , the iterative hard thresholding algorithm [ 32 ] and iterative half thresholding algorithm [ 23 ] have been developed to solve the L0 and L1/2 regularization models respectively , but they have not been applied to gene regulatory network inference . 
+ Due to their low computation cost and fast convergence rate , we found that they are suitable for the gene regulatory network inference problem . 
+ Thus in this study , we apply the L0 and L1/2 regularization models to infer gene regulatory networks from ChIP-X and transcriptome data in mouse embryonic stem cells ( mESCs ) . 
+ We compare their performance with the L1 regularization model and ﬁnd that ChIP-X data dramatically improved the accuracy of all three models , and the L0 and L1/2 regularization models signiﬁcantly outperform the L1 regularization model in the presence of ChIP-X data . 
+ The proposed models enable biologists to efficiently infer gene regulatory networks in higher model organisms using integrative omics data . 
+ All the datasets and codes are available at : http://jjwanglab.org/LpRGNI . 
+ 2. Materials and Methods
+ 2.1. Lp regularization models
+ The regulatory relationship between TFs and targets can be represented approximately by a linear system ( Fig. 1A ) $AX = B + e$ , where $A \in \mathbb{R}^{m \times r}$ denotes the expression matrix of candidate TFs , $B \in \mathbb{R}^{m \times n}$ denotes the expression matrix of all target genes , $e \in \mathbb{R}^{m \times n}$ denotes an error matrix and $X \in \mathbb{R}^{r \times n}$ denotes the regulation matrix that describes the regulatory relationship between all TFs and the targets , $m$ denotes the number of samples , $r$ denotes the number of factors and $n$ denotes the number of target genes . 
+ In gene regulatory network inference , for each target gene $j$ , we want to minimize the difference between $AX_{:,j}$ and $B_{:,j}$ with a small number of selected TFs , which is a sparse optimization problem described as $\min \|AX_{:,j} - B_{:,j}\|_2 \ \text{s.t.} \ \|X_{:,j}\|_0 \le K$ , where $\|\cdot\|_2$ denotes the Euclidean norm , $\|X_{:,j}\|_2 = \sqrt{\sum_{i=1}^{r} X_{i,j}^2}$ , and $\|X_{:,j}\|_0$ denotes the number of non-zero elements in $X_{:,j}$ . 
+ A smaller $\|X_{:,j}\|_0$ means higher sparsity of $X_{:,j}$ . 
+ It indicates how many TFs are found to regulate the target gene $j$ . For this problem , a popular and practical technique is to transform the sparse optimization problem into an unconstrained optimization problem , called a regularization problem ( Fig. 1B ) . 
+ For example , given a gene $j$ and its expression profile $B_{:,j}$ , the L0 regularization model is to minimize the difference between $AX_{:,j}$ and $B_{:,j}$ , and maximize the sparsity of $X_{:,j}$ : $\min_{X_{:,j} \in \mathbb{R}^{r}} \|AX_{:,j} - B_{:,j}\|_2^2 + \lambda \|X_{:,j}\|_0$ , where $\lambda > 0$ is the regularization parameter , providing a tradeoff between accuracy and sparsity . 
+ Even though the L0 regularization model is close to the original problem we want to solve , it is NP-hard to achieve a global optimal solution [ 33 ] . 
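+ Thus , the L1 regularization model ( LASSO ) , a popular relaxation of the L0 regularization model , is introduced to solve the following problem : $\min_{X_{:,j} \in \mathbb{R}^{r}} \|AX_{:,j} - B_{:,j}\|_2^2 + \lambda \|X_{:,j}\|_1$ , where $\|X_{:,j}\|_1 = \sum_{i=1}^{r} |X_{i,j}|$ . 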
+ However , in many practical applications , the solutions yielded from the L1 regularization model are less sparse than those of the L0 regularization model [ 22 -- 24 ] . 
+ Recently , the L1/2 regularization model was proposed and shown to perform better than the L1 regularization model [ 23 ] . 
+ This model is described as $\min_{X_{:,j} \in \mathbb{R}^{r}} \|AX_{:,j} - B_{:,j}\|_2^2 + \lambda \|X_{:,j}\|_{1/2}^{1/2}$ , where $\|X_{:,j}\|_{1/2}^{1/2} = \sum_{i=1}^{r} \sqrt{|X_{i,j}|}$ . 
+ Neither L0 nor L1/2 regularization model has been used in gene regulatory network inference . 
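+ To compare the three penalties side by side , the sketch below evaluates them for one coefficient vector . It assumes the usual definitions ( L0 as the count of non-zero entries , L1 as the sum of absolute values , and L1/2 as the sum of square roots of absolute values ) ; the function names and the toy numbers are ours . 

```python
import math


def l0_penalty(x):
    """Number of non-zero coefficients (selected TFs)."""
    return sum(1 for xi in x if xi != 0)


def l1_penalty(x):
    """Sum of absolute values, as used by LASSO."""
    return sum(abs(xi) for xi in x)


def l_half_penalty(x):
    """Sum of square roots of absolute values (assumed L1/2 penalty)."""
    return sum(math.sqrt(abs(xi)) for xi in x)


def objective(A, b, x, lam, penalty):
    """Squared residual norm plus lam times the chosen penalty."""
    residual = [bi - sum(aij * xj for aij, xj in zip(row, x))
                for row, bi in zip(A, b)]
    return sum(r * r for r in residual) + lam * penalty(x)


# Toy problem: 3 samples, 3 candidate TFs, one target gene.
A = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
b = [4.0, 0.0, 0.0]
x = [4.0, 0.0, -1.0]
values = {name: objective(A, b, x, 0.5, fn)
          for name, fn in [("L0", l0_penalty), ("L1", l1_penalty),
                           ("L1/2", l_half_penalty)]}
```

+ For large coefficients the L1/2 penalty grows more slowly than the L1 penalty , which is one intuition for why it shrinks strong regulators less while still favouring sparse solutions . 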
+ 2.2. Algorithms
+ In this study , we apply the iterative thresholding algorithms to solve the Lp ( p = 1 , 1/2 , 0 ) regularization models for gene regulatory network inference from omics data . 
+ The iterative thresholding algorithm is the most widely studied class of the ﬁrst-order methods for the sparse optimization problem . 
+ It is convergent and of very low computational complexity [ 11,23,32 ] . 
+ Beneﬁtting from its simple formulation and low storage requirement , it is very efﬁcient and applicable even for the large-scale sparse optimization problem . 
+ In particular , the iterative soft thresholding algorithm is introduced and developed to solve the L1 regularization problem ; the iterative hard thresholding algorithm is proposed to solve the L0 regularization problem ; and the iterative half thresholding algorithm is designed for the L1/2 regularization problem . 
+ Briefly , in each iteration , these three algorithms first perform the same gradient step , where the superscripts of X and Z denote the number of iterations , v denotes the stepsize , which we always choose as 1/2 , and sign ( · ) denotes the sign function . 
+ The solutions of three algorithms achieved after 200 iterations , except those indicated , are used for further evaluation and comparison . 
+ For all the three algorithms , the regularization parameter $\lambda$ is updated iteratively so as to keep the sparsity of $X^{k}_{:,j}$ . 
+ For the details , one can refer to [ 23,32 ] . 
+ Since the number of TFs ( the sparsity of $X_{:,j}$ ) that regulate a particular gene is usually unknown and biologists need to select a small number of TFs for the experimental verification , we make this parameter adjustable to the user . 
+ For further evaluation and comparison of the three models , we test a series of factor numbers ( $\|X_{:,j}\|_0$ ) from 1 to 100 ( sparsity level 0.1 % to 10 % ) . 
+ In each test , we fix the same sparsity ( $\|X_{:,j}\|_0$ ) for all three models . 
+ We assume that the TFs which are detected in a higher sparsity will be more important than those detected in a lower sparsity . 
+ Thus we score each factor according to the highest sparsity where it gets a non-zero index in the ﬁnal solution ( Section 2.5 ) . 
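+ The general shape of an iterative thresholding scheme for one target gene can be sketched as below . This is a simplified illustration and not the authors' implementation : the gradient step is written as Z = X + v A^T ( B - A X ) , the hard thresholding keeps the K largest entries , the soft thresholding uses a fixed threshold instead of an iteratively updated regularization parameter , and the half thresholding operator is omitted . All function names and the toy data are hypothetical . 

```python
import math


def gradient_step(A, b, x, v):
    """Z = X + v * A^T (b - A X) for a single target gene (one column)."""
    residual = [bi - sum(aij * xj for aij, xj in zip(row, x))
                for row, bi in zip(A, b)]
    grad = [sum(A[i][j] * residual[i] for i in range(len(A)))
            for j in range(len(x))]
    return [xj + v * gj for xj, gj in zip(x, grad)]


def soft_threshold(z, t):
    """Soft thresholding: sign(z) * max(|z| - t, 0), entry by entry."""
    return [math.copysign(max(abs(zi) - t, 0.0), zi) for zi in z]


def hard_threshold_top_k(z, k):
    """Hard thresholding that keeps only the k largest-magnitude entries."""
    keep = set(sorted(range(len(z)), key=lambda i: abs(z[i]),
                      reverse=True)[:k])
    return [zi if i in keep else 0.0 for i, zi in enumerate(z)]


def iterative_thresholding(A, b, threshold, v=0.5, n_iter=200):
    """Alternate the gradient step and a thresholding step, from X = 0."""
    x = [0.0] * len(A[0])
    for _ in range(n_iter):
        x = threshold(gradient_step(A, b, x, v))
    return x


# Toy expression matrix with orthonormal columns (4 samples, 3 TFs).
A = [[0.5, 0.5, 0.5], [0.5, -0.5, 0.5], [0.5, 0.5, -0.5], [0.5, -0.5, -0.5]]
b = [0.5, 0.5, 1.5, 1.5]  # equals A applied to the coefficients [2, 0, -1]

x_hard = iterative_thresholding(A, b, lambda z: hard_threshold_top_k(z, 1))
x_soft = iterative_thresholding(A, b, lambda z: soft_threshold(z, 0.25))
```

+ With K = 1 the hard variant keeps only the strongest regulator , whereas the soft variant keeps both true regulators but shrinks their coefficients toward zero , which illustrates the bias of the L1 penalty on this toy problem . 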
+ 2.3. Data collections
+ Transcriptome data were downloaded from Gene Expression Omnibus ( GEO ) . 
+ 245 experiments under perturbations in mESC were collected from three papers ( Table 1 ) [ 34 -- 36 ] . 
+ Each experiment produced transcriptome data with or without overexpression or knockdown of a gene , two replicates for control and two replicates for treatment . 
+ Gene expression fold changes between control and TF perturbation samples of 19978 genes in all experiments were log2 transformed and formed matrix B ( Fig. 1A ) . 
+ Candidate regulators , including TFs , mediators , co-factors , chromatin modifiers and repressors , were collected from four TF databases , TRANSFAC , JASPAR , UniPROBE and TFCat , as well as from the literature . 
+ Matrix A was made up of the expression proﬁles of 939 regulators ( Fig. 1A ) . 
+ A literature-based golden standard TF-target pair set from biological studies ( Fig. 1C ) , including 97 TF-target interactions between 23 TFs and 48 target genes ( low-throughput golden standard ) , was downloaded from iScMiD ( Integrated Stem Cell Molecular Interactions Database ) . 
+ Another golden standard mESC network was constructed from high-throughput ChIP-X and transcriptome data under TF perturbation ( high-throughput golden standard ) . 
+ 28 TFs with evidence from both high-throughput ChIP-X and transcriptome data under perturbation were collected from the literature ( Tables 2 and 3 ) . 
+ TF binding sites were called with MACS [ 37 ] for ChIP-seq data and Cisgenome [ 38 ] for ChIP-chip data . 
+ Distance cutoff between a TF binding site and a potential target gene was set as 10 kbp . 
+ Differentially expressed genes under TF perturbation were deﬁned as top 5 % up-regulated and top 5 % down-regulated genes , whose expression changes were signiﬁcant with p-value < 0.05 . 
+ Single TF-centered network was constructed for each TF by ChIP-Array [ 39 ] with both high-throughput data . 
+ Direct targets of all TFs were combined as a golden standard mESC network , which contains 40006 links between 13092 nodes ( Fig. 1C ) . 
+ Basically , each target in the network is evidenced by the cell-type speciﬁc binding sites on its promoter and the expression change in the perturbation experiment of the TF , which is generally accepted as a true target . 
+ 2.4. Integration of ChIP-X and transcriptome data
+ ChIP-X identiﬁes in vivo active and cell-speciﬁc TF binding sites of a particular TF . 
+ A gene with an active TF binding site around its promoter is considered to be a potential target of the TF . 
+ Thus , ChIP-X data provides possible direct TF-target connections and may help regularization models to approximate the biologically meaningful solutions for the whole genome . 
+ Since matrix X describes the connections between TFs and targets , the TF-target connections deﬁned by ChIP-X data were converted into an initial matrix X0 ( Fig. 1A and Table 2 ) . 
+ Without ChIP-X data as a prior , the initial X0 was artiﬁcially set as 0 . 
+ When integrating ChIP-X data , if TF i has a binding site within 10 kbp of the promoter of gene j , except those indicated , the Pearson 's correlation coefficient ( PCC ) between the expression profiles of TF i and gene j was calculated and assigned to $X^0_{i,j}$ . 
+ The PCC can be positive or negative , representing that the TF can activate or repress the expression of the target gene . 
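+ A minimal sketch of this construction is given below , assuming that the distance between each TF binding site and the nearby promoter is already known . The data layout ( a dictionary keyed by TF and gene indices ) and all names and values are hypothetical . 

```python
import math


def pearson(x, y):
    """Pearson's correlation coefficient of two equally long profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return 0.0  # constant profile: correlation undefined, use 0
    return cov / (sd_x * sd_y)


def initial_x0(tf_profiles, gene_profiles, site_distances, cutoff_bp=10_000):
    """Build X0 (TFs x genes): PCC where a binding site lies within the
    cutoff, 0 elsewhere. site_distances maps (tf, gene) -> distance in bp."""
    x0 = [[0.0] * len(gene_profiles) for _ in tf_profiles]
    for (i, j), distance in site_distances.items():
        if abs(distance) <= cutoff_bp:
            x0[i][j] = pearson(tf_profiles[i], gene_profiles[j])
    return x0


# Toy data (hypothetical): 2 TFs, 2 genes, 4 samples.
tf_profiles = [[1.0, 2.0, 3.0, 4.0], [4.0, 3.0, 2.0, 1.0]]
gene_profiles = [[2.0, 4.0, 6.0, 8.0], [1.0, 0.0, 1.0, 0.0]]
site_distances = {(0, 0): 500, (1, 0): 8000, (0, 1): 25000}
x0 = initial_x0(tf_profiles, gene_profiles, site_distances)
```

+ Here the entry for TF 0 and gene 1 stays at zero because the binding site lies beyond the 10 kbp cutoff , while the sign of the other entries reflects activation or repression . 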
+ 2.5. Evaluations
+ The area under the curve ( AUC ) of a receiver operating characteristic ( ROC ) curve is widely applied as an important index of the overall classiﬁcation performance of an algorithm . 
+ We applied AUC to evaluate the performance of these three regularization models . 
+ For each pair of TF i and target j , if $X_{i,j}$ is non-zero in the final solution matrix X , this TF is regarded as a potential regulator of the target . 
+ A series of factor numbers ( $\|X_{:,j}\|_0$ ) from 1 to 100 were tested for each target . 
+ We assume that the TFs which are detected at a higher sparsity ( smaller $\|X_{:,j}\|_0$ ) will be more important than those detected at a lower sparsity ( larger $\|X_{:,j}\|_0$ ) . 
+ Thus , in the process of calculating the AUC , a score $S_{i,j}$ was applied as the predictor for TF i on target j : 
+ $$ S_{i,j} = \begin{cases} \max ( 1 / \|X_{:,j}\|_0 ) , & X_{i,j} \neq 0 \\ 0 , & X_{i,j} = 0 \end{cases} $$ 
+ Either the high-throughput or the low-throughput evaluation dataset was used as the golden standard . 
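+ One possible reading of this score is sketched below : for each tested sparsity level , the set of TFs with a non-zero coefficient for a target is recorded , and a TF receives the largest reciprocal of the support size among the levels at which it was selected . Taking the maximum over the tested levels is an interpretation of the formula , and the function name and toy supports are hypothetical . 

```python
def sparsity_scores(supports, n_tfs):
    """Score each TF for one target.

    supports: one set of selected TF indices per tested sparsity level.
    A TF never selected keeps the score 0.
    """
    scores = [0.0] * n_tfs
    for support in supports:
        if not support:
            continue  # nothing selected at this level
        level_score = 1.0 / len(support)  # 1 / ||X[:, j]||_0
        for tf in support:
            scores[tf] = max(scores[tf], level_score)
    return scores


# Toy supports for sparsity levels 1, 2 and 3 (hypothetical values).
supports = [{2}, {0, 2}, {0, 2, 3}]
scores = sparsity_scores(supports, n_tfs=4)
```

+ TF 2 is already selected when only one factor is allowed and so gets the highest score , whereas TF 1 is never selected and keeps a score of 0 . 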
+ Furthermore , bootstrapping with 1000 resamplings was used to test the stability of the AUCs . 
+ After the 1000 resamplings , 1000 AUCs were obtained for each model . 
+ We then used the Wilcoxon test to compare the performance of the three models . 
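+ The bootstrap step can be sketched as follows , assuming that the TF-target pairs are resampled with replacement and that the AUC is computed by pairwise comparison of positives and negatives . The resampling unit , the seed and the toy data are assumptions of this sketch , and the subsequent Wilcoxon test is not shown . 

```python
import random


def roc_auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly;
    ties count as one half. labels are 1 (true pair) or 0."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def bootstrap_aucs(scores, labels, n_boot=1000, seed=0):
    """Resample pairs with replacement and return n_boot AUC values."""
    rng = random.Random(seed)
    n = len(scores)
    aucs = []
    while len(aucs) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        sample_labels = [labels[i] for i in idx]
        if 0 < sum(sample_labels) < n:  # skip one-class resamples
            aucs.append(roc_auc([scores[i] for i in idx], sample_labels))
    return aucs


# Toy predictions (hypothetical values).
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
labels = [1, 1, 0, 1, 0, 1, 0, 0]
full_auc = roc_auc(scores, labels)
aucs = bootstrap_aucs(scores, labels)
```

+ The spread of the 1000 values in `aucs` then indicates how stable the AUC of a model is on a given golden standard . 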
+ 3. Results and discussions
+ Since current methods are mostly designed for transcriptome data , we first inferred the gene regulatory network using current methods from transcriptome data alone . 
+ To retain the cell-type speciﬁcity of the inferred gene regulatory networks , we incorporated data from only mESC . 
+ Two TF-target datasets , from high-throughput and low-throughput studies respectively , are used as golden standards to evaluate the accuracy . 
+ As expected , their AUCs are all close to random on both evaluation datasets because the number of samples is much smaller than the number of regulators in our mESC data set ( Fig. 2 ) . 
+ Thus we incorporated ChIP-X data to improve the accuracy . 
+ We applied three Lp ( p = 1 , 1/2 , 0 ) regularization models to the integration of ChIP-X and transcriptome data , and compared their performance . 
+ The L1 regularization model used the same algorithm as ISTA , but with a different initial X0 derived from ChIP-X data . 
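+ To illustrate why the initial point matters , the sketch below implements a generic iterative thresholding loop for a single target column , with a soft thresholding operator ( as in ISTA for L1 ) and a keep-the-k-largest hard thresholding operator ( one common choice for L0 ) . It is not the authors ' implementation : the step size , the threshold values and the toy system are assumptions , and the half thresholding operator of the L1/2 model is omitted . 

```python
def soft_threshold(v, lam):
    """Shrink every entry towards zero by lam (L1 proximal operator)."""
    return [(abs(x) - lam) * (1.0 if x > 0 else -1.0) if abs(x) > lam else 0.0
            for x in v]


def hard_threshold(v, k):
    """Keep the k entries of largest magnitude, zero the rest."""
    order = sorted(range(len(v)), key=lambda i: abs(v[i]), reverse=True)
    keep = set(order[:k])
    return [x if i in keep else 0.0 for i, x in enumerate(v)]


def gradient_step(A, b, x, step):
    """Return x + step * A^T (b - A x) for a samples-by-TFs matrix A."""
    residual = [b_r - sum(a * xi for a, xi in zip(row, x))
                for row, b_r in zip(A, b)]
    grad = [sum(A[r][c] * residual[r] for r in range(len(A)))
            for c in range(len(x))]
    return [xi + step * g for xi, g in zip(x, grad)]


def iterative_thresholding(A, b, x0, threshold, step, n_iter=200):
    """Alternate a gradient step and a thresholding step, starting at x0."""
    x = list(x0)
    for _ in range(n_iter):
        x = threshold(gradient_step(A, b, x, step))
    return x


# Toy system (hypothetical): one sample, two TFs with identical profiles,
# so A x = b has infinitely many solutions.
A = [[1.0, 1.0]]
b = [2.0]
keep_one = lambda v: hard_threshold(v, 1)
from_zero = iterative_thresholding(A, b, [0.0, 0.0], keep_one, step=0.5)
from_prior = iterative_thresholding(A, b, [0.0, 1.0], keep_one, step=0.5)
x_l1 = iterative_thresholding(A, b, [0.0, 0.0],
                              lambda v: soft_threshold(v, 0.5), step=0.5)
```

+ Both hard thresholding runs fit the data equally well , but they select different TFs depending on the starting point , which is the situation that a ChIP-X derived X0 is meant to resolve . 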
+ Without the integration of ChIP-X data , the performance of the three models is similar to other methods and very poor when evaluated with either high-throughput ( HGS ) or low-throughput ( LGS ) golden standard ( Fig. 3A and B ) . 
+ When ChIP-X data are integrated for network inference , the performance of all three regularization models dramatically improved , and the L0 and L1/2 regularization models signiﬁcantly outperformed the L1 regularization model ( Fig. 3A and B , Table 4 ) . 
+ The stabilities of the AUCs for all models are high when evaluated on the high-throughput golden standard ( Fig. 3C ) , while , due to the small number of known TF-target pairs , AUCs of different models calculated on the low-throughput golden standard are less stable ( Fig. 3D ) . 
+ But the L0 and L1/2 regularization models for integrative data still showed signiﬁcantly better performance when evaluated on this data set ( Table 4 and Fig. 3B ) . 
+ ROCs describe the information of false positive rate ( FPR ) and true positive rate ( TPR ) of all models . 
+ In biological studies , 0.05 is commonly used as the cutoff of FPR . 
+ At the FPR of 0.05 , when evaluated with HGS , integration of ChIP-X data achieved TPRs of 0.637 , 0.594 and 0.079 for the L0 , L1/2 and L1 regularization models respectively , and calculation with transcriptome data alone had TPRs of 0.031 , 0.034 and 0.044 for the L0 , L1/2 and L1 regularization models respectively ( Fig. 3A ) . 
+ The L0 and L1/2 regularization models achieved much higher sensitivity when integrating ChIP-X data , which better meets the needs of biological research . 
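+ Reading the TPR at a fixed FPR off a ROC curve can be sketched as follows . The function walks down the ranked predictions and reports the highest TPR reached while the FPR stays at or below the cutoff ; the function name , the tie handling and the toy data are assumptions of this sketch . 

```python
def tpr_at_fpr(scores, labels, max_fpr=0.05):
    """Highest true positive rate with false positive rate <= max_fpr.

    labels are 1 (true pair) or 0; equal scores are added as one block.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    tp = fp = 0
    best = 0.0
    i = 0
    while i < len(ranked):
        j = i
        while j < len(ranked) and ranked[j][0] == ranked[i][0]:
            tp += ranked[j][1]
            fp += 1 - ranked[j][1]
            j += 1
        if fp / n_neg > max_fpr:
            break  # lowering the threshold further exceeds the FPR cutoff
        best = tp / n_pos
        i = j
    return best


# Toy data (hypothetical): 4 true pairs and 20 false pairs.
scores = [0.9, 0.8, 0.3, 0.2] + [0.5, 0.25] + [0.1] * 18
labels = [1, 1, 1, 1] + [0] * 20
tpr = tpr_at_fpr(scores, labels)
```

+ With 20 negatives , a cutoff of 0.05 tolerates exactly one false positive , so three of the four true pairs are recovered before the second false positive is reached . 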
+ Fig. 7 shows example networks known to be active in mESC . 
+ A strict and identical cutoff ( score $S_{i,j} \geq 0.1$ ) is used for all three models . 
+ The L0 and L1/2 regularization models reported much more true targets than the L1 regularization model . 
+ The advantages of the L0 and L1/2 regularization models are also demonstrated by the smaller error between $AX_{:,j}$ and $B_{:,j}$ when compared with the L1 regularization model ( Fig. 4 ) . 
+ The errors are calculated along different sparsity levels ( Fig. 4A ) or iterations ( Fig. 4B ) . 
+ When more TFs are selected , a smaller error is achieved . 
+ To obtain solutions with the same sparsity level , the L1 regularization model showed a larger error than the L0 and L1/2 regularization models , which means that its solution is less sparse if we fix the error allowance . 
+ This observation is consistent with several previous numerical experiments [ 22 -- 24 ] . 
+ The error reduction over the iterations of the L0 and L1/2 regularization models was also faster than that of the L1 regularization model after 100 iterations ( Fig. 4B ) . 
+ The heatmap in Fig. 5B illustrates an example gene regulatory network inferred by the L0 regularization model in mESC , in which only a small portion of TFs regulate most of the targets . 
+ The inferred TFs with more targets signiﬁcantly overlapped with the TFs that were intensively reported to be associated with ESCs ( Fig. 5A , p-value is 9.95E-10 in hypergeometric test ) . 
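+ The hypergeometric test used for this overlap can be sketched with the standard library alone . The p-value is the probability of seeing at least the observed overlap when drawing at random ; the counts in the example are hypothetical and do not reproduce the numbers behind Fig. 5A . 

```python
from math import comb


def hypergeom_upper_tail(population, successes, draws, observed):
    """P(overlap >= observed) when `draws` items are taken without
    replacement from `population` items, `successes` of which are marked."""
    total = comb(population, draws)
    upper = min(successes, draws)
    tail = sum(comb(successes, k) * comb(population - successes, draws - k)
               for k in range(observed, upper + 1))
    return tail / total


# Toy counts (hypothetical): 10 TFs, 4 known ESC-associated TFs,
# 3 inferred hub TFs, 2 of which are ESC-associated.
p_value = hypergeom_upper_tail(population=10, successes=4, draws=3, observed=2)
```

+ In the setting of the paper , the population would be the candidate regulators , the marked items the ESC-associated TFs , and the draws the inferred TFs with many targets . 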
+ The iterative thresholding algorithms we applied here only require low storage and computation cost [ 11,23,32 ] . 
+ The computational complexity of iterative hard and soft thresholding algorithms for the L0 and L1 regularization models , respectively , has been reported to be O ( k r log m ) , where k is the number of iterations , r is the number of TFs and m is the number of targets [ 11,32,40 ] . 
+ The L1/2 regularization model costs a similar computation time , since its computational complexity is similar to that of the L0 and L1 regularization models ( Fig. 6 ) . 
+ Here , these three regularization models inferred the gene regulatory network with 939 TFs , 19978 targets and 245 samples of the mouse genome within one hour with one Intel Core i7 in a personal laptop ( 2.00 GHz , 8.00 GB of RAM ) , slower than YALL1 , but much faster than ARACNE and NARROMI ( Fig. 6 ) . 
+ The ChIP-Array web server we developed previously can also integrate ChIP-X and transcriptome data to construct gene regulatory network for a single TF [ 39 ] . 
+ Even though it provides a more confident network , it can be used only if both ChIP-X and transcriptome data of perturbation are available for the same TF . 
+ However , only a limited number of TFs have both omics data . 
+ Moreover , ChIP-Array constructs a network for only one TF , although in most cases target genes are regulated by multiple TFs . 
+ Unlike ChIP-Array , the transcriptome data used by the Lp ( p = 1 , 1/2 , 0 ) regularization models need not be paired transcriptome data obtained from the perturbation of the same TF as in the ChIP-X experiment . 
+ Moreover , the Lp ( p = 1 , 1/2 , 0 ) regularization models consider multiple TFs at the same time to infer a more comprehensive gene regulatory network at a genome-wide scale . 
+ To assess how the three models rely on the ChIP-X data , we tested their performance with different initial X0 s . 
+ A series of initial X0 s was made up of ChIP-X defined TF-target relationships with different distance cutoffs between a TF binding site and a potential target gene , from 200 bp to 50 kbp , which are commonly used in biological studies ( Table 5 ) . 
+ After the TF binding sites are detected from ChIP-X data , a gene that is located close to a TF binding site is usually considered as a potential target of the TF . 
+ However , proximity may not always indicate a true target , because the TF may bind on a distal enhancer , it may have other unknown function rather than transcription regulation , or sometimes multiple genes are close to a single TF binding site . 
+ Thus a large portion of potential targets deﬁned by only ChIP-X data are false positives . 
+ Table 5 shows that only 12.468 -- 16.277 % of potential targets defined by ChIP-X data alone are true targets that are verified by the TF knockdown/overexpression experiments ( HGS ) . 
+ When a shorter distance cutoff is used , fewer genes will be defined as potential targets in the initial X0 , giving fewer false targets , but some true targets at a longer distance may be lost . 
+ When a longer distance cutoff is chosen , more true targets will be included , but the number of false targets will also increase . 
+ With different initial X0 s , the L0 and L1/2 regularization models consistently outperformed the L1 regularization model ( Table 5 ) . 
+ Even though the high-throughput golden standard shares the ChIP-X data with the initial X0 s , the large proportions of false targets in the initial X0 s show that ChIP-X data alone could not infer the true targets accurately . 
+ PCC values between TF and target expression profiles in the initial X0 s were used to indicate the possible regulatory effects of the TFs on the targets ( activated or repressed ) . However , using the PCC value in the initial X0 s as the predictor to classify true targets among those potential targets defined by ChIP-X data resulted in very low AUCs ( 0.530 -- 0.538 , Table 5 ) . 
+ Besides , least norm minimization can provide the solution having the smallest L2 norm in all the possible solutions , which is commonly used as the initial point for the algorithms for solving the regularization problems of sparse optimization [ 22 ] . 
+ We created initial X0 s via least norm minimization from transcriptome data , then performed the applied iterative thresholding algorithms for the Lp ( p = 0 , 1/2 , 1 ) regularization models and evaluated the results with the same methods . 
+ Three regularization models , L0 , L1/2 and L1 , obtained AUCs of 0.529 , 0.531 and 0.498 with high-throughput golden standard , and AUCs of 0.681 , 0.672 and 0.629 with low-throughput golden standard , respectively . 
+ Consistent with [ 22 ] , the L0 and L1/2 regularization models are slightly better than the L1 regularization model . 
+ However , the results starting from L2 norm solution are still much worse than those starting from the ChIP-X data . 
+ Thus , either ChIP-X or transcriptome data alone can not achieve satisfactory accuracy , and the L0 and L1/2 regularization models did improve the performance of gene regulatory network inference from integrating ChIP-X and transcriptome data . 
+ Recursive optimization of these three models iteratively moves the initial $X^0_{:,j}$ to the solution with the minimum error between $AX_{:,j}$ and $B_{:,j}$ . 
+ When the sample size is smaller than the number of factors , the solution is not unique . 
+ Without prior knowledge , $X_{:,j}$ reaches one of the solutions that are close to the artificially assigned initial $X^0_{:,j}$ , such as 0 . 
+ Thus , even though the models achieve a small error , which is mathematically sound , the non-uniqueness of the solution makes it biologically contradictory , i.e. it may not be the true biological solution we want . 
+ Integrating ChIP-X data provides partial knowledge of the gene regulatory network . 
+ The solutions that are close to the initial $X^0_{:,j}$ defined by ChIP-X are biologically more meaningful . 
+ Thus the performances of these three models improve after ChIP-X data are incorporated . 
+ Since the L1 regularization model is convex , its local minimizer is also the global minimizer . 
+ Thus different initial X0 s have less influence on the solution of the L1 regularization model ( Table 5 ) . 
+ However , the L0 and L1/2 regularization models are non-convex , and the corresponding algorithms only converge to some local minimizers [ 41 ] . 
+ Here , we have shown that these local minimizers of the L0 and L1/2 regularization models obtained from integrative data are much closer to the biological solutions we expect than those of the L1 regularization model ( Table 5 ) . 
+ 4. Conclusion
+ In this study , we apply the L0 and L1/2 regularization models to gene regulatory network inference from integrative omics data in large genomes with a small number of samples . 
+ Integrating ChIP-X data with transcriptome proﬁles signiﬁcantly improves the performance of network inference . 
+ Compared with the commonly used L1 regularization model , the L0 and L1/2 regularization models have much higher accuracy for integrative omics data . 
+ We evaluated the inferred networks with both high-throughput and low-throughput golden standards . 
+ The L0 and L1/2 regularization models consistently outperformed the L1 regularization model for integrative omics data . 
+ Besides , the algorithms we applied here are computationally efﬁcient and can be executed by a personal computer within one hour . 
+ In summary , we have demonstrated that the L0 and L1/2 regularization models are applicable to gene regulatory network inference in biological research that studies higher organisms but generates only a small amount of omics data , and that they help biologists to analyze gene regulation at the whole system level . 
+ Funding
+ This work was supported by funding from the Research Grants Council , Hong Kong SAR , China ( Grant number 781511M ) , National Natural Science Foundation of China , China ( Grant numbers 91229105 and 11101186 ) . 
+ References
+ [ 1 ] D. Marbach , J.C. Costello , R. Kuffner , N.M. Vega , R.J. Prill , D.M. Camacho , K.R. Allison , M. Kellis , J.J. Collins , G. Stolovitzky , Nat. Methods 9 ( 2012 ) 796 -- 804 . 
+ [ 2 ] A.A. Margolin , I. Nemenman , K. Basso , C. Wiggins , G. Stolovitzky , R. Dalla Favera , A. Califano , BMC Bioinf. 7 ( Suppl 1 ) ( 2006 ) S7 . 
+ [ 3 ] A.J. Butte , I.S. Kohane , Pac. Symp. Biocomput. ( 2000 ) 418 -- 429 . 
+ [ 4 ] A. de la Fuente , N. Bing , I. Hoeschele , P. Mendes , Bioinformatics 20 ( 2004 ) 3565 -- 3574 . 
+ [ 5 ] X. Zhang , X.M. Zhao , K. He , L. Lu , Y. Cao , J. Liu , J.K. Hao , Z.P. Liu , L. Chen , Bioinformatics 28 ( 2012 ) 98 -- 104 . 
+ [ 6 ] H.K. Yalamanchili , B. Yan , M.J. Li , J. Qin , Z. Zhao , F.Y. Chin , J. Wang , Bioinformatics 30 ( 2014 ) 377 -- 383 . 
+ [ 7 ] K. Basso , A.A. Margolin , G. Stolovitzky , U. Klein , R. Dalla-Favera , A. Califano , Nat. Genet. 37 ( 2005 ) 382 -- 390 . 
+ [ 8 ] A.A. Margolin , K. Wang , W.K. Lim , M. Kustagi , I. Nemenman , A. Califano , Nat. Protoc. 1 ( 2006 ) 662 -- 671 . 
+ [ 9 ] E.P. van Someren , B.L. Vaes , W.T. Steegenga , A.M. Sijbers , K.J. Dechering , M.J. Reinders , Bioinformatics 22 ( 2006 ) 477 -- 484 . 
+ [ 10 ] X. Zhang , K. Liu , Z.P. Liu , B. Duval , J.M. Richer , X.M. Zhao , J.K. Hao , L. Chen , Bioinformatics 29 ( 2013 ) 106 -- 113 . 
+ [ 11 ] I. Daubechies , M. Defrise , C. De Mol , Commun. Pur. Appl. Math. 57 ( 2004 ) 1413 -- 1457 . 
+ [ 12 ] M.A.T. Figueiredo , R.D. Nowak , S.J. Wright , IEEE J.-Stsp 1 ( 2007 ) 586 -- 597 . 
+ [ 13 ] J.F. Yang , Y. Zhang , Siam J. Sci. Comput. 33 ( 2011 ) 250 -- 278 . 
+ [ 14 ] A.C. Haury , F. Mordelet , P. Vera-Licona , J.P. Vert , BMC Syst. Biol. 6 ( 2012 ) 145 . 
+ [ 15 ] B. Efron , T. Hastie , I. Johnstone , R. Tibshirani , Ann. Stat. 32 ( 2004 ) 407 -- 451 . 
+ [ 16 ] R. Tibshirani , J. R. Stat. Soc. B : Met. 58 ( 1996 ) 267 -- 288 . 
+ [ 17 ] R. Tibshirani , J. R. Stat. Soc. B 73 ( 2011 ) 273 -- 282 . 
+ [ 18 ] M.K.S. Yeung , J. Tegner , J.J. Collins , Proc. Natl. Acad. Sci. USA 99 ( 2002 ) 6163 -- 6168 . 
+ [ 19 ] Y. Wang , T. Joshi , X.S. Zhang , D. Xu , L.N. Chen , Bioinformatics 22 ( 2006 ) 2413 -- 2420 . 
+ [ 20 ] R. Bonneau , D.J. Reiss , P. Shannon , M. Facciotti , L. Hood , N.S. Baliga , V. Thorsson , Genome Biol. 7 ( 2006 ) . 
+ [ 21 ] V. Belcastro , V. Siciliano , F. Gregoretti , P. Mithbaokar , G. Dharmalingam , S. Berlingieri , F. Iorio , G. Oliva , R. Polishchuck , N. Brunetti-Pierri , D. di Bernardo , Nucleic Acids Res. 39 ( 2011 ) 8677 -- 8688 . 
+ [ 22 ] R. Chartrand , V. Staneva , Inverse Prob. 24 ( 2008 ) . 
+ [ 23 ] Z.B. Xu , X.Y. Chang , F.M. Xu , H. Zhang , IEEE Trans. Neural Net. Lear 23 ( 2012 ) 1013 -- 1027 . 
+ [ 24 ] T. Zhang , J. Mach. Learn Res. 11 ( 2010 ) 1081 -- 1107 . 
+ [ 25 ] D. Marbach , S. Roy , F. Ay , P.E. Meyer , R. Candeias , T. Kahveci , C.A. Bristow , M. Kellis , Genome Res. 22 ( 2012 ) 1334 -- 1349 . 
+ [ 26 ] C.G. de Boer , T.R. Hughes , Nucleic Acids Res. 40 ( 2012 ) D169 -- D179 . 
+ [ 27 ] X. Chen , H. Xu , P. Yuan , F. Fang , M. Huss , V.B. Vega , E. Wong , Y.L. Orlov , W. Zhang , J. Jiang , Y.H. Loh , H.C. Yeo , Z.X. Yeo , V. Narang , K.R. Govindarajan , B. Leong , A. Shahab , Y. Ruan , G. Bourque , W.K. Sung , N.D. Clarke , C.L. Wei , H.H. Ng , Cell 133 ( 2008 ) 1106 -- 1117 . 
+ [ 28 ] A. Marson , S.S. Levine , M.F. Cole , G.M. Frampton , T. Brambrink , S. Johnstone , M.G. Guenther , W.K. Johnston , M. Wernig , J. Newman , J.M. Calabrese , L.M. Dennis , T.L. Volkert , S. Gupta , J. Love , N. Hannett , P.A. Sharp , D.P. Bartel , R. Jaenisch , R.A. Young , Cell 134 ( 2008 ) 521 -- 533 . 
+ [ 29 ] L. Zhang , X. Ju , Y. Cheng , X. Guo , T. Wen , BMC Syst. Biol. 5 ( 2011 ) 152 . 
+ [ 30 ] N. Novershtern , A. Regev , N. Friedman , Bioinformatics 27 ( 2011 ) i177 -- i185 . 
+ [ 31 ] R. Chartrand , IEEE Signal Process Lett. 14 ( 2007 ) 707 -- 710 . 
+ [ 32 ] T. Blumensath , M.E. Davies , J. Fourier Anal. Appl. 14 ( 2008 ) 629 -- 654 . 
+ [ 34 ] A. Nishiyama , L. Xin , A.A. Sharov , M. Thomas , G. Mowrer , E. Meyers , Y. Piao , S. Mehta , S. Yee , Y. Nakatake , C. Stagg , L. Sharova , L.S. Correa-Cerro , U. Bassey , H. Hoang , E. Kim , R. Tapnio , Y. Qian , D. Dudekula , M. Zalzman , M. Li , G. Falco , H.T. Yang , S.L. Lee , M. Monti , I. Stanghellini , M.N. Islam , R. Nagaraja , I. Goldberg , W. Wang , D.L. Longo , D. Schlessinger , M.S. Ko , Cell Stem Cell 5 ( 2009 ) 420 -- 433 . 
+ [ 35 ] L.S. Correa-Cerro , Y. Piao , A.A. Sharov , A. Nishiyama , J.S. Cadet , H. Yu , L.V. Sharova , L. Xin , H.G. Hoang , M. Thomas , Y. Qian , D.B. Dudekula , E. Meyers , B.Y. Binder , G. Mowrer , U. Bassey , D.L. Longo , D. Schlessinger , M.S. Ko , Sci. Rep. 1 ( 2011 ) 167 . 
+ [ 36 ] A. Nishiyama , A.A. Sharov , Y. Piao , M. Amano , T. Amano , H.G. Hoang , B.Y. Binder , R. Tapnio , U. Bassey , J.N. Malinou , L.S. Correa-Cerro , H. Yu , L. Xin , E. Meyers , M. Zalzman , Y. Nakatake , C. Stagg , L. Sharova , Y. Qian , D. Dudekula , S. Sheer , J.S. Cadet , T. Hirata , H.T. Yang , I. Goldberg , M.K. Evans , D.L. Longo , D. Schlessinger , M.S. Ko , Sci. Rep. 3 ( 2013 ) 1390 . 
+ [ 37 ] J. Feng , T. Liu , B. Qin , Y. Zhang , X.S. Liu , Nat. Protoc. 7 ( 2012 ) 1728 -- 1740 . 
+ [ 38 ] H. Ji , H. Jiang , W. Ma , W.H. Wong , Current protocols in bioinformatics/editorial board , Andreas D. Baxevanis , et al. , Chapter 2 ( 2011 ) Unit2 13 . 
+ [ 39 ] J. Qin , M.J. Li , P. Wang , M.Q. Zhang , J. Wang , Nucleic Acids Res. 39 ( 2011 ) W430 -- W436 . 
+ [ 40 ] T. Blumensath , M.E. Davies , Appl. Comput. Harmon A 27 ( 2009 ) 265 -- 274 . 
+ [ 41 ] Y.H. Hu , C. Li , X.Q. Yang , J. Mach. Learn. Res. ( submitted for publication ) , http://www.acad.polyu.edu.hk/~mayangxq/GPA-SO.pdf
\ No newline at end of file
--- a/data/TEXT_FILES/notuseful_txt/24743342.txt 0 → 100644
View file @27818a9
+++ b/data/TEXT_FILES/notuseful_txt/24743342.txt 0 → 100644
View file @27818a9
+ Genome-Wide Profiling of Yeast DNA:RNA Hybrid Prone
+ Abstract 
+ Funding : PH is a senior fellow of the Canadian Institute For Advanced Research ( CIFAR ) and acknowledges support from the National Institutes of Health ( operating grant R01 : 4R01CA158162 ) and Canadian Institutes of Health Research ( CIHR operating grant MOP-38096 ) . 
+ MSK is a Senior Fellow of CIFAR and is funded by CIHR operating grant MOP-119383 and MOP-119372 . 
+ YAC acknowledges scholarship support from the Natural Sciences and Engineering Research Council of Canada , as well as the Roman M. Babicki Fellowship in Medical Research . 
+ MJA and PYTL were supported by Frederick Banting and Charles Best Canada graduate scholarships from CIHR . 
+ PCS was a fellow of the Terry Fox Foundation ( # 700044 ) , the Michael Smith Foundation for Health Research and is currently supported by the Cancer Research Society . 
+ The funders had no role in study design , data collection and analysis , decision to publish , or preparation of the manuscript . 
+ Introduction
+ Elevated DNA : RNA hybrid formation due to defects in RNA processing pathways leads to genome instability and replication stress across species [ 1 -- 7 ] . 
+ R loops threaten genome stability and often form under abnormal conditions where nascent mRNA is improperly processed or RNA half-life is increased , resulting in RNA that can hybridize with template DNA , displacing the nontranscribed DNA strand [ 8 ] . 
+ A recent study also found that hybrid formation can occur in trans via Rad51-mediated DNA-RNA strand exchange [ 9 ] . 
+ Persistent R loops pose a major threat to genome stability through two mechanisms . 
+ First , the exposed nontranscribed strand is susceptible to endogenous DNA damage due to the increased exposure of chemically reactive groups . 
+ The second , more widespread mechanism , identified in Escherichia coli , Saccharomyces cerevisiae , Caenorhabditis elegans and human cells , involves the R loops and associated stalled transcription complexes , which block DNA replication fork progression [ 3,4,8,10,11 ] . 
+ R loop-mediated instability is an area of great interest primarily because genome instability is considered an enabling characteristic of tumor formation [ 12 ] . 
+ Moreover , mutations in RNA splicing / processing factors are frequently found in human cancer , heritable diseases like Aicardi-Goutieres syndrome , and a degenerative ataxia associated with Senataxin mutations [ 13 -- 17 ] . 
+ To avoid the deleterious effects of R loops , cells express enzymes for the removal of abnormally formed DNA : RNA hybrids . 
+ In S. cerevisiae , RNH1 and RNH201 , each encoding an RNase H , are responsible for one of the best characterized mechanisms for reducing R loop formation , namely enzymatic degradation of the RNA in DNA : RNA hybrids [ 8 ] . 
+ Another extensively studied anti-hybrid factor is the THO/TREX complex which functions to suppress hybrid formation at the level of transcription termination and mRNA packaging [ 4,11,18,19 ] . 
+ In addition , the Senataxin helicase , yeast Sen1 , plays an important role in facilitating replication fork progress through transcribed regions and unwinding RNA in hybrids to mitigate R loop formation and RNA polymerase II transcription-associated genome instability [ 5,20 ] . 
+ Several additional anti-hybrid mechanisms have also been identified including topoisomerases and other RNA processing factors [ 2,6,7,9,21 -- 23 ] . 
+ To add to the complexity of DNA : RNA hybrid management in the cell , hybrids also occur naturally and have important biological functions [ 24 ] . 
+ In human cells , R loop formation facilitates immunoglobulin class switching , protects against DNA methylation at CpG island promoters and plays a key role in pause site-dependent transcription termination [ 25 -- 28 ] . 
+ Transcription of telomeres by RNA polymerase II also produces telomeric repeat-containing RNAs ( TERRA ) , which associate with telomeres and inhibit telomere elongation in a DNA : RNA hybrid-dependent fashion [ 29 -- 31 ] . 
+ Noncoding ( nc ) RNAs , such as antisense transcripts , perform a regulatory role in the expression of sense transcripts that may involve R loops [ 32 ] . 
+ The proposed mechanisms of antisense transcription regulation are not clearly understood and involve different modes of action specific to each locus . 
+ Current models include chromatin modification resulting from antisense-associated transcription , antisense transcription modulation of transcription regulators , collision of sense and antisense transcription machineries and antisense transcripts expressed in trans interacting with the promoter for sense transcription [ 32 -- 40 ] . 
+ More recently , studies in Arabidopsis thaliana found an antisense transcript that forms R loops , which can be differentially stabilized to modulate gene regulation [ 41 ] . 
+ Similarly , in mouse cells the stabilization of an R loop was shown to inhibit antisense transcription [ 42 ] . 
+ Here we describe , for the first time , a genome-wide profile of DNA : RNA hybrid prone loci in S. cerevisiae by DNA : RNA immunoprecipitation followed by hybridization on tiling microarrays ( DRIP-chip ) . 
+ We found that DNA : RNA hybrids occurred at highly transcribed regions in wild type cells , including some identified in previous studies . 
+ Remarkably , we observed that DNA : RNA hybrids were significantly associated with genes that have corresponding antisense transcripts , suggesting a role for hybrid formation at these loci in gene regulation . 
+ Consistently , we found that genes whose expression was altered by overexpression of RNase H were also significantly associated with antisense transcripts . 
+ A small-scale cytological screen found that diverse RNA processing mutants had increased hybrid formation and additional DRIP-chip studies revealed specific hybrid-site biases in the RNase H , Sen1 and THO complex subunit Hpr1 mutants . 
+ These genome-wide analyses enhance our understanding of DNA : RNA hybrid-forming regions in vivo , highlight the role of cellular RNA processing activities in suppressing hybrid formation , and implicate DNA : RNA hybrids in control of a subset of antisense regulated loci . 
+ Results
+ The genomic distribution of DNA:RNA hybrids
+ DNA : RNA hybrids have been previously immunoprecipitated at specific genomic sites such as rDNA , selected endogenous loci , and reporter constructs [ 2,5 ] . 
+ Subsequently , DRIP coupled with deep sequencing in human cells has demonstrated the prevalence of R loops at CpG island promoters with high GC skew [ 26 ] . 
+ To investigate the global profile of DNA : RNA hybrid prone loci in a tractable model , we performed genome-wide DRIP-chip analysis of wild type S. cerevisiae ( ArrayExpress E-MTAB-2388 ) using the S9.6 monoclonal antibody which specifically binds DNA : RNA hybrids , as characterized previously [ 43,44 ] . 
+ DRIP-chip profiles were generated in duplicate ( Spearman 's r = 0.78 when comparing each of over 2 million probes after normalization and data smoothing , Supplementary Figure S1 ) and normalized to a no antibody control . 
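+ The replicate comparison can be illustrated with a small Spearman correlation routine , which ranks both probe profiles ( averaging the ranks of ties ) and then computes the Pearson correlation of the ranks . The function names and the five toy probe values are hypothetical ; the real profiles contain over 2 million probes . 

```python
import math


def average_ranks(values):
    """Ranks starting at 1; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    return cov / math.sqrt(var_x * var_y)


# Toy probe signals for two replicates (hypothetical values).
replicate_1 = [0.2, 1.5, 0.9, 2.4, 1.1]
replicate_2 = [0.1, 1.9, 1.0, 2.0, 0.8]
rho = spearman(replicate_1, replicate_2)
```

+ A rank-based measure is a natural choice here because it depends only on the ordering of the probe signals and not on their scale . 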
+ Overall , our DRIP-chip profiles identified several previously reported DNA : RNA hybrid prone sites including the rDNA locus and telomeric repeat regions ( Figure 1 , Supplementary Tables S1 , S2 ) [ 2,29 -- 31 ] . 
+ DNA : RNA hybrids were also observed at 1217 open reading frames ( ORFs ) ( containing greater than 50 % of probes above the threshold of 1.5 and found in both wild type replicates ) ( Supplementary Table S3 ) . 
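+ The ORF-level criterion stated above ( more than 50 % of probes above a threshold of 1.5 , in both replicates ) can be sketched as follows . The function name , the ORF labels and the probe values are hypothetical . 

```python
def is_hybrid_prone(replicate_probe_values, threshold=1.5, min_fraction=0.5):
    """True if, in every replicate, more than `min_fraction` of the ORF's
    probes exceed `threshold`."""
    for probes in replicate_probe_values:
        above = sum(1 for value in probes if value > threshold)
        if above <= min_fraction * len(probes):
            return False  # also rejects ORFs without probes
    return True


# Toy data (hypothetical): probe enrichment per ORF in two replicates.
orfs = {
    "ORF_A": ([2.0, 1.8, 1.2, 1.9], [1.7, 1.6, 2.2, 1.0]),
    "ORF_B": ([2.0, 1.8, 1.2, 1.9], [1.7, 1.0, 1.2, 1.0]),
}
hybrid_prone = sorted(name for name, reps in orfs.items()
                      if is_hybrid_prone(reps))
```

+ Requiring the criterion in both replicates removes ORFs such as ORF_B , which pass in one replicate only . 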
+ These were generally shorter in length than average ( p = 4.29e-58 ) , highly transcribed ( Wilcoxon rank sum test p = 2.21e-6 ) , and had higher GC content ( p = 2.52e-50 ) ( Figure 2A , 2B and 2C , Supplementary Figure S2 ) . 
+ Importantly , despite the correlation between DNA : RNA hybrid association and transcriptional frequency , the wild type DRIP-chip profiles compared to the localization profile of the RNA polymerase II subunit Rpb3 revealed very low correlation ( r = 0.0097 ; [ 45 ] ) . 
+ This suggests that the DRIP-chip method was not unduly biased towards the short DNA : RNA hybrids that could theoretically have been captured within active transcription bubbles . 
+ Importantly , because genes with high GC content also have high transcriptional frequencies ( Supplementary Figure S3 ) , it is not clear from our findings whether GC content or transcriptional frequency contributed more to DNA : RNA hybrid forming potential . 
+ Furthermore , we observe that DNA : RNA hybrid prone loci do not encode for mRNA transcripts with particularly long half-lives ( Supplementary Figure S2D ) , suggesting that the act of transcription is vital to DNA : RNA hybrid formation and supporting the notion of co-transcriptional hybrid formation as the major source of endogenous DNA : RNA hybrids . 
+ Our data also revealed DNA : RNA hybrids highly associated with Ty1 and Ty2 subclasses of retrotransposons ( Figure 2E , Supplementary Table S4 ) . 
+ Consistent with our findings at ORFs , the levels of DNA : RNA hybrids correspond well with the known levels of expression of these elements . 
+ In general , Ty1 which constitutes one of the most abundant transcripts in the cell has the highest levels of DNA : RNA hybrids . 
+ Ty3 and Ty4 that are only slightly expressed have much lower levels of hybrids , and the lone Ty5 retrotransposon which is transcriptionally silent is not enriched for DNA : RNA hybrids ( Figure 2E ) ( [ 46 -- 48 ] ) . 
+ In contrast to the trends observed with ORFs , GC content in retrotransposons is not highly correlated with the levels of expression , suggesting that expression is the main contributor to DNA : RNA hybrid formation . 
+ Specifically , Ty3 retrotransposons have the highest GC content but have only modest levels of expression and DNA : RNA hybrids . 
+ DNA : RNA hybrids are significantly correlated with genes associated with antisense transcripts 
+ Certain DNA : RNA hybrid enriched regions identified by our DRIP-chip analysis such as rDNA and retrotransposons are associated with antisense transcripts [ 49,50 ] . 
+ Therefore , we checked if this was a common feature of DNA : RNA prone sites by comparing our list of DNA : RNA prone loci to a list of antisense-associated genes ( [ 51 ] ) . 
+ Because the expression of antisense-associated transcripts may be highly dependent on environmental conditions , we based our analysis on a list of transcripts identified in S288c yeast grown to mid-log phase in rich media which most closely mirrors the growth conditions of our cultures analyzed by DRIP-chip ( [ 51 ] ) . 
+ DNA : RNA hybrid enriched genes significantly overlapped with antisense-associated genes , suggesting that DNA : RNA hybrids may play a role in antisense transcript-mediated regulation of gene expression ( Fisher 's exact test p = 1.03e-12 ) ( Figure 3A , 3B and 3C , Supplementary Table S5 ) . 
+ RNase H overexpression reduces detectable levels of DNA : RNA hybrids in cytological screens and suppresses genomic instability associated with R loop formation presumably through the degradation of DNA : RNA hybrids [ 7,52,53 ] . 
+ To test for a potential role of DNA : RNA hybrids in antisense-mediated gene regulation , we performed gene expression microarray analysis of an RNase H overexpression strain compared to an empty vector control ( GEO GSE46652 ) . 
+ This identified genes that had increased mRNA levels ( upregulated n = 212 ) or decreased mRNA levels ( downregulated n = 88 ) as a result of RNase H overexpression . 
+ A significant portion of the genes with increased mRNA levels were antisense-associated ( Fisher 's exact test p = 2.9e-7 ) ( Figure 3D , Supplementary Table S5 ) and tended to have high GC content , similar to DNA : RNA hybrid enriched genes in wild type ( Supplementary Figure S4 ) . 
+ However , the genes with increased mRNA levels under RNase H overexpression and the antisense-associated genes enriched for DNA : RNA hybrids in our DRIP experiment both tended towards lower transcriptional frequencies ( Figure 3E ) . 
+ These findings suggest that antisense-associated DNA : RNA hybrids moderate the levels of gene expression . 
+ Indeed , genes that were both modulated by RNase H overexpression and enriched for DNA : RNA hybrids were all found to be antisense-associated ( Figure 3F ) . 
+ The mechanism underlying altered gene expression in cells overexpressing RNase H remains unclear . 
+ While the association with antisense transcription is compelling , alternative models exist . 
+ One possibility is that the stress of RNase H overexpression triggers gene expression programs that coincidentally are antisense regulated . 
+ We analyzed gene ontology ( GO ) terms enriched among genes whose expression was changed by RNase H overexpression . 
+ Consistent with previous work , genes for iron uptake and incorporation were strongly activated by RNase H overexpression ( p = 2.21e-12 ) ( Figure 4A , Supplementary Table S6 ) and several of these iron transport genes ( i.e. FRE4 , FRE2 , FRE3 , FET3 , FET4 ) are antisense-associated ( [ 51,54 ] ) , suggesting that overexpression of RNase H activates transcription of these genes by perturbing antisense-mediated regulation . 
+ Alternatively , changes in RNase H levels may increase the cellular iron requirements since sensitivity to low iron concentration is associated with DNA damage and repair [ 55 ] . 
+ To test this alternative hypothesis , we assayed the RNase H deletion and sen1-1 mutants for sensitivity to low iron conditions compared to a fet3Δ positive control ( Figure 4B ) . 
+ The sen1-1 mutant , RNase H depletion or overexpression did not induce sensitivity to low iron , ruling out the possibility that the transcriptional response in cells overexpressing RNase H was a result of an increased cellular iron requirement . 
+ Collectively , our DRIP-chip and microarray analyses suggest that DNA : RNA hybrids may be an important player in antisense-mediated gene regulation . 
+ Cytological profiling of RNA processing mutants for R loop formation
+ Transcription-coupled DNA : RNA hybrids have been shown to accumulate in a diverse set of transcription and RNA processing mutants involved in a wide range of transcription related processes ( Table 1 ) . 
+ To gain a broader understanding of factors involved in R loop formation , we performed a cytological screen of RNA processing , transcription and chromatin modification mutants for 
+ DNA : RNA hybrids using the S9.6 antibody . 
+ Importantly , previous work in our lab has shown that all of the mutants screened exhibit chromosome instability ( CIN ) , which would be consistent with increased hybrid formation [ 53 ] . 
+ Significantly elevated hybrid levels were found in 22 of the 40 mutants tested compared to wild type , including a SUB2 mutant which has been previously linked to R loop formation ( Figure 5 , [ 4 ] ) . 
+ We also assayed some of the well-characterized R-loop forming mutants , RNase H , Sen1 and Hpr1 , as positive controls for elevated DNA : RNA hybrid levels ( Figure 5 ) . 
+ In our screen , we detected hybrids in mutants affecting several pathways linked to DNA : RNA hybrid formation such as transcription , nuclear export and the exosome ( Figure 5 , Table 1 ) . 
+ Consistent with findings in metazoan cells , we also observed hybrid formation in some splicing mutants ( Figure 5 , Table 1 ; [ 56 ] ) . 
+ Several rRNA processing mutants were enriched for DNA : RNA hybrids ( 7 out of the 22 positive hits ) , likely due to DNA : RNA hybrid accumulation at rDNA genes , a sensitized hybrid formation site ( Figure 1 ; [ 2 ] ) . 
+ It is possible that , as seen in mRNA cleavage and polyadenylation mutants , DNA : RNA hybrid formation may contribute to their CIN phenotypes [ 6 ] . 
+ Currently , there are 52 yeast genes whose disruptions have been found to lead to DNA : RNA hybrid accumulation , 21 of which were newly identified by our screen ( Table 1 ) . 
+ The success of this small-scale screen suggests that most RNA processing pathways suppress hybrid formation to some degree and that many DNA : RNA hybrid forming mutants remain undiscovered . 
+ DRIP-chip profiling of R loop forming mutants
+ To better understand the mechanism by which cells regulate DNA : RNA hybrids , we performed DRIP-chip analysis of rnh1Δrnh201Δ , hpr1Δ , and sen1-1 mutants in order to determine if these contribute differentially to the DNA : RNA hybrid genomic profile . 
+ The rnh1Δrnh201Δ , hpr1Δ , and sen1-1 mutants are particularly interesting because they have well established roles in the regulation of transcription dependent DNA : RNA hybrid formation . 
+ Our DRIP-chip profiles revealed that , similar to wild type profiles , the mutant profiles were enriched for DNA : RNA hybrids at rDNA , telomeres , and retrotransposons ( Figure 6 , Supplementary Tables S1 , S2 , S3 ) . 
+ The rnh1Δrnh201Δ , hpr1Δ , and sen1-1 mutants also exhibited DNA : RNA hybrid enrichment in 1206 , 1490 and 1424 ORFs respectively compared to the 1217 DNA : RNA hybrid enriched ORFs identified in wild type ( Supplementary Table S4 ) . 
+ Interestingly , in addition to the similarities described above , our profiles also identified differential effects of the mutants on the levels of DNA : RNA hybrids . 
+ In particular , we observed that deletion of HPR1 resulted in higher levels of DNA : RNA hybrids along the length of most ORFs with a preference for longer genes compared to wild type ( Figure 7A , 7B and 7C ) . 
+ This observation is consistent with Hpr1 's role in bridging transcription elongation to mRNA export and its localization at actively transcribed genes ( [ 4,57 -- 59 ] ) . 
+ In contrast , mutating SEN1 resulted in higher levels of DNA : RNA hybrids at shorter genes ( Figure 7A and 7B ) , which is consistent with Sen1 's role in transcription termination particularly for short protein-coding genes ( [ 5,60,61 ] ) . 
+ The rnh1Δrnh201Δ mutant revealed higher levels of DNA : RNA hybrids at highly transcribed and longer genes ( Figure 7A and 7B ) which is supported by a wealth of evidence of RNase H 's role in suppressing R loops in long genes to prevent collisions between transcription and replication 
+ Further inspection of our profiles also revealed that rnh1Δrnh201Δ and sen1-1 mutants but not the hpr1Δ mutant had increased DNA : RNA hybrids at tRNA genes ( two tailed unpaired Wilcoxon test p = 1.56e-19 in the rnh1Δrnh201Δ mutant and 1.68e-15 in the sen1-1 mutant ) ( Figure 8A , 8B and 8C , Supplementary Table S7 ) and this was confirmed by DRIP-quantitative PCR ( qPCR ) of two tRNA genes in wild type and rnh1Δrnh201Δ ( Supplementary Figure S5 ) . 
+ Because tRNAs are transcribed by RNA polymerase III , this observation indicates that Hpr1 is primarily involved in the regulation of RNA polymerase II specific DNA : RNA hybrids while RNase H and Sen1 have roles in a wider range of transcripts . 
+ Mutation of SEN1 also led to increased levels of DNA : RNA hybrids at snoRNA genes ( two tailed unpaired Wilcoxon test p = 1.81e-6 ) ( Figure 8D , 8E and 8F , Supplementary Table S8 ) consistent with its role in 3' end processing of snoRNAs ( [ 63 ] ) . 
+ Discussion
+ The genomic profile of DNA:RNA hybrids
+ Identifying the landscape of genomic loci predisposed to DNA : RNA hybrids is of fundamental importance to delineating mechanisms of hybrid formation and the contributions of various cellular pathways . 
+ Although our profiles depend on the specificity of the anti-DNA : RNA hybrid S9.6 monoclonal antibody , this aspect has been well characterized [ 44 ] and several of our observations are consistent with what has been reported in the literature . 
+ Locus specific tests showed that DNA : RNA hybrids occur more frequently at genes with high transcriptional frequency and GC content [ 4,5,18 ] . 
+ Moreover , in rnh201Δ cells , there is an inverse relationship between GC content and gene expression levels , suggesting that DNA : RNA hybrids accumulate at regions of high GC content and block transcription in the absence of RNase H [ 64 ] . 
+ Our work extends the knowledge of DNA : RNA hybrids from a few locus-specific observations to show that , in wild type , there are potentially hundreds of hybrid prone genes that tend to be shorter in length , frequently transcribed and high in GC content [ 2,4,56 ] . 
+ The latter is consistent with recent studies in human cells that demonstrated that genomic regions with high GC skew are prone to R loop formation , which plays a regulatory role in DNA methylation [ 26,27 ] . 
+ However , while we determined the relationship between GC content and DNA : RNA hybrid formation , we were unable to do the same analysis for GC skew , likely due to the low level of GC skew and lack of DNA methylation in Saccharomyces . 
+ This is unsurprising since the best characterized functional element associated with GC skew , CpG island promoters [ 26,27 ] , are not found in yeast . 
+ Importantly , our findings at retrotransposons support the notion that expression levels and not GC content contribute more to DNA : RNA hybrid forming potential . 
+ Additionally , DRIP-chip analysis of wild type cells identified hybrid enrichment at rDNA , retrotransposons , and telomeric regions . 
+ Along with previous studies , our DRIP-chip analysis confirms that rDNA is a hybrid prone genomic site and suggests that many factors of rRNA processing and ribosome assembly suppress potentially damaging rDNA : rRNA hybrid formation [ 2,7 ] . 
+ The presence of TERRA-DNA hybrids at telomeres is supported by our observation of significant hybrid signal at telomeric repeat regions across all DRIP-chip experiments . 
+ Antisense association of DNA:RNA hybrids
+ The DRIP-chip dataset is a resource for future studies seeking to elucidate the localization of DNA : RNA hybrids across antisense-associated regions and the impact of DNA : RNA hybrid removal on genome-wide transcription . 
+ We observed that genes associated with antisense transcripts were significantly enriched for 
+ DNA : RNA hybrids and modulated at the transcript level by RNase H overexpression . 
+ Antisense regulation has been reported at mammalian rDNA and yeast Ty1 retrotransposons , loci that were also enriched for DNA : RNA hybrids in our DRIP-chip [ 49,50 ] . 
+ The role of DNA : RNA hybrids and RNase H in antisense regulation is currently unclear . 
+ However , there are several non-exclusive models of antisense gene regulation . 
+ One model proposes that the physical presence of the antisense transcripts is crucial to antisense gene regulation . 
+ For instance , trans-acting antisense transcripts have been shown to control Ty1 retrotransposon transcription , reverse transcription and retrotransposition [ 65 ] . 
+ Another study has further shown that trans-acting antisense transcripts that only overlap with the sense strand promoter can block sense transcription , potentially by hybridizing with the nontemplate DNA strand [ 33 ] . 
+ These findings suggest that antisense transcription in cis is not necessary as long as the antisense transcript is present . 
+ It is possible that DNA : RNA hybrids may be formed by the antisense or the sense transcript with genomic DNA . 
+ Moreover , DNA : RNA hybrids may play a functional role in antisense transcription regulation as shown by antisense-associated genes both enriched for DNA : RNA hybrids and affected transcriptionally by RNase H overexpression . 
+ Experiments comparing the ratio of antisense versus sense transcripts and determining the amount of DNA : RNA hybrid formation by either transcript under conditions known to regulate the particular gene will further elucidate the role of RNase H and DNA : RNA hybrids in antisense regulation . 
+ DRIP-chip analysis of hybrid-resolving mutants
+ Our investigation of mutant-specific DNA : RNA hybrid formation sites is consistent with the existing literature on Hpr1 , Sen1 and RNase H. Significantly , the hpr1Δ and rnh1Δrnh201Δ mutants exhibited increased DNA : RNA hybrid levels along the length of long genes , while the sen1-1 mutant exhibited increased 
+ DNA : RNA hybrid levels along the length of short genes ( Figure 7A ) . 
+ This coheres with Hpr1 's function in transcription elongation and mRNA export , and RNase H 's role in preventing transcription apparatus and replication fork collisions , which carry greater consequence for long genes ( [ 4,57 -- 59,62 ] ) . 
+ In contrast , Sen1 is particularly important for transcription termination at short genes ( [ 61 ] ) . 
+ In addition , the RNase H deletion and sen1-1 mutants had increased hybrids at tRNA genes , suggesting that they are both required to prevent tRNA : DNA hybrid accumulation . 
+ Interestingly , a recent study found that the mRNA levels of genes encoding RNA polymerase III and proteins that modify tRNA are increased in an rnh1Δrnh201Δ mutant [ 64 ] , which may be in response to a lack of properly processed tRNA transcripts . 
+ The finding that both tRNA and snoRNA genes were enriched for hybrids in sen1-1 highlights the role of Sen1 in RNA polymerase I , II and III transcription termination and transcript maturation [ 60,63,66 ] . 
+ More broadly , our data and the literature support the notion that transcripts from RNA polymerases I , II and III can be subject to DNA : RNA hybrid formation especially in RNA processing mutant backgrounds . 
+ Perspective
+ Factors regulating ectopic , genome destabilizing DNA : RNA hybrids are best characterized in yeast , although less is known about the functions of native R loop structures . 
+ The genome-wide maps of DNA : RNA hybrids presented here recapitulate the known sites of hybrid formation but also add important new insights to potential functions of R loops . 
+ Most importantly , we demonstrate the usefulness of DRIP profiling for detecting biologically meaningful differences in mutant strains . 
+ Therefore , DRIP profiling of yeast genomes in various mutant backgrounds will be key to understanding the causes and consequences of inappropriate R loop formation and how these are modulated by other cellular pathways . 
+ Methods
+ Strains and plasmids
+ All strains are listed in Supplementary Table S9 . 
+ For RNase H overexpression experiments , recombinant human RNase H1 was expressed from plasmid p425-GPD-RNase H1 ( 2µ , LEU2 , GPDpr-RNase H1 ) and compared to an empty control plasmid p425-GPD ( 2µ , LEU2 , GPDpr ) [ 7 ] . 
+ DRIP-chip and qPCR
+ Briefly , cells were grown overnight , diluted to 0.15 OD600 and grown to 0.7 OD600 . 
+ Crosslinking was done with 1 % formaldehyde for 20 minutes . 
+ Chromatin was purified as described previously [ 67 ] and sonicated to yield approximately 500 bp fragments . 
+ 40 µg of the anti-DNA : RNA hybrid monoclonal mouse antibody S9.6 ( gift from Stephen Leppla ) was coupled to 60 µL of protein A magnetic beads ( Invitrogen ) . 
+ For ChIP-qPCR , crosslinking reversal and DNA purification were followed by qPCR analysis of the immunoprecipitated and input DNA . 
+ DNA was analyzed using a Rotor-Gene 600 ( Corbett Research ) and PerfeCTa SYBR green FastMix ( Quanta Biosciences ) . 
+ Samples were analyzed in triplicate on three independent DRIP samples for wild type and rnh1Δrnh201Δ . 
+ Primers are listed in Supplementary Table S11 . 
+ For DRIP-chip , precipitated DNA was amplified via two rounds of T7 RNA polymerase amplification ( [ 68 ] ) , biotin labeled and hybridized to Affymetrix 1.0 R S. cerevisiae microarrays . 
+ Samples were normalized to a no antibody control sample ( mock ) using the rMAT software and relative occupancy scores were calculated for all probes using a 300 bp sliding window . 
+ All profiles were generated in duplicate and replicates were quantile normalized and averaged . 
+ Spearman correlation scores between replicates are listed in Supplementary Table S10 . 
+ Coordinates of enriched regions are available in Datasets S1 , S2 , S3 , S4 , S5 , S6 , S7 and S8 . 
+ DRIP-chip data is available at ArrayExpress E-MTAB-2388 . 
+ DRIP-chip analysis
+ Enriched features had at least 50 % of the probes contained in the feature above the threshold of 1.5 . 
+ Only features enriched in both replicates were reported . 
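The enrichment call described above can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' pipeline: the function names, the toy probe scores and the choice of a strict "greater than" comparison against the 1.5 threshold are all assumptions made for the example.

```python
def is_enriched(scores, threshold=1.5, min_fraction=0.5):
    """Return True if at least min_fraction of the probe scores exceed threshold."""
    if not scores:
        return False
    above = sum(1 for s in scores if s > threshold)
    return above / len(scores) >= min_fraction


def enriched_in_both(rep1, rep2, threshold=1.5, min_fraction=0.5):
    """Names of features that pass the enrichment call in both replicates."""
    shared = rep1.keys() & rep2.keys()
    return {
        name
        for name in shared
        if is_enriched(rep1[name], threshold, min_fraction)
        and is_enriched(rep2[name], threshold, min_fraction)
    }


# Hypothetical per-feature probe scores for two replicates.
rep1 = {"geneA": [2.0, 1.8, 1.2, 1.6], "geneB": [1.0, 1.7, 1.1, 0.9]}
rep2 = {"geneA": [1.9, 1.4, 1.6, 1.1], "geneB": [2.1, 1.9, 1.8, 1.7]}
enriched = enriched_in_both(rep1, rep2)
```

In this toy input, geneA passes in both replicates while geneB passes only in the second, so only geneA would be reported.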
+ Transcriptional frequency [ 69 ] , GC content ( [ 70 ] ) and gene length were compared using the Wilcoxon rank sum test . 
+ Antisense association was analyzed by the Fisher 's exact test using R. Statistical analysis of genomic feature enrichment was performed using a Monte Carlo simulation , which randomly generates start positions for the particular set of features and calculates the proportion of that feature that would be enriched in a given DRIP-chip profile if the feature were distributed at random [ 67 ] . 
+ 500 simulations were run per feature for each DRIP-chip replicate to obtain mean and standard deviation values . 
+ These values were used to calculate the cumulative probability ( P ) on a normal distribution of seeing a score lower than the observed value by chance . 
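A rough sketch of this Monte Carlo procedure is given below, using only the Python standard library. It is an illustration under stated assumptions, not the published code: the genome is modeled as a list of per-position booleans (enriched or not), a feature counts as enriched when at least half of its positions are enriched, and all function names and the toy data are hypothetical.

```python
import random
from statistics import NormalDist, mean, stdev


def enriched_fraction(starts, lengths, enriched, min_fraction=0.5):
    """Fraction of features with at least min_fraction of positions enriched."""
    hits = 0
    for start, length in zip(starts, lengths):
        covered = sum(enriched[start:start + length])
        if covered / length >= min_fraction:
            hits += 1
    return hits / len(lengths)


def monte_carlo_p(observed, lengths, enriched, n_sim=500, seed=0):
    """Cumulative probability of a score lower than `observed` under random placement."""
    rng = random.Random(seed)
    genome_len = len(enriched)
    simulated = []
    for _ in range(n_sim):
        # Draw a random start for every feature, keeping each feature inside the genome.
        starts = [rng.randrange(0, genome_len - length + 1) for length in lengths]
        simulated.append(enriched_fraction(starts, lengths, enriched))
    # Summarize the simulations by mean and standard deviation, then read the
    # cumulative probability off a normal distribution with those parameters.
    return NormalDist(mean(simulated), stdev(simulated)).cdf(observed)


# Toy profile: 1000 positions, the last 100 of which are enriched.
toy_profile = [False] * 900 + [True] * 100
toy_lengths = [10] * 20
p_high = monte_carlo_p(1.0, toy_lengths, toy_profile)
```

An observed proportion far above the simulated mean gives a cumulative probability close to 1, which is the sense in which a feature class is called enriched here.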
+ DRIP-chip visualization
+ CHROMATRA plots were generated as described previously ( [ 71 ] ) . 
+ Relative occupancy scores for each transcript were binned into segments of 150 bp . 
+ Transcripts were sorted by their length , transcriptional frequency or GC content and aligned by their Transcription Start Sites ( TSS ) . 
+ For transcriptional frequency , transcripts were grouped into five classes according to their transcriptional frequency described by Holstege et al. ( 1998 ) . 
+ For GC content , transcripts were grouped into four classes according to their GC content obtained from BioMart ( [ 70 ] ) . 
+ Average gene , tRNA or snoRNA profiles were generated by averaging all the probes that were encompassed by the features of interest . 
+ For averaging ORFs , corresponding probes were split into 40 bins while 1500 bp of UTRs and their probes were split into 20 bins . 
+ For smaller features like tRNAs and snoRNAs corresponding probes were split into only 3 bins . 
+ Average enrichment scores were calculated using in house scripts that average the score of all the 
+ Gene expression microarray
+ Gene expression microarray data is available at GEO GSE46652 . 
+ Strains harboring the RNase H1 over-expression plasmid or empty vector were grown in SC-Leucine at 30 °C . 
+ All profiles were generated in duplicate . 
+ Total RNA was isolated from 1 OD600 of yeast cells using a RiboPure Yeast kit ( A&B Applied Biosystems ) , amplified , labeled , fragmented using a Message-Amp III RNA Amplification Kit ( A&B Applied Biosystems ) and hybridized to a GeneChIP Yeast Genome 2.0 microarray using the GeneChip Hybridization , Wash , and Stain Kit ( Affymetrix ) . 
+ Arrays were scanned by the Gene Chip Scanner 3000 7G and expression data was extracted using Expression Console Software ( Affymetrix ) with the MAS5.0 statistical algorithm . 
+ All arrays were scaled to a median target intensity of 500 . 
+ Cut-offs of a p-value of 0.05 and a minimum signal strength of 100 across all samples were applied , and only transcripts that had over a 2-fold change in the RNase H over-expression strain compared to wild type were considered significant . 
+ The correlation between duplicate biological samples was : control ( r = 0.9955 ) , RNase H over-expression ( r = 0.9719 ) . 
+ For statistical analysis , GC content , transcription frequencies and antisense association were analyzed as for DRIP-chip . 
+ Yeast chromosome spreads
+ Cells were grown to mid-log phase in YEPD rich media at 30 °C and washed in spheroplasting solution ( 1.2 M sorbitol , 0.1 M potassium phosphate , 0.5 M MgCl2 , pH 7 ) and digested in spheroplasting solution with 10 mM DTT and 150 µg/mL Zymolyase 20T at 37 °C for 20 minutes similar to previously described ( [ 72 ] ) . 
+ The digestion was halted by addition of ice-cold stop solution ( 0.1 M MES , 1 M sorbitol , 1 mM EDTA , 0.5 mM MgCl2 , pH 6.4 ) and spheroplasts were lysed with 1 % vol/vol Lipsol and fixed on slides using 4 % wt/vol paraformaldehyde/3.4 % wt/vol sucrose ( [ 73 ] ) . 
+ Chromosome spread slides were incubated with the mouse monoclonal antibody S9.6 ( 1 µg/mL in blocking buffer of 5 % BSA , 0.2 % milk and 1x PBS ) . 
+ The slides were further incubated with a secondary Cy3-conjugated goat anti-mouse antibody ( Jackson Laboratories , # 115-165-003 , diluted 1:1000 in blocking buffer ) . 
+ For each replicate , at least 100 nuclei were visualized and manually counted to obtain the fraction with detectable DNA : RNA hybrids . 
+ Each mutant was assayed in triplicate . 
+ Mutants were compared to wild type by the Fisher 's exact test . 
+ To correct for multiple hypothesis testing , we implemented a cut off of p < 0.01 divided by the total number of mutants compared to wild type , meaning mutants with p < 0.00024 were considered significantly different from wild type . 
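The comparison and the corrected cut off can be illustrated with a short standard-library Python sketch. It is not the authors' analysis code: the two-sided Fisher's exact test is implemented directly from the hypergeometric distribution, the nuclei counts are invented for the example, and the figure of 41 comparisons is an assumption chosen only because 0.01 / 41 rounds to the stated 0.00024.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total_tables = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x positives in row 1 with fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / total_tables

    p_observed = prob(a)
    lowest, highest = max(0, col1 - row2), min(row1, col1)
    # Sum every table that is no more likely than the observed one
    # (small tolerance guards against floating-point ties).
    return sum(
        prob(x)
        for x in range(lowest, highest + 1)
        if prob(x) <= p_observed * (1 + 1e-9)
    )


def bonferroni_threshold(alpha, n_comparisons):
    """Per-comparison significance cut off after Bonferroni correction."""
    return alpha / n_comparisons


# Hypothetical counts: nuclei with / without detectable hybrids.
mutant_pos, mutant_neg = 60, 40
wild_type_pos, wild_type_neg = 20, 80

p_value = fisher_exact_two_sided(mutant_pos, mutant_neg, wild_type_pos, wild_type_neg)
cut_off = bonferroni_threshold(0.01, 41)  # 41 comparisons is an assumed number
is_significant = p_value < cut_off
```

Dividing the significance level by the number of comparisons keeps the chance of any false positive across the whole screen at or below 0.01, at the cost of some sensitivity for individual mutants.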
+ BPS sensitivity assay
+ 10-fold serial dilutions of each strain were spotted on 90 µM BPS plates with FeSO4 concentrations of 0 , 2.5 , 20 or 100 µM and grown at 30 °C for 3 days [ 55 ] . 
+ A summary of this paper was presented at the 26th International Conference on Yeast Genetics and Molecular Biology , August 2013 [ 74 ] . 
+ Supporting Information
+ Dataset S1 Wild type replicate 1 enriched region coordinates . 
+ ( XLSX ) 
+ Dataset S2 Wild type replicate 2 enriched region coordinates . 
+ ( XLSX ) 
+ Acknowledgments
+ The RNase H1 plasmid and anti-DNA : RNA hybrid antibody S9.6 were kind gifts from Doug Koshland and Stephen Leppla respectively . 
+ We thank Alice Wang and Grace Leung for their assistance with the DRIP-chip protocol and Nigel O'Neil for helpful discussions . 
+ We thank Gian Luca Negri for helpful discussions of the chip-on-chip data analysis and for providing scripts . 
+ Author Contributions
+ Conceived and designed the experiments : YAC MJA AH PCS . 
+ Performed the experiments : YAC MJA ZL AH . 
+ Analyzed the data : YAC MJA PCS . 
+ Contributed reagents/materials/analysis tools : MJA PYTL ZL MSK PH. Wrote the paper : YAC MJA PCS PH.
\ No newline at end of file
--- a/data/TEXT_FILES/notuseful_txt/25085508.txt 0 → 100644
View file @27818a9
+++ b/data/TEXT_FILES/notuseful_txt/25085508.txt 0 → 100644
View file @27818a9
+ Characterization and analysis of the Burkholderia
+ Abstract 
+ Background : Burkholderia pseudomallei is a facultative intracellular pathogen and the causative agent of melioidosis . 
+ A conserved type III secretion system ( T3SS3 ) and type VI secretion system ( T6SS1 ) are critical for intracellular survival and growth . 
+ The T3SS3 and T6SS1 genes are coordinately and hierarchically regulated by a TetR-type regulator , BspR . 
+ A central transcriptional regulator of the BspR regulatory cascade , BsaN , activates a subset of T3SS3 and T6SS1 loci . 
+ Results : To elucidate the scope of the BsaN regulon , we used RNAseq analysis to compare the transcriptomes of wild-type B. pseudomallei KHW and a bsaN deletion mutant . 
+ The 60 genes positively-regulated by BsaN include those that we had previously identified in addition to a polyketide biosynthesis locus and genes involved in amino acid biosynthesis . 
+ BsaN was also found to repress the transcription of 51 genes including flagellar motility loci and those encoding components of the T3SS3 apparatus . 
+ Using a promoter-lacZ fusion assay in E. coli , we show that BsaN together with the chaperone BicA directly control the expression of the T3SS3 translocon , effector and associated regulatory genes that are organized into at least five operons ( BPSS1516-BPSS1552 ) . 
+ Using a mutagenesis approach , a consensus regulatory motif in the promoter regions of BsaN-regulated genes was shown to be essential for transcriptional activation . 
+ Conclusions : BsaN/BicA functions as a central regulator of key virulence clusters in B. pseudomallei within a more extensive network of genetic regulation . 
+ We propose that BsaN/BicA controls a gene expression program that facilitates the adaption and intracellular survival of the pathogen within eukaryotic hosts . 
+ Background
+ Melioidosis is a serious and often fatal infectious disease common to Southeast Asia and Northern Australia caused by the Gram-negative soil bacterium Burkholderia pseudomallei . 
+ B. pseudomallei is a highly versatile pathogen capable of surviving inside mammalian cells and in many environmental niches . 
+ The bacterium can infect numerous animal species , amoebae , nematodes , and tomato plants [ 1-5 ] , and has been previously found within the tissues of exotic grasses in Australia [ 6 ] . 
+ The environmental origin of B. pseudomallei and its promiscuous host range have shaped the hypothesis that some of its genetic loci evolved in the rhizosphere as anti-predation determinants that subsequently promote `` accidental '' virulence in humans and animals . 
+ In recent years , important advances have been made in understanding the pathogenic mechanisms of B. pseudomallei including the roles of the Type III and Type VI Secretion Systems ( T3SS , T6SS ) [ 7-11 ] . 
+ B. pseudomallei contains three T3SSs and six T6SSs , but only T3SS3 ( also referred to as the Burkholderia secretion apparatus , or T3SSBsa ) and T6SS1 are critical for pathogenesis in mice and hamsters [ 7,12,13 ] . 
+ Expression of the T3SS3 and T6SS1 gene clusters is tightly controlled , both temporally and spatially , during the B. pseudomallei intracellular lifecycle . 
+ We have identified a regulatory cascade that coordinately activates T3SS3 and T6SS1 gene expression in growth medium and in infected mammalian cells [ 8,14 ] . 
+ At the top of the cascade is the TetR-type regulator BspR that stimulates the expression of bprP . 
+ The bspR gene is located on chromosome 1 of the B. pseudomallei genome whereas bprP is part of the T3SS3 gene cluster on chromosome 2 [ 14 ] . 
+ The ToxR-like BprP in turn activates genes encoding the structural components of T3SS3 , including the araC-type regulatory gene bsaN . 
+ BsaN is important for the activation of T3SS3 effector and translocon gene expression , and several regulatory genes including bprC and virAG , whose gene products control T6SS1 expression [ 8 ] . 
+ The mechanisms through which these transcriptional regulators control the expression of their target genes are not understood . 
+ It is also unclear whether these regulators are acting directly on the identified target genes or through as yet undiscovered intermediary regulators , and whether additional host cell cofactors are involved that may serve as intracellular signals . 
+ Compared to T3SSs in other pathogens such as Pseudomonas , Salmonella and Shigella , only a limited number of effectors have been identified for B. pseudomallei T3SS3 . 
+ One of the effector proteins secreted by T3SS3 is BopE , which is annotated to exhibit guanine nucleotide exchange factor activity and has been reported to facilitate invasion of epithelial cells [ 15 ] . 
+ bopA is generally assumed to encode a T3SS3 effector since it is located adjacent to bopE , although T3SS3-dependent secretion of BopA has never been verified . 
+ Functionally , BopA has been described to promote resistance to LC3-associated autophagy and a bopA mutation results in an intracellular replication defect [ 16,17 ] . 
+ A third effector protein , BopC ( BPSS1516 ) , was recently shown to be secreted via T3SS3 , and bopC mutants were reported to be less invasive in epithelial cells [ 18 ] and to exhibit delayed endosome escape and reduced intracellular growth in J774 murine macrophages [ 19 ] . 
+ To determine the full extent of the BsaN regulon and examine whether BsaN activates the expression of additional effector genes , we performed global transcriptome analysis of B. pseudomallei KHW wildtype ( WT ) and a ΔbsaN mutant strain using RNAseq . 
+ Our analysis shows that 111 genes are under the direct or indirect transcriptional control of BsaN . 
+ In addition to activating loci associated with T3SS3 , we demonstrate that BsaN functions to repress transcription of other loci . 
+ Thus , BsaN functions as a central regulatory factor within a more extensive network to facilitate the intracellular lifecycle of B. pseudomallei . 
+ Results
+ Identification of the BsaN regulon through RNAseq analysis BsaN ( BPSS1546 in the reference B. pseudomallei K96243 genome ) was previously shown to function as a central regulator of a hierarchical cascade that activates effector and translocon genes of T3SS3 as well as several associated regulatory genes [ 8,14 ] . 
+ Furthermore , BsaN was shown to activate the expression of certain T6SS1-associated genes including the two-component regulatory system locus virAG ( BPSS1494 , 1495 ) , and the bim actin motility genes ( BPSS1490-1493 ) . 
+ To gain further insight into the BsaN regulon , massively parallel sequencing was performed on cDNA prepared from RNA isolated from wild-type and ΔbsaN mutant strains . 
+ We had previously shown that complementation of our ΔbsaN mutant with a bsaN plasmid could restore the secretion of the BopE effector [ 14 ] , showing that our complementation restored protein expression of the effectors and that the mutation was specific to bsaN and not due to off target effects . 
+ Between 16 and 56 million reads ( n = 2 from 3 combined cultures ) were obtained that aligned to nonribosomal genes in the KHW [ 20 ] genome ( Additional file 1 : Table S1 ) . 
+ Reads of the technical replicates displayed high reproducibility ( R-value ) ( Additional file 1 : Table S1 ) demonstrating that variability was not introduced through sample preparation or sequencing errors . 
+ The K96243 reference genome was co-aligned for ease of gene annotation . 
+ The nucleotide sequences of chromosomes I and II are 99.3 and 99.1 % identical , respectively . 
+ Comparison between wild-type and ΔbsaN transcriptomes identified 111 genes that were differentially regulated using 3-fold or more ( adjusted p-value < 0.01 ) as the cut off . 
+ Of these , 60 genes were expressed more highly in wild-type KHW compared to the ΔbsaN strain , indicating that BsaN directly or indirectly activates their transcription ( Table 1 ) . 
+ However , 51 genes were expressed more highly in the ΔbsaN mutant suggesting that BsaN can function directly or indirectly as a repressor ( Table 2 ) . 
+ RNAseq results were validated using quantitative real time-PCR ( qRT-PCR ) analysis for select loci . 
+ RNAseq analysis identified all genes that we had previously shown to be activated by BsaN [ 8,14 ] ( Figure 1A and 1B , Table 1 ) . 
+ The effector and chaperone genes bopE , bopA and bicP together with the regulatory gene bprD were amongst the highest activated genes ( 50-270-fold ) . 
+ In addition , two putative transposase genes separating the T3SS3 genes and the T6SS1 gene clusters were highly activated by BsaN ( Table 1 ) . 
+ Genes activated at lower levels ( 3-4-fold ) include a hybrid non-ribosomal peptide synthase ( NRPS ) / polyketide synthase ( PKS ) locus consisting of 22 genes ( BPSL0472-BPSL0493 ) unique to B. pseudomallei and B. mallei . 
+ NRPS/PKS systems are found in microbes and fungi , and are generally responsible for the production of complex natural compounds such as antibiotics and siderophores . 
+ Burkholderia species are rich in NRPS/PKS loci that contain multiple metabolic genes or encode large multidomain synthases [ 21 ] . 
+ Although the precise function of this NRPS/PKS locus is not currently known , the presence of a diaminobutyrate-2-oxoglutarate aminotransferase gene ( BPSL0476 ) suggests that 2,4-diaminobutyrate is one of the polyketide 's components . 
+ Loci for methionine and threonine biosynthesis , as well as ribose uptake ( Table 2 ) , were activated at similar levels . 
+ Representative BsaN-activated genes were confirmed by qRT-PCR ( Figure 1C-D ) . 
+ Intriguingly , genes encoding the T3SS3 apparatus components were found to be repressed in the wildtype compared with the ΔbsaN mutant , suggesting a role for BsaN in limiting apparatus synthesis when translocon and effector genes are transcribed ( Figure 1A , 1E , Table 2 ) . 
+ Also repressed are polar flagellar motility loci on chromosome 1 including the flagellin genes fliC and fliD , as well as flagellar hook proteins flgL and flgK . 
+ Repression of these genes as well as motA ( BPSL3309 ) and cheD ( BPSS3302 ) was validated by qRT-PCR ( Figure 1E ) . 
+ In Salmonella and other bacteria , motAB are key components of the flagellar motor complex [ 22 ] . 
+ motAB in KHW are part of a chemotaxis ( che ) locus , which is repressed 2 -- 2.9-fold ( p < 0.01 ) as assessed by RNAseq . 
+ In addition , expression of a second polyketide biosynthesis locus ( BPSS0303-BPSS0311 ) was reduced in a ΔbsaN mutant , possibly by repression of a co-localized araC-type regulatory gene , BPSS0336 ( Table 2 ) . 
+ However , down-regulation of this cluster could not be verified by qRT-PCR ( data not shown ) . 
+ We were likewise unable to validate repression of BPSL2404-2405 , which putatively encode transport and energy metabolism functions , respectively , in addition to BPSS1887-1888 , which are postulated to encode oxidative enzymes for energy metabolism . 
+ Additional loci implicated in lipid and energy metabolism are also repressed ( Table 2 ) . 
+ Catabolic genes encode a cytochrome o oxidase typically used by bacteria in an oxygen-rich environment [ 23 ] , along with enzymes involved in the aerobic degradation of aromatic compounds and in the degradation of arginine . 
+ A gene involved in the synthesis of betaine , an osmoprotectant which serves to adapt Gram-negative bacteria to conditions of high osmolarity , is also repressed by BsaN . 
+ BsaN together with chaperone BicA directly activate T3SS3 effector and T6SS1 regulatory genes
+ We have previously shown that expression of the two component regulatory system virAG and the genes from BPSS1520 ( bprC ) to BPSS1533 ( bicA ) in the T3SS3 cluster was regulated by BsaN in concert with the chaperone BicA [ 14 ] . 
+ To determine whether BsaN/BicA activate these genes directly , bsaN and bicA open reading frames ( orfs ) from B. pseudomallei strain KHW were inserted into a plasmid downstream of an arabinose-inducible promoter on pMLBAD [ 24 ] . 
+ These constructs were introduced into E. coli DH5α [ 25 ] along with an additional construct containing putative promoter regions of several BsaN target genes transcriptionally fused to lacZ on pRW50 [ 26 ] or pRW50mob , which contains the oriT fragment for pOT182 [ 27 ] . 
+ The effect of BsaN/BicA on promoter activity was then assessed by β-galactosidase activities . 
+ The putative bsaN orf is annotated in the B. pseudomallei genome database to initiate from a GTG start codon [ 28 ] . 
+ We identified a second potential start codon ( ATG ) and ribosome binding site 117 nucleotides ( nt ) upstream of GTG ( Figure 2A , B ) . 
+ bsaN/bicA expression constructs ( Figure 2A ) that were initiated from GTG were unable to activate transcription of bicA , bopA and bopE in E. coli ( Additional file 1 : Table S2 ) , supporting the notion that the ATG was the actual start codon for BsaN . 
+ Furthermore , a transcriptional start site was identified 56 nucleotides upstream of the ATG codon via RNA ligase-mediated rapid amplification of cDNA ends ( RLM-RACE ) ( Figure 2B ) . 
+ A putative ribosomal binding site ( RBS ) is located in front of the ATG codon . 
+ Replacing the GTG-initiated bsaN orf with the longer version containing the ATG start site resulted in activation of the bicA , bopA and bopE promoters as well as those for BPSS1521 ( bprD ) , BPSS1495 ( virA ) and the putative transposase BPSS1518 ( Figure 3A-F ) . 
+ Expression of BsaN alone was not sufficient to activate these promoters ( Additional file 1 : Table S2 ) , demonstrating the co-requirement for BicA . 
+ No apparent BsaN/BicA-dependent promoter activity was obtained for BPSS1528 ( bapA ) , BPSS1523 ( bicP ) , BPSS1530 ( bprA ) , or BPSS1520 ( bprC ) ( Additional file 1 : Table S2 ) ( refer to Figure 2C for gene location ) . 
+ Furthermore , BsaN/BicA could not activate transcription of a BPSS1512 ( tssM ) - lacZ fusion in E. coli ( Figure 3G ) . 
+ Thus , BsaN/BicA drives the expression of bprDC and the BPSS1518-1516 operons directly , whereas bicP and bprB gene expression is likely driven by the upstream-located bopA promoter . 
+ Transcription of the bapABC and bprA genes could be driven from the bicA promoter . 
+ Collectively , these results are represented i 
+ Identification of transcriptional start sites and the sequence motif for BsaN/BicA activation
+ Similarities between BsaN/BicA regulated promoters were examined by first determining their transcriptional start sites using RLM-RACE . 
+ One transcriptional start site was identified for the bicA , bprD and BPSS1518 promoters , and two start sites were detected for the bopA and virA promoters . 
+ We were unable to identify a transcriptional start site for bopE , which is divergently transcribed from bopA ( Figure 2C ) . 
+ A 150-bp sequence upstream of each transcriptional start site was submitted to MEME ( Motif Elicitation for Prediction of DNA Motifs ) , which identified a 15 bp motif that we designated as the putative BsaN box ( Figure 4A ) . 
+ The distance from the transcriptional start site varied from 24 bp ( virA ) to 35 bp ( bicA and bopA ) ( Figure 4B ) . 
+ When the motif was submitted to Motif Alignment & Search Tool ( MAST ) to search for other potential BsaN/BicA-regulated promoters in the B. pseudomallei genome ( strain K96243 ) , BsaN boxes were also found upstream of tssM and BPSS1889 , a putative gene encoding an AraC family protein , in addition to those already identified . 
+ However , qRT-PCR analysis of BPSS1889 expression in ΔbsaN and ΔbicA mutants did not reveal a decrease in expression compared to wild type bacteria ( data not shown ) . 
+ BPSS1889 is located adjacent but transcribed in the opposite direction to the operon BPSS1884-1888 , which was shown by RNAseq to be repressed by BsaN ( Table 2 ) . 
+ Although we could not confirm BsaN-dependent regulation of BPSS1889 by qRT-PCR , the upstream BsaN box suggests the possible involvement of this putative regulator in repression of the operon in vivo . 
+ It is likely that conditions for BsaN-dependent repression are difficult to establish in vitro , resulting in variability and lack of validation . 
+ We also could not identify any −10 and −35 sequences for the prokaryotic housekeeping sigma factor in these promoters . 
+ It is likely that the BsaN/BicA-regulated promoters are transcribed by one or more alternative sigma factors . 
+ Unfortunately , the B. pseudomallei genome harbours more than 10 alternative sigma factors that have not been systematically studied . 
+ Therefore , their recognition sequences are currently unknown . 
+ tssM is one of the highly activated genes in our RNAseq analysis ( Table 1 ) confirming previous in vivo expression studies [ 29 ] . 
+ However , despite the presence of the BsaN box upstream of the putative tssM operon ( BPSS1512-1514 ) , BsaN/BicA alone is not sufficient to activate tssM transcription in E. coli ( Figure 3G ) . 
+ This suggests that tssM regulation is more complex and likely requires additional cis and/or trans-acting regulatory elements for activation . 
+ Determining the sequence motif requirement for BsaN/BicA activation
+ To determine whether the putative BsaN box motif was required and sufficient for the other genes regulated by BsaN/BicA , we constructed two types of truncated promoter-lacZ fusions . 
+ The `` type 1 '' deletion contained only the BsaN motif and lacked all upstream sequences . 
+ The `` type 2 '' deletion lacked all upstream sequences in addition to the first six bp of the putative BsaN box motif . 
+ We assayed the ability of these truncated promoters to drive lacZ expression in the presence of BsaN/BicA . 
+ All truncated versions of the promoter regions for bicA , virA and BPSS1518 lost promoter activity ( Figure 5A-C ) . 
+ In contrast , versions containing the intact BsaN box for bprD ( Figure 5D ) and bopA ( Figure 5E ) were still functional , but further truncation eliminated their activation . 
+ The type 1 truncated version of the bprD promoter ( PbprD1-lacZ ) had three-fold higher β-galactosidase activity than the full length promoter - lacZ fusion ( PbprD-lacZ ) ( Figure 5D ) , whereas deletion of sequences upstream of the bopA promoter did not have a significant effect on the level of activation ( Figure 5E ) . 
+ BprP directly activates bsaN and bsaM
+ In the hierarchical control of T3SS3 and T6SS1 expression , BspR was suggested to activate the expression of bprP [ 14 ] . 
+ Previously , BprP was shown to bind sequences upstream of bsaM and bsaN ( refer to Figure 2C for gene location ) [ 14 ] , suggesting that it directly activates their transcription . 
+ bsaN is the first orf of the putative operon that encodes structural components of T3SS3 ( Figure 2C ) and is divergently transcribed from bsaM . 
+ To better understand how bsaN expression itself is controlled , we examined the relationships to its upstream regulators BspR and BprP using the LacZ fusion assay as described previously [ 8 ] . 
+ Plasmids with either bspR or bprP under arabinose induction control were introduced into E. coli containing plasmids with either a bsaN-lacZ fusion or a bsaM-lacZ fusion . 
+ A bprP-lacZ fusion served as control for BspR regulation . 
+ The ability of BspR and BprP to directly activate bsaN-lacZ , bsaM-lacZ and bprP-lacZ expression was determined by measuring β-galactosidase activity . 
+ As expected , BprP activated both the bsaM and bsaN promoters in E. coli ( Figure 6A , B ) . 
+ The presence of bprQ , a gene immediately downstream from bprP , had no effect on the activity of BprP . 
+ Furthermore , BprP did not activate its own promoter in E. coli ( data not shown ) . 
+ However , BspR was not able to activate the promoter of bprP , indicating either that this regulator is not active in E. coli or that additional factors are required for activation ( Figure 6C ) . 
+ Analysis of BsaN/BicA-regulated virulence loci
+ BsaN/BicA directly induces the expression of three known T3SS3 effector loci , bopA , bopC and bopE . 
+ Recent studies suggest that the T3SS3 effectors BopC and BopE are involved in invasion of epithelial cells and endosome escape [ 15,18,19 ] , while BopA has been implicated in escape from autophagy [ 17 ] . 
+ BopC was recently shown to be secreted via T3SS3 in B. pseudomallei K96243 [ 18 ] , and our data confirm this ( Additional file 1 : Figure S1 ) . 
+ In B. pseudomallei KHW , mutation of bopA , bopC or bopE [ 30 ] individually resulted in no detectable difference in numbers of bacteria inside RAW264.7 mouse macrophages when measured 2 hr after infection ( Additional file 1 : Figure S2A ) . 
+ Upon extended incubation times , however , the ΔbopA and the ΔbopACE [ 30 ] strains exhibited an intracellular replication defect that was intermediate between levels observed for wildtype KHW and the ΔbsaM [ 30 ] or ΔbsaN mutant derivatives . 
+ No differences in intracellular growth or host cell cytotoxicity were observed for the bopC or bopE mutant strains , although infection with the bopA or bopACE triple deletion mutants resulted in a decrease in cytotoxicity ( Additional file 1 : Figure S2B ) that coincided with a reduction in the rate of intracellular replication ( Additional file 1 : Figure S2A ) , suggesting that intracellular replication results in host cell toxicity . 
+ This is in contrast to the T3SS3 ΔbsaM and the ΔbsaN regulatory mutants in strain KHW , which are limited in their ability to multiply intracellularly as previously reported ( Additional file 1 : Figure S2A ) . 
+ Three BsaN/BicA-activated orfs are located between the T3SS3 and T6SS1 loci , and upstream of the T3SS3 effector gene bopC . 
+ We analyzed these orfs for potential roles in intracellular replication and cell-to-cell spread . 
+ BPSS1512 encodes TssM , which was previously shown to be secreted independently of T3SS3 and T6SS1 and functions as a broad-base deubiquitinase , with activity on TNFR-associated factor-3 , TNFR-associated factor-6 , and IκBα [ 31 ] . 
+ BPSS1513 is predicted to encode a short ( 97 aa ) protein of unknown function and was not secreted under our assay conditions ( Additional file 1 : Figure S3A ) . 
+ folE ( BPSS1514 ) encodes a putative GTP cyclohydrolase I , suggesting a role in tetrahydrofolate biosynthesis rather than in virulence . 
+ Consistent with this notion , the Δ ( BPSS1513-folE ) mutant did not exhibit defects in cell-based virulence assays ( Additional file 1 : Figure S3B-E ) . 
+ Discussion
+ T3SSs and T6SSs play important roles in bacterial-host cell interactions [ 32,33 ] . 
+ As each system is a complex structure encoded by 20 or more genes , it is expected that their expression and assembly would be tightly regulated . 
+ In B. pseudomallei , T3SS3 and T6SS1 gene clusters are highly induced following host cell infection [ 8 ] , and their function is critical for virulence in animal models [ 8,13 ] . 
+ T3SS3 has been shown to promote escape from endocytic vesicles , and T6SS1 plays a key role in promoting intercellular spread by fusion of adjacent cell membranes , leading to the formation of MNGCs that can be found in melioidosis patients [ 34 ] . 
+ Upregulation of T3SS3 is mediated by a signalling cascade initiating from BspR through BprP , which in turn increases the expression of the AraC-type regulator BsaN . 
+ Using global transcriptome and promoter activation analysis , we have shown that the BsaN regulon occupies a central position in modulating the expression of T3SS3 , T6SS1 and several additional loci that are likely involved in promoting virulence and intracellular survival . 
+ Regulatory factors may act to control expression by acting directly on a given gene , or indirectly by modulating a regulatory intermediate . 
+ We found that BsaN in complex with the T3SS3 chaperone BicA directly controls the expression of 19 loci in a region of chromosome 2 containing T6SS1 and T3SS3 accessory genes ( BPSS1494-BPSS1533 ) . 
+ BsaN/BicA activated transcription of the operons encoding T3SS3 effector proteins , the BipBCD translocon complex , chaperones , and other transcriptional regulators , as well as two genes of unknown function ( BPSS1513-1514 ) . 
+ BsaN/BicA upregulates expression of T6SS1 by activating the transcription of the two component regulatory system loci virAG and bprC , which in turn induce the hcp and tssAB loci , encoding T6SS1 tube and sheath proteins [ 8,35 ] . 
+ Interestingly , our RNAseq and qRT-PCR analyses revealed that BsaN also acts to repress transcription of T3SS3 apparatus genes in the bsaM and bsaN operons that are otherwise directly activated by the upstream regulator BprP . 
+ It is possible that BsaN mediates repression indirectly as the bsaM and bsaN intergenic region lacks a recognizable BsaN binding motif ( see below ) . 
+ It is unlikely , however , that repression occurs due to decreased expression of bprP since its transcription is unchanged in a ΔbsaN mutant . 
+ Taken together , these findings demonstrate that BsaN plays a dual role in the regulation of T3SS3 : one in coordinating translocon and effector transcription , and a second in preventing costly synthesis of T3SS3 apparatus components that are no longer required . 
+ Given the critical role of T3SS3 and T6SS1 in causing disease , BsaN/BicA could be considered a central regulator of B. pseudomallei mammalian virulence . 
+ Virulence studies in mice support this notion , since the ΔbsaN mutant was unable to cause disease [ 8 ] in contrast to the ΔbspR mutant , which produced a more chronic infection in mice compared to wildtype bacteria [ 14 ] . 
+ In addition to loci associated with T3SS3 and T6SS1 , 41 other genes with potential roles in virulence were also found by RNAseq to be positively regulated by BsaN , most notably the bimBCAD intracellular motility operon and tssM . 
+ Regulation of bimA has been shown to be through virAG [ 8 ] , explaining why no BsaN motif was identified for the operon . 
+ While bimA encodes an autotransporter protein that nucleates and polymerizes host cell actin to facilitate intracellular motility and cell-cell spread by the bacteria [ 36 ] , the functions of the other loci in the bim operon are unknown . 
+ TssM has been shown to suppress host NFκB and Type I interferon pathways [ 31 ] . 
+ TssM is expressed and secreted inside cells following infection with B. mallei [ 29 ] ; however , secretion occurs independently of T3SS3 and T6SS1 [ 31 ] . 
+ BsaN was also found to activate expression of a putative non-ribosomal peptide synthase ( NRPS ) / polyketide synthase ( PKS ) biosynthesis locus . 
+ The diversity of polyketides , PKSs and NRPS/PKS hybrid systems was recently reviewed by Hertweck [ 37 ] . 
+ The B. pseudomallei locus is similar in gene content to that of a recently described plasmid encoded NRPS/PKS system in the marine bacterium Alteromonas macleodii , which was suggested to produce a bleomycin-related antibiotic . 
+ Unlike in A. macleodii , the gene encoding the putative bleomycin-family resistance protein ( BPSL2883 ) is not co-localized with the NRPS/PKS gene cluster , although they are similarly regulated by BsaN ( Table 1 ) . 
+ BsaN is homologous to the Salmonella typhimurium InvF , Shigella flexneri MxiE and the Yersinia enterocolitica YsaB transcriptional regulators [ 38-40 ] . 
+ All belong to the AraC/XylS family of transcriptional regulators , which act in complex with a chaperone to activate their respective T3SS genes . 
+ The chaperones not only serve as cognate partners to the transcriptional activators but also pair with T3SS translocase proteins , which are secreted into the host membrane to facilitate the injection of effector proteins [ 41 ] . 
+ We currently have no understanding of the timed mechanism that frees BicA and allows it to partner with BsaN . 
+ The S. typhimurium chaperone SicA was shown to partition the translocase SipB and SipC , and it is sequestered by SipB [ 42 ] . 
+ Once apparatus assembly is complete , translocases are secreted and SicA is free to complex with and thus activate InvF . 
+ The InvF-SicA split feedback regulatory loop , which includes positive autoregulation of invF , is conserved in Y. enterocolitica [ 40 ] . 
+ However , in S. flexneri MxiE-dependent activity is inhibited via sequestration by the T3SS substrate OspD1 when the apparatus is inactive [ 43 ] . 
+ Only when OspD1 is secreted can MxiE partner with its chaperone IpgC to activate transcription of effector genes . 
+ Regulation by BsaN-BicA is distinct from the previously described systems . 
+ The designation of BsaN-BicA as a dual-function regulatory protein complex is illustrated by its role in activating T3SS effector and accessory genes while repressing the system 's structural and secretion components as summarized in Figure 7 . 
+ BsaN was also found to suppress the transcription of 51 additional genes in the B. pseudomallei genome including those belonging to the fla1 flagellar and chemotaxis locus on chromosome 1 ( Figure 1E ) . 
+ Fla1 is the sole flagellar system in Southeast Asian B. pseudomallei strains such as KHW , in contrast to Australian B. pseudomallei isolates which possess a complete second system encoded on chromosome 2 ( Fla2 ) [ 9,44 ] . 
+ The conserved fla1 locus encodes polar flagella and was shown to be responsible for swimming in liquid medium and swarming in soft agar , but played no role in intracellular motility following infection [ 9 ] . 
+ Moreover , we were intrigued to find that BsaN suppresses a second PKS/NRPS cluster ( BPSS0130 , BPSS0303-BPSS0311 , BPSS0328-BPSS0339 ) ( Table 2 ) , where almost identical homologs were identified in B. mallei and B. thailandensis by Biggins et al. and shown to produce an iron-chelating siderophore called malleilactone [ 45 ] . 
+ Disruption of the MAL cluster in B. thailandensis reduced lethality following infection of C. elegans , and purified malleilactone was toxic to mammalian cells at micromolar concentrations . 
+ How the function of MAL fits within an overall regulatory framework that promotes virulence is not clear , although it is conceivable that BsaN-mediated suppression of MAL reduces the production of toxic products during infection , thereby promoting long term survival within eukaryotic hosts . 
+ Alternatively , malleilactone itself may regulate virulence factor production similar to that reported for the P. aeruginosa siderophore pyoverdine [ 46 ] . 
+ Until recently , BopA and BopE were the only two known T3SS effector proteins in B. pseudomallei . 
+ The dearth of effectors is surprising when compared to other intracellular pathogens such as Shigella and Salmonella that are known to possess numerous effectors . 
+ We have independently identified BopC ( BPSS1516 ) as a new T3SS3 effector based on its regulatory control by BsaN/BicA . 
+ bopC is transcribed in an operon encoding its chaperone ( BPSS1517 ) and a transposase ( BPSS1518 ) that are also activated by BsaN/BicA . 
+ Incidentally , we had previously predicted by a genome-wide screen that BPSS1516 would encode a T3SS effector based on genomic colocalization with T3SS chaperones [ 47 ] . 
+ The BsaN regulatory motif we found in the promoters of the effectors was also recently reported to be associated with T3SS3 in a condition-dependent transcriptome study [ 48 ] . 
+ Of the T3SS3-linked effector proteins BopA , BopC and BopE , our results suggest that BopA is the most critical for promoting cellular infection , consistent with prior studies linking BopA to intracellular survival of B. pseudomallei and B. mallei [ 16,17,49 ] . 
+ No cellular phenotype was evident following infection with ΔbopC or ΔbopE deletion mutants , and the ΔbopACE triple effector mutant was indistinguishable from the ΔbopA single deletion strain . 
+ As with bopE and bopC , no roles were observed for the BsaN-regulated effector candidate loci BPSS1513-1514 in cell-based virulence assays . 
+ BPSS1513 encodes a hypothetical protein and BPSS1514 is annotated as folE , a predicted GTP cyclohydrolase . 
+ Based on their genomic organization , the transcription of these loci is likely driven from the promoter upstream of BPSS1512 ( tssM ) . 
+ The secretion of HA-tagged BPSS1513 was not detected in in vitro secretion assays , although it is possible that the epitope tag could have interfered with secretion of BPSS1513 , or that the assay was not performed at conditions permissive for secretion . 
+ It is intriguing that these three genes are placed under BsaN/BicA regulation by the bacterium . 
+ One possibility could be that they are important under specific stress conditions or during chronic infection . 
+ Conclusions
+ Elucidating the scope of the BsaN regulon significantly enhances our understanding of B. pseudomallei pathogenic mechanisms . 
+ BsaN orchestrates the temporal and spatial expression of virulence determinants during progression through the intracellular lifecycle , promoting endosome escape and possibly evasion of autophagy through activation of T3SS3 effector loci , facilitating cell-cell spread by activation of T6SS1 and the bim intracellular motility loci , and suppressing cellular immunity via the action of the TssM ubiquitin hydrolase . 
+ BsaN also suppresses other loci that are potentially counterproductive following intracellular localization , such as the fla1 flagellar motility and chemotaxis locus , which could lead to activation of cellular immunity pathways through PAMP recognition . 
+ It is likely that the BsaN regulon and other virulence determinants that promote pathogenesis in higher mammals have been shaped primarily as a result of interactions with free-living protozoa , similar to what is believed to be the case for L. pneumophila [ 50 ] . 
+ Indeed , many of the same BsaN-regulated systems , namely T3SS and T6SS , are thought to act as `` anti-predation determinants '' that facilitate endosome escape and promote survival within bacterivorous amoebae by manipulating eukaryotic pathways that are conserved from protists to humans [ 3 ] . 
+ The dual regulatory roles of BsaN -- that of an activator and a suppressor -- indicate that it is a key node in a regulatory program that successfully enables an environmental saprophyte to transition from the soil to surviving intracellularly . 
+ Methods
+ Bacterial strains and culture conditions
+ Bacterial strains are listed in Table 3 . 
+ Plasmids are listed in Table 4 and Additional file 1 : Table S2 . 
+ The B. pseudomallei wild-type strain used in this study is the clinical isolate KHW . 
+ Plasmids were introduced into E. coli DH5α and S17-1 [ 51 ] strains by electro- or chemical transformation . 
+ Plasmids were transferred into B. pseudomallei by conjugation from E. coli S17-1 on membrane filters . 
+ E. coli donors and B. pseudomallei recipients were first mixed on filters and incubated at 37 °C on non-selective Luria-Bertani ( LB ) agar for 3 hours before transferring the filters onto selective media . 
+ For RNA isolation for transcriptome analysis and qRT-PCR , B. pseudomallei wild-type and mutant strains were cultured in acidic ( pH 5.0 ) RPMI medium containing 10 % fetal bovine serum at 37 °C for 4 hours , when bacteria were in their mid exponential growth phase ( OD600 ~ 1.0 ) . 
+ Acidification results in higher T3SS3 expression without affecting cell growth . 
+ Bacterial mutant construction
+ B. pseudomallei gene deletions were generated by allelic exchange . 
+ Approximately 1 kb fragments upstream and downstream of the target gene were amplified from genomic DNA and cloned into pK18mobsacB vector [ 52 ] simultaneously using In-Fusion PCR cloning kit ( Clontech ) . 
+ The plasmids were introduced into B. pseudomallei strains by conjugation . 
+ Homologous recombination was then selected for by growing bacteria in LB + 15 % sucrose to counter select the sacB gene in the pK18mobsacB plasmid backbone . 
+ Successful double cross-over clones were screened by colony PCR from kanamycin sensitive colonies . 
+ Activation of potential promoters by regulators
+ The ability of regulators to directly activate the expression of promoters was examined in E. coli DH5α as described previously [ 8 ] . 
+ Briefly , upstream regions of B. pseudomallei genes encompassing at least 100 bp of non-coding sequence upstream of the start codon were amplified from KHW genomic DNA and fused to the lacZ gene in pRW50 or pRW50mob to generate transcriptional fusions ( Table 4 ) . 
+ Coding sequences of regulators were amplified from KHW genomic DNA and cloned into the arabinose-inducible expression vector pMLBAD . 
+ The lacZ fusion plasmid and arabinose-inducible regulator plasmid were introduced into E. coli DH5α . 
+ β-Galactosidase activities arising from the expression of promoter-lacZ fusions were assessed . 
+ β-Galactosidase assays were performed and values were calculated as previously described [ 53 ] . 
+ Transcriptome analysis by RNAseq
+ Total RNA was extracted from three independently grown bacterial cultures that were combined at equal cell density in their exponential growth phase and quick frozen in dry ice-ethanol slurry . 
+ Approximately 2 × 10^9 ice cold cells were centrifuged at 3000 × g for 45 sec at 4 °C and RNA was isolated from cell pellets using the RiboPure™-Bacteria Kit ( Ambion ) . 
+ Stable RNAs were removed from 10 μg RNA using the MICROBExpress kit from Ambion . 
+ Absence of genomic DNA contamination was confirmed by PCR . 
+ Paired-end libraries for Illumina sequencing [ 54 ] were prepared using the Tru-Seq RNA sample preparation kit version 2.0 ( Illumina ) according to manufacturer 's High Sample ( HS ) protocol albeit omitting the initial poly A selection step . 
+ Libraries were generated from 2 technical replicates using 350 -- 500 ng enriched RNA from wildtype and ΔbsaN mutant strains as the starting material . 
+ Library preparation and sequencing were done by the UCLA Neuroscience Genomics Core ( UNGC ) . 
+ Reads were aligned to chromosomes I and II of B. pseudomallei KHW ( also called BP22 ) ( RefSeq identification numbers NZ_CM001156.1 and NZ_CM001157.1 ) and B. pseudomallei K96243 ( RefSeq identification numbers NC_006350.1 and NC_006351.1 ) as the annotated reference genome . 
+ The number of reads aligning to each genomic position on each strand was calculated and normalized using RPKM ( [ reads/kb of gene ] / [ million reads aligning to genome ] ) . 
+ Differentially expressed genes were identified by the log2 ratio of the differential between the wildtype and ΔbsaN RPKMs . 
+ Only genes with a Δlog2 value of > 1.5 or < −1.5 corresponding to 3-fold up or down regulated genes with a
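The RPKM normalization and log2-ratio cutoff described above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function names, the pseudocount used to avoid division by zero, and the example numbers are assumptions introduced here.

```python
import math


def rpkm(read_count, gene_length_bp, total_mapped_reads):
    """[reads / kb of gene] / [million reads aligning to the genome]."""
    reads_per_kb = read_count / (gene_length_bp / 1_000)
    return reads_per_kb / (total_mapped_reads / 1_000_000)


def log2_ratio(rpkm_wildtype, rpkm_mutant, pseudocount=1.0):
    """log2(wildtype / mutant).

    The pseudocount is an assumption of this sketch (the text does not say
    how genes with zero reads were handled).
    """
    return math.log2((rpkm_wildtype + pseudocount) / (rpkm_mutant + pseudocount))


def classify(delta_log2, cutoff=1.5):
    """Label a gene using the +/- 1.5 log2 cutoff quoted in the text."""
    if delta_log2 > cutoff:
        return "activated by BsaN"
    if delta_log2 < -cutoff:
        return "repressed by BsaN"
    return "unchanged"


if __name__ == "__main__":
    # Hypothetical counts: 500 reads on a 2 kb gene, 10 million mapped reads.
    print(rpkm(500, 2_000, 10_000_000))
    print(classify(log2_ratio(31.0, 3.0)))
```

Note that a log2 ratio of 1.5 corresponds to a fold change of 2^1.5 ≈ 2.83, whereas exactly 3-fold is log2(3) ≈ 1.585, so the cutoff is a rounded equivalent of "3-fold". The adjusted p-value filter mentioned in the Results is not reproduced here.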
+ Measurement of B. pseudomallei gene expression by qRT-PCR
+ Expression of activated genes was confirmed by qRT-PCR of RNA prepared from bacteria grown in acidified RPMI . 
+ Gene repression was difficult to observe under these conditions ; RNA for qRT-PCR analysis was therefore prepared from infected RAW264.7 cells using the following procedure : RAW264.7 cells ( 5 × 10^5 cells/well ) were seeded and grown overnight in DMEM medium in 12 well plates . 
+ RAW264.7 cells were transferred to RPMI medium prior to infection and infected at MOI of 100:1 . 
+ Bacterial RNA was isolated from infected RAW264.7 cells 4 hours post infection using TRIzol and PureLink RNA mini-kit ( Invitrogen ) . 
+ cDNA was synthesized using 1 μg of RNA and the High Capacity Reverse Transcription Reagent Kit ( Applied Biosystems ) . 
+ Transcripts were quantified using GoTaq qPCR Master Mix ( Promega ) in a BioRad iQ5 machine . 
+ Real-time PCR primers are listed in Additional file 1 : Table S4 . 
+ Relative RNA level of a particular gene in mutant strains was normalized to that of wild type using the 2^−ΔΔCt method with 16S rRNA or recA as reference gene [ 55 ] . 
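The 2^−ΔΔCt calculation referred to above can be illustrated with a short sketch. The function name and the Ct values in the example are hypothetical, and the formula assumes roughly 100 % amplification efficiency for both the target and the reference gene.

```python
def relative_expression(ct_target_mutant, ct_ref_mutant,
                        ct_target_wildtype, ct_ref_wildtype):
    """Relative RNA level in the mutant versus wild type by 2^-ddCt.

    Each Ct is first normalized to the reference gene (16S rRNA or recA in
    the text), then the mutant is compared with the wild type.
    """
    delta_ct_mutant = ct_target_mutant - ct_ref_mutant
    delta_ct_wildtype = ct_target_wildtype - ct_ref_wildtype
    delta_delta_ct = delta_ct_mutant - delta_ct_wildtype
    return 2.0 ** (-delta_delta_ct)


if __name__ == "__main__":
    # Hypothetical Ct values: the target crosses threshold three cycles
    # later in the mutant, with an unchanged reference gene.
    print(relative_expression(25.0, 15.0, 22.0, 15.0))
```

A value below 1 means the transcript is less abundant in the mutant than in the wild type (consistent with a gene activated by the deleted regulator); a value above 1 means it is more abundant.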
+ Mapping transcriptional start sites
+ The transcriptional start ( +1 ) sites of the promoters were mapped by RNA ligase-mediated rapid amplification of cDNA ends ( RLM-RACE ) [ 56 ] . 
+ The RLM-RACE was performed using GeneRacer Kit ( Invitrogen ) according to manufacturer 's instructions . 
+ The B. pseudomallei RNA was isolated as described previously [ 14 ] . 
+ Sequence motif prediction and database search of motifs in the B. pseudomallei KHW genome
+ A 150-bp nucleotide sequence upstream of the transcriptional start of each gene was submitted to the bioinformatics tool MEME ( http://meme.nbcr.net/meme/cgi-bin/meme.cgi ) for prediction of DNA motifs [ 57 ] . 
+ The motif with the highest statistical significance ( lowest E-value ) was chosen and its data , in Position-Specific Probability Matrix format , were submitted to MAST ( http://meme.nbcr.net/meme/cgi-bin/mast.cgi ) to search for the best matching positions in the upstream sequences of B. pseudomallei KHW genes [ 58 ] . 
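The idea of searching upstream sequences with a position-specific probability matrix can be sketched as below. This is a simplified log-odds scan, not MAST's actual scoring or statistics: the uniform 0.25 background, the probability floor, the forward-strand-only scan, and the toy three-column matrix are all assumptions of this example, and the real BsaN box is 15 bp.

```python
import math

BASES = "ACGT"


def log_odds_matrix(prob_matrix, background=0.25, floor=1e-3):
    """Convert per-position base probabilities to log2-odds scores.

    A uniform background is an assumption; a GC-rich genome would call for
    its own base frequencies. The floor avoids log(0).
    """
    return [
        {b: math.log2(max(column.get(b, 0.0), floor) / background) for b in BASES}
        for column in prob_matrix
    ]


def best_match(sequence, prob_matrix):
    """Return (start, score, window) of the best forward-strand match."""
    scores = log_odds_matrix(prob_matrix)
    width = len(scores)
    best = None
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        if any(base not in BASES for base in window):
            continue  # skip windows with ambiguous bases such as N
        score = sum(scores[i][base] for i, base in enumerate(window))
        if best is None or score > best[1]:
            best = (start, score, window)
    return best


if __name__ == "__main__":
    # Toy matrix strongly preferring the word "TGA" (hypothetical motif).
    toy = [
        {"A": 0.01, "C": 0.01, "G": 0.01, "T": 0.97},
        {"A": 0.01, "C": 0.01, "G": 0.97, "T": 0.01},
        {"A": 0.97, "C": 0.01, "G": 0.01, "T": 0.01},
    ]
    print(best_match("CCTGACC", toy))
```

A fuller search would also scan the reverse complement and convert scores to p-values, which is what dedicated tools provide.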
+ Statistical analysis
+ Results were presented as mean ± standard deviation . 
+ Student 's t-test was used to find the significant differences between the means , defined as when p < 0.05 ( * ) and p < 0.01 ( ** ) . 
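As a worked illustration of the statistics described above, the sketch below computes a two-sample Student's t statistic and maps a p-value to the asterisk notation. The text does not state whether the test was paired or whether equal variances were assumed; the unpaired, pooled-variance form used here is an assumption, and the function names are hypothetical.

```python
import math
from statistics import mean, variance


def student_t(sample_a, sample_b):
    """Unpaired two-sample t statistic with pooled variance.

    Returns (t, degrees_of_freedom). Equal variances are assumed.
    """
    n_a, n_b = len(sample_a), len(sample_b)
    pooled = ((n_a - 1) * variance(sample_a) + (n_b - 1) * variance(sample_b)) / (
        n_a + n_b - 2
    )
    standard_error = math.sqrt(pooled * (1 / n_a + 1 / n_b))
    return (mean(sample_a) - mean(sample_b)) / standard_error, n_a + n_b - 2


def significance_marker(p_value):
    """Asterisk notation from the text: * for p < 0.05, ** for p < 0.01."""
    if p_value < 0.01:
        return "**"
    if p_value < 0.05:
        return "*"
    return ""


if __name__ == "__main__":
    t_stat, dof = student_t([1.0, 2.0, 3.0], [4.0, 5.0, 6.0])
    print(t_stat, dof)
```

Converting the t statistic to a p-value requires the t distribution for the given degrees of freedom, which is left to a statistics library or table.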
+ Competing interests
+ The authors declare no competing interests.
+ Authors’ contributions
+ YC , IS and YHG designed the experiments . 
+ YC , IS , CTF , XJY , BET and IJT performed the experiments . 
+ YC , IS , CTF , AJ and YHG analyzed the results . 
+ YC and YHG conceived the study and together with IS and JFM wrote the manuscript . 
+ All authors read and approved the final manuscript . 
+ Acknowledgements
+ We thank M. A. Valvano ( University of Western Ontario ) for pMLBAD plasmid , S.J. Busby ( University of Birmingham ) for pRW50 plasmid , S. Korbsrisate ( Mahidol University ) for BopC antibody , and M.P. Stevens ( University of Edinburgh ) for BopE antibody . 
+ This work is supported by grants T208A3105 from the Ministry of Education to YHG , NMRC/1221/2009 from the National Medical Research Council to YHG , an award from the Pacific Southwest Regional Center of Excellence in Biodefense and Emerging Infectious Diseases ( NIH U54 A1065359 ) to JFM , and grant HDTRA1-11-1-0003 from the Defense Threat Reduction Agency to JFM . 
+ We would like to thank Isabelle Chen for her technical assistance . 
+ Author details
+ 1Department of Biochemistry , Yong Loo Lin School of Medicine , National University of Singapore , Singapore 117597 , Singapore . 
+ 2Department of Microbiology , Immunology and Molecular Genetics , The University of California Los Angeles , Los Angeles , CA 90095 , USA . 
+ 3Department of Molecular , Cell and Developmental Biology , The University of California Los Angeles , Los Angeles , CA 90095 , USA . 
+ 4California NanoSystems Institute , The University of California Los Angeles , Los Angeles , CA 90095 , USA . 
+ 5Molecular Biology Institute , The University of California Los Angeles , Los Angeles , CA 90095 , USA . 
+ 6Immunology Program , Yong Loo Lin School of Medicine , National University of Singapore , Singapore 117597 , Singapore
\ No newline at end of file
--- a/data/TEXT_FILES/notuseful_txt/25089258.txt 0 → 100644
View file @27818a9
+++ b/data/TEXT_FILES/notuseful_txt/25089258.txt 0 → 100644
View file @27818a9
+ Genomics Data
+ abstract 
+ Genomic DNA from Mycobacterium smegmatis mc2155 derived strains
+ AB SOLiD 4 system high-throughput genome sequencer
+ Raw data : sra files , normalized data : wig , SOFT , MINiML , and TXT files
+ In the M. smegmatis strain that was used , the carD gene had been deleted from the native chromosomal locus and the strain instead constitutively expressed a functional C-terminal HA tagged version of CarD . 
+ The exception was the control strain that expressed an untagged HA peptide and retained the carD gene at its endogenous locus . 
+ All M. smegmatis strains were isogenic to mc2155 and were grown at 37 °C in LB supplemented with 0.5 % dextrose , 0.5 % glycerol , and 0.05 % Tween 80 to late log phase ( ODλ600 of 1.0 ) before crosslinking the protein -- nucleic acid complexes . 
+ Direct link to deposited data
+ The direct link for the ChIP-seq data is : http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE48164 . 
+ Experimental design, materials and methods
+ Bacterial strains and culture conditions
+ All M. smegmatis strains were isogenic to mc2155 and were grown at 37 °C in LB supplemented with 0.5 % dextrose , 0.5 % glycerol , and 0.05 % Tween 80 ( broth ) . 
+ For immunoprecipitation of CarD , RNAP β , and RNAP σA , a carD merodiploid strain was produced by integrating pMSG430smcarD-HA ( constitutively expresses M. smegmatis C-terminal HA tagged CarD , kanamycin resistant ) into the attB site of M. smegmatis mc2155 . 
+ Allelic exchange experiments were performed with the carD merodiploid strain using a DNA donor sequence with homology to mc2155 nucleotides 6141480 to 6142268 and 6140266 to 6141010 to delete all of the carD gene except the nucleotides encoding the ﬁrst 10 and last 3 amino acids from the endogenous locus , generating ΔcarD attB : : tetcarD-HA [ 6 ] . 
+ For immunoprecipitation of unfused HA peptide as a control , mc2155 was transformed with pmsg431 , which integrates into the attB site of the genome and constitutively expresses HA peptide . 
+ This strain was called mc2155 attB : : pmsg431 . 
+ Chromatin immunoprecipitation
+ Cultures of M. smegmatis ΔcarD attB : : tetcarD-HA and mc2155 attB : : pmsg431 strains were grown to late log phase ( ODλ600 = ~ 1 ) before adding a ﬁnal concentration of 2 % formaldehyde and shaking at room temperature for 30 min to crosslink DNA and proteins . 
+ The crosslinking was quenched by the addition of 0.25 ml of 2.5 M glycine per 5 ml of culture and incubated 5 min at 25 °C with shaking . 
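As a quick check on the quenching step, the final glycine concentration follows from a simple dilution. The sketch below assumes the volumes are additive and that the 0.25 ml of stock is added to the full 5 ml of culture; the function name is illustrative.

```python
def final_concentration(stock_molar, stock_ml, sample_ml):
    """Molar concentration after adding stock_ml of stock to sample_ml of sample."""
    # moles added (in mmol) divided by the combined volume (in ml) gives mol/l
    return stock_molar * stock_ml / (sample_ml + stock_ml)


glycine_final = final_concentration(2.5, 0.25, 5.0)
print(f"{glycine_final * 1000:.0f} mM glycine")
```

Under these assumptions the quench works out to roughly 0.12 M glycine.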
+ 5 ml ( ~ 2.5 × 10^9 mycobacterial cells ) of each culture was then collected by centrifugation . 
+ The cells were washed once with TE and resuspended in 100 μl of TE supplemented with Roche Complete protease inhibitor cocktail . 
+ The cell suspension was lysed using a Covaris Focused-Ultrasonicator so that the genomic DNA was sheared into ~ 100 base pair ( bp ) fragments , as assessed by DNA gel electrophoresis . 
+ The use of the Covaris Focused-Ultrasonicator was critical for this step and other sonicator systems were unable to yield a comparable consistency and homogeneity of DNA fragment distribution . 
+ The cell debris was spun down and the lysate was added to 400 μl ChIP lysis buffer ( 50 mM HEPES-KOH [ pH 7.5 ] , 140 mM NaCl , 1 mM EDTA , 1 % Triton X-100 ) plus Roche Complete protease inhibitor cocktail . 
+ Protein -- nucleic acid complexes containing CarD-HA were immunoprecipitated from the M. smegmatis mc2155 ΔcarD attB : : tetcarD-HA strain cell lysate by adding 50 μl of anti-HA agarose ( Sigma ) . 
+ Complexes containing unfused HA were immunoprecipitated from the mc2155 attB : : pmsg431 strain with the same anti-HA agarose . 
+ RNAP β and σA were immunoprecipitated from ΔcarD attB : : tetcarD-HA with monoclonal antibodies specific for these subunits ( Neoclone ; 8RB13 for β , 2G10 for σ ) immobilized on GammaBind G Sepharose ( GE Healthcare Life Sciences ) . 
+ Each immunoprecipitation was performed in duplicate from two separate cultures , thus comprising two biological replicates . 
+ However , one of the RNAP σA samples was lost during library preparation and , therefore , there is only data for one RNAP σA replicate . 
+ The lysates and antibodies were incubated overnight by rotating at 4 °C . 
+ The antibody matrix was washed 2 × with ChIP lysis buffer , 2 × with ChIP lysis buffer plus an additional 360 mM NaCl , 2 × with ChIP wash buffer ( 10 mM Tris-HCl pH 8.0 , 250 mM LiCl , 0.5 % NP-40 , 0.5 % sodium deoxycholate , 1 mM EDTA ) , and 2 × with TE , each time by rotating for 10 min at 4 °C . 
+ Complexes that co-precipitated with the respective antibody matrix were eluted twice by adding 100 μl of ChIP elution buffer ( 50 mM Tris-HCl pH 8.0 , 10 mM EDTA , 1 % SDS ) , incubating for 10 min at 65 °C with agitation , spinning down the antibody matrix , and transferring the eluate to a new tube . 
+ Wash and elution buffers were all supplemented with Roche Complete protease inhibitor cocktail . 
+ To reverse the crosslinks , the eluates were incubated overnight at 65 °C . 
+ 15 μl of each sample was removed for Western blot analysis of proteins ; proteinase K was added to the rest of each sample to 100 μg / ml , and the samples were incubated at 37 °C for 2 h before nucleic acid was isolated by two phenol-chloroform extractions and ethanol precipitation , and the DNA pellet was resuspended in 34 μl of water . 
+ Sequencing
+ Co-precipitated DNA was sequenced using an AB SOLiD 4 high-throughput genome sequencer ( Life Technologies ) and a 50 bp read length , which provided sufficient reads for over 100-fold coverage of the genome in each sample : the M. smegmatis genome is 6,988,209 bp in length and each sequencing reaction yielded over 800 Mbp . 
+ Table 1 shows the total number of reads and number of mapped reads for each sample . 
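The coverage claim can be checked with simple arithmetic. The sketch below uses only the figures quoted in the text (genome length, 50 bp reads, 800 Mbp per reaction); the function names are illustrative, and the estimate ignores unmapped reads.

```python
import math

GENOME_BP = 6_988_209  # M. smegmatis genome length quoted in the text
READ_BP = 50           # read length quoted in the text


def fold_coverage(sequenced_bp, genome_bp=GENOME_BP):
    """Average number of times each base is covered by the sequenced bases."""
    return sequenced_bp / genome_bp


def reads_for_coverage(fold, genome_bp=GENOME_BP, read_bp=READ_BP):
    """Smallest number of reads of length read_bp giving at least `fold` coverage."""
    return math.ceil(fold * genome_bp / read_bp)


print(f"{fold_coverage(800_000_000):.1f}-fold")  # about 114-fold from 800 Mbp
print(reads_for_coverage(100))                   # reads needed for 100-fold coverage
```

So 800 Mbp corresponds to roughly 114-fold coverage, consistent with the "over 100-fold" statement.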
+ Normalization
+ The DESeq method [ 1 ] was used to normalize the raw data sequence reads . 
+ Speciﬁcally , the normalized coverage ( or counts ) was determined by multiplying the raw ( sequenced ) coverage ( or counts ) in each sample by that sample 's size factor . 
+ The size factors are determined by taking the median of the ratios of observed counts . 
+ The normalized number of sequence reads per base pair was then expressed as a log2 value . 
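The median-of-ratios size factors can be sketched as below. This is an illustration of the standard DESeq procedure, not the authors' code. In the DESeq convention, raw counts are divided by the size factor, so the sketch divides; this matches the multiplication described above if the text's size factor is read as the reciprocal, which is an assumption here. Positions with a zero count in any sample are skipped when estimating the factors.

```python
import math
from statistics import median


def size_factors(samples):
    """One size factor per sample; samples are equal-length lists of raw counts."""
    n = len(samples)
    log_geo_means = []
    for column in zip(*samples):
        # geometric mean across samples, only where every sample has a count
        if all(c > 0 for c in column):
            log_geo_means.append(sum(math.log(c) for c in column) / n)
        else:
            log_geo_means.append(None)
    factors = []
    for sample in samples:
        ratios = [c / math.exp(g) for c, g in zip(sample, log_geo_means) if g is not None]
        factors.append(median(ratios))
    return factors


def normalize(samples):
    """Raw counts divided by each sample's size factor."""
    return [[c / f for c in sample] for sample, f in zip(samples, size_factors(samples))]


# Hypothetical counts: the second sample was sequenced twice as deeply
raw = [[10, 20, 40, 0], [20, 40, 80, 5]]
print(size_factors(raw))
print([math.log2(v) for v in normalize(raw)[0] if v > 0])
```

After normalization the two hypothetical samples have equal values at every position where both have reads, which is the purpose of the size factors.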
+ If a read mapped with equal quality at multiple loci ( but not more than 3 ) , its contribution was distributed evenly among them . 
+ For example , the sequences of the 16S , 23S , and 5S ribosomal RNA are identical in the M. smegmatis rrnA and rrnB operons . 
+ Therefore , the total number of reads for those sequences was split equally between the operons . 
+ If the number of mapping loci was higher than 3 , the read was discarded . 
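The multi-mapping rule described above (split evenly among up to 3 equally good loci, discard beyond that) can be sketched as follows. The input structure, a mapping from read name to its list of equally good loci, is a hypothetical simplification of real alignment output.

```python
from collections import defaultdict

MAX_LOCI = 3  # reads mapping equally well to more loci than this are discarded


def distribute_reads(alignments, max_loci=MAX_LOCI):
    """Return per-locus read weight, splitting each multi-mapped read evenly."""
    weight = defaultdict(float)
    for loci in alignments.values():
        if not loci or len(loci) > max_loci:
            continue  # unmapped, or too ambiguous to place
        share = 1.0 / len(loci)
        for locus in loci:
            weight[locus] += share
    return dict(weight)


# Hypothetical reads: one unique, one shared by rrnA and rrnB, one too ambiguous
example = {
    "read1": ["geneX"],
    "read2": ["rrnA", "rrnB"],
    "read3": ["a", "b", "c", "d"],
}
print(distribute_reads(example))
```

Each retained read contributes a total weight of 1, so the summed coverage still reflects the number of usable reads.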
+ The normalized number of reads for each base pair was saved as a wig ﬁle for each sample . 
+ Data analysis
+ We ﬁrst determined how well replicate samples of the distribution of a given protein correlated to each other and how well the distribution of CarD correlated to the distributions of RNAP β and RNAP σA ( Tables 2 and 3 ) . 
+ The correlations were obtained by computing the Pearson correlation of the genomic coverage proﬁles of each pair of samples . 
+ The coverage proﬁles were computed by summing the contributions of all mapped fragments , assuming they were 100 bp long , and then , in 20-bp steps along the entire genome , computing the average coverage of the surrounding 100-bp window . 
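The coverage profile and its correlation can be sketched as below. The fragment length, step, and window come from the text; treating the genome as linear (no wrap-around at the origin) and centering each window on the step position are simplifying assumptions, and all names are illustrative.

```python
import math


def coverage_profile(fragment_starts, genome_len, frag_len=100, step=20, window=100):
    """Windowed mean coverage sampled every `step` bp along a linear genome."""
    per_base = [0] * genome_len
    for start in fragment_starts:
        # every mapped fragment is assumed to span frag_len bases
        for pos in range(start, min(start + frag_len, genome_len)):
            per_base[pos] += 1
    half = window // 2
    profile = []
    for center in range(0, genome_len, step):
        lo, hi = max(0, center - half), min(genome_len, center + half)
        profile.append(sum(per_base[lo:hi]) / (hi - lo))
    return profile


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical fragment start positions on a 1,000-bp toy genome
rep1 = coverage_profile([0, 50, 300, 310], 1000)
rep2 = coverage_profile([10, 60, 290, 320], 1000)
print(pearson(rep1, rep2))
```

For a real genome of about 7 Mbp, a vectorized implementation would be preferable to these pure-Python loops.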
+ Table 2 shows the correlations between the individual replicates . 
+ These data showed that individual replicates for a single immunoprecipitation condition correlated highly with one another ( bolded in Table 2 ) and indicated that the distribution of CarD-HA or RNAP β was consistent between biological replicates . 
+ This consistency between replicates allowed us to average the Pearson correlation values for each comparison to simplify the comparisons between immunoprecipitation conditions ( Table 3 ) . 
+ The correlation between the distribution of CarD-HA and the distribution of RNAP σA ( bolded in Table 3 ) was almost as high as the correlation between the two CarD-HA replicates , indicating that the distribution of CarD-HA is very similar to that of RNAP σA . 
+ To directly compare the genome distributions of CarD-HA , RNAP β , and RNAP σA , the reads per base pair from the unfused HA peptide sample served as the background control and were subtracted from the other datasets . 
+ The rationale for this control was that as a non-DNA binding protein , the HA peptide should be diffusely localized throughout the cell and serve as a readout for the background levels of nonspeciﬁc crosslinking to the DNA . 
+ The normalized , background-corrected log2 reads per base pair were then smoothed over a 20-bp window and RNAP σA and CarD-HA peaks were identified as described previously [ 3,4 ] . 
+ Briefly , maxima and minima were assigned as inflection points where the values ± 10 bp were both lower or both higher , respectively . 
+ Table 1 . Number of sequencing reads for each sample from the AB SOLiD 4 high-throughput genome sequencer set to a 50 bp read length . 
+ Maxima within 20 bp were merged with the peak location assigned to the maximum with the highest absolute signal value . 
+ Adjacent minima were merged analogously . 
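The peak-assignment rule can be sketched as follows, assuming the signal is a list of smoothed log2 values indexed by base pair. Merging is done by chaining maxima whose neighbors are no more than 20 bp apart, which is one reading of "within 20 bp"; the cited method [ 3,4 ] may differ in detail.

```python
def find_maxima(signal, flank=10):
    """Positions whose value exceeds the values `flank` bp to either side."""
    return [
        i
        for i in range(flank, len(signal) - flank)
        if signal[i - flank] < signal[i] and signal[i + flank] < signal[i]
    ]


def merge_peaks(positions, signal, max_gap=20):
    """Merge sorted positions closer than max_gap; keep the strongest in each group."""
    merged, cluster = [], []
    for pos in positions:
        if cluster and pos - cluster[-1] > max_gap:
            merged.append(max(cluster, key=lambda p: abs(signal[p])))
            cluster = []
        cluster.append(pos)
    if cluster:
        merged.append(max(cluster, key=lambda p: abs(signal[p])))
    return merged


# Hypothetical signal with two triangular peaks centered at 50 and 150
toy = [max(0, 30 - abs(i - 50), 30 - abs(i - 150)) for i in range(200)]
print(merge_peaks(find_maxima(toy), toy))  # [50, 150]
```

Minima can be handled the same way by reversing the two comparisons in `find_maxima`.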
+ To assess the statistical signiﬁcance , peaks were divided into 0.1 interval bins of peak heights with a lower cutoff of peak height of 0.4 log2 reads per base pair . 
+ Starting with the lowest bin , we then calculated the distance of each peak to the nearest gene start and compared these distances to those computed using genome coordinates arbitrarily rotated 1 × 10^6 bp around the M. smegmatis genome . 
+ Using the Wilcoxon -- Mann -- Whitney rank-sum test for nonsimilarity of distributions [ 3,4 ] , RNAP σA peaks in the 1.1 -- 1.2 peak-height bins and CarD-HA peaks in the 0.5 -- 0.6 peak-height bins were statistically significant ( P values for similarity of the distributions < 0.0001 ) . 
+ In other words , for each peak-height bin , the two lists of peak-to-gene start distances ( actual and rotated by 1 × 10^6 bp ) were tested for whether they were from different populations using the Wilcoxon -- Mann -- Whitney rank-sum test . 
+ Peaks in the lowest peak-height bin for which peak-to-gene start distances differed from random with a P value of < 0.0001 plus all peaks with greater heights were then used as the statistically significant peaks . 
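The rotation control and rank-sum comparison can be sketched as below. This is a simplified illustration: it rotates the peak coordinates (rotating the gene coordinates instead would be equivalent in spirit), measures distance around a circular genome, and uses a normal approximation without tie correction for the two-sided P value. All function names are hypothetical.

```python
import bisect
import math


def rotate(positions, shift, genome_len):
    """Shift coordinates around a circular genome."""
    return [(p + shift) % genome_len for p in positions]


def nearest_start_distances(peaks, gene_starts, genome_len):
    """Circular distance from each peak to the closest gene start."""
    starts = sorted(gene_starts)
    distances = []
    for p in peaks:
        i = bisect.bisect_left(starts, p)
        # neighbors on either side, wrapping around the origin
        neighbors = (starts[i % len(starts)], starts[i - 1])
        distances.append(min(min(abs(p - s), genome_len - abs(p - s)) for s in neighbors))
    return distances


def mann_whitney_u(x, y):
    """U statistic for x and a two-sided P value (normal approximation)."""
    values = sorted(list(x) + list(y))
    rank_of = {}
    i = 0
    while i < len(values):
        j = i
        while j < len(values) and values[j] == values[i]:
            j += 1
        rank_of[values[i]] = (i + 1 + j) / 2  # mean rank of tied values
        i = j
    n1, n2 = len(x), len(y)
    u1 = sum(rank_of[v] for v in x) - n1 * (n1 + 1) / 2
    z = (u1 - n1 * n2 / 2) / math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return u1, math.erfc(abs(z) / math.sqrt(2))
```

A usage sketch, with `peaks`, `gene_starts`, and `genome_len` supplied by the caller: compute `nearest_start_distances(peaks, ...)` and the same for `rotate(peaks, 1_000_000, genome_len)`, then pass both lists to `mann_whitney_u` for each peak-height bin.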
+ We then identiﬁed RNAP σA peaks associated with each gene as the closest RNAP σA peak upstream from the gene start and CarD-HA peaks associated with the RNAP σA peaks as the closest CarD-HA peak to each selected RNAP σA peak . 
+ To calculate average ChIP signals for the aggregate profiles ( Fig. 1 ) , we selected a subset of 62 genes meeting the following criteria : ( i ) ≥ 300 bp in gene length , ( ii ) average RNAP log2 ChIP signal ≥ 1.6 / bp , ( iii ) associated with an RNAP σA peak with log2 ChIP signal ≥ 3/bp , ( iv ) absence of other RNAP σA peaks within 500 bp upstream or 1000 bp downstream of the associated RNAP σA peak , ( v ) absence of an oppositely oriented gene with an average RNAP β log2 ChIP signal ≥ 1 upstream from the gene ( because an oppositely oriented gene could create a divergent promoter region with potential for overlapping RNAP σA and CarD-HA ChIP signals ) , and ( vi ) absence of an upstream gene with average RNAP log2 ChIP signal > 0 within 100 bp upstream from the gene ( because such an arrangement would indicate the gene is an internal member of an operon ) . 
+ The RNAP β , CarD-HA , and RNAP σA signals from the 62 genes were then averaged using the distance from the center of the associated σA peaks to align the genes ( Fig. 1 ) . 
+ For the gene alignments , the distance from the center of the associated σA peak served as a proxy for the transcriptional start site , since most transcriptional start sites are not mapped in M. smegmatis . 
+ This analysis showed that whereas RNAP β was found throughout transcribed regions of the genome , CarD-HA and RNAP σA were primarily associated with promoter regions . 
+ Table 3 . Average Pearson correlations of the genomic coverage profiles for each immunoprecipitation condition examined . 
+ Each sample was done in duplicate , except σA was done once . 
+ Correlations are the average of each duplicate to one another . 
+ The bolded number shows the correlation between the distribution of CarD-HA and the distribution of RNAP σA . 
+ These data matched the high correlation calculated for the distribution of CarD-HA and RNAP σA ( Tables 2 and 3 ) . 
+ Levels of both CarD-HA and RNAP σA dropped off immediately following the promoter sequences , suggesting that these proteins are lost from the RNAP elongating complex after transcription initiation . 
+ The colocalization of RNAP σA and CarD-HA led us to propose that in vivo , CarD associates with RNAP initiation complexes at most promoters and is therefore a global regulator of transcription initiation . 
+ Further analysis of the dataset also revealed that CarD was never present on the genome in the absence of RNAP , suggesting that it may be targeted to the genome through its interaction with RNAP . 
+ Discussion
+ CarD modulates transcription through its direct interaction with RNAP [ 6,7 ] . 
+ To determine at which stage of the transcription cycle ( initiation , elongation , or termination ) CarD acts , we used ChIP-seq [ 2 ] to survey the distribution of CarD throughout the M. smegmatis chromosome . 
+ Our data show that CarD is localized to promoters throughout the M. smegmatis genome , indicating that CarD functions during transcription initiation . 
+ Despite the previous ﬁnding that CarD has sequence non-speciﬁc DNA binding activity [ 5 ] , the ChIP-seq experiments also revealed that CarD was never present on the genome in the absence of RNAP β or RNAP σA , suggesting that CarD is targeted to the genome through its interaction with RNAP . 
+ The ChIP-seq data for the distribution of RNAP σA also serve as a map of potential promoter elements throughout the M. smegmatis genome , which has never before been experimentally examined . 
+ The ChIP-seq experimental dataset has also raised a number of new questions . 
+ Compilation of the ChIP-seq data and previous microarray expression proﬁling analyses [ 6 ] indicates that CarD is broadly distributed on promoters of most transcription units regardless of whether they were deregulated during CarD depletion . 
+ This brings into question whether CarD activity exhibits promoter speciﬁcity . 
+ There is also the striking correlation between the distributions of CarD and RNAP σA on the genome , despite the fact that no direct interaction between these proteins has been reported . 
+ The factors contributing to the enrichment of CarD at RNAP σ containing holoenzymes as opposed to elongating RNAP core complexes remain unknown and will be a topic of future study . 
+ Altogether , results from these experiments have provided invaluable information that will help direct the ongoing efforts in determining the mechanism of transcription regulation by CarD . 
+ In addition , this work serves as a framework for further investigations into RNAP function in mycobacteria . 
+ The authors thank the Genomics Core Laboratory ( GCL ) at Memorial Sloan Kettering Cancer Center ( MSKCC ) for performing the next-generation sequencing for ChIP-seq experiments . 
+ The GCL is supported by the cancer center core grant P30 CA008748 .
\ No newline at end of file
--- a/data/TEXT_FILES/notuseful_txt/25757765.txt 0 → 100644
View file @27818a9
+++ b/data/TEXT_FILES/notuseful_txt/25757765.txt 0 → 100644
View file @27818a9
+ Chapter 1
+ Plants consist of many functionally specialized cell types , each with its own unique epigenome , transcriptome , and proteome . 
+ Characterization of these cell type-speciﬁc properties is essential to understanding cell fate speciﬁcation and the responses of individual cell types to the environment . 
+ In this chapter we describe an approach to map chromatin features in specific cell types of Arabidopsis thaliana using nuclei purification from individual cell types with the INTACT method ( isolation of nuclei tagged in specific cell types ) followed by chromatin immunoprecipitation and high-throughput sequencing ( ChIP-seq ) . 
+ The INTACT system employs two transgenes to generate afﬁnity-labeled nuclei in the cell type of interest , and these tagged nuclei can then be selectively puriﬁed from tissue homogenates . 
+ The primary transgene encodes the nuclear tagging fusion protein ( NTF ) , which consists of a nuclear envelope-targeting domain , the green ﬂuorescent protein , and a biotin ligase recognition peptide , while the second transgene encodes the E. coli biotin ligase ( BirA ) , which selectively biotinylates NTF . 
+ Expression of NTF and BirA in a speciﬁc cell type thus yields nuclei that are coated with biotin and can be puriﬁed by virtue of their afﬁnity for streptavidin-coated magnetic beads . 
+ Compared with the original INTACT nuclei puriﬁcation protocol , the procedure presented here is greatly simpliﬁed and shortened . 
+ After nuclei puriﬁcation , we provide detailed instructions for chromatin isolation , shearing , and immunoprecipitation . 
+ Finally , we present a low input ChIP-seq library preparation protocol based on the nano-ChIP-seq method of Adli and Bernstein , and we describe multiplex Illumina sequencing of these libraries to produce high quality , cell type-speciﬁc epigenome proﬁles at a relatively low cost . 
+ The procedures given here are optimized for Arabidopsis but should be easily adaptable to other plant species . 
+ 1 Introduction
+ Plants build their bodies by drawing on pools of undifferentiated stem cells in the meristems to produce a wide array of specialized cell types , each with its own unique form and function . 
+ This cell fate specification requires reprogramming of the stem cell epigenome to establish and maintain the specific transcriptional program underlying the phenotype of a given specialized cell type . 
+ Understanding cellular differentiation is an important goal in developmental biology , but progress has been slowed by the technical challenges associated with isolating pure populations of speciﬁc cell types from a whole organism . 
+ In recent years a number of techniques have been used to characterize the molecular properties of different plant cell types . 
+ These methods include laser capture microdissection ( LCM ) and ﬂuorescence-activated cell sorting ( FACS ) , which use mechanical separation to isolate whole cells of the desired type , as well as afﬁnity-based methods such as tagged ribosome afﬁnity puriﬁcation ( TRAP ) and isolation of nuclei tagged in speciﬁc cell types ( INTACT ) , which are able to purify translating ribosomes and nuclei , respectively , from the target cell type . 
+ Each of these methods comes with certain advantages and limitations , and all have been reviewed in the recent literature [ 1 -- 4 ] . 
+ In this chapter , we describe the use of the INTACT method for characterizing the chromatin landscape in speciﬁc cell types of Arabidopsis thaliana . 
+ The strategy behind INTACT is to affinity-label the nuclear envelope in a desired cell type such that labeled nuclei can be affinity-purified from a tissue homogenate , in a procedure similar to performing an immunoprecipitation . 
+ This system requires two transgenes : ( 1 ) the tripartite nuclear tagging fusion ( NTF ) , which encodes a nuclear envelope-targeting domain , green ﬂuorescent protein ( GFP ) , and the biotin ligase recognition peptide ( BLRP ) ; and ( 2 ) E. coli biotin ligase ( BirA ) , which biotinylates the BLRP domain of NTF . 
+ The NTF transgene is expressed from a cell type-speciﬁc promoter and BirA is expressed from a constitutive promoter , such that the nuclear envelope is ﬂuorescently labeled and biotinylated only in the cell type of interest ( Fig. 1a , b ) . 
+ Nuclei from the desired cell type can then be speciﬁcally puriﬁed from a tissue homogenate using streptavidin-coated magnetic beads ( Fig. 1c , d ) . 
+ This approach was initially applied to the characterization of the epigenome and nuclear transcriptome of Arabidopsis root epidermal cell types [ 5 , 6 ] and has subsequently been used in the Arabidopsis embryo [ 7 ] , C. elegans muscle and Drosophila mesoderm [ 8 ] , Xenopus cardiac tissue [ 9 ] , as well as tomato roots [ 10 ] . 
+ The protocol presented here utilizes an improved version of the originally published procedure for nuclei puriﬁcation using INTACT [ 6 ] . 
+ In the original protocol , magnetic bead-bound nuclei were captured from a liquid column as they ﬂowed past a magnet , which required a substantial amount of time given the volume of bead -- nuclei solution and the low ﬂow rate required to efﬁciently capture highly pure bead-bound nuclei . 
+ The updated protocol does away with this ﬂow-based setup in favor of capturing the bead-bound nuclei directly in a tube placed in a magnet with larger surface area to accommodate the required volume of liquid . 
+ This alteration greatly reduces the amount of time required and further simpliﬁes the procedure without affecting yield or purity of target nuclei . 
+ Following the puriﬁcation of nuclei from transgenic plants carrying the NTF and BirA transgenes , we next present an optimized chromatin immunoprecipitation ( ChIP ) protocol that can be used on as few as 10,000 puriﬁed nuclei . 
+ Finally , we describe the preparation of ChIP-seq libraries for the Illumina sequencing platform using an adaptation of the nano-ChIP-seq method developed by Adli and Bernstein [ 11 , 12 ] . 
+ The procedures laid out here can theoretically be applied to epigenome proﬁling of any plant cell type , given the availability of an appropriate cell type-speciﬁc promoter . 
+ Prepare all solutions with molecular biology-grade water . 
+ Commercially available products should be stored and handled according to the manufacturer 's instructions . 
+ To prepare , dilute 10 × NPB to 1 × concentration and add spermidine , spermine , and Roche Complete protease inhibitors just before starting the nuclei purification procedure . 
+ Keep this solution on ice and use within 1 h of preparation . 
+ 7 . 
+ Nuclei Puriﬁcation Buffer containing 1 % formaldehyde ( NPBf ) : 20 mM MOPS ( pH 7 ) , 40 mM NaCl , 90 mM KCl , 2 mM EDTA , 0.5 mM EGTA , 0.5 mM spermidine , 0.2 mM spermine , 1 % ( v/v ) formaldehyde ( Sigma Aldrich , catalog no. 252549 ) . 
+ To prepare , dilute 10 × NPB to 1 × concentration and add spermidine , spermine , and formaldehyde just before starting the nuclei puriﬁcation procedure . 
+ Keep the solution at room temperature and use within 1 h . 
+ Formaldehyde is toxic . 
+ Avoid inhalation and skin exposure , and dispose of the used solution according to local regulations . 
+ 8 . 
+ Nuclei Puriﬁcation Buffer containing 0.1 % Triton X-100 ( NPBt ) : 20 mM MOPS ( pH 7 ) , 40 mM NaCl , 90 mM KCl , 2 mM EDTA , 0.5 mM EGTA , 0.5 mM spermidine , 0.2 mM spermine , 0.1 % ( v/v ) Triton X-100 . 
+ To prepare , dilute 10 × NPB to 1 × concentration and add spermidine , spermine , and Triton X-100 just before starting the nuclei puriﬁcation procedure . 
+ Keep this solution on ice and use within 1 day of preparation . 
+ 9 . 
+ 2 M glycine : To make 100 ml , dissolve 15.01 g Glycine in water for a ﬁnal volume of 100 ml , then ﬁlter-sterilize the solution . 
+ Store at 4 °C for up to 3 months . 
+ 10 . 
+ DAPI staining stock solution : dissolve 10 mg of DAPI powder ( Sigma Aldrich , catalog no. D9542 ) in 5 ml of water , filter-sterilize the solution and store at 4 °C in the dark . 
+ Before nuclei staining , make a 1:10 dilution of the stock in water , and use within several hours . 
+ Stock solution can be stored at 4 °C for several months . 
+ 11 . 
+ DynaMag 2 magnetic rack for 1.5 ml tubes ( Life Technologies , catalog no. 12321D ) and DynaMag 15 magnetic rack for 15 ml tubes ( Life Technologies , catalog no. 12301D ) . 
+ 12 . 
+ Sterile plastic 10 ml serological pipettes ( Fisher Scientiﬁc , catalog no. 13-678-12E ) . 
+ 13 . 
+ Pipet-Aid ( Fisher Scientific , catalog no. 13-681-161 ) , or equivalent . 
+ 14 . 
+ Porcelain 50 ml mortar and pestle ( Fisher Scientific , catalog nos. FB-961-A and FB-961-K ) , or equivalent . 
+ 15 . 
+ Liquid nitrogen . 
+ 16 . 
+ 70 µm nylon cell strainers ( Fisher Scientific , catalog no. 08-771-2 ) . 
+ 17 . 
+ M-280 Streptavidin Dynabeads ( Life Technologies , catalog no. 11205D ) 
+ 18 . 
+ Nutator platform rotator ( Fisher Scientific , catalog no. 14-062 ) , or equivalent . 
+ 19 . 
+ 1.5 ml Eppendorf tubes ( Fisher Scientific , catalog no. S348903 ) . 
+ 20 . 
+ 15 ml Falcon tubes ( Fisher Scientiﬁc , catalog no. 05-527-90 ) . 
+ 21 . 
+ Vacuum desiccator for tissue cross-linking ( Fisher Scientiﬁc , catalog no. 08-594-16A ) . 
+ 22 . 
+ Hausser Bright Line hemocytometer ( Fisher Scientiﬁc , catalog no. 02-671-1 ) , or equivalent . 
+ 23 . 
+ Tabletop microcentrifuge and refrigerated centrifuge with rotor for 15 ml tubes . 
+ 24 . 
+ 4 °C cold room . 
+ 1 . 
+ Nuclei lysis buffer : 50 mM Tris -- HCl ( pH 8 ) , 10 mM EDTA ( pH 8 ) , 1 % ( w/v ) SDS , and 1 × Roche complete protease inhibitor . 
+ Keep at room temperature and use within 1 h of preparation . 
+ 2 . 
+ ChIP dilution buffer : 1.1 % ( v/v ) Triton X-100 , 1.2 mM EDTA ( pH 8 ) , 16.7 mM Tris -- HCl ( pH 8 ) , 167 mM NaCl . 
+ Keep solution on ice and use within 1 day . 
+ 3 . 
+ Appropriate ChIP-grade antibodies . 
+ 4 . 
+ Dynabeads protein A ( Life Technologies , catalog no. 10002D ) or protein G ( Life Technologies , catalog no. 10003D ) . 
+ 5 . 
+ Low-salt wash buffer : 20 mM Tris -- HCl ( pH 8 ) , 150 mM NaCl , 0.1 % ( w/v ) SDS , 1 % ( v/v ) Triton X-100 , and 2 mM EDTA ( pH 8 ) . 
+ Keep solution on ice before use . 
+ Solution can be stored at 4 °C for up to 1 month . 
+ 6 . 
+ High-salt wash buffer : 20 mM Tris -- HCl ( pH 8 ) , 500 mM NaCl , 0.1 % ( w/v ) SDS , 1 % ( v/v ) Triton X-100 , and 2 mM EDTA ( pH 8 ) . 
+ Keep solution on ice before use . 
+ Solution can be stored at 4 °C for up to 1 month . 
+ 7 . 
+ LiCl wash buffer : 10 mM Tris -- HCl ( pH 8 ) , 250 mM LiCl , 1 % ( w/v ) sodium deoxycholate , 1 % ( v/v ) NP-40 , 1 mM EDTA ( pH 8 ) . 
+ Keep solution on ice before use . 
+ Solution can be stored at 4 °C for up to 1 month . 
+ 8 . 
+ TE : 10 mM Tris ( pH 8 ) , 1 mM EDTA ( pH 8 ) . 
+ Keep solution on ice before use . 
+ Solution can be stored for several months at room temperature . 
+ 9 . 
+ ChIP elution buffer : 100 mM NaHCO3 , 1 % ( w/v ) SDS . 
+ This solution should be kept at room temperature and used within several hours of preparation . 
+ 10 . 
+ 5 M NaCl : To make 1 l , dissolve 292.2 g NaCl in water for a final volume of 1 l , then autoclave or filter-sterilize the solution . 
+ 11. RNase A (Ambion, catalog no. AM2270).
+ 12 . 
+ Proteinase K ( New England Biolabs , catalog no. P8107S ) . 
+ 13 . 
+ Qiagen MinElute PCR puriﬁcation kit ( Qiagen , catalog no. 28006 ) . 
+ 14 . 
+ PicoGreen dsDNA quantitation kit ( Life Technologies , catalog no. P7589 ) . 
+ 15 . 
+ Fluorometer ( e.g. , BioTek Synergy HT , Life Technologies Qubit , or equivalent ) . 
+ 16 . 
+ 0.6 ml low retention microcentrifuge tubes ( Fisher Scientiﬁc , catalog no. 02-681-311 ) . 
+ 17 . 
+ DynaMag 2 magnetic rack for 1.5 ml tubes ( Life Technologies , catalog no. 12321D ) . 
+ 18 . 
+ Diagenode Bioruptor Standard Sonicator ( Diagenode Inc. ) , or equivalent . 
+ 19 . 
+ Tabletop microcentrifuge . 
+ 20 . 
+ 4 °C cold room . 
+ 21 . 
+ 100 °C heat block . 
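The stock-solution masses quoted in the lists above (15.01 g of glycine for 100 ml of 2 M, and 292.2 g of NaCl for 1 l of 5 M) follow from mass = molarity × volume × molar mass. The sketch below reproduces them; the molar masses (glycine 75.07 g/mol, NaCl 58.44 g/mol) are standard values supplied here, not taken from the protocol.

```python
MOLAR_MASS_G_PER_MOL = {
    "glycine": 75.07,  # standard value, not stated in the protocol
    "NaCl": 58.44,     # standard value, not stated in the protocol
}


def grams_needed(molarity, volume_l, molar_mass):
    """Mass of solute to dissolve for a solution of the given final volume."""
    return molarity * volume_l * molar_mass


print(round(grams_needed(2, 0.1, MOLAR_MASS_G_PER_MOL["glycine"]), 2))  # 15.01
print(round(grams_needed(5, 1.0, MOLAR_MASS_G_PER_MOL["NaCl"]), 1))     # 292.2
```

The same helper can be reused for other stocks as long as the volume passed in is the final volume of the solution, as the protocol specifies.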
+ 1 . 
+ Primer 1 : 5′-GACATGTATCCGGATGTNNNNNNNNN-3′ . 
+ N represents a degenerate position with all four nucleotides at that location . 
+ Prepare a 4 µM solution and store at − 20 °C . 
+ The BciVI site ( GTATCC ) was underlined in the original . 
+ 2 . 
+ Primer 2 : 5′-GACATGTATCCGGATGT-3′ . 
+ Prepare a 10 µM solution and store at − 20 °C . 
+ The BciVI site ( GTATCC ) was underlined in the original . 
+ 3 . 
+ 100 mM dNTP mix ( 25 mM each nucleotide ; New England Biolabs , catalog no. N0446S ) . 
+ Prepare dilutions of both 3 mM and 10 mM ﬁnal concentration and store at − 20 °C . 
+ 4 . 
+ 0.1 M Dithiothreitol ( DTT , Affymetrix , catalog no. 70726 ) . 
+ 5 . 
+ 10 mg/ml bovine serum albumin ( BSA , New England Biolabs , catalog no. B9001S ) . 
+ 6 . 
+ Sequenase Version 2.0 DNA sequencing kit ( Affymetrix , catalog no. 70770 ) : includes Sequenase Version 2.0 DNA Polymerase ( 13 U / µl ) , 5 × Sequenase enzyme reaction buffer , and glycerol enzyme dilution buffer . 
+ 7 . 
+ ExoSAP-IT reagent for PCR cleanup ( Affymetrix , catalog no. 78200 ) . 
+ 8 . 
+ Phusion Hot Start Flex DNA Polymerase and reaction buffer ( New England Biolabs , catalog no. M0535L ) . 
+ 9 . 
+ Qiagen MinElute PCR puriﬁcation kit ( Qiagen , catalog no. 28006 ) . 
+ 10 . 
+ Qiagen MinElute gel extraction kit ( Qiagen , catalog no. 28606 ) . 
+ 11 . 
+ PicoGreen dsDNA quantitation kit ( Life Technologies , catalog no. P7589 ) . 
+ 12 . 
+ BciVI enzyme and CutSmart digestion buffer ( New England Biolabs , catalog no. R0596L ) . 
+ 13 . 
+ SYBR Green I nucleic acid gel stain ( Sigma Aldrich , catalog no. S9430 ) . 
+ 14 . 
+ Equipment for agarose gel electrophoresis and imaging of DNA . 
+ 15 . 
+ Bioo Scientific NEXTflex ChIP-Seq library preparation kit ( Bioo Scientific , catalog no. 5143-01 ) . 
+ 16 . 
+ Bioo Scientific NEXTflex ChIP-Seq barcode adapters ( 12 unique barcodes ; Bioo Scientific , catalog no. 514121 ) . 
+ 17 . 
+ Agencourt AMPure XP beads ( Beckman Coulter , catalog no. A63880 ) . 
+ 18 . 
+ 80 % Ethanol : mix 8 ml of molecular biology-grade ethanol ( Fisher Scientific , catalog no. BP2818 ) with 2 ml of water . 
+ Prepare fresh before use . 
+ 19 . 
+ Thermal cycler with heated lid and adjustable ramp rate . 
+ 20 . 
+ DynaMag 2 magnetic rack for 1.5 ml tubes ( Life Technologies , catalog no. 12321D ) . 
+ 21 . 
+ Tabletop microcentrifuge . 
+ 22 . 
+ Agilent Bioanalyzer . 
+ 3 Methods
+ Carry out all procedures at room temperature unless otherwise specified . 
+ 4 . 
+ Draw the tissue suspension into a 10 ml serological pipette and filter through a 70 µm nylon cell strainer into a 15 ml tube on ice . 
+ Spin down nuclei at 1,200 × g for 10 min at 4 °C . 
+ Decant the supernatant carefully without disturbing the pellet of nuclei and debris . 
+ 5 . 
+ Gently resuspend the pellet in 1 ml of cold NPB by pipetting up and down and transfer the crude nuclei suspension to a 1.5 ml tube . 
+ Keep on ice . 
+ 6 . 
+ Wash the appropriate amount of Streptavidin M280 Dynabead suspension ( 25 µl for nuclei from 1 to 3 g of tissue ; see Note 1 ) with 1 ml of ice-cold NPB in a 1.5 ml tube and collect beads on the DynaMag 2 magnetic rack . 
+ Discard the supernatant and resuspend beads to their original volume with NPB ( e.g. , 25 µl ) . 
+ 7 . 
+ Add 25 µl of washed and resuspended beads to the 1 ml of nuclei suspension from step 5 . 
+ Mix well and rotate on a nutator at 4 °C for 30 min . 
+ Work in the 4 °C cold room for steps 8 -- 15 . 
+ 8 . 
+ Transfer the 1 ml of bead -- nuclei mixture to a 15 ml tube and gently add 13 ml of ice-cold NPBt to the mixture to bring the volume to 14 ml . 
+ Mix gently and place on a nutator for 30 s . 
+ 9 . 
+ Place the 15 ml tube in the DynaMag 15 magnetic rack for 2 min to capture the nuclei -- beads on the walls of the tube . 
+ 10 . 
+ Slowly and carefully remove the NPBt supernatant with a serological pipette , taking care not to disturb the beads on the walls of the tube . 
+ Gently resuspend the beads in 14 ml of ice-cold NPBt , mix gently , and place on a nutator for 30 s . 
+ 11 . 
+ Place the tube in the DynaMag 15 magnetic rack for 2 min to capture the nuclei -- beads . 
+ 12 . 
+ Repeat steps 10 and 11 . 
+ 13 . 
+ Slowly and carefully remove the NPBt with a serological pipette and resuspend the beads in 1 ml of ice-cold NPBt . 
+ Save 25 µl of this nuclei -- bead suspension and store on ice for counting of the captured nuclei on a hemocytometer . 
+ 14 . 
+ Transfer the remaining nuclei -- bead suspension to an ice-cold 1.5 ml tube and capture nuclei -- beads on a DynaMag 2 magnetic rack . 
+ 15 . 
+ Remove NPBt supernatant , resuspend the beads in 20 µl of NPB , keep on ice , and proceed with the chromatin immunoprecipitation procedure . 
+ Alternatively , nuclei -- beads can be stored at − 80 °C until further use ( see Note 2 ) . 
+ 16 . 
+ To view and count nuclei under the microscope , add 1 µl of diluted DAPI solution ( 0.2 µg / µl ) to each 25 µl sample from step 13 , mix well , and place on ice for 5 min in darkness . 
+ 1 . 
+ Add 120 µl of nuclei lysis buffer to the puriﬁed nuclei from step 15 of Subheading 3.1 and transfer nuclei -- buffer mix to a 0.6 ml low-retention tube . 
+ Vortex vigorously for 2 min to lyse the nuclei . 
+ 2 . 
+ Sonicate the lysed nuclei at 4 °C in a Diagenode Standard Bioruptor water bath sonicator for 40 min using the high power setting and 45 s on/15 s off sonication cycle ( see Note 4 ) . 
+ 3 . 
+ After sonication , centrifuge the lysate at 18,000 × g for 2 min at 4 °C to pellet beads and debris . 
+ Transfer the supernatant containing the sheared chromatin to a new 1.5 ml tube . 
+ 4 . 
+ Measure the volume of fragmented chromatin using a micro-pipette , and add ice-cold ChIP dilution buffer to make a tenfold dilution of the sonicated chromatin ( e.g. , the ﬁnal volume of diluted chromatin should be approximately 1.4 ml ) . 
+ Mix gently by inverting the tube several times , and then place the tube on ice . 
+ 5 . 
+ Move 10 % of the diluted chromatin sample to a new tube and store at − 80 °C as the `` input '' chromatin fraction ( see Note 5 ) . 
+ The remaining diluted chromatin is enough for approximately 1 -- 4 ChIP experiments ( see Note 6 ) . 
+ 6 . 
+ Divide diluted chromatin into the appropriate number of 0.6 ml low retention tubes ( or a single 1.5 ml tube if using entire chromatin sample for one ChIP experiment ) . 
+ Add the appropriate amount of antibody to each aliquot , mix well , and rotate on a nutator platform at 4 °C for 2 -- 5 h ( see Note 7 ) . 
+ 7 . 
+ To prepare magnetic beads , add the appropriate amount of protein A Dynabead suspension ( or protein G Dynabead suspension , depending on antibody isotype ) into an ice-cold 1.5 ml tube ( use 30 µl of bead suspension per ChIP sample ) . 
+ Add 1 ml of ice-cold ChIP dilution buffer , and invert the tube several times to wash the beads . 
+ Collect the beads on a DynaMag 2 magnet rack , decant , and resuspend the beads to their original volume with ChIP dilution buffer . 
+ Keep the resuspended beads on ice until use . 
+ 8 . 
+ Add 30 µl of the washed protein A ( or protein G ) Dynabeads from step 7 to each ChIP sample . 
+ Mix well and incubate on a nutator platform at 4 °C for 1 -- 2 h. Work in the 4 °C cold room for steps 9 -- 12 . 
+ 9 . 
+ Collect the beads using a DynaMag 2 magnetic rack and remove the supernatant . 
+ 10 . 
+ Resuspend the beads in 0.5 ml of low-salt wash buffer and incubate the beads on a nutator platform at 4 °C for 5 min . 
+ 11 . 
+ Repeat steps 9 and 10 using the following series of buffers : high-salt wash , LiCl wash , and TE . 
+ 12 . 
+ After the TE wash , move the beads -- buffer suspension to a new , ice-cold 0.6 ml low retention tube , collect beads on the DynaMag 2 magnet rack , and remove the supernatant . 
+ 13 . 
+ Add 200 µl of ChIP elution buffer to the beads , mix well , and vortex vigorously for 5 min at room temperature . 
+ 14 . 
+ Collect the beads on a DynaMag 2 magnetic rack and move the supernatant containing eluted chromatin to a new 0.6 ml low-retention tube . 
+ Perform all subsequent steps on both this sample and the `` input '' chromatin sample from step 5 . 
+ 15 . 
+ Adjust the `` input '' chromatin sample to 200 µl with ChIP dilution buffer and then add 20 µl of 5 M NaCl to the 200 µl samples of eluted chromatin and `` input '' chromatin . 
+ Mix well and incubate at 100 °C for 15 min to reverse the formaldehyde cross-links . 
+ Centrifuge brieﬂy at 18,000 × g to collect condensation . 
+ 16 . 
+ Add 1 µl of RNase A ( 1 µg ) to each sample , mix well , and incubate for 15 min at 37 °C to digest RNA . 
+ Centrifuge brieﬂy to collect condensation , and then add 1 µl of Proteinase K ( 0.8 U ) . 
+ Mix well and incubate for 15 min at 55 °C to digest protein and antibody , and then centrifuge brieﬂy to collect condensation . 
+ 17 . 
+ Purify the ChIP DNA and input DNA using the Qiagen MinElute PCR puriﬁcation kit . 
+ Start by adding 1 ml of buffer PB to the ~ 220 µl sample and vortex to mix . 
+ 18 . 
+ Add 700 µl of this solution to a MinElute column resting in a 2 ml collection tube . 
+ Centrifuge at 18,000 × g for 1 min and discard the ﬂow-through . 
+ 19 . 
+ Add the remaining solution from step 17 to the same column . 
+ Centrifuge at 18,000 × g for 1 min and discard the ﬂow-through . 
+ 20 . 
+ Add 750 µl of buffer PE to the column . 
+ Centrifuge at 18,000 × g for 1 min and discard the ﬂow-through . 
+ 21 . 
+ Centrifuge at 18,000 × g for 2 min to remove any remaining buffer PE from the column . 
+ 22 . 
+ Place column in a new 1.5 ml tube , add 12 µl of room temperature elution buffer EB to the center of the column membrane , and let the column stand for 1 min . 
+ 23 . 
+ Centrifuge at 18,000 × g for 1 min , discard column and place the eluted DNA on ice . 
+ 24 . 
+ Measure the DNA concentration using the PicoGreen DNA quantitation kit according to the manufacturer 's instructions ( see Note 8 ) . 
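+ The volume bookkeeping in steps 4 and 5 of this subheading (tenfold dilution of the sheared chromatin, then setting aside 10 % as input) can be sketched as below. This is only an illustrative helper: the function name, its parameters, and the 140 µl example volume (chosen so that the diluted volume matches the approximately 1.4 ml mentioned in step 4) are assumptions, not part of the published protocol.

```python
def chromatin_dilution_plan(sheared_volume_ul, dilution_factor=10.0, input_fraction=0.10):
    """Return the volumes (in µl) needed to dilute sheared chromatin and set aside input.

    Hypothetical helper: the tenfold dilution and the 10 % input fraction come from
    the protocol text; everything else here is an illustrative assumption.
    """
    if sheared_volume_ul <= 0:
        raise ValueError("sheared_volume_ul must be positive")
    final_volume = sheared_volume_ul * dilution_factor
    buffer_to_add = final_volume - sheared_volume_ul  # ChIP dilution buffer to add
    input_volume = final_volume * input_fraction      # stored at -80 °C as "input"
    return {
        "final_volume_ul": final_volume,
        "dilution_buffer_ul": buffer_to_add,
        "input_ul": input_volume,
        "chip_ul": final_volume - input_volume,
    }


# Example: 140 µl of measured sheared chromatin (assumed value).
plan = chromatin_dilution_plan(140.0)
for name, volume in plan.items():
    print(f"{name}: {volume:.1f}")
```

+ The measured volume of sheared chromatin should always be used in place of the assumed 140 µl, since recovery after sonication and centrifugation varies.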
+ Given the limited quantities of DNA recovered from ChIP experiments using nuclei from individual cell types , it is generally necessary to amplify the ChIP DNA prior to construction of sequencing libraries . 
+ The procedure presented here uses the nano-ChIP-seq method developed by Adli and Bernstein [ 11 , 12 ] . 
+ This method employs four initial rounds of random priming of the ChIP and input DNA using a primer with nine random bases at the 3 ′ end and a unique sequence , including a BciVI restriction site , at the 5 ′ end . 
+ This primer is designed to form a hairpin at the 5 ′ end at low temperatures in order to minimize primer self-annealing . 
+ The priming reaction is therefore carried out using Sequenase polymerase , which is active at 37 °C but is not thermostable . 
+ Thus , additional enzyme is added after each cycle of priming and denaturation . 
+ After the four cycles of random priming , a limited number of PCR cycles are carried out using a primer corresponding to the unique 5 ′ end of the primer used in the priming step , in order to amplify the DNA and add BciVI sites to each end . 
+ Finally , the ampliﬁed DNA is digested with BciVI to generate 3 ′ A overhangs , and this DNA is used for conventional ChIP-seq library preparation . 
+ For all steps in this and the subsequent section , include both ChIP DNA and `` input '' DNA samples . 
+ Also include a negative control reaction ( without added DNA ) to ensure that no DNA contamination is present in the reagents or environment . 
+ allow the cycler to proceed through Phases 3 -- 8 ( see Table 3 ) in which the temperature gradually increases from 8 to 37 °C , then holds at 37 °C for 8 min . 
+ 6 . 
+ While the thermal cycler is progressing through Phases 3 -- 8 , prepare diluted Sequenase enzyme solution ( 1:4 dilution ) by mixing 0.9 µl of Sequenase dilution buffer and 0.3 µl of the Sequenase enzyme per priming reaction . 
+ Prepare a master mix of the diluted Sequenase sufﬁcient for three additions of 1.2 µl to each reaction ( i.e. , 3.6 µl per reaction ) . 
+ Mix well by gently pipetting the entire volume up and down several times . 
+ Keep on ice . 
+ 7 . 
+ After the thermal cycler has passed again through Phase 1 ( 98 °C ) and has been in Phase 2 ( 8 °C ) for 1 min , pause the thermal cycler , remove and brieﬂy centrifuge the tubes to collect condensation , and place on ice . 
+ To each tube add 1.2 µl of diluted Sequenase enzyme ( from step 6 ) , mix well by pipetting up and down , return the tubes to the thermal cycler , and resume the program at 8 °C for 2 min before it proceeds to Phase 3 again . 
+ 8 . 
+ Repeat step 7 two more times for a total of four rounds of priming . 
+ 9 . 
+ When the priming cycles are complete , remove excess primer by adding 3 µl of ExoSAP-IT reagent to each sample , and mix well by pipetting up and down . 
+ Incubate the reactions at 37 °C for 15 min , followed by 80 °C for 15 min to inactivate the ExoSAP-IT . 
+ 10 . 
+ Dilute the reaction product from step 9 by adding 45 µl of water and mix well by vortexing . 
+ 11 . 
+ For each ChIP and input sample , set up four identical parallel PCRs in 0.2 ml tubes by using 15 µl of diluted product from step 10 as a template for each reaction . 
+ Set up the PCR mix according to Table 4 , mix well , and perform PCR cycling as described in Table 5 ( see Note 10 ) . 
+ 12 . 
+ Pool the 4 parallel PCRs for each sample into one 1.5 ml tube . 
+ 13 . 
+ Purify the DNA using the Qiagen MinElute PCR puriﬁcation kit . 
+ Start by adding 5 volumes of buffer PB ( 1 ml ) to the samples , and vortex to mix . 
+ 14 . 
+ Add 700 µl of the solution from step 13 to a MinElute column resting in a 2 ml collection tube . 
+ Centrifuge at 18,000 × g for 1 min and discard the ﬂow-through . 
+ 15 . 
+ Add remaining solution from step 13 to the same column . 
+ Centrifuge at 18,000 × g for 1 min and discard the ﬂow-through . 
+ 16 . 
+ Add 750 µl of buffer PE to the column . 
+ Centrifuge at 18,000 × g for 1 min and discard the flow-through . 
+ 17 . 
+ Centrifuge at 18,000 × g for 2 min to remove any remaining buffer PE from the column . 
+ 18 . 
+ Place the column in a new 1.5 ml tube . 
+ Add 12 µl of room temperature elution buffer EB to the center of the column membrane , let the column stand for 1 min , and centrifuge at 18,000 × g for 1 min . 
+ 19 . 
+ Add an additional 12 µl of elution buffer EB to the center of the column membrane , let the column stand for 1 min , and centrifuge at 18,000 × g for 1 min . 
+ Discard the column and place the eluted DNA on ice . 
+ 20 . 
+ Measure the DNA concentration using the PicoGreen DNA quantitation kit according to the manufacturer 's instructions , in order to ensure that amplification was successful ( see Note 11 ) . 
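+ The diluted Sequenase master mix described in step 6 of this procedure (0.9 µl of dilution buffer plus 0.3 µl of enzyme per addition, three additions per reaction) scales linearly with the number of priming reactions. A minimal sketch of that arithmetic follows; the function name and the optional `overage` parameter for pipetting loss are hypothetical and do not appear in the protocol.

```python
def sequenase_master_mix(n_reactions, additions_per_reaction=3, overage=0.0):
    """Volumes (µl) of dilution buffer and enzyme for the diluted Sequenase master mix.

    Per-addition volumes (0.9 µl buffer + 0.3 µl enzyme = 1.2 µl, a 1:4 dilution)
    are taken from the protocol; `overage` is an assumed extra fraction for
    pipetting loss (0.0 reproduces the protocol's numbers exactly).
    """
    if n_reactions < 1:
        raise ValueError("n_reactions must be at least 1")
    buffer_per_addition_ul = 0.9
    enzyme_per_addition_ul = 0.3
    scale = n_reactions * additions_per_reaction * (1.0 + overage)
    buffer_ul = buffer_per_addition_ul * scale
    enzyme_ul = enzyme_per_addition_ul * scale
    return {
        "dilution_buffer_ul": buffer_ul,
        "enzyme_ul": enzyme_ul,
        "total_ul": buffer_ul + enzyme_ul,
    }


# Example: ChIP DNA, input DNA, and the no-DNA negative control (3 reactions).
mix = sequenase_master_mix(3)
print({name: round(volume, 2) for name, volume in mix.items()})
```

+ With a single reaction and no overage, the total is the 3.6 µl per reaction stated in the protocol.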
+ The ampliﬁed ChIP DNA fragments now contain BciVI sites at each end and are of sufﬁcient quantity to prepare sequencing libraries using conventional methods . 
+ The DNA is ﬁrst digested with BciVI to generate 3 ′ A overhangs and sequencing library adapters with 5 ′ T overhangs are ligated onto the fragments . 
+ The ligated DNA is then size selected and ampliﬁed again prior to sequencing . 
+ appropriate NEXTﬂex ChIP-Seq barcode adapters . 
+ Mix well by pipetting the entire volume up and down several times and incubate at 22 °C for 15 min . 
+ The remaining BciVI digested DNA can be stored at − 80 °C for later use ( see Note 12 ) . 
+ 5 . 
+ Clean up the ligation product by using the MinElute PCR puriﬁcation kit according to steps 13 -- 18 in Subheading 3.3.1 and store the eluted DNA on ice . 
+ 6 . 
+ Mix the eluted ligation products with 1.1 µl of diluted SYBR green I gel stain ( 10,000 × stock diluted 1:1,000 in water ) and 2 µl of 10 × DNA loading dye , and incubate at room temperature for 10 min in the dark . 
+ Separate DNA on a freshly prepared 2 % ( w/v ) agarose gel . 
+ Include a DNA size marker spanning at least 100 -- 1,000 bp , in 100 bp increments . 
+ 7 . 
+ Visualize the separated products on a UV light box and size-select the adapter-ligated DNA fragments by cutting out a gel slice corresponding to fragment sizes between 250 and 600 bp . 
+ 8 . 
+ Purify the DNA from the agarose gel slice using the Qiagen MinElute gel extraction kit . 
+ Weigh the gel slice in a colorless tube and add 3 volumes of Buffer QG to 1 volume of gel ( e.g. , 300 µl of QG per 100 mg of gel slice ) . 
+ Four hundred mg of gel is the maximum amount that can be used per puriﬁcation column . 
+ 9 . 
+ Incubate the gel slice in buffer QG at 50 °C for 10 min , inverting the tube every 2 -- 3 min during the incubation to mix . 
+ 10 . 
+ After the gel slice has dissolved completely , check that the color of the mixture is yellow , similar to Buffer QG without dissolved agarose , indicating the correct pH ( see Note 13 ) . 
+ 11 . 
+ Add 1 gel volume of isopropanol to the sample ( e.g. , 100 µl of isopropanol per 100 mg of gel slice ) and mix well by vortexing . 
+ 12 . 
+ Apply up to 700 µl of the sample to a MinElute spin column resting in a 2 ml collection tube , centrifuge at 18,000 × g for 1 min , discard the ﬂow-through , and place the MinElute column back in the same collection tube . 
+ 13 . 
+ If the volume of dissolved gel solution from step 11 was greater than 700 µl , add the remainder of it to the same column and repeat step 12 . 
+ 14 . 
+ Add 0.5 ml of Buffer QG to the column , centrifuge at 18,000 × g for 1 min , discard the ﬂow-through , and place the MinElute column back in the same collection tube . 
+ 15 . 
+ Add 0.75 ml of Buffer PE to the column and centrifuge at 18,000 × g for 1 min . 
+ 16 . 
+ Discard the flow-through , place the MinElute column back in the same collection tube , and centrifuge the column for an additional 1 min at 18,000 × g . 
+ 17 . 
+ Place column into a clean 1.5 ml tube . 
+ To elute DNA , add 30 µl of room temperature water to the center of the membrane , let the column stand for 1 min , and then centrifuge for 1 min at 18,000 × g. Discard the column and place the eluted DNA on ice . 
+ 18 . 
+ Set up the library ampliﬁcation mix according to Table 8 , using reagents from the Bioo Scientiﬁc NEXTﬂex ChIP-Seq library preparation kit . 
+ Mix well by gently pipetting the entire volume up and down several times . 
+ Perform PCR using the thermal cycling conditions indicated in Table 9 ( see Note 14 ) . 
+ 19 . 
+ Perform PCR puriﬁcation by using Agencourt AMPure XP magnetic beads . 
+ Pre-warm the beads to room temperature and gently swirl the bottle to resuspend any magnetic particles that have settled . 
+ 20 . 
+ Add 90 µl of AMPure XP beads ( 1.8 × the volume of the PCR ) to the PCR product and mix thoroughly by pipetting the entire volume up and down several times . 
+ Let the sample incubate for 5 min at room temperature with occasional mixing . 
+ 21 . 
+ Place the tube onto the DynaMag 2 magnetic rack for 2 min to remove the beads from solution . 
+ 22 . 
+ Remove the supernatant from the tube and discard . 
+ 23 . 
+ With the tubes still situated on the magnetic rack , add 200 µl of freshly prepared 80 % ethanol to each tube and incubate for 30 s. Remove the ethanol from the tubes and discard . 
+ Repeat this step for a total of two washes . 
+ 24 . 
+ Remove the tube from the magnetic rack and allow the beads to dry for about 5 min ( see Note 15 ) . 
+ 25 . 
+ Add 30 µl of room temperature water to each tube , pipette the entire volume up and down ten times to resuspend the beads , and incubate for 2 min at room temperature . 
+ 26 . 
+ Place the tube on the DynaMag 2 magnetic rack for 2 min to separate beads from the solution . 
+ Transfer the supernatant containing eluted sequencing library DNA to a new microcentrifuge tube and place on ice . 
+ 27 . 
+ Quantify DNA using the PicoGreen DNA quantitation kit according to the manufacturer 's instructions , and check the library size distribution on an Agilent Bioanalyzer . 
+ The library should appear as a range of fragments between approximately 200 and 600 bp ( Fig. 2a ; see Note 16 ) . 
+ Store the sequencing libraries at − 80 °C until use . 
+ 28 . 
+ The DNA is now ready for high-throughput sequencing on the Illumina platform ( see Note 17 ) . 
+ Figure 2b shows a genome browser shot of typical Arabidopsis ChIP-seq data from libraries made using the procedures presented here . 
+ 4 Notes
+ mark and the amount of starting chromatin . 
+ For example , a ChIP for H3K4me3 from 25,000 Arabidopsis nuclei will yield approximately 5 -- 10 pg of DNA . 
+ 9 . 
+ During ampliﬁcation and library preparation , it is essential to avoid DNA contamination from the environment . 
+ Ensure that all work surfaces , pipettes , and reagents are free of DNA contamination . 
+ 10 . 
+ The number of PCR cycles needed should be determined empirically . 
+ The appropriate number of cycles to be used can be estimated by performing a test ampliﬁcation on a relevant amount of `` input '' DNA and following the reaction progress in a real-time PCR instrument . 
+ Cycling should be stopped during the exponential phase of the reaction , and as few cycles as possible should be used . 
+ Using 25 cycles of PCR , 10 and 100 pg of starting DNA should yield approximately 50 and 200 ng of product , respectively . 
+ 11 . 
+ It is recommended to perform qPCR to test for ChIP enrichment of one or more positive ( and negative ) control genomic regions , if such regions are known , at this step before performing the library preparations . 
+ 12 . 
+ Unique barcoded adapters can be used for each sample if multiple libraries will be sequenced in an individual ﬂow cell lane . 
+ Thirty nanograms of DNA in the adapter ligation reaction is recommended , but as little as 10 ng can be used . 
+ 13 . 
+ If the color of the dissolved gel solution is violet or orange , this means the pH is too high . 
+ This can be rectiﬁed by adding 10 µl of 3 M sodium acetate ( pH 5 ) , which should bring the pH down and the color back to yellow . 
+ 14 . 
+ The number of PCR cycles required for library ampliﬁcation should be determined empirically , as described in Note 10 . 
+ Consult your sequencing core facility to determine the total amount of sequencing library DNA required for each experiment . 
+ 15 . 
+ A drying time of 5 min is generally sufﬁcient to remove all traces of ethanol from the beads , but this time may vary depending on ambient temperature and humidity . 
+ Care must be taken not to overdry the beads ( bead pellet will appear cracked if overdried ) , as this will negatively affect DNA elution . 
+ 16 . 
+ Occasionally libraries will contain signiﬁcant amounts of adapter dimers ( a distinct band of approximately 125 bp ) and / or primer dimers ( a distinct band of approximately 80 bp ) , which will negatively affect sequencing results . 
+ These products can be easily removed before sequencing by size selection with Agencourt SPRIselect beads ( Beckman Coulter , catalog no. B23317 ) . 
+ Acknowledgements
\ No newline at end of file
--- a/data/TEXT_FILES/notuseful_txt/25873626.txt 0 → 100644
+++ b/data/TEXT_FILES/notuseful_txt/25873626.txt 0 → 100644
+ organism
+ ABSTRACT 
+ The cMonkey integrated biclustering algorithm identiﬁes conditionally co-regulated modules of genes ( biclusters ) . 
+ cMonkey integrates various orthogonal pieces of information which support evidence of gene co-regulation , and optimizes biclusters to be supported simultaneously by one or more of these prior constraints . 
+ The algorithm served as the cornerstone for constructing the ﬁrst global , predictive Environmental Gene Regulatory Inﬂuence Network ( EGRIN ) model for a free-living cell , and has now been applied to many more organisms . 
+ However , due to its computational inefﬁciencies , long run-time and complexity of various input data types , cMonkey was not readily usable by the wider community . 
+ To address these primary concerns , we have signiﬁcantly updated the cMonkey algorithm and refactored its implementation , improving its usability and extendibility . 
+ These improvements provide a fully functioning and user-friendly platform for building co-regulated gene modules and the tools necessary for their exploration and interpretation . 
+ We show , via three separate analyses of data for E. coli , M. tuberculosis and H. sapiens , that the updated algorithm and inclusion of novel scoring functions for new data types ( e.g. ChIP-seq and transcription factor over-expression [ TFOE ] ) improve discovery of biologically informative co-regulated modules . 
+ The complete cMonkey2 software package , including source code , is available at https://github.com/baliga-lab/cmonkey2 . 
+ 1Institute for Systems Biology , 401 Terry Ave N , Seattle , WA 98109 , USA and 2Department of Microbiology , University of Washington , Seattle , WA 98103 , USA 
+ INTRODUCTION
+ It is widely acknowledged that gene regulatory networks ( GRNs ) are inherently modular in nature and organized hierarchically ( 1 -- 3 ) . 
+ This modular structure results from the regulation of genes by distinct combinations of regulatory factors ; transcripts regulated by the same ( set of ) factor ( s ) are presumed to express similar patterns of differential expression over different cellular and environmental conditions . 
+ Such modularity is evident in GRNs across organisms , from the simplest prokaryotes to complex metazoans . 
+ Therefore , identifying co-regulated gene modules can significantly reduce the complexity of the problem of inference of genome-wide GRNs from data , and they can be exploited to greatly improve the accuracy of the inferred regulatory network topology ( 4 -- 7 ) . 
+ For this reason , the detection of co-regulated gene modules via integrated modeling of multiple supporting data types has been an active research topic for more than a decade . 
+ Since the seminal integrated ` module networks ' publications of Segal et al. ( 5,8 ) , SAMBA ( 2 ) and cMonkey ( 9 ) , many groups have released tools with similar overarching goals of data integration for the discovery of conditionally co-regulated modules , using various underlying statistical models and optimization methods . 
+ For example , LeMoNe ( 4 ) infers co-regulated gene modules from expression data , including detection of conditionality of co-regulation . 
+ DISTILLER ( 10 ) extends the LeMoNe framework to integrate known regulation with gene expression data . 
+ Like cMonkey , COALESCE ( 11 ) and Allegro ( 12 ) integrate de novo detection of sequence motifs with coexpression clustering to identify co-regulated gene modules and the cis-regulatory sequence features putatively responsible for their co-regulation . 
+ We refer the reader to a recent review ( 13 ) of integrated methods for detection of biological modules , and note the caveats presented by ( 14 ) regarding the large search-space involved ( particularly for complex metazoan systems ) . 
+ One primary issue with all aforementioned methods and tools is the difficulty of easily extending them to new data types or organisms . 
+ Many of these are implemented as complex command-line tools or graphical user-interfaces that can only be applied to a limited predefined set of model organisms ( typically , a few metazoans [ human , mouse , fruit fly , etc. ] , and often E. coli and S. cerevisiae ) . 
+ The cMonkey integrated biclustering algorithm ( 9 ) was designed to decipher , from genome-wide measurements , the conditional co-regulation of genes by integrating different types of information which support evidence for their co-regulation and effectively constrain , or regularize , the complex search space mentioned above . 
+ In the end , cMonkey produces biclusters that are constrained by one or more of these streams of data ( 9 ) , over subsets of experimental measurements . 
+ The three primary data sources that were originally integrated and optimized by cMonkey were ( i ) transcript co-expression ( similarity of expression profiles ) of clustered genes across subsets of measurements ; ( ii ) de novo detection of common putative cis-acting gene regulatory motifs ( which we will hereafter abbreviate as GREs , for gene regulatory elements ) in the promoters of clustered genes ( putative binding locations of the same transcriptional regulators ) ; and ( iii ) significant connectivity between clustered genes in functional association or physical interaction networks ( implying meaningful functional association , which is often correlated with co-regulation ) . 
+ We used cMonkey to construct an Environmental Gene Regulatory Influence Network ( EGRIN ) model ( 15 ) for Halobacterium salinarum NRC-1 , and have more recently used it to construct EGRIN models for many more organisms covering all three branches in the tree of life ( ( 7,16 -- 19 ) , and unpublished ) . 
+ Work on the cMonkey algorithm and its implementation has been ongoing during that time , and we are now releasing a completely updated and reengineered version of the cMonkey software tool , with optimized performance , improved documentation and ease-of-use for end-users , and enhanced modularity which will make it straightforward to extend by interested developers . 
+ The primary algorithmic modification in the new implementation is that it uses a global optimization , rather than the local , individual cluster optimization utilized by the original procedure . 
+ Additional algorithm updates include changes to the individual scoring scheme for subnetwork clustering , as well as to the heuristic used to integrate the different scores . 
+ All of these changes , which serve to improve the procedure 's runtime performance by roughly 3-fold , result in additional benefits which we will elucidate below . 
+ We have reimplemented the updated algorithm into a new framework , called cMonkey2 , which improves ease-of-use for the end-user ; greatly simplifies automated integration of additional data types and scoring mechanisms ; and enhances the resulting output to facilitate visualization and exploration of biclusters and their associated evidence ( e.g. de novo predicted GREs ) in the context of other databases and web services . 
+ These improvements make cMonkey2 a fully functioning , unified platform for integrating many kinds of genome-wide data to build gene co-regulatory modules , plus the necessary tools to explore and use them to inform biological insights . 
+ MATERIALS AND METHODS
+ Hereafter , we refer to the originally published version of cMonkey as cMonkey1 . 
+ For a detailed overview of the cMonkey1 algorithm and its data integration model , we refer the reader to ( 9 ) . 
+ In the following we describe only the relevant and notable algorithm changes which have been made in the updated version , which we call cMonkey2 . 
+ These include ( i ) a switch from local bicluster optimization to a global optimization procedure ; ( ii ) a switch from a probabilistic association network-based score to a network density-based scoring function ; and ( iii ) a modified , more efficient heuristic for combining the three model components into an integrated clustering score while enabling stochastic exploration of the search space . 
+ Although all three of these modifications appear to replace rigorous statistical models and distributions with heuristics , as we will show , the practical effect is a significant decrease in algorithm run-time with no detriment to performance ( to the contrary , cMonkey2 actually achieves improvement in performance ) . 
+ cMonkey1 is a local optimization procedure , in which biclusters are seeded , one at a time , and then optimized individually . 
+ As each additional bicluster is generated and optimized , any overlap ( in the form of gene membership ) between it and previously-optimized biclusters is reduced by constraining the number of biclusters ( default expected value $ v $ = 2 ) into which each gene may fall . 
+ The most significant modification we have made to the algorithm is that cMonkey2 instead performs a global optimization that is modeled on the simple , widely used and effective k-means clustering algorithm ( 20 ) . 
+ After beginning with a chosen distance metric and an initial partitioning of all genes into exactly k clusters ( $ v $ = 1 cluster per gene ) , the basic k-means algorithm iterates between two steps until convergence : ( i ) ( re - ) assign each gene to the cluster with the closest centroid and ( ii ) update the centroids of each modified cluster . 
+ The updated cMonkey2 algorithm performs an analogous set of moves with four primary distinctions relative to k-means : ( i ) the distance of each gene to the centroid of each cluster is computed using a measure that combines condition-specific expression profile similarity , similarity of putative GREs detected in gene promoters , and connectedness in one or more gene association networks ( and/or additional scoring measures added via the new modular plug-in framework ; see below ) ; ( ii ) each gene can be ( re - ) assigned to more than one cluster ( default $ v $ = 2 ) ; ( iii ) at each step , conditions ( in addition to genes ) are moved among biclusters to improve their cohesiveness ; and ( iv ) at each step , genes and conditions are not always assigned to the most appropriate clusters . 
+ We now elaborate upon these four details . 
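+ For reference, the basic k-means loop that the cMonkey2 procedure is modeled on can be sketched in a few lines. This is plain k-means only, not the cMonkey2 algorithm: it uses squared Euclidean distance, assigns each point to exactly one cluster, and (as an assumption made purely to keep the sketch deterministic) takes the first k points as the initial centroids.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, max_iter=100):
    """Basic k-means: alternate (i) reassignment and (ii) centroid update.

    Initial centroids are simply the first k points (an illustrative choice).
    Returns (assignment, centroids).
    """
    centroids = [list(p) for p in points[:k]]
    assignment = [None] * len(points)
    for _ in range(max_iter):
        # Step (i): assign each point to the cluster with the closest centroid.
        new_assignment = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        if new_assignment == assignment:
            break  # converged: no point changed cluster
        assignment = new_assignment
        # Step (ii): recompute the centroid of every non-empty cluster.
        for j in range(k):
            members = [p for p, a in zip(points, assignment) if a == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return assignment, centroids


points = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0)]
assignment, centroids = kmeans(points, k=2)
print(assignment, centroids)
```

+ The four distinctions listed above replace the distance function, the one-cluster-per-gene constraint, the gene-only moves, and the strictly greedy assignment in this loop.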
+ In cMonkey2 , as with standard k-means ( as well as the original cMonkey1 ) , k must be chosen a priori ; by default , cMonkey2 sets k such that each bicluster will contain ∼ 20 genes on average ( thus , given that each gene is assigned to $ v $ = 2 biclusters by default , k is set to Ng × $ v $ / 20 , where Ng is the number of transcripts measured across all experiments ) . 
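+ The default choice of k described here is a one-line calculation. The sketch below reproduces it; the function and parameter names are hypothetical, and rounding to the nearest integer is an assumption, since the text does not say how a fractional value is handled.

```python
def default_num_biclusters(n_genes, clusters_per_gene=2, genes_per_cluster=20):
    """k = Ng * v / 20, so that each bicluster holds about 20 genes on average.

    Rounding to the nearest integer (and never returning less than 1) is an
    assumption made for this sketch.
    """
    return max(1, round(n_genes * clusters_per_gene / genes_per_cluster))


# Example: a genome with 4,000 measured transcripts (illustrative number).
print(default_num_biclusters(4000))
```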
+ cMonkey2 begins each iteration with a set of bicluster memberships mi for each element ( gene or condition ) i , where by default | mi | = $ v $ = 2 for genes ( as described previously ) , and | mi | = Nc/2 for conditions ( Nc is the number of conditions , or measurements , in the expression data set ; note that for standard k-means clustering , | mi | = 1 for genes and | mi | = Nc for conditions ) . 
+ cMonkey2 then computes log-likelihood score matrices Rij , Sij and Tij , for membership of each element i in each bicluster j based upon , respectively , co-expression with the current gene members ( R ) , similarity of GREs in gene promoters ( S ) , and connectivity of genes in networks ( T ) . 
+ For the network scores ( T ) , the original procedure computed a p-value for enrichment of network edges among genes in each bicluster using the cumulative hypergeometric distribution . 
+ This computation was inefficient , and moreover could not account for weighted edges in the input networks , so we replaced it in cMonkey2 with a more standard weighted network clustering coefficient ( 20 ) evaluated only over the genes within each bicluster . 
+ Following computation of the individual component scores , cMonkey2 computes a score matrix Mij that contains the integrated score ( a weighted sum of log-likelihoods , as in cMonkey1 ) supporting the inclusion of gene i in bicluster j . 
+ At this stage cMonkey1 would then train an ` iteratively-reweighted constrained logistic regression ' on each bicluster j 's scores Mij to obtain a posterior probability distribution pij , to classify potential bicluster members i based upon these scores . 
+ This procedure proved to be a significant bottleneck on algorithm performance . 
+ In cMonkey2 we instead compute a kernel-based cumulative density distribution from these scores , to estimate the relative probability pij that each element i belongs in each cluster j . 
+ The width of the density distribution kernel is set dynamically to be larger for smaller ( fewer gene ) biclusters , so as to increase the tendency to add genes to small biclusters , rather than remove them . 
+ Whereas cMonkey1 would then sample elements i from pij to stochastically add or remove elements from each bicluster j , in the new cMonkey2 implementation we instead add a small amount of normally-distributed random ` noise ' to the scores Mij in order to achieve a similar type of stochasticity ( which helps prevent the algorithm from falling into local minima ; this noise decreases during the run to zero at the final iteration ) . 
+ The result of this noise is that at the beginning of a cMonkey2 run , biclusters are rather poorly defined ( co-expression , for example , is poor ) , but during the course of a full set of 2000 iterations , as this noise is decreased , the biclusters settle into a much more significant set of minima ( Supplementary Figure S1 ) . 
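+ The annealed noise described above can be sketched as follows . This is only an illustration : the linear decay schedule , the initial scale of 0.5 and all function names are assumptions , since the text states only that the noise is normally distributed and reaches zero at the final iteration . 

```python
import random


def noise_scale(iteration, num_iterations, initial_scale):
    """Standard deviation of the noise; decays linearly (an assumption) to 0 at the last iteration."""
    if num_iterations < 2:
        return 0.0
    remaining = (num_iterations - iteration) / (num_iterations - 1)
    return initial_scale * max(0.0, remaining)


def add_annealed_noise(scores, iteration, num_iterations, initial_scale=0.5, rng=None):
    """Return a copy of the score matrix (list of rows) with Gaussian noise added to every entry."""
    rng = rng or random.Random()
    scale = noise_scale(iteration, num_iterations, initial_scale)
    return [[value + rng.gauss(0.0, scale) for value in row] for row in scores]


if __name__ == "__main__":
    matrix = [[1.0, 2.0], [3.0, 4.0]]
    # Early iterations are perturbed; the final iteration is returned unchanged.
    print(add_annealed_noise(matrix, 1, 2000, rng=random.Random(1)))
    print(add_annealed_noise(matrix, 2000, 2000, rng=random.Random(1)))
```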
+ At the end of each iteration , cMonkey2 chooses a random subset of genes or conditions i , and moves i into bicluster j if , for any bicluster j ′ of which it is already a member , pij > pij ′ , and out of the corresponding worse bicluster j ′ for which pij > pij ′ . 
+ Thus , as with the k-means clustering algorithm , cMonkey2 performs a global optimization of all biclusters by moving elements among biclusters to improve each element 's membership scores , rather than by optimizing each bicluster one-at-a-time ( as cMonkey1 did ) . 
+ Note , however , that we have introduced an added degree of stochasticity to the optimization procedure , both from the selection of a random subset of genes and conditions to be moved at each iteration , and from the randomization described above . 
+ This type of metaheuristic does not exist in the standard k-means clustering algorithm . 
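+ A minimal sketch of the membership update step described above is given here . It assumes that each element swaps its worst current bicluster for the best bicluster it does not yet belong to whenever that improves pij ; the data structures , the function names and the tie-breaking behaviour are hypothetical and not taken from the cMonkey2 code . 

```python
import random


def propose_move(current, probs):
    """Return (j_in, j_out) if some outside bicluster beats the worst current one, else None.

    current: set of bicluster indices the element belongs to.
    probs:   list of membership probabilities p_ij, one per bicluster j.
    """
    outside = [j for j in range(len(probs)) if j not in current]
    if not outside or not current:
        return None
    best_new = max(outside, key=lambda j: probs[j])
    worst_old = min(current, key=lambda j: probs[j])
    if probs[best_new] > probs[worst_old]:
        return best_new, worst_old
    return None


def update_memberships(memberships, prob_matrix, fraction=0.1, rng=None):
    """Apply moves, in place, to a random subset of elements (genes or conditions)."""
    rng = rng or random.Random()
    elements = sorted(memberships)
    n_chosen = min(len(elements), max(1, int(fraction * len(elements))))
    for element in rng.sample(elements, n_chosen):
        move = propose_move(memberships[element], prob_matrix[element])
        if move is not None:
            j_in, j_out = move
            memberships[element].remove(j_out)
            memberships[element].add(j_in)
    return memberships
```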
+ The cMonkey2 tool: implementation details
+ We have completely re-implemented the cMonkey2 software tool , transforming it into a data integration platform that enables non-technical researchers to easily analyze their gene expression data in the context of additional evidence , while allowing developers to extend and tailor the base functionality with minimal development effort . 
+ cMonkey2 is available as a command-line-driven Python application . 
+ All aspects of the implementation center on modularity and extensibility , enabling developers to easily incorporate their novel data types and/or scoring methodologies into the procedure . 
+ For data input and integration , the tool now downloads , automatically from external databases , relevant information for nearly any microbe , including genome data and gene annotations ( currently using NCBI parsed annotations from RSAT ( 21 ) ; Microbes Online ( 22 ) ) ; gene functional associations ( STRING ( 23 ) ) , expression data ( Gene Expression Omnibus ( GEO ) ( 24 ) ) and others ( e.g. DoE KBase ) . 
+ We implemented cMonkey2 in object-oriented Python , in order to more effectively modularize and streamline the codebase . 
+ This switch alleviated all of the major speed and memory bottlenecks which were causing difficulty with the original implementation in GNU R. For a typical cMonkey2 run , the user provides a file containing a set of mRNA expression log-ratios in standard tab-delimited text file format . 
+ The user will also typically provide a three-letter KEGG organism code ( e.g. ` eco ' for E. coli ) , which is used to identify and automatically fetch additional information ( genome sequence , annotations , operon predictions , promoter sequences , functional associations ) from various scientific databases ( 21 -- 23 ) . 
+ For sequenced prokaryotes and unicellular eukaryotes ( such as S. cerevisiae ) , cMonkey2 provides solutions to common tasks such as name mapping and abstraction of organism-specific aspects including genomic information . 
+ Upon initialization , the tool downloads ( if necessary ) and caches the relevant public data locally . 
+ For example , predicted operons ( 25 ) are downloaded from Microbes Online ( 22 ) , full genome sequences and gene annotations are fetched from NCBI via RSAT ( 21 ) , and these data are used to parse out predicted promoter sequences for each annotated transcript ( Figure 1 ) . 
+ In cases where the organism of interest is not yet included in these databases ( e.g. it is newly sequenced ) , custom versions of these files may be supplied in standard formats ( e.g. tab-delimited , GFF , FASTA , etc. ) to enable motif searching and network clustering . 
+ If these additional data are not available , biclustering on the expression data only is still possible , and will be performed by default . 
+ cMonkey2 scoring functions . 
+ At the core of a cMonkey2 computation lies a cMonkey run . 
+ A run consists of a set of input data and configuration parameters and a set of scoring algorithms that are activated at certain iterations of the optimization procedure ( i.e. following a schedule ) , and combined using user-specified weights . 
+ All parameters have defaults which have been configured to work well with all test cases , data sets and organisms tested ( at this point , over twenty ) . 
+ The user can override certain configuration parameters or methods to customize the run . 
+ The configuration parameters may be set via the command line or a hierarchy of user-defined and default configuration files . 
+ At the heart of the cMonkey run is a scheduler that executes specific scoring functions at user-defined iterations and with a user-defined scoring weight . 
+ Each scoring function computes the corresponding k × | i | scores for associating a given gene/condition i with each of the k biclusters . 
+ An important detail is that these scores are not required to be comprised of standard distance measures which may be difficult or expensive to calculate . 
+ They can alternatively simply consist of a ( relative ) measure of prior expectation that gene/condition i belong in cluster k given the data and given the other genes/conditions in the cluster . 
+ An example of such an ad hoc scoring mechanism is delineated in detail in our description of the new set-enrichment scoring function ( see below ) . 
+ After computation of all scoring matrices , cMonkey2 integrates the scores , as described in more detail below , via a combiner function , which runs a list of scoring functions in sequence and combines their results according to specified weights . 
+ Bicluster memberships ( rows and columns ) are then updated based upon these combined scores . 
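+ The scheduler and combiner described above can be sketched in a few lines . This is not the cMonkey2 interface : the dictionary keys ( ` func ' , ` weight ' , ` start ' , ` interval ' ) and the normalization by the total active weight are assumptions made for illustration . 

```python
def is_scheduled(iteration, start, interval):
    """Return True if a scoring function is active at this iteration."""
    return iteration >= start and (iteration - start) % interval == 0


def combine_scores(specs, iteration):
    """Weighted average of the score matrices of all scheduled scoring functions.

    Each spec is a dict with hypothetical keys 'func' (a callable taking the
    iteration and returning a rows x clusters list of lists), 'weight',
    'start' and 'interval'. Returns None when no function is scheduled.
    """
    combined = None
    total_weight = 0.0
    for spec in specs:
        if not is_scheduled(iteration, spec["start"], spec["interval"]):
            continue
        scores = spec["func"](iteration)
        weight = spec["weight"]
        if combined is None:
            combined = [[0.0] * len(row) for row in scores]
        for i, row in enumerate(scores):
            for j, value in enumerate(row):
                combined[i][j] += weight * value
        total_weight += weight
    if combined is None or total_weight == 0:
        return None
    return [[value / total_weight for value in row] for row in combined]
```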
+ A cMonkey2 scoring function has a standard interface , so users can implement and run an arbitrary number of differently-weighted scoring functions based upon user preferences . 
+ The user has the choice to override the parameters of the scoring functions , via command-line options and/or configuration files . 
+ Scoring functions for the three standard cMonkey data types ( co-expression , MEME-detected conserved promoter GREs and network clustering ) are provided by default . 
+ Below , we describe implementations and use-cases for two additional , newly implemented scoring functions . 
+ Implementation and integration of novel row scoring functions
+ To demonstrate the utility and ease of integrating additional streams of evidence for gene co-regulation via implementation of new scoring mechanisms ( as described above ) , we implemented two additional row ( gene ) scoring functions which were not part of the original cMonkey1 algorithm . 
+ The first of these integrates a ` set-enrichment ' scoring function , which enables the user to influence bicluster optimization to enrich for user-defined sets of genes ( e.g. similar gene functional annotations ; known promoter binding mapped via ChIP or RNase hypersensitivity ; or known GREs ) . 
+ The second scoring function integrates an additional motif detection algorithm ( Weeder ( 26 ) ) to add motif detection via enriched k-mers , and to search for motifs in regions other than gene promoters ( here , we use 3 ' UTRs ) -- -- a functionality complementary to that already provided by MEME . 
+ Below , we describe the motivation and implementation of these two scoring functions . 
+ Later , in the Results section , we present an analysis of their influence on cMonkey2 biclustering results on data sets for three different organisms . 
+ Set-enrichment row scoring function . 
+ The set enrichment scoring function was developed to easily incorporate ( and enrich biclusters for ) predefined gene sets . 
+ Given a file which lists annotations or groupings of genes into ( possibly overlapping ) sets with unique identifiers , the set enrichment scoring function computes , at each iteration , the significance of overlap between each bicluster 's member genes and the genes annotated for each set using the Fisher 's exact test . 
+ For each bicluster , the set with the most significant overlap ( smallest p-value ) is chosen for training , and a set of row scores is generated that increases the probability of retaining cluster genes or adding new genes that are in the enriched set . 
+ The gene scores are computed by a simple heuristic in which we multiply the log10 of the aforementioned p-value by 1.0 for genes which are in the bicluster and are members of the enriched set ; by 0.5 for genes which are in the set but are not in the bicluster ; and by 0.0 for all other genes . 
+ To increase stability of the set enrichment approach we added a low-pass filter which sets the p-value to a minimum of the Bonferroni cutoff p-value given the number of sets tested . 
+ These final gene ( row ) scores are then normalized and combined with the other scoring functions to train cMonkey2 biclusters . 
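+ The set-enrichment heuristic described above can be sketched as follows . The sketch assumes a one-sided Fisher 's exact test ( the hypergeometric upper tail ) , a Bonferroni cutoff of 0.05 divided by the number of sets , and that the low-pass filter clips p-values from below at that cutoff ; these choices and all names are illustrative assumptions , and the final normalization step is omitted . 

```python
import math


def hypergeom_upper_tail(overlap, set_size, cluster_size, universe_size):
    """One-sided Fisher's exact test p-value: P(X >= overlap) under the hypergeometric null."""
    total = math.comb(universe_size, cluster_size)
    upper = min(set_size, cluster_size)
    tail = sum(
        math.comb(set_size, x) * math.comb(universe_size - set_size, cluster_size - x)
        for x in range(overlap, upper + 1)
    )
    return tail / total


def set_enrichment_scores(cluster_genes, gene_sets, all_genes, alpha=0.05):
    """Return (best_set_name, {gene: score}) for one bicluster."""
    universe = set(all_genes)
    cluster = set(cluster_genes) & universe
    floor = alpha / len(gene_sets)  # Bonferroni cutoff (alpha is an assumption)
    best_name, best_p = None, float("inf")
    for name, members in gene_sets.items():
        members = set(members) & universe
        p = hypergeom_upper_tail(len(cluster & members), len(members), len(cluster), len(universe))
        if p < best_p:
            best_name, best_p = name, p
    log_p = math.log10(max(best_p, floor))  # clip very small p-values for stability
    best_set = set(gene_sets[best_name]) & universe
    scores = {}
    for gene in universe:
        if gene in best_set and gene in cluster:
            weight = 1.0  # in the bicluster and in the enriched set
        elif gene in best_set:
            weight = 0.5  # in the enriched set only
        else:
            weight = 0.0  # all other genes
        scores[gene] = weight * log_p
    return best_name, scores
```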
+ Weeder motif detection row scoring function for detection of enriched k-mers . 
+ While MEME ( 27 ) has been shown ( 28,29 ) to be among the most sensitive and robust sequence motif detection algorithms available , the fact that it models GREs as position-specific scoring matrices ( PSSMs ) and uses expectation maximization to detect the most overrepresented signatures in a set of input sequences means that it is more sensitive to detecting certain types of motifs . 
+ The Weeder algorithm ( 26 ) searches for overrepresented degenerate kmers ( rather than PSSMs ) , and has been shown ( 28 ) to be quite sensitive to detecting other types of regulatory motif signatures , including miRNA binding sites in mammalian genomes ( 30 ) . 
+ Thus , in order to detect putative signatures for miRNAs in human genomes that might be associated with disease , we developed a cMonkey2 scoring function that integrates Weeder motif detection and optimization into the algorithm . 
+ This custom row scoring function replaces MEME with Weeder to search for enriched motifs in gene promoters ( using default parameters , in ` medium ' search mode on both strands ) , and returns gene-specific scores assessing the significance of each promoter 's match to the detected motifs . 
+ Specifically , the function records the n ≤ 4 highest-scoring motifs , ranked by score and provided as PSSMs by Weeder . 
+ As with the default MEME scoring function , it then provides those PSSMs as input to MAST ( 31 ) to compute the significance ( sequence p-values ) of the match of each gene 's promoter to the detected motif ( s ) . 
+ These sequence p-values are then used by the remainder of the cMonkey2 pipeline identically to the default MEME calculation . 
+ cMonkey2 output , monitoring , visualization and exploration
+ Provenance is a key concern in the new implementation , and all information necessary to completely reproduce a given run is stored in the output , including code version , input parameters and configuration files and random number seeds . 
+ cMonkey2 provides a web interface to the output results database , served via an embedded web server , that enables a user to observe the progress of the run , via histograms of scores and optimization progress of mean statistics at each iteration ( Supplementary Figure S4 ( A ) ) . 
+ The design of this monitoring implementation as an embedded web server allows cMonkey2 analyses to be run on a remote server ( e.g. Amazon 's EC2 platform ) and monitored using a local web browser . 
+ In the future , we intend to extend this interface to enable initialization and control over cMonkey2 runs , as well as remote storage , visualization and exploration of computation results . 
+ The interface also enables searching , selection and viewing of individual biclusters during a run , and upon completion , a Cytoscape ( 32 ) network visualization for exploring relationships between genes , detected GREs , and their membership in biclusters ( Supplementary Figure S4 ( B ) ) . 
+ All of these resources are populated with FireGoose ( 33 ) / ChromeGoose XML microformats , which enable easy integration and exploration using other external tools and databases via the Gaggle ( 34 ) , such as MeV ( 35 ) for expression analysis , STAMP ( 36 ) and RegPrecise/RegPredict ( 37,38 ) for motif comparisons , DAVID ( 39 ) for function analysis , KEGG ( 40 ) for pathway analysis , among many others . 
+ Additionally , we have implemented an interface which enables recorded exploration of biclusters via an interactive IPython notebook . 
+ Methods for evaluation of cMonkey2 module detection
+ E. coli : Comparisons to other published module detection algorithms . 
+ In order to assess the ramifications of the algorithm changes which we made to cMonkey2 , we evaluated its performance relative to cMonkey1 , to other popular clustering methods ( k-means ( 41 ) and WGCNA ( 42 ) ) , and to the published data integration/module detection algorithms COALESCE ( 11 ) , DISTILLER ( 10 ) and LeMoNe ( 4 ) . 
+ We note that this list is by no means comprehensive , but is rather meant to provide a representative sampling of the various data integration and module detection algorithms available . 
+ For all algorithms , we used an E. coli gene expression compendium , containing 868 measurements of mRNA expression for 4203 genes , compiled and normalized by ( 10 ) . 
+ To quantify algorithm performance , we used intrinsic ( tightness of cluster co-expression ; motif significance ) and extrinsic ( recapitulation of known biology ) measures . 
+ For the intrinsic measures , we used the mean square residue ( 43 ) and MEME ( 27 ) motif E-value to quantify bicluster and detected GRE quality , respectively . 
+ For the extrinsic quality assessment , we used the RegulonDB ( 44 ) database as the gold standard for comparing with known E. coli regulatory network modularity and known regulation . 
+ We ran all algorithms listed , other than DISTILLER , for which we used the author-provided clusterings ( based upon the same E. coli data set ) . 
+ For all algorithm runs , we attempted to generate clusters with a similar average number of genes to that of the cMonkey runs ( either , for example , by adjusting k for k-means , or trying different size k-mers for COALESCE ) . 
+ Comparisons of intrinsic measures of cluster quality . 
+ For the intrinsic measures of cluster and motif detection , we used cluster mean squared residue ( 43 ) to quantify cluster cohesiveness across included conditions . 
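+ For reference , the mean squared residue of a bicluster can be computed as in the sketch below . It assumes the commonly used definition in which each entry is compared with its row mean , column mean and overall mean ; the function name and the list-of-rows input format are illustrative . 

```python
def mean_squared_residue(submatrix):
    """Mean squared residue of a bicluster given as a list of rows (genes x conditions).

    Residue of entry (i, j) = a_ij - row_mean_i - col_mean_j + overall_mean.
    A perfectly additive bicluster has a mean squared residue of zero.
    """
    n_rows = len(submatrix)
    n_cols = len(submatrix[0])
    row_means = [sum(row) / n_cols for row in submatrix]
    col_means = [sum(row[j] for row in submatrix) / n_rows for j in range(n_cols)]
    overall = sum(row_means) / n_rows
    total = 0.0
    for i, row in enumerate(submatrix):
        for j, value in enumerate(row):
            residue = value - row_means[i] - col_means[j] + overall
            total += residue * residue
    return total / (n_rows * n_cols)
```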
+ For motif detection , we quantified the likelihood of detecting clusters with genes that contain a statistically significant putative GRE ( MEME ( 27 ) E-value ≤ 1 ) . 
+ Detection of known E. coli regulons . 
+ The primary goal of cMonkey is to reconstruct , from expression data , a comprehensive set of co-regulated gene modules . 
+ We have chosen to define a co-regulated gene module as a set of genes which are regulated by the same combination of transcription factors ( TFs ) . 
+ We assessed cMonkey and the other algorithms in their capability to recapitulate experimentally annotated E. coli regulons in RegulonDB ( 44 ) using precision and recall . 
+ For precision , we computed the fraction of computed biclusters that had significant gene membership overlap ( more than two genes , computed via cumulative hypergeometric p-value , controlled for false discovery rate FDR ≤ 0.01 ) with at least one of the 257 such combinatorial regulons in RegulonDB . 
+ For recall , we computed the fraction of all 257 combinatorial regulons which were rediscovered by the algorithm . 
+ A standard measure of algorithm performance relative to an incomplete gold standard is the area under the precision-recall ( AUPR ) curve . 
+ However given our evaluations which use a single clustering for each algorithm , we use the geometric mean of precision and recall , or G-measure ( 45 ) . 
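+ Thus , for a given number of true positives ( TP ) , false positives ( FP ) and false negatives ( FN ) , $ G = \sqrt{ \mathrm{precision} \times \mathrm{recall} } = \sqrt{ \frac{TP}{TP + FP} \times \frac{TP}{TP + FN} } $ . 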
+ For completeness , we also report the F1 score , which is the harmonic mean of precision and recall . 
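+ Both summary measures follow directly from precision and recall , as in this short sketch ( the function names are illustrative , and returning zero for empty denominators is a convention assumed here ) . 

```python
import math


def precision_recall(tp, fp, fn):
    """Precision and recall from counts; zero is returned for an empty denominator."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


def g_measure(precision, recall):
    """Geometric mean of precision and recall."""
    return math.sqrt(precision * recall)


def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    p, r = precision_recall(tp=30, fp=10, fn=30)
    print(p, r, g_measure(p, r), f1_score(p, r))
```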
+ Detection of known E. coli transcription factor binding sites . 
+ cMonkey has the distinction among the algorithms tested ( along with COALESCE ) that it infers , de novo , conserved putative GREs in the promoters of genes in each cluster using MEME ( 27 ) ( by default ) , and optimizes the clusters to improve those GREs . 
+ To evaluate this aspect of cMonkey , we assessed the performance of all algorithms in identifying groups of genes containing significant combinations of bona fide GREs in their promoters . 
+ For each cluster generated by each algorithm , we applied MEME for motif detection post facto , with the same set of parameters utilized by cMonkey , to the promoters of each cluster 's member genes . 
+ Using FIMO ( 46 ) , we then scanned the GREs detected by MEME across the entire E. coli genome to identify significant ( FDR ≤ 0.05 ) motif instances . 
+ We then compared the locations of these motif instances with 2283 experimentally-determined binding locations for 101 transcription factors ( TFs ) with at least three binding sites in the RegulonDB BindingSiteSet table . 
+ If positions of a GRE aligned to the positions of a TF significantly more often than expected at random ( FDR ≤ 0.01 ) , then we classified that GRE as a match to the TF . 
+ As previously , we assessed each algorithm 's precision ( fraction of all clusters with a match to a RegulonDB TF ) and recall ( fraction of all 101 RegulonDB TFs independently detected ) , and combined these into a single G measure . 
+ Evaluations of new set-enrichment scoring module for E. coli . 
+ To evaluate the efficacy of the set-enrichment module ( see Methods section for details ) , we parsed the RegulonDB regulons into gene sets and integrated these into cMonkey2 using the set enrichment row scoring function . 
+ Thus , we added an additional constraint to cMonkey2 which allowed the algorithm to optimize biclusters that ( simultaneously with the other aforementioned constraints -- -- coexpression , GRE detection , and network connectivity ) should be more consistent with the annotated E. coli regulons . 
+ Mycobacterium tuberculosis : Integration of ChIP-seq and TF overexpression via set-enrichment . 
+ The newly-introduced cMonkey2 set-enrichment scoring function was developed to improve the capability of cMonkey2 to construct co-regulated gene modules which are simultaneously enriched for known gene sets . 
+ It does so by enriching clusters for gene sets which are expected to include additional evidence for co-regulation ( e.g. regulons from ChIP-chip/seq or known regulons ; functional annotation such as Gene Ontology ; or co-regulation via some other pre-computed evidence type ) . 
+ To further test the capability of the cMonkey2 set-enrichment scoring function to improve detection of experimentally validated regulons , we investigated its influence on modules detected for Mycobacterium tuberculosis , using a large gene expression compendium and new global ChIP-seq and transcription factor overexpression ( TFOE ) measurements . 
+ M. tuberculosis data . 
+ We used a compendium of 2,325 publicly available Mycobacterium tuberculosis transcriptome measurements collated from TBDB ( http://tbdb.org ) , as described in ( 18 ) . 
+ For our set-enrichment assessment , we integrated genome-wide binding measurements for 154 M. tuberculosis TFs , assayed via ChIP-seq ( 47 ) , and transcriptome measurements following induction ( over-expression ; hereafter , TFOE ) of 206 TFs ( 48 ) . 
+ As described in ( 18 ) , from the ChIP-seq measurements we distilled 7,248 significant TF-gene interactions through significant binding in regions spanning -150 to +70 nucleotides around transcriptional start sites for 142 TFs . 
+ Similarly , for the TFOE measurements , we used RNA measurements from the same cultures in which the TFs were induced for ChIP-Seq to obtain transcriptomes resulting from overexpression of 205 TFs . 
+ From these measurements , we identified 3,785 mRNAs with significant expression change ( p-value ≤ 0.01 ) . 
+ Running cMonkey2 on M. tuberculosis . 
+ We ran cMonkey2 on the M. tuberculosis data in eight different combinations . Four of these used motif detection/integration : ( i ) without the ChIP-seq or TFOE gene sets ( default ) ; ( ii ) with only the ChIP-seq gene sets via the set-enrichment scoring function ; ( iii ) with only the TFOE gene sets via the set-enrichment scoring function ; and ( iv ) with both ChIP-seq and TFOE gene sets , weighted equally via the standard cMonkey2 weighting mechanism . 
+ The remaining four repeated combinations ( i ) to ( iv ) with motif detection/integration turned off . 
+ For all runs , we used k = 600 , as in ( 18 ) , but excluded the ( default ) inclusion of EMBL STRING functional co-association networks , in order to eliminate the possibility of redundancy between test and training data . 
+ Given the non-deterministic optimization of cMonkey2 , we ran each combination of parameterizations ten times , enabling us to report estimates of the mean and standard errors to facilitate comparisons . 
+ In total , this investigation comprised 80 separate cMonkey2 runs . 
+ Recovery of gene sets significantly enriched in cMonkey2 modules . 
+ For each run , we used a Benjamini-Hochberg-corrected p-value ( p-value ≤ 0.01 ) to identify the total number of biclusters ( out of 600 ) which were significantly enriched for any of the 142 ChIP-seq gene sets or any of the 205 TFOE gene sets . 
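+ The counting step described above can be illustrated with a standard Benjamini-Hochberg adjustment . The sketch assumes that the correction is applied jointly to all bicluster-by-gene-set p-values of a run ; the text does not specify the family of tests , and the function names are hypothetical . 

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, p_values[index] * m / rank)
        adjusted[index] = running_min
    return adjusted


def count_enriched_biclusters(p_matrix, cutoff=0.01):
    """Count biclusters (rows) with at least one gene set whose adjusted p-value <= cutoff."""
    flat = [p for row in p_matrix for p in row]
    adjusted = benjamini_hochberg(flat)
    count, position = 0, 0
    for row in p_matrix:
        row_adjusted = adjusted[position:position + len(row)]
        position += len(row)
        if any(p <= cutoff for p in row_adjusted):
            count += 1
    return count
```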
+ Human Lung Squamous Cell Carcinoma ( LUSC ) . 
+ To test the capability of the new cMonkey2 set-enrichment and Weeder scoring functions to improve detection of validated regulons in mammalian systems , we investigated their performance on The Cancer Genome Atlas ( TCGA ) lung squamous cell carcinoma data set . 
+ In particular , we assessed the recovery of miRNA regulators ( using 3 ' UTR sequences as input for training to the Weeder scoring function , and a pre-computed database of miRNA to target gene predictions as training input for the set-enrichment function ) . 
+ Human LUSC data . 
+ We downloaded RNA-seq gene level counts for 20,351 genes across 475 lung squamous cell carcinoma ( LUSC ) tumors ( 49 ) from the October 17 , 2014 run of the Broad TCGA GDAC Firehose ( doi:10.7908/C1CJ8CFD ) . 
+ Using DESeq2 ( 50 ) we normalized the RNA-seq gene level counts using the variance-stabilizing transformation , computed the coefficient of variation for each gene , and selected the top 2000 genes with largest coefficients of variation as input for cMonkey2 . 
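+ The gene selection step described above can be sketched as follows , assuming the expression values have already been normalized ( the DESeq2 variance-stabilizing transformation itself is not reproduced here ) . The function names , the dictionary input format and the use of the sample standard deviation are illustrative assumptions . 

```python
import statistics


def coefficient_of_variation(values):
    """Sample standard deviation divided by the absolute mean (0.0 if the mean is zero)."""
    mean = statistics.fmean(values)
    if mean == 0:
        return 0.0
    return statistics.stdev(values) / abs(mean)


def top_variable_genes(expression, n_top=2000):
    """Return the n_top gene names with the largest coefficients of variation.

    expression: dict mapping gene name -> list of normalized values across samples.
    """
    cv = {gene: coefficient_of_variation(values) for gene, values in expression.items()}
    return sorted(cv, key=cv.get, reverse=True)[:n_top]
```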
+ Each cMonkey2 run detected 133 biclusters . 
+ Significant coexpression for each bicluster was ensured by filtering out all biclusters where the variance explained by the first principal component was less than the variance explained by the first principal component for randomly sampled gene sets of the same size for more than 5 % of the random samples . 
+ This filtering led to an average of 112 ± 8 biclusters per cMonkey2 run on the LUSC tumors . 
+ Running cMonkey2 on LUSC tumors . 
+ We used GeneMANIA ( 51,52 ) as the gene-gene interaction network training input for human cMonkey2 runs . 
+ A gene synonym thesaurus was created from the Ensembl BioMart database to convert between different gene identifiers . 
+ Weeder was used to discover motifs in the 3 ' UTR sequences that were extracted from the UCSC genome browser FTP site using the same methods we have described previously ( 30,53 ) . 
+ The TargetScan database of predicted human miRNA target genes release 6.2 ( 54 ) was used for set-enrichment training . 
+ We ran cMonkey2 on the LUSC normalized RNA-seq gene expression using three different training approaches : ( i ) no cis-regulatory training inputs ( i.e. no de novo motif detection ) ; ( ii ) training on de novo-detected 3 ' UTR Weeder motifs ; and ( iii ) training only on set-enrichment using TargetScan miRNA target gene predictions . 
+ As cMonkey2 is non-deterministic in nature we ran each of the three training approaches six times to provide estimates of the mean and standard errors to facilitate comparisons . 
+ Human LUSC : comparing number of significant Weeder motifs . 
+ The significance of Weeder motifs was determined by calculating empirical p-values for each bicluster Weeder motif score using pre-computed Weeder motif scores from 1000 randomly sampled gene sets . 
+ As bicluster sizes vary , we precomputed scores for gene set sizes from 5 to 65 genes at intervals of 5 genes and selected the closest gene set size for empirical p-value calculation . 
+ A Weeder motif score was considered to be significant if it had an empirical p-value less than or equal to 0.05 . 
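+ The empirical p-value calculation described above can be sketched as below . The sketch assumes that larger motif scores are better and that the p-value is the fraction of random gene sets scoring at least as high as the bicluster ; both points , and the function names , are assumptions not stated in the text . 

```python
def closest_size(cluster_size, sizes=tuple(range(5, 70, 5))):
    """Pick the precomputed gene set size (5, 10, ..., 65) nearest to the bicluster size."""
    return min(sizes, key=lambda size: abs(size - cluster_size))


def empirical_p_value(observed_score, null_scores):
    """Fraction of null scores at least as large as the observed score."""
    hits = sum(1 for score in null_scores if score >= observed_score)
    return hits / len(null_scores)


def is_significant(observed_score, cluster_size, null_by_size, cutoff=0.05):
    """null_by_size: dict mapping gene set size -> list of scores from random gene sets."""
    null_scores = null_by_size[closest_size(cluster_size)]
    return empirical_p_value(observed_score, null_scores) <= cutoff
```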
+ A post-hoc Weeder motif discovery and empirical p-value calculation was conducted to determine the number of significant motifs detected for cMonkey2 runs that did not train on Weeder motifs . 
+ A Student 's t-test was used to compare the number of significant Weeder motifs between the three approaches . 
+ Recovery of miRNAs implicated in LUSC . 
+ Each bicluster was tested for enrichment of miRNA target genes from the TargetScan database of predicted human miRNA target genes as described previously ( 53 ) . 
+ For each run we determined the overlap of the miRNAs enriched in biclusters with the manually curated miR2Disease database which implicates 110 different miRNAs in the etiology of lung cancer and/or non-small cell lung cancer . 
+ A Student 's t-test was used to compare the number of LUSC miRNAs rediscovered between the three approaches . 
+ RESULTS
+ Below , we summarize results of our three separate analyses described in the Methods section . 
+ Overall , cMonkey2 proves to be a worthy successor to cMonkey1 , and , based upon our assessments of both cluster quality and recapitulation of known modularity and mechanisms in prokaryotic gene regulatory networks , is an excellent tool for this purpose . 
+ We also would like to note that these comparisons may be supplemented by our previously-published evaluations of cMonkey1 , which included extensive performance comparisons with many other biclustering methods , and an analysis of performance on randomized and shuffled data sets . 
+ Evaluation and comparison of module detection for E. coli
+ In the following sections , we evaluate the performance of cMonkey2 in recapitulating known regulation as annotated in the RegulonDB database ( see Methods section for details ) . 
+ Results of all comparisons are summarized in Table 1 and plotted in Supplementary Figure S2 . 
+ Intrinsic measures of cluster quality . 
+ When compared to the algorithms tested ( see Methods section ) , we found that cMonkey2 identified clusters with , on average , tighter coexpression ( cluster mean squared residue ( 43 ) ) , and , other than cMonkey1 , with a greater likelihood of containing a statistically significant GRE ( MEME ( 27 ) E-value ≤ 1 ) ( Table 1 and Supplementary Figure S2 ) . 
+ Typically , there is , somewhat paradoxically , a tradeoff between obtaining tight coexpression and detecting significant GREs . 
+ Thus it is noteworthy that cMonkey2 obtained tighter clusters , while still detecting more clusters with more statistically significant GREs . 
+ While cMonkey1 clusters were more likely to contain a significant motif ( 96 % ) , this is primarily because it is both ( a ) training more heavily on GREs than on expression data , which explains the less coherent ( higher residual ) cMonkey1 biclusters ; and ( b ) redundantly detecting the same significant GREs in multiple redundant clusters ( i.e. achieving greater precision at the expense of lower recall ) . 
+ This also explains the reason that even cMonkey2 ( no motif ) , for example , achieved greater recall ( and hence greater G score ) than cMonkey1 . 
+ The modified algorithm of cMonkey2 , which only allows each gene to be assigned to no more than two biclusters , is far more stringent than the probabilistic constraint in cMonkey1 . 
+ A similar effect explains the greater precision of some other algorithms ( e.g. WGCNA ) than cMonkey -- -- the discovery of relatively fewer ( e.g. only 25 by WGCNA ) and significantly larger ( ∼ 8 × larger , for WGCNA ) modules , enables it to focus on only the most significant ( and thus most easily characterized ) modules . 
+ Detection of known E. coli regulons . 
+ Surprisingly , cMonkey2 detected combinatorial regulons with substantially greater precision ( 51 % vs. 42 % ) and recall ( 79 % vs. 50 % ) than cMonkey1 . 
+ In fact , cMonkey2 achieved greater recall than all algorithms tested , and greater precision than all except COALESCE ( 54 % ) . 
+ cMonkey2 achieved the greatest G score ( see Methods section ) for combinatorial regulon detection vs. RegulonDB ( Table 1 and Supplementary Figure S2 ; using the F1 score instead ( Methods section ) produces the same outcome ) . 
+ If we instead compare performance in recovering standard regulons ( as opposed to combinatorial regulons ; see Methods section ) , cMonkey2 again achieved the greatest G ( and F1 ) . 
+ It is noteworthy that cMonkey2 surpassed even DISTILLER on these combined measures , even though DISTILLER uses known regulon data as part of its training set , which gives it greater precision . 
+ Detection of known E. coli GREs . 
+ cMonkey2 again outperformed cMonkey1 in both precision and recall for detection of validated GRE sites in RegulonDB ( Table 1 and Figure S2 ; see Methods section ) . 
+ It also achieved greater recall than all other assessed methods , although with relatively lower precision than COALESCE . 
+ However , cMonkey2 surpassed all methods in the combined precision-recall G ( and F1 ) measure . 
+ While DISTILLER achieved greater performance than the original cMonkey1 in these measures on our E. coli GRE detection benchmarks , primarily due to its greater precision , our analysis reveals that the algorithm modifications in cMonkey2 have enabled it to outperform all methods . 
+ Evaluation of cMonkey2 integration of motifs and networks for E. coli . 
+ In order to evaluate whether the data integration scheme of cMonkey2 performed as expected , we included results for runs of cMonkey2 in which motifs and/or networks were not included as part of the training data . 
+ Not surprisingly , using the full complement of data performs significantly better than excluding motif information . 
+ However , we found that excluding only the network data ( here , STRING ( 23 ) functional association links ) did not significantly handicap the algorithm , although excluding both networks and motifs performed significantly worse than excluding either data type alone . 
+ Enrichment for E. coli known regulons via new set-enrichment row scoring module . 
+ We will now present the results of integrating the two novel cMonkey2 scoring modules ( described in Methods section ) in more detail and evaluate their utility in improving the method 's performance in recapitulating RegulonDB regulons and known GREs . 
+ Table 1 and Figure S2 show that this integration effectively improved the cMonkey2 recapitulation of RegulonDB regulons and combinatorial regulons ( particularly , the precision of regulon detection ) , while not significantly impeding its ability to meet the other default constraints of tight conditional co-expression and significant GRE detection . 
+ Clearly , the degree to which this module can improve these measures ( and as a result decrease other measures ) depends upon adjustment of its weighting schedule . 
+ While we acknowledge the circularity of this assessment , it proves that the set enrichment row scoring module has the intended effect and could be effectively used to integrate ChIP , functional annotations , or related data types into the cMonkey2 module detection process . 
+ Evaluation of co-regulated modules detected for Mycobacterium tuberculosis
+ All assessments of cMonkey2 module predictions for M. tuberculosis are summarized in Table 2 and Supplementary Figure S3 . 
+ Set-enrichment scoring function significantly increases recovery of set-enriched modules . 
+ Recently we reported the construction of a global gene regulatory network for Mycobacterium tuberculosis ( Mtb ) by applying cMonkey to 2,325 publicly available Mtb transcriptome profiles ( 18 ) . 
+ We previously validated this predicted network using two separate global data sets : ( i ) genome-wide binding locations for 143 TFs measured via ChIP-seq ( 47 ) , and ( ii ) global transcriptional consequences of overexpressing 206 TFs ( TFOE ) ( 48 ) . 
+ We hypothesized that training on these experimentally determined TF-regulated genes from ChIP-seq or TFOE would improve cMonkey2 gene regulatory module inference . 
+ We tested this hypothesis by applying cMonkey2 analysis to the same transcriptome profiles for Mtb as used previously ( 18 ) , and varying the training inputs ( MEME de novo cisregulatory motif detection ; ChIP-seq set-enrichment ; and TFOE set-enrichment ) . 
+ Importantly , we first found that integration of MEME motifs in cMonkey2 optimization significantly increased the number of modules that were enriched for both ChIP-seq ( 350 vs. 187 significantly enriched modules , Student 's t-test p-value = 5.0 × 10 − 12 , Table 2 ) and TFOE ( 346 vs. 249 significantly enriched modules , p-value = 6.2 × 10 − 10 , Table 2 ) TF targets , in comparison to runs in which motifs were excluded from training ( i.e. co-expression alone ) . 
+ This result demonstrates clearly that the cMonkey2 integration of sequence information from MEME de novo motif detection significantly improves discovery of biclusters that are enriched for Mtb TFs . 
+ Using the set-enrichment scoring function to train on ChIP-seq or TFOE ( while excluding motif detection ) , also significantly increased the number of enriched modules beyond what was discovered by co-expression training alone ( ChIP-seq p-value = 2.3 × 10 − 9 , TFOE p-value = 2.5 × 10 − 13 , Table 2 ) , or to the runs trained on MEME motifs ( ChIP-seq p-value = 1.3 × 10 − 7 ; TFOE p-value = 5.8 × 10 − 7 , Table 2 ) . 
+ Notably , this improvement was achieved with relatively little decrease in bicluster co-expression ( i.e. little increase in residual , Table 2 ) , suggesting that , by integrating this alternate form of prior cis-regulatory information , cMonkey2 is effectively exploiting the complex , multifactorial biclustering search space to result in modules with similar co-expression ` quality , ' but which are significantly more enriched with the desired sets . 
+ Due to the indirect nature of TFOE responses ( vs. inherently direct interactions measured via ChIP ) , the condition sensitivity of the experiments , and the noise inherent to ChIP-seq measurements , there is a small amount of overlapping cis-regulatory information between the ChIP-seq and TFOE data sets ( 18 ) , which leads to little to no increase in the number of TFOE set-enriched modules detected when trained on ChIP-seq sets , and vice versa . 
+ The complementary nature of these data sets and the lack of a gold-standard set of cis-regulatory predictions meant that we did not have an independent validation that could be used to assess the strength of these predictions . 
+ We address this lack of external validation in the following section on lung squamous cell carcinoma where an external validation set is available . 
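What it means for a module to be "significantly enriched" for a TF target set, as counted in the M. tuberculosis assessment above, can be made concrete with a one-sided hypergeometric test. This is a minimal sketch of that common test and not necessarily the statistic used by the cMonkey2 set-enrichment module, which is described in the Methods section. The gene counts in the example call are hypothetical.

```python
from math import comb


def hypergeom_enrichment_pvalue(population, set_size, module_size, overlap):
    """P(X >= overlap) when module_size genes are drawn without replacement
    from `population` genes, of which `set_size` belong to the target set."""
    max_overlap = min(set_size, module_size)
    total = comb(population, module_size)
    tail = sum(
        comb(set_size, k) * comb(population - set_size, module_size - k)
        for k in range(overlap, max_overlap + 1)
    )
    return tail / total


# Hypothetical example: a 30-gene module sharing 5 genes with a 50-gene
# TF target set, in a genome of 4,000 genes.
p_value = hypergeom_enrichment_pvalue(4000, 50, 30, 5)
print(f"enrichment p-value = {p_value:.3g}")
```

A module would then be called enriched when this p-value falls below a chosen threshold after correction for the number of sets tested.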
+ Evaluation of module detection for human Lung Squamous Cell Carcinoma (LUSC)
+ Increased recovery of LUSC-implicated miRNAs by training on Weeder motifs . 
+ Previously , we developed methods to discover miRNA mediated regulation from gene coexpression clustering by discovering 3 ' UTR motifs using Weeder post-facto on the 3 ' UTR sequences of genes in the clusters ( 53 ) . 
+ The integration of Weeder into cMonkey2 allows us to train biclusters based simultaneously on coexpression and Weeder 3 ' UTR motifs thereby increasing the potential for discovering meaningful miRNA co-regulation . 
+ We tested this hypothesis by applying cMonkey2 to The Cancer Genome Atlas ( TCGA ) lung squamous cell carcinoma ( LUSC ) patient tumors to discover miRNA mediated co-regulation . 
+ We observed a significant increase in the number of significant 3 ' UTR motifs discovered when cMonkey2 was trained on Weeder motifs , compared with runs not trained on any cis-regulatory inputs ( p-value = 1.5 × 10 − 3 ; Table 3 ) . 
+ Training on Weeder 3 ' UTR motifs also led to a significant 2.5 fold increase in recovery of miRNAs implicated in lung cancer as compared to runs not trained on any cis-regulatory inputs ( p-value = 6.0 × 10 − 5 ; Table 3 ) . 
+ We have demonstrated that integration of training on Weeder 3 ' UTR motifs into cMonkey2 has improved the discovery of 3 ' UTR motifs , which in turn leads to an impressive increase in the discovery of disease implicated miRNAs . 
+ Increased recovery of LUSC-implicated miRNAs by set-enrichment training . 
+ A faster alternative for training cMonkey2 runs to discover miRNA mediated regulation in mammalian species with large genomes is set-enrichment with databases of pre-computed miRNA target gene predictions such as TargetScan ( 54 ) . 
+ cMonkey2 was run on the TCGA LUSC patient tumors ( see above ) and trained , using set-enrichment , on TargetScan miRNA target gene predictions . 
+ This analysis led to a significant 2.3-fold increase in recovery of miRNAs implicated in lung cancer as compared to runs not trained on any cis-regulatory inputs ( p-value = 1.8 × 10 − 6 ; Table 3 ) . 
+ Importantly , there was not a significant difference in the number of miRNAs recovered between the Weeder 3 ' UTR motif and TargetScan miRNA target gene training approaches ( p-value = 0.24 ) ; however , omitting the de novo motif detection resulted in a ∼ 3 × improvement in cMonkey2 run time in the TargetScan set-enrichment runs ( 4.7 ± 0.08 hours , versus 14.3 ± 0.2 hours for the Weeder 3 ' UTR motif runs ; p-value = 9.1 × 10 − 5 ) . 
+ These results demonstrate that , if regulatory factor to target gene databases exist , set-enrichment approaches can be used in place of de novo motif detection and also lead to significant performance improvements . 
+ DISCUSSION
+ We have described cMonkey2 , an updated and improved framework for detecting co-regulated modules of genes via automated data integration and optimization . 
+ We have described our recent modifications to the algorithm , which served to improve both its runtime performance , as well as its ability to discover optimized and experimentally validated gene regulatory modules . 
+ Based upon our tests on E. coli , cMonkey2 proves to be a strong performer in this crowded arena of regulatory network module detection and data integration . 
+ We have completely overhauled the cMonkey2 implementation , focusing on ease-of-use for the end user ( with automatic downloading and integration of many different data sources for any sequenced and annotated microbe ) , and on simplicity for the developer in integrating new data types and scoring schemes into the procedure . 
+ We demonstrated the utility of two new scoring mechanisms with use-cases for three different organisms -- -- E. coli , M. tuberculosis and H. sapiens , and showed that a simple integration of a new set-enrichment scoring procedure , as well as a new motif detection algorithm ( Weeder ) improved upon the existing capability of cMonkey2 to detect valid co-regulated gene modules and cis-regulatory motifs . 
+ These tests moreover demonstrated the importance of motif integration as part of cMonkey , revealing that this constraint can significantly improve module detection performance when additional data sets ( other than expression data ) are not available -- -- as would be the case , for instance , for newly-sequenced and culturable microbes with no developed genetics capability . 
+ Table note ( column headers : Weeder motif training ; Set-enrichment training ) : All p-value comparisons are relative to ‘Expression only.’
+ Because various motif detection algorithms use different statistical models and heuristics for learning them , they have different , often complementary capabilities for detecting different types of real signatures ( 55 ) . 
+ For this reason , a number of researchers ( 56,57 ) have taken to integrating the predictions of several different motif detection algorithms and have demonstrated resulting increased sensitivity and/or precision on prokaryotic genomes ( 56,58 ) . 
+ Our long-term goal is to integrate a number of motif detection algorithms into an ensemble learning procedure for learning co-regulated modules with cMonkey2 . 
+ This will include searching , in addition to annotated gene promoters ( the current default ) , their 5 ' and 3 ' UTRs for potential miRNA and post-transcriptional regulatory motifs . 
+ As we have shown , cMonkey2 now provides a straightforward framework for this type of integration , and we have identified excellent candidates ( 29 ) including BoBro ( 59 ) , Spacer ( 60 ) or dyad-analysis ( 61 ) , and BioProspector ( 62 ) , which model DNA motifs in different ways , as additional targets for integration . 
+ In the future , in addition to integration of multiple motif detection algorithms , we intend to use this new framework to add additional constraints via phylogenetic genomic conservation ( e.g. ( 63 ) and related ) , as well as other new data types including genomic location constraints provided by ChIP-seq ( 64 ) , ATAC-seq ( 65 ) or DNase-seq ( 66 ) measurements , which will provide significant additional constraints on the bicluster ( and in particular , motif ) optimization . 
+ Moreover , the framework provides the opportunity to investigate other measures of gene expression pattern similarity ( e.g. mutual information ) to identify other potential patterns of co-regulation . 
+ Our desire is to see cMonkey2 become a focal point for a community of users and developers , with additional data types and scoring function modules being contributed by the community . 
+ To this end , development of the framework is openly hosted on Github ( http://github.com/baliga-lab/cmonkey2 ) , with extensive documentation , wikis , and discussion groups . 
+ We will moreover provide a framework for automatically testing modifications and improvements contributed by the community via benchmarks similar to the RegulonDB ones presented here . 
+ ACKNOWLEDGEMENTS
+ We would like to thank Aaron Brooks , Justin Ashworth , Sam Danziger , Frank Schmitz and Serdar Turkarslan for their input into the design and implementation of cMonkey2 . 
+ We would like to acknowledge Eliza Peterson for filtering and providing the Mtb ChIP-seq and TFOE data in machine-readable format . 
+ FUNDING
+ U.S. National Science Foundation [ ABI NSF-1262637 , EAGER-MSB-1237267 , DB-1262637 , MCB-1330912 ] ; National Institute of Allergy & Infectious Diseases of the National Institutes of Health [ U 19 AI10676 ] ; National Institute of General Medical Sciences of the National Institutes of Health [ P50GM076547 ] ; ENIGMA , supported by the Office of Science , Office of Biological & Environmental Research of the US Department of Energy ( Contract No. DE-AC02-05CH11231 ) ; University of Luxembourg -- ISB Partnership . 
+ Funding for open access charge : U.S. National Science Foundation [ ABI NSF-1262637 ] . 
+ Conflict of interest statement . 
+ None declared .
\ No newline at end of file
--- a/data/TEXT_FILES/notuseful_txt/25875675.txt 0 → 100644
View file @27818a9
+++ b/data/TEXT_FILES/notuseful_txt/25875675.txt 0 → 100644
View file @27818a9
+ Molecular Characterization of a Multidrug
+ Abstract 
+ Escherichia coli sequence type 131 ( E. coli ST131 ) is a recently emerged and globally disseminated multidrug resistant clone associated with urinary tract and bloodstream infections . 
+ Plasmids represent a major vehicle for the carriage of antibiotic resistance genes in E. coli ST131 . 
+ In this study , we determined the complete sequence and performed a comprehensive annotation of pEC958 , an IncF plasmid from the E. coli ST131 reference strain EC958 . 
+ Plasmid pEC958 is 135.6 kb in size , harbours two replicons ( RepFIA and RepFII ) and contains 12 antibiotic resistance genes ( including the blaCTX-M-15 gene ) . 
+ We also carried out hyper-saturated transposon mutagenesis and multiplexed transposon directed insertion-site sequencing ( TraDIS ) to investigate the biology of pEC958 . 
+ TraDIS data showed that while only the RepFII replicon was required for pEC958 replication , the RepFIA replicon contains genes essential for its partitioning . 
+ Thus , our data provides direct evidence that the RepFIA and RepFII replicons in pEC958 cooperate to ensure their stable inheritance . 
+ The gene encoding the antitoxin component ( ccdA ) of the post-segregational killing system CcdAB was also protected from mutagenesis , demonstrating this system is active . 
+ Sequence comparison with a global collection of ST131 strains suggests that IncF represents the most common type of plasmid in this clone , and underscores the need to understand its evolution and contribution to the spread of antibiotic resistance genes in E. coli ST131 . 
+ Introduction
+ Escherichia coli sequence type 131 ( E. coli ST131 ) is a recently emerged and globally disseminated multidrug resistant clone associated with urinary tract and bloodstream infections [ 1 , 2 ] . 
+ E. coli ST131 was originally identified in 2008 as a major clone linked to the spread of the CTX-M-15 extended-spectrum β-lactamase ( ESBL ) - resistance gene [ 3 -- 5 ] . 
+ Since then , E. coli ST131 has also been strongly associated with fluoroquinolone resistance , as well as co-resistance to aminoglycosides and trimethoprim-sulfamethoxazole [ 6 , 7 ] . 
+ Recent analyses of the global epidemiology of E. coli ST131 using whole genome sequencing have revealed that the CTX-M-15 allele is highly prevalent within a fluoroquinolone resistant-FimH30 ( H30 ) ST131 sublineage [ 8 ] and demonstrated a significant role for recombination in the evolution of this E. coli lineage [ 9 ] . 
+ As observed for most other multidrug resistant Enterobacteriaceae pathogens , plasmids are the major vehicles for carriage of antibiotic resistance genes in E. coli ST131 . 
+ Multiple plasmids from a range of incompatibility groups and containing various combinations of antibiotic resistance genes , conjugative transfer genes and other cargo genes have been described in E. coli ST131 strains [ 2 ] . 
+ This includes the IncF plasmids pEK499 , pEK516 [ 10 ] , pGUE-NDM [ 11 ] , pC15-1a [ 12 ] , pJJ1886-5 [ 13 ] , pEC_B24 , pEC_L8 , pEC_L46 [ 14 ] , pJIE186-2 [ 15 ] , as well as the IncN plasmid pECN580 [ 16 ] , the IncX plasmid pJIE143 [ 17 ] and the IncI plasmid pEK204 [ 10 ] . 
+ E. coli EC958 represents one of the best-characterised genome-sequenced E. coli ST131 strains [ 18 ] . 
+ E. coli EC958 is a phylogenetic group B2 , CTX-M-15 positive , fluoroquinolone resistant , H30 E. coli ST131 strain [ 19 ] . 
+ The strain belongs to the pulse field gel electrophoresis defined United Kingdom ( UK ) epidemic strain A [ 20 ] , and the recently defined ST131 Clade C2/H30-Rx sublineage [ 8 , 9 ] . 
+ E. coli EC958 contains multiple genes associated with the virulence of extra-intestinal E. coli , including type 1 fimbriae which are required for adherence to and invasion of human bladder cells , the formation of intracellular bacterial communities , and colonization of the mouse bladder [ 19 , 21 ] . 
+ In animal models , E. coli EC958 causes acute and chronic urinary tract infection ( UTI ) [ 21 ] and impairment of uterine contractility [ 22 ] . 
+ E. coli EC958 is also resistant to the bactericidal action of human serum , and the complement of genes that define this phenotype have been comprehensively defined [ 23 ] . 
+ E. coli EC958 contains a large IncF plasmid ( pEC958 -- HG941719 ) containing multiple antibiotic resistance genes . 
+ Here we describe the full annotation of pEC958 , and demonstrate that genes encoded on pEC958 are common among other Clade C2/H30-Rx ST131 strains . 
+ Plasmid pEC958 contains two replicons , and we show that both replicons contribute to its maintenance in E. coli EC958 . 
+ Materials and Methods
+ Bacterial strains and growth conditions
+ E. coli EC958 is a UTI strain originally isolated in the UK in 2005 [ 19 ] . 
+ E. coli TOP10 has been described previously [ 24 ] . 
+ E. coli strains were stored in 15 % glycerol at -80 °C and routinely cultured at 37 °C on solid or in liquid Lysogeny Broth ( LB ) medium . 
+ Antimicrobial susceptibility testing
+ The minimal inhibitory concentrations ( MICs ) were determined by Etest ( bioMérieux Australia ) on Mueller-Hinton agar at 37 °C . 
+ The procedure and interpretation of MIC were performed as recommended by the manufacturer using CLSI breakpoints [ 25 ] . 
+ Molecular methods
+ Plasmid DNA purification was performed using the PureLink HiPure Plasmid Filter Midiprep Kit ( Life Technologies ) . 
+ E. coli TOP10 electro-competent cells were prepared as previously described [ 23 ] and pEC958 plasmid DNA was transformed into TOP10 in a 2 mm cuvette using a BioRad GenePulser set to 2.5 kV , 25 μF and 200 Ω . 
+ Cells were resuspended in 1 mL SOC medium and incubated at 37 °C for 2 hours , then selected on LB agar plates supplemented with ampicillin 100 μg / mL . 
+ The FAB formula for IncF plasmids ( IncF RST scheme [ 26 ] ) was identified in silico using the online pMLST tool ( http://cge.cbs.dtu.dk/services/pMLST/ ) [ 27 ] . 
+ The pEC958 information was uploaded to the pMLST database ( http://pubmlst.org/ ) . 
+ Annotation of pEC958
+ The sequence of plasmid pEC958 ( emb | HG941719 ) [ 18 ] was manually curated in Artemis [ 28 ] using BLAST and literature searches . 
+ Antibiotic resistance genes were named in accordance with ResFinder 1.4 [ 29 ] and confirmed manually by BLAST and literature searches . 
+ TraDIS analyses
+ The TraDIS sequence data used in this work was generated from a previously published study that examined essential genes in EC958 ( BioProject number PRJNA189704 ; input A and B samples ) [ 23 ] . 
+ The short reads were mapped to the pEC958 sequence using Maq version 0.7.1 [ 30 ] . 
+ Counts of insertion per gene and insertion index were calculated as previously described [ 23 ] . 
+ Phylogenetic tree building
+ The maximum-likelihood phylogenetic tree of EC958_A0140 homologs was built using the PhyML v3.0 online tool [ 31 ] . 
+ The tree used the WAG model for amino acid substitution and branch supports were calculated using approximate likelihood-ratio test ( aLRT ) [ 32 ] . 
+ Visualization
+ The read counts and insertion sites from TraDIS were visualized using Artemis version 15.0 [ 28 ] . 
+ The circular genome diagram was generated by DNAplotter [ 33 ] and linear genetic diagrams were constructed using Easyfig version 2.1 [ 34 ] . 
+ Circos [ 35 ] and Circoletto [ 36 ] were used to generate the sequence comparison figure . 
+ Sequence comparisons of pEC958 against ST131 strains were generated using BLAST Ring Image Generator ( BRIG ) [ 37 ] . 
+ Results
+ Characteristics of plasmid pEC958
+ The plasmid pEC958 is a 135,600 bp circular DNA molecule containing 142 coding sequences ( CDSs ) and 10 pseudogenes ( Fig 1 ) . 
+ The most closely related plasmid to pEC958 is pEK499 ( 99 % identity covering 85 % of pEC958 ; pEK499 lacks the second transfer region present in pEC958 , which accounts for the remaining 15 % of pEC958 ) ( Fig 2 ) . 
+ In silico replicon sequence typing identified pEC958 as a hybrid plasmid containing both IncFII and IncFIA replicons ( FAB formula F2 : A1 : B - ) . 
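The FAB formula quoted above packs one allele number per replicon into a short string, with F, A and B standing for the FII, FIA and FIB replicons and a dash marking a replicon that was not detected. A minimal, hypothetical parser makes the encoding explicit. It handles only the F/A/B forms that appear in this text and is not part of the pMLST tool.

```python
def parse_fab(formula):
    """Parse a FAB formula such as 'F2:A1:B-' into replicon allele numbers.

    A '-' allele means that replicon was not detected (returned as None).
    """
    names = {"F": "FII", "A": "FIA", "B": "FIB"}
    result = {}
    # Tolerate the tokenized spelling 'F2 : A1 : B -' used in this text.
    for part in formula.replace(" ", "").split(":"):
        letter, allele = part[0], part[1:]
        if letter not in names:
            raise ValueError(f"unexpected replicon letter: {letter!r}")
        result[names[letter]] = None if allele == "-" else int(allele)
    return result


pec958 = parse_fab("F2 : A1 : B -")
print(pec958)
```

For pEC958 this yields allele 2 for FII, allele 1 for FIA and no FIB replicon, matching the description of a hybrid IncFII/IncFIA plasmid.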
+ The RepFIA replicon
+ The 6,509 bp RepFIA replicon in pEC958 is 99 % identical to the corresponding region on the F-plasmid ( nt 45922 to 52516 , accession no. NC_002483.1 ) and 100 % identical to two other plasmids isolated from E. coli ST131 strains , pEK499 ( NC_013122.1 [ 10 ] ) and pJJ1886_5 ( NC_022651.1 [ 13 ] ) ( Fig 2 ) . 
+ As observed in many other RepFIA sequences , this region does not contain the repC gene ( replication initiation ) found on the F-plasmid . 
+ The first region of RepFIA in pEC958 contains two rfsF sites ( the target sequences of the site-specific resolvase ResD [ 38 ] ) , followed by the oriV-1 origin of replication , ccdAB genes ( post-segregational killing ) , and resD ( multimer resolution ) . 
+ The second region of RepFIA in pEC958 contains the replication repE gene ( RepFIA ) with its upstream sequences ssiA ( single strand initiation ) and oriV-2 ( including the DnaA boxes , A/T rich region and four iterons ) , and the downstream incC iterons ( incompatibility and copy-number control ) . 
+ The third region of RepFIA in pEC958 contains the sopAB partition genes and their target centromere-like sopC sequence . 
+ This is the only partition system found on pEC958 . 
+ Although this RepFIA replicon contains two origins of replication ( oriV-1 and oriV-2 ) , replication is predicted to start unidirectionally from oriV-2 because the bidirectional replication from oriV-1 is known to require the missing RepC protein [ 39 , 40 ] . 
+ The RepFII replicon
+ The second replicon in pEC958 , RepFII ( 4,068 bp ) , is 99 % identical to the IncFII replicon in the Shigella flexneri 2b plasmid R100 ( accession no. NC_002134.1 , [ 41 ] ) and 100 % identical to the RepFII replicon in the E. coli ST131 plasmid pEK499 ( Fig 1 ) . 
+ This replicon encodes the essential gene for its replication , repA1 , which is regulated by the negative regulator RepA2 ( CopB ) , the non-coding RNA copA and the regulatory leading peptide RepA6 [ 42 -- 44 ] . 
+ The pEC958 RepFII origin of replication ( ori ) is located between repA1 and repA4 , consistent with previous descriptions for the initiation of DNA replication from this replicon [ 42 , 45 -- 47 ] . 
+ The repA4 region is important for plasmid stability and contains the ter sites for replication termination [ 48 ] . 
+ The pEC958 RepFII replicon contains the tir gene ( transfer inhibition protein [ 49 ] ) and the type II toxin-antitoxin system pemI/pemK [ 50 , 51 ] downstream of repA4 . 
+ The transfer region of pEC958 is not functional
+ The transfer ( tra ) region of pEC958 is disrupted by a composite mobile element flanked by IS26a and IS26b , carrying blaTEM-1 gene ( Fig 1 ) . 
+ The first half of this tra region is 100 % identical to the corresponding region on pEK499 ( F2 : A1 : B - ) , and 99 % identical to the corresponding region of several other IncF plasmids including pJJ1886_5 ( F2 : A1 : B - ) , pEC_L46 ( F2 : A1 : B - ) , pEC_L8 ( F2 : A1 : B - ) , pEFC36a ( F2 : A - : B - ) and pChi7122-2 ( F11 : A - : B - ) . 
+ In contrast , the second half of the pEC958 tra region is 100 % identical to pC15-1a ( F2 : A - : B - ) , R100 ( F2 : A - : B - ) , pHN3A11 ( F2 : A - : B - ) , pFOS-HK151325 ( F2 : A - : B - ) , pXZ ( F2 : A - : B - ) , pHK23a ( F2 : A - : B - ) , pHK01 ( F2 : A - : B - ) and pEG356 ( F2 : A - : B - ) . 
+ However , the pEC958 conjugation system is missing three genes , namely trbI , traW and traU . 
+ TrbI is an inner membrane protein that affects pilus retraction [ 52 ] ; TraW is required for F-pilus assembly [ 52 ] ; and mutations in traU significantly reduce plasmid transfer proficiency [ 53 ] . 
+ Despite repeated attempts , we were unable to demonstrate conjugative transfer of pEC958 to recipient strains , supporting the bioinformatic prediction that its conjugation system is non-functional ( data not shown ) . 
+ Toxin-antitoxin systems
+ The pEC958 plasmid encodes four toxin-antitoxin ( TA ) systems : the hok/sok system , the ccdAB system encoded within RepFIA , the pemIK system encoded within RepFII and the vagDC system . 
+ The hok/sok locus encodes a type I TA system including a `` host killing '' ( hok ) transmembrane protein that damages the cell membrane , a `` modulation of killing '' ( mok ) and a `` suppression of killing '' ( sok ) antisense RNA that inhibits translation of mok [ 54 ] . 
+ Both ccdAB and pemIK are type II TA systems , in which the toxin protein is inactivated by direct interaction with the antitoxin protein . 
+ The ccdB gene encodes a gyrase inhibitor toxin [ 55 ] that kills the cell in the absence of the CcdA antitoxin , which is unstable and degraded by the Lon protease [ 56 ] . 
+ PemK is a sequence-specific endoribonuclease that cleaves mRNAs to inhibit protein synthesis [ 50 ] whereas PemI blocks the endoribonuclease activity and is also subjected to Lon proteolysis [ 57 ] . 
+ There are two identical copies of the vagDC genes in pEC958 . 
+ Sequence analysis of VagD revealed a PIN_VapC-FitB ( cd09881 ) domain found in toxins of many bacterial TA systems . 
+ VagC contains an antitoxin-MazE ( pfam04014 ) domain . 
+ The vagDC genes have been shown to be involved in plasmid stability in Salmonella Dublin , where VagD inhibits cell division and VagC modulates the activity of VagD [ 58 ] . 
+ Mobile genetic elements and antibiotic resistance genes
+ The majority of mobile genetic elements and antibiotic resistance genes in pEC958 cluster in two regions : an 8-kb region in the middle of the tra region , and a 41-kb region located immediately downstream of the RepFII replicon ( Fig 1 ) . 
+ Plasmid pEC958 contains eight IS26 elements ( named IS26a-IS26h ) , two IS1 elements , one ISEc23 element and one group II intron ( E.c.I11 , found outside of the two regions ) ( Fig 3 ) . 
+ IS26a and IS26b are located at the two ends of the 8-kb region , flanking ISEcp1 , a remnant of Tn3 , which includes the blaTEM-1 gene , and a partial sequence of Tn21 . 
+ The beginning of the 41-kb region contains a partial sequence of Tn5403 followed by IS26c . 
+ The region between IS26c and IS26d contains a cluster of 6 genes ( EC958_A0096 to EC958_A0101 ) predicted to encode a series of ABC transporters and an iron permease . 
+ Downstream of IS26d is a class I integron In54 [ 59 ] with gene cassettes consisting of dfrA17 , aadA5 and sulI , encoding trimethoprim , streptomycin and sulfonamide resistance , respectively . 
+ The mphR-mrx-mph ( A ) operon encoding resistance to macrolides is located between IS6100 and IS26e . 
+ Immediately after IS26e is the blaCTX-M-15 gene encoding cefotaxime resistance . 
+ Located between IS26f and IS26g are catB4Δ ( non-functional ; disrupted by IS26f ) , blaOXA-1 ( beta-lactam resistance ) and aac ( 6 ' ) - Ib-cr ( fluoroquinolone and aminoglycoside resistance ) . 
+ After IS26g lies Tn1721 , which harbours tetR and tet ( A ) , encoding resistance to tetracycline . 
+ The end of the 41-kb region contains a partial sequence of Tn5403 and IS26h . 
+ Functional characterization of antibiotic resistance genes on pEC958 
+ To investigate the antibiotic resistance phenotypes conferred by plasmid pEC958 , we transformed the plasmid into E. coli TOP10 . 
+ Table 1 shows the resistance profile of wild-type EC958 ( which contains pEC958 ) compared to TOP10 ( pEC958 ) . 
+ EC958 is resistant to 11 of the 18 antibiotics tested , five of which are fully transferable via pEC958 . 
+ EC958 is resistant to the cephamycin cefoxitin and the three third-generation cephalosporins tested ( cefotaxime , ceftazidime and cefpodoxime ) . 
+ These phenotypes , however , were not fully transferred to TOP10 by pEC958 . 
+ TOP10 ( pEC958 ) had elevated MICs to cefoxitin , cefotaxime , ceftazidime and cefpodoxime ( MIC of 6 , 1.5 , 1.5 and 8.0 μg / mL , respectively ) compared to the background strain TOP10 ( MIC of 4 , 0.047 , 0.38 and 0.25 μg / mL , respectively ) , but these MICs were still 6 -- 10 fold lower than those of the EC958 wild-type strain . 
+ This suggests that blaCTX-M-15 on pEC958 plasmid does not mediate the full resistance against third-generation cephalosporins . 
+ This is consistent with previous reports of lower resistance to cephalosporins in strains where the blaCTX-M-15 is separated by IS26 from its promoter within the ISEcp1 element [ 60 -- 63 ] . 
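The size of the partial resistance transferred by pEC958 is easier to judge as a fold change per antibiotic. The sketch below uses only the MIC values quoted above; the variable names are illustrative.

```python
antibiotics = ["cefoxitin", "cefotaxime", "ceftazidime", "cefpodoxime"]

# MICs in μg/mL, as quoted in the text.
mic_top10_pec958 = [6.0, 1.5, 1.5, 8.0]   # TOP10 carrying pEC958
mic_top10 = [4.0, 0.047, 0.38, 0.25]      # TOP10 background strain

# Fold increase in MIC attributable to carrying the plasmid.
fold_increase = {
    name: with_plasmid / without_plasmid
    for name, with_plasmid, without_plasmid in zip(
        antibiotics, mic_top10_pec958, mic_top10
    )
}

for name, fold in fold_increase.items():
    print(f"{name}: {fold:.1f}-fold higher MIC with pEC958")
```

The plasmid raises the cefotaxime and cefpodoxime MICs by roughly 30-fold over the TOP10 background, yet the text notes that these values remain well below those of wild-type EC958.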
+ The other resistance phenotypes not transferred were for quinolones and fluoroquinolones . 
+ Chromosomal mutations in gyrA ( S83L , D87N , A828S ) and parC ( S80I , E84V , A192V , A471G , D475E , Q481H ) genes are likely to mediate these phenotypes , even though the plasmid carries aac ( 6 ' ) - Ib-cr [ 64 -- 66 ] . 
+ Genes required for the stable maintenance of pEC958
+ In order to gain insights into molecular mechanisms of plasmid stability , we analyzed the TraDIS data from a saturated transposon mutant library of EC958 [ 23 ] against the complete sequence of pEC958 to identify genes required for plasmid stability . 
+ We used a total of 12 million transposon-tagged reads , of which 901,588 reads ( 7.4 % ) were mapped to plasmid pEC958 , identifying 27,317 unique insertion sites ( i.e. one insertion site every 4.96 bp ) . 
+ To devise a biological threshold for the identification of genes required for the stable maintenance of pEC958 , the insertion index ( number of mapped reads normalized by gene length ) of each plasmid gene was calculated and compared with the sopAB genes , which are known to be essential for plasmid partitioning ( Fig 4 ) . 
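The insertion density quoted above follows directly from the plasmid length and the number of unique insertion sites, and the insertion index is described in the text as mapped reads normalized by gene length. The sketch below reproduces the density and shows the index for a hypothetical gene. The per-gene read count and gene length in the example are made up for illustration.

```python
PLASMID_LENGTH_BP = 135_600       # pEC958 length, from the text
UNIQUE_INSERTION_SITES = 27_317   # unique insertion sites on pEC958, from the text


def insertion_index(mapped_reads, gene_length_bp):
    """Mapped reads normalized by gene length, as described in the text."""
    return mapped_reads / gene_length_bp


# Average spacing between unique insertion sites on the plasmid.
bp_per_insertion = PLASMID_LENGTH_BP / UNIQUE_INSERTION_SITES
print(f"one insertion site every {bp_per_insertion:.2f} bp")

# Hypothetical gene: 500 mapped reads over a 1,000 bp coding sequence.
example_index = insertion_index(500, 1000)
print(f"example insertion index = {example_index}")
```

Genes whose insertion index is as low as that of sopAB are the candidates for being required for stable plasmid maintenance.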
+ A total of 9 genetic elements were identified to be required for the stable maintenance of pEC958 . 
+ They are the ccdA , sopA and sopB genes in RepFIA ; the copA , repA6 , repA1 , repA4 genes and the oriV region in RepFII ; and the hypothetical gene EC958_A0140 . 
+ Our results indicate that replication of pEC958 is initiated at the oriV of RepFII and requires at least the copA , repA6 , repA1 , repA4 genes . 
+ While RepFIA is not essential for replication , it is required for partitioning ( sopAB ) of pEC958 into daughter cells . 
+ Our data also demonstrated that the ccdAB TA system located within RepFIA is functional . 
+ EC958_A0140 represents a novel gene associated with plasmid maintenance . 
+ We screened the NCBI complete plasmid sequence database and identified 17 other plasmids that also contain this gene ( Fig 5 ) . 
+ All of these plasmids were IncF type except for pECL_A ( non-typable ) , and several were also isolated from E. coli ST131 strains ( pJJ1886_5 , pEK499 , pEC_L8 and pEC_L46 ) . 
+ Bioinformatic analysis of EC958_A0140 did not yield any clues regarding its function , and thus further work is required to confirm its role in plasmid stability . 
+ The prevalence of pEC958-like plasmid sequences was assessed in a previously described global collection of 97 E. coli ST131 strains [ 9 ] . 
+ Fig 6 shows the overview of plasmid sequences from 97 ST131 strains plus four complete ST131 plasmids available on GenBank in comparison with the pEC958 sequence . 
+ There are 20 strains and 2 database plasmids ( pEK499 and pJJ1886_5 ) that contain more than 70 % of pEC958 gene content , all of which belong to the clade C sublineage C2 ( 40 % ) ( Fig 6 and S1 Table ) . 
+ Twelve out of these 20 strains ( plus pEK499 ) also harbor all 9 pEC958 essential genes identified above . 
+ In silico replicon sequence typing of IncF plasmids was also performed on the 97 strains . 
+ Table 2 shows the 8 most common FAB types found within this collection . 
+ The FAB formula of pEC958 , F2 : A1 : B - , is also the most common replicon type , accounting for 20.6 % of all 97 E. coli ST131 strains , or 27.8 % of clade C strains , all of which also belong to subclade C2 . 
+ The second most common type is F1 : A2 : B20 , of which 17 are in subclade C1 and 1 is in clade A . 
+ In terms of individual replicons , FIB is present in 100 % of clade A and B strains , while FII is most common in clade C ( 87.5 % ; S1 Table ) . 
+ Based on our sequence analysis , 3/97 strains do not harbor an IncF plasmid . 
+ Discussion
+ Our study presents a full annotation of pEC958 , a multi-drug resistance plasmid in the well-characterized E. coli ST131 strain EC958 [ 18 , 19 , 23 ] . 
+ In addition , we identified genes required for the maintenance and stability of pEC958 . 
+ Although IncF plasmids are extremely successful in the E. coli ST131 clonal lineage [ 67 ] , this is the first study to examine the biology of an IncF plasmid in its native host using TraDIS [ 68 ] . 
+ The replication and stability of IncF plasmids ( F plasmid , R1 , and R100 ) have been well documented [ 39 , 47 , 69 , 70 ] . 
+ Here we provide insights into the interplay between two replicons in order to achieve stable maintenance of the circular plasmid DNA on which they co-exist . 
+ The data analysis in this study used a straight cut-off based on the insertion index of the sopAB genes , which encode the partitioning system of pEC958 . 
+ Mutation of sopAB is known to cause destabilization of IncF plasmids and thus they represent characterised genes involved in plasmid stability [ 71 , 72 ] . 
+ This deviation from the model-based approach , in which the cut-off is defined as the intercept of two distribution models representing essential and non-essential genes [ 23 ] , was chosen for two reasons : ( i ) the number of genes on the plasmid is insufficient to build two distribution models ; and ( ii ) the cut-off previously defined using chromosomal data is not applicable because of the higher insertion frequency on the plasmid ( i.e. one insertion every 4.96 bp compared to every 9.92 bp on the chromosome ) . 
+ In the case of the well-characterised IncF system , use of a straight cut-off assumed that any gene with an insertion index lower than the sopAB genes would have a similar or stronger effect on plasmid stability . 
+ The stable maintenance of large plasmids such as pEC958 is achieved by the contribution of multiple factors , including systems involved in replication , partitioning and toxin-antitoxin production . 
+ Using the strategy outlined , we aimed to identify genes that when mutated caused destabilization of plasmid pEC958 -- thus they must play a role in plasmid stability . 
+ Our results showed that RepFII , particularly the copA , repA1 , repA4 genes and oriV region , is required for the replication of pEC958 . 
+ This is consistent with previous studies on the function of RepFII in the IncFII plasmid R100 [ 41 ] . 
+ In contrast to R100 , the RepFII region on pEC958 does not contain its own intrinsic partition system ( stb locus on R100 [ 73 , 74 ] ) . 
+ Furthermore , we could not identify any region that resembles a partition site ( centromere-like ) elsewhere on pEC958 other than within the RepFIA region . 
+ Thus , it is reasonable to assume that the sopAB genes in the RepFIA region [ 75 , 76 ] represent the only active partition system on pEC958 . 
+ Indeed , our transposon mutagenesis revealed a very low insertion index for both sopA and sopB , confirming the requirement of these two genes for pEC958 partitioning and allowing us to use these genes as a reference threshold to identify biologically significant genes required for plasmid maintenance . 
+ Using TraDIS , we were able to demonstrate that none of the known replication genes in RepFIA are required for pEC958 replication . 
+ This included the oriV-1 of RepFIA , which was not expected to be functional due to the absence of the repC gene [ 40 ] . 
+ The oriV-2 and its associated genes in RepFIA appear to be intact yet dispensable in pEC958 . 
+ Similar behavior has been reported in the dual-replicon plasmid pCG86 , which contains an active RepFII replicon and an inactive ( but intact ) RepFIB replicon [ 77 ] . 
+ This is consistent with a previously proposed model for plasmid speciation , in which the existence of co-integrate plasmids ( such as pEC958 ) allows one replicon to be relaxed and free to accumulate mutations whilst the other replicon is constrained by evolutionary pressure to maintain its replication function [ 78 ] . 
+ The RepFIA also carries one toxin-antitoxin system ccdAB in which the antitoxin CcdA is protected from transposon mutagenesis , indicating that the system is active in pEC958 . 
+ There are three other TA systems in pEC958 , none of which were required for plasmid stability under the conditions tested in this study . 
+ Others have suggested that TA systems are more than just plasmid maintenance systems ; they can also function as a stress-response system [ 79 , 80 ] , as a programmed cell-death network [ 81 ] , or as a reversible bacteriostasis system ( i.e. induction of dormancy or persistence ) [ 82 , 83 ] . 
+ It is conceivable that the redundancy of TA systems on pEC958 is linked to other functions that provide a fitness advantage to its host . 
+ Plasmids of several different incompatibility types have been identified in E. coli ST131 , including IncF , IncI1 , IncN , IncA/C and pir-type [ 2 ] . 
+ Our data demonstrate that IncF plasmids are the most common plasmid type in E. coli ST131 , in accordance with previous studies [ 2 , 4 ] . 
+ To investigate the prevalence of pEC958 sequences in our strain collection , we used genome sequence data to evaluate the prevalence of pEC958 genes and to perform in silico IncF replicon sequence typing . 
+ We identified 20 strains ( including EC958 ) that contained more than 70 % of the genes identified on pEC958 , suggesting that many ST131 strains carry very similar plasmids . 
+ We also identified 20 strains that possess the F2 : A1 : B - plasmid replicon formula , 17 of which contain > 70 % of pEC958 genes . 
+ Taken together , our data demonstrate that pEC958 belongs to the most common group of IncF plasmids found in E. coli ST131 . 
+ The overall success of IncFII plasmids extends beyond the carriage of blaCTX-M-15 in E. coli ST131 . 
+ IncFII plasmids that have acquired the blaNDM-1 gene ( thus conferring carbapenem resistance ) have been described in the ST131 lineage [ 11 , 84 ] , but strain EC958 was isolated prior to the discovery of NDM determinants and we did not find any NDM determinants in the 97 ST131 strain collection . 
+ The IncFIIk plasmid , a replicon type originally found in Klebsiella [ 26 ] , has also been found in KPC-producing ST131 strains in the USA and China [ 85 , 86 ] . 
+ The evolution and continual gain of new antimicrobial resistance determinants in IncFII plasmids represents a major challenge for our understanding of plasmid biology and the spread of antibiotic resistance genes . 
+ Here , we provide new insight into plasmid replication by presenting direct evidence that the RepFIA and RepFII replicons in pEC958 cooperate to ensure their stable inheritance . 
+ The combination of replication from RepFII and partition from RepFIA may represent a co-evolutionary adaptation for this common plasmid type . 
+ Supporting Information
+ We thank Majed Alghoribi for technical assistance.
+ Author Contributions
+ Conceived and designed the experiments : MDP MU SAB MAS . 
+ Performed the experiments : MDP KMP SS SH MU . 
+ Analyzed the data : MDP BMF MSC NLBZ MU . 
+ Contributed reagents / materials/analysis tools : MSC MU SAB . 
+ Wrote the paper : MDP MU SAB MAS .
\ No newline at end of file
--- a/data/TEXT_FILES/notuseful_txt/26070154.txt 0 → 100644
View file @27818a9
+++ b/data/TEXT_FILES/notuseful_txt/26070154.txt 0 → 100644
View file @27818a9
+ The B. subtilis Accessory Helicase PcrA
+ Abstract 
+ In bacteria the concurrence of DNA replication and transcription leads to potentially deleterious encounters between the two machineries , which can occur in either the head-on ( lagging strand genes ) or co-directional ( leading strand genes ) orientations . 
+ These conflicts lead to replication fork stalling and can destabilize the genome . 
+ Both eukaryotic and prokaryotic cells possess resolution factors that reduce the severity of these encounters . 
+ Though Escherichia coli accessory helicases have been implicated in the mitigation of head-on conflicts , direct evidence of these proteins mitigating co-directional conflicts is lacking . 
+ Furthermore , the endogenous chromosomal regions where these helicases act , and the mechanism of recruitment , have not been identified . 
+ We show that the essential Bacillus subtilis accessory helicase PcrA aids replication progression through protein coding genes of both head-on and co-directional orientations , as well as rRNA and tRNA genes . 
+ ChIP-Seq experiments show that co-directional conflicts at highly transcribed rRNA , tRNA , and head-on protein coding genes are major targets of PcrA activity on the chromosome . 
+ Partial depletion of PcrA renders cells extremely sensitive to head-on conflicts , linking the essential function of PcrA to conflict resolution . 
+ Furthermore , ablating PcrA 's ATPase/helicase activity simultaneously increases its association with conflict regions , while incapacitating its ability to mitigate conflicts , and leads to cell death . 
+ In contrast , disruption of PcrA 's C-terminal RNA polymerase interaction domain does not impact its ability to mitigate conflicts between replication and transcription , its association with conflict regions , or cell survival . 
+ Altogether , this work establishes PcrA as an essential factor involved in mitigating transcription-replication conflicts and identifies chromosomal regions where it routinely acts . 
+ As both conflicts and accessory helicases are found in all domains of life , these results are broadly relevant . 
+ Introduction
+ Transcription is a major impediment to DNA replication . 
+ Head-on conflicts arise when a gene is encoded on the lagging strand , prompting transcription in the direction opposite to the movement of the replisome . 
+ Conversely , transcription of genes encoded on the leading strand causes co-directional conflicts which occur between replication and transcription complexes moving in the same direction . 
+ The significantly faster rate of replication , relative to transcription , leads to the meeting of the two complexes co-directionally . 
+ Though the deleterious effects of head-on replication-transcription conflicts have been appreciated for some time , the impact of the less severe , but more common , co-directional conflicts has only recently been established [ 1 -- 6 ] . 
+ The majority of genes in bacterial genomes are co-oriented with replication [ 7 -- 11 ] . 
+ This genome co-orientation bias is thought to be a strategy to avoid the more deleterious head-on replication-transcription conflicts . 
+ Although the co-orientation bias of bacterial genomes reduces the prevalence of head-on conflicts , by definition , it increases the prevalence of co-directional genes . 
+ Importantly , most highly transcribed and essential genes , including rRNA and tRNA operons , are co-oriented with replication [ 7 -- 9,11 -- 15 ] . 
+ Previously identified consequences of co-directional conflicts at rRNA genes include replication stalling and restart in Bacillus subtilis [ 5 ] . 
+ In Escherichia coli , co-directional conflicts caused by permanently arrested RNA polymerases have been shown to cause double-strand breaks [ 6 ] . 
+ Cells possess mechanisms that promote replication progression through conflict regions [ 16 ] . 
+ One strategy is the use of accessory helicases such as E. coli UvrD , Rep , and DinG [ 17 ] . 
+ Previous work has shed light on the beneficial effects of accessory helicases on replisome progression . 
+ There is strong evidence showing that in E. coli these proteins can promote replisome progression through artificially inverted ( head-on ) rRNA genes and that their combined activities contribute to cell survival under these conditions [ 17 ] . 
+ Additional work showed that mutations in RNA polymerase that impact the expression of rRNA genes ( rpoB mutations ) can rescue the viability of strains lacking Rep and UvrD , implying that accessory helicases may also function at endogenous co-directionally oriented rRNA genes [ 18 ] . 
+ However , direct evidence that accessory helicases resolve co-directional conflicts is lacking . 
+ The bulk of previous reports have focused on the rRNA genes , except for one instance where inversion of a large region of the chromosome containing several protein-coding genes of both head-on and co-directional orientations also led to growth defects in a rep uvrD double mutant [ 17 ] . 
+ Though these data implied that protein-coding genes may also produce physiologically relevant replication-transcription conflicts , they did not dissect the contribution , if any , that co-directional conflicts played in the associated growth defect . 
+ Because co-directional genes represent at least 50 % of most bacterial genomes , and are known to cause replisome stalling in B. subtilis , the additive impact of many co-directional conflicts , especially at rRNA or highly expressed protein-coding genes , may be quite significant . 
+ Therefore , clarifying the relative impact of accessory helicases in otherwise identical head-on versus co-directional conflicts should yield insight into their impact on replication . 
+ Moreover , identifying the regions where accessory helicases act within the genome should clarify their predominant function ( s ) in the cell . 
+ Homologues of the E. coli accessory helicases UvrD and Rep exist in all domains of life . 
+ B. subtilis , which diverged from E. coli more than 1 billion years ago [ 19 ] , harbors one such homologue , PcrA [ 20,21 ] . 
+ Whereas ΔuvrD and Δrep are only lethal in E. coli in combination , deletion of pcrA alone is lethal in B. subtilis . 
+ Currently , the reason PcrA is essential remains unclear ; however , the lethality of both ΔuvrD Δrep and ΔpcrA strains can be rescued by inactivation of the RecFOR pathway . 
+ This complex facilitates the loading of RecA onto single stranded DNA gaps [ 22,23 ] . 
+ PcrA and UvrD can remove RecA from DNA [ 24 ] and PcrA depletion strains are hyper-recombinogenic [ 21 ] . 
+ These findings suggest that the essential nature of these helicases is related to excessive RecA activity . 
+ It is unclear whether the RecA removal activity of UvrD is important in the context of conflicts . 
+ Additionally , it is unclear whether the conflict resolution activity of UvrD is conserved in PcrA . 
+ Several studies have shown that both E. coli UvrD and B. subtilis PcrA interact with RNA polymerase [ 25,26 ] . 
+ However , the physiological significance of these interactions remains unclear , except for a role recently identified for UvrD in transcription-coupled repair [ 27 ] . 
+ Furthermore , although there is an abundance of in vitro studies on UvrD and PcrA 's helicase and ATPase activities , which are coupled , the physiological relevance of these functions in vivo is poorly understood [ 24,28,29 ] . 
+ The physiological significance of PcrA 's RNA polymerase interaction or helicase/ATPase activities is unclear . 
+ Additionally , whether these features of accessory helicases are important in conflict resolution is unknown . 
+ Here we show that PcrA associates with both head-on and co-directional genes and reduces transcription-dependent replisome stalling at these regions . 
+ Using chromatin immunoprecipitations ( ChIPs ) of the replicative helicase DnaC and 2D gel analyses , we were able to detect increased replisome stalling at a single conflict in both the head-on and co-directional orientations when PcrA is depleted . 
+ Accordingly , partial depletion of PcrA , which is normally sub-lethal , causes a severe survival defect when a single head-on gene is highly transcribed . 
+ Using ChIP-Seq of DnaC and PcrA we identified chromosomal regions where PcrA predominantly associates and impacts replisome stalling . 
+ These regions include the heavily transcribed rRNA , tRNA , and other co-directionally and head-on oriented protein-coding genes . 
+ Additionally , we found that the helicase/ATPase activity of PcrA , but not its C-terminal RNA polymerase interaction domain , is required for survival in general . 
+ Furthermore , although its recruitment is not ablated , a helicase/ATPase mutant of PcrA can not mitigate conflicts and shows dominant negative effects on replisome stalling at specific transcription units . 
+ Altogether , these results identify PcrA as an essential conflict mitigation factor , provide direct evidence for its activity in replication progression through co-directional genes , map the endogenous regions of the chromosome where PcrA routinely resolves conflicts , and establish a correlation between PcrA 's essential function and its helicase/ATPase activity in resolution of conflicts . 
+ Results
+ Construction and characterization of a conditional PcrA degradation mutant
+ To investigate whether PcrA might mitigate replication-transcription conflicts in B. subtilis , we generated a conditional mutant by developing a PcrA degron strain as previously described [ 30 ] . 
+ In this strain , the C-terminal end of the endogenous pcrA gene is translationally fused to an ssrA degradation tag . 
+ At a second locus , we integrated an IPTG-inducible gene encoding the SspB adaptor protein . 
+ PcrA is depleted when IPTG is added to the culture , inducing the expression of SspB , which then binds the ssrA tag and delivers PcrA to the ClpXP protease . 
+ After treatment of cells with 100 μM IPTG for 15 minutes we observe a 60 -- 90 % depletion of PcrA ( Fig 1A ) . 
+ Under these conditions cell survival is completely ablated ( Fig 1B ) , indicating that our conditional depletion system functions as expected . 
+ To further validate that our PcrA depletion mimics a complete pcrA knockout , we also tested the ability of recF deletion to rescue viability defects of PcrA depleted cells . 
+ We found that PcrA depletion in the absence of recF no longer causes viability defects , consistent with previous studies ( Fig 1B , and [ 21 ] ) . 
+ Depletion of PcrA leads to increased replisome stalling at transcription units of both orientations 
+ If PcrA mitigates conflicts , then replisome progression should be hindered in the absence of PcrA . 
+ We previously showed that ChIP of the replicative helicase DnaC is sensitive enough to identify replisome stalling and restart at both head-on and co-directional genes [ 5 ] . 
+ Because the majority of the genes within the genome are co-directional , and we are most interested in the physiologically relevant and naturally occurring conflicts , we again chose to use this technique for our studies . 
+ To test the potential role of PcrA in conflict mitigation , we used DnaC ChIPs to measure replisome stalling at two inducible genes : hisC under the lower strength , IPTG-inducible Pspank ( hy ) promoter and lacZ under the strongly expressed Pxis promoter , which is constitutively active or repressed , depending upon the strain ( Fig 2A and 2B ) . 
+ These constructs were integrated into the chromosome in either the head-on ( HO ) or co-directional ( CD ) orientation relative to replication . 
+ To account for the possibility that local context might affect our experiments , we used two different integration loci : Pxis-lacZ was integrated at thrC ( left chromosomal arm ) and Pspank ( hy ) - hisC was integrated at amyE ( right chromosomal arm ) . 
+ Fig 2 . ... arrow ) and MLS resistance gene ( black triangle ) were integrated onto the chromosome either co-directionally ( CD ) or head-on ( HO ) to replication at the thrC locus ( upstream and downstream thrC fragments used for integration into the chromosome are shown in white ) . 
+ Pspank ( hy ) - hisC constructs share the same conceptual design , but have a spectinomycin resistance gene in place of the MLS gene and are integrated at amyE . 
+ B ) mRNA levels were determined by RT-qPCR using primers that bind in the middle of hisC or lacZ . 
+ Levels are shown relative to the control gene , yhaX . 
+ `` Trx '' refers to transcription before or after induction with IPTG ( Trx - and Trx + , respectively ) . 
+ C ) The relative association of DnaC with either CD or HO hisC was determined by ChIP-qPCR and plotted relative to its association with control region yhaX . 
+ D ) The relative association of DnaC with lacZ was determined as in 2C . 
+ Here lacZ is expressed/repressed in strains lacking/possessing repressor protein ImmR , respectively . 
+ `` R '' refers to inhibition of transcription by subsequent addition of rifampicin . 
+ `` Rep. '' refers to unperturbed or HPUra-inhibited replication . 
+ An additional condition is shown where DnaC association with lacZ was determined after replication was inhibited by 15 minutes of HPUra treatment ( last bar on the right in the lacZ `` HO '' panel ) . 
+ doi : 10.1371/journal.pgen.1005289.g002
+ These constructs each allow for the direct comparison of otherwise identical co-directional and head-on conflicts . 
+ The two loci , and distinct coding genes , also control for potential chromosomal lo-cation-dependent and gene sequence-dependent effects . 
+ Using ChIP-qPCR we measured the degree of DnaC association with the regions expressing either lacZ or hisC in the two orientations , when transcription is activated or repressed , and in the presence or absence of PcrA . 
+ We observed a transcription-dependent enrichment of DnaC with both the head-on and co-directional lacZ and hisC genes relative to a previously established control region , yhaX [ 5,31 -- 33 ] ( Fig 2C and 2D ) ; several other control loci around the chromosome give similar results to yhaX [ 34 ] . 
+ Normalization of ChIP data from a region of interest compared to a control locus ( in this case yhaX ) generally provides the most consistent results between experiments . 
+ However , we also analyzed non-normalized , raw IP/input values to rule out any potential artifacts of normalization in these experiments . 
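The two ways of expressing ChIP-qPCR signal mentioned above can be sketched as follows. This is only an illustration, assuming that the raw value is the IP/input ratio at a locus and that the normalized value is that ratio divided by the same ratio at the control locus (yhaX in the text); the function names are hypothetical.

```python
def ip_over_input(ip_signal, input_signal):
    """Raw (non-normalized) ChIP value for one locus: IP divided by input."""
    if input_signal <= 0:
        raise ValueError("input signal must be positive")
    return ip_signal / input_signal


def relative_enrichment(ip_target, input_target, ip_control, input_control):
    """IP/input at the region of interest relative to IP/input at a control locus.

    Assumption: enrichment is expressed as a simple ratio of ratios.
    """
    control_ratio = ip_over_input(ip_control, input_control)
    if control_ratio == 0:
        raise ValueError("control locus has no IP signal")
    return ip_over_input(ip_target, input_target) / control_ratio
```

Reporting both values, as the text does, helps separate a genuine change at the target from a change at the control locus.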
+ Induction of transcription leads to a significant increase in DnaC association with the head-on lacZ construct regardless of normalization to yhaX , though the absolute degree of enrichment varies between experiments ( S1A Fig ) . 
+ Consistent with the higher transcriptional level of lacZ , DnaC association at head-on lacZ was approximately 4-fold higher than at head-on hisC ( Fig 2B , 2C and 2D ) . 
+ To measure the impact of PcrA on replisome stalling we carried out DnaC ChIPs in strains harboring the PcrA degron system . 
+ After depletion of PcrA , DnaC association increased significantly relative to the conditions where PcrA was present at wild-type levels , with both of the head-on reporters ( Fig 2C and 2D ; see S1B Fig for non-normalized IP/Input values in the degron experiment ) . 
+ Again , the effect of PcrA depletion on replisome stalling was less severe at hisC relative to the lacZ gene , where DnaC association tripled , reaching 55-fold over the control region . 
+ PcrA depletion did not affect DnaC association with co-directional hisC ; however , it caused a small increase in DnaC association with co-directional lacZ , from 2.4 to 4.0 ( p < 0.01 ) . 
+ To determine if the increased stalling in the absence of PcrA is due to transcription , after we induced the conflict by de-repressing transcription , we treated cells with 300 μg / ml rifampicin for 3 minutes to inhibit transcription initiation . 
+ We observed that DnaC association both before and after PcrA depletion decreased to baseline after rifampicin treatment , indicating that the increased replisome stalling at these conflict regions , without PcrA , is due to replication-transcription conflicts ( Fig 2C and 2D ) . 
+ Previous reports have demonstrated that certain proteins may artificially stick to transcription units and produce ChIP artifacts [ 35,36 ] . 
+ To control for the possibility that DnaC could bind non-specifically during ChIPs , we shut off replication for 15 minutes through the addition of HPUra . 
+ HPUra is a specific nucleotide analogue that inhibits PolC by inserting into its active site [ 37 ] . 
+ Under these conditions DnaC association with lacZ drops by approximately 90 % , demonstrating that this signal is replication-dependent ( Fig 2D ) . 
+ In contrast , a 15 minute HPUra treatment does not reduce the association of RNA polymerase beta subunit ( RpoB ) with this reporter region , as determined with RpoB ChIPs ( S2 Fig ) . 
+ Therefore DnaC ChIP signal is unlikely to be an artifact of transcription . 
+ To further confirm the results of our ChIP assay we analyzed replication fork stalling at head-on oriented Pxis-lacZ using 2D gels ( Fig 3 ) . 
+ 2D gels allow for the direct visualization and analysis of replication intermediates [ 38 ] . 
+ Restriction digestion ( Fig 3A ) of replicating chromosomes releases branched fragments that generate Y-arcs on 2D gels ( Fig 3B diagram ) . 
+ The 2D gels with lacZ fail to reveal any replication intermediates in the absence of transcription ( Fig 3B , top panels ) -- with or without PcrA -- because replication through the transcriptionally silent region of the chromosome is extremely fast . 
+ However , after transcriptional induction an arc of replication intermediates forms , consistent with impaired replication fork movement approaching and entering the lacZ gene . 
+ The comparison between the + PcrA and -- PcrA gels reveals that the signal intensity is not uniform across the Y arcs , with areas of pausing indicated by locally darkened regions ( Fig 3B lower panels ) . 
+ Quantification of the Y-arc demonstrates that the increase in replication fork stalling at the 3 ' end of lacZ is roughly 1.7 ± 0.25 following PcrA depletion ( Fig 3C and 3D ) . 
+ These results indicate that replication is indeed slowed by the head-on transcription of lacZ , and that depletion of PcrA exacerbates this effect . 
+ We also quantified the apparent decrease in signal within the region of the larger Y-intermediates 5 ' of the initial stall site : signal in this region drops as low as ~ 0.64 ± 0.13 when PcrA is depleted ( PcrA - / PcrA + ) . 
+ We note that a digestion intermediate is present on the EagI/ApaLI gels , partly obscuring the lacZ region . 
+ These undigested DNAs produced a second Y-arc that is not obscured and showed the same trends displayed in 3B on the right hand side , and 3D ( S3 Fig ) . 
+ These data suggest that , absent PcrA , replication forks are highly compromised in their ability to proceed past the initial point of contact with head-on RNA polymerases . 
+ They also serve as confirmation of our ChIP experiments , which indicated that DnaC accumulates at the 3 ' end of head-on genes after PcrA depletion ( Fig 2 ) . 
+ PcrA associates with conflict regions
+ It is conceivable that the effects of PcrA depletion on replisome stalling are indirect . 
+ If PcrA acts directly at our reporters , it should be physically present there . 
+ To address this possibility , we tested the association of an N-terminally Myc-tagged PcrA with our reporters by ChIP-qPCR . 
+ After confirming that Myc-tagged PcrA complements the cell death phenotype observed in the degron ( S4 Fig ) , we performed ChIP in our reporter strains using a monoclonal antibody specific for the Myc peptide . 
+ We observed that PcrA associates with both hisC and lacZ reporters after transcription induction ( Fig 4A and 4B ) . 
+ Similar to DnaC , PcrA also associates more with both head-on reporters compared to their co-directional counterparts . 
+ To further confirm that PcrA association with the conflict region depends on transcription , we treated cells with rifampicin for 3 minutes . 
+ Because rifampicin inhibits transcription initiation , the RNA polymerase occupancy within the gene significantly decreases after treatment , thereby preventing the occurrence of replication-transcription conflicts . 
+ We found that addition of rifampicin reduced PcrA association with the hisC and lacZ loci to background levels . 
+ Fig 4 . ... PcrA followed by qPCR for hisC in the presence ( + IPTG ) or absence ( - IPTG ) of transcription ( Trx + , or Trx - , respectively ) and following transcriptional induction and subsequent inhibition by rifampicin ( Trx - `` R '' ) . 
+ B ) ChIP-PCR of PcrA was performed as in 4A , but for lacZ . 
+ * P < 0.05 and ** P < 0.01 . 
+ Depletion of PcrA leads to increased replisome stalling at endogenous chromosomal loci 
+ We set out to identify endogenous chromosomal regions where PcrA promotes replication progression . 
+ To globally examine all chromosomal regions , we conducted DnaC ChIP-Seq experiments in the PcrA degron strain ( HM448 ) . 
+ After deep sequencing both the total ( input ) DNA used for the IPs , and the DNA recovered from the DnaC ChIPs , we normalized the ChIP sequence reads to the input signal for both the + PcrA ( - IPTG ) and the -- PcrA ( + IPTG ) samples by subtracting input from IP signal . 
+ We then subtracted the normalized - IPTG signal from the normalized + IPTG signal , identifying the regions where DnaC association increases after PcrA depletion . 
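The two subtraction steps described above can be written out as a short sketch. It assumes per-position coverage values held in equal-length lists and takes the subtractions literally as stated in the text; the function and parameter names are hypothetical, and any read-depth scaling the authors may have applied is not represented.

```python
def input_subtracted(ip_coverage, input_coverage):
    """Normalize a ChIP track by subtracting the input signal position by position."""
    if len(ip_coverage) != len(input_coverage):
        raise ValueError("IP and input tracks must have the same length")
    return [ip - inp for ip, inp in zip(ip_coverage, input_coverage)]


def depletion_differential(ip_depleted, input_depleted, ip_present, input_present):
    """Differential track: normalized (+IPTG, PcrA depleted) minus normalized (-IPTG, PcrA present).

    Positive values mark positions where DnaC association increases after depletion.
    """
    depleted = input_subtracted(ip_depleted, input_depleted)
    present = input_subtracted(ip_present, input_present)
    return [d - p for d, p in zip(depleted, present)]
```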
+ We found that following PcrA depletion , DnaC association increased significantly at several regions including the rRNA and tRNA genes , and many protein-coding genes of both orientations ( Fig 5A and S5 Fig ) . 
+ For comparison , non-normalized DnaC ChIP-Seq data are displayed in S6 Fig . 
+ In addition to the rRNA and tRNA genes , the most prominent peaks included the head-on operons dltABCDE , amtB/glnk , ykaA-pit , the head-on ribosomal protein gene rpsD , and the co-directional ribosomal protein genes encoded between rrnW and rrnI ( from 8 ° to 13 ° ) ( Fig 5A and S5 Fig ) . 
+ As expected , activation of the degron system also led to a significant increase in DnaC association downstream of the thrC locus where Pspank-sspB produces an artificial head-on conflict ( Fig 5A , peak H5 ) . 
+ Consistent with the observation that PcrA promotes replication across transcription units , we frequently observed an increase in ChIP signal throughout whole genes . 
+ Therefore , rather than calling peaks , we found it most appropriate to simply identify genes affected by PcrA depletion . 
+ To do this , we calculated the ChIP signal in terms of the maximum signal , or area under the curve within each gene on the chromosome . 
+ To account for peaks present in intergenic regions , we performed the same analysis with all intergenic regions greater than 5 nucleotides in length . 
+ Genes containing ChIP signal with a maximum height of more than 5-fold over background were considered peak containing regions . 
+ A list containing the 50 regions that met these criteria is presented in S1 Table . 
+ However , a comprehensive list of all genes and intergenic regions is also included in S2 Table . 
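+ The per-region quantification described above can be sketched as follows . This is a hedged example under stated assumptions : the area under the curve is taken as the sum of per-nucleotide values , the background level is supplied as a single number ( the text does not say how it was estimated ) , and all names and values are hypothetical . 

```python
def region_stats(signal, start, end):
    """Return (maximum, area under the curve) for positions start..end-1."""
    window = signal[start:end]
    if not window:
        raise ValueError("region is empty")
    return max(window), sum(window)


def peak_regions(signal, regions, background, fold=5.0):
    """List regions whose maximum signal exceeds fold x background."""
    hits = []
    for name, start, end in regions:
        peak, area = region_stats(signal, start, end)
        if peak > fold * background:
            hits.append((name, peak, area))
    return hits


# Hypothetical differential ChIP signal and region coordinates.
signal = [1, 1, 2, 8, 9, 7, 1, 1, 1, 3]
regions = [("geneA", 2, 6), ("geneB", 6, 10)]

hits = peak_regions(signal, regions, background=1.0)
print(hits)
```

+ Intergenic regions can be passed through the same function by adding their coordinates to the region list . 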
+ ChIP-Seq data showing chromosomal locations where DnaC association increases after PcrA depletion ( strain HM448 ) . 
+ Data were calculated by first normalizing ChIP-Seq samples to inputs . 
+ Normalized ChIP signal when PcrA was present was then subtracted from ChIP signal after PcrA depletion . 
+ The resulting differential signal is shown . 
+ Peaks at rRNA genes and tRNA genes are indicated . 
+ Selected peaks are labelled according to orientation . 
+ C = co-directional , H = head-on : ribosomal protein genes at 8 -- 13 ° position ( C1 ) , ssbA ( C2 ) , pit ( H1 ) , cotC ( H2 ) , yoaM ( H3 ) , rpsD ( H4 ) , thrC ( H5 ) , amtB ( H6 ) , dltA-E ( H7 ) . 
+ The ter region is indicated by * . 
+ B ) ChIP-Seq of Myc-PcrA in an otherwise wild-type strain containing the myc-pcrA allele ( HM224 ) . 
+ ChIP data were normalized to input , then non-specific peaks ( identified via antibody control ChIP-Seq ) were subtracted out . 
+ Peaks at rRNA and tRNA genes are indicated . 
+ Interestingly , though replisome stalling was most prominent within genes , we also observed peaks within promoter regions , including the promoters for nagP and qdoI , which contain transcriptional repressor binding sites . 
+ These data are consistent with reports that PcrA removes DNA binding proteins in addition to RNA polymerase [ 28 ] . 
+ Among the gene regions affected by PcrA activity , approximately 16 % ( 8 genes ) were head-on , and 84 % ( 42 genes ) were co-directional . 
+ These data demonstrate that replisome progression slows within genes of both head-on and co-directional orientation , consistent with DnaC association measurements at the lacZ reporter genes . 
+ Previous reports have suggested that ChIP may produce inaccurate data due to non-specific association of target proteins with the rRNA genes . 
+ To assess the accuracy of our DnaC ChIP signal at endogenous rRNA genes , we analyzed the formation of replication intermediates at these regions using 2D gels ( S8 Fig ) . 
+ When PcrA is present we did not observe any replication intermediates within rRNA genes , despite our use of digest conditions that allowed us to collectively probe for all 10 rDNA repeats simultaneously ( S8 Fig , left side ) . 
+ However , after PcrA depletion we clearly observed the formation of replication intermediates within these regions ( S8 Fig , right side ) . 
+ This result is consistent with our DnaC ChIP-Seq data and demonstrates that when PcrA is absent , replication slows when it passes through the co-directionally oriented rRNA genes . 
+ This strongly suggests that the DnaC ChIP-Seq data are accurate and not an artifact caused by the non-specific adhesion of DnaC proteins to rDNA or other genes . 
+ We also considered the possibility that the activity of PcrA at conflict regions could be related to the removal of RecFOR-loaded RecA -- a function of PcrA that has been well-characterized in vitro . 
+ To test this possibility we carried out DnaC ChIP-qPCRs , with and without PcrA , in a ΔrecF background and measured the levels of DnaC association at the most common conflict region we identified -- rRNA genes . 
+ We found that even when RecF is not present , depletion of PcrA leads to increased DnaC association with rRNA loci ( S9 Fig ) . 
+ This result suggests that any effects of RecF related to PcrA activity in conflicts occur either downstream of replisome stalling or are independent of conflicts . 
+ PcrA associates with rRNA and tRNA genes
+ In keeping with the results of our reporter assays , we anticipated that on the chromosome PcrA should associate with the regions where replisome stalling increases following PcrA depletion . 
+ To identify these regions we conducted ChIP-Seq of Myc-PcrA in strain HM224 ( Fig 5B ) . 
+ This strain differs from the PcrA degron strain ( HM448 ) used in Fig 5A in that it does not possess the sspB gene encoded at thrC . 
+ To reduce background signal due to non-specific interaction of the Myc antibody with endogenous proteins , we normalized Myc-PcrA ChIP signal to both input DNA and a mock IP with the anti-Myc antibody . 
+ We first normalized both the PcrA ChIP and mock IP to their respective input ( total ) DNA data sets by subtracting the input signal from the IP signal at each nucleotide position . 
+ Furthermore , because this normalization still produced nonspecific signal ( peaks present in both the experimental and mock IPs ) we also subtracted the mock IP-total signal from the PcrA ChIP-total signal . 
+ For comparison , non-normalized and normalized data are shown together in S7 Fig . 
+ The resulting data set indicated that PcrA associates predominantly with the rRNA and tRNA genes ( Fig 5B ) . 
+ We quantified the data as with the DnaC ChIP-Seq data set , and present a list of peak-containing regions in S3 Table . 
+ A comprehensive list of all gene regions can be found in S4 Table . 
+ The absence of detectable signal at protein-coding genes identified in the DnaC ChIP-Seq following PcrA depletion suggests that either the association of PcrA with these loci is simply below our detection limit or that its activity at these regions is transient . 
+ PcrA association and activity at endogenous conflict regions is transcription-dependent
+ We set out to determine if PcrA association and activity at the endogenous loci we identified is transcription-dependent . 
+ To address this question and to confirm our ChIP-Seq data we analyzed DnaC and PcrA association with different candidate loci relative to the control locus yhaX using ChIP-qPCRs ( Fig 6 ) . 
+ The candidate loci included rRNA , tRNA and protein-coding genes of both orientations . 
+ Specifically , for genes in the co-directional orientation , we examined the ribosomal RNA gene rrn23S , the Val1-Thr1 tRNA pair , trnSL-Ser1 , and the protein coding gene rplGB . 
+ Though we did not detect PcrA association with rplGB in our PcrA ChIP-Seq experiment , DnaC levels increased at this locus following PcrA depletion . 
+ Therefore , rplGB was anticipated to represent the lower end of our detection range for PcrA association . 
+ For genes in the head-on orientation , we examined the ribosomal protein gene rpsD and two genes from the dltA-E operon , dltA and dltB . 
+ ( We also confirmed DnaC association , with and without PcrA , with the genes pit , cotC , and yoaM -- see S10 Fig ) . 
+ As a negative control , we also investigated the head-on gene yutJ which showed no detectable PcrA or DnaC association . 
+ To determine if the association of DnaC and PcrA with the candidate loci was transcription-dependent we used rifampicin to shut off transcription initiation : we treated cells for 3 minutes then analyzed PcrA and DnaC association with these regions . 
+ Consistent with the ChIP-Seq experiments , we found that PcrA association with some coding genes was low in ChIP-qPCR experiments ( Fig 6A ) . 
+ However , regardless of the degree of association , transcription shut-off reduced association of PcrA with all examined loci to some degree , except at the negative control locus , yutJ ( Fig 6A ) . 
+ and correlate with RpoC association . 
+ A ) Relative association of PcrA with nine candidate loci , compared to the control locus yhaX , was measured by ChIP-qPCR , with active ( white bars , before rifampicin treatment ) and inhibited ( gray bars , after 3 min of rifampicin treatment ) transcription . 
+ B ) Relative association of DnaC ( compared to yhaX ) as measured by ChIP-qPCR in the presence of PcrA and transcription ( no rifampicin , black ) , following PcrA depletion ( no rifampicin , white ) , and following PcrA depletion and transcription shut off ( 3 min of rifampicin treatment , gray bars ) . 
+ C ) Relative association of RpoC-GFP with the candidate regions ( compared to yhaX ) was measured by ChIP-qPCR before ( white ) and after transcription inhibition with rifampicin ( gray ) . 
+ Co-directional and head-on genes are indicated above the graph . 
+ Pearson correlation coefficient for DnaC ( - PcrA ) vs. PcrA association = 0.7709 . 
+ Pearson correlation coefficient for co-directional genes : R = 0.7 for DnaC ( - PcrA ) vs. RpoC , and R = 0.9 for PcrA vs. RpoC . 
+ N 5 . 
+ * P < 0.05 and ** P < 0.01 . 
+ Also in agreement with our ChIP-Seq data , in the ChIP-qPCR experiments , DnaC association increased after PcrA depletion at all chromosomal loci examined , with the exceptions of the single tRNA gene trnSL-Ser1 , dnaK and yutJ ( Fig 6B , S10 Fig and see S11 Fig for non-normalized IP/Input values for DnaC association with the rRNA genes ) . 
+ Presumably , a single tRNA gene may simply be so short that local RNA polymerase occupancy remains limited , thereby minimizing the impact on replication . 
+ Regardless , we do observe stalling at a locus encoding multiple tRNA genes ( Val1-Thr1 ) . 
+ As with the PcrA ChIPs , after rifampicin treatment and in the absence of PcrA , DnaC association with the other loci was ablated ( Fig 6B ) . 
+ These data confirm that PcrA 's association and activity at conflict regions requires active transcription as initially indicated by our reporter data . 
+ RNA polymerase occupancy correlates with PcrA and DnaC association 
+ To determine if the association and activity of PcrA was correlated with transcription , we took a second approach : we measured RNA polymerase occupancy at the loci identified in the ChIP-Seq experiments ( as a measure of transcription level ) by conducting ChIP-qPCRs of a GFP-fusion allele of the beta ' subunit of RNA polymerase , RpoC , using an anti-GFP polyclonal antibody . 
+ We found that RpoC associates at predictably varying degrees with all but the negative control candidate regions tested ( Fig 6C ) . 
+ As anticipated , rifampicin treatment reduced this association at all examined loci by 80 % or more ( Fig 6C ) . 
+ To control for potential artifacts of the GFP-fusion we carried out ChIP-qPCRs of the beta subunit of RNA polymerase , RpoB , using a native antibody . 
+ Although the absolute degree of association of RpoB compared to the RpoC ChIPs was different , the relative association patterns with conflict regions were equivalent ( S12 Fig ) . 
+ RpoC ChIP-qPCRs allowed us to compare RNA polymerase occupancy with PcrA association and conflict severity ( DnaC association ) . 
+ RpoC , DnaC and PcrA associations closely correlated with all co-directional genes examined ( Pearson coefficient 0.7 for DnaC ( - PcrA ) vs. RpoC , and 0.9 for PcrA vs. RpoC ) . 
+ Among the head-on genes , replisome stalling in the absence of PcrA correlates with PcrA occupancy ( Pearson coefficient 0.89 for DnaC ChIP -- PcrA vs. + PcrA ChIP ) and is highest at the gene with the highest RNA polymerase occupancy , rpsD . 
+ This correlation suggests that conflict severity is related to transcription levels for head-on genes . 
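+ The Pearson coefficients quoted above compare two sets of per-locus ChIP-qPCR values . A minimal sketch of that calculation follows ; the locus values are hypothetical placeholders rather than measurements from this study , and the function name is illustrative . 

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant data")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical relative association values at four loci.
rpoc_association = [1.0, 2.0, 3.0, 4.0]
pcra_association = [2.0, 4.0, 6.0, 8.0]

r = pearson(rpoc_association, pcra_association)
print(round(r, 3))
```

+ With only a handful of loci per orientation , a coefficient computed this way is sensitive to any single locus , so it is best read alongside the underlying bar graphs . 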
+ We also find comparison of the co-directional and head-on genes to be informative : replisome stalling is similar between dltA , dltB and the co-directional rplGB gene despite the significantly lower transcription levels for dltA and dltB ( roughly 3 fold lower RpoC association compared to rplGB ) . 
+ Similarly , replisome stalling at rpsD ( head-on ) is equivalent to levels at the Val1-Thr1 tRNA genes ( co-directional ) despite an approximately 3 fold lower RpoC association with rpsD . 
+ Therefore , these data are consistent with our reporter experiments where we observed that a head-on conflict causes significantly more replisome stalling than an equivalent co-directional conflict ( Fig 2C and 2D ) . 
+ Furthermore , the results of our genome-wide analyses , together with the results of the engineered conflict experiments , underscore the potent effects of head-on relative to co-directional conflicts . 
+ Although the signals observed in these experiments are relatively small , altogether , the consistency between the data from the various experiments shows that PcrA resolves both head-on and co-directional conflicts genome-wide . 
+ Partial depletion of PcrA renders cells highly sensitive to head-on conflicts
+ Highly expressed co-directional genes are common in the genome . 
+ Therefore , expression of an additional co-directional gene should not have a major effect on cell viability . 
+ However , since the number of highly expressed head-on genes is limited during fast growth , we wondered if the addition of a highly expressed head-on gene would increase the sensitivity of cells to PcrA depletion . 
+ To test this hypothesis , we measured viability before and after partial PcrA depletion in cells harboring the lacZ reporters in both orientations . 
+ In the presence of PcrA , transcription of lacZ had no effect on cell survival regardless of its orientation ( Fig 7A , no IPTG ) . 
+ However , following partial PcrA depletion with 2 μM IPTG we observed slow growth in cells with the co-directional lacZ gene and cells harboring the repressed head-on lacZ reporter ( Fig 7A , 2 μM IPTG , and 7B ) . 
+ Expression of co-directional lacZ did not cause a decrease in plating efficiency , suggesting that a single additional co-directional conflict does not have a major effect on replication or growth rate . 
+ However , expression of head-on lacZ caused a severe decrease in plating efficiency after an otherwise non-lethal degree of PcrA depletion ( Fig 7B ) . 
+ Strains harboring the hisC reporters ( which are incorporated at different regions on the chromosome and are expressed under a different promoter ) showed a similar asymmetric effect on plating efficiency ( S13 Fig ) . 
+ These results suggest that mitigation of severe conflicts by PcrA is essential for viability . 
+ The C-terminal domain of PcrA is not required for its conflict mitigation activity 
+ The mechanism ( s ) allowing accessory helicases to be recruited to conflict regions have not been defined . 
+ PcrA could be recruited to either the replication fork or to RNA polymerase during a conflict . 
+ Previous work has established that PcrA interacts with RNA polymerase [ 25,26 ] . 
+ Whether this interaction is important for its role in conflict mitigation is unknown . 
+ To address this question , we produced a mutant allele of myc-pcrA shown to dramatically reduce PcrA 's RNA polymerase association in vitro , in the closely related species , Geobacillus stearothermophilus : myc-pcrA-ΔC [ 25 ] . 
+ This mutant lacks the final 71 amino acids of PcrA 's C-terminal domain . 
+ To avoid the potential problem of lethality , or accumulation of suppressor mutations , we expressed this mutant conditionally by placing it under the control of an IPTG-inducible promoter . 
+ When expressed in cells already harboring the PcrA degron system , the addition of IPTG triggers the simultaneous depletion of PcrA-ssrA and induction of myc-pcrA-ΔC . 
+ We found that the PcrA-ΔC protein had wild-type level activity in preventing replisome stalling and association with conflict regions of both orientations ( Fig 8A and 8B ) . 
+ Furthermore , this mutant completely rescued the viability of PcrA degron strains , indicating that RNA polymerase interaction through the C-terminal domain of PcrA is not essential for its conflict mitigation activity ( Fig 8C ) . 
+ It is possible that this mutant does not completely ablate PcrA 's interaction with RNA polymerase , as an N-terminal interaction between PcrA and RNA polymerase has also been detected [ 39 ] . 
+ However , based on previous work , we expect this disruption to at least partially reduce PcrA 's association with RNA polymerase . 
+ The complete lack of a phenotype in conflict mitigation and survival in strains harboring this mutant suggests that its RNA polymerase interaction is not required for PcrA 's association or activity at conflict regions . 
+ PcrA 's ATPase and helicase activities are required for conflict mitigation
+ In vitro studies of PcrA have demonstrated that its helicase and ATPase activities are required for its ability to separate DNA strands , but are dispensable for RecA removal . 
+ To determine if PcrA 's helicase/ATPase activities are required for its conflict mitigation functions and viability , we constructed a previously characterized separation of function allele , myc-pcrA K37A Q254A ( PcrA H - ) which is defective in helicase/ATPase activity [ 24 ] . 
+ By expressing this allele in a strain already harboring the conditional degron system ( as discussed above ) we were able to use ChIP-qPCR to determine whether this mutant is capable of mitigating head-on and co-directional conflicts , and measure its association with conflict sites ( Fig 8 ) . 
+ assays were carried out for strains containing the Pxis-lacZ reporter constructs . 
+ 1:10 dilutions of exponential cultures were plated on agar plates with IPTG ( at 2 μM or 100 μM , as indicated , leading to PcrA depletion ) or without IPTG ( no PcrA depletion ) . 
+ Transcription repression ( Trx - , strain HM876 ) or de-repression ( Trx + , strain HM877 ) due to the presence or absence , respectively , of the ImmR repressor protein is indicated below . 
+ Co-directional ( CD ) and head-on ( HO ) orientations of the reporters are indicated below the dilution series . 
+ B ) Quantification of cell survival following PcrA depletion during lacZ expression ( Trx + ) is plotted . 
+ Percent survival of each strain containing the reporters , after IPTG-induced depletion of PcrA , relative to predepletion , was quantified and plotted ( CD Pxis-lacZ ( gray ) and HO Pxis-lacZ ( black ) ) . 
+ Symbol * indicates that no colonies were detected after PcrA depletion with 100 μM IPTG in the presence of head-on lacZ . 
+ N = 6 . 
+ We found that the PcrA H - allele failed to resolve both head-on ( dltB ) and co-directional ( rrn23S and rplGB ) replication-transcription conflicts and that conflict severity at these regions was exaggerated in the presence of this mutant ( Fig 8A ) . 
+ The inability of PcrA H - to resolve conflicts does not reflect an inability to associate with conflict sites , as we actually observed increased association of this mutant with rrn23S and rplGB ( Fig 8B ) . 
+ This increase could potentially reflect a reduced ability to release from the DNA due to a defect in ATP hydrolysis [ 40 ] . 
+ Though we did not observe a significant association of either wild-type or the PcrA H - mutant protein at dltB , this result is not entirely surprising given the low PcrA ChIP signal we previously observed at these loci and the further decreased overall ChIP signal in these experiments . 
+ Nevertheless , the effect of PcrA H - on replisome stalling at dltB indicates that it is active at this location . 
+ To determine if the helicase/ATPase activities of PcrA are important for viability , we carried out plating efficiency assays . 
+ Here we observed that cells depleted of the wild-type PcrA and expressing the PcrA H - allele are inviable ( Fig 8C ) . 
+ Because this mutant is capable of removing RecA and preventing RecA-dependent strand exchange in the closely related species S. aureus [ 24 ] , the loss of viability in the strain harboring the PcrA H - protein may not be due to inability to remove RecA . 
+ Discussion
+ Though the role of accessory helicases in head-on conflict resolution has been appreciated in E. coli , whether these functions are conserved in other bacteria was unknown . 
+ Furthermore , though indirect evidence has also suggested the involvement of accessory helicases in co-directional conflict mitigation , direct evidence for this role has been lacking . 
+ This gap in our knowledge is largely due to the lack of detectable replication stalling in co-directionally oriented genes using traditional methods . 
+ The use of ChIPs allowed us to overcome this difficulty and directly investigate PcrA 's activity at endogenous regions , including those that are co-directionally oriented to replication . 
+ Here we also confirm that PcrA mitigates head-on conflicts , as was anticipated based upon previous work on the E. coli homologues of PcrA . 
+ Importantly , we identify the locations where accessory helicases routinely function on the chromosome , including highly transcribed rRNA and tRNA genes , protein coding co-directional genes , and head-on genes where severe conflicts occur . 
+ The results of our studies on the separation of function mutants provide insight into the mechanism of PcrA recruitment to , and activity at , these conflict regions . 
+ Together with the plating efficiency assays , our data also strongly suggest that PcrA is essential due to its role in conflict mitigation . 
+ Model for PcrA recruitment to conflict regions
+ There are at least two models that could potentially explain how PcrA associates with the site of a conflict . 
+ Based on data from E. coli regarding the interaction and movement of Rep with the replication fork , it is conceivable that PcrA is also recruited to conflict regions via interactions with the replication fork [ 41 ] . 
+ On the other hand , reports also indicate that PcrA interacts with RNA polymerase , suggesting that PcrA may be recruited to the conflict site through this association . 
+ In our system , removing the C-terminal domain of PcrA , which prevents detectable association with RNA polymerase in vitro , did not impact its function in conflicts [ 25 ] . 
+ Because the data suggest that PcrA is recruited to conflict regions independent of its interaction with RNA polymerase , recruitment via an interaction with the replisome seems more plausible . 
+ Model for PcrA activity at conflict regions
+ There are at least two models for PcrA activity at conflict regions . 
+ PcrA may directly remove either RNA polymerases or RecA bound to single stranded DNA ahead of the replication fork . 
+ These two possibilities are not mutually exclusive . 
+ We found that the helicase/ATPase activities of PcrA , which facilitate strand separation , are required for conflict mitigation . 
+ This result was not necessarily expected given that mutants lacking these activities retain the ability to efficiently remove RecA from DNA in related species [ 24 ] . 
+ ( As the potential RecA removal activity of PcrA H - has not yet been demonstrated in B. subtilis or in vivo , the following analysis is predicated on the assumption that this in vitro activity of S. aureus PcrA is relevant to living B. subtilis cells . ) 
+ If PcrA 's role in replisome progression through transcription units stems from direct removal of RecA , then a mutant defective in helicase/ATPase activity should have , at least partially , retained the ability to mitigate conflicts . 
+ However , PcrA H - is defective in both conflict mitigation and survival . 
+ Therefore , we propose that PcrA may not mitigate conflict severity via the direct removal of RecA . 
+ In addition , the inability of this mutant to support life suggests that the rescue of ΔpcrA strains by inactivation of the RecFOR pathway stems from an activity of PcrA that is upstream of RecA recruitment ( i.e. PcrA indirectly prevents excessive RecA recruitment ) . 
+ Given the correlation between decreased viability and increased conflict severity , we suspect that the essentiality of PcrA is due specifically to its activity in conflict mitigation . 
+ We prefer a model in which PcrA clears RNA polymerases ahead of the replisome , and thereby prevents excess single stranded DNA formation and subsequent RecA binding at conflict regions . 
+ Accessory helicases and conflict mitigation
+ Previous studies comparing the two types of conflicts have shown that head-on conflicts are far worse than co-directional conflicts . 
+ Furthermore , a number of studies have suggested that accessory helicases reduce the deleterious impact of transcription on replisome progression in vivo and in vitro . 
+ In vitro work had previously suggested that B. subtilis PcrA , similar to UvrD and Rep , promotes replisome progression past a single protein block [ 41 ] . 
+ Furthermore , PcrA can complement the survival deficiencies of UvrD and Rep mutants [ 20,41 ] . 
+ Although these studies suggested that PcrA acts similarly to UvrD and Rep , the role of PcrA in conflict resolution was not directly shown prior to this study . 
+ Furthermore , the endogenous chromosomal regions where UvrD , Rep or PcrA act have not been reported . 
+ Here , we provide direct evidence for PcrA activity in conflict mitigation and identify for the first time the natural targets of PcrA . 
+ These include , as we and others suggested , naturally occurring co-directionally oriented rDNA . 
+ Interestingly , we also identified specific head-on and co-directional genes that were not necessarily predictable . 
+ Previous studies investigating the role of accessory helicases in conflict mitigation took advantage of severe conflicts caused by artificially inverted ( head-on ) rDNA , which causes major survival defects . 
+ However , the interpretations of these studies regarding conflicts may be complicated by the unique properties of rRNA genes such as GC richness , RNA polymerase stabilizing anti-termination proteins , and secondary DNA structures . 
+ Our experiments using lacZ , hisC , and several endogenous coding genes , circumvent these potential complications . 
+ Also , by detecting effects on replisome stalling at an otherwise identical gene ( i.e. hisC or lacZ ) in the two orientations , we can estimate the relative impact of gene expression and orientation on replisome progression : when PcrA is present , transcription-dependent replication stalling increases roughly by 6-fold at the head-on lacZ construct compared to its co-directional counterpart . 
+ However , in the absence of PcrA , this differential reaches more than 20-fold . 
+ Unfortunately , because these results are gathered from ensemble assays , it is difficult to further quantify the severity or frequency of replisome stalling in single cells or during a round of replication . 
+ Future experiments examining the impact of conflicts on replisome progression in single cell studies could potentially answer these questions . 
+ The impact of co-directional conflicts on replication
+ Bacterial genomes are generally organized such that highly transcribed and essential genes are oriented co-directionally with respect to DNA replication . 
+ In the case of rRNA genes , co-orientation is essentially universal . 
+ Though co-orientation reduces conflict severity , replication-transcription conflicts still occur in these regions to some degree . 
+ Our observations highlight the importance of co-directional conflicts in vivo , despite the apparent lack of effect on replisome progression in vitro [ 42 ] . 
+ Though any one co-directional conflict may not severely inhibit replication , the abundance of co-directional conflicts suggests that collectively , they can significantly slow replication . 
+ This consideration is especially important given that co-orientation appears to be a major strategy cells use to reduce conflict severity ; in doing so , cells increase the impact of co-directional conflicts . 
+ Conflict mitigation mechanisms in different bacterial species
+ Although similar conflict mitigation mechanisms seem to exist in both B. subtilis and E. coli , there is a significant difference in how the two species tolerate head-on oriented rRNA genes [ 3,43,44 ] . 
+ Furthermore , the genome co-orientation bias in the two organisms is significantly different : there are many more head-on genes in E. coli compared to B. subtilis ( 45 % vs. 26 % , respectively ) . 
+ Together , the rDNA inversion experiments and the genome co-orientation biases from these two different species suggest that E. coli cells are much more tolerant of conflicts than B. subtilis . 
+ In keeping with these differences , studies in E. coli were unable to detect replication intermediates in rRNA genes , even after accessory helicase deletion , or rDNA inversion ( replication intermediates only formed in inverted rDNAs after accessory helicase deletion ) [ 17 ] . 
+ However , we observed replication intermediates in naturally occurring , co-directionally oriented rRNA genes following PcrA depletion in B. subtilis . 
+ What is the reason for the higher conflict tolerance of E. coli ? 
+ Since multiple E. coli accessory helicases can be deleted without detectably slowing replication through the rDNA , it seems likely that either E. coli possesses additional , as yet unidentified , conflict mitigation mechanisms , or that the replication machinery is inherently less susceptible to stalling at transcription units in E. coli . 
+ Materials and Methods
+ Media and growth conditions
+ For all experiments , B. subtilis 168 cells were plated on LB supplemented with the corresponding antibiotics . 
+ Single colonies from plates were used to inoculate cultures of liquid rich medium ( Luria-Bertani ( LB ) ) . 
+ Liquid cultures were grown to mid-log at 30 °C , shaking at 260 r.p.m. , then diluted back to OD 0.05 , and grown again to OD 0.3 -- 0.35 before harvesting . 
+ For rifampicin treatments , 30 mg/ml rifampicin in DMSO was added to a final concentration of 0.3 mg/ml , for 3 minutes . 
+ PcrA Degron cultures were grown as above , with the exception that cells were split at OD 0.2 into two cultures , with or without IPTG at 100 μM final concentration . 
+ At OD 0.3 -- 0.35 , cells were harvested . 
+ 2D gels
+ B. subtilis cultures were grown to OD 0.3 , then treated with 0.2 % NaAzide to arrest growth . 
+ 20 mg of cells were then suspended in low-melt agarose plugs ( 0.5 % ) as previously described [ 45 ] . 
+ Lysis was performed in 2 mg/ml lysozyme for 16 hours at 37 °C . 
+ Protein was removed via incubation with 5 mg/ml proteinase K , 5 % sarkosyl , 0.5 M EDTA for 4 hours at 37 °C . 
+ Proteinase K was then removed by 8 successive 4 hour washes in TE at 4 °C . 
+ DNA was digested overnight in plugs equilibrated in 1x CutSmart buffer plus 0.5 μl of each of the indicated enzymes ( NEB ) . 
+ DNA was subjected to 2-dimensional electrophoresis and Southern blotting as previously described [ 46 ] . 
+ Probes for Southern blots were generated via random priming of gel-extracted PCR products corresponding to the lacZ region or the rrn16S-23S rRNA regions . 
+ Probes were radioactively labelled using α-32P-dATP . 
+ Hybridization images were generated on X-ray film or with phosphor screens ( GE Healthcare ) . 
+ Y-arcs were quantified by conforming a single lane to the shape of the entire arc in ImageQuant software ( GE Healthcare ) . 
+ This yielded a histogram consisting of approximately 450 data points along the arc . 
+ Strain constructions
+ Strains used in this publication are listed in Table 1 . 
+ The ImmR protein , which is coded for by the mobile genetic element ICEBs1 , tightly represses Pxis [ 47 ] . 
+ We introduced the Pxis-lacZ constructs into strains that either harbored ( Trx - ) or were cured of ( Trx + ) ICEBs1 . 
+ To produce the PcrA degron strains , genomic DNA from strain HM448 was transformed into strains harboring the Pxis-lacZ constructs prior to selection on MLS for the Pspank-sspB construct . 
+ The pcrA-ssrA allele from HM448 was then transformed into the resulting strains as a second transformation . 
+ Survival assays
+ Single colonies were inoculated into liquid LB and grown to OD 0.5 , then serially diluted at a 1:10 ratio prior to plating . 
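+ Percent survival from such a 1:10 dilution series can be computed along the following lines . This is an illustrative sketch only : the plated volume , the colony counts , and the function names are hypothetical , and the text does not state which dilution steps were counted . 

```python
def cfu_per_ml(colonies, dilution_steps, plated_volume_ml=0.1, ratio=10):
    """Back-calculate viable cells per ml from one countable plate.

    dilution_steps is the number of successive 1:ratio dilutions made
    before plating; plated_volume_ml is an assumed plating volume.
    """
    if plated_volume_ml <= 0:
        raise ValueError("plated volume must be positive")
    return colonies * ratio ** dilution_steps / plated_volume_ml


def percent_survival(cfu_depleted, cfu_predepletion):
    """Survival after PcrA depletion relative to the predepletion count."""
    if cfu_predepletion <= 0:
        raise ValueError("predepletion count must be positive")
    return 100.0 * cfu_depleted / cfu_predepletion


# Hypothetical colony counts for one strain.
before = cfu_per_ml(colonies=50, dilution_steps=5)   # plate without IPTG
after = cfu_per_ml(colonies=25, dilution_steps=3)    # plate with IPTG

survival = percent_survival(after, before)
print(round(survival, 3))
```

+ When no colonies appear at any dilution , the result is zero rather than undefined , which matches how an undetectable survival value would be reported on a plot . 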
+ ChIP-qPCR analysis
+ Polyclonal rabbit anti-DnaC antibodies were used for ChIP of native DnaC [ 5,48 ] . 
+ Mouse monoclonal anti-Myc antibody purchased from Invitrogen was used for anti-Myc-PcrA ChIP experiments ( Product Number 13 -- 2500 ) . 
+ Polyclonal rabbit anti-GFP antibodies were used for RpoC-GFP ChIP experiments . 
+ DNA samples for ChIP were prepared essentially as previously described [ 5,48 ] : Bacillus subtilis cells were grown in LB medium as described . 
+ Cells were cross-linked with formaldehyde at a final concentration of 1 % v/v . 
+ Following 20 minutes of incubation at room temperature , reactions were quenched with glycine , and cells were pelleted , washed once in 1x PBS , pelleted again , then frozen at -80 °C . 
+ Pellets were re-suspended in 1.5 ml solution A ( 20 % sucrose , 50 mM NaCl , 10 mM EDTA , 10 mM Tris pH 8.0 ) plus 1 mg/ml lysozyme and 1 mM AEBSF . 
+ Following a 30 minute incubation at 37 °C , lysates were sonicated on a Fisher sonic dismembrator ( Fisher FB120 ) for 40 seconds ( 10 seconds on , 10 seconds off ) at 30 % amplitude . 
+ Lysates were spun at 8k rpm for 15 min at 4 °C in microcentrifuge tubes , and the supernatant cell extract was transferred to fresh tubes and frozen at -80 °C . 
+ ChIP was performed by adding 12 μl anti-Myc or 1 μl anti-DnaC antibody to 1 ml aliquots of extract , then incubating overnight at 4 °C in an end-over-end nutator . 
+ Antibody-bound protein : DNA complexes were precipitated using protein A sepharose beads ( GE 45000143 ) , decrosslinked overnight at 65 °C , then purified by phenol : chloroform extraction and ethanol precipitation . 
+ qPCR analysis was performed on a Bio-Rad CFX connect ( Product Number 1855201 ) using Sso Advanced SYBR green master mix ( product number 1725262 ) . 
+ Primer pairs include HM192 ( 5'-CCGTCTGACCCGATCTTTTA-3' ) and HM193 ( 5'-GTCATGCTGAATGTCGTGCT-3' ) , which amplify the low conflict region yhaX ; HM80 ( 5'-AGGATAGGGTAAGCGCGGTATT-3' ) and HM81 ( 5'-TTCTCTCGATCACCTTAGGATTC-3' ) , which amplify the rrn23S region of the rRNA gene repeats ; HM766 ( 5'-GCTGGGAGAGCATCTGCCTT-3' ) and HM767 ( 5'-CCAACCTACTGATTACAAGTCAGTTGCTCTA-3' ) , which amplify between the Threonine and Valine tRNA genes at 4 repeats ; HM892 ( 5'-CATGAAAAAGCTCGGCAAAG-3' ) and HM893 ( 5'-TGGAATCTTACGCAAAAACAAA-3' ) , which amplify within the rplGB gene ; HM803 ( 5'-TGTTTTGCGGAGAGGTTCTT-3' ) and HM804 ( 5'-CGGGCCGTACGTATTAAAAA-3' ) , which amplify within the dltA gene ; and HM902 ( 5'-CGGGGTCAGCTACATTATGG-3' ) and HM903 ( 5'-AGACATATGCCAGCGATTCC-3' ) , which amplify within the dltB gene . 
+ Additional primer pairs include HM770 ( 5'-TCTCCAGCTGTGATAAACGGTA-3' ) and HM771 ( 5'-AAAACGGCATTGATTTGTCA-3' ) , which amplify within the dnaK gene ; HM952 ( 5'-GGTGTAAACGAACGTCAATTCCGCAC-3' ) and HM953 ( 5'-AGCTTGTACACAACGTTATCAAGACGAGAATC-3' ) , which amplify within the rpsD gene ; and HM954 ( 5'-GAAGAAAAAGTGAATGAGCTGCTGAAGGAA-3' ) and HM955 ( 5'-AATGTCTTCGCTCTCAAAAAACTCAATCAAACG-3' ) , which amplify within the yutJ gene . 
+ qPCR analysis was conducted for both yhaX and test DNA species in both input and ChIP samples . 
+ Final fold enrichment was calculated as ( Test DNA in ChIP Sample/Test DNA in Input Sample ) / ( yhaX in ChIP sample / yhaX in Input sample ) . 
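+ The fold-enrichment formula above can be sketched in a few lines of Python. This is only an illustration: the function name, argument names, and example values are hypothetical, and the inputs are assumed to be relative DNA quantities already derived from the qPCR measurements.

```python
def fold_enrichment(test_chip, test_input, yhax_chip, yhax_input):
    """(Test DNA in ChIP / Test DNA in input) / (yhaX in ChIP / yhaX in input).

    All four arguments are assumed to be positive relative DNA quantities.
    """
    if min(test_chip, test_input, yhax_chip, yhax_input) <= 0:
        raise ValueError("all quantities must be positive")
    test_ratio = test_chip / test_input      # enrichment at the test locus
    control_ratio = yhax_chip / yhax_input   # enrichment at the yhaX control locus
    return test_ratio / control_ratio


# Hypothetical quantities, chosen only to show the arithmetic.
print(fold_enrichment(test_chip=8.0, test_input=2.0, yhax_chip=1.0, yhax_input=1.0))
```

+ Dividing by the yhaX ratio cancels differences in template loading and ChIP efficiency between samples, which is why the control locus is measured in both the input and ChIP fractions.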
+ ChIP-Seq analysis
+ ChIP DNA samples were analyzed first using qPCR to validate that a given sample is representative . 
+ DNA samples were then processed and sequenced by the University of Washington High Throughput Sequencing Genomics Center , on an Illumina Next-Seq . 
+ FASTQ file analysis
+ Approximately 750k paired-end Illumina Next-Seq reads per sample were mapped against the genome of B. subtilis strain JH642 ( GenBank : CP007800.1 ) using Bowtie 2 with the --no-mixed option . 
+ This option prevents unpaired alignments , such that only reads that aligned uniquely at both ends were mapped [ 49 ] . 
+ As discordant mapping was minimal and did not significantly alter the resulting profile , discordant mapping was left enabled . 
+ The resulting .sam file was processed with the SAMtools view , sort , and mpileup functions [ 50 ] to produce wiggle plots . 
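+ As a rough sketch of the mapping steps described above, the Python snippet below only assembles the command lines and does not run them. The file names, index name, and output prefix are hypothetical; the exact tool versions and any options beyond --no-mixed are not given in the text, so the remaining flags are assumptions based on current Bowtie 2 and SAMtools usage. The conversion of the pileup to a wiggle track is not described and is left out.

```python
import shlex


def mapping_commands(index, reads_1, reads_2, reference_fasta, prefix):
    """Return the Bowtie 2 and SAMtools command lines as argument lists."""
    sam = f"{prefix}.sam"
    bam = f"{prefix}.bam"
    sorted_bam = f"{prefix}.sorted.bam"
    return [
        # --no-mixed: do not fall back to unpaired alignments for a read pair.
        # --no-discordant is deliberately NOT passed, so discordant mapping stays on.
        ["bowtie2", "--no-mixed", "-x", index,
         "-1", reads_1, "-2", reads_2, "-S", sam],
        ["samtools", "view", "-b", "-o", bam, sam],        # SAM to BAM
        ["samtools", "sort", "-o", sorted_bam, bam],       # coordinate sort
        ["samtools", "mpileup", "-f", reference_fasta, sorted_bam],  # pileup on stdout
    ]


# Hypothetical file names for illustration only.
for command in mapping_commands("JH642_index", "sample_R1.fastq",
                                "sample_R2.fastq", "JH642.fasta", "sample"):
    print(shlex.join(command))
```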
+ We tested the effect of removing PCR-based and optical duplicates using Picard v1.3 and found that the same gene regions were identified in subsequent analyses , but at a slightly lower signal-to-noise ratio . 
+ Therefore , in the presented data sets , duplicates were not removed . 
+ Myc-PcrA ChIP-Seq data and antibody control data were first normalized to input samples ( signal in the ChIP sample minus the signal in the corresponding input sample ) . 
+ Normalized antibody control IP ( Mock IP ) data representing non-specific signal enrichment was subtracted from the normalized ChIP signal . 
+ For DnaC ChIP-Seq , ChIP samples were first normalized to inputs ( ChIP minus input ) . 
+ The normalized +PcrA DnaC ChIP signal was then subtracted from the normalized -PcrA DnaC ChIP signal at each nucleotide position , establishing the differential signal . 
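+ The subtraction-based normalizations described above can be illustrated with per-nucleotide coverage held in plain Python lists. This is a minimal sketch: the function names and the toy coverage values are hypothetical, and any scaling for sequencing depth, which the text does not describe, is omitted.

```python
def subtract(a, b):
    """Position-wise a - b for two equal-length coverage tracks."""
    if len(a) != len(b):
        raise ValueError("tracks must cover the same positions")
    return [x - y for x, y in zip(a, b)]


def normalize_pcra(chip, chip_input, mock, mock_input):
    """(ChIP - input) minus (mock IP - input), as described for Myc-PcrA."""
    return subtract(subtract(chip, chip_input), subtract(mock, mock_input))


def dnac_differential(chip_minus, input_minus, chip_plus, input_plus):
    """Normalized -PcrA signal minus normalized +PcrA signal, per position."""
    return subtract(subtract(chip_minus, input_minus),
                    subtract(chip_plus, input_plus))


# Toy three-position tracks, for illustration only.
print(normalize_pcra([10, 50, 12], [8, 9, 10], [9, 11, 10], [8, 9, 10]))
print(dnac_differential([5, 40, 6], [4, 5, 5], [5, 10, 6], [4, 5, 5]))
```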
+ ChIP-Seq data were quantified as follows : for each gene and intergenic region , the maximal signal , average signal , area under the curve , and area under the curve divided by total gene length ( normalized area under the curve ) were calculated . 
+ Local background was independently determined for regions proximal to oriC ( the 0 -- 300k nucleotide region , where background signal is slightly higher due to higher chromosomal copy number ) , or distal to oriC , by calculating the average maximal signal in ~ 100 kb regions that were devoid of peaks . 
+ Genes containing a maximum signal of more than 5-fold above background were called as peak-containing regions . 
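+ The per-region quantification and the 5-fold peak threshold described above might look like the following in Python. The sketch makes several assumptions that are not stated in the text: regions are half-open index ranges into a per-nucleotide signal list, the area under the curve is computed with the trapezoidal rule at 1-nucleotide spacing, and the function names and toy values are hypothetical.

```python
def quantify_region(signal, start, end):
    """Maximal signal, average signal, AUC, and AUC / length for signal[start:end]."""
    values = signal[start:end]
    if not values:
        raise ValueError("empty region")
    length = len(values)
    # Trapezoidal rule with unit spacing (an assumed definition of the area).
    auc = sum((a + b) / 2 for a, b in zip(values, values[1:]))
    return {
        "max": max(values),
        "mean": sum(values) / length,
        "auc": auc,
        "auc_per_nt": auc / length,
    }


def local_background(signal, peak_free_windows):
    """Average of the maximal signal over windows known to be devoid of peaks."""
    maxima = [max(signal[s:e]) for s, e in peak_free_windows]
    return sum(maxima) / len(maxima)


def is_peak(region_max, background, fold=5.0):
    """True when the maximal signal is more than `fold` times the background."""
    return region_max > fold * background


# Toy data: two short peak-free windows and one candidate region.
signal = [1, 2, 1, 2, 1, 12, 30, 14, 1, 2]
background = local_background(signal, [(0, 3), (3, 5)])
stats = quantify_region(signal, 5, 8)
print(background, stats, is_peak(stats["max"], background))
```

+ In the actual analysis the background windows are ~ 100 kb and are chosen separately for oriC-proximal and oriC-distal regions; the toy windows here are only a few positions long.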
+ Supporting Information
+ S1 Fig . 
+ Normalization of ChIP signal to the yhaX locus allows for robust sample comparison and preserves trends in ChIP data identified at the lacZ locus . 
+ Here data are presented as individual non-normalized sample isolates , averaged non-normalized samples , and average normalized samples ( to totals and yhaX ) . 
+ Because we normalized using a control locus ( yhaX ) we did not normalize the amount of template DNA added to each reaction , leading to apparent variability between non-yhaX-normalized replicates . 
+ A ) DnaC ChIP in the presence or absence of transcription , B ) DnaC ChIP in the presence or absence of PcrA , C ) PcrA and DnaC ChIP-qPCR signal - / + transcription or - / + PcrA ( DnaC only ) demonstrates that target protein association with yhaX does not change between experimental conditions , making yhaX an ideal normalization locus . 
+ ( EPS ) 
+ S2 Fig . 
+ RNA polymerase association does not decrease following replication inhibition . 
+ RpoB ChIP-qPCR indicates that RNA polymerase association with conflict regions is not ablated by 30 minute HPUra treatment ( which inhibits replication ) . 
+ However , HPUra treatment does cause a major decrease in DnaC association at the same locus as demonstrated in Fig 1D . 
+ ( EPS ) 
+ S3 Fig . 
+ A second Y-arc confirms trends apparent on the EagI/ApaLI digestion of the lacZ gene region . 
+ Here the 2D gel data displayed in Fig 3B , bottom right panels , are shown again , but with the second Y-arc emanating from the undigested spot fully displayed . 
+ Here the left edge of the Y-arc is free from background signal . 
+ This allows for an improved view of the lacZ gene region where replication intermediate signal decreases after PcrA is depleted . 
+ White arrows indicate the approximate location of the 3 ` end of the lacZ gene within the second Y-arc . 
+ ( EPS ) 
+ S4 Fig . 
+ N-terminally Myc-tagged PcrA rescues cell death caused by depletion of endogenous PcrA . 
+ PcrA degron strains harboring an empty vector ( left ) or a second , N-terminally Myc-tagged copy of PcrA , integrated at amyE ( right ) were plated on media lacking IPTG ( top ) , or including IPTG ( bottom ) . 
+ Cells harboring the Myc-PcrA allele were viable following depletion of endogenous PcrA , demonstrating the functionality of the tagged protein . 
+ ( EPS ) 
+ S5 Fig . 
+ Replisome stalling increases at chromosomal regions following PcrA depletion . 
+ Candidate gene regions from DnaC ChIP-Seq ( which are also quantified by qPCR in Fig 6 and S9 Fig ) are shown in detail , either before ( blue ) or after ( red ) PcrA depletion . 
+ Sequencing coverage is indicated on the left , the gene or gene regions are identified at the top of each box , and the location ( s ) of the gene ( s ) are indicated below . 
+ Gene orientation relative to replication is denoted `` HO '' for head-on genes , and `` CD '' for co-directional genes . 
+ ( EPS ) 
+ S6 Fig . 
+ DnaC ChIP-Seq prior to and post-normalization . 
+ A ) Wiggle files for DnaC ChIP prior to ( left ) and following ( right ) PcrA depletion , as well as total ( input ) samples for both conditions , and ChIP minus total normalizations . 
+ B ) Final normalization of the DnaC ChIP-Seq data set . 
+ Here the total normalized DnaC ChIP-Seq signal + PcrA ( S5A Fig , bottom left ) , was subtracted from -- PcrA condition signal ( S5A Fig , bottom right ) . 
+ ( EPS ) 
+ S7 Fig . 
+ PcrA ChIP-Seq prior to and post-normalization . 
+ A ) Wiggle files for PcrA ChIP , total ( input ) , and ChIP minus total normalization . 
+ B ) Wiggle file for Mock IP , total ( input ) , and Mock IP minus total normalization . 
+ C ) Magnified view of one chromosomal region encoding several rDNA repeats following total normalization . 
+ D ) Final normalization of the PcrA ChIP-Seq data set on a global level ( right ) , and within the first 200k nucleotides . 
+ Denotes the position of a single tRNA gene . 
+ ( EPS ) 
+ S8 Fig . 
+ Replication intermediates accumulate in rRNA genes following PcrA depletion . 
+ To test for potentially slow replication through rDNA genes after PcrA depletion , chromosomal DNA was digested using KpnI and EagI restriction enzymes , which cut at the same position in all rRNA genes , causing all 10 rRNAs to run together during 2D gel electrophoresis . 
+ A ) Restriction digest map showing the 16S-23S rRNA fragment that was probed against during 2D gel analysis . 
+ B ) 2D gels for rRNA gene fragments in the presence or absence of PcrA . 
+ The relative amount of DNA loaded , as indicated by quantification of the non-replicating 1N spot , is indicated at the top right . 
+ The appearance of an arc of replication intermediates following PcrA depletion indicates replication slowing/stalling in this region . 
+ ( EPS ) 
+ S9 Fig . 
+ RecF is not required for the increase in replisome stalling observed after PcrA depletion . 
+ Replisome stalling at rrn23S was measured by DnaC ChIP-qPCR in the presence ( left ) or absence of recF ( right ) , and before ( black ) or after PcrA depletion ( white ) . 
+ The equivalent increase in DnaC association following PcrA depletion in the presence or absence of RecF suggests that the viability of the PcrA degron ΔrecF strain , following PcrA depletion , is not due to a reduction in replication fork stalling . 
+ ( EPS ) 
+ S10 Fig . 
+ Additional peaks identified via DnaC ChIP-Seq also show increased replisome stalling after PcrA depletion when assayed using ChIP-qPCR . 
+ Replisome stalling , as measured by DnaC ChIP-qPCR , was measured for three additional gene regions ( all head-on ) identified in Fig 5 . 
+ These locations each show increased DnaC association after PcrA depletion , verifying that PcrA mitigates conflicts in these regions . 
+ ( EPS ) 
+ S11 Fig . 
+ Normalization of ChIP signal to the yhaX locus allows for more accurate sample comparison . 
+ ChIP-qPCR data showing A ) DnaC association with experimental locus rrn23S as individual non-normalized sample isolates , B ) DnaC association with control locus yhaX as individual samples , or C ) averaged samples normalized only to input ( total samples ) and average normalized samples ( to totals and yhaX ) . 
+ Trends in the data are observable prior to , as well as after normalization . 
+ However , normalization controls for sample loading , and ChIP efficiency , thereby contributing to a more accurate and consistent measurement of protein association with test locus rrn23S . 
+ ( EPS ) 
+ S12 Fig . 
+ RNA polymerase beta and beta ` subunits show similar association patterns with genomic loci . 
+ To measure relative RNA polymerase occupancy , RNA polymerase subunits beta ( RpoB ) and beta ` ( RpoC-GFP ) were analyzed by ChIP-qPCR for their association with four genomic loci . 
+ RpoC-GFP was immunoprecipitated using anti-GFP antibodies , whereas RpoB was immunoprecipitated with a native RpoB monoclonal antibody ( Abcam ab12087 ) . 
+ The rrn23S and rplGB loci are co-directional , and the dltA and rpsD loci are head-on to replication . 
+ ( EPS ) 
+ S13 Fig . 
+ PcrA essentiality is linked to its activity in conflict mitigation . 
+ A ) Cells exhibit equal plating efficiency in the presence of PcrA ( left panel ) , regardless of whether they harbor the non-transcribed co-directional hisC allele ( lanes 1 and 2 , left panel ) or the non-transcribed head-on hisC allele ( lanes 3 and 4 , left panel ) . 
+ Also , the non-induced PcrA degron system does not affect viability relative to strains lacking the PcrA degron ( lanes 2 and 4 , versus 1 and 3 , respectively ) . 
+ In these strains , IPTG addition ( right panel ) leads to the simultaneous induction of hisC , and induction of the sspB adaptor protein gene . 
+ Hence IPTG addition causes cell death in strains harboring the PcrA degron ( lanes 2 and 4 , both panels ) , but not in strains that lack the complete degron system ( lanes 1 and 3 , both panels ) . 
+ ( Control strains in lanes 1 and 3 possess the sspB adaptor protein , but lack the pcrA-ssrA allele ) . 
+ The strain harboring the PcrA degron system and the transcribed co-directional hisC allele shows less sensitivity to partial depletion of PcrA , than the equivalent strain with a transcribed head-on hisC allele ( right panel , lanes 2 vs. 4 ) . 
+ B ) Quantification of cell survival after partial PcrA depletion ( left ) or near-complete PcrA depletion ( right ) . 
+ N 3 for all conditions . 
+ indicates that no colonies were detected . 
+ ( EPS ) 
+ S1 Table . 
+ Peaks in the DnaC ChIP-Seq data set . 
+ Here the quantitative ChIP-Seq data for each chromosomal region where DnaC enrichment increases to 5-fold above background following PcrA depletion are reported . 
+ Data include the maximal and average number of reads within each defined gene or intergenic region . 
+ We also report the area under the curve , and the area under the curve divided by the total region length , for each region . 
+ ( TXT ) 
+ S2 Table. DnaC ChIP-Seq quantification for all chromosomal regions. (TXT)
+ S3 Table . 
+ Peaks in the PcrA ChIP-Seq data set . 
+ Here the quantitative ChIP-Seq data for each chromosomal region where PcrA enrichment is at least 5-fold above background are reported . 
+ Data include the maximal and average number of reads within each defined gene or intergenic region . 
+ We also report the area under the curve , and the area under the curve divided by the total region length , for each region . 
+ ( TXT ) 
+ S4 Table. PcrA ChIP-Seq quantification for all chromosomal regions. (TXT)
+ Acknowledgments
+ We thank Ariana Samadpour for the construction and characterization of the Pxis-lacZ reporter strains . 
+ We thank the Grossman lab for their generous gift of anti-GFP and DnaC antibodies , strains , and plasmids . 
+ We also thank Peter Lewis for the RpoC-GFP strain . 
+ Conceived and designed the experiments: CNM HM. Performed the experiments: CN lyzed the data: CNM HM BJB. Contributed reagents/materials/analysis tools: CNM H Wrote the paper: CNM HM BJB. Provided technical help: BJB.
\ No newline at end of file
--- a/data/TEXT_FILES/notuseful_txt/26131613.txt 0 → 100644
View file @27818a9
+++ b/data/TEXT_FILES/notuseful_txt/26131613.txt 0 → 100644
View file @27818a9
+ pathogens
+ 1 School of Chemistry & Molecular Biosciences , the University of Queensland , Brisbane , Queensland 4072 , Australia ; E-Mails : n.benzakour@uq.edu.au ( N.L.B.Z. ) ; m.phan1@uq.edu.au ( M.-D.P. ) ; b.forde@uq.edu.au ( B.M.F. ) ; m.stantoncook@uq.edu.au ( M.S.-C . ) 
+ 2 Australian Infectious Diseases Research Centre , the University of Queensland , Brisbane , Queensland 4072 , Australia 
+ 1. Introduction
+ Uropathogenic Escherichia coli ( UPEC ) are a major cause of urinary tract infections ( UTI ) , causing ~ 80 % of all cases [ 1 ] . 
+ Over the last few decades , several pandemic clones of UPEC , some of which are associated with multidrug resistant infections , have disseminated worldwide . 
+ This includes UPEC clones belonging to several multi-locus sequence types , including sequence type 131 ( ST131 ) , ST69 , ST73 and ST95 [ 2,3 ] . 
+ E. coli ST131 was originally identified in 2008 as a major clone linked to the spread of the CTX-M-15 extended-spectrum β-lactamase ( ESBL ) - resistance gene [ 4 -- 6 ] , the most widespread CTX-M ESBL enzyme worldwide [ 7,8 ] . 
+ ST131 strains have now been identified in both hospital and community settings from virtually all parts of the globe [ 9 -- 12 ] . 
+ ST131 causes a variety of extra-intestinal infections , most commonly UTI and bacteremia . 
+ Many ST131 strains exhibit resistance to multiple antibiotics , and therefore these infections are often associated with limited treatment options and frequent recurrences . 
+ The largest sub-clonal lineage of E. coli ST131 is resistant to fluoroquinolones and contains the type 1 fimbriae fimH30 ( H30 ) allele [ 13 ] . 
+ Three complete ST131 genome sequences have been generated . 
+ This includes SE15 [ 14 ] , EC958 [ 15 ] and JJ1886 [ 16 ] . 
+ Another ST131 strain , NA114 , while listed among the completely sequenced genomes on the NCBI database , remains in draft format [ 15,17 ] . 
+ This review will present an overview of our recent genomic analysis of ST131 and provide an update on the molecular characterization of the ST131 reference strain EC958 . 
+ 2. Global Epidemiology of ST131
+ ST131 belongs to the E. coli phylogenetic group B2 , which encompasses the largest group of E. coli associated with extra-intestinal infections . 
+ Based on phylogenetic analyses , the ST131 strains EC958 , NA114 and JJ1886 cluster together in a clade discrete from SE15 , and separate from representative strains from other E. coli phylogroups ( Figure 1 ) . 
+ Two recent studies have independently examined the global epidemiology of ST131 using genome sequence-based methods [ 18,19 ] . 
+ These studies identified a globally dominant fluoroquinolone resistant-FimH30 sub-lineage defined as H30 [ 18 ] or clade C [ 19 ] . 
+ All strains within this sub-lineage possessed the fluoroquinolone resistance alleles gyrA1AB and parC1aAB . 
+ Further analysis also revealed that ST131 strains containing the blaCTX-M-15 allele comprised a smaller subset of strains within this sub-lineage and were referred to as H30-Rx [ 18 ] or clade C2 [ 19 ] . 
+ Strikingly , the data from both studies supports the recent emergence and global dissemination of this sub-lineage from a single progenitor , provoking intriguing questions with respect to ST131 transmission , colonization and virulence . 
+ In addition to the dominant clade C that comprised 79 % of our sequenced ST131 strains , our analysis also identified two other well-supported ST131 clades referred to as A and B [ 19 ] . 
+ Clade A , represented by the reference strain SE15 , was the most divergent and comprised strains that contained the fimH41 allele . 
+ In contrast , strains from clade B were very similar to those from clade C and characterised by possession of the fimH22 allele . 
+ The prevalence of these fimH alleles , including the dominant H30 allele , is consistent with that reported previously from a large and extensive collection of ST131 strains [ 13 ] . 
+ Our own detailed genomic analysis focused on the major defining features of the three ST131 clades [ 19 ] . 
+ While sequence analysis did not reveal any significant association with geographic origin , the majority of the single nucleotide polymorphisms that defined each clade were strongly associated with recombination . 
+ In total , 137 regions were defined as recombinant within our ST131 strain set , with the majority of large recombinant regions located adjacent to insertion sites for prophages and mobile genetic elements . 
+ Other recombination regions within the ST131 strain set were also identified , some of which encompassed virulence genes including fimH , the fliC flagella major subunit gene , and genes involved in capsule and O antigen biosynthesis . 
+ One other notable recombination region encompassed the fimB recombinase gene that contributes to the regulation of type 1 fimbriae expression . 
+ Most ST131 strains from clade C have a 1,895 bp insertion element within the fimB gene ( fimB : : ISEc55 ) , suggesting they may possess an altered type 1 fimbriae expression profile . 
+ Indeed , the fimB : : ISEc55 insertion has been associated with a slower `` off '' - to - `` on '' type 1 fimbriae switching phenotype in ST131 [ 20,21 ] . 
+ We are currently investigating the impact of this insertion on ST131 virulence . 
+ 3. Molecular Characterisation of the ST131 Reference Strain EC958
+ EC958 is an O25b : H4 serotype strain isolated in 2005 from the urine of an 8-year old girl presenting with a community-acquired UTI in the United Kingdom [ 21 ] . 
+ The complete genome sequence of EC958 has been determined [ 15 ] . 
+ EC958 contains multiple genes associated with UPEC virulence , including genes encoding adhesins ( e.g. , type 1 fimbriae , curli and the afimbrial adhesin ) , autotransporter proteins ( e.g. , Ag43 , UpaG , UpaH and PicU ) and the biosynthesis of several siderophores ( enterobactin , aerobactin and yersiniabactin ) . 
+ Both EC958 and JJ1886 belong to the globally dominant CTX-M-15 positive , fluoroquinolone resistant , H30 clade C ST131 sub-lineage . 
+ The two strains display a high level of synteny at the core genome level , with major differences due to the number , content and location of genomic islands ( GIs ) and other mobile elements ( Figure 1 ) . 
+ For example , GI-selC is present in EC958 but not JJ1886 , while the Phi8 prophage is only present in JJ1886 . 
+ The two strains cluster distinct from the ST131 clade A SE15 strain . 
+ Based on whole-genome BLASTn comparisons , the major structural differences between EC958/JJ1886 and SE15 are the presence of seven prophage loci ( Phi1-Phi7 ) and four genomic islands ( GI-thrW , GI-pheV , GI-selC , and GI-leuX ) ( Figure 2 ) . 
+ Future examination of complete genomes of ST131 strains from different origins will be required to determine the extent of divergence of prophage , genomic islands and other mobile genetic elements in the ST131 clonal group . 
+ 4. Virulence of E. coli ST131
+ EC958 has been characterised extensively with respect to several virulence characteristics . 
+ The strain possesses the fimB : : ISEc55 insertion but can express type 1 fimbriae after several rounds of static subculture . 
+ The expression of type 1 fimbriae by EC958 is required for adherence to and invasion of human T24 bladder epithelial cells , and colonization of the mouse bladder [ 21 ] . 
+ In mice , E. coli EC958 causes acute and chronic UTI [ 22 ] . 
+ EC958 bladder infection involves the formation of intracellular bacterial communities ( IBCs ) in superficial epithelial cells and the subsequent release of rod-shaped and filamentous bacteria into the bladder lumen [ 22 ] . 
+ EC958 also causes impairment of rat ureteric contractility [ 23 ] . 
+ The ability of EC958 to resist the bactericidal action of human serum has been extensively interrogated using hyper-saturated transposon mutagenesis in combination with transposon directed insertion-site sequencing ( TraDIS ) [ 24 ] . 
+ TraDIS is a high-throughput functional genomics method that enables a pool of transposon mutants to be characterized by direct sequencing of DNA flanking transposon insertion sites [ 25 ] . 
+ In total , 56 genes were defined by TraDIS to comprise the EC958 serum resistome , of which 46 genes were validated by the generation and testing of specific mutants . 
+ The majority of these genes encode outer membrane proteins , or are associated with the biosynthesis of lipopolysaccharide ( LPS ) , the enterobacterial common antigen or colanic acid . 
+ Overall , the murein lipoprotein Lpp and two lipidA-core biosynthesis enzymes ( WaaP and WaaG ) were most strongly associated with serum resistance . 
+ The hyxR gene , which has previously been shown to contribute to the nitrosative stress response and intramacrophage survival of UPEC [ 26 ] , was also identified as a minor regulator of O-antigen chain length . 
+ 5. Plasmids of ST131
+ Plasmids represent a major vehicle for the carriage of antibiotic resistance genes . 
+ Among the Enterobacteriaceae , plasmids from a range of incompatibility ( Inc ) groups have been characterised that contain various combinations of resistance , conjugative transfer and other cargo genes . 
+ The diversity of plasmid types in ST131 has been examined , with 50 % of the most frequent gamma-proteobacterial plasmid groups identified within the ST131 lineage [ 28 ] . 
+ Our own analysis revealed that the majority of ST131 strains harbor an IncF plasmid , many of which are associated with the carriage of antibiotic resistance genes [ 29 ] . 
+ Indeed , complete genome sequencing of EC958 demonstrated it contains a large 135.6 kb plasmid that harbors two replicons ( RepFIA and RepFII ) and 12 antibiotic resistance genes ( including blaCTX-M-15 ) . 
+ The most closely related plasmid to pEC958 is pEK499 ( 99 % identity covering 85 % of pEC958 ; Figure 3 ) , which was also isolated from an ST131 strain in the United Kingdom [ 30 ] . 
+ Interestingly , despite the presence of the blaCTX-M-15 gene on pEC958 , we have shown that this is not the major determinant responsible for EC958 resistance to second and third generation cephalosporins . 
+ Instead , EC958 contains a chromosomally-located blaCMY-23 gene that drives this resistance phenotype [ 31 ] . 
+ We employed TraDIS as a novel approach to investigate the biology of pEC958 [ 29 ] . 
+ Analysis of TraDIS data from our saturated transposon mutant library of EC958 identified 27,317 reads that mapped to unique insertion sites in plasmid pEC958 ( i.e. , one insertion site every 4.96 bp ) . 
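+ The insertion density quoted above follows from the plasmid size reported earlier ( 135.6 kb ) and the 27,317 unique insertion sites. A short Python check of that arithmetic is shown below; the function and variable names are illustrative only.

```python
def mean_insertion_spacing(plasmid_length_bp, unique_insertion_sites):
    """Average number of base pairs per unique transposon insertion site."""
    if unique_insertion_sites <= 0:
        raise ValueError("need at least one insertion site")
    return plasmid_length_bp / unique_insertion_sites


# pEC958 is reported as 135.6 kb with 27,317 unique insertion sites.
spacing = mean_insertion_spacing(135_600, 27_317)
print(round(spacing, 2))  # 4.96
```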
+ Genetic elements required for pEC958 stability were identified in both the RepFIA and RepFII replicons ; the ccdA , sopA and sopB genes in RepFIA , and the copA , repA6 , repA1 , repA4 genes as well as the oriV region in RepFII . 
+ Interestingly , these data suggest a model where both replicons contain features that ensure their stable inheritance : replication in RepFII , and partition as well as post-segregational killing in RepFIA . 
+ Our analysis also identified EC958_A0140 as a novel gene of unknown function that is associated with pEC958 stability . 
+ Screening of the NCBI complete plasmid sequence database revealed EC958_A0140 is present in 17 other plasmids , all of which are IncF type except for pECL_A ( non-typable ) . 
+ However , bioinformatic analysis of EC958_A0140 did not yield any clues regarding its function and thus this remains an area of ongoing study . 
+ 6. Conclusions
+ Our current understanding of ST131 epidemiology supports its divergence into three discrete sub-lineages sometime before the year 2000 , with acquisition of multiple mobile genetic elements , associated recombination events and point-mutations jointly responsible for the emergence of the most prevalent clade C/H30 strains . 
+ Several studies have now reported the identification of ST131 strains resistant to last-line carbapenem antibiotics [ 32 -- 35 ] , highlighting the alarming scenario of pan-resistance in a UPEC clone that has already demonstrated its capacity to disseminate rapidly across the globe . 
+ Future work will explore the continued evolution of the globally dominant clade C/H30 group , and address important questions that relate to ST131 resistance , transmission , colonization and virulence . 
+ Acknowledgments
+ M.A.S. and S.A.B. would like to thank other members of their research teams who have contributed to this ongoing area of research . 
+ In addition , we thank our many national and international collaborators for their valuable contributions . 
+ This work was supported by a grant from the National Health and Medical Research Council ( NHMRC ) of Australia ( APP1067455 ) . 
+ M.A.S. is supported by an Australian Research Council Future Fellowship ( FT100100662 ) and S.A.B. is supported by an NHMRC Career Development Fellowship ( APP1090456 ) . 
+ Conflicts of Interest
+ 15 . 
+ Forde , B.M. ; Ben Zakour , N.L. ; Stanton-Cook , M. ; Phan , M.D. ; Totsika , M. ; Peters , K.M. ; Chan , K.G. ; Schembri , M.A. ; Upton , M. ; Beatson , S.A. . 
+ The complete genome sequence of Escherichia coli EC958 : A high quality reference sequence for the globally disseminated multidrug resistant E. coli O25b : H4-ST131 clone . 
+ PLoS ONE 2014 , 9 , e104400 . 
+ 16 . 
+ Andersen , P.S. ; Stegger , M. ; Aziz , M. ; Contente-Cuomo , T. ; Gibbons , H.S. ; Keim , P. ; Sokurenko , E.V. ; Johnson , J.R. ; Price , L.B. Complete genome sequence of the epidemic and highly virulent CTX-M-15-producing H30-Rx subclone of Escherichia coli ST131 . 
+ Genome Announc . 2013 , 1 , doi:10.1128/genomeA.00988-13 . 
+ 17 . 
+ Avasthi , T.S. ; Kumar , N. ; Baddam , R. ; Hussain , A. ; Nandanwar , N. ; Jadhav , S. ; Ahmed , N. Genome of multidrug-resistant uropathogenic Escherichia coli strain NA114 from India . 
+ J. Bacteriol . 
+ 2011 , 193 , 4272 -- 4273 . 
+ 18 . 
+ Price , L.B. ; Johnson , J.R. ; Aziz , M. ; Clabots , C. ; Johnston , B. ; Tchesnokova , V. ; Nordstrom , L. ; Billig , M. ; Chattopadhyay , S. ; Stegger , M. ; et al. . 
+ The epidemic of extended-spectrum-β-lactamase-producing Escherichia coli ST131 is driven by a single highly pathogenic subclone , H30-RX . 
+ MBio 2013 , 4 , e00377 -- 00313 . 
+ 19 . 
+ Petty , N.K. ; Ben Zakour , N.L. ; Stanton-Cook , M. ; Skippington , E. ; Totsika , M. ; Forde , B.M. ; Phan , M.D. ; Gomes Moriel , D. ; Peters , K.M. ; Davies , M. ; et al. . 
+ Global dissemination of a multidrug resistant Escherichia coli clone . 
+ Proc . Natl . Acad . Sci . USA 2014 , 111 , 5694 -- 5699 . 
+ 20 . 
+ Paul , S. ; Linardopoulou , E.V. ; Billig , M. ; Tchesnokova , V. ; Price , L.B. ; Johnson , J.R. ; Chattopadhyay , S. ; Sokurenko , E.V. Role of homologous recombination in adaptive diversification of extraintestinal Escherichia coli . 
+ J. Bacteriol . 
+ 2013 , 195 , 231 -- 242 . 
+ 21 . 
+ Totsika , M. ; Beatson , S.A. ; Sarkar , S. ; Phan , M.D. ; Petty , N.K. ; Bachmann , N. ; Szubert , M. ; Sidjabat , H.E. ; Paterson , D.L. ; Upton , M. ; et al. . 
+ Insights into a multidrug resistant Escherichia coli pathogen of the globally disseminated ST131 lineage : Genome analysis and virulence mechanisms . 
+ PLoS ONE 2011 , 6 , e26578 . 
+ 22 . 
+ Totsika , M. ; Kostakioti , M. ; Hannan , T.J. ; Upton , M. ; Beatson , S.A. ; Janetka , J.W. ; Hultgren , S.J. ; Schembri , M.A. . 
+ A FimH inhibitor prevents acute bladder infection and treats chronic cystitis caused by multidrug-resistant uropathogenic Escherichia coli ST131 . 
+ J. Infect . 
+ Dis . 
+ 2013 , 208 , 921 -- 928 . 
+ 23 . 
+ Floyd , R.V. ; Upton , M. ; Hultgren , S.J. ; Wray , S. ; Burdyga , T.V. ; Winstanley , C. Escherichia coli-mediated impairment of ureteric contractility is uropathogenic E. coli specific . 
+ J. Infect . 
+ Dis . 
+ 2012 , 206 , 1589 -- 1596 . 
+ 24 . 
+ Phan , M.D. ; Peters , K.M. ; Sarkar , S. ; Lukowski , S.W. ; Allsopp , L.P. ; Gomes Moriel , D. ; Achard , M.E. ; Totsika , M. ; Marshall , V.M. ; Upton , M. ; et al. . 
+ The serum resistome of a globally disseminated multidrug resistant uropathogenic Escherichia coli clone . 
+ PLoS Genet . 
+ 2013 , 9 , e1003834 . 
+ 25 . 
+ Langridge , G.C. ; Phan , M.D. ; Turner , D.J. ; Perkins , T.T. ; Parts , L. ; Haase , J. ; Charles , I. ; Maskell , D.J. ; Peters , S.E. ; Dougan , G. ; et al. . 
+ Simultaneous assay of every Salmonella typhi gene using one million transposon mutants . 
+ Genome Res . 
+ 2009 , 19 , 2308 -- 2316 . 
+ 26 . 
+ Bateman , S.L. ; Seed , P.C. Epigenetic regulation of the nitrosative stress response and intracellular macrophage survival by extraintestinal pathogenic Escherichia coli . 
+ Mol . 
+ Microbiol . 
+ 2012 , 83 , 908 -- 925 . 
+ 27 . 
+ Sullivan , M.J. ; Petty , N.K. ; Beatson , S.A. Easyfig : A genome comparison visualizer . 
+ Bioinformatics 2011 , 27 , 1009 -- 1010 . 
+ 28 . 
+ Lanza , V.F. ; de Toro , M. ; Garcillan-Barcia , M.P. ; Mora , A. ; Blanco , J. ; Coque , T.M. ; de la Cruz , F. Plasmid flux in Escherichia coli ST131 sublineages , analyzed by plasmid constellation network ( placnet ) , a new method for plasmid reconstruction from whole genome sequences . 
+ PLoS Genet . 
+ 2014 , 10 , e1004766 . 
+ 29 . 
+ Phan , M.D. ; Forde , B.M. ; Peters , K.M. ; Sarkar , S. ; Hancock , S. ; Stanton-Cook , M. ; Ben Zakour , N.L. ; Upton , M. ; Beatson , S.A. ; Schembri , M.A. Molecular characterization of a multidrug resistance IncF plasmid from the globally disseminated Escherichia coli ST131 clone . 
+ PLoS ONE 2015 , 10 , e0122369 . 
+ 30 . 
+ Woodford , N. ; Carattoli , A. ; Karisik , E. ; Underwood , A. ; Ellington , M.J. ; Livermore , D.M. Complete nucleotide sequences of plasmids pEK204 , pEK499 , and pEK516 , encoding CTX-M enzymes in three major Escherichia coli lineages from the united kingdom , all belonging to the international O25 : H4-ST131 clone . 
+ Antimicrob . 
+ Agents Chemother . 
+ 2009 , 53 , 4472 -- 4482 . 
+ 31 . 
+ Phan , M.D. ; Peters , K.M. ; Sarkar , S. ; Forde , B.M. ; Lo , A.W. ; Stanton-Cook , M. ; Roberts , L.W. ; Upton , M. ; Beatson , S.A. ; Schembri , M.A. Third-generation cephalosporin resistance conferred by a chromosomally encoded blaCMY-23 gene in the Escherichia coli ST131 reference strain EC958 . 
+ J. Antimicrob . 
+ Chemother . 
+ 2015 , 70 , 1969 -- 1972 . 
+ 32 . 
+ Johnson , T.J. ; Hargreaves , M. ; Shaw , K. ; Snippes , P. ; Lynfield , R. ; Aziz , M. ; Price , L.B. Complete genome sequence of a carbapenem-resistant extraintestinal pathogenic Escherichia coli strain belonging to the sequence type 131 H30R subclade . 
+ Genome Announc . 
+ 2015 , 3 , doi :10.1128 / genomeA.00272-15 . 
+ 33 . 
+ Accogli , M. ; Giani , T. ; Monaco , M. ; Giufre , M. ; Garcia-Fernandez , A. ; Conte , V. ; D'Ancona , F. ; Pantosti , A. ; Rossolini , G.M. ; Cerquetti , M. Emergence of Escherichia coli ST131 sub-clone H30 producing VIM-1 and KPC-3 carbapenemases , Italy . 
+ J. Antimicrob . 
+ Chemother . 
+ 2014 , 69 , 2293 -- 2296 . 
+ 34 . 
+ Cai , J.C. ; Zhang , R. ; Hu , Y.Y. ; Zhou , H.W. ; Chen , G.X. Emergence of Escherichia coli sequence type 131 isolates producing KPC-2 carbapenemase in China . 
+ Antimicrob . 
+ Agents Chemother . 
+ 2014 , 58 , 1146 -- 1152 . 
+ 35 . 
+ Naas , T. ; Cuzon , G. ; Gaillot , O. ; Courcol , R. ; Nordmann , P . 
+ When carbapenem-hydrolyzing β-lactamase KPC meets Escherichia coli ST131 in France . 
+ Antimicrob . 
+ Agents Chemother . 
+ 2011 , 55 , 4933 -- 4934 . 
+ © 2015 by the authors ; licensee MDPI , Basel , Switzerland . 
+ This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution license ( http://creativecommons.org/licenses/by/4.0/ )
\ No newline at end of file
--- a/data/TEXT_FILES/notuseful_txt/26389830.txt 0 → 100644
+++ b/data/TEXT_FILES/notuseful_txt/26389830.txt 0 → 100644
+ Salmonella enterica Serovar Typhimurium
+ Abstract 
+ The alternative sigma factor σE functions to maintain bacterial homeostasis and membrane integrity in response to extracytoplasmic stress by regulating thousands of genes both directly and indirectly . 
+ The transcriptional regulatory network governed by σE in Salmonella and E. coli has been examined using microarray ; however , a genome-wide analysis of σE -- binding sites in Salmonella has not yet been reported . 
+ We infected macrophages with Salmonella Typhimurium over a select time course . 
+ Using chromatin immunoprecipitation followed by high-throughput DNA sequencing ( ChIP-seq ) , 31 σE -- binding sites were identified . 
+ Seventeen sites were new , which included outer membrane proteins , a quorum-sensing protein , a cell division factor , and a signal transduction modulator . 
+ The consensus sequence identified for σE in vivo binding was similar to the one previously reported , except for a conserved G and A between the -35 and -10 regions . 
+ One third of the σE -- binding sites did not contain the consensus sequence , suggesting there may be alternative mechanisms by which σE modulates transcription . 
+ By dissecting direct and indirect modes of σE-mediated regulation , we found that σE activates gene expression through recognition of both canonical and reversed consensus sequences . 
+ New σE regulated genes ( greA , luxS , ompA and ompX ) are shown to be involved in heat shock and oxidative stress responses . 
+ Introduction
+ Whereas household sigma factor σ70 is responsible for promoting transcription of a large number of genes in a bacterial cell , alternative sigma factors are produced or activated when cells undergo particular physiological stresses [ 1 ] [ 2 ] . 
+ The alternative sigma factor E ( σE ) responds to extracytoplasmic cues generated by temperature , osmotic , or oxidative stress , which releases σE from sequestration at the inner membrane through a series of protein cleavage events . 
+ Free σE in the cytoplasm binds core RNA polymerase and initiates transcription at σE-dependent promoters , resulting in a specific response to promote restoration of homeostasis in the cell [ 3 ] [ 4 ] . 
+ In Salmonella enterica serovar Typhimurium ( referred to as STM or Salmonella hereafter ) , deletion of σE ( ΔrpoE ) is lethal for intracellular survival in macrophages ; the strongest phenotype among regulator deletion strains examined [ 5 ] . 
+ The transcriptional regulatory network controlled by σE has been investigated in previous studies [ 6 ] [ 7 ] [ 8 ] [ 9 ] . 
+ Recent findings from our group indicated that approximately 58 % of the entire Salmonella genome was regulated by σE , an effect most likely produced by modulation of the expression of multiple general regulators [ 10 ] . 
+ Aided by sample-matched global proteomic and transcriptomic analyses , we found that σE regulated Salmonella gene expression not only at the transcriptional level , but also by a post-transcriptional mechanism , which was partially dependent on the RNA-binding protein Hfq [ 11 ] . 
+ The above studies suggested that the majority of transcriptional regulation mediated by σE occurred indirectly , driven by a small number of genes that were directly regulated by σE . 
+ Therefore , it is essential to identify σE-binding sites globally to understand the intricacies of σE-mediated gene regulation . 
+ Chromatin-immunoprecipitation followed by high-throughput DNA sequencing ( ChIP-seq ) has been successfully used in the identification of chromosomal binding sites of various regulators in bacteria [ 12 ] [ 13 ] [ 14 ] . 
+ It offers higher resolution , lower noise , and greater coverage than its array-based predecessor ChIP-chip [ 15 ] . 
+ We selected ChIP-seq as a method for genome-wide profiling of σE -- binding sites in STM . 
+ Our results indicated some σE -- binding sites resided 5 ' to divergent flanking genes , and raised the question whether σE regulated these genes equally . 
+ Moreover , a previous study from the Gross group mentioned the possibility that a reversed consensus sequence of σE might be involved in repressing gene expression , exemplified by ompX [ 8 ] . 
+ We elucidated the above questions by constructing consensus sequence substituted strains , and dissecting direct from indirect regulatory effects of σE . 
+ In this study we identified 31 σE -- binding sites on the STM genome during in vivo infection of Raw264.7 macrophages with STM and determined a consensus sequence for σE-binding . 
+ However , this consensus sequence was not the only mechanism utilized by σE in transcriptional regulation . 
+ Moreover , σE did not regulate its flanking genes equally when the σE-binding site resided between bi-directional promoters . 
+ σE activated gene expression through binding the consensus or reversed consensus sequence . 
+ Finally , we found that new targets directly regulated by σE were involved in response to heat shock and oxidative stress . 
+ Materials and Methods
+ Bacterial strains and growth conditions
+ Salmonella enterica Serovar Typhimurium ATCC 14028s was used as the parent strain in this study . 
+ The rpoE-deletion strain ( ΔrpoE ) was constructed using λ red recombination system as described [ 5 ] . 
+ The consensus sequence substituted mutants were constructed by replacing the σE -- binding motif of greA , ompX , ompA , luxS and rpoE with a substitutive consensus sequence consisting of nucleotides with the least prevalence at the corresponding positions in the σE consensus sequence ( ATTTGCGGGAACATGCGAAGACTGACTG ) . 
+ The substitutive consensus sequence was synthesized with SacI and AvrII restriction sites at two ends , and ligated into pKD13 modified plasmid [ 16 ] , resulting in a new plasmid pKD13-RpoE1 . 
+ λ red recombination was used to construct the consensus sequence substituted mutants with pKD13-RpoE1 plasmid similar to the ΔrpoE strain . 
+ All substitutions were validated by DNA sequencing . 
+ Primers used for construction of the above mutants are shown in S1 Table . 
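+ The design rule for the substitutive sequence (at each motif position, take the nucleotide that is least prevalent in the σE consensus) can be sketched as follows. This is only an illustration: the function name and the toy frequency matrix are assumptions made for the example and are not the published σE motif or the authors' procedure.

```python
from typing import Dict, List

BASES = "ACGT"


def least_prevalent_sequence(freqs: List[Dict[str, float]]) -> str:
    """Return the sequence built from the least frequent base at each position.

    `freqs` is a position frequency matrix: one dict per motif position,
    mapping each base to its observed frequency. Ties are resolved in the
    order of BASES (A, C, G, T).
    """
    return "".join(min(BASES, key=lambda b: col.get(b, 0.0)) for col in freqs)


# Hypothetical 4-position matrix, for illustration only (NOT the sigma-E motif).
toy_matrix = [
    {"A": 0.70, "C": 0.10, "G": 0.05, "T": 0.15},
    {"A": 0.02, "C": 0.08, "G": 0.10, "T": 0.80},
    {"A": 0.10, "C": 0.60, "G": 0.25, "T": 0.05},
    {"A": 0.25, "C": 0.25, "G": 0.40, "T": 0.10},
]

substitute = least_prevalent_sequence(toy_matrix)
```

+ A sequence built this way is as far as possible from the motif at every position, which is why replacing the native motif with it is expected to abolish motif-dependent binding.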
+ For in vitro study , the bacteria were grown overnight in Luria-Bertani ( LB ) medium , washed twice in pH 5.8 , low phosphate , low magnesium-containing medium ( LPM ) , and resuspended in LPM at 1:10 dilution for an additional 4 h , 8 h or 20 h [ 17 ] . 
+ All bacterial cultures were grown in triplicate . 
+ Anti-σE antibody generation
+ The anti-σE antibody ( Ab ) was generated as previously described [ 10 ] . 
+ Briefly , the rpoE gene of STM was cloned into the plasmid pET200/D-TOPO ( Invitrogen ) and transformed into BL21 ( DE3 ) E. coli strain ( Invitrogen ) . 
+ Transformed bacteria were induced using isopropyl β-D-1-thiogalactopyranoside ( IPTG ) , lysed and sonicated for purification of recombinant σE using HisPur™ Cobalt resin ( Pierce ) . 
+ The eluted protein was separated on SDS-PAGE , and stained with Coomassie blue . 
+ The single gel bands at the size of recombinant RpoE ( His tagged ) were excised and sent to Pacific Immunology Corp. ( Ramona , CA ) for polyclonal antisera production . 
+ The σE antisera generated in rabbits were purified by affinity chromatography . 
+ CH Sepharose 4B ( GE Healthcare ) was coupled with recombinant σE , then the σE antisera were loaded in a chromatography column ( Bio-Rad ) . 
+ After washes , the column was eluted with 200 mM glycine pH 2.0 , then adjusted to pH 7.0 with 5 N NaOH . 
+ Fractions containing purified anti-σE Ab as judged by SDS-PAGE were frozen at -20 °C . 
+ Protein concentration determination was performed according to modified Lowry method using bovine serum albumin ( BSA ) as reference protein [ 18 ] . 
+ Immunoblot analysis
+ The WT and ΔrpoE strains were cultured in LB for 4 h. Cells were washed and approximately 5 × 10^7 colony-forming units were pelleted and re-suspended in Laemmli sample buffer , boiled for 5 min , and then separated on SDS-PAGE . 
+ Proteins on the gel were next transferred to polyvinylidene difluoride ( PVDF ) membrane ( Millipore ) . 
+ After blocking in Tris-buffered saline ( TBS ) plus 5 % nonfat dry milk for 1 h , membranes were probed with anti-σE Ab or anti-GFP Ab at 1:1000 , 1:3000 , and 1:5000 dilutions . 
+ The flow-through of rabbit immune sera during affinity chromatography purification of anti-σE Ab was also used to probe the membrane as control . 
+ Membranes were washed and probed with secondary antibody ( anti-rabbit IgG conjugated with peroxidase ) ( Cell Signaling Technology ) . 
+ The immune complexes were detected via chemiluminescence using Western Lightning™ ( PerkinElmer ) , and then exposed to XAR Bio-film ( Kodak ) . 
+ Salmonella infection
+ The Raw264.7 macrophage cell line was purchased from ATCC and grown in Dulbecco 's modified Eagle 's medium ( DMEM ) containing 10 % fetal calf serum in 6-well plates until reaching confluency of 80 -- 90 % . 
+ STM grown overnight in LB was washed and diluted with PBS to an appropriate concentration . 
+ Raw264.7 cells were infected with STM at MOI = 100 for 4 , 8 , or 18 h. Infections were initiated by centrifuging the bacteria onto the cell monolayers at 1,000 × g for 5 min and plates were incubated at 37 °C with 5 % CO2 for 1 h. To remove extracellular bacteria after internalization , cells were washed with PBS and incubated in DMEM containing 10 % fetal calf serum and gentamicin ( 100 μg / ml ) for 1 h . 
+ The cells were then washed with PBS and overlaid with DMEM containing 10 % fetal calf serum and 20 μg / ml gentamicin for the remainder of the experiment . 
+ After 4 , 8 , or 18 h infection with STM , Raw264.7 cells were washed with PBS and scraped off the plates . 
+ The cells were collected and cross-linked by 1 % formaldehyde at room temperature for 25 min with shaking , then quenched using 125 mM glycine for an additional 5 min of incubation at room temperature . 
+ Chromatin Immunoprecipitation assay was performed as described [ 10 ] . 
+ Cells were lysed and sonicated , and the supernatant was split into two samples . 
+ One was mixed with affinity purified rabbit anti-σE Ab to immunoprecipitate σE -- DNA complex , and the other sample was mixed with rabbit monoclonal Ab to GFP as control . 
+ After overnight incubation at 4 °C , 50 μl of the Dynabead M-280 sheep anti-rabbit IgG ( Invitrogen ) was added into the mixture and further incubated at 4 °C for 6 h. Beads were washed and resuspended in elution buffer ( 50 mM Tris-HCl at pH 8.0 , 10 mM EDTA , and 1 % SDS ) and incubated at 65 °C overnight to reverse the cross-linking . 
+ Protein and RNA were removed from the samples and DNA was purified with a PCR purification kit ( QIAGEN ) . 
+ Gene-specific quantitative PCR was carried out using the DNA samples ( primers in S2 Table ) . 
+ The DNA was then combined with 5x Sequenase buffer and Random 9-Ns primer and cycled to 94 °C for 2 min then cooled to 10 °C . 
+ A mix of 5x Sequenase buffer , dNTP mix , BSA , DTT , and Sequenase was added to the DNA and ramped up from 10 °C to 37 °C at 0.1 °C / s , held at 37 °C for 8 min , heated to 94 °C for 2 min and then cooled to 10 °C . 
+ Sequenase dilution buffer and Sequenase enzyme were added and the sample was ramped up from 10 °C to 37 °C at 0.1 °C / s , held at 37 °C for 8 min and then cooled to 4 °C . 
+ Samples were diluted and combined with a mix of pfu buffer , dNTP mix , Rand universal primer , and pfu polymerase . 
+ The DNA was cycled to 94 °C for 30 s , 40 °C for 30 s , 50 °C for 30 s , and 72 °C for 2 min for 25 cycles . 
+ The amplified DNA was purified using the Qiagen PCR purification kit according to protocol . 
+ The purified DNA was precipitated using 3 M sodium acetate and ethanol overnight . 
+ Samples were centrifuged at 4 °C for 1 h and subsequently washed with 70 % ethanol , dried and resuspended in water . 
+ Libraries for sequence analysis on the Illumina HiSeq 2000 using V3 chemistry were generated per the Illumina TruSeq standard protocol . 
+ Peak calling
+ HiSeq fastq files were generated and loaded into the CLC Genomics Workbench 7 software ( CLCBio , Boston MA ) for ChIP peak identification . 
+ Imported reads were mapped to the reference genome retrieved from NCBI . 
+ Default mismatch rates were applied , with length and similarity fractions both at 90 % . 
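+ As a rough illustration of what the two 90 % thresholds mean, the sketch below filters a single alignment. It assumes that the length fraction is the share of the read that must align and that the similarity fraction is the identity required within the aligned part; the class and function names are hypothetical and do not correspond to any CLC Genomics Workbench API.

```python
from dataclasses import dataclass


@dataclass
class Alignment:
    read_length: int     # total length of the read, in bases
    aligned_length: int  # number of read bases included in the alignment
    matches: int         # aligned bases identical to the reference


def passes_mapping_filter(aln: Alignment,
                          length_fraction: float = 0.9,
                          similarity_fraction: float = 0.9) -> bool:
    """Keep an alignment only if it meets both fraction thresholds."""
    if aln.read_length <= 0 or aln.aligned_length <= 0:
        return False
    covered = aln.aligned_length / aln.read_length  # share of the read aligned
    identity = aln.matches / aln.aligned_length     # identity within that part
    return covered >= length_fraction and identity >= similarity_fraction
```

+ With these settings a 100-base read must align over at least 90 bases, and at least 90 % of the aligned bases must match the reference.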
+ All experimental samples were compared to the reference controls to identify binding events of interest . 
+ The most statistically significant peak calls at each time point were then visually verified using the GenomeView genome browser [ 19 ] . 
+ Quantitative RT-PCR analysis
+ Total RNA was isolated by RNeasy mini kit , combined with RNase-free DNase for on-column DNase digestion ( Qiagen ) according to manufacturer 's instructions . 
+ cDNA was synthesized using the iScript cDNA synthesis kit ( Bio Rad ) . 
+ The amount of cDNA corresponding to 10 ng of input RNA was used as template for real-time reaction containing Power SYBR green ( Applied Biosystems ) and gene-specific primers . 
+ The primers were designed with Primer Express 3.0 software and tested for amplification efficiencies ( S2 Table ) . 
+ The gyrB gene , encoding the B subunit of DNA gyrase , was utilized as endogenous control . 
+ The RT-PCR reactions were carried out at 95 °C for 10 min , 95 °C for 15 s and 60 °C for 1 min for 40 cycles ( ABI 7700 , Applied Biosystems ) . 
+ The expression ratio of each gene was the average from three independent RNA samples and was normalized to the level of gyrB . 
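+ The normalization to gyrB can be sketched with the common 2^-ΔCt calculation. The text does not state which formula was used, so this is an assumption, as is the roughly 100 % amplification efficiency implied by the base of 2; the function names and Ct values are illustrative only.

```python
from statistics import mean
from typing import Iterable, Tuple


def relative_expression(ct_target: float, ct_reference: float) -> float:
    """Expression of a target gene relative to the reference gene (2^-dCt)."""
    return 2.0 ** -(ct_target - ct_reference)


def mean_relative_expression(ct_pairs: Iterable[Tuple[float, float]]) -> float:
    """Average the relative expression over independent RNA samples.

    Each pair is (Ct of target gene, Ct of reference gene such as gyrB).
    """
    return mean(relative_expression(t, r) for t, r in ct_pairs)


# Hypothetical Ct values for three independent RNA samples.
example = mean_relative_expression([(22.0, 20.0), (21.0, 20.0), (20.0, 20.0)])
```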
+ We recently characterized the set of genes affected by σE in STM 14028s by microarray and identified 2533 genes exhibiting σE-dependent transcription during growth in nutrient-rich and acidic minimal media . 
+ However , only 81 genes ( 3 % ) were regulated by σE in all of the growth conditions examined , suggesting genes regulated by σE were tuned by growth conditions that activated σE via different environmental cues [ 10 ] . 
+ In order to study the genomic binding sites of σE under conditions approximating the phagosomal environment , we infected murine macrophage Raw264.7 with STM and performed ChIP-seq on intracellular bacteria to identify genes directly bound by σE on the STM genome . 
+ To avoid the possible influence of epitope tagging on protein function , we generated primary anti-σE Ab , purified with affinity chromatography , and assessed its specificity by Western blot before applying it to ChIP . 
+ A single band at the correct size of σE ( 22 kDa ) was detected in the cell lysate of the WT strain and absent in the ΔrpoE using anti-σE Ab ( S1 Fig ) . 
+ When diluted 1:5000 , the anti-σE Ab yielded a single reactive band . 
+ This evidence indicated that the generated anti-σE Ab was highly specific and had strong affinity . 
+ The monoclonal anti-GFP Ab , which did not show any cross reaction with STM , was used as negative control in the ChIP-seq experiments ( S1 Fig ) . 
+ After peaks were called , the most statistically significant from each time point were visually verified . 
+ To pass this step , the read counts composing a peak were required to be substantially higher than background and also needed to be distributed evenly on both sides of the transcript ( Fig 1 ) . 
+ The peaks were ambiguous for the in vivo 4 h infection . 
+ We believe that the signal at this early time was not strong enough to provide robust results , so we excluded it from the analysis . 
+ A total of 31 ChIP-enriched peaks were identified for the 8 h and 18 h infections . 
+ Fourteen of the peak-associated genes were previously shown to be directly regulated by σE [ 8,9 ] , supporting the effectiveness of the ChIP-seq procedure ( Table 1 ) . 
+ The remaining 17 genes represented new targets of varying function , encoding outer membrane proteins ( ompA , ompF , and ompN ) , a quorum sensing protein ( luxS ) [ 20 ] , a cell division factor ( minD/minE ) [ 21 ] , a signal transduction modulator ( sixA ) [ 22 ] , and hypothetical proteins ( stm14_1163 , stm14_1514 , stm14_1018 , stm14_2321 , stm14_2952 ) . 
+ Validation of in vivo σE-binding sites
+ To validate σE-binding , we performed ChIP-qPCR on target genomic loci and determined their fold enrichment in pulldowns using anti-σE Ab versus anti-GFP Ab ( control ) . 
+ A known σE -- binding site , rpoE promoter 3 , was pulled down in our ChIP-seq assay and used as positive control , and a non-target site , the promoter region of hns , was chosen as negative control [ 10 ] . 
+ For the 8 h infection , except for the binding site upstream of spf , all the ChIP-seq peaks exhibited more than 2-fold enrichment when compared to control samples , and were therefore considered authentic σE-binding sites ( Fig 2 ) . 
+ For the 18 h infection , all of the ChIP-seq peaks were validated . 
+ For both conditions , binding sites upstream of luxS , stm14_1018 , ompX , stm14_2321 , and yhjJ exhibited enrichment as high as the positive control ( rpoE , 64-fold ) , hence were considered high-affinity binding sites for σE ( Fig 2 ) . 
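+ The validation logic above can be sketched as a fold-enrichment calculation followed by a threshold check. The 2^(Ct_control - Ct_IP) formula assumes equal input and about 100 % amplification efficiency and is not stated in the text; using 64-fold (the rpoE positive control) as the cut-off for a high-affinity site is one reading of the criteria, and all names are hypothetical.

```python
def fold_enrichment(ct_ip: float, ct_control: float) -> float:
    """Enrichment of a locus in the anti-sigma-E pulldown over the anti-GFP control."""
    return 2.0 ** (ct_control - ct_ip)


def classify_site(fold: float, min_fold: float = 2.0, high_fold: float = 64.0) -> str:
    """Label a ChIP-seq peak from its ChIP-qPCR fold enrichment."""
    if fold >= high_fold:
        return "high-affinity"   # as enriched as the positive control
    if fold > min_fold:
        return "validated"       # more than 2-fold over control
    return "not validated"


# Hypothetical Ct values: the locus amplifies 6 cycles earlier in the pulldown.
label = classify_site(fold_enrichment(ct_ip=24.0, ct_control=30.0))
```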
+ Identification of the consensus sequence of σE-binding in vivo
+ The sequences of the 31 binding regions determined by our ChIP-seq experiments were analyzed with the MEME suite tool to identify a consensus sequence for σE-binding [ 23 ] . 
+ For 20 of these peak regions , a consensus motif was identified ( p-value < 0.0001 ) . 
+ [ Figure caption fragment : calls for each replicate 8 h and 18 h are shown . The peaks in the blue box correspond to the binding site of σE to 5 ' region of the gene rpoE , while those in the orange box correspond to the binding site 5 ' to the gene rseA . ] 
+ This A-T rich σE consensus sequence is largely similar to the one that was identified previously for STM [ 9 ] , except that the nucleotides between the -35 and -10 regions appeared to contain conserved G and A in our in vivo σE -- binding consensus sequence . 
+ In addition to the six genes ( rpoE , rpoH , fkpA , htrA , yfiO , and ygiM ) that were previously identified to contain σE-dependent promoters [ 9 ] , the σE consensus sequence was found upstream of genes encoding outer membrane proteins ( ompA , ompF , ompN , and ompX ) , a putative permease ( perM ) , a putative cytoplasmic protein ( ybhP ) [ 24 ] , a transcription elongation factor ( greA ) [ 25 ] , and the quorum sensing autoinducer 2 synthase ( luxS ) [ 20 ] ( Fig 3 ) . 
+ There were 11 σE-binding sites ( cspE , galF , minD , rpoD , sixA , spf , stm14_2321 , stm14_2952 , surA , yggN , yhjJ ) validated in our study that did not contain any recognizable consensus sequence . 
+ However , some of these sites exhibited high σE-binding capacity , such as stm14_2321 , yhjJ ( > 64-fold enrichment ) , and stm14_1514 , surA , sixA , stm14_2952 , galF , minD ( > 16-fold enrichment ) ( Fig 2 ) . 
+ This observation suggested that , besides the major regulatory mechanism of using the canonical consensus sequence , σE may utilize additional complementary mechanisms to regulate gene transcription . 
+ Establishing in vitro condition to mimic the in vivo σE-binding
+ The transcriptional profile of σE for STM intracellular infection is elusive because the rpoE mutation is so deleterious that the mutant can survive for no more than 30 min post-infection [ 5 ] . 
+ Therefore , we sought an in vitro method that could reproduce in vivo σE-binding conditions . 
+ STM grown in LPM was used for mimicking in vivo infection [ 26 ] ; here , the bacteria were cultured in LPM for 4 , 8 , and 18 h and then subjected to ChIP-qPCR as performed on in vivo samples . 
+ σE-binding was evaluated on 15 sites selected from the in vivo ChIP-seq results ( Fig 4 ) . 
+ None of the three in vitro conditions could achieve similar binding capacity of σE on these sites as was seen in the in vivo conditions . 
+ However , the LPM 4 h condition produced the most favorable result among them : except for htrA , the remaining binding sites were detected at levels higher than the 2-fold threshold . 
+ Hence , we used the LPM 4 h condition to study σE-mediated gene regulation in an effort to approximate in vivo binding conditions . 
+ Dissecting direct versus indirect regulation mediated by σE
+ Although 31 σE-binding sites were recognized as directly regulated by σE during in vivo infection ( Table 1 , Fig 2 ) , microarray was not sufficient to evaluate the regulatory effects of σE on these genes through comparing WT and ΔrpoE because this comparison measured total regulation mediated by σE . 
+ Indirect regulation mediated by σE , largely through controlled expression of multiple general regulators , accounted for the majority of the total regulation observed [ 10 ] . 
+ To dissect direct from indirect σE -- mediated regulation for greA , ompX , ompA , luxS , and rpoE , we replaced the authentic consensus sequence upstream of these genes with a scrambled consensus sequence , which resulted in 5 mutants designated greA-RpoE1 , ompX-RpoE1 , ompA-RpoE1 , luxS-RpoE1 , and rpoE-RpoE1 . 
+ Mutant strains were cultured in LPM for 4 h and the σE binding capacity on 5 designated binding sites ( greA , ompX , ompA , luxS , and rpoE promoter region ) was examined using ChIP-qPCR ( Fig 5 ) . 
+ As expected , mutation of the σE-binding motif upstream of each gene eliminated ( greA , ompA , luxS ) or decreased ( ompX ) σE binding of each mutant only at its corresponding site , while leaving the other sites intact for σE binding . 
+ These results further confirmed the direct occupation of σE on these selected σE-binding motifs . 
+ Of the 31 σE-binding sites identified , 16 of them were located between bi-directional promoter elements ( Table 1 ) . 
+ Since the σE-binding site was 5 ' to genes in both directions , we further investigated if σE regulated these genes equally . 
+ Two σE-binding sites were chosen , one located between dacB and greA , with a second between ompX and ybiF ( Fig 6A ) . 
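+ One way to flag a binding site that lies between bi-directional promoters is to check whether its nearest upstream neighbor is on the minus strand and its nearest downstream neighbor is on the plus strand, so that the site is 5 ' to both. The sketch below uses hypothetical gene names and coordinates and is not the annotation procedure used in the study.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Gene:
    name: str
    start: int   # leftmost coordinate on the chromosome
    end: int     # rightmost coordinate on the chromosome
    strand: str  # "+" or "-"


def divergent_flanking_genes(site: int, genes: List[Gene]) -> Optional[Tuple[Gene, Gene]]:
    """Return the two genes flanking `site` if they are divergently transcribed."""
    left = [g for g in genes if g.end < site]
    right = [g for g in genes if g.start > site]
    if not left or not right:
        return None
    nearest_left = max(left, key=lambda g: g.end)
    nearest_right = min(right, key=lambda g: g.start)
    # Divergent pair: left gene reads leftward, right gene reads rightward.
    if nearest_left.strand == "-" and nearest_right.strand == "+":
        return nearest_left, nearest_right
    return None


# Hypothetical annotation, for illustration only.
toy_genes = [
    Gene("geneA", 100, 500, "-"),
    Gene("geneB", 700, 1200, "+"),
    Gene("geneC", 1400, 1800, "+"),
]
pair = divergent_flanking_genes(600, toy_genes)
```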
+ WT , ΔrpoE , and consensus sequence substituted mutants ( greA-RpoE1 or ompX-RpoE1 ) were cultivated in LPM for 4 h , and expression of the genes in question was measured using qRT-PCR with gyrB serving as the internal control . 
+ The total regulatory effect of σE on a particular gene was calculated as its expression ratio in WT versus ΔrpoE . 
+ The direct contribution of σE was measured by the expression ratio in WT versus the consensus sequence substituted mutant , and the indirect contribution of σE was measured by the expression ratio in ΔrpoE versus the consensus sequence substituted mutant . 
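+ The three ratios defined above can be written out directly. In this sketch the expression values are hypothetical and chosen only to show how a gene can look repressed in the WT versus ΔrpoE comparison while being directly activated; the names are not taken from the study.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class Expression:
    wt: float            # gyrB-normalized expression in the WT strain
    delta_rpoe: float    # expression in the rpoE deletion strain
    motif_mutant: float  # expression in the consensus sequence substituted mutant


def regulation_ratios(e: Expression) -> Dict[str, float]:
    """Total, direct, and indirect effects as defined in the text."""
    return {
        "total": e.wt / e.delta_rpoe,               # WT versus deletion
        "direct": e.wt / e.motif_mutant,            # WT versus motif mutant
        "indirect": e.delta_rpoe / e.motif_mutant,  # deletion versus motif mutant
    }


# Hypothetical values: total ratio below 1, direct ratio above 1.
ratios = regulation_ratios(Expression(wt=4.0, delta_rpoe=8.0, motif_mutant=1.0))
```

+ With these definitions the direct ratio equals the total ratio multiplied by the indirect ratio, so a large indirect term can mask direct activation when only WT and ΔrpoE are compared.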
+ Based on total regulation alone , σE appeared to regulate dacB rather than greA . 
+ However , the indirect regulation of σE on greA was much higher than that on dacB , so that the net regulation directly mediated by σE was significant on greA but not on dacB ( Fig 6B ) . 
+ Similarly , dissecting σE regulation indicated that σE activated ompX instead of ybiF ( Fig 6C ) . 
+ Three more σE-binding sites ( 5 ' to ompA , luxS , and rpoE ) were examined . 
+ σE activated each locus through direct occupation of their σE consensus sequence ( Fig 6D ) . 
+ Moreover , the direct effect of σE in regulating its own expression ( rpoE ) was much lower than the total and the indirect contributions of σE . 
+ The dissection of σE-mediated regulation clarified that σE activated gene expression by binding to the consensus sequence ( such as greA ) or reverse consensus sequence ( such as ompX ) . 
+ The previous finding that σE represses gene expression , based on comparing the level of expression in WT and ΔrpoE , was therefore misleading , in that indirect regulation of σE expression itself played a large role in defining activation versus repression . 
+ When the indirect regulation of σE is subtracted from the total regulation mediated by σE , the resulting direct regulation of σE is activation of gene expression . 
+ Together , 
+ Genes regulated by σE are involved in heat shock and oxidative stress response
+ The alternative sigma factor σE is activated under extracytoplasmic stress such as heat shock , ethanol , osmotic stress , immune response etc. [ 4 ] . 
+ By regulating gene expression at both the transcriptional and post-transcriptional levels [ 11 ] , σE functions to maintain the integrity and homeostasis of the cell . 
+ We further investigated if newly identified targets of σE-binding served to fulfill this purpose . 
+ The WT and consensus sequence substituted mutants ( greA-RpoE1 , ompX-RpoE1 , ompA-RpoE1 , luxS-RpoE1 , and rpoE-RpoE1 ) were challenged by heat shock and oxidative stress ( Fig 7 ) . 
+ Compared to WT , the susceptibility of all mutants to heat shock was significantly altered at 90 min post-challenge . 
+ Except for greA-RpoE1 , all other mutants exhibited lower resistance to heat shock . 
+ At the 3 h time point , the greA-RpoE1 strain recovered to WT levels , whereas the remaining mutants were significantly hyper-sensitive to heat shock when compared to WT ( Fig 7A ) . 
+ Under oxidative stress , the mutant strains exhibited higher ( greA-RpoE1 ) or lower ( rpoE-RpoE1 and ompX-RpoE1 ) susceptibility than WT at 60 min post-challenge . 
+ At the 2 h time point , all mutant strains showed significantly altered sensitivity to oxidative stress compared to WT ( Fig 7B ) . 
+ These results suggested the genes regulated by σE ( greA , luxS , ompA , and ompX ) were involved in heat shock and oxidative stress response . 
+ Discussion
+ The transcriptional profile of σE has been studied using microarray by comparing WT to ΔrpoE strain or by comparing WT to rpoE-overexpressed strain [ 10 ] [ 8 ] . 
+ Since microarray cannot distinguish direct from indirect regulation , a large number of genes were found to be regulated by σE , and the number of genes activated by σE was similar to the number repressed [ 10 ] . 
+ Although the Salmonella σE-regulon associated with direct binding has been studied using an E. coli two-plasmid screening system [ 9 ] , the genome-wide identification of σE-binding sites during Salmonella in vivo infection has not been investigated . 
+ The in vivo σE-binding sites are likely different from in vitro because the signals perceived by σE from these two environments are different . 
+ Thirty-one σE-binding sites were identified during Salmonella in vivo infection in this study , and when compared with in vitro culture on fifteen of these sites , the binding capacity of σE in LPM 4 , 8 , and 20 h culture was lower on the majority of the sites measured ( Fig 4 ) . 
+ How σE selectively binds to some sites but not other potential sites under certain conditions is not known . 
+ However , other general regulators such as H-NS , IHF , and Fis might play a role in affecting the direct binding of σE to these sites [ 14 ] . 
+ Out of the thirty-one in vivo σE-binding sites , sixteen were located upstream of bi-directional promoter elements ( Table 1 ) . 
+ We found that those flanking genes were not equally regulated by σE under the conditions examined ( Fig 6 ) . 
+ The σE consensus sequence predicted here during Salmonella infection was similar to the consensus sequence identified by the E. coli two-plasmid system [ 9 ] , and also comparable to the consensus sequence identified in E. coli [ 8 ] , suggesting conservation of sequence features related to σE-binding . 
+ Both this and previous studies found that the σE consensus sequence located upstream of ompX was reversed , and the microarray data indicated that σE represses ompX expression [ 8 ] [ 10 ] . 
+ Since ompX was found to be directly regulated by σE , the repression effect of σE on outer membrane proteins was hypothesized as a conserved feature [ 8 ] . 
+ We constructed a consensus sequence substituted strain and by comparing with WT and ΔrpoE strains , successfully dissected the direct versus indirect contribution of σE . 
+ The results clearly demonstrated that σE directly activated ompX expression through the reversed consensus sequence . 
+ The perception of σE-mediated repression was due to the high-level of indirect effects of σE . 
+ In addition to ompX , other genes encoding outer membrane proteins ( ompA , ompF , and ompN ) were also directly regulated by σE . 
+ Similar to ompX , the total effect of σE on the expression of these genes is not activation [ 10 ] , likely mediated by σE -- dependent small RNAs that accelerate global omp mRNA decay upon membrane stress [ 27 ] . 
+ We found that a considerable amount of σE-binding sites were associated with genes involved in transcriptional circuitry ( rpoE , rpoH , rpoD , rseA , greA , and sixA ) [ 25 ] [ 22 ] , protein folding ( fkpA , surA , and htrA ) [ 28 ] [ 29 ] [ 30 ] , protein biosynthesis ( pth ) [ 31 ] and assembly ( yfiO ) [ 32 ] , stress adaptation ( cspE , galF , and luxS ) [ 33 ] [ 34,35 ] , cell division ( minD/minE ) [ 21 ] , and lipid A biosynthesis ( ddg ) [ 36 ] . 
+ Upon activation of σE , a positive feedback on itself ensures sufficient σE is produced to accelerate the protein synthesis required for regaining homeostasis . 
+ As a strategy to avoid imbalances , σE also regulates the anti-sigma factor rseA , which dampens the σE response as a negative feedback loop [ 8 ] . 
+ Other alternative factors are also activated by σE , not only because some of the stresses they respond to are overlapping , but also to expand the protection to other stresses the bacteria may encounter in later stages [ 37 ] . 
+ A previous study found that the cell division protein FtsZ is regulated by σE in eight organisms closely related to E. coli [ 8 ] . 
+ Here , we found σE regulates other cell division factors ( minD / minE ) , further suggesting that σE may orchestrate factors associated with cell division . 
+ Both outer membrane proteins ( OmpA , OmpF , OmpN , and OmpX ) and proteins promoting OMP assembly ( FkpA , HtrA , and YfiO ) are regulated by σE , hence facilitating the proper assembly and insertion of the newly synthesized OMPs into the outer membrane of Salmonella . 
+ Activation of σE is thought to maintain bacterial homeostasis under extracytoplasmic stresses . 
+ Five σE-binding sites ( greA , ompX , ompA , luxS , and rpoE ) were tested for the fulfillment of this purpose using heat shock and oxidative stress challenges ( Fig 7 ) . 
+ All of them exhibited significantly altered susceptibility to these stresses , consistent with the traditional role of σE . 
+ Moreover , this finding supported the observation that the transcription elongation factor GreA has functional chaperone activity [ 38 ] , which is likely regulated by σE . 
+ Why the quorum sensing protein LuxS is involved in extracytoplasmic stress response needs to be further investigated . 
+ The σE regulatory system is flexible and efficient , and hence plays a significant role in the survival of Salmonella in various environments . 
+ Supporting Information
+ S1 Fig . 
+ The quality of antibodies for ChIP experiments . 
+ The rabbit Ab to σE and monoclonal Ab to GFP were tested for specificity by western blot . 
+ Cell lysates from WT or ΔrpoE strains were loaded on the gel , and different Abs were used to develop the membrane . 
+ Lane 1 , the flow through of rabbit immune sera during affinity chromatography purification of anti-σE Ab . 
+ Lane 2 -- 4 , Ab to σE diluted at 1:1000 , 1:3000 , and 1:5000 respectively . 
+ Lane 5 -- 7 , monoclonal Ab to GFP diluted at 1:1000 , 1:3000 , and 1:5000 respectively . 
+ ( TIF ) 
+ Conceived and designed the experiments: FH JNA JL. Performed the experiments: JL RC MBJ. Analyzed the data: CCO JL. Contributed reagents/materials/analysis tools: JEM ED. Wrote the paper: JL EDC.
\ No newline at end of file
--- a/data/TEXT_FILES/notuseful_txt/26483520.txt 0 → 100644
+++ b/data/TEXT_FILES/notuseful_txt/26483520.txt 0 → 100644
+ Department of Biology, Massachusetts Institute of Technology, Cambridge, Massachusetts, USAa; Department of Biochemistry and Molecular Biology, University of Chicago, Chicago, Illinois, USAb; Howard Hughes Medical Institute, Massachusetts Institute of Technology, Cambridge, Massachusetts, USAc
+ ABSTRACT 
+ An ability to sense and respond to changes in extracellular phosphate is critical for the survival of most bacteria . 
+ For Caulobacter crescentus , which typically lives in phosphate-limited environments , this process is especially crucial . 
+ Like many bacteria , Caulobacter responds to phosphate limitation through a conserved two-component signaling pathway called PhoR-PhoB , but the direct regulon of PhoB in this organism is unknown . 
+ Here we used chromatin immunoprecipitation-DNA sequencing ( ChIP-Seq ) to map the global binding patterns of the phosphate-responsive transcriptional regulator PhoB under phosphate-limited and -replete conditions . 
+ Combined with genome-wide expression proﬁling , our work demonstrates that PhoB is induced to regulate nearly 50 genes under phosphate-starved conditions . 
+ The PhoB regulon is comprised primarily of genes known or predicted to help Caulobacter scavenge for and import inorganic phosphate , including 15 different membrane transporters . 
+ We also investigated the regulatory role of PhoU , a widely conserved protein proposed to coordinate phosphate import with expression of the PhoB regulon by directly modulating the histidine kinase PhoR . 
+ However , our studies show that it likely does not play such a role in Caulobacter , as PhoU depletion has no signiﬁcant effect on PhoB-dependent gene expression . 
+ Instead , cells lacking PhoU exhibit striking accumulation of large polyphosphate granules , suggesting that PhoU participates in controlling intracellular phosphate metabolism . 
+ IMPORTANCE
+ The transcription factor PhoB is widely conserved throughout the bacterial kingdom , where it helps organisms respond to phosphate limitation by driving the expression of a battery of genes . 
+ Most of what is known about PhoB and its target genes is derived from studies of Escherichia coli . 
+ Our work documents the PhoB regulon in Caulobacter crescentus , and comparison to the regulon in E. coli reveals signiﬁcant differences , highlighting the evolutionary plasticity of transcriptional responses driven by highly conserved transcription factors . 
+ We also demonstrated that the conserved protein PhoU , which is implicated in bacterial persistence , does not regulate PhoB activity , as previously suggested . 
+ Instead , our results favor a model in which PhoU affects intracellular phosphate accumulation , possibly through the high-afﬁnity phosphate transporter . 
+ Most bacteria must sense and rapidly respond to the nutrient states of their environments to survive and to proliferate . 
+ Although this capacity for adapting to extracellular changes is critical , the molecular mechanisms by which it occurs remain incompletely understood . 
+ Sensing the availability of extracellular phosphate is particularly crucial , as phosphate is required for the synthesis of many biomolecules , from ATP to phospholipids . 
+ Prior studies have demonstrated that phosphate sensing is important for maximal growth of bacteria ( 1 ) , bioﬁlm formation ( 2 ) , and the virulence of some pathogens ( 3 -- 6 ) . 
+ Most bacteria respond to phosphate limitation through a widely conserved signal transduction pathway whose connectivity and functionality remain only partly characterized ( 1 ) . 
+ In this pathway , the availability of phosphate is likely sensed through changes in phosphate uptake by the high-afﬁnity Pst transporter in conjunction with a two-component signaling pathway , PhoR-PhoB , collectively known as the Pho system . 
+ In Escherichia coli , the Pst transporter is active during growth under phosphate-replete conditions , which somehow inhibits autophosphorylation of the histidine kinase PhoR . 
+ When phosphate becomes limiting and ﬂux through the Pst transporter is reduced , PhoR is stimulated to autophosphorylate and then transfer its phosphoryl group to PhoB . 
+ Phosphorylated PhoB undergoes a conformational change and dimerizes along its α4-β5-α5 interface ( 7 ) , allowing it to then bind conserved DNA sequences called pho boxes in certain promoters , typically leading to increased transcription of target genes ( 8 ) , many of which help cells cope with the decreased extracellular phosphate levels that initiated the pathway . 
+ X-ray crystallography and mutational studies indicate that PhoB binds to region 4 of σ70 , stabilizing its association with the −35 region of target promoters ( 9 , 10 ) . 
+ In E. coli , the expression of more than 40 genes , including the pst and pho genes themselves , changes following phosphate starvation ( 11 ) . 
+ These genes were identiﬁed through a combination of reporter studies and DNA microarray analyses , but which genes are direct targets of PhoB is unclear ( 12 , 13 ) . 
+ More recently , a study of PhoB using chromatin immunoprecipitation with microarray technology ( ChIP-chip ) identiﬁed some putative direct targets but did not examine PhoB binding under high-phosphate conditions ( 11 -- 13 ) . 
+ The conservation of the Pho regulon is also unknown , as there have not yet been efforts to accurately deﬁne the entire set of genes directly regulated by PhoB in organisms other than E. coli . 
+ Two-component systems are a predominant means by which bacteria sense and respond to external stimuli ( 14 , 15 ) . 
+ Although many histidine kinases bind extracellular ligands , others lack large extracellular domains and may instead respond to intracellular signals ( 1 , 16 ) . 
+ The conserved Gram-negative histidine kinase PhoR resides in the inner membrane but does not contain a signiﬁcant periplasmic domain . 
+ PhoR has been suggested to sense the extracellular phosphate status through an interaction with the Pst transporter , which also resides in the inner membrane , but the precise mechanism by which the Pst system regulates PhoR is unclear . 
+ A protein of unknown function , PhoU , was proposed as an intermediate between the Pst and Pho systems , inhibiting PhoR when the Pst system actively transports phosphate ( 17 , 18 ) . 
+ phoU is widely conserved in bacteria and frequently coregulated with the pst and pho genes ( 17 , 18 ) . 
+ In E. coli , alkaline phosphatase and some other members of the Pho regulon are upregulated in phoU mutants ( 19 , 20 ) , which indicates that PhoU may function as a negative regulator of the Pho regulon . 
+ However , the effects of a phoU mutant on expression of the Pho regulon are poorly deﬁned , and earlier studies suggested that PhoU may instead affect phosphate transport ( 21 , 22 ) , leaving the precise function of PhoU uncertain . 
+ Although many bacteria use the Pst-Pho signaling pathway to respond to phosphate limitation , they likely adapt to changes in phosphate levels in different ways . 
+ For example , in the freshwater alphaproteobacterium Caulobacter crescentus , low extracellular phosphate stimulates elongation of a polar appendage called the stalk , which is a tubular extension of the cell envelope . 
+ Phosphate starvation can lead cells to extend their stalks to up to 20 times their lengths under phosphate-replete conditions ( 23 , 24 ) . 
+ It was initially suggested that stalk elongation may increase the nutrient-scavenging ability of phosphate-starved Caulobacter cells , but subsequent studies found that a diffusion barrier may exist between the stalk and the cell body , potentially preventing the free exchange of membrane and periplasmic proteins ( 25 ) . 
+ Here we use a combination of genome-wide expression profiling , bioinformatics , and chromatin immunoprecipitation-DNA sequencing ( ChIP-Seq ) to identify the direct targets of Caulobacter PhoB . 
+ The Caulobacter PhoB regulon includes nearly 50 genes , with relatively few genes in common with the E. coli PhoB regulon beyond the pst and pho genes . 
+ Active Caulobacter PhoB drives substantial changes in the repertoire of membrane transporters expressed , presumably to help cells effectively scavenge for inorganic phosphate . 
+ Our results demonstrate how a highly conserved signaling pathway can be used for vastly different programs of gene expression in different bacteria , highlighting the plasticity of bacterial regulatory networks . 
+ Further , we show that the conserved protein PhoU does not , as previously suggested from studies of E. coli , negatively regulate the PhoR-PhoB signaling pathway in Caulobacter and instead is a critical player in cellular phosphate metabolism or phosphate transport by the Pst system . 
+ MATERIALS AND METHODS
+ Strains and growth conditions . 
+ C. crescentus strains were grown in peptone-yeast extract ( PYE ) ( rich medium ) , M2G ( minimal medium ) , or M5G ( low-phosphate medium ) , supplemented when necessary with oxytetracycline ( 1 μg/ml ) , kanamycin ( 25 μg/ml ) , or gentamicin ( 0.6 μg/ml ) . 
+ Cultures were grown at 30 °C , unless otherwise noted , and diluted when necessary to maintain exponential growth . 
+ E. coli cultures used for cloning were grown at 37 °C in Superbroth , supplemented when necessary with oxytetracycline ( 12 μg/ml ) , kanamycin ( 50 μg/ml ) , or gentamicin ( 15 μg/ml ) . 
+ The phoU depletion strain was constructed by ﬁrst integrating a copy of phoU at the vanillate locus using plasmid pVGFPN-4 ( 26 ) , with the phoU open reading frame cloned into the NdeI and XbaI sites , which removes the green ﬂuorescent protein ( GFP ) coding region from the plasmid . 
+ We subsequently deleted phoU from the pstC-pstA-pstB-phoU-phoB operon using allelic replacement , as described previously ( 27 ) . 
+ The resulting strain contains phoU at the van locus and a markerless deletion of phoU in the pstC operon , in which the entire coding region of phoU is removed , except for the ﬁrst and ﬁnal 9 bases . 
+ lacZ reporter plasmids were derived from pRKlac290 ( 28 ) . 
+ pRKlac290 was digested with KpnI and XbaI , and a DNA fragment containing the 200 bp directly upstream of either the pstC or the pstS annotated translation start site , with flanking KpnI and XbaI cut sites , was cloned into the multiple-cloning site upstream of the lacZ open reading frame . 
+ The pstS : : Tn5 strain was obtained from Yves Brun ( 23 ) and transduced into a clean CB15N background . 
+ Lysate from this strain was also used to construct the pstS : : Tn5 phoB-3×FLAG and pstS : : Tn5 Pvan-phoU ΔphoU strains . 
+ The phoR : : tet strain was constructed previously ( 27 ) . 
+ The phoR Pvan-phoU phoU double mutant was constructed by transduction of the phoR allele into the phoU depletion strain . 
+ Microscopy . 
+ Cells in mid-exponential phase were fixed with 0.5 % paraformaldehyde , mounted on M2G-1.5 % agarose pads , and imaged as described in reference 29 . 
+ To image polyphosphate granules in single cells , 12 μg/ml 4′,6-diamidino-2-phenylindole ( DAPI ) was added directly to the culture medium . 
+ Culture plus DAPI was incubated at 22 °C in the dark for 30 min and spotted on M2G-1 % agarose pads . 
+ Fluorescence microscopy images of DAPI-stained polyphosphate granules in cells were collected at a 630× magnification with a Leica CTR5000 microscope with a Hamamatsu ORCA-ER camera . 
+ A custom filter set ( 390/70-nm excitation filter , 488-nm dichroic filter , and 515-nm long-pass emission filter ) was used to visualize DAPI-polyphosphate . 
+ β-Galactosidase assays . 
+ Strains were grown to the mid-exponential phase in medium supplemented with 1 μg/ml oxytetracycline . 
+ Assays were performed essentially as described previously ( 30 ) . 
+ Immunoblots . 
+ Immunoblotting was performed as described previously ( 31 ) . 
+ Samples were prepared in 20 μl of 1:4 sample buffer-distilled water to an optical density at 600 nm ( OD600 ) of 0.2 , resolved on 12 % sodium dodecyl sulfate-polyacrylamide gels , and transferred to polyvinylidene difluoride transfer membranes ( Pierce ) . 
+ Membranes were probed with monoclonal mouse anti-FLAG ( Sigma ) at a 1:1,500 dilution . 
+ Secondary horseradish peroxidase ( HRP )-conjugated anti-mouse antibody ( Pierce ) was used at a 1:3,000 dilution . 
+ ChIP-Seq and analysis . 
+ Chromatin immunoprecipitation for ChIP-Seq was performed as described previously ( 32 ) , with modiﬁcations . 
+ Mid-exponential-phase cultures were cross-linked in 10 mM sodium phosphate ( pH 7.6 ) and 1 % formaldehyde at room temperature for 10 min . 
+ Reactions were quenched with 0.1 M glycine at room temperature for 5 min and on ice for 15 min . 
+ Cells were washed three times in phosphate-buffered saline ( PBS ) and lysed with Ready-Lyse lysozyme solution ( Epicentre , Madison , WI ) according to the manufacturer 's instructions . 
+ Lysates were diluted 1:1 in ChIP buffer ( 1.1 % Triton X-100 , 1.2 mM EDTA , 16.7 mM Tris-HCl [ pH 8.1 ] , 167 mM NaCl ) with Roche protease inhibitor tablets ( Roche ) and incubated at 37 °C for 10 min . 
+ Lysates were sonicated ( Branson sonicator ) on ice , with 6 bursts of 10 s each at 15 % amplitude , and then cleared by centrifugation at 14,000 rpm for 5 min at 4 °C . 
+ Cleared supernatants were normalized by protein content in 1 ml of ChIP buffer with 0.01 % SDS and precleared with 50 μl of protein A Dynabeads ( Invitrogen ) ( preblocked with 100 μg UltraPure bovine serum albumin [ BSA ] in ChIP buffer with 0.01 % SDS ) by rotation for 1 h at 4 °C . 
+ Ten percent of each supernatant was removed and used as the total chromatin input sample . 
+ The remaining supernatant was incubated with a 1:1,000 dilution of anti-M2 antibody overnight at 4 °C . 
+ Each sample was then incubated with 50 μl of preblocked protein A Dynabeads for 6 h at 4 °C , with rotation . 
+ The Dynabeads were washed consecutively at 4 °C for 15 min with 1 ml of the following buffers : low-salt wash buffer ( 0.1 % SDS , 1 % Triton X-100 , 2 mM EDTA , 20 mM Tris-HCl [ pH 8.1 ] , 150 mM NaCl ) , high-salt wash buffer ( 0.1 % SDS , 1 % Triton X-100 , 2 mM EDTA , 20 mM Tris-HCl [ pH 8.1 ] , 500 mM NaCl ) , LiCl wash buffer ( 0.25 M LiCl , 1 % Nonidet P-40 , 1 % deoxycholate , 1 mM EDTA , 10 mM Tris-HCl [ pH 8.1 ] ) , and Tris-EDTA ( TE ) buffer ( 10 mM Tris-HCl [ pH 8.1 ] , 1 mM EDTA ) ( twice ) . 
+ Complexes were eluted twice by incubation with 250 μl freshly prepared elution buffer ( 1 % SDS , 0.1 M NaHCO3 ) at 30 °C for 15 min . 
+ Cross-links were reversed by the addition of 300 mM NaCl and 2 μl of 0.5 mg/ml RNase A and overnight incubation at 65 °C . 
+ Samples were treated with 5 μl of proteinase K ( 20 mg/ml ; NEB ) in 40 mM EDTA and 40 mM Tris-HCl ( pH 6.8 ) for 2 h at 45 °C . 
+ DNA was extracted using a PCR purification kit ( Qiagen ) and was resuspended in 80 μl of water . 
+ Libraries were prepared using the SPRIworks system and sequenced on an Illumina MiSeq sequencer ( MIT BioMicroCenter ) . 
+ ChIP-Seq results were analyzed using the MACS software package ( 33 ) . 
+ A total of 860,000 reads were analyzed for each sample , and peaks were called with a P value threshold of 10^-5 . 
+ To determine the amount by which a ChIP sample was enriched for individual loci , we performed quantitative PCR ( qPCR ) with an input DNA control for each sample ; a portion of each sample was reserved before the sample was subjected to ChIP , and DNA was isolated from this input sample and analyzed by qPCR at the pstC and CC1294 loci . 
+ Fold enrichment was then calculated as the amount of the pstC or CC1294 locus found in a ChIP output sample , relative to the amount found in the input sample . 
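+ As a minimal sketch of this calculation (not the authors' analysis code), the Python function below computes fold enrichment from qPCR threshold-cycle (Ct) values. The Ct inputs, the assumed amplification efficiency of 2 per cycle, and the `input_fraction` parameter are illustrative assumptions that do not appear in the text above.

```python
def relative_amount(ct: float, efficiency: float = 2.0) -> float:
    """Relative template amount implied by a Ct value.

    Assumes ideal exponential amplification (efficiency = 2 per cycle);
    this is an assumption, not a value reported in the paper.
    """
    return efficiency ** (-ct)


def fold_enrichment(ct_chip: float, ct_input: float,
                    input_fraction: float = 1.0,
                    efficiency: float = 2.0) -> float:
    """Amount of a locus in the ChIP output relative to the input sample.

    input_fraction is a hypothetical correction for the case where only
    part of the chromatin was reserved as input; 1.0 means no correction.
    """
    chip = relative_amount(ct_chip, efficiency)
    # Scale the input signal up to the amount present in the full sample.
    total_input = relative_amount(ct_input, efficiency) / input_fraction
    return chip / total_input


if __name__ == "__main__":
    # Hypothetical Ct values for a target locus and its input control.
    print(fold_enrichment(ct_chip=20.0, ct_input=22.0))
```

+ Computing the same ratio for a control locus such as CC1294 gives a baseline against which enrichment at pstC can be judged.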
+ DNA microarrays . 
+ RNA was collected from cultures grown to the mid-exponential phase in rich medium at 30 °C . 
+ For the pstS mutant , RNA from the wild type was used as a reference . 
+ For the phoU depletion strain and the pstS and phoU depletion double mutant , the strains grown in the absence of vanillate were compared with the same strains grown in the presence of vanillate . 
+ Gene expression profiles were obtained as described previously ( 29 ) , using custom 8×15K Agilent expression arrays , and expression values are given as the average of ratios for a given gene . 
+ Complete data sets for expression proﬁling and ChIP-Seq are provided in Tables S1 and S2 in the supplemental material and are available through GEO . 
+ CFU . 
+ Strains were grown overnight in PYE with vanillate and then washed and released into medium with or without vanillate . 
+ Cultures were subsequently grown for 30 h and diluted once every two generations to maintain mid-exponential growth . 
+ Samples were removed every 3 h and plated on PYE with vanillate . 
+ Colonies were counted after 2 days of growth for all strains except those with a pstS mutation , which have a growth defect and were counted after 3 days of growth . 
+ Transposon mutagenesis and rescue cloning . 
+ Electrocompetent Pvan-phoU ΔphoU cells ( 50 μl ) were transformed with 0.5 μl of EZ-Tn5 transposon mixture ( EZ-Tn5 <R6Kγori/KAN-2> insertion kit ; Epicentre ) and grown in 1 ml of PYE for 1.5 h at 30 °C . 
+ Cells were then plated on PYE supplemented with kanamycin . 
+ Colonies were picked after 2 , 3 , and 4 days of growth at 30 °C . 
+ Colonies were restruck onto fresh plates with PYE plus kanamycin , and chromosomal DNA was subsequently prepared from single colonies cultured in PYE with kanamycin . 
+ DNA was digested with BfuCI for 2 min at room temperature and 20 min at 80 °C , to yield approximately 5-kb fragments . 
+ Sheared DNA was ligated with T4 DNA ligase , and the reaction mixture was dialyzed for 1 h using 0.45-μm nitrocellulose filters ( Millipore ) . 
+ From each dialyzed ligation reaction mixture , 1.5 μl was electroporated into 25 μl of pir-116 cells and plated on LB medium supplemented with kanamycin . 
+ DNA was extracted from the resulting colonies and sequenced using KAN-2 FP-1 and R6KAN-2 RP-1 primers ( Epicentre ) . 
+ RESULTS
+ Epitope-tagged PhoB retains wild-type function . 
+ To map the PhoB regulon in Caulobacter , we sought to perform ChIP-Seq on PhoB . 
+ To this end , we constructed a strain in which the chromosomal copy of phoB encodes a 20-amino-acid linker and a C-terminal 3×FLAG epitope tag ( phoB-3×FLAG ) . 
+ To test whether this version of phoB supports wild-type-like growth under both phosphate-replete and phosphate-starved conditions , we grew the phoB-3×FLAG strain in minimal medium with 10 mM phosphate ( M2G ) and then washed and resuspended the cells in minimal medium with either 10 mM phosphate ( M2G ) or 50 μM phosphate ( M5G ) , with the latter representing phosphate-limited conditions . 
+ The growth of the phoB-3×FLAG strain was indistinguishable from that of the wild type under both conditions , in contrast to the growth of a phoB deletion strain , which grew poorly under both conditions ( Fig. 1A ) . 
+ We also tested whether PhoB-3×FLAG regulated PhoB-dependent genes in a manner comparable to that of the untagged PhoB . 
+ We constructed lacZ transcriptional fusions with the pstC and pstS promoters in Caulobacter and then assessed the ability of phoB-3×FLAG to induce the expression of each reporter following phosphate limitation . 
+ Cells were shifted from M2G to M5G , or mock shifted and retained in M2G , and were grown for 7 h to mid-exponential phase before β-galactosidase activity was measured . 
+ In M2G , the activity of each reporter was 2,000 Miller units in both the phoB-3×FLAG and wild-type strains . 
+ In M5G , β-galactosidase activity was induced to 10,000 Miller units for the PpstC reporter in both the wild-type and phoB-3×FLAG strains and to 8,000 Miller units for the PpstS reporter in both strains . 
+ These results indicate that FLAG-tagged PhoB functions as well as untagged PhoB to induce the expression of PhoB-dependent genes . 
+ As a control , we conﬁrmed that in a phoB deletion strain , both reporters exhibited less than 2,000 Miller units of activity , consistent with these promoters being PhoB dependent ( Fig. 1B ) . 
+ Finally , we assessed the activity of PhoB-3×FLAG in a strain that constitutively activates the Pho system . 
+ Null mutations of pstS , which encodes the periplasmic phosphate-binding protein , block phosphate import and result in hyperactivation of the Pho regulon in Caulobacter , including stalk elongation , even during growth under phosphate-replete conditions ( 23 ) . 
+ We found that in a pstS : : Tn5 background , cells producing PhoB-3×FLAG also exhibited extensive stalk elongation ( Fig. 1C ) , further supporting our conclusion that the epitope-tagged version of PhoB binds and regulates the same set of target genes as wild-type PhoB . 
+ ChIP-Seq reveals genome-wide binding patterns of PhoB . 
+ We performed ChIP-Seq analysis using an anti-FLAG antibody with cells expressing phoB-3×FLAG grown to mid-exponential phase in ( i ) peptone-yeast extract ( PYE ) , a complex rich medium in which PhoB should be predominantly unphosphorylated and inactive , ( ii ) minimal defined medium containing 50 μM phosphate ( M5G ) , in which PhoB is phosphorylated and active , as judged by the PhoB-dependent reporters for PpstC and PpstS ( Fig. 1B ) , and ( iii ) PYE with cells also harboring a disruption of pstS , leading to high constitutive activation of PhoB . 
+ In each case , PhoB-bound DNA was immunoprecipitated using an anti-FLAG antibody . 
+ As controls , we performed ChIP-Seq using the same anti-FLAG antibody on strains grown under identical conditions but producing untagged wild-type PhoB . 
+ We ﬁrst used qPCR to verify that the ChIP samples for strains producing tagged PhoB were enriched for a chromosomal region ( the pstC promoter ) strongly predicted to be PhoB bound , based on E. coli studies and the expression analyses presented below . 
+ As a control locus , we examined the promoter of gene CC_1294 ( PCC1294 ) , whose expression is not PhoB regulated . 
+ For all three growth conditions involving cells producing PhoB-3×FLAG , the pstC locus was 20-fold enriched by ChIP relative to the input DNA ( Fig. 2A ) , in contrast to what occurred with cells producing wild-type PhoB , for which all enrichment ratios were 2.5-fold . 
+ As expected , PCC1294 was not signiﬁcantly enriched in any of the samples ( Fig. 2A ) . 
+ We then constructed and deep sequenced libraries from the ChIP samples taken from cells producing epitope-tagged PhoB and grown in a rich medium or phosphate-limited medium or cells with an additional pstS : : Tn5 mutation grown in rich medium ; control ChIP samples were taken from strains treated identically but harboring the wild-type copy of phoB . 
+ Equal numbers of reads ( 860,000 reads ) were analyzed from each sample , using the peak-calling software MACS ( 33 ) , and peaks whose P values met the 10^-5 threshold were identified . 
+ Consistent with our qPCR data , the genome-wide PhoB binding proﬁles ( Fig. 2B and C ) indicated that in most cases , PhoB binding was induced by phosphate limitation . 
+ Only 5 significant peaks were identified in the rich medium sample , while 102 significant peaks were found in the 50 μM phosphate sample and 204 were identified in the pstS : : Tn5 sample , consistent with PhoB being hyperactivated in this genetic background ( Fig. 3A ; also see Table S1 in the supplemental material ) . 
+ Fold enrichment for individual loci in the ChIP-Seq data was determined by comparing the number of reads at a particular peak in the epitope-tagged experimental sample with the number of reads at the same locus in the non-epitope-tagged control sample . 
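+ The read-count comparison described above can be sketched in Python as follows. This is an illustration only: the function name, the example counts, and the pseudocount (added so that loci with no control reads do not cause division by zero) are assumptions and are not taken from the paper. Because equal numbers of reads were analyzed per sample, the sketch applies no library-size normalization.

```python
def chipseq_fold_enrichment(tagged_reads: int, control_reads: int,
                            pseudocount: float = 1.0) -> float:
    """Ratio of reads at a peak in the epitope-tagged sample to reads at
    the same locus in the untagged control sample.

    The pseudocount is an assumed convenience; the paper does not say
    how loci with zero control reads were handled.
    """
    return (tagged_reads + pseudocount) / (control_reads + pseudocount)


if __name__ == "__main__":
    # Hypothetical read counts at a single peak.
    print(chipseq_fold_enrichment(tagged_reads=99, control_reads=9))
```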
+ At most loci , the observed fold enrichment was greater in the pstS mutant sample than in the low-phosphate sample ( Fig. 3B ; also see Table S1 in the supplemental material ) ; this may indicate that a pstS mutation leads to greater activation of PhoB than does low-phosphate growth medium . 
+ Alternatively , the difference may reflect differences in the abundances of other transcription factors , as the pstS cells were grown in rich medium while the phosphate-starved cells were grown in minimal medium . 
+ In total , 92 peaks identified in the pstS sample were also identified in the 50 μM phosphate sample ( Fig. 3A ) . 
+ The overlap in the peaks identiﬁed for these two independent samples , in which PhoB is activated under different nutrient conditions , supports the notion that these peaks represent direct PhoB binding sites and suggests that PhoB regulates the same core set of genes at different levels of phosphate limitation . 
+ Finally , we noted that the vast majority of PhoB ChIP-Seq peaks were found in intergenic regions ( see Table S1 ) . 
+ Only 4 of the 50 highest peaks were contained within an annotated coding region . 
+ This pattern is consistent with a model in which PhoB activates transcription primarily by binding near the −35 region of promoters , outside coding regions ( 8 ) . 
+ Identiﬁcation of the PhoB regulon . 
+ To delineate high-confidence members of the PhoB regulon , we used whole-genome DNA microarrays to identify genes whose expression depends on PhoB . 
+ We harvested RNA from strains harboring either the pstS : : Tn5 mutation alone or the pstS : : Tn5 mutation in a ΔphoB background , with each strain being grown to mid-exponential phase in rich medium . 
+ RNAs from these strains were compared on microarrays with RNA obtained from wild-type Caulobacter grown under the same conditions . 
+ To identify PhoB-regulated genes , we selected genes that ( i ) had expression levels that changed at least 1.7-fold in the pstS mutant relative to the wild type but did not change in the pstS : : Tn5 phoB double mutant relative to the wild type and ( ii ) had promoters that were enriched 5-fold over background in the PhoB ChIP-Seq analysis of pstS : : Tn5 cells . 
+ A threshold of 5-fold was chosen because promoters showing enrichment above this level in the pstS : : Tn5 cells also typically showed high enrichment in the M5G ChIP sample . 
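+ The two selection criteria can be expressed as a short filter. The Python sketch below is hypothetical: the `Gene` record, its field names, and the example values are invented for illustration, the thresholds are read as "at least", and "did not change" in the double mutant is interpreted as staying below the same 1.7-fold cutoff, which is an assumption rather than a rule stated in the text.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Gene:
    name: str
    ratio_psts: float       # expression ratio, pstS mutant / wild type
    ratio_double: float     # expression ratio, pstS phoB double mutant / wild type
    chip_enrichment: float  # ChIP-Seq fold enrichment at the promoter


def fold_change(ratio: float) -> float:
    """Magnitude of change in either direction (ratio must be positive)."""
    return max(ratio, 1.0 / ratio)


def classify(gene: Gene, expr_cutoff: float = 1.7,
             chip_cutoff: float = 5.0) -> Optional[str]:
    """Return 'activated' or 'repressed' for genes meeting both criteria,
    or None for genes that fail either one."""
    changed = fold_change(gene.ratio_psts) >= expr_cutoff
    # Assumed reading of "did not change": below the same cutoff.
    phob_dependent = fold_change(gene.ratio_double) < expr_cutoff
    bound = gene.chip_enrichment >= chip_cutoff
    if not (changed and phob_dependent and bound):
        return None
    return "activated" if gene.ratio_psts > 1.0 else "repressed"


if __name__ == "__main__":
    # Hypothetical values, not data from the study.
    for g in (Gene("geneA", 3.0, 1.1, 12.0), Gene("geneB", 0.4, 0.9, 6.0)):
        print(g.name, classify(g))
```

+ Genes that change in expression but fall below the ChIP cutoff return None here, matching the idea that such genes are candidates for indirect regulation rather than high-confidence direct targets.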
+ In total , 43 genes ﬁt these criteria , with 32 genes in 22 putative transcriptional units being activated by PhoB and 11 genes being repressed by PhoB ( Fig. 3B ) . 
+ There were an additional 91 genes whose expression changed at least 2-fold in the pstS : : Tn5 mutant but that did not have peaks indicating 5-fold enrichment in the PhoB ChIP-Seq data , although some had 2-fold enrichment ( see Table S2 in the supplemental material ) . 
+ These genes are likely indirectly regulated by PhoB . 
+ Conversely , there were 39 genes that had PhoB ChIP-Seq peaks indicating 5-fold enrichment but that did not change signiﬁcantly in terms of gene expression . 
+ Some of these genes may be directly regulated by PhoB , but either a change in gene expression was not detected by DNA microarray analysis or expression of these genes may change only transiently upon phosphate starvation . 
+ Finally , we note that there were 122 genes with ChIP-Seq peaks indicating enrichment of 2-fold ; some of these genes may also be bona ﬁde PhoB targets . 
+ We used the motif-ﬁnding program MEME ( 34 ) to identify a consensus PhoB binding site using the sequences of the 24 ChIP-Seq peaks that were 13-fold enriched in the pstS mutant . 
+ The resulting motif contains two 6-bp sites that appear to be direct repeats ﬂanking an AT-rich region ( Fig. 3C ) . 
+ Although similar to the PhoB consensus site predicted for E. coli ( 11 ) , this Caulobacter site is 1 base pair shorter and has a higher GC content , with the latter possibly reﬂecting the higher GC content of the Caulobacter genome . 
+ In E. coli , the central AT-rich region was proposed to be a modified −35 binding site ( 11 ) . 
+ The putative Caulobacter PhoB binding motif , or Pho box , identiﬁed here was used to predict PhoB binding sites across the Caulobacter genome using MAST ( 34 ) . 
+ This analysis identiﬁed putative Pho boxes within 37 of the 50 most enriched ChIP-Seq peaks ( see Table S1 in the supplemental material ) . 
+ The list of 43 genes in Fig. 3B represents high-conﬁdence members of the PhoB regulon and includes several known and expected target genes , such as the Pst transporter genes ( pstCAB and pstS ) , phoU , and phoB itself . 
+ The PhoB regulon indicates that Caulobacter cells respond to phosphate limitation through major changes in the expression of membrane transport systems , likely to support the scavenging of inorganic phosphate from the extracellular environment while preventing unwanted efﬂux . 
+ In addition to upregulating the high-afﬁnity Pst transporter , PhoB activates the expression of the phosphonate transport system PhnCDE , 6 genes annotated as TonB-dependent receptors , and the TonB accessory proteins ExbB and ExbD . 
+ The set of PhoB targets also includes genes that help to generate sources of inorganic phosphate , such as a secreted alkaline phosphatase , an exported lipoprotein , called ElpS , that may stimulate alkaline phosphatase activity ( 35 ) , and another putative secreted phosphatase , PhoX . 
+ The set of genes downregulated by PhoB includes 4 TonB-dependent receptors . 
+ Whether ( and how ) the repression of these transporters helps cells cope with phosphate limitation is unclear , but the changes in their expression underscore the notion that Caulobacter cells respond to phosphate starvation by remodeling membrane transport capabilities . 
+ PhoU is not a negative regulator of the Pho regulon in Caulobacter . 
+ The activation of PhoB as a transcription factor depends on the histidine kinase PhoR , which somehow senses changes in ﬂux through the phosphate transporter PstABC . 
+ Whether this sensing is direct is unknown . 
+ A highly conserved protein called PhoU , which often is encoded in the same operon as the pst genes , has been suggested to couple PhoR with the Pst transporter . 
+ Speciﬁcally , PhoU was suggested to repress PhoR activity under phosphate-replete conditions when the transporter is active , implying that PhoU is a negative regulator of the Pho regulon ( 1 ) . 
+ To test this hypothesis in Caulobacter , we constructed a strain in which phoU was deleted from its native locus but inserted at the van locus under the control of the Pvan promoter , permitting inducible expression of phoU by vanillate . 
+ This phoU depletion strain was cultured overnight in rich medium supplemented with vanillate , and cells were then shifted to medium without vanillate and diluted as needed to maintain exponential growth . 
+ Upon the removal of vanillate , the depletion of PhoU did not result in stalk hyperelongation , as would be expected upon loss of a negative regulator of the Pho regulon and as occurs in pstS mutants ( Fig. 1C ) . 
+ Instead , the loss of PhoU led to enlarged and slightly ﬁlamentous cells after 8 h of depletion and modest chromosome accumulation after 16 h of depletion ( Fig. 4A ) . 
+ Also in contrast to a pstS mutant , we found that the depletion of phoU was lethal . 
+ To measure viability , the phoU depletion strain was shifted to medium without vanillate and diluted approximately every two generations to maintain growth in mid-exponential phase . 
+ Samples were taken at 3-h intervals to measure CFU , normalized at each time point to the number of CFU per ml of culture at an OD600 of 1 . 
+ We observed a 3-log decrease in CFU after 30 h of depletion ( Fig. 4B ) . 
+ In contrast , when wild-type or pstS : : Tn5 cells were treated in the same manner , we observed no loss in viability . 
+ Finally , we note that we were unable to delete the native copy of phoU unless another copy of phoU was present at the vanillate locus ( data not shown ) . 
+ Collectively , these phenotypic analyses suggest that PhoU likely does not negatively regulate the Pho regulon in Caulobacter . 
+ To directly assess whether PhoU inﬂuences the Pho regulon , we examined global patterns of gene expression in the phoU depletion strain . 
+ A culture was grown in the presence of vanillate and then shifted to medium with or without vanillate . 
+ Samples were removed at 2 , 5 , 7 , and 16 h postshift , and RNAs from the two conditions were directly compared on DNA microarrays . 
+ Although we did not have an antibody for monitoring PhoU levels , we note that the viability of the phoU depletion strain began to decrease 10 h postshift , suggesting that PhoU levels had dropped signiﬁcantly by that point . 
+ Moreover , even if PhoU were lost only through dilution , it would be present at 6 % or 0.1 % of the initial levels after 7 or 16 h of depletion , respectively , given that the phoU depletion strain accumulates ( the optical density increases ) at a rate comparable to that of the nondepleted control for up to 14 h . 
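The dilution estimate above can be checked with a short calculation. The sketch below is not taken from the paper: it assumes a perfectly stable protein whose synthesis stops at the shift and that is halved at every cell doubling, and it back-calculates the doubling time from the stated figure of 6 % remaining after 7 h. The function names are hypothetical.

```python
import math


def fraction_remaining(hours, doubling_time_h):
    """Fraction of a stable, no-longer-synthesized protein left after
    dilution by growth alone (halved once per doubling)."""
    return 2.0 ** (-hours / doubling_time_h)


def doubling_time_from_fraction(hours, fraction):
    """Doubling time implied by a given remaining fraction after `hours`."""
    return hours / math.log2(1.0 / fraction)


# Assumption: use the stated 6 % at 7 h to infer the doubling time.
implied_doubling_time = doubling_time_from_fraction(7.0, 0.06)

# Extrapolate to 16 h under the same constant growth rate.
remaining_at_16h = fraction_remaining(16.0, implied_doubling_time)

print(f"implied doubling time: {implied_doubling_time:.2f} h")
print(f"fraction remaining at 16 h: {remaining_at_16h:.4%}")
```

Under these assumptions the implied doubling time is roughly 1.7 h and the fraction left at 16 h is on the order of 0.1 %, in line with the figures quoted in the text.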
+ The vast majority of genes upregulated 2-fold or more in a pstS mutant were not upregulated after phoU was depleted ( Fig. 5A ; also see Table S2 in the supplemental material ) . 
+ Expression of the pstCAB genes ( but not pstS ) was increased 2.6-fold 7 h after the shift , although this change was substantially less than that observed in the pstS mutant . 
+ After 16 h of PhoU depletion , some Pho regulon genes were upregulated ( Fig. 5A ) ; however , much greater changes in expression were observed for a multitude of other stress response genes at this time point ( Fig. 5B ; also see Table S2 ) . 
+ Additionally , by the 16-h time point , an approximately 2-log decrease in viability was observed ( Fig. 4B ) , suggesting that expression changes at this time point might have resulted nonspeciﬁcally from cell death . 
+ In support of the latter idea , we also examined gene expression changes in a phoU depletion strain harboring a suppressor mutation ( see below ) , such that cells remain viable throughout a PhoU depletion time course . 
+ In that case , we did not observe any signiﬁcant changes in the Pho regulon genes or in other genes that change following PhoU depletion , even after 16 h ( Fig. 5C ; also see Table S2 ) ; however , it is possible that for the Pho regulon , the genes are already maximally induced in the suppressor strain and thus can not be further induced . 
+ We also assayed the effects of depleting PhoU by using lacZ reporters for the pstC and pstS promoters , which are PhoB regulated ( Fig. 1B and 3B ) . 
+ We again shifted the phoU depletion strain to noninducing conditions , and we observed a modest increase in the activity of the PpstC reporter but not the PpstS reporter after 7 h or more ( Fig. 5D ) . 
+ In both cases , substantially higher activity was seen in the pstS mutant . 
+ Taken together , our data indicate that a loss of phoU does not induce the same expression patterns as seen in a pstS mutant in which the Pho regulon is upregulated . 
+ These data support the conclusion that PhoU does not function as a negative regulator of the Pho regulon in Caulobacter . 
+ Mutations in both the Pst and Pho systems suppress a phoU mutant . 
+ To further probe the function of PhoU , we isolated mutations that restore viability to a strain depleted of PhoU . 
+ We performed transposon mutagenesis of the Pvan-phoU depletion strain using kanamycin-marked Tn5 . 
+ We plated transposon-mutagenized cells on rich medium with kanamycin and without vanillate , to select for mutants that are viable in the absence of phoU . 
+ We selected colonies that grew after 2 to 4 days , and we identiﬁed the sites of transposon insertion for 18 candidate suppressors ( Fig. 6A ) . 
+ Two insertions mapped near vanR , the vanillate repressor ( which regulates phoU transcription in this strain ) , and one mapped to cobT , a cobaltochelatase , which we did not independently verify . 
+ Seven Tn5 insertions mapped to phoR or phoB , with the remaining 8 insertions being distributed among the four components of the pst system , i.e. , pstSCAB . 
+ To verify that mutations in both the pst and pho systems can suppress the lethality of a phoU mutant , we independently transduced the pstS : : Tn5 allele and a phoR mutation into the phoU depletion strain . 
+ In both cases , the mutation introduced was epistatic to the depletion of phoU . 
+ Cells depleted of PhoU and harboring a phoR deletion had a morphology similar to that of the phoR strain and no longer lost viability when shifted to medium lacking vanillate ( Fig. 6B and C ) . 
+ Similarly , pstS : : Tn5 cells depleted of PhoU exhibited the long-stalk phenotype associated with pstS : : Tn5 and retained viability upon being shifted to medium without vanillate . 
+ These ﬁndings conﬁrm that mutations disrupting either the PhoR-PhoB signaling pathway or the Pst transporter can suppress the lethality of PhoU depletion . 
+ These genetic results corroborate our conclusion that PhoU is not a negative regulator of the Pho regulon . 
+ If it were and the lethality of phoU depletion were due to overexpression of the Pho regulon , then our screen would have been predicted to identify suppressor mutations in phoR and phoB but not in the pst genes , because such mutations , like pstS : : Tn5 , dramatically upregulate the Pho regulon ( Fig. 3A and B ) . 
+ Instead , our results strongly suggest that the lethality of a PhoU mutant is suppressed by reducing the levels of the Pst transporter , which is achieved directly by disrupting pst genes or indirectly by disrupting phoR or phoB , genes that are required for expression of the pst genes ( Fig. 1B and 5C ) . 
+ This model suggests that PhoU may participate in regulating or metabolizing cellular pools of inorganic phosphate that are taken up by the Pst system , such that , without PhoU , the inorganic phosphate imported is inappropriately metabolized or perhaps converted to a toxic form , leading to cell death . 
+ Alternatively , the activity of the Pst transporter may be increased without PhoU and cells may not tolerate the resulting excessive concentrations of inorganic phosphate . 
+ Intriguingly , we observed the formation of large intracellular granules by phase microscopy after depletion of phoU ( Fig. 7A ) ; the granules may be comprised of polyphosphate ( 36 ) . 
+ To test whether the intracellular granules observed are in fact composed of polyphosphate , we used ﬂuorescence microscopy . 
+ When stained with DAPI , polyphosphate granules ﬂuoresce green , while DNA ﬂuoresces blue , and an appropriate ﬁlter set can discriminate between polyphosphate and nucleic acids ( 37 ) . 
+ We found that , when grown in rich medium in the absence of vanillate , the phoU depletion strain indeed accumulated large stores of concentrated polyphosphate , visible as bright foci in most cells after 9 h ( Fig. 7A , top ) and as large extended structures after 22 h ( Fig. 7A , bottom ) . 
+ In the presence of vanillate , little intracellular polyphosphate accumulation was observed with DAPI staining . 
+ Under these growth conditions , foci were rarely seen in the wild type and , when present , were substantially smaller than those seen following PhoU depletion . 
+ These results are consistent with a model in which the uptake of inorganic phosphate is unregulated in cells lacking PhoU , leading to the accumulation of large cytoplasmic pools of polyphosphate . 
+ We also asked whether the polyphosphate accumulation observed was responsible for the lethality of the phoU depletion mutant . 
+ If this were the case , then decreasing the levels of polyphosphate should restore viability . 
+ In Caulobacter , two genes , i.e. , ppk1 and ppk2 , are responsible for polyphosphate synthesis ( 38 ) . 
+ We therefore deleted ppk1 and ppk2 in the phoU depletion strain and assayed the levels of polyphosphate using DAPI staining . 
+ These cells were largely devoid of polyphosphate foci , even after depletion of PhoU for 22 h , as expected ( Fig. 7A ) . 
+ However , despite the absence of polyphosphate granules , cells lacking ppk1 and ppk2 and depleted of PhoU still lost viability , like the phoU depletion strain alone ( Fig. 7B ) . 
+ These results suggest that the polyphosphate granules seen in the phoU depletion strain do not contribute signiﬁcantly to cell death but reﬂect the fact that the cells likely have imported excessive levels of inorganic phosphate . 
+ The latter idea is consistent with the results of our suppressor screen , indicating that the lethality of a phoU mutant can be rescued by mutations that reduce the expression of the high-afﬁnity Pst transporter and presumably prevent an accumulation of cellular inorganic phosphate . 
+ Collectively , our data suggest that PhoU is not a negative regulator of PhoR and instead regulates phosphate uptake , such that cells depleted of PhoU accumulate excessively high levels of inorganic phosphate . 
+ DISCUSSION
+ Two-component signaling systems are one of the predominant forms of signal transduction in bacteria . 
+ Although these systems employ a variety of output mechanisms , most use a response regulator to enact a transcriptional response ( 39 ) . 
+ However , for the vast majority of response regulators , the regulons remain uncharacterized or incompletely deﬁned . 
+ Here we used global expression studies and ChIP-Seq to create a high-resolution map of PhoB targets in Caulobacter crescentus under both phosphate-limited and phosphate-replete conditions . 
+ Our results support a model in which phosphate limitation leads to the phosphorylation of PhoB , which enables it to bind and to regulate target genes . 
+ We found that PhoB directly activated more than 30 genes following phosphate starvation ( Fig. 3B ) , with at least 90 other genes being indirectly activated by PhoB . 
+ We also found that PhoB negatively regulates gene expression , with microarray analysis indicating that 11 PhoB targets are likely directly repressed by PhoB ( Fig. 3B ) . 
+ PhoB regulates a different set of genes in Caulobacter than in E. coli . 
+ A recent ChIP-chip-based study of the Pho regulon in E. coli ( 11 ) enables a comparison of the PhoB regulons in Caulobacter and E. coli . 
+ We ﬁnd that the set of genes regulated by PhoB in Caulobacter differs substantially from that regulated in E. coli , indicating that , although the upstream signaling pathway is highly conserved , the output regulon has likely been tailored to each organism 's individual needs and ecological niche . 
+ Of the 43 genes comprising the Caulobacter Pho regulon , we could identify a reciprocal best BLAST hit in E. coli for only 8 ( see Table S3 in the supplemental material ) . 
+ Of those 8 genes , only 3 are members of the E. coli Pho regulon , including phoB itself , the pst operon , and the phosphonate ( phn ) transport system operon . 
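The reciprocal-best-hit criterion used for this comparison can be sketched as follows. This is an illustration only, not the authors' pipeline: it assumes the BLAST results have already been reduced to dictionaries mapping (query, subject) pairs to bit scores, and the gene labels in the example are hypothetical placeholders.

```python
def best_hits(scores):
    """Map each query to its single highest-scoring subject.

    `scores` maps (query, subject) tuples to a similarity score
    (for example a BLAST bit score); higher is better.
    """
    best = {}
    for (query, subject), score in scores.items():
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Return sorted (a, b) pairs where b is the best hit of a
    and a is the best hit of b."""
    forward = best_hits(a_vs_b)
    reverse = best_hits(b_vs_a)
    return sorted((a, b) for a, b in forward.items() if reverse.get(b) == a)


# Hypothetical scores: genome A searched against genome B and vice versa.
a_vs_b = {("a1", "b1"): 300, ("a1", "b2"): 120, ("a2", "b2"): 90}
b_vs_a = {("b1", "a1"): 290, ("b2", "a1"): 110, ("b2", "a2"): 80}

pairs = reciprocal_best_hits(a_vs_b, b_vs_a)
print(pairs)
```

In this toy input only a1 and b1 pick each other as best hits; a2 selects b2, but b2 prefers a1, so that pair is excluded. This is why the reciprocal criterion yields fewer matches than a one-directional search.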
+ The minimal overlap of the PhoB regulons in Caulobacter and E. coli may reﬂect differences in the ways in which genes were identiﬁed as PhoB targets , although this is unlikely to explain the lack of overlap entirely . 
+ In some cases , there may also be different genes in the two organisms that fulﬁll similar functions , particularly as both regulons encode a number of transport-related proteins . 
+ However , it generally appears that the two organisms have evolved fundamentally different Pho regulons . 
+ Some of the differences in the Caulobacter PhoB regulon relative to that of E. coli presumably account for their different physiological responses , most notably the stalk elongation that is a hallmark of phosphate-starved Caulobacter . 
+ Which genes are responsible for stalk elongation is not yet clear , but the delineation of the PhoB regulon may help guide future efforts to identify them . 
+ PhoU does not regulate PhoR activity in Caulobacter and instead likely regulates phosphate metabolism . 
+ Although the PhoR-PhoB signaling pathway is highly conserved throughout the bacterial kingdom and has been studied for decades , the mechanism by which the histidine kinase PhoR senses changes in extracellular phosphate levels has remained unclear . 
+ PhoU has been suggested to couple the Pst transporter to the histidine kinase PhoR , inhibiting PhoR when ﬂux through the transporter is high ( 1 ) . 
+ However , our results strongly suggest that PhoU does not function in this manner , at least in Caulobacter , because depletion of PhoU did not phenocopy a pstS mutation , which leads to hyperactivity of the PhoR-PhoB pathway . 
+ It is formally possible that PhoU has two functions , one that is essential for viability and a second that involves regulating the PhoR-PhoB pathway ; if the depletion of PhoU affects the essential function ﬁrst , then this could , in principle , mask effects on the PhoR-PhoB pathway . 
+ However , such a model would require that PhoU be capable of regulating PhoR-PhoB at extremely low levels , as many of the gene expression effects of PhoR hyperactivation were not seen after depletion of PhoU for 16 h . 
+ Even if PhoU were a stable protein , its levels should be 1 % after 16 h due to dilution , given that the phoU depletion strain continued to grow at a rate comparable to that of the wild-type strain for 10 to 14 h . 
+ Our genetic data indicate that the lethality of PhoU depletion can be rescued by mutations that block phosphate uptake , including disruptions of the genes that encode the high-afﬁnity phosphate transporter system PstSCAB or the signaling proteins PhoR and PhoB , which stimulate expression of the transport system ( Fig. 8 ) . 
+ We can envision two general models for PhoU function , i.e. , that ( i ) PhoU negatively regulates the phosphate import activity of the Pst transporter or ( ii ) PhoU regulates intracellular phosphate metabolism . 
+ We could not detect changes in radioactive phosphate uptake rates in the phoU depletion strain ( data not shown ) . 
+ However , the phosphate uptake assay may not detect subtle differences that , over the extended period of a phoU depletion time course ( Fig. 4 ) , lead to signiﬁcant differences in intracellular phosphate levels . 
+ Studies of E. coli have produced inconsistent results , as phoU mutants have been reported to have little effect on phosphate import ( 18 ) , to increase phosphate import ( 21 ) , and to reduce phosphate import ( 22 ) . 
+ Whether PhoU affects phosphate uptake or metabolism , cell death likely results from an excessive accumulation of intracellular phosphate or other metabolites that accumulate in a phosphate-dependent manner , explaining why mutations that prevent the expression or activity of the Pst transporter , and hence slow the import of phosphate , restore viability . 
+ Additionally , we found that Caulobacter cells accumulated large polyphosphate granules following PhoU depletion ( Fig. 7 ) , as also seen in other organisms ( 36 ) . 
+ The presence of these granules after PhoU depletion is consistent with a defect in the regulation of phosphate uptake . 
+ However , polyphosphate granules are not the underlying cause of cell death in the absence of PhoU , because deletion of ppk1 and ppk2 did not restore viability to the phoU mutant . 
+ Instead , only mutations that resulted in the loss of the phosphate transporter or the genes ( phoR and phoB ) that drive its expression rescued phoU ( Fig. 6 ) . 
+ Thus , we favor a model in which PhoU regulates the Pst transporter to prevent the toxic accumulation of inorganic phosphate or a phosphate-derived metabolite ( Fig. 8 ) . 
+ In either case , our results strongly support the conclusion that PhoU does not couple the Pst system to the histidine kinase PhoR , as is commonly asserted . 
+ Concluding remarks . 
+ We have characterized both the transcriptional output enacted in response to phosphate limitation and the signaling pathway that regulates it in Caulobacter crescentus . 
+ We have shown that PhoU does not act as a negative regulator of the Pho regulon but that it is a critical player in cellular phosphate metabolism , a role that is likely to be conserved , given the wide conservation of phoU . 
+ We have also identiﬁed the genes that PhoB regulates in response to phosphate limitation in Caulobacter crescentus ; these genes show little similarity to the set of PhoB-regulated genes in E. coli , highlighting the ﬂexibility and dynamics of transcriptional networks in bacteria . 
+ The delineation of the Caulobacter Pho regulon will enable a better understanding of how bacteria respond to phosphate limitation , including how the synthesis of a polar organelle , such as the stalk , is regulated . 
+ ACKNOWLEDGMENTS
+ We thank D. Baer for help with ChIP-Seq analysis , A. Yuan for pstS : : Tn5 and pstS : : Tn5 phoB microarray data , and B. Perchuk for assistance with viability assays . 
+ M.T.L. is an Investigator of the Howard Hughes Medical Institute . 
+ The funders had no role in study design , data collection and interpretation , or the decision to submit the work for publication .
\ No newline at end of file
--- a/data/TEXT_FILES/notuseful_txt/26706151.txt 0 → 100644
+++ b/data/TEXT_FILES/notuseful_txt/26706151.txt 0 → 100644
+ Seminars in Cell & Developmental Biology
+ 1084-9521 / Crown Copyright © 2015 Published by Elsevier Ltd. All rights reserved . 
+ 1. Introduction
+ Survival of any cellular organism relies on the efﬁcient coordination of chromosome replication with cell division to produce viable daughter cells . 
+ The ordering of distinct events of cell growth , chromosome replication , chromosome segregation and cytokinesis ( cell division ) is fundamental to the cell cycle in actively dividing eukaryotic cells and occurs through checkpoints that ensure completion of a prior event before initiation of a subsequent event . 
+ However , in the cell cycle of bacteria dividing under high nutrient or high growth conditions , the events of cell growth , chromosome replication , chromosome segregation and the assembly of the division machinery at the division site occur simultaneously . 
+ The chromosome begins a subsequent round of replication prior to the completion of the ﬁrst , a mechanism termed multifork replication . 
+ As the round of replication nears completion , the division machinery accumulates and assembles at the division site in preparation for septation , although the subsequent splitting of the daughter cells is held off until midcell has been cleared of DNA . 
+ The simultaneous nature of these cell cycle events in bacteria makes advance in this area challenging due to the difﬁculty in separating these processes and , despite intense investigation over several decades , the exact mechanism as to how bacteria spatially and temporally couple septum formation with the replication and segregation of the genetic material is not yet clearly understood . 
+ This review focuses on the current evidence of a coordinating link between these processes in Bacillus subtilis ( Gram-positive model system ) and Escherichia coli ( Gram-negative model system ) , and the latest advances in chromosome conformation capture techniques to bring us one step closer to answering this fundamental question . 
+ 2. DNA replication
+ DNA replication is a tightly regulated and ordered process divided into three main stages : initiation , synthesis ( or elongation ) and termination . 
+ A key regulatory molecule for ensuring that replication occurs only once per cycle and in synchrony with cell growth and division is initiator DnaA , an AAA + ATPase ( ATPases Associated with various cellular Activities ) found in virtually all bacteria [ 1,2 ] : levels of DnaA are stringently controlled . 
+ Means of its control vary amongst bacteria and have been recently reviewed [ 1,3,4 ] . 
+ In its ATP-bound form DnaA exists as a helical oligomer that binds to speciﬁc AT-rich sequences ( DnaA-boxes ) within oriC . 
+ It is the binding of DnaA to the single chromosome origin of replication , oriC , positioned at the 0 ° / 360 ° chromosome position that initiates replication . 
+ At these sites DnaA forms a highly ordered nucleoprotein complex , the DNA-unwinding element ( DUE ) [ 1,5 ] , effectively melting and unwinding the local double stranded DNA to allow the recruitment of the replication machinery [ 5,6 ] . 
+ Working together with oriC-bound DnaA , in B. subtilis , the helicase loader ( DnaI ) and co-loader proteins ( DnaD and DnaB ) that are also recruited to this site actively load the helicase ( DnaC ) to establish the replication fork [ 7 -- 9 ] . 
+ DNA primase ( DnaG ) , DNA polymerase III holoenzyme ( PolC ) and the accessory polymerase ( DnaE ) then bind to the oriC region and , together with the other replication proteins , form the replisome thus completing the initiation stage [ 8 ] . 
+ A similar process , although with differing proteins , occurs in E. coli , as reviewed by [ 10 ] . 
+ [ Note , particularly that the DNA helicase in E. coli is named DnaB , not DnaC as it is in B. subtilis and that there is no homolog of the B. subtilis DnaB in E. coli . ] 
+ Starting at oriC , DNA synthesis occurs bi-directionally around the circular chromosome [ 11 -- 13 ] . 
+ The two replication forks continue bi-directionally until they encounter the terminus region ( Ter ) , at which point the replisome disassembles allowing the decatenation and subsequent complete separation of the newly replicated sister chromosomes [ 14,15 ] . 
+ Resolution of the chromosomes , when required , is then completed by the combined action of site-speciﬁc recombinases located at the terminus ( XerCD in E. coli , and RipX and CodV in B. subtilis ) and DNA translocases ( FtsK in E. coli , and SpoIIIE in B. subtilis ) [ 16 -- 21 ] . 
+ 3. Chromosome segregation
+ In B. subtilis and E. coli , DNA replication and chromosome segregation occur concomitantly . 
+ Soon after the origin regions are replicated , they migrate in opposite directions towards the future division sites , located at the cell quarter positions [ 22 ] . 
+ Most research on chromosome segregation in bacteria has focussed on how these newly-replicated origin regions separate . 
+ Separation involves the ParABS system and SMC ( Structural Maintenance of Chromosome ) condensin complex in B. subtilis , and the MukBEF complex in E. coli . 
+ Although it is now known to be required for chromosome separation in a number of bacteria , including B. subtilis , the ParABS system was ﬁrst identiﬁed by the discovery of two proteins , ParA and ParB , required for effective plasmid partitioning on the P1 plasmid hosted in E. coli [ 23 ] . 
+ ParB was found to bind co-operatively to the parS cis-acting site along with ParA , a Walker-type ATPase , to form a large nucleoprotein complex resulting in replicated plasmids segregating bidirectionally to the cell poles [ 24 ] . 
+ Since its discovery , par loci have been subsequently found on the chromosome of over 65 % of all sequenced bacterial genomes [ 25 ] , including B. subtilis . 
+ In B. subtilis however , the components of the ParABS system are known as Soj ( ParA ) and Spo0J ( ParB ) because they had been previously observed in B. subtilis having an effect on sporulation [ 26 ] . 
+ Cells lacking Spo0J mislocalise sister origin positions [ 27 ] , suggesting a role for this protein in separating the newly duplicated chromosome origins in B. subtilis . 
+ Subsequently it was shown that Spo0J binds to several parS sites located within the oriC region [ 28,29 ] , and in cells labelled with Spo0J-GFP , Spo0J co-localises with oriC and appears as distinct compact foci positioned at the cell quarters [ 30,31 ] . 
+ Several elegant studies in recent years have revealed signiﬁcant insight into the roles of the ParABS system and the SMC condensin complex in chromosome segregation in B. subtilis . 
+ Following binding to parS , Spo0J spreads onto non-speciﬁc neighbouring DNA , drawing them together to form a nucleoprotein complex ( see Fig. 1B ) [ 30,32 ] . 
+ The method by which Spo0J spreads has been recently proposed by Graham et al. , such that Spo0J forms clusters on neighbouring DNA bridging them together , forming DNA loops [ 33 ] . 
+ The formation of these long-distance DNA loops is suggested to facilitate the condensation and compaction of the origin-proximal region of the chromosome as well as the recruitment and loading of the SMC constituents ( Fig. 1B ) [ 33 ] . 
+ The SMC condensin complex ( from now on referred to as simply SMC ) , made up of proteins Smc , ScpA and ScpB , and supplemented by the ParABS system , is then suggested to draw the sister origins away from each other [ 34 ] . 
+ Essentially , SMC resolves the origins enabling ParABS to actively segregate origins towards opposite poles . 
+ The extent to which Soj actually plays a role in the active segregation of chromosomes is unknown . 
+ Interestingly it was shown that the primary role of Soj is likely to be in regulating the initiation of DNA replication ( Fig. 1A ) . 
+ It does this by directly interacting with the initiation protein , DnaA [ 35 ] , inhibiting or promoting DnaA activity . 
+ As a monomer , Soj inhibits DnaA activity by preventing formation of its helical oligomer , whereas the Soj dimer relieves this inhibition by allowing DnaA to form oligomers . 
+ This ability to switch however is mediated by its interaction with DNA-bound Spo0J ( Fig. 1A ) [ 36 ] . 
+ So it is now clear that Spo0J has two separate roles , one in the regulation of DnaA via its effect on Soj self-association , and another in chromosome segregation . 
+ These functions have been shown to reside in different domains [ 37 ] . 
+ The mechanism of chromosome segregation in E. coli is more elusive . 
+ While the ParABS system is absent in this organism it does possess a distant relative of the SMC complex , MukBEF , a protein complex existing in enterobacteria and some - proteobacteria [ 38,39 ] . 
+ This complex plays a key role in separating newly replicated oriC regions [ 40 ] and , together with topoisomerase IV ( TopoIV ) , in promoting DNA decatenation [ 41 ] . 
+ Although it does n't share any homology with Spo0J , MukB is thought to have a similar bridging function , in that it binds DNA forming a cluster , creating bridges with randomly colliding protein-free DNA . 
+ Rybenkov et al. postulate that the formation of these bridges stabilises DNA compaction , potentially assisting in pulling apart the sister chromosomes [ 42 ] . 
+ It is unlikely these proteins are the sole players in chromosome segregation and there are several hypotheses as to how E. coli and other bacteria segregate their chromosomes [ 12,43,44 ] . 
+ In fact , biophysical models in E. coli suggest that chromosome segregation is generated via entropic forces [ 45 ] . 
+ Application of polymer physics concepts to the bacterial chromosome by Jun and Mulder resulted in a passive segregation model where the replicated sister chromosomes themselves possess internal forces leading to entropic repulsion or exclusion [ 45 ] . 
+ However , recent evidence suggests that these entropic forces are insufﬁcient to complete whole chromosome segregation [ 46,47 ] , highlighting the high complexity of chromosome segregation which requires several different components or modes of action for successful execution . 
+ 4. Cell division and regulation of division-site placement
+ Cell division is dependent on the localisation of numerous speciﬁc division proteins to the right place ( midcell ) at the right time within the cell cycle . 
+ The ﬁrst and foremost of these proteins to localise to midcell is the tubulin-like protein , FtsZ , which assembles at the inner face of the cytoplasmic membrane into a ring structure known as the Z ring [ 48 ] . 
+ The Z ring then facilitates the recruitment of all the other division proteins , together called the divisome [ 49 -- 51 ] , and provides a contractile force required for the invagination of the envelope layers , or at least the inner cell membrane . 
+ Thus , precise recruitment of FtsZ to midcell is central to the regulation of cell division and FtsZ has , over the last 20 years , become one of the most studied bacterial proteins . 
+ Much has been elucidated about FtsZ and most , if not all , of the divisome proteins [ 51 ] , but what yet escapes us is how the assembly of this crucial machinery not only precisely ﬁnds the midcell but also does so in concert with the replication and segregation of the chromosome . 
+ This question is of utmost importance as the correct timing and positioning of the Z ring between the DNA at midcell is quintessential to the competitive long-term survival of bacteria . 
+ 4.1. Negative regulators of Z ring placement
+ For the past two decades , the positioning of Z ring formation has been described as regulated by the combined action of the Min system and nucleoid occlusion . 
+ The Min system prevents the Z ring from forming at the cell poles and nucleoid occlusion prevents the Z ring from forming within the vicinity of the chromosome [ 52 -- 54 ] . 
+ The overall result is that the two systems prevent the Z ring from forming anywhere other than the cell centre ( Fig. 2A ) . 
+ Since the discovery of the Min system over 30 years ago [ 55 -- 57 ] , extensive research has revealed several components of this system that function in a co-operative manner to inhibit polar Z ring assembly and division at the poles . 
+ In E. coli and B. subtilis the Min system consists of two main proteins , MinC and MinD , that function to prevent FtsZ assembly and cell division at the cell poles , and additional Min proteins unique to each organism , that assist in different modes of action . 
+ For a complete review of the Min system and its mode of action , the reader is encouraged to read recent reviews [ 58 -- 60 ] . 
+ Nucleoid occlusion , on the other hand , inhibits Z ring formation over the DNA , and is mediated by proteins SlmA ( in E. coli ) and Noc ( in B. subtilis ) [ 52,53 ] . 
+ Although no sequence homology exists between the two , both SlmA and Noc possess similar characteristics : both proteins bind to specific regions scattered around the chromosome , except for the terminus region , which is largely devoid of these binding sites [ 61 ] . 
+ This pattern of binding supports the proposal that as chromosome replication nears completion and the terminus region occupies the central position in the cell , SlmA and Noc are no longer present in this region , relieving this area of nucleoid occlusion , thus allowing a Z ring to form there [ 52 ] . 
+ SlmA and Noc , however , have differing modes of action . 
+ Recent studies into the activity of SlmA have elucidated two potential mechanisms as to how it inhibits Z ring formation . 
+ In the first , SlmA promotes FtsZ depolymerisation [ 62 -- 64 ] . 
+ When bound to its specific DNA binding sites ( SBS ) , SlmA attaches to the highly conserved C-terminal tail of FtsZ where it competes for binding with other interacting or regulatory partners of FtsZ , including ZipA , FtsZ , ZapD , MinC and ClpX [ 64 ] . 
+ This promotes further interactions between SlmA and FtsZ , leading to FtsZ protofilament breakage independent of the GTPase activity of FtsZ [ 64,65 ] . 
+ In a second , alternative hypothesis , Tonthat et al. have suggested that SlmA binds to DNA as a dimer of dimers and spreads along nascent DNA where it forms higher-order nucleoprotein complexes that capture and inhibit FtsZ from coalescing into functional Z rings [ 66 ] . 
+ Continuing studies into the activity of SlmA are required to elucidate which hypothesis is correct , and , further , to understand how SlmA is able to carry out these membrane-localised functions when tethered to the DNA . 
+ In contrast , Noc in B. subtilis mediates its nucleoid occlusion effect by recruiting DNA to the membrane periphery via its newly discovered ability to bind the membrane [ 67 ] . 
+ Adams et al. propose a model in which Noc mediates its Z ring inhibitory function by physically crowding the available space between the DNA and the membrane periphery such that Z rings are unable to form there [ 67 ] . 
+ The model raises several questions . 
+ Is Noc abundant enough within the cell to mediate this crowding effect on its own or are there other protein players involved ? 
+ Is this Noc activity coupled with the transertion effect , a theory postulated over 20 years ago , which couples transcription , translation and insertion of membrane proteins ? 
+ Additionally , what effect does this recruitment of the DNA to the cell periphery have on chromosome organisation and what happens to this organisation in the absence of Noc ? 
+ Noc could potentially impact chromosome organisation or segregation in a way not previously considered . 
+ Noc belongs to the ParB family and shares ∼ 40 % sequence homology with the known chromosome segregation protein , Spo0J [ 68 ] . 
+ Furthermore , unlike in B. subtilis , Staphylococcus aureus cells with a noc deletion form a significant number of anucleate cells , even during normal , unperturbed growth [ 69 ] , thus suggesting a role for Noc in chromosome segregation in this organism . 
+ This raises the possibility that Noc could also be impacting chromosome organisation or segregation in B. subtilis to influence cell division . 
+ Continued study of the Min system and nucleoid occlusion regulatory systems has shown that they cannot be the sole regulators of correct placement of the Z ring at midcell . 
+ Under normal growth conditions , when either the Min system or Noc/SlmA in B. subtilis or E. coli are deleted , cells continue to grow and divide without major changes to cell viability [ 52,53,70 ] . 
+ However , although division is signiﬁcantly perturbed in B. subtilis and E. coli cells devoid of both the Min system and their respective nucleoid occlusion proteins , Z rings nonetheless preferentially form at midcell in internucleoid positions with high precision [ 53,71,72 ] . 
+ Thus , it appears that the role of the Min system and Noc/SlmA is to ensure there is sufﬁcient FtsZ for Z ring assembly at the desired division site in B. subtilis and E. coli by limiting the regions in which FtsZ can accumulate . 
+ Additionally , a number of bacteria possess only one system , or do not possess either the nucleoid occlusion or the Min protein homologues . 
+ Instead , positive mechanisms regulating Z ring positioning have recently been revealed by studies on several of these bacteria , including Streptococcus pneumoniae , Myxococcus xanthus and Streptomyces coelicolor . 
+ These are illustrated in Fig. 2B and described below . 
+ 4.2. Positive regulators of Z ring placement
+ A novel protein in S. pneumoniae , recently described by two independent studies , is named MapZ ( Mid-cell Anchored Protein Z ) or LocZ ( Localising at midcell of FtsZ ) [ 73,74 ] . 
+ MapZ localises to the midcell division site prior to any division proteins , including FtsZ and FtsA . 
+ This localisation of MapZ drives the recruitment of FtsZ to its midcell position . 
+ Following Z ring assembly , MapZ splits into two rings , which migrate bidirectionally , in tandem with the equatorial rings , to the future division sites [ 73,74 ] . 
+ Similarly , the ParA-like protein , PomZ ( Positioning at Midcell of FtsZ ) , discovered by Treuner-Lange et al. in M. xanthus , localises at the cell centre prior to , and independently of , FtsZ [ 75 ] . 
+ However , PomZ appears to have a positive spatial and temporal regulatory role . 
+ In newborn cells , PomZ is seen to co-localise with the nucleoid . 
+ Only once the nucleoid replicates does PomZ migrate to midcell to promote FtsZ recruitment . 
+ How it does so is not yet understood . 
+ The lack of in vitro interaction between puriﬁed FtsZ and PomZ , led Treuner-Lange et al. to postulate the existence of interacting partners to mediate PomZ regulatory activity on FtsZ [ 75 ] . 
+ Interacting proteins positively regulating Z ring placement have also been observed in S. coelicolor . 
+ S. coelicolor possesses a novel set of proteins unique to Actinobacteria which promote FtsZ recruitment and polymerisation at the correct site . 
+ In sporulating S. coelicolor cells , the membrane-associated SsgB is localised at midcell by its interaction with SsgA ; SsgB then recruits and tethers FtsZ to the division site [ 76,77 ] . 
+ An outstanding question in these organisms is how do these proteins recognise the future division site ? 
+ Is there a signal or unidentiﬁed marker ? 
+ And do such positive systems exist in bacteria that possess the Min system and/or nucleoid occlusion proteins ? 
+ Given that neither the Min system nor nucleoid occlusion is essential for positioning the Z ring correctly in either E. coli or B. subtilis , this suggests that some other regulatory system exists in these bacteria . 
+ A recent positive regulation link has been found between cell division and glycolysis in B. subtilis . 
+ Monahan et al. describe a model in which PDH E1 ( the E1 subunit of pyruvate dehydrogenase , required for the metabolism of pyruvate at the ﬁnal stage of glycolysis ) positively regulates Z ring assembly by co-localizing with the chromosome in a pyruvate-dependent manner [ 78 ] . 
+ This system may help to coordinate bacterial division with nutritional conditions to ensure the survival of newborn cells . 
+ Indeed , increasing evidence is pointing towards aspects of DNA replication and chromosome organisation/segregation influencing cell division in B. subtilis and E. coli . 
+ The first suggestion of a coordinated link between DNA replication and cell division in bacteria came from studies in B. subtilis . 
+ B. subtilis cells are able to begin septation when only 70 % of the chromosome has been replicated [ 79 ] . 
+ Given that Z ring formation precedes septation , this means mechanisms must be at play to trigger this first stage of cell division earlier on in DNA replication . 
+ Moreover , blocking the initiation of DNA replication in B. subtilis significantly affects Z ring positioning , suggesting a link between these two processes [ 80,81 ] . 
+ Looking at this more closely , Moriya et al. examined the effect of different blocks at the initiation stage of DNA replication and found that the earlier the block in initiation , the less likely a Z ring would form at midcell , with completion of the initiation stage allowing midcell Z rings to form at wild-type levels . 
+ Moriya et al. proposed a model , called the Ready-Set-Go model , linking the progression of initiation of DNA replication to midcell Z ring assembly , such that as the initiation phase progresses , midcell becomes increasingly available or `` potentiated '' for Z ring assembly ( Fig. 3 ) . 
+ This coincides nicely with the finding that the initiation phase of DNA replication in B. subtilis involves several proteins that assemble at oriC in a step-wise manner [ 8 ] . 
+ Most significantly , the `` Ready Set Go '' phenomenon is independent of Noc [ 82 ] , and has a positive influence on Z ring placement . 
+ What this Z ring potential at midcell actually is , is currently unclear . 
+ It is possible that the build-up of the replisome proteins at the medially located oriC acts as a beacon for progressive FtsZ accumulation there . 
+ Importantly , this study highlighted that Noc activity is insufficient in inhibiting cell division during initiation of DNA replication , suggesting other Noc-independent inhibition strategies must be in place within cells for proper cell division , an idea also supported by studies of Bernard et al. [ 83 ] . 
+ Linkage between initiation of DNA replication and cell division in B. subtilis is further demonstrated in studies by Arjes et al. 
+ The authors show that extended inhibition of the initiation of DNA replication results in an irreversible block to cell division and vice versa . 
+ This phenomenon was aptly termed the point of no return ( PONR ) [ 84 ] . 
+ What the trigger for the PONR is and why bacteria are unable to resume growth remain outstanding questions . 
+ The phenomenon is however independent of the SOS-response and cellular levels of DnaA and FtsZ ; and microarray data suggest that the trigger for the PONR may be post-transcriptional [ 84 ] . 
+ More recently , midcell Z ring assembly has also been linked to DNA replication in E. coli . 
+ Cambridge et al. found midcell Z ring assembly was inhibited when DNA replication elongation was blocked [ 85 ] . 
+ Importantly , this occurred in a SlmA- , MinC- and SOS-independent manner . 
+ Overall , this finding suggests that DNA replication playing a positive role in Z ring positioning is not exclusive to B. subtilis , but is likely to occur in a number of organisms . 
+ 6. Coordinating cell division with chromosome organisation and segregation
+ While the nucleoid occlusion proteins Noc and SlmA are typically associated with coordinating cell division and chromosome segregation , a variety of mutations in chromosome segregation proteins have long been known to lead to incorrect Z ring positioning in both E. coli and B. subtilis [ 26,39 ] , providing clear evidence that the two processes are connected . 
+ The absence of any of the constituents of the E. coli MukBEF complex results in temperature sensitivity , loss of chromosome organisation and condensation , and generation of ∼ 5 % anucleate cells at the permissive temperature due to Z ring misplacement [ 39 ] . 
+ Similarly , B. subtilis cells lacking smc ( under slow growth conditions ) or spo0J exhibit aberrant positioning , or level of condensation , of the nucleoid , and also result in formation of anucleate cells [ 26,27,86,87 ] . 
+ In minimal media , deletion of both smc and spo0J enhances this effect whereby the frequency of anucleate cells increases to 19 % , with 12 % of cells containing nucleoids guillotined by the septum [ 27 ] . 
+ While these cell division phenotypes of chromosome segregation mutants have been known for a long time , it remains unclear if chromosome segregation and division site positioning are coupled by the chromosome segregation proteins themselves . 
+ At the heart of this question is the fact that misplacement of Z rings in chromosome segregation mutants can occur indirectly through nucleoid occlusion : chromosome segregation mutants alter chromosome architecture , thus resulting in improper Noc/SlmA-DNA localisation within the cell and misplaced Z rings . 
+ However , it still remains possible that chromosome segregation proteins may actually directly contribute to Z ring placement , independently of their indirect consequences on nucleoid occlusion . 
+ One hypothesis is that they may participate directly in establishing the Z ring site ; however no direct interaction between chromosome segregation proteins and FtsZ has ever been reported in the literature . 
+ A second hypothesis is that through their chromosome-organizing activities , they contribute to an unknown aspect of chromosome organisation that is directly linked to cell division . 
+ In favour of this hypothesis is the recent observation that the organisation of a speciﬁc region of the E. coli chromosome contributes to establishing the Z ring position at midcell . 
+ Bailey et al. found that the Ter macrodomain of the chromosome in E. coli becomes important for midcell Z ring positioning in the absence of SlmA and the Min system [ 72 ] . 
+ Speciﬁcally , combining the slmA min double mutant with a mutant in MatP ( the protein that organises the Ter macrodomain ) affected midcell Z ring precision . 
+ The authors also demonstrated that this effect is mediated through interactions between MatP and the divisome proteins ZapB and ZapA . 
+ These interactions were established by Espéli et al. [ 88 ] . 
+ Thus , these results suggest that the organisation of the Ter macrodomain plays a positive role in Z ring positioning . 
+ Intriguingly , in the absence of this link to the Ter macrodomain , SlmA and the Min system , there is still a slight midcell bias for Z ring placement . 
+ Thus , these modest effects on Z ring positioning suggest that many levels of control are required for bacteria to accurately coordinate division with chromosome organisation . 
+ In analogy to MatP , and as mentioned above , Spo0J , SMC and MukBEF are suggested to be involved in the overall organisation of the origin region following its replication [ 33,42,89 ] . 
+ Marbouty et al. and Wang et al. have very recently utilised Hi-C techniques in B. subtilis to elucidate the structure of the chromosome in this organism [ 90,91 ] . 
+ As well as identifying both short- and long-range chromosomal DNA interactions within the B. subtilis chromosome , both these studies directly demonstrate the requirement of the ParABS system and SMC complex in origin-region resolution , reformation and segregation following its duplication . 
+ Both studies also elegantly shine a light on the intrinsic link between DNA replication and chromosome organisation . 
+ Blocking initiation of DNA replication altered these DNA interactions and led to a loss of normal chromosome organisation and segregation . 
+ These studies make it intriguing to see which chromosome interaction domains are lost in chromosome organisation mutants resulting in abnormal Z ring formation such as those in the absence of spo0J , smc or mukB . 
+ It is possible that changes to chromosome architecture as a result of the action of chromosome segregation proteins are vital for large scale compaction of the origin region to bring together sequence-distal operons to allow for important interactions , for example to form a signal or to localise protein-protein interactions required for proper Z ring formation . 
+ Continued studies in this area will surely lead to a clearer picture of how these three key processes of DNA replication , chromosome organisation and cell division come together in perfect synchrony . 
+ Critical to a greater understanding of the spatial and temporal dynamics of cell division and chromosome segregation proteins and chromosomal loci is a greater appreciation of the architecture of the chromosome under various conditions , and the important role this plays in cell cycle regulation . 
+ For example , it is not yet clear exactly how or in what way cellular processes such as DNA replication , chromosome segregation or transcription affect chromosome architecture . 
+ Isolating the influence of each of these processes on chromosome architecture , and pinpointing cause and effect , will be challenging . 
+ Recent advances in technologies to look closer at how the chromosome is compacted and organised will be of great value in this endeavour . 
+ Genome-wide conformation capture techniques such as Hi-C and super-resolution microscopy allow us to detect chromosome interaction domains and infer information on the spatial organisation of the chromosome [ 90 -- 93 ] . 
+ Examining chromosome architecture on a more global scale using these technologies and in different environmental or growth conditions will give us insight into how bacteria coordinate chromosome replication , segregation and Z ring formation under various situations to allow proper daughter cell propagation . 
+ Acknowledgements
+ We thank David Rudner , Marcelo Nollmann and Romain Koszul for providing data prior to publication . 
+ We are also grateful to Fiona MacIver for critical reading and proofreading of the manuscript . 
+ This work was supported by Australian Research Council Discovery Project Grants to E.J.H ( DP120102010 ; DP150102062 ) . 
+ I.V.H was supported by an Australian Postgraduate Award . 
\ No newline at end of file
--- a/data/TEXT_FILES/notuseful_txt/26862720.txt 0 → 100644
View file @27818a9
+++ b/data/TEXT_FILES/notuseful_txt/26862720.txt 0 → 100644
View file @27818a9
+ DamID-seq: Genome-wide Mapping of Protein-DNA Interactions b
+ Abstract 
+ The DNA adenine methyltransferase identification ( DamID ) assay is a powerful method to detect protein-DNA interactions both locally and genome-wide . 
+ It is an alternative approach to chromatin immunoprecipitation ( ChIP ) . 
+ An expressed fusion protein consisting of the protein of interest and the E. coli DNA adenine methyltransferase can methylate the adenine base in GATC motifs near the sites of protein-DNA interactions . 
+ Adenine-methylated DNA fragments can then be specifically amplified and detected . 
+ The original DamID assay detects the genomic locations of methylated DNA fragments by hybridization to DNA microarrays , which is limited by the availability of microarrays and the density of predetermined probes . 
+ In this paper , we report the detailed protocol of integrating high throughput DNA sequencing into DamID ( DamID-seq ) . 
+ The large number of short reads generated from DamID-seq enables detecting and localizing protein-DNA interactions genome-wide with high precision and sensitivity . 
+ We have used the DamID-seq assay to study genome-nuclear lamina ( NL ) interactions in mammalian cells , and have noticed that DamID-seq provides a high resolution and a wide dynamic range in detecting genome-NL interactions . 
+ The DamID-seq approach enables probing NL associations within gene structures and allows comparing genome-NL interaction maps with other functional genomic data , such as ChIP-seq and RNA-seq . 
+ Introduction
+ DNA adenine methyltransferase identification ( DamID ) 1,2 is a method to detect protein-DNA interactions in vivo and is an alternative approach to chromatin immunoprecipitation ( ChIP ) 3 . 
+ It uses a relatively low amount of cells and does not require chemical cross-linking of protein with DNA or a highly specific antibody . 
+ The latter is particularly helpful when the target protein is loosely or indirectly associated with DNA . 
+ DamID has been successfully used to map the binding sites of a variety of proteins including nuclear envelope proteins 4-10 , chromatin associated proteins 11-13 , chromatin modifying enzymes 14 , transcription factors and co-factors 15-18 and RNAi machineries 19 . 
+ The method is applicable in multiple organisms including S. cerevisiae 13 , S. pombe 7 , C. elegans 9,17 , D. melanogaster 5,11,18,20 , A. thaliana 21,22 as well as mouse and human cell lines 6,8,10,23,24 . 
+ The development of the DamID assay was based on the specific detection of adenine-methylated DNA fragments in eukaryotic cells that lack endogenous adenine methylation 2 . 
+ An expressed fusion protein , consisting of the DNA-binding protein of interest and E. coli DNA adenine methyltransferase ( Dam ) , can methylate the adenine base in GATC sequences that are in spatial proximity ( most significantly within 1 kb and up to roughly 5 kb ) to the binding sites of the protein in the genome 2 . 
+ The modified DNA fragments can be specifically amplified and hybridized to microarrays to detect the genomic binding sites of the protein of interest 1,25,26 . 
+ This original DamID method was limited by the availability of microarrays and the density of predetermined probes . 
+ We have therefore integrated high throughput sequencing into DamID 10 and designated the method as DamID-seq . 
+ The large number of short reads generated from DamID-seq enables precise localization of protein-DNA interactions genome-wide . 
+ We found that DamID-seq provided a higher resolution and a wider dynamic range than DamID by microarray for studying genome-nuclear lamina ( NL ) associations 10 . 
+ This improved method allows probing NL associations within gene structures 10 and facilitates comparisons with other high throughput sequencing data , such as ChIP-seq and RNA-seq . 
+ The DamID-seq protocol described here was initially developed for mapping genome-NL associations 10 . 
+ We generated a fusion protein by tethering mouse or human Lamin B1 to E. coli DNA adenine methyltransferase and tested the protocol in 3T3 mouse embryonic fibroblasts , C2C12 mouse myoblasts 10 and IMR90 human fetal lung fibroblasts ( data not published ) . 
+ In this protocol , we start with constructing vectors and expressing Dam-tethered fusion proteins by lentiviral infection in mammalian cells 24 . 
+ Next , we describe the detailed protocols of amplifying adenine-methylated DNA fragments and preparing sequencing libraries that should be applicable in other organisms . 
+ The Dam-V5-LmnB1 fusion protein was verified to be co-localized with the endogenous Lamin B protein by immunofluorescence staining ( Figure 1 ) . 
+ The successful PCR amplification of adenine-methylated DNA fragments is a key step for DamID-seq . 
+ The experimental samples should amplify a smear of 0.2 - 2 kb while the negative controls ( without DpnI , without ligase or without PCR template ) should result in no , or clearly less , amplification ( Figure 2 ) . 
+ The methylated DNA fragments are in the range from 0.2 to 2 kb , while the desired insert size for an NGS library is from 200 to 300 bp . 
+ Therefore , it is essential to fragment the methyl PCR products into the suitable size range . 
+ Nonetheless , it was found to be impractical to simultaneously break larger DNA fragments down to suitable sizes and keep the majority of smaller DNA fragments intact in a single fragmentation duration . 
+ Therefore , time course experiments were performed to determine the minimal time ( T0.2 kb ) needed to fragment 1 µg DNA to a smear centered at 200 bp ( Figure 3 ) . 
+ Then 6 time durations in equal increments were selected between 5 min and T0.2 kb for the actual fragmentation . 
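The choice of fragmentation durations is simple arithmetic and can be sketched as follows. This is only an illustration: the function name is hypothetical, the 30 min value used for the minimal time is an invented example rather than a measured one, and it is an assumption that both end points (5 min and the minimal time itself) are included among the 6 durations.

```python
def fragmentation_times(t_max_min, t_min_min=5.0, n=6):
    """Return n durations (in minutes) evenly spaced from t_min_min to t_max_min.

    Assumption: both end points are included, so the step between
    consecutive durations is (t_max_min - t_min_min) / (n - 1).
    """
    if n < 2:
        raise ValueError("need at least two time points")
    if t_max_min <= t_min_min:
        raise ValueError("the upper limit must exceed the lower limit")
    step = (t_max_min - t_min_min) / (n - 1)
    return [round(t_min_min + i * step, 2) for i in range(n)]


# Hypothetical example: a time course that gave a minimal time of 30 min.
print(fragmentation_times(30.0))
```

Because the enzyme activity may vary between batches, the upper limit would need to be re-measured and the durations recomputed for each new batch.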
+ The enzymatic activity of double strand DNA Fragmentase may vary from batch to batch and may decrease over time , so it is recommended to repeat this step for a new batch of Fragmentase or after storage for a period of time . 
+ The desired insert size is between 200 and 300 bp corresponding to DNA fragments between 300 and 400 bp (including 121 bp sequencing adaptors) on the agarose gel. Three thin slices within this range were excised from each experimental sample to narrow the size range of a library and increase the possibility of obtaining at least one qualified sequencing library (Figure 4).
+ An aliquot of 5 µl of each amplified DNA library was analyzed on the agarose gel to determine which library may qualify for sequencing . 
+ As shown in Figure 5A , a clear single band of the same size as the excised gel slice should be visible on the agarose gel ( step 3.7.4 ) . 
+ Next , selected libraries were examined by a Bioanalyzer ( Figure 5B ) to determine the exact size range and concentrations prior to sequencing . 
+ If desired , amplified DNA libraries can be directly examined by a Bioanalyzer without gel analysis . 
+ When multiple libraries are of good quality , it is recommended to sequence libraries of similar size ranges for a pair of experimental ( cells expressing Dam-V5-POI ) and control ( cells expressing V5-Dam ) samples . 
+ The short reads generated by sequencing systems were first mapped back to the corresponding genome . 
+ Uniquely aligned reads were then passed to subsequent analyses . 
+ A pipeline to process short reads , construct a genome-NL interaction map and analyze gene-NL associations was described in detail in our previous work 10 . 
+ Representative results are shown in Figure 6 . 
+ Discussion
+ Whether Dam-tagged proteins retain the functions of endogenous proteins should be examined before a DamID-seq experiment . 
+ The subcellular localization of Dam-tagged nuclear envelope proteins should always be determined and compared with that of the endogenous proteins . 
+ For studying transcription factors , it is suggested to examine whether the Dam-fusion protein can rescue the functions of the endogenous protein in regulating gene expression . 
+ This functional test can be performed in organisms in which knockout mutants of endogenous DNA-binding proteins are available . 
+ Because advances in genome engineering have potentially allowed knocking out any endogenous gene of interest , functions of Dam-tagged DNA-binding proteins can be examined in cultured mammalian cells . 
+ The critical step in this protocol is to successfully fragment the DpnII-digested DamID PCR products to around 200 bp . 
+ This step is designed to render the amplified adenine-methylated fragments to a narrow size range for sequencing and to randomize the starting nucleotides of the DNA fragments in a sequencing library . 
+ Inefficient fragmentation will leave the majority of the DNA fragments starting with GATC ( the 5 ' - overhang from the second DpnII digestion ) , and will result in a much lower performance and yield or even a failure in Illumina sequencing . 
+ Other DNA fragmentation methods may be used as an alternative approach . 
+ The resolution of DamID ( and DamID-seq described here ) is limited by the frequency of GATCs in the genome to be studied . 
+ Moreover , even with high throughput sequencing , the genomic localizations of a DNA-binding protein can only be mapped within two consecutive GATCs rather than to the actual DNA-binding sites . 
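The dependence of resolution on GATC spacing can be made concrete with a short sketch. The code below is not part of the published pipeline; the function names and the toy sequence are hypothetical and serve only to show how the distances between consecutive GATC motifs, which bound the mapping resolution, could be computed for a given sequence.

```python
def gatc_sites(seq):
    """Return the 0-based start positions of every GATC motif in seq."""
    seq = seq.upper()
    sites = []
    i = seq.find("GATC")
    while i != -1:
        sites.append(i)
        i = seq.find("GATC", i + 1)
    return sites


def inter_gatc_lengths(seq):
    """Return the distances between consecutive GATC motifs.

    A signal can only be assigned to the interval between two
    consecutive motifs, so these distances bound the resolution.
    """
    sites = gatc_sites(seq)
    return [b - a for a, b in zip(sites, sites[1:])]


# Toy sequence (hypothetical), with GATC at positions 2, 10 and 15.
toy = "AAGATCTTTTGATCCGATC"
print(gatc_sites(toy))
print(inter_gatc_lengths(toy))
```

Applied to a whole genome, the distribution of these distances gives a rough idea of how finely binding sites can be localized in each region.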
+ Despite its limitation , the DamID assay has important advantages . 
+ Because DamID does not require highly-specific antibodies , it can be used to detect a subset of nuclear proteins that could be difficult to assay by ChIP ( such as the nuclear envelope proteins ) . 
+ To study how these proteins regulate genome functions , it is important to integrate and cross-analyze their genome-wide localization data with the current epigenomic mapping data ( such as data from the ENCODE and NIH Roadmap Epigenomics Projects 30,31 ) . 
+ The DamID-seq approach provides both higher resolution and higher sensitivity than DamID by microarray and enables detecting differential NL-associations within gene structures 10 . 
+ A combinatorial analysis of DamID-seq data , ChIP-seq data 32 and gene expression data has identified a class of NL-associated genes with distinct epigenetic and transcriptional features ( data not published ) . 
+ Another advantage of DamID is that it only requires a small number of cells . 
+ In recent years , there has been an explosion in single cell analysis of gene regulation 33,34 . 
+ Although genome sequence 35 , genome-wide gene expression 36 and chromatin conformation 37 can be assayed in a single cell , there has not been an available approach for detecting protein-DNA interactions genome-wide in a single cell . 
+ DamID-seq is a highly promising approach for this goal , and may complement the single cell imaging approach in detecting the dynamics of genome-NL interactions 38 . 
+ One complication is that because the Dam-fusion protein is expressed at a much lower level than the endogenous protein in the DamID assay , it is possible that the Dam-fusion protein may only occupy a subset of genomic binding sites as compared to the endogenous protein . 
+ DamID assay has mostly been used in cultured animal cells to detect protein-DNA interactions . 
+ Notably , developmental biologists have applied this assay in detecting protein-DNA interactions in specific cell types in vivo . 
+ For example , Dam-tagged RNA polymerase II was expressed specifically in Drosophila neural stem cells to detect their genome-wide occupancy without cell isolation 39 . 
+ DamID-seq will be highly useful to study the genome-wide localizations of nuclear envelope proteins , transcription factors and chromatin regulators during development in animal models . 
+ Acknowledgements
\ No newline at end of file
--- a/data/TEXT_FILES/notuseful_txt/27336699.txt 0 → 100644
View file @27818a9
+++ b/data/TEXT_FILES/notuseful_txt/27336699.txt 0 → 100644
View file @27818a9
+ coli Challenge on Liver and Mammary
+ Abstract 
+ Our objective was to identify the biological response and the cross-talk between liver and mammary tissue after intramammary infection ( IMI ) with Escherichia coli ( E. coli ) using RNAseq technology . 
+ Sixteen cows were inoculated with live E. coli into one mammary quarter at ~ 4 -- 6 weeks in lactation . 
+ For all cows , biopsies were performed at -144 , 12 and 24 h relative to IMI in liver and at 24 h post-IMI in infected and non-infected ( control ) mammary quarters . 
+ For a subset of cows ( n = 6 ) , RNA was extracted from both liver and mammary tissue and sequenced using a 100 bp paired-end approach . 
+ Ingenuity Pathway Analysis and the Dynamic Impact Approach analysis of differentially expressed genes ( overall effect False Discovery Rate 0.05 ) indicated that IMI induced an overall activation of inflammation at 12 h post-IMI and a strong inhibition of metabolism , especially related to lipid , glucose , and xenobiotics at 24 h post-IMI in liver . 
+ The data indicated in mammary tissue an overall induction of inflammatory response with little effect on metabolism at 24 h post-IMI . 
+ We identified a large number of up-stream regulators potentially involved in the response to IMI in both tissues ; however , the core network of transcription factors controlling the response to IMI was relatively small in liver and large in mammary tissue . 
+ Transcriptomic results in liver and mammary tissue were supported by changes in inflammatory and metabolic mediators in blood and milk . 
+ The analysis of potential cross-talk between the two tissues during IMI uncovered a large communication from the mammary tissue to the liver to coordinate the inflammatory response but a relatively small communication from the liver to the mammary tissue . 
+ Our results indicate a strong induction of the inflammatory response in mammary tissue and an impairment of liver metabolism at 24 h post-IMI , partly driven by signaling from the infected mammary tissue . 
+ Introduction
+ During early lactation ( i.e. the first 60 days of lactation ) , the massive repartition of nutrients to the mammary gland for milk synthesis has been identified as a major contributor to the high risk of developing diseases [ 1 ] . 
+ This repartition of energy toward the mammary gland is not compensated by feed intake , which also reaches a nadir during early lactation [ 1 ] . 
+ The requirement of energy and nutrients increases ~ 5-fold from pregnancy to lactation in high producing dairy cows mainly due to the large amount of milk synthesized by the mammary gland [ 2 ] . 
+ In order to meet the nutrient demands in early lactation , most cows mobilize body tissue , e.g. skeletal muscle and adipose tissue , and thereby experience a period of negative energy balance , as reflected by the degree of increase in circulating non-esterified fatty acids ( NEFA ) , ketone bodies ( β-hydroxybutyrate ; BHBA ) and decrease in blood glucose [ 3 ] . 
+ As a result , production diseases , such as ketosis and hepatic lipidosis , occur most often at this time [ 4 ] and are associated with negative impacts on animal health and reduced economic outcome to the farmer . 
+ The liver plays a central role in the metabolic and inflammatory physiology of the dairy cow . 
+ Dairy cows , being ruminants , absorb a negligible amount of glucose from the intestine [ 4 ] ; therefore , the large amount of glucose needed to synthesize milk lactose comes largely from hepatic gluconeogenesis . 
+ Especially in early lactation , the liver is naturally compromised via increased gluconeogenesis and the catabolism of infiltrating NEFA . 
+ Besides its vital role in metabolism , the liver participates in the immune response by synthesizing and secreting into the bloodstream inflammatory mediators ( i.e. acute phase proteins ) [ 5 ] . 
+ Acute phase proteins are non-specific innate immune components involved in restoring homeostasis and providing host protection from invading microorganisms and inflammation [ 6 ] . 
+ Few studies have focused on the metabolic changes that occur in the liver after an IMI . 
+ Recent work has demonstrated a large transcriptomic response of the bovine liver after intramammary infection ( IMI ) challenge with lipopolysaccharides ( LPS ) , an endotoxin released from the cell wall of Escherichia coli ( E. coli ) [ 7 ] . 
+ Results indicated an increase in the hepatic expression of transcripts coding for proteins involved in acute phase reaction ( and other inflammatory related proteins ) and a consequent/concomitant reduction in the expression of transcripts coding for key metabolic enzymes , including the ones involved in gluconeogenesis [ 7 ] . 
+ Therefore , during inflammatory states such as mastitis , the immunometabolic demands may compromise liver function and increase risk of disease . 
+ The cross-talk between different tissues is emerging as an important regulatory factor during health and disease . 
+ This has been observed in monogastrics [ 8 ] . 
+ Mastitis , an inflammation of the mammary gland , is the most costly disease in the dairy industry and occurs more frequently during early lactation [ 9 ] . 
+ Mastitis is characterized by an increase in milk somatic cell count ( SCC ) and may be accompanied by the presence of an intramammary pathogen [ 10 ] , such as E. coli , one of the most common and costly mastitis-causing pathogens [ 11 ] . 
+ The mammary tissue depends heavily on a functional liver and , therefore , it appears reasonable that a direct cross-talk between the mammary gland and the liver exists to coordinate the nutrient demands for lactation and to elicit the immune response to mastitis . 
+ Our objective was to characterize the individual response and the cross-talk between liver and mammary tissue in response to IMI with E. coli using RNAseq technology . 
+ Materials and Methods
+ Experimental procedures involving animals were approved by the Danish Animal Experiments Inspectorate and complied with the Danish Ministry of Justice Laws concerning animal experimentation and care of experimental animals . 
+ The animal trial was conducted at the Aarhus University 's dairy barn , Ammitsbøl Skovgaard ( Denmark ) . 
+ Sixteen healthy primiparous Holstein cows at ~ 4 -- 6 weeks in lactation were used for this study . 
+ The experimental design has been illustrated previously [ 12 ] . 
+ Cows were not treated for any clinical signs of disease before the study period . 
+ Details on animal housing , total mixed ration fed , treatment , preparation and inoculation of E. coli and clinical examinations have been previously described [ 13,14 ] . 
+ Briefly , cows were considered healthy and free of mastitis-causing pathogens based on body temperature , white blood cell count , glutaraldehyde test , California Mastitis Test ( Kruuse , Marslev , Denmark ) and bacteriological examinations of aseptic quarter foremilk samples prior to the start of the study period . 
+ Using the portable DeLaval Cell Counter ( DeLaval , Tumba , Sweden ) , the front quarter with the lowest SCC ( < 27,000 cells / mL ) was used for E. coli infusion . 
+ All eligible cows were inoculated with ~ 20 -- 40 cfu of live E. coli ( Danish field isolate k2bh2 ) into one front mammary quarter immediately following the afternoon milking ( h = 0 ) . 
+ Healthy , control quarters were chosen based upon bacteriological examinations where quarter SCC were < 181,000 cells/mL at 24 h post-IMI . 
+ Daily feed intake and milk yield at each milking were recorded . 
+ Rectal temperature was recorded periodically and composite milk samples were collected at -180 , -132 , -84 , -36 , -12 h , 0 , 12 , 24 , 36 , 48 , 60 , 72 and 84 h relative to IMI challenge from infected quarters for analysis of lactate dehydrogenase ( LDH ) , alkaline phosphatase ( AP ) and N-acetyl-β-D-glucosaminidase ( NAGase ) following procedures reported previously [ 15 ] . 
+ Aseptic quarter foremilk samples were collected at -12 , 0 , 3 , 6 , 12 , 18 , 24 , 48 , 60 and 84 h relative to IMI challenge from infected quarters . 
+ One day prior to IMI challenge , sterile Micro-Renathane polyvinyl catheters were inserted into the jugular vein and flushed with a sterile 0.9 % NaCl solution containing 50 IU Na-heparin as previously described [ 12 ] . 
+ Blood was collected at -12 , 0 , 3 , 6 , 12 , 18 , 24 , 36 , 60 and 84 h relative to IMI challenge and plasma was analyzed for concentrations of glucose , NEFA , BHBA and cholesterol using an autoanalyzer , ADVIA 1650 Chemistry System ( Siemens Medical Solution , Tarrytown , NY , USA ) as previously described [ 16 ] . 
+ Liver biopsies were collected from all cows at -144 , 12 and 24 h relative to IMI challenge and analyzed for gene expression as previously described [ 17 ] ; using a minimally invasive biopsy technique , mammary tissue was collected at 24 h relative to IMI challenge from both infected and control quarters for gene expression analysis as previously described [ 13 ] . 
+ Combined biopsy had no effect on the production and inflammatory mediators presented for this study [ 12 ] . 
+ After the mammary biopsies had been collected , cows were administered a prophylactic antibiotic treatment against infection with Gram-positive bacteria by intramuscular injection of 30 mL of Penovet vet ( 300,000 IE benzylpenicillinprocain/ml ; Boehringer Ingelheim Danmark A/S , Copenhagen , Denmark ) . 
+ No other antibiotic therapy was administered after IMI challenge . 
+ RNAseq Analysis
+ For a subset of liver and mammary tissue samples ( n = 6 cows for both liver and mammary tissue ) , RNA extraction was performed as described previously [ 18 ] . 
+ RNA was extracted from the liver of 6 cows at 3 different time points ( i.e. -144 , 12 and 24 h relative to IMI challenge ; 3 x 6 = 18 liver RNA samples in total ) and for the mammary tissue from the infected and control quarters of the same 6 cows ( i.e. 2 x 6 = 12 mammary tissue RNA samples in total ) . 
+ Each sample was sequenced using a 100 bp paired-end approach with the Illumina HiSeq 2000 sequencing technology by AROS Applied Biotechnology ( Aarhus , Denmark ) . 
+ The standard Illumina Tru-Seq mRNA protocol was used with minor changes ( fragmentation time was 1 minute and the number of PCR cycles was 13 ) . 
+ Sequence reads obtained for each sample were aligned separately to the Bovine genome assembly ( UMD3.1 ) . 
+ Sequence alignment was conducted using an efficient and sensitive mapping paradigm based on seed-and-vote algorithm [ 19 ] implemented in the Rsubread package in R . 
+ The abundance of mRNAs for all genomic features ( bovine Ensembl genes ) was calculated for each sample . 
+ Procedures previously described [ 20 ] were used to obtain the annotated bovine Ensembl genes from the annotations available at the UCSC Genome Browser . 
+ The total number of bovine Ensembl genes was 24,616 . 
+ The function ` featureCounts ' in the Rsubread package was used to count the number of reads that mapped to each feature using the default settings . 
+ The total number of mapped reads for each sample varied from 10,434,369 to 22,098,317 . 
+ Statistical Analysis
+ RNAseq . 
+ Differentially expressed genes were identified using edgeR ( version 3.4.2 ; [ 21 ] ) . 
+ The count data were normalized using the weighted trimmed mean of M-values [ 21 ] . 
+ Count data were non-normally distributed thus a generalized linear model ( GLM ) was fitted for each gene . 
+ A negative binomial GLM was used allowing for a quadratic mean-variance relationship commonly observed in this type of data . 
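+ The quadratic mean-variance relationship mentioned above can be illustrated with a minimal sketch. It assumes the usual negative binomial parameterization, in which the variance equals the mean plus the dispersion times the squared mean; the function name and the example values are hypothetical and not taken from the study.

```python
def nb_variance(mean: float, dispersion: float) -> float:
    """Variance of a negative binomial count under the mean/dispersion
    parameterization: var = mean + dispersion * mean**2.

    With dispersion = 0 this reduces to the Poisson case (var = mean).
    """
    if mean < 0 or dispersion < 0:
        raise ValueError("mean and dispersion must be non-negative")
    return mean + dispersion * mean ** 2


if __name__ == "__main__":
    # Hypothetical values: the extra variance grows with the square of the mean.
    for mean in (10.0, 100.0, 1000.0):
        print(mean, nb_variance(mean, dispersion=0.1))
```

+ The second term is what a Poisson model would miss: for highly expressed genes it dominates the variance.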
+ The GLM allows for a design to be modeled and differential expression is determined using a likelihood ratio test . 
+ In the GLM , the number of reads in RNA sample i that map to gene g is denoted $y_{gi}$ and the total number of mapped reads in that sample is denoted $N_i$ . 
+ It is assumed that $y_{gi} \sim \mathrm{NB}( \mu_{gi} , \phi_g )$ , where $\mu_{gi}$ is the mean and $\phi_g$ is the dispersion parameter of the negative binomial distribution . 
+ The expected number of reads for a particular gene , $\mu_{gi}$ , is the product of the relative abundance of that gene , $\lambda_{gi}$ , and the total number of reads $N_i$ in the ith sample . 
+ It is assumed that $\lambda_{gi}$ can be represented by a log-linear model : $\log \lambda_{gi} = x_i^T \beta_g$ , where $x_i^T$ is the covariate vector indicating the treatment conditions applied to sample i and $\beta_g$ is the vector of regression coefficients by which the covariate effects are mediated for gene g . 
+ Liver and mammary tissue samples were analyzed separately . 
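+ As a minimal sketch of the model just described, the expected read count can be computed as the library size times the exponential of the linear predictor. The design coding (an intercept plus an infection indicator), the coefficient values and the library size below are hypothetical illustrations, not estimates from the study.

```python
import math
from typing import Sequence


def expected_count(library_size: float,
                   covariates: Sequence[float],
                   coefficients: Sequence[float]) -> float:
    """Expected reads for one gene in one sample.

    The log relative abundance is the dot product of the covariate vector
    and the gene's coefficient vector; the expected count is the library
    size multiplied by the relative abundance.
    """
    if len(covariates) != len(coefficients):
        raise ValueError("covariates and coefficients must have equal length")
    log_abundance = sum(x * b for x, b in zip(covariates, coefficients))
    return library_size * math.exp(log_abundance)


if __name__ == "__main__":
    # Hypothetical gene: baseline abundance 1e-5, doubled in infected quarters.
    beta = [math.log(1e-5), math.log(2.0)]
    n_reads = 10_000_000  # hypothetical library size
    print(expected_count(n_reads, [1.0, 0.0], beta))  # control quarter
    print(expected_count(n_reads, [1.0, 1.0], beta))  # infected quarter
```

+ Scaling by the library size is what makes samples with different sequencing depth comparable.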
+ In the statistical analyses we only considered the main effect of time in case of liver samples ( i.e. βg has three treatment levels , -144 , 12 and 24 h relative to IMI challenge ) and infection status in case of mammary tissue samples ( i.e. βg has two treatment levels , Infected and Control ) . 
+ The aim of the differential expression analysis is to test the null hypothesis $H_0$ , stating that the relative abundance of reads for gene g is the same in treatment conditions T1 ( e.g. time = -144 h ) and T2 ( e.g. time = 12 h ) , $H_0 : \lambda_g^{T1} = \lambda_g^{T2}$ , against the alternative hypothesis $H_1$ , stating that the relative abundance of reads for gene g differs between T1 and T2 , $H_1 : \lambda_g^{T1} \neq \lambda_g^{T2}$ . 
+ This corresponds to testing the null hypothesis $H_0 : \beta_g^{T1} = \beta_g^{T2}$ in the log-linear model , which can be specified by a contrast vector $c^T$ as $H_0 : c^T \beta_g = 0$ . 
+ These hypotheses are evaluated by comparing the observed likelihood ratio test statistics to a χ2 distribution with 1 degree of freedom . 
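+ The comparison of a likelihood ratio statistic with a χ2 distribution with 1 degree of freedom can be sketched as follows. This is an illustration under stated assumptions, not the edgeR implementation: the log-likelihood values are hypothetical inputs, and the p-value uses the identity that a χ2 variable with 1 degree of freedom is the square of a standard normal variable.

```python
import math


def lrt_statistic(loglik_full: float, loglik_null: float) -> float:
    """Likelihood ratio statistic: twice the gain in log-likelihood of the
    full model over the null model (clamped at zero against rounding)."""
    return max(0.0, 2.0 * (loglik_full - loglik_null))


def chi2_1df_p_value(statistic: float) -> float:
    """Upper-tail probability of a chi-square distribution with 1 degree
    of freedom: P(Z**2 > s) = erfc(sqrt(s / 2)) for standard normal Z."""
    if statistic < 0:
        raise ValueError("statistic must be non-negative")
    return math.erfc(math.sqrt(statistic / 2.0))


if __name__ == "__main__":
    # Hypothetical log-likelihoods for one gene.
    stat = lrt_statistic(loglik_full=-120.3, loglik_null=-125.1)
    print(stat, chi2_1df_p_value(stat))
```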
+ For the liver we determined whether there was differential expression between any of the time points ( i.e. a joint test for the three comparisons : -144 vs 12 , -144 vs 24 and 12 vs 24 ) or at specific time points ( i.e. three individual tests , one for each of the comparisons -144 vs 12 , -144 vs 24 and 12 vs 24 ) . 
+ For the mammary tissue we determined whether there was differential expression between the infected and uninfected quarter ( i.e. a single test ) . 
+ To ensure stable inference for each gene an empirical Bayes method was applied to squeeze the gene wise dispersions towards a common dispersion for all genes [ 22 ] . 
+ As there are 24,616 genes annotated in the bovine genome , the statistical tests in each analysis were corrected for multiple testing using the Benjamini and Hochberg False Discovery Rate ( FDR ) method [ 23 ] as implemented in R ( version 2.12.0 ) . 
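+ For readers unfamiliar with the correction, the Benjamini and Hochberg step-up adjustment can be sketched in a few lines. This is a plain re-implementation for illustration, not the R routine used in the study, and the p-values in the example are hypothetical.

```python
from typing import List, Sequence


def benjamini_hochberg(p_values: Sequence[float]) -> List[float]:
    """Return Benjamini-Hochberg adjusted p-values in the input order.

    Each p-value is scaled by m / rank (rank 1 = smallest), and a running
    minimum taken from the largest rank downwards keeps the adjusted
    values monotone and capped at 1.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


if __name__ == "__main__":
    raw = [0.01, 0.04, 0.03, 0.005]  # hypothetical per-gene p-values
    print(benjamini_hochberg(raw))
```

+ A gene is then called differentially expressed when its adjusted value falls below the chosen FDR cut-off.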
+ The RNAseq data files were deposited in NCBI 's Gene Expression Omnibus ( GEO ; http://www.ncbi.nlm.nih.gov/geo/ ) and are accessible through GEO series accession number GSE75379 ( http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?token=czgzgcwwfxwjtub&acc=GSE75379 ) . 
+ Blood and milk . 
+ The PROC MIXED procedure of SAS ( SAS/STAT version 9.2 ; SAS Institute Inc. , Cary , NC ) was used for statistical analysis for all biopsied cows ( n = 16 ) . 
+ The class variables included cow , block ( i.e. date of IMI challenge ) , and time ( i.e. relative to IMI ) and the model included the fixed effect of time and the random effect of cow nested within block . 
+ The degrees of freedom were estimated with the Kenward-Roger specification in the model statement . 
+ The data are presented as least squares mean ( LSM ) and the standard error of the mean ( SEM ) . 
+ Statistical differences were declared as significant if P ≤ 0.05 . 
+ Bioinformatics Analyses Using Ingenuity Pathway Analysis (IPA)
+ Network , function , and pathway analyses were generated using IPA ( Ingenuity Systems , www.ingenuity.com ) , which assists with RNAseq data interpretation by grouping differentially expressed genes ( DEG ) into known functions , pathways , and networks based primarily on human and rodent studies . 
+ The whole dataset containing Ensembl Gene ID , FDR , expression ratio , and P-value for all comparisons was uploaded into IPA and the whole bovine annotated genome ( 24,616 unique Ensembl IDs ) was used as background . 
+ Only 15,148 unique genes were mapped into IPA . 
+ Due to the nature of the samples analyzed , the IPA analysis was restricted , for the liver data , to the IPA database entries related to liver as organ system and all Hepatoma cell lines and , for the mammary tissue data , to the entries related to mammary tissue as organ system , Breast Cancer cell lines and Other cell lines . 
+ The genes used in the IPA were selected based on the following criteria . In the liver : an FDR-adjusted p-value below 0.05 for the joint test determining differential expression at any of the time points , a fold change larger than 1 ( or smaller than -1 ) , and a p-value below 0.05 for each of the comparisons -144 vs 12 , -144 vs 24 and 12 vs 24 . 
+ In the mammary tissue : an FDR-adjusted p-value below 0.05 for the test comparing the infected and uninfected quarter . 
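+ The selection criteria above can be expressed as a simple filter. The sketch below is only an illustration: the record type, its field names and the example genes are hypothetical, the fold change is taken on the scale stated in the text ( larger than 1 or smaller than -1 ) , and the liver filter is applied to one time-point comparison at a time.

```python
from dataclasses import dataclass


@dataclass
class GeneResult:
    gene_id: str         # hypothetical identifier
    fdr_overall: float   # FDR-adjusted p-value of the overall (joint) test
    fold_change: float   # signed fold change for one comparison
    p_comparison: float  # unadjusted p-value for that comparison


def passes_liver_filter(gene: GeneResult) -> bool:
    """Liver: overall FDR < 0.05, fold change > 1 or < -1, comparison p < 0.05."""
    return (gene.fdr_overall < 0.05
            and abs(gene.fold_change) > 1
            and gene.p_comparison < 0.05)


def passes_mammary_filter(gene: GeneResult) -> bool:
    """Mammary tissue: only the FDR of the infected vs. control test is used."""
    return gene.fdr_overall < 0.05


if __name__ == "__main__":
    genes = [
        GeneResult("gene_a", 0.01, 2.5, 0.001),
        GeneResult("gene_b", 0.01, -1.8, 0.20),
        GeneResult("gene_c", 0.30, 3.0, 0.001),
    ]
    print([g.gene_id for g in genes if passes_liver_filter(g)])
    print([g.gene_id for g in genes if passes_mammary_filter(g)])
```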
+ Both up - and down-regulated genes were analyzed simultaneously . 
+ The IPA was used to analyze the up-stream regulators of DEG using the `` Up-stream analysis '' feature . 
+ The analysis uses an IPA Knowledge base to predict the expected causal effects between up-stream regulators and DEG targets . 
+ The analysis provides the more plausible prediction of the status of the up-stream regulator ( i.e. , activated or inhibited ) by computing an overlap P-value and an activation z-score . 
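+ The two statistics named above can be sketched as follows. This is an approximation based on our understanding of the published description of IPA 's up-stream regulator analysis, not IPA 's own code: the activation z-score is assumed to be the sum of target signs ( +1 if the observed change is consistent with activation of the regulator , -1 if consistent with inhibition ) divided by the square root of the number of targets, and the overlap P-value is assumed to be a right-tailed Fisher 's exact test. All counts in the example are hypothetical.

```python
import math
from typing import Sequence


def activation_z_score(signs: Sequence[int]) -> float:
    """Sum of +1/-1 target signs divided by sqrt(number of targets)."""
    if not signs:
        raise ValueError("at least one target is required")
    if any(s not in (-1, 1) for s in signs):
        raise ValueError("signs must be +1 or -1")
    return sum(signs) / math.sqrt(len(signs))


def overlap_p_value(n_background: int, n_targets: int,
                    n_deg: int, n_overlap: int) -> float:
    """Right-tailed Fisher's exact (hypergeometric) test: probability of
    seeing at least n_overlap regulator targets among n_deg genes drawn
    from a background containing n_targets targets."""
    total = math.comb(n_background, n_deg)
    upper = min(n_targets, n_deg)
    tail = sum(
        math.comb(n_targets, k) * math.comb(n_background - n_targets, n_deg - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


if __name__ == "__main__":
    # Hypothetical regulator: 8 targets consistent with activation, 2 with inhibition.
    print(activation_z_score([1] * 8 + [-1] * 2))
    print(overlap_p_value(n_background=15000, n_targets=200, n_deg=3000, n_overlap=60))
```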
+ The results were downloaded and graphically depicted using SigmaPlot v11 ( Systat Software Inc. , Germany ) . 
+ Network analysis of DEG coding for transcriptional factors ( TF ; i.e. , transcriptional network analysis ) was performed using IPA . 
+ The network analysis uncovers ( and visualizes ) the interactions between TF and also provides an overall prediction of TF activation/inhibition using the expression of putative down-stream genes . 
+ Furthermore , the most significantly enriched functions were uncovered among DEG TF in the network using IPA . 
+ The analysis of cross-talk between liver and mammary tissue at 24 h after E. coli IMI was performed using the network capability of IPA as previously described [ 24 ] . 
+ For the present analysis DEG were identified using the following criteria : FDR < 0.05 for the overall effect ( overall time effect for liver encompassing 12 and 24 h post-IMI and IMI vs. control for mammary tissue ) , an expression ratio 2 , and P-value < 0.001 ; a higher stringency compared to the functional analysis was used in order to identify the genes coding for proteins with a higher likelihood of cross-talk between the tissues . 
+ The genes considered to code for secreted proteins were those in the cytokines and growth factors categories , while genes coding for proteins considered to be receptors that might be able to `` sense '' the secreted proteins were those in the G-protein coupled receptor , ligand-dependent nuclear receptor , transcription regulator , and transmembrane receptor categories . 
+ Networks between DEG with high expression in liver of cows at 24 h and coding for secreted proteins and DEG more expressed in E. coli treated vs. saline in mammary tissue and coding for receptors and vice versa were built using IPA Knowledge base . 
+ Bioinformatic Analyses Using Dynamic Impact Approach
+ The description of the Dynamic Impact Approach ( DIA ) tool is reported elsewhere [ 25 ] . 
+ For the DIA analysis , the Ensembl IDs were transformed into Entrez Gene ID using BioDBnet [ 26 ] . 
+ Only 16,867 unique Entrez Gene IDs were obtained and used for the analysis . 
+ The whole annotated dataset was uploaded into the system using the whole bovine annotation as background . 
+ As for IPA a FDR < 0.05 for the overall time effect and a P-value < 0.05 between comparisons were used as cut-off . 
+ The DIA analysis was performed for the Kyoto Encyclopedia of Genes and Genomes ( KEGG ) pathways [ 27 ] . 
+ Results and Discussion
+ Responses in Blood and Milk after IMI with E. coli
+ For all biopsied cows ( n = 16 ) , clinical symptoms and rises in milk SCC , shedding of E. coli , rectal temperature , heart rate and respiration were observed and results are reported elsewhere [ 12,15 ] . 
+ Change in concentration of AP , NAGase and LDH in milk after IMI with E. coli for all 16 cows is shown in S1 Fig . 
+ Alkaline phosphatase increased by 48 h after IMI when compared to pre-IMI levels ( h = 0 ) . 
+ Milk NAGase was elevated at 36 , 48 and 72 h post-IMI . 
+ Milk LDH was significantly elevated at 36 h post-IMI whereas no other time points differed from h = 0 . 
+ Our data clearly indicate an inflammatory response to IMI challenge with E. coli . 
+ Feed intake and milk yield response for cows ( n = 16 ) after IMI challenge with E. coli are shown in S2 Fig . 
+ Feed intake decreased by 48 h post-IMI and returned to pre-IMI levels by 72 h post-IMI whereas milk yield decreased by 24 h , remained lower through 48 h , and returned to pre-IMI levels by 60 h post-IMI . 
+ The pro-inflammatory response and the changing hormonal environment contribute to the reduced feed intake and milk production observed during an IMI [ 28 ] and most likely explain the majority of the variation in feed intake and milk yield observed in this study . 
+ Changes in plasma concentrations of NEFA , BHBA , glucose and cholesterol ( n = 16 ) after IMI with E. coli are shown in S3 Fig . 
+ Overall , plasma NEFA increased during IMI but no differences were observed at any given time point . 
+ Elevated NEFA in blood indicates increased adipose tissue lipolysis during the inflammatory response [ 29 ] . 
+ Glucose concentrations increased by 36 and 60 h after IMI whereas plasma BHBA decreased by 12 h post-IMI and remained lower throughout the study period . 
+ Concentration of plasma cholesterol decreased and was significantly lower at 84 h post-IMI when compared to 0 h . 
+ Increases in plasma glucose are not associated with increased hepatic gluconeogenesis , which is impaired during periods of inflammation [ 30 ] , but are likely a consequence of the insulin resistance typically induced by inflammation , leading to hyperglycemia [ 31 ] . 
+ Drops in plasma BHBA can be primarily associated with increased transfer into milk [ 15 ] . 
+ Blood BHBA has been associated with reduced neutrophil recruitment [ 32 ] and may partly explain reduced levels of blood BHBA during inflammation . 
+ Changes in metabolites in blood and animal production during IMI are similar to those reported by others [ 33,34 ] and may increase risk for the development of subsequent disease during mastitis for dairy cows in early lactation . 
+ The complete dataset with statistical results is available in S5 Fig . 
+ Genomic analysis via RNAseq uncovered > 3,643 and > 4,724 DEG ( Fig 1 ) in liver tissue at 12 and 24 h post-challenge compared to -144 h , respectively . 
+ By 12 h ( i.e. early inflammatory response ) , 1,399 DEG were down-regulated and 2,244 were up-regulated , whereas similar numbers of up- and down-regulated DEG ( 2,388 up- and 2,336 down-regulated ) were detected at 24 h post-challenge ( i.e. peak inflammatory response ) compared to 144 h pre-challenge . 
+ For mammary tissue , > 2,379 DEG were observed 24 h post-challenge in the IMI compared to the control quarter with , similar to the liver at 12 h , more up-regulated ( 1,407 ) compared to down-regulated ( 972 ) DEG . 
+ Overall , the liver had a larger transcriptomic response to IMI when compared to mammary tissue . 
+ However , because the mammary gland involved a single comparison , the FDR cut-off for the overall treatment effect accounted for false positives , whereas for the liver false positives were controlled for the overall effect ( i.e. , time ) but not for the specific comparisons ( i.e. , 12 vs. -144 h , 24 vs. -144 h , and 24 vs. 12 h ) . 
+ The use of FDR ≤ 0.05 for the mammary tissue resulted in a P-value ≤ 0.005 ( S5 Fig ) . 
+ Applying the same P-value cut-off to the liver resulted in fewer DEG , with 2,555 DEG for the comparison 12 vs. -144 h and 3,323 DEG for the comparison 24 vs. -144 h ( approx . 28 % fewer DEG than with P-value ≤ 0.05 ) . 
+ Thus , the overall transcriptomic effect of IMI was still larger for the liver compared to mammary tissue . 
+ Because of the difference in the overall effect considered for the mammary tissue ( IMI vs. control ) and the liver ( time effect ) , we decided to use a consistent FDR for the overall effect and a P-value ≤ 0.05 for the various comparisons for the liver , while for the mammary tissue only the FDR for the overall effect was used . 
+ The use of a simple P-value ≤ 0.05 ( i.e. not corrected for multiple comparisons ) in liver is a liberal approach and can be a limitation for functional analyses based on enrichment , such as the approach used by IPA . 
+ However , the Z-score for the up-stream regulator analysis in IPA can benefit from a larger number of DEG , and the DIA is minimally affected by the P-value cut-off and , as for the Z-score , its algorithm benefits from a larger number of DEG . 
+ Thus , we opted for a more liberal P-value between comparisons in liver . 
+ Functional Analysis of the Liver Transcriptome: Early Induction of Inflammation followed by Large Inhibition of Metabolism
+ Hepatic transcriptomic response at 12 h post-IMI . 
+ Results using IPA analysis ( Fig 2 ) revealed a large induction of pathways associated with the inflammatory response such as ` IL-10 Signaling ' , ` IL-6 Signaling ' and ` Acute Phase Response Signaling ' with an important role played by the hepatic stellate cells at 12 h post-IMI via an induction of ` Hepatic Fibrosis / Hepatic Stellate Cell Activation ' pathway . 
+ With regard to DEG associated with ` Hepatic Fibrosis/Hepatic Stellate Cell Activation ' , the DEG overlapped with other inflammatory pathways such as ` IL-10 Signaling ' and ` IL-6 Signaling ' and genes associated with liver tissue damage were not altered at this time . 
+ Among other noteworthy pathways highly enriched and induced were ` Glucocorticoid Receptor Signaling ' and ` Death Receptor Signaling ' . 
+ During the inflammatory response , the liver synthesizes acute phase proteins ( e.g. serum amyloid A and haptoglobin ) that are associated with restoring homeostasis and providing host protection from invading microorganisms via inhibiting growth of bacteria . 
+ The acute phase proteins are considered a negative feedback inhibitor for the immune response [ 6 ] . 
+ Synthesis of acute phase proteins is stimulated by several cytokines , including IL-6 [ 35 ] , which may partly explain the enrichment of DEG associated with ` IL-6 Signaling ' . 
+ Glucocorticoids have been well documented as a natural immunosuppressor around parturition [ 36 ] and during the inflammatory response [ 36 ] . 
+ [ Figure caption , fragment ] ... < 0.05 , P-value between comparisons < 0.05 ) as total ( black ) , up-regulated ( Up ; red ) , and down-regulated ( Down ; green ) in bovine liver at 12 and 24 h vs. -144 h relative to IMI challenge with E. coli and at 24 vs. 12 h post-IMI . Reported is also the number of DEG between infected and non-infected bovine mammary quarters at 24 h post-IMI . 
+ Interleukin-10 aids in controlling the pro-inflammatory response via its anti-inflammatory properties [ 37 ] . 
+ The enrichment of both ` IL-10 Signaling ' and ` Glucocorticoid Receptor Signaling ' may help control the pro-inflammatory response thereby reducing risk of host tissue damage during inflammation . 
+ Among the metabolic-related pathways , the only pathway highly enriched and induced was ` PPAR Signaling ' . 
+ The Peroxisome Proliferator-activated Receptors ( PPAR ) are ligand-dependent nuclear receptors that control a large number of functions , particularly lipid metabolism [ 38 ] . 
+ The increased activation of PPARα , the most abundant PPAR isotype in liver , during the early response to IMI challenge is somewhat expected due to the anti-inflammatory role of PPAR via increased catabolism of arachidonic acid and modulation of the liver acute phase response by transrepression of inflammatory transcription factors [ 39 ] . 
+ The DIA analysis ( Fig 3 and S6 Fig ) indicated that the most impacted and induced pathways at 12 h post-challenge were those associated with the ` Immune System ' as well as metabolic-related pathways . 
+ Among immune-related pathways the ` NOD-like receptor signaling ' and the ` Toll-like receptor ( TLR ) signaling ' pathways were among the most impacted and induced ( S6 Fig ) . 
+ Among metabolic-related pathways , DIA indicated a high impact and induction of pathways related to amino acid metabolism ( Fig 3 ) . 
+ There was a general induction of almost all the pathways related to amino acid metabolism but the largest induction was detected for the ` Taurine and hypotaurine metabolism ' pathway ( S6 Fig ) . 
+ The large effect of amino acid metabolism during inflammation is likely determined by the pro-inflammatory cytokines and appears to be an important part of the inflammatory pattern in most species [ 40 ] . 
+ The DIA analysis uncovered a relatively high impact of IMI on lipid-related pathways with ` Fatty acid biosynthesis ' , ` Primary bile acid biosynthesis ' , and ` Steroid biosynthesis ' pathways among the most induced ( S6 Fig ) . 
+ The larger induction of fatty acid synthesis might be associated with the increase in triglyceride accumulation in liver during IMI , at least as observed using LPS in cows [ 41 ] . 
+ The summary of the pathways ( Fig 3 ) indicated that pathways related to signaling were also highly impacted by the IMI . 
+ The most induced pathways related to signaling were ` Cytokine-cytokine receptor interaction ' and ` Chemokine signaling ' ( S6 Fig ) . 
+ The ` Folding , sorting and degradation ' category of pathways was induced by IMI ( Fig 3 ) , especially due to an induction of the ` Protein processing in endoplasmic reticulum ' pathway ( S6 Fig ) , indicating an induction of synthesis of secreted proteins which may be associated with the synthesis of acute phase proteins , as supported by an induction of ` Acute Phase Response Signaling ' in IPA ( Fig 2 ) and by the increase in acute phase proteins after IMI [ 13 ] . 
+ Finally , an induction of apoptosis was also indicated by the DIA analysis ( Fig 3 and S6 Fig ) . 
+ An increase in apoptosis of liver during inflammation has been observed previously in mice [ 42 ] . 
+ Hepatic transcriptomic response at 24 h post-IMI . 
+ According to IPA analysis , DEG 24 h post-challenge associated with metabolism were down-regulated in liver tissue with no pathways associated with the inflammatory response significantly altered , with the exception of ` Acute Phase Response Signaling ' , which was overall induced ( Fig 2 ) . 
+ The most significantly enriched pathway at 24 h post-IMI was the ` LPS/IL-1 Mediated Inhibition of Retinoid X Receptor ( RXR ) Function ' indicating a direct effect of the Gram-negative bacteria on inhibiting the liver metabolism through the RXR , a nuclear receptor essential for the formation of the functional heterodimer with most ligand-dependent nuclear receptors [ 43 ] . 
+ In addition , ` Glutathione-mediated Detoxification ' is a common pathway utilized by the liver for detoxification but requires acetyl-CoA as a substrate and acetyl-CoA availability may have been low due to the inhibition of hepatic fatty acid oxidation/metabolism ( Figs 2 and 3 ) and carbohydrate metabolism ( Fig 3 ) during the inflammatory response . 
+ The DIA analysis confirmed IPA results with metabolic-related pathways being the most strongly impacted and inhibited and small or no effects were detected for immune-related pathways ( Fig 3 ) . 
+ Among the most inhibited metabolic-related pathways were ` Carbohydrate Metabolism ' ( e.g. , ` Ascorbate and aldarate metabolism ' , ` Pentose and glucuronate interconversions ' , ` Glycolysis / Gluconeogenesis ' , ` Propanoate metabolism ' ) , ` Lipid Metabolism ' ( e.g. , ` Steroid hormone biosynthesis ' , ` Primary bile acid biosynthesis ' , and ` Fatty acid metabolism ' ) , metabolism of amino acids ( e.g. , ` Valine , leucine and isoleucine degradation ' , ` Tryptophan metabolism ' , ` Glycine , serine and threonine metabolism ' ) , metabolism of vitamins ( e.g. , ` Retinol metabolism ' ) , biosynthesis of secondary metabolites ( e.g. , ` Caffeine metabolism ' ) , and metabolism of xenobiotics ( e.g. , ` Drug metabolism -- cytochrome P450 ' ) ( Fig 3 ; S6 Fig ) . 
+ These results indicate that the overall metabolism in the liver was activated during early inflammation but strongly inhibited , and possibly compromised , at a relatively later stage of inflammation . 
+ The data indicated that the abilities to oxidize fatty acids , synthesize ketone bodies , produce cholesterol , perform gluconeogenesis , and metabolize carbohydrates and amino acids were especially inhibited at 24 h post-IMI . 
+ Coupled with the increase in the ` Fatty acid biosynthesis ' pathway , it is likely that the liver of the cows at 24 h post-IMI had an increase in triglyceride accumulation . 
+ This is consistent with previous observations [ 44,45 ] . 
+ Furthermore , the inhibition of hepatic gluconeogenesis by inflammation has been observed in lactating dairy cows [ 30 ] . 
+ The decrease in BHBA observed in blood is explained by a combination of increased BHBA secretion in milk [ 15 ] and a down-regulation of genes associated with hepatic ketogenesis ( Fig 2 ) . 
+ An impairment of the hepatic metabolic response during inflammation is further supported by the comparison of hepatic transcriptomic expression between 24 and 12 h post-IMI challenge ( Fig 3 ) . 
+ The DIA results indicated that DEG associated with metabolism such as ` Carbohydrate Metabolism ' , ` Lipid Metabolism ' and ` Metabolism of Other Amino Acids ' were the most inhibited pathways during the inflammatory response . 
+ These results provide evidence that hepatic metabolic function is compromised during inflammation and suggest that cows experiencing mastitis may be at risk for development of subsequent metabolic diseases , especially during early lactation when risk of metabolic disease is high [ 46 ] . 
+ Up-stream Regulators Controlling the Liver Transcriptomic Response to IMI 
+ The IPA analysis uncovered a large number of up-stream regulators potentially controlling the transcriptomic adaptation of the liver at 12 ( Fig 4 ) and 24 h ( Fig 5 ) post-IMI . 
+ Hepatic response 12 h post-IMI . 
+ Genes coding for pro-inflammatory cytokines such as tumor necrosis factor alpha ( TNF ) , interleukin ( IL ) 1α ( IL1A ) and β ( IL1B ) , and IL-6 ( IL6 ) were estimated to play a major up-stream controlling role . 
+ Few anti-inflammatory cytokines were also uncovered to play a regulatory role , such as IL2 and IL10 . 
+ Ingenuity Pathway Analysis predicts causal effects among up-stream regulators and targets ( i.e. , differentially expressed genes ) . 
+ The analysis provides the more plausible prediction of the status of the upstream regulators ( i.e. activated or inhibited ) by computing an overlapping P-value and a Z-score . 
+ The upstream regulators are grouped by functional categories with an activation Z-score 2 ( positive = activated ; negative = inhibited ) , significance of overlap ( or enrichment ; as -- log10 of the P-value ) , and , when available and significant , the expression ratio . 
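+ The two statistics named in the figure caption can be illustrated with a simplified sketch . It assumes the commonly described unweighted form of an activation Z-score ( consistent minus inconsistent target predictions , divided by the square root of their total ) and a right-tailed Fisher 's exact test for the overlap P-value . The function names and the counts in the usage lines are hypothetical , and this is not IPA 's actual implementation . 

```python
from math import comb, sqrt


def activation_z_score(consistent: int, inconsistent: int) -> float:
    """Unweighted activation z-score (assumed form, not IPA's code).

    consistent   -- targets whose direction of change agrees with an
                    activated regulator
    inconsistent -- targets whose direction of change agrees with an
                    inhibited regulator
    """
    n = consistent + inconsistent
    if n == 0:
        raise ValueError("at least one target with a known direction is required")
    return (consistent - inconsistent) / sqrt(n)


def overlap_p_value(overlap: int, regulator_targets: int,
                    deg_count: int, background: int) -> float:
    """Right-tailed Fisher's exact test (hypergeometric upper tail).

    Probability of seeing at least `overlap` regulator targets among
    `deg_count` genes drawn from `background` genes, of which
    `regulator_targets` are known targets of the regulator.
    """
    max_k = min(regulator_targets, deg_count)
    total = comb(background, deg_count)
    tail = sum(
        comb(regulator_targets, k) * comb(background - regulator_targets, deg_count - k)
        for k in range(overlap, max_k + 1)
    )
    return tail / total


# Hypothetical counts, for illustration only.
print(activation_z_score(consistent=16, inconsistent=0))
print(overlap_p_value(overlap=5, regulator_targets=10, deg_count=20, background=100))
```

+ A positive score points toward predicted activation and a negative score toward predicted inhibition , matching the sign convention in the caption . 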
+ Several growth factors were estimated to be important up-stream regulators such as leptin ( LEP ) and epidermal growth factor ( EGF ) . 
+ Among a large number of transcription regulators ( TR ) , the highest activation was estimated for IRF1 ( interferon regulatory factor 1 ) , MYC ( v-myc avian myelocytomatosis viral oncogene homolog ) , NFE2L2 ( Nuclear Factor , Erythroid 2-Like 2 ) , STAT3 ( signal transducer and activator of transcription 3 ) , and SMAD4 ( mothers against decapentaplegic homolog 4 ) . 
+ The analysis by IPA indicated that HNF4A ( hepatocyte nuclear factor 4 , alpha ) , master regulator of lipid metabolism [ 47 ] , FOXO1 ( forkhead box protein O1 ) , essential for the regulation of gluconeogenesis [ 48 ] , and PPARGC1A ( PPAR gamma , coactivator 1 alpha ) , the essential cofactor for the activity of PPAR , were inhibited by the IMI . 
+ Few ligand-dependent nuclear receptors ( LdNR ) were deemed to play a role ; among these , NR1I3 ( nuclear receptor subfamily 1 , group I , member 3 ) and NR0B2 ( nuclear receptor subfamily 0 , group B , member 2 ) were the most activated , while AHR ( aryl hydrocarbon receptor ) and THRA ( thyroid hormone receptor , alpha ) were strongly inhibited . 
+ Several transmembrane receptors involved in the immune response were strongly induced , while several membrane transporters , especially those involved in cholesterol transport ( e.g. , ABCB4 and APOE ) , were inhibited ( Fig 4 ) . 
+ Cholesterol transport is known to be negatively affected by inflammation in monogastrics [ 49 ] and to decrease as a consequence of inflammation in dairy cows [ 50 ] . 
+ The decrease in cholesterol transport was also supported in our study by the overall decrease in cholesterol following IMI ( S4 Fig ) . 
+ Hepatic response 24 h post-IMI . 
+ The pro-inflammatory cytokines regulating the transcriptomic adaptation at 12 h post-IMI remained among the main up-stream regulators at 24 h post-IMI , with a relatively strong activation . 
+ Among growth factors , EGF remained activated but LEP was inhibited at 24 h post-IMI . 
+ A large number of LdNR and TR were uncovered to play a major regulatory role by IPA . 
+ With few exceptions , all the LdNR and the TR were inhibited at 24 h post-IMI . 
+ Among these , AHR , PPARA ( PPARα ) , PPARD ( PPARβ / δ ) , RARA ( retinoic acid receptor α ) and RXRA ( RXRα ) were the most inhibited LdNR while HNF4A , MED1 ( mediator complex subunit 1 ) , PPARGC1A , and the two SREBP ( sterol regulatory element-binding protein ) isoforms ( i.e. SREBF1 and SREBF2 ) were the most inhibited TR ( Fig 5 ) . 
+ Noteworthy is also the estimated inhibition of SCAP ( SREBF cleavage activating protein ) , MLX , and XBP1 ( X-box binding protein 1 ) . 
+ Most of the LdNR inhibited by the IMI at 24 h are involved in lipid and glucose metabolism [ 51,52 ] . 
+ Overall , the data indicated a large role of cytokines and TF in the transcriptomic regulation in the early phase of IMI but a very large importance of LdNR and TR in the transcriptomic adaptation of the liver at 24 h post-IMI , especially in controlling hepatic lipid and glucose metabolism . 
+ Transcription Network Controlling the Transcriptomic Adaptation in Liver after IMI 
+ The analysis of the transcriptional network among the up-stream regulators deemed to have played a major role in the transcriptomic adaptation of the liver to IMI ( Figs 4 and 5 ) uncovered a relatively small number of TF with potentially large cross-talk capabilities ( Figs 6 and 7 ) . 
+ At 12 h post-IMI , the network analysis revealed a tight network among several predicted activated TR such as MYC , STAT3 and IRF1 , and several predicted inhibited LdNR and TR , such as NR1I3 , AHR , HNF4A and PPARGC1A ( Fig 6 ) . 
+ The STAT3 was predicted to be activated by a plethora of cytokines and growth factors with IRF1 activated by IL1-β ( Fig 6 ) . 
+ This prediction is indicative of STAT3 being a central hub in the response to liver inflammation with the activation of MYC , NR1I3 , and NFE2L2 at 12 h post-IMI mainly due to the activation of STAT3 by cytokines . 
+ However , the network also indicated that the inhibited LdNR and TR had a negative regulatory role toward several of the pro-inflammatory TR ( Fig 6 ) . 
+ Thus , the activation of the networks toward a pro-inflammatory response could have initially been driven by cytokines , mainly through STAT3 , after which the inhibition of several TR and LdNR reduced their inhibitory role , amplifying the pro-inflammatory response . 
+ The importance of the TR and LdNR of the network is highlighted also by their significant association with several pathways and functions with an important role in the early response to inflammation , such as ` Glucocorticoid Receptor Signaling ' , ` Acute Phase Response Signaling ' , ` Regeneration of Liver ' , ` Inflammation of Liver ' , ` Proliferation of Lymphocytes ' and ` Synthesis of Fatty Acid ' . 
+ At 24 h post-IMI , the transcription network was composed exclusively of metabolic-associated LdNR and TF including PPARA , PPARD , SREBP isoforms , and HNF4A ( Fig 7 ) . 
+ Among these TF the PPAR are known to be highly regulated by endogenous or dietary compounds , such as fatty acids , leukotriene B4 , and glucose [ 38 ] . 
+ However , according to the predicted upstream regulators analysis , the importance of endogenous chemicals is not as large as cytokines ( Fig 5 ) . 
+ It is known that several of the LdNR and especially PPAR isotypes are inhibited indirectly by cytokines [ 53 ] which , however , was not predicted to have played a prominent role at 24 or 12 h post-IMI . 
+ The transcriptional network at 24 h post-IMI ( Fig 7 ) was significantly associated with lipid-related metabolism , including oxidation of fatty acids and synthesis of cholesterol and triglycerides which were all deemed to be inhibited at 24 h post-IMI . 
+ The above results are indicative of a prominent role of metabolic-related TR and LdNR in the coordination of the metabolic shut down of the liver at 24 h post-IMI . 
+ Functional Analysis of Mammary Transcriptome: Large Induction of Inflammatory Signaling and Reduced Milk Fat Synthesis
+ Differentially expressed genes up-regulated in mammary tissue 24 h post-IMI challenge were associated with the inflammatory response and down-regulated DEG were associated with lipid metabolism . 
+ The DIA analysis ( Fig 3 ) revealed that the most impacted pathways primarily activated in mammary tissue 24 h post-IMI challenge were associated with the ` Immune System ' , ` Signaling Molecules and Interaction ' ( i.e. , ` Jak-STAT signaling pathway ' ) , and ` Xenobiotics Biodegradation and Metabolism ' . 
+ Only two categories of pathways were predicted to be primarily inhibited in mammary tissue after IMI challenge , i.e. the ` Biosynthesis of Other Secondary Metabolites ' and ` Lipid metabolism ' . 
+ Surprisingly , no pathways or functions relating to the utilization of BHBA were observed in the mammary transcriptome ; this is partly supported by the down-regulation in expression of 3-hydroxybutyrate dehydrogenase isoforms ( File S1 ) , genes coding for key proteins for the utilization of 3-hydroxybutyrate . 
+ Increased transfer of BHBA from circulation to the mammary gland partly explains the decrease in blood BHBA relative to increases in milk BHBA observed in this study [ 15 ] . 
+ The mammary transcriptome data indicate little or no utilization of BHBA by mammary tissue . 
+ The role , if any , of BHBA in milk during IMI is unclear and warrants further investigation . 
+ The importance of lipid metabolism was highlighted by ` Fatty acid biosynthesis ' and ` Glycerolipid metabolism ' being among the top impacted and inhibited pathways ( S6 Fig ) . 
+ Among the many immune-related pathways that were highly impacted and activated , the ` Chemokine signaling pathway ' and the ` NOD-like receptor signaling ' were the most important ( Fig 3 and S6 Fig ) . 
+ The results from the IPA analysis ( Fig 8 ) supported DIA results . 
+ All the most enriched pathways were associated with the activation of the inflammatory response such as adhesion and diapedesis of leukocytes , ` Interferon Signaling ' , IL-10 and IL-6 Signaling , and ` Acute Phase Response Signaling ' . 
+ The ` LXR/RXR Activation ' was the only metabolic-related pathway significantly enriched ( Fig 8 ) . 
+ Despite having the majority of DEG associated with these pathways up-regulated by IMI , several of the DEG were strongly down-regulated and are known to be associated with milk fat synthesis . 
+ Among these are LPL ( lipoprotein lipase ) , FASN ( fatty acid synthase ) , SREBF1 , and ACACA ( acetyl-CoA carboxylase alpha ) ( S5 Fig ) . 
+ No down-regulation of PPARG ( PPARγ ) or NR1H3 ( LXR ) was detected ( S5 Fig ) despite the fact that milk fat synthesis was substantially reduced post-IMI [ 15 ] . 
+ Previous data with IMI using Streptococcus uberis were indicative of PPARγ being important in the observed reduction of milk fat synthesis [ 54 ] . 
+ These data , together with other evidence , led to the proposal of a role for PPARγ in the control of milk fat synthesis in the bovine mammary gland [ 38 ] . 
+ The data from the present paper only partly support a role of PPAR and LXR being important in the decreased milk fat synthesis due to E. coli IMI , mostly inferred by the decrease in expression of target genes ; however , none of the bioinformatics analysis results were indicative of these two LdNR being crucial in the E. coli IMI response . 
+ Overall , the results from DIA and IPA clearly are indicative of a marked activation of pathways and functions associated with the immune response and inhibition of pathways associated with lipid synthesis in mammary tissue 24 h post-IMI challenge with E. coli . 
+ The analysis of up-stream regulators ( Fig 9 ) revealed a primary role of a large number of cytokines , growth factors , and TR , with almost all predicted to be highly activated in the coordination of the transcriptomic adaptation to IMI by the mammary tissue . 
+ Among the up-stream regulators , the largest predicted activation was uncovered for IFNG ( interferon gamma ) and TNFA among DEG coding for cytokines , and for EGF , NRG1 ( neuregulin 1 ; that can bind EGF receptor ) , and VEGF ( vascular endothelial growth factor ) among the growth factors , whereas RELA ( v-rel avian reticuloendotheliosis viral oncogene homolog A ) and NFKB1 ( nuclear factor of kappa light polypeptide gene enhancer in B-cells ) were among the TR most highly activated in mammary tissue at 24 h post-IMI . 
+ The data depict a large importance of TR involved in inflammation and immune response with an interesting induction of epithelial proliferation ( i.e. , EGF ) [ 55 ] and vasculogenesis ( i.e. , VEGF ) . 
+ Several transmembrane receptors were predicted to be highly activated ; prominent among these were several of the TLR , which are involved in the inflammatory response ( Fig 9 ) . 
+ Contrary to the liver , where no miRNA ( microRNA ) were predicted to be upstream regulators , the mammary tissue response resulted in a relatively large number of miRNA inhibited , with miR16-5p being the most inhibited ( Fig 9 ) . 
+ Interestingly , when compared with the actual data ( S5 Fig ) , none of the predicted down-regulated miRNA by IPA were actually affected by the IMI . 
+ Up to 11 miRNA were significantly affected by the IMI ( FDR < 0.001 ) with 9 of them up-regulated and 2 down-regulated ( miR30F and miR33b ) . 
+ Among the up-regulated miRNA , the miR23A was induced > 420-fold in mammary 24 h post-IMI ( S5 Fig ) . 
+ The up-regulation of miR23A can be associated with the activation of TLR , especially TLR2 [ 56 ] . 
+ Whether the miRNA expression observed serves as a feedback signal altering the mammary transcriptome is unclear . 
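+ The FDR threshold quoted above can be illustrated with a short sketch of a false discovery rate adjustment . The text does not state which procedure was used ; the Benjamini-Hochberg step-up method is assumed here because it is a common choice , and the function name and the P-values in the usage lines are hypothetical . 

```python
def benjamini_hochberg(p_values: list[float]) -> list[float]:
    """Return Benjamini-Hochberg adjusted p-values in the input order.

    Assumption: this is one common FDR procedure; the source does not
    say which method produced its FDR values.
    """
    m = len(p_values)
    # Indices of the p-values from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p-values for four features.
raw = [0.01, 0.04, 0.03, 0.005]
fdr = benjamini_hochberg(raw)
significant = [i for i, q in enumerate(fdr) if q < 0.05]
print(fdr, significant)
```

+ A feature would then be called significant when its adjusted value falls below the chosen cutoff , e.g. 0.001 in the text above . 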
+ The transcription regulation network controlling the transcriptomic adaptation of the mammary tissue 24 h post-IMI was very large ( Fig 10 ) . 
+ Most of the TR were deemed to be activated and associated with inflammatory response under the control of a plethora of cytokines . 
+ The TR in the network were highly associated with the acute phase response and activity and proliferation of leukocytes . 
+ The TR of the network were also estimated to be associated with synthesis of lipid and metabolism of proteins ( Fig 10 ) . 
+ Overall the data are indicative of a large inflammatory response of the mammary at 24 h post-IMI with little or no effect on metabolism , with the exception perhaps of some negative effects on milk fat synthesis supporting the decrease in milk fat synthesis observed [ 15 ] . 
+ The adaptation of mammary tissue was highly coordinated by cytokines that appeared to have driven the inflammatory signaling network of the mammary . 
+ Ingenuity Pathway Analysis predicts causal effects among up-stream regulators and targets ( i.e. differentially expressed genes ; DEG ) . 
+ The analysis provides the more plausible prediction of the status of the up-stream regulator ( i.e. activated or inhibited ) by computing an overlapping P-value and a Z-score . 
+ The upstream regulators are grouped by functional categories with an activation Z-score ( positive = activated ; negative = inhibited ) , significance of overlap ( or enrichment ; as -- log10 of the P-value ) , and , when available and significant , the expression ratio . 
+ Both RNAseq reported here and microarray analyses in liver [ 7,17 ] and mammary tissue [ 13 ] resulted in similar conclusions . 
+ In mammary tissue , an up-regulation of DEG associated with the inflammatory response and a down-regulation of DEG associated with lipid metabolism were observed in both studies . 
+ In liver , DEG associated with the inflammatory response were up-regulated during the early inflammatory response ( i.e. 12 h post-IMI challenge ) whereas DEG associated with metabolism were down-regulated during peak inflammatory response ( i.e. 24 h post-IMI challenge ) . 
+ Results indicate that during the early response to mastitis an increase in both pro- and anti-inflammatory factors may help control inflammation while minimizing damage to liver tissue . 
+ In addition , results suggest an increase in the synthesis of acute phase proteins in response to pro-inflammatory cytokines ( e.g. IL-1β , IL-6 and TNF-α ) released primarily from resident macrophages and mammary epithelial cells during mastitis . 
+ During peak inflammatory response ( i.e. 24 h post-IMI challenge ) , hepatic tissue shifts from an inflammatory state to a reduction in the liver 's ability to metabolize nutrients , especially energy and protein metabolism . 
+ Cross-talk Between Liver and Mammary Inferred by the Transcriptomic Analysis 
+ Fig 11 shows the cross-talk between liver and mammary tissue as inferred from the DEG with an expression 2-fold at 24 h post-IMI challenge with E. coli . 
+ The use of this more stringent criterion identified approximately 2,300 DEG in liver at 24 h post-IMI compared to -144 h and 1,800 DEG in mammary tissue in the IMI vs. control ( 51.6 % and 76.8 % of the DEG compared to the functional analysis reported above ) . 
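+ The fold-change filter described above can be sketched as follows . The comparison symbol before ` 2-fold ' did not survive in the text , so the sketch assumes a cutoff of at least 2-fold in either direction ; the gene names , expression ratios , and function names are hypothetical . 

```python
from math import log2


def filter_by_fold_change(ratios: dict[str, float], min_fold: float = 2.0) -> dict[str, float]:
    """Keep genes whose expression ratio is at least `min_fold` up or down.

    `ratios` maps a gene name to its treatment/control expression ratio.
    Assumption: the cutoff is symmetric, so 0.5 counts as a 2-fold change.
    """
    cutoff = log2(min_fold)
    return {gene: ratio for gene, ratio in ratios.items() if abs(log2(ratio)) >= cutoff}


def percent_retained(n_kept: int, n_total: int) -> float:
    """Share of the original DEG list that survives the filter, in percent."""
    return 100.0 * n_kept / n_total


# Hypothetical ratios, for illustration only.
example = {"GENE_A": 4.0, "GENE_B": 0.5, "GENE_C": 1.5, "GENE_D": 0.8}
kept = filter_by_fold_change(example)
print(sorted(kept), percent_retained(len(kept), len(example)))
```

+ The percentages quoted in the text ( 51.6 % and 76.8 % ) correspond to this kind of retained share , computed against the DEG lists used in the functional analysis . 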
+ The analysis revealed substantial cross-talk between the two tissues with a communication almost unidirectional , i.e. mammary to liver . 
+ The IPA analysis predicted that the mammary tissue primarily had a large increase in secreted inflammatory cytokines , e.g. IL-10 , IL-6 and IL-1β , that interacted with liver receptors highly expressed during IMI , i.e. TLR2 , IL1R1 , TNFRSF1A , STAT3 , and BCL3 ( B-cell CLL/lymphoma 3 ) with a consequent increase in hepatic proliferation and regeneration and the inflammatory response ( primarily acute phase protein response ; Fig 2 ) . 
+ The IPA analysis identified 3 up-regulated DEG in liver at 24 h post-IMI , GRP ( gastrin-releasing peptide ) , SPP1 ( secreted phosphoprotein 1 ; osteopontin ) , and EPO ( erythropoietin ) , that encode signaling proteins that may potentially interact with receptors involved in the migration of neutrophils , development of mononuclear leukocytes and infiltration of granulocytes in mammary tissue ( Fig 11 ) . 
+ The gastrin-releasing peptide was originally classified as a neurotransmitter [ 57 ] and more recently has been associated with the endocrine response primarily as a regulatory peptide in the cow reproductive tract [ 58 ] . 
+ The protein encoded by SPP1 , i.e. osteopontin , is expressed mainly in the bone and kidney and is involved in the attachment of osteoclasts to the mineralized bone matrix . 
+ Osteopontin also acts as a cytokine that regulates the immune-mediated disease response [ 59 ] . 
+ The SPP1 protein has been associated with inflammation , metabolic diseases , fatty liver , and liver fibrogenesis in humans [ 60,61 ] . 
+ The up-regulation of SPP1 at 24 h post-IMI challenge in the liver may have played a role in the impairment of hepatic metabolic function during inflammation as well as maintenance of the inflammatory response in the mammary tissue . 
+ challenge with E. coli . 
+ Purple objects for liver and blue for mammary denote genes with expression 2-fold at 24 h after intramammary infection challenge with E. coli . 
+ Differentially expressed genes that code for proteins that are either released ( ! ) by the tissue as cytokines and growth factors ( drawn outside the organs ) or function as receptor ( drawn inside the organ ) . 
+ Solid ➞ denote direct and dashed ⇢ denote indirect activation of receptors in one organ by cytokines or growth factors with a potential increased release from the other organ . 
+ In yellow shade are most enriched functions of the affected receptors . 
+ SREBP isoforms , CEBP [ CCAAT/enhancer-binding protein alpha isoform d ] , and XBP1 ) . 
+ Coupling the metabolic response in blood and milk with transcriptomic responses in liver , our results indicate impaired liver metabolism during later stages of inflammation . 
+ The transcriptome analysis of the mammary tissue at 24 h post-IMI indicated a large immune response of the tissue with very little effect on metabolism , except a likely inhibition of lipid synthesis . 
+ Contrary to the liver , the transcriptomic adaptation of the mammary tissue appeared to be driven by a large network of up-stream regulators , with a large sensitivity to cytokines . 
+ The inflammatory transcriptomic response of the mammary tissue is supported by the inflammatory response observed in milk where dramatic increase in inflammatory mediators was observed after IMI with E. coli . 
+ The analysis of cross-talk uncovered a large communication from the mammary to the liver to coordinate the inflammatory response with very few factors potentially released by the liver to control the response of the mammary tissue during IMI . 
+ Our data also indicate that the mammary tissue did not directly influence the decreased metabolism of the liver at 24 h post-IMI but , likely , indirectly impacted hepatic metabolism via stimulation of the hepatic inflammatory response . 
+ A summary of the most relevant findings in the present experiment is reported in Fig 12 . 
+ In conclusion , our data revealed a different response of the liver and mammary tissue during IMI , with a similar overall inflammatory-like response of the mammary tissue in the quarter treated with IMI vs. the control quarter at 24 h post-treatment and liver at 12 h vs. -144 h post-IMI with also a large increase in expression of positive acute phase proteins related genes ( S4 Fig ) . 
+ Despite this , the metabolism of the mammary tissue was not significantly affected . 
+ As a result , the mammary tissue did not experience the `` shutdown '' of specific functions , i.e. metabolism and clearance , as observed in the liver , at 24 h post-IMI . 
+ The momentary impairment of metabolism and clearance capability of the liver as a consequence of the IMI might be partly explained by the shift in partitioning of nutrients towards the immune response . 
+ Due to the pivotal role played by the liver in controlling overall nutritional economy , metabolism , clearance from xenobiotics , and the immune response , our findings of a transcriptionally driven decline of critical functions in the liver provide further evidence of the negative consequence on overall performance of dairy cows affected by mastitis . 
+ The IMI response appeared to be highly coordinated by the potential increase in signaling between the two tissues with a strong regulatory role of the mammary tissue toward the liver via signaling molecules . 
+ Overall , the data revealed a previously unknown cross-talk between mammary and liver coordinating the response to IMI . 
+ inflammatory status in the mammary tissue . 
+ Our analyses revealed substantial cross-talk between the two tissues with a communication almost unidirectional , i.e. mammary to liver , via various cytokines and growth factors altered during inflammation . 
+ The mammary tissue appears to have synthesized positive acute phase proteins but did not experience a reduction of negative acute phase proteins ( S4 Fig ) . 
+ By 24 h post-IMI , milk synthesis decreased in the infected mammary gland , primarily milk fat synthesis . 
+ AHR = aryl hydrocarbon receptor ; APP = acute phase proteins ; BHBA = beta hydroxybutyrate ; CEBP = CCAAT/enhancer-binding protein alpha isoform d ; HNF4A = hepatocyte nuclear factor 4 , alpha ; IMI = intramammary infection ; NFKB = nuclear factor of kappa light polypeptide gene enhancer in B-cells ; NR1I3 = nuclear receptor subfamily 1 , group I , member 3 ; PPAR = peroxisome proliferator-activated receptors ; RELA = v-rel avian reticuloendotheliosis viral oncogene homolog A ; SREBP = sterol regulatory element-binding protein ; STAT3 = signal transducer and activator of transcription 3 ; VLDL = very low density lipoproteins . 
+ Supporting Information
+ S1 Fig . 
+ Concentrations of alkaline phosphatase ( A ) , N-acetyl-β-D-glucosaminidase ( NAGase ; B ) and lactate dehydrogenase ( LDH ; C ) in milk from cows ( n = 16 ) after intramammary challenge during early lactation . 
+ Differences ( P < 0.05 ) when compared to h = 0 . 
+ ( DOCX ) 
+ S2 Fig . 
+ Changes in daily feed intake ( A ) and milk yield ( B ) from cows ( n = 16 ) after intramammary challenge during early lactation . 
+ Differences ( P < 0.05 ) when compared to h = 0 . 
+ ( DOCX ) 
+ S3 Fig . 
+ Plasma concentrations of non-esterified fatty acids ( NEFA ; A ) , beta-hydroxybutyrate ( BHBA ; B ) , glucose ( C ) and cholesterol ( D ) from cows ( n = 16 ) after intramammary challenge during early lactation . 
+ Differences ( P < 0.05 ) when compared to h = 0 . 
+ ( DOCX ) 
+ S4 Fig . 
+ Expression of genes coding for selected pro-inflammatory cytokines and positive and negative acute phase proteins ( data as also reported in S5 Fig ) . 
+ ( JPG ) 
+ S6 Fig . 
+ Dynamic Impact Approach of the KEGG pathways analysis for both liver and mammary tissue . 
+ Presented are the summary of the categories of pathways , the details of each pathway , and sorted in descending order of impact in each comparison . 
+ ( XLSX ) 
+ S7 Fig . 
+ Results of the pathway and function analysis of Ingenuity Pathways Analysis for both liver and mammary tissue . 
+ ( XLSX ) 
+ S8 Fig . 
+ Complete results of the predicted up-stream regulators by Ingenuity Pathway Analysis for both liver and mammary tissue . 
+ ( XLSX ) 
+ Acknowledgments
+ We thank Dr Helle Daugaard Larsen , Danish Meat Research Institute ( DMRI ) , Roskilde , Denmark for the E. coli bacteria . 
+ The AU laboratory technicians and barn staff are thanked and acknowledged for their excellent technical assistance . 
+ 38 . 
+ Bionaz M , Chen S , Khan MJ , Loor JJ ( 2013 ) Functional Role of PPARs in Ruminants : Potential Targets for Fine-Tuning Metabolism during Growth and Lactation . 
+ PPAR Res 2013 : 684159 . 
+ doi : 10.1155/2013/684159 PMID : 23737762
\ No newline at end of file
--- a/data/TEXT_FILES/notuseful_txt/27424527.txt 0 → 100644
View file @27818a9
+++ b/data/TEXT_FILES/notuseful_txt/27424527.txt 0 → 100644
View file @27818a9
+ OPEN Systematic analysis of an evolved
+ Thermobifida fusca is a thermophilic actinobacterium . 
+ T. fusca muC obtained by adaptive evolution preferred yeast extract to ammonium sulfate for accumulating malic acid and ammonium sulfate for cell growth . 
+ We performed transcriptome analysis of T. fusca muC on Avicel and cellobiose with addition of ammonium sulfate or yeast extract , respectively , by RNAseq . 
+ The transcriptional results indicate that ammonium sulfate induced the transcription of the genes related to carbohydrate metabolism significantly more than yeast extract . 
+ Importantly , Tfu_2487 , encoding histidine-containing protein ( HPr ) , was not transcribed on yeast extract at all , while it was highly transcribed on ammonium sulfate . 
+ In order to understand the impact of HPr on malate production and cell growth of the muC strain , we deleted Tfu_2487 to get a mutant strain : muCΔ2487 , which had 1.33 mole/mole-glucose equivalent malate yield , much higher than that on yeast extract . 
+ We then developed an E. coli-T. fusca shuttle plasmid for over-expressing HPr in muCΔ2487 , a strain without HPr background , forming the muCΔ2487S strain . 
+ The muCΔ2487S strain had a much lower malate yield but faster cell growth than the muC strain . 
+ The results of both mutant strains confirmed that HPr was the key regulatory protein for T. fusca 's metabolisms on nitrogen sources . 
+ Thermobifida fusca is a thermophilic actinobacterium , which is an efficient degrader of plant cell walls1 . 
+ T. fusca has been heavily studied for more than 30 years because it can produce a variety of cellulases , hemicellulases and other enzymes . 
+ These enzymes , especially cellulases , are thermostable and retain high activity from pH 4 to 10 ( refs 2 -- 9 ) . 
+ Previously , a mutant strain , T. fusca muC obtained from adaptive evolution by cultivating T. fusca WT on non-lethal medium , was found to accumulate malic acid from different sugars10 . 
+ Malic acid is an important value-added product of the C4 diacids family , reported by the U.S. Department of Energy as the chemical of great commercial interest produced from renewable substrates11 . 
+ The malic acid synthesis pathway in T. fusca was identified by Deng et al. : phosphoenolpyruvate from glycolysis pathway was converted to oxaloacetate , which was then reduced to malate12 . 
+ Previously , the nitrogen sources were proved to affect the cell growth and malic acid production significantly12 . 
+ In this study , we performed transcriptome analysis of T. fusca muC on Avicel and cellobiose with addition of ammonium sulfate and yeast extract , respectively . 
+ The histidine phosphocarrier protein ( HPr or Tfu_2487 ) was not expressed at all in the muC strain on yeast extract , whereas on ammonium sulfate , where it was highly transcribed , the strain showed faster cell growth and a lower malic acid yield than on yeast extract . 
+ In order to understand the impact of HPr on the muC strain 's metabolisms , the hpr gene was deleted in T. fusca muC , forming muCΔ2487 . 
+ Tfu_2487 was then over-expressed in muCΔ2487 , forming muCΔ2487S , which was grown on ammonium sulfate and yeast extract . 
+ 1National Engineering Laboratory for Cereal Fermentation Technology ( NELCF ) , Jiangnan University , 1800 Lihu Road , Wuxi , Jiangsu 214122 , China . 
+ 2The Key Laboratory of Industrial Biotechnology , Ministry of Education , Jiangnan University , 1800 Lihu Road , Wuxi , Jiangsu 214122 , China . 
+ 3College of Life Science , North China University of Science and Technology , Tangshan 063000 , China . 
+ 4School of pharmaceutical science , Jiangnan University , 1800 Lihu Road , Wuxi , Jiangsu 214122 , China . 
+ Correspondence and requests for materials should be addressed to Y.D . 
+ ( email : dengyu@jiangnan.edu.cn ) or X.Z. ( email : xzhang6@163.com ) 
+ Results
+ Effect of nitrogen sources . 
+ The muC strain was grown on four conditions with different carbon and nitrogen sources : 1 ) CB1 : 2 g/L yeast extract , 5 g/L cellobiose ; 2 ) CB2 : 2 g/L yeast extract , 5 g/L Avicel ; 3 ) CC1 : 2 g/L ammonium sulfate , 5 g/L cellobiose ; 4 ) CC2 : 2 g/L ammonium sulfate , 5 g/L Avicel . 
+ The fermentation results are shown in Fig. 1 . 
+ The specific growth rates on CC1 and CC2 were 0.218 h−1 and 0.221 h−1 , respectively , which were significantly higher than those on CB1 ( 0.142 h−1 ) and CB2 ( 0.154 h−1 ) . 
+ However , the muC strain on CB1 and CB2 had much higher malic acid yields than those on CC1 and CC2 . 
+ It seems that yeast extract is much more suitable for producing malic acid , while T. fusca muC favors ammonium sulfate to accumulate biomass . 
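+ To put the reported specific growth rates in more familiar terms , the sketch below applies the standard exponential-growth relations . The text does not describe how the rates were estimated , so the two-point formula is an assumption , and the function names are hypothetical ; only the rate values come from the text above . 

```python
from math import log


def specific_growth_rate(x1: float, x2: float, t1: float, t2: float) -> float:
    """Specific growth rate (per hour) between two biomass readings.

    Assumes exponential growth between times t1 and t2 (in hours),
    with x1 and x2 in any consistent biomass unit.
    """
    return log(x2 / x1) / (t2 - t1)


def doubling_time(mu: float) -> float:
    """Doubling time in hours for a specific growth rate mu (per hour)."""
    return log(2) / mu


# Rates reported in the text for the four conditions (per hour).
reported = {"CC1": 0.218, "CC2": 0.221, "CB1": 0.142, "CB2": 0.154}
for condition, mu in reported.items():
    print(f"{condition}: doubling time = {doubling_time(mu):.2f} h")
```

+ Under these assumptions , a higher specific growth rate on ammonium sulfate translates directly into a shorter doubling time than on yeast extract . 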
+ Global gene expression and clusters of orthologous groups ( COG ) analysis . 
+ In order to investigate transcriptional changes on different conditions , the transcriptomes of the muC strain on CB1 , CB2 , CC1 and CC2 were analyzed by RNAseq . 
+ The RNAseq results were stored in the GenBank ( SRA # : SRP067532 ) . 
+ The results of transcriptomes are shown in Supplementary Table S2 . 
+ The COG distributions for the genes up-regulated and down-regulated in these transcriptomes are shown in Fig. 2 . 
+ Compared to CB1 , T. fusca muC on CC1 had 357 genes up-regulated and 335 genes down-regulated ; the top three categories of up- and down-regulated genes are : R : General function prediction only ( equivalent to unclassified ) , E : Amino acid transport and metabolism and C : Energy production and conversion . 
+ Compared to CB2 , T. fusca muC on CC2 had 292 genes up-regulated and 357 genes down-regulated . 
+ The top three categories of up- and down-regulated genes are : R : General function prediction only ; C : Energy production and conversion ; E : Amino acid transport and metabolism . 
+ The observed cell growth and metabolite changes were phenotypes that largely depended on changes in carbohydrate transport and metabolism . 
+ The genes responsible for producing malic acid by muC were also mainly included in the above category . 
+ Among the genes associated with carbohydrate transport and metabolism , 184 were up-regulated and 27 were down-regulated on CC1 ( compared to CB1 ) , and 172 were up-regulated and 10 were down-regulated on CC2 ( compared to CB2 ) . 
+ Because T. fusca muC on CC1 and CC2 used ammonium sulfate as the sole nitrogen source and T. fusca muC on CB1 and CB2 used yeast extract as the sole nitrogen source , the above transcriptional results indicate that ammonium sulfate induced the transcription of genes related to carbohydrate metabolism significantly more strongly than yeast extract did . 
+ The higher transcriptional levels of the genes related to carbohydrate metabolism were in agreement with the faster cell growth of the muC strain on CC1 and CC2 ( ammonium sulfate as the nitrogen source ; Fig. 1 ) . 
+ Genes related to cellulases . 
+ The widely accepted mechanism for enzymatic hydrolysis of cellulose involves the synergistic activity of endocellulases , exocellulases and processive cellulases13 . 
+ There were 10 cellulase genes and 3 hemicellulase genes found to be highly expressed in the muC strain under all conditions ( Table 1 ) . 
+ In Table 1 , 7 out of 10 cellulase genes and 2 out of 3 hemicellulase genes of the muC strain on CC1 transcribed more than those on CB1 ( LFC ≥ 0 ) . 
+ All of the cellulase and hemicellulase genes except Tfu_1607 of the muC strain , transcribed significantly more on CC2 than those on CB2 ( LFC ≥ 1 ) . 
+ Although most of the cellulase genes were expressed more highly on ammonium sulfate than on yeast extract , the specific cellulase activities were also tested in order to estimate the synergistic effect of the cellulases corresponding to these gene expressions . 
+ In Fig. 3 , the muC strain on CC1 had much higher specific cellulase activity than the one on CB1 . 
+ Likewise , the specific cellulase activity of muC on CC2 was much higher than that on CB2 . 
+ Thus , the cellulase gene expression levels agreed well with the specific cellulase activities . 
+ Malic acid production . 
+ The synthesis of malic acid involved phosphoenolpyruvate carboxykinase ( Tfu_0083 ) and malate dehydrogenase ( Tfu_0092 ) . 
+ The degradation of malic acid included ` malic ' enzyme ( Tfu_0562 ) , malate dehydrogenase ( oxaloacetate decarboxylating ) ( Tfu_2390 ) and fumarate hydratase ( Tfu_0459 ) 12 . 
+ All the genes related to malic acid metabolism were expressed highly ( Supplementary Table S2 ) . 
+ The expression levels of the genes related to the malic acid pathway are shown in Table 2 . 
+ In Fig. 1B , the muC strain on CB1 and CB2 produced a lot more malic acid than that on CC1 and CC2 . 
+ However , the transcriptions of the major genes related to malic acid synthesis were inconsistent with the fermentation results . 
+ We then further looked into the major genes related to the malic acid degradation to pyruvate , which were ` malic ' enzyme ( Tfu_0562 ) and malate dehydrogenase ( oxaloacetate decarboxylating , Tfu_2390 ) . 
+ Both genes were transcribed at significantly higher levels on CC1 and CC2 than on CB1 and CB2 , resulting in less malate accumulation by the muC strain on CC1 and CC2 . 
+ PTS system related genes and the deletion of hpr ( Tfu_2487 ) . 
+ The bacterial phosphoenolpyruvate ( PEP ) : sugar phosphotransferase system ( PTS ) regulates the use of carbon sources in bacteria14 -- 16 . 
+ The PTS consists of enzyme I ( EI ) , histidine phosphocarrier protein ( HPr ) and several sugar-specific enzyme IIs . 
+ EI transfers phosphoryl groups from PEP to the phosphoryl carrier protein HPr . 
+ HPr then transfers the phosphoryl groups to the different transport complexes . 
+ PTSs are ubiquitous in bacteria but do not occur in archaea and eukaryotes . 
+ It has been found that EI and HPr exist while enzyme IIs are absent in T. fusca17 . 
+ And HPr ( Tfu_2487 ) was thought to be the regulatory protein for T. fusca 's metabolism17 . 
+ On CB1 and CB2 , no Tfu_2487 transcription was detected , whereas the gene was highly transcribed on CC1 and CC2 . 
+ The transcription of this gene was confirmed by RT-qPCR ( Supplementary Table S1 ) . 
+ After searching the genome annotations of T. fusca , Tfu_2765 ( enzyme I ) and Tfu_2487 ( HPr ) have been well-annotated and they both transcribed highly ( Supplementary Table S2 ) . 
+ However , EII complexes , where the carbohydrate specificity of the PTS resides , were missing in T. fusca . 
+ The above results indicate that the PTS system was not complete in T. fusca , but it worked for the regulatory systems to control the carbon metabolisms1 . 
+ In order to determine if the zero transcription of Tfu_2487 was the reason why the muC strain produced a lot more malic acid on yeast extract than ammonium sulfate , it was deleted in the muC strain by homologous recombination by using plasmid puC-hpr-del , forming strain muCΔ2487 . 
+ The puC-hpr-del plasmid was a non-replicating plasmid and was eliminated after the homologous recombination . 
+ The muCΔ2487 and muC strains were then grown on the medium with ammonium sulfate or yeast extract , respectively . 
+ In Fig. 4 , muCΔ2487 grown on ammonium sulfate had slightly higher biomass ( 0.41 g/L ) than on yeast extract ( 0.35 g/L ) ; however , it grew much worse than the muC strain ( 0.94 g/L DCW on ammonium sulfate and 0.41 g/L DCW on yeast extract ) . 
+ The highest titer of malate produced by muCΔ2487 was achieved on ammonium sulfate ( 5.23 g/L , with a yield of 1.33 mole-malate/mole-glucose equivalent ) rather than on yeast extract , which was the opposite of the muC strain ( 4.33 g/L on yeast extract , with a yield of 1.10 mole-malate/mole-glucose equivalent ) . 
+ Besides , on either yeast extract or ammonium sulfate , muCΔ2487 produced a lot more malic acid than the muC strain ( Fig. 4B ) 18 . 
+ The above results confirm that HPr was the repressor for malic acid production . 
+ Over expression of Tfu_2487 in muCΔ2487 strain . 
+ In order to study if the over-expression of Tfu_2487 solely could affect cell growth and malic acid production , Tfu_2487 was expressed in plasmid pYD-Tfu-hpr , a replicating plasmid designed for T. fusca . 
+ The Tfu_2487 gene was cloned into the E. coli-T. fusca shuttle plasmid pYD-Tfu-4 with the thiostrepton resistance gene as the selection marker . 
+ The backbone of pYD-Tfu-4 was based on pIJ602119 , a replicating plasmid in Streptomyces coelicolor and puC18 in E. coli . 
+ However , the origin of replication of pYD-Tfu-4 was from T. fusca to ensure replication in T. fusca strains . 
+ Besides , the promoter region for thiostrepton resistant gene ( tsr ) was replaced by the one for gapdh ( Tfu_2017 ) transcription in T. fusca to ensure a strong transcription of tsr in T. fusca . 
+ The hpr gene was cloned to the multiple cloning site of this plasmid , forming pYD-Tfu-hpr . 
+ The transcription of hpr was controlled by a thiostrepton inducible promoter , tipA . 
+ The above plasmid replicated in muCΔ2487 , forming the muCΔ2487S strain . 
+ The cell growth and malic acid yield of this mutant strain are shown in Fig. 5 . 
+ The biomass of the muCΔ2487S strain was higher than that of the muC strain on both ammonium sulfate ( 0.99 g/L for muCΔ2487S and 0.93 g/L for the muC strain ) and yeast extract ( 0.86 g/L for muCΔ2487S and 0.66 g/L for the muC strain ) . 
+ The yield of malic acid by muCΔ2487S ( 0.64 mole-malate/mole-glucose equivalent ) was much lower than that of the muC and muCΔ2487 strains ( 0.80 and 1.33 mole-malate/mole-glucose equivalent , respectively ) on ammonium sulfate . 
+ Although yeast extract was thought to induce the production of malic acid , the malate yield of muCΔ2487S ( 0.71 mole-malate/mole-glucose equivalent ) was significantly lower than muC ( 1.10 mole-malate/mole-glucose equivalent ) and muCΔ2487 strains ( 1.24 mole-malate/mole-glucose equivalent ) . 
+ The above results indicate that the over-expression of HPr increased the biomass but significantly reduced the malate yield . 
+ Discussion
+ Thermobifida fusca is a thermophilic actinobacterium . 
+ Previously , we isolated a mutant strain : T. fusca muC from adaptive evolution . 
+ T. fusca muC was confirmed to produce malic acid a lot more than its parent strain YX10 . 
+ After identifying the malate synthesis pathway , we engineered the muC strain to produce a lot more malic acid from sugars12 . 
+ Importantly , by fermentation optimization , the choices of nitrogen sources were found to affect the cell growth and malate production . 
+ However , the intracellular metabolisms related to the muC strain were largely unknown . 
+ In order to investigate the effect of nitrogen sources on cell growth and malate synthesis in T. fusca muC , RNAseq was employed to analyze the transcriptional changes corresponding to nitrogen sources . 
+ After analyzing the RNAseq data , the most important discovery was that Tfu_2487 , annotated as the histidine-containing phosphocarrier protein ( HPr ) , was not transcribed on yeast extract but was highly transcribed on ammonium sulfate . 
+ We then deleted Tfu_2487 to get a mutant strain : muCΔ2487 , which produced a lot more malic acid than the muC strain on ammonium sulfate ( a nitrogen source bad for malate production by muC ) . 
+ However , the deletion of Tfu_2487 is not the ultimate evidence to prove the role of Hpr in cell growth and malate production . 
+ We also need to over-express it in muCΔ2487 , a strain without Hpr background . 
+ The hpr ( Tfu_2487 ) gene was over-expressed on an E. coli - T. fusca shuttle plasmid pYD-Tfu-3 in muCΔ2487 , forming muCΔ2487S . 
+ The muCΔ2487S strain experienced a low malate yield and faster cell growth compared to the muC strain . 
+ HPr is an important member in the PTS system . 
+ We analyzed the genome annotations of T. fusca , and found that the PTS system was incomplete . 
+ In bacteria , HPr transfers a phosphoryl group to the CheY domains of response regulators , which typically regulate transcription . 
+ Adenylate cyclases increase the cellular level of cAMP , which , along with CAP protein , stimulates transcription from a number of promoters14 -- 16,20 . 
+ Adenylate cyclase , encoded by Tfu_2552 in T. fusca , was found to be highly active . 
+ The proteins interacting with adenylate cyclase in T. fusca were predicted by STRING 10 ( Supplementary Figure S3 ) 20 . 
+ Tfu_2552 might regulate RNA polymerase subunits ( rpoB , rpoC , rpoA and rpoZ ) , which control the transcription of genes in T. fusca . 
+ The pyruvate kinase ( Tfu_1179 ) was also regulated by Tfu_2552 . 
+ Supplementary Figure S4 shows that pyruvate kinase ( Tfu_1179 ) interacted with malate dehydrogenase ( Tfu_2390 ) and ` malic enzyme ' ( Tfu_0562 ) , which are directly related to the synthesis of malic acid . 
+ Thus , the proposed rough scheme of malate regulation is : HPr → adenylate cyclase → pyruvate kinase → malate dehydrogenase and malic enzyme → malic acid . 
+ We will employ proteomics analysis and other experimental tools to verify and refine the above prediction . 
+ Based on the fermentation data of the muCΔ2487 strain , the deletion of hpr increased the malic acid production significantly , indicating that HPr was the repressor for malic acid production . 
+ In addition , the over-expression of HPr in muCΔ2487 , dramatically impaired the malic acid production , confirming the role of Hpr on malate synthesis . 
+ However , the mechanisms of HPr regulation in T. fusca need more study . 
+ Methods
+ Strains and cultivation . 
+ Thermobifida fusca was grown on Hagerdahl medium with addition of desired carbon and nitrogen sources2,21 . 
+ 10 % V/V of pre-cultures of T. fusca were grown in the shaken flasks at 55 °C with 250 rpm rotation . 
+ The shaken flasks were sealed by the rubber stoppers to reduce the air exchange during the fermentation . 
+ 150 mL precultures were inoculated into a 5 L fermentor at 55 °C with various stirring speeds . 
+ E. coli was grown on Luria-Bertani ( LB ) or SOC medium for molecular work . 
+ The strains and plasmids used in this study are shown in Table 3 . 
+ The primers and DNA sequences are shown in the Supplementary Table S2 . 
+ Molecular Work . 
+ Deletion of Tfu_2487 . 
+ The general strategy of deleting Tfu_2487 in T. fusca was described previously22 . 
+ Briefly , the hpr deletion cassette included a ~ 500 bp DNA fragment homologous to the region upstream of the hpr gene , the inducible promoter region of Tfu_2176 ( endoglucanase ) of T. fusca ( genome location : 2,552,376-2,552,832 , 457 bp ) , the kanamycin resistance gene from pET-28a ( + ) ( 813 bp ) and a 500 bp DNA fragment homologous to the region downstream of the hpr gene ( the sequences are shown in the supplementary file 1 ) . 
+ The above cassette was synthesized and ligated to pUC57 by Genewiz ( Suzhou , China ) , forming the puC-hpr-del plasmid . 
+ The resulting plasmid with the correct structure was then transformed into E. coli BL21 ( DE3 ) . 
+ Then , the puC-hpr-del plasmid was introduced into T. fusca muC to delete the hpr gene . 
+ Plasmid work . 
+ The backbone of the E. coli-T. fusca shuttle plasmid was puC18 . 
+ The origin of replication of T. fusca was identified by the online tool ( http://tubic.tju.edu.cn/doric/index.php ) 23,24 . 
+ The tfu-shuttle DNA cassette for T. fusca , comprising the kanamycin resistance gene , the tsr-inducible MCS region , the gapdh promoter region of T. fusca , the thiostrepton resistance gene ( tsr ) and the origin of replication , was synthesized by Genewiz ( Suzhou , China ) . 
+ The puC18 plasmid was cut at NdeI and SacI and then ligated to the tfu-shuttle cassette by Gibson assembly25 , and the product was transformed into E. coli JM109 . 
+ The transformants were picked from agar plates and subjected to colony PCR with primers hprEco-F and hprEco-R ( Supplementary Table S1 ) . 
+ The final plasmid was sequenced to verify the structure and designated as pYD-Tfu-4 . 
+ The sequence and map of pYD-Tfu-4 are shown in Supplementary Figure S1 . 
+ Construction of the plasmid for expressing hpr in T. fusca . 
+ The hpr gene was amplified by primers hprF and hprR ( Supplementary Table S1 ) . 
+ The PCR product was cloned to pUCm-T vector ( Sangon Biotech , Shanghai , China ) and sequenced and digested by BamHI and SphI . 
+ Then , the digested hpr gene was purified and ligated to pYD-Tfu-4 , forming pYD-Tfu-hpr ( Supplementary Figure S2 ) . 
+ Transformation of T. fusca and allelic exchange . 
+ The puC-hpr-del and pYD-Tfu-4 plasmids were transformed to T. fusca protoplasts with the experimental details described previously22 . 
+ The transformants with pYD-Tfu-4 were grown on kanamycin and thiostrepton . 
+ The deletion of hpr was confirmed by diagnostic PCR using primers hprdel-F and hprdel-R ( Supplementary Table S1 ) and by direct Sanger sequencing , thus confirming a positive recombination event . 
+ Metabolite detection . 
+ The metabolites were detected by HPLC system ( Dionex Ultimate3000 ) equipped with Bio-Rad HPX-87H ion exclusion column . 
+ The mobile phase was 0.005 mol/L H2SO4 at the rate of 0.6 mL/min using IR and UV detection . 
+ RNAseq work . 
+ T. fusca muC under different conditions was grown for 24 hours and then cells were harvested by centrifugation at > 10,000 g for 15 min . 
+ The cell pellets were washed with fresh medium and then spun down again to remove the supernatant . 
+ The above process was repeated three times . 
+ The general procedures about sample preparation for RNAseq analysis were described previously26 . 
+ An Illumina HiSeq2000 PE100 was used in this study . 
+ The biological duplicates were used in this study . 
+ SRA accession number for samples is SRP067532 . 
+ Statistical Analysis . 
+ The Unity platform developed by Genome Canada was used to process the raw data . 
+ In this study , the reported data were subject to log2-transformation26 . 
+ If log2 ( a ) − log2 ( b ) ( defined as log2 fold change or LFC ) ≥ 1 or ≤ − 1 , the difference between a and b was significant . 
+ The other statistical analysis methods were according to Deng et al. 26 . 
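The log2 fold change rule stated above can be sketched in a few lines of Python. This is only an illustration of the stated threshold (LFC ≥ 1 or ≤ −1), not the authors' pipeline; the function names and the choice to reject non-positive expression values are assumptions made for the example.

```python
import math


def log2_fold_change(a: float, b: float) -> float:
    """Return log2(a) - log2(b) for two positive expression values."""
    if a <= 0 or b <= 0:
        # Assumption for this sketch: zero or negative values are rejected
        # rather than handled with a pseudocount.
        raise ValueError("expression values must be positive")
    return math.log2(a) - math.log2(b)


def is_significant(a: float, b: float, threshold: float = 1.0) -> bool:
    """Apply the rule from the text: LFC >= 1 or LFC <= -1."""
    return abs(log2_fold_change(a, b)) >= threshold
```

For example, an expression value of 8 against 2 gives an LFC of 2 and is called significant, whereas 3 against 2 gives an LFC of about 0.58 and is not.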
+ RT-PCR . 
+ The RT-qPCR experiments were described previously21,22,27 and the primers used are shown in Supplementary Table S1 . 
+ Cellulase activity. The general process of measuring cellulase activity has been described previously22.
+ Acknowledgements
+ This work was supported by the grants from National Natural Science Foundation of China ( 31500070 ) , Natural Science Foundation of Jiangsu Province ( BK20150136 , BK20150151 ) , the Fundamental Research Funds for the Central Universities ( JUSRP115A18 ) and the Key Laboratory of Industrial Biotechnology , Ministry of Education , Jiangnan University , China ( KLIB-KF201403 ) . 
+ Y.D. and X.Z. , conceived and designed the experiments . 
+ Y.D. , J.L. , Y.M. and X.Z. performed the experiments . 
+ Y.D. , J.L. , Y.M. and X.Z. contributed to the writing of the manuscript . 
+ All authors Y.D. , J.L. , Y.M. and X.Z. contributed to the analysis of data and approved the manuscript . 
+ Additional Information Supplementary information accompanies this paper at http://www.nature.com/srep Competing financial interests : The authors declare no competing financial interests . 
+ How to cite this article : Deng , Y. et al. Systematic analysis of an evolved Thermobifida fusca muC producing malic acid on organic and inorganic nitrogen sources . Sci. Rep. 6 , 30025 ; doi : 10.1038/srep30025 ( 2016 ) . 
+ This work is licensed under a Creative Commons Attribution 4.0 International License . 
+ The images or other third party material in this article are included in the article 's Creative Commons license , unless indicated otherwise in the credit line ; if the material is not included under the Creative Commons license , users will need to obtain permission from the license holder to reproduce the material . 
+ To view a copy of this license , visit http://creativecommons.org/licenses/by/4.0/
\ No newline at end of file
--- a/data/TEXT_FILES/notuseful_txt/27466434.txt 0 → 100644
View file @27818a9
+++ b/data/TEXT_FILES/notuseful_txt/27466434.txt 0 → 100644
View file @27818a9
+ Thymus transcriptome reveals novel pathways in response to avian pathogenic Escherichia coli infection
+ ABSTRACT Avian pathogenic Escherichia coli ( APEC ) can cause significant morbidity in chickens . 
+ The thymus provides the essential environment for T cell development ; however , the thymus transcriptome has not been examined for gene expression in response to APEC infection . 
+ An improved understanding of the host genomic response to APEC infection could inform future breeding programs for disease resistance and APEC control . 
+ We therefore analyzed the transcriptome of the thymus of birds challenged with APEC , contrasting susceptible and resistant phenotypes . 
+ Thousands of genes were differentially expressed in birds of the 5-day post infection ( dpi ) challenged-susceptible group vs. 5 dpi non-challenged , in 5 dpi challenged-susceptible vs. 5 dpi challenged-resistant birds , as well as in 5 dpi vs. one dpi challenged-susceptible birds . 
+ The Toll-like receptor signaling pathway was the major innate immune response for birds to respond to APEC infection . 
+ Moreover , lysosome and cell adhesion molecules pathways were common mechanisms for chicken response to APEC infection . 
+ The T-cell receptor signaling pathway , cell cycle , and p53 signaling pathways were significantly activated in resistant birds to resist APEC infection . 
+ These results provide a comprehensive assessment of global gene networks and biological functionalities of differentially expressed genes in the thymus under APEC infection . 
+ These findings provide novel insights into key molecular genetic mechanisms that differentiate host resistance from susceptibility in this primary lymphoid tissue , the thymus . 
+ Key words : RNASeq , APEC , thymus , transcriptome , immune response 
+ 2017 Poultry Science 95:2803 -- 2814 http://dx.doi.org/10.3382/ps/pew202 
+ INTRODUCTION
+ Colibacillosis , caused by avian pathogenic Escherichia coli ( APEC ) , is an extraintestinal disease that may manifest as septicemia , pericarditis , or airsacculitis in poultry ( Janßen et al. , 2001 ; Stordeur et al. , 2004 ) . 
+ APEC also has been recently identified as a possible cause of human disease ( Rodriguez-Siek et al. , 2005 ; Ewers et al. , 2007 ; Johnson et al. , 2007 ; Russo and Johnson , 2003 ) . 
+ Studies report that APEC shares similar phylogenic background and certain virulence genes with human extraintestinal pathogenic Escherichia coli ( ExPEC ) , suggesting the potential of zoonotic risk of APEC ( Manges and Johnson , 2012 ) . 
+ © The Author 2016 . 
+ Published by Oxford University Press on behalf of Poultry Science Association . 
+ This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License ( http://creativecommons.org/licenses/by-nc/4.0/ ) , which permits non-commercial re-use , distribution , and reproduction in any medium , provided the original work is properly cited . 
+ For commercial re-use , please contact journals.permissions@oup.com 
+ Received March 3 , 2016 . 
+ Accepted May 5 , 2016 . 
+ 1Corresponding author : sjlamont@iastate.edu 
+ Moreover , contaminated chicken meat and eggs are potential sources of human infections ( Vincent et al. , 2010 ; Bergeron et al. , 2012 ) . 
+ APEC generally gains entry to the host bird via the respiratory tract ( Dho-Moulin and Fairbrother , 1999 ) . 
+ From there , bacteria enter the bloodstream and gain access to the viscera resulting in a multisystemic disease . 
+ Colibacillosis causes multimillion-dollar annual losses in the US poultry industry due to morbidity , mortality , and condemnation of infected products ( Kabir , 2010 ) . 
+ In the United Kingdom , a recent longitudinal survey of 4 broiler flocks sampled weekly for 4 wk showed that 39 % of bird deaths resulted from colibacillosis ( Kemmett et al. , 2013 ) . 
+ Also in the U.K. , a separate analysis of causes of mortality 2 to 3 d after placement of broiler chicks attributed 70 % of bird deaths to colibacillosis ( Kemmett et al. , 2014 ) . 
+ Although antibacterial agents have been used successfully to prevent this disease , restrictions on antibiotic usage in poultry production and APEC 's increasing resistance to antimicrobial agents have made colibacillosis control problematic ( Lanz et al. , 2003 ; Yang et al. , 2004 ) . 
+ Thus , control of colibacillosis by means other than antimicrobial agents is highly desirable . 
+ Variation in gene expression can be very useful in studying specimens treated under different conditions at a genome-wide level ( Alizadeh et al. , 2000 ; Ross et al. , 2000 ; Bahar et al. , 2006 ) . 
+ Many types of chicken microarrays have been used in genome-wide gene expression studies , including a macrophage microarray , avian innate immunity microarray , 44 K Agilent microarray , and Affymetrix chicken genome array ( Call et al. , 2001 ; Lavric et al. , 2008 ; Li et al. , 2008 ; Kranis et al. , 2013 ) . 
+ The new technology of RNAseq is an efficient and reliable tool to investigate genetic architecture and sequence variation and to quantify gene expression through whole transcriptome analysis ( Ozsolak and Milos , 2011 ) . 
+ We have reported its use in previous studies of the transcriptomic response of bone marrow and bursa to systemic APEC infection ( Sun et al. , 2015a ; 2015b ) . 
+ The current study used RNAseq technology to characterize the transcriptomic response of genes involved in the early phases of immune response against APEC by studying the primary lymphoid organ , the thymus . 
+ MATERIALS AND METHODS
+ Ethics Statement
+ All animal care and experimental procedures were reviewed and approved by the Iowa State University Institutional Animal Care and Use Committee ( # 11-07-6460-G ) . 
+ Avian Pathogenic Escherichia Coli (APEC) Experimental Design
+ A total of 360 commercial male broilers ( meat-type chickens ) were used in the pathogen-challenge trial . 
+ At 4 wk of age , 288 birds were inoculated with APEC O1 intra-air sac and , for the control group ( same type and age ) , 72 were injected with the same volume of phosphate buffered saline ( PBS ) . 
+ The APEC O1 strain and experimental procedures have been previously described in detail ( Sandford et al. , 2011 ; Sandford et al. , 2012 ) . 
+ At necropsy , the lesions on the liver , air sacs , and pericardium were scored . 
+ The range of scores for each tissue was : liver , 0 to 2 ; air sac , 0 to 3 ; and pericardium , 0 to 2 . 
+ The sum of the tissue lesion scores of each bird was used to assign the level of pathology of that individual bird . 
+ If the summed lesion scores were 0 to 3 , birds were classified as resistant ( mild lesion ) . 
+ If the summed lesion scores were 4 to 7 , birds were classified as susceptible ( severe lesion ) . 
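The lesion scoring rule described above (liver 0 to 2, air sac 0 to 3, pericardium 0 to 2; summed score 0 to 3 resistant, 4 to 7 susceptible) can be written as a small Python function. This is a minimal sketch of the stated rule only; the function name `classify_bird` and the input validation are hypothetical and are not taken from the study.

```python
def classify_bird(liver: int, air_sac: int, pericardium: int) -> str:
    """Classify a bird from its three tissue lesion scores.

    Score ranges follow the text: liver 0-2, air sac 0-3, pericardium 0-2.
    A summed score of 0-3 is 'resistant'; 4-7 is 'susceptible'.
    """
    limits = {
        "liver": (liver, 2),
        "air sac": (air_sac, 3),
        "pericardium": (pericardium, 2),
    }
    for tissue, (score, max_score) in limits.items():
        if not 0 <= score <= max_score:
            raise ValueError(f"{tissue} score must be between 0 and {max_score}")
    total = liver + air_sac + pericardium
    return "resistant" if total <= 3 else "susceptible"
```

Note that the birds chosen for RNA-seq were drawn only from the phenotypic extremes (summed scores of 0 to 1 and 6 to 7), which is a stricter selection than this two-class rule.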
+ The lesion scores were normally distributed over the 288 infected birds ( Sun et al. 2015a ) . 
+ Thymi were collected at one or 5 days post infection ( dpi ) . 
+ A total of 6 treatments were thus classified : 1 dpi challenged-resistant birds ; 1 dpi challenged-susceptible birds ; 1 dpi non-challenged birds ; 5 dpi challenged-resistant birds ; 5 dpi challenged-susceptible birds ; and 5 dpi non-challenged birds ( Figure 1 ) . 
+ Birds selected for RNA-seq analysis were from the 2 phenotypic extremes for APEC-induced pathology : resistant birds with 0 to 1 lesion scores and susceptible birds with 6 to 7 lesion scores . 
+ These were the same birds as the study of Sun et al. ( 2015a ) . 
+ Four individual bird replicates were used for each treatment group , totaling 24 samples . 
+ An Ambion MagMAX-96 Kit ( AM1839 ) ( Applied Biosystems , Foster City , CA ) was used to isolate RNA from the thymus samples . 
+ The quality and quantity of RNA were assessed using the Agilent 2100 Bioanalyzer according to the manufacturer 's instructions ( Agilent Technologies ) . 
+ RNA Integrity Numbers ( RIN ) for all the RNA samples selected to construct the cDNA libraries were greater than 8.0 . 
+ Next , an Illumina TruSeq® RNA Sample Preparation v2 Kit was utilized to convert 0.1 to 4 μg RNA into cDNA libraries . 
+ Twenty-four cDNA libraries , which included 4 biological replicates ( birds ) for each treatment , were constructed . 
+ Briefly , fragment mRNA was purified using oligo-dT beads from the initial RNA and reverse transcribed into a double strand cDNA fragment . 
+ End repair , adenylation , adapter ligation , and PCR amplification were then carried out in conformance with the TruSeq® manufacturer 's instructions ( Protocol : # 15026495 , May 2012 ) . 
+ A Qubit® Quantitation Platform and HS dsDNA kit ( Invitrogen , Paisley , UK ) were then used to test and quantify the cDNA libraries . 
+ Six cDNA libraries , including one for each of the 6 treatments , were sequenced in the same lane of the Illumina® HiSeq 2000 at the Iowa State University DNA facility ( 4 lanes for the 24 cDNA libraries ) with single end 100 bp cycles . 
+ Read Quality Control, Alignment, and Reads Number Calculation
+ Fastx toolkit software ( version 0.0.13 ) was used to remove the adapter for each read , and quality of RNA-seq reads from all the samples was checked using FastQC software ( version 0.10.1 ) keeping a Phred score of 32 . 
+ Then the filtered reads from each sample were separately aligned to the Gallus gallus 4.0 reference genome from Ensembl using TopHat2 ( version 2.0.9 ) and Bowtie ( version 2.1.0 ) software with default parameters . 
+ The abundance of reads for all annotated genes was counted using the HTseq software package ( version 0.5.4 p3 ) in Python . 
+ Statistical and Biological Analysis
+ To test the samples ' relationship , Qlucore Omics Explorer ( version 3.0 ) was used to conduct principal component analysis ( PCA ) by using the read count data from the 24 samples . 
+ Then , the software package edgeR ( version 3.0.8 ) was run in R software ( version 2.15.3 ) to identify differentially expressed ( DE ) genes . 
+ The generalized linear model ( GLM ) analysis in edgeR based on the negative binomial distribution was applied . 
+ Then relevant linear contrasts were constructed to compare treatment conditions . 
+ The Benjamini-Hochberg method was used to control false discovery rate ( FDR ) ( Benjamini and Hochberg , 1995 ) at 5 % . 
+ To avoid gene length bias , the GOseq package ( version 1.10.0 ) ( Young et al. , 2010 ) was utilized for further gene ontology ( GO ) and pathway analysis while controlling FDR at 5 % . 
+ Animal systems biology analysis and modeling center ( ASBAMC ) was used to generate the significant pathways . 
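The Benjamini-Hochberg step-up procedure named above can be illustrated with a short Python sketch. The study applied the correction within its R-based workflow, so the code below is not that implementation; it is a plain re-statement of the standard procedure, and the function name and list-based interface are assumptions made for the example.

```python
def benjamini_hochberg(p_values, fdr=0.05):
    """Return a list of booleans marking which hypotheses are rejected.

    Standard step-up rule: sort the m p-values, find the largest rank k
    such that p_(k) <= (k / m) * fdr, and reject the k smallest p-values.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])

    # Find the largest rank whose p-value falls under its threshold.
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * fdr:
            cutoff_rank = rank

    # Reject every hypothesis ranked at or below the cutoff.
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= cutoff_rank:
            rejected[idx] = True
    return rejected
```

Because the rule is step-up, a p-value that misses its own threshold can still be rejected when a larger p-value further down the sorted list passes its threshold.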
+ Candidate Genes for qPCR Validation
+ Quantitative real time PCR ( qPCR ) was performed to measure the mRNA expression levels of 11 selected genes ( IL7 , IL7R , LCK , ZAP70 , CD3Z , IL18 , IL8 , IFNGR , NOD1 , LIG4 , TLR6 ) using the same 24 RNA samples used for sequencing . 
+ The gene selection criteria were involvement in multiple immune response pathways and significance in the RNAseq analysis . 
+ An internal control gene ( 28 S rRNA ) was used for normalization of the initial concentration of RNA . 
+ Primers were designed for amplifying fragments in the qPCR reaction using sequences from NCBI and Primer 3 ( Rozen and Skaletsky , 2000 ) . 
+ Primer sequence detail is displayed in Table 1 . 
+ qPCR was performed in triplicate on individual thymus samples . 
+ Reactions of qPCR were carried out using the QuantiTect SYBR Green kit ( Qiagen Inc. , Valencia , CA ) as described by Redmond and co-workers ( Redmond et al. , 2010 ) . 
+ The following equation was used to calculate the adjusted cycle threshold ( Ct ) values : 40 − [ Ct target gene mean + ( Ct 28S median − Ct 28S mean ) × ( slope of target gene / slope of 28S ) ] . 
+ The Fit Model procedure in JMP software ( SAS Institute Inc. , Cary , NC ) was used to analyze the Ct value . 
+ Relative gene expression values were calculated for different treatment contrasts . 
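The adjusted Ct equation given in this section, 40 − [ Ct target mean + ( Ct 28S median − Ct 28S mean ) × ( target slope / 28S slope ) ], can be evaluated with the following Python sketch. The parameter names are hypothetical, and how the mean and median are taken (for example, the mean over the triplicate wells of a sample and the median over all samples) is an assumption here, since the text does not spell it out.

```python
def adjusted_ct(
    ct_target_mean: float,
    ct_28s_median: float,
    ct_28s_mean: float,
    slope_target: float,
    slope_28s: float,
) -> float:
    """Evaluate the adjusted cycle threshold from the equation in the text.

    The 28S term corrects the target Ct for differences in input RNA,
    scaled by the ratio of the standard-curve slopes.
    """
    if slope_28s == 0:
        raise ValueError("slope of 28S must be non-zero")
    correction = (ct_28s_median - ct_28s_mean) * (slope_target / slope_28s)
    return 40 - (ct_target_mean + correction)
```

With equal slopes and a sample whose 28S mean equals the 28S median, the correction term vanishes and the result is simply 40 minus the target Ct, so a higher adjusted value corresponds to higher expression.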
+ Availability of Supporting Data
+ The RNAseq data can be obtained from the NCBI Gene Expression Omnibus ( GEO ) database with the accession number GSE69014 
+ RESULTS
+ Twenty-four individual thymus samples were analyzed by RNAseq . 
+ These comprised 4 biological replicates from each of the 6 treatment conditions . 
+ After sequencing the cDNA libraries , the average total raw reads were 26.59 million . 
+ By trimming the adaptor contamination using the Fastx toolkit and FastQC quality control , the average number of clean reads over all samples was 25.14 million . 
+ The number of raw and clean reads for each treatment group is displayed in Figure 2A . 
+ Using TopHat2 , an average of 82.96 % of the reads mapped back to the reference genome and the unique mapped reads accounted for an average of 78.62 % ( Figure 2A ) . 
+ The distribution of the total mapped reads is illustrated in Figure 2B . 
+ Distribution of reads among the 6 treatment groups was relatively consistent . 
+ On average , 74.44 % of the reads mapped to exons , including 48.00 % CDS exons , 22.16 % 3 ′ UTR exons , and 4.28 % 5 ′ UTR exons ( Figure 2B ) . 
+ A further 21.59 % and 3.97 % of the reads mapped to introns and intergenic regions , respectively ( Figure 2B ) . 
+ After alignment , the average number of reads for all samples was 12.53 million using HTseq counting . 
+ The average transcriptome coverage , i.e. , the number of detected transcripts over the total annotated transcripts , was 85.89 % . 
+ To further explore the relationship among the total 24 samples , PCA was used to cluster similar samples in multivariate space . 
+ The PCA results showed that the 5 dpi susceptible birds were distinct from the other 5 treatment groups ( Figure 3 ) . 
+ Additionally , 1 dpi susceptible birds differed slightly from the 4 groups : 1 dpi resistant , 5 dpi resistant , 1 dpi non-challenged , and 5 dpi non-challenged birds . 
+ Variability among replicates in each treatment group was low , and the clear separation of the different groups indicated that susceptible birds possess a characteristic expression pattern that differs greatly from that of resistant and non-challenged birds . 
+ Analysis of Differentially Expressed (DE) Genes
+ From a total of 16,693 detected transcripts , 2,484 transcripts were novel . 
+ After keeping genes with read counts above one count per million for at least 3 samples in at least one treatment group and removing the other low-expression reads , 11,585 transcripts were statistically analyzed . 
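+ The expression filter described above can be sketched in Python as follows . 
+ This is an illustration under stated assumptions rather than the authors ' pipeline : the function names , the use of per-sample library sizes as the counts-per-million denominator , and the toy data are all hypothetical . 

```python
def counts_per_million(counts, library_sizes):
    """Convert one transcript's raw counts to CPM, sample by sample."""
    return [c / n * 1e6 for c, n in zip(counts, library_sizes)]


def keep_transcript(counts, library_sizes, groups, min_cpm=1.0, min_samples=3):
    """Keep a transcript if at least `min_samples` samples within at least
    one treatment group have CPM strictly above `min_cpm`."""
    cpm = counts_per_million(counts, library_sizes)
    passing_per_group = {}
    for value, group in zip(cpm, groups):
        if value > min_cpm:
            passing_per_group[group] = passing_per_group.get(group, 0) + 1
    return any(n >= min_samples for n in passing_per_group.values())


# Toy example: two hypothetical groups of four samples, 10 million reads each.
library_sizes = [10_000_000] * 8
groups = ["A"] * 4 + ["B"] * 4

expressed = [20, 15, 30, 0, 0, 0, 0, 0]   # 3 samples in group A exceed 1 CPM
scattered = [20, 0, 0, 0, 20, 0, 0, 0]    # only 1 sample per group exceeds 1 CPM

print(keep_transcript(expressed, library_sizes, groups))
print(keep_transcript(scattered, library_sizes, groups))
```

+ Requiring the threshold to be met within a single group , rather than across all samples , retains transcripts that are expressed in only one treatment condition . 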
+ Comparisons of gene expression with respect to treatment , time , and pathology effects were carried out to identify candidate genes that respond to APEC infection . 
+ Nine total contrasts were constructed for interesting 2-way comparisons . 
+ The numbers of up-regulated DE transcripts were greater than those of down-regulated ones for most of the 9 contrasts ( Table 2 ) . 
+ Tests for 4 comparisons ( pair-wise contrasts ) identified large numbers of DE genes : 1 dpi susceptible vs. 1 dpi non-infected birds , 5 dpi susceptible vs. 5 dpi non-infected birds , 5 dpi susceptible vs. 5 dpi resistant birds , and 5 dpi vs. 1 dpi susceptible birds ( Table 2 ) . 
+ However , tests for the other comparisons detected only a few DE genes ( N < 25 ) . 
+ There were 158 DE genes detected when comparing 1 dpi susceptible vs. 1 dpi non-infected birds . 
+ Thousands of DE genes were identified when comparing 5 dpi susceptible vs. 5 dpi non-infected birds ; 5 dpi susceptible vs. 5 dpi resistant birds ; and 5 dpi vs. 1 dpi susceptible birds . 
+ These results indicate that there were large differences between 5 dpi susceptible birds and 5 dpi resistant birds , and between 5 dpi susceptible birds and 5 dpi non-challenged birds . 
+ However , resistant birds differed little from the non-infected birds . 
+ The transcriptomic response of susceptible birds greatly increased over time post infection , whereas a time-related response increase did not occur in resistant and non-infected birds . 
+ Significant GO Terms Analysis
+ To provide sufficient genes for common biological process analysis , the 4 comparisons with the largest numbers of DE genes were used for further analysis . 
+ The false discovery rate was controlled at 5 % for all the significant GO terms and pathways in the following results and discussion . 
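+ The text does not state which procedure was used to control the false discovery rate . 
+ Purely as an illustration , the sketch below assumes the Benjamini-Hochberg procedure , a common choice for this purpose ; the function name and the example p-values are hypothetical . 

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order.

    NOTE: using BH here is an assumption; the source only says that the
    false discovery rate was controlled at 5 %.
    """
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical raw p-values for four GO terms.
raw = [0.01, 0.04, 0.03, 0.5]
adj = benjamini_hochberg(raw)
significant = [q <= 0.05 for q in adj]
print(adj, significant)
```

+ A term is then reported as significant when its adjusted p-value is at or below 0.05 . 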
+ In the contrast of 1 dpi susceptible vs. non-challenged birds , the DE genes were mainly involved in these top 3 significant GO terms : defense response to bacterium , defense response , and response to bacterium . 
+ The comparisons of 5 dpi susceptible vs. 5 dpi non-infected birds and of 5 dpi susceptible vs. 5 dpi resistant birds had the top significant GO terms of immune response , toll-like receptor signaling pathway , T/B cell activation , and T-cell lineage commitment . 
+ With passage of time post infection ( 5 dpi vs. 1 dpi ) , the susceptible birds ' response mainly focused on natural killer cell differentiation , myeloid progenitor cell differentiation , lymphoid progenitor cell differentiation , and lymphocyte differentiation GO terms . 
+ Significant Pathways Analysis
+ These 4 comparisons also had significantly changed pathways in response to APEC infection . 
+ Generally , phagosome , lysosome , toll-like receptor ( TLR ) signaling pathway , Jak-STAT signaling pathway , cell adhesion molecules ( CAMs ) , ECM-receptor interaction , and cytokine-cytokine receptor interaction were dramatically induced in these 3 comparisons : 5 dpi susceptible vs. 5 dpi non-challenged birds , 5 dpi susceptible vs. 5 dpi resistant birds , and 5 dpi vs. 1 dpi susceptible birds . 
+ Moreover , T-cell receptor ( TCR ) signaling pathway was strongly suppressed in the above 3 contrasts . 
+ Also , cell cycle and p53 signaling pathways were significantly suppressed in the contrast of 5 dpi susceptible vs. 5 dpi non-challenged birds and of 5 dpi susceptible vs. 5 dpi resistant birds . 
+ Cell cycle also was detected in 1 dpi susceptible vs. 1 dpi non-challenged birds . 
+ Figure 4 shows the DE genes that were involved in the significant pathways in the 4 contrasts . 
+ Detailed information on the DE genes of the significant pathways for the 4 contrasts is displayed in Tables S1-S4 . 
+ These results indicate that compared to resistant birds , susceptible birds extensively initiate their pathways of immune response , signal transduction , and signal molecules and interaction to resist APEC infection . 
+ However , T-cell differentiation and proliferation and cell growth are significantly impaired in susceptible birds . 
+ Validation of RNAseq Data
+ To validate the RNAseq data , qPCR was performed on the following 11 genes selected from immune related genes that were significantly DE in RNAseq : IL7 , IL7R , LCK , ZAP70 , CD3Z , IL18 , IL8 , IFNGR , NOD1 , LIG4 , TLR6 . 
+ The qPCR results for 10 of 11 selected genes conformed to the same direction of fold change and significance as those in RNAseq data ( Table 3 ) . 
+ The expression levels measured by qPCR and by RNAseq were closely correlated ( 93.42 % ) . 
+ Only one gene , CD3Z , was not significantly DE in the qPCR experiment ; however , the CD3Z expression pattern in the qPCR experiment conformed to the same direction as for RNAseq ( Table 3 ) . 
+ DISCUSSION
+ The novel experimental design of the current study enabled characterization of the resistance and susceptibility mechanisms of different phenotype birds in response to APEC infection through chicken thymus transcriptome analysis . 
+ The PCA of the thymus transcriptome of different phenotype birds , together with the identified DE genes in different contrasts ( Figure 3 and Table 2 ) , demonstrated that it was appropriate to classify the challenged birds as resistant or susceptible birds based upon their total lesion scores . 
+ Nakamura et al. ( 1985 ) demonstrated that marked atrophy of the thymus and bursa were observed in natural colibacillosis of broiler chickens , and the relative weights of the thymus and bursa were dramatically decreased at 1 dpi ( Nakamura et al. , 1986 ) . 
+ Histologically , the T and B lymphocytes were greatly depleted in the thymus and bursa , respectively , after 1 dpi in colibacillosis of white Leghorn ( Nakamura et al. , 1986 ) . 
+ These results indicate that T and B cells have important functions in bacterial infection . 
+ Thus , the primary lymphoid tissues ( bone marrow , bursa , and thymus ) are critically important for understanding how the host 's primary immune organs respond to systemic APEC infection . 
+ Transcriptome analyses of bone marrow and bursa have been published on investigations of the earliest phases of immune response to systemic APEC infection ( Sun et al. , 2015a ; 2015b ) , as well as the combined analysis of bone marrow , bursa , and thymus to investigate primary lymphoid tissues ' interaction or cooperation ( Sun et al. , 2016 ) . 
+ To date , however , the gene expression patterns in the thymus of resistant and susceptible birds under systemic APEC infection have not been reported . 
+ The thymus is an essential primary lymphoid organ , providing an appropriate environment for T cell precursor development , differentiation , and maturation ( Rose , 1979 ) and unique pathway changes were identified in thymus transcriptome analysis , compared to the results of combined analysis of primary lymphoid tissues . 
+ In the current study , the TLR signaling pathway , lysosome pathway , CAMs , and TCR signaling pathway were the major response mechanisms in the thymus after APEC infection . 
+ In the comparison of combined analysis of primary lymphoid tissues ( Sun et al. , 2016 ) , TLR and CAM were the unique pathway changes in the thymus . 
+ The TLR is the major innate immune response modulator for chicken resistance to APEC infection . 
+ TLRs can recognize pathogen-associated molecular patterns ( PAMPs ) to trigger inflammatory cascades ( Martinon and Tschopp , 2005 ; Akira et al. , 2006 ) . 
+ The TLR4 protein bound to Gram-negative bacteria can interact with TIR-domain-containing adaptor proteins ( MyD88 , MAL , and IRAK4 ) to transmit signals , inducing MAPK ( mitogen-activated protein kinases ) signaling pathway activation and inflammatory cytokines ( Akira et al. , 2001 ; Werling and Jungi , 2003 ; Akira , 2006 ; Sutterwala et al. , 2006 ) . 
+ Moreover , TLR6 can use the same signaling pathway as TLR4 ( Figure 5 ) . 
+ TLR5 can bind to flagellin to activate cytokine IL8 expression and inflammatory response ( Hayashi et al. , 2001 ) . 
+ In the current study , TLR6 ( TLR1LA ) , TLR4 , TLR5 , and IL8 were all overexpressed in 5 dpi susceptible vs. 5 dpi non-challenged and 5 dpi susceptible vs. 5 dpi resistant birds ( Figure 5 ) . 
+ These results may indicate that susceptible birds attempt to trigger high levels of activation of the innate immune response to resist the systemic APEC infection , compared to resistant and non-challenged birds . 
+ MAPK and ERK have important functions in signal transduction under cellular stresses ( Davis , 1993 ; Kyriakis and Avruch , 1996 ) . 
+ Conflicting evidence indicates that the JNK and MAPK signal transduction pathways play complex roles , transmitting distinct cellular effects in different cell lineages ( Huh et al. , 2004 ) . 
+ For example , MAPK signaling was activated when pathogenic bacteria invaded ( Watanabe et al. , 2001 ) . 
+ Activation of ERK1/2 , JNK , and p38 MAPK was induced in the infection of epithelial cell lines with Listeria monocytogenes , Salmonella enterica , or enteropathogenic Escherichia coli ( EPEC ) ( Chen et al. , 1996 ; Hobbie et al. , 1997 ; Czerucka et al. , 2001 ) . 
+ In our study , the p38 ( MAPK11 , MAPK12 , and MAPK13 ) , ERK ( MAPK1 ) , and JNK ( MAPK9 ) genes were overexpressed in 5 dpi susceptible vs. 5 dpi non-challenged birds and 5 dpi susceptible vs. 5 dpi resistant birds . 
+ It seems that susceptible birds activated signal transduction pathways to protect cell survival under systemic APEC infection , compared to resistant and non-challenged birds . 
+ Moreover , the TLR signaling pathway also produced the costimulatory molecules ( CD40 , CD80 , and CD86 ) to stimulate T cells ( Melief et al. , 2002 ; Severa et al. , 2007 ) . 
+ In the current study , CD40 , CD80 , and CD86 were more highly expressed in 5 dpi susceptible birds than in 5 dpi non-challenged and 5 dpi resistant birds ( Figure 6 ) . 
+ The same phenomenon was also observed in the contrasts of 5 dpi vs. 1 dpi susceptible birds ( Figure 6 ) . 
+ Additionally , TLR6 ( TLR1LA ) , TLR4 , TLR5 , MAPK1 , and CD40 also were significantly changed in bone marrow in 5 dpi susceptible vs. 5 dpi non-challenged birds ( Sun et al. , 2015a ) . 
+ TLR4 and CD40 were also DE in bone marrow in 5 dpi susceptible vs. 5 dpi resistant birds ( Sun et al. , 2015a ) . 
+ These genes might be potential biomarkers for chicken host response to APEC infection . 
+ CD40 was also involved in the significantly changed CAMs pathway . 
+ Here , the CAMs pathway was strongly induced in the thymus in 5 dpi susceptible vs. 5 dpi resistant birds and 5 dpi susceptible vs. 5 dpi non-challenged birds . 
+ The VCAM1 , ITGB1 , and ITGA6 genes were all more highly expressed in 5 dpi susceptible birds than in 5 dpi non-challenged and 5 dpi resistant birds in the current study , strongly suggesting important roles of these genes . 
+ The highly induced CAMs pathway , together with previous reports of thymus atrophy and T lymphocyte depletion under colibacillosis ( Nakamura et al. , 1985 ; Nakamura et al. , 1986 ) , indicates that CAMs might be the major local tissue repair mechanism after APEC infection . 
+ As the thymus provides the essential environment for T-cell development and maturation , many distinct stages of T-cell development were marked with changes in gene expression under APEC infection . 
+ TCR signaling is a critical requisite signal to initiate T-cell selection , proliferation , activation , and response magnitude in mice one day after Listeria infection ( Zehn et al. , 2009 ) . 
+ The interaction between antigen peptide and MHC complexes can activate the TCR signal to trigger a complex downstream series of signaling cascades that can result in a variety of outcomes ( Anderson et al. , 1996 ; Kannan et al. , 2012 ) . 
+ The proximal signaling events include activation of Src tyrosine kinase Lck , phosphorylation of ITAMs in the TCR/CD3 complex , recruitment and activation of ZAP70 , phosphorylation of LAT , recruitment of a variety of signaling molecules , and the activation of NFAT and NF-kB ( Irving and Weiss , 1991 ; Chan et al. , 1992 ; Letourneur and Klausner , 1992 ; Bubeck et al. , 1996 ; Zhang et al. , 1998 ; Smith-Garvin et al. , 2009 ) . 
+ In the present study , the key genes ( CD3Z , LAT , ZAP70 , GRAP2 , and VAV ) in the TCR signal had reduced expression levels in the 3 contrasts of 5 dpi susceptible vs. 5 dpi non-infected birds , 5 dpi susceptible vs. 5 dpi resistant birds , and 5 dpi vs. 1 dpi susceptible birds ( Figure 6 ) . 
+ Deficiency of PDCD1 , a co-inhibitory receptor expressed on T cells , can promote autoimmunity ( Latchman et al. , 2004 ; Keir et al. , 2006 ; Hirahara et al. , 2012 ) . 
+ This gene was also downregulated in 5 dpi susceptible birds compared to 5 dpi resistant or 5 dpi non-challenged birds . 
+ Collectively , the TCR signal was deeply impaired in susceptible birds , which indicates T-cell proliferation , activation , differentiation , and maturation are significantly impaired by APEC infection in susceptible birds . 
+ Moreover , CD3Z was also significantly DE in bone marrow in 5 dpi susceptible compared to 5 dpi resistant birds ( Sun et al. , 2015a ) , indicating this gene is a positive marker of resistance in birds . 
+ Expression of NFATC can result in T-cell anergy and NFKBIE can inhibit NF-kB transactivation ( Whiteside et al. , 1997 ; Heissmeyer et al. , 2004 ) . 
+ These 2 genes both exhibited higher expression in 5 dpi susceptible birds , indicating damage of the TCR signal . 
+ T cells are activated not only by antigen presentation signals but also by co-stimulatory molecules for negative and positive regulatory signal transduction pathways ( De Koker et al. , 2011 ) . 
+ CTLA4 can interact with CD80 or CD86 to terminate T-cell activation and result in cell-cycle arrest ( Alegre et al. , 2001 ) . 
+ In the current study , expression of CTLA4 and CD86 was increased in 5 dpi susceptible vs. 5 dpi non-challenged birds and 5 dpi vs. 1 dpi susceptible birds . 
+ These results suggest that APEC infection suppresses T-cell activation in susceptible birds . 
+ Moreover , IL7 exerts a significant impact on naive T-cell survival , proliferation , and homeostasis in mammals ( Hsu and Mountz , 2010 ; Vicente et al. , 2010 ; Hong et al. , 2012 ) . 
+ Hsu and Mountz ( 2010 ) reported that the interaction between IL7 and IL7R could lead to proliferation and progression of T cells . 
+ IL7 and IL7R also play pivotal roles in the development of γδ T cells ( Watanabe et al. , 1991 ; Plum et al. , 1993 ) . 
+ IL7R can also be highly expressed in CD4 + and CD8 + cells and correlated with T-cell activation status in chickens ( Van Haarlem et al. , 2009 ) . 
+ IL7 signaling is a negative-feedback loop ( IL-7R → CD8 → TCR IL-7R ) that drives cell-intrinsic IL7R and TCR oscillatory signaling ( Huang and August , 2015 ) . 
+ In the present study , IL7 and IL7R had increased expression levels in 5 dpi susceptible vs. 5 dpi resistant birds and 5 dpi vs. 
+ Conclusion
+ The current study provides novel evidence that , in susceptible birds , T-cell development , activation , and cell cycle progression are impaired by APEC infection through reduced expression of regulatory genes in TCR signaling , while the innate immune response is activated through cross-talk among multiple signaling pathways . 
+ Infection with APEC induces very few transcriptomic differences between challenged-resistant and non-challenged birds . 
+ Taken together , the transcriptome analysis of thymus tissue during APEC infection demonstrates that both T-cell development and immune response mechanisms concurrently contribute to avian resistance to APEC infection . 
+ Moreover , many genes , especially TLR4 , CD40 , CD3Z , were identified as potential markers for host resistance to APEC infection . 
+ The CAM pathway might be a major local tissue repair mechanism after APEC infection . 
+ These findings contribute to the knowledge of the transcriptomic response in the thymus of genes that are involved in the earliest phases of the immune response to APEC , including those that drive the subsequent cellular immune reaction . 
+ The current study is foundational to the identification of genetic variation that differentiates birds that are susceptible or resistant to the pathological effects of APEC . 
+ Abbreviations
+ DE , differentially expressed ; APEC , avian pathogenic Escherichia coli ; ExPEC , extraintestinal pathogenic Escherichia coli ; dpi , day post infection ; PCA , principal component analysis ; GO , gene ontology ; TLR , toll-like receptor ; CAM , cell adhesion molecule ; TCR , T-cell receptor . 
+ Competing Interests
+ The authors declare that there were no competing interests regarding the publication of this paper . 
+ Authors’ Contributions
+ HS isolated RNA from tissues and generated cDNA libraries , analyzed data of the RNAseq experiment , conducted qPCR validation , and wrote the manuscript . 
+ PL , LKN , and SJL conceived the concept , participated in the animal experiments , and revised the manuscript . 
+ All authors read and approved the final manuscript . 
+ ACKNOWLEDGMENTS
+ The authors gratefully acknowledge the assistance of members of the Nolan and Lamont labs in collecting 
+ SUPPLEMENTARY DATA
+ Supplementary data are available at PSCIEN online . 
+ Table S1 . 
+ Differentially expressed genes that are involved in the significantly changed pathways in the contrast of 1 d post infection ( dpi ) susceptible vs. 1 dpi non-challenged birds . 
+ Table S2 . 
+ Differentially expressed genes that are involved in the significantly changed pathways in the contrast of 5 d post infection ( dpi ) vs. 1 dpi susceptible birds . 
+ Table S3 . 
+ Differentially expressed genes that are involved in the significantly changed pathways in the contrast of 5 dpi susceptible vs. 5 dpi non-challenged birds . 
+ Table S4 . 
+ Differentially expressed genes that are involved in the significantly changed pathways in the contrast of 5 dpi susceptible vs. 5 dpi resistant birds .
\ No newline at end of file
--- a/data/TEXT_FILES/notuseful_txt/27492287.txt 0 → 100644
+++ b/data/TEXT_FILES/notuseful_txt/27492287.txt 0 → 100644
+ Silencing of cryptic prophages in Corynebacterium
+ ABSTRACT 
+ DNA of viral origin represents a ubiquitous element of bacterial genomes . 
+ Its integration into host regulatory circuits is a pivotal driver of microbial evolution but requires the stringent regulation of phage gene activity . 
+ In this study , we describe the nucleoid-associated protein CgpS , which represents an essential protein functioning as a xenogeneic silencer in the Gram-positive Corynebacterium glutamicum . 
+ CgpS is encoded by the cryptic prophage CGP3 of the C. glutamicum strain ATCC 13032 and was first identified by DNA affinity chromatography using an early phage promoter of CGP3 . 
+ Genome-wide profiling of CgpS binding using chromatin affinity purification and sequencing ( ChAP-Seq ) revealed its association with AT-rich DNA elements , including the entire CGP3 prophage region ( 187 kbp ) , as well as several other elements acquired by horizontal gene transfer . 
+ Countersilencing of CgpS resulted in a significantly increased induction frequency of the CGP3 prophage . 
+ In contrast , a strain lacking the CGP3 prophage was not affected and displayed stable growth . 
+ In a bioinformatics approach , cgpS orthologs were identified primarily in actinobacterial genomes as well as several phage and prophage genomes . 
+ Sequence analysis of 618 orthologous proteins revealed a strong conservation of the secondary structure , supporting an ancient function of these xenogeneic silencers in phage-host interaction . 
+ 1Institute of Bio- und Geosciences , IBG-1 : Biotechnology , Forschungszentrum Jülich , 52425 Jülich , Germany and 2Quantitative and Theoretical Biology , Heinrich-Heine-Universität Düsseldorf , 40225 , Düsseldorf , Germany 
+ INTRODUCTION
+ Viral DNA , in the form of functional prophages or degenerated ( cryptic ) phage elements , is ubiquitously found in bacterial genomes and may constitute up to 20 % of the host genome ( 1 -- 3 ) . 
+ The mosaic-like structure of bacterial genomes indicates that phage-mediated horizontal gene transfer is a pivotal driver of bacterial evolution ( 4 ) . 
+ Nucleic Acids Research, 2016, Vol. 44, No. 21 10117–10131 doi: 10.1093/nar/gkw692
+ Recent studies demonstrated that these elements might contribute significantly to the fitness of their respective host by improving stress tolerance , antibiotic resistance , biofilm formation or virulence ( 5,6 ) . 
+ Phage-mediated gene transfer may provide the cell with novel adaptive traits , improving the fitness of the receptor cell , but this does not occur without risks . 
+ The integration of selfish replicators , including transposable elements , integrative/conjugative elements ( ICE ) or phages , can lead to high transcriptional and translational costs or even cell death ( 7,8 ) . 
+ Hence , bacteria possess a number of different systems that confer resistance to foreign genetic elements , e.g. CRISPR/Cas and restriction modification ( RM ) systems ( 9,10 ) . 
+ However , to harness the adaptive potential of foreign DNA and enable its integration into the host regulatory circuitry , bacteria have evolved a rather mediative mechanism called xenogeneic silencing ( XS ) ( 11 -- 13 ) . 
+ This mechanism relies on the function of small nucleoid-associated proteins ( NAPs ) to target and inhibit the expression of foreign DNA , which is recognizable by its typically higher AT content in comparison to the host genome ( 1,14 ) . 
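+ To make the notion of AT content concrete , the following Python sketch computes the AT fraction of a sequence and scans it with a sliding window . 
+ It is an illustration only and not the analysis used in the study ; the function names , window size , step , threshold , and toy sequence are hypothetical . 

```python
def at_content(sequence):
    """Fraction of A and T bases in a DNA sequence (0.0 for an empty one)."""
    seq = sequence.upper()
    if not seq:
        return 0.0
    return sum(1 for base in seq if base in "AT") / len(seq)


def at_rich_windows(sequence, window, step, threshold):
    """Return (start, AT fraction) for each window whose AT fraction
    is at or above `threshold`."""
    hits = []
    for start in range(0, len(sequence) - window + 1, step):
        fraction = at_content(sequence[start:start + window])
        if fraction >= threshold:
            hits.append((start, fraction))
    return hits


# Toy sequence: a GC-rich backbone with an AT-rich insert in the middle.
toy = "GCGCATATATGCGC"
print(at_content(toy))
print(at_rich_windows(toy, window=4, step=2, threshold=1.0))
```

+ On a real genome , the window and threshold would have to be chosen relative to the host 's own average AT content , since it is the difference from the host background that marks DNA as foreign . 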
+ The major role of XS proteins is the binding of foreign DNA elements and the inhibition of transcription by a complex formation of AT-rich DNA stretches causing either the occlusion or trapping of the RNA polymerase ( 15,16 ) . 
+ Currently known XS proteins belong to one of four classes , consisting of H-NS-type proteins found in several proteobacteria ( 12,17 ) , Lsr2-like proteins of the actinomycetes ( 18 ) , MvaT of Pseudomonas species ( 16 ) and Rok of Bacillus subtilis ( 19 ) . 
+ To date , most studies have focused on host-encoded XS proteins acting as silencers of foreign DNA . 
+ However , it may also be of benefit for the foreign element to bring its own silencer protein to improve tolerance within the host cell . 
+ Here , we describe a novel prophage-encoded XS protein of the Lsr2-type in Corynebacterium glutamicum ATCC 13032 . 
+ The genome of this important industrial amino acid producer contains three cryptic prophages ( 20,21 ) . 
+ Whereas CGP1 and CGP2 are highly degenerated , CGP3 comprises almost 6 % of the entire genome ( 187 kb ) and is inducible in an SOS-dependent manner ( 22,23 ) . 
+ Even under non-inducing conditions , spontaneous prophage induction ( SPI ) was observed , preceded by a spontaneous activation of the SOS response in > 60 % of cases ( 20,22,23 ) . 
+ However , the precise regulatory control of CGP3 induction has not been studied thus far . 
+ In this study , we demonstrate the essential role of a prophage-encoded NAP , which is a homolog to the mycobacterial Lsr2 protein and functions as a silencer of cryptic phage elements in C. glutamicum ( CgpS , C. glutamicum prophage silencer ) . 
+ Genome-wide profiling of the CgpS -- DNA interaction revealed its association with AT-rich DNA regions located primarily within prophage regions . 
+ Countersilencing of CgpS activity via the expression of its truncated oligomerization domain resulted in the induction of CGP3 , causing cell death . 
+ A bioinformatics analysis revealed homologous proteins mainly in actinomycetes , but , interestingly , also in several phage and prophage genomes . 
+ These data demonstrate the importance of XS proteins for the tolerance of viral DNA and indicate that this mechanism is exploited by both the host and the virus . 
+ MATERIALS AND METHODS
+ Bacterial strains and growth conditions
+ The bacterial strains and plasmids used in this study are listed in Supplementary Table S1 . 
+ Corynebacterium glutamicum ATCC 13032 was used as wild-type strain ( 24 ) . 
+ E. coli DH5α was used as host for cloning procedures and cultivated in Lysogeny Broth ( LB ) medium or on agar plates at 37 °C ( 25 ) . 
+ For growth studies and fluorescence assays ( e.g. preparation of cells for fluorescence microscopy ) , C. glutamicum cells were pre-cultivated in BHI ( brain heart infusion , Difco™ BHI , BD , Heidelberg , Germany ) medium at 30 °C for 6 h . 
+ This first preculture was used to inoculate an overnight culture in CGXII minimal medium ( 26 ) containing 2 % ( w/v ) glucose and 30 mg · l−1 protocatechuic acid . 
+ The CGXII culture was finally used to inoculate the main culture in the same medium ( CGXII with 2 % ( w/v ) glucose ) to a start OD600 of 1 , unless specified otherwise . 
+ If necessary , 50 µg · ml−1 ( E. coli ) or 25 µg · ml−1 ( C. glutamicum ) kanamycin and/or 34 µg · ml−1 ( E. coli ) or 10 µg · ml−1 ( C. glutamicum ) chloramphenicol were added . 
+ Recombinant DNA work
+ Plasmids and oligonucleotides used in this study are listed in Supplementary Tables S1 and S2 , respectively . 
+ Standard methods including PCR , DNA restriction and ligation , were performed according to established protocols ( 25 ) . 
+ In some cases , Gibson assembly ( 27 ) was used for the construction of plasmids . 
+ DNA sequencing and oligonucleotide synthesis were conducted by Eurofins MWG Operon ( Ebersberg , Germany ) . 
+ The chromosomal integration of the Strep tagged cgpS gene variant was performed using the two-step homologous recombination method ( 28 ) . 
+ The 500 bp up and downstream regions of cgpS were amplified using the oligonucleotides LF cgpS pK19 fw and LF cgpS rv and , accordingly , RF cgpS fw and RF cgpS pK19 rv . 
+ Amplification of the Strep-tagged cgpS gene was done by using the plasmid pAN6-cgpS-strep as template for the oligonucleotide pair cgpS strep fw and cgpS strep rv . 
+ The three resulting PCR products and the digested pK19mobsacB plasmid ( with BamHI , EcoRI ) were assembled using Gibson assembly ( 27 ) . 
+ Correct integration into the cgpS locus was confirmed by sequencing of the colony PCR product with the oligonucleotides Cgps indel-fw and CgpS indel rv . 
+ Cultivation in the BioLector System
+ Growth experiments were performed predominantly in the BioLector ® microcultivation system of m2p-labs ( Aachen , Germany ) as described previously ( 29 ) . 
+ Cultivation was performed in 48-well FlowerPlates ( m2p labs , Germany ) at 30 °C and a shaking frequency of 1200 rpm . 
+ The cells were cultivated in 750 µl of CGXII minimal media with 2 % ( w/v ) glucose containing different additives ( e.g. isopropyl β-D-1-thiogalactopyranoside ( IPTG ) , MMC , kanamycin ) , as indicated . 
+ Measurements were taken at 15-min intervals . 
+ DNA affinity chromatography with the promoter region of alpAC
+ The promoter region of alpAC was amplified by PCR with the oligonucleotides PalpAC-Biotin-Tag-fw and PalpAC rv ( product size 516 bp ) . 
+ To tag the amplified product with biotin , further PCRs were performed with the Biotin-Primer ( MWG Eurofins , Ebersberg , Germany ) and PalpAC rv . 
+ At least 220 pmol of the biotinylated products were purified by size exclusion chromatography using an 8 ml sepharose s400-HR column from GE Healthcare ( Freiburg , Germany ) . 
+ A total of 5 mg of the M-280 Streptavidin Dynabeads ® ( Invitrogen , Carlsbad , CA , USA ) were washed twice with the binding and wash ( BW ) buffer ( 10 mM Tris-HCl pH 7.5 , 2 M NaCl ) , subsequently suspended in BW buffer containing biotinylated products and incubated for 1 h at room temperature . 
+ To eliminate unbound DNA fragments the beads were washed three times with the BW buffer and finally suspended in the binding and storage ( BS ) buffer ( 20 mM Tris-HCl pH 7.5 , 1 mM EDTA , 10 % ( v/v ) glycerin , 0.01 % ( v/v ) Triton-X-100 , 100 mM NaCl , 1 mM DTT ) . 
+ A total of 500 ml of cells were grown in CGXII minimal media with glucose as carbon source ( as described in bacterial strains and growth conditions ) to an OD600 of ∼ 5 . 
+ After the cells were harvested by centrifugation ( 20 min , 5300g ) and washed once with phosphate buffered saline ( PBS ) buffer ( 137 mM NaCl , 2.7 mM KCl , 20 mM Na2HPO4 , 1.8 mM KH2PO4 ) , cell pellets were suspended in BS buffer supplemented with 1 mM phenylmethylsulfonyl fluoride ( PMSF ) . 
+ Cell disruption was performed by five passages at 172 MPa through a French pressure cell ( Heinemann , Schwaebisch Gmuend , Germany ) . 
+ The DNA binding reactions were set up with the complete crude extracts , the DNA-coupled beads and 500 µg of chromosomal DNA , and incubated for 45 min at room temperature . 
+ After the binding reaction , beads were washed once with BS buffer , twice with BS buffer and 400 µg chromosomal DNA and , as a final washing step , again with BS buffer . 
+ Elution was performed in two subsequent steps with BS buffer containing 2 M sodium chloride . 
+ After TCA precipitation ( 30 ) of the pooled elution fractions , the samples were analyzed via sodium dodecyl sulfate-polyacrylamide gel electrophoresis ( SDS-PAGE ) ( 31 ) . 
+ Identification of proteins was conducted by MALDI-ToF analysis as described in the section below . 
+ Preparation of ChAP-Seq samples
+ Cells of the wild-type strain ATCC 13032 and the variant containing the Strep-tagged CgpS protein ( WT : : cgpS-strep ) were first grown in BHI for 6 h and then 1 ml was used to inoculate minimal media cultures ( CGXII with 2 % ( w/v ) glucose ) . 
+ After cultivation overnight , these precultures were used to inoculate 500 ml of the same minimal medium ; the cultures were grown to an OD600 of 5 to 6 and finally harvested by centrifugation ( 10 min , 11 325g at 4 °C ) . 
+ After washing the cells with CGXII medium without ( w/o ) MOPS , the cells were resuspended in 10 ml MOPS-free CGXII containing 1 % ( v/v ) formaldehyde . 
+ The fixation was conducted by incubation at room temperature for 20 min . 
+ Subsequently , glycine was added to a final concentration of 125 mM and the cells were incubated for further 5 min at room temperature . 
+ Then , the cells were washed twice with buffer A ( 100 mM Tris-HCl , pH 8.0 , 1 mM EDTA ) and resuspended in 10 ml buffer A supplemented with cOmplete Protease Inhibitor ( Roche , Basel , Switzerland ) and 5 mg RNase A. Cell disruption was performed as described in the DNA affinity chromatography section ( five passages through a French Press cell ) . 
+ The chromosomal DNA of the lysates was sheared by sonication 3 × 30 s with a Branson sonifier 250 ( Heinemann , Schwaebisch Gmuend , Germany ) using a pulse length of 40 % and an intensity of one to give an average fragment size of 200 -- 1500 bp as confirmed by agarose gel electrophoresis . 
+ Cell debris was removed by centrifugation at 5300g for 20 min , followed by ultracentrifugation for 1 h at 150 000g , both steps at 4 ◦ C . 
+ The supernatant was used for protein -- DNA purification according to the standard Strep-tag ® purification protocol ( see below , protein purification ) . 
+ The pooled elution fractions were incubated overnight at 65 ◦ C , followed by a treatment with proteinase K ( final concentration 400 mg · ml − 1 ) for 3 h at 55 ◦ C. Finally , the DNA of the samples was purified by phenol -- chloroform extraction ( 32 ) , precipitated with ethanol , washed with 70 % ( v/v ) ethanol , dried and resuspended in 50 -- 100 µl ddH2O . 
+ ChAP-Seq
+ The obtained DNA fragments of each sample ( 2 µg ) were used for library preparation and indexing using the TruSeq DNA PCR-free sample preparation kit according to the manufacturer 's instructions , yet omitting the DNA size selection steps ( Illumina , Chesterford , UK ) . 
+ The resulting libraries were quantified using the KAPA library quant kit ( Peqlab , Bonn , Germany ) and normalized for pooling . 
+ Sequencing of pooled libraries was performed on a MiSeq ( Illumina , San Diego , US ) using paired-end sequencing with a read-length of 2 × 150 bases . 
+ Data analysis and base calling were accomplished with the Illumina instrument software and stored as fastq output files . 
+ The obtained sequencing data of each sample were imported into CLC Genomics Workbench ( Version 7.5.1 , Qiagen Aarhus A/S ) for trimming and base quality filtering . 
+ The output was mapped to accession BX927147 as C. glutamicum reference genome ( 21 ) . 
+ For peak detection the resulting mapping coverage of each sample was exported and imported into the in-house software Genome Data Viewer ( unpublished ) . 
+ A peak was automatically annotated if the coverage of a region was more than 3-fold above the average genome coverage . 
+ All peaks were inspected and confirmed manually . 
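+ The threshold rule above can be illustrated with a minimal sketch. It is not the in-house Genome Data Viewer, which is unpublished; the function name `call_peaks`, the per-base coverage list and the half-open interval convention are assumptions made for illustration only.

```python
from typing import List, Tuple


def call_peaks(coverage: List[float], fold: float = 3.0) -> List[Tuple[int, int]]:
    """Return half-open (start, end) intervals whose coverage exceeds
    `fold` times the mean coverage of the whole sequence.

    Hypothetical helper: illustrates the 3-fold rule only, not the
    authors' software.
    """
    if not coverage:
        return []
    threshold = fold * (sum(coverage) / len(coverage))
    peaks: List[Tuple[int, int]] = []
    start = None
    for pos, depth in enumerate(coverage):
        if depth > threshold and start is None:
            start = pos                    # a peak region opens here
        elif depth <= threshold and start is not None:
            peaks.append((start, pos))     # the region closes before pos
            start = None
    if start is not None:                  # region runs to the sequence end
        peaks.append((start, len(coverage)))
    return peaks
```

+ Because the mean includes the enriched regions themselves, a very strong peak raises the threshold for all others, which is one reason a manual inspection step remains useful.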
+ The relative amount of circular phage DNA was determined via quantitative PCR ( qPCR ) . 
+ Therefore , C. glutamicum wild type cells containing empty pAN6 plasmid ( control ) , pAN6-cgpS gene or pAN6-N-cgpS were grown in 48-well FlowerPlates containing CGXII minimal medium at 30 ◦ C and 900 rpm in a microtron ( Infors-HT , Bottmingen , Switzerland ) . 
+ The overexpression of cgpS and the N-terminal part was induced with 150 µM IPTG ( for control samples no IPTG was added ) . 
+ After 24 h , 750 µl of the cells were harvested and the DNA was extracted using the NucleoSpin microbial DNA Kit ( Macherey Nagel , Dueren , Germany ) and DNA concentration was quantified using a nanophotometer ( Implen , München , Germany ) . 
+ Each sample contained 1 µg total DNA as a template . 
+ For the reaction an innuMIX qPCR MasterMix SyGreen ( Analytik Jena , Jena , Germany ) and a qTOWER 2.2 ( Analytik Jena ) were used . 
+ The reaction protocol was divided into two parts : ( i ) polymerase chain reaction ( PCR ) ( ( a ) 3 min preincubation at 95 ◦ C , ( b ) 5 s denaturation at 95 ◦ C , ( c ) 25 s elongation at 62 ◦ C , 40x repetition of steps ( b ) to ( c ) ) and ( ii ) a melting curve analysis ( ΔT = 1 ◦ C/6 s ) . 
+ The PCR product size using oligonucleotides belonging to the circular phage product is 150 bp ( listed in Supplementary Table S2 ) . 
+ As reference gene ddh was used with the oligonucleotides listed in Supplementary Table S2 resulting in a 150 bp product . 
+ For data analysis the qPCR software qPCR 3.1 ( Analytik Jena ) and the Livak method ( 33 ) were used to determine the 2^−ΔΔCT values based on the measured CT values . 
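+ As a worked illustration of the Livak calculation, the sketch below computes 2^−ΔΔCT from four CT values. The function and argument names are hypothetical; here the target would be the circular phage product and the reference ddh, with the uninduced sample as control. The method assumes roughly 100 % amplification efficiency for both primer pairs.

```python
def relative_quantity(ct_target_sample: float, ct_ref_sample: float,
                      ct_target_control: float, ct_ref_control: float) -> float:
    """Relative amount of the target in the sample versus the control,
    computed as 2^-(delta delta CT) (Livak method).

    Hypothetical helper for illustration; the argument names are assumptions.
    """
    # Normalize the target to the reference gene within each condition.
    delta_sample = ct_target_sample - ct_ref_sample
    delta_control = ct_target_control - ct_ref_control
    # Compare the normalized sample to the normalized control.
    delta_delta = delta_sample - delta_control
    return 2.0 ** (-delta_delta)
```

+ For example , a target that appears two cycles earlier in the sample than in the control , at equal reference CT , corresponds to a 4-fold relative increase .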
+ DNA microarrays
+ For a comparative transcriptome analysis of C. glutamicum ATCC 13032/pAN6 with cells carrying pAN6-N-cgpS ( used for countersilencing ) , both strains were cultivated in CGXII with 2 % ( w/v ) glucose and 100 µM IPTG as described in bacterial strains and growth conditions . 
+ The preparation of labeled cDNA and DNA microarray analysis was performed as described previously ( 34 ) . 
+ Array data were deposited in the GEO database ( ncbi.nlm.nih.gov / geo ) under accession number GSE80674 . 
+ Cultivation and perfusion in microfluidic device
+ For single-cell analysis an in-house developed microfluidic platform was used ( 22,35 -- 37 ) . 
+ Phase-contrast and fluorescence time-lapse imaging was performed at 6 min intervals . 
+ Medium was supplied continuously to ensure stable and constant environmental conditions . 
+ CGXII minimal medium with 2 % ( w/v ) glucose and 25 µg · ml − 1 kanamycin was infused at a rate of 300 nl · min − 1 using a high-precision syringe pump ( neMESYS , Cetoni GmbH , Korbussen , Germany ) . 
+ For the expression of the N-terminal part of CgpS , 150 µM IPTG was added to the medium . 
+ A constant cultivation temperature of 30 ◦ C was ensured ( PeCon GmbH , Erbach , Germany ) . 
+ The cells were cultivated for 16 h. 
+ Fluorescence microscopy
+ The cultivations were done as described in bacterial strains and growth conditions . 
+ After 6 h of cultivation , 1 -- 3 µl were pipetted on a microscope slide coated with a thin 1 % ( w/v ) agarose layer that was based on tris-acetate buffer . 
+ To stain the DNA with the Hoechst 33342 dye , 1 ml cells were harvested ( 5300g , 5 min ) , subsequently resuspended in PBS buffer containing 100 ng · ml − 1 Hoechst 33342 and incubated at room temperature for 20 min . 
+ Images were taken on an AxioImager M2 ( Zeiss , Oberkochen , Germany ) equipped with a Zeiss AxioCam MRm camera . 
+ Fluorescence was monitored with the filter set 46 HE YFP for eYFP , 63 HE filter was used for mCherry fluorescence and Hoechst fluorescence was examined with the filter set 49 . 
+ An EC Plan-Neofluar 100x/1.3 Oil Ph3 objective was used . 
+ Images were acquired and analyzed with the AxioVision 4.8 software ( Carl Zeiss ) . 
+ Protein purification
+ CgpS , C-terminally tagged with a Strep-tag ® , was heterologously produced in E. coli BL21 ( DE3 ) . 
+ Cells were grown to an OD600 of 0.4 at 37 ◦ C. Upon induction with 50 µM IPTG the cultivation was continued at 16 ◦ C overnight . 
+ Cells were harvested by centrifugation at 5300g and 4 ◦ C for 10 min and resuspended in buffer B ( 250 mM NaCl , 50 mM Tris-HCl , pH 7.5 ) . 
+ Cell disruption was performed by two passages through a French pressure cell at 172 MPa . 
+ Cell debris was removed by centrifugation ( 20 min , 5300g , 4 ◦ C ) , followed by an ultracentrifugation ( 60 min , 229 000g , 4 ◦ C ) . 
+ The supernatant was applied to an equilibrated 1 ml Strep-Tactin ® - Sepharose ® ( IBA , Göttingen , Germany ) column . 
+ It was subsequently washed with 10 ml buffer B and the protein was eluted with 10 ml buffer B containing 1 mM d-desthiobiotin ( Sigma Aldrich ) . 
+ Electrophoretic mobility shift assays (EMSA)
+ EMSA studies of CgpS and selected DNA regions identified by ChAP-Seq were performed with selected regions ( 500 bp fragments , for oligo sequences see Supplementary Table S3 ) . 
+ The corresponding regions were amplified by PCR and purified by using the PCR clean-up Kit of Macherey Nagel ( Dueren , Germany ) . 
+ The promoter region of gntK was used as control fragment ( 560 bp ) . 
+ A total of 90 ng DNA per lane was incubated with different concentrations ( 1 µM and 2 µM ) of purified CgpS protein for 20 min in EMSA buffer ( 250 mM Tris-HCl pH 7.5 , 25 mM MgCl2 , 200 mM KCl , 25 % ( v/v ) glycerol ) . 
+ Subsequently , samples were loaded onto a native 10 % polyacrylamide gel ( TBE-based ; TBE : 89 mM Tris base , 89 mM boric acid , 2 mM Na2EDTA ; loading dye : 0.01 % ( w/v ) xylene cyanol dye , 0.01 % ( w/v ) bromophenol blue dye , 20 % ( v/v ) glycerol , 1x TBE ) . 
+ The DNA was stained with SYBR Green I ( Sigma Aldrich , St. Louis , MO , USA ) . 
+ Protein pull down and MALDI-TOF analysis
+ C. glutamicum cells containing the plasmids pAN6 , pAN6-cgpS-strep or pAN6-N-cgpS-strep were cultivated as described in bacterial strains and growth conditions . 
+ The cultures were grown in 500 ml CGXII with 2 % ( w/v ) glucose to an OD600 of 5 and subsequently induced with 150 µM IPTG for a further 4 h . 
+ The cells were harvested ( 5300g , 20 min , 4 ◦ C ) , washed in buffer B ( see protein purification ) and disrupted as described in the DNA affinity chromatography section . 
+ Purification was performed as described in the section above . 
+ The eluted fractions were analyzed by SDS-PAGE ( 31 ) using a 4 -- 20 % Mini-PROTEAN ® gradient gel ( Bio Rad , Munich , Germany ) . 
+ The gels were stained with a Coomassie dye based RAPIDstain solution ( GBiosciences , St. Louis , MO , USA ) . 
+ MALDI-TOF-MS measurements were performed with an Ultraflex III TOF/TOF mass spectrometer ( Bruker Daltonics , Bremen , Germany ) for the identification of the proteins as described ( 38 ) . 
+ Homology search
+ The BLAST ` nr ' database ( ver . February 2015 ) was downloaded from NCBI ( http://www.ncbi.nlm.nih.gov/ ) . 
+ The CgpS amino acid sequence was extracted from the GenBank file of Corynebacterium glutamicum ATCC 13032 , accession : NC_006958.1 and locus tag : cg1966 . 
+ A PSI-BLAST ( 39 ) search with the CgpS sequence as the query was executed against the NCBI nr database . 
+ The e-value threshold was set to 0.005 , the number of iterations was not limited and the search was iterated until it converged . 
+ A total of 5230 ( 1920 unique ) homologous hits were obtained , of which 618 could be assigned to a particular bacterial species or a phage . 
+ Global sequence identity was calculated by pairwise comparison of the CgpS sequence with all 618 PSI-BLAST hits using the Needleman -- Wunsch algorithm ( 40 ) as implemented in the needle program of the EMBOSS package ( 41 ) . 
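+ To make the notion of global identity concrete, the following is a minimal sketch of a Needleman -- Wunsch alignment that reports percent identity over the aligned columns. It is not the EMBOSS needle program: the simple match/mismatch scores and the linear gap penalty are assumptions chosen for brevity, whereas needle works with a substitution matrix and affine gap penalties, so the numbers will generally differ.

```python
def global_identity(a: str, b: str, match: int = 1,
                    mismatch: int = -1, gap: int = -1) -> float:
    """Percent identity of an optimal global alignment of a and b.

    Simplified scoring (assumption): fixed match/mismatch scores and a
    linear gap penalty. Identity = identical columns / alignment length.
    """
    n, m = len(a), len(b)
    # score[i][j] = best score aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back one optimal path, counting columns and identical pairs.
    i, j, identical, columns = n, m, 0, 0
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            sub = match if a[i - 1] == b[j - 1] else mismatch
            if score[i][j] == score[i - 1][j - 1] + sub:
                identical += a[i - 1] == b[j - 1]
                i, j, columns = i - 1, j - 1, columns + 1
                continue
        if i > 0 and score[i][j] == score[i - 1][j] + gap:
            i -= 1          # gap in b
        else:
            j -= 1          # gap in a
        columns += 1
    return 100.0 * identical / columns if columns else 0.0
```

+ Dividing by the full alignment length , gaps included , is what distinguishes a global identity from a local one restricted to the best-matching segment .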
+ Secondary structure prediction
+ The amino acid sequence of the CgpS protein and the sequences of the 618 homologous hits were used to predict the secondary structure by psipred ( 42 ) . 
+ The visualization of the psipred output was done in R ( 43 ) . 
+ All statistical analyses and data visualizations in the bioinformatics sections were performed in R ( 43 ) . 
+ RESULTS
+ A small nucleoid-associated protein encoded by a cryptic prophage element
+ To decipher the control of prophage induction and activation of cryptic elements in C. glutamicum ATCC 13032 , we performed DNA affinity chromatography with the promoter of the early phage operon alpAC using the crude extract of log-phase cells grown in glucose minimal medium ( ( 34 ) , Figure 1A ) . 
+ SDS-PAGE analysis of the proteins bound to the alpAC promoter revealed a prominent band corresponding to the 13.4 kDa protein Cg1966 encoded within the CGP3 prophage region ( Figure 1B ) . 
+ In particular , the C-terminal domain of Cg1966 shares significant sequence similarity with the nucleoid-associated protein Lsr2 of Mycobacterium tuberculosis ( Supplementary Figure S1 ) . 
+ This domain corresponds to the DNA binding domain of Lsr2 ( IPR024412 ) , which was previously found to bind AT-rich DNA via an AT-hook motif and functions as a silencer of xenogeneic DNA ( 44,45 ) . 
+ Based on the data described in the following sections , we renamed Cg1966 as CgpS ( Corynebacterium glutamicum prophage silencer ) . 
+ Secondary structure predictions of CgpS as well as of CgpS homologs suggest a significant structural similarity with Lsr2 and reveal the presence of an AT-hook-like motif ` RGI ' between the two predicted C-terminal alpha helices ( Figure 1C ) ( 18,45 ) . 
+ CgpS functions as a silencer of CGP3 activity
+ To study the impact of cgpS expression on the activity of the CGP3 prophage , we overexpressed cgpS in a strain carrying a reporter construct ( WT-Plys-eyfp ) indicative for the activation of CGP3 by the production of the yellow fluorescent protein eYFP under the control of a phage promoter ( 22 ) . 
+ Upon induction with mitomycin C , the control strain carrying the empty plasmid displayed increased reporter activity . 
+ Consistent with our assumption , overexpression of cgpS reduced the reporter output to nearly the background level ( Figure 2A ) . 
+ To study the intracellular localization of CgpS in C. glutamicum cells , we C-terminally fused this protein to mCherry and analyzed its distribution via fluorescence microscopy . 
+ As shown by Hoechst staining , this NAP appeared associated with the nucleoid but formed distinct foci in the cell ( Figure 2B and C ) . 
+ Remarkably , CgpS-mCherry foci co-localized with foci of an AlpA-eYFP fusion that was previously described as a CGP3 DNA adaptor protein binding to the alpAC promoter region ( 34 ) ( Figure 2C ) . 
+ The functionality of this CgpS-mCherry fusion was confirmed by the counteraction of CGP3 activation upon addition of MMC ( Supplementary Figure S2 ) . 
+ Genome-wide binding profile of CgpS
+ The data of the co-localization experiments suggest binding of CgpS to the CGP3 prophage region . 
+ In the following , the genome-wide binding profile was analyzed by combining affinity chromatography purification of crosslinked CgpS -- DNA complexes followed by sequencing of associated DNA ( ChAP-Seq ) . 
+ For this purpose , we replaced the native cgpS gene in the genome of ATCC 13032 with cgpS-Strep encoding a C-terminal Strep-tagged CgpS variant . 
+ This analysis revealed that CgpS associates with 1.5 % of the ATCC 13032 genome and with ∼ 20.5 % of the cryptic CGP3 prophage region ( Supplementary Figure S3 ) . 
+ In total , 90 peaks were detected , 58 of which were within and 32 were located outside the CGP3 prophage ( Figure 3A , Supplementary Table S4 ) . 
+ The majority of the peak maxima were located within promoter regions ( 60 % ) , but CgpS binding was also observed within genes ( 31 % ) or intergenic regions ( 9 % ) ( Supplementary Figure S4B and C ) . 
+ To deduce a binding motif of CgpS , sequences of the 90 peaks ( Supplementary Table S4 ) were extracted and analyzed using the MEME-ChIP software platform ( 46 ) . 
+ A 21-bp long AT-rich motif was predicted , which was present in 87 of 90 sequences ( Figure 3B ) . 
+ The occurrences of the found DNA binding sites were validated using a FIMO search ( Find Individual Motif Occurrences , ( 47 ) ) in the ATCC 13032 genome , which revealed significant matches ( > 75 % ) of the predicted and experimentally identified CgpS binding sites ( Supplementary Figure S5 ) . 
+ Remarkably , the % GC content of the 90 peak sequences is considerably lower than the average GC content of the ATCC 13032 strain , indicating the preferred binding of CgpS to AT-rich DNA ( Figure 3C ) . 
+ Moreover , the GC contents of the CgpS bound regions within the prophage revealed no significant differences from that of the regions bound outside the prophage ( Figure 3C ) . 
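+ The GC comparison above rests on a simple per-sequence calculation , sketched below. The function name `gc_content` and the choice to ignore ambiguous bases such as N are assumptions for illustration ; the source does not state how ambiguous positions were handled.

```python
def gc_content(seq: str) -> float:
    """Percent G+C among the unambiguous bases (A, C, G, T) of seq.

    Hypothetical helper: ambiguous symbols such as N are ignored
    (an assumption), and an empty or fully ambiguous sequence yields 0.0.
    """
    seq = seq.upper()
    counted = sum(seq.count(base) for base in "ACGT")
    if counted == 0:
        return 0.0
    gc = seq.count("G") + seq.count("C")
    return 100.0 * gc / counted
```

+ Applied to each of the 90 peak sequences and to the whole genome , this gives the per-region values whose distributions are compared in Figure 3C .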
+ Most of the identified CgpS targets were located within the CGP3 prophage and code for hypothetical proteins . 
+ The two strongest signals were found within transposase-encoding genes ( cg1950-cg1951 ) and in the promoter region of cgpS itself , indicating a negative autoregulation similar to that of H-NS ( 48 ) . 
+ Other potential target genes encode the actin-like protein and the corresponding adaptor protein ( alpAC , cg1890 and cg1891 ( 34 ) ) , a resolvase ( cg1929 ) , a prophage primase ( cg1959 ) , a putative phage lysin ( cg1974 ) and a phage integrase ( cg2071 ) , which are spread across the cryptic prophage element . 
+ In addition to regions within CGP3 , CgpS target sites are located in the low GC island 1 ( LCG1 ) , in the cryptic phage element CGP1 , or proximal to transposases encoding genes . 
+ Furthermore , promoter regions of genes coding for R-M systems ( Pcg1028 and PcglIM , ( Pcg1996 ) ) are also bound by CgpS , which in several studies were shown to be transferred horizontally ( 49 -- 52 ) . 
+ A considerably high peak was observed for the promoter region of cg0150 that encodes a putative regulatory protein or toxin possessing a predicted fido domain ( IPR003812 ) . 
+ The binding profile obtained by the ChAP-Seq analysis was validated by EMSAs ( Supplementary Figure S6 ) . 
+ For this purpose , CgpS was purified as a C-terminal Streptag fusion and incubated with DNA fragments covering selected putative CgpS binding sites as identified by ChAP-Seq ( Figure 3D and Supplementary Figure S6 ) . 
+ This in vitro approach confirmed the binding of CgpS for all selected target regions ( including the promoters of cg0150 , alpAC and cgpS itself ) in comparison to the control fragment ( gntK promoter ) ( Figure 3D ) . 
+ Overall , these data are consistent with CgpS acting as a xenogeneic silencer by targeting AT-rich DNA regions , several of which have likely been acquired by HGT . 
+ Countersilencing of CgpS activity
+ Several independent efforts to inactivate the cgpS gene failed ( data not shown ) , suggesting that cgpS represents an essential gene for C. glutamicum ATCC 13032 . 
+ However , previous studies revealed that deletions of all three cryptic phage elements , including the cgpS gene , are possible and do not lead to a significant growth defect of the particular strain ( 53 ) . 
+ In fact , trials to construct an in-frame deletion of cgpS resulted in the isolation of strains lacking large parts of the CGP3 prophage , indicating that the essentiality of cgpS is a consequence of the de-repression of toxic phage genes in the absence of CgpS . 
+ For the conditional inactivation of CgpS , we adapted a countersilencing approach similar to the H-NST system described by Williamson and Free ( 54 ) . 
+ This protein was reported as a truncated H-NS derivative that antagonizes H-NS function by interfering with the multimerization of H-NS . 
+ Co-purification assays with the N-terminal domain of CgpS confirmed the interaction of this truncated variant with the full-length protein ( Figure 4A ) . 
+ Based on previous data and the H-NST mechanism ( Figure 4B ) , we constructed the pAN6-N-cgpS plasmid to overproduce a truncated variant of CgpS ( amino acids 1 -- 65 ) under the control of Ptac . 
+ Homology studies indicated that amino acids 1 -- 65 cover the domain of CgpS required for the oligomerization of this NAP . 
+ Remarkably , production of the truncated CgpS-N domain in the wild-type strain resulted in a significant growth defect , whereas no impact on growth was observed in a strain lacking the CGP3 prophage ( Figure 5A ) . 
+ This finding was supported by single-cell analysis of a strain containing a prophage reporter construct ( Plys-eyfp ) ( 22 ) and the countersilencing construct pAN6-N-cgpS . 
+ Production of the N-terminal domain of CgpS led to a strong increase in fluorescence accompanied by growth arrest and a branched cell morphology ( Figure 5C , Video S1 and S2 ) . 
+ Quantitative real-time PCR revealed a 3-fold increase in the level of circular CGP3 DNA in comparison to uninduced cells , which is consistent with the induction of this cryptic prophage ( Figure 5B ) ( 20 ) . 
+ To monitor the impact of countersilencing CgpS activity on gene expression , we performed a comparative transcriptome analysis ( Figure 5D , Supplementary Table S5 ) . 
+ More than 194 genes were affected , 12 of which exhibited a reduced mRNA level ( mRNA ratio ≤ 0.5 , P-value < 0.05 ) , and 182 genes were upregulated ( mRNA ratio ≥ 2 , P-value < 0.05 ) . 
+ The majority of upregulated genes ( 148 ) were genes of the prophage CGP3 . 
+ Additional genes that displayed an increased mRNA level were the ferritin gene ( ftn , cg2782 ) and cg1517 of the CGP1 prophage ( Supplementary Table S5 ) , both of which were also identified as putative CgpS targets by ChAP-Seq . 
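+ The cutoffs stated above ( mRNA ratio ≥ 2 or ≤ 0.5 , P-value < 0.05 ) can be written as a small classification rule. The sketch below is illustrative only ; the function name `classify` and its default arguments are assumptions , and it does not reproduce the authors' microarray pipeline.

```python
def classify(ratio: float, p_value: float,
             fold: float = 2.0, alpha: float = 0.05) -> str:
    """Label a gene 'up', 'down' or 'unchanged' from its mRNA ratio
    (countersilenced vs. control) and P-value.

    Hypothetical helper mirroring the thresholds given in the text.
    """
    if p_value >= alpha:
        return "unchanged"      # not significant, whatever the ratio
    if ratio >= fold:
        return "up"
    if ratio <= 1.0 / fold:
        return "down"
    return "unchanged"          # significant but below the fold cutoff
```

+ Using the reciprocal of the fold cutoff for downregulation keeps the rule symmetric on a log scale , which matches the 2 and 0.5 thresholds reported .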
+ Together , these data demonstrate that CgpS is an essential NAP due to its function as a silencer of cryptic phage elements in C. glutamicum . 
+ CgpS homologs are found in actinomycetes and their phages
+ Our data support a function for CgpS as a xenogeneic silencer that binds to AT-rich DNA similar to the Lsr2 of M. tuberculosis as well as the H-NS of E. coli . 
+ This is underlined by the fact that both proteins , Lsr2 and CgpS , are able to complement the phenotype of an hns mutant strain ( ( 18 ) , Supplementary Figure S7 ) . 
+ These findings highlight the conserved mechanism of a highly diverse set of proteins . 
+ In the following , we overexpressed the N-terminal oligomerization domains of CgpS orthologs from Corynebacterium amycolatum DSM 44737 ( CORAM0001 2081 ) and Corynebacterium diphtheriae DSM 44123 ( CDC7B 2240 ) and the Lsr2 from M. tuberculosis H37Rv ( Rv3597c ; Lsr2 ) ( Figure 6A and B ) . 
+ Whereas the production of the oligomerization domain strongly affected cellular growth in all cases ( Figure 6A ) , only the N-terminal domain of the ortholog of C. amycolatum ( DSM 44737 ) led to a significant induction of CGP3 ( Figure 6B ) . 
+ No significant reporter output was observed with production of the truncated orthologs of C. diphtheriae or M. tuberculosis , suggesting a high level of plasticity within this family of xenogeneic silencers ( Figure 6B ) . 
+ Furthermore , we used a bioinformatics approach to obtain a more general overview of the distribution of CgpS orthologous proteins . 
+ For this purpose , a PSI-BLAST ( Position-Specific Iterated BLAST ) search was performed on CgpS and resulted in 5230 hits , of which 1920 protein sequences were unique ( threshold e-value ≤ 0.005 ) . 
+ Of these , 98.3 % were found in the domain of bacteria and 1.7 % in phages , mostly belonging to the Siphoviridae ( Figure 7A , Supplementary Table S6 ) . 
+ Of 302 bacterial genomes containing prophage regions predicted by PhiSpy ( 55 ) , 22 contain cgpS orthologs ( Supplementary Table S6 ) . 
+ The remaining 280 hits were found outside of any predicted prophage region . 
+ Moreover , secondary structure predictions were performed for the 618 unique sequences that could be clearly assigned to bacterial or phage species ; the predicted structures were highly similar . 
+ The structural similarity suggests a common function , although the identity of the amino acid sequences is low ( ∼ 23 % ) ( Figure 7B , C and Supplementary Figure S10 ) . 
+ XS exclusion hypothesis
+ A recent bioinformatics study on the distribution of XS genes revealed that members of the same family can appear within a particular species but that members of different families are never found together ( 56 ) . 
+ To test the proposed exclusion mechanism , we expressed the hns gene from E. coli MG1655 in a C. glutamicum ATCC 13032 strain containing the prophage reporter ( : : Plys-eyfp ) . 
+ As expected , the overexpression of hns caused a severe growth defect , coinciding with a highly increased output of the prophage reporter ( Figure 6C and D ) . 
+ The effect of hns overexpression was comparable to the countersilencing of CgpS activity with the production of a truncated CgpS variant ( Figure 6E ) . 
+ When hns was expressed in a ΔCGP3 background the effect on growth was only moderate ( Supplementary Figure S8 ) . 
+ However , hns expression still negatively affected the growth of the ΔCGP3 mutant strain , which can likely be explained by unspecific binding and interference of H-NS at other genomic regions . 
+ These findings are in agreement with the hypothesis that different XS proteins interfere at AT-rich DNA regions , leading to a disruption of silencing complexes and thereby to an activation of foreign DNA elements . 
+ Nevertheless , in some cases the scenario is clearly more complex , as illustrated by the finding that the expression of cgpS in the E. coli wild-type strain was not able to counteract H-NS expression at the bgl operon ( Supplementary Figure S9 ) . 
+ DISCUSSION
+ CgpS functions as a silencer of cryptic phage elements
+ In this study , we identified the prophage-encoded XS protein CgpS , which plays an essential role as a silencer of cryptic prophages in C. glutamicum . 
+ Genome-wide profiling of CgpS binding sites reveals an association of this protein to AT-rich DNA stretches primarily located within horizontally acquired genomic islands and shows a remarkable accumulation of binding sites within the large and cryptic CGP3 prophage . 
+ Countersilencing of CgpS activity by overproduction of its N-terminal oligomerization domain resulted in a strong increase in CGP3 activity leading to cell death . 
+ Furthermore , several CgpS binding sites were identified outside the CGP3 region , and the essentiality of the cgpS gene was attributed to the presence of the CGP3 prophage . 
+ This is consistent with the finding that the cgpS gene is located on the CGP3 island , suggesting that evolution favored a physical association between this XS and its main target . 
+ Sequence analysis of CgpS revealed a low sequence identity ( 27 % , Supplementary Figure S1 ) with the mycobacterial Lsr2 protein that was described in previous studies as an H-NS-like protein targeting AT-rich sequences in M. tuberculosis ( 18 ) . 
+ Both XS proteins , Lsr2 and CgpS , complemented the bgl-based phenotype ( 57 ) of an Escherichia coli hns strain , supporting the overall analogous functions of these XS proteins ( Supplementary Figure S7 ) ( 18 ) . 
+ Whereas both lsr2 and cgpS are essential for viability in their native hosts , E. coli hns mutant strains are viable although exhibiting severe growth defects ( 58 ) . 
+ Salmonella Typhimurium null mutants of hns are not viable unless mutations in rpoS ( general stress response ) or phoP ( virulence gene regulator ) counteract this deletion ( 12 ) . 
+ Because the presence and diversity of phage elements contribute to major strain-specific differences within a bacterial species , our study illustrates that the essentiality of XS genes is highly dependent on the particular strain background . 
+ The C. glutamicum strain MB001 , cured of all prophage regions as well as the cgpS gene located on prophage CGP3 , displays wild-type-like growth behavior ( 53 ) . 
+ CgpS binds AT-rich xenogeneic DNA regions
+ Secondary structure predictions of CgpS-related proteins evince two α-helices flanking an ` RGI ' motif ( Figures 1C and 7C ) . 
+ This motif resembles the prokaryotic AT-hook motif ` Q/RGR ' found in H-NS and Lsr2 and may also be responsible for the binding of AT-rich DNA as a general rule for XS functioning ( 44,59 ) . 
+ A certain plasticity of the AT-hook motif is supported by experiments with AT-hook muteins of H-NS and Lsr2 , showing that the exchange of a single arginine residue to an alanine reduces DNA binding but does not completely abolish it ( 59 ) . 
+ Moreover , another member of the H-NS family , the Ler protein , has a hydrophobic amino acid ( ` VGR ' motif ) instead of an arginine at this position ( 60 ) . 
+ However , significant differences were observed for the number of target genes affected by the binding of the particular XS proteins . 
+ ChIP-on-Chip analysis revealed a direct influence of S. Typhimurium H-NS on the expression of more than 740 ORFs ( 12,61 ) , and the binding of Lsr2 affected more than 800 regions within the M. tuberculosis genome and > 900 in Mycobacterium smegmatis ( 45 ) . 
+ ChAP-Seq profiling of CgpS binding , however , yielded only 90 potential target regions . 
+ Typical for XS function , an AT-rich DNA motif was derived from the ChAP-Seq results , which clusters at a high density within the CGP3 prophage region ( Supplementary Figure S5 ) . 
+ In general , promoter regions are more often bound by CgpS than genes or intergenic regions ( Supplementary Figure S4 ) , which is not surprising because promoter regions usually possess a higher AT content ( 62,63 ) . 
+ CgpS targets outside the CGP3 region show a similar or lower GC content ( Figure 3C ) but less altered expression levels , and this may suggest the importance of motif density for XS function . 
+ Here , a variation of the AT-hook motif likely represents a mechanism to adjust the binding behavior of the XS protein to meet the needs of a particular host species . 
+ In addition to CGP3 as a main CgpS target , further targets were identified which were also likely acquired by horizontal gene transfer , such as the LCG1 island , the cryptic prophage CGP1 ( 21 ) , R-M systems , transposases and also regulatory proteins such as putative transcriptional regulators ( Cg0725 , Cg1340 , Cg2426 ) , the gluconate-responsive repressor GntR1 ( Cg2783 ) ( 64 ) and an operon encoding the two-component system CgtSR6 ( Cg3060 ) ( Supplementary Table S4 ) . 
+ Several previous studies reported similar target genes or regions for H-NS , Lsr2 and MvaT , demonstrating the convergent evolution of XS in bacterial species ( 12,47,61,65 ) . 
+ Overall , more than 80 % of CgpS-bound regions also exhibited a more than 2-fold altered expression level under countersilencing conditions ( Figure 5D ) , confirming the postulated silencing effect of CgpS . 
+ Several potential targets outside of the CGP3 region , however , showed only a moderate impact on the expression level suggesting a more complex regulatory scheme at the corresponding promoter regions . 
+ Therefore , the role of CgpS for the control of these potential targets , including , e.g. the gntR1 gene or the cgtSR6 operon , remains to be elucidated in further studies . 
+ How to overcome CgpS silencing?
+ Several different mechanisms were described to counteract H-NS-mediated silencing , including structural interference with H-NS-bound nucleoids by transcription factors , temperature or osmolarity effects , and the binding of alternative sigma factors or other NAPs preventing multimerization of the XS protein ( 11,66,67 ) . 
+ To interfere with CgpS XS activity , we produced a truncated part of the native protein covering the N-terminal oligomerization domain of CgpS ( Figure 5 ) . 
+ This overcomes the problem of cgpS being essential in the presence of CGP3 and was inspired by the study of Williamson and Free , who described the antagonistic function of a truncated H-NS variant found in an enteropathogenic E. coli strain ( 54 ) . 
+ As expected , production of the N-terminal CgpS domain resulted in strong activation of CGP3 , leading to cell death . 
+ In recent studies we described the spontaneous prophage induction ( SPI ) of CGP3 , occurring in the absence of an external trigger ( 20,22,23 ) . 
+ Single-cell analysis demonstrated that a considerable fraction of this SPI is preceded by an activation of the SOS response , which is likely the result of spontaneous DNA damage during replication ( 68,69 ) . 
+ However , these studies also highlighted a certain ( > 30 % ) fraction of SOS-independent SPI , suggesting that other factors influence this common phenomenon of bacterial populations ( 5 ) . 
+ The present study shows the sensitive reaction of C. glutamicum cells to the downregulation of CgpS activity ( Video S2 ) . 
+ It is therefore interesting to determine whether cells can adjust the level of XS proteins to manipulate the frequency of SPI according to their particular requirements . 
+ Sequence analysis revealed the presence of CgpS/Lsr2 homologs in phage and prophage genomes displaying a low sequence identity but highly conserved secondary structure prediction ( Figure 7 ) . 
+ This finding is not surprising because bacterial evolution has been shaped by a tight interaction with bacteriophages . 
+ For the integration of viral DNA into the host genome , both the bacterium and phage benefit from tolerance and a smooth integration into the host genetic circuitry . 
+ Because the activation of silent prophages or mobile elements often causes serious detrimental effects to host cells ( 11,70,71 ) , the stringent control of xenogeneic elements is required . 
+ Several examples of XS proteins involved in the control of mobile elements or phages have been described in the recent literature , including H-NS of S. Typhimurium ( 12 ) , Rok from B. subtilis ( 19 ) and MvaT from P. aeruginosa ( 72 ) . 
+ Their corresponding genes , however , are all located on the host chromosome and are characterized as a type of immunity system protecting hosts against foreign DNA ( 11,66 ) . 
+ A PSI-BLAST search of CgpS-related proteins revealed that the majority ( > 98 % of all hits , > 92 % of prophage containing strains ) are found in bacterial genomes ( Supplementary Table S6 ) . 
+ However , several examples located in phages or prophage regions were identified . 
+ The functions of these phage-encoded XS-like proteins remain to be studied , but their presence suggests the following : ( i ) like CgpS , they may be required to secure tolerance of their carrier DNA within the respective host ; ( ii ) they may , however , also function as antagonistic proteins , interfering with the host XS protein similar to the situation described for H-NST ( 54 ) ; or ( iii ) they may interfere with the function of another class of XS proteins . 
+ This hypothesis is based on the exclusion theory suggested by Perez-Rueda and Ibarra , who postulated that XS from different families do not appear in the same bacterial organism ( 56 ) . 
+ Consistent with this bioinformatics study , our data show that the expression of E. coli hns results in strong activation of the cryptic prophage CGP3 and consequently cell death . 
+ The finding that expression of the C. glutamicum cgpS gene in E. coli MG1655 does not counteract H-NS-mediated silencing at the bgl operon shows , however , that the scenario is more complex and strongly depends on the particular strain and its regulatory equipment . 
+ However , our data on prophage activation in C. glutamicum provide evidence for an interference of analogous XS proteins at AT-rich DNA regions . 
+ Here , likely the incompatibility of the oligomerization domains inhibits the formation of XS multimeric structures required for silencing . 
+ Considering the presence of XS encoding genes in phage and prophage genomes , this principle is likely to be harnessed by any phage predator by encoding an interfering XS . 
+ Supplementary Data are available at NAR Online.
+ ACKNOWLEDGEMENT
+ The authors thank Karin Schnetz ( University of Cologne ) for helpful advice and for providing us with the E. coli hns mutant strain . 
+ FUNDING
+ Deutsche Forschungsgemeinschaft priority program SPP1617 [ FR 2759/2-2 and KO 4537/1-2 ] ; Helmholtz Association [ VH-NG-716 ] . 
+ Funding for open access charge : Helmholtz Association [ VH-NG-716 ] . 
+ Conflict of interest statement . 
+ None declared .
\ No newline at end of file
--- a/data/TEXT_FILES/notuseful_txt/27836995.txt 0 → 100644
View file @27818a9
+++ b/data/TEXT_FILES/notuseful_txt/27836995.txt 0 → 100644
View file @27818a9
+ Small RNA interactome of pathogenic E. coli
+ Abstract 
+ RNA sequencing studies have identified hundreds of non-coding RNAs in bacteria , including regulatory small RNA ( sRNA ) . 
+ However , our understanding of sRNA function has lagged behind their identification due to a lack of tools for the high-throughput analysis of RNA -- RNA interactions in bacteria . 
+ Here we demonstrate that in vivo sRNA -- mRNA duplexes can be recovered using UV-crosslinking , ligation and sequencing of hybrids ( CLASH ) . 
+ Many sRNAs recruit the endoribonuclease , RNase E , to facilitate processing of mRNAs . 
+ We were able to recover base-paired sRNA -- mRNA duplexes in association with RNase E , allowing proximity-dependent ligation and sequencing of cognate sRNA -- mRNA pairs as chimeric reads . 
+ We verified that this approach captures bona fide sRNA -- mRNA interactions . 
+ Clustering analyses identified novel sRNA seed regions and sets of potentially co-regulated target mRNAs . 
+ We identified multiple mRNA targets for the pathotype-specific sRNA Esr41 , which was shown to regulate colicin sensitivity and iron transport in E. coli . 
+ Numerous sRNA interactions were also identified with non-coding RNAs , including sRNAs and tRNAs , demonstrating the high complexity of the sRNA interactome . 
+ Shafagh A Waters1, Sean P McAteer2, Grzegorz Kudla3, Ignatius Pang1,4, Nandan P Deshpande1,4, Timothy G Amos1, Kai Wen Leong5, Marc R Wilkins1,4, Richard Strugnell5, David L Gally2, David Tollervey6,** & Jai J Tree1,*
+ Keywords CLIP-Seq ; CRAC ; EHEC ; enterohaemorrhagic E. coli ; non-coding RNA 
+ Subject Categories Methods & Resources ; Microbiology , Virology & Host Pathogen Interaction ; RNA Biology 
+ DOI 10.15252/embj.201694639 | Received 26 April 2016 | Revised 6 October 2016 | Accepted 11 October 2016 | Published online 11 November 2016 
+ Introduction
+ Advances in RNA sequencing technologies and associated applications have driven a revolution in our understanding of the complexity of the transcriptome . 
+ For diverse bacterial species , a single RNA-Seq experiment can reveal hundreds of novel noncoding RNAs . 
+ Bacterial small RNA ( sRNA ) species regulate translation of mRNAs involved in a diverse range of physiological processes including carbon , amino acid and metal ion utilization ( Papenfort & Vogel , 2014 ) , horizontal transfer of DNA ( Papenfort et al , 2015 ) , biofilm formation ( Holmqvist et al , 2010 ) and virulence gene expression ( Chao & Vogel , 2010 ) . 
+ Canonically , sRNAs repress mRNA translation by base pairing that covers the ribosome-binding site and/or directing the transcript for cleavage and degradation . 
+ It is now apparent that there are many variations on this canonical theme including activation of translation ( Soper et al , 2010 ) , repression by cleavage alone ( Pfeiffer et al , 2009 ) , cleavage inhibition ( Papenfort et al , 2013 ) , transcriptional attenuation ( Bossi et al , 2012 ) and sRNA sponging ( Figueroa-Bossi et al , 2009 ; Tree et al , 2014 ; Miyakoshi et al , 2015 ) . 
+ The majority of sRNAs in E. coli require the RNA chaperone Hfq to anneal with target mRNAs ( Gottesman & Storz 2011 ) . 
+ Hfq can present sRNAs for interaction with the pool of mRNA targets , increasing the local concentration of interaction partners and providing a positively charged lateral surface to aid annealing ( Panja et al , 2013 ) . 
+ In principle , targets for sRNA interactions can be predicted using sequence-based analysis ; however , few sequence or structural features are conserved between the many different sRNA targets , making false positives a major problem ( Backofen et al , 2014 ; Künne et al 2014 ) . 
+ To overcome this , target prediction programmes have used the presence of a tract of 6 or more consecutive base pairs ( the seed sequence ) and the predicted accessibility of the seed region ( Peer & Margalit , 2011 ) . 
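+ The seed criterion described above (a tract of 6 or more consecutive base pairs) can be illustrated with a short sketch. This is not the algorithm of any of the cited prediction programmes: the function name `longest_seed`, the optional G-U wobble flag and the example sequences are assumptions made for illustration, and seed accessibility is not modelled.

```python
def longest_seed(srna: str, mrna: str, allow_gu: bool = False) -> int:
    """Length of the longest ungapped, antiparallel base-paired tract
    between an sRNA and an mRNA (both written 5'->3', RNA alphabet)."""
    pairs = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
    if allow_gu:
        # Optionally count G-U wobble pairs as paired positions.
        pairs |= {("G", "U"), ("U", "G")}
    # Reverse the mRNA so that stepping forward in both strings
    # corresponds to an antiparallel duplex.
    target = mrna[::-1]
    best = 0
    prev = [0] * (len(target) + 1)
    for base in srna:
        cur = [0] * (len(target) + 1)
        for j, partner in enumerate(target):
            if (base, partner) in pairs:
                # Extend the paired run ending at the previous positions.
                cur[j + 1] = prev[j] + 1
                best = max(best, cur[j + 1])
        prev = cur
    return best


# Hypothetical sequences: a 6-bp Watson-Crick core flanked by G-U wobbles.
srna_example = "UUAUCACGUU"
mrna_example = "GGCGUGAUGG"
has_seed = longest_seed(srna_example, mrna_example) >= 6
```

+ Requiring `longest_seed(...) >= 6` corresponds to the 6-bp threshold mentioned in the text; any accessibility filter would have to be layered on top with a folding tool.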
+ Phylogenetic conservation of seed sequences also improves the likelihood of identifying functionally significant interactions but is not applicable to transcripts encoded within variable regions of the genome , such as pathogenicity islands . 
+ In consequence , determining the targets for sRNAs and their regulatory function has generally required the investigation of individual RNAs , often by using transcriptomics to indirectly identify mRNAs with altered stability following sRNA expression or 
+ A number of recent studies have implemented in vitro and in vivo techniques to directly identify interactions between non-coding RNAs and their RNA targets . 
+ These have included approaches using individual microRNAs or bacterial sRNAs as baits , with or without chemical modifications to improve capture of interacting RNAs . 
+ High-throughput sequencing allows identification of target RNAs interacting with the bait RNA ( Imig et al , 2015 ) . 
+ This approach unexpectedly identified a spacer region from the tRNA-Leu precursor as a target for RyhB ( Lalaouna et al , 2015 ) . 
+ An approach to experimentally profile transcriptome-wide RNA -- RNA interactions in eukaryotic cells has been described that uses proximity-dependent ligation of duplexed RNAs to capture RNA interactions in vivo and has been termed CLASH ( UV-crosslinking , ligation and sequencing of hybrids ) ( Helwak et al , 2013 ) ( Fig 1A ) . 
+ RNA -- RNA duplexes are UV-cross-linked to a protein `` bait '' allowing selective capture of RNAs and stringent purification of the RNA -- protein complex . 
+ A small fraction of RNAs covalently bound to the protein remain duplexed during purification and these can be ligated into a single contiguous RNA molecule with T4 RNA ligase ( Helwak et al , 2013 ) or by endogenous RNA ligases ( Grosswendt et al , 2014 ) . 
+ An alternative methodology uses a joining linker to ligate the constrained duplex ends of the RNAs ( Sugimoto et al , 2015 ) . 
+ In each case , a proportion of sequencing reads recovered ( typically ~ 1 -- 2 % ) consist of read segments that non-contiguously map to the transcriptome . 
+ These hybrid reads can be identified in silico and indicate sites of intra - or intermolecular RNA -- RNA interactions occurring on the bait protein . 
+ RNase E is an endonuclease that plays key roles in both the catalytic activity and assembly of the RNA degradosome , a complex responsible for the majority of RNA processing and bulk RNA turn-over ( Mackie , 2013 ) . 
+ The C-terminal domain of RNase E interacts with RhlB ( helicase ) , PNPase ( polynucleotide polymerase and 3′ to 5′ exoribonuclease activities ) and PAPI ( poly ( A ) polymerase ) . 
+ Both PAPI and PNPase can add oligonucleotide tails ( oligo ( A ) or A-rich , respectively ) to the 3′ ends of RNAs following RNase E cleavage . 
+ This creates a single-stranded `` landing pad '' that promotes subsequent degradation by 3′-exonucleases ( Khemici & Carpousis , 2004 ) . 
+ In CLASH analyses , the 3′ ends of sequence reads will not generally correspond to in vivo cleavage sites because the RNA fragments are treated with RNase during library preparation . 
+ However , the presence of a non-encoded oligo ( A ) tract at the 3′ end of sequence reads is a clear indication that this represents a site that was cleaved and then oligoadenylated in vivo . 
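+ The idea of a non-encoded oligo(A) tract can be sketched as follows. This is a simplified illustration, not the pipeline used in the study: the function name `untemplated_a_tail`, the exact-match extension from a known mapping start and the DNA-alphabet toy sequences are all assumptions.

```python
def untemplated_a_tail(read: str, genome: str, start: int) -> int:
    """Count trailing A residues in `read` that the genome does not encode.

    `start` is the 0-based genomic position where the read begins.
    The genomic match is extended as far as it goes; whatever remains
    is treated as a tail only if it consists solely of A residues.
    """
    i = 0
    while (i < len(read) and start + i < len(genome)
           and read[i] == genome[start + i]):
        i += 1
    tail = read[i:]
    return len(tail) if tail and set(tail) == {"A"} else 0


# Hypothetical genome fragment and a read ending in four untemplated As.
genome_example = "GGCTACGTCCGGATC"
tail_length = untemplated_a_tail("CTACGTAAAA", genome_example, 2)
```

+ Because genome-encoded A residues are absorbed into the match, the count is conservative: only As that cannot be explained by the reference are reported.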
+ We previously reported that UV-crosslinking and high-throughput sequencing ( CRAC ) can be used to identify the binding sites for the RNA chaperone , Hfq , at base pair resolution in the model prokaryote E. coli and the related human pathogen , enterohaemorrhagic E. coli ( EHEC ) ( Tree et al , 2014 ) . 
+ These studies revealed that for many sRNA -- mRNA interactions , the Hfq binding site is closely associated with the mRNA seed sequence . 
+ Formation of the sRNA -- mRNA duplex at the Hfq binding site is predicted to induce dissociation from the single-stranded RNA binding site on the chaperone , providing directionality to the reaction ( Tree et al , 2014 ) . 
+ The endonuclease activity of RNase E is strongly stimulated by the presence of a free 5′ monophosphate on the substrate and a 5′ triphosphate therefore stabilizes newly synthesized mRNAs ( Mackie , 1998 ) . 
+ Recent work has demonstrated that sRNA -- mRNA duplexes can guide RNase E cleavage of the mRNA by providing a free 5′ monophosphate to stimulate cleavage ( Bandyra et al , 2012 ) . 
+ Together , these results indicated that formation of an sRNA -- mRNA duplex may cause dissociation from Hfq and then direct RNase E cleavage of the mRNA . 
+ To test this model , we have identified targets of sRNA-mediated degradation transcriptome-wide and in vivo by applying CLASH to RNase E. 
+ Results
+ UV-crosslinking identifies in vivo binding sites for RNase E 
+ We reasoned that duplexed sRNA -- mRNA pairs might be transiently associated with RNase E prior to mRNA degradation , allowing tagged RNase E to act as a bait in the capture of in vivo interactions by UV-crosslinking ( CLASH ) ( Fig 1A ) . 
+ To facilitate affinity purification of RNA -- RNase E complexes , the chromosomal copy of RNase E ( rne ) was C-terminally tagged with a tandem affinity His6-TEV cleavage site-FLAG tag ( HTF ) . 
+ RNase E is essential for cell viability and was previously shown to retain function when C-terminally FLAG-tagged at the same site ( Morita et al , 2005 ; Worrall et al , 2008 ) . 
+ The strain expressing only RNase E-HTF was viable and showed normal processing of 9S rRNA precursor into mature 5S rRNA ( Ghora & Apirion , 1978 ) , indicating that the fusion protein is functional ( Fig EV1A ) . 
+ Following UV-crosslinking in actively growing cells , RNA -- RNase E-HTF complexes were affinity-purified under denaturing conditions and crosslinked RNAs were trimmed using mild RNase A/T1 digestion . 
+ T4 RNA ligase was added to join RNase E-associated RNA duplexes into hybrid sequences , and to ligate Illumina sequencing compatible linkers to the ends of RNA fragments . 
+ Silver staining of eluates revealed co-precipitated proteins , with a clearly separated protein at the expected molecular weight of 118 kDa ( Fig EV1B ) . 
+ We confirmed that this band was RNase E using LC-MS/MS . 
+ RNA -- RNase E complexes were transferred to nitrocellulose , excised from the appropriate fragment of the membrane and recovered by protease digestion . 
+ Sequencing libraries were prepared by RT -- PCR . 
+ Duplicate UV-crosslinking experiments showed a strong correlation in the number of reads were also recovered in dataset # 1 . 
+ Sequence reads were mapped to the genome and represent sites of RNase E -- RNA interaction ( read statistics presented in Table EV1 ) . 
+ Read clusters with > 10 reads were identified in 75 % of annotated mRNAs , likely representing the repertoire of mRNAs expressed under our experimental conditions crosslinking of RNA -- RNase E complexes in vivo recovered known RNase E binding sites . 
+ Photocrosslinking experiments have demonstrated that RNase E autoregulates the stability of its own transcript ( rne ) by binding the hairpin structures HP1 -- HP3 within the 5′ UTR ( Diwa et al , 2000 ; Schuck et al , 2009 ) . 
+ We found that RNase E indeed binds to all three HP structures in vivo . 
+ Oligoadenylated reads , which are strongly indicative of endogenous 3′ ends ( Khemici & Carpousis , 2004 ) , peaked at 9 nts relative to the rne start codon , indicating that RNase E cleaves the rne transcript near the ribosomal binding site ( Fig 1B ) . 
+ The small RNA SgrS binds pldB at +935 -- 955 nt and stabilizes the yigL transcript by occluding an RNase E cleavage site at +948 -- 955 nt within the dicistronic pldB-yigL mRNA ( Papenfort et al , 2013 ) . 
+ In agreement with this study , we find that RNase E binds 5′ of this cleavage site and overlaps the SgrS interaction site ( Fig 1C ) . 
+ RNase E cleavage sites were recently mapped transcriptome-wide , identifying sites of 5′ monophosphate-independent ( `` direct entry '' ) RNA cleavage ( Clarke et al , 2014 ) . 
+ We assessed RNase E binding at reported RNase E direct entry sites . 
+ Thirteen sites had > 50 reads within 200 nt of the direct entry cleavage site and ten showed a clear peak in RNase E binding or oligoadenylation at the direct entry site ( Fig EV3 ) . 
+ We conclude that our in vivo RNase E binding sites agree with published interactions . 
+ Relationship between RNase E , Hfq and oligoadenylation sites 
+ We previously reported that non-genomically encoded oligo ( A ) tails of 2 -- 6 nt were present in 5 % of Hfq-bound sequences ( Tree et al , 2014 ) . 
+ This indicates that Hfq binding sites are associated with endogenous 3′ ends that are oligoadenylated by PAPI . 
+ Oligo ( A ) tails were found in 0.7 % of RNase E-bound reads and were predominately ( 76 % ) between 2 and 6 nt in length ( Fig 1D ) . 
+ Hfq interacts with RNase E ( Morita et al , 2005 ; Worrall et al , 2008 ) , and sRNA interactions with an mRNA can facilitate RNase E recruitment and cleavage ( Ikeda et al , 2011 ; Prévost et al , 2011 ; Bandyra et al , 2012 ) . 
+ To gain insights into the arrangement of binding and cleavage sites , we compared the distribution of oligoadenylated sequences and Hfq crosslinking relative to RNase E binding sites . 
+ Maximal Hfq binding was cumulatively found five base pairs 5′ of the RNase E binding maximum ( Fig 1E ) although we note a significant overlap in these binding sites . 
+ In contrast , reads with oligo ( A ) tails , reflecting in vivo cleavage sites , were maximally recovered 13 base pairs 3′ of the peak in RNase E binding ( Fig 1E ) . 
+ These results support a model in which RNase E is frequently recruited to Hfq binding sites with a five base pair 3′-offset leading to RNA cleavage 13 nt downstream of the RNase E binding site and sequencing read . 
+ However , we note that our observations are consistent with in vitro characterization of the MicC -- ompD interaction that directs RNase E cleavage 6 base pairs downstream of the sRNA -- mRNA duplex ( Bandyra et al , 2012 ) . 
+ RNA–RNA interactions are recovered by RNase E-CLASH
+ In CLASH analyses , RNA duplexes that are bound by RNase E can be ligated together and recovered as cDNA sequencing reads that map non-contiguously to distinct sites in the transcriptome . 
+ These were identified and mapped using the Hyb software package ( Travis et al , 2014 ) . 
+ From 21.9 M mapped reads , we recovered 176,874 RNA -- RNA interactions ( 0.8 % , Tables EV1 and EV2 ) including 1,733 sRNA -- mRNA interactions ( Table EV3 ) . 
+ There was substantial overlap between hybrids recovered in the two replicate datasets , and 41 % of interactions identified in replicate # 2 were also recovered in the larger replicate # 1 dataset . 
+ We used the approach of Sharma et al ( 2016 ) to assess the theoretical false discovery rate expected from random ligation of RNAs in solution , and find that 58.8 % of RNA -- RNA interactions have an FDR < 0.05 ( Table EV2 and Appendix Supplementary Methods ) . 
+ To verify that RNase E-CLASH recovered bona fide sRNA -- mRNA interactions , we looked for 125 experimentally verified sRNA -- mRNA pairs within our datasets ( Table EV4 ) . 
+ Small RNA interactions were taken from sRNATarBase 3.0 ( Wang et al , 2015 ) , inspected for concordance with published sites and corrected where necessary ( corrections to sRNATarBase 3.0 are presented in Table EV4 ) . 
+ RNase E-CLASH analysis identified a statistically significant number of known sRNA -- mRNA pairs ( 14/125 , P < 6.6 × 10^-4 ; Table EV5 and Appendix Supplementary Methods ) including the sRNA -- mRNA pair MicA -- ompA ( Fig 2A and B ) ( Rasmussen et al , 2005 ; Udekwu et al , 2005 ) . 
+ We performed RNA-Seq on total RNA from EHEC and found that the recovery of hybrid reads was only weakly correlated with RNA abundance ( Spearman correlation = 0.15 ; Fig EV4A ) , but was moderately correlated with RNase E crosslinking to single RNAs ( Spearman correlation = 0.44 ; Fig EV4B ) . 
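+ For readers unfamiliar with the statistic, the Spearman correlation quoted above is the Pearson correlation computed on ranks. The sketch below is a minimal standard-library version; the function names and the toy vectors are assumptions, and the values are not data from this study.

```python
def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Find the end of the current block of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)


# Hypothetical counts: hybrid reads versus single-RNA crosslinking reads.
hybrid_counts = [1, 2, 3, 4, 5]
crosslink_counts = [5, 6, 7, 8, 7]
rho = spearman(hybrid_counts, crosslink_counts)
```

+ Because only ranks enter the calculation, the statistic is insensitive to the heavy-tailed read-count distributions typical of sequencing data.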
+ Similar results were found for the 125 known sRNA -- mRNA interactions where hybrid recovery correlates more significantly with RNase E crosslinking ( Spearman correlation = 0.15 for mRNA binding ; Fig EV4C -- F ) . 
+ Hybrid recovery is likely a function of both sRNA and mRNA association with RNase E , and we find a general trend towards higher numbers of hybrid reads for known sRNA -- mRNA interactions where both single RNAs were strongly crosslinked to RNase E ( Fig EV4H ) . 
+ These results are consistent with hybrid reads being derived from RNA interactions on RNase E rather than from total cellular RNA . 
+ Small RNAs interact with mRNAs through base pairing , and hybrid reads generated from duplexed RNAs are predicted to have a lower-than-random free energy of interaction ( ΔG ) ( i.e. greater stability ) . 
+ We compared the distribution of free energies for all RNA -- RNA interactions identified ( Fig 2C ) and for sRNA -- mRNA pairs ( Fig 2D ) with randomly paired hybrid read halves . 
+ The distribution of free energies from RNase E-CLASH RNA -- RNA interactions was significantly lower than for random pairs . 
+ These results are consistent with the hybrid sequences being derived from duplexed RNAs associated with RNase E. Interactions between sRNAs and mRNAs that impair 30S ribosome binding and translation are generally positioned within a window extending from 50 nt upstream to 15 nt ( five codons ) downstream of the start codon ( Bouvier et al , 2008 ) . 
+ Binding sites for sRNAs identified by RNase E-CLASH were enriched within this window on mRNAs ( Fig 2E ) , in agreement with 30S occlusion as a plays important roles in the degradation and processing of all RNA classes in E. coli ( Mackie , 2013 ) . 
+ We therefore determined the proportion of unique sRNA interactions that were contributed by each RNA class ( Fig 2F ) . 
+ Messenger RNA coding regions and 5′ UTRs are characterized substrates for sRNA interactions and constituted 43.1 and 2 % of interactions , respectively ( reads that included sequences from both the 5′ UTR and CDS were categorized as CDS ) . 
+ The free sRNA pool can be `` buffered '' by sRNA -- tRNA interactions ( Lalaouna et al , 2015 ) , which represented 3.5 % of interactions in our dataset . 
+ In addition , sRNA interactions were recovered with rRNAs ( 35 % ) and other ncRNAs ( 6S , tmRNA , RnpB RNA , CsrB ; 0.9 % ) . 
+ Hybrids between different sRNA species were recovered , for both sRNAs encoded in the `` core '' genome ( 1.8 % , 87 interactions ) and pathogenicity islands ( 0.6 % , 29 interactions ) , indicating an extensive sRNA -- sRNA interaction network . 
+ These included the previously identified interaction between the bacteriophage-encoded anti-sRNA , AgvB , and the conserved core sRNA GcvB ( 82 unique hybrids ) ( Tree et al , 2014 ) . 
+ Small RNAs can also be generated from the 3′ UTRs of mRNAs ( Guo et al , 2014 ; Miyakoshi et al , 2015 ) . 
+ 0.9 % of tRNA EcOn hybrids with sRNAs mapped within 50 nt downstream of mRNA translation termination sites , potentially reflecting interactions involving 3′ UTRs or 3′ UTR-derived sRNAs . 
+ For all RNA classes presented in Fig 2F , the distribution of free energies of interacting RNAs was significantly lower than randomly paired hybrid halves ( P < 1 × 10^-9 ) . 
+ Our results indicate that sRNA -- mRNA interactions recovered by RNase E-CLASH have significantly lower free energy than randomly paired RNA sequences and are predominately found close to the start codon , consistent with these hybrid sequences originating from in vivo sRNA -- mRNA interactions . 
+ Numerous sRNA interactions were recovered with diverse ncRNA classes , including sRNA , rRNA , tRNA and other ncRNAs , revealing a complex network of sRNA interactions . 
+ Proximity-dependent ligation protocols can potentially yield false-positive data through spurious ligation events , mapping artefacts or errors introduced during reverse transcription and PCR ( Ramani et al , 2015 ) . 
+ Since highly recovered interactions have a higher percentage of true positives ( Ramani et al , 2015 ) , ligation events can be weighted on the number of unique sequencing reads corresponding to individual interactions . 
+ We additionally used known and predicted attributes of sRNA -- mRNA interactions to prioritize interactions for further analysis . 
+ This was based on ( i ) the number of unique sequence reads corresponding to the interaction ; ( ii ) detection of the interaction in replicate datasets ; ( iii ) recovery of the hybrid sequences in both RNA1 -- RNA2 and RNA2 -- RNA1 orientations , indicating ligation at opposite ends of the duplex ; ( iv ) inclusion of a non-genomically encoded oligo ( A ) tail at the 3′ end of the target RNA sequence , which is indicative of sRNA-directed cleavage and subsequent tailing ; and ( v ) overlap of both hybrid regions with Hfq binding sites determined by UV-crosslinking and indicating Hfq dependence ( see Appendix Supplementary Methods ) . 
+ We confirmed that experimentally verified sRNA -- mRNA interactions had a higher distribution of scores compared to total sRNA -- mRNA interactions recovered when applying these criteria ( Fig EV5 and Appendix Supplementary Methods ) . 
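+ Two of the listed attributes, the number of unique reads per interaction and recovery in both orientations, are simple to tabulate. The sketch below shows one way this might be done; it does not reproduce the scoring scheme of the Appendix Supplementary Methods, and the function name, the input tuple layout and the toy reads are assumptions.

```python
from collections import defaultdict


def summarise_hybrids(hybrids):
    """Tabulate unique reads and orientation support per RNA pair.

    `hybrids` is an iterable of (rna_5p, rna_3p, read_sequence) tuples,
    where rna_5p and rna_3p name the RNAs forming the 5' and 3' halves
    of the chimeric read.
    """
    unique_reads = defaultdict(set)
    orientations = defaultdict(set)
    for rna_5p, rna_3p, sequence in hybrids:
        pair = tuple(sorted((rna_5p, rna_3p)))
        # Identical sequences are collapsed as likely PCR duplicates.
        unique_reads[pair].add(sequence)
        orientations[pair].add((rna_5p, rna_3p))
    return {
        pair: {
            "unique_reads": len(seqs),
            "both_orientations": len(orientations[pair]) == 2,
        }
        for pair, seqs in unique_reads.items()
    }


# Hypothetical chimeric reads (sequences shortened for illustration).
example_hybrids = [
    ("RyhB", "hdeA", "AAC"),
    ("RyhB", "hdeA", "AAC"),   # duplicate of the read above
    ("hdeA", "RyhB", "GGT"),   # same pair, opposite orientation
    ("Esr41", "cirA", "TTT"),
]
summary = summarise_hybrids(example_hybrids)
```

+ Replicate support, oligo(A) tailing and Hfq overlap would need additional inputs and are left out of this sketch.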
+ Strikingly , sRNA interactions that satisfied all five criteria , and were represented by multiple unique hybrid reads , were recovered for all RNA classes examined : mRNA , tRNA , rRNA , ncRNA , sRNA ( both core and pathogen specific [ EcOnc ] ) and mRNA antisense transcripts ( Fig 3 ) . 
+ The sRNA interactions with the most hybrid reads representing an interaction were with tRNA species and these interactions were also coincident with Hfq binding sites , indicating that tRNA is a major target for a subset of sRNAs . 
+ Several characterized sRNAs target functionally related sets of mRNAs , allowing coordinated adaptation of the transcriptome in response to specific challenges . 
+ Functionally related clusters of mRNA targets within an sRNA interactome may therefore constitute a further indication of reliability , as well as providing insights into the biological roles of the sRNAs involved . 
+ We therefore clustered functionally related sRNA interactions with a score of ≥ 1.1 using BiNGO ( Maere et al , 2005 ) ( Appendix Supplementary Methods ) . 
+ Consistent with previous reports ( Sharma et al , 2011 ) , targets for the core sRNA GcvB were enriched for mRNAs involved in branched-chain amino acid metabolism . 
+ The targets of seven other sRNAs showed significant enrichment of specific ontology classes ( Table EV6 ) . 
+ In particular , the EHEC-specific sRNA Esr41 ( EcOnc14 in our earlier analysis ) was significantly enriched for targets annotated as `` signal transduction '' . 
+ Esr41 bound three mRNAs with products involved in iron uptake : CirA ( receptor for the iron-binding , catecholate siderophore ) , ChuA ( haem receptor ) and Bfr ( bacterioferritin ) , which were analysed in more detail ( see below ) . 
+ These results indicate that functionally related sRNA targets can be defined using gene ontology and are a further indicator of reliability . 
+ Within characterized sRNAs , a single `` seed sequence '' can initiate binding to multiple , distinct RNA targets . 
+ However , between sRNAs the seeds are heterogeneous in location and sequence , making them difficult to predict using only bioinformatic approaches ( Peer & Margalit , 2011 ; Backofen et al , 2014 ) . 
+ To identify putative , novel sRNA seed regions , we analysed sRNA -- target RNA interactions . 
+ The base-paired nucleotides between each sRNA and target RNA were predicted by folding the hybrid read in silico using the UNAfold suite of tools . 
+ The base-paired nucleotides within the sRNA were plotted for each interaction ( Fig 4 and Appendix Fig S1 ) . 
+ Conserved sites of target base pairing were considered to be a seed region . 
+ Multiple seed regions were apparent in the sRNAs ChiX , RyhB , ArcZ , GadY , MgrR and Spot42 . 
+ The motif discovery tool MEME ( Bailey & Elkan , 1994 ) was then applied to identify conserved sequence motifs within target mRNAs that might be recognized by each sRNA seed . 
+ Highly enriched motifs were identified ( e-value < 10^-4 ) within target RNAs for 12 sRNAs . 
+ GcvB was reported to recognize the consensus motif CACAaCAY in mRNAs through interactions with the GU-rich R1 seed region located at bases 66 -- 89 ( Sharma et al , 2011 ) . 
+ We found that GcvB -- target interactions were positioned within this R1 seed region ( Appendix Fig S1D ) and MEME identified the consensus motif ACAATAWC within GcvB-targeted RNAs that is complementary to bases 69 -- 76 of the GcvB R1 seed region ( Appendix Fig S1D ) . 
+ The consensus motif suggests that base G72 of GcvB frequently participates in G-U wobble interactions . 
+ For the 12 sRNAs with statistically significant target motifs , a complementary sequence was identified within the sRNA and likely represents a seed sequence ( Fig 4A and B and Appendix Fig S1 ) . 
+ The seed sequence of the sRNA -- mRNA pair MicC -- ompD guides RNase E cleavage 6 nt downstream of the duplex ( Bandyra et al , 2012 ) . 
+ To determine whether this is a general phenomenon , we cumulatively analysed RNase E binding , oligoadenylation and Hfq binding relative to statistically significant seed motifs identified in target RNAs ( Fig 4C -- E ) . 
+ Oligo ( A ) tails were found to be maximally recovered 10 nt from the 3′ end of the seed motif ( 8-nt motif length ) consistent with seed-directed RNase E cleavage . 
+ Hfq-bound reads were maximally recovered in the 10 nt 5′ to the seed motif , indicating that Hfq binding sites are often closely associated with the iden ¬ 
+ Our results experimentally define seed motifs for sRNAs with multiple interactions and demonstrate that many sRNAs use more than one site for target RNA interactions . 
+ The newly identified sRNA seed motifs appear to direct RNase E cleavage and oligoadenylation of target RNAs at sites 3′ of the seed interaction . 
+ Functional testing of sRNA–mRNA interactions
+ To assess whether sRNA -- mRNA interactions defined by RNase E-CLASH function in regulating gene expression , we used a two-plasmid system for monitoring translation of superfolder GFP fusions ( Corcoran et al , 2012 ) . 
+ Translational fusions were constructed for sRNA -- mRNAs interactions with high scores , as defined above : hdeA-RyhB ( score = 8.9 ) , zapB-RyhB ( 7.6 ) , rssA-RyeB ( 7.2 ) , frdA-RyhB ( 6.7 ) , hdeA-GadY ( 5.8 ) ( Fig 5A -- E ) , and for interactions with lower scores that were supported by the ontological analysis chuA-Esr41 ( 4.1 ) , cirA-Esr41 ( 3.1 ) and bfr-Esr41 ( 4.2 ) ( Fig 5F -- H ) . 
+ Expression levels were reduced for all 8 of the fusions when co-expressed with the cognate sRNA . 
+ Mutations introduced into the mRNAs and sRNAs de-repressed the frdA-RyhB and hdeA-RyhB interactions , and all three Esr41 interactions . 
+ Point mutations in RyhB similarly relieved repression of zapB ; however , synonymous mutations within the mRNA abolished expression and destabilized the transcript as assessed by qPCR ( data not shown ) . 
+ A rare leucine codon was introduced into zapB by the M1 synonymous mutation , potentially explaining the poor translation of this mRNA . 
+ Introduction of compensatory mutations restored RyhB control of frdA , and Esr41 control of chuA , cirA and bfr verifying direct sRNA -- mRNA interactions for these pairs and confirming that functional sRNA -- mRNA interactions are recovered by 
+ The EHEC-specific sRNA Esr41 controls iron transport and storage
+ Our previous analysis of Hfq binding sites using UV-crosslinking identified numerous novel sRNAs within the pathogenicity islands of enterohaemorrhagic E. coli , referred to as EcOnc RNAs , but their RNA targets remained largely unknown ( Sudo et al , 2014 ; Tree et al , 2014 ) . 
+ The RNase E-CLASH dataset contained 810 unique hybrids with pathogenicity island-encoded EcOnc sRNAs identifying many target transcripts ( Fig 3 and Table EV7 ) . 
+ The EHEC-specific sRNA , Esr41 ( EcOnc14 in our earlier analysis ) , was previously shown to affect the abundance of the fliC transcript and cell motility ( Sudo et al , 2014 ) . 
+ Here we have demonstrated that Esr41 regulates expression of the iron transport and storage proteins CirA , ChuA and Bfr ( Fig 5F -- H ) . 
+ The mRNA interactome of Esr41 is similar to the `` core '' genome-encoded sRNA , RyhB ( Massé et al , 2005 ) . 
+ We therefore additionally analysed translation of the chuA , cirA and bfr fusions in the presence of constitutively expressed RyhB ( Fig 6A ) . 
+ Esr41 and RyhB repressed bfr to comparable levels , but Esr41 had a greater repressive effect on chuA translation , consistent with it base pairing closer to the chuA RBS . 
+ In contrast , Esr41 repressed cirA translation by 7.6-fold , whereas RyhB positively regulated cirA translation . 
+ Esr41 is encoded on the pathogenicity island SpLE1 that also encodes the tellurite , phage and colicin resistance gene cluster ter ( Whelan et al , 1997 ) , and the enterobactin receptor Iha . 
+ Colicin 1A is a pore-forming toxin that uses the siderophore receptor CirA to enter the cell and cause bacterial cell death . 
+ RyhB confers sensitivity to colicin 1A through de-repression of CirA ( Salvail et al , 2013 ) , and we investigated the effect of Esr41 on colicin sensitivity . 
+ Constitutive expression of Esr41 conferred complete resistance to colicin 1A in the sensitive E. coli background , DH5α , but did not affect resistance in the EHEC background that is already colicin resistant ( Fig 6 and data not shown ) . 
+ Deletion of esr41 in EHEC strain ZAP198 conferred a fitness advantage in iron-limited medium ( MEM-HEPES supplemented with 250 nM Fe ( NO3 ) 3 and 0.1 % glucose ) consistent with repression of iron transporters by Esr41 ( Fig 6E ) . 
+ Complementation of the esr41 mutant by chromosomal knock-in of esr41 restored the growth disadvantage to the esr41 mutant . 
+ These results demonstrate that , consistent with mRNA interactions identified by RNase E-CLASH , Esr41 regulates iron uptake and homeostasis in EHEC and can confer resistance to colicin 1A and colicin 1B in a sensitive background . 
+ Discussion
+ We demonstrate that interaction networks for bacterial sRNAs can be determined experimentally by UV-crosslinking sRNA -- target RNA duplexes to RNase E . 
+ Our results revealed sRNA interactions with diverse RNAs including stable RNA species : rRNA and tRNA , other non-coding RNAs , and many different mRNAs . 
+ Here we have focused on the association of RNase E with sRNA -- mRNA duplexes . 
+ The CLASH analyses of RNase E-associated RNA duplexes recovered around 0.8 % hybrids . 
+ This frequency is similar to that seen in previous analyses of human miRNAs associated with Argonaute 1 ( Ago1 ) ( Helwak et al , 2013 ) and double-stranded RNAs bound to Staufen ( Sugimoto et al , 2015 ) . 
+ In contrast , analysis of our previfound that many Hfq binding motifs overlap the mRNA seed sequence , suggesting that for these sRNA -- mRNA interactions , duplex formation would likely dissociate the RNAs from Hfq ( Tree et al , 2014 ) . 
+ We therefore postulated that duplexes formed on Hfq are rapidly transferred to RNase E. 
+ For a subset of sRNAs , we were able to define seed sequences within the sRNA and identify enriched motifs within target RNAs . 
+ Our analyses indicate that sRNAs commonly utilize multiple seed regions for target RNA base pairing . 
+ Target RNA seed sequences were closely associated with Hfq binding sites . 
+ This is consistent paired RNAs and preventing re-binding to Hfq ( Fig 4E ) . 
+ Oligoadenylation peaked 10 nt 3′ of the seed motif , indicating that many seed interactions direct cleavage of the mRNA and terminal nucleotide addition by poly ( A ) polymerase or PNPase ( Fig 4C ) . 
+ This is consistent with in vitro results demonstrating RNase E cleavage of target RNAs is guided to 5 -- 6 nt 3′ of a duplexed 13-mer or sRNA ( Bandyra 
+ The mechanism of sRNA-directed , RNase E cleavage has features in common with miRNA-directed cleavage by human Argonaute 2 ( hAgo2 ) . 
+ RNA targets that are fully complementary to the miRNA displace the PAZ domain of hAgo2 and induce a conformational change that results in cleavage of the miRNA -- target duplex ( Ameres et al , 2007 ; Wang et al , 2009 ) . 
+ Thus , productive base pairing of the miRNA and target is sensed by competition between hAgo2 and the target RNA resulting in dissociation of the miRNA 3′ end . 
+ For the Hfq-RNase E complex , we suggest that sRNA -- mRNA duplex formation at the Hfq binding motif dissociates the sRNA -- mRNA pair from Hfq allowing interaction with RNase E and sRNA-directed cleavage of the target RNA 3′ of the seed motif . 
+ A striking result from our RNase E-CLASH analysis was the range of RNA classes identified in RNA -- RNA hybrids . 
+ The transcriptomes of both E. coli and Salmonella encode small RNAs embedded within mRNAs ( Guo et al , 2014 ; Miyakoshi et al , 2015 ) lending weight to the idea of a genomic palimpsest even in prokaryotes ( Tuck & Tollervey , 2011 ) and potentially obscuring clear annotation of transcript classes . 
+ However , it is notable that all classes of RNA analysed were found in sRNA -- RNA duplexes . 
+ We and others have identified small RNA species that act as sRNA sponges and this appears to be widespread . 
+ We recovered 152 unique sRNA -- sRNA interactions in our CLASH data . 
+ These included our previously characterized interaction between the pathogenicity-associated sRNA AgvB and core sRNA GcvB ( Tree et al , 2014 ) . 
+ These results indicate that an extensive network of sponging interactions occurs between sRNAs . 
+ Recent work demonstrated that sRNA interactions with tRNA spacer regions play important roles in `` buffering '' sRNA interactions to enhance specificity ( Lalaouna et al , 2015 ) . 
+ We identified 320 unique sRNA -- tRNA interactions , including the previously reported RyhB -- tRNA -- Leu interaction ( Lalaouna et al , 2015 ) . 
+ We note that six sRNA -- tRNA interactions contain > 10 nt of pre-tRNA sequence , indicating that minimally , these interactions occur before tRNA 5′ and 3′ maturation . 
+ Hfq has previously been shown to interact with tRNAs ( Zhang et al , 2003 ; Lee & Feig , 2008 ; Tree et al , 2014 ) , suggesting a role in facilitating sRNA -- tRNA interactions . 
+ Extensive interactions of miRNAs with tRNA and rRNA have also been identified ( Helwak et al , 2013 ) and it seems that these stable RNA species may act universally to buffer non-coding RNA interactions . 
+ These may stabilize sRNAs or miRNAs that are temporarily in excess over cognate targets and help prevent their inappropriate 
+ The EHEC-specific sRNA Esr41/EcOnc14 was independently identified by Sudo et al ( 2014 ) and in our previous analysis of Hfq binding sites . 
+ We initially investigated the role of Esr41 in promoting colicin resistance through repression of CirA , and we were able to confirm that Esr41 confers complete colicin 1A and colicin 1B resistance when provided in trans in the colicin-sensitive background , DH5α . 
+ Colicin 1B is used by Salmonella Typhimurium to clear commensal Escherichia coli species ( part of the normal flora ) during gastrointestinal colonization ( Nedialkova et al , 2014 ) . 
+ Our results demonstrate that resistance to colicin 1B can be conferred by expression of a single , pathogen-specific small RNA . 
+ In contrast , the core genome-encoded sRNA RyhB promotes colicin 1A sensitivity through translational activation of CirA ( Salvail et al , 2013 ) . 
+ Esr41 is encoded within a large pathogenicity island ( SpLE1 or O-island 43/48 ) that confers colicin , tellurite and bacteriophage resistance , and also encodes the iron transporter/adhesin Iha . 
+ We were not able to test for decreased colicin 1A sensitivity in an EHEC Δesr41 strain due to the presence of the adjacent colicin resistance ter gene cluster . 
+ However , Esr41 targets identified by CLASH and confirmed by mutations included mRNAs encoding the iron transport and storage proteins ChuA , CirA and Bfr . 
+ A role in iron homeostasis is corroborated by competitive index experiments , demonstrating that deletion of esr41 confers a fitness advantage to EHEC under relatively iron-limited conditions ( 250 nM Fe ) , indicating that Esr41 limits iron transport by repression of select iron receptors . 
+ The Iha gene is located upstream of Esr41 and encodes a receptor for the ferric iron-binding siderophore , enterobactin . 
+ We speculate that Esr41 is co-selected with Iha as Esr41-mediated repression of CirA ( catecholate siderophore receptor ) , ChuA ( haem receptor ) and Bfr ( bacterioferritin ) would redirect iron transport through a pathway involving enterobactin and Iha , favouring maintenance of the O-island . 
+ While this work was in revision , a related technique for sequencing sRNA -- RNA interactions termed RIL-Seq was described ( Melamed et al , 2016 ) . 
+ This is conceptually similar to RNase E-CLASH , except that Hfq is used as a scaffold to capture sRNA -- RNA duplexes and the purification is performed under native conditions , whereas CLASH uses a stringent purification protocol . 
+ Stringency is introduced into RIL-Seq analysis in silico where hybrid reads are filtered for statistical enrichment . 
+ We find that a comparable number of statistically significant sRNA -- mRNA interactions are recovered by both techniques in log phase cells ( 633 using RIL-Seq and 782 using RNase E-CLASH ) and similar sRNA seed regions and motifs are recovered for abundant sRNAs ( e.g. ArcZ , MgrR , GcvB and CyaR ) , suggesting that both techniques capture bona fide sRNA -- RNA interactions . 
+ Notably , the pools of RNA -- RNA interactions recovered in association with Hfq and RNase E are expected to be different . 
+ RNase E processes a broad range of RNA species and is expected to associate with a subset of all sRNA -- mRNA interactions that specifically result in target degradation . 
+ We conclude that CLASH recovers functional RNA -- RNA interactions when applied to RNase E in E. coli , allowing high-throughput identification of functional RNA targets for many sRNA species . 
+ A key advantage of this high-throughput approach is the ability to identify interactions that would not be predicted by extrapolating our current understanding of sRNA biology . 
+ We anticipate that profiling RNA interactions using CLASH will reveal diverse roles for 
+ Materials and Methods
+ Bacterial strains, plasmids and culture conditions
+ For CLASH analysis , Escherichia coli O157 : H7 str. Sakai ( GenBank Acc # NC_002695.1 ) was used to construct a dual-affinity-tagged HTF strain . 
+ Bacterial strains , plasmids and primers are presented in Table EV8 . 
+ Strains were routinely grown on LB agar plates and broth supplemented with antibiotics where appropriate . 
+ For crosslinking and phenotypic experiments , E. coli O157 : H7 was grown under virulence-inducing conditions in MEM-HEPES media ( Sigma M7278 ) supplemented with 250 nM Fe ( NO3 ) 3 and 0.1 % glucose . 
+ Preparation of CLASH sequencing libraries
+ Cells grown to OD 0.8 in MEM-HEPES ( M7278 ) supplemented with 250 nM Fe ( NO3 ) 3 and 0.1 % glucose were crosslinked with 1,800 mJ of UV-C . 
+ Cells were harvested by centrifugation at 4,000 g for 10 min , weighed and resuspended in 50 ml of ice-cold PBS . 
+ The cells were divided into 1 g pellets and snap-frozen in a dry ice / ethanol bath . 
+ One volume ( 1 ml/g ) of lysis buffer [ 50 mM Tris -- HCl ( pH 7.8 ) , 1.5 mM MgCl2 , 150 mM NaCl , 0.1 % Nonidet P-40 , 5 mM β-mercaptoethanol and 1 tablet `` cOmplete '' EDTA-free protease inhibitor ( Roche ) / 50 ml ] and 3 V of 0.1-mm zirconia beads were added to a cell pellet and vortexed 5 × 1 min with 1-min intervals on ice . 
+ Cell lysates were cleared by centrifugation ( 4,000 g for 20 min ) and the supernatant was transferred to 1.5-ml microcentrifuge tubes and cleared at 16,000 g for a further 20 min . 
+ Super ¬ 
+ ( Sigma-Aldrich ) and incubated overnight . 
+ The resin was washed twice with 10 ml of TNM1000 ( 50 mM Tris -- HCl pH 7.8 , 1 M NaCl , 0.1 % NP-40 , 5 mM β-mercaptoethanol ) and twice in 10 ml TMN150 ( 50 mM Tris -- HCl pH 7.8 , 150 mM NaCl , 0.1 % NP-40 , 5 mM β-mercaptoethanol ) , resuspended in 500 µl of TMN150 and incubated with 20 -- 30 U of TEV protease for 2 h at 18 °C . 
+ The slurry was centrifuged through a Bio-Rad Bio-spin column and the eluate collected . 
+ Approximately 500 µl of eluate was incubated with 0.15 U of RNace-IT ( Agilent ) at 20 °C for 7 min . 
+ The digestion was stopped by the addition of 0.4 g of guanidine -- HCl , 300 mM NaCl and 10 mM imidazole ( pH 8.0 ) . 
+ 100 µl of Ni-NTA slurry was pre-washed twice in 750 µl of wash buffer I ( 6 M guanidine -- HCl , 50 mM Tris -- HCl pH 7.8 , 300 mM NaCl , 0.1 % NP-40 and 5 mM β-mercaptoethanol ) . 
+ Eluates were added to the washed resin and incubated overnight at 4 °C . 
+ The resin was washed twice with 750 µl of ice-cold wash buffer I and twice with 750 µl of 1 × PNK buffer ( 50 mM Tris -- HCl pH 7.8 , 10 mM MgCl2 , 0.5 % NP-40 and 5 mM β-mercaptoethanol ) . 
+ The eluates were transferred into a spin column ( Pierce , Thermo Fisher , 69705 ) . 
+ The subsequent reactions were performed in 80 µl reaction volumes on-column . 
+ 3′ ends were dephosphorylated by incubating for 45 min at 20 °C with thermosensitive alkaline phosphatase ( TSAP , Promega ) and RNasin ( Promega ) in PNK reaction buffer ( 50 mM Tris -- HCl pH 7.8 , 10 mM MgCl2 and 10 mM β-mercaptoethanol ) . 
+ The resin was washed once with 400 µl of wash buffer I and three times with 400 µl of 1 × PNK buffer . 
+ The resin was incubated with tobacco acid pyrophosphatase ( Epicentre ) in 1 × TAP buffer ( Epicentre ) at 20 °C for 2 h , washed once with 400 µl of wash buffer I and then three times with 400 µl of 1 × PNK buffer . 
+ The 5′ ends of bound RNAs were radiolabelled by phosphorylation with T4 PNK ( 4 µl , Sigma ) and 32P-γATP ( 4 µl , PerkinElmer BLU502Z ) in PNK reaction buffer for 100 min at 20 °C , after which 100 nM of cold ATP was added and incubated for a further 50 min to complete 5′ end phosphorylation . 
+ The resin was washed once with 400 µl of wash buffer I and three times with 400 µl of 1 × PNK buffer . 
+ To add 3′ linkers , the resin was incubated with 4 µl of T4 RNA ligase I ( NEB ) and 8 µl of miRCat-33 3′ linker ( IDT ) in PNK reaction buffer with 2 µl of RNasin ( Promega ) at 16 °C for 16 h and then washed once with 400 µl of wash buffer I and three times with 1 × PNK buffer . 
+ To add 5′ linkers , the resin was incubated with 4 µl of T4 RNA ligase I ( NEB ) and 1 µl of 100 µM 5′ linker ( IDT ; Table EV8 ) in PNK reaction buffer with 2 µl of RNasin ( Promega ) and 1 mM ATP at 16 °C for 16 h . 
+ The resin was washed three times with wash buffer II ( 50 mM Tris -- HCl pH 7.8 , 50 mM NaCl , 10 mM imidazole , 0.1 % NP-40 , 5 mM b-mercaptoethanol ) . 
+ 200 µl of elution buffer ( wash buffer II supplemented with 150 mM imidazole ) was added to the resin and incubated at RT for 5 min . 
+ RNase E -- RNA complexes were eluted into a clean microcentrifuge tube , and the elution was repeated . 
+ Complexes were precipitated with 100 µl of TCA and 40 µg of glycogen by incubating on ice for 30 -- 60 min and centrifugation at 4 °C for 20 min ( 16,000 g ) . 
+ Supernatants were removed and pellets washed with 800 µl of ice-cold acetone . 
+ Precipitate was centrifuged again at 16,000 g , supernatants were removed , and pellets were air-dried . 
+ The pellet was resuspended in 30 µl of 1 × NuPAGE loading buffer . 
+ The sample was loaded onto a NuPAGE 4 -- 12 % Bis-Tris PAGE gel ( Invitrogen ) and run in MOPS SDS running buffer ( Invitrogen ) . 
+ 32P-labelled RNase E complexes were transferred to a nitrocellulose membrane ( Amersham Hybond ECL ) in transfer buffer ( Invitrogen ) . 
+ Complexes were visualized by autoradiography using Kodak BioMax MS film and developed films realigned to the membrane . 
+ The high molecular weight complex ( > 115 kDa ) was excised from the membrane ( see Fig EV1C ) . 
+ The labelled RNA was recovered by incubating the membrane fragment in 400 µl of wash buffer II supplemented with 1 % SDS , 5 mM EDTA and 100 µg of proteinase K , for 2 h at 55 °C . 
+ The supernatant containing labelled RNA fragments was transferred to a clean microcentrifuge tube . 
+ To precipitate the RNA fragments , 50 µl of 3 M NaOAc pH 5.2 and 500 µl of phenol : chloroform : isoamylalcohol were added , vortexed and centrifuged for 5 min at RT . 
+ The aqueous phase was transferred to a clean microcentrifuge tube and 1 ml of ice-cold EtOH and 20 µg of glycogen added . 
+ The precipitation was incubated at -80 °C for 30 min and centrifuged at 16,000 g for 20 min , followed by a wash with 500 µl of ice-cold 70 % EtOH and air-drying . 
+ The RNA pellet was resuspended in 13 µl of RT buffer I ( miRCat RT oligo and 5 mM dNTPs ) and reverse-transcribed using Superscript III as per the manufacturer 's instructions . 
+ cDNA was amplified using Takara LA Taq , P5 and PE_miRCat PCR primers ( Table EV8 ) , and 2 µl of cDNA . 
+ cDNAs were amplified for 20 -- 24 cycles to minimize bias in amplicons . 
+ 3 -- 10 PCRs were pooled and ethanol-precipitated . 
+ PCR products were separated on a 3 % MetaPhor agarose gel and smeared amplicons above primer dimers indicated in control samples were gel-extracted using a MinElute gel extraction Kit ( Qiagen ) . 
+ Libraries were pooled and submitted for single-end 100-bp HiSeq2500 sequencing at GenePool ( University of Edinburgh ) . 
+ Sequence data has been deposited at NCBI GEO ( series GSE77463 ) . 
+ Analysis of CLASH hybrids
+ Sequencing reads generated by RNase E-CLASH were analysed using the hyb package ( Travis et al , 2014 ) . 
+ Details of the in silico analysis are presented in Appendix Supplementary Methods . 
+ Confirmation of sRNA–mRNA interactions and phenotypic characterization of Esr41
+ We employed the two-plasmid system described by Corcoran et al ( 2012 ) to monitor translation efficiency of mRNA-sfGFP fusions . 
+ Plasmids containing small RNAs were cloned as described in Urban and Vogel ( 2007 ) , except that Esr41 was inserted into pZE12 using inverse PCR . 
+ Briefly , the mutagenic primers Esr41.ZE12.F and ZE12.5P.R were used to amplify a fragment of pZE12::luc that was DpnI-treated , gel-extracted and subsequently recircularized with T4 DNA ligase and transformed into DH5α . 
+ Clones containing an Esr41 insert were confirmed by sequencing . 
+ For mRNA fusions , clones were generated essentially as described in Corcoran et al ( 2012 ) . 
+ Briefly , transcript start sites were identified using RegulonDB and the corresponding site in E. coli O157 : H7 str. Sakai identified using BLAST . 
+ Primers were designed to amplify from the transcription start site to within the CDS encompassing the predicted region of sRNA -- mRNA interaction ( Table EV8 ) . 
+ PCR products were cloned using NsiI and NheI ( Fast digest enzymes , Thermo ) and positive clones confirmed by sequencing . 
+ Point mutations were introduced using mutagenic primers listed in Table EV8 and confirmed by Sanger sequencing . 
+ Detailed methods for FACS and qPCR analysis of superfolder GFP fusions are presented in Appendix Supplementary Methods . 
+ Competitive index experiments
+ Indicated strains were grown overnight in LB at 37 °C and the culture OD600 adjusted to provide equal densities . 
+ Competing strains were inoculated at 1/1,000 into MEM-HEPES supplemented with 250 nM Fe ( NO3 ) 3 and 0.1 % glucose . 
+ At 24-h intervals , the culture was diluted 1/1,000 in fresh media for a total of three subcultures ( 3-days growth ) . 
+ Cultures were diluted and plated onto LB plates to obtain well-separated colonies and 100 colonies were replica plated onto LB agar and LB agar supplemented with nalidixic acid ( 30 µg / ml ) to select for marked strains . 
+ Competition experiments were repeated with nalidixic acid resistance and sensitivity in the opposite strain to account for any fitness cost associated with nalidixic acid resistance . 
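+ As a rough illustration of how the colony counts described above could be turned into a competitive index , the sketch below divides the marked-to-unmarked ratio after competition by the ratio at inoculation . The function names and the example counts are hypothetical , and the exact formula used in the study is not stated in this section . 

```python
import math


def competitive_index(marked_out, total_out, marked_in=1.0, unmarked_in=1.0):
    """Output ratio (marked / unmarked) normalised to the input ratio.

    marked_out: colonies growing on LB + nalidixic acid (marked strain).
    total_out:  colonies replica plated in total (e.g. 100).
    The default input ratio of 1:1 assumes equal starting densities.
    """
    unmarked_out = total_out - marked_out
    if marked_out <= 0 or unmarked_out <= 0:
        raise ValueError("both strains must be recovered to form a ratio")
    return (marked_out / unmarked_out) / (marked_in / unmarked_in)


def generations(passages, dilution_factor):
    """Approximate doublings, assuming each passage regrows to the same density."""
    return passages * math.log2(dilution_factor)


# Hypothetical example: 60 of 100 replica-plated colonies carry the marker.
example_ci = competitive_index(marked_out=60, total_out=100)
example_generations = generations(passages=3, dilution_factor=1000)
```

+ When the resistance marker is swapped into the opposite strain , the index for the strain of interest is the reciprocal of the value returned here . 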
+ Colicin sensitivity testing
+ Colicin 1A and B lysates were prepared from E. coli harbouring p3Z/Col1A and p3Z/ColB as described in Brickman and Armstrong ( 1996 ) . 
+ Colicin 1B was prepared from Salmonella Typhimurium SL1344 by inducing with 1 µg/ml of mitomycin C and filtering the supernatant . 
+ Colicin V was prepared from E. coli strain NCTC50147 ( Public Health England , UK ) as described for colicin 1B . 
+ To test sensitivity to colicins , a top agar lawn of E. coli DH5α was prepared and 5 µl of colicin lysate spotted onto the lawn . 
+ Plates were incubated overnight at 37 °C and scanned . 
+ Expanded View for this article is available online.
+ Acknowledgements
+ We thank Eric Masse for providing constructs for expressing colicins 1A and B . 
+ JJT and SAW were supported by funding from the Australian National Health and Medical Research Council ( APP1067241 ) . 
+ DLG , DT and SPM were supported by Wellcome Trust funding ( WT090231MA ) and research at the Roslin Institute is supported by BBSRC Institute grant funding ( BB/J004227/1 ) . 
+ DT was supported by Wellcome Trust funding ( 077248 ) . 
+ Work in the Wellcome Trust Centre for Cell Biology is supported by Wellcome Trust core funding ( 092076 ) . 
+ GK was supported by Wellcome Trust grant 097383 and by the MRC . 
+ MRW acknowledges funding from the Australian Government NCRIS scheme and the New South Wales State Government RAAP scheme . 
+ Author contributions
+ JJT , DLG and DT designed the experiments . 
+ SAW , JJT , SPM and KWL performed the experimental work . 
+ JJT , GK , NPD , IP and TGA analysed the data . 
+ All authors 
+ The authors declare that they have no conflict of interest.
\ No newline at end of file
--- a/data/TEXT_FILES/notuseful_txt/27872077.txt 0 → 100644
View file @27818a9
+++ b/data/TEXT_FILES/notuseful_txt/27872077.txt 0 → 100644
View file @27818a9
+ Identification of IncA/C Plasmid Replication
+ Steven J. Hancock , a , b Minh-Duy Phan , a , b Kate M. Peters , a , b Brian M. Forde , a , b Teik Min Chong , c Wai-Fong Yin , c Kok-Gan Chan , c David L. Paterson , d Timothy R. Walsh , e Scott A. Beatson , a , b Mark A. Schembri , a , b Australian Infectious Diseases Research Centre , University of Queensland , Brisbane , Australia a ; School of Chemistry and Molecular Biosciences , University of Queensland , Brisbane , Australia b ; Faculty of Science , Division of Genetics and Molecular Biology , Institute of Biological Sciences , University of Malaya , Kuala Lumpur , Malaysia c ; University of Queensland Centre for Clinical Research , Brisbane , Australia d ; Department of Medical Microbiology and Infectious Disease , Cardiff University , Cardiff , United Kingdom e 
+ ABSTRACT Plasmids of incompatibility group A/C ( IncA/C ) are becoming increasingly prevalent within pathogenic Enterobacteriaceae . 
+ They are associated with the dissemination of multiple clinically relevant resistance genes , including blaCMY and blaNDM . 
+ Current typing methods for IncA/C plasmids offer limited resolution . 
+ In this study , we present the complete sequence of a blaNDM-1-positive IncA/C plasmid , pMS6198A , isolated from a multidrug-resistant uropathogenic Escherichia coli strain . 
+ Hypersaturated transposon mutagenesis , coupled with transposon-directed insertion site sequencing ( TraDIS ) , was employed to identify conserved genetic elements required for replication and maintenance of pMS6198A . 
+ Our analysis of TraDIS data identified roles for the replicon , including repA , a toxin-antitoxin system ; two putative partitioning genes , parAB ; and a putative gene , 053 . 
+ Construction of mini-IncA/C plasmids and examination of their stability within E. coli confirmed that the region encompassing 053 contributes to the stable maintenance of IncA/C plasmids . 
+ Subsequently , the four major maintenance genes ( repA , parAB , and 053 ) were used to construct a new plasmid multilocus sequence typing ( PMLST ) scheme for IncA/C plasmids . 
+ Application of this scheme to a database of 82 IncA/C plasmids identified 11 unique sequence types ( STs ) , with two dominant STs . 
+ The majority of blaNDM-positive plasmids examined ( 15/17 ; 88 % ) fall into ST1 , suggesting acquisition and subsequent expansion of this blaNDM-containing plasmid lineage . 
+ The IncA/C PMLST scheme represents a standardized tool to identify , track , and analyze the dissemination of important IncA/C plasmid lineages , particularly in the context of epidemiological studies . 
+ KEYWORDS uropathogenic E. coli, IncA/C plasmid, functional genomics, New Delhi metallo-beta-lactamase, plasmid multilocus sequence typing
+ IncA/C plasmids are large , low-copy-number , broad-host-range plasmids with varying capacities for conjugation ( 1 ) . 
+ These plasmids represent an increasing threat to public health due to their association with the dissemination of the blaCMY cephalosporinase genes ( 2 ) and more recently the blaNDM metallo-beta-lactamase genes ( 3 -- 5 ) . 
+ The first IncA/C plasmids were isolated from aquatic host species , including the fish pathogen Aeromonas salmonicida ( 6 ) , and pandemic strains of Vibrio cholerae ( 7 , 8 ) . 
+ However , recently , there has been a significant increase in the isolation of IncA/C plasmids from Enterobacteriaceae , including Salmonella ( 9 ) , Klebsiella pneumoniae ( 10 ) , and Escherichia coli ( 11 ) . 
+ Plasmids from the IncA/C group were discovered more than 40 years ago and initially assigned to two separate groups , namely , IncA ( RA1 ) ( 6 ) and IncC ( 12 ) . 
+ Subsequent investigations into compatibility , exclusion , and phage sensitivity provided strong evidence for combining these two groups into a single group named IncA/C ( 12 -- 14 ) . 
+ Molecular analysis of the IncA/C replicon has again split this group into two distinct types , A/C1 ( RA1 ) and A/C2 ( 15 ) . 
+ The A/C2 type comprises the vast majority of IncA/C plasmids sequenced to date ( 1 ) . 
+ Plasmid backbone comparisons have informed the subtyping of IncA/C2 into type 1 and type 2 plasmids ( 16 ) . 
+ The two types differ in several ways , including two replacement regions ( R1 and R2 ) that lie within rhs and a large coding sequence ( CDS ) in transfer region 1 , respectively , as well as the presence or absence of two small segments ( i1 and i2 ) ( 16 ) . 
+ Common features are also found in relation to resistance gene content , for example , the vast majority of type 1 plasmids possess the antimicrobial resistance island A ( ARI-A ) located within rhs and an ISEcp1-blaCMY insertion within the large CDS in transfer region 1 ( 16 , 17 ) . 
+ An additional resistance island ( ARI-B ) , located upstream of the par locus , is found in both type 1 and 2 plasmids but is not always present . 
+ The IncA/C replicon was first defined and characterized using the archetype plasmid RA1 . 
+ Thirteen direct repeats ( iterons ) are located downstream of the repA replication gene , similar to IncP plasmids ( 18 ) . 
+ Both repA and the iterons are required for IncA/C replication ( 19 ) . 
+ Similar to other iteron-controlled replicons , there is an imperfect inverted repeat upstream of repA ( 18 ) , suggesting autoregulation ( 20 ) . 
+ IncA/C plasmids also possess a putative toxin-antitoxin ( TA ) system . 
+ The TA genes are strongly transcribed , suggesting the system is functionally active in the postsegregation killing of plasmid-free progeny ( 21 ) . 
+ Moreover , attempts to construct deletion mutants of the antitoxin component proved to be lethal to the cell ( 22 ) . 
+ Partitioning systems are one of the most important factors that contribute to the stable inheritance of large , low-copy-number plasmids ( 23 ) . 
+ Partitioning typically involves three components : a cis-acting DNA binding site ( centromere ; parS ) , a centromere binding protein ( ParB ) , and an NTPase ( ParA ) ( 24 ) . 
+ Together , they facilitate the correct positioning of plasmid molecules during cell division to increase plasmid retention ( 25 ) . 
+ Partitioning systems are classified into different groups based on the characteristics of the NTPase ( 24 ) . 
+ The IncA/C ParA protein contains a Walker-type ATPase , indicating IncA/C plasmids possess a type I partitioning system ( 1 , 25 ) . 
+ This ParA protein has similarity to ParA of IncP plasmids , while IncA/C ParB contains both ParB and KorB domains ( 1 ) . 
+ These genes are transcribed at low levels , similar to repA ( 21 ) . 
+ Another putative partitioning gene , stbA , is found in a separate genetic location . 
+ StbA has similarity to the ParM partitioning protein from the IncFII plasmid NR1 ( 1 ) . 
+ In addition to these elements , IncA/C plasmids carry a number of other genes putatively involved in replication , including kfrA and ter ( 1 ) . 
+ However , the functions of these genes have not been experimentally determined . 
+ Transposon-directed insertion site sequencing ( TraDIS ) , along with other , similar techniques , including Tn-seq ( 26 ) , INseq ( 27 ) , and HITS ( 28 ) , is a high-throughput whole-genome screening method used to perform bacterial functional genomic analyses ( 29 , 30 ) . 
+ A typical TraDIS experiment examines a highly saturated transposon mutant library under a condition of interest , with pre- and postselection libraries subjected to deep sequencing to simultaneously identify all of the transposon insertion sites . 
+ After selection , the lack of insertions within a gene is used to determine the importance of that gene for survival under the condition tested . 
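+ The logic of flagging genes that lack insertions can be sketched as follows . This is a minimal illustration , not the analysis pipeline used in the study : the gene names , coordinates , insertion sites and the threshold are all hypothetical , and real analyses typically also model read depth and library saturation . 

```python
from bisect import bisect_left, bisect_right


def insertion_index(insertion_sites, genes):
    """Unique insertion sites per base pair for each gene.

    insertion_sites: iterable of 1-based insertion coordinates.
    genes: dict mapping gene name -> (start, end), inclusive coordinates.
    """
    sites = sorted(set(insertion_sites))
    index = {}
    for name, (start, end) in genes.items():
        # Count sites falling inside [start, end] by binary search.
        hits = bisect_right(sites, end) - bisect_left(sites, start)
        index[name] = hits / (end - start + 1)
    return index


def candidate_required_genes(index, threshold):
    """Genes whose insertion index falls below an (assumed) cut-off."""
    return sorted(name for name, value in index.items() if value < threshold)


# Hypothetical post-selection data: gene_a tolerates no insertions.
example_genes = {"gene_a": (100, 1199), "gene_b": (2000, 2999)}
example_sites = [2010, 2500, 2500, 2990, 3500]
example_index = insertion_index(example_sites, example_genes)
example_candidates = candidate_required_genes(example_index, threshold=0.001)
```

+ Here gene_a would be reported as a candidate for being required under the tested condition , because no insertion site maps within it after selection . 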
+ The technique has been applied to identify genes that enable the maintenance and transmission of the IncI1 plasmid pESBL ( 31 ) and the essential genes of the IncF plasmid pEC958 ( 32 ) . 
+ Here , we employed TraDIS to identify genetic elements involved in the replication and maintenance of the IncA/C plasmid group . 
+ These experimentally validated elements provided a framework for development of a novel plasmid multilocus sequence typing ( PMLST ) scheme for tracking this important plasmid group . 
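+ In a multilocus typing scheme of this kind , each plasmid is reduced to a profile of allele numbers at the typed loci , and every distinct profile defines a sequence type ( ST ) . The sketch below assumes the four loci named in the abstract ( repA , parA , parB and 053 ) ; the plasmid names and allele numbers are hypothetical , and STs are numbered here in order of first appearance , which is an assumption rather than the numbering of the published scheme . 

```python
LOCI = ("repA", "parA", "parB", "053")


def assign_sequence_types(profiles):
    """Map each plasmid to an ST based on its allele profile.

    profiles: dict mapping plasmid name -> dict of locus -> allele number.
    Identical profiles share an ST; new profiles get the next free number.
    """
    st_by_profile = {}
    assigned = {}
    for plasmid, alleles in profiles.items():
        key = tuple(alleles[locus] for locus in LOCI)
        if key not in st_by_profile:
            st_by_profile[key] = len(st_by_profile) + 1
        assigned[plasmid] = st_by_profile[key]
    return assigned


# Hypothetical allele calls for three plasmids.
example_profiles = {
    "plasmid_1": {"repA": 1, "parA": 1, "parB": 1, "053": 1},
    "plasmid_2": {"repA": 1, "parA": 1, "parB": 1, "053": 1},
    "plasmid_3": {"repA": 1, "parA": 2, "parB": 1, "053": 1},
}
example_sts = assign_sequence_types(example_profiles)
```

+ A single allele difference at any locus , as in plasmid_3 , is enough to place a plasmid in a different ST . 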
+ RESULTS
+ Genomic analysis of the carbapenem-resistant E. coli strain MS6198 . 
+ MS6198 is a carbapenem-resistant uropathogenic Escherichia coli ( UPEC ) strain . 
+ MS6198 is also nonsusceptible to multiple other antibiotics , including beta-lactams , nalidixic acid , ciprofloxacin , gentamicin , kanamycin , sulfamethoxazole , trimethoprim , tetracycline , and tobramycin ( see Data Set S1 in the supplemental material ) . 
+ The complete genome of MS6198 was determined and shown to consist of a circular chromosome comprising 5,176,750 base pairs ( 51.59 % G-C content ) . 
+ In silico typing assigned MS6198 to sequence type 648 ( ST648 ) , which has been associated with a number of disease outbreaks in Asia and Europe ( 33 -- 35 ) . 
+ MS6198 contained a number of UPEC virulence factors , including genes encoding type 1 fimbriae , Ag43 , capsule , and several iron acquisition systems . 
+ A list of known UPEC virulence genes found in MS6198 is shown in Data Set S2 in the supplemental material . 
+ In addition , MS6198 contained 4 circular plasmids , pMS6198A ( IncA/C type ; 137,565 bp ) , pMS6198B ( IncFII type ; 128,428 bp ) , pMS6198C ( untypeable ; 98,242 bp ) , and pMS6198D ( IncI1 type ; 50,899 bp ) . 
+ Methylome analysis of MS6198 identified three distinct DNA recognition motifs , indicating the presence of at least three active adenine methyltransferase enzymes ( see Data Set S3 in the supplemental material ) . 
+ Characterization of an IncA/C multidrug resistance plasmid harboring the blaNDM-1 carbapenemase gene . 
+ We focused our study on characterization of the blaNDM-1-positive IncA/C plasmid pMS6198A . 
+ Plasmid pMS6198A belongs to type 1 of the IncA/C2 group and contains 172 CDSs classified into seven functional groups ( Fig. 1 ) . 
+ The plasmid contains the typical IncA/C replicon and additional putative maintenance genes , including parAB , parM , kfrA , and a putative TA system . 
+ Similarity searches of public sequence databases indicated that pMS6198A contains 19 conjugation genes , including the master regulators acaDC ( 22 ) . 
+ In addition to a typical ISEcp1-blaCMY-6 insertion within transfer region 1 , five resistance genes are located within ARI-A -- aacA4 , rmtC , blaNDM-1 , bleMBL , and sul1 -- along with a truncated qacEΔ1 gene . 
+ Plasmid pMS6198A contains 92 CDSs that have no assigned annotation . 
+ Sequence comparison with other IncA/C plasmids showed that pMS6198A displayed 99 % conservation over its entire sequence with two other blaNDM-1-positive IncA/C plasmids : pKP1-NDM1 ( KF992018 ) from Australia and pNDM10469 ( JN861072 ) from Canada . 
+ The genetic structure of blaNDM-1 in pMS6198A is highly similar to those of eight other IncA/C plasmids ( see Fig. S1 in the supplemental material ) . 
+ A full complement of transfer genes , as defined previously ( 1 ) , was present in pMS6198A . 
+ Accordingly , pMS6198A was transferable by conjugation in static liquid culture at 37 °C to the recipient strain E. coli J53 at a frequency of 10 3 transconjugants per donor . 
+ Transfer of pMS6198A to J53 was conﬁrmed by PCR ampliﬁcation of the IncA/C replicon . 
+ Antibiotic resistance profiling of the transconjugant strain ( MS6614 [ see Data Set S1 in the supplemental material ] ) showed that pMS6198A was capable of conferring resistance to multiple antibiotics , including cefotaxime-clavulanic acid , ceftazidime-clavulanic acid , piperacillin-tazobactam , amoxicillin-clavulanic acid , cefoxitin , cefpodoxime , ceftriaxone , cephalothin , ampicillin , gentamicin , kanamycin , imipenem , meropenem , ertapenem , and tobramycin . 
+ Puriﬁcation of pMS6198A from MS6614 and electrotransformation into E. coli TOP10 were performed , with subsequent analysis of the transformed TOP10 strain by PCR , conjugation , and antibiotic resistance proﬁling indicating the integrity of the plasmid was maintained ( see Data Set S1 in the supplemental material ) . 
+ Thus , all subsequent analysis of pMS6198A was performed using plasmids puriﬁed from MS6614 . 
+ Identiﬁcation of genes required for pMS6198A maintenance and replication . 
+ To identify the genes required for the maintenance and replication of pMS6198A in E. coli , we employed in vitro mutagenesis in combination with TraDIS as shown in Fig. 2 . 
+ First , in vitro mini-Tn5 mutagenesis of pMS6198A was carried out to create a highly saturated mutant plasmid DNA library . 
+ This library was transformed into E. coli TOP10 by electroporation and subsequently grown in the presence of chloramphenicol to select for pMS6198A : : mini-Tn5-containing transformants . 
+ This process was performed in duplicate , resulting in two saturated libraries , each of which contained approximately 10,000 transformants . 
+ Puriﬁed plasmid DNA was extracted from both libraries and analyzed by TraDIS . 
+ Examination by inverse PCR of a subset of individual Tn5 mutant colonies was performed to investigate the randomness of insertion and the number of insertions per plasmid molecule . 
+ The majority of mutants examined ( 17/18 ; 94 % ) contained a single insertion , each at a different location on the plasmid ( see Fig. S2 in the supplemental material ) . 
+ TraDIS identiﬁed a total of 10,178 unique insertion sites in pMS6198A from the two libraries , which was equivalent to an average of one insertion site every 13.52 bp . 
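+ The reported insertion density can be checked with simple arithmetic. The sketch below uses the pMS6198A size ( 137,565 bp ) and the number of unique insertion sites ( 10,178 ) given in the text; the function and variable names are illustrative and not part of the original analysis pipeline.

```python
# Values reported in the text; names are illustrative only.
PLASMID_LENGTH_BP = 137_565      # size of pMS6198A
UNIQUE_INSERTION_SITES = 10_178  # unique mini-Tn5 sites across both libraries


def mean_site_spacing(length_bp: int, n_sites: int) -> float:
    """Average number of base pairs per unique insertion site."""
    if n_sites <= 0:
        raise ValueError("n_sites must be positive")
    return length_bp / n_sites


spacing = mean_site_spacing(PLASMID_LENGTH_BP, UNIQUE_INSERTION_SITES)
print(f"one insertion site every {spacing:.2f} bp")
```

+ This is an average only; it says nothing about how evenly the sites are distributed along the plasmid.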
+ Correlation tests of insertion sites per gene for each library showed high reproducibility for the TraDIS sequencing method ( R2 0.99 ) ( see Fig. S3 in the supplemental material ) . 
+ The relative abundance of reads mapped to mini-Tn5 insertion sites within each gene ( expressed in reads per kilobase per million [ RPKM ] ) was calculated ( Fig. 3A ; see Data Set S4 in the supplemental material ) . 
+ This enabled us to identify genes required for the maintenance and replication of pMS6198A , as plasmids containing mutations in such genes would be lost and thus underrepresented in the libraries , reﬂected as a low RPKM value . 
+ The repA gene is known to be required for IncA/C plasmid replication ( 19 ) , and we observed a relative abundance of mini-Tn5 insertions in repA ( 418 RPKM ) ( Fig. 3Ci ) signiﬁcantly lower than the average insertion abundance for all pMS6198A genes ( 7,222 RPKM ) . 
+ Therefore , we used 418 RPKM as a biological threshold for genes required for replication and maintenance of pMS6198A . 
+ Six additional genes were identiﬁed with insertion abundances lower than this threshold : 022 , parA , parB , 053 , tnpA , and rmtC ( Fig. 3B ) . 
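+ As a rough illustration of the thresholding step, the sketch below computes RPKM with the standard formula ( reads x 10^9 / ( gene length in bp x total mapped reads ) ) and flags genes below the repA value of 418 RPKM. The gene names, read counts, and gene lengths are hypothetical; only the threshold comes from the text, and the authors ' actual pipeline ( 30 , 32 ) may differ in detail.

```python
REPA_THRESHOLD_RPKM = 418.0  # repA insertion abundance reported in the text


def rpkm(reads_in_gene: int, gene_length_bp: int, total_mapped_reads: int) -> float:
    """Reads per kilobase of gene per million mapped reads."""
    if gene_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("gene length and total reads must be positive")
    return reads_in_gene * 1e9 / (gene_length_bp * total_mapped_reads)


def below_threshold(rpkm_by_gene: dict, threshold: float = REPA_THRESHOLD_RPKM) -> list:
    """Names of genes whose insertion abundance is strictly below the threshold."""
    return sorted(g for g, value in rpkm_by_gene.items() if value < threshold)


# Hypothetical input for illustration: gene -> (mapped reads, gene length in bp).
example_counts = {
    "geneA": (50, 1_000),
    "geneB": (9_000, 1_200),
    "geneC": (400, 2_000),
}
total_reads = 1_000_000  # hypothetical library size

example_rpkm = {g: rpkm(r, length, total_reads) for g, (r, length) in example_counts.items()}
candidates = below_threshold(example_rpkm)
print(candidates)
```

+ Genes flagged this way are only candidates : as the text goes on to note for tnpA and rmtC , a low value can also reflect insertion bias rather than a maintenance function .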
+ The putative TA system of IncA/C plasmids belongs to the tad-ata-like family found in a range of different genetic elements , including ICE SXT , Enterobacteria phage N15 , a genomic island of E. coli EDL933 , and the plasmid pAMI2 ( 36 ) . 
+ The 022 gene ( here referred to as ata , for antitoxin of addiction system ) lies immediately adjacent to the 023 gene ( referred to as tad , for toxin of addiction system ) ( Fig. 3Cii ) . 
+ Both tad and ata are 100 % conserved in all fully sequenced IncA/C2 plasmids but absent from IncA/C1 plasmids . 
+ Tad is a member of the Gp49 family proteins ( Pfam identiﬁer PF05973 ) , which is comprised of known toxins . 
+ In the phage N15 , Gp49 is controlled by the adjacently encoded Gp48 protein ( UniProt accession no. O64356 ) , which contains a helix-turn-helix ( HTH ) DNA binding domain ( 36 ) . 
+ Analysis of the Ata amino acid sequence showed it is 24 % identical and 50 % similar to that of Gp48 ( 71 % coverage ) and possesses an HTH domain ( Pfam identiﬁer PF13744 ) , suggestive of a DNA binding function . 
+ Our TraDIS data showed that mutation of ata is strongly underrepresented , while mutation of tad is tolerated . 
+ Taken together , our data suggest that ata and tad encode an active TA system , in which Ata is the antitoxin . 
+ Three adjacent genes , parA ( 051 ) , parB ( 052 ) , and 053 , were also identiﬁed in our TraDIS analysis ( Fig. 3Ciii ) . 
+ The three genes are conserved in all fully sequenced IncA/C plasmids . 
+ ParA belongs to an ATPase family ( Pfam identiﬁer PF13614 ) , while ParB contains both ParB ( Pfam identiﬁer PF02195 ) and KorB ( Pfam identiﬁer PF08535 ) domains . 
+ Based on bioinformatics analysis , parA and parB are homologous to IncP partitioning genes . 
+ The third gene in the locus ( 053 ) encodes a hypothetical protein of 90 amino acids ( aa ) containing a winged HTH domain ( Pfam identiﬁer PF09904 ) , suggestive of a DNA binding function . 
+ Additionally , 053 is present at the same genetic location ( after parB ) in all IncA/C plasmids . 
+ Two other genes , tnpA and rmtC , were identiﬁed in our TraDIS screen ( Fig. 3Civ and v ) . 
+ The tnpA gene encodes a transposase from the insertion sequence ISEcp1 ( 37 ) , while rmtC encodes a 16S ribosomal methyltransferase ( UniProt entry Q33DX5 ) that confers resistance to many aminoglycoside antibiotics . 
+ These two genes occupy regions of pMS6198A with the lowest G-C content ( see Fig. S4 in the supplemental material ) that may be associated with an insertion bias ( 38 ) . 
+ Furthermore , the genes are not conserved among completely sequenced IncA/C plasmids , being present in only 50 % and 11 % , respectively . 
+ Based on these data , they are unlikely to be involved in the maintenance and replication of pMS6198A . 
+ The par locus of IncA/C plasmids , of which 053 is a crucial component , contributes to stability . 
+ TraDIS analysis identiﬁed two putative partitioning genes ( parA and parB ) and a region containing an open reading frame ( ORF ) immediately downstream of parB ( 053 ) as required for plasmid maintenance . 
+ Bioinformatics analysis revealed that the structural organization of parA-parB-053 is completely conserved in all sequenced IncA/C plasmids examined in this study . 
+ This led us to hypothesize that the partitioning system in IncA/C is comprised of three components that are all required for plasmid stability in E. coli . 
+ To provide evidence to support our hypothesis , we constructed pMS6198A-derived mini-A/C plasmids containing different versions of the partitioning locus and examined their stability in E. coli strain MG1655 . 
+ The mini-A/C plasmid contained the IncA/C replicon , as previously described ( 1 , 18 , 19 ) . 
+ A selectable marker ( cat cassette , conferring chloramphenicol resistance ) was included , along with two transcriptional terminators to prevent transcriptional read-through from the cassette . 
+ Two variations of the partitioning locus were incorporated into this mini-A/C plasmid , generating pMAC2 ( parAB ) and pMAC3 ( parAB plus 053 ) ( Fig. 4A ) . 
+ The stability of pMS6198A was initially examined by growth in the absence of selection for three serial passages , each incorporating 10 generations . 
+ After 30 generations , no plasmid loss was observable ( see Fig. S5 in the supplemental material ) , highlighting the high stability of the parent plasmid . 
+ The same experiment was used to assess the stability of pMAC2 and pMAC3 . 
+ This showed that pMAC2 was less stable than pMAC3 : even at time zero , only 50 % of cells retained pMAC2 , and by 30 generations , this had dropped to 2 % ( see Fig. S5 in the supplemental material ) . 
+ In contrast , pMAC3 started at 100 % and was reduced to 64 % after 30 generations ( see Fig. S5 in the supplemental material ) . 
+ The difference in plasmid stabilities observed between the starting populations of pMAC2 and pMAC3 ( time zero ) was addressed by mixing cells harboring pMAC3 in a 1:1 ratio with plasmid-free cells , thus mimicking the starting population of cells harboring pMAC2 . 
+ Analyses using this equivalent starting population resulted in plasmid stability proﬁles very similar to those observed in the single-strain experiment , with pMAC2-harboring cells reduced to 5 % after 30 generations compared to signiﬁcantly higher ( 57 % ) retention of pMAC3 ( Fig. 4B ) . 
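+ One way to compare the two constructs on a common scale is to convert the start and end percentages into a per-generation retention factor. The sketch below assumes a constant loss rate per generation ( a simple geometric model that the source does not itself fit ) and ignores any growth-rate difference between plasmid-bearing and plasmid-free cells. The input fractions are those reported for the single-strain experiment.

```python
def per_generation_retention(start_fraction: float, end_fraction: float, generations: int) -> float:
    """Constant factor r such that end_fraction == start_fraction * r ** generations."""
    if not (0 < start_fraction <= 1 and 0 < end_fraction <= 1):
        raise ValueError("fractions must be in (0, 1]")
    if generations <= 0:
        raise ValueError("generations must be positive")
    return (end_fraction / start_fraction) ** (1.0 / generations)


# Fractions of plasmid-bearing cells reported in the text (time zero -> 30 generations).
pmac2 = per_generation_retention(0.50, 0.02, 30)
pmac3 = per_generation_retention(1.00, 0.64, 30)

print(f"pMAC2 retention per generation: {pmac2:.3f}")
print(f"pMAC3 retention per generation: {pmac3:.3f}")
```

+ Under this assumed model , pMAC3 retains a larger share of plasmid-bearing cells each generation than pMAC2 , in line with the qualitative conclusion in the text .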
+ The TraDIS-identiﬁed maintenance genes are conserved in all IncA/C plasmids . 
+ Plasmid multilocus sequence typing was originally developed for the typing of large collections of plasmids using a set of conserved genes ( 39 ) . 
+ In the context of IncA/C plasmid typing , our TraDIS data have provided a deﬁned subset of genes with essential plasmid maintenance functions that could be used in the development of a PMLST scheme . 
+ To provide a framework for this analysis , a collection of 82 complete IncA/C plasmid sequences available in the GenBank database were examined for overall sequence conservation ( see Data Set S5 in the supplemental material ) . 
+ A total of 28 genes were completely conserved within this collection , which included the four genes identiﬁed by TraDIS ( repA , parA , parB , and 053 ) ( Table 1 ) . 
+ The concatenated sequences of all 28 fully conserved genes from each plasmid were used to build a maximum-likelihood tree ( Fig. 5 , left ) . 
+ Interestingly , this analysis identiﬁed multiple previously deﬁned hybrid plasmid groups , including pYR1 and p39R861-4 , and also differentiated between type 1 and 2 A/C2 plasmids ( 1 , 16 , 40 ) , with type 1 plasmids also separated into three distinct groups . 
+ Using this as a baseline , the discriminatory power of repA , parA , parB , and 053 was examined by using the concatenated sequences of the four genes from each plasmid to build a maximum-likelihood tree ( Fig. 5 , right ) . 
+ The overall topology of the tree is similar to our analysis of IncA/C plasmids using all 28 conserved genes , including a split between IncA/C1 ( group 5 ) and IncA/C2 ( groups 1 to 4 ) . 
+ The IncA/C2 lineage was further split into four distinct groups ; groups 1 and 2 comprised type 1 plasmids , and group 3 included both type 1 and type 2 plasmids , while group 4 represented the hybrid plasmid pYR1 . 
+ Although the resolution of the 28 conserved genes separated hybrid plasmid p39R861-4 and both type 1 and type 2 plasmids , their similarity in the sequences of repA-parA-parB-053 clustered the plasmids together in group 3 . 
+ Furthermore , the preservation of groups 1 , 2 , 4 , and 5 between the two analyses highlights them as distinct plasmid lineages . 
+ Taken together , the analysis demonstrates that the sequences of the four essential genes identiﬁed by TraDIS are sufﬁcient to capture the phylogenetic relatedness of IncA/C plasmids and could be used for molecular typing . 
+ Development of an IncA/C PMLST scheme . 
+ As the repA , parA , parB , and 053 genes could distinguish different groups of IncA/C plasmids , we used these biologically validated genes to develop a PMLST scheme for IncA/C plasmids . 
+ Primer | Sequence ( 5′-3′ ) | Primer information | Size ( bp ) a
+ repA-F | AAGAGAACCAAAGACAAAGAC | Amplify repA | 982
+ repA-R | GCTGCTTACGCTTGTTGGA | |
+ parA-F | AAAAGTAATCAGCTTCGCCA | Amplify parA | 780
+ parA-R | TAGCCCACCTTCTCTAATAG | |
+ parB-F | TGTCCGAACTTGCTAAAGC | Amplify parB | 1,128
+ parB-R | CTGACACAGGCACATGAA | |
+ 053-F | AGATCTCACAGGACATGAA | Amplify 053 | 250
+ 053-R | TTCAAGAACGAAGACCTGT | |
+ repA-Seq1 | TGGAGTTCGTACAGAGTGA | Sequence 5′ region of repA fragment | NA
+ repA-Seq2 | GCTCCAGCTTCTTCCCGAT | Sequence 3′ region of repA fragment | NA
+ parB-Seq1 | CACACAGTCAGGTAGCTT | Sequence 5′ region of parB fragment | NA
+ parB-Seq2 | AAGCTACCTGACTGTGTG | Sequence central region of parB fragment | NA
+ parB-Seq3 | GATGCTCTTCCTCCTCTG | Sequence 3′ region of parB fragment | NA
+ Ampliﬁcation and sequencing primers were designed to target conserved regions within each essential gene ( Table 2 ) . 
+ PCR and Sanger sequencing using these primers were performed on pMS6198A to validate the methodology . 
+ Using the sequences obtained in silico from 82 IncA/C plasmids , different alleles for repA ( n = 5 ) , parA ( n = 6 ) , parB ( n = 7 ) , and 053 ( n = 3 ) were identified , which together form 11 STs , as shown in Fig. 6 ( see Data Set S6 in the supplemental material ) . 
+ The minimum spanning tree comprises two singletons , ST10 ( pYR1 ; IncA/C2 ) and ST11 ( RA1 ; IncA/C1 ) , with the remaining nine STs linked together as single-locus variants ( SLVs ) or double-locus variants ( DLVs ) . 
+ ST3 is the largest group and includes 53 plasmids . 
+ ST3 connects to ST4 , ST5 , ST6 , ST7 , ST8 , and ST9 as SLVs , forming a clonal complex with ST3 as the founder ( ST3 clonal complex [ ST3CC ] ) . 
+ ST3 links with ST2 as a DLV , and ST2 connects to ST1 as an SLV . 
+ ST1 is the second largest group and comprises 20 plasmids . 
+ ST1 and ST3 together account for 89 % of the total IncA/C plasmids investigated . 
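+ The single-locus and double-locus variant relationships described above follow from counting how many of the four loci differ between two allelic profiles. The sketch below shows that comparison; the allele numbers and the labels ST_a , ST_b , and ST_c are hypothetical placeholders and do not correspond to the real profiles in Data Set S6.

```python
LOCI = ("repA", "parA", "parB", "053")  # the four PMLST loci named in the text


def locus_differences(profile_a: dict, profile_b: dict) -> int:
    """Number of loci at which two allelic profiles carry different alleles."""
    return sum(1 for locus in LOCI if profile_a[locus] != profile_b[locus])


def relationship(profile_a: dict, profile_b: dict) -> str:
    """Classify a pair of profiles as identical, SLV, DLV, or more distant."""
    labels = {0: "identical", 1: "SLV", 2: "DLV"}
    return labels.get(locus_differences(profile_a, profile_b), "more than two loci differ")


# Hypothetical allelic profiles for illustration only.
st_a = {"repA": 1, "parA": 1, "parB": 1, "053": 1}
st_b = {"repA": 1, "parA": 1, "parB": 2, "053": 1}  # differs from st_a at parB
st_c = {"repA": 1, "parA": 3, "parB": 2, "053": 1}  # differs from st_a at parA and parB

print(relationship(st_a, st_b))
print(relationship(st_a, st_c))
```

+ A minimum spanning tree such as the one in Fig. 6 is then built over these pairwise locus differences .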
+ To provide higher-resolution typing applicable to next-generation sequencing data , a core gene PMLST ( cgPMLST ) scheme was also constructed by extending the 4-gene PMLST to 28 conserved genes ( see Data Set S7 in the supplemental material ) . 
+ The four loci shared between the two schemes allow backward compatibility from cgPMLST to PMLST . 
+ Thus , cgPMLST is capable of subtyping 11 STs into 35 subgroups , including 4 subgroups for ST1 ( ST1.1 to ST1.4 ) , 22 subgroups for ST3 ( ST3.1 to ST3.22 ) , and 1 subgroup for each of the remaining STs ( see Data Set S7 and Fig. S6 in the supplemental material ) . 
+ IncA/C PMLST highlights a lineage of blaNDM-harboring plasmids . 
+ To determine if the distribution of antibiotic resistance genes among IncA/C plasmids is associated with STs , the resistance gene content of each plasmid was overlaid with the PMLST phylogenetic scheme ( Fig. 7 ) . 
+ Examination of the resistance gene proﬁles highlighted distinct patterns between ST1 and ST3CC . 
+ ST1 predominantly harbors blaCMY-6 , while ST3CC plasmids mainly possess the blaCMY-2 variant ( Fig. 7 ) . 
+ ST3CC also exhibits a higher prevalence of tetracycline and streptomycin resistance genes . 
+ ST1 is strongly associated with the carriage of blaNDM ( 15/20 plasmids ; 75 % ) , while only two plasmids outside ST1 carry blaNDM ( one each from ST3 and ST6 ) . 
+ Comparison of the blaNDM genetic location has shown multiple genetic organizations ( 41 -- 46 ) . 
+ Within our database , we observed a total of seven distinct blaNDM structures ( see Fig. S1 in the supplemental material ) . 
+ Six of them are found in individual plasmids ( pNDM-SAL , ST1 ; pNDM-1_Dok01 , ST1 ; pEC2-NDM-3 , ST1 ; pNDM15-1078 , ST1 ; pRH-1238 , ST3 ; and pMR0211 , ST6 ) , and one has been observed within ARI-A of nine plasmids ( 45 , 47 ) . 
+ All nine plasmids belong to ST1 , subgroups ST1.2 and ST1.4 ( see Fig. S6 in the supplemental material ) . 
+ Overall , non-ST1 blaNDM structures are distinct from all others , while the presence of similar structures in 9/15 ST1 plasmids suggests the successful dissemination and diversiﬁcation of the plasmid lineage . 
+ DISCUSSION
+ Carbapenem-resistant Enterobacteriaceae have been recognized as an urgent threat to human health ( 48 ) . 
+ Infections caused by these bacteria are often resistant to almost all clinically available antibiotics and are frequently associated with poor health outcomes . 
+ New Delhi metallo-β-lactamase is a recently emerged carbapenemase first described in 2009 ( 49 ) . 
+ Since then , several IncA/C plasmids carrying the blaNDM-1 gene ( 4 , 41 , 47 , 50 -- 52 ) , and recently blaNDM-3 ( 45 ) , have been reported . 
+ Plasmids of incompatibility group A/C have been known for more than 40 years , but they have only recently gained increased interest due in part to their emergence as the major plasmid type carrying the cephalosporinase gene blaCMY ( 2 ) . 
+ However , the broad-host range characteristic of IncA/C plasmids and their roles in the dissemination of multiple antibiotic resistance genes , including blaCMY and blaNDM , are underappreciated . 
+ Here , we have used a validated set of essential genes to develop a high-resolution typing scheme to monitor the spread and transmission of IncA/C plasmids . 
+ Previous studies on the replicon of RA1 , the prototypical IncA/C plasmid , showed that IncA/C plasmid replication is mediated by the autoregulated repA gene and an iteron upstream of a DnaA box ( 19 ) . 
+ Indeed , our mutagenesis analysis conﬁrmed a requirement for repA and its adjacent iteron region in pMS6198A ( Fig. 3Ci ) . 
+ This validated our method to identify required genetic components and also served as a reference point for identiﬁcation of six other genes , namely , ata , parA , parB , 053 , tnpA , and rmtC . 
+ The addiction system of IncA/C plasmids has been the subject of various studies , but its contribution to IncA/C plasmid stability has not been fully established . 
+ Recent transcriptome analysis of pAR060302 showed that the system is strongly transcribed ( 21 ) . 
+ Furthermore , multiple attempts to mutate ata were ultimately unsuccessful ( 22 ) . 
+ Our TraDIS data conﬁrmed that ata mutants were highly underrepresented . 
+ This is likely a result of uncontrolled toxin expression leading to cell death and demonstrates that the system is active in pMS6198A . 
+ The activity of the system may contribute to the variation in stability between the parent plasmid , pMS6198A , and pMAC3 ( see Fig. S5 in the supplemental material ) . 
+ Moreover , it is possible that additional stability elements may be present on pMS6198A that exert subtle effects undetected by our stringent TraDIS threshold . 
+ A putative partitioning locus is present in all IncA/C plasmids . 
+ It has been noted that the parA and parB genes share similarity to IncP plasmid partitioning and regulatory elements ( 1 ) . 
+ Our TraDIS data strongly support a role for these genes in the maintenance and stability of IncA/C plasmids . 
+ Additionally , we identiﬁed the adjacent ORF , 053 , as a novel element of the locus that has no IncP homolog . 
+ We showed that parAB alone were not sufﬁcient for maintenance ( Fig. 4 , pMAC2 ) and that plasmid stability improved markedly with the addition of a 420-bp fragment containing ORF 053 ( Fig. 4 , pMAC3 ) . 
+ Our results invoke at least two hypotheses : that a parS cis-acting centromere-like site of the ParAB partitioning system is present within this region or that 053 encodes a novel partitioning protein . 
+ Further work is required to elucidate the mechanisms by which this 420-bp region contributes to plasmid stability . 
+ Plasmid pMS6198A carries a number of other genes with putative replication and maintenance functions , including kfrA ( 161 ) and stbA ( 174 ) . 
+ However , they were not identiﬁed by TraDIS analysis ( see Data Set S4 in the supplemental material ) , suggesting they are not critical to pMS6198A maintenance in E. coli . 
+ Conversely , tnpA and rmtC have functions unrelated to plasmid stability yet were identiﬁed . 
+ Both genes have low GC contents ( 34 % and 41 % , respectively , compared to an average of 52 % across the entire pMS6198A sequence ) . 
+ This is consistent with an overall increased mini-Tn5 insertion frequency in high- versus low-GC regions of pMS6198A ( see Fig. S4 in the supplemental material ) . 
+ Thus , the identiﬁcation of these genes may be the result of insertion bias of the Tn5 transposon in the in vitro mutagenesis reaction ( 38 ) . 
+ Deciphering phylogenetic relationships among plasmids can be challenging due to their mosaic nature . 
+ However , within one plasmid incompatibility group , it is expected that replication and maintenance machineries required for its biology should be conserved . 
+ This notion has been successfully applied to select genes for multilocus sequence typing schemes of plasmids from many Inc groups , including IncI1 ( 39 ) , IncHI1 ( 53 ) , IncHI2 ( 54 ) , and IncN ( 55 ) . 
+ Here , we propose a new PMLST scheme for IncA/C plasmids based on an experimentally validated set of essential genes . 
+ The loci selected for our PMLST scheme were based on the required genes identiﬁed by TraDIS analysis and supported by their presence in all IncA/C plasmids investigated . 
+ The scheme identiﬁed 11 sequence types , demonstrating higher resolution than current typing methods . 
+ We suggest that our PMLST scheme could be used to monitor dissemination and diversiﬁcation patterns of IncA/C plasmids . 
+ Of the 11 IncA/C sequence types , the majority of plasmids belong to two main types : ST1 and ST3 . 
+ ST3 is the largest group , comprising 53 plasmids that form a clonal complex ( ST3CC ) with plasmids from ST4 , ST5 , ST6 , ST7 , ST8 , and ST9 ( Fig. 6 ) . 
+ The ST3CC plasmids were isolated from 1969 to 2015 in 19 countries and 16 species ( Fig. 6 ; see Data Set S5 and Fig. S7 in the supplemental material ) , highlighting their wide geographical distribution and broad host range characteristics . 
+ Interestingly , the IncA/C2 type 1 plasmids within ST3CC showed strong association with blaCMY ( 23/31 ; 74 % ) . 
+ As highlighted previously , the high frequency of blaCMY-positive IncA/C plasmids isolated from Salmonella enterica in the United States is an indication of sampling bias within the data set ( 1 ) . 
+ ST1 is the second-largest group in our data set , with 20 plasmids . 
+ The most striking feature of ST1 is the strong association with the carriage of blaNDM , with the majority of ST1 plasmids isolated from E. coli and K. pneumoniae . 
+ The clinical relevance of these species and carbapenem resistance has likely contributed to a sampling bias of ST1 plasmids . 
+ Nevertheless , it is tempting to hypothesize that ST1 is a newly emerged lineage of IncA/C plasmids , with carbapenem resistance enhancing its selection and dissemination . 
+ Further work is needed to test this hypothesis ; however , the presence of a common blaNDM structure in speciﬁc ST1 plasmid subgroups is supportive of the tenet . 
+ The observation of most concern here is the identification of ST1 in nine countries spanning four continents ( see Data Set S5 and Fig. S7B in the supplemental material ) , highlighting the urgent need for surveillance and control of this extensively drug-resistant plasmid lineage . 
+ More in-depth investigations within each ST may provide a better framework for analyzing the evolution of IncA/C plasmids . 
+ Inspired by the development of core genome MLST schemes ( 56 ) , we have also constructed a cgPMLST scheme for IncA/C plasmids using the 28 conserved genes identified in our database ( see Fig. S6 and Data Set S7 in the supplemental material ) . 
+ IncA/C cgPMLST is intended as a subtyping scheme complementary to PMLST to increase the discriminatory power suitable for plasmid epidemiology studies . 
+ The discriminatory power , measured by Hunter 's index ( 57 ) , of our IncA/C PMLST is 0.53 and is increased to 0.90 for cgPMLST . 
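+ The PMLST value can be reproduced from the group sizes given earlier , assuming that " Hunter 's index " refers to the Hunter-Gaston discriminatory index , D = 1 - sum( n_j ( n_j - 1 ) ) / ( N ( N - 1 ) ) . With 82 plasmids in 11 STs , 53 in ST3 and 20 in ST1 , the remaining nine STs must each contain one plasmid. The sketch below is illustrative ; the cgPMLST value is not recomputed because the sizes of the 35 subgroups are not listed in the text.

```python
def hunter_gaston_index(group_sizes: list) -> float:
    """Hunter-Gaston discriminatory index for a typing scheme.

    group_sizes holds the number of isolates assigned to each type.
    """
    n_total = sum(group_sizes)
    if n_total < 2:
        raise ValueError("at least two isolates are required")
    same_type_pairs = sum(k * (k - 1) for k in group_sizes)
    return 1.0 - same_type_pairs / (n_total * (n_total - 1))


# ST3 = 53 and ST1 = 20 (from the text); the other nine STs hold one plasmid each.
pmlst_group_sizes = [53, 20] + [1] * 9

d_pmlst = hunter_gaston_index(pmlst_group_sizes)
print(f"PMLST discriminatory index: {d_pmlst:.2f}")
```

+ The index is the probability that two plasmids drawn at random from the collection fall into different types , which is why splitting the two large STs into subgroups raises it so sharply .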
+ Other methods , such as that proposed by Harmer and Hall ( 58 ) , could also be used in conjunction with PMLST to subtype IncA/C plasmids . 
+ While sequence data from next-generation sequencing platforms , especially those from Paciﬁc Biosciences ( PacBio ) , would provide full plasmid backbone sequences for in-depth epidemiology and high-resolution phylogeny analysis , such technologies remain out of reach for many laboratories in developing countries , where surveillance and control measures are most needed . 
+ Our PMLST scheme provides a robust method based on PCR and Sanger sequencing for the identiﬁcation of major lineages of IncA/C plasmids . 
+ These major lineages can then be subtyped using cgPMLST when whole-plasmid sequence data are available . 
+ Like other MLST-based schemes , both of our schemes are compatible with freely available tools , such as SRST2 ( 59 ) and pMLST ( 60 ) , for in silico determination of plasmid STs , enabling the use of typing data across different settings . 
+ Our PMLST and cgPMLST schemes are also available on the public databases for molecular typing and microbial genome diversity ( PubMLST ) ( 79 ) . 
+ MATERIALS AND METHODS
+ Bacterial strains and growth conditions . 
+ E. coli MS6198 was isolated from the urine of a patient with a urinary tract infection in Haryana , India , in 2010 ( 61 ) . 
+ The E. coli strain J53 was provided by G. Jacoby ( 62 ) . 
+ The strains were routinely cultured at 37 °C under orbital shaking ( 250 rpm ) , in liquid or on solid lysogeny broth ( LB ) medium , supplemented with appropriate antibiotics . 
+ The following concentrations were typically used : ampicillin , 100 µg/ml ; sodium azide , 100 µg/ml ; meropenem , 1 µg/ml ; and chloramphenicol , 30 µg/ml . 
+ Electrocompetent cells were prepared , and transformations were performed as described previously ( 29 ) . 
+ All the strains were stored in 15 % glycerol at -80 °C . 
+ DNA puriﬁcation and analysis . 
+ The PureLink HiPure Midiprep plasmid DNA puriﬁcation kit ( Invitrogen ) was used to purify pMS6198A , while vector plasmids ( 10 kb ) were puriﬁed with the QIAprep Spin Miniprep kit ( Qiagen ) . 
+ Genomic DNA was obtained with MoBio 's Ultraclean microbial DNA isolation kit . 
+ DNA concentrations were quantiﬁed using a NanoDrop 2000 ( Thermo Scientiﬁc ) and/or Qubit 2.0 ( Life Technologies ) ﬂuorometer . 
+ PCR and sequencing . 
+ The presence of plasmids was determined by PCR-based replicon typing ( 63 , 64 ) ; blaNDM was identified with primers 5′-GGTTTGGCGATCTGGTTTTC-3′ and 5′-CGGAATGGCTCATCACGATC-3′ as previously described ( 65 ) , using OneTaq polymerase ( New England BioLabs ) . 
+ All restriction enzymes , T4 ligase , and Antarctic phosphatase were purchased from New England BioLabs . 
+ All capillary sequencing reactions were prepared using BigDye Terminator mix and sequenced by the Australian Equine Genetics Research Centre ( AEGRC ) . 
+ A full list of primers used in this study is shown in Data Set S9 in the supplemental material . 
+ PMLST PCR and sequencing protocol . 
+ Ampliﬁcation of the four loci used in PMLST was performed with Kapa HiFi DNA polymerase ( Kapa Biosystems ) using primers listed in Table 2 with the following cycling program : 95 °C for 3 min ; 25 cycles of 98 °C for 20 s , 60 °C for 15 s , and 72 °C for 30 s ; and a ﬁnal extension of 72 °C for 3 min . 
+ Each amplicon was then purified using the Qiagen PCR purification kit and sequenced using BigDye Terminator v3.1 cycle sequencing ( Life Technologies ) with the appropriate sequencing primers listed in Table 2 . 
+ Disc diffusion and mating assays . 
+ The disc diffusion assay was performed and interpreted according to the Clinical and Laboratory Standards Institute guidelines ( 2014 ) . 
+ Antimicrobial discs were obtained from Becton Dickinson . 
+ The sodium azide-resistant E. coli strain J53 ( 62 ) was used as the recipient in all mating assays . 
+ Donor and recipient strains were grown to an optical density at 600 nm ( OD600 ) equal to 2.0 . 
+ The cells were then mixed at a ratio of 1:2 ( donors to recipients ) in LB and incubated at 37 °C for 2 h under static conditions . 
+ Total CFU of donors , recipients , and transconjugants were enumerated on LB agar plates with appropriate antibiotic selection ( ampicillin for donors and sodium azide for recipients ) , and the conjugation frequency was calculated as the number of transconjugants per donor . 
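+ The conjugation frequency defined above is a simple ratio of plate counts. The sketch below shows the calculation ; the CFU values are hypothetical and chosen only to illustrate the arithmetic , not taken from the study.

```python
def conjugation_frequency(transconjugant_cfu_per_ml: float, donor_cfu_per_ml: float) -> float:
    """Transconjugants per donor, as defined in the mating assay description."""
    if donor_cfu_per_ml <= 0:
        raise ValueError("donor count must be positive")
    if transconjugant_cfu_per_ml < 0:
        raise ValueError("transconjugant count cannot be negative")
    return transconjugant_cfu_per_ml / donor_cfu_per_ml


# Hypothetical plate counts (CFU/ml) for illustration only.
example_frequency = conjugation_frequency(2e5, 2e8)
print(f"{example_frequency:.1e} transconjugants per donor")
```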
+ Plasmid stability assays . 
+ Time course stability assays were performed essentially as previously described ( 66 ) . 
+ Population counts were achieved by 10-fold serial dilutions and 5-µl drop tests on LB agar with or without selection . 
+ For solid-medium stability assays , strains were grown overnight on LB agar supplemented with antibiotics , and then single colonies were suspended in 0.9 % NaCl . 
+ Population counts were achieved by 10-fold serial dilutions and 5-µl drop tests on LB agar with or without selection . 
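+ Drop-test counts are converted to population densities by scaling for the dilution and the plated volume , and plasmid retention is then the ratio of counts with and without selection. The sketch below assumes the 5-µl drop volume stated above ; the colony counts and dilution steps are hypothetical.

```python
def cfu_per_ml(colonies: int, dilution_exponent: int, drop_volume_ul: float = 5.0) -> float:
    """CFU/ml from colonies counted in one drop of the 10**-dilution_exponent dilution."""
    if colonies < 0 or dilution_exponent < 0 or drop_volume_ul <= 0:
        raise ValueError("invalid count, dilution, or volume")
    drops_per_ml = 1000.0 / drop_volume_ul
    return colonies * (10 ** dilution_exponent) * drops_per_ml


def fraction_retained(cfu_with_selection: float, cfu_without_selection: float) -> float:
    """Fraction of the population that still carries the plasmid."""
    if cfu_without_selection <= 0:
        raise ValueError("total population count must be positive")
    return cfu_with_selection / cfu_without_selection


# Hypothetical counts: 12 colonies with selection and 24 without, both at the 10^-5 dilution.
selected = cfu_per_ml(12, 5)
total = cfu_per_ml(24, 5)
print(f"{fraction_retained(selected, total):.0%} of cells retain the plasmid")
```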
+ In vitro transposon mutagenesis . 
+ A custom mini-Tn5 transposon containing a chloramphenicol resistance cassette ( Tn5-Cm ) was generated as previously described ( 30 ) . 
+ An in vitro plasmid mutant library was created by incubating 200 ng of pMS6198A DNA and equimolar Tn5-Cm DNA ( 1.588 ng ) with 1 µl of Tn5 transposase ( 1 U/µl ) from an EZ-Tn5 R6K ori/KAN-2 insertion kit ( Epicentre ) at 37 °C for 2 h . 
+ The reaction was stopped by adding 1 µl of 1 % SDS and heating at 70 °C for 10 min . 
+ The mutant plasmid library ( 1 µl ) was transformed into 50 µl of E. coli TOP10 electrocompetent cells . 
+ Cells carrying mutant plasmids were selected by plating on LB agar supplemented with chloramphenicol . 
+ Mutants were pooled by scraping colonies off agar plates into LB . 
+ After addition of glycerol to a final concentration of 15 % , the mutant library was stored at -80 °C . 
+ Inverse-PCR method . 
+ Puriﬁed DNA from individual mutants was digested with BanII for 2 h at 37 °C and heat inactivated at 65 °C for 20 min . 
+ This mixture was ligated with T4 DNA ligase overnight at 16 °C . 
+ This ligation mixture was used as the template for PCR using OneTaq and cat-speciﬁc primers 3748 and 3950 , with thermocycling : 94 °C for 1 min ; 30 cycles of 94 °C for 30 s , 62 °C for 30 s , and 68 °C for 3.5 min ; and a ﬁnal extension time of 5 min . 
+ Transposon-directed insertion site sequencing . 
+ Illumina library preparation was performed using a Nextera DNA Sample Prep kit ( Illumina ) following the manufacturer 's instructions with modiﬁcations for TraDIS . 
+ Brieﬂy , genomic DNA was fragmented and tagged with adapter sequence via one enzymatic reaction ( tagmentation ) . 
+ Following tagmentation , the DNA was puriﬁed using the Zymo DNA Clean and Concentrator kit ( Zymo Research ) . 
+ The PCR enrichment step was run using index primer 1 ( one index per sample ) and a custom transposon-specific primer , 4844 ( 5′-AATGATACGGCGACCACCGAGATCTACACTAGATCGCaacttcggaataggaactaagg-3′ [ transposon-specific sequence is in lowercase ] ) to enrich for transposon insertion sites and allow multiplexed sequencing ; the thermocycler program was 72 °C for 3 min and 98 °C for 30 s , followed by 22 cycles of 98 °C for 10 s , 63 °C for 30 s , and 72 °C for 1 min . 
+ Each library was puriﬁed using Agencourt Ampure XP magnetic beads . 
+ Verification and quantification of the resulting libraries were performed using a Qubit 2.0 fluorometer , a 2100 Bioanalyzer ( Agilent Technologies ) , and quantitative PCR ( qPCR ) ( Kapa Biosystems ) . 
+ All libraries were pooled in equimolar amounts to a ﬁnal concentration of 3.2 nM and submitted for sequencing on the MiSeq platform at the Queensland Centre for Medical Genomics ( Institute for Molecular Bioscience , University of Queensland ) . 
+ The MiSeq sequencer was loaded with 12 pM of pooled library with 5 % PhiX spike-in and sequenced ( single-end ; 101 cycles ) using a mixture of standard Illumina sequencing primer and Tn5-specific sequencing primer 4845 ( 5′-actaaggaggatattcatatggaccatggctaattcccatgtcagatgtg-3′ ) . 
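Going from the 3.2 nM pool to the 12 pM loading concentration is a simple C1 x V1 = C2 x V2 dilution. The sketch below only illustrates that arithmetic. The 600 µL final volume is a hypothetical value not given in the text, and the denaturation and PhiX spike-in steps of the actual loading protocol are ignored.

```python
def stock_volume_ul(stock_nM: float, target_pM: float, final_volume_ul: float) -> float:
    """Volume of stock (in µL) such that C1 * V1 = C2 * V2, with concentrations in pM."""
    stock_pM = stock_nM * 1000.0  # 1 nM = 1000 pM
    if target_pM > stock_pM:
        raise ValueError("target concentration exceeds stock concentration")
    return target_pM * final_volume_ul / stock_pM


POOL_NM = 3.2            # pooled library concentration from the text
LOADING_PM = 12.0        # loading concentration from the text
FINAL_VOLUME_UL = 600.0  # hypothetical final volume, for illustration only

dilution_factor = POOL_NM * 1000.0 / LOADING_PM
volume = stock_volume_ul(POOL_NM, LOADING_PM, FINAL_VOLUME_UL)
print(f"dilution factor ~{dilution_factor:.0f}x; {volume:.2f} µL pool in {FINAL_VOLUME_UL:.0f} µL")
```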
+ All sequence data analysis and insertion site mapping were performed as previously described ( 30 , 32 ) . 
+ The threshold for plasmid maintenance was set to the value of the repA gene ( 418 RPKM ) , in accordance with previous work that has demonstrated the gene is required for IncA/C plasmid replication ( 19 ) . 
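The RPKM threshold can be illustrated with the standard definition of reads per kilobase per million mapped reads. This is a minimal sketch, not the authors' pipeline: the read counts and gene lengths are hypothetical, and the decision rule (flagging features at or below the repA value as candidates required for maintenance) is an assumption about how the threshold is applied.

```python
REPA_THRESHOLD_RPKM = 418.0  # repA value reported in the text


def rpkm(read_count: int, feature_length_bp: int, total_mapped_reads: int) -> float:
    """Reads per kilobase of feature per million mapped reads."""
    return read_count * 1e9 / (feature_length_bp * total_mapped_reads)


def is_maintenance_candidate(value_rpkm: float, threshold: float = REPA_THRESHOLD_RPKM) -> bool:
    """Assumed rule: features with RPKM at or below the repA value are flagged."""
    return value_rpkm <= threshold


# Hypothetical genes: (insertion-site reads, gene length in bp)
genes = {"geneA": (500, 1000), "geneB": (100, 1200)}
TOTAL_MAPPED = 1_000_000  # hypothetical library size

for name, (reads, length) in genes.items():
    value = rpkm(reads, length, TOTAL_MAPPED)
    print(name, round(value, 1), is_maintenance_candidate(value))
```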
+ Construction of mini-A/C plasmids . 
+ The replicon fragment was ampliﬁed with primers 7139/7140 from puriﬁed pMS6198A , the transcriptional terminator ( TT ) fragment gene was ampliﬁed with primers 7141/7142 from pQE30 , and the cat gene was ampliﬁed with primers 7143/7144 from pKD3 . 
+ The primers were designed so that adjacent fragments had 20-bp complementary overhangs , and a multiple-cloning site ( MCS ) was included between the replicon and TT fragments . 
+ All three fragments were mixed in equimolar ratios in a PCR using Kapa HiFi with thermocycling : 95 °C ; 30 cycles of 98 °C for 20 s , 66 °C for 20 s , and 72 °C for 6 min ; and a ﬁnal extension of 72 °C for 8 min . 
+ The product of this reaction was electroporated into E. coli TOP10 . 
+ The desired product was conﬁrmed by PCR screening and NheI digestion of puriﬁed plasmid DNA . 
+ The TT fragment contained two terminators , lambda t0 and rrnB t1 , separated by a cat gene . 
+ The cat gene was removed by PCR amplification of the plasmid using primers 7145/7146 , followed by NheI digestion at 37 °C for 2 h and overnight ligation with T4 DNA ligase . 
+ The product was electroporated into E. coli TOP10 , and subsequent puriﬁcation , PCR screening , and NheI digestion conﬁrmed the desired product , which was referred to as pMAC1 . 
+ The partitioning fragments were ampliﬁed using Kapa HiFi with primers 7147/7149 for pMAC2 and 7147/7148 for pMAC3 . 
+ Each fragment was cloned into the MCS of pMAC1 using BamHI and HindIII restriction sites . 
+ The resultant plasmids ( pMAC2 and pMAC3 ) were conﬁrmed to be correct by sequencing . 
+ Genome sequencing , assembly , and methylome analysis . 
+ Genomic DNA from MS6198 was sequenced on the PacBio RSII ( University of Malaya ) using the P4 polymerase and C2 sequencing chemistry . 
+ The raw sequencing data were assembled de novo using the hierarchical genome assembly process ( HGAP ) version 2 from the SMRT Analysis software suite ( version 2.3.0 ; Paciﬁc Biosciences ) with default parameters . 
+ The assembled contigs were visually screened for overlapping sequences on their 5′ and 3′ ends using Contiguity ( 67 ) . 
+ These overlapping ends were manually trimmed based on sequence similarity , and the contigs were circularized . 
+ The circularized contigs ( chromosome and plasmids ) were then polished by mapping raw sequencing reads back onto the assembled circular contigs . 
+ The detection of methylated bases and clustering of modiﬁed sites to identify methylation-associated motifs was performed as previously reported ( 68 ) . 
+ In brief , raw reads were aligned to the complete genome of MS6198 , and interpulse duration ( IPD ) ratios were calculated using PacBio 's in silico kinetic reference computational model . 
+ Sequence analysis , annotation , and in silico typing . 
+ Visualization and annotation of plasmid sequences were performed using PROKKA v1.11 ( 69 ) and the Artemis Genome Browser ( 70 ) . 
+ Sequence comparisons were constructed using WebACT ( 71 ) and visualized with Easyﬁg 2.1 ( 72 ) and Artemis Comparison Tool ( ACT ) ( 73 ) . 
+ In silico DNA manipulations and analysis were conducted and visualized in CLC Main Workbench ( version 7.0.2 ; Qiagen Bioinformatics ) and Easyﬁg 2.1 ( 72 ) . 
+ In silico E. coli multilocus sequence typing was performed using the MLST database hosted at the University of Warwick ( 74 ) . 
+ Plasmid Inc types were determined by PCR-based replicon typing ( 63 , 64 ) or in silico using PlasmidFinder ( 60 ) . 
+ Collection of IncA/C complete sequences and analysis . 
+ Complete sequences of IncA/C plasmids from GenBank were selected by BLASTn using the sequence of RA1 repA as a reference ( 90 % identity and 90 % query coverage ) . 
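The identity and query-coverage cutoffs used to select candidate plasmids can be expressed as a simple filter over BLASTn hits. The sketch below assumes one alignment per subject and computes coverage from the aligned query span. The `Hit` record, the subject names, and the 1,101-bp query length are hypothetical and are not taken from the study.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    subject_id: str
    percent_identity: float
    query_start: int   # 1-based, inclusive
    query_end: int     # 1-based, inclusive
    query_length: int


def query_coverage(hit: Hit) -> float:
    """Percentage of the query covered by this single alignment."""
    span = abs(hit.query_end - hit.query_start) + 1
    return 100.0 * span / hit.query_length


def passes(hit: Hit, min_identity: float = 90.0, min_coverage: float = 90.0) -> bool:
    """Apply the 90 % identity and 90 % query-coverage cutoffs from the text."""
    return hit.percent_identity >= min_identity and query_coverage(hit) >= min_coverage


hits = [
    Hit("plasmid_A", 99.2, 1, 1098, 1101),    # high identity, near-full coverage
    Hit("plasmid_B", 93.0, 200, 1101, 1101),  # identity passes, coverage ~82 % fails
    Hit("plasmid_C", 85.0, 1, 1101, 1101),    # full coverage, identity fails
]
selected = [h.subject_id for h in hits if passes(h)]
print(selected)
```

Hits that pass such a filter would still need the manual review and primer-binding-site verification described in the text.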
+ The BLASTn hits were manually reviewed , and only published sequences were included in our IncA/C plasmid database . 
+ Each plasmid sequence was also veriﬁed by BLASTn to contain IncA/C replicon-typing primer binding sites ( 63 ) . 
+ With the addition of pMS6198A , this database comprised 82 IncA/C plasmid sequences ( as of 9 May 2016 ) ( see Data Set S5 in the supplemental material ) . 
+ The gene annotations of pMS6198A were used as a reference to identify genes present in all 82 IncA/C plasmids , using default BLASTn v2.2.26 settings with the criteria of an expected value of 10^-30 and minimum coverage of 95 % . 
+ The sequence of each conserved gene was extracted from each plasmid using EMBOSS v6.5.7 ( 75 ) . 
+ PMLST minimal spanning trees were built by Phyloviz using the goeBURST algorithm ( 76 ) . 
+ All alignments were constructed in MEGA 6.06 ( 77 ) using ClustalW with default settings . 
+ Phylogenetic trees were produced in MEGA 6.06 using maximum likelihood with default settings and supported with 1,000 bootstraps . 
+ The presence or absence of resistance genes was determined using the BLASTn algorithm ( 100 % identity at 100 % coverage ) against the resistance gene database ARG-ANNOT ( 78 ) . 
+ IncA/C sequence analysis and metadata have been incorporated into the microreact database ( https://microreact.org/project/IncACPlasmids?tt rc & tns 4 & tts 6 ) ( 80 ) . 
+ Accession number ( s ) . 
+ The sequences for the MS6198 chromosome , pMS6198A , pMS6198B , pMS6198C , and pMS6198D have been deposited in the NCBI GenBank database under accession numbers CP015834 to CP015838 , respectively . 
+ Raw PacBio sequence reads for MS6198 and Illumina MiSeq reads for duplicate TraDIS runs have been deposited in the Sequence Read Archive ( SRA ) under accession numbers SRX1797306 , SRX1992326 ( replicate 1 ) , and SRX1992327 ( replicate 2 ) , respectively . 
+ SUPPLEMENTAL MATERIAL
+ Supplemental material for this article may be found at https://doi.org/10.1128/AAC.01740-16 . 
+ ACKNOWLEDGMENTS
+ We thank David Miller , Tim Bruxner , and Angelika Christ from the Queensland Centre for Medical Genomics ( Institute for Molecular Bioscience , University of Queensland ) for technical support with the TraDIS protocol . 
+ The protocol was set up in consultation with Brian Fritz ( Illumina ) and Sabine Eckert , Daniel Turner , and Matthew Mayho ( Wellcome Trust Sanger Institute ) . 
+ This work was supported by grants from the National Health and Medical Research Council ( NHMRC ) of Australia ( GNT1033799 and GNT1067455 ) and High Impact Research ( HIR ) grants from the University of Malaya ( UM-MOHE HIR grant UM C/625/1/HIR/MOHE/CHAN/14/1 , no. H-50001-A000027 ; UM-MOHE HIR grant UM C/625/1/HIR/MOHE/CHAN/01 , no. A000001-50001 ) . 
+ M.A.S. is supported by an NHMRC Senior Research Fellowship ( GNT1106930 ) , and S.A.B. is supported by an NHMRC Career Development Fellowship ( GNT1090456 ) . 
+ Teik Min Chong is supported by the Postgraduate Research Fund ( PPP ) ( grant no. PG080-2015B ) .
\ No newline at end of file
--- a/data/TEXT_FILES/notuseful_txt/28039131.txt 0 → 100644
+++ b/data/TEXT_FILES/notuseful_txt/28039131.txt 0 → 100644
+ Salmonella Persistence in Tomatoes
+ ABSTRACT Human enteric pathogens , such as Salmonella spp. and verotoxigenic Escherichia coli , are increasingly recognized as causes of gastroenteritis outbreaks associated with the consumption of fruits and vegetables . 
+ Persistence in plants represents an important part of the life cycle of these pathogens . 
+ The identiﬁcation of the full complement of Salmonella genes involved in the colonization of the model plant ( tomato ) was carried out using transposon insertion sequencing analysis . 
+ With this approach , 230,000 transposon insertions were screened in tomato pericarps to identify loci with reduction in ﬁtness , followed by validation of the screen results using competition assays of the isogenic mutants against the wild type . 
+ A comparison with studies in animals revealed a distinct plant-associated set of genes , which only partially overlaps with the genes required to elicit disease in animals . 
+ De novo biosynthesis of amino acids was critical to persistence within tomatoes , while amino acid scavenging was prevalent in animal infections . 
+ Fitness reduction of the Salmonella amino acid synthesis mutants was generally more severe in the tomato rin mutant , which hyperaccumulates certain amino acids , suggesting that these nutrients remain unavailable to Salmonella spp. within plants . 
+ Salmonella lipopolysaccharide ( LPS ) was required for persistence in both animals and plants , exemplifying some shared pathogenesis-related mechanisms in animal and plant hosts . 
+ Similarly to phytopathogens , Salmonella spp. required biosynthesis of amino acids , LPS , and nucleotides to colonize tomatoes . 
+ Overall , however , it appears that while Salmonella shares some strategies with phytopathogens and taps into its animal virulence-related functions , colonization of tomatoes represents a distinct strategy , highlighting this pathogen 's flexible metabolism . 
+ IMPORTANCE Outbreaks of gastroenteritis caused by human pathogens have been increasingly associated with foods of plant origin , with tomatoes being one of the common culprits . 
+ Recent studies also suggest that these human pathogens can use plants as alternate hosts as a part of their life cycle . 
+ While dual ( animal/plant ) lifestyles of other members of the Enterobacteriaceae family are well known , the strategies with which Salmonella colonizes plants are only partially understood . 
+ Therefore , we undertook a high-throughput characterization of the functions required for Salmonella persistence within tomatoes . 
+ The results of this study were compared with what is known about genes required for Salmonella virulence in animals and interactions of plant pathogens with their hosts to determine whether Salmonella repurposes its virulence repertoire inside plants or whether it behaves more as a phytopathogen during plant colonization . 
+ Even though Salmonella utilized some of its virulence-related genes in tomatoes , plant colonization required a distinct set of functions . 
+ Approximately 15 % of all cases of human salmonellosis are thought to be associated with the consumption of fruits and vegetables ( 1 ) . 
+ The ability of nontyphoidal strains of Salmonella enterica to colonize plants , including sprouts , alfalfa , lettuce , melons , and tomatoes , is highlighted by the rise in the number and severity of salmonellosis outbreaks linked to produce . 
+ The capacity of S. enterica ( and other enteric pathogens ) to readily multiply within plant tissues led to the hypothesis that persistence on plants is a part of the Salmonella life cycle , serving as reservoirs prior to reinfection of the preferred animal hosts ( 2 -- 5 ) . 
+ Several studies have dissected the molecular basis of plant-Salmonella interactions ( 6 -- 16 ) . 
+ However , the entire complement of genetic functions required for plant colonization by Salmonella is not yet known . 
+ Previous studies established that colonization of plants by Salmonella is a complex process , dependent on both host and bacterial factors . 
+ It involves major changes in the bacterial transcriptome and requires genes involved in amino acid biosynthesis and transport , cellulose production , ﬁmbriae , regulators , and surface structures ( 6 -- 17 ) . 
+ Some of the same mechanisms were involved in the colonization of both vegetative and reproductive tissues on different plant species . 
+ Colonization of the tomato pericarp involves differential regulation of at least 50 genes ( 9 ) , some of which were also required for proliferation , including the gene cysB , encoding a cysteine metabolism regulator ( 9 ) . 
+ Surface structures , like the O-antigen capsule and curli ﬁmbriae , were also required for the colonization of tomatoes , as indicated by reduced ﬁtness of the corresponding Salmonella mutants ( 11 , 17 ) . 
+ The O-antigen capsule also plays a role in the colonization of alfalfa sprouts by Salmonella spp. , as do curli ﬁmbriae , suggesting that these structures may be broadly required for the colonization and/or persistence within plants ( 7 , 14 ) . 
+ Iron acquisition was shown to play an important role during Salmonella colonization of vegetative and reproductive tissues . 
+ The siderophore transporter fepDGC was required for Salmonella proliferation in tomato pericarps ( 16 ) , and siderophore synthesis was necessary for the colonization of lettuce and alfalfa ( 10 ) . 
+ Biosynthesis of amino acids was also a likely factor required for Salmonella growth in alfalfa root exudates ( 15 ) . 
+ Plant colonization by Salmonella is affected by host factors ; the plant genetic background can influence Salmonella colonization of sprouts ( 18 ) and proliferation in the tomato phyllosphere ( 19 , 20 ) and pericarps ( 21 ) . 
+ Some of the previous work focused on the roles of Salmonella virulence factors during its interaction with plants . 
+ The term `` virulence factor '' was sometimes used to broadly deﬁne any gene ( e.g. , rpoS ) required for full virulence in animal models . 
+ While the contribution of the Salmonella genes that generally impact animal virulence was characterized during the attachment and colonization of alfalfa sprouts ( 6 ) , the involvement of the more dedicated virulence-related functions , such as type III secretion systems ( T3SS ) , and their regulators encoded on the Salmonella pathogenicity islands ( SPIs ) remains controversial . 
+ Strains lacking SPI-1 and SPI-2 T3SSs had reduced proliferation in Arabidopsis thaliana leaves and failed to suppress the plant immune response against bacteria , and these T3SSs were required for the chlorotic appearance of plants ( 22 ) . 
+ The effector spvC carried by the virulence plasmid suppressed inducible plant defenses when it was directly transformed into Arabidopsis protoplasts ( 22 ) . 
+ Salmonella strains lacking SPI-1 T3SS induced a stronger response from plants , suggesting that this pathogenicity island represses plant immunity ( 23 ) . 
+ However , studies linking the T3SSs encoded by SPIs to plant colonization by Salmonella required high bacterial titers , which are unlikely to happen naturally . 
+ Moreover , Salmonella spp. induced plant immune responses at lower levels than specialized plant pathogens , like Pseudomonas syringae ( 24 ) . 
+ Some studies reported that Salmonella strains lacking SPI-1 , SPI-2 , SPI-3 , SPI-4 , and SPI-5 are not defective for the colonization of tomato and cantaloupe fruits ( 9 , 25 ) . 
+ Furthermore , there is no direct evidence that Salmonella can translocate effectors into plant cells via SPI-encoded T3SS , calling into question the requirement for the corresponding genes during plant colonization ( unless the T3SS apparatus performs other noncanonical functions during plant colonization , a hypothesis that has not yet been ruled out ) . 
+ Reverse genetics was previously employed to identify Salmonella virulence genes in animals . 
+ Mutants were created and individually screened to identify those defective in invasion and intracellular proliferation in cell cultures , and those with impaired virulence in intraperitoneal or oral infection models ( 26 -- 33 ) . 
+ Recently , the development of high-throughput technologies allowed coverage of the entire genome in these screens ( 34 ) . 
+ Here , we used a similar approach to test the hypothesis that Salmonella genes required for the colonization of plants are different from those required for virulence in animals , and , furthermore , that the strategies that Salmonella uses to colonize plants are distinct from those used by phytopathogens . 
+ We used a transposon insertion sequencing approach to identify Salmonella enterica serovar Typhimurium loci required for persistence in tomatoes and subsequently conﬁrmed the phenotypes using competition assays with isogenic mutants . 
+ We also compared the results obtained in this study with similar screens in animals , as well as with the data obtained using mutant screens of phytopathogens . 
+ RESULTS
+ Transposon insertion library screening and sequencing . 
+ The tested transposon mutant library consisted of about 280,000 independent mutants with Tn5 insertions , of which 230,000 insertion sites were mapped onto the genome . 
+ The mapped insertions disrupted 88 % of all features of the genome . 
+ The remaining 12 % of the loci ( 1,263 features ) are likely essential for Salmonella survival and proliferation under the culture conditions used . 
+ We defined 10,632 features in the Salmonella enterica serovar Typhimurium ATCC 14028 genome , including coding sequences , intergenic regions , and noncoding RNA . 
+ Salmonella spp. can proliferate within red ripe tomato pericarps , reaching 10^7 CFU per fruit ( 9 , 19 , 21 , 25 , 35 ) . 
+ To represent the entire transposon insertion library , the screening needed to be done using an inoculum titer very close to the carrying capacity of tomatoes . 
+ Consequently , bacterial growth appeared constrained , potentially limiting the possible changes in frequency of transposon insertions with differential fitness . 
+ We tested whether screening the mutant library under this condition would allow us to detect selection , using the rcsA::kan mutant strain , whose fitness is known to be modestly reduced in tomatoes ( 36 ) . 
+ Because recovered mutants were briefly grown out in LB after the screen to increase DNA output , we also confirmed in preliminary experiments with the rcsA::kan mutant that this outgrowth does not affect the recovery ratio of the mutant or its DNA ( data not shown ) . 
+ These optimization experiments established the feasibility of using a high inoculum titer to identify Salmonella transposon insertions with differential fitness . 
+ Moreover , they suggested that even when the population is stagnant ( death and reproduction occur at the same rate ) , the turnover of bacterial cells is sufficient to lead to changes in the abundance of genotypes with differential fitness . 
+ The mutant library was screened in ripe Campari tomato pericarps using 12 biological replicates and three technical replicates each ; the biological replicates were pooled , resulting in six ﬁnal samples for library sequencing , with three input samples and three output samples . 
+ The screen identified 1,245 features with differential fitness ( false-discovery rate [ FDR ] < 0.1 ) ; 1,112 of them were under negative selection , and only 132 features were under positive selection ( Fig. 1 and Data Set S1 ) . 
+ This represents 11.8 % and 1.4 % of all tested features , respectively . 
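The reported percentages can be approximately reproduced from the counts given in the text. The sketch below assumes that "tested features" means the features that carried insertions (10,632 defined features minus the 1,263 without insertions); the text does not state the denominator explicitly. Under that assumption the values come out at about 11.9 % and 1.4 %, close to the reported 11.8 % and 1.4 %.

```python
TOTAL_FEATURES = 10_632   # features defined in the genome (from the text)
UNDISRUPTED = 1_263       # features without mapped insertions (from the text)
NEGATIVE = 1_112          # features under negative selection (from the text)
POSITIVE = 132            # features under positive selection (from the text)


def percent(part: int, whole: int) -> float:
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


# Assumed denominator: features that could actually be tested in the screen.
tested = TOTAL_FEATURES - UNDISRUPTED

print(f"undisrupted: {percent(UNDISRUPTED, TOTAL_FEATURES):.1f} % of all features")
print(f"negative selection: {percent(NEGATIVE, tested):.1f} % of tested features")
print(f"positive selection: {percent(POSITIVE, tested):.1f} % of tested features")
```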
+ Among the loci with differential abundance , 886 loci were coding sequences . 
+ Loci with reduced ﬁtness often appeared adjacent to each other , mostly due to operon organization . 
+ Salmonella metabolic requirements during its interaction with tomatoes . 
+ A high number of features identiﬁed in this screen were involved in metabolic pathways . 
+ To investigate the metabolic pathways required by Salmonella spp. during their interaction with the plant , we projected the KEGG Orthology terms for each locus against the Salmonella metabolic map using KEGG Mapper . 
+ Overall , the screen revealed that Salmonella proliferation within tomatoes is a complex process , requiring the combination of several catabolic and anabolic pathways , including biosynthesis of amino acids , nucleotides , and lipids , as well as glycolysis ( Fig. 2 ) . 
+ The results indicate that carbohydrates that are metabolized to pyruvate via glycolysis are a major carbon source for Salmonella within tomatoes . 
+ In Salmonella spp. colonizing the tomato , pyruvate is likely fermented to acetate , as indicated by the reduced fitness of mutants in pta and ackA and the absence of phenotypes for the mutants of the genes involved in the tricarboxylic acid ( TCA ) cycle . 
+ The TCA cycle oxidizes acetyl-coenzyme A ( acetyl-CoA ) originating from pyruvate under aerobic conditions . 
+ The use of fermentation instead of the TCA cycle is consistent with the fact that the environment inside tomato fruits is microaerophilic or anaerobic . 
+ The Gene Ontology enrichment analysis indicated that the methionine , arginine , and branched-chain amino acid synthesis pathways were overrepresented among the Salmonella functions required for tomato colonization . 
+ This is an indication that these nitrogen compounds are not readily available for the bacterium within tomatoes and that Salmonella has to synthesize them de novo . 
+ Since the data indicate that carbohydrates may provide the major carbon source for Salmonella within tomatoes , it is possible that these are providing the carbon backbone for bacterial amino acid and nucleotide synthesis in this environment . 
+ Four genes related to nitrogen uptake were identified in the screen : lysP , encoding the lysine-specific permease ; STM14_1095 , encoding a putative amino acid transporter ; potB , encoding a high-affinity spermidine/putrescine ABC transporter ; and STM14_1979 , encoding a putative ABC polar amino acid transporter . 
+ None of these genes are involved in the transport of the most abundant nitrogen compounds available in tomatoes ( glutamine , gamma-amino butyric acid [ GABA ] , and inorganic nitrogen ) . 
+ Salmonella spp. may exploit several nitrogen sources and then synthesize the complement of the needed amino acids . 
+ Fatty acid biosynthesis and degradation were required by Salmonella spp. within tomatoes , perhaps because functional cellular membrane synthesis is necessary for bacterial proliferation and maintenance . 
+ Genes in fatty acid metabolism that contributed to Salmonella fitness included the 3-oxoacyl-[acyl-carrier-protein] synthase gene fabF and the acyl-CoA dehydrogenase gene fadE . 
+ Besides these two well-known genes , we also identiﬁed STM14_1005 , another putative acyl-CoA dehydrogenase . 
+ Of interest is that fadH ( 2,4-dienoyl-CoA reductase ) was among the genes differentially regulated in tomatoes , and its expression was shown to be dependent on the availability of linoleic acid , which in turn differentially accumulates in tomatoes as they mature ( 9 ) . 
+ Salmonella metabolic requirements for proliferation within tomatoes differ from the requirements for systemic infection of mice . 
+ We explored whether Salmonella metabolic requirements for the systemic infection of mice were comparable with the metabolic requirements for fitness within tomatoes . 
+ We retrieved data from previous work using transposon insertion libraries in Salmonella enterica serovar Typhimurium ( 37 , 38 ) that had been applied in a mouse model for systemic infection and compared them with our results from the Salmonella-tomato interaction . 
+ A total of 327 KEGG Orthology ( KO ) identiﬁers were mapped for the genes required for persistence in tomatoes , and 505 KO identiﬁers required for systemic infection of mice were mapped . 
+ Only 125 KO identiﬁers were shared by the two conditions , showing that the Salmonella metabolic requirements under these conditions are different ( Fig. 2A ) . 
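The extent of overlap between the two KO identifier sets can be quantified from the three counts given above. The following sketch derives the condition-specific counts and a Jaccard index. The Jaccard index is an illustrative summary statistic chosen here, not one reported in the study.

```python
def overlap_summary(n_a: int, n_b: int, n_shared: int) -> dict:
    """Summarize the overlap of two sets from their sizes and the size of their intersection."""
    if n_shared > min(n_a, n_b):
        raise ValueError("shared count cannot exceed either set size")
    union = n_a + n_b - n_shared
    return {
        "only_a": n_a - n_shared,
        "only_b": n_b - n_shared,
        "union": union,
        "jaccard": n_shared / union,
    }


# Counts from the text: 327 KO identifiers (tomato), 505 (mouse), 125 shared.
summary = overlap_summary(327, 505, 125)
print(summary)
```

With these counts, 202 identifiers are specific to tomatoes and 380 to mice, so the shared identifiers make up well under a fifth of the combined set.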
+ While amino acid biosynthesis was required for Salmonella spp. in the interaction with tomatoes , scavenging of amino acids was required in the systemic mouse infection . 
+ Transposon disruptions in just a few pathways ( e.g. , the urea cycle ) impaired Salmonella ﬁtness during systemic infection ( 37 , 38 ) . 
+ Besides being involved in the synthesis of arginine , the urea cycle is also involved in the incorporation of excess nitrogen . 
+ Carbohydrates were likely a major carbon source for Salmonella spp. during systemic animal infection and during tomato colonization . 
+ However , Salmonella spp. may use the less-energy-efficient pathway of acetate fermentation in tomatoes . 
+ During systemic infection , Salmonella oxidizes carbohydrates using the TCA cycle , as indicated by phenotypes of the mutants that represent this pathway . 
+ Phenotypes observed for the genes encoding 2-ketoglutarate dehydrogenase and succinate dehydrogenase , which are only fully expressed under aerobic conditions , support the role of oxidation in the bacterial metabolism in animals . 
+ Nucleotide biosynthesis is required by Salmonella spp. in murine systemic infection and in proliferation within tomatoes , as evidenced by the reduction in fitness when mutations occurred in the genes involved in the biosynthesis of pyrimidines and purines in either model of infection . 
+ While most Salmonella metabolic requirements diverge for colonization of plant and animal hosts , nucleotide biosynthesis is a point of convergence . 
+ Enrichment analysis also showed that genes related to pathogenesis ( GO : 0009405 ) contribute to ﬁtness during systemic infection and proliferation within tomatoes ( Fig. 2B and C ) . 
+ This GO term is heterogeneous and includes regulators , effectors , and surface proteins . 
+ The global regulator phoPQ , for example , belongs to this group of genes and displayed a phenotype in both tomatoes and mice . 
+ The phoPQ regulator activates the expression of other pathogenesis-associated genes that code for effectors and have a role in adaptation to low Mg2+ and resistance to antibacterial peptides . 
+ Several effectors and the mtgC virulence factor required for survival in low Mg2+ had phenotypes under both conditions , indicating that some Salmonella adaptations for virulence in animals are also at play during colonization of tomato fruits . 
+ Few metabolic parallels between ﬁtness of Salmonella in tomatoes and behavior of phytopathogens . 
+ After ﬁnding that Salmonella metabolic requirements for proliferation within tomatoes differed from the requirements for the systemic infection of mice , we explored whether these requirements are similar to the metabolic requirements of bacterial phytopathogens during plant infection . 
+ To test the hypothesis that Salmonella behavior in tomatoes is similar to the behavior of plant pathogens , we extracted the information identified in mutant screens of Pectobacterium carotovorum in Chinese cabbage ( 39 ) , Ralstonia solanacearum in Arabidopsis thaliana and tomatoes ( 40 ) , Xanthomonas citri in grapefruit ( 41 ) , Xanthomonas oryzae pv. oryzicola in rice and tobacco ( 42 ) , Pseudomonas tolaasii in A. thaliana ( 43 ) , Pseudomonas syringae pv. maculicola in A. thaliana , and Xanthomonas campestris pv. campestris in cabbage ( 44 ) . 
+ We adopted the strategy of grouping the results of different screens to increase the coverage of KO identiﬁers , since most of these studies did not have the same depth used in transposon sequencing methods , and to obtain a broader comparison with the different lifestyles that phytopathogens may assume . 
+ There appears to be a limited overlap of metabolic functions between Salmonella requirements for colonization of tomatoes and those of phytopathogens ( Fig. 3 ) , indicating that the mechanisms employed by Salmonella to colonize tomatoes require a distinct combination of functions . 
+ Lipopolysaccharide ( LPS ) and nucleotide biosynthesis were among the functions shared by phytopathogens and Salmonella in tomatoes and were also involved in systemic infection of mice , suggesting that there is a small set of genes required for pathogenesis , regardless of the host . 
+ These functions are likely involved in resistance to host defense and metabolic limitations that bacteria experience within eukaryotes . 
+ Amino acid biosynthesis was a requirement shared by Salmonella during colonization of tomatoes and by phytopathogens but absent from Salmonella during systemic infection of mice . 
+ This divergence between colonization of animal and plant hosts could be associated with the nutrient availability . 
+ Fatty acid biosynthesis and degradation were required for Salmonella infection of various hosts , but this requirement was absent from phytopathogens . 
+ Salmonella requires biosynthesis of arginine , glutamine , glutamate , branched amino acids , methionine , tryptophan , and threonine to colonize tomatoes . 
+ Nitrogen metabolism , including amino acid biosynthesis and the global nitrogen regulator glnG , contributed to Salmonella ﬁtness during its colonization of tomatoes . 
+ We explored this role further using competition assays with isogenic mutants . 
+ A lower bacterial inoculum was used in these assays to more realistically approximate natural interactions . 
+ We identiﬁed the amino acid biosynthetic pathways required by the bacterium for proliferation within tomatoes . 
+ Transposon insertion sequencing revealed the synthesis of tryptophan , arginine , branched amino acids ( leucine , isoleucine , and valine ) , glutamine , glutamate , threonine , methionine , and proline to be required for complete bacterial ﬁtness ( Fig. 4A ) . 
+ Mutants were created in the genes responsible for the key steps of each biosynthetic pathway , and their auxotrophic phenotype was conﬁrmed in minimal medium . 
+ However , because at least two enzymes can catalyze the last step in the formation of methionine and glutamate , the metA and gltB mutants were not auxotrophs . 
+ The competition assays conﬁrmed the transposon insertion sequencing results for all but one of these 10 metabolic pathways ( Fig. 4B ) . 
+ Only the auxotrophic mutant for proline ( proA::kan ) deviated from the transposon insertion sequencing results and did not appear to have a reduced fitness compared to the wild type . 
+ A deletion of the global nitrogen regulator glnG also led to a reduction in ﬁtness of Salmonella in tomatoes , consistent with its function in bacterial responses to nitrogen assimilation . 
+ To verify that the observed reduction in ﬁtness caused by the disruption in trpC , argA , ilvD , glnA , serA , thrC , gltB , metA , and glnG was not due to a reduction in overall ﬁtness , we compared the growth kinetics of these isogenic mutants in LB . 
+ We observed that the growth kinetics of these mutants did not differ from those of the wild-type strain ; consequently , the reduced fitness observed in tomatoes is not caused by a reduction in overall fitness ( Fig. S1 ) . 
+ We tested whether trpC , argA , ilvD , glnA , serA , thrC , gltB , and metA are expressed within tomato fruits using resolvase-in vivo expression technology ( RIVET ) reporters that allow quantiﬁcation of gene expression of the targets in vivo . 
+ One day after inoculation , the RIVET resolution for the reporters of trpC , argA , ilvD , glnA , serA , thrC , gltB , and metA was between 80 and 100 % , and 3 and 7 days after inoculation , the resolution reached almost 100 % for all reporters ( Fig. 5 ) . 
+ When tested in LB , the reporters were not resolved , while in M9 medium with low concentration of amino acids , the resolution reached high levels ( Fig. 5 ) . 
+ The results of the RIVET experiments are consistent with the ﬁtness phenotypes of the corresponding mutants : amino acid biosynthesis genes are strongly expressed in tomatoes and are required for proliferation within the fruit . 
+ Tomato genotype affects Salmonella ﬁtness . 
+ Tomatoes carrying a ripening inhibitor ( rin ) mutation do not present the characteristic ripening phenotype : they do not accumulate lycopene , and their fruits do not soften . 
+ Fruits of rin mutant tomatoes accumulate many amino acids in different ( generally larger ) amounts during their development compared to the wild type ( 45 ) . 
+ Because amino acids accumulate at higher levels within rin mutant tomatoes , we tested the hypothesis that the phenotypes of the amino acid biosynthesis mutants would be less severe in rin mutant tomatoes than in wild-type tomatoes . 
+ To this end , we performed competition assays using isogenic mutants in trpC , argA , ilvD , glnA , serA , thrC , gltB , and metA in the tomato cultivar Ailsa Craig and its rin homozygous mutant . 
+ The competitive indices of argA , gltB , thrC , and trpC mutants were not affected by the tomato genotype , while isogenic mutants for glnA , ilvD , metA , and serA presented a reduction in their ﬁtness within rin mutant tomatoes compared to wild-type tomatoes . 
+ The metA mutant exhibited the highest difference : in wild-type tomatoes , the CI indicated a 2-fold reduction compared to wild-type Salmonella spp. , whereas in rin mutant tomatoes , the reduction was 32-fold ( Fig. 6 ) . 
+ LPS biosynthesis is required for Salmonella proliferation in tomatoes . 
+ Most of the data presented so far underscore the differences in Salmonella requirements for animal and plant colonization . 
+ However , biosynthesis of LPS was needed under both conditions . 
+ Genes involved in LPS core and O-antigen synthesis , contained by the rfa and rfb clusters , respectively , were identiﬁed as required for tomato colonization in our screen , as well as in transposon insertion sequencing analyses of mouse colonization ( Fig. 2 and 7A ) . 
+ The competition assays using isogenic mutants for rfaB , rfaI , rfbN , and rfbP confirmed the transposon insertion sequencing data ( Fig. 7B ) . 
+ All mutants had an 4-fold ﬁtness reduction . 
+ The similar CI for the mutants involved in the same biological process strongly suggests that the observed phenotype is related to the defective LPS biosynthesis and not to a secondary mutation . 
+ The role of LPS during disease in animals has been extensively studied for several pathogens . 
+ LPS protects bacterial cells from antibacterial peptides produced by host organisms ( reviewed in reference 46 ) . 
+ Salmonella LPS mutants are severely attenuated during systemic infection and are incapable of developing disease . 
+ In bacterial plant pathogens , LPS is also a major virulence factor , protecting the pathogen against plant defenses . 
+ We hypothesize that the immunity mechanisms in plants and animals target the same cellular structures in bacterial cells , and Salmonella 's countermeasures may be similar in animal and plant hosts . 
+ Salmonella inoculation titer affects ﬁtness of SPI and nucleotide synthesis mutants . 
+ The transposon insertion sequencing experiments suggested that genes involved in nucleotide biosynthesis and those that are a part of the Salmonella pathogenicity islands ( SPI ) 1 , 2 , and 3 were required for Salmonella full ﬁtness during proliferation in tomatoes ( Data Set S1 ) . 
+ Auxotrophic mutants for purines and pyrimidines exhibited strong attenuation ( Data Set S1 ) . 
+ SPI-1 , -2 , and -3 are the major Salmonella virulence determinants required for the invasion of epithelial cells , macrophages , and survival during the intracellular phase ; strains lacking any of these SPIs are avirulent in vertebrate animal models . 
+ However , the observed phenotypes of the SPI mutants were in direct contradiction to the previous study ( 9 ) . 
+ It is of note that the inoculum doses used in the transposon insertion sequencing screen were 4 to 5 orders of magnitude higher than in the study of infections in tomatoes performed by Noel et al. ( 9 ) . 
+ Therefore , we tested a hypothesis that the phenotypes of the SPI ( and nucleotide synthesis ) mutants were dependent on the inoculum dose . 
+ Competition assays for the pyrB , purH , SPI-1 , SPI-2 , and SPI-3 mutants against the wild type were carried out with inocula of 10^3 and 10^7 CFU per tomato . 
+ The results of these competition assays showed that the reduction in ﬁtness for all mutants tested was inoculum dependent , with phenotypes apparent only at the high inoculum doses ( Fig. 8 ) . 
+ For pyrB and purH mutants , which are pyrimidine and purine auxotrophs , respectively , the competition to scavenge the nucleotides available in the environment will increase at higher titers , and the mutants unable to synthesize purines or pyrimidines de novo will have a stronger deficit for these compounds that will negatively affect their growth . 
+ Proteins encoded on SPI-1 , -2 , and -3 are not known to contribute to population density-dependent functions , and we do not have a ready explanation for the mutants ' phenotype that was dependent on the inoculum density . 
+ We hypothesize that for the SPI-3 mutant , an increasing Mg2+ deficit could lead to the observed phenotype at higher concentrations . 
+ One of the genes present in SPI-3 , mgtC , encodes a virulence factor required for intracellular proliferation under low-Mg2+ conditions . 
+ The screening data suggest that Salmonella in tomatoes has a deficit in Mg2+ , as indicated by the loss of fitness for mutants in the phoPQ system that is activated by low Mg2+ , and by the loss of fitness for corA mutants , which lack a Mg2+ channel . 
+ DISCUSSION
+ The ability of human enteric pathogens , such as Salmonella , to robustly colonize plant tissues likely helps this pathogen build up populations that are high enough to then reinfect herbivorous hosts . 
+ This ability of Salmonella to exploit plants as a part of its life cycle and use them as vehicles to reach preferred animal hosts was recently supported by multitrophic experiments ( 47 ) . 
+ Salmonella and enteropathogenic E. coli are not the only members of the Enterobacteriaceae family capable of forming successful relationships with both plants and animals . 
+ Klebsiella pneumoniae is both a nitrogen-ﬁxing plant endophyte and an opportunistic pathogen in animals and humans ( 48 ) . 
+ Pantoea stewartii , the causal organism of Stewart 's wilt in sweet corn , spends a portion of its life within the gut of the corn beetle ( 49 ) . 
+ Dickeya dadantii is a pathogen causing soft rots in plants and is an aphid pathogen ( 50 ) . 
+ It is possible that these bacteria `` repurpose '' their metabolic and virulence genes to adapt to different hosts , or they may have evolved two distinct strategies for the infections of plants and animals . 
+ The hypothesis of distinct strategies is supported by the discoveries of two distinct type III secretion systems in P. stewartii , one used in plants and another in beetles ( 49 ) . 
+ To gain insight into the interactions of Salmonella spp. with their different hosts and to compare Salmonella behavior with that of the phytopathogens , we carried out high-throughput screens , followed by comparative analyses . 
+ It appears that 40 % of the genes required for tomato colonization by Salmonella are also involved in a systemic infection in mice , and 35 % of the genes required for murine systemic infection are required for colonization of tomato fruits . 
+ The same pattern was found in the metabolic requirements for phytopathogens , where 42 % of the mapped KO identiﬁers were shared with Salmonella colonization of tomatoes . 
+ This partial overlap suggests a certain degree of repurposing of the Salmonella metabolism to adapt to different hosts but also reveals metabolic versatility of this bacterium . 
+ Salmonella amino acid biosynthesis genes , whose mutants had the most pronounced phenotype in tomatoes , were not required for murine systemic infection , as determined in high-throughput screenings ( 37 , 38 , 51 ) and competition assays with isogenic mutants ( 52 ) . 
+ We focused on this group of genes to explore the differences between the requirements for interactions with animals and plants . 
+ Competition assays confirmed that the biosynthesis of tryptophan , arginine , branched amino acids , glutamine , serine , threonine , glutamate , and methionine is required for colonization of tomatoes . 
+ In this regard , Salmonella behavior closely paralleled that of phytobacteria : amino acid biosynthesis was required for plant colonization by the pathogens Pseudomonas tolaasii ( 43 ) and X. campestris pv. campestris ( 44 ) and by the rhizosphere colonizer Pseudomonas fluorescens WCS365 ( 53 ) . 
+ The amino acid metabolism requirements for plant-associated bacteria could be a result of unbalanced amino acid concentrations in this environment . 
+ Amino acids appear to be a relatively minor component of the exudates of mature fruits , leaves , and roots of tomatoes ( 20 ) , which is consistent with the phenotypes of the auxotrophic mutants . 
+ Because tomatoes carrying the rin mutation are known to accumulate higher levels of certain amino acids ( 45 ) , we tested the hypothesis that the auxotrophy of the Salmonella mutants will be at least partially complemented within rin mutant tomatoes . 
+ Surprisingly , this did not prove to be the case , and Salmonella auxotrophic glnA , ilvD , metA , and serA mutants were even less ﬁt within rin mutant tomatoes than in the wild type . 
+ Differences between Salmonella proliferation in Campari and Ailsa Craig cultivars or between the wild type and the rin mutant could inﬂuence these results . 
+ Levels of Salmonella growth in Ailsa Craig ( wild-type ) and Campari tomatoes are not signiﬁcantly different ( 35 ) , but Salmonella growth is reduced in Ailsa Craig rin mutants ( 35 ) . 
+ This reduction in growth might indicate that the reduction in fitness of amino acid biosynthesis mutants in rin mutant tomatoes could be even larger and was masked by the limited number of cell doublings . 
+ Our observations suggest that either plant amino acids are not accessible to Salmonella spp. ( due to their compartmentalization or chemical modification ) or that Salmonella spp. can make better use of inorganic nitrogen sources during plant colonization . 
+ LPS biosynthesis was a point of convergence between the requirements for Salmonella colonization of plants and animals , as indicated by our screening and confirmed by the competition assays with rfaB , rfaI , rfbN , and rfbP mutants . 
+ The LPS role for Salmonella virulence in animals is well established ; mutants with a truncated LPS are attenuated , and Salmonella uses phoPQ and pmrAB to modify its LPS to evade the innate immunity through resistance against antibacterial peptides ( 46 ) . 
+ Plants have antibacterial peptides that are not homologous to the ones found in animals , but they share the same mechanisms of action ( reviewed in references 54 and 55 ) . 
+ These plant-based antibacterial peptides are small cationic molecules that target the cell membranes , leading to destabilization and cell death . 
+ LPS is required by plant bacterial pathogens to avoid being targeted by the plants ' antibacterial peptides . 
+ Salmonella LPS may similarly support bacterial proliferation in this environment . 
+ Another point of convergence was the biosynthesis of nucleotides and the requirement for the Salmonella pathogenicity island 1 ( SPI-1 ) , SPI-2 , and SPI-3 . 
+ Competition assays conﬁrmed phenotypes of these mutants only for high , but not low , titers . 
+ SPI-1 and SPI-2 T3SSs were shown to be important factors for plant colonization from high-inoculum ( 10^8 CFU/plant ) experiments with Arabidopsis ( 24 ) . 
+ It is possible that the phenotypes observed are related to increased intraspecies competition due to a relative scarcity of available resources within the plant . 
+ SPI-1 and SPI-2 may also be needed to subvert immunity , which may become more important as more bacterial cell numbers are invading and therefore activating the plant 's immune response . 
+ While Salmonella effectors were shown to be functional when directly transformed into cell wall-less plant protoplasts ( 22 ) , it is not clear how they are delivered across the plant cell wall . 
+ This study demonstrated that while Salmonella is capable of repurposing some of the genes it uses during animal infections to colonize alternate hosts ( such as plants ) , the overlap between the sets of genes required for animal and plant infection is only 40 to 50 % . 
+ Similarly , only partial overlap was observed when Salmonella genes required for tomato colonization were compared with the genes required for virulence of phytopathogens . 
+ Considering that 520 features required for ﬁtness of Salmonella in tomatoes have no known functions , there remains much to be learned about the ecology of this human pathogen inside alternate hosts , such as plants . 
+ MATERIALS AND METHODS
+ Bacterial strains and DNA manipulation . 
+ Bacterial strains were grown in LB broth at either 30 °C or 37 °C ( as indicated in the text ) , and antibiotics were added at the following concentrations : 100 µg/ml ampicillin , 50 µg/ml kanamycin , and 20 µg/ml tetracycline , unless otherwise indicated . 
+ A full list of the strains and plasmids used in this study is in Table 1 . 
+ Salmonella isogenic mutants used to conﬁrm screening data were constructed using Datsenko-Wanner mutagenesis ( 56 ) by replacing a speciﬁc open reading frame ( ORF ) with the kanamycin resistance cassette from pKD4 , followed by P22-mediated transduction into S. enterica serovar Typhimurium ATCC 14028 . 
+ RIVET reporters to evaluate in planta gene expression were constructed as described previously ( 57 ) ; the kanamycin cassette from isogenic mutants was removed with FLP recombinase using pCP20 , followed by the insertion of tnpR-lacZ from pCE70 or pCE71 in the remaining FLP recombination target ( FRT ) site . 
+ The tnpR-lacZ cassette , which replaced almost the entire ORF of the deleted gene , was inserted immediately after the start codon . 
+ The tetracycline resistance gene tetA , ﬂanked by the res1 sites in a neutral site of the genome , was transduced using P22 to the ﬁnal construct from the donor strain JS246 . 
+ Primers used for cassette construction and deletion confirmation are listed in Table S1 in the supplemental material . 
+ Plant material . 
+ Tomatoes of the cultivar ( cv. ) Campari were purchased at the local supermarket . 
+ Tomatoes of the cv. Ailsa Craig and its isogenic derivative rin mutant line were grown in a rooftop greenhouse . 
+ Tomatoes were tagged at anthesis and were harvested 34 days later to ensure that all tomatoes were in the same developmental stage , as described before ( 35 ) . 
+ Transposon insertion library construction . 
+ A library of S. enterica serovar Typhimurium 14028 Tn5 insertion mutants was constructed using the Epicentre EZ-Tn5 T7/Kan2 promoter insertion kit . 
+ Briefly , primers Right_Tn_T7_Kan2 and Kan2_right_code were used in a standard PCR to add N18 barcodes to both sides of the EZ-Tn5 T7/KAN-2 transposon . 
+ The PCR product was gel puriﬁed using the QIAquick gel extraction kit ( Qiagen ) , according to the manufacturer 's recommendations . 
+ About 200 ng of the purified PCR product was used in an 8-µl transposase reaction mixture in 0.17 % glycerol that also contained 1 µl of TypeOne restriction inhibitor and 2 U of EZ-Tn5 transposase . 
+ The reaction mixture was incubated at room temperature for 3 h and subsequently dialyzed against water for 30 min before electroporation into fresh electrocompetent cells of S. enterica serovar Typhimurium 14028 ( MZ1597 ) . 
+ Transformed cells were isolated on LB agar plates with kanamycin ( 60 µg/ml ) after overnight growth at 37 °C , enumerated , and collected . 
+ The ﬁnal library consisted of a pool of about 280,000 independent colonies . 
+ Mapping of library barcodes to the S. enterica serovar Typhimurium ATCC 14028 genome . 
+ Genomic DNA of the obtained library of S. enterica serovar Typhimurium ATCC 14028 Tn5 insertion mutants was prepared using the GenElute bacterial genomic DNA kit ( Sigma-Aldrich ) . 
+ The DNA was fragmented and ligated to Illumina primers using standard methods . 
+ Approximately 150 ng of this material was then used to PCR amplify the regions ﬂanking the transposon insertion sites , in a stepwise PCR regimen . 
+ In the ﬁrst PCR , primer Illumina_P5_Read1 was paired with a primer that aligned with a region close to either the left end ( primer Tn5_Left_CGTACA_Read2 ) or the right end ( primer Tn5_Right_ACATGC_Read2 ) of the inserted transposon . 
+ PCR proceeded for 20 cycles , using 1.25 U of Taq polymerase and under standard conditions . 
+ Exactly 1/10 of the 1st PCR product was subsequently used in a 2nd PCR that engaged Illumina primers Illumina_P5_Read1 and Illumina_P7 and proceeded for 10 cycles . 
+ Products were subjected to QIAquick PCR product puriﬁcation ( Qiagen ) , according to the manufacturer 's recommendations . 
+ The material was subsequently Illumina sequenced for 150-bp reads at both ends . 
+ Sequence analysis to map the barcode to the exact location on the genome is described further below . 
+ Screen optimization . 
+ To ensure that the inoculum representative of the transposon insertion library was seeded into tomatoes and that a mutant 's ﬁtness could be assessed by changes in its relative abundance , competitive ﬁtness assays with mutants known to have a phenotype in tomatoes ( hns : : kan , ΔbcsA ΔlpfA ΔfadH cysB : : kan , phoN : : kan , and rcsA : : kan ) were carried out . 
+ Although not all combinations of these mutants were tested in all preliminary experiments , we compared the impact of the different inoculum doses ( 10^4 and 10^7 CFU ) and various ratios of the mutants to the wild type . 
+ Prior to the infections into tomatoes , strains were grown for 16 h in LB broth at 37 °C and 250 rpm . 
+ The cultures were pelleted , washed in phosphate-buffered saline ( PBS ) twice , and diluted 1:10 , reaching a final density of approximately 10^8 CFU/ml . 
+ When 10^7 CFU of Salmonella was inoculated into tomatoes , we were able to reliably observe even modest decreases in fitness . 
+ Therefore , this inoculation dose was selected for the library screen . 
+ Library screening . 
+ The MZ1597 library was screened in Campari tomatoes using the tomato wound model to identify loci that affect Salmonella ﬁtness in this environment . 
+ We chose this model based on the FDA assessment that wounds and punctures are a potential route for internalization of human pathogens ( such as Salmonella ) in fresh fruits and vegetables , and specifically in tomatoes ( http://www.fda.gov/Food/GuidanceRegulation/HACCP/ucm082063.htm ) . 
+ MZ1597 cultures were grown ( with shaking at 250 rpm ) for 16 h in LB broth supplemented with kanamycin at 37 °C . 
+ The cultures were pelleted , washed in PBS twice , and diluted 1:10 , reaching a final density of approximately 10^8 CFU/ml ; 3 µl of this suspension was inoculated into three shallow ( 2- to 3-mm deep , 1 mm in diameter ) wounds in tomato pericarps ( 10^6 CFU per tomato ) by inserting a small pipette tip under the epidermis of the tomato fruits . 
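The stated inoculum per fruit can be checked with simple arithmetic. The sketch below is illustrative only: it assumes that 3 µl was delivered to each of the three wounds (the text does not say explicitly whether the volume is per wound or per fruit) and that the suspension was at exactly 10^8 CFU/ml. The function name `cfu_per_tomato` is hypothetical.

```python
def cfu_per_tomato(density_cfu_per_ml, volume_ul_per_wound, wounds):
    """Return the total CFU delivered to one fruit.

    Assumption (not stated in the text): the volume is delivered to each wound.
    1 ml = 1000 µl, hence the division by 1000.
    """
    return density_cfu_per_ml * volume_ul_per_wound * wounds / 1000


# Library screen: ~10^8 CFU/ml, 3 µl per wound, three wounds
screen_dose = cfu_per_tomato(10**8, 3, 3)
print(screen_dose)  # 900000.0
```

Under these assumptions the dose is 9 x 10^5 CFU, which rounds to the order of magnitude given in the text.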
+ Tomatoes were incubated at 22 °C and a relative humidity of 60 % for 7 days , a period of time that allows Salmonella to reach the carrying capacity of tomato fruits and approximates the time period between tomato harvest and consumption . 
+ Salmonella was recovered by collecting 1-g samples of the pericarp around the inoculation site ; samples from the same fruit were combined and homogenized in a stomacher ( Seward ) . 
+ Salmonella cells were recovered by centrifugation and were then resuspended in 50 ml of LB broth , followed by 6 h of growth at 37 °C and 250 rpm , reaching 10^8 CFU/ml . 
+ One milliliter of culture was recovered and used for library preparation . 
+ Library preparation for sequencing . 
+ Aliquots of around 5 × 10^7 CFU from input and output libraries were subjected to three washes in water , followed by proteinase K ( 100 µg ) digestion for 1 h at 55 °C in lysis buffer ( 10 mM Tris [ pH 8.0 ] , 1 mM EDTA , 0.1 % Triton X-100 ) . 
+ After inactivation of the enzyme for 10 min at 95 °C , a nested PCR regimen was performed to speciﬁcally amplify the DNA regions adjacent to the inserted transposons . 
+ Primers 1st_PCR_Tn5_EZ_left_reverse / Left_Forward_201 and Right_Reverse_fixed / Right_Forward_983 were used to amplify the left and right flanks , respectively , in a 20-cycle PCR in 50 µl of 1× Kapa HiFi reaction mixture . 
+ One microliter of the successful PCR product was then utilized as the template in a second rate-limited PCR amplification using 1.25 U of Taq ( Invitrogen ) and 0.1 mM deoxynucleoside triphosphates ( dNTPs ) , with primers 2nd_PCR_Tn5_EZ_Right_Forward and Tn_SetYY_XXXXXXXX . 
+ The second PCR ampliﬁed the ( already enriched ) right ﬂank of the transposon insertion site . 
+ Products of the second PCR were pooled and subjected to QIAquick PCR product puriﬁcation ( Qiagen ) , according to the manufacturer 's recommendation . 
+ Illumina sequencing proceeded with custom primers Tn5_EZ_Right_Seq_fixed and Tn5_EZ_Index_Seq_new for a single indexed run , with a read length of 25 bases . 
+ Sequence analysis . 
+ For mapping of the barcodes introduced into the mutants of the transposon insertion library , the Tn5 primers and the corresponding N18 sequence ﬂanking the Tn5 priming sites were trimmed from the raw sequence reads using custom Perl scripts . 
+ The trimmed reads were mapped to the S. Typhimurium 14028 reference genome using Bowtie2 . 
+ PCR duplicate reads were removed using Picard tools ( https://github.com/broadinstitute/picard ) . 
+ Only reads mapped in proper pairs were considered for further downstream analysis . 
+ The left-most position of read 2 for each mapping was extracted from the SAM alignments using custom scripts , and this was identiﬁed as the Tn5 insertion site in a strand-speciﬁc manner . 
+ The N18 barcode tag for each mapped read was then identiﬁed from the raw untrimmed reads using custom Perl scripts . 
+ The above-mentioned analysis identiﬁed the N18 barcode tag ﬂanked by conserved priming sites for each Tn5 insertion mutant . 
+ For the identiﬁcation of barcoded mutants , raw sequencing data consisted of single-end 25-bp reads . 
+ The ﬁrst 18 bases , which represented the unique N18 tag for each Tn5 mutant , were extracted , and the abundance of all unique 18-mers was calculated using custom Perl scripts . 
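The barcode-counting step described above can be sketched as follows. The authors report using custom Perl scripts, which are not shown in the source; this Python version is a stand-in written under the assumption that reads are plain strings and that the N18 tag occupies the first 18 bases of each 25-bp read. The function name and the example reads are hypothetical.

```python
from collections import Counter

BARCODE_LEN = 18  # length of the N18 tag, as stated in the text


def count_barcodes(reads, barcode_len=BARCODE_LEN):
    """Count how often each unique barcode (read prefix) occurs."""
    counts = Counter()
    for read in reads:
        read = read.strip().upper()
        if len(read) < barcode_len:
            continue  # skip reads too short to contain a full barcode
        counts[read[:barcode_len]] += 1
    return counts


# Made-up 25-bp reads: two share one barcode, one carries another
example_reads = [
    "ACGTACGTACGTACGTACGATTACA",
    "ACGTACGTACGTACGTACTTGGCCA",
    "TTTTGGGGCCCCAAAATTGATTACA",
]
barcode_counts = count_barcodes(example_reads)
```

A real pipeline would read FASTQ records and would probably filter on base quality; neither is specified in the text, so both are left out here.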
+ The abundances of all N18 barcodes mapped within each annotated genome feature were summed in a strand-speciﬁc manner . 
+ This represented the aggregated abundance for each feature in the coding strand and the noncoding strand . 
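A minimal sketch of the per-feature aggregation, under stated assumptions: each barcode has already been assigned an insertion position and strand, features are given as inclusive coordinate ranges with a strand, and "coding" means the insertion strand matches the feature strand. The source does not define these details, so the data layout and the function name `aggregate_by_feature` are hypothetical.

```python
from collections import defaultdict


def aggregate_by_feature(barcode_counts, barcode_sites, features):
    """Sum barcode abundances per (feature, orientation).

    barcode_counts: {barcode: abundance}
    barcode_sites:  {barcode: (position, strand)} from the mapping step
    features:       {name: (start, end, strand)}, coordinates inclusive
    """
    totals = defaultdict(int)
    for barcode, count in barcode_counts.items():
        site = barcode_sites.get(barcode)
        if site is None:
            continue  # barcode was never mapped to the genome
        position, strand = site
        for name, (start, end, feature_strand) in features.items():
            if start <= position <= end:
                orientation = "coding" if strand == feature_strand else "noncoding"
                totals[(name, orientation)] += count
    return dict(totals)


# Made-up example: one feature, three mapped barcodes, one unmapped
features = {"geneA": (100, 200, "+")}
sites = {"b1": (150, "+"), "b2": (160, "-"), "b3": (500, "+")}
counts = {"b1": 5, "b2": 3, "b3": 7, "b4": 1}
feature_totals = aggregate_by_feature(counts, sites, features)
```

The resulting per-feature totals for the input and output libraries are what the text describes passing to edgeR. The nested loop is fine for a sketch; an interval index would be the usual choice for a whole genome.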
+ The aggregated abundances for the input and output libraries were statistically analyzed using edgeR , and the log2 fold changes and FDRs were reported . 
+ Competition assays . 
+ Competition assays were used to estimate the ﬁtness of speciﬁc mutants . 
+ Overnight cultures of the wild type and an isogenic mutant were adjusted based on optical density at 600 nm ( OD600 ) to the same population density and combined in a 1:1 ratio . 
+ The combined cultures were washed three times in PBS and diluted 10,000-fold , and 3 µl of the diluted mixture was inoculated into the tomato pericarp in three separate shallow wounds ( 10^3 CFU per tomato ) . 
+ An aliquot of the inoculum was plated onto xylose lysine deoxycholate ( XLD ) plates to enumerate CFU . 
+ The inoculated tomatoes were incubated at 22 °C and relative humidity 60 % for 7 days , and Salmonella cells were recovered by inserting a loop into the wound and streaking the material onto XLD plates . 
+ The competition index was calculated as previously described ( 9 ) using the formula log2 ( [ MUTout/WTout ] / [ MUTin/WTin ] ) , where MUT is the number of mutant colonies recovered , WT is the number of wild-type colonies recovered , `` in '' represents the cultures inoculated in tomatoes , and `` out '' represents the colonies recovered after the experiment . 
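The formula above translates directly into code. This is a minimal sketch with a hypothetical function name and made-up colony counts; it only restates the log2 ratio given in the text and does not reproduce the authors' analysis in JMP.

```python
import math


def competition_index(mut_out, wt_out, mut_in, wt_in):
    """Return log2([MUTout/WTout] / [MUTin/WTin]).

    Negative values mean the mutant lost ground to the wild type.
    """
    if min(mut_out, wt_out, mut_in, wt_in) <= 0:
        raise ValueError("all colony counts must be positive")
    return math.log2((mut_out / wt_out) / (mut_in / wt_in))


# Made-up counts: a 1:1 inoculum, with the mutant at 20 of 100 colonies recovered
ci = competition_index(mut_out=20, wt_out=80, mut_in=50, wt_in=50)
print(ci)  # -2.0, i.e. a 4-fold reduction relative to the wild type
```

On this scale a 2-fold reduction corresponds to a CI of -1 and a 32-fold reduction to a CI of -5.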
+ The competition index ( CI ) of each isogenic mutant was compared against the CI of the neutral mutant CEC1000 using analysis of variance ( ANOVA ) . 
+ Pairwise t tests were used to compare the CI of the rcsA : : kan mutant against that of CEC1000 , the CIs of isogenic mutants in wild-type versus rin mutant tomatoes , and the CIs obtained at different inoculation titers . 
+ The software JMP version 12 was used for all CI analyses . 
+ RIVET assays . 
+ RIVET reporters were used to evaluate Salmonella gene expression in planta . 
+ In this system , gene expression is quantiﬁed as the percentage of cells that lost tetracycline resistance [ which ultimately results from the activation of the promoter of interest that leads to the transcription of recombinase tnpR , and TnpR-catalyzed excision of tetA ﬂanked by res1 sites , causing tetracycline sensitivity ( 57 ) ] . 
+ For RIVET assays in tomatoes , Salmonella strains were grown in LB broth with tetracycline for 16 h at 37 °C . 
+ Cultures were then washed extensively in PBS to remove traces of the medium and the antibiotic , and inoculum suspensions in PBS were seeded into tomato pericarps ( 10^3 cells per tomato ) in three separate shallow wounds . 
+ Cells were recovered 1 , 3 , and 7 days after inoculation using a sterile wire loop , and the material was streaked onto XLD plates . 
+ For the positive control of RIVET activation , cultures were grown in M9 agar ( 0.7 % ) supplemented with the appropriate amino acids ( leucine , isoleucine , and valine for ilvD , arginine for argA , serine for serA , glutamine for glnA , glutamate for gltB , methionine for metA , tryptophan for trpC , and threonine for thrC ) . 
+ The tested amino acid concentrations ranged from 2 mg/ml to 8 µg/ml in a 2-fold dilution series , and cells were recovered from the lowest concentration that allowed for growth of the reporter . 
+ Cultures were then streaked onto XLD . 
+ The percentage of tetracycline-sensitive Salmonella cells recovered from tomatoes and from M9 with the amino acid was estimated by plating the colonies from XLD plates onto LB tetracycline plates . 
+ Similar control experiments were done in LB agar ( 0.7 % ) , only without the supplementation with amino acids . 
+ Metabolic mapping and functional characterization . 
+ Genes required for Salmonella systemic infection in mice were retrieved from data sets deposited with the original publications using parameters established by the authors of the original papers . 
+ For the study by Chaudhuri et al. ( 37 ) , a P value of 0.05 and log2 ( fold change ) of less than 1 were used ; for the study by Santiviago et al. ( 38 ) , a P value of 0.0005 and log2 ( fold change ) of less than 0.75 were used ; and for the study by Silva et al. ( 51 ) , a P value of 0.001 and log2 ( fold change ) of less than 0.75 were used . 
+ BlastKOALA ( 58 ) was used to assign KEGG Orthology ( KO ) terms for Salmonella enterica ATCC 14028 coding sequences , and the KEGG Mapper Web interface was used to visualize metabolic pathways . 
+ Gene Ontology ( GO ) term enrichment was performed with Panther ( 59 ) , with Bonferroni 's correction . 
+ Genes required for bacterial phytopathogens to elicit disease in their preferred plant host were extracted from previously published studies ( 39 -- 42 , 44 , 60 ) , and the KO terms for them were also retrieved with BlastKOALA . 
+ It should be noted that the results of mutant screens in phytopathogens were less comprehensive than the high-density transposon screens . 
+ We also note that , generally , screens in phytopathogens identiﬁed fewer ( 10 to 200 ) functions required for virulence in plants , which may also represent more stringent screening conditions than those of the studies with high-density transposon insertion libraries . 
+ Growth in LB . 
+ To evaluate potential growth defects of isogenic mutants , we compared their growth kinetics with the wild-type strain . 
+ Cultures grown for 16 h at 250 rpm and 37 °C in LB broth were diluted to an OD600 of 0.02 in 50 ml of LB broth . 
+ Three replicate cultures of each strain were incubated at 37 °C and 250 rpm . 
+ Samples were withdrawn hourly , and the OD600 was measured with a BioSpec-mini spectrophotometer ( Shimadzu ) . 
+ SUPPLEMENTAL MATERIAL
+ Supplemental material for this article may be found at https://doi.org/10.1128/AEM.03028-16 . 
+ ACKNOWLEDGMENTS
+ We are grateful to Alex Gannon for technical assistance . 
+ We thank J. Giovannoni for sharing tomato seeds . 
+ This study was supported by funding from USDA-NIFA and the Center for Produce Safety .
\ No newline at end of file
--- a/data/TEXT_FILES/notuseful_txt/28060822.txt 0 → 100644
View file @27818a9
+++ b/data/TEXT_FILES/notuseful_txt/28060822.txt 0 → 100644
View file @27818a9
+ Transcriptome Sequencing Reveals Large-Scale Changes in Axenic Aedes aegypti Larvae 
+ Kevin J. Vogel*, Luca Valzania, Kerri L. Coon, Mark R. Brown, Michael R. Strand*
+ Department of Entomology , The University of Georgia , Athens , Georgia , United States of America 
+ access article distributed under the terms of the Creative Commons Attribution License , which permits unrestricted use , distribution , and reproduction in any medium , provided the original author and source are credited . 
+ Data Availability Statement : All relevant data are within the paper and its Supporting Information files . 
+ Transcriptome data have been deposited in the Short Read Archive under accession PRJNA340082 . 
+ Funding : This work was supported by the United States National Institutes of Health ( https://grants.nih.gov/ ) through grants R01AI106892 ( MRS ) and F32GM109750 ( KJV ) . 
+ The funders had no role in study design , data collection and analysis , decision to publish , or preparation of the manuscript . 
+ Abstract 
+ Mosquitoes host communities of microbes in their digestive tract that consist primarily of bacteria . 
+ We previously reported that Aedes aegypti larvae colonized by a native community of bacteria and gnotobiotic larvae colonized by only Escherichia coli develop very similarly into adults , whereas axenic larvae never molt and die as first instars . 
+ In this study , we extended these findings by first comparing the growth and abundance of bacteria in conventional , gnotobiotic , and axenic larvae during the first instar . 
+ Results showed that conventional and gnotobiotic larvae exhibited no differences in growth , timing of molting , or number of bacteria in their digestive tract . 
+ Axenic larvae in contrast grew minimally and never achieved the critical size associated with molting by conventional and gnotobiotic larvae . 
+ In the second part of the study we compared patterns of gene expression in conventional , gnotobiotic and axenic larvae by conducting an RNAseq analysis of gut and nongut tissues ( car-amino acid transport , hormonal signaling , and metabolism . 
+ Overall , our results indicate that axenic larvae exhibit alterations in gene expression consistent with defects in acquisition 
+ Introduction
+ Like most animals , mosquitoes host communities of microbes in their digestive tract that consist primarily of bacteria [ 1 -- 3 ] . 
+ Both field and laboratory studies indicate that most of these bacteria are aerobes or facultative anaerobes [ 3 -- 12 ] . 
+ Analysis of 16S rRNA gene amplicons of select species indicates that larvae primarily contain a subset of the bacteria in their aquatic environment , while some but not all of these bacteria are present in adults [ 4 , 7 -- 9 , 13 ] . 
+ In contrast , controlled experiments show that larvae contain no gut bacteria if they hatch from surface sterilized eggs and are maintained in a sterile environment [ 7 ] . 
+ Taken together , these findings indicate that mosquito larvae acquire most if not all of their microbiota from their environment and that they transstadially transmit some members of the bacterial community to adults . 
+ Aedes aegypti is a key vector of several human pathogens including filarial nematodes and the viruses that cause yellow fever , Dengue fever , Zika fever and Chikungunya [ 14 , 15 ] . 
+ Ae . 
+ aegypti is also an important model for many fundamental studies on mosquito development , immunity and behavior [ 16 -- 18 ] . 
+ Larvae reared under conventional ( non-sterile ) conditions and fed a nutritionally complete diet molt through four instars before pupating and emerging as adults [ 19 ] . 
+ Studies dating back to the 1920s noted that Ae . 
+ aegypti and other species of mosquito larvae contain bacteria in their gut [ 20 -- 23 ] , but conclusions regarding the role of these bacteria in development vary . 
+ Some report that bacteria are a source of nutrients or provide other factors that are required for development [ 23 , 24 ] while others report that larvae develop on both undefined and defined diets in the absence of bacteria [ 20 , 25 , 26 ] . 
+ A key challenge in interpreting these variable findings is that researchers during this period lacked the molecular tools needed to characterize the gut microbiota in mosquitoes or determine whether larvae reported to lack bacteria actually were ` germ free ' . 
+ As a result , it is also difficult to evaluate the accuracy of the findings reported . 
+ Using high-throughput sequencing approaches , we previously determined that a laboratory population of Ae . 
+ aegypti ( UGAL strain ) contains ~ 100 bacterial operational taxonomic units ( OTUs ) during the larval stage with lower bacterial diversity in adults [ 7 ] . 
+ Our experiments also indicated that axenic larvae , conclusively shown to have no bacteria , die as first instars when fed a standardized diet and maintained under sterile conditions [ 7 , 27 ] . 
+ Axenic larvae also die as first instars if standard diet is supplemented with dead bacteria or is preconditioned by co-culture with living bacteria before feeding . 
+ However , axenic larvae develop into adults if colonized by bacteria from water containing conventionally reared larvae [ 7 ] . 
+ Gnotobiotic Ae . 
+ aegypti larvae colonized individually by several members of the bacterial community in conventionally reared larvae or the non-community member Escherichia coli also develop normally with adults showing no morphological defects or reductions in fitness as measured by development time , size and fecundity [ 7 , 27 ] . 
+ Lastly , offspring from field collected Ae . 
+ aegypti and several other mosquito species host communities of bacteria that differ from laboratory cultures but exhibit the same dependency on living bacteria for development as UGAL strain Ae . 
+ aegypti [ 28 ] . 
+ Altogether , we conclude from these results that several mosquito species fail to develop if reared under axenic conditions but larvae develop normally into adults if living bacteria are present in the digestive tract . 
+ Our results further indicate that development does not depend on a particular OTU or community of bacteria in the larval digestive tract . 
+ These findings are important because they implicate gut bacteria as a key factor in the development of larvae into adults , which is the life stage that transmits vector borne pathogens to humans . 
+ Understanding the interactions between larval stage mosquitoes and gut bacteria is also important because many of the OTUs in larvae are transstadially transmitted to adults where they can affect vector competence to transmit Plasmodium and arboviruses ( summarized by [ 2 , 29 ] ) . 
+ In this study , we further assessed Ae . 
+ aegypti development by comparing the growth and abundance of bacteria in conventional larvae , gnotobiotic larvae colonized by only E. coli and axenic larvae during the first instar . 
+ Based on these data , we then performed a transcriptome analysis of larvae in each treatment as a first step to understanding how bacteria in the gut affect gene expression in first instars . 
+ Our results indicated that conventional and gnotobiotic first instars grow similarly , whereas axenic larvae do not attain the critical size associated with molting of conventional and gnotobiotic larvae to the second instar . 
+ Our transcriptome analysis further indicated that a number of genes with functions in nutrient acquisition , metabolism , and stress were differentially expressed in axenic larvae when com ¬ 
+ Materials and Methods
+ Ethics statement
+ Animal care and use are described in Animal Use Protocol A2014 12-013-R1 ( renewal 1/28/2016 ) , which was approved by The University of Georgia Institutional Animal Care and Use Committee ( IACUC ) . 
+ The UGA IACUC oversees and provides veterinary care for all campus animal care facilities and is licensed by the US Department of Agriculture ( USDA ) and maintains an animal welfare Assurance , in compliance with Public Health Service policy , through the NIH Office of Laboratory Animal Welfare , and registration with the USDA APHIS Animal Care , in compliance with the USDA Animal Welfare Act and Regulations , 9 CFR . 
+ IACUC personnel attend to all rodent husbandry under strict guidelines to ensure careful and consistent handling . 
+ The University of Georgia 's animal use policies and operating procedures facilitate compliance with applicable federal regulations , guidance , and state laws governing animal use in research and teaching including : 1 ) The Animal Welfare Act , 2 ) Public Health Service ( PHS ) Policy on the Humane Care and Use of Laboratory Animals , 3 ) United States Government Principles for the Utilization and Care of Vertebrate Animals Used in Testing , Research and Training , 4 ) Guide for the Care and Use of Laboratory Animals , 5 ) Guide for the Care and Use of Agricultural Animals in Research and Teaching , 6 ) American Veterinary Medical Association Guidelines for the Euthanasia of Animals , and 7 ) Applicable Georgia laws . 
+ Insects
+ UGAL Ae . 
+ aegypti were maintained as previously described by feeding larvae a standardized , nutritionally complete diet ( 1:1:1 rat chow : lactalbumin : torula yeast ) and blood-feeding adult females on an anesthetized rat [ 30 ] . 
+ Anesthetization of rats ( Sprague-Dawley strain ) obtained from Charles River Laboratories for mosquito blood feeding was performed and monitored by trained personnel as in Animal Use Protocol A2014 12-013-R1 . 
+ All larvae used in the study hatched from eggs that were surface sterilized using previously developed methods [ 7 ] . 
+ In brief , eggs laid 5 -- 7 days previously were submerged in a sterile petri dish containing 70 % ethanol in water for 5 min followed by transfer to a second petri dish containing a solution of 3 % bleach and 0.1 % ROCCAL-D ( Pfizer ) in sterile water for 3 min , followed by a second wash in 70 % ethanol for 5 min . 
+ Surface sterilized eggs were then transferred to a new sterile petri dish and washed 3 times with 10 ml of sterile water followed by transfer to a sterile 10 cm^2 culture flask containing 15 ml sterile water and allowed to hatch for 1 hour . 
+ Axenic larvae that hatched from eggs were transferred to culture flasks that contained 10 mg of our standard rearing diet that had been sterilized by gamma-irradiation [ 7 ] . 
+ Conventional larvae were produced by adding 1 ml of water from the general lab culture to a culture flask containing axenic larvae . 
+ Gnotobiotic larvae colonized by only E. coli were produced by adding 10^8 CFUs from an overnight culture of the K12 strain ( National BioResource Project : E. coli / B. subtilis , National Institute of Genetics , Shizuoka , Japan ) to culture flasks containing axenic larvae . 
+ When fed a nutritionally complete diet under controlled temperature and photoperiod , Ae . 
+ aegypti larvae molt at predictable intervals with each instar being distinguished by the width of the head capsule [ 19 ] . 
+ To distinguish key traits within the first instar we monitored the growth of conventional , gnotobiotic and axenic larvae by placing newly hatched individuals in 24 well culture plates containing sterilized diet and water . 
+ Cohorts of larvae were then observed every 2 h for behavioral and morphological characters associated with feeding , apolysis , and ecdysis . 
+ Larval length was measured from the anterior border of the head to the posterior border of the last abdominal segment , which precedes the siphon tube . 
+ We also measured the width of the head capsule and prothorax from the dorsal side at their widest point . 
+ All measures were made using a Leica stereomicroscope fitted with an ocular micrometer . 
+ Critical size , which is defined as the point within an instar when a larva achieved sufficient size to molt , was confirmed by transferring larvae from wells containing diet at specific times post-hatching to wells containing only sterile water . 
+ The number of larvae that molted to the second instar was then determined . 
+ Bacterial abundance and immunofluorescence microscopy
+ We estimated the number of bacteria in conventional , gnotobiotic and axenic first instars by two methods : colony count analysis of culturable bacteria and quantitative real time PCR ( qPCR ) . 
+ Colony count data were generated as previously described [ 7 ] by collecting and surface sterilizing larvae at 18 h post-hatching followed by homogenization in LB broth and culturing on LB plates at 27 ˚C for 72 h . 
+ The number of bacterial colonies was then counted . 
+ For qPCR assays , an absolute standard curve was generated by PCR amplification using the universal bacterial 16S primers HDA1 ( ACTCCTACGGGAGGCAGCAGT ) and HDA2 ( GTATTACCGCGGCTGCTGGCA ) [ 31 ] and bacterial DNA from K12 E. coli as template followed by TOPO-TA cloning of the product as previously described [ 32 ] . 
+ After propagation in E. coli , plasmid was purified using the GeneJet Miniprep kit ( Thermo Scientific ) . 
+ A standard curve was then generated by serial dilution of the plasmid ( 10^7 -- 10^2 copies ) and qPCR analysis . 
+ Bacterial DNA was then isolated from individual conventional , gnotobiotic and axenic larvae as previously described [ 7 ] followed by qPCR using the same primers and fitting the data to the standard curve to estimate bacterial abundance via amplicon copy number [ 32 ] . 
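+ The standard-curve step described above can be sketched as follows . This is only a minimal illustration , assuming a log-linear relationship between threshold cycle ( Ct ) and plasmid copy number ; the Ct values , the function names fit_standard_curve and copies_from_ct , and the dilution series used here are hypothetical and are not taken from the study .

```python
import math


def fit_standard_curve(copies, ct_values):
    """Least-squares fit of Ct = slope * log10(copies) + intercept."""
    xs = [math.log10(c) for c in copies]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ct_values) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ct_values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def copies_from_ct(ct, slope, intercept):
    """Invert the fitted line to estimate amplicon copy number from a Ct."""
    return 10 ** ((ct - intercept) / slope)


# Hypothetical plasmid dilution series (10^7 down to 10^2 copies) with
# made-up Ct values spaced 3.3 cycles per ten-fold dilution.
standard_copies = [10 ** e for e in range(7, 1, -1)]
standard_cts = [12.0, 15.3, 18.6, 21.9, 25.2, 28.5]

slope, intercept = fit_standard_curve(standard_copies, standard_cts)

# Estimate the copy number of an unknown sample from its (made-up) Ct.
unknown_copies = copies_from_ct(21.9, slope, intercept)
```

+ In practice the fit would be checked for linearity and amplification efficiency before unknown samples are read off the curve .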
+ Digestive tracts were dissected for immunofluorescence microscopy from conventional , gnotobiotic and axenic larvae at 18 h post-hatching in phosphate buffer saline ( PBS , pH 7.4 ) . 
+ Samples were fixed in 4 % paraformaldehyde in PBS for 20 min at room temperature . 
+ After rinsing three times in PBS , guts were dehydrated in ethanol , permeabilized for 20 min in PBS plus 0.2 % Triton X-100 ( PBT ) , and then rewashed three times in PBT . 
+ After blocking for 1 h in PBS containing 5 % goat serum ( Sigma ) and 0.1 % Tween 20 ( vol/vol ) ( PBS-GS-T ) , samples were incubated overnight at 4 ˚C with a mouse anti-peptidoglycan primary antibody ( GTX39437 , GeneTex ) diluted 1:200 in PBS-GS-T . 
+ After washing three times for 10 min in PBS-GS-T , samples were incubated at room temperature for 2 h with an Alexa Fluor 488 goat anti-mouse secondary antibody ( Thermo Fisher ) diluted 1:2000 in PBS-GS-T . 
+ After three washes in PBS , samples were incubated overnight at 4 ˚C with a Cy3-labeled chitin binding protein [ 33 ] diluted 1:5 , followed by rinsing in PBS , and mounting on slides in 50 % glycerol diluted in PBS containing 1 μg/ml Hoechst 33342 ( Sigma ) . 
+ Samples were then examined using a Zeiss LSM 710 inverted confocal microscope with acquired images processed using Adobe Photoshop CS4 . 
+ Gnotobiotic larvae colonized by K12 strain E. coli that constitutively expressed green fluorescent 
+ RNA preparation for transcriptome studies
+ Flasks of larvae containing conventional , gnotobiotic or axenic larvae were prepared and then used to produce RNA samples for sequencing libraries . 
+ This was done by dissecting 50 larvae per biological replicate at 22 h post-hatching in sterile PBS . 
+ Larval heads were removed and the digestive tract from each larva was collected to produce a gut sample , while the remainder of each larva formed a non-gut ( carcass ) sample , which consisted primarily of fat body , cuticular epithelium , the nervous system , and trachea . 
+ Each gut and carcass from a given larva was transferred to an RNase-free 1.5 ml tube . 
+ Total RNA was then extracted from each sample using TRIzol ( Life Technologies ) according to the manufacturer 's instructions followed by two DNase treatments using the TURBO DNA-free kit ( Life Technologies ) . 
+ RNA integrity was assessed on a BioAnalyzer ( Agilent ) using a Eukaryotic Total mRNA Nano chip . 
+ Library preparation, sequencing, and data analysis
+ Stranded , paired-end libraries ( 75 bp ) were constructed at the University of Georgia Genomics Core Facility for each of 18 samples : three replicates per treatment ( axenic , conventional , and gnotobiotic ) for each tissue ( gut and carcass ) . 
+ Each library was barcoded and equal amounts of the libraries were pooled and sequenced on an Illumina NextSeq mid-output flowcell . 
+ Resulting FASTQ sequences were de-multiplexed and quality filtered using the FASTX-toolkit ( http://hannonlab.cshl.edu/fastx_toolkit/ ) . 
+ Reads with Phred-equivalent scores of < 30 ( corresponding to a per-base error rate of 0.1 % ) for any base were omitted from further analysis . 
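+ The relationship between a Phred score and the per-base error rate , and the any-base filtering rule stated above , can be sketched as follows . This is an illustrative sketch rather than the FASTX-toolkit implementation ; the function names and the assumption of Phred+33 quality encoding are ours .

```python
def phred_to_error(q):
    """Per-base error probability for Phred score q: P = 10^(-q/10)."""
    return 10 ** (-q / 10)


def decode_phred33(quality_string):
    """Decode a FASTQ quality string, assuming Phred+33 ASCII encoding."""
    return [ord(ch) - 33 for ch in quality_string]


def passes_filter(qualities, min_q=30):
    """Keep a read only if no base has a score below min_q."""
    return all(q >= min_q for q in qualities)


# Phred 30 corresponds to an error probability of 0.001, i.e. 0.1 %.
error_at_q30 = phred_to_error(30)

# 'I' encodes Q40 and '5' encodes Q20 under Phred+33 (hypothetical reads).
keep_read = passes_filter(decode_phred33("IIII"))
drop_read = passes_filter(decode_phred33("II5I"))
```

+ Under this rule a single low-quality base is enough to discard a read , which is consistent with the large drop in read number after filtering reported in the Results .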
+ Reads were then re-paired and mapped to the Ae . 
+ aegypti genome ( [ 34 ] ; assembly AaegL3 , geneset AaegL3.3 ) using TopHat2 [ 35 ] . 
+ Read counts and differential expression were determined using the Cufflinks package [ 36 ] . 
+ This generated fragments per kilobase of transcript per million reads mapped ( FPKM ) values for Ae . 
+ aegypti gene expression . 
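+ As a reminder of what an FPKM value expresses , the basic normalization can be written out as below . This is a simplified sketch of the definition only ; Cufflinks itself estimates abundances with a statistical model and effective transcript lengths , and the function name and example numbers here are hypothetical .

```python
def fpkm(fragments, transcript_length_bp, total_mapped_fragments):
    """Fragments per kilobase of transcript per million mapped fragments."""
    kilobases = transcript_length_bp / 1_000
    millions = total_mapped_fragments / 1_000_000
    return fragments / (kilobases * millions)


# Hypothetical example: 500 fragments on a 2,000 bp transcript in a
# library with 10 million mapped fragments.
example_fpkm = fpkm(500, 2_000, 10_000_000)
```

+ Normalizing by both transcript length and library size is what allows expression values to be compared across genes and across samples of different sequencing depth .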
+ This analysis also identified novel transcripts not present in the L3.3 annotation of the Ae . 
+ aegypti genome [ 36 ] . 
+ Unannotated transcripts were further analyzed using TransDecoder , which is part of the Trinity package [ 37 ] and identifies potential protein-coding genes . 
+ Gene Ontology ( GO ) terms were 
+ Data analyses
+ Larval growth and bacterial colony count assays were analyzed by either one-way analysis of variance ( ANOVA ) followed by post-hoc Tukey-Kramer Honest Significant Difference ( HSD ) tests or Fisher 's Exact Test using R ( http://www.r-project.org/ ) . 
+ Pairwise analyses between treatments and tissues of transcript abundance data were performed in Cufflinks and signifi ¬ 
+ Results
+ Conventional and gnotobiotic first instars grow similarly whereas axenic first instars exhibit reduced growth 
+ All first instars hatched with an average head-capsule diameter of 281.7 ± 9.8 ( SE ) μm . 
+ Conventional and gnotobiotic larvae began feeding within 1 h of hatching ( 0 h ) which continued for ~ 16 h post-hatching as evidenced by the presence of food in the gut and a corresponding increase in body size as measured by length ( Fig 1A ) . 
+ We also noted that the width of the prothorax was less than the width of the head capsule at hatching but by 16 h was greater than the width of the head capsule ( Fig 1A ) . 
+ These morphological features at 16 h post-hatching were associated with individuals becoming somewhat more sedentary and also not increasing further in length until after molting to the second instar ( Fig 1A ) . 
+ Ecdysis to the second instar occurred on average at 23.5 ± 1.2 h for conventional and 23.4 ± 0.9 h for gnotobiotic larvae ( t = 0.3 ; P > 0.1 ) . 
+ Collectively , we interpreted these data as suggesting that conventional and gnotobiotic larvae achieved critical size and initiated apolysis at a similar time in the first instar ( ~ 16 h ) , which resulted in larvae from both treatments also molting to the second instar at near identical times . 
+ Experimental support for these conclusions derived from transferring conventional and gnotobiotic larvae at different times post-hatching to wells without food and assessing whether or not they could molt to the second instar . 
+ Results showed that no larvae in either treatment molted if transferred to wells without food prior to 16 h , whereas ~ 50 % molted if transferred at 18 h , and > 85 % molting if transferred at 20 h by length and the ratio of thorax : head capsule width , which remained < 1 ( Fig 1A ) . 
+ In turn , no axenic larvae ever molted , which resulted in all individuals ultimately dying as first 
+ Conventional and gnotobiotic first instars contain similar numbers of bacteria that are similarly distributed in the gut 
+ Previous studies indicated that conventionally reared Ae . 
+ aegypti larvae contain Gram-negative aerobes or facultative anaerobes that are obtained from the water where they feed [ 7 , 28 ] . 
+ Several of these OTUs as well as E. coli used to colonize gnotobiotic larvae can also be cultured on Luria Broth ( LB ) plates at 27 ˚C [ 7 ] . 
+ We therefore used a colony count assay as a first step to estimating the number of bacteria in individual larvae at 18 h post-hatching . 
+ Results indicated that the mean number of bacteria culturable on LB plates was higher in conventional ( 5374.9 ± 550 ( SE ) ) than gnotobiotic larvae ( 2632.6 ± 414.4 ) but this difference was not significant due to inter-individual variation ( Fig 2A ) . 
+ As expected , no culturable bacteria were present in axenic larvae ( Fig 2A ) . 
+ Since some bacteria in conventional larvae are potentially not culturable on LB plates , we also estimated bacterial abundance using culture-independent qPCR and universal primers that amplify a conserved region of the bacterial 16S rRNA gene . 
+ 16S gene copy number did not significantly differ between conventional ( 19,852 ± 3,841 16S copies ) and gnotobiotic ( 15,418 ± 3,841 16S copies ) larvae , and no 16S amplicons were generated from axenic larvae many bacteria encode multiple 16S operons [ 38 , 39 ] and individual cells can be polyploid [ 40 ] . 
+ qPCR can also capture DNA from both living and dead bacteria . 
+ The impact of copy number is well illustrated by K12 E. coli , which is fully culturable on LB plates but contains 7 16S rRNA operons [ 38 ] . 
+ Dividing the mean 16S copy number for gnotobiotic larvae by 7 yielded a value of 2203 , which was very similar to the estimate generated by colony count . 
+ We did not know 16S copy numbers for each of the OTUs in conventional larvae but the same reasoning suggested qPCR estimates were consistent with colony count data . 
+ It also suggested that the higher values generated by qPCR versus colony counts more likely reflected 16S copy number than an abundance of bacteria that were not culturable under the conditions we used . 
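+ The copy-number correction used in the reasoning above amounts to a single division , sketched here with the values reported in the text . The function name is hypothetical , and the calculation assumes one genome per cell , which the text itself notes may not hold for polyploid cells .

```python
def cells_from_16s(copies, operons_per_genome):
    """Estimate cell number from 16S copies, assuming one genome per cell."""
    if operons_per_genome <= 0:
        raise ValueError("operons_per_genome must be positive")
    return copies / operons_per_genome


# Mean 16S copy number for gnotobiotic larvae (15,418) divided by the
# 7 16S rRNA operons of K12 E. coli, as described in the text.
gnotobiotic_cells = cells_from_16s(15_418, 7)
print(round(gnotobiotic_cells))  # 2203
```

+ The same correction cannot be applied directly to conventional larvae because the operon number of each OTU in the community is not known .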
+ each time point with multiple comparisons performed by Tukey-Kramer HSD test . 
+ Conventional and gnotobiotic larvae did not differ for either size measure at any time point . 
+ An asterisk ( * ) indicates the time points where axenic larvae significantly differ from the gnotobiotic and conventional treatments ( P ≤ 0.01 ) . 
+ To the right of each graph are drawings of 1 and 18 h post-hatching first instars showing where length ( L ) , head capsule width ( HW ) and thorax width ( TW ) were measured . 
+ ( B ) Proportion of gnotobiotic and conventional larvae that molt to the second instar after transfer to wells containing water but no food . 
+ Individual larvae from each treatment were removed from culture plates containing food at two hour intervals , rinsed 3x in sterile water , and transferred to wells of a 24-well culture plate containing sterile water only . 
+ The proportion of larvae molting to the second instar was recorded at 36 h post-hatching . 
+ A minimum of 24 individuals was assayed for each treatment per time point . 
+ There were no differences between the proportion of gnotobiotic and conventional larvae that molted at any time point ( Fisher 's exact test : P > 0.05 ) . 
+ We examined the distribution of bacteria in the digestive tract of conventional and gnotobiotic larvae using an anti-peptidoglycan antibody , a Cy3-labeled chitin binding protein that labeled the peritrophic matrix , and Hoechst 33342 that labeled gut cell nuclei ( Fig 3 ) . 
+ In the case of gnotobiotic larvae , distribution was also visualized using E. coli that constitutively expressed GFP ( S1 Fig ) . 
+ Results showed the presence of bacteria in the foregut , midgut and hindgut of conventional and gnotobiotic larvae ( Fig 3 , S1 Fig ) . 
+ All bacteria in the midgut also resided within the endoperitrophic space formed by the peritrophic matrix ( Fig 3 , S1 Fig ) . 
+ Anti-peptidoglycan and GFP signal intensity were similar between conventional and gnotobiotic larvae , which was consistent with our colony count and qPCR data that did not detect any differences in bacteria abundance ( Fig 3 , S1 Fig ) . 
+ Higher magnification images also clearly indicated that anti-peptidoglycan bound to particles in the endoperitrophic space that morphologically appeared to be rod-shaped bacteria ( Fig 3 ) . 
+ In contrast , anti-peptidoglycan did not detect any bacteria that were in contact with midgut cells ( Fig 3 ) . 
+ As expected , anti-peptidoglycan did not bind to any particles in the guts of axenic larvae but binding of Cy3-labeled chitin binding protein clearly showed that the midgut of axenic larvae was lined with a peritrophic matrix ( Fig 3 ) . 
+ Transcriptional profiling
+ We used Illumina sequencing to transcriptionally profile conventional , gnotobiotic and axenic first instars at 22 h post-hatching which was a time point that preceded molting of conventional and gnotobiotic first instars , whereas axenic larvae remained below critical size ( see Fig 1 ) . 
+ We also profiled the gut and carcass in each of these treatments separately . 
+ Three biological replicates per treatment and two tissue sources ( gut and carcass ) resulted in a total of 18 samples for which sequencing libraries were produced and analyzed . 
+ An average of 45.2 million reads were generated per sample ( range : 166 -- 10.7 ) , which was reduced to an average of 6.3 million paired reads ( range 9.9 -- 4.4 ) after quality filtering ( S1 Table ) . 
+ This resulted in a total of 15.8 to 22.9 million quality filtered reads per treatment ( S1 Table ) of which 67.8 % on average mapped to the current assembly of the Ae . 
+ aegypti genome ( AaegL3 ) using TopHat2 ( S2 Table ) . 
+ Of the 18,293 transcripts that are annotated in the Ae . 
+ aegypti reference genome , 13,551 had an FPKM ≥ 1 in one or more of our samples . 
+ A total of 1,353 transcripts were identified that did not map to the L3 annotation of the Ae . 
+ aegypti genome ( Fig 4A ) . 
+ Using TransDecoder , 164 of these had predicted open reading frames that were > 100 amino acids ( AA ) , which we searched against the NCBI nr database . 
+ BLAST results detected a hit to an annotated insect gene with a bit score > 100 for 125 of these transcripts , which we interpreted as evidence they likely derive from protein coding genes that are absent from the current annotation of the Ae . 
+ aegypti genome ( S2 Table ) . 
+ However , only 3 of these likely protein-coding transcripts were differentially expressed among treatments ( Fig 4A ) . 
+ One of these was a conserved hypothetical protein that was more abundant in the gut and carcass of axenic versus conventional and gnotobiotic larvae . 
+ The second was a putative structural component of cuticle that was also more abundant in the carcass of axenic larvae . 
+ The third was a transcript significantly upregulated in the gut of axenic larvae that was most similar to the Culex quinquefasciatus gene schnurri , a regulatory factor in the decapentaplegic pathway implicated as a negative regulator of intestinal stem cell proliferation in the midgut of D. melanogaster [ 41 ] . 
+ The remaining 1,228 unannotated transcripts were presumptive non-coding RNAs of which 253 were classified using PLEK [ 42 ] as long , non-coding RNAs ( Fig 4A ) . 
+ To examine the number of genes that were differentially expressed between treatments , we first limited our consideration to loci with an FPKM of 10 or higher in one condition . 
+ Among the three treatments , this resulted in the number of significantly differentially expressed genes ranging from 1,328 between conventional and axenic carcasses to 228 between axenic and gnotobiotic carcasses ( Fig 4B ) . 
+ We noted that more genes were significantly up-regulated ( 995 ) than down-regulated ( 84 ) in the carcasses of axenic larvae when compared to conventional larvae ( Fig 4B ) . 
+ This was also the case when comparing the carcasses of axenic and gnotobiotic larvae ( Fig 4B ) . 
+ In contrast , the number of up-regulated versus down-regulated genes was less distinctly different between the carcasses of conventional and gnotobiotic larvae or the guts of axenic , conventional , and gnotobiotic larvae ( Fig 4B ) . 
+ Transcripts with an FPKM that was > 10 in axenic but < 1 in gnotobiotic or conventional larvae were classified as preferentially and highly up-regulated under axenic rearing conditions . 
+ Only 21 loci met these criteria with 6 being detected in the gut , 15 in the carcass , and none in both tissues . 
+ Moreover , only 3 of these loci mapped to annotated genes while 2 generated significant BLAST hits to known insect proteins . 
+ These included one acyl-CoA transferase expressed in the gut ( AAEL006672 ) , a second acyl-CoA transferase expressed in the carcass ( AAEL000466 ) , and a heat-shock 70 ( HSP70 ) gene ( AAEL017978 ) also expressed in the carcass . 
+ The two unannotated transcripts with significant BLAST hits were a predicted diacylglycerol kinase and an asparagine synthetase that were both expressed in the gut . 
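+ The classification rule described above ( FPKM > 10 in axenic larvae but < 1 in the other treatments ) can be expressed as a simple filter . This sketch assumes the rule requires an FPKM below 1 in both the gnotobiotic and the conventional samples , which is our reading of the text ; the locus names and FPKM values are invented for illustration .

```python
def axenic_specific(fpkm_table, high=10.0, low=1.0):
    """Return loci with FPKM > high in axenic and < low in both other treatments."""
    return sorted(
        locus
        for locus, values in fpkm_table.items()
        if values["axenic"] > high
        and values["gnotobiotic"] < low
        and values["conventional"] < low
    )


# Hypothetical FPKM values for one tissue (not data from the study).
example_table = {
    "locus_a": {"axenic": 42.0, "gnotobiotic": 0.3, "conventional": 0.1},
    "locus_b": {"axenic": 15.0, "gnotobiotic": 2.5, "conventional": 0.4},
    "locus_c": {"axenic": 8.0, "gnotobiotic": 0.0, "conventional": 0.0},
}

selected = axenic_specific(example_table)
```

+ Applying the filter separately to gut and carcass tables would give per-tissue lists that can then be intersected to check for loci detected in both tissues .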
+ The other 17 loci were unannotated with no significant BLAST hits , which suggested they were a principal components analysis ( PCA ) that included all genes with an FPKM value ≥ 1 that were differentially expressed ( log2 fold change ≥ 2 ) in at least one of the comparisons shown in Fig 4B ( see also S3 -- S5 Tables ) . 
+ The first component , explaining 44.8 % of the variation in our data , separated the samples by tissue type , which not surprisingly showed within each treatment that the differentially expressed genes identified in gut and carcass samples largely did not overlap ( Fig 4C ) . 
+ The second component , which explained 28.9 % of the variation in the data , separated the samples by treatment ( Fig 4C ) . 
+ This indicated that the gut and carcass samples from axenic larvae most differed from conventional larvae . 
+ However , the pool of differentially expressed genes in conventional and gnotobiotic larvae also did not overlap even though larvae in both treatments grew and molted to the second instar near identically . 
+ By extracting global classification of gene ontology ( GO ) terms from VectorBase , we determined that most differentially expressed genes ( log2 fold change ≥ 2 ) in Fig 4B belonged to 7 functional categories : cell cycle , chitin/cuticle formation , metabolism , oxidoreductases , peptidases , signaling , and transport . 
+ Up-regulated genes in the guts and carcasses of axenic larvae were most enriched in the categories of metabolism , transport , and oxidoreductases . 
+ Most up-regulated genes in the category of oxidoreductases were cytochrome P450 enzymes ( CYPs ) rather than genes associated with the formation or neutralization of reactive oxygen species ( S3 -- S5 Tables ) . 
+ Down-regulated genes in the guts of axenic larvae were most enriched for peptidases , while in the carcass they were most enriched for the category of chitin/cuticle ( Fig 4D ) . 
+ Altogether , these results indicated that both the absence of bacteria in axenic larvae and the type of bacteria in conventional versus gnotobiotic larvae affected gene expression in Ae . aegypti first instars . 
+ They also indicated gene expression was affected in 
+ Select peptidases are down-regulated in the guts of axenic larvae while several amino acid transporters are up-regulated 
+ We next focused on genes in a subset of the categories shown in Fig 4D to gain additional insights into factors that potentially contribute to the disabled growth of axenic larvae . 
+ The Ae . 
+ aegypti genome contains hundreds of peptidases but this category was of interest because of the known role peptidases play in digestion and the finding that several peptidase genes were significantly down-regulated in axenic larvae . 
+ The functional literature on digestive peptidases in Ae . 
+ aegypti is restricted to adult females where the principal enzymes identified in bloodmeal digestion are select trypsin-like serine peptidases [ 43 -- 47 ] . 
+ However , additional trypsins or trypsin-like genes expressed in larvae have also been identified through PCR-based , expressed sequence tag ( EST ) , or transcriptome data sets prepared from whole body samples [ 48 -- 51 ] . 
+ The first important feature our data set revealed was that most peptidases previously identified in bloodmeal digestion were not expressed in the guts of conventional , gnotobiotic or axenic first instars ( Fig 5A ) . 
+ Instead , several other peptidase genes exhibited FPKM values ≥ 50 in the gut of each treatment , while all of the peptidases with significantly lower FPKM values in axenic versus conventional and gnotobiotic larvae were serine or leukotriene-C4-hydrolases ( Fig 5A ) . 
+ Comparing these results with another RNAseq data set [ 16 ] indicated these down-regulated peptidase genes are not expressed in the guts or carcasses of adults either before or after consumption of a blood meal . 
+ In addition , none of these genes with the exception of AAEL007926 had previously been reported to be differen ¬ 
+ The second category of interest from the perspective of digestion and nutrient acquisition was transmembrane transporters . 
+ Due potentially to lower expression of certain peptidases , several heavy subunit and proton-coupled amino acid ( AA ) transporter genes plus one glucose transporter had significantly higher mean FPKM values in the guts of axenic versus conventional or gnotobiotic larvae ( Fig 5B ) . 
+ In contrast , transcript abundance of one sugar transporter was much higher in the guts of gnotobiotic than conventional or axenic larvae ( Fig 5B ) . 
+ Several AA transporter genes as well as select neurotransmitter and sterol transporter genes were also significantly up-regulated in the carcasses of axenic larvae relative to conventional and/or gnotobiotic larvae ( Fig 5B ) . 
+ Neurotransmitter transporters are involved in the degradation of neurotransmitters in the nervous system , and sterol transporters aid uptake and incorporation of sterols into cell and organelle membranes . 
+ Axenic larvae exhibit altered expression of genes with roles in growth , molting and metabolic signaling 
+ While many genes with metabolic or signaling functions were differentially expressed between treatments , the proportion of these genes that were significantly up- or down-regulated exhibited no obvious patterns when examined by GO category distribution alone ( Fig 4D ) . 
+ However , certain patterns did emerge when we focused on genes within these categories with essential roles in growth and molting . 
+ The first of these gene groups that we examined focused on ecdysteroids , which regulate molting and affect larval growth [ 52 ] , juvenile hormone ( JH ) , which influences ecdysteroid function and also affects growth [ 53 ] , and select other peptide hormones with roles in ecdysone and JH biosynthesis or other aspects of molting [ 54 ] . 
+ Cholesterol either stored or from the diet is converted into ecdysteroids through early steps catalyzed by shroud , a short-chain dehydrogenase/reductase , and neverland , a Rieske oxygenase , and later by the Halloween CYPs ( shadow , spook , disembodied , phantom , and shade ) [ 55 ] . 
+ Only shroud exhibited higher transcript abundances in the carcasses of conventional and gnotobiotic larvae when compared to axenic larvae ( Fig 6A ) . 
+ In contrast , shade , which catalyzes the conversion of ecdysone to 20-hydroxyecdysone in target tissues , was significantly more abundant in the carcasses of axenic larvae as were several downstream components of the ecdysone signaling pathway such as the ecdysteroid receptor ( ecr ) , its partner ultraspiracle , and the downstream factor e75 ( Fig 6A ) . 
+ Other peptide hormones and associated receptor genes with roles in regulating ecdysone biosynthesis such as prothoracicotropic hormone ( ptth ) , or molting such as bursicon and eclosion hormone , were not differentially expressed ( Fig 6A ) . 
+ No significant differences were detected in mean FPKM values of allatotropin , allatostatins , or their receptors , which positively and negatively regulate JH biosynthesis in Ae . aegypti [ 56 -- 58 ] ( Fig 6A ) . 
+ Genes for key JH biosynthetic and metabolic enzymes including putative 3-hydroxy-3-methylglutaryl CoA reductase ( hmgr ) , farnesoic acid O-methyltransferase ( famet ) , and multiple predicted JH esterases also exhibited few differences among treatments ( Fig 6A ) . 
+ In contrast , Ae . aegypti encodes multiple members of the takeout gene family , several of which are annotated as JH binding proteins ( JHBPs ) in VectorBase ( jhbpto ) and were among the most strongly upregulated genes in the carcasses and guts of axenic larvae when compared to conventional or gnotobiotic larvae ( Fig 6A ) . 
+ However , takeout genes overall share similarity with odorant binding proteins ( OBPs ) , lipocalins and a putative JHBP ( JP29 ) in Manduca sexta . 
+ Thus Takeout proteins are more broadly classified as putative hydrophobic ligand binding proteins [ 59 ] . 
+ The actual ligands for takeout gene family members are unknown in any insect , but studies in Drosophila implicate takeout in feeding and longevity , while also showing that starvation strongly upregulates takeout expression [ 60 ] . 
+ In addition to ecdysteroids and JH , growth and metabolism in insects involves the insulin signaling pathway , which converges with amino acid sensing and the target of rapamycin ( TOR ) pathway . 
+ FPKM values for several genes in the insulin and TOR pathways were significantly higher in the guts and carcasses of axenic versus conventional or gnotobiotic larvae ( Fig 6B ) . 
+ Particularly striking were the increases in mean FPKM values for the insulin receptor ( mir ) , foxo , and the FOXO target 4e-bp , which are up-regulated in several vertebrates and invertebrates including Ae . aegypti in response to starvation or reduced nutrient availability [ 61 -- 64 ] . 
+ No differences in expression of mir and foxo were detected when conventional and gnotobiotic larvae were compared to one another . 
+ However , select other insulin and TOR pathway genes exhibited higher mean FPKM values in gnotobiotic than conventional larvae , although fold differences were usually smaller than in comparisons between axenic and conventional or gnotobiotic larvae ( Fig 6B ) . 
+ Altered expression of genes in the insulin and TOR pathways in association with starvation is often coupled with up-regulated expression of genes in energy-producing metabolic pathways such as glycolysis , fatty acid metabolism , and fatty acid oxidation [ 61 ] . 
+ Mean FPKM values for several genes in each of these processes were significantly up-regulated in the guts and carcasses of axenic larvae when compared to conventional larvae ( Fig 6C -- 6E ) . 
+ A lesser number of these genes were also significantly up-regulated in axenic larvae when compared to gnotobiotic larvae ( Fig 6C -- 6E ) . 
+ signaling pathway ( yellow circles ) or both pathways ( yellow/purple circles ) . 
+ While Ae . aegypti produces 8 insulin-like peptides ( ILPs ) , only one is included in the heatmap because the other ILP genes are only known from expressed sequence tags ( ESTs ) and are not annotated . 
+ ( C ) Genes with functions in fatty acid metabolism . 
+ ( D ) Genes with functions in fatty acid β-oxidation . 
+ ( E ) Genes with functions in glycolysis . 
+ Labeling for each heatmap is as described in Fig 3B with gene names listed by abbreviation if well defined in the literature and VectorBase or by full spelling if not . 
+ Color range in the heatmap indicates log2 fold change ( fc ) . 
+ Black boxes surrounding entries in the heatmap indicate FPKM values that significantly differed ( P 0.05 ) between a given pairwise treatment . 
+ Several cuticular protein genes are upregulated in axenic larvae 
+ Insects including mosquitoes encode a diversity of cuticular proteins ( CPs ) that interact with chitin to form cuticle and/or the peritrophic matrix of the midgut [ 65 ] . 
+ A total of ten CP families are currently recognized on the basis of different motifs . 
+ These include two families distinguished by Rebers and Riddiford ( RR ) consensus sequences ( CPR1 , 2 ) [ 66 ] , two others that are classified as Cuticular Proteins Analogous to Peritrophins ( CPAP1 , 3 ) , four CP families of low complexity ( CPLCA , G , W , C ) , and two families designated as CPF and CPT ( = Tweedle ) ( Fig 7 ) . 
+ Using the CP accessions curated by Ioannidou et al. [ 65 ] , we determined that each had at least one member that was differentially expressed between treatments , which suggested gut bacteria broadly affect CP gene expression ( Fig 7 ) . 
+ Transcript abundance of many CP genes was significantly higher in the carcasses of axenic versus conventional and gnotobiotic larvae . 
+ However , several of the same CP genes were also differentially expressed between conventional and gnotobiotic larvae ( Fig 7 ) . 
+ Few immune genes are differentially expressed among treatments 
+ Prior work establishes that bacteria in the gut induce basal level expression of genes in both the Toll and Imd pathways in adult mosquitoes [ 67 -- 69 ] while only basal expression of the Imd pathway is induced in the digestive tract of adult Drosophila [ 70 , 71 ] . 
+ We thus anticipated that several immune genes would likely be differentially expressed in the guts of axenic , conventional and gnotobiotic first instars . 
+ However , immune genes were not among the categories that were significantly enriched in any of our treatments ( Fig 4D , S3 -- S5 Tables ) . 
+ Among the few immune genes that were differentially expressed ( log2 fold change 2 ) was pgrp-le , which activates the Imd pathway [ 72 , 73 ] and was significantly down-regulated in the guts of axenic versus conventional and gnotobiotic larvae . 
+ However , no other components of the Imd pathway were differentially expressed among treatments in either the gut or carcass ( S3 -- S5 Tables ) . 
+ Three spätzle genes ( spz2 , 4 and 6 ) , which encode predicted ligands for the Toll receptor , were also down-regulated in the carcasses of axenic versus conventional larvae , but almost no other genes in or regulated by the Toll pathway , including effector proteins , were differen ¬ 
+ Discussion
+ Our previous results indicated that several species of mosquitoes including Ae . aegypti fail to develop when fed a nutritionally complete diet and cultured under axenic conditions [ 7 , 28 ] . 
+ This outcome notably contrasts with studies of Drosophila and mice , which show defects in maturation of the digestive tract and immune system but do not require gut microbes for development since axenic cultures of both can be maintained over multiple generations if fed a nutritionally complete diet [ 71 , 74 -- 77 ] . 
+ Only under conditions of low nutrient availability do axenic Drosophila larvae exhibit delays in development , which can be rescued in gnotobiotic larvae that are singly colonized by particular members of the gut community [ 78 , 79 ] . 
+ Development of axenic Ae . aegypti can also be rescued in gnotobiotic larvae that are singly colonized by different species of bacteria . 
+ Unlike Drosophila , however , several different species of bacteria identified as community members as well as some non-community members such as E. coli rescue development of Ae . aegypti larvae , which develop at the same rate as conventionally reared larvae [ 7 , 28 ] . 
+ Adult Ae . aegypti produced from gnotobiotic larvae singly colonized by E. coli also show no morphological defects or reductions in fitness 
+ Altogether , these findings suggest an essential role for living microbes in development of Ae . aegypti . 
+ Axenic larvae will not develop when provided diet along with dead bacteria or diet that has been pre-conditioned by living bacteria [ 7 ] . 
+ Along with our current findings , these data argue against bacteria being an essential food source or providing a particular nutrient essential to larval development . 
+ In contrast , the absence of living bacteria in the gut could adversely affect physiological processes in larvae with roles in nutrient acquisition or assimilation . 
+ Thus , the primary goal of this study was to assess whether axenic larvae exhibit alterations consistent with this possibility or alternatively exhibit defects that point to other factors that 
+ We first assessed whether conventional and gnotobiotic larvae exhibit any fine scale differences in growth during the first instar , and also whether axenic larvae exhibit specific traits that help explain why they do not molt . 
+ Our results identified no differences in growth or timing of molting between conventional and gnotobiotic first instars . 
+ The statistically similar number and distribution of bacteria in conventional and gnotobiotic larvae suggests the digestive tract of both contains sufficient space to host a finite number of bacterial cells that E. coli occupied when alone but which multiple species occupied in conventional larvae . 
+ The observation that all bacteria in conventional and gnotobiotic larvae reside inside the endo-peritrophic space further suggests their essential role in growth does not involve direct contact with midgut cells . 
+ In contrast , our results indicate that axenic larvae grow a small amount but never reach the critical size associated with apolysis and other events that precede molting by conventional and gnotobiotic larvae . 
+ Studies of several insects indicate that individual species often increase in size by approximately the same factor through the penultimate instar [ 80 , 81 ] . 
+ Within each instar , larvae also initiate a molt upon reaching a particular critical size , which is often associated with allometries such as the ratio between head capsule width and weight . 
+ In the first through penultimate instar , reaching critical size stimulates ecdysteroid hormone release , which induces the epidermis to produce a new cuticle while digesting most of the old endocuticle ( apolysis ) . 
+ This is followed by ecdysis , which refers to shedding of the old exo - and epicuticle and the beginning of the next instar . 
+ In the final instar related events result in pupation . 
+ The aquatic habit and small size of Ae . aegypti first instars precluded using the ratio between head capsule width and weight to estimate when larvae achieved critical size . 
+ However , we determined that the ratio of prothorax width to head capsule width exceeds 1 when conventional and gnotobiotic Ae . aegypti first instars achieve critical size . 
+ This measure also supported the conclusion that axenic larvae do not achieve critical size . 
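The critical-size criterion described above (prothorax width divided by head capsule width exceeding 1) can be written as a one-line check. This is only an illustrative sketch: the function name, argument names, and example measurements are hypothetical and are not taken from the study, and both widths are assumed to be given in the same unit.

```python
def reached_critical_size(prothorax_width, head_capsule_width):
    """Return True when the prothorax/head-capsule width ratio exceeds 1.

    Both widths must be positive and expressed in the same unit
    (the unit itself is an assumption; the ratio is dimensionless).
    """
    if prothorax_width <= 0 or head_capsule_width <= 0:
        raise ValueError("widths must be positive")
    return prothorax_width / head_capsule_width > 1.0


# Hypothetical measurements, for illustration only.
larva_a = reached_critical_size(0.30, 0.28)  # ratio above 1
larva_b = reached_critical_size(0.25, 0.28)  # ratio below 1
```

A ratio of exactly 1 is treated as not yet at critical size, following the wording "exceeds 1".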
+ Our transcriptome analysis at 22 h post-hatching indicated that approximately 12 % of the annotated genes in the Ae . aegypti genome are differentially expressed in axenic larvae when compared to conventional or gnotobiotic larvae . 
+ However , this profile consisted primarily of genes in seven categories that included the down-regulation of select peptidases in the gut and up-regulation of several genes in the gut and carcass with roles in amino acid transport , signaling through the ecdysteroid , insulin and TOR pathways , and fatty acid oxidation . 
+ Reduced expression of select peptidases suggests the absence of bacteria may adversely affect digestion , while the increased transcription of amino acid transporters , genes associated with insulin and TOR signaling , and fatty acid oxidation suggests a response to acquire additional nutrients and use lipid reserves from embryogenesis for nourishment . 
+ Similar patterns have been observed in mammals , Drosophila and mosquitoes in response to starvation stress [ 62 , 82 -- 84 ] . 
+ Insulin and TOR signaling have also been implicated in affecting JH synthesis , ecdysteroid synthesis , and ecdysteroid signaling in several insects including Ae . aegypti [ 62 , 85 -- 89 ] . 
+ That Ae . aegypti encodes multiple takeout orthologs , which are up-regulated in axenic larvae , is also intriguing given evidence showing that takeout expression is strongly upregulated in Drosophila larvae subjected to starvation but not other stress factors [ 60 ] . 
+ As previously noted , takeout gene products exhibit features of OBPs , JP29 , a predicted JH binding protein , and lipocalins that transport a diversity of hydrophobic molecules including retinoids , steroids , lipids and pheromones [ 60 , 90 ] . 
+ The actual ligands Takeout proteins bind , however , are unknown . 
+ A number of CP genes are differentially expressed in axenic larvae relative to conventional and gnotobiotic larvae as are several CYPs assigned to the category of oxidoreductases . 
+ The significance of these differences in regard to growth and molting is uncertain , although other studies have noted the differential expression of both CPs and CYPs in response to stress factors including starvation , heat , cold , and ionizing radiation [ 82 , 83 ] . 
+ Insects also continuously deposit endocuticle during the intermolt period [ 52 ] , which could explain why CP transcripts are detected in both axenic larvae , which never molt , and conventional or gnotobiotic larvae that were post-critical size and in the process of molting when tissue samples were collected . 
+ In contrast , we are uncertain why so few differences were detected among our treatments in regard to expression of immune genes . 
+ At minimum our results suggest differences between first instars and prior studies conducted in adult mosquitoes [ 67 -- 69 ] . 
+ Why such differences exist , however , will require future study . 
+ While our primary goal was to identify differentially expressed genes in axenic larvae , our results also identified several differences between conventional and gnotobiotic larvae . 
+ This indicates that colonization of larvae by E. coli alone does not fully recapitulate gene expression patterns in conventional larvae , and that the community of bacteria in the gut affects gene activity in larvae . 
+ On the other hand the differences in gene expression detected between conventional and gnotobiotic Ae . aegypti larvae are insufficient to substantially alter growth given the similarities in when larvae molted to the second instar and recently completed results showing that conventional and gnotobiotic larvae develop into adults that exhibit no differ ¬ 
+ In summary , this study indicates that living bacteria in first instar Ae . aegypti affect growth and alter the expression of several genes with roles in nutrient acquisition , nutrient assimilation and stress . 
+ Since we examined only a single time point in the first instar , our transcriptome data do not identify when axenic larvae first exhibit changes in gene expression relative to conventional or gnotobiotic larvae . 
+ However , the fact that axenic first instars grow minimally beyond their size at hatching suggests that the absence of living bacteria in the digestive tract adversely affects nutrient acquisition and/or assimilation almost immediately after hatching . 
+ We also recognize that our study did not include a treatment where conventional and gnotobiotic larvae were deprived of food to ascertain whether similar patterns are exhibited when compared to axenic larvae . 
+ We did not do this because at the onset of the investigation we did not know the key patterns our transcriptome data would identify . 
+ However , the results reported here position us to study select genes in this manner , while also providing information that will be used in functional studies of axenic larvae . 
+ In terms of disease control , the current study advances prior results by suggesting that the absence of gut bacteria disables growth at least in part by altering the metabolism of mosquito larvae and nutrient uptake . 
+ If correct , these findings further suggest that disruption of the microbial factors larvae require could potentially be used to reduce vector abundance and disease transmission [ 91 ] . 
+ Data Deposition
+ Transcriptome data have been deposited in the Short Read Archive under accession PRJNA340082 . 
+ Supporting Information
+ S1 Fig . 
+ Representative images of bacteria in the guts of conventional ( CNR ) and gnotobiotic ( GNT ) larvae at 18 h post-hatching . 
+ Bacteria ( B ) in the gut of conventional larvae were labeled with a peptidoglycan primary antibody and visualized using an Alexa Fluor 488 secondary antibody ( green ) while E. coli in gnotobiotic larvae expressed green fluorescent protein . 
+ Domains corresponding to the foregut ( FG ) , gastric caecae ( GC ) , anterior midgut ( AM ) , posterior midgut ( PM ) , Malpighian tubules ( MT ) , and hindgut ( HG ) are indicated at the top of the figure . 
+ Note the very similar distribution of bacteria in each treatment ( scale bar = 500 μm ) . 
+ S1 Table. Quality filtering statistics of RNAseq reads. (PDF)
+ S2 Table. Read mapping statistics. (PDF)
+ S3 Table . 
+ Gene expression data for gut versus carcass tissues of the same treatment . 
+ Columns are ( 1 ) test_id : the unique accession of each transcript as determined by cufflinks . 
+ ( 2 ) gene : VectorBase accession ( s ) that map to the locus . 
+ Entries with a `` - '' indicate novel transcripts identified by TopHat . 
+ ( 3 ) locus : the coordinates of the Aedes aegypti genome assembly 3 to which the transcript mapped . 
+ ( 4 ) sample_1 : the sample being compared to ( 5 ) sample_2 . 
+ `` axn '' axenic , `` cnr '' conventional , `` gnt '' gnotobiotic , `` mg '' gut , `` carc '' carcass . 
+ ( 6 ) status : indicates whether sufficient reads were mapped to the locus to perform statistical analysis . 
+ `` NOTEST '' indicates sample did not have sufficient alignments to statistically test . 
+ ( 7 ) value_1 ( 8 ) value_2 : FPKM values corresponding to sample_1 and sample_2 , respectively . 
+ ( 9 ) log2 ( fold_change ) : the log2 fold change difference in FPKM between sample_1 and sample_2 . 
+ Positive values indicate higher expression in sample_2 , negative values indicate higher expression in sample_1 . 
+ ( 10 ) test_stat : test statistic used to compute significance of the difference in FPKM between samples . 
+ ( 11 ) p_value : uncorrected significance of comparison in expression . 
+ ( 12 ) q_value : Benjamini-Hochberg false-discovery rate corrected p ¬ 
+ S4 Table . 
+ Gene expression differences between carcass tissues from different treatments . 
+ Columns are as in S3 Table . 
+ S5 Table . 
+ Gene expression differences between gut tissues from different treatments . 
+ Columns are as in S3 Table . 
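The column layout described for S3 Table lends itself to a short worked example. The sketch below shows how the log2 fold change sign convention (positive means higher FPKM in sample_2) and a Benjamini-Hochberg correction could be reproduced for rows shaped like those tables. It is not the authors' pipeline: the dictionary keys mirror the column names listed above, but the helper names, the example rows, the 0.05 threshold, and the choice to skip zero FPKM values are all assumptions made for illustration. Keeping only rows whose status is "OK" follows the cuffdiff convention for loci that could be tested.

```python
import math


def log2_fold_change(value_1, value_2):
    """log2(value_2 / value_1): positive means higher FPKM in sample_2."""
    if value_1 <= 0 or value_2 <= 0:
        return None  # undefined for zero FPKM; skipping is an assumption
    return math.log2(value_2 / value_1)


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted q-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    q_values = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so q-values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        q_values[i] = running_min
    return q_values


def significant_rows(rows, alpha=0.05):
    """Keep tested rows whose BH q-value is at or below alpha."""
    tested = [r for r in rows if r["status"] == "OK"]
    q_values = benjamini_hochberg([r["p_value"] for r in tested])
    hits = []
    for row, q in zip(tested, q_values):
        fc = log2_fold_change(row["value_1"], row["value_2"])
        if fc is not None and q <= alpha:
            hits.append((row["test_id"], fc, q))
    return hits


# Hypothetical rows, for illustration only.
example_rows = [
    {"test_id": "t1", "status": "OK", "value_1": 2.0, "value_2": 8.0, "p_value": 0.001},
    {"test_id": "t2", "status": "OK", "value_1": 5.0, "value_2": 5.0, "p_value": 0.8},
    {"test_id": "t3", "status": "NOTEST", "value_1": 0.0, "value_2": 0.1, "p_value": 1.0},
]
example_hits = significant_rows(example_rows)
```

Note that the correction here is applied only to the rows passed in, so q-values from a small subset will differ from those computed over a full table.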
+ We thank J. A. Johnson for assistance with illustrations , A. Elliott for assistance in maintaining the Ae . aegypti culture , and H. Merzendorfer for generously providing us with an aliquot of 
+ 2 . 
+ Hegde S , Rasgon JL , Hughes GL . 
+ The microbiome modulates arbovirus transmission in mosquitoes . 
+ Curr Opin Virol . 
+ 2015 ; 15:97 -- 102 . 
+ doi : 10.1016 / j.coviro .2015.08.011 PMID : 26363996 
+ 3 . 
+ Minard G , Mavingui P , Moro CV . 
+ Diversity and function of bacterial microbiota in the mosquito holobiont . 
+ Parasite Vector . 
+ 2013 ; 6:146 . 
+ 7 . 
+ Coon KL , Vogel KJ , Brown MR , Strand MR. Mosquitoes rely on their gut microbiota for development . 
+ Mol Ecol . 
+ 2014 ; 23 ( 11 ) :2727 -- 39 . 
+ doi : 10.1111 / mec .12771 PMID : 24766707 
+ 10 . 
+ Muturi EJ , Kim CH , Bara J , Bach EM , Siddappaji MH . 
+ Culex pipiens and Culex restuans mosquitoes harbor distinct microbiota dominated by few bacterial taxa . 
+ Parasite Vector . 
+ 2016 ; 9:18 . 
+ 14 . 
+ Barrett ADT , Higgs S. Yellow Fever : a disease that has yet to be conquered . 
+ Annu Rev Entomol . 
+ 2006 ; 52 ( 1 ) :209 -- 29 . 
+ 18 . 
+ Severson DW , Behura SK . 
+ Mosquito genomics : progress and challenges . 
+ Annu Rev Entomol . 
+ 2012 ; 57:143 -- 66 . 
+ doi : 10.1146 / annurev-ento-120710-100651 PMID : 21942845 
+ 19 . 
+ Timmermann SE , Briegel H. Larval growth and biosynthesis of reserves in mosquitoes . 
+ J Insect Physiol . 
+ 1999 ; 45 ( 5 ) :461 -- 70 . 
+ PMID : 12770329 
+ 20 . 
+ Barber MA . 
+ The food of anophiline larvae-food organisms in pure culture . 
+ US Pub Health Rep. 1927 ; 43:11 -- 7 . 
+ 21 . 
+ Chao J , Wistreich GA . 
+ Microorganisms from the mid-gut of larval and adult Culex quinquefasciatus . 
+ J Insect Path . 
+ 1960 ; 2:220 -- 4 . 
+ 22 . 
+ Ferguson MJ , Micks DW . 
+ Microorganisms associated with mosquitoes . 1 . Bacteria isolated from mid-gut of adult Culex fatigans Wiedemann . 
+ J Insect Path . 
+ 1961 ; 3 ( 2 ) :112 -- 9 . 
+ 23 . 
+ Hinman EH . 
+ A study of the food of mosquito larvae ( Culicidae ) . 
+ Am J Hyg . 
+ 1930 ; 12 ( 1 ) :238 -- 70 . 
+ 24 . 
+ Rozeboom LE . 
+ The relation of bacteria and bacterial filtrates to the development of mosquito larvae . 
+ Am J Hyg . 
+ 1935 ; 21 ( 1 ) :167 -- 79 . 
+ 25 . 
+ Jones WL , Delong DM . 
+ A simplified technique for sterilizing surface of Aedes aegypti eggs . 
+ J Econ Entomol . 
+ 1961 ; 54 ( 4 ) :813 -- 4 . 
+ 26 . 
+ Lang CA , Storey RS , Basch KJ . 
+ Growth , composition and longevity of axenic mosquito . 
+ J Nutr . 
+ 1972 ; 102 ( 8 ) :1057 -- 66 . 
+ PMID : 5051860 
+ 29 . 
+ van Tol S , Dimopoulos G. Chapter Nine : Influences of the mosquito microbiota on vector competence . 
+ In : Raikhel AS , editor . 
+ Advances in insect physiology . 
+ 2016 ; 51:243 -- 91 . 
+ 38 . 
+ Klappenbach JA , Dunbar JM , Schmidt TM . 
+ rRNA operon copy number reflects ecological strategies of bacteria . 
+ Appl Environ Microb . 
+ 2000 ; 66 ( 4 ) :1328 -- 33 . 
+ 39 . 
+ Stevenson BS , Schmidt TM . 
+ Life history implications of rRNA gene copy number in Escherichia coli . 
+ Appl Environ Microb . 
+ 2004 ; 70 ( 11 ) :6670 -- 7 . 
+ 45 . 
+ Noriega FG , Pennington JE , Barillas-Mury C , Wang XY , Wells MA . 
+ Aedes aegypti midgut early trypsin is post-transcriptionally regulated by blood feeding . 
+ Insect Mol Biol . 
+ 1996 ; 5 ( 1 ) :25 -- 9 . 
+ PMID : 8630532 
+ 47 . 
+ Noriega FG , Wells MA . 
+ A molecular view of trypsin synthesis in the midgut of Aedes aegypti . 
+ J Insect Physiol . 
+ 1999 ; 45 ( 7 ) :613 -- 20 . 
+ PMID : 12770346 
+ 53 . 
+ Jindra M , Palli SR , Riddiford LM . 
+ The juvenile hormone signaling pathway in insect development . 
+ Annu Rev Entomol . 
+ 2013 ; 58 ( 1 ) :181 -- 204 . 
+ 54 . 
+ Strand MR , Brown MR , Vogel KJ . 
+ Chapter six -- Mosquito peptide hormones : diversity , production , and function . 
+ In : Raikhel AS , editor . 
+ Advances in insect physiology . 
+ Academic Press ; 2016 ; 51:145 -- 88 . 
+ Gilbert LI. Halloween genes encode P450 enzymes that mediate steroid hormone biosynthesis in Drosophila melanogaster. Mol Cell Endocrinol. 2004; 215(1–2):1–10. doi: 10.1016/j.mce.2003.11.003 PMID: 15026169
+ Hernández-Martínez S, Mayoral JG, Li Y, Noriega FG. Role of juvenile hormone and allatotropin on nutrient allocation, ovarian development and survivorship in mosquitoes. J Insect Physiol. 2007; 53(3):230–4. doi: 10.1016/j.jinsphys.2006.08.009 PMID: 17070832
+ 60 . 
+ Sarov-Blat L , So WV , Liu L , Rosbash M . 
+ The Drosophila takeout gene is a novel molecular link between circadian rhythms and feeding behavior . 
+ Cell . 
+ 2000 ; 101 ( 6 ) :647 -- 56 . 
+ PMID : 10892651 
+ 61 . 
+ Baker KD , Thummel CS . 
+ Diabetic larvae and obese flies-emerging studies of metabolism in Drosophila . 
+ Cell Metab . 
+ 2007 ; 6 ( 4 ) :257 -- 66 . 
+ doi : 10.1016 / j.cmet .2007.09.002 PMID : 17908555 
+ 63 . 
+ Puig O , Tjian R. Nutrient availability and growth : regulation of insulin signaling by dFOXO/FOXO1 . 
+ Cell Cycle . 
+ 2006 ; 5 ( 5 ) :503 -- 5 . 
+ doi : 10.4161 / cc .5.5.2501 PMID : 16552183 
+ 66 . 
+ Rebers JE , Riddiford LM . 
+ Structure and expression of a Manduca sexta larval cuticle gene homologous to Drosophila cuticle genes . 
+ J Mol Biol . 
+ 1988 ; 203 ( 2 ) :411 -- 23 . 
+ PMID : 2462055 
+ 69 . 
+ Xi Z , Ramirez JL , Dimopoulos G . 
+ The Aedes aegypti toll pathway controls Dengue virus infection . 
+ PLoS Pathog . 
+ 2008 ; 4 ( 7 ) : e1000098 . 
+ doi : 10.1371 / journal.ppat .1000098 PMID : 18604274 
+ 72 . 
+ Dziarski R. Peptidoglycan recognition proteins ( PGRPs ) . 
+ Mol Immunol . 
+ 2004 ; 40 ( 12 ) :877 -- 86 . 
+ PMID : 14698226 
+ 74 . 
+ Blaser MJ . 
+ Antibiotic use and its consequences for the normal microbiome . 
+ Science . 
+ 2016 ; 352 ( 6285 ) :544 -- 5 . 
+ doi : 10.1126 / science.aad9358 PMID : 27126037 
+ 77 . 
+ Sommer F , Backhed F . 
+ The gut microbiota -- masters of host development and physiology . 
+ Nat Rev Microbiol . 
+ 2013 ; 11 ( 4 ) :227 -- 38 . 
+ doi : 10.1038 / nrmicro2974 PMID : 23435359 
+ 81 . 
+ Nijhout HF , Riddiford LM , Shingleton MC , Suzuki Y , Callier V . 
+ The developmental control of size in insects . 
+ WIREs Developmental Biology . 
+ 2013 ; 3:113 -- 34 . 
+ doi : 10.1002 / wdev .124 PMID : 24902837 
+ 82 . 
+ Harbison ST , Chang S , Kamdar KP , Mackay TF . 
+ Quantitative genomics of starvation stress resistance in Drosophila . 
+ Genome Biol . 
+ 2005 ; 6 ( 4 ) : R36 . 
+ doi : 10.1186 / gb-2005-6-4-r36 PMID : 15833123 
+ 84 . 
+ Schuler AM , Wood PA . 
+ Mouse models for disorders of mitochondrial fatty acid beta-oxidation . 
+ ILAR J. 2002 ; 43 ( 2 ) :57 -- 65 . 
+ PMID : 11917157 
+ 88 . 
+ Mutti NS , Dolezal AG , Wolschin F , Mutti JS , Gill KS , Amdam GV . 
+ IRS and TOR nutrient-signaling pathways act via juvenile hormone to influence honey bee caste fate . 
+ J Exp Biol . 
+ 2011 ; 214 ( 23 ) :3977 -- 84 . 
+ 91 . 
+ World Health Organization . 
+ World Malaria Report . 
+ Geneva : World Health Organization , 2015 .
\ No newline at end of file
--- a/data/TEXT_FILES/notuseful_txt/28348816.txt 0 → 100644
+++ b/data/TEXT_FILES/notuseful_txt/28348816.txt 0 → 100644
+ Genome-wide analysis of the response to nitric oxide in uropathogenic Escherichia coli CFT073
+ Abbreviations : ChIP-seq , chromatin immunoprecipitation and DNA sequencing ; DAVID , database for annotation visualization and integrated discovery ; EBSeq , empirical Bayes hierarchical model for inference in RNA-seq experiments ; iNOS , inducible nitric oxide synthase ; NONOate , compound with formula R1R2N-(NO-)-N=O ; MACS2 , model-based analysis for ChIP-seq ; MEME , multiple EM for motif elicitation ; PSWM , position-specific weight matrix ; RNA-seq , whole transcriptome shotgun sequencing ; RSEM , RNA-Seq by expectation-maximization ; RT-PCR , reverse transcriptase PCR ; TCA , tricarboxylic acid ; TPM , transcripts per million ; UPEC , uropathogenic Escherichia coli ; UTI , urinary tract infection . 
+ Data Summary
+ 1 . RNA-seq data have been deposited in the GEO database ; accession number : GSE69830 ( url -- http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE69830 ) 
+ 2 . ChIP-seq data have been deposited in the GEO database ; accession number : GSE69829 ( url -- http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE69829 ) 
+ Introduction
+ Extraintestinal Escherichia coli are a group of bacteria that can survive as harmless human intestinal inhabitants but are serious pathogens when they enter the appropriate environment ( Welch et al. , 2002 ) . 
+ Uropathogenic Escherichia coli ( UPEC ) strain CFT073 is one such pathogen that is a causative agent of urinary tract infections ( UTIs ) and was isolated from the blood of a woman suffering from acute pyelonephritis ( Mobley et al. , 1990 ) . 
+ UPEC is responsible for 80 % of all symptomatic and asymptomatic UTIs ( Roos et al. , 2006 ) . 
+ During UTI , the mucosal inflammatory response is activated , which causes neutrophils to infiltrate and migrate through the tissues and into the urine ( Godaly et al. , 2001 ) . 
+ Thus , UPEC is exposed to the defence mechanisms of the innate immune system . 
+ The antimicrobial properties of nitric oxide ( NO ) are exploited by cells of the innate immune system . 
+ In response to stimulation by proinflammatory cytokines and lipopolysaccharides of microbial pathogens , phagocytic cells express an inducible nitric oxide synthase ( iNOS ) , which oxidizes arginine to produce NO ( Fang , 1997 ; Fang & Vazquez-Torres , 2002 ; Mowat et al. , 2010 ) . 
+ iNOS is also expressed in epithelial cells of the urinary tract ( Poljakovic et al. , 2001 ) . 
+ NO ( and related N radicals that are derived from NO ) target proteins containing iron -- sulfur clusters , haem and thiols ( Fang , 1997 ; Fang & Vazquez-Torres , 2002 ; Kim et al. , 1995 ; Ren et al. , 2008 ) . 
+ Because of the cytotoxic properties of NO and its congeners , bacteria that proliferate within the host often employ strategies to convert NO into a non-toxic product . 
+ E. coli has three known enzymes that detoxify NO . 
+ Flavohaemoglobin ( Hmp ) is an NO denitrosylase that oxidizes NO to nitrate , and may reduce NO to N2O in the absence of oxygen ( Gardner & Gardner , 2002 ; Hausladen et al. , 2001 ) . 
+ Hmp has a role in protecting UPEC from NO stress in the host environment ( Svensson et al. , 2010 ) . 
+ Flavorubredoxin ( FlRd or NorV ) together with its NADH-linked reductase NorW reduces NO to N2O ( Gomes et al. , 2000 ) . 
+ FlRd is sensitive to oxygen in vitro , and so the enzyme has been described as an ` anaerobic ' NO reductase ( da Costa et al. , 2003 ; Gardner et al. , 2002 ) . 
+ The periplasmic nitrite reductase Nrf can catalyse the reduction of NO to ammonia under anaerobic conditions , a reaction that might contribute to defence against NO ( Poock et al. , 2002 ) . 
+ Recently , it has been suggested that the hybrid cluster protein Hcp is a key player in the response to NO under anaerobic conditions , when E. coli is lacking all previously known NO scavenging enzymes ( Cole , 2012 ) , although the enzymic activity of Hcp remains enigmatic . 
+ The response to NO in E. coli involves several transcription factors , including NsrR , FNR , SoxR , OxyR , NorR and Fur ( Bodenmiller & Spiro , 2006 ; Cruz-Ramos et al. , 2002 ; D'Autréaux et al. , 2002 ; Gardner et al. , 2002 ; Hausladen et al. , 1996 ; Pomposiello & Demple , 2001 ) . 
+ Most of these regulators primarily sense other signals such as oxygen , hydrogen peroxide and iron ( Bodenmiller & Spiro , 2006 ) , but NorR and NsrR serve as dedicated sensors of NO ( Tucker et al. , 2010 ) . 
+ The mononuclear non-haem iron centre of NorR directly senses NO , in response to which NorR activates transcription of norVW ( D'Autréaux et al. , 2005 ; Tucker et al. , 2010 ) . 
+ NsrR is an [ Fe -- S ] protein that is an NO-sensitive repressor of its target genes ( Bodenmiller & Spiro , 2006 ; Filenko et al. , 2007 ; Pullan et al. , 2007 ) . 
+ The hmp gene is subject to complex regulation by multiple regulators including NsrR and FNR ( Spiro , 2007 ) . 
+ Apart from hmp , the NsrR regulon contains various genes implicated in the NO stress response , such as ytfE , hcp and the nrf operon ( Tucker et al. , 2010 ) . 
+ In addition to recognizing an 11 -- 1 -- 11 bp inverted repeat sequence in its target promoters , it has been suggested that NsrR can also bind to a single copy of the 11 bp motif ( Partridge et al. , 2009 ) . 
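The 11 -- 1 -- 11 bp inverted repeat mentioned above can be illustrated with a small scan for windows in which an 11 bp half-site, a 1 bp spacer, and the reverse complement of that half-site occur in sequence. This is a generic sketch of inverted-repeat detection, not the method used in the paper: the function names, the mismatch parameter, and the example sequence are hypothetical, and the text does not give the NsrR consensus sequence, so none is assumed here.

```python
# Complement table for unambiguous DNA bases; other characters pass through.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def find_inverted_repeats(seq, half=11, spacer=1, max_mismatches=0):
    """Find half-spacer-half windows whose right half is the reverse
    complement of the left half, allowing up to max_mismatches.

    Returns a list of (start, left_half, right_half, mismatches).
    """
    seq = seq.upper()
    width = 2 * half + spacer
    hits = []
    for start in range(len(seq) - width + 1):
        left = seq[start:start + half]
        right = seq[start + half + spacer:start + width]
        expected = reverse_complement(left)
        mismatches = sum(a != b for a, b in zip(expected, right))
        if mismatches <= max_mismatches:
            hits.append((start, left, right, mismatches))
    return hits


# Made-up 23 bp site flanked by filler bases, for illustration only.
half_site = "AAGATGCATTT"
example_seq = "GGGG" + half_site + "A" + reverse_complement(half_site) + "CCCC"
example_hits = find_inverted_repeats(example_seq)
```

Raising `max_mismatches` relaxes the match to tolerate imperfect repeats. Detecting a single copy of the 11 bp motif, as also suggested in the text, would instead require a known consensus or weight matrix.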
+ In E. coli , the only known genes activated by NorR are norVW , but binding sites for NorR have been found upstream of the hmp gene in Pseudomonas aeruginosa , Pseudomonas putida and Vibrio cholerae and the gene encoding a respiratory NO reductase in the denitriﬁer Ralstonia eutropha ( Rodionov et al. , 2005 ; Stern et al. , 2012 ; Tucker et al. , 2004 ) . 
+ Low concentrations of NO are generated endogenously by E. coli as a by-product of respiratory nitrate and nitrite reduction ( Corker & Poole , 2003 ; Ji & Hollocher , 1988 ) . 
+ Nitrate and nitrite are sensed directly by the NarXL and NarQP two-component regulatory systems ( Gunsalus , 1992 ; Stewart , 1993 ) . 
+ Thus , during nitrate or nitrite respiration , complex changes occur in the transcriptome that are mediated by NarXL / NarQP in addition to the above-mentioned NO-responsive regulators ( Constantinidou et al. , 2006 ) . 
+ Global gene expression analysis has been used to study UPEC during UTI and demonstrates that UPEC is directly exposed to NO in the host environment ( Hagan et al. , 2010 ; Haugen et al. , 2007 ; Roos & Klemm , 2006 ; Snyder et al. , 2004 ) as well as to nitrate in urine ( Green et al. , 1981 ; Radomski et al. , 1978 ) . 
+ UPEC is more resistant to the stress imposed by acidiﬁed nitrite than K-12 strains of E. coli ( Bower & Mulvey , 2006 ) and may also be more resistant to a prolonged exposure to NO ( Svensson et al. , 2006 ) , in which case toxicity might be due to N radicals derived from NO . 
+ We have shown that CFT073 recovers from an exposure to NO no better than a K-12 strain and that recovery is partly , although not entirely , dependent on Hmp ( Spiro , 2011 ) . 
+ Thus , we were motivated to undertake a deeper analysis of the determinants of NO resistance in CFT073 . 
+ In this paper , we show that apart from Hmp , FlRd is a major contributor to aerobic NO detoxiﬁcation in UPEC . 
+ We also show that CFT073 possesses at least one novel anaerobic NO scavenging mechanism in addition to Hmp and FlRd . 
+ We use expression analysis to examine the response of CFT073 to a physiological source of NO , and map NsrR binding sites in the CFT073 genome . 
+ Methods
+ Bacterial strains and growth conditions . 
+ The strains used in this work are listed in Table S1 ( available in the online Supplementary Material ) . 
+ The rich medium was L broth ( per litre : 10 g tryptone , 5 g yeast extract , 5 g NaCl ) , supplemented with 0.5 % glucose for anaerobic cultures . 
+ To treat cultures with NO , 50 mM spermine NONOate ( which releases two equivalents of NO with a half-life of 39 min at 37 °C ; Cayman Chemicals ) was added to cultures during the early exponential phase ( OD650 of 0.15 -- 0.3 ) . 
+ [ NONOate has the chemical formula R^1R^2N-(NO^-)-N=O . ] 
+ Anaerobic cultures were grown in ﬁlled bottles supplemented with 20 mM nitrate where indicated . 
+ Cultures for RNA isolation were grown anaerobically in MOPS minimal medium ( Neidhardt et al. , 1974 ) supplemented with 0.05 % Casamino acids , 0.5 % glucose ( and 20 mM nitrate as indicated ) , and 5 mg vitamin B1 ml^-1 . 
+ Gene deletions were made using the method of Datsenko & Wanner ( 2000 ) . 
+ Oxygen and nitric oxide consumption assays . 
+ For oxygen consumption assays , 30 ml cultures grown aerobically in L broth ( with and without 1 h of 50 mM spermine NONOate treatment ) were harvested , washed and resuspended in 50 mM HEPES ( pH 7.4 ) , 100 mM NaCl , 5 mM KCl , 1 mM MgCl2 , 1 mM NaH2PO4 , 1 mM CaCl2 and 1 mM glucose ( Stevanin et al. , 2000 ) . 
+ All samples were resuspended at equal cell densities . 
+ Oxygen consumption was measured using a Clark-type oxygen electrode ( Hansatech Instruments ) . 
+ A 100 ml aliquot of cells was added to 500 ml buffer in a capped , water-jacketed chamber at 37 °C . 
+ The NO sensitivity of oxygen uptake was measured by addition of 20 mM proli NONOate ( which releases two equivalents of NO with a half-life of 1.8 s at 37 °C ; Cayman Chemicals ) when the oxygen concentration was 60 , 120 or 180 mM . 
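+ The two donors above are described only by their half-lives and by the two equivalents of NO they release. As a rough illustration, and assuming simple first-order decomposition (an assumption not stated in the text), the cumulative NO released after a given time can be sketched as follows. The function names are hypothetical, and the result is total NO released, not the free NO concentration, since cells and oxygen consume NO.

```python
import math


def fraction_decomposed(elapsed: float, half_life: float) -> float:
    """Fraction of donor decomposed after `elapsed`, assuming first-order decay.

    `elapsed` and `half_life` must share the same time unit.
    """
    if half_life <= 0:
        raise ValueError("half_life must be positive")
    return 1.0 - math.exp(-math.log(2.0) * elapsed / half_life)


def cumulative_no_released(donor_conc: float, elapsed: float,
                           half_life: float, no_per_donor: float = 2.0) -> float:
    """Total NO released, in the same concentration unit as `donor_conc`.

    `no_per_donor` defaults to 2, matching the two equivalents quoted in the text.
    """
    return donor_conc * no_per_donor * fraction_decomposed(elapsed, half_life)


# After one half-life (39 min for spermine NONOate), half of the donor is gone.
half_gone = fraction_decomposed(39.0, 39.0)
```

+ With a half-life of 1.8 s, proli NONOate is essentially fully decomposed within a few tens of seconds under this model, which fits its use as a bolus of NO in the electrode assays.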
+ For NO consumption assays , 30 ml of cultures grown anaerobically in L broth ( treated with 50 mM spermine NONOate for 2 h or grown with 20 mM nitrate ) were harvested and resuspended in the same HEPES buffer ( without glucose ) . 
+ The water-jacketed chamber housing an amperometric NO-speciﬁc electrode ( ISO NOP electrode ; WPI Instruments ) was maintained at 37 °C . 
+ Cell suspension ( 0.5 ml ) and buffer ( 1.5 ml ) were added to the chamber , and oxygen was removed with 5 ml of 1 M glucose , 5 ml of 30 mg glucose oxidase ml^-1 and 5 ml of 7 mg catalase ml^-1 . 
+ When oxygen was undetectable , 20 mM proli NONOate was added and the rate of NO consumption was measured . 
+ The same procedure was used to measure NO consumption by cell fractions . 
+ Cell fractionation . 
+ Cells were fractionated using a modiﬁcation of a previously described procedure ( Alefounder & Ferguson , 1980 ) . 
+ Cultures were grown anaerobically in 300 ml L broth supplemented with 0.5 % glucose , and in some cases were treated with 50 mM spermine NONOate . 
+ Cultures were harvested by centrifugation and washed twice in 10 mM potassium phosphate buffer ( pH 7.6 ) . 
+ Cell pellets were resuspended in 5 ml spheroplasting buffer ( 0.5 M sucrose , 3 mM sodium EDTA and 0.1 M Tris/HCl , pH 8.0 ) , 0.2 mg lysozyme ml^-1 was added and the suspension was incubated at 30 °C for 30 min . 
+ The suspension was centrifuged at 12 200 g for 15 min at 4 °C , and the supernatant ( periplasmic fraction ) was kept on ice . 
+ The pellet was resuspended in 1 ml of 0.1 M Tris/HCl ( pH 8.0 ) and slowly added drop-wise into 4.5 ml water with constant stirring at 4 °C . 
+ After the mixture became homogeneous , it was centrifuged at 47 800 g for 1 h at 4 °C . 
+ The supernatant ( cytoplasmic fraction ) was stored on ice and the pellet ( membrane fraction ) was resuspended in 0.1 M Tris/HCl ( pH 8.0 ) and kept on ice . 
+ Malate dehydrogenase was used as a marker to check the integrity of cell fractions . 
+ Malate dehydrogenase activity ( Sutherland & McAlister-Henn , 1985 ) was detected only in cytoplasmic fractions . 
+ RNA sequencing . 
+ Cultures were grown in triplicate as described above , and total RNA was isolated using the Qiagen RNeasy Protect Bacteria Mini kit . 
+ For rRNA depletion , samples were treated using the MICROBExpress Bacterial mRNA Enrichment kit ( Life Technologies ) according to the manufacturer 's instructions . 
+ Samples were cleaned with the Zymo RNA Clean and Concentrator kit ( Zymo Research ) and then subjected to a second cycle of rRNA depletion . 
+ RNA was recovered by ethanol precipitation . 
+ Library preparation and whole transcriptome shotgun sequencing ( RNA-seq ) were performed at the University of Texas Southwestern Medical Center Genomics and Microarray Core Facility . 
+ The program Bowtie ( Langmead et al. , 2009 ) was used to align RNA-seq reads to the genome of E. coli CFT073 ( GenBank accession no. AE014075.1 ) with default parameters . 
+ To estimate transcript abundances , transcripts per million ( TPM ) values were calculated using RNA-Seq by expectation-maximization ( RSEM ; Li & Dewey , 2011 ) . 
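+ For readers unfamiliar with the TPM unit, the following is a minimal sketch of the normalization itself. It is not RSEM: RSEM estimates expected counts and effective lengths by expectation-maximization, whereas this hypothetical `tpm` helper assumes that per-gene read counts and gene lengths are already in hand.

```python
from typing import List, Sequence


def tpm(counts: Sequence[float], lengths_kb: Sequence[float]) -> List[float]:
    """Convert per-gene read counts to transcripts per million.

    Each count is first divided by gene length (reads per kilobase), then the
    length-normalized rates are rescaled so that they sum to one million.
    """
    if len(counts) != len(lengths_kb):
        raise ValueError("counts and lengths_kb must have the same length")
    rates = [c / length for c, length in zip(counts, lengths_kb)]
    total = sum(rates)
    if total == 0:
        return [0.0] * len(rates)
    return [rate / total * 1e6 for rate in rates]


# Two genes with the same read density get the same TPM despite different counts.
example = tpm([10, 20], [1.0, 2.0])
```

+ Because TPM values always sum to the same total within a sample, they are convenient for comparing the relative abundance of a transcript between the nitrate-treated and untreated cultures.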
+ Gene annotations were obtained from the European Nucleotide Archive ( accession no. AE014075.1 ) . 
+ Differential expression between conditions with and without NO treatment was analysed using EBSeq ( an empirical Bayes hierarchical model for inference in RNA-seq experiments ; Leng et al. , 2013 ) . 
+ A change greater than twofold and a false discovery rate cut-off of 0.05 were used to determine signiﬁcant differential expression . 
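+ The two cut-offs just stated can be expressed as a simple filter. The sketch below is illustrative only: it assumes that a mean expression value per condition and a false discovery rate per gene are already available (EBSeq itself reports posterior probabilities, which are not modelled here), and the pseudocount and the function name are assumptions added to keep the ratio defined for unexpressed genes.

```python
import math
from typing import Optional


def classify_gene(mean_control: float, mean_treated: float, fdr: float,
                  min_fold: float = 2.0, max_fdr: float = 0.05,
                  pseudocount: float = 1.0) -> Optional[str]:
    """Return "up", "down" or None for one gene.

    A gene is called only if its fold change exceeds `min_fold` in either
    direction and its false discovery rate is below `max_fdr`.
    """
    log2_fc = math.log2((mean_treated + pseudocount) / (mean_control + pseudocount))
    if fdr >= max_fdr or abs(log2_fc) <= math.log2(min_fold):
        return None
    return "up" if log2_fc > 0 else "down"
```

+ Applying such a filter to every gene and counting the two outcomes is how totals of upregulated and downregulated genes are typically obtained.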
+ To identify functional categories of differentially expressed genes and to identify enriched pathways , we used the DAVID ( database for annotation , visualization and integrated discovery ) gene functional classiﬁcation tool with default statistical parameters and Benjamini correction ( Huang et al. , 2007 ) with an adjusted P-value cut-off of 0.05 . 
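+ The Benjamini correction mentioned above is commonly implemented as the Benjamini-Hochberg step-up procedure. The following is a generic sketch of that procedure, not DAVID's own code, and the function name is hypothetical.

```python
from typing import List, Sequence


def benjamini_hochberg(pvalues: Sequence[float]) -> List[float]:
    """Return Benjamini-Hochberg adjusted P-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices, smallest P first
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down, enforcing monotonic adjusted values.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, pvalues[index] * m / rank)
        adjusted[index] = running_min
    return adjusted


# Categories whose adjusted P-value is below 0.05 would be kept.
adjusted = benjamini_hochberg([0.01, 0.04, 0.03, 0.5])
kept = [i for i, p in enumerate(adjusted) if p < 0.05]
```

+ In this toy input only the first category survives the 0.05 cut-off, even though three raw P-values are below 0.05, which shows why the adjusted threshold is the stricter of the two.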
+ RNA-seq data have been deposited in the GEO database , accession number GSE69830 ( Data Citation 1 ) . 
+ Reverse transcriptase PCR ( RT-PCR ) . 
+ Wild-type CFT073 and 3X mutant cells were grown in triplicate as described for RNA-seq . 
+ Total RNA was isolated using the Qiagen RNeasy Protect Bacteria Mini kit , and 2 mg of total RNA was used to make cDNA using Ambion 's RETROscript Reverse Transcription kit . 
+ Then , 1 ml of cDNA was used as template to amplify genes that were chosen for validation . 
+ PCR was performed using Thermo Scientiﬁc DreamTaq PCR Master Mix and 10 ml of PCR product was run on a 2 % agarose gel . 
+ Chromatin immunoprecipitation followed by high-throughput sequencing . 
+ Chromatin immunoprecipitation ( ChIP ) was performed as described previously on cultures grown aerobically in L broth to mid-exponential phase ( Efromovich et al. , 2008 ) . 
+ Immunoprecipitated and puriﬁed DNAs ( 10 ng ) from three cultures of CFT073 were collected for sequencing , along with 10 ng of the input DNA as a control . 
+ Samples were sheared by sonication to within a size range of 200 -- 600 bp . 
+ DNA fragments were treated using an Epicentre End-It DNA End Repair kit and 3' A overhangs were added with DNA polymerase I ( Klenow fragment ) . 
+ Adapters from the Illumina TruSeq DNA sample preparation kit were ligated using LigaFast ( Promega ) and DNAs were ampliﬁed by PCR using primers provided in the Illumina TruSeq DNA sample preparation kit and Phusion DNA polymerase ( NEB ) . 
+ Products of the ligation reaction and PCR ampliﬁcation in the range 300 -- 400 bp were puriﬁed by 2 % agarose gel electrophoresis . 
+ DNA concentrations were measured using Qubit dsDNA HS Assay kits ( Invitrogen ) . 
+ DNA sequencing was done on the MiSeq ( Illumina ) platform following the manufacturer 's instructions . 
+ For one replicate , a single-end 60 bp run was performed . 
+ For the other two replicates , a paired-end 100 bp run was performed . 
+ Sequence reads were aligned with the published E. coli CFT073 genome ( AE014075.1 ) using the software package Bowtie with the parameters ` bowtie -k 1 -X 500 -m 1 ' ( Langmead et al. , 2009 ) . 
+ Peaks were identiﬁed using the peak ﬁnding algorithm of MACS2 ( Zhang et al. , 2008 ) , with default parameters . 
+ For motif analysis , multiple EM for motif elicitation ( MEME ) was used to identify over-represented sequences ( Bailey & Elkan , 1994 ) . 
+ PatSer was used to search the genome for the presence of the NsrR position-speciﬁc weight matrix ( PSWM ) ( Hertz & Stormo , 1999 ) . 
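+ To make the idea of scanning a genome with a position-speciﬁc weight matrix concrete, here is a minimal log-odds sketch. It is not Patser, which additionally reports the ln ( P-value ) of each score. The uniform background, the pseudocount, the toy sites and the function names are all assumptions made for illustration, and sequences are assumed to contain only upper-case A, C, G and T.

```python
import math
from typing import Dict, List, Sequence, Tuple

BASES = "ACGT"


def build_pswm(sites: Sequence[str], pseudocount: float = 1.0,
               background: float = 0.25) -> List[Dict[str, float]]:
    """Build a log-odds weight matrix from aligned, equal-length binding sites."""
    width = len(sites[0])
    if any(len(site) != width for site in sites):
        raise ValueError("all sites must have the same length")
    total = len(sites) + 4 * pseudocount
    pswm = []
    for position in range(width):
        column = [site[position] for site in sites]
        pswm.append({
            base: math.log(((column.count(base) + pseudocount) / total) / background)
            for base in BASES
        })
    return pswm


def best_hit(pswm: List[Dict[str, float]], sequence: str) -> Tuple[int, float]:
    """Return (start, score) of the highest-scoring window on the given strand."""
    width = len(pswm)
    if len(sequence) < width:
        raise ValueError("sequence is shorter than the matrix")
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        score = sum(column[base] for column, base in zip(pswm, window))
        hits.append((start, score))
    return max(hits, key=lambda hit: hit[1])


# Toy 4 bp sites, not real NsrR sites.
toy_pswm = build_pswm(["ACGT", "ACGT", "ACGA"])
start, score = best_hit(toy_pswm, "TTACGTTT")
```

+ A full search would also scan the reverse strand and convert each score to a signiﬁcance value before applying a threshold.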
+ A precision -- recall curve was constructed to determine the optimal threshold for predicting high-quality NsrR binding sites . 
+ Precision was deﬁned as the ratio of true positives ( locations with an NsrR ChIP-seq peak and a predicted NsrR binding site ) to true positives plus false positives ( locations with a predicted NsrR binding site but no NsrR ChIP-seq peak ) . 
+ Recall was deﬁned as the ratio of true positives to true positives plus false negatives ( locations with an NsrR ChIP-seq peak but no NsrR predicted binding site ; Myers et al. , 2013 ) . 
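+ The two deﬁnitions above translate directly into set operations. In this sketch, locations are represented by arbitrary labels (the labels in the example are placeholders, not results from the study), and the function name is hypothetical.

```python
from typing import Iterable, Tuple


def precision_recall(predicted_sites: Iterable[str],
                     chip_peaks: Iterable[str]) -> Tuple[float, float]:
    """Compute precision and recall of predicted sites against ChIP-seq peaks."""
    predicted = set(predicted_sites)
    peaks = set(chip_peaks)
    true_pos = len(predicted & peaks)    # predicted site with a ChIP-seq peak
    false_pos = len(predicted - peaks)   # predicted site without a peak
    false_neg = len(peaks - predicted)   # peak without a predicted site
    precision = true_pos / (true_pos + false_pos) if predicted else 0.0
    recall = true_pos / (true_pos + false_neg) if peaks else 0.0
    return precision, recall


# Placeholder labels: two shared locations, one prediction-only, two peak-only.
precision, recall = precision_recall({"siteA", "siteB", "siteC"},
                                     {"siteA", "siteB", "siteD", "siteE"})
```

+ Repeating the calculation while sweeping the score threshold used to call predicted sites yields one ( precision , recall ) point per threshold, which together form the curve.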
+ ChIP-seq data have been deposited in the GEO database , accession number GSE69829 ( Data Citation 2 ) . 
+ Results
+ FlRd contributes to aerobic NO detoxiﬁcation
+ The norV gene is absent from current annotations of the E. coli CFT073 genome ( Welch et al. , 2002 ) . 
+ However , we have resequenced the norV region and found a sequencing error in the published sequence that leads to a frame-shift mutation . 
+ The corrected norV sequence encodes a protein product that is 99 % identical to FlRd ( NorV ) of E. coli K-12 ( 476/479 residues identical ) . 
+ Thus , we performed experiments to determine whether E. coli CFT073 expresses an active FlRd , and to assess its contribution to NO detoxiﬁcation . 
+ The E. coli K-12 FlRd functions as an ` anaerobic ' NO reductase , although it is capable of reducing NO in vivo in microaerobic cultures growing under an atmosphere containing ~5 mM oxygen ( Gardner et al. , 2002 ) . 
+ FlRd can also function as an oxygen reductase , albeit with a rather low afﬁnity for oxygen ( Gomes et al. , 2002 ) . 
+ Enzymes from the ﬂavo-diiron family in other organisms are inactivated by oxygen in vitro ( Silaghi-Dumitrescu et al. , 2003 , 2005 ) , so a view has emerged that FlRd functions in NO detoxiﬁcation only in cultures growing anaerobically . 
+ However , the norVW genes can be induced by sources of NO in aerobic cultures , so a physiological role for FlRd under aerobic conditions can not be excluded ( Hutchings et al. , 2002 ; Mukhopadhyay et al. , 2004 ) . 
+ Respiration in E. coli is sensitive to NO ( Yu et al. , 1997 ) and the activity of an NO scavenging enzyme can be studied by observing its ability to protect aerobic respiration from NO inhibition ( Stevanin et al. , 2000 ) . 
+ Washed cell suspensions respiring oxygen were exposed to NO in a Clark-type oxygen electrode , and the inhibitory effect of NO was observed as a transient decrease in the rate of oxygen consumption . 
+ In these experiments , oxygen uptake returns to normal after an interval that depends upon the ability of the strain to scavenge NO ( Fig. 1 ) . 
+ Wild-type cells that were exposed to NO during growth exhibited NO-resistant oxygen uptake , while the respiration of cells that were not pre-exposed to NO was sensitive to NO ( duration of inhibition 1.9 ± 0.04 min , Fig. 1e ) . 
+ In an hmp mutant , respiration became more resistant to NO if the culture was pre-induced with NO ( 4.3 ± 0.7 versus 1.4 ± 0.4 min inhibition for uninduced and induced cells , respectively ; Fig. 1a ) . 
+ Respiration of the norVW mutant was completely NO-resistant if cells were induced with NO , but NO-sensitive ( 1.7 ± 0.6 min inhibition ) if cells were not induced ( Fig. 1b ) . 
+ Thus , both hmp and norVW mutants show evidence of an NO-inducible scavenging activity , which is presumably due to FlRd and Hmp , respectively . 
+ Accordingly , in an hmp norVW double mutant ( Fig. 1c ) , oxygen consumption was equally sensitive to NO whether or not the cultures were exposed to NO ( 6 ± 0.1 and 6.3 ± 0.3 min inhibition for uninduced and induced cells , respectively ) . 
+ Under the growth and assay conditions used for these experiments , NO-inducible NO scavenging by CFT073 can be entirely accounted for by the combined activities of Hmp and FlRd . 
+ By measuring NO inhibition of respiration of the hmp mutant at different oxygen concentrations , we concluded that FlRd can scavenge NO in vivo in the presence of as much as 180 mM oxygen ( Fig. 1g -- i ) . 
+ Complementing the hmp norV double mutant with norV on a plasmid expressed from an inducible promoter showed that FlRd-mediated protection of respiration was restored ( data not shown ) . 
+ Interestingly , complementation failed if norV was expressed on a plasmid from its own promoter , as we have observed before ( Hutchings et al. , 2002 ) , and successful complementation required expression from a heterologous promoter . 
+ A novel inducible anaerobic NO scavenging activity
+ We generated a triple mutant of E. coli CFT073 lacking the three known NO detoxiﬁcation systems , Hmp , FlRd and NrfA ( this strain is designated UTD692 , and will be referred to as ` 3X ' ) . 
+ Washed cells of the 3X strain grown anaerobically with nitrate ( nitrate provides a source of endogenously generated NO under anaerobic conditions ; Ji & Hollocher , 1988 ) , or induced anaerobically with NO showed a higher rate of NO consumption compared with an uninduced strain ( Fig. 2 ) . 
+ Thus , in cells grown and assayed anaerobically , there is evidence of a novel NO-inducible activity . 
+ A wild-type strain grown and assayed under similar conditions showed only moderately increased rates of NO consumption compared with the 3X mutant ( Fig. 2 ) . 
+ The dominant activity of Hmp requires molecular oxygen , so Hmp is not expected to contribute to NO consumption under these assay conditions . 
+ The measured activity is therefore a combination of FlRd and the novel activity , the latter seeming to be a major contributor in cells grown and assayed anaerobically . 
+ Cell fractionation experiments revealed this activity to be associated with the cell membrane ( Fig. 2c ) . 
+ Interestingly , NO uptake by membrane fractions of the triple mutant required neither an exogenous reductant nor an oxidizing agent . 
+ Possible candidates for the source of this activity include NirB ( Vine & Cole , 2011b ) and the hybrid cluster protein , Hcp ( Cole , 2012 ; Vine & Cole , 2011a ) , although in both cases NO reduction would be dependent on NADH . 
+ Introduction of nirB and hcp-hcr mutations into the 3X mutant , either individually or in combination , had no effect on the NO scavenging activity , so NirB and Hcp can be excluded as the source of the activity we observe . 
+ Introduction of a mobA mutation ( which eliminates nitrate reductase activity ) into the 3X mutant eliminated the response to nitrate , conﬁrming that nitrate reduction and therefore endogenous NO generation is probably required for nitrate-mediated induction of the NO scavenging activity . 
+ In a 3X nsrR mutant , NO scavenging could be induced by nitrate but not by NO in anaerobic cultures ( Fig. 3 ) . 
+ The same outcome was seen in a CFT073 nsrR mutant ( data not shown ) , which further demonstrates that under these conditions , Hmp , which is de-repressed in an nsrR mutant ( Filenko et al. , 2007 ) , is not functional . 
+ In the 3X strain with an fnr mutation , the NO scavenging activity became constitutive ( Fig. 3 ) , implying that FNR acts negatively on the expression of the gene ( s ) encoding the activity . 
+ In a 3X narL narP mutant , the activity could be induced by nitrate but not NO in anaerobic cultures ( data not shown ) . 
+ Individual 3X narL and 3X narP mutants behaved like the 3X parent strain . 
+ In further attempts to identify the source of the NO scavenging activity , we tested candidate genes by introducing the corresponding deletion mutations into the 3X mutant . 
+ Candidates were identiﬁed on the basis of one or more of the following criteria : ( 1 ) a primary structure suggesting a possible role in NO metabolism ; ( 2 ) a previously described expression pattern matching our observations described above ; and ( 3 ) an expression pattern in our RNA-seq data ( see below ) similar to the behaviour of the novel activity . 
+ Some genes were also tested that might be indirectly involved in expression of this activity ( e.g. mobA , required for the activity of enzymes containing the molybdopterin guanine dinucleotide cofactor ) . 
+ In this way , we showed that 26 genes or operons are not required for expression of the novel NO scavenging activity ( aegA , betA , cydAB , cyoAB , fdhE , hcp-hcr , mobA , ndh , nirB , nrdA , poxB , putA , rnr , sdhA , tehAB , yceJI , ydcX , ydhXV , yeaR-yoaG , yebE , yedY , yeiH , ygbA , yhaM , yibIH , ytfE ) . 
+ Further efforts to identify the source of the novel NO scavenging activity described in this paper have so far proved unsuccessful . 
+ Vine & Cole ( 2011b ) have reported that a norVW nrf nirB hmp quadruple mutant of E. coli K-12 consumes NO at rates comparable to those of the wild-type strain . 
+ Our data suggest that Hmp and FlRd are the major NO scavenging activities of E. coli CFT073 under aerobic conditions , and the novel activity we describe makes a signiﬁcant contribution to NO consumption only under anaerobic growth conditions . 
+ Transcriptional proﬁling
+ We used transcriptomics ( RNA-seq ) to explore the response of E. coli CFT073 to a source of NO . 
+ In part , this experiment was motivated by a desire to identify the gene ( s ) encoding the novel NO scavenging activity . 
+ Therefore , we used a strain and growth conditions identical to those used for the initial detection of this activity , as described above . 
+ That is , the transcriptome of the triple mutant was analysed in anaerobic cultures grown in the presence and absence of nitrate . 
+ By differential gene expression analysis of the RNA-seq data , we identiﬁed 525 upregulated and 649 downregulated genes in the nitrate-treated cultures ( genes showing greater than two-fold change with a 0.05 false discovery rate cut-off ) . 
+ Among the most highly upregulated genes ( Table 1 ) were some that are known to be regulated by NsrR and to respond to NO , including yeaR , ytfE , hcp-hcr and ygbA ( Filenko et al. , 2007 ; Pullan et al. , 2007 ) . 
+ Previous transcriptomics experiments with E. coli K-12 have also shown that members of the NsrR regulon are de-repressed in cultures grown anaerobically with nitrate ( Constantinidou et al. , 2006 ) . 
+ RNA was extracted from anaerobically grown cultures , so it was surprising that the most highly upregulated genes included some involved in oxidative phosphorylation ( cyoABCDE , sdhABCD and nuoEF ) and the tricarboxylic acid ( TCA ) cycle ( acnB , icdA , sucD , lpdA , sdhABCD , fumA and mdh ) . 
+ Several genes encoding ABC transporters ( dpp operon , proVW , gltK , kpsMT ) were also upregulated . 
+ Other genes showing increased expression in nitrate-grown cells were those involved in nitrogen metabolism ( nitrate respiration ) , DNA repair and the SOS response , bacterial motility and chemotaxis . 
+ The most highly downregulated genes included some encoding enzymes involved in anaerobic metabolism , including hydrogenase ( hya operon ) and formate dehydrogenase ( fdhF ) . 
+ At least some of these regulatory effects may reﬂect inactivation of FNR by NO ( Cruz-Ramos et al. , 2002 ; Justino et al. , 2005 ; Pullan et al. , 2007 ) , or regulation by the nitrate-sensing two-component systems NarXL and NarQP ( Constantinidou et al. , 2006 ) . 
+ Genes involved in glycolysis , gluconeogenesis , pyruvate metabolism [ eno , pykF , pgi , fba , gapA , glk , pgk , ldhA , maeA ( sfcA ) ] , the pentose phosphate pathway ( talA , tktB , pgl ) and the metabolism of sugars ( fructose , sucrose and mannose ) showed decreased expression . 
+ Iron transport genes ( feoAB ) and some stress response genes ( clpB , dnaJ , dnaK , dps ) also showed reduced expression . 
+ Fig. S1 ( a ) provides an overview of the differentially expressed genes based on their occurrence in pathways , and Fig. S1 ( b ) provides an overview based on functional categories . 
+ It is known that the citric acid cycle is repressed in E. coli grown anaerobically in nitrate with glucose as the carbon source ( Prohl et al. , 1998 ) . 
+ Under anaerobic conditions with glucose and nitrate , ArcA represses operons encoding α-ketoglutarate dehydrogenase and succinate dehydrogenase , thus preventing complete oxidation of glucose to carbon dioxide ( Prohl et al. , 1998 ) . 
+ However , our RNA-seq data for the 3X mutant show increases in the transcript levels of the genes encoding members of both these enzyme complexes as well as various other genes involved in the intact aerobic TCA cycle . 
+ This suggests that nitrate ( and possibly NO generated by nitrate respiration ) impacts the TCA cycle by allowing complete oxidation of acetyl-CoA and this may have a role to play in energy generation in the presence of NO . 
+ Our data are consistent with a recent ﬁnding that the TCA cycle is necessary for UPEC ﬁtness in vivo ( Alteri et al. , 2009 ) , conditions in which UPEC is known to be exposed to NO ( Lundberg et al. , 1996 ; Mysorekar et al. , 2002 ; Poljakovic et al. , 2001 ) as well as nitrate ( Green et al. , 1981 ; Radomski et al. , 1978 ) . 
+ Numerous studies indicate that iNOS expression levels increase and NO is released in the urinary tract during bacterial infection ( Kaboré et al. , 1999 ; Lundberg et al. , 1996 ; Smith et al. , 1996 ; Wheeler et al. , 1997 ) . 
+ Also , dietary nitrate is excreted into urine where UPEC can potentially use it for respiration , generating NO as a byproduct ( Corker & Poole , 2003 ; Ji & Hollocher , 1988 ; Lidder & Webb , 2013 ) . 
+ Although the role of NO is presumably anti-microbial , the pathogen may use host NO ( as well as NO derived from nitrate respiration ) as a signal to induce the expression of virulence genes . 
+ We make this suggestion based on the observation that virulence-related genes are upregulated in UPEC cultures exposed to nitrate and/or to NO produced by nitrate respiration ( Fig. S1c ) . 
+ The secreted autotransporter toxin ( Sat ) is a serine protease that causes cytoplasmic vacuolation and histological damage in urinary-tract-derived epithelial cells ( Guyer et al. , 2002 ) . 
+ The ﬂagellar gene ﬂiC contributes to ﬁtness of UPEC and enhances its pathogenesis ( Lane et al. , 2005 ) , and other genes involved in ﬂagellar assembly ( ﬂiD , ﬂiS , ﬂiR , motAB and ﬂgM ) were upregulated in nitrate-grown cells . 
+ The ﬁmbrial site-speciﬁc recombinases ﬁmB and ﬁmE can be associated with virulence , as they regulate type 1 ﬁmbrial gene expression . 
+ Type 1 ﬁmbriae are known to enhance E. coli virulence in the urinary tract ( Bryan et al. , 2006 ; Connell et al. , 1996 ) . 
+ Haemolysin ( encoded by hly genes ) is a cytotoxin for renal proximal tubular epithelial cells and a haemolysin-deﬁcient CFT073 mutant demonstrates signiﬁcantly reduced cytotoxicity ( Mobley et al. , 1990 ) . 
+ The sialic acid capsule proteins ( encoded by kps genes ) , also known as K antigens , encapsulate bacteria so that they can evade unspeciﬁc host responses ( Jann & Jann , 1992 ; Rowe et al. , 2000 ) . 
+ BipA is identiﬁed as a virulence regulator in enteropathogenic E. coli that regulates several processes such as ﬂagella-mediated motility , resistance to host defence peptides and group 2 capsule gene clusters ( Farris et al. , 1998 ; Rowe et al. , 2000 ) . 
+ The dipeptide binding protein DppA delivers dipeptides to its cognate ABC-type transporter proteins . 
+ As sugar sources such as glucose , maltose and lactose are rare in the urinary tract , it is suggested that dipeptides and certain amino acids such as D-serine are important sources of nutrients for UPEC ( Haugen et al. , 2007 ) . 
+ Upregulation of dppA is observed during UTI ( Subashchandrabose et al. , 2014 ) . 
+ A periplasmic osmoprotectant , ProV , is upregulated during UTI ( Subashchandrabose et al. , 2014 ) and we observe upregulation during our growth conditions as well . 
+ Very often , pathogens experience increased osmotic pressure at the site of infection and hence acquire osmoprotectants from the environment ( Lewis et al. , 2012 ) . 
+ KpsMT , DppA and ProV are ABC-type transporters that , under appropriate conditions , become important for viability , virulence and pathogenicity ( Davidson et al. , 2008 ) . 
+ Upregulation of all these virulence-related genes suggests that , while NO provides a defence mechanism for the host and nitrate provides an electron acceptor for pathogens , either or both could also provide a useful signalling mechanism for pathogenic bacteria to induce virulence . 
+ Several genes were selected from the RNA-seq dataset for validation using RT-PCR . 
+ Expression of these genes was measured in both the wild-type strain and the 3X mutant grown anaerobically with nitrate ( the same conditions used for RNA-seq ) . 
+ Of seven genes upregulated by nitrate according to RNA-seq data , six ( sdhA , kpsM , bipA , cyoA , dppA and ytfE ) were also upregulated in RT-PCR data , in both strains ( Fig. 4 ) . 
+ The seventh ( hlyA ) was upregulated only in the 3X mutant . 
+ Three genes that were downregulated by nitrate in RNA-seq data ( asr , hycA and fdhF ) were also downregulated according to RT-PCR ( Fig. 4 ) . 
+ On the basis of this selection of genes , we conclude that most changes observed in the transcriptome of the 3X mutant in response to nitrate are likely also to occur in the wild-type parent . 
+ The NsrR regulon of E. coli CFT073
+ As the E. coli CFT073 genome is ~0.6 Mb larger than that of E. coli K-12 , it is of interest to determine the extent to which regulatory networks of the two organisms differ . 
+ Thus , we used chromatin immunoprecipitation and DNA sequencing ( ChIP-seq ) to identify NsrR binding sites in the E. coli CFT073 genome . 
+ Cultures expressing 3×FLAG-tagged NsrR were grown aerobically . 
+ After ChIP , libraries constructed from precipitated DNAs were sequenced using the Illumina MiSeq platform . 
+ The peak ﬁnding algorithm MACS2 was used to identify putative NsrR binding sites , with a false discovery rate of 0.01 . 
+ Ninety-four signiﬁcant peaks [ -log10 ( P-value ) > 10 with fold enrichment greater than 2 ] were identiﬁed in at least two of the three biological replicates . 
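+ The selection rule in the preceding sentence (a score threshold, an enrichment threshold, and presence in at least two of three replicates) can be sketched as below. This assumes that peaks from different replicates have already been matched to shared site identiﬁers, which in practice requires an overlap-based merging step not shown here. The dictionary keys and function names are hypothetical and are not MACS2 output column names.

```python
from typing import Dict, List, Sequence, Union

Peak = Dict[str, Union[str, float]]


def passes_thresholds(peak: Peak, min_neg_log10_p: float = 10.0,
                      min_fold: float = 2.0) -> bool:
    """Apply the significance and fold-enrichment cut-offs to one peak."""
    return peak["neg_log10_p"] > min_neg_log10_p and peak["fold_enrichment"] > min_fold


def reproducible_sites(replicates: Sequence[Sequence[Peak]],
                       min_replicates: int = 2) -> List[str]:
    """Return site identifiers that pass the cut-offs in enough replicates."""
    support: Dict[str, int] = {}
    for peaks in replicates:
        # A set ensures each site is counted at most once per replicate.
        for site in {str(p["site"]) for p in peaks if passes_thresholds(p)}:
            support[site] = support.get(site, 0) + 1
    return sorted(site for site, n in support.items() if n >= min_replicates)


# Invented example data for three replicates.
rep1 = [{"site": "s1", "neg_log10_p": 25.0, "fold_enrichment": 4.0},
        {"site": "s2", "neg_log10_p": 12.0, "fold_enrichment": 1.5}]
rep2 = [{"site": "s1", "neg_log10_p": 18.0, "fold_enrichment": 3.1},
        {"site": "s3", "neg_log10_p": 30.0, "fold_enrichment": 6.0}]
rep3 = [{"site": "s3", "neg_log10_p": 9.0, "fold_enrichment": 5.0}]
selected = reproducible_sites([rep1, rep2, rep3])
```

+ Here only `s1` is retained: `s2` fails the enrichment cut-off, and `s3` passes in just one replicate.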
+ In total , 52 % of the binding sites ( 49 of 94 ) in E. coli CFT073 were located in putative promoter regions ( within 350 bp of the start codon ) and the remaining 48 % were found either within coding regions or between the coding regions of convergent genes . 
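+ The promoter assignment described above can be illustrated with a small strand-aware helper. The text says only that sites lie within 350 bp of the start codon, so restricting the window to the upstream side is an assumption of this sketch, as are the coordinate convention and the function name.

```python
def in_putative_promoter(peak_pos: int, start_codon_pos: int,
                         strand: str, window: int = 350) -> bool:
    """Return True if a peak lies 0 to `window` bp upstream of a start codon.

    Coordinates are positions on the forward strand; `strand` is "+" or "-".
    """
    if strand == "+":
        upstream_distance = start_codon_pos - peak_pos
    elif strand == "-":
        upstream_distance = peak_pos - start_codon_pos
    else:
        raise ValueError("strand must be '+' or '-'")
    return 0 <= upstream_distance <= window


# The reported split: 49 of 94 sites were promoter-associated.
promoter_fraction = 49 / 94
```

+ Peaks that fail this test for every annotated gene would fall into the remaining group, that is, sites within coding regions or between convergent genes.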
+ These potentially functional 49 NsrR binding sites are shown in Table 2 . 
+ The presence of promoter-associated NsrR binding sites identiﬁes target genes that potentially belong to the NsrR regulon . 
+ Of these promoters bound by NsrR in vivo ( Table 2 ) , 19 ( grxD , hypA , ytfE , ygiG/folB , hmp , ybjW/hcp , feoA , ybeM , yihF , yccM , yibD/waaH , yieI , yohK , ygiF , trxB , yggS/yggT , dgoK , rfe , yfhB/pgpC ) were identiﬁed in a previous ChIP-chip analysis of NsrR binding sites in E. coli K-12 ( Partridge et al. , 2009 ) . 
+ Twenty of the remaining sites are associated with genes ( hycB , phnC , arnA , livJ , wzxE , tufB , yfbT , glpA , yjbN , deoB , murB , recJ , dkgB , c3139 , rhaB , arcB , c3976/nanK , ﬁmC , c3934/hﬂB and malF ) that have homologues in E. coli K-12 , and 10 ( c0118 , c0233 , c2471 , c5065 , c0813 , c2514 , c0650 , c4580 , c4214 and c5205 ) are speciﬁc to E. coli CFT073 . 
+ In E. coli K-12 , the nrfA promoter is bound by NsrR ( Partridge et al. , 2009 ) and is repressed by NsrR according to microarray and reporter fusion data ( Filenko et al. , 2007 ) . 
+ In our ChIP-seq data , NsrR binding was also detected upstream of the transcription unit that includes nrfA . 
+ In strain CFT073 , an additional gene upstream of nrfA ( c5065 ) is predicted to be co-expressed with nrfABCD . 
+ The c5065 gene encodes a small protein of 65 aa residues . 
+ We have conﬁrmed the sequence of this reading frame in the CFT073 genome . 
+ The genome location and expression pattern of the c5065 gene suggest that its product may have a role in the response to NO stress in CFT073 . 
+ Nineteen of the 49 potential NsrR targets show differential expression with a fold change greater than 1.5 for the CFT073 3X mutant strain in the presence of a physiological source of NO ( Table 3 ) . 
+ The hycB , c0118 , feoA , ybeM , c5065 , c0813 , c2514 , glpA , deoB , hypAB , yohJK and yfhB / pgpC genes were downregulated , among which hycB , c0118 , c5065 , c0813 , c2514 , glpA and deoB are newly detected potential NsrR targets in CFT073 . 
+ The grxD , folB , ybjW/hcp , ytfE , yjbN , trxB and c5205 genes were upregulated and c5205 and yjbN are potential NsrR targets newly detected in CFT073 . 
+ The glpA gene , which encodes anaerobic glycerol-3-phosphate dehydrogenase subunit A , is downregulated in UPEC strain UTI89 exposed to acidiﬁed nitrite ( Bower et al. , 2009 ) . 
+ The livJ gene , which encodes a periplasmic Leu/Ile/Val-binding protein , is upregulated during in vitro growth in human urine ( Snyder et al. , 2004 ) . 
+ The ﬁmC gene encoding type 1 ﬁmbriae is upregulated in vivo during UTI ( Snyder et al. , 2004 ) . 
+ The rfe gene was upregulated in vivo compared with growth in human urine in vitro ( Hagan et al. , 2010 ) . 
+ The E. coli K-12 homologue of yfbT is upregulated in the presence of a source of NO ( Hyduke et al. , 2007 ) . 
+ In E. coli K-12 , the expression of arcB and malF was increased and decreased , respectively , after treatment with NO ( Hyduke et al. , 2007 ) , and phnC was upregulated by treatment with 1 mM S-nitrosoglutathione or acidiﬁed nitrite ( Mukhopadhyay et al. , 2004 ) . 
+ Computational analysis of NsrR binding sites in CFT073
+ The 49 peaks located in putative regulatory regions were used to construct a PSWM for NsrR binding sites in the CFT073 genome . 
+ For each peak , a 200 bp window centred on the nucleotide with the largest tag density was analysed ( Myers et al. , 2013 ) . 
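The window extraction described above can be sketched as follows. This is a minimal illustration, assuming a 0-based summit coordinate on a linear sequence string; the function name is hypothetical, and clipping at the sequence ends is a simplification (a circular chromosome would need wrap-around handling).

```python
def summit_window(genome: str, summit: int, width: int = 200) -> str:
    """Return up to `width` bases centred on the 0-based position `summit`.

    The window is clipped at the ends of the sequence rather than wrapped
    (assumption: linear coordinates are good enough for the illustration).
    """
    if not 0 <= summit < len(genome):
        raise ValueError("summit lies outside the sequence")
    half = width // 2
    start = max(0, summit - half)
    end = min(len(genome), summit + half)
    return genome[start:end]
```

With a 400-base toy sequence, `summit_window(seq, 200)` returns exactly 200 bases, while a summit close to the start yields a shorter, clipped window.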
+ The sequence of NsrR in CFT073 is identical to that in E. coli K-12 , and evidence from previous studies suggests that NsrR binding sites have two copies of an 11 bp motif arranged as an inverted repeat with 1 bp spacing ( Partridge et al. , 2009 ) . 
+ We therefore first used MEME to identify over-represented palindromic sequences , with the parameters '-mod zoops -nmotifs 1 -minw 23 -maxw 23 -revcomp -pal' , to see if the same motif could be retrieved . 
+ Motifs matching the search criteria could be found in 20 of the 49 peak regions . 
+ As expected , the predicted NsrR binding site in CFT073 is similar to that for E. coli K-12 ( Fig. 5 ) . 
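The 11 -- 1 -- 11 inverted-repeat architecture discussed above can be checked for a candidate site with a short sketch like the one below. The function names are hypothetical and the sequence in the usage note is arbitrary, not a reported NsrR site; the sketch only counts how far the right half departs from the reverse complement of the left half.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an upper- or lower-case DNA string."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def half_site_mismatches(site: str, half: int = 11, spacer: int = 1) -> int:
    """Count mismatches between the left half-site and the reverse
    complement of the right half-site (0 means a perfect inverted repeat)."""
    if len(site) != 2 * half + spacer:
        raise ValueError("site length must equal 2 * half + spacer")
    left = site[:half].upper()
    right = site[half + spacer:]
    return sum(a != b for a, b in zip(left, reverse_complement(right)))
```

A 23 bp string built as `left + "A" + reverse_complement(left)` gives zero mismatches, and each substitution in one half raises the count by one.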
+ A precision -- recall curve ( see Methods ) was constructed using the NsrR PSWM with two inverted repeats and searching throughout the genome of CFT073 to determine the optimal threshold for predicting high-quality NsrR binding sites . 
+ Using an ln ( P-value ) of -14.28 as the cut-off , where we had both relatively high precision and recall , there were 27 predicted NsrR binding sites with the 11 -- 1 -- 11 inverted repeat ( palindrome ) motif in the CFT073 genome ( Table 4 ) . 
+ Four of these predicted targets were not detected by the ChIP-seq data ( tehA , yeaR , yhiX and ygbA ) . 
+ Among them , yeaR and ygbA are known to be regulated by NsrR ( Bodenmiller & Spiro , 2006 ; Lin et al. , 2007 ) , and the ygbA promoter was reported to be bound by NsrR in E. coli K-12 according to previous ChIP-chip data ( Partridge et al. , 2009 ) . 
+ Likewise , tehA was implicated as an NsrR target in E. coli K-12 by the same ChIP-chip data and by repressor titration ( Bodenmiller & Spiro , 2006 ) , and it was shown to be upregulated in the urinary tract in an asymptomatic bacteriuria strain of E. coli ( Roos & Klemm , 2006 ) . 
+ By contrast , reporter fusion data suggest that tehA is not regulated by NsrR ( Bodenmiller & Spiro , 2006 ) ; conﬂicting reports may reﬂect differences in growth conditions or genetic background . 
+ Minimally , we can conclude that yeaR and ygbA are probably false negatives in our ChIP-seq data . 
+ The gadX ( yhiX ) gene was reported to be induced by NO through an indirect NsrR-dependent mechanism in E. coli O157 : H7 ( Branchu et al. , 2014 ) , but the presence of an NsrR binding site upstream of gadX may indicate a direct regulatory mechanism . 
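The choice of a score cut-off from a precision -- recall curve, as used above for the genome-wide scan, can be illustrated with the sketch below. The paper's Methods define the actual procedure; here it is simply assumed that a predicted site counts as a true positive when it falls inside a ChIP-seq peak, and all names and numbers are hypothetical.

```python
def precision_recall(scored_sites, n_positives, cutoff):
    """Precision and recall for sites called at ln(P-value) <= cutoff.

    scored_sites: iterable of (ln_p_value, in_peak) pairs, where in_peak is
                  True if the site overlaps a ChIP-seq peak (assumed
                  definition of a true positive).
    n_positives:  total number of positives that could be recovered.
    """
    called = [in_peak for ln_p, in_peak in scored_sites if ln_p <= cutoff]
    true_pos = sum(called)
    precision = true_pos / len(called) if called else 1.0
    recall = true_pos / n_positives if n_positives else 0.0
    return precision, recall
```

Sweeping `cutoff` over the observed scores and plotting the resulting pairs gives the curve; a stricter (more negative) cut-off typically raises precision and lowers recall.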
+ There is evidence that a single 11 bp motif can function as an NsrR-binding site in E. coli K-12 ( Partridge et al. , 2009 ) . 
+ So we combined the two halves of the 11 -- 1 -- 11 palindromic motif , and reconstructed a PSWM of 11 bp . 
+ The new 11 bp PSWM was used to scan the 49 sequences of 200 bp flanking all the peak regions , using a P-value cut-off of 10^-6 . 
+ In this analysis , 38 of 49 peaks had at least one single motif , and the updated sequence logo for the 11 bp motif is shown in Fig. 5 ( b ) . 
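One way to combine the two halves of a palindromic motif into a single half-site matrix is sketched below. This is an assumption about the general approach, not the authors' implementation: the right half-site counts are reverse-complemented (column order reversed, A/T and C/G swapped) and added to the left half-site counts. All names are hypothetical.

```python
_COMP = {"A": "T", "C": "G", "G": "C", "T": "A"}


def revcomp_counts(columns):
    """Reverse-complement a count matrix given as a list of
    {base: count} dicts, one dict per motif position."""
    return [{_COMP[base]: n for base, n in col.items()}
            for col in reversed(columns)]


def merge_half_sites(left, right):
    """Add left half-site counts to the reverse complement of the right
    half-site counts, position by position."""
    if len(left) != len(right):
        raise ValueError("half-sites must have the same length")
    flipped = revcomp_counts(right)
    return [{b: l.get(b, 0) + r.get(b, 0) for b in "ACGT"}
            for l, r in zip(left, flipped)]
```

The merged counts can then be converted to frequencies or log-odds weights (with a pseudocount) before scanning sequences for single half-sites.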
+ Discussion
+ Flavohaemoglobin ( Hmp ) , ﬂavorubredoxin ( FlRd ) and respiratory nitrite reductase ( Nrf ) have been extensively studied to understand their role in combating NO in E. coli and Salmonella enterica ( Clarke et al. , 2008 ; Gardner , 2005 ; Gardner & Gardner , 2002 ; Gardner et al. , 2002 ; Gomes et al. , 2002 ; Mills et al. , 2008 ; Poock et al. , 2002 ; van Wonderen et al. , 2008 ) . 
+ In S. enterica , it has been suggested that additional NO detoxification mechanisms are expressed in the absence of Hmp , and that the availability of different NO detoxification mechanisms under different environmental conditions is an important contributor to virulence ( Mills et al. , 2008 ) . 
+ Work in E. coli K-12 lacking the known NO detoxiﬁcation mechanisms has also suggested the existence of an additional major pathway of NO metabolism ( Vine & Cole , 2011b ) , which may be the same activity that we have observed in CFT073 . 
+ In UPEC strains , the only system known to protect the pathogen from NO is Hmp ( Svensson et al. , 2006 , 2010 ) . 
+ Competitive infection of UTI mouse models with wild-type and hmp-deleted UPEC strains showed a decreased ability of the mutant to infect ( Svensson et al. , 2010 ) . 
+ The roles of NrfA and FlRd have not been studied in the pathogen , but in this paper we show that FlRd is a major contributor to NO metabolism in UPEC , and that there is an additional NO-inducible activity yet to be identiﬁed . 
+ Our data suggest that the respiratory nitrite reductase Nrf makes only a minor contribution to NO metabolism . 
+ Previous transcriptomic studies have suggested that UPEC experiences iron and oxygen limitation in the urinary tract ( Hagan et al. , 2010 ; Snyder et al. , 2004 ) . 
+ It has also been proposed that the ability of UPEC to adapt to low oxygen may be critical for successful bladder colonization during UTI ( Subashchandrabose et al. , 2014 ) . 
+ As UPEC is potentially exposed to NO ( host derived and/or endogenously generated from nitrate respiration ) , nitrate and low oxygen in vivo , our choice of growth conditions for transcriptomics is relevant to the host environment . 
+ Our data ( Fig. 4 ) suggest that responses to nitrate in the 3X mutant also occur in the wild-type parent . 
+ Nevertheless , the use of the 3X mutant may exacerbate responses to nitrate and NO compared with the parent strain . 
+ Our results have highlighted a group of interesting genes ( including some that are virulence-associated ) that we believe are good candidates for further investigation , including in vivo approaches . 
+ A disadvantage of the use of nitrate as a source of NO is that we can not necessarily disentangle the effects of endogenously generated NO from direct effects due to nitrate . 
+ Thus , the changes observed in the transcriptome are likely to be mediated by nitrate sensing systems ( NarXL/NarQP ) in addition to those responsive to NO ( principally NorR and NsrR ) . 
+ At least for those genes shared with E. coli K-12 , we can use prior knowledge to infer some of the regulatory consequences of nitrate exposure . 
+ For example , upregulation of members of the NsrR regulon is strong evidence for the generation of physiologically significant concentrations of NO . 
+ In this work , upregulation of genes involved in respiration and electron transport , along with genes associated with the TCA cycle , suggests that the pathogen uses these mechanisms to maximize energy generation during NO stress . 
+ Decreases in the levels of transcripts involved in glucose metabolism ( glycolysis and gluconeogenesis ) and upregulation of genes involved in dipeptide transport suggest that during NO stress , glucose may not be the energy source used by the pathogen . 
+ Differential expression of virulence-associated genes and genes on pathogenicity islands as a consequence of nitrate exposure suggests a role for nitrate and/or NO in pathogenesis . 
+ These experiments were performed with a mutant strain compromised in its ability to metabolize NO . 
+ By ChIP-seq we identiﬁed NsrR binding sites in the CFT073 genome . 
+ Of 49 NsrR binding sites in promoter regions , 19 are associated with genes that were nitrate-responsive in the RNA-seq data . 
+ This discrepancy may reﬂect differences in the strains used , or the growth conditions used for the two experiments ( aerobic growth for ChIP-seq , anaerobic growth for RNA-seq ) , although there is no published evidence to suggest that NsrR binding to DNA is sensitive to oxygen in vivo . 
+ Another possible explanation is that at some binding sites NsrR exerts weak or no regulation , as we have observed previously for E. coli K-12 . 
+ As was the case for E. coli K-12 ( Partridge et al. , 2009 ) , around half of the mapped sites were within coding regions or between convergently transcribed genes . 
+ Similar results have been obtained with other regulatory proteins , for example Fur ( Seo et al. , 2014 ) , and this is not surprising behaviour for a DNA-binding protein with a relaxed sequence speciﬁcity . 
+ We assume that most sites in this category have no biological function , although some may regulate the activity of promoters driving expression of small or anti-sense RNAs . 
+ We found strong NsrR binding signals upstream of some hypothetical proteins of unknown function , some of them speciﬁc to CFT073 ( meaning not present in E. coli K-12 ) . 
+ Examples are c0118 and c0233 , which are homologues of each other . 
+ Both c0118 and c0233 have two copies of a conserved helix -- turn -- helix domain that is often found in transposases and is likely to bind DNA . 
+ Both proteins are implicated as transposases or derivatives in the clusters of orthologous groups of proteins ( COGs ) database . 
+ Transposase genes are frequently associated with pathogenicity islands , and NsrR has been implicated in regulating pathogenicity island genes in E. coli O157 : H7 ( Branchu et al. , 2014 ) . 
+ Therefore , it would be interesting to study the function of c0118 and c0233 to see if they are related to the pathogenicity of CFT073 , and to determine if NsrR is involved in the regulation of pathogenicity island genes . 
+ Of the genes implicated as possible NsrR targets by ChIP-seq that were also differentially regulated in response to NO , two-thirds were downregulated in the presence of a source of NO . 
+ This behaviour is consistent with positive regulation by NsrR , as has been reported previously ( Branchu et al. , 2014 ) , or with indirect effects of NsrR . 
+ Some genes associated with NsrR binding sites were not differentially regulated in the RNA-seq experiment , which may indicate that these genes are subject to multiple regulatory mechanisms , such that regulation by NsrR is revealed only under speciﬁc growth conditions . 
+ An additional possibility is that there is a category of promoter that is bound by , but not regulated by , NsrR . 
+ In conclusion , the response of UPEC strain CFT073 to NO overlaps substantially with that of E. coli K-12 . 
+ In both cases , Hmp and FlRd provide the principal NO detoxiﬁcation mechanisms , although there is evidence of additional activities yet to be identiﬁed . 
+ Anaerobic growth in the presence of nitrate ( and therefore low concentrations of endogenously generated NO ) causes a major reprogramming of the transcriptome . 
+ Major players in regulating differential gene expression under these conditions are likely to be NarXL , NarQP , FNR and NsrR . 
+ The NsrR regulon of CFT073 overlaps signiﬁcantly with that of E. coli K-12 , but our data also suggest that NsrR ( and therefore NO ) may regulate several CFT073 genes that do not have homologues in E. coli K-12 . 
+ Acknowledgements
+ We are grateful to Harry Mobley and Barry Wanner for generously providing strains and plasmids . 
+ We thank Zhenyu Xuan for help with data analysis and Yunfei Wang for help with sequencing . 
+ M. Q. Z. acknowledges ﬁnancial support from the National Institutes of Health ( award HG001696 ) and the Cecil H. and Ida Green Endowment .
\ No newline at end of file
--- a/data/TEXT_FILES/notuseful_txt/28439033.txt 0 → 100644
+++ b/data/TEXT_FILES/notuseful_txt/28439033.txt 0 → 100644
+ Genome-Wide Analysis of ResD, NsrR,
+ ABSTRACT Upon oxygen limitation , the Bacillus subtilis ResE sensor kinase and its cognate ResD response regulator play primary roles in the transcriptional activation of genes functioning in anaerobic respiration . 
+ The nitric oxide ( NO ) - sensitive NsrR repressor controls transcription to support nitrate respiration . 
+ In addition , the ferric uptake repressor ( Fur ) can modulate transcription under anaerobic conditions . 
+ However , whether these controls are direct or indirect has been investigated only in a gene-speciﬁc manner . 
+ To gain a genomic view of anaerobic gene regulation , we determined the genome-wide in vivo DNA binding of ResD , NsrR , and Fur transcription factors ( TFs ) using in situ DNase I footprinting combined with chromatin afﬁnity precipitation sequencing ( ChAP-seq ; genome footprinting by high-throughput sequencing [ GeF-seq ] ) . 
+ A signiﬁcant number of sites were targets of ResD and NsrR , and a majority of them were also bound by Fur . 
+ The binding of multiple TFs to overlapping targets affected each individual TF 's binding , which led to combinatorial transcriptional control . 
+ ResD bound to both the promoters and the coding regions of genes under its positive control . 
+ Other genes showing enrichment of ResD at only the promoter regions are targets of direct ResD-dependent repression or antirepression . 
+ The results support previous ﬁndings of ResD as an RNA polymerase ( RNAP ) - binding protein and indicated that ResD can associate with the transcription elongation complex . 
+ The data set allowed us to reexamine consensus sequence motifs of Fur , ResD , and NsrR and uncovered evidence that multiple TGW ( where W is A or T ) sequences surrounded by an A- and T-rich sequence are often found at sites where all three TFs competitively bind . 
+ IMPORTANCE Bacteria encounter oxygen ﬂuctuation in their natural environment as well as in host organisms . 
+ Hence , understanding how bacteria respond to oxygen limitation will impact environmental and human health . 
+ ResD , NsrR , and Fur control transcription under anaerobic conditions . 
+ This work using in situ DNase I footprinting uncovered the genome-wide binding proﬁle of the three transcription factors ( TFs ) . 
+ Binding of the TFs is often competitive or cooperative depending on the promoters and the presence of other TFs , indicating that transcriptional regulation by multiple TFs is much more complex than we originally thought . 
+ The results from this study provide a more complete picture of anaerobic gene regulation governed by ResD , NsrR , and Fur and contribute to our further understanding of anaerobic physiology . 
+ Bacteria live in environments where many chemical and physical parameters constantly change . 
+ In addition , their life is affected by the presence and activity of other living organisms . 
+ These effects are sometimes detrimental to bacteria but are often manageable by orchestrating global gene expression in response to each environmental change . 
+ One such change they encounter in diverse environments and in living hosts is a ﬂuctuation of oxygen concentration . 
+ Bacillus subtilis adapts to oxygen limitation by undergoing nitrate respiration or anaerobic fermentation that generates ATP by substrate-level phosphorylation . 
+ Gene regulation required for the adaptation from aerobic to anaerobic conditions is mainly controlled at the transcriptional level ( 1 ) . 
+ The signal transduction system composed of the ResD response regulator and the ResE histidine sensor kinase plays essential roles in nitrate respiration ( 2 ) and aerobic respiration ( 3 ) . 
+ Due to the dual roles of ResDE in aerobic and anaerobic respiration , B. subtilis is endowed with an additional layer of regulation for controlling genes that function in nitrate respiration when cells encounter oxygen-limited conditions in the presence of nitrate . 
+ This regulatory mechanism is executed by two [ 4Fe-4S ] - containing transcription factors ( TFs ) , namely , oxygen-sensitive Fnr and nitric-oxide ( NO ) - sensitive NsrR . 
+ Fnr plays a major role in autoregulation through the narK-fnr operon promoter ( 2 ) , as well as in the activation of the narGHJI ( respiratory nitrate reductase ) operon ( 4 , 5 ) . 
+ The NsrR repressor functions in upregulating ResD-dependent transcription of the nasDEF nitrite reductase operon in response to NO ( 6 ) . 
+ NO is generated from nitrite as a by-product during nitrate respiration ( 6 ) ; hence , B. subtilis likely senses nitrate availability through the presence of NO . 
+ In addition , nitrite reductase converts nitrite to ammonium , thus mitigating NO accumulation that is harmful to cells . 
+ NO detoxiﬁcation could also be carried out under microaerobic conditions by ﬂavohemoglobin ( 7 ) , which is encoded in B. subtilis by another ResD/NsrR-controlled gene , hmp ( 8 ) . 
+ We have shown in vivo that nasD and hmp transcription are highly repressed by NsrR during anaerobic fermentative growth ( in the absence of NO ) , but that the expression of these genes is induced under conditions that support nitrate respiration ( in the presence of NO ) ( 6 ) . 
+ Studies using electrophoretic mobility shift assays ( EMSAs ) and in vitro transcription conﬁrmed that NsrR is an NO-sensitive repressor ( 9 , 10 ) . 
+ Electron paramagnetic resonance and resonance Raman spectroscopies demonstrated that NO interaction with iron in the [ 4Fe-4S ] cluster causes dinitrosylation of iron , leading to the dissociation of NsrR from the nasD promoter DNA ( 11 ) . 
+ The direct modiﬁcation of the [ 4Fe-4S ] cluster by NO has also been reported with an NsrR ortholog ( 12 ) . 
+ To gain insight into the genome-wide function of NsrR in transcription , a microarray analysis was conducted ( 9 ) . 
+ The results showed that hmp and nasD are the genes most highly regulated by NsrR . 
+ In addition , we identiﬁed other genes moderately controlled by NsrR , some of which belong to the Fur regulon involved in iron homeostasis ( 13 ) . 
+ Although in vivo transcription assays conﬁrmed the negative effect of NO on NsrR repression of the newly identiﬁed genes , an EMSA showed that NsrR only weakly binds to this class of promoter DNA in an NO-insensitive manner ( 9 ) . 
+ Hence , NsrR might indirectly control the transcription of these genes . 
+ Alternatively , efﬁcient NsrR binding could require ternary DNA structure and/or other TFs to assist NsrR binding to DNA , both of which were lacking in the in vitro studies ( 9 ) . 
+ The following in vivo protein-DNA binding approach using chromatin afﬁnity precipitation ( ChAP ) - chip revealed that NsrR interacts with some but not all of the genes identiﬁed by the preceding microarray results ( 14 ) . 
+ Some of the DNA sites directly targeted by NsrR were also bound by ResD and/or Fur ( 14 ) . 
+ However , the resolution of ChAP-chip is not high enough to precisely localize the exact binding sites of each TF . 
+ In this study , we revisited the genome-wide interaction of the three TFs -- NsrR , ResD , and Fur -- in cells cultured under anaerobic fermentative conditions to resolve each TF-binding sequence by adapting the high-resolution mapping method called genome footprinting by high-throughput sequencing ( GeF-seq ) . 
+ GeF-seq involves in situ DNase I digestion of genomic DNA followed by ChAP-seq , which was successfully used by members of our group to delimit AbrB-binding sites ( 15 ) . 
+ A comparison of binding sites targeted by each TF in the wild type and in mutant strains lacking the other two TFs has revealed that the binding of ResD , NsrR , and Fur could be cooperative or competitive depending on the targeted promoters and the other TFs binding to nearby or to overlapping sites . 
+ By using a single-nucleotide-resolution analysis of GeF-seq data , we were able to determine that all three TFs competitively bind within a 40-bp region of promoter DNAs that contain sequences similar to the consensus sequence of each TF . 
+ We also showed that DNA-bound ResD distributions differ depending on ResD function in the transcriptional control of each ResD-targeted gene . 
+ RESULTS
+ Rationale . 
+ Our previous ChAP-quantitative PCR ( qPCR ) showed that ResD , NsrR , and/or Fur often affects the binding of the other TF ( s ) ( 14 ) . 
+ To gain further understanding of how binding by the multiple TFs to promoter DNA affects individual TF-DNA interactions , we carried out GeF-seq both in the wild type and in mutants that do not produce the other two TFs . 
+ Comparing TF binding between the wild type and double mutant strains is advantageous for two reasons . 
+ First , the binding of multiple TFs to nearby but not overlapping sequences might lead to cross-linking between His12-tagged and untagged TFs , which could contribute to protection of the DNA region against DNase I cleavage . 
+ Thus , afﬁnity puriﬁcation of the DNA during ChAP generates a longer DNA that includes more than a single TF-binding site . 
+ In fact , a search for NsrR- and ResD-binding motifs by MEME ( multiple expectation maximization for motif elicitation ) analysis of GeF-seq data from wild-type extracts often identified the Fur-binding motif ( data not shown ) . 
+ We could partially circumvent the problem by examining each TF-binding target in mutant strains . 
+ Second , each TF affects , positively or negatively , DNA binding by other TFs . 
+ Thus , a comparison of binding distributions of each TF in the absence or presence of other TFs likely provides us with more valuable information , such as the existence of cooperative and competitive binding , which is useful to understand the biological importance of combinatorial control exerted by multiple TFs . 
+ Therefore , we carried out GeF-seq using both the wild-type and double mutant strains . 
+ More speciﬁcally , NsrR , ResD , or Fur binding was determined in the resD fur , nsrR fur , or resD nsrR mutant cells , respectively . 
+ All GeF-seq experiments were carried out in cells cultured anaerobically in 2× yeast extract-tryptone medium ( YT ) supplemented with 0.5 % glucose and 0.5 % pyruvate to support anaerobic fermentation where NsrR is active as a repressor ( 6 ) . 
+ ResD and NsrR binding is localized at the hmp and nasD promoters . 
+ Previous ChAP-qPCR revealed that binding of ResD to nasD increases in the nsrR mutant background ( 14 ) , suggesting that NsrR efﬁciently competes with ResD . 
+ However , we were unable to distinguish whether this competition is due to either an overlapping or nearby sequence targeted by the two TFs . 
+ The GeF-seq results of ResD and NsrR binding to hmp ( Fig. 1A ) and nasD ( Fig. 1B ) at a single nucleotide resolution not only solved the question but also provided a new ﬁnding that the DNA-binding pattern of ResD correlates with its function in transcriptional control . 
+ Protein-binding regions shown in the left panels are expanded and shown in the right panels in Fig. 1 . 
+ NsrR interaction with the hmp promoter in the resD fur mutant was at an intensity comparable to that in the wild-type strain , but the binding peak in the mutant became more prominent within the region between 9 to 24 ( relative to the transcription start site ) ( Fig. 1A ) , which corresponds to the NsrR consensus sequence ( NsrR-1 ) predicted by bioinformatics analysis ( 16 ) . 
+ Given that Fur does not bind to hmp in either the wild type or the resD nsrR mutant , NsrR , not Fur , is likely the TF that outcompetes ResD for binding to the hmp promoter under anaerobic fermentation conditions . 
+ In addition , the GeF-seq data from the mutant revealed that ResD interacts with two distinct regions upstream of the core promoter and the transcription start site , whereas very low association of ResD with the upstream region was detected in the wild-type cells . 
+ The upstream ResD-binding site ( -93 to -35 ) corresponds well with the sites previously determined by in vitro DNase I footprinting ( -76 to -40 ) ( 17 ) and hydroxyl radical footprinting ( -81 to -47 ) ( 18 ) . 
+ The downstream ResD-binding site , which covers the -10 sequence and the transcription start site , overlaps with the NsrR-binding site . 
+ The downstream ResD site has not been detected in the in vitro binding studies ( 17 , 18 ) . 
+ We interpret this result as meaning that the upstream ResD-binding site is required for the activation of hmp transcription as demonstrated in the earlier study ( 17 ) , and the downstream site was not detected in the in vitro footprinting experiments because ResD binding to this site requires another protein , very likely RNA polymerase ( RNAP ) , as described later . 
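The agreement between the upstream ResD-binding site and the earlier in vitro footprints can be checked with a small interval comparison. This sketch assumes that all three coordinate pairs lie upstream of the transcription start site and are therefore negative (93 to 35, 76 to 40 and 81 to 47 read as -93 to -35, -76 to -40 and -81 to -47); the function names are hypothetical.

```python
def contains(outer, inner):
    """True if the closed interval `inner` lies entirely within `outer`."""
    return outer[0] <= inner[0] and inner[1] <= outer[1]


def overlap_bp(a, b):
    """Number of positions shared by two closed intervals (0 if disjoint).

    Valid here because both intervals are on the same side of the
    transcription start site, so no position 0 has to be skipped.
    """
    return max(0, min(a[1], b[1]) - max(a[0], b[0]) + 1)


# Coordinates relative to the transcription start site (assumed negative).
GEF_SEQ_SITE = (-93, -35)
DNASE_I_FOOTPRINT = (-76, -40)
HYDROXYL_FOOTPRINT = (-81, -47)
```

Under these assumptions both in vitro footprints fall entirely inside the GeF-seq site, consistent with the statement that the sites correspond well.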
+ Both NsrR- and ResD-binding profiles to nasD showed similarities to as well as differences from those to the hmp promoter region . 
+ Our previous work showed that the nasD promoter carries a sequence similar to the predicted NsrR consensus sequence in hmp ( 6 ) . 
+ This nasD sequence functions as the operator for NsrR based on the effect of base substitutions on in vitro NsrR-DNA binding and in vivo transcription ( 9 ) . 
+ Binding of NsrR to nasD , as detected by GeF-seq , covered the previously identiﬁed NsrR-binding site ( Fig. 1B ) . 
+ However , unlike at hmp , the binding intensity was lower in the resD fur mutant . 
+ The requirement of ResD for efﬁcient binding of NsrR to nasD was also demonstrated by ChAP-qPCR ( 14 ) . 
+ As observed with hmp , ResD was more enriched at the nasD promoter in the absence of NsrR . 
+ At the hmp promoter , the upstream site showed greater interaction with ResD than the downstream site , whereas for nasD , the promoter region ( corresponding to the downstream site of hmp ) had a higher intensity of ResD binding , and the region upstream of -35 ( corresponding to the upstream site of hmp ) in the mutant was recognized , at best , only as a shoulder of the peak . 
+ In summary , the results provide a high-resolution view of NsrR interference in ResD interaction with the hmp and nasD promoter DNA . 
+ Previous studies suggested that the transcription of hmp and nasD is controlled by ResD and NsrR through similar mechanisms , but this study revealed that there are differences between each promoter with respect to ResD and NsrR . 
+ Two different ResD-binding proﬁles reﬂect the regulatory roles of ResD . 
+ ResD has been known to act as a direct transcriptional activator but has not been identiﬁed as a direct transcriptional repressor . 
+ GeF-seq revealed two different ResD-binding profiles ( Fig. 2 ) , which suggested that ResD plays diverse roles in transcriptional control . 
+ Table 1 lists representative promoters where ResD strongly binds , as well as how ResD regulates the transcription of each gene based on previously published and unpublished results together with those obtained from this study . 
+ Seventeen genes ( 18 sites ) showed ResD binding at both their promoters and throughout the entire coding regions ( Table 1 , group A ) . 
+ The binding of ResD was clearly sharper and stronger to the promoters than to the coding regions ( Fig. 2A to D ) . 
+ As evident with the nasDEF and the cydABCD operons , the trail of binding ends at the rho-independent terminator of the operon , suggesting that a portion of the ResD population that binds to the promoter remains associated with the RNAP elongation complex . 
+ ResD-binding regions in the second class of genes are limited to the promoter DNA ( Table 1 , group B , and Fig. 2E to G ) . 
+ We examined whether ResD plays distinct roles in transcription between genes that show different binding proﬁles . 
+ As shown in Table 1 , most of the genes in group A , to which ResD binds both the promoter and coding regions , function in aerobic or anaerobic respiration and are controlled positively by ResD . 
+ These genes include hmp ( 8 ) , nasD ( 19 ) , narG ( 4 ) , ctaA ( heme A synthase gene ) ( 20 ) , ctaB ( major heme O synthase gene ) ( 21 ) , qox ( cytochrome aa3 quinol oxidase gene ) ( 22 ) , and cydA ( cytochrome bd ubiquinol oxidase gene ) ( 23 , 24 ) . 
+ Therefore , we speculated that ResD activates the transcription of all genes in this class . 
+ The speculation was tested using transcriptional lacZ fusions to as-yet-uncharacterized promoters that belong to this class ( Fig. 3 ; see also Fig. S1 in the supplemental material ) . 
+ Transcriptional control of these genes by ResD has not been reported , and the biological functions of most of the genes remain to be uncovered . 
+ As expected , ResD indeed activated yxiE and yozB transcription ( Fig. 3A and B ) , as well as ctaO ( minor heme O synthase gene ) ( 25 ) , ydbL , and yfmQ ( Fig. S1 ) . 
+ A previous study using afﬁnity puriﬁcation of RNAP followed by mass spectrometry identiﬁed ResD as an RNAP-associated protein ( 26 ) . 
+ The high afﬁnity of ResD to RNAP might explain why ResD travels with elongating RNAP through coding regions . 
+ Consistent with this ﬁnding , a previous EMSA showed that ResD binding to the nasD promoter is enhanced in the presence of RNAP ( 10 ) . 
+ Taken together , these results led us to conclude that genes exhibiting ResD associations with both promoter and coding regions are directly activated by ResD . 
+ Next , we chose ﬁve genes -- glpF , yjlC , ywcE , ymfC , and ytcP -- representing the second class of ResD-bound loci to examine whether and how ResD controls the transcription of these genes ( Fig. 3C to F ) . 
+ glpF and the downstream glpK constitute an operon encoding glycerol uptake facilitator and glycerol kinase , respectively . 
+ The third gene in the operon , glpD ( glycerol-3-phosphate dehydrogenase gene ) , is transcribed from a gene-speciﬁc promoter as well as the glpF promoter ( 27 ) . 
+ A previous transcriptomic analysis showed that glpFK transcription is repressed under anaerobic conditions compared with that under aerobic conditions and that aerobic glpD expression is upregulated in the resDE mutant ( 1 ) . 
+ These results suggested that ResD negatively controls the transcription of the glpF operon , which was conﬁrmed by data showing that glpF transcription is derepressed in the resD mutant ( Fig. 3C ) . 
+ Taken together with the GeF-seq results , we concluded that ResD is a direct repressor of the glpF operon . 
+ Similarly , the transcription of the yjlC-ndh operon was downregulated under anaerobic conditions and increased in the resDE mutant under aerobic conditions ( 1 ) . 
+ The regulatory loop of this operon was previously reported ( 28 ) . 
+ A higher NAD+ concentration leads to repression of the yjlC-ndh operon by the Rex repressor that responds to a decrease in the NADH/NAD+ ratio ( 29 ) , thereby leading to a reduction in NADH dehydrogenase encoded by ndh and a consequent elevation in the NADH/NAD+ ratio ( 28 ) . 
+ The operon has a relatively long untranslated leader sequence , and the transcription initiation site resides 257-bp upstream of the initiation codon . 
+ The Rex-binding site was identiﬁed within the leader region by deletion analysis and EMSA ( 28 ) . 
+ Our GeF-seq results showed that ResD binds to a sequence that includes the -10 site of the yjlC promoter ( 27 ) ( see Data Set S1 ) . 
+ ResD was able to repress the expression of lacZ fused to the yjlC promoter ( -138 to +29 ) that lacks the Rex-binding site ( +145 to +166 ) , thus functioning in a Rex-independent manner ( Fig. 3D ) . 
+ ywcE , whose transcription is under negative control by AbrB , encodes a holin-like protein , and the ywcE mutant forms spores with a reduced outer coat ( 30 ) . 
+ ResD repressed ywcE ( Fig. 3E ) at least under anaerobic conditions . 
+ ResD also repressed the transcription of ymfC ( GntR family TF ) ( Fig. 3F ) , but the effect of the resD mutation on ymfC transcription was not as strong as those of other genes listed in Fig. 3 . 
+ ytcP expression was not detected under the culture conditions tested regardless of the presence or absence of ResD . 
+ We previously showed that ResD functions as an antirepressor of Fur for ykuN ( ﬂavodoxin gene ) that also belongs to this class ( 14 ) . 
+ In summary , ResD plays the role of a repressor ( or antirepressor ) in transcriptional control of the second class of genes , but not as a direct activator , by binding to promoter regions . 
+ To the best of our knowledge , this is the ﬁrst report to show that ResD directly represses transcription . 
+ Some ResD-binding promoters interact with NsrR and/or Fur . 
+ Previous studies including ChAP-chip analyses showed that ResD and NsrR , by directly binding to hmp and nasD , positively and negatively regulate the transcription of these genes . 
+ Furthermore , ResD and NsrR modulate Fur-dependent repression of ykuN , which involves the direct binding of three TFs . 
+ The questions that remained to be answered were whether there are more promoter regions targeted by a combination of the three TFs and , if there are , whether and how the coordinated or simultaneous interactions affected the transcription of these genes . 
+ Table 1 classiﬁes ResD-binding genes by the ResD-binding proﬁles described above and by their interactions with NsrR and/or Fur . 
+ NsrR does not bind to the group A genes listed in Table 1 either in the wild type or in resD fur mutant strains , except for hmp and nasD as described above . 
+ These unique features of hmp and nasD are attributed to the physiological importance of NO-sensitive NsrR repression , the mechanism by which the RNAP-ResD complex efficiently competes for promoter binding with NsrR only when NO is present . 
+ Table 1 , group B , shows that ResD-binding patterns to promoter DNA somewhat correspond to whether NsrR also binds to the DNA . 
+ Genes in group Ba to which ResD binds in both the wild-type and mutant strains generally do not interact with NsrR or Fur , except yvaF , which exhibits higher binding by NsrR in the absence of ResD and Fur . 
+ ykuN and fbpC , which belong to group Bb , are enriched for both ResD and NsrR in the wild-type strain but not in the mutants . 
+ By contrast , Fur binding is less signiﬁcantly affected by the nsrR and resD mutations . 
+ Previous studies showed that fbpAB and fbpC are Fur-regulated RNA chaperones for the fsrA small RNA ( sRNA ) that functions in the iron-sparing response ( 31 ) . 
+ In comparison to fbpAB where NsrR and ResD bind weakly at most ( data not shown ) , all three TFs clearly bind to the fbpC promoter ( Table 1 ) . 
+ FsrA-dependent control of resA expression was uncovered in a previous study ( 31 ) . 
+ Given that resD is under the control of the resA operon promoter that is indirectly activated by ResD itself ( 32 ) , it is attractive to envision the involvement of FbpC in the ResDE autoregulatory pathway ; however , the role of the RNA chaperones , particularly FbpC , has not been fully established . 
+ Genes in the last group , Bc , could interact with NsrR and/or Fur ( except yppF ) , but the binding intensity of each TF is dramatically decreased in the wild-type cells , indicating the competitive nature of binding . 
+ An untranslated region ( UTR ) named S842 was identiﬁed upstream of yppF ( 27 ) , and we found that the yppF ResD-binding site is within the 146-bp region coding for S842 . 
+ S842 is induced by anaerobiosis ( 27 ) , thus ResD binding of the S842 site might be classiﬁed as belonging to group A instead of Bc , although it is difﬁcult to distinguish the ResD-binding proﬁle because of the small size of S842 . 
+ Transcriptional control of group Bc genes such as ymfC by each TF is often subtle and complex ( Fig. 3 and data not shown ) . 
+ This is likely caused by the overlap of each TF-binding site . 
+ For example , NsrR and/or Fur likely represses transcription by occupying the site available in the resD mutant . 
+ In summary , GeF-seq and transcription assays enabled us to identify genes that are direct targets of ResD , which could be further classiﬁed into different groups using in vivo binding proﬁles , ResD 's role in transcription , and whether or how NsrR/Fur affects mutual DNA binding . 
+ The three TFs show combinatorial binding to DNA . 
+ Genome-wide association studies of the TFs showed that ResD , NsrR , and Fur bind to a signiﬁcant number of overlapping sites ( Data Set S1 ) . 
+ These sites not only are in promoter regions but also localize outside promoter DNA . 
+ Each TF binds to a larger number of the overlapping sites when the other TFs are missing compared with that in the wild-type strain , suggesting that these TFs participate in competitive interactions more than cooperative interactions at overlapping binding sites ( Table 2 ) . 
+ Fur shares target sites with ResD and/or NsrR but almost exclusively binds to those associated with both ResD and NsrR . 
+ Such a trend was observed in both the wild type and the mutant backgrounds . 
+ These TFs use overlapping binding sequences of target genes ( Data Set S1 ) . 
+ The combinatorial binding by the TFs is classiﬁed into three groups based on how the binding of the TFs is mutually affected ( Table 2 ) . 
+ The ﬁrst group of genes includes ykuN and fbpC , which were classiﬁed into group Bb in Table 1 . 
+ The TFs cooperatively bind to either the promoter DNA ( ykuN , fbpC , and exlX [ extracellular endoglucanase gene ] ) or the coding region of ppsB ( plipastatin synthase gene ) . 
+ The binding of ResD and NsrR to these genes requires Fur , which is in good agreement with ChAP-qPCR results of ykuN ( 14 ) , whereas Fur is still able to bind to these genes in the absence of ResD and NsrR , albeit with less intensity ( fbpC and ppsB ) or with altered binding distribution ( ykuN , which will be discussed later and in Fig. 5 ) . 
+ The second group of genes includes those classiﬁed into group Bc in Table 1 . 
+ Each TF only binds to these genes , or the binding intensities increase when the other two TFs are missing . 
+ This competitive binding of the three TFs was also observed at two other sites , the promoter region of yngD ( nrnB ) that encodes nano-RNase B and the ydjA ( BsuM DNA restriction system [ 33 ] ) coding region . 
+ The three TFs also competitively bind to yet-unassigned genes located between yoaM-yozS and ynfE-xynC , as described later . 
+ As Fur binding of pps is enhanced in the mutant , although the intensity value remains lower than the deﬁned threshold for peak detection , pps likely belongs to this group . 
+ This group of genes generally shows higher intensities of NsrR binding compared with those of ResD and Fur . 
+ The third group shows binding competition that is different from the second group . 
+ NsrR binding in the wild-type background is weak or not detected but enhanced in the resD fur double mutant . 
+ Fur binding also increases in the resD nsrR mutant despite the observation that intensity values do not reach the threshold in the case of yvaF , spoIIIC , and spoIVCB . 
+ As ResD binding does not signiﬁcantly change between the wild type and the mutant , ResD likely has a higher binding afﬁnity than NsrR and Fur . 
+ NsrR and Fur likely compete with each other for binding in the absence of ResD . 
+ In summary , GeF-seq in the wild type and in double mutants has made it possible for us to classify all three TF binding characteristics into three groups based on the effect of mutual DNA interactions . 
+ Search for consensus sequences for Fur , ResD , and NsrR binding . 
+ As considerable numbers of sites were bound by ResD/NsrR/Fur , and the TFs often bind to overlapping sites with different afﬁnities depending on the site , it is not easy to identify each TF-binding motif , particularly for ResD and NsrR . 
+ A search for the consensus sequences was conducted using data collected from GeF-seq in the mutant backgrounds . 
+ Results using the MEME motif search ( 34 ) and BiPad ( a two-block de novo DNA motif-ﬁnding tool ) ( 35 ) are shown in Data Sets S2 to S4 . 
+ Fur . 
+ Previous studies by Helmann 's group determined the consensus sequence of Fur-binding sites ( 36 , 37 ) . 
+ The binding sequence was identiﬁed as a 7-1-7 inverted repeat ( TGATAATNATTATCA , where N is any nucleotide ) where one Fur dimer interacts . 
+ The sequence was further conﬁrmed in a study of condition-dependent transcriptome analysis in which Fur operators were characterized using a MEME motif search ( 27 ) . 
+ Here , we carried out a MEME search using all Fur-binding regions identiﬁed by GeF-seq and found a common TGANAA motif ( Data Set S2 ) . 
+ As MEME did not detect the 7-1-7 motif in all Fur-binding sites , we repeated the search using subsets of genes divided into four groups by binding strength ( Data Set S2 ) . 
+ The highest binding sites ( intensity rank 1 to 25 ) were found to harbor a 7-1-7 motif similar to the previously identiﬁed sequence , except that the fourth T in the upstream 7 was only moderately conserved ( 17 of 25 ) and the fourth A of the downstream 7 was the least conserved ( 8 of 25 ) in the newly identiﬁed motif . 
+ Thus , the common motif in genes of the highest intensity rank contains a modiﬁed 7-1-7 ( TGANAATNATTNTCA ) . 
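+ The consensus above uses IUPAC ambiguity codes, so a quick way to check a candidate site against it is to expand the codes into a regular expression. The following is a minimal sketch only; it is not the MEME procedure used in the study, and the names `IUPAC`, `motif_to_regex`, `find_motif`, and `FUR_717_MODIFIED` are illustrative assumptions.

```python
import re

# IUPAC nucleotide codes expanded to regex character classes
# (only the codes that appear in this text are listed).
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "N": "[ACGT]", "W": "[AT]", "R": "[AG]", "Y": "[CT]",
}

# Modified 7-1-7 Fur consensus quoted in the text.
FUR_717_MODIFIED = "TGANAATNATTNTCA"


def motif_to_regex(motif):
    """Compile an IUPAC motif into a regex that also reports overlapping hits."""
    body = "".join(IUPAC[base] for base in motif.upper())
    # A lookahead lets finditer report matches that overlap each other.
    return re.compile("(?=(" + body + "))")


def find_motif(seq, motif):
    """Return the 0-based start of every match of `motif` in `seq`."""
    return [m.start() for m in motif_to_regex(motif).finditer(seq.upper())]


if __name__ == "__main__":
    # TGAAAATCATTATCA is the ykuN promoter sequence quoted later in the text.
    print(find_motif("GGTGAAAATCATTATCACC", FUR_717_MODIFIED))
```

+ Because this 7-1-7 consensus is its own reverse complement, scanning one strand is enough here; a non-palindromic motif would also need a scan of the reverse complement.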
+ When Fur-binding sites that belong to the next three ranks ( 26 to 50 , 51 to 75 , and 76 to 98 ) were used , MEME identiﬁed shorter consensus motifs that overlap the upstream half-site in the 7-1-7 motif ( TGANAA ) ( Data Set S2 ) . 
+ The result might suggest that either Fur binds as a monomer or the downstream half motif was not detected , because MEME motif search does not identify bipartite motifs separated by gaps of various length . 
+ Therefore , we used BiPad , which is capable of predicting various pairs of bipartite motifs with different gap lengths . 
+ Allowing the spacing between the half-sites to vary from 0 to 2 bp in the search successfully detected target sequences in all potential Fur-binding sites ( Fig. 4 ; see also Data Set S2 ) . 
+ Fifty-ﬁve Fur-binding sites contain the modiﬁed 7-1-7 motif identiﬁed by MEME . 
+ An additional 27 sites and 16 sites were detected to have a common 7-2-7 and 7-0-7 sequence motif ( Fig. 4 ) , respectively , although the downstream half of the 7-0-7 conﬁguration is not well conserved . 
+ The high-intensity binding sites mostly carry the original 7-1-7 motif as expected , and often carry two overlapping 7-1-7 motifs to generate a 21-bp binding site , a sequence similar to the 19-bp sequence proposed earlier ( 36 ) . 
+ We also noticed tandem TGANAA-like sequences in both orientations near the 7-1-7 motif , suggesting cooperative binding to multiple sites to compensate for weaker afﬁnities . 
+ This result might indicate that a Fur dimer interacts with each half-site of the Fur operator in a ﬂexible manner . 
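+ The idea of varying the spacer between the two half-sites (7-0-7, 7-1-7, 7-2-7) can be illustrated with a small scan. This is a sketch under stated assumptions, not BiPad itself: the half-site patterns `UP_HALF` and `DOWN_HALF` are simply taken from the two halves of the modified 7-1-7 consensus, and strict matching of the downstream half will miss sites where that half is poorly conserved, as the text notes for 7-0-7.

```python
import re

# Half-sites taken from the modified 7-1-7 consensus TGANAAT-N-ATTNTCA
# (an assumption made for illustration).
UP_HALF = "TGA[ACGT]AAT"
DOWN_HALF = "ATT[ACGT]TCA"


def scan_bipartite(seq, min_gap=0, max_gap=2):
    """Return sorted (start, gap_length, site) tuples for each half-site pair."""
    seq = seq.upper()
    hits = []
    for gap in range(min_gap, max_gap + 1):
        # Lookahead so that overlapping sites are all reported.
        pattern = re.compile(
            "(?=(" + UP_HALF + "[ACGT]{" + str(gap) + "}" + DOWN_HALF + "))"
        )
        for m in pattern.finditer(seq):
            hits.append((m.start(), gap, m.group(1)))
    return sorted(hits)


if __name__ == "__main__":
    for hit in scan_bipartite("TGAAAATCATTATCA"):
        print(hit)
```

+ Scanning each gap length separately keeps the gap length in the result, which is what distinguishes the three configurations.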
+ ResD . 
+ The ResD consensus sequence was determined using all the ResD-binding sites identiﬁed in the nsrR fur mutant by GeF-seq . 
+ A MEME motif search identiﬁed a common 10-1-10 motif . 
+ TGANW6 or TNTGANW4 ( where W is A or T ) was observed particularly in high-intensity ResD-binding sites as tandem repeats with 1-bp spacing ( Fig. 4 ; see also Data Set S3 ) . 
+ A directly repeated TGANAANW3 motif was also visually detected . 
+ The alignment of these logos is very complex , and the overall A+T richness of the sequence where ResD binds makes it difficult to conclude which motif fits best with an individual binding site . 
+ However , it is likely that a 10-bp motif with 1-bp spacing functions primarily as the ResD target sequence . 
+ The data support the hydroxyl radical footprinting results suggesting that ResD likely binds tandemly at the same surface of the DNA helix in the hmp and nasD promoters ( 18 ) . 
+ The consensus sequence obtained here is somewhat similar to the ResD consensus sequences identiﬁed earlier based on in vitro studies ( 38 ) . 
+ The current work supported the previously detected in vitro ResD-binding architecture but also demonstrated the importance of TGA for ResD binding as well as for Fur-operator interaction . 
+ The GA sequence in TGANAA of the Fur operator site is important for the recognition by Fur and functions as a discriminatory core sequence distinguishing the Fur operator from the sites recognized by other Fur homologs ( Per and Zur ) ( 37 ) . 
+ The G base of TGA in the ResD-binding site is less conserved compared with that in the Fur site , which might act to favor Fur over ResD . 
+ Unexpectedly , another C-rich motif was identiﬁed by MEME analysis ( Data Set S3 ) , and interestingly , the reverse complement sequence to this motif was identiﬁed as one of the common motifs for NsrR targets ( Data Set S4 ) . 
+ At the moment , it is unknown whether this sequence is involved in ResD- and NsrR-binding recognition or serves as a binding site for other DNA-binding proteins , including TFs . 
+ NsrR . 
+ A MEME motif search determined that 9 of 26 NsrR-binding sites carry a similar motif ; however , this motif is not present in more than half of the NsrR-binding sites ( Data Set S4 ) , including hmp . 
+ Accordingly , the automatic search did not detect the motif in the nasD NsrR-binding site that is similar to the hmp site . 
+ As described above , all previous in vivo and in vitro work and a computational study indicated that a partial dyad symmetry sequence of 8-1-8 ( ATRTATYTTAAATAtat , where R is G or A and Y is C or T [ the last three bases are not conserved between NsrR-binding sites in nasD and hmp ] ) is the cis site required for NO-sensitive repression by NsrR ( 6 , 9 , 16 ) . 
+ Base substitutions of every residue in the sequence revealed that T and A at the fourth and ﬁfth positions in the upstream half-site are important , and A at the ﬁfth position in the downstream half-site is the most critical residue for in vivo and in vitro NsrR activity . 
+ In addition , the deletion of the center T , but not its substitution to G , resulted in the loss of NsrR activity ( 9 ) . 
+ The downstream part of the NsrR motif identified by MEME ( Fig. 4 ; see also Data Set S4 ) , namely , ATATTATGTAAAC , does not show dyad symmetry ; however , it could be read as ATATYATGTAAAC , which somewhat resembles RTATYTTAAATAt ( italicized letters show important residues for NsrR activity ) in the NsrR-binding site of nasD , and importantly , the critical distance between TA and A ( TAN8A ) is conserved . 
+ The results suggest that the consensus sequence emerging from the analysis presented above might function as an NsrR operator site , at least for certain genes ( Data Set S4 ) . 
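+ The TAN8A spacing mentioned above (TA, eight arbitrary positions, then A) is easy to check on the two sequences quoted in the text. The helper below is a hypothetical illustration, not part of the published analysis.

```python
import re

# TA, then any 8 positions, then A; the lookahead reports overlapping hits.
TA_N8_A = re.compile(r"(?=(TA.{8}A))")


def ta_n8_a_positions(site):
    """Return the 0-based start of every TA-N8-A occurrence in `site`."""
    return [m.start() for m in TA_N8_A.finditer(site.upper())]


if __name__ == "__main__":
    print(ta_n8_a_positions("ATATTATGTAAAC"))   # MEME-derived NsrR motif
    print(ta_n8_a_positions("RTATYTTAAATAt"))   # nasD NsrR-binding site
```

+ Both sequences contain the spacing at the same offset, consistent with the statement that the distance is conserved.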
+ In conclusion , comparison of the proposed consensus sequences for Fur , ResD , and NsrR brought to light that the A+T-rich sequences have a moderate similarity to each other , which could explain the overlapping DNA contacts of these TFs . 
+ Interestingly , the proposed Fur and NsrR consensus sequences contain TGA and TGT , respectively , whereas the ResD consensus sequence has TGA and TGW . 
+ Any variation departing from each consensus sequence likely determines which TF more strongly occupies these sites to control transcription . 
+ Consensus sequences are identiﬁed within a narrow range of DNA where the three TFs competitively bind within promoters . 
+ ResD and NsrR bind to narrow regions within the ykuN and fbpC promoters only in the presence of the other two TFs , while others such as the ymfC promoter are sites where ResD/NsrR/Fur ( or NsrR/ResD ) competitively bind ( Table 2 ; see also Data Set S1 ) . 
+ As a proof of concept , we determined whether the proposed consensus sequences ( Fig. 4 ) could be identiﬁed in the overlapping binding sites of these promoters . 
+ Fur enrichment at the ykuN promoter is 30- to 50-fold stronger than for NsrR and ResD ( Fig. 5A ) , respectively , which mirrors the presence of two strong Fur consensus sequences within the overlapping binding site . 
+ We reported previously that efficient binding of NsrR and ResD to ykuN depends on each other , and neither protein is enriched at ykuN in the fur mutant ( 14 ) . 
+ The GeF-seq results of NsrR and ResD binding are consistent with the previous ﬁndings . 
+ On the other hand , the resD nsrR mutation led to a shift of the Fur-binding site to the core promoter region . 
+ Because of the presence of the strong Fur consensus sequences , it was difﬁcult to pinpoint NsrR and ResD consensus sequences , although some similarity to their consensus sequences is detected within the binding site . 
+ Fur binding is relatively weak compared with ResD and NsrR binding at the ymfC , yoeC , ydhB , yobR , and yclA promoters and is only visibly detected at pps . 
+ NsrR was enriched at these promoters to a greater extent than ResD or Fur . 
+ When the binding region of ymfC was expanded ( Fig. 5B ) , we clearly observed that the three TFs bound within a 40-bp segment . 
+ Furthermore , the consensus sequences for Fur , ResD , and NsrR proposed in Fig. 4 were detected within a 25-bp sequence that carries multiple TGAs and TGTs . 
+ The results demonstrated that the high-resolution mapping is a powerful tool for identifying each consensus sequence for multiple TFs that bind to overlapping targets , if the DNA afﬁnity of each is comparable . 
+ ResD and NsrR bind to sequences outside promoter regions . 
+ ResD and NsrR mostly bind to promoter sites and ResD often binds to entire coding regions of positively controlled genes , but they occasionally show a narrow peak of binding in coding regions . 
+ Such examples are recombination sites of SPβ ( yodU-yotN and ykoA-ypgP ) and sigK intervening ( skin ) element ( 39 ) ( spoIVCB-spoIVCA and yqaB-spoIIIC ) , to which both ResD and NsrR bind ( Table 2 ) . 
+ The B. subtilis genome carries 10 prophage-like sequences identified as A+T-rich regions ( 40 ) . 
+ Interestingly , the SPβ prophage and skin are inserted into protein-coding regions ( 41 , 42 ) . 
+ Upon DNA damage ( caused by mitomycin C or UV ) , SPβ is excised from the genome and begins lytic development . 
+ The skin element interrupts the sigK gene encoding one of the sporulation mother cell-specific sigma factors . 
+ At late sporulation , two parts of genes interrupted by skin , spoIVCB and spoIIIC , are connected by site-speciﬁc recombination in the mother cells to generate the intact sigK gene . 
+ Recent work showed that the site-specific recombination of the SPβ region also occurs during sporulation , which generates the functional spore polysaccharide gene spsM from truncated yodU and ypqP ( 43 ) . 
+ The sequences that ResD and NsrR bind cover the sites of recombination catalyzed by recombinases SprA ( for SPβ ) ( 43 ) and SpoIVCA ( for sigK ) ( 39 ) ( Data Set S1 ) . 
+ ResD and NsrR bind to the upstream site of ypqP that encodes the C-terminal part of SpsM ( see Fig. S2 ) . 
+ The NsrR-binding sequence partially overlaps with attR , and the ResD-binding sequence covers half of the inverted repeat of the SPβ recombination site . 
+ As expected from the relatively low intensity of both NsrR and ResD binding to yodU encoding the N-terminal part of SpsM , both binding sites show less similarity to each TF 's consensus sequence . 
+ NsrR again binds to the site that overlaps with attL , and ResD binds to the other half of the inverted repeat , except on the opposite strand . 
+ Excision of the phage will leave an intact NsrR-binding site , while the ResD-binding region is excised with the phage . 
+ It remains to be determined whether the binding of ResD and/or NsrR to these sites plays any role in site-specific recombination during sigK generation and/or the life cycle of SPβ phage . 
+ The TFs , particularly Fur , bind within a narrow segment of coding-region DNA or at the 3' end of certain genes . 
+ In addition to prophage-like insertion sites , GeF-seq provided evidence that the TFs bind to other locations outside promoter regions . 
+ For example , Fur binds to 32 and 48 promoter DNAs in the wild-type and resD nsrR mutant strains , respectively , but Fur also binds to 65 and 50 coding regions in the wild type and the resD nsrR mutant , respectively . 
+ These sites include resE , hemX , gltT , and kinB . 
+ ResD and NsrR only bind to coding regions of ppsB ( plipastatin synthase gene ) in the wild type , whereas Fur binds in both the wild type and the mutant ( Table 1 ) . 
+ NsrR interacts with the ydjA-coding region in the resD fur mutant , but Fur only weakly binds to the site , while ResD shows no interaction . 
+ Although yodU , spoIVCB , and spoIIIC belong to this category , these sites function as targets of site-speciﬁc recombination as described above . 
+ Why Fur binds to many coding regions and how this binding affects gene expression are unknown , although a road-block mechanism has been reported in the binding of the CodY TF to a coding region ( 44 ) . 
+ Among the binding sites at the 3' ends of genes , the regions between ynfE and xynC , as well as between yoaM and yozS , contain sites where the three TFs bind in the mutant strains ( Table 2 ) . 
+ In these cases , NsrR appears to bind more strongly than ResD or Fur , a pattern similar to that for ymfC described above ( Fig. 5B ) . 
+ Consistent with this ﬁnding , we were able to detect the consensus sequences of all three TFs within a 30-bp region where all three bind ( Fig. 5C and D ) . 
+ We noticed that the binding sites likely reside within upstream regions of unassigned genes in the strain 168 genome . 
+ Small hypothetical proteins were annotated to both regions in the genomes of other B. subtilis strains according to NCBI . 
+ Although both genes carry putative -10 sequences , a Shine-Dalgarno sequence is missing ( the putative open reading frame located downstream of yoaM ) or weak ( the one located downstream of ynfE ) , and whether the regions code for protein or RNA needs to be investigated . 
+ DISCUSSION
+ Genome-wide transcriptome analysis , such as RNA-seq , is a powerful method to identify genes controlled by each TF . 
+ However , the method is not always applicable to identifying genes that are directly regulated by TF-DNA contact . 
+ Particularly , it is difficult to determine direct or indirect control when a TF exhibits less sequence specificity toward its target and/or is influenced by multiple TFs binding within the target 's vicinity . 
+ Here , we investigated genome-wide binding of Fur , ResD , and NsrR . 
+ NsrR was originally considered a repressor that binds to DNA in a sequence-speciﬁc manner ( 9 , 10 , 16 ) , but later studies raised the possibility that it interacts with promoters lacking the putative consensus sequence identiﬁed in nasD and hmp ( 9 , 14 ) . 
+ We used GeF-seq to detect in vivo binding of the three TFs . 
+ Compared with our previous ResD and NsrR ChAP-chip data ( 14 ) , GeF-seq enabled us to identify binding sites with a much higher resolution . 
+ The results conﬁrmed the previous ﬁnding that the method resolves protein-DNA binding at a resolution similar to that obtained by in vitro DNase I footprinting ( 15 ) . 
+ For example , Fur binding identiﬁed by GeF-seq achieved a comparable resolution to that obtained by in vitro DNase I footprinting ( 37 ) ( Data Set S5 ) . 
+ The three TFs bind cooperatively within a 75-bp DNA sequence ( Fig. 5A ) and competitively when targeting a segment smaller than 40 bp ( Fig. 5B to D ) . 
+ We examined whether these DNA sites contain each consensus sequence . 
+ One such site , the ykuN promoter , is extremely complex . 
+ As marked with arrows for Fur , direct and inverted repeats of four TGAW3 motifs were detected as two clusters downstream of -35 and the transcription start site ( Fig. 5A ) . 
+ Identical sequences ( TGAAAATCATTATCA ) of the 7-1-7 Fur consensus motif reside within both TGAW3 clusters . 
+ In addition , as TGANAANW3 might be a ResD-binding sequence , we were unable to draw a reasonable conclusion to describe the ResD/NsrR-DNA binding location . 
+ On the other hand , we successfully identiﬁed all consensus sequences in the TF-binding sites of ymfC , pps , yobR , and two unassigned genes downstream of ynfE and yoaM ( Fig. 5B and data not shown ) . 
+ The highest NsrR-binding intensity among the three TFs is a characteristic common to these binding sites . 
+ Although this characteristic applies to genes , including ydhB and yclA , the NsrR consensus sequence was not detected in these genes . 
+ Given that more than half of the NsrR-binding sites do not carry the putative consensus sequence , the entire picture of NsrR-operator binding is not fully uncovered . 
+ This study showed that a part of the ResD population likely associates with RNAP during the elongation of certain genes . 
+ Although ResD interaction with RNAP was previously reported ( 26 ) , our work revealed that a majority ( if not all ) of the genes to which ResD binds across coding regions are activated by ResD and play roles in aerobic/anaerobic respiration . 
+ It is not clear whether one such gene , ndk , has any role in respiration , although a possible involvement of Ndk ( nucleotide diphosphate kinase ) in energy production was previously indicated by a study in which Ndk was shown to function in ATP generation from phosphoenolpyruvate ( PEP ) in an oxygen-independent manner via a phosphotransfer network in Pseudomonas ﬂuorescens ( 45 ) . 
+ Ndk that interacts with the inner mitochondrial membrane was shown to couple nucleotide transfer with respiration ( 46 ) . 
+ The functions of yxiE and yozB remain to be investigated . 
+ yxiE is induced by phosphate starvation independently from the PhoPR two-component regulatory system and σB , which is a rare exception among phosphate starvation-induced genes ( 47 ) . 
+ Transcription of ctaO , ydbL , and yfmQ is higher under anaerobic than under aerobic conditions ( 27 ) , and YdbL and YfmQ , like CtaO , are predicted to localize to the cell membrane . 
+ yizD transcription is activated by oxygen limitation ( 27 ) , and the divergently transcribed small gene was identiﬁed as a homolog of RsaE , an sRNA widely conserved in Gram-positive bacteria ( 48 ) . 
+ The B. subtilis RsaE ( now renamed RoxS ) is activated by ResDE in response to NO ( 49 ) . 
+ Genes involved in redox homeostasis are upregulated in the roxS mutant , which include cydA , ykuN , and resA that were a focus of our study . 
+ Therefore , ResD negatively regulates these genes through RoxS , while it upregulates cydA , ykuN , and resA as a direct activator , an antirepressor , and indirectly via an as-yet-unidentiﬁed regulatory pathway , respectively . 
+ The small basic protein YizD ( 55 amino acids [ aa ] , pI 9.7 ) might function as a RoxS chaperone . 
+ All our data in earlier studies led us to believe that ResD- and NsrR-dependent transcriptional control of hmp and nasD operates similarly , but the current study revealed that the assumption is not fully supported and that the proposed mechanism will need to be revised based on the results from future experimentation . 
+ One unexpected observation is that ResD-RNAP binding around the nasD transcription start site was much stronger than binding to the upstream sites previously identiﬁed by in vitro footprinting experiments ( Fig. 1B ) . 
+ On the other hand , ResD interacts with the hmp upstream regulatory site as well as the transcription start site ( Fig. 1A ) . 
+ The difference might be due to the core promoter ( a suboptimal -10 for hmp and a strong TG -10 extended promoter motif [ TG-10 ] element for nasD ) . 
+ ResD binding at the upstream site might be released upon recruitment of RNAP to the nasD promoter , while the RNAP-ResD complex might be stalled at the promoter due to the stable RNAP/TG-10 promoter DNA complex before promoter escape . 
+ The upstream ResD-binding sites in nasD overlap a CodY-binding sequence ( 50 ) , and a TnrA-binding sequence resides between the CodY and NsrR sites ( 51 ) ( Fig. 1B ) . 
+ TnrA activates the nasD operon only under poor nitrogen conditions and not under the anaerobic fermentation conditions used in this study ( 19 ) . 
+ Because the CodY-binding site overlaps with the ResD-binding site , we examined whether CodY has any role in anaerobic nasD expression . 
+ For example , CodY might play a positive role by binding to the site , either as an activator , coactivator , or an antirepressor . 
+ nasD expression was upregulated by the nsrR mutation , but the expression in the codY mutant was as low as that in the wild type . 
+ nasD-lacZ expression in the codY nsrR mutant is similar to that in the nsrR mutant ( data not shown ) , indicating that CodY does not play a major role in nasD transcription under anaerobic conditions . 
+ However , we observed a reproducible minor growth defect in the absence of codY . 
+ MATERIALS AND METHODS
+ Bacterial strains and culture conditions . 
+ Strains and plasmids used in this study are listed in Table S1 in the supplemental material . 
+ All B. subtilis strains used in this study are derivatives of strain 168 unless otherwise stated . 
+ Strains OC0010 , ORB8238 , and ORB8440 expressing C-terminally His12-tagged nsrR , resD , and fur at their native loci were constructed previously ( 14 ) . 
+ These strains produce the 12×His-tagged proteins that are functional in vivo based on the effects of each TF in transcriptional control of target genes as previously described ( 14 ) . 
+ Strains expressing His12-tagged nsrR in the resD fur double mutant background ( ORB9091 ) , His12-tagged resD in the nsrR fur mutant background ( ORB9092 ) , and His12-tagged fur in the nsrR resD mutant background ( ORB9093 ) were also constructed . 
+ To this end , chromosomal DNA of HB2501 ( fur : : neo ) was used to transform OC0010 to generate ORB8277 ( nsrR-His12-tet fur : : neo ) , which was then used for transforming with MH5260 ( resD : : cat ) chromosomal DNA , resulting in ORB9091 ( nsrR-His12-tet fur : : neo resD : : cat ) . 
+ Similarly , HB2501 and ORB6179 ( nsrR : : cat ) chromosomal DNA was used to construct ORB9092 ( His12-resD fur : : neo nsrR : : cat ) , and ORB6179 and LAB2511 ( resD : : spc ) DNA was used to construct ORB9093 ( fur-His12 nsrR : : cat resD : : spc ) . 
+ The successful construction of His-tagged genes was conﬁrmed by PCR , followed by sequencing . 
+ To determine the regulatory roles of the TFs that interact with speciﬁc promoter regions , transcriptional lacZ fusions were used . 
+ The lacZ fusions used are those integrated at the native loci or ectopic loci ( thrC or amyE ) ( see Table S1 ) . 
+ β-Galactosidase activities were measured in the wild type and in mutant backgrounds that lack resD , fur , and/or nsrR as indicated . 
+ B. subtilis strains were grown anaerobically in 2× YT supplemented with 0.5 % glucose and 0.5 % pyruvate ( fermentative conditions ) where NsrR is active ( 6 ) . 
+ Genome footprinting by high-throughput sequencing . 
+ GeF-seq was carried out as described in the previous report on genome-wide AbrB-binding in B. subtilis ( 15 ) with some modiﬁcations . 
+ In short , OC0010 ( nsrR-His12 ) , ORB8238 ( resD-His12 ) , or ORB8440 ( fur-His12 ) was grown anaerobically at 37 °C in 2× YT with 0.5 % glucose and 0.5 % pyruvate until the end of exponential growth ( T0 ) and cells were treated with formaldehyde . 
+ For the mutant backgrounds , ORB9091 ( nsrR-His12 resD : : cat fur : : neo ) , ORB9092 ( resD-His12 nsrR : : cat fur : : neo ) , and ORB9093 ( fur-His12 resD : : spc nsrR : : cat ) were used and cultured similarly . 
+ Cells were harvested and treated with 5 mg/ml lysozyme in isotonic SMM buffer ( 0.02 M maleic acid , 0.5 M sucrose , and 0.02 M MgCl2 , pH 6.5 adjusted with NaOH ) in the presence of EDTA-free protease inhibitor cocktail and 1 mM phenylmethylsulfonyl ﬂuoride ( PMSF ) before DNase I and RNase A treatment . 
+ DNase I concentrations were adjusted to the level that generates fragments less than 100 bp in size based on agarose gel electrophoresis . 
+ The reactions were terminated by the addition of UT buffer ( 0.1 M HEPES [ pH 7.5 ] , 0.01 M imidazole , 8 M urea , 0.5 M NaCl , 1 % Triton X-100 , 10 mM β-mercaptoethanol , 1 mM PMSF ) . 
+ The samples were then sonicated on ice using an Astrason Ultrasonic Processor XL ( Misonix ) for 10 min ( 4 s `` on '' and 10 s `` off '' at output level 5 ) . 
+ After removing cell debris , protein-DNA complexes were affinity-purified using Dynabeads His-Tag isolation and pulldown ( Thermo Fisher Scientific ) and were reverse cross-linked by incubating at 65 °C overnight . 
+ The DNA library for sequencing by the Illumina Genome Analyzer IIx ( GAIIx ) or Illumina HiSeq2000 was generated using the NEBNext DNA Sample Prep reagent kit ( New England BioLabs ) according to the manufacturer 's instructions with some modifications . 
+ To reduce the loss of small-sized DNA fragments , all DNA purification steps before adapter ligation were carried out using a QIAquick nucleotide removal kit ( Qiagen ) . 
+ In addition , after adapter ligation to the DNA fragments , the size selection step that was done previously was eliminated to recover all sizes of DNA fragments bound by target protein . 
+ The DNA fragments were then purified using a QIAquick PCR purification kit ( Qiagen ) and were amplified using PCR to obtain at least 1 fmol of DNA library . 
+ The amount of DNA was determined using an Agilent 2100 Bioanalyzer with a high-sensitivity DNA kit ( Agilent ) . 
+ The sequence of the library was then determined by 69-bp or 100-bp paired-end sequencing using Illumina GAIIx or Illumina HiSeq2000 , respectively . 
+ GeF-seq data analysis . 
+ The mapping of read sequences to a reference genome was carried out using the mpsmap software ( 52 ) . 
+ The detection of depth peaks and visualization of the results were carried out with methods previously described ( http://metalmine.mydns.jp/maps/gefseq ) with some modifications ( 15 ) . 
+ In the present analysis , the positions of both ends of DNA fragments were determined based on paired-end read information , using the pair_map software we developed ( http://metalmine.mydns.jp/maps/pair_map ) . 
+ Regions between the starting positions of paired reads within the range of 25 to 100 bp were collected as peak candidates . 
+ To detect the majority of potential peaks , the regions where signal intensities exceeded the threshold value ( 50 for NsrR binding in the wild-type strain and 100 for others ) at more than half of the nucleotides between them were extracted . 
+ The detected peaks include background peaks derived from preferential DNA digestion of A+T-rich regions by DNase I and thus are consistent with the local G+C content at every 50 bp as shown in Fig. 1 . 
+ The maximum read count was used to represent the intensity of each peak . 
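+ The candidate-selection rule described above can be sketched in Python. This is a minimal illustration, not the authors' pipeline: it assumes per-nucleotide read depth is available as a list, and the function names and toy depth profile are hypothetical.

```python
def is_peak_candidate(depth, start, end, threshold, min_len=25, max_len=100):
    """Return True if the region [start, end) is 25-100 bp long and its depth
    exceeds `threshold` at more than half of its nucleotides."""
    length = end - start
    if not min_len <= length <= max_len:
        return False
    above = sum(1 for d in depth[start:end] if d > threshold)
    return above > length / 2


def peak_intensity(depth, start, end):
    """Represent a peak by the maximum read count inside the region."""
    return max(depth[start:end])


# Toy depth profile (hypothetical values): a 30-bp region with 20 high positions.
depth = [0] * 10 + [120] * 20 + [10] * 10 + [0] * 10
candidate = is_peak_candidate(depth, 10, 40, threshold=100)
intensity = peak_intensity(depth, 10, 40)
```

+ With a threshold of 100 (the value quoted for strains other than wild-type NsrR), the toy region qualifies because 20 of its 30 positions exceed the threshold.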
+ The distribution of intensities of primary peaks shows a normal distribution caused by the background peaks derived from the G+C content , and the equation was evaluated using Gaussian curve fitting of the IGOR Pro software ( Hulinks Inc. ) . 
+ To extract the peaks caused by protein binding from the background , P values of each peak were calculated and all peaks with P values lower than the threshold ( indicated in the supplemental material ) were assigned as protein-binding peaks . 
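+ One way to turn a fitted Gaussian background into a per-peak P value is a one-sided upper-tail probability. The sketch below assumes that interpretation; the text does not state the exact test, and the mean and standard deviation used here are hypothetical placeholders for the fitted parameters.

```python
import math


def upper_tail_p(intensity, mean, sd):
    """One-sided P value of `intensity` under a normal background N(mean, sd^2)."""
    z = (intensity - mean) / sd
    # Survival function of the standard normal, via the complementary error function.
    return 0.5 * math.erfc(z / math.sqrt(2))


# Hypothetical fitted background: mean 100, standard deviation 10.
p_at_mean = upper_tail_p(100.0, mean=100.0, sd=10.0)
p_high = upper_tail_p(119.6, mean=100.0, sd=10.0)
```

+ Peaks whose P value falls below the chosen threshold would then be assigned as protein-binding peaks.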
+ All positive peaks were then reconfirmed by visual inspection using IMC software ( In Silico Biology , Inc. ) . 
+ Ambiguous peaks were removed manually in this process . 
+ Finally , to normalize the read number variation among experiments , read counts at each position of the genome were multiplied by 500 million and then divided by the total read count . 
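+ The normalization step is simple enough to state as code. The following minimal sketch assumes per-position read counts in a list and a separately known total read count; the function and variable names are illustrative only.

```python
def normalize_depth(depth, total_reads, scale=500_000_000):
    """Scale per-position read counts to a library of `scale` (500 million) reads."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return [d * scale / total_reads for d in depth]


# Hypothetical experiment with one billion total reads: every count is halved.
normalized = normalize_depth([10, 20, 0], total_reads=1_000_000_000)
```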
+ Motif analysis . 
+ NsrR - , ResD - and Fur-binding DNA motifs were analyzed by the BiPad web server ( http://bipad.cmh.edu ) for modeling bipartite sequence elements or the MEME program ( http://meme-suite.org/tools/meme ) . 
+ To search bipartite motifs for Fur , BiPad was used with the following settings : left half-site , gap range lengths , right half-site , and the iteration cycles were set to 7 , 0 to 2 , 7 , and 100 , respectively ( 35 ) . 
+ Using the results , sequence logos of the bipartite motifs with 0 , 1 , and 2 gaps were generated by WebLogo ( 53 ) . 
+ Accession number ( s ) . 
+ Sequencing data in this study were submitted to the DDBJ Sequence Read Archive ( DRA ) under accession number DRA005609 . 
+ SUPPLEMENTAL MATERIAL
+ Supplemental material for this article may be found at https://doi.org/10.1128/JB.00086-17 . 
+ ACKNOWLEDGMENTS
+ We thank Adriano O. Henriques , Tsutomu Sato , Boris Belitsky , and Linc Sonenshein for providing B. subtilis strains . 
+ We thank Kazuo Kobayashi for sharing unpublished transcriptomic results used in Table 1 . 
+ We also thank Peter Zuber and Naotake Ogasawara for critically reading the manuscript . 
+ This work was supported by grants from the National Science Foundation ( grant number MCB1157424 to M.M.N. ) , the Advanced Low Carbon Technology Research and Development Program ( ALCA ) of the Japan Science and Technology Agency ( JST ) ( grant number JP15K07359 to S.I. ) , and KAKENHI of the Japan Society for the Promotion of Science ( JSPS ) ( grant number JP26430199 to K.N. ) . 
+ T.Q. was supported by a Partners in Science Program Grant from the M. J. Murdock Charitable Trust .
\ No newline at end of file
--- a/data/TEXT_FILES/notuseful_txt/28614372.txt 0 → 100644
View file @27818a9
+++ b/data/TEXT_FILES/notuseful_txt/28614372.txt 0 → 100644
View file @27818a9
+ to infection by commensal and Shiga toxin
+ Funding : This work was supported by National Institutes of Health ( https://projectreporter.nih.gov/reporter.cfm ) Grants R21AI115003 , R01-AI064893 , U01-AI075498 , U19-AI116491 , to AAW . 
+ Core support was obtained from the CCTST ( Center for Clinical and Translational Science and Training , for Advancing Translational Sciences Award Number 1UL1TR001425-01 ) and by NIDDK 
+ Abstract 
+ Intestinal organoids model human responses 
+ Sayali S. Karve 1☯¤ , Suman Pradhan 1☯ , Doyle V. Ward 2 , Alison A. Weiss 1*
+ 1 Department of Molecular Genetics , Biochemistry , and Microbiology , University of Cincinnati , Cincinnati , Ohio , United States of America , 2 Center for Microbiome Research and Department of Microbiology and 
+ Physiological Systems , University of Massachusetts Medical School , Worcester , Massachusetts , United States of America 
+ Infection with Shiga toxin ( Stx ) producing Escherichia coli O157 : H7 can cause the potentially fatal complication hemolytic uremic syndrome , and currently only supportive therapy is available . 
+ Lack of suitable animal models has hindered study of this disease . 
+ Induced human intestinal organoids ( iHIOs ) , generated by in vitro differentiation of pluripotent stem cells , represent differentiated human intestinal tissue . 
+ We show that iHIOs with addition of human neutrophils can model E. coli intestinal infection and innate cellular responses . 
+ Commensal and O157 : H7 introduced into the iHIO lumen replicated rapidly achieving high num-integrity was observed after 4 hours . 
+ O157 : H7 grew as filaments , consistent with activation of the bacterial SOS stress response . 
+ SOS is induced by reactive oxygen species ( ROS ) , and O157 : H7 infection increased ROS production . 
+ Transcriptional profiling ( RNAseq ) demonstrated that both commensal and O157 : H7 upregulated genes associated with gastrointestinal maturation , while infection with O157 : H7 upregulated inflammatory responses , including interleukin 8 ( IL-8 ) . 
+ IL-8 is associated with neutrophil recruitment , and infection 
+ Introduction
+ Shiga toxin producing E. coli ( STEC ) , including O157 : H7 are an important cause of diarrheal disease , causing about 265,000 illnesses yearly in the US [ 1 ] . 
+ Shiga toxin ( Stx ) is responsible for the life-threatening systemic complication , hemolytic uremic syndrome ( HUS ) . 
+ Currently , supportive therapy is the only treatment , and importantly , patients treated with antibiotics are more likely to develop severe disease , including HUS [ 2 ] . 
+ Stx is an AB5 toxin ; the B-pentamer promotes entry of the A-subunit into the mammalian cytoplasm , and the enzymatic A-subunit damages ribosomes , inhibiting protein synthesis [ 3 ] . 
+ Recent studies have shown that the A - and B-subunits can circulate independently , and active toxin is formed by subunit association on the target cell surface [ 4 ] . 
+ The genes for Stx are encoded in the late-gene region of lysogenic bacteriophages , and are silent until viral lytic replication is triggered by the bacterial SOS stress response [ 5 ] . 
+ Some antibiotics upregulate Stx expression , and this is likely responsible for the association of antibiotic treatment with increased risk for severe disease [ 6 -- 8 ] . 
+ O157 : H7 do not naturally infect mice , and there is a need for human model systems . 
+ Remarkable progress has been made using human tissue specific stem-cell propagated enteroids [ 9 ] . 
+ Especially important has been the demonstration that the previously noncultivatable pathogen , human norovirus , can replicate in human intestinal enteroids [ 10 ] . 
+ Enteroids have been used to model bacterial infection , including studies with E. coli O157 : H7 [ 11 ] . 
+ Some studies have been performed using enteroid monolayers in transwells , with tissue culture medium on both the apical and basolateral surfaces . 
+ The presence of glucose-rich medium on the apical surface does not replicate the nutrient environment in the intestinal lumen , and presents a technical problem for studying bacteria , such as E. coli , which replicate rapidly and produce by-products that can be toxic to cells . 
+ Furthermore , the smaller size of enteroids makes micro-injection challenging . 
+ Pluripotent stem-cell `` induced human intestinal organoids '' ( iHIOs ) represent a new experimental model to study enteric pathogens [ 12,13 ] . 
+ iHIOs are generated entirely in vitro from pluripotent embryonic stem cells by a process that mimics normal differentiation [ 12 ] , and represent a potentially infinite source of identical tissue samples . 
+ iHIOs represent human tissue from the distal portion of the small intestine [ 12 ] , the tissues favored for initial attachment of E. coli O157 : H7 [ 14 ] . 
+ iHIOs adopt the three-dimensional architecture of the human intestine [ 12 ] . 
+ The epithelium contains absorptive enterocytes and the major secretory lineages ( Paneth cells , enteroendocrine cells , and goblet cells ) , and intestinal functions such as peptide transport and mucus secretion by goblet cells are maintained [ 12 ] . 
+ The epithelium is surrounded by a stratified mesenchyme which contains smooth muscle cells and sub-epithelial fibroblasts [ 12 ] . 
+ iHIOs have been successfully used to model features of embryonic development [ 16 ] and inflammatory bowel disease [ 17 ] . 
+ In this study , iHIOs were infected with commensal as well as pathogenic E. coli O157 : H7 . 
+ The iHIOs were not damaged by infection with commensal E. coli ; however , O157 : H7 produced a very severe and rapid loss of epithelial structural integrity . 
+ Materials and methods
+ E. coli strains
+ We used strains characterized in previous studies . 
+ Bacterial assemblies and sequence data for strains ECOR13 and PT29 are deposited under NCBI BioProject ID : PRJNA359210 . 
+ E. coli O157 : H7 , PT29S , is a spontaneous streptomycin resistant mutant of PT29 previously isolated from a patient [ 18 ] . 
+ PT29 is sequence type ST-11 ( https://cge.cbs.dtu.dk/services/MLST/ ) and only possesses the genes for the more potent form of Shiga toxin , Shiga toxin type 2 ( Stx2 ) , not Shiga toxin type 1 ( Stx1 ) . 
+ In addition to Stx , the Virulence Finder search program ( https://cge.cbs.dtu.dk/services/VirulenceFinder/ ) revealed that PT29 possessed virulence factor genes typical for O157 : H7 , including tir ( translocated intimin receptor protein ) , eae ( intimin ) , espA ( type III secretion system ) , espB ( secreted protein B ) , espF ( type III secretion system ) , ehxA ( enterohaemolysin ) , espP ( extracellular serine protease plasmid-encoded ) , iss ( increased serum survival ) , espJ ( prophage-encoded type III secretion system effector ) , etpD ( type II secretion protein ) , astA ( EAST-1 heat-stable toxin ) , nleA-C ( non-LEE encoded effectors A-C ) , katP ( plasmid-encoded catalase peroxidase ) , toxB ( toxin B ) , and iha ( adherence protein ) . 
+ Commensal strain SGUC183 , also known as 183ϕS [ 8 ] , is a streptomycin and gentamicin resistant derivative of ECOR13 , a non-pathogenic E. coli isolated from a healthy person in Sweden , and is part of the Michigan State University STEC Center ECOR collection [ 8 ] . 
+ ECOR13 is Group A , Sequence Type ST-44 . 
+ The only hit using the Virulence Finder program was glutamate decarboxylase ( GadB ) , which converts glutamate to gamma-aminobutyrate . 
+ This activity helps to maintain neutral intracellular pH following exposure to extremely acidic conditions , such as transit through the stomach [ 19 ] . 
+ Streptomycin resistance was selected as a spontane ¬ 
+ Reagents and equipment used in iHIO culture
+ Matrigel basement membrane matrix ( BD Biosciences , cat . 356234 ) and extracellular matrix gel ( Sigma , cat . E1270 ) were used to embed the iHIOs in order to support development of 3-dimensional architecture . 
+ Gut media for iHIO culture was prepared using advanced Dulbecco 's Modified Eagle Medium/Ham 's F-12 ( DMEM/F12 ) ( Gibco , Invitrogen , cat . 12634 -- 028 ) supplemented with B27 insulin ( Invitrogen , cat . 17504044 ) , N2 supplement ( Invitrogen , cat . 17502048 ) , 2 mM L-glutamine ( Fisher , cat . SH3003401 ) , 15 mM HEPES ( Invitrogen , cat . 15630080 ) , 100 ng/ml epidermal growth factor ( R&D Systems , cat . 236-EG-200 ) , and either 2 mM penicillin/streptomycin ( Invitrogen , cat . 15140 -- 122 ) or penicillin alone ( Amresco , cat . E480-20ML ) . 
+ iHIOs were maintained in tissue culture treated Nucleon delta 4-well ( Nunc , cat . 
+ registry number 0043 ) [ 12 ] were obtained from Pluripotent Stem Cell Facility and Organoid Core at Cincinnati Children 's Hospital and Medical Center . 
+ iHIOs were maintained in reconwith a micropipette puller ( Sutter Instrument Company ) . 
+ The sealed tips of the capillaries were cut open using Cuterz glass scissors , and the capillaries were loaded onto a Nanoject II auto-nanoliter injector ( Fisher , cat . 13-681-455 ) . 
+ Microinjections were performed , and before and after injection images of iHIOs were obtained using a stereomicroscope ( Leica ) . 
+ In some studies iHIOs were co-injected with 2.5 mg/ml of the fluorescent dye fluorescein isothiocyanate ( FITC ) to label the lumen and to assess maintenance of the epithelial barrier as reported in previous studies [ 21 ] . 
+ The iHIOs were incubated at 37 ˚C in a humidified chamber containing 5 % CO2 for 5 days . 
+ For bacterial infections , approximately 10^3 E. coli cells were microinjected into the iHIO lumen . 
+ The iHIOs infected with E. coli were incubated in reconstituted gut media containing penicillin ( final concentration 100 U/ml ) at 37 ˚C in a humidified chamber with 5 % CO2 for 1 day . 
+ Images of the injected iHIOs were collected using Zeiss LSM710 Live Duo Confocal Microscope . 
+ Cryosectioning and staining of iHIOs . 
+ iHIOs were fixed in 4 % paraformaldehyde ( 2 to 4 hours ) followed by 30 % sucrose ( overnight ) . 
+ Organoids were prepared for cryosectioning by freezing at -20 ˚C in Tissue Freezing Medium ( Fisher , cat . 15-183-13 ) . 
+ Cryosections ( 10 μm ) were prepared with a BD Cryotome FSE Cryostat and the sections were placed on plus glass microscope slides . 
+ Histologic stains are listed in S1 Table . 
+ To stain , sections were fixed in cold acetone ( 10 minutes ) , rinsed with distilled water , and blocked in blocking buffer ( PBS containing 10 % goat serum , 1 % bovine serum albumin ( BSA ) and 0.01 % Triton X-100 ) for 2 hours at room temperature in a humidified chamber . 
+ The slides were drained and stained with primary antibody ( 1:500 ) in blocking buffer in a humidified chamber at 4 ˚C overnight . 
+ Sections were rinsed twice with wash buffer ( PBS with 0.1 % BSA and 0.025 % Triton X-100 ) , and secondary antibody was diluted in PBS ( 1:1000 ) and was applied to the sections . 
+ The sections were allowed to incubate with the secondary antibody for 2 hours in the dark at room temperature . 
+ The sections were washed with PBS and DNA was counterstained with Hoechst ( 1 μg / mL ) or DAPI ( 0.5 μg / mL ) dye for 2 minutes in the dark . 
+ Stained sections were air-dried and mounted using VectaMount permanent mounting medium . 
+ Analysis was performed using Zeiss LSM710 Live Duo Confocal Microscope . 
+ Merged images were generated and the FITC fluores ¬ 
+ After indicated incubation times at 37 ˚C with 5 % CO2 in a humidified chamber , the organoids were removed from the 3-dimensional culture matrix , transferred to an eppendorf tube , and washed with ice cold PBS . 
+ The organoids were then transferred to a sterile 2-ml tissue homogenizer , disrupted , and suspended in 100 μl PBS . 
+ Subsequent dilutions were plated on L-agar plates , and incubated at 37 ˚C overnight . 
+ The total number of bacteria per organoid was calculated based on the colony forming units ( CFU ) observed on the agar plates on the next day . 
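+ The back-calculation from plate counts to bacteria per organoid can be sketched as follows. The 100 μl suspension volume comes from the text; the dilution factor and plated volume are hypothetical example values, since the text does not give them.

```python
def cfu_per_organoid(colonies, dilution_factor, plated_volume_ul,
                     suspension_volume_ul=100.0):
    """Estimate total CFU in one organoid homogenate.

    colonies             -- colonies counted on the plate
    dilution_factor      -- fold dilution of the plated sample (e.g. 1000 for 10^-3)
    plated_volume_ul     -- volume spread on the plate, in microliters
    suspension_volume_ul -- total homogenate volume (100 ul in the text)
    """
    cfu_per_ul = colonies * dilution_factor / plated_volume_ul
    return cfu_per_ul * suspension_volume_ul


# Hypothetical plate: 50 colonies from 10 ul of a 1:1000 dilution.
total_cfu = cfu_per_organoid(50, dilution_factor=1000, plated_volume_ul=10)
```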
+ Antibiotics are needed to confine bacterial growth to the lumen . 
+ Except where indicated , bacterial challenge studies were performed with streptomycin-resistant strains using extracellular matrix without gentamicin , and tissue culture media supplemented with penicillin and streptomycin . 
+ Penicillin-sensitive strains were able to grow in the organoid lumen , but not in the tissue culture medium when penicillin was in the tissue culture media . 
+ Gentamicin and streptomycin inhibited growth of antibiotic sensitive E. coli C600 strain [ 7 ] injected into the lumen . 
+ Matrigel is not available without antibiotics , so we transferred the organoids to the antibiotic-free extracellular matrix gel ( Sigma Aldrich , cat . 
+ # E1270 ) when working with genta ¬ 
+ Western blots to quantify Stx production
+ The iHIOs were infected with 10^3 PT29S cells and incubated at 37 ˚C with 5 % CO2 in a humidified chamber . 
+ After the indicated incubation times , the iHIO suspension was obtained as described above . 
+ No signal was detected in the supernatants of the lysed organoids ( data not shown ) . 
+ To determine whether the toxin was bound to the cell membrane , the lysate was centrifuged at 8600 x g for 5 minutes at 4 ˚C and the supernatant and pellet fractions were analyzed by Western blot . 
+ Proteins were resolved in Bio-Rad Mini PROTEAN Tetra Cell using the 4 -- 15 % precast Mini-PROTEAN TGX ™ gel . 
+ Samples were boiled for 7 minutes in sample buffer ( 1M Tris , pH 6.8 , 50 % glycerol , 10 % SDS , 0.5 % bromophenol blue , 0.5 % beta-mercaptoethanol ) before being loaded in a 15 μL volume . 
+ Gels were run at a constant 30 milliamps until the bromophenol blue dye reached the bottom of the gel . 
+ Proteins were transferred to a PVDF membrane in a Hoefer TE series transphor electrophoresis unit at 100 V for 1 hour using chilled transfer buffer ( 10 % methanol , 24 mM Tris pH 8.3 , 194 mM glycine ) . 
+ After transfer , the PVDF membrane was wetted in 100 % methanol for 1 minute followed by PBS for two minutes . 
+ The membrane was incubated with primary antibody rabbit polyclonal recognizing Stx2 A - and B-subunits ( 1:5000 ) in Odyssey blocking diluent with 2 % Tween 20 , overnight followed by three washes in PBS-T ( PBS with 0.1 % Tween 20 ) . 
+ IRDye 800CW Diluted Secondary antibody ( Goat anti-rabbit ) ( 1:10,000 ) in Odyssey blocking diluent with 0.2 % Tween 20 was added to the membrane and incubated in the dark for one hour at room temperature with gentle shaking . 
+ The membrane was rinsed with PBS-T with vigorous shaking for 5 minutes . 
+ The washing was repeated three times and the membrane was finally rinsed with PBS to remove the residual Tween 20 before it was imaged in the Odyssey Family Imaging System ( LI-COR ; Odyssey CLx Near-Infrared ( NIR ) imaging system ) for the presence of Stx2a . 
+ Purified Stx2a at 25 and 50 ng was used as the positive control . 
+ The respective protein bands were quantified by comparison to the Stx2a standards using LI-COR Image Studio 4.0 software . 
+ Assessment of production of reactive oxygen species
+ iHIOs were infected with 10^3 commensal or 10^3 pathogenic O157 : H7 in a medium devoid of antibiotics and incubated at 37 ˚C with 5 % CO2 in a humidified chamber for a period of 4 h . 
+ Organoids injected with saline alone were used as controls . 
+ At the indicated time , the iHIOs were again injected with 230 nL at 830 nM concentration of ROS detection reagent from Enzo Life Sciences . 
+ The iHIOs were further incubated at 37 ˚C with 5 % CO2 in a humidified chamber for an hour , the fluorescent intensity was observed under a fluorescent microscope ( Nikon Eclipse TE2000-U ) , and the fluorescence was quantitated with the image processing program ImageJ . 
+ Isolation and labeling of PMNs
+ De-identified human peripheral blood was obtained from the Cell Processing Core at Cincinnati Children 's Hospital Medical Center . 
+ 5.0 ml of blood in EDTA was carefully layered onto 5.0 ml of Polymorphprep ™ ( Axis-Shield , Cat # 2017 -- 11 ) , and centrifuged at 500 x g for 35 min at room temperature . 
+ The upper band of plasma and mononuclear cells was removed , and the lower band of PMNs was harvested . 
+ An equal volume of half-strength HEPES-buffered saline ( 0.425 % ( w/v ) NaCl , 5 mM HEPES-NaOH , pH 7.4 ) was added to the PMN suspension . 
+ The PMNs were harvested by centrifugation at 400 x g for 10 min at room temperature and suspended in the modified gut medium . 
+ Cell counts were performed using the 40 μm Scepter ™ Cell Counter Sensor ( Millipore , Cat # PHCC40050 ) . 
+ The purified PMNs were labeled with 5 μM CellTracker ™ Violet BMQC dye ( Cat # C10094 , Molecular Probes ) for 30 minutes and centrifuged to remove excess dye , and the washed PMNs were suspended at the required number in gut medium . 
+ CO2 in a humidified chamber . 
+ For experiments without antibiotics , after injection the organoids were washed 3 times with sterile PBS to remove extracellular bacteria . 
+ After 4 hours , 5 x 10^4 PMNs in 20 μL were added to the wells and incubated for the indicated times . 
+ Fluorescent intensity of the labeled PMNs was observed on intact organoids by confocal microscope ( Zeiss LSM710 LIVE Duo ) , and quantified by image processing using ImageJ . 
+ A standard plane of focus was used for all confocal images ; the presence of the green FITC fluorescence indicates the image included the luminal compartment . 
+ The outline of the bright field image was used to define the boundaries of the organoid , and violet fluorescence within the boundary was considered to be due to internalized PMNs . 
+ Values were normalized to account for difference in 
+ Bioinformatics RNA-seq data analysis
+ RNA-seq was performed by Genomics , Epigenomics and Sequencing Core ( GESC ) in the University of Cincinnati . 
+ For each treatment , the total RNA from three independent iHIOs was extracted by using mirVana miRNA Isolation Kit ( Lifetech , Grand Island , NY ) with total RNA extraction protocol . 
+ Briefly , iHIOs were lysed with lysis/binding buffer , treated with homogenate additive , and extracted with acid-phenol : chloroform . 
+ The supernatant was mixed with ethanol and passed through the filter cartridge . 
+ Bound RNA was washed and eluted . 
+ RNA concentrations were determined by Nanodrop ( Thermo Scientific , Wilmington , DE ) , and integrity was determined by Bioanalyzer ( Agilent , Santa Clara , CA ) . 
+ The Apollo 324 system ( WaferGen , Fremont , CA ) and PrepX PolyA script were used for automatic polyA RNA isolation . 
+ The library was prepared using PrepX mRNA Library kit ( WaferGen ) and Apollo 324 NGS automatic library prep system . 
+ Isolated RNA was RNase III fragmented , adaptor-ligated and converted to cDNA with Superscript III reverse transcriptase ( Lifetech , Grand Island , NY ) , followed by automatic purification using Agencourt AMPure XP beads ( Beckman Coulter , Indianapolis IN ) . 
+ The targeted cDNA fragments were around 200 base pairs ( bp ) . 
+ Universal ( SR ) and index-specific primers were added to each adaptor-ligated cDNA sample and the amplified library was enriched by AMPure XP beads purification , and quality and yield of the library was assessed by Kapa Library Quantification kit ( Kapabiosystem , Woburn , MA ) using ABI 's 9700HT real-time PCR system ( Lifetech ) . 
+ Individually indexed libraries were proportionally pooled ( 20 -- 50 million reads per sample ) for clustering in cBot system ( Illumina , San Diego , CA ) . 
+ Libraries at the final concentration of 15.0 pM were clustered onto a single read ( SR ) flow cell using Illumina 's TruSeq SR Cluster kit v3 , and sequenced for 50 bp using TruSeq SBS kit on Illumina HiSeq system . 
+ To analyze differential gene expression , sequence reads were aligned to the human genome using the TopHat aligner [ 22 ] , and reads aligning to each known transcript were counted using Bioconductor packages for next-generation sequencing data analysis [ 23 ] . 
+ The differential expression analysis between different sample types was performed for each gene separately using the edgeR Bioconductor package [ 24 ] . 
+ The statistical significance of differential expression was established based on the FDR ( false discovery rate ) -adjusted p-values , which are indicated in the padj columns in S2 and S3 Tables [ 25 ] . 
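+ For readers unfamiliar with FDR adjustment, the sketch below implements the Benjamini-Hochberg step-up procedure in plain Python. Treating the padj values as Benjamini-Hochberg adjusted is an assumption; the text says only that the p-values were FDR-adjusted. The function name and the example p-values are hypothetical.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p-values for four genes.
padj = bh_adjust([0.01, 0.04, 0.03, 0.005])
```

+ A gene would then be called differentially expressed when its adjusted value falls below the chosen FDR cutoff.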
+ 19,076 transcripts were characterized ; 18,543 were identified as genes and 15,448 were associated with a gene ontology ( GO ) term using the gene ontology analysis program GOrilla [ 26 ] . 
+ As expected , transcripts ( e.g. IL-13 , IL-25 , IL-22 , IFN-γ , TNF , and IL-12 ) restricted to hematopoietic lineages were not detected . 
+ Venn diagrams were prepared using the online tool from Bioinformatics & Evolutionary Genomics ( http://bioinformatics.psb.ugent.be/webtools/Venn/ ) . 
+ Results and discussion
+ Sensitivity to LPS depends on route of exposure
+ iHIOs resemble sterile neonatal tissue [ 15 ] . 
+ Microbial colonization promotes maturation of the neonatal intestine , and Gram negative lipopolysaccharide ( LPS ) elicits strong responses , which are dependent on the cell-surface that is exposed . 
+ For iHIOs , introduction of LPS into the lumen mimics natural intestinal colonization , while addition of LPS to the tissue culture medium mimics life-threatening septicemia . 
+ To assess LPS toxicity , the lumen was labeled with the fluorescent dye , fluorescein isothiocyanate ( FITC ) , and the fluorescence was monitored to indicate maintenance of barrier function ( Fig 2 ) . 
+ Luminal addition of up to 10 ng of LPS did not compromise barrier function ( Fig 2A ) . 
+ This intraluminal concentration is about 20,000 ng/ml , assuming a spherical organoid with a diameter of 1 mm has a volume of about 0.5 μL . 
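+ The arithmetic behind this estimate can be checked with a few lines of Python. This is a worked example using the figures in the text (10 ng LPS, 1 mm diameter); the function names are illustrative.

```python
import math


def sphere_volume_ul(diameter_mm):
    """Volume of a sphere in microliters (1 mm^3 equals 1 ul)."""
    radius = diameter_mm / 2
    return 4 / 3 * math.pi * radius ** 3


def concentration_ng_per_ml(mass_ng, volume_ul):
    """Convert a mass in ng dissolved in `volume_ul` microliters to ng/ml."""
    return mass_ng / volume_ul * 1000


volume = sphere_volume_ul(1.0)                 # about 0.52 ul, rounded to 0.5 ul in the text
rounded = concentration_ng_per_ml(10, 0.5)     # using the rounded volume
```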
+ In contrast , iHIOs were extremely sensitive to LPS added to the surrounding medium 
+ To assess whether E. coli can replicate and persist in the iHIO lumen , approximately 10^3 cells of a nonpathogenic commensal strain of E. coli ( SGUC183 ) or a clinical isolate ( PT29S ) of O157 : H7 , which only expressed Stx2a , were microinjected into the iHIO lumen under conditions that prevented bacterial growth in the tissue culture medium . 
+ The iHIOs were able to support the growth of both E. coli strains with virtually identical growth rates , although the O157 : H7 strain had a slightly longer lag phase ( Fig 3A ) . 
+ Biphasic growth rates were observed . 
+ During the first 4 hours , both strains had a doubling time of about 30 minutes , similar to in vitro growth rates with aeration in nutrient rich medium . 
+ After about 4 hours , much slower doubling times of about 3 hours were observed , suggesting changes in the lumen environment , such as nutrient or oxygen depletion . 
+ At 24 hours about 10^6 commensal bacteria were recovered , while after 72 hours over 10^7 commensal bacteria were recovered ( data not shown ) . 
+ Assuming iHIOs are hollow spheres about 0.1 cm in diameter ( radius = 0.05 cm ) , the internal volume is equal to 4/3 πR^3 ( or 5.24 x 10^-4 ml ) , for an estimated density of about 1.9 x 10^10 bacteria per ml , within the range of bacterial density in the human ileum ( about 10^8 per ml ) and colon ( about 10^12 per
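+ The density estimate can be reproduced as follows. This sketch assumes a recovered count of 10^7 bacteria and the stated internal volume of 5.24 x 10^-4 ml, and takes the ileum and colon figures as 10^8 and 10^12 per ml; the names are illustrative.

```python
def bacterial_density_per_ml(count, volume_ml):
    """Bacteria per ml given a total count and the enclosing volume in ml."""
    return count / volume_ml


def within_range(value, low, high):
    """True if `value` lies between `low` and `high`, inclusive."""
    return low <= value <= high


density = bacterial_density_per_ml(1e7, 5.24e-4)   # about 1.9e10 per ml
in_gut_range = within_range(density, 1e8, 1e12)    # ileum to colon
```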
+ At 24 hours, commensal E. coli but not pathogenic O157:H7 were recovered from the iHIOs
+ Furthermore , after 24 hours , organoids challenged with O157 : H7 were fragile and often broke apart when removed from the extracellular matrix support . 
+ Fig 3 legend ( fragment ) : ... standard deviation were determined at the indicated times from three different iHIOs for each strain . 
+ Closed symbols , in a separate experiment iHIOs were injected with 10^3 commensal E. coli ( SGUC183 , squares ) or pathogenic O157 : H7 ( PT29S , triangles ) as above , but at 18 hours , the medium was replaced with medium lacking penicillin , and bacterial counts were assessed at 27 hours post inoculation . 
+ B-C , Commensal E. coli ( B ) replicates in the lumen without damaging the iHIO , while O157 : H7 ( C ) damages the actin layer . 
+ Cryosections of iHIOs 18 hours after injection were stained for DNA ( blue ) , bacteria ( green , anti-E. coli for commensal , anti-O157 for O157 : H7 ) , and F-actin ( red ) . 
+ Bar indicates 20 μm . 
+ D-E . 
+ Cryosection 1 hour after infection with O157 : H7 , stained for nuclear and bacterial DNA ( DAPI , blue ) , E-cadherin ( green ) , and F-actin ( red ) . 
+ White arrowheads represent bacterial nucleoids co-localized with actin . 
+ ( D ) , Bar indicates 10 μm . 
+ ( E ) , Magnified image of D , bar indicates 2 μm . 
+ F-G . 
+ Cryosections 4 hours after infection with O157 : H7 . 
+ White arrowheads represent bacterial co-localization with actin . 
+ ( F ) , stained for nuclear and bacterial DNA ( DAPI , blue ) , E-cadherin ( green ) , and F-actin ( red ) , bar indicates 10 μm . 
+ ( G ) , stained for nuclear and bacterial DNA ( DAPI , blue ) , F-actin ( red ) , and anti-O157 ( green ) , bar indicates 5 μm . 
+ H-I . 
+ Cryosections 18 hours after infection with O157 : H7 . 
+ Yellow arrows indicate filamentous E. coli . 
+ ( H ) , stained for nuclear and bacterial DNA ( DAPI , blue ) , E-cadherin ( green ) , and F-actin ( red ) , bar indicates 10 μm . 
+ ( I ) , stained for nuclear and bacterial DNA ( DAPI , blue ) , F-actin ( red ) , anti-O157 ( green ) , bar indicates 10 μm . 
+ Representative images of experiments performed at least four times are shown . 
+ To determine if the inability to recover O157 : H7 at 24 hours was due to loss of the epithelial barrier and subsequent exposure to the antibiotics from the tissue culture medium , at 18 hours the medium containing penicillin and streptomycin was replaced with antibiotic-free medium . 
+ The organoids were harvested at 27 hours post-infection ( 9 hours without antibiotics in the medium ) , and colony counts were assessed for both the tissue culture medium and the organoids . 
+ For the commensal , 2 x 10^6 CFU were recovered from the organoid at 27 hours ( Fig 3A , solid square ) , but no viable bacteria were recovered from the tissue culture medium , suggesting the commensal bacteria continued to replicate within the confines of the lumen . 
+ In contrast , nine hours after antibiotic removal , 6 x 10^4 viable O157 : H7 were recovered from the organoid ( Fig 3A , solid triangle ) , and 3 x 10^5 were recovered from the tissue culture medium . 
+ These results suggest that about 18 hours post-infection the O157 : H7 destroy the luminal barrier , and are killed if antibiotics are present in the medium . 
+ However , if antibiotics are not present in the 
+ Histologic characterization of iHIOs
+ Cryosections were examined to determine the effect of bacterial infection on iHIO morphology and the luminal epithelial layer . 
+ In sections taken at 18 hours , the epithelial layer of iHIOs injected with non-pathogenic SGUC183 was clearly defined by F-actin ( red ) and similar to PBS-injected organoids ( Fig 1B ) , with numerous bacteria ( green ) within the lumen ( Fig 3B ) . 
+ In contrast , the actin layer of iHIOs injected with O157 : H7 ( Fig 3C ) was clearly disrupted , there was no evidence for a luminal compartment and filamentous bacteria ( green ) were pres ¬ 
+ 3D -- 3I ) . 
+ At one hour ( Fig 3D and 3E ) , while the lumen was clear , breaks in F-actin ( red ) were seen . 
+ At 4 hours post-infection ( Fig 3F and 3G ) , disrupted F-actin and loss of E-cadherin expression were apparent . 
+ At 18 hours post infection ( Fig 3H and 3I ) , the luminal border was gone , F-actin staining was sparse and randomly distributed , and the green E-cadherin staining iHIOs resemble the distal portion of the small intestine [ 12 ] , the tissues that are favored for initial attachment of E. coli O157 : H7 [ 14,30 ] . 
+ Adherence to human intestinal epithelium is a key determinant of pathogenicity . 
+ Strains possessing the locus of enterocyte effacement ( LEE ) display F-actin mediated intimate attachment to epithelial cells [ 31 -- 34 ] . 
+ Individual bacteria could be seen in the expanded images . 
+ O157 : H7 and other enteropathogenic E. coli display intimate attachment to intestinal epithelial cells mediated by the cytoskeletal protein F-actin [ 31 ] . 
+ Intimate contact can activate host antibacterial responses , such as production of reactive oxygen species ( ROS ) , which in turn can induce expression of Stx through activation of SOS response in STEC [ 35,36 ] . 
+ At 1 hour post-infection ( Fig 3E , white arrowheads ) , DNA the size of a bacterial nucleoid ( blue staining ) and F-actin ( red ) were co-localized , as evidenced by the purple in the expanded merged confocal image . 
+ Pedestal formation was not observed , although such structures are typically visualized by electron microscopy . 
+ At 4-hours post infection ( Fig 3F ) numerous bacteria were seen in the lumen , growing primarily as coccobacilli . 
+ Co-localization of DNA and F-actin was observed ( Fig 3F , white arrowheads ) , and staining with the anti-E. coli antibody demonstrated that bacteria ( green ) were co-localized with the actin ( Fig 3G , white arrowhead ) . 
+ At 18 hours post infection ( Fig 3H ) , numerous filamentous DNA structures were seen . 
+ To verify that the filamentous structures were E. coli O157 : H7 , the organoids were stained with antibody to O157 LPS ( Fig 3I ) . 
+ Numerous green coccobacilli as well as long green filaments were seen , demonstrating that the small , sub-nuclear DNA structures were E. coli . 
+ Production of reactive oxygen species (ROS)
+ Filaments form when bacteria continue to replicate but the daughter cells fail to separate ; filamentation occurs following exposure to DNA-damaging agents , including ROS . 
+ The delay in septation is induced by the bacterial SOS system . 
+ It allows time for DNA damage repair , minimizing transfer of damaged chromosomes . 
+ ROS production was assessed in iHIOs injected with saline , or 10^3 commensal or 10^3 pathogenic O157 : H7 . 
+ After 4 hours , bacterial recovery was similar for both strains ( Fig 4A ) . 
+ Injection of O157 : H7 resulted in significantly increased ROS compared to the saline control or injection with the commensal strain ( Fig 4B and 4C ) . 
+ Stx production
+ While the SOS response is designed to protect chromosomal integrity , lysogenic bacteriophage use activation of the SOS response as a signal to initiate lytic replication and escape from a damaged host . 
+ Stx is phage-encoded , and activation of the SOS response initiates Stx expression [ 5 ] . 
+ Stx expression was observed in iHIOs infected with O157 : H7 . 
+ Stx2a was not detected by western blots at 1 , 2 , 4 and 6 hours post-infection ( Fig 5 ) . 
+ At 18 hours post-infection , 4 ng and 10 ng of Stx2a were detected in two separate experiments . 
+ Transcriptional profiling
+ Relative expression of lineage-specific genes . 
+ RNAseq was performed on iHIOs at 4 hours post-injection with PBS ( control ) or 10^3 commensal E. coli or O157 : H7 . 
+ Expression of individual intestinal genes was examined ( Table 1 ) . 
+ Infection with the commensal and O157 : H7 strain resulted in slight ( approximately 2-fold ) , but significantly increased expression of epithelial transcription of proteins that participate in gastrointestinal defenses , such as alkaline phosphatase , which is involved in detoxification of lipopolysaccharide ( 9 to 19-fold increase ) , the bacteriolytic enzyme lysozyme ( 6-fold increase ) , mucins involved in barrier function , including MUC2 ( 4 to 6 fold increase ) and MUC13 ( 4-fold increase ) , and a structural component of gas-signaling related to the innate immune defenses was also examined ( Table 1 ) . 
+ Bacterial infection constituted the first encounter of the sterile iHIOs with lipopolysaccharide ( LPS ) ; however , expression of the LPS receptor , TLR4 was not altered . 
+ Expression of IL-1β was highly upregulated by infection with either strain ; however , infection with O157 : H7 , but not the commensal strain , resulted in significant upregulation of the inflammatory mediators , IL-8 and IL-18 , and significant downregulation of the NOD-like receptor , NLRC4 . 
+ Transcriptional enrichment analysis
+ Setting significance at P < 0.05 and using a 4-fold change compared to the PBS controls as the cutoff , infection with the commensal strain resulted in 317 differentially expressed genes ( 95 upregulated and 222 downregulated ) , while infection with the pathogenic O157 : H7 strain resulted in 429 differentially expressed genes ( 160 upregulated and 269 downregulated ) . 
+ The most significant ( P < 3E-10 ) GO category uniquely upregulated by O157 : H7 infection ( S1A Fig ) was `` Chemokine-mediated signaling pathway '' ( GO :0070098 ) , with upregulation of the genes indicated in S1A Fig , box . 
+ Other categories uniquely upregulated by O157 : H7 included `` regulation of response to wounding '' , and the classical MAP kinase pathway , `` positive regulation of ERK1 and ERK2 cascade '' . 
+ Infection with either strain resulted in upregulation of the GO term , `` digestive system process '' . 
+ The most significant downregulated GO process category for both E. coli strains was the GO term , `` Multicellular organismal process '' ( GO :0032501 ) ; with 82 down-regulated genes for the commensal strain and 88 for O157 : H7 . 
+ The upregulated GO terms were compared ( Fig 6 ) . 
+ Both commensal and O157 : H7 infection up-regulated the GO terms `` Response to iron '' ( GO :0010039 ) and `` Regulation of vascular endothelial growth factor receptor signaling pathway '' ( GO :0030947 ) , and `` Maintenance of gastrointestinal epithelium '' ( GO :0030277 ) . 
+ O157 : H7 uniquely upregulated `` Chemokine mediated signaling pathways '' ( GO :0070098 ) . 
+ PMNs and infected iHIOs
+ The inflammatory mediator , IL-8 , is associated with neutrophil recruitment ; alternatively , breach of the intestinal barrier could promote recruitment through the presence of pathogen-associated molecular patterns , such as LPS . 
+ We assessed whether pathogenic O157 : H7 promoted neutrophil recruitment . 
+ iHIOs were injected with saline , or 10^3 commensal or O157 : H7 , in the presence of the fluorescent dye FITC to label the lumen , and incubated for four hours to allow for chemokine expression . 
+ Human PMNs ( polymorphonuclear leukocytes ) , a population comprised primarily of neutrophils , were labeled with fluorescent cell-tracker dye , and 5 x 10^4 were added to the medium . 
+ Initial studies were done in the absence of antibiotics to allow for assessment of the potential of PMNs to reduce bacterial numbers ( Fig 7A ) . 
+ Bacterial recovery from the iHIOs was similar at 4 hours before addition of the PMNs . 
+ Both strains grew within the organoid after addition of the PMNs , and at 18 hours bacterial recovery from the iHIO in the presence of PMNs ( Fig 7A , full graph ) was not statistically different from growth in the absence of PMNs ( Fig 7A ) . 
+ The culture medium was also sampled . 
+ About 1300 O157 : H7 were recovered from the media , but only 32 commensal bacteria were recovered from the media , suggesting O157 : H7 may have breached the epithelial barrier . 
+ As shown in Fig 3A , antibiotics can access and kill the bacteria if the epithelial barrier is breached . 
+ The influence of antibiotics in the tissue culture medium on bacterial recovery was assessed in the presence and absence of PMNs ( Fig 7B ) . 
+ At 6 and 8 hours , bacterial recovery was similar in the presence or absence of PMNs . 
+ However , at 23 hours post-infection , the commensal strain was recovered whether or not PMNs were present , but no viable O157 : H7 were recovered . 
+ This is consistent with O157 : H7-induced loss of the intestinal barrier , and further suggests that the presence of PMNs cannot prevent the epithelial damage . 
+ Epithelial barrier function was further evaluated by quantifying fluorescence of FITC injected into the lumen fluorescence was recovered from the iHIOs injected with the commensal strain at 18 hours . 
+ Recruitment of PMNs . 
+ Recruitment of PMNs was monitored by microscopy ( Fig 7D -- 7I ) . 
+ In the merged bright field and fluorescent images , the dark iHIO with a green , FITC-labeled lumen can be seen . 
+ PMNs ( violet ) were seen at the periphery of all iHIOs ( Fig 7D -- 7F and 7G -- 7I ) . 
+ For injection with saline or commensal , violet cells were primarily localized to the periphery of the iHIO . 
+ In contrast , for injection with O157 : H7 , violet cells were seen at the periphery , as well as within the iHIO and in some cases co-localize with the green stain that defines the lumen ( Fig 7I , white arrows ) . 
+ The violet signal within the region corresponding to the body of the iHIO was quantified . 
+ Significantly more fluorescent signal was detected in the iHIOs infected with E. coli O157 : H7 than the saline or commensal-infected iHIOs at both 8.5 impermeant dye that stains cellular nucleic acids if the membrane has been compromised ( Fig 7K -- 7M ) . 
+ Significantly more fluorescent signal was detected in iHIOs infected with E. coli O157 : H7 compared to saline or commensal-infected iHIOs ( Fig 7N ) . 
+ Luminal presence of the phagocyte marker , CD11b , was also monitored in cryosections were in the lumen of the commensal infected iHIOs , as evidenced by red fluorescence ( Fig 8M ) and more were observed in the iHIOs infected with O157 : H7 ( Fig 7N ) . 
+ Less bacterial staining was observed for the commensal E. coli compared to E. coli O157 : H7 , likely due to the use of different bacterial antibodies , since Fig 7A demonstrated recovery of the two strains was similar . 
+ Conclusions
+ Commensal E. coli grew to high numbers in the previously sterile iHIO lumen without causing damage , demonstrating that like the neonatal intestine , the innate defenses of iHIOs are sufficient to contain non-pathogenic bacteria [ 37 ] . 
+ Tolerance of commensal bacteria is also seen in wild type mice , as well as severely immunodepleted NOD scid gamma ( NSG ) mice , lacking mature T cells , B cells , and natural killer ( NK ) cells . 
+ In contrast , growth of pathogenic O157 : H7 resulted in loss of epithelial barrier function . 
+ Thus some property or properties expressed by pathogenic O157 : H7 , but not commensal E. coli , is responsible for the rapid loss of epithelial barrier function . 
+ Both strains have been sequenced , and a likely candidate is the O157 : H7 LEE pathogenicity island , which is known to alter the integrity of the actin cytoskeleton , a cellular component necessary to maintain epithelial cell contact . 
+ A second candidate is Shiga toxin , which is known to kill cells . 
+ Whether either , both , or neither of these traits mediates the phenotype observed with O157 : H7 infection could be resolved by experiments with defined mutants in O157 : H7 . 
+ Pathogenic O157 : H7 activated innate defenses , including ROS production ( Fig 4 ) and several inflammatory immune responses ( S1A Fig , Table 1 ) . 
+ The different bacterial morphologies are consistent with differential activation of the host defenses [ 35 ] . 
+ The commensal strain grew normally as coccobacilli , while O157 : H7 displayed filamentous growth ( Figs 3 and 8 ) . 
+ In human disease , elevated neutrophil counts have been associated with development of HUS and fatal outcome [ 38,39 ] . 
+ IL-8 induces neutrophil chemotaxis , and was upregulated by O157 : H7 . 
+ PMNs accumulated at the iHIO margins , migrated through the tissue and localized within the lumen . 
+ However , recruitment of PMNs did not prevent loss of epithelial barrier function ( Fig 7B and 7C ) or reduce the O157 : H7 numbers ( Fig 7A ) . 
+ This could be because long filamentous chains , as seen for O157 : H7 , can protect bacteria from phagocytosis [ 40,41 ] . 
+ Recruitment and activation of PMNs could contribute to tissue damage without helping to resolve the infection . 
+ Lack of experimental models has hampered investigation of human-restricted pathogens such as E. coli O157 : H7 . 
+ Our studies comparing infection of commensal to pathogenic E. coli demonstrate iHIOs represent a valuable model to study human-restricted enteric pathogens . 
+ Supporting information
+ S1 Fig . 
+ Highly significant GO PROCESS pathways upregulated by ( A ) O157 : H7 , PT29S and ( B ) commensal SGUC183 . 
+ S3 Table . 
+ RNAseq 4 hours post infection with O157 : H7 versus PBS ( samples in triplicate ) . 
+ ( XLSX ) 
+ Acknowledgments
+ We would like to thank James Wells , Christopher Mayhew and Amy Pitstick from the Pluripotent Stem Cell and Organoid Core , Cincinnati Children 's Hospital Medical Center , and Chet Closson from the University of Cincinnati Live Microscopy Core for their valuable input . 
+ We also acknowledge the support from CCTST ( Center for Clinical and Translational Science and Training , for Advancing Translational Sciences Award Number 1UL1TR001425-01 ) and by NIDDK P30 DK078392 ( Pluripotent Stem Cell and Organoid Core and Live Microscopy Core ) of the Digestive Disease Research Core Center in Cincinnati . 
+ We thank the Biodefense and Emerging Infectious Diseases Research Resources Repository for providing purified Stx2a 
+ 26.
+ 27.
+ 28. Berg R. The indigenous gastrointestinal microflora. Trends Microbiol. 1996; 4: 430–435. https://doi.org/10.1016/0966-842X(96)10057-3 PMID: 8950812
+ 29. Canny GO, McCormick BA. Bacteria in the Intestine, Helpful Residents or Enemies from Within? Infect Immun. 2008; 76: 3360–3373. https://doi.org/10.1128/IAI.00187-08 PMID: 18474643
+ 30. Paton JC, Paton AW. Pathogenesis and diagnosis of Shiga toxin-producing Escherichia coli infections. Clin Microbiol Rev. 1998; 11: 450–479. PMID: 9665978
+ 31. Kenny B, DeVinney R, Stein M, Reinscheid DJ, Frey EA, Finlay BB. Enteropathogenic E. coli (EPEC) transfers its receptor for intimate adherence into mammalian cells. Cell. 1997; 91: 511–520. PMID: 9390560
+ 32. Melton-Celsa A, Mohawk K, Teel L, O’Brien A. Pathogenesis of Shiga-Toxin Producing Escherichia coli. In: Mantis N, editor. Ricin and Shiga Toxins. Berlin, Heidelberg: Springer Berlin Heidelberg; 2011. pp. 67–103. http://link.springer.com/10.1007/82_2011_176
+ 33. Gyles CL. Shiga toxin-producing Escherichia coli: An overview. J Anim Sci. 2007; 85: E45–E62. https://doi.org/10.2527/jas.2006-508 PMID: 17085726
+ 35. Morelle S, Carbonnelle E, Matic I, Nassif X. Contact with host cells induces a DNA repair system in pathogenic Neisseriae: Induction of a DNA repair system in Neisseria. Mol Microbiol. 2004; 55: 853–861.
+ 41. Justice SS, Hunstad DA, Cegelski L, Hultgren SJ. Morphological plasticity as a bacterial survival strategy. Nat Rev Microbiol. 2008; 6: 162–168. https://doi.org/10.1038/nrmicro1820 PMID: 18157153
\ No newline at end of file
--- a/data/TEXT_FILES/notuseful_txt/28649444.txt 0 → 100644
View file @27818a9
+++ b/data/TEXT_FILES/notuseful_txt/28649444.txt 0 → 100644
View file @27818a9
+ ARTICLE OPEN Reverse
+ Inferring transcriptional gene regulatory networks from transcriptomic datasets is a key challenge of systems biology , with potential impacts ranging from medicine to agronomy . 
+ Several techniques are presently used to experimentally assay transcription factor-to-target relationships , defining important information about real gene regulatory network connections . 
+ These techniques include classical ChIP-seq , yeast one-hybrid , or more recently , DAP-seq or target technologies . 
+ These techniques are usually used to validate algorithm predictions . 
+ Here , we developed a reverse engineering approach based on mathematical and computer simulation to evaluate the impact that this prior knowledge on gene regulatory networks may have on training machine learning algorithms . 
+ First , we developed a gene regulatory networks-simulating engine called FRANK ( Fast Randomizing Algorithm for Network Knowledge ) that is able to simulate large gene regulatory networks ( containing 10^4 genes ) with characteristics of gene regulatory networks observed in vivo . 
+ FRANK also generates stable or oscillatory gene expression directly produced by the simulated gene regulatory networks . 
+ The development of FRANK leads to important general conclusions concerning the design of large and stable gene regulatory networks harboring scale free properties ( built ex nihilo ) . 
+ In combination with a supervised ( accepting prior knowledge ) support vector machine algorithm we ( i ) address biologically oriented questions concerning our capacity to accurately reconstruct gene regulatory networks and in particular we demonstrate that prior-knowledge structure is crucial for accurate learning , and ( ii ) draw conclusions to inform the design of experiments that would enable learning to solve gene regulatory networks in the future . 
+ By demonstrating that our predictions concerning the influence of the prior-knowledge structure on support vector machine learning capacity hold true on real data ( Escherichia coli K14 network reconstruction using network and transcriptomic data ) , we show that the formalism used to build FRANK can to some extent be a reasonable model for gene regulatory networks in real cells . 
+ npj Systems Biology and Applications ( 2017 ) 3:17 ; doi :10.1038 / s41540-017-0019-y 
+ INTRODUCTION
+ Gene regulation plays a key role in the control of fundamental processes in living organisms , ranging from development , to nutrition and metabolic coordination . 
+ Genes are regulated at several levels of integration but one key step is the control of gene transcription . 
+ Determining the fundamental structure of transcriptional Gene Regulatory Networks ( GRNs , considered here as the relationships of transcription factors ( TFs ) and their targets ) is a major challenge of systems biology .1 -- 4 Understanding GRNs has tremendous implications ranging from medicine to agriculture . 
+ Indeed , being able to learn GRNs may enable manipulating the cell as a system and potentially control and coordinate many physiological events that are related to GRN activity ( diseases , biotechnological applications , crop production , and more ) . 
+ The quest of systems biology is thus to determine GRN structure using machine-learning algorithms applied on transcriptomic datasets [ considered as the most exhaustive level measurement of the system to date ( commonly assayed by microarrays or next generation sequencing ) ] . 
+ Furthermore , recent high throughput experimental approaches are now drafting the GRNs backbones for many different species ,5 ranging from prokaryotes , yeast , plants to humans . 
+ We can distinguish two complementary types of approaches . 
+ The first type is TF-centered , such as Chromatin Immuno-precipitation followed by high-throughput sequencing ( ChIP-seq ) , DAP-seq ,6 -- 12 or TARGET procedures ( Transient Assay Reporting Genome-wide Effect of TF ) .13 -- 16 
+ In these cases , one aims at investigating the binding activity of a particular TF across the genome or its capacity to activate its targets upon entrance in the nucleus . 
+ The second type of approach is target-centered , such as enhanced yeast-one hybrid ( eY1H ) approaches that decipher GRNs controlling a particular set of genes .17 -- 22 
+ These approaches , TF-centered and target-centered , can be understood as the closest proxy to experimentally determining actual GRNs in living organisms .23 Interestingly , these experimental data on GRN connections are often used to validate algorithm predictions , but they can also be used as potential knowledge to train machine-learning procedures .24 Hence , the purpose of the current work is to understand : ( i ) how valuable this GRN prior-experimental-knowledge is , and ( ii ) which characteristics of this prior-knowledge are potentially better for training or supervising machine-learning procedures to learn large GRNs from transcriptomic data . 
+ In other words can we , in the near future , possibly train algorithms to decipher real regulatory connections by combining ChIP-seq , DAP-seq , eY1H , or TARGET results with transcriptomic datasets ? 
+ Since no GRN is known with sufﬁcient precision to be used as gold standard , we undertook a reverse engineering path . 
+ Indeed , training machine learning algorithms on real biological networks poses fundamental problems because these networks are not perfectly deﬁned . 
+ This kind of approach is now routinely used , in particular during the DREAM challenges [ http://dreamchallenges.org/ ; Dialog on Reverse Engineering Assessment and Methods ] .3 , 25 -- 27 This work demonstrated that learning GRNs even from in silico simulated transcriptional data is not trivial but can still provide significant results . 
+ During the several DREAM challenges that focused on GRNs inference , the machine learning procedures are trained on simulated gene expression , on mutant versions of the networks , as well as on perturbed networks , where expression of several genes is modiﬁed to simulate external inﬂuencing factors . 
+ Here , we use an approach which is quite different , since as mentioned above , we focus on using experimentally probed TF → target as prior-knowledge . 
+ Our rationale is very close to what has been proposed before by Cerulo and colleagues ,24 but quite different regarding the size of the simulated networks as well as the biological questions that we ask and answer . 
+ Indeed , we decided to develop our own GRN-simulating algorithm called FRANK for Fast Randomizing Algorithm for Network Knowledge , which is able to ( i ) simulate very large networks ( potentially containing as many genes as real eukaryotic genomes , ~ 10^4 genes including ~ 10^3 TFs ) ; ( ii ) simulate gene expression over several thousands of simulated time points or system levels ( see below ) , ( iii ) in a relatively short computation time ( several minutes ) . 
+ The decision to work on very large networks comes with a trade-off concerning the mathematical formalism , fully discussed hereafter . 
+ Indeed , it is worth noting that several network simulators are already available with different characteristics , including the most popular : Netsim ,28 SynTReN ,29 and GeneNetWeaver .30 , 31 But in our experience , their simulating engine , based on the resolution of ordinary differential equations ( ODEs ) , is quite slow when solving very large network dynamics and steady states . 
+ We thus undertook ( i ) a different and simpler formalism to routinely simulate and infer large networks , and ( ii ) to answer very biologist-driven questions . 
+ In this work we use FRANK to simulate GRNs and related gene expression , and use machine learning algorithms to learn back the simulated network structure to benchmark the quality of the reconstruction . 
+ However , instead of studying the machine learning algorithms themselves , we rather focused on the impact of the structure of the network , as well as the characteristics of the data needed to perform good reconstruction . 
+ In this sense , our work is very much biologically oriented and proposes math-derived hypotheses to answer the following questions : To what extent would prior-knowledge of a given GRN be able to improve machine-learning procedures ? 
+ What amount of prior knowledge is needed to properly infer a GRN of a given size ? 
+ Which kind of expression data ( dynamic , steady state , mixed ) are the most valuable to infer a given GRN ? 
+ Which kind of prior-knowledge ( TF-centered or Target-centered ) would be best suited to supervise inference of GRNs ? 
+ What proportion of TF or target gene expression is needed to properly infer GRNs ? 
+ Are machine learning procedures resilient to bad quality prior knowledge in inferring GRNs from it ? 
+ Herein , we propose answers to these questions derived from our in silico simulations . 
+ This paper presents the results in two complementary parts . 
+ The first one describes FRANK the simulator and the machine learning procedure from a mathematical/computer science perspective . 
+ The second one is biologically oriented and proposes to answer the abovementioned questions . 
+ The second part has been built to be read independently by biologists , whereas the first part requires more mathematical skills ( except the first two paragraphs describing the general concept of FRANK related to Fig. 1 , see below ) . 
+ RESULTS
+ Part I : Mathematical and computational simulation
+ Preview of FRANK : a large network simulator . 
+ To quickly simulate GRNs of very large size , as well as to control any aspects of the algorithm for further work , we created a simulation algorithm using the C++ language . 
+ FRANK formalism is meant to be simple and deterministic to quickly calculate gene expression for very large simulated GRNs ( Fig. 1 , see FRANK manual for full description provided in Sup. Info. 1 ) . 
+ FRANK is software that produces ( i ) GRNs with features considered as crucial in the GRN literature and ( ii ) synthetic gene expression values drawn semi-randomly in accordance with the previously built network . 
+ Several input parameters ( essentially parameterizing probability distributions related to the network features and detailed below ) are tuned by the user or provided with default values . 
+ The outputs are files containing ( i ) the simulated network ( .csv ) , ( ii ) gene expression levels generated by this network ( .csv ) and heatmaps ( .png ) . 
+ FRANK was designed to quickly generate several hundreds of different large networks having different tunable parameters and their corresponding simulated expression . 
+ We have in mind to proceed further with machine learning algorithms , and evaluate the effect of changing GRNs parameters on their learning capacity ( second part of the work below ) . 
+ Network and dynamical model . 
+ The network is considered here as a directed graph ( potentially weighted ) and modeled as a large dimensional sparse matrix . 
+ This means that each gene is seen as a vertex and each interaction between two genes appears as an edge with either a positive ( + ) or a negative ( − ) sign depending on the nature of this influence . 
+ Depending on the kind of model required , we may then consider two situations : purely directed graphs ( edges take values ± 1 ) ; or weighted graphs ( edges are drawn randomly from a Gaussian distribution ; for details see Sup File 1 Manual ) . 
+ The network graph ( Fig. 1a ) is encoded by a network matrix named N , containing two sub-matrices named A and B ( Fig. 1b ) . 
+ The sub-matrix A contains TF → TF edges and is square . 
+ B contains TF → TA ( TA stands for target ) edges and is not square . 
+ All the vertices of the graph appear as the row names of the matrix N. Thus N contains null , positive and negative coefﬁcients . 
+ A null cell , at line TG ( TG can be a TF or a TA ) and column TF means that no connection exists from TF to TG . 
+ The non-null cells correspond to the edge of the graph mentioned above and are drawn from a Gaussian distribution N ( β ,1 ) where parameter β is given . 
+ Additional properties of the network are commented below . 
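+ The layout of N described above can be illustrated with a minimal sketch . It is not FRANK 's implementation : the function name build_network , the fixed in-degree per gene and the default β = 0 are assumptions made for illustration , and the scale free constraint is not reproduced . Rows are regulated genes ( TFs first , then targets ) , columns are regulating TFs , and non-null cells are drawn from N ( β ,1 ) . 

```python
import random


def build_network(n_tf, n_ta, in_degree, beta=0.0, seed=0):
    """Return (A, B) as lists of rows.

    A is n_tf x n_tf (TF -> TF edges), B is n_ta x n_tf (TF -> TA edges).
    Row = regulated gene, column = regulating TF.
    Hypothetical simplification: every gene gets exactly `in_degree`
    regulators chosen uniformly at random.
    """
    rng = random.Random(seed)

    def make_row():
        row = [0.0] * n_tf  # null cell = no connection
        for j in rng.sample(range(n_tf), in_degree):
            row[j] = rng.gauss(beta, 1.0)  # edge weight drawn from N(beta, 1)
        return row

    a = [make_row() for _ in range(n_tf)]
    b = [make_row() for _ in range(n_ta)]
    return a, b


A, B = build_network(n_tf=5, n_ta=20, in_degree=2)
N = A + B  # network matrix: A stacked on top of B
print(len(N), "rows x", len(N[0]), "columns")
```

+ Storing only the non-null cells ( for example as a dictionary keyed by row and column ) would be the natural next step for networks of realistic size . 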
+ Once the network is designed , we can turn to generating the expression levels for the genes through a dynamical process that should be simple for computational reasons but likely to mimic the reality of biological complexity . 
+ Let X ( t ) be the vector of gene expression at time t , split into two subvectors $X(t) = (X_{TF}(t), X_{TG}(t))$ , where $X_{TF}(t)$ denotes the vector of TF expressions and $X_{TG}(t)$ stands for the expression of TG . 
+ We then assume that the margins of X are all log-normally distributed with mean µ and variance $\sigma^2$ and perturbed by a measurement error denoted ε following a centered Gaussian distribution and decomposed in accordance with X , hence : 
+ $$X_{TF}(t) = \exp(V(t)) + \varepsilon_{TF}(t)$$
+ $$X_{TG}(t) = \exp(W(t)) + \varepsilon_{TG}(t)$$
+ where V ( t ) and W ( t ) are log of gene expression ( see Fig. 1b ) , with $N(\mu, \sigma^2)$ distribution and follow in addition the evolution equations : 
+ $$V(t+1) = A\,V(t), \qquad W(t+1) = B\,V(t) \qquad [\text{Model}]$$
+ Here the square matrix A plays the role of an infinitesimal generator hence contains the information needed to ensure the stability of the system , especially through its eigenvalues .32 Designing A and B is consequently at the core of FRANK . 
+ The system is fully determined by initial value V ( 0 ) and W ( 0 ) or equivalently X ( 0 ) ( we take ε ( 0 ) = 0 ) that may be either provided by the user or randomly generated by FRANK . 
+ In the sequel the word `` iteration '' stands for the operation X ( t ) → X ( t + 1 ) . 
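+ As a hedged illustration of one `` iteration '' , the sketch below simulates TF expression only . It assumes that the log-expression vector is updated linearly as V ( t + 1 ) = A V ( t ) and observed as exp ( V ( t ) ) plus centered Gaussian noise with ε ( 0 ) = 0 ; the function name , the toy 2 x 2 matrix and the noise parameter are hypothetical and are not taken from FRANK . 

```python
import math
import random


def simulate_tf_expression(a, v0, n_steps, noise_sd=0.0, seed=0):
    """Return a list of expression vectors X_TF(0), ..., X_TF(n_steps - 1).

    Assumption (labeled, not FRANK's code): V(t + 1) = A V(t) and
    X_TF(t) = exp(V(t)) + eps(t), with eps(0) = 0.
    """
    rng = random.Random(seed)
    v = list(v0)
    trajectory = []
    for t in range(n_steps):
        eps = [0.0 if t == 0 else rng.gauss(0.0, noise_sd) for _ in v]
        trajectory.append([math.exp(vi) + e for vi, e in zip(v, eps)])
        # One iteration: matrix-vector product A V(t)
        v = [sum(aij * vj for aij, vj in zip(row, v)) for row in a]
    return trajectory


# Toy generator: a 90-degree rotation, whose eigenvalues (+i, -i) lie on the
# unit circle, so the noise-free expression is periodic.
A = [[0.0, -1.0], [1.0, 0.0]]
traj = simulate_tf_expression(A, v0=[0.5, -0.2], n_steps=8)
for x in traj:
    print([round(xi, 3) for xi in x])
```

+ With this toy matrix the noise-free trajectory repeats every four iterations , which is the kind of oscillatory behavior discussed in the text . 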
+ Main features and calibration . 
+ The network structure can be parameterized for several features . 
+ In particular the user can choose a given sparsity , a minimum and a maximum number of TFs controlling a given gene . 
+ The network structure is also constrained to harbor scale free properties ( see Manual ) . 
+ In silico experiments are then computed in parallel ( following the simple formula Fig. 1c ) by using a fast exponentiation algorithm based on the dyadic decomposition of the power number ( see Manual for full details , Sup. Info. 1 ) . 
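+ Fast exponentiation by dyadic ( binary ) decomposition of the exponent can be sketched as follows . This is a generic square-and-multiply routine written for illustration , under the assumption that t iterations amount to applying the t-th power of the matrix ; the helper names mat_mul and mat_pow are hypothetical and the code is not taken from FRANK . 

```python
def mat_mul(p, q):
    """Plain matrix product of two lists of rows."""
    inner, cols = len(q), len(q[0])
    return [
        [sum(row[k] * q[k][j] for k in range(inner)) for j in range(cols)]
        for row in p
    ]


def mat_pow(a, t):
    """Compute a**t with O(log t) matrix products.

    The exponent is read bit by bit (its dyadic decomposition): the current
    square a^(2^k) is multiplied into the result whenever bit k of t is set.
    """
    n = len(a)
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    base = [row[:] for row in a]
    while t > 0:
        if t & 1:
            result = mat_mul(result, base)
        base = mat_mul(base, base)
        t >>= 1
    return result


A = [[1.0, 1.0], [0.0, 1.0]]
print(mat_pow(A, 1000))  # about 10 squarings instead of 999 products
```

+ For t = 1000 this needs roughly twenty matrix products instead of 999 , which is where the speed-up over step-by-step iteration comes from . 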
+ Figure 2 reports some examples of FRANK outputs for a simulated network containing 100 TF and 1000 TA . 
+ First , FRANK simulated GRNs display the required network parameters including in and out-scale free properties ( Fig. 2b ) . 
+ It is important to note here that , even if the network is built to comply with deﬁned parameters ( as mentioned above ) , its coefﬁcient ﬁlling is randomized . 
+ Thus , for any raw matrix N built by FRANK , the probability of having a network whose gene expression will be stable across iterations is extremely low . 
+ We however assume that network expression stability is a prerequisite to sustain a viable organism . 
+ We thus implemented an algorithmic correction of the matrix N to have it generate stable gene expression . 
+ This implementation is related to the complex eigenvalues of A ( nth eigenvalue arranged in a decreasing order of moduli is termed λn ) . 
+ More specifically , the location of the eigenvalues with respect to the unit circle in the complex plane is crucial : all the eigenvalues should be inside the closed unit disk to ensure convergence , at least one must be on the unit circle to ensure stability , and more than one must be located on the unit circle if one seeks oscillations and periodicity of the system . 
+ But this stability condition has to be managed within a sparse matrix framework . 
+ We found no specific work that addresses the issue of complex eigenvalue location ( within the unit disk ) for large sparse matrices arising in gene regulatory networks . 
+ Our solution consists of a small perturbation of the ( m-sparse ) matrix constructed earlier with IN and OUT-scale free properties . 
+ First , all coefﬁcients are standardized so that ρ ( A ) = 1 with ρ ( A ) the spectral radius of matrix A . 
+ This does not change sparsity or any initial properties of the matrix . 
+ Then we compute its eigenvalues . 
+ Since A is real , these eigenvalues are either real or occur in complex-conjugate pairs . 
+ We select an integer say p < 10 that accounts for the complexity required in the network . 
+ We pick the 2p conjugate eigenvalues closest to the unit circle ( they are necessarily inside the circle ) and move them vertically ( up for the one with positive imaginary part and down for the other ) until they reach the unit circle . 
+ This operation leads to a new matrix , say , A ′ with A ′ = A + Ep , where Ep depends only on the eigenvectors related to the 2p conjugate eigenvalues considered above and on the small purely imaginary perturbation that projects the eigenvalues onto the unit circle . 
+ The resulting A ′ is still a real matrix but loses its sparsity in a strict mathematical sense . 
+ Switching from A to A ′ leads to new coefﬁcients with very low but non-null values ( see Fig. 3a ) . 
+ We observe a clear gap between these new network connections of low inﬂuence and the coefﬁcients of the original network . 
+ These new connections are likely to be necessary to observe stable oscillatory behavior in gene expression . 
+ This observation is further discussed below for its potential biological consequences ( see Part II ) . 
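+ The correction described above can be sketched as follows . 
+ This is a minimal illustration under stated assumptions , not the FRANK implementation : the function name stabilize is hypothetical , A is assumed to be diagonalizable , and `` moving vertically '' is read as keeping the real part of an eigenvalue and changing only its imaginary part . 

```python
import numpy as np

def stabilize(A, p=1, tol=1e-9):
    """Return (A scaled to spectral radius 1, corrected matrix A')."""
    # Step 1: rescale so that rho(A) = 1; the zero pattern is unchanged.
    A = A / np.max(np.abs(np.linalg.eigvals(A)))
    # Step 2: eigen-decomposition A = V diag(w) V^-1 (assumed to exist).
    w, V = np.linalg.eig(A)
    # Complex eigenvalues strictly inside the disk, upper half-plane only.
    inside_upper = [i for i in range(len(w))
                    if w[i].imag > tol and abs(w[i]) < 1.0 - tol]
    inside_upper.sort(key=lambda i: abs(w[i]), reverse=True)  # closest to circle first
    w_new = w.astype(complex)
    for i in inside_upper[:p]:
        j = int(np.argmin(np.abs(w - np.conj(w[i]))))  # conjugate partner
        height = np.sqrt(1.0 - w[i].real ** 2)         # imaginary part on the circle
        w_new[i] = w[i].real + 1j * height             # moved up
        w_new[j] = w[i].real - 1j * height             # moved down
    # Step 3: rebuild A' = V diag(w_new) V^-1; conjugate symmetry keeps it real.
    A_new = ((V * w_new) @ np.linalg.inv(V)).real
    return A, A_new
```

+ In this sketch the perturbation Ep is simply A_new - A ; it is generally dense , which is consistent with the loss of strict sparsity noted above . 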
+ Learning versus inference . 
+ Over the past 20 years , statistical science has provided several reliable methods for studying gene regulatory networks . 
+ The standard statistical tools used to address this problem are based on the reconstruction of the network using gene expression data . 
+ We mention here that ODE systems , standard in GRN modeling , stem from distinct areas of mathematics and with different goals . 
+ Network reconstruction may be split into two different approaches . 
+ Most of the techniques infer the network : roughly speaking they try to discover all the edges simultaneously from gene expression .26 , 33 Our approach here differs substantially . 
+ Indeed , we do not apply an inference algorithm but rather intend to literally learn the network ( even if the word `` learning '' is now loosely used for inference methods such as LASSO ) . 
+ More speciﬁcally we assume here , that we are given the exact structure of a piece of the network ( prior knowledge ) . 
+ We then train a learning algorithm on the known part and ﬁnally try to predict the unknown part of the network . 
+ The dichotomy between inference and learning that we underline here is important for us not just because it involves different techniques , but also because it opposes interpretability and predictability . 
+ We do not seek an easy-to-understand ( sparse or causal ) model but the best possible network prediction . 
+ The price to pay is the use of black-box methods and the need to calibrate/tune their parameters . 
+ FRANK appears as a useful tool to carry out pure learning procedures . 
+ We now briefly describe four learning algorithms known for their reliability ( LASSO , decision trees , deep neural networks ( NN ) , support vector machine ( SVM ) ) and explain why we ultimately selected the SVM . 
+ Four benchmark methods for gene network inference and learning : reasons for selecting SVM . 
+ The LASSO is a penalized mean square program with l1 penalty ( see the historical reference ) .34 The program reads $\min_{\beta} \sum_{i=1}^{n} ( y_i - \beta^{T} X_i )^{2} + \gamma | \beta |_{1}$ , where γ is the regularization parameter . 
+ The LASSO consequently estimates a linear regression model with an additional constraint on the l1 norm of the slope vector . 
+ It has well-known thresholding properties : the selected slope parameter usually features several zero coefﬁcients . 
+ In other words when the tuning parameter γ is chosen to be large enough , β may have a large number of null coordinates . 
+ The LASSO is very well suited to the sparse data encountered in gene regulatory networks and may be computed with low-complexity algorithms , but it has some drawbacks . 
+ Indeed , an underlying model is assumed and this model is linear . 
+ The LASSO estimate , although easy to interpret , has poor prediction power and is suited for network inference , whereas , as explained earlier , we are fundamentally following a learning approach ( see ref. 35 for more information about the LASSO ) . 
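+ To make the thresholding property concrete , here is a minimal sketch of the LASSO program solved by iterative soft-thresholding ( ISTA ) . 
+ The solver choice and the name lasso_ista are assumptions made for illustration ; the text does not state which algorithm was used . 

```python
import numpy as np

def lasso_ista(X, y, gamma, n_iter=500):
    """Minimize sum_i (y_i - beta^T X_i)^2 + gamma * |beta|_1 (sketch)."""
    L = 2.0 * np.linalg.norm(X, 2) ** 2        # Lipschitz constant of the gradient
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        grad = 2.0 * X.T @ (X @ beta - y)      # gradient of the squared loss
        z = beta - grad / L                    # gradient step
        # Soft-thresholding: coordinates with |z| <= gamma / L become exactly 0.
        beta = np.sign(z) * np.maximum(np.abs(z) - gamma / L, 0.0)
    return beta
```

+ The soft-thresholding line is what sets coordinates of β exactly to zero , and a larger γ widens the threshold and therefore yields more null coordinates . 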
+ Classiﬁcation trees and random forests are other robust methods . 
+ Briefly speaking , a classification tree is a tree where each node provides a combination of input variables and each leaf is associated with a class of the output variable ( here y ) . 
+ At each step , for each node , an input variable is selected according to its ability to split the sample in the best possible way . 
+ This ability is measured by quantitative criteria such as Gini impurity , entropy or variance reduction . 
+ The inherent complexity of the resulting trees is balanced by pruning the tree . 
+ Pruning is usually carried out by examining the cross-validation error . 
+ Several extensions to decision trees were proposed in order to improve their performance : bagging , boosting , and random forests are the most popular .36 Unlike the LASSO , classification trees and their extensions by ensemble methods were designed for learning and for use in a prediction approach . 
+ However , in our framework , the sparsity ( ref. 37 ) of the original data was definitely a stumbling block and we could not run decision trees or random forests correctly on data with sparsity levels as observed in biological networks . 
+ Indeed , these approaches ran for several days on our servers without producing any interesting results . 
+ NN are more and more popular since they proved their efﬁciency in image analysis . 
+ They consist in building a sequence of nonlinear processing steps ( each element of this sequence is called a layer ) to detect informative features in the data . 
+ The layer N processes the features computed at stage N − 1 and is expected to reﬁne them . 
+ The ﬁnal result may be viewed as a hierarchy of representation for the data and may be carried out either for clustering or classiﬁcation purposes . 
+ In our framework we tested several architectures of deep ( with fewer neurons ) or non-deep ( with many neurons ) NN . 
+ We also carried out two classical strategies for pretraining : stacked denoising autoencoders and restricted Boltzmann machines . 
+ Our experience shows that deep networks do not outperform single-layer networks whenever a sufficient number of neurons is involved . 
+ Besides it is not clear to us that NN are the best tool to cope with the sparse information structure of GRN . 
+ Another issue arises . 
+ Indeed when supervised learning is carried out , these NN need large amounts of data . 
+ Since here we intend to address the question of reconstructing the network from a minimal prior knowledge , NN were outperformed by the fourth and last method presented below . 
+ Finally , we introduce SVM in slightly more detail than the three previous methods . 
+ SVM are another popular method for classiﬁcation by machine learning .38 , 39 Consider a two class problem and suppose that we are given a training dataset { ( X1 , y1 ) , ( X2 , y2 ) , ... , ( Xn , yn ) } where Xi is a vector in Rp and yi is either − 1 or +1 depending on the class Xi belongs to . 
+ We can state ﬁrst the mathematical setting in the simplest framework . 
+ Imagine that the training dataset may be perfectly separated by a hyperplane ( Sup. Fig. 1 [ SvmPlot.pdf ] ) , the sample depending on two variables x1 and x2 . 
+ For red triangle points the class is ( y = − 1 ) and for blue circles the class is ( y = +1 ) . 
+ Here the SVM computes the equation of the straight line that splits the two groups in an optimal way . 
+ Optimal here means that the corridor ( dotted lines ) around the straight line is the largest possible . 
+ The three ﬁlled points are called `` support points '' because the computation of the optimal hyperplane depends only on the points located on the edge of their groups . 
+ We can now write the SVM program with mathematical symbols : $\min_{b_0 , b_1} \| b_1 \|$ subject to $y_i ( b_1^{T} X_i + b_0 ) \geq 1$ , where $y = b_1^{T} X + b_0$ is the hyperplane equation . 
+ It can then be shown that $1 / \| b_1 \|$ is proportional to the `` corridor width '' . 
+ The constraints in the program above simply read `` points such that y = +1 are on one side of the hyperplane and points such that y = − 1 are on the other side '' . 
+ The description of the classification problem above is very specific for at least three reasons . 
+ First , we assumed that two groups are strictly separated which is not true in general . 
+ Second , we take it for granted that both groups may be linearly separated . 
+ It is not hard to think of situations where the frontier between the two groups may be a quadratic or exponential function or an equation of another kind . 
+ Third , if we turn back to the gene network problem , we should consider three possible values for y , namely 0 ( no edge ) , +1 ( activation ) and − 1 ( inhibition ) . 
+ When the groups are not strictly separated -- which means that some points of , say group ( y = − 1 ) are mixed with points of group ( y = +1 ) -- the program above may be adapted by relaxing the constraint . 
+ It suffices to replace $y_i ( b_1^{T} X_i + b_0 ) \geq 1$ by $y_i ( b_1^{T} X_i + b_0 ) \geq 1 - c_i$ , where $c_i$ stands for the gap between the current point and its group 's margin .40 
+ The non-linear generalization of SVM is surprisingly not that intricate , and it essentially relies on the use of speciﬁc kernels and on Reproducing Kernel Hilbert space ( RKHS ) theory . 
+ Let K ( . , . ) be a positive kernel defined on the design space ( here Rp ) and denote by H the RKHS associated with K . 
+ The general ( and abstract ) SVM program is given below . 
+ $\min_{b_0 , f \in H} \| f \|_{H}^{2} + \theta \sum_{i=1}^{n} c_i$ subject to $y_i ( f ( X_i ) + b_0 ) \geq 1 - c_i$ , $c_i > 0$ [ SVM ] 
+ Above , all the $c_i$ and θ are positive , the latter being a tuning parameter . 
+ Given a new design point x , the decision rule is $y ( x ) = \mathrm{sgn} ( f ( x ) + b_0 )$ , where f is a solution of the program above and may always be written in the form $f ( x ) = \sum_{i \in S} a_i y_i K ( x , x_i )$ , where S denotes the set of active points ( i.e. , the points that match the constraints in [ SVM ] ) . 
+ The SVM program comes down to estimating the coefﬁcients ai above subject to the dual program of [ SVM ] . 
+ Although the choice of the kernel is rarely crucial , we tested the Gaussian ( RBF ) kernel against several other kernel types : polynomial of order 1 and 2 , Bessel , etc . 
+ We kept the Gaussian kernel in all our work because it involves a single bandwidth ( or variance ) parameter . 
+ This bandwidth is selected by a cross validation approach .41 The other tuning parameter is the Lagrange multiplier interpreted as the cost for constraint violation . 
+ It was set to 1 in accordance with strategies often carried out with SVM . 
+ Finally , the connection with multi-class SVM , that is when y takes more than two values , is achieved by a specific algorithm that reduces this issue to multiple binary problems .42 
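+ The kernel decision rule can be illustrated with a short sketch . 
+ The names rbf_kernel and svm_decision are hypothetical , the Gaussian kernel is written in one common parametrization , exp ( −||x − xi||² / ( 2 σ² ) ) , and the coefficients ai and b0 are assumed to have been obtained beforehand from the dual program . 

```python
import numpy as np

def rbf_kernel(x, xi, bandwidth):
    """Gaussian kernel with a single bandwidth parameter (assumed form)."""
    return np.exp(-np.sum((x - xi) ** 2) / (2.0 * bandwidth ** 2))

def svm_decision(x, support_X, support_y, a, b0, bandwidth=1.0):
    """Return sgn(f(x) + b0) with f(x) = sum_{i in S} a_i y_i K(x, x_i)."""
    f = sum(a_i * y_i * rbf_kernel(x, x_i, bandwidth)
            for a_i, y_i, x_i in zip(a, support_y, support_X))
    return int(np.sign(f + b0))
```

+ Only the active points enter the sum , which is why the fitted classifier depends on a small subset of the training sample . 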
+ Learning on FRANK generated data . 
+ After choosing a proper model and selecting the best method ( SVM ) the last step in our methodology consists in evaluating the learning process . 
+ To that aim we consider here essentially two scores that have a biological meaning and are insightful and classical for practitioners : the percentage of true positive ( non-null edge detected as non-null including the direction of the regulation [ positive or negative ] ) , and the percentage of false positive ( null edges detected as positive or negative ) ( see Part II ) . 
+ A trick for data selection . 
+ In our early investigations we faced a problem stemming from the sparse structure of the data . 
+ The learning algorithms mentioned above were not designed to cope with sparse data . 
+ The output may take the values − 1 , 0 and 1 , but we observed that , whatever the method at work , the predictions are essentially null . 
+ As a consequence we often faced the issue of constant ( null ) predicted values , that is , a very low rate of true positives . 
+ Improving the ability of the methods to detect edges was a challenge . 
+ We introduced a trick , inspired by boosting , that artificially increases the proportion of non-null outputs ( edges ) in the learning sample . 
+ - Simulate an n-sample of data and collect the output values ( y1 , y2 , ... , yn ) , 
+ - Keep those yi 's that are − 1 or + 1 , and denote by n≠ the cardinality of this set of output values $V = \{ y_1 , y_2 , \ldots , y_{n_{\neq}} \}$ , 
+ - Select at random n≠ of the n − n≠ remaining null outputs , denoted $V^{0} = \{ y_1^{0} , y_2^{0} , \ldots , y_{n_{\neq}}^{0} \}$ . 
+ The steps above tend to remove the sparsity in the learning data at the expense of a serious decrease of the sample size . 
+ Clearly the proportion of 50 % zeros/50 % non-zeros may be tuned . 
+ This somewhat unusual though pragmatic change turns out to enhance predictions and meet the goals of increasing the true positive rate ( Sup Fig. 2 ) . 
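+ The selection trick can be sketched as follows . 
+ The function name balance_sample and the ratio argument are hypothetical ; ratio = 1 corresponds to the 50 % zeros/50 % non-zeros proportion , and other values tune it . 

```python
import numpy as np

def balance_sample(X, y, ratio=1.0, seed=0):
    """Keep all non-null outputs and a random subset of the null ones."""
    rng = np.random.default_rng(seed)
    nonzero = np.flatnonzero(y != 0)           # outputs equal to -1 or +1
    zero = np.flatnonzero(y == 0)              # null outputs
    n_zero = min(len(zero), int(round(ratio * len(nonzero))))
    keep_zero = rng.choice(zero, size=n_zero, replace=False)
    idx = np.concatenate([nonzero, keep_zero])
    rng.shuffle(idx)                           # avoid ordering effects
    return X[idx], y[idx]
```

+ With 5 non-null outputs among 100 , the balanced sample contains only 10 observations , which illustrates the decrease in sample size mentioned above . 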
+ Part II: Biological insights using FRANK and SVM
+ Benchmarking the role and the characteristics of prior knowledge and transcriptomic data to improve supervised machine learning of GRNs . 
+ In Part I , we deﬁned FRANK ( Fig. 1 ) as a rapid and effective large network simulator . 
+ FRANK provides ( i ) network modules having the characteristics that are observed in eukaryotic systems ( Figs. 1 and 2 ) and ( ii ) simulated gene expression in a large number of conditions ( Fig. 2 ) . 
+ The challenge then was to learn the network module using SVM applied to : ( i ) simulated transcriptomic data generated by the network and ( ii ) a set of given connections of the network ( prior knowledge , called alpha in the following Figures ) . 
+ Indeed , as deﬁned in the introduction , many techniques are now available to actually experimentally probe GRNs ( eY1H , ChIP-Seq , TARGET , DAP-seq ... ) that may be used to improve our GRN learning capacities . 
+ The particularity of our approach was not to prove that prior knowledge is important ( it has already been demonstrated ) ,24 but rather to study the characteristics of this prior knowledge as well as the characteristics of the transcriptomic data . 
+ We develop this point in following sections . 
+ For each section we evaluate the accuracy of SVM learning according to its capacity to uncover connections and their directions ( positive or negative corresponding to positive or negative sign of the coefﬁcient in the network N ) . 
+ In this work , we only evaluate if a particular TF controls a particular gene , which is so far what is needed in biology . 
+ Mathematically , this corresponds to reconstructing the support of the network N plus the direction of the edges and not the coefﬁcient value per se . 
+ This evaluation is based on two metrics : the % of true positives and the % of false positives ( see for example Fig. 4a ) . 
+ The ﬁrst one corresponds to the percentage of the correct non-null predicted edges . 
+ The second corresponds to the percentage of non-null predicted edges that are actually equal to 0 . 
+ These two values are each displayed throughout the manuscript as a surface relating them to the number of experimental data points ( i.e. , simulated transcriptomic experiments ) and to the amount of prior knowledge , expressed in % of the network N ( Fig. 4a ) . 
+ When discrete changes are evaluated such as in Figs 5 -- 10 , an additional metric is computed which corresponds to the volume under the surface ( VUS ) which is conceptually close to the popular area under the curve ( basically , a two-dimensional extension ) . 
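+ The two scores and the VUS can be sketched as follows . 
+ The denominators are assumptions on our reading of the definitions above : true positives are counted relative to the true non-null edges ( sign included ) and false positives relative to the predicted non-null edges ; the VUS is computed here with a plain trapezoidal rule and no normalization . 
+ The function names are hypothetical . 

```python
import numpy as np

def edge_scores(N_true, N_pred):
    """Return (% true positives, % false positives) from signed matrices."""
    t, p = np.sign(N_true), np.sign(N_pred)
    true_edges = t != 0
    pred_edges = p != 0
    # Correct non-null edges, direction (sign) included.
    tp = 100.0 * np.sum(true_edges & (p == t)) / max(np.sum(true_edges), 1)
    # Predicted non-null edges that are actually equal to 0.
    fp = 100.0 * np.sum(pred_edges & ~true_edges) / max(np.sum(pred_edges), 1)
    return tp, fp

def volume_under_surface(Z, x, y):
    """Trapezoidal volume under Z sampled on grid x (rows) by y (columns)."""
    inner = np.sum((Z[:, :-1] + Z[:, 1:]) / 2.0 * np.diff(y), axis=1)
    return np.sum((inner[:-1] + inner[1:]) / 2.0 * np.diff(x))
```

+ Here x would hold the numbers of simulated experiments and y the levels of prior knowledge , so that one score surface is summarized by a single number . 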
+ Oscillatory phenomena at a whole network scale are predicted to require a decrease in network sparsity and widespread inﬂuence of TFs genome wide . 
+ Before starting with the machine learning procedure , we ﬁrst emphasize a discovery made when developing FRANK itself . 
+ Indeed , Fig. 3a presents the relationship between the coefficients of the network matrix ( N ) before and after the eigenvalue correction that produces gene expression stability in an oscillatory mode ( Fig. 3b ) . 
+ It is striking to observe that the overall correction does not dramatically affect the coefficient values of N . 
+ Indeed , we observed a very low dispersion around the diagonal . 
+ However , it is interesting to note that the correction leading to oscillation creates new connections of very low influence that seem to be needed to maintain the stable oscillatory behavior . 
+ Furthermore , these new connections of low influence can still be distinguished from the pre-existing connections ( before eigenvalue correction ) . 
+ Mathematically , as mentioned above , this means that , strictly speaking , the high degree of sparsity of the network N seems to be incompatible with the generation of oscillatory gene expression at the whole genome level . 
+ Biologically , this would mean that TFs involved in the oscillatory modules , controlling large networks ( up to 80 % of the genome display oscillations in plants for instance ) ,9 should display a high degree of connectivity but with potentially very low inﬂuence on many genes . 
+ This prediction may ﬁnd some echoes in the recent experimental ChIP-seq investigations of the central regulatory TF controlling circadian oscillations in plants , named CCA1 . 
+ Indeed , CCA1 ChIP-seq results demonstrated that this TF is bound to more than 1000 genomic regions representing approximately 4 -- 5 % of the genes in the genome .9 Our model may explain to some extent such an important widespread inﬂuence of a key oscillator . 
+ This may experimentally reveal mathematical constraints on overall network structure to reach stable oscillations . 
+ TA-oriented prior knowledge is predicted to be superior for supervising SVM machine learning procedures to learn GRNs . 
+ To begin with , we asked ( i ) whether prior knowledge is likely to improve GRN learning and ( ii ) what kind of prior knowledge is the most appropriate to supervise SVM . 
+ To do this , we simulated networks containing 100 TFs and 1000 TAs . 
+ For each network gene expression was simulated following a `` multistart '' logic ( fully explained and studied below for its effect on learning ) . 
+ Surfaces points will always improve learning when this will be applied to real datasets and that mixed ( TA and TF-oriented ) prior knowledge will ﬁnally be used to decipher GRNs . 
+ But if one needs to invest in having more data to supervise GRN machine learning , our results show that it might be more helpful to carry out TA oriented techniques . 
+ The ﬁrst steps in gene expression preceding stable regime contain the information needed to learn GRNs . 
+ In this part , we also evaluated the kind of gene expression that contains the most information to best learn GRNs using prior knowledge . 
+ Indeed , FRANK uses an iterative process to generate gene expression data ( See part I , Fig. 1 ) . 
+ Here , this iterative process starts with the randomization of the gene expression E0 at step 0 . 
+ Then the model is applied once to reach E1 ( expression of the genome at step 1 ) . 
+ Here , we can distinguish between two ways of simulating genome expression . 
+ The first one is named `` multistart '' ( close to Monte Carlo in spirit ) , where the above process is repeated n times by drawing a new E0 each time . 
+ The second one is named `` dynamic '' , where E1 is used to reach E2 and so on and so forth up to En ( this progression is the one being displayed in Figs. 2c , d , 3b ) . 
+ These two concepts will be used right below . 
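+ The difference between the two simulation modes can be sketched as follows . 
+ The update rule used here , a plain linear map E → N E , is a deliberately simplified assumption standing in for the FRANK formula of Fig. 1c , and the function names are hypothetical . 

```python
import numpy as np

def step(N, E):
    """One iteration of the model (simplified linear update, assumed)."""
    return N @ E

def multistart(N, n_steps, n_runs, seed=0):
    """Draw a new random E0 for each run and keep only E_{n_steps}."""
    rng = np.random.default_rng(seed)
    out = []
    for _ in range(n_runs):
        E = rng.random(N.shape[0])          # fresh random E0
        for _ in range(n_steps):
            E = step(N, E)
        out.append(E)
    return np.array(out)                    # shape (n_runs, n_genes)

def dynamic(N, n_steps, seed=0):
    """Draw a single E0 and keep the whole trajectory E1, ..., En."""
    rng = np.random.default_rng(seed)
    E = rng.random(N.shape[0])
    trajectory = []
    for _ in range(n_steps):
        E = step(N, E)
        trajectory.append(E)
    return np.array(trajectory)             # shape (n_steps, n_genes)
```

+ In the multistart mode every row comes from an independent perturbation , whereas in the dynamic mode successive rows are consecutive states of the same perturbation . 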
+ To understand what are the characteristics of the transcriptome that may contain the most information for GRN learning , we decided to build gene expression datasets containing values of multistart process reaching the step En ( Fig. 6 ) . 
+ For instance for n = 10 the experiments provided to the SVM learning are a compilation of E10 genome expression for many different E0 . 
+ What has been observed here is that the smallest increments in gene expression are the most useful to learn the network . 
+ Indeed , Fig. 6a shows that the learning capacity of a supervised SVM is clearly more efﬁcient if one uses E1 instead of E10 . 
+ This is clearly exempliﬁed by the VUS progression in response to increasing n values presented Fig. 6b . 
+ Indeed , we recorded a very marked decrease in the SVM capacity to learn the GRN from n = 1 to n = 10 , which results from both a decrease in true positives and an increase in false positives ( Fig. 6b ) . 
+ This means that the most useful information in the transcriptomic dataset for GRNs learning lies in the fast response following gene expression perturbation . 
+ When the gene expression reaches its steady state it would be very difﬁcult to learn the underlying GRNs . 
+ This is a pretty intuitive result but we believe that our approach provides a clear measurement and simulation of such phenomenon . 
+ Starting from this observation , we wanted to simulate gene expression that more closely resembles the dynamics provided by real transcriptomic datasets . 
+ In other words , it is quite unusual to perturb cellular networks and harvest a particular time point many times ( such as in Fig. 6 ) . 
+ Actually biologists usually perform kinetics .44 -- 47 This means that perturbation is applied once and then samples are harvested across time . 
+ This is what we wanted to explore next . 
+ Thus , in Fig. 6 we performed dynamics of different sizes and measured which is the most useful to train supervised SVM . 
+ Again we observe that dynamics can be used to solve the network with a pretty good accuracy ( Fig. 6 ) in particular with shorter dynamics ( Fig. 6b ) . 
+ Interestingly , when a dynamic including the ﬁrst 16 iteration steps ( corresponding to the stabilization regime in Fig. 2c , d ) is used , the supervised SVM still performs with a good accuracy ( Figs. 6a and b ) . 
+ This result ( i ) opens very interesting perspectives concerning the applicability of this supervised learning on real datasets and ( ii ) provides a good entry point to the relationship between the mathematical iteration and the real-life time scale ( see Discussion ) , which has been discussed and studied by others .44 -- 47 
+ TF/TA ratio matters in supervised GRN learning . 
+ The network named N is virtually built of two sub networks named A and B ( Fig. 1 ) . 
+ A contains all the TF to TF relationships , whereas B contains all the TF to TA relationships . 
+ Thus , one of our preconceptions of the system at the beginning of this study was that A is likely to process the information when B is only receiving information from A ( Fig. 1 ) . 
+ Thus , according to this , in a ﬁrst instance , one could imagine that solving or learning A would be enough to understand the whole network N . 
+ We actually found that , when supervised learning is applied , this idea is wrong . 
+ To evaluate the role of TA genes in the machine learning process , we decided to generate several FRANK networks having variable TF/TA ratios by ( i ) keeping the number of TF constant ( 100 ) and increasing the number of TA ( Fig. 7a ) , ( ii ) keeping the number of genes constant ( 1100 ) and varying the TF and TA number ( Fig. 7b ) . 
+ In both cases , we observed that by increasing the TF/TA ratio the GRN learning efﬁciency decreases ( Figs. 7a , b ) . 
+ Very strikingly , we even reached a very peculiar point where the learning is nearly perfect ( Fig. 7c ) with a network having 25 TF and 1075 TA . 
+ It is perfect in the sense that , with a relatively low number of experiments ( ~ 2000 ) and a relatively low level of prior knowledge ( 0.3 ) , SVM reaches nearly 80 % of true positives and produces no false positive connections . 
+ We are perfectly aware that this situation is far from being what is found in real networks . 
+ However , we believe that this peculiar point is very informative concerning the potential of supervised SVM learning when applied to sub-networks . 
+ Furthermore , this demonstrates that TA information is very important to reconstruct the whole network . 
+ The explanation likely lies in what we have developed above concerning TA-oriented prior knowledge ( Fig. 4b ) . 
+ Supervised machine learning algorithms are predicted to be robust to prior-knowledge errors . 
+ In the previous parts of this work , we established that prior knowledge , in particular TA-oriented prior knowledge , is key to supervised SVM and may radically help to reconstruct GRNs in the near future . 
+ Nevertheless , in the previous simulations , all the prior knowledge that we used to supervise the learning processes contained 100 % of true connections . 
+ However , in wet lab experimental conditions , it is well known that the results of ChIP-seq , Y1H , or DAP-seq are likely to contain false positive or false negative results . 
+ We thus wanted to test how resilient ( robust ) the supervised learning might be if errors were introduced in the prior knowledge . 
+ To do so , we simulated three types of errors ( Fig. 8 ) . 
+ Type I errors create a certain number of false connections in the prior knowledge , having positive or negative inﬂuences . 
+ Type II errors remove a certain percentage of the actual connections in the prior knowledge . 
+ Type III errors change the directions of a certain percentage of the connections in the prior-knowledge . 
+ We thus tested the capacity of SVM to learn the actual network even though the prior knowledge was changed and noisy . 
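+ The three error types can be sketched as follows . 
+ This is an illustration under stated assumptions : the prior knowledge is represented as a matrix P with entries in { − 1 , 0 , +1 } , the error rate is expressed relative to the number of true connections in P for all three types , and the function name corrupt_prior is hypothetical . 

```python
import numpy as np

def corrupt_prior(P, rate, kind, seed=0):
    """Return a noisy copy of the signed prior-knowledge matrix P."""
    rng = np.random.default_rng(seed)
    Q = P.copy()
    q = Q.reshape(-1)                         # flat view on the copy
    edges = np.flatnonzero(P != 0)            # true connections
    blanks = np.flatnonzero(P == 0)           # absent connections
    k = int(round(rate * len(edges)))
    if kind == "I":                           # add false connections, random sign
        pick = rng.choice(blanks, size=min(k, len(blanks)), replace=False)
        q[pick] = rng.choice([-1, 1], size=len(pick))
    elif kind == "II":                        # remove true connections
        pick = rng.choice(edges, size=k, replace=False)
        q[pick] = 0
    elif kind == "III":                       # flip the direction (sign)
        pick = rng.choice(edges, size=k, replace=False)
        q[pick] = -q[pick]
    else:
        raise ValueError("kind must be 'I', 'II' or 'III'")
    return Q
```

+ Because false connections of type I are drawn among the many null entries of a sparse matrix , they are spread thinly over the network . 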
+ Very interestingly , we observed that supervised SVM are resilient to any kind of error up to 10 % . 
+ Furthermore , we observed that supervised SVM are particularly resilient to type I errors as compared to type II and type III ( Fig. 8a ) . 
+ This can be ﬁrst explained by the sparsity of the network . 
+ Indeed , biological networks seem dense ( Fig. 2a ) but their actual connections represent a very small portion of all the possible connections between the nodes . 
+ Hence , the network ( N ) is mathematically sparse ( it has many zeros ) . 
+ Thus , when we introduce 20 % of errors for instance , they still represent a small proportion of the real connections . 
+ Furthermore , because the errors that we make are sampled randomly the probability for having the same error reproduced for two different genes is quite low . 
+ Thus , by the same principle explained above ( Fig. 4b ) , it is very likely that the SVM is detecting the artiﬁcially introduced errors . 
+ Network modularity does not impact learning capacities of supervised SVMs . 
+ In the preceding parts of this work , we focused on the learning of network modules having homogenous properties ( Fig. 2a ) . 
+ However , biological networks are expected to be modular .48 , 49 A module is , by definition , a discrete entity whose function is separable from that of other modules but which is likely to receive signals from them . 
+ We thus ﬁrst wanted to implement FRANK simulation towards the production of modular networks . 
+ We also wanted to evaluate the effect of modular networks on the machine learning capacities of supervised SVMs . 
+ To build a modular network we combined 2 FRANK stable networks ( one being plain stable and the other one being oscillating ) ( Fig. 9a , b , see FRANK manual for details , Sup. File 1 ) . 
+ The two modules are linked together by their hubs ( most connected TFs in each module ) following the general observation of hierarchical networks .49 We were indeed able to connect two network modules displaying two different intrinsic behaviors in their gene expression though connected via some TFs ( Fig. 10b ) . 
+ We then evaluated the learning capacities of SVM on this modular network . 
+ We found that network modularity does not impact learning capacities of SVM ( Fig. 10c ) . 
+ We further applied this logic for an increasing number of modules ( number of TF and TA being constant ) . 
+ We again found that network modularity does not impact learning capacities of SVM for a modularity being higher than two ( Fig. 10d ) . 
+ Indeed , we observed that learning on row-oriented prior knowledge performed as efficiently on non-modular ( Fig. 4a ) as on modular networks ( Fig. 10c ) . 
+ In both cases ( modular and non-modular ) column-oriented prior knowledge dramatically decreased the learning capacities of SVM . 
+ This demonstrates that modularity ( i ) does not change the conclusions drawn concerning the structure of the prior knowledge and its influence on learning ; ( ii ) may not be a limitation to supervised learning procedures applied to real datasets , as the results in Fig. 5 may confirm . 
+ DISCUSSION
+ In his famous essay published almost 40 years ago50 Francois Jacob describes evolution as a tinkering rather than an engineering process . 
+ We embrace this vision . 
+ Thus GRNs that we are now observing in nature are intrinsically the outcome of an iterative try-and-selection/Darwinian process . 
+ However , reverse engineering procedures can also be of great interest to understand the possibilities offered by nature to design biological objects . 
+ This is in line with the famous Richard Feynman sentence : `` What I cannot create , I do not understand '' . 
+ We believe that the creation of FRANK sheds some light on probable design principles of GRNs . 
+ Indeed , herein we employed reverse engineering to delineate the potential properties of big ( containing thousands of genes and tens of thousands of connections ) GRNs . 
+ Doing so we uncovered an interesting feature concerning large network sparsity and stability . 
+ Indeed , we have observed that to obtain a stable network displaying oscillatory behaviors , we need to force at least two eigenvalues to be on the unit circle . 
+ This observation is not a novelty in the mathematical ﬁeld of dynamical systems , however it is to our knowledge the ﬁrst time that this concept is related to large gene network modeling . 
+ This observation also points towards an important potential inverse relationship between sparsity and oscillatory stability in gene expression ( Fig. 3 ) . 
+ This is a prediction of our models and it needs to be experimentally observed . 
+ This would mean that TFs are likely to have many subtle inﬂuences on a large portion of the genome , and that inﬂuence is important to maintain gene expression oscillatory stability . 
+ Some experimental observations in nature are in accordance with this fact ( see above ) .9 Another important issue that arises from our studies is the relationship between the iterative process that we modeled and the real time scale in cell biology . 
+ Indeed , when we observe individual gene behavior during the stabilization phase of gene expression ( Figs. 2c , d ) it resembles very much what we can observe during transcriptomic analysis following treatments : genes are regulated and sometimes display an overshoot which then reaches a stable state . 
+ In nature this overshoot in gene expression can be observed within ~ 20 min with a stability phase happening within a couple of hours .44 This can vary according to the biological model , the perturbation of the network , and the GRN studied . 
+ This means that E20 ( Figs. 2c , d ) is likely to correspond to hours of treatment . 
+ Consequently we evaluate the simulated step in our iteration process to be equivalent to ~ 5 -- 6 min of cell response ( 120 min/20 steps ) . 
+ But this constitutes a rough calibration that deserves further work . 
+ When it comes to studying our capacity to learn the GRN by using supervised SVM , we wanted to draw some general conclusions that we believe can help to design future machine learning procedures on real datasets . 
+ We found that ( i ) short dynamics are more powerful to teach machine learning algorithms than longer ones ( Figs. 6 , 7 ) , ( ii ) prior knowledge greatly helps SVM to deﬁne real underlying GRNs even if it contains errors ( Figs. 1 -- 10 ) , ( iii ) prior knowledge is far more efﬁcient when it represents Target-oriented results ( Figs. 4 and 5 , such as Y1H for instance ) , ( iv ) studying the whole network connectivity in particular by studying a lot of passive genes ( TA ) is important for the learning process ( Fig. 8 ) . 
+ From these observations , even if the cell system is over-simpliﬁed in our model , we believe that by using this approach to teach the machines the true connections in the network , we will be able to accurately reconstruct GRNs in the near future . 
+ We have also found that prior knowledge not only greatly improves detection of true positive connections , but also strongly decreases the percentage of false positives . 
+ Take Fig. 4a for instance . 
+ For 35 % of prior knowledge on rows the SVM algorithm is able to reconstruct 40 % of the rest of the network . 
+ This may appear to be quite low . 
+ However , it is very important to note that , after 35 % of prior knowledge the algorithm produces nearly no false positives ( Fig. 4a right surface ) . 
+ This means that out of the 40 % of the network that is reconstructed , nearly all the connections are true ones . 
+ One could therefore use these newly discovered connections as new prior knowledge to further train the SVM in a next cycle of the learning process . 
+ This approach , termed boosting in computer science ,51 will certainly be an important aspect of future GRN learning algorithms . 
+ Concerning limitations , one needs to be aware that FRANK is an over-simpliﬁed version of transcriptional GRNs . 
+ In particular the coefﬁcients in the network are ﬁxed across iterations , whereas in cells they could vary under the inﬂuence of post-transcriptional and post-translational modiﬁcations . 
+ Furthermore , it is important to note that even if the FRANK formalism uses a sort of discrete linear system ( Fig. 1 ) , the TF to TA relationships , XTG = f ( XTF ) in part I , are not linear but polynomial , which does not prevent the relationship from being of sigmoidal form as often observed in nature . 
+ Finally , we would like to draw the attention of the reader to one aspect , which makes our approach reasonable . 
+ On one hand , the FRANK formalism has helped to discover that TA-oriented prior-knowledge is likely to be more informative to train SVM than TF-oriented prior-knowledge ( Fig. 4a ) . 
+ On the other hand , since ( i ) this conclusion holds true for real network learning ( Fig. 5 ) and ( ii ) the explanation of the phenomenon stems from the inherent structure of the FRANK system ( Fig. 4b ) , we conclude that the FRANK formalism brings us a bit further towards the reality of GRNs in cells , despite its obvious limitations . 
+ METHODS
+ The modeling and machine learning procedures ( FRANK ) are fully described in the manual of the algorithm provided online ( Sup File 1 ) . 
+ The model equations are fully described in the Results section Part I . 
+ The learning procedures have been implemented on a DELL server using SVM package ( Kernlab ) on R ( https://www.r-project.org/ ) . 
+ Data availability
+ FRANK software can be used online via a web page ( https://m2sb.org/?page=FRANK ) or scripts will be provided upon request to any of the authors . 
+ ACKNOWLEDGEMENTS
+ The authors wish to thank Dr. Sandrine Ruffel , Dr. Stéphane Mari and Dr. Milos Tanurdzic for their feedback on the manuscript . 
+ This work has been supported by the CNRS PEPS-BMI ( SuperRegNet ) and the Labex Numev to G.K. and A.M. that funded C.C. post-doc . 
+ AUTHOR CONTRIBUTIONS
+ G.K. and A.M. designed the project . 
+ C.C. developed FRANK , performed computer simulations and machine learning . 
+ A.M. , G.K. and C.C. contributed to the study design during the course of the project . 
+ A.M. , G.K. , C.C. wrote the paper . 
+ ADDITIONAL INFORMATION
+ Supplementary Information accompanies the paper on the npj Systems Biology and Applications website ( doi:10.1038/s41540-017-0019-y ) . 
+ Competing interest : The authors declare that they have no competing ﬁnancial interests . 
+ Publisher 's note : Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional afﬁliations .
\ No newline at end of file
--- a/data/TEXT_FILES/notuseful_txt/28791299.txt 0 → 100644
View file @27818a9
+++ b/data/TEXT_FILES/notuseful_txt/28791299.txt 0 → 100644
View file @27818a9
+ Differential MicroRNA Analyses of
+ Increasing evidence that microRNAs ( miRNAs ) play important roles in the immune response against infectious agents suggests that miRNA might be exploitable as signatures of exposure to speciﬁc infectious agents . 
+ In order to identify potential early miRNA biomarkers of bacterial infections , human peripheral blood mononuclear cells ( hPBMCs ) were exposed to two select agents , Burkholderia pseudomallei K96243 and Francisella tularensis SHU S4 , as well as to the nonpathogenic control Escherichia coli DH5α . 
+ RNA samples were harvested at three early time points , 30 , 60 , and 120 minutes postexposure , then sequenced . 
+ RNAseq analyses identiﬁed 87 miRNAs to be differentially expressed ( DE ) in a linear fashion . 
+ Of these , 31 miRNAs were tested using the miScript miRNA qPCR assay . 
+ Through RNAseq identiﬁcation and qPCR validation , we identiﬁed differentially expressed miRNA species that may be involved in the early response to bacterial infections . 
+ Based upon its upregulation at early time points postexposure in two different individuals , hsa-mir-30c-5p is a miRNA species that could be studied further as a potential biomarker for exposure to these gram-negative intracellular pathogens . 
+ Gene ontology functional analyses demonstrated that programmed cell death is the ﬁrst ranking biological process associated with miRNAs that are upregulated in F. tularensis-exposed hPBMCs . 
+ 1. Introduction
+ The National Institute of Allergy and Infectious Diseases ( NIAID ) , the US Department of Agriculture ( USDA ) , and the Centers for Disease Control and Prevention ( CDC ) assess and classify pathogens into different categories based on potential utilization as biological warfare agents ( BWA ) . 
+ For instance , due to the risk of deliberate misuse and potential for mass casualties or devastating effects to the economy , critical infrastructure , or public conﬁdence , the CDC has classiﬁed Francisella tularensis and Burkholderia pseudomallei as Tier 1 select agents . 
+ Generally speaking , the ability to appropriately respond to a biological attack depends ﬁrst on rapid detection of the event . 
+ A number of techniques have previously been developed and evaluated to quickly detect pathogens in the environment as well as in humans suspected to have been exposed . 
+ For instance , nucleic acid-based assays can be used for the detection and identiﬁcation of microorganisms . 
+ Examples of these techniques include real-time polymerase chain reaction ( RT-PCR ) , microbial 16S ribosomal RNA gene sequencing , ampliﬁed-fragment length polymorphism polymerase chain reaction ( AFLP-PCR ) , and , more recently , repetitive element polymerase chain reaction ( REP-PCR ) DNA ﬁngerprinting [ 1 ] . 
+ These assays may require multiplexing in order to distinguish biological warfare agents ( BWA ) from near-neighbor species . 
+ For example , a quadruplex RT-PCR assay is required for the differentiation of Yersinia pestis from Y. pseudotuberculosis [ 2 ] . 
+ In addition to nucleic acid-based assays , other types of assays such as immunological and microbiological methods are available for detection of BWA . 
+ In the case of the prototypic BWA , Bacillus anthracis , numerous detection assays have been explored or employed -- from conventional microbiological methods , immunoassays based on surface antigens and antibodies , and PCR assays based on ampliﬁcation of unique regions of DNA to electrochemiluminescent assays based on ligands such as aptamers and phage display-derived peptides [ 3 ] . 
+ However , there is no single detection or diagnostic assay that would serve to identify individuals who had been exposed to a pathogen within hours of exposure and prior to onset of symptoms . 
+ Such an assay could help to rapidly identify speciﬁc populations or individuals who have been exposed as well as the agent to which those individuals were exposed . 
+ This information could drastically reduce the time to identify an effective treatment and thereby increase positive clinical outcomes . 
+ Currently , host transcription proﬁles are being explored as biomarkers for infectious diseases . 
+ Ex vivo studies have provided much needed insight into the host transcriptional response to a variety of pathogens including viral [ 4 ] , bacterial [ 5 ] , and fungal [ 6 ] infectious agents . 
+ These studies have identiﬁed unique patterns in transcriptomic proﬁles to infectious agents , suggesting that transcriptome biomarkers could be a useful diagnostic tool for infectious diseases including B. pseudomallei [ 7 ] . 
+ Here , we explore how microRNA expression may change in response to exposure to BWA . 
+ MicroRNAs , also known as miRNAs , are highly conserved , 19 -- 22-nucleotide-long , single-stranded , noncoding RNA ( ncRNA ) sequences so far mainly found in eukaryotes and viruses . 
+ miRNA research is a very active area of study , and these ncRNAs have been implicated in a wide range of physiological as well as pathological processes , including inﬂammatory responses , apoptosis , growth , cancer , and neurodegenerative and cardiovascular diseases [ 8 -- 14 ] . 
+ Particularly , there is increasing evidence that miRNAs play an important role in the immune response against infectious agents , including but not limited to , Helicobacter pylori [ 15 ] , Listeria monocytogenes [ 16 ] , Actinobacillus pleuropneumoniae [ 17 ] , and Mycobacterium avium [ 18 ] . 
+ For instance , it is well known that expression of miRNAs such as miR-155 , miR-146 , miR-125 , let-7 , and miR-21 is commonly altered during bacterial infections and contributes to immune responses aimed at protecting the organism against overwhelming inﬂammation [ 19 -- 21 ] . 
+ Despite these ﬁndings , our understanding of expression patterns under normal conditions and the regulatory role of miRNAs following bacterial infections is still very limited . 
+ To investigate the temporal changes of miRNA expression in the host cell following exposure to BWA , three different bacterial strains were used : B. pseudomallei K96243 , F. tularensis SHU S4 , and Escherichia coli DH5α as a `` negative control . '' 
+ While all three are gram-negative , only B. pseudomallei and F. tularensis are intracellular and pathogenic , causing melioidosis and tularemia , respectively , which are lethal if left untreated or improperly treated . 
+ Early time points , 30 , 60 , and 120 minutes postexposure , were investigated using the RNAseq method in order to generate a candidate list of biomarkers for BWA exposure . 
+ Following RNAseq analysis , a subset of miRNA species were chosen for validation of expression proﬁles using a custom qPCR array . 
+ Finally , target genes for differentially expressed miRNAs were functionally characterized using gene ontologies . 
+ We present here our ﬁndings that hsa-miR-30c-5p is a potential biomarker for infection by the gram-negative pathogens B. pseudomallei and F. tularensis and that programmed cell death-related genes were the largest constituent of genes predicted to be regulated by differentially expressed miRNAs at early time points in F. tularensis infection of hPBMCs . 
+ 2. Materials and Methods
+ 2.1 . 
+ Exposure of hPBMCs to E. coli , B. pseudomallei , and F. tularensis . 
+ Human peripheral blood mononuclear cells ( hPBMCs ) from two different donors : ( i ) a 53-year-old Caucasian female ( Lot number 1F3884 ) for the RNAseq experiment and ( ii ) a 38-year-old Caucasian male ( Lot number 2F3400 ) for the qPCR experiment , were obtained from Lonza ( Walkersville , MD , USA ) . 
+ Cells were thawed according to the manufacturer 's instructions , counted , and dispensed to 24-well tissue culture plates at ~ 3 × 10^6 hPBMCs per well . 
+ The cells were grown overnight in LGM-3 growth medium ( Lonza ) at 37 °C in 5 % CO2 . 
+ Bacterial strains B. pseudomallei K96243 and F. tularensis SHU S4 were obtained from BEI Resources ( Manassas , VA , USA ) and E. coli DH5α from Invitrogen ( Carlsbad , California , USA ) . 
+ All strains were stored in single-use aliquots in 20 % glycerol at − 80 °C prior to use . 
+ At the time of use , a single aliquot was thawed and used to inoculate 25 ml of appropriate liquid medium . 
+ B. pseudomallei and E. coli were grown overnight in Luria broth ( LB ) , whereas F. tularensis was grown in tryptic soy broth ( TSB ) supplemented with 0.1 % cysteine for 48 hours . 
+ All cultures were incubated at 37 °C with vigorous shaking . 
+ Prior to exposure , hPBMCs were washed three times with fresh medium [ RPMI 1640 + GlutaMAX with L-glutamine ( 5 ml/500 ml ) , 10 % heat-inactivated Hyclone FBS ( 56 ° for 30 + min , 50 ml / 500 ml ) , and 0.1 % 2-mercaptoethanol ( 0.5 ml/500 ml ) ] . 
+ Using a multiplicity of infection ( MOI ) of approximately one , 3 × 10^6 hPBMCs per duplicate well were exposed to E. coli DH5α ( 4.3 × 10^6 CFU ) , B. pseudomallei K96243 ( 4.8 × 10^6 CFU ) , or F. tularensis SHU S4 ( 4.2 × 10^6 CFU ) . 
+ Following the addition of inoculum , the plates were centrifuged at 200 × g for ﬁve minutes and this was designated as time zero ( t0 ) . 
+ Individual plates were prepared for three exposure intervals ( 30 , 60 , 120 minutes ) , and unexposed hPBMCs were included as a negative control . 
+ The plates were incubated at 37 °C in an atmosphere of 5 % CO2 . 
+ All infections were conducted at biosafety level 3 ( BSL-3 ) . 
+ 2.2 . 
+ RNA Extraction and RNA Sequencing . 
+ Prior to bacterial exposure ( t0 ) and at each experimental time point , one plate was removed from incubation , the medium was pipetted off , and 1 ml of TRIzol ® ( Life Technologies ; Grand Island , NY , USA ) was added to lyse the cells and stabilize the RNA . 
+ Sterility was conﬁrmed according to internal protocols , and RNA was isolated according to the manufacturer 's instructions . 
+ The RNA purity and quality were checked using the Agilent Bioanalyzer RNA chip ( Agilent Technologies , Santa Clara , CA , USA ) . 
+ As per the Illumina TruSeq ™ Small RNA protocol , adapter ligation , reverse transcription , PCR ampliﬁcation , and pooled gel puriﬁcation steps were performed to generate a small RNA library product . 
+ The libraries were sequenced on the Illumina MiSeq System using MiSeq Reagent Kit v2 ( 2 × 51 bp read length ) using one full run per sample . 
+ As an internal positive control for alignment calculations and quantiﬁcation efﬁciency , 15 % PhiX Control v3 was spiked in . 
+ 2.3 . 
+ RNAseq Mapping and Identiﬁcation of miRNAs . 
+ Raw sequences from FASTQ ﬁles generated by the MiSeq sequencer were checked for quality using FastQC [ 22 ] and processed using several packages included in Consensus Assessment of Sequence And Variation ( CASAVA v 1.8.2 , Illumina ) ( Figure S1 available online at https://doi.org/10.1155/2017/6489383 ) . 
+ The artiﬁcially introduced RNA 3 ′ adapter ( RA3 ) with oligonucleotide sequence , TGGAATTCTCGGGTGCCAAGGC , from a TruSeq Small RNA Sample Prep Kit ( Illumina ) was removed using trimmer v3.0 . 
+ Posttrimmed reads which were at least 15 base pairs ( bp ) in length were aligned using ELAND ( Efﬁcient Large-Scale Alignment of Nucleotide Databases ) against contaminants . 
+ The contaminants screened were mitochondrial DNA , 5S ribosomal RNA , adapter , poly ( A ) , poly ( C ) , and enterobacteria phage phiX174 sequences from Illumina iGenome , as well as human small nucleolar RNA ( snoRNA ) , long intergenic noncoding RNA ( lincRNA ) , and small nuclear RNA ( snRNA ) from Ensembl release 69 . 
+ The remaining reads which did not match to these contaminants were then aligned to human mature miRNA from the miRNA database [ 23 , 24 ] ( Release 19 ) allowing up to 2 nucleotide ( nt ) mismatches . 
+ These analyses were performed on duplicates of each sample at each time point . 
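+ The trimming , length filter , and mismatch-tolerant matching described above can be illustrated with a minimal sketch . This is not the trimmer v3.0 or ELAND implementation ; the function names , the minimum partial-adapter overlap of 8 nt , and the ungapped equal-length comparison are all assumptions made for illustration . Only the adapter sequence , the 15 bp minimum , and the 2 nt mismatch limit come from the text .

```python
ADAPTER = "TGGAATTCTCGGGTGCCAAGGC"  # RA3 adapter sequence quoted in the text
MIN_LENGTH = 15                     # minimum post-trimming read length (from the text)
MAX_MISMATCHES = 2                  # mismatches allowed against mature miRNAs (from the text)


def trim_adapter(read, adapter=ADAPTER, min_overlap=8):
    """Remove the 3' adapter from a read.

    A full adapter is cut at its first occurrence. If the adapter runs off
    the end of the read, a partial match of at least `min_overlap` bases
    (a hypothetical threshold) at the 3' end is removed instead.
    """
    pos = read.find(adapter)
    if pos != -1:
        return read[:pos]
    for overlap in range(len(adapter) - 1, min_overlap - 1, -1):
        if read.endswith(adapter[:overlap]):
            return read[:-overlap]
    return read


def passes_length_filter(read, min_length=MIN_LENGTH):
    """Keep only reads that are at least `min_length` bases after trimming."""
    return len(read) >= min_length


def mismatches(a, b):
    """Count mismatching positions between two equal-length sequences."""
    return sum(x != y for x, y in zip(a, b))


def matches_mature(read, mature, max_mismatches=MAX_MISMATCHES):
    """Ungapped comparison of a trimmed read with a mature miRNA sequence."""
    return len(read) == len(mature) and mismatches(read, mature) <= max_mismatches


# Illustrative 22-nt insert followed by the full adapter.
insert = "TGAGGTAGTAGGTTGTATAGTT"
trimmed = trim_adapter(insert + ADAPTER)
```

+ A real aligner also handles indexing , quality values , and multiple candidate hits ; the sketch only shows the order of operations ( trim , filter by length , then compare ) .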
+ 2.4 . 
+ Differential Expression ( DE ) Analysis of RNAseq Data . 
+ Replicates were assessed for correlation using the Pearson method . 
+ A time series matrix of raw counts from different time points in ascending order ( t30 , t60 , and t120 ) was created ( Supplementary Data 1 ) , and the miRNA Temporal Analyzer ( mirnaTA ) [ 25 ] was used to identify DE miRNAs . 
+ 2.5 . 
+ qPCR Conﬁrmatory Analysis . 
+ A subset of miRNAs that were identiﬁed as differentially expressed with statistical signiﬁcance ( P < 0.05 ) were chosen for further investigation using a custom miScript PCR assay ( Qiagen , USA ) . 
+ 3 reverse transcription controls ( miRTC ) , 3 positive PCR controls ( PPC ) , 6 housekeeping genes ( SNORD95 , SNORD68 , SNORD96A , SNORD61 , SNORD72 , and RNU6-2 ) , and 5 assay negative controls ( miRNAs which were not detected ( i.e. , zero count ) in the RNAseq reads ) were included in the assay . 
+ Using total RNA as the starting material for cDNA synthesis , qRT-PCR was performed using this custom-formatted miScript SYBR Green PCR System . 
+ Quantitative RT-PCR of miRNA expression was performed in triplicates for each time point for each experiment , and the corresponding threshold cycle ( Ct ) values were collected . 
+ Three housekeeping genes with the lowest standard deviation were selected for normalization [ 26 ] . 
+ The threshold cycle ( Ct ) values from qPCR were analyzed using the RT2 Proﬁler PCR Array data analysis tool ( Qiagen ) to calculate ΔΔCt-based fold changes . 
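+ As a hedged illustration of a ΔΔCt-based fold change , the sketch below normalizes a target Ct to the mean Ct of the selected housekeeping genes and applies the common 2^(-ΔΔCt) formula . It is not the Qiagen tool 's code : the function names and the example Ct values are hypothetical , and the formula assumes roughly 100 % amplification efficiency .

```python
from statistics import mean


def delta_ct(ct_target, ct_housekeeping):
    """Ct of the target minus the mean Ct of the housekeeping genes."""
    return ct_target - mean(ct_housekeeping)


def fold_change(ct_target_exposed, hk_exposed, ct_target_control, hk_control):
    """Fold change of exposed versus control using 2^(-ddCt)."""
    ddct = (delta_ct(ct_target_exposed, hk_exposed)
            - delta_ct(ct_target_control, hk_control))
    return 2.0 ** (-ddct)


# Hypothetical Ct values: three housekeeping genes per condition.
example = fold_change(24.0, [20.0, 20.0, 20.0], 25.0, [20.0, 20.0, 20.0])
```

+ In this made-up example the target crosses the threshold one cycle earlier in exposed cells than in the control , which corresponds to a two-fold increase .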
+ 2.6 . 
+ Correlation between RNAseq and qPCR Analyses . 
+ For each group of bacterial-exposed hPBMCs , correlation coefﬁcients between the RNAseq data and the qRT-PCR data were calculated using the Pearson method . 
+ The expression data of miRNAs with r > 0.95 were visualized as 2D graphs . 
+ PermutMatrix [ 27 ] was also utilized to view expression data using its default parameters : Euclidean distance dissimilarity , McQuitty 's method ( WPGMA ) hierarchical clustering , and multiple-fragment ( MF ) heuristic seriation rule . 
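+ The Pearson correlation and the r > 0.95 selection described in this subsection can be sketched as follows . The helper names and the example profiles are hypothetical and carry no real data from the study ; the sketch only shows the computation itself .

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(ss_x * ss_y)


def highly_correlated(profiles, threshold=0.95):
    """Return names whose (rnaseq, qpcr) profiles have r above the threshold.

    `profiles` maps a miRNA name to a pair of sequences measured at the
    same time points (hypothetical structure).
    """
    return [name for name, (rnaseq, qpcr) in profiles.items()
            if pearson_r(rnaseq, qpcr) > threshold]


# Made-up profiles over three time points (30, 60, 120 min).
profiles = {
    "mirna_A": ([1.0, 2.0, 3.0], [2.1, 4.0, 6.2]),
    "mirna_B": ([1.0, 2.0, 3.0], [3.0, 1.0, 2.0]),
}
selected = highly_correlated(profiles)
```

+ With only three time points per profile , a high r is easy to reach by chance , which is consistent with the study reporting P values alongside r .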
+ 2.7 . 
+ Functional Annotation of the Targets of the DE miRNA Species 
+ 2.7.1 . 
+ Gene Target Prediction Using mirDB . 
+ For each organism , DE miRNAs were categorized as upregulated or downregulated . 
+ Then , the names of miRNAs within each category were submitted to miRDB target mining [ 28 ] to search miRNAs for gene targets . 
+ A relatively stringent parameter setting was used ; the search was restricted to only a collection of 654 functional human miRNAs instead of a total of 2588 human miRNAs available , and gene targets with more than 80 target prediction score , instead of the default score of 60 , were included . 
+ In addition , only miRNAs with fewer than 500 predicted targets in the genome were included instead of those with default 800 targets . 
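+ The two thresholds above ( target prediction score above 80 , fewer than 500 predicted targets per miRNA ) amount to a simple filter . The sketch below applies them to an in-memory dictionary ; the data structure , the function name , and the example entries are hypothetical , and it does not call miRDB . Whether miRDB counts targets before or after score filtering is not stated here , so the sketch counts all listed targets .

```python
def filter_targets(predictions, min_score=80, max_targets=500):
    """Apply the stringency settings described in the text.

    `predictions` maps a miRNA name to a list of (gene, score) pairs.
    miRNAs with `max_targets` or more predicted targets are skipped, and
    only targets with a score strictly above `min_score` are kept.
    """
    kept = {}
    for mirna, targets in predictions.items():
        if len(targets) >= max_targets:
            continue
        strong = [gene for gene, score in targets if score > min_score]
        if strong:
            kept[mirna] = strong
    return kept


# Made-up predictions for illustration only.
predictions = {
    "mirna_A": [("GENE1", 95), ("GENE2", 70), ("GENE3", 81), ("GENE4", 80)],
    "mirna_B": [("G%d" % i, 90) for i in range(500)],
}
filtered = filter_targets(predictions)
```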
+ 2.7.2 . 
+ Gene Ontology Using DAVID . 
+ The gene target predictions obtained from miRDB search performed in the above step were downloaded , and the resulting Entrez gene IDs were submitted to the Database for Annotation , Visualization and Integrated Discovery ( DAVID ) [ 29 ] with medium stringency to Homo sapiens . 
+ Gene ontology results were further analyzed for biological process ( GOTERM_BP_FAT ) , cellular component ( GOTERM_CC_FAT ) , and molecular function ( GOTERM_MF_FAT ) . 
+ The GO terms produced were too many to be plotted on a single graph , and therefore , they were consolidated into more general terms . 
+ For instance , terms such as `` positive regulation of x , '' `` negative regulation of x , '' `` regulation of x , '' and `` x '' were consolidated into one term `` x_related '' ( Supplementary Table S10 and S11 ) . 
+ The number of genes in each GO term category was then counted using an in-house Perl script and visualized on the graph using R utilities . 
+ GO functions with at least ﬁve genes ( 10 genes in E. coli ) falling into a given category were included . 
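+ The consolidation and counting steps above were done by the authors with an in-house Perl script that is not shown . The Python sketch below is only one possible reading of the rule : it strips the prefixes `` positive regulation of '' , `` negative regulation of '' , and `` regulation of '' , appends `` _related '' , and counts genes per consolidated category . Counting each gene once per category and the input format are assumptions .

```python
import re

# Prefixes that are folded into the base term, following the example in the text.
_PREFIX = re.compile(r"^(?:positive |negative )?regulation of ")


def consolidate(term):
    """Map a GO term name to its consolidated 'x_related' category."""
    return _PREFIX.sub("", term.strip().lower()) + "_related"


def count_genes(annotations, min_genes=5):
    """Count distinct genes per consolidated category.

    `annotations` is an iterable of (go_term, gene_id) pairs. Categories
    with fewer than `min_genes` genes are dropped (five in the text, ten
    for E. coli).
    """
    genes_by_category = {}
    for term, gene in annotations:
        genes_by_category.setdefault(consolidate(term), set()).add(gene)
    return {category: len(genes)
            for category, genes in genes_by_category.items()
            if len(genes) >= min_genes}


# Made-up annotations for illustration only.
annotations = [
    ("positive regulation of apoptosis", "G1"),
    ("negative regulation of apoptosis", "G2"),
    ("regulation of apoptosis", "G3"),
    ("apoptosis", "G4"),
    ("apoptosis", "G5"),
    ("apoptosis", "G1"),
    ("transcription", "G6"),
    ("regulation of transcription", "G7"),
]
counts = count_genes(annotations)
```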
+ 3. Results
+ 3.1 . 
+ Quality Control Shows Good Sequencing . 
+ Most transcripts were evaluated to be between 20 and 30 bp , indicating that most of the RNA species sequenced were small RNAs such as miRNAs ( Figure S2 ) . 
+ FastQC analyses showed that nearly all bases had very high-quality scores of > Q30 which is equivalent to base call accuracy of 99.9 % or higher ( Figure S3 ) . 
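+ The equivalence between Q30 and 99.9 % base call accuracy follows from the standard Phred definition , Q = -10 log10 ( P_error ) . A short sketch ( function names are illustrative ) :

```python
def phred_to_error(q):
    """Probability that a base call with Phred quality q is wrong."""
    return 10 ** (-q / 10)


def phred_to_accuracy(q):
    """Base call accuracy implied by Phred quality q."""
    return 1 - phred_to_error(q)


q30_accuracy = phred_to_accuracy(30)
```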
+ 3.2 . 
+ 87 miRNA Species Displayed Linear Differential Expression in RNAseq Data . 
+ Posttrimmed sequencing reads ( ≥ 15 bp ) were aligned to mature miRNAs in the miRBase database with up to two nucleotide mismatches allowed . 
+ Reads from E. coli-exposed hPBMCs matched to 569 mature miRNA species , those from B. pseudomallei-exposed hPBMCs to 649 , and those from F. tularensis-exposed hPBMCs to 526 , respectively . 
+ The Pearson method showed that the duplicates had very high correlation coefﬁcient values , r ( Figure S4 and Table S1 ) . 
+ Normalization and analysis of the raw miRNA counts at each time point were performed using mirnaTA . 
+ A total of 87 human mature miRNA species displayed differential expression with statistical signiﬁcance ( P < 0.05 ) . 
+ Speciﬁcally , in E. coli-exposed hPBMCs , there were 34 signiﬁcant DE miRNAs , of which 62 % displayed downregulation while 38 % displayed upregulation over time ( Figure 1 ( a ) ; Table S2 ) . 
+ In B. pseudomallei-exposed hPBMCs , there were 27 signiﬁcant DE miRNAs , of which 70 % displayed downregulation while 30 % displayed upregulation over time ( Figure 1 ( b ) ; Table S3 ) . 
+ In contrast to E. coli and B. pseudomallei , both of which caused overall decreased expression of a variety of human miRNA species , exposure to F. tularensis resulted in a more balanced change in expression proﬁles , such that out of 26 signiﬁcantly DE miRNAs , 46 % displayed downregulation and 54 % displayed upregulation over time ( Figure 1 ( c ) ; Table S4 ) . 
+ 3.3 . 
+ qPCR Assay Conﬁrms hsa-miR-30c-5p . 
+ In order to conﬁrm the DE patterns observed in the RNAseq data , a customized qRT-PCR array was designed . 
+ Speciﬁcally , 3 , 10 , and 18 miRNAs from E. coli - , B. pseudomallei - , and F. tularensis-exposed hPBMCs , respectively ( Table S5 ) , were chosen for further investigation . 
+ In other words , due to space limitation on the miScript plate , only 36 % of the miRNA species identiﬁed in the RNAseq phase of the study ( 31 out of 87 ) were chosen for conﬁrmation . 
+ The overall objective was to identify candidate miRNA species that could serve as early biomarkers of exposure . 
+ We presumed that a biomarker that increases in expression upon exposure would ultimately make a better target for an assay ; therefore , all the miRNAs displaying increased expression in B. pseudomallei - and F. tularensis-exposed hPBMCs were chosen to be included in the PCR assay . 
+ For qPCR , the hPBMCs from a second donor ( Lot number 2F3400 ) were exposed in triplicate to the three bacterial organisms and harvested at the three time points as performed in RNAseq and RNA was extracted . 
+ For normalization , the three HKGs with the lowest standard deviation , SNORD95 , SNORD68 , and SNORD96A , were selected ( Table S6 ) . 
+ The threshold cycle Ct values ( Supplementary Table S7 ) were converted using log transformation , and the linear regression model was applied to calculate all the P values for each miRNA fold regulation over increasing exposure time ( Figure 2 ) ( Supplementary Table S8 ) . 
+ The expression proﬁle of each particular miRNA species as measured by qPCR was evaluated against its own expression proﬁle in the RNAseq data . 
+ Of the 3 miRNA species in E. coli - or 10 miRNA species in B. pseudomallei-exposed hPBMCs that were found to be DE with statistical signiﬁcance from RNA-seq data , none were validated by the qPCR array as being DE after exposure to those same organisms ( Table 1 ; Figure 3 ) . 
+ This may be explained by the fact that there were two different donors . 
+ In order to have maximal correlation between the RNAseq and qPCR data , hPBMCs from one individual should have been used for both assays . 
+ However , we chose to test using two different individuals since a usable miRNA biomarker would have to work consistently across different genders , ages , ethnicities , and so on . 
+ Regardless , one of the 18 DE miRNA species in the RNAseq data from F. tularensis-exposed hPBMCs , namely , hsa-miR-30c-5p , was conﬁrmed to have the same altered expression proﬁle in qPCR data . 
+ Table 2 summarizes the number of miRNA species which were DE in RNAseq , included in the qPCR array , and conﬁrmed to be DE in both RNAseq and qPCR analyses . 
+ Additionally , 4 miRNA species selected for their consistently increasing expression in F. tularensis-exposed hPBMCs from RNAseq analysis exhibited statistically signiﬁcant differential expression in the qPCR array postexposure to B. pseudomallei . 
+ These 4 miRNAs are , in ascending P value order , hsa-miR-1226-3p ( P = 0.0113 ) , hsa-miR-23b-5p ( P = 0.0136 ) , hsa-let-7d-5p ( P = 0.0175 ) , and hsa-miR-30c-5p ( P = 0.02408 ) . 
+ In E. coli-exposed hPBMCs , two miRNA species ( initially selected for their consistently increasing expression pattern in F. tularensis-exposed hPBMCs ) were found to be DE : hsa-miR-485-5p ( P = 0.0170 ) and hsa-miR-4802-3p ( P = 0.0273 ) . 
+ 3.4 . 
+ A Few miRNA Species Show High Correlation between RNAseq and qPCR Analyses . 
+ In order to examine the overall correlation between RNAseq data and qPCR data collected , RNAseq data were plotted against qRT-PCR data for all the 31 miRNA species included in the miScript array for each set of bacterial-exposed hPBMCs . 
+ Subsequently , the miRNAs with correlation coefﬁcient r > 0.95 were selected and plotted in a separate graph ( Figure 4 ) . 
+ In E. coli-exposed hPBMCs , hsa-miR-4755-5p displayed a similar expression pattern in both RNAseq and qRT-PCR ( r > 0.95 ) . 
+ However , the P value of 0.1973 did not pass a signiﬁcance threshold value of 0.05 ( Figure 4 ( a ) ) . 
+ In B. pseudomallei-exposed hPBMCs , a total of four miRNAs , hsa-miR-3177-3p , hsa-miR-200b-3p , hsa-miR-3667-3p , and hsa-miR-424-3p , displayed similar expression in both RNAseq and qRT-PCR . 
+ Of these , hsa-miR-200b-3p was the only one found to be statistically signiﬁcant with P = 0.0275 ( Figure 4 ( b ) ) . 
+ In F. tularensis-exposed hPBMCs , ﬁve miRNAs were found to have an r > 0.95 , namely , hsa-miR-200b-3p , hsa-miR-548ai , hsa-miR-125b-5p , hsa-let-7d-5p , and hsa-miR-30c-5p . 
+ Of these , three miRNA species hsa-miR-200b-3p ( P = 0.0160 ) , hsa-miR-548ai ( P = 0.0489 ) , and hsa-miR-30c-5p ( P = 0.0274 ) met the statistical signiﬁcance threshold ( Figure 4 ( c ) ) . 
+ 3.5 . 
+ Multiple Testing . 
+ Since RNAseq and qRT-PCR were independent technical procedures , a combined P value was obtained by multiplying the P value of RNAseq by that of qRT-PCR data . 
+ In addition , a multiple testing correction was applied [ 30 ] . 
+ That is , since 36 miRNA species were applied in the qRT-PCR plate and 5 of them were `` negative controls , '' a factor of 31 was used for further multiplication . 
+ Even after this correction for multiple testing , upregulation of hsa-miR-30c-5p in F. tularensis-exposed hPBMCs was statistically signiﬁcant ( Table 3 ) . 
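+ The correction described in this subsection reduces to two multiplications : the RNAseq and qRT-PCR P values are multiplied together , and the product is multiplied by the number of tested miRNA species ( 31 ) . The sketch below follows that description ; the function name , the cap at 1.0 , and the RNAseq P value used in the example are assumptions , since the RNAseq P value for hsa-miR-30c-5p is not given in this section . Note that this plain product is what the text describes and is not Fisher 's method for combining P values .

```python
def corrected_combined_p(p_rnaseq, p_qpcr, n_tests=31):
    """Multiply the two P values, then scale by the number of tests.

    The result is capped at 1.0 (an added convention, not from the text).
    """
    combined = p_rnaseq * p_qpcr
    return min(1.0, combined * n_tests)


# 0.0274 is the qPCR P value reported for hsa-miR-30c-5p in F. tularensis;
# 0.01 is a hypothetical RNAseq P value used only for illustration.
example = corrected_combined_p(0.01, 0.0274)
is_significant = example < 0.05
```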
+ 3.6 . 
+ Functional Annotation of the Targets of DE miRNA Species . 
+ The miRNA species DE in each experiment were analyzed using DAVID for the discovery of their functions using GO terms ( Table S9 ) . 
+ Overall , biological process ( BP ) terms were called more frequently than cellular component ( CC ) or molecular function ( MF ) terms . 
+ This probably reﬂects the fact that BP GO terms are more thoroughly annotated in the literature than the others . 
+ After GO terms were grouped according to their functions and analyzed , transcription-related GO terms were found to be predominant in both downregulated and upregulated miRNAs . 
+ For the genes targeted by DE miRNAs in F. tularensis-exposed hPBMCs , there were no GO terms associated with downregulated miRNAs ; however , for upregulated miRNAs , the `` programmed cell death-related '' term ranked ﬁrst . 
+ The same GO term ranked fourth and sixth in E. coli - and B. pseudomallei-exposed hPBMCs , respectively ( Figures 5 -- 7 ; Table 4 ) . 
+ 4. Discussion
+ In this study , PBMCs from human donors were exposed to three different gram-negative microorganisms -- E. coli as well as the classic BWA B. pseudomallei and F. tularensis -- followed by analysis of host miRNA expression proﬁles at early time points . 
+ Two different methods , namely , RNAseq and qPCR , were employed . 
+ Early time points ( 0 , 30 , 60 , and 120 minutes postexposure ) were chosen due to our interest in discovering biomarkers that could be assayed immediately following a potential BWA exposure , to identify or triage patients prior to the onset of symptoms and thereby enable swifter initiation of the appropriate treatment ( s ) . 
+ Although the time points chosen were very early , they were not too early for analysis of DE miRNA . 
+ A three-hour time point has previously been used in a genome-wide human miRNA stability analysis in more than ten different cell types [ 31 ] as well as in a study of the abundance changes in both stable and unstable retinal miRNAs in the mammalian light adaptation process [ 32 ] . 
+ In order to have maximal correlation between the RNAseq and qPCR data , hPBMCs from one individual should have been used for both assays . 
+ However , we chose to test using two different individuals since a usable miRNA biomarker would have to work consistently across different genders , ages , ethnicities , and so on . 
+ It was anticipated in this study that a generic response to a gram-negative pathogen might have been observed as a common signature in E. coli - , B. pseudomallei - , and F. tularensis-exposed cells . 
+ Whether due to differences in life style ( pathogen versus nonpathogen , intracellular versus extracellular ) of E. coli DH5α as opposed to the other two organisms , the early time points chosen , the small sample size , or some other unidentiﬁed factor ( s ) , we did not observe such a phenomenon replicated among all samples and with statistical signiﬁcance . 
+ However , there were some miRNA species that were commonly DE among different infection groups . 
+ For instance , hsa-miR-1226-3p , hsa-miR-23b-5p , hsa-let-7d-5p , and hsa-miR-30c-5p , which were initially identiﬁed as increasing in expression in response to F. tularensis exposure by RNAseq , were also found to increase in expression after exposure to E. coli or B. pseudomallei , although in the case of E. coli and B. pseudomallei exposure , this increase was detected by qPCR and not by RNAseq . 
+ Whether these variations are due to the assays employed , to individual human genetic variation , or to some other unidentiﬁed factors , a larger number of samples in subsequent follow-on work may help to elucidate the signiﬁcance of these miRNA species in cells exposed to B. pseudomallei , F. tularensis , and E. coli . 
+ Among the many miRNA species detected in this study , the one that stands out as having an unambiguously altered expression proﬁle in response to infection is hsa-miR-30c-5p . 
+ Intracellular bacteria often survive inside their host cells by regulating the immune response to their presence ; for example , B. pseudomallei actively downregulates the host inﬂammatory response through TssM-mediated inhibition of the NF-kappaB and type I IFN pathways [ 33 , 34 ] . 
+ The miRNA expression data presented here may be indicative of a similar host response modulation phenomenon . 
+ The overall disparity in response to the three different organisms between the two donors ' cells serves to highlight the potential importance of hsa-miR-30c-5p as a potential biomarker for further study . 
+ The RNAseq and qPCR results for hsa-miR-30c-5p were found to have a correlation coefficient of r > 0.95 , and the result also passed multiple testing . 
+ This is interesting because the hPBMCs used in RNAseq and qPCR were from two different individuals , and yet hsa-miR-30c-5p still remained statistically signiﬁcant . 
+ A recent study has shown that hsa-miR-30c , along with hsa-miR-30b , acted as a negative regulator of cell death induced by loss of attachment ( anoikis ) [ 35 ] . 
+ The study also showed that anoikis resistance was acquired through downregulation of caspase-3 expression by these miRNA species and that overexpression of these miRNAs resulted in a decrease in other types of caspase 3-dependent cell death . 
+ It is known that type A F. tularensis induces caspase-3-dependent macrophage apoptosis , resulting in the loss of potentially important innate immune responses to the pathogen [ 36 ] . 
+ Therefore , we speculate that as F. tularensis infection or exposure occurs , the expression of hsa-miR-30c-5p may increase to downregulate caspase-3 expression . 
+ In addition , this miRNA , hsa-miR-30c-5p , was also found to be differentially expressed in B. pseudomallei-exposed hPBMCs . 
+ It has been demonstrated that apoptosis induced by B. pseudomallei involves a type III translocator protein , BipB , and its interaction with caspase [ 37 ] . 
+ It is likely that similar to F. tularensis-exposed hPBMCs , hsa-miR-30c-5p may be upregulated to control caspase-3 expression in B. pseudomallei-exposed hPBMCs . 
+ From the RNAseq versus qPCR analysis , the following miRNA species were found to have a correlation coefficient > 0.95 and also met the statistical significance threshold ( P < 0.05 ) : in B. pseudomallei , hsa-miR-200b-3p , and in F. tularensis , hsa-miR-200b-3p , hsa-miR-548ai , and hsa-miR-30c-5p . 
+ It is remarkable that hsa-miR-200b-3p met the statistical significance threshold in this analysis for both B. pseudomallei- and F. tularensis-exposed hPBMCs . 
+ 5. Conclusions
+ Several miRNA species were identiﬁed that could be potential biomarkers for identiﬁcation of bacterially infected individuals in early stages . 
+ The most interesting miRNA in this investigation was hsa-miR-30c-5p , which was significant in all four different analyses in F. tularensis-exposed hPBMCs : RNAseq , qPCR , RNAseq versus qPCR correlation , and multiple testing . 
+ This miRNA was also found to be upregulated in B. pseudomallei qPCR analysis . 
+ It is our speculation that hsa-miR-30c-5p may be playing a role as a negative regulator of cell death upon infections by F. tularensis or B. pseudomallei . 
+ However , extensive validation is needed before any of the proposed biomarkers could be considered potentially useful . 
+ GO term analysis revealed that programmed cell death ranked ﬁrst as the biological process involved in miRNA species differentially expressed in response to F. tularensis exposure of hPBMCs . 
+ Even though the nonpathogenic E. coli DH5α served as a negative control , we still observed programmed cell death in the fourth place of GO functions for differentially expressed miRNA species in response to E. coli exposure . 
+ Similarly , B. pseudomallei-exposed hPBMCs also display programmed cell death in the sixth place . 
+ Abbreviations
+ AFLP-PCR : Ampliﬁed-fragment length polymorphism polymerase chain reaction Biological process Biological warfare agents Cellular component Centers for Disease Control and Prevention Differentially expressed or differential expression Housekeeping genes Human peripheral blood mononuclear cells Molecular function MicroRNAs National Institute of Allergy and Infectious Diseases PPC : Positive PCR controls REP-PCR : Repetitive element polymerase chain reaction RT-PCR : Real-time polymerase chain reaction RTC : Reverse transcription controls . 
+ Additional Points
+ Availability of Supporting Data . 
+ RNAseq data sets were deposited in the Short Read Archive ( SRA ) under the following IDs : unexposed [ SRX1421931 , SRX1422013 , SRX1422015 , and SRX1422194 ] , E. coli-exposed [ SRX426123 , SRX426124 , and SRX426125 ] , B. pseudomallei-exposed [ SRX424863 , SRX426391 , and SRX426392 ] , and F. tularensis-exposed [ SRX426475 , SRX42647 , and SRX426478 ] . 
+ Disclosure
+ The views expressed in this manuscript are those of the authors and do not necessarily reflect the official policy or position of the Department of the Navy , the Department of Defense , the National Institutes of Health , the Department of Health and Human Services , nor the U.S. Government . 
+ Vishwesh P. Mokashi is a military service member of the U.S. Government . 
+ This work was prepared as part of his official duties . 
+ Title 17 U.S.C. § 105 provides that `` Copyright protection under this title is not available for any work of the United States Government . '' 
+ Title 17 U.S.C. § 101 defines a U.S. Government work as a work prepared by a military service member or employee of the U.S. Government as part of that person 's official duties . 
+ The authors declare that they have no competing interests.
+ Authors' Contributions
+ Regina Z. Cer primarily performed bioinformatics analyses , wrote custom Perl and R scripts , and drafted the manuscript . 
+ J. Enrique Herrera-Galeano oversaw bioinformatics and statistical analyses and wrote R algorithms . 
+ Kenneth G. Frey oversaw qRT-PCR experiments and fold regulation analyses and provided scientiﬁc advice throughout . 
+ Kevin L. Schully performed bacterial infections , RNA extractions , and sterility tests and provided scientific advice . 
+ Truong V. Luu performed RNA extractions , library constructions , MiSeq DNA sequencing , and qRT-PCR reactions . 
+ John Pesce initially conceived and designed the project and wrote the proposal . 
+ Vishwesh P. Mokashi provided scientiﬁc advice throughout the project . 
+ Andrea M. Keane-Myers conceived the project , performed bacterial infections , and oversaw the project . 
+ Kimberly A. Bishop-Lilly conceived the project , cowrote the proposal , and oversaw the miRNA sequencing , qRT-PCR experiments , and the manuscript writing process . 
+ All authors read and approved the ﬁnal manuscript . 
+ Acknowledgments
+ The authors would like to thank Ms. Kathleen Verratti for her technical assistance with the MiSeq sequencer during the initial stages of this project and Mr. Matthew G. Bell for the preparation of the hPBMCs . 
+ This work was supported by the Defense Threat Reduction Agency ( DTRA ) Grant CBM.DIAGB.03.10.NM.028 .
\ No newline at end of file
--- a/data/TEXT_FILES/notuseful_txt/28911122.txt 0 → 100644
View file @27818a9
+++ b/data/TEXT_FILES/notuseful_txt/28911122.txt 0 → 100644
View file @27818a9
+ Data exploration, quality control and statistical
+ 1Department of Statistics , University of Wisconsin-Madison , Madison , WI 53706 , USA , 2Department of Public Health Sciences , Medical University of South Carolina , SC 29425 , USA , 3Great Lakes Bioenergy Research Center , University of Wisconsin-Madison , Madison , WI 53726 , USA , 4Department of Biochemistry , University of Wisconsin-Madison , Madison , WI 53706 , USA , 5Department of Bacteriology , University of Wisconsin-Madison , Madison , WI 53706 , USA and 6Department of Biostatistics and Medical Informatics , University of Wisconsin-Madison , Madison , WI 53792 , USA 
+ Received April 13, 2017; Revised June 02, 2017; Editorial Decision June 27, 2017; Accepted July 12, 2017
+ ABSTRACT
+ ChIP-exo/nexus experiments rely on innovative modiﬁcations of the commonly used ChIP-seq protocol for high resolution mapping of transcription factor binding sites . 
+ Although many aspects of the ChIP-exo data analysis are similar to those of ChIP-seq , these high throughput experiments pose a number of unique quality control and analysis challenges . 
+ We develop a novel statistical quality control pipeline and accompanying R/Bioconductor package , ChIPexoQual , to enable exploration and analysis of ChIP-exo and related experiments . 
+ ChIPexoQual evaluates a number of key issues including strand imbalance , library complexity , and signal enrichment of data . 
+ Assessment of these features are facilitated through diagnostic plots and summary statistics computed over regions of the genome with varying levels of coverage . 
+ We evaluated our QC pipeline with both large collections of public ChIP-exo/nexus data and multiple , new ChIP-exo datasets from Escherichia coli . 
+ ChIPexoQual analysis of these datasets resulted in guidelines for using these QC metrics across a wide range of sequencing depths and provided further insights for modelling ChIP-exo data . 
+ INTRODUCTION
+ Chromatin Immunoprecipitation followed by exonuclease digestion and next generation sequencing ( ChIP-exo ) is currently one of the state-of-the-art high throughput assays for profiling protein-DNA interactions at or close to single base-pair resolution ( 1 ) . 
+ It presents a powerful alternative to the popular ChIP-seq ( chromatin immunoprecipitation coupled with next generation sequencing ) assay . 
+ ChIP-exo experiments first capture millions of DNA fragments ( 150 -- 250 bps in length ) that the protein under study interacts with , using a protein-specific antibody and random fragmentation of DNA . 
+ Then , λ-exonuclease ( λ-exo ) is deployed to trim the 5 ′ end of each DNA fragment to each protein-DNA interaction boundary . 
+ This step is unique to ChIP-exo and aims to achieve significantly higher spatial resolution compared to ChIP-seq . 
+ Finally , high throughput sequencing of a small region ( 36 -- 100 bps ) at the 5 ′ end of each fragment generates millions of reads . 
+ Similarly , ChIP-nexus ( Chromatin Immunoprecipitation followed by exonuclease digestion , unique barcode , single ligation and next generation sequencing ) ( 2 ) is a further modification of the ChIP-exo protocol . 
+ ChIP-nexus aims to overcome limitations of ChIP-exo by yielding high complexity libraries with numbers of cells comparable to that of ChIP-seq experiments . 
+ This is achieved by reducing the numbers of ligations in the standard ChIP-exo protocol from two to one , and adding unique , randomized barcodes to adaptors to enable monitoring of overamplification . 
+ In addition to these , several other high-resolution protocols have also been considered . 
+ In X-ChIP and ORGANIC ( 3,4 ) , the DNA is fragmented by the application of endonuclease and exonuclease enzymes and then stabilized by sonication . 
+ The main difference between these two protocols is that in X-ChIP , the cells are crosslinked with formaldehyde and then the DNA is extracted by cell lysation , while the ORGANIC protocol achieves this step by nuclear isolation . 
+ Currently , ChIP-exo seems to be the more commonly adopted high-resolution protocol . 
+ Figure 1A illustrates the differences between distinct ChIP-based protocols : ChIP-exo , ChIP-nexus , single-end ( SE ) ChIP-seq , paired-end ( PE ) ChIP-seq . 
+ The 5 ′ ends from a ChIP-exo/nexus experiment are clustered more tightly around the binding sites of the protein than in a ChIP-seq experiment . 
+ In a PE ChIP-seq experiment , both ends are sequenced as opposed to only the 5 ′ end in a SE ChIP-seq . 
+ Although ChIP-exo/nexus protocols are being adopted by the research community , features of ChIP-exo data , especially those pertaining to data quality , have not been investigated . 
+ First , DNA libraries generated by the ChIP-exo protocol are expected to be less complex than the libraries generated by ChIP-seq ( 5 ) because digestion by λ-exo aims to reduce the number of individual genomic positions , to which sequencing reads can map , to small regions located around the actual binding sites . 
+ Therefore , in high quality and deeply sequenced ChIP-exo datasets , it is possible to observe large numbers of reads accumulating at a small number of bases due to actual signal rather than overamplification bias as commonly observed in ChIP-seq experiments . 
+ Second , although we expect approximately the same numbers of reads from both DNA strands at a given binding site , there may be locally more reads in one strand than in the other , owing to λ-exo efficiency , ligation efficiency , or other factors . 
+ This is an important point with implications on the statistical analysis of ChIP-exo data . 
+ Specifically , currently available ChIP-exo specific statistical analysis methods ( e.g. MACE ( 6 ) , CexoR ( 7 ) and Peakzilla ( 8 ) ) rely on the existence of peak-pairs formed by forward and reverse strand reads at the binding site . 
+ Finally , most of the currently used ChIP-seq quality control ( QC ) guidelines ( 9 -- 11 ) may not be directly applicable to ChIP-exo data . 
+ To address these challenges , we develop a suite of diagnostic plots and summary statistics and implement them in a versatile R/Bioconductor package named ChIPexoQual . 
+ The overall pipeline takes into account the characteristics of ChIP-exo/nexus data and addresses the critical shortcomings of the currently available QC pipelines that are not particularly tailored for ChIP-exo/nexus data ( 9 -- 10,12 -- 13 ) . 
+ We apply this pipeline to a large collection of public and newly generated ChIP-exo/nexus data and we validate the QC pipeline by evaluating the samples for features that capture high signal to noise , such as occurrences of motifs recognized by the profiled DNA interacting protein and also utilize blacklisted regions as identified by the ENCODE consortium . 
+ MATERIALS AND METHODS
+ ChIP-seq/exo/nexus datasets
+ E. coli ChIP-exo and ChIP-seq samples . 
+ For simplicity , we introduce some abbreviations for the Escherichia coli σ70 ChIP-exo ( E ) , PE ChIP-seq ( P ) , and SE ChIP-seq ( S ) samples . 
+ We denote the data generated in the first ( second ) batch as E1 ( E2 ) , P1 ( P2 ) and S1 ( S2 ) . 
+ Summaries of the growth conditions and sample IDs for the ChIP-exo samples are included in Table 1 . 
+ The SE and PE ChIP-seq samples generated under the same conditions share the same ID convention . 
+ The procedures for sample preparation and sequencing are described in the supplement . 
+ The ChIP-exo experiments followed the protocol described in ( 1 ) . 
+ Processing of the ChIP-exo and ChIP-nexus samples . 
+ We aligned the ChIP-exo/nexus samples in Table 2 by following the descriptions listed in their respective publications . 
+ When the alignment settings were not discernible in the original publication , we used bowtie ( version 1.1.2 ) ( 14 ) . 
+ We aligned the E1 samples of Table 1 with bowtie -q -m 1 -l 55 -k 1 -5 3 -3 40 --best -S and the E2 samples using bowtie -q -m 1 -v 2 --best . 
+ The average read lengths were 102 and 52 bp for the E1 and E2 samples , respectively . 
+ Hence , to make the alignments for both samples comparable , we trimmed 40 bp from the 3 ′ ends of the reads in the E1 samples . 
+ We trimmed 3 bp from the 5 ′ end to remove the adaptors in the E1 samples . 
+ ChIP-exo and ChIP-seq peak calling with MOSAiCS to identify high signal peaks
+ MOSAiCS ( 15 ) is a model-based approach for the analysis of ChIP-seq and ChIP-exo data . 
+ We used MOSAiCS to identify sets of highly significant peaks for ChIP-exo and ChIP-seq under the GC + Mappability and InputOnly modes for background estimation , respectively . 
+ Subsequently , we called peaks with a 5 % FDR and a threshold of at least 100 extended fragments . 
+ Generation of a set of high signal regions from E. coli samples to assess strand imbalance
+ We partitioned the E. coli genome into non-overlapping intervals of length 150 bp and counted the number of reads overlapping each interval . 
+ As is usually the practice with ChIP-seq analysis , each read was extended to the average fragment length of 150 bp toward the 3 ′ direction . 
+ To evaluate the strand imbalance , we identified a set of high signal peaks for ChIP-exo and SE ChIP-seq . 
+ The subset of these peaks for which dPeak ( 16 ) analysis identified one or more binding events were used in FSR assessments ( Figure 1B and Supplementary Figure S1E ) . 
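+ The binning step described above can be sketched as follows. This is only an illustration, not the authors' code: the read representation (0-based, half-open `(start, end, strand)` tuples) and the function names are assumptions made for the example.

```python
def extended_interval(start, end, strand, fragment_length=150):
    """Extend a read toward its 3' direction up to the fragment length.

    Coordinates are assumed to be 0-based and half-open.
    """
    if strand == "+":
        # Forward reads grow to the right of their 5' end.
        return start, max(end, start + fragment_length)
    # Reverse reads grow to the left of their 5' end.
    return max(0, min(start, end - fragment_length)), end


def bin_counts(reads, genome_length, bin_size=150, fragment_length=150):
    """Count extended reads overlapping each non-overlapping bin."""
    n_bins = -(-genome_length // bin_size)  # ceiling division
    counts = [0] * n_bins
    for start, end, strand in reads:
        s, e = extended_interval(start, end, strand, fragment_length)
        e = min(e, genome_length)
        # A read contributes to every bin it overlaps.
        for b in range(s // bin_size, (e - 1) // bin_size + 1):
            counts[b] += 1
    return counts
```

+ A read that straddles a bin boundary after extension is counted in both bins, which is one common convention; other conventions (for example, counting only the bin that contains the 5 ′ end) are equally plausible.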
+ Existing next generation sequencing data QC metrics and methods
+ We used the ChIP-seq QC metric definitions established by the ENCODE consortium ( 10,11 ) , and described in detail at https://genome.ucsc.edu/ENCODE/qualityMetrics.html . 
+ These QC metrics were calculated with the ChIPUtils package ( version 0.99.0 from https://github.com/keleslab/ChIPUtils ) . 
+ Empirical data from the ENCODE project suggests the following guidelines for interpretation of the QC metrics for human and mouse genomes : a PBC value between 0 -- 0.5 indicates severe bottlenecking , 0.5 -- 0.8 moderate bottlenecking , 0.8 -- 0.9 mild bottlenecking and 0.9 -- 1 no bottlenecking . 
+ In addition to ENCODE QC metrics , we considered FASTQC ( version 0.11.5 ) and htSeqTools ( version 1.16.0 ) ( 9 ) for assessing the overall quality of the ChIP-exo/nexus sequences . 
+ Collectively , these encompass all the metrics available for read-level data in ChiLin ( 13 ) , which is another QC tool for ChIP-seq and DNase-seq , and Q-nexus ( 12 ) , which is a ChIP-nexus analysis pipeline with QC features that are similar to that of FASTQC . 
+ The remaining metrics calculated by the ChiLin pipeline require the use of a peak calling algorithm or external data ( such as DNase hypersensitive sites ) and , therefore , are not utilized in our evaluations . 
+ Blacklisted regions in eukaryotic genomes
+ For the mm9 , hg19 , and dm3 genomes , we used the blacklists generated by the ENCODE consortium ( 17 ) , available at https://sites.google.com/site/anshulkundaje/projects/blacklists . 
+ These lists consist of genomic segments for which next-generation sequencing experiments produce artificially high signal . 
+ These lists were empirically derived from large compendia of data generated by the ENCODE and modENCODE consortia , respectively . 
+ ChIP-exo quality control with R package ChIPexoQual
+ We implemented our proposed QC pipeline with an R/Bioconductor package named ChIPexoQual , available at http://bioconductor.org/packages/release/bioc/html/ChIPexoQual.html . 
+ The analysis in this paper used version 1.0.0 of the ChIPexoQual package . 
+ ChIPexoQual : The package takes a set of N aligned reads from a ChIP-exo ( or ChIP-nexus ) experiment as input and performs the following steps . 
+ 1 . Identify read islands , i.e. overlapping clusters of reads separated by gaps , from read coverage . The gaps are defined as the union of positions in the genome with fewer than $h^*$ ( default = 1 ) aligned reads . The remaining islands can be interpreted as the natural partition of the genome determined by a ChIP-exo/nexus experiment . 
+ 2 . Compute $D_i$ , the number of reads in island $i$ ; $U_i$ , the number of positions in island $i$ with at least one aligning read ; and $W_i$ , the width of island $i$ defined as the total number of bases in the island , $i = 1 , \cdots , I$ . 
+ 3 . For each island $i$ , $i = 1 , \cdots , I$ , compute island statistics : $ARC_i = D_i / W_i$ , $URC_i = U_i / D_i$ , and $FSR_i = ( \# \text{ of fwd. strand reads aligning to island } i ) / D_i$ . 
+ 4 . Generate diagnostic plots : ( i ) URC vs. ARC plot ; ( ii ) Region Composition plot ; ( iii ) FSR distribution plot . 
+ 5 . Randomly sample without replacement $M$ ( at least 500 , default = 1000 ) islands and fit $D_i = \beta_1 U_i + \beta_2 W_i + \varepsilon_i$ , where $\varepsilon_i$ denotes the independent error term . Repeat this process $B$ ( default = 1000 ) times and generate box plots of the estimated $\beta_1$ and $\beta_2$ . 
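+ Steps 1 to 3 above can be illustrated with a small pure-Python sketch. It is not the ChIPexoQual implementation: the read format (0-based, half-open `(start, end, strand)` tuples), the function names, and the choice to count unique positions by 5 ′ end coordinate are assumptions made for the example, and the statistics function assumes the default threshold of one read so that every read lies inside exactly one island.

```python
from collections import Counter


def find_islands(reads, h=1):
    """Return (start, end) intervals where coverage is at least h."""
    events = Counter()
    for start, end, _strand in reads:
        events[start] += 1   # coverage rises where a read begins
        events[end] -= 1     # and falls just past its last base
    islands, depth, island_start = [], 0, None
    for pos in sorted(events):
        depth += events[pos]
        if depth >= h and island_start is None:
            island_start = pos
        elif depth < h and island_start is not None:
            islands.append((island_start, pos))
            island_start = None
    return islands


def island_statistics(reads, islands):
    """Compute D, U, W, ARC, URC and FSR for each island (assumes h = 1)."""
    stats = []
    for isl_start, isl_end in islands:
        members = [r for r in reads if isl_start <= r[0] < isl_end]
        depth = len(members)
        # Assumption: a "position" is the coordinate of the read's 5' end.
        five_prime = {s if strand == "+" else e - 1 for s, e, strand in members}
        forward = sum(1 for r in members if r[2] == "+")
        width = isl_end - isl_start
        stats.append({
            "D": depth,
            "U": len(five_prime),
            "W": width,
            "ARC": depth / width,
            "URC": len(five_prime) / depth,
            "FSR": forward / depth,
        })
    return stats
```

+ For real data one would work from an alignment file and a coverage vector per chromosome; the sweep over start and end events here is only a compact way to obtain the same intervals on a toy input.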
+ Interpretation of the linear model in the QC pipeline . 
+ The linear model 
+ $$D_i = \beta_1 U_i + \beta_2 W_i + \varepsilon_i$$
+ is a re-parametrization of the following relationship from the URC vs. ARC diagnostic plot : 
+ $$URC_i = \frac{\kappa}{ARC_i} + \gamma + \varepsilon_i ,$$
+ with $\beta_1 = 1 / \gamma$ and $\beta_2 = - \kappa / \gamma$ . 
+ In this setting , $\gamma$ can be considered as the large-depth $URC_i$ , i.e. the limiting ratio between the number of positions with at least one mapping read and depth as the depth tends to infinity . 
+ Equivalently , $\beta_1 = 1 / \gamma$ can be interpreted as the average number of aligned reads per unique position when the sequencing depth is large . 
+ To interpret $\beta_2 = - \kappa / \gamma$ , we express $\kappa$ as a function of ARC and URC and assume that $\gamma$ is already estimated . 
+ Then , 
+ $$\frac{\kappa}{\gamma} = \frac{U}{\gamma W} - ARC \approx \frac{1}{\gamma W} \lim_{D \to \infty} U ( D ) - ARC ,$$
+ where $\gamma$ approximates the URC as the sequencing depth increases . 
+ In a low quality experiment where reads accumulate in a small number of positions due to PCR amplification bias or other artifacts , several reads are expected to repeatedly align to the same collection of unique positions , making the term involving the limit diverge from ARC . 
+ In contrast , in a high-quality experiment , $\kappa / \gamma$ is expected to converge to zero because the expression with the limit approximates ARC . 
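+ The repeated no-intercept fit of step 5 , together with the mapping between the regression coefficients and the URC vs. ARC parameters , can be sketched as below. This is a minimal illustration under stated assumptions: islands are given as `(D, U, W)` tuples, the least-squares solution is obtained from the 2 × 2 normal equations rather than a statistics library, and all function names are hypothetical.

```python
import random


def fit_no_intercept(depth, unique, width):
    """Least-squares fit of D = b1 * U + b2 * W (no intercept)."""
    suu = sum(u * u for u in unique)
    sww = sum(w * w for w in width)
    suw = sum(u * w for u, w in zip(unique, width))
    sud = sum(u * d for u, d in zip(unique, depth))
    swd = sum(w * d for w, d in zip(width, depth))
    det = suu * sww - suw * suw
    if det == 0:
        raise ValueError("U and W are collinear; coefficients are not identifiable")
    b1 = (sud * sww - suw * swd) / det
    b2 = (suu * swd - suw * sud) / det
    return b1, b2


def resampled_betas(islands, m=1000, b=1000, seed=0):
    """Fit the model on b random subsets of m islands drawn without replacement."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(b):
        sample = rng.sample(islands, m)
        depth = [s[0] for s in sample]
        unique = [s[1] for s in sample]
        width = [s[2] for s in sample]
        estimates.append(fit_no_intercept(depth, unique, width))
    return estimates


def gamma_kappa(beta1, beta2):
    """Invert beta1 = 1 / gamma and beta2 = -kappa / gamma."""
    gamma = 1.0 / beta1
    kappa = -beta2 * gamma
    return gamma, kappa
```

+ The spread of the returned estimates across resamples is what the box plots in step 5 would summarize ; the small defaults used in a test run should not be read as recommendations .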
+ The ChIPexoQual pipeline is enriched by the following two additional modules that are utilized when the sequencing depth is high and/or blacklisted regions are available . 
+ i. Subsampling analysis . 
+ For high depth datasets ( e.g. , ≥ 60M reads for human and mouse samples ) , we subsample N1 < N2 < · · · < N reads , starting with N1 = 20M reads and up to 50M reads in 10M increments as default , and apply steps 1 to 5 for each of the subsampled datasets . 
+ ii . Blacklisted regions analysis . 
+ The islands identified by ChIPexoQual are separated into two different collections based on their overlap with a set of blacklisted regions . 
+ Then , the $\beta_1$ and $\beta_2$ scores are estimated for both collections and compared against the all island scores . 
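+ The default subsampling schedule described in module i can be written down in a few lines. The sketch below is an assumption-laden illustration, not package code: the function names are hypothetical and reads are held in an in-memory list, which would not be practical for a real dataset of this size.

```python
import random


def subsample_depths(total_reads, start=20_000_000, stop=50_000_000,
                     step=10_000_000):
    """Depths N1 < N2 < ... that are smaller than the full dataset."""
    return [n for n in range(start, stop + 1, step) if n < total_reads]


def subsample(reads, n, seed=0):
    """Draw n reads without replacement (reads is any sequence)."""
    return random.Random(seed).sample(reads, n)
```

+ Steps 1 to 5 would then be applied to each subsample in turn , so that the resulting scores can be compared across depths .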
+ Motif analysis of FoxA1 and TBP enriched regions
+ For each ChIP-exo/nexus sample , we used the ChIP-exo QC pipeline to partition its reference genome into a set of islands with their respective summary statistics . 
+ We then filtered them into collections of high quality regions as follows : 
+ i. For FoxA1 experiments : we removed the islands with ( i ) reads residing only on one strand ; ( ii ) $U_i \le 15$ ; ( iii ) $D_i \le 100$ . 
+ ii . For TBP experiments : we removed the islands with ( i ) reads residing only on one strand ; ( ii ) $W_i < 50$ or $W_i \ge 2000$ bp ; ( iii ) $U_i \le 15$ ; ( iv ) $D_i \le \mathrm{median}_j D_j$ . 
+ These thresholds were empirically selected . 
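+ The two filtering rules above translate directly into predicates over the island summary statistics. The sketch assumes each island is a dictionary with keys `D` , `U` , `W` and `FSR` ( a hypothetical representation chosen for the example ) and treats an FSR of exactly 0 or 1 as "reads residing only on one strand" .

```python
from statistics import median


def on_both_strands(island):
    """True when the island has reads on the forward and the reverse strand."""
    return 0.0 < island["FSR"] < 1.0


def filter_foxa1(islands):
    """Keep islands that survive the FoxA1 removal rules."""
    return [i for i in islands
            if on_both_strands(i) and i["U"] > 15 and i["D"] > 100]


def filter_tbp(islands):
    """Keep islands that survive the TBP removal rules."""
    # The depth threshold is the median depth over all islands of the sample.
    depth_cutoff = median(i["D"] for i in islands)
    return [i for i in islands
            if on_both_strands(i)
            and 50 <= i["W"] < 2000
            and i["U"] > 15
            and i["D"] > depth_cutoff]
```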
+ To validate their robustness , we performed an analogous analysis by using the regions that overlapped a set of peaks ( identified by MOSAiCS at FDR 5 % ) with width larger than 3 × rl , where rl is the median read length of the experiment ( Supplementary Figures S34 and S35 ) . 
+ The width filter was not applied to the TBP ChIP-exo samples , and accordingly to the ChIP-nexus samples for consistency , since they exhibited over-amplification ( 2 ) . 
+ We used FIMO ( version 4.9.1 ) ( 18 ) to identify the FoxA1 and TBP motifs within each enriched region using the FoxA1 MA0148.1 and TBP MA0108.1 position weight matrices from the JASPAR database ( 19 ) , respectively . 
+ For the FoxA1 experiments we used the default parameters , and for the TBP experiments we considered all motifs identified with a FIMO P-value < 0.05 . 
+ RESULTS
+ Publicly available ChIP-exo/nexus and novel E. coli ChIP-seq/exo datasets
+ We utilized a rich collection of publicly available ChIP-exo/nexus data from multiple organisms to build and evaluate our quality control pipeline ( Table 2 ) . 
+ These include : CTCF factor in human HeLa cell lines ( 1 ) ; ER factor in human MCF-7 cell lines ( 20 ) ; GR factor in IMR90 , K562 and U2OS human cell lines ( 21 ) ; TBP factor in human K562 cell lines ( 22 ) ; H3 histone in S. cerevisiae where most , but not all of the tail was deleted ( 1-28 ) ( 23 ) . 
+ ChIP-nexus data included experiments from ( 2 ) profiling TBP in human K562 cells , MyC and Max in D. melanogaster S2 cell lines , and Twist and Dorsal in D. melanogaster embryo . 
+ In order to have a setting where we can compare SE and PE ChIP-seq with their ChIP-exo counterpart , we profiled σ70 under a variety of conditions in E. coli with ChIP-exo ( Table 1 ) , SE and PE ChIP-seq . 
+ Collectively , we generated σ70 factor ChIP-exo , PE and SE ChIP-seq experiments under aerobic ( + O2 ) and anaerobic ( − O2 ) conditions in glucose minimal media . 
+ For simplicity , we named these experiments as E1 , P1 and S1 , respectively . 
+ Similarly , we generated σ70 factor ChIP-exo and PE ChIP-seq experiments in E. coli under aerobic ( + O2 ) conditions with and without rifampicin treatment . 
+ We named these experiments E2 and P2 , respectively . 
+ ChIP-exo versus ChIP-seq: general features
+ We first compared ChIP-seq and ChIP-exo in terms of data features that are well studied in ChIP-seq studies . 
+ Our σ70 ChIP-seq and ChIP-exo samples from E. coli are especially well suited for this task since they are all deeply sequenced compared to the genome size of E. coli . 
+ Figures 1B -- C summarize this comparison for one biological replicate of ChIP-exo and ChIP-seq experiments from the same biological conditions ( samples E1-1 from Table 1 , P1-1 and S1-1 following the same ID convention ) . 
+ Peak-pair assumption . 
+ We evaluated the peak-pair assumption , i.e. a cluster of reads in the forward strand located on the left-hand-side of the binding site is usually paired with a cluster of reads located on the right-hand-side of the binding site in the reverse strand . 
+ This observation is commonly utilized in designing statistical analysis methods for ChIP-exo data ( 6 -- 8 ) . 
+ We considered the set of peaks identified in both the ChIP-seq and ChIP-exo samples as high quality peaks ( Materials and Methods ) and calculated the proportion of forward strand reads in these regions ( Figure 1B and Supplementary Figures S1 -- S3 ) . 
+ This plot reveals a higher level of strand imbalance for ChIP-exo compared to ChIP-seq . 
+ Potential reasons for this observation include ligation efficiency , efficiency of λ-exo digestion , and single-stranded protein-DNA interactions . 
+ Overall , such an imbalance is more likely to occur in low complexity libraries . 
+ Read distributions within signal and background regions . 
+ Using extended raw read counts within 150 bp non-overlapping intervals , i.e. , bins interrogating the genome , Figure 1C depicts that , as observed by others , ChIP read counts from ChIP-exo and ChIP-seq are linearly correlated especially at high read counts . 
+ This indicates that signals for potential binding sites are well reproducible between ChIP-exo and ChIP-seq data . 
+ In contrast , there is a clear difference between the two data types for bins with low read counts , highlighting potential differences in the background read distributions of these data types . 
+ Comparisons with other paired E. coli ChIP-seq and ChIP-exo samples led to similar conclusions ( Supplementary Figures S1 -- S3 ) . 
+ Mappability and GC-content bias . 
+ We next evaluated ChIP-exo data of CTCF in HeLa cells ( 1 ) to investigate biases inherent to next generation sequencing experiments with eukaryotic genomes . 
+ Figures 1D and E ( Supplementary Figure S4 ) display the bin-level average read counts against mappability and GC-content . 
+ Each data point is obtained by averaging the read counts across bins with the same mappability or GC-content . 
+ These biases , increasing linear trend with mappability and non-linear trend with GC-content , are similar to those observed in ChIP-seq datasets ( 15,24 -- 25 ) . 
+ This observation indicates that analysis of ChIP-exo data should benefit from methods that take into account apparent sequencing biases such as mappability and GC content , mostly when an input control sample is not available to account for variability in the background read distribution . 
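+ The bin-level averaging behind such bias plots can be sketched as follows. The rounding of the covariate to a fixed number of digits is an assumption introduced here so that bins with nearly identical mappability or GC-content fall into the same group ; the function name is hypothetical.

```python
from collections import defaultdict


def mean_count_by_covariate(counts, covariate, digits=2):
    """Average bin read counts over bins sharing the same covariate value."""
    groups = defaultdict(list)
    for count, value in zip(counts, covariate):
        groups[round(value, digits)].append(count)
    # One point per distinct covariate value, ordered for plotting.
    return {value: sum(c) / len(c) for value, c in sorted(groups.items())}
```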
+ Existing high throughput sequencing quality control metrics applied to ChIP-exo/nexus data
+ We processed the ChIP-exo/nexus samples with FASTQC and observed that in 73.33 % and 93.33 % of the cases , at least a warning is raised for sequence duplication levels and kmer content representation ( Supplementary Table S1 ) , respectively . 
+ The former assumes that most sequences will occur only once in a diverse library and the latter assumes that any small fragment should not have a positional bias in its appearance within a library . 
+ Clearly , these assumptions are not appropriate for ChIP-exo/nexus data , as the exo-enzyme is expected to stop its digestion when it reaches the crosslinking protein . 
+ The ENCODE consortium established empirical and widely used QC metrics on ChIP-seq data ( 10 ) . 
+ We evaluated these metrics , namely PCR Bottleneck Coefficient ( PBC ) , Normalized Strand Cross-Correlation ( NSC ) , and Relative Strand Cross-Correlation ( RSC ) , as defined at https://genome.ucsc.edu/ENCODE/qualityMetrics.html ( 10,11 ) , on ChIP-exo/nexus data . 
+ Tables 1 and 2 present these metrics for the collection of ChIP-exo/nexus datasets we consider in this paper . 
+ Marinov et al. ( 11 ) discussed that highly complex ChIP-seq libraries can become exhausted by deep sequencing . 
+ Hence , the PBC is expected to decrease as the sequencing depth increases . 
+ This effect is expected to be more severe in ChIP-exo/nexus , as DNA libraries generated by those protocols are expected to be less complex than the libraries generated by ChIP-seq : the number of positions to which the reads can align is reduced by the exonuclease digestion . 
+ This affects the interpretation of the PBC , which is defined as the ratio of the number of genomic positions to which exactly one read maps to the number of genomic positions to which at least one read maps . 
+ For ChIP-seq samples , low PBC values ( e.g. , ≤ 0.5 ) indicate high levels of PCR amplification bias , i.e. PCR bottleneck , unless the sequencing depth is high enough to saturate all targets of the factor profiled . 
+ In contrast , for ChIP-exo/nexus , exonuclease digestion will lead to reads with same exact 5 ′ end even before the PCR amplification step . 
+ We note that the PBC values are especially low for deeply sequenced ChIP-exo and ChIP-nexus samples ; however , this does not automatically indicate severe bottlenecking as suggested by standard ChIP-seq guidelines . 
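+ The PBC definition given above , and the ENCODE interpretation bands quoted in Materials and Methods , can be expressed compactly. In this sketch a genomic position is any hashable key such as `(chromosome, 5' coordinate, strand)` ; that representation , the function names , and the handling of values that fall exactly on a band boundary are assumptions , since the source only lists the ranges.

```python
from collections import Counter


def pbc(positions):
    """Positions with exactly one read divided by positions with at least one."""
    counts = Counter(positions)
    if not counts:
        raise ValueError("no aligned reads")
    singletons = sum(1 for c in counts.values() if c == 1)
    return singletons / len(counts)


def encode_pbc_category(value):
    """Map a PBC value to the ENCODE bottlenecking label."""
    if not 0.0 <= value <= 1.0:
        raise ValueError("PBC must lie in [0, 1]")
    if value < 0.5:
        return "severe"
    if value < 0.8:
        return "moderate"
    if value < 0.9:
        return "mild"
    return "none"
```

+ As the surrounding text argues , a "severe" label obtained this way for a deeply sequenced ChIP-exo/nexus sample should not be taken at face value .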
+ Planet et al. ( 9 ) presented in the R/Bioconductor package htSeqTools the Standardized Standard Deviation ( SSD ) as a metric to assess enrichment efficiency and to compare across samples . 
+ According to the guidelines established by the authors , higher values of this metric indicate higher quality . 
+ We calculated the SSD coefficient for all the ChIP-exo/nexus samples ( Tables 1 and 2 ) . 
+ Detailed examination of these results reveals a key shortcoming of this metric : a propensity to label samples with low library complexity as higher quality , because the reads in such samples align to fewer positions in the genome . 
+ For example , when comparing the ChIP-exo/nexus TBP samples , the use of this metric suggests that the deeply sequenced ChIP-exo samples ( replicates 2 and 3 ) exhibit higher quality than the first ChIP-nexus replicate . 
+ This is in contrast to evaluation of these datasets with an independent , motif-based metric as we discuss below . 
+ The Strand Cross-Correlation ( SCC ) , introduced by Kharchenko et al. ( 26 ) , is a commonly used quality metric in assessing ChIP-seq enrichment quality . 
+ It aims to quantify how well the reads mapped to each strand are clustered around the locations of the protein -- DNA interaction sites by calculating the Pearson correlation between forward and backward strands reads by shifting them across a range that covers both the read length of the experiment and the expected average fragment length . 
+ Typical SCC profiles exhibit two local maxima : at the average fragment length and the read length . 
+ In high quality experiments with clear ChIP enrichment , the average fragment length maximum coincides with the global maximum . 
+ In an idealized ChIP-exo experiment where the DNA fragments are digested to the boundaries of the protein -- DNA interaction sites , the SCC profile is expected to maximize at the motif length indicating clustering of the forward and reverse strand reads around the binding site . 
+ This hinders the interpretation of SCC for a ChIP-exo/nexus experiment since it is now maximized at an unobserved shorter fragment length that is confounded with the ` phantom peak ' at the read length . 
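The SCC computation described above can be illustrated with a minimal sketch. It assumes reads are reduced to their 5' end positions on a single toy chromosome; the function name, the toy coordinates and the shift range are hypothetical and are not taken from any of the cited packages.

```python
import numpy as np


def strand_cross_correlation(fwd_pos, rev_pos, genome_len, max_shift):
    """Pearson correlation between forward- and reverse-strand 5' end
    counts for every shift s in 0..max_shift (in bp)."""
    # Per-base counts of read 5' ends on each strand.
    fwd = np.bincount(fwd_pos, minlength=genome_len).astype(float)
    rev = np.bincount(rev_pos, minlength=genome_len).astype(float)
    scc = np.empty(max_shift + 1)
    for s in range(max_shift + 1):
        # Pair forward position i with reverse position i + s.
        scc[s] = np.corrcoef(fwd[: genome_len - s], rev[s:])[0, 1]
    return scc


# Toy data (hypothetical): four binding sites, forward 5' ends 20 bp
# upstream and reverse 5' ends 20 bp downstream of each site.
sites = np.array([100, 300, 500, 700])
fwd_pos = np.repeat(sites - 20, 5)
rev_pos = np.repeat(sites + 20, 5)
scc = strand_cross_correlation(fwd_pos, rev_pos, genome_len=1000, max_shift=100)
best_shift = int(np.argmax(scc))  # shift with the highest correlation
```

In this toy case the profile has a single maximum at the distance between the two strand clusters. In a real ChIP-exo sample that distance is short, which is why it becomes hard to separate from the read-length peak.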
+ Carroll et al. ( 27 ) studied the impact of blacklisted regions and duplicated reads when calculating the SCC for ChIP-exo data . 
+ The authors showed that removing duplicated reads has a dramatic effect on the SCC profile , whereas the effect of removing the blacklisted regions may be limited to a few positions of the profile ; they suggested calculating the SCC using only aligned reads that overlap the experiment 's set of peaks but do not overlap a set of predefined blacklisted regions . 
+ Several biases are introduced into the computation of this modified SCC , because it requires the use and tuning of a peak calling algorithm . 
+ Furthermore , in a lower quality experiment , the peaks may not correspond to actual binding sites . 
+ Figure 1F displays the SCC curves for the CTCF HeLa samples , where the ChIP-exo curve actually shows local maxima at 12 bp and the read length , while the SE ChIP-seq curves have the expected local maximum at the read length and a global maximum at the average fragment length . 
+ SCC profiles for other samples are available in Supplementary Figures S5 to S14 . 
+ In ChIP-exo experiments , the read length and the fragment length peaks in the SCC are confounded . 
+ Furthermore , the former is close in proximity to the motif length ; as a result , this may incorrectly suggest experiments to be marginally successful or even failed ( e.g. Supplementary Figure S8 ) and renders QC metrics such as the Normalized Strand Cross-Correlation ( NSC ) or the Relative Strand Cross-Correlation ( RSC ) harder to interpret . 
+ However , in the majority of the cases we present , the profile itself seems informative about the enrichment signal in ChIP-exo/nexus experiments . 
+ ChIP-exo quality control pipeline ChIPexoQual
+ To address the limitations of available analytical exploration approaches discussed above , we developed ChIPexoQual . 
+ In Table 3 , we compare ChIPexoQual against the existing tools discussed above . 
+ We highlight that ChIPexoQual provides a global view of both library enrichment and complexity , and detailed diagnostic plots for the balance between the two . 
+ We first present the overall pipeline and then discuss individual components with a case study using ChIP-exo data of FoxA1 from ( 20 ) and ChIP-nexus data from ( 2 ) . 
+ Figure 2 summarizes the 4-step pipeline and the two additional modules . 
+ Given aligned reads from a ChIP-exo/nexus sample , the first step partitions the reference genome into islands representing overlapping clusters of reads separated by gaps by removing the regions with fewer than h * aligned reads . 
+ In step 2 , the total number of reads overlapping each island ( Di ) and the number of island positions with at least one aligned read ( Ui ) are recorded . 
+ Then , three summary statistics ARCi , URCi , and FSRi are computed for each island i . ARCi denotes the average read coefficient and is defined as the ratio of the number of reads in island i ( Di ) to the width of island i ( Wi ) ; URCi , the unique read coefficient , quantifies the inverse of the effective coverage and is defined as the ratio of the number of genomic positions with at least one aligned read within island i ( Ui ) to the number of reads in island i ( Di ) ; and FSRi denotes the proportion of forward strand reads . 
+ Step 3 of the pipeline generates several diagnostic plots aimed at quantifying ChIP enrichment and strand imbalance , and step 4 generates quantitative summaries of these diagnostic plots . 
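Steps 1 and 2 can be sketched as follows for the threshold h * = 1, where an island is simply a maximal run of overlapping reads. This is an illustration under stated assumptions, not the package's implementation: reads are given as half-open (start, end, strand) tuples on one chromosome, and a "position with at least one aligned read" is taken to be a distinct 5' end coordinate. The names `Island` and `build_islands` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Island:
    start: int      # first covered position
    end: int        # one past the last covered position (half-open)
    depth: int      # D_i: number of reads in the island
    unique: int     # U_i: distinct 5' end positions (assumption, see text)
    forward: int    # number of forward-strand reads

    @property
    def width(self):
        return self.end - self.start          # W_i

    @property
    def arc(self):
        return self.depth / self.width        # ARC_i = D_i / W_i

    @property
    def urc(self):
        return self.unique / self.depth       # URC_i = U_i / D_i

    @property
    def fsr(self):
        return self.forward / self.depth      # FSR_i


def _summarize(group, group_end):
    # 5' end: start for '+' reads, last covered base for '-' reads.
    five_prime = {s if strand == "+" else e - 1 for s, e, strand in group}
    forward = sum(strand == "+" for _, _, strand in group)
    return Island(group[0][0], group_end, len(group), len(five_prime), forward)


def build_islands(reads):
    """Merge reads into islands separated by uncovered gaps (h* = 1)."""
    islands, group, group_end = [], [], None
    for read in sorted(reads):
        start, end, _ = read
        if group and start > group_end:       # at least one uncovered base
            islands.append(_summarize(group, group_end))
            group = []
        group.append(read)
        group_end = end if len(group) == 1 else max(group_end, end)
    if group:
        islands.append(_summarize(group, group_end))
    return islands


reads = [(0, 10, "+"), (5, 15, "-"), (5, 15, "-"), (30, 40, "+")]
islands = build_islands(reads)
```

Here the first island collects three reads over 15 bp with two distinct 5' ends, and the second island is a single isolated read of the kind expected in background regions.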
+ Figure 2A presents the typical behavior of the URC vs. ARC plot for a high quality ChIP-exo sample . 
+ In general , the plot depicts two strong arms . 
+ The left arm , with low ARC and varying URC values , corresponds to background islands , regions that are usually composed of scattered reads that were not digested during the exonuclease step . 
+ The right arm where the URC decreases as the ARC increases corresponds to regions that are usually ChIP enriched . 
+ As a result , this arm depicts the balance between library enrichment and complexity . 
+ Low URC in this arm corresponds to regions composed by reads concentrated in a smaller number of positions . 
+ We quantify the shape of the URC versus ARC plot by the use of two estimated parameters : β1 , which represents the average number of reads aligned to the unique positions in large depth regions , and β2 , which represents the overall change in depth as the width varies across a large set of regions . 
+ These parameters are estimated by sampling experiments on the original samples . 
+ We provide further details on how to obtain these later in the paper where we apply the pipeline to a large collection of ChIP-exo/nexus experiments . 
+ Figure 2B and C present the typical behavior of the Region Composition and Forward Strand Ratio ( FSR ) distribution plots , both of which quantify the strand imbalance as part of the QC pipeline . 
+ The Region Composition plot depicts how quickly the ratio of islands exclusively composed of fragments on a single strand among the islands with comparable read depth decreases as a function of read depth of the island . 
+ In a high quality sample , the proportion of islands with reads from only one strand is expected to decrease rapidly as we consider higher depth regions . 
+ In contrast , this proportion remains approximately constant in lower quality samples . 
+ The Forward Strand Ratio distribution plot illustrates how quickly the quantiles of the FSR approach 0.5 , the expected FSR value in high quality samples . 
+ Even though not every region in a ChIP-exo experiment is perfectly balanced , the most enriched regions are expected to have approximately equal numbers of reads in both strands . 
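The two strand-imbalance diagnostics can be approximated numerically. The sketch below assumes per-island depth and FSR values are already available and, as a simplification, groups islands by a minimum-depth cutoff rather than by "comparable" depth; the function name and the toy values are hypothetical.

```python
import numpy as np


def strand_balance_summary(depth, fsr, min_depths, probs=(0.1, 0.5, 0.9)):
    """For each minimum depth, report the fraction of single-stranded
    islands and the requested FSR quantiles among the retained islands."""
    depth = np.asarray(depth)
    fsr = np.asarray(fsr, dtype=float)
    rows = []
    for d in min_depths:
        keep = depth >= d
        if not keep.any():
            continue  # no island reaches this depth
        sel = fsr[keep]
        # Islands made of reads from one strand only have FSR 0 or 1.
        single = float(np.mean((sel == 0.0) | (sel == 1.0)))
        quantiles = [float(q) for q in np.quantile(sel, probs)]
        rows.append((d, single, *quantiles))
    return rows


depth = [1, 1, 2, 4, 10, 20]
fsr = [0.0, 1.0, 0.5, 0.25, 0.5, 0.6]
rows = strand_balance_summary(depth, fsr, min_depths=[1, 2])
```

In a high quality sample the single-stranded fraction should drop quickly as the cutoff grows, and the outer quantiles should move toward 0.5.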
+ Application and validation of ChIPexoQual with the FoxA1 ChIP-exo dataset . 
+ We next illustrate the proposed QC pipeline using FoxA1 ChIP-exo datasets , which were profiled at comparable sequencing depths in three biological replicates of mouse liver cells . 
+ We first investigated various thresholds for partitioning the mouse genome using these ChIP-exo samples . 
+ We specifically considered small thresholds because larger thresholds are likely to partition wider regions into smaller ones , discard parts of wide regions , and ignore background regions completely . 
+ With this in mind we processed the FoxA1 datasets with the following thresholds 1 , 5 , 25 and 50 ( Supplementary Figure S15 ) . 
+ We observed that , in a high-quality experiment , if multiple thresholds are small and close to each other , then the partitions are similar and the distributions of the proposed metrics are similar as well . 
+ Hence , we decided to use the default threshold of h * = 1 when analyzing the FoxA1 samples . 
+ Figure 3A presents URC versus ARC plots for all three replicates . 
+ The first and third replicates exhibit a defined decreasing trend in URC as the ARC increases . 
+ This indicates that these samples exhibit a higher ChIP enrichment than the second replicate . 
+ On the other hand , the overall URC level from the first two replicates is higher than that of the third replicate , indicating that the libraries for the first two replicates are more complex than that of the third replicate . 
+ Figures 3B and C display the Region Composition and FSR distribution plots , which highlight specific problems with replicates 2 and 3 . 
+ Figure 3B exhibits apparent decreasing trends in the proportions of regions formed by fragments exclusively on one strand . 
+ High quality experiments tend to show exponential decay in the proportion of single stranded regions , while for the lower quality experiments , the trend may be linear or even constant ( Supplementary Figure S21 ) . 
+ FSR distributions of both replicates 2 and 3 are more spread around their respective medians ( Figure 3C ) . 
+ The rate at which the 0.1 and 0.9 quantiles approach the median indicates the aforementioned lower enrichment in the second replicate and the low complexity in the third one . 
+ In addition to step 4 , when a set of blacklisted regions is available we divide the ChIP-exo/nexus islands into two groups based on whether or not they overlap the blacklisted regions . 
+ Figure 3D illustrates that , first , the β1 and β2 scores are robust to the existence of islands in the blacklisted regions . 
+ Second , for the islands overlapping the blacklisted regions , both summary metrics are significantly higher in both the overall level and variance . 
+ Therefore , this stratified analysis further indicates that the β1 and β2 scores provide good overall assessments of the datasets and can clearly separate blacklist regions . 
+ We conclude that replicate 1 is of higher quality than both replicates 2 and 3 . 
+ We validate this observation with a motif analysis on the candidate binding regions identified from these replicates . 
+ A conservative approach to identify high quality binding regions ( Materials and Methods ) reveals 7014 , 1855 , and 2187 regions for replicates 1 , 2 and 3 , respectively . 
+ The lower number of enriched regions from replicate 2 is consistent with the lower ChIP enrichment pattern in the URC vs. ARC diagnostic plot . 
+ Figure 4A compares the FIMO scores among the three replicates , not surprisingly confirming that the first replicate exhibits the highest quality . 
+ Figure 4B displays the average normalized read coverage around the actual motif locations in the candidate binding regions . 
+ These coverage plots reveal that the ChIP signal is slightly more defined for the first and third replicates than for the second one , indicating stronger overall ChIP enrichment in these samples than in the second replicate . 
+ Figure 4C compares FSR distributions of the ChIP islands overlapping the union of the peaks across the three replicates and highlights that the samples largely satisfy the ` peak-pair ' assumption because peaks with at least one motif tend to be more strand-balanced . 
+ Furthermore , samples with lower library complexity appear to exhibit heavier FSR tails . 
+ High sequencing depth may confound low-complexity library issues . 
+ We evaluated every sample listed in Tables 1 and 2 with the ChIPexoQual QC pipeline ( Supplementary Figures S16 -- S27 ) . 
+ A key observation from this large scale analysis is that the URC versus ARC plots typically display one of the three patterns captured in the FoxA1 study . 
+ We will refer to these as pattern I ( FoxA1 replicate 1 ) , II ( FoxA1 replicate 2 ) , and III ( FoxA1 replicate 3 ) , respectively . 
+ Pattern III , where the two arms along ARC are not distinguishable , can arise from either a low-complexity library or high sequencing depth . 
+ For example , all three replicates of the TBP ChIP-exo from K562 , with sequencing depths between ∼ 60M and 115M reads , and replicate two of TBP ChIP-nexus in K562 , with a sequencing depth of ∼ 130M reads , exhibit this pattern . 
+ A simple but effective strategy to distinguish the two plausible scenarios from Pattern III is to apply the QC pipeline to sub-samples randomly generated from the full dataset at varying sequencing depths ( sub-sampling analysis module ) . 
+ We applied this strategy by sub-sampling 20M to 50M reads in 10M increments , a range that represents the sequencing depths of the human samples we are using in this paper , from the TBP samples . 
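The sub-sampling module can be sketched as drawing reads without replacement at each target depth and re-running the QC steps on every draw. The function below is a minimal illustration; its name, the fixed seed and the toy depths are assumptions, and real data would use the 20M to 50M range described above.

```python
import numpy as np


def subsample_reads(reads, depths, seed=0):
    """Return {depth: reads drawn without replacement}, skipping depths
    that exceed the number of available reads."""
    rng = np.random.default_rng(seed)
    n = len(reads)
    out = {}
    for d in depths:
        if d > n:
            continue  # cannot sub-sample beyond the full dataset
        idx = np.sort(rng.choice(n, size=d, replace=False))
        out[d] = [reads[i] for i in idx]
    return out


# Toy stand-in for a read collection: 100 read identifiers.
reads = list(range(100))
subs = subsample_reads(reads, depths=[20, 30, 40, 50, 200])
```

If the two arms of the URC vs. ARC plot reappear in the lower-depth sub-samples, high sequencing depth rather than low library complexity is the more likely explanation for Pattern III.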
+ URC vs. ARC diagnostics of these sub-samples ( Supplementary Figures S30 to S33 ) indicate that , among the four TBP samples with this pattern , replicates two and three of K562 ChIP-exo suffer from low-complexity library issues , whereas the other samples exhibit the pattern specific to high quality samples . 
+ To confirm this implication , we compared the top FIMO scores ( 18 ) of the TBP motif for the ChIP-exo and ChIP-nexus replicates . 
+ Figure 4D illustrates that the first ChIP-exo replicate and ChIP-nexus replicates identify binding events with consistently better motif matches than the other ChIP-exo replicates . 
+ This implication on overall quality is further confirmed by the large separation of the β1 and β2 scores between regions that do and do not overlap with the blacklist regions for these high quality samples ( Supplementary Figures S28-S29 ) . 
+ Figure 4E compares the FSR distributions of ChIP islands overlapping the union of peaks across all TBP samples by stratifying them with respect to TBP motif occurrence . 
+ Overall , while the peaks in high quality experiments are more likely to have a motif occurrence if they are balanced , many strand-unbalanced peaks with motifs are also identified . 
+ Specifically , the proportion of peaks with FSR smaller than 0.3 or larger than 0.7 varied between 0.38 -- 0.43 and 0.20 -- 0.22 for the ChIP-exo and the ChIP-nexus experiments , respectively . 
+ This further confirms the conclusion of the ChIPexoQual QC pipeline . 
+ Summary statistics for the URC versus ARC diagnostic plot . 
+ We next utilized QC pipeline results for all the samples ( Tables 1 and 2 ) and quantified the relationship between ARC and URC by fitting a reparametrized regression model of URC as a function of ARC . 
+ Specifically , we considered a model of read depth ( Di ) on the number of positions with at least one aligned read ( Ui ) and the width of the island ( Wi ) , i.e. , Di = β1 Ui + β2 Wi + εi , where εi represents the random error term . 
+ As we discuss in Materials and Methods , this parametrization has a direct connection to URCi = κ / ARCi + γ + εi , which aims to recapitulate the relationship in the URC vs. ARC plots . 
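The link between the two parametrizations follows from dividing the depth model by Di: since Ui / Di is the URC and Wi / Di is 1 / ARC, the model implies that URC is linear in 1 / ARC, with intercept 1 / β1 and slope − β2 / β1. A minimal sketch of the fit is given below, assuming ordinary least squares without an intercept. The resampling helper reflects one reading of "estimated by sampling experiments" (repeated fits on random subsets of islands); that reading, the function names and the toy numbers are assumptions.

```python
import numpy as np


def fit_depth_model(U, W, D):
    """Least-squares fit of D_i = beta1 * U_i + beta2 * W_i (no intercept)."""
    X = np.column_stack([U, W]).astype(float)
    coef, *_ = np.linalg.lstsq(X, np.asarray(D, dtype=float), rcond=None)
    return float(coef[0]), float(coef[1])


def resampled_estimates(U, W, D, n_islands, n_draws, seed=0):
    """Median beta1 and beta2 over fits on random subsets of islands."""
    U, W, D = (np.asarray(a, dtype=float) for a in (U, W, D))
    rng = np.random.default_rng(seed)
    fits = []
    for _ in range(n_draws):
        idx = rng.choice(len(D), size=n_islands, replace=False)
        fits.append(fit_depth_model(U[idx], W[idx], D[idx]))
    return tuple(float(m) for m in np.median(np.array(fits), axis=0))


# Noise-free toy islands generated with beta1 = 3 and beta2 = 0.1.
U = np.array([1, 2, 3, 4, 5, 6])
W = np.array([10, 30, 20, 50, 40, 70])
D = 3.0 * U + 0.1 * W
beta1, beta2 = fit_depth_model(U, W, D)
med_beta1, med_beta2 = resampled_estimates(U, W, D, n_islands=4, n_draws=5)
```

Taking the median over draws is a design choice made here to keep the summary robust to a few atypical islands; other summaries of the resampled fits would serve equally well.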
+ Figure 5A displays estimated overall change in depth ( β1 ) as the number of positions with at least one aligned read varies across a large collection of ChIP-exo samples from eukaryotic genomes . 
+ The parameter can be interpreted as the limiting ( i.e. , large depth ) URC of a sample . 
+ As discussed earlier , high quality ChIP-exo samples are expected to have two arms in the URC versus ARC plots : one with low ARC and varying URC and another with a decreasing URC as ARC increases and stabilizes . 
+ When the ChIP-exo sample is not deeply sequenced , high values of β1 in Figure 5A indicate that the library complexity is low . 
+ In contrast , lower values correspond to higher quality ChIP-exo experiments . 
+ Taking into account the depths of these samples and visualizing all the diagnostic plots ( Supplementary Figures S16 -- S27 ) , we conclude that samples with estimated β1 values < 10 seem to be high quality samples . 
+ We interpret β2 as the overall change in depth as the width varies and display its estimates across all the eukaryotic samples in Figure 5B . 
+ Under perfect digestion by λ-exo , most of the reads aligned to binding regions are expected to accumulate around binding events . 
+ In a high quality sample , the overall variation in depth is expected to be small as the overall widths of the regions change . 
+ This is because the majority of reads are expected to be located tightly around the binding sites and , as a result , the region width should not significantly affect its depth . 
+ In contrast , low quality sample regions are usually composed of a fixed proportion of reads aligned to a small number of unique positions ; hence , the overall change in depth as the width varies is proportional to this fixed proportion . 
+ For example , although the third replicate of the TBP ChIP-exo experiment has comparable sequencing depth to the second replicate of the TBP ChIP-nexus experiment ( Figure 5B ) , β2 is considerably higher for the ChIP-exo experiment . 
+ This potentially indicates that additional sequencing reads in comparison to replicates 1 and 2 are scattered around new positions instead of accumulating on the existing binding sites . 
+ In summary , samples with estimated β2 values close to zero can be considered as high quality samples . 
+ The interaction between β1 and β2 has implications regarding the quality of ChIP-exo and ChIP-nexus samples . 
+ When either β1 is large or β2 is different from zero , potentially owing to the high sequencing depth of the sample , we suggest randomly sub-sampling reads to form samples of lower depth and evaluating the sub-samples with the QC pipeline . 
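The guidance on the two scores can be condensed into a small decision helper. This is only a sketch: the cutoff of 10 for β1 comes from the text, whereas the tolerance used to decide that β2 is "close to zero" is a hypothetical value chosen for illustration, as are the function name and the returned labels.

```python
def qc_flag(beta1, beta2, beta1_max=10.0, beta2_tol=0.1):
    """Rough reading of the beta1/beta2 guidance.

    beta1_max follows the value quoted in the text; beta2_tol is a
    hypothetical tolerance for 'close to zero'.
    """
    if beta1 < beta1_max and abs(beta2) <= beta2_tol:
        return "high quality"
    # Large beta1 or non-zero beta2 may reflect depth or low complexity,
    # so the suggested next step is sub-sampling, not outright rejection.
    return "sub-sample and re-evaluate"


flags = [qc_flag(5.0, 0.01), qc_flag(25.0, 0.01), qc_flag(5.0, 0.5)]
```

A flag of this kind is meant to complement the diagnostic plots, which remain the primary evidence.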
+ As an illustration , we apply this strategy to the three replicates of TBP ChIP-exo in K562 ( 22 ) and the second replicate from the K562 ChIP-nexus experiments ( 2 ) . 
+ Figure 5C reveals a much higher β1 ( and larger than 10 ) for replicates 2 and 3 compared to replicate 1 and both ChIP-nexus samples . 
+ Figure 5D illustrates that the β2 estimates remain approximately constant in ChIP-nexus sub-samples and sub-samples of the first replicate of ChIP-exo , while they increase for the second and third ChIP-exo replicates . 
+ This suggests that these two ChIP-exo replicates have low library complexity and overall lower quality than the ChIP-nexus samples , regardless of the fact that all three experiments are deeply sequenced with more than 90M reads each . 
+ Furthermore , the ChIPexoQual diagnostic plots for each subsample ( Supplementary Figures S30 -- S33 ) illustrate that the two arms of the ARC vs. URC plots are clearly visible in moderate depth sub-samples of TBP ChIP-nexus data . 
+ Similarly , Supplementary Figure S32 illustrates that , as expected , the suggested subsampling strategy is also effective for the E1 and E2 samples , which are deeply sequenced , relative to the E. coli genome . 
+ ChIPexoQual R package
+ We implemented ChIPexoQual as an R/Bioconductor package . 
+ ChIPexoQual achieves fast processing through parallel computing . 
+ Supplementary Figure S36 provides ChIPexoQual 's processing times for a collection of samples representing different sequencing depths of the ChIP-exo/nexus experiments listed in Table 2 using four parallel threads on a server with 24 AMD Opteron 2.2 GHz processors . 
+ This plot shows that ChIPexoQual requires between 125 and 640 s ( 80 and 420 when the aligned reads are already loaded into memory ) for processing a ChIP-exo/nexus sample . 
+ CONCLUSION
+ We presented a systematic exploration of several ChIP-exo/nexus datasets . 
+ We provided a list of factors that reflect the quality of a ChIP-exo/nexus experiment and developed an easy to use QC pipeline , implemented into an R/Bioconductor package called ChIPexoQual . 
+ ChIPexoQual takes aligned reads as input and automatically generates several diagnostic plots and summary measures that enable assessing enrichment and library complexity . 
+ Our analysis of several datasets indicated that the QC pipeline only requires a set of aligned reads to provide a global overview of the quality of a given ChIP-exo dataset . 
+ The implications of the diagnostic plots and the summary measures align well with more elaborate analysis that is computationally more expensive to perform and/or requires additional inputs that often may not be available , such as motif occurrences in a set of high quality regions or resolution analysis based on a gold-standard . 
+ The ChIPexoQual package ( version 1.0.0 ) is available from Bioconductor ( http://bioconductor.org/packages/release/bioc/html/ChIPexoQual.html ) . 
+ The Bioconductor version does not currently include the blacklist submodule . 
+ A stable version ( version 0.99.15 ) with this additional submodule is available at https://github.com/welch16/ChIPexoQual/tree/devel . 
+ DATA AVAILABILITY
+ Escherichia coli ChIP-exo sequence and processed data are available under the NCBI 's Gene Expression Omnibus ( 28 ) and are accessible through GEO series accession number GSE84830 ( http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE84830 ) . 
+ Supplementary Data are available at NAR Online.
+ ACKNOWLEDGMENTS
+ R.W. acknowledges the funding provided by CONACYT . 
+ Authors ' contributions : R.W. and S.K. developed the ChIPexoQual pipeline . 
+ R.W. implemented the ChIPexoQual pipeline and D.C. implemented the dPeak package . 
+ R.W. and D.C. performed the analysis . 
+ J.G. and R.L. performed the E. coli sequencing experiments . 
+ R.W. and S.K. wrote the manuscript . 
+ All authors approved the final draft . 
+ FUNDING
+ National Institutes of Health ( NIH ) [ HG003747 and HG007019 to S.K. ] ( in part ) ; NIH [ GM38660 to R.L. ] ; CONACYT [ 215196 to R.W. ] . 
+ Funding for open access charge : NHGRI . 
+ Conflict of interest statement . 
+ None declared .
\ No newline at end of file
--- a/data/TEXT_FILES/notuseful_txt/29066548.txt 0 → 100644
View file @27818a9
+++ b/data/TEXT_FILES/notuseful_txt/29066548.txt 0 → 100644
View file @27818a9
+ Genome-Wide Discovery of Genes
+ Kelvin G. K. Goh , a , b Minh-Duy Phan , a , b Brian M. Forde , a , b , c Teik Min Chong , d Wai-Fong Yin , d Kok-Gan Chan , d Glen C. Ulett , e Matthew J. Sweet , b , f Scott A. Beatson , a , b , c Mark A. Schembri , a , b 
+ School of Chemistry and Molecular Biosciences , University of Queensland , Brisbane , Queensland , Australia ( a ) ; Australian Infectious Diseases Research Centre , University of Queensland , Brisbane , Queensland , Australia ( b ) ; Australian Centre for Ecogenomics , University of Queensland , Brisbane , Queensland , Australia ( c ) ; Division of Genetics and Molecular Biology , Institute of Biological Sciences , Faculty of Science , University of Malaya , Kuala Lumpur , Malaysia ( d ) ; School of Medical Science , and Menzies Health Institute Queensland , Griffith University , Gold Coast , Australia ( e ) ; Institute for Molecular Bioscience , University of Queensland , Brisbane , Queensland , Australia ( f ) 
+ ABSTRACT Uropathogenic Escherichia coli ( UPEC ) is a major cause of urinary tract and bloodstream infections and possesses an array of virulence factors for colonization , survival , and persistence . 
+ One such factor is the polysaccharide K capsule . 
+ Among the different K capsule types , the K1 serotype is strongly associated with UPEC infection . 
+ In this study , we completely sequenced the K1 UPEC urosepsis strain PA45B and employed a novel combination of a lytic K1 capsule-speciﬁc phage , saturated Tn5 transposon mutagenesis , and high-throughput transposon-directed insertion site sequencing ( TraDIS ) to identify the complement of genes required for capsule production . 
+ Our analysis identified known genes involved in capsule biosynthesis , as well as two additional regulatory genes ( mprA and lrhA ) that we characterized at the molecular level . 
+ Mutation of mprA resulted in protection against K1 phage-mediated killing , a phenotype restored by complementation . 
+ We also identiﬁed a signiﬁcantly increased unidirectional Tn5 insertion frequency upstream of the lrhA gene and showed that strong expression of LrhA induced by a constitutive Pcl promoter led to loss of capsule production . 
+ Further analysis revealed that loss of MprA or overexpression of LrhA affected the transcription of capsule biosynthesis genes in PA45B and increased sensitivity to killing in whole blood . 
+ Similar phenotypes were also observed in UPEC strains UTI89 ( K1 ) and CFT073 ( K2 ) , demonstrating that the effects were neither strain nor capsule type speciﬁc . 
+ Overall , this study deﬁned the genome of a UPEC urosepsis isolate and identiﬁed and characterized two new regulatory factors that affect UPEC capsule production . 
+ IMPORTANCE Urinary tract infections ( UTIs ) are among the most common bacterial infections in humans and are primarily caused by uropathogenic Escherichia coli ( UPEC ) . 
+ Many UPEC strains express a polysaccharide K capsule that provides protection against host innate immune factors and contributes to survival and persistence during infection . 
+ The K1 serotype is one example of a polysaccharide capsule type and is strongly associated with UPEC strains that cause UTIs , bloodstream infections , and meningitis . 
+ The number of UTIs caused by antibiotic-resistant UPEC is steadily increasing , highlighting the need to better understand factors ( e.g. , the capsule ) that contribute to UPEC pathogenesis . 
+ This study describes the original and novel application of lytic capsule-speciﬁc phage killing , saturated Tn5 transposon mutagenesis , and high-throughput transposon-directed insertion site sequencing to deﬁne the entire complement of genes required for capsule production in UPEC . 
+ Our comprehensive approach uncovered new genes involved in the regulation of this key virulence determinant . 
+ Urinary tract infections ( UTIs ) are among the most common human bacterial infections and cause signiﬁcant morbidity , with roughly 175 million cases estimated to occur annually across the globe ( 1 ) . 
+ It is estimated that up to 50 % of women will develop a UTI in their lifetime , and UTIs account for 1 million hospitalizations and ~ $ 3.5 billion in medical and societal expenditure each year in the United States alone ( 2 , 3 ) . 
+ UTIs usually involve infection of the bladder ( cystitis ) but can also develop into kidney infection ( pyelonephritis ) and bloodstream infection ( urosepsis ) . 
+ Recurrent UTIs are also of major concern ; ~ 20 to 30 % of women with an acute UTI experience a relapsing episode within 3 to 4 months ( 3 ) , and these infections are frequently associated with antibiotic resistance . 
+ Approximately 75 % of all UTIs are caused by uropathogenic Escherichia coli ( UPEC ) ( 2 ) . 
+ UPEC strains largely belong to the E. coli phylogenetic group B2 or D and are often clonal , with the most common sequence types ( STs ) isolated worldwide being ST69 , ST73 , ST95 , and ST131 ( 4 ) . 
+ UPEC possesses a range of virulence factors , including ﬁmbrial adhesins , secreted toxins , iron acquisition systems , ﬂagella , and cell surface polysaccharides , that enable them to colonize the urinary tract and cause disease ( 5 , 6 ) . 
+ The accessory gene repertoire of UPEC can vary extensively between different strains and clones , leading to the observation that virulence is dependent on a combination of multiple factors . 
+ For instance , changes in the regulation of speciﬁc virulence factors , such as type 1 ﬁmbriae , have been observed in the globally disseminated multidrug-resistant ST131 clonal group of UPEC ( 7 ) . 
+ The increasing global incidence of UTIs caused by antibiotic-resistant UPEC highlights the need to better understand UPEC pathogenesis ( 8 , 9 ) . 
+ The O antigen and capsule comprise two cell surface polysaccharides that play important roles in UPEC virulence . 
+ There are more than 180 different serotypes of the bacterial lipopolysaccharide ( LPS ) , which consists of a conserved lipid A core region and a variable O antigen region ( 10 , 11 ) . 
+ Variation in the O antigen is caused by altered sugar residues and linkage patterns within the component repeating subunits . 
+ LPS mediates UPEC resistance to human serum , and common O antigen types frequently identiﬁed among human UPEC isolates include O1 , O2 , O4 , O6 , O7 , O8 , O16 , O18 , O25 , and O75 ( 12 ) . 
+ The E. coli polysaccharide capsule is also highly variable , with more than 80 different capsule types described ( 13 ) . 
+ E. coli capsules are classiﬁed into four major groups based on the genetic organization of the capsule gene cluster , as well as their mechanism of biosynthesis and assembly ( 13 , 14 ) . 
+ Group 2 capsules composed of different K antigens ( e.g. , K1 , K2 , K5 , and K100 ) are commonly expressed by many UPEC strains ( 15 ) . 
+ The genes involved in the biosynthesis of group 2-type capsules are arranged in three distinct regions ( 13 ) . 
+ Regions I ( kpsFEDUCS ) and III ( kpsMT ) are conserved in all group 2 capsule gene clusters and encode a transmembrane complex involved in the export and assembly of the capsular polysaccharides ( 14 , 16 , 17 ) . 
+ Region II is serotype speciﬁc and encodes enzymes responsible for synthesizing the capsular polysaccharide ( 18 ) . 
+ The group 2 gene cluster is transcribed as two polycistronic operons and is driven by two temperature-regulated promoters upstream of regions I and III ( 19 , 20 ) . 
+ Transcription of region III initiates upstream of kpsM and proceeds through to region II , aided by the antiterminator protein RfaH ( 20 ) . 
+ Several other global regulators are also involved in controlling transcription of the capsule genes , including the histone-like nucleoid-structuring ( H-NS ) protein , the DNA-binding regulator protein SlyA , the ribosome-binding protein TypA , and integration host factor ( IHF ) ( 19 , 21 , 22 ) . 
+ The capsule provides protection against phagocytosis and complement-mediated killing ( 23 , 24 ) , and its contribution to UPEC virulence is well established . 
+ In a mutagenesis screen of UPEC strain CFT073 , defects in capsular polysaccharide transport were identiﬁed that attenuated pathogenicity in a murine model of UTI ( 25 ) . 
+ Both the K1 and K2 capsules provide protection from complement-mediated killing , which has been demonstrated by increased survival of UPEC compared to isogenic capsule mutants following incubation in human serum and human blood ( 26 , 27 ) . 
+ In mice , the K1 capsule is also required for the development of intracellular bacterial communities ( IBCs ) , which are bioﬁlm-like bacterial aggregates that form in superﬁcial bladder epithelial cells during the early stages of acute UTI and contribute to host immune evasion ( 28 ) . 
+ Additionally , the genes involved in capsule synthesis are upregulated by UPEC during UTI ( 29 , 30 ) . 
+ Among the different K capsule types , the K1 serotype is strongly associated with strains that cause UTI , bloodstream infection , and meningitis ( 31 , 32 ) . 
+ The K1 capsule is made up of a chain of sialic acid residues that are synthesized by enzymes encoded by genes in region II of the capsule locus ( neuDBACES ) . 
+ This polysaccharide is identical to the polysialic acid present on some human cells , and hence the K1 antigen is poorly immunogenic due to molecular mimicry ( 33 ) . 
+ Lytic K1 phages have been used as a diagnostic tool to detect the surface expression of the K1 capsule in UPEC ; wild-type strains expressing the K1 capsule are rapidly killed upon encountering the phage due to its speciﬁc attachment to the capsule , which then triggers a lytic life cycle , whereas unencapsulated mutants are protected against phage-mediated killing ( 26 ) . 
+ Given the importance of the K1 capsule in virulence , we sought to deﬁne the entire complement of genes involved in its biosynthesis by using a novel approach that combined phage-speciﬁc killing with a high-throughput unbiased forward genetic screen . 
+ To this end , we ﬁrst assessed the distribution of group 2 capsules among publicly available E. coli complete genomes , and we also determined the full genome sequence of the K1 UPEC strain PA45B , which was originally isolated from the blood of a patient with pyelonephritis . 
+ Next , a saturated transposon mutant library of PA45B was generated and incubated with a lytic K1 capsule-speciﬁc phage , and genes required for capsule biosynthesis were identiﬁed en masse by using transposon-directed insertion site sequencing ( TraDIS ) . 
+ In addition to known capsule genes , our analysis identiﬁed two regulators ( mprA and lrhA ) which were further characterized by the generation of deﬁned mutants and examination of their phenotypic properties . 
+ Overall , our screen deﬁned the complement of genes required for UPEC K1 capsule expression . 
+ RESULTS
+ Distribution of group 2 capsule types in completely sequenced E. coli genomes . 
+ To assess the distribution of group 2 capsules among different E. coli isolates and to determine whether there is any correlation between capsule type and phylogroup / sequence type , we performed an in silico analysis on a collection of 126 completely sequenced genomes publicly available in the NCBI database ( see Data Set S1A in the supplemental material ) . 
+ Based on this analysis , 36 strains in the collection possessed a complete group 2 capsule gene cluster , of which K1 represented the most common K type ( 12/36 strains ) ( Data Set S1B ) . 
+ Eight strains possessed a group 2 capsule that could not be typed in silico , while an additional nine strains contained an incomplete group 2 gene cluster ( one to four genes missing ) . 
+ Genomic analysis of UPEC strain PA45B . 
+ PA45B is part of a previously characterized collection of UPEC strains isolated from the blood of patients presenting with urosepsis at the Princess Alexandra Hospital ( Brisbane , Australia ) ( 34 ) . 
+ PA45B belongs to ST95 and phylogroup B2 . 
+ It expresses the K1 capsule , which allowed us to utilize a lytic K1 capsule-speciﬁc phage to kill encapsulated mutants in our TraDIS experiment . 
+ Moreover , PA45B can be genetically manipulated , which allowed us to generate targeted deletion mutants to validate hits obtained from our TraDIS analysis . 
+ The complete genome of PA45B was determined and shown to consist of a single circular chromosome comprised of 5,074,754 bp ( 50.5 % GC content ) that encodes 4,745 putative protein-coding genes ( GenBank accession number CP021288 ) . 
+ Based on in silico typing , PA45B is an O2 : K1 : H7 strain . 
+ PA45B possesses an array of UPEC virulence factors , which include several iron acquisition systems , ﬁmbrial adhesins , autotransporter proteins , and the capsule ( Data Set S1C ) . 
+ PA45B carries an IncF-type plasmid that we refer to as pPA45B ( 147,172 bp ; 51.6 % GC content ) . 
+ Plasmid pPA45B contains 177 predicted protein-coding genes and three genes associated with antibiotic resistance ( aadA1 , blaTEM-1B , and sul1 ) . 
+ Methylome analysis of PA45B identiﬁed two distinct DNA recognition motifs ( Data Set S1D ) . 
+ Phenotypic conﬁrmation of PA45B capsule type . 
+ A lytic K1 phage was used to conﬁrm the expression of the K1 capsule on the surface of wild-type PA45B . 
+ Using the standard cross-brush method applied for serotyping , PA45B growth was prevented after contact with the phage suspension ( Fig. 1A ) . 
+ The speciﬁcity of the phage to the K1 capsule of PA45B was demonstrated through the construction of an isogenic kpsD mutant strain which was not susceptible to killing by the K1 phage . 
+ The kpsD gene encodes a 60-kDa outer membrane protein that facilitates the transport of capsular polysaccharides across the outer membrane . 
+ The absence of kpsD causes the buildup of capsular polysaccharides in the periplasmic space and loss of surface expression of the capsule ( 35 ) . 
+ As a control , CFT073 ( K2 capsule ) was also tested and shown to be resistant to K1-mediated phage lysis ( Fig. 1A ) . 
+ We also performed a phage-killing assay , in which strains PA45B , PA45BkpsD , CFT073 , and CFT073kpsD were cultured in the presence of K1 phage . 
+ Consistent with our analysis using the cross-brush method , the optical density at 600 nm ( OD600 ) of wild-type PA45B was dramatically reduced when incubated in the presence of the K1 phage , whereas the unencapsulated PA45BkpsD mutant and CFT073 exhibited normal growth ( Fig. 1B ) . 
+ Development of an assay to enrich for survival of capsule mutants . 
+ To develop an assay to enrich for capsule mutants and identify capsule-associated genes by TraDIS , we tested different combinations of PA45B and mutant PA45BkpsD in mixed growth assays in the presence of K1 phage . 
+ We mixed approximately 2 × 10^8 CFU of wild-type PA45B ( ~ 99 % ) with 2 × 10^6 CFU of mutant strain PA45BkpsD ( ~ 1 % ) and incubated the culture with or without the K1 phage . 
+ Plating of the cultures on both normal LB plates and plates supplemented with chloramphenicol allowed us to determine the number of unencapsulated PA45BkpsD mutant colonies at different time points . 
+ In this assay , the percentage of PA45BkpsD mutant colonies steadily increased in the presence of K1 phage but remained essentially unchanged when no phage was added ( Fig. S1 ) . 
+ Taken together , these results demonstrate that K1-speciﬁc phage-mediated killing can be used to enrich for unencapsulated mutants in a mixed culture containing both encapsulated and unencapsulated cells . 
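+ The plate counts described above can be turned into a mutant percentage with a short calculation. The sketch below is illustrative only: it assumes that LB plates count all cells, that chloramphenicol plates count only the PA45BkpsD mutant, and the function name is hypothetical rather than taken from the study.

```python
def mutant_fraction(cfu_lb: float, cfu_cm: float) -> float:
    """Fraction of a mixed culture made up of chloramphenicol-resistant cells.

    Assumption: cfu_lb counts every viable cell (wild type + mutant) and
    cfu_cm counts only the chloramphenicol-resistant mutant.
    """
    if cfu_lb <= 0:
        raise ValueError("total CFU must be positive")
    return cfu_cm / cfu_lb


# Starting mixture from the text: ~2e8 wild-type CFU plus ~2e6 mutant CFU.
start_fraction = mutant_fraction(cfu_lb=2e8 + 2e6, cfu_cm=2e6)
print(f"mutant share at time zero: {start_fraction:.2%}")
```

+ Applying the same function to counts from later time points would show the rise in mutant share under phage selection.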
+ Identiﬁcation of genes associated with capsule production in PA45B . 
+ To facilitate a large-scale genetic screen for the identiﬁcation of genes involved in K1 capsule synthesis by PA45B , we ﬁrst generated a library of approximately 1 million mutants by using a mini-Tn5 transposon carrying a chloramphenicol resistance gene . 
+ Next , we used this PA45B mutant library in our K1 phage-mediated killing assay to identify genes associated with capsule expression . 
+ In this assay , pools of roughly 2 × 10^8 mutant bacteria were added to four different ﬂasks containing 100 ml of LB broth , with two ﬂasks containing the K1 phage ( test pool ) and two ﬂasks with no phage ( control pool ) . 
+ The phage treatment permitted the growth and enrichment of a small number of unencapsulated mutants , while the majority of bacteria that expressed the K1 capsule were lysed . 
+ After 12 h of incubation at 37 °C , 1 ml of culture was extracted from each ﬂask and washed once in phosphate-buffered saline ( PBS ) to remove any cell debris . 
+ PA45B genomic DNA was extracted from each pool and sequenced with a multiplex TraDIS strategy ( Fig. 2 ) . 
+ The test and control pools yielded a total of 6,042,698 Tn5-speciﬁc reads , of which 95.2 % mapped to the PA45B genome . 
+ Further analysis of the control data revealed 430,791 unique insertion sites , which equates to one insertion approximately every 12 bp across the chromosome and emphasizes the high level of saturated mutagenesis and coverage in our library . 
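+ The quoted insertion density follows directly from the chromosome length and the number of unique insertion sites reported in the text. A minimal check, with variable names chosen here for illustration:

```python
# Figures taken from the text: chromosome length and unique insertion sites.
chromosome_bp = 5_074_754
unique_insertion_sites = 430_791

# Average distance between neighbouring insertion sites, in base pairs.
mean_spacing_bp = chromosome_bp / unique_insertion_sites
print(f"one insertion roughly every {mean_spacing_bp:.1f} bp")
```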
+ We screened for the enrichment of insertions in the test pool compared to the control pool , as mutants containing insertions in genes not related to capsule expression would be lost and thus underrepresented . 
+ Subsequently , our TraDIS analysis identiﬁed a set of genes involved in capsule production , using a stringent threshold cutoff of a log2 fold change ( FC ) in read counts of > 5 and an adjusted P value of < 0.001 . 
+ This included the majority of genes within the K1 capsule gene cluster ( kpsFEDUCSMT and neuDBA ) , two previously described regulators of capsule expression ( typA and rfaH ) , a new regulator ( mprA ) ( Table 1 ) , and 12 other genes ( Data Set S1E ) . 
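+ The hit-calling step can be sketched as a simple filter. This is an illustration, not the authors' pipeline: it assumes the cutoffs are read as log2 FC above 5 and adjusted P below 0.001, and the record type, function name, gene labels, and numbers are all hypothetical.

```python
from typing import NamedTuple


class GeneResult(NamedTuple):
    gene: str        # gene label (placeholder names below)
    log2_fc: float   # log2 fold change, test pool vs. control pool
    adj_p: float     # multiple-testing-adjusted P value


def select_hits(results, min_log2_fc=5.0, max_adj_p=0.001):
    """Return genes enriched for insertions under both cutoffs."""
    return [r.gene for r in results
            if r.log2_fc > min_log2_fc and r.adj_p < max_adj_p]


# Made-up values purely to exercise the filter.
example = [
    GeneResult("geneA", 7.2, 1e-6),    # passes both cutoffs
    GeneResult("geneB", 5.5, 0.02),    # fails the P value cutoff
    GeneResult("geneC", 3.1, 1e-8),    # fails the fold-change cutoff
    GeneResult("geneD", 6.0, 0.0005),  # passes both cutoffs
]
hits = select_hits(example)
print(hits)
```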
+ Genetic characterization and validation of unencapsulated mutants . 
+ To extend and conﬁrm our TraDIS data , we generated a series of deﬁned deletion mutants via λ-Red-mediated recombination and tested the phage sensitivity of the mutants . 
+ The genes mutated included rfaH , typA , and mprA ( Table 1 ) , as well as 10 of the 12 other functionally uncharacterized genes identiﬁed by TraDIS ( Data Set S1E ) . 
+ In this assay , only the PA45B rfaH , typA , and mprA mutants exhibited normal growth after crossing the phage suspension , indicating that these genes were required for capsule production ( Fig. 3A ; Fig. S2 ) . 
+ The 10 other genes were regarded as false positives that were detected due to the strong positive selective pressure of our K1 phage-killing assay ( see Discussion ) . 
+ We therefore focused our attention on the mprA gene and examined its role in capsule biosynthesis by cloning the gene into the expression plasmid pSU2718 ( to construct plasmid pMprA ) and transforming this plasmid into PA45BmprA to generate the complemented strain PA45BmprA ( pMprA ) . 
+ The sensitivity to phage-mediated lysis was restored to the wild-type level in the PA45BmprA ( pMprA ) complemented mutant ( Fig. 3B ) . 
+ Taken together , our data conﬁrmed that MprA is required for expression of the K1 capsule in the UPEC strain PA45B . 
+ Insertions in intergenic regions associated with the loss of K1 capsule . 
+ Our TraDIS data allowed us to identify the enrichment of insertions not only within genes but also within intergenic regions ( IGR ) . 
+ We examined the impact of such insertions on the downstream genes , based on the orientation of the chloramphenicol resistance gene within the mini-Tn5 transposon . 
+ We applied the same stringent threshold cutoffs as described above and identiﬁed eight IGRs that contained an increased amount of transposon insertions in the test pool compared to the control pool ( Data Set S1F ) . 
+ Three IGRs were located immediately upstream of the capsule synthesis genes kpsF , kpsE , and kpsM . 
+ The unidirectional orientation of the transposon insertions in these loci ( with the chloramphenicol resistance gene in the opposite direction to the downstream genes ) , coupled with the enrichment of insertions within these genes , suggested that transcription of the kpsF , kpsE , and kpsM genes was abolished . 
+ Indeed , these genes encode the transport and assembly machinery of the capsule , and hence they are essential for the surface expression of the capsular polysaccharide . 
+ One IGR ( IGR1287 ) contained a much higher number of unique mini-Tn5 insertions than the other IGRs . 
+ In this IGR , all the mini-Tn5 insertions were oriented in the same direction , with the chloramphenicol resistance gene placed in the same orientation as the downstream lrhA gene ( Fig. S3 ) . 
+ Moreover , no enrichment of insertions was observed within the coding sequence of lrhA . 
+ This suggested that there was enhanced transcription of lrhA via readthrough from the promoter of the chloramphenicol resistance gene ( within the mini-Tn5 transposon ) , which led to increased levels of LrhA protein and ultimately loss of capsule expression . 
+ To conﬁrm this hypothesis , we inserted a constitutive Pcl promoter upstream of lrhA , generating the strain PA45BPcl-lrhA . 
+ This strain was protected against K1 phage-mediated lysis ( Fig. 3B ) , indicating that overexpression of lrhA abrogated the surface expression of the capsule . 
+ Taken together , our data show that LrhA is involved in K1 capsule regulation in PA45B . 
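+ The orientation argument used for the intergenic regions can be expressed as a small classifier. This is a hedged sketch: strands are encoded as "+" or "-", the function names are hypothetical, and the interpretation strings only restate the reasoning in the text (same orientation as the downstream gene suggests read-through, opposite orientation suggests disruption).

```python
from collections import Counter


def classify_igr_insertions(marker_strands, downstream_gene_strand):
    """Count insertions whose resistance gene runs with or against the downstream gene."""
    counts = Counter(
        "same" if strand == downstream_gene_strand else "opposite"
        for strand in marker_strands
    )
    return {"same": counts["same"], "opposite": counts["opposite"]}


def interpret(counts):
    """Map orientation counts to the qualitative reading used in the text."""
    total = counts["same"] + counts["opposite"]
    if total == 0:
        return "no insertions"
    if counts["same"] == total:
        return "possible read-through"
    if counts["opposite"] == total:
        return "possible disruption"
    return "mixed orientation"


# Hypothetical IGR in which every insertion points toward the downstream gene.
example_counts = classify_igr_insertions(["+", "+", "+"], "+")
print(example_counts, interpret(example_counts))
```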
+ Mutation of mprA and overexpression of LrhA decrease the transcription of the capsule genes . 
+ To investigate the effects of MprA and LrhA on expression of the K1 capsule , we performed quantitative reverse transcriptase PCR ( qRT-PCR ) on the wild-type and mutant strains and examined the transcription of four genes within the capsule gene cluster : kpsF , kpsS , kpsM , and neuE ( Fig. 4A ) . 
+ These genes represent the start and end of the polycistronic transcripts of the K1 capsule gene clusters . 
+ Mutation of mprA resulted in a signiﬁcant decrease in the transcript levels of all four genes ( between 17- and 200-fold decreases ) , whereas complementation with plasmid pMprA restored the transcription of these genes to wild-type levels ( Fig. 4B ) . 
+ Similarly , the transcripts of all four genes were signiﬁcantly decreased in the PA45BPcl-lrhA strain ( between 20- and 179-fold decreases ) ( Fig. 4B ) . 
+ Taken together , these ﬁndings suggest that absence of MprA or overexpression of LrhA affects the transcription of genes within the capsule cluster . 
+ Mutation of mprA and overexpression of LrhA result in sensitivity to killing by factors in whole blood but do not affect the LPS . 
+ The capsule makes a key contribution toward UPEC virulence by conferring resistance against killing by innate host factors . 
+ Mutation of the capsule genes , or even enzymatic degradation of the K1 capsule , leads to increased susceptibility to complement-mediated killing ( 36 ) . 
+ Hence , to verify altered expression of the capsule , we determined whether mutant strains were more susceptible to killing in whole blood . 
+ In these assays , the unencapsulated PA45BmprA , PA45BPcl-lrhA , and PA45BkpsD mutant strains displayed increased sensitivity to killing in whole blood compared to the PA45B wild-type strain ( Fig. 5 ) . 
+ Complementation of mprA in PA45BmprA restored survival to wild-type levels . 
+ Bacterial LPS also contributes to the resistance of UPEC to human serum . 
+ To conﬁrm that the phenotype we observed was due to the loss of K1 capsule synthesis and not altered LPS production , we extracted and analyzed the LPS proﬁles of the mutants by using Tricine-sodium dodecyl sulfate -- polyacrylamide gel electrophoresis ( TSDS-PAGE ) and silver staining . 
+ In these experiments , a waaL mutant of PA45B was included as a control to depict an altered O antigen banding proﬁle . 
+ The waaL gene encodes an O antigen ligase ; mutation of this gene would prevent the attachment of the O antigen to the LPS core , and in our experiments this resulted in the loss of bands corresponding to O antigen subunits when cells were visualized by silver staining ( Fig. S4 ) . 
+ In contrast , we did not observe any difference in the O antigen banding proﬁle between the mutant strains PA45BmprA , PA45BmprA ( pMprA ) , and PA45BPcl-lrhA and wild-type PA45B ( Fig. S4 ) . 
+ Taken together , these results demonstrate that the increased sensitivities to killing in whole blood displayed by the unencapsulated mprA mutant and the LrhA-overexpressing strain are not due to modiﬁcation of their LPS . 
+ Mutation of mprA and overexpression of LrhA affect the motility of PA45B . 
+ Both MprA and LrhA have previously been shown to repress ﬂagellum-mediated motility ( 37 , 38 ) . 
+ Therefore , we hypothesized that mutation of mprA would lead to increased motility , while overexpression of LrhA would reduce motility . 
+ To test this , we performed swimming assays and compared the motility of the mutant strains to that of wild-type PA45B . 
+ In these experiments , the PA45BmprA mutant displayed a small but signiﬁcantly increased swimming phenotype compared to PA45B , and complementation with plasmid pMprA restored the swimming phenotype to wild-type levels . 
+ Conversely , strain PA45BPcl-lrhA displayed a lower level of motility than PA45B ( Fig. 6 ) . 
+ Collectively , our data suggest that while the loss of MprA or overexpression of LrhA in PA45B leads to the loss of K1 capsule production , these changes have opposite effects on motility ; the loss of MprA results in an increased swimming phenotype of the strain , whereas overexpression of LrhA diminishes the swimming phenotype compared to wild-type PA45B . 
+ Mutation of mprA and overexpression of LrhA result in the loss of capsule expression in other UPEC strains . 
+ To extend our ﬁndings on the roles of MprA and LrhA in capsule production , we ﬁrst examined the role of these proteins in another UPEC strain possessing the same K1 capsule ( strain UTI89 ) . 
+ Similar to our observations for PA45B , mutation of mprA or overexpression of LrhA in UTI89 allowed the mutants to survive K1 phage-mediated lysis , whereas complementation of the UTI89mprA mutant strain with plasmid pMprA restored sensitivity to the K1 phage to wild-type levels ( Fig. 7A ) . 
+ We next examined the role of mprA and lrhA in another UPEC strain expressing a different K capsule ( strain CFT073 [ K2 ] ) . 
+ Here , countercurrent immunoelectrophoresis with a K2-speciﬁc antiserum was used to detect the expression of the capsule . 
+ Precipitin bands were observed for capsular extracts prepared from wild-type CFT073 and strain CFT073mprA ( pMprA ) ( i.e. , the complemented mutant strain ) , whereas no band was detected from the CFT073mprA or CFT073Pcl-lrhA mutant strains ( Fig. 7B ) . 
+ To investigate if the loss of capsule expression in UTI89 and CFT073 caused the same phenotype observed for PA45B , we subjected both strains , together with their respective mutants , to a series of whole-blood killing assays . 
+ Consistent with our previous observations , the unencapsulated mutants displayed increased sensitivity to killing in whole blood compared to their respective wild-type strains ( Fig. 8 ) . 
+ These results therefore demonstrated that the effect of mutating mprA or overexpressing LrhA on capsule production is not strain or capsule type speciﬁc . 
+ DISCUSSION
+ The role of the polysaccharide capsule is well established in UPEC virulence . 
+ The prototypical group 2 capsule provides protection against phagocytosis and complement-mediated killing ( 23 , 24 ) , contributes to immune evasion via molecular mimicry ( 33 ) , and can also mask other surface-associated antigens ( 39 ) . 
+ Group 2 capsule biosynthesis is controlled by a conserved export and assembly mechanism ( 16 , 17 , 19 -- 22 ) , suggesting that the expression of all group 2 capsules may be regulated by the same factors , hence providing an attractive target for the development of therapeutic agents . 
+ We ﬁrst examined a collection of 126 complete E. coli genomes to gain a better understanding of the distribution of the group 2 capsule types . 
+ We found that the majority of phylogroup B2 ( 75 % ; 33/44 ) and D ( 75 % ; 3/4 ) strains possess intact region I and III capsular export and assembly genes but carry an array of region II genes that encode different capsular K types . 
+ All 10 of the ST95 strains in our 126-strain database possess genes encoding a K1 capsule . 
+ ST95 is one of several major pandemic clonal lineages of UPEC found worldwide , and this sequence type has been noted for its relatively low frequency of drug resistance ( 4 ) . 
+ Because our data suggested that the K1 capsule is closely associated with ST95 strains , it is possible that the K1 capsule may have provided a ﬁtness advantage that contributed to the successful expansion of ST95 . 
+ Moreover , the ST95 clonal group was ﬁrst identiﬁed more than 70 years ago , suggesting that it is not a newly emerging clonal group undergoing drug resistance selection ( 4 ) . 
+ Further work is now required to examine the distribution of K types in a larger collection of ST95 strains and to assess the impact of the K1 capsule on ST95 virulence . 
+ Due to the poor immunogenicity of the K1 capsule and lack of readily available antisera , bacteriophages speciﬁc to the K1 capsule have been used as diagnostic tools to rapidly identify strains expressing the K1 antigen . 
+ These capsule-speciﬁc phages recognize and bind to the surface polysaccharide structure , allowing the phage to enter and subsequently kill the cell due to its lytic life cycle . 
+ Taking advantage of this phenotype , we exposed a highly saturated transposon mutant library to a commercially available lytic K1 capsule-speciﬁc phage , which allowed us to enrich for unencapsulated mutants in a mixed culture containing both encapsulated and unencapsulated cells . 
+ The use of a lytic phage in combination with TraDIS represents a novel approach to identify genes involved in capsule expression . 
+ Here , we searched for genes with a signiﬁcant increase in insertion frequency in the output pool ( test pool ) compared to the input pool ( control pool ) , as the disruption of these genes would have repressed capsule expression and hence conferred resistance to the K1 phage . 
+ We initially hypothesized that all 14 genes within the capsule gene cluster ( kpsFEDUCSMT and neuDBACES ) would be identiﬁed in our screen . 
+ However , analysis of our TraDIS data revealed that three genes ( neuCES ) did not possess an enrichment of Tn5 insertions and were not identiﬁed in our screen . 
+ The neuC gene encodes a UDP N-acetylglucosamine 2-epimerase which forms a sialic acid precursor and is required for the expression of the K1 capsule ( 40 ) . 
+ The neuS gene encodes a polysialyltransferase which plays a role in assembling the full-length sialic acid polymer and is also required for K1 capsule expression ( 41 ) . 
+ Although the exact role of NeuE has not been determined , NeuE , NeuS , and KpsC have been shown to be the minimum combination of proteins required to observe de novo polysialic acid synthesis in E. coli in vitro ( 41 ) . 
+ We generated individual targeted deletion mutants of neuC , neuE , and neuS in PA45B and conﬁrmed via a K1 phage assay that the mutants did not express the K1 capsule ( Fig. S2 ) . 
+ Closer examination of the three genes revealed that they all possessed a very low number of Tn5 insertions , even in the control pool . 
+ Interestingly , all three genes have a lower GC content ( neuC , 32.1 % ; neuE , 26.3 % ; neuS , 27.5 % ) than the entire PA45B genome ( 50.5 % ) . 
+ The low GC content of these three genes could explain why they were not identiﬁed in the output pool , as the mini-Tn5 transposon used to create the PA45B mutant library preferentially inserts into GC-rich regions ( 42 ) . 
+ This has also been observed in another TraDIS study using the mini-Tn5 transposon , where an overall increased transposon insertion frequency in high-GC versus low-GC regions was observed ( 43 ) . 
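+ GC content values such as those quoted above are straightforward to compute from a nucleotide sequence. The helper below is a minimal sketch (the function name is hypothetical); it ignores ambiguous bases such as N and reports a percentage.

```python
def gc_content(seq: str) -> float:
    """Return the percentage of G and C among the unambiguous bases of seq."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        raise ValueError("sequence contains no A, C, G, or T")
    return 100.0 * (seq.count("G") + seq.count("C")) / acgt


# Toy sequences only; real gene sequences would be read from the annotated genome.
print(gc_content("ATGC"))
print(gc_content("aattaatt"))
```

+ Comparing the value for each gene against the genome-wide figure is one way to flag regions where a GC-biased transposon may be underrepresented.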
+ Besides the capsule gene cluster , our TraDIS analysis identiﬁed 15 other genes that could play a role in K1 capsule expression of PA45B . 
+ These included two genes encoding factors known to positively regulate capsule expression ( typA and rfaH ) , which were conﬁrmed in this study . 
+ The other 13 genes represented genes that may affect capsule expression , and we generated deﬁned mutants of 11 of these genes for further investigation ( despite multiple attempts , we were unable to generate deﬁned glpT_2 and yghG mutants in PA45B ) . 
+ Only the mutation of mprA in PA45B conferred resistance to the K1 phage , indicating the loss of capsule expression . 
+ Although this high false-discovery rate ( FDR ) seemed puzzling at ﬁrst , further inspection of the Tn5 insertions that mapped to these genes revealed that they possessed a much lower number of unique insertion sites than did other genes that were shown to affect capsule expression ( Fig. S3 ) . 
+ We therefore suggest that the discovery of these genes in our TraDIS experiment was due to independent random attenuation of capsule production ( i.e. , via spontaneous mutation or insertion of a second Tn5 transposon into capsule biosynthesis or regulatory genes ) combined with the extremely powerful positive selection invoked by the use of a lytic K1-speciﬁc phage . 
+ This highlights a previously unrecognized complication that should be considered when interpreting data associated with large-scale sequence-based genetic screening generated from experiments involving powerful positive selection . 
+ Further analyses conﬁrmed the role of mprA in capsule expression : ( i ) transcription of the capsule genes was signiﬁcantly decreased in the PA45BmprA mutant , ( ii ) the PA45BmprA mutant exhibited increased sensitivity to killing in whole blood , despite possessing an intact LPS , and ( iii ) complementation of the mutant strain PA45BmprA with the plasmid pMprA restored all phenotypes to wild-type levels . 
+ MprA is a DNA-binding transcriptional regulator that belongs to the MarR family of winged-helix proteins . 
+ These proteins are involved in controlling the expression of multidrug efﬂux pumps and other virulence-associated factors in multiple pathogens ( 44 ) . 
+ In E. coli , the mprA gene is located within an operon together with the emrAB genes , which encode a drug efﬂux pump that protects the bacterial cell from several antimicrobial agents ( 45 ) . 
+ MprA represses transcription of the emrAB genes by directly binding to the promoter region of emrA ( 46 ) . 
+ Mutation of mprA also leads to increased transcription of ﬂagellar genes , increased expression of the FliC ﬂagellin , enhanced ﬂagellum synthesis , and a hypermotile phenotype ( 38 ) . 
+ Recently , a study reported that small-molecule inhibitors of MprA prevented K1 polysaccharide capsule expression ( 47 ) . 
+ Those authors showed that knockout of mprA had an effect similar to treatment with the inhibitor , with loss of encapsulation and complete attenuation in a murine sepsis model . 
+ They went on to show that although the mprA mutant exhibited an increase in emrA transcript levels , there was insufﬁcient upregulation of the efﬂux pump to affect antibiotic resistance . 
+ However , those authors were unable to demonstrate direct binding of MprA to the capsule promoter regions , and bioinformatic analysis failed to identify any potential MprA-binding sites in these locations . 
+ Hence , it was concluded that the effect of MprA is likely indirect and possibly coordinated via a broader regulatory network . 
+ In this study , we used an unbiased forward genetic screen coupled with deep sequencing , and we also identiﬁed MprA as a factor affecting capsule expression . 
+ We showed that MprA is required for efﬁcient transcription of the capsule genes and modulates the susceptibility to killing in whole blood . 
+ Our data conﬁrmed the role of MprA in capsule expression , while the precise mechanism of how MprA affects capsule expression remains an area of ongoing investigation . 
+ The highly saturated nature of the transposon mutant library used in our TraDIS analysis allowed us to identify not only transposon insertions within genes but also insertions in IGRs and the directionality of the insertions . 
+ The latter is important , as the promoter of the chloramphenicol resistance gene can drive the transcription of a downstream gene instead of disrupting it if the insertion position is favorable . 
+ There were eight IGRs that contained signiﬁcantly more insertions in the output pool than in the input pool , which indicated that these insertions led to the loss of capsule expression . 
+ Four IGRs contained transposon insertions with the promoter of the chloramphenicol resistance gene oriented away from the downstream genes , suggesting that these insertions disrupted transcription of the respective genes . 
+ Three of these IGRs with insertions oriented away from the downstream genes were located upstream of genes known to be required for capsule production , and they were identiﬁed in our TraDIS screen . 
+ The fourth such IGR was found immediately upstream of mprA , providing further evidence for the role of MprA in capsule production . 
+ In contrast , another IGR contained a signiﬁcant increase in transposon insertions oriented toward the downstream lrhA gene . 
+ The lack of insertions within lrhA suggested that the transposon insertions likely caused increased transcription of the gene . 
+ We conﬁrmed this increased transcription by inserting a constitutive Pcl promoter upstream of lrhA to drive strong transcription of the gene , and we found that the PA45BPcl-lrhA strain was also resistant to the K1 phage . 
+ Mutation of lrhA did not impact K1 capsule expression ( Fig. S2 ) . 
+ However , the PA45BPcl-lrhA strain exhibited a decrease in the transcription of capsule genes and an increased sensitivity to killing in whole blood . 
+ We noted that both the PA45BPcl-lrhA and PA45BkpsD strains were exquisitely sensitive to killing in whole blood , while strain PA45BmprA , although also sensitive , survived in slightly higher numbers . 
+ The reason for the difference in sensitivity to killing in whole blood remains unknown but may reﬂect the modes of action of LrhA and MprA . 
+ LrhA is a DNA-binding transcriptional regulator that belongs to the LysR family of proteins ( 48 ) . 
+ It regulates the transcription of genes involved in motility , chemotaxis , and ﬂagellum synthesis by repressing the expression of the master regulator FlhDC , and it also represses the expression of the sigma factor RpoS , which in turn prevents the transcription of hundreds of stationary-phase genes ( 37 ) . 
+ Additionally , LrhA represses the expression of type 1 ﬁmbriae , an adhesin important for colonization of the bladder and a key factor contributing to bioﬁlm formation on abiotic surfaces and IBC formation within superﬁcial cells of the bladder urothelium ( 49 , 50 ) . 
+ Overall , our data demonstrate that LrhA represses capsule production , and we speculate that this occurs at the transcriptional level via direct binding to the promoter regions of the capsule biosynthesis genes , or by enhancing binding of the RNA polymerase holoenzyme , mechanisms that have been previously attributed to LrhA function ( 51 , 52 ) . 
+ We showed that both MprA and LrhA affect capsule expression and motility , albeit in different ways , and may contribute to different phases of bacterial lifestyles ( Fig. S5 ) . 
+ The absence of MprA results in cells that are unencapsulated and hypermotile . 
+ The capsule can also block the function of short adhesins through physical shielding ( 39 ) . 
+ We speculate that a subpopulation of hypermotile , unencapsulated bacteria could aid in dispersal/ascension within the urinary tract , where ﬁmbrial and aﬁmbrial adhesins could contribute to adherence and colonization at new sites . 
+ However , overexpression of LrhA results in unencapsulated cells that are less motile than the wild type . 
+ LrhA also regulates a number of factors involved in bioﬁlm formation , including type 1 ﬁmbriae , ﬂagella , and the stationary-phase sigma factor RpoS ( 50 , 53 ) . 
+ This suggests that varied intracellular levels of LrhA may lead to the development of subpopulations of cells primed to adapt to different steps in bioﬁlm formation ( i.e. , adherence , microcolony development , maturation ) and dispersal ( 54 ) , as well as resistance to host innate immune factors and colonization . 
+ MprA and LrhA likely contribute to this lifestyle adaptation , as they regulate multiple factors involved in UPEC virulence . 
+ Group 2 capsules share common regulatory elements ; the mprA and lrhA genes identiﬁed in this study are highly conserved and are found within the E. coli core genome ( 55 ) . 
+ This suggests that the effects of MprA and LrhA may be similar across all strains expressing a group 2 capsule . 
+ Indeed , we demonstrated that mutation of mprA and overexpression of LrhA in two other UPEC strains ( UTI89 and CFT073 ) also caused the loss of capsule expression . 
+ Thus , the molecular control of capsule synthesis mediated by these two regulators is neither strain nor capsule K type speciﬁc . 
+ Overall , our study highlights the combined power of saturated transposon mutagenesis and deep sequencing as a highly efﬁcient forward genetic screening method to study bacterial virulence . 
+ Despite the fact that group 2 capsules have been studied for decades , our combined use of a lytic K1 capsule-speciﬁc phage and TraDIS uncovered new aspects of K1 capsule regulation and expression . 
+ A better understanding of the precise molecular mechanisms that govern how the genes identiﬁed in this work regulate capsule expression is now needed . 
+ MATERIALS AND METHODS
+ Ethics approval . 
+ Approval for the collection of human blood for whole-blood killing assays was obtained from the University of Queensland Medical Research Ethics Committee ( 2008001123 ) . 
+ Bioinformatic analysis . 
+ The E. coli database was represented by 126 published complete genomes available on the NCBI database as of May 2016 . 
+ Kaptive was used to assess the prevalence of group 2 capsules ( 56 ) . 
+ Sequence comparisons were examined using the FASTA36 software package ( 57 ) . 
+ The E. coli strains were classiﬁed into major phylogroups ( A , B1 , B2 , D , E , and F ) based on an in silico analysis of the arpA , chuA , yjaA , and TSPE4.C2 loci ( 58 ) . 
+ Multilocus sequence typing analysis was performed using the sequences of seven housekeeping genes as previously described ( 59 ) . 
+ Bacterial strains and culture conditions . 
+ Cells were routinely cultured shaking at 37 °C in Luria-Bertani ( LB ) broth medium that was supplemented , when appropriate , with the following antibiotics : gentamicin ( Gent ; 20 µg/ml ) , ampicillin ( Amp ; 100 µg/ml ) , kanamycin ( Kan ; 50 µg/ml ) , or chloramphenicol ( Cm ; 30 µg/ml ) . 
+ Expression of genes was induced with either 1 mM isopropyl-β-D-1-thiogalactopyranoside ( IPTG ) or 0.2 % L-arabinose when required . 
+ All strains and plasmids used are outlined in Data Set S1G . 
+ Molecular methods . 
+ Methods for DNA extraction , puriﬁcation , sequencing , and PCR were previously described ( 60 ) . 
+ Deletion mutants were constructed using the λ-Red recombinase gene replacement system as described previously ( 61 ) ; primers are listed in Data Set S1H . 
+ For qRT-PCR , exponentially growing cells ( 500 µl ; OD600 of 0.6 ) were stabilized in 1 ml of RNAprotect bacteria reagent ( Qiagen ) . 
+ Subsequent RNA extraction , DNase I treatment , ﬁrst-strand cDNA synthesis , and qRT-PCR were performed as previously described ( 60 ) . 
+ Gene expression levels were determined with the cycle threshold ( 2^-ΔΔCT ) method ( 62 ) , with differences expressed relative to the wild-type PA45B response . 
+ All experiments were performed in three independent replicates . 
+ Genome sequencing and assembly . 
+ PA45B was sequenced on a Paciﬁc Biosciences ( PacBio ) 249 RSII system using the P4 polymerase and C2 sequencing chemistry . 
+ The raw sequencing data were assembled as previously described ( 43 ) . 
+ Annotation of the PA45B genome sequence was performed using PROKKA v1.11 ( 63 ) and the EcoCyc database ( 64 ) , and the sequence was visualized in Artemis ( 65 ) . 
+ The K1 capsule gene cluster from UTI89 was used to manually annotate the capsule gene cluster in PA45B . 
+ K1 phage assay . 
+ A K1 polysialic acid capsule-dependent lytic phage ( 5.6 × 10^11 PFU/ml ) sourced from the Statens Serum Institut ( Denmark ) was used . 
+ To determine phage titers , 50 µl of PA45B culture ( OD600 of 1.0 ) was first added to 3 ml of warm soft agar , spread onto LB agar plates , and allowed to dry . 
+ Serial dilutions of the phage suspensions ( 5 µl ) were spotted onto the plates , and the numbers of plaques were determined following overnight incubation at 37 °C . 
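+ As an illustration only ( not part of the original methods ) , the titer calculation implied by the spot assay above can be sketched in Python. The function name and the example plaque count are hypothetical; only the 5 µl spot volume comes from the text.

```python
def phage_titer_pfu_per_ml(plaque_count: int,
                           dilution_factor: float,
                           spot_volume_ul: float = 5.0) -> float:
    """Estimate a phage titer (PFU/ml) from plaques counted on one spot.

    plaque_count    -- plaques counted on the spot
    dilution_factor -- fold dilution of the spotted suspension
                       (e.g. 1e8 for a 10^-8 dilution)
    spot_volume_ul  -- volume spotted, in microlitres (5 µl in the text)
    """
    if plaque_count < 0 or dilution_factor <= 0 or spot_volume_ul <= 0:
        raise ValueError("counts must be >= 0; dilution and volume must be > 0")
    spot_volume_ml = spot_volume_ul / 1000.0
    return plaque_count * dilution_factor / spot_volume_ml


# Hypothetical reading: 28 plaques on the spot of a 10^-8 dilution
titer = phage_titer_pfu_per_ml(28, 1e8)
```

+ With these made-up counts the estimate is on the order of 10^11 PFU/ml , the same order of magnitude as the stock described above.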
+ The K1 phage suspension was used to identify encapsulated cells in two assays . 
+ First , the cross-brush method was performed as recommended by the manufacturer ( 11 ) . 
+ Inhibition of growth after crossing the phage suspension line was considered a positive reaction . 
+ For the second assay , overnight cultures of wild-type or mutant PA45B cells were subcultured 1:100 into fresh LB broth in microtiter plates and incubated with shaking at 37 °C . 
+ After 90 min , the plates were removed from the shaker and 2 µl of the K1 phage suspension was added to each well . 
+ The plates were returned to the shaker for another 3 h , and the OD600 was measured to determine the extent of phage-mediated lysis . 
+ Generation and screening of the PA45B mini-Tn5 library . 
+ Generation of the PA45B mini-Tn5 library was performed as previously described ( 66 ) . 
+ The ﬁnal library of approximately 1 million mutants was generated by pooling three batches of mutants , each containing approximately 160,000 to 580,000 mutants . 
+ Approximately 2 × 10^8 bacterial cells from the PA45B mini-Tn5 library were inoculated into 100 ml of LB broth either with ( test ) or without ( control ) 3 µl of K1 phage . 
+ After 12 h of shaking incubation at 37 °C , 1 ml of culture was removed , washed with PBS once , and resuspended in 1 ml of PBS . 
+ Genomic DNA was extracted from this cell suspension using the Ultraclean microbial DNA isolation kit ( Mo Bio Laboratories ) . 
+ The screening assays were performed in duplicate . 
+ Multiplex TraDIS . 
+ Genomic DNA from each sample ( tests and controls ) was subjected to library preparation by using the Nextera DNA library prep kit ( Illumina ) with slight modiﬁcation to amplify and sequence Tn5 insertion sites ( 43 ) . 
+ Brieﬂy , 50 ng of genomic DNA was fragmented and enzymatically tagged with an adapter sequence ( `` tagmentation '' ) and then puriﬁed using the Zymo DNA clean and concentrator kit ( Zymo Research ) . 
+ The PCR enrichment step , for which we used a custom transposon-specific primer to enrich for transposon insertion sites and an index primer ( one index per sample ) to allow for multiplexing sequencing , was performed at 72 °C for 3 min and 98 °C for 30 s , followed by 22 cycles of 98 °C for 10 s , 63 °C for 30 s , and 72 °C for 1 min . 
+ Each library was puriﬁed using Agencourt Ampure XP magnetic beads . 
+ Library veriﬁcation and quantiﬁcation were undertaken using a Qubit 2.0 ﬂuorometer and 4200 Tapestation system ( Agilent Technologies ) . 
+ All libraries were pooled and submitted for sequencing on the MiSeq platform at the Australian Centre for Ecogenomics ( University of Queensland , Australia ) . 
+ Analysis of TraDIS data . 
+ Raw sequencing reads from TraDIS analysis were filtered and trimmed to keep 30 bp immediately after the 12-bp Tn5-specific tag ( 5′-TATAAGAGACAG-3′ ) at their 5′ ends by using the FASTX toolkit ( http://hannonlab.cshl.edu/fastx_toolkit/index.html ) . 
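+ The filtering and trimming step can be sketched as follows. This is a plain-Python illustration of the logic , not the FASTX toolkit commands the authors ran ; the function names are hypothetical , and requiring an exact match to the tag is an assumption , since the text does not say whether mismatches were tolerated.

```python
from typing import Iterable, Iterator, Optional

TN5_TAG = "TATAAGAGACAG"   # 12-bp Tn5-specific tag quoted in the text
KEEP_BP = 30               # bases kept immediately after the tag


def trim_read(seq: str, tag: str = TN5_TAG, keep: int = KEEP_BP) -> Optional[str]:
    """Return the `keep` bases following a 5'-terminal tag.

    Returns None when the read does not start with the tag (assumed exact
    match) or is too short to yield `keep` bases after it.
    """
    if not seq.startswith(tag):
        return None
    insert = seq[len(tag):len(tag) + keep]
    return insert if len(insert) == keep else None


def filter_and_trim(reads: Iterable[str]) -> Iterator[str]:
    """Yield trimmed inserts for reads that carry the tag; drop the rest."""
    for read in reads:
        trimmed = trim_read(read.upper())
        if trimmed is not None:
            yield trimmed
```

+ The trimmed 30-bp inserts are what would then be passed to the aligner.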
+ These reads were aligned to the PA45B genome ( CP021288 ) using Bowtie ( version 1.1.2 ) ( 67 ) with its default arguments , and aligned reads were reported with `` -M 1 --best '' parameters . 
+ Subsequent analysis steps were carried out in R ( version 3.3.1 ) , with the Rsamtools package ( version 1.26.1 ) ( 68 ) to calculate the number of sequence reads ( read counts ) and the number of insertion sites per gene ( and intergenic regions ) , which were then used to estimate the log FC and FDR by use of the edgeR package ( version 3.16.1 ) ( 69 , 70 ) . 
+ To identify genes required for K1 capsule expression , we used stringent criteria of log FC of 5 , FDR of 0.001 , and read count of any site within a gene not exceeding 1/3 of total reads mapped to that gene . 
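+ The three selection criteria can be expressed as a simple filter. This sketch is illustrative : the comparison symbols were lost from the extracted text , so treating the thresholds as log FC ≥ 5 and FDR ≤ 0.001 is an assumption , and the `GeneStats` record and gene names are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class GeneStats:
    name: str                    # hypothetical gene identifier
    log_fc: float                # log fold change (test vs control)
    fdr: float                   # false discovery rate
    site_read_counts: List[int]  # reads at each insertion site in the gene


def passes_criteria(gene: GeneStats,
                    min_log_fc: float = 5.0,
                    max_fdr: float = 0.001) -> bool:
    """Apply the three criteria; threshold directions are assumed (see lead-in)."""
    total = sum(gene.site_read_counts)
    if total == 0:
        return False
    # No single insertion site may account for more than 1/3 of the gene's
    # reads; integer arithmetic avoids floating-point edge cases.
    no_dominant_site = 3 * max(gene.site_read_counts) <= total
    return gene.log_fc >= min_log_fc and gene.fdr <= max_fdr and no_dominant_site


candidates = [
    GeneStats("geneA", 6.2, 1e-5, [10, 12, 9, 11]),  # spread over many sites
    GeneStats("geneB", 7.0, 1e-6, [90, 5, 5]),       # one site dominates
    GeneStats("geneC", 3.0, 1e-6, [5, 5, 5, 5]),     # log FC too low
]
hits = [g.name for g in candidates if passes_criteria(g)]
```

+ The per-site cap guards against a gene being called on the strength of a single over-represented insertion.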
+ Whole blood killing assay . 
+ Approximately 10^7 CFU in 100 µl , prepared from a late exponential-phase culture , was added in duplicate to 900 µl of whole blood collected from a healthy donor by using Vacuette tubes containing sodium heparin ( Greiner Bio-One ) . 
+ The samples were incubated with rolling at 37 °C for 1 h and then plated in triplicate onto LB agar . 
+ The survival rate was expressed as the percentage of surviving cells compared to the initial inoculum . 
+ Biochemical and phenotypic assays . 
+ E. coli K2 antiserum ( Statens Serum Institut ) was used to detect the presence of the K2 antigen , and capsule extracts were made as previously described ( 26 ) . 
+ Countercurrent immunoelectrophoresis was performed for 70 min at 80 V in 1× Tris-acetate-EDTA buffer , and a precipitin band between the wells indicated the presence of the K2 capsule . 
+ LPS was extracted and resolved using Tricine-SDS -- PAGE as described previously ( 66 , 71 ) . 
+ For motility assays , 6 µl of an overnight culture was spotted onto freshly prepared 0.25 % LB Bacto agar plates in triplicate and incubated at 37 °C for 16 h in a closed box containing a beaker of water to prevent drying . 
+ The rate of motility was expressed relative to motility of wild-type PA45B ( i.e. , diameter of the motility zone of mutants divided by the diameter of the motility zone of wild-type PA45B ) . 
+ Accession number ( s ) . 
+ The sequences for the PA45B chromosome and pPA45B plasmid have been deposited in the NCBI GenBank database under accession numbers CP021288 and CP021289 , respectively . 
+ The raw PacBio sequence reads have been deposited in the Sequence Read Archive ( SRA ) under accession number SRR5585696 . 
+ The TraDIS reads have been deposited at the SRA under accession numbers SRR5520531 , SRR5520532 , SRR5520533 , and SRR5520534 . 
+ SUPPLEMENTAL MATERIAL
+ Supplemental material for this article may be found at https://doi.org/10.1128/mBio.01558-17 . 
+ FIG S1 , TIF ﬁle , 0.9 MB . 
+ FIG S2 , TIF ﬁle , 1.6 MB . 
+ FIG S3 , PDF ﬁle , 0.8 MB . 
+ FIG S4 , TIF ﬁle , 2.7 MB . 
+ FIG S5 , TIF ﬁle , 1.5 MB . 
+ DATA SET S1 , XLSX ﬁle , 0.1 MB . 
+ ACKNOWLEDGMENTS
+ We thank Kate Peters , Tim Kidd , Alvin Lo and Steven Hancock for their detailed and constructive comments , as well as Nicola Angel and Serene Low at the Australian Centre for Ecogenomics ( University of Queensland , Australia ) for help with Illumina sequencing . 
+ This work was supported by grants from the National Health and Medical Research Council ( NHMRC ) of Australia ( GNT1067455 and GNT1129273 ) and High Impact Research ( HIR ) grants from the University of Malaya ( UM-MOHE HIR Grant UM C/625/1/HIR/MOHE/CHAN/14/1 , no. H-50001-A000027 ; UM-MOHE HIR Grant UM C/625/1/HIR/MOHE/CHAN/01 , no. A000001-50001 ) . 
+ M.J.S. and M.A.S. are supported by NHMRC Senior research fellowships ( GNT1107914 and GNT1106930 , respectively ) ; S.A.B. is supported by an NHMRC Career Development fellowship ( GNT1090456 ) . 
+ The funders had no role in study design , data collection and interpretation , or the decision to submit the work for publication .
\ No newline at end of file
--- a/data/TEXT_FILES/notuseful_txt/29091192.txt 0 → 100644
View file @27818a9
+++ b/data/TEXT_FILES/notuseful_txt/29091192.txt 0 → 100644
View file @27818a9
+ Modiﬁcations in the pmrB gene are the primary mechanism for the development of chromosomally encoded resistance to polymyxins in uropathogenic Escherichia coli
+ Methods : Two complementary approaches , saturated transposon mutagenesis and spontaneous mutation induction with high concentrations of colistin and polymyxin B , were employed to select for mutations associated with resistance to polymyxins . 
+ Mutants were identiﬁed using transposon-directed insertion-site sequencing or Illumina WGS . 
+ A resistance phenotype was conﬁrmed by MIC and further investigated using RT -- PCR . 
+ Competitive growth assays were used to measure ﬁtness cost . 
+ Results : A transposon insertion at nucleotide 41 of the pmrB gene ( EC958pmrB41-Tn5 ) enhanced its transcript level , resulting in a 64 - and 32-fold increased MIC of colistin and polymyxin B , respectively . 
+ Three spontaneous mutations , also located within the pmrB gene , conferred resistance to both colistin and polymyxin B with a corresponding increase in transcription of the pmrCAB genes . 
+ All three mutations incurred a ﬁtness cost in the absence of colistin and polymyxin B. 
+ Conclusions : This study identiﬁed the pmrB gene as the main chromosomal target for induction of colistin and polymyxin B resistance in E. coli . 
+ Introduction
+ Polymyxins are polypeptide antibiotics originally isolated in 1947 from Paenibacillus polymyxa subsp. colistinus .1,2 
+ Both polymyxin B and E ( colistin ) are highly effective against many Gram-negative bacteria , including most members of the Enterobacteriaceae ; however , their use in clinical medicine has been limited due to their neurotoxic and nephrotoxic side effects .3 -- 5 
+ Despite these limitations , the increasing incidence of infections caused by MDR Gram-negative pathogens , in particular carbapenem-resistant Enterobacteriaceae , has led to renewed interest in the revival of polymyxin B and colistin as last-line treatments .6 -- 8 
+ The bactericidal activity of polymyxins against Gram-negative bacteria initiates through their interaction with negatively charged LPS in the outer membrane , subsequently permeabilizing the outer membrane and disrupting the inner membrane leading to lytic cell death .9 
+ Therefore , the main mechanism of polymyxin resistance occurs via modifications of LPS to decrease its net negative charge , weakening its interaction with cationic polymyxins .10 
+ These LPS modifications are directly controlled by the PmrAB two-component regulatory system ( TCS ) .11 
+ In Escherichia coli , Salmonella enterica , Klebsiella pneumoniae , Pseudomonas aeruginosa and Acinetobacter baumannii , the most common polymyxin resistance mechanism involves modification of the PmrAB TCS .12 -- 16 
+ The pmrCAB operon encodes the phosphoethanolamine transferase ( PmrC ; also called EptA ) , the response regulator PmrA and the sensor kinase PmrB . 
+ Mutations in the pmrA or pmrB genes can lead to activation of PmrA , which in turn upregulates PmrC and enzymes of the arnBCADTEF-pmrE operon that are responsible for the biosynthesis and transfer of phosphoethanolamine and 4-deoxyaminoarabinose to lipid A , respectively .8,11 
+ Other polymyxin resistance mechanisms include mutations in the PhoPQ TCS that indirectly activate PmrAB via PmrD ,16 adsorption of polymyxins by surface polysaccharides ( K. pneumoniae and P. aeruginosa ) ,17,18 modification of the Kdo with phosphoethanolamine ( E. coli ) 19,20 and transport via efflux pumps ( K. pneumoniae ) .21 
+ Alarmingly , a plasmid-mediated polymyxin resistance gene ( mcr-1 ) was recently discovered22 and there are already >100 reports describing the identification of this gene in multiple Gram-negative species across the globe .23,24 
+ The mcr-1 gene encodes a phosphoethanolamine transferase originally found on a transferable IncI2 plasmid ;22 mcr-1 is mobilized by an ISApl1 composite transposon and has been subsequently shown to reside in multiple plasmid and chromosomal locations .25 
+ One of the most important groups of pathogens for which polymyxin B and colistin are considered as last-line treatments is uropathogenic E. coli ( UPEC ) that cause urinary tract infection ( UTI ) . 
+ The most clinically important subgroup of UPEC belongs to the globally disseminated MDR ST131 clone . 
+ ST131 was originally identified in 2008 as a major clone associated with dissemination of the CTX-M-15-type ESBL gene .26 -- 29 
+ Multiple epidemiological studies have since highlighted the importance of ST131 and its significance as the predominant fluoroquinolone-resistant UPEC clone worldwide .30 -- 34 
+ Of major concern , resistance to last-line carbapenems22,35 -- 37 and colistin via acquisition of the mcr-1 gene38 has been reported in ST131 . 
+ In light of this , a detailed analysis of the mechanisms of induced resistance to polymyxin B and colistin in ST131 would enhance our understanding of resistance development against this important last-line treatment . 
+ In this study , we used the complementary laboratory approaches of saturated transposon mutagenesis and spontaneous induction in the presence of high concentrations of polymyxin B and colistin to identify mutations leading to polymyxin resistance in the ST131 reference strain EC958 . 
+ Overall , we showed that the primary mechanism of induced polymyxin resistance involves modiﬁcation of the pmrB gene . 
+ Materials and methods
+ Bacterial strains and growth conditions
+ The E. coli strain EC958 was isolated from the urine of a patient with community-acquired UTI in the UK .39 
+ EC958 belongs to ST131 subclade C2 or H30Rx ( refs 32,40 ) and is resistant to fluoroquinolones and third-generation cephalosporins .41 
+ Bacterial strains were cultured on LB broth or agar with appropriate selection ( 30 mg/L chloramphenicol ) as required . 
+ Antimicrobial susceptibility testing
+ MICs of colistin and polymyxin B were determined using the broth dilution method .42 
+ Briefly , 5 × 10^5 cfu/mL of bacteria were inoculated into a series of Mueller -- Hinton broths containing 2-fold dilutions of each antibiotic in 0.01 % acetic acid and 0.4 % BSA . 
+ The MIC was defined as the lowest concentration of antibiotic that completely inhibited bacterial growth after incubation at 37 °C for 18 -- 24 h . 
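+ For illustration , reading an MIC from a 2-fold dilution series and expressing it as a fold change can be sketched as below. The growth readings are invented for the example ; the 8 mg/L and 64-fold figures used in the last line are the values reported later in the text , and the function names are hypothetical.

```python
from typing import Dict, Optional


def mic_from_dilution_series(growth_by_conc: Dict[float, bool]) -> Optional[float]:
    """Return the MIC (mg/L) from a dilution series.

    growth_by_conc maps each tested concentration to True if visible growth
    was observed. The MIC is the lowest concentration with no growth, provided
    every higher tested concentration also shows no growth. Returns None if
    the organism grew at the highest concentration tested.
    """
    mic = None
    for conc in sorted(growth_by_conc, reverse=True):
        if growth_by_conc[conc]:
            break
        mic = conc
    return mic


def fold_change(mic_mutant: float, mic_wt: float) -> float:
    """Fold increase in MIC of a mutant relative to the wild type."""
    if mic_wt <= 0:
        raise ValueError("wild-type MIC must be positive")
    return mic_mutant / mic_wt


# Hypothetical wild-type readings across a 2-fold series (mg/L -> growth?)
wt_readings = {0.03125: True, 0.0625: True, 0.125: False, 0.25: False, 0.5: False}
wt_mic = mic_from_dilution_series(wt_readings)
increase = fold_change(8.0, wt_mic)
```

+ Walking the series from the highest concentration downward means a skipped well ( growth above a clear well ) is not mistaken for the MIC.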
+ Selection of polymyxin-resistant mutants
+ The EC958 mini-Tn5 saturated mutant library has been described previously .43 
+ Approximately 2 × 10^8 cells from the mutant library were screened on LB agar supplemented with either colistin ( 5 mg/L ) or polymyxin B ( 14 mg/L ) at 37 °C for 18 h to select for resistant mutants . 
+ All transposon-resistant mutants were pooled for subsequent analysis . 
+ To identify spontaneous resistant mutants , 2 × 10^8 WT EC958 cells were grown under the same conditions ; resistant colonies were subcultured and stored individually in 10 % glycerol in LB at -80 °C . 
+ Transposon-directed sequencing, WGS and sequencing data analysis
+ A total of 282 colistin-resistant transposon mutants and 243 polymyxin B-resistant transposon mutants were pooled and DNA samples were extracted for transposon-directed insertion-site sequencing ( TraDIS ) analysis as described previously .43,44 
+ In brief , the Illumina library preparation was performed using the Nextera DNA Sample Prep Kit ( Illumina ) following the manufacturer 's instructions with modifications for TraDIS , and sequenced with 100 bp single-end reads on the Illumina MiSeq platform . 
+ Spontaneous resistant mutants were sequenced using Illumina technology . 
+ The sequencing libraries were prepared using the Nextera DNA Library Prep Kit . 
+ The libraries were sequenced with 2 × 100 bp paired-end reads using the Illumina HiSeq 2500 . 
+ Illumina reads were analysed using an in-house pipeline for quality control . 
+ High quality Illumina reads were mapped to the EC958 genome ( HG941718 ) using SHRIMP 2.0 ( ref. 45 ) and nucleotide variations were identified using Nesoni ( www.vicbioinformatics.com/software.nesoni.shtml ) . 
+ Quantitative RT–PCR (qRT–PCR) and 5′-RACE
+ RNA samples were extracted from cells grown to mid-log ( OD600 0.6 ) in LB broth using the RNAeasy Mini Kit ( QIAGEN ) and then converted into cDNA using SuperScript III Reverse Transcriptase ( Invitrogen , Life Technologies ) . 
+ qRT -- PCR analysis of the pmrCAB genes was performed in triplicate using the ABI SYBR Green PCR Master Mix on the ViiA 7 Real-Time PCR System ( Life Technologies ) . 
+ Relevant primers are listed in Table S1 ( available as Supplementary data at JAC Online ) . 
+ Relative expression levels were calculated by the 2^-ΔΔCT method46 using gapA as an endogenous control and the WT pmrCAB genes as a reference . 
+ The transcription start site of pmrB in the EC958pmrB41-Tn5 mutant was identified using the 5′-RACE system ( QIAGEN ) according to the manufacturer 's instructions . 
+ Synthesis of cDNA specific for pmrB was performed using reverse transcriptase with the pmrB-specific primers pmrB_GSP1 and pmrB_GSP2 ( Table S1 ) . 
+ These PCR amplicons were sequenced using the BigDye Terminator v3.1 Cycle Sequencing Kit ( Life Technology ) . 
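+ The 2^-ΔΔCT calculation named above can be written out as a short sketch. The function name and the Ct values are hypothetical , and the formula assumes roughly 100 % amplification efficiency for both the target and the endogenous control ( gapA in this study ) , which is the usual assumption of the method.

```python
def relative_expression(ct_target_sample: float,
                        ct_control_sample: float,
                        ct_target_reference: float,
                        ct_control_reference: float) -> float:
    """Relative expression by the 2^-ddCt method.

    "sample" is the strain of interest (e.g. a mutant); "reference" is the
    calibrator strain (e.g. WT). "control" is the endogenous control gene.
    """
    # Normalise the target gene to the endogenous control in each strain
    delta_ct_sample = ct_target_sample - ct_control_sample
    delta_ct_reference = ct_target_reference - ct_control_reference
    # Compare the normalised values between strains
    delta_delta_ct = delta_ct_sample - delta_ct_reference
    return 2.0 ** (-delta_delta_ct)


# Hypothetical Ct values: target crosses threshold 2 cycles earlier in the mutant
fold = relative_expression(20.0, 15.0, 22.0, 15.0)
```

+ A target that reaches threshold two cycles earlier in the mutant , with an unchanged control , corresponds to a 4-fold increase in transcript level.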
+ Competitive growth assays
+ The growth of each spontaneous polymyxin-resistant mutant was examined in a mixed competitive growth experiment against EC958lacZ , which has an identical growth rate to WT EC958 .43 
+ Overnight cultures of the spontaneous polymyxin-resistant mutants and EC958lacZ were standardized to OD600 = 0.05 in LB before mixing at a 1:1 ratio and incubating at 37 °C with shaking at 250 rpm for 18 h . 
+ cfu counts at time = 0 and time = 18 h were performed on MacConkey agar to allow differentiation of EC958Δlac ( white colonies ) and resistant mutants ( red colonies ) . 
+ The relative competitive fitness index ( W ) of each polymyxin-resistant mutant compared with WT EC958 was calculated using the formula W = ln ( cfu_mutant at t18 / cfu_mutant at t0 ) / ln ( cfu_EC958lacZ at t18 / cfu_EC958lacZ at t0 ) .47 
+ Competition assays were performed as 10 independent biological replicate experiments . 
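+ The fitness index formula given above translates directly into code. This is a minimal sketch ; the function name and the cfu counts are hypothetical and are not data from the study.

```python
import math


def fitness_index(mutant_t0: float, mutant_t18: float,
                  competitor_t0: float, competitor_t18: float) -> float:
    """Relative competitive fitness W.

    W = ln(cfu_mutant at t18 / cfu_mutant at t0)
        / ln(cfu_competitor at t18 / cfu_competitor at t0)

    W < 1 means the mutant grew less than the competitor over the assay.
    """
    counts = (mutant_t0, mutant_t18, competitor_t0, competitor_t18)
    if any(c <= 0 for c in counts):
        raise ValueError("all cfu counts must be positive")
    competitor_growth = math.log(competitor_t18 / competitor_t0)
    if competitor_growth == 0:
        raise ValueError("competitor showed no net growth; W is undefined")
    return math.log(mutant_t18 / mutant_t0) / competitor_growth


# Hypothetical counts (cfu/ml): the mutant reaches half the competitor's density
w = fitness_index(1e6, 5e8, 1e6, 1e9)
```

+ Because W is a ratio of log growth , a mutant that ends at half the competitor 's density after ~1000-fold growth still has a W only slightly below 1 , which is why small departures from 1 can be meaningful.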
+ Results
+ Upregulation of pmrB leads to colistin and polymyxin B resistance
+ A mini-Tn5 mutant library comprising 1 million independent EC958 mutants was screened to identify mutants resistant to poly-myxins following growth in the presence of polymyxin B and colistin , respectively . 
+ The mini-Tn5 transposon employed to mutagenize EC958 was designed such that there was no transcriptional terminator downstream of the chloramphenicol resistance ( cat ) gene ; thus , the promoter of the cat gene can drive the transcription of a downstream gene when the insertion position is favourable . 
+ Consequently , within the transposon mutant library there were two types of polymyxin B/colistin-resistant mutants that could be selected ; mutants in which the transcription of a gene is induced by the upstream insertion of the transposon and mutants in which a gene is insertionally inactivated . 
+ From a total of 2 × 10^8 EC958 mutants plated on solid LB medium supplemented with 5 mg/L colistin or 14 mg/L polymyxin B , 282 colistin-resistant and 243 polymyxin B-resistant colonies , respectively , were obtained . 
+ These mutants were pooled , genomic DNA was extracted and the mini-Tn5 insertion sites were identiﬁed using TraDIS . 
+ The overwhelming majority of transposon-directed reads mapped to the pmrB gene ( EC958_4592 ) , accounting for 95.54 % and 95.78 % of total reads for colistin-resistant and polymyxin B-resistant mutants , respectively ( Figure 1 and Table S2 ) . 
+ Interestingly , the insertion sites of mini-Tn5 were largely speciﬁc to one location in pmrB ; corresponding to nucleotide 41 ( 94.81 % and 95.75 % of total reads in colistin - and polymyxin B-resistant mutants , respectively ) ( Table S2 ) . 
+ All mini-Tn5 inserted at this site possessed the cat promoter in the same direction as the pmrB CDS ( Figure 2 ) . 
+ Forty-seven mini-Tn5 mutant colonies resistant to colistin , and 42 colonies resistant to polymyxin B were subcultured and screened by PCR for insertion of the mini-Tn5 at nucleotide 41 of pmrB ( pmrB41::Tn5 ) . 
+ The results revealed 100 % and 88 % of colonies , respectively , contained the insertion at this specific location . 
+ The MICs for two independently isolated pmrB41::Tn5 mutants ( EC958pmrB41-Tn5 ) of colistin and polymyxin B were increased 64- and 32-fold compared with WT EC958 , respectively ( Table 1 ) . 
+ Furthermore , 5′-RACE analysis demonstrated that transcription of the pmrB gene in EC958pmrB41-Tn5 was driven by the cat promoter , confirming that insertion of the mini-Tn5 cassette in this mutant was responsible for the transcription of the downstream pmrB gene . 
+ TraDIS also identified several genes with a high number of insertion sites but a low number of mapped reads ( <0.4 % of the total reads ) : envZ , ompR and ompC , indicating that insertional inactivation of these genes might result in tolerance but not growth upon exposure to colistin/polymyxin B ( Figure S1 and Table S3 ) . 
+ Deﬁned mutants inactivated for these genes were generated and shown to exhibit the same MIC as WT EC958 ( data not shown ) . 
+ Thus , these mutations might confer a subtle advantage in survival that we could not detect based on MIC analysis , an interpretation consistent with a recent study that showed antibiotic tolerance facilitates the rapid subsequent evolution of resistance .48 
+ Spontaneous mutations in the pmrB gene also promote resistance to colistin and polymyxin B
+ In parallel to the selection of mini-Tn5 mutants resistant to colistin and polymyxin B , the WT EC958 strain was also cultured under the same conditions ( solid medium supplemented with 5 mg/L colistin or 14 mg/L polymyxin B ) to select for spontaneously arising resistant mutants . 
+ In this analysis , nine mutants were identiﬁed ( six resistant to colistin and three resistant to polymyxin B ) . 
+ WGS of these resistant mutants revealed that they all contained mutations within the pmrB gene ( Table S4 ) . 
+ The most common pmrB mutation was a G to A non-synonymous substitution at nucleotide 251 ( Cys-84 → Tyr ; EC958pmrB-C84Y ) , which was found in seven of the nine mutants . 
+ The mutations in the other two mutants were a G445T substitution ( Asp-149 → Tyr ; EC958pmrB-D149Y ) and a 12 bp deletion from nucleotide 258 to nucleotide 269 ( GlnAlaValArgArg-86 -- 90 → His ; EC958pmrB-Δ12bp ) , respectively . 
+ The MICs for these mutants of colistin and polymyxin B increased by 32 -- 64-fold , regardless of their mode of selection ( Table 1 ) . 
+ Of note , EC958pmrB-D149Y and EC958pmrB-Δ12bp possessed the highest level of resistance against colistin and polymyxin B ( MIC = 8 mg/L ; 64-fold increase compared with the WT ) . 
+ Colistin and polymyxin B resistance is associated with increased transcription of the pmrCAB genes
+ To understand better the mechanism by which mutations in the pmrB gene lead to colistin and polymyxin B resistance in EC958 , transcription of the pmrCAB genes was assessed in each mutant by qRT -- PCR . 
+ In EC958pmrB41-Tn5 , the pmrB gene was 3.1-fold upregulated compared with WT EC958 . 
+ Similarly , the transcription of pmrC and pmrA ( which lie upstream of pmrB ) was also upregulated 6.9 - and 6.6-fold , respectively ( Figure 3 ) , suggesting a feedback loop linking pmrB with their transcription . 
+ In congruence with this result , the spontaneously induced resistant mutants also possessed elevated pmrCAB transcript levels compared with WT EC958 ( Figure 3 ) . 
+ However , the magnitude of pmrCAB upregulation varied ; the largest increase in pmrCAB transcription was observed in EC958pmrB-D149Y ( 74.8 - , 39.9 - and 14.3-fold increase in pmrC , pmrA and pmrB transcript level , respectively ) . 
+ The pmrCAB transcript level in EC958pmrB-C84Y and EC958pmrB-Δ12bp was similar ; 10 - , 6 - and 3-fold increase compared with WT EC958 ( Figure 3 ) . 
+ Spontaneous mutants resistant to polymyxins incur a ﬁtness cost
+ We hypothesized that the spontaneous mutations in pmrB that led to increased colistin and polymyxin B resistance may incur a ﬁtness cost . 
+ Thus , the growth of each of the polymyxin-resistant mutants was examined in LB broth using a mixed competitive assay against EC958Δlac , a derivative of EC958 that has an identical growth rate as the WT strain ( Figure 4 ) but can be phenotypically differentiated on MacConkey agar . 
+ Overall , all of the spontaneously derived polymyxin-resistant mutants displayed a significantly reduced fitness compared with the EC958Δlac strain ( P < 0.0001 , one-way ANOVA ; Figure 4 ) . 
+ The fitness index varied among the pmrB mutants , with EC958pmrB-D149Y , which possessed the greatest resistance to colistin and polymyxin B , conversely exhibiting the greatest growth deficiency compared with EC958Δlac ( W = 0.90 ) . 
+ Both EC958pmrB-C84Y and EC958pmrB-Δ12bp grew at a reduced rate relative to EC958Δlac ( W = 0.95 and 0.97 , respectively ) ( Figure 4 ) . 
+ Finally , to investigate the possibility of cross-resistance to other antimicrobial cationic peptides , we measured the MICs for all of the mutants generated in this study of LL-37 , a soluble cationic peptide that contributes to host innate immunity ,49 and a key mediator of mucosal immunity in the urinary tract .50 
+ Our results showed that all mutants exhibited a 2-fold increase in MIC of LL-37 ( from 16 to 32 mg/L ) compared with WT EC958 ( Table 1 ) . 
+ Discussion
+ The 50-year-old cationic peptide antibiotics polymyxin B and colistin have recently undergone a revival for clinical use in response to the increasing incidence of infections caused by MDR Gram-negative pathogens .6,51 
+ They are among the few effective antibiotics reserved for ESBL-producing bacteria that are also resistant to gentamicin and carbapenems .6 
+ Despite this , resistance to these antibiotics has been reported in a number of Gram-negative bacteria , including E. coli ,52,53 Salmonella typhimurium ,54 K. pneumoniae ,55 P. aeruginosa56 and A. baumannii .14 
+ Here , we employed two parallel approaches , transposon mutagenesis and spontaneous induction , to understand the evolution of polymyxin resistance in the reference E. coli ST131 strain EC958 . 
+ We used a highly saturated mini-Tn5 mutant library in EC958 ( ref. 43 ) to select for gain-of-function mutants resistant to colistin and polymyxin B . 
+ Based on our approach , we hypothesized that gain-of-function mutants could be achieved by two mechanisms , upregulation of a downstream gene via the promoter on the mini-Tn5 cassette or insertional inactivation . 
+ Remarkably , one transposon insertion site ( at nucleotide 41 of the pmrB gene ) accounted for >95 % of sequencing reads from TraDIS data acquired from 525 resistant colonies . 
+ This particular site was enriched from among 311 possible insertion sites within the pmrCAB locus from the input mutant library ( Figure 1 ) . 
+ Investigation using 5′-RACE revealed that the transcription of pmrB in this mutant ( EC958pmrB41-Tn5 ) was driven by the cat promoter on the mini-Tn5 . 
+ The consequence of this insertion was increased transcription of the pmrCAB genes , leading to a 64 - and 32-fold increase in the MIC of colistin and polymyxin B , respectively ( Table 1 ) . 
+ Notably , the insertion also truncated the pmrB gene , and thus it is possible that the resulting change in the N-terminus of the modiﬁed PmrB changed its activity . 
+ Transposon mutagenesis was used previously in P. aeruginosa to identify genes involved in polymyxin B susceptibility .57 
+ However , this previous study only identified one mutant with an insertion in the phoQ gene that showed increased resistance to polymyxin B . 
+ The second approach employed to select for resistance to colistin and polymyxin B involved the isolation of spontaneous mutants following challenge in the presence of high concentrations of the two antibiotics . 
+ In total , nine mutants representing three different types of sequence changes were identiﬁed . 
+ The Cys-84 → Tyr substitution was the most common type ( found in seven mutants ) , but conferred the lowest MICs of colistin and polymyxin B ( Figure 2 and Table 1 ) . 
+ In contrast , the 12 bp deletion and the Asp-149 → Tyr mutation were each found once and conferred a higher level of resistance to both antibiotics ( Table 1 ) . 
+ The Asp-149 → Tyr substitution is located near the autophosphorylation site of PmrB at histidine residue 152 in the histidine kinase domain ( Figure 2 ) . 
+ It is possible that this change may increase PmrB activity , resulting in more phosphorylated PmrA regulator , which increased the transcription of the whole pmrCAB operon as observed by our qRT -- PCR analysis ( Figure 3 ) . 
+ Both the Cys-84 → Tyr substitution and the 12 bp deletion are located within the HAMP domain of PmrB , which is proposed to be important for signal transduction from the periplasmic input to the kinase domain .58 
+ Mutations within the HAMP domain in other histidine kinases of E. coli such as EnvZ and CpxA result in constitutive activation of their respective receptor histidine kinase .59 -- 61 
+ Thus , the mutations in the HAMP domain of PmrB might lead to constitutive activation of PmrA . 
+ Notably , all three mutants were recovered in reduced numbers compared with WT EC958 in mixed competitive growth experiments , indicating that these mechanisms of resistance to colistin and polymyxin B occurred at a ﬁtness cost to the cell ( Figure 4 ) . 
+ The precise mechanism by which this occurs and the capacity for compensating mutations to alleviate this effect remains an area of future investigation . 
+ Mutations in the PmrAB two-component system have been associated with resistance to colistin and polymyxin B in clinical E. coli isolates ( Table S5 ) . 
+ One study reported a Val-161 → Gly mutation in PmrB and a double mutation of Ser-39 → Ile and Arg-81 → Ser in PmrA in two colistin-resistant E. coli strains isolated from swine faeces ; both mutations resulted in an MIC of colistin of 4 mg/L .62 
+ Another report identified a range of mutations in PmrB associated with colistin resistance , including Pro-7-Gln-12del ( deletion of six amino acids ) , Ala-159 → Val , Thr-156 → Lys and Ile-91-Thr-92-ins-Ile ( insertion of isoleucine at position 92 ) .63 
+ The three PmrB mutations reported here are novel in E. coli , and in addition to resistance against colistin and polymyxin B these mutations also resulted in a 2-fold increase in the MIC of the human cathelicidin peptide LL-37 . 
+ LL-37 is a cationic peptide produced by urothelial cells that contributes to innate protection against UTI .50 
+ Importantly , these observations are consistent with the concerning trend of cross-resistance to LL-37 through the clinical use of colistin .64 -- 66 
+ Overall , the impact of acquired resistance to colistin and polymyxin B on sensitivity to soluble cationic peptides present in human urine such as LL-37 remains to be assessed . 
+ Our analysis did not identify mutations in other regulatory pathways that led to colistin or polymyxin B resistance . 
+ This includes the PhoPQ TCS and mgrB , both of which play a major role in colistin and polymyxin B resistance in other Gram-negatives .11,16,19,20,67 As our mutant library contained multiple insertions within phoPQ 43 and mgrB , this suggests that mutation of these genes does not lead to signiﬁcant colistin and polymyxin B resistance in EC958 under selection conditions employed in our experiments . 
+ Since its discovery in late 2015 , the plasmid-mediated polymyxin resistance gene mcr-1 has made the threat of pan-resistance imminent .22,38,68 
+ However , despite increased awareness of polymyxin resistance and the global prevalence of mcr-1 ,23 there are very few reports that describe the impact of chromosomal mutations that lead to colistin and polymyxin B resistance in E. coli . 
+ While the work described in this study demonstrates that resistance to colistin and polymyxin B in E. coli occurs through modiﬁcation of the pmrB gene , the fact that mcr-1 has now been 38 described in ST131 highlights the alarming scenario that its spread may be dramatically enhanced in a clone of E. coli that has already demonstrated its capacity to disseminate rapidly across the globe . 
+ The combined effect of chromosomal modiﬁcations in pmrB and acquisition of mcr-1 on polymyxin resistance remains to be evaluated . 
+ Funding
+ This work was supported by a grant from the National Health and Medical Research Council ( NHMRC ) of Australia ( GNT1067455 ) and High Impact Research ( HIR ) grants from the University of Malaya ( UM-MOHE HIR Grant UM C/625/1 / HIR/MOHE/CHAN / 14/1 , no. H-50001-A000027 ; UM-MOHE HIR Grant UM C/625/1 / HIR/MOHE/CHAN / 01 , no. A000001 -- 50001 ) . 
+ N. T. K. N. is supported by an Australian Government Research Training Program Scholarship . 
+ M. A. S. , S. A. B. , M. J. W. and D. L. P. are supported by NHMRC Fellowships . 
+ Supplementary data
\ No newline at end of file
--- a/data/TEXT_FILES/notuseful_txt/29339415.txt 0 → 100644
+++ b/data/TEXT_FILES/notuseful_txt/29339415.txt 0 → 100644
+ Genome-Wide Identiﬁcation by Transposon Insertion
+ ABSTRACT Escherichia coli K1 strains are major causative agents of invasive disease of newborn infants . 
+ The age dependency of infection can be reproduced in neonatal rats . 
+ Colonization of the small intestine following oral administration of K1 bacteria leads rapidly to invasion of the blood circulation ; bacteria that avoid capture by the mesenteric lymphatic system and evade antibacterial mechanisms in the blood may disseminate to cause organ-speciﬁc infections such as meningitis . 
+ Some E. coli K1 surface constituents , in particular the polysialic acid capsule , are known to contribute to invasive potential , but a comprehensive picture of the factors that determine the fully virulent phenotype has not emerged so far . 
+ We constructed a library and constituent sublibraries of 775,000 Tn5 transposon mutants of E. coli K1 strain A192PP and employed transposon-directed insertion site sequencing ( TraDIS ) to identify genes required for ﬁtness for infection of 2-day-old rats . 
+ Transposon insertions were lacking in 357 genes following recovery on selective agar ; these genes were considered essential for growth in nutrient-replete medium . 
+ Colonization of the midsection of the small intestine was facilitated by 167 E. coli K1 gene products . 
+ Restricted bacterial translocation across epithelial barriers precluded TraDIS analysis of gut-to-blood and blood-to-brain transits ; 97 genes were required for survival in human serum . 
+ This study revealed that a large number of bacterial genes , many of which were not previously associated with systemic E. coli K1 infection , are required to realize full invasive potential . 
+ IMPORTANCE Escherichia coli K1 strains cause life-threatening infections in newborn infants . 
+ They are acquired from the mother at birth and colonize the small intestine , from where they invade the blood and central nervous system . 
+ It is difﬁcult to obtain information from acutely ill patients that sheds light on physiological and bacterial factors determining invasive disease . 
+ Key aspects of naturally occurring age-dependent human infection can be reproduced in neonatal rats . 
+ Here , we employ transposon-directed insertion site sequencing to identify genes essential for the in vitro growth of E. coli K1 and genes that contribute to the colonization of susceptible rats . 
+ The presence of bottlenecks to invasion of the blood and cerebrospinal compartments precluded insertion site sequencing analysis , but we identiﬁed genes for survival in serum . 
+ KEYWORDS Escherichia coli, essential genes, neonates, rat infection model, transposon sequencing
+ Early-onset sepsis and associated septicemia and meningitis are major causes of morbidity and mortality in the ﬁrst weeks of life . 
+ In the developed world , encapsulated Escherichia coli and group B streptococci are responsible for the large majority of these infections ( 1 -- 3 ) . 
+ Over 80 % of E. coli blood and cerebrospinal ﬂuid isolates from infected neonates express the α2-8-linked polysialic acid ( polySia ) capsular K1 polysaccharide ( 4 , 5 ) , a polymer facilitating the evasion of neonatal immune defenses due to its structural similarity to the polySia modulator of neuronal plasticity in the developing human embryo ( 6 ) . 
+ Infections arise due to colonization of the neonatal gastrointestinal ( GI ) tract by maternally derived E. coli K1 at or soon after birth , from where the bacteria invade the systemic circulation to gain entry to the central nervous system ( CNS ) ( 7 , 8 ) . 
+ Essential features of human infection can be reproduced in the neonatal rat , enabling investigation of the pathogenesis of invasive neonatal infections by E. coli K1 ( 9 -- 11 ) . 
+ In susceptible 2-day-old ( P2 ) rat pups , the protective mucus layer in the small intestine ( SI ) is poorly developed but matures to full thickness over the period from P2 to P9 , coincident with the development of resistance to invasive infection from GI tract-colonizing E. coli K1 ( 12 ) . 
+ Thus , oral administration of E. coli K1 initiates stable colonization of the small intestine in both P2 and P9 pups but elicits lethal systemic infection only in younger animals ( 13 ) . 
+ In the absence of an effective mucus barrier at P2 , the colonizing bacteria make contact with the apical surface of enterocytes in the midregion of the small intestine before translocation to the submucosa by an incompletely deﬁned transcellular pathway ( 12 ) . 
+ The bacteria subsequently gain access to the blood compartment by evading mesenteric lymphatic capture ( 10 , 14 ) . 
+ E. coli K1 cells strongly express polySia in blood ( 15 ) , and the capsule may protect the bacteria from complement attack during this phase of infection by facilitating the binding of complement regulatory factor H to surface-bound C3b to prevent the activation of the alternative pathway ( 16 , 17 ) . 
+ Following hematogenous spread , the bacteria enter the CNS via the blood-cerebrospinal barrier at the choroid plexus epithelium to colonize the meninges ( 15 ) . 
+ Some microorganisms that invade the CNS enter across the cerebral microvascular endothelium of the arachnoid membrane ( 18 ) , although the restricted distribution of E. coli K1 within the CNS suggests that this is not a primary route of entry for this pathogen . 
+ Only a limited number of pathogenic bacteria have the capacity to invade the CNS from a remote colonizing site , and the large majority elaborate a protective capsule that facilitates the avoidance of host defenses during transit to the site of infection ( 19 ) . 
+ Although the polySia capsule is clearly necessary for the neonatal pathogenesis of E. coli K1 ( 11 ) , the large majority of bacterial virulence factors that facilitate transit from the GI tract to the brain are unknown . 
+ A number of potential virulence factors associated with neonatal bacterial meningitis have been deﬁned by phylogenetic analysis ( 20 ) , and there is good evidence that the genotoxin colibactin and the siderophore yersiniabactin contribute to the pathogenesis of E. coli K1 in experimental rats ( 21 -- 23 ) ; however , a more detailed understanding of virulence mechanisms of E. coli K1 invasive disease will present opportunities for new modes of therapy for these devastating infections . 
+ Transposon insertion sequencing ( 24 , 25 ) , a combination of traditional transposon mutagenesis and massively parallel DNA sequencing , is a powerful tool for the genome-wide enhanced genetic screening of large pools of mutants in a single experiment . 
+ This method was recently used to determine the full complement of genes required for the expression of the K1 capsule by a uropathogenic E. coli isolate ( 26 ) . 
+ This technique can be used to detect variations in the genetic ﬁtness of individual mutants undergoing selection in colonized and infected hosts . 
+ There are a number of variations of this procedure , but they all rely on the creation of a pool of insertion mutants in which every locus has been disrupted at multiple sites ; determination of the site of transposon insertion by sequencing of transposon junctions within chromosomal DNA before and after the application of selective pressure will identify mutants attenuated under selective conditions ( 27 ) . 
+ Thus , genes that confer ﬁtness during Klebsiella pneumoniae ( 28 ) and Acinetobacter baumannii ( 29 ) lung persistence , systemic and mucosal survival of Pseudomonas aeruginosa ( 30 ) , and spleen colonization in the mouse by uropathogenic E. coli ( 31 ) have been identiﬁed by this approach . 
+ In this study , we employ transposon-directed insertion site sequencing ( TraDIS ) ( 24 ) to interrogate a library of 775,000 Tn5 mutants or constituent sublibraries of E. coli K1 strain A192PP for genes essential for growth in vitro and for GI colonization , invasion , and systemic survival in susceptible P2 rat pups . 
+ In addition , we identiﬁed `` bottlenecks '' ( 32 ) to systemic invasion that restrict population diversity and limit the potential for transposon insertion site analysis of infection in neonatal rats with GI colonization . 
+ RESULTS
+ Generation of a Tn5 mutant library and identiﬁcation of essential genes . 
+ To provide sufﬁcient saturation density for the identiﬁcation of E. coli K1 genes essential for growth in vitro and of those conferring ﬁtness in a range of deﬁned environments , approximately 300 individual pools , each with 1 × 10^3 to 5 × 10^3 transposon mutants of E. coli A192PP , were constructed and combined to form a library containing over 7.75 × 10^5 mutants . 
+ Linker PCR was performed on randomly selected mutants to conﬁrm that Tn5 had inserted into random genomic locations ( see Fig. S1 in the supplemental material ) . 
+ TraDIS was performed on pooled but uncultured mutants to identify Tn5 insertion sites within the 5.52-Mbp genome of A192PP ( 33 ) . 
+ Sequences of indexed amplicons were determined , and 2 × 10^6 sequence reads containing Tn5 were mapped onto the E. coli K1 A192PP genome . 
+ Reads mapped to 237,860 unique Tn5 insertion sites and were distributed along the entire genome ( Fig. 1A ) . 
+ As the Tn5 library contained a high transposon insertion density , genes with no or limited numbers of Tn5 insertion sites are likely to be essential for growth in nutrient-replete media such as Luria-Bertani ( LB ) broth . 
+ We calculated insertion indices for each gene by normalizing the number of insertions in each gene by the gene length . 
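The normalization described in the preceding sentence can be sketched in a few lines of Python. This is an illustrative sketch only, not the pipeline used in the study: the function name `insertion_index`, the gene names, and the counts are hypothetical.

```python
def insertion_index(n_insertions: int, gene_length_bp: int) -> float:
    """Unique transposon insertion sites per base pair of gene."""
    if gene_length_bp <= 0:
        raise ValueError("gene length must be positive")
    return n_insertions / gene_length_bp


# Hypothetical genes: (unique insertion sites, gene length in bp).
genes = {"geneA": (0, 1200), "geneB": (48, 1200), "geneC": (3, 900)}

# Length normalization makes genes of different sizes comparable.
indices = {name: insertion_index(n, length) for name, (n, length) in genes.items()}
```

Under this sketch, a gene with an index at or near zero (such as the hypothetical `geneA`) would be a candidate essential gene.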
+ Insertion index values for two technical replicates were highly correlated ( Spearman 's rho = 0.9589 ) ( Fig. 1B ) . 
+ A density plot of insertion indices produced a bimodal distribution , with a narrow peak representing genes with no or a limited number of Tn5 insertions and a broad peak containing genes with a large number of Tn5 insertions ( Fig. 1C ) ; the former comprised genes that confer lethality when mutated , and the latter comprised genes that can be mutated without affecting bacterial viability . 
+ To identify genes that signiﬁcantly lack Tn5 insertions and therefore are essential for in vitro growth , gamma distributions from the density plot were used to determine log2 likelihood ratios . 
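A log2 likelihood ratio of this kind can be illustrated as follows. The sketch assumes that each mode of the bimodal density has already been fitted with a gamma distribution; the shape and scale values below are hypothetical placeholders, not parameters reported in the study, and the `floor` argument is an added assumption to keep the logarithm defined at an index of zero.

```python
import math


def gamma_log_pdf(x: float, shape: float, scale: float) -> float:
    """Natural log of the gamma density at x > 0."""
    return ((shape - 1.0) * math.log(x) - x / scale
            - math.lgamma(shape) - shape * math.log(scale))


def log2_likelihood_ratio(index: float,
                          essential=(1.0, 0.01),      # hypothetical (shape, scale)
                          nonessential=(4.0, 0.025),  # hypothetical (shape, scale)
                          floor: float = 1e-6) -> float:
    """log2 of P(index | essential mode) / P(index | nonessential mode).

    Positive values favor the essential (low-insertion) mode.
    """
    x = max(index, floor)  # avoid log(0) for genes with no insertions
    diff = gamma_log_pdf(x, *essential) - gamma_log_pdf(x, *nonessential)
    return diff / math.log(2.0)
```

With these placeholder parameters, a gene with a very low insertion index scores positive and a gene with a high index scores negative; a threshold on the ratio would then separate the two classes.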
+ Examples of essential genes containing no or limited numbers of Tn5 insertions are shown in Fig. 1D . 
+ A total of 357 genes were predicted to be essential for the in vitro growth of E. coli K1 A192PP , and these genes are shown in Table S1 in the supplemental material , together with KEGG ( Kyoto Encyclopedia of Genes and Genomes ) descriptors for genes involved in metabolic pathways . 
+ COG ( Clusters of Orthologous Groups ) was used to identify the functional category of each gene essential for growth in vitro from the A192PP whole-genome sequence ( BioProject accession number PRJEB9141 ) . 
+ Genes involved in the ribosomal structure ( 11 % of the total number of essential genes ) and protein biosynthesis ( 15 % ) featured prominently and were signiﬁcantly enriched in relation to their representation within the whole genome , as were genes encoding proteins for DNA replication ( 3 % ) , cell wall ( peptidoglycan and lipopolysaccharide ) biosynthesis ( 6.25 % ) , and membrane biogenesis ( 3 % ) ( Fig. 2 ) . 
+ Genes for protein secretion and export as well as ABC transporter genes were also well represented ; the remaining essential genes were involved in a wide variety of cellular catabolic and anabolic functions . 
+ The list features 254 genes that were found by TraDIS ( 34 ) to be essential for the growth of an E. coli sequence type 131 ( ST131 ) multidrug-resistant urinary tract isolate ( from a total of 315 essential genes ) in Luria broth . 
+ In a similar fashion , 253 genes determined to be essential for the growth of E. coli K-12 MG1655 in LB broth were also identiﬁed as being essential in the present study ( Table S1 ) ; the K-12 study employed a comprehensive set of precisely deﬁned , in-frame , single-gene deletion mutants ( 35 ) and not transposon insertion sequencing . 
+ Maintaining Tn5 library diversity . 
+ The polySia capsule is a major determinant of virulence in E. coli K1 and is central to the capacity of K1 clones to cause neonatal systemic infection ( 11 , 36 ) . 
+ PolySia biosynthesis imposes a substantial metabolic burden on producer strains ( 37 ) . 
+ As TraDIS and other transposon insertion sequencing procedures generally employ growth in liquid medium for recovery and expansion of the output pool ( 38 ) , we investigated the impact of batch culture on the expression of the K1 capsule within the Tn5 library . 
+ The complete Tn5 library was inoculated into LB broth and incubated for 8 h at 37 °C , and the proportions of encapsulated and nonencapsulated A192PP bacteria were determined by susceptibility to the E. coli K1-speciﬁc bacteriophage K1E within the population . 
+ Nonencapsulated mutants initially comprised 4.66 % of the bacterial population , but by the end of the incubation period , this value had risen to 98.24 % ( Fig. 3A ) . 
+ Growth rates in LB broth of E. coli A192PP and a nonencapsulated mutant of A192PP randomly selected from the Tn5 library did not differ signiﬁcantly ( Fig. S2A ) . 
+ The cultured Tn5 library was avirulent , as determined by administration to P2 neonatal rat pups , whereas GI colonization with 2 × 10^6 to 6 × 10^6 CFU E. coli A192PP and the uncultured Tn5 library was lethal . 
+ A similar colonizing inoculum of the cultured ( 8 h at 37 °C ) E. coli A192PP-Tn5 library had no impact on survival , and all pups remained healthy over the 7-day observation period ( Fig. 3B ) , even though all animals remained heavily colonized with K1 bacteria throughout the experiment ( data not shown ) . 
+ Thus , culture of the library prior to challenge resulted in the loss of phenotypic diversity and virulence . 
+ The complete Tn5 library contained 2.81 × 10^5 unique Tn5 insertions , of which 750 ( 2.66 % of the bacterial population ) possessed transposon insertions in genes determining capsule biosynthesis ( data not shown ) . 
+ The probability that cultured sublibraries of more than 5 × 10^3 mutants contained a noncapsulated mutant was calculated to be 0.98 but was only 0.55 for sublibraries of 1 × 10^3 mutants . 
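One way to read these two probabilities is through a simple independent-sampling model: if a fraction f of mutants is nonencapsulated, a sublibrary of n mutants contains at least one with probability 1 − (1 − f)^n. The source does not state how its values were calculated, so this model and the function names below are assumptions. The sketch back-solves f from the reported 0.55 for 10^3 mutants and applies it to 5 × 10^3 mutants, which gives a value of about 0.98.

```python
def p_contains_mutant(n_mutants: int, fraction: float) -> float:
    """P(at least one mutant of the given class among n independent draws)."""
    return 1.0 - (1.0 - fraction) ** n_mutants


def fraction_from_probability(p: float, n_mutants: int) -> float:
    """Invert p = 1 - (1 - f)^n to recover the per-mutant fraction f."""
    return 1.0 - (1.0 - p) ** (1.0 / n_mutants)


# Back-solve the implied fraction from the reported value for 1 x 10^3 mutants ...
f = fraction_from_probability(0.55, 1000)
# ... and predict the probability for a sublibrary of 5 x 10^3 mutants.
p_5000 = p_contains_mutant(5000, f)
```

The agreement suggests that the two reported probabilities are mutually consistent under this model, although the implied fraction is not itself stated in the text.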
+ Low-complexity libraries of 10^3 mutants maintained virulence in P2 neonatal rat pups after culture , whereas more-complex libraries did not ( Fig. S3 ) , due to the absence of mutants lacking the capacity to express the polySia capsule within the inoculum . 
+ To minimize bias , in all subsequent experiments , libraries of sufﬁcient complexity to contain multiple numbers of nonencapsulated mutants were used ; for experiments utilizing neonatal rats , the period between colonization initiation and tissue harvesting was kept to a minimum , and tissue homogenates were cultured directly on selective agar plates with no intervening liquid culture step . 
+ Genes required for GI colonization . 
+ E. coli A192PP bacteria colonize the small intestine of neonatal rats following oral administration of the bacterial bolus , with 10^7 to 10^8 K1 bacteria/g intestinal tissue persisting for at least 1 week ( 12 , 13 ) . 
+ Translocation of the neonatal pathogen to the blood compartment via the mesenteric lymphatic system occurs predominantly , and in all likelihood exclusively , across the epithelium of the midsection of the small intestine ( MSI ) , even though the density of colonizing bacteria in this region of the GI tract is no higher than that within neighboring proximal small intestine ( PSI ) or distal small intestine ( DSI ) locations ( 12 ) . 
+ Few attempts have been made to determine the genes or gene products required by E. coli K1 for colonization of the GI tract ( 39 ) . 
+ To prevent a loss of diversity of the E. coli K1 A192PP-Tn5 library , we minimized the period of colonization before sampling the E. coli K1 population of the MSI . 
+ The colonizing E. coli K1 population in proximal , middle , and distal regions of the small intestine did not expand beyond 4 h after the initiation of colonization ( Fig. 4A ) ; GI tissues were therefore excised at this time point . 
+ To identify mutants with a decreased capacity to colonize the MSI , P2 rats were fed 1 × 10^9 CFU of an E. coli K1 A192PP-Tn5 library containing 2 × 10^5 mutants , the pups were sacriﬁced after 4 h , and E. coli K1 bacteria in the MSI were enumerated . 
+ The bacterial load in rats colonized with the Tn5 library was comparable to that in rats colonized with the wild-type strain ( data not shown ) . 
+ MSI tissues from four rats were pooled , homogenized , and cultured on LB agar containing kanamycin to ensure that the mutant frequency was not overestimated by the inclusion of measurements of DNA from dead bacteria ; Kanr colonies were then pooled , DNA was extracted , and the ﬁtness of each mutant was determined by TraDIS . 
+ Input and output pools each comprised 2 × 10^5 CFU , and the ratios of input to MSI read counts were expressed as log2 fold changes . 
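The fold-change calculation can be sketched as below. The orientation (output over input) is chosen so that depleted mutants receive negative values, in line with the "negative log2 fold change" criterion used for attenuated mutants; the pseudocount, which keeps the ratio defined when an insertion is absent from one pool, and the function name are assumptions rather than details given in the source.

```python
import math


def log2_fold_change(input_reads: float, output_reads: float,
                     pseudocount: float = 1.0) -> float:
    """log2(output / input) for one insertion site; negative means depletion.

    The pseudocount (an assumption) avoids division by zero and log(0).
    """
    return math.log2((output_reads + pseudocount) / (input_reads + pseudocount))


# Hypothetical normalized read counts for one insertion site.
depleted = log2_fold_change(input_reads=127, output_reads=15)
unchanged = log2_fold_change(input_reads=50, output_reads=50)
```

In practice, read counts would first be normalized between pools and the fold changes tested for significance across replicates; neither step is shown here.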
+ A wide distribution of ﬁtness scores ( 40 ) was detected ( Fig. 4B ) . 
+ The majority of transposon insertions did not have a strong negative or positive effect on colonization of the MSI . 
+ A total of 387 transposon insertions , within 167 genes , showed signiﬁcantly decreased normalized read counts between input and output pools ( negative log2 fold change and P < 0.05 ) ( see Table S2 in the supplemental material ) . 
+ Of the 387 insertion sites , 180 were not detectable in the output pool , indicating complete loss of these mutants during colonization . 
+ Many of these transposon insertion sites occurred within the same gene ( Table S2 ) . 
+ For example , within the neuC gene , 70 unique transposon insertion sites were identiﬁed as being lost during colonization . 
+ Transposon-interrupted genes were identiﬁed as being important for colonization of the MSI and were grouped into seven arbitrary categories : ( i ) genes encoding surface structures , including pili ; ( ii ) genes encoding secretory components ; ( iii ) genes involved in intermediary metabolism ; ( iv ) stress response genes ; ( v ) cytoplasmic membrane ( CM ) - located genes ; ( vi ) genes for iron acquisition ; and ( vii ) others and hypothetical genes . 
+ High proportions of mutations associated with a decreased MSI-colonizing capacity were located in genes affecting the biosynthesis of surface structures ( Table S2 ) . 
+ A few genes were involved in lipopolysaccharide ( LPS ) biosynthesis ( yrbH and yiaH ) and outer membrane ( OM ) protein biosynthesis ( ompG and ycbS ) , but the majority affected the polySia capsule , with genes of the neu operon ( 41 ) accounting for 194 of the 387 colonization-attenuated mutants . 
+ There is some evidence that capsular polysaccharides may promote adhesion to biological and nonbiological surfaces during bioﬁlm formation ( 37 ) , but there has been little or no consideration of a role for capsules as mediators of GI colonization . 
+ A limited number of genes associated with type II and IV secretion were identiﬁed as being required for colonization of the MSI ; these multiprotein complexes translocate a wide range of proteins and protein complexes across host membranes ( 42 , 43 ) and are implicated in adherence and intestinal colonization by enterohemorrhagic E. coli in farm animals ( 44 ) . 
+ Genes for the assembly of pilus proteins , including some carried on the tra locus , which are likely to be located on plasmids that initiate conjugation , were also linked to colonization ; pili are virulence factors that may mediate attachment to and infection of host cells ( 45 ) . 
+ Colonization by both commensals and pathogens is dependent on nutrient scavenging , sensing of chemical signals , and regulation of gene expression as the bacteria adapt to a new and potentially hostile environment that in the case of E. coli K1 appears to rely on stress response genes such as yhiM ( which encodes a protein aiding survival at low pH ) , the heat shock protein genes clpB and yrfH , as well as DNA repair genes . 
+ A large number of genes encoded enzymes involved in the metabolism of sugars ( e.g. , gcd , rpiR , and glgC ) , amino acids ( dadX , metB , and tdcB ) , fatty acids ( yafH and ﬁxA ) , growth factors ( bisC , yigB , and thiF ) , and other secondary metabolites ( yicP ) . 
+ Transporters and permeases involved in central intermediary metabolism were also featured prominently : these proteins included permeases of the major facilitator superfamily ( YjiZ ) , the hexose phosphate transport protein UhpT , the carnitine transporter CaiT , and a range of CM-located sugar transporters . 
+ Of note was the impact of a mutation of the fucR L-fucose operon activator on colonization ; fucose is abundant in the GI tract , and the fucose-sensing system in enterohemorrhagic E. coli regulates colonization and controls the expression of virulence and metabolic genes ( 46 ) . 
+ The availability of free iron is severely limited in the GI tract , and ingestion of iron predisposes an individual to infection ( 47 ) ; the importance of iron acquisition for E. coli K1 during GI colonization is reﬂected in the requirement for a number of genes related to iron uptake ( e.g. , feoB and fepA ) . 
+ GI tract-colonizing capacity and virulence of single-gene mutants . 
+ To investigate the contribution of the polySia capsule to the colonization of the neonatal rat GI tract , we disrupted the neuC gene of E. coli A192PP using bacteriophage Red recombinase to produce a capsule-free mutant as judged by resistance to E. coli K1-speciﬁc phage K1E . 
+ We also produced other single-gene mutants for genes identiﬁed by the TraDIS GI screen : vasL ( encoding a type IV secretion system protein ) , yfeC ( predicted to form part of a toxin/antitoxin locus ) , and two genes with unknown function , yaeQ and A192PP_3010 ( the latter is present in genomes of other extraintestinal E. coli pathogens , including IHE3034 , UTI89 , RS218 , PMV-1 , and S88 ) . 
+ Growth rates of these mutants , in particular the capsule-negative neuC mutant ( see Fig. S2B in the supplemental material ) , were indistinguishable from that of the E. coli A192PP parent in LB medium . 
+ All mutants were examined for their capacity to colonize the GI tract and cause lethal infection in P2 rat pups ( Fig. 4C and D ) . 
+ The E. coli A192PP parent strain or single-gene mutants ( 2 × 10^6 to 6 × 10^6 CFU ) were administered orally to P2 rats ; all members of a litter of 12 pups received the same strain . 
+ Pups were sacriﬁced 24 h after initiation of colonization , and E. coli K1 bacteria in the small intestine ( PSI , MSI , and DSI ) and colon were enumerated . 
+ The capacity of all mutants to transit through the upper portion of the alimentary canal , pass through the stomach , and colonize the small intestine was markedly inferior to that of the wild-type strain ( Fig. 4C ) . 
+ Reductions in colonization of the PSI , MSI , and DSI by the mutants , including E. coli A192PPΔneuC : : kan , were signiﬁcant , with the only exception being colonization of the DSI by A192PPΔyfeC : : kan , with no signiﬁcant difference between the parent and mutant . 
+ Interestingly , no increases in the numbers of viable A192PPΔneuC : : kan , A192PPΔvasL : : kan , A192PPΔ3010 : : kan , and A192PPΔyaeQ : : kan bacteria recovered from the colon were noted to compensate for reductions in the colonization of the small intestine . 
+ There was a signiﬁcant increase in the colonic burden of viable A192PPΔyfeC : : kan bacteria compared to that of bacteria of the parent strain . 
+ We established previously ( 12 ) that E. coli A192PP transits to the blood circulation via the mesenteric lymphatic system by exploiting a vesicular pathway through the GI epithelium only at the MSI . 
+ As numbers of mutant bacteria colonizing this region of the small intestine were much reduced compared to those of the parent strain , we determined the capacity of the single-gene mutants to elicit lethal systemic infection following GI colonization by oral administration of 2 × 10^6 to 6 × 10^6 bacteria at P2 ( Fig. 4D ) . 
+ Four of the ﬁve mutants ( A192PPΔneuC : : kan , A192PPΔvasL : : kan , A192PPΔ3010 : : kan , and A192PPΔyfeC : : kan ) displayed signiﬁcantly reduced lethal potential compared to that of the A192PP parent . 
+ The loss of capsule ( A192PPΔneuC : : kan ) resulted in a complete loss of lethality over the 7-day observation period . 
+ The administration of A192PPΔvasL : : kan elicited a lethal response in 41.6 % of pups ; 33.3 % and 25 % of pups survived after receiving A192PPΔ3010 : : kan and A192PPΔyfeC : : kan , respectively , at P2 . 
+ For A192PPΔyaeQ : : kan , 75 % of pups succumbed to lethal infection , but this did not reach levels of signiﬁcance compared to the 100 % lethality engendered by the A192PP parent ( P > 0.05 ) . 
+ Overall , these data indicate that the TraDIS screen efﬁciently identiﬁed genes important for MSI colonization that impact pathogenic potential . 
+ A bottleneck to infection in the neonatal rat prevents identiﬁcation of genes for translocation across the gastrointestinal epithelium . 
+ Our initial intention was to exploit the high degree of susceptibility of P2 neonatal rats to systemic infection , sepsis , and meningitis following oral administration of an effective dose of E. coli A192PP bacteria in order to determine all genes required to enable the neonatal pathogen to overcome previously deﬁned ( 12 -- 15 ) physical and immunological barriers to invasion of the blood circulation and dissemination to the meninges . 
+ However , previous studies indicated that relatively few E. coli K1 bacteria migrate from colonized sites within the GI tract to the blood ( 10 ) , constraining the genetic diversity of the translocated bacterial population and eliminating genotypes from the translocated gene pool in a stochastic manner that does not reﬂect the ﬁtness of individual genes to contribute to genotypes with invasive potential ( 32 ) . 
+ We therefore determined if there were bottlenecks that would compromise the identiﬁcation of mutants with an attenuated capacity to translocate from the GI tract to the blood compartment ; if any experimental bottlenecks are narrower than the complexity of the E. coli A192PP Tn5 library , many relevant transposon insertion mutants will be lost entirely by chance ( 38 ) . 
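The effect of a bottleneck that is narrower than the library can be illustrated with a deliberately simple model. The sketch assumes that all mutants are equally abundant and equally fit and that founders are drawn independently with replacement; none of these assumptions, nor the function name, comes from the source.

```python
def expected_fraction_lost(library_size: int, bottleneck_size: int) -> float:
    """Expected fraction of distinct mutants absent after a bottleneck.

    Each of `bottleneck_size` founders is drawn uniformly at random (with
    replacement) from `library_size` equally abundant mutants, so a given
    mutant is missed with probability (1 - 1/library_size) per draw.
    """
    if library_size <= 0 or bottleneck_size < 0:
        raise ValueError("library_size must be > 0 and bottleneck_size >= 0")
    return (1.0 - 1.0 / library_size) ** bottleneck_size


# Hypothetical example: a library of 2 x 10^5 mutants and 100 founders.
lost = expected_fraction_lost(200_000, 100)
```

Under these assumptions, almost every mutant is lost regardless of its fitness, which is why read-count changes across such a bottleneck cannot be interpreted as selection.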
+ Furthermore , the existence of a restrictive bottleneck would limit the complexity of the library that could be used for TraDIS evaluation of populations colonizing the MSI ( input pool ) and reaching the blood ( output pool ) . 
+ We constructed the E. coli A192PPΔlacZ mutant by bacteriophage Red recombineering and conﬁrmed that there was no signiﬁcant difference in lethal potential between E. coli A192PP and the lacZ mutant ( Fig. 5A ) . 
+ We then used mixtures of parent and mutant bacteria to investigate the existence of bottlenecks that restrict translocation to the blood compartment . 
+ A 1:1 mixture ( total of 2 × 10^6 to 4 × 10^6 CFU ) of E. coli A192PP and A192PPΔlacZ was administered orally to P2 rat pups , the animals were sacriﬁced after 24 h , and GI tissue homogenates were plated for the quantiﬁcation of bacteria of each strain . 
+ The competitive index ( CI ) , the ratio of input A192PP and A192PPΔlacZ bacteria to output A192PP and A192PPΔlacZ bacteria , was calculated for excised PSI , MSI , DSI , colon , and mesenteric lymphatic tissues and for blood . 
+ CI values for the PSI , MSI , DSI and colon were not signiﬁcantly different from 1 ( one-sample t test ) , indicating that the composition of the colonizing inoculum was maintained in each rat pup ( Fig. 5B ) . 
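A competitive index of this kind can be computed as sketched below. The orientation follows the wording of the text (input ratio divided by output ratio); other studies define the CI the other way round, so the direction, the function name, and the example counts should be treated as assumptions.

```python
def competitive_index(parent_in: float, mutant_in: float,
                      parent_out: float, mutant_out: float) -> float:
    """(parent/mutant ratio in the inoculum) / (parent/mutant ratio recovered).

    A value of 1 means the strain composition of the inoculum was maintained.
    """
    if min(parent_in, mutant_in, parent_out, mutant_out) <= 0:
        # The ratio is undefined when a strain is absent from a sample.
        raise ValueError("all counts must be positive")
    return (parent_in / mutant_in) / (parent_out / mutant_out)


# Hypothetical CFU counts for a 1:1 inoculum recovered from one tissue.
ci_maintained = competitive_index(1e6, 1e6, 5e4, 5e4)
ci_skewed = competitive_index(100, 100, 50, 100)
```

Note that the index cannot be computed for a sample from which only one strain is recovered, so such samples have to be reported separately.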
+ However , there was more heterogeneity in CI values of bacterial populations from the blood , and in ﬁve pups , only one strain could be recovered from the blood ( the parent strain only for four animals and A192PPΔlacZ only for one animal ) . 
+ The highly restrictive bottleneck between GI epithelial transport and entry into the blood circulation supports the argument that the reduced virulence of the complete , cultured library in comparison to that of less-complex sublibraries ( Fig. S3 ) is due at least in part to a reduced likelihood that a fully virulent mutant would randomly escape capture by the mesenteric lymphatic system . 
+ The presence of signiﬁcant bottlenecks between the GI tract , blood circulation , and brain was conﬁrmed by determination of the complexity of recovered Tn5 library populations from these sources ( Fig. 5C ) . 
+ Identiﬁcation of E. coli K1 A192PP genes required for survival in human serum . 
+ Systemic infection in neonatal rats is likely to be maintained only if E. coli A192PP bacteria survive in the blood circulation . 
+ Due to the limited exposure to antigens in utero coupled with deﬁcits in adaptive immunity , neonates depend on innate immunity for protection against infection . 
+ The complement system provides frontline innate defense against Gram-negative bacterial infection , and the polySia capsule in turn enables E. coli K1 to avoid successful complement-mediated attack by host immune mechanisms . 
+ To obtain insights into E. coli K1 pathogenesis during the invasive phase of infection , and in light of restrictions placed on the neonatal rat model with regard to the use of TraDIS by the gut-to-blood bottleneck , we used the E. coli A192PP Tn5 library to investigate genes essential for A192PP ﬁtness in pooled normal human serum , a reliable and plentiful source of all soluble components of the three complement pathways ( 48 ) . 
+ E. coli A192PP is resistant to the bactericidal action of human serum ( Fig. 6A ) . 
+ A portion of the A192PP-Tn5 library containing 2 × 10^4 mutants ( 1 × 10^9 CFU ) was incubated in either 30 % human serum or 30 % heat-inactivated serum ( ﬁnal volume of 375 µl ) at 37 °C for 3 h . 
+ Kanr bacteria in the input and output pools ( each with 2 × 10^5 bacteria ) were collected , DNA was extracted from each pool , and transposon insertion sites were sequenced . 
+ A wide distribution of ﬁtness scores was detected ( Fig. 6B ) . 
+ Mutation of 97 genes ( negative log2 fold change and P < 0.05 ) resulted in decreased survival in normal serum but not in heat-inactivated serum ( Fig. 6C ; see also Table S3 in the supplemental material ) . 
+ A high proportion of genes identiﬁed in the TraDIS screen as contributing to resistance encoded cell surface constituents . 
+ It is well established that the polySia capsule protects E. coli K1 from complement attack ( 16 , 17 ) , and three mutations in the kps capsule gene cluster compromised serum survival . 
+ The central region of the cluster contains the neu genes that direct the biosynthesis , activation , and polymerization of the N-acetylneuraminic acid building block of polySia . 
+ neuC encodes the UDP N-acetylglucosamine 2-epimerase that catalyzes the formation of N-acetylmannosamine ( 49 ) , and the O-acetyltransferase neuD acetylates monomeric neuraminic acid at carbon position 7 or 9 ( 50 ) . 
+ KpsM is a component of the multimeric ATP-binding cassette transporter involved in the translocation of the polySia capsule through a transmembrane corridor to the cell surface ( 41 , 51 ) . 
+ A disruption of the genes encoding these proteins will prevent polySia expression ( 41 ) ; the interruption of rfaH , identiﬁed in the TraDIS screen , will also prevent capsule expression , but its loss will have a more profound effect on the surface topography of E. coli A192PP , as this transcriptional antiterminator is required for the expression of operons that direct the synthesis , assembly , and export of LPS core components , pili , and toxins in addition to the capsule ( 52 , 53 ) . 
+ Indeed , survival in serum is dependent on antitermination control by RfaH ( 54 ) . 
+ Another identiﬁed gene that impacts capsule formation was bipA ; BipA is a tyrosine-phosphorylated GTPase that regulates a variety of cell processes , including some associated with virulence , through the ribosome ( 55 , 56 ) . 
+ Other genes involved in LPS biosynthesis and pilus formation were also identiﬁed : waaW encodes a UDP-galactose : ( galactosyl ) LPS alpha1,2-galactosyl-transferase involved in the synthesis of the R1 and R4 LPS core oligosaccharides ( 57 ) , and wzzE encodes a polysaccharide copolymerase that catalyzes the polymerization of LPS O-antigen oligosaccharide repeat units into a mature polymer within the periplasmic space in readiness for export to the cell surface ( 58 ) . 
+ Both mutations will prevent the attachment of LPS O-antigen side chains to the core oligosaccharide of LPS . 
+ The 16 genes that specify pilus synthesis that were identiﬁed in the screen included the majority of the genes of the tra locus . 
+ The TraDIS screen identiﬁed a range of proteins that are embedded in the OM ( Fig. 6C ) , none of which had been previously implicated in complement resistance . 
+ These proteins are likely to inﬂuence the topography of the bacterial surface . 
+ Of the remaining genes with an assigned function , the majority were involved in cell metabolism and the stress response ; it is well established that metabolic processes are intimately associated with the process of complement-mediated killing of bacteria ( 59 , 60 ) . 
+ To verify the screen , we constructed four single-gene mutants of E. coli A192PP by bacteriophage λ Red recombineering . 
+ Genes with roles in LPS synthesis ( rfaH and waaW ) , capsule synthesis ( neuC ) , and pilus assembly ( traL ) were mutated ; none of these mutants showed any reduction in growth rates in LB broth . 
+ All mutants displayed signiﬁcant reductions in complement resistance following incubation in pooled human serum ( Fig. 6D ) . 
+ E. coli A192PPΔrfaH was exquisitely susceptible , with no colonies being detected after 30 min . 
+ The viability of A192PPΔneuC was also compromised , with a 3-log reduction in viability over the 3-h incubation period . 
+ Killing of A192PPΔtraL and A192PPΔwaaW was less marked , but these mutations signiﬁcantly reduced viability . 
+ Complementation of the mutants with the functional gene introduced on a pUC19 vector completely restored resistance in all cases ( Fig. 6D ) . 
+ These genes also contributed to lethality in P2 neonatal rats ( Fig. 6E ) . 
+ The lethal capacity of A192PPΔneuC , A192PPΔrfaH , and A192PPΔwaaW was completely attenuated in comparison to that of E. coli A192PP ; 42 % of pups administered A192PPΔtraL succumbed to systemic infection ( all P < 0.01 ) . 
+ DISCUSSION
+ Systemic infection with meningeal involvement arises spontaneously after GI colonization of neonatal rats with a high proportion of E. coli K1 isolates , and the pathway to infection mirrors to a large extent that of natural infections in the human host . 
+ In contrast to models of bacterial infection that create artiﬁcial pathogenesis by bypassing some or all of the barriers to infection by the injection of a bacterial bolus directly into the blood circulation , the neonatal rat model provides an opportunity to investigate the progress of the pathogen as it transits from gut to blood to brain in a stepwise fashion . 
+ TraDIS and other transposon sequencing methods enable the simultaneous and rapid determination of the ﬁtness contribution of every gene under a given condition and therefore have the potential to enable the identiﬁcation of genes that are essential for , or signiﬁcantly contribute to , each step of the infection process . 
+ However , stochastic loss will become evident if each mutant in the input pool does not have an equal chance to overcome the physical , physiological , and immunological barriers presented by the host ( 61 ) . 
+ This was clearly the case with the epithelial transit of E. coli A192PP , with evidence that on occasion systemic infection arose due to only one viable bacterial cell entering the blood circulation ( Fig. 5B ) , and complements other studies showing single-cell or low-cell-number bottlenecks in models of severe infection ( 62 -- 64 ) . 
+ As translocation from colonization sites within the MSI to the blood was not amenable to analysis by TraDIS , we determined genes essential for survival in the presence of complement , a major component of the innate immune system that protects against extracellular systemic pathogens ( 17 ) . 
+ The high density of transposon insertions into random genomic positions along the entire E. coli A192PP chromosome , with minimal insertional bias ( Fig. 1A ) , enabled the identiﬁcation of genes essential for growth in nutrient-replete LB medium . 
+ Of the 357 E. coli A192PP genes considered essential , orthologues of 254 ( from 315 ) were previously identiﬁed using TraDIS in a multidrug-resistant uropathogenic strain of E. coli ST131 grown in LB broth ( 34 ) , and 253 were identiﬁed in an E. coli K-12 strain ( 35 ) , conﬁrming the existence of a core set of essential genes in E. coli . 
+ As anticipated , a high proportion of these genes encoded enzymes involved in a range of key metabolic functions such as carbohydrate , protein , and nucleobase metabolism , and the remainder were associated with essential functions such as transport , cell organization , and biogenesis . 
+ During characterization of the E. coli A192PP mutant library , we examined the impact of culture in liquid medium on the expression of the polySia capsule , which places large demands on cell energy expenditure , as lengthy incubation times before marker selection may decrease library complexity ( 38 ) . 
+ Unexpectedly , we found that prolonged culture of the library enriched the proportion of nonencapsulated mutants ( Fig. 3A ) . 
+ We anticipated that the loss of capsule would enable the nonencapsulated mutants to grow at a higher rate than capsule-replete mutants and the wild type and to outcompete capsule-bearing library members . 
+ However , the growth of a nonencapsulated mutant selected at random from the library was virtually identical to , and not significantly different from , that of the E. coli A192PP parent strain ( see Fig. S2A in the supplemental material ) . 
+ There was also no difference in the climax populations of the strains at the end of the logarithmic phase of growth . 
+ In a similar fashion , the growth curve for a neuC single-gene mutant was identical to that of E. coli A192PP ( Fig. S2B ) . 
+ neuC is involved in the synthesis of the N-acetylneuraminic acid monomeric unit of polySia , and , as a consequence , the neuC mutant is unable to elaborate the capsule . 
+ It is clearly impractical to evaluate the growth kinetics of every distinct nonencapsulated mutant in the Tn5 library , but it currently appears that differences in growth rates of individual library members can not explain the highly reproducible enrichment that we observed . 
+ Indeed , the use of transposon insertion libraries is predicated on the assumption that there are no signiﬁcant differences in the growth rates of individual mutants . 
+ At present , the basis of the loss of mutants expressing capsule in TraDIS library cultures can not be readily explained . 
+ A sublibrary of 2 × 10^5 mutants was used to establish genes involved in GI colonization . 
+ To minimize bias due to any outgrowth of nonencapsulated mutants on the GI epithelium , we harvested E. coli K1 from the MSI after 4 h , by which time maximal CFU had been achieved ; bacteria were plated directly on solid medium to further avoid outgrowth . 
+ Bias due to this restricted timeline is likely to be low , as the majority of genes involved in adhesion and complement resistance are expressed constitutively . 
+ TraDIS identiﬁed the polySia capsule as a major determinant of GI colonization associated with E. coli K1 . 
+ There is little or no evidence from the literature that capsules of Gram-negative bacteria enhance GI colonization ; indeed , it has been reported that they interfere with adhesive interactions by obstructing the binding of underlying surface molecules to mucosal surfaces ( 65 , 66 ) . 
+ The E. coli A192PPΔneuC::kan single-gene mutant displayed a reduced capacity to colonize the MSI ( Fig. 4C ) , although it should be borne in mind that passage through the upper alimentary canal and stomach may impact the number of mutant bacteria gaining access to the small intestine . 
+ In this context , it should be noted that capsular exopolysaccharide protects E. coli from the environmental stress of stomach acid ( 67 ) . 
+ Other cell surface structures that are likely to have an impact on adhesion to and colonization of the mucosal layer associated with the MSI were identiﬁed by TraDIS . 
+ Pili are established mediators of adhesion of E. coli to the host epithelium , although a large proportion of the evidence comes from enterotoxigenic and enteropathogenic strains ( 68 , 69 ) . 
+ LPS and OM protein-encoding genes were also implicated , as were genes involved in the stress response , reﬂecting ongoing adaptation to a new and hostile environment . 
+ The involvement of genes encoding metabolic enzymes , including some for anaerobic respiration , equates to increases in bacterial cell numbers in the anaerobic environment of the small intestine , and for iron acquisition genes , this reﬂects the low availability of intestinal luminal iron ( 47 , 70 ) . 
+ Genes encoding some components of type II and type IV secretion systems were found at decreased frequencies in the output pool . 
+ Members of these gene categories were also identiﬁed previously by Martindale et al. ( 39 ) as being necessary for GI colonization by E. coli K1 fecal isolate RS228 using signature-tagged mutagenesis ; no genes found in that study were identiﬁed in the present study , in spite of the close genetic relatedness of the strains employed . 
+ The intestinal lumen represents a potentially important portal of entry for pathogens into the host through adhesion , invasion , or disruption of the epithelial barrier ( 71 ) . 
+ In neonatal rats , E. coli K1 induces no detectable disruption of barrier integrity but exploits an intracellular pathway to access the submucosa ( 12 ) . 
+ Only small numbers of bacteria breach the mesenteric lymphatic barrier in an apparently random fashion ( Fig. 5 ) , and this precludes analysis by TraDIS . 
+ To accumulate data on genes and gene products facilitating invasion and survival/replication in the blood circulation , we examined essentiality for avoiding complement-mediated bactericidal effects . 
+ Although not all E. coli K1 isolates from cases of systemic infection are resistant to complement , resistance is encountered more frequently among K1 and K5 capsular types than among other K types ( 72 ) ; E. coli O18 : K1 strains ( such as A192 ) are in turn resistant more often than are other O : K serotype combinations ( 73 ) due to the capacity of the polySia capsule to prevent complement activation . 
+ It is assumed , but not established , that the polySia capsule surrounding susceptible strains does not completely mask either OM-located activators of complement or lipid domains on the outer surface of the cell that are targets for OM intercalation of the C5b-9 membrane attack complex , the entity responsible for bacterial killing ( 59 ) . 
+ In addition , long and numerous LPS O-antigen side chains are necessary but not sufﬁcient to enable the target cell to avoid complement killing ( 74 ) , and they are able to bind the C1 inhibitor to arrest classical or lectin pathway activation at the early C1 stage ( 75 ) . 
+ The importance of these structures for the complement resistance of E. coli K1 is supported by the decreased frequency of key LPS and capsule genes in the output pool along with a large number of OM-embedded proteins . 
+ A small number of OM proteins , such as TraT and Iss , have been implicated as being determinants of complement resistance ( 74 ) , but they have been introduced into low-resistance backgrounds in high copy numbers ; their role in the intrinsic resistance of clinical isolates is unclear , and no mechanisms have been invoked to account for increases in resistance . 
+ The insertion of large numbers of protein molecules into the OM may fortuitously alter the biophysical properties of the bilayer , reducing the surface area and ﬂuidity of lipid patches that are essential for the binding and assembly of the C5b-9 membrane attack complex . 
+ The identiﬁcation of a range of OM proteins as being putative complement resistance determinants by TraDIS creates an opportunity to systematically investigate their precise function through the generation of single-gene mutants , and we intend to pursue this line of investigation . 
+ We suggest that the architecture of the external surface of the OM , together with other more-external macromolecular structures such as polysaccharide capsules , inﬂuences the capacity of the pore-generating C5b-9 complex to perturb the integrity of the OM . 
+ Thus , the surface of susceptible strains contains a sufﬁcient number of exposed lipid domains to facilitate C5b-9 generation and penetration , whereas the spatial and temporal organization of the OM of resistant bacteria is dominated by supramolecular protein assemblages to a degree where insufﬁcient hydrophobic domains are available to act as C5b-9 assembly and binding sites , and this state persists throughout the growth cycle . 
+ The data that we have generated in this study are compatible with this hypothesis . 
+ An array of metabolic genes emerged as being essential for the maintenance of the complement-resistant phenotype ( Fig. 6D ) and may be indicative of repair processes invoked due to complement attack . 
+ The exposure of resistant E. coli to complement results in a minor perturbation of membrane integrity and metabolic homeostasis ( 76 , 77 ) , and C5b-9 intercalation into the OM has profound effects on cellular metabolic parameters ( 60 ) . 
+ TraDIS was also employed by Phan and coworkers to deﬁne the serum resistome of a globally disseminated , multidrug-resistant clone of E. coli ST131 ( 34 ) . 
+ Those researchers identified , and in most cases validated , 56 genes that contributed to the high level of complement resistance displayed by this pathogen . 
+ In a fashion similar to that in our study , genes involved in the synthesis and expression of cell surface components were prominent . 
+ A number of genes contributing to LPS biosynthesis , such as those of the waa operon , the wzz locus , and rfaH , were common to both studies , as was the gene encoding the intermembrane protein AcrA . 
+ Genes of the plasmid-borne tra locus , which we determined to be components of the E. coli A192PP serum resistome , were not present in E. coli ST131 ( 34 ) , but other OM-located proteins may fulﬁll a similar role in reducing the ﬂuidic properties of the bilayer . 
+ In contrast to the well-established role of the E. coli K1 polysialyl polymer in the prevention of complement activation , no capsule genes were identiﬁed as components of the serum resistome of E. coli ST131 , but different ST131 isolates express different capsule types due to extensive mosaicism at the capsule locus ( 78 ) , and these uronic acid-containing polymers are unlikely to prevent complement activation ( 75 ) . 
+ Thus , the different strategies employed by the two strains to prevent successful complement attack , together with differences in bacterial surface composition and topography , probably explain variations in the serum resistomes of these related pathogens . 
+ In summary , we identiﬁed E. coli K1 genes required for growth in standard laboratory liquid medium and for colonization of the GI tract of P2 neonatal rat pups . 
+ Both data sets provide insights into the biology of K1 neuropathogens and could provide the basis for drug discovery programs for the identiﬁcation of selective antibacterial or colonization-inhibiting agents . 
+ In our rodent model , the stochastic nature of invasion of the blood and probably brain prevented TraDIS analysis of gene essentiality for crossing gut epithelial and choroid plexus borders , but some indication of genes necessary for survival in blood was obtained from analyses of output pools after incubation of E. coli A192PP in human serum , a potent source of complement . 
+ MATERIALS AND METHODS
+ Ethics statement . 
+ Animal experiments were approved by the Ethical Committee of the University College London ( UCL ) School of Pharmacy and the United Kingdom Home Ofﬁce and were conducted in accordance with national legislation . 
+ Bacteria and culture conditions . 
+ E. coli strain A192PP was obtained by serial passage of E. coli A192 ( serotype O18 : K1 ) , isolated from a patient with septicemia ( 79 ) , in P2 neonatal rats as described previously ( 11 ) . 
+ Carriage of the polysialyl K1 capsule was determined with phage K1E ( 80 ) : colonies were streaked onto Mueller-Hinton ( MH ) agar , 10 µl of a phage suspension containing 10^9 PFU/ml was dropped onto each streak , the plates were incubated overnight at 37 °C , and the proportion of encapsulated bacteria within cultures was quantified by comparing the ratio of phage-susceptible to phage-resistant colonies . 
+ E. coli A192PP single-gene mutants ( Table 1 ) were constructed by using bacteriophage λ Red recombination ( 81 ) ; the oligonucleotides employed for the construction of targeted mutants , for the confirmation of targeted mutants , and for the construction of complemented mutants are shown in Tables S3 to S5 in the supplemental material . 
+ All bacteria were cultured in LB medium and on LB agar at 37 °C ; media were supplemented with either 100 µg/ml ampicillin or 50 µg/ml kanamycin as required . 
+ Tn5 library construction . 
+ The EZ-Tn5 KAN-2 Tnp transposome ( Epicentre Biotechnologies ) was introduced into E. coli A192PP by electroporation . 
+ Transformants were selected by growth on LB plates containing 50 µg/ml kanamycin overnight . 
+ Pools of 1 × 10^3 to 5 × 10^3 colonies were collected and frozen at -80 °C in phosphate-buffered saline ( PBS ) containing 20 % glycerol . 
+ Aliquots of individual pools were combined to create larger populations of up to 7.75 × 10^5 mutants . 
+ Genomic DNA was extracted from 1-ml cultures by using the PurElute bacterial genomic kit ( Edge Biosystems ) according to standard protocols . 
+ Linker PCR of Tn5 insertion sites . 
+ Linker PCR was used to test individual transformant colonies and to conﬁrm individual random-insertion events . 
+ DNA ( 2.5 µg ) was digested with the AluI restriction enzyme ( Promega ) and purified by using a MinElute PCR purification kit ( Qiagen ) . 
+ A linker , formed by annealing of oligonucleotides 254 ( 5'-CGACTGGACCTGGA-3' ) and 256 ( 5'-GATAAGCAGGGATCGGAACCTCCAGGTCCAGTCG-3' ) , was ligated to purified fragments ( 50 ng ) with a Quick ligation kit ( NEB ) . 
+ Linker PCR was performed with linker- and transposon-specific oligonucleotides ( 258 [ 5'-GATAAGCAGGGATCGGAACC-3' ] and 5'-GCAATGTAACATCAGAGATTTTGAG-3' , respectively ) by using a HotStart Taq Mastermix kit ( Qiagen ) and thermocycling conditions of 95 °C for 5 min ; 35 cycles of 94 °C for 45 s , 56 °C for 1 min , and 72 °C for 1 min ; and 72 °C for 10 min . 
+ The resulting amplicons were separated on 1.5 % agarose gels at 100 V for 60 min . 
+ Illumina sequencing . 
+ For sequencing of Tn5 insertion sites , approximately 2 µg of genomic DNA was degraded to 500-bp fragments by ultrasonication using a Covaris instrument . 
+ Fragments were end repaired and A tailed by using the NEBNext DNA library preparation reagent kit for Illumina sequencing ( NEB ) . 
+ The adapters Ind_Ad_T ( ACACTCTTTCCCTACACGACGCTCTTCCGATC * T , where * indicates phosphorothioate ) and Ind_Ad_B ( GATCGGAAGAGCGGTTCAGCAGGAATGCCGAGACCGATCTC ) were annealed and ligated to DNA fragments . 
+ PCR was performed with the transposon- and adapter-specific primers Tn-FO ( 5'-TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGCGGGGATCCTCTAGAGTCGACCTGC-3' ) and Adapt-RO ( 5'-GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGACACTCTTTCCCTACACGACGCTCTTCCGATC-3' ) . 
+ Tn-FO and Adapt-RO contain a forward overhang and a reverse overhang for indexing of amplicons by Nextera index primers ( Illumina ) . 
+ PCR was performed by using a HotStart Taq Mastermix kit ( Qiagen ) and thermocycling conditions of 95 °C for 5 min ; 22 cycles of 94 °C for 45 s , 56 °C for 1 min , and 72 °C for 1 min ; and 72 °C for 10 min . 
+ The resulting amplicons were separated on 1.5 % agarose gels at 70 V for 90 min , and those between 150 and 700 bp were selected and puriﬁed by using a QIAquick gel extraction kit ( Qiagen ) . 
+ Samples were indexed with oligonucleotides from the Nextera XT index kit ( Illumina ) by using HotStart ReadyMix ( Kapa Biosystems ) and thermocycling conditions of 95 °C for 3 min ; 8 cycles of 95 °C for 30 s , 55 °C for 30 s , and 72 °C for 30 s ; and 72 °C for 5 min . 
+ Indexed amplicons were puriﬁed by using the AMPure XP system ( Agencourt ) . 
+ The ﬁnal concentration of samples was conﬁrmed by using Qubit dsDNA BR assays ( ThermoFisher Scientiﬁc ) . 
+ Indexed amplicons were sequenced on an Illumina MiSeq platform as 151-bp paired-end reads according to the manufacturer 's protocol ( Illumina ) . 
+ Bioinformatic and statistical analyses . 
+ Raw sequence reads that passed Trimmomatic quality control ﬁlters ( 82 ) and contained the Tn5 transposon were mapped to the E. coli K1 A192PP reference genome ( 14 ) by using Bowtie ( 83 ) , permitting zero mismatches and excluding reads that did not map to a single site . 
+ The reference genome assembly contains open reading frames ( ORFs ) located on contigs that were mapped to the IHE3034 chromosome and ORFs located on other contigs that are likely to map to plasmids and other mobile genetic elements . 
+ An in-house pipeline based on the SAMtools ( http://samtools.sourceforge.net/ ) and BCFtools toolkits was applied to the alignment files to determine insertion sites and coverage . 
+ To identify essential and nonessential genes , the insertion index was calculated for each gene by dividing the number of unique insertions in the gene by the gene length . 
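+ The insertion index described above can be sketched in Python as follows . This is a minimal illustration only : the function name and the example counts are hypothetical and are not taken from the in-house pipeline . 

```python
def insertion_index(unique_insertions: int, gene_length_bp: int) -> float:
    """Unique transposon insertions per base pair of a gene."""
    if gene_length_bp <= 0:
        raise ValueError("gene length must be a positive number of base pairs")
    if unique_insertions < 0:
        raise ValueError("insertion count cannot be negative")
    return unique_insertions / gene_length_bp


# Hypothetical example values, chosen only to show the calculation.
well_tolerated = insertion_index(unique_insertions=30, gene_length_bp=1500)
rarely_hit = insertion_index(unique_insertions=0, gene_length_bp=1200)
```

+ Genes with an index near zero are candidates for the essential set ; the actual classification in the text relies on the fitted distributions , not on a fixed cutoff . 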
+ Observed insertion index values were ﬁtted to a bimodal distribution with a gamma distribution ( or an exponential distribution for genes with no observed insertion sites ) corresponding to essential and nonessential genes . 
+ The log2 likelihood , and corresponding P values , of each gene belonging to essential or nonessential sets was calculated by using R software . 
+ To compare the fitness of individual mutants in input and output populations , reads were normalized and tested for differential base means by calculating log2 fold changes and corresponding P values at a false discovery rate of 0.1 using DESeq with R software . 
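+ The idea of a per-gene log2 fold change between input and output pools can be illustrated with a simplified Python sketch . It is not the DESeq procedure : it assumes plain library-size normalization and a pseudocount of 1 ( both assumptions made here for illustration ) , whereas DESeq estimates size factors and tests counts under a negative binomial model . All read counts below are hypothetical . 

```python
import math


def log2_fold_change(input_reads: int, output_reads: int,
                     input_total: int, output_total: int,
                     pseudocount: float = 1.0) -> float:
    """log2(output / input) after scaling each count by its library size.

    The pseudocount avoids division by zero for genes with no reads.
    """
    input_norm = (input_reads + pseudocount) / input_total
    output_norm = (output_reads + pseudocount) / output_total
    return math.log2(output_norm / input_norm)


# Hypothetical gene: 99 reads in the input pool, 24 in the output pool,
# with both libraries sequenced to one million mapped reads.
lfc = log2_fold_change(99, 24, 1_000_000, 1_000_000)
```

+ A negative value indicates that mutants in the gene were depleted in the output pool , which is the direction of change used in the text to flag genes that contribute to survival . 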
+ Colonization and infection of neonatal rats . 
+ Timed-birth Wistar rat pup litters ( usually n = 12 ) were purchased from Harlan UK , delivered at P2 , and colonized on the same day . 
+ Pups were retained throughout each experiment with the natural mothers in a single dedicated cage under optimal conditions ( 19 °C to 21 °C , 45 to 55 % humidity , 15 to 20 changes of air/h , and a 12-h light/dark cycle ) and were returned to the mother immediately after colonization . 
+ Mothers had unrestricted access to standard rat chow and water . 
+ The procedure was described in detail previously ( 84 ) . 
+ In brief , all members of P2 rat pup litters were fed 20 µl of mid-logarithmic-phase E. coli bacteria ( 2 × 10^6 to 6 × 10^6 CFU unless otherwise stated ) from an Eppendorf micropipette . 
+ GI colonization was conﬁrmed by culture of perianal swabs on MacConkey agar , and bacteremia was detected by MacConkey agar culture of blood taken postmortem . 
+ Disease progression was monitored by daily evaluation of symptoms of systemic infection , and neonates were culled by decapitation and recorded as dead once a threshold had been reached : pups were regularly examined for skin color , agility , agitation after abdominal pressure , the presence of a milk line , temperature , weight , and behavior in relation to the mother . 
+ Neonates were culled immediately when abnormalities for three of these criteria were evident . 
+ After sacriﬁce , GI tissues were excised aseptically without washing , the colon was separated , and the SI was segmented into 2-cm portions representing proximal , middle , and distal small intestinal tissues . 
+ Tissues were then transferred to ice-cold phosphate-buffered saline and homogenized . 
+ Bacteria were quantified by serial dilution culture on MacConkey agar supplemented with 25 µg/ml kanamycin as appropriate . 
+ The presence of E. coli K1 was confirmed with phage K1E : 20 lactose-fermenting colonies were streaked onto MH agar , 10 µl of a phage suspension containing 10^9 PFU/ml was dropped onto each streak , and the plates were incubated overnight . 
+ E. coli K1 bacteria were quantiﬁed by multiplying the total CFU by the proportion of K1E-susceptible colonies . 
+ In all cases , at least 19 colonies were susceptible to the K1 phage ; E. coli K1 was never found in samples from noncolonized pups . 
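+ The K1 quantification step above is a simple proportion , sketched here in Python . The function name and the example CFU value are hypothetical ; only the 20-colony test and the multiplication rule come from the text . 

```python
def k1_cfu(total_cfu: float, susceptible_colonies: int,
           tested_colonies: int = 20) -> float:
    """Estimate E. coli K1 CFU from total CFU and the K1E phage test."""
    if tested_colonies <= 0:
        raise ValueError("at least one colony must be tested")
    if not 0 <= susceptible_colonies <= tested_colonies:
        raise ValueError("susceptible colonies must be between 0 and tested")
    return total_cfu * susceptible_colonies / tested_colonies


# Hypothetical sample: 5 x 10^6 total CFU, 19 of 20 colonies lysed by K1E.
estimate = k1_cfu(total_cfu=5e6, susceptible_colonies=19)
```

+ With 19 of 20 colonies susceptible , the estimate is 95 % of the total count , which is the lowest proportion reported for colonized pups . 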
+ Susceptibility to human serum . 
+ Serum was obtained from healthy volunteers and used immediately . 
+ Bacteria were grown to late logarithmic phase in LB broth in an orbital incubator ( minimum of 200 orbits/min ) , and 500 µl of the culture was removed , washed twice with gelatin-Veronal-buffered saline plus magnesium and calcium ions ( pH 7.35 ) ( GVB ) , and suspended in an equal volume of GVB . 
+ Fresh human serum was diluted 1:3 in GVB and prewarmed to 37 °C . 
+ Bacterial suspensions and serum solutions were mixed 1:2 to give a final concentration of 10^7 CFU/ml and incubated at 37 °C for 3 h in a total volume of 125 µl containing 22 % serum . 
+ Surviving E. coli bacteria were quantiﬁed by serial dilution and incubation on LB agar overnight . 
+ Prewarmed , heat-inactivated ( 56 °C for 30 min ) serum served as a control . 
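+ The 22 % serum figure follows from the two dilution steps , as the short Python check below shows . It assumes that " 1:3 " means one part serum in three parts total and that " 1:2 " means one part bacterial suspension to two parts serum solution ; these readings are assumptions made here , chosen because they reproduce the stated concentration . 

```python
from fractions import Fraction


def final_serum_fraction(dilution_total_parts: int = 3,
                         bacteria_parts: int = 1,
                         serum_solution_parts: int = 2) -> Fraction:
    """Fraction of neat serum in the final bacteria/serum mixture."""
    diluted_serum = Fraction(1, dilution_total_parts)
    share_of_mix = Fraction(serum_solution_parts,
                            bacteria_parts + serum_solution_parts)
    return diluted_serum * share_of_mix


fraction = final_serum_fraction()      # 1/3 * 2/3 = 2/9
percent_serum = float(fraction) * 100  # about 22.2 %

# Component volumes for the 125-microliter reaction described in the text.
total_volume_ul = 125
serum_solution_ul = total_volume_ul * Fraction(2, 3)
bacterial_suspension_ul = total_volume_ul * Fraction(1, 3)
```

+ Under these assumptions the reaction contains roughly 83 µl of diluted serum and 42 µl of bacterial suspension . 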
+ Accession number ( s ) . 
+ Raw read data for all transposon insertions have been deposited in the European Nucleotide Archive ( ENA ) . 
+ All files are located at https://www.ebi.ac.uk/ena/data/view/PRJEB24291 ; accession numbers are as follows : ERR2235345 and ERR2235346 for the identification of essential genes for replicates 1 and 2 , ERR2235567 for the input population , ERR2235568 for the output population of rat MSI genes , ERR2235569 for the output population of serum-exposed E. coli A192PP , and ERR2235570 for the output population of bacteria exposed to heat-inactivated serum . 
+ SUPPLEMENTAL MATERIAL
+ Supplemental material for this article may be found at https://doi.org/10.1128/JB.00698-17 . 
+ ACKNOWLEDGMENTS
+ This work was supported by research grant MR/K018396/1 from the Medical Research Council . 
+ The National Institute for Health Research University College London Hospitals Biomedical Research Centre provided infrastructural support .
\ No newline at end of file
--- a/data/TEXT_FILES/useful_txt/18340041.txt 0 → 100644
+++ b/data/TEXT_FILES/useful_txt/18340041.txt 0 → 100644
+ Genome-wide analysis of Fis binding in Escherichia coli
+ We determined the genome-wide distribution of the nucleoid-associated protein Fis in Escherichia coli using chromatin immunoprecipitation coupled with high-resolution whole genome-tiling microarrays . 
+ We identified 894 Fis-associated regions across the E. coli genome . 
+ A significant number of these binding sites were found within open reading frames ( 33 % ) and between divergently transcribed transcripts ( 5 % ) . 
+ Analysis indicates that A-tracts and AT-tracts are an important signal for preferred Fis-binding sites , and that A6-tracts in particular constitute a high-affinity signal that dictates Fis phasing in stretches of DNA containing multiple and variably spaced A-tracts and AT-tracts . 
+ Furthermore , we find evidence for an average of two Fis-binding regions per supercoiling domain in the chromosome of exponentially growing cells . 
+ Transcriptome analysis shows that ∼ 21 % of genes are affected by the deletion of fis ; however , the changes in magnitude are small . 
+ To assess differential Fis binding under perturbed growth conditions , ChIP-chip analysis was performed using cells grown under aerobic and anaerobic growth conditions . 
+ Interestingly , the Fis-binding regions are almost identical in aerobic and anaerobic growth conditions -- indicating that the E. coli genome topology mediated by Fis is superficially identical in the two conditions . 
+ These novel results provide new insight into how Fis modulates DNA topology at a genome scale and thus advance our understanding of the architectural bases of the E. coli nucleoid . 
+ The Escherichia coli genome forms a highly condensed structure called a `` nucleoid body '' ( Robinow and Kellenberger 1994 ) , whereas the genomic DNA in eukaryotic cells is packed in a nucleus as a chromatin structure ( Kornberg 1974 ) . 
+ The compact nucleoid body in a bacterial cell is extensively bound by several nucleoid-associated proteins , which include H-NS , HU , IHF , Fis , and the stationary-phase-specific DNA-binding protein Dps ( Murphy and Zimmerman 1997 ; Azam et al. 2000 ; Schneider et al. 2001 ; Dame 2005 ) . 
+ The involvement of the nucleoid-associated proteins in organizing the genetic material within the bacterial nucleoid has been widely accepted , as well as their involvement in regulating transcription ( Ussery et al. 2001 ; Dorman and Deighan 2003 ; Blot et al. 2006 ) . 
+ The Fis protein is a general host nucleoid-associated DNA bending factor comprising 98 amino acids that was first identified because of its critical role in promoting site-specific DNA recombination ( Johnson et al. 1986 ) . 
+ The Fis protein contains a helix -- turn -- helix motif , which binds in the major groove and bends DNA by between 50 ° and 90 ° ( Kostrewa et al. 1991 ; Pan et al. 1996 ) . 
+ Its bending activity stabilizes DNA looping , either directly or through protein -- protein interactions , to enhance transcription as well as to promote DNA compaction ( Travers and Muskhelishvili 1998 ; Skoko et al. 2006 ) . 
+ The intracellular level of Fis protein is growth-dependent and changes from less than 100 copies in stationary phase to more than 60,000 copies per cell in log phase ( Ball et al. 1992 ; Azam et al. 1999 ) . 
+ Several lines of evidence suggest that the Fis protein plays diverse roles in regulating DNA transactions and modulating DNA topology ( Ussery et al. 2001 ) . 
+ Recently , Fis has been implicated in the control of the gene expression involved in metabolism , transport , flagellar biosynthesis , and virulence in E. coli and Salmonella typhimurium ( Kelly et al. 2004 ; Blot et al. 2006 ; Croinin et al. 2006 ) . 
+ 1These authors contributed equally to this work . 2Corresponding author . 
+ E-mail bpalsson@bioeng.ucsd.edu ; fax (858) 822-3120 . 
+ Article published online before print . 
+ Article and publication date are at http://www.genome.org/cgi/doi/10.1101/gr.070276.107 . 
+ The regulation mechanism widely accepted is that Fis influences transcription by directly or indirectly affecting the activity of RNA polymerase and by modulating the level of DNA supercoiling in the cell . 
+ For example , at the promoters rrnB P1 and proP P2 , Fis directly stimulates transcription by contacting the C-terminal domain of the RNA polymerase α subunit ( also known as RpoA ) ( Bokal et al. 1997 ; McLeod et al. 2002 ) . 
+ On the other hand , Fis negatively autoregulates its own operon by hindering RNA polymerase binding ( Ninnemann et al. 1992 ) . 
+ In the case of bacteriophage DNA excision , Fis appears to play an architectural role by contributing to a higher-order nucleoprotein complex that facilitates DNA cleavage and excision ( Landy 1989 ) . 
+ There are 53 Fis-binding sites ( Keseler et al. 2005 ) that have been directly experimentally determined . 
+ Robison and coworkers applied the recognition matrices developed from the experimentally derived Fis-binding sequences to search for Fis-binding sites across the E. coli genome sequence and reported more than 10,000 binding sites ( Robison et al. 1998 ) . 
+ Using hidden Markov models ( HMMs ) , Ussery and coworkers reported 6000 strong Fis-binding sites in the E. coli genome ( Ussery et al. 2001 ) . 
+ Information analysis used by Hengen and coworkers estimated 68,000 Fis-binding sites , or one site per 230 bases ( Hengen et al. 1997 ) . 
+ The wide variation in the number of Fis-binding sites predicted by these three computational methods reflects the fact that only weak binding site profiles are obtained when Fis-binding site sequences are aligned . 
+ The relationship between the global effect of Fis on DNA topology and its local effects exerted on particular promoter regions is not well understood . 
+ The global interactions between the E. coli genome and Fis can be addressed by the direct measurement of Fis -- DNA complexes by chromatin immunoprecipitation coupled with microarrays ( ChIP-chip ) . 
+ The ChIP-chip approach is particularly well suited since unambiguous identification of the location of the proteins is possible by in vivo measurement of the protein -- DNA complex ( Ren et al. 2000 ) . 
+ A recent genome-wide analysis of Fis association in E. coli cells identified 224 binding regions ( Grainger et al. 2006 ) but was limited in its ability to define the binding motif because of the low resolution of the low-density microarrays used . 
+ Here we improve on the resolution of this approach and use a ChIP-chip approach with fully tiled high-density microarrays to determine the distribution of the Fis-binding sites on a genome-scale . 
+ Our data enable the refinement of the Fis-binding motif and new insight into the functional behavior of the Fis protein . 
+ We also determined the effects of fis deletion on the transcription state of the cell . 
+ Results
+ Immunoprecipitation of the DNA fragments associated with Fis, σ70, and RNAP
+ Prior to microarray hybridization , we used qPCR to determine the quality of immunoprecipitated DNA from the strain harboring myc-tagged Fis protein ( BOP608 ) , which has been shown to be highly resistant to stringent washing conditions and to retain its regulatory function in vivo ( Cho et al. 2006a ) . 
+ The cross-linked DNA -- protein complexes were immunoprecipitated by using antibodies against myc-tag , σ70 ( also known as RpoD ) , or core RNAP ( β subunit , also known as RpoB ) from the cultured cells in the minimal media . 
+ Following reversal of DNA -- protein cross-links , the immunoprecipitated DNA ( IP DNA ) was randomly amplified using PCR ( Herring et al. 2005 ) . 
+ In order to determine the enrichment of the IP DNA , qPCR was used to measure the relative levels of promoter and gene regions of known Fis-binding sites using nrfA , nirB , rrsA , sdhC , and dmsA as controls . 
+ The relative occupancy of Fis at the promoter regions of nrfA , nirB , and rrsA was 34 , 32 , and 20 , respectively ( Fig. 1 ) , which is consistent with previous studies ( Wu et al. 1998 ; Browning et al. 2002 ; Paul et al. 2004 ) . 
+ We also determined the association of σ70 and the core RNAP at the promoter and gene regions of nrfA , nirB , and rrsA under the same conditions . 
+ Among these three targets , the association of σ70 and core RNAP was found only at rrsA . 
+ Interestingly , σ70 was detected only at the rrsA promoter , whereas the core RNAP was detected at both the promoter and the gene region . 
+ These observations are in strong agreement with previous studies , such that nrf and nir operons are repressed by Fis ( Wu et al. 1998 ; Browning et al. 2002 ) , and Fis acts as a classical activator at the rrsA promoter ( Paul et al. 2004 ) . 
+ As control experiments , we determined the relative occupancy of Fis , σ70 , and core RNAP at the promoters and gene regions of sdhC and dmsA ( Fig. 1D ) . 
+ The Fis levels at promoters and gene regions of sdhC and dmsA remained at background levels . 
+ As expected , there was a large increase in σ70 and core RNAP association with the promoter and gene regions of sdhC due to its biological role in central metabolism under our growth conditions ( Park et al. 1997 ) . 
+ On the other hand , very low levels of σ70 and core RNAP were measured at the promoter and gene regions of the dmsA gene . 
+ This agrees with the known strong repression of the dmsA gene under the aerobic condition ( Bearson et al. 2002 ) . 
+ Altogether , these results demonstrate that Fis-bound DNA fragments were selectively immunoprecipitated from the exponentially growing E. coli cells . 
+ Genome-wide mapping of Fis-binding regions
+ To identify Fis-binding regions on a genome scale , we next performed a ChIP-chip analysis using custom-designed whole-genome tiling microarrays ( NimbleGen ) that contained a total of 371,034 oligonucleotides to represent the E. coli genome with 50-bp probes overlapping by 25 bp on both forward and reverse strands ( Herring et al. 2005 ) . 
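+ As a rough consistency check on the array design described above , the following sketch counts the probes needed to tile a circular genome at a fixed step on both strands . The genome length ( 4,639,675 bp , assumed here for NC_000913 ) and the function name are assumptions introduced for illustration , not values taken from the array specification . 

```python
# Rough probe-count estimate for a two-strand tiling design.
# Assumption: genome length of 4,639,675 bp for E. coli K-12 MG1655 (NC_000913).
GENOME_LENGTH_BP = 4_639_675
PROBE_LENGTH_BP = 50
OVERLAP_BP = 25


def tiling_probe_count(genome_len, probe_len, overlap, strands=2):
    """Number of probes when a new probe starts every (probe_len - overlap) bp."""
    step = probe_len - overlap
    per_strand = genome_len // step  # circular genome: one probe start per step
    return per_strand * strands


estimated_probes = tiling_probe_count(GENOME_LENGTH_BP, PROBE_LENGTH_BP, OVERLAP_BP)
print(estimated_probes)
```

+ Under these assumptions the estimate is within a few hundred probes of the reported 371,034 oligonucleotides ; the small difference presumably reflects probes excluded during array design . 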
+ Our results identify regions of the genome enriched in the IP DNA samples , allowing us to construct a genome-wide map of in vivo interactions between Fis and the E. coli genome ( Fig. 2A ) . 
+ Using a peak detection algorithm based on the double-regression model ( Kim et al. 2005 ) together with manual curation , 894 unique peaks of Fis association were identified . 
+ The complete list of 894 Fis-binding regions is summarized in Supplemental Table S1 . 
+ The ChIP-chip analysis of Fis was also in agreement with the literature , showing binding at the promoters of acs , nrfA , nuoA , aldB , and nrd ( Fig. 2B ) ( Augustin et al. 1994 ; Xu and Johnson 1995 ; Browning et al. 2004 , 2005 ; Zhang et al. 2004 ) . 
+ Prior to this study , only 53 Fis-binding sites had been reported , 43 ( 81 % ) of which were identified in this study ( Supplemental Table S3 ) ( Keseler et al. 2005 ) . 
+ The exceptions were lpdA , hupB , lysT-valT-lysW , adhE , osmE , gyrA , rnpB , gyrB , bglGFB , and glnALG . 
+ In order to determine whether the failure to detect Fis binding at these 10 sites was due to the sensitivity of the microarrays , we performed conventional ChIP assays followed by qPCR analysis and detected binding of the Fis protein to the promoter region of only bglG . 
+ Because conventional ChIP-qPCR confirmed Fis binding at only one of the 10 undetected sites ( bglG ) , only that site is considered a false negative of our ChIP-chip analysis ( Heintzman et al. 2007 ) , and we estimate the sensitivity of our approach to be ∼ 98 % ( 43 out of 44 ) . 
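+ The sensitivity estimate can be reproduced with a short calculation . This is a minimal sketch using only the counts quoted in the text ; the variable and function names are introduced here for illustration . 

```python
# Counts quoted in the text.
known_sites = 53           # previously reported Fis-binding sites
detected_by_chip_chip = 43  # known sites recovered by ChIP-chip
confirmed_missed = 1        # undetected sites confirmed by ChIP-qPCR (bglG)


def sensitivity(true_positives, false_negatives):
    """Fraction of truly bound sites that the assay recovered."""
    return true_positives / (true_positives + false_negatives)


recovered_fraction = detected_by_chip_chip / known_sites
chip_sensitivity = sensitivity(detected_by_chip_chip, confirmed_missed)
print(f"recovered {recovered_fraction:.0%} of known sites")
print(f"sensitivity {chip_sensitivity:.1%}")
```

+ The two figures differ because the nine remaining undetected sites showed no binding by ChIP-qPCR under these growth conditions and are therefore not counted as false negatives . 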
+ Validation of the ChIP-chip results was then done using qPCR on 13 randomly selected sites of the 894 Fis-binding regions ( uidR , kdgT , hupA , yecF , eaeH , ybfL , ydcC , crp , thrW , ynaJ , otsA , metJ , and yfdT ) and two control regions ( pgi and dmsA ) . 
+ All of the selected Fis-binding regions exhibited enrichment , with log2 ratios ranging from 1.5 to 5.1 , while the two control regions showed no significant enrichment ( Supplemental Table S4 ) . 
+ Reassuringly , there was a strong correlation between the signal intensities obtained from ChIP-chip analysis and the real-time qPCR ( Fig. 3 ) . 
+ On the basis of this analysis , we concluded that the majority of Fis-binding peaks identified here are bona fide binding sites . 
+ Properties of Fis-binding regions
+ To assess the properties of Fis-binding regions , we analyzed the position of Fis-binding regions against the current annotated genome information ( NC_000913 ) . 
+ Fis-binding regions were not only observed within intergenic ( IG ) regions , but were just as likely to be found within open reading frames ( ORFs ) . 
+ From the Fis-binding pattern , we classified three binding categories : IG1 , IG2 , and ORF . 
+ The IG1 category consists of Fis-binding peaks found within promoter regions , while the IG2 consists of Fis-binding peaks found within the intergenic region between convergently transcribed genes ( Fig. 4A ) . 
+ All of the remaining sites found within ORFs are thus members of the ORF category . 
+ Among a total of 894 unique Fis-binding sites , 547 peaks ( ∼ 61 % ) were within IG1 regions . 
+ A significant portion of the Fis-binding sites was also present in the IG2 ( 48 peaks ) and ORF ( 299 peaks ) categories . 
+ Thus , although many sites ( 67 % ) are present in the intergenic regions ( IG1 and IG2 ) , 33 % of Fis-binding sites are also located at other regions within a gene ( Fig. 4B ) . 
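+ The category percentages quoted above follow directly from the peak counts . The sketch below recomputes them ; the dictionary layout is an assumption made for illustration . 

```python
# Peak counts per binding category, as quoted in the text.
peak_counts = {"IG1": 547, "IG2": 48, "ORF": 299}

total_peaks = sum(peak_counts.values())
percentages = {
    category: round(100 * count / total_peaks)
    for category, count in peak_counts.items()
}
# Intergenic sites are the union of the IG1 and IG2 categories.
intergenic_pct = round(100 * (peak_counts["IG1"] + peak_counts["IG2"]) / total_peaks)

print(total_peaks, percentages, intergenic_pct)
```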
+ To validate these sites shown in IG2 and ORF regions , we performed ChIP analysis followed by qPCR to measure the association of Fis protein with four targets within ORF regions ( uidR , ydcC , crp , and otsA ) and two targets in the IG2 region ( metJ-metB and yfdT-dsdC ) . 
+ The ChIP-qPCR results indicated that each of those regions is a genuine Fis-binding target . 
+ We now compare Fis-binding regions to core RNAP and σ70 binding sites discovered in previous experiments . 
+ In a previous study ( Herring et al. 2005 ) , we measured the genome-wide association of core RNAP ( β′ subunit [ also known as RpoC ] ) using the same microarray under aerobic growth conditions . 
+ Since the core RNAP ChIP-chip analysis was performed with rifampicin treatment to trap RNAP at promoter sites , the core RNAP-binding peaks detected represent most of the promoters ( both active and inactive ) . 
+ Recently , the genome-wide association of σ70 with the E. coli genome was also revealed using a similar whole-genome tiling microarray ( Reppas et al. 2006 ) . 
+ Using all of these data , we found core RNAP or σ70-binding peaks in 462 Fis-binding regions . 
+ Most of the core RNAP or σ70-binding peaks were located in the IG1 region ( 408 peaks ) . 
+ Interestingly , 37 and 17 Fis-binding peaks in the ORF and IG2 regions , respectively , also overlapped with core RNAP or σ70 binding ( Fig. 4B ) . 
+ Of the 161 σ70 sites that were determined to be within the coding sequences of genes ( ORF region ) or between convergently transcribed genes ( IG2 region ) ( Reppas et al. 2006 ) , 41 also contained a Fis-binding peak within the same region ( Fig. 4C ) . 
+ This result suggests that many of the 
+ ( a ) Activation and repression were decided from changes in fold ratio between log2 values obtained from fis deletion and parental strain . 
+ ( b ) Classes I and II in direct regulation category indicate the Fis-binding regions at IG1 and ORF , respectively . 
+ The amount of Fis protein in a cell is known to be growth-phase-dependent ( Azam et al. 1999 ) . 
+ The dramatic increase in levels of Fis during exponential growth phase is controlled at the transcriptional level , which responds directly to an increase in growth rate . 
+ The fact that Fis concentration varies tremendously under different growth phases clearly points to an important regulatory implication of the Fis protein for cell physiology . 
+ However , under different growth conditions ( e.g. , aerobic to anaerobic growth condition shift ) , its regulatory role or binding regions have not been investigated . 
+ To address this issue , genome-wide Fis-binding regions were mapped under aerobic and anaerobic growth conditions in exponential growth phase . 
+ Interestingly , the Fis-binding regions identified from the ChIP-chip analysis of anaerobically grown cells were almost identical with those from aerobically grown cells . 
+ Complete Fis-binding sites of anaerobic growth conditions are also summarized in Supplemental Table S1 . 
+ Next , to investigate the effect that Fis has on gene expression , we measured the expression profiles of a fis deletion strain and its parental strain under aerobic and anaerobic conditions . 
+ A two-way ANOVA analysis with a 1 % FDR ( P-value = 0.0001 ) revealed 48 genes to be regulated by Fis across the aerobic/anaerobic shift . 
+ The ChIP-chip data further suggested that 21 of these genes were directly regulated by Fis , while the remaining 27 genes appeared to be regulated indirectly . 
+ Of the 21 genes , 19 were members of class I ( IG1 ) , and the other two were members of class II ( ORF ) . 
+ Interestingly , when comparing the Fis binding of these 21 sites under anaerobic and aerobic conditions , there was no evidence of differential Fis binding between the two conditions . 
+ It thus remains unclear whether Fis does , indeed , directly regulate these genes . 
+ ments of loops seen in electron micrographs , spread-of-supercoiling relaxation experiments ( Postow et al. 2004 ) , and resolvase half-lives ( Stein et al. 2005 ) . 
+ The consensus of these studies is that the average size of the ∼ 400 -- 450 dynamically distributed domains is 10 kb . 
+ Since Fis is presumed to be instrumental in defining these domains , we created a histogram of the measured interval sizes between neighboring ChIP-chip Fis peaks . 
+ As can be seen in Figure 5 , the distribution is similarly exponential in nature . 
+ Importantly , the average interval size is 5.15 kb , almost exactly half of the directly measured average domain size ( Postow et al. 2004 ) . 
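+ The interval analysis can be sketched as follows . This is a minimal illustration that assumes peaks are summarized by a single coordinate on a circular chromosome ; the toy coordinates , the assumed genome length ( 4,639,675 bp ) , and the function name are hypothetical and are not the data or code used in the study . 

```python
def neighbor_intervals(peak_positions, genome_length):
    """Distances between neighboring peaks on a circular chromosome."""
    ordered = sorted(peak_positions)
    gaps = [b - a for a, b in zip(ordered, ordered[1:])]
    # Close the circle: distance from the last peak back around to the first.
    gaps.append(genome_length - ordered[-1] + ordered[0])
    return gaps


# Hypothetical toy example on a 30-kb circle.
toy_gaps = neighbor_intervals([1_000, 6_500, 9_000, 20_000], 30_000)
toy_mean = sum(toy_gaps) / len(toy_gaps)

# On a circle the gaps always sum to the genome length, so the mean interval
# is simply genome length divided by the number of peaks.
ASSUMED_GENOME_LENGTH_BP = 4_639_675  # assumption, NC_000913
mean_interval_kb = ASSUMED_GENOME_LENGTH_BP / 894 / 1000
print(toy_gaps, toy_mean, round(mean_interval_kb, 2))
```

+ With 894 peaks and the assumed genome length , the expected mean interval is about 5.19 kb , close to the 5.15 kb reported ; the exact value depends on how peak coordinates and the wrap-around interval are treated . 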
+ Determination of the Fis-binding-site position weight matrix (PWM)
+ We used the large number of Fis-binding regions discovered in this study to reappraise previously estimated Fis-binding site preferences ( Finkel and Johnson 1992 ; Hengen et al. 1997 ) . 
+ As a first step in doing this , we manually identified individual binding peaks and then computationally determined the minimal contiguous chromosomal regions corresponding to 70 % of the ( log ratio ) area under each peak . 
+ We performed this refinement to minimize the effect of non-bound DNA duplex that is the result of the sonication step in the ChIP-chip protocol . 
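+ One way to find the minimal contiguous region that holds a given fraction of the area under a peak is a sliding window over the probe-level log ratios . The sketch below is an assumption about how such a refinement could be implemented , not the authors ' code ; it requires non-negative values , and the probe values shown are hypothetical . 

```python
def minimal_region(log_ratios, fraction=0.7):
    """Shortest contiguous window whose sum is >= fraction of the total.

    Returns inclusive (start_index, end_index) into log_ratios.
    Assumes all values are non-negative (two-pointer argument relies on it).
    """
    target = fraction * sum(log_ratios)
    best = (0, len(log_ratios) - 1)
    left = 0
    running = 0.0
    for right, value in enumerate(log_ratios):
        running += value
        # Shrink from the left while the window still meets the target.
        while left < right and running - log_ratios[left] >= target:
            running -= log_ratios[left]
            left += 1
        if running >= target and (right - left) < (best[1] - best[0]):
            best = (left, right)
    return best


# Hypothetical probe-level log ratios across one peak.
toy_peak = [0.1, 0.2, 1.0, 2.0, 1.5, 0.3, 0.1]
region = minimal_region(toy_peak, fraction=0.7)
print(region)
```

+ Multiplying the returned indices by the probe spacing would convert the window back to chromosomal coordinates . 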
+ Each such refined chromosomal region was then classified according to the log ratio of its corresponding peak . 
+ We then performed motif searches in these chromosomal regions for different log ratio cutoffs . 
+ These different log ratio cutoffs corresponded to different levels of conservative searching , with the assumption that chromosomal regions corresponding to Fis peaks with larger log ratio values were more likely to contain more or stronger motif signals . 
+ Since Fis binds as a homodimer , we performed two rounds of searches wherein the palindromic motif was and was not mandated . 
+ Figure 6 shows the logo representation ( Schneider and Stephens 1990 ) of the sequence found in both the non-palindromic ( npFis ) and palindromic ( pFis ) motif searches for the log ratio 2 set of sequences . 
+ ( Supplemental Fig. 3 shows the results for all sequence sets . ) 
+ Three important results are contained in Figure 6 and Supplemental Figure 3 . 
+ First , while the npFis motifs found in each of the sequence sets are very significant , the pFis motifs all have much less significant E-values . 
+ Second , the information content values of the npFis motifs are larger than the values for the corresponding pFis motifs . 
+ Third , both the npFis and pFis motifs contain at their core a strong A-tract and AT-tract , respectively . 
+ These results are discussed below in the context of Fis binding and patterning along the chromosome . 
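+ For readers unfamiliar with the information measures used here , the sketch below computes the information content of an alignment and the individual information ( Ri ) of a sequence from per-position base frequencies . It omits the small-sample correction used in the information-theory literature , and the four aligned sites are hypothetical ; function names are introduced for illustration only . 

```python
import math


def position_frequencies(aligned_sites):
    """Per-position base frequencies from equal-length aligned sites."""
    width = len(aligned_sites[0])
    freqs = []
    for pos in range(width):
        column = [site[pos] for site in aligned_sites]
        freqs.append({base: column.count(base) / len(column) for base in "ACGT"})
    return freqs


def information_content(freqs):
    """Total information in bits: sum over positions of 2 - entropy."""
    return sum(
        2.0 + sum(f * math.log2(f) for f in column.values() if f > 0)
        for column in freqs
    )


def individual_information(sequence, freqs):
    """Ri of one sequence: sum over positions of 2 + log2 f(base, position).

    Raises ValueError if a base was never observed at a position; a
    pseudocount would be needed to score such sequences.
    """
    return sum(2.0 + math.log2(column[base]) for base, column in zip(sequence, freqs))


toy_sites = ["AAAT", "AAAA", "TAAT", "AAAT"]  # hypothetical alignment
toy_freqs = position_frequencies(toy_sites)
r_sequence = information_content(toy_freqs)
mean_ri = sum(individual_information(s, toy_freqs) for s in toy_sites) / len(toy_sites)
print(round(r_sequence, 2), round(mean_ri, 2))
```

+ Without the small-sample correction , the mean Ri of the aligned sites equals the information content of the alignment , which is why the two quantities can be compared on the same scale . 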
+ The result shown in Figure 6 presents a conundrum . 
+ The Fis protein binds DNA as a homodimer and the most recently estimated ( Hengen et al. 1997 ) Fis motif ( prevFis ) is palindromic , yet the most informative and significant motif we found was the non-palindromic npFis motif . 
+ In order to resolve this conundrum , we performed experiments to determine which of the npFis , pFis , and prevFis motifs better discriminated Fis peak regions from randomly selected chromosomal regions not associated with Fis peaks . 
+ We used the motifs resulting from the log ratio 2 sets of sequences in Figure 6 to score all of the sequences corresponding to Fis peak regions with log ratio 1 , and for each sequence assigned it a score based on the largest sum of individual information ( Ri ) values ( Hengen et al. 1997 ) possible from non-overlapping motif match sites . 
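+ Finding the largest sum of Ri values over non-overlapping match sites is a weighted interval scheduling problem . The following sketch solves it with dynamic programming under the assumption that every match has the same width ; the 15-bp width and the match list are illustrative , and this is not the scoring code used in the study . 

```python
from bisect import bisect_right


def best_nonoverlapping_score(matches, width):
    """Largest total Ri over non-overlapping motif matches.

    matches: iterable of (start_position, ri); each match covers
    [start, start + width - 1]. Matches with non-positive Ri are ignored.
    """
    sites = sorted((start, ri) for start, ri in matches if ri > 0)
    starts = [start for start, _ in sites]
    best = [0.0] * (len(sites) + 1)  # best[i]: optimum over the first i sites
    for i, (start, ri) in enumerate(sites, 1):
        # Number of earlier sites that end before this one begins.
        compatible = bisect_right(starts, start - width, 0, i - 1)
        best[i] = max(best[i - 1], best[compatible] + ri)
    return best[-1]


# Hypothetical matches: (start, Ri in bits).
toy_matches = [(0, 5.0), (10, 6.0), (15, 4.0), (40, 3.0)]
toy_score = best_nonoverlapping_score(toy_matches, width=15)
print(toy_score)
```

+ In the toy example the match at position 10 has the highest single Ri but overlaps both neighbors , so the best combination skips it . 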
+ Figure 7 is an ROC plot displaying the discriminative ability of the three different motifs , and contains two important results . 
+ First , both the npFis and pFis motifs derived in this work are better discriminators of chromosomal Fis-peak regions from non-Fis-peak regions than is the prevFis motif . 
+ Second , while the npFis and pFis motifs are comparable in their discriminative ability , the pFis motif appears to discriminate slightly better . 
+ This was not an expected result given their relative information content and significance values . 
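+ A single-number summary of an ROC comparison is the area under the curve ( AUC ) , which equals the probability that a randomly chosen peak region scores higher than a randomly chosen non-peak region . The sketch below computes it directly from two score lists ; the scores are hypothetical , and the text reports ROC plots rather than AUC values . 

```python
def roc_auc(positive_scores, negative_scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct ranking.
    """
    wins = sum(
        (p > n) + 0.5 * (p == n)
        for p in positive_scores
        for n in negative_scores
    )
    return wins / (len(positive_scores) * len(negative_scores))


# Hypothetical motif scores for Fis-peak and non-peak regions.
peak_scores = [9.0, 7.5, 6.0, 4.0]
background_scores = [5.0, 3.0, 2.0, 6.0]
toy_auc = roc_auc(peak_scores, background_scores)
print(toy_auc)
```

+ An AUC of 0.5 corresponds to no discrimination and 1.0 to perfect separation , so motifs can be ranked by this value when their ROC curves do not cross . 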
+ To better understand the relationship between the npFis and pFis motifs , we first identified the phasing-defining Fis ( npFis or pFis ) sites in the set of Fis peak region sequences . 
+ Both members of a pair of sites were considered phasing-defining sites if all intervening sites between the pair had lower Ri values . 
+ For each phasing-defining pair , we computed the separation distance between their start positions . 
+ We then created a histogram of the separation distances associated with npFis motifs and a histogram of the separation distances associated with pFis , and weighted each distance value by the Ri values of the sites defining the separation distance . 
+ Since Ri values have been correlated with binding affinity for Fis ( Shultzaberger et al. 2007 ) , we interpret higher such weightings to be indicative of more physiologically likely Fis-binding configurations . 
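+ The pairing rule and weighting can be sketched as below . Two readings are assumed here and labeled as such : intervening sites must have Ri lower than both members of the pair , and each distance is weighted by the sum of the two Ri values . The site list is hypothetical , and this is not the code used in the study . 

```python
from collections import defaultdict


def phasing_pairs(sites):
    """Return (distance, weight) for every phasing-defining pair.

    sites: iterable of (start_position, ri). A pair qualifies when every
    site between its members has a lower Ri than both members (assumption).
    The weight is the sum of the two Ri values (assumption).
    """
    ordered = sorted(sites)
    pairs = []
    for i, (start_i, ri_i) in enumerate(ordered):
        highest_between = float("-inf")
        for start_j, ri_j in ordered[i + 1:]:
            if highest_between < min(ri_i, ri_j):
                pairs.append((start_j - start_i, ri_i + ri_j))
            highest_between = max(highest_between, ri_j)
            if highest_between >= ri_i:
                break  # no later site can pair with site i
    return pairs


def weighted_histogram(pairs):
    """Sum of weights per separation distance."""
    hist = defaultdict(float)
    for distance, weight in pairs:
        hist[distance] += weight
    return dict(hist)


toy_sites = [(0, 8.0), (11, 3.0), (21, 7.0), (50, 9.0)]  # hypothetical
toy_pairs = phasing_pairs(toy_sites)
toy_hist = weighted_histogram(toy_pairs)
print(sorted(toy_hist.items()))
```

+ Distances that are multiples of the helical repeat would indicate helical phasing , and half-integer multiples would indicate antihelical phasing . 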
+ Figure 8 ( top and middle ) shows these weighted histograms for the npFis and pFis motifs , and Figure 8 ( bottom ) is the subtractive difference of the pFis histogram from the npFis histogram . 
+ Figure 8 ( bottom ) shows the competitive difference of the two motifs in dictating Fis phasing in regions containing multiple potential Fis-binding sites . 
+ The pattern in the difference histogram of Figure 8 ( bottom ) shows the increased propensity for npFis to dictate helical or antihelical phasing of Fis molecules . 
+ Discussion
+ We have mapped the genome-wide distribution of the E. coli nucleoid-associated protein Fis in exponentially growing cells using a high-resolution whole-genome tiling microarray . 
+ In addition , expression profiles of a wild type and a fis deletion mutant were generated to determine the effect Fis has on transcription . 
+ By integrating these two data sets , we were able to show that : ( 1 ) 894 Fis-binding sites were identified , ∼ 67 % of which were located within non-coding regions , while the remaining ∼ 33 % were found within coding regions ; ( 2 ) Fis binding to the E. coli genome was insensitive to aerobicity ; ( 3 ) expression profiles determined 1341 genes to be weakly affected by Fis , with only 30 % containing Fis bound within the region ; and ( 4 ) half of Fis-binding sites overlap with the binding regions of both RNA polymerase and σ70 . 
+ In addition , computational analyses revealed that : ( 1 ) The Fis-binding signal in the chromosome was found to be necessary but not sufficient to explain the binding locations preferred by Fis , as revealed by ChIP-chip . 
+ ( 2 ) The average interval size between Fis-binding sites was 5 kb , which is half the average supercoiling domain size . 
+ Furthermore , the number of Fis peaks was almost double the estimated number of supercoiling domains , suggesting a stoichiometric relationship of two Fis-binding regions per supercoiling domain . 
+ ( 3 ) By utilizing a large number of the Fis-binding regions , a Fis-binding motif was generated and compared to the previously established binding motif . 
+ Genome-wide distribution of E. coli nucleoid-associated protein Fis shows that Fis specifically binds ∼ 894 regions throughout the E. coli chromosome . 
+ The binding sites included 43 previously described regulatory targets and many novel binding targets that had not previously been identified . 
+ Of the 894 binding regions identified , ∼ 67 % were located within non-coding regions , while the remaining ∼ 33 % were found within coding regions . 
+ The experiments were then repeated under anaerobic conditions , and it was found that oxygen had no detectable effect on the binding of Fis . 
+ The unusually high number of Fis-binding sites was quite surprising , given that no transcription factors in E. coli bind more than ∼ 200 sites ( Martinez-Antonio and Collado-Vides 2003 ) . 
+ A previous study on the genome-wide mapping of Fis binding identified only 224 target sites , with half of them found within non-coding regions and the other half within coding regions ( Grainger et al. 2006 ) . 
+ Differences between this study and the previous one ( Grainger et al. 2006 ) may be due to the low-resolution array used in the previous study , since microarray resolution is a critical factor when performing ChIP-chip experiments . 
+ For example , the previous ChIP-chip study detected no Fis binding within the rRNA operon region , which is clearly activated by Fis binding ( Paul et al. 2004 ) ; however , the ChIP-chip result in this study shows genuine binding peaks on all seven rRNA operons ( Supplemental Fig. S1 ) . 
+ These discrepancies are most likely due to the higher-resolution arrays ' increased ability to discern the actual binding from noise . 
+ When using high-resolution arrays for ChIP-chip , binding peaks have an approximately Gaussian shape , which is clearly resolved when using a tiled array ( Fig. 2 ) ; however , as the array resolution is decreased , so is the resolution of the peaks , thus making it difficult to distinguish true signals from noise . 
+ The general concept of binding patterns of global transcription factors is that their target sites are located at promoter regions . 
+ By interacting with RNA polymerase and/or other proteins , and/or by hindering the binding of RNA polymerase at the promoter , such factors activate or repress the transcription of their target genes . 
+ Our genome-wide analysis indicates that Fis binds numerous such regions ( 67 % ) . 
+ On the other hand , our analysis suggests that the general concept for a global transcription factor in regulation may be partially incorrect for Fis , since Fis-binding regions were also found at many other sites ( 33 % ) , such as within ORF regions ( Grainger et al. 2006 ) . 
+ Note that only a certain proportion ( 30 % ) of bound Fis directly affects transcription . 
+ Thus , Fis should be considered as a genome-organizing protein like Crp , in addition to its function as a promoter-specific regulator ( Grainger et al. 2005 ) . 
+ The Fis protein showed the ability to bend DNA , indicating that the bending activity stabilizes DNA looping to enhance transcription as well as to promote DNA compaction ( Travers and Muskhelishvili 1998 ; Skoko et al. 2006 ) . 
+ The Fis binding within ORF regions may reflect the DNA bending activity to maintain chromosome structure and transcription regulation as well . 
+ Genome-wide mapping of Fis-binding sites was then compared with expression profiles of a fis deletion mutant and its parental strain to determine the effect that Fis has on the transcription . 
+ The expression profiles determined 1341 genes to be affected by Fis , yet only 30 % had Fis bound within the region . 
+ It is worthwhile to note that with 894 Fis-binding sites and the expression of 1341 genes affected by the removal of Fis , there inevitably will be some coincidental overlap , rendering it difficult to infer direct regulation by ChIP-chip and gene expression data alone . 
+ However , these experiments do put an upper limit on the number of promoters directly regulated by Fis , which is approximately 424 . 
+ A surprising result from the expression profiling was the extremely small effect that a fis deletion has on expression . 
+ Although the expression of many genes was significantly affected by the deletion of fis , the median change in expression of those genes was only ∼ 0.37 log2 ratio . 
+ For comparison , the median change in expression when the global regulators fnr and arcA are deleted is ∼ 0.88 and ∼ 0.89 log2 ratio , respectively ( Covert et al. 2004 ) . 
+ The small effect that Fis seems to have on transcriptional expression could explain the minimal growth rate difference between the wild-type strain and a fis deletion mutant , during log-phase growth ( Zhi et al. 2003 ) . 
+ Recently , using high-resolution atomic force microscopy ( AFM ) , a ternary complex of Fis , RNAP , and σ70 was visualized at tyrT promoter ( Maurer et al. 2006 ) . 
+ Visualization of the ternary complex showed that Fis forms a discrete assembly by positioning in close proximity to an RNAP molecule . 
+ Owing to the fact that there was weak interaction between Fis and the RNAP , that result may explain the weak regulation observed in this study . 
+ When compared with ChIP-chip data of RNA polymerase and σ70 , it was found that half of the Fis-binding sites overlap with the binding regions of both RNA polymerase and σ70 . 
+ Interestingly , 54 Fis-binding sites within coding regions and intergenic regions between convergently transcribed genes were also occupied by RNAP and σ70 ( Fig. 4 ; Supplemental Table S1 ) . 
+ A recent study on the E. coli transcriptome using high-density tiling microarrays has also suggested the existence of many novel transcripts within the gene coding region ( Reppas et al. 2006 ) . 
+ This observation could also be found in the ChIP-chip analysis of σ70 and σ32 , indicating that a significant portion of the binding sites of σ70 and σ32 are not associated with the 5′ ends of current annotated genes ( Wade et al. 2006 ) . 
+ Therefore , the Fis-binding sites within gene regions may be regulatory cis-elements of Fis for modulating transcription of the currently unknown transcripts in the E. coli genome . 
+ As another view of this issue , we speculate that Fis regulates the transcription by the formation of DNA microloops , which form a separate topological domain ( Postow et al. 2004 ) . 
+ In those regions , the RNAP may be trapped to repress the transcription or may recycle to efficiently activate the gene transcription process . 
+ Computational analyses in this work resulted in a refinement to the Fis DNA binding signal and subsequently to new insights into the functional behavior of the Fis protein . 
+ Fis has a previously documented ( Skoko et al. 2006 ) dual behavior , which is that while it can bind nonspecifically to completely coat long stretches of duplex DNA , it also has preferred binding sites to which it binds and sets the phasing of the stretches of nonspecifically bound Fis . 
+ The two npFis and pFis motifs we identify in Figure 6 are quite similar when one realizes that their core signals are an A-tract and an AT-tract , respectively , and that A-tracts and AT-tracts > 4 nt have very similar DNA bending characteristics ( Hagerman 1990 ; Hud et al. 1998 ; Hud and Plavec 2003 ; Stefl et al. 2004 ) . 
+ There are differences , though , for while the less significant and less informative pFis motif contains a more generic and palindromic AT-tract ( reminiscent of the previously estimated Fis motif ) ( Hengen et al. 1997 ) , the more highly significant and more informative npFis motif contains an A6-tract . 
+ Because selectivity of Fis binding is thought to reflect the intrinsic bent nature of particular DNA sequences ( Betermier et al. 1994 ) and because A6-tracts induce the largest intrinsic curvature to segments of DNA ( Koo et al. 1986 ) , our results imply that the npFis motif represents the highest-affinity DNA sequence signal for Fis . 
+ DNA segments that more resemble the pFis motif , then , would be lower-affinity sites ( that are still preferred over random DNA ) . 
+ While this preferential hierarchy is likely modulated by the influence of flanking nucleotides on binding affinity ( Pan et al. 1996 ; Perkins-Balding et al. 1997 ) , the 15-bp core is enough to specify high-affinity binding sites ( Bruist et al. 1987 ) . 
+ The result that A - / AT-tracts constitute a critical component of high-affinity Fis-binding sites is supported by numerous previous experiments . 
+ For instance , 39 of the 60 confirmed Fis binding sites used to construct the prevFis motif ( Hengen et al. 1997 ) contain A - / AT-tract cores , and many of the known high-affinity Fis binding sites contain A-tract cores ( Pan et al. 1996 ) . 
+ The identification of a preferred DNA sequence signal for Fis binding ( npFis ) and the dominating helical and anti-helical phasing signal of the npFis motif over the pFis motif ( Fig. 8 , bottom ) together have important implications in supercoiled DNA . 
+ Fis bends DNA when it binds ( Thompson et al. 1988 ) , and helically phased Fis binding induces and stabilizes curved DNA ( Hubner et al. 1989 ; Lazarus and Travers 1993 ; Muskhelishvili et al. 1995 ; Perkins-Balding et al. 1997 ) . 
+ Curved segments of supercoiled DNA are most thermodynamically favorably located at apices of plectonemes , which aside from uniquely orienting a supercoiling domain ( Laundon and Griffith 1988 ) greatly enhance a local region 's exposure to transcription machinery ( ten Heggeler-Bordier et al. 1992 ; Lazarus and Travers 1993 ; Rochman et al. 2002 ; Muskhelishvili and Travers 2003 ) . 
+ Fis-bound stretches of DNA that are not curved overall -- which would be ensured by high-affinity Fis binding sites that are not helically phased -- would not have a propensity to occur at apices , but would be associated with duplex crossovers and branch points ( Schneider et al. 2001 ) . 
+ This inferred mechanism for structuring supercoiled DNA complements the observed Fis peak interval distribution ( Fig. 5 ) . 
+ We interpret the discoveries that the Fis peak interval distribution and previously inferred supercoiling domain size distribution were both exponentially distributed , that the average Fis peak interval ( 5 kb ) was half of the average domain size ( 10 kb ) , and that the number of Fis peaks ( 894 ) was almost double the estimated number of supercoiling domains ( 450 ) to be strong evidence for an average of two Fis-binding regions per supercoiling domain . 
+ The roles that these regions would play in structuring supercoiling domains through the stabilization of crossovers , loops , bends , or apices would be largely influenced by the phasing of those DNA sequences that most resemble the high-affinity binding site motif npFis . 
+ In a broader context , our results imply that A-tracts flanked by appropriately positioned C/G residues are preferred Fis-binding sites , and , in particular , A6-tracts provide the strongest Fis-binding signal . 
+ The E. coli chromosome contains an overrepresentation of ( 83,358 ) A - / AT-tracts that demonstrate a 10 -- 12-bp periodicity and are grouped in clusters ( Tolstorukov et al. 2005 ) in roughly 150-bp regions . 
+ As discussed in previous work ( Laundon and Griffith 1988 ; Rippe et al. 1995 ) , such A - / AT-tract clusters would have a higher propensity to be intrinsically curved and thus to induce branches in superhelical plectonemes and to position promoters at the apices of superhelices . 
+ These are the same topological roles ascribed to the Fis protein . 
+ Our results , then , support the supposition ( Tolstorukov et al. 2005 ) that A - / AT-tracts constitute a sequence-directed structuring code for the E. coli chromosome by in part serving as binding sites for the nucleoid-associated protein Fis . 
+ In summary , our genome-wide approach using ChIP-chip analysis not only provides a comprehensive assessment of the genomic distribution of the bound Fis and its role in transcription regulation , but also suggests directions for furthering our understanding of the structure , function , and evolution of the E. coli nucleoid . 
+ Methods
+ Bacterial strains and growth conditions
+ E. coli strain MG1655 was used to generate the deletion mutant and the BOP608 strain harboring Fis-8myc ( Cho et al. 2006a ) . 
+ The deletion mutant ( MG1655 Δfis ) was constructed using a λ Red and FLP-mediated site-specific recombination system ( Datsenko and Wanner 2000 ) . 
+ Glycerol stocks of E. coli strains were inoculated into M9 minimal medium containing 2 g/L glucose as a carbon source and cultured overnight at 37 °C with constant agitation . 
+ The cultures were inoculated into 100 mL of fresh M9 medium containing 2 g/L glucose and cultured at 37 °C with constant agitation to an appropriate cell density ( Covert et al. 2004 ) . 
+ In the case of anaerobic cultures , after the medium ( 250 mL ) was flushed with a nitrogen/carbon dioxide ( 9:1 ) mixture gas for 30 min to assure anaerobic conditions , the strains were grown at 37 °C with continuous sparging with the gas mixture , and agitation in the minimal medium ( Cho et al. 2006b ) . 
+ Chromatin immunoprecipitation (ChIP)
+ E. coli strain BOP608 was used to perform all ChIP-chip experiments . 
+ BOP608 cultures at mid-log growth phase aerobically ( OD A600 ≈ 0.6 ) or anaerobically ( OD A600 ≈ 0.2 ) were cross-linked by 1 % formaldehyde ( 37 % solution ; Fisher Scientific ) at room temperature for 25 min . 
+ After the unused formaldehyde was quenched with 125 mM glycine for an additional 5 min at room temperature , the cross-linked cells were harvested and washed three times with 50 mL of ice-cold TBS . 
+ The washed cells were resuspended in 0.5 mL of lysis buffer composed of 50 mM Tris-HCl ( pH 7.5 ) , 100 mM NaCl , 1 mM EDTA , protease inhibitor cocktail ( Sigma ) , and 1 kU of Ready-Lyse lysozyme ( Epicentre ) . 
+ The cells were incubated for 30 min at 37 °C and then treated with 0.5 mL of 2× IP buffer composed of 100 mM Tris-HCl ( pH 7.5 ) , 200 mM NaCl , 1 mM EDTA , and 2 % ( v/v ) Triton X-100 . 
+ The lysate was then sonicated four times for 20 sec each in an ice bath to fragment the chromatin complexes using Misonix Sonicator 3000 ( output level = 2.5 ) . 
+ The range of the DNA size resulting from the sonication procedure was 300 -- 1000 bp , and the average DNA size was 500 bp . 
+ Cell debris was removed by centrifugation at 37,000 g for 10 min at 4 °C , and the resulting supernatant was used as cell extract for the immunoprecipitation . 
+ To immunoprecipitate the Fis -- DNA , σ70 -- DNA , or RNAP -- DNA complexes , 3 µg of anti-c-myc antibody ( 9E10 ; Santa Cruz Biotech ) , 6 µL of anti-σ70 antibody ( 2G10 ; Neoclone ) or 6 µL of anti-RNAP subunit antibody ( NT63 ; Neoclone ) were then added into the cell extract , respectively . 
+ For the control ( mock-IP ) , 2 µg of normal mouse IgG ( Upstate ) was added into the cell extract . 
+ They were then incubated overnight at 4 °C , and 50 µL of the Dynabeads Pan Mouse IgG ( for c-myc ) or protein A ( for σ70 and RNAP subunit ) magnetic beads ( Invitrogen ) was added into the mixture . 
+ After 5 h of incubation at 4 °C , the beads were washed twice with the IP buffer ( 50 mM Tris-HCl at pH 7.5 , 140 mM NaCl , 1 mM EDTA , and 1 % [ v/v ] Triton X-100 ) , once with the wash buffer I ( 50 mM Tris-HCl at pH 7.5 , 500 mM NaCl , 1 % [ v/v ] Triton X-100 , and 1 mM EDTA ) , once with wash buffer II ( 10 mM Tris-HCl buffer at pH 8.0 , 250 mM LiCl , 1 % [ v/v ] Triton X-100 , and 1 mM EDTA ) , and once with TE buffer ( 10 mM Tris-HCl at pH 8.0 , 1 mM EDTA ) in order . 
+ After removing the TE buffer , the beads were resuspended in 200 µL of elution buffer ( 50 mM Tris-HCl at pH 8.0 , 10 mM EDTA , and 1 % SDS ) and incubated overnight at 65 °C for reverse cross-linking . 
+ After reversal of the cross-links , RNAs were removed by incubation with 200 µL of TE buffer with 1 µL of RNaseA ( QIAGEN ) for 2 h at 37 °C . 
+ Proteins in the DNA sample were then removed by incubation with 4 µL of proteinase K solution ( Invitrogen ) for 2 h at 55 °C . 
+ The sample was then purified with a PCR purification kit ( QIAGEN ) . 
+ Prior to the microarray experiments , the gene-specific quantitative PCR was carried out using the DNA samples . 
+ Real-time qPCR
+ To measure the enrichment of the Fis-binding targets in the DNA samples , 1 µL of IP or mock-IP DNA was used to carry out gene-specific real-time qPCR with the specific primers to the promoter regions ( primer sequences are available upon request ) . 
+ The real-time qPCR conditions were as follows : 25 µL SYBR mix ( QIAGEN ) , 1 µL of each primer ( 10 pM ) , 1 µL of IP or mock-IP DNA , and 22 µL of ddH2O . 
+ All real-time qPCR reactions were done in triplicate . 
+ The samples were cycled for 15 sec to 94 °C , for 30 sec to 52 °C , and for 30 sec to 72 °C ( total 40 cycles ) in iCycler ( Bio-Rad ) . 
+ Three independent biological replicates were prepared , and each was analyzed in three independent technical replicates by real-time qPCR . 
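+ The text does not state how enrichment was calculated from the qPCR readout. A minimal sketch, assuming the widely used 2^ΔCt comparison between mock-IP and IP reactions and near-perfect amplification efficiency (the formula choice, function name, and parameter names are all assumptions, not taken from the paper), could look like this:

```python
from statistics import mean


def fold_enrichment(ct_ip, ct_mock, efficiency=2.0):
    """Estimate fold enrichment of IP DNA over mock-IP DNA.

    ct_ip, ct_mock: threshold-cycle values from technical replicates.
    efficiency: assumed per-cycle amplification factor (2.0 = perfect doubling).
    """
    # A lower Ct in the IP sample means more starting template.
    delta_ct = mean(ct_mock) - mean(ct_ip)
    return efficiency ** delta_ct


# Hypothetical triplicate Ct values for one promoter region.
example = fold_enrichment([22.0, 22.0, 22.0], [25.0, 25.0, 25.0])
```

+ With these hypothetical values the IP sample crosses the threshold three cycles earlier than the mock-IP sample, which corresponds to an eightfold enrichment under the stated assumptions.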
+ Amplification of DNA
+ To amplify the DNA samples , 7 µL of the IP or mock-IP DNA , 2 µL of 5× Sequenase buffer , and 1 µL of 40 µM Rand 9-Ns primer ( 5′-TGGAAATCCGAGTGAGTNNNNNNNNN ) were mixed in a PCR tube . 
+ The mixture was heated for 2 min to 94 °C and then cooled to 10 °C in a PCR machine ( Bio-Rad ) . 
+ One microliter of 5× Sequenase buffer , 1.5 µL of dNTP mix ( 2.5 mM each ) , 1.5 µL of BSA ( 0.5 mg/mL ) , 0.75 µL of DTT ( 0.1 M ) , and 0.3 µL of Sequenase ( 13 U / µL ) were added to the mixture . 
+ The mixture was ramped from 10 °C to 37 °C over 8 min , held for 8 min at 37 °C , heated for 2 min to 94 °C , and then cooled to 10 °C . 
+ Next , 0.9 µL of Sequenase dilution buffer and 0.3 µL of Sequenase ( 13 U / µL ) were added to the samples , which were then ramped from 10 °C to 37 °C over 8 min , held for 8 min at 37 °C , and cooled to 4 °C . 
+ The samples were diluted by addition of 45 µL of ddH2O . 
+ A reaction mixture ( 100 µL ) of 15 µL of the diluted DNA , 10 µL of 10× pfu reaction buffer , 10 µL of dNTP mix ( 2.5 mM each ) , 1 µL of 100 µM Rand-univ primer ( 5′-TGGAAATCCGAGTGAGT ) , 1 µL of pfu polymerase ( 5 U / µL ) , and 63 µL of ddH2O was prepared on ice . 
+ Four tubes per sample were prepared to achieve enough DNA quantity for microarray hybridization . 
+ The samples were cycled for 30 sec to 94 °C , for 30 sec to 40 °C , for 30 sec to 50 °C , and for 2 min to 72 °C ( total 25 cycles ) . 
+ The amplified samples were then purified by using a PCR purification kit ( QIAGEN ) . 
+ The amplified DNA samples were then ethanol-precipitated and dissolved in 9 µL ( IP DNA ) and 7 µL ( mock-IP DNA ) of ddH2O , respectively . 
+ DNA yields ranged from 5 to ∼ 10 µg , and A260/280 was between 1.8 and 2.0 . 
+ The enrichment of the Fis-binding targets in the amplified DNA samples was measured using gene-specific real-time qPCR . 
+ Whole-genome-tiled microarray analysis
+ We used a custom-tiled NimbleGen microarray for the ChIP-chip assay . 
+ The microarray covers the entire E. coli MG1655 genome sequence with probes spaced on average 25 bp apart , resulting in 371,034 oligonucleotide probes that are randomly distributed on the array . 
+ Detailed methods used for microarray process are described in Supplemental Methods . 
+ Transcriptional analysis
+ Affymetrix E. coli Antisense Genome Arrays were used for all transcriptional analyses . 
+ Cultures were grown to mid-exponential growth phase aerobically ( OD A600 ≈ 0.6 ) or anaerobically ( OD A600 ≈ 0.2 ) . 
+ Cultures ( 3 mL for aerobic and 9 mL for anaerobic ) were added to 2 volumes of RNAprotect Bacteria Reagent ( QIAGEN ) , and total RNA was then isolated using RNeasy columns ( QIAGEN ) with DNase I treatment . 
+ Total RNA yields were measured using a spectrophotometer ( A260 ) , and quality was checked by visualization on agarose gels and by measuring the sample A260/A280 ratio ( > 1.8 ) . 
+ cDNA synthesis , fragmentation , end-terminus biotin labeling , and array hybridization were performed as recommended by Affymetrix standard protocol . 
+ Raw CEL files were analyzed using a robust multi-array average for normalization and calculation of probe intensities . 
+ The processed probe signals derived from each microarray were averaged for both the wild-type and fis deletion mutant strains . 
+ To assess statistically significant differential expression , the probe signals were tested using pairwise t-test comparisons between wild-type and fis deletion mutant strains . 
+ Genes meeting a 1 % FDR ( false discovery rate ) - adjusted P-value cutoff ( 0.0001 ) were chosen as significant changes in gene expression . 
+ The filtered genes were then ascribed to genes directly or indirectly affected by Fis protein . 
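+ The text reports an FDR-adjusted P-value cutoff but does not name the adjustment procedure. As an illustration only, assuming the Benjamini-Hochberg step-up procedure (an assumption; the function name is hypothetical), adjusted P-values could be computed from the per-gene t-test P-values like this:

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    # Indices of the p-values from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so the adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def significant_genes(gene_p_values, fdr=0.01):
    """gene_p_values: dict of gene name -> raw p-value (hypothetical input)."""
    genes = list(gene_p_values)
    adjusted = benjamini_hochberg([gene_p_values[g] for g in genes])
    return [g for g, q in zip(genes, adjusted) if q <= fdr]
```

+ Genes returned by `significant_genes` would be those meeting a 1 % FDR under this assumed procedure.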
+ Refinement of Fis peak chromosomal regions
+ After manually defining Fis peaks , we wrote a greedy algorithm to identify the chromosomal sequence region associated with 70 % of the ( log ratio ) area under each peak . 
+ The algorithm worked by first identifying the three consecutive probes whose associated peak area was greatest , and then expanding the consecutive set of probes in either the 5′ or 3′ direction depending on which neighboring probe had a higher value . 
+ This process ceased when 70 % of the peak area had been accumulated in a set of consecutive probes . 
+ The chromosomal start position of the first probe and the chromosomal end position of the last probe were used to define the `` refined chromosomal peak region . '' 
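+ The greedy refinement described above can be sketched as follows. This is not the authors' code: it assumes evenly spaced probes so that peak area is approximated by the sum of probe log ratios, assumes positive log ratios within a peak, and breaks ties toward the 5′ side. The function name and input layout are hypothetical.

```python
def refine_peak(probes, fraction=0.70):
    """Return (start, end) of the refined chromosomal peak region.

    probes: list of (start, end, log_ratio) tuples for one manually defined
    peak, sorted by chromosomal start position.
    """
    values = [p[2] for p in probes]
    total = sum(values)
    if len(probes) <= 3:
        return probes[0][0], probes[-1][1]
    # Seed: the three consecutive probes with the greatest summed log ratio.
    lo = max(range(len(values) - 2), key=lambda i: sum(values[i:i + 3]))
    hi = lo + 2
    area = sum(values[lo:hi + 1])
    # Expand toward the higher neighboring probe until 70 % is accumulated.
    while area < fraction * total and (lo > 0 or hi < len(values) - 1):
        left = values[lo - 1] if lo > 0 else float("-inf")
        right = values[hi + 1] if hi < len(values) - 1 else float("-inf")
        if left >= right:
            lo -= 1
            area += values[lo]
        else:
            hi += 1
            area += values[hi]
    return probes[lo][0], probes[hi][1]
```

+ The returned pair is the start position of the first selected probe and the end position of the last one, matching the definition of the refined region given above.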
+ Motif searching
+ To find the Fis-binding site position weight matrix ( PWM ) , we first constructed sets of refined chromosomal peak region sequences reflecting different levels of conservativeness . 
+ The most conservative set consisted of sequences for only Fis peaks with associated log ratios ≥ 4 . 
+ Less conservative sets were constructed for log ratios of 3 , 2 , and 1 . 
+ The rationale for such sets was that sequences associated with high log ratios were more likely to contain more and/or stronger Fis-binding DNA sequences . 
+ We then used Meme ( Bailey and Elkan 1994 ) to search for the most significant motif in each set of sequences . 
+ Since Fis binds as a dimer and since the previously estimated Fis motif ( Hengen et al. 1997 ) is palindromic , we also searched for the most significant palindromic motif in each set of sequences ( accomplished by using the `` pal '' option to Meme ) . 
+ In all searches , the reverse complement of each sequence was allowed to contain sites . 
+ Supplemental Figure 3 shows the results of the motif searches . 
+ Motif discrimination ability
+ We tested the ability of the npFis , pFis , and prevFis motifs to discriminate Fis peak sequences from non-Fis-peak chromosomal sequences by first constructing 20 sets of randomly selected chromosomal sequences . 
+ Each such set contained the same number of sequences with the same length distribution as the log ratio 1 set of refined Fis peak sequences . 
+ In scoring a single DNA sequence , we used the position weight matrix ( PWM ) for the appropriate motif to identify all sites in the sequence ( including its reverse complement ) with an individual information ( Ri ) > 0.0 bits ( Schneider 1997 ) . 
+ Using dynamic programming , we then computed the set of non-overlapping sites with the greatest sum of Ri values . 
+ The score for a sequence was defined as this sum of Ri values . 
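+ Selecting the non-overlapping sites with the greatest summed Ri is an instance of weighted interval scheduling. The sketch below is one standard dynamic-programming formulation, not necessarily the one the authors used; it assumes candidate sites have already been found by PWM scanning and are given as half-open intervals, and the function name is hypothetical.

```python
from bisect import bisect_right


def best_nonoverlapping(sites):
    """Pick non-overlapping sites maximizing the sum of Ri.

    sites: list of (start, end, ri) with end exclusive and ri > 0.
    Returns (score, chosen_sites).
    """
    sites = sorted(sites, key=lambda s: s[1])
    ends = [s[1] for s in sites]
    # best[k] holds the optimum using only the first k sites (by end position).
    best = [(0.0, [])]
    for k, (start, end, ri) in enumerate(sites):
        # Count earlier sites that end at or before this site's start.
        j = bisect_right(ends, start, 0, k)
        take = (best[j][0] + ri, best[j][1] + [(start, end, ri)])
        skip = best[k]
        best.append(take if take[0] > skip[0] else skip)
    return best[-1]
```

+ The first element of the result is the sequence score as defined above; the second lists the sites that produce it.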
+ A discrimination experiment , then , consisted of scoring the log ratio 1 set of refined Fis peak sequences and a set of randomly selected chromosomal sequences and creating a receiver operating characteristic ( ROC ) plot from the combined results . 
+ We performed 20 such discrimination experiments for each motif using the 20 sets of random chromosomal sequences and reported the average ROC plot in Figure 7 . 
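+ One discrimination experiment can be illustrated with a simple threshold sweep over sequence scores. This is a generic ROC construction under the assumption that higher scores indicate Fis peak sequences; the function names are hypothetical and the averaging over 20 random sets is left to the caller.

```python
def roc_points(positive_scores, negative_scores):
    """Return (false_positive_rate, true_positive_rate) pairs.

    positive_scores: scores of refined Fis peak sequences.
    negative_scores: scores of randomly selected chromosomal sequences.
    """
    thresholds = sorted(set(positive_scores) | set(negative_scores), reverse=True)
    points = [(0.0, 0.0)]
    for t in thresholds:
        # A sequence is called positive when its score reaches the threshold.
        tpr = sum(s >= t for s in positive_scores) / len(positive_scores)
        fpr = sum(s >= t for s in negative_scores) / len(negative_scores)
        points.append((fpr, tpr))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))
```

+ Running `roc_points` once per random set and averaging the true-positive rates at matching false-positive rates would give an average curve of the kind reported in Figure 7.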
+ Sequence positioning relationship between npFis and pFis motifs
+ To understand how the npFis and pFis sequence signals interact in Fis peak regions , we scored the log ratio 1 set of refined Fis peak sequences with both of the npFis and pFis PWMs and computed the set of nonoverlapping sites with the greatest sum of Ri values -- irrespective of the identity ( npFis or pFis ) of each associated site . 
+ In this way , each sequence had an optimal patterning of npFis and pFis sites . 
+ Any pair of these sites was associated with a distance between their respective start sites , defined by the number of intervening nucleotide positions . 
+ In each sequence , we identified all pairs of sites such that for each pair composed of site1 with Ri = R1 and site2 with Ri = R2 , any sitej between site1 and site2 had Rj < R1 and Rj < R2 . 
+ Since the individual information Ri of a site has been shown to be correlated to the binding energy of Fis ( Shultzaberger et al. 2007 ) , we reasoned that the higher Ri sites would be more strongly bound by Fis protein and would dictate the positioning of any intervening bound Fis molecules . 
+ Both site1 and site2 can be npFis or pFis motifs . 
+ To quantify how the npFis and pFis motifs contribute to Fis positioning , and thus to different distances between all pairs of sites site1 and site2 , we created a separate distance histogram for both npFis motif sites ( Fig. 8 , top ) and pFis motif sites ( Fig. 8 , middle ) -- using as a distance `` count '' the Ri value of a site . 
+ Thus for each pair of sites site1 and site2 ( with R1 and R2 , respectively ) separated by d nucleotides , a `` weighted '' count R1 for distance d was added to the histogram for either npFis or pFis , and similarly for R2 . 
+ To assess how the npFis and pFis motifs differently contribute to different motif site separation distances , we subtracted the pFis distance histogram from the npFis distance histogram ( see Fig. 8 , bottom ) . 
+ All distance histograms were smoothed using an averaging window of 3 bp . 
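+ The pairing rule and the Ri-weighted histograms can be sketched as below. This is an interpretation, not the authors' code: distance is taken as the difference between start positions (the exact convention is an assumption), sites are assumed to be the optimal non-overlapping set for one sequence, and all names are hypothetical.

```python
from collections import defaultdict


def weighted_distance_histograms(sites):
    """Build Ri-weighted distance histograms per motif.

    sites: list of (start, ri, motif) sorted by start, motif "npFis" or "pFis".
    """
    hist = {"npFis": defaultdict(float), "pFis": defaultdict(float)}
    for a in range(len(sites)):
        start1, r1, motif1 = sites[a]
        highest_between = float("-inf")
        for b in range(a + 1, len(sites)):
            start2, r2, motif2 = sites[b]
            # Keep the pair only if every intervening site is weaker than both.
            if highest_between < r1 and highest_between < r2:
                d = start2 - start1
                hist[motif1][d] += r1
                hist[motif2][d] += r2
            highest_between = max(highest_between, r2)
            if highest_between >= r1:
                break  # no later site can pair with site a
    return hist


def smooth(hist, max_d, window=3):
    """Average each distance bin with its neighbors inside the window."""
    half = window // 2
    return [sum(hist.get(d + k, 0.0) for k in range(-half, half + 1)) / window
            for d in range(max_d + 1)]
```

+ Subtracting the smoothed pFis list from the smoothed npFis list, bin by bin, would give a difference histogram of the kind shown in the bottom panel of Figure 8.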
+ Raw ChIP-chip data
+ The data file for all raw data can be downloaded from the following web site : http://systemsbiology.ucsd.edu/publications/ . 
+ Acknowledgments
+ We thank Mark Abrams for insightful discussions regarding manuscript writing . 
+ This work was supported by National Institutes of Health Grant GM062791 .
\ No newline at end of file
--- a/data/TEXT_FILES/useful_txt/18370100.txt 0 → 100644
+++ b/data/TEXT_FILES/useful_txt/18370100.txt 0 → 100644
+ Genomewide Identification of Protein B Locations Using Chromatin Immunoprecipitation Coupled
+ Abstract Interactions between cis-acting elements and proteins play a key role in transcriptional regulation of all known organisms . 
+ To better understand these interactions , researchers developed a method that couples chromatin immunoprecipitation with microarrays ( also known as ChIP-chip ) , which is capable of providing a whole-genome map of protein-DNA interactions . 
+ This versatile and high-throughput strategy is initiated by formaldehyde-mediated cross-linking of DNA and proteins , followed by cell lysis , DNA fragmentation , and immunopurification . 
+ The immunoprecipitated DNA fragments are then purified from the proteins by reverse-cross-linking followed by amplification , labeling , and hybridization to a whole-genome tiling microarray against a reference sample . 
+ The enriched signals obtained from the microarray then are normalized by the reference sample and used to generate the whole-genome map of protein-DNA interactions . 
+ The protocol described here has been used for discovering the genomewide distribution of RNA polymerase and several transcription factors of Escherichia coli . 
+ Keywords ChIP-chip , chromatin immunoprecipitation , microarray , RNA polymerase , transcription factor , transcription factor binding 
+ 1 Introduction
+ In the postgenomic era , systematic and high-throughput technologies allow us to enumerate biological components on a large scale . 
+ As one of the approaches , chromatin immunoprecipitation coupled with microarrays ( ChIP-chip ) has been used to explore the genomewide interactions between proteins and cis-acting elements , such as a comprehensive identification of transcriptional regulatory regions of the human genome [ 1 ] and in other organisms [ 2 -- 8 ] . 
+ Maps of genomewide protein-DNA interactions are essential in understanding many fundamental biological components , such as the logic of regulatory networks , chromosome structure , DNA replication , DNA repair , heat shock response , and metabolism [ 9 -- 14 ] . 
+ Due to a dramatic improvement in microarray technologies , it is possible to create high-resolution maps of genomewide protein-DNA interactions [ 1 , 3 ] . 
+ Although many variations of the ChIP-chip protocol exist , the essential steps begin with an in vivo fixation of protein-DNA complexes mediated by formaldehyde . 
+ The cells are then lysed and the DNA fragmented to a desired size using sonication . 
+ The protein-bound DNA is then enriched through immunoprecipitation by the specific antibody against the protein of interest ( or against epitope-tag flanked with the target protein ) , then purified of the protein through a heat-mediated reversal of the cross-links . 
+ The DNA then is purified , amplified , and hybridized to a microarray [ 15 -- 20 ] . 
+ Several features make the ChIP-chip protocol difficult for all applications . 
+ First , not every available antibody is efficiently applicable to the immunoprecipitation of the cross-linked protein-DNA complexes [ 17 ] . 
+ This limitation probably is due to the specificity and affinity of the antibody for the target protein and to epitope masking by formaldehyde-mediated cross-linking . 
+ To address this limitation , epitope-tagging methods have been developed to use in ChIP-chip of yeast and E. coli [ 4 , 21 ] . 
+ A second limitation is amplifying the ChIP DNA , which often is required to obtain sufficient amounts of DNA for labeling and hybridization without introduction of bias . 
+ Generally , two PCR-based methods have been widely used to amplify the ChIP DNA . 
+ The first method uses a degenerate oligo that randomly anneals to DNA [ 22 ] , while the other uses ligation-mediated PCR ( LM-PCR ) , and achieves roughly 100- to 1,000-fold amplification [ 2 ] . 
+ Finally , current microarray platforms are fairly expensive , but these costs are beginning to decrease significantly as new technologies are developed and competition among suppliers increases . 
+ This chapter describes the ChIP-chip protocol used for mapping the RNA polymerase binding in Escherichia coli along with several transcription factors . 
+ 2 Materials
+ 2.1 Cell Culture and Cross-Linking
+ 1 . 
+ 10X M9 salt stock ( M9 minimal medium ) : Dissolve 60 g Na2HPO4 , 30 g KH2PO4 , 5 g NaCl , and 10 g NH4Cl in dH2O ; adjust to 1 L final volume ; and sterilize by autoclaving for 20 min at 15 psi ( 1.05 kg/cm2 ) on a liquid cycle . 
+ Store at room temperature . 
+ 2 . 
+ 10X glucose stock : Dissolve 20 g glucose in dH2O , adjust to 1 L final volume , and sterilize by passing the solution through a 0.22-µm filter . 
+ Store at room temperature . 
+ 3 . 
+ 500X MgSO4 stock ( 1 M ) : Dissolve 22.85 g MgSO4·6H2O in dH2O , adjust the volume to 0.1 L , and sterilize by autoclaving for 20 min at 15 psi ( 1.05 kg/cm2 ) on a liquid cycle . 
+ Store at room temperature . 
+ 4 . 
+ 1000X CaCl2 stock ( 0.1 M ) : Dissolve 1.47 g CaCl2·2H2O in dH2O , make up the volume to 0.1 L , and sterilize by autoclaving for 20 min at 15 psi ( 1.05 kg/cm2 ) on a liquid cycle . 
+ Store at room temperature . 
+ 5 . 
+ 100X trace element solution : Dissolve 16.67 g FeCl3·6H2O , 0.18 g ZnSO4·6H2O , 0.12 g CuCl2·2H2O , 0.12 g MnSO4·H2O , 0.18 g CoCl2·6H2O , 0.12 g Na2MoO4 , and 22.25 g Na2EDTA·2H2O in dH2O ; adjust to 1 L final volume ; and sterilize by passing the solution through a 0.22-µm filter . 
+ Store at room temperature . 
+ 6 . 
+ 37 % formaldehyde solution ( Fisher Scientific , F79-500 ) . 
+ Store in the chemical hood . 
+ 7 . 
+ 2.5 M glycine solution : Dissolve 187.68 g glycine in dH2O , make up the final volume to 1 L , and sterilize by passing the solution through a 0.22-µm filter . 
+ Store at room temperature . 
+ 8 . 
+ Tris-buffered saline ( Sigma , T5912 ) . 
+ Store at 4 °C . 
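+ The masses listed for the molar stocks above follow from mass = molarity × volume × molar mass. The short check below is only an illustration; the molar masses are standard reference values supplied here, not taken from the protocol, and the names are hypothetical.

```python
def grams_needed(molarity, volume_l, molar_mass):
    """Mass of solute (g) for a solution of the given molarity and volume."""
    return molarity * volume_l * molar_mass


# Approximate molar masses in g/mol (reference values, not from the protocol).
MOLAR_MASS = {
    "glycine": 75.07,
    "CaCl2.2H2O": 147.01,
    "MgSO4.6H2O": 228.46,
}

glycine_g = grams_needed(2.5, 1.0, MOLAR_MASS["glycine"])     # 2.5 M, 1 L
cacl2_g = grams_needed(0.1, 0.1, MOLAR_MASS["CaCl2.2H2O"])    # 0.1 M, 0.1 L
mgso4_g = grams_needed(1.0, 0.1, MOLAR_MASS["MgSO4.6H2O"])    # 1 M, 0.1 L
```

+ The three results agree with the listed 187.68 g, 1.47 g, and 22.85 g to within rounding.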
+ 2.2 Cell Lysis, Preparation of Chromatin Complexes, and Immunoprecipitation
+ 1 . 
+ Lysis buffer : 10 mM Tris-HCl , pH 7.5 , 100 mM NaCl , and 1 mM EDTA . 
+ Store at 4 °C . 
+ 2 . 
+ Lysozyme ( Epicentre , R1810M ) . 
+ Follow storage instructions provided by supplier . 
+ 3 . 
+ Protease inhibitor cocktail ( Sigma , P8465 ) : Dissolve at 200 mg/mL in DMSO and dilute in four volumes of dH2O . 
+ Make fresh prior to use and keep cold on ice . 
+ 4 . 
+ IP buffer : 100 mM Tris-HCl , pH 7.5 , 200 mM NaCl , 2 mM EDTA , and 2 % Triton X-100 . 
+ Store at 4 °C . 
+ 5 . 
+ Misonix 3000 sonicator equipped with microtip . 
+ 6 . 
+ Antibody : Anti-RNAP mouse antibody ( Neoclone , W0001 , W0002 ) , anti-myc mouse antibody ( Santa Cruz Biotechnology , sc-40 ) , Mouse IgG ( Upstate , 12-371 ) . 
+ Follow storage instructions provided by supplier . 
+ 7 . 
+ Bead washing buffer : Dissolve BSA ( Sigma , A7906 ) at 5 mg/mL in PBS . 
+ Make fresh prior to use and keep cold on ice . 
+ 8 . 
+ Dynabeads : Pan mouse IgG ( Invitrogen , 112.05 ) . 
+ Follow storage instructions provided by supplier . 
+ 3 . 
+ IP washing buffer 3 ( W3 buffer ) : 10 mM Tris-HCl , pH 8.0 , 250 mM LiCl ( Sigma , L7026 ) , 1 mM EDTA , and 1 % Triton X-100 . 
+ Store at 4 °C . 
+ 4 . 
+ TE buffer : 10 mM Tris-HCl , pH 8.0 , and 1 mM EDTA . 
+ Store at 4 °C . 
+ 5 . 
+ IP Elution buffer : 10 mM Tris-HCl , pH 8.0 , 1 mM EDTA , and 1 % SDS . 
+ Store at room temperature . 
+ 3 . 
+ Proteinase K solution , 20 mg/mL ( Invitrogen , 25530-049 ) . 
+ 4 . 
+ 3 M sodium acetate , pH 5.2 . 
+ 5 . 
+ Qiagen PCR Purification Kit ( Qiagen , 28106 ) . 
+ 6 . 
+ 2X SYBR green PCR master mix ( Qiagen , 204145 ) . 
+ Follow storage instructions provided by supplier . 
+ 7 . 
+ iCycler real-time PCR detection system ( Bio-Rad , CA ) . 
+ 8 . 
+ Sequenase ( USB , 70775Z ) , 5X Sequenase buffer ( supplied with Sequenase ) , and Sequenase dilution buffer ( supplied with Sequenase ) . 
+ 9 . 
+ 0.1 M DTT ( USB , 70726 ) . 
+ 10 . 
+ 0.5 mg/mL BSA ( diluted from 10 mg/mL stock from NEB , B9001S ) . 
+ 11 . 
+ Rand 9-Ns primer : 5 ′ TGGAAATCCGAGTGAGTNNNNNNNNN 3 ′ . 
+ 12 . 
+ Rand univ primer : 5 ′ TGGAAATCCGAGTGAGT 3 ′ . 
+ 13 . 
+ dNTP mix ( Takara , 4030 ) . 
+ 14 . 
+ pfu turbo polymerase ( Stratagene , 600135 ) and 10X pfu buffer ( supplied with polymerase ) . 
+ 1 . 
+ Cy3-labeled nine-mers ( TriLink Biotechnologies , N46-0001-50 ) . 
+ 2 . 
+ Cy5-labeled nine-mers ( TriLink Biotechnologies , N46-0002-50 ) . 
+ 3 . 
+ Random nine-mer buffer : 125 mM Tris-HCl , pH 8.0 , 12.5 mM MgCl2 , and 0.175 % β-mercaptoethanol . 
+ 4 . 
+ 50X dNTP mix solution : 10 mM Tris-HCl , pH 8.0 , 1 mM EDTA , 10 mM dNTP . 
+ 5 . 
+ 100 U Klenow fragment ( NEB , M0212M ) . 
+ 6 . 
+ MAUI hybridization unit ( BioMicro Systems , Utah ) . 
+ 7 . 
+ NimbleGen custom microarrays ( Design ID : 1881 , Escherichia coli whole-genome tiling array consisting of 371,034 oligonucleotides spaced 25 bp apart across the whole genome ) . 
+ 8 . 
+ NimbleGen Array Reuse Kit 40 ( NimbleGen , KIT001-2 ) . 
+ 9 . 
+ Axon scanner , model 4000B . 
+ 10 . 
+ Cy3 CPK6 50-mer ( IDT , Custom oligo synthesis ) 
+ 13 . 
+ 0.1 M DTT . 
+ 14 . 
+ Wash I : 250 mL ddH2O , 2.5 mL 20X SSC , 5 mL 10 % SDS , and 250 µL 0.1 M DTT . 
+ 15 . 
+ Wash II : 250 mL ddH2O , 2.5 mL 20X SSC , and 250 µL 0.1 M DTT . 
+ 16 . 
+ Wash III : 250 mL ddH2O , 625 µL 20X SSC , and 250 µL 0.1 M DTT . 
+ 2.6 Normalization and Peak Identification
+ SignalMap ( www.nimblegen.com ) , Matlab ver 7.0.4 with bioinformatics toolbox ( www.mathworks.com ) , Microsoft Excel ( www.microsoft.com ) , Mpeak ( the complete program is available from http://www.stat.ucla.edu/~zmdl/mpeak/ ) . 
+ 3 Methods
+ 3.1 Cell Culture and Cross-Linking
+ 1 . 
+ Add 2.8 mL of 37 % formaldehyde solution directly to each 100 mL culture that contains the number of cells used for a ChIP-chip experiment ( see Note 1 ) . 
+ Continue to incubate with gentle shaking for 20 min at room temperature ( see Note 2 ) . 
+ 2 . 
+ Add 5 mL of 2.5 M glycine solution directly to each 100 mL sample followed by incubation for 5 min at room temperature . 
+ 3 . 
+ Centrifuge at 4,700 g for 5 min at 4 °C and pour off the supernatant . 
+ Wash each pellet three times with one volume of ice-cold TBS ( see Note 3 ) and resuspend the cell pellet in the TBS remaining after decanting the supernatant . 
+ Transfer the sample to a new 1.5-mL tube and centrifuge at the maximum speed ( 15,800 g ) for 1 min . 
+ 4 . 
+ Remove all supernatant using a pipette and store the cell pellet at − 80 °C until use . 
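+ The working concentrations reached in steps 1 and 2 follow from a simple dilution calculation. The sketch below uses only the volumes and stock strengths given in those steps; the function name is hypothetical, and volumes are assumed to be additive.

```python
def final_concentration(stock_conc, added_volume, starting_volume):
    """Concentration after adding a stock to an existing volume.

    The result is in the same unit as stock_conc; the two volumes must share
    a unit (mL here).
    """
    return stock_conc * added_volume / (starting_volume + added_volume)


# Step 1: 2.8 mL of 37 % formaldehyde into a 100 mL culture.
formaldehyde_percent = final_concentration(37.0, 2.8, 100.0)

# Step 2: 5 mL of 2.5 M glycine into the resulting 102.8 mL.
glycine_molar = final_concentration(2.5, 5.0, 102.8)
```

+ Under these assumptions the formaldehyde ends up at about 1 % and the glycine at roughly 0.12 M.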
+ 3.2 Cell Lysis, Preparation of Chromatin Complexes and Immunoprecipitation
+ 1 . 
+ Completely resuspend the cell pellet in 0.5 mL of lysis buffer . 
+ Add 40 µL of protease inhibitor cocktail and 0.5 µL of lysozyme solution . 
+ Incubate the sample for 30 min at 37 °C on a rocker , and then add 0.5 mL of IP buffer and 40 µL of protease inhibitor cocktail . 
+ Continue to incubate on ice until the lysate is cleared ( see Note 4 ) . 
+ 2 . 
+ Shear the lysate by sonicating with four 20-s pulses using a Misonix microtip sonicator at output setting 2 ( see Note 5 ) . 
+ To avoid overheating the sample , keep the sample on ice at least 1 min between cycles . 
+ Centrifuge at 15,800 g for 10 min at 4 °C to clarify the chromatin solution . 
+ Take 10 µL of the chromatin solution to use as `` total DNA ( tDNA ) '' sample and store at − 20 °C for further use . 
+ 3 . 
+ Split the chromatin solution into two 0.5-mL aliquots . 
+ Add 1 µg of specific antibody to one aliquot , and 1 µg of mouse IgG to the other ( see Notes 6 and 7 ) . 
+ Incubate samples overnight at 4 °C on a rocker . 
+ 1 . 
+ For each sample , wash 50 µL of Dynabeads Pan mouse IgG beads three times with 1 mL of bead washing buffer . 
+ Add the incubated samples ( `` with specific antibody [ iDNA ] '' and `` with mock antibody [ mDNA ] '' ) to the washed magnetic beads ( see Note 8 ) . 
+ Continue incubation for at least 6 h at 4 °C on a rocker at 8 rpm . 
+ 2 . 
+ Collect the magnetic beads using an MPC magnet and remove the supernatant by aspiration . 
+ If needed , save the supernatant for the `` unbound fraction sample . '' 
+ Sequentially , wash the beads twice with 1 mL of W1 buffer , once with 1 mL of W2 buffer , once with 1 mL of W3 buffer , and once with 1 mL of TE buffer ( see Notes 9 and 10 ) . 
+ 3 . 
+ Resuspend the beads in 200 µL of IP elution buffer and add 190 µL of IP elution buffer to the tDNA sample . 
+ Continue to incubate overnight at 65 °C to reverse cross-links . 
+ 3.4 Purification, qPCR and Amplification of DNA
+ 1 . 
+ Pull down the magnetic beads using an MPC magnet and transfer 200 µL of supernatant to a new tube . 
+ Add 200 µL of TE buffer and 8 µL of RNaseA solution ( 10 mg/mL ) to each sample . 
+ Continue to incubate for 2 h at 37 °C . 
+ Add 4 µL of proteinase K solution ( 20 mg/mL ) to each sample and continue to incubate for 2 h at 55 °C . 
+ Purify DNA using the Qiagen PCR Purification Kit and elute with 50 µL of ddH2O ( see Note 11 ) . 
+ At this point , qPCR can be done using the iDNA , mDNA , and tDNA to confirm the enrichment fold required to run a microarray ( see Note 12 ) . 
+ 2 . 
+ Set up the round A reaction mix on ice as described in Table 9.1 ( see Note 13 ) . 
+ 3 . 
+ In a PCR tube , mix 7 µL of the iDNA or mDNA , 2 µL of 5X Sequenase buffer , and 1 µL of 40 µM Rand 9-Ns primer . 
+ Cycle to 94 °C for 2 min then cool to 10 °C . 
+ Add 5.05 µL of round A mix . 
+ Ramp up from 10 °C to 37 °C over 8 min , hold at 37 °C for 8 min , heat to 94 °C for 2 min , then cool to 10 °C . 
+ Add a 1.2 µL mixture of 0.9 µL of Sequenase dilution buffer and 0.3 µL of Sequenase enzyme . 
+ Ramp up from 10 °C to 37 °C over 8 min , hold at 37 °C for 8 min , then cool to 4 °C . 
+ Dilute the samples by addition of 45 µL of ddH2O . 
+ 4 . 
+ Set up the round B reaction mix on ice as described in Table 9.2 . 
+ Transfer 15 µL of the diluted template into a new PCR tube and add 85 µL of the round B reaction mix to each tube . 
+ Prepare four tubes per sample to achieve enough DNA for microarray hybridization . 
+ Cycle to 94 °C for 30 s , 40 °C for 30 s , 50 °C for 30 s , and 72 °C for 2 min ( 25 cycles ) ( see Note 14 ) . 
+ Purify the amplified DNA using a Qiagen PCR Purification Kit and elute with 120 µL EB buffer supplied with the kit . 
+ Use one purification column per two reactions and combine two elutions in a new tube . 
+ The total volume per sample should be 240 µL . 
+ Add 24 µL of ice-cold 3 M sodium acetate ( pH 5.2 ) and 700 µL of ethanol . 
+ Continue to incubate overnight at − 20 °C . 
+ 3.5 Labeling, Hybridization, and Scanning
+ 1 . 
+ Centrifuge at 37,000 g at 4 °C for 30 min and wash the pellet with cold 80 % ethanol . 
+ Dry the pellet and dissolve the pellet in 9 µL ( iDNA ) and 7 µL ( mDNA ) of ddH2O , respectively . 
+ Dilute 1 µL of the sample in 99 µL of EB buffer and measure the DNA quantity and quality using a spectrophotometer . 
+ The DNA yields range from 5 to 10 µg and A260/280 should be between 1.8 and 2.0 . 
+ 2 . 
+ Dilute Cy3 and Cy5 dye-labeled nine-mers to 1 OD in 42 µL random nine-mer buffer . 
+ Aliquot to 40 µL individual reaction volumes in 0.2 mL thin-walled PCR tubes . 
+ Add 1 µg of iDNA and mDNA to the Cy5 and Cy3 tubes , respectively , and bring the final volume to 80 µL using ddH2O . 
+ Denature the samples in a thermocycler at 98 °C for 10 min and then quickly chill in an ice-water bath . 
+ Add 20 µL of 50X dNTP mix solution and mix well by pipetting at least 10 times . 
+ 3 . 
+ Incubate at 37 °C for 2 h in a thermocycler ( light sensitive ) and then stop the reaction by the addition of 10 µL of 0.5 M EDTA . 
+ Transfer the reaction to a 1.5-mL tube . 
+ Precipitate the labeled samples by adding 11.5 µL of 5 M NaCl and 110 µL of isopropanol to each tube . 
+ Vortex and incubate for 10 min at room temperature in the dark . 
+ 4 . 
+ Centrifuge at 37,000 g for 10 min and remove the supernatant with a pipette . 
+ Rinse the pellet with 500 µL of 80 % ice-cold ethanol and centrifuge again at 37,000 g for 2 min . 
+ Remove the supernatant with a pipette and dry the pellet for 5 min in a SpeedVac using low heat and protection from light . 
+ Rehydrate the dried pellets in 25 µL of ddH2O . 
+ Vortex for 30 s and quick spin to collect contents at bottom of tube . 
+ Measure the A260 in each sample to determine DNA concentration . 
+ Typical yields range from 10 to 30 µg per reaction . 
+ 5 . 
+ Set MAUI hybridization unit to 42 °C and allow time for the temperature to stabilize . 
+ Combine 13 µg each of iDNA ( Cy5 ) and mDNA ( Cy3 ) into a single 1.5 mL tube ( see Note 15 ) . 
+ Dry the combined contents in a SpeedVac on low heat . 
+ Resuspend the sample in 10.9 µL of ddH2O and vortex to completely dissolve the sample . 
+ Spin down the tube briefly to collect the contents in the bottom . 
+ 6 . 
+ Using the NimbleGen Array Reuse Kit , add 19.5 µL of 2X hybridization buffer , 7.8 µL of hybridization component A , 0.4 µL of Cy3 CPK6 50-mer oligo , and 0.4 µL of Cy5 CPK6 50-mer oligo to each sample ( see Note 16 ) . 
+ Mix the tube briefly , and then spin down to collect the contents in the bottom and place at 95 °C for 5 min . 
+ 7 . 
+ Immediately transfer the tube to the MAUI 42 °C sample block and hold at this temperature until ready for sample loading ( see Note 17 ) . 
+ Place the MAUI mixer SL hybridization chamber on the array using the provided assembly/disassembly jig and carefully follow MAUI setup instructions . 
+ Use the braying tool to remove all air bubbles from the adhesive gasket around the outside of the hybridization chamber . 
+ 8 . 
+ Load the sample using the pipette supplied with the MAUI station , following manufacturer 's instructions . 
+ During loading , a small amount ( 3 -- 7 µL ) of the sample may flow out of the outlet port . 
+ Confirm that there are no bubbles in the chamber . 
+ 9 . 
+ Place the loaded array into one of the four MAUI bays and let it equilibrate for 30 s. Wipe off any sample leakage at the ports and adhere MAUI stickers to both ports . 
+ Close the bay clamp and select mix mode B. Hold down the mix button to start mixing . 
+ Confirm that the mixing is in progress before closing the cover . 
+ Hybridize the sample overnight . 
+ 10 . 
+ Remove chip from MAUI hybridization station , load it back into the MAUI assembly/disassembly jig , and immerse in the shallow 250 mL Wash I ( see Note 18 ) . 
+ While the chip is submerged , carefully peel off the mixer . 
+ Gently agitate the chip in Wash I for 10 -- 15 s ( see Note 19 ) . 
+ 11 . 
+ Transfer the slide into a slide rack in the second dish of Wash I and incubate 2 min with agitation . 
+ Transfer to Wash II and incubate 1 min with agitation . 
+ Rock the dish to move the wash over the tops of the arrays . 
+ Transfer to Wash III and incubate for 15 s with agitation . 
+ 12 . 
+ Remove the array and spin dry in an array-drying unit for 1 min . 
+ Store the dried array in a dark desiccator and proceed immediately to scanning the arrays . 
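The spectrophotometer readings in steps 1 and 4 of this section can be turned into concentrations and pipetting volumes with a few lines of arithmetic. The sketch below is illustrative only: it assumes the common conversion of about 50 ng/µL of double-stranded DNA per A260 unit at a 1-cm path length, which the protocol itself does not state, and the function names and example readings are hypothetical. Dye-labeled samples may need a different factor.

```python
# Assumption (not from the protocol): 1 A260 unit of double-stranded DNA
# corresponds to roughly 50 ng/uL at a 1-cm path length.
DSDNA_NG_PER_UL_PER_A260 = 50.0


def dna_conc_ng_per_ul(a260, dilution_factor=1.0):
    """Concentration of the undiluted sample from an A260 reading."""
    return a260 * DSDNA_NG_PER_UL_PER_A260 * dilution_factor


def total_yield_ug(conc_ng_per_ul, volume_ul):
    """Total DNA mass (ug) in a sample of the given volume."""
    return conc_ng_per_ul * volume_ul / 1000.0


def volume_for_mass_ul(mass_ug, conc_ng_per_ul):
    """Volume (uL) that contains the requested DNA mass."""
    return mass_ug * 1000.0 / conc_ng_per_ul


def purity_ok(a260, a280, low=1.8, high=2.0):
    """Check the A260/280 ratio against the range given in step 1."""
    return low <= a260 / a280 <= high


if __name__ == "__main__":
    # Hypothetical reading: 1 uL diluted into 99 uL (100-fold), A260 = 0.15.
    conc = dna_conc_ng_per_ul(0.15, dilution_factor=100)
    print(f"concentration: {conc:.0f} ng/uL")
    print(f"yield in 9 uL: {total_yield_ug(conc, 9):.2f} ug")
    # Hypothetical labeled sample at 800 ng/uL: volume holding 13 ug.
    print(f"volume for 13 ug: {volume_for_mass_ul(13, 800):.2f} uL")
```

With these hypothetical readings the amplified sample falls inside the 5 to 10 µg yield range quoted in step 1.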
+ 3.6 Data Normalization and Peak Identification
+ 1 . 
+ Conceptually , the simplest normalization approach uses the summed intensity of each channel and assumes that the total amount of DNA is the same for both channels . 
+ Calculate the sum of each Cy5 and Cy3 channel and the ratio between the total intensity of Cy5 and that of Cy3 . 
+ Multiply the ratio ( N ) to each data point ( G_k , R_k ) and then calculate the log2 ratio ( R ) of each point ( Eq. 9.1 -- Eq. 9.3 ) . 
+ Log ratio can be used for the peak identification step ( see Note 20 ) . 
+ N = \frac{\sum_{k=1}^{N_{array}} R_k}{\sum_{k=1}^{N_{array}} G_k} 
+ 2 . 
+ The binding sites should appear in the data as runs of consecutive points with enhanced amplitude shown in Fig. 9.1 . 
+ Several peak identification methods have been developed using an error model to compute p-values for single array probes [ 2 ] , a sliding window approach with Gaussian error function [ 23 ] , double-regression model [ 1 ] , a percentile approach [ 24 ] , a hierarchical empirical Bayes model [ 25 ] , a joint-binding deconvolution method [ 26 ] , a tiled model-based analysis of tiling-arrays method [ 27 ] , and a variance stabilization approach [ 28 ] . 
+ Of those methods , we used the double-regression method to identify the protein binding sites across the genome . 
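The two steps of this section can be sketched in a few lines of Python. The first part computes the total-intensity ratio N (total Cy5 over total Cy3) described in step 1 and a log2 ratio per probe. It assumes that the Cy3 value is the one scaled by N so that both channels end up with the same total, which is one reading of the text. The second part is deliberately not the double-regression method used by the authors; it is a much simpler, hypothetical run finder that only illustrates the idea of "runs of consecutive points with enhanced amplitude". The threshold and minimum run length are illustrative parameters.

```python
import math


def total_intensity_factor(cy5, cy3):
    """N = sum of Cy5 (R_k) divided by sum of Cy3 (G_k)."""
    return sum(cy5) / sum(cy3)


def log2_ratios(cy5, cy3):
    """Per-probe log2(R_k / (N * G_k)).

    Assumption: the Cy3 channel is scaled by N, so both channels
    have the same total intensity after normalization.
    """
    n = total_intensity_factor(cy5, cy3)
    return [math.log2(r / (n * g)) for r, g in zip(cy5, cy3)]


def find_runs(values, threshold, min_length):
    """Return (start, end) index pairs, end exclusive, for runs of at
    least min_length consecutive values above threshold."""
    runs = []
    start = None
    for i, v in enumerate(values):
        if v > threshold:
            if start is None:
                start = i
        else:
            if start is not None and i - start >= min_length:
                runs.append((start, i))
            start = None
    if start is not None and len(values) - start >= min_length:
        runs.append((start, len(values)))
    return runs


if __name__ == "__main__":
    # Hypothetical probe intensities along the genome.
    cy5 = [100, 100, 400, 400, 400, 100, 100, 100]
    cy3 = [100] * 8
    ratios = log2_ratios(cy5, cy3)
    print(find_runs(ratios, threshold=0.5, min_length=3))
```

Requiring several consecutive probes above the threshold is what separates a binding event, which spans neighboring tiled probes, from a single noisy probe.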
+ 4 Notes
+ 1 . 
+ The protocol describes the growth of E. coli MG1655 in minimal media to the mid exponential phase . 
+ Because the protein-DNA interactions are very sensitive to the physiological state of the cell , it is very important to control the growth conditions as tightly as possible . 
+ Generally , each immunoprecipitation requires 5 × 10^7 ∼ 1 × 10^8 cells ( approximately , OD600 0.4 ~ 1.0 ) . 
+ To find out all the potential promoters of E. coli , the cells were treated with rifampicin for 30 min [ 3 ] . 
+ 2 . 
+ The cross-linking time should be empirically optimized to each protein-DNA-antibody combination . 
+ We found that the cross-linking time ( 20 min ) and formaldehyde concentration ( 1 % ) described in this protocol are generally applicable to RNAP ( β and β ′ subunits ) and several transcription factors of E. coli . 
+ However , theoretically insufficient cross-linking would result in the inability to capture protein-DNA complexes . 
+ On the other hand , overcross-linking can make giant protein-DNA complexes and cause the epitope-masking problem . 
+ Formaldehyde should be used with appropriate safety measures , such as protective gloves , glasses , and clothing and adequate ventilation . 
+ Formaldehyde waste should be disposed of according to regulations for hazardous waste . 
+ 3 . 
+ PBS can be used to wash cell pellets instead of TBS . 
+ 4 . 
+ Cell lysis condition varies depending on the target cell type . 
+ For E. coli , the lysis method using lysozyme works very well . 
+ If the lysozyme is not available , alternative methods , such as French press or glass-bead-based lysis , can be used . 
+ 5 . 
+ The sonication step should be optimized to achieve the optimum fragmentation of DNA ( 300 -- 1,000 bp ) . 
+ One way to optimize the sonication step is to take 20 µL of the chromatin solution from each sonication cycle and determine the average size of the fragmented DNA in the chromatin solution . 
+ Add 80 µL of IP elution buffer to each 20 µL of chromatin solution and continue to incubate at least 6 h ( or overnight ) at 65 °C to reverse cross-link the chromatin complexes . 
+ Add 100 µL of TE and 4 µL of RNaseA solution ( 10 mg/mL ) and continue to incubate for 2 h at 37 °C . 
+ Add 2 µL of proteinase K solution ( 20 mg/mL ) and continue to incubate for 2 h at 55 °C . 
+ Purify the DNA with a Qiagen PCR Purification Kit and elute using 30 µL of EB buffer supplied with the kit . 
+ The average size of DNA then is analyzed on a 2 % agarose gel ( Fig. 9.2 ) . 
+ Under- and overfragmentation result in a loss of resolution of binding events and more noise in the microarray analysis , respectively [ 16 ] . 
+ 6 . 
+ Not every available antibody is efficiently applicable to ChIP-chip , as mentioned in the introduction . 
+ Unfortunately , routine assay methods such as Western blotting can not be used to test whether an antibody would be suitable for ChIP-chip . 
+ Determining whether an antibody is suitable for ChIP-chip experiments requires actual ChIP assays followed by qPCR of several known binding sites . 
+ To address the limited use of antibodies , epitope-tagging methods have been developed for ChIP-chip of yeast and E. coli without alteration in function of target proteins [ 4 , 21 ] . 
+ The main advantages of epitope-tagging are the availability of a universal antibody and the ability to insert multiple copies of the epitope , which increases the immunoprecipitation yield [ 21 ] . 
+ 7 . 
+ When epitope-tagged proteins are being used , wild-type cells can be used as a control . 
+ In this case , add the same antibody against the epitope used to both the epitope-tagged sample and the wild-type sample [ 21 ] . 
+ 8 . 
+ Alternatively , the antibodies can be conjugated to the Dynabeads Pan mouse IgG beads prior to the immunoprecipitation . 
+ Wash 50 µL of the magnetic beads three times , using the bead washing buffer , and resuspend the washed beads in 250 µL of bead washing buffer . 
+ Add 10 µg of antibody to the magnetic beads and incubate overnight at 4 °C on a rocker . 
+ Wash the beads three times in 1 mL of bead washing buffer and collect the beads using an MPC magnet prior to use . 
+ Then , add 0.5 mL of the sheared chromatin solution to the antibody preconjugated magnetic beads . 
+ Continue to incubate overnight at 4 °C on a rocker . 
+ 9 . 
+ Add 1 mL of the washing buffer to each tube and gently resuspend beads . 
+ This can be done by removing the tubes from the MPC magnet and rotating the tubes for 1 min on a rocker . 
+ Collect the magnetic beads using an MPC magnet and remove the supernatant by aspiration . 
+ Normally , to remove chromatin complexes that are nonspecifically bound to the antibody or magnetic beads , intensive washing steps are needed . 
+ Do not minimize this step . 
+ 10 . 
+ Because not every protein-antibody interaction can bear the high salt and detergent conditions , we recommend washing the beads four times using only W1 buffer and once using TE buffer . 
+ If more stringent conditions are required , increase salt concentrations such as NaCl and LiCl in W2 and W3 buffer , respectively . 
+ At the final washing step using TE , the magnetic beads are collected on the tube wall loosely . 
+ Do not use aspiration but only a pipette . 
+ 11 . 
+ Instead of a Qiagen PCR Purification Kit , it is possible to use the conventional purification method , consisting of a phenol : chloroform : isoamyl alcohol extraction and ethanol precipitation . 
+ We chose the Qiagen kit to maintain consistency among samples . 
+ 12 . 
+ Examples for qPCR are shown in Fig. 9.3 . 
+ We used the ChIP-chip approach to study the association of RNA polymerase across the genome of E. coli under aerobic conditions . 
+ The ChIP DNA samples were used as a template for qPCR with primer pairs of promoter regions from pgi , cyoA , and sdhC , whose gene expressions were up-regulated under aerobic conditions . 
+ As a control , the promoter region of dmsA was used , as the gene is down-regulated under aerobic conditions . 
+ 13 . 
+ DNA amplification consists of two steps . 
+ Round A involves two rounds of DNA synthesis using the immunoprecipitated DNA as template , a partially degenerate primer ( Rand 9-Ns primer ) , and T7 Sequenase . 
+ Round B consists of 25 -- 30 cycles of PCR using a primer ( Rand universal primer ) that anneals to the specific region of the Rand 9-Ns primer . 
+ 14 . 
+ The number of cycles should be optimized prior to the PCR amplification to prevent amplification bias . 
+ The random amplification method does not amplify DNA linearly , so the linearity depends on the number of cycles . 
+ Prepare five amplification tubes and sample each tube at the 15th , 20th , 25th , 30th , and 35th cycles . 
+ Measure the DNA concentration and perform qPCR as described in Note 12 . 
+ Compare both the quantity and quality of the DNA obtained at each cycle number . 
+ 15 . 
+ The tubes should be protected from light during handling to prevent photobleaching of the light-sensitive Cy dyes . 
+ 16 . 
+ CPK6 50-mer oligos are included in the hybridization as controls and hybridize to alignment features on NimbleGen arrays . 
+ They are required for proper extraction of array data from the scanned image . 
+ 17 . 
+ This procedure describes the process for hybridization of samples prepared by chromatin immunoprecipitation and amplified by random PCR on NimbleGen custom microarrays ( Design ID : 1881 , Escherichia coli whole-genome tiling array , consisting of 371,034 oligonucleotides spaced 25 bp apart across the whole genome ) . 
+ The use of ORF arrays limits the power of ChIP experiments , since most transcription factor binding sites are located in intergenic regions and are , therefore , not included on these arrays . 
+ The most robust array design for ChIP-chip has contiguous tiled DNA fragments that represent the entire genome , including the noncoding regions [ 1 , 3 , 17 ] . 
+ 18 . 
+ Prior to removing the array from the MAUI Hybridization Station , prepare two 250 mL dishes of Wash I , and one each for Wash II and Wash III . 
+ One dish for Wash I should be shallow and wide enough to accommodate the array and mixer loaded in the MAUI assembly/disassembly jig . 
+ The lid from a 1-mL pipette tip box works well . 
+ Place the remaining three wash solutions in 300 mL Tissue-Tek slide staining dishes . 
+ 19 . 
+ Peel the hybridization chamber off very slowly to prevent the slide from cracking . 
+ Do not let the surface of the slide dry out at any point during washing . 
+ 20 . 
+ Various normalization methods can be used to normalize the tiling array data . 
+ Alternatively , another normalization method for the tiling array data is the use of a biweight mean [ 24 ] . 
+ Calculate the log2 of the ratio of Cy5 to Cy3 for each data point and then subtract the biweight mean of this log2 ratio from each data point . 
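The biweight centering in Note 20 can be sketched as follows. The note does not say which biweight variant reference [24] uses, so this example assumes the common one-step Tukey biweight (weights computed from the median and the median absolute deviation), with a tuning constant c = 5 and a small epsilon that are assumptions rather than values from the protocol. All function names are hypothetical.

```python
import math
import statistics


def tukey_biweight_mean(values, c=5.0, eps=1e-4):
    """One-step Tukey biweight location estimate.

    Assumptions: c (tuning constant) and eps (guards against a zero
    median absolute deviation) are illustrative defaults.
    """
    m = statistics.median(values)
    mad = statistics.median([abs(v - m) for v in values])
    num = 0.0
    den = 0.0
    for v in values:
        u = (v - m) / (c * mad + eps)
        if abs(u) < 1.0:
            # Points far from the median get weights near zero;
            # points beyond the cutoff are ignored entirely.
            w = (1.0 - u * u) ** 2
            num += w * v
            den += w
    return num / den


def biweight_centered_log2(cy5, cy3):
    """log2(Cy5/Cy3) per probe, minus the biweight mean of those values."""
    logs = [math.log2(r / g) for r, g in zip(cy5, cy3)]
    center = tukey_biweight_mean(logs)
    return [x - center for x in logs]


if __name__ == "__main__":
    # Hypothetical log ratios with one strong outlier (a bound region).
    logs = [0.0, 0.1, -0.1, 0.05, -0.05, 8.0]
    print(tukey_biweight_mean(logs), statistics.mean(logs))
```

Unlike an ordinary mean, the biweight estimate is barely moved by the few probes with very high ratios, which is desirable here because those probes are the binding signal rather than the background being normalized.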
+ Acknowledgments The protocol described here was based on previous work by many other research groups in this field . 
+ The pioneers in this field are Dr. Young 's group at MIT , Dr. Lieb 's group at the University of North Carolina , Dr. Grunstein 's group at Yale University , Dr. Ren 's group at UCSD , and others . 
+ We thank anyone whose work was not referenced in here . 
+ This work is supported by NIH research grant no. GM62791 .
\ No newline at end of file
--- a/data/TEXT_FILES/useful_txt/18658270.txt 0 → 100644
View file @27818a9
+++ b/data/TEXT_FILES/useful_txt/18658270.txt 0 → 100644
View file @27818a9
+ of
+ Protein tyrosine nitration has been described as irreversible ( 3 ) , and E. coli cell extracts show no evidence of an ability to repair nitrated proteins ( 23 ) . 
+ On the other hand , there is evidence for a protein tyrosine `` denitrase '' activity in rat tissues and in mitochondria ( 21 , 22 ) , and nitrated proteins may be subject to more rapid turnover than their native counterparts ( 15 , 36 ) . 
+ Despite these observations , the fate of nitrated proteins remains poorly understood . 
+ The degradation of nitrated proteins ( whether or not it is selective ) would liberate free 3-NTyr , and so there is some interest in the biochemical fate of this molecule in both host cells and invading pathogens . 
+ In rat PC12 cells , 3-NTyr can be converted to 4-hydroxy-3-nitrophenylacetate by the sequential action of an aromatic amino acid decarboxylase , an amine oxidase , and a NAD-linked dehydrogenase ( 4 ) . 
+ The intermediates in this pathway are 3-nitrotyramine and 4-hydroxy-3-nitrophenylacetaldehyde ( Fig. 1 ) . 
+ Bacteria isolated on the basis of their ability to use 3-NTyr as a carbon and energy source convert 3-NTyr to 4-hydroxy-3-nitrophenylacetate via 4-hydroxy-3-nitrophenylpyruvate through the sequential action of a deaminase and a decarboxylase . 
+ The nitro group is then removed from 4-hydroxy-3-nitrophenylacetate by a novel denitrase activity ( 29 ) . 
+ The responses of bacteria to oxygen and nitrogen radicals attract considerable interest , in part because of their roles in the innate immune response ( 12 ) . 
+ In the case of NO , diverse bacteria express several different NO detoxiﬁcation enzymes , and there are numerous regulatory systems that have been reported to respond to NO ( 12 , 38 ) . 
+ * Corresponding author . 
+ Mailing address : Department of Molecular and Cell Biology , the University of Texas at Dallas , 800 W Campbell Road , Richardson , TX 75080 . 
+ Phone : (972) 883-6896 . 
+ Fax : (972) 883-2409 . 
+ E-mail : stephen.spiro@utdallas.edu . 
+ † Supplemental material for this article may be found at http://jb.asm.org/ . 
+ ‡ Present address : 1 Coca Cola Plaza , TEC 434C , Atlanta , Georgia 30313 . 
+ Published ahead of print on 25 July 2008 . 
+ In the context of the current work , the regulator of interest is NsrR , a transcriptional repressor from the Rrf2 family , which probably contains an iron-sulfur cluster and is sensitive to sources of NO . 
+ NsrR has been shown to act as an NO-sensitive regulator of gene expression in several organisms ( 1 , 2 , 5 , 14 , 17 , 19 , 28 , 31 , 33 ) , and targets for NsrR regulation have been predicted ( 34 ) . 
+ In E. coli , NsrR regulates expression of the NO-detoxifying ﬂavohemoglobin , along with several other genes and operons , some of which are of unknown or poorly understood function ( 5 , 13 , 24 , 40 ) . 
+ In this report , we show that 3-nitrotyramine can be used as a nitrogen source by cultures of E. coli , supporting growth at slow rates . 
+ We present evidence that the pathway of 3-nitrotyramine degradation to 4-hydroxy-3-nitrophenylacetate is similar to that found in rat cells ( 4 ) , involving a periplasmic amine oxidase ( TynA , also known as MaoA ) and a cytosolic NAD-linked dehydrogenase ( FeaB , also known as PadA ) . 
+ The tynA and feaB promoters are bound by NsrR in vivo , and NsrR exerts a weak , though significant , degree of control on both promoters . 
+ Overexpression of NsrR represses the tynA and feaB promoters and severely retards growth on phenylethylamine ( PEA ) , catabolism of which requires TynA and FeaB activities . 
+ Expression of the tynA and feaB genes is upregulated by growth on PEA and 3-nitrotyramine , regulation that requires an AraC-type regulator encoded by the feaR gene . 
+ We speculate that one physiological function of TynA and FeaB is to metabolize nitrated aromatic compounds that may accumulate in cells exposed to NO and superoxide . 
+ MATERIALS AND METHODS
+ Bacterial strains , media , and growth conditions . 
+ The strains and plasmids used in this work are listed in Table 1 . 
+ Transposon insertions in the tynA and feaB genes of E. coli MG1655 were obtained by P1 transduction , using strains JD22473 ( tynA::Tn10 ) and JD22470 ( feaB::Tn10 ) from the National BioResource Project ( Japan ) as the donors . 
+ To construct an unmarked deletion in the lacZ gene of MG1655 , we first introduced a lacZ::kan mutation from strain VJS8363 ( a gift from Valley Stewart ) and then removed the kanamycin resistance cartridge by site-specific recombination with pCP20 ( 8 ) . 
+ The nsrR gene was disrupted by replacing the coding region with a kanamycin resistance cassette , using the red recombinase method , with pKD4 as the template and primers designed to generate a nonpolar mutation ( 8 ) . 
+ The mutation was transferred to other strains by P1 transduction . 
+ To convert the insertion mutation to an unmarked nonpolar deletion , we transformed the strains with pCP20 , and kanamycin/ampicillin-sensitive transformants were identified after colony purification at 43 °C ( 8 ) . 
+ Reporter strains with feaR::kan mutations were constructed by P1 transduction using JW1379 ( from the National BioResource Project , Japan ) as the donor . 
+ The structures of all insertion and deletion mutants were conﬁrmed at each step by PCR . 
+ The rich medium for routine propagation of E. coli strains was L broth ( tryptone , 10 g liter^-1 ; yeast extract , 5 g liter^-1 ; NaCl , 5 g liter^-1 ) . 
+ For growth tests and enzyme determinations , a deﬁned medium ( 37 ) was used , supplemented with the indicated carbon and nitrogen sources and with Casamino acids ( 0.05 % [ wt/vol ] ) , as needed . 
+ For growth with alternative nitrogen sources , the ammonium sulfate in this medium was substituted with sodium sulfate . 
+ PEA has limited solubility in water , so it was added directly to the bulk medium , which was then sterilized by ﬁltration . 
+ Growth on PEA is temperature sensitive ( 32 ) and is signiﬁcantly improved by the addition of Casamino acids to growth media . 
+ Therefore , all PEA cultures were grown at 30 °C in the presence of 0.05 % ( wt/vol ) Casamino acids ( in 250-ml ﬂasks shaken at 250 rpm ) and were inoculated with precultures grown in glucose minimal medium . 
+ For cultures grown on glucose with 3-nitrotyramine as the nitrogen source , we found that growth was improved by restricting the oxygen supply ( which perhaps alleviated the oxidative stress associated with the production of hydrogen peroxide by the amine oxidase ) . 
+ Precultures were grown at 30 °C in 5 ml of medium in 16-mm culture tubes rotated at 50 rpm . 
+ Experimental cultures were grown at 30 °C in 20 to 50 ml of medium , in 250-ml ﬂasks shaken at 60 to 70 rpm . 
+ Genetic manipulations . 
+ The tynA and feaB promoter regions ( on 279- and 247-bp fragments , respectively ) were amplified by PCR ( primer sequences for these and other procedures are listed in Table S1 in the supplemental material ) and cloned into pSTBlue-1 , using methods similar to those described previously ( 5 ) . 
+ Promoter fusions to lacZ were then constructed in pRS415 , transferred to RS45 , and integrated into the chromosome as described previously ( 5 , 35 ) . 
+ The plasmid pJP07 contains the nsrR gene ( with its own promoter ) modified at the 3′ end by the addition of sequences encoding the 3XFlag epitope tag ( 41 ) . 
+ The modiﬁed nsrR gene was ampliﬁed from the chromosome of strain JOEY135 ( 10 ) and cloned into p2795 , a high-copy number plasmid derived from pBluescript ( 18 ) . 
+ The C-terminal epitope tag has no detectable effect on the activity of NsrR , either in vivo or in vitro ( unpublished work ) . 
+ The same modiﬁcation was used to identify NsrR binding sites by chromatin immunoprecipitation and microarray analysis ( ChIP-chip ) . 
+ For ChIP-chip experiments , published procedures were followed for strain constructions , growth of cultures , chromatin extraction , DNA labeling , array hybridization , and data analysis ( 10 ) . 
+ Enzyme assays . 
+ Extracts for the TynA and FeaB activity assays were prepared from 50-ml cultures grown to late exponential phase . 
+ Cell pellets were washed three times and resuspended in 1 ml of basal minimal medium ( with no carbon or nitrogen source ) . 
+ Cells were disrupted by sonication and then centrifuged at 16,000 × g at 4 °C for 20 min . 
+ To remove membrane fragments , extracts were centrifuged at 100,000 × g at 4 °C for 1 h . 
+ To assay the amine oxidase TynA , we measured oxygen uptake rates at 30 °C , using a Clark-type electrode ( Hansatech Instruments , King 's Lynn , Norfolk , England ) in a 0.1 M phosphate buffer ( pH 7.0 ) , 1.5 mM Na2SO4 . 
+ Reactions were started by the addition of 100 µM substrate , a concentration chosen to avoid substrate inhibition by PEA . 
+ FeaB activities were assayed at 30 °C in 50 mM potassium phosphate ( pH 7.0 ) containing 2 mM NAD . 
+ Reactions were initiated by the addition of 50 µM PEA or 100 µM 3-nitrotyramine , and the absorbance at 340 nm was followed with a Cary 50 spectrophotometer ( Varian , Palo Alto , CA ) . 
+ Enzyme kinetic data were analyzed by direct curve ﬁtting using Kaleidagraph ( Synergy Software , Reading , PA ) software . 
+ Where substrate inhibition was evident , data were fitted to the Haldane equation ( equation 1 ) ; otherwise data were fitted to the Michaelis-Menten equation . 
+ V = \frac{V_{max} S}{K_m + S + S^2 / K_i} \qquad ( 1 ) 
+ β-Galactosidase activities were measured according to published protocols ( 25 ) . 
+ All enzyme activities were measured in duplicate with samples from three independently grown cultures . 
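The difference between the two rate laws named above can be seen in a short sketch. The functions below implement the standard Michaelis-Menten and Haldane (substrate inhibition) forms. The parameter values in the example are illustrative, not fitted values from this study, and the code does not reproduce the curve fitting done in Kaleidagraph.

```python
import math


def michaelis_menten(s, vmax, km):
    """Rate without substrate inhibition: V = Vmax*S / (Km + S)."""
    return vmax * s / (km + s)


def haldane(s, vmax, km, ki):
    """Rate with substrate inhibition: V = Vmax*S / (Km + S + S^2/Ki)."""
    return vmax * s / (km + s + s * s / ki)


def haldane_optimum(km, ki):
    """Substrate concentration giving the maximal Haldane rate.

    Setting dV/dS = 0 gives Km - S^2/Ki = 0, so S = sqrt(Km * Ki).
    """
    return math.sqrt(km * ki)


if __name__ == "__main__":
    # Illustrative parameters (same concentration units for S, Km, Ki).
    vmax, km, ki = 50.0, 5.0, 500.0
    s_opt = haldane_optimum(km, ki)
    print(f"optimum S = {s_opt}")
    for s in (5.0, s_opt, 500.0):
        print(s, round(michaelis_menten(s, vmax, km), 2),
              round(haldane(s, vmax, km, ki), 2))
```

Under the Haldane form the rate rises, peaks at the square root of Km × Ki, and then falls as substrate increases, whereas the Michaelis-Menten rate only approaches Vmax. With the apparent Km and Ki reported for PEA in the Results (5.5 and 690), that peak would sit near 61.6 in the same concentration units.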
+ Chemicals and analytical methods . 
+ 3-Nitrotyramine was purchased from Apin Chemicals ( Abingdon , United Kingdom ) . 
+ Concentrations of stock solutions of 3-nitrotyramine were determined spectrophotometrically . 
+ The molar extinction coefficient of 3-nitrotyramine ( 422 nm ) at pH 7.5 is 2,800 M^-1 cm^-1 ( 26 ) . 
+ Using this value , we determined the extinction coefficient to be 1,973 M^-1 cm^-1 at pH 7.0 and used this latter value for measuring the concentrations of stock solutions . 
+ Diethylenetriamine ( DETA ) - NONOate was purchased from Cayman Chemicals ( Ann Arbor , MI ) . 
+ This compound decomposes at pH 7.4 , with a half-life ( t0.5 ) of 20 h at 37 °C and 56 h at 22 to 25 °C , and releases two equivalents of NO ( Cayman Chemicals ) . 
+ The half-life of DETA-NONOate under the conditions of our experiments ( pH 7.0 ; 30 °C ) is not known , but we assume that it is between 20 and 56 h , and the interpretation of results is not affected by the exact half-life of the compound . 
+ DETA-NONOate was added to cultures at the time of inoculation and was present throughout growth . 
+ 3-Nitrotyramine and 4-hydroxy-3-nitrophenylacetate concentrations were measured in ﬁltered culture supernatants , using previously published methods ( 29 ) , except that a 100-mm column was used for high-performance liquid chromatography ( HPLC ) . 
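Two calculations implied by this paragraph can be sketched briefly: converting an absorbance reading into a stock concentration with the Beer-Lambert law, and bracketing how much DETA-NONOate has decomposed for the two limiting half-lives. The decay function assumes simple first-order kinetics, which the text does not state explicitly, and the absorbance, starting amount, and function names are hypothetical.

```python
def concentration_molar(absorbance, epsilon_per_m_cm, path_cm=1.0):
    """Beer-Lambert law: c = A / (epsilon * path length)."""
    return absorbance / (epsilon_per_m_cm * path_cm)


def fraction_remaining(t_hours, half_life_hours):
    """Fraction of donor left after t hours.

    Assumption: first-order decay with a constant half-life.
    """
    return 0.5 ** (t_hours / half_life_hours)


def no_released(initial_amount, t_hours, half_life_hours, no_per_donor=2.0):
    """NO released so far, in the same units as initial_amount.

    The default of 2 NO per donor molecule follows the supplier
    figure quoted in the text.
    """
    decomposed = initial_amount * (1.0 - fraction_remaining(t_hours, half_life_hours))
    return decomposed * no_per_donor


if __name__ == "__main__":
    # Hypothetical absorbance of 0.1973 at pH 7.0 (epsilon = 1,973 M^-1 cm^-1).
    print(concentration_molar(0.1973, 1973.0))
    # Bracket the NO released after 20 h for the two limiting half-lives.
    for half_life in (20.0, 56.0):
        print(half_life, round(no_released(100.0, 20.0, half_life), 1))
```

Running both limits shows how wide the range of released NO is under the stated uncertainty; the authors note that their interpretation does not depend on the exact half-life.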
+ RESULTS
+ Regulation of tynA and feaB by NsrR . 
+ We have recently used transcriptomics ( 13 ) and ChIP-chip analysis ( unpublished data ) to identify E. coli genes regulated by NsrR . 
+ The ChIP-chip approach identiﬁed binding sites for NsrR in the intergenic region between feaB and the regulatory gene feaR and in the promoter region of the adjacent tynA gene ( Fig. 2 ) . 
+ Full results of the genome-wide mapping of NsrR binding sites using ChIP-chip will be published elsewhere . 
+ The tynA and feaB genes encode the ﬁrst two enzymes of a pathway ( Fig. 1 ) that is required for the utilization of PEA as a carbon and energy source or utilization of tyramine or dopamine as a nitrogen source ( 9 ) . 
+ To determine whether there is any regulation of these genes by NsrR , we fused the feaR , feaB , and tynA promoters to lacZ and transferred the reporter fusions in single copies to the chromosome . 
+ We found no evidence for NsrR regulation of the feaR promoter ( data not shown ) . 
+ In exponential-phase cultures growing in complex medium ( not shown ) or in deﬁned medium with glycerol as the carbon source , the tynA and feaB promoters had low activities in both a wild-type strain and an nsrR mutant ( Table 2 ) . 
+ However , in cultures grown on PEA as the sole source of carbon and energy , the activities of both promoters increased substantially ( Table 2 ) . 
+ Regulation of the tynA and feaB genes by PEA ( and tyramine ) has been observed previously ( 16 , 42 ) and is presumed to involve FeaR , a predicted regulatory protein with an AraC-type DNA binding domain , though this regulation has not been conﬁrmed biochemically ( 9 ) . 
+ Accordingly , feaR mutants grew very poorly on PEA , and the tynA and feaB promoters were not induced by PEA in a feaR mutant ( Table 2 ) . 
+ In cultures of the nsrR mutant grown on PEA , we consistently observed small , though significant ( 20 to 50 % ) , increases in feaB and tynA promoter activities ( representative data are shown in Table 2 ) . 
+ The low feaB promoter activities observed in a feaR mutant were also derepressed to a small extent in the feaR nsrR double mutant . 
+ Thus , the magnitude of the repression exerted by NsrR on the feaB promoter is similar in both the presence and the absence of FeaR . 
+ In contrast to the small effects of the nsrR mutation , we found that overexpression of nsrR ( by increasing the gene copy number ) had large effects . 
+ Transformation with a high-copy number plasmid carrying a cloned nsrR gene resulted in severely impaired growth on PEA ( Fig. 3 ) and 17- and 6-fold reductions in the activities of the tynA and feaB promoters , respectively ( Table 3 ) . 
+ Addition of a compound ( DETA-NONOate ) that releases NO very slowly ( t0.5 of 20 to 56 h ) in these slow-growing cultures restored both growth on PEA and maximal promoter activities ( Fig. 3 and Table 3 ) . 
+ Table 3 footnotes : a Activities are shown for the tynA and feaB promoters in reporter strains containing multiple copies of the nsrR gene and exposed ( + ) or not ( − ) to NO . 
+ Cultures grown with PEA as the carbon source were supplemented with 100 µM DETA-NONOate , which decomposes with a t0.5 of between 20 and 56 h under the conditions of this experiment , to release 2 equivalents of NO . 
+ b Numbers in parentheses are 1 standard deviation . 
+ Units are as defined by Miller ( 25 ) . 
+ This suggests that increasing the copy number of the nsrR gene causes repression of the tynA and feaB promoters and , therefore , impaired growth on PEA . 
+ Under these conditions , the activities of both promoters can be regulated by NO . 
+ This repression by NsrR that can be alleviated by NO presumably involves NsrR binding to the sites identiﬁed by ChIP-chip analysis ( Fig. 2 ) . 
+ The cellular concentration of NsrR ( under the conditions used for these experiments ) seems to be poised such that its removal ( by mutation ) has small effects on these promoters , but overexpression causes severe repression . 
+ We were interested in determining the physiological properties of the feaB and tynA gene products that might provide a rationale for the inclusion of these genes in the NsrR regulon . 
+ Utilization of 3-nitrotyramine as a nitrogen source . 
+ TynA and FeaB have broad substrate speciﬁcities and , besides PEA , can also oxidize tyramine to 4-hydroxyphenylacetate . 
+ E. coli K-12 strains can not further oxidize 4-hydroxyphenylacetate and so use tyramine only as a nitrogen source . 
+ Dopamine may also be used as a substrate by this pathway and is oxidized to dihydroxyphenylacetate ( 9 ) . 
+ Decarboxylation of 3-NTyr yields 3-nitrotyramine ( 4 ) , and so we wondered whether 3-nitrotyramine might be a substrate for TynA and 4-hydroxy-3-nitrophenylacetaldehyde a substrate for FeaB . 
+ Assuming the presence of an as-yet-unidentiﬁed 3-NTyr decarboxylase , TynA and FeaB would provide a pathway for the conversion of 3-NTyr to 4-hydroxy-3-nitrophenylacetate ( Fig. 1 ) , a pathway similar to that described for rat PC12 cells ( 4 ) . 
+ E. coli MG1655 grew slowly in a deﬁned medium containing 3-nitrotyramine as the sole source of nitrogen ( Fig. 4 ) . 
+ The growth yield ( as estimated by the final culture density per mole of substrate ) of cultures grown on 3-nitrotyramine was 59 % of that of cultures grown on (NH4)2SO4 ( Fig. 4 ) . 
+ Although we can not exclude other physiological explanations for the reduced growth yield , it is at least consistent with the notion that only one nitrogen of 3-nitrotyramine can be assimilated . 
+ A tynA mutant of MG1655 failed to grow on 3-nitrotyramine ( data not shown ) , whereas a feaB mutant grew with the same yield as that of the wild-type strain ( Fig. 4 ) . 
+ The phenotypes of the tynA and feaB mutants are consistent with the pathway shown in Fig. 1 and , together with the growth yield data , suggest that growth on 3-nitrotyramine is at the expense of the amino group . 
+ Growth on 3-nitrotyramine as the sole source of nitrogen induced the tynA and feaB promoters in a wild-type strain and in an nsrR mutant , the β-galactosidase activities being about 40 % of those observed for cultures grown on PEA ( Table 2 ) . 
+ FeaR is thought to mediate substrate inducibility of the tynA and feaB genes ( 9 ) , and a feaR mutant can not grow on 3-nitrotyramine . 
+ Thus , upregulation of the two promoters by 3-nitrotyramine is independent of NsrR and probably requires FeaR . 
+ The identity of the ligand for FeaR has not been established ; it may not be PEA ( or tyramine ) , given that TynA is located in the periplasm and that its substrate is , therefore , presumably not transported into the cell . 
+ In any case , the simplest explanation for our results is that 3-nitrotyramine , or a molecule related to 3-nitrotyramine ( perhaps 4-hydroxy-3-nitrophenylacetaldehyde ) , can function with FeaR to control the activity of the tynA and feaB promoters . 
+ In nsrR mutants grown on 3-nitrotyramine , the tynA and feaB promoters showed activities that were modestly increased compared to those of the wild-type strains , as was the case for cultures grown on PEA ( Table 2 ) . 
+ The pathway for 3-nitrotyramine catabolism . 
+ To test the prediction ( Fig. 1 ) that 3-nitrotyramine is a substrate for TynA , we assayed substrate-dependent amine oxidase activity by following oxygen uptake by cell extracts in a Clark-type oxygen electrode . 
+ Using PEA as the substrate , we found evidence for substrate inhibition ( Fig. 5 ) , as has been reported previously ( 39 ) . 
+ The activity data fitted well to the Haldane equation ( equation 1 ) for substrate inhibition , with estimates of apparent Km = 5.5 ± 1.4 μM and Ki = 690 ± 109 μM . With 3-nitrotyramine as the substrate , oxygen uptake rates in the same cell extracts were somewhat lower ( Vmax = 29.6 ± 0.7 versus 55.5 ± 2.9 nmol/min/mg protein for PEA ) but followed Michaelis-Menten kinetics , with an estimated apparent Km value of 7.2 ± 1.3 μM ( Fig. 5 ) . 
+ The tynA mutant of E. coli MG1655 does not grow on PEA ( conditions which are required to induce activity ) ; therefore , we were unable to assay 3-nitrotyramine-dependent oxygen uptake in a tynA mutant ( to provide direct proof that TynA is responsible for the measured activity ) . 
+ Nevertheless , other data we present in this paper lend conﬁdence to the idea that TynA is the enzyme responsible for oxidizing 3-nitrotyramine . 
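The kinetic comparison above can be sketched numerically. The block below is only an illustration: it assumes the standard Haldane form v = Vmax·S / (Km + S + S²/Ki) for equation 1 (the equation itself is not reproduced in this text), assumes the apparent constants are in micromolar, and uses hypothetical function names.

```python
def haldane_rate(s, vmax, km, ki):
    """Rate with substrate inhibition (assumed standard Haldane form)."""
    return vmax * s / (km + s + s * s / ki)


def michaelis_menten_rate(s, vmax, km):
    """Rate without substrate inhibition (Michaelis-Menten)."""
    return vmax * s / (km + s)


# Apparent constants quoted in the text. Units are an assumption:
# concentrations in uM, rates in nmol/min/mg protein.
PEA = {"vmax": 55.5, "km": 5.5, "ki": 690.0}
NITROTYRAMINE = {"vmax": 29.6, "km": 7.2}

if __name__ == "__main__":
    for s in (10.0, 100.0, 1000.0):  # substrate concentration, uM
        v_pea = haldane_rate(s, **PEA)
        v_nt = michaelis_menten_rate(s, **NITROTYRAMINE)
        print(f"S = {s:6.0f} uM  PEA: {v_pea:5.1f}  3-nitrotyramine: {v_nt:5.1f}")
```

Under these assumptions the PEA-dependent rate at 1 mM substrate is roughly 40 % of Vmax, whereas the 3-nitrotyramine-dependent rate is close to its Vmax at the same concentration.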
+ We measured substrate-dependent oxygen uptake activities in cultures of MG1655 and in an nsrR mutant culture grown under a range of conditions that were similar to those used for assays of reporter fusions . 
+ The enzyme had extremely low activity or was undetectable in cells grown on glycerol ( Table 4 ) , which is consistent with the assays of tynA promoter activity ( Table 2 ) . 
+ TynA activity was detected in cells grown on PEA or 3-nitrotyramine ( Table 4 ) , which is again consistent with the reporter fusion assays . 
+ TynA activities ( measured with either substrate ) were 10-fold higher in PEA-grown versus 3-nitrotyramine-grown cells ( Table 4 ) , whereas the tynA promoter was only 2.5-fold more active in PEA-grown cells ( Table 2 ) . 
+ This discrepancy may be indicative of some posttranscriptional control of the tynA gene . 
+ Importantly , the activity assays provide additional conﬁrmation of the suggestion that 3-nitrotyramine acts as an inducer of the catabolic pathway . 
+ We were unable to test the role of FeaB in 3-nitrotyramine catabolism directly , since the postulated substrate ( 4-hydroxy-3-nitrophenylacetaldehyde ) is not commercially available . 
+ Therefore , we developed a coupled assay in which FeaB activity can be measured in cell extracts in the physiological direction by adding the substrate for TynA , which is oxidized in situ to generate the FeaB substrate . 
+ FeaB activity was measured by following the reduction of NAD to NADH . 
+ A feaB mutant can utilize both PEA ( 32 ) and 3-nitrotyramine ( Fig. 4 ) as nitrogen sources , presumably because the mutant can liberate the amino group of PEA and 3-nitrotyramine through the activity of TynA . 
+ The feaB mutant grown on PEA as a nitrogen source showed no PEA - or 3-nitrotyramine-dependent reduction of NAD with the FeaB assay , conﬁrming that FeaB is responsible for the measured activity . 
+ Phenylethylamine 66 (5) 41 (3) 73 (11) 44 (7)
+ Glycerol ND ND ND 3 (1)
+ 3-Nitrotyramine 6 (3) 4 (2) 7 (2) 4 (1)
+ a Phenylethylamine- and 3-nitrotyramine-dependent O2 uptake in extracts of cells grown on phenylethylamine or glycerol ( as the carbon source ) or on 3-nitrotyramine ( as the nitrogen source ) . 
+ Cultures were grown at 30 °C in deﬁned medium containing 5 mM phenylethylamine or 40 mM glycerol as the carbon source or containing 11.1 mM glucose as the carbon source with 2.56 mM nitrotyramine as the nitrogen source . 
+ b Numbers in parentheses are 1 standard deviation ( SD ) . 
+ ND , not detectable ( < 2 nmol min-1 mg protein-1 ) . 
+ Phenylethylamine 38 ( 2.3 ) 18 ( 1.6 ) 35 ( 2.8 ) 15 ( 2.2 )
+ Glycerol ND 2.4 ( 1.0 ) 1.0 ( 0.7 ) 1.3 ( 0.3 )
+ 3-Nitrotyramine 54 ( 2.5 ) 20 ( 6.4 ) 42 ( 4.4 ) 14 ( 4.0 )
+ a Data show phenylethylamine- and 3-nitrotyramine-dependent NAD reduction in extracts of cells grown on phenylethylamine or glycerol ( as the carbon source ) or on nitrotyramine ( as the nitrogen source ) . 
+ Cultures were grown at 30 °C in deﬁned medium containing 5 mM phenylethylamine or 40 mM glycerol as the carbon source or containing 11.1 mM glucose as the carbon source with 1.3 mM nitrotyramine as the nitrogen source . 
+ b Numbers in parentheses are 1 standard deviation ( SD ) . 
+ ND , not detectable ( < 1 nmol min-1 mg protein-1 ) . 
+ Using the coupled NAD-linked assay , we could detect FeaB activity in cell extracts with PEA as the substrate for the assay ( Table 5 ) , activities that did not differ significantly from those measured with the known FeaB substrate phenylacetaldehyde ( data not shown ) . 
+ FeaB activity was low or undetectable in cells grown on glycerol , though this may be a reflection of the absence of TynA activity under these conditions . 
+ In cells grown on 3-nitrotyramine as the nitrogen source , FeaB activity was detectable at levels similar to those seen with cells grown on PEA ( Table 5 ) . 
+ The major conclusion that can be drawn from the results is that oxidation of 3-nitrotyramine by TynA generates an intermediate ( presumably 4-hydroxy-3-nitrophenylacetaldehyde ) that can be further oxidized by FeaB . 
+ Thus , the predicted product of the pathway is 4-hydroxy-3-nitrophenylacetate ( Fig. 1 ) . 
+ This hypothesis was tested by determination of 3-nitrotyramine and 4-hydroxy-3-nitrophenylacetate in culture supernatants , using HPLC . 
+ After the growth ( of MG1655 and its nsrR mutant ) on a limiting concentration ( 1 mM ; Fig. 4 ) , 3-nitrotyramine was undetectable in culture supernatants , and there was almost stoichiometric ( 88 to 90 % ) accumulation of 4-hydroxy-3-nitrophenylacetate ( data not shown ) . 
+ Thus , 4-hydroxy-3-nitrophenylacetate is the likely end product of 3-nitrotyramine metabolism in E. coli . 
+ DISCUSSION
+ The starting point for the work described in this paper was the discovery , using ChIP-chip analysis , of NsrR binding sites in the tynA and feaB promoter regions . 
+ We went on to show that NsrR can function as a regulator of tynA and feaB expression and that the enzymes encoded by these genes can oxidize 3-nitrotyramine , in addition to the previously described substrates . 
+ These ﬁndings illustrate one advantage of the ChIP-chip approach as a means of identifying regulon members . 
+ We could not have discovered NsrR regulation of tynA and feaB by comparing the transcriptomes ( or proteomes ) of a wild-type strain and an nsrR mutant , because ( i ) regulation requires cultures to be grown on PEA , and there would be no a priori reason to choose those growth conditions for a transcriptomics experiment ; and ( ii ) revealing the full extent of NsrR regulation requires overexpression rather than deletion of nsrR ; again , it is unlikely we would have chosen to use those conditions in a transcriptomics experiment . 
+ The tynA and feaB promoters have not been well characterized , and the nature of the DNA sequence recognized by NsrR is incompletely understood , although a consensus sequence has been proposed , which is a long inverted repeat ( 5 , 34 ) . 
+ Analysis of the ChIP-chip targets ( unpublished work ) suggests that NsrR can bind to half of the inverted repeat sequence , but , in the absence of in vitro data , we can not reach ﬁrm conclusions about the locations of the NsrR binding sites in the tynA and feaB promoters . 
+ Deletion of nsrR has very small effects on the tynA and feaB promoters , while overexpression of nsrR causes severe repression . 
+ We have observed similar effects at some other NsrR-regulated promoters ( unpublished work ) and believe that the concentration of NsrR is typically poised in a range that is insufﬁcient to repress some promoters that are potentially controlled by NsrR . 
+ In this case , understanding the factors that regulate expression of the nsrR gene becomes especially important , since conditions that lead to the upregulation of nsrR would potentially lead to the regulation of promoters ( such as tynA and feaB ) that may otherwise escape repression . 
+ In this context , we have found that the nsrR promoter is twofold more active in cultures grown in minimal medium than in medium supplemented with amino acids ( unpublished data ) . 
+ This effect , albeit small , may provide an explanation for our observation that good growth on PEA requires the addition of amino acids to growth media . 
+ The amine oxidase ( TynA ) and phenylacetaldehyde dehydrogenase ( FeaB ) enzymes of E. coli K-12 strains have been viewed as providing a straightforward pathway for the catabolism of PEA , tyramine , and dopamine ( 9 ) . 
+ Two recently published observations suggest that these enzymes might have alternative and/or additional physiological roles . 
+ First , tynA mutants express the SOS response constitutively , which has been interpreted as indicating that the amine oxidase is responsible for removing an endogenously generated genotoxic compound ( 30 ) . 
+ Second , feaB was identiﬁed in a screen for genes important for survival under planktonic ( versus bioﬁlm ) growth conditions ( 20 ) . 
+ This observation implies that a substrate for FeaB was present in the minimal growth medium used or can be generated endogenously . 
+ Thus , there is circumstantial evidence from independent studies to suggest that TynA and FeaB might have roles in catabolizing endogenously generated substrates . 
+ The substrate inhibition of TynA ( Fig. 5 ) indicates that the enzyme is signiﬁcantly inhibited by the concentrations of PEA typically used in growth media ( 1 mM ) . 
+ Since TynA is located in the periplasm ( 32 ) , it is exposed to the medium concentration of PEA . 
+ The inhibition of TynA by physiologically relevant concentrations of PEA suggests that the enzyme is not optimally suited to a major role in PEA catabolism , which may account for the very slow growth on PEA ( Fig. 3 ) . 
+ Our results clearly show that TynA and FeaB also provide a pathway for the metabolism of 3-nitrotyramine ( which does not exert substrate inhibition on TynA ) and that the corresponding genes are regulated by NsrR and by NO . 
+ A rationale for this regulatory pattern may be provided if 3-nitrotyramine accumulates , and must be disposed of , in cells exposed to NO . 
+ 3-Nitrotyramine can be generated by the decarboxylation of 3-NTyr ( 4 ) , which may accumulate in cells exposed to NO and superoxide ( 12 ) . 
+ However , E. coli strains are not known to express or encode an aromatic amino acid decarboxylase , and there is therefore no known pathway from 3-NTyr to 3-nitrotyramine . 
+ Accordingly , 3-NTyr can not be used as a nitrogen source by E. coli MG1655 , and there is no 3-NTyr-mediated stimulation of oxygen uptake or NAD reduction in cell extracts ( L. D. Rankin , and S. Spiro , unpublished observations ) . 
+ It remains to be seen whether 3-nitrotyramine is a physiologically significant substrate for TynA , either endogenously generated or encountered in natural environments . 
+ The wider signiﬁcance of these observations also remains to be established . 
+ Homologs of tynA and feaB have restricted distributions in sequenced genomes and are found in the same organism quite infrequently . 
+ The feaR-feaB-tynA region of E. coli strain MG1655 is absent from several other E. coli strains , including some clinical isolates . 
+ Thus , the metabolism we have identified may not be ubiquitously important . 
+ On the other hand , we would predict that organisms capable of expressing tyrosine decarboxylase along with homologs of tynA and feaB are potentially capable of degrading 3-nitrotyrosine . 
+ ACKNOWLEDGMENTS We thank Valley Stewart , Barry Wanner , Michael Hensel , and the National Bioresource Project ( Japan ) for gifts of strains and plasmids . 
+ This work was supported by grant MCB-0702858 from the National Science Foundation ( to S.S. ) and by grant AB07CBT002 from the Army Research Ofﬁce and the Defense Threat Reduction Agency ( to J.C.S. ) .
\ No newline at end of file
--- a/data/TEXT_FILES/useful_txt/19052235.txt 0 → 100644
View file @27818a9
+++ b/data/TEXT_FILES/useful_txt/19052235.txt 0 → 100644
View file @27818a9
+ Genome-scale reconstruction of the Lrp regulatory
+ Broad-acting transcription factors ( TFs ) in bacteria form regulons . 
+ Here , we present a 4-step method to fully reconstruct the leucine-responsive protein ( Lrp ) regulon in Escherichia coli K-12 MG1655 that regulates nitrogen metabolism . 
+ Step 1 is composed of obtaining high-resolution ChIP-chip data for Lrp , the RNA polymerase and expression proﬁles under multiple environmental conditions . 
+ We identiﬁed 138 unique and reproducible Lrp-binding regions and classiﬁed their binding state under different conditions . 
+ In the second step , the analysis of these data revealed 6 distinct regulatory modes for individual ORFs . 
+ In the third step , we used the functional assignment of the regulated ORFs to reconstruct 4 types of regulatory network motifs around the metabolites that are affected by the corresponding gene products . 
+ In the fourth step , we determined how leucine , as a signaling molecule , shifts the regulatory motifs for particular metabolites . 
+ The physiological structure that emerges shows the regulatory motifs for different amino acid fall into the traditional classiﬁcation of amino acid families , thus elucidating the structure and physiological functions of the Lrp-regulon . 
+ The same procedure can be applied to other broad-acting TFs , opening the way to full bottom-up reconstruction of the transcriptional regulatory network in bacterial cells . 
+ ChIP-chip transcription factor
+ Transcriptional regulatory systems often regulate the formation rates and the concentration of small molecules by 2 feedback loops that regulate the transporters and metabolic enzymes . 
+ In many cases , these 2 feedback loops are connected by a common transcription factor ( TF ) that senses the concentration of the small molecule ( 1 ) . 
+ Little is known at present about the transition between the regulatory modes in the feedback loop motifs for global TFs in bacteria . 
+ One such transcription factor is the leucine-responsive protein ( Lrp ) , which is a global transcription regulator widely distributed throughout the bacteria including Escherichia coli ( 2 -- 4 ) . 
+ The Lrp regulon includes genes involved in amino acid biosynthesis and degradation , small molecule transport , pili synthesis , and other cellular functions including 1-carbon metabolism ( 2 , 4 -- 6 ) . 
+ The regulatory action of Lrp on target genes is often modulated by the binding of the small effector molecule leucine and in effect endows Lrp with the ability to affect transcriptional regulation in all possible ways . 
+ That is , upon addition of leucine to the environment , the activity of Lrp can be enhanced , reversed , or unaffected ( 2 , 4 , 7 ) . 
+ Little is known about in vivo Lrp-binding events at the genome scale in the presence or absence of leucine and the extent to which the different modes of regulation are used for different metabolites . 
+ Such information is needed to reconstruct the Lrp regulon and to understand nitrogen metabolism . 
+ In this study , we applied a systems approach by integrating genome-scale data from chromatin immunoprecipitation followed by microarray hybridization ( ChIP-chip ) for Lrp and RNA polymerase and from expression profiling to reconstruct the Lrp regulon . 
+ To achieve such reconstruction , we developed a 4-step process ( Fig. 1 ) . 
+ We first sought to comprehensively establish the Lrp-binding regions on the E. coli genome and any DNA sequence motif ( s ) correlated with the Lrp regulatory action . 
+ We measured the changes in RNA polymerase ( RNAP ) occupancies and mRNA transcript levels on a genome scale to determine the regulatory mode for each of the identified Lrp-binding regions under leucine-perturbed growth conditions . 
+ Seoub Park, and Bernhard Ø. Palsson1 man Dr., La Jolla, CA 92093-0412
+ roved October 14 , 2008 ( received for review July 24 , 2008 )
+ Second , we determined the regulatory modes governed by Lrp . 
+ Third , this enabled us to identify logical motif structures composed of 2 feedback loops for transporting and metabolizing small molecules . 
+ Fourth , we could classify the amino acids and other metabolites into groups that had the same regulatory network motifs , and determine how these motifs were systematically shifted by the presence of leucine . 
+ The physiological role of the Lrp regulon is established through the reconstruction of its structure . 
+ Results
+ Step 1 : Identifying Lrp-Binding Regions and the Effects of Binding on Gene Expression . 
+ Four sets of experiments on a genome-wide scale were performed to achieve these goals : ( i ) determination of Lrp-binding regions , ( ii ) promoter-profiling using rifampicin-treated cells , ( iii ) measurements of RNAP rearrangement , and ( iv ) measurements of changes in mRNA transcripts . 
+ Determination of Lrp-binding regions on a genome-wide scale . 
+ Lrp has been extensively characterized by in vitro DNA-binding experiments and in vivo mutational analysis ; however , direct analysis of in vivo Lrp binding has not been fully described ( 2 , 4 , 8 ) . 
+ Here , we employ the ChIP-chip approach to determine the in vivo Lrp-binding regions in E. coli cells growing in minimal media in exponential phase in the presence and absence of exogenous leucine and in stationary phase . 
+ Before microarray hybridization , we used quantitative PCR ( qPCR ) to validate the enrichment fold of the previously characterized Lrp-binding regions with ilvIH ( 9 ) , gcvTHP ( 10 , 11 ) , and gltBDF operons ( 12 ) in immunoprecipitated DNA ( IP-DNA ) obtained from the strain harboring 8 myc-tagged Lrp protein ( 13 ) . 
+ The qPCR results demonstrate that Lrp-bound DNA fragments were selectively immunoprecipitated from the growing E. coli cells under the conditions ( Fig. S1 ) . 
+ To determine the genome-wide Lrp-binding regions , we next performed a hybridization of the IP-DNA ( Cy5 channel ) and mock IP-DNA ( Cy3 channel ) onto the high-resolution whole-genome tiling microarrays , which contained a total of 371,034 oligonucleotides with 50-bp tiles overlapping every 25 bp on both forward and reverse strands ( 14 ) . 
+ The normalized log2 ratios obtained from the hybridization identify the genomic regions enriched in the IP-DNA sample compared with the mock IP-DNA sample and thereby represent a genome-wide map of in vivo interactions between Lrp protein and E. coli genome ( Fig. 2A ) . 
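The tiling design described above can be sketched as follows. This is a minimal illustration, not the array manufacturer's layout: the genome length is an assumption (the commonly cited size of the NC_000913 sequence, not stated in this text), and the function name is hypothetical.

```python
def tile_starts(genome_length, tile=50, step=25):
    """Start coordinates (0-based) of tiles of length `tile` placed every `step` bp."""
    return range(0, genome_length - tile + 1, step)


# Assumed genome length in bp; not given in the text.
GENOME_LENGTH = 4_639_675

per_strand = len(tile_starts(GENOME_LENGTH))
both_strands = 2 * per_strand  # forward and reverse strands

print(f"tiles per strand: {per_strand}")
print(f"tiles on both strands: {both_strands}")
```

With these assumptions the tile count is of the same order as the 371,034 oligonucleotides quoted for the array; an exact match is not expected, since the real design may omit some probes.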
+ The genome-wide Lrp-binding maps obtained from 3 different conditions , i.e. , exponential growth phase in the presence ( Fig. 2Ai ) and the absence ( Fig. 2Aii ) of exogenous leucine and stationary growth phase in the absence of leucine ( Fig. 2Aiii ) , indicated that the Lrp association on the E. coli genome is dramatically sensitive to the nutrient richness . 
+ Author contributions : B.-K.C. and B.Ø.P. designed research ; B.-K.C. , E.M.K. , and Y.S.P. performed research ; B.-K.C. contributed new reagents/analytic tools ; B.-K.C. , C.L.B. , and B.Ø.P. analyzed data ; and B.-K.C. , C.L.B. , and B.Ø.P. wrote the paper . 
+ The authors declare no conflict of interest. This article is a PNAS Direct Submission.
+ 1To whom correspondence should be addressed . E-mail : bpalsson@ucsd.edu . 
+ This article contains supporting information online at www.pnas.org/cgi/content/full/0807227105/DCSupplemental . 
+ Using a peak detection algorithm based on the double-regression model ( 15 ) , 34 , 92 , and 134 unique and reproducible Lrp-binding regions were identified from the hybridizations in exponential phase in the presence and absence of leucine and in stationary phase in the absence of leucine , respectively ( Tables S1 and S2 ) . 
+ Identification and sequence analysis of Lrp-binding regions . 
+ Among a total of 138 Lrp-binding regions , 29 regions ( 21 % ) were bound under all 3 conditions examined ( Fig. 2B ) . 
+ As expected , the Lrp associations with target regions were very sensitive to the addition of exogenous leucine , as judged by the number of Lrp-binding regions found under conditions in the absence and presence of exogenous leucine . 
+ Only 30 % of binding sites ( 29 of 97 ) overlapped under the conditions in the absence and presence of exogenous leucine , and 65 % of binding sites ( 63 of 97 ) were newly found in the absence of exogenous leucine . 
+ A total of 41 Lrp-binding regions were newly identified in the IP-DNA sample obtained from the stationary condition , supporting the hypothesis that Lrp plays an important role in transcriptional regulation under stationary growth conditions ( 16 ) . 
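The region counts quoted above are mutually consistent, which can be checked with simple inclusion-exclusion arithmetic. The sketch below assumes that the figure of 97 is the union of the regions bound in the two exponential-phase conditions; the variable names are illustrative only.

```python
# Counts of Lrp-binding regions quoted in the text.
with_leucine = 34       # exponential phase, leucine present
without_leucine = 92    # exponential phase, leucine absent
exponential_union = 97  # bound in either exponential condition (assumed meaning of "97")
total_regions = 138     # bound in any of the 3 conditions

# Inclusion-exclusion: |A and B| = |A| + |B| - |A or B|
shared = with_leucine + without_leucine - exponential_union
only_without = without_leucine - shared
only_with = with_leucine - shared
stationary_only = total_regions - exponential_union

print("shared between exponential conditions:", shared)
print("found only without leucine:", only_without)
print("found only with leucine:", only_with)
print("found only in stationary phase:", stationary_only)
print("percent shared:", round(100 * shared / exponential_union))
print("percent only without leucine:", round(100 * only_without / exponential_union))
```

The derived values (29 shared, 63 found only without leucine, 41 found only in stationary phase) agree with the numbers reported in the text.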
+ Before this study , 23 Lrp-binding sites had been characterized by DNA-binding experiments in vitro and mutational analysis in vivo , 74 % ( 17 of 23 ) of which were identified in this study ( Fig. 2 A and C , Tables S1 and S2 ) . 
+ The exceptions were lrp , osmC , ompC , micF , aidB , and csiD promoters , whose functions are related to responses to osmotic stress and stationary growth phase . 
+ To determine whether the failure to detect Lrp binding at these 6 sites was due to the sensitivity of the microarrays , we performed conventional ChIP assays followed by qPCR analysis and confirmed the absence of Lrp binding in those regions under the experimental conditions used in this study . 
+ Furthermore , the ChIP-chip results were experimentally confirmed by using qPCR on arbitrarily selected sites from the 138 Lrp-binding regions ( gltB , ilvI , gcvT , kbl , leuE , dadA , yffQ , C0719 , sbcD , fimA , tppB , ftsQ , lysU , brnQ , sdaA , oppA , stpA , dppA , livJ , ydeN , and ynfF ) and 3 control regions ( dmsA , sdhC , and paaZ ) . 
+ All of the selected Lrp-binding regions exhibited enrichment as a log2 ratio range of 1.5 to 5.1 , whereas the 3 control regions showed no significant enrichment ( Fig. S1 ) . 
+ On the basis of this analysis , we concluded that all or nearly all Lrp-binding regions identified here are bona fide binding sites . 
+ We next assessed the locations of the Lrp-binding regions against the current annotated genome information ( Genbank accession number NC 000913 ) . 
+ The Lrp-binding regions were observed not only within intergenic ( i.e. , promoter and promoter-like ) regions but also within intragenic ( i.e. , ORF ) and convergent regions ( i.e. , intergenic region between 3' ends of convergently transcribed genes ) ( Fig. S2 ) . 
+ Judging from the fact that 10 % of the E. coli genome is noncoding , our results indicate there is a strong preference for Lrp-binding targets to be located within the noncoding intergenic regions . 
+ To identify common DNA sequence motifs of the Lrp-binding regions , we developed an algorithm using both the DNA-binding position weight matrix ( PWM ) and the spacing between binding sites . 
+ The identified 15-bp sequence motif is structured with flanking CAG/CTG triplets and a central AT-rich signal that together are reminiscent of DNA sequence characteristics important for nucleosome positioning and stability ( Fig. 2D , Fig. S2 ) . 
+ We were not able to detect different Lrp-binding motifs for subsets of Lrp-binding regions corresponding to different Lrp regulatory modes ( see below ) . 
+ Genome-wide rearrangement of RNAP association . 
+ To gain a better mechanistic understanding of the transcriptional regulatory roles of Lrp in response to exogenous leucine , we measured the RNAP occupancy on a genome scale to identify locations where RNAP occupancy is increased or decreased because of changes in Lrp-binding levels and exogenous leucine levels ( see table available at http://systemsbiology.ucsd.edu/publications ) . 
+ In parallel , to identify the promoter regions on a genome scale , the cells were treated with rifampicin to inhibit the transcription elongation step ( 17 ) . 
+ The RNAP-binding patterns obtained from rifampicin-treated cells ( see table available at http://systemsbiology.ucsd.edu/publications ) show similar patterns between the 2 conditions ; however , those from rifampicin nontreated cells indicate differential RNAP binding , representing the genome-wide rearrangement of RNAP caused by the exogenous leucine . 
+ Fig. 3 shows examples of the differential bindings of RNAP and Lrp on the gcvTHP , serA , oppABCDF , and sdaA operons in response to exogenous leucine . 
+ For example , the decrease in RNAP occupancy at the serA locus was observed because of the exogenous leucine . 
+ At the same time , the consequence of leucine addition was the sharp reduction of the Lrp-binding levels at the promoter region of serA . 
+ In this case , the role of Lrp is that of an activator and the exogenous leucine serves as an antagonist to repress the expression of serA ( Fig. 3A ) . 
+ However , exogenous leucine represses the binding of Lrp , whose role is that of an inhibitor of RNAP binding to the oppABCDF and sdaA operons ( a repressor in these cases ) , so that the transcription of the genes in the operons becomes induced in response to the exogenous leucine ( Fig. 3 B and C ) . 
+ Measurements of changes in mRNA transcripts . 
+ Next , we examined the effects of exogenous leucine on the changes in the mRNA transcript levels of the exponentially growing cells using Affymetrix Gene-Chip E. coli Genome 2.0 arrays . 
+ To begin with , 629 genes were differentially expressed in response to the exogenous leucine with P value 0.05 and log2 ratio 0.5 ( see table available at http://systemsbiology.ucsd.edu/publications ) . 
+ Then , we compared the changes in mRNA transcript levels with the differential RNAP-binding levels . 
+ The differential RNAP-binding levels are defined as the difference between the sums of RNAP-binding levels across all probes in a targeted gene 's ORF under the 2 conditions . 
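The definition of the differential RNAP-binding level given above can be written as a short function. This is a sketch only: the probe values are invented for illustration, and the order of subtraction (which condition is subtracted from which) is an assumption, since the text does not specify it.

```python
from typing import Sequence


def differential_rnap_binding(probes_a: Sequence[float], probes_b: Sequence[float]) -> float:
    """Difference between the summed probe-level RNAP binding of one ORF in two conditions."""
    return sum(probes_a) - sum(probes_b)


# Hypothetical binding levels for the probes tiling a single ORF.
minus_leucine = [1.2, 1.5, 1.1, 0.9]
plus_leucine = [0.4, 0.6, 0.3, 0.2]

delta = differential_rnap_binding(minus_leucine, plus_leucine)
print(f"differential RNAP binding: {delta:.2f}")
```

A positive value here would mean higher RNAP occupancy across the ORF in the first condition; this per-ORF quantity is what the text compares with the change in mRNA transcript level.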
+ In the previous report ( 14 ) , there was very little correlation between the log2 ratio of RNAP-binding peaks obtained from the rifampicin-treated cells and the expression of the nearest downstream ORF . 
+ However , we observed that the changes in RNAP-binding levels are correlated with the changes in the mRNA transcript levels in response to the exogenous leucine ( Fig. S3 ) . 
+ Given that Lrp differentially binds at least 91 promoter regions ( 194 genes ) within exponentially growing cells in the absence and presence of exogenous leucine , we expected the deletion of lrp to result in a substantial effect on the global gene expression patterns during the exponential growth phase in the absence and presence of exogenous leucine . 
+ Statistical analysis of the gene expression profiles of a parental ( MG1655 ) strain and an lrp deletion strain based on 2-way ANOVA ( P value 0.005 ) shows that Lrp directly and indirectly regulates 52 % ( 330 genes ) of the genes differentially expressed in response to the exogenous leucine ( see table available at http://systemsbiology.ucsd.edu/publications ) . 
+ The genes whose transcription levels we assumed to be controlled by the promoters directly bound by Lrp are summarized in Table S3 . 
+ Step 2 : Regulatory Modes of Lrp in Response to Exogenous Leucine . 
+ The most interesting aspect of the regulatory modes of Lrp in response to exogenous leucine is its variable response ; in some cases , exogenous leucine has no effect on the action of Lrp ; for others , it increases the effect of Lrp , and for others it reverses the effect of Lrp ( 7 ) . 
+ We reconfirmed by qPCR the changes in the mRNA transcript levels of the selected Lrp-binding regions ( 36 regions ) to use genome-wide expression profiling to define the regulatory modes of Lrp in response to exogenous leucine ( Fig. S3 ) . 
+ The results agreed well with the gene expression profiling data . 
+ Fig. 4A illustrates examples of changes in the levels of Lrp association at the promoters of gcvTHP , ftsQAZ , fimAICDFGH , livKHMGF , serA , and oppABCDF and changes in the levels of mRNA transcripts of the genes under the presence ( L ) and absence ( E ) of exogenous leucine . 
+ The Lrp associations at the promoter regions of gcvTHP and ftsQAZ were not strongly affected by the addition of exogenous leucine ( Fig. 4A i and ii ) , strongly supporting the observation that exogenous leucine has no effect on the transcriptional regulatory action of Lrp at these promoters . 
+ To confirm the unique transcriptional regulation mediated by Lrp , we next measured the level of mRNA transcripts of gcvTHP and ftsQAZ operon by qPCR . 
+ As expected , the addition of exogenous leucine had no effect on the level of mRNA transcripts ( Fig. 4A i and ii ) . 
+ Therefore , the first category of Lrp regulatory modes was denoted by the independent mode as shown in Fig. 4Bi . 
+ In this category , there are ftsQAZ and gcvTHP operons . 
+ In contrast , the Lrp associations at the promoter regions of fimAICDFGH and livKHMGF , which are known to be activated and repressed by Lrp , respectively , showed slight changes in the Lrp-binding levels in the presence of exogenous leucine ( 18 , 19 ) . 
+ However , the exogenous leucine strongly stimulated the changes in the mRNA transcript levels of those operons ( Fig. 4A iii and iv ) . 
+ Because the exogenous leucine stimulates the effect of Lrp bindings on the promoters , the second category was denoted by the concerted mode ( Fig. 4Bii ) . 
+ Last , for most Lrp-regulated promoters identified ( 74 regions ) , the exogenous leucine relieved the effect of Lrp in vivo ( Fig. 4Biii ) . 
+ For example , Lrp activates and represses the transcription of serA and oppABCDF operon in the absence of exogenous leucine , respectively ( 20 , 21 ) , and exogenous leucine caused the strong repression of the Lrp-bindings on those promoter regions ( Fig. 4A v and vi ) . 
+ This third category was denoted by the reciprocal mode . 
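The three regulatory modes described above can be summarized as a simple decision rule. The sketch below is an interpretation rather than the authors' procedure: the cutoffs, the use of log2 changes, and the function name are all assumptions made for illustration.

```python
def classify_lrp_mode(binding_change: float, expression_change: float,
                      binding_cutoff: float = 1.0,
                      expression_cutoff: float = 0.5) -> str:
    """Assign a regulatory mode from changes observed upon leucine addition.

    binding_change: change in Lrp binding at the promoter (negative = less binding).
    expression_change: change in mRNA level of the downstream operon.
    Both cutoffs are hypothetical values, not taken from the text.
    """
    if binding_change <= -binding_cutoff:
        # Leucine strongly reduces Lrp binding and so relieves the effect of Lrp.
        return "reciprocal"
    if binding_change >= binding_cutoff:
        # A strong gain in binding is not covered by the three modes in the text.
        return "unclassified"
    if abs(expression_change) >= expression_cutoff:
        # Binding barely changes, but leucine still shifts transcript levels.
        return "concerted"
    # Neither binding nor transcript levels respond to leucine.
    return "independent"


# Invented example values for three promoters.
examples = {
    "promoter_1": (-0.1, 0.05),
    "promoter_2": (-0.3, 1.4),
    "promoter_3": (-2.0, -1.0),
}
for name, (binding, expression) in examples.items():
    print(name, classify_lrp_mode(binding, expression))
```

In practice the thresholds would have to be chosen from the replicate variability of the ChIP-chip and expression data.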
+ Step 3 : Lrp-Regulated Feedback Loop Motifs . 
+ We next functionally classified the 236 genes directly regulated by Lrp and found that the products of many of the genes are known or predicted to function in a wide range of cellular processes and molecular functions , which included amino acid biosynthesis , amino acid degradation , nutrient transport , and synthesis of fimbriae ( Fig. S4 ) . 
+ We noticed that the functions of 45 % ( 118 genes ) of those genes are mainly localized to small molecule transport and metabolism . 
+ Fig. 5A shows the integrative analysis of Lrp-binding profiles , RNAP occupancy profiles , and mRNA transcript levels of the selected transporters and metabolic enzymes for amino acids . 
+ In the case of transporters , the concerted and reciprocal regulatory modes exist in response to the exogenous leucine . 
+ Specifically , the transporters for branched-chain amino acids ( brnQ , livKHMGF , and livJHMGF ) are regulated by concerted mode ( Fig. 4Biv ) , whereas transporters for arginine , serine , alanine , and proline ( artMQIP , sdaC , cycA , and proY , respectively ) are regulated by reciprocal mode ( Fig. 4vi ) . 
+ Interestingly , Lrp reciprocally regulates the aromatic amino acid transporters ( tyrP and mtr ) and indirectly regulates the general aromatic amino acid transporter ( aroP ) via TyrR transcription factor ( i.e. , Lrp directly regulates the TyrR by reciprocal mode ) . 
+ However , Lrp regulates the metabolic enzymes such as ilvE , tdcB , sdaA , and dadA through only reciprocal mode . 
+ Step 4 : Elucidation of the Function of the LRP Regulon . 
+ From this analysis , we were able to connect transport ( Uptake ) and metabolic ( Use ) feedback loop pairs ( Fig. 5Bi and Table S4 ) and characterize them by 1 of 4 possible combinations of feedback loop motifs ( 1 ) . 
+ In the left loop , Lrp regulates transcription of the transport protein ( T ) , facilitating the uptake of the small molecule ( Xout ) , whereas in the right loop , Lrp controls transcription of metabolic enzymes ( E ) responsible for transforming Xin into Y ( i.e. , metabolites ) . 
+ The logic of the coupled loop motifs can be described by a notation that uses 2 signs ( Fig. 5Bii ) . 
+ For example , the AA ( / ) loop motif indicates that the transcription of both transport and metabolic genes is activated , whereas the RR ( / ) motif indicates that the transcription of both genes is repressed . 
+ The possible logical structures of the feedback loop motifs can be characterized depending on how Lrp ( or Lrp-leucine complex ) activates or represses both T and E in response to the exogenous leucine : homeostasis ( / ) , nutrition ( / ) , flow homeostasis ( / ) , and accumulation ( / ) ( 1 ) . 
+ Based on the feedback loop motifs , we analyzed the behavior of logical structures of the transporters and metabolic enzymes in response to the exogenous leucine ( Table S4 ) . 
+ For example , there are 3 possible T-E combinations between branched-chain amino acid transporters ( brnQ , livKHMGF , and livJHMGF ) and 1 metabolic enzyme ( ilvE ) in E. coli . 
+ The combined feedback loop motifs for the branched-chain amino acids indicate the RR ( / ) logical structure in the absence of exogenous leucine , whereas they switch to RA ( / ) upon exposure to exogenous leucine ( Fig. 5C ) . 
+ However , the combined feedback loop motifs in the T-E combinations for the aromatic amino acids show the transition between RA ( / ) and AR ( / ) logical structures . 
+ In the end , we classified the logical structures of the feedback loop motifs into 3 categories based on the behavior of logical structures in response to the exogenous leucine ( Fig. 5C ) . 
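+ The two-letter notation used above (AA, RR, RA, AR) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, and the letter order (transporter first, metabolic enzyme second) is assumed rather than stated in the text. The sign pairs that accompany each motif name were lost in extraction, so the sketch does not try to map codes to the named structures (homeostasis, nutrition, flow homeostasis, accumulation).

```python
from typing import Tuple

# 'A' = transcription activated, 'R' = transcription repressed.
VALID_EFFECTS = {"A", "R"}


def loop_motif(transporter_effect: str, enzyme_effect: str) -> str:
    """Return the two-letter motif code.

    Assumption: the first letter describes the transporter (T) and the
    second the metabolic enzyme (E).
    """
    for effect in (transporter_effect, enzyme_effect):
        if effect not in VALID_EFFECTS:
            raise ValueError(f"effect must be 'A' or 'R', got {effect!r}")
    return transporter_effect + enzyme_effect


def leucine_transition(without_leucine: Tuple[str, str],
                       with_leucine: Tuple[str, str]) -> str:
    """Describe how a T-E pair changes motif when leucine is added."""
    return f"{loop_motif(*without_leucine)} -> {loop_motif(*with_leucine)}"


# Branched-chain amino acid pairs as described in the text:
# RR in the absence of exogenous leucine, RA in its presence.
bcaa = leucine_transition(("R", "R"), ("R", "A"))
print(bcaa)
```

+ Keeping the two regulatory effects as separate inputs makes it easy to tabulate every T-E pair from a table such as Table S4 and to group pairs by their transition.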
+ Discussion
+ We reconstructed the Lrp regulon in E. coli by combining genome-scale location analysis , promoter-profiling using rifampicin-treated cells , RNAP occupancy profiles , and gene expression data . 
+ We identified ( i ) regulatory modes for individual genes in the Lrp regulon , ( ii ) the behavior of the logical structures in the Lrp-regulatory feedback loop motifs composed of transporters and metabolic enzymes in response to the exogenous leucine , and ( iii ) the overall structure of the Lrp regulon and how it regulates the metabolism of families of amino acids and other metabolites . 
+ The genome-wide maps of Lrp-binding regions presented here not only confirm the previously characterized Lrp-binding regions ( 17 regions ) but also increase the number of known sites ( 138 regions ) by a factor of 5 . 
+ From the genome-wide mapping results , we were also able to show that : ( i ) A total of 138 Lrp-binding regions were identified , 84 % of which were located within noncoding regions , whereas the remaining 16 % were found within coding regions ; ( ii ) 34 and 92 Lrp-binding regions were identified in exponentially growing cells in the presence and absence of exogenous leucine , respectively , indicating that Lrp bindings to the E. coli genome are dramatically sensitive to the addition of exogenous leucine ; and ( iii ) the Lrp-binding sites on the E. coli genome under stationary growth condition ( 134 Lrp-binding regions identified ) indicated that Lrp plays pivotal roles in the transcriptional regulation under stationary growth conditions . 
+ The high number of Lrp-binding regions was not surprising , given that global transcription factors such as Fnr , Crp , Ihf , Fis , and Hns specifically bind to between 63 and 224 target regions ( 22 -- 24 ) . 
+ The genome-wide location analysis showed Lrp binding at 17 intragenic regions within ORFs and at 2 intergenic regions adjacent to convergently transcribed genes where current genomic annotation did not indicate a possible promoter . 
+ These Lrp-binding regions may indicate that there are genomic features not yet discovered , such as promoters of novel genes , or that transcription factor -- DNA interactions occur by chance and have not been removed by evolution ( 25 ) . 
+ As more experiments of this type are performed and as more functions are assigned to the gene products of hypothetical ORFs , an even clearer picture of the Lrp-binding regions in E. coli will emerge . 
+ It is indicative of the power of the ChIP-chip approach that four-fifths of the Lrp-binding regions observed here were previously unknown . 
+ Furthermore , the physiological functions of nearly one-third of directly Lrp-regulated genes are currently unknown , most of which ( 98 % ) are likely to be regulated by Lrp under the stationary growth condition . 
+ Therefore , much of this regulation can be understood by the principle of `` feast or famine '' adaptation for survival in nutrient-rich or depleted environments ( 16 ) . 
+ Although the physiological functions of the genes are currently unknown , the results presented here support previous suggestions that the physiological role of Lrp is to monitor the nutritional state of the cell to adjust its metabolism to changing nutritional conditions and , in cooperation with other regulatory networks such as alternative sigma factor S , to coordinate these changes with the physical environment of the cell . 
+ In general , the global transcription factors located at the higher level of hierarchy in the transcriptional regulatory network have many direct regulatory targets that are transcription factors located in the lower levels of the transcriptional regulatory network hierarchy ( 26 ) . 
+ We have identified 11 such transcription factors that are under Lrp control . 
+ They are DhaR , CysB , TyrR , SlyA , EutR , TdcA , GadW , LrhA , DeoT , YkgK , and Yhe , the latter 2 of which are predicted regulatory proteins . 
+ Most of these transcription factors participate in the regulation of the amino acid metabolism and small molecule transport . 
+ These regulatory proteins altogether are known to regulate the transcription of at least 34 genes . 
+ There are likely to be additional genes indirectly regulated by Lrp , such as genes regulated by metabolites produced by the metabolic enzymes in the regulatory interaction . 
+ Therefore , the experimental data presented here support the previous suggestion that the size of Lrp regulon is 10 % of all ORFs in E. coli ( 16 , 27 ) . 
+ The data integration of Lrp-binding information with the gene expression profiles , promoter profiling using rifampicin-treated cells , and RNAP occupancy profiles has enabled us to elucidate a fuller mechanistic understanding of the differential Lrp-binding profiles in response to exogenous leucine . 
+ We were able to show that ( i ) the differential Lrp-binding profiles in response to exogenous leucine describe 3 unique regulatory modes ( independent , concerted , and reciprocal ) ; ( ii ) the functional classification of genes directly regulated by Lrp reflects the diverse roles of Lrp in E. coli metabolism and assigns 45 % of the genes in the Lrp regulon to small molecule transport and metabolism ; ( iii ) the feedback loop motifs composed of transporters and metabolic enzymes can be reconstructed based on the unique regulatory modes governed by Lrp and the functional localization of the genes ; and ( iv ) the behavior of the feedback loop motifs in response to exogenous leucine defines 3 regulatory circuits by which the global transcription factor Lrp controls small molecule uptake and utilization . 
+ In summary , we have described an integrative analysis of various types of genome-scale data to comprehensively understand the design principles of a global transcription factor , Lrp in E. coli . 
+ In the future , this systems approach will enable us to derive a similar understanding for how broad-acting transcription factors coordinate their activities to arrive at a functional organism . 
+ Materials and Methods
+ Bacterial Strains , Media , and Growth Conditions . 
+ E. coli BOP508 cells harboring Lrp-8 myc , BOP 508 deleted for lrp , and MG1655 were grown in glucose ( 2 g/L ) minimal M9 medium supplemented with or without 10 mM leucine . 
+ Glycerol stocks of the E. coli strains were inoculated into the minimal medium and cultured at 37 °C with constant agitation overnight . 
+ The cultures were diluted 1:100 into 50 mL of the fresh minimal medium and then cultured at 37 °C to an appropriate cell density . 
+ For the rifampicin-treated cells , rifampicin dissolved in methanol was added to a final concentration of 150 µg/mL and stirred for 20 min . 
+ Cultures were monitored by OD600 nm to verify the inhibitory effects of rifampicin . 
+ 1 . 
+ Krishna S , Semsey S , Sneppen K ( 2007 ) Combinatorics of feedback in cellular uptake and metabolism of small molecules . 
+ Proc Natl Acad Sci USA 104:20815 -- 20819 . 
+ 2 . 
+ Calvo JM , Matthews RG ( 1994 ) The leucine-responsive regulatory protein , a global regulator of metabolism in Escherichia coli . 
+ Microbiol Rev 58:466 -- 490 . 
+ 3 . 
+ Yokoyama K , et al. ( 2006 ) Feast/famine regulatory proteins ( FFRPs ) : Escherichia coli Lrp , AsnC and related archaeal transcription factors . 
+ FEMS Microbiol Rev 30:89 -- 108 . 
+ 4 . 
+ Newman EB , Lin R ( 1995 ) Leucine-responsive regulatory protein : a global regulator of gene expression in E. coli . 
+ Annu Rev Microbiol 49:747 -- 775 . 
+ 5 . 
+ Landgraf JR , Wu J , Calvo JM ( 1996 ) Effects of nutrition and growth rate on Lrp levels in Escherichia coli . 
+ J Bacteriol 178:6930 -- 6936 . 
+ 6 . 
+ Faith JJ , et al. ( 2007 ) Large-scale mapping and validation of Escherichia coli transcriptional regulation from a compendium of expression profiles . 
+ PLoS Biol 5 : e8 . 
+ 7 . 
+ Lin R , D'Ari R , Newman EB ( 1992 ) Lambda placMu insertions in genes of the leucine regulon : Extension of the regulon to genes not regulated by leucine . 
+ J Bacteriol 174:1948 -- 1955 . 
+ 8 . 
+ Cui Y , Wang Q , Stormo GD , Calvo JM ( 1995 ) A consensus sequence for binding of Lrp to DNA . 
+ J Bacteriol 177:4872 -- 4880 . 
+ 9 . 
+ Marasco R , et al. ( 1994 ) In vivo footprinting analysis of Lrp binding to the ilvIH promoter region of Escherichia coli . 
+ J Bacteriol 176:5197 -- 5201 . 
+ 10 . 
+ Stauffer LT , Stauffer GV ( 1994 ) Characterization of the gcv control region from Escherichia coli . 
+ J Bacteriol 176:6159 -- 6164 . 
+ 11 . 
+ Stauffer LT , Fogarty SJ , Stauffer GV ( 1994 ) Characterization of the Escherichia coli gcv operon . 
+ Gene 142:17 -- 22 . 
+ 12 . 
+ Ernsting BR , Denninger JW , Blumenthal RM , Matthews RG ( 1993 ) Regulation of the gltBDF operon of Escherichia coli : How is a leucine-insensitive operon regulated by the leucine-responsive regulatory protein ? 
+ J Bacteriol 175:7160 -- 7169 . 
+ 13 . 
+ Cho BK , Knight EM , Palsson BO ( 2006 ) PCR-based tandem epitope tagging system for Escherichia coli genome engineering . 
+ BioTechniques 40:67 -- 72 . 
+ 14 . 
+ Herring CD , et al. ( 2005 ) Immobilization of Escherichia coli RNA polymerase and location of binding sites by use of chromatin immunoprecipitation and microarrays . 
+ J Bacteriol 187:6166 -- 6174 . 
+ 15 . 
+ Zheng M , Barrera LO , Ren B , Wu YN ( 2007 ) ChIP-chip : Data , model , and analysis . 
+ Biometrics 63:787 -- 796 . 
+ ChIP-Chip . 
+ Cultures at midlog or stationary growth phase were cross-linked by 1 % formaldehyde at room temperature for 25 min . 
+ After cell lysis and sonication , the cross-linked DNA-Lrp and DNA-RNAP complexes were immunoprecipitated by using antibody against myc-tag and RNA polymerase subunit ( rpoB ) , respectively , and Dynabeads Pan Mouse IgG magnetic beads ( Invitrogen ) followed by stringent washings ( see SI Text for the detailed ChIP-chip protocol ) . 
+ After reversal of the cross-links by incubation at 65 °C overnight , the samples were treated by RNaseA ( Qiagen ) and proteaseK ( Invitrogen ) and then purified with a PCR purification kit ( Qiagen ) . 
+ To verify the enrichment of the Lrp-binding regions in the DNA samples , 1 µL of IP or mock-IP DNA was used to perform gene-specific real-time qPCR with primers specific to the promoter regions . 
+ The IP and mock-IP DNA were then amplified by random DNA amplification method ( 14 ) . 
+ Then , the amplified DNA samples were labeled and hybridized onto whole-genome-tiled microarrays ( NimbleGen ) . 
+ Detailed methods used for ChIP-chip analysis are described in SI Text . 
+ Transcriptional Analysis . 
+ Cultures were grown to midlog growth phase aerobically ( OD A600 0.6 ) . 
+ The cultures ( 3 mL ) were then added to 2 volumes of RNAprotect Bacteria Reagent ( Qiagen ) and total RNA was isolated by using RNAeasy columns ( Qiagen ) with DNaseI treatment . 
+ Total RNA yields were measured by using a spectrophotometer ( A260 ) and quality was checked by visualization on agarose gels and by measuring the sample A260/A280 ratio ( 1.8 ) . 
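+ The A260-based yield and A260/A280 quality check can be illustrated with a short Python sketch. It relies on the standard approximation that one A260 unit corresponds to about 40 µg/mL of single-stranded RNA at a 1 cm path length; the function names, the dilution parameter, and the use of "greater than or equal to" for the 1.8 threshold are assumptions for illustration (the comparator in the text appears to have been lost in extraction).

```python
def rna_concentration_ug_per_ml(a260: float, dilution_factor: float = 1.0) -> float:
    """Estimate RNA concentration from absorbance at 260 nm.

    Uses the common approximation of ~40 ug/mL RNA per A260 unit
    (1 cm path length); dilution_factor is a hypothetical convenience.
    """
    return a260 * 40.0 * dilution_factor


def purity_ratio(a260: float, a280: float) -> float:
    """Return the A260/A280 ratio used as a purity indicator."""
    if a280 <= 0:
        raise ValueError("A280 must be positive")
    return a260 / a280


def passes_purity(a260: float, a280: float, threshold: float = 1.8) -> bool:
    """Assumed check: ratio at or above the threshold counts as acceptable."""
    return purity_ratio(a260, a280) >= threshold


print(rna_concentration_ug_per_ml(0.5, dilution_factor=10))
print(passes_purity(1.9, 1.0))
```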
+ cDNA preparation was performed as described in ref. 13 . 
+ Each qPCR contained 0.5 µM of each forward and reverse primer ( see SI Text for the detailed qPCR protocol ) , 150 ng of cDNA , and 25 µL of SYBR Master Mix ( Qiagen ) . 
+ Affymetrix GeneChip E. coli Genome 2.0 arrays were used for genome-scale transcriptional analyses . 
+ cDNA synthesis , fragmentation , end-terminus biotin labeling , and array hybridization were performed as recommended by Affymetrix standard protocol . 
+ Data Analysis . 
+ To identify enriched regions in the ChIP-chip data , we used the previously developed peak-finding algorithm ( 15 ) . 
+ For the Lrp-binding site profile , the details of the manner in which we identified high-probability Lrp-binding regions using ChIP-chip peaks and of the algorithm that we developed to learn the Lrp DNA-binding signals are contained in SI Text . 
+ Raw Experimental Data . 
+ SI Text and all raw data files can be downloaded from http://systemsbiology.ucsd.edu/publications . 
+ ACKNOWLEDGMENTS . 
+ This work was supported by National Institutes of Health Grant GM062791 and the Office of Science ( BER ) , U. S. Department of Energy , cooperative agreement DE-FC02-02ER63446 . 
+ 16 . 
+ Tani TH , et al. ( 2002 ) Adaptation to famine : A family of stationary-phase genes revealed by microarray analysis . 
+ Proc Natl Acad Sci USA 99:13471 -- 13476 . 
+ 17 . 
+ Campbell EA , et al. ( 2001 ) Structural mechanism for rifampicin inhibition of bacterial RNA polymerase . 
+ Cell 104:901 -- 912 . 
+ 18 . 
+ Kelly A , et al. ( 2006 ) DNA supercoiling and the Lrp protein determine the directionality of fim switch DNA inversion in Escherichia coli K-12 . 
+ J Bacteriol 188:5356 -- 5363 . 
+ 19 . 
+ Haney SA , Platko JV , Oxender DL , Calvo JM ( 1992 ) Lrp , a leucine-responsive protein , regulates branched-chain amino acid transport genes in Escherichia coli . 
+ J Bacteriol 174:108 -- 115 . 
+ 20 . 
+ Platko JV , Willins DA , Calvo JM ( 1990 ) The ilvIH operon of Escherichia coli is positively regulated . 
+ J Bacteriol 172:4563 -- 4570 . 
+ 21 . 
+ Rex JH , Aronson BD , Somerville RL ( 1991 ) The tdh and serA operons of Escherichia coli : Mutational analysis of the regulatory elements of leucine-responsive genes . 
+ J Bacteriol 173:5944 -- 5953 . 
+ 22 . 
+ Grainger DC , et al. ( 2005 ) Studies of the distribution of Escherichia coli cAMP-receptor protein and RNA polymerase along the E. coli chromosome . 
+ Proc Natl Acad Sci USA 102:17693 -- 17698 . 
+ 23 . 
+ Grainger DC , Hurd D , Goldberg MD , Busby SJ ( 2006 ) Association of nucleoid proteins with coding and non-coding segments of the Escherichia coli genome . 
+ Nucleic Acids Res 34:4642 -- 4652 . 
+ 24 . 
+ Grainger DC , et al. ( 2007 ) Transcription factor distribution in Escherichia coli : Studies with FNR protein . 
+ Nucleic Acids Res 35:269 -- 278 . 
+ 25 . 
+ Shimada T , Ishihama A , Busby SJ , Grainger DC ( 2008 ) The Escherichia coli RutR transcription factor binds at targets within genes as well as intergenic regions . Nucleic Acids Res 36:3950 -- 3955 . 
+ 26 . 
+ Martinez-Antonio A , Collado-Vides J ( 2003 ) Identifying global regulators in transcriptional regulatory networks in bacteria . 
+ Curr Opin Microbiol 6:482 -- 489 . 
+ 27 . 
+ Hung SP , Baldi P , Hatfield GW ( 2002 ) Global gene expression profiling in Escherichia coli K12 . 
+ The effects of leucine-responsive regulatory protein . 
+ J Biol Chem 277:40309 -- 40323 .
\ No newline at end of file
--- a/data/TEXT_FILES/useful_txt/19150431.txt 0 → 100644
View file @27818a9
+++ b/data/TEXT_FILES/useful_txt/19150431.txt 0 → 100644
View file @27818a9
+ Molecular Cell Article
+ Rachel A. Mooney ,1 Sarah E. Davis ,1,4 Jason M and Robert Landick1 ,4 , * 1Department of Biochemistry 2Department of Genetics 3Genome Center 4Department of Bacteriology University of Wisconsin , Madison , WI 53706 , USA * Correspondence : landick@bact.wisc.edu DOI 10.1016/j.molcel.2008.12.021 
+ Regulator Trafﬁcking on Bacterial
+ SUMMARY
+ The trafficking patterns of the bacterial regulators of transcript elongation σ70 , ρ , NusA , and NusG on genes in vivo and the explanation for promoter-proximal peaks of RNA polymerase ( RNAP ) are unknown . 
+ Genome-wide , E. coli ChIP-chip revealed distinct association patterns of regulators as RNAP transcribes away from promoters ( ρ first , then NusA , then NusG ) . 
+ However , the interactions of elongating complexes with these regulators did not differ significantly among most transcription units . 
+ A modest variation of NusG signal among genes reflected increased NusG interaction as transcription progresses , rather than functional specialization of elongating complexes . 
+ Promoter-proximal RNAP peaks were offset from σ70 peaks in the direction of transcription and co-occurred with NusA and ρ peaks , suggesting that the RNAP peaks reflected elongating , rather than initiating , complexes . 
+ However , inhibition of ρ did not increase RNAP levels within genes downstream from the RNAP peaks , suggesting the peaks are caused by a mechanism other than ρ-dependent attenuation . 
+ INTRODUCTION
+ Transcription of genes by RNAP is controlled by a multiplicity of regulators that modulate template DNA conformation , control initiation , or govern RNAP 's progress through transcription units ( TUs ) in response to internal and environmental signals . 
+ In bacteria and eukaryotes , transcription regulators can be divided into those acting during transcript initiation , elongation , or termination . 
+ Precisely where initiation regulators release and elongation regulators associate with RNAP is unknown . 
+ Further , the distinction between these classes of regulators is not absolute ; some may act during multiple stages of transcription , possibly with different effects . 
+ Finally , although some elongation regulators are known to target subsets of TUs , it is unclear whether general elongation regulators like NusA , NusG , and ρ interact with most elongating complexes ( ECs ) equivalently or instead preferentially interact with certain TUs or sites within TUs . 
+ In bacteria , σ initiation factors bind tightly to core RNAP ( consisting of β′ , β , α2 , and ω subunits ) and determine the sequence specificity of RNAP-promoter interactions ( Figure 1A ) . 
+ σs are thought to be released shortly after RNA synthesis begins . 
+ However , whether σ release occurs obligately or stochastically , whether σ may be completely retained on a subset of TUs , and whether σ may transiently rebind to the EC during elongation with possible regulatory consequence all remain in debate ( Bar-Nahum and Nudler , 2001 ; Kapanidis et al. , 2005 ; Mooney et al. , 2005 ; Mooney and Landick , 2003 ; Mukhopadhyay et al. , 2001 ; Raffaelle et al. , 2005 ; Reppas et al. , 2006 ; Wade and Struhl , 2004 , 2008 ) . 
+ During or after promoter escape , the EC can associate with one or more elongation regulators ( Figure 1A ) . 
+ In bacteria , NusA and NusG alter EC properties differently via direct and independent interactions with RNAP and are the best characterized regulators of elongation ( Burns et al. , 1998 ; Greenblatt et al. , 1981 ; Li et al. , 1992 ; Linn and Greenblatt , 1992 ; Sullivan and Gottesman , 1992 ) . 
+ NusA preferentially enhances transcriptional pausing associated with nascent RNA hairpins ( Artsimovitch and Landick , 2000 ; Farnham et al. , 1982 ; Greenblatt et al. , 1981 ; Yakhnin and Babitzke , 2002 ) , enhances intrinsic termination at some sites more than others ( Kassavetis and Chamberlin , 1981 ; Linn and Greenblatt , 1992 ; Yakhnin and Babitzke , 2002 ) , modulates ρ-dependent termination ( Burns et al. , 1998 ) , and is an essential component of antitermination complexes that form on ribosomal RNA ( rrn ) and phage λ operons ( Mason et al. , 1992 ; Shankar et al. , 2007 ; Torres et al. , 2001 ) . 
+ NusG increases the rate of RNA chain extension , at least partly by decreasing pausing associated with backtracking ( Artsimovitch and Landick , 2000 ) , enhances ρ-dependent termination via interactions with RNAP and ρ ( Li et al. , 1992 , 1993 ; Sullivan and Gottesman , 1992 ) , and also is a component of both rrn and λ antitermination complexes ( Mason et al. , 1992 ; Torres et al. , 2001 ) . 
+ Despite these multiple roles of NusA and NusG , it is unclear whether they associate equivalently with ECs on all TUs , differentially with subsets of TUs , or differentially at locations within TUs . 
+ The homohexameric ρ protein terminates transcription after binding to unstructured , C-rich nascent RNA . 
+ RNA stimulation of its ATP-dependent translocase activity allows ρ to travel 5′ to 3′ along the RNA and dissociate ECs unless blocked by intervening ribosomes ( Richardson , 2002 ) . 
+ It is uncertain where within TUs ρ interacts with ECs and whether ρ preferentially affects a subset of TUs . 
+ The report of Reppas et al. ( 2006 ) that a significant fraction of TUs in E. coli exhibit promoter-proximal peaks of RNAP heightens interest in knowing whether promoter-proximal , ρ-dependent termination could contribute to the apparent decrease in RNAP density downstream from promoters . 
+ To investigate trafficking of these regulators on bacterial TUs and the reported promoter-proximal block to transcription ( Reppas et al. , 2006 ) , we used `` chromatin immunoprecipitation '' ( Kuo and Allis , 1999 ; Solomon et al. , 1988 ) followed by microarray hybridization ( ChIP-chip ; Wade et al. , 2007 ) . 
+ Our study provides comparative analysis with improved resolution of some proteins examined previously ( RNAP , σ70 , and NusA ; Grainger et al. , 2005 ; Herring et al. , 2005 ; Raffaelle et al. , 2005 ; Reppas et al. , 2006 ; Wade and Struhl , 2004 ) , as well as genome-wide views of NusA , NusG , and ρ , leading to important insights into trafficking of bacterial transcription regulators . 
+ RESULTS
+ Analysis of RNAP ChIP-chip Signals on E. coli TUs
+ We applied ChIP-chip to E. coli K-12 at mid-log phase of growth at 37 °C in defined minimal glucose medium ( Experimental Procedures ) , conditions in which many biosynthetic genes must be expressed and that were used previously for expression analysis ( Allen et al. , 2003 ) . 
+ Using specific antibodies targeting core RNAP , σ70 , NusA , ρ , or a hemagglutinin ( HA ) epitope present in three copies at the N terminus of the chromosomal nusG gene , we obtained associated DNA that was then fluorescently labeled and hybridized to a tiled oligonucleotide microarray ( 25 bp spacing ; Experimental Procedures ) . 
+ Initial analysis of the immunoprecipitated DNAs relative to input DNA revealed excellent correspondence among the sites of enrichment by anti-σ70 and anti-RNAP ( anti-β′ ) antibodies ( Figure 1B ) . 
+ Closer examination ( e.g. , of the expanded region around 0.94 Mb shown in Figure 1B ) revealed that σ70 was predominantly associated with DNA near promoters , whereas RNAP could be detected in association with both promoter and transcribed-region DNA . 
+ The strongest signals were in genes encoding tRNA , rRNA , and ribosomal proteins ( e.g. , serW and rpsA ) , as expected and reported previously ( Grainger et al. , 2005 ; Raffaelle et al. , 2005 ; Reppas et al. , 2006 ; Wade and Struhl , 2004 ) . 
+ NusA , NusG , and ρ were associated with ECs in most locations where RNAP was present . 
+ RNAP is known to associate nonspecifically with chromosomal DNA ( deHaseth et al. , 1978 ; Grigorova et al. , 2006 ; von Hippel et al. , 1974 ) . 
+ To estimate the corresponding nonspecific ( background ) ChIP-chip signal for RNAP , we examined regions of the bacterial chromosome thought to be devoid of transcription , such as the cryptic bglB gene ( Defez and De Felice , 1981 ) . 
+ We identified 170 regions greater than 1 kb whose average RNAP ChIP signal was indistinguishable from that on bglB ( bkgd , Figure 1B ; gray box near 0.94 Mb in expanded region ; Table S2 ) . 
+ The signals for these regions were normally distributed with a mean below the signal for 84 % of the complete genome-wide probe set ( compare black to blue histograms , Figure 1C ; Supplemental Experimental Procedures ) . 
+ This suggests that most of the E. coli genome is transcribed at levels above the nonspeciﬁc background , consistent with previous estimates ( Selinger et al. , 2000 ) . 
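+ The comparison behind the 84 % figure can be sketched as follows: take the mean signal of the background (bglB-like) regions and ask what fraction of all probes lies above it. This is a minimal illustration assuming NumPy; the function name and the toy arrays are hypothetical and are not the authors' data or code.

```python
import numpy as np


def fraction_above_background(all_probe_signals, background_signals) -> float:
    """Fraction of genome-wide probes whose signal exceeds the background mean."""
    all_probe_signals = np.asarray(all_probe_signals, dtype=float)
    background_mean = float(np.mean(background_signals))
    return float(np.mean(all_probe_signals > background_mean))


# Toy values only: background mean is 0.5, and 4 of 5 probes exceed it.
frac = fraction_above_background([0.0, 1.0, 2.0, 3.0, 4.0], [0.4, 0.6])
print(frac)
```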
+ To characterize RNAP and regulator occupancy further , we identiﬁed `` high-quality '' TUs that were signiﬁcantly above this background and for which signals from adjacent TUs did not obscure the pattern of RNAP and regulator association and dissociation ( e.g. , serS in the expanded region as opposed to clpA and cydCD , which were obscured by strong signals from the adjacent serW tRNA gene ) . 
+ We identiﬁed 109 such TUs , which were spread across the E. coli genome and represented a range of expression levels and TU lengths ( Figure 1D and Table S1 ) . 
+ Regulator Trafﬁcking on Representative E. coli TUs
+ To gauge the basic patterns of regulator trafficking on these 109 TUs , we wished to scale the data in proportion to occupancy of regulators on DNA . 
+ Although true occupancy is impossible to measure without knowing the relative efficiencies of crosslinking for each protein at each TU location as well as the signals corresponding to zero and full occupancy , we nevertheless defined an apparent occupancy ( Occapp ) by linearly scaling signals for each protein between zero , which was set equal to the background defined by bglB-similar regions ( Figure 1C ; Table S2 ) , and one , which was arbitrarily defined as the average of the ten three-probe clusters with highest average value ( Figure 1C ; Supplemental Experimental Procedures ) . 
+ Therefore , Occapp is a function of true occupancy and relative `` crosslinkability . '' 
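+ The linear scaling that defines Occapp can be sketched in Python with NumPy. This is a hedged illustration, not the authors' implementation: the function name is hypothetical, and the sketch assumes that a "three-probe cluster" is any run of three consecutive probes (overlapping runs allowed), which the text does not specify.

```python
import numpy as np


def apparent_occupancy(signal, background_mean, n_top=10, cluster_size=3):
    """Scale probe signals so background maps to 0 and the top clusters to 1.

    Assumption: clusters are overlapping runs of `cluster_size`
    consecutive probes along the chromosome.
    """
    signal = np.asarray(signal, dtype=float)
    # Mean of every run of `cluster_size` consecutive probes.
    kernel = np.ones(cluster_size) / cluster_size
    cluster_means = np.convolve(signal, kernel, mode="valid")
    # "One" is the average of the n_top highest cluster means.
    full_scale = np.sort(cluster_means)[-n_top:].mean()
    return (signal - background_mean) / (full_scale - background_mean)


# Toy data: 20 background-level probes followed by 12 high probes.
toy_signal = np.array([1.0] * 20 + [3.0] * 12)
occ = apparent_occupancy(toy_signal, background_mean=1.0)
print(occ.min(), occ.max())
```

+ Because the scaling is linear, Occapp values below zero or above one are possible for individual probes; the sketch does not clip them.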
+ An examination of eight representative TUs ( seven from among the 109 high-quality TUs plus rrnE ) revealed significant variation both in the uniformity of RNAP and regulator Occapp across TUs and in the ratios of RNAP Occapp to σ70 and other regulators at locations within TUs ( Figure 2 ) . 
+ In some cases , the peak of σ70 Occapp surrounding the transcription start site ( TSS ) was much greater than RNAP Occapp , with the latter exhibiting a relatively uniform distribution across the TU ( serS , rpsF , and acnB ; Figures 2A , 2D , and 2F ) . 
+ In other cases , the σ70 peak was more similar to the corresponding RNAP Occapp ( atpIBEFHAGDC , gltBDF , and carAB ; Figures 2C , 2E , and 2H ) ; in these cases RNAP typically exhibited a pronounced promoter-proximal peak similar to that previously reported ( Reppas et al. , 2006 ; Wade and Struhl , 2008 ) . 
+ These representative examples suggest there is no one-to-one correspondence between σ70 Occapp and RNAP Occapp at promoters ; this observation was reflected in the modest ( 0.77 ) correlation between peak Occapp values for σ70 and RNAP ( Figure S1 ) . 
+ The regulators σ70 , NusA , NusG , and ρ all appeared to be present on each TU , but with notable differences in their Occapp distributions . 
+ NusA closely mirrored RNAP on each TU , appearing to associate with RNAP as the signal from σ70 disappears . 
+ This is consistent with the long-standing view that NusA displaces σ70 during transcript elongation ( Greenblatt and Li , 1981 ) . 
+ In contrast , NusG appeared to associate with elongating RNAP farther from promoters and did not appear to be present at locations where RNAP forms promoter-proximal peaks . 
+ Rather , NusG Occapp rose gradually to levels that exceed other regulators on most TUs . 
+ The ratio of NusG/RNAP Occapp appeared to be much greater in the distal portions of some TUs ( e.g. , atpIBEFHAGDC and cyoABCDE ; Figures 2C and 2G ) than others ( e.g. , rrnE and rpsF-priB-rpsR-rplI ; Figures 2B and 2D ) . 
+ The different pattern of NusG on rrnE may reﬂect its participation ( with NusA , NusB , NusE , and a subset of ribosomal proteins ) in the rrn antitermination complex ( Torres et al. , 2001 ) . 
+ ρ exhibited a striking pattern of significant promoter-proximal peaks near σ70 peaks and RNAP promoter-proximal peaks , but a lower Occapp over most of the TU . 
+ Finally , although σ70 was principally present at promoters , as reported previously ( Reppas et al. , 2006 ; Wade and Struhl , 2004 ) , σ70 Occapp remained above zero across most TUs ( e.g. , serS , rrnE , and cyoABCDE ; Figures 2A , 2B , and 2G ) . 
+ Figure 2 . Occapp for the rrnE TU and seven representative TUs from among the 109 TUs selected for the absence of interfering upstream or downstream signals ( Figure 1D and Table S1 ) . 
+ Occapp was calculated as described in the Supplemental Experimental Procedures using two rounds of sliding-window smoothing ( 500 bp window for RNAP , NusA , NusG , and ρ ; 175 bp window for σ70 ) . 
+ Genes are depicted as labeled open arrows , promoters as vertical lines capped with arrows , and known intrinsic terminators as hairpins . 
+ Note that the scales of Occapp and TU length ( in kb , denoted by hatchmarks ) differ in each panel . 
+ Protein-encoding genes are colored blue , and the rRNA TU is colored yellow . 
+ Regulators are colored as in Figure 1 . 
+ Vertical dotted lines are the center of the σ70 peak . 
+ For the rrn TU , there are two promoters ( and two σ70 peaks ) . 
+ ( A ) serS , a monocistronic TU encoding seryl-tRNA synthetase . 
+ ( B ) rrnE , one of seven E. coli rRNA TUs . 
+ Due to near-sequence-identity among the rRNA TUs , these signals represent the average of all seven rRNA TUs . 
+ ( C ) atpIBEFHAGDC , the nine-gene TU encoding the F0 , F1 ATP synthase . 
+ ( D ) rpsFpriBrpsRrplI , encoding the ribosomal protein S6 , DNA replication primosome protein N , ribosomal protein S18 , and ribosomal protein L9 . 
+ ( E ) gltBDF , encoding glutamate synthase large and small subunits and a periplasmic protein involved in nitrogen metabolism . 
+ ( G ) cyoABCDE , encoding cytochrome bo terminal oxidase and heme O synthase . 
+ ( H ) carAB , encoding carbamoyl phosphate synthetase . 
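+ The "two rounds of sliding-window smoothing" mentioned in the Figure 2 legend can be illustrated with a simple moving average. This is a sketch under assumptions: the legend does not say which statistic is used inside the window or how array edges are handled, so the mean and edge padding below are illustrative choices, and the function names are hypothetical. The 25 bp probe spacing is taken from the array description earlier in the text.

```python
import numpy as np


def smooth_once(values, window_bp, probe_spacing_bp=25):
    """One pass of a moving average over a window given in base pairs."""
    values = np.asarray(values, dtype=float)
    # Convert the window width from bp to a number of probes.
    n = max(1, int(round(window_bp / probe_spacing_bp)))
    # Pad with edge values so the output has the same length as the input.
    left = n // 2
    right = n - 1 - left
    padded = np.pad(values, (left, right), mode="edge")
    kernel = np.ones(n) / n
    return np.convolve(padded, kernel, mode="valid")


def smooth_twice(values, window_bp, probe_spacing_bp=25):
    """Apply the same sliding-window smoothing two times in a row."""
    first = smooth_once(values, window_bp, probe_spacing_bp)
    return smooth_once(first, window_bp, probe_spacing_bp)


# Toy example: a single-probe spike smoothed with a 500 bp window.
spike = np.zeros(200)
spike[100] = 1.0
smoothed = smooth_twice(spike, window_bp=500)
print(len(smoothed), smoothed.max())
```

+ With 25 bp spacing, a 500 bp window spans 20 probes and a 175 bp window spans 7, so the narrower window preserves sharper features such as promoter peaks.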
+ To examine the correlation between RNAP and regulator presence on TUs more carefully , we calculated the average ChIP-chip signals for each in a 200 bp window in the middle of the 109 high-quality TUs ( Figure 3A ) and compared the regulator and RNAP ChIP-chip signals directly ( Figures 3B -- 3F ) . 
+ Figure 3 . ( A ) Diagram illustrating calculation of mid-TU signals . 
+ For each of the 109 high-quality TUs , the log2 ( IP/input ) signals for all probes within a 200 bp window surrounding the center of the TU were averaged to yield an estimate of the signal due to elongating RNAP or regulator associated with the elongating RNAP . 
+ Regulators are colored according to Figures 1 and 2 . 
+ ( B ) Correlation of σ70 and RNAP mid-TU signals . 
+ Only TUs for which the mid-TU point was more than 500 bp from the σ70 peak were included ( to avoid influence of signal from the σ70 peak ; n = 80 ) ; r = 0.68 ; p < 0.001 . 
+ ( C ) Correlation of NusA and RNAP mid-TU signals ( n = 109 ) ; r = 0.97 ; p < 0.001 . 
+ ( D ) Correlation of NusG and RNAP mid-TU signals ( n = 109 ) ; r = 0.71 ; p < 0.001 . 
+ ( E ) Correlation of ρ and RNAP mid-TU signals ( n = 109 ) ; r = 0.78 ; p < 0.001 . 
+ ( F ) The correlation coefficient between the RNAP signal and each of the regulator signals plotted versus mean mid-TU signal for the regulator . 
+ Mean signals for NusA , NusG , σ70 , and ρ are 73 % , 95 % , 33 % , and 51 % of mean RNAP signals , respectively . 
+ Strikingly , σ70 , NusA , NusG , and ρ mid-TU signals all exhibit an obvious correlation with RNAP mid-TU signals . 
+ However , the correlation was much greater for NusA than for σ70 , NusG , or ρ ( Figure 3F ) . 
+ For σ70 and ρ , the weaker correlation is consistent with lower signal-to-noise ratio resulting from the reduced mean signals in the middle of the TUs . 
+ However , this is not the case for NusG , where the mean mid-gene signal was as large as the RNAP signal despite the much-reduced correlation ( Figure 3F ) . 
+ These results suggest that elongating RNAPs do not exhibit TU-speciﬁc variations in afﬁnity for σ70 , NusA , NusG , or ρ . 
+ Although the relative afﬁnity of each regulator for ECs differs ( i.e. , σ70 and ρ exhibit lower signals than NusA and NusG ) , there is no indication that they target one subset of TUs relative to others . 
+ Thus , they can rightly be classiﬁed as general elongation regulators as opposed to specialized regulators like RfaH that are recruited to a speciﬁc subset of TUs ( Artsimovitch and Landick , 2002 ) . 
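The mid-TU comparison described above (average log2 ( IP/input ) in a 200 bp window at the TU center, then correlation of regulator and RNAP signals) can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the probe layout as ( position , log2 ratio ) pairs and the function names `mid_tu_signal` and `pearson_r` are assumptions made for the example.

```python
from math import sqrt


def mid_tu_signal(probes, tu_start, tu_end, window=200):
    """Mean log2(IP/input) of probes inside `window` bp centered on the TU midpoint.

    probes: iterable of (genome_position, log2_ratio) pairs (hypothetical layout).
    Returns None when no probe falls inside the window.
    """
    center = (tu_start + tu_end) / 2
    half = window / 2
    values = [v for pos, v in probes if center - half <= pos <= center + half]
    return sum(values) / len(values) if values else None


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences.

    Raises ZeroDivisionError if either sequence is constant.
    """
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)
```

In use, one would compute `mid_tu_signal` once per TU for RNAP and once per TU for each regulator, then pass the two resulting lists to `pearson_r`.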
+ To resolve the pattern of σ70 , NusA , NusG , and ρ interactions with RNAP more accurately , we took advantage of the similarity of these interactions among TUs to compute aggregate Occapp proﬁles ( Figure 4 ) . 
+ For this purpose , we selected a set of highly transcribed TUs among the 109 high-quality TUs ( to improve signal-to-noise ratios ) and avoided TUs known to contain transcription attenuators ( e.g. , trp or leu ) or multiple promoters that might complicate the distribution of RNAP . 
+ This yielded a set of 42 TUs that included 13 lacking an obvious promoter-proximal RNAP peak and 29 containing a readily discerned promoter-proximal RNAP peak ( traces B and C in Figure 4A ) . 
+ We computed the aggregate Occapp for these TUs by aligning them relative to the genome coordinate of their σ70 peak and then averaging normalized Occapp values for each protein ( normalized relative to the highest Occapp for that protein in a given TU ) . 
+ The RNAP peak aggregate Occapp for the 42 TUs was offset in the direction of transcription from the σ70 peak by 150 bp ( d in Figure 4A ) . 
+ The size of this offset was widely distributed among different TUs and was uncorrelated with RNAP mid-TU signal ( Figure S2 ) . 
+ However , the 29 TUs exhibiting pronounced peaks were , on average , longer ( 3.43 kb average length ) , whereas the TUs on which Occapp declined much more slowly were , on average , shorter TUs ( 1.36 kb average length ; Mann-Whitney p < 0.001 ) . 
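The aggregate-proﬁle step (align each TU on its σ70 peak, normalize to the highest value in that TU, then average) can be sketched as below. This is an illustrative reading of the description, not the authors' code: the input layout, the strand handling, the 25 bp binning, and the name `aggregate_profile` are all assumptions.

```python
from collections import defaultdict


def aggregate_profile(tus, bin_size=25):
    """Average per-TU normalized occupancy as a function of distance from the sigma70 peak.

    tus: list of dicts with hypothetical keys
        'peak'   - genome coordinate of the sigma70 peak,
        'strand' - +1 or -1, so that positive offsets point in the direction of transcription,
        'probes' - list of (genome_position, occupancy) pairs for one protein.
    Returns {bin_start_offset: mean normalized occupancy}, ordered by offset.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for tu in tus:
        top = max(occ for _, occ in tu["probes"])
        if top <= 0:
            continue  # cannot normalize a TU with no positive signal
        for pos, occ in tu["probes"]:
            offset = (pos - tu["peak"]) * tu["strand"]
            bin_start = int(offset // bin_size) * bin_size  # floor, also for negative offsets
            sums[bin_start] += occ / top  # normalize to the highest value in this TU
            counts[bin_start] += 1
    return {b: sums[b] / counts[b] for b in sorted(sums)}
```

Normalizing within each TU before averaging keeps a few very highly transcribed TUs from dominating the aggregate shape.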
+ The aggregate Occapp proﬁles highlighted differences in regulator trafﬁcking on E. coli TUs . 
+ σ70 appeared to dissociate from RNAP as RNAP loses contact with the promoter ( as reported previously by Raffaelle et al. , 2005 ; Reppas et al. , 2006 ; Wade and Struhl , 2004 ) . 
+ Although the σ70 peak was nearly symmetric around its center as noted by Reppas et al. ( 2006 ) , it was skewed 20 bp downstream at its vertical midpoint in our data ( Figure S3 ) . 
+ This σ70 skew was caused by translocation of RNAP relative to the TSS , as evidenced by loss of the skew and a slight upstream shift of the σ70 peak upon treatment of cells with rifampicin ( Figure S3 ) . 
+ Conversely , NusA appeared to associate fully with elongating RNAP sometime after the σ70 signal disappeared ( Figures 4B and 4C ) . 
+ Both the NusA and ρ aggregate proﬁles exhibited promoter-proximal peaks , as observed for the individual proﬁles ( compare Figures 2 and 4C ) . 
+ However , the ρ peak was displaced 50 bp upstream ( relative to the RNAP peak ) , whereas the NusA peak was displaced downstream . 
+ Finally , NusG associated with elongating RNAP much more slowly than either NusA or ρ , reaching a plateau of Occapp 800 bp downstream of the σ70 peak . 
+ The same aggregate and individual-TU patterns of NusG association were observed using anti-NusG polyclonal antibody ( Figure S4 ) , ruling 
+ Taken together , our analysis of regulator trafﬁcking on E. coli TUs ( Figures 2 -- 4 ) leads to the following key conclusions . 
+ First , σ70 crosslinks almost exclusively to promoter DNA , although a downstream skew of the σ70 peak and weak σ70 signal in the middle of TUs are consistent with stochastic release of σ70 from elongating RNAP followed by weak σ70 association with ECs ( Mooney et al. , 2005 ) . 
+ The extent of σ70-EC association is difﬁcult to assess from ChIP-chip data ( see Discussion ) ; we can not exclude the possibility that nonspeciﬁc antibody-EC interaction contributes to the mid-TU σ70 signal . 
+ Second , NusG associates with ECs more slowly than NusA on most TUs ( Figures 2 and 4 ) , except on antiterminated rrn TUs , where its faster association likely reﬂects incorporation into an antiterminated EC ( Torres et al. , 2001 ) . 
+ Conversely , the slower association of NusG on other TUs may suggest that its binding is stimulated by a feature of the EC that increases the farther RNAP transcribes . 
+ Third , ρ is evident at most TU locations , with a peak interaction at locations in between the strongest σ70 and RNAP signals ( Figures 4B and 4C ) . 
+ This suggests that ρ may associate with transcripts shortly after the initiation of transcription . 
+ ρ is detectable throughout TUs , and the extent of this interaction is well correlated with the amount of RNAP located on the TU ( Figure 3E ) . 
+ This is consistent with the generally accepted role of ρ in premature termination whenever translation is compromised . 
+ To investigate the greater variability of NusG/RNAP ratios and NusG 's apparently slower association with ECs , we computed the NusG/RNAP , NusA/RNAP , and ρ/RNAP ratios for each gene and examined these ratios as a function of the average RNAP signal per gene ( Figures 5A -- 5C ) . 
+ NusA and ρ both exhibited relatively uniform distributions ; genes with low RNAP signals exhibited higher ratios ( as expected mathematically ; Figures 5A and 5B ) . 
+ In this analysis , NusA/RNAP ratios on rRNA genes were slightly above the trend line but were still consistent with at least 1:1 NusA : RNAP on most ECs . 
+ tRNA genes exhibited disproportionately high ratios of both NusA and ρ , suggesting that transcription of tRNA genes may differ from protein-coding genes . 
+ Small RNA ( sRNA ) genes , in contrast , exhibited normal ratios of NusA and ρ to RNAP . 
+ The NusG/RNAP ratio distribution differed strikingly from the NusA or ρ ratios . 
+ Although rRNA genes exhibited high NusG/RNAP ratios , a subset of genes with lower average RNAP signal exhibited even higher NusG/RNAP ratios ( inset , Figure 5C ) . 
+ Interestingly , several of these were genes involved in energy production ( genes from the nuo and cyo operons ) , murein/peptidoglycan biosynthesis and recycling ( oppD & F , murB & E ) , or amino-acid biosynthesis ( trpA & B , metI , cysM ) . 
+ This raised the possibility of a functional connection to elevated NusG levels on certain TUs ( e.g. , to localize transcription of certain genes ) . 
+ As an alternative , we considered whether the length of TUs might explain the abnormal NusG/RNAP ratios ( e.g. , if long TUs acquire higher NusG occupancy ) . 
+ To test this , we compared the NusG/RNAP ratio to the distance of genes from their TSS ( for cases where the TSS is known ) and found a strong correlation of TSS-gene distance to NusG/RNAP ratio ( Spearman r = 0.57 ; Figure 5D ) . 
+ Genes that deviated signiﬁcantly from this strong correlation by exhibiting low NusG/RNAP ratios included rfa and rfb genes ( inset , Figure 5D ) . 
+ This is readily explained because rfa and rfb genes are regulated by RfaH , a specialized paralog of NusG that competes with NusG for interaction with ECs ( Belogurov et al. , 2007 ) . 
+ We conclude that the gradual increase in NusG association as transcription progresses , rather than a connection to gene function , explains elevated NusG/RNAP ratios on some genes . 
+ The high NusG/RNAP ratios on energy-related and amino-acid-biosynthetic operons simply reﬂect the greater-than-average length of these TUs . 
+ To conﬁrm this interpretation , we plotted the average NusG/RNAP ratios for different gene functional classes by the average TSS-gene distance for the functional class ( Figure 5E ) . 
+ Classes with NusG/RNAP signal ratios below the genome average ( red circle , Figure 5E ) contained , on average , shorter genes , whereas classes exhibiting signiﬁcantly higher NusG/RNAP signal ratios contained longer genes . 
+ Thus , the primary determinant of NusG levels is TSS-gene distance , rather than gene function . 
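The rank-based comparison of TSS-gene distance with NusG/RNAP ratio can be sketched with a small Spearman implementation. This is a generic illustration of the statistic named in the text, not the authors' analysis code; the function names and the example values in the usage note are hypothetical.

```python
from math import sqrt


def average_ranks(values):
    """1-based ranks of `values`, with tied values sharing the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j over a run of tied values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Spearman rank correlation: the Pearson correlation of the average ranks.

    Raises ZeroDivisionError if either input is constant.
    """
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in rx))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in ry))
    return cov / (sd_x * sd_y)
```

For example, `spearman(distances_bp, nusg_rnap_ratios)` would take one TSS-gene distance and one NusG/RNAP ratio per gene; a rank-based statistic is a natural choice here because it does not assume the ratio rises linearly with distance.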
+ Promoter-Proximal RNAP Peaks Correlate with Promoter-Proximal NusA and ρ Peaks
+ Promoter-proximal RNAP peaks have been detected in E. coli and Drosophila and are suggested to reﬂect RNAPs kinetically blocked early in elongation ( for Drosophila ) or possibly even prior to promoter escape ( for E. coli ; Core and Lis , 2008 ; Muse et al. , 2007 ; Reppas et al. , 2006 ; Wade and Struhl , 2008 ; Zeitlinger et al. , 2007 ) . 
+ Therefore , we asked whether promoter-proximal RNAP peaks were associated with NusA and ρ , which presumably requires promoter escape . 
+ We ﬁrst calculated the traveling ratio ( TR ; the ratio of RNAP signal in the promoter-proximal peak to that within the TU ; Reppas et al. , 2006 ) for a set of genes with a 5′ σ70 peak and that were greater than 1 kb in length ( to ensure the peak and mid-gene signals were well separated ; Figure 6A ) . 
+ We then tested whether a NusA peak , ρ peak , or both occurred within 300 bp of the RNAP peak and binned the results based on TR ( Figure 6B ) . 
+ If the RNAP peaks reﬂect RNAPs poised prior to promoter escape , then the fraction of RNAP peaks with NusA or ρ copeaks should decrease at low TR ( because a low TR would indicate promoter-bound RNAP that should not recruit NusA or ρ , in contrast to ECs that can bind both ) . 
+ Instead , we observed little change in the frequency of NusA and ρ copeaks at low TR . 
+ We also binned the frequency of NusA and ρ copeaks based on gene expression level ( Allen et al. , 2003 ) , to ask if a block to promoter escape correlates with low expression ( as suggested previously by Reppas et al. , 2006 ; Figure 6C ) . 
+ No correlation was evident . 
+ Further , the frequency of copeaks correlated with RNAP peak height ( Figure 6D ) , suggesting that the failure to detect NusA or ρ copeaks for a fraction of RNAP peaks ( 25 % ) is mostly explained by false negatives in the peak-calling algorithm , since the signal-to-noise ratio for RNAP is better than that for NusA or ρ . 
+ Taken together , these results suggest that promoter-proximal RNAP peaks reﬂect RNAPs that have escaped promoters , at which point signals for NusA and ρ become detectable . 
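The copeak analysis above (a traveling ratio per gene, a test for a regulator peak within 300 bp of the RNAP peak, and binning by TR) can be sketched as follows. This is only an illustration under stated assumptions: the function names are hypothetical, and the ratio is taken here as mid-gene signal over peak signal, an orientation assumed so that a low TR corresponds to RNAP concentrated near the promoter, as in the low-TR reasoning above. Swap the arguments if the opposite orientation is intended.

```python
from bisect import bisect_right


def traveling_ratio(mid_gene_signal, peak_signal):
    """Ratio of mid-gene RNAP signal to promoter-proximal peak signal (assumed orientation)."""
    if peak_signal <= 0:
        raise ValueError("peak_signal must be positive")
    return mid_gene_signal / peak_signal


def has_copeak(rnap_peak_pos, regulator_peak_positions, max_distance=300):
    """True if any regulator peak lies within `max_distance` bp of the RNAP peak."""
    return any(abs(p - rnap_peak_pos) <= max_distance for p in regulator_peak_positions)


def copeak_fraction_by_bin(records, edges):
    """Fraction of RNAP peaks with a copeak in each TR bin.

    records: iterable of (tr, copeak_flag) pairs.
    edges:   ascending bin edges; bins are half-open [edges[i], edges[i+1]).
    Values outside the edges are ignored; empty bins yield None.
    """
    hits = [0] * (len(edges) - 1)
    totals = [0] * (len(edges) - 1)
    for tr, copeak in records:
        idx = bisect_right(edges, tr) - 1
        if 0 <= idx < len(totals):
            totals[idx] += 1
            hits[idx] += bool(copeak)
    return [h / t if t else None for h, t in zip(hits, totals)]
```

If RNAP peaks were promoter-bound complexes, the fraction returned for the lowest TR bin would be expected to drop relative to the higher bins.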
+ To verify that RNAP peaks reﬂected premature termination rather than a block to promoter escape , we used quantitative RT-PCR to test representative sets of TUs that exhibited or lacked RNAP peaks ( Figures 4B and 4C ) for a drop in RNA transcript levels . 
+ This is an imperfect test because RNAs generated by premature termination are more difﬁcult than long mRNAs to quantify accurately and also may be unstable . 
+ Nonetheless , six of eight TUs exhibiting RNAP peaks produced signiﬁcantly more RNA near the 5′ end versus 0 of 4 for TUs lacking RNAP peaks ( Figure S5 ; p < 0.005 ; Student 's t test ) . 
+ Thus , most RNAP peaks are associated with premature transcription termination . 
+ Reppas et al. ( 2006 ) raised the possibility that RNAP peaks might instead correspond to RNAPs poised prior to promoter escape in part because they found 300 σ70 peaks not associated with detectable mRNAs . 
+ Thus , we asked if these σ70 peaks exhibited NusA or ρ copeaks . 
+ Of the 300 peaks , 20 correspond to highly expressed stable RNA genes ; 138 of the remainder were associated with an RNAP peak ( Table S6 ) . 
+ Of these 138 , 74 were within 300 bp of σ70 and RNAP peaks in our data . 
+ Of these 74 , 45 ( 61 % ) were associated with a NusA peak ; 49 ( 66 % ) were associated with a ρ peak ; 33 ( 46 % ) were associated with both ; and 13 ( 18 % ) were associated with neither ( Figure S6 ) . 
+ As noted above , some NusA and ρ copeaks for small RNAP peaks were probably missed . 
+ Nonetheless , a few RNAP peaks likely represent promoter-bound enzyme : of three examples speciﬁcally cited by Reppas et al. ( 2006 ) , one ( hepA ) was associated with NusA and ρ , but two ( deoB and yjiT ) were associated with neither ( data not shown ) . 
+ The ﬁnding that promoter-proximal RNAP peaks correspond to RNAPs blocked early in elongation raised the possibility they result from transcriptional attenuation . 
+ Indeed , the Occapp proﬁles of genes regulated by attenuation resembled the aggreproximal peaks , could cause the RNAP peaks by ρ-dependent attenuation before a ribosome can bind and initiate translation , we examined the effect of the well-characterized ρ inhibitor , bicyclomycin ( Supplemental Experimental Procedures ) . 
+ If the RNAP peaks were caused by ρ-dependent attenuation , they should be reduced when cells are treated with bicyclomycin . 
+ genes that exhibit low TRs ( Cardinale et al. , 2008 ; Figure S8 ) . 
+ Thus , ρ-dependent attenuation does not appear to be the principal cause of promoter-proximal RNAP peaks . 
+ DISCUSSION
+ Our ChIP-chip study of the distributions of RNAP , σ70 , NusA , NusG , and ρ on E. coli TUs reveals the patterns of trafﬁcking for regulators most central to control of transcript elongation in bacteria and has important implications for understanding the mechanisms underlying these patterns . 
+ σ70 , NusA , NusG , and ρ are distributed relatively uniformly among most transcribing RNAP molecules with apparent relative afﬁnities for elongating RNAP of NusA ≈ NusG > ρ > σ70 . 
+ As RNAP moves away from a promoter , crosslinking of σ70 greatly decreases . 
+ ρ and NusA appear to associate with RNAP as σ70 association decreases , with ρ slightly preceding NusA , whereas NusG associates with elongating RNAP more slowly . 
+ As previously reported ( Reppas et al. , 2006 ) , RNAP exhibits strong promoter-proximal peaks on many , but not all TUs . 
+ We ﬁnd that these peaks correspond to ECs and that they do not result from ρ-dependent attenuation . 
+ NusA, NusG, and ρ Exhibit Different Patterns of EC Association, but No TU-Speciﬁc Specialization
+ Our ﬁnding that NusA , NusG , and ρ are , to a ﬁrst approximation , uniformly associated with ECs on most TUs suggests they act as general modulators of transcript elongation with about equal probability of altering responses of RNAP to intrinsic pause , arrest , or termination sites , regardless of where these sites occur in the genome . 
+ Due to the limited resolution of ChIP-chip , this does n't preclude speciﬁc associations of regulators at intrinsic sites that affect only a minority of elongating RNAP molecules or at which events occur rapidly relative to movement of RNAP over the surrounding DNA sequences . 
+ The results do rule out the possibilities that NusA , NusG , or ρ associate with certain TUs or certain sites within TUs to the exclusion of other TUs or locations . 
+ Nonetheless , each regulator associates with ECs as they move away from promoters in a distinct , regulator-speciﬁc pattern that is similar on most TUs ( Figure 7 ) . 
+ NusA exhibits negligible signal at promoters and associates with RNAP as σ70 association is lost , closely paralleling RNAP levels once RNAP moves away from a promoter ( Figures 2 -- 4 ) . 
+ NusA 's highest afﬁnity contacts occur between the NusA CTD and the α-subunit CTD ; additional contacts are made by NusA 's KH and S1 domains to the nascent RNA and by the NusA NTD to a second site on RNAP , which may include the β-subunit ﬂap tip ( Liu et al. , 1996 ; Mah et al. , 2000 ; Toulokhonov et al. , 2001 ) . 
+ At promoters , the α CTD binds to upstream DNA , either sequence speciﬁcally at UP elements or nonspeciﬁcally in association with σ70 ( Estrem et al. , 1999 ) , and σ70 region 4 occupies the ﬂap tip until nascent RNA reaches 16 -- 17 nt in length ( Murakami et al. , 2002 ; Nickels et al. , 2006 ) . 
+ Thus , NusA contacts are either not possible ( to nascent RNA ) or masked by DNA or σ70 until RNAP moves away from the promoter , at which point the association of NusA with the α CTD and nascent transcript likely tether NusA to the EC via interactions that are largely independent of EC position in a TU ( Figure 7 ) . 
+ Like NusA , NusG exhibits negligible signal at promoters , but unlike NusA , it appears to associate with RNAP in two phases . 
+ In the ﬁrst phase , evident in aggregate Occapp proﬁles ( Figure 4 ) , NusG increases association with RNAP rapidly to 1 kb downstream from promoters . 
+ This ﬁrst phase is distinct signal does not mirror the promoter-proximal RNAP peaks ( Figure 4C ) . 
+ In the second phase , NusG Occapp increases more slowly , resulting in the increased NusG/RNAP ratios for genes farther from promoters ( Figures 5D and 5E ) . 
+ One explanation for the delayed association pattern of NusG could be competition with σ70 for its binding location on RNAP . 
+ NusG is suggested to bind RNAP via contacts to the clamp helices ( Belogurov et al. , 2007 ) , which also make the tightest RNAP contact to σ70 ( via σ70 region 2 ; Arthur and Burgess , 1998 ; Young et al. , 2001 ) . 
+ Although σ70 region 4 dissociates from the ﬂap tip when 16 -- 17 nt of RNA are synthesized , the σ70 region 2-clamp helices interaction can persist in the EC without steric conﬂict ( Mooney et al. , 2005 ) . 
+ In this case , slow NusG association could reﬂect delayed dissociation of σ70 region 2 . 
+ This would mean that σ70 dissociates from RNAP more slowly than reported by the ChIP-chip assay , which instead shows a sharp fall-off in σ70 crosslinking immediately downstream from promoters ( Figure 4 ; Raffaelle et al. , 2005 ; Reppas et al. , 2006 ; Wade and Struhl , 2004 ; see below ) . 
+ Alternatively , σ70 may release rapidly and NusG binding could require long RNA transcripts , since it has been suggested that NusG contains an RNA-binding activity ( Steiner et al. , 2002 ) . 
+ ρ associates with TUs closer to promoters than either NusA or NusG and then appears to decrease somewhat in TU association farther from promoters , with an approximately uniform association relative to RNAP signal ( Figures 3 and 4 ) . 
+ The location of the promoter-proximal ρ peak is consistent with the requirement of 80 -- 100 nt for ρ effects on ECs ( Lau and Roberts , 1985 ) . 
+ Thus , ρ appears to bind as soon as the requisite nascent transcript becomes available but perhaps fails to terminate transcription because NusG is not yet associated with RNAP . 
+ This early binding could position ρ to detect and subsequently terminate synthesis of the occasional mRNA on which translation fails . 
+ The strong ρ ChIP signal may be reduced once ribosomes load 
+ Our analysis of σ70 conﬁrmed prior reports that the great majority of σ70 ChIP signal is lost as RNAP escapes the promoter ( Raffaelle et al. , 2005 ; Reppas et al. , 2006 ; Wade and Struhl , 2004 ) , but asymmetry of the σ70 aggregate Occapp peak suggests the signal is lost on average 20 bp into TUs ( Figure S3 ) . 
+ However , a low σ70 ChIP signal was present and was correlated with RNAP signal at the middle of TUs ( Figure 3B ) . 
+ This likely reﬂects σ70-EC interaction , although we can not exclude other possibilities ( e.g. , that transcription increases nonspeciﬁc binding of σ70-containing holoenzyme to DNA , for instance by removing nucleoid proteins from DNA ) . 
+ In any case , it is difﬁcult to assess the extent of the interaction from the low σ70 ChIP signal because it may reﬂect far less efﬁcient σ70 crosslinking to DNA than for promoter complexes ( e.g. , indirect σ70-RNAP and RNAP-DNA crosslinking in ECs rather than direct σ70-promoter DNA crosslinking ) . 
+ Our results are consistent with the view that σ70 breaks DNA contact when RNAP escapes a promoter after which σ70 's weakened contacts to RNAP allow its stochastic release ( Mooney et al. , 2005 ; Raffaelle et al. , 2005 ; Shimamoto et al. , 1986 ) but still support at least a weak equilibrium association with ECs and σ70 rebinding at promoter-like sequences encountered during 
+ The Mechanistic Basis of Promoter-Proximal RNAP Peaks
+ In principle , promoter-proximal RNAP peaks could reﬂect one of at least three mechanistically distinct types of blocks to transcription . 
+ RNAP could be trapped ( 1 ) prior to promoter escape ( e.g. , before strand opening or in abortive initiation ) ; ( 2 ) early in elongation in a paused ( or poised ) state from which it can be released to productive elongation ; or ( 3 ) by premature and presumably regulated transcription termination ( transcriptional attenuation ) . 
+ Promoter-proximal RNAP peaks are common for human and Drosophila genes where they appear to be correlated with developmentally regulated rather than with housekeeping genes ( ENCODE Project Consortium , 2004 ; Guenther et al. , 2007 ; Muse et al. , 2007 ; Zeitlinger et al. , 2007 ) . 
+ These peaks have been attributed to promoter-proximal pausing based on several criteria ( Core and Lis , 2008 ; Muse et al. , 2007 ; Zeitlinger et al. , 2007 ) . 
+ In S. cerevisiae , promoter-proximal peaks occur only in stationary phase and by unknown mechanism ( Wade and Struhl , 2008 ) . 
+ All three types of mechanisms are well characterized in E. coli : promoter trapping ( Laishram and Gowrishankar , 2007 ; Rosenthal et al. , 2008 ) , promoter-proximal pausing ( Marr and Roberts , 2000 ; Hatoum and Roberts , 2008 ) , and attenuation ( Merino and Yanofsky , 2005 ) . 
+ Our ﬁndings establish that most promoter-proximal E. coli RNAP peaks correspond to ECs . 
+ First , the promoter-proximal RNAP peaks were offset in the direction of transcription by 150 bp ( Figure 4 ) . 
+ The transition from abortive to productive elongation , marked by release of σ70 from promoter contacts ( or from RNAP contacts ) , occurs within the ﬁrst 20 nt of transcript elongation ( Chander et al. , 2007 ; Revyakin et al. , 2006 ) . 
+ Known cases of σ70-stimulated pausing in vivo occur no later than +25 ( Ring et al. , 1996 ) . 
+ Thus , the location of RNAP peaks at +150 is inconsistent with a block prior to promoter escape and EC formation . 
+ Second , NusA , which is thought to bind to ECs after release of σ70 , and ρ , which requires > 50 nt of RNA to bind , both appeared to be associated with RNAP in the promoter-proximal peaks . 
+ Assuming that ChIP-chip captures a close-to-instantaneous snapshot of RNAP positions on DNA , we suggest that the promoter-proximal RNAP peaks reﬂect transcriptional attenuation caused by a mechanism other than ρ-dependent termination , rather than RNAP poised at promoters ( Wade and Struhl , 2008 ) . 
+ The position of these RNAP peaks is consistent with the typical position of transcription attenuators ( Merino and Yanofsky , 2005 ) and strongly resembles ChIP-chip proﬁles of RNAP on TUs known to be subject to transcriptional attenuation ( e.g. , trp and pyrBI ; Figure S7 ) . 
+ Promoter-proximal peaks in eukaryotes have been ascribed to paused ECs ( Core and Lis , 2008 ; Muse et al. , 2007 ; Zeitlinger et al. , 2007 ) . 
+ Although long elusive , transcription attenuation is now clearly shown to occur in eukaryotes ( Steinmetz et al. , 2006 ) . 
+ Conclusive evidence that promoter-proximal halted RNAPs are actually paused rather than on a termination pathway exists only for a limited number of cases ( e.g. , Drosophila heat shock genes and bacteriophage λ PR′ ; Adelman et al. , 2005 ; Marr and Roberts , 2000 ) . 
+ The regulation of early elongation by attenuation may prove to be more 
+ For additional information, see the Supplemental Experimental Procedures.
+ Materials
+ E. coli K12 strains MG1655 and MG1655 HA3 : : nusG were used for all experiments . 
+ MG1655 HA3 : : nusG was constructed by gene replacement without selection to give a strain isogenic to MG1655 encoding three copies of the hemagglutinin ( HA ) epitope tag at the 5′ end of nusG . 
+ Monoclonal antibodies against σ70 ( 2G10 ) , RNAP ( anti-β′ , NT73 or anti-β , NT63 ) , and NusA ( 1NA1 ) were purchased from Neoclone ( Madison , WI ) . 
+ The monoclonal 12CA5 anti-HA antibody ( to target HA3 : : nusG ) was purchased from Roche . 
+ The polyclonal antibody against NusG was generated by Proteintech ( Chicago ) , and polyclonal antibody against ρ was a kind gift from Jeff Roberts ( Cornell University ) . 
+ After labeling , ChIP samples were hybridized to a custom microarray from Nimblegen ( Madison , WI ) that contains two copies of 187,204 Tm-matched ≥45-mer oligonucleotides that tile the E. coli chromosome with an average spacing of 24.5 bp . 
+ Cell Growth and ChIP-chip
+ Cells were grown in deﬁned minimal medium ( with 0.2 % glucose ) with vigorous shaking at 37 °C to mid-log ( light scattering at 600 nm equivalent to 0.4 OD ) . 
+ Formaldehyde was added to 1 % ﬁnal , and shaking was continued for 5 min before quenching with glycine . 
+ Cells were harvested , washed with PBS , and stored at -80 °C . 
+ Cells were sonicated and digested with micrococcal nuclease and RNase A before immunoprecipitation . 
+ The ChIP DNA sample was ampliﬁed by ligation-mediated PCR ( Lee et al. , 2006 ) to yield > 4 μg of DNA , pooled with two other independent samples , and sent to Nimblegen , where samples were labeled with Cy3 and Cy5 ﬂuorescent dyes ( one for the ChIP sample and one for a control input sample ) and hybridized to a single microarray as a two-color experiment . 
+ ACCESSION NUMBERS
+ Raw microarray data have been deposited in GEO under the accession number GSE13938 . 
+ The Supplemental Data include Supplemental Experimental Procedures and seven ﬁgures and can be found with this article online at http://www.cell.com/molecular-cell/supplemental/S1097-2765(08)00891-5 . 
+ ACKNOWLEDGMENTS
+ We thank C. Herring for construction of the HA3 : : nusG allele , K. Struhl for helpful discussions and sharing results prior to publication , and J. Grass for assistance with a control experiment . 
+ This work was supported by grants to A.Z.A. ( USDA Hatch ) and R.L. ( NIH GM38660 ) . 
+ Mol. Cell 26, 117–129.
+ Burns , C.M. , Richardson , L.V. , and Richardson , J.P. ( 1998 ) . 
+ Combinatorial effects of NusA and NusG on transcription elongation and Rho-dependent termination in Escherichia coli . 
+ J. Mol . 
+ Biol . 
+ 278 , 307 -- 316 . 
+ Cardinale , C.J. , Washburn , R.S. , Tadigotla , V.R. , Brown , L.M. , Gottesman , M.E. , and Nudler , E. ( 2008 ) . 
+ Termination factor Rho and its cofactors NusA and NusG silence foreign DNA in E. coli . 
+ Science 320 , 935 -- 938 . 
+ Chander , M. , Austin , K.M. , Aye-Han , N.N. , Sircar , P. , and Hsu , L.M. ( 2007 ) . 
+ An alternate mechanism of abortive release marked by the formation of very long abortive transcripts . 
+ Biochemistry 46 , 12687 -- 12699 . 
+ Core , L.J. , and Lis , J.T. ( 2008 ) . 
+ Transcription regulation through promoter-proximal pausing of RNA polymerase II . 
+ Science 319 , 1791 -- 1792 . 
+ Defez , R. , and De Felice , M. ( 1981 ) . 
+ Cryptic operon for beta-glucoside metabolism in Escherichia coli K12 : genetic evidence for a regulatory protein . 
+ Genetics 97 , 11 -- 25 . 
+ deHaseth , P.L. , Lohman , T.M. , Burgess , R.R. , and Record , M.T. , Jr. ( 1978 ) . 
+ Nonspeciﬁc interactions of Escherichia coli RNA polymerase with native and denatured DNA : differences in the binding behavior of core and holoenzyme . 
+ Biochemistry 17 , 1612 -- 1622 . 
+ ENCODE Project Consortium . 
+ ( 2004 ) . 
+ The ENCODE ( ENCyclopedia Of DNA 
+ Estrem , S.T. , Ross , W. , Gaal , T. , Chen , Z.W. , Niu , W. , Ebright , R.H. , and Gourse , R.L. ( 1999 ) . 
+ Bacterial promoter architecture : subsite structure of UP elements and interactions with the carboxy-terminal domain of the RNA polymerase alpha subunit . 
+ Genes Dev . 
+ 13 , 2134 -- 2147 . 
+ Farnham , P.J. , Greenblatt , J. , and Platt , T. ( 1982 ) . 
+ Effects of NusA protein on transcription termination of the tryptophan operon of Escherichia coli . 
+ Cell 29 , 945 -- 951 . 
+ Grainger , D.C. , Hurd , D. , Harrison , M. , Holdstock , J. , and Busby , S.J. ( 2005 ) . 
+ Studies of the distribution of Escherichia coli cAMP-receptor protein and RNA polymerase along the E. coli chromosome . 
+ Proc . 
+ Natl. Acad . 
+ Sci . 
+ USA 102 , 17693 -- 17698 . 
+ Greenblatt , J. , and Li , J. ( 1981 ) . 
+ Interaction of the sigma factor and the nusA gene protein of E. coli with RNA polymerase in the initiation-termination cycle of transcription . 
+ Cell 24 , 421 -- 428 . 
+ Greenblatt , J. , McLimont , M. , and Hanly , S. ( 1981 ) . 
+ Termination of transcription by nusA gene protein of Escherichia coli . 
+ Nature 292 , 215 -- 220 . 
+ Grigorova , I.L. , Phleger , N.J. , Mutalik , V.K. , and Gross , C.A. ( 2006 ) . 
+ Insights into transcriptional regulation and sigma competition from an equilibrium model of RNA polymerase binding to DNA . 
+ Proc . 
+ Natl. Acad . 
+ Sci . 
+ USA 103 , 5332 -- 5337 . 
+ Guenther , M.G. , Levine , S.S. , Boyer , L.A. , Jaenisch , R. , and Young , R.A. ( 2007 ) . 
+ A chromatin landmark and transcription initiation at most promoters in human cells . 
+ Cell 130 , 77 -- 88 . 
+ Hatoum , A. , and Roberts , J. ( 2008 ) . 
+ Prevalence of RNA polymerase stalling at Escherichia coli promoters after open complex formation . 
+ Mol . 
+ Microbiol . 
+ 68 , 17 -- 28 . 
+ Herring , C.D. , Raffaelle , M. , Allen , T.E. , Kanin , E.I. , Landick , R. , Ansari , A.Z. , and Palsson , B.O. ( 2005 ) . 
+ Immobilization of Escherichia coli RNA polymerase and location of binding sites by use of chromatin immunoprecipitation and 
+ Kapanidis , A.N. , Margeat , E. , Laurence , T.A. , Doose , S. , Ho , S.O. , Mukhopadhyay , J. , Kortkhonjia , E. , Mekler , V. , Ebright , R.H. , and Weiss , S. ( 2005 ) . 
+ Retention of transcription initiation factor sigma70 in transcription elongation : 
+ Kassavetis , G.A. , and Chamberlin , M.J. ( 1981 ) . 
+ Pausing and termination of transcription within the early region of bacteriophage T7 DNA in vitro . 
+ J. Biol . 
+ Chem . 
+ 256 , 2777 -- 2786 . 
+ Kuo , M.H. , and Allis , C.D. ( 1999 ) . 
+ In vivo cross-linking and immunoprecipitation for studying dynamic Protein : DNA associations in a chromatin environment . 
+ Methods 19 , 425 -- 433 . 
+ Laishram , R.S. , and Gowrishankar , J. ( 2007 ) . 
+ Environmental regulation operating at the promoter clearance step of bacterial transcription . 
+ Genes Dev . 
+ 21 , 1258 -- 1272 . 
+ Lau , L.F. , and Roberts , J.W. ( 1985 ) . 
+ Rho-dependent transcription termination at lambda R1 requires upstream sequences . 
+ J. Biol . 
+ Chem . 
+ 260 , 574 -- 584 . 
+ Lee , T.I. , Johnstone , S.E. , and Young , R.A. ( 2006 ) . 
+ Chromatin immunoprecipitation and microarray-based analysis of protein location . 
+ Nat . 
+ Protocols 1 , 729 -- 748 . 
+ Li , J. , Horwitz , R. , McCracken , S. , and Greenblatt , J. ( 1992 ) . 
+ NusG , a new Escherichia coli elongation factor involved in transcriptional antitermination by the N protein of phage lambda . 
+ J. Biol . 
+ Chem . 
+ 267 , 6012 -- 6019 . 
+ Li , J. , Mason , S.W. , and Greenblatt , J. ( 1993 ) . 
+ Elongation factor NusG interacts with termination factor rho to regulate termination and antitermination of transcription . 
+ Genes Dev . 
+ 7 , 161 -- 172 . 
+ Linn , T. , and Greenblatt , J. ( 1992 ) . 
+ The NusA and NusG proteins of Escherichia coli increase the in vitro readthrough frequency of a transcriptional attenuator preceding the gene for the beta subunit of RNA polymerase . 
+ J. Biol . 
+ Chem . 
+ 267 , 1449 -- 1454 . 
+ Liu , K. , Zhang , Y. , Severinov , K. , Das , A. , and Hanna , M.M. ( 1996 ) . 
+ Role of Escherichia coli RNA polymerase alpha subunit in modulation of pausing , termination and anti-termination by the transcription elongation factor NusA . 
+ Mah , T.F. , Kuznedelov , K. , Mushegian , A. , Severinov , K. , and Greenblatt , J. ( 2000 ) . 
+ The alpha subunit of E. coli RNA polymerase activates RNA binding by NusA . 
+ Genes Dev . 
+ 14 , 2664 -- 2675 . 
+ Marr , M.T. , and Roberts , J.W. ( 2000 ) . 
+ Function of transcription cleavage factors GreA and GreB at a regulatory pause site . 
+ Mol . 
+ Cell 6 , 1275 -- 1285 . 
+ Mason , S.W. , Li , J. , and Greenblatt , J. ( 1992 ) . 
+ Host factor requirements for processive antitermination of transcription and suppression of pausing by the N protein of bacteriophage lambda . 
+ J. Biol . 
+ Chem . 
+ 267 , 19418 -- 19426 . 
+ Matsumoto , Y. , Shigesada , K. , Hirano , M. , and Imai , M. ( 1986 ) . 
+ Autogenous regulation of the gene for transcription termination factor rho in Escherichia coli : localization and function of its attenuators . 
+ J. Bacteriol . 
+ 166 , 945 -- 958 . 
+ Merino , E. , and Yanofsky , C. ( 2005 ) . 
+ Transcription attenuation : a highly conserved regulatory strategy used by bacteria . 
+ Trends Genet . 
+ 21 , 260 -- 264 . 
+ Mooney , R.A. , and Landick , R. ( 2003 ) . 
+ Tethering sigma70 to RNA polymerase reveals high in vivo activity of sigma factors and sigma70-dependent pausing at promoter-distal locations . 
+ Genes Dev . 
+ 17 , 2839 -- 2851 . 
+ Mooney , R.A. , Darst , S.A. , and Landick , R. ( 2005 ) . 
+ Sigma and RNA polymerase : an on-again , off-again relationship ? 
+ Mol . 
+ Cell 20 , 335 -- 345 . 
+ during transcription : ﬂuorescence resonance energy transfer assay for movement relative to DNA . 
+ Cell 106 , 453 -- 463 . 
+ Murakami , K.S. , Masuda , S. , Campbell , E.A. , Muzzin , O. , and Darst , S. ( 2002 ) . 
+ Structural basis of transcription Initiation : an RNA polymerase holoenzyme / DNA complex . 
+ Science 296 , 1285 -- 1290 . 
+ Muse , G.W. , Gilchrist , D.A. , Nechaev , S. , Shah , R. , Parker , J.S. , Grissom , S.F. , Zeitlinger , J. , and Adelman , K. ( 2007 ) . 
+ RNA polymerase is poised for activation across the genome . 
+ Nat . 
+ Genet . 
+ 39 , 1507 -- 1511 . 
+ Nickels , B.E. , Roberts , C.W. , Roberts , J.W. , and Hochschild , A. ( 2006 ) . 
+ RNA-mediated destabilization of the sigma ( 70 ) region 4/beta ﬂap interaction facilitates engagement of RNA polymerase by the Q antiterminator . 
+ Mol . 
+ Cell 24 , 457 -- 468 . 
+ Raffaelle , M. , Kanin , E.I. , Vogt , J. , Burgess , R.R. , and Ansari , A.Z. ( 2005 ) . 
+ Holo-enzyme switching and stochastic release of sigma factors from RNA polymerase in vivo . 
+ Mol . 
+ Cell 20 , 357 -- 366 . 
+ Reppas , N.B. , Wade , J.T. , Church , G.M. , and Struhl , K. ( 2006 ) . 
+ The Transition between transcriptional initiation and elongation in E. coli is highly variable and often rate limiting . 
+ Mol . 
+ Cell 24 , 747 -- 757 . 
+ Revyakin , A. , Liu , C. , Ebright , R.H. , and Strick , T.R. ( 2006 ) . 
+ Abortive initiation and productive initiation by RNA polymerase involve DNA scrunching . 
+ Science 314 , 1139 -- 1143 . 
+ Richardson , J. ( 2002 ) . 
+ Rho-dependent termination and ATPases in transcript termination . 
+ Biochim . 
+ Biophys . 
+ Acta 1577 , 251 -- 260 . 
+ Ring , B. , Yarnell , W. , and Roberts , J. ( 1996 ) . 
+ Function of E. coli RNA polymerase sigma factor sigma70 in promoter-proximal pausing . 
+ Cell 86 , 485 -- 493 . 
+ Rosenthal , A.Z. , Kim , Y. , and Gralla , J.D. ( 2008 ) . 
+ Poising of Escherichia coli RNA polymerase and its release from the sigma 38 C-terminal tail for osmY transcription . 
+ J. Mol . 
+ Biol . 
+ 376 , 938 -- 949 . 
+ Selinger , D.W. , Cheung , K.J. , Mei , R. , Johansson , E.M. , Richmond , C.S. , Blatt-using a 30 base pair resolution Escherichia coli genome array . 
+ Nat . 
+ Biotechnol . 
+ 18 , 1262 -- 1268 . 
+ Shankar , S. , Hatoum , A. , and Roberts , J.W. ( 2007 ) . 
+ A transcription antiterminator constructs a NusA-dependent shield to the emerging transcript . 
+ Mol . 
+ Cell 27 , 914 -- 927 . 
+ Shimamoto , N. , Kamigochi , T. , and Utiyama , H. ( 1986 ) . 
+ Release of the sigma subunit of Escherichia coli DNA-dependent RNA polymerase depends mainly on time elapsed after the start of initiation , not on length of product RNA . 
+ J. Biol . 
+ Chem . 
+ 261 , 11859 -- 11865 . 
+ Solomon , M.J. , Larsen , P.L. , and Varshavsky , A. ( 1988 ) . 
+ Mapping protein-DNA interactions in vivo with formaldehyde : evidence that histone H4 is retained on a highly transcribed gene . 
+ Cell 53 , 937 -- 947 . 
+ Steiner , T. , Kaiser , J.T. , Marinkovic , S. , Huber , R. , and Wahl , M.C. ( 2002 ) . 
+ Crystal structures of transcription factor NusG in light of its nucleic acid - and protein-binding activities . 
+ EMBO J. 21 , 4641 -- 4653 . 
+ Steinmetz , E.J. , Warren , C.L. , Kuehner , J.N. , Panbehi , B. , Ansari , A.Z. , and Brow , D.A. ( 2006 ) . 
+ Genome-wide distribution of yeast RNA polymerase II and its control by Sen1 helicase . 
+ Mol . 
+ Cell 24 , 735 -- 746 . 
+ Torres , M. , Condon , C. , Balada , J.M. , Squires , C. , and Squires , C.L. ( 2001 ) . 
+ Ribosomal protein S4 is a transcription factor with properties remarkably similar to NusA , a protein involved in both non-ribosomal and ribosomal RNA antitermination . 
+ EMBO J. 20 , 3811 -- 3820 . 
+ Toulokhonov , I. , Artsimovitch , I. , and Landick , R. ( 2001 ) . 
+ Allosteric control of RNA polymerase by a site that contacts nascent RNA hairpins . 
+ Science 292 , 730 -- 733 . 
+ von Hippel , P.H. , Revzin , A. , Gross , C.A. , and Wang , A.C. ( 1974 ) . 
+ Non-speciﬁc DNA binding of genome regulating proteins as a biological control mechanism : I . 
+ The lac operon : equilibrium aspects . 
+ Proc . 
+ Natl. Acad . 
+ Sci . 
+ USA 71 , 4808 -- 4812 . 
+ Wade , J.T. , and Struhl , K. ( 2004 ) . 
+ Association of RNA polymerase with transcribed regions in Escherichia coli . 
+ Proc . 
+ Natl. Acad . 
+ Sci . 
+ USA 101 , 17777 -- 17782 . 
+ Wade , J.T. , and Struhl , K. ( 2008 ) . 
+ The transition from transcriptional initiation to elongation . 
+ Curr . 
+ Opin . 
+ Genet . 
+ Dev . 
+ 18 , 130 -- 136 . 
+ Wade , J.T. , Struhl , K. , Busby , S.J. , and Grainger , D.C. ( 2007 ) . 
+ Genomic analysis of protein-DNA interactions in bacteria : insights into transcription and chromosome organization . 
+ Mol . 
+ Microbiol . 
+ 65 , 21 -- 26 . 
+ Yakhnin , A.V. , and Babitzke , P. ( 2002 ) . 
+ NusA-stimulated RNA polymerase pausing and termination participates in the Bacillus subtilis trp operon attenuation mechanism in vitro . 
+ Proc . 
+ Natl. Acad . 
+ Sci . 
+ USA 99 , 11067 -- 11072 . 
+ Young , B.A. , Anthony , L.C. , Gruber , T.M. , Arthur , T.M. , Heyduk , E. , Lu , C.Z. , Sharp , M.M. , Heyduk , T. , Burgess , R.R. , and Gross , C.A. ( 2001 ) . 
+ A coiled-coil from the RNA polymerase beta' subunit allosterically induces selective nontemplate strand binding by sigma ( 70 ) . 
+ Cell 105 , 935 -- 944 . 
+ Zeitlinger , J. , Stark , A. , Kellis , M. , Hong , J.W. , Nechaev , S. , Adelman , K. , Levine , M. , and Young , R.A. ( 2007 ) . 
+ RNA polymerase stalling at developmental control genes in the Drosophila melanogaster embryo . 
+ Nat . 
+ Genet . 
+ 39 ,
\ No newline at end of file
--- a/data/TEXT_FILES/useful_txt/19647521.txt 0 → 100644
View file @27818a9
+++ b/data/TEXT_FILES/useful_txt/19647521.txt 0 → 100644
View file @27818a9
+ Protein Occupancy Landscape of a Bacterial Genome
+ 2The Lewis-Sigler Institute for Integrative Genomics Princeton University , Princeton , NJ 08544 , USA 3Present address : School of Sciences and Engineering , The American University in Cairo , 11835 New Cairo , Egypt * Correspondence : tavazoie@genomics.princeton.edu DOI 10.1016 / j.molcel .2009.06.035 
+ SUMMARY
+ Protein-DNA interactions are fundamental to core biological processes , including transcription , DNA replication , and chromosomal organization . 
+ We have developed in vivo protein occupancy display ( IPOD ) , a technology that reveals protein occupancy across an entire bacterial chromosome at the resolution of individual binding sites . 
+ Application to Escherichia coli reveals thousands of protein occupancy peaks , highly enriched within and in close proximity to noncoding regulatory regions . 
+ In addition , we discovered extensive ( > 1 kilobase ) protein occupancy domains ( EPODs ) , some of which are localized to highly expressed genes , enriched in RNA polymerase occupancy . 
+ However , the majority are localized to transcriptionally silent loci dominated by conserved hypothetical ORFs . 
+ These regions are highly enriched in both predicted and experimentally determined binding sites of nucleoid proteins and exhibit extreme biophysical characteristics such as high intrinsic curvature . 
+ Our observations implicate these transcriptionally silent EPODs as the elusive organizing centers , long proposed to topologically isolate chromosomal domains . 
+ INTRODUCTION
+ Replication , maintenance , and expression of genetic information are processes that are orchestrated through precise interactions of hundreds of proteins with chromosomal DNA . 
+ For decades , research has focused on the behavior and functional consequences of DNA-protein interactions at individual loci . 
+ However , understanding systems-level behaviors , such as chromosomal organization , genome replication , and transcriptional network dynamics , requires observations at the scale of the entire system . 
+ Microarray-based chromatin immunoprecipitation ( ChIP-chip ) allows global measurements of chromosomal occupancy for individual proteins ( Ren et al. , 2000 ) . 
+ In another global approach , methylase protection , a fraction of all occupied sites are monitored in vivo , independently of the identity of the bound proteins ( Tavazoie and Church , 1998 ) . 
+ However , there currently the genome . 
+ We have developed such a technology and used it to proﬁle protein occupancy of the E. coli chromosome at the resolution of individual binding sites . 
+ RESULTS
+ In Vivo Protein Occupancy Display
+ In order to globally proﬁle the occupancy of all proteins on chromosomal DNA , we ﬁrst stabilize in vivo protein-DNA interactions through covalent crosslinking with formaldehyde ( Figure 1A ) . 
+ After cell lysis and sonication , protein footprints are minimized to a mode of 50 bp through DNase I digestion ( Figure 1B ) . 
+ Phenol extraction is then used to trap amphipathic protein-DNA complexes at the interface between the organic and aqueous phases . 
+ Following interface isolation and crosslink reversal , short DNA fragments are end labeled and hybridized to a high-density tiling array containing 25-mer oligonucleotides at the resolution of one every four base pairs across the entire genome . 
+ After scanning and data normalization , a high-resolution global protein occupancy profile is achieved . 
+ For each probe on the chip , protein occupancy enrichment or depletion levels are quantiﬁed using a z-score that represents the probe-by-probe relative signal intensity with respect to the mean , and normalized to the standard deviation , of signals from replicate hybridizations of whole genomic DNA ( Experimental Procedures ) . 
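The per-probe z-score described above can be illustrated with a minimal sketch. It assumes the sample and reference signals are already normalized and aligned probe by probe; the function name `probe_z_scores` and the handling of zero-variance probes are hypothetical choices, not taken from the paper.

```python
from statistics import mean, stdev


def probe_z_scores(sample, reference_replicates):
    """Return one z-score per probe.

    sample: normalized signal intensities of the occupancy sample, one per probe.
    reference_replicates: replicate hybridizations of whole genomic DNA,
        each a list of per-probe intensities in the same probe order.
    """
    scores = []
    for i, signal in enumerate(sample):
        reference = [replicate[i] for replicate in reference_replicates]
        mu = mean(reference)
        sigma = stdev(reference)  # sample standard deviation across replicates
        # Assumption: a probe with no reference variance gets a neutral score.
        scores.append((signal - mu) / sigma if sigma > 0 else 0.0)
    return scores
```

Positive scores indicate enrichment relative to genomic DNA and negative scores indicate depletion.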
+ Global Protein Occupancy Proﬁle of the E. coli Chromosome
+ The vast fraction of characterized protein-DNA interactions occur via sequence-speciﬁc interactions of transcription factors with DNA within , and in close proximity to , noncoding regulatory regions ( Gama-Castro et al. , 2008 ) . 
+ Consistent with this , we see highly signiﬁcant occupancy enrichment in noncoding regions as compared to coding regions ( Figure 1C ) . 
+ This difference in occupancy is clearly discernable in a local chromosomal view where high-amplitude peaks are largely conﬁned to the regions between genes ( Figure 2A ) . 
+ Independent biological replicates demonstrate that the position and relative amplitude of these occupancy peaks show a high level of reproducibility ( Figure 2A ) . 
+ Although there is , overall , relative depletion of occupancy within open reading frames ( ORFs ) , occasionally this is interrupted by a sharp occupancy peak ( Figures 2A and S1 [ available online ] ) . 
+ The functional role of these intragenic interactions is not known individual proteins can be readily discerned , displaying footprints on the scale of a typical transcription-factor-binding site ( Figures 2B and S2 ) . 
+ An automated peak detection algorithm identiﬁed 2063 individual occupied sites in a population of E. coli cells growing in late exponential phase ( Figure S3 ) . 
+ The pattern of peaks is reproducible in biological replicates and 
+ Discovery of Extended Protein Occupancy Domains
+ Intriguingly , examination of the entire genome-wide occupancy profile revealed contiguous regions of protein binding , many of which extend beyond a kilobase in length ( Figures 3A -- 3D and S5 ) . 
+ We performed a systematic search for these extended protein occupancy domains ( EPODs ) under early exponential growth using an automated algorithm that identiﬁed regions 1024 bp or longer with contiguous median occupancy values above the 75th percentile of all genome-wide values ( Supplemental Experimental Procedures ) . 
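One way to picture the domain search is the sketch below. The actual algorithm is specified in the Supplemental Experimental Procedures, which are not part of this text, so the running-median window, the strict "greater than" comparison, and the name `find_extended_domains` are assumptions made for illustration only. Only the 1024 bp minimum length, the 75th percentile cutoff, and the 4 bp probe spacing come from the text.

```python
from statistics import median, quantiles


def find_extended_domains(values, step_bp=4, window=257,
                          min_length_bp=1024, percentile=75):
    """Return (start_bp, end_bp) spans whose running median stays above the cutoff.

    values: occupancy scores at evenly spaced probe positions.
    window: number of probes in the running median (assumed, odd).
    """
    # Genome-wide cutoff: the requested percentile of all occupancy values.
    cutoff = quantiles(values, n=100, method="inclusive")[percentile - 1]
    half = window // 2
    smoothed = [median(values[max(0, i - half): i + half + 1])
                for i in range(len(values))]

    domains = []
    start = None
    # A trailing sentinel closes any run that reaches the end of the data.
    for i, value in enumerate(smoothed + [float("-inf")]):
        if value > cutoff:
            if start is None:
                start = i
        elif start is not None:
            if (i - start) * step_bp >= min_length_bp:
                domains.append((start * step_bp, i * step_bp))
            start = None
    return domains
```

The sketch treats the chromosome as linear; a real implementation would also need to handle runs that wrap around the origin of the circular genome.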
+ These domains had a median length of 1.6 kb and extended as long as 14 kb ( Figure S6A ) . 
+ We wondered whether the extreme signal in these domains corresponded to the footprint of RNA polymerase within highly transcribed regions . 
+ To test this possibility , we performed transcriptional proﬁling under identical cellular growth conditions ( Experimental Procedures ) . 
+ We found clear cases where the boundaries of an EPOD coincided with those of highly transcribed regions , such as those containing ribosomal protein genes ( Figure 3A ) . 
+ However , we found many cases where EPODs existed in a transcriptionally silent state , across both genes and intergenic regions , and even long operons ( Figures 3B -- 3D and S7 ) . 
+ Due to their extreme and bimodal RNA median expression level across domains ( Supplemental Experimental Procedures and Table S3 ) . 
+ This resulted in 121 domains in the highly expressed class ( heEPODs ) and 151 in the transcriptionally silent class ( tsEPODs ) . 
+ Previously published RNA polymerase ChIP-chip data ( Grainger et al. , 2005 ) , from cells grown under identical conditions , allowed us to compare RNA polymerase occupancy of tsEPODs and heEPODs relative to a background set generated by randomly sampling genomic sequences from the overall EPOD length distribution ( Figure 4A ) . 
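A length-matched background set of the kind mentioned above could be drawn roughly as follows. This is a sketch under stated assumptions: the text does not say how start positions were chosen or whether overlaps with EPODs were excluded, so uniform random starts on a circular chromosome and the name `sample_background` are hypothetical.

```python
import random


def sample_background(domain_lengths, genome_length, n_regions, seed=0):
    """Draw random regions whose lengths follow the observed domain lengths.

    domain_lengths: lengths (bp) of the observed domains.
    genome_length: chromosome length in bp, supplied by the caller.
    Returns (start, end, length) tuples; end wraps around the circular chromosome.
    """
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    regions = []
    for _ in range(n_regions):
        length = rng.choice(domain_lengths)      # resample the empirical lengths
        start = rng.randrange(genome_length)     # uniform start position
        end = (start + length) % genome_length   # wrap past the origin if needed
        regions.append((start, end, length))
    return regions
```

Any occupancy measure can then be summarized over these regions and compared with the same measure over tsEPODs and heEPODs.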
+ As expected , heEPODs showed extremely high levels of RNA polymerase occupancy ( p < 10^-246 ) . 
+ In comparison , tsEPODs showed lower levels of RNA polymerase occupancy ( p < 0.02 ) 
+ In order to gain further insight into the potential role of EPODs , we looked for enrichment of speciﬁc functional categories in genes that overlapped them ( Table S1 ) . 
+ As expected , heEPODs were highly enriched in processes and pathways that are highly expressed , including translation and tRNAs . 
+ The most significantly enriched classes within tsEPODs were predicted and hypothetical ORFs , with marginally significant enrichment in prophage and prophage-related genes . 
+ On the other hand , tsEPODs , by and large , avoid putatively essential genes ( Table S2 ) . 
+ The number of tsEPODs , their apparently random , yet widespread genome-wide distribution , and their enrichment within transcriptionally silent ORFs of unknown function , suggested that they may fulfill an architectural role . 
+ In fact , there exists compelling evidence that the E. coli chromosome is organized into domains , subserving both chromosomal compaction and topological domain isolation ( Postow et al. , 2004 ) . 
+ Evidence for such in vivo organization comes from both genetic and biochemical studies ( Delius and Worcel , 1974b ; Hinnebusch and Bendich , 1997 ; Pettijohn , 1996 ; Postow et al. , 2004 ) . 
+ However , the formation , composition , maintenance , and dynamics of these domains remain open questions ( Bendich , 2001 ; Postow et al. , 2004 ; Travers and Muskhelishvili , 2007 ) . 
+ Investigators have argued that such domains may be organized through the binding and cooperation of abundant proteins collectively referred to as nucleoid proteins ( Azam and Ishihama , 1999 ) . 
+ These proteins have characteristics that suit them well for this task . 
+ These include high abundance , low sequence speciﬁcity , tendency to cause DNA curvature , and propensity to bind curved DNA . 
+ In addition , some of these factors ( e.g. , H-NS ) are known to form at least homodimeric interactions ( Stella et al. , 2005 ) , a capacity that , as argued previously ( Dame et al. , 2000 ; Skoko et al. , 2006 ) , may allow distant chromosomal sites to be brought together to form topologically isolated domains . 
+ Low-resolution ChIP-chip studies against known nucleoid proteins ( Grainger et al. , 2006 ) revealed both a bias toward interaction in noncoding regions and a correlation with Fis and H-NS binding , suggesting cooperative interaction of nucleoid proteins in maintaining genomic architecture . 
+ We sought evidence for the involvement of nucleoid proteins in the formation of tsEPODs . 
+ The availability of probabilistic sequence specificity models , in the form of position weight tion for LacI ( a nonnucleoid transcription factor ) showed the opposite trend , with significantly lower values ( p < 10^-7 ) within tsEPODs ( Figure 4C ) . 
+ Consistent with the preference of nucleoid proteins for A/T-rich DNA ( Cho et al. , 2008 ; Grainger et al. , 2006 ) , we also saw a highly skewed A : T frequency bias : 59 % within tsEPODs , as compared to 49 % for the background and 50 % for heEPODs ( p < 10^-30 , Figure 4D ) . 
+ We also found tsEPODs to display extreme biophysical characteristics ( Pedersen et al. , 2000 ) such as high curvature ( p < 10^-24 ) and stacking energy ( p < 10^-34 ) , again consistent with the hypothesis that these regions constitute chromosomal organizing centers ( Figures 4E and S9 ) . 
+ Consistent with our computational analyses above , we saw signiﬁcant enrichment for the high-afﬁnity binding of nucleoid proteins in our tsEPODs relative to background ( Figure 4F ) within individual ChIP-chip proﬁles for H-NS , IHF , and Fis ( Grainger et al. , 2006 ) . 
+ Intriguingly , we also saw a highly signiﬁcant enrichment for the binding of Fis within heEPODs ( Figure 4F ) . 
+ This is consistent with the locus-speciﬁc role of Fis in the regulation of highly expressed genes , including ribosomal RNAs ( Aiyar et al. , 2002 ; Cho et al. , 2008 ; Grainger et al. , 2006 ) . 
+ DISCUSSION
+ In total , our observations argue in favor of a model in which the binding of tsEPODs by nucleoid proteins establishes them as chromosomal organizing centers . 
+ We argue that the underlying biophysical properties of these regions may largely dictate this role . 
+ IHF is known to have a preference for curved DNA , causing it to bend sharply upon binding ; the nucleoid proteins HU and H-NS bind strongly to curved DNA as well ( Swinger and Rice , 2004 ) . 
+ Fis , H-NS , and IHF restrain supercoils ( Pettijohn , 1996 ) , and both H-NS ( Dame et al. , 2000 ) and Fis ( Skoko et al. , 2006 ) show oligomerization and DNA compaction in vitro . 
+ We propose that nucleation starts with nucleoid proteins preferentially binding these curved regions of DNA . 
+ Because several of the nucleoid proteins prefer to bind curved DNA , these initial protein-DNA interactions make the region more favorable for further binding events . 
+ In this way , a wave of nucleoid proteins may spread across these regions , reinforced through the maintenance of curvature and intradomain protein-protein interactions . 
+ Homo- and heterodimeric protein-protein interactions , for example , as shown for H-NS ( Stella et al. , 2005 ) , can then bring these domains in contact with each other , forming the classic rosette structures 
+ Our observations do not suggest that every tsEPOD is essential to chromosomal organization at all times . 
+ Rather , a subset of tsEPODs could be involved in the formation of higher-order structure in any one cell , or across different environmental conditions . 
+ For relevant discussions see Deng et al. ( 2005 ) , Postow et al. ( 2004 ) , and Valens et al. ( 2004 ) . 
+ The lack of any discernable ﬁtness deﬁcit for a reduced genome E. coli strain , MDS42 ( Kolisnychenko et al. , 2002 ) , which is missing 24 % of the ORFs contained in tsEPODs , supports this dynamic and redundant picture . 
+ In fact , in vivo protein occupancy display ( IPOD ) analysis of this reduced genome showed that the occupancy pattern of the remaining EPODs is largely preserved , with 44 % of EPOD sequences in MDS42 exactly overlapping those deﬁned in MG1655 ( Figure S10 ) . 
+ Although there are a minority of loci with substantially different occupancy patterns , most of the residual discrepancy is due to differences in the exact deﬁnition of EPOD boundaries and not their locations . 
+ These observations provide additional support for our proposed model , namely 
+ The cumulative distribution of various measures are shown for transcriptionally silent EPODs ( red ) , highly expressed EPODs ( blue ) , and a matched background control ( black ) . 
+ The Wilcoxon rank sum test is used to determine statistical signiﬁcance of observed deviations relative to background . 
+ ( A ) Experimentally determined relative RNA polymerase occupancy . 
+ ( B and C ) Computationally scored PWM binding preference for a nucleoid protein ( H-NS ) and a nonnucleoid transcriptional repressor ( LacI ) using a 
+ ( D ) Cumulative distributions of A : T frequencies within EPODs . 
+ ( E ) Cumulative distribution of predicted relative curvature values within EPODs . 
+ ( F ) Distribution of binding sites for various nucleoid proteins and CRP within 
+ composition , act as extended protein occupancy domains , which in turn may allow them to participate in organizing large-scale chromosomal topology . 
+ However , we also raise the possibility that the establishment of these transcriptionally silenced protein occupancy domains may subserve other functions . 
+ For example , others have argued for the role of nucleoid proteins such as H-NS in the silencing of horizontally transferred DNA ( Dorman , 2007 ) . 
+ A closer inspection of some EPODs suggests that our automated classiﬁcation of them into the two groups of highly expressed and transcriptionally silent may not capture the full range of their diversity . 
+ Indeed , one of the longest tsEPODs is deﬁned over a cluster of genes encoding enzymes in the pathway of lipopolysaccharide ( LPS ) biosynthesis ( Figure 3B ) . 
+ Analysis of strand-speciﬁc RNA abundance of this locus ( Figure S11 ) clearly shows that , although this region is classiﬁed as transcriptionally `` silent , '' there is low-level expression that is mostly conﬁned to the ﬁrst three genes in the operon ( rfaQ , rfaG , and rfaP ) . 
+ These observations suggest that extended protein occupancy may be present at loci with low-level expression , and that it may be caused by processes that are distinct from those operating at absolutely silent loci . 
+ We have developed IPOD , a global , in vivo approach for monitoring the protein occupancy of an entire bacterial genome at the resolution of individual binding sites . 
+ Aqueous/organic phase separation has been previously used to enrich on the basis of nucleosome density in S. cerevisiae ( Nagy et al. , 2003 ) , and Grainger et al. demonstrated that crosslinked RNA-polymerase-bound sequences are preferentially partitioned to the organic phase in E. coli ( Grainger et al. , 2006 ) . 
+ Here we have shown that localization of small nucleoprotein complexes at the aqueous/organic interface is a simple yet powerful strategy for proﬁling protein occupancy across an entire prokaryotic genome . 
+ Although the identity of the protein bound at each site is not known , increasingly accurate sequence-speciﬁcity models of protein-DNA interactions should allow probabilistic assignments to known DNA-binding proteins . 
+ In fact , since IPOD analysis allows measurements of correlated occupancy of many sites across different conditions , it should aid in the reﬁnement of existing sequence-speciﬁcity models and the discovery of new ones . 
+ The ability to simultaneously monitor both protein occupancy and transcriptional output , at high spatial and temporal resolution , promises to allow true systems-level modeling of transcriptional network dynamics and chromosomal organization . 
+ At large spatial scales , these data have revealed the existence of transcriptionally silent protein occupancy domains . 
+ Our diverse observations implicate these regions as the long-proposed domain-organizing centers of the E. coli chromosome . 
+ EXPERIMENTAL PROCEDURES
+ Protein Occupancy Display
+ In Vivo Crosslinking and Footprint Minimization
+ In vivo formaldehyde crosslinking was performed as in Laub et al. ( 2002 ) with minor variations . 
+ Batch cultures of E. coli MG1655 were grown to early ( 2.4 × 10^7 CFU/ml ) or late ( 2.4 × 10^8 CFU/ml ) exponential phase , in Luria-Bertani medium ( 0.1 % Bacto Tryptone , 0.05 % yeast extract , 0.05 % NaCl ) , at which point 30 ml of cells was mixed with 300 µl 1 M sodium phosphates ( pH 7.6 ) and 810 µl 37 % formaldehyde . 
+ Batch cultures of E. coli MDS42 were grown to early exponential phase ( OD600 0.3 ) . 
+ All cultures were grown at 37 °C with shaking . 
+ Duration of exposure to formaldehyde , at room temperature with shaking , was varied from 5 to 20 min without noticeable differences in cross-linking efﬁciency ; 20 min exposure was used in the experiments presented here . 
+ Crosslinking was quenched by addition of 2 ml 2 M glycine . 
+ The samples were shaken at room temperature for 10 min and then moved to ice for an additional 10 min to complete quenching . 
+ The cells were pelleted by centrifugation at 5525 × g , 4 °C for 4 min and then washed twice with ice-cold 1× phosphate-buffered saline . 
+ The remaining liquid was removed , and the cells were frozen in a dry-ice slurry and stored at -80 °C for not more than 1 month . 
+ Cell pellets were thawed on ice and resuspended in 500 µl Lysis Buffer A ( 10 mM Tris [ pH 8.0 ] , 20 % sucrose , 50 mM NaCl , 10 mM EDTA ) with 20 mg/ml of freshly added lysozyme . 
+ The samples were incubated at 37 °C for 30 min and then mixed with 500 µl 2× IP buffer ( 100 mM Tris [ pH 7.0 ] , 300 mM NaCl , 2 % Triton X-100 ) with 0.7 mg/ml of freshly added PMSF . 
+ The samples were incubated at 37 °C for an additional 15 min . 
+ The cells were pelleted by centrifugation at 850 × g , 4 °C , for 3 min , and the supernatants were gently removed by pipetting . 
+ The cell pellets were resuspended in 1 ml Lysis Buffer B ( 5× : 250 mM HEPES [ pH 7.5 ] , 2.5 M NaCl , 5 mM EDTA , two Complete Protease Inhibitor Mini tablets [ Roche P/N 11393100 ] , filter sterilized ) and moved to ice . 
+ The cells were sonicated on ice using a Misonix Sonicator 3000 with a microtip at power level 2 for three 10 s pulses , with 10 s rests on ice between pulses . 
+ The lysates were clarified by centrifugation at 16,100 × g , 4 °C , for 5 min . 
+ The supernatants were transferred to a separate tube and stored at -80 °C for not more than 1 month . 
+ Sonicated cellular lysates were thawed on ice . 
+ Aliquots of 350 µl cell lysate were treated with 5 µl RNaseA ( 10 mg/ml ) , 38 µl rDNaseI ( 38 U , Ambion ) , and 37 µl 10× rDNaseI buffer at 37 °C for 1 hr . 
+ Protein-DNA Complex Isolation, Crosslink Reversal, and DNA Labeling
+ Protein-DNA complexes were isolated by phenol extraction . 
+ To achieve this , 150 µl 10 mM Tris and 500 µl 25:24:1 phenol : chloroform : isoamyl alcohol were added to the samples . 
+ The samples were vortexed for 10 s and centrifuged at top speed for 2 min at room temperature . 
+ A white disk was readily discernible at the aqueous/organic interface . 
+ To purify this interface , all aqueous and organic liquid was removed by pipetting . 
+ A second extraction was performed by adding 500 µl 10 mM Tris and 450 µl 24:1 chloroform : isoamyl alcohol , vortexing and centrifuging as in the previous step . 
+ Again , all liquid was removed from the interface by pipetting , and residual liquid was removed by wicking . 
+ For crosslink reversal , the interface was suspended in 500 µl 10 mM Tris and 50 µl 10 % SDS and placed at 100 °C for 30 min . 
+ The tubes were placed on ice , then moved to 65 °C for 3 hr following addition of 5 µl proteinase K ( 20 mg/ml ) . 
+ After heat treatment , the solutions were phenol/chloroform extracted and ethanol precipitated in the presence of glycogen to purify the DNA . 
+ The DNA pellets were resuspended in 50 µl water and quantified using a Nanodrop spectrophotometer . 
+ Two micrograms of the fragmented DNA , isolated from DNA-protein complexes , was used as the input in a labeling reaction with the Enzo Terminal Labeling kit ( P/N 42630 ) . 
+ The labeling reactions were assembled according to the manufacturer 's instructions ; the labeling reaction was incubated at 37 °C for 30 min to 1 hr . 
+ No qualitative difference in labeling efﬁciency was observed between reactions labeled for 30 min or for 1 hr . 
+ Tiling-Array Hybridization
+ We designed an Affymetrix tiling array for the MG1655 E. coli genome , containing probes that cover the entire genome at a resolution of 4 bp between steps ; however , the steps alternate strand coverage , so there is an 8 bp step between probes on the same strand . 
+ There are a total of 2.47 million probes on the array , of which 2,300,160 directly enter analysis as E. coli probe pairs . 
+ The Affymetrix system pairs each 25-mer perfect match genomic sequence with a corresponding 25-mer that has a mismatch in the 13th position . 
+ The mismatch probe is intended as a crosshybridization control . 
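The probe layout described above can be sketched as follows. Only the 25-mer length, the 4 bp step, the strand alternation, and the mismatch at the 13th position come from the text; substituting the complementary base at that position, treating the genome as a linear string, and the function names are assumptions for illustration.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def mismatch_probe(perfect_match):
    """Copy the probe with its 13th base (index 12) swapped for the complement."""
    middle = 12
    return (perfect_match[:middle]
            + COMPLEMENT[perfect_match[middle]]
            + perfect_match[middle + 1:])


def tile_probes(genome, probe_len=25, step=4):
    """Tile the genome every `step` bp, alternating strands between steps.

    Returns (start, strand, perfect_match, mismatch) tuples. Probes on the
    same strand end up 2 * step bp apart.
    """
    probes = []
    starts = range(0, len(genome) - probe_len + 1, step)
    for n, start in enumerate(starts):
        window = genome[start:start + probe_len]
        strand = "+" if n % 2 == 0 else "-"
        pm = window if strand == "+" else reverse_complement(window)
        probes.append((start, strand, pm, mismatch_probe(pm)))
    return probes
```

With the default step, consecutive probes on the same strand start 8 bp apart, matching the spacing stated in the text.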
+ In addition to the E. coli sequences , 33,996 probe pairs are sequence-speciﬁc controls against other genomes , including B. subtilis , lambda phage , and A. thaliana . 
+ The tiling array ( Ecoli_Tab520346F ) is a standard-sized 200 µl volume microarray with 5 µm features . 
+ The FS450_0002 washing protocol ( Affymetrix ) was used . 
+ Briefly , this includes two posthybridization washes , a streptavidin-phycoerythrin stain , a poststain wash , an antibody amplification stain , a second streptavidin-phycoerythrin stain , a final wash , and addition of holding buffer to the microarray . 
+ After completion of the wash cycle , the microarrays were scanned a 
+ Generation of Reference Genomic Hybridizations
+ As the Affymetrix platform is a `` single-color '' hybridization system , it was necessary to choose an appropriate reference sample in order to determine relative enrichment/depletion of occupancy across the genome . 
+ To accomplish this , we used a whole-genome DNA preparation from the wild-type ( MG1655 ) E. coli strain , grown to stationary phase in an overnight culture . 
+ The genomic DNA was exposed to a low concentration of DNase I in the presence of cobalt to fragment the DNA to a median size of approximately 300 base pairs . 
+ The DNA was labeled as above , and hybridizations were performed with six biological replicates . 
+ RNA Expression Proﬁling
+ To obtain tiling-resolution RNA measures across the E. coli genome , wild-type cells were grown in biological duplicate in LB to OD600 = 0.3 . 
+ The cultures were moved to ice , and the QIAGEN RNeasy kit ( P/N 74104 ) was used to isolate total cellular RNA . 
+ Immediately following elution in RNase-free water , residual DNA was removed by treatment with DNase I at 37 °C for 15 min . 
+ The samples were treated again using the RNeasy kit , resuspended in 40 µl RNase-free water , and stored at -20 °C . 
+ The RNA samples were reverse transcribed to DNA using the SuperScript system from Invitrogen ( P/N 18053017 ) . 
+ RNA ( 10 µg ) was incubated with 5 µg random hexamer primer at 70 °C for 10 min . 
+ The samples were moved to ice and then mixed with 8 µl SuperScript buffer , 4 µl DTT , 2 µl 10 mM dNTP mix , 3 µl DEPC-treated water , and 1 µl RNasin . 
+ Then , 2 µl of SuperScript II Enzyme was added , and the samples were incubated at 25 °C for 10 min and 42 °C for 2.5 hr , then moved to 95 °C for 5 min to terminate the reaction . 
+ To fragment the RNA template , 2 µl 1 N NaOH was added , and the samples were placed at 65 °C for 15 min . 
+ At room temperature , the pH was readjusted by adding 2 µl 1 N HCl . 
+ The cDNA was cleaned using the QIAGEN QiaQuick Nucleotide Removal Kit ( P/N 28304 ) and resuspended in 40 µl water . 
+ Hybridizations were performed in biological duplicate as above , using 1.3 µg cDNA . 
+ Data Processing and Normalization
+ Protein Occupancy Display
+ We developed in-house computational and statistical analysis tools for use with the E. coli tiling array . 
+ We used a previous study ( Choe et al. , 2005 ) as a model for overall design of normalization among arrays and perfect match adjustment on single arrays . 
+ Analysis scripts were written in Perl , MatLab , and R to standardize the statistical manipulations across all data sets . 
+ The output ﬁle from the scanning process is a CEL ﬁle ; we used a proprietary Affymetrix utility , bpmap_bcel_join , to extract the perfect match and mismatch raw signal intensities from the CEL ﬁles . 
+ Choe et al. ( 2005 ) demonstrated that subtracting the mismatch value from the perfect match value was a simple yet effective method for correcting the perfect match signal for crosshybridization . 
+ In order to quantify relative enrichment/depletion of sequences for protein occupancy , we utilized signals from six independent genomic DNA hybridizations to calculate a z-score . 
+ The z-score for each probe is deﬁned as the experimental value for the probe minus the mean of the six references ( for that probe ) , divided by the standard deviation of the six references ( for that probe ) . 
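The per-probe z-score defined above can be sketched in a few lines of Python. This is only an illustration, not the authors' Perl/MatLab/R scripts: the function names are hypothetical, and the use of the sample standard deviation (n - 1 denominator) is an assumption, since the text does not say which estimator was used.

```python
from statistics import mean, stdev


def pm_minus_mm(perfect_match, mismatch):
    """Correct each perfect-match signal by subtracting its mismatch signal."""
    return [pm - mm for pm, mm in zip(perfect_match, mismatch)]


def probe_z_scores(experiment, references):
    """Z-score per probe: (experiment - mean of references) / SD of references.

    experiment -- list of corrected signals, one per probe
    references -- list of reference arrays (six in the text), each a list
                  with one corrected signal per probe
    """
    z_scores = []
    for i, value in enumerate(experiment):
        ref_values = [ref[i] for ref in references]
        mu = mean(ref_values)
        sd = stdev(ref_values)  # assumption: sample SD (n - 1)
        # A probe whose references are all identical has no defined z-score.
        z_scores.append((value - mu) / sd if sd > 0 else float("nan"))
    return z_scores
```

A positive value means the probe is over-represented relative to the genomic DNA references, and a negative value means it is depleted.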
+ In exponentially growing cells , multiple origin replication events give rise to inﬂated hybridization signal in and around the origin of replication . 
+ This increased intensity around the origin was clearly visible during early exponential growth but decreased signiﬁcantly for cells in late exponential phase . 
+ In order to correct for this , we used a local normalization protocol in which the signal from each probe was normalized using the ratio of the mean experimental signal intensity to the mean signal intensity of the six reference genomic replicates , both calculated within an 80 kb window centered on the data point for that probe . 
+ Occupancy z-scores were calculated using these locally normalized values . 
+ Therefore , a positive z-score reﬂects the overrepresentation of a DNA sequence in the pool of DNA-protein complexes relative to genomic DNA , while a negative z-score indicates relative depletion of the DNA sequence in the set of DNA-protein complexes . 
+ For many of the analyses , the z-score values were spatially smoothed using a moving average window ranging from 32 to 512 base pairs . 
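The local normalization and smoothing steps can be sketched as follows. This is a rough illustration under stated assumptions: the function names are hypothetical, the genome is treated as linear (windows are simply truncated at the ends rather than wrapped around the circular chromosome), and each probe is divided by the windowed experiment-to-reference ratio, which is one reading of "normalized using the ratio".

```python
def window_mean(values, positions, center, half_width):
    """Mean of the values whose genomic position lies within the window."""
    selected = [v for v, p in zip(values, positions)
                if abs(p - center) <= half_width]
    return sum(selected) / len(selected)


def locally_normalize(experiment, reference_mean, positions, window_bp=80_000):
    """Divide each probe by the local experiment/reference ratio.

    reference_mean holds, per probe, the mean of the reference hybridizations.
    Assumes the windowed reference mean is never zero.
    """
    half = window_bp / 2
    normalized = []
    for value, pos in zip(experiment, positions):
        ratio = (window_mean(experiment, positions, pos, half)
                 / window_mean(reference_mean, positions, pos, half))
        normalized.append(value / ratio)
    return normalized


def moving_average(values, positions, window_bp):
    """Smooth a profile with a moving average window given in base pairs."""
    half = window_bp / 2
    return [window_mean(values, positions, pos, half) for pos in positions]
```

The quadratic scan keeps the sketch short; for millions of probes a running-sum implementation would be the practical choice.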
+ RNA Expression
+ The E. coli Affymetrix tiling array was utilized for RNA expression analysis as described above . 
+ The resulting data consisted of PM-MM signal values at the resolution of 8 bp for each strand across the entire E. coli genome . 
+ The data for two replicate experiments were quantile normalized and averaged to yield mean expression values for each strand . 
+ We used linear interpolation to generate RNA signal at 4 bp resolution for each strand . 
+ These strand-speciﬁc proﬁles were then smoothed using a 512 bp moving average window . 
+ In order to generate a strand-independent transcriptional output proﬁle , we then took the larger of the two strand signals at each genomic coordinate . 
+ This smoothed , strand-independent transcriptional proﬁle was used for 
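The main steps of the RNA profile construction described above (quantile normalization of two replicates, interpolation from 8 bp to 4 bp spacing, and taking the larger strand signal) can be sketched as below. The function names are hypothetical, ties in the ranking are not treated specially, and the two strand profiles are assumed to be already aligned on a common coordinate grid; the smoothing step is omitted.

```python
def quantile_normalize(rep_a, rep_b):
    """Give values of equal rank in the two replicates the same value (their mean)."""
    order_a = sorted(range(len(rep_a)), key=rep_a.__getitem__)
    order_b = sorted(range(len(rep_b)), key=rep_b.__getitem__)
    norm_a = [0.0] * len(rep_a)
    norm_b = [0.0] * len(rep_b)
    for i, j in zip(order_a, order_b):
        target = (rep_a[i] + rep_b[j]) / 2
        norm_a[i] = target
        norm_b[j] = target
    return norm_a, norm_b


def interpolate_midpoints(values):
    """Insert a linearly interpolated point between neighbours (8 bp -> 4 bp)."""
    out = []
    for left, right in zip(values, values[1:]):
        out.extend([left, (left + right) / 2])
    out.append(values[-1])
    return out


def strand_independent(forward, reverse):
    """Take the larger of the two strand signals at each coordinate."""
    return [max(f, r) for f, r in zip(forward, reverse)]
```

Averaging the two normalized replicates element by element then gives the mean expression value per probe for each strand.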
+ The Supplemental Data include Supplemental Experimental Procedures , three tables , and eleven ﬁgures and can be found with this article online at http://www.cell.com/molecular-cell/supplemental/S1097-2765(09)00479-1 . 
+ was assisted by fellowship # 08-1090-CCR-EO from the New Jersey State Commission on Cancer Research . 
+ S.T. was supported by grants from the NSF ( CAREER ) , DARPA , NHGRI , NIGMS ( P50 GM071508 ) , and the NIH Director 's Pioneer Award ( 1DP10D003787-01 ) . 
+ The oligonucleotide array data were deposited at NCBI Gene Expression Omnibus with accession number
\ No newline at end of file
--- a/data/TEXT_FILES/useful_txt/19656291.txt 0 → 100644
+++ b/data/TEXT_FILES/useful_txt/19656291.txt 0 → 100644
+ NsrR targets in the Escherichia coli genome: new insights
+ Jonathan D. Partridge ,1 Diane M. Bodenmiller ,2 † Michael S. Humphrys2 ‡ and Stephen Spiro1 * 1Department of Molecular and Cell Biology , The University of Texas at Dallas , 800 W Campbell Road , Richardson , TX 75080 , USA . 
+ 2School of Biology , Georgia Institute of Technology , 310 Ferst Drive , Atlanta , GA 30332 , USA . 
+ Summary
+ The Escherichia coli NsrR protein is a nitric oxide-sensitive repressor of transcription . 
+ The NsrR-binding site is predicted to comprise two copies of an 11 bp motif arranged as an inverted repeat with 1 bp spacing . 
+ By mutagenesis we conﬁrmed that both 11 bp motifs are required for maximal NsrR repression of the ytfE promoter . 
+ We used chromatin immunoprecipitation and microarray analysis ( ChIP-chip ) to show that NsrR binds to 62 sites close to the 5 ′ ends of genes . 
+ Analysis of the ChIP-chip data suggested that a single 11 bp motif ( with the consensus sequence AANATGCATTT ) can function as an NsrR-binding site in vivo . 
+ NsrR binds to sites in the promoter regions of the ﬂiAZY , ﬂiLMNOPQR and mqsR-ygiT transcription units , which encode proteins involved in motility and bioﬁlm development . 
+ Reporter fusion assays conﬁrmed that NsrR negatively regulates the ﬂiA and ﬂiL promoters . 
+ A mutation in the predicted 11 bp NsrR-binding site in the ﬂiA promoter impaired repression by NsrR and prevented detectable binding in vivo . 
+ Assays on soft agar conﬁrmed that NsrR is a negative regulator of motility in E. coli K12 and in a uropathogenic strain ; surface attachment assays revealed decreased levels of attached growth in the absence of NsrR . 
+ Accepted 6 July , 2009 . 
+ * For correspondence . 
+ E-mail stephen.spiro@utdallas.edu ; Tel. ( +1 ) 972 883 6896 ; Fax ( +1 ) 972 883 2409 . 
+ Present addresses : † Lilly Research Laboratories , Eli Lilly and Company , Indianapolis , IN 46285 , USA . 
+ ‡ Centers for Disease Control and Prevention , 1600 Clifton Road , Atlanta , GA 30333 , USA . 
+ Introduction
+ Nitric oxide ( NO ) is synthesized by the inducible nitric oxide synthase in phagocytic cells and is an important component of the innate immune response to infection ( Fang , 2004 ) . 
+ NO is also made by some bacteria , either as a by-product of nitrite reduction to ammonia , or as an intermediate of denitriﬁcation ( Watmough et al. , 1999 ) . 
+ Thus , pathogenic bacteria can potentially be exposed both to endogenously generated NO and to the NO produced by host cells . 
+ In Escherichia coli , the transcriptional regulators NorR and NsrR mediate adaptive responses to NO by controlling the expression of genes encoding enzymes that reduce or oxidize NO to less toxic species ( Mukhopadhyay et al. , 2004 ; Bodenmiller and Spiro , 2006 ; Spiro , 2007 ) . 
+ The key NO detoxifying enzymes are the ﬂavohaemoglobin ( encoded by the hmp gene ) and the ﬂavorubredoxin ( encoded by norVW ) , which are regulated by NsrR and NorR respectively ( Hutchings et al. , 2002 ; Gardner et al. , 2003 ; Bodenmiller and Spiro , 2006 ) , and the respiratory nitrite reductase , Nrf , which reduces both nitrite and NO to ammonia ( Poock et al. , 2002 ) . 
+ The σ54-dependent transcriptional activator NorR is stimulated by the formation of a nitrosyl species at a mononuclear iron site in its signalling domain ( D'Autréaux et al. , 2005 ) . 
+ In the case of NsrR , the binding site for NO is likely to be an [ Fe-S ] cluster ( Isabella et al. , 2008 ; Tucker et al. , 2008 ; Yukl et al. , 2008 ) . 
+ We initially identiﬁed NsrR as a repressor of the ytfE , hmp and ygbA genes ( Bodenmiller and Spiro , 2006 ) . 
+ The product of the ytfE gene is a di-iron protein , which has been implicated in the repair of damaged [ Fe-S ] clusters ( Justino et al. , 2007 ) . 
+ A YtfE homologue ( NorA ) from Ralstonia eutropha has been shown to bind NO , and has been suggested to function to lower the cytoplasmic NO concentration ( Strube et al. , 2007 ) . 
+ The ygbA gene is of unknown function , while hmp encodes a well-characterized NO detoxiﬁcation system ( Poole and Hughes , 2000 ) . 
+ Subsequently described targets for NsrR regulation include the hcp-hcr and yeaR-yoaG genes , and the nrf operon that encodes Nrf ( Filenko et al. , 2007 ; Lin et al. , 2007 ) . 
+ Known and predicted targets for NsrR regulation have in their promoter regions an 11 bp inverted repeat with a spacing of 1 bp ( Rodionov et al. , 2005 ; Bodenmiller and Spiro , 2006 ; Lin et al. , 2007 ) . 
+ In this paper we conﬁrm that this sequence is required for NsrR-mediated repression of the ytfE promoter , but also present evidence to suggest that a single copy of the 11 bp motif may be sufficient for NsrR binding . 
+ The full extent of the NsrR regulon of E. coli has been assessed computationally ( Rodionov et al. , 2005 ) , and by analysis of the transcriptome of a strain in which NsrR was depleted by the presence of multiple copies of a cloned NsrR-binding site ( Filenko et al. , 2007 ) . 
+ As a complementary approach to identifying genes that might be regulated by NsrR , we describe in this paper the use of chromatin immunoprecipitation and microarray analysis ( ChIP-chip ) to locate NsrR-binding sites in the E. coli genome . 
+ Computational analysis of newly identiﬁed targets revealed additional insights into the requirements for a functional NsrR target site . 
+ Unexpectedly , we found NsrR-binding sites associated with the promoter regions of three transcription units containing genes with well-established or suspected roles in motility and/or bioﬁlm development . 
+ We conﬁrmed that two of the three promoters are subject to regulation by NsrR and NO , and showed that NsrR is a negative regulator of motility in E. coli . 
+ Results
+ Isolation of mutations in the NsrR-binding site
+ We have previously shown that NsrR regulates the ytfE , hmp and ygbA promoters , and have predicted the sequences of the NsrR-binding sites in these promoters ( Bodenmiller and Spiro , 2006 ) . 
+ There is also a predicted NsrR-binding site in the promoter region of the hcp-hcr genes ( Rodionov et al. , 2005 ) , which encode the hybrid cluster ( prismane ) protein and an associated redox enzyme , and NsrR is a repressor of hcp-hcr transcription ( Filenko et al. , 2007 ) . 
+ We analysed the 5 ′ non-coding regions of these four transcription units for the occurrence of candidate cis-acting regulatory sequences , using the MEME algorithm ( Bailey et al. , 2006 ) . 
+ MEME detected the previously predicted NsrR-binding sites , and further suggested the presence of a second NsrR-binding site in the ygbA , hcp and hmp promoters . 
+ In each case , the primary ( previously described ) NsrR sites overlap the -10 and/or transcription start site ( Fig. 1A ) , while the secondary sites are further upstream . 
+ The seven predicted sites ( Fig. 1A ) give rise to the sequence logo depicted in Fig. 1B , which is very similar to the logo previously generated for NsrR-binding sites in a group of Enteric bacteria ( Rodionov et al. , 2005 ) , with the addition of two AT base pairs which are present at the 5 ′ ends of all seven E. coli sites ( Fig. 1 ) . 
+ A similar sequence can also be found in the yeaR promoter , which is regulated directly by NsrR ( Lin et al. , 2007 ) . 
+ The presence in the cell of multiple copies of the putative NsrR-binding site from the ytfE promoter causes de-repression of a ytfE -- lacZ reporter fusion , by a repressor titration effect ( Bodenmiller and Spiro , 2006 ) . 
+ Deletion of a single AT base pair at the centre of the NsrR-binding site eliminated repressor titration ( Bodenmiller and Spiro , 2006 ) . 
+ a. Wild type and mutant ytfE promoter sequences cloned in pSTBlue-1 were transformed into a strain with a ytfE-lacZ reporter fusion . 
+ Activities reﬂect de-repression of the ytfE promoter by repressor titration . 
+ b. Wild type and mutant promoters were fused to lacZ and integrated in the chromosome . 
+ Cultures were grown anaerobically in a minimal medium . 
+ For treatment with nitrite , cultures were grown to mid-exponential phase , supplemented with 5 mM nitrite , then assayed 60 -- 90 min later . 
+ c. Wild type ytfE promoter assayed in a ΔnsrR background . 
+ d. The deletion mutation eliminates ytfE promoter activity , presumably because it is located in the -10 sequence . 
+ ND , not done . 
+ We selected the NsrR-binding site in ytfE as a model for further study , and sought to isolate additional mutations in this sequence that impair NsrR binding . 
+ We subjected the 205 bp ytfE promoter fragment to random mutagenesis , and screened on lactose indicator media for clones with lower activities in the repressor titration assay . 
+ By picking Lac- colonies , we repeatedly isolated the same 1 bp deletion that we had previously made by site-directed mutagenesis . 
+ We assume that the run of four AT base pairs in the NsrR-binding site in ytfE is prone to deletion by slipped-strand mispairing during the mutagenesis reaction . 
+ This mutation eliminates activity in the repressor titration assay ( Table 1 ) , and ytfE promoter activity , presumably because the deletion is in the -10 sequence ( we were unable to isolate Lac + fusion phages to assay the activity of this mutant ytfE promoter ) . 
+ By screening for a partial phenotype in the repressor titration assay ( pale blue colonies ) , we isolated two substitutions at positions 2 and 6 in the NsrR-binding site ( Fig. 1B ) . 
+ Both caused a defect in the repressor titration assay , and de-repression of the ytfE promoter in both the absence and presence of nitrite ( Table 1 ) . 
+ The two mutations isolated by random mutagenesis are at positions that are almost completely conserved in known and predicted NsrR-binding sites , and introduce nucleotides that never occur in known and predicted sites ( Fig. 1 ; Table 2 ) , with the single exception of a C at position 6 in the MEME-predicted second site in hmp ( Fig. 1A ) . 
+ Additional mutations were introduced into the NsrR-binding site in ytfE by site-directed mutagenesis . 
+ Mutations corresponding to those previously isolated at positions 2 and 6 were made in the right half of the inverted repeat , at positions 22 and 18 respectively ( Fig. 1B ) . 
+ We also made mutations at the symmetry-related positions 5 and 19 , and made three double mutants in which symmetry-related single mutations were combined ( Fig. 1B , Table 1 ) . 
+ Repressor titration assays showed that all single mutations signiﬁcantly reduced the ability of the sites to titrate NsrR , and that single mutations at symmetry-related positions had similar effects ( mutations 2 and 22 being the most severe ) . 
+ In all three cases , symmetry-related double mutations had larger effects than either of the corresponding single mutations ( Table 1 ) . 
+ Mutant ytfE sequences made by site-directed mutagenesis were also fused to lacZ for assays of promoter activity . 
+ A similar pattern was found to that seen with the repressor titration assays , in that single mutations caused some de-repression of the promoter , with mutations at symmetry-related positions having similar effects ( Table 1 ) . 
+ Double mutations caused greater de-repression than the corresponding single mutations , showing that both halves of the site are required for optimal repression of the ytfE promoter . 
+ Mutations at positions 2 and 22 again caused the most severe phenotypes . 
+ Induction by nitrite increased the activities of promoters with mutations at positions 2 and 22 between 2.5 - and 3.6-fold , but between 8.8 - and 14.7-fold for all of the other promoters ( Table 1 ) . 
+ The reduced induction ratio is due to particularly high activities for the 2 , 22 and 2 + 22 mutants in cultures grown anaerobically in the absence of nitrite ( Table 1 ) . 
+ The reason why mutations at these positions have a disproportionate effect on promoter activity under noninducing conditions is not known . 
+ All of the promoters remain somewhat inducible by nitrite , demonstrating that none of the mutations completely eliminates NsrR binding . 
+ We tested binding of NsrR to a selection of the mutant ytfE promoters in vivo by chromatin immunoprecipitation ( ChIP ) . 
+ For these experiments , the NsrR protein was modiﬁed by addition of a C-terminal 3XFlag tag ; the modiﬁed protein was expressed from a single-copy gene at the nsrR locus on the chromosome ( Efromovich et al. , 2008 ) . 
+ Cultures were transformed with pSTBlue-1 derivatives containing wild type and mutant ytfE promoters ( the same plasmids used for the repressor titration assays ) . 
+ After ChIP , the immunoprecipitated DNAs were used as templates for PCRs using vector-speciﬁc primers designed to amplify ytfE sequences . 
+ Ampliﬁcation conditions were optimized to allow detection of NsrR binding to the wild-type site in ytfE ( Fig. 1D , compare lanes 1 and 2 ) . 
+ Under these conditions , binding to the deletion mutant was barely detectable , and all of the single and double point mutations tested showed signiﬁcantly reduced binding ( Fig. 1D ) . 
+ Quantitative conclusions cannot be drawn from this experiment , but it nevertheless shows that all of the mutations reduce binding in vivo sufficiently to severely impair detection by ChIP , under conditions that allow detection of the wild type interaction . 
+ Our results with the ytfE promoter are consistent with previous suggestions that the NsrR consensus-binding site in E. coli consists of two copies of the sequence 5 ′ - AAGATGCYTTT-3 ′ arranged as an inverted repeat separated by 1 bp ( Rodionov et al. , 2005 ; Bodenmiller and Spiro , 2006 ; Lin et al. , 2007 ) . 
+ Results presented later in this paper suggest that a single 11 bp motif can also function as an NsrR-binding site . 
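The geometry of the consensus site can be made concrete with a short Python sketch. It builds the inverted repeat from the 11 bp half-site quoted above and maps each position to its symmetry-related partner. The 23 bp numbering (11 + 1 + 11) is inferred from the position pairs mentioned in the text (2 and 22, 6 and 18, 5 and 19, with the deleted central base at 12); the function names are hypothetical.

```python
# IUPAC complements for the symbols used in the consensus (Y = C/T, R = A/G).
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G",
              "Y": "R", "R": "Y", "N": "N"}


def reverse_complement(seq):
    """Reverse complement of a DNA sequence written with IUPAC symbols."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def inverted_repeat(half_site, spacer="N"):
    """Half-site, a 1 bp spacer, then the reverse complement of the half-site."""
    return half_site + spacer + reverse_complement(half_site)


def symmetry_partner(position, site_length=23):
    """1-based position of the symmetry-related base in the other half-site."""
    return site_length + 1 - position


site = inverted_repeat("AAGATGCYTTT")
print(site, len(site))        # AAGATGCYTTTNAAARGCATCTT 23
print(symmetry_partner(2))    # 22
print(symmetry_partner(6))    # 18
```

The central position 12 is its own partner, which is why a deletion there shifts the two half-sites relative to each other without changing either of them.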
+ Genome-wide search for NsrR-binding sites
+ We next used ChIP-chip to identify NsrR-binding sites in the genome of E. coli K12 strain MG1655 . 
+ Cultures expressing the 3XFlag-tagged NsrR were grown anaerobically in the presence and absence of nitrate . 
+ Under anaerobic growth conditions , nitrate provides a source of endogenously generated NO and causes de-repression of NsrR targets ( Bodenmiller and Spiro , 2006 ) . 
+ After ChIP , the precipitated DNAs were labelled with Cy5 and Cy3 and hybridized together to a high-density microarray ( from Oxford Gene Technology ) . 
+ Peaks in the ﬂuorescence ratio therefore identify regions of the chromosome that are bound by NsrR , with the degree of occupancy of sites being greater in the culture grown in the absence of nitrate . 
+ Full technical details of this experiment and statistical procedures used for data analysis have been published ( Efromovich et al. , 2008 ) . 
+ The ChIP-chip data for the three originally identiﬁed NsrR targets ( ytfE , hmp and ygbA ) are shown in Fig. 2 . 
+ The ChIP-chip data show only a weak signal for NsrR binding at the ygbA promoter ( Fig. 2 ) , which is known to be regulated by NsrR in vivo ( Bodenmiller and Spiro , 2006 ) . 
+ Thus we required rigorous methods to identify other weak signals in the data that may represent bona ﬁde cis-acting regulatory sites . 
+ We adopted three approaches to this problem . 
+ First , we scrutinized the three datasets individually , and looked for peaks in which two or more consecutive probes showed a greater than twofold enrichment . 
+ If a peak met these criteria in at least two of the three samples , then it was recorded as positive . 
+ Twenty-nine peaks were identiﬁed in this way ; of these , nine were not considered further on the grounds that they were between convergently transcribed genes or deep within coding regions ( here deﬁned as > 300 bp from the start codon ) . 
+ This method did not identify the ygbA peak , so may be too conservative . 
+ Next , we analysed the three datasets independently with ChIPOTle ( Buck et al. , 2005 ) , and scored a peak as positive if it was signiﬁcant ( P < 0.0001 ) in at least two of the three datasets . 
+ This analysis identiﬁed an additional 41 peaks ( including ygbA ) of which 16 were discarded as internal sites or sites between convergent genes . 
+ Finally , we used a novel method for peak detection ( Efromovich et al. , 2008 ) which , disregarding sites deep within coding regions , between convergently transcribed genes or with very small mean enrichment ratios ( < 1.5 ) , identiﬁed an additional 17 NsrR-binding sites . 
+ The ﬁnal output of 62 NsrR-binding sites in or close to 5 ′ non-coding regions is shown in Table 2 . 
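The first of the three approaches above (two or more consecutive probes with greater than twofold enrichment, seen in at least two of the three datasets) can be sketched in Python as below. The function names are hypothetical, and the way reproducibility is decided here (a probe counts if it lies inside a qualifying run in enough datasets) is an assumption; the text does not specify how peaks were matched between datasets.

```python
def enriched_runs(ratios, threshold=2.0, min_run=2):
    """Return (start, end) index pairs, inclusive, of runs of at least
    min_run consecutive probes whose enrichment ratio exceeds threshold."""
    runs = []
    start = None
    # A sentinel below any threshold closes a run that reaches the last probe.
    for i, ratio in enumerate(list(ratios) + [float("-inf")]):
        if ratio > threshold:
            if start is None:
                start = i
        else:
            if start is not None and i - start >= min_run:
                runs.append((start, i - 1))
            start = None
    return runs


def reproducible_probes(datasets, threshold=2.0, min_run=2, min_datasets=2):
    """Indices of probes that fall inside a qualifying run in enough datasets."""
    counts = [0] * len(datasets[0])
    for ratios in datasets:
        for start, end in enriched_runs(ratios, threshold, min_run):
            for i in range(start, end + 1):
                counts[i] += 1
    return [i for i, count in enumerate(counts) if count >= min_datasets]
```

Filtering out peaks between convergently transcribed genes or deep within coding regions would be a separate step that needs the genome annotation.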
+ Previously identiﬁed and novel NsrR-binding sites
+ The presence of NsrR-binding sites in or near to 5 ′ noncoding regions identiﬁes genes that potentially belong to the NsrR regulon . 
+ Of the promoters bound by NsrR in vivo ( Table 2 ) , eight not previously known to be regulated by NsrR ( hypA/hycA , acs , aceE , ydcX , putA , ndh and sodB ) show responses to nitrite in transcriptomics experiments that are consistent with positive or negative regulation by NsrR ( Constantinidou et al. , 2006 ) . 
+ Most known NsrR regulon members ( ytfE , yeaR-yoaG , hmp , hcp-hcr and ygbA ) show differential regulation in an asymptomatic strain of E. coli growing in the urinary tract ( Roos and Klemm , 2006 ) . 
+ Potential NsrR targets ydcX , dsdX , ndh and tehAB ( Table 2 ) are also upregulated in the urinary tract ( Roos and Klemm , 2006 ) and so share an expression pattern with genes known to be regulated by NsrR . 
+ Several of the other targets listed in Table 2 have been reported to respond to sources of NO or to S-nitrosoglutathione in other transcriptomics experiments ( Justino et al. , 2005 ; Hyduke et al. , 2007 ; Pullan et al. , 2007 ; Bourret et al. , 2008 ; Jarboe et al. , 2008 ) . 
+ In total , of the 62 targets implicated by the ChIP-chip data , 33 have been previously shown to be inﬂuenced by NsrR and/or sources of NO or nitrosative stress ( Table 2 ) . 
+ Nine transcription units were previously suggested by transcriptomics to be repressed by NsrR ( Filenko et al. , 2007 ) ; the promoter regions of seven of these ( ytfE , hmp , hcp , nrfA , yccM , ygbA and napF ) are bound by NsrR in vivo according to the ChIP-chip data ( Table 2 ) , and two ( uspF and yeaR ) are not . 
+ Visual inspection of the raw ChIP-chip data conﬁrmed the absence of signals for uspF and yeaR . 
+ Direct regulation of yeaR by NsrR has been demonstrated ( Lin et al. , 2007 ) , hence this is a true false-negative in the ChIP-chip data . 
+ One possible explanation is that the 3X-Flag tag on NsrR is occluded by other proteins bound to the yeaR promoter . 
+ Overexpression of NsrR causes reduced expression of the small RNA RybB and of the rpoE gene encoding σE ( Thompson et al. , 2007 ) . 
+ Neither gene is bound by NsrR in its promoter region according to the ChIP-chip data . 
+ We assume that these are also false negatives , or that NsrR regulation of rybB and rpoE is indirect . 
+ The transcriptomics data provided good evidence of one gene positively regulated by NsrR , ydbC ( Filenko et al. , 2007 ) . 
+ We found no NsrR-binding site in the ydbC promoter , though there is a site in the downstream gene , ydbD ( Table 2 ) , which is upregulated by NO ( Justino et al. , 2005 ) . 
+ More than 50 promoters implicated as NsrR targets by ChIP-chip were not identiﬁed in a transcriptomics experiment in which an nsrR mutation was phenocopied by repressor titration ( Filenko et al. , 2007 ) . 
+ In some cases this may simply be because NsrR binding to DNA has no regulatory consequence . 
+ More likely explanations are that the repressor titration approach used was not very sensitive ( see below ) , and/or that some NsrR targets are subject to additional layers of regulation , such that they would not be identiﬁed in a straightforward analysis of the transcriptome under a limited range of growth conditions . 
+ The latter consideration applies , for example , to the tynA and feaB genes , which were identiﬁed as potential NsrR targets by ChIP-chip ( Table 2 ) , but which are subject to regulation by NsrR only in cultures grown on unusual carbon or nitrogen sources ( Rankin et al. , 2008 ) . 
+ In several cases , NsrR-binding sites are close to the 5 ′ ends of genes that are ( or are probably ) internal to single transcription units , and therefore are not associated with promoters ; examples are feoB , hcr and nrdB ( Table 2 ) . 
+ The regulatory signiﬁcance , if any , of these sites is not known ; hcr is particularly interesting because the promoter-proximal gene of the operon ( hcp ) also has an NsrR-binding site and is regulated by NsrR ( Filenko et al. , 2007 ) . 
+ The new potential targets for NsrR regulation ( Table 2 ) include genes and operons involved in carbon and energy metabolism ( hycA/hypA , feaB , aceE , mhpT , tynA , caiA and ndh ) , NO metabolism ( norR/norV ) , proteolysis ( clpB , ftsH and ptrA ) , transport processes ( mhpT , yhfC , dsdX and yhfC ) , stress responses ( sodB and sufA ) and motility ( mqsR , ﬂiL and ﬂiA ) . 
+ In an initial follow-up study , we have conﬁrmed NsrR regulation of tynA and feaB ( Rankin et al. , 2008 ) . 
+ Computational analysis of NsrR-binding sites
+ Non-coding regions that contain NsrR-binding sites as revealed by ChIP-chip were initially scrutinized for common potential regulatory sequences using WEEDER ( Pavesi et al. , 2004 ) . 
+ This search suggested that most novel potential NsrR targets do not contain a sequence resembling the long inverted repeat that is present in the ytfE , hmp and ygbA promoters ( Fig. 1 ) . 
+ However , a sequence resembling half of the inverted repeat could be detected in many cases . 
+ To extend this search , we constructed a position-speciﬁc score matrix ( PSSM ) from the six easily detectable half-sites in the ytfE , hmp and ygbA promoters and used the PSSM to search the E. coli genome with Virtual Footprint ( Münch et al. , 2005 ) . 
+ As new half sites in promoters known to be bound by NsrR ( Table 2 ) were detected , they were added to the PSSM and the search was repeated iteratively . 
+ In this way , we identiﬁed 49 potential NsrR-binding sites in 37 of the intergenic regions to which NsrR binds in vivo ( Table 2 ) . 
+ The sequence logo for these 49 sites is shown in Fig. 1C , which reveals that the consensus sequence for the suggested NsrR-binding site contains a 12 bp interrupted partial palindrome , 5 ′ - AANATGCATTTN-3 ′ , corresponding to one half of the previously described inverted repeat sequence ( Fig. 1A ; see above ) . 
+ In a number of promoters , the computational search failed to identify signiﬁcant matches to the PSSM ( Table 2 ) . 
+ Similarly , in several other ChIP-chip studies , binding sites have been identiﬁed in chromosomal regions that do not contain a good match to the consensus sequence for the regulatory protein concerned ( reviewed in Wade et al. , 2007 ) . 
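The idea of building a position-specific score matrix from aligned half-sites and scanning a sequence with it can be sketched as follows. This is not a reimplementation of Virtual Footprint: the log-odds scoring, the pseudocount, and the uniform background frequency are all assumptions, the function names are hypothetical, and the example half-sites in the usage note are made-up variants of the consensus, not the sites from Table 2.

```python
import math

BASES = "ACGT"


def build_pssm(sites, pseudocount=1.0, background=0.25):
    """Log-odds matrix (one dict per column) from equal-length aligned sites."""
    pssm = []
    for column in range(len(sites[0])):
        counts = {base: pseudocount for base in BASES}
        for site in sites:
            counts[site[column]] += 1
        total = sum(counts.values())
        pssm.append({base: math.log2(counts[base] / total / background)
                     for base in BASES})
    return pssm


def score(pssm, window):
    """Sum of the column scores for one window of the same width as the matrix."""
    return sum(column[base] for column, base in zip(pssm, window))


def best_hit(pssm, sequence):
    """(score, start index) of the best-scoring window on the given strand."""
    width = len(pssm)
    return max((score(pssm, sequence[i:i + width]), i)
               for i in range(len(sequence) - width + 1))
```

For example, with the illustrative sites `["AAGATGCATTT", "AACATGCATTT", "AATATGCATTT"]`, scanning `"CCCCCAAGATGCATTTGGGGG"` places the best hit at index 5. In an iterative search of the kind described above, newly accepted half-sites would be appended to the site list and the matrix rebuilt before the next scan.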
+ Regulation of motility genes by NsrR
+ We were particularly interested to observe NsrR-binding sites associated with the promoter regions of transcription units containing genes that are known ( ﬂiAZY and ﬂiLMNOPQR ) or suspected ( mqsR-ygiT ) to have roles in motility and sessile growth ( Fig. 2 ) . 
+ The ﬂiA gene encodes the alternative sigma factor , σ28 , which is required for the transcription of Class III motility and chemotaxis genes ( Chilcott and Hughes , 2000 ) . 
+ The ﬂiZ gene , which is co-transcribed with ﬂiA , encodes a protein which acts as a positive regulator of motility , and as an inhibitor of the expression of curli ﬁmbriae , which are required for surface-attached growth ( Pesavento et al. , 2008 ; Saini et al. , 2008 ) . 
+ The ﬂiLMNOPQR operon encodes structural components of the ﬂagellum and the ﬂagellin export apparatus ( Chilcott and Hughes , 2000 ) . 
+ The mqsR gene ( which is very likely co-transcribed with ygiT , a predicted regulatory gene ) has been described as a regulator of motility and bioﬁlm formation ( Gonzalez Barrios et al. , 2006 ) . 
+ The predicted NsrR-binding site in the ﬂiA promoter overlaps the start site for transcription by RNA polymerase containing σ28 , the product of the ﬂiA gene ( Fig. 3 ) . 
+ The site is therefore well situated to mediate negative regulation of ﬂiAZY expression . 
+ To test the functionality of this site , we substituted the highly conserved G at position 6 ( Fig. 1C ) with a C . 
+ We chose to mutate position 6 because the equivalent mutation in the ytfE promoter causes a severe phenotype ( Table 1 ) and this nucleotide is located such that the mutation is unlikely to affect either the σ70 or the σ28 promoter of ﬂiA ( a substitution at position 2 would change the σ28 transcription start site ) . 
+ The G to C mutation is on the bottom strand of the ﬂiA promoter ( Fig. 3A ) ; the mutant promoter is designated ﬂiAc . 
+ NsrR binding to the ﬂiA promoter in vivo was examined by ChIP . 
+ When PCR ampliﬁcation of immunoprecipitated DNA was optimized to allow detection of NsrR binding to the wild-type ﬂiA promoter , binding to the ﬂiAc promoter was undetectable above background levels ( Fig. 3B ) . 
+ Thus , these experiments conﬁrm the presence of an NsrR-binding site in the ﬂiA promoter as was suggested by ChIP-chip ( Fig. 2 ) and bioinformatic analysis , and provide experimental support for the revised consensus sequence for NsrR-binding sites . 
+ To quantify the regulation of motility genes by NsrR , we constructed lacZ reporter fusions to the ﬂiA , ﬂiL and mqsR promoters and measured their activities in an nsrR mutant and a strain containing multiple copies of the nsrR gene , in the presence and absence of a source of NO . 
+ We found no evidence for regulation of the mqsR promoter by NsrR ( data not shown ) , possibly because we have yet to identify suitable growth conditions that reveal regulation by NsrR . 
+ The ﬂiA and ﬂiL promoters were 1.9 - and 1.7-fold upregulated in an nsrR mutant respectively , and had moderately increased activities in the presence of a source of NO ( Table 3 ) . 
+ In the presence of multiple copies of nsrR , the activities of both promoters were 6 -- 7 fold reduced , an effect that was partially reversed by the addition of a source of NO to growth media ( Table 3 ) . 
+ Taken together , these data indicate that NsrR is a negative regulator of both the ﬂiA and the ﬂiL promoters . 
+ Assay of a ﬂiAc -- lacZ fusion revealed that the single base pair mutation in the ﬂiAc promoter mimicked the effect of NO in an otherwise wild-type strain ( Table 3 ) . 
+ The partial de-repression caused by the ﬂiAc mutation could be overcome by the presence of multiple copies of nsrR ( Table 3 ) . 
+ These results are consistent with the ﬂiAc mutation lowering the affinity of the NsrR-binding site in the ﬂiA promoter , as was suggested by ChIP . 
+ The ﬂiA -- lacZ reporter fusion was ~ 2-fold upregulated in a strain transformed with a high copy number plasmid containing the cloned ytfE promoter ( data not shown ) . 
+ This effect of the ytfE promoter was abolished by the deletion at position 12 of the NsrR-binding site ( Fig. 1B ) , suggesting that multiple copies of the NsrR-binding site in ytfE de-repress the ﬂiA promoter by repressor titration ( hence providing additional conﬁrmation of the presence of an NsrR-binding site in ﬂiA ) . 
+ The small magnitude of the repressor titration effect on ﬂiA likely explains why ﬂiA was not identiﬁed as an NsrR target in the transcriptomics analysis ( Filenko et al. , 2007 ) . 
+ In the reciprocal experiment , multiple copies of the ﬂiA promoter failed to cause de-repression of the ytfE -- lacZ reporter fusion ( and also failed to de-repress the ﬂiA promoter , data not shown ) . 
+ One possible explanation is that the inverted repeat sequence in ytfE provides a higher affinity NsrR-binding site than the single half site in ﬂiA . 
+ The same consideration may also explain why deletion of the central base pair in the inverted repeat in ytfE abolishes repressor titration despite preserving two intact half sites . 
+ Regulation of motility by NsrR
+ Flagella-based motility was assayed on soft agar plates . 
+ An nsrR mutant showed a small though reproducible and signiﬁcant increase in motility ( 1.3-fold ; P < 0.0001 ) as compared with the wild-type strain ( measured as the diameter of the motility ring ; data not shown ) . 
+ Addition of an NO source caused a similar increase in motility in a wild-type strain but not in an nsrR mutant . 
+ These observations are consistent with the negative regulation of motility genes by NsrR that was measured in reporter fusion assays . 
+ In a strain containing multiple copies of nsrR , the motility ring was ~ 2-fold smaller ( P < 0.0001 ) than in a control strain with a single chromosomal copy of nsrR ( Fig. 4 ) . 
+ This effect of NsrR on motility was reversed in plates supplemented with a slow-releasing source of NO ( Fig. 4 ) , suggesting that inactivation of NsrR alleviates negative control of motility . 
+ The NsrR protein contains three conserved cysteine residues thought to be involved in the co-ordination of an [ Fe-S ] cluster , which is the likely site of NO sensing ( Isabella et al. , 2008 ; Tucker et al. , 2008 ; Yukl et al. , 2008 ) . 
+ We have substituted cysteine 96 with serine , and found that the NsrR-C96S variant is unable to repress fully the NsrR targets ytfE , hmp , hcp and ygbA ( J. Partridge and S. Spiro , unpubl. data ) . 
+ The C96S protein also has no negative effect on motility ( data not shown ) , conﬁrming that the effect of NsrR on motility requires the protein to be in a form that is competent to control transcription . 
+ This excludes the possibility that inhibition of motility by multiple copies of nsrR is a non-speciﬁc consequence of protein over-production . 
+ Motility is a variable and strain-speciﬁc phenotype in E. coli . 
+ In similar assays to those described above , we showed that multiple nsrR copies inhibit motility to a similar extent ( ~ 2-fold ; P < 0.0001 ) in an E. coli K12 strain ( RP437 ) that is frequently used for assays of motility and chemotaxis . 
+ It has recently been shown that hmp mutants of E. coli are non-motile ( Stevanin et al. , 2007 ) , though on the succinate medium used the phenotype would also be consistent with a defect in aerotaxis . 
+ As the hmp gene is negatively regulated by NsrR ( Bodenmiller and Spiro , 2006 ) , one possible interpretation of our results is that the motility defect associated with an increased nsrR copy number results from downregulation of hmp . 
+ We assayed the motility of hmp mutants of MG1655 and RP437 , using both tryptone and succinate soft agar ( Stevanin et al. , 2007 ) . 
+ With these strains , we found no detectable effect of hmp on E. coli motility . 
+ Thus , the motility phenotype that we observe when nsrR is deleted or overexpressed is not a consequence of hmp up- or downregulation , respectively . 
+ NsrR regulates motility and surface attachment in a uropathogenic strain of E. coli
+ We were interested to determine whether the effects of NsrR ( and NO ) on motility that we observed in K12 strains are generalizable to pathogenic strains of E. coli . 
+ We focused on a uropathogenic strain of E. coli ( UPEC ) that is associated with urinary tract infections . 
+ The amino acid sequence of NsrR , and the nucleotide sequence of the ﬂiA promoter shown in Fig. 3 are identical in the UPEC strain CFT073 and MG1655 . 
+ In CFT073 , ﬂagella-based motility is important for the organism 's ability to ascend the urinary tract and disseminate further in the host ( Lane et al. , 2007 ) . 
+ Furthermore , some genes that are regulated by NsrR in K12 strains are upregulated during urinary tract infection , notably the hmp gene encoding the NO detoxifying haemoglobin ( Snyder et al. , 2004 ) . 
+ Thus , there is evidence that both motility and NsrR might have important roles in vivo , and transcriptomics data suggest that CFT073 is exposed to NO ( Snyder et al. , 2004 ) . 
+ We found that nsrR overexpression exerted a greater negative effect on motility in CFT073 ( > 3-fold ; P < 0.0001 ) than was the case in K12 strains , an effect that was reversed by addition of NO to the medium ( Fig. 4 ) . 
+ NO caused a small though signiﬁcant ( P < 0.0001 ) stimulation of CFT073 motility ( Fig. 4 ) , as did deletion of the nsrR gene ( Fig. 4 ) . 
+ Thus , NsrR is a negative regulator of motility in CFT073 , and NO inﬂuences motility via NsrR . 
+ As for K12 strains , we found no motility phenotype associated with an hmp mutation in CFT073 ( data not shown ) . 
+ As motility and attached growth are typically subject to reciprocal regulation , we measured the ability of CFT073 ( and derivatives deleted for nsrR or containing multiple copies of the nsrR gene ) to adhere to the surface of glass tubes . 
+ Deletion of nsrR or addition of a source of NO signiﬁcantly reduced attached growth ( Fig. 5 ) . 
+ The presence of multiple copies of nsrR stimulated attached growth , an effect that was partially reversed by the addition of NO ( Fig. 5 ) . 
+ These results indicate that NsrR regulates attached growth in CFT073 ( most likely indirectly ) and that NO inﬂuences attached growth via NsrR . 
+ Discussion
+ Data presented in this paper suggest that NsrR-binding sites in E. coli fall into two classes : those ( such as the site in ytfE ) comprising two copies of an 11 bp inverted repeat with 1 bp spacing , and those ( exempliﬁed by the site in ﬂiA ) which have a single copy of the 11 bp element . 
+ The available information suggests that the inverted repeat is a higher-affinity site that allows NsrR repression to operate over a larger range . 
+ A logical corollary of this suggestion is that the two types of site are occupied by NsrR in different oligomeric states . 
+ As the 11 bp motif is a palindrome ( Fig. 1C ) , it may be a binding site for an NsrR dimer , in which case the 23 bp inverted repeat might be occupied by a dimer of dimers . 
+ In sedimentation equilibrium experiments , the Streptomyces coelicolor NsrR formed a sequence-speciﬁc complex with DNA with a molecular weight consistent with the protein being dimeric ( Tucker et al. , 2008 ) . 
+ However , these experiments were done with protein containing a [ 2Fe-2S ] cluster ; the physiologically relevant form of NsrR may contain a [ 4Fe-4S ] cluster ( Yukl et al. , 2008 ) . 
+ The [ 4Fe-4S ] NsrR from Bacillus subtilis is dimeric in solution , but its oligomeric state in the presence of a DNA target has not been examined ( Yukl et al. , 2008 ) . 
+ Interestingly , the NsrR homologue IscR also binds to two types of site ( Type 1 and Type 2 ) , though in this case they are unrelated sequences . 
+ Binding to Type 2 sites does not require the [ Fe-S ] cluster of IscR , and two IscR dimers bind cooperatively to a Type 2 site ( Nesbit et al. , 2009 ) . 
+ We have demonstrated that the NO-sensitive repressor protein NsrR is a negative regulator of motility genes and of ﬂagella-based motility in E. coli K12 . 
+ We propose that NO exerts effects on motility through NsrR-mediated regulation of the ﬂiA promoter . 
+ The ﬂiA gene product ( σ28 ) is required for the expression of all Class III ﬂagella and chemotaxis genes ( Chilcott and Hughes , 2000 ) , so by regulating ﬂiA NsrR potentially exerts widespread indirect effects on motility and chemotaxis , both of which are involved in migration through soft agar ( Wolf and Berg , 1989 ) . 
+ The effects of NsrR on ﬂiA promoter activity and on motility are quite small , and are more pronounced in strains containing multiple copies of nsrR than in an nsrR mutant . 
+ Similar contrasts between deletion and overexpression have been observed previously for genes regarded as negative regulators of motility . 
+ For example , the pefI-srgD genes of Salmonella enterica serovar Typhimurium were recently described as negative regulators of motility , despite the fact that there is no phenotype associated with deletion of the genes , which inhibit motility when expressed from the araBAD or tetA promoters ( Wozniak et al. , 2009 ) . 
+ We also showed that NsrR regulates motility and attached growth in a UPEC strain . 
+ A search of the UPEC strain CFT073 genome with the same PSSM used to search the E. coli K12 genome revealed predicted NsrR-binding sites in the promoter regions of genes involved in the production of pili ( papI and papI_2 ) and ﬁmbriae ( sfaB , C1936 , ipbA and ipuA ) . 
+ Thus it is possible that the effect of NsrR on attached growth is multifactorial and indirect . 
+ One possible component of the effect is that changes in ﬂiZ expression mediated by NsrR regulation of the ﬂiAZY promoter lead to altered levels of expression of curli ﬁmbriae . 
+ In E. coli K12 , FliZ acts by indirectly causing downregulation of genes involved in the expression of curli ﬁmbriae , which are required for surface attachment ( Pesavento et al. , 2008 ) . 
+ Circumstantial evidence has previously implicated NO as a regulator of chemotaxis , motility and bioﬁlm development . 
+ Haem-containing NO-binding domains of methyl accepting chemotaxis proteins have been characterized ( Karow et al. , 2004 ; Nioche et al. , 2004 ) , although the prediction that these proteins mediate taxis towards or away from NO has not been tested . 
+ In transcriptomic experiments , the expression of some motility genes has been observed to be perturbed by exposure of cultures to sources of NO or nitrosative stress ( imposed by S-nitrosothiols ) , although both positive and negative responses have been reported , and the regulators involved were not identiﬁed ( Bourret et al. , 2008 ; Constantinidou et al. , 2006 ; Jarboe et al. , 2008 ) . 
+ In the nonpathogenic organism Nitrosomonas europaea , NO stimulates bioﬁlm formation ( Schmidt et al. , 2004 ) . 
+ In Azotobacter vinelandii , expression of the ﬂhDC genes ( which encode the master regulator of motility ) is negatively regulated by the oxygen-sensor CydR , an orthologue of the E. coli FNR protein ( León and Espín , 2008 ) . 
+ CydR is sensitive to NO ( Wu et al. , 2000 ) , suggesting that exposure to NO might stimulate motility in A. vinelandii via increased expression of ﬂhDC . 
+ In Pseudomonas aeruginosa and Staphylococcus aureus , NO inhibits bioﬁlm formation or stimulates dispersal , and NO stimulates motility in P. aeruginosa ( Barraud et al. , 2006 ; Van Alst et al. , 2007 ; Schlag et al. , 2007 ) . 
+ A molecular mechanism which accounts for the effects of NO on bioﬁlm development or motility in these organisms has not previously been described , though there has been some speculation about the regulatory proteins involved that might act as receptors for NO ( Romeo , 2006 ) . 
+ The mechanism we propose in this paper may not be applicable to P. aeruginosa and S. aureus , because those species do not have obvious orthologues of NsrR . 
+ Nevertheless , we suggest that other NO sensing transcriptional regulators ( Spiro , 2007 ; Rodionov et al. , 2005 ) might play an equivalent role in these cases . 
+ Experimental procedures
+ Strains, media and growth conditions
+ The strains and plasmids used in this work are listed in Table 4 . 
+ The rich medium was L Broth ( tryptone , 10 g l-1 ; yeast extract , 5 g l-1 ; NaCl , 5 g l-1 ) . 
+ A mineral salts medium ( Spencer and Guest , 1973 ) supplemented with glucose ( 0.5 % and 0.2 % , w/v , for anaerobic and aerobic cultures respectively ) , casamino acids ( 0.05 % , w/v ) and thiamine ( 5 µg ml-1 ) was used for growth of cultures for β-galactosidase assays . 
+ Ampicillin ( 100 µg ml-1 ) and kanamycin ( 25 µg ml-1 ) were added as required . 
+ Cultures were grown aerobically or anaerobically as previously described ( Bodenmiller and Spiro , 2006 ) . 
+ For β-galactosidase assays ( Miller , 1992 ) , aerobic cultures were treated with 50 mM spermine-NONOate , and anaerobic cultures with 5 mM nitrite when in early exponential phase ( OD600 = 0.15 -- 0.3 ) , then were assayed 90 min later while still in log phase . 
+ Spermine-NONOate liberates two equivalents of NO with a half-life of 39 min at 37 °C ( Cayman Chemicals ) , and under our culture conditions caused little or no growth inhibition at this concentration . 
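+ The release kinetics quoted above can be turned into a quick estimate of how much donor has decomposed by the time of assay. The sketch below assumes simple first-order decay, which is an assumption made here for illustration and not a statement from the supplier; the function names are hypothetical. With a 39 min half-life, the 90 min interval between treatment and assay spans about 2.3 half-lives, so roughly 80 % of the donor would have decomposed.

```python
def fraction_decomposed(elapsed_min: float, half_life_min: float) -> float:
    """Fraction of an NO donor decomposed after elapsed_min.

    Assumes first-order decay (an illustrative assumption).
    """
    if half_life_min <= 0:
        raise ValueError("half_life_min must be positive")
    return 1.0 - 0.5 ** (elapsed_min / half_life_min)


def no_released(donor_conc: float, elapsed_min: float,
                half_life_min: float, equivalents: float = 2.0) -> float:
    """Cumulative NO released, in the same concentration units as donor_conc.

    equivalents is the number of NO molecules liberated per donor molecule
    (two for the NONOates described in the text).
    """
    return equivalents * donor_conc * fraction_decomposed(elapsed_min, half_life_min)


if __name__ == "__main__":
    # Half-life of 39 min, cultures assayed 90 min after treatment.
    print(round(fraction_decomposed(90, 39), 3))
```

+ The estimate ignores consumption of NO by the cells and loss to the gas phase, so it is an upper bound on cumulative release rather than a measure of the NO concentration actually experienced by the culture.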
+ The mqsR , ﬂiA and ﬂiL promoter regions were ampliﬁed by PCR ( all primer sequences are available from the authors on request ) and fused to lacZ in pRS415 , transferred to λRS45 , and integrated into the chromosome as described previously ( Simons et al. , 1987 ; Bodenmiller and Spiro , 2006 ) . 
+ Genes were disrupted by replacing the coding region with a kanamycin-resistance cassette using the λ Red recombinase method with pKD4 as the template ; mutations were converted to unmarked deletions using pCP20 ( Datsenko and Wanner , 2000 ) . 
+ The nsrR plasmid pJP07 is a derivative of p2795 ( Husseiny and Hensel , 2005 ) and has been described previously ( Rankin et al. , 2008 ) . 
+ NsrR with a C96S substitution was expressed from the equivalent plasmid pJP09 . 
+ The ﬂiA promoter ( on a 474 bp fragment in pSTBlue ) was mutated in the putative NsrR-binding site , and the mutant promoter was fused to lacZ in pRS415 as described above . 
+ Site-directed mutants were introduced using the QuikChange Site-Directed Mutagenesis Kit ( Stratagene ) according to the manufacturer 's instructions . 
+ For random mutagenesis , pGIT1 ( Bodenmiller and Spiro , 2006 ) was mutagenized with the GeneMorph PCR mutagenesis kit ( Stratagene ) according to the manufacturer 's instructions . 
+ After mutagenesis , plasmid DNA was transformed into strain JOEY19 ( λytfE-lacZ ) , and transformants were screened on L agar containing Xgal . 
+ Colonies with a white or pale blue phenotype were selected , plasmid DNA was puriﬁed and the sequence of the ytfE fragment determined . 
+ Clones with multiple mutations were not further analysed . 
+ Mutant DNAs of interest generated by random or site-directed mutagenesis were cloned into pRS415 , and integrated on to the chromosome at the lambda attachment site , as previously described . 
+ ChIP and ChIP-chip
+ For ChIP analysis of the ytfE and ﬂiA promoters , chromatin was precipitated from cultures of strains expressing 3XFlag-tagged NsrR ( or an untagged control ) and transformed with pSTBlue-1 derivatives containing wild type or mutant ytfE or ﬂiA sequences . 
+ Precipitated DNAs were puriﬁed and equal amounts of templates ( 1 ng for ytfE , 2 ng for ﬂiA ) were ampliﬁed by 16 cycles of PCR with primers ﬂanking the cloning site in pSTBlue . 
+ ChIP-chip was performed and data analysed as described previously ( Efromovich et al. , 2008 ) . 
+ Chromatin immunoprecipitation and microarray data have been deposited in the GEO database ( accession GSE11230 ) . 
+ Motility and attachment assays
+ Motility was assayed on soft agar swim plates inoculated with 4 µl of an exponential phase ( OD600 ~ 0.5 ) culture . 
+ The plates contained 1 % tryptone , 0.25 % NaCl , 0.3 % Difco agar , and antibiotics as required . 
+ The plates were incubated in a wet box for 20 h at 30 °C . 
+ Surface attachment to 16 mm glass tubes was assayed in standing cultures grown for 24 h at 30 °C in L broth , using the crystal violet staining method ( Pratt and Kolter , 1998 ) . 
+ For treatment with NO , the plates or standing cultures were supplemented with 100 mM ( for K12 strains ) or 250 mM ( for CFT073 ) diethylenetriamine-NONOate , which liberates two equivalents of NO with a half-life of 20 -- 56 h under the conditions of these experiments ( Cayman Chemicals ) . 
+ Acknowledgements
+ We are grateful to Sandy Parkinson , Harry Mobley , Michael Hensel , Barry Wanner and Valley Stewart for generously providing strains and plasmids , to Gladys Alexandre , Sam Efromovich , Mike Manson and David Grainger for helpful discussions , and to Ray Dixon for comments on the manuscript . 
+ This work was supported in part by Grant MCB-0702858 from the National Science Foundation .
\ No newline at end of file
--- a/data/TEXT_FILES/useful_txt/19706412.txt 0 → 100644
View file @27818a9
+++ b/data/TEXT_FILES/useful_txt/19706412.txt 0 → 100644
View file @27818a9
+ Rho directs widespread termination of intragenic
+ The transcription termination factor Rho is a global regulator of RNA polymerase ( RNAP ) . 
+ Although individual Rho-dependent terminators have been studied extensively , less is known about the sites of RNAP regulation by Rho on a genome-wide scale . 
+ Using chromatin immunoprecipitation and microarrays ( ChIP-chip ) , we examined changes in the distribution of Escherichia coli RNAP in response to the Rho-speciﬁc inhibitor bicyclomycin ( BCM ) . 
+ We found 200 Rho-terminated loci that were divided evenly into 2 classes : intergenic ( at the ends of genes ) and intragenic ( within genes ) . 
+ The intergenic class contained noncoding RNAs such as small RNAs ( sRNAs ) and transfer RNAs ( tRNAs ) , establishing a previously unappreciated role of Rho in termination of stable RNA synthesis . 
+ The intragenic class of terminators included a previously uncharacterized set of short antisense transcripts , as judged by a shift in the distribution of RNAP in BCM-treated cells that was opposite to the direction of the corresponding gene . 
+ These Rho-terminated antisense transcripts point to a role of noncoding transcription in E. coli gene regulation that may resemble the ubiquitous noncoding transcription recently found to play myriad roles in eukaryotic gene regulation . 
+ Jason M. Petersa,b, Rachel A. Mooneya, Pei Fen Kuanc, Jennifer L. Rowlanda, Sündüz Keleşc,d, and Robert Landicka,e,1
+ Departments of aBiochemistry, bGenetics, cStatistics, dBiostatistics and Medical Informatics, and eBacteriology, University of Wisconsin, Madison, WI 53706 Edited by Jeffrey W. Roberts, Cornell University, Ithaca, NY, and approved July 16, 2009 (received for review April 8, 2009)
+ Transcription termination is critical for maintaining control over gene expression . 
+ Bacteria employ 2 distinct types of termination : ( i ) intrinsic termination , for which a GC-rich RNA hairpin followed by a U-tract dissociates RNA polymerase ( RNAP ) without the need for accessory proteins , and ( ii ) factor-dependent termination caused by the Rho protein . 
+ Rho was originally identified as a factor that increased the `` accuracy '' of in vitro transcription by terminating RNAP at specific positions on a bacteriophage DNA template ( 1 ) . 
+ Later , Rho was found to be the cause of polarity , whereby the uncoupling of transcription and translation by premature stop codons decreases gene expression of downstream genes in an operon ( 2 ) . 
+ Rho is a homohexameric protein with RNA-dependent ATPase activity ( 3 ) . 
+ Rho binds to the nascent RNA and translocates 5′ to 3′ along RNA using energy derived from ATP hydrolysis ( 4 ) . 
+ At certain sites , Rho contacts RNAP , and terminates the elongation complex ( EC ) by an unknown mechanism ( 5 ) . 
+ Bicyclomycin ( BCM ) is a specific inhibitor of Rho ( 6 ) . 
+ BCM blocks Rho-dependent termination in vivo ( 7 ) and in vitro ( 8 ) through noncompetitive inhibition of the RNA-dependent ATPase activity of Rho ( 9 ) . 
+ Biochemical and structural analyses show that BCM binds adjacent to the ATPase of Rho ( 8 ) and prevents ATP hydrolysis by interfering with a key glutamic acid residue that is involved in catalysis ( 10 ) . 
+ Treatment of wild-type E. coli K-12 with high concentrations of BCM is lethal ( 6 ) , because rho is an essential gene ( 11 ) . 
+ However , sublethal doses of BCM are sufficient to perturb Rho termination in vivo ( 7 ) . 
+ Genome-wide studies have documented the role of Rho as a global regulator of RNAP . 
+ Chromatin immunoprecipitation assays using tiling microarrays ( ChIP-chip ) revealed remarkably similar global distributions of RNAP and Rho on DNA ( 12 ) . 
+ These similar distributions suggest that Rho contacts ECs soon after initiation , interacts with ECs throughout elongation , and interacts with ECs on nearly all transcription units ( TUs ) , rather than having specificity for a small set of genes . 
+ Cardinale et al. ( 13 ) used expression array analysis to gauge the effect of BCM treatment on mRNA levels . 
+ Their findings showed changes in abundance of a subset of transcripts , particularly for genes integrated into the genome by horizontal transfer . 
+ Thus , Rho termination occurs preferentially on a subset of genes , even though its physical distribution is widespread . 
+ However , the specific locations of BCM-inhibited Rho-dependent terminators have not yet been determined . 
+ We used ChIP-chip to examine changes in the distribution of RNAP in response to Rho inhibition by BCM . 
+ We found 200 Rho-terminated loci where BCM shifted the distribution of RNAP downstream of the apparent termination site . 
+ Half of the Rho-dependent terminators were located at the 3′ ends of genes ( intergenic ) , including small RNAs ( sRNAs ) and transfer RNAs ( tRNAs ) . 
+ The other half were found within the coding sequence of annotated genes ( intragenic ) . 
+ For one set of intragenic terminators , the readthrough event was in the opposite direction of the gene , indicating antisense transcription . 
+ Results
+ BCM Alters the Distribution of RNAP . 
+ To determine the contribution of Rho to the genome-wide distribution of RNAP , ChIP was performed on cells grown in the presence or absence of BCM at 20 µg/mL . 
+ This concentration of BCM was chosen because it did not alter the growth rate of cells under the conditions used in these experiments ( 14 ) , and thus limited the potential indirect effects that could result from inhibiting Rho . 
+ DNAs from ChIP experiments targeting the β or β′ subunit of RNAP and `` input '' genomic DNA were differentially labeled with Cy3 and Cy5 dyes , then hybridized to a tiling microarray ( see Materials and Methods ) , revealing the RNAP distribution in BCM-treated and untreated conditions . 
+ Independent biological replicates showed good agreement ( Pearson 's r 0.9 ) . 
+ Changes in the distribution of RNAP upon BCM treatment were readily apparent by visual inspection of the data , and were quantified statistically . 
+ A moving average method implemented in the program CMARRT ( 15 ) was used to identify regions where at least 3 consecutive probes exhibited increased ChIP-chip signal in BCM-treated cells versus untreated cells ( see Materials and Methods ; no BCM-induced reductions in RNAP occupancy were detected ) . 
+ This analysis revealed a total of 199 BCM significant regions ( BSRs ) dispersed throughout the E. coli K-12 chromosome . 
+ Most of the probes with increased ChIP-chip signal in BCM-treated cells were within BSRs , but they represented only a small percentage of the total probes ( 3 % ; Fig. 1A ) . 
+ This suggests that the effects of BCM were mostly direct consequences of Rho inhibition rather than a large-scale redistribution of RNAP in response to cellular stresses or other pleiotropic effects . 
+ Author contributions : J.M.P. , R.A.M. , and R.L. designed research ; J.M.P. , R.A.M. , and J.L.R. performed research ; J.M.P. , P.F.K. , and S.K. analyzed data ; and J.M.P. , S.K. , and R.L. wrote the paper . 
+ The authors declare no conﬂict of interest . 
+ This article is a PNAS Direct Submission . 
+ Data deposition : The data reported in this paper have been deposited in the Gene Expression Omnibus ( GEO ) database , www.ncbi.nlm.nih.gov/geo ( accession no. GSE16562 ) . 
+ 1 To whom correspondence should be addressed . 
+ E-mail : landick@bact.wisc.edu . 
+ This article contains supporting information online at www.pnas.org/cgi/content/full/0903846106/DCSupplemental . 
+ The BSR dataset was compared to previously characterized Rho-dependent terminators , which confirmed that BCM effectively inhibited Rho in our experiment . 
+ For instance , the rho gene is autoregulated by a Rho-dependent terminator immediately upstream of its coding sequence ( 16 ) . 
+ As expected , a BSR was found at the rho locus just after the rhoL gene ( Fig. 1B ) . 
+ In untreated cells , ChIP-chip signal for RNAP was highest at the rho promoter , situated just upstream of rhoL . 
+ After the rhoL gene , the signal for RNAP decreased , indicative of Rho-dependent termination . 
+ When BCM was used to inhibit Rho function , however , the RNAP signal remained high throughout the Rho-dependent terminator region and gradually decreased across the rho gene , indicating readthrough of the rhoL terminator . 
+ Our findings also are broadly consistent with effects of BCM on global mRNA expression reported by Cardinale et al. ( 13 ) , but provide high-resolution positional information that could not be accessed through mRNA expression analysis alone . 
+ Based on genomic position , approximately half of all BSRs were located within 300 bp of an expression array probeset that was upregulated at least 2-fold in mRNA expression ( supporting information ( SI ) Fig. S1 ) . 
+ However , the mRNA expression analysis did not detect a large fraction of the BSRs identified in our dataset ( 49 % ) . 
+ The lower resolution of the expression array data relative to the tiling array-based ChIP-chip data likely explains this discordance , although differences in experimental growth conditions could also contribute . 
+ Table 1 ( column headings and footnotes only ; values not recovered ) : BSR type ; tRNA a ; sRNA b ; mRNA c ; Total ; K-12 speciﬁc d ; Prophage d . 
+ c Number of BSRs associated with annotated mRNA genes . 
+ d Number of BSRs associated with E. coli K-12-speciﬁc genes or prophage DNA ( ASAP Database , http://www.genome.wisc.edu/tools/asap.htm ) . 
+ e Directionality was not determined . 
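+ The positional comparison between BSRs and upregulated expression-array probesets (a BSR counts as matched if it lies within 300 bp of a probeset upregulated at least 2-fold) can be sketched as an interval-distance test. The coordinates, the tuple layout and the function names below are hypothetical; the authors' actual procedure is not described in enough detail here to reproduce it.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # (start, end) in bp, inclusive, start <= end


def gap_bp(a: Interval, b: Interval) -> int:
    """Distance in bp between two intervals; 0 if they overlap or touch."""
    return max(0, max(a[0], b[0]) - min(a[1], b[1]))


def fraction_near(bsrs: List[Interval],
                  probesets: List[Tuple[int, int, float]],
                  max_gap: int = 300,
                  min_fold: float = 2.0) -> float:
    """Fraction of BSRs lying within max_gap bp of a probeset whose
    fold change is at least min_fold.

    Each probeset is (start, end, fold_change).
    """
    if not bsrs:
        return 0.0
    upregulated = [(s, e) for s, e, fold in probesets if fold >= min_fold]
    hits = sum(
        1 for bsr in bsrs
        if any(gap_bp(bsr, probe) <= max_gap for probe in upregulated)
    )
    return hits / len(bsrs)


if __name__ == "__main__":
    toy_bsrs = [(100, 200), (1000, 1100)]
    toy_probesets = [(450, 500, 3.1), (1150, 1200, 1.2)]
    print(fraction_near(toy_bsrs, toy_probesets))
```

+ In the toy data the second probeset is adjacent to a BSR but falls below the 2-fold cutoff, so only one of the two BSRs is counted as matched.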
+ Importantly , the ChIP-chip-derived BSR data define the locations at which Rho-dependent termination normally occurs . 
+ To understand the roles of these Rho-dependent terminators , we next sought to associate each BSR with a specific gene . 
+ Although ChIP-chip experiments do not provide strand information per se , the `` directionality '' of terminator readthrough was used to assess the orientation of RNAP on DNA . 
+ An example of directionality can be found at the rho locus ( Fig. 1B ) . 
+ The distribution of RNAP shifts to the right downstream of rhoL in ChIP-chip data from BCM-treated cells compared with untreated cells . 
+ Therefore , the terminator must be on the `` plus '' strand at the 3′ end of rhoL . 
+ This logic was extended to assign each BSR to a particular gene ( Table S1 ) . 
+ When directionality could not be determined ( as was the case in 15 BSRs ) , the BSR was assigned to the gene that contained the majority of significant probes for that BSR . 
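+ The fallback rule above, assigning a BSR to the gene that contains the majority of its significant probes, can be sketched as a simple counting step. The gene names, coordinates and function name are hypothetical, and how ties were handled in the original analysis is not stated; in this sketch a tie is resolved in favour of the gene counted first.

```python
from collections import Counter
from typing import Dict, List, Optional, Tuple


def assign_by_majority(probe_positions: List[int],
                       genes: Dict[str, Tuple[int, int]]) -> Optional[str]:
    """Return the gene containing the most significant probes.

    probe_positions: genomic coordinates of the significant probes in one BSR.
    genes: mapping of gene name to (start, end), inclusive.
    Returns None if no probe falls inside any gene.
    """
    counts: Counter = Counter()
    for pos in probe_positions:
        for name, (start, end) in genes.items():
            if start <= pos <= end:
                counts[name] += 1
    if not counts:
        return None
    return counts.most_common(1)[0][0]


if __name__ == "__main__":
    toy_genes = {"geneA": (100, 500), "geneB": (501, 900)}
    toy_probes = [450, 480, 520, 560, 600]
    print(assign_by_majority(toy_probes, toy_genes))
```

+ Here two probes fall in the first gene and three in the second, so the BSR is assigned to the second gene.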
+ Quantitative PCR of ChIP DNA was used to confirm the array results at 3 of the BSR-associated loci ( rho , valVW , and rygD ; Fig. S2 ) . 
+ Our analysis revealed a diverse set of Rho targets in the E. coli genome ( Figs. 1C and 2 , and Table 1 ) . 
+ Half of the targets ( n = 102 ) were after genes ( intergenic targets ) , where Rho would be expected to terminate transcription . 
+ Most of these followed protein-coding genes ( 83 mRNAs ) , but 12 followed tRNA genes and 7 followed sRNA genes . 
+ However , the other half ( n = 97 ) were within coding regions ( intragenic targets ) , including 25 that could be assigned to antisense transcripts . 
+ This distribution suggests that Rho plays important roles in E. coli transcription in addition to termination at the ends of operons or mediation of polarity . 
+ Rho Termination at tRNAs . 
+ Many tRNA operons appeared to be terminated by Rho . 
+ Of the 36 tRNA-containing TUs located outside of rrn ( and thus subject to termination ) , 12 had a BSR immediately downstream of the mature 3′ end of the last tRNA in the TU ( Table S2 ) . 
+ Rho termination had been previously demonstrated in vivo and in vitro at one of these tRNA TUs ( tyrTV ) ( 17 ) . 
+ Two tRNA loci that show the effects of BCM treatment on the distribution of RNAP are valVW and thrW ( Fig. 3A and Fig. S3 ) . 
+ Although the RNAP ChIP-chip signal is restricted to the tRNA operon itself in untreated cells , BCM treatment caused the distribution of RNAP to extend downstream past the presumed Rho-dependent termination point . 
+ The ChIP-chip signals on tRNA operons without significant BCM effects , such as lysT-valT-lysW-valZ-lysYZQ , were qualitatively and quantitatively distinct from tRNA operons affected by BCM ( cf. Fig. 3A and Fig. S3 to Fig. 3B ) . 
+ To determine the distinctions between tRNA operons that were affected by BCM and those that were not , we analyzed the sequence within and surrounding the BSR . 
+ The number of tRNAs in an operon , and the direction of transcription in genes downstream of the operon , had no relationship with Rho termination ( Table S2 , Fig. 3A , and Fig. S3 ) . 
+ Additionally , no obvious `` termination sequence '' could be ascribed to tRNA BSRs using motif-finding algorithms ( e.g. , MEME ; http://meme.sdsc.edu/meme4/ ) . 
+ However , the first 50 nucleotides after the mature 3′ end of the tRNA differed significantly in GC content for tRNAs affected by BCM ( Table S3 ) . 
+ Although these sequences were only 25 % C on average , they were significantly more enriched for C than their non-Rho terminated counterparts ( Student 's t test , P < 0.01 ) and significantly depleted in G . 
+ The average G content was only 12 % within the first 50 nucleotides after these tRNAs , which was highly significant compared with tRNAs without corresponding BSRs ( Student 's t test , P < 0.001 ) . 
+ These patterns are consistent with previous studies that noted a bias toward C and away from G in cases of Rho-dependent polarity after premature stop codons ( 18 ) . 
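+ The base-composition comparison above (C and G content of the first 50 nucleotides downstream of each tRNA, compared between groups with Student's t test) can be sketched with the standard library alone. The sequences in the usage example are toy strings, not data from the paper, and the function names are hypothetical. The sketch computes the pooled-variance t statistic only; converting it to a P value requires the t distribution, for example from a statistics package.

```python
from math import sqrt
from statistics import mean, variance
from typing import Sequence


def base_fraction(seq: str, base: str, length: int = 50) -> float:
    """Fraction of the first `length` nucleotides of seq that equal `base`."""
    window = seq.upper()[:length]
    if not window:
        raise ValueError("sequence is empty")
    return window.count(base.upper()) / len(window)


def student_t(a: Sequence[float], b: Sequence[float]) -> float:
    """Two-sample Student's t statistic with pooled variance
    (equal variances assumed); needs at least two values per group
    and non-zero pooled variance."""
    na, nb = len(a), len(b)
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / sqrt(pooled * (1 / na + 1 / nb))


if __name__ == "__main__":
    # Toy downstream sequences for two hypothetical groups of tRNA operons.
    with_bsr = ["CCCAUCCAUA", "CACCUACCAU", "UCCACCAUCA"]
    without_bsr = ["GGCAUGGAUA", "GAGCUAGGAU", "UGGAGCAUGA"]
    c_with = [base_fraction(s, "C") for s in with_bsr]
    c_without = [base_fraction(s, "C") for s in without_bsr]
    print(student_t(c_with, c_without))
```

+ A positive statistic here indicates a higher mean C fraction in the first group; the same two functions can be applied to G content by passing "G" as the base.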
+ Unsurprisingly , the feature that most distinguished tRNA operons affected by BCM from those that were not was the presence or absence of a putative intrinsic terminator hairpin RNA structure . 
+ Of the 24 tRNA operons that lacked associated BSRs , 22 ( 92 % ) encoded putative intrinsic terminator hairpin structures and corresponding U-tracts within 150 bp of the 3′ end of the tRNA . 
+ Potential hairpins were identified by examining the RNA secondary structure in silico using the mfold algorithm ( 19 ) ( Table S2 ) . 
+ The 2 exceptions lacking both a BSR and putative intrinsic terminator were ileY and the thrU-tyrU-glyT-thrT operon . 
+ The ileY tRNA gene produced very little RNAP ChIP-chip signal in both BCM-treated and untreated conditions , and likely fell below the limits of detection . 
+ The thrU-tyrU-glyT-thrT operon is known to be cotranscribed with the downstream tufB gene ( 20 ) . 
+ Although a small drop in RNAP ChIP-chip signal occurred between thrT and tufB , apparently , the majority of ECs were not terminated . 
+ Eleven of the 12 ( 92 % ) tRNA operons with an associated BSR lacked putative intrinsic terminator hairpin structures . 
+ The exception , asnU , contained a putative RNA structure that resembled an intrinsic terminator hairpin despite being affected by BCM treatment ( Table S2 ) . 
+ However , the purported terminator contained an unpaired A residue between the hairpin stem and U-tract . 
+ Systematic substitutions of U-tract residues with A in the canonical pyrBI intrinsic terminator revealed that mutations closer to the hairpin stem caused progressively greater termination defects ( although the first U of the U-tract was not tested ) ( 21 ) . 
+ Also , weakening the base of the hairpin stem reduces termination markedly ( 22 ) . 
+ Therefore , this deviation from a canonical intrinsic terminator would likely disrupt the function of the terminator hairpin . 
+ This finding raises the intriguing possibility that Rho-dependent termination is a `` default '' termination pathway in E. coli , taking over when intrinsic terminator hairpins are disrupted by mutation or removed by horizontal transfer events . 
+ Rho Termination of sRNA Synthesis . 
+ Genes in a second class uncovered in the BSR analysis encoded known sRNAs . 
+ Seven annotated sRNA genes were found to have BSRs associated with the 3′ end of the gene ( Fig. 1C ) . 
+ Two types of Rho-dependent terminators were found at sRNAs . 
+ The first type was primarily involved in sRNA 3′ end formation . 
+ The rygD gene ( also known as sibD ) produces a noncoding stable RNA product that regulates the toxicity of the short , hydrophobic IbsD protein ( 23 ) . 
+ An extension in the distribution of RNAP at rygD is seen in the presence of BCM , indicating that this sRNA is terminated by Rho ( Fig. S4 ) . 
+ The second type of Rho-dependent terminator found at sRNAs appeared to play a role in the regulation of downstream genes . 
+ The sroG sRNA is situated in between the promoter and protein coding sequence of the ribB gene , which is involved in riboflavin synthesis ( 24 ) . 
+ Although the exact function of the SroG RNA has not been demonstrated experimentally , sequence alignments suggest that it contains a flavin mononucleotide ( FMN ) binding riboswitch known as an RFN ( ribo-flavin ) element ( 25 ) . 
+ Based on the absence of an intrinsic terminator hairpin , and complementarity between the Shine-Dalgarno ( SD ) of ribB and upstream sequences in the RNA , the riboswitch contained in sroG was proposed to operate by blocking translation of ribB in conditions of high FMN concentration ( 25 ) . 
+ Interestingly , a BSR occurred at the 3′ end of sroG , implicating Rho-dependent termination as a mechanism for tightening the regulation of this riboswitch ( Fig. 3C ) . 
+ The ribB transcript , when left untranslated , is logically a good substrate for Rho action , and termination by Rho would prevent synthesis of the full-length ribB mRNA . 
+ This would ensure that RibB protein could not be produced , even if SD pairing is lost by FMN release from the riboswitch . 
+ This system is similar to the Bacillus subtilis trp operon , where Rho termination occurs after translation initiation is blocked by a hairpin that occludes the SD of trpE ( 26 ) . 
+ Our findings indicate that Rho termination at sRNAs can be involved both in 3′-end formation and in the mechanism by which sRNAs regulate their target genes . 
+ Just 7 of the 80 known sRNAs are terminated by Rho . 
+ Previous studies have identified sRNAs by searching for promoter-intrinsic terminator pairs in intergenic regions ( see ref. 27 ) , suggesting that only a fraction of Rho-terminated sRNAs have been discovered . 
+ Therefore , identifying Rho-dependent terminators with associated promoters could function as an additional method for finding novel sRNAs . 
+ Rho Inhibition Reveals Antisense Transcription . 
+ Although half of the BSRs were found at the 3′ ends of genes , as would be predicted if Rho functions to terminate RNAP at the ends of TUs ( intergenic ) , the other half were located within genes ( intragenic ) . 
+ In many cases , we found that the directionality of intragenic terminator readthrough was opposite to the direction of the annotated gene ( Fig. 4 A and B ) . 
+ These observations were indicative of antisense transcription by RNAP . 
+ In total , we found 25 instances of antisense transcription in the BSR dataset , 24 of which were previously uncharacterized transcripts ( Table S4 ) . 
+ A majority ( 17/25 ) of the antisense transcripts had an associated σ70 peak in ChIP-chip data from Mooney et al. ( 12 ) that indicated a putative promoter for the transcript . 
+ We estimated the approximate lengths of the antisense transcripts by finding the distance between the start of the BSR and the midpoint of its associated σ70 peak . 
+ The average antisense transcript length was 456 nt . 
+ This number likely overstates the transcript length , because the same analysis applied to tRNAs overestimated their lengths by 50 -- 150 nucleotides . 
+ Thus , the average length of antisense transcripts found in this study falls within the range of 50 to 400 nt typically assigned to sRNAs ( 28 ) . 
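+ The length estimate described above can be sketched as follows . This is only an illustration under stated assumptions : the function names and all coordinates are hypothetical , and the peak midpoint is taken as the mean of the two peak boundaries , which the text does not specify .

```python
# Hypothetical coordinates (bp); real values would come from the BSR calls
# and the sigma70 ChIP-chip peak boundaries.
def estimate_transcript_length(bsr_start, peak_left, peak_right):
    """Distance between the BSR start and the midpoint of the sigma70 peak."""
    peak_midpoint = (peak_left + peak_right) / 2
    # abs() makes the estimate independent of strand orientation.
    return abs(bsr_start - peak_midpoint)


def mean(values):
    values = list(values)
    return sum(values) / len(values)


# Made-up examples: (BSR start, peak left edge, peak right edge).
examples = [(10_500, 9_900, 10_100), (52_000, 52_400, 52_600)]
lengths = [estimate_transcript_length(*e) for e in examples]
print(lengths, mean(lengths))
```

+ Because the BSR starts downstream of the true terminator and the peak midpoint only approximates the promoter , such a distance is an upper-bound style estimate , in line with the overestimation noted for tRNAs .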
+ Reppas et al. ( 29 ) had previously identified an antisense transcript on the opposite strand of the eutB gene . 
+ This transcript is also apparent in the BSR dataset due to the directionality of terminator readthrough ( Fig. 4A ) , and a corresponding peak in σ70 ChIP-chip data suggests a promoter location for the transcript . 
+ An example of a previously uncharacterized antisense transcript found in the BSR dataset lies within the cryptic bgl operon on the opposite strand of the bglF gene ( Fig. 4B ) . 
+ The ambiguous directionality of the σ70 and RNAP peaks in bglF was made clear by readthrough of an antisense , Rho-dependent terminator in BCM-treated cells , establishing the existence of antisense transcription in bglF . 
+ Our finding of 100 intragenic Rho-dependent terminators shows that transcription in E. coli is much more complex than previously envisioned , with many transcripts terminated within coding sequences and a greater amount of antisense transcription . 
+ Intragenic terminators are associated with both sense and antisense transcription . 
+ Intragenic antisense transcripts terminated by Rho represent a mostly uncharacterized group of RNAs with unknown functions . 
+ Intragenic sense Rho-dependent terminators may be associated with transcriptional attenuation ( 30 ) , premature termination due to failed translation , or synthesis of sRNAs that lie within larger genes . 
+ Discussion
+ Our findings lead to 3 insights into the role of Rho in global gene regulation . 
+ First , Rho terminates synthesis of small noncoding RNAs , including tRNAs , to a much greater extent than previously realized . 
+ This is significant because the extensive structure of such RNAs is thought to inhibit Rho binding . 
+ Second , Rho terminates synthesis of intragenic transcripts , including antisense transcripts , of unknown function . 
+ Many of these likely represent previously uncharacterized , noncoding transcripts in E. coli . 
+ Finally , the strong effect of Rho on horizontally transferred genes may reflect the propensity of such genes to insert at tRNA-encoding loci , rather than Rho-targeting of foreign DNA per se . 
+ Fig. 4 . 
+ Rho inhibition reveals antisense transcription . 
+ ( A ) An antisense transcript within the eutB gene was detected based on the direction of terminator readthrough by RNAP in BCM-treated cells . 
+ σ70 ChIP-chip data ( orange ) from Mooney et al. ( 12 ) suggests a promoter location for the antisense transcript . 
+ ( B ) Unique antisense transcription within the bglF gene . 
+ Colors , labels , and data smoothing are as described in Fig. 1B , except that putative antisense transcripts are represented by red arrows . 
+ The BSR dataset is likely to reveal only a subset of Rho-dependent terminators in E. coli . 
+ Detection of Rho terminators by ChIP-chip requires sufficient occupancy of RNAP before the terminator to see the readthrough event . 
+ For instance , the well-characterized Rho-dependent trp t terminator ( 31 ) was barely discernible , and failed to meet the statistical cutoff due to low RNAP signal at the 3′ end of the trp operon . 
+ Many condition-specific Rho terminators also were likely missed ( e.g. , the tnaC terminator in the catabolite-repressed tna operon ) ( 32 ) . 
+ Finally , cryptic Rho terminators that occur only when transcription and translation are uncoupled would not be found because translation should be efficient under our assay conditions . 
+ To estimate the total extent to which Rho terminates mRNA synthesis , we examined 109 high-quality TUs for which the RNAP ChIP-chip signal was significantly above background and could readily be distinguished from adjacent TUs ( 12 ) . 
+ Of these 109 TUs , 18 were associated with intergenic BSRs , indicating that 17 % of these TUs are terminated at their 3′ ends by Rho . 
+ We extrapolated this percentage out to the total predicted number of TUs in E. coli ( 2,271 ) ( 33 ) , which gave 386 as the estimated number of intergenic Rho-dependent terminators . 
+ Based on this estimate , Rho-dependent termination is likely to account for 20 % of the total mRNA 3′-end formation in E. coli , rather than the 50 % estimate that is often cited ( 34 ) . 
+ We note that the 50 % estimate does not appear to be based on a genome-scale analysis . 
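+ The extrapolation above is simple arithmetic and can be checked with a short sketch . The variable names are illustrative ; the three input numbers are those given in the text . Note that the reported figure of 386 corresponds to using the rounded 17 % , whereas the unrounded fraction 18/109 gives a slightly lower value .

```python
high_quality_tus = 109      # TUs with clear RNAP ChIP-chip signal
tus_with_bsr = 18           # of those, TUs with an intergenic BSR
total_predicted_tus = 2271  # predicted TUs in E. coli (ref. 33)

fraction = tus_with_bsr / high_quality_tus
rounded_percent = round(fraction * 100)

# Extrapolating with the rounded percentage, which matches the reported 386.
estimate_rounded = round(rounded_percent / 100 * total_predicted_tus)
# Extrapolating with the unrounded fraction gives a slightly lower figure.
estimate_unrounded = round(fraction * total_predicted_tus)

print(rounded_percent, estimate_rounded, estimate_unrounded)
```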
+ Rho-Dependent Termination and Stable RNA Synthesis . 
+ Stable RNA transcripts are surprising substrates for Rho action because they are typically highly structured , whereas Rho is thought to bind unstructured RNA . 
+ However , ChIP-chip analysis reveals Rho occupancy across most TUs , including sRNAs , tRNAs , and rRNAs ( 12 ) . 
+ Thus , Rho appears capable of association with structured transcripts , consistent with our finding that Rho terminates these transcripts . 
+ Rho-dependent termination of tRNA and sRNA transcripts is also unexpected because Rho generates heterogeneous transcript 3′ ends that would seemingly be problematic for the function of these RNAs . 
+ Extraneous 3′ nucleotides may interfere with folding or enzymatic modifications of stable RNAs , many of which require specific secondary structures for biological activity . 
+ However , extra 3′ nucleotides can be removed by multiple 3′ → 5′ exonucleases that exist in E. coli . 
+ For instance , a 3′ tail on the Rho-terminated valVW tRNA transcript becomes detectable only in a pnp rnb double mutant , implying that 3′ ends generated by Rho termination are rapidly degraded by redundant 3′ → 5′ exonucleases encoded by pnp and rnb ( 35 ) . 
+ Thus , heterogeneous 3′ tails generated by Rho can be readily removed to avoid interfering with RNA function . 
+ Rho inhibition had no effect on RNAP occupancy of rRNA TUs , whereas the recent proposal that Rho removes paused RNAPs predicts a 3′-proximal decrease ( 36 ) . 
+ However , the lower level of rrn transcription expected for minimal medium could preclude detection of this effect . 
+ Rho Terminates a Set of Antisense Transcripts of Unknown Function in E. coli . 
+ We find that Rho is involved in termination of a set of antisense transcripts with unknown function . 
+ These antisense transcripts are likely to be noncoding , because the protein-coding sequence on the opposite strand greatly constrains the sequence of the antisense RNA . 
+ The 25 antisense transcripts we detected likely represent only a small fraction of a larger set of similar antisense transcripts in E. coli . 
+ As noted previously for intergenic transcripts , our method will miss a significant number on which the effect of BCM fails to generate a BSR . 
+ Additionally , to be detectable , antisense TUs within genes must also generate signals significantly above the level of the corresponding genic TU . 
+ Thus , E. coli likely possesses a large set of intragenic , antisense TUs of which the 25 we detected are only a limited , highly transcribed subset . 
+ Some antisense transcripts may encode small RNAs with specific regulatory functions . 
+ For instance , antisense RNAs are known to block translation by pairing to a sense transcript ( e.g. , RyhB and IS10 in bacteria ) , to block formation of persistent RNA-DNA hybrids ( e.g. , RNAI in ColE1-type plasmids ) , or to interfere with sense transcription during their synthesis ( 37 , 38 ) . 
+ Some of these transcripts could conceivably produce sRNAs with functions unrelated to the genes within which they are embedded . 
+ However , the possibility that some intragenic transcripts result from `` transcriptional noise '' must be considered . 
+ The involvement of Rho is itself compellingly analogous to some types of noncoding transcription in eukaryotes . 
+ Bacterial RNAP and eukaryotic RNAPII are both terminated by at least 2 distinct pathways . 
+ In many bacteria , intrinsic termination appears to be the dominant mechanism for termination of mRNA synthesis ; indeed , our findings suggest Rho terminates only a minority of full-length E. coli mRNAs . 
+ RNAPII termination is coupled to transcript cleavage and polyadenylation for most mRNAs ( 39 ) , but can instead occur by the Nrd1/Nab3/Sen1-dependent pathway for small nuclear RNAs ( snRNAs ) , small nucleolar RNAs ( snoRNAs ) , and some short mRNAs ( 40 ) . 
+ Sen1 contains an ATP-dependent , 5′ → 3′ RNA/DNA helicase activity and may function similarly to Rho ( 41 ) . 
+ Thus , Rho-dependent termination in bacteria appears to be analogous to Sen1-dependent termination in eukaryotes . 
+ The Nrd1/Nab3/Sen1 pathway is implicated in the termination of cryptic unstable transcripts ( CUTs ) that become detectable in S. cerevisiae mutants defective for nuclear RNA degradation ( 42 ) . 
+ Similar to pervasive noncoding transcription in other eukaryotes , the biological function of CUTs is unknown ; however , CUTs may simply reflect transcriptional noise that is an unavoidable consequence of robust gene expression , and the Nrd1/Nab3/Sen1 pathway may play a role in `` genome surveillance '' by suppressing them . 
+ Given the similarities of Rho-dependent and Sen1-dependent termination , one possibility is that at least some antisense transcription terminated by Rho in bacteria may also reflect transcriptional noise . 
+ Rho-Dependent Termination and Horizontal Transfer . 
+ Our findings are consistent with a connection between Rho and suppressed expression of horizontally transferred , `` foreign '' genes ( 13 ) , but suggest an indirect mechanism underlies this relationship . 
+ The connection is evident from the significant association of BSRs with E. coli K-12 genes lacking homologs in E. coli O157:H7 EDL933 ( Mann-Whitney U test ; P < 0.001 ) . 
+ However , specific targeting of Rho to AU-rich RNA in horizontally transferred genes ( 13 ) can be ruled out , as the global distribution of Rho lacks bias toward any particular set of TUs ( 12 ) . 
+ Three non-mutually exclusive ideas may explain why Rho termination is associated with `` foreign '' DNA . 
+ First , foreign genes acquired from distantly related organisms may not be adapted to the E. coli translation apparatus , allowing Rho to act on poorly translated RNAs . 
+ Second , some foreign DNA may contain specific Rho-dependent terminators . 
+ For instance , the rac prophage contains the Rho-dependent timm terminator upstream of the lethal kilR gene ( 13 ) ; BCM causes readthrough of timm and the appearance of a BSR ( Table S1 ) . 
+ Third , foreign DNA may preferentially insert into active TUs , and thereby produce readthrough transcription into foreign DNA that is terminated by Rho . 
+ Of the 63 E. coli K-12-specific genes associated with a BSR , 24 are inserted into active TUs at which Rho terminated transcription into the horizontally transferred DNA . 
+ This phenomenon is apparent at tRNA operons terminated by Rho ( Fig. S5 ) . 
+ Half the tRNAs terminated by Rho have associated BSRs that read into E. coli K-12-specific genes or prophage elements ( Fig. 1C and Table S1 ) . 
+ Indeed , the majority of prophages and other horizontally transferred elements in Gammaproteobacteria encode integrases that specifically target tRNA genes as attachment sites ( 43 ) . 
+ Thus , horizontally transferred elements may integrate into the chromosome by disruption of tRNA genes , causing loss of their intrinsic terminators . 
+ In such cases , Rho can supply an alternate termination mechanism to prevent transcription of potentially toxic foreign genes from the tRNA gene promoter . 
+ Williams ( 43 ) categorized horizontally transferred elements that use tRNA as insertion points across several species of proteobacteria and Gram-positive bacteria . 
+ Of the 54 characterized horizontally transferred elements that insert into tRNA , 22 ( 41 % ) lacked an intrinsic terminator within 400 bases of the mature 3′ end of the tRNA ( 43 ) . 
+ These data suggest that Rho termination provides a general mechanism for guarding the borders of tRNA transcription against the deleterious consequences of foreign gene expression in a diverse set of bacteria . 
+ Conclusion . 
+ Rho-dependent termination plays many roles in bacterial transcription , including generation of full-length mRNA 3′ ends ( 1 ) , establishment of polarity ( 2 ) , resolution of extended RNA-DNA hybrids ( 44 ) , and protection of cells from harmful expression of foreign genes ( 13 ) . 
+ Our findings suggest Rho plays additional , and possibly more significant , roles by halting RNA chain elongation in a previously uncharacterized antisense transcriptome and by terminating synthesis of stable RNAs , including tRNAs and sRNAs . 
+ Rho-dependent termination is especially well suited for halting antisense transcription . 
+ The stringent sequence requirements of intrinsic terminators would be incompatible with a protein-coding gene on the opposite strand . 
+ In contrast , Rho-dependent terminators exhibit modest sequence specificity ( C enriched and G depleted ) , which would place few limitations on codon usage in a protein-coding gene . 
+ Taken together , these data suggest Rho may play a principal role in halting transcription at locations where intrinsic terminators could not readily evolve ( e.g. , horizontally transferred DNA and antisense transcripts ) . 
+ Thus , further study of the targets of Rho may help elucidate the scope of the noncoding transcriptome of E. coli . 
+ Materials and Methods
+ Growth Conditions . 
+ E. coli K-12 MG1655 was grown in MOPS minimal medium containing 0.2 % glucose at 37 °C with vigorous agitation in the presence or absence of 20 µg/mL BCM ( 12 ) . 
+ BCM was obtained from Fujisawa Pharmaceutical Co. 
+ ChIP-Chip . 
+ ChIP-chip assays were performed as previously described ( 12 ) . 
+ Briefly , cells were grown to an apparent OD600 of 0.4 and cross-linked by the addition of formaldehyde at 1 % final concentration with continued shaking at 37 °C for 5 min before quenching with glycine ( 100 mM final ) . 
+ Cells were then lysed and DNAs were sheared by sonication followed by treatment with micrococcal nuclease and RNase A . 
+ RNAP crosslinked to DNA was immunoprecipitated using antibodies against either the β or β′ subunit ( antibodies 8RB13 and NT73 , respectively ; Neoclone ) using Sepharose protein A and G beads . 
+ Enriched ChIP DNA and input DNA were amplified by linker-mediated PCR ( 45 ) and processed by NimbleGen , Inc. to incorporate Cy3 or Cy5 dyes , hybridized to a tiling array , and quantified by fluorescence scanning . 
+ Two biological replicates were obtained for both BCM-treated and untreated conditions . 
+ Array Designs . 
+ We used 2 distinct isothermal tiling arrays that cover the entire E. coli K-12 MG1655 genome . 
+ The first array contained 187,204 oligonucleotide probes based on the sequence of the plus strand that were synthesized on the array in duplicate with 24-bp spacing ( 12 ) , whereas the second contained 374,408 probes that alternated strands with 12-bp spacing . 
+ Data Analysis . 
+ We performed locally weighted linear regression ( LOWESS ) normalization ( 46 ) on raw Cy3 and Cy5 signals to correct for intensity-dependent dye effects within each array using the `` normalizeWithinArrays '' function ( 47 ) in the limma package ( 48 ) for the statistical program R ( 49 ) . 
+ Normalized log2 ratios were 
+ 1 . 
+ Roberts JW ( 1969 ) Termination factor for RNA synthesis . 
+ Nature 224 ( 5225 ) :1168 -- 1174 . 
+ 2 . 
+ Richardson JP , Grimley C , Lowery C ( 1975 ) Transcription termination factor Rho activity is altered in Escherichia coli with suA gene mutations . 
+ Proc Natl Acad Sci USA 72 ( 5 ) :1725 -- 1728 . 
+ 3 . 
+ Galluppi GR , Richardson JP ( 1980 ) ATP-induced changes in the binding of RNA synthesis termination protein Rho to RNA . 
+ J Mol Biol 138 ( 3 ) :513 -- 539 . 
+ 4 . 
+ Richardson JP ( 2006 ) How Rho exerts its muscle on RNA . 
+ Mol Cell 22 ( 6 ) :711 -- 712 . 
+ 5 . 
+ Banerjee S , Chalissery J , Bandey I , Sen R ( 2006 ) Rho-dependent transcription termination : More questions than answers . 
+ J Microbiol 44 ( 1 ) :11 -- 22 . 
+ 6 . 
+ Zwiefka A , Kohn H , Widger WR ( 1993 ) Transcription termination factor Rho : The site of bicyclomycin inhibition in Escherichia coli . 
+ Biochemistry 32 ( 14 ) :3564 -- 3570 . 
+ 7 . 
+ Yanofsky C , Horn V ( 1995 ) Bicyclomycin sensitivity and resistance affect Rho factor-mediated transcription termination in the tna operon of Escherichia coli . 
+ J Bacteriol 177 ( 15 ) :4451 -- 4456 . 
+ 8 . 
+ Magyar A , Zhang X , Kohn H , Widger WR ( 1996 ) The antibiotic bicyclomycin affects the secondary RNA binding site of Escherichia coli transcription termination factor Rho . 
+ J Biol Chem 271 ( 41 ) :25369 -- 25374 . 
+ 9 . 
+ Park HG , et al. ( 1995 ) Bicyclomycin and dihydrobicyclomycin inhibition kinetics of Escherichia coli Rho-dependent transcription termination factor ATPase activity . 
+ Arch Biochem Biophys 323 ( 2 ) :447 -- 454 . 
+ 10 . 
+ Skordalakes E , Brogan AP , Park BS , Kohn H , Berger JM ( 2005 ) Structural mechanism of inhibition of the Rho transcription termination factor by the antibiotic bicyclomycin . 
+ Structure 13 ( 1 ) :99 -- 109 . 
+ 11 . 
+ Bubunenko M , Baker T , Court DL ( 2007 ) Essentiality of ribosomal and transcription antitermination proteins analyzed by systematic gene replacement in Escherichia coli . 
+ J Bacteriol 189 ( 7 ) :2844 -- 2853 . 
+ 12 . 
+ Mooney RA , et al. ( 2009 ) Regulator trafficking on bacterial transcription units in vivo . 
+ Mol Cell 33 ( 1 ) :97 -- 108 . 
+ 13 . 
+ Cardinale CJ , et al. ( 2008 ) Termination factor Rho and its cofactors NusA and NusG silence foreign DNA in E. coli . 
+ Science 320 ( 5878 ) :935 -- 938 . 
+ 14 . 
+ Ederth J , Mooney RA , Isaksson LA , Landick R ( 2006 ) Functional interplay between the jaw domain of bacterial RNA polymerase and allele-specific residues in the product RNA-binding pocket . 
+ J Mol Biol 356 ( 5 ) :1163 -- 1179 . 
+ 15 . 
+ Kuan PF , Chun H , Keles S ( 2008 ) CMARRT : A tool for the analysis of ChIP-Chip data from tiling arrays by incorporating the correlation structure . 
+ Pacific Symposium on Biocomputing 13:515 -- 526 . 
+ 16 . 
+ Matsumoto Y , Shigesada K , Hirano M , Imai M ( 1986 ) Autogenous regulation of the gene for transcription termination factor Rho in Escherichia coli : Localization and function of its attenuators . 
+ J Bacteriol 166 ( 3 ) :945 -- 958 . 
+ 17 . 
+ Kupper H , Sekiya T , Rosenberg M , Egan J , Landy A ( 1978 ) A Rho-dependent termination site in the gene coding for tyrosine tRNA su3 of Escherichia coli . 
+ Nature 272 ( 5652 ) :423 -- 428 . 
+ 18 . 
+ Alifano P , Rivellini F , Limauro D , Bruni CB , Carlomagno MS ( 1991 ) A consensus motif common to all Rho-dependent prokaryotic transcription terminators . 
+ Cell 64 ( 3 ) :553 -- 563 . 
+ 19 . 
+ Zuker M ( 2003 ) Mfold web server for nucleic acid folding and hybridization prediction . 
+ Nucleic Acids Res 31 ( 13 ) :3406 -- 3415 . 
+ 20 . 
+ Lee JS , An G , Friesen JD , Fiil NP ( 1981 ) Location of the tufB promoter of E. coli : Cotranscription of tufB with four transfer RNA genes . 
+ Cell 25 ( 1 ) :251 -- 258 . 
+ 21 . 
+ Sipos K , Szigeti R , Dong X , Turnbough CL , Jr ( 2007 ) Systematic mutagenesis of the thymidine tract of the pyrBI attenuator and its effects on intrinsic transcription termination in Escherichia coli . 
+ Mol Microbiol 66 ( 1 ) :127 -- 138 . 
+ 22 . 
+ Larson MH , Greenleaf WJ , Landick R , Block SM ( 2008 ) Applied force reveals mechanistic and energetic details of transcription termination . 
+ Cell 132 ( 6 ) :971 -- 982 . 
+ 23 . 
+ Fozo EM , et al. ( 2008 ) Repression of small toxic protein synthesis by the Sib and OhsC small RNAs . 
+ Mol Microbiol 70 ( 5 ) :1076 -- 1093 . 
+ 24 . 
+ Vogel J , et al. ( 2003 ) RNomics in Escherichia coli detects new sRNA species and indicates parallel transcriptional output in bacteria . 
+ Nucleic Acids Res 31 ( 22 ) :6435 -- 6443 . 
+ 25 . 
+ Vitreschak AG , Rodionov DA , Mironov AA , Gelfand MS ( 2002 ) Regulation of riboflavin biosynthesis and transport genes in bacteria by transcriptional and translational attenuation . 
+ Nucleic Acids Res 30 ( 14 ) :3141 -- 3151 . 
+ then averaged over probe positions found in the 187,204 probe array to make the 2 array formats directly comparable . 
+ Next , biological replicates for BCM-treated or untreated conditions were quantile normalized between arrays using the `` normalize.quantiles '' function in the R package affy ( 50 ) . 
+ For each of the BCM-treated ( Trt ) and untreated ( UnTrt ) conditions , we computed the average of the 2 biological replicates for each probe position . 
+ The analysis to identify regions enriched in Trt relative to UnTrt was performed using CMARRT ( 15 ) on the difference between the average of treated and untreated conditions ( AveTrt − AveUnTrt ) at the FDR level of 0.05 . 
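+ The between-array steps described above ( quantile normalization of replicates , per-probe averaging , and the treated minus untreated difference ) can be illustrated with a small pure-Python sketch . It is not the R code used in the study : the function names and all values are made up , ties are broken by probe order as a simplification , and the CMARRT peak-calling step is not reproduced .

```python
def quantile_normalize(arrays):
    """Give every array the same distribution: the rank-wise mean of all arrays.

    `arrays` is a list of equal-length lists (one list of probe values per
    array). Ties are broken by probe order, which is a simplification.
    """
    n_probes = len(arrays[0])
    if any(len(a) != n_probes for a in arrays):
        raise ValueError("all arrays must have the same number of probes")
    sorted_arrays = [sorted(a) for a in arrays]
    rank_means = [sum(col) / len(arrays) for col in zip(*sorted_arrays)]
    normalized = []
    for a in arrays:
        order = sorted(range(n_probes), key=lambda i: a[i])
        out = [0.0] * n_probes
        for rank, probe in enumerate(order):
            out[probe] = rank_means[rank]
        normalized.append(out)
    return normalized


def probe_means(arrays):
    """Average replicates probe by probe."""
    return [sum(col) / len(col) for col in zip(*arrays)]


# Made-up log2 ratios for 4 probes, 2 replicates per condition.
trt = quantile_normalize([[0.2, 1.5, 0.9, 0.1], [0.4, 1.1, 1.3, 0.0]])
untrt = quantile_normalize([[0.3, 0.2, 0.8, 0.1], [0.1, 0.4, 0.6, 0.2]])

ave_trt, ave_untrt = probe_means(trt), probe_means(untrt)
# Per-probe difference (AveTrt - AveUnTrt), the quantity passed on to peak calling.
difference = [t - u for t, u in zip(ave_trt, ave_untrt)]
print(difference)
```

+ Normalizing replicates within each condition separately , as sketched here , follows the wording of the text ; the resulting difference vector is what a region-calling method would then scan for enriched stretches .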
+ Quantitative PCR . 
+ Quantitative PCR was performed on ChIP DNA using SYBR Green JumpStart Taq ReadyMix for Real-Time PCR ( Sigma-Aldrich ) in an ABI 7500 Real-Time PCR System thermal cycler ( Applied Biosystems ) . 
+ Two primer pairs were designed for each BSR locus tested . 
+ The first primer pair annealed before the BSR , and the second annealed within the BSR . 
+ Primer sequences are available upon request . 
+ Cycle threshold values obtained from quantitative PCR were converted to a relative quantity of DNA based on a standard curve created for each primer pair . 
+ The relative DNA quantity within the BSR was then normalized to the quantity before the BSR . 
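+ A minimal sketch of this conversion is given below . It assumes the common log-linear form of a standard curve ( Ct as a linear function of log10 quantity ) , which the text does not state explicitly ; the function names , dilution series , and Ct values are all hypothetical .

```python
import math


def fit_standard_curve(quantities, ct_values):
    """Least-squares fit of Ct = slope * log10(quantity) + intercept."""
    xs = [math.log10(q) for q in quantities]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ct_values) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ct_values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def ct_to_quantity(ct, slope, intercept):
    """Invert the standard curve to get a relative DNA quantity."""
    return 10 ** ((ct - intercept) / slope)


# Made-up dilution series (relative quantity, Ct) for the two primer pairs.
dilutions = [1.0, 0.1, 0.01, 0.001]
curve_before = fit_standard_curve(dilutions, [18.0, 21.3, 24.6, 27.9])
curve_within = fit_standard_curve(dilutions, [18.5, 21.9, 25.3, 28.7])

# Made-up Ct values measured on one ChIP sample.
qty_before = ct_to_quantity(22.0, *curve_before)
qty_within = ct_to_quantity(24.0, *curve_within)

# Signal within the BSR relative to the signal before it.
print(qty_within / qty_before)
```

+ Fitting a separate curve for each primer pair , as the text describes , corrects for differences in amplification efficiency before the within-BSR quantity is divided by the quantity before the BSR .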
+ ACKNOWLEDGMENTS . 
+ We thank Yann Dufour for array design , and Nicole Perna for assistance in defining E. coli K-12-specific genes . 
+ We also thank Richard Gourse , David Brow , Charles Turnbough , Jr. , and members of the Landick Lab for critical reading of the manuscript . 
+ This work was supported by National Institutes of Health Grant GM38660 to R.L. 
+ 26 . 
+ Yakhnin H , Babiarz JE , Yakhnin AV , Babitzke P ( 2001 ) Expression of the Bacillus subtilis trpEDCFBA operon is influenced by translational coupling and Rho termination factor . 
+ J Bacteriol 183 ( 20 ) :5918 -- 5926 . 
+ 27 . 
+ Livny J , Waldor MK ( 2007 ) Identification of small RNAs in diverse bacterial species . 
+ Curr Opin Microbiol 10 ( 2 ) :96 -- 101 . 
+ 28 . 
+ Vogel J , Sharma CM ( 2005 ) How to find small non-coding RNAs in bacteria . 
+ Biol Chem 386 ( 12 ) :1219 -- 1238 . 
+ 29 . 
+ Reppas NB , Wade JT , Church GM , Struhl K ( 2006 ) The transition between transcriptional initiation and elongation in E. coli is highly variable and often rate limiting . 
+ Mol Cell 24 ( 5 ) :747 -- 757 . 
+ 30 . 
+ Henkin TM , Yanofsky C ( 2002 ) Regulation by transcription attenuation in bacteria : How RNA provides instructions for transcription termination/antitermination decisions . 
+ Bioessays 24 ( 8 ) :700 -- 707 . 
+ 31 . 
+ Wu AM , Christie GE , Platt T ( 1981 ) Tandem termination sites in the tryptophan operon of Escherichia coli . 
+ Proc Natl Acad Sci USA 78 ( 5 ) :2913 -- 2917 . 
+ 32 . 
+ Stewart V , Landick R , Yanofsky C ( 1986 ) Rho-dependent transcription termination in the tryptophanase operon leader region of Escherichia coli K-12 . 
+ J Bacteriol 166 ( 1 ) :217 -- 223 . 
+ 33 . 
+ Salgado H , et al. ( 2000 ) RegulonDB ( version 3.0 ) : Transcriptional regulation and operon organization in Escherichia coli K-12 . 
+ Nucleic Acids Res 28 ( 1 ) :65 -- 67 . 
+ 34 . 
+ Zhu AQ , von Hippel PH ( 1998 ) Rho-dependent termination within the trp t terminator . I. Effects of Rho loading and template sequence . 
+ Biochemistry 37 ( 32 ) :11202 -- 11214 . 
+ 35 . 
+ Mohanty BK , Kushner SR ( 2007 ) Ribonuclease P processes polycistronic tRNA transcripts in Escherichia coli independent of ribonuclease E . 
+ Nucleic Acids Res 35 ( 22 ) :7614 -- 7625 . 
+ 36 . 
+ Klumpp S , Hwa T ( 2008 ) Stochasticity and traffic jams in the transcription of ribosomal RNA : Intriguing role of termination and antitermination . 
+ Proc Natl Acad Sci USA 105 ( 47 ) :18159 -- 18164 . 
+ 37 . 
+ Waters LS , Storz G ( 2009 ) Regulatory RNAs in bacteria . 
+ Cell 136 ( 4 ) :615 -- 628 . 
+ 38 . 
+ Ward DF , Murray NE ( 1979 ) Convergent transcription in bacteriophage lambda : Interference with gene expression . 
+ J Mol Biol 133 ( 2 ) :249 -- 266 . 
+ 39 . 
+ Lykke-Andersen S , Jensen TH ( 2007 ) Overlapping pathways dictate termination of RNA polymerase II transcription . 
+ Biochimie 89 ( 10 ) :1177 -- 1182 . 
+ 40 . 
+ Steinmetz EJ , et al. ( 2006 ) Genome-wide distribution of yeast RNA polymerase II and its control by Sen1 helicase . 
+ Mol Cell 24 ( 5 ) :735 -- 746 . 
+ 41 . 
+ Steinmetz EJ , Brow DA ( 1996 ) Repression of gene expression by an exogenous sequence element acting in concert with a heterogeneous nuclear ribonucleoprotein-like protein , Nrd1 , and the putative helicase Sen1 . 
+ Mol Cell Biol 16 ( 12 ) :6993 -- 7003 . 
+ 42 . 
+ Arigo JT , Eyler DE , Carroll KL , Corden JL ( 2006 ) Termination of cryptic unstable transcripts is directed by yeast RNA-binding proteins Nrd1 and Nab3 . 
+ Mol Cell 23 ( 6 ) :841 -- 851 . 
+ 43 . 
+ Williams KP ( 2002 ) Integration sites for genetic elements in prokaryotic tRNA and tmRNA genes : Sublocation preference of integrase subfamilies . 
+ Nucleic Acids Res 30 ( 4 ) :866 -- 875 . 
+ 44 . 
+ Harinarayanan R , Gowrishankar J ( 2003 ) Host factor titration by chromosomal R-loops as a mechanism for runaway plasmid replication in transcription termination-defective mutants of Escherichia coli . 
+ J Mol Biol 332 ( 1 ) :31 -- 46 . 
+ 45 . 
+ Ng HH , Robert F , Young RA , Struhl K ( 2003 ) Targeted recruitment of Set1 histone methylase by elongating Pol II provides a localized mark and memory of recent transcriptional activity . 
+ Mol Cell 11 ( 3 ) :709 -- 719 . 
+ 46 . 
+ Yang YH , et al. ( 2002 ) Normalization for cDNA microarray data : A robust composite method addressing single and multiple slide systematic variation . 
+ Nucleic Acids Res 30 ( 4 ) : e15 . 
+ 47 . 
+ Smyth GK , Speed T ( 2003 ) Normalization of cDNA microarray data . 
+ Methods 31 ( 4 ) :265 -- 273 . 
+ 48 . 
+ Smyth GK , Michaud J , Scott HS ( 2005 ) Use of within-array replicate spots for assessing differential expression in microarray experiments . 
+ Bioinformatics 21 ( 9 ) :2067 -- 2075 . 
+ 49 . 
+ Team RDC ( 2008 ) R : A language and environment for statistical computing . 
+ Available at R Foundation for Statistical Computing , http://www.R-project.org/ . 
+ 50 . 
+ Gautier L , Cope L , Bolstad BM , Irizarry RA ( 2004 ) affy -- analysis of Affymetrix GeneChip data at the probe level . 
+ Bioinformatics 20 ( 3 ) :307 -- 315 .
\ No newline at end of file
--- a/data/TEXT_FILES/useful_txt/20602746.txt 0 → 100644
+++ b/data/TEXT_FILES/useful_txt/20602746.txt 0 → 100644
+ ChIP on Chip: surprising results are often artifacts
+ Abstract 
+ Background : The method of chromatin immunoprecipitation combined with microarrays ( ChIP-Chip ) is a powerful tool for genome-wide analysis of protein binding . 
+ However , a high background signal is a common phenomenon . 
+ Results : Reinvestigation of the chromatin immunoprecipitation procedure led us to discover four causes of high background : i ) non-unique sequences , ii ) incomplete reversion of crosslinks , iii ) retention of protein in spin-columns and iv ) insufficient RNase treatment . 
+ The chromatin immunoprecipitation method was modified and applied to analyze genome-wide binding of SeqA and σ32 in Escherichia coli . 
+ Conclusions : False positive findings originating from these shortcomings of the method could explain surprising and contradictory findings in published ChIP-Chip studies . 
+ We present a modified chromatin immunoprecipitation method greatly reducing the background signal . 
+ * Correspondence : kirsten.skarstad@rr-research.no
+ 1 Department of Cell Biology , Institute for Cancer Research , The Norwegian Radium Hospital , Oslo University Hospital and University of Oslo , 0310 Oslo , Norway
+ Full list of author information is available at the end of the article
+ © 2010 Waldminghaus and Skarstad ; licensee BioMed Central Ltd. 
+ This is an Open Access article distributed under the terms of the Creative Commons Attribution License ( http://creativecommons.org/licenses/by/2.0 ) , which permits unrestricted use , distribution , and reproduction in any medium , provided the original work is properly cited . 
+ Background
+ Chromatin immunoprecipitation coupled with microarray analysis ( ChIP-Chip ) has become a widely used method for genome-wide localization of protein-DNA interactions [ 1 ] . 
+ Protocols have been established for different organisms with surprisingly little variation [ 2-5 ] . 
+ The first step in the ChIP-Chip procedure is to fix protein-DNA interactions in living cells by chemical crosslinking ( Fig. 1 ) . 
+ The crosslinker must be small to diffuse fast into the cells . 
+ In practice , formaldehyde is used in most ChIP-Chip experiments . 
+ After cell lysis the DNA is fragmented by sonication . 
+ This extract is then subjected to immunoprecipitation ( IP ) with a specific antibody against the protein of interest . 
+ DNA bound by the protein will be coprecipitated and enriched compared to DNA not bound by the respective protein . 
+ To facilitate immunoprecipitation and subsequent washing , antibodies are usually coupled to either agarose or magnetic beads via protein A or G . 
+ After reversion of crosslinking the DNA is purified by phenol extraction or commercial PCR cleanup kits . 
+ Often , an amplification step is included after DNA purification . 
+ Two different fluorescence labels are used to label the IP DNA and a hybridization control DNA , respectively . 
+ Usually total DNA before IP ( input DNA ) is used as hybridization control . 
+ The two differentially labeled DNAs are hybridized to the same microarray and the difference in fluorescence intensity gives a measure of the enrichment . 
+ We set out to investigate the genome-wide binding of the sequestration protein SeqA in E. coli [ 6 ] . 
+ This task can be considered especially challenging because SeqA has been shown to bind selectively to hemimethylated GATC sites [ 7 ] . 
+ Although there are about 20.000 GATCs around the Escherichia coli chromosome only about 2 % will be hemimethylated in unsynchronized cells [ 8 ] . 
+ Such cell-to-cell variation increases the amount of cell material needed and therefore potentially the level of background signals . 
+ In fact , we found that application of a published ChIP-Chip method produced a background signal exceeding the specific signal . 
+ However , we were able to reduce the background significantly by modifying the protocol . 
+ The new protocol allowed us to uncover the genome-wide binding of SeqA and to reinvestigate σ32 binding to the E. coli chromosome . 
+ Results
+ High background signal in ChIP-Chip experiments
+ To investigate the genome-wide binding pattern of the sequestration protein SeqA in Escherichia coli we applied the ChIP-Chip method as described [ 3 ] . 
+ Cells were grown in LB medium , crosslinked with formaldehyde and sonicated to break down DNA to fragments of approximately 500 bps . 
+ The IP was done in parallel with antibodies against SeqA and , as a control , RNA polymerase subunit β . 
+ After reversion of crosslinking the DNA of the ChIP sample and the input DNA was differentially labeled and hybridized to a whole-genome microarray . 
+ Plotting of the ChIP signal against the genomic position revealed a great number of distinct peaks ( Fig. 2 ) . 
+ Surprisingly the binding patterns of SeqA and RNA polymerase turned out to be essentially identical ( Fig. 2 , compare red and blue ) . 
+ The overlap of the highest ChIP signals was > 80 % ( Fig. 3A ) . 
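+ The overlap of the highest signals can be illustrated with a minimal sketch . 
+ It assumes that the highest signals are simply the top N probes of each experiment ; the text does not state how they were selected , and the function name top_overlap and the toy values are hypothetical . 

```python
def top_overlap(signal_a, signal_b, top_n):
    """Fraction of the top_n probes of signal_a that are also in the top_n of signal_b.

    Both lists hold one ChIP signal per probe, in the same probe order.
    Selecting by rank is an assumption; the source does not define "highest".
    """
    if top_n <= 0:
        raise ValueError("top_n must be positive")

    def top_indices(signal):
        # Probe indices sorted by decreasing signal, truncated to top_n.
        order = sorted(range(len(signal)), key=lambda i: signal[i], reverse=True)
        return set(order[:top_n])

    return len(top_indices(signal_a) & top_indices(signal_b)) / top_n


# Toy data (hypothetical): six probes measured in two experiments.
seqa = [5.0, 0.2, 4.1, 0.1, 3.3, 0.4]
rnap = [4.8, 0.3, 0.2, 0.1, 3.9, 4.0]
overlap = top_overlap(seqa, rnap, 3)
print(f"overlap of top 3 probes: {overlap:.2f}")
```

+ In the toy data two of the three strongest probes coincide , so the reported fraction is two thirds . 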
+ A difference could only be seen when SeqA and RNA polymerase signals were grouped by the number of SeqA recognition sequences in the region of the corresponding probes ( Fig. 2B-C ) . 
+ While a slight correlation between the SeqA ChIP signal and the number of GATC sites was observed at numbers of sites above 5 , this was not the case for the RNA polymerase ChIP-Chip . 
+ This indicates that a specific SeqA signal is overlaid by a strong RNA polymerase-like signal in the SeqA ChIP-Chip experiment . 
+ To estimate the degree of background signal in the SeqA ChIP-Chip we repeated the experiment using a SeqA deletion strain . 
+ All signals detected with such a setup should be non-specific , since no SeqA protein will be present in the cell extract . 
+ The genome-wide pattern of SeqA ChIP signal in the ΔseqA cells showed enrichment at various regions also enriched in the wt cells ( Fig. 4A ) . 
+ As expected , the former lacked the slight correlation of the ChIP signal with the local GATC number ( Fig. 4B ) . 
+ This demonstrates that the method gave an enormous amount of background signal , exceeding the specific SeqA signal in the wt ChIP-Chip . 
+ Note that this background signal is not a variation of single probe intensities . 
+ It is instead the appearance of high signals in neighboring probes which is typical for a specific binding detected by ChIP-Chip . 
+ We set out to identify steps in the protocol where DNA regions giving a high background signal on the microarray behave differently compared to regions giving no background . 
+ Quantitative PCR ( qPCR ) was performed with the rpsD region which gave a high background signal on the microarray and uvrD which gave a low background signal ( both are marked in Fig. 2 ) . 
+ Washing turned out to be one critical step . 
+ The rpsD DNA was more than five-fold enriched when a spin-column was used to wash the precipitated fragments bound to agarose beads compared to when the same beads were washed without a column ( Fig. 5A ; see materials and methods for details ) . 
+ Two-fold enrichment was detected for the uvrD region . 
+ The background signal we observed seemed to correspond to highly transcribed regions , i.e. DNA with many RNA polymerase molecules bound ( Fig. 2 ) . 
+ Protein-rich DNA is segregated into the organic phase during phenol-chloroform extraction of crosslinked DNA [ 9 ] . 
+ However , this phenomenon should not have affected a ChIP-Chip experiment , because the crosslinking is reversed before extraction is performed . 
+ The appearance of protein-rich gene regions as background might indicate an incomplete reversion of crosslinking at these sites . 
+ To clarify this question we compared DNA that was crosslinked and reversed with DNA that was not crosslinked ( Fig. 5B ; see materials and methods for details ) . 
+ If the reversion of the crosslinking in this protocol is complete one would expect the two signals to be the same . 
+ This was indeed the case for the uvrD region . 
+ However , the rpsD DNA was more than seven-fold reduced in the crosslinked-reversed sample compared to the non-crosslinked DNA . 
+ To analyze the effect of crosslinking and reversion on a global scale we differentially labeled the DNA and applied it to a microarray . 
+ Ratios of the crosslinked-reversed versus the non-crosslinked DNA are shown in Fig. 5C ( blue signal ) . 
+ The results show that the same regions that gave a high background signal in the SeqA ChIP-Chip yielded a reduced signal when the DNA was crosslinked and reversed ( Fig. 5C ; compare blue and red signal , Fig. 3D-E ) . 
+ We tested if variations of conditions influence the efficiency of crosslink reversion . 
+ Crosslinked DNA was reversed at different temperatures and with or without proteinase K ( Table 1 ) . 
+ Resulting DNA was analyzed by qPCR with uvrD and rpsD primers as above and compared to non-crosslinked DNA . 
+ As above , the uvrD control DNA was not changed much by crosslinking and reversion while the rpsD region was depleted . 
+ Notably , the level of depletion was similar for all investigated conditions . 
+ We conclude that chromosomal regions can be crosslinked to a degree which is not reversible and the respective DNA will be lost for downstream analysis . 
+ Modification of the ChIP-Chip procedure allows genome-wide analysis of SeqA binding
+ Considering the identified weaknesses of the ChIP-Chip protocol it was possible to make appropriate modifications ( see materials and methods for details ) . 
+ The first change was the omission of spin-columns in the washing of agarose beads . 
+ Second , the input DNA was taken from the supernatant resulting from centrifugation of the immunoprecipitated chromatin beads . 
+ In addition , we included RNase digestion of immunoprecipitated DNA and excluded signals originating from microarray probes matching non-unique sequences during data analysis . 
+ The reasoning behind the latter two will be described in detail below . 
+ To test the new method we applied it to a cell extract of a seqA deletion strain using antiserum against SeqA ( Fig. 6 ) . 
+ As described above this should not give a specific ChIP signal and should therefore allow judgment of the level of background signal . 
+ Although some background was produced by the new method it was greatly reduced compared to the unmodified method ( Fig. 6 , compare blue to red ) . 
+ For the rpsD gene region the ChIP signal was reduced about 30-fold ( Fig. 6B ) . 
+ As a next step we used the new method to detect SeqA binding in wt E. coli cells . 
+ We found a distinct binding pattern with the highest peak at the origin of replication and very low SeqA binding in the terminus region of the chromosome ( Fig. 7 ) . 
+ The pattern differed greatly from that detected with the unmodified ChIP-Chip method ( Fig. 7 , compare red to grey ; Fig. 3B-C ) . 
+ Only minimal overlap with the crosslinking background was observed indicating significant reduction of background signals ( Fig. 3 , compare D-E with F ) . 
+ To put the results in a biological context we calculated the SeqA binding signal for a 60.000 bp moving window ( Fig. 7 , inner ring ) . 
+ The reasoning behind this is that SeqA has been shown to bind specifically to hemimethylated DNA `` trailing '' the replication fork . 
+ We estimated the stretch of hemimethylated DNA following the replication fork to be 60.000 bp ( based on a replication speed of 1000 bp/sec and an average hemimethylation time of 1 min ) . 
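+ The window size and the moving-window calculation can be sketched as follows . 
+ This is a minimal illustration , assuming that the window value is the mean signal of all probes within half a window of each probe on a circular chromosome ; the text does not state whether a mean or a sum was used , and circular_window_mean , the toy genome length and the probe values are hypothetical . 

```python
REPLICATION_SPEED_BP_PER_S = 1000   # replication speed given in the text
HEMIMETHYLATION_TIME_S = 60         # average hemimethylation time of 1 min

# Stretch of hemimethylated DNA trailing the fork: 1000 bp/s * 60 s = 60000 bp.
window_bp = REPLICATION_SPEED_BP_PER_S * HEMIMETHYLATION_TIME_S


def circular_window_mean(positions, signals, genome_length, window_bp):
    """Mean signal of all probes within window_bp / 2 of each probe position.

    Distances wrap around the end of the circular chromosome.
    Quadratic in the number of probes; adequate for a sketch only.
    """
    half = window_bp / 2
    means = []
    for center in positions:
        total, count = 0.0, 0
        for pos, sig in zip(positions, signals):
            d = abs(pos - center)
            d = min(d, genome_length - d)  # shortest way around the circle
            if d <= half:
                total += sig
                count += 1
        means.append(total / count)  # count >= 1: the probe itself is included
    return means


# Toy example (hypothetical): five probes on a 1 Mb circular genome.
positions = [0, 20_000, 40_000, 500_000, 980_000]
signals = [4.0, 2.0, 1.0, 0.5, 3.0]
smoothed = circular_window_mean(positions, signals, 1_000_000, window_bp)
print(window_bp, smoothed)
```

+ The probe at position 980.000 contributes to the window around position 0 because the chromosome is treated as circular . 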
+ The result shows that SeqA binding is not evenly distributed over the chromosome . 
+ Instead there are regions with strong binding , such as the origin of replication ( oriC ) and areas with low binding , such as to the left and right of oriC ( Fig. 7 ) . 
+ The most extended area with low SeqA binding is about one-fourth of the chromosome around the replication terminus with distinct borders rather than smooth transitions to the neighboring high SeqA binding regions . 
+ A clear correlation was observed between the number of GATC sites in the probe region and the corresponding ChIP signal ( Fig. 7B ) . 
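+ Grouping the ChIP signal by the number of GATC sites per probe region can be sketched as below . 
+ This is an illustration only : the helper names count_gatc and mean_signal_by_gatc , the toy sequences and the toy signals are hypothetical , and the size of the probe region used in the study is not specified here . 

```python
from collections import defaultdict


def count_gatc(seq):
    """Number of GATC sites in a sequence.

    GATC is its own reverse complement and cannot overlap itself,
    so a plain substring count on one strand is sufficient.
    """
    return seq.upper().count("GATC")


def mean_signal_by_gatc(regions, signals):
    """Average ChIP signal for probe regions sharing the same GATC count."""
    groups = defaultdict(list)
    for seq, sig in zip(regions, signals):
        groups[count_gatc(seq)].append(sig)
    return {n: sum(vals) / len(vals) for n, vals in sorted(groups.items())}


# Toy probe regions and signals (hypothetical values).
regions = ["AAGATCTTGATCAA", "CCCCGGGGAAAA", "GATCGATCGATC", "TTTTGATCTTTT"]
signals = [2.0, 0.5, 3.5, 1.0]
by_count = mean_signal_by_gatc(regions, signals)
print(by_count)
```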
+ In summary , we have shown that the revised ChIP-Chip protocol can be successfully used to gain insight into the challenging question of chromosome-wide SeqA binding in E. coli . 
+ Reinvestigation of σ32 binding to the E. coli genome
+ Given the enormous background signal produced by the original ChIP-Chip method initially used in this study we considered it likely that published results based on this method would contain many false positives . 
+ To examine this experimentally we used our modified ChIP-Chip protocol to reinvestigate binding of the heat shock sigma factor σ32 to the E. coli genome [ 10 ] . 
+ In the published study many novel σ32 binding sites were described . 
+ Using a specific antibody we precipitated σ32-bound DNA from lysates of cells before and 5 min after heat shock . 
+ Of the 38 σ32-targets found by Wade et al. and by others in studies using alternative methods , we detected 34 ( Table 2 ) . 
+ In contrast , out of the 49 targets found exclusively in the Wade et al. ChIP-Chip study , just seven appeared in our results ( Table 3 ) . 
+ Six potential targets were detected that were not found by Wade et al. , including the gene dgsA , also described by others ( Table 4 ) [ 11 ] . 
+ Since application of our modified method excludes most σ32-targets described solely in the published ChIP-Chip study we consider it likely that these are in fact false positives ( see discussion ) . 
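+ The counts above can be combined into a rough false positive estimate . 
+ The sketch below is one possible reading , assuming that the published study reported 38 + 49 targets in total and that every exclusive target we did not recover is a false positive ; the variable names are hypothetical . 

```python
# Counts taken from the text above.
shared_targets = 38      # found by Wade et al. and by others
shared_detected = 34     # of these, recovered with the modified protocol
exclusive_targets = 49   # found only in the Wade et al. ChIP-Chip study
exclusive_detected = 7   # of these, recovered with the modified protocol

# Assumption: the published study reported shared + exclusive targets in total.
published_total = shared_targets + exclusive_targets

# Assumption: exclusive targets that were not recovered count as false positives.
unconfirmed_exclusive = exclusive_targets - exclusive_detected
estimated_false_positive_rate = unconfirmed_exclusive / published_total

print(f"{unconfirmed_exclusive} of {published_total} targets unconfirmed "
      f"({estimated_false_positive_rate:.0%})")
```

+ Under these assumptions 42 of 87 published targets remain unconfirmed , which is close to one half . 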
+ Limited RNase treatment is an additional source of false positives in ChIP-Chip studies
+ The σ32 ChIP-Chip was used to investigate additional sources of false positive findings , such as the duration of RNase incubation of immunoprecipitated complexes . 
+ While some published ChIP-Chip protocols include an RNase digestion step others do not . 
+ We used an extended RNase incubation at 42 °C for at least 90 min in our modified ChIP-Chip method . 
+ To examine the effect of limited RNA digestion we shortened the incubation to 30 min with an otherwise unchanged protocol ( Fig. 8A ) . 
+ The shortened RNase incubation increased the unspecific background signal drastically compared to the two experiments with longer RNA digestion . 
+ Some false positive σ32-targets of the published ChIP-Chip study described above might originate from RNA , since the method used lacks an RNase step . 
+ Accordingly , we observed a much higher signal with shorter compared to extended RNase treatment for some of the false positive σ32-targets ( for example yghJ , Fig. 8B ) . 
+ Non-unique sequences can cause false positives in ChIP-Chip analysis
+ One important source of false positive findings in ChIP-Chip studies is the inclusion of non-unique sequences . 
+ For the 40.000 probes on the microarray used in this study we examined the number of complementary sequences on the E. coli chromosome . 
+ 889 probes were found to match multiple loci on the chromosome , the numbers ranging from 2 to 11 ( data not shown ) . 
+ Note that signals obtained with these probes and the surrounding probes were routinely excluded from all results shown above as mentioned . 
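+ A screen for probes that match multiple loci can be sketched as follows . 
+ This is a minimal illustration , assuming exact matching on both strands of a linear sequence ; the matching criteria actually used for the microarray probes are not described here , and the function names , the toy genome and the probe names are hypothetical . 

```python
def reverse_complement(seq):
    """Reverse complement of an upper-case DNA sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def count_occurrences(genome, probe):
    """Count exact, possibly overlapping matches of probe on one strand."""
    count = 0
    start = genome.find(probe)
    while start != -1:
        count += 1
        start = genome.find(probe, start + 1)
    return count


def locus_count(genome, probe):
    """Number of exact matches of a probe on either strand of the genome."""
    rc = reverse_complement(probe)
    n = count_occurrences(genome, probe)
    if rc != probe:  # avoid double counting palindromic probes
        n += count_occurrences(genome, rc)
    return n


# Toy genome (hypothetical): p1 occurs once on each strand, p2 only once.
genome = "ATGCCGTACCCAAGGCTTCCCTACGGCATCCC"
probes = {"p1": "ATGCCGTA", "p2": "AAGGCTT"}

non_unique = [name for name, seq in probes.items() if locus_count(genome, seq) > 1]
print(non_unique)
```

+ Probes flagged in this way , and optionally their neighbors , would then be dropped before peak detection . 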
+ However , to investigate the effect of these non-unique probes we reanalyzed the σ32-ChIP-Chip experiment of 30 °C cells described above including the non-unique sequences ( Fig. 8C ) . 
+ Some of these probes gave an elevated ChIP signal . 
+ Consequently , six new peaks were detected by our search algorithm in addition to the 15 peaks detected before ( Fig. 8C ) . 
+ Also the published σ32-study includes two target sites in non-unique sequence regions . 
+ These are the yibA promoter close to the rhsA gene and the yrdA promoter downstream of the ribosomal RNA gene rrsD . 
+ In summary , our data demonstrate the potential of non-unique sequences to cause false-positive findings in ChIP-Chip studies . 
+ Discussion
+ Multiple sources of false positives in ChIP-Chip studies
+ Here we present four sources of high background signals that caused false positive target site detection in our experiments as well as in many published studies . 
+ In the following we discuss how this unspecific background might occur . 
+ The first two problems , namely the selective enrichment of some DNA fragments during spin-column washing and the variability in reversion of crosslinking , might actually be due to the same circumstance . 
+ Both affected chromosomal regions with high transcription activity , such as the ribosomal protein gene rpsD ( Fig. 5 ) . 
+ In such regions crosslinking of RNA polymerase , DNA and transcribed mRNA will form large complexes . 
+ Concerning the washing of immunoprecipitated DNA with spin-columns it is easy to imagine that such highly cross-linked fragments could be trapped in the column matrix . 
+ A release of these bound complexes in the elution step would explain the enrichment of protein-rich DNA through washing with spin-columns . 
+ This would be limited to the IP DNA in a ChIP experiment because usually no beads are used to purify the input DNA . 
+ The logical improvement of the protocol in this case was to wash the immunoprecipitated DNA without spin-columns . 
+ Another possibility would be to use systems which separate beads by magnetism instead of centrifugation . 
+ In contrast , the difference in crosslinking/reversion efficiencies at genomic loci could not be reduced by leaving out the crosslinking because it is an essential part of the protocol . 
+ The incomplete reversion of crosslinking led to depletion of protein-rich chromosomal regions during DNA preparation ( Fig. 5 ) . 
+ If this depletion were similar in the IP and input DNA it would not appear as ChIP signal because the corresponding ratio would be one . 
+ However , different rates of depletion in IP and input DNA would let this ratio go up or down . 
+ If for example 60 % of a crosslinked site is reversed in the IP DNA but only 30 % in the input DNA this would appear as two-fold enrichment and potentially as false positive target . 
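+ The arithmetic of this example can be written out as a short sketch . 
+ It assumes that the ChIP signal is the simple ratio of recovered IP DNA to recovered input DNA at a locus ; the function name apparent_enrichment is hypothetical . 

```python
def apparent_enrichment(recovered_ip, recovered_input):
    """Ratio of the DNA fractions recovered after crosslink reversal.

    Both arguments are the fraction of crosslinked sites that were
    successfully reversed (0 < value <= 1) in IP and input DNA.
    """
    if recovered_input <= 0:
        raise ValueError("recovered_input must be positive")
    return recovered_ip / recovered_input


# Example from the text: 60 % reversal in IP DNA, 30 % in input DNA.
ratio = apparent_enrichment(0.60, 0.30)

# Equal depletion in both samples cancels out and gives no ChIP signal.
neutral = apparent_enrichment(0.30, 0.30)

print(ratio, neutral)
```

+ The two-fold value arises although no protein of interest is bound at the site , which is why such a locus could be scored as a false positive target . 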
+ Thus , transcriptionally active regions of the chromosome are more likely to show a high background signal . 
+ This problem could not be solved by variation of reversion conditions ( Table 1 ) . 
+ However , as one way to better separate the real targets from such background we increased the specific signal by using the supernatant of the immunoprecipitation as input DNA . 
+ This should amplify the specific signal because it will be enriched in the immunoprecipitated DNA and at the same time reduced in the reference DNA . 
+ A high background signal originating from non-digested RNA may also occur in ChIP-Chip experiments . 
+ This will for example be high if the Klenow fragment is used for labelling of immunoprecipitated DNA , since it can use RNA as primer to incorporate labelled nucleotides . 
+ If a linker-mediated PCR is used to amplify the immunoprecipitated DNA the amount of RNA relative to DNA will be reduced , potentially reducing the RNA-caused background . 
+ Here we show that a thorough RNase digestion is a suitable way to eliminate the RNA background , allowing a free choice of subsequent labelling and amplification techniques . 
+ An additional origin of high background signals in microarray analysis is the occurrence of non-unique sequences on the chromosomes . 
+ A systematic evaluation of labeling and microarray hybridization of predefined DNA targets revealed such genome redundancy as one major cause of false positives [ 12 ] . 
+ A probe to a non-unique sequence will bind a mix of DNA fragments originating from different chromosomal loci . 
+ The chromosomal position can influence the protein binding to the different copies of a non-unique sequence and may therefore lead to erroneous ChIP-Chip results . 
+ If for example one copy is located downstream of an active promoter and the other copy is not , an RNA polymerase ChIP would enrich the first locus but not the second . 
+ On the microarray this would appear as a medium enrichment at both chromosomal positions . 
+ Additional errors might occur at non-unique sequences with multiple copies and some sequence variation . 
+ In this case one probe might be complementary to for example two copies and the neighboring one to seven copies . 
+ Genes that are typically non-unique are the ribosomal and transfer RNA genes or transposons but also for example the rhsABCD gene family or gadAB in E. coli . 
+ To estimate the degree of false positives caused by non-unique sequences we screened the literature for occurrence of the mentioned genes as target sites in microarray studies . 
+ Appearance of non-unique sequence false positives turned out to be quite frequent . 
+ For example , 36 out of 269 ` extended protein occupancy domains ' in a recent study from Vora et al. are in regions with non-unique sequences [ 13 ] . 
+ Some studies even draw major conclusions from the appearance of non-unique sequence false positives . 
+ For example , the heat shock regulator HspR was suggested to be involved in regulation of tRNA and rRNA genes in Streptomyces coelicolor [ 14 ] , the B. subtilis condensin SMC was proposed to be recruited to rRNA and tRNA genes [ 15 ] and tRNA genes were described to be cohesin loading sites both in budding and fission yeast [ 16,17 ] . 
+ All of the mentioned gene loci are non-unique in the respective genomes . 
+ Note that in principle the described conclusions could be right ; it is just that the results of microarray experiments can say nothing about it and might actually be misleading instead . 
+ Fortunately , non-unique sequences can be easily detected and corresponding probes be excluded from data sets . 
+ Even better would be elimination during array design . 
+ Beside the causes of high background described in this study other factors have been shown to affect the background level . 
+ For example Lee and colleagues point out that ChIP-Chip experiments are highly dependent on the antibody used for the immunoprecipitation [ 4 ] . 
+ The background signal will be high if the antibody performs poorly or if it binds other proteins unspecifically . 
+ In this context the salt concentration of the IP and wash buffer is critical and can be adjusted to optimize immunoprecipitation [ 4 ] . 
+ In addition to the experimental procedure improper data processing can lead to false positive findings . 
+ How the data are analyzed will depend on different factors such as probe density and the relative number of binding sites [ 2 ] . 
+ Correct normalization regarding the dye bias in two color microarrays has been shown to be essential for ChIP-Chip experiments [ 18 ] . 
+ How frequent are false positives in published ChIP-Chip data ? 
+ The presence of non-unique sequence false positives might indicate that a high number of false positives is the rule , rather than the exception , in published ChIP-Chip studies . 
+ A false positive rate of about 50 % was found by our reinvestigation of a published σ32-study [ 10 ] . 
+ The conclusion that the targets found in the published ChIP-Chip experiment but not in our study are false positives is supported by findings from others [ 11,19 ] . 
+ While almost all of the targets we detected have been found with methods other than ChIP-Chip , the only evidence for the supposed new targets by Wade et al. is their ChIP-Chip analysis [ 10,11,19 ] . 
+ It is noteworthy that this analysis was done with the protocol used in the first experiment of our study producing a high background [ 3 ] . 
+ In addition the supposed new targets lacked a typical σ32-recognition site [ 10 ] . 
+ Further evidence for a high false-positive rate in ChIP-Chip studies comes from large differences in the binding sites detected in parallel studies . 
+ For example , FIS was found to bind all regions on the E. coli genome that are bound by RNA polymerase despite the absence of consensus binding sites [ 20 ] . 
+ A later study showed very different results with data that nicely fit the distribution of FIS binding motifs [ 21 ] . 
+ In two independent studies the binding of the estrogen receptor to human chromosome 17 of MCF-7 breast cancer cells was analyzed [ 22,23 ] . 
+ We compared the 389 binding sites described in the Gevry study to the 390 sites detected in the Carroll study and found only about 50 % overlap ( binding sites were considered the same when not more than 2000 bp apart , data not shown ) . 
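+ The comparison rule ( sites count as the same when not more than 2000 bp apart ) can be sketched as below . 
+ This is an illustration , assuming each site is represented by a single coordinate on the same chromosome ; the function name fraction_shared and the toy coordinates are hypothetical and are not the published binding sites . 

```python
def fraction_shared(sites_a, sites_b, max_distance=2000):
    """Fraction of sites in sites_a with a site in sites_b within max_distance bp."""
    if not sites_a:
        return 0.0
    shared = sum(
        1 for a in sites_a
        if any(abs(a - b) <= max_distance for b in sites_b)
    )
    return shared / len(sites_a)


# Toy coordinates (hypothetical) for two studies of the same chromosome.
study_a = [10_000, 55_000, 120_500, 300_000]
study_b = [11_500, 57_500, 121_000, 410_000]

shared_fraction = fraction_shared(study_a, study_b)
print(f"{shared_fraction:.0%} of study A sites have a partner in study B")
```

+ The measure is not symmetric , so in practice it would be reported from the point of view of each study . 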
+ Interestingly , others have also suggested an extended degree of false positives as explanation for contradictory results in parallel ChIP-Chip studies . 
+ Highly dissimilar binding patterns of the Mediator complex in yeast were reported [ 24-26 ] . 
+ Fan and Struhl reinvestigated the contradictory results and suggested that the differences were caused by a high degree of false positives due to the experimental set-up of Andrau and colleagues [ 27 ] . 
+ These supposed false positives are mainly located in transcriptionally active coding regions as is also the case in our study . 
+ A high number of false positives would make systematic approaches to analyze ChIP-Chip-derived binding sites especially difficult . 
+ Indeed , a recent analysis of yeast ChIP-Chip data revealed that only 48 % of detected transcription factor binding sites could be explained by direct binding and an additional 16 % by indirect binding [ 28 ] . 
+ The remaining 36 % of the data set could not be explained by either direct or indirect transcription factor binding and were suggested to be noise . 
+ Taken together , high false positive rates seem to be common in ChIP-Chip studies . 
+ In some cases it actually seems to be an accepted fact . 
+ For example , Partridge and colleagues removed over one third of ChIP-Chip detected NsrR target sites just because they did not fit their expectations of lying in promoter regions [ 29 ] . 
+ However , this high false-positive rate was not investigated any further . 
+ How to deal with the background
+ Beside the need for technical improvements , the high level of ChIP-Chip false positives emphasizes the great importance of suitable control experiments . 
+ Good controls are ChIP-Chip experiments with cells lacking the IP epitope ( for instance ΔseqA ; Fig. 6 ) , mock IPs without antibody ( Fig. 5A ) , IPs with preimmune serum , or IPs from cells growing under conditions that are expected to give no or reduced binding of the respective protein ( such as 30 °C for the heat shock sigma factor σ32 ; Fig. 8C ) . 
+ A suitable control experiment has two important functions . 
+ First , it allows estimation of the experimental quality . 
+ In this study the ΔseqA control was the key to understanding that the ChIP-Chip method gave high background ( Fig. 4 and 6 ) . 
+ Second , a control experiment can help to detect targets in the actual experiment . 
+ We used the σ32 control ChIP-Chip at 30 °C to find significant targets in the corresponding data set of heat shocked cells ( see materials and methods ) . 
+ It has been suggested that DNA from control experiments should be used as a hybridization reference , meaning that for example the IP DNA from a wt strain and a deletion strain are differentially labeled and hybridized to the same array [ 30 ] . 
+ However , others point out that a control should never be used as hybridization reference [ 2 ] . 
+ We agree with the latter opinion because use of control DNA as hybridization reference would not allow assessment of the experimental quality as outlined above . 
+ For instance , bad quality DNA from experiments with limited digestion of RNA ( Fig. 8A-B ) might not be detected if used as hybridization reference . 
+ Taken together , appropriate control experiments should be included in every ChIP-Chip study . 
+ Submission of the raw and processed control data to the public should be self-evident but is an exception in published studies so far . 
+ Recently , chromatin immunoprecipitation has been combined with high throughput sequencing methods ( ChIP-Seq ) . 
+ Interestingly , an analysis of different types of control DNA resulted in a variable pattern of background distributed over the chromosomes [ 31,32 ] . 
+ The pattern of background peaks varied between input DNA , noncrosslinked DNA and mock-IP DNA and led to the conclusion that the type of reference DNA directly influences the number of sites deemed significant when scoring ChIP-Seq data . 
+ This underlines that the described problems apply to chromatin immunoprecipitation based methods in general . 
+ Revised ChIP-Chip method reveals new biological insights
+ The revised ChIP-Chip method we developed enabled us to analyze binding of the sequestration protein SeqA to the E. coli chromosome . 
+ SeqA is involved in regulation of replication initiation and also proposed to play a role in chromosome organization and segregation [ 6 ] . 
+ It was found to exhibit prolonged binding to hemimethylated GATC sites at oriC and thereby to hinder reinitiation [ 7,33 ] . 
+ Enhanced binding of SeqA at oriC was also found in our ChIP-Chip analysis ; in fact , it was the highest peak detected ( Fig. 7 ) . 
+ The second-highest peak was in the dnaA promoter region which has been shown to have an exceptionally long hemimethylation period [ 8 ] . 
+ While our data support SeqA binding as proposed for oriC and the dnaA promoter , they contradict published suggestions on chromosome-wide binding . 
+ Brendler and colleagues found an even distribution of potential SeqA binding sites over the chromosome [ 34 ] . 
+ Our data suggest that SeqA structures retain specific DNA tracts for varying amounts of time . 
+ Most striking is the relatively short duration of SeqA binding to the left and right of oriC and to the DNA at about one-quarter of the chromosome surrounding the replication terminus . 
+ The latter finding is in contrast to results from ChIP-PCR experiments with synchronized cells which suggested a prolonged SeqA binding in the terminus region [ 35 ] . 
+ Clearly , further analysis and additional experiments are needed to understand the biological meaning of the SeqA binding pattern . 
+ Conclusions
+ We describe here a revised ChIP-Chip method and show its potential to greatly reduce false positive target site detection , which seems to be a widespread problem . 
+ Although we present many examples of high false positive rates in published studies , it has to be pointed out that this will vary greatly with the exact experimental details as outlined above . 
+ Since method details such as the duration of the RNase treatment or the use of spin columns have a major impact on the background signal , it is of high importance to give an accurate description of the procedure used . 
+ The results reported here should allow critical reviewing of published ChIP-Chip studies as well as assessment and potential modification of other variants of the ChIP-Chip method and related methods . 
+ Methods
+ Cell growth , crosslinking and preparation of cell extracts
+ For SeqA and RNA polymerase ChIP-Chip E. coli MG1655 or MG1655 ΔseqA ( Table 5 ) was grown at 37 °C to an OD600 of about 0.15 in 50 ml LB ( + 0.2 % glucose ) before 27 μl of formaldehyde ( 37 % ) per ml medium were added ( final concentration 1 % ) . 
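+ The stated final formaldehyde concentration can be checked with a simple dilution calculation . 
+ The sketch assumes that volumes are additive and that the percentage of the stock scales linearly with dilution ; the function name final_percent is hypothetical . 

```python
def final_percent(stock_percent, added_volume_ul, medium_volume_ul):
    """Final concentration (%) after adding a stock solution to medium.

    Assumes additive volumes, so the total is added + medium volume.
    """
    total_volume_ul = added_volume_ul + medium_volume_ul
    return stock_percent * added_volume_ul / total_volume_ul


# 27 ul of 37 % formaldehyde per 1 ml (1000 ul) of medium, as in the text.
formaldehyde = final_percent(37.0, 27.0, 1000.0)
print(f"final formaldehyde concentration: {formaldehyde:.2f} %")
```

+ The result is slightly below 1 % , consistent with the rounded value given above . 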
+ Crosslinking was performed at slow shaking ( 100 rpm ) at room temperature for 20 min followed by quenching with 10 ml of 2.5 M glycine ( final concentration 0.5 M ) . 
+ For heat-shock experiments , E. coli MG1655 was grown in 65 ml LB medium at 30 °C to an OD600 of about 0.3 . 
+ Subsequently 30 ml of culture was transferred to a prewarmed flask at 43 °C and the remainder kept at 30 °C . 
+ Crosslinking and quenching were as described above except that cells were kept at 30 or 43 °C for 5 min before further slow shaking at room temperature . 
+ Cells were collected by centrifugation and washed twice with cold TBS ( pH 7.5 ) . 
+ After resuspension in 1 ml lysis buffer ( 10 mM Tris ( pH 8.0 ) , 20 % sucrose , 50 mM NaCl , 10 mM EDTA , 10 mg/ml lysozyme ) and incubation at 37 °C for 30 min followed by addition of 4 ml IP buffer , cells were sonicated on ice 12 times for 30 sec with 30 sec breaks using a UP 400 s Ultrasonic processor ( Dr. Hielscher GmbH ) at 100 % power . 
+ After centrifugation for 10 min at 9000 g , 800 μl aliquots of the supernatant were stored at -20 °C . 
+ ChIP
+ The ChIP protocol initially used in this study was as described in Grainger et al. , 2004 except that DNA was purified with phenol/chloroform instead of a PCR clean up kit . 
+ 800 μl of sonicated cell extract ( see above ) was incubated with 20 μl protein A/G agarose beads ( Ultralink ) and 5 μl of SeqA antiserum or antibody against RNA polymerase subunit β ( Neoclone ) at 4 °C over night . 
+ Samples were transferred to a Spin-X centrifuge column ( Costar ) , centrifuged for 2 min at 4.000 rpm to collect the beads . 
+ The flow through was removed . 
+ Washing was done by adding 500 μl buffer to the beads on the spin column and rotation at room temperature for three minutes with subsequent collection of the beads by centrifugation as above . 
+ Washing was performed with the following buffers ( IP buffer two times , all others one time ) : IP buffer ( 50 mM HEPES-KOH pH 7.5 , 150 mM NaCl , 1 mM EDTA , 1 % Triton X-100 , 0.1 % Sodium deoxycholate , 0.1 % SDS ) , IP buffer with 500 mM NaCl , wash buffer ( 10 mM Tris pH 8.0 , 250 mM LiCl , 1 mM EDTA , 0.5 % Nonidet-P40 , 0.5 % Sodium deoxycholate ) and TE . 
+ For elution , 100 μl elution buffer ( 50 mM Tris ( pH 7.5 ) , 10 mM EDTA , 1 % SDS ) was added to the column with the beads , incubated in a 65 °C water bath for 10 min and centrifuged as above . 
+ To reverse the cross link 80 μl TE and 20 μl proteinase K ( 20 mg/ml ) were added and samples incubated for 2 h at 42 and 6 h at 65 °C . 
+ DNA was purified with phenol/chloroform . 
+ To prepare the control DNA , 800 μl of sonicated cell extract was incubated at 65 °C over night . 
+ 1 μl RNase A ( 20 mg/ml ) were added and samples incubated 30 min at 65 °C before extraction with phenol/chloroform . 
+ The ChIP protocol as described above resulted in the high background signal ( Fig. 2 and 4 ) . 
+ The following modifications were applied for the other ChIP-Chip experiments . 
+ First , agarose beads were not collected on a spin column but instead at the bottom of a usual 1.5 ml eppendorf tube . 
+ The supernatant was then removed by pipetting . 
+ Second , the control DNA was taken from the supernatant resulting from centrifugation of the precipitated chromatin beads and was processed further in the same way as the immunoprecipitated DNA after elution . 
+ Third , before addition of proteinase K , sample and control DNA were incubated with RNase A ( 50 μg / ml ) for at least 90 min at 42 °C ( except in the σ32-analysis shown in Fig. 8A where incubation was 30 min as indicated ) . 
+ Incubation of 800 μl cell extract with 15 μl σ32 - or 5 μl SeqA antiserum was for 1 h at 4 °C . 
+ Labeling and array hybridisation
+ Usually DNA from six parallel immuno-precipitations ( each with 800 μl extract as described ) were joined and labeled with Cy3-dCTP using the Klenow fragment and random primers of the BioPrime kit from Invitrogen . 
+ An equal amount of hybridization control DNA was labeled with Cy5-dCTP . 
+ Hybridization was for about 36 h at 55 °C to E. coli whole genome microarrays from Oxford Gene Technology . 
+ The arrays have a probe length of 60 bases and a start to start spacing of about 150 bases . 
+ ChIP-Chip analyses were performed in duplicate , except the crosslink-reversion array ( Fig. 5 ) , the ΔseqA arrays ( Fig. 6 ) and the shorter RNase incubation array ( Fig. 8A ) . 
+ Please note that the array hybridized with the SeqA ChIP of the ΔseqA strain with the unmodified method was of poor quality but regarded sufficient for its purpose described above . 
+ Microarray data processing
+ Arrays were scanned on an Agilent SureScan High-Resolution Scanner . 
+ Spot intensities were extracted using the Feature Extraction software 10.5.1.1 from Applied Biosystems with a linear dye normalization correction method . 
+ The data were further analyzed with the statistics software R , in particular the Bioconductor package and the limma library [ 36,37 ] . 
+ The background was subtracted and data points with a value below 0 after background subtraction were removed . 
+ Ratios of g ( sample ) to r ( control ) were calculated and normalized to the array wide average . 
+ For arrays performed in duplicates the mean of the two normalized values was calculated . 
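+ The background subtraction , ratio calculation and normalization steps above can be sketched in Python as follows . This is an illustration only , not the R/limma code used in the study : the function names are hypothetical , and dropping probes with a zero control signal is an added assumption to avoid division by zero .

```python
from statistics import mean


def normalized_ratios(sample, control, sample_bg, control_bg):
    """Return {probe_index: g/r ratio divided by the array-wide mean ratio}."""
    ratios = {}
    for i, (g, r, gb, rb) in enumerate(zip(sample, control, sample_bg, control_bg)):
        g_corr, r_corr = g - gb, r - rb
        # Values below 0 after background subtraction are removed (as in the text).
        # Assumption: a control value of exactly 0 is also dropped (division by zero).
        if g_corr < 0 or r_corr <= 0:
            continue
        ratios[i] = g_corr / r_corr
    array_average = mean(ratios.values())
    return {i: ratio / array_average for i, ratio in ratios.items()}


def duplicate_mean(rep1, rep2):
    """Mean of the normalized values for probes present in both replicates."""
    return {i: (rep1[i] + rep2[i]) / 2 for i in rep1.keys() & rep2.keys()}
```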
+ Probes in gene regions with non unique sequences were deleted ( a list is available on request ) . 
+ For σ32-target detection data obtained from heat-shocked cells were searched for two or more neighboring probes with a log2 signal > 0.5 in both replicates . 
+ This resulted in 74 potential targets ( 34 previously described , 9 described exclusively by Wade et al. , 2006 , 31 not found by Wade et al. ) . 
+ After subtraction of log2 signals of the corresponding replicates from non-heat-shocked cells , 47 potential targets remained ( Tables 2 , 3 , 4 ; 34 previously described , 7 described exclusively by Wade et al. , 2006 , 6 not found by Wade et al. ) . 
+ For peak detection in σ32-data of non-heat-shocked cells ( Fig. 8C ) we searched for probes with a log2 ratio > 1 whose left and right neighbours both had a log2 ratio > 0.5 . 
+ GenomeViz was used for visualization of ChIP-Chip data [ 38 ] . 
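+ The criterion used for heat-shocked cells ( two or more neighboring probes with a log2 signal > 0.5 in both replicates ) can be sketched as below . The sketch assumes that probes are listed in genomic order so that adjacent list indices are neighboring probes ; the function name is hypothetical .

```python
def candidate_runs(rep1, rep2, threshold=0.5, min_len=2):
    """Return inclusive (first, last) index pairs of runs of adjacent probes
    whose log2 signal exceeds `threshold` in both replicates."""
    runs, start = [], None
    n = min(len(rep1), len(rep2))
    for i in range(n):
        if rep1[i] > threshold and rep2[i] > threshold:
            if start is None:
                start = i  # a new run begins here
        else:
            if start is not None and i - start >= min_len:
                runs.append((start, i - 1))
            start = None
    # Close a run that extends to the last probe.
    if start is not None and n - start >= min_len:
        runs.append((start, n - 1))
    return runs
```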
+ Data points with log2 ratios > 0.5 were extracted and the corresponding genome locus assigned as 1000 bp up - and down-stream of the respective probe middle . 
+ For the moving window calculation of SeqA binding the sum of positive log2 ratios in 60.000 bp windows was calculated with a step size of 1000 bp . 
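+ A minimal sketch of such a moving window calculation is given below . It treats the chromosome as linear ( no wrap-around at the origin of the coordinate system ) , which is a simplifying assumption not stated in the text ; the function name is hypothetical .

```python
def moving_window_sum(positions, log2_ratios, genome_length, window=60000, step=1000):
    """Return (window_start, sum of positive log2 ratios) for each window
    [window_start, window_start + window)."""
    sums = []
    for start in range(0, genome_length, step):
        end = start + window
        total = sum(
            ratio
            for pos, ratio in zip(positions, log2_ratios)
            if start <= pos < end and ratio > 0  # only positive ratios contribute
        )
        sums.append((start, total))
    return sums
```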
+ Raw as well as processed data are available at the Genome Omnibus Database , accession number GSE19053 . 
+ To analyze the overlap of ChIP-Chip experiments a cut-off was chosen for each data set to select ~ 1000 probes with the highest ChIP signal ( or the lowest signal for the crosslinking experiment ) . 
+ The overlap is the number of probes where the signal is beyond this cut-off at similar positions in the two compared data sets . 
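+ The overlap calculation can be sketched as follows , assuming both data sets come from the same array design so that a shared probe index stands for a " similar position " ; the helper names are hypothetical .

```python
def top_probes(signals, n=1000, lowest=False):
    """Indices of the n probes with the highest (or lowest) signal."""
    order = sorted(range(len(signals)), key=lambda i: signals[i], reverse=not lowest)
    return set(order[:n])


def overlap(signals_a, signals_b, n=1000):
    """Number of probes beyond the cut-off in both data sets."""
    return len(top_probes(signals_a, n) & top_probes(signals_b, n))
```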
+ ChIP washing comparison
+ For the comparison of washing methods ( Fig. 5A ) 2 × 800 μl of crosslinked , sonicated MG1655 cell extract were incubated with 20 μl protein A/G agarose beads ( Ultralink ) without antibody for 1 h at 4 °C . 
+ One of these mock IP samples was then processed with the use of spin columns and one without as described above . 
+ Eluted DNA was purified with phenol/chloroform and analysed by quantitative PCR as described below . 
+ Note that purification of the DNA with a Qiagen PCR cleanup kit gave the same results as the phenol extraction ( data not shown ) . 
+ Crosslink comparison
+ To compare crosslinked-reversed with non crosslinked DNA , 100 ml E. coli MG1655 LB culture was grown at 37 °C to an OD600 of 0.15 . 
+ After collecting 50 ml as ` non crosslinked ' sample , crosslinking was done as described above . 
+ Crosslinked and non crosslinked cells were washed and sonicated corresponding to the ChIP-Chip protocol above . 
+ For experiments presented in Fig. 5B and 5D , 400 μl of the sonicated extracts were mixed with 400 μl TE and incubated with 2 μl RNase A ( 20 mg/ml ) at 42 °C for 1 h. Next , 200 μl proteinase K ( 20 mg/ml ) were added and samples incubated for 2 h at 42 and 6 h at 65 °C . 
+ For experiments without proteinase K shown in table 1 , 200 μl of crosslinked extract was mixed with 200 μl TE and incubated at 65 °C over night or 10 min at 100 °C . 
+ For the other experiments 200 μl were mixed with 160 μl TE plus 40 μl proteinase K ( 20 mg/ml ) and incubated at 37 °C over night or for 2 h at 42 °C followed by 65 °C for 6 h. DNA was extracted with phenol and chloroform and analyzed by microarray hybridization ( as above ) or qPCR as described below . 
+ Reactions were carried out in triplicates of 25 μl volume each . 
+ About 10 ng DNA was used as template in 10 μl ddH2O and added to a mix of 12.5 μl TaqMan Gene Expression mix ( Applied Biosystems ) and 2.5 μl primer mix ( 9 μM each forward and reverse primer and 2.5 μM probe ) in 96 well PCR plates . 
+ For a primer list see Table 5 . 
+ Reactions were carried out with a 7500 Real Time PCR System ( Applied Biosystems ) . 
+ The system software was used to calculate Ct values which were transformed to relative values of template DNA . 
+ qPCR values for the yahEF gene region were used for normalization . 
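+ The text does not state how the system software converts Ct values to relative template amounts . One common approach is the ΔCt calculation sketched below , which assumes a perfect doubling per cycle ( efficiency of 2 ) and uses the mean of the triplicates ; the function name and the efficiency default are assumptions .

```python
from statistics import mean


def relative_amount(ct_target, ct_reference, efficiency=2.0):
    """Relative template amount of a target region, normalized to a reference
    region (here: yahEF), from triplicate Ct values."""
    delta_ct = mean(ct_target) - mean(ct_reference)
    # A lower Ct means more template, hence the negative exponent.
    return efficiency ** (-delta_ct)
```

+ For example , a target that reaches the threshold two cycles before the reference corresponds to a four-fold higher relative amount under this assumption .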
+ Abbreviations
+ ChIP-Chip : chromatin immunoprecipitation combined with microarrays ; ChIP-Seq : chromatin immunoprecipitation combined with next generation sequencing ; IP : immunoprecipitation . 
+ Authors' contributions
+ TW designed and carried out the experiments , analyzed the data and drafted the manuscript . 
+ KS participated in design of the study , interpretation of data and in writing of the manuscript . 
+ Both authors read and approved the final manuscript . 
+ Acknowledgements
+ We thank Franz Narberhaus ( Bochum ) for the σ32 antiserum and Douglas Hurd ( Oxford Gene Technology ) for instruction in DNA labeling and microarray hybridization . 
+ We are grateful for the support from the Helse Sør-Øst / University of Oslo Microarray Core Facility , supported by the functional genomics programme ( FUGE ) of the Research Council of Norway . 
+ We thank Erik Boye , Frank Führer and Leonardo A. Meza-Zepeda for critical reading of the manuscript and the Skarstad-group for helpful discussions . 
+ Irene Kim is acknowledged for her help with submission of the microarray data to the Genome Omnibus Database . 
+ Supported by the Norwegian Research Council FUGE program and the German Research Foundation ( WA 2713/1-1 ) . 
+ Received: 9 February 2010 Accepted: 5 July 2010 Published: 5 July 2010
\ No newline at end of file
--- a/data/TEXT_FILES/useful_txt/21097887.txt 0 → 100644
View file @27818a9
+++ b/data/TEXT_FILES/useful_txt/21097887.txt 0 → 100644
View file @27818a9
+ Direct and indirect effects of H-NS and Fis
+ ABSTRACT 
+ Nucleoid-associated proteins ( NAPs ) are global regulators of gene expression in Escherichia coli , which affect DNA conformation by bending , wrapping and bridging the DNA . 
+ Two of these -- H-NS and Fis -- bind to specific DNA sequences and structures . 
+ Because of their importance to global gene expression , the binding of these NAPs to the DNA was previously investigated on a genome-wide scale using ChIP-chip . 
+ However , variation in their binding profiles across the growth phase and the genome-scale nature of their impact on gene expression remain poorly understood . 
+ Here , we present a genome-scale investigation of H-NS and Fis binding to the E. coli chromosome using chromatin immunoprecipitation combined with high-throughput sequencing ( ChIP-seq ) . 
+ By performing our experiments under multiple time-points during growth in rich media , we show that the binding regions of the two proteins are mutually exclusive under our experimental conditions . 
+ H-NS binds to significantly longer tracts of DNA than Fis , consistent with the linear spread of H-NS binding from high - to surrounding lower-affinity sites ; the length of binding regions is associated with the degree of transcriptional repression imposed by H-NS . 
+ For Fis , a majority of binding events do not lead to differential expression of the proximal gene ; however , it has a significant indirect effect on gene expression partly through its effects on the expression of other transcription factors . 
+ Nucleic Acids Research , 2011 , Vol . 39 , No. 6 2073 -- 2091 doi : 10.1093/nar/gkq934
+ We propose that direct transcriptional regulation by Fis is associated with the interaction of tandem arrays of Fis molecules to the DNA and possible DNA bending , particularly at operon-upstream regions . 
+ Our study serves as a proof-of-principle for the use of ChIP-seq for global DNA-binding proteins in bacteria , which should become significantly more economical and feasible with the development of multiplexing techniques . 
+ INTRODUCTION
+ Transcription in bacteria is controlled by a combination of DNA sequence , topology and a range of trans-acting factors ( 1 ) . 
+ The best-studied trans-acting regulators are transcription factors ( TFs ) which modulate transcription by promoting or inhibiting the interaction of the DNA-dependent RNA-polymerase with promoter regions . 
+ Bacterial regulators are broadly classiﬁed into global and local ( 2 ) based primarily on the number of genes that they target for regulation : notable among the former are a subset of ` nucleoid-associated proteins ' ( NAPs ) ( 3 ) . 
+ Many NAPs alter the topology of the bound DNA by bending , wrapping or bridging it ( 3,4 ) . 
+ This has multiple effects on the bacterial cell , among which is transcriptional regulation . 
+ Analysis of 12 types of NAPs present in Escherichia coli showed that they differ in their expression during the growth phase ( 5 ) , the degree of sequence specificity ( 6 ) and the capacity for post-translational modifications . 
+ Two of the best-studied NAPs , H-NS and Fis , display peak expression during exponential growth and bind to speciﬁc DNA sequences and structures . 
+ Neither protein is known to be post-translationally modiﬁed . 
+ H-NS is a global repressor of transcription in enterobacteria . 
+ It acts as a ` genome sentinel ' ( 7 ) by suppressing the transcription of horizontally-acquired genes , thus providing a ﬁtness beneﬁt to Salmonella grown under laboratory conditions ( 8 ) . 
+ It is expressed throughout the growth phase , but shows maximal expression during exponential growth ( 5 ) , though conﬂicting data -- that H-NS expression is constant across the growth phase -- have been presented for Salmonella ( 9 ) . 
+ H-NS displays sequence-specific binding and simultaneously affects chromosome structure and transcription by forming DNA -- H-NS -- DNA bridges , so reinforcing plectonemically supercoiled structures ( 10,11 ) . 
+ Fis is a versatile protein which affects multiple processes including transcription , replication and recombination ( 12 ) . 
+ In contrast to H-NS , the expression of Fis peaks during exponential phase , but decreases to undetectable levels in stationary phase ( 5 ) ; therefore , it is thought to be an important player in controlling the growth transition to stationary phase ( 13 ) . 
+ Currently available information in the RegulonDB database ( 14 ) indicates that , as a TF , Fis can activate as well as repress gene expression . 
+ On binding , Fis introduces an interwound and branched structure in the DNA where a branch is deﬁned as ` a separate DNA lobe containing at least one intrinsic cross-over ' ; these structures may be associated with regions of high transcriptional activity ( 15 ) . 
+ Fis inﬂuences the distribution of DNA topoisomers in a population of cells : for example ﬁs deletion leads to a decreased proportion of cells with low negative supercoiling in stationary phase ( 13 ) , which might have an impact on stationary phase gene expression . 
+ Analyses of general trends for transcriptional control by Fis and H-NS have generally been performed using compilations of data from small-scale experiments ( 16 ) . 
+ Recently , the use of chromatin immunoprecipitation ( ChIP ) , followed by DNA hybridization to genome-tiling microarrays has led to a systematic and relatively less biased identiﬁcation of genomic loci physically associated with these proteins , primarily in mid-exponential phase of growth . 
+ The only study to have investigated both proteins simultaneously -- using microarrays with probes tiled at 160bp resolution -- showed that there is signiﬁcant overlap between the genes targeted by the two proteins ( 17 ) . 
+ There are two contrasting models of how H-NS represses transcription : Lucchini and colleagues proposed that H-NS inhibits the initial RNA-polymerase-DNA interaction ( 8 ) , whereas Grainger and co-workers and Oshima et al. demonstrated that the polymerase is trapped at the promoter ( 17,18 ) . 
+ For Fis , the majority of bound genes were shown not to change in expression in a ﬁs deletion strain ( 19 ) , which is intriguing given that Fis is considered to be a global regulator of transcription . 
+ Despite the above studies , we do not know whether and how the binding of these proteins to the DNA varies across the growth phase . 
+ This is particularly important since their expression levels are known to change substantially during growth . 
+ It has been previously suggested that H-NS might act both as a canonical TF and as a silencer of gene expression ( 20 ) : however , the distinction between these two modes of H-NS function has not been described on a genomic scale . 
+ Finally , given prior observations of limited overlap between genes bound by Fis and those that change in expression in a deletion strain ( 19 ) , the genome-scale impact of Fis -- DNA interactions on gene expression remains poorly understood . 
+ Here , we present an investigation of genomic loci bound by Fis and H-NS in E. coli K12 using ChIP followed by high-throughput sequencing , instead of microarray hybridization , of the immunoprecipitated DNA ( ChIP-Seq ) ( 21 ) . 
+ Improvements in sequencing have revolutionized genomics by providing a platform for quantifying nucleic acid concentrations that affords higher dynamic range , higher resolution and lower false positive rates ( 22,23 ) . 
+ These are now being used extensively to investigate protein -- nucleic acid interactions in eukaryotes ( 21,24 -- 27 ) . 
+ In bacteria , however , their use has been largely limited to whole-genome sequencing and transcriptomic analysis ( 28 -- 33 ) , though transcriptome-level investigations have been extended using immunoprecipitation-based interrogation of protein -- RNA interactions in Salmonella ( 34 ) . 
+ Recently , a ChIP-Seq-based analysis of a Mycobacterium tuberculosis TF DosR , which binds to 25 loci on the genome , was published ( 35 ) . 
+ To our knowledge , ours is the ﬁrst detailed genome-scale interrogation of protein -- DNA interactions , for any global DNA binding protein in bacteria , using high-throughput sequencing . 
+ In addition to providing a proof-of-principle for the use of this new technology for bacteria , we perform our study at multiple time-points during growth in rich medium , thus generating new insights into how these proteins function under different cellular conditions . 
+ Further , by analysing our data in conjunction with gene expression and RNA-polymerase -- DNA interaction data we provide new interpretation of the regulatory functions of these proteins . 
+ MATERIALS AND METHODS
+ Strains and general growth conditions
+ The E. coli K-12 MG1655 bacterial strains used in this work are the following : E. coli MG1655 ( F- lambda- ilvG- rfb-50 rph-1 ) ; MG1655 Δhns ( Δhns::Kanr ) ; MG1655 Δfis ( Δfis::Kanr ) ; MG1655 hns-FLAG ( hns::3xFLAG::Kanr ) ; MG1655 fis-FLAG ( fis::3xFLAG::Kanr ) . 
+ Luria-Bertani ( 0.5 % NaCl ) broth and agar ( 15 g/liter ) were used for routine growth . 
+ Where needed , ampicillin , kanamycin and chloramphenicol were used at final concentrations of 100 , 30 and 30 μg/ml , respectively . 
+ Disruption of hns and ﬁs genes in the E. coli chromosome was achieved by the Red recombination system , previously described by Datsenko and Wanner ( 36 ) . 
+ Primers designed for this purpose are shown in Supplementary Data 19 . 
+ Sets of additional external primers were used to verify the correct integration of the PCR fragment by homologous recombination ( Supplementary Data 19 ) . 
+ The 3xFLAG epitope was added at the C terminus of the H-NS and Fis protein by a PCR-based method with plasmid pSUB11 as template ( 37 ) . 
+ Primers used for introducing the 3xFLAG tag are shown in Supplementary Data 19 . 
+ The tagged construct was then introduced onto the chromosome of E. coli MG1655 using the Red recombinase system ( 36 ) . 
+ At each stage , DNA and strain constructions were conﬁrmed by PCR and/or sequencing . 
+ This approach resulted in the introduction of a kanamycin resistance cassette in the chromosome downstream of the tagged gene . 
+ The cassette can be removed by FLP-mediated site-speciﬁc recombination ( 36 ) , although this was not done for the experiments described here . 
+ In all cases , the complete functionality of the 3xFLAG-tagged version of the proteins was tested . 
+ RNA extraction and microarrays procedures
+ To prepare cells for RNA extraction , 100 ml of fresh LB was inoculated 1:200 from an overnight culture in a 250 ml flask and incubated with shaking at 180 rpm in a New Brunswick C76 waterbath at 37 °C . Two biological replicates were performed for each strain and samples were taken at early-exponential , mid-exponential , transition-to-stationary and stationary phase . 
+ The cells were pelleted by centrifugation ( 10000 g , 10 min , 4 °C ) , washed in 1× PBS and pellets were snap-frozen and stored at -80 °C until required . 
+ RNA was extracted using Trizol Reagent ( Invitrogen ) according to the manufacturer 's protocol until the chloroform extraction step . 
+ The aqueous phase was then loaded onto mirVanaTM miRNA Isolation kit ( Ambion Inc. ) columns and washed according to the manufacturer 's protocol . 
+ Total RNA was eluted in 50 μl of RNase-free water . 
+ The concentration was then determined using a Nanodrop ND-1000 machine ( NanoDrop Technologies ) , and RNA quality was tested by visualization on agarose gels and by Agilent 2100 Bioanalyser ( Agilent Technologies ) . 
+ For the generation of ﬂuorescence-labelled cDNA the FairPlay III Microarray Labelling Kit ( Stratagene ) was used . 
+ Briefly , 1 μg of total RNA was annealed to random primers , and cDNA was synthesized in a reverse transcription reaction with an amino allyl modified dUTP in the presence of 1 μg of Actinomycin D . 
+ The amino allyl labelled cDNA was then coupled to a Cy3 dye ( GE Healthcare ) containing a NHS-ester leaving group . 
+ The labelled cDNA was hybridized to the probe DNA on custom Agilent microarrays by incubating at 65 °C for 16 h . 
+ The unhybridized labelled cDNA was removed and the hybridized labelled cDNA was visualized using an Agilent Microarray Scanner . 
+ Note that we performed a one-colour experiment on the Agilent array . 
+ RT–PCR for validation
+ To validate the results of the microarray analysis , quantitative reverse-transcriptase PCR ( qRT-PCR ) was carried out using speciﬁc primers to the mRNA targets showing up - or down-regulation , and control targets not showing differential expression . 
+ RNA was extracted as described above from wild-type , Δhns and Δfis cells and 30 ng total RNA was used with the Express One-Step SYBR GreenER kit ( Invitrogen ) according to the manufacturer 's guidelines , using a MJ Mini thermal cycler ( Bio-Rad ) . 
+ ChIP
+ ChIP was performed as previously described ( 38 ) with some modiﬁcations to the protocol . 
+ Cells were grown aerobically at 37 °C to the desired OD600 and formaldehyde was added to a final concentration of 1 % . 
+ After 20 min of incubation , glycine was added to a ﬁnal concentration of 0.5 M to quench the reaction and incubated for a further 5 min . 
+ Cross-linked cells were harvested by centrifugation and washed twice with ice-cold TBS ( pH 7.5 ) . 
+ Cells were resuspended in 1 ml of lysis buffer [ 10 mM Tris ( pH 8.0 ) , 20 % sucrose , 50 mM NaCl , 10 mM EDTA , 20 mg/ml lysozyme and 0.1 mg/ml RNase A ] and incubated at 37 °C for 30 min . 
+ Following lysis , 3 ml immunoprecipitation ( IP ) buffer [ 50 mM HEPES -- KOH ( pH 7.5 ) , 150 mM NaCl , 1 mM EDTA , 1 % Triton X-100 , 0.1 % sodium deoxycholate , 0.1 % sodium dodecyl sulphate ( SDS ) and PMSF ( ﬁnal concentration 1 mM ) ] was added and the lysate passed through a French pressure cell twice . 
+ Two microlitres of aliquots were removed and the DNA sheared to an average size of 200 bp using a Bioruptor ( Diagenode ) with 30 cycles of 30 s on/off at high setting . 
+ Insoluble cellular matter was removed by centrifugation for 10 min at 4 °C , and the supernatant was split into two 800 μl aliquots . 
+ The remaining 400 μl was kept to check the size of the DNA fragments . 
+ Each 800 μl aliquot was incubated with 20 μl Protein A/G UltraLink Resin ( Pierce ) on a rotary shaker for 45 min at room temperature to get rid of complexes binding to the resin non-specifically . 
+ The supernatant was then removed and incubated with either no antibody ( mock-IP ) , FLAG mouse monoclonal antibody ( Sigma-Aldrich ) or RNAP β subunit mouse monoclonal ( NeoClone ) and 30 μl Protein A/G UltraLink Resin , pre-incubated with 1 mg/ml bovine serum albumin ( BSA ) in TBS , on a rotary shaker at 4 °C overnight ( FLAG antibody ) or at room temperature for 90 min ( RNAP β subunit antibody ) . 
+ Samples were washed once with IP buffer , twice with IP buffer + 500 mM NaCl , once with wash buffer [ 10 mM Tris ( pH 8.0 ) , 250 mM LiCl , 1 mM EDTA , 0.5 % Nonidet P-40 and 0.5 % sodium deoxycholate ] and once with TE ( pH 7.5 ) . 
+ Immunoprecipitated complexes were eluted in 100 μl elution buffer [ 10 mM Tris ( pH 7.5 ) , 10 mM EDTA and 1 % SDS ] at 65 °C for 20 min . 
+ Immunoprecipitated samples and the sheared DNA following the Bioruptor were de-crosslinked in 0.5× elution buffer containing 0.8 mg/ml Pronase at 42 °C for 2 h followed by 65 °C for 6 h . DNA was purified using a PCR purification kit ( QIAGEN ) . 
+ Prior to sequencing , the DNA fragment sizes were checked and gene-specific quantitative PCR ( qPCR ) was carried out . 
+ Real-time qPCR
+ To measure the enrichment of the Fis , H-NS or RNAP-binding targets in the immunoprecipitated DNA samples , real-time qPCR was performed using a MJ Mini thermal cycler ( Bio-Rad ) . 
+ One microlitre of IP or mock-IP DNA was used with speciﬁc primers to the promoter regions ( primer sequences are available upon request ) and Quantitect SYBR Green ( QIAGEN ) . 
+ Library construction and Solexa sequencing
+ Prior and post library construction , the concentration of the immunoprecipitated DNA samples was measured using the Qubit HS DNA kit ( Invitrogen ) . 
+ Library construction and sequencing was done using the ChIP-Seq Sample Prep kit , Reagent Preparation kit and Cluster Station kit ( Illumina ) . 
+ Samples were loaded at a concentration of 10 pM . 
+ Public data sources
+ The E. coli K12 MG1655 genome was downloaded from the KEGG database ( 39 ) . 
+ Annotations of gene coordinates were obtained from Ecocyc 11.5 database ( 40 ) . 
+ Literature-derived transcriptional regulatory network , including known targets for Fis and H-NS , for E. coli K12 was obtained from RegulonDB 6.2 database ( 14 ) . 
+ Targets of the global transcriptional regulator CRP were downloaded from RegulonDB 6.2 and augmented , where required , with additional targets identiﬁed by Grainger and colleagues ( 41 ) . 
+ Genomic regions with atypical composition of higher-order oligonucleotides -- and thus putatively corresponding to horizontally-acquired DNA -- were identiﬁed using the Alien Hunter software ( 42 ) . 
+ Lists of genes identiﬁed as bound by H-NS or Fis in previous high-resolution tiling microarray studies were downloaded from the respective publications ( 17 -- 19 ) . 
+ Protein occupancy domains of E. coli were downloaded from Vora et al. ( 43 ) . 
+ Orthologs between Salmonella enterica Typhimurium LT2 and E. coli K12 MG1655 were obtained from the work of Moreno-Hagelsieb and Latimer ( 44 ) . 
+ Analysis of protein binding regions from high-throughput sequencing data
+ Identiﬁcation of protein binding regions on the DNA . 
+ Sequences obtained from the Illumina Genome Analyzer were mapped to both strands of the E. coli K12 MG1655 genome using BLAT allowing no gaps and up to two mismatches . 
+ Each alignment was extended to 200 bp -- the approximate average length of DNA fragments -- on the 3' end . 
+ Only reads which mapped to a single region of the genome were considered for further analysis . 
+ For each base position on the genome , the number of reads that mapped to that position was calculated . 
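+ The read extension and per-base counting described above can be sketched as follows . The sketch assumes 0-based , half-open alignment coordinates , a linear coordinate system clipped at the genome ends , and reads that have already been filtered to unique matches ; the function name is hypothetical .

```python
def coverage(reads, genome_length, fragment=200):
    """Per-base read counts after extending each alignment to `fragment` bp
    towards its 3' end.

    reads: iterable of (start, end, strand) with strand '+' or '-'.
    """
    diff = [0] * (genome_length + 1)  # difference array: +1 at start, -1 past end
    for start, end, strand in reads:
        if strand == '+':
            lo, hi = start, min(start + fragment, genome_length)
        else:
            # On the minus strand the 3' end lies at lower coordinates.
            lo, hi = max(end - fragment, 0), end
        diff[lo] += 1
        diff[hi] -= 1
    counts, running = [], 0
    for delta in diff[:genome_length]:
        running += delta
        counts.append(running)
    return counts
```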
+ The distribution of read counts thus obtained had a sharp peak at a low value followed by a heavy tail . 
+ Since this characteristic of the distribution is similar to that obtained for high-resolution gene expression tiling arrays , we used a procedure adopted earlier for tiling array analysis ( 45 ) . 
+ Briefly , the background was a normal distribution with the following parameters : μ = mode ( as computed using the ` shorth ' procedure in R ) of the entire distribution ; and σ = 1.483 × median absolute deviation of all values less than the mean of the entire distribution . 
+ This gives a better fit of the empirical distribution than a Poisson distribution of the same μ ( Supplementary Data 1 ) . 
+ The cutoff read count was defined as Z = μ + 3σ . Any consecutive stretch of DNA where each coordinate had a read count greater than or equal to the cutoff was flagged ; pairs of adjacent regions so obtained were merged to give a single binding region if they were separated by < 200 bp . 
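+ A minimal sketch of the flagging and merging step is shown below . It takes the background mode and spread as given inputs ( named mu and sigma here ) rather than estimating them , and measures the separation between two regions as the number of bases lying between them , which is an assumption ; the function name is hypothetical .

```python
def binding_regions(counts, mu, sigma, max_gap=200):
    """Return inclusive (start, end) regions where every position has a read
    count >= mu + 3 * sigma, merging regions separated by < max_gap bases."""
    cutoff = mu + 3 * sigma
    regions, start = [], None
    for pos, count in enumerate(counts):
        if count >= cutoff and start is None:
            start = pos
        elif count < cutoff and start is not None:
            regions.append([start, pos - 1])
            start = None
    if start is not None:
        regions.append([start, len(counts) - 1])

    merged = []
    for region in regions:
        # Gap = number of below-cutoff bases between this region and the previous one.
        if merged and region[0] - merged[-1][1] - 1 < max_gap:
            merged[-1][1] = region[1]
        else:
            merged.append(region)
    return [tuple(region) for region in merged]
```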
+ Then the number of reads mapped to each binding region , normalized by the total number of reads obtained for that sample , was compared to the corresponding value from the mock-IP using a binomial test , as described in the PeakSeq algorithm ( 46 ) . 
+ Any region giving a Bonferroni-corrected P ≤ 0.01 was defined as a bona fide protein binding region . 
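+ The comparison against the mock-IP can be illustrated with the simplified one-sided binomial test below . It is not the PeakSeq implementation : the null probability is taken as the share of the IP library in the combined library size , which is one way to account for sequencing depth and is an assumption , as are the function names and the use of " <= " for the threshold .

```python
from math import comb


def binomial_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def enriched_regions(ip_counts, mock_counts, ip_total, mock_total, alpha=0.01):
    """Indices of regions whose Bonferroni-corrected P-value is <= alpha."""
    # Assumption: under the null, a read in a region comes from the IP library
    # with probability proportional to that library's size.
    p_null = ip_total / (ip_total + mock_total)
    n_tests = len(ip_counts)
    kept = []
    for idx, (ip, mock) in enumerate(zip(ip_counts, mock_counts)):
        p_value = binomial_upper_tail(ip, ip + mock, p_null)
        if min(1.0, p_value * n_tests) <= alpha:  # Bonferroni correction
            kept.append(idx)
    return kept
```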
+ We performed mock-IP only in the mid-exponential phase taking into consideration the following : ( i ) it has been suggested that a single control library can be used across multiple ChIP-Seq experiments given that these were performed in the same organism under similar fragmentation conditions ( 47 ) ; ( ii ) qPCR data for mock-IP experiments from our laboratory show minimal and inconsistent variation across time-points . 
+ Comparison with previously published ChIP-chip datasets . 
+ For this purpose , we downloaded lists of genes identiﬁed as bound ( either upstream or in genebody ) by H-NS and Fis from published tiling microarray studies ( 17 -- 19 ) . 
+ These genes were compared with those which overlap with binding regions identiﬁed in our study ; here the cut-off for deﬁning an overlap was set at 100 bp . 
+ Here , we used the union of genes detected as bound by the protein of interest in early - and mid-exponential phases of growth to partly account for possible differences in the environmental/cellular conditions used in the compared studies . 
+ Identiﬁcation of and scanning for sequence motifs . 
+ To identify DNA sequence motifs for the binding of H-NS and Fis , we obtained the sequence of 101 bp of DNA including 50 bp on either side of the summit for each binding region . 
+ Here the summit for each binding patch was deﬁned as the base coordinate with the highest read count within that region . 
+ The sequences so obtained were scanned for motifs using the MEME software ( 48 ) with the following parameters : zero or one motif per sequence ; motif width ranging from 6 to 24 ; searching both strands of the sequences ; using a background distribution ﬁle containing the mono - and di-nucleotide frequencies of the E. coli chromosome . 
+ Then the complete sequence of each binding region was searched for the presence of these motifs using the MAST programme ( 48 ) with a P ≤ 0.001 and using the same background mono - and di-nucleotide frequencies as above . 
+ Any definition of a motif in this work refers to those which were identified within the binding regions . 
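+ The extraction of the 101 bp window around each summit , as used for motif discovery above , can be sketched as follows . Ties between equally high positions are resolved by taking the left-most one and the window is not clipped at the genome ends ; both choices are assumptions , and the function name is hypothetical .

```python
def summit_window(counts, region_start, region_end, flank=50):
    """Return (summit, window_start, window_end), all inclusive coordinates,
    for a binding region given per-base read counts."""
    # The summit is the coordinate with the highest read count in the region.
    summit = max(range(region_start, region_end + 1), key=lambda pos: counts[pos])
    return summit, summit - flank, summit + flank
```

+ With the default flank of 50 bp the window spans 2 × 50 + 1 = 101 bp .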
+ Deﬁnition of operons bound by the protein of interest . 
+ We used the operon deﬁnitions available in RegulonDB 6.2 ( 14 ) to identify a set of 2567 lead genes , which are the ﬁrst genes of each operon . 
+ An operon was ﬂagged as being bound by the protein of interest if at least 50 bp of the intergenic region upstream of the operon overlapped with a binding region . 
+ For long intergenic regions , only the ﬁrst 400 bp of the sequence immediately upstream of the operon were used . 
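+ The operon-level rule above can be sketched as follows , assuming 0-based , half-open coordinates and that the upstream intergenic region of a ' + ' strand operon lies at lower coordinates than its lead gene ; the function names are hypothetical .

```python
def upstream_window(intergenic_start, intergenic_end, strand, max_len=400):
    """Part of the upstream intergenic region [start, end) that is considered:
    at most `max_len` bp immediately upstream of the operon."""
    if intergenic_end - intergenic_start <= max_len:
        return intergenic_start, intergenic_end
    if strand == '+':
        # The operon starts at intergenic_end; keep the bases closest to it.
        return intergenic_end - max_len, intergenic_end
    # '-' strand: the operon lies at lower coordinates than the intergenic region.
    return intergenic_start, intergenic_start + max_len


def is_bound(window, regions, min_overlap=50):
    """True if any binding region overlaps the window by >= min_overlap bp."""
    w_start, w_end = window
    return any(
        min(w_end, r_end) - max(w_start, r_start) >= min_overlap
        for r_start, r_end in regions
    )
```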
+ Derivation of RNA-polymerase occupancy from high-throughput sequencing data . 
+ Reads obtained from the sequencing of RNA-polymerase-associated DNA were mapped to the genome and read counts obtained per base position as described above . 
+ For each gene , the median read count across all base positions corresponding to the gene body was deﬁned as its occupancy . 
+ In addition , for each lead gene , the highest read count in the upstream region was calculated and used as a representation of transcription initiation . 
+ Data from each sample were normalized to the total number of reads obtained for that sample and then divided by the corresponding value from the mock-IP . 
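+ A minimal sketch of the occupancy measure and the normalization described above is given here. The function names and argument layout are hypothetical, and the handling of a zero mock-IP value is not specified in the text, so it is left unhandled.

```python
from statistics import median


def gene_occupancy(read_counts, gene_start, gene_end):
    """Median read count over the gene body (inclusive 0-based bounds)."""
    return median(read_counts[gene_start:gene_end + 1])


def normalized_signal(sample_value, sample_total_reads,
                      mock_value, mock_total_reads):
    """Scale a value by its sample's total reads, then divide by the
    equally scaled mock-IP value. Raises ZeroDivisionError if the mock-IP
    value is zero, a case the source text does not address."""
    sample_norm = sample_value / sample_total_reads
    mock_norm = mock_value / mock_total_reads
    return sample_norm / mock_norm
```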
+ Analysis of gene expression from microarray data
+ Gene expression analyses were performed on custom-designed isothermal Agilent microarrays containing 10 821 60-mer probes covering 4373 genes . 
+ In addition to these sense probes , the array contained 4172 anti-sense probes which were excluded from this analysis . 
+ These probes were designed using Array Oligo Selector ( 49 ) . 
+ Microarray data were processed in Bioconductor using standard procedures . 
+ Brieﬂy , array data were background corrected using normexp ( 50 ) . 
+ Biological replicates were ﬁrst normalized using variance stabilization and normalization ( VSN ) ( 51 ) . 
+ All arrays , across genetic backgrounds , from the same time-point were again normalized together using VSN . 
+ Differential expression in the deletion strains compared with the wild-type at the same time-point was called at false discovery rate ( FDR ) - adjusted P-value of 0.01 using the LIMMA package ( 52 ) ; this was performed at the level of individual probes . 
+ A gene was called differentially expressed if at least one of the probes corresponding to it passed the above threshold . 
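+ The probe-to-gene rule can be written in a few lines. This is only a sketch: the dictionary inputs and the use of "less than or equal to" at the 0.01 cutoff are assumptions, and the adjusted P-values themselves would come from the LIMMA analysis described above.

```python
def differentially_expressed_genes(probe_adjusted_p, probe_to_gene,
                                   fdr_cutoff=0.01):
    """Return the set of genes with at least one probe passing the cutoff.

    probe_adjusted_p: {probe_id: FDR-adjusted P-value} (hypothetical input).
    probe_to_gene: {probe_id: gene_id} (hypothetical input).
    """
    return {probe_to_gene[probe]
            for probe, adjusted_p in probe_adjusted_p.items()
            if adjusted_p <= fdr_cutoff}
```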
+ For direct comparison with operons that are bound by the protein of interest , we used only the list of lead genes that were differentially expressed . 
+ ` Absolute ' expression level for each probe under a given genetic background and growth phase , where required , was deﬁned as the average value across replicates ; this shows a signiﬁcant correlation with RNA-seq data obtained in our lab during exponential phase of growth ( Spearman Rank correlation = 0.73 ; Supplementary Data 19 ) . 
+ Statistical analysis
+ The Fisher 's exact test was used for categorical data . 
+ Wilcoxon -- Mann -- Whitney tests were performed when comparing distributions . 
+ Since the sizes of the distributions were typically large , we used the t-test as well to ensure that the results of our comparisons were signiﬁcant in both tests . 
+ In this article , we report P-values from the Wilcoxon test . 
+ Unless otherwise stated , a P-value cutoff of 0.01 was used to signal statistical signiﬁcance . 
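+ For readers unfamiliar with Fisher's exact test on a 2 x 2 table, a small self-contained Python sketch follows. It computes a one-sided (enrichment) P-value from the hypergeometric distribution. The choice of a one-sided test and the function name are assumptions; the text does not state which alternative was used, and the original tests were run in R rather than with this code.

```python
from math import comb


def fisher_enrichment_p(a, b, c, d):
    """One-sided Fisher's exact test for the table [[a, b], [c, d]].

    Returns the probability, with all margins fixed, of seeing a count
    of at least `a` in the top-left cell.
    """
    row1 = a + b
    col1 = a + c
    n = a + b + c + d
    total = comb(n, col1)
    upper = min(row1, col1)
    # Sum hypergeometric probabilities for top-left counts a, a+1, ..., upper.
    tail = sum(comb(row1, k) * comb(n - row1, col1 - k)
               for k in range(a, upper + 1))
    return tail / total
```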
+ Correlation coefﬁcients of read count ` Z ' ( see above ) between two samples were computed at the base resolution , ignoring ` background ' coordinates where the Z for both samples were < 2 . 
+ All these tests were carried out in R ( www.r-project.org ) . 
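+ The background-filtered correlation of read-count Z values described above can be sketched as follows. The text does not say which correlation coefficient was used, so Pearson's r is assumed here; the function name is hypothetical, and the authors carried out their analysis in R.

```python
from math import sqrt


def signal_correlation(z_a, z_b, background=2.0):
    """Pearson correlation of two per-base Z profiles, skipping positions
    where both samples are below the background level.

    Raises ZeroDivisionError if fewer than two informative positions
    remain or if either profile is constant over them.
    """
    pairs = [(a, b) for a, b in zip(z_a, z_b)
             if not (a < background and b < background)]
    n = len(pairs)
    mean_a = sum(a for a, _ in pairs) / n
    mean_b = sum(b for _, b in pairs) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in pairs)
    var_a = sum((a - mean_a) ** 2 for a, _ in pairs)
    var_b = sum((b - mean_b) ** 2 for _, b in pairs)
    return cov / sqrt(var_a * var_b)
```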
+ Accession numbers
+ All microarray and ChIP-seq data have been submitted to ArrayExpress , and have been assigned the following accession numbers : ChIP-seq : E-MTAB-332 ; Microarray design : A-MEXP-1866 ; Microarray raw and normalized data : E-MEXP-2838 ; RNA-seq data : E-MTAB-387 . 
+ RESULTS
+ Immunoprecipitation-sequencing of genomic DNA cross-linked to H-NS and Fis . 
+ We investigated H-NS - and Fis-binding to the E. coli K12 chromosome during early-exponential , mid-exponential , transition-to-stationary and stationary phases of growth in LB medium by ChIP combined with high-throughput sequencing ( 21,23 ) . 
+ As controls , we performed mock-IP experiments in mid-exponential phase in the absence of antibodies to identify non-speciﬁcally precipitated DNA . 
+ For each sample , we obtained 6 -- 15-million reads of 36-nt length , amounting to 50 -- 120-fold coverage of the E. coli genome ( Table 1 ) . 
+ We mapped these reads to both strands of the E. coli K12 MG1655 genome sequence and extended the mapping in the 3'-end to 200 bp , which is the approximate average length of the DNA fragments obtained from the immunoprecipitation experiments . 
+ To identify bound loci , we calculated the number of reads that mapped to each base-pair in the genome ( Figure 1 ) . 
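+ Read extension and per-base counting might look like the sketch below. The read representation (leftmost 0-based coordinate plus strand), the clipping at the sequence ends instead of wrapping around the circular chromosome, and the function name are all assumptions made for illustration.

```python
def per_base_coverage(reads, genome_length, read_length=36,
                      fragment_length=200):
    """Count, for every base, the extended reads that cover it.

    reads: iterable of (leftmost_position, strand) with strand "+" or "-".
    Each read is extended in its 3' direction to fragment_length bases.
    """
    diff = [0] * (genome_length + 1)
    for leftmost, strand in reads:
        if strand == "+":
            start = leftmost
            end = leftmost + fragment_length - 1
        else:
            # The 5' end of a minus-strand read is its rightmost base.
            end = leftmost + read_length - 1
            start = end - fragment_length + 1
        start = max(0, start)
        end = min(genome_length - 1, end)
        diff[start] += 1
        diff[end + 1] -= 1
    coverage = []
    running = 0
    # A running sum over the difference array gives the per-base count.
    for delta in diff[:genome_length]:
        running += delta
        coverage.append(running)
    return coverage
```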
+ We expected to see a near-complete representation of the genome in our sequences , irrespective of where the proteins bind ; therefore we derived an internal background distribution for each sample as described earlier for tiling microarray data ( see ` Materials and methods ' section , Supplementary Data 1 ) ( 45 ) . 
+ The cutoff value for calling binding regions was ﬁxed at three standard deviations above the mean of the background normal distribution ( not more than 1 % of values within the normal distribution are higher than this cutoff ) . 
+ Any stretch of DNA where each position mapped to more reads than the above-deﬁned cutoff was called a binding region . 
+ Then , all binding regions separated by < 200 bp were merged ; this was performed to counter possible under-sequencing of chromosomal regions of length equal to a single read ( 22 ) . 
+ Finally , binding regions whose read counts did not differ signiﬁcantly from the mock-IP sample were removed . 
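+ The thresholding and merging steps above can be sketched in Python as follows. The background mean and standard deviation are taken as given inputs, the gap between two regions is counted as the number of below-cutoff bases separating them, and the final mock-IP filter is not included. These are simplifying assumptions, as is the function name.

```python
def call_binding_regions(read_counts, bg_mean, bg_sd, merge_gap=200):
    """Return merged (start, end) regions, inclusive and 0-based, in which
    every position exceeds bg_mean + 3 * bg_sd."""
    cutoff = bg_mean + 3 * bg_sd
    regions = []
    start = None
    for pos, count in enumerate(read_counts):
        if count > cutoff:
            if start is None:
                start = pos
        elif start is not None:
            regions.append((start, pos - 1))
            start = None
    if start is not None:
        regions.append((start, len(read_counts) - 1))

    merged = []
    for r_start, r_end in regions:
        # Merge with the previous region when fewer than merge_gap
        # below-cutoff bases separate the two.
        if merged and r_start - merged[-1][1] - 1 < merge_gap:
            merged[-1] = (merged[-1][0], r_end)
        else:
            merged.append((r_start, r_end))
    return merged
```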
+ Selected binding regions were veriﬁed using quantitative PCR ( Supplementary Data 19 ) . 
+ Comparison with previously published high-throughput datasets . 
+ First , we compared our dataset with previously published ChIP-chip data . 
+ Here we note that cross-study comparisons are not straightforward owing to differences in experimental conditions and platforms , analysis procedure and the manner in which data are presented . 
+ However where possible , we use published lists of bound genes and raw binding signals as points of comparison . 
+ We compared our data for H-NS ( combining early - and mid-exponential phase data ) with that from a tiling microarray-based study by Oshima and co-workers ( 18 ) . 
+ A large majority of genes ( 75 % ; Fisher 's exact test P < 10^-50 ) ﬂagged as bound by H-NS in the above study overlap with binding regions ( by at least 100 bp ) we identify ; just over a quarter of genes ( 27 % ) detected in our study are not identiﬁed by Oshima and colleagues . 
+ We then compared ChIP-chip data for Fis by Cho and colleagues ( 19 ) with our data . 
+ Overall , binding regions that we identify display signiﬁcantly higher ﬂuorescence intensities in the above dataset than randomly picked regions ( Supplementary Data 2 ) . 
+ This is despite the fact that Cho et al. performed their experiments in M9 plus glucose whereas ours was carried out in LB without supplemented sugars . 
+ Over two-thirds ( 67 % ; Fisher 's exact test P < 10^-50 ) of genes bound by Fis in the Cho dataset overlap with binding regions identiﬁed here . 
+ However , we detect a signiﬁcantly larger number of bound genes , with binding either in the gene body or in upstream regions ( 1592 genes , compared with 894 genes in the Cho dataset ) . 
+ Even at a more stringent threshold for identifying binding regions , we identify more bound genes ( 1006 genes ) than Cho et al. with a recovery of 53 % ( Fisher 's exact test P < 10^-50 ) . 
+ We then compared our dataset with the lower resolution ( 160-bp resolution ) ChIP-chip study of Grainger and colleagues ( 17 ) . 
+ For H-NS , there is excellent agreement in binding signals between the two datasets . 
+ However the overlap at the gene level is poor ( 39 % of genes ﬂagged as bound by H-NS by Grainger and colleagues are recovered here ; Supplementary Data 2 ) . 
+ We believe that the poor overlap at the gene level is a consequence of assumptions made in assigning binding regions to target genes ; this could have been exacerbated by the lower resolution of the Grainger study ( David Grainger , personal communication ) . 
+ Remarkably for Fis , there is no similarity between the two datasets either at the level of binding signals or bound genes ( 31 % of bound genes in the Grainger dataset are recovered ) . 
+ This might be a consequence of differences in experimental conditions , which might affect Fis more than H-NS because of the former 's link with catabolite repression ( 53,54 ) ; in fact , we observe a statistically signiﬁcant overlap ( Fisher 's exact test P = 9.8 × 10^-6 ) between operons bound in their upstream regions by Fis ( but not H-NS ) in our study and publicly-available targets of CRP ( 14,41 ) , the global regulator of catabolite repression . 
+ However , we note that there is only a limited correlation in binding signals between the studies of Cho et al. and Grainger et al. despite the fact that both studies were performed in minimal media with a difference only in the identity of the carbon source used ( glucose and fructose , respectively ; Supplementary Data 2 ) . 
+ It is possible that Fis binds to the E. coli genome extensively and that each study sampled only a subset of binding sites : this might be substantiated by the fact that the background signal is higher for Fis than H-NS ( Figure 1 ; Stephen Busby , personal communication ) . 
+ Finally , we compared the lists of genes identiﬁed as H-NS targets in S. enterica Typhimurium ( 8 ) with our data . 
+ A majority ( 58 % ) of genes bound by H-NS in Salmonella do not have orthologs in E. coli ( Fisher 's exact test , P = 5.2 × 10^-36 ) . 
+ These genes are probably horizontally acquired , and are exempliﬁed by the H-NS-mediated regulation of the Salmonella-speciﬁc pathogenicity islands such as Spi-1 and Spi-2 . 
+ Similarly , 46 % of genes with H-NS binding in E. coli do not have orthologs in Salmonella . 
+ Therefore , the targets of H-NS are substantially different between E. coli and Salmonella . 
+ We note here that over 75 % of the conserved H-NS targets in Salmonella are bound by H-NS in E. coli ; this proportion is similar to the agreement between two independent studies of H-NS targets in E. coli ( see above ) . 
+ DNA-binding proﬁles of H-NS and Fis in mid-exponential phase . 
+ We focus on the DNA-binding proﬁles of H-NS and Fis in mid-exponential phase ( Figure 1 and Table 1 ) . 
+ H-NS binds to 17 % of the genome in terms of basepairs , whereas Fis binds to 11 % , distributed over 458 and 1464 discrete binding regions , respectively ( Figure 2A ) . 
+ In contrast to observations of Grainger and colleagues ( 17 ) -- made under substantially different growth conditions -- we ﬁnd little overlap between Fis and H-NS-binding regions ( Figure 2A ) . 
+ In fact , across the genome , there is a signiﬁcant negative correlation between the binding signals for H-NS and Fis . 
+ H-NS binds to longer tracts of DNA than Fis ( averages of 1686 and 355 bp for H-NS and Fis , respectively ; Wilcoxon test , P < 10^-50 ; Figure 3A and B ; Supplementary Data 3 ) . 
+ The observed length distribution for H-NS is in line with the results of a recent study in Salmonella ( 55 ) . 
+ This is consistent with the ability of H-NS to form long oligomers , extending from high afﬁnity nucleation sites to ﬂanking lower afﬁnity sites ( 10,56 ) . 
+ The H-NS binding motif ( 57 ) , deﬁned by enriched oligo-nucleotide sequences within bound regions , is 5 -- 6 nt in length and comprises only A/T nucleotides ( Figure 2B ; Supplementary Data 4 ) . 
+ This motif is present in 96 % of all binding regions at an average of 19.9 occurrences per region . 
+ In agreement with published results ( 19 ) , the 15-nt Fis motif consists of an A/T tract ﬂanked by highly conserved G/C on either side ( Figure 2B ) . 
+ This motif is present in 91 % of binding regions at an average of 2.3 occurrences per region . 
+ Note that we differentiate between binding regions and motifs : whereas binding regions are empirically identiﬁed by our experiments , motifs represent the computationally identiﬁed sequences that fall within our binding regions . 
+ On average , 18 and 17 % of all basepairs covered by H-NS and Fis binding regions -- corresponding to 24 and 21 % , respectively , of binding motifs -- fall within intergenic regions upstream of predicted operons ( Figure 2C ; Supplementary Data 5 ) . 
+ Given that 8 % of the E. coli genome comprises operon-upstream intergenic regions , Fis and H-NS display a preference for binding upstream of operons . 
+ Most of the other motifs fall within the body of operons ( 67 and 74 % for H-NS and Fis , respectively ) . 
+ In agreement with previous reports ( 8,18,58 ) , there is signiﬁcant enrichment of H-NS ( but not Fis ) binding across horizontally-acquired regions ( Figure 2D ) . 
+ Finally , we deﬁne 597 ( 23 % ) and 649 ( 25 % ) operons as bound in a regulatory capacity by H-NS and Fis , respectively , based on binding in upstream regulatory sequences ( applying a limit of 400 bp ) . 
+ The rest of our discussion is based on the above operons only and not those with protein binding only in the gene body ( as included in our comparison with previous studies ) . 
+ Operons targeted by H-NS are enriched for gene functions associated with ﬁmbrial biogenesis ( Fisher 's exact test , P = 5.1 × 10^-3 ) , which expands previous work linking H-NS to the regulation of bioﬁlm formation and motility ( 59 ) . 
+ As expected from prior molecular studies ( 60 ) , operons bound by Fis show an enrichment for genes involved in translation ( Fisher 's exact test , P = 1.6 × 10^-3 ) . 
+ In agreement with the signiﬁcant overlap of Fis bound genes with CRP targets , carbohydrate metabolism and transport also shows a signiﬁcant enrichment among Fis targets ( Fisher 's exact test , P = 4.7 × 10^-3 ) . 
+ Length of H-NS binding regions and their characteristics . 
+ We had noted earlier that genomic regions bound by H-NS tend to be longer than those bound by Fis ( Figure 3 ) . 
+ In order to investigate systematically the association between the length of H-NS binding regions and genomic features recognized , we classiﬁed H-NS binding regions into those that are longer than 1000bp ( ` LH-NS ' regions ; n = 300 in mid-exponential phase ) and those that are shorter ( ` SH-NS ' regions ; n = 158 ) . 
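+ The length-based split into LH-NS and SH-NS regions amounts to a simple threshold. In this sketch the inclusive coordinate convention and the assignment of a region of exactly 1000 bp to the short class are assumptions, since the text only distinguishes "longer than 1000 bp" from "shorter".

```python
def classify_by_length(regions, threshold=1000):
    """Split (start, end) inclusive regions into (long, short) lists.

    Regions longer than `threshold` bp are 'long'; all others are 'short'.
    """
    long_regions, short_regions = [], []
    for start, end in regions:
        length = end - start + 1
        if length > threshold:
            long_regions.append((start, end))
        else:
            short_regions.append((start, end))
    return long_regions, short_regions
```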
+ We observe that a signiﬁcantly higher proportion of motifs within SH-NS ( 37 % ) than LH-NS ( 22 % ) regions fall in operon-upstream regions ( Fisher 's exact test , P = 1.2 × 10^-21 ; Supplementary Data 5 ) . 
+ This might be expected given the differences in their lengths and the fact that operon-upstream regions have high A/T content ( 61 ) . 
+ Unexpectedly however , the proportion of operon-upstream SH-NS motifs is signiﬁcantly higher than that for Fis motifs as well ( Fisher 's exact test , P = 1.3 × 10^-20 ) . 
+ We also observe that horizontally-acquired genes are signiﬁcantly enriched in the LH-NS group ( Supplementary Data 6 ) ; this is in accordance with the fact that predicted horizontally-acquired genes are located in long regions of typically higher A/T content than the genomic average ( Supplementary Data 7 ) . 
+ Therefore , short H-NS binding regions tend to behave in a manner typical of canonical TFs , where the protein binds upstream of the gene whose expression it regulates . 
+ On the other hand , longer binding regions wrap large segments of the chromosome , encompassing both genes and intergenic regions . 
+ Variable structures of Fis -- DNA complexes . 
+ It was previously demonstrated that Fis -- DNA complexes adopt variable structures depending on the A/T content of the DNA surrounding the core binding motif ( 62 ) . 
+ The variability in these complexes is manifested by the degree to which the bound DNA is bent , with greater bending in regions of higher A/T content . 
+ To investigate this in our data , we deﬁned binding regions in the top quarter of the distribution of A/T contents ( 101 bp around the summit ) as likely to be bent by Fis . 
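+ Selecting the top quarter of binding regions by A/T content could be done as in this sketch. The input format (a mapping from region identifier to its 101-bp summit sequence), the tie-breaking by identifier and the use of floor(n / 4) regions as the "top quarter" are assumptions; the text does not specify how the quartile boundary was computed.

```python
def at_content(sequence):
    """Fraction of A and T bases in a DNA sequence."""
    sequence = sequence.upper()
    return (sequence.count("A") + sequence.count("T")) / len(sequence)


def high_at_regions(summit_sequences):
    """Return the ids of the top quarter of regions ranked by A/T content.

    summit_sequences: {region_id: sequence around the summit}.
    """
    ranked = sorted(summit_sequences,
                    key=lambda rid: (-at_content(summit_sequences[rid]), rid))
    return set(ranked[:len(ranked) // 4])
```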
+ The association between the A/T content of the binding region and gene expression is described later . 
+ As intergenic regions tend to have higher A/T contents and intrinsic bending , a greater proportion of motifs falling within high-A/T binding regions are in operon-upstream regions ( 40 versus 11 % ; Wilcoxon test , P < 10^-50 ; Supplementary Data 5 ) . 
+ Furthermore , high-A/T Fis-binding regions show signiﬁcantly greater binding signal than other regions ( Wilcoxon test , P = 3.5 × 10^-9 ; Supplementary Data 8 ) ; this might reﬂect the fact that Fis-DNA complexes involving DNA bending dissociate more slowly than others ( 62 ) . 
+ Growth-phase dependent variation in the DNA-binding proﬁles of H-NS and Fis . 
+ Next , we investigated the variation in H-NS and Fis-binding proﬁles from the early-exponential to the stationary phases of growth ( Figure 4 ) . 
+ For H-NS , we detected similar binding at all four stages of growth ( Table 1 and Figure 4 ) . 
+ Though previous experiments showed a peak for H-NS protein expression during exponential growth followed by a 2 -- 2.5-fold decrease during later stages ( 5 ) , our western blot experiment showed constant H-NS levels across our experimental conditions ( Supplementary Data 9 ) , in agreement with previous results in Salmonella ( 9 ) . 
+ For Fis , we identiﬁed comparable numbers of binding regions in both early - and mid-exponential phases ( Table 1 and Figure 4 ) . 
+ In agreement with earlier studies , our western blots ( Supplementary Data 9 ) show that Fis is expressed below detectable levels after exponential growth ( 5 ) . 
+ Though the binding proﬁles are signiﬁcantly correlated between time-points ( Figure 4A ) , there are speciﬁc differences ( Supplementary Data 10 ) . 
+ For H-NS , the number of binding regions and genes targeted for binding increase as the cells progress from exponential to stationary phase ; this includes both stationary phase-speciﬁc binding regions and extension of mid-exponential phase binding regions . 
+ For Fis , we observe greater variability in binding between early - and mid-exponential phases than in H-NS ( r = 0.85 for Fis compared with 0.95 for H-NS , between early - and mid-exponential phases ) , with more binding in mid-exponential phase . 
+ However , we advocate caution in interpreting these results , as they may represent marginal quantitative differences resulting from the thresholds used to call binding events and therefore have limited biological relevance . 
+ Finally , as mentioned in the section above , there is a negative correlation between H-NS and Fis at each time-point ( Figure 4A ) , suggesting that the binding regions of the two proteins tend to be mutually exclusive . 
+ Direct , proximal effects of H-NS and Fis binding on gene expression 
+ Genes bound by H-NS and Fis show different expression levels in wild-type E. coli . 
+ Having examined the pattern of DNA binding by H-NS and Fis , we investigated whether genes bound by H-NS and Fis showed distinct patterns of gene expression in wild-type E. coli cells during mid-exponential phase . 
+ Using one-colour experiments on Agilent oligonucleotide microarrays , we found that absolute gene expression levels ( which correlate with expression measures derived from RNAseq data ) were : ( i ) lower for genes bound by H-NS than those that are not ; and ( ii ) higher for genes bound by Fis ( Figure 5A ; Supplementary Data 11 ) . 
+ We make consistent observations in experiments in which we measure genome-wide RNA-polymerase occupancy during mid-exponential phase using ChIP-seq ( Figure 5B ) . 
+ The former observation is in line with the accepted role of H-NS as a global repressor of gene expression ( 63 ) . 
+ The latter , linking Fis binding to higher expression levels , may be consistent with the hypothesis that a branched DNA topology , which is induced by Fis binding , is a chromatin state that is associated with transcriptional activity ( 15 ) . 
+ We compared our data with a public dataset from Vora and colleagues describing general protein occupancy across the E. coli genome ( Supplementary Data 12 ) ( 43 ) . 
+ These authors classiﬁed domains of high occupancy into those with high gene expression ( hePOD ; highly expressed protein occupancy domains ) and those that are transcriptionally silent ( tsPOD ) . 
+ As expected , we ﬁnd a strong enrichment for H-NS-bound genes within tsPODs ( Fisher 's exact test , P < 10^-50 ) . 
+ In contrast to observations by the above authors ( made using computational searches of Fis-binding motifs ) , we ﬁnd that Fis-bound genes are under-represented within tsPODs ( Fisher 's exact test , P = 9.0 × 10^-5 ) . 
+ Though these results show that there is an association between protein binding and the transcriptional state of the corresponding genes , they do not establish causality . 
+ In order to test this in vivo , we measured gene expression levels for Dhns and Dﬁs strains of E. coli K12 MG1655 , and veriﬁed selected results using RT-PCR . 
+ In agreement with our observations of expression levels in wild-type strains , more genes are up - than down-regulated in Dhns when compared with the wild-type ( 971 are up-regulated ; 335 are down-regulated in mid-exponential phase ; Supplementary Data 13 ) , whereas the contrary is true for Dﬁs ( 338 are down-regulated ; 160 are up-regulated ) . 
+ In order to investigate whether these effects are proximal on the chromosome to the binding regions of Fis and H-NS , we compared our ChIP-seq-based binding proﬁles with the genes that are differentially expressed in the mutant strains when compared with the wild-type . 
+ Global transcriptional repression by H-NS . 
+ A signiﬁcant proportion of genes that are bound by H-NS display differential up-regulation of gene expression in Dhns during mid-exponential phase : 65 % of H-NS -- bound genes are differentially expressed compared with only 19 % of genes not bound by H-NS ( Figure 5C ; Supplementary Data 14 ) . 
+ Similarly , the RNA-polymerase occupancy in the body of genes bound by H-NS increases signiﬁcantly in Dhns , again demonstrating increased transcription in the mutant strain ( Figure 5D ) . 
+ Previous genome-scale studies had reached conﬂicting conclusions on the manner in which H-NS represses transcription . 
+ ChIP-chip data for S. enterica Typhimurium H-NS by Lucchini and colleagues showed that RNA-polymerase is excluded from H-NS bound regions ( 8 ) . 
+ On the other hand , the work of Grainger et al. and Oshima et al. showed that the polymerase was bound to 50 -- 65 % of H-NS bound sites though the associated genes were transcriptionally inactive ( 17,18 ) ; as a result they proposed that H-NS-mediated repression might generally involve trapping the polymerase at the promoter . 
+ We ﬁnd a distinct increase in the RNA-polymerase occupancy upstream of operons bound by H-NS in Dhns when compared with the wild-type ( Supplementary Data 15 ) , which is concomitant with a corresponding increase in the enzyme 's occupancy in the gene body ; this suggests that our data support the conclusions of Lucchini et al. . 
+ However , it must be mentioned here that RNA-polymerase trapping by H-NS , though not prevalent in our data , has been experimentally demonstrated at certain promoters ( 64,65 ) . 
+ The differences between the studies , especially with that by Grainger and colleagues , must be interpreted in light of the substantially different numbers of bound genes identiﬁed . 
+ In order to extend our analysis further , we performed our DNA microarray-based analysis of gene expression change under all four conditions of growth ( Supplementary Data 13 and 14 ) . 
+ H-NS has a statistically signiﬁcant direct impact on gene expression across all phases of growth . 
+ However , compared with mid-exponential phase a much smaller proportion of genes bound by H-NS are differentially regulated in Dhns in stationary phase ( 65 % of H-NS bound genes are ﬂagged as differentially expressed in mid-exponential compared with only 26 % in stationary phase ) . 
+ This could partly be a consequence of the relatively poor quality of RNA that could be collected from the stationary phase cells , which would lead to the assignment of weaker statistical signiﬁcance to differential regulation ; the total number of genes called as differentially expressed in stationary phase is far less than that in mid-exponential phase ( 1313 differentially expressed genes in mid-exponential compared with only 400 in stationary phase ) . 
+ Alternatively , there could be a biological basis to this , in which any possible gene expression increase in Dhns is suppressed by other stationary phase-speciﬁc factors . 
+ Differential expression in Dhns is associated with the length of binding regions . 
+ Having described the effect of H-NS binding on gene expression , we now examine the effect of the different types of binding ( LH-NS / long and SH-NS / short ) described earlier . 
+ Both LH-NS and SH-NS operons show a signiﬁcant tendency to be differentially expressed in Dhns ( Figure 6C and D ; Supplementary Data 14 ) ; however , LH-NS operons tend to display more differential expression than SH-NS operons , indicating a greater degree of repression . 
+ Further , in the wild-type , LH-NS genes show lower expression levels than SH-NS genes ( Figure 6A and B , Supplementary Data 11 ) . 
+ To test further whether LH-NS and SH-NS genes represent distinct modes of transcriptional repression , as indicated by the above results , we compared their occurrence within tsPODs which represent transcriptionally silent loci ( 43 ) . 
+ We ﬁnd that LH-NS genes are enriched within tsPODs , whereas SH-NS genes are not ( Fisher 's exact test , P = 4.7 × 10^-13 comparing LH-NS and SH-NS genes ; Supplementary Data 12 ) . 
+ Together , these suggest that global regulation of transcription by H-NS may encompass : ( i ) transcriptional modulation , typically mild repression , of SH-NS genes and ( ii ) ` total ' transcriptional ` silencing ' of LH-NS genes , including putative horizontally-acquired genes ( 20 ) . 
+ The former , given the propensity of the corresponding binding regions to lie within operon-upstream regions , might act like a canonical TF ; transcriptional silencing on the other hand involves extensive wrapping of large tracts of the chromosome . 
+ Based on the overall distribution of the lengths of H-NS-binding regions , we suggest that the predominant role of H-NS is transcriptional silencing . 
+ Genes bound by Fis show only limited change in expression in Dﬁs . 
+ Though the role of H-NS as a transcriptional repressor is well-established , the impact of Fis on gene expression on a genomic scale remains unclear . 
+ Given that genes bound by Fis , on average , have higher expression levels in wild-type E. coli , one might reasonably expect these genes to be down-regulated in Dﬁs . 
+ Activation of transcription of individual operons , those of stable RNA in particular , by Fis is well-characterized ( 60,66,67 ) ; an inspection of regulatory targets for Fis in the RegulonDB database suggests that it activates more genes than it represses . 
+ However , it must be emphasised that activation of gene expression does not fully explain the regulatory roles of Fis as it is a key repressor of several non-essential genes during exponential growth ( 68 -- 70 ) . 
+ In our study , the large number of genes differentially expressed in Dﬁs account for only a small proportion of Fis-bound genes ( Figure 5C ; Supplementary Data 14 ) . 
+ We also make a consistent observation in the sequencing-based RNA-polymerase occupancy data for mid-exponential phase ( Figure 5D ) , thus indicating that the above is not an artefact of the array technology . 
+ Our results are in agreement with a previously published ChIP-chip study of Fis , which showed differential expression for only about a quarter of bound genes ( 19 ) . 
+ Curiously , despite the general agreement in Fis-binding regions between early - and mid-exponential phases of growth , there is little overlap between the sets of genes that are differentially expressed in Dﬁs between the two time points ; similar observations were made earlier for Fis in E. coli ( 71 ) and IHF in Salmonella typhimurium ( 72 ) . 
+ These data indicate that deletion of ﬁs is not sufﬁcient to cause expression change in most genes that are bound by this protein ; this might be because Fis only has a weak role as a TF in these genes , or because these effects are compensated for by other cis - and trans-acting players which we do not study here . 
+ Differential expression in Dﬁs is associated with the length , strength and position of Fis binding . 
+ We then investigated whether binding regions associated with the relatively fewer differentially-expressed genes in Dﬁs show any distinctive property . 
+ These , when compared with binding regions associated with genes not differentially expressed in Dﬁs , ( i ) tend to be longer ( Figure 7A ; Wilcoxon test , P = 2.5 × 10^-10 for mid-exponential phase ) and consequently contain more Fis binding motifs ( Figure 7B ) ; ( ii ) have higher A/T content ( Figure 7F ; Wilcoxon test , P < 10^-50 ) . 
+ Following from the latter ( see ` Variable structures of Fis -- DNA complexes ' section ) , these binding regions also tend to have higher binding signals ( Figure 7C and D ; Wilcoxon test , P = 7.0 × 10^-8 ) , and contain a greater proportion of operon-upstream motifs ( Figure 7E ; Wilcoxon test , P = 3.0 × 10^-28 ) . 
+ These results indicate that change in expression of a gene bound by Fis might require Fis-binding in multiple tandem copies , possibly nucleated by high-afﬁnity sites at operon-upstream regions . 
+ The higher A/T content of binding motifs associated with proximal differential expression suggests that , in accordance with observations made on a molecular scale , DNA-bending by Fis might be required for gene expression control ( 62 ) . 
+ These are exempliﬁed by the tyrT promoter which is regulated by three Fis dimers binding and bending the DNA ( 66,67 ) . 
+ However , these features are not predictive of differential expression ( Supplementary Data 16 ) , indicating that deﬁnitive determinants of gene expression control by Fis are still lacking . 
+ Indirect and non-proximal effects of H-NS and Fis binding on gene expression 
+ Down-regulation of highly expressed genes in Dhns and Dﬁs . 
+ A large number of genes are down-regulated in both Dhns and Dﬁs , a large majority of which are not bound by the NAPs concerned ; therefore these effects are likely to be indirect . 
+ Genes that are down-regulated in the two deletion strains tend to have higher expression levels than other genes in the wild-type strain ( Figure 8 ) . 
+ Thus , despite the dissimilarities in the binding of H-NS and Fis , an important minority of their inﬂuence on gene expression -- especially of highly expressed genes -- is shared . 
+ This might be a consequence of the impact the two proteins have on the topology of the chromosome -- its supercoiled state in particular ( 73 ) -- which , despite in vitro studies on plasmids and phage DNA , is only beginning to be characterized on a genome-wide scale and at a high resolution ( 74,75 ) . 
+ Given that genes that are down-regulated in the deletion strains tend to have high expression levels , we sought to mine our data to speculate on how the free RNA-polymerase molecules thus generated are redistributed in the mutants . 
+ A signiﬁcantly higher proportion of genes up-regulated in Dhns than in Dﬁs have RNA-polymerase occupancy that is within the top 10 % of highly expressed genes ( 12 % of up-regulated genes in Dhns , 3 % in Dﬁs , P = 2.6 × 10^-4 ) . 
+ Thus , both deletions lead to a fall in expression of highly expressed genes ; however , the manner in which the free RNA-polymerase molecules are redistributed may be different between the two . 
+ In Dﬁs , these are probably distributed across genes with relatively low expression levels ; on the other hand , in Dhns this is compensated for by a subset of genes whose repression is relieved by the lack of H-NS ( 51 of 80 up-regulated operons in the top 10 % of genes with the highest RNA-polymerase occupancy in Dhns are bound by H-NS ) . 
+ Non-proximal effects of H-NS on motility . 
+ An observable phenotype of Dhns is loss of motility . 
+ The motility genes are not directly regulated by H-NS , making them targets for studying non-proximal effects of H-NS on gene expression . 
+ Though the expression of the transcription factor FlhDC -- the master regulator of ﬂagellar gene expression -- has been reported to be directly regulated by H-NS ( 76 ) , we do not ﬁnd evidence for the same in any of the conditions tested . 
+ Instead , we ﬁnd that 17 of the 26 operons coding for cyclic-di-GMP-metabolising GGDEF/EAL domain-containing proteins , which regulate the switch between motility and adhesion , are bound by H-NS in at least one of the four conditions ; 22 of the 29 such genes are differentially expressed in Dhns ( Supplementary Data 17 ) . 
+ It has already been shown that two GGDEF/EAL proteins that inversely control adhesion through regulating curli biogenesis are regulated by H-NS ( 59 ) ; indeed , we observe binding and regulation of csgD -- a transcriptional regulator of curli biogenesis -- by H-NS under all conditions . 
+ Our genome-scale study indicates that H-NS is a global regulator that is positioned at the apex of the c-di-GMP regulatory network controlling motility and adhesion . 
+ Cascading transcriptional regulatory interactions are responsible for part of non-proximal effects of Fis . 
+ Finally , a large majority of genes bound by Fis show little change in gene expression in Dﬁs . 
+ However , the Dﬁs mutation leads to a global change in gene expression during the exponential phases of growth , with over 950 genes differentially expressed in early - or mid-exponential phases of growth . 
+ Clearly , most of these gene expression changes are caused by indirect effects . 
+ These effects might be mediated by the impact of Fis on the overall chromosome topology . 
+ A second , more tractable , effect might be through cascades of transcription factors . 
+ To investigate this , we used the transcriptional regulatory network comprising 3254 interactions between 163 TFs and 1450 target genes available in RegulonDB . 
+ We ﬁnd that 37 TFs , including the proliﬁc global regulator CRP , are differentially expressed in Dﬁs in early - or mid-exponential phases of growth . 
+ Of the 851 annotated targets of these TFs , 316 ( 37 % ) are differentially expressed in Dﬁs ; this represents a signiﬁcant enrichment over other genes of which only 20 % are differentially expressed ( Fisher 's Exact test , P = 5.9 × 10^-13 ) . 
+ Of the 37 TFs differentially expressed , only 12 are bound directly by Fis . 
+ The regulatory cascade effect described holds even if we were to restrict our analysis to the targets of these 12 TFs ( 199 of 541 targets are differentially expressed ; 37 % ) . 
+ Of the remaining 25 TFs , 10 are known direct targets of the Fis-bound TFs . 
+ Therefore the expression change of 22 of the 37 TFs can be explained by direct Fis binding or by regulation by Fis-bound TFs . 
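+ The counts quoted in the preceding sentences can be cross-checked with a few lines of arithmetic . The sketch below is only a bookkeeping aid : every number is copied from the text above , and the variable and function names are illustrative assumptions rather than anything used by the authors . 

```python
# Counts quoted in the text; no new data are introduced here.
tfs_differentially_expressed = 37   # TFs differentially expressed in the fis deletion
tfs_bound_by_fis = 12               # of those, bound directly by Fis
tfs_targets_of_bound_tfs = 10       # of the remaining 25, direct targets of Fis-bound TFs

targets_all, targets_all_de = 851, 316       # targets of all 37 TFs
targets_bound, targets_bound_de = 541, 199   # targets of the 12 Fis-bound TFs


def percent(part, whole):
    """Return part/whole as a percentage rounded to the nearest integer."""
    return round(100 * part / whole)


explained_tfs = tfs_bound_by_fis + tfs_targets_of_bound_tfs
unexplained_tfs = tfs_differentially_expressed - explained_tfs

print(explained_tfs, unexplained_tfs)
print(percent(targets_all_de, targets_all), percent(targets_bound_de, targets_bound))
```

+ Both target sets give the same rounded proportion , which is why restricting the analysis to the Fis-bound TFs does not weaken the cascade argument . 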
+ In summary , a signiﬁcant proportion of genes are differentially expressed in Dﬁs probably because of the cascading effects of multiple transcription factors . 
+ DISCUSSION
+ We have investigated the genome-wide binding of two NAPs , H-NS and Fis , to the E. coli K12 MG1655 chromosome using ChIP followed by sequencing of resulting DNA . 
+ Though this technique has been extensively adopted in eukaryotic genomics , to our knowledge ours is the ﬁrst ChIP-Seq experiment for any global bacterial DNA-binding protein . 
+ We interpret our data using a combination of deletion strains , microarray-based measurements of gene expression and parallel-sequencing of RNA-polymerase-bound DNA fragments . 
+ The binding of both NAPs has been studied on a genome-wide scale using microarrays . 
+ Grainger and colleagues studied the binding of H-NS , Fis ( and IHF ) in mid-exponential phase and expanded the list of genes known to be bound by these proteins ( 17 ) . 
+ In particular , they reported the presence of extensive overlap between the promoters bound by Fis and H-NS , which we do not observe in our conditions . 
+ Our observation of a negative correlation between the ChIP-Seq signals for Fis and H-NS is further manifested by the following observations : ( i ) H-NS binding is enriched in putative horizontally-acquired regions , whereas Fis binding is not ; ( ii ) H-NS targets are enriched in transcriptionally silent protein-occupied DNA domains , whereas Fis-bound genes are under-represented . 
+ This difference in observation between our study and that of Grainger and colleagues ( 17 ) is caused by the discrepancy between the two in identifying Fis binding regions . 
+ Though these differences are surprising , they may be explained in various ways . 
+ First , Fis might bind , with a range of afﬁnities , to most of the E. coli genome ; this may be observed in the higher background in our Fis experiment ( Figure 1 ) . 
+ Therefore , each study may be sampling a distinct set of bound loci . 
+ Second , the experimental conditions are vastly different : our experiments were carried out in rich LB medium without sugar supplements , whereas Grainger et al. performed theirs in M9 minimal medium plus fructose . 
+ This could have led to substantial differences in Fis binding proﬁles due to its reported association with catabolite repression and competition with the global transcription factor CRP ( 53,54 ) . 
+ Analysis performed here shows statistically signiﬁcant overlap between Fis-bound genes and known CRP targets . 
+ This suggested association between Fis and CRP targets might be indicative of cooperative or competitive interactions ; however given these data , this can not be substantiated at present . 
+ Taken together , there might be substantial differences in Fis function between rich and minimal medium , and in the presence and absence of catabolite repression-inducing sugars . 
+ In addition to the above , the following factors might have had relatively minor effects on the results . 
+ We used antibodies against the FLAG epitope which had been tagged to the protein of interest , whereas Grainger and colleagues used direct antibodies . 
+ The use of the same antibody against three different proteins makes the data from each protein more comparable by eliminating the effect of differential afﬁnities that different antibodies might have towards their target proteins . 
+ Though the use of a tag might alter the function of the target protein , microarray analysis of gene expression in the tagged strains shows that these effects are insubstantial ( Supplementary Data 18 ) . 
+ Finally , the lengths of the fragments used for sequencing ( 200 bp ) and microarray hybridization ( 500 -- 1000 bp ) , and therefore the achievable resolution , are generally different ( 38 ) . 
+ Lucchini and co-workers used a similar low-resolution array to investigate the binding of H-NS to the genome of S. enterica Typhimurium ( 8 ) . 
+ The important conclusion of this study , which was independently demonstrated in the same organism by Navarre and colleagues ( 58 ) , was the silencing of horizontally-acquired genes by H-NS . 
+ They showed that H-NS-binding regions in general exclude RNA-polymerase . 
+ Oshima and colleagues identiﬁed binding regions of H-NS in E. coli using high-resolution microarrays and again showed its effect on horizontally-acquired genes ( 18 ) . 
+ In contrast to the conclusions of Lucchini et al. and in agreement with those of Grainger and co-workers ( 17 ) , these authors identiﬁed binding of RNA-polymerase to operon-upstream H-NS-binding regions , though the proximal genes are transcriptionally silent . 
+ Our data and analyses do not support this , possibly because of differences between the studies in the sampled binding sites , but are in agreement with those of Lucchini et al. . 
+ Both Lucchini and Navarre have also demonstrated that uncontrolled expression of H-NS-silenced genes can lead to ﬁtness defects ( 8,58 ) . 
+ However , under the conditions used in our study , the wild-type and Dhns have similar growth rates . 
+ The difference between our observations might be due to the nature of the genes which are regulated by H-NS in the two organisms . 
+ This is reﬂected in our observation that a majority of H-NS targets in Salmonella are not conserved in E. coli ( see section ` Comparison with previously published high-throughput datasets ' ) , in line with the tendency of H-NS to silence horizontally-acquired genes . 
+ The above studies were performed only during mid-exponential phase of growth , though Grainger and co-workers extended theirs to a medium supporting lower growth rates ( 17 ) . 
+ A more recent genome-wide interrogation of H-NS-genome interactions by Noom and colleagues was interpreted , albeit tenuously , in the context of the formation of looped domain boundaries in the E. coli and S. typhimurium chromosomes ( 77 ) . 
+ These authors performed their study in stationary phase cells , in addition to mid-exponential cells : in agreement with the documented 2-fold decrease in H-NS levels in stationary phase , the authors found that the spacing between adjacent H-NS binding patches doubles in stationary phase . 
+ In contrast , we ﬁnd no evidence for decreased H-NS expression or binding in stationary phase , in agreement with observations made earlier for H-NS in Salmonella ( 9 ) . 
+ Cho and colleagues used high-density genome-tiling microarrays to interrogate the binding of Fis to the E. coli genome during mid-exponential growth under aerobic and anaerobic conditions , again in minimal medium ( 19 ) . 
+ They showed that there is little difference in binding proﬁles between aerobic and anaerobic conditions , a comparison we do not perform . 
+ On the other hand , unlike our study they did not investigate multiple time-points during a growth phase . 
+ Similar to our conclusions , these authors found little association between Fis binding and differential expression in Dﬁs . 
+ This extends the observations made for another global transcriptional regulator CRP in E. coli ( 41 ) , a large majority of whose binding sites are likely to have little effect on transcription . 
+ This led the authors to propose that the primary role of CRP is to structure the chromosome in an as yet uncharacterized manner ; its role as a global transcription factor might be an incidental development . 
+ A similar interpretation may be valid for Fis as well . 
+ Impact of binding characteristics on gene expression
+ Despite substantial overlap between our study and those of earlier investigations , we extend our interpretation by analysing the association between the nature of binding patches and its inﬂuence on gene expression . 
+ We show that H-NS binds to signiﬁcantly longer patches of the chromosome than Fis , in both early - and mid-exponential phases . 
+ We speculate that these long binding tracts might include both arms of the plectonemic supercoils and the apical loops that H-NS introduces on the bound DNA ( 3,10 ) ; however , we note that this does not rule out the possibility that , instead of bridging DNA , H-NS might stiffen the bound DNA at certain sites ( 78 ) . 
+ These long regions of H-NS binding enable transcriptional silencing -- displaying greater differential expression in Dhns and also showing an enrichment for being present within protein occupancy domains associated with transcriptionally silent loci -- whereas shorter patches act as gentler modulators of gene expression . 
+ Short H-NS binding regions display a greater preference towards binding to operon-upstream regions than both long H-NS - and Fis-binding regions . 
+ This tendency of short H-NS-binding regions to behave more typically like canonical transcription factors than Fis binding regions might explain the relatively greater proximal effect of short H-NS binding patches on gene expression when compared with Fis ( 40 % of genes targeted by short H-NS binding regions are differentially expressed in mid-exponential phase , whereas only 15 % of Fis targets are ; Fisher 's exact test , P = 1.6 × 10^-8 ) . 
+ As mentioned above , both our study and that by Cho et al. discover that a large majority of strong Fis-binding events are inconsequential from the transcriptional perspective ; however , we additionally suggest that the interaction of tandem arrays of Fis molecules with the DNA and possible DNA bending , particularly at operon-upstream regions , might be necessary , though not sufﬁcient , for affecting transcription . 
+ Further , we notice that signals in our ChIP-Seq experiments for Fis are weaker than those for H-NS ( Figure 1 ) . 
+ This observation must be interpreted with caution since the efﬁciency of immunoprecipitation may depend on the clustering of multiple target proteins on the same chromosomal loci . 
+ Additionally , this might also be due to a higher background for Fis , resulting from weak or sporadic binding events across the genome . 
+ If this difference is indeed because Fis -- DNA interactions in general are weaker and/or more dynamic than H-NS -- DNA contacts , it might be responsible for the relatively weaker association between Fis binding and proximal gene expression change . 
+ In contrast to previous studies , we also perform an analysis of the origins of non-proximal effects of the binding of Fis and H-NS to the chromosome . 
+ We show a general decrease in the expression of highly expressed transcripts in both the deletion strains , and speculate on the manner in which the RNA-polymerase is redistributed in these mutants : whereas foci of high transcriptional activity may be lost in Dﬁs , these are replaced by H-NS-bound genes in Dhns . 
+ Perspectives
+ The main roles of NAPs , particularly in relation to gene expression control , are still under active investigation . 
+ Though our study contributes to this ﬁeld , it leaves several questions , including the following , unanswered . 
+ ( i ) What is the predominant function of Fis-chromosome interactions ? 
+ ( ii ) What are the implications , if any , of our observation that , on a genome-wide scale , there is a higher background signal for Fis than H-NS ? 
+ ( iii ) What factors deﬁnitively link Fis binding to proximal gene expression change ? 
+ Finally , we also provide a proof-of-principle study for the use of massively parallel high-throughput sequencing for the analysis of protein -- DNA interactions on a genomic scale in bacteria . 
+ This is a state-of-the-art technology which affords signiﬁcantly higher resolution and dynamic range than microarray-based studies . 
+ However , there is substantial room for improvement . 
+ For example , modiﬁcations to the ChIP protocol , which minimize experimental artifacts -- including capture of large molecular weight complexes -- were proposed very recently ( 79 ) . 
+ Second , from the sequencing perspective , multiplexing techniques are under active development ( 80 ) . 
+ Since 10 -- 15-fold coverage of the genome ( compared with 150-fold obtained in our study ) should enable good recovery of binding regions for most bacterial proteins , multiplexing should make ChIP-Seq more economical and therefore prevalent in the ﬁeld . 
+ ACCESSION NUMBERS
+ Array Express E-MTAB-332 , Array Express A-MEXP-1866 , Array Express E-MEXP-2838 , European Nucleotide Archive ERP000280 , Array Express E-MTAB-387 . 
+ Supplementary Data are available at NAR Online.
+ ACKNOWLEDGEMENTS
+ The authors thank David Grainger , Sacha Lucchini and Nadia Abed for their advice on ChIP protocols . 
+ We thank Prof. Stephen Busby and Dr David Grainger for helpful discussion . 
+ FUNDING
+ Cambridge Commonwealth Trust ; St. John 's College , University of Cambridge ; Girton College , University of Cambridge ( to A.S.N.S. ) ; Spanish Ministry of Science and Innovation ( to A.I.P ) ; Biotechnology and Biological Sciences Research Council ( BBSRC ) grant ` Genomic Analysis of Regulatory Networks for Bacterial Differentiation and Multicellular Behaviour ' ( to G.M.F and N.M.L. ) ; Isaac Newton Trust ( to G.M.F ) ; European Molecular Biology Laboratory ( EMBL ) ( to N.M.L. ) . 
+ Funding for open access charge : European Molecular Biology Laboratory . 
+ Conﬂict of interest statement. None declared.
\ No newline at end of file
--- a/data/TEXT_FILES/useful_txt/21572102.txt 0 → 100644
View file @27818a9
+++ b/data/TEXT_FILES/useful_txt/21572102.txt 0 → 100644
View file @27818a9
+ The PurR regulon in Escherichia coli K-12 MG1655
+ ABSTRACT 
+ The PurR transcription factor plays a critical role in transcriptional regulation of purine metabolism in enterobacteria . 
+ Here , we elucidate the role of PurR under exogenous adenine stimulation at the genome-scale using high-resolution chromatin immunoprecipitation ( ChIP ) -- chip and gene expression data obtained under in vivo conditions . 
+ Analysis of microarray data revealed that adenine stimulation led to changes in transcript level of about 10 % of Escherichia coli genes , including the purine biosynthesis pathway . 
+ The E. coli strain lacking the purR gene showed that a total of 56 genes are affected by the deletion . 
+ From the ChIP -- chip analysis , we determined that over 73 % of genes directly regulated by PurR were enriched in the biosynthesis , utilization and transport of purine and pyrimidine nucleotides , and 20 % of them were functionally unknown . 
+ Compared to the functional diversity of the regulon of the other general transcription factors in E. coli , the functions and size of the PurR regulon are limited . 
+ Byung-Kwan Cho, Stephen A. Federowicz, Mallory Embree, Young-Seoub Park, Donghyuk Kim and Bernhard Ø. Palsson*
+ Department of Bioengineering , University of California , San Diego , La Jolla , CA 92093 , USA 
+ INTRODUCTION
+ Metabolism enables a cell to assimilate exogenous nutrients both for energy generation and for macromolecular synthesis . 
+ A set of metabolic pathways directs the biosynthesis and utilization of nucleotides that is critical for virtually every aspect of cellular life . 
+ Purine and pyrimidine nucleotides constitute a part of the nucleic acids , cofactors in enzymatic reactions , intracellular and extracellular signals , phosphate donors and the major carriers of cellular energy ( 1,2 ) . 
+ Since the biosynthesis and utilization of the nucleotides is demanding of cellular resources and plays a broad role in cellular processes , imbalances between the different nucleotide pools signiﬁcantly perturb the normal cellular functions ( 2,3 ) . 
+ In general , metabolism is tightly controlled by feedback inhibition of enzyme activity by metabolites and by transcriptional regulation by DNA-binding proteins . 
+ The regulatory action of DNA-binding proteins is also modulated by the exogenous nutrients as stimuli . 
+ Therefore , there is great interest in not only elucidating the set of genes under regulation by the same stimuli ( deﬁned as a stimulon ) , but also identifying the collection of genes under regulation by the same regulatory protein ( deﬁned as a regulon ) . 
+ In the case of purine nucleotide metabolism in Escherichia coli , purine repressor ( PurR ) tightly regulates transcription of the enzymes involved in inosine 5′-monophosphate ( IMP ) biosynthesis and the conversion of IMP to adenosine monophosphate ( AMP ) and guanosine monophosphate ( GMP ) ( 2 ) . 
+ The regulatory action of PurR on target genes is modulated by the binding of the small effector molecules [ hypoxanthine ( Hx ) or guanine ] and in effect endows PurR with the ability to affect transcriptional regulation ( 4 ) . 
+ In other words , upon availability of purine nucleotides from the environment , the activity of PurR can be enhanced to repress the expression of target genes . 
+ However , little is known about in vivo PurR-binding events and their causal relationships with gene expression at the genome scale in the presence or absence of purine nucleotides . 
+ Such information is needed to reconstruct the PurR regulon and to understand purine metabolism . 
+ The E. coli transcriptional regulatory network is believed to have a hierarchical topology with several global transcription factors ( TFs ) at the top-level ( 5 -- 7 ) . 
+ The global TFs were speciﬁed by the multiple functional categories of the genes regulated . 
+ By contrast , speciﬁc TFs restrict their target genes to the same metabolic pathways or the same functional categories ( 6 ) . 
+ Previously , PurR was classiﬁed into the group of general TFs ; however , due to the lack of information on target genes in its regulon , the understanding of the role of PurR is limited . 
+ In particular , it is unclear whether its effects on E. coli metabolism are direct or indirect . 
+ If the effects are indirect , it is also unclear whether the indirect effects are made through other TFs or other metabolites . 
+ In this study , we obtain and integrate genome-scale data from chromatin immunoprecipitation ( ChIP ) -- chip and gene expression proﬁling to elucidate the regulatory role of PurR at a genome scale . 
+ First , using changes in transcript levels on a genome scale , we deﬁned the adenine stimulon from comprehensively established sets of genes differentially expressed in response to exogenous adenine . 
+ Second , we used the purR deletion mutant to determine the PurR-dependent genes affected by the deletion mutant . 
+ Third , we set out to comprehensively establish the PurR-binding regions on the E. coli genome experimentally to further elucidate any DNA sequence motif correlated with the PurR regulatory action . 
+ Fourth , we determined the regulatory action of PurR based on the causal relationships between the association of PurR and changes in transcript levels . 
+ In the end , the reconstruction of the regulatory network of PurR allows us to understand the role of the PurR regulon as a part of the broader adenine stimulon . 
+ The results show that the role of PurR regulon is locally acting but its effect on the entire metabolism is critical in response to the exogenous purine stimulation . 
+ MATERIALS AND METHODS
+ Bacterial strains and growth conditions
+ All strains used are E. coli K-12 MG1655 and its derivatives . 
+ The E. coli strain harboring PurR-8myc was generated as described previously ( 8 ) . 
+ The deletion mutant ( ΔpurR ) was constructed by a λ Red and FLP-mediated site-speciﬁc recombination system ( 9 ) . 
+ Glycerol stocks of E. coli strains were inoculated into M9 minimal medium supplemented with 2 g/l glucose and cultured overnight at 37 °C with constant agitation . 
+ The cultures were inoculated into 100 ml of the fresh M9 minimal medium in either the presence or absence of 100 µg/ml adenine and culturing was continued at 37 °C with constant agitation to mid-log phase . 
+ Transcriptome analysis
+ Samples for transcriptome analyses were taken from exponentially growing cells . 
+ From the cells treated by 2 vol of RNAprotect Bacteria Reagent ( Qiagen ) , total RNA was isolated using RNeasy kit ( Qiagen ) with DNaseI treatment in accordance with manufacturer 's instruction . 
+ Affymetrix GeneChip E. coli Genome 2.0 arrays were used for genome-scale transcriptional analyses . 
+ Complementary DNA ( cDNA ) synthesis , fragmentation , end-terminus biotin labeling and array hybridization were performed as recommended by Affymetrix standard protocol . 
+ Raw CEL ﬁles were analyzed using robust multi-array average for normalization and calculation of probe intensities . 
+ ChIP and microarray analysis
+ To identify PurR-binding regions in vivo , we isolated the DNA bound to PurR protein by ChIP . 
+ Cultures at mid-log phase were cross-linked by 1 % formaldehyde at room temperature for 25 min . 
+ After cell lysis and sonication , the cross-linked DNA-PurR complex was immunoprecipitated by using the speciﬁc antibody against myc-tag ( 9E10 , Santa Cruz Biotech ) and Dynabeads Pan Mouse IgG magnetic beads ( Invitrogen ) followed by stringent washings as described previously ( 10 ) . 
+ After reversal of the cross-links by incubation at 65 °C overnight , the samples were treated with RNase A ( Qiagen ) and proteinase K ( Invitrogen ) and then puriﬁed with a PCR puriﬁcation kit ( Qiagen ) . 
+ Then , the ampliﬁed ChIP DNA samples were labeled and hybridized onto whole-genome tiled microarrays ( Roche-NimbleGen ) . 
+ Data analysis
+ To identify PurR-binding regions , we used the peak ﬁnding algorithm built into the NimbleScan™ software . 
+ Processing of ChIP -- chip data was performed in three steps : normalization , IP/mock-IP ratio computation ( log base 2 ) , and enriched region identiﬁcation . 
+ The log2 ratios of each spot in the microarray were calculated from the raw signals obtained from both Cy5 and Cy3 channels , and then the values were scaled by Tukey bi-weight mean . 
+ The log2 ratio of Cy5 ( IP DNA ) to Cy3 ( mock-IP DNA ) for each point was calculated from the scanned signals . 
+ Then , the bi-weight mean of this log2 ratio was subtracted from each point . 
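+ The two preceding steps ( per-probe log2 ratio , then centring on a robust mean ) can be sketched as follows . This is a minimal illustration and not the NimbleScan implementation , which the text does not describe : the one-step Tukey bi-weight estimator with tuning constant c = 5 and the small eps term are assumptions , as are all function names . 

```python
import math
from statistics import median


def tukey_biweight_mean(values, c=5.0, eps=1e-4):
    """One-step Tukey bi-weight mean (assumed form; c and eps are illustrative)."""
    m = median(values)
    mad = median(abs(v - m) for v in values)  # median absolute deviation
    num = den = 0.0
    for v in values:
        u = (v - m) / (c * mad + eps)
        # Points far from the median get zero weight; near points get weight close to 1.
        w = (1 - u * u) ** 2 if abs(u) < 1 else 0.0
        num += w * v
        den += w
    return num / den


def centered_log2_ratios(cy5, cy3):
    """log2(IP / mock-IP) per probe, minus the bi-weight mean of all probes."""
    ratios = [math.log2(ip / mock) for ip, mock in zip(cy5, cy3)]
    center = tukey_biweight_mean(ratios)
    return [r - center for r in ratios]


# Hypothetical raw intensities for four probes (Cy5 = IP DNA, Cy3 = mock-IP DNA).
example = centered_log2_ratios([2.0, 4.0, 8.0, 4.0], [1.0, 1.0, 1.0, 1.0])
```

+ Centring on a robust mean rather than the arithmetic mean keeps a handful of strongly enriched probes from shifting the baseline of the whole array . 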
+ Each log ratio dataset from triplicate samples was used to identify PurR-binding region using the software ( width of sliding window = 300 bp ) . 
+ Our approach to identify the PurR-binding regions was to ﬁrst determine binding locations from each data set and then combine the binding locations from at least ﬁve of the six data sets to deﬁne a binding region ( 11 ) . 
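+ One way to picture the replicate-combination rule is as a coverage count along the genome : a position belongs to a binding region when it lies inside a called peak in at least five of the six data sets . The sketch below follows that reading , which is an assumption , since the text does not spell out how overlapping locations were merged ; the function name , the half-open ( start , end ) intervals and the example coordinates are all hypothetical . 

```python
def consensus_regions(replicates, min_support):
    """Return maximal intervals covered by peaks from >= min_support replicates.

    Each replicate is a list of half-open (start, end) intervals with start < end,
    assumed not to overlap one another within the same replicate.
    """
    events = []
    for peaks in replicates:
        for start, end in peaks:
            events.append((start, 1))   # a peak opens
            events.append((end, -1))    # a peak closes
    # Tuples sort by position, and closes (-1) come before opens (+1) at ties,
    # so touching intervals are not counted as overlapping.
    events.sort()

    regions = []
    depth = 0
    region_start = None
    for pos, delta in events:
        depth += delta
        if depth >= min_support and region_start is None:
            region_start = pos
        elif depth < min_support and region_start is not None:
            regions.append((region_start, pos))
            region_start = None
    return regions


# Hypothetical peak calls from six data sets.
calls = [
    [(100, 400), (1000, 1200)],
    [(120, 380)],
    [(90, 410)],
    [(110, 390)],
    [(105, 395)],
    [(1000, 1200)],
]
strict = consensus_regions(calls, min_support=5)
loose = consensus_regions(calls, min_support=2)
```

+ With the five-of-six threshold only the first locus survives ; lowering the threshold to two also admits the locus seen in just two data sets . 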
+ Motif searching
+ The PurR-binding motif analysis was completed using the MEME and FIMO tools from the MEME software suite ( 12 ) . 
+ We ﬁrst determined the proper binding motif and then scanned the full genome for its presence . 
+ The elicitation of the motif was done using the MEME program on the set of sequences deﬁned by the PurR-binding regions . 
+ Using default settings , the previously determined PurR motif was recovered and then tailored to the correct size by setting the width parameter to 16 bp . 
+ We then used these motifs and the PSPM ( position speciﬁc probability matrix ) generated by MEME to rescan the entire genome with the FIMO program . 
+ RESULTS
+ Determination of gene expression changes to the exogenous adenine
+ To characterize the changes in gene expression with exposure to exogenous adenine , global transcriptome analyses were performed using DNA microarrays . 
+ Escherichia coli strain K-12 MG1655 ( wild type ) was grown in M9 medium supplemented with 100 µg/ml adenine . 
+ Samples were removed from the culture with and without adenine stimulation and used for the extraction of total RNA . 
+ Analysis of the DNA microarray data demonstrated that exogenous adenine stimulation led to changes in transcript level of about 10 % of E. coli genes ( Supplementary Table S1 ) . 
+ Those include the upregulation of 144 genes and downregulation of 255 genes with more than 2-fold expression change and P-value < 0.05 . 
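+ The selection rule stated above ( more than 2-fold change and P-value < 0.05 ) amounts to a two-condition ﬁlter . The sketch below shows one way to express it ; the function name and the P-values in the example are hypothetical , and only the two fold-change magnitudes are taken from the text ( ydhC , 193.55-fold induced ; purD , 43.89-fold repressed ) . 

```python
import math


def classify(fold_change, p_value, min_fold=2.0, alpha=0.05):
    """Label a gene 'up', 'down' or 'unchanged'.

    fold_change is the with-adenine / without-adenine expression ratio (> 0).
    """
    if p_value >= alpha:
        return "unchanged"
    log2_fc = math.log2(fold_change)
    threshold = math.log2(min_fold)
    if log2_fc > threshold:
        return "up"
    if log2_fc < -threshold:
        return "down"
    return "unchanged"


# Fold changes from the text; the P-values are placeholders for illustration only.
calls = {
    "ydhC": classify(193.55, 0.001),
    "purD": classify(1 / 43.89, 0.001),
}
```

+ Working on the log2 scale makes induction and repression symmetric , so a single threshold serves both directions . 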
+ As previously well described ( 2 ) , all of the genes associated with the purine biosynthesis pathway were repressed by the addition of adenine . 
+ Among them , purD encoding phosphoribosylamine -- glycine ligase showed the highest repression factor ( 43.89-fold ) ( Supplementary Table S2 ) . 
+ Interestingly , adenine addition led to the high downregulation of several transporter genes , including codB , xanP , yeeF and uraA encoding cytosine transporter ( 34.52-fold ) , xanthine NCS2 transporter ( 20.12-fold ) , amino acid APC transporter ( 18.61-fold ) and uracil NCS2 transporter ( 17.64-fold ) . 
+ A signiﬁcant portion of the genes downregulated by adenine addition is associated with the pyrimidine and amino acid biosynthetic pathways . 
+ In particular , the downregulation of genes comprising the arginine biosynthetic pathway clearly demonstrates that the purine metabolism links to the in vivo level of arginine in order to regulate the biosynthesis of deoxyribonucleic and ribonucleic acid ( 13 ) . 
+ On the other hand , the highly induced genes by adenine stimulation were involved in other various cellular processes ( Supplementary Table S2 ) . 
+ Among them , ydhC encoding drug MFS transporter had the highest activation factor ( 193.55-fold ) . 
+ Of interest , two non-coding RNAs , gcvB and rybB , were induced by a factor of 34.37 and 5.92 , respectively . 
+ Previous studies showed that GcvB enhances the ability of E. coli to survive low pH by upregulating the levels of alternate sigma factor , RpoS ( 14 ) . 
+ Another alternate sigma factor , RpoE-dependent RybB , regulates the synthesis of major porins in E. coli ( 15 ) . 
+ Among genes in purine salvage pathways , add encoding adenosine deaminase was induced by adenine as well ( 17.61-fold ) ( 2 ) . 
+ The genes in thiamine and biotin biosynthesis pathways were also induced by the exogenous adenine . 
+ These observations demonstrate that a large number of the downregulated genes related with purine and pyrimidine biosynthesis and transport , arginine biosynthesis and ATP synthesis coupled with proton transport form a major portion ( 64 % ) of the adenine stimulon . 
+ PurR-dependent transcriptome response to the exogenous adenine 
+ Next , we studied the global response of a purR deletion mutant to adenine stimulation to identify PurR-dependent genes ( Supplementary Table S1 ) . 
+ To address this issue , we isolated total RNA from the isogenic purR deletion mutant during exponential growth phase and hybridized the cDNA obtained from the total RNA onto Affymetrix microarrays . 
+ A comparison of the gene expression levels between cells grown in the presence and absence of the PurR protein in response to the exogenous adenine revealed that a total of 56 genes exhibit differential expression with more than 2-fold change and a false discovery rate ( FDR ) value < 0.05 ( P-value = 0.0056 ) from analysis of variance ( ANOVA ) analysis ( Supplementary Table S3 ) . 
+ Nineteen genes ( 34 % ) showed increased transcript levels in response to the exogenous adenine due to regulation by PurR ( Supplementary Table S3 ) . 
+ None of these 19 genes has been previously reported to be directly regulated by PurR . 
+ On the other hand , transcription of the 37 genes ( 66 % ) was repressed by PurR . 
+ Eleven genes of the IMP ( inosine 5′-monophosphate ) biosynthetic pathway from PRPP ( 5-phosphoribosyl-1-pyrophosphate ) clustered into this group . 
+ It has been previously determined that 10 of them were directly repressed by the PurR protein ( 16 -- 24 ) . 
+ Eight genes in pyrimidine biosynthesis and transport pathways were directly or indirectly regulated by the PurR protein , including carA , carB , pyrB , pyrI , pyrC , pyrD , codA and codB . 
+ It has been experimentally determined that four of them ( carA , carB , pyrD and pyrC ) were directly repressed by the PurR ( 25 -- 29 ) . 
+ Transcription of yieG ( -- 2.91-fold ) , xanP ( -- 30.82-fold ) , and uraA ( -- 2.04-fold ) encoding adenine , xanthine and uracil transporter , respectively , was also affected by the purR deletion , indicating direct or indirect regulatory effect of the PurR protein on the adenine and uracil transport systems ( 30 ) . 
+ Transcriptional repression of genes in arginine biosynthesis pathway ( argA , argB and argC ) is potentially mediated by the PurR protein . 
+ Interestingly , acid stress response genes ( hdeB , hdeA and hdeD ) decreased expression in a purR deletion mutant . 
+ Consistent with this observation , the level of messenger RNA ( mRNA ) transcript of gadY , a regulatory small RNA that is highly upregulated by low pH ( 31 ) , was affected by the purR deletion . 
+ A hallmark of the E. coli response to the exogenous adenine is the rapid and strong repression of a set of genes in purine biosynthesis pathway . 
+ The observed repression of all of these genes in the wild-type strain , but not the purR deletion mutant , provided an internal validation of the microarray experiment . 
+ Genome-wide identiﬁcation of PurR regulon
+ PurR-binding regions have been characterized by in vitro DNA-binding experiments and mutational analysis ; however , direct analysis of in vivo PurR binding is not available . 
+ We thus employed the ChIP coupled with microarrays ( ChIP -- chip ) approach to determine the in vivo PurR-binding regions in E. coli cells under either the presence or the absence of exogenous adenine ( Figure 1 ) . 
+ We performed a hybridization of the immunoprecipitated DNA ( Cy5 channel ) and mock immunoprecipitated DNA ( Cy3 channel ) onto the high-resolution whole-genome tiling microarrays , which contained a total of 371 034 oligonucleotides with 50-bp tiles overlapping every 25 bp on both forward and reverse strands ( 11,32 ) . 
+ The normalized log2 ratios obtained from the hybridization identify the genomic regions enriched in the IP-DNA sample compared with the mock IP-DNA sample and thereby represent a genome-wide map of in vivo interactions between PurR protein and E. coli genome ( Figure 1A ) . 
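+ The enrichment measure described above can be sketched as follows . This is a minimal illustration , not the authors ' pipeline : the pseudocount and the median centering are assumptions , since the text only states that normalized log2 ratios of IP ( Cy5 ) over mock IP ( Cy3 ) signal were used , and the function names are hypothetical . 

```python
import math


def log2_enrichment(ip_signal, mock_signal, pseudocount=1.0):
    """Per-probe log2 ratio of IP (Cy5) over mock IP (Cy3) intensity.

    The pseudocount is an assumption that keeps zero intensities finite.
    """
    if len(ip_signal) != len(mock_signal):
        raise ValueError("both channels must cover the same probes")
    return [
        math.log2((ip + pseudocount) / (mock + pseudocount))
        for ip, mock in zip(ip_signal, mock_signal)
    ]


def median_center(ratios):
    """Shift the ratios so that a typical (unbound) probe sits at 0.

    Median centering is one common normalization, assumed here.
    """
    ordered = sorted(ratios)
    n = len(ordered)
    mid = n // 2
    median = ordered[mid] if n % 2 else (ordered[mid - 1] + ordered[mid]) / 2
    return [r - median for r in ratios]
```

+ Probes with a centered log2 ratio well above 0 are candidates for bound regions ; the peak finding step itself is not shown . 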
+ Using a peak finding algorithm , 35 and 13 unique and reproducible PurR-binding regions were identified from the hybridizations in exponential phase in the presence and absence of adenine , respectively ( Table 1 ) . 
+ The genome-wide PurR-binding maps obtained from two different conditions , i.e. exponential growth phase in the presence and the absence of exogenous adenine , indicated that the PurR association on the E. coli genome is dramatically sensitive to the addition of adenine . 
+ For instance , PurR occupancy for the promoter regions of carAB , purC and purMN transcription units showed a great differential ratio between those two conditions ( Figure 1B , Table 1 ) . 
+ At the previously characterized PurR-binding promoter regions of pyrD , purB , purR , cvpA-purF-ubiX , guaBA , purL and purHD , we only observed the PurR-binding in the presence of exogenous adenine . 
+ Only 37 % of binding sites ( 13 of 35 ) overlapped under the conditions in the absence and presence of exogenous adenine , and 62 % of binding sites ( 22 of 35 ) were found only in the presence of exogenous adenine ( Figure 1C ) . 
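+ The comparison of the two conditions amounts to set arithmetic on region identifiers . The sketch below assumes each binding region has a stable label shared between conditions ; the function name and the labels are hypothetical . 

```python
def compare_conditions(with_adenine, without_adenine):
    """Summarize how binding regions are shared between two conditions."""
    with_set = set(with_adenine)
    without_set = set(without_adenine)
    union = with_set | without_set
    shared = with_set & without_set
    only_with = with_set - without_set
    total = len(union)
    return {
        "total": total,
        "shared": len(shared),
        "only_with_adenine": len(only_with),
        "shared_pct": 100.0 * len(shared) / total,
        "only_with_adenine_pct": 100.0 * len(only_with) / total,
    }
```

+ With 35 regions in the presence of adenine , 13 of which are also bound in its absence , the shared fraction is 13/35 and the adenine-only fraction is 22/35 . 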
+ Adenine can be converted to IMP through the intermediate formation of adenosine , inosine , and Hx catalyzed by purine nucleoside phosphorylase ( deoD ) and adenosine deaminase ( add ) ( 2 ) . 
+ Thus , this observation indicates that the addition of exogenous adenine increased the intracellular level of Hx , which led to the formation of the PurR -- Hx complex that functions in transcriptional regulation ( 4 ) . 
+ A total of 22 new PurR-binding regions , involved in various cellular processes , were identified in this study ( Table 1 ) . 
+ Prior to this study , 15 PurR-binding regions had been characterized by DNA-binding experiments in vitro and mutational analysis in vivo , 87 % ( 13 of 15 ) of which were identiﬁed in this study ( transcription units in bold characters in Table 1 ) . 
+ The exceptions were pyrC and glnB promoters , whose cellular functions are related to the pyrimidine biosynthesis and nitrogen metabolism , respectively . 
+ It is unclear why those PurR-binding regions were missed from our analysis . 
+ We next assessed the locations of the PurR-binding regions against the current annotated genome information ( 11 ) . 
+ The PurR-binding regions were observed only within intergenic ( i.e. , promoter and promoter-like ) regions . 
+ Therefore , there exists a strong preference for the PurR-binding target to be located within the noncoding intergenic regions , similar to that observed for Lrp-binding sites ( 32 ) . 
+ To identify common DNA sequence motifs of the PurR-binding regions , we used the MEME suite tool ( 12 ) . 
+ The sequences of PurR-binding regions were used to generate the position speciﬁc probability matrix and to rescan the entire genome with the FIMO program . 
+ We then analyzed only those sites which were located in the PurR-binding regions and fell below a stringent cut-off ( P-value < 0.0001 ) . 
+ This revealed a total of 28 conserved sequences spread across 35 binding regions ( Table 1 ) . 
+ The identified sequence motif ( ACGNAAACGTTTGCNT ) was consistent with the previously characterized 16-bp palindromic binding site of PurR ( Figure 1D ) ( 2 ) . 
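+ To illustrate what a scan for the reported consensus looks like , the sketch below matches the IUPAC string ACGNAAACGTTTGCNT on both strands with a regular expression . This is a deliberately simplified stand-in : FIMO scores a position specific probability matrix and reports P-values , which an exact consensus match does not reproduce . The helper names are hypothetical . 

```python
import re

# Only the IUPAC symbols that occur in the reported motif are handled here.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "N": "[ACGT]"}
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Reverse complement of a sequence made of A, C, G, T and N."""
    return seq.translate(COMPLEMENT)[::-1]


def motif_regex(motif):
    """Compile a lookahead regex so that overlapping matches are reported."""
    body = "".join(IUPAC[base] for base in motif.upper())
    return re.compile("(?=(" + body + "))")


def find_sites(sequence, motif="ACGNAAACGTTTGCNT"):
    """Return sorted (0-based start, strand, matched text) tuples."""
    sequence = sequence.upper()
    hits = []
    for strand, pattern in (("+", motif), ("-", reverse_complement(motif))):
        for match in motif_regex(pattern).finditer(sequence):
            hits.append((match.start(), strand, match.group(1)))
    return sorted(hits)
```

+ Because the two N positions break the exact symmetry of the consensus , both strands are scanned ; a fully palindromic site is reported once per strand . 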
+ Based on the fact that the increase in the intracellular adenine levels enhances PurR binding to its DNA targets and the coverage of the known binding regions in our data , we concluded that PurR-binding regions identiﬁed here are bona ﬁde binding sites . 
+ Genome-scale determination of causal relationship
+ Currently , a total of 22 genes have been characterized as members of PurR regulon to be directly repressed by PurR ( 33 ) . 
+ From our ChIP -- chip analyses , we significantly expanded the size of the PurR regulon to comprise 53 target genes ( Table 1 ) . 
+ To determine the causal relationships between the binding of PurR and the changes in RNA transcript levels of genes in the PurR regulon , we integrated the information on the binding regions of PurR with transcriptomic analysis . 
+ Among the 53 target genes in the PurR regulon determined by ChIP -- chip analyses , we found that 23 genes ( 43 % ) were differentially expressed in response to the purR deletion and the addition of exogenous adenine , with more than a 2-fold change and an FDR value < 0.05 ( P-value = 0.0056 ) from ANOVA analysis ( Figure 1E and Supplementary Table S4 ) . 
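+ The integration step can be sketched as a filter over the bound genes . The thresholds ( more than 2-fold , FDR < 0.05 ) come from the text ; everything else is an assumption made for illustration , in particular the input layout and the sign convention , where a negative log2 fold change ( wild type relative to the purR deletion ) is read as repression by PurR . 

```python
import math


def directly_regulated(bound_genes, expression, min_fold=2.0, max_fdr=0.05):
    """Label bound genes that also change expression.

    expression maps gene -> (log2 fold change, FDR). The fold change is
    assumed to be wild type relative to the deletion strain, so a negative
    value is interpreted as repression by the regulator.
    """
    threshold = math.log2(min_fold)
    calls = {}
    for gene in bound_genes:
        if gene not in expression:
            continue  # bound, but no expression measurement available
        log2_fc, fdr = expression[gene]
        if abs(log2_fc) > threshold and fdr < max_fdr:
            calls[gene] = "repressed" if log2_fc < 0 else "activated"
    return calls
```

+ Genes that are bound but fail either threshold are left out of the result ; these correspond to the targets with direct association and no significant change in transcript level . 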
+ The genes directly repressed by PurR in response to the exogenous adenine ( 23 genes ) include codB , codA , purT , xanP , yieG , pyrL , pyrB and pyrI encoding cytosine NCS1 transporter , cytosine deaminase , phosphoribosylglycinamide formyltransferase , xanthine NCS2 transporter , adenine transporter , PyrL leader peptide , aspartate carbamoyltransferase and aspartate carbamoyltransferase regulatory subunit , respectively , as newly found members . 
+ The remaining 57 % of the genes had a direct association with PurR , lacking significant changes in RNA transcript levels . 
+ The cellular functions of most of the remaining genes were not clustered in purine and pyrimidine metabolic pathways , indicating that the changes in their transcript levels require additional regulatory signals such as transcription factors . 
+ Surprisingly , none of the genes were directly activated by PurR . 
+ On the contrary , PurR completely represses target genes involved in the IMP biosynthetic pathway . 
+ This suggests that most of the repression is a direct interaction , but the transcriptional activation is indirect ( Supplementary Tables S1 , S3 and S4 ) . 
+ In general , the PurR-binding sites are located in the promoter region between the -35 and -10 promoter elements , indicating that the binding of PurR regulates transcription initiation ( 2 ) . 
+ In the case of purB and purR , PurR binds to the open reading frame so that it blocks transcription elongation ( 34 ) . 
+ Therefore , the binding position of PurR is of great interest in order to understand its regulatory mechanism . 
+ We calculated the distance between PurR-binding motifs and the transcription start sites ( TSSs ) based upon the TSSs recently published ( 11 ) . 
+ Of 32 promoter regions directly regulated by PurR , 14 regions ( 44 % ) include the PurR-binding motif between the -10 and -35 promoter elements . 
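+ The distance calculation can be sketched as a strand-aware coordinate shift followed by an interval test . The sketch assumes 1-based genomic coordinates , treats the TSS as position 0 for simplicity , and counts a motif as lying in the core promoter when it overlaps the window from -35 to -10 ; the authors ' exact criterion is not stated , and the function names are hypothetical . 

```python
def relative_interval(motif_start, motif_end, tss, strand):
    """Motif coordinates relative to the TSS; negative means upstream."""
    if strand == "+":
        return motif_start - tss, motif_end - tss
    if strand == "-":
        # On the reverse strand, larger genomic coordinates lie upstream.
        return tss - motif_end, tss - motif_start
    raise ValueError("strand must be '+' or '-'")


def overlaps_core_promoter(rel_start, rel_end, upstream=-35, downstream=-10):
    """True if the motif overlaps the window spanning the -35 and -10 elements."""
    return rel_start <= downstream and rel_end >= upstream
```

+ A motif for which the test is false , for example one located inside the open reading frame , would point to a mechanism other than interference with transcription initiation . 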
+ Metabolic pathways directly regulated by PurR–Hx complex
+ Purine nucleotide metabolism plays a critical role in various cellular activities . 
+ To identify the metabolic pathways regulated by PurR -- Hx complex , the members of PurR regulon were functionally classiﬁed and further mapped to the E. coli metabolic pathways ( Figure 2 ) . 
+ The genes with direct PurR association lacking changes in transcript levels were classiﬁed into other cellular functions . 
+ However , the genes directly repressed by PurR in response to the exogenous adenine clustered mainly into purine and pyrimidine metabolic pathways . 
+ First , PurR directly autoregulates itself and generates PurR -- Hx complex in the presence of Hx or adenine ( Figure 2A ) . 
+ Second , PurR completely regulates purine transport , biosynthesis , salvage and interconversion pathways ( Figure 2B ) . 
+ Interestingly , PurR directly regulates serine transport and metabolic pathways to produce N10-formyltetrahydrofolate ( N10-FTHF ) , which is an intermediate for the IMP biosynthetic pathway . 
+ In addition , we found that PurR directly represses xanthine ( xanP ) , purine nucleoside ( tsx ) and adenine ( yieG ) transporters . 
+ Although the transporter for Hx is currently unknown , xanthine transporter is unable to transport Hx ( 35 ) . 
+ Lastly , PurR downregulates the genes in pyrimidine biosynthetic and transport pathways ( Figure 2C ) . 
+ Most of the genes having direct PurR association with differential gene expression were enriched into purine and pyrimidine metabolic pathways . 
+ Interestingly , none of the genes involved with purine utilization , such as apt , deoD and add , are directly regulated by PurR . 
+ DISCUSSION
+ We determined the PurR regulon in E. coli in response to the exogenous adenine stimuli by integrating genome-scale location analysis and gene expression proﬁles . 
+ The genome-wide map of PurR-binding sites presented here not only conﬁrms previously characterized binding sites ( 15 regions ) but also expands the number of known binding sites ( 35 regions ) to a genome-wide assessment ; similar to what we previously reported for Lrp - and Fis-binding sites ( 10,32 ) . 
+ From the genome-wide mapping results , we were also able to show that : ( i ) a total of 35 PurR-binding regions were identiﬁed , all of which were located within noncoding regions , showing the strong binding preference of PurR-binding to the promoter and promoter-like regions ; ( ii ) only 37 % of binding sites ( 13 of 35 ) overlapped under the conditions in the absence and presence of exogenous adenine , indicating that PurR bindings to the E. coli genome are dramatically sensitive to the addition of exogenous adenine ( or hypoxanthine ) ; ( iii ) the integration of these results with mRNA transcript level information indicates that the functional assignment of the regulated genes is strongly enriched in the purine and pyrimidine metabolism-related functions . 
+ In addition , most of the genes were thoroughly repressed by PurR . 
+ Interestingly , the other genes directly bound by PurR lacking differential expression in response to the purR deletion or the exogenous adenine were functionally diverse ; and ( iv ) the PurR-binding motifs were observed at the regions of the -10 and -35 promoter elements , indicating PurR regulates transcription initiation . 
+ We discovered PurR-binding regions from the promoter regions of codBA , purT , xanP , yieG and pyrLBI with the differential gene expression . 
+ First , PurR directly regulates the de novo biosynthesis of pyrimidine . 
+ Among the genes in the biosynthetic pathway , codB and codA encode a cytosine transporter belonging to the NCS1 family of purine and pyrimidine transporters and a cytosine deaminase metabolizing cytosine to uracil and ammonia , respectively . 
+ In addition , PurR directly regulates pyrB and pyrI , encoding catalytic and regulatory subunits of aspartate transcarbamylase ( ATCase ) , respectively , catalyzing the ﬁrst reaction of the de novo biosynthesis of pyrimidine nucleotides . 
+ Considering that PurR represses carA , carB , pyrC and pyrD , the very early steps of de novo pyrimidine biosynthesis and transport are tightly regulated by PurR in response to exogenous adenine . 
+ Interestingly , RutR , the uracil responsive transcription factor , binds to the promoter region of carAB ( 36 ) . 
+ Although Shimada and co-workers demonstrated that the RutR binding site plays little or no role in the regulation of transcription , it may have an additional regulatory role along with other proteins . 
+ In the carAB promoter , at least ﬁve regulatory proteins ( IHF , PepA , PurR , RutR and ArgR ) are involved in the purine , pyrimidine and arginine-speciﬁc control of the promoter activity ( 28 ) . 
+ The complexity of the multicomponent regulatory mechanisms modulating carAB transcription shows the need for a cellular balance between the synthesis of pyrimidine and purine residues . 
+ Second , PurR directly regulates the transport of purine nucleotides . 
+ We observed PurR-binding peaks at the upstream regions of xanP , mdtL and yieG with differential gene expression in response to the exogenous adenine . 
+ Interestingly , PurR-binding peaks were observed for xanP and mdtL in the absence of exogenous adenine but not for yieG , suggesting that the yieG encodes a high-affinity transport system for adenine , which is dispensable in the presence of excess substrate . 
+ It has been suggested that another adenine transport system close to the genomic position of yieG operates at low affinity and is not energy dependent ( 30 ) . 
+ However , it has not been discovered which gene has the adenine transport function with low affinity . 
+ Here , we found that mdtL encoding drug MFS transporter is directly repressed by PurR and located close to yieG ( 4.5 kb ) , indicating that mdtL might be responsible for the low-affinity adenine transport . 
+ Thus , the purine transport system of the PurR regulon can be composed of two high-affinity transporters ( xanP and yieG ) and one low-affinity transporter ( mdtL ) . 
+ Transcriptional regulatory systems often regulate the formation rates and the concentration of small molecules by feedback loops that regulate the transport , biosynthesis and metabolic enzymes ( 11,37 ) . 
+ Since adenine can be utilized by apt , encoding adenine phosphoribosyltransferase , we were able to connect transport , biosynthesis and metabolic feedback loop pairs ( Figure 2D ) . 
+ In the left loop , PurR -- Hx complex represses the transcription of the transport proteins ( T ) for purine ( xanP and yieG ) and pyrimidine ( codB ) , and biosynthetic proteins ( B ) for IMP ( purEK , purB , purT , purF , purC , purMN , purL and purHD ) and UMP ( carAB , codA , pyrBI and pyrCD ) , reducing the influx of the purine or pyrimidine molecules ( Pin ) from the media ( Pout ) and precursors ( Ppre ) . 
+ In the right loop , the metabolic enzyme ( U ) responsible for converting Pin into metabolites ( M ) is not directly regulated by PurR -- Hx complex ; however , its transcript level is reduced by the exogenous adenine . 
+ Thus , the logical structure of the connected feedback loop ( CFL ) motif described by a notation that uses three signs indicating repression ( R ) or activation ( A ) for each of T , B , and U can be R-R-R . 
+ In the previous studies ( 32,37 ) , the B component ( i.e. biosynthesis ) was not included in the logical structures . 
+ The R-R-R motif demonstrates that the influx and efflux are repressed for flow homeostasis ( 37 ) , which means that the exogenous adenine cannot be utilized as a nutrient molecule . 
+ In the case of nutrient molecules and homeostasis , the logical structures of CFL would have been A-A/R-A and R-A / R-A , respectively ( 37 ) . 
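+ The three-sign notation can be made concrete with a small lookup . The sketch below only encodes the patterns named in the text ( A-A/R-A for nutrient molecules , R-A/R-A for homeostasis , and the R-R-R case observed for PurR ) ; the function names and the returned labels are hypothetical , and any other combination is reported as unclassified . 

```python
VALID_SIGNS = {"A", "R"}  # A = activation, R = repression


def cfl_notation(transport, biosynthesis, utilization):
    """Join the signs for T, B and U into the T-B-U notation."""
    signs = (transport, biosynthesis, utilization)
    if any(sign not in VALID_SIGNS for sign in signs):
        raise ValueError("each sign must be 'A' or 'R'")
    return "-".join(signs)


def classify_cfl(transport, biosynthesis, utilization):
    """Map a T-B-U sign triple to the roles named in the text."""
    notation = cfl_notation(transport, biosynthesis, utilization)
    if utilization == "A":
        # B may be A or R in both named patterns, so only T decides here.
        return "nutrient" if transport == "A" else "homeostasis"
    if notation == "R-R-R":
        return "influx and efflux repressed"
    return "unclassified"
```

+ For the PurR -- Hx loop described above , classify_cfl ( " R " , " R " , " R " ) returns the third label , which is why adenine is not read as a nutrient under this scheme . 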
+ Since the R-R-R motif is uncommon for the regulation of small molecules in living cells , transcriptional regulatory networks for maintaining the levels of the purine and pyrimidine molecules may be more complex than previously thought . 
+ Previously , the PurR was classiﬁed into the group of general TFs based on the functional diversity of the genes in its regulon ( 6 ) . 
+ Compared to the other general TFs in E. coli such as Fnr ( 38 ) , Crp ( 39 ) and Lrp ( 32 ) , the functions and size of the PurR regulon are limited . 
+ However , cellular functions of the genes are highly enriched into the purine and pyrimidine transport and biosynthesis , indicating that the direct effect of PurR on the E. coli metabolism is local , but via the balance of cellular purine content , it plays a critical role in metabolism . 
+ Now we may need to select a new list of global transcription factors in E. coli . 
+ ACCESSION NUMBER
+ All raw data files have been deposited to Gene Expression Omnibus through accession numbers GSE26588 and GSE26589 . 
+ ACKNOWLEDGEMENTS
+ The authors thank Marc Abrams for manuscript.
+ FUNDING
+ National Institutes of Health Grant GM062791 ; The Office of Science-Biological and Environmental Research , U.S. Department of Energy , DE-FOA-0000143 . 
+ Funding for open access charge : National Institutes of Health Grant GM062791 . 
+ Conflict of interest statement. None declared.
\ No newline at end of file
--- a/data/TEXT_FILES/useful_txt/22082910.txt 0 → 100644
View file @27818a9
+++ b/data/TEXT_FILES/useful_txt/22082910.txt 0 → 100644
View file @27818a9
+ NIH Public Access Author Manuscript
+ Abstract 
+ Although metabolic networks have been reconstructed on a genome-scale , the corresponding reconstruction and integration of governing transcriptional regulatory networks has not been fully achieved . 
+ Here we reconstruct such an integrated network for amino acid metabolism in Escherichia coli . 
+ Analysis of ChIP-chip and gene expression data for the transcription factors ArgR , Lrp , and TrpR showed that ～ 82 % of the genes they regulate are directly involved in amino acid metabolism . 
+ Further analysis shows that 19/20 amino acid biosynthetic pathways are either directly or indirectly controlled by these three transcription factors . 
+ Classifying the regulated genes into three functional categories of transport , biosynthesis , and metabolism leads to elucidation of regulatory motifs constituting the integrated network 's basic building blocks . 
+ The regulatory logic of these motifs was determined based on the relationships between transcription factor binding and changes in transcript levels in response to exogenous amino acids . 
+ Remarkably , the resulting logic shows how amino acids are differentiated as signaling and nutrient molecules , and thus revealing the overarching regulatory principles of this stimulon . 
+ Transcriptional regulatory networks ( TRN ) in bacteria govern metabolic flexibility and robustness in response to environmental signals1 . 
+ Thus , causal relationships between transcript levels for metabolic genes and the direct association of transcription factors ( TFs ) at the genome-scale is fundamental to fully understand bacterial responses to their environment2 ,3 . 
+ In particular , the molecular interaction between small molecules ranging from nutrients to trace elements and TFs governs the TRN and ultimately regulates the related metabolic pathways . 
+ From the causal relationships , a small set of recurring regulation patterns , or network motifs3 ,4 were identified and reconstructed to describe the design principles of complex biological systems . 
+ One primary discovery from this effort was the connected feedback circuit which coordinates influx ( biosynthesis and transport ) and efflux ( metabolism ) pathways that are jointly regulated by a TF sensing the relevant small molecule3 . 
+ For example , a part of the global TRN is comprised of certain TFs ( ArgR , Lrp , and TrpR ) that sense the presence of exogenous amino acids ( arginine , leucine , and tryptophan , respectively ) and , in response , regulate the expression of a number of target genes5 . 
+ Upon addition of these amino acids to the environment , the TFs exhibit enhanced , reversed , or unaffected regulatory modes3,6-8 . 
+ These TF responses make these amino acids not just nutrients but also signaling molecules9 . 
+ Results
+ Previously discovered network motifs3 ,4 represent a significant step forward in our understanding of complex biological behavior . 
+ However , they fail to appropriately elucidate the system wide response since they were either based upon incomplete information4 , or were only specific to a single transcription factor and regulon3 . 
+ This has resulted in an inability to appropriately understand complex regulatory phenomena existing across multiple transcription factors and regulatory signals . 
+ Hence , it is necessary to achieve a full elucidation of these interactions with systematic and integrated experimental analysis . 
+ Comprehensive elucidation of the causal relationships is achievable by integrated analysis of expression data obtained from microarray or sequencing ( e.g. , RNA-seq ) 10 with direct TF-binding information from chromatin immunoprecipitation coupled with microarrays or sequencing ( ChIP-chip or ChIP-seq ) 3,11 under appropriate environmental conditions . 
+ Thus , we obtain and integrate genome-scale data from ChIP-chip for each TF and gene expression profiling to reconstruct regulons involved in amino acid metabolism at the genome-scale . 
+ The elucidated regulatory logic falls into two categories that differentiate the role of amino acids as signaling and as nutrient molecules . 
+ Therefore , the reconstruction of the regulatory logic of the network motif allows us to establish the physiological role of each TF regulon and to determine how they govern the amino acid regulation in E. coli . 
+ Then , the integration of these multiple regulons into a unified network led to the first full bottom-up genome-scale reconstruction of a stimulon . 
+ ArgR , Lrp , and TrpR are TFs involved in amino acid metabolism in E. coli6 ,7,12 , responding to arginine , leucine , and tryptophan , respectively . 
+ The binding of the small effector molecule ( here being the amino acids ) to these TFs carries out the genome 's regulatory code by enhancing or decreasing the TFs affinity for a specific genomic region and concurrently modulating the transcription of downstream genes . 
+ In the case of Lrp , the direct analysis of in vivo binding was fully described3 using chromatin immunoprecipitation coupled with microarrays ( ChIP-chip ) experiments . 
+ A total of 141 binding regions were analyzed , representing coverage of 74 % of the previously identified regions3 . 
+ However , similar genome-scale data for the other two major TFs in amino acid metabolism , ArgR and TrpR were unavailable . 
+ To determine their binding regions on a genome-wide level in an unbiased manner , we employed the ChIP-chip approach to E. coli cells harboring 8 × myc-tagged ArgR or TrpR protein13 . 
+ The resulting log2 ratios obtained from the ChIP-chip experiments identify the genomic regions enriched in the IP-DNA sample compared with the mock IP-DNA sample and thereby represent a genome-wide map of in vivo ArgR - and TrpR-binding regions ( Fig. 1a ) . 
+ Using a previously described binding region detection algorithm14 , 61 and 8 unique and reproducible ArgR - and TrpR-binding regions were identified , respectively ( Supplementary Table 1 and Supplementary Table 2 ) . 
+ The 61 ArgR-binding sites detected included 13 sites previously characterized by DNA-binding experiments in vitro and mutational analyses in vivo15 ,16 . 
+ For example , the ArgR-arginine complex transcriptionally represses gltBD , artPIQM operon , and artJ gene encoding arginine transport systems17 ,18 . 
+ Our results confirmed that the ArgR-arginine complex binds to each of these promoter regions ( Fig. 1b ) . 
+ In addition , the ArgR occupancy level at the promoter of the artJ gene is greater than that of artPIQM operon in the presence and absence of exogenous arginine ( Supplementary Table 1 ) . 
+ This result is in good agreement with the de-repression/repression ratio of 28 for PartJ and 3.2 for PartP previously reported for repressibility of the artJ and artP promoters18 . 
+ Also , this result is consistent with recent microarray and qPCR experiments showing a significant arginine and ArgR-dependent down-regulation of both the artJ ( about 50-fold ) and artPIQM mRNA levels ( about three to six-fold ) 17 . 
+ In the case of TrpR , a total of five associations have been determined by DNA-binding experiments in vitro and mutational analyses in vivo7 ,19 , all of which were also identified in our study ( Fig. 1a and Supplementary Table 2 ) . 
+ For instance , TrpR directly binds to the promoter regions of aroH and mtr involved in biosynthesis and transport of aromatic amino acids ( Fig. 1b ) . 
+ Against the current genome annotation14 , all of the ArgR - and TrpR-binding regions were observed within intergenic regions , i.e. , promoter and promoter-like regions . 
+ The same preference was observed for Lrp-binding sites ( Supplementary Table 1 and 2 ) 3 . 
+ DNA sequence motifs for each of the transcription factors were also re-derived based solely upon the ChIP binding regions and were in full agreement with previously described motifs ( Supplementary Fig 2 ) . 
+ Based on the fact that the increase in the intracellular arginine and tryptophan levels enhances ArgR and TrpR binding to its DNA targets20 ,21 , the confirmation of previously discovered sequence motifs , and the full coverage of the known binding regions in our data we concluded that ArgR - and TrpR-binding regions identified here are bona fide binding sites . 
+ Interestingly , as with gltBD , artPIQM , potFGHI , and mtr ( Fig. 1b ) , we observed that Lrp directly binds to nine ArgR - and one TrpR-binding regions ( Fig. 1c and Supplementary Fig. 1 ) . 
+ For example , the direct binding of Lrp to the promoter region of the gltBD operon encoding glutamate synthase resulted in the activation of its transcription . 
+ In contrast , the role of ArgR-binding represents the negative regulation of the operon . 
+ Integrating binding regions and changes in transcript levels , the reciprocal mode3 in the transcriptional regulation of ArgR and Lrp was observed for cellular functions including putrescine transport ( potFGHI ) , arginine transport ( artPIQM ) , leucine response protein ( lrp ) , arginine biosynthesis and utilization ( argA and astCADBE ) , the formation of nucleoid ( stpA ) , as well as glutamate biosynthesis and transport ( gltBD and gltP ) . 
+ While Lrp activates the tryptophan transport ( mtr ) , TrpR represses its transcription . 
+ In addition to confirming previously identified ArgR - and TrpR-binding regions , we found 47 and 3 novel ArgR - and TrpR-binding regions , which include the promoter region of potFGHI , encoding putrescine ABC transporter ( Fig. 1b ) . 
+ A regulon is defined as a group of genes whose transcription is controlled by a transcriptional regulator . 
+ The arginine regulon describing the genetic and regulatory organization of the genes involved in arginine biosynthesis in E. coli was used as an example in proposing the definition of the regulon in 196417,22 . 
+ However , the definition of a regulon does not specify whether each regulatory interaction is direct or indirect . 
+ So far , a total of 37 , 56 , and 10 genes have been characterized as members of regulons directly regulated by ArgR , Lrp , and TrpR , respectively15 ,16 . Based upon regulatory codes described above , we significantly extended the size of these regulons and obtained 140 , 283 , and 15 target genes for each regulon . 
+ Since ArgR directly controls the transcription of lrp , the regulon size of each transcription factor can be described as ArgR ( 423 ) > Lrp ( 283 ) > TrpR ( 15 ) . 
+ These regulons represent a hierarchical structure that can be used to identify the indirect effect of the TFs . 
+ For example , thrLABC operon involved in the threonine biosynthesis is directly activated by Lrp , either in the absence or presence of exogenous leucine . 
+ We observed that ArgR indirectly represses this operon in response to exogenous arginine ; i.e. , transcriptional repression without the direct binding of ArgR . 
+ It is therefore possible to partially elucidate the indirect regulation by ArgR based on the hierarchical regulatory network . 
+ ArgR represses Lrp leading to the indirect repression of the thrLABC operon . 
+ As shown in this example , integrated analysis of ChIP-chip and expression profiles allows us to fully understand the hierarchical TRN including the indirect regulatory effects . 
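+ The hierarchical reasoning above can be sketched with signed edges . The two edges and the direct regulon sizes are taken from the text ; the function names are hypothetical , and the size calculation assumes an acyclic hierarchy and simply adds downstream regulons without correcting for genes shared between them . 

```python
# +1 = activation, -1 = repression, as stated in the text.
EDGES = {
    ("ArgR", "Lrp"): -1,      # ArgR represses lrp
    ("Lrp", "thrLABC"): +1,   # Lrp activates the thrLABC operon
}

DIRECT_SIZES = {"ArgR": 140, "Lrp": 283, "TrpR": 15}
CONTROLS = {"ArgR": ["Lrp"]}  # TF -> TFs whose transcription it controls


def indirect_effects(edges, regulator, target):
    """Net sign of each two-step path regulator -> intermediate -> target."""
    effects = {}
    for (source, intermediate), first_sign in edges.items():
        if source != regulator:
            continue
        second_sign = edges.get((intermediate, target))
        if second_sign is not None:
            effects[intermediate] = first_sign * second_sign
    return effects


def hierarchical_size(direct_sizes, controls, tf):
    """Direct regulon size plus the regulons of every TF controlled by tf."""
    downstream = controls.get(tf, ())
    return direct_sizes[tf] + sum(
        hierarchical_size(direct_sizes, controls, d) for d in downstream
    )
```

+ Repression followed by activation gives a net negative sign , which matches the indirect repression of thrLABC by ArgR , and 140 + 283 reproduces the ArgR ( 423 ) figure quoted above . 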
+ Next , we classified the 438 target genes based on their functional annotation and found that most of these functions ( ～ 82 % ) were assigned to amino acid metabolism and transport , as well as carbohydrate , nucleotide , and energy metabolism ( Fig. 2 ) . 
+ We are then able to show ( Fig 3 ) that 19/20 amino acid biosynthetic pathways are directly or indirectly controlled by these three TF 's . 
+ To do this we first mapped the directly regulated genes to known amino acid biosynthetic pathways and transport systems to determine their direct metabolic roles ( Fig. 3a , b ) . 
+ ArgR directly regulates the transcription of all genes involved in the biosynthesis of arginine and histidine . 
+ It also regulates gltBD , aroB , aroK , and dapE involved in glutamate , aromatic amino acids , and lysine biosynthesis , respectively . 
+ The genes encoding the enzymes for the biosynthesis of branched chain amino acids are comprehensively regulated by Lrp , which also controls the transcription of gltBD and gdhA encoding glutamate synthase and glutamate dehydrogenase ( glutamate biosynthesis ) , serC and serB encoding phosphoserine transaminase and phosphatase ( serine biosynthesis ) , thrABC operon for aspartate kinase , homoserine kinase , and threonine synthase ( threonine biosynthesis ) , argA for N-acetylglutamate synthase ( arginine biosynthesis ) , and aroA for 3-phosphoshikimate-1-carboxyvinyltransferase ( the chorismate formation for aromatic amino acid biosynthesis ) . 
+ TrpR regulates the transcription of genes involved in tryptophan biosynthetic pathway ( trpLEDCBA operon ) , as well as aroH and aroL . 
+ In addition , it has been determined that TyrR directly regulates several genes in the aromatic amino acid biosynthesis ( aroF , aroG , aroK , aroA , tyrA , and tyrB ) in response to exogenous tyrosine15 ,16 . 
+ Taken together , these four TFs control the biosynthesis of 12 amino acids . 
+ Furthermore , the biosynthesis of proline , glutamine , glycine , cysteine , and methionine is through branched biosynthetic pathways of glutamate , serine and aspartate ( Fig. 3a ) . 
+ The remaining three amino acids ( i.e. , alanine , aspartate , and asparagine ) are synthesized from glutamate as an amino donor ( green dots in Fig. 3a ) . 
+ Therefore , biosynthetic pathways for all amino acids are directly or indirectly controlled by these four TFs . 
+ Next , we classified the amino acids into ten groups based on the substrate specificity of each transport system , which are A ( tyrosine , phenylalanine , tryptophan ) , B ( arginine , histidine , lysine ) , C ( glutamate , aspartate ) , D ( leucine , isoleucine , valine ) , E ( alanine , serine , glycine , threonine ) , F ( proline ) , G ( methionine ) , H ( cysteine ) , I ( asparagine ) , and J ( glutamine ) ( Fig. 3b ) . 
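+ The ten transport groups listed above map naturally onto a lookup table . The sketch below transcribes the grouping from the text ; the variable and function names are hypothetical . 

```python
TRANSPORT_GROUPS = {
    "A": ["tyrosine", "phenylalanine", "tryptophan"],
    "B": ["arginine", "histidine", "lysine"],
    "C": ["glutamate", "aspartate"],
    "D": ["leucine", "isoleucine", "valine"],
    "E": ["alanine", "serine", "glycine", "threonine"],
    "F": ["proline"],
    "G": ["methionine"],
    "H": ["cysteine"],
    "I": ["asparagine"],
    "J": ["glutamine"],
}

# Reverse index: amino acid -> group letter.
GROUP_OF = {
    amino_acid: group
    for group, members in TRANSPORT_GROUPS.items()
    for amino_acid in members
}


def same_transport_group(first, second):
    """True if two amino acids share a transport specificity group."""
    return GROUP_OF[first.lower()] == GROUP_OF[second.lower()]
```

+ The reverse index covers all 20 amino acids exactly once , so a lookup such as GROUP_OF [ " leucine " ] returns a single group letter . 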
+ As expected , the amino acids in the same group have a similar chemical structure , e.g. aromatic amino acids and branched chain amino acids in group A and group D , respectively . 
+ Transport systems for groups G-J are highly specific and were therefore classified into individual groups . 
+ In general , genes for amino acid biosynthesis are repressed by each corresponding TF , whereas catabolic operons such as astCADBE , tdh-kbl , and gcvTHP are induced in response to the exogenous amino acids12 ,23 . 
+ To determine the causal relationships between binding of a TF and the changes in RNA transcript levels of genes in the regulons , we integrated the binding regions of ArgR , TrpR , Lrp , and TyrR with the publicly available transcriptomic data ( Fig. 4 ) 3,17 . 
+ We then determined activation or repression based upon the regulatory modes described previously3 . 
+ Among genes in the ArgR regulon , about 18 % of the genes were directly activated in response to the exogenous arginine , which include aroP and gltP genes encoding aromatic amino acids and glutamate/aspartate transporters . 
+ On the other hand , ArgR represses about 70 % of its regulon members , including potFGHI , artJ , artPIQM , and hisJQMP encoding putrescine , arginine , lysine , ornithine , and histidine ABC transporters ( Fig. 4 ) . 
+ ArgR represses genes involved in the arginine and glutamate biosynthesis pathways , and unexpectedly , it directly down-regulates genes involved in histidine , aromatic amino acids , and lysine biosynthesis pathways . 
+ In case of amino acid utilization , ArgR induces astCADBE and puuEB operons encoding the metabolic pathways for arginine and putrescine , respectively . 
+ The remaining 12 % of its regulon members had a direct association with ArgR without differential gene expression . 
+ Most of the remaining genes are currently annotated as genes of unknown function ( Supplementary Table 1 ) . 
+ Gene expression profiles validated that Lrp directly regulates 283 genes . 
+ Of the Lrp-regulated genes , 45 % were repressed and 55 % were activated in response to the addition of exogenous leucine3 . 
+ As expected , Lrp controls the transport , biosynthetic and utilization pathways more globally than other transcription factors do . 
+ Lrp represses the transport systems for branched chain amino acids ( brnQ , livKHMGF , and livJ ) , dipeptides ( dppABCDF ) , and lipoproteins ( lolCDE ) but it activates a whole set of other transporters . 
+ Transporters that are activated by Lrp are aromatic amino acids ( tyrP and mtr ) , arginine ( artMQIP ) , glutamate ( gltP ) , alanine , serine , glycine and threonine ( cycA , tdcC , sdaC , and sstT ) , proline ( proY ) , putrescine ( potFGHI ) , dipeptide ( dtpB ) , and oligopeptides ( oppABCDF ) ( Fig. 4 ) . 
+ In terms of amino acid biosynthetic pathways , Lrp represses all genes but the thrLABC operon for threonine biosynthesis . 
+ For amino acid utilization , Lrp activates all pathways for aromatic amino acids , arginine , aspartate , branched chain amino acids , alanine , glycine , serine , threonine , methionine , and putrescine . 
+ In case of the TrpR regulon , a total of 15 genes are directly regulated , of which 13 genes are repressed ( Supplementary Table 2 ) 16,24 . 
+ TrpR also represses mtr encoding the tryptophan transporter as well as aroH , aroL , and trpABCDE involved in the tryptophan biosynthesis pathway . 
+ While TyrR activates the transport systems for aromatic amino acids ( aroP , tyrP , and mtr ) , it represses the tyrosine biosynthetic pathway comprising aroG , aroL , aroF , tyrA , and tyrB ( Fig. 4 ) . 
+ Based on the integrated analysis of TF-binding locations and gene expression profiles , we were able to connect transport , biosynthesis , and utilization of amino acids , and generate the connected bidirectional circuits ( Fig. 5a ) . 
+ In the left feed-back circuit , TF-amino acid ( TF-AA ) complexes regulate the transcription of the transporters ( T ) and biosynthesis pathways ( B ) , facilitating the influx of the amino acid molecules ( AAin ) from amino acids in the media ( AAout ) and precursors ( AApre ) . 
+ In the right feed-forward circuit , TF-AA complexes control transcription of utilization genes ( U ) responsible for converting AAin into metabolites ( M ) . 
+ Thus , the logical structures of the connected bidirectional circuit motifs can be described by a notation that uses three signs indicating repression ( R ) or activation ( A ) for each of T , B , and U ( Fig. 5b ) . 
+ For example , the A-R-A circuit motif indicates that the transcription of transport , biosynthesis , and metabolic genes are activated , repressed , and activated , respectively , whereas the R-R-A circuit motif demonstrates that the transcription of both transport and biosynthesis are repressed and the metabolic genes are activated . 
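+ The three-sign notation can be made concrete with a small parser . The following Python sketch is an illustration under stated assumptions : the class CircuitMotif and the functions parse_motif and describe are hypothetical names , and the only rule encoded is the one given above ( one sign , A or R , for each of T , B , and U , in that order ) .

```python
from typing import NamedTuple

# Sign letters used in the text: A = activation, R = repression.
SIGNS = {"A": "activated", "R": "repressed"}


class CircuitMotif(NamedTuple):
    transport: str      # sign for T
    biosynthesis: str   # sign for B
    utilization: str    # sign for U


def parse_motif(label: str) -> CircuitMotif:
    """Parse a label such as 'A-R-A' into its T, B, and U signs."""
    parts = label.strip().upper().split("-")
    if len(parts) != 3 or any(p not in SIGNS for p in parts):
        raise ValueError(f"expected a label such as 'A-R-A', got {label!r}")
    return CircuitMotif(*parts)


def describe(label: str) -> str:
    """Spell out a motif label in words."""
    m = parse_motif(label)
    return (f"transport {SIGNS[m.transport]}, "
            f"biosynthesis {SIGNS[m.biosynthesis]}, "
            f"utilization {SIGNS[m.utilization]}")
```

+ For instance , describe ( "R-R-A" ) spells out that transport and biosynthesis are repressed while utilization is activated .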
+ The possible logical structures of the connected circuit motifs can be characterized depending on how the TF-AA complex activates or represses both influx ( T and B ) and efflux ( U ) in response to the exogenous amino acids . 
+ Based on the connected circuit motifs , we analyzed the logical structures of the transcription of transport , biosynthesis , and metabolic genes in response to exogenous arginine and leucine ( Fig. 5b ) . 
+ Surprisingly , there are only three influx-efflux combinations found between amino acid groups and TFs ( Fig. 5c ) . 
+ For example , the connected circuit motif controlled by ArgR-arginine complex shows the R-R-A logical structure for group B amino acids ( lysine , histidine , and arginine ) , whereas the logical structure of the motif is switched to A-R-R for glutamate and aspartate and A-R-A for other amino acids . 
+ On the other hand , the connected motif controlled by Lrp-leucine complex indicates the R-R-A logical structure for group D ( valine , leucine , and isoleucine ) and is again switched to A-R-R for glutamate and aspartate and A-R-A for other amino acids . 
+ For glutamate our primary observation was that the utilization was repressed given its role as a substrate for nine biosynthetic pathways ( Fig. 3,4 ) . 
+ However , we acknowledge that the regulation is highly complex and not universally repressed . 
+ This logically follows from the critical and centralized role it plays throughout the metabolome25 . 
+ Overall , we conclude that for two global transcription factors ( ArgR and Lrp ) in amino acid regulation , the connected circuit motif has an R-R-A logical structure for signaling molecules ( i.e. , arginine for ArgR and leucine for Lrp ) and the A-R-A and A-R-R logical structures for other amino acids ( Fig. 5c ) . 
+ Discussion
+ We reconstructed the regulons of ArgR , Lrp , and TrpR in E. coli individually and then integrated them to form the first genome-scale reconstruction of a stimulon . 
+ First , we set out to comprehensively establish the TF-binding regions on the E. coli genome experimentally and furthermore to elucidate any DNA sequence motif ( s ) correlated with the TF regulatory action . 
+ Second , we significantly extended the size of each regulon and obtained 140 , 283 , and 15 target genes for the ArgR , Lrp , and TrpR regulons , respectively . 
+ Third , using changes in transcript levels on a genome-scale , we identified the regulatory modes for individual genes governed by each TF in response to exogenous arginine , leucine , and tryptophan . 
+ The integrated analyses indicate that the functional assignment of the regulated genes is strongly enriched in the amino acid metabolism-related functions . 
+ As suggested previously , many of these genes are likely to be involved in the `` feast or famine '' adaptation for survival in nutrient-rich or depleted environments3 ,9 . 
+ Fourth , we assigned the regulated target genes to three functional categories ; transport , biosynthesis , and metabolism of amino acids . 
+ The classification allowed us to identify the connected circuit motif as a basic building block of the integrated network . 
+ Finally , we determined the regulatory logic of the connected circuit motif based on the causal relationships between the association of TFs and changes in transcript levels . 
+ These fall into two categories and thus allow for the differentiation between amino acids as signaling and nutrient molecules . 
+ In general , transport systems along with biosynthetic and metabolic pathways convert external resources to basic building blocks to sustain life . 
+ The coordinated regulation of this primary process underlies expression of optimized metabolic states under different external conditions . 
+ Thus , we examined the logical structures of the metabolite-regulation connected circuit in response to the changes in the external amino acid availability in the reconstructed stimulon . 
+ We uncovered three unique logical structures that govern the amino acid biosynthesis and metabolism . 
+ The R-R-A logical structure was observed for signaling molecules whereas the A-R-A and A-R-R logical structures were determined for other amino acids serving as nutrient sources ( Fig. 5a , b ) . 
+ In principle , every metabolic pathway that includes transport , biosynthesis , and utilization functions could follow these logical structures . 
+ For example , the purine metabolism in E. coli contains a wide range of genes whose functions are transport ( yieG ) , biosynthesis ( cvpA-purF-ubiX , purHD , purMN , purT , purL , purEK , purC , hflD-purB , purA , and guaAB ) , utilization ( apt ) , and a transcriptional regulator ( purR ) . 
+ The metabolic functions of the PurR regulon members were enriched in purine metabolism , and the connected circuit motif indicated the logical structure for a signaling molecule in response to exogenous purine26 . 
+ It can be therefore envisioned that other potential metabolic pathways follow similar logical structures as determined for the amino acid metabolism in bacteria . 
+ Bacterial cells import essential nutrients and inorganic ions such as galactose and iron because they lack the corresponding biosynthesis pathways . 
+ It is therefore of interest that the simple feedback circuit ( SFL ) motif , a connected circuit motif of transporter and utilization pathway by TF , is often observed in the regulatory circuits for these molecules27 . 
+ If we assume the feedback circuit is composed of an influx and efflux combination , the logical structures of R-R-A , A-R-A , and A-R-R in the connected circuit ( CFL ) motif can be reduced to R-A , A-A , and A-R , respectively . 
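+ The reduction described above amounts to dropping the biosynthesis sign from the three-sign label . A minimal Python sketch follows ; the function names are hypothetical , and the role table only restates the interpretation given in the text ( R-R-A for signaling molecules , A-R-A and A-R-R for nutrients ) .

```python
def reduce_to_sfl(cfl_label: str) -> str:
    """Reduce a T-B-U label (e.g. 'R-R-A') to a T-U label (e.g. 'R-A')."""
    transport, _biosynthesis, utilization = cfl_label.strip().upper().split("-")
    return f"{transport}-{utilization}"


# Roles as interpreted in the text; any other combination is left unclassified.
ROLE_BY_SFL = {
    "R-A": "signaling molecule",
    "A-A": "nutrient",
    "A-R": "nutrient",
}


def metabolite_role(label: str) -> str:
    """Classify a two-sign or three-sign label according to ROLE_BY_SFL."""
    label = label.strip().upper()
    sfl = reduce_to_sfl(label) if label.count("-") == 2 else label
    return ROLE_BY_SFL.get(sfl, "unclassified")
```

+ The same lookup accepts either form of the label , so a two-sign motif observed for a pathway without a biosynthesis branch can be classified directly .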
+ In E. coli , the galactose metabolic pathway is controlled by the galactose repressor ( GalR ) and galactose isorepressor ( GalS ) , whereas iron homeostasis is controlled by the ferric uptake regulator ( Fur ) 28,29 . 
+ In the case of galactose metabolism , both GalR and GalS directly repress the transcription of galP encoding galactose permease . 
+ In a similar way , GalR partially represses the mglBAC operon encoding a high-affinity , ABC-type transport system . 
+ When galactose is available in the medium , the DNA-binding by both GalR and GalS is inhibited , followed by the activation of those genes along with the genes for galactose utilization29 . 
+ Therefore , the SFL motif exhibits the A-A logical structure , confirming the exogenous galactose as a nutrient . 
+ In the iron homeostasis system in E. coli , intracellular iron binds to Fur , forming the active TF complex , which in turn activates the production of iron-using metabolic enzymes and also shuts down expression of iron transporters . 
+ Interestingly , the SFL motif for the Fur regulon exhibits the R-A logical structure , similar to amino acids serving as signaling molecules described above . 
+ Therefore , we can conclude that iron acts as a signaling molecule rather than a nutrient . 
+ In summary , we have described an integrative analysis of genome-scale data sets to comprehensively understand the basic principles governing a stimulon in the TRN of E. coli . 
+ The overarching regulatory principle elucidated enabled us to differentiate between metabolites as signaling and nutrient molecules . 
+ This important distinction between seemingly similar metabolites is non-intuitive and represents a triumph of genome-scale systems analysis . 
+ Similar analysis of other stimulons and large-scale regulatory networks may reveal that this regulatory principle is general . 
+ Thus , this approach to the analysis of regulation at the network level may reveal other fundamental non-obvious regulatory principles at work in genome-scale regulatory networks . 
+ Methods
+ All strains used are E. coli K-12 MG1655 and its derivatives . 
+ The E. coli strains harboring ArgR-8myc , Lrp-8myc , and TrpR-8myc were generated as described previously13 . 
+ Glycerol stocks of ArgR-8myc strains were inoculated into W2 minimal medium containing 2 g/L glucose and 2 g/L glutamine , and cultured overnight at 37 °C with constant agitation30 . 
+ The cultures were inoculated into 50 mL of the fresh W2 minimal media in either the presence or absence of 1 g/L arginine and continued to culture at 37 °C with constant agitation to an appropriate cell density . 
+ E. coli strains harboring Lrp-8myc and TrpR-8myc were grown in glucose ( 2 g/L ) minimal M9 medium with or without 10 mM leucine or 20 mg/L tryptophan , respectively3 ,31 . 
+ To identify ArgR - , Lrp - , and TrpR-binding regions in vivo , we isolated the DNA bound to ArgR protein from formaldehyde cross-linked E. coli cells harboring ArgR-8myc by chromatin immunoprecipitation with antibodies that specifically recognize the myc tag ( 9E10 , Santa Cruz Biotech ) 32 . 
+ Cells were harvested from the exponential growth conditions in the presence or absence of exogenous arginine or tryptophan . 
+ The immunoprecipitated DNA ( IP-DNA ) and mock immunoprecipitated DNA ( mock IP-DNA ) were hybridized onto the high-resolution whole-genome tiling microarrays , which contained a total of 371,034 oligonucleotides with 50-bp tiles overlapping every 25-bp on both forward and reverse strands3 ,14 . 
+ A previously described ChIP-chip protocol was used32 ,33 , and microarray hybridization , wash , and scan were performed in accordance with the manufacturer 's instructions ( Roche NimbleGen ) . 
+ To monitor the enrichment of promoter regions , 1 μL immunoprecipitated DNA was used to carry out gene-specific qPCR3 . 
+ The quantitative real-time PCR of each sample was performed in triplicate using iCycler ™ ( Bio-Rad Laboratories ) and SYBR green mix ( Qiagen ) . 
+ The real-time qPCR conditions were as follows : 25 μL SYBR mix ( Qiagen ) , 1 μL of each primer ( 10 pM ) , 1 μL of immunoprecipitated or mock-immunoprecipitated DNA and 22 μL of ddH2O . 
+ All real-time qPCR reactions were done in triplicates . 
+ The samples were cycled to 94 °C for 15 s , 52 °C for 30 s and 72 °C for 30 s ( total 40 cycles ) on a LightCycler ( Bio-Rad ) . 
+ The threshold cycle values were calculated automatically by the iCycler ™ iQ optical system software ( Bio-Rad Laboratories ) . 
+ Primer sequences used in this study are available on request . 
+ To identify TF-binding regions , we used the peak finding algorithm built into the NimbleScan ™ software . 
+ Processing of ChIP-chip data was performed in three steps : normalization , IP/mock-IP ratio computation ( log base 2 ) , and enriched region identification . 
+ The log2 ratios of each spot in the microarray were calculated from the raw signals obtained from both Cy5 and Cy3 channels , and then the values were scaled by Tukey bi-weight mean34 . 
+ The log2 ratio of Cy5 ( IP DNA ) to Cy3 ( mock-IP DNA ) for each point was calculated from the scanned signals . 
+ Then , the bi-weight mean of this log2 ratio was subtracted from each point . 
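+ The normalization steps above can be sketched in a few lines of Python . This is not the NimbleScan implementation : the one-step Tukey biweight estimator below , its tuning constant c = 5 , and the small constant eps are assumptions chosen for illustration , and the function names are hypothetical .

```python
import math
from statistics import median


def tukey_biweight_mean(values, c=5.0, eps=1e-4):
    """One-step Tukey biweight location estimate (c and eps are assumed defaults)."""
    m = median(values)
    mad = median([abs(v - m) for v in values])  # median absolute deviation
    num = den = 0.0
    for v in values:
        u = (v - m) / (c * mad + eps)
        # Points far from the median (|u| >= 1) receive zero weight.
        w = (1.0 - u * u) ** 2 if abs(u) < 1.0 else 0.0
        num += w * v
        den += w
    return num / den


def centered_log2_ratios(cy5, cy3):
    """log2(IP / mock-IP) per probe, centered by subtracting the biweight mean."""
    ratios = [math.log2(ip / mock) for ip, mock in zip(cy5, cy3)]
    center = tukey_biweight_mean(ratios)
    return [r - center for r in ratios]
```

+ The robust center is used instead of the arithmetic mean so that a small number of strongly enriched probes does not shift the baseline of the whole array .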
+ Each log ratio dataset from duplicate samples was used to identify TF-binding region using the software ( width of sliding window = 300 bp ) . 
+ Our approach to identify the TF-binding regions was to first determine binding locations from each data set and then combine the binding locations from at least five of six datasets to define a binding region using the recently developed MetaScope software14 ,35 . 
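+ The rule of keeping only locations supported by at least five of six datasets can be illustrated with a simple sweep over interval boundaries . The Python sketch below is not MetaScope ; it assumes half-open ( start , end ) intervals , assumes that peaks within one dataset do not overlap each other , and uses a hypothetical function name .

```python
from collections import Counter


def consensus_regions(datasets, min_support=5):
    """Return maximal intervals covered by peaks from at least `min_support` datasets.

    datasets: list of peak lists, one per dataset; each peak is a half-open
    (start, end) interval. Peaks within one dataset are assumed not to overlap.
    """
    delta = Counter()
    for peaks in datasets:
        for start, end in peaks:
            delta[start] += 1   # one more dataset covers positions from `start`
            delta[end] -= 1     # and stops covering them at `end`
    regions, depth, region_start = [], 0, None
    for pos in sorted(delta):
        depth += delta[pos]
        if depth >= min_support and region_start is None:
            region_start = pos
        elif depth < min_support and region_start is not None:
            regions.append((region_start, pos))
            region_start = None
    return regions
```

+ With six single-peak datasets in which five peaks overlap and one lies elsewhere , only the stretch shared by the five overlapping peaks is reported .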
+ The ArgR - , Lrp - , and TrpR-binding motif analysis was completed using the MEME and FIMO tools from the MEME software suite36 . 
+ We first determined the proper binding motif and then scanned the full genome for its presence . 
+ The elicitation of the motif was done using the MEME program on the set of sequences defined by the ArgR - , Lrp - , and TrpR-binding regions respectively37 . 
+ Using default settings , the previously determined ArgR38 , Lrp3 , and TrpR7 motifs were recovered and then tailored to the correct size by setting the width parameter to 18-bp , 15-bp , and 8-bp , respectively . 
+ We then used these motifs and the PSPM ( position specific probability matrix ) generated for each by MEME to rescan the entire genome with the FIMO program . 
+ The sequence logo was generated from these sites . 
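+ To show what scanning a genome with a PSPM involves , the Python sketch below scores every forward-strand window with log-odds against a uniform background . It is a simplified stand-in , not FIMO : it computes no p-values , ignores the reverse strand , assumes an uppercase A/C/G/T sequence , and uses a pseudocount and function names that are assumptions made for this example .

```python
import math


def log_odds_matrix(pspm, background=0.25, pseudocount=1e-3):
    """Convert per-position base probabilities into log2 odds against a uniform background."""
    return [
        {base: math.log2((column.get(base, 0.0) + pseudocount) / background) for base in "ACGT"}
        for column in pspm
    ]


def scan(sequence, pspm, threshold):
    """Return (position, score) for forward-strand windows scoring at least `threshold`."""
    lom = log_odds_matrix(pspm)
    width = len(lom)
    hits = []
    for i in range(len(sequence) - width + 1):
        window = sequence[i:i + width]
        score = sum(lom[j][base] for j, base in enumerate(window))
        if score >= threshold:
            hits.append((i, score))
    return hits
```

+ A toy three-column matrix that always expects A , C , G will report a single hit at the position where the sequence contains ACG .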
+ All raw data files can be downloaded from http://systemsbiology.ucsd.edu/publications or from Gene Expression Omnibus through accession number GSE26054 . 
+ The authors thank Marc Abrams and Joshua Lerman for critical reading of the manuscript . 
+ The National Institutes of Health , through Grant GM062791 , and The Office of Science-Biological and Environmental Research , U.S. Department of Energy , DE-FOA-0000143 supported this work .
\ No newline at end of file
--- a/data/TEXT_FILES/useful_txt/22180530.txt 0 → 100644
+++ b/data/TEXT_FILES/useful_txt/22180530.txt 0 → 100644
+ regulation by homologous nucleoid-associated
+ ABSTRACT 
+ IHF and HU are two heterodimeric nucleoid-associated proteins ( NAP ) that belong to the same protein family but interact differently with the DNA . 
+ IHF is a sequence-specific DNA-binding protein that bends the DNA by over 160 ° . 
+ HU is the most conserved NAP , which binds non-specifically to duplex DNA with a particular preference for targeting nicked and bent DNA . 
+ Despite their importance , the in vivo interactions of the two proteins to the DNA remain to be described at a high resolution and on a genome-wide scale . 
+ Further , the effects of these proteins on gene expression on a global scale remain contentious . 
+ Finally , the contrast between the functions of the homo - and heterodimeric forms of proteins deserves the attention of further study . 
+ Here we present a genome-scale study of HU - and IHF binding to the Escherichia coli K12 chromosome using ChIP-seq . 
+ We also perform microarray analysis of gene expression in single - and double-deletion mutants of each protein to identify their regulons . 
+ The sequence-specific binding profile of IHF encompasses 30 % of all operons , though the expression of < 10 % of these is affected by its deletion suggesting combinatorial control or a molecular backup . 
+ The binding profile for HU is reflective of relatively non-specific binding to the chromosome , however , with a preference for A/T-rich DNA . 
+ The HU regulon comprises highly conserved genes including those that are essential and possibly supercoiling sensitive . 
+ Finally , by performing ChIP-seq experiments , where possible , of each subunit of IHF and HU in the absence of the other subunit , we define genome-wide maps of DNA binding of the proteins in their hetero - and homodimeric forms . 
+ INTRODUCTION
+ Nucleoid-associated proteins ( NAPs ) are considered to be global regulators of gene expression in bacteria . 
+ They alter the topology of bound DNA by bending , bridging or wrapping it , leading to multiple effects on the bacterial cell including transcriptional regulation ( 1 ) . 
+ Studies of 12 types of NAPs in Escherichia coli showed that they are generally expressed at high levels , and differ from each other in their expression across the growth phase and the degree of sequence specificity ( 2,3 ) . 
+ The global nature of the effects of NAPs on bacterial physiology has prompted several genome-scale studies of their binding and transcriptional effects in E. coli and Salmonella enterica ; these have sometimes led to intriguingly conflicting conclusions on the functions of NAPs , thus underscoring their complexity ( 4 -- 10 ) . 
+ Two NAPs , IHF and HU , are composed of two homologous subunits each ( IhfA and IhfB ; HupA and HupB ) . 
+ They are both members of the DNABII family of DNA-binding proteins and are strikingly similar to each other in sequence and in their unique structural fold ( 11 ) . 
+ However , the similarities end there : they differ in their sequence specificity , with IHF being sequence-specific and HU binding at low affinity along the chromosome ( 2,12 ) with some specificity toward gapped or nicked DNA ( 13 -- 16 ) . 
+ Whereas the ability of each subunit of HU to form homodimers and bind to the DNA in such a form is relatively well-established ( 17,18 ) , such evidence is less clear for IHF ( 19,20 ) . 
+ Moreover , the two proteins differ in the degree of conservation across bacteria : whereas at least one subunit of HU is found across most bacterial genomes making it the most conserved NAP , IHF has a more restricted occurrence . 
+ Their functions have been described to include regulation of transcription , replication and recombination via DNA binding ( 1 ) and extend to the control of translation initiation by HU via protein -- RNA interactions ( 21,22 ) . 
+ Several molecular and genome-scale studies have investigated the role of IHF in transcriptional control . 
+ Notable among such studies is the description of its effects on the nir ( 23 ) and the fim ( 24,25 ) operons , wherein IHF represses the nir and activates the fim operon . 
+ Also remarkable is the role of IHF in helping the formation of activation loops at enhancer-dependent promoters ( 26 ) . 
+ Compilation of results from molecular studies -- performed under diverse conditions -- by the curators of the RegulonDB database ( 27 ) identified over 150 genes as being regulated at the transcriptional level by IHF , with over two-thirds activated by IHF . 
+ A very early microarray study ( 28 ) , primarily emphasizing technical aspects of data analysis , identified genes that are differentially regulated in a ΔihfA strain grown in MOPS minimal medium ; however , it must be noted that the strain on which the experiment was performed ( a derivative of K12 CP79 ) was different from that for which the microarray was designed ( K12 MG1655 ) . 
+ In Salmonella enterica Typhimurium , deletions of ΔihfA , ΔihfB and both ΔihfA and ΔihfB each led to different effects on transcription during growth in rich LB medium , thus suggesting distinct binding tendencies of the IhfA2 and IhfB2 homodimers and the IhfAB heterodimer ( 29 ) . 
+ The number of genes responding transcriptionally to Δihf is substantially higher in Salmonella than reported in E. coli ; these genes include virulence determinants in Salmonella . 
+ Finally , a genome-scale study of IHF binding to the E. coli genome using low-resolution microarrays showed a preference for the binding regions to be located in non-coding DNA ( 5 ) . 
+ Despite the near universal conservation of HU in the bacterial kingdom , only recently have genome-scale studies been performed to investigate its effect on gene expression . 
+ This is in spite of molecular studies investigating its role in controlling gene expression at specific loci , most notably the stabilization of the repression loop at the gal promoter ( 30 ) . 
+ One study performed clustering analysis of microarray data obtained for ΔhupA , ΔhupB and ΔhupA/ΔhupB ( ΔhupAB ) strains during exponential , transition and stationary phases of growth , thus identifying distinct HupA2 , HupB2 and HupAB regulons comprising genes used in energy metabolism , SOS response and osmolarity and acidic stress response ( 31 ) . 
+ In spite of the established effect of HU on the supercoiled state of the DNA , these authors found little association between genes comprising the HU regulon and those that respond to DNA supercoiling . 
+ Again , however , this experiment was performed on a strain of E. coli ( C600 ) that differed from the strain ( MG1655 ) for which the microarray was designed . 
+ A more recent microarray study of the double-deletion strain showed that genomic loci encoding HU-responsive genes tend to display high gyrase binding and therefore supercoiling sensitivity ( 32 ) . 
+ Finally , in Salmonella , distinct regulons were identified for the three dimeric forms of HU , such that dissimilar sets of genes were differentially expressed during different phases of growth ( 33 ) . 
+ To our knowledge , though HU is a major NAP , no study has investigated its in vivo binding to the chromosome on a genomic scale . 
+ Despite the above studies , the binding characteristics of the two proteins have not been described at a high resolution and on a genome-wide scale . 
+ Further , as evident from the conflicting results of previous studies , the effects of these proteins on gene expression on a global scale remain a contentious issue . 
+ Finally , the contrast between the functions of the homo - and heterodimeric conformations of these proteins remains poorly understood and deserves the attention of further study . 
+ Here , we present a genome-scale study of the binding characteristics of HU and IHF to the E. coli K12 chromosome at four different time-points during batch growth in LB , using chromatin-immunoprecipitation coupled to high-throughput sequencing ( ChIP-seq ) . 
+ We also perform microarray analysis of gene expression in single - and double-deletion mutants of each protein , to identify their regulons . 
+ Finally , by performing ChIP-seq experiments where possible , of each subunit of IHF and HU in the absence of the other subunit , we define genome-wide maps of DNA binding of the proteins in their hetero - and homodimeric forms . 
+ METHODS
+ Strains and general growth conditions
+ The E. coli K-12 MG1655 bacterial strains used in this work are listed in Supplementary Table 1 . 
+ Luria -- Bertani ( 0.5 % NaCl ) broth and agar ( 15 g l-1 ) were used for routine growth . 
+ Where needed , ampicillin , kanamycin and chloramphenicol were used at final concentrations of 100 , 30 and 30 μg ml-1 , respectively . 
+ Construction of E. coli MG1655 knock-outs and FLAG-tagged strains
+ Disruption of ihf and hup genes in the E. coli chromosome was achieved by the Red recombination system ( 34 ) , as previously described by Baba et al. ( 35 ) . 
+ Primers designed for this purpose are shown in Supplementary Table 2 . 
+ Sets of additional external primers were used to verify the correct integration of the PCR fragment by homologous recombination ( Supplementary Table 3 ) . 
+ The cassette was then removed by FLP-mediated site-specific recombination . 
+ Double-deletion strains were made by P1 transduction ( 36 ) . 
+ The 3xFLAG epitope was added at the C terminus of the IhfA , IhfB , HupA and HupB proteins by a PCR-based method with plasmid pSUB11 as template ( 37 ) . 
+ Primers used for introducing the 3xFLAG tag are shown in Supplementary Table 2 . 
+ The tagged construct was then introduced onto the chromosome of E. coli MG1655 using the Red recombinase system . 
+ At each stage , DNA and strain constructions were confirmed by PCR and/or sequencing . 
+ This approach resulted in the introduction of a kanamycin resistance cassette in the chromosome downstream of the tagged gene . 
+ The cassette was then removed by FLP-mediated site-specific recombination . 
+ RNA extraction and microarrays
+ To prepare cells for RNA extraction , 100 ml of fresh LB was inoculated 1:200 from an overnight culture in a 250 ml flask and incubated with shaking at 180 rpm in a New Brunswick C76 waterbath at 37 °C . 
+ Two biological replicates were performed for each strain and samples were taken at exponential , late exponential , early stationary and stationary phase . 
+ The cells were pelleted by centrifugation ( 10 000 g , 10 min , 4 °C ) , washed in 1xPBS and pellets were snap-frozen and stored at -80 °C until required . 
+ RNA was extracted using Trizol Reagent ( Invitrogen ) according to the manufacturer 's protocol until the chloroform extraction step . 
+ The aqueous phase was then loaded onto mirVana ™ miRNA Isolation kit ( Ambion ) columns and washed according to the manufacturer 's protocol . 
+ Total RNA was eluted in 50 μl of RNase-free water . 
+ The concentration was then determined using a Nanodrop ND-1000 machine ( NanoDrop Technologies ) , and RNA quality was tested by visualization on agarose gels and by Agilent 2100 Bioanalyser ( Agilent Technologies , Palo Alto , CA , USA ) . 
+ For the generation of fluorescence-labeled cDNA , we used the FairPlay III Microarray Labelling Kit ( Stratagene ) . 
+ Briefly , 1 μg of total RNA was annealed to random primers , and cDNA was synthesized in a reverse transcription reaction with an amino allyl modified dUTP . 
+ The amino allyl labeled cDNA was then coupled to a Cy3 dye ( GE Healthcare ) containing a NHS-ester leaving group . 
+ The labeled cDNA was hybridized to the probe DNA on microarrays by incubating at 65 °C for 16 h . 
+ The unhybridized labeled cDNA was removed and the hybridized labeled cDNA was visualized using an Agilent Microarray Scanner . 
+ Chromatin immunoprecipitation
+ Chromatin immunoprecipitation ( ChIP ) was performed as previously described ( 4,38 ) . 
+ Real-time qPCR
+ To measure the enrichment of the IhfA , IhfB , HupA , HupB or RNAP-binding targets in the immunoprecipitated DNA samples , real-time qPCR was performed using a MJ Mini thermal cycler ( Bio-Rad ) . 
+ About 1 μl of IP or mock-IP DNA was used with specific primers to the promoter regions ( primer sequences in Supplementary Table 3 ; results in Supplementary Table 4 ) and Quantitect SYBR Green ( QIAGEN ) . 
+ RT-PCR for validation
+ To validate the results of the microarray analysis , quantitative reverse -- transcriptase PCR ( qRT -- PCR ) was carried out using specific primers to the mRNA targets showing up - or down-regulation , and control targets not showing differential expression ( primer sequences in Supplementary Table 5 ; results in Supplementary Tables 6 and 7 ) . 
+ RNA was extracted as described above from wild type , ΔihfA , ΔihfB , ΔihfAB , ΔhupA , ΔhupB and ΔhupAB cells and 30 ng total RNA was used with the Express One-Step SYBR GreenER kit ( Invitrogen ) according to the manufacturer 's guidelines , using a MJ Mini thermal cycler ( Bio-Rad ) . 
+ Library construction and Solexa sequencing
+ Before and after library construction , the concentration of the immunoprecipitated DNA samples was measured using the Qubit HS DNA kit ( Invitrogen ) . 
+ Library construction and sequencing was done using the ChIP-Seq Sample Prep kit , Reagent Preparation kit and Cluster Station kit ( Illumina ) . 
+ Samples were loaded at a concentration of 10 pM . 
+ Public data sources
+ The E. coli K12 MG1655 genome was downloaded from the KEGG database and gene coordinate annotations from the Ecocyc 11.5 database ( 39 ) . 
+ Literature-derived transcriptional regulatory network and a list of operons were sourced from the RegulonDB 6.2 database ( 27 ) . 
+ List of genes bound by IHF was obtained from Grainger et al. ( 5 ) . 
+ ChIP-chip signals for DNA gyrase were obtained from Jeong et al. ( 40 ) . 
+ Functional category annotation data for E. coli K12 MG1655 was obtained from the COG database . 
+ RNA-seq data was obtained from our previous publication ( 4 ) . 
+ Analysis of genomic data
+ Reads obtained from the Illumina Genome Analyzer were mapped to both strands of E. coli K12 MG1655 genome using BLAT ( 41 ) , as described previously . 
+ Binding regions for IHF were calculated using the per-base read count distribution as performed earlier ( 4 ) ; in addition to a statistical enrichment ( binomial test ) in the ChIP signal over the mock-IP ( as proposed by PeakSeq ) ( 42 ) , we imposed a further 1.5-fold increase in the absolute signal . 
+ We also used the Bioconductor package BayesPeak ( 43,44 ) to identify binding regions for IHF and HU . 
+ For HU , we calculated two gene-level measures of binding signal : ( i ) the highest read count obtained between -150 and +20 of the ORF and ( ii ) the median read count across the ORF body ; the two measures provide equivalent results . 
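+ The two gene-level measures can be sketched as follows in Python . The coordinate conventions are assumptions made for this example : coverage is a 0-based list of per-base read counts , left and right are inclusive ORF ends , the window is taken relative to the ORF start on the coding strand , and the circular chromosome is treated as linear . The function name is hypothetical .

```python
from statistics import median


def hu_gene_signals(coverage, left, right, strand, upstream=150, downstream=20):
    """Return (highest count near the ORF start, median count across the ORF body).

    coverage: per-base read counts (0-based list).
    left, right: inclusive ORF coordinates with left <= right.
    strand: '+' if the ORF starts at `left`, '-' if it starts at `right`.
    """
    if strand == "+":
        lo, hi = left - upstream, left + downstream
    else:
        lo, hi = right - downstream, right + upstream
    # Clip the window to the available coordinates (no wrap-around).
    lo, hi = max(lo, 0), min(hi, len(coverage) - 1)
    start_peak = max(coverage[lo:hi + 1])
    body_median = median(coverage[left:right + 1])
    return start_peak, body_median
```

+ The first value reflects binding around the promoter and start codon , whereas the second summarizes signal over the coding region .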
+ In addition , we adapted a method used previously to analyse data from ChIP-chip experiments for nucleoporins in Drosophila melanogaster , to identify regions of enriched signal for HU ( 45 ) . 
+ This adapted method calculates differences in log ( base 2 ) - transformed read count signals over 400 nt windows between the ChIP sample and the mock-IP sample , following normalization by DESeq ( 46 ) . 
+ The left hand side of this distribution , plus its mirror image around the mode gives the null distribution . 
+ All data points over the 95th percentile of the null distribution were considered as representing significant binding in the sample . 
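+ The construction of the null distribution can be illustrated with a short Python sketch . Several details are assumptions made for this example and are not stated in the text : the mode is estimated as the center of the fullest histogram bin , the percentile uses the nearest-rank rule , and the function names are hypothetical .

```python
import math
from collections import Counter


def histogram_mode(values, bin_width=0.1):
    """Estimate the mode as the center of the most populated histogram bin."""
    bins = Counter(math.floor(v / bin_width) for v in values)
    best_bin = max(bins, key=bins.get)
    return (best_bin + 0.5) * bin_width


def mirrored_null_threshold(values, mode, percentile=95.0):
    """Mirror the values left of the mode around it and return the given percentile."""
    left = [v for v in values if v <= mode]
    null = sorted(left + [2 * mode - v for v in left])
    rank = math.ceil(percentile / 100.0 * len(null)) - 1  # nearest-rank percentile
    return null[rank]


def significant_windows(values, bin_width=0.1, percentile=95.0):
    """Return indices of windows whose signal exceeds the null-distribution cutoff."""
    mode = histogram_mode(values, bin_width)
    cutoff = mirrored_null_threshold(values, mode, percentile)
    return [i for i, v in enumerate(values) if v > cutoff]
```

+ The idea is that the left half of the distribution is assumed to contain only background , so its mirror image approximates how far background alone would extend to the right .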
+ Binding motifs were identiﬁed using the MEME software , and binding regions were subsequently scanned for occurrences of the motif using MAST ( 47 ) . 
+ An operon was deﬁned as bound by the protein of interest if at least 50 bp of the intergenic region upstream of the operon overlapped with a binding region . 
+ For long intergenic regions , only the ﬁrst 400 bp immediately upstream of the operon were used . 
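+ The operon-level binding rule above can be sketched as an interval-overlap check . This is an illustrative sketch under stated assumptions : coordinates are 1-based and inclusive , the 50 bp must come from a single binding region , and all function names are hypothetical . 

```python
def upstream_window(operon_start, operon_end, strand, intergenic_len, max_len=400):
    """Upstream intergenic window (start, end), capped at max_len bp."""
    length = min(intergenic_len, max_len)
    if strand == "+":
        # Upstream of a forward-strand operon lies at lower coordinates.
        return operon_start - length, operon_start - 1
    # Upstream of a reverse-strand operon lies at higher coordinates.
    return operon_end + 1, operon_end + length


def overlap_bp(a, b):
    """Number of positions shared by two inclusive intervals."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]) + 1)


def is_bound(window, binding_regions, min_overlap=50):
    """True if any binding region overlaps the window by >= min_overlap bp."""
    return any(overlap_bp(window, r) >= min_overlap for r in binding_regions)
```

+ For example , a forward-strand operon starting at position 1000 with a 600 bp upstream intergenic region is tested over positions 600 to 999 only . 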
+ Gene expression analyses were performed on a previously described custom-designed isothermal Agilent microarray platform , and analyzed as described earlier ( 4 ) . 
+ Brieﬂy , array data were background corrected using normexp ( 48 ) and normalized using VSN ( 49 ) . 
+ Differential expression in the deletion strains compared with the wild-type was called at an FDR-adjusted P-value threshold of 0.05 and a fold change of at least two . 
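+ The differential-expression call can be illustrated with the sketch below . The source states only that P-values were FDR-adjusted and that the analysis was done in R ; the Benjamini-Hochberg procedure , the inclusive threshold and the function names used here are assumptions . 

```python
import math


def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted P-values, returned in the input order."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest P-value down, enforcing monotonicity.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


def call_differential(log2_fc, pvalues, alpha=0.05, min_fold=2.0):
    """Flag genes with adjusted P <= alpha and |fold change| >= min_fold."""
    adjusted = bh_adjust(pvalues)
    min_log2 = math.log2(min_fold)
    return [abs(fc) >= min_log2 and q <= alpha
            for fc, q in zip(log2_fc, adjusted)]
```

+ Both criteria must hold : a gene with a large fold change but an adjusted P-value above 0.05 is not called , and neither is a signiﬁcant gene whose change is below two-fold . 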
+ All statistical tests were carried out using R. 
+ RESULTS
+ DNA-binding properties of IHF and HU subunits
+ To study the binding characteristics of IHF and HU to the E. coli chromosome , we performed immunoprecipitation of each protein subunit -- during mid-exponential , late-exponential , transition-to-stationary and stationary phases of growth -- and sequenced the cross-linked DNA using an Illumina Genome Analyzer system ( ChIP-seq ) . 
+ We also used control data from a mock-IP experiment ( for mid-exponential phase ) described in our previous study ( 4 ) . 
+ We mapped the short sequence reads obtained from each sequencing experiment to the E. coli K12 MG1655 genome ( KEGG ID : eco ) . 
+ For each sample , we then obtained a read count distribution , quantiﬁed by the number of reads that map to each base position on the chromosome . 
+ We inspected the nature of the read count distributions by plotting their densities ( Figure 1A ) . 
+ The distributions for the various IHF samples each had a heavy right tail corresponding to regions of speciﬁc binding . 
+ On the other hand , the distributions for HU were only slightly skewed to the right , and in this respect similar to that from the mock-IP experiment . 
+ Whereas the read counts obtained for IHF were only weakly correlated to the mock-IP control ( correlation = 0.12 for IhfA , mid-exponential phase sample ; the weak correlation presumably arising from a systemic background ) , those for HU showed a more signiﬁcant correlation with the mock-IP ( correlation = 0.47 for HupA , mid-exponential phase ; Figure 1A ) . 
+ Similarly , the distribution of mock-IP-subtracted signal for HU ( following division of read counts by the total number of reads obtained in that sample , and log transformation ) was centered around zero with a relatively weak right-sided tail ( Figure 2A ) . 
+ On the other hand , this distribution for IHF was offset from zero , with a peak well below zero representing most of the genome with little or no binding , and values to the right corresponding to regions of enriched signal . 
+ Despite the strong resemblance of the HU data to the mock-IP , our HU experiment is representative of the protein 's DNA binding proﬁles for the following reasons : ( i ) there is a considerable right-sided tail to the mock-IP-subtracted HU ChIP-seq signal ; ( ii ) the read count proﬁle for each HU subunit is more correlated with that for the other subunit ( correlation = 0.83 ) than with that for the mock-IP ( correlation = 0.47 and 0.59 for HupA and HupB , respectively ; Figure 1B ; Supplementary Figure S1 ) ; ( iii ) the proﬁle at any given time-point is also more strongly correlated with that from the adjacent time-point than with the mock-IP proﬁle ( correlation = 0.88 for HupA between exponential and late exponential phases ; Figure 1B and Supplementary Figure S2 ) ; ( iv ) ChIP experiments for HU are reproducibly successful , unlike that for the mock-IP , which typically provides very low concentrations of DNA that are not always sufﬁcient for a sequencing reaction . 
+ Taken together , these provide a genome-wide , high-resolution , in vivo validation of prior molecular data suggesting that IHF binds DNA in a sequence-speciﬁc manner whereas HU binds more uniformly . 
+ The strongly right-tailed distribution for IHF allowed us to identify regions of enriched signal -- or binding regions -- using a stringent version ( Methods ) of a procedure described earlier ( 4 ) . 
+ Over 85 % of the 1042 ( 1022 ) binding regions thus obtained for IhfA ( and IhfB ) overlap with those obtained using another published method BayesPeak ( 43,44 ) . 
+ We noted a similar agreement between our method and another previously used in our lab to detect binding regions from eukaryotic ChIP-chip / seq experiments ( 45 ) . 
+ In general , the signal enrichment in these IHF-bound regions is signiﬁcantly higher than that for another sequence-speciﬁc , yet promiscuous , NAP : FIS ( Figure 3A ) . 
+ During exponential phase , IHF-bound regions ( either subunit ) cover 13 % of the genome , including upstream regions of 443 operons ( 17 % ) . 
+ Genes identiﬁed as bound by either subunit of IHF during the two exponential-phase time-points in our study cover 68 % of those identiﬁed in an earlier publication using mid-resolution ChIP-chip microarrays ( 5 ) . 
+ We also recovered the known binding motif for IHF from these data ( Figure 3B ) . 
+ We detected 2999 and 3162 occurrences of this motif within the binding regions of IhfA and IhfB , respectively . 
+ Of these motifs , < 10 % are localized to regions upstream of predicted operons . 
+ This proportion is small compared to our previous data for Fis for which over 20 % of the binding regions fell upstream of operons . 
+ This is in line with the smaller number of bound operons ( based on binding to upstream regions ) per mega base pair of bound DNA for IHF when compared to Fis ( approximately 720 operons per mega base pair of binding region for IHF , compared to approximately 1250 operons per mega base pair for Fis ) . 
+ Because the binding proﬁle from our HU ChIP-seq experiment shows a strong resemblance to that from the mock-IP with relatively weak signals ( Figure 2A -- C ) , we used two methods to characterize its binding . 
+ First , we obtained a HU occupancy measure for each gene in the genome , which was deﬁned by the median of the read count distribution across the gene body . 
+ This value was then normalized by the corresponding value in the mock-IP data . 
+ This method is similar to that used to quantify nucleosome occupancies in eukaryotic studies ( 50 ) . 
+ This normalized HU occupancy correlates positively with the A/T content of the bound DNA ( Figure 4A ) . 
+ Second , for the exponential phase sample , we adapted a procedure used previously to investigate ChIP-chip data for nucleoporins in Drosophila ( 45 ) , which also showed widespread but low levels of binding , to identify 1104 and 1179 regions of enriched signal for HupA and HupB , respectively , with excellent agreement between the binding regions for the two subunits ( > 90 % of peaks in the smaller list overlap with those in the other list ) . 
+ In agreement with our observations using gene-based occupancy proﬁles , these binding regions have signiﬁcantly higher A/T content than the genomic average ( Figure 4B ) . 
+ However , motif identiﬁcation was not reliable , as different motifs were identiﬁed as signiﬁcant for HupA and HupB despite their binding regions overlapping strongly ( Figure 4C ) . 
+ This suggests that slight variations in the exact positioning of the binding regions might affect motif identiﬁcation . 
+ Nevertheless , the one common feature of the identiﬁed motifs is A/T richness , which is in agreement with the ﬁndings described above and with results from an earlier report of in vitro speciﬁcity of HU -- DNA interactions ( 51 ) . 
+ This partiality towards A/T-rich genomic regions may be in line with previous reports suggesting a preference for HU to bind to bent DNA ( 52,53 ) . 
+ In summary , HU binds largely in a non-speciﬁc fashion to the chromosome , with a particular preference toward targeting A/T-rich regions . 
+ Finally , comparison of binding signals obtained from each subunit of the same protein indicates a high degree of correspondence between the two ( Figure 1B ) . 
+ Notably for IHF , the proportions of the genome covered by the binding regions for IhfB ( identiﬁed as described below ) were considerably larger than those for IhfA at three of the four time-points ( all except mid-exponential phase ) . 
+ Binding regions for IhfB are generally longer ( by 5 -- 10 % median ; P < 10^-6 , paired Wilcoxon test for all the above three time-points ; Supplementary Figure S3 ) than the corresponding regions for IhfA , suggesting that many IhfB binding regions are extensions of IhfA binding regions . 
+ This might be in concordance with a previous report showing that IhfB homodimers are more likely to form than IhfA dimers ( 19 ) , but that such dimers may not exist freely in solution ( 20 ) . 
+ For both proteins , there is high correlation among the binding proﬁles across time-points during our batch culture ( Figure 1B ) . 
+ Effects of IHF and HU on global gene expression in E. coli
+ To investigate the effects of IHF and HU on gene expression in E. coli , we created single ( ΔihfA , ΔihfB , ΔhupA and ΔhupB ) and double-deletion ( ΔihfAB , ΔhupAB ) strains for the genes comprising the subunits of the two proteins ( Supplementary Figures S4 and S5 ) . 
+ We then performed microarray experiments measuring transcript abundance in these strains during the exponential , late-exponential , transition-to-stationary and stationary phases of growth , and compared it with that in the time-matched wild-type cells . 
+ Effect of IHF on E. coli gene expression
+ We observe differential expression of only a small number of genes in the ihf single deletions ( 97 for IhfA and 56 for IhfB across all four conditions ) . 
+ Though a signiﬁcantly larger number of genes are differentially expressed in the ihfAB double deletion , the number is much smaller ( 477 across all four conditions ) than what we previously observed ( 4 ) for other sequence-speciﬁc nucleoid proteins such as Fis ( 1104 genes adopting the same criteria for calling differential expression as for the IHF data ) and H-NS ( 1987 genes ) . 
+ Most of these effects are seen during the two exponential phases with only approximately 50 genes being differentially expressed -- compared with the wild-type -- during the stationary phase . 
+ Across the conditions , almost equal numbers of genes are up- or downregulated in ΔihfAB ; however , over two-thirds of the genes that are differentially expressed during late-exponential phase are upregulated ( 70 % ) . 
+ Among the genes upregulated in ΔihfAB , there is a statistical enrichment for genes involved in ` energy production and conversion ' , a property that is seen particularly in late-exponential phase ; however , these do not show any strong representations of individual metabolic pathways . 
+ There is very little overlap among the sets of genes differentially expressed across different time-points ( Supplementary Figure S4 ) , despite the fact that the binding proﬁle of IHF does not change signiﬁcantly with growth phase . 
+ Further , similar to observations made earlier for Fis ( 4,6 ) , there is very little correspondence between IHF binding and differential expression . 
+ Speciﬁc examples of IHF-bound genes that are differentially expressed in the double deletion include the ﬁm operon , which is strongly downregulated in the deletion strain in exponential phase . 
+ This is in agreement with prior molecular studies which have implicated IHF in both phase-switching and gene expression control at the ﬁm operon ( 25 ) . 
+ The lack of an observable global effect of IHF on the expression of genes bound by it might be explained by combinatorial regulation , i.e. the possible role of IHF as a facilitator of binding of other transcription factors to gene-upstream regions . 
+ For example , using ChIP-seq data previously generated in our lab , we ﬁnd that there is a signiﬁcant overlap between the genes bound by IHF and those by Fis ( 35 % of genes bound in all conditions by IHF are also bound by Fis ; P = 2 × 10^-5 , Fisher 's exact test ) . 
+ A previous study has shown that IHF is the second-most proliﬁc transcription factor in terms of the number of other transcription factors with which it shares target genes ( 54 ) . 
+ A striking example of this is the observed binding of IHF to a signiﬁcant proportion ( 40 % ; P = 4 × 10^-5 , Fisher 's exact test ) of genes regulated by σ54 , whose activation by AAA+ ATPase transcription factors requires IHF-dependent DNA bending ( 55 ) . 
+ The effect of such binding on gene expression might be highly speciﬁc to conditions , such as nitrogen limitation , not used in this study . 
+ Effect of HU on E. coli gene expression
+ In contrast to IHF , mutants deﬁcient in HU show large changes in gene expression ; across the four conditions tested here , 1490 genes are up- or downregulated in either the single or the double mutants ( Supplementary Figure S5 ) . 
+ The greatest effect is seen in the double mutant in which 1266 genes are differentially expressed when compared to the wild-type ; 512 genes change in expression in hupA whereas only 107 genes do so in hupB . 
+ Overall , a majority of differentially expressed genes are upregulated in hupAB ( 56 % ; P < 0.001 , compared against random assignments of up and down regulation of genes ) and hupA ( 69 % ; P < 0.001 ) -- the two mutants that display global changes in gene expression . 
+ A statistically signiﬁcant proportion of genes differentially expressed in hupA also change in expression in hupAB ( 43 and 54 % of genes up- and downregulated in hupA ; P < 10^-6 , Fisher 's exact test ; Figure 5A ) ; despite this , it must be noted that a signiﬁcant component of each regulon is distinct from the other . 
+ Genes that are upregulated in hupAB show an enrichment for essentiality for growth in rich media ( P < 10^-6 for sets , Fisher 's exact test ; Figure 6 ) ; this is not true of genes differentially expressed in hupA . 
+ We then analyzed the COG functional categories of genes that are differentially expressed in these mutants , and ﬁnd that genes involved in translation and ribosome biogenesis are upregulated in the double mutant but not in hupA ( or in hupB ) . 
+ We also ﬁnd that genes involved in motility are upregulated in both the mutants . 
+ Finally , since HU is the most conserved NAP in bacteria , we analyzed the degree to which its target genes in E. coli are conserved across prokaryotes . 
+ Genes that are upregulated in hupAB tend to be highly conserved , whereas the same is not true of genes that are downregulated by hupAB or those that change in expression in hupA . 
+ The high degree of conservation observed for hupAB targets is not merely due to the aforementioned enrichment of genes involved in translation . 
+ It had previously been observed that genes that are differentially expressed in hupAB tend to be bound by DNA gyrase ( 32 ) , and are supercoiling-sensitive ; in our data , this trend is relatively weak in hupAB , though statistically signiﬁcant ( P < 10^-6 ; Mann -- Whitney test ) , but absent in hupA . 
+ It has been shown previously that deletion of HU leads to an increase in the accessibility of DNA to the DNA relaxing activity of topoisomerase I ( 32,56 ) . 
+ This is , at ﬁrst glance , at odds with the observation that genes that are bound by the opposing DNA gyrase tend to be upregulated in the hupAB mutant in the present work and in an earlier work by Muskhelishvili 's group ( 32 ) . 
+ The authors of the above paper showed that there is little change in the unconstrained supercoiling levels in the double mutant ( 32 ) . 
+ They further hypothesized that the increased accessibility of topoisomerase I to the DNA in the HU double mutant might be compensated by higher local negative supercoiling introduced at the upregulated loci by greater DNA gyrase binding and higher levels of transcription . 
+ To test this hypothesis we classiﬁed all genes into four groups based on gyrase binding ( 40 ) and mid-exponential phase gene expression levels as measured using RNA-seq experiments in wild-type cells ( 4 ) : ( i ) HEHG : high expression , high gyrase binding ( high deﬁned by the top third of the distribution ) ; ( ii ) HELG : high expression , low gyrase binding ( low deﬁned by the bottom third of the distribution ) ; ( iii ) LEHG : low expression , high gyrase binding ; ( iv ) LELG : low expression , low gyrase binding . 
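+ The four-group classiﬁcation can be sketched as follows . This is an illustration under assumptions : thirds are assigned by rank ( ties broken by input order ) , genes falling in the middle third of either measure are treated as unclassiﬁable , and the function names are hypothetical . 

```python
def tertile_labels(values):
    """Label each value 'low', 'mid' or 'high' by rank-based thirds."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    labels = [None] * n
    for rank, i in enumerate(order):
        if rank < n / 3:
            labels[i] = "low"       # bottom third of the distribution
        elif rank >= 2 * n / 3:
            labels[i] = "high"      # top third of the distribution
        else:
            labels[i] = "mid"
    return labels


def classify_genes(expression, gyrase):
    """Return HEHG / HELG / LEHG / LELG per gene, or None if unclassifiable."""
    code = {"high": "H", "low": "L"}
    groups = []
    for e, g in zip(tertile_labels(expression), tertile_labels(gyrase)):
        if e == "mid" or g == "mid":
            groups.append(None)
        else:
            groups.append(code[e] + "E" + code[g] + "G")
    return groups
```

+ Here expression would be the wild-type mid-exponential RNA-seq level per gene and gyrase the ChIP-chip signal per gene , supplied in the same gene order . 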
+ Though only 27 % of all classiﬁable genes belong to HEHG , 54 % of genes upregulated in hupAB have high gyrase binding and high gene expression in wild-type cells ( P < 10^-6 ; Fisher 's exact test ; Supplementary Figure S6 ) . 
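+ An enrichment of this kind can be tested with a one-sided Fisher 's exact test , which is equivalent to a hypergeometric tail probability . The sketch below is illustrative only : the source does not state whether the test was one- or two-sided , it reports percentages rather than gene counts , and the function name is hypothetical . 

```python
from math import comb


def fisher_enrichment_p(k, n_selected, n_category, n_total):
    """One-sided P(X >= k) under the hypergeometric distribution.

    n_total: all classifiable genes
    n_category: genes in the category of interest (e.g. HEHG)
    n_selected: genes in the selected set (e.g. upregulated in hupAB)
    k: selected genes that fall in the category
    """
    upper = min(n_category, n_selected)
    denominator = comb(n_total, n_selected)
    numerator = sum(
        comb(n_category, x) * comb(n_total - n_category, n_selected - x)
        for x in range(k, upper + 1)
    )
    return numerator / denominator
```

+ With hypothetical counts chosen only to mirror the reported proportions , fisher_enrichment_p ( 54 , 100 , 270 , 1000 ) asks how often 54 or more of 100 selected genes would fall in a category that holds 27 % of 1000 genes . 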
+ This might indicate a possible role for increased local negative supercoiling , introduced by a combination of high transcription and DNA gyrase binding in determining upregulation in the hupAB mutant . 
+ Analysis of expression patterns across different phases of growth reveals complex trends ( Figure 5B ) . 
+ There is a progressive decrease in the number of genes that are differentially expressed in hupA as the culture progresses through batch growth ; however , there is only a slight overlap between the lists of differentially expressed genes in different time points . 
+ On the other hand , in hupAB , many genes change in expression during late exponential and stationary phases , though signiﬁcant effects could be seen during the other two time-points ; again each phase of growth sees a largely distinct set of genes being differentially regulated . 
+ These are described below . 
+ During exponential phase , similar numbers of genes are differentially expressed in hupA and hupAB , with a small though statistically signiﬁcant overlap between them . 
+ Essential , conserved genes and those involved in translation and ribosome biogenesis are over-represented among genes upregulated in hupAB but not in hupA . 
+ Similar enrichments are seen in the larger set of genes that is upregulated in hupAB during late-exponential phase ; however , in contrast to the earlier time-point , almost all genes that are upregulated in hupA are also upregulated in hupAB . 
+ During the two exponential phase time-points , genes upregulated in hupAB ( 67 % ) outnumber those that are downregulated . 
+ Only a few genes change in expression in hupB during this period . 
+ Though relatively few genes are differentially expressed during the transition to stationary phase , we note that there is a striking upregulation of various ﬂagellar genes , involved in motility , in all three mutants at this time ; we have validated several of these using RT -- PCR ( Supplementary Table S6 ) . 
+ This is at odds with previous observations of a HU mutant that is non-motile in E. coli K12 W3110 ( 57 ) because of reduced transcription of the ﬂagellin gene . 
+ Swimming motility assays performed by us resulted in smaller swarm diameters for the various hup mutants , the double mutant in particular ; however , all mutants were motile ( Supplementary Figure S7 ) . 
+ Finally , during stationary phase , only hupAB shows global changes in gene expression . 
+ Unlike in the earlier time-points , only a slight majority ( 55 % ) of genes are upregulated , in which there is a statistical over-representation of translation-associated genes ( P < 10^-6 , Fisher 's exact test followed by multiple-testing correction by FDR ) ; there is no functional enrichment detectable among downregulated genes . 
+ In summary , deletion of both hupA and hupB has signiﬁcantly greater impact on gene expression than that of either gene alone , with hupA displaying greater gene expression changes than hupB . 
+ Further , genes that are upregulated in hupAB but not in hupA tend to be statistically enriched in essential cell processes such as translation , and are more conserved in prokaryotes than expected by random chance . 
+ These results may be consistent with our observation that during growth , cell densities are lower ( 25 % less than the wild-type ) in the double deletion than in the wild-type or the single deletions ; this possibly arises from a longer lag phase observed in the double deletion , although the growth rate of hupAB during exponential phase does not seem to be different from that of the wild-type [ Supplementary Figure S8 ; also reported by ( 32 ) ] . 
+ Investigations of potential IHF and HU homodimers binding to the chromosome
+ Following from our observations of largely incongruent effects of single and double mutants of HU and IHF on E. coli gene expression , we performed ChIP-Seq experiments on each subunit of the two proteins , in strains carrying deletions of the second subunit . 
+ For IHF , our experiments did not yield enough DNA to perform sequencing reactions ; this is suggestive of very weak or no homodimer binding to the DNA , at least in the absence of the second subunit . 
+ For HU , we were able to obtain binding proﬁles for each subunit in the absence of the other , which were strikingly similar to those in the wild-type ( Supplementary Figure S9 ) . 
+ This indicates that the homodimers bind to the chromosome in similar patterns as the heterodimer . 
+ This may be reﬂected in the fact that most bacterial genomes encode only one HU subunit , and is supportive of the fact that more genes change in expression in the double deletion than in the single mutants . 
+ It has previously been suggested that HupA2 is the predominant form of HU in exponential phase , whereas the heterodimer takes over during later stages of growth ( 17 ) . 
+ Western blots presented here show higher expression of HupA than HupB during exponential phase ( Supplementary Figure S10 ) . 
+ Gene expression data described above show greater gene expression changes in hupA than in hupB , especially during exponential phase . 
+ Our ChIP-seq data , however , show similar binding proﬁles for both subunits of HU across all stages of growth ; results reported in this section further suggest that the binding proﬁle of HupB is similar between the wild-type and ΔhupA mutant . 
+ Though it is possible that there is a uniform reduction in binding signals for HupB in ΔhupA , which might account for gene expression changes observed in ΔhupA during exponential and late-exponential phases , there is little reorganization of HupB 's binding proﬁle . 
+ DISCUSSION
+ IHF and HU are two nucleoid-associated proteins that belong to the same DNA binding protein family , but show distinct levels of sequence speciﬁcities . 
+ IHF , a sequence-speciﬁc DNA binding protein , has extreme effects on the topology of bound DNA , which it bends by 160° ( 58 ) . 
+ HU , the most conserved NAP , binds more uniformly to the E. coli chromosome , with a preference for distorted DNA structures ( 13 -- 16,52,53 ) . 
+ Both proteins exist as heterodimers in E. coli . 
+ In this article , we report results from our genome-scale studies of the binding of IHF and HU to the chromosome of E. coli K12 MG1655 , and its effects on gene expression at various time-points of batch culture , from growth to stasis . 
+ IHF displays sequence-speciﬁc binding to the E. coli chromosome , with signal intensities signiﬁcantly stronger than those observed for Fis , another sequence-speciﬁc NAP . 
+ The two subunits of IHF show similar binding proﬁles , indicative of preferential heterodimer formation . 
+ In the wild-type strain , IhfB binding regions cover more of the chromosome than those of IhfA ( in three of the four conditions ) ; this might be in line with a prior observation that IhfB homodimers form more readily than IhfA homodimers ( 19 ) . 
+ However , our inability to recover enough DNA from ChIP experiments for IhfB in ΔihfA suggests that such homodimer formation may occur on the DNA ( 20 ) , only in the presence of a nucleating heterodimer complex . 
+ Across the four conditions tested , IHF binding regions target the upstream regions of over 30 % of all predicted operons , indicative of a global role for the protein in regulating gene expression . 
+ However , only 10 % of these change in expression when the genes coding for the two subunits of IHF are deleted . 
+ Additionally , compared to the effects of other sequence-speciﬁc NAPs such as Fis and H-NS ( 4 ) , the overall effect of IHF on gene expression under the present conditions is less in terms of the number of genes that are differentially expressed in the deletion mutant ( s ) . 
+ This might be linked to our observation that , in contrast to Fis where over 20 % of binding motifs lie upstream of operons ( 4 ) , only 10 % of predicted IHF binding motifs are so positioned . 
+ We suggest that the minimal proximal effect of IHF on gene expression might be due to combinatorial regulation , i.e. the tendency of IHF to regulate genes jointly with other factors . 
+ Another possibility is that HU might compensate for the absence of IHF ; this has been demonstrated for excisive recombination at speciﬁc sites ( 59 ) , but remains to be investigated on a genomic scale in the context of transcription . 
+ In this context , it must be noted that IHF has important functions outside of transcriptional regulation such as recombination ( 60 ) , which are not apparent in our transcriptome experiment ; in fact the large majority of binding sites which are located in non-intergenic regions might have such functions . 
+ In agreement with current knowledge , the binding proﬁle for HU is reﬂective of relatively non-speciﬁc binding to the chromosome , however with a notable preference for A/T-rich DNA in concordance with previous in vitro studies ( 51 ) . 
+ It has been shown previously that the composition of the HU dimer varies across the various phases of growth of E. coli with HupA2 being the dominant form during exponential phase and the heterodimeric form dominating during later stages of growth ( 17,61 ) . 
+ Our western blots do indicate higher levels of expression of HupA than HupB during exponential and late-exponential growth phases . 
+ However , the binding proﬁle of each subunit of HU strongly correlates with that of the other subunit across the growth phases , including exponential growth ; this might indicate that homo- and heterodimeric forms of HU could bind to the DNA interchangeably . 
+ Further , the binding proﬁle of each subunit in the wild-type is similar to that in an otherwise isogenic strain that is lacking the other subunit . 
+ It is possible that the binding of HupB to the chromosome is uniformly less ( across the genome ) in ΔhupA than in the wild-type , thus accounting for gene expression changes seen in ΔhupA during exponential and late-exponential phases of growth . 
+ This interpretation may be in line with previous reports showing in vitro that HupB2 binds poorly to duplex DNA ( 18 ) ; however , there is enough binding for us to recover in our ChIP experiments . 
+ However , our data do not suggest any large-scale reorganization of the HupB binding proﬁle following hupA deletion . 
+ In apparent conﬂict with the consistency that the two subunits show in their binding , they have substantially different effects on gene expression . 
+ Brieﬂy , we observe that hupAB has the greatest effects on gene expression , distantly followed by hupA , with minimal effects seen in hupB . 
+ Previous studies in E. coli C600 and Salmonella enterica have also observed discordance between the sets of genes differentially expressed in single and double deletions of HU subunits ( 11,33 ) . 
+ The authors of the paper on E. coli C600 identiﬁed few genes as members of HupB2 , interpreting this as a possible consequence of previously observed instability and low expression level of this form of the protein at 37 °C , and the inability of HupB2 to introduce negative supercoiling on relaxed DNA in the presence of topoisomerase I ( 11 ) . 
+ A similar observation -- hupB showing signiﬁcantly smaller changes in gene expression than hupA and hupAB -- was made in two of the three growth phase time-points tested in S. enterica ; however , the extent of differential expression was similar in hupAB and hupA across all time-points ( 33 ) . 
+ The aforementioned study on E. coli C600 showed that the HU regulon is composed of genes involved in energy metabolism , SOS response , and osmolarity and acid stress responses ( 31 ) . 
+ In contrast to the conclusions of a later study ( 32 ) , which investigated the transcriptome of only the double mutant , these authors did not ﬁnd any supercoiling dependence in the expression of the members of the HU regulon . 
+ Here we observe that genes that are upregulated in hupAB are statistically enriched for essential cellular functions such as translation and show statistically higher binding to DNA gyrase than other genes . 
+ Our analysis also agrees with a previous hypothesis that local negative supercoiling introduced by DNA gyrase and high transcription might compensate for the increased accessibility of DNA to topoisomerase I in the HU double mutant ( 32 ) . 
+ Despite the fact that a signiﬁcant proportion of genes that are upregulated in hupA also change in expression in hupAB , the above functional enrichments are not observed in hupA . 
+ Similarly , and in line with the fact that HU is the most conserved NAP in bacteria , genes upregulated in hupAB are more conserved across bacteria than other genes . 
+ This may also be reﬂected in the fact that the growth curve of hupAB , but not that of hupA or hupB , differs from that of the wild-type . 
+ Moreover , many bacterial genomes encode only one subunit of HU ; the fact that a second subunit is encoded in E. coli might in part build in some redundancy to this conserved regulatory system . 
+ However , it is remarkable that the subunit of HU that is more conserved across bacteria is HupB , which appears to be the minor player in gene expression control at least in E. coli and S. enterica both of which encode both subunits of this protein . 
+ ACCESSION NUMBER
+ All sequence data have been deposited at NCBI SRA ( Study accession SRP008538 ) . 
+ All microarray data have been deposited at ArrayExpress ( accession number E-MEXP-3461 ) . 
+ SUPPLEMENTARY DATA
+ Supplementary Data are available at NAR Online : Supplementary Tables 1 -- 7 , Supplementary Figures 1 -- 10 , Supplementary Data set . 
+ ACKNOWLEDGEMENTS
+ The authors thank Vladimir Benes , David Ibberson and Sabine Schmidt at the Genomics Core Facility , EMBL-Heidelberg for the sequencing and microarray experiments . 
+ They thank the three anonymous referees for their critical comments and suggestions . 
+ FUNDING
+ Girton College , University of Cambridge ; Ramanujan Fellowship , Department of Science and Technology , Government of India SR/S2/RJN-49/2010 ( to A.S.N.S. ) ; Spanish Ministry of Science and Innovation ( to A.I.P. ) ; Biotechnology and Biological Sciences Research Council ( BBSRC ) grant ` Genomic Analysis of Regulatory Networks for Bacterial Differentiation and Multicellular Behaviour ' BB/E011489/1 ( to G.M.F. ) and BB/E01075X/1 ( to N.M.L. ) ; Isaac Newton Trust ( to G.M.F. ) ; European Molecular Biology Laboratory ( EMBL ) ( to N.M.L. ) . 
+ Funding for open access charge : European Molecular Biology Laboratory .
\ No newline at end of file
--- a/data/TEXT_FILES/useful_txt/22689638.txt 0 → 100644
View file @27818a9
+++ b/data/TEXT_FILES/useful_txt/22689638.txt 0 → 100644
View file @27818a9
+ Altered tRNA characteristics and 3′ maturation in
+ ABSTRACT acid proteins . 
+ Reliable and efﬁcient translation depends critically on tRNA , which must exhibit speciﬁcity in aminoacylation , and correct pairing of the anticodon with its codon on the mRNA . 
+ The robust nature of the genetic code and numerous genome-encoded mechanisms promote translational accuracy ( 1,2 ) , thus preventing deleterious events such as the reassignment of codons that can alter the function of thousands of genes . 
+ Nevertheless , tRNAs and the genetic code sometimes do change , especially in genomes undergoing size reduction as exempliﬁed by mitochondria and plastids ( 1,3,4 ) . 
+ These organelle genomes , which are derived from genomes of symbiotic bacteria ( 5,6 ) , exhibit the most extreme cases of architectural alterations such as an increase in molecular evolutionary rate , inability to recombine , and massive gene loss that sometimes leads to tRNA loss and changes in the genetic code ( 7 ) . 
+ Organelles encode a limited set of proteins and rely on other co-occurring genomes for enzymes and tRNAs ( 3,8,9 ) . 
+ The reduced genomes of some bacterial endosymbionts exhibit similar but less extreme alterations in genome sequence compared with organelles ( 10 ) . 
+ However , unlike organelles , most endosymbionts are still autonomous in the sense that they possess their own core genetic machinery ( 11 -- 13 ) , including the conventional bacterial structure of tRNAs ( 14,15 ) . 
+ Most endosymbionts retain the universal genetic code , but exceptions do exist among the tiniest genomes , in which UGA is sometimes recoded from Stop to Trp ( 16,17 ) . 
+ In contrast , organelle tRNAs and their translational machinery are highly divergent from those of most bacteria ( 1,3,8,18 ) . 
+ The question still remains as to how endosymbiont tRNAs and translational mechanisms differ from those of ancestral free-living genomes that are not reduced . 
+ Overall , we hypothesize that the process of genome shrinkage in endosymbionts results in a reduction of translational efficiency and integrity resembling a transitional stage between free-living ancestors and organelles . 
+ Present day genomic features of bacterial endosymbionts result from their ancient transition from a free-living lifestyle to an obligate intracellular association ( 10 ) . 
+ Translational efficiency is controlled by tRNAs and other genome-encoded mechanisms . 
+ In organelles , translational processes are dramatically altered because of genome shrinkage and horizontal acquisition of gene products . 
+ The influence of genome reduction on translation in endosymbionts is largely unknown . 
+ Here , we investigate whether divergent lineages of Buchnera aphidicola , the reduced-genome bacterial endosymbiont of aphids , possess altered translational features compared with their free-living relative , Escherichia coli . 
+ Our RNAseq data support the hypothesis that translation is less optimal in Buchnera than in E. coli . 
+ We observed a specific , convergent , pattern of tRNA loss in Buchnera and other endosymbionts that have undergone genome shrinkage . 
+ Furthermore , many modified nucleoside pathways that are important for E. coli translation are lost in Buchnera . 
+ Additionally , Buchnera 's A+T compositional bias has resulted in reduced tRNA thermostability , and may have altered aminoacyl-tRNA synthetase recognition sites . 
+ Buchnera tRNA genes are shorter than those of E. coli , as the majority no longer has a genome-encoded 3′ CCA ; however , all the expressed , shortened tRNAs undergo 3′ CCA maturation . 
+ Moreover , expression of tRNA isoacceptors was not correlated with the usage of corresponding codons . 
+ Overall , our data suggest that endosymbiont genome evolution alters tRNA characteristics that are known to influence translational efficiency in their free-living relative . 
+ INTRODUCTION
+ In the final step of protein synthesis , mRNA sequences must be accurately and efficiently translated into amino acid proteins . 
+ Many bacteria that replicate strictly in host intracellular environments possess reduced genomes with sequences that are A+T biased relative to those of their free-living ancestors ( 10,19,20 ) . 
+ One such bacterium demonstrating these genomic shifts is Buchnera aphidicola , an obligate unculturable endosymbiont of aphids ( 21 ) . 
+ Buchnera has coevolved with its aphid hosts for 200 -- 250 million years ( 21 ) , during which its genome shrank to only 416 -- 652 kbp depending on the lineage ( 22 -- 27 ) . 
+ Based on previous gene expression and genomic studies in Buchnera , genome reduction and accelerated sequence evolution have resulted in changes that are hypothesized to lower the efficiency and accuracy of transcription and translation ( 28 -- 31 ) as compared with the free-living relatives . 
+ We predict that Buchnera will also exhibit less optimal tRNA features . 
+ Presently , transcribed tRNAs and associated transcriptional mechanisms , which are key components of efﬁcient and accurate translation , have not been extensively examined in Buchnera or any other bacterial endosymbiont . 
+ Comprehensive characterization of transcribed endosymbiont tRNAs has previously been difficult largely because of the inability to isolate unculturable symbiont tRNAs free of host contamination . 
+ However , analysis of tRNAs beyond the level of DNA-encoded genes can reveal the nature of tRNA maturation , including the diversity of posttranscriptional processing that may occur . 
+ Taking advantage of new methodologies in high-throughput RNA sequencing ( directional RNAseq ) , and the availability of several divergent Buchnera genomes ( 23,25,27 ) , we investigated how genome reduction and A+T richness affect tRNA evolution in this model endosymbiont . 
+ This comparative framework provides us with an understanding of the conservation of tRNA sequences that inﬂuence speciﬁcity in aminoacylation and secondary structure as well as conservation of nucleoside modiﬁcation pathways that inﬂuence anticodon -- codon base pairing ( 1,2,32 ) . 
+ From these data , we were able to address how Buchnera tRNAs and associated transcriptional ﬁdelity mechanisms are altered relative to those of free-living relatives , exempliﬁed by Escherichia coli . 
+ Additionally , because numerous reduced endosymbiont genomes have recently been sequenced ( 10 ) , we investigated whether a pattern of tRNA loss was present among reduced endosymbiont genomes . 
+ MATERIALS AND METHODS Sample preparation
+ Four aphid species , Acyrthosiphon pisum ( strains LSR1 and 5A ) , Acyrthosiphon kondoi ( strain Ak ) , Schizaphis graminum ( strain Sg ) and Uroleucon ambrosiae ( strain UA002 , referred to as Ua ) , were reared in the same growth chamber at 20 °C . A. pisum was reared on seedlings of Vicia faba , A. kondoi on Medicago sativa , U. ambrosiae on Tithonia mexicana and S. graminum on Hordeum vulgare . 
+ For each aphid strain , B. aphidicola cells were ﬁltered from 3 g of mixed age aphids . 
+ Filtration was done according to the study by Moran et al. ( 33 ) , with modiﬁcations as follows . 
+ First , modiﬁed buffer A ( 34 ) was used instead of PBS . 
+ Also , after the 1000 rpm centrifugation step , the pellet was resuspended and used for subsequent ﬁltration steps instead of the supernatant . 
+ After the last centrifugation step , supernatant and the protein layer were discarded and the pellet was immediately immersed with Ambion TRI Reagent Solution . 
+ For RNA extraction , a similar protocol was used as in Hansen and Moran ( 34 ) except that , after step 5 , Qiagen 's miRNAeasy protocol under appendix A from Qiagen 's miRNAeasy Mini Handbook was used to enrich for miRNA ( i.e. RNA < 200 bp ) . 
+ RNA was DNase treated , and quality and quantity were checked as in Hansen and Moran ( 34 ) . 
+ All ﬁltration and extraction materials were treated with RNAse AWAY ( Molecular BioProducts , Inc , CA , USA ) , and all solutions were RNase free . 
+ RNA sequencing, read processing, mapping, expression and identiﬁcation
+ The Yale Keck sequencing center carried out library preparation and sequencing of Buchnera tRNA for all ﬁve aphid strains . 
+ Brieﬂy , for tRNA library preparation , the Illumina mRNA directional sequencing protocol was followed starting at the phosphatase treatment step . 
+ RNA < 200 bp was directionally sequenced one lane per sample with Illumina 35 bp reads . 
+ The CLC Genomic Workbench ( Aarhus , Denmark ) was used for read processing and mapping . 
+ For all reads , small RNA adapters and reads with ambiguous nucleotides were trimmed from reads . 
+ Trimmed reads were then mapped to corresponding Buchnera genomes ( Table 1 ; 23,25,27 ) with CLC Genomic Workbench short read local alignment mapping using the default settings for short reads . 
+ All Buchnera taxa used in this study possess similar genome sizes ( Ap-5A = 642 122 bp ; Ak = 641 794 bp ; Ua = 615 380 bp ; Sg = 641 454 bp ) . 
+ tRNA reads that mapped sense and anti-sense relative to the tRNA gene were converted into Reads Per Kilobase of exon model per million mapped reads ( RPKM ) . 
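The RPKM normalization named above can be sketched as follows. This is a minimal illustration of the standard RPKM formula, not the CLC Genomic Workbench implementation; the function name `rpkm` and its parameter names are assumptions introduced here.

```python
def rpkm(read_count, feature_length_bp, total_mapped_reads):
    """Reads Per Kilobase of feature per Million mapped reads.

    read_count         -- reads mapped to one feature (e.g. one tRNA gene)
    feature_length_bp  -- length of that feature in base pairs
    total_mapped_reads -- all mapped reads in the library
    """
    if feature_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("length and library size must be positive")
    # 1e9 = 1e3 (bp per kilobase) * 1e6 (reads per million)
    return read_count * 1e9 / (feature_length_bp * total_mapped_reads)


# Hypothetical numbers: a 73 bp tRNA gene with 5000 reads in a
# library of 2 million mapped reads.
example_value = rpkm(5000, 73, 2_000_000)
```

Because tRNA genes are short, dividing by length in kilobases inflates the value relative to the raw count, which is one reason RPKM values for tRNAs can reach the hundreds of thousands.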
+ Coverage per base pair was calculated using custom perl scripts and Microsoft Excel and was viewed in Artemis 13.0 ( 35 ) to visualize sense and anti-sense tRNA coverage . 
+ For each Buchnera strain , tRNA genes were annotated using genome annotations in NCBI , tRNAscan-SE 1.21 ( 14,15 ) and Artemis 13.0 ( 35 ) to verify whether 3′ CCA was encoded in the genome . 
+ tRNA 3′ CCA maturation occurs in all organisms and is essential for charging tRNAs with amino acids . 
+ To identify 3′ CCA maturation , the last 3′ 20 bp of annotated tRNAs were retrieved from all high quality raw reads . 
+ Reads that perfectly matched the last 20 bp were binned into the following three categories : ( i ) reads match the 3′ tRNA end and no more nucleotides are processed , ( ii ) reads match the 3′ tRNA end plus additional non-CCA nucleotides are transcribed and ( iii ) reads match the 3′ tRNA end plus CCA is added by maturation . 
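The three-way binning described above could be sketched roughly as below. This is an assumed simplification, not the authors' script: the function name, the category labels and the rule that any extension not starting with CCA counts as non-CCA (including a read that ends partway through a CCA) are all hypothetical choices made for illustration.

```python
def classify_three_prime_end(read, gene_tail):
    """Bin a read by what follows a perfect match to the tRNA gene's 3' tail.

    read      -- read sequence (DNA alphabet, sense orientation)
    gene_tail -- last 20 bp of the annotated tRNA gene
    Returns one of 'no_extension', 'non_cca_extension', 'cca_added',
    or None when the read does not contain the tail at all.
    """
    start = read.find(gene_tail)
    if start == -1:
        return None
    extension = read[start + len(gene_tail):]
    if extension == "":
        return "no_extension"        # category (i)
    if extension.startswith("CCA"):
        return "cca_added"           # category (iii)
    return "non_cca_extension"       # category (ii)


# Made-up 20 bp tail used only to exercise the function.
tail = "ACGTACGTACGTACGTACGA"
labels = [classify_three_prime_end(r, tail)
          for r in ("GG" + tail, tail + "CCA", tail + "TTG")]
```

For tRNA genes that already encode a 3′ CCA, the tail passed in would end in CCA, so a second CCA in the extension would indicate the dual-CCA case.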
+ To analyse A+T richness in Buchnera and E. coli CDS and tRNA genes the program EMBOSS ( 36 ) was used . 
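As a rough illustration of the quantity being compared, A+T richness of a sequence can be computed as below. This is a plain-Python sketch and does not reproduce EMBOSS or its options; the function name `at_fraction` is an assumption, and ambiguous symbols such as N are simply ignored.

```python
def at_fraction(sequence):
    """Fraction of A/T (or U) among unambiguous bases in a sequence."""
    bases = [b for b in sequence.upper() if b in "ACGTU"]
    if not bases:
        raise ValueError("sequence contains no unambiguous bases")
    at_count = sum(b in "ATU" for b in bases)
    return at_count / len(bases)


# G+C fraction is the complement of the A+T fraction.
gc_fraction = 1 - at_fraction("AATTGGCC")
```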
+ To calculate codon usage of 50 highly expressed Buchnera genes ( 37 ) , E-cai ( 38 ) was used . 
+ After consensus , RNAseq reads corresponding to tRNA genes were mapped and assembled , tRNA species were identiﬁed with tRNAscan-SE 1.21 , with E. coli homology Blast searches ( 39 ) , and with veriﬁcation of the presence of signature identity elements relative to E. coli ( 32 ) . 
+ Survey of tRNA complements in small genomes
+ The last comprehensive survey of tRNA genes from bacteria was conducted in 2002 and only included the endosymbiont genome of Buchnera strain APS ( 40 ) . 
+ Because several smaller endosymbiont genomes have been sequenced since 2002 , we surveyed several more genomes that varied drastically in genome size and phylogenetic placement . 
+ The tRNAscan-SE Genomic tRNA database ( 41 ) was used to characterize the presence of tRNA gene isoacceptors ( i.e. a tRNA species that binds to one or more codons for a particular amino acid residue ) in 16 genomes . 
+ High throughput detection of modiﬁed nucleoside bases in tRNAs
+ During library preparation , some modiﬁed bases cause the reverse transcriptase to either fall off at the modiﬁed position , and/or to incorporate a ` mismatch ' relative to the reference genome sequence ( 42,43 ) . 
+ To detect modiﬁed bases and potential posttranscriptional processing , we screened for mismatches in tRNA reads relative to the reference tRNA gene similar to Iida et al. ( 42 ) and Findeiß et al. ( 43 ) . 
+ After mapping only the sense tRNA reads in CLC ( using the same mapping parameters as discussed earlier ) , we ran CLC single-nucleotide polymorphism ( SNP ) analyses to detect mismatches . 
+ Threshold criteria for counting a mismatch were established by identifying conserved mismatches in both Ap-5A and Ap-LSR1 ( two different strains from the same aphid species ) . 
+ These two strains shared 38 mismatches for which the mismatch rate was more than 1 % per base ( i.e. above Illumina 's expected error rate per base ) and the alternative variant count was at least eight reads . 
+ This mismatch criterion was then used to detect mismatches in other strains for a total of four divergent Buchnera taxa ( Ap , Ak , Ua and Sg ) . 
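The threshold described above (mismatch rate above 1 % per base and at least eight reads supporting the alternative variant) can be expressed as a small filter. This is a sketch under assumptions, not the CLC SNP analysis: the function names, the input layout (per-position counts keyed by a hypothetical `(tRNA, position)` tuple) and the example counts are invented for illustration.

```python
def passes_mismatch_threshold(ref_count, alt_count,
                              min_rate=0.01, min_alt_reads=8):
    """True if a position meets both the rate and read-count criteria."""
    total = ref_count + alt_count
    if total == 0:
        return False
    return (alt_count / total) > min_rate and alt_count >= min_alt_reads


def shared_mismatches(strain_a, strain_b):
    """Positions passing the threshold in both strains.

    Each argument maps (tRNA name, position) -> (ref_count, alt_count).
    """
    def passing(counts):
        return {site for site, (ref, alt) in counts.items()
                if passes_mismatch_threshold(ref, alt)}
    return passing(strain_a) & passing(strain_b)


# Hypothetical counts for two strains of the same aphid species.
ap_5a = {("Arg", 34): (390, 610), ("Phe", 37): (980, 20), ("Gly", 10): (995, 5)}
ap_lsr1 = {("Arg", 34): (300, 700), ("Phe", 37): (999, 1), ("Gly", 10): (990, 10)}
conserved = shared_mismatches(ap_5a, ap_lsr1)
```

Requiring both conditions guards against two failure modes: a high rate at a poorly covered position, and a large absolute count that is still within the expected sequencing error rate.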
+ Predicted tRNA-modified bases and their pathways for each Buchnera tRNA were obtained from E. coli homologs using both http://modomics.genesilico.pl/pathways/ ( 44 ) and http://www.ecocyc.org/ ( 45 ) . 
+ Divergent Buchnera genomes ( 23,25,27 ) were searched for modiﬁcation pathway enzymes using E. coli homologs using Blastp ( 39 ) . 
+ Infernal ( 46 ) was used to generate tRNA sequence and secondary structure alignments among Buchnera strains and E. coli . 
+ The covariance model , RF00005.cm , was used , which accounts for tRNA secondary structure constraints . 
+ Using Infernal output , 4sale ( 47 ) was used to compute pairwise compensatory substitution tables from stems for all tRNAs among Buchnera strains and E. coli . 
+ Stability of tRNA secondary structure was measured as Delta G ( ΔG ) , the change in Gibbs Free Energy ( in units of kcal/mole ) . 
+ Thus , the more negative ΔG is , the more thermodynamically stable the tRNA secondary structure . 
+ ΔG was computed for tRNAs of each strain individually using RNAalifold ( 48,49 ) , with folding constraints generated by tRNAscan-SE 1.21 ( 14,15 ) . 
+ All raw sense and anti-sense tRNA data were submitted to NCBI Genbank under SRA submission : SRA049863.3 , under Bioproject # s : ( i ) PRJNA82811 , ( ii ) PRJNA82809 , ( iii ) PRJNA82797 , ( iv ) PRJNA82793 , ( v ) PRJNA82789 . 
+ All paired-sample t tests ( percent guanine-cytosine ( % GC ) , tRNA length ) , correlation ( pairwise RPKM comparisons ) and regression ( codon usage and tRNA expression ) statistics were carried out using IBM SPSS Statistics for Mac , standard version 19.0 ( 2010 ; New York , USA ) . 
+ RESULTS
+ All Buchnera tRNAs are transcribed
+ For all Buchnera genomes , tRNA genes occur in the same genomic positions ( Figure 1 ) . 
+ Based on tScan and blastn detection of homology with E. coli the same 32 tRNA genes and 29 anticodon types are conserved across Buchnera taxa ( Figure 1 ) . 
+ As expected , directional RNAseq reads map primarily in the sense direction of tRNA genes , with antisense reads averaging less than 1 % of the sense reads ( Table 1 , Figure 1 ) . 
+ All Buchnera tRNA genes are expressed in the sense direction , but some lack antisense expression , depending on strain , and sense expression is always higher than antisense expression ( except for Phe GAA in Ak and Sg ) ( Figure 1 ) . 
+ tRNA sense expression is positively correlated across divergent Buchnera taxa ( Table 2 ) . 
+ The level of antisense expression is highly correlated across all Buchnera taxa , but the correlation is less for Buchnera-Sg , the most divergent taxon ( Table 2 ) . 
+ Transcriptional start sites and coverage curves for antisense RNAs varied widely across Buchnera taxa . 
+ Nevertheless , conserved 5′ transcriptional start sites and coverage curves were identified for several antisense RNAs that occurred on or near tRNA genes for all five Buchnera taxa ( Supplementary Table S1 ) . 
+ Conservation of tRNA identity elements in Buchnera
+ Recognition of tRNAs by tRNA synthetases is essential to the ﬁdelity of translation . 
+ Aminoacyl-tRNA synthetase ( aaRS ) must recognize multiple tRNA isoacceptors ( i.e. different tRNA species that bind to alternative codons for the same amino acid residue ) but discriminate against others . 
+ This recognition is dependent on tRNA identity elements , consisting of evolutionarily conserved bases at specific positions of tRNAs ( Giegé et al. 1998 ) . 
+ Based on RNAseq data from all taxa , unmodified identity elements for each tRNA are identical to those in E. coli except for base substitutions in Cys GCA ( G15 to U15 ; A13 to G13 ) , Ser GGA ( G73 to A73 ) , Ser GCT ( variable loop 1 bp shorter , except in Sg ) and Ala GGC ( G20 to U20 , 5A and Ua only ; G20 to C20 , Sg and Ak only ) . 
+ Based on blastp analyses , all 20 cognate aaRS are encoded within each Buchnera genome . 
+ In contrast to E. coli and most other organisms with nonreduced genomes , Buchnera does not encode multiple tRNA genes with matching anticodons , except for three tRNA genes encoding the anticodon CAU . 
+ Two of these genes encode either an initiation or elongation Met tRNA based on tRNA identity elements and homology ( Table 3 ) . 
+ The other tRNA gene encoding a CAU anticodon possesses homology and identity elements corresponding to the Ile LAU anticodon ( Table 3 ) . 
+ * Bold = signiﬁcant ( P < 0.01 ) ; unbold = nonsigniﬁcant ( P > 0.05 ) . 
+ N = 32 . 
+ Selective loss of tRNA isoacceptors from Buchnera genomes
+ Numerous tRNA isoacceptors are present in E. coli but missing from all Buchnera strains . 
+ Many Buchnera tRNA isoacceptors that belong to 4-codon family boxes and to two-codon families ( 5′-NNR codon type ) have been lost from Buchnera genomes ( Table 3 ) . 
+ 5′-CNN anticodons were preferentially lost in family boxes corresponding to Leu , Gly , Ser , Thr and Pro . 
+ Only one family box , corresponding to Pro , lost both 5′-CNN and 5′-GNN anticodons . 
+ For two-codon families , a 5′-CNN anticodon was lost from Gln ( and Leu and Arg for 6-codon families ) , relative to E. coli ( Table 3 ) . 
+ Based on Watson and Crick base-pairing and revised wobble rules ( 50,51 ) , all tRNA isoacceptors encoded and expressed in Buchnera can base pair with the 61 possible codons ( Table 3 ) , which are all still encoded in Buchnera 's protein-coding genes at variable frequencies . 
+ The pattern of tRNA gene isoacceptor loss was examined in 16 bacterial taxa representing a wide range of genome sizes and phylogenetic associations , including some with extremely reduced genomes ( Figure 2 ) . 
+ Reduced genomes show common patterns of retention of particular anticodons . 
+ For family box codons , 5′-CNN anticodons followed by 5′-GNN anticodons are consistently eliminated from the small genomes . 
+ For 5′-NNR two-box codons , 5′-CNN anticodons are eliminated . 
+ In the most reduced genomes , only 5′-UNN anticodons remain for both family box and two-box codons . 
+ Unmodified 5′-U anticodons can wobble and pair with all four base combinations for family box codons ( 50 ) . 
+ Therefore , for 5′-NNR two-box codons , the 5′-U of anticodons must be modified to prevent mistranslation of neighboring two-box ( NNY ) codons ( 1 , 2 ) ( e.g. an unmodified 5′ U in a Gln 5′-UUG anticodon can mispair with His codons 5′-CAU and CAC [ Table 3 ] ) . 
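The Gln/His example above can be worked through with a small lookup. The sketch below is a deliberately simplified wobble table assumed for illustration (unmodified 5′-U reads all four third-position bases, as stated in the text; G reads C and U; C reads only G; A reads only U; inosine reads U, C and A); real decoding depends on the specific modification at position 34, and the names `WOBBLE`, `COMPLEMENT` and `codons_read` are hypothetical.

```python
# Watson-Crick partners for anticodon positions 35 and 36.
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}

# Codon third-position bases readable by each anticodon 5' (N34) base.
# Simplified, assumed table; "I" stands for inosine.
WOBBLE = {"C": "G", "A": "U", "G": "CU", "U": "AGUC", "I": "UCA"}


def codons_read(anticodon):
    """Set of codons (5'->3') an anticodon (5'->3') could pair with."""
    n34, n35, n36 = anticodon.upper().replace("T", "U")
    first = COMPLEMENT[n36]    # codon position 1 pairs with anticodon N36
    second = COMPLEMENT[n35]   # codon position 2 pairs with anticodon N35
    return {first + second + third for third in WOBBLE[n34]}


gln_unmodified = codons_read("UUG")   # includes the His codons CAU and CAC
arg_inosine = codons_read("ICG")      # CGU, CGC and CGA
```

Under this table an unmodified Gln UUG anticodon reads CAA and CAG (Gln) but also CAU and CAC (His), which is the mistranslation risk the passage describes.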
+ Based on E. coli tRNA homologs , 26 different types of nucleoside modiﬁcations are predicted to occur in Buchnera tRNAs ( Table 4 , Supplementary Table S2 , Supplementary Dataset 1 ) . 
+ Nine of these modiﬁcations are important for the efﬁciency and ﬁdelity of protein synthesis and occur in N34 tRNA positions ( wobble ) of E. coli ( Table 4 ) . 
+ We expect five of these N34 modifications to be retained to code for all cognate codon pairs and prevent mistranslation of other amino acids ( e.g. mnm5U , mnm5s2U , cmnm5Um , I and k2C ) . 
+ An inosine ( I ) modification is important in E. coli because 5′-A from anticodon Arg ACG is modified into I , which can wobble and pair with Arg codons CGA , CGU and CGC ( 55 ) . 
+ Lysidine ( k2C ) is an important modification in E. coli because 5′-C from anticodon Ile CAU is modified into k2C ( L ) , which pairs with Ile codon AUA ( instead of the Met codon AUG ) ( 59 ) . 
+ Other expected N34 modifications ( mnm5U , mnm5s2U and cmnm5Um ) are important for modifying anticodon 5′ U for NNR two codon boxes , thus preventing mistranslation ( 1,2 ) . 
+ Based on Buchnera genome annotations , entire pathways are only present for expected wobble bases I , k2C and cmnm5Um ( Table 4 ) ; however , some pathways are only missing the last enzyme in a pathway ( e.g. mnm5U and mnm5s2U ) , and/or are still unknown in E. coli . 
+ High throughput mismatch evidence ( see ` Materials and Methods ' section ) shared by multiple taxa supports the presence of a modified nucleoside at 5′-A from anticodon Arg ACG in all Buchnera strains . 
+ These data support the presence of an inosine modification in all taxa . 
+ For example , we found a high frequency of anticodon 5′-ACG transcribed as 5′-GCG , where the frequency of 5′-G/A at this wobble base position was Ap-5A = 61/39 % ; Ap-LSR1 = 70/30 % ; Ak = 69/31 % ; Ua = 27/73 % ; and Sg = 72/28 % . 
+ Presence of transcripts containing a 5′-G for the Arg ACG anticodon is strong indirect evidence for an inosine modification . 
+ During the reverse transcription process , the modified nucleoside inosine base pairs with C residues , and therefore ` G ' is found in the consensus cDNA sequence instead of ` A ' ( 60 ) . 
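The G/A percentages reported for the wobble position can be derived from the bases observed at that position across mapped reads. The sketch below assumes the per-read base calls at one position have already been extracted; the function name `base_percentages` and the example input are hypothetical and are not taken from the study's data.

```python
from collections import Counter


def base_percentages(observed_bases):
    """Percentage of each base among the calls observed at one position."""
    counts = Counter(base.upper() for base in observed_bases)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no base calls supplied")
    return {base: 100 * n / total for base, n in counts.items()}


# Made-up calls at the Arg ACG wobble position: reads carrying G
# reflect inosine read as G by reverse transcriptase, reads carrying A
# reflect the unmodified reference base.
wobble_calls = "G" * 61 + "A" * 39
percentages = base_percentages(wobble_calls)
```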
+ Conserved high throughput mismatch evidence for Ap-5A , Ap-LSR1 and Ak also supports the presence of a modified base at N34 for Lys TTT , suggesting that mnm5s2U is present in these strains even though the E. coli version of the pathway appears incomplete in Buchnera . 
+ Error evidence was not detected for other expected modiﬁed wobble positions relative to E. coli , even though full pathways are retained in the genome ( Table 4 ) . 
+ Other tRNA modiﬁcations that are very important for the ﬁdelity of protein synthesis are N37 modiﬁcations . 
+ N37 modiﬁcations are known to stabilize weak A : U and U : A base pairing between N36 of the anticodon and N1 of the codon ( 1,2,51 ) . 
+ Based on in vitro experiments , N37 modiﬁcations are known to increase the interaction of the codon with the anticodon , preventing miscoding of amino acids and frameshifts ( 52,53,61 -- 63 ) . 
+ In turn , to maintain efﬁcient translation , we expect these modiﬁcations to be retained . 
+ Based on modifications for the homologous tRNAs in E. coli , seven important N37 modified nucleosides are predicted in Buchnera . 
+ Among Buchnera genomes , four N37 nucleosides pathways are retained , two are missing , and one has an unknown pathway in E. coli ( Table 4 ) . 
+ High throughput mismatch evidence supports the presence of a modified base at N37 for Phe GAA , Pro TGG , Leu GAG and Leu TAG and thus suggests that ms2i6A , m1G , xG and xG , respectively , are present in all taxa . 
+ However , no mismatch was detected in Sg for Leu GAG . 
+ The tRNA modifications at positions other than N34 and N37 that are supported by mismatch evidence are shown in Supplementary Table S2 , and Supplementary Dataset 1 . 
+ Mismatch evidence was also found at positions at which E. coli does not process modiﬁed nucleosides , suggesting the presence of new modiﬁed nucleoside sites and/or RNA editing of mature tRNAs ( Supplementary Dataset 1 ) . 
+ Collectively , all mismatch frequencies ( with the exception of Arg ACG ) were dominated by the reference sequence base at a frequency of 90 -- 99 % relative to mismatches for all taxa . 
+ Mismatches were primarily not changes to a single nucleotide base , but were composed of three different bases other than the reference base . 
+ No relationship between codon frequencies and tRNA expression
+ In many species , tRNA abundances are positively correlated with codon usage for highly expressed genes ( 64,65 ) . 
+ Anticodons of highly expressed tRNAs correspond to codons that are used frequently in these genes , thus improving the efficiency of translation ( 64,65 ) . 
+ Based on Watson and Crick and revised wobble base-pairing rules ( 50,51 ) , each Buchnera isoacceptor was paired with its corresponding codon pair . 
+ Met CAU , the only duplicate anticodon coding for the same codon , was excluded from analysis . 
+ The relationship between percent average codon usage of highly expressed genes and corresponding tRNA isoacceptor expression was examined for each Buchnera strain . 
+ No signiﬁcant relationship was found between average codon usage of 50 highly expressed genes in Ap-5A ( on leading and lagging strands ) and cognate tRNA isoacceptor sense expression ( Figure 3 ) . 
+ No signiﬁcant relationship was found on examining the relationship between highly expressed Buchnera genes ( four chaperones and 54 ribosomal proteins ) and cognate tRNA isoacceptor sense expression for all taxa ( Supplementary Figure S1A and S1B ) . 
+ Examination of codon usage and tRNA expression scatterplots reveals that most tRNA isoacceptors , regardless of codon usage , are expressed at similar levels ( e.g. for Ap-5A in RPKM the 75th percentile = 843 950 , median = 309 742 and max = 4 407 138 ; Figure 3 ) . 
+ Trp CCA is the most highly expressed isoacceptor in all taxa ( except Ua ) , even though the corresponding codon occurs at low frequency ( Figure 3 and Supplementary Figure S1 ) . 
+ Buchnera tRNAs maintain secondary structure with compensatory base substitutions
+ As expected , Buchnera CDS are signiﬁcantly more A+T rich relative to CDS of E. coli [ Figure 4 ( c ) ] . 
+ Within each Buchnera genome , tRNA genes are 2.2-fold more G+C rich relative to CDS , indicating that selection conserves higher % G+C in tRNA genes . 
+ Nevertheless , Buchnera tRNA genes are signiﬁcantly more A+T rich than homologs in E. coli [ Figure 4 ( c ) ] . 
+ Stability of tRNA secondary structure can decrease with a reduction in % GC , especially in stem structures . 
+ Because Buchnera tRNAs are more A+T rich than those of E. coli [ Figure 4 ( c ) ] , we measured the stability of Buchnera tRNA secondary structure . 
+ G was signiﬁcantly more negative in E. coli tRNAs relative to homologs in Buchnera for all strains , indicating that Buchnera tRNAs have reduced stability in vitro [ Figure 4 ( b ) ] . 
+ Whether they have reduced stability in vivo , where stabilizing proteins may play a role , remains to be tested . 
+ Two tRNAs with the weakest secondary structure in all Buchnera relative to E. coli were Val GAG and Trp CCA ; both tRNAs possess numerous compensatory and single base substitutions in the stem regions [ Figure 4 ( a ) ] . 
+ Buchnera tRNAs are more A+T biased and display weaker secondary structure than those of E. coli ( Figure 4 ) . 
+ However , a high frequency of compensatory base substitutions is expected in the stem regions as a mechanism for maintaining functionality of these essential molecules . 
+ Relative to E. coli , a total of 37 -- 42 compensatory base substitutions were found in Buchnera tRNA stem regions ( Table 5 ) . 
+ Many of these compensatory substitutions were C/G to T/A directional changes ( Table 5 ) . 
+ Buchnera tRNA gene shrinkage and compensatory 3′ maturation
+ Genome reduction primarily reﬂects loss of coding genes , as reduction in gene length is minor ( < 1 % , 37 ) , and gene packing is similar for bacterial genomes of different sizes ( 66 ) . 
+ However , Buchnera tRNA genes are often shorter in length than their homologs in E. coli [ Figure 5 ( a ) ] . 
+ The difference in length is typically 3 bp and mostly reflects the loss of encoded 3′ CCA in the Buchnera tRNA genes . 
+ At the 3′ end of tRNAs , CCA is required for amino acid activation , and must either be encoded in the tRNA gene or added during tRNA maturation by the CCA-adding enzyme . 
+ Although E. coli and other close relatives of Buchnera such as Vibrio and Pseudomonas spp. all encode 3′ CCA in all tRNA genes except that for selenocysteine , only half of Buchnera tRNA genes encode 3′ CCA [ 14 -- 17 depending on strain , Figure 5 ( b ) ] . 
+ The remaining Buchnera tRNA genes have lost the 3′ encoded CCA . 
+ Our analysis of directional RNAseq reads indicates that the mature transcript of these genes possesses a CCA at the 3′ end [ Figure 5 ( b ) ] , implying CCA-addition . 
+ Some Buchnera tRNA genes with 3′ CCA encoded also displayed 3′ CCA maturation [ Figure 5 ( b ) ] , resulting in double or triple CCA at the 3′ end of tRNAs . 
+ Recently , it was shown that tRNAs with dual 3′ CCA are targeted for degradation ( 67 ) . 
+ More specifically , if a tRNA has 5′ Gs on bp 1 and 2 , and its acceptor stem is structurally unstable , then the CCA-adding enzyme marks the unstable tRNA by adding dual 3′ CCAs , targeting it for degradation by RNase R ( 67 ) . 
+ Such degradation also seems possible in Buchnera strains , which encode both the CCA-adding enzyme and RNase R . 
+ Thus , we examined all Buchnera tRNAs with dual and triple 3′ CCA maturation . 
+ First , we noted that all E. coli tRNAs with a 5′ G at the 1st and 2nd base position encode dual or triple CCA on the 3′ end of the tRNA gene [ Figure 5 ( c ) ] . 
+ Based on tRNAscan-SE 1.21 , the penultimate CCA is always incorporated into the 3′ acceptor stem , exposing a single 3′ CCA for activation . 
+ Most Buchnera tRNAs that display dual or triple 3′ CCA maturation still retain a 5′ G at the 1st and 2nd bases and are homologs to dual or triple 3′ CCA encoded E. coli tRNAs [ Figure 5 ( c ) ] . 
+ Three strain-specific tRNAs with dual 3′ CCA maturation do not have E. coli homologs with dual CCAs encoded . 
+ These Buchnera tRNAs also do not encode 5′ Gs at the 1st and 2nd base . 
+ All Buchnera tRNAs with dual or triple 3′ CCA maturation incorporate the 2nd to last CCA into the 3′ acceptor stem as in E. coli , except for one case , tRNA Leu TAA in Ak [ Figure 5 ( c ) ] . 
+ DISCUSSION
+ The efﬁciency and ﬁdelity of translation is reinforced by many mechanisms encoded in genomes . 
+ In reduced genomes , mutation rates are typically high , and selection becomes less effective in maintaining translational mechanisms . 
+ In this study , we found that bacterial endosymbiont lineages ( Buchnera ) that experience relaxed selection display less optimal tRNA characteristics relative to those of their free-living relative E. coli . 
+ Gene loss and A+T mutational bias in Buchnera have led to the loss of tRNA isoacceptors and loss of modified base pathways , the reduction of tRNA gene length , and the accumulation of base substitutions and indels ( insertions / deletions ) in tRNA sequences that weaken tRNA secondary structure and possibly aminoacyl-tRNA synthetase recognition . 
+ These tRNA characteristics are conserved across four Buchnera lineages spanning 70 million years of divergence and may result in reduced translational efﬁciency and ﬁdelity relative to their ancestors . 
+ However , we did detect compensatory base substitutions in Buchnera tRNAs , which are expected to maintain secondary structure of tRNA stem regions . 
+ Additionally , RNAseq reads reveal novel 3′ maturation processes that compensate for tRNA gene length reduction . 
+ Divergent Buchnera taxa in this study encode and express the same 32 tRNA genes composed of 32 different isoacceptor types ( Figure 1 ) . 
+ In turn , no duplication of tRNA gene isoacceptors was found . 
+ Based on a survey of 50 eukaryotic , eubacterial , and archaeal genomes , low tRNA gene redundancy ( i.e. only one or two gene copies of a particular isoacceptor ) was only found in all archaeans and several bacterial genomes , and was approximately correlated with genome size ( 40 ) . 
+ In Buchnera , because of modiﬁed wobble rules ( 50,51 ) , all mature tRNAs expressed can theoretically base pair with the 61 possible codons ( Table 3 , Figure 1 ) , which are all still encoded in Buchnera CDS . 
+ One special Buchnera isoacceptor that has been identified previously in Buchnera-Ap ( taxa type strain APS ) is tRNA Ile CAU ( 40 ) , where 5′-C is modified into lysidine by the enzyme TilS in E. coli ( 55 ) , which all Buchnera strains still encode . 
+ This special Ile CAU isoacceptor codes for Ile instead of Met due to a wobble modification , and is ubiquitous in Eubacteria and Archaea ( 40 ) . 
+ During genome reduction , Buchnera has preferentially lost 5′-CNN , and to a lesser extent , 5′-GNN anticodons in family boxes and 5′-CNN anticodons from two-codon NNR families ( Table 3 ) . 
+ This pattern of tRNA isoacceptor loss is common for many bacteria with reduced genomes ( Figure 2 ) , and is most likely related to gene deletion processes . 
+ Selective loss of these speciﬁc isoacceptors in family boxes and NNR two-codon families in Eubacteria was observed in previous studies ( 1,40,68,69 ) but was related to A+T sequence bias not deletion processes ( 1,70 ) . 
+ We hypothesize that genome reduction , which is correlated with A+T bias , is the most likely explanation for this pattern of tRNA isoacceptor loss . 
+ First , the potential for wobble in codon -- anticodon basepairing implies that some tRNA isoacceptors are not essential for pairing with corresponding codons ( e.g. 5′-CNN , 5′-GNN anticodons ) and can be eliminated through mutation and deletion . 
+ Second , due to wobble rules , 5′-GNN anticodons followed by 5′-UNN anticodons are the most promiscuous isoacceptors when pairing with cognate codons ; thus , it is not surprising that 5′-UNN is always retained in family box and two-box NNR codons in the most reduced genomes . 
+ In turn , 5′-UNN anticodons are probably retained because of their ability to recognize alternative codons rather than because of the high frequency of cognate codons in A+T rich CDS . 
+ Typically in bacteria and eukaryotes 5′-CNN and 5′-GNN anticodons of family boxes and 5′-CNN anticodons from two-codon families along with 5′-U anticodon modiﬁcations extending wobble are maintained by selection , because they increase the efﬁciency of translation ( 1,71 ) . 
+ We predict that the loss of tRNA isoacceptors in Buchnera as well as other endosymbionts potentially results in less efﬁcient translation . 
+ Numerous unmodiﬁed nucleotides at speciﬁc nucleotide positions on tRNA isoacceptors are conserved phylogenetically and are known to play crucial roles in deﬁning tRNA speciﬁcity for aminoacylation ( 32,72 ) . 
+ These conserved nucleotides are called identity elements and are required for proper recognition by the cognate aaRS in addition to playing roles as deterrents to false recognition ( 32 ) . 
+ Our results reveal that most Buchnera tRNAs have maintained identity elements homologous to those in E. coli , with the exceptions of Cys GCA , Ser GGA , Ser GCT and Ala GGC . 
+ In E. coli tRNA Cys , the identity elements G15 · G48 form an unusual tertiary base pair called a Levitt pair ( 73 ) . 
+ Additionally , the E. coli identity elements A13 · A22 are important in determining the structure of G15 · G48 ( 74 ) . 
+ Collectively , these E. coli identity elements are required for CysRS recognition due to their role in RNA tertiary structure ( 73 ) . 
+ In all Buchnera taxa , tRNA Cys G15 · G48 has mutated to U15 · G48 and A13 · A22 has mutated to G13 · A22 . 
+ Hou et al. ( 73 ) found that when G15 · G48 is mutated to U15 · G48 , its backbone conﬁguration is similar to the wild type tRNA Cys ; however , only partial aminoacylation ( 46.2 % ) occurs relative to the wild type . 
+ How both types of changes in identity element together affect tertiary structure is unknown . 
+ In Buchnera tRNA Ala GGC , the identity element G20 is mutated to U20 in strains 5A and Ua and to C20 in Ak and Sg . 
+ In E. coli tRNA Ala , these same base changes VGC were shown to result in 6 and 50 reductions in alanine charging activity , respectively , relative to native tRNA Ala ( 75 ) . 
+ Buchnera Ala does not possess this VGC UGC mutation . 
+ Potentially , if this mutation is deleterious in Ala GGC recognition , Ala UGC can wobble to all four alternative codons for the family box codon family for alanine . 
+ Interestingly , the smallest sequenced genome of Buchnera , for the host Cinara cedri , retains the same tRNA isoacceptors and aaRSs as other Buchnera taxa examined in this study ; however , the Ala tRNA GGC gene has been lost , resulting in a total of only 31 tRNA genes . 
+ In Buchnera tRNA Ser GGA , the identity element G73 ( the discriminator base ) has mutated to A73 . 
+ Generally a mutation in the discriminator base is known to result in the loss of cognate aminoacyl-tRNA synthetase recognition ; however , Shimizu et al. ( 76 ) demonstrated that any of the four bases substituted in the discriminator base of E. coli Ser tRNA resulted in the same level of aminoacylation . 
+ Nevertheless , G73 in Ser tRNA is phylogenetically conserved ( 72 ) and has been shown to play minor roles in SerRS discrimination ( 77,78 ) . 
+ Additionally , in E. coli Ser tRNA , the variable region plays a very important role as an identity element ( 77,79 ) . 
+ In all Buchnera taxa , except Sg , the variable region length of the Ser GCT isoacceptor is 1 bp shorter than the E. coli Ser GCT isoacceptor . 
+ In summary , it is unknown how all these mutated identity elements affect Buchnera translation , but the same mutations in E. coli are known to signiﬁcantly reduce the efﬁciency of aminoacylation . 
+ In addition to requiring speciﬁcity in aminoacylation , reliable and efﬁcient translation requires the anticodon to correctly pair with its codon . 
+ Modiﬁed nucleosides of tRNAs are essential mechanisms reinforcing translational ﬁdelity and efﬁciency , especially at the wobble ( N34 ) and 3′ position immediately adjacent to the anticodon ( P37 ) ( 1,2,51 ) . 
+ Based on E. coli tRNA homologs , we expect 16 different types of modiﬁed bases to be present in the remaining 32 Buchnera tRNAs , for both N34 and N37 positions . 
+ In E. coli , 13 of these modiﬁed base pathways are known and Buchnera encodes complete pathways for six of these ( Table 4 ) . 
+ All Buchnera taxa have lost enzymes responsible for encoding N37 modiﬁed bases m2A and m6A , which are important in stabilizing 5′-NNC/G anticodons ( 2 ) ( Table 4 ) . 
+ Enzymes that synthesize the N37 modiﬁcation m6t6A are conserved in only half of Buchnera taxa ; this modiﬁcation is known to slightly increase the efﬁciency of base pairing of the anticodon of Thr GGU to the codon ACC in E. coli ( 54 ) . 
+ All N37 modiﬁed base pathways important for preventing frameshifts and stabilizing A : U and U : A at the wobble position of the anticodon and the ﬁrst position of the codon were retained in all Buchnera taxa ( Table 4 ) . 
+ These mechanisms may be essential for the ﬁdelity of translation , especially for A+T rich genomes . 
+ Modiﬁed nucleosides at the wobble base position ( N34 ) of the anticodon are important for encoding the right amino acid , extending or restricting wobble , increasing the efﬁciency of base pairing and preventing frameshifts ( 2,53,55,56,58,59 ) . 
+ Buchnera taxa all encode the enzyme TilS that is essential for the synthesis of the modiﬁed base lysidine , and is important for encoding the amino acid Ile instead of Met ( 59 ) . 
+ All Buchnera taxa also encode the core enzymes MnmE and MnmG that are important for synthesizing the modiﬁed bases mnm5U , mnm5s2U , and cmnm5Um , which restrict 5′-U wobble in NNR two-box codons , including Arg and Leucine ( Table 4 ) . 
+ All of these pathways are complete except for MnmC , which is involved in the last step for both modiﬁed bases , mnm5U and mnm5s2U . However , RNAseq mismatch evidence supports the presence of a modiﬁed base at the expected position of mnm5s2U ( Table 4 ) . 
+ Interestingly , the genes encoding MnmA , MnmE , MnmG , and IscS or SufS , but not MnmC are retained in several tiny endosymbiont genomes ( 10 ) . 
+ Conservation of these enzymes in reduced genomes indicates that these enzymes or derivatives are important for the production of the modiﬁed bases mnm5U , cmnm5Um , and especially mnm5s2U , which is essential for preventing frameshifts and restricting wobble in NNR two codon boxes ( Glu , Lys and Gln ) , thereby preventing the miscoding of amino acids . 
+ For incomplete pathways producing modiﬁed bases mnm5U and mnm5s2U , either a derivative may be synthesized and/or the insect host may import MnmC . 
+ For example , the pea aphid , A. pisum expresses its mnmC homolog ( XP_003245837 ) in both its body and in the specialized aphid cells ( bacteriocytes ) that contain Buchnera cells ( 34 ) . 
+ Another key enzyme that is retained in Buchnera is TadA , which is responsible for synthesizing inosine in E. coli ( 55 ) . 
+ This wobble modiﬁcation is present on Arg ACG in many bacteria and can wobble to three alternative codons of Arg ( 2,59 ) . 
+ RNAseq mismatch evidence highly supports this modiﬁcation , as inosine is recognized as G during the reverse transcription process ( 60 ) , and therefore we were able to measure a high frequency of modiﬁed Arg ACG transcripts from all Buchnera taxa . 
+ Unfortunately , other modiﬁed bases do not appear to be recognized as speciﬁc bases and in turn incorporate different frequencies of any of the four bases during reverse transcription of modiﬁed transcripts ( 42,43 ) . 
+ Collectively , RNAseq evidence supported the presence of ﬁve modiﬁed bases , four in which the pathways are known and present ( or near present for mnm5s2U ) and one in which the pathway is unknown ( Table 4 ) . 
+ If Buchnera tRNAs can be isolated without host contamination , modiﬁed base presence and identity can be conﬁrmed . 
+ In many bacterial species , tRNA abundances are positively correlated with codon usage for highly expressed genes , thus increasing translational efﬁciency ( 64,65 ) . 
+ In addition to analysing speciﬁc tRNA characteristics that inﬂuence the accuracy and efﬁciency of translation , we examined whether codon usage correlates with tRNA expression . 
+ We found that tRNA sense expression is highly correlated across Buchnera taxa ( Table 2 ) , and many tRNA isoacceptors are expressed at similar levels within taxa ( Figure 3 , Supplementary Figure S1 ) . 
+ A previous microarray study suggested that tRNA expression and codon usage of 50 highly expressed genes in Buchnera-Ap were positively correlated ( 37 ) , but the relationship was weak and expression of sense and antisense tRNAs were not distinguished , possibly confounding results . 
+ Our directional RNAseq data show no relationship between tRNA expression and codon usage , for the same set of highly expressed genes in Buchnera-Ap under similar conditions ( Figure 3 ) . 
+ Furthermore , no relationship was detectable in three other Buchnera taxa ( Supplementary Figure S1 ) . 
+ Collectively , these results suggest that selection is not maintaining codon bias for highly expressed proteins . 
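The comparison of codon usage with tRNA expression described above can be illustrated with a rank correlation. The sketch below is only an illustration: the text here does not state which statistic the authors used, so Spearman's rank correlation is an assumption, and the function names are hypothetical.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (var_x * var_y) ** 0.5


def spearman(codon_usage, trna_expression):
    """Spearman correlation: Pearson correlation computed on average ranks."""
    if len(codon_usage) != len(trna_expression) or len(codon_usage) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    return pearson(average_ranks(codon_usage), average_ranks(trna_expression))
```

Each input would hold one value per isoacceptor (cognate codon usage in the highly expressed genes, and sense tRNA read abundance); a coefficient near zero would correspond to the "no relationship" outcome reported in the text.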
+ Interestingly , Trp CCA is the highest expressed isoacceptor in all Buchnera taxa ( except Ua ) and has very low codon usage . 
+ In all Buchnera examined , isoacceptor Trp CCA displays one of the lowest secondary structure stabilities relative to E. coli 's homolog ; potentially Trp CCA is highly expressed to compensate for low aminoacylation efﬁciency related to numerous base substitutions that weaken its secondary structure [ Figure 4 ( a ) ] . 
+ In this study , we found that Buchnera tRNAs have maintained high % GC relative to its CDS ; however , its tRNAs are more A+T rich and less stable relative to homologs in E. coli ( Figure 4 ) . 
+ These results are consistent with previous ﬁndings ( 28 ) showing that 16S rRNAs of Buchnera and other endosymbiont species are more A+T rich and less stable than those of free-living relatives . 
+ Similarly , mitochondrial tRNAs from animals are more A+T rich and less stable than nuclear tRNAs ( 80 ) . 
+ Collectively , these results suggest that the accumulation of deleterious mutations can lead to less stable secondary structures of essential RNAs involved in translation . 
+ Some selection for stabilization is also evident as numerous compensatory base substitutions have been ﬁxed in the stem regions of both rRNAs ( 28 ) and tRNAs ( Figure 5 ) . 
+ Alternatively , E. coli tRNAs may possess higher % GC because its optimal growth temperature is higher than that of Buchnera ( 81 ) , thus favoring higher % GC for increased thermal stability . 
+ During genome reduction , 72 -- 78 % of Buchnera tRNA genes among all taxa have deleted 3 bp , due to the loss of 3′ encoded CCA [ Figure 5 ( a ) ] . 
+ Nevertheless , we found that all mature Buchnera tRNAs process 3′ CCA , and therefore they all have potential for amino acid activation [ Figure 5 ( b ) ] . 
+ In all Buchnera taxa , six to eight mature tRNAs process dual or triple 3′ CCA [ Figure 5 ( b ) ] . 
+ These characteristics , in addition to 5′ G at the 1st and 2nd position and instability of the acceptor stem , result in tRNA degradation ( 67 ) . 
+ Interestingly , these tRNAs in Buchnera and E. coli transcribe 5′ G at the ﬁrst and second base position and process dual or triple 3′ CCA [ Figure 5 ( c ) ] . 
+ In these mature tRNAs in both E. coli and Buchnera , the second to last 3′ CCA is always incorporated into the 3′ acceptor stem . 
+ Potentially , the retention of encoded 5′ G at N1 and N2 and the conservation of dual and triple 3′ CCA maturation in these tRNAs [ Figure 5 ( c ) ] are essential to maintain the correct secondary structure and to police unstable tRNAs via the tRNA degradation pathway . 
+ In conclusion , our observations of altered tRNA characteristics are consistent with the hypothesis that translational ﬁdelity is lower in Buchnera compared with free-living relatives as represented by E. coli . 
+ First , Buchnera genome reduction has resulted in the loss of speciﬁc tRNA isoacceptors and modiﬁed nucleoside pathways that may reduce translational efﬁciency and ﬁdelity . 
+ Second , Buchnera 's A+T mutational bias and reduced selection has resulted in the reduction of tRNA stability in vitro and speciﬁc tRNA base substitutions that may alter the efﬁciency of aaRS recognition . 
+ Moreover , reduced translational efﬁciency was supported by the lack of relationship between codon usage of highly expressed genes and cognate tRNA isoacceptor expression . 
+ Nevertheless , purifying selection appears to be strong enough in Buchnera genomes to maintain high % GC of tRNA genes relative to CDS . 
+ Also , 3′ CCA maturation of shortened tRNA genes , and numerous compensatory base substitutions in tRNA stems help maintain tRNA secondary structure and function . 
+ Consequently , we predict that the translational efﬁciency and ﬁdelity evident in Buchnera are in an intermediate state between free-living bacteria and organelles . 
+ ACCESSION NUMBERS
+ All raw sense and anti-sense tRNA data were submitted to NCBI Genbank under SRA Submission : SRA049863.3 , under Bioproject numbers : ( i ) PRJNA82811 , ( ii ) PRJNA82809 , ( iii ) PRJNA82797 , ( iv ) PRJNA82793 and ( v ) PRJNA82789 . 
+ SUPPLEMENTARY DATA
+ Supplementary Data are available at NAR Online : Supplementary Tables 1 and 2 , Supplementary Figure 1 and Supplementary Dataset 1 . 
+ ACKNOWLEDGEMENTS
+ The authors thank Kim Hammond for rearing aphids and Dieter Söll , Jiqiang Ling , Patrick O'Donoghue and Markus Englert for helpful discussions and feedback on tRNA data . 
+ They also thank Yogeshwar Kelkar , Rahul Raghavan and Patrick Degnan for helpful comments on the manuscript and thank four anonymous reviewers for their helpful comments and suggestions .
\ No newline at end of file
--- a/data/TEXT_FILES/useful_txt/22733746.txt 0 → 100644
View file @27818a9
+++ b/data/TEXT_FILES/useful_txt/22733746.txt 0 → 100644
View file @27818a9
+ Galactose repressor mediated intersegmental
+ By microscopic analysis of ﬂuorescent-labeled GalR , a regulon-speciﬁc transcription factor in Escherichia coli , we observed that GalR is present in the cell as aggregates ( one to three ﬂuorescent foci per cell ) in nongrowing cells . 
+ To investigate whether these foci represent GalR-mediated association of some of the GalR speciﬁc DNA binding sites ( gal operators ) , we used the chromosome conformation capture ( 3C ) method in vivo . 
+ Our 3C data demonstrate that , in stationary phase cells , many of the operators distributed around the chromosome interact . 
+ By the use of atomic force microscopy , we showed that the observed remote chromosomal interconnections occur by direct interactions between DNA-bound GalR not involving any other factors . 
+ Mini plasmid DNA circles with three or ﬁve operators positioned at deﬁned loci showed GalR-dependent loops of expected sizes of the intervening DNA segments . 
+ Our ﬁndings provide unique evidence that a transcription factor participates in organizing the chromosome in a three-dimensional structure . 
+ We believe that these chromosomal connections increase local concentration of GalR for coordinating the regulation of widely separated target genes , and organize the chromosome structure in space , thereby likely contributing to chromosome compaction . 
+ The genes involved in D-galactose metabolism and regulation with their cognate promoters constitute a regulon , which is regulated by Gal repressor ( GalR ) in Escherichia coli . 
+ Puriﬁed GalR is a homodimer of a 37-kDa subunit ( 1 ) . 
+ GalR is also known to form oligomers and higher-order structures , which give rise to paracrystals , at less than 0.2 M salt concentrations both in the absence and presence of DNA ( 1 ) . 
+ GalR represses gene transcription by binding to speciﬁc operator DNA sequences that are associated with the gene promoters ( 2 ) . 
+ There are at least ﬁve known promoters each associated with one or two GalR binding operators , located at 17.0 ( galE operon ) , 48.2 ( mgl operon ) , 48.2 ( galS ) , 64.1 ( galR ) , and 66.5 ( galP ) min on the chromosome . 
+ So far only three of the cognate promoters -- of galE , galS , and galP -- contain two operators ( 2 ) . 
+ We have previously shown that GalR bound to the two operators , which encompass two promoters , P1 and P2 , in the galE operon , associates to form a DNA loop ( 3 ) . 
+ Whereas simple DNA binding represses the P1 promoter , only DNA looping represses the P2 promoter . 
+ We proposed that the GalR ( dimers ) bound to the regulon operators located around the chromosome associate in some order giving rise to a speciﬁc 3D network of D-galactose metabolism-related genes to better coordinate the regulation of the functionally related promoters and to maintain higher local concentrations of GalR around the operators . 
+ The interactions may also help compaction of the chromosome . 
+ The most likely way to bring distant regulatory loci together is through interactions between DNA-bound proteins . 
+ Here we demonstrate , by both in vivo and in vitro methods , that operator-bound GalR located around the chromosome interact with each other . 
+ We observed that there are greater interactions in nongrowing cells than in growing cells . 
+ We used ﬂuorescent Venus labeled GalR to trace location of GalR in cell ( 4 ) , Chromosome Conformation Capture ( 3C ) analysis to determine the aggregation of distally located DNA-bound GalR in vivo ( 5 ) , and atomic force microscopy ( AFM ) to visualize DNA-bound GalR -- GalR interactions in vitro ( 6 , 7 ) . 
+ The implication of the results is discussed . 
+ Results
+ Fluorescence Microscopy Analysis of GalR-Venus . 
+ Elf et al. demonstrated location of the LacI repressor protein bound to its DNA target in the lac operon ( 4 ) . 
+ This was accomplished by using a fusion of the LacI protein to a modiﬁed rapidly maturing YFP ﬂuorescent protein ( Venus ) , and observing the cells under a ﬂuorescent microscope . 
+ We used an analogous approach to ﬁnd out the location of DNA-bound GalR around the chromosome . 
+ We genetically fused the Venus gene sequence to a single copy chromosomal GalR gene at the carboxy-end to generate galR-venus ( Fig. 1A ) . 
+ Any potential effect of the fusion of Venus to GalR on the GalR expression level was examined by Western blot . 
+ As shown in Fig. S1 , the expression level of the fusion protein under various growth conditions was almost identical . 
+ We also tested whether the fusion would affect the binding of GalR to its target DNA . 
+ The results showed that the gene-regulatory DNA-binding activity of the GalR-Venus fusion on the regulation of two promoters of the gal operon was the same as that of WT GalR ( 8 ) ( Fig. S2 ) . 
+ We concluded that the fusion of Venus did not fundamentally alter the normal GalR property . 
+ Moreover , a single dimer of GalR-Venus fusion bound to DNA in the cell can not be observed under the conditions used , as opposed to ﬁndings by Xie et al , who observed single molecules of LacI-Venus fusion protein ( 5 ) using an EMCCD camera with their microscope set-up that is reportedly capable of detecting single photons , at the expense of resolution ( 5 ) . 
+ They greatly increased the exposure time and calculated the midpoint of the ﬂuorescence signal to assign `` enhanced localization '' for each focus . 
+ In our setup , exposure times were ∼ 1 s. Without a single-photon sensitive microscope , we hoped to detect clusters comprising multiple molecules of ﬂuorescent GalR in our microscope if GalR molecules aggregate . 
+ As shown in Fig. 1B , when the stationary phase GalR-Venus cells were observed under a ﬂuorescence microscope , 277 of 284 counted cells grown in minimal medium displayed 1 -- 3 distinguishable ﬂuorescent foci . 
+ The distribution of the number of foci per cell is shown in Fig. 1C . 
+ We rarely observed cells with four or more foci . 
+ At the resolution limits of our setup , we conclude that the foci represent diffraction-limited localization events . 
+ Of course , independent localizations that are closer than 200 nm apart will appear as a single focus in our diffraction-limited setup . 
+ As mentioned above , all of our ﬂuorescence experiments were done with GalR-Venus fusion located at the chromosome in normal position that generates ∼ 60 -- 70 dimers in the absence of D-galactose under the conditions used ( 37 °C ) as determined by Western blot . 
+ Author contributions : Z.Q. , R.E. , and S.A. designed research ; Z.Q. , E.K.D. , and P.E. performed research ; S.A. analyzed data ; and Z.Q. and S.A. wrote the paper . 
+ The authors declare no conﬂict of interest.
+ 1Present address : Epidemiology Division , Tel Aviv Sourasky Medical Center , Tel Aviv 64239 , Israel . 
+ 2To whom correspondence should be addressed . 
+ E-mail : sadhya@helix.nih.gov . 
+ This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10.1073/pnas.1208595109/-/DCSupplemental . 
+ Under the conditions used we can not resolve whether each spot has many subnucleoid entities . 
+ There are ∼ 100 GalR dimers present in E. coli cells grown in minimal medium ( 9 ) . 
+ It is likely that DNA-bound GalR associate to generate few GalR aggregates . 
+ We have previously characterized a GalR mutant ( GalRT322R ) , which binds normally to the operators but is inefﬁcient in forming tetramers and higher-order oligomers ( 3 ) . 
+ We generated a GalRT322R-Venus mutant and ﬁrst tested its gene-regulatory activity the same way as the WT ( Fig. S2 ) . 
+ The results showed that although GalRT322R-Venus repressed the P1 promoter of the gal operon by binding to a single cognate operator locus OE , it is defective in tetramerization-mediated repression of the P2 promoter of the gal operon by DNA looping . 
+ Therefore , if GalR-Venus foci are caused by GalR multimerization , then we do not expect to observe ﬂuorescent foci in cells carrying the GalRT322R-Venus fusion protein and grown under the same conditions . 
+ Consistently , we observed ﬂuorescent foci in only four of 185 cells that contained the mutant GalR fusion . 
+ These results support the hypothesis that the foci observed with GalR-Venus were generated by association of two or more GalR dimers presumably bound to DNA . 
+ Furthermore , when log phase cells carrying the WT GalR-Venus fusion protein were inspected , the ﬂuorescent foci were signiﬁcantly reduced , presumably because fast-moving DNA replication forks interfere with the intrachromosomal contacts ( Fig. 1B ) . 
+ Moreover , we also investigated the effect of D-galactose ( the inducer of gal operon ) on the observed foci . 
+ As shown in Fig. S3 , both in stationary phase and log phase , no signiﬁcant difference could be found between the cells treated with and without D-galactose . 
+ The distribution of the number of foci per cell cultured to stationary phase with D-galactose is shown in Fig. 1C . 
+ Because D-galactose does not effectively induce the gal operon located at 17.0 min of the chromosome in stationary phase cells ( 10 ) , D-galactose may not break GalR-mediated bridges detected here under similar conditions . 
+ Alternatively , the GalR-mediated intrachromosomal links in stationary phase cells may be too complex in structure or composition to be sensitive to D-galactose . 
+ The ﬂuorescence data strongly suggest that the GalR binding sites along the chromosome are brought together through GalR polymerization . 
+ In addition , Xie et al reported that the signal intensities for single foci that they measured occurred in discrete quanta , suggesting that these investigators were measuring single molecules of ﬂuorophores ( 5 ) . 
+ As the intensities of our foci do not appear to occur in such discrete quanta , we are not concluding that we are detecting single molecules at each focus . 
+ Intersegmental Chromosomal Associations . 
+ The intersegmental chromosomal connections by DNA-bound GalR as suggested by the above ﬂuorescent labeling of GalR experiment can be tested in vivo by the 3C method or its reﬁnements ( 5 ) . 
+ Such techniques have been used in chromosomal 3D structure analysis in yeast and human ( 11 -- 13 ) . 
+ Fig. S4 shows the principle of the 3C method as adapted from Dekker et al. ( 5 ) . 
+ First , we performed the 3C analysis to test whether the operators of GalR are physically connected to each other in stationary phase cells . 
+ We designed appropriate primers listed in Table S1 for studying any connections between four of the operator loci located at 17.0 , 48.2 , 64.1 , and 66.5 min around the E. coli chromosome ( Fig. 2A ) . 
+ We also designed 22 other primers ( also listed in Table S1 ) each proximate to binding sites of FruR , MalT , PurR , TyrR , and H-NS transcription factors as likely negative controls for interactions with GalR sites ( shown in Fig. 2A ) . 
+ We used EcoRI for 3C analysis , as its digestion sites are appropriately located in the vicinity of all chosen DNA targets . 
+ The efﬁciency of digestion was estimated by the amount of PCR products obtained with primers around individual EcoRI sites . 
+ After digestion , we detected very little DNA ampliﬁcation of these restriction sites reﬂecting successful digestions , although the PCR ampliﬁcations were abundant without EcoRI treatments . 
+ Typical results with the PDF/PDR primer pair are shown in Fig. 3A ( lanes 2 , 3 , 5 , and 6 ) . 
+ There were very few ampliﬁcation products when treated with DNA ligase after digestion , showing that the digested two ends could not be ligated and restore the original DNA sequence in noticeable amounts ( Fig. 3A , lanes 1 and 4 ) . 
+ In addition , we found that the digestion efﬁciencies in cross-linked and non -- cross-linked samples were somewhat different ( Fig. 3B ) . 
+ In the non -- cross-linked samples , DNA was more or less completely digested in 1 h , whereas in the cross-linked samples , the maximum digestion efﬁciency reached ∼ 80 % in 4 h . Thus , we set the digestion time to 4 h . 
+ The PCF/PCR primer pair was used as the internal control of template for a DNA segment containing no EcoRI site . 
+ We performed 3C analysis between GalR and GalR or GalR and non-GalR DNA targets for potential contacts between them in stationary phase cells . 
+ Interaction efﬁciency between two sites was deﬁned by normalizing the ratios of the amount of PCR products between the two in the cross-linked and non -- cross-linked samples compared with the internal controls ( 13 ) . 
+ We arbitrarily set a threefold change or higher , to assign a positive interaction between a given pair of targets . 
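One possible reading of the normalization just described can be written as a small calculation. This is a sketch under stated assumptions: the exact formula is not given in the text, so the ratio-of-ratios below (ligation product over internal control, cross-linked over non-cross-linked) is an interpretation, and the function and parameter names are hypothetical. Only the threefold threshold comes from the text.

```python
def interaction_efficiency(xl_pair, xl_control, nonxl_pair, nonxl_control):
    """Assumed normalization for one primer pair.

    xl_pair / nonxl_pair: PCR product amounts for the tested pair in the
    cross-linked and non-cross-linked samples.
    xl_control / nonxl_control: internal-control product amounts (a segment
    with no EcoRI site) in the same two samples.
    """
    if min(xl_control, nonxl_pair, nonxl_control) <= 0:
        raise ValueError("control and non-cross-linked signals must be positive")
    return (xl_pair / xl_control) / (nonxl_pair / nonxl_control)


def is_positive_interaction(efficiency, threshold=3.0):
    """Apply the threefold-or-higher cutoff stated in the text."""
    return efficiency >= threshold
```

For example, a pair giving 600 units cross-linked and 150 units non-cross-linked, with a control of 100 units in both samples, would score 4.0 and count as a positive interaction under this reading.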
+ Among 94 combinations tested , we found contacts among 30 of them . 
+ Sample PCR results are shown in Fig. 4A and summarized in Fig. 4B . 
+ There were no visible signals among the remaining 64 combinations . 
+ Surprisingly , three of the four GalR targets showed positive signals when matched not only against each other but also with six of the non-GalR targets that were not expected to contact GalR targets . 
+ For example , primers P14 , P24 , and P26 close to GalR targets gave interaction signals when paired against P9 and P12 primers designed to test contacts with PurR and TyrR targets , respectively . 
+ GalR-Mediated Intrachromosomal Contacts . 
+ To conﬁrm the participation of GalR in intrachromosomal contacts mentioned above , we performed the 3C assays in a ΔgalR mutant strain ( 14 ) . 
+ Compared with the interaction frequency in the WT , the deletion of the GalR encoding gene removed most of the observed interactions , both between any GalR-GalR targets and most but not all of GalR and non-GalR targets ( Fig. 3C , lanes 1 -- 4 ) . 
+ The results show that GalR indeed mediates most of the interactions observed here ( Fig. 2B ) . 
+ A GalR homolog , GalS , binds speciﬁcally to all GalR targets tested but with different afﬁnities ( 15 , 16 ) . 
+ We investigated the effect of GalS on the GalR-mediated intrachromosomal contacts in a ΔgalS strain ( 17 ) . 
+ The results showed that , in the absence of GalS , the interaction frequency of the connections did not diminish but rather was frequently enhanced ( Fig. 3C , lanes 5 and 6 ) , which is reminiscent of two previous observations : DNA-bound GalS , unlike DNA-bound GalR , does not associate ( 15 ) , and more GalR are made in ΔgalS cells ( 16 ) . 
+ We believe that the enhancement of GalR-mediated interactions observed in many cases in ΔgalS strain occurs because GalS competes with GalR for binding to DNA sites in WT , thus reducing the potency of GalR-mediated bridges in DNA ; in the absence of GalS and presence of an extra amount of GalR , the GalR-mediated connections are higher . 
+ Numerous GalR Binding Sites in the Chromosome . 
+ As shown above , three loci , galP ( 66.52 min ) , galR ( 64.11 min ) , and mgl-galS ( 48.22 min ) interacted not only with each other but presumably also with or near targets of FruR , MalT , PurR , TyrR , and H-NS . 
+ Two models may explain the latter results . 
+ ( i ) The latter binding proteins collaborate with GalR to form bridging complexes . 
+ ( ii ) There are GalR binding sites not identiﬁed previously near the non-GalR targets . 
+ These ideas were tested by 3C analysis in strains with deleted binding proteins , ΔfruR , ΔmalT , ΔpurR , ΔtyrR , or Δhns . 
+ We found no signiﬁcant change in the observed PCR products in the deletion strains compared with WT ( Fig. S5 ) . 
+ On the other hand , in experiments in which one non-GalR site was tested against another non-GalR site , many linkages observed in the WT strain were not found in the ΔgalR strain ( Fig. 3D ) . 
+ The latter observations suggested that the second hypothesis is a more likely reason for interactions involving the so-called non-GalR sites . 
+ DNA sequence search indeed revealed the existence of 91 potential GalR binding sites around the chromosome ( with zero or one mismatch with the consensus bases in the gal operator sequence : TGNAANCGNTTNCA ) ( 2 ) . 
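A search of this kind (windows matching the consensus with zero or one mismatch at the non-N positions) can be sketched as follows. This is an illustrative sketch, not the authors' procedure: the function names are hypothetical, N is treated as matching any base, and only one strand is scanned because this particular consensus is its own reverse complement.

```python
CONSENSUS = "TGNAANCGNTTNCA"  # gal operator consensus quoted in the text

_COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}


def reverse_complement(seq):
    """Reverse complement of a DNA string over A, C, G, T, N."""
    return "".join(_COMPLEMENT[base] for base in reversed(seq))


def count_mismatches(window, consensus=CONSENSUS):
    """Count mismatches at the specified (non-N) consensus positions."""
    return sum(
        1 for base, ref in zip(window, consensus) if ref != "N" and base != ref
    )


def find_candidate_sites(sequence, consensus=CONSENSUS, max_mismatches=1):
    """Return (start, window, mismatches) for every window within the cutoff."""
    sequence = sequence.upper()
    width = len(consensus)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        mismatches = count_mismatches(window, consensus)
        if mismatches <= max_mismatches:
            hits.append((start, window, mismatches))
    return hits
```

Start positions are 0-based offsets into the scanned sequence. A real genome scan would also need to handle the circular chromosome (windows spanning the origin), which this sketch omits.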
+ In fact , there is at least one potential GalR-binding site between each EcoRI restriction site at a non-GalR locus tested and the cognate primer sequence used in the 3C assays . 
+ The potential GalR-binding sites identiﬁed by 3C approach are shown in Table 1 . 
+ We do not know whether these newly identiﬁed potential GalR binding sites actually bind GalR and are involved in regulation of speciﬁc genes . 
+ Nonetheless , our results show that a transcription factor for the gal regulon connects distal segments in the bacterial chromosome . 
+ Curiously enough , we did not observe any contacts between the primer ( P1 ) designed for the GalR-regulated gal operon ( 17 min ) and other primers tested by the 3C assays . 
+ A priori , this was an unexpected observation . 
+ As mentioned earlier , the gal operon contains two operators , and GalR binds to both and associates , generating locally a DNA loop of 113 bp ( 18 , 19 ) . 
+ Given that GalR can form polymers , why GalR bound to these sites does not participate in the GalR-mediated chromosomal interconnections identiﬁed above remains to be investigated . 
+ Having established that the transcription factor GalR forms a chromosomal network in stationary phase cells , we tested whether these chromosomal interconnections exist in growing cells . 
+ It appears that more than 80 % of the observed contacts disappear when cells are growing exponentially ( Fig. S6A ) . 
+ These results are consistent with the hypothesis that moving DNA replication forks may disrupt the connections and that reassociation may be a slow process . 
+ Of the five signals in the WT that survived in growing cells , perhaps because these associations may have faster kinetics , only one was sensitive to the presence of D-galactose during logarithmic growth ( Fig. S6B ) . 
+ GalR-Mediated Looping in Vitro by Atomic Force Microscopy . 
+ We have previously used AFM to observe GalR- and LacI-mediated DNA loops ( 7 ) . 
+ In this research , we engineered ﬁve different operators for GalR binding in two mini DNA circles ( pMini-1 and pMini-2 ) with discrete distances between binding sites . 
+ The maps of pMini-1 and pMini-2 are shown in Fig. 5 A and C . 
+ Purified mini DNA circles were mixed with or without GalR protein as described in SI Materials and Methods . 
+ Samples were then scanned by an atomic force microscope . 
+ The AFM images of DNA without the protein show plectonemic structures with tight superhelical stretches of DNA and occasional loops because of crossing over of two double helical chains ( Fig. 5 B and D , Upper ) . 
+ In the presence of GalR , we observed 71 molecules of GalR-mediated looped-out DNA of 246 total pMini-1 molecules and 33 of 154 total pMini-2 molecules inspected . 
+ We also counted the number of observed loops per GalR-DNA complex . 
+ For pMini-1 , 30 % of DNA molecules formed loops ( 71 of 246 ) , in which 12.6 % contain two loops , 9.7 % three loops , 5.7 % four loops , and 0.8 % ﬁve loops . 
+ In pMini-2 , almost 20 % formed loops ( 33 of 154 ) , in which 9.1 % contain two loops , 3.2 % three loops , 7.8 % four loops , and 1.3 % ﬁve loops ( Fig. 5E ) . 
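+ The loop statistics above can be cross-checked with a few lines of arithmetic . 
+ The sketch below back-calculates molecule counts from the quoted percentages ; these counts are inferred by rounding and are not reported in the text , and the function name is hypothetical . 

```python
# Back-calculate per-category molecule counts from the quoted percentages.
# Totals (246, 154) and percentages come from the text; counts are inferred.
def loop_counts(total, percents):
    """Map loops-per-molecule -> inferred number of molecules."""
    return {loops: round(total * pct / 100) for loops, pct in percents.items()}


pmini1 = loop_counts(246, {2: 12.6, 3: 9.7, 4: 5.7, 5: 0.8})
pmini2 = loop_counts(154, {2: 9.1, 3: 3.2, 4: 7.8, 5: 1.3})

for name, counts, total in (("pMini-1", pmini1, 246), ("pMini-2", pmini2, 154)):
    looped = sum(counts.values())
    print(name, counts, looped, round(100 * looped / total, 1))
```

+ Under this rounding the two-to-five-loop categories add up to all 71 looped pMini-1 molecules ( 28.9 % of 246 ) and all 33 looped pMini-2 molecules ( 21.4 % of 154 ) , so the quoted percentages are fractions of all inspected molecules . 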
+ GalR mediation in the loop formation was inferred by measuring the height and width of the overlapping DNA chains ( 6 ) . 
+ DNA loops frequently emanate from one or more such taller and broader globular particles at the DNA crossover points as shown by black arrows in Fig. 5 B and D , Lower Left . 
+ We assume that these particles are oligomers of DNA-bound GalR . 
+ To establish that oligomerization of DNA-bound GalR generates looping of the DNA intervals , the contour lengths of large numbers of loops in both DNA alone and DNA/GalR samples were traced and the DNA lengths measured ( samples are shown in Fig. 5 B and D ) . 
+ Without protein , the loop sizes were , as expected , random because of DNA crossovers . 
+ We note that the observed DNA crossover points in the DNA-only samples are of different volumes and are not large enough to account for at least a GalR dimer . 
+ Volume estimates of the `` cores '' in the samples containing GalR suggest that oligomers ( up to octamers ) connect two or more gal operator sites . 
+ We also found that each DNA molecule may contain more than one GalR core particle ( Fig. 5D , Lower Left ) . 
+ This may be the result of groups of DNA binding sites -- in the current case , two and three -- coming together with GalR independently of each other . 
+ To confirm the GalR-mediated formation of the core particle , we used GalRT322R , which is incapable of tetramerization , to look for segmental interactions by AFM analysis . 
+ We observed the dimeric GalR bound to DNA circles when incubated with GalRT322R , as indicated by red arrows in the lower right panel of Fig. 5 B and D . 
+ However , we observed at most two loops per molecule ( 49 of 526 for pMini-1 and 12 of 205 for pMini-2 ) , whereas for the WT GalR , even five loops per molecule could be found ( shown in lower panel of Fig. 5 B , D , and E ) . 
+ The formation of two loops per molecule , which means that GalR is tetramerized , may be because of the residual tetramerization activity in the mutant GalR . 
+ Moreover , the GalRT322R protein did not show any DNA-bound GalR that contains higher than tetrameric structure , unlike the DNA-bound WT GalR . 
+ To further examine the bridging of GalR binding sites , we constructed a smaller 648 bp DNA circle ( pMini-3 ) , which contained only three operators -- one from previously known sites and two from the newly proposed operators from the 3C studies described above . 
+ It was important to include the latter two because direct GalR binding to these presumed operators has not been directly demonstrated . 
+ The operators were separated by 100 , 200 , and 300 bp in pMini-3 ( Fig. 6A ) . 
+ AFM analysis of pMini-3 alone showed , as expected , mostly plectonemic DNA ( Fig. 6B , Upper ) . 
+ The plasmid in the presence of GalR clearly showed 1 -- 3 loops of DNA per molecule . 
+ We found that each loop-containing DNA molecule , unlike the previous two plasmids , contained only one GalR core per molecule . 
+ This is because three operators per DNA would not allow formation of more than one GalR core . 
+ We measured contour lengths of GalR-bound and unbound molecules of pMini-3 . 
+ Based on the measured contour lengths , the inferred binding patterns of GalR to pMini-3 are shown in Fig. 6C . 
+ The contour lengths of the plasmids carrying protein are highly consistent with the expected values from the binding site loci . 
+ Discussion
+ We used two independent in vivo methods , microscopic visualization of fluorescent-labeled GalR , and conformation capture of chromosome by cross-linking , to locate DNA bound GalR protein in stationary phase cells . 
+ Results from these two experiments presented here suggest that in stationary phase cells , the E. coli chromosome is partially condensed by the DNA sequence-specific GalR transcription factor that connects remote segments of DNA in 3D space . 
+ Incidentally , we identiﬁed many more previously unknown GalR binding sites that participate in such interconnections . 
+ We conﬁrmed the connections between remote DNA sites by GalR in the absence of other factors in vitro by direct visualization of the interactions by AFM . 
+ Although the mechanism , the interface , and the energetics of a small ( ∼ 100-bp ) DNA loop formation by association of two DNA-bound GalR dimers are known in detail ( 20 , 21 ) , the frequency or the stability of the remote intersegmental chromosomal multiconnections presumably by multimeric GalR dimers remains unknown at this stage , although the connections are neither rare nor transient . 
+ Given the current ﬁndings with GalR , we believe that other proteins in the cell may make similar intrachromosomal connections and may also contribute to 3D folding . 
+ The implication of our ﬁnding of remote intersegmental chromosomal connections is of importance . 
+ GalR-mediated intrachromosomal connections may serve at least two functions : The `` togetherness '' of gal regulon members should increase the local concentration of GalR around the distant gal regulon promoters and thus coordinate regulation of the functionally regulated gene products , as argued by Dröge and Müller-Hill ( 22 ) . 
+ One way to increase the local concentration of DNA binding proteins is to have their genes located next to their DNA targets . 
+ Another putative role of GalR-mediated chromosomal connections may be architectural ; it may incidentally help chromosomal compaction . 
+ We note that speciﬁc biological roles of many putative GalR binding sites discovered here remain unknown ; either they are part of yet-to-be discovered GalR regulated genes and members of the gal regulon , or they serve purely an architectural role . 
+ We note that a ΔgalR strain does not have any effect on cell growth except a change in intermediary metabolites ( 23 ) . 
+ We are currently investigating the signiﬁcance of associations of DNA-GalR complexes in stationary phase cells . 
+ Our modiﬁed 3C method developed based upon the principle of Dekker et al. ( 6 ) has been used by Wang et al. ( 24 ) to show some intersegmental chromosomal contacts based on the nucleoid protein HNS . 
+ The E. coli chromosome has been shown to contain several kinds of topographical arrangements . 
+ ( i ) It contains six `` macro-domains '' with deﬁned boundaries , four of which are structured and two of which are nonstructured ( 25 -- 28 ) . 
+ Attempts to have site-speciﬁc recombination between att sites engineered into the macrodomains showed that chromosomal inversions within and between domains frequently have physiological consequences , and sometimes they are not permissible , putting some limits to chromosomal rearrangements ( plasticity ) ( 29 ) . 
+ ( ii ) The chromosome may be organized into supercoiled topological domains ( 30 -- 34 ) . 
+ ( iii ) Between replication cycles , the chromosome is condensed into a filament extending from one cellular pole to the other ( 35 ) . 
+ The two ends of the filaments are connected by a stretched-out DNA segment that includes the `` terminus . '' 
+ How the topography of the `` macro-domains , '' `` topological loops , '' the `` filament , '' and the currently demonstrated remote `` segmental connections '' in stationary phase cells reconcile with each other in the bacterial nucleoid remains a challenging question , and we do not know whether the different chromosome conformations discussed above are present in both log and stationary phase cells . 
+ Materials and Methods
+ Protocols for fluorescent microscopy , AFM , and 3C analysis used in this study are described in SI Materials and Methods . 
+ Cells used for FM and 3C analysis were cultured in M63 minimal medium . 
+ Constructions of GalR-Venus and GalRT322R-Venus strains and mini circles used for AFM assay are described in detail also in the SI Materials and Methods . 
+ All the primers used for constructions are listed in Table S2 . 
+ ACKNOWLEDGMENTS . 
+ We thank Mark Umbarger ( Harvard Medical School ) for kindly providing the basic 3C protocol ; Ximiao He ( National Cancer Institute ) for help with DNA sequence search ; and Robert Weisberg , Richard Losick , Gene-Wei Li , and Donald Court for help and discussions . 
+ This research was supported by the Intramural Research Program of the National Institutes of Health , National Cancer Institute , Center for Cancer Research . 
+ 1 . 
+ Majumdar A , Rudikoff S , Adhya S ( 1987 ) Purification and properties of Gal repressor : pL-galR fusion in pKC31 plasmid vector . 
+ J Biol Chem 262:2326 -- 2331 . 
+ 2 . 
+ Weickert MJ , Adhya S ( 1993 ) The galactose regulon of Escherichia coli . 
+ Mol Microbiol 10:245 -- 251 . 
+ 3 . 
+ Geanacopoulos M , Adhya S ( 2002 ) Genetic analysis of GalR tetramerization in DNA looping during repressosome assembly . 
+ J Biol Chem 277:33148 -- 33152 . 
+ 4 . 
+ Elf J , Li GW , Xie XS ( 2007 ) Probing transcription factor dynamics at the single-molecule level in a living cell . 
+ Science 316:1191 -- 1194 . 
+ 5 . 
+ Dekker J , Rippe K , Dekker M , Kleckner N ( 2002 ) Capturing chromosome conformation . 
+ Science 295:1306 -- 1311 . 
+ 6 . 
+ Lyubchenko YL , Shlyakhtenko LS , Aki T , Adhya S ( 1997 ) Atomic force microscopic demonstration of DNA looping by GalR and HU . 
+ Nucleic Acids Res 25:873 -- 876 . 
+ 7 . 
+ Virnik K , et al. ( 2003 ) `` Antiparallel '' DNA loop in gal repressosome visualized by atomic force microscopy . 
+ J Mol Biol 334:53 -- 63 . 
+ 8 . 
+ Lewis DE , Geanacopoulos M , Adhya S ( 1999 ) Role of HU and DNA supercoiling in transcription repression : Specialized nucleoprotein repression complex at gal promoters in Escherichia coli . 
+ Mol Microbiol 31:451 -- 461 . 
+ 9 . 
+ Tokeson JP ( 1989 ) Ultrainduction of the Escherichia coli galactose operon . 
+ PhD Dissertation ( Howard Univ , Washington , DC ) . 
+ 10 . 
+ Adhya S ( 1967 ) Control of synthesis of the galactose metabolizing enzymes of Escherichia coli : I. Polarity in the galactose operon : II . 
+ The glucose effect and the galactose enzymes . 
+ PhD Dissertation ( Univ of Wisconsin , Madison , WI ) . 
+ 11 . 
+ Dekker J ( 2008 ) Mapping in vivo chromatin interactions in yeast suggests an extended chromatin fiber with regional variation in compaction . 
+ J Biol Chem 283:34532 -- 34540 . 
+ 12 . 
+ Lieberman-Aiden E , et al. ( 2009 ) Comprehensive mapping of long-range interactions reveals folding principles of the human genome . 
+ Science 326:289 -- 293 . 
+ 13 . 
+ Singh BN , Ansari A , Hampsey M ( 2009 ) Detection of gene loops by 3C in yeast . 
+ Methods 48:361 -- 367 . 
+ 14 . 
+ Tokeson JP , Garges S , Adhya S ( 1991 ) Further inducibility of a constitutive system : Ultrainduction of the gal operon . 
+ J Bacteriol 173:2319 -- 2327 . 
+ 15 . 
+ Geanacopoulos M , Adhya S ( 1997 ) Functional characterization of roles of GalR and GalS as regulators of the gal regulon . 
+ J Bacteriol 179:228 -- 234 . 
+ 16 . 
+ Semsey S , et al. ( 2009 ) Dominant negative autoregulation limits steady-state repression levels in gene networks . 
+ J Bacteriol 191:4487 -- 4491 . 
+ 17 . 
+ Golding A , Weickert MJ , Tokeson JP , Garges S , Adhya S ( 1991 ) A mutation defining ultrainduction of the Escherichia coli gal operon . 
+ J Bacteriol 173:6294 -- 6296 . 
+ 18 . 
+ Aki T , Adhya S ( 1997 ) Repressor induced site-specific binding of HU for transcriptional regulation . 
+ EMBO J 16:3666 -- 3674 . 
+ 19 . 
+ Lewis DE , Adhya S ( 2002 ) In vitro repression of the gal promoters by GalR and HU depends on the proper helical phasing of the two operators . 
+ J Biol Chem 277 : 2498 -- 2504 . 
+ 20 . 
+ Geanacopoulos M , et al. ( 1999 ) GalR mutants defective in repressosome formation . 
+ Genes Dev 13:1251 -- 1262 . 
+ 21 . 
+ Lia G , et al. ( 2003 ) Supercoiling and denaturation in Gal repressor/heat unstable nucleoid protein ( HU ) - mediated DNA looping . 
+ Proc Natl Acad Sci USA 100 : 11373 -- 11377 . 
+ 22 . 
+ Dröge P , Müller-Hill B ( 2001 ) High local protein concentrations at promoters : Strategies in prokaryotic and eukaryotic cells . 
+ Bioessays 23:179 -- 183 . 
+ 23 . 
+ Lee SJ , et al. ( 2009 ) Cellular stress created by intermediary metabolite imbalances . 
+ Proc Natl Acad Sci USA 106:19515 -- 19520 . 
+ 24 . 
+ Wang W , Li GW , Chen C , Xie XS , Zhuang X ( 2011 ) Chromosome organization by a nucleoid-associated protein in live bacteria . 
+ Science 333:1445 -- 1449 . 
+ 25 . 
+ Boccard F , Esnault E , Valens M ( 2005 ) Spatial arrangement and macrodomain organization of bacterial chromosomes . 
+ Mol Microbiol 57:9 -- 16 . 
+ 26 . 
+ Espeli O , Mercier R , Boccard F ( 2008 ) DNA dynamics vary according to macrodomain topography in the E. coli chromosome . 
+ Mol Microbiol 68:1418 -- 1427 . 
+ 27 . 
+ Lesterlin C , Mercier R , Boccard F , Barre FX , Cornet F ( 2005 ) Roles for replichores and macrodomains in segregation of the Escherichia coli chromosome . 
+ EMBO Rep 6 : 557 -- 562 . 
+ 28 . 
+ Valens M , Penaud S , Rossignol M , Cornet F , Boccard F ( 2004 ) Macrodomain organization of the Escherichia coli chromosome . 
+ EMBO J 23:4330 -- 4341 . 
+ 29 . 
+ Esnault E , Valens M , Espéli O , Boccard F ( 2007 ) Chromosome structuring limits genome plasticity in Escherichia coli . 
+ PLoS Genet 3 : e226 . 
+ 30 . 
+ Deng S , Stein RA , Higgins NP ( 2005 ) Organization of supercoil domains and their reorganization by transcription . 
+ Mol Microbiol 57:1511 -- 1521 . 
+ 31 . 
+ Hardy CD , Cozzarelli NR ( 2005 ) A genetic selection for supercoiling mutants of Escherichia coli reveals proteins implicated in chromosome structure . 
+ Mol Microbiol 57 : 1636 -- 1652 . 
+ 32 . 
+ Noom MC , Navarre WW , Oshima T , Wuite GJ , Dame RT ( 2007 ) H-NS promotes looped domain formation in the bacterial chromosome . 
+ Curr Biol 17 : R913 -- R914 . 
+ 33 . 
+ Postow L , Hardy CD , Arsuaga J , Cozzarelli NR ( 2004 ) Topological domain structure of the Escherichia coli chromosome . 
+ Genes Dev 18:1766 -- 1779 . 
+ 34 . 
+ Sinden RR , Ussery DW ( 1992 ) Analysis of DNA structure in vivo using psoralen photobinding : Measurement of supercoiling , topological domains , and DNA-protein interactions . 
+ Methods Enzymol 212:319 -- 335 . 
+ 35 . 
+ Wiggins PA , Cheveralls KC , Martin JS , Lintner R , Kondev J ( 2010 ) Strong intranucleoid interactions organize the Escherichia coli chromosome into a nucleoid ﬁlament . 
+ Proc Natl Acad Sci USA 107:4991 -- 4995 .
\ No newline at end of file
--- a/data/TEXT_FILES/useful_txt/22768341.txt 0 → 100644
View file @27818a9
+++ b/data/TEXT_FILES/useful_txt/22768341.txt 0 → 100644
View file @27818a9
+ T7 RNA Polymerase Functions In Vitro without Clustering
+ Abstract 
+ Many nucleic acid polymerases function in clusters known as factories . 
+ We investigate whether the RNA polymerase ( RNAP ) of phage T7 also clusters when active . 
+ Using ` pulldowns ' and fluorescence correlation spectroscopy we find that elongation complexes do not interact in vitro with a Kd < 1 µM . 
+ Chromosome conformation capture also reveals that genes located 100 kb apart on the E. coli chromosome do not associate more frequently when transcribed by T7 RNAP . 
+ We conclude that if clustering does occur in vivo , it must be driven by weak interactions , or mediated by a phage-encoded protein . 
+ Introduction
+ Mounting evidence suggests that many RNA and DNA polymerases function in clusters rather than in isolation . 
+ Mammalian RNA polymerase II ( RNAP II ) , for example , appears to be active in ` factories ' which typically contain ~8 enzymes working on different templates , and DNA polymerases cluster in analogous ` replication factories ' [ 1,2,3 ] . 
+ Such ` factories ' may also exist in 
+ The single-subunit RNA-dependent RNA polymerases of many human viruses also cluster , forming large membrane-bound arrays in which individual molecules interact directly [ 8,9 ] . 
+ The formation of these assemblies can have strong effects on RNAP function ; poliovirus RNA-dependent RNAPs , for example , cannot transcribe efficiently without forming clusters [ 10 ] . 
+ Although there are many ways in which the cell might benefit from the existence of polymerase clusters [ 1 ] , the evolutionary forces responsible for their formation remain poorly understood . 
+ One possibility is that clustering creates a high local concentration that facilitates nucleic acid synthesis [ 11 ] . 
+ Another is that RNAP clustering evolved because freely-mobile enzymes would track along and rotate about their templates , and so entangle their trailing nascent transcripts ; conversely , RNAPs immobilized in clusters would reel in their templates without rotating , and so extrude unentangled transcripts [ 11 ] . 
+ The RNAP of bacteriophage T7 is one of the best studied DNA-dependent RNAPs . 
+ The conformation of this single-subunit enzyme remains largely unchanged during promoter binding and polymerization of the first three nucleotides [ 12,13,14 ] ; however , by +7 , the enzyme has already undergone significant rearrangements [ 15 ] and by +14 has morphed into its final processive form [ 16,17 ] . 
+ The resulting elongation complex ( EC ) is highly stable [ 18 ] , and transcribes at ~50 -- 200 bp/s [ 19,20 ] . 
+ Little is known about the clustering of any of these T7 RNAP isoforms . 
+ However , the unengaged enzyme does ` aggregate ' at the high concentrations ( ~10 µM ) used during purification and crystallization [ 21,22,23 ] -- and so is often solubilized using nonphysiological concentrations of NaCl and glycerol [ 24,25 ] . 
+ It is not known whether this interaction is physiologically relevant , or occurs at lower RNAP concentrations . 
+ Whether ECs cluster is equally unclear . 
+ Although isolated monomers can function when immobilized in vitro [ 19,26 ] , it remains to be seen whether ECs cluster in vivo or in solution . 
+ ECs have been imaged by atomic force microscopy and appear as monomers [ 27 ] ; however , the procedures used to prepare these samples may have destroyed any pre-existing clusters . 
+ Here , we investigate whether or not T7 RNAP ECs cluster using ` pulldowns ' , fluorescence correlation spectroscopy , and chromosome conformation capture . 
+ We find no evidence for clustering , and conclude that if it does occur in vivo , it is probably driven by weak interactions . 
+ Results
+ T7 RNAP ECs do not co-associate in vitro
+ To test whether active T7 RNAPs cluster , we examined whether ECs diffusing freely in solution interacted with distinguishable ECs directly attached to beads ( Fig. 1A ) . 
+ To achieve this , we created a transcription reaction containing RNAP as well as three DNA fragments of different lengths ( Fig. S1A ) : a 290-bp template encoding a T7 promoter that was freely diffusing in solution , a 452-bp template that again encoded the promoter but was bound by a biotin at its 5′ end to streptavidin-coated beads , and an 800-bp promoter-less control fragment . 
+ Figure 1 legend ( partial ) : ... present in samples 1 -- 3 , but not 4 ( as it fails to pellet ) . 
+ The 452-bp template is present in samples 1 and 4 ( as it binds to beads , and pellets ) . 
+ Only trace amounts of the 290-bp template migrate as free DNA in sample 2 ( elongation complexes migrate more slowly as a smear ) , but this amount is increased in sample 3 ( as RNase and heat treatments release it from elongation complexes ) . 
+ The 290-bp template is found in sample 4 when the assay is performed in 10 mM KCl . 
+ However , it is absent when the assay is performed in 10 mM KCl plus tRNA , or the more physiological buffer containing 100 mM K glutamate . 
+ doi : 10.1371/journal.pone.0040207.g001 
+ When ATP , UTP , and GTP ( but no CTP ) were added , RNAPs initiated on the two templates encoding promoters , and transcribed until they needed to incorporate CTP ; they then stably halted ( Fig. S1B ; previous work has shown that the resulting halted ECs have half-lives > 10 min ; [ 18 ] ) . 
+ We then isolated the ECs formed on the 452-bp templates by pelleting the beads and removing the supernatant . 
+ Any ECs formed on 290-bp templates interacting with these pulled-down ECs would then be found in the pellet . 
+ When the pelleted DNA was isolated and visualized , a small amount of the 290-bp template -- but virtually no 800-bp control DNA -- was found ( Fig. 1Bi , sample 4 ) . 
+ Thus it seemed that ECs on the 290-bp template were associating with the beads and being pelleted . 
+ Examination of the DNA remaining in the supernatant using agarose gel electrophoresis allowed us to distinguish unbound templates ( which migrate as free DNA ) from occupied templates ( which migrate more slowly ; Fig. S2 ) . 
+ When the RNAPs in the removed supernatant are stripped from their templates ( by heating ) before gel electrophoresis , a large amount of 290-bp template migrates as free DNA ( Fig. 1Bi , sample 3 ) . 
+ However very little 290-bp template migrates freely when RNAPs remain bound to their templates ( Fig. 1Bi , sample 2 ) . 
+ These results suggest that the majority ( i.e. , 60 -- 80 % ) of 290-bp templates were occupied by halted RNAPs at the moment the beads were pelleted . 
+ Additional controls showed that RNAPs initiated as efficiently on the 452-bp template as on the 290-bp template ( Fig. S3 ) . 
+ Thus , we conclude that although the majority of 452-bp and 290-bp templates were occupied by RNAPs , only a small fraction of the 290-bp template was pelleted . 
+ However , we were concerned that the interaction between ECs might be caused by aggregation of nascent RNA , and not by an interaction between RNAPs . 
+ To investigate this possibility , we repeated the experiment in a buffer containing 10-fold more tRNA than DNA template ( Fig. 1Bii ) . 
+ We expected that the tRNA would disrupt any non-specific RNA-based interactions ( by competing for any RNA-binding sites ) , while leaving polymerase-based protein-protein interactions unaffected . 
+ When the experiment was conducted in the presence of tRNA , only tiny amounts of the 290-bp template were found in the pellet ( `` 8 % of total ; Fig. 1Bii , compare samples 4 and 5 ) . 
+ Because the remaining 290-bp template did not appear to be enriched relative to the 800-bp promoter-less control fragment ( Fig. 1Bii , compare samples 4 and 5 ) , we concluded it was not pelleted due to EC-EC interactions , but rather , persisted because we only removed ~97 % of the supernatant . 
+ Our finding that no short template ( or control DNA ) was found in the pellet when a gentle wash step was included supports this interpretation ( data not shown ) . 
+ Therefore , we conclude that the previously-observed interaction was based on non-specific RNA interactions . 
+ As such interactions are unlikely to be physiologically relevant ( see Text S1A ) , we conclude that no meaningful RNAP-RNAP interactions were detected using these assay conditions . 
+ Repeating the assay using a more physiological buffer ( KGB , which contains 100 mM K glutamate , instead of LS1 , which contains 10 mM KCl ) yielded a similar conclusion even though no tRNA was present : although most templates were occupied by RNAPs ( Fig. 1Biii , compare free-migrating short template in samples 2 and 3 ) , no enrichment of the 290-bp template relative to the control DNA was observed ( Fig. 1Biii , sample 4 ) . 
+ Identical results were obtained when the total concentration of ECs was increased to 0.1 µM , and when bovine serum albumin was used as a blocking agent instead of casein ( data not shown ) . 
+ Were ECs to form stable , oligomeric clusters , we would expect that most of the occupied short template ( i.e. , ~60 -- 80 % of total ) would interact with the bead-bound ECs , and so be found in the pellet . 
+ Our finding that less than a few percent of the short templates are pulled down therefore supports the conclusion that ECs do not form stable clusters under these conditions . 
+ T7 RNAP ECs do not interact with a Kd < 1 µM
+ In our previous experiment , we found that ECs attached to beads were unable to ` pull down ' ECs in solution . 
+ However , it is possible that the pelleting of the bead-bound ECs disrupted their interaction with ECs in solution . 
+ To eliminate this possibility , we used fluorescence correlation spectroscopy ( FCS ) to study EC diffusion behaviour . 
+ In this nonperturbative technique , a laser is focused on a ` confocal spot ' in solution , allowing the measurement of the diffusion times -- and therefore relative sizes -- of fluorescently-labelled ECs [ 28 ] . 
+ Since diffusion is slower for larger complexes , diffusion times increase with complex size . 
+ We expected single ECs with no interaction partners to diffuse relatively quickly , with a small diffusion time less than or equal to the sum of the diffusion times of their components ( i.e. , an RNAP and its template ; Text S1B ) ; in contrast , interacting ECs should diffuse more slowly as large complexes containing multiple RNAPs and templates -- with diffusion times greater than those expected for non-interacting ECs . 
+ We began by calculating an expected diffusion time for noninteracting ECs . 
+ We determined that the diffusion time of the 70-bp fluorescently-labeled template upon which our ECs would be formed was 2.4 ± 0.1 ms ( Fig. 2Aii ) . 
+ This measurement was in agreement with values determined previously ( Text S1C ) . 
+ We then calculated that T7 RNAP would -- because of its size and globular nature -- have a diffusion time of 2 -- 3 ms ( Text S1C ) . 
+ Assuming that the diffusion time of a complex would be less than the sum of the diffusion times of its parts , we concluded that non-interacting ECs would have a diffusion time of 2.4 -- 5.4 ms . 
+ If ECs had a diffusion time above this range , it would suggest the existence of larger , and therefore higher-order , complexes . 
+ To generate ECs that could be tracked by FCS , we allowed RNAP to initiate on a 70-bp fluorescently-labeled template in the presence of ATP , UTP , and GTP . 
+ Under these conditions , the enzyme produced a 23-bp transcript before stably halting when the first C needed to be incorporated ( Fig. S1 ) . 
+ The majority of such a short nascent transcript is hidden within the RNAP ( or bound to its surface ; [ 27 ] ) , and we anticipated that the few bps emerging from the EC would not drive the RNA-based interactions observed in our ` pulldown ' assay . 
+ We expected that the templates in the EC-containing solution would be found in one of three populations : unoccupied templates , templates incorporated into ECs that are not bound to other ECs , and templates incorporated into ECs which in turn are bound to other ECs . 
+ For complexes with diffusion times within an order of magnitude of one another , FCS essentially reports the average diffusion time of all fluorescent species ; thus fast-diffusing templates not bound to clustered RNAPs could -- if numerous enough -- easily obscure the existence of more slowly-diffusing EC clusters . 
+ To ensure that the fraction of templates not incorporated into ECs was negligible , we used more RNAP than template in our reactions , and performed extensive controls to show that virtually every template was bound by an active RNAP ( Text S1D ) . 
+ The fraction of ECs found in clusters depends upon the strength of the attraction between RNAPs ; as most protein-protein interactions have Kd between 1 nM and 1 µM [ 29 ] , we expected that the strength of any EC clustering would also fall within this range . 
+ To detect such interactions , we required EC concentrations > 0.1 µM ; unfortunately , our FCS setup could only measure fluorescent species present at concentrations below 50 nM . 
+ To allow higher concentrations of ECs , we used a low concentration of labeled template ( always 2 nM ) and a large excess of unlabeled template ( up to 0.54 µM ) in our transcription reactions . 
+ ECs formed on unlabeled templates would not be directly visible to our FCS assay , but could still bind to the labeled ECs and so retard their diffusion . 
+ After initiating a transcription reaction containing 2 nM labeled 70-bp template , 100 nM unlabeled 70-bp template , and 120 nM RNAP , we measured the average diffusion time of the now-occupied templates to be 3.3 ± 0.2 ms ( Fig. 2Aiii ) . 
+ To be absolutely confident that all templates were incorporated into ECs ( Text S1D ) , we repeated the experiment using an increased RNAP : template ratio of 5:1 ; the template diffusion time marginally increased to 3.9 ± 0.2 ms ( Fig. 2Aiii ) . 
+ These diffusion times fall squarely within the range expected for non-interacting ECs , and thus provide no evidence for RNAP clustering . 
+ However , we were unable to calculate precisely an expected diffusion time for small EC clusters ( e.g. , dimers or trimers ) , and thus could not formally exclude the possibility that our ECs were diffusing as dimers or other lower-order complexes , rather than monomers . 
+ To set a lower limit on the diffusion times of EC clusters , we replaced the 70-bp unlabeled templates in our experiment with 452-bp unlabeled templates ( Fig. 2Aiv ; S1 ) . 
+ Under these conditions , any EC clusters would contain at least one EC formed on a 452-bp template , and so would possess a τD > 15 ms ( i.e. , the diffusion time of the 452-bp template alone ; Fig. 2Av ; Text S1C ) . 
+ However , substituting unlabeled 452-bp templates for unlabeled 70-bp templates had no significant effect on the diffusion time of the labeled 70-bp ECs , which still diffused with τD = 3 -- 4 ms ( Fig. 2Aiii -- iv ) . 
+ This was the case even when the concentration of occupied 452-bp templates was increased to 0.54 µM ( Fig. 2Aiv ) . 
+ We conclude that -- under our assay conditions -- the overwhelming majority of RNAPs halted on the labeled 70-bp templates did not bind to the RNAPs halted on the 452-bp templates . 
+ We note that our finding that the diffusion times of ECs were relatively unaffected by the ratio of RNAP : template is not consistent with the possibility that an interaction was present , but titrated out by excess RNAP . 
+ To estimate the detection limit of our assay , we calculated the autocorrelation function that our assay would have produced , if the halted RNAPs were to interact . 
+ In the experiment of Figure 2Aiv , we measured the autocorrelation function of 2 nM labeled ECs ( formed on 70-bp templates ) , in the presence of 0.54 µM unlabeled ECs ( formed on 452-bp templates ) . 
+ If ECs dimerized with Kd = 1 µM , such a solution would contain ~40 % dimers and ~60 % monomers . 
+ We calculated the autocorrelation function of this solution by conservatively modeling monomers ( 70-bp templates bound by halted RNAPs ) as having a tD of 4 ms , and dimers ( complexes containing two active RNAPs , one 70-bp template , and one 452-bp template ) as having a tD of 15 ms . 
+ We find that such a solution would produce an autocorrelation function clearly distinguishable from the one measured in the experiment summarized in Fig. 2Aiv ( with results in Fig. 2B ) . 
+ Thus , we conclude that -- under our in vitro conditions -- active T7 RNAPs do not interact with a Kd < 1 µM . 
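+ The dimer fraction quoted above can be approximated with a simple 1:1 binding estimate . The sketch below is not the authors' calculation ; it assumes that the labeled ECs are a trace species , that the unlabeled ECs are in large excess , and that depletion of free unlabeled ECs by self-association can be ignored . The function name is hypothetical .

```python
def bound_fraction(partner_conc, kd):
    """Fraction of a trace species bound to an excess partner (1:1 binding).

    Both arguments must be in the same concentration unit.
    Assumes the free partner concentration equals its total concentration.
    """
    if partner_conc < 0 or kd <= 0:
        raise ValueError("concentration must be >= 0 and Kd must be > 0")
    return partner_conc / (partner_conc + kd)


# Unlabeled partner at 0.54 and Kd at 1, both in the same unit as in the text.
fraction = bound_fraction(0.54, 1.0)
print(f"estimated dimer fraction: {fraction:.2f}")
```

+ Under these assumptions the estimate comes out at roughly one third , which is of the same order as the dimer fraction stated in the text .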
+ Genes transcribed by T7 RNAP do not detectably interact 
+ To test whether ECs interact in their native cellular environment ( i.e. , in living E. coli ) , we used ` chromosome conformation capture ' ( 3C ; [ 30 ] ) to determine whether or not two T7 promoter-encoding genes -- which are located far apart on the bacterial chromosome -- are in contact more frequently when transcribed by T7 RNAP . 
+ If ECs active at different genomic sites interacted , we expected that their respective transcription units would also be brought into close proximity . 
+ We began by constructing a strain that would allow us to test this hypothesis . 
+ We first inserted two genes encoding T7 promoters ( PT7-YFP and PT7-T7gene10 ) into the E. coli genome 100 kbp apart ( Fig. 3A ) . 
+ We expected that if ECs clustered , these two genes would be brought into contact when transcribed by the T7 polymerase . 
+ To control the levels of T7 RNAP in the cell , we integrated a gene expressing the polymerase under the control of a PBAD promoter ( Fig. 3A ) . 
+ This gene produced high levels of T7 RNAP when cells were grown in arabinose , but negligible levels when cells were grown in glucose ( Fig. 3Bi ) . 
+ Controls confirmed that this T7 RNAP efficiently transcribed the two T7 promoter-driven test genes ( Fig. 3B ) . 
+ We then used ` 3C ' to determine whether or not the two test genes were in contact more frequently when transcribed by T7 RNAP . 
+ This PCR-based method determines the relative interaction frequencies of different genomic regions in vivo [ 30 ] . 
+ Cells are fixed with formaldehyde , and their chromatin digested with a restriction enzyme . 
+ Cross-linked restriction fragments are then ligated together , and the frequency of ligations between different pairs of restriction fragments is measured by PCR . 
+ We performed 3C on cells grown in either arabinose or glucose , and -- under both conditions -- determined the frequency with which the BglII restriction fragment containing PT7-T7gene10 was ligated to the fragment containing PT7-YFP ( Fig. 4A ) . 
+ We found that transcription of the two test-genes by T7 RNAP had no effect on the ligation frequency of their respective restriction fragments ( Fig. 4B , lanes 1,2 , primer pair a : c ) . 
+ Controls showed that the formation of the ligation products depended on formaldehyde crosslinking ( Fig. 4B lane 3 ) , and that the efficiency of the 3C protocol was independent of the presence of T7 RNAP ( Fig. 4B primer pairs a : b , d : e ) . 
+ We conclude that if T7 RNAP ECs do interact , they do not do so strongly enough to significantly change 
+ Discussion
+ Many RNAPs co-associate when active ; this clustering often influences function , for example , by increasing activity ( see Introduction ) . 
+ In order to determine whether T7 RNAP behaves similarly , we used three independent assays to test whether this polymerase also clusters when active . 
+ In the first assay , we attempted to ` pulldown ' ECs in solution using ECs attached to beads ( Fig. 1A ) , and found no evidence for a direct protein-protein interaction ( Fig. 1B ) . 
+ As this assay required physical manipulation of ECs which might break weak EC-EC interactions , we performed a second assay using fluorescence correlation spectroscopy ; this directly measures complex sizes without the need for physical manipulation , but it also failed to provide evidence for clustering ( Fig. 2 ) . 
+ Therefore , if T7 ECs do interact in vitro , it seems likely that they will do so with a Kd outside the detection range of our assays ( i.e. , > 1 µM , which is much greater than the estimated in vivo concentration of 30 nM ; see Text S1G ) . 
+ As the buffers and enzyme concentrations we use are typical of those widely applied by others [ 18,20,24 ] , we conclude that in the majority of the instances where it has been studied , T7 RNAP has behaved as a monomer . 
+ Because interactions present in vivo can be missed by in vitro assays ( e.g. , if they require macromolecular crowding , or a ` bridge ' protein ) , we also used chromosome conformation capture ( 3C ) to examine association in vivo ( Fig. 3 ) . 
+ In mammals , 3C readily detects RNAP-driven clustering of active genes [ 31,32 ] , even when those interactions occur in only ~1 % of cells in the population [ 31 ] . 
+ However , 3C failed to provide any evidence for clustering in bacteria ( Fig. 4 ) , even though the genes we examined are probably as tightly packed with polymerases as the ribosomal cistrons ( our T7 RNAP-based expression system can produce as much RNA as all seven ribosomal cistrons combined , which are each typically occupied by 70 RNAPs/gene ; [ 33,34 ] ; see also Text S1E ) . 
+ However , our 3C assay does have limitations . 
+ It involves formaldehyde fixation , which can rapidly disrupt nucleoid structure [ 35,36 ] , and so could -- in principle -- also destroy any clustering . 
+ Note , however , that clustering of genes binding H-NS , a global transcriptional silencer , can be detected by 3C [ 37 ] . 
+ We may also have inadvertently inserted our two test genes in regions of the bacterial genome that interact rarely . 
+ Another problem is that the phage-encoded proteins expressed during T7 infection were not present in our 3C assay . 
+ Any EC clustering dependent upon a phage-encoded ` bridge ' protein would not have been detected in our assays ( this , and other potential problems are discussed in Text S1F ) . 
+ In conclusion , we find no evidence for the clustering of active forms of T7 RNAP either in vitro or in vivo . 
+ Our in vitro assays allow us to exclude the possibility of a strong interaction between ECs ( i.e. , with Kd < 1 µM ) . 
+ Our in vivo 3C assay does not allow us to draw equally firm conclusions , but nevertheless suggests that if an interaction does exist , it is likely to be weak , disrupted by our assays , or dependent on phage proteins not present in our 3C experiment . 
+ If an interaction does not exist , then the phage enzyme clearly has different properties from its mammalian counterparts , with which it shares only minimal structural homology [ 38 ] . 
+ But , then , Nature must find other ways of immobilizing the phage enzyme , or otherwise preventing the entanglement of nascent transcripts about their templates [ 11,39 ] . 
+ Materials and Methods
+ Templates
+ Template DNA was created by PCR from pLSG407 [ 40 ] unless otherwise indicated . 
+ KRF3/28 was the product of a PCR using primers KRF3 and KRF28 . 
+ The ` 452-bp template ' ( created using KRF3/28 as a template ) was the product of primers KRF28 and KRF32 , and contained a 5′ biotin , followed by a BamHI site , a T7 promoter , and a 382-bp C-less cassette followed by 16 bp of C-containing DNA . 
+ The ` 290-bp template ' contained a T7 promoter followed by a 243-bp C-less cassette and 12 bp of C-containing DNA , and was the product of primers KRF36 and KRF37 . 
+ The ` 70-bp template ' was created using the oligonucleotide template KRF47 in combination with the primers KRF42 and KRF45 , and contained a T7 promoter followed by a 23-bp C-less cassette and 12 bp of C-containing DNA . 
+ Template DNA was purified using a Minelute PCR purification kit ( Qiagen ) . 
+ Labeling of DNA with fluors
+ The fluorescently-labelled 70-bp DNA template was prepared in the same manner as the unlabeled template , except that the primer KRF43 was replaced by the fluorescently-labeled primer KRF45 ( see Table S1 for primer sequence ) . 
+ KRF45 contained an amine-labeled dT residue near its 5′ end , and was labeled using succinimidyl esters of Cy3B ( GE Healthcare ) or Atto647 ( Atto-Tec ) following the manufacturer 's instructions . 
+ One hundred micrograms of KRF45 was dissolved in 100 µL of H2O and extracted three times with an equal volume of chloroform . 
+ After the addition of 10 µL 3 M sodium chloride and 250 µL ethanol , the oligonucleotide was incubated at -20 °C for 30 min , and then centrifuged at 12,000 × g for 30 min at 4 °C . 
+ The pellet was allowed to dry , resuspended in 75 µL of 0.1 M sodium borate ( pH 8.5 ) , and frozen in 25 µL aliquots . 
+ A 50 nmol aliquot of succinimidyl ester was then resuspended in 5 µL DMSO , mixed with a 25 µL aliquot of KRF45 , and left overnight ( in darkness ) at 25 °C . 
+ Labeled oligonucleotides were purified away from unconjugated fluorophore by ethanol precipitation , followed by one wash with 70 % ethanol . 
+ Comparing the absorbance of the oligonucleotide at 260 nm ( using ε260 = 193,750 M⁻¹ cm⁻¹ ) with its absorbance at 563 nm ( for Cy3B ; using ε563 = 130,000 M⁻¹ cm⁻¹ , CF260 = 0.08 ) or 650 nm ( for Atto647N ; ε650 = 150,000 M⁻¹ cm⁻¹ , CF260 = 0.06 ) showed that 90 -- 100 % of oligonucleotides were labeled . 
+ Denaturing urea-PAGE followed by visualization of the unstained gel with a FLA5000 imager showed that > 90 % of the dye migrated with the purified oligonucleotide . 
+ The transcription buffer used in this experiment was either low-salt buffer ( LS1 ; 40 mM Tris-acetate pH 7.6 , 10 mM potassium chloride , 15 mM magnesium acetate , 5 mM dithiothreitol , 0.1 mg/mL N,N-dimethylated casein , 0.05 % Tween 20 , 0.4 U/µL RNase inhibitor , Roche ) or the more physiological potassium-glutamate buffer ( KGB ; 40 mM Tris-acetate pH 7.6 , 100 mM potassium glutamate , 15 mM magnesium acetate , 5 mM dithiothreitol , 0.1 mg/mL N,N-dimethylated casein , 0.4 U/µL RNase inhibitor ; [ 41 ] ) . 
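+ The absorbance comparison described above can be sketched as a short calculation . This is an illustration only : it assumes that CF260 is the usual correction factor ( the dye 's absorbance at 260 nm as a fraction of its absorbance at its peak ) , and the absorbance readings in the example are made-up values , not data from the study . The function name is hypothetical .

```python
def labeling_efficiency(a260, a_dye, eps260, eps_dye, cf260):
    """Dye-to-oligonucleotide molar ratio from two absorbance readings.

    a260    : absorbance of the sample at 260 nm
    a_dye   : absorbance of the sample at the dye's peak wavelength
    eps260  : extinction coefficient of the oligonucleotide at 260 nm (M^-1 cm^-1)
    eps_dye : extinction coefficient of the dye at its peak (M^-1 cm^-1)
    cf260   : fraction of the dye's peak absorbance that appears at 260 nm
    A 1 cm path length is assumed.
    """
    # Remove the dye's own contribution to the 260 nm reading.
    oligo_conc = (a260 - cf260 * a_dye) / eps260
    dye_conc = a_dye / eps_dye
    return dye_conc / oligo_conc


# Hypothetical readings for a Cy3B-labeled oligonucleotide.
ratio = labeling_efficiency(a260=0.20, a_dye=0.12,
                            eps260=193_750, eps_dye=130_000, cf260=0.08)
print(f"labeled fraction: {ratio:.2f}")
```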
+ The buffer LS1 was used because a study of the effect of buffer composition on T7 RNAP activity found this formulation to be optimal [ 24 ] . 
+ The buffer KGB was used because it is thought to mimic the cellular milieu [ 41 ] . 
+ The blocking agent in KGB was changed from bovine serum albumin ( BSA ) to casein because the latter yielded slightly higher T7 RNAP activity [ 24 ] . 
+ The experiment was performed at 25 °C ( when LS1 was used ) or 37 °C ( when KGB was used ) . 
+ A 60 µL transcription reaction contained transcription buffer plus 4 pmol His6-tagged T7 RNA polymerase , 0.6 pmol biotinylated 452-bp template , 0.6 pmol 290-bp template , and 0.2 pmol 800-bp control DNA . 
+ Two samples ( 2 µL each ) were taken , and immediately added to 10 µL ice-cold 1× TBE loading dye ( 89 mM Tris-borate , 89 mM boric acid , 2 mM EDTA , 0.05 % bromophenol blue ) . 
+ Separately , 30 µL of M270 magnetic streptavidin beads ( 6.7 × 10^8 beads per mL ; Invitrogen ) were washed twice in 200 µL transcription buffer , and then resuspended in the remaining 56 µL of the transcription reaction . 
+ After incubation for 20 min ( with mixing after 10 min ) , ATP , UTP , and GTP were added to a final concentration of 0.5 mM . 
+ Then , after 30 s , beads were pelleted with the aid of a magnet , and the supernatant removed . 
+ After removing a 2 µL sample ( and addition to TBE loading dye as above ) , supernatants were heated to 65 °C for 10 min , and treated with 10 U RNase I ( Promega ) for 10 min at 37 °C . 
+ The pellet was resuspended in water , then 10× LS1 was added to a final concentration of 1× , followed by the addition of 10 U/mL RNase I and 10 U BamHI ( ensuring that the initial ~60 µL volume was conserved ) . 
+ After 20 min at 37 °C , beads were pelleted , the supernatant heated to 65 °C for 10 min , and 2 µL samples collected ( and added to TBE loading dye as above ) . 
+ Fluorescence correlation spectroscopy
+ Transcription reactions ( performed in LS1 ) were initiated by addition of ATP , UTP , and GTP to 0.5 mM , and incubated for 30 s at 25 °C before being pipetted onto a cleaned coverslip at 25 °C . 
+ Fluorescence correlation spectroscopy was performed as described [ 42 ] . 
+ Time traces were acquired for 10 s using a SPQR-14 avalanche photodiode ( Perkin Elmer ) , and autocorrelation functions were produced in real-time using a Flex02-02D correlation card ( Correlator.com ) . 
+ As our setup has a large pinhole , and therefore an elongated confocal spot ( longitudinal radius , wz >> wxy , the axial radius ) , translational diffusion times ( tD ) were extracted from autocorrelation curves by fitting to a two-dimensional single-species model , $G(t) = \frac{1}{N} \left( 1 + \frac{t}{t_D} \right)^{-1}$ ( equation 1 ; [ 43 ] ) , where t is the delay time , G ( t ) is the autocorrelation function , and N is the mean number of fluorescent molecules in the observation volume over the measurement . 
+ Experimentally acquired FCS curves were fit very well by this model ( e.g. , Fig. 2B and Fig. S4 ) . 
+ Although the molecules we analyze diffuse in three dimensions , the 3D model , $G(t) = \frac{1}{N} \left( 1 + \frac{t}{t_D} \right)^{-1} \left( 1 + \frac{t}{A^2 t_D} \right)^{-0.5}$ ( where A = wz/wxy ; equation 2 ; [ 28 ] ) , simplifies to the two-dimensional model ( equation 1 ) in the case of an elongated confocal spot [ 44 ] . 
+ To ensure that the 2D model was appropriate for modeling our data , we fit our Rhodamine 6G autocorrelation curves with both the 2D and 3D models . 
+ Fitting the data with the 3D model did not significantly change the values we obtained for either tD or N ; however , A could not be fit with reasonable confidence intervals ; changing the value of A therefore did not substantially affect the goodness of fit , a behavior consistent with confocal volumes where wz >> wxy . 
+ To ensure that our choice of model did not change the conclusions of our FCS work , we re-fit all of our FCS curves ( i.e. , all the data in Fig. 2A ) using the 3D model and setting A = 7 , a common value for single-photon excitation setups ; doing so increased all tD values by a small amount ( ~3 -- 5 % ) , with the difference between any two tD values changing by not more than 2 % . 
+ Two-species curves were calculated using the model $G(t) = \frac{1}{N^2} \left( N_1 D_1(t) + N_2 D_2(t) \right)$ , where $D_i(t) = \left( 1 + \frac{t}{t_{Di}} \right)^{-1}$ , and N1 and N2 are the mean number of fluorescent molecules of species 1 and 2 , respectively , in the observation volume ( equation 3 ; [ 28 ] ) . 
+ Curve fitting was performed in MATLAB ( Mathworks ) . 
+ These models were also used to calculate the curves in Figure 2B . 
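+ The three models above can be written out as short functions . This is a minimal sketch rather than the authors' MATLAB code ; it assumes equal brightness for both species and takes N in the two-species prefactor to be N1 + N2 , which the text does not state explicitly . All function names are hypothetical .

```python
def g_2d(t, n, tau_d):
    """Equation 1: two-dimensional, single-species autocorrelation."""
    return (1.0 / n) / (1.0 + t / tau_d)


def g_3d(t, n, tau_d, a):
    """Equation 2: three-dimensional model with aspect ratio a = wz / wxy."""
    return g_2d(t, n, tau_d) / (1.0 + t / (a * a * tau_d)) ** 0.5


def g_two_species(t, n1, tau_d1, n2, tau_d2):
    """Equation 3: two species of equal brightness (N assumed to be n1 + n2)."""
    total = n1 + n2
    d1 = 1.0 / (1.0 + t / tau_d1)
    d2 = 1.0 / (1.0 + t / tau_d2)
    return (n1 * d1 + n2 * d2) / total ** 2


# Compare a pure monomer population (tD = 4 ms) with a 60:40 mix of
# monomers (4 ms) and dimers (15 ms); delay times are in ms.
for delay in (0.1, 1.0, 4.0, 15.0, 100.0):
    monomer_only = g_2d(delay, 1.0, 4.0)
    mixture = g_two_species(delay, 0.6, 4.0, 0.4, 15.0)
    print(f"t = {delay:6.1f} ms  monomer = {monomer_only:.3f}  mixture = {mixture:.3f}")
```

+ With a large aspect ratio the second factor of the 3D model approaches 1 , so `g_3d` converges on `g_2d` , which matches the simplification described in the text .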
+ Fluorescence fluctuations were unlikely to be the result of dye-specific or photoinduced photophysics , as the fitted N and tD of the fluorescently-labelled 70-bp template were unchanged when Atto647N was substituted for Cy3B , or when laser power was increased 10-fold ( data not shown ) . 
+ In order to convert diffusion times ( which depend on the size of the observation volume generated by the FCS setup ) into diffusion coefficients ( which are physical constants ) , we calculated the radius of the observation volume , ω , using $t_D = \omega^2 / 4D$ ( equation 4 ; [ 28 ] ) . 
+ Measuring a diffusion time of 0.38 ± 0.1 ms ( fitting to equation 1 ) for the fluorescent standard rhodamine 6G ( D = 4.14 × 10⁻⁶ cm²/s ; [ 45 ] ) allowed us to calculate ω = 780 ± 100 nm . 
+ This observation volume is slightly larger than usual in order to maximize the number of photons captured from fluorophores during single-molecule FRET experiments carried out on the setup ; however , this does not affect our ability to measure diffusion times . 
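+ The conversion in equation 4 can be checked numerically . The sketch below reads the rhodamine 6G values as a diffusion time of 0.38 ms and a diffusion coefficient of 4.14 × 10⁻⁶ cm²/s ; that reading of the extracted text is an assumption , and the function names are hypothetical .

```python
import math


def observation_radius_nm(tau_d_s, diff_coeff_cm2_s):
    """Radius of the observation volume from tD = w^2 / (4 D), in nm."""
    diff_coeff_m2_s = diff_coeff_cm2_s * 1e-4  # cm^2/s -> m^2/s
    return math.sqrt(4.0 * diff_coeff_m2_s * tau_d_s) * 1e9


def diffusion_coefficient_cm2_s(tau_d_s, radius_nm):
    """Inverse of the above: diffusion coefficient from a diffusion time."""
    radius_m = radius_nm * 1e-9
    return radius_m ** 2 / (4.0 * tau_d_s) * 1e4  # m^2/s -> cm^2/s


radius = observation_radius_nm(0.38e-3, 4.14e-6)
print(f"observation radius: {radius:.0f} nm")

# Diffusion coefficient implied by a 4 ms diffusion time in the same volume.
print(f"D for tD = 4 ms: {diffusion_coefficient_cm2_s(4e-3, radius):.2e} cm^2/s")
```

+ The radius obtained this way lies within the stated uncertainty of the value reported in the text .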
+ Chromosome conformation capture
+ This protocol -- modified from the original [ 30 ] for use in bacteria -- was generously provided by Mark Umbarger ( Harvard ; [ 46 ] ) . 
+ The E. coli strain KF22-1 was grown overnight to saturation in LB + 50 µg/mL kanamycin , diluted by 1:250 into flasks containing 25 mL of the same medium ( preheated to 37 °C ) , and incubated at 37 °C with shaking . 
+ After 30 min , arabinose was added to 0.4 % , or glucose was added to 0.2 % . 
+ When the cultures reached an OD600 of 0.4 , sodium phosphate ( pH 7.6 ) and formaldehyde were added to final concentrations of 10 mM and 1 % respectively ( except for non-crosslinked controls ) . 
+ After 20-min incubation at 37 °C and 20-min incubation in an ice bath ( both with light shaking ) the formaldehyde reactions were quenched by addition of glycine to 0.125 M , and incubated for 5 min at 25 °C . 
+ All cultures were then spun down at 5000 × g for 10 min , washed once with 50 mL ice-cold Tris-buffered saline ( 20 mM Tris-HCl pH 7.5 , 150 mM NaCl ) , pelleted , and stored at -80 °C . 
+ The pellets were then resuspended in 1 mL TE buffer ( 10 mM Tris , 1 mM EDTA , pH 8 ) , and minor adjustments were made to assure that the OD600 of all samples was equal . 
+ For each pellet , 60 kU of Ready-Lyse Lysozyme ( Epicentre ) was added , and the mixture incubated at 25 °C for 15 min with occasional gentle pipetting to resuspend cells . 
+ SDS was then added to a final concentration of 0.5 % and cells were allowed to incubate for 30 min . 
+ Five microlitres of solubilized chromatin ( ~100 ng DNA ) were mixed into a 50 µL volume containing 1× restriction buffer #3 ( New England Biolabs ) and 1 % Triton X-100 , and incubated for 20 min to allow the Triton to neutralize the SDS . 
+ Fifty units of BglII ( New England Biolabs ) were added , and the chromatin digested for 2.5 h at 37 °C with light shaking . 
+ One additional sample served as a no-restriction enzyme control . 
+ The reaction was then halted by addition of SDS to 1 % . 
+ In order to form intra-molecular ligation products , 60 µL digested chromatin was added to 760 µL ` ligation mix ' containing 1× T4 ligase buffer , 1 mM ATP , 25 µg/mL BSA , 1 % Triton X-100 , and 2.4 kU/mL T4 DNA ligase . 
+ One additional sample served as a ` no ligase ' control . 
+ Ligase mixtures were then incubated at 16 °C for 1 h . 
+ The reaction was halted by the addition of EDTA to 10 mM , and incubated overnight with 50 µg of proteinase K at 65 °C . 
+ Four hundred microlitres of the DNA solution was then extracted twice with 400 µL of 25:25:1 phenol : chloroform : isoamyl alcohol . 
+ Glycogen was added to a final concentration of 50 µg/mL . 
+ Ice-cold sodium acetate and ethanol were then added to final concentrations of 0.75 M and 70 % ( v/v ) respectively . 
+ The DNA-glycogen mixture was incubated at -80 °C for 3 h , and then spun down at 20,000 × g at 4 °C for 20 min . 
+ The pellet was then washed with 1 mL 70 % ( v/v ) ethanol ( 25 °C ) , air dried , and resuspended in 12 µL distilled , deionized H2O . 
+ PCR was performed using FlexiTaq DNA polymerase ( Promega ) and 1× reaction buffer , 1.75 mM MgCl2 , 0.2 mM dNTPs , 0.4 µM primers and 2 % DMSO on a thermocycler using the following program : ( i ) 95 °C for 1 min , ( ii ) 95 °C for 1 min , ( iii ) 65 °C for 45 s , ( iv ) 72 °C for 2 min , ( v ) repeat steps ii -- iv 35 times , and ( vi ) 72 °C for 6 min . 
+ Ligations between restriction fragments 1 ( T7 gene 10 ) and 8 ( control DNA fragment ) were amplified using primers KF101to8BglIIfw and KF101to8BglIIrv ; these primers were designed to produce a fragment of 243 bp ( this corresponded to ligation product a : b in Fig. 4A ; all primer sequences can be found in Table S1 ) . 
+ Ligations between restriction fragments 1 ( T7 gene 10 ) and 16 ( pT7-Ypet ) were amplified using primers KF101to16BglIIfw and KF101to16BglIIrv ; these primers were designed to produce a fragment of 217 bp ( this corresponded to ligation product a : c in Fig. 4A ) . 
+ We queried the inversion and ligation of two adjacent fragments of E. coli genomic DNA by PCR using primers 3CposconA and 3CposconB ; these primers were designed to produce a fragment of 443 bp ( this corresponded to ligation product d : e in Fig. 4A ) . 
+ The identity of all PCR products was confirmed by measuring the size of the products , and by digesting these products with BglII ( data not shown ) . 
+ We quantified the amount of ligation products produced in our 3C reactions using PCR , following well established protocols [ 47 ] . 
+ We began by optimizing PCR conditions ( i.e. , amount of 3C template per reaction , and number of PCR cycles ) to ensure that the amount of PCR product produced was linearly related to the amount of ligation product initially present in the PCR reactions . 
+ This was accomplished by performing PCR reactions containing serial dilutions of the 3C template , subjecting the PCR reactions to gel electrophoresis ( on a TBE-2 % agarose gel ) , staining the gels with SYBR green I , and measuring the intensities of the bands corresponding to the amplification products ( using AIDA image analysis software ) . 
+ We found that , for all the ligation products we examined , 36 PCR cycles on 30 ng of our 3C template resulted in bands with an intensity that was proportional to the amount of ligation product in the initial PCR reactions ( e.g. , see Fig. 4B lanes 1 , 5 , and 6 ) . 
+ Using these conditions , we then conducted PCR on all experimental samples in triplicate . 
+ For each primer pair , controls containing 15 ng and 60 ng ` + T7 ' 3C template ( i.e. , 0.5× and 2× the normal amount ) were also included to ensure that the intensity of the bands produced on our gels was linearly related to the amount of ligation products in the PCR reactions ( these controls are found in Fig. 4B , lanes 1 , 5 and 6 ) . 
+ Only samples run on the same gel were directly compared . 
+ The goal of the experiment was to determine whether the interaction frequency of the transgenes PT7-gene10 and PT7-YFP ( X ) in the presence of T7 RNAP , $X_{+T7}$ , was greater than the interaction frequency of these two genes in the absence of T7 RNAP , $X_{-T7}$ . 
+ In other words , the goal was to determine whether $X_{+T7} / X_{-T7}$ was greater than 1 . 
+ The relationship between interaction frequencies ( which occur in the cell ) and ligation frequencies ( which are present in a 3C template sample ) is given by $\frac{X_{+T7}}{X_{-T7}} = \frac{ L_{+T7} / LC_{+T7} }{ L_{-T7} / LC_{-T7} }$ ( equation 5 ) , where $L_{+T7}$ and $L_{-T7}$ are the ligation frequencies of the transgenes in the presence and absence of T7 RNAP , while $LC_{+T7}$ and $LC_{-T7}$ are the ligation frequencies of two control restriction fragments that should interact at the same rate regardless of whether or not the transgenes are transcribed by T7 RNAP ( these two control ligation products were amplified by primers a : b or d : e ; Fig. 4A ) . 
+ This equation states that directly comparing ligation frequencies between different 3C samples is possible only after differences in the efficiency of the 3C protocol between samples are controlled for . 
+ If we assume that the intensity of the band produced by each amplified ligation product is proportional to the original amount of ligation product in the 3C template ( we do indeed show that this is the case ; see above , and Fig. 4B lanes 1 , 5 , and 6 ) , then the intensity of the band seen in the gel , I , is related to the amount of ligation product in the PCR reaction , L , by $L = a \cdot I$ , where a is the efficiency of the relevant primer pair . 
+ Then $\frac{X_{+T7}}{X_{-T7}} = \frac{ a_{T7} I_{+T7} / a_C IC_{+T7} }{ a_{T7} I_{-T7} / a_C IC_{-T7} } = \frac{ I_{+T7} / IC_{+T7} }{ I_{-T7} / IC_{-T7} }$ ( equation 6 ) . 
+ This equation reveals that because the experiment is ultimately interested in a change in a single interaction frequency , primer efficiencies cancel out , and have no effect on the final result . 
+ It also gives the expressions that must be measured in order to determine whether the interaction frequency of the two transgenes changes in the presence of T7 RNAP . 
+ The values of $I_{+T7} / IC_{+T7}$ and $I_{-T7} / IC_{-T7}$ are given by the ` test gene contact frequencies ' in Fig. 4B lanes 1 and 2 . 
+ Because these values are virtually identical , $X_{+T7} / X_{-T7}$ is ~1 . 
+ This result indicates that the interaction frequency of the transgenes is not changed by the presence of T7 RNAP . 
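+ Equation 6 reduces to a ratio of two normalized band intensities . The sketch below illustrates that calculation and shows that the primer efficiencies cancel ; the band intensities and efficiencies are made-up values , not measurements from Fig. 4B , and the function name is hypothetical .

```python
def interaction_ratio(i_test_plus, i_ctrl_plus, i_test_minus, i_ctrl_minus):
    """X(+T7) / X(-T7) from band intensities, following equation 6.

    i_test_* : band intensity of the test ligation product (primer pair a:c)
    i_ctrl_* : band intensity of a control ligation product (a:b or d:e)
    The suffix gives the condition: plus = with T7 RNAP, minus = without.
    """
    return (i_test_plus / i_ctrl_plus) / (i_test_minus / i_ctrl_minus)


# Hypothetical intensities: the +T7 sample is twice as concentrated overall,
# but the normalized test-gene contact frequency is unchanged.
ratio = interaction_ratio(100.0, 200.0, 50.0, 100.0)
print(f"X(+T7) / X(-T7) = {ratio:.2f}")

# Scaling intensities by primer efficiencies (L = a * I) leaves the ratio
# unchanged, because a_T7 and a_C appear in both numerator and denominator.
a_t7, a_c = 0.8, 0.5
scaled = interaction_ratio(a_t7 * 100.0, a_c * 200.0, a_t7 * 50.0, a_c * 100.0)
print(f"with primer efficiencies applied: {scaled:.2f}")
```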
+ To test the efficiency of restriction nuclease digestion , PCR primers BglIIconfw and BglIIconrv were chosen to amplify a 285 bp fragment of genomic DNA containing a BglII site at its centre . 
+ To quantify total DNA , PCR primers rpoZampfw and rpoZamprv were chosen to amplify a 292 bp genomic fragment that did not contain a BglII site . 
+ Restriction digestion efficiency was determined by comparing the ratios of the BglIIconfw/rv fragment : rpoZampfw/rv fragments in the presence and absence of restriction digestion . 
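+ The text does not give an explicit formula for digestion efficiency . One plausible reading , shown below purely as an assumption , is that the BglII-site amplicon is normalized to the rpoZ amplicon in each sample , and that the efficiency is the fraction of that normalized signal lost on digestion . The intensities are made-up values and the function name is hypothetical .

```python
def digestion_efficiency(site_digested, total_digested,
                         site_undigested, total_undigested):
    """Fraction of BglII sites cut, estimated from PCR band intensities.

    site_*  : intensity of the amplicon spanning a BglII site
    total_* : intensity of the site-free rpoZ amplicon (total DNA)
    Assumes band intensity is proportional to the amount of intact template.
    """
    ratio_digested = site_digested / total_digested
    ratio_undigested = site_undigested / total_undigested
    return 1.0 - ratio_digested / ratio_undigested


# Hypothetical band intensities from a digested and an undigested sample.
efficiency = digestion_efficiency(10.0, 100.0, 95.0, 100.0)
print(f"digestion efficiency: {efficiency:.1%}")
```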
+ Supporting Information
+ based assays . 
+ A. Diagrams of DNA fragments ( i ) 800-bp promoter-less control fragment . 
+ ( ii ) 452-bp template . 
+ ( iii ) 290-bp template . 
+ ( iv ) 70-bp template . 
+ Numbers indicate the position of elements ( in bp ) relative to the 5′ ends of the templates . 
+ B. Transcripts produced by T7 RNAP . 
+ The templates in ( A ) were transcribed in reactions containing 1× KGB , 100 nM template , 200 nM RNAP , and 0.5 mM ATP+GTP + [ 32P ] UTP ( 0.25 mCi/mL ) in the presence or absence of 0.5 mM CTP . 
+ After 10 min , the resulting RNA was separated by denaturing urea-PAGE , and visualized using a phosphoimager screen ( Molecular Dynamics ) and a FLA5000 imager ( Fuji ) . 
+ ( i ) Transcripts produced by all three templates . 
+ ( ii ) A second gel better resolving the transcripts produced using the 452-bp template ( below ) . 
+ The shorter products produced in reactions lacking CTP indicate that RNAPs transcribe the C-less cassettes but halt at the first C residue . 
+ ( TIF ) reaction ( in buffer LS1 ) lacking NTPs containing 50 nM T7 RNAP and 8 nM of the 452-bp template ( encoding a T7 promoter , a 382-bp C-less cassette , and a C-containing 3′ end ) was prepared , and sampled under sequentially-applied conditions . 
+ These samples were separated using a native 1.5 % agarose gel , and stained with SYBR green I . 
+ In the absence of NTPs , the templates are not stably bound by RNAPs , and thus migrate as free DNA ( lane 1 ) . 
+ Adding ATP+UTP+GTP ( to 0.5 mM ) causes RNAPs to initiate and halt at the end of the C-less cassette . 
+ The templates are now stably bound by RNAPs and their transcripts , and so migrate more slowly ( lane 2 ) . 
+ Adding CTP ( to 0.5 mM ) allows RNAPs to ` run-off ' and vacate most templates , which migrate once again as free DNA ( lane 3 ) . 
+ DNase treatment shows that RNA makes only a minor contribution to the observed fluorescence ( lane 4 ) , while additional RNase treatment removes all nucleic acid ( lane 5 ) . 
+ B. The fraction of template occupied by T7 RNAP in ( B ) quantified using AIDA image-analysis software ( Raytest ) . 
+ For each condition , the amount of occupied template was calculated by subtracting the amount of freely-migrating DNA ( as judged by band intensity ) from the total amount of DNA ( found in lane 1 ) . 
+ Repeating the experiment in the buffer KGB instead of LS1 yielded similar results ( data not shown ) . 
+ ( TIF ) duced during the ` pulldown ' assay . 
+ A transcription reaction ( in KGB ) containing 0.1 µM biotinylated 452-bp template , 0.1 µM 290-bp template , and 0.3 µM T7 RNAP was initiated by the addition of ATP+GTP + [ 32P ] UTP ( 0.25 mCi/mL ) to 0.5 mM in the presence or absence of beads ( 4.5 × 10^8 beads/mL ) . 
+ After 30 s , reactions were halted by the addition of formamide to 80 % ( v/v ) , and subjected to denaturing urea-PAGE . 
+ Total [ 32P ] RNA was then visualized using a phosphoimager screen ( Molecular Dynamics ) and a FLA5000 imager ( Fuji ) . 
+ B. Quantitation of the 32P incorporated into the transcripts in ( A ) . 
+ Initiation rates on the 452-bp and 290-bp templates can be inferred from the intensities of the corresponding transcripts ( which measured 382 bp and 243 bp , respectively ) . 
+ When transcript length is accounted for , we see that RNAPs initiated on the 452-bp template at ~0.7× the rate at which they initiated on 290-bp templates . 
+ We conclude that when the majority of 290-bp templates are occupied , a similar fraction of the 452-bp templates will also be occupied . 
+ species model . 
+ ( i ) Representative autocorrelation curve ( blue , upper panel ) recorded using FCS in the experiment of Fig. 2Aiv . 
+ A reaction containing 1.75 µM T7 RNAP , 2 nM labeled 70-bp template , and 0.54 µM unlabeled 452-bp template , was initiated by the addition of ATP+UTP+GTP . 
+ After RNAPs had halted at the first C residues ( 30 s ) , the autocorrelation function of the labeled templates was determined by FCS . 
+ ( ii ) A fit of the autocorrelation function produced in ( i ) using a two-dimensional one-species model ( red , upper panel ; equation 1 ) , and yielding a diffusion time of 4.1 ms. Residuals ( red , lower panel ) are minor , suggesting that the model used to fit the curve is well-suited to the sample ( see Materials and methods ) . 
+ ( TIF )
\ No newline at end of file
--- a/data/TEXT_FILES/useful_txt/22821568.txt 0 → 100644
View file @27818a9
+++ b/data/TEXT_FILES/useful_txt/22821568.txt 0 → 100644
View file @27818a9
+ RNAsnap™ : a rapid, quantitative and inexpensive,
+ Nucleic Acids Research , 2012 , Vol . 40 , No. 20 e156 doi : 10.1093/nar/gks680
+ ABSTRACT
+ Historically , working with bacterial RNA has been technically difficult because of its highly labile nature and the complicated procedures used for its isolation . 
+ Early RNA extractions employed guanidium isothiocyanate to lyse cells and denature proteins , while the RNA was isolated using a cesium chloride cushion and ultracentrifugation ( 1 ) . 
+ Subsequently , a hot phenol isolation method replaced cesium chloride gradients ( 2 ) . 
+ However , RNA extractions using hot phenol had significant problems due to both the toxicity of the phenol and because the RNA obtained was not consistently of high quality ( 3 ) . 
+ Subsequently , a protocol was developed that combined guanidium isothiocyanate and phenol that yielded much more reproducible results compared to earlier methods ( 4 ) . 
+ As the interest in RNA metabolism in bacteria grew , many companies developed kits making it easier for any laboratory to isolate total RNA . 
+ These kits , which are relatively expensive , can be very useful for isolating RNA enriched for specific sizes , since the kits vary greatly in the chemistry and/or mechanics used to lyse cells , denature and remove proteins and to actually isolate the RNA . 
+ The use of detergents to promote cell lysis led to the discovery of a cationic detergent ( Catrimox-14 , Iowa Biotechnology Corp. , Coralville , IA , USA ) that both aided cell lysis and captured RNA and DNA by precipitation ( 5,6 ) . 
+ This method had the major advantage of not using phenol and provided good yields of high-quality RNA ( 7,8 ) . 
+ However , shortly after Qiagen acquired the patent rights to Catrimox-14 the detergent was withdrawn from the market . 
+ Subsequently , a variant of the Catrimox-14 isolation procedure was developed using a slightly different surfactant trimethyl ( tetradecyl ) ammonium bromide ( called Catrimide ) , which is a very effective and inexpensive substitute ( 9 ) . 
+ As we initiated a detailed study of rRNA processing in Escherichia coli , we wanted to use an RNA isolation procedure that could give us a rapid and accurate assessment of all RNA species within the cell . 
+ However , all current RNA isolation procedures contain multiple transfer steps , leading to reduced sample recovery . 
+ Furthermore , although each manufacturer provides speciﬁcations for the yield and RNA quality resulting from their procedure , there is no published side-by-side comparison of the various methods in terms of total RNA yield , RNA quality , size distribution of the isolated RNA molecules , time to carry out the procedure and cost per sample . 
+ In fact , upon examination of the various RNA samples we obtained using various kits and our own in-house experience with the Catrimide/LiCl method , it was apparent that none of the current RNA isolation methods provide an accurate representation of the intracellular RNA pools , since each method appears to selectively enrich for either large or small RNAs relative to the levels of medium sized species . 
+ Thus , depending on the isolation method used certain size classes of RNA were either enriched or depleted relative to the total RNA population . 
+ We describe here a new RNA isolation procedure ( called RNAsnap™ , for Simple Nucleic Acid Purification ) that quantitatively recovers > 99 % of all RNA species in one step .
+ The isolation method is remarkably simple , rapid , reproducible and inexpensive . 
+ With Gram-negative bacteria , it yields high-quality RNA in < 15 min that can be used directly for both polyacrylamide and agarose northern analysis . 
+ MATERIALS AND METHODS
+ Bacterial strains
+ Escherichia coli strain MG1693 ( thyA715 rph-1 ) ( provided by the E. coli Genetic Stock Center , Yale University ) was grown with shaking at 37 °C in Luria broth supplemented with thymine ( 50 µg/ml ) to exactly 50 Klett units above background ( No. 42 green filter or OD600 0.4 ) , which is 10^8 cfu/ml .
+ Other strains were generously provided by the Departments of Microbiology and Marine Sciences at the University of Georgia .
+ RNAsnap™ RNA isolation method for Gram-negative bacteria
+ One milliliter of bacterial culture ( 10^8 cells ) was centrifuged at 16 000g for 30 s and the supernatant was removed by aspiration .
+ The cell pellet was stored in dry ice until ready for extraction . 
+ Cell pellets were then resuspended in 100 µl of RNA extraction solution [ 18 mM EDTA , 0.025 % SDS , 1 % 2-mercaptoethanol , 95 % formamide ( RNA grade ) ] by vortexing vigorously .
+ The cells were lysed by incubating the sample at 95 °C in a sand bath for 7 min .
+ The cell debris was pelleted by centrifuging the warm sample at 16 000 g for 5 min at room temperature . 
+ The supernatant was carefully transferred to a fresh tube without disturbing the clear gelatinous pellet . 
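+ As a rough illustration of how the RNA extraction solution described above might be assembled , the following Python sketch computes component volumes for a chosen final volume . The stock concentrations ( 0.5 M EDTA , 10 % SDS ) , the reading of the formamide and 2-mercaptoethanol percentages as v/v , and the function name are assumptions made for this example ; none of them is stated in the text .

```python
def extraction_solution_volumes(final_ml, edta_stock_mM=500.0, sds_stock_pct=10.0):
    """Return component volumes (in microliters) for the formamide-based
    extraction solution: 18 mM EDTA, 0.025 % SDS, 1 % 2-mercaptoethanol,
    95 % formamide.

    Assumptions (not from the source): EDTA is added from a 0.5 M stock,
    SDS from a 10 % stock, and the formamide and 2-mercaptoethanol
    percentages are volume/volume.
    """
    final_ul = final_ml * 1000.0
    volumes = {
        "EDTA stock": final_ul * 18.0 / edta_stock_mM,
        "SDS stock": final_ul * 0.025 / sds_stock_pct,
        "2-mercaptoethanol": final_ul * 0.01,
        "formamide": final_ul * 0.95,
    }
    # Whatever volume is left is made up with RNase-free water.
    water = final_ul - sum(volumes.values())
    if water < 0:
        raise ValueError("stocks are too dilute to reach the target concentrations")
    volumes["water"] = water
    return volumes


if __name__ == "__main__":
    for name, microliters in extraction_solution_volumes(10.0).items():
        print(f"{name:>18}: {microliters:8.1f} ul")
```

+ With these assumed stocks almost the entire volume is formamide , so very little water is added ; more dilute stocks would make the recipe impossible , which is why the function raises an error in that case .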
+ RNAsnap™ RNA isolation method for Gram-positive bacteria and yeast
+ To isolate RNA from organisms with tough cell walls such as yeast ( Saccharomyces cerevisiae ) and Gram-positive bacteria ( Bacillus subtilis ) , the RNAsnap™ method was modified as follows .
+ The pellet from 1 ml of cells ( 10^8 ) was resuspended in 100 µl of RNA extraction solution .
+ The resuspended cells were transferred to a 0.5 ml screw cap tube containing 200 µl of chilled zirconia beads ( from Ambion RiboPure™ kit ) .
+ The cells were beaten on a vortex mixer with a small tube adapter for 10 min .
+ The samples were then treated as described above . 
+ Catrimide/LiCl RNA isolation method
+ This procedure was performed similarly to the method described by Mohanty et al. ( 9 ) , but was modified for 1 ml samples .
+ Briefly , 1 ml of bacterial culture was added to 500 µl of stop buffer , which was previously frozen horizontally in a 1.5 ml microcentrifuge tube .
+ The cells were immediately mixed by vortexing vigorously , and then pelleted by centrifugation at 5000g for 5 min at 4 °C .
+ The supernatant was carefully removed by aspiration , and the pellet was suspended in 200 µl of lysis buffer by vortexing .
+ The sample was then placed into a dry ice -- ethanol slurry for 90 s , followed by 90 s of incubation in a 37 °C water bath .
+ This freeze -- thaw cycle was repeated four times in total . 
+ After the fourth 37 °C incubation , the sample was transferred into the dry ice -- ethanol slurry in order to refreeze the solution , and 35 µl of 20 mM acetic acid was then added to the frozen solution .
+ The sample was then placed back into the 37 °C water bath , followed by addition of 200 µl of 10 % Catrimide [ trimethyl ( tetradecyl ) ammonium bromide ] when the sample was almost completely thawed .
+ The sample was briefly vortexed and centrifuged at 16 000g for 10 min at 4 °C .
+ The supernatant was carefully removed by aspiration , and the pellet was suspended in 500 µl of 2 M LiCl in 35 % ethanol by vortexing vigorously .
+ The sample was then incubated at room temperature for 5 min , followed by centrifugation at 16 000g for 10 min at 4 °C .
+ The supernatant was carefully removed by aspiration and the pellet was resuspended in 500 µl of 2 M LiCl in water followed by a repeat centrifugation .
+ The pellet was briefly vortexed in 75 % ethanol and centrifuged at 8000g for 5 min at 4 °C .
+ The ethanol was removed by aspiration , and the tube was brieﬂy centrifuged for a second time in order to collect and remove the remaining ethanol with a pipette . 
+ The pellet was allowed to air dry at room temperature for 10 min and subsequently hydrated by the addition of 100 µl of RNase-free water and incubated at room temperature for 10 min .
+ The tube was vigorously vortexed , centrifuged at maximum force ( 21 000g ) at room temperature for 1 min to pellet cell debris , and the RNA containing supernatant was transferred to a new tube . 
+ RNA extractions with the commercial kits were done according to the manufacturer 's recommendations and protocols specific for the number of E. coli cells and conditions in which they were grown .
+ Any step described as optional , but that might improve the quality or yield of RNA was followed . 
+ No optional DNase I treatment was performed on any RNA sample used in this study . 
+ Every effort was made to ensure that the RNA extracted using each method met the manufacturer 's guidelines in terms of overall RNA yield , A260/A280 ratio and RNA quality .
+ RNA quantity and A260/A280 ratios were determined using a Nanodrop™ 2000c ( Thermo Scientific ) .
+ The amount of RNA in the RNAsnap™ supernatants was determined by A260 , using the RNA extraction solution as a blank .
+ RNA quality was assessed by running 250 ng of each RNA sample , as determined by A260 , on a 1.2 % agarose -- 0.5× TBE gel with ethidium bromide , run at 5 V/cm for 1 h .
+ RNA samples were denatured prior to loading by suspension in Gel Loading Buffer II ( 95 % formamide , 18 mM EDTA and 0.025 % each of SDS , xylene cyanol and bromophenol blue , Ambion ) and heating for 5 min at 95 °C .
+ Approximately 100 ng of each RNA sample were subsequently analyzed on a Bioanalyzer RNA chip ( Agilent Technologies ) using the manufacturer 's recommendations .
+ Quantitative determination of RNA recovery using the RNAsnap™ method
+ In order to estimate the amount of RNA remaining in the pellet , we performed an RNAsnap™ extraction using 10 ml of E. coli cells ( 10^8 cells/ml ) and 500 µl of RNA extraction solution .
+ After the supernatant was recovered and placed into a separate tube , an additional 500 µl of room temperature RNA extraction solution was gently added to the gelatinous pellet in order to wash the pellet of any remaining RNA containing supernatant , which could not be initially removed without disturbing the pellet .
+ The tube was then spun at 16 000g for an additional 5 min and the supernatant was again removed without disturbing the pellet .
+ The pellet was then suspended in 100 µl of RNase-free water .
+ Subsequently , 100 µl of acidic phenol/chloroform ( Ambion , 5:1 solution , pH 4.5 ) was added and the tube was vortexed vigorously for 30 s .
+ The tube was then centrifuged at 16 000g for 5 min and the aqueous phase was transferred to a fresh tube and sodium acetate/ethanol precipitated .
+ The precipitated RNA was hydrated in 20 µl of RNase-free water .
+ After the RNA was fully dissolved , the total amount of RNA was determined based on A260 and was compared with the amount of RNA in the first 500 µl volume of RNA extraction solution recovered from the pellet .
+ Northern analysis
+ Two types of northern blots were performed in this study , 6 % polyacrylamide / 8.3 M urea 1× TBE gels for small RNA species ( lpp , cspE , 5S rRNA , ryhB and pheU/pheV ) and 1.2 % agarose 1× MOPS gels for larger species ( rpsJ operon , adhE and ompF ) .
+ Northern analysis was performed as described in Stead et al. ( 10 ) . 
+ The RNA isolated by the RNAsnap™ method was used directly for polyacrylamide gels after dilution to the desired loading volume in a formamide-based RNA loading dye .
+ For agarose northerns , the RNA in the extraction solution was brought up to a total volume of 10 µl with RNAsnap™ RNA extraction solution .
+ Subsequently , 4 µl of loading solution ( 3.8 µl of any formamide-based RNA loading dye along with 0.2 µl of 37 % formaldehyde ) were added .
+ The samples were heated at 65 °C for 5 min and placed on ice for 1 min followed by brief centrifugation before loading onto a 1.2 % agarose 1× MOPS gel , similar to the method of Vincze and Bowra ( 11 ) .
+ Subsequently the RNA was transferred to a positively charged nylon membrane by electroblotting ( 9 ) . 
+ The northern membranes were subsequently probed with multiple 32P-labeled oligonucleotide probes such that the signals for the lpp , 5S rRNA and pheU/V transcripts were simultaneously visualized on a single membrane ( similarly for cspE/ryhB and adhE/ompF ) .
+ This approach helped to determine if loading errors could account for differences in signals between the two replicates , as the percentage difference should be the same for each of those RNA species probed in the same lanes , unless the RNA extraction method used caused non-quantitative recovery of a particular RNA species . 
+ It was also possible that a technical error during the transfer of RNA from the gel to the nitrocellulose membrane accounted for a difference between replicates , but this type of error is extraordinarily rare with polyacrylamide northerns in our hands , and occurs infrequently with agarose northerns . 
+ Sodium acetate/ethanol precipitation method
+ The RNAsnap™ RNA sample was first diluted with four volumes of water followed by addition of 1/10 volume of 3 M sodium acetate , pH 5.2 and the sample was mixed by pipetting .
+ Three volumes of 100 % ethanol were then added , the sample mixed briefly by vortexing and incubated for at least 60 min at -80 °C .
+ The tube was centrifuged at 16 000g for 30 min at 4 °C .
+ The supernatant was carefully removed by aspiration and the pellet was washed with 250 µl of 75 % ethanol , followed by centrifugation at 8000g for 5 min at 4 °C .
+ The supernatant was removed via aspiration and the tube was brieﬂy centrifuged again . 
+ Following the removal of any remaining ethanol , the pellet was air dried . 
+ The pellet was resuspended in water and centrifuged at 16 000g to pellet any remaining water insoluble proteins and the RNA containing supernatant was transferred to a fresh tube . 
+ Reverse transcriptase–polymerase chain reaction
+ SK4390 ( rph-1 ΔrppH thyA715 Kmr ) was grown with shaking at 37 °C in Luria broth supplemented with thymine ( 50 µg/ml ) and kanamycin ( 25 µg/ml ) until 20 Klett units above background ( No. 42 green filter ) .
+ The culture was then shifted to 44 °C for 2 h .
+ The culture was maintained at 80 Klett units above background by making periodic dilutions with pre-warmed Luria broth . 
+ RNA was extracted using the RNAsnap™ procedure described above or the TRIzol Max™ method according to the manufacturer 's instructions ( Invitrogen ) .
+ Both RNA samples were subjected to sodium acetate/ethanol precipitation , DNA removal with the DNA-free™ kit ( Ambion ) and a final sodium acetate/ethanol precipitation .
+ Five micrograms of each RNA sample was reverse transcribed with an lpp gene-specific primer ( LPP538 : CAGGTACTATTACTTGGGGTAT ) using SuperScript III reverse transcriptase ( Invitrogen ) according to the manufacturer 's instructions .
+ The cDNAs were amplified with two gene-specific primers ( LPP538 and LPPPCR1 : GCTACATGGAGATTAACT ) using GoTaq Green Master Mix ( Promega ) .
+ The polymerase chain reaction ( PCR ) products were run on a 2 % agarose -- Tris -- acetate -- EDTA gel and visualized with ethidium bromide in a G-Box ( Syngene ) . 
+ For additional confirmation that the lpp cDNA had been amplified , Southern blot analysis was performed by transferring the PCR products to a Nytran SuPerCharge membrane using a Turboblotter™ ( Schleicher and Schuell ) .
+ The membrane was probed with a 32P-5'-end-labeled lpp-specific oligonucleotide ( LPP562A : CGCTTGCGTTCACGTCG ) and scanned with a Phosphorimager ( Storm™ 840 , GE Healthcare ) ( data not shown ) .
+ Primer extension analysis
+ Primer extension analysis was performed as described by Stead et al. ( 10 ) with an oligonucleotide primer specific to the 5'-end of mature 23S rRNA , which is identical for each of the seven E. coli rRNA operons ( 5'-CGTCCTTCATCGCCTCTGACT-3' ) .
+ An amount of 250 ng of total RNA ( isolated using the RNAsnap™ procedure ) was used for the reverse transcription reactions .
+ Only half of each reaction mixture was run on the gel . 
+ The sequencing ladder was derived from the rrnB operon . 
+ RESULTS
+ Development of RNAsnap™ , a rapid and highly quantitative RNA isolation method
+ In most isolation methods , the amount of total RNA present is initially determined based on either absorbance at 260 nm ( A260 ) or through the use of ﬂuorescent dyes . 
+ Although these approaches provide an accurate estimate of the total RNA present in a particular sample , the relative amounts of the individual RNA species within that sample can vary widely .
+ These variations are directly related to the particular isolation method employed due to the inherent properties of the matrices used in each procedure , which are biased towards either large ( rRNA or other large mRNAs ) or small ( tRNAs and sRNAs ) RNA species ( see below ) . 
+ In order to help address the problems of both representative and quantitative recovery , we sought to develop a one-step RNA extraction procedure that could be carried out in a single tube in which total RNA was quantitatively recovered in the supernatant and the bulk of the DNA and proteins were left in the pellet . 
+ We hypothesized that such an approach would both greatly simplify RNA isolation and would provide a more accurate overview of the actual intracellular distribution of all RNA species , since any losses associated with multiple handling steps , such as phenol/chloroform extraction , would be eliminated . 
+ During the development of the RNAsnap™ method , we took advantage of the fact that E. coli cells were easily lysed in a boiling solution , such as used in colony PCR methods .
+ In addition , it is standard practice to denature RNA in a formamide-based loading solution prior to its separation on either polyacrylamide or agarose gels . 
+ We combined aspects of these two techniques to develop the formamide-based RNA extraction solution described here ( see ` Materials and Methods ' section ) . 
+ We observed that exponentially growing E. coli cells were rapidly lysed when suspended in this solution and heated at 95 °C for 7 min .
+ Following centrifugation for 5 min at 16 000g , the RNA was in the supernatant and the gelatinous pellet contained protein , cell debris and the majority of the DNA .
+ The RNA was quantiﬁed based on A260 by ﬁrst blanking a spectrophotometer with the RNA extraction solution . 
+ It was important that the RNA extraction solution was made fresh and was also used as the blank , since the A260 of the extraction solution itself changed over time after the addition of 2-mercaptoethanol . 
+ A 1 ml sample of an early exponential culture of E. coli ( 10^8 cells ) yielded 60 ± 3 µg of total RNA with the entire procedure taking < 15 min ( Table 1 ) .
+ The RNAsnap™ isolated RNA was suitable , without any further treatment , for northern analysis using either polyacrylamide or agarose gels ( Figure 2 ) .
+ The genomic DNA contamination in the RNAsnap™ sample was comparable to that obtained with the other isolation methods ( data not shown ) .
+ However , although minor genomic DNA contamination does not interfere with northern blot analysis and some enzymatic reactions , it can interfere during experiments involving reverse transcription and RNAseq analysis . 
+ Thus , RNAsnap™ RNA was subjected to DNase I treatment using the DNA-free™ kit ( Ambion ) following sodium acetate/ethanol precipitation for experiments involving primer extension and reverse transcriptase ( RT ) -- PCR ( see below and ` Materials and Methods ' section ) .
+ The RNAsnap™ method recovers >99% of all RNA species
+ Even though the RNAsnap™ procedure was rapid and yielded more total RNA per cell than any other method tested ( Table 1 ) , it was important to determine how much RNA remained in the gelatinous pellet .
+ Accordingly , we scaled up the isolation to 10 ml of culture ( 10^9 cells ) , but again carried out the protocol in a single tube .
+ Following removal of the supernatant containing the RNA , the pellet was gently washed once with the extraction solution at room temperature . 
+ After a subsequent centrifugation , the pellet was resuspended in water and extracted using acidic phenol/chloroform ( See ` Materials and Methods ' section ) . 
+ The aqueous phase was precipitated with sodium acetate/ethanol and resuspended in water . 
+ In each of two replicates , 2.5 µg of high-quality RNA was recovered from the re-extracted pellet , while > 700 µg of RNA were found in the original supernatant , indicating that the efficiency of RNA recovery from E. coli using the RNAsnap™ method was > 99 % ( data not shown ) .
+ Samples of 250 ng of RNA from both the re-extracted pellet and the original supernatant were run on an agarose gel to confirm the presence , quality and quantity of the RNA .
+ Interestingly , the proﬁle of the various abundant RNA species ( tRNAs , 5S rRNA , sRNAs , 16S rRNA and 23S rRNA ) was identical between the two RNA samples upon visual inspection of the agarose gel ( data not shown ) . 
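+ The recovery estimate above follows directly from the two measured amounts . A minimal Python sketch of that arithmetic is given here ; the function name is an assumption for illustration , and because the supernatant amount is reported only as a lower bound ( > 700 ) , the computed value is a lower bound as well .

```python
def recovery_fraction(supernatant_amount, pellet_amount):
    """Fraction of total RNA found in the supernatant.

    Both amounts must be in the same mass unit. The total is taken to be
    supernatant + re-extracted pellet, i.e. losses during the pellet
    re-extraction itself are ignored (an assumption of this sketch).
    """
    total = supernatant_amount + pellet_amount
    if total <= 0:
        raise ValueError("total RNA must be positive")
    return supernatant_amount / total


if __name__ == "__main__":
    # Values reported in the text: > 700 in the supernatant, 2.5 in the pellet.
    fraction = recovery_fraction(700.0, 2.5)
    print(f"recovery >= {fraction:.1%}")  # about 99.6 %
```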
+ Analysis of RNAsnap™ isolated RNA
+ In an attempt to determine the size distribution of the transcripts present in the RNA isolated by the RNAsnap™ method , we compared the RNA samples obtained using our previously optimized Catrimide/LiCl method ( 9 ) and three of the most widely used commercially available RNA isolation kits [ TRIzol Max™ Bacteria ( Invitrogen ) , RNeasy Protect Bacteria ( Qiagen ) and RiboPure™ Bacteria ( Ambion ) ] .
+ Each extraction method was tested using at least two independent biological replicates and two or more technical replicates per biological replicate .
+ The quality of each RNA sample was assessed using three main criteria : purity as determined by a spectrophotometer ( A260/280 ratio ) ; the 23S rRNA/16S rRNA ratio as determined by Bioanalyzer analysis ( Agilent Technologies ) ; and an RNA integrity number ( RIN ) derived from the Bioanalyzer analysis ( Table 2 ) . 
+ The RIN ( standardization of RNA quality control ) was developed using total eukaryotic RNA , based on a numbering system of 1 -- 10 , with 1 being the most degraded RNA and 10 being the most intact ( Agilent Technologies ) .
+ It has been demonstrated that with bacterial RNAs a RIN value < 7 led to signiﬁcant variations in data ( 12 ) . 
+ As shown in Figure 1 , the quality of the RNA derived using the RNAsnap™ method was as good as or better than RNA obtained by the other methods tested based on both Bioanalyzer analysis ( Figure 1A and Table 2 ) and agarose gel electrophoresis ( Figure 1B ) .
+ The ratio of E. coli 23S to 16S rRNA in the samples isolated by the RNAsnap™ method was 1.8 , which came closer to the theoretical ratio of 1.88 ( 2904 nt/1541 nt ) than any other method tested ( Table 2 ) .
+ The A260/280 ratio of 2.0 for all the RNA preparations ( Table 2 ) indicated that all of the samples were relatively pure with the possible exception of the RNAsnap™ sample .
+ Normally , an A260/280 ratio of 1.8 -- 2 is indicative of highly puriﬁed RNA when resuspended in a buffered solution like Tris-EDTA , pH 8.0 . 
+ However , this ratio is highly dependent on the pH and the ionic strength of the solution ( 13 ) . 
+ The pH of the RNAsnap™ RNA sample was 9.4 .
+ As predicted , resuspension of the RNA in RNase-free water after a sodium acetate/ethanol precipitation significantly improved the ratio ( Table 2 ) .
+ Additionally , diluting the RNAsnap™ RNA sample 4-fold with RNase-free water improved the A260/280 ratio to 1.9 ( data not shown ) , which was comparable to the other methods shown in Table 2 .
+ Thus , the low 260/280 ratio seen with the RNAsnap™ RNA sample most likely resulted from the presence of formamide .
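+ The theoretical 23S/16S rRNA ratio of 1.88 quoted above is simply the ratio of the two transcript lengths . The short Python sketch below reproduces it and expresses an observed ratio as a percentage deviation from that value ; it assumes equimolar 23S and 16S rRNA and signal proportional to length , and the function names are illustrative .

```python
LEN_23S_NT = 2904  # E. coli 23S rRNA length given in the text
LEN_16S_NT = 1541  # E. coli 16S rRNA length given in the text


def theoretical_ratio(len_large_nt, len_small_nt):
    """Expected 23S/16S mass ratio, assuming one copy of each rRNA per
    ribosome and mass proportional to length in nucleotides."""
    return len_large_nt / len_small_nt


def percent_deviation(observed, expected):
    """Signed deviation of an observed ratio from the expected one, in percent."""
    return 100.0 * (observed - expected) / expected


if __name__ == "__main__":
    expected = theoretical_ratio(LEN_23S_NT, LEN_16S_NT)
    print(f"theoretical 23S/16S ratio: {expected:.2f}")
    # 1.8 is the ratio reported in the text for the RNAsnap samples.
    print(f"deviation of 1.8 from theory: {percent_deviation(1.8, expected):.1f} %")
```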
+ Interestingly , there were significant differences in terms of the amounts of the rRNAs and tRNAs present ( Figure 1 ) as well as RIN ( Table 2 ) .
+ The RNAsnap™ , Catrimide/LiCl , RNeasy and RiboPure™ methods yielded comparable amounts of 16S and 23S rRNAs , which were significantly higher than what was observed with the TRIzol Max™ Bacteria method ( Table 2 ) .
+ In contrast , the TRIzol Max™ Bacteria method yielded the highest concentrations of 5S rRNA and tRNAs , followed by the RNAsnap™ method ( Figure 1A ) .
+ The obvious differences in the distribution of RNAs among the most abundant RNA size classes obtained from the various RNA isolation methods ( Figure 1 ) led us to determine the relative abundances of specific RNA molecules ranging in size between 76 and 5700 nt using northern analysis .
+ Since the RNAsnap™ method recovered > 99 % of total cellular RNA , we calculated the abundance of each transcript derived from the other methods ( Figure 2 ) relative to what was obtained with the RNAsnap™ RNA ( Table 3 ) .
+ Transcripts > 1000 nt ( ompF , adhE and the rpsJ operon ) were less abundant in the TRIzol Max™ RNA compared to any of the other methods ( Table 3 ) .
+ In fact , the recovery of the larger transcripts decreased gradually as a function of increased size leading to very low recovery of the 5700 nt rpsJ operon mRNA ( the largest transcript tested ) . 
+ Furthermore , the variability from one isolation to another using the TRIzol Max™ method was also very high for larger transcripts ( Table 3 , higher standard deviations ) .
+ In contrast , all the other RNA isolation methods yielded the larger species at levels that were 1.6- to 4.4-fold higher than the RNAsnap™ RNA .
+ At the lower end of the RNA size spectrum , i.e. transcripts < 300 nt ( pheU/pheV , ryhB , 5S rRNA ) , the RNeasy Protect Bacteria , RiboPure™ and Catrimide/LiCl methods yielded significantly less RNA with up to 20-fold decreases for some species ( Figure 2 and Table 3 ) .
+ The one exception was the ryhB small regulatory RNA , which was present in comparable amounts in all ﬁve RNA samples ( Table 3 ) . 
+ The TRIzol Max™ sample consistently had between 1.4- and 2-fold higher levels of all three small RNAs tested ( Table 3 ) .
+ For the two species in the 300-nt range ( cspE and lpp ) all five methods gave comparable levels ( Table 3 ) , within experimental error .
+ Taken together , it is clear that each of the current RNA isolation methods has distinct biases regarding transcript size . 
+ Thus while the RNAsnap™ method appeared to be less efficient in isolating larger transcripts compared to the RNeasy Protect Bacteria , RiboPure™ and Catrimide/LiCl methods , the higher abundance of larger RNA molecules was accompanied by underrepresentation of the smaller molecules ( Table 3 ) .
+ Similarly , higher levels of small RNAs ( Table 3 ) as well as thick bands of tRNA and 5S rRNA in the TRIzol Max™ RNA samples
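+ The normalization described above , in which the abundance of each transcript is expressed relative to the RNAsnap value , can be sketched in Python as follows . The band intensities in the usage example are hypothetical numbers chosen for illustration and are not taken from Table 3 ; the function name and data layout are likewise assumptions .

```python
from statistics import mean, stdev


def relative_abundance(replicates, reference="RNAsnap"):
    """Normalize replicate band intensities to the mean of a reference method.

    replicates: dict mapping method name -> list of intensities for one
    transcript. Returns dict mapping method name -> (mean fold, standard
    deviation) relative to the reference method.
    """
    ref_mean = mean(replicates[reference])
    if ref_mean <= 0:
        raise ValueError("reference signal must be positive")
    result = {}
    for method, values in replicates.items():
        folds = [value / ref_mean for value in values]
        spread = stdev(folds) if len(folds) > 1 else 0.0
        result[method] = (mean(folds), spread)
    return result


if __name__ == "__main__":
    # Hypothetical intensities for a single transcript (arbitrary units).
    example = {
        "RNAsnap": [1000.0, 1000.0],
        "TRIzol Max": [1500.0, 1700.0],
        "RNeasy": [240.0, 260.0],
    }
    for method, (fold, sd) in relative_abundance(example).items():
        print(f"{method:>10}: {fold:.2f} +/- {sd:.2f}")
```

+ Probing several transcripts on the same membrane , as described in the Materials and Methods , allows a loading error to be recognized because it shifts every fold value in a lane by the same factor .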
+ Generality of RNAsnap™ RNA isolation method
+ Isolation of RNA from stationary phase cells using current methods has been difficult ( 9 ) .
+ In contrast , the RNAsnap™ method worked equally well with either late stationary phase or exponential phase cells ( data not shown ) .
+ In addition , the RNAsnap™ procedure was easily and quantitatively scaled up to handle 10 ml of culture ( 10^9 cells ) for situations where larger amounts of RNA were needed .
+ Furthermore , the RNAsnap™ RNA could be used directly in both polyacrylamide / urea and agarose gels without further purification ( Figure 2 ) .
+ Although all the data shown here involved E. coli RNA , we have used the RNAsnap™ method to successfully isolate high-quality RNA from a number of other Gram-negative bacteria including : Alcaligenes faecalis ( ATCC 8750 ) ; Serratia marcescens ( ATCC 14756 ) ; Shigella flexneri ( ATCC 9199 ) ; Pseudomonas aeruginosa ( ATCC 27853 ) ; Salmonella enterica ( ATCC 29629 ) ; Ruegeria pomeroyi ( ATCC 700808 ) ; and Myxococcus xanthus DK1622 .
+ Additionally , using a slightly modified version of the RNAsnap™ method ( see ` Materials and Methods ' section ) in which zirconium bead homogenization was added for lysis efficiency , high-quality RNA was obtained from two Gram-positive bacteria : Bacillus subtilis ( ATCC 6633 ) and Staphylococcus aureus ( ATCC 6538 ) .
+ The modiﬁed method also worked well with both Saccharomyces cerevisiae and Kluyveromyces lactis . 
+ Using RNAsnap™ for primer extension and RT–PCR experiments
+ The RNAsnap™ isolated RNA was further tested for its functionality in commonly applied techniques such as RT -- PCR , RNA ligation and primer extension analysis .
+ It should be noted that for all applications involving enzymatic reactions , the RNA from the RNAsnap™ method was further purified using a sodium acetate / ethanol precipitation step ( see ` Materials and Methods ' section ) .
+ Specifically , we compared RNA samples isolated using either the RNAsnap™ or the TRIzol Max™ RNA isolation procedures in an RT -- PCR experiment that amplified the E. coli lpp mRNA .
+ As shown in Figure 3 , there was 1.6-fold more lpp mRNA in the TRIzol Max™ isolated RNA compared to the RNAsnap™ isolated RNA after 10 cycles , which reflected the relative abundances shown in Table 3 .
+ The PCR amplification reached a plateau after 10 cycles ( Figure 3 ) .
+ In addition , RNAsnap™ isolated RNA was used in determining the 5'- and 3'-ends of the pheU and pheV tRNA transcripts ( Bowden , K. , Mohanty , B. K. and Kushner , S.R. , manuscript in preparation ) by initially ligating the 5'- and 3'-ends of the transcripts ( 14 ) .
+ RNAsnap™ isolated RNA has also been used successfully in various primer extension experiments .
+ For example , in the experiment shown in Figure 4 , we have examined the 5'-termini of 23S rRNA in rnc-14 and wild-type strains .
+ DISCUSSION
+ We have described here a simple , rapid and reproducible RNA isolation procedure ( RNAsnap™ ) that yields high-quality RNA from Gram-negative bacteria ( Figures 1 and 2 ) , Gram-positive bacteria and yeast that can be used for northern analysis without any further purification .
+ As shown in Table 1 , not only did the RNAsnap™ method provide the highest total RNA yield of all five isolation procedures ( 1.7- to 4-fold higher ) , but it was also the fastest and least expensive .
+ Furthermore , the method ensures the isolation of the widest range of RNA species ( Table 1 ) .
+ Using eight transcripts ranging in size between 76 and 5700 nt , we have demonstrated that the RNAsnap™ isolation procedure is an unbiased method that likely preserves the in vivo distribution of all RNA species , thus providing the most accurate representation of intracellular RNA pools compared to any of the other isolation methods tested .
+ Furthermore , it works equally well with exponential and stationary phase cultures . 
+ For downstream applications such as primer extension analysis , RNA ligation and RT -- PCR , further purification of RNAsnap™ isolated RNA using sodium acetate / ethanol precipitation was very straightforward .
+ A faster but significantly more expensive option was the RNeasy kit ( or similar silica-column-based extraction kit ) or RiboPure™ kit , which can be used to recover the RNA from the formamide-based RNA extraction solution .
+ Using either column-based method following the RNAsnap™ extraction yielded extremely high-quality RNA suitable for any type of highly-sensitive RNA analysis ( data not shown ) .
+ However , the drawback to using a column , as demonstrated in this study ( Table 3 , RNeasy Protect Bacteria and RiboPure™ Bacteria ) , was the non-quantitative recovery of RNA species depending on their size and possible secondary/tertiary structure of the RNA molecule .
+ With the advent of qRT -- PCR , microarrays and next generation sequencing , genome-wide expression profiling has become an indispensable tool to decipher biological systems .
+ However , at the heart of even the most robust and sophisticated gene-expression analysis lies the quality and reproducibility of the extracted RNA pool .
+ For example , if a research group were to use a column-based RNA extraction methodology , such as those tested in this study , to examine maturation of small RNAs < 200 nt , the results of the study would be ﬂawed due to non-quantitative recovery of RNA molecules < 200 nt using the RNA extraction methods ( Table 3 ) . 
+ Alternatively , if a group were to examine the relative abundance of a 1000 nt transcript compared with a 5000 nt transcript , the ratio between the two abundances would vary considerably based on the RNA extraction methodology employed . 
+ More importantly , it is clear that no RNA isolation methodology ( with the exception of the RNAsnap™ method ) is suitable for the study of all types and sizes of RNA molecules in the same experiment .
+ Overall , the quality and representative recovery offered by the RNAsnap™ method are unmatched by the other methods tested in this study , making it uniquely suited for high-throughput gene-expression analyses .
+ FUNDING
+ Funding for open access charge : The National Institutes of General Medical Sciences [ GM81554 to S.R.K. ] .
\ No newline at end of file
--- a/data/TEXT_FILES/useful_txt/23071782.txt 0 → 100644
+++ b/data/TEXT_FILES/useful_txt/23071782.txt 0 → 100644
+ Network of Phosphate Starvation in Escherichia coli
+ Abstract 
+ The phosphate starvation response in bacteria has been studied extensively for the past few decades and the phosphate-limiting signal is known to be mediated via the PhoBR two-component system .
+ However , the global DNA binding profile of the response regulator PhoB and the PhoB downstream responses are currently unclear . 
+ In this study , chromatin immunoprecipitation for PhoB was combined with high-density tiling array ( ChIP-chip ) as well as gene expression microarray to reveal the first global downstream responses of the response regulator PhoB in E. coli .
+ Based on our ChIP-chip experimental data , forty-three binding sites were identified throughout the genome and the known PhoB binding pattern was updated by identifying the conserved pattern from these sites . 
+ From the gene expression microarray data analysis , 287 differentially expressed genes were identified in the presence of PhoB activity . 
+ By comparing the results obtained from our ChIP-chip and microarray experiments , we were also able to identify genes that were directly or indirectly affected through PhoB regulation . 
+ Nineteen out of these 287 differentially expressed genes were identified as the genes directly regulated by PhoB . 
+ Seven of the 19 directly regulated genes ( including phoB ) are transcriptional regulators . 
+ These transcriptional regulators then further pass the signal of phosphate starvation down to the remaining differentially expressed genes . 
+ Our results unveiled the genome-wide binding profile of PhoB and the downstream responses under phosphate starvation . 
+ We also present the hierarchical structure of the phosphate sensing regulatory network . 
+ The data suggest that PhoB plays protective roles in membrane integrity and oxidative stress reduction during phosphate starvation . 
+ Introduction
+ Phosphate participates in many important cellular processes [ 1 ] such as energy metabolism , and the construction of genetic molecules and organelles including cell membranes . 
+ Since the concentration of phosphate is usually low in natural environments , many bacteria have evolved to sense this essential nutrient and to adapt to phosphate-limiting conditions . 
+ Several transcriptomics and proteomics studies have been done to reveal adaptation in a diverse range of bacteria including Bacillus subtilis [ 2 ] , Corynebacterium glutamicum [ 3 ] , Escherichia coli [ 4,5 ] , Prochlorococcus marinus [ 6 ] , Sinorhizobium meliloti [ 7 ] and Vibrio cholerae [ 8 ] . 
+ In E. coli , phosphate sensing has been reported to be performed by a seven-component apparatus [ 9 ] . 
+ The sensor kinase of this machinery , PhoR , plays an important role to pass the limited environmental phosphate signal to its response regulator , PhoB . 
+ During phosphate starvation , PhoR dimer is autophosphorylated on one histidine residue of each monomer . 
+ This phospho-PhoR dimer has the kinase activity that can transfer the two phosphoryl groups to the aspartate residue in each of the PhoB monomers [ 1 ] . 
+ The phospho-PhoB dimer is the active form of the transcriptional factor that recognizes the previously characterized PhoB recognition consensus sequence CTGTCAT-A ( AT ) A ( TA ) - CTGT ( CA ) A ( CT ) ( Pho box ) and regulates its target genes [ 1,10,11 ] . 
+ In response to phosphate limitation , PhoB binds to the Pho box and transmits the phosphate-limiting signal to 
+ To date , thirty-one responding genes composed of nine transcription units are known to be regulated by PhoB , while several other genes lack direct evidence of PhoB binding in E. coli [ 9 ] . 
+ However , previously reported proteomics data of E. coli indicate that the expression of around 400 proteins varied in a comparison between excess and limited phosphate conditions [ 5 ] . 
+ Thus , studying the genome-wide regulation exercised by PhoB in response to phosphate starvation is required to understand the underlying mechanisms of bacterial adaptation to phosphate starvation . 
+ In this study , we combined ChIP-chip and gene expression microarray experiments , for the first time , to present the global responses of E. coli to phosphate starvation through the PhoR / PhoB two-component system . 
+ This integrative genome-wide approach allowed us to identify 54 PhoB binding targets and 287 differentially expressed genes in the presence of PhoB activity during phosphate starvation . 
+ These results indicate that PhoB directly regulates a group of genes which contain distinct transcriptional regulators and further indirectly influences other genes . 
+ A specific group of genes involved in the functions of transportation and metabolism for membrane protection have also been identified . 
+ Results and Discussion
+ Genome-wide mapping of PhoB binding profiles
+ We applied the ChIP-chip techniques to measure the binding of PhoB across the whole genome under the phosphate-limiting condition ( Figure 1 ) . 
+ The PhoB-FLAG expressing strain ( MG1655_PhoB_FLAG ) and the wild type strain ( MG1655 ) , which contains no FLAG tag , were used as a comparison for the recognition of anti-FLAG antibody ( Table 1 ) . 
+ The activity of our PhoB-FLAG fusion protein in the MG1655_PhoB_FLAG strain was nearly the same as the activity of PhoB in the MG1655 wild type strain ( see Figure S1 ) . 
+ In this design , the genome-wide map of interactions between PhoB and E. coli genomic DNA was constructed ( Figure S2A ) . 
+ Our ChIP-chip results contained six of the nine PhoB-regulating targets described in a recent review [ 9 ] . 
+ The six targets are ugpB , phnC , phoA , pstS , phoE and phoB ( Figure S2B ) . 
+ One possible reason that we were not able to detect all the previously described targets may be due to the differences in experimental conditions or the in vivo/in vitro experimental designs . 
+ Further investigation of significantly enriched regions revealed 43 significantly enriched peaks identified by a CMARRT package [ 12 ] with a controlled error rate set at 0.05 ( see Methods for details ) . 
+ Other uncharacterized PhoB targets were also identified in our study , and the overall target genes were classified into six groups with functions involved in transcriptional regulation , transportation , metabolism , membrane structure , unknown function and pseudogene ( Table 2 ) . 
+ Previously uncharacterized PhoB binding targets
+ Eight novel PhoB binding sites are adjacent to ten genes that were shown to be differentially expressed in our analysis of gene expression microarray ( see below ) . 
+ These ten genes are likely to be directly regulated by PhoB . 
+ Promoter regions containing these target sites were amplified and cloned into the promoterless luciferase expression vector pGL3 to create the promoter::luciferase fusions . 
+ These fusion plasmids were transformed into the E. coli wild type strain and the phoB knockout strain . 
+ We found that all ten plasmids showed significant differences in luciferase expression ( Figure 2 ) . 
+ These eight binding sites are related to the ten targets since there are two divergently transcribed gene pairs which share the same putative binding sites . 
+ To further examine if these eight sites are directly bound by PhoB , we used the gel mobility shift assay to detect the protein-DNA interactions . 
+ Synthesized single-stranded DNA fragments covering the putative binding sites were first end-labelled with biotin , annealed and incubated with the purified PhoB-His fusion protein in vitro . 
+ The purity of our PhoB-His fusion protein is shown in Figure S3 . 
+ From the results of gel mobility shift assays , the two putative binding sites located upstream of yhjC and ydfH were seen as having low-affinity binding in our in vitro experimental conditions . 
+ The other six binding targets also showed different affinities for PhoB binding as the shifts occurred at different concentrations of PhoB ( Figure 3 ) . 
+ Identification of the PhoB binding pattern
+ The PhoB binding pattern can be identified using motif analysis of the enriched peaks from the ChIP-chip results . 
+ All 43 enriched regions were input into the MEME software to find conserved patterns . 
+ The most significant 18 bp pattern was identified ( E-value = 2.2e-18 ) , meaning that all currently identified targets of PhoB share a significant conserved pattern ( Table 2 ) . 
+ The sequence logo representation of this pattern is shown in Figure S2C . 
+ This pattern clearly agrees with the known PhoB binding pattern ( Figure S2D ) . 
+ Surprisingly , nearly half ( 20/43 ) of the binding targets were located within coding regions , and this percentage is higher than that of the other transcriptional regulators mentioned in the study by Shimada et al. [ 13 ] . 
+ They observed that the RutR regulator also has a high percentage of binding sites ( 90 % ) located in the coding regions . 
+ This could be due to incomplete evolution to eliminate the non-functional DNA sites or uncharacterized regulations . 
+ In contrast to RutR , PhoB is a well-conserved protein and the phosphate-sensing mechanism is vital for survival ; thus PhoB is likely well-evolved and its bindings in the coding regions may have biological functions . 
+ Differentially expressed genes containing putative PhoB binding sites in their coding regions were selected to confirm that PhoB binds to their coding regions and may participate in regulating them . 
+ Three genes : cof , yahA , and yddV ( shown in Table 2 ) , fit the criteria and the 60 bps centered at the three putative PhoB binding sites were further tested by gel mobility shift assays ( Figure S4 ) . 
+ Although the detailed mechanisms involved remain to be defined , the results here reveal that PhoB plays roles in the regulation of gene expression through binding to the coding regions . 
+ Functional categories altered by PhoB
+ To assess the gene expression status affected by PhoB , RNA samples were extracted from the MG1655_PhoB_KO and the MG1655 strains under the same condition used for the ChIP-chip experiments . 
+ After cDNA synthesis , biotin-labelling and hybridization onto the Affymetrix array , gene expression status was measured . 
+ There were 287 differentially expressed genes that were directly or indirectly regulated by PhoB ( Table S1 ) . 
+ Within these 287 differentially expressed genes , 177 genes were upregulated while 110 genes were down-regulated with PhoB activity . 
+ In order to investigate the global biological roles played by PhoB under the phosphate-limiting condition , the COG functional distribution of these differentially expressed genes was plotted ( Figure 4 ) . 
+ It is reasonable to see that a large group ( > 10 % ) of genes participated in inorganic ion transportation and metabolism and were up-regulated during phosphate starvation in order to enhance phosphate uptake and usage . 
+ Additionally , about 7 % of genes participating in cell envelope biogenesis/outer membrane were also up-regulated . 
+ The hierarchical structure of phosphate sensing regulatory network
+ It is worth noting that our gene expression data showed 287 genes affected by PhoB while only 19 out of these 287 genes were considered to be directly regulated PhoB targets ( Table 2 ) . 
+ There are 22 differentially expressed transcriptional regulators and six of them ( cusR , feaR , phoB , prpR , ydfH , and yhjC ) contained the Pho box within their upstream regions . 
+ Thus PhoB may pass the phosphate-limiting signal first to the six regulators , which then regulate the other 15 regulators , which in turn affect the remaining 265 differentially expressed genes . 
+ Under the hierarchical structures of phosphate signalling passages , feed forward loop ( FFL ) network motifs play a role in signal sensing and responding mechanisms . 
+ Each FFL network motif contains three genes . 
+ Two of the three are transcription regulators , one of which regulates the other , and they jointly regulate the third target gene . 
+ At least four sets of gene pairs , cusR/cusC , prpR/prpB , feaR/feaB , and yedW/yedX , have the potential to form feed forward loops with PhoB regulation . 
+ For example , PhoB regulates cusR and both PhoB and CusR regulators regulate cusC . 
+ Thus , phoB , cusR , and cusC form an FFL network motif . 
+ A previous study has demonstrated in silico that these FFLs can enhance the signal transduction processes or delay the response or adjust the sensing mechanisms through transcriptional regulation [ 14 ] . 
+ The underlying biological functions of these potential FFLs are left for future in vivo investigations . 
+ Overall , the data suggest that PhoB specifically regulates a relatively small group of genes , which influence a large group of downstream genes during phosphate starvation . 
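+ The feed forward loop definition above (two regulators, one of which regulates the other, jointly regulating a third gene) can be sketched as a small search over a list of regulator-to-target edges. This is only an illustration: the function name `feed_forward_loops` and the edge list are hypothetical, and the edges encode the potential loops named in the text, not experimentally confirmed interactions.

```python
def feed_forward_loops(edges):
    """Return all (X, Y, Z) triples where X regulates Y, and X and Y both regulate Z."""
    targets = {}
    for regulator, target in edges:
        targets.setdefault(regulator, set()).add(target)

    loops = []
    for x, x_targets in targets.items():
        for y in x_targets:
            if y == x:
                continue  # ignore autoregulation such as phoB -> phoB
            for z in targets.get(y, ()):
                # Z must be a third, distinct gene that X also regulates.
                if z != x and z != y and z in x_targets:
                    loops.append((x, y, z))
    return sorted(loops)


# Hypothetical edge list (regulator, target) built from the examples in the text.
EDGES = [
    ("phoB", "phoB"),  # self-regulation, not part of any loop
    ("phoB", "cusR"), ("phoB", "cusC"), ("cusR", "cusC"),
    ("phoB", "prpR"), ("phoB", "prpB"), ("prpR", "prpB"),
    ("phoB", "mipA"),  # a direct target with no second regulator listed here
]

if __name__ == "__main__":
    for x, y, z in feed_forward_loops(EDGES):
        print(f"{x} -> {y} -> {z}, with {x} -> {z}")
```

+ With a real regulatory network, the edge list would come from the binding and expression data, and edge signs (activation or repression) would be needed to tell coherent from incoherent loops.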
+ PhoB operates cooperative regulatory mechanisms
+ In addition to the formation of FFL network motifs , three more observations revealed that PhoB cooperates with other transcription factors to modulate downstream responses . 
+ The first one is the contrasting regulatory modes observed from our reporter gene assay and microarray data . 
+ The two upstream regions of yhjC and yegH may be positively regulated by PhoB based on our reporter gene assay . 
+ However , they both showed down regulation in the presence of PhoB from our microarray datasets . 
+ The opposite results indicate that other regulators may have inhibitory roles in the coding regions to block the up-regulation of PhoB in our experimental condition . 
+ In addition , post-transcriptional modifications , such as mRNA degradation or small RNA regulation , may also be other reasons for reductions in PhoB up-regulation . 
+ Secondly , only about 35 % of PhoB binding targets showed differential expression ( Table 2 ) . 
+ This may be a clue that PhoB cooperates with other factors to modulate transcription since other factors may reduce the effect of PhoB regulation . 
+ Finally , we also observed an indirectly regulated 14 kb region to which PhoB did not tend to bind ( Figure S5 ) . 
+ It is interesting that genes in this 14 kb region all showed up-regulation in the presence of PhoB . 
+ This result suggests that other factors may protect this highly expressed region from PhoB binding . 
+ Although the underlying mechanisms require further investigation , these observations imply that PhoB operates complex regulatory mechanisms and cooperates with other factors in the signal integration of genetic regulatory networks in E. coli . 
+ PhoB is involved in the regulation of transporter systems and membrane component rearrangement 
+ Out of the 287 genes showing differential expression in our gene expression experiments , more than 60 genes encode the proteins of transporter systems ( Table S1 ) . 
+ Previous reports had shown that phosphorus-uptake related transporters are activated during phosphate starvation [ 1 ] . 
+ Based on our results , a large group of genes encoding transporter systems were also activated , such as : Oligopeptide transporter ( oppABCDF ) , Copper/silver efflux system ( cusCFBA ) , multidrug-efflux systems ( mdtABCD , cmr ) , neutral amino-acid efflux system ( eamB ) and others ( Table S1 ) . 
+ These transporters may play roles to adjust the overall metabolic flux of the cells although the adjustments involved are not clear at this time . 
+ Membrane constituents such as lipopolysaccharides , outer-membrane proteins , and membrane lipids have been reported to be regulated during phosphate starvation [ 15,16 ] . 
+ This may be because phosphorus is a major component of cell membrane-forming phospholipids . 
+ If the phospholipids cannot be renewed , the membrane becomes too weak to withstand stresses such as oxidative stress , osmotic stress , and others . 
+ Our microarray data showed that a group of genes related to the metabolisms of murein , palmitoylated lipid A , colanic acid , and putrescine are modulated under PhoB activity . 
+ Murein
+ Murein or peptidoglycan can help E. coli cells to stabilize their cell envelope under the high intracellular pressure [ 17 ] . 
+ The genes mipA ( scaffold protein for murein synthesizing machinery ) , ycfS ( L,D-transpeptidase linking Lpp to murein ) and mltD ( predicted membrane-bound lytic murein transglycosylase ) were observed to be activated in transcriptional expression . 
+ The mipA gene also has a PhoB binding signal in its upstream region and is considered as a directly regulated target . 
+ Lipid A
+ For the modification of lipid A , the hexa-acyl pyrophosphate Lipid A is known to be modulated through the Pho regulon in E. coli [ 18 ] . 
+ From our microarray data , we observed the involvement of PhoB in the up-regulation of pagP . 
+ Palmitoylated lipid A may also be synthesized during phosphate starvation since PagP transfers palmitate from phospholipid to lipid A precursor to generate palmitoylated lipid A , which protects bacteria from host defences and is likely related to bacterial virulence [ 19,20 ] . 
+ Colanic acid
+ Additionally , colanic acid is an extracellular polysaccharide and has been shown to increase tolerance to heat and acid conditions [ 21,22 ] . 
+ The genes wzxC ( colanic acid exporter ) , wcaJ ( predicted UDP-glucose lipid carrier transferase ) , wcaK ( predicted pyruvyl transferase ) , wcaL ( predicted glycosyl transferase ) and wcaM ( predicted colanic acid biosynthesis protein ) are involved in the colanic acid biosynthesis and transportation pathway and were observed to be up-regulated in the wild-type strain relative to the PhoB knock-out strain . 
+ Putrescine
+ As for the linear polyamine , putrescine , its role is related to membrane stabilization and optimal growth . 
+ However , a high concentration of polyamines will inhibit cell growth and protein synthesis . 
+ Therefore , the polyamine degradation pathway exists in bacteria for balancing the concentration [ 23 ] . 
+ This pathway involves puuCBE ( gamma-Glu-gamma-aminobutyraldehyde dehydrogenase , gamma-Glu-putrescine oxidase and GABA aminotransferase ) , puuP ( putrescine importer ) , puuA ( gamma-Glu-putrescine synthase ) , and puuD ( gamma-Glu-GABA hydrolase ) . 
+ In our study , all of these genes showed down-regulation and their transcription repressor , PuuR , in turn was up-regulated in the presence of PhoB activity . 
+ The repressed putrescine degradation pathway indicates that , during phosphate starvation , membrane stabilization is more important than growth since E. coli cells enter the stationary phase . 
+ PhoB is involved in oxidative stress protection
+ Previous studies described that although cells stop growing , bacteria will still undergo aerobic respiration during phosphate starvation [ 24 ] . 
+ Under this circumstance , hydrogen peroxide may not be diluted through cell division and thus may accumulate in cells . 
+ Oxidative stress was demonstrated to occur during phosphate starvation . 
+ In addition , the alkyl hydroperoxide reductase ( AHP ) complex helps scavenge hydrogen peroxide produced during phosphate starvation [ 25,26 ] . 
+ In our study , the ahpCF was identified to be up-regulated indirectly by PhoB . 
+ This suggests that PhoB plays a protective role for the oxidative stress which occurs during phosphate starvation . 
+ It is known that methylglyoxal is synthesized to enhance the phosphate turnover during phosphate starvation [ 27,28 ] . 
+ Although methylglyoxal can help to protect against electrophile attack and detoxification , excess methylglyoxal leads to cell death . 
+ From our gene expression analysis , the yeaE gene encoding the methylglyoxal reductase was up-regulated in the wild-type E. coli strain compared to the PhoB knock-out strain . 
+ This is another indication that PhoB has a protective role against oxidative stress . 
+ PhoB participates in protecting cells during phosphate starvation
+ We have shown that during phosphate starvation , PhoB is involved in triggering the membrane component rearrangement for membrane integrity . 
+ In addition , PhoB also indirectly affects genes participating in protecting cells from oxidative stress and genes that balance the level of methylglyoxal . 
+ These results together suggest that PhoB protects the bacterium by enhancing membrane integrity and reducing oxidative damage to the cell membranes . 
+ We have identified several predicted transcription factors that are regulated by PhoB . 
+ Further studies of these predicted transcription factors are needed in order to understand the complex interplay between genes and regulators in the bacterial signalling and regulatory networks during phosphate starvation . 
+ In summary , our genome-wide approach for characterizing the roles of PhoB by ChIP-chip and gene expression array provides a comprehensive global binding profile of PhoB . 
+ We have presented a hierarchical structure of transcriptional regulators of the phosphate-sensing network as well as the potential membrane protective roles of PhoB . 
+ Materials and Methods
+ Bacterial strains, plasmids and growth conditions
+ Bacterial strains used in this study are shown in Table 1 . 
+ Tables S2 and S3 list the plasmids and the oligonucleotides , respectively . 
+ A phoB knock-out derivative from the BW25113 strain was requested from Keio collection [ 29 ] . 
+ This phoB disruption was then transferred into MG1655 strain by P1 transduction [ 30 ] and named MG1655_PhoB_KO . 
+ The MG1655_PhoB_FLAG strain , which carries a 3xFLAG tag at the 3' end of the phoB gene , was constructed from the BW25113 strain using an epitope tagging approach [ 31 ] . 
+ For PhoB ChIP-chip experiments , strains MG1655 and MG1655_PhoB_FLAG were grown in morpholinepropanesulfonic acid ( MOPS ) minimal medium with 200 µM K2HPO4 and 0.4 % glucose . 
+ Figure 1 shows the time point for cell harvesting and the cultivation of MG1655 , MG1655_PhoB_FLAG , and MG1655_PhoB_KO under phosphate-limiting and phosphate-sufficient conditions . 
+ The time point at an OD600 of 1.0 was selected since phosphate was used up and PhoB had a higher activity for the ChIP-chip assay . 
+ For the gene expression microarray experiments , MG1655 and MG1655_PhoB_KO were compared under the same conditions as the ChIP-chip assay . 
+ To compare the promoter activity of the upstream regions , promoter : : luciferase gene fusion plasmids were constructed , and luminescence was measured for both MG1655 and MG1655_PhoB_KO strains at the same time point as the two above experiments ( Figure 1 ) . 
+ The ChIP-chip experiments and the reporter gene assays were carried out in at least three biological replicates , while the gene expression microarray experiments were performed in two biological replicates . 
+ Determination of phosphate concentration
+ To determine the concentration of orthophosphate , an ascorbic acid method described previously was applied in biological triplicates with slight modifications [ 32 ] . 
+ After overnight culturing of the MG1655 , MG1655_PhoB_KO , and MG1655_PhoB_FLAG strains in MOPS minimal medium containing 1000 µM K2HPO4 and 0.4 % glucose , cultures were diluted at a 1:100 ratio in MOPS minimal medium containing 200 / 1000 µM K2HPO4 and 0.4 % glucose and grown at 37 °C . 
+ At each time point , cultures were collected , centrifuged at 12,000 g for 5 min , and then 1 ml supernatants were added to 160 ml reaction solution ( 1 N sulphuric acid , 0.1 mM potassium antimonyl tartrate , 4.8 mM ammonium molybdate and 30 mM ascorbic acid ( added last ) ) . 
+ After 10 min incubation of supernatants with the reaction solution , the light absorbance at 880 nm was measured . 
+ By interpolation of the standard curve , the phosphate concentration was determined . 
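+ As a rough illustration of the interpolation step, the sketch below reads a phosphate concentration off a piecewise-linear standard curve. The function name `interpolate_concentration` and the standard points are hypothetical; they are not the values measured in the study, and the source does not say which interpolation scheme was used.

```python
from bisect import bisect_right


def interpolate_concentration(absorbance, standards):
    """Linearly interpolate a concentration from (absorbance, concentration) standards."""
    points = sorted(standards)
    xs = [a for a, _ in points]
    if not xs[0] <= absorbance <= xs[-1]:
        # Extrapolating beyond the standards is unreliable, so refuse to do it.
        raise ValueError("absorbance lies outside the standard curve")
    # Index of the right-hand standard of the bracketing pair.
    i = min(bisect_right(xs, absorbance), len(xs) - 1)
    (x0, y0), (x1, y1) = points[i - 1], points[i]
    return y0 + (y1 - y0) * (absorbance - x0) / (x1 - x0)


# Hypothetical standards: (absorbance at 880 nm, phosphate concentration in µM).
STANDARDS = [(0.00, 0.0), (0.10, 50.0), (0.20, 100.0), (0.40, 200.0)]

if __name__ == "__main__":
    print(interpolate_concentration(0.15, STANDARDS))
```

+ Readings outside the range of the standards raise an error here; in practice such samples would be diluted and measured again.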
+ Chromatin immunoprecipitation (ChIP) experiment
+ To identify the genome-wide DNA-binding profile of PhoB , ChIP assays were performed on MG1655_PhoB_FLAG and MG1655 . 
+ The ChIP assay protocol was modified from Byung-Kwan Cho et al. [ 33 ] . 
+ The MG1655_PhoB_FLAG strain expresses the PhoB-FLAG fusion protein where the FLAG tag can be recognized by anti-FLAG antibody and used for ChIP assaying [ 34 ] . 
+ The MG1655 strain , which expresses no FLAG tag , was used as a control group . 
+ Cultures were grown to an OD600 value of 1.0 and treated with 1 % formaldehyde for 10 min . 
+ To quench the reaction , glycine was added at the final concentration of 0.125 M for 5 min . 
+ Cells were centrifuged at 12,000 g at 4 °C for 20 min and washed two times with the washing buffer ( 10 mM Tris-HCl ( pH 7.4 ) , 0.1 M NaCl , 1 mM EDTA and 0.5 % Tween-20 ) . 
+ The washed cells were then lysed with the lysis buffer ( 10 mM Tris-HCl ( pH 7.4 ) , 0.1 M NaCl , 1 mM EDTA and 0.5 % Tween-20 , 8 kU/ml lysozyme , 1 mM PMSF , and protease inhibitor cocktail ( Sigma ) ) for 30 min at 4 °C . 
+ The lysates were sonicated ( Bioruptor ) to result in DNA fragments ranging from 100 bps to 1000 bps with the average size of 500 bps . 
+ After sonication , the lysates were centrifuged at 12,000 g for 20 min at 4 °C and the resulting supernatants were used for immunoprecipitation . 
+ To eliminate the non-specific bindings between the magnetic beads coated with Protein G ( Invitrogen ) and the anti-FLAG antibody , the magnetic beads were pre-incubated with 0.05 mg / ml anti-FLAG antibody ( Sigma ) . 
+ Similarly , for the purpose of eliminating the non-specific bindings between our lysates and the beads , lysates were also pre-cleared by incubating them with the beads without the anti-FLAG antibody . 
+ To immunoprecipitate the PhoB-FLAG-DNA complex , beads pre-incubated with the antibody were added to the lysates from both the MG1655_PhoB_FLAG and MG1655 strains and incubated at 4 °C overnight . 
+ The beads were washed once with IP buffer ( 10 mM Tris-HCl ( pH 7.4 ) , 0.1 M NaCl , 1 mM EDTA , 0.05 % [ v/v ] Tween-20 and 1 mM fresh PMSF ) , twice with ChIP wash buffer I ( 10 mM Tris-HCl ( pH 7.4 ) , 300 mM NaCl , 1 mM EDTA , 0.1 % Tween-20 and 1 mM fresh PMSF ) , three times with ChIP wash buffer II ( 10 mM Tris-HCl ( pH 7.4 ) , 500 mM NaCl , 1 mM EDTA , 0.1 % [ v/v ] Tween-20 and 1 mM fresh PMSF ) , once with ChIP wash buffer III ( 10 mM Tris-HCl ( pH 7.4 ) , 250 mM LiCl , 1 mM EDTA , 0.1 % [ v/v ] Tween-20 and 1 mM fresh PMSF ) and once with TE buffer ( 10 mM Tris-HCl ( pH 7.4 ) and 1 mM EDTA ) . 
+ After removing the TE buffer , beads were incubated twice with elution buffer ( 50 mM Tris-HCl ( pH 7.4 ) , 10 mM EDTA and 1 % SDS ) at 65 °C for 15 min and the two resulting eluted solutions were combined . 
+ After incubating the combined eluted samples with proteinase K ( Sigma ) at a final concentration of 10.5 U/ml at 42 °C for 2 hours , reverse cross-linking was performed by incubating at 65 °C overnight to break the covalent bonds formed by formaldehyde between peptides and DNA . 
+ Samples were then treated with RNase A ( Sigma ) to the final concentration of 26 mg / ml , followed by purifying DNA from the RNase A-treated samples using the PCR purification kit ( Qiagen ) . 
+ Whole genome tiling array analysis for ChIP-chip experiments
+ The NimbleGen 385 K high density tiling array for E. coli K12 MG1655 ( Cat. No. 05542901001 ) was used for our ChIP-chip assay . 
+ The instructions in NimbleGen 's protocol ( version 2.0 ) were followed for all procedures . 
+ Immunoprecipitated samples were amplified by whole genome amplification kit ( Sigma ) twice and pooled together . 
+ The amplified samples from MG1655_PhoB_FLAG were labelled with Cy5 dye while the control samples from MG1655 were labelled with Cy3 dye . 
+ After the hybridization step , the arrays were washed and then scanned with an Axon scanner ( GenePix 4000B ) . 
+ The scanned TIF image files were then processed by NimbleScan software to generate the intensity pair files . 
+ The R package Ringo [ 35 ] was used to read the pair files , and the limma package [ 36 ] was used for within- and between-array normalization [ 37 ] . 
+ The averaged values of normalized Cy5 and Cy3 intensities from triplicate samples were used to calculate the log2-ratios ( Cy5/Cy3 ) . 
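+ The per-probe calculation described here can be sketched as follows. The sketch assumes the normalized intensities are on a linear scale (if they were already log2-transformed, the ratio would be a difference instead); the function name `log2_ratio` and the intensity values are hypothetical.

```python
import math
from statistics import mean


def log2_ratio(cy5_replicates, cy3_replicates):
    """log2 of the mean Cy5 (ChIP) intensity over the mean Cy3 (control) intensity."""
    return math.log2(mean(cy5_replicates) / mean(cy3_replicates))


if __name__ == "__main__":
    # Hypothetical normalized intensities for one probe across three replicates.
    cy5 = [800, 1000, 1200]  # PhoB-FLAG sample
    cy3 = [240, 250, 260]    # untagged control
    print(log2_ratio(cy5, cy3))
```

+ A positive value indicates enrichment in the PhoB-FLAG sample relative to the untagged control at that probe.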
+ The enriched regions were then identified by the CMARRT package [ 12 ] with a controlled error rate set at 0.05 . 
+ Our ChIP-chip data have been submitted to the NCBI GEO database and the GSE Series record is GSE21857 . 
+ Motif identification
+ To find the position weight matrix ( PWM ) of PhoB binding sites , E. coli K12 MG1655 sequences of all the enriched regions were extracted from NCBI RefSeq ( accession no. NC_000913 ) . 
+ We used the MEME program [ 38 ] to search for the most significant conserved pattern with pattern length ranging from 18 to 22 bps ( accomplished by using the -minw 18 -maxw 22 options of MEME ) . 
+ The range was selected because the previously reported PhoB binding pattern is 18 bps in length [ 11 ] , while the structure information indicated that the site is 22 bps [ 10 ] . 
+ A seventh-order background model was built from the whole E. coli K12 MG1655 reference sequence ( supplied to MEME as a background model file through the -bfile option ) . 
+ In addition , sites on both strands were allowed ( accomplished by using the -revcomp option ) . 
+ The sequence logo [ 39 ] was then used to present the PWM graphically . 
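+ To make the idea of a position-specific matrix concrete, the sketch below builds a per-position base frequency matrix from already-aligned sites and derives a consensus sequence. It is not MEME's algorithm, which discovers the alignment itself by expectation maximization; the function names and the four short example sites are hypothetical.

```python
from collections import Counter

BASES = "ACGT"


def position_frequency_matrix(sites):
    """Per-position base frequencies for a list of equal-length aligned sites."""
    length = len(sites[0])
    if any(len(site) != length for site in sites):
        raise ValueError("all sites must have the same length")
    matrix = []
    for column in zip(*sites):
        counts = Counter(column)
        matrix.append({base: counts[base] / len(sites) for base in BASES})
    return matrix


def consensus(matrix):
    """Most frequent base at each position (ties resolved in A, C, G, T order)."""
    return "".join(max(BASES, key=lambda base: column[base]) for column in matrix)


if __name__ == "__main__":
    # Hypothetical aligned sites, shortened to six positions for readability.
    sites = ["CTGTCA", "CTGTAA", "TTGTCA", "CTGTCA"]
    pfm = position_frequency_matrix(sites)
    print(consensus(pfm))
    print(pfm[0])
```

+ A sequence logo is a graphical rendering of such a matrix, with letter heights scaled by the information content at each position.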
+ Gene expression microarray and analysis
+ The Affymetrix E. coli Genome 2.0 array was used to investigate gene expression status in the presence and the absence of PhoB activity . 
+ The E. coli K12 MG1655 and MG1655_PhoB_KO strains were grown in MOPS minimal medium containing 200 µM K2HPO4 and 0.4 % glucose . 
+ At an OD600 of 1.0 , cultures were treated with 10 mg/ml lysozyme and 10 % SDS at 4 °C for 5 min to lyse bacterial cells . 
+ Then , the protocol for total RNA purification using TRIZOL reagent ( Sigma ) was followed . 
+ The Affymetrix standard protocol was then applied for cDNA synthesis , fragmentation , biotin labelling and hybridization . 
+ The raw CEL files were normalized by a robust multi-array average approach [ 40 ] . 
+ The microarray data have also been included in GSE21857 of NCBI GEO database . 
+ To assess statistically significant differential expression , we applied linear models and empirical Bayes methods [ 41 ] through the limma package , and the Benjamini and Hochberg 's q-value threshold was set at 0.05 . 
+ The filtered results were considered as the differentially expressed genes . 
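+ The multiple-testing step can be illustrated with a plain implementation of the Benjamini-Hochberg adjustment. This sketch covers only the adjustment and the 0.05 cut-off; it does not reproduce limma's linear models or empirical Bayes moderation, and the p-values below are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so monotonicity can be enforced.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, p_values[index] * m / rank)
        adjusted[index] = running_min
    return adjusted


if __name__ == "__main__":
    # Hypothetical raw p-values for four genes.
    raw = [0.01, 0.04, 0.03, 0.20]
    adjusted = benjamini_hochberg(raw)
    significant = [i for i, q in enumerate(adjusted) if q <= 0.05]
    print(adjusted, significant)
```

+ In this toy input only the first gene passes the 0.05 threshold after adjustment, even though three raw p-values are below 0.05.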
+ To investigate the functions of the differentially expressed genes , the functional categories of the Clusters of Orthologous Groups ( COG ) were used [ 42 ] . 
+ Construction and assay of promoter::luciferase fusions
+ The promoter regions of the PhoB targets identified in ChIP-chip experiments were amplified by PCR from the MG1655 strain using the primers listed in Table S3 . 
+ After treatment with NheI and NcoI restriction enzymes , the digested linear products were then ligated into a NheI-NcoI digested pGL3-basic vector ( Promega ) . 
+ The pGL3 plasmid contains a promoterless luciferase gene . 
+ Cultures were grown in MOPS minimal medium supplemented with 200 µM K2HPO4 and 0.4 % glucose under the same conditions as the ChIP-chip and gene expression microarray experiments ( Figure 1 ) . 
+ The luciferase activities were measured using a luciferase assay system ( Promega ) . 
+ PhoB-His fusion protein purification
+ In order to overexpress the PhoB-His fusion protein , the PhoB coding region was cloned into a pET21d ( + ) plasmid . 
+ This constructed vector expressing the PhoB-His ( 6x ) fusion protein was transformed into BL21 . 
+ Overnight cultures were diluted 1:500 into 250 mL LB cultures containing 100 mg/ml ampicillin . 
+ The cultures were grown at 37 °C until an OD600 of 0.4-0.6 , then treated with 1 mM IPTG to induce PhoB-His ( 6x ) expression and then grown at 37 °C for another 2 hours . 
+ After centrifugation , the pellets were resuspended in 10 ml lysis buffer ( 20 mM NaH2PO4 , 500 mM NaCl , 20 mM imidazole , and 1 mg/ml lysozyme ) . 
+ The cells were lysed for 30 min at 4 °C and then the lysates were cleared by centrifugation at 14000 g for 30 min at 4 °C . 
+ After applying the lysate to the Ni-sepharose column ( GE Healthcare ) , the column was washed two times with 4 ml wash buffer ( 20 mM NaH2PO4 , 500 mM NaCl , and 30 mM imidazole ) . 
+ The elution was performed by applying 1 ml elution buffer ( 20 mM NaH2PO4 , 500 mM NaCl , 500 mM imidazole ) to the column four times . 
+ The eluted samples were dialyzed in the storage buffer ( 25 mM Tris-HCl , 50 mM NaCl , 0.1 mM EDTA , and 0.1 mM DTT ( pH 7.4 ) ) . 
+ The concentration of PhoB-His fusion protein was determined by the Bradford assay ( Bio-Rad ) using the bovine serum albumin ( BSA ) as the standard . 
+ Gel mobility shift experiments
+ The synthetic single-stranded 60 bp DNA fragments centered at PhoB putative binding sites were used in these experiments ( Table S3 ) . 
+ DNA fragments were first 39-end labeled with biotin using a DNA 39 End Biotinylation Kit ( Pierce ) and then annealed before use . 
+ Before the binding assay , the PhoB-His fusion protein was phosphorylated in the reaction buffer ( 50 mM Tris-HCl , 10 mM MgCl2 , 0.1 mM DTT , and 20 mM acetylphosphate ) at 37uC for 75 min [ 43 ] . 
+ The phosphorylated PhoB-His fusion protein was then used in the mobility shift assays . 
+ Each binding reaction contained 20 fmol 3′-end biotin labeled dsDNA , 20 mM Tris-HCl ( pH 7.0 ) , 50 mM NaCl , 1 mM DTT , 10 mM MgCl2 , 100 mg/ml BSA , and 0.5 mg/ml poly dI-dC with various amounts of PhoB-His ( 6x ) fusion protein ( see Figures 3 and S4 ) . 
+ Reactions were incubated for 15 min at 37 °C , and then loaded onto a 6 % native polyacrylamide gel running at 100 V in 0.5 X TBE buffer . 
+ After separation , samples were blotted to Amersham Hybond-N membranes using a Hoefer TE 70 device . 
+ The labeled biotin signals were transferred and detected using a LightShift Chemiluminescent EMSA Kit ( Pierce ) according to the manufacturer 's instructions . 
+ For each tested target , at least two to three biological replicates were performed and the best figure was picked and shown in Figures 3 and S4 . 
+ Supporting Information
+ wild type strain . 
+ In order to confirm that the activity of our PhoB-FLAG fusion protein is not affected by the C-terminal FLAG tag , a reporter gene assay was used to measure the activity of the self-regulated PhoB promoter . 
+ The constructed phoB promoter : : luciferase fusion plasmid was transformed into three strains , MG1655 , MG1655_PhoB_FLAG , and MG1655_PhoB_KO ( see Materials and Methods ) . 
+ The transformed strains were grown in MOPS minimal medium containing 0.2 mM K2HPO4 and 0.4 % glucose until OD600 of 1.0 . 
+ The growth condition and the time point are the same as our ChIP-chip assay and gene expression microarray . 
+ The luciferase activities were measured using a luciferase assay system ( Promega ) . 
+ The y-axis in this figure shows the relative light unit ( RLU ) . 
+ The phoB-deprived strain , MG1655_PhoB_KO , shows the basal level activity of phoB promoter without PhoB . 
+ The wild type MG1655 strain represents the activity of phoB promoter under wild type PhoB positive regulation . 
+ This figure displays that our PhoB-FLAG fusion protein results from ChIP-chip experiments at an OD600 value of 1.0 . 
+ The log2 fold change ( y-axis ) is the log2 ratio ( grey line ) of the normalized Cy5 signal ( MG1655_PhoB_FLAG ) divided by the normalized Cy3 signal ( MG1655 ) after averaging our triplicate results . 
+ These ratios were plotted against their locations on the 4.64 Mb E. coli chromosome ( x-axis ) . 
+ ( B ) Expansion of PhoB binding peaks on the previously known regulatory sites . 
+ The detected peaks were centered with 5000 bps flanking regions and these peaks were located in the promoter regions of ( i ) phoE , ( ii ) phoB , ( iii ) ugpB , ( iv ) pstS , ( v ) phoA , and ( vi ) phnC . 
+ The log2-ratios ( grey line ) and the smoothed ratios ( black line ) on y-axis were calculated from the normalized Cy5 signal ( MG1655_PhoB_FLAG ) divided by the normalized Cy3 signal ( MG1655 ) after averaging the triplicate results . 
+ ( C ) The most significant pattern was found in the 43 PhoB ChIP-chip peaks . 
+ The DNA sequences from all 43 PhoB ChIP-chip peaks ( see Table 2 ) were combined and analyzed using the MEME program . 
+ This pattern was identified with a significance value of 2.2e-18 , and a sequence logo representation was then generated with the R package seqLogo . 
+ ( D ) The previously known PhoB binding pattern was retrieved from RegulonDB ( http://regulondb.ccg.unam.mx/MatrixAlignment/results/ ) . 
+ The first five bases of the known pattern were trimmed to produce an 18 bp pattern that can be compared with our pattern . 
+ Panels C and D in this figure show high similarity between these two patterns . 
+ ( TIF ) protein . 
+ This figure displays the ( A ) SDS-PAGE and ( B ) the western blot results to demonstrate the purity of our purified PhoB-His fusion protein . 
+ The lanes from left to right are Marker ( M ) , cell lysate ( CL ) , flow-through ( FT ) , the first washed fraction ( W1 ) , the second washed fraction ( W2 ) , the first eluted fraction ( E1 ) , the second eluted fraction ( E2 ) , the third eluted fraction ( E3 ) , the fourth eluted fraction ( E4 ) , and the pooled and enriched fraction ( P ) . 
+ The expected size of PhoB-His fusion protein is 27.9 kDa . 
+ ( TIF ) genes , yahA ( i ) , cof ( ii ) , and yddV ( iii ) , were shown to be differentially expressed in our gene expression microarray analysis . 
+ From our ChIP-chip analysis , significant PhoB binding peaks were detected and the putative PhoB binding motifs were also identified within the coding regions ( panel A ) . 
+ For further investigation of PhoB bindings , the in vitro binding assays were carried out by gel mobility shift assay ( panel B , see Methods for details ) . 
+ The results demonstrate that PhoB binds to the coding regions of the three genes despite the weak binding to yahA . 
+ ( TIF ) addition to PhoB binding regions , there is a long region in which the PhoB binding signals are lower than background noises . 
+ This region ranges from the genomic location of 1292000 to 1306000 bps . 
+ The y-axis represents the log2 ratio ( grey line ) and the smoothed ratio ( black line ) of the normalized Cy5 signal ( MG1655_PhoB_FLAG ) divided by the normalized Cy3 signal ( MG1655 ) after averaging our triplicate results . 
+ In order to show the boundary of this non-preferred binding region , the genomic region from 1291000 to 1207000 bps is plotted . 
+ Genes located in this region are shown at the bottom of the plot . 
+ The pseudogene , insZ , located between tdk and adhE is not shown . 
+ All genes , tdk , insZ , adhE , ychE , oppABCDF and yciU were up-regulated with PhoB activity . 
+ ( TIF )
\ No newline at end of file
--- a/data/TEXT_FILES/useful_txt/23203983.txt 0 → 100644
+++ b/data/TEXT_FILES/useful_txt/23203983.txt 0 → 100644
+ The 2013 Nucleic Acids Research Database
+ ABSTRACT 
+ The 20th annual Database Issue of Nucleic Acids Research includes 176 articles , half of which describe new online molecular biology databases and the other half provide updates on the databases previously featured in NAR and other journals . 
+ This year 's highlights include two databases of DNA repeat elements ; several databases of transcriptional factors and transcriptional factor-binding sites ; databases on various aspects of protein structure and protein -- protein interactions ; databases for metagenomic and rRNA sequence analysis ; and four databases specifically dedicated to Escherichia coli . 
+ The increased emphasis on using the genome data to improve human health is reflected in the development of the databases of genomic structural variation ( NCBI 's dbVar and EBI 's DGVa ) , the NIH Genetic Testing Registry and several other databases centered on the genetic basis of human disease , potential drugs , their targets and the mechanisms of protein -- ligand binding . 
+ Two new databases present genomic and RNAseq data for monkeys , providing a wealth of data on our closest relatives for comparative genomics purposes . 
+ The NAR online Molecular Biology Database Collection , available at http://www.oxfordjournals.org/nar/database/a/ , has been updated and currently lists 1512 online databases . 
+ The full content of the Database Issue is freely available online on the Nucleic Acids Research website ( http://nar.oxfordjournals.org/ ) . 
+ Cambridge, CB24 6DZ, UK and National Center for Biotechnology Information (NCBI), National Library of Medicine, National Institutes of Health (NIH), Bethesda, MD 20894, USA
+ NEW AND UPDATED DATABASES
+ This 1300-page virtual volume represents the 20th annual Database Issue of Nucleic Acids Research ( NAR ) . 
+ It includes descriptions of 88 new online databases , 77 update articles on databases that have been previously featured in the NAR Database Issue ( Table 1 ) and 11 articles with updates on database resources whose descriptions have been previously published in other journals ( Table 2 ) . 
+ At this point it might be instructive to look back at the origin and evolution of the NAR Database Issue . 
+ Its history started from two supplementary issues that were published in NAR in April of 1991 and in May of 1992 and consisted of 18 and 19 articles , respectively ( see http://nar.oxfordjournals.org/content/19/supplement.toc and http://nar.oxfordjournals.org/content/20/supplement.toc ) . 
+ These articles offered descriptions of several nucleotide sequence databases , such as GenBank , the EMBL Data Library , compilations of small RNA , tRNA , and 5S , 16S , and 23S rRNA sequences ( including the Ribosomal Database Project ) , DNA sequences from Escherichia coli and a human genome database ( GDB ) . 
+ Those first issues also included descriptions of several protein databases , such as SWISS-PROT , PIR , PROSITE , Restriction Enzyme Database ( REBASE ) , Transcription Factors Database ( TFD ) and Histone database . 
+ There was also a medical genetics database , Haemophilia B , listing point mutations and indels in the coagulation factor IX ( F9 ) gene that caused this blood clotting disorder , which has affected the royal families of several European countries . 
+ The next issue , published on July 1 , 1993 , was the first one formally labelled as the Database Issue . 
+ It consisted of 24 articles , which added databases of RNA and protein structure and the ENZYME database . 
+ It was followed by NAR Database Issues in September 1994 , then in January 1996 , and each January after that . 
+ In the past 20 years , the Database Issue has gradually grown in size before stabilizing at the level of 180 articles . 
+ However , despite the almost 10-fold increase in the number of published articles , the key topics of the current issue remain largely the same as 20 years ago . 
+ Database | URL | Year | Description
+ 2P2Idb | http://dimr.cnrs-mrs.fr | 2010 | Structural data on protein -- protein interactions and their inhibitors
+ Allen Brain Atlas | http://www.brain-map.org | 2009 | Gene expression and neuroanatomical data on human and mouse brain
+ BioGPS | http://biogps.org | 2009 | Gene annotation portal and a resource on gene and protein function
+ DARNED | http://beamish.ucc.ie/ | 2010 | Database of RNA Editing
+ DoriC | http://tubic.tju.edu.cn/doric/ | 2007 | Replication origin ( oriC ) regions in bacterial and archaeal genomes
+ FlyAtlas | http://flyatlas.org/ | 2007 | Drosophila gene expression atlas
+ GenColors | http://sgb.fli-leibniz.de/ | 2005 | Genome annotation and comparison database for small genomes
+ Genomicus | http://www.dyogen.ens.fr/genomicus | 2010 | Syntenic relationships between eukaryote genomes
+ InnateDB | http://www.innatedb.com/ | 2008 | A database of mammalian innate immune response
+ MicroScope | http://www.genoscope.cns.fr/agc/microscope/ | 2009 | Microbial genome annotation and analysis platform
+ NPIDB | http://npidb.belozersky.msu.ru/ | 2007 | Nucleic acids -- protein interaction database
+ This issue again features articles from GenBank and the European Nucleotide Archive ( formerly the EMBL Data Library ) , which , together with the DNA Data Bank of Japan , form the International Nucleotide Sequence Database Collaboration , INSDC ( 1 -- 4 ) . 
+ Just as 20 years ago , there are updates from SWISS-PROT and PIR ( now combined into UniProt ) and PROSITE ( 5,6 ) . 
+ Continuing the tradition of featuring well-curated databases of RNA sequences , this issue includes an update on SILVA , a widely used comprehensive database of bacterial , archaeal and eukaryotic 16S/18S and 23S/28S rRNA sequences ( 7 ) , and a description of Protist Ribosomal Reference database ( PR2 ) , a new database that catalogs small subunit rRNA sequences from unicellular eukaryotes ( 8 ) . 
+ An update on the Ribosomal Database Project , a constant feature of the NAR Database Issue since 1991 ( 9 ) , was last published in 2009 ( 10 ) . 
+ Other RNA databases in this issue include an update on Rfam ( 11 ) , the universally acclaimed database of RNA families , as well as several databases on long non-coding RNA , microRNA and their targets . 
+ An update of MODOMICS , a database on RNA modification , is now supplemented by RNApathwaysDB , a database of RNA maturation and decay pathways developed by the same group ( 12,13 ) . 
+ As before , this issue presents several transcription factor ( TF ) databases . 
+ Two of them cover TFs themselves : TFClass offers a classification of human TFs , while NPIDB presents structural information on DNA -- protein and RNA -- protein complexes ( 14,15 ) . 
+ Several other databases collect information on the TF-binding sites . 
+ These include Factorbook , a database of TF-binding data from the ENCODE project ; HOCOMOCO , a collection of human TF-binding sites ; CTCFBSDB , a database of CCCTC-binding factor ( CTCF ) - binding sites ; RegulonDB , a database of transcriptional regulation in E. coli ; and SwissRegulon , a database of regulatory sites in human , mouse and yeast genomes and in model bacteria ( 16 -- 20 ) . 
+ The structural databases featured in this issue all show a trend towards better integration and cross-referencing tools . 
+ This refers both to the updates of well-known databases , such as the RCSB Protein Data Bank ( PDB ) , CATH and PDBTM , and to such databases as EBI 's SIFTS , a joint effort of UniProt and PDBe to provide a residue level mapping of their entries and supplement it with annotation from other public databases ; Genome3D , a recent collaborative project aiming to provide structural annotation from CATH and SCOP to the genomic sequences ; and dcGO , which develops domain-centric ontologies to link protein domains with functions , phenotypes and diseases ( 21 -- 23 ) . 
+ Likewise , with E. coli remaining the workhorse of molecular biology , this issue includes update articles on the EcoGene ( the first one since 2000 ) , EcoCyc and RegulonDB databases , as well as a description of the newly developed E. coli Metabolome Database ( 20,24 -- 26 ) . 
+ HUMAN DISEASE GENOMICS—THE NEXT FRONTIER?
+ As discussed earlier ( 27 ) , the original GDB did not survive the influx of the new data and multiple changes of ownership . 
+ Nevertheless , we now have a wide variety of databases that cover different aspects of human genome and genomes of model organisms . 
+ This issue features annual updates from Ensembl and ENCODE projects and from the UCSC Genome Browser and the Japanese H-InvDB database ( 28 -- 31 ) . 
+ The model organism databases are represented by the updates to FlyBase , Mouse Genome database , Xenbase and ZFIN ( 32 -- 35 ) . 
+ Two new databases , RhesusBase and NHPRTR , present extensive genome and RNAseq data for non-human primates , including great apes , old world monkeys , new world monkeys and prosimians ( 36,37 ) . 
+ These data could go a long way towards establishing monkeys as model organisms for comparative genomics studies . 
+ One more database is dedicated to a more distant relative of human , the urochordate Oikopleura dioica ( 38 ) . 
+ A potentially important development is the construction of two new databases of repetitive DNA elements , Dfam and SINEBase ( 39,40 ) . 
+ Along with the industry standard Repbase Update ( 41,42 ) and monthly RepBase Reports ( http://www.girinst.org/repbase/reports/ ) , these databases promise to contribute to a better understanding of eukaryotic repeat elements . 
+ With the abundance of databases providing valuable tools for genome analysis , there is a clear trend towards bringing genomics ` from the bench to the bedside ' , i.e. using genomic data for a better understanding and , hopefully , better treatment of human disease . 
+ A number of projects , including ClinSeq ( http://www.genome.gov/20519355 ) , DDD ( http://www.ddduk.org/ ) and UK10K ( http://www.uk10k.org/ ) are working towards these goals , and several databases featured in this issue represent important steps in this direction . 
+ Last year 's issue introduced the GWASdb database of human genetic variants identified by genome-wide association studies ( 43 ) . 
+ GWAS Central , established in 2007 as HGVbaseG2P ( 44 ) , has been revamped and now includes data from over 1000 studies . 
+ Now , a joint article from NCBI and EBI describes their databases of genomic structural variation , dbVar and DGVa ( 45 ) . 
+ These databases cover diverse variation data including inversions , insertions and translocations that are > 50 bp in length . 
+ NCBI is also developing ClinVar ( http://www.ncbi.nlm.nih.gov/clinvar/ ) , a database of relationships between human gene variation and the observed health status ( 46 ) . 
+ The task of streamlining the genetic tests that provide such information is taken up by the recently created NIH Genetic Testing Registry , a database of genetic tests and laboratories that perform them , with detailed information about what exactly is measured in each test and its analytic and clinical validity ( 47 ) . 
+ The impact of the genomic data on developing targeted approaches for fighting disease is particularly evident in the case of cancer . 
+ This issue features updates from three great databases , the UCSC Cancer Genome Browser ( 48 ) , the Atlas of Genetics and Cytogenetics in Oncology and Haematology ( 49 ) and the TP53 website [ ( 50 ) , the first update of the database on tumor factor p53 mutations since 1997 ] . 
+ In addition , there are two new databases dedicated to studying cancer at the level of specific cell lines . 
+ The CellLineNavigator database provides gene expression profiles of different cancer cell lines in different pathological states ( 51 ) , whereas the Genomics of Drug Sensitivity in Cancer ( GDSC ) collects the results of high-throughput studies examining the sensitivity for anti-cancer drugs in various cell lines ( 52 ) . 
+ CURATION OF THE NAR DATABASE COLLECTION 
+ During the past 20 years , all databases featured in the NAR Database Issues were added to the NAR online Molecular Biology Database Collection , available at http://www.oxfordjournals.org/nar/database/a/ . 
+ With the annual attrition rate of < 5 % , this Collection has been steadily growing and , in 2012 , exceeded 1400 database entries ( 53 ) . 
+ It was clear that the list was due for a serious clean-up , and one of the authors ( XMFS ) devised and set in motion a semi-automated procedure to identify obsolete and non-responsive websites . 
+ Remarkably , > 90 % of the databases listed in the last year 's release of the online Collection were found to be functional . 
+ Corresponding authors of close to a hundred non-responsive resources had been contacted and 44 websites ( 3.2 % of the total ) have been approved for deletion . 
+ About 100 entries in the Collection have been updated by receiving corrected URLs , summaries highlighting recent developments , or some other changes in the deposited data . 
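+ The article does not describe how the semi-automated check was implemented . As a rough illustration only , the automated half of such a procedure could be sketched as below , assuming a plain list of database URLs ; the function names `check_url` and `flag_for_review` , the three status labels and the 10-second timeout are all hypothetical choices , not the authors ' method . Flagged entries would still need manual follow-up , such as contacting the corresponding authors . 

```python
import urllib.error
import urllib.request


def check_url(url, timeout=10.0, opener=urllib.request.urlopen):
    """Classify one URL as 'ok', 'http_error' or 'unreachable'.

    Hypothetical helper: `opener` is injectable so the logic can be
    exercised without network access.
    """
    try:
        with opener(url, timeout=timeout) as response:
            status = response.status
    except urllib.error.HTTPError:
        # The server answered, but with an error code such as 404 or 500.
        return "http_error"
    except (urllib.error.URLError, TimeoutError, OSError):
        # DNS failure, refused connection, timeout and similar problems.
        return "unreachable"
    return "ok" if 200 <= status < 400 else "http_error"


def flag_for_review(urls, **kwargs):
    """Return the URLs that did not respond normally, in input order."""
    return [url for url in urls if check_url(url, **kwargs) != "ok"]
```

+ A single failed request is weak evidence that a resource is gone , so a real run would presumably repeat the check over several days before anyone is contacted . 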
+ Although deletion of 40 databases was well within the average drop-off rate and was hardly surprising , further analysis revealed that most of these resources were not lost . 
+ Instead , in the normal course of database evolution , they have been integrated into larger projects . 
+ For example , a couple of segmental duplications databases were merged into the Database of Genomic Variants ( 54 ) , NAR Database Collection entry no. 655 , while the NCBI 's Cancer Chromosomes database has been merged into dbVar [ described in detail in this issue , ( 45 ) ] . 
+ Further , improved annotation of the human genome made redundant a number of resources that covered specific areas of the genome ( e.g. the IXDB with its physical maps of human chromosome X ) . 
+ In one instance , the ExDom database of exon -- intron structures of genes in seven eukaryotic genomes ( 55 ) had to be removed from the Collection , as it has taken the commercial route and does not provide a free version anymore , although the author 's company offered a discounted version for academic users . 
+ Unfortunately , the tightening budgets ( 56 ) might force other databases to follow the same path . 
+ In total , the NAR online Molecular Biology Database Collection now includes 1512 databases sorted into 14 categories and 41 subcategories . 
+ Authors wishing to have databases published elsewhere included in the Collection are welcome to contact XMFS directly . 
+ ACKNOWLEDGEMENTS
+ The authors thank Drs Javier Herrero and Michael Schuster for helpful comments and the Oxford University Press team led by Jennifer Boyd and Andrew Malvern for their help in compiling this issue . 
+ FUNDING
+ Intramural Research Program of the U.S. National Institutes of Health at the National Library of Medicine [ to M.Y.G. ] . 
+ Funding for open access charge : Waived by Oxford University Press . 
+ Conflict of interest statement . 
+ The authors ' opinions do not necessarily reflect the views of their respective institutions .
\ No newline at end of file
--- a/data/TEXT_FILES/useful_txt/23586855.txt 0 → 100644
+++ b/data/TEXT_FILES/useful_txt/23586855.txt 0 → 100644
+ Non-canonical protein-DNA interactions identified
+ Abstract 
+ Background : ChIP-chip and ChIP-seq are widely used methods to map protein-DNA interactions on a genomic scale in vivo . 
+ Waldminghaus and Skarstad recently reported , in this journal , a modified method for ChIP-chip . 
+ Based on a comparison of our previously-published ChIP-chip data for Escherichia coli σ32 with their own data , Waldminghaus and Skarstad concluded that many of the σ32 targets identified in our earlier work are false positives . 
+ In particular , we identified many non-canonical σ32 targets that are located inside genes or are associated with genes that show no detectable regulation by σ32 . 
+ Waldminghaus and Skarstad propose that such non-canonical sites are artifacts , identified due to flaws in the standard ChIP methodology . 
+ Waldminghaus and Skarstad suggest specific changes to the standard ChIP procedure that reportedly eliminate the claimed artifacts . 
+ Results : We reanalyzed our published ChIP-chip datasets for σ32 and the datasets generated by Waldminghaus and Skarstad to assess data quality and reproducibility . 
+ We also performed targeted ChIP/qPCR for σ32 and an unrelated transcription factor , AraC , using the standard ChIP method and the modified ChIP method proposed by Waldminghaus and Skarstad . 
+ Furthermore , we determined the association of core RNA polymerase with disputed σ32 promoters , with and without overexpression of σ32 . 
+ We show that ( i ) our published σ32 ChIP-chip datasets have a consistently higher dynamic range than those of Waldminghaus and Skarstad , ( ii ) our published σ32 ChIP-chip datasets are highly reproducible , whereas those of Waldminghaus and Skarstad are not , ( iii ) non-canonical σ32 target regions are enriched in a σ32 ChIP in a heat shock-dependent manner , regardless of the ChIP method used , ( iv ) association of core RNA polymerase with some disputed σ32 target genes is induced by overexpression of σ32 , ( v ) σ32 targets disputed by Waldminghaus and Skarstad are predominantly those that are most weakly bound , and ( vi ) the modifications to the ChIP method proposed by Waldminghaus and Skarstad reduce enrichment of all protein-bound genomic regions . 
+ Conclusions : The modifications to the ChIP-chip method suggested by Waldminghaus and Skarstad reduce rather than increase the quality of ChIP data . 
+ Hence , the non-canonical σ32 targets identified in our previous study are likely to be genuine . 
+ We propose that the failure of Waldminghaus and Skarstad to identify many of these σ32 targets is due predominantly to the lower data quality in their study . 
+ We conclude that surprising ChIP-chip results are not artifacts to be ignored , but rather indications that our understanding of DNA-binding proteins is incomplete . 
+ Keywords : ChIP-chip , ChIP-seq , σ32 
+ 1Wadsworth Center , New York State Department of Health , Albany , NY 12208 , USA 2 Department of Biomedical Sciences , University at Albany , Albany , NY 12201 , USA 
+ Background
+ ChIP-chip ( sometimes referred to as ChIP-on-chip ) and ChIP-seq are widely-used genomic methods that combine chromatin immunoprecipitation ( ChIP ) with microarrays and deep sequencing , respectively , to map protein-DNA interactions in vivo [ 1 ] . 
+ The genome-wide binding profiles of hundreds of proteins have been mapped using ChIP-chip and ChIP-seq in organisms ranging from bacteria to humans . 
+ ChIP-chip/ChIP-seq often identifies non-canonical target regions for DNA-associated proteins , i.e. target regions that are inconsistent with our current understanding of the protein being studied . 
+ In many cases , these discoveries have provided new insight into the function of those proteins . 
+ In bacteria , many transcription factor ( TF ) binding sites identified using ChIP-chip/ChIP-seq are located in `` unexpected '' genomic regions : ( i ) upstream of genes whose described function is seemingly unconnected to the described function of the TF [ 2-4 ] , ( ii ) upstream of genes whose expression does not change detectably when the TF-encoding gene is mutated [ 2,4-8 ] , ( iii ) inside genes [ 2-4 ,9 -13 ] , and ( iv ) far from any DNA sequences that are close matches to the known consensus binding site [ 2,3,8,14,15 ] . 
+ In most cases , the significance of these observations is unclear , although they suggest that ( i ) gene annotations are often incomplete , ( ii ) TFs often function redundantly , such that expression of the regulated gene does not change unless multiple TF-encoding genes are deleted , ( iii ) TFs often regulate the expression of non-coding RNAs that initiate within genes [ 16 ] , and ( iv ) TFs often bind DNA cooperatively such that the DNA sequence requirements are altered or relaxed . 
+ Our published ChIP-chip study of σ32 , an alternative σ factor in E. coli , led to the identification of 22 putative σ32 binding sites within genes [ 11 ] . 
+ These represent ~ 25 % of all the σ32 binding sites we identified . 
+ All but 2 of the gene-internal promoters are > 300 bp from an annotated translation start codon . 
+ We proposed that RNA polymerase ( RNAP ) associated with σ32 ( RNAP : σ32 ) often binds to promoter elements within genes and initiates transcription of non-coding RNAs in either the sense or antisense orientation . 
+ We confirmed this for three examples that we examined in more detail . 
+ Furthermore , five of the σ32 binding sites within genes are immediately adjacent to genes identified in previous studies as being upregulated by σ32 , but for which no promoter could be identified in the upstream region [ 17,18 ] . 
+ Our ChIP-chip data also permitted identification of 65 σ32 binding sites in intergenic regions , 26 of which are not associated with genes identified in either of two transcriptomic studies of σ32 [ 17,18 ] . 
+ Thus , many of the sites of σ32 association we identified are non-canonical . 
+ In a recent study published in this journal , Waldminghaus and Skarstad describe modifications to the standard ChIP-chip procedure [ 19 ] . 
+ The key modifications are avoiding the use of Spin-X filter columns during immunoprecipitation ( IP ) wash steps , including an RNase treatment following the IP , and collecting reference material after the IP rather than the traditional `` input '' starting chromatin . 
+ Waldminghaus and Skarstad propose that the standard ChIP-chip method results in identification of false positives that are eliminated when using the modified method . 
+ Waldminghaus and Skarstad demonstrated their modified ChIP-chip procedure by performing ChIP-chip of E. coli σ32 . 
+ They identified many fewer target regions for σ32 than our earlier study . 
+ We will refer to the 46 σ32 target regions identified in our previous study but not by Waldminghaus and Skarstad as `` Disputed σ32 targets '' ( DSTs ) . 
+ DSTs are enriched for non-canonical σ32 binding sites . 
+ Specifically , 16 of the 46 DSTs are located inside genes or between convergently transcribed genes , and 21 DSTs are located in intergenic regions but are not associated with genes identified in transcriptomic studies of σ32 [ 17,18 ] . 
+ We have reanalyzed our published ChIP-chip datasets and those of Waldminghaus and Skarstad . 
+ This reanalysis demonstrates low reproducibility in the datasets of Waldminghaus and Skarstad . 
+ We also used targeted ChIP/qPCR to directly compare the standard and modified ChIP methods . 
+ We demonstrate that non-canonical targets of σ32 are real and that the lower data quality and deficiencies in the modified ChIP method are sufficient to explain the absence of DSTs in the list of σ32 targets generated by Waldminghaus and Skarstad . 
+ Results and discussion
+ Existing evidence that DSTs are genuine sites of σ32 association Waldminghaus and Skarstad suggest that DSTs are artifacts that result from non-specific IP of RNA that is then amplified by Klenow DNA polymerase during sample preparation for ChIP-chip [ 19 ] . 
+ However , there are several features of DSTs that are consistent with them being genuine sites of σ32 association and inconsistent with them being artifacts resulting from amplification from RNA : 
+ Note that , for all the analyses described herein , we have excluded the two DSTs that are located in repetitive sequence ( yibA and yrdA ; see Conclusions ) . 
+ Comparison of data quality between our data and those of Waldminghaus and Skarstad The disparity between the σ32 targets identified in the two studies led us to compare the quality of the ChIP-chip data . 
+ For each dataset we used an established method to estimate the null distribution of ChIP-chip signals [ 20,21 ] . 
+ Specifically , we determined the modal value and used the probes with scores at or below this value to fit a normal distribution . 
+ Using this fitted normal distribution we determined the mean and standard deviation of the null distribution . 
+ This allowed us to calculate z-scores ( number of standard deviations from the mean ) for each microarray probe , thus providing a measure of dynamic range that is independent of the absolute ChIP-chip signals , which have arbitrary units . 
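+ The null-distribution procedure described above can be sketched roughly as follows . This is a minimal illustration under stated assumptions , not the authors ' code : the mode is approximated by the centre of the most populated histogram bin , and the `` fit '' is a half-normal estimate in which the null mean is set to the mode and the standard deviation is computed from the probes at or below it . The function name `null_zscores` and the `bin_width` parameter are hypothetical . 

```python
import math
from collections import Counter


def null_zscores(signal, bin_width=0.1):
    """Return (z-scores, null mean, null sd) for a list of probe signals.

    Assumption: probes at or below the mode are pure background, so their
    spread around the mode estimates the sd of a symmetric normal null.
    """
    # Approximate the modal value by the centre of the fullest bin.
    bins = Counter(math.floor(x / bin_width) for x in signal)
    top_bin = max(bins, key=bins.get)
    mode = (top_bin + 0.5) * bin_width

    # Use only the lower half of the distribution to estimate the spread.
    lower = [x for x in signal if x <= mode]
    if not lower:
        raise ValueError("no probes at or below the estimated mode")
    sd = math.sqrt(sum((x - mode) ** 2 for x in lower) / len(lower))
    if sd == 0:
        raise ValueError("background has zero spread")

    # z-score: number of null standard deviations above the null mean.
    zscores = [(x - mode) / sd for x in signal]
    return zscores, mode, sd
```

+ Because only the lower half of the distribution enters the estimate , enriched probes in the upper tail do not inflate the standard deviation , which is what makes the resulting z-scores comparable across datasets with arbitrary signal units . 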
+ Scatter plots of z-scores for the duplicate datasets from each study are shown in Figure 1A-B . 
+ These scatter plots demonstrate several key features of the datasets from each study : 
+ We conclude that our ChIP-chip data are of substantially higher quality with respect to both dynamic range and reproducibility . 
+ Figure 1C-H shows normalized ChIP-chip data for replicate datasets from both studies for six selected genomic regions . 
+ These data further demonstrate the differences in reproducibility and dynamic range between the two studies . 
+ The genomic regions shown include DSTs and non-canonical targets ( inside genes and/or no detectable regulation in transcriptomic studies ) . 
+ Several factors likely contribute to the difference in data quality between the two studies . 
+ First , we used a TAP-tagged derivative of σ32 whereas Waldminghaus and Skarstad used an antibody raised against the native protein . 
+ Second , our heat shock conditions ( 50 °C for 10 minutes ) were different to those of Waldminghaus and Skarstad ( 43 °C for 5 minutes ) . 
+ Third , as described below , the modifications to the ChIP method reduce the sensitivity of the assay . 
+ ChIP/qPCR validation of DSTs
+ We used ChIP/qPCR with the standard and modified ChIP methods to measure association of σ32 with four DSTs in cells before and after heat shock . 
+ As a positive control , we measured association of σ32 with the region upstream of dnaK , a well-established σ32 target [ 17,18 ] identified both in our study and that of Waldminghaus and Skarstad . 
+ We used cells expressing an N-terminally FLAG-tagged copy of σ32 expressed from its native locus ( our earlier study used a C-terminally TAP-tagged copy of σ32 ) . 
+ Using the standard ChIP method , we observed significant association of σ32 with all regions tested and a significant increase in σ32 association with all regions tested following heat shock ( Figure 2A ) . 
+ Previous ChIP-seq studies have revealed biases in the level of some genomic regions in input DNA , the most common control sample for ChIP experiments [ 22-24 ] . 
+ In the case of ChIP-chip , this bias is likely to be due to nucleosomes , and is hence specific to eukaryotes [ 23,24 ] . 
+ Nevertheless , we wished to rule out the possibility that DSTs were identified as a result of input biases . 
+ Therefore , we repeated the ChIP / qPCR using an untagged strain . 
+ We observed no significant ChIP/qPCR signal for any region tested ( Additional file 1 : Supplementary Data ) . 
+ We conclude that all four DSTs tested are genuine sites of σ32 binding . 
+ We compared the standard ChIP method with the modified method proposed by Waldminghaus and Skarstad . 
+ Importantly , ChIP with the modified method used the same sonicated , cross-linked cell extracts as the standard method . 
+ Using the modified method , we detected significant σ32 association with the region upstream of dnaK ( Figure 2B ) , and association increased significantly following heat shock ( Figure 2B ) . 
+ However , the absolute ChIP signal was substantially lower than that observed using the standard ChIP method ( Figure 2A ) . 
+ Thus , the modified ChIP method has a decreased sensitivity relative to the standard method . 
+ Using the modified ChIP method , we detected significant association of σ32 following heat shock with three of the four DSTs tested ( Figure 2B ) . 
+ We also observed a significant reduction in σ32 association in the absence of heat shock at two of these DSTs ( Figure 2B ) . 
+ Thus , even with the decreased sensitivity of the modified ChIP method , three of the four DSTs tested were validated as genuine sites of σ32 association . 
+ We believe that we were unable to detect significant association of σ32 with the fourth DST , ybjX , due to the substantial decrease in sensitivity relative to the standard ChIP method . 
+ We note that the ChIP signal for ybjX was the lowest of all the regions tested using the standard method ( Figure 2A ) . 
+ We conclude that the reduced sensitivity of the modified ChIP method prevented Waldminghaus and Skarstad from identifying DSTs as sites of σ32 association . 
+ This is consistent with the observation that DSTs have above average ChIP-chip scores in the Waldminghaus and Skarstad datasets ( Figure 1B ) . 
+ As an independent assessment of σ32 association with DSTs , we measured association of core RNAP ( β subunit ) with dnaK and the four DSTs described above , with and without overexpression of σ32 from a plasmid . 
+ Association of β with dnaK and two DSTs was significantly higher in cells overexpressing σ32 as compared to those with empty vector ( Figure 3 ) . 
+ This provides independent validation of the association of σ32 with these regions . 
+ Two of the DSTs tested showed no significant difference in the association of β between cells overexpressing σ32 and those with empty vector . 
+ In the case of ybjX , we propose that the lack of increase in RNAP levels is due to the relatively low association of σ32 ( Figure 2A ) . 
+ Thus , association of RNAP : σ32 may not significantly increase the overall association of RNAP in the presence of a relatively high level of RNAP that is independent of σ32 ( presumably RNAP : σ70 ) . 
+ Consistent with our ChIP/qPCR data , ybjX expression was not detectably increased by σ32 overexpression in two transcriptomic studies [ 17,18 ] . 
+ In the case of tdk/ychG , we propose that RNAP : σ32 binds this region specifically during heat shock but not following σ32 over-expression without heat shock , perhaps due to the requirement for other heat shock-induced/activated proteins . 
+ ChIP method comparison for AraC
+ The comparison of the ChIP methodologies described above demonstrates that the modified ChIP method is less sensitive . 
+ There are multiple changes to the standard method , so it is unclear which specific change ( s ) results in the decreased sensitivity . 
+ One significant change in the method described by Waldminghaus and Skarstad is the omission of Spin-X columns during the IP wash steps . 
+ We directly assessed the importance of Spin-X columns by measuring association of AraC ( C-terminally FLAG-tagged ) with target regions in E. coli using ChIP / qPCR performed either with or without Spin-X columns . 
+ The use of Spin-X columns increased the ChIP/qPCR signal for all regions tested but qualitatively the data are the same for both methods ( Figure 4 ) . 
+ Importantly , we detected association of AraC with a non-canonical target within the dcp gene using both methods ( Figure 4 ) . 
+ This site of AraC association is hundreds of base pairs from either end of the gene and there is no detectable change in transcription of dcp or association of RNAP at this region following deletion of araC and/or addition of arabinose ( Stringer , A.M. , Currenti , S.A. , Bonocora , R.P. , Baranowski , C. , Petrone , B.L. , Singh , N. , Palumbo , M.J. , Reilly , A.E. , Zhang , Z. , Erill , I. and Wade , J.T. : Comprehensive genomic analysis of the Escherichia coli and Salmonella enterica AraC regulons ; in preparation ) . 
+ Thus , the Spin-X column-free ChIP method detects association with non-canonical target regions , although association with all target regions is reduced relative to the standard ChIP method . 
+ In a control experiment using an untagged strain , we observed no significant ChIP/qPCR signal ( using the standard ChIP method ) for any region tested ( Additional file 1 : Supplementary Data ) . 
+ Conclusions
+ We conclude that Waldminghaus and Skarstad failed to identify DSTs not because of an improvement in the ChIP methodology , but because of lower data quality . 
+ Consistent with this , the majority of DSTs showed relatively low association of σ32 in our study : when ranked by the level of σ32 association , 36 of the bottom 43 targets are DSTs ( Figure 1A ) [ 11 ] . 
+ Furthermore , DSTs have significantly higher signal in the Waldminghaus and Skarstad datasets than expected by chance ( p < 1e-30 ; Figure 1B ) , consistent with the idea that these regions represent true binding sites for σ32 but fall below the detection threshold of this analysis . 
+ We note that Waldminghaus and Skarstad did not present any σ32 ChIP data generated using the standard methodology , precluding direct comparison of our work , nor did they use ChIP / qPCR with their modified method to measure association of σ32 with specific target regions [ 19 ] . 
+ Furthermore , Waldminghaus and Skarstad demonstrated a dramatic improvement in ChIP-chip data for SeqA using the modified ChIP method [ 19 ] , but their data are very similar to those generated using the standard ChIP method by another group [ 25 ] . 
+ Our comparison of ChIP-chip datasets highlights the importance of data quality for correct identification of protein-DNA interactions . 
+ Guidelines for ChIP-chip and ChIP-seq experimental and analytical approaches have been described previously [ 26,27 ] . 
+ Key components of these methods that are especially relevant to our own study are the comparison of replicates , the choice of control , and the importance of repetitive sequence . 
+ Current guidelines for ChIP-seq recommend the use of only two independent biological replicates [ 27 ] , but also stress the importance of reproducibility . 
+ As shown in Figure 1B , the poor reproducibility of the Waldminghaus and Skarstad datasets is likely to be a major cause of their failure to identify DSTs as regions truly bound by σ32 . 
+ Recommended controls are either input DNA or ChIP-enriched DNA from an untagged strain ( when using an epitope-tagged protein ) . 
+ Waldminghaus and Skarstad instead used DNA left in the supernatant after the initial IP , acknowledging that this DNA would be de-enriched for target regions . 
+ While this may increase the apparent signal , we caution against this approach as the ChIP-chip or ChIP-seq signals may not accurately reflect the actual level of binding . 
+ Finally , Waldminghaus and Skarstad highlighted the importance of treating repetitive DNA sequences with caution when interpreting ChIP-chip ( or ChIP-seq ) datasets . 
+ In the case of σ32 , two of the ChIP peaks identified in our earlier study overlap repetitive regions . 
+ It is impossible to determine from ChIP-chip data alone whether σ32 associates with one or all of the repetitive regions . 
+ Since this caveat applies to repetitive sequences in any ChIP-chip or ChIP-seq experiment , we echo the sentiment expressed by Waldminghaus and Skarstad and caution against analysis of sequences in these regions . 
+ Many ChIP-chip studies have revealed the existence of unexpected protein-DNA interactions . 
+ For example , ChIP-chip studies in bacteria have demonstrated that transcription factors often bind to sites within genes , sites without a recognizable motif , and sites that are not associated with described regulation by the transcription factor [ 15 ] . 
+ This is one of the great strengths of ChIP-chip and ChIP-seq , since these non-canonical binding sites often can not be identified using other genomic approaches such as transcription profiling . 
+ In the case of σ32 , our data provide strong evidence that RNAP : σ32 initiates transcription of many RNAs from within genes , and our original study described three such examples in greater detail [ 11 ] . 
+ The function of intragenic transcripts in bacteria is poorly understood , although several antisense transcripts have been shown previously to regulate expression of the overlapping mRNA [ 28 ] . 
+ Our own studies have revealed pervasive antisense transcription in E. coli [ 16 ] , and this has since been observed in several other bacterial species [ 28 ] . 
+ Intriguingly , many ChIP-chip studies of bacterial DNA-binding TFs have revealed sites of association inside genes [ 10,15 ] , suggesting regulation of intragenic transcripts . 
+ Similar phenomena have been observed in eukaryotes , including human cells [ 29,30 ] . 
+ Other types of non-canonical transcription factor binding sites , i.e. sites without a recognizable motif and sites that are not associated with described regulation by the transcription factor , are also poorly understood . 
+ However , sites without a recognizable motif could be explained by indirect association with DNA ( detectable using ChIP ) [ 15,31 ] or cooperative interactions with other DNA-binding proteins [ 32 ] . 
+ Sites that are not associated with described regulation by the transcription factor could be explained by combinatorial regulation by multiple , redundant transcription factors . 
+ In the case of σ32 , our data suggest that many σ32 promoters are not associated with detectable regulation using transcriptomic approaches due to a high basal level of transcription , or a specific requirement for heat shock conditions . 
+ It is important to note that Waldminghaus and Skarstad identified many non-canonical σ32-target regions in their study . 
+ Specifically , Waldminghaus and Skarstad detected σ32 association upstream of four genes whose expression was not detectably upregulated by overexpression of σ32 in either of two transcriptomic studies ( yafU , rpsL , yjhI , and fimB ) [ 17,18 ] , and six sites of σ32 association within genes or between convergently transcribed genes ( yfbM/yfbN , yfjU , ypjA , sbcD , cycA , and macB ) [ 19 ] . 
+ Waldminghaus and Skarstad suggest that `` surprising '' , non-canonical protein-DNA interactions are often artifacts . 
+ We caution against this dogmatic approach . 
+ Artifacts can arise from ChIP-chip and ChIP-seq experiments ; however , with the appropriate experimental and analytical methods , and with the appropriate controls , it is possible to identify protein-DNA interactions with high confidence . 
+ Atypical binding sites identified using these methods may indicate novel functions for well-studied proteins . 
+ These binding sites should not be dismissed , but rather should be the focus of additional studies . 
+ Methods
+ Strains and plasmids
+ E. coli MG1655 rpoH-NFLAG containing the rpoH gene at its native chromosomal location fused to three FLAG tags was constructed using FRUIT [ 33 ] . 
+ Primer sequences are available on request . 
+ Construction of MG1655 with C-terminally FLAG-tagged AraC ( AMD187 ) will be described elsewhere ( Stringer , A.M. , Currenti , S.A. , Bonocora , R.P. , Baranowski , C. , Petrone , B.L. , Singh , N. , Palumbo , M.J. , Reilly , A.E. , Zhang , Z. , Erill , I. and Wade , J.T. : Comprehensive genomic analysis of the Escherichia coli and Salmonella enterica AraC regulons ; in preparation ) . 
+ pRB1 for expression of the rpoH gene ( σ32 ) was constructed by PCR amplification from chromosomal DNA with primers JW2199 and JW2200 ( Table 1 ) . 
+ The PCR product was digested with NheI and SphI and ligated into similarly digested pBAD18-Cm [ 34 ] . 
+ Table 1 primers : JW2199 , CTAGGCTAGCGAGAGGATTTGAATGACTGAC ; JW2200 , CTAGGCATGCTTACGCTTCAATGGCAGCAC 
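The cloning step above relies on the two primers carrying the NheI and SphI recognition sites. A minimal sketch of how this could be checked, assuming only the primer sequences listed for JW2199 and JW2200 and the standard recognition sequences of the two enzymes (GCTAGC for NheI, GCATGC for SphI); the helper name `find_sites` is hypothetical and is not part of the original study:

```python
# Standard recognition sequences for the two enzymes named in the text.
SITES = {"NheI": "GCTAGC", "SphI": "GCATGC"}

# Primer sequences as listed for Table 1.
PRIMERS = {
    "JW2199": "CTAGGCTAGCGAGAGGATTTGAATGACTGAC",
    "JW2200": "CTAGGCATGCTTACGCTTCAATGGCAGCAC",
}


def find_sites(sequence):
    """Return {enzyme: 0-based position} for each site present in the sequence."""
    return {
        enzyme: sequence.find(site)
        for enzyme, site in SITES.items()
        if site in sequence
    }


for name, seq in PRIMERS.items():
    print(name, find_sites(seq))
```

Each primer carries exactly one of the two sites near its 5' end, which is what directional cloning into the NheI/SphI-digested vector requires.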
+ Cell growth
+ For heat shock ChIP experiments , 100 ml LB was inoculated with 1 ml of fresh overnight culture of MG1655 rpoH-NFLAG and cells were grown at 30 °C at 225 rpm to an OD600 of 0.5-0.6 . 
+ Cultures were split ( 40 ml each ) for further incubation at either 30 °C or 50 °C for 10 minutes . 
+ For ChIP experiments involving overexpression of σ32 , 40 ml LB supplemented with 30 μg / ml chloramphenicol was inoculated with 0.4 ml of a fresh overnight culture of MG1655 containing either pRB1 or pBAD18-Cm . 
+ Cells were grown at 37 °C at 225 rpm to an OD600 of 0.7-0.8 . 
+ Expression of rpoH from pRB1 was induced by the addition of 0.2 % arabinose and further incubation at 37 °C for 10 minutes . 
+ For ChIP of AraC , AMD187 was grown in LB at 37 °C at 225 rpm to an OD600 of 0.6-0.8 . 
+ Standard ChIP method
+ Cells were crosslinked by the addition of formaldehyde to a final concentration of 1 % for 20 minutes . 
+ Formaldehyde was quenched with glycine ( 0.5 M final concentration ) and cultures were pelleted by centrifugation . 
+ Pellets were washed twice with Tris-buffered saline ( TBS ; pH 7.5 ) and resuspended in 1 ml FA lysis buffer ( 50 mM Hepes-KOH , pH 7 , 150 mM NaCl , 1 mM EDTA , 1 % Triton X-100 , 0.1 % sodium deoxycholate , 0.1 % SDS ) supplemented with 4 mg/ml lysozyme . 
+ After a 30 minute incubation at 37 °C , cells were chilled on ice and sonicated in 30 second on/off pulses for 30 minutes at 100 % output using a BioRuptor Sonicator . 
+ Lysates were centrifuged for five minutes to pellet cell debris . 
+ The supernatant was transferred to a new tube , brought up to a final volume of approximately 2 ml , and frozen in 0.5 ml aliquots . 
+ 0.5 ml crosslinked , sonicated cell lysate was brought up to a final volume of 0.8 ml with FA lysis buffer . 
+ A 20 μl aliquot was removed for `` input '' DNA control sample . 
+ 25 μl of protein A-Sepharose beads ( 50 % slurry in TBS ) and either 1 μl anti-RNA polymerase beta subunit ( Neoclone ) or 2 μl anti-FLAG ( M2 monoclonal ; Sigma ) was added to the lysate and incubated for 90 minutes at room temperature with gentle rotation . 
+ Beads were pelleted at 4000 rpm in a microcentrifuge for one minute and the supernatant was removed . 
+ Beads were resuspended in 700 μl FA lysis buffer , transferred to a Spin-X column ( Corning ) , washed for three minutes by rotation and centrifuged for 1 minute at 4,000 rpm in a microcentrifuge , and the flow-through was discarded . 
+ The beads were washed in a similar fashion with 750 μl of each of the following : FA lysis buffer , FA lysis buffer 500 mM NaCl , ChIP wash buffer ( 10 mM Tris -- HCl , pH 8.0 , 250 mM LiCl , 1 mM EDTA , 0.5 % Nonidet-P40 , 0.5 % sodium deoxycholate ) and TE ( 10 mM Tris -- HCl , pH 8.0 , 1 mM EDTA ) . 
+ The Spin-X column was transferred to a fresh tube and the chromatin was eluted from the beads by addition of 100 μl ChIP elution buffer ( 50 mM Tris -- HCl , pH 7.5 , 10 mM EDTA , 1 % SDS ) and incubation at 65 °C for 10 minutes . 
+ The eluate was collected by centrifugation for 1 min at 4,000 rpm in a microcentrifuge . 
+ Crosslinks were reversed for both the eluate and the input samples by incubation for 10 minutes at 100 °C . 
+ DNA was purified using QIAgen PCR purification kit followed by elution in either 50 μl or 200 μl for the IP samples or 200 μl for the input samples . 
+ For AraC ChIP , Spin-X columns were omitted from this procedure when indicated in the figure . 
+ Note that data shown for AraC ChIP/qPCR with Spin-X columns will be presented elsewhere ( Stringer , A.M. , Currenti , S.A. , Bonocora , R.P. , Baranowski , C. , Petrone , B.L. , Singh , N. , Palumbo , M.J. , Reilly , A.E. , Zhang , Z. , Erill , I. and Wade , J.T. : Comprehensive genomic analysis of the Escherichia coli and Salmonella enterica AraC regulons ; in preparation ) . 
+ Modified ChIP method described by Waldminghaus and Skarstad
+ ChIP was performed as above but with the following modifications : ( i ) 100 μl of post-immunoprecipitation supernatant was substituted for the `` input '' control DNA sample , ( ii ) no Spin-X columns were used , ( iii ) 1 μl RNase A ( 30 mg/ml ) was added after elution and incubated for 2 hours at 42 °C for both the input and immunoprecipitated DNA samples , ( iv ) 80 μl TE and 20 μl proteinase K ( 20 mg/ml ) were added and incubated for 2 hours at 42 °C , ( v ) crosslinks were reversed by incubation overnight at 65 °C , and ( vi ) DNA was purified by phenol/chloroform/isoamyl alcohol and chloroform/isoamyl alcohol extraction followed by ethanol precipitation . 
+ Note that aliquots from the same sonicated , crosslinked cell extract were used for both the standard and modified ChIP methods . 
+ ChIP and input samples were analyzed by quantitative real time PCR using an ABI 7500 Fast real time PCR machine , as described previously [ 2 ] . 
+ Enrichment of ChIP samples was calculated relative to a control region within the transcriptionally silent bglB gene , and normalized to input DNA . 
+ Occupancy units represent background-subtracted fold-enrichment . 
+ Oligonucleotides used for real time PCR were JW125/JW126 ( bglB ) , JW1610/JW1611 ( dnaK ) , JW1612/JW1613 ( ygcI ) , JW1614/JW1615 ( ybjX ) , JW1616/JW1617 ( tdk-ychG ) , JW1622/JW1623 ( b2084 ) , JW071/JW072 ( araB ) , JW073/JW074 ( araE ) , JW075 / JW076 ( araF ) , JW389/JW390 ( ytfQ ) , JW1312/JW1313 ( dcp ) , and JW393/JW394 ( ydeN ; Table 1 ) . 
+ Note that primers for ytfQ produced primer dimers in qPCR for ChIP with an untagged strain ( Additional file 1 : Supplementary Data ) , so we were not able to assess enrichment of this region . 
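The occupancy calculation described above (enrichment relative to the bglB control region, normalized to input, then background-subtracted) can be sketched as follows. This is an illustration under stated assumptions, not the authors' exact procedure, which is described in their reference [ 2 ]: it assumes that relative DNA quantity scales as 2^(-Ct) (100 % amplification efficiency) and that background subtraction means subtracting 1 from the fold-enrichment. The function names and Ct values are hypothetical.

```python
def relative_quantity(ct, efficiency=2.0):
    """Relative template amount for a qPCR threshold cycle (assumed efficiency)."""
    return efficiency ** (-ct)


def occupancy_units(ct_chip_target, ct_input_target,
                    ct_chip_control, ct_input_control):
    """Background-subtracted fold-enrichment of a target over a control region.

    Each ChIP signal is first normalized to its input signal; the target
    ratio is then divided by the control-region ratio, and 1 is subtracted
    so that a region with no enrichment scores 0.
    """
    target = relative_quantity(ct_chip_target) / relative_quantity(ct_input_target)
    control = relative_quantity(ct_chip_control) / relative_quantity(ct_input_control)
    return target / control - 1.0


# Hypothetical Ct values: the target amplifies 3 cycles earlier than the
# control region in the ChIP sample, with equal input signals.
print(occupancy_units(22, 20, 25, 20))
```

Under these assumptions a region that behaves like the control region gives 0 occupancy units, so any positive value reflects enrichment above background.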
+ Estimating null distributions for ChIP-chip datasets to calculate z-scores
+ Previous studies have analyzed ChIP-chip datasets based on the assumption that the distribution of actual ChIP-chip signals below the modal value closely matches the null distribution , and fits a normal distribution [ 20,21 ] . 
+ We determined the modal value for each ChIP-chip dataset and used all probes scoring below the mode to estimate the standard deviation of a null distribution , treating the mode as the mean . 
+ We used these mean and standard deviation estimates to calculate z-scores ( i.e. number of standard deviations from the mean ) for each probe . 
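The z-score procedure above can be sketched in a few lines. The text does not say how the modal value was determined, so this sketch assumes a simple histogram-based mode with a hypothetical `bin_width` parameter; the function names and the toy signals are likewise illustrative only.

```python
import math
from collections import Counter


def estimate_null(signals, bin_width=1.0):
    """Estimate (mode, sd) of the null distribution from probe signals.

    The mode is taken as the centre of the most populated histogram bin
    (an assumption). The standard deviation is the root-mean-square
    deviation from the mode of all probes scoring below the mode, i.e.
    the mode is treated as the mean of the null distribution.
    """
    bins = Counter(math.floor(s / bin_width) for s in signals)
    # Most populated bin; ties are resolved in favour of the lowest bin.
    modal_bin = max(bins, key=lambda b: (bins[b], -b))
    mode = (modal_bin + 0.5) * bin_width
    below = [s for s in signals if s < mode]
    if not below:
        raise ValueError("no probes score below the mode")
    sd = math.sqrt(sum((s - mode) ** 2 for s in below) / len(below))
    return mode, sd


def z_scores(signals, bin_width=1.0):
    """Number of null standard deviations each probe lies from the mode."""
    mode, sd = estimate_null(signals, bin_width)
    return [(s - mode) / sd for s in signals]


# Toy data: most probes cluster near 0.5, one probe is strongly enriched.
toy = [-3, -1, 0.2, 0.4, 0.6, 0.8, 1.5, 9]
print(z_scores(toy))
```

Because only the lower half of the distribution is used to estimate the spread, genuinely bound probes in the upper tail do not inflate the estimated standard deviation.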
+ Assessment of the number of DSTs in intergenic regions
+ 88 % of the E. coli genome is genic . 
+ Of the 46 DSTs , 15 have peak probe coordinates that fall in intergenic regions . 
+ Note that some additional DSTs were classified as being `` intergenic '' due to the stringent criterion used in our earlier work [ 11 ] to account for incomplete probe coverage on the microarray . 
+ We used a Binomial Test to determine the probability that 15 of 46 DSTs would be located in intergenic regions if their genomic position was unbiased with respect to genes . 
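The binomial calculation above can be reproduced with the standard library. This sketch assumes a one-sided test (probability of observing 15 or more intergenic DSTs out of 46) and an intergenic fraction of 0.12, taken from the statement that 88 % of the genome is genic; the sidedness of the test is not stated in the text, and the function name is hypothetical.

```python
from math import comb


def binomial_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# 15 of 46 DSTs are intergenic; 12 % of the genome is intergenic.
p_value = binomial_upper_tail(15, 46, 0.12)
print(f"P(X >= 15) = {p_value:.2e}")
```

With an expected count of about 5.5 intergenic DSTs under the null hypothesis, observing 15 corresponds to a small tail probability, which is the basis for concluding that DST positions are biased towards intergenic regions.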
+ Comparison of DST z-scores to those of all z-scores for Waldminghaus and Skarstad datasets
+ For each replicate dataset , we determined the z-score for each DST peak probe . 
+ We then determined z-scores for 1,000 randomly-selected probes from the complete dataset . 
+ We used a Mann -- Whitney U Test to determine the probability that the z-scores for DST peak probes are not larger than those of randomly-selected probes . 
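The comparison above can be illustrated with a small, dependency-free Mann-Whitney U implementation. This is a sketch rather than the authors' code: it uses the normal approximation without tie or continuity corrections, and the example z-scores are invented for illustration. In practice a library routine such as SciPy's `mannwhitneyu` would normally be preferred.

```python
from statistics import NormalDist


def mann_whitney_greater(x, y):
    """One-sided Mann-Whitney U test that values in x tend to exceed those in y.

    Returns (U statistic for x, approximate p-value). Tied values receive
    average ranks; the p-value uses the normal approximation without tie
    or continuity corrections.
    """
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    rank_sum_x = 0.0
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2  # mean of the 1-based ranks i+1 .. j
        rank_sum_x += avg_rank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j
    n1, n2 = len(x), len(y)
    u = rank_sum_x - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    sd_u = (n1 * n2 * (n1 + n2 + 1) / 12) ** 0.5
    p = 1 - NormalDist().cdf((u - mean_u) / sd_u)
    return u, p


# Invented z-scores: DST peak probes versus randomly selected probes.
dst_z = [1.8, 2.4, 0.9, 3.1, 1.2, 2.0]
random_z = [-0.4, 0.3, 0.1, -1.1, 0.8, 0.5, -0.2, 1.0]
print(mann_whitney_greater(dst_z, random_z))
```

The test is rank-based, so it makes no assumption about the shape of the z-score distribution, which suits datasets whose upper tail contains real binding signal.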
+ Additional file 1: Control ChIP/qPCR data using an untagged strain.
+ Abbreviations
+ Competing interests
+ The authors declare that they have no competing interests.
+ Authors’ contributions
+ RPB performed the experiments described in Figure 2 and the Additional file 1 : Supplementary Data . 
+ DMF performed the experiment described in Figure 3 . 
+ AMS performed the experiment described in Figure 4 . 
+ JTW performed all other analyses . 
+ JTW wrote the paper with input from RPB and DMF . 
+ JTW conceived the study . 
+ All authors read and approved the final manuscript . 
+ Acknowledgements
+ We thank Todd Gray and David Grainger for comments on the manuscript . 
+ We thank Todd Gray , David Grainger , Stephen Busby , Kevin Struhl and Evgeny Nudler for helpful discussions . 
+ This work was supported by National Institutes of Health ( NIH ) Grant 1DP2OD007188 . 
+ DMF was supported by NIH training grant T32AI055429 . 
+ Received: 24 April 2012 Accepted: 1 April 2013 Published: 15 April 2013
\ No newline at end of file
--- a/data/TEXT_FILES/useful_txt/23632166.txt 0 → 100644
View file @27818a9
+++ b/data/TEXT_FILES/useful_txt/23632166.txt 0 → 100644
View file @27818a9
+ Genome conformation capture reveals that the
+ ABSTRACT 
+ To fit within the confines of the cell , bacterial chromosomes are highly condensed into a structure called the nucleoid . 
+ Despite the high degree of compaction in the nucleoid , the genome remains accessible to essential biological processes , such as replication and transcription . 
+ Here , we present the first high-resolution chromosome conformation capture-based molecular analysis of the spatial organization of the Escherichia coli nucleoid during rapid growth in rich medium and following an induced amino acid starvation that promotes the stringent response . 
+ Our analyses identify the presence of origin and terminus domains in exponentially growing cells . 
+ Moreover , we observe an increased number of interactions within the origin domain and significant clustering of SeqA-binding sequences , suggesting a role for SeqA in clustering of newly replicated chromosomes . 
+ By contrast , ` histone-like ' protein ( i.e. Fis , IHF and H-NS ) - binding sites did not cluster , and their role in global nucleoid organization does not manifest through the mediation of chromosomal contacts . 
+ Finally , genes that were downregulated after induction of the stringent response were spatially clustered , indicating that transcription in E. coli occurs at transcription foci . 
+ 1Gene Regulation and Chromosome Biology Laboratory , Frederick National Laboratory for Cancer Research , National Cancer Institute , National Institutes of Health , Frederick , MD 21702 , USA , 2Institute of Natural and Mathematical Sciences , Massey University , Auckland 0745 , New Zealand and 3Liggins Institute , University of Auckland , Auckland 1023 , New Zealand 
+ INTRODUCTION
+ Our understanding of the spatial organization of bacterial genomes and its relationship to cellular function is limited [ for reviews see ( 1 -- 3 ) ] . 
+ Yet it is clear that despite not being enclosed in a nuclear membrane , bacterial nucleoids are spatially organized within a deﬁned sub-fraction of the cell volume ( 4 -- 11 ) . 
+ Various molecular [ reviewed in ( 2 ) ] and recombination-based methodologies have been used to identify the existence of micro- and macrodomains within the Escherichia coli nucleoid [ e.g. ( 2,5,8,12,13 ) ] . 
+ The four structured macrodomains ( 0.5 -- 1 Mb ) that have been identiﬁed exhibit preferential intra-domain recombination between att sites , whereas inter-domain recombination is reduced ( 5,7,8,12,13 ) . 
+ By contrast , microdomains are much smaller ( average 10 kb ) and have been linked to the topological isolation of supercoils ( 2,10 ) . 
+ Collectively , micro- and macrodomains are hypothesized to be critical for maintaining global organization while enabling the local levels of compaction required to fit a circular chromosome with an extended diameter of 490 nm within a cell with a length as small as 1000 nm ( 2 ) . 
+ Unlike eukaryote chromatin , the bacterial nucleoid does not contain histones . 
+ However , nucleoid-associated proteins ( NAPs ) , particularly histone-like proteins , such as histone-like nucleoid structuring ( H-NS ) protein , heat unstable nucleoid protein ( HU ) , factor for inversion stimulation ( Fis ) and integration host factor ( IHF ) , are believed to act like histones and play a signiﬁcant role in the organization of the nucleoid ( 14 -- 17 ) . 
+ These NAPs exhibit DNA bending , looping and bridging properties in vitro . 
+ However , studies also indicate that in vivo , the role of the NAPs could be more regulatory than architectural [ e.g. ( 18,19 ) ] . 
+ Non-classical NAPs ( i.e. SeqA , SlmA and MatP ) have been recently characterized as exhibiting macrodomain-speciﬁc DNA-binding properties [ reviewed in ( 16 ) ] and may represent alternative candidates for organizational roles within the nucleoid . 
+ The structure of the bacterial nucleoid is dynamic and affected by growth conditions and stress ( 15,20 -- 23 ) . 
+ For example , the relatively compact nucleoid present in fast growing cells is altered by treatment with serine hydroxamate ( SHX ) , which induces the stringent response ( 24 ) and inhibits replication initiation through artiﬁcial amino acid starvation . 
+ In terms of the biology of the E. coli nucleoid , the overall effect of the SHX-induced amino acid starvation is an expansion of the nucleoid and a change in transcription patterns ( 25,26 ) . 
+ This suggests a relationship between transcription and the organization of the nucleoid ( 27 ) . 
+ However , the mechanism ( s ) behind the re-structuring of the nucleoid in response to growth and stress is still largely unknown . 
+ Another long standing question is when and how the nascent nucleoid that arises from DNA replication segregates during bacterial cell growth [ reviewed in ( 1 ) ] . 
+ In E. coli , the time required for the replication of the nucleoid is ﬁxed at 40 min ( 28 ) . 
+ To maintain a fast growth rate , cells growing in rich media must initiate multiple rounds of replication before each division . 
+ Consequently , a typical cell growing in rich media contains up to 16 origins of replication ( 29 ) . 
+ Whether the nascent nucleoids segregate rapidly ( 30 -- 32 ) or remain associated after replication , by a cohesion-dependent mechanism ( i.e. the cohesion model ) as seen in eukaryotes ( 33,34 ) , remains unresolved . 
+ Advances in chromosome conformation capture ( 3C ) -related methodologies ( 35 ) enable the direct high-resolution detection of chromosome organization [ e.g. ( 36 -- 40 ) ] . 
+ Recently , chromosome conformation capture carbon-copy ( 5C ) was used to generate a global DNA : DNA contact map for Caulobacter crescentus synchronized swarmer cells ( 9 ) . 
+ Here , we present a high-resolution analysis of the DNA : DNA interactions within E. coli nucleoids in rapidly growing and starved cell populations . 
+ Using genome conformation capture ( GCC ) , we observe a clear relationship between DNA : DNA interactions , copy number and DNA replication . 
+ This suggests that nucleoids remain associated after replication , consistent with the cohesion model . 
+ Furthermore , SeqA-binding sites exhibit replication-dependent clustering , whereas binding sites for the major histone-like proteins ( Fis , H-NS and IHF ) did not . 
+ Finally , we observe a correlation between gene regulation and spatial clustering . 
+ MATERIALS AND METHODS
+ Strains and growth conditions
+ For GCC analyses ( 36 ) , E. coli strains ( Supplementary Table S1 ) were recovered from -80 °C on Luria Bertani ( LB ) agar ( 2 % ) plates ( 24 h , 37 °C ) . 
+ LB medium ( 3 ml , Gibco ) starter cultures were inoculated and grown ( 37 °C , 220 rpm , 16 h ) . 
+ The optical density ( OD600 ) of cultures was measured and used to inoculate LB test cultures to an OD600 of 0.02 . 
+ The test cultures were grown ( 37 °C , 220 rpm ) until the OD600 reached 0.25 , and the cells were harvested . 
+ For the SHX-treated samples , the cultures were treated with SHX ( 500 mg/ml , 30 min ) before harvesting . 
+ Genome conformation capture
+ E. coli chromatin was prepared according to Rodley et al. ( 36 ) with minor modiﬁcations . 
+ In brief , 5 × 10^9 formaldehyde cross-linked ( 1 % ) cells were lysed ( Supplementary Materials and Methods ) in the presence of protease inhibitor ( Roche ) , and the chromatin was collected ( 21 500 g , 20 min , 4 °C ) . 
+ Chromatin was washed and suspended in chromatin digestion buffer ( 10 mM Tris -- HCl , pH 8.0 , 5 mM MgCl2 and 0.1 % TritonX-100 ) . 
+ Chromatin samples were digested with HhaI ( 100 U , New England Biolabs ) , diluted ( 20-fold ) and ligated with T4 DNA ligase ( 20 U , Invitrogen ) . 
+ A ligation control was added to the digested chromatin ( Supplementary Materials and Methods and Supplementary Table S2 ) before ligation . 
+ After ligation , cross-links , protein and RNA were removed . 
+ pUC19 plasmid was added as a sequencing control before three extractions with 1:1 phenol : chloroform . 
+ DNA was column purified ( Zymo , DNA clean and concentratorTM-5 kit ) according to the manufacturer 's instructions and eluted in milliQ H2O . 
+ Three micrograms of puriﬁed DNA was sent for paired-end sequencing ( 100 bp ) at the ATC sequencing facility ( Rockville , MD , USA ) on an Illumina Hi-Seq . 
+ Genome conformation capture network assembly, effects of sample production and processing and bioinformatics analysis
+ To identify interacting DNA fragments from the paired-end sequence reads , network assembly was performed using the Topography suite v1.19 ( 41 ) . 
+ GCC networks were constructed from 100-bp paired-end Illumina Genome Analyser sequence reads ( Supplementary Materials and Methods ) . 
+ Except where indicated , bioinformatics and statistical analyses were performed on interactions identiﬁed by sequence reads that were uniquely mapped onto the reference genome and were above the cut-off value derived from the ligation control interactions ( Supplementary Materials and Methods ) . 
+ A breakdown of the interactions present in the E. coli samples is provided in Supplementary Table S3 . 
+ The effect of bar-coding , sequencing lane and biological replicates on the correlation between samples was quantiﬁed using the Cohen 's Kappa statistic , showing that these factors did not strongly affect sample correlations ( Supplementary Materials and Methods ) . 
+ All bioinformatics analysis was performed using in house Perl and Python scripts ( Supplementary Materials and Methods ) . 
+ Except where indicated , statistical analyses were performed in R ( 42 ) . 
+ Genome copy number
+ Copy number was determined across the E. coli genome using control-free copy number and genotype caller ( Control-FREEC ) ( 43 ) . 
+ The E. coli input sequences were in the SAM format , genome length was set at 4 639 675 bp , window size = 1000 and telocentromeric = 0 . 
+ The GC proﬁle was calculated and included . 
+ Transcription microarray
+ Briefly , as for GCC , E. coli was grown in LB ( Gibco , lot 817849 ) to an OD600 of 0.2 and either harvested directly or first treated with SHX before RNA isolation . 
+ RNA was isolated using hot phenol and ﬁnally suspended in DEPC-treated water ( Invitrogen ) . 
+ The cDNA libraries were constructed using a SuperScript Double-Stranded cDNA Synthesis Kit ( Invitrogen ) and sent to Roche-Nimblegen for microarray hybridization . 
+ Each experiment ( exponential or SHX ) is a pool of three biological replicates . 
+ A total of two technical replicates were performed per condition ( exponential and SHX ) . 
+ Genes that were signiﬁcantly up- or downregulated in SHX-treated compared with exponential samples were identiﬁed by calculating the log2 of the SHX/exponential ratio ( Supplementary Materials and Methods and Supplementary Tables S4 and S5 ) . 
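The log2 of the SHX/exponential ratio can be illustrated with a small sketch. The function names and the threshold of 1.0 (a two-fold change) are assumptions made for illustration; the text above does not state the cut-off or the significance test that was actually applied.

```python
import math


def log2_ratio(shx_signal, exponential_signal):
    """log2 of the SHX/exponential ratio for one gene (both signals > 0)."""
    return math.log2(shx_signal / exponential_signal)


def classify(shx_signal, exponential_signal, threshold=1.0):
    """Label a gene by its log2 ratio.

    threshold is a hypothetical cut-off: 1.0 corresponds to a two-fold change.
    """
    ratio = log2_ratio(shx_signal, exponential_signal)
    if ratio >= threshold:
        return "up"
    if ratio <= -threshold:
        return "down"
    return "unchanged"
```

Working on the log2 scale makes up- and downregulation symmetric: a four-fold increase gives 2 and a four-fold decrease gives -2.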
+ MatS, SeqA, SlmA and NAP clustering analyses
+ NAP-binding sites were obtained from Grainger et al. ( 18 ) . 
+ MatP-binding sites ( MatS ) were obtained from Mercier et al. ( 5 ) . 
+ Regions for analysis were deﬁned by taking a speciﬁed number of bases ( 50 , 100 or 250 bp ) either side of the peak binding position for NAPs or center of the MatP-binding site for MatS . 
+ For SeqA , the strongest 135 conﬁrmed SeqA-binding sites were obtained from Sanchez-Romero et al. ( 44 ) , and the 24 deﬁned SlmA-binding sites were obtained from Cho et al. ( 45 ) . 
+ To determine whether these regions could be found in a different interacting environment compared with what would be expected by random chance , the total number of interactions with each of the individual regions and the number of interactions that occurred between the regions of interest ( i.e. clustering ) was determined from our GCC interaction network . 
+ We then generated 1000 random data sets of the same number and length ( bp ) as the actual region data set using two methods : ( i ) randomly selecting a start position for each region and then making it the same length as the region for which the random coordinate was being generated [ i.e. random spacing ( RS ) ] ; or ( ii ) randomly selecting the start position for the ﬁrst region and then sequentially determining the start and end position of all the other regions in the set such that the linear distances between regions were maintained [ i.e. conserved linear spacing ( CLS ) ] . 
+ This ensured that the particular interaction frequencies we observed were not because of the linear arrangement of the regions around the circular genome . 
+ One thousand random data sets were generated for the RS and CLS methods , and the total interaction and clustering frequencies were calculated from our GCC interaction network . 
+ The frequency with which the total interaction and clustering frequency of the actual data was higher or lower than the random data sets was used to estimate signiﬁcance . 
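The two randomization schemes and the frequency-based significance estimate described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' in-house scripts: the function names are hypothetical, regions are (start, end) pairs, coordinates wrap around the circular genome, CLS is implemented as a single random rotation of the whole set, and the estimate shown is the one-sided fraction of random sets that score at least as high as the real data.

```python
import random

GENOME_LENGTH = 4_639_675  # bp, the genome length given in the Methods


def random_spacing(regions, genome_length, rng):
    """RS: give every region an independent random start, keeping its length."""
    shuffled = []
    for start, end in regions:
        length = end - start
        new_start = rng.randrange(genome_length)
        # Wrap the end coordinate because the genome is circular.
        shuffled.append((new_start, (new_start + length) % genome_length))
    return shuffled


def conserved_linear_spacing(regions, genome_length, rng):
    """CLS: move the first region to a random start and shift all others by
    the same offset, so linear distances between regions are maintained."""
    offset = rng.randrange(genome_length)
    return [((start + offset) % genome_length, (end + offset) % genome_length)
            for start, end in regions]


def empirical_p(observed, random_values):
    """Fraction of random data sets scoring at least as high as the real data."""
    at_least = sum(value >= observed for value in random_values)
    return at_least / len(random_values)
```

In use, one would generate 1000 sets with each function, compute the total interaction and clustering frequency of each set from the interaction network, and pass those values to `empirical_p` together with the value for the real regions. Comparing against CLS sets controls for the linear arrangement of the regions, which RS sets do not preserve.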
+ Interactions and clustering of genes that signiﬁcantly change their expression level on SHX treatment
+ Genomic coordinates of genes that signiﬁcantly change their expression level on treatment with SHX were obtained from http://regulondb.ccg.unam.mx/data/GeneProductSet.txt . 
+ The total number of interactions with each of the individual genes and the number of interactions that occurred between the genes of interest was determined as for MatS , SeqA , SlmA and NAP clustering , as described earlier in the text . 
+ RESULTS
+ In GCC , the spatial organization of the nucleoid is captured by formaldehyde cross-linking within intact cells before cell lysis and the isolation of the nucleoid ( Figure 1A ) . 
+ Once isolated , the nucleoid is digested , diluted and incubated with DNA ligase to enable the capture of spatially proximate but linearly separated loci ( Figure 1A ) ( 36 ) . 
+ This produces an interaction library that can be sequenced to identify the network of chromosomal interactions occurring at the moment of cross-linking . 
+ GCC differs from current competing unbiased 3C technologies in that all DNA material is sequenced without the previous selection of DNA fragments containing ligation products . 
+ Therefore , there are no enrichment-introduced biases , and DNA copy variation can be determined . 
+ GCC relies on the intra-molecular ligation of cross-linked loci . 
+ However , inter-molecular ligation events resulting from random associations during the procedure can also occur , leading to false positives . 
+ To reduce the chances of isolating false positives , we ( i ) induced expansion of the nucleoid by isolation in a high-salt environment [ a ` high-salt nucleoid ' ( 2 ) ] , following cross-linking of the interacting loci ; and ( ii ) added external ligation controls during GCC library preparations to empirically measure the background level of random inter-molecular ligation events . 
+ Thus , we determined a cut-off , for the minimum number of sequences representing any one interaction , above which interactions were deemed signiﬁcant ( Supplementary Materials and Methods ) . 
+ The following analyses were only performed on interactions that were above this signiﬁcance threshold . 
+ Origin and terminus domains exist within the E. coli nucleoid
+ Chromosome interaction networks were determined for rapidly growing cells in rich medium harvested at early exponential phase and exponential cells treated with SHX ( Figure 1B and C ) . 
+ The exponential phase chromosome interaction network ( Figure 1B ) is dominant in two regions : ( i ) a high frequency interaction domain surrounding the origin ( Ori ) ; and ( ii ) a low frequency interaction domain surrounding the terminus ( Ter ) . 
+ These Ori and Ter domains are also present in the interaction network for the SHX-treated samples , although they are less pronounced ( Figure 1C ) . 
+ Higher resolution ( i.e. 20 kb ) emphasizes that the exponential phase interaction network contains regions that have a demonstrably lower average interaction frequency than the adjacent Ori and Ter domains ( Figure 1D ) . 
+ We attribute these reductions to the presence of non-ﬁxed domain boundaries within the population . 
+ We predicted that these boundaries would reduce interactions between domains , and that this would be manifested as a reduction in the interactions that cross the boundary regions . 
+ However , despite the obvious Ori preference , there is no sharp reduction in the numbers of interactions that cross our apparent domain boundaries ( Figure 1E ) . 
+ Despite the diffuse boundaries for the Ori and Ter domains , we observe several noticeable reductions in the interaction frequency at various locations in the chromosome that could represent additional domain boundaries 
+ Interactions within the Ori and Ter regions are linked to replication
+ Comparisons of the chromosome networks from the exponential and SHX-treated cells identiﬁed similar levels of self and adjacent interactions ( Supplementary Table S3 ) . 
+ However , SHX treatment results in fewer long distance interactions ( between 800 bp and half the length of the genome , respectively ; Supplementary Figure S1A ) , shorter loop lengths ( Supplementary Figure S1B ) and reduced numbers of partners per fragment ( Figure 2A and Supplementary Figure S2 ) when compared with the exponential network . 
+ These observations are consistent with SHX decreasing the overall compaction of the nucleoid ( 21 -- 23 ) . 
+ The high frequency of replication initiation in rapidly growing cells leads to an enrichment of origin-proximal loci , which could explain the pronounced increase in the number of partners observed in this region in exponentially growing cells ( Supplementary Figure S2A ) . 
+ By contrast , treatment with SHX reduces this bias ( Supplementary Figure S2B ) . 
+ These results are consistent with the inhibition of replication initiation after SHX treatment leading to a reduction in the Ori : Ter copy number ratio ( 46 ) or structural alterations within the origin domain . 
+ To investigate whether interaction frequencies are affected by differences in copy number across the bacterial chromosome because of DNA replication , we compared interaction patterns and copy number before and after SHX treatment . 
+ Interactions were grouped according to the linear distance between the interacting loci and occurrence in the different environmental conditions ( Figure 2B and Supplementary Table S3 ) . 
+ The distribution of interaction strength and copy number relative to the origin was determined ( Figure 2C -- K ) . 
+ Exponential phase-speciﬁc and shared short distance interactions correlate with copy number ( Figure 2C , D and F ) . 
+ By contrast , SHX-speciﬁc or shared long distance interactions do not correlate with copy number ( Figure 2E , G and H ) . 
+ Critically , the ratio of Ori to Ter regions within both the exponential and SHX conditions remains at 3:1 ( compare copy number Figure 2C and E ) . 
+ Thus , the observed decrease in the frequency of the interactions within the origin domain ( compare Figure 1B and C ) is either because of a decrease in the absolute number of origin sequences or because of a structural alteration ( e.g. expansion ) of the Ori domain . 
+ Correcting the frequency of long distance interactions by copy number , a feature of GCC , indicates that most genomic regions interact with similar frequencies within the exponential-speciﬁc and shared interaction sets ( i.e. interactions that occur in both the exponential and SHX conditions ; Figure 2I and J ) . 
+ However , there are several notable deviations from this trend ( labeled peaks within Figure 2I and J ) . 
+ The observed deviations are due to interactions involving multiple fragments within each of the 10 000-bp segments that are plotted ( Figure 2I and J ) . 
+ By contrast , copy number correction of the long distance SHX-speciﬁc interactions identiﬁes an increase in the interaction frequency within the Ter domain . 
+ The remainder of the genome shows relatively even and low interaction frequencies within the SHX-speciﬁc interaction set ( Figure 2K ) . 
+ Clustering of MatP - and SeqA-binding sites links nucleoid structure and replication 
+ To further investigate the link between replication and nucleoid organization , we determined the clustering and interaction properties of loci containing characterized protein-binding sites for the MatP , SlmA and SeqA proteins . 
+ MatP is a protein that binds to matS sites and organizes the Ter macrodomain ( 5 ) . 
+ Analyses of matS loci identify signiﬁcantly ( P < 0.008 ) high clustering ( i.e. inter-matS loci interactions ) within the exponentially growing cells ( Supplementary Table S6 ) . 
+ In contrast , clustering of matS sites was not detected in the SHX-treated cells . 
+ The clustering in the exponentially growing condition was attributed to a single speciﬁc interaction between matS10 and matS5 ( Figure 3A ) . 
+ This interaction must result from intra - or inter-Ter associations of these matS sites ( Figure 3A i -- iv ) . 
+ The ﬁnding that SeqA binds as a dimer , which multimerizes to form a left-handed ﬁlament [ reviewed in ( 47 ) ] , suggests that this protein may link spatially separated binding sites . 
+ Clustering of the 135 strongest conﬁrmed SeqA-binding sites present within exponentially growing E. coli ( 44 ) was signiﬁcantly higher than the random set ( P < 0.05 ) ( Supplementary Table S7 ) . 
+ Moreover , these sites are signiﬁcantly more prone to interact with other loci than random sites ( P < 0.05 ; Supplementary Table S7 ) . 
+ Visualizing the positions of the SeqA -- SeqA interactions that formed within the E. coli genome showed that they tend to occur toward , and involve , the Ori domain in exponential cells ( Figure 3B and C ) . 
+ SeqA interactions that are shared between exponential and SHX-treated nuclei predominantly link the left and right replichores ( Figure 3C ) . 
+ By contrast , cells treated with SHX have a reduction in clusters involving SeqA sites surrounding the Ori domain and more inter-replichore interactions toward the terminal domain ( Figure 3C and D ) . 
+ This is consistent with the progression of active replication forks that were initiated before SHX treatment . 
+ SlmA binds at 24 deﬁned sites within the genome ( 45 ) and acts to prevent FtsZ polymerization and premature cell division before complete chromosome replication . 
+ Analyses of the clustering and interaction proﬁles of E. coli SlmA sites demonstrated that clustering of these sites was not different from that observed for randomly selected sites ( Supplementary Table S8 ) . 
+ However , SlmA sites did exhibit a signiﬁcantly increased propensity to interact with other genomic loci ( P < 0.05 ) compared with randomly spaced elements for both exponential and SHX-treated cells ( Supplementary Table S8 ) . 
+ The signiﬁcant increase in interaction frequency was lost when comparisons were made with random sets that have conserved linear spacing ( Supplementary Table S8 ) . 
+ Note that the differences observed in signiﬁcance when the test data set was compared with randomly generated data sets ( i.e. RS or CLS ) conﬁrm that the linear spacing of E. coli loci is important . 
+ Whether this is an effect or cause of spatial organization remains to be determined . 
+ Intra- or inter-NAP–binding site clustering does not contribute to the global organization of the E. coli nucleoid
+ We investigated the clustering and interaction properties of H-NS - , IHF - and Fis-binding sites , which are not enriched in any particular macrodomain . 
+ There is no detectable clustering for the 200-bp regions surrounding the Fis - , H-NS - and IHF-binding sites in either the exponential or SHX-treated nucleoids ( Table 1 ) . 
+ Moreover , the classical NAP-binding sites have depleted levels of interactions in exponentially growing E. coli cells ( Table 1 ) . 
+ These results can be explained by restrictions in the ﬂexibility of the DNA ( and , hence , reduced ligation efﬁciencies ) because of the binding of the NAP . 
+ However , increasing the length of the region surrounding the binding site has no effect on the clustering ( data not shown ) . 
+ Additionally , we do not observe intra-NAP -- binding site clustering ( Table 1 ) , consistent with the temporal isolation of the expression of these NAPs ( 48 ) . 
+ Genes up - or downregulated after SHX treatment exist in different spatial environments , conﬁrming functional compartmentalization of the nucleoid 
+ Eukaryotic studies have identiﬁed a non-random distribution of gene expression associated with the presence of spatially distinct environments that promote or inhibit nuclear functions [ e.g. ( 49 -- 51 ) ] . 
+ Similarly , we observe that E. coli genes whose transcript levels increased or decreased in response to SHX treatment are overrepresented in some gene ontology terms ( Supplementary Table S5 ) and are non-randomly distributed across the linear genome ( Figure 4A and B ) in a manner that does not correlate with GC content ( Supplementary Figure S3A ) . 
+ There is no correlation between transcript level and interaction frequency at the level of speciﬁc restriction fragments ( Supplementary Figure S3B and C ) . 
+ However , the SHX downregulated genes have high average transcript ( P < 0.001 ; Supplementary Table S9 ) , clustering and interaction ( Figure 3C ) levels in exponential phase cells . 
+ These results suggest that genes that are highly expressed in exponential phase and downregulated after SHX treatment are not only linearly but also highly spatially clustered . 
+ In conjunction with microscopic observations of large RNA polymerase clusters ( foci ) within exponentially growing E. coli cells ( 21 ) , our results support the hypothesis that the highly expressed exponential phase genes are associated with transcription foci . 
+ Despite this , genes downregulated in response to SHX treatment ( P < 0.001 ; Supplementary Table S9 ) remained highly clustered ( Figure 4C ) . 
+ Similarly , upregulated genes within lowly clustered regions do not increase their clustering on activation ( Figure 4C ) . 
+ As such , the maintenance of the clustering is independent of transcript levels and ipso facto transcription . 
+ DISCUSSION
+ The E. coli nucleoid has a complex structure that emerges from the sum of the cellular processes that occur within the bacterial cell . 
+ We identiﬁed two macrodomains within the E. coli chromosome interaction networks corresponding to the Ori and Ter domains that have been previously identiﬁed ( 5,7,8,12,13,52 ) . 
+ However , the two remaining macrodomains [ Left ( L ) , Right ( R ) ] and the two nonstructured domains ( NS ) are not obvious within our data . 
+ Moreover , we did not identify hard boundaries surrounding either the Ori or Ter domain , consistent with earlier predictions ( 7,12 ) . 
+ It remains possible that the L , R and NS domains and the domain boundaries were obscured because of the use of an unsynchronized population of cells . 
+ Alternatively , the formation of the macrodomains and the previously observed reductions in inter-domain recombination rates ( 12 ) could be achieved by a combination of mechanisms of which physical segregation is only one component . 
+ This explanation is supported by the observation that a low level of connectivity remains between the Ter and Ori domains . 
+ Critically , this connectivity occurs at levels above those observed for random inter-molecular ligation under our experimental conditions and indicates that although these domains are largely separated , there is some inter-domain mixing during the cell cycle . 
+ This is consistent with the observation that recombination rates between att sites are reduced but not completely abolished between these domains ( 12 ) . 
+ The chromosome interaction networks we identiﬁed within both exponential and SHX-treated E. coli cells contain variable numbers of short and long distance loops . 
+ The observation that the number of long distance interactions ( long distance loops ) reduced after treatment with SHX can be interpreted as indicating that the nucleoid expands under this condition , consistent with microscopic observations ( 21,22,53 ) . 
+ Either the observed expansion is speciﬁc and directed as part of the stress response or it is a non-speciﬁc consequence of SHX acting on the factors that mediate the interactions ( e.g. rapid protein turn over with no replacement ) . 
+ The exact reasons for the loss of interactions remain to be determined . 
+ However , the fact that SHX-speciﬁc interactions form indicates a directed alteration in nucleoid organization . 
+ Is the E. coli nucleoid shaped as a sausage or rosette?
+ The presence of short and long distance loops within both networks points to the E. coli genome folding into a series of DNA loops connected to a central node ( i.e. a rosette ) . 
+ This interpretation agrees with electron microscope observations of isolated nucleoids [ reviewed in ( 2 ) ] . 
+ However , our observation that the Ter region has few contacts with itself ( i.e. is extended in nature ) and is less well connected to the remainder of the genome is consistent with previous observations made by David Sherratt 's group ( 4,54 ) . 
+ Therefore , despite differences in growth rate between the studies ( 4 ) , our data also support the hypothesis that the E. coli chromosome is organized as a sausage in which the bulk of the chromosome is organized into a compacted rod that is circularized by the Ter domain [ Figure 5A ( 4,54 ) ] . 
+ The apparent dichotomy of these interpretations is reconcilable through the realization that the isolation of a sausage-shaped genome during preparation for electron microscopy would result in the appearance of a rosette . 
+ Thus , the sausage model is a variation of the rosette model where the rosette is ﬂattened through conﬁnement or as a result of the biological processes within the live cell . 
+ Replication contributes to nucleoid organization through SeqA
+ The SeqA and SlmA proteins are implicated in the regulation of replication and chromosome separation [ reviewed in ( 16 ) ] . 
+ Our results indicate that SlmA-binding sites do not cluster as part of nucleoid occlusion during replication initiation or extension . 
+ Therefore , the dimerization necessary to activate SlmA occurs at a single or linearly-adjacent binding site ( s ) but does not result from spatial associations of distant SlmA sites . 
+ Consistent with the supposition by Dame et al. ( 16 ) , the low levels of SlmA clustering observed indicate that any contribution that SlmA-FtsZ makes to nucleoid structure must be facilitated by tethering to an external framework [ e.g. shortened preformed FtsZ polymers ( 45 ) , or non-functional protoﬁlaments ( 55 ) ] or the cell membrane . 
+ By contrast , the replication-dependent nature and distribution of the exponential phase SeqA-mediated long distance interactions provides support for a role for SeqA clustering in the formation of an intra - and/or inter-chromosomal structure ( Figure 5A and B ) . 
+ This is particularly true for SeqA interactions that form over the origin of replication and could function to sequester newly replicated origins and delay chromosome separation [ ( 56 -- 58 ) , reviewed in ( 16,47 ) ] . 
+ As such , the SHX-dependent loss of the long distance interactions is predicted if replication and segregation occur consecutively ( 29 ) . 
+ Thus , the loss of SeqA-mediated interactions within the SHX-treated nucleoid reﬂects an underlying spatial segregation of the replicated chromosome regions ( 46 ) . 
+ The predominance of SeqA clusters between loci that are approximately equidistant from the Ori within the SHX-speciﬁc and shared interaction data sets represents links between the hemimethylated GATC sites trailing the replisome . 
+ We interpret the distinct subset of interreplichore SeqA clusters as indicating that the DNA polymerases are pausing at speciﬁc genomic sites within the cell populations . 
+ Finally , there is no correlation between alterations to transcript levels and SeqA clustering ( data not shown ) ; therefore , SeqA clustering is independent of transcription . 
+ Collectively , these results support a strong linkage between replication and nucleoid organization ( 4 ) . 
+ For ease of visualization , the chromosomal interactions that we identiﬁed are presented as intra-chromosomal connections ( Figure 1 ) . 
+ This form of presentation is problematic , as the proximity-based ligation data are probabilistic and represent a population average from unsynchronized cells ( 59 ) . 
+ As such , it is impossible to determine which combinations of interactions occur within a single nucleoid . 
+ Second , although the sequences we obtain as part of the GCC protocol identify the interacting loci , they do not provide information on whether the interactions occur within or between the chromosome ( s ) . 
+ This is an important consideration when investigating nucleoid structure in exponential phase bacterial cells that contain and segregate partially replicated chromosomes ( 3 ) . 
+ Therefore , it is possible that the formation of long distance SeqA-dependent and - independent interactions can be facilitated by overlaps between the replichore arms that result from the chromosome alignment [ i.e. inter-chromosomal ( Figure 5A , right ) ] . 
+ Interestingly , such a system may contribute to gene dosage control , as well as the control of chromosome segregation . 
+ However , it remains possible that interactions also occur within a chromosome [ i.e. intra-chromosomal 
+ What role does the matS5–10 loop play in nucleoid organization?
+ MatS sites have a role in deﬁning the Ter domain ( 5,8 ) . 
+ In vivo experiments indicate that the deﬁnition of the Ter domain and condensation of this region are separable events with the condensation dependent on the presence of the MatP C-terminal coiled-coil domain , which is responsible for tetramerization and looping ( 60 ) . 
+ We found that the matS5 and matS10 sites form a speciﬁc loop that surrounds the TerA site ( 1 339 796 -- 1 339 791 bp ) and is located away from the dif site ( 1 589 000 bp ) toward the Ori on the right replichore . 
+ Note that matS5 is one of two matS sites ( the other being matS21 ) that do not show in vivo MatP binding in an E. coli K12 derivative of MG1655 ( 5 ) . 
+ The question thus arises as to what contribution the matS5 -- 10 interaction makes to the Ter domain structure and function . 
+ It is possible that the matS5 -- 10 loop explains observations of a spatially separable condensed region within the center of the Ter linker domain ( 4 ) . 
+ Furthermore , the absence of detectable matS clustering between the other matS loci raises the possibility of differentiation in the functions of the matS sites . 
+ However , further experiments are required to conﬁrm these hypotheses and identify how or if MatP contributes to the formation of the matS5 -- 10 loop . 
+ Do ‘histone-like’ NAPs play a role in global nucleoid structure?
+ The spatial clustering of NAP ( i.e. H-NS , Fis and IHF ) DNA-binding sites is not signiﬁcant within the gross spatial organization of the E. coli nucleoid we identiﬁed . 
+ Rather our results are consistent with the hypothesis that H-NS , IHF and Fis contribute to compaction through localized structuring [ reviewed in ( 61 ) ] , gene regulation or the formation of large protein heterocomplexes [ reviewed in ( 62 ) ] . 
+ These results are in contrast to those of Wang et al. 2011 ( 14 ) , who identiﬁed H-NS clustering within the E. coli nucleoid using microscopic and proximity-ligation -- based measurements in slow-growing early log phase cells . 
+ This apparent discrepancy may be due to the signiﬁcant increase in resolution afforded by the use of the HhaI enzyme in our study . 
+ This conclusion is supported by our identiﬁcation of interactions linking HhaI restriction fragments from within the larger EcoRI restriction fragments that were previously characterized as demonstrating an H-NS -- dependent association [ Supplementary Figure S4 ( 14 ) ] . 
+ Therefore , we propose that the previously recognized relationship between ligation efﬁciency and the presence/absence of h-ns mutants ( 14 ) was likely due to a combination of a global reorganization of localized genome structure ( 63 ) and epistatic effects resulting from H-NS -- dependent transcriptional changes . 
+ Do transcription foci have a role in nucleoid organization ? 
+ The observed organization of highly transcribed genes into clustered spatial environments is consistent with the hypothesis that some clustering is occurring around transcription foci [ e.g. ( 64 ) ] . 
+ Similarly , the copy-number independent long distance interactions may reﬂect sequence-driven intra-chromosomal nucleoid folding for the coordination of transcription through enhancer-like interactions consistent with previous observations in bacteria ( 14,65,66 ) and eukaryotes [ e.g. ( 67 -- 69 ) ] . 
+ The existence of these prokaryotic transcription foci is supported by microscopic observations of RNA polymerase foci within E. coli cells ( 20,21 ) . 
+ The fact that similar clustering was observed in Pseudomonas aeruginosa ( data not shown ) and among highly transcribed genes in Schizosaccharomyces pombe ( 40 ) implies that the clustering of highly transcribed genes may be a ubiquitous feature of the control of gene expression . 
+ It is likely that the linear gene clusters ( Figure 4A ) form into combinations of localized and distributed spatial clusters ( Figure 5C ) . 
+ Given that RNA polymerase is redistributed after SHX treatment ( 21,22 ) , the decreases in the number of long distance interactions ( i.e. reductions in the extent of distributed clustering ) that we observed following stress induction could be interpreted as indicating that RNA polymerase mediates some interactions . 
+ However , the identiﬁcation of a core interaction pattern that is conserved within the E. coli nucleoid after SHX treatment indicates that at least some of these interactions are stable to a signiﬁcant redistribution of RNA polymerase . 
+ This result agrees with eukaryotic studies that demonstrate long distance interactions are insensitive to inhibition of ongoing RNA polymerase transcription ( 70 ) . 
+ Furthermore , the high levels of clustering and interactions observed at genes that were highly expressed in the exponential phase and subsequently downregulated by SHX treatment indicate that the localized clustering -- but not necessarily the identity of the partners -- is stable . 
+ However , it remains possible that transcription-associated interactions respond slowly to environmental change , allowing for short term ﬂuctuations in environmental conditions without the requirement for major rearrangement of genome organization . 
+ This forms an epigenetic memory that is capable of being inherited ( 71 ) similar to that observed in yeast ( 72 -- 76 ) . 
+ Does a nucleolus-like structure form within the E. coli nucleoid?
+ It has been proposed that the formation of transcription factories that include the ribosomal RNA genes and ribosomal protein encoding loci could induce the compaction of the nucleoid through the formation of a nucleolus-like structure ( 23,77,78 ) . 
+ However , we found no evidence that the nucleoid structure promotes the clustering of ribosomal RNA genes and ribosomal protein encoding loci ( data not shown ) . 
+ This may be due to technical limitations in the analysis of repetitive loci that can not be unambiguously positioned onto the reference genome . 
+ Alternatively , it may be due to the very high levels of transcriptional activity at these loci interfering with the cross-linking and ligation steps during the preparation of our chromosome interaction libraries . 
+ In silico modeling of the nucleoid that incorporates biophysical parameters and interaction frequencies [ similar to ( 9,79 ) ] may resolve this issue . 
+ Epistatic interactions and the chromosome interaction network
+ The bacterial cell is a complex structured entity in which each part exists ` for and by means of the whole ' ( 80 ) . 
+ As such , nucleoid structure is an integral -- inseparable -- part of the cell 's response to environmental challenge . 
+ Moreover , the contribution of any one gene to the bacterial phenotype relies on its relationship with other genes on levels that include regulation , transcription , translation , complex formation and function . 
+ Therefore , it is likely that the interaction network we have determined contains information on epistatic relationships between multiple genes that occur at the regulatory , transcriptional and translational levels because of the co-dependence of these processes in E. coli . 
+ Future work should interrogate prokaryotic interaction networks for evidence of epistatic relationships and must address the mechanism ( s ) governing the organization of global structure . 
+ CONCLUSION
+ The detection of both long and short distance interactions within the E. coli nucleoid is consistent with empirical measures and modeling , which indicated that intra-nucleoid interactions play a dominant role in shaping the E. coli nucleoid ( 11 ) . 
+ However , the long distance interactions did not consistently involve loci located equidistant from the Ori on opposite replichores ; therefore , it is unlikely that the E. coli nucleoid is preferentially structured as ellipsoids as observed in C. crescentus ( 9 ) . 
+ Rather our study indicates that the chromosome ( s ) within exponentially fast-growing E. coli cells are structured by interactions that are linked to the ongoing replication and transcription processes within the cell . 
+ The speciﬁcity of the observed interactions identiﬁes spatial organization as a signiﬁcant factor in bacterial gene regulation and indicates that the spatial clustering of highly regulated genes is a ubiquitous feature of gene regulation . 
+ ACCESSION NUMBERS
+ The GCC data have been deposited in the Gene Expression Omnibus ( GSE40603 ) . 
+ Expression data have been deposited under accession GSE40304 . 
+ SUPPLEMENTARY DATA
+ Supplementary Data are available at NAR Online : Supplementary Tables 1 -- 10 , Supplementary Figures 1 -- 6 and Supplementary Materials and Methods . 
+ ACKNOWLEDGEMENTS
+ The authors would like to thank Philippe Collas , Austen Ganley , Gary Greyling , Lutz R. Gehlen , Heather Hendrickson , Julia Horsﬁeld , Rod McNab , Ana Pombo , Tom Schneider and Yan Ning Zhou for helpful discussions . 
+ FUNDING
+ Intramural Research Program of the National Institutes of Health , National Cancer Institute , Center for Cancer Research ( to C.C. and J.D. ) ; the Marsden Fund ( to J.M.O.S. ) ; Massey University research fund ( to J.M.O.S. and B.H.A.R. ) ; Massey University scholarship ( to R.S.G. ) . 
+ Funding for open access charge : The Liggins Institute Auckland University . 
+ Conﬂict of interest statement. None declared.
\ No newline at end of file
--- a/data/TEXT_FILES/useful_txt/23646895.txt 0 → 100644
+++ b/data/TEXT_FILES/useful_txt/23646895.txt 0 → 100644
+ methylglyoxal: transcriptional readthrough from the nemRA
+ Ertan Ozyamak ,1 ** † Camila de Almeida ,1,2 ‡ Alessandro P. S. de Moura ,2 Samantha Miller1 and Ian R. Booth1 * 1School of Medical Sciences , Institute of Medical Sciences , University of Aberdeen , Aberdeen AB25 2ZD , UK . 
+ 2Institute of Complex Systems & Mathematical Biology , School of Natural & Computer Sciences , University of Aberdeen , Aberdeen AB24 3UE , UK . 
+ Summary
+ Methylglyoxal ( MG ) elicits activation of K + efflux systems to protect cells against the toxicity of the electrophile . 
+ ChIP-chip targeting RNA polymerase , supported by a range of other biochemical measurements and mutant creation , was used to identify genes transcribed in response to MG and which complement this rapid response . 
+ The SOS DNA repair regulon is induced at cytotoxic levels of MG , even when exposure to MG is transient . 
+ Glyoxalase I alone among the core MG protective systems is induced in response to MG exposure . 
+ Increased expression is an indirect consequence of induction of the upstream nemRA operon , encoding an enzyme system that itself does not contribute to MG detoxiﬁcation . 
+ Moreover , this induction , via nemRA only occurs when cells are exposed to growth inhibitory concentrations of MG . 
+ We show that the kdpFABCDE genes are induced and that this expression occurs as a result of depletion of cytoplasmic K + consequent upon activation of the KefGB K + efflux system . 
+ Finally , our analysis suggests that the transcriptional changes in response to MG are a culmination of the damage to DNA and proteins , but that some integrate speciﬁc functions , such as DNA repair , to augment the allosteric activation of the main protective system , KefGB . 
+ Accepted 9 April , 2013 . 
+ For correspondence . 
+ * E-mail i.r.booth @ abdn.ac.uk ; Tel. ( +44 ) 1224 437396 ; Fax ( +44 ) 1224 437465 ; ** Email ozyamak@gmail.com, Tel. ( +1 ) 510 642 2140 ; Fax ( +1 ) 510 642 4995 . 
+ Present addresses : † Department of Plant & Microbial Biology , University of California , Berkeley , CA 94720 , USA ; ‡ Astra-Zeneca Ltd , Mereside , Alderley Park , Macclesﬁeld SK10 4TG , UK . 
+ Data deposition : Data reported in this paper have been deposited in the ArrayExpress database ; http://www.ebi.ac.uk/microarray-as/ae/ ( Accession No. E-MTAB-100 ) . 
+ Introduction
+ Bacterial adaptation blends both modulation of cytoplasmic enzymes and changes in gene expression to effect a response that enhances survival of changes in the environment . 
+ The bacterial response to electrophiles has been well-characterized at the level of activation of protective K + efflux systems ( Ferguson , 1999 ) , but studies of the contribution from speciﬁc transcriptional events are more limited . 
+ Methylglyoxal ( MG ) is a toxic electrophile produced during unbalanced sugar metabolism in Escherichia coli ( E. coli ) and other bacteria ( Freedberg et al. , 1971 ; Russell , 1993 ) . 
+ Conservation of the glyoxalase system for MG detoxiﬁcation from bacteria to man suggests that such exposure is common to all lifestyles ( Mannervik , 2008 ; Sukdeo and Honek , 2008 ; Suttisansanee and Honek , 2011 ) . 
+ Several studies hint towards the production of MG in macrophages in response to the entry of pathogenic microorganisms such as Salmonella or Mycobacterium , as part of the host defence mechanisms ( Eskra et al. , 2001 ; Eriksson et al. , 2003 ; Rachman et al. , 2006 ) . 
+ The occurrence of MG in many food and beverage products has also been reported ( Nemet et al. , 2006 ; Tan et al. , 2008 ) and this may contribute to background levels of DNA damage ( Kenyon and Walker , 1980 ; Sassanfar and Roberts , 1990 ; Yuan et al. , 2008 ) since MG is a known mutagen ( Marnett et al. , 1985 ; Dorado et al. , 1992 ) . 
+ Recently , exposure to MG has been suggested to underpin the faster rate of development of ` persister ' cells in E. coli populations ( Girgis et al. , 2012 ) , which may reﬂect the mutagenic potential of this electrophile . 
+ In enteric bacteria , the major route for MG production is from dihydroxyacetone phosphate ( DHAP ) , which is converted to MG by the action of MG synthase ( mgsA ; Fig. 1A ) ( Hopper and Cooper , 1971 ) . 
+ During normal growth the production of MG is maintained at a low level by the requirement for homotropic activation of MgsA by DHAP and by the strong inhibition of the enzyme by free phosphate . 
+ Consequently , MG production only occurs at a high rate when the cellular pool of phosphate is depleted and DHAP pools are very high -- such conditions arise when cells move from famine to feast , a condition that predisposes cells to perform high levels of transport and metabolism of sugars ( Totemeyer et al. , 1998 ) . 
+ Low concentrations of MG are bacteriostatic , but at high levels MG kills bacteria via covalent modiﬁcation of proteins , DNA and lipids ( Krymkiewicz , 1973 ; Colanduoni and Villafranca , 1985 ) . 
+ MG modiﬁes bases in DNA ( Krymkiewicz , 1973 ) , particularly guanine , and repair can lead to double strand breaks in the DNA ( Ferguson et al. , 2000 ) and induction of DNA repair enzymes ( Kenyon and Walker , 1980 ; Sassanfar and Roberts , 1990 ; Yuan et al. , 2008 ) . 
+ Protection against MG in E. coli , and other enteric bacteria , has several components . 
+ A central feature is the formation of cysteinyl adducts with glutathione ( GSH ) and the subsequent metabolism by the GSH-dependent glyoxalase system , encoded by the unlinked gloA and gloB genes ( Ferguson et al. , 1998 ; MacLean et al. , 1998 ; Kizil et al. , 2000 ) . 
+ This pathway leads to cytoplasmic recycling of GSH during MG breakdown , in contrast to the fate of other electrophile adducts formed with GSH and/or other protective thiols , such as mycothiol ( Ferguson et al. , 1993 ; 1995 ; Ferguson and Booth , 1998 ; Eskra et al. , 2001 ; Fahey , 2001 ; Newton et al. , 2009 ; 2012 ) . 
+ In addition , E. coli has evolved a more sophisticated protective mechanism that involves both GSH-dependent and GSH-independent enzyme systems and K + efflux ( KefGB and KefFC ) systems that respond directly to GSH and GSH adducts ( GSX ) ( Elmore et al. , 1990 ; Ferguson et al. , 1995 ; MacLean et al. , 1998 ; Ozyamak et al. , 2010 ) . 
+ The GSH-dependent glyoxalase system , consisting of glyoxalase I and II ( GlxI & GlxII ) , provides the main route for MG detoxiﬁcation resulting in the production of D-lactate ( Fig. 1A ) ( MacLean et al. , 1998 ; Mannervik , 2008 ) . 
+ Survival is highly dependent on the activity of these enzyme systems via their impact on S-lactoylglutathione ( SLG ) pools ( Ozyamak et al. , 2010 ) . 
+ The balance of the activities of GlxI & GlxII determines the cytoplasmic pool of SLG , which is the activator of ligand-gated K + efflux systems KefGB and KefFC ( Fig. 1A ) . 
+ Activation of KefGB and KefFC causes cytoplasmic acidiﬁcation , the degree of which is directly correlated with survival ( Ozyamak et al. , 2010 ) . 
+ Although E. coli has two systems , KefGB and KefFC , of which the former is dominant in the response to MG , many Gram-negative bacteria have a single Kef system . 
+ A third , GSH-independent , enzyme ( glyoxalase III , GlxIII ) with the ability to convert MG directly to D-lactate has been identiﬁed as Hsp31 ( encoded by hchA ) ( Misra et al. , 1995 ; Subedi et al. , 2011 ) . 
+ In addition , a number of aldo-keto reductases may play ancillary roles in metabolizing MG , via their activity as low-speciﬁcity aldehyde reductases ( Ko et al. , 2005 ; Lee et al. , 2010 ) . 
+ In this study we applied ChIP-chip technology ( Grainger et al. , 2005 ; 2009 ) to measure changes in the genome-wide redistribution of RNA polymerase ( RNAP ) during MG stress . 
+ ChIP-chip directly measures the occupation of DNA by speciﬁc binding proteins ( Herring et al. , 2005 ) . 
+ When RNAP is targeted , as here , one may infer changes in transcriptional patterns analogous to classical transcriptomics studies ( Grainger et al. , 2005 ) . 
+ In addition to avoiding problems with mRNA stability , additional information is gained from the RNAP distribution across the transcribed regions , such as in the case of stalled RNAP molecules ( Wade et al. , 2007 ; Grainger and Busby , 2008 ) . 
+ In this study these analyses of transcription were complemented by biochemical analyses and mutant creation to test speciﬁc hypotheses arising from the observed patterns of RNAP distribution . 
+ Our study provides the ﬁrst insight into the transcriptional response of E. coli to sudden exposure to either sublethal or lethal concentrations of MG and also describes the temporal response as the MG concentration increases progressively during unbalanced metabolism . 
+ A large number of transcriptional changes were observed in response to MG exposure , but of these only the enhanced expression of the gloA gene , encoding GlxI , and the SOS response are directly beneﬁcial . 
+ Other changes are either neutral or counter-protective . 
+ The expression data are consistent with transcriptional responses responding primarily to cell damage rather than activation of a regulon of protective systems . 
+ Results
+ Experimental design
+ The response of cell populations to MG depends both on the MG concentration and on the cell density ( Fraval and McBrien , 1980 ) . 
+ We have performed ChIP-chip with DNA-RNAP complexes isolated from E. coli MG1655 cells incubated under three different growth regimes ( Fig. 1B ) : ( I ) sublethal concentration of MG ( 0.8 mM MG at cell density OD650 ~ 0.4 ) . 
+ In these experiments the MG concentration falls progressively throughout the experiment due to detoxiﬁcation by the cells ; ( II ) lethal dose of MG ( 0.8 mM at OD650 ~ 0.04 ) . 
+ Here the MG concentration falls very slowly throughout the experiment , but remains at a lethal concentration throughout the sampling period ; and ( III ) progressive intoxication ( cells synthesize MG throughout the experiment , such that the concentration rises from zero to > 0.7 mM over a 4 h time period ) ( Totemeyer et al. , 1998 ) ( for more details see Experimental procedures and Supporting information ) . 
+ In addition , strain MG1655 was compared with derivative MJF632 lacking KefGB and KefFC , the electrophile-activated K + efflux systems that confer protection ( Ferguson , 1999 ) . 
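The three regimes described above can be written down as a small data structure. This is only an illustrative sketch: the `Regime` class, its field names and the `mg_per_od` helper are hypothetical and not part of the study, and the numbers are the approximate values quoted in the text. It shows why the same 0.8 mM dose is sublethal in Type I but lethal in Type II: the amount of MG per unit of cell density differs about tenfold.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Regime:
    """One experimental regime; names and fields are illustrative only."""
    label: str                    # "I", "II" or "III", as in the text
    outcome: str                  # description used in the text
    mg_added_mM: Optional[float]  # MG added at time zero; None if cells produce MG themselves
    od650: Optional[float]        # approximate cell density at MG addition, if stated


REGIMES = [
    Regime("I", "sublethal", 0.8, 0.4),
    Regime("II", "lethal", 0.8, 0.04),
    # Type III: MG rises from zero to > 0.7 mM over ~4 h; no single added dose.
    Regime("III", "progressive intoxication", None, None),
]


def mg_per_od(regime: Regime) -> Optional[float]:
    """Return mM of added MG per OD650 unit, or None when no dose is added."""
    if regime.mg_added_mM is None or regime.od650 is None:
        return None
    return regime.mg_added_mM / regime.od650


type_i, type_ii, type_iii = REGIMES
# Same dose, tenfold fewer cells: roughly ten times more MG per cell in Type II.
dose_ratio = mg_per_od(type_ii) / mg_per_od(type_i)
```

Dose per OD unit is a crude proxy chosen for illustration; the text itself only states that the response depends on both MG concentration and cell density.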
+ Exposure to sublethal concentrations of MG (Type I)
+ Treatment of mid-exponential phase cultures ( initial OD650 = 0.4 ) with 0.8 mM MG led to only approximately 50 % growth inhibition , with the implication that transcription should remain active throughout the experimental period ( Fig. 2 ) . 
+ Treated cells recovered the maximum growth rate 60 min after addition of MG , which corresponds to the time taken to reduce the external concentration of MG to a non-inhibitory concentration ( ~ 0.1 mM ) ( MacLean et al. , 1998 ) . 
+ The culture subsequently reached the same ﬁnal cell density as non-treated cells ( Fig. 2 ) , indicating that no irreversible damage had occurred from this experimental regime . 
+ A strain lacking both the KefGB and KefFC K + efflux systems , MJF632 , exhibited a delayed recovery from exposure to MG ( Fig. 2B ) ; we have previously shown that lack of the efflux systems does not modify the detoxiﬁcation rate ( Ferguson et al. , 1995 ; Almeida , 2009 ) . 
+ ChIP-chip analysis was performed after 30 min , midway through the period of reduced growth rate . 
+ A number of genes were induced and others repressed ( Table S2 , Dataset S1 ) , as represented by peaks and troughs , respectively , in the data . 
+ RNAP peaks across genome areas were in good agreement with the boundaries of known transcription units ( TUs ) ( Fig. 3 ) . 
+ Induction of the LexA-regulated SOS regulon
+ MG is known to cause DNA modiﬁcation , principally the formation of adducts with deoxyadenosine and deoxyguanine ( Papoulis et al. , 1995 ; Frischmann et al. , 2005 ) . 
+ It was not surprising , therefore , that the SOS system was induced after MG exposure . 
+ LexA-regulated genes ( e.g. recAX , lexA-dinF , dinB ) were among the genes with increased RNAP occupancy ( Fig. 3A and B , Table S2 ) , indicating a high transcriptional activity for DNA repair , consistent with the DNA damage expected during MG treatment reported by others ( Kenyon and Walker , 1980 ; Sassanfar and Roberts , 1990 ; Yuan et al. , 2008 ) . 
+ This transcriptional pattern was conﬁrmed by qRT-PCR , which demonstrated very signiﬁcant increases in mRNA for SOS genes ( Fig. 4 ) . 
+ The increased expression of the SOS regulon was in line with expectations and thus provided a good baseline for the other changes in gene expression discussed below . 
+ Increased RNAP occupancy at SOS response genes was accompanied by decreased occupancy at genes associated with fast growth , such as those for motility and amino acid biosynthetic pathways ( e.g. ﬂgBCDEFGHIJ and gltBD ; Fig. 3C , Dataset S1 ) . 
+ Induction of the nemRA operon is beneﬁcial for MG tolerance through an indirect mechanism
+ We have previously demonstrated the critical role of GlxI in generating the regulator of K + efflux systems KefGB and KefFC ( Fig. 1A ) and thus mediating protection against MG ( MacLean et al. , 1998 ; Ozyamak et al. , 2010 ) . 
+ We thus speciﬁcally sought to analyse the GlxI-encoding gene gloA . 
+ Bioinformatic analysis , as provided on the RegulonDB database ( Salgado et al. , 2013 ) , suggested the presence of a potential promoter in the intergenic region between nemA and gloA indicating that nemRA and gloA can form two independent TUs ( Fig. S1 ) . 
+ However , there is no marked transcriptional terminator between nemA and gloA , suggesting the possibility for transcriptional readthrough . 
+ Increased RNAP binding was observed along the length of the gloA gene . 
+ However , this was continuous with the binding to the upstream nemRA operon ( Fig. 5A ) . 
+ The nemRA operon encodes the N-ethylmaleimide reductase , NemA , and NemR , the repressor protein of the nemRA operon . 
+ It was shown previously that NemR can be inactivated by alkylating reagents such as N-ethylmaleimide , but also by MG ( Umezawa et al. , 2008 ) . 
+ Thus , we sought to determine whether there was a real linkage between the nemRA and gloA genes . 
+ A mutant strain deleted for nemR was created ( see Supporting information ) and transcription of the gloA gene assessed by qRT-PCR of the mRNA pool from cells extracted after exposure to sublethal MG concentrations . 
+ In wild type cells transcripts for both nemA and gloA were detected with the latter being more abundant than the former , consistent with independent promoters . 
+ Deletion of nemR led to 15- and 5-fold higher levels of transcript for nemA and gloA respectively ( Fig. 6A ) . 
+ In contrast , deletion of nemA ( in a NemR + background ) did not modify the level of the gloA transcript detected . 
+ Consistent with these data , we observed that cell-free extracts contained similar levels of GlxI activity whenever the strain was NemR + , but GlxI activity was increased ~ 5-fold in a nemR deletion strain , consistent with translation of the more abundant gloA mRNA ( Fig. 6B ) . 
+ Finally , this increased expression of gloA was manifested in a decreased sensitivity to MG in the ΔnemR strain compared with both the wild type and the ΔnemA strain ( these two strains exhibiting equivalent sensitivity ) ( Fig. 6C ) . 
+ The data are consistent with readthrough from the nemA promoter providing enhancement of expression of gloA . 
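Fold changes such as those quoted above are commonly derived from qRT-PCR threshold cycles with the 2^-ΔΔCt calculation. The sketch below assumes that method and 100 % amplification efficiency; the text does not state how the authors quantified their data, and every Ct value shown is invented purely for illustration.

```python
def fold_change(ct_target_test: float, ct_ref_test: float,
                ct_target_ctrl: float, ct_ref_ctrl: float) -> float:
    """Relative expression by the 2^-ddCt method (assumes perfect doubling per cycle)."""
    d_ct_test = ct_target_test - ct_ref_test   # normalize target to reference gene
    d_ct_ctrl = ct_target_ctrl - ct_ref_ctrl
    dd_ct = d_ct_test - d_ct_ctrl              # test strain relative to control strain
    return 2.0 ** (-dd_ct)


# Hypothetical Ct values (NOT from the paper): a target transcript in a
# deletion strain versus wild type, each normalized to a reference gene.
example = fold_change(ct_target_test=19.68, ct_ref_test=18.0,
                      ct_target_ctrl=22.0, ct_ref_ctrl=18.0)
# A ddCt of about -2.32 cycles corresponds to roughly a 5-fold increase.
```

A lower Ct means more template, so a negative ΔΔCt translates into a fold change greater than one.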
+ Several systems have been proposed to be components of the defence mechanism against MG in E. coli ( Misra et al. , 1995 ; Ferguson et al. , 1998 ; Subedi et al. , 2011 ) . 
+ We assessed whether genes for the core protective systems also showed increased transcription as a result of sublethal MG exposure . 
+ RNAP occupancy across the genes for GlxII ( gloB ) and the K + efflux systems ( kefGB and kefFC ) was not signiﬁcantly changed whether assessed by ChIP-chip ( Fig. 5B -- D ) or by qRT-PCR analysis of mRNA pools ( Fig. 4 ) . 
+ Moreover , no increased RNAP occupancy was detected for genes of GSH biosynthesis enzymes ( gshA , gshB , ybdK ) or for GSH-independent GlxIII ( hchA ) ( Dataset S1 ) . 
+ In contrast , strong induction of the frmAB and yqhD genes , involved in aldehyde detoxiﬁcation , was observed ( Fig. 4 , Fig. 5E and F ) . 
+ The GSH-dependent FrmAB enzyme system is involved in the detoxiﬁcation of formaldehyde ( Herring and Blattner , 2004 ; Gonzalez et al. , 2006 ) . 
+ The YqhD system has been shown to have aldo-keto reductase activity against a wide range of aldehydes , including MG ( Lee et al. , 2010 ) . 
+ Despite the increased transcription of genes for these systems , single deletion mutants lacking frmA or yqhD did not exhibit increased sensitivity to MG , whereas a gloA mutant , lacking GlxI , showed the expected sensitivity ( Fig. 6D ) . 
+ Recent work has identiﬁed that a double mutant lacking both gloA and yqhD acquired increased sensitivity to glyoxal , but not to MG , when compared with the single gloA mutant ( Lee et al. , 2010 ) , thus conﬁrming that increased expression of YqhD is unlikely to be a major factor in MG tolerance . 
+ The data show that E. coli cells do not induce key protective systems as an adaptation strategy to sublethal MG exposure , but do induce systems that appear not to have a major physiological role for MG tolerance . 
+ Transcriptional response is rapid in cells exposed to lethal concentrations of MG (Type II)
+ Exposure of cultures at low cell density ( OD650 ~ 0.04 ) to MG causes rapid cell death ( ~ 0.2 % cells are viable after 30 min in the presence of 0.8 mM MG ) ( MacLean et al. , 1998 ; Ozyamak et al. , 2010 ) . 
+ We investigated the transcriptional response at intervals ( 2.5 , 10 & 30 min ) after MG challenge ( Type II experiment ; Fig. 1B ; see Experimental procedures , Dataset S1 ) . 
+ We observed a very similar RNAP distribution pattern and enrichment ratios to the Type I experiments above ( Table S2 ) . 
+ Time-dependent changes in the ChIP-chip signals for members of the SOS response genes were observed . 
+ In the 2.5 min sample increases in expression of the SOS genes were very small or not reproducible . 
+ Stronger signals for the SOS genes were observed in both the 10 min and 30 min samples ( Fig. S2A and B , Table S2 ) . 
+ A signiﬁcant difference from cells treated with sublethal MG was the increased RNAP occupancy of genes associated with oxidative stress . 
+ The OxyR-regulated ahpCF operon was clearly upregulated , with the highest enrichment at 2.5 min and decreased signals thereafter ( Fig. S2C ) . 
+ Other OxyR-regulated genes ( Storz et al. , 1990a , b , c ) such as trxC , grxA or dps also exhibited this pattern ( Fig. S2D -- F ) . 
+ No consistent increase in RNAP occupancy across the ahpCF operon was observed in Type I experiments ( sublethal MG ) . 
+ This difference may reﬂect transient induction of OxyR-regulated genes upon lethal MG challenge , possibly due to transient GSH depletion . 
+ Such induction would be missed in Type I experiments because rapid detoxiﬁcation of MG at the higher cell density leaves a much lower MG concentration at the sampling time ( the external concentration would fall to ~ 0.4 mM ) ( Almeida , 2009 ) . 
+ Transcriptional response during progressive MG intoxication (Type III)
+ Bacteria frequently produce MG as a metabolic by-product during adaptation from famine to feast ( Freedberg et al. , 1971 ; Totemeyer et al. , 1998 ) and consequently sudden exposure of cells to a high concentration of MG may not be physiological . 
+ We therefore sought to compare the transcriptional response during the production and accumulation of MG in the medium with the responses described above . 
+ In addition , such a regime would provide an indication of the concentration dependence of the response to MG . 
+ Previously , we have described the production of MG by E. coli cells growing on xylose when stimulated to increased transcription of the xylose regulon by cAMP addition , which mimics the famine to feast scenario ( Totemeyer et al. , 1998 ) . 
+ Over a 5 h time-course the growth rate slowed to zero ( at ~ 0.4 mM MG ) , followed by death as the MG concentration rose to 0.8 mM ( Fig. 7A and B ) . 
+ RNAP distribution proﬁles at each time point were compared either with the control , with no added cAMP ( Fig. 7C and D , Dataset S1 ) , or to the initial sample at 10 min after cAMP addition ( Fig. 8A -- D , Table S3 ) , at which time the level of MG was undetectable . 
+ After 30 min incubation with cAMP the MG pool had risen to ~ 50 µM and only limited changes in gene expression were observed ( both induction and repression ) . 
+ Genes for cysteine biosynthesis , which is required for GSH biosynthesis , appeared to be repressed by MG ( i.e. RNAP exhibited reduced occupancy at this operon ; Fig. 8A ) . 
+ However , since growth continues for at least one further generation , the existing enzymes must remain active at this MG concentration and a potential reduction in transcription may not be signiﬁcant for cysteine production . 
+ After 60 min ( MG concentration 100 -- 150 µM ; Fig. 8B , Table S3 ) the transcription pattern was clearly perturbed with speciﬁc enzyme systems being induced , including the frmAB and yqhD genes ( Fig. 8B , Fig. S3C and D , Table S3 ) . 
+ However , as mentioned above , these enzymes do not appear to play a major role in protection against MG ( Fig. 6D ) . 
+ Induction of the his regulon ( Fig. 8B ) may be explained by the reaction of MG with this amino acid causing partial starvation ( Aldini et al. , 2005 ) and release from attenuation ( Yanofsky , 1981 ; Barnes and Tuley , 1983 ) . 
+ Large-scale changes in the transcriptome were evident at both 120 min and 240 min at which time the MG concentration has reached growth inhibitory ( ~ 0.4 mM ) and lethal levels ( ~ 0.7 mM ) respectively . 
+ In both cases strong induction of the SOS regulon and soxSR and marRAB was evident ( Fig. 8C and D , Table S3 ) . 
+ It was also at this time point that nemA ( and gloA ) transcription was increased ( Fig. 8C , Figs S4 and S5B ) . 
+ During the ﬁnal stages of MG intoxication the overall balance of RNAP binding favoured a few specialist DNA repair functions , while the majority of genes involved in housekeeping metabolism were repressed ( Fig. 8C and D ) . 
+ These observations were conﬁrmed by qRT-PCR for selected genes ( Fig. S4 ) . 
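The concentration dependence described for the progressive (Type III) regime can be summarized as a simple threshold rule. This is a rough sketch only: the function name and stage labels are invented here, the cut-offs are the approximate values quoted in the text (~ 0.4 mM growth inhibitory, ~ 0.7 mM lethal), and the rule ignores cell density, which the text notes also shapes the response.

```python
GROWTH_INHIBITORY_MM = 0.4  # approximate MG level at which growth stopped (from the text)
LETHAL_MM = 0.7             # approximate MG level described as lethal (from the text)


def classify_mg(conc_mM: float) -> str:
    """Label an external MG concentration for the Type III regime (illustrative only)."""
    if conc_mM >= LETHAL_MM:
        return "lethal"
    if conc_mM >= GROWTH_INHIBITORY_MM:
        return "growth inhibitory"
    return "sub-inhibitory"


# Approximate time course after cAMP addition, using only values stated in the
# text: undetectable at 10 min, ~0.4 mM at 120 min, ~0.7 mM at 240 min.
time_course_mM = {10: 0.0, 120: 0.4, 240: 0.7}
stages = {minutes: classify_mg(conc) for minutes, conc in time_course_mM.items()}
```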
+ Throughout the time series , genes that are regulated by cAMP exhibited high RNAP occupancy , which is indicative that cAMP remains abundant . 
+ At the outset the TUs were uniformly occupied by RNAP ( Fig. 7C ) , but as the MG concentration rose further ( Fig. 7B ) , RNAP was progressively located at the promoter regions and at intergenic regions , producing pronounced peaks and valleys from the previous uniform distribution . 
+ For example , the xylFGHR ( Fig. 7C ) and manXYZ operons , and malT ( Fig. S3A and B ) were upregulated after 10 min and remained high throughout the experiment despite the modiﬁed distribution of RNAP . 
+ Thus , there was no general shutdown of the cAMP-CRP regulatory system even at lethal concentrations of MG . 
+ Transcription also continued unabated ( e.g. frmA , nemA and recA ) as revealed by qRT-PCR analysis ( Fig. S4 ) , despite the greater polarity in the distribution of RNA polymerase in later time samples . 
+ The counter-protective Kdp system is induced by MG
+ Induction of the kdpFABC and kdpDE operons was observed in all three sets of ChIP-chip data ( i.e. Type I , II & III ; Figs 4 and 8B ) . 
+ The kdpFABC genes encode a high affinity scavenging P-type K + - ATPase ( Laimins et al. , 1978 ; Rhoads et al. , 1978 ) . 
+ Transcription of the structural genes is under control of KdpDE and this two-component regulatory system responds to insufficiency of the Trk and Kup , constitutive K + transporters , to maintain the K + pool ( Laimins et al. , 1981 ) . 
+ During MG stress , the expression of the Kdp system is consistent with the expected enhanced K + loss consequent upon activation of KefGB and KefFC systems . 
+ However , it is also counterintuitive since K + loss and consequent cytoplasmic acidiﬁcation is intrinsic to the mechanisms protecting cells against MG ( Ferguson , 1999 ) . 
+ Thus , we sought to verify the original ChIP-chip data . 
+ Firstly , we established that the signals for the kdpFABCDE region responded to K + sufficiency by exchanging the low K + growth medium ( K0.2 ) for high K + ( K115 ) . 
+ When expressed as a ratio ( K0.2 / K115 ) the kdpFABCDE genes exhibited increased RNAP occupation relative to the ﬂanking genomic regions ( Fig. 9A ) , consistent with their transcription during steady state growth in low K + medium . 
+ The observed changes in occupancy of the kdpFABCDE operon by RNAP in low and high K + media were conﬁrmed by qRT-PCR ( Fig. 9B ) . 
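The ratio-based comparison described above can be illustrated with a short calculation. This is a minimal sketch, not the authors' analysis pipeline: the probe intensities are invented, the function names are hypothetical, and the use of log2 ratios and medians is an assumption made for the example.

```python
import math
from statistics import median


def log2_ratios(low_k: list[float], high_k: list[float]) -> list[float]:
    """Per-probe log2( K0.2 / K115 ) RNAP signal ratio."""
    return [math.log2(a / b) for a, b in zip(low_k, high_k)]


def relative_enrichment(ratios: list[float], region: list[int], flank: list[int]) -> float:
    """Median log2 ratio over the region minus the median over flanking probes."""
    return median(ratios[i] for i in region) - median(ratios[i] for i in flank)


# Invented probe intensities along a stretch of genome (NOT data from the paper).
low_k_signal = [1.0, 1.1, 4.0, 4.2, 3.8, 1.0, 0.9]   # low K+ medium ( K0.2 )
high_k_signal = [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0]  # high K+ medium ( K115 )

ratios = log2_ratios(low_k_signal, high_k_signal)
kdp_probes = [2, 3, 4]       # hypothetical probes covering kdpFABCDE
flank_probes = [0, 1, 5, 6]  # hypothetical flanking probes
enrichment = relative_enrichment(ratios, kdp_probes, flank_probes)
```

A positive value means RNAP occupancy over the operon rises in low K + relative to its surroundings; comparing against flanking probes guards against a uniform shift in signal between the two arrays.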
+ Growth in the presence of MG at low K + resulted in further enhancement of the kdpFABCDE ChIP-chip signal , suggesting that K + loss associated with activation of KefGB and KefFC during MG detoxiﬁcation generated an enhanced signal for transcription ( Fig. 9A ) . 
+ To test this prediction we generated ChIP-chip data for strain MJF632 ( ΔkefGB , ΔkefFC ) , which lacks both K + efflux systems . 
+ Consistent with the model there was no increase in ChIP signal for the kdpFABCDE operon in this strain ( Fig. 9A ) and qRT-PCR analysis of mRNA pools conﬁrmed this observation ( Fig. 4 ) . 
+ Other transcriptional responses to MG were similar to the wild type strain ( Fig. 4 , Table S2 ) and ChIP-chip signal patterns and mRNA stability of highly expressed genes were similar ( Supporting information , Fig. S6 ) . 
+ Previously we have established that expression of the kdpFABCDE operon sensitizes E. coli to MG ( Ferguson et al. , 1996 ) . 
+ We sought to verify that the strain used here , MG1655 , also dies more rapidly if exposed to MG when the kdpFABC operon is active . 
+ An isogenic mutant lacking kdpA , the K + channel forming subunit , was found to survive exposure to MG ~ 10-fold better than the wild type ( Fig. 9C ) . 
+ Thus , the cells express a system that counters their own survival . 
+ Discussion
+ MG toxicity is encountered in all forms of life and the response most frequently utilizes GSH-dependent detoxiﬁcation of the electrophile and repair of damage by specialist inducible enzyme systems ( Ferguson , 1999 ) . 
+ E. coli offers a paradigm for the bacterial response to MG . 
+ Glyoxalase-type enzymes are ubiquitous in bacteria despite the rather more limited distribution of GSH ( Suttisansanee and Honek , 2011 ) . 
+ This disparity has partially been resolved by the discovery of sugar-based thiol compounds that are intrinsic components of the detoxiﬁcation system in some Gram-positive bacteria and by the recent elucidation of novel biosynthetic pathways to γ-glutamylcysteine peptides ( Newton et al. , 2009 ; 2012 ; Gaballa et al. , 2010 ; Suttisansanee and Honek , 2011 ; Veeravalli et al. , 2011 ) in a wide range of organisms . 
+ E. coli augments the detoxiﬁcation by a novel acidiﬁcation mechanism by which cytoplasmic K + is exchanged for external H + via the KefGB and KefFC systems ( Ferguson , 1999 ) . 
+ The activity of these systems is controlled by the balance between reduced GSH and GSH adducts formed during detoxiﬁcation . 
+ Similar systems have been identiﬁed in Bacillus ( YhaTU ; Booth et al. , 2003 ; Fujisawa et al. , 2004 ; 2007 ) and the discovery of bacillithiol , mycothiol and glyoxalases speciﬁc for these thiols leaves open the possibility of equivalent regulation of K + efflux ( J. Helmann , pers. comm. ) . 
+ The E. coli system is so effective that it offers protection even when detoxiﬁcation is essentially blocked by mutations affecting the second enzyme in the glyoxalase pathway , GlxII ( gloB ) ( Ozyamak et al. , 2010 ) . 
+ GlxI , the ﬁrst enzyme in the pathway , is essential for protection against MG because of its central role in generating the activator of KefGB . 
+ Thus , simple removal of GSH through formation of the hemithioacetal , the spontaneous reaction product formed by reaction of MG with GSH , is not enough to activate KefGB -- the system requires the GlxI-catalysed formation of SLG ( MacLean et al. , 1998 ; Ozyamak et al. , 2010 ) . 
+ Given that this essentially constitutive , allosterically modulated system is so effective we sought to determine the transcriptional response to MG using ChIP-chip analysis to follow the positioning of RNAP on the genome . 
+ The data present a comprehensive picture of the transcriptional response of E. coli to MG and reveal intriguing changes in gene expression , some of which are counterintuitive . 
+ Even when exposed to lethal concentrations of MG that kill > 99.9 % of cells , the bacteria remain transcriptionally active throughout the treatment . 
+ Moreover , previous studies reported that even when MG-mediated growth inhibition was maximal , incorporation of external label into RNA and protein continued , albeit at a lower rate ( Fraval and McBrien , 1980 ) . 
+ No analysis of the balance between rRNA/tRNA and mRNA was performed in that early study . 
+ In our study similar RNAP distributions , and inferred transcription patterns , were observed under the three different experimental regimes tested . 
+ Genes that are transcribed in response to MG can be ascribed to three broad classes -- ( i ) those required for DNA repair , ( ii ) enzyme systems that are known to be regulated by proteins that are modulated by the modiﬁcation of critical cysteine residues , and ( iii ) systems that appear to be adventitiously expressed as a consequence of the changed physiology of the cells as they detoxify MG . 
+ Among the latter are the transient response of the OxyR regulon during sudden exposure to lethal concentrations of MG ( Type II experiments , Fig. S2 ) and the induction of the soxRS genes during Type III progressive intoxication ( Fig. 8C ) . 
+ The OxyR response to hydrogen peroxide ( H2O2 ) is known to be transient ( Zheng et al. , 1998 ; Aslund et al. , 1999 ; Carmel-Harel and Storz , 2000 ) and would ﬁt the kinetics observed here . 
+ Depletion of GSH pools by MG , leading to a transient change in cytoplasmic redox potential , may be sufficient to explain the increased transcription of some of the genes under OxyR control , whereas direct covalent modiﬁcation of OxyR by MG seems less likely to be the mechanism ( Zheng et al. , 1998 ) . 
+ No increased RNAP binding was observed at the genes for GSH biosynthesis that might be expected under conditions of oxidative stress , but this may simply reﬂect the hierarchy of gene expression with the OxyR regulon ( Carmel-Harel and Storz , 2000 ) . 
+ In contrast , the increased expression of the SOS regulon is as predicted from the known reaction of MG with DNA causing base modiﬁcation ( Krymkiewicz , 1973 ; Kenyon and Walker , 1980 ; Sedgwick and Vaughan , 1991 ; Ferguson et al. , 2000 ; Moolenaar et al. , 2000 ; Karschau et al. , 2011 ) . 
+ Type II experiments reveal that this response is moderately slow -- increases in recAX expression are not seen at the 2.5 min time point after MG exposure ( Fig. S2 ) , presumably reﬂecting the time taken for the creation of single strand gaps by excision repair to exceed the rate of re-synthesis of the DNA and ligation ( Karschau et al. , 2011 ) . 
+ In Type III experiments it is clear that severe growth inhibition precedes the major induction of the SOS regulon ( Fig. 7A and D ) . 
+ One of the most striking transcriptional responses that E. coli cells elicited to MG challenge was the induction of several potential detoxiﬁcation systems ( nemA , frmAB , yqhD ) ( Fig. 5A , E and F ) . 
+ However , these systems do not appear to have a physiological protective role against MG toxicity since deletion mutants exhibited the same levels of MG tolerance as the wild type strain ( Fig. 6C and D ) . 
+ The molecular basis for these transcriptional responses is most probably covalent modiﬁcation of regulatory proteins by MG . 
+ Thus , NemR , the repressor protein of the nemRA operon , is rendered inactive by electrophiles ( e.g. N-ethylmaleimide and MG ) through the modiﬁcation of at least one speciﬁc cysteine residue ( Umezawa et al. , 2008 ) . 
+ During the preparation of this manuscript two recent studies have shown that the modiﬁcation of NemR leads to decreased binding of this protein to the nemRA promoter leading to readthrough transcription of gloA ( Gray et al. , 2013 ; Lee et al. , 2013 ) in agreement with our independent observations here . 
+ Lee et al. ( 2013 ) report that Cys21 and Cys116 are critical for responding to electrophiles and propose a model in which NemR regulation is mediated by the formation of Cys21 -- Cys21 and Cys116 -- Cys116 disulphide bonds on the dimeric protein . 
+ Gray et al. ( 2013 ) , who studied the HOCl-responsiveness of NemR , conclude that oxidation of Cys106 is sufficient for NemR 's ability to respond to bleach ( HOCl ) and other reactive chlorine species . 
+ Upregulation of the frmRAB operon , encoding a formaldehyde detoxification system , under MG stress may also be interpreted in the context of repressor alkylation / modification , since the FrmR protein also contains a conserved cysteine residue . 
+ Similarly , transcription of the yqhD gene , encoding a non-speciﬁc aldo-keto reductase activity , is regulated by YqhC , a cysteine-rich protein encoded upstream of yqhD ( Fig. 5F ) ( Lee et al. , 2010 ) . 
+ The yqhD-dkgA and nemRA-gloA operons can be induced by a diverse range of reactive molecules ( Turner et al. , 2011 ; Gray et al. , 2013 ; Lee et al. , 2013 ) supporting the hypothesis that induction of the above mentioned detoxiﬁcation systems is a general consequence of the electrophilic nature of MG . 
+ Studies with B. subtilis show that both formaldehyde and MG elicit a stress response characteristic for thiol-reactive , non-aldehyde electrophiles , such as quinones and diamide ( Nguyen et al. , 2009 ) . 
+ Moreover , the authors demonstrated an essential role for cysteine modification in the transcriptional regulator , AdhR , in response to formaldehyde and MG . 
+ Thus , while some transcriptional responses are undoubtedly protective , others simply reﬂect the protein damage via cysteine modiﬁcation . 
+ Our previous work has established that three major variables have the potential to lower the sensitivity to MG : a low activity for the Kdp system , increased expression of both KefGB and GlxI , leading to enhanced potassium efflux and cytoplasmic acidiﬁcation and ultimately enhanced protection . 
+ However , Kdp expression is increased by the presence of MG , which is counterprotective ( Ferguson et al. , 1996 ) , as confirmed here ( Fig. 9C ) . 
+ In this study we saw no evidence for increased expression of the KefGB and KefFC systems ( Fig. 5B and C ) that could have countered the effects of increased Kdp activity . 
+ In contrast , increased expression of gloA leading to elevated GlxI activity and thus greater activation of KefGB ( MacLean et al. , 1998 ; Ozyamak et al. , 2010 ) , can arise by readthrough from the nemRA operon as noted above . 
+ Although the scale of gloA mRNA change and ChIP-chip signals ( Figs 4 and 5A respectively ) is small , our previous studies have shown that a 30 -- 50 % increase in GlxI activity would be sufficient to cause a very large change in survival ( MacLean et al. , 1998 ) . 
+ The gene order nemRA-gloA is conserved among the γ-proteobacteria and the ChIP-chip data here suggest that transcriptional readthrough from the nemRA operon into gloA arises at concentrations of MG that are just sufficient to cause growth inhibition ( ~ 0.4 mM ) ( Fig. 7A , Figs S4 and S5B ) . 
+ The lack of a strong terminator signal between nemA and gloA provides a mechanism for amplifying the activity of GlxI when cells encounter inhibitory levels of MG . 
+ An independent σ70 promoter has been predicted to lie 5 ' to gloA ( Fig. S1 ) , which might function to produce the ` housekeeping ' level of GlxI observed in cells not previously exposed to the electrophile ( Fig. 6B ) . 
+ Moreover , the gloA gene is expressed from multicopy plasmids lacking the upstream nemRA genes , which is consistent with the presence of a functional promoter ( MacLean et al. , 1998 ) . 
+ A recent study proposes that the nemRA-gloA genes constitute a system for the reduction of quinones and glyoxals , and points towards a similar transcriptional organization in some eukaryotic organisms ( Lee et al. , 2013 ) . 
+ However , a distinction has to be made between different glyoxals ( glyoxal and MG ) in terms of cell physiology . 
+ In a previous study the authors have shown that YqhD is the major detoxifying enzyme for glyoxal and that the GlxI & II system does not serve as an efficient pathway for its detoxiﬁcation ( Lee et al. , 2010 ) . 
+ Moreover , it is worth noting that it is unknown whether glyoxal elicits the activation of the KefGB and KefFC systems as MG does . 
+ Interestingly , another study shows that HOCl stress can result in the increased production of MG in E. coli ( Gray et al. , 2013 ) . 
+ The authors suggest the relevance of the nemRA-gloA gene organization , regulated by the HOCl-sensitive NemR , to be that cells anticipate the production of MG and induce the protective GlxI enzyme . 
+ Our data highlight the concentration-dependent nature of responses when MG accumulates progressively , and correlate this with the effect on growth and survival . 
+ At low MG concentrations ( < 0.1 mM , a concentration that only slightly inhibits growth ; MacLean et al. , 1998 ) a limited number of major changes occurred affecting selected operons ( Fig. 8A and B ) , but wide-ranging changes in gene expression were evident at later time points ( MG concentration > 0.4 mM ; Fig. 7B ) , with repression dominating over induction ( Fig. 8C and D ) . 
+ The lack of RNAP at these repressed loci can not be due to generalized inhibition of transcription by MG , since there were also major new peaks of RNAP binding , reflecting new promoter recognition patterns ( Fig. 8C and D ) and specific increases in mRNA ( Fig. S4 ) , indicating transcription of these genes . 
+ One interesting observation is the change in peak geometry as a function of increasing MG exposure . 
+ At the lowest MG concentrations , an even distribution of RNAP was observed across the TU ( e.g. xylFGHR , manXYZ and malT in response to cAMP addition ; Fig. 7C , Fig. S5A and B ) . 
+ However , as MG accumulated , peaks became skewed towards promoter regions ( Fig. 7C , Fig. S3A and B ) , suggesting that at high MG concentrations transcription can become paused at the promoter , leading to the observed skewed peak geometry . 
+ The degree of skewing of the profiles is gene- and operon-specific , indicating that the DNA sequence may itself play a role in determining the processivity of the RNAP in the presence of MG . 
+ At the time of assay at which skewing becomes evident ( 120 min ) the majority of the population is still viable ( Fig. 7B ) . 
+ Moreover in the equivalent Type I and II experiments mRNA is still being produced ( Fig. 4 ) and thus dead cells should not be the principal reason for the changed RNAP distribution . 
+ Guanine is the base most readily modiﬁed in the presence of MG ( Krymkiewicz , 1973 ; Ferguson et al. , 2000 ) . 
+ One possibility is that the metabolism of guanine and adenine nucleotides ( cAMP , ATP , GTP , ppGpp and pppGpp ) has been affected , with pleiotropic consequences for RNAP activity , which would be expected to affect genes and operons differentially . 
+ This analysis shows that E. coli mounts a strong transcriptional response to MG exposure , but that this may predominantly reﬂect the covalent modiﬁcation of speciﬁc proteins and of DNA bases rather than integration of gene expression through a master regulator . 
+ Only the expression of the kdp genes appears to respond speciﬁcally to the activation of the protective KefGB system by MG . 
+ With the important exception of GlxI ( and here only after exposure to high concentrations of MG ) the genes for the protective pathways ( KefGB , KefFC , GlxII , GSH biosynthesis ) are not increased . 
+ This is consistent with our previous analysis that the dynamics of activation of KefGB are a critical determinant of survival ( Ferguson et al. , 1993 ; MacLean et al. , 1998 ; Ozyamak et al. , 2010 ) . 
+ Although LexA/RecA is the regulatory switch for the SOS regulon , there is no precedent for these proteins being directly modulated by MG . 
+ Thus , the transcriptional changes reflect the imbalance between intoxication , detoxification and protection , and damage and repair , and a limited integration of cellular metabolism with the activation of KefGB is achieved through the formation of GSH adducts . 
+ Experimental procedures
+ Strains and media
+ All experiments were performed with E. coli K-12 MG1655 and isogenic deletion mutants as listed in Table S1 . 
+ E. coli K-12 strains other than MG1655 were used to create the MG1655 derivatives ( see Supporting information ) . 
+ Depending on the experimental design cells were grown either in K0 .2 minimal medium containing ~ 0.2 mM K + or K115 minimal medium containing ~ 115 mM K + ( Epstein and Kim , 1971 ) . 
+ Both media were supplemented with 0.2 % ( w/v ) glucose , 0.0001 % ( w/v ) thiamine , 0.4 mM MgSO4 and 6 mM ( NH4 ) 2SO4 · FeSO4 . 
+ In experiments conducted to stimulate MG production cells were grown in K0 .2 medium with 0.2 % ( w/v ) xylose as the sole carbon source and supplemented with 2 mM cAMP . 
+ Solid media contained 14 g l-1 agar . 
+ To prepare solid K0 .2 medium the agar was ﬁrst washed with a 1 M NaCl solution to displace trace amounts of K + and then washed several times with distilled water prior to use in plates . 
+ Growth conditions and in vivo cross-linking for ChIP-chip of RNAP
+ Generally , overnight cultures were grown for at least 16 h at 37 °C ( 250 rpm ) and diluted into fresh pre-warmed medium to OD650 ~ 0.05 . 
+ Cells were grown to the relevant growth phase ( see schema in Dataset S1 ) and cross-linked by adding 1 % formaldehyde and incubation at 22 °C for 20 min ( 70 rpm ) . 
+ Excess formaldehyde was quenched with 0.5 M glycine and the cells were incubated for 5 min at 22 °C with shaking . 
+ Typically ~ 110 cells were harvested by centrifugation at 4 °C , washed three times with ice-cold Tris-buffered-saline ( pH 7.5 ) and cell pellets frozen at -20 °C . 
+ Cells in experiments investigating the RNAP redistribution were grown in K0 .2 medium under three growth regimes ( Type I -- III ; see schema in Dataset S1 ) . 
+ For Type I experiments two parallel cultures ( test and control ) were inoculated ( initial OD650 = 0.05 ) from a single overnight culture and grown to OD650 ~ 0.4 . 
+ The test culture was treated with 0.8 mM MG and both cultures were incubated further for 30 min before cross-linking . 
+ For Type II experiments a pre-culture was similarly grown to OD650 ~ 0.4 and cells were diluted 10-fold into prewarmed fresh media in the absence or presence of 0.8 mM MG and then were cross-linked after 2.5 , 10 and 30 min . 
+ For Type II experiments each time point involved the sacriﬁce of a complete ﬂask of culture , thus parallel ﬂasks , each derived from the original inoculum , were used and sacriﬁced at different times . 
+ In addition , we conducted control experiments to assess changes in RNAP distribution solely due to dilution of cells , by comparing changes in the diluted cells to those in the pre-culture . 
+ We did not observe signiﬁcant RNAP occupancy changes in these experiments ( Dataset S1 ) . 
+ Finally , to assess the potential impact of MG-induced DNA fragmentation on ChIP-chip experiments a series of controls were performed to investigate the recovery of DNA from MG-treated cells ( see Supporting information ) . 
+ Type III experiments involved the growth of cells in K0 .2 medium with 0.2 % ( w/v ) xylose as the carbon source . 
+ An overnight culture ( with xylose ) was grown for at least 24 h to allow the cells to adapt to the carbon source . 
+ A culture was grown to OD650 ~ 0.4 and a defined volume containing 6 × 110 cells was withdrawn to provide reference samples and cross-linked . 
+ The remainder of the culture was diluted 8-fold into pre-warmed fresh media in the presence of 2 mM cAMP ( test ) and cells were cross-linked after 10 , 30 , 60 , 120 and 240 min . 
+ In addition , cells were diluted into a control culture without cAMP and cells were cross-linked upon reaching OD650 ~ 0.15 ( approx. 120 min ) . 
+ As with the Type II experiments each time point involved the sacriﬁce of a complete ﬂask of culture and thus the data for different time points are derived from parallel cultures generated from a single inoculum . 
+ Subsequent ChIP-chip analysis of the cells collected at the different intervals compared changes to the reference samples from mid-exponential phase . 
+ All experiments have been replicated at least two times for ChIP-chip and independently replicated for mRNA pool determinations and assays of enzyme activities . 
+ In experiments comparing the RNAP occupancy in cells grown in K0 .2 and K115 media cultures were grown in the respective media overnight , diluted into fresh medium , grown from OD650 ~ 0.05 to OD650 ~ 0.4 , and the cells cross-linked . 
+ ChIP-chip procedure
+ Immunoprecipitation was carried out following the procedure described by Grainger et al. ( 2004 ) , with a modiﬁcation to the lysozyme-driven cell lysis protocol . 
+ Lysozyme was used at a ﬁnal concentration of 1 mg ml-1 ( L6876 , Sigma ) because we observed considerable variation in the efficiency of cell lysis ( 30 -- 100 % ) when a ﬁnal concentration of 10 mg ml-1 was used . 
+ The lysates were sonicated 12 times for 15 s each ( 1 min rest ) in an ice bath to shear the chromatin complexes using a Misonix sonicator 3000 ( output level 4 ) . 
+ The sonication procedure resulted in a DNA fragment range of 300 -- 1100 bp . 
+ ChIP experiments were performed using a mouse monoclonal antibody against the β subunit of RNAP ( W0002 ; Neoclone ) . 
+ Immunoprecipitated DNA samples were puriﬁed , but no ampliﬁcation step was performed . 
+ Samples were processed by OGT ( Oxford , UK ) to incorporate Cy3 or Cy5 dyes and hybridized onto OGT 4x44K high-density oligonucleotide arrays . 
+ Routinely , control samples were labelled with Cy3 and test samples were labelled with Cy5 . 
+ Data analysis
+ Data were normalized and transformed as detailed in Supporting information . 
+ We employed a combination of freely available data visualization and data analysis tools to detect and report peaks and supplemented the analysis with our newly developed software tool CamiScan to annotate reported peaks , enabling us to analyse large data sets more rapidly . 
+ For a more detailed description of data normalization and analysis see Supporting information . 
+ Cells were grown and treated exactly as for the ChIP experiments and RNA molecules stabilized by treating cells with RNAprotect Bacteria Reagent ( Qiagen ) . 
+ RNA was extracted using the RNAeasy Kit ( Qiagen ) and reverse transcribed using the First-Strand cDNA Synthesis Kit ( GE Healthcare ) . 
+ cDNA was quantiﬁed with a LightCycler 480 using SYBR Green ( Roche ) . 
+ For a list of primers and a more detailed description of data normalization and analysis see Supporting information and Table S4 . 
+ Cell viability and MG production assays
+ Assays were performed as previously described ( Ozyamak et al. , 2010 ) , except that cells were recovered on K0 .2 solid media for viability assays . 
+ The sensitivity of strains to MG was assessed using an MG disc assay as described in Supporting information . 
+ Acknowledgements
+ We pay tribute to the work of Gail Ferguson ( 1969 -- 2011 ) who elucidated many of the aspects of protection against MG in E. coli . 
+ Her insight is sadly missed by all who worked with her . 
+ The authors thank their colleagues for their contributions to discussions on this work , in particular Dr Morgiane Richards for discussions on data analysis . 
+ The work was supported by the Wellcome Trust ( GR040174 and 086903 ) , the University of Aberdeen ( C.A. and E.O. ) and the MRC ( E.O. ) and the BBSRC ( Grant No. BB/F003455/1 , SysMo initiative ) . 
+ IRB acknowledges generous support from The Leverhulme Trust ( award 2012-060/2 ) . 
+ Thanks also to David Grainger and Steve Busby for providing training and advice to E.O. and to colleagues at OGT ( Oxford ) for their support . 
+ Supporting information
+ Additional supporting information may be found in the online version of this article at the publisher 's web-site .
\ No newline at end of file
--- a/data/TEXT_FILES/useful_txt/23818864.txt 0 → 100644
View file @27818a9
+++ b/data/TEXT_FILES/useful_txt/23818864.txt 0 → 100644
View file @27818a9
+ Genome-scale Analysis of Escherichia coli FNR Reveals
+ Abstract 
+ FNR is a well-studied global regulator of anaerobiosis , which is widely conserved across bacteria . 
+ Despite the importance of FNR and anaerobiosis in microbial lifestyles , the factors that influence its function on a genome-wide scale are poorly understood . 
+ Here , we report a functional genomic analysis of FNR action . 
+ We find that FNR occupancy at many target sites is strongly influenced by nucleoid-associated proteins ( NAPs ) that restrict access to many FNR binding sites . 
+ At a genome-wide level , only a subset of predicted FNR binding sites were bound under anaerobic fermentative conditions and many appeared to be masked by the NAPs H-NS , IHF and Fis . 
+ Similar assays in cells lacking H-NS and its paralog StpA showed increased FNR occupancy at sites bound by H-NS in WT strains , indicating that large regions of the genome are not readily accessible for FNR binding . 
+ Genome accessibility may also explain our finding that genome-wide FNR occupancy did not correlate with the match to consensus at binding sites , suggesting that significant variation in ChIP signal was attributable to cross-linking or immunoprecipitation efficiency rather than differences in binding affinities for FNR sites . 
+ Correlation of FNR ChIP-seq peaks with transcriptomic data showed that less than half of the FNR-regulated operons could be attributed to direct FNR binding . 
+ Conversely , FNR bound some promoters without regulating expression presumably requiring changes in activity of condition-specific transcription factors . 
+ Such combinatorial regulation may allow Escherichia coli to respond rapidly to environmental changes and confer an ecological advantage in the anaerobic but nutrient-fluctuating environment of the mammalian gut . 
+ Introduction
+ Regulation of transcription initiation by transcription factors ( TFs ) is a key step in controlling gene expression in all domains of life . 
+ Genome-wide studies are revealing important features of the complexity of transcription regulation in cells not always apparent from in vitro studies . 
+ In eukaryotes , both the inhibition of TF binding by chromatin structure and the combinatorial action of multiple TFs contribute to the genome-wide pattern of TF binding and function [ 1 -- 5 ] . 
+ In contrast , our knowledge of transcriptional regulation by bacterial TFs stems largely from elegant in vitro experiments that have provided atomic resolution views of TF function [ 6 ] . 
+ Much less is known about how chromosome structure and combinatorial action affect bacterial TF binding and transcriptional regulation on a genome-wide scale [ 7 ] . 
+ Previous studies have suggested that , in contrast to the chromatin-restricted TF binding in eukaryotes , the Escherichia coli genome is permissive to TF binding because the occupancy pattern for some TFs correlates well with match to consensus sequence and consequent binding affinity [ 8 -- 10 ] . 
+ Other studies suggest that nucleoid-associated proteins ( NAPs ; for example H-NS , Hu , Fis , and IHF ) organize the chromosome into discrete domains and structures that may affect transcriptional regulation [ 7,11 -- 13 ] , but possible global effects of NAPs on TF-binding have not been systematically tested . 
+ To investigate the roles of TF action and chromosome structure in a prototypical bacterial regulon , we studied the regulon of the anaerobic TF FNR . 
+ FNR is widely conserved throughout the bacterial domain , where it evolved to allow facultative anaerobes to adjust to O2 deprivation [ 14 ] . 
+ Under anaerobic conditions , E. coli FNR contains one [ 4Fe-4S ] cluster per subunit , which promotes a conformation necessary for FNR dimerization , site-specific DNA binding , and transcription regulation [ 15,16 ] . 
+ Genome-wide transcription profiling experiments [ 17 -- 19 ] established that E. coli FNR controls expression of a large number of genes under anaerobic growth conditions , in particular those genes whose products function in anaerobic energy metabolism . 
+ However , corresponding studies to establish which promoters are directly or indirectly regulated by FNR under comparable growth conditions have yet to be reported . 
+ Studies of the regulatory regions of a few FNR controlled promoters have provided key insights into the mechanism of transcriptional regulation by FNR and the characteristics of FNR binding sites [ 20,21 ] . 
+ From these studies we know that FNR binding sites can have only a partial match to the consensus sequence of TTGATnnnnATCAA , and be located at variable positions within promoter regions , directing whether FNR has either a positive or negative effect on transcription . 
+ At FNR repressed promoters , FNR binding site locations range from upstream of the -35 hexamer ( which binds region 4.2 of RNA polymerase σ70 ) , to overlapping the transcription start site ( TSS ; +1 ) . 
+ At most FNR activated promoters , the center of the binding site is ~ 41.5 nt upstream of the TSS , placing FNR in position to interact with both the σ70 and α subunits of RNA polymerase ( RNAP ) [ 21,22 ] . 
+ Very few promoters are known to have FNR binding sites centered at -61.5 or greater , a position dependent typically on interactions with only the α subunit of RNAP [ 21 ] . 
+ The predominance of FNR binding sites positioned at -41.5 nt may reflect a preference for a particular activation mechanism , but it also could reflect sample bias in the limited number of activated promoters that have been studied to date . 
+ Thus , current knowledge is insufficient to allow accurate prediction of FNR binding sites genome-wide . 
+ Many FNR regulated promoters are controlled by multiple TFs ( for example CRP , NarL , NarP , and NAPs [ 7,20,21 ] ) , which can have either positive or negative effects on FNR function depending on the promoter architecture . 
+ For example , the narG promoter is activated by FNR , IHF , and the nitrate-responsive regulator , NarL [ 23,24 ] ; in contrast , the dmsA promoter is activated by FNR , but repressed by NarL [ 25,26 ] . 
+ At the nir promoter , NarL displaces IHF to overcome a repressive effect of IHF and Fis , and thereby enhances FNR-dependent transcription [ 27 ] . 
+ Thus , in the presence of the anaerobic electron acceptor nitrate , FNR function can be either enhanced or repressed by NarL depending on the organization of TF-binding sites within the promoter region . 
+ In this way , the requirement of additional TFs for combinatorial regulation of promoters bound by FNR resembles transcriptional regulation in eukaryotes [ 28 ] . 
+ Such complex regulatory patterns can not currently be inferred simply by identifying the locations of TF binding sites or by the strength of the FNR binding site . 
+ Direct measure of occupancy at these sites by each TF and correlation with the resulting transcripts in different growth conditions is needed to understand how complex bacterial regulatory networks coordinate gene expression . 
+ As an important first step , Grainger et al. used chromatin immunoprecipitation followed by microarray hybridization ( ChIP-chip ) to examine FNR occupancy using a FLAG-tagged FNR protein in E. coli cultures grown anaerobically in a rich medium [ 29 ] . 
+ Although many new FNR binding sites were identified , these data were not obtained from cells grown in the growth media used for reported transcriptomic experiments [ 17 -- 19 ] and thus the datasets can not readily be compared . 
+ To systematically investigate FNR binding genome-wide , we performed chromatin immunoprecipitation followed by microarray hybridization ( ChIP-chip ) and high-throughput sequencing ( ChIP-seq ) for WT FNR from E. coli grown anaerobically in a glucose minimal medium ( GMM ) . 
+ Computational and bioinformatic analyses were used to refine a FNR position weight matrix ( PWM ) . 
+ The PWM was used to determine the relationship between ChIP-seq/ChIP-chip enrichment and match to the PWM , and to identify predicted FNR binding sites not detected by ChIP-seq . 
+ To examine the subset of high-quality predicted FNR binding sites that lacked a FNR ChIP-seq peak , we obtained and analyzed aerobic and/or anaerobic ChIP-chip data for NAPs H-NS and IHF along with analysis of previously published aerobic ChIP-seq data for the NAP Fis [ 30 ] to determine if NAP occupancy might prevent FNR binding . 
+ Further , the effect of H-NS on FNR occupancy was examined directly using ChIP-chip analysis of FNR as well as on O2 dependent changes in expression in the absence of H-NS and its paralog StpA . 
+ After identifying FNR binding sites genome-wide , we performed whole genome transcription profiling experiments using expression microarrays and high-throughput RNA sequencing ( RNA-seq ) to compare a WT and Δfnr strain grown in the same medium used for the DNA binding studies . 
+ The transcriptional impact of FNR binding genome-wide was investigated by correlating the occupancy data with the transcriptomic data to determine which binding events led to changes in transcription , to identify the direct and indirect regulons of FNR , and to define categories of FNR regulatory mechanisms . 
+ Finally , the aerobic and anaerobic ChIP-chip and ChIP-seq distributions of the σ70 and β subunits of RNAP throughout the genome were analyzed to determine the role of O2 and FNR regulation on RNAP occupancy and transcription . 
+ Results
+ TF binding sites were mapped genome-wide in E. coli K-12 MG1655 using ChIP-chip and/or ChIP-seq for FNR , the σ70 and β subunits of RNAP , H-NS , and IHF under aerobic or anaerobic growth conditions , as indicated ( Figure 1 ) . 
+ In addition , we analyzed a publicly available Fis data set collected under aerobic conditions [ 30 ] . 
+ The ChIP-chip distribution of the β subunit of RNAP suggested widespread transcription under both aerobic and anaerobic conditions , as expected , whereas the O2-dependent changes in β occupancy indicated those genes that are differentially regulated by O2 . 
+ Further , binding , and thus transcription , by the σ70 housekeeping form of E. coli RNAP was observed throughout the chromosome ; peak finding algorithms identified a large number of anaerobic σ70 ChIP-seq peaks ( 2,106 ) and aerobic σ70 ChIP-seq peaks ( 2,446 ) ( Table S1 ) . 
+ About 700 of the σ70 peaks showed statistically significant O2-dependent changes in occupancy ( Table S2 ) . 
+ The O2-dependent differences in RNAP occupancy suggest extensive transcriptional reprogramming in response to changes in O2 , providing an excellent model system for examining genome-scale changes in transcription . 
+ Comparison of the profiles of other DNA binding proteins indicated that the number of binding sites for NAPs genome-wide was much greater than for the TF FNR . 
+ ChIP-seq and ChIP-chip analyses identified 207 FNR peaks , 722 anaerobic H-NS enriched regions , 782 aerobic H-NS enriched regions , 1,020 anaerobic IHF enriched regions ( Tables S3 , S4 , and S5 ) and published analysis of Fis identified 1,464 aerobic enriched regions [ 30 ] . 
+ The unbiased distribution of H-NS and IHF throughout the chromosome supports previous genome-wide studies of these NAPs [ 30 -- 33 ] . 
+ H-NS is known to form filaments that cover multiple kb of DNA [ 7,12,13,30,34,35 ] and we observed that half of the identified aerobic ( 390 ) and anaerobic ( 356 ) H-NS enriched regions were over 1 kb in length , referred to as extended H-NS binding regions ( Table S3 ) . 
+ Comparison of the aerobic and anaerobic H-NS binding distributions suggests H-NS occupancy is not greatly affected by O2 ( Figure 1 ) . 
+ For FNR , the number of high-confidence ChIP peaks ( 207 ) identified ( Table S5 ) was just a few fold lower than the number of genes found to show FNR-dependent changes in expression ( between 300 -- 700 ) [ 17 -- 19 ] . 
+ These binding site data were used to determine features of FNR binding genome-wide . 
+ ChIP peak height did not correlate with similarity to the FNR consensus sequence 
+ A small number of FNR peaks showed a large degree of variation in peak height across the genome . 
+ Previous studies of the repressor LexA reported that ChIP-chip peak height correlated with the match to the consensus sequence [ 10 ] , suggesting that differences in site occupancy may reflect relative binding affinities to individual sites . 
+ Because FNR is a global regulator with a more degenerate binding site than LexA , we tested whether we could use this parameter to gain additional information about FNR binding-site preferences . 
+ A PWM ( Figure 2 Inset ) was constructed from an alignment of sequences from the ChIP-seq peaks and the scores representing the match to the PWM were determined with the algorithm PatSer ( Table S5 ) [ 36 ] . 
+ In contrast to the studies of LexA [ 10 ] , we found a poor correlation between the height of the FNR ChIP-seq peak and the match to the FNR PWM for the site predicted within each peak ( Figure 3A ) . 
+ The same lack of correlation was also observed with FNR ChIP-chip data , indicating that this was not specific to the detection method . 
+ Additionally , there was a lack of correlation between FNR peak height and the number of known FNR binding sites . 
+ Furthermore , the majority of the FNR ChIP-seq or ChIP-chip peaks had similar heights , regardless of the score of the FNR motif present ( Figure 3A ) . 
+ One explanation for this latter result is that most FNR binding sites were saturated for binding in vivo . 
+ To examine this possibility directly , we performed ChIP-chip experiments over a range of cellular FNR dimer concentrations below the normal anaerobic cellular level of ~ 2.5 µM [ 37 ] , controlled by varying IPTG levels in a strain with fnr fused to an IPTG-inducible promoter . 
+ Peak areas for 35 selected FNR sites , representing a distribution of peak heights , were quantified for several cellular FNR dimer concentrations ( ~ 0.45 , ~ 0.7 , ~ 1.9 , and ~ 2.5 µM ) . 
+ These plots showed a typical binding saturation curve for both novel and previously identified FNR binding sites , and revealed that all sites examined were saturated for binding at the normal cellular FNR dimer level of ~ 2.5 µM ( Figure 3B , Figure S1 ) . 
+ However , because the broad distribution of peak heights between different sites was still observed , despite the fact that the sites were maximally occupied , we concluded that variation in peak height was not related to strength of FNR binding ( Figure 3 , Figure S1 ) . 
+ As a control , we tested four FNR peaks that were determined to be non-specific due to enrichment in a Δfnr control ChIP-chip experiment and these peaks showed no change in peak height when FNR levels were varied ( Figure S1 ) . 
+ Thus , we conclude that differences in peak height in the ChIP-seq and ChIP-chip experiments for FNR were most likely due to differences in cross-linking efficiency or immunoprecipitation at particular genomic locations and not to differences in FNR binding affinity . 
+ Cross-linking of FNR to a subset of genomic locations may be inhibited by other proteins 
+ A well-known challenge in genomic studies is the use of computational tools to accurately predict DNA binding sites , particularly for global regulators like FNR that have degenerate binding sites . 
+ To investigate the usefulness of the PWM generated from our set of ChIP binding sites for predicting FNR sites genome-wide , we initially used a PatSer [ 36 ] threshold low enough that a FNR motif was identified in each FNR ChIP-seq peak . 
+ However , this threshold resulted in >10,000 possible genomic FNR binding sites . 
+ In contrast , if we used a precision-recall ( PR ) curve [ 38 ] to determine the optimal threshold to predict FNR binding sites ( ln ( p-value ) of −10.75 ) , then we obtained a more reasonable number ( 187 ) of predicted FNR binding sites ( Figure 2 , Table S6 ) . 
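+ The threshold selection described above can be sketched as follows . The text does not say which point on the PR curve was treated as optimal , so picking the cutoff with the highest F1 score is an assumption , as are the function names and the toy scores and labels ( 1 = candidate site overlaps a ChIP peak , 0 = it does not ) .

```python
def pr_curve(scores, labels):
    """Return (threshold, precision, recall) for every distinct score cutoff.

    A candidate is called a site when its score is >= the threshold.
    """
    total_pos = sum(labels)
    points = []
    for threshold in sorted(set(scores), reverse=True):
        called = [lab for s, lab in zip(scores, labels) if s >= threshold]
        true_pos = sum(called)
        points.append((threshold, true_pos / len(called), true_pos / total_pos))
    return points


def best_f1_threshold(scores, labels):
    """Pick the cutoff that maximizes F1 (an assumed optimality criterion)."""
    def f1(precision, recall):
        if precision + recall == 0:
            return 0.0
        return 2 * precision * recall / (precision + recall)

    return max(pr_curve(scores, labels), key=lambda p: f1(p[1], p[2]))[0]


# Hypothetical motif scores and peak-overlap labels, for illustration only.
toy_scores = [9, 8, 7, 6, 5, 4, 3]
toy_labels = [1, 1, 0, 1, 0, 0, 0]
toy_cutoff = best_f1_threshold(toy_scores, toy_labels)
```

+ A permissive cutoff recovers every true site but calls many false ones , which is the situation the >10,000-site threshold illustrates ; the PR-based cutoff trades a little recall for much higher precision .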
+ Surprisingly , fewer than half of these sites ( 63 of 187 ) corresponded with a FNR ChIP-seq peak ( Table S6 ) , despite the fact that some predicted sites without a corresponding ChIP-seq peak had higher quality PatSer scores than those with a ChIP-seq peak . 
+ Although it is possible that some of the predicted sites without a ChIP-seq peak contain flanking sequence elements that disfavor FNR binding , we considered the possibility that many are functional sites but either FNR binding was masked by other DNA binding proteins or FNR cross-linking failed for other reasons . 
+ NAPs are known to affect the binding of some TFs in E. coli [ 7,12 ] . 
+ To ask if the NAPs H-NS , IHF , or Fis might occlude the 124 predicted FNR binding sites lacking a FNR ChIP-seq peak , we analyzed ChIP-chip data for H-NS and IHF , obtained from the same growth conditions , and publicly available ChIP-seq data for Fis [ 30 ] . 
+ Nearly all of these FNR sites ( 111 of 124 sites ; ~90 % ; silent FNR sites ) were enriched in IHF , H-NS , or Fis , consistent with the idea that these NAPs occupy the silent FNR sites and thereby block FNR binding ( Table S6 , Figure S2 ) . 
+ Similar occupancy was observed when the 124 predicted FNR sites were compared with H-NS and IHF enrichment from published ChIP-chip and ChIP-seq data performed under different growth conditions [ 30,31,33 ] . 
+ In comparison , only 20 % ( 14 of 63 sites ) of the FNR sites that coincided with a FNR ChIP-seq peak were enriched in a NAP ChIP signal , significantly less than NAP occupancy at FNR sites lacking a peak ( p-value <0.05 ) . 
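+ The comparison of NAP enrichment at silent versus bound sites can be checked with a 2x2 contingency test . The text does not name the test used , so the one-sided Fisher exact test below is an assumption ; only the four counts ( 111 of 124 silent sites and 14 of 63 bound sites enriched for a NAP ) come from the text .

```python
from math import comb


def fisher_one_sided(a, b, c, d):
    """One-sided Fisher exact test for the 2x2 table [[a, b], [c, d]].

    Returns P(X >= a) under the hypergeometric null with fixed margins,
    i.e. the chance of seeing at least `a` enriched sites in the first row.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    upper = min(row1, col1)
    tail = sum(comb(col1, k) * comb(n - col1, row1 - k)
               for k in range(a, upper + 1))
    return tail / comb(n, row1)


# Rows: silent sites (no FNR peak), bound sites (FNR peak).
# Columns: NAP-enriched, not NAP-enriched.
p_value = fisher_one_sided(111, 13, 14, 49)
```

+ Under these assumptions the p-value falls far below the 0.05 cutoff quoted in the text .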
+ In contrast , we found ~50 % of the previously identified LexA binding sites [ 10 ] were co-occupied with H-NS . 
+ We conclude that the NAPs H-NS , IHF , or Fis likely prevent FNR binding at some sites by occlusion . 
+ We also examined whether the silent FNR sites are preferentially occluded by the extended H-NS binding regions . 
+ The extended binding regions of H-NS ( >1 kb ) likely represent H-NS filaments that are known to cover multiple kb of DNA and silence transcription [ 7,12,13,30,34 ] . 
+ Consistent with this notion , our results showed that the extended H-NS binding regions were negatively correlated with RNAP ( β ) ChIP-chip occupancy and this silencing occurred in both the presence and absence of O2 ( p-value <0.05 ) ( Figure S3 ) . 
+ In contrast , shorter H-NS enriched regions ( <1 kb ) were both positively and negatively correlated with RNAP ChIP-chip occupancy under aerobic and anaerobic growth conditions . 
+ The 46 silent FNR sites bound by H-NS were more likely to be occupied by extended H-NS binding regions ( 42 sites ) than by short H-NS binding regions ( 4 sites ) ( p-value <0.05 ; example in Figure S3C ) , suggesting that extended H-NS binding regions may inhibit FNR binding at silent FNR sites . 
+ To investigate the impact of H-NS binding on FNR occupancy , we characterized FNR ChIP-chip peaks in a strain deleted for both hns and stpA ; stpA encodes a H-NS paralog that partially compensates for H-NS in a Δhns mutant [ 39,40 ] . 
+ Many new FNR peaks ( 196 ) appeared in the Δhns/ΔstpA strain ( Figure 1 , that found in the WT strain ( Figure 4C , Tables S6 and S7 ) . 
+ The majority ( 78 of 99 ) of silent FNR sites lacking FNR peaks in the Δhns/ΔstpA strain were enriched for IHF and/or Fis , suggesting that these NAPs still occluded FNR binding in the absence of H-NS and StpA ( Table S6 ) . 
+ Taken together , these results establish that removal of H-NS and StpA allowed FNR to bind to sites 
+ Nearly all FNR peaks found in the WT strain were retained in the Δhns/ΔstpA mutant ( 163 of 169 peaks ; Figure 4A , Table S7 ) , but a small proportion ( ~15 % ) showed a significant increase in peak average ( average log2 ( IP/INPUT ) value of the binding region ) in the Δhns/ΔstpA strain ( Figure 4D ) . 
+ The majority of these FNR peaks with increased peak averages were also bound by H-NS in the WT strain , suggesting that removing H-NS allowed for increased cross-linking or immunoprecipitation of FNR at these loci likely due to changes in chromosomal structure in the absence of H-NS and StpA [ 35 ] . 
+ In contrast , removing H-NS did not affect FNR occupancy or cross-linking at locations lacking H-NS ChIP signal in WT strains . 
+ We conclude that H-NS reduces or blocks FNR binding at many locations in vivo . 
+ Operons in the FNR regulon were organized into seven regulatory categories 
+ To determine which FNR binding events from the WT strain caused a change in gene expression , the FNR occupancy data were correlated with the 122 operons differentially expressed ( DE ) by FNR ( Table S8 ) . 
+ Surprisingly , fewer than half of the 122 operons were correlated with a FNR ChIP-seq peak , while fewer than a fourth of the 207 FNR ChIP-seq peaks were correlated with a FNR-dependent change in expression ( Figure S4 ) . 
+ To address this unexpected result , we systematically analyzed the regulation of all of these operons by incorporating published data and classified the operons into seven regulatory categories ( Figure 5 ) . 
+ Category 1 ( Table 1 ) contained operons that were directly activated by FNR because they showed a FNR-dependent increase in anaerobic transcript levels and a FNR ChIP-seq peak within 500 nt of the translation start site of the first gene of an operon . 
+ Category 2 ( Table 1 ) contained operons that were directly repressed by FNR ( showed a FNR-dependent decrease in expression and had a FNR ChIP-seq peak ) . 
+ Categories 3 -- 5 contained a surprisingly large number of operons ( 156 ) with a FNR ChIP-seq peak within 500 nt of the translation start site of the first gene of an operon but no FNR-dependent change in expression . 
+ Previously published studies ( 23 operons ) and our additional collation of other relevant TF-binding sites ( 52 operons ) suggest that at least half ( 75 ) of these sites may be directly regulated by FNR under alternative growth conditions ( Table S9 ) . 
+ For example , Category 3 ( Tables 2 and 3 ) contained operons known or proposed to be co-regulated by FNR and another TF under growth conditions not used in our study . 
+ Category 4 ( Table 4 ) contained operons known to be repressed by another TF under our growth conditions . 
+ Category 5 ( Table S9 ) contained operons with other potential regulatory mechanisms . 
+ Category 6 ( Table 5 , Table S10 ) contained operons that were indirectly regulated by FNR because no FNR ChIP-seq peak was found within 500 nt of the translation start site despite showing a FNR-dependent change in expression . 
+ Finally , Category 7 ( Table S11 ) contained operons with a FNR peak identified only in the Δhns/ΔstpA strain , which also showed potential FNR regulation in the absence of H-NS and StpA . 
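+ The seven categories defined above amount to a small decision procedure , sketched below . The Operon fields , the order in which Categories 6 and 7 are tested , and the peak distances in the examples are hypothetical ; only the 500 nt window and the category definitions are taken from the text , and the split of Categories 3 to 5 depends on published data that are reduced here to two flags .

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Operon:
    name: str
    # Distance (nt) from the WT FNR ChIP-seq peak to the translation start
    # of the first gene; None when no WT peak was found.
    peak_distance_wt: Optional[int] = None
    fnr_effect: str = "none"  # "activated", "repressed" or "none"
    peak_in_mutant_only: bool = False  # peak seen only without H-NS and StpA
    coactivator_known: bool = False  # e.g. NarL, NarP or CRP
    repressed_by_other_tf: bool = False  # e.g. Fur


def categorize(op, window=500):
    """Assign an operon to regulatory Category 1-7, or None if none applies."""
    has_peak = op.peak_distance_wt is not None and abs(op.peak_distance_wt) <= window
    if has_peak:
        if op.fnr_effect == "activated":
            return 1  # direct activation
        if op.fnr_effect == "repressed":
            return 2  # direct repression
        if op.coactivator_known:
            return 3  # needs a co-activator not active in this condition
        if op.repressed_by_other_tf:
            return 4  # another TF masks FNR regulation
        return 5  # other potential mechanisms
    if op.fnr_effect != "none":
        return 6  # FNR-dependent expression change without a peak
    if op.peak_in_mutant_only:
        return 7  # site unmasked only in the mutant strain
    return None


# Operon names come from the text; distances are placeholders.
examples = [
    Operon("dmsABC", peak_distance_wt=-100, fnr_effect="activated"),
    Operon("ndh", peak_distance_wt=-80, fnr_effect="repressed"),
    Operon("feoABC", peak_distance_wt=-120, repressed_by_other_tf=True),
    Operon("hlyE", peak_in_mutant_only=True),
]
assigned = {op.name: categorize(op) for op in examples}
```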
+ Category 1 - Direct activation by FNR
+ The 32 operons directly activated by FNR ( Table 1 ) contain some of the best-studied FNR regulated operons . 
+ In addition to operons associated with anaerobic respiration ( dmsABC , frdABCD , nrfABCDEFG , narGHJI ) [ 41 -- 43 ] , this category included glycolytic ( pykA ) and fermentative enzymes ( pflB and ackA ) , which would be expected to promote mixed acid fermentation of glucose to ethanol , acetate , formate and succinate in the absence of an added electron acceptor ( Figure 6 ) , the conditions used in this study . 
+ As expected , we also found that these promoters showed an increase in σ70 occupancy , as illustrated by representative FNR and σ70 data for FNR activation of dmsABC ( Figure 7 ) , providing a proof-of-principle for our approach . 
+ While expression of many operons in this category was known to be FNR regulated , only about half had been shown to directly bind FNR ( Table 1 ) . 
+ FNR also directly activated operons with functions that illustrate the broader role of FNR in anaerobic metabolism : pepE , a peptidase , suggesting peptide degradation in E. coli similar to that observed in Salmonella [ 44 ] ; ynjE , an enzyme involved in biosynthesis of molybdopterin , a cofactor used by anaerobic respiratory enzymes [ 45 ] ; pyrD , a dihydroorotate dehydrogenase in pyrimidine biosynthesis [ 46 ] ; and ynfK , a predicted dethiobiotin synthetase and paralog of BioD of the biotin synthesis pathway . 
+ The activation of the biofilm TF bssR by FNR suggests a link between biofilm formation and anaerobiosis ( Table 1 ) . 
+ FNR directly activated the carnitine-sensing TF CaiF , confirming a link between FNR and carnitine metabolism [ 29,47 ] . 
+ In addition , the FNR-enriched region found upstream of fnrS supports FNR direct transcription activation of this small regulatory RNA [ 48,49 ] , although the fnrS sRNA was not represented in our gene expression arrays and was too small to be detected by our RNA-seq protocol ( Table 1 ) . 
+ To determine the position of FNR binding sites relative to the TSS , we used the FNR PWM ( Figure 2 Inset ) to search the FNR enriched regions using a PatSer score threshold low enough to identify FNR sites from every ChIP peak [ 36 ] . 
+ A majority ( 89 % ) of the FNR ChIP-seq peaks in the FNR direct regulon contained one FNR binding site ( Table 1 ) . 
+ Of the 23 promoters directly activated by FNR with a known TSS , 19 FNR sites were centered at −41.5 ( ±4 nt ) , the known position of a Class II site , while one site was centered at −60.5 ( Class I site ) ( Table 1 ) , supporting previous results suggesting a bias toward FNR binding Class II sites in activated promoters . 
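+ The half-integer positions quoted above follow from taking the midpoint of a site with an even number of bases . The sketch below shows the arithmetic and the ±4 nt window ; the function names are hypothetical , the example coordinates are illustrative , and sites outside the Class II window are simply labeled " other " because the text gives no window for Class I sites .

```python
def site_center(start, end):
    """Midpoint of a binding site given its first and last base relative to the TSS.

    Assumes both ends lie upstream of the TSS (negative coordinates), so the
    absence of a position 0 in promoter numbering does not matter.
    """
    return (start + end) / 2


def promoter_class(center, class_ii_center=-41.5, tolerance=4.0):
    """Label a site as Class II when its center is within the tolerance window."""
    if abs(center - class_ii_center) <= tolerance:
        return "Class II"
    return "other"


# A 14-bp site spanning -48 to -35 is centered at -41.5.
example_center = site_center(-48, -35)
example_class = promoter_class(example_center)
```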
+ Category 2 - Direct repression by FNR
+ Analysis of the 21 operons directly repressed by FNR revealed both simple and complex repression mechanisms ( Table 1 ) . 
+ The majority of the operons directly repressed by FNR showed expression patterns similar to that of ndh , encoding the aerobic NADH dehydrogenase II , which showed a FNR-dependent decrease in expression and decrease in σ70 occupancy under anaerobic growth conditions ( Figure 7 ) . 
+ These operons included nrdAB , the aerobic ribonucleotide reductase ; hisLGDC , a subset of the histidine biosynthesis enzymes ; fbaB , the class I fructose-1 ,6 - bisphosphate aldolase involved in gluconeogenesis ; and can , the carbonic anhydrase . 
+ FNR also repressed iraP , which encodes the anti-adaptor protein that stabilizes σS , and rmf , which encodes the stationary phase inducible ribosome modulation factor . 
+ In contrast , a subset of operons showed complex repression similar to cydAB , with an anaerobic dependent increase in expression despite the fact that anaerobic expression increased further in a strain lacking FNR , indicating partial repression ( Table 1 ) [ 50 ] . 
+ Nearly all of these operons are also co-regulated by ArcA ( Park and Kiley , Personal Communication ) suggesting that , like cydAB , FNR and ArcA co-regulation could lead to maximal expression of these genes under microaerobic conditions [ 50 ] . 
+ These operons include hdeD , gadE and hdeAB-yhiD , involved in acid stress response , and ompC and ompW , encoding outer membrane proteins . 
+ The finding that strains lacking ompC , rmf , and rpoS show decreased viability compared to single or double mutants [ 51 ] suggests that these proteins may function in a common stress response , potentially necessary under microaerobic growth conditions . 
+ Interestingly , for the 16 promoters directly repressed by FNR with a known TSS , the FNR binding sites were broadly distributed , ranging from −125.5 to overlapping the +1 ( Table 1 ) . 
+ In sum , these results indicate the surprising finding that FNR directly represses a broad set of functions , including some stress responses , expanding the role of FNR beyond simply repressing genes associated with aerobic respiration . 
+ Finally , comparison of the transcriptomic data to changes in σ70 holo-RNAP ChIP-seq occupancy under aerobic and anaerobic growth conditions revealed that nearly all FNR-regulated operons are expressed using σ70 RNAP . 
+ Increases or decreases in σ70 enrichment under anaerobic conditions correlated well , for the most part , with the expression changes for promoters activated or repressed by FNR , respectively , as well as expression changes in anaerobic and aerobic WT cultures ( Table 1 , Tables S2 and S12 ) . 
+ Three operons , which lacked σ70 enrichment , have been shown to be dependent on σE ( hcp-hcr ) [ 52 ] , σN ( hycABCDEFGHI ) [ 53 ] and σS ( fbaB ) [ 54 ] , raising the possibility that alternative σ factors transcribe a subset of the FNR direct regulon . 
+ Category 3 – Co-activation by another TF and FNR
+ Comparison of our FNR data with published regulatory data suggested that many FNR regulated operons were co-activated by TFs not active during growth in GMM , specifically NarL , NarP and CRP . 
+ For example , FNR-dependent transcription of napF-DAGHBC , encoding the periplasmic nitrate reductase , requires co-activation by the NO3− / NO2− sensing response regulator NarP [ 55 ] . 
+ Transcriptomic data [ 19 ] showed FNR and NarL or NarP dependent activation in the presence of NO3− and/or NO2− ( Table 2 ) [ 19 ] for nine operons that we found associated with FNR ChIP-seq peaks but lacking a FNR-dependent change in expression in our transcriptomic experiments , suggesting co-activation by NarL or NarP when NO3− and/or NO2− is present . 
+ Another possible co-activator of operons in this group is CRP , which is inactive under glucose fermentation conditions presumably because of decreased cAMP [ 56 ] . 
+ Although previous studies have shown that ansB is co-activated by FNR and CRP [ 57 ] , we did not observe binding of FNR upstream of ansB in this study , potentially due to differences in growth conditions . 
+ Nevertheless , 12 operons within this group showed an increase in anaerobic expression in transcriptomic data obtained from WT strains grown with carbon sources other than glucose ( e.g. glycerol , mannose , arabinose or xylose ) compared to growth in glucose ( Table 3 ) [ 19,58 ] ( Park and Kiley , Personal Communication ) . 
+ A majority ( nine ) contained distinct CRP and FNR binding sites , suggesting co-activation by FNR and CRP when glucose is absent and cAMP levels are increased ( Table 3 ) . 
+ Interestingly , for the other three of these operons , guaB , ptsH and uxaB , the identified FNR binding site overlapped the CRP binding site , suggesting potential competition between FNR and CRP for binding when both TFs are active ( Table 3 ) . 
+ Category 4 -- Repression by another TF prevents FNR regulation 
+ We propose that FNR activation of ten operons is repressed by Fur under the iron replete conditions used here , similar to the known regulation of feoABC , encoding a ferrous iron uptake transporter [ 59 ] . 
+ In addition to feoABC , nine additional operons known to be bound by Fur had a FNR ChIP-seq peak but lacked a FNR-dependent change in expression , suggesting that Fur repression masked FNR regulation of these operons ( Table 4 ) . 
+ Category 5 -- Other potential regulatory mechanisms with FNR 
+ Expression of several of the remaining operons associated with FNR ChIP-seq peaks is known to require other TFs that were not known to co-regulate with FNR , potentially explaining the lack of FNR-dependent regulation under our growth conditions . 
+ A subset of these FNR-regulated operons may be co-regulated by OxyR ( active under oxidative stress ) , CadC ( active at low external pH ) or PhoP ( active in low Mg2+ concentration ) ( Table S9 ) . 
+ In a recent SELEX study [ 60 ] , three BasR binding sites were identified upstream of operons containing FNR peaks but without a FNR-dependent change in expression , suggesting BasR could possibly influence FNR regulation at these three promoters ( Table S9 ) . 
+ In some cases , promoter architecture may mask FNR regulation . 
+ A small number of operons ( 12 ) contained multiple TSSs , raising the possibility that FNR may regulate transcription from a TSS that does not increase the total transcript levels to above the cutoff used in our analyses ( Table S9 ) . 
+ Alternative σ factors , active under other growth conditions , may also play a role in regulating transcription of a subset of these operons ( Table S9 ) . 
+ Taken together , we conclude that although FNR serves as a global signal for anaerobiosis , many operons likely require the combinatorial integration of TFs sensing other environmental signals for expression . 
+ Category 6 -- Indirect FNR regulation through hierarchical transcriptional regulator action 
+ Surprisingly , a large number of operons ( 70 ) were differentially expressed by FNR but were not associated with a FNR ChIP-seq peak , suggesting they are regulated by FNR indirectly ( Category 6 , Table S10 ) . 
+ To determine whether any of these operons had a FNR site upstream that was missed by ChIP-seq , sequences 500 nt upstream of these operons were searched using the FNR PWM and the algorithm PatSer with the PR curve determined threshold ( Figure 2 ) [ 36 ] . 
+ Only one operon , hmp , contained a predicted FNR-binding site and previous data also supported FNR binding to hmp [ 61 ] . 
+ Thus , 69 operons are indirectly regulated by FNR . 
+ The indirect regulation by FNR could be easily explained for 11 operons targeted by the small RNA FnrS , which is directly activated by FNR [ 48,49 ] . 
+ These RNAs increased in the FNR− strain because of the lower FnrS levels ( Table 5 ) [ 48,49 ] . 
+ Category 7 -- FNR regulation in the absence of H-NS and StpA 
+ To determine whether FNR binding to sites unmasked by the absence of H-NS and StpA caused a change in expression , we assayed if any of the corresponding genes were differentially expressed by O2 only in the Δhns/ΔstpA strain . 
+ Of the 158 new FNR peaks unmasked in the Δhns/ΔstpA strain , 18 genes showed an anaerobic increase in expression ( Table S11 ) , and consistent with this , many of the promoters contained a FNR binding site at a position associated with activation ( e.g. near −41.5 ) . 
+ For example , hemolysin E ( hlyE ) , in agreement with previous results [ 62 ] , and the anaerobic NAP Dan ( ttdR ) [ 63 ] showed increased expression under anaerobic conditions only . 
+ This suggests a possible role of Dan in the absence of H-NS and StpA . 
+ Only two genes showed a decrease in expression in the absence of H-NS and StpA ( yncD and feaR ) under only anaerobic growth conditions . 
+ However , the expression of the vast majority of genes having FNR bound at unmasked sites resulted in changes under both aerobic and anaerobic growth conditions , indicating that changes in nucleoid structure that occur in the absence of H-NS and StpA could cause misregulation of transcription . 
+ For example , H-NS and Rho coordinate to regulate transcriptional termination and the absence of H-NS may cause increased transcriptional readthrough of Rho-dependent terminators [ 64 ] . 
+ Thus , it seems likely that our analysis provides an underestimate of the impact of H-NS on FNR function , since physiological conditions that alter H-NS activity are likely to have less severe effects on nucleoid structure . 
+ Discussion
+ By combining genome-wide FNR occupancy data from ChIP-seq and ChIP-chip experiments with transcriptomic data , we uncovered new features of bacterial transcriptional regulation and the FNR regulon . 
+ Our findings suggest that in vivo FNR occupies only a subset of predicted FNR binding-sites in the genome , and that FNR binding can be blocked by NAPs like H-NS . 
+ Furthermore , the lack of correlation between match to consensus of FNR binding sites and ChIP enrichment suggests that variations in ChIP signal result from changes in cross-linking efficiency or epitope access rather than variable occupancy . 
+ We found that the FNR regulon is malleable ; the set of genes controlled by FNR can be readily tailored to changing growth conditions that may activate or inactivate other TFs , allowing flexible reprogramming of transcription . 
+ This strategy would allow the regulon to expand or contract depending on available nutrients , providing a competitive advantage in the ecological niche of E. coli of the mammalian gut [ 65 ] . 
+ FNR peak height does not correlate with the match to the FNR consensus site 
+ The finding that there was little relationship between peak height and the quality of the FNR motif differs from the results found for LexA , which showed a correlation between peak height and match to consensus [ 10 ] . 
+ Our data suggest that FNR peak height may be more related to the efficiency of cross-linking or immunoprecipitation since sites that appear to be saturated for binding displayed significantly different peak heights . 
+ Thus , at least for FNR , peak height can not be used to assess relative differences in site occupancy between chromosomal sites . 
+ Cross-linking or immunoprecipitation of FNR may be less efficient than for LexA because the larger number of other regulators bound at FNR-regulated promoters may affect accessibility to the cross ¬ 
+ FNR sites having either a strong match to consensus ( for example , ydfZ -- TTGATaaaaAACAA ) or a weak match ( for example , frdA -- TCGATctcgTCAAA ) were saturated for binding at FNR dimer concentrations at its cellular level ( ~2.5 µM ) [ 37 ] ; thus , in vivo most accessible FNR sites are likely to be fully occupied . 
+ These data also revealed that FNR occupancy was not significantly different for strong and weak sites over the tested range of FNR dimer concentrations , suggesting that in vivo FNR binding is unlikely to be dictated solely by the intrinsic affinity of FNR binding sites . 
+ Genome-wide data reveal FNR binding throughout the chromosome is influenced by other cellular factors beyond the presence of a FNR motif 
+ Our finding that not all predicted FNR binding sites are bound by FNR in vivo offers new insight into the accessibility of the genome for binding TFs . 
+ Previous studies have predicted anywhere from 12 to 500 FNR binding sites in the E. coli genome [ 66 -- 69 ] , depending on the algorithm used . 
+ Of the 187 FNR binding sites predicted here , only 63 contained a corresponding FNR ChIP-seq peak in the WT strain , suggesting many high quality FNR sites are not bound . 
+ Although some of these silent sites may result from false negatives in the ChIP experiments ( e.g. failure to immunoprecipitate FNR bound at some sites ) , only five of the 124 silent FNR sites ( acnA , aldA , hyfA , hmp and iraD ) showed any evidence of FNR regulation in prior studies [ 70 ] . 
+ Rather , several lines of evidence suggest that binding of NAPs or other TFs in vivo masks FNR binding at many of these sites . 
+ First , we observed that binding sites for the NAPs IHF , H-NS , and Fis were statistically overrepresented at the positions of silent FNR binding sites , suggesting these proteins occlude FNR binding . 
+ Second , we found that in the absence of H-NS and StpA , additional FNR binding sites became available for FNR binding as detected by ChIP , suggesting that NAPs influence FNR site availability in vivo . 
+ A similar effect has been observed in eukaryotes , where extensive research on TF site availability has shown that chromatin structure in vivo can block binding of TFs ( e.g. Pho4 , Leu3 and Rap1 ) to high quality DNA binding sites [ 1,2,4 ] . 
+ Additionally , known changes to chromosomal structure by IHF , Fis , and H-NS have been shown to inhibit DNA binding of other proteins [ 7,12,71 ] . 
+ Thus , if the binding profiles of NAPs change under alternative growth conditions , then the occluded FNR binding sites would likely become available for FNR binding . 
+ Nonetheless , the fact that the 207 FNR-enriched regions from this study included 80 % of the 63 regions identified by Grainger et al. ( Table S5 ) , despite the difference in the growth conditions and experimental design [ 29 ] , suggests that the overlapping subset of FNR binding events may reflect a core set that is insensitive to growth conditions or binding of other TFs . 
+ Furthermore , binding events specific to each growth condition may be reflective of either changes in accessibility of FNR to binding sites due to changes in DNA-binding protein distribution or perhaps increases in activity of a second TF that binds cooperatively . 
+ Other regulators , such as CRP , a closely related member of the FNR protein family , also appear to have more binding sites available genome-wide than are occupied in vivo under tested growth conditions . 
+ Shimada et al. identified 254 CRP-cAMP binding sites using Genomic SELEX screening , which was 3 -- 4 fold more than the number of CRP sites previously identified by ChIP-chip experiments [ 72,73 ] ; thus not all chromosomal CRP sites appear to be accessible for binding , although additional experiments would be required to explicitly examine the accessibility of CRP binding sites throughout the genome . 
+ Taken together , these results suggest that the restrictive effect of chromosomal structure could influence TF binding beyond FNR . 
+ Environmental stimuli that change NAP distribution would also change TF binding site accessibility and affect transcription . 
+ For example , as E. coli enters the mammalian GI tract , it experiences a temperature increase from ~25 °C to 37 °C , and this increase in temperature has been shown to affect transcription of a number of operons , including increased expression of anaerobic-specific operons [ 74,75 ] . 
+ Because H-NS binding is sensitive to changes in temperature [ 76,77 ] , an explanation for these temperature-dependent transcriptional changes [ 74,75 ] could be genome-wide decreases in H-NS binding and distribution ; these changes could increase the accessibility of the binding sites for FNR and other TFs to regulate transcription . 
+ Supporting this explanation , several genes with a temperature dependent increase in expression showed FNR binding and regulation in the absence of H-NS and StpA , including hlyE , feaR , yaiV , and torZ . 
+ The activity of NAPs can also be affected by the binding of other condition specific TFs . 
+ For example , ChIP-chip and Genomic SELEX analysis of the stationary phase LysR-type TF , LeuO , suggested that binding of LeuO antagonized H-NS activity , but not necessarily H-NS binding , throughout the genome in Salmonella enterica and E. coli [ 78,79 ] . 
+ Thus , a picture emerges from our data that binding of FNR is dependent on characteristics of the genome beyond the presence of a FNR binding site ; this restrictive effect of chromosome structure by NAPs may affect binding of other TFs in bacteria . 
+ NAPs have been shown to occlude and affect binding of TFs and other DNA binding proteins , such as restriction endonucleases and DNA methylation enzymes , suggesting a general role of NAPs in regulating genome accessibility by bending , wrapping and bridging the DNA structure [ 7,12,13,27,42,76,80,81 ] . 
+ Additionally , NAPs influence DNA supercoiling , which has been shown to affect binding of the TFs Fis and OmpR in S. enterica [ 82,83 ] , providing another mechanism by which NAPs can change the chromosomal structure to influence TF-DNA binding . 
+ Taken together , our results support a dynamic model of complex genome structure that affects TF binding to control gene regulation in bacteria . 
+ Condition-specific expression of the FNR regulon likely requires other transcription factors 
+ Although expression of a subset of the operons in the FNR regulon appeared to require only FNR for regulation ( Categories 1 and 2 ) , our findings point to widespread cooperation between FNR and other TFs for condition-specific regulation ( Categories 3 and 4 ) . 
+ Changes in activity of these TFs would result in FNR regulation to adapt to changes in environment , such as growth in non-catabolite repressed carbon sources ( CRP ) [ 57 ] , anaerobic respiration of nitrate ( NarL and NarP ) [ 19 ] , and growth in iron-limiting conditions ( Fur ) [ 84 ] . 
+ Although this co-regulation provides insight into growth conditions that should allow FNR-dependent changes in gene expression , the synergistic regulators for many promoter regions bound by FNR are currently unknown ( Category 5 ) , but would likely be identified in future genome-scale studies using different growth conditions , particularly microaerobic growth , which has been shown to affect FNR regulation of virulence genes in the pathogen Shigella flexneri [ 85 ] . 
+ Overall , our results suggest that the regulation of a subset of FNR-dependent promoters in E. coli may depend on combinatorial regulation with other TFs , a mechanism that resembles regulation of eukaryotic promoters [ 8,20,86 ] . 
+ These experimental data support previous in silico regulatory models generated using published data [ 87 -- 89 ] , suggesting combinatorial regulation may be common in E. coli . 
+ Further , ChIP-chip and ChIP-seq analyses of other TFs in E. coli ( e.g. CRP , Fis , and IHF ) and Salmonella typhimurium ( e.g. Sfh , a H-NS homolog ) , identified many TF binding sites that did not correlate with changes in gene expression in corresponding TF-specific transcriptomic experiments [ 30,33,73,90,91 ] . 
+ These results raise the possibility of potential combinatorial regulation for other TFs , although additional analysis is required to support this notion . 
+ The indirect FNR regulon also involves other regulators 
+ We found that FNR directly controls expression of five secondary regulators , most of which are also regulated by specific cofactors , suggesting that the scope of the indirect FNR regulon ( Category 6 ) is also likely to change depending on growth conditions . 
+ Of the five regulators , three act in an apparent hierarchical manner . 
+ The small RNA FnrS , which is upregulated by FNR and is suggested to stimulate mRNA turnover , decreased the mRNA levels of multiple FnrS target genes in GMM [ 48,49 ] . 
+ Expression of the TF CaiF was also activated by FNR , but the genes regulated by CaiF were not expressed in GMM because CaiF requires the effector carnitine to be active [ 92 ] . 
+ FNR activated BssR , a TF involved in biofilm formation . 
+ About 40 operons are thought to be controlled by BssR [ 93 ] , but none of the five BssR-dependent operons in the FNR indirect regulon that we tested by qRT-PCR showed any change in expression in a BssR− strain ( data not shown ) ; thus , under our growth conditions , BssR appeared to be inactive . 
+ FNR also directly repressed the expression of two TFs , including the pyruvate sensing TF PdhR which represses several operons in the absence of pyruvate [ 94,95 ] . 
+ Although one might expect that PdhR repressed genes would increase anaerobically , many of these genes are redundantly repressed by ArcA ( Park and Kiley , Personal Communication ) ; thus the impact of PdhR may be negligible under anaerobic growth in GMM . 
+ Similarly , the TF GadE , which is active at low pH [ 96 ] , was also directly repressed by FNR and accordingly the operons in the GadE regulon were not identified as part of the indirect FNR regulon in GMM . 
+ Finally , we note the caveat that some operons that appear indirectly regulated by FNR may change expression as a result of indirect physiological and metabolic effects in a FNR− strain , which may alter the activity of other TFs , resulting in misregulation of operons . 
+ For example , our data show that FNR does not directly regulate arcA transcription , but previous results have suggested that ArcA activity may be affected by the metabolic changes that occur when fnr is deleted [ 97 ] . 
+ Thus , although a subset of ArcA regulatory targets ( 29 operons ) showed potential indirect FNR regulation , such effects were likely caused by changes in the phosphorylation state of ArcA resulting from metabolic changes in a FNR− strain ( Table S13 ) ( Park and Kiley , Personal Communication ) . 
+ In conclusion , our results reveal complex features of TF binding in bacteria and expand our understanding of how E. coli responds to changes in O2 and other environmental stimuli . 
+ A subset of predicted FNR binding sites appear to be inhibited by NAPs and are available in the absence of H-NS and StpA , suggesting that the bacterial genome is not freely accessible for TF binding and that changes in TF binding site accessibility could result in changes in transcription . 
+ Finally , correlation of the occupancy data with transcriptomic data suggests that FNR serves as a global signal of anaerobiosis but the expression of a subset of operons in the FNR regulon requires other regulators sensitive to alternative environmental stimuli . 
+ This strategy is reminiscent of global regulation by CRP-cAMP [ 73 ] in that FNR , like CRP , is bound at many promoters under specific conditions without corresponding changes in mRNA levels , suggesting a common strategy whereby promoters are primed to be activated when the appropriate growth conditions are encountered . 
+ Materials and Methods
+ Strains and growth conditions
+ All strains were grown in MOPS minimal medium supplemented with 0.2 % glucose ( GMM ) [ 98 ] at 37 °C and sparged with a gas mix of 95 % N2 and 5 % CO2 ( anaerobic ) or 70 % N2 , 5 % CO2 , and 25 % O2 ( aerobic ) . 
+ Cells were harvested during mid-log growth ( OD600 of ~0.3 using a Perkin Elmer Lambda 25 UV/Vis Spectrophotometer ) . 
+ E. coli K-12 MG1655 ( F− , λ− , rph-1 ) and PK4811 ( MG1655 ΔfnrΩSpR/SmR ) [ 99 ] were used for the ChIP-chip , ChIP-seq and transcriptomic experiments unless otherwise specified . 
+ All data obtained in this study used GMM as the growth medium , and although we know that not all promoters directly regulated by FNR are expressed under these conditions , this has the advantage that both mutant and parental strains exhibit the same growth rate . 
+ For experiments that varied the in vivo concentration of FNR , a strain that contained a single , chromosomal copy of WT fnr under the control of the Ptac promoter at the λ attachment site was constructed . 
+ Following digestion of pPK823 [ 99 ] with XbaI and HindIII , the DNA fragment containing fnr was cloned into the XbaI and HindIII sites of pDHB60 ( ApR ) [ 100 ] to form pPK6401 . 
+ Plasmid pPK6401 was transformed into DHB6521 [ 100 ] and the Ptac-fnr construct was stably integrated into the λ attachment site using the Lambda InCh system as described [ 100 ] to produce PK6410 . 
+ P1vir transduction was used to move the Ptac-fnr , ApR allele into strain PK8257 , which contains the FNR activated ydfZ promoter-lacZ fusion and deletion of lacY . 
+ This strain was transformed with pACYClacIQ-CAM [ 101 ] to generate PK8263 . 
+ To determine the effect of FNR on the expression of the BssR regulon , a ΔbssR strain was constructed by P1vir transduction of ΔbssR::kanR from the Keio collection [ 102 ] into MG1655 to generate PK8923 . 
+ To determine the role of H-NS on FNR binding , first stpA was recombined with the CmR gene , cmr , using λ red recombination and the pSIM plasmid [ 103 ] . 
+ P1vir transduction introduced the Δhns::kanR allele from the Keio collection [ 102 ] into the strain lacking stpA to generate the Δhns/ΔstpA strain . 
+ RNA isolation
+ Total RNA was isolated as previously described [ 104 ] . 
+ The concentration of the purified RNA was determined using a NanoDrop 2100 , while the integrity of the RNA was analyzed using an Agilent 2100 Bioanalyzer and the RNA Nano LabChip platform ( Agilent ) . 
+ Whole genome transcriptomic microarray analysis
+ Total RNA ( 10 μg ) from two biological replicates each of MG1655 ( +O2 and −O2 ) and PK4811 was reverse transcribed using random hexamers ( Sigma ) and the SuperScript II Double-Stranded cDNA Synthesis Kit ( Invitrogen ) following the manufacturer 's protocol . 
+ The cDNA ( 1 μg ) was fluorescently labeled with Cy3-labeled 9 mers ( Tri-Link Biotechnologies ) with Klenow Fragment ( NEB ) for 2 hours at 37 °C and recovered using ethanol precipitation . 
+ Labeled dsDNA ( 2 μg ) was hybridized onto the Roche NimbleGen E. coli 4plex Expression Array Platform ( 4 × 72,000 probes , Catalog Number A6697-00-01 ) for ~16 hours at 42 °C in a NimbleGen Hybridization System 4 ( Roche NimbleGen ) following the manufacturer 's protocol . 
+ The hybridized microarrays were scanned at 532 nm with a pixel size of 5 μm using a GenePix 4000B Microarray Scanner ( Molecular Devices ) , and the PMT was adjusted until approximately 1 % of the total probes were saturated for fluorescence intensity . 
+ The data were normalized using the Robust Multichip Average ( RMA ) algorithm in the NimbleScan software package , version 2.5 [ 105 ] . 
+ ArrayStar 3.0 ( DNASTAR ) was used to identify genes that showed at least a two-fold change in expression between the WT and Δfnr strains and were significantly similar among biological replicates , using a moderated t-test ( p-value < 0.01 ) [ 106 ] . 
+ Genes were organized into operons using data from EcoCyc [ 70 ] . 
+ An operon was called differentially expressed ( DE ) if only one gene within an operon showed a statistically significant change in expression . 
+ NimbleGen microarrays identified 214 statistically significant DE genes that 
+ The anaerobic MG1655 and FNR− samples from the normalized whole genome expression microarray data from Kang et al. [ 18 ] were also analyzed . 
+ Genes were determined to be DE if they had a change in expression greater than or equal to two-fold and if the genes were found to be statistically similar between biological replicates using a t-test ( p-value < 0.01 ) . 
+ An operon was called DE if only one gene within an operon showed a statistically significant change in expression . 
+ This analysis identified 204 significant DE genes in 130 operons . 
+ Sixty operons were found to be DE in both the NimbleGen and Kang et al. data sets ( Table S8 ) . 
+ Of the 70 operons found DE in only the Kang et al. data set , 41 operons were just below the significance threshold in the NimbleGen data set and 11 operons resulted from activation of the flagellar regulon due to an insertion upstream of flhDC , which was absent in the isolate of MG1655 used in this study . 
+ The Δhns/ΔstpA aerobic and anaerobic expression data were obtained from strand-specific , single stranded cDNA hybridized to custom designed , high-density tiled microarrays containing 378,000 probes from alternate strands , spaced every ~12 bp through the genome as described previously [ 107 ] except Cy3 was used instead of Cy5 . 
+ Microarray hybridization and scanning were performed as described above except that the PMT was adjusted until the median background value was ~100 . 
+ All probe data were normalized using RMA in the NimbleScan software package , version 2.5 [ 105 ] . 
+ Gene probe values found to be significantly different between two biological replicates using a Benjamini & Hochberg corrected t-test ( p-value < 0.05 ) were eliminated from further analysis . 
+ Genes were called DE if the median log2 values were different by more than two-fold and if the genes were significantly different using an ANOVA test ( p-value < 0.05 ) . 
+ High-throughput RNA sequencing ( RNA-seq ) analysis 
+ To enrich for mRNA from total RNA , the 23S and 16S rRNA were removed using the Ambion MICROBExpress kit ( Ambion ) following manufacturer 's guidelines , except the total RNA was incubated with the rRNA oligonucleotides for one hour instead of 15 minutes . 
+ The rRNA depleted RNA samples isolated from two biological replicates of MG1655 and its FNR− derivative were processed by the Joint Genome Institute ( JGI ) for RNA-seq library creation and sequencing . 
+ The RNAs were chemically fragmented using RNA Fragmentation Reagents ( Ambion ) to the size range of 200 -- 250 bp using 1× fragmentation solution for 5 minutes at 70 °C ( Ambion ) . 
+ Double stranded cDNA was generated using the SuperScript Double-Stranded cDNA Synthesis Kit ( Invitrogen ) following the manufacturer 's protocol . 
+ The Illumina Paired End Sample Prep kit was used for Illumina RNA-seq library creation using the manufacturer 's instructions . 
+ Briefly , the fragmented cDNA was end repaired , ligated to Illumina specific adapters and amplified with 10 cycles of PCR using the TruSeq SR Cluster Kit ( v2 ) . 
+ Single-end 36 bp reads were generated by sequencing on the Illumina Genome Analyzer IIx , using the TruSeq SBS Kit ( v5 ) following the manufacturer 's protocol . 
+ Resulting reads were aligned to the published E. coli K-12 MG1655 genome ( U00096.2 ) using the software package SOAP , version 2.20 [ 108 ] , allowing no more than two mismatches . 
+ Reads aligning to repeated elements in the genome ( for example rRNA ) were removed from analysis . 
+ For reads that had no mapping locations for the first 36 bp , the 3 -- 30 bp subsequences were used in the subsequent mapping to the reference genome . 
+ Reads that had unique mapping locations and did not match annotated rRNA genes were used for further analysis . 
+ For each gene , the tag density was estimated as the number of aligned sequencing tags divided by gene size in kb and normalized using quantile normalization . 
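+ The per-gene tag density described above can be sketched as follows . This is a minimal illustration , not the pipeline used in the study : the function name , the dictionary layout and the gene labels are hypothetical , and the subsequent quantile normalization step is not shown . 

```python
def tag_density_per_kb(tag_counts, gene_lengths_bp):
    """Return aligned tags per kilobase of gene for every gene.

    tag_counts      -- dict mapping gene name to number of aligned tags
    gene_lengths_bp -- dict mapping gene name to gene length in base pairs
    Both dictionaries are hypothetical inputs used only for illustration.
    """
    densities = {}
    for gene, count in tag_counts.items():
        length_bp = gene_lengths_bp[gene]
        if length_bp <= 0:
            raise ValueError(f"gene {gene!r} has a non-positive length")
        # Divide the tag count by the gene size expressed in kb.
        densities[gene] = count / (length_bp / 1000.0)
    return densities


if __name__ == "__main__":
    counts = {"geneA": 300, "geneB": 50}      # made-up tag counts
    lengths = {"geneA": 1500, "geneB": 500}   # made-up gene lengths (bp)
    print(tag_density_per_kb(counts, lengths))
```

+ Dividing by gene size in kb keeps long genes from appearing more highly expressed simply because they collect more tags . 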
+ The tag density data were analyzed for statistically significant differential expression using baySeq , version 2.6 [ 109 ] with an FDR of 0.01 , and genes were organized into operons using data from EcoCyc [ 70 ] . 
+ An operon was called DE if only one gene within an operon showed a statistically significant change in expression . 
+ The RNA-seq analysis identified 133 statistically significant DE operons ( 197 genes ) . 
+ Altogether , microarray and RNA-seq experiments identified 258 operons DE by FNR and slightly fewer than half of these operons ( 122 ) were found in at least two of the transcriptomic experiments ( Figure S5 , Table S8 ) . 
+ Chromatin immunoprecipitation followed by hybridization to a microarray chip or high-throughput sequencing
+ ChIP assays were performed as previously described [ 110 ] , except that the glycine , the formaldehyde and the sodium phosphate mix were sparged with argon gas for 20 minutes before use to maintain anaerobic conditions when required . 
+ Samples were immunoprecipitated using polyclonal antibodies raised against FNR , IHF or H-NS , which had been individually absorbed against mutant strains lacking the appropriate protein . 
+ In the case of FNR , affinity purified antibodies were used in some experiments , purified using the method previously described [ 111 ] . 
+ For RNA Polymerase , a σ70 monoclonal antibody from NeoClone ( W0004 ) or an RNA Polymerase β monoclonal antibody from NeoClone ( W0002 ) was used for immunoprecipitation . 
+ For FNR , neither lengthening the cross-linking time nor increasing or decreasing the amount of FNR antibody used in the ChIP protocol showed significant changes in the FNR ChIP-chip peak heights or number of peaks identified . 
+ For ChIP-chip , FNR ( three samples ) , FNR− ( one sample ) , β ( two samples ) , H-NS ( two samples ) and IHF ( two samples ) were fluorescently-labeled using Cy3 ( INPUT ) and Cy5 ( IP ) and hybridized for ~16 hours at 42 °C in a NimbleGen Hybridization System 4 ( Roche NimbleGen ) to custom designed , high-density tiled microarrays containing 378,000 probes from alternate strands , spaced every ~12 bp through the genome . 
+ The hybridized microarrays were scanned at 532 nm ( Cy3 ) and 635 nm ( Cy5 ) with a pixel size of 5 μm using a GenePix 4000B Microarray Scanner ( Molecular Devices ) , and the PMT was adjusted until approximately 1 % of the total probes were saturated for fluorescence intensity of each dye used . 
+ The NimbleScan software package , version 2.5 ( Roche NimbleGen ) was used to extract the scanned data . 
+ ChIP-chip data were normalized within each microarray using quantile normalization ( `` normalize.quantiles '' in the R package VSN , version 3.26.0 ) [ 112 ] to correct for dye-dependent intensity differences as previously described [ 113 ] . 
+ Biological replicates were normalized between microarrays using quantile normalization as previously described [ 113 ] , and the normalized log2 ratio values ( IP over INPUT ) were averaged . 
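+ As an illustration of between-array quantile normalization followed by averaging of replicates , a minimal sketch is given below . It is not the R code used in the study ; the function names are hypothetical , each inner list stands for the log2 IP/INPUT values of one replicate array , and ties are broken by probe order , which is an assumption . 

```python
def quantile_normalize(columns):
    """Force every column (one per array) onto the same empirical distribution."""
    n = len(columns[0])
    if any(len(col) != n for col in columns):
        raise ValueError("all arrays must contain the same number of probes")
    # The mean of the k-th smallest value across arrays is the reference value for rank k.
    sorted_cols = [sorted(col) for col in columns]
    reference = [sum(col[k] for col in sorted_cols) / len(columns) for k in range(n)]
    normalized = []
    for col in columns:
        # Probe indices ordered from the lowest to the highest value on this array.
        order = sorted(range(n), key=lambda i: col[i])
        out = [0.0] * n
        for rank, probe in enumerate(order):
            out[probe] = reference[rank]
        normalized.append(out)
    return normalized


def average_replicates(columns):
    """Average the per-probe values across replicate arrays."""
    n = len(columns[0])
    return [sum(col[i] for col in columns) / len(columns) for i in range(n)]


if __name__ == "__main__":
    replicates = [[5.0, 2.0, 3.0], [4.0, 1.0, 6.0]]  # made-up log2 ratios
    normalized = quantile_normalize(replicates)
    print(normalized)
    print(average_replicates(normalized))
```

+ After normalization every array shares the same set of values , so differences between replicates reflect probe rank rather than array-wide intensity shifts . 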
+ There was a strong correlation between enriched regions of ChIP-chip biological replicates ( R = 0.7 ) . 
+ ChIP-chip peaks for FNR , H-NS and IHF were identified in each data set by the peak finding algorithm CMARRT , version 1.3 ( FDR of 0.01 ) [ 114 ] and proportional Z-tests were used to determine significant differences between proportional data . 
+ For ChIP-seq experiments , 10 ng of immunoprecipitated and purified DNA fragments from the FNR ( two biological replicates ) and σ70 samples ( two biological replicates from both aerobic and anaerobic growth conditions ) , along with 10 ng of input control , were submitted to the University of Wisconsin-Madison DNA Sequencing Facility ( FNR samples and one σ70 sample ) or the Joint Genome Institute ( one σ70 sample ) for ChIP-seq library preparation . 
+ Samples were sheared to 200 -- 500 nt during the IP process to facilitate library preparation . 
+ All libraries were generated using reagents from the Illumina Paired End Sample Preparation Kit ( Illumina ) and the Illumina protocol `` Preparing Samples for ChIP Sequencing of DNA '' ( Illumina part # 11257047 RevA ) as per the manufacturer 's instructions , except products of the ligation reaction were purified by gel electrophoresis using 2 % SizeSelect agarose gels ( Invitrogen ) targeting either 275 bp fragments ( σ70 libraries ) or 400 bp fragments ( FNR libraries ) . 
+ After library construction and amplification , quality and quantity were assessed using an Agilent DNA 1000 series chip assay ( Agilent ) and QuantIT PicoGreen dsDNA Kit ( Invitrogen ) , respectively , and libraries were standardized to 10 μM . 
+ Cluster generation was performed using a cBot Single Read Cluster Generation Kit ( v4 ) and placed on the Illumina cBot . 
+ A single-end read , 36 bp run was performed , using standard SBS kits ( v4 ) and SCS 2.6 on an Illumina Genome Analyzer IIx . 
+ Basecalling was performed using the standard Illumina Pipeline , version 1.6 . 
+ Sequence reads were aligned to the published E. coli K-12 MG1655 genome ( U00096.2 ) using the software packages SOAP , version 2.20 , [ 108 ] and ELAND ( within the Illumina Genome Analyzer Pipeline Software , version 1.6 ) , allowing at most two mismatches . 
+ Sequence reads with sequences that did not align to the genome , aligned to multiple locations on the genome , or contained more than two mismatches were discarded from further analysis ( < 10 % of reads ) . 
+ For visualization the raw tag density at each position was calculated using QuEST , version 1.2 [ 115 ] , and normalized as tag density per million uniquely mapped reads . 
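+ The per-million scaling mentioned above can be illustrated with a short sketch . QuEST computes the raw tag density itself ; the code below only shows the scaling step , and the function name and the example numbers are hypothetical . 

```python
def density_per_million(raw_density, uniquely_mapped_reads):
    """Scale raw per-position tag densities to tags per million uniquely mapped reads."""
    if uniquely_mapped_reads <= 0:
        raise ValueError("the number of uniquely mapped reads must be positive")
    scale = 1_000_000 / uniquely_mapped_reads
    return [value * scale for value in raw_density]


if __name__ == "__main__":
    raw = [10, 0, 25]  # made-up raw tag densities at three positions
    print(density_per_million(raw, uniquely_mapped_reads=5_000_000))
```

+ Scaling by sequencing depth makes tracks from libraries of different sizes comparable when they are viewed side by side . 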
+ The read density was determined for each base in the genome for the IP and INPUT samples for FNR and σ70 samples . 
+ For FNR , peaks were identified using three peak finding algorithms : CisGenome , version 1.2 , NCIS , version 1.0.1 , and MOSAiCS , version 1.6.0 [ 116 -- 118 ] ( FDR for all of 0.05 ) , while σ70 peaks were identified using NCIS , version 1.0.1 ( FDR of 0.05 ) . 
+ Further discussion of these algorithms is in Text S1 . 
+ Differences between aerobic and anaerobic σ70 ChIP-seq occupancy were determined using a one-sided , paired t-test ( p-value < 0.01 ) comparing 100 bp surrounding the center of each peak . 
+ To normalize between +O2 and −O2 samples , the read counts for the enriched regions ( peaks ) for each sample were shifted by the negative median read count value of the background ( un-enriched ) signal . 
+ The p-values were adjusted using the Bonferroni method to correct for multiple testing . 
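+ For readers unfamiliar with the Bonferroni method , a minimal sketch follows . The function names and the example p-values are hypothetical ; the only assumption is the standard definition , in which each p-value is multiplied by the number of tests and capped at 1 . 

```python
def bonferroni_adjust(p_values):
    """Multiply each p-value by the number of tests, capping the result at 1."""
    n_tests = len(p_values)
    return [min(1.0, p * n_tests) for p in p_values]


def significant_after_bonferroni(p_values, alpha=0.01):
    """Flag the tests whose adjusted p-value falls below alpha."""
    return [p_adj < alpha for p_adj in bonferroni_adjust(p_values)]


if __name__ == "__main__":
    raw = [0.001, 0.004, 0.5]  # made-up per-peak p-values
    print(bonferroni_adjust(raw))
    print(significant_after_bonferroni(raw))
```

+ The correction is conservative : with many peaks tested , only peaks with very small raw p-values remain significant . 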
+ There was a strong correlation between ChIP-seq biological replicates ( R = 0.8 ) as well as between ChIP-chip and ChIP-seq data ( Figure S6 ) . 
+ All data were visualized in the MochiView browser [ 119 ] . 
+ Additional ChIP-chip −O2 data sets were performed for WT FNR and a Δfnr [ 99 ] control . 
+ The 15 FNR peaks identified only in ChIP-chip had low IP/INPUT ratios and were eliminated since ChIP-seq is known to have increased signal to noise relative to ChIP-chip [ 120 ] . 
+ The Δfnr −O2 ChIP-chip data identified 71 peaks that corresponded to peaks in the FNR −O2 ChIP-seq data , indicating they were not FNR specific , and were removed from the FNR ChIP-seq dataset ( Table S5 ) . 
+ FNR PWM construction and identification of predicted FNR binding sites at FNR ChIP-seq peaks 
+ To construct the FNR PWM , the sequence of a region of ~100 bp around the nucleotide with the largest tag density within each of the FNR ChIP-seq peaks ( the summit of each peak ) found by all three peak finding algorithms was analyzed . 
+ MEME was used to identify over-represented sequences [ 121 ] and the Delila software package was used to construct the PWMs [ 122 ] . 
+ To search all ChIP-seq peaks for the presence of the FNR PWM , a region of 200 bp around the summit of each FNR ChIP-seq peak was searched with the FNR PWM using PatSer , version 3e [ 36 ] , and the top four matches to the FNR PWM , as determined by PatSer PWM score , were recorded at each ChIP-seq peak . 
+ The standard deviation of the PatSer scores for the four FNR predicted binding sites at each ChIP-seq peak was determined and used as a threshold to determine the number of predicted binding sites at each peak . 
+ If the PatSer predicted FNR binding site at a peak with the highest PatSer score was more than one standard deviation greater than the PatSer predicted FNR binding site with the second best PatSer score , that peak was identified as having only one predicted FNR binding site . 
+ For FNR peaks ( ~11 % ) with the two best PatSer predicted FNR binding site scores less than one standard deviation apart , a Grubbs test for outliers was used a single time to identify outliers within the four PatSer predicted FNR binding sites at a peak ( α of 0.15 , critical Z of 1.04 ) . 
+ If a PatSer predicted FNR binding site at a FNR peak was identified as an outlier , it was removed from analysis and the standard deviation was re-calculated using the remaining three PatSer binding site scores at that peak . 
+ The remaining PatSer predicted FNR binding sites at the FNR peak were then re-examined as described above . 
+ After removing outlier PatSer predicted FNR binding sites , a peak was determined to contain two predicted FNR binding sites if the two best predicted FNR binding sites at that peak had PatSer scores less than one standard deviation apart . 
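+ The decision rule described in the preceding sentences can be sketched as follows . This is an interpretation , not the authors ' code : the function name is hypothetical , the sample standard deviation is assumed , the Grubbs statistic is taken as the largest absolute deviation from the mean divided by the standard deviation , and the handling of the case in which the outlier is itself one of the two best scores is an assumption . 

```python
from statistics import mean, stdev

GRUBBS_CRITICAL_Z = 1.04  # critical Z quoted in the text for four scores


def count_predicted_sites(scores, critical_z=GRUBBS_CRITICAL_Z):
    """Return 1 or 2 predicted FNR sites given the top four PatSer scores at a peak."""
    ranked = sorted(scores, reverse=True)
    sd = stdev(ranked)  # sample standard deviation (an assumption)
    # Best score more than one SD above the second best: a single site.
    if ranked[0] - ranked[1] > sd:
        return 1
    # Otherwise apply a single Grubbs pass to look for one outlying score.
    centre = mean(ranked)
    suspect = max(ranked, key=lambda s: abs(s - centre))
    if sd > 0 and abs(suspect - centre) / sd > critical_z:
        ranked.remove(suspect)
        sd = stdev(ranked)  # recomputed from the three remaining scores
    # Re-examine the two best remaining scores with the (possibly updated) SD.
    return 1 if ranked[0] - ranked[1] > sd else 2


if __name__ == "__main__":
    print(count_predicted_sites([12.0, 5.0, 4.5, 4.0]))   # one dominant score
    print(count_predicted_sites([10.0, 9.8, 9.0, 2.0]))   # two similar top scores
```

+ Removing a single low outlier shrinks the standard deviation , which is what allows two closely matched top scores to be distinguished from a peak with one dominant site . 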
+ The precision-recall curve was constructed using the FNR PWM and searching throughout the genome using PatSer , version 3e [ 36 ] . 
+ Precision was defined as True Positives ( locations with a FNR ChIP-seq peak and a predicted FNR binding site ) divided by True Positives plus False Positives ( locations with a predicted FNR binding site but no FNR ChIP-seq peak ) . 
+ Recall was defined as True Positives divided by True Positives plus False Negatives ( locations with a FNR ChIP-seq peak but no FNR predicted binding site ) . 
+ A high precision value means that most predicted binding sites are true positives , although it is typically accompanied by a high false negative rate . 
+ A high recall value means that most true positives have been captured , although it is typically accompanied by a high false positive rate . 
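+ The two definitions can be made concrete with a small sketch . The function name and the location labels are hypothetical ; locations are treated as simple identifiers , whereas a real analysis would have to decide how close a predicted site must be to a peak to count as a match . 

```python
def precision_recall(predicted_sites, chip_peaks):
    """Compute precision and recall from predicted sites and observed ChIP-seq peaks."""
    predicted, peaks = set(predicted_sites), set(chip_peaks)
    true_pos = len(predicted & peaks)    # peak and predicted site
    false_pos = len(predicted - peaks)   # predicted site but no peak
    false_neg = len(peaks - predicted)   # peak but no predicted site
    precision = true_pos / (true_pos + false_pos) if predicted else 0.0
    recall = true_pos / (true_pos + false_neg) if peaks else 0.0
    return precision, recall


if __name__ == "__main__":
    predicted = {"loc1", "loc2", "loc3", "loc4"}  # made-up predicted sites
    observed = {"loc1", "loc2", "loc5"}           # made-up ChIP-seq peaks
    print(precision_recall(predicted, observed))
```

+ Sweeping the PatSer score threshold and recomputing both values at each threshold yields the points of a precision-recall curve . 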
+ Controlling expression of fnr with an IPTG-inducible promoter and performing ChIP-chip and analysis 
+ The strain with fnr under the control of Ptac ( PK8263 ) was used to study changes in [ FNR ] on ChIP-chip peak height . 
+ Cultures were grown anaerobically overnight in MOPS + 0.2 % glucose and were subcultured to a starting OD600 of ~0.01 in MOPS + 0.2 % glucose plus Cm20 and various [ IPTG ] ( 4 μM IPTG , 8 μM IPTG , and 16 μM IPTG ) . 
+ After this initial step , growth , ChIP-chip experiments ( two biological replicates of 4 and 8 μM IPTG and three biological replicates of 16 μM IPTG were used ) and initial analysis were identical to the procedures described above . 
+ Estimates of FNR concentration were determined by quantitative Western blot as previously described [ 37 ] . 
+ A novel method of normalization was developed to compare peak areas between IPTG concentrations for 35 peaks that showed a large distribution in peak heights and 4 peaks that were classified as false positives by enrichment in the Δfnr ChIP-chip sample . 
+ The peak finding algorithm CMARRT identified peaks in the WT FNR ChIP-chip sample , and this peak region was trimmed to include the center 50 % of the peak region . 
+ This trimmed region was used for each [ IPTG ] sample for consistency . 
+ For each of the 39 peaks examined , the probe values in a region of ~3000 bp beyond the peak boundary ( ~1500 bp upstream and downstream of the peak boundary ) were selected for analysis from each sample . 
+ Within the ~3000 bp region , the probes beyond the peak boundary were considered background for each sample . 
+ The median of the background ( un-enriched ) probes was calculated and the log2 IP / INPUT probe values for the entire peak region ( enriched and unenriched ) were shifted by the negative median value of the background probes . 
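+ The background shift described above amounts to subtracting the median of the flanking probes from every probe in the region . A minimal sketch is given below ; the function names and the example values are hypothetical , and the selection of the trimmed peak and its flanks is assumed to have been done already . 

```python
from statistics import median


def shift_by_background(region_values, background_values):
    """Subtract the median background log2 IP/INPUT value from every probe in the region."""
    background_median = median(background_values)
    return [value - background_median for value in region_values]


def peak_average(values):
    """Average of the (shifted) log2 IP/INPUT values within a peak."""
    return sum(values) / len(values)


if __name__ == "__main__":
    peak_probes = [2.0, 3.0, 4.0]      # made-up log2 IP/INPUT values inside the peak
    flank_probes = [0.5, 1.0, 1.5]     # made-up values from the flanking background
    shifted = shift_by_background(peak_probes, flank_probes)
    print(shifted, peak_average(shifted))
```

+ Centering each sample on its own background places peaks from different IPTG concentrations on a common baseline before their averages are compared . 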
+ The peak average ( average of log2 IP/INPUT values ) and standard deviation were determined for 39 peak regions to compare between samples at each [ IPTG ] and WT ChIP-chip samples . 
+ A one-sided , paired t-test was performed between all conditions ( p-value < 0.05 ) to determine statistically significant changes in average peak values . 
+ Comparing FNR enrichment in WT and Δhns/ΔstpA genetic backgrounds using ChIP-chip analysis
+ Growth , ChIP-chip experiments , normalization and peak calling were performed as described above . 
+ To normalize between WT and Δhns/ΔstpA samples , the enriched regions ( peaks ) for each sample were shifted by the negative median log2 IP/INPUT value of the background ( un-enriched ) probes . 
+ The peak averages ( average of log2 IP/INPUT values ) were determined for each condition ( WT and Δhns/ΔstpA ) at each FNR peak found in either strain background . 
+ A one-sided , paired t-test with Bonferroni correction was performed between the two conditions ( p-value < 0.05 ) to determine the statistically significant change in peak averages . 
+ For peaks found in both WT and Δhns/ΔstpA , peaks were identified as significantly higher in Δhns/ΔstpA using a one-sided , paired t-test with Bonferroni correction performed between the two conditions ( p-value < 0.05 ) and if the FNR peak average in the Δhns/ΔstpA strain was greater than the standard 
+ Data deposition and visualization
+ The ChIP-chip and ChIP-seq data can be visualized on GBrowse at the following address : `` http://heptamer.tamu.edu/cgi-bin/gb2/gbrowse/MG1655/ '' . 
+ All genome-wide data from this publication have been deposited in NCBI 's Gene Expression Omnibus ( GSE41195 ) ( Table S14 ) [ 123 ] . 
+ Supporting Information
+ and contain both novel and known FNR sites . 
+ A t-test ( p-value < 0.05 ) shows a statistically significant difference in peak average at all genes between 4 μM IPTG and 8 μM IPTG ( slyA and bssR ( panel H ) are significant at p-value < 0.1 ) . 
+ Panels Q through T are FNR ChIP peaks that were eliminated from further analysis because there was also enrichment in the Δfnr control experiment . 
+ A t-test ( p-value < 0.05 ) shows no statistically significant difference in peak average at these genes between any [ IPTG ] sample . 
+ ( EPS ) 
+ FNR sites in anaerobic GMM . 
+ E ) Distribution of H-NS binding region lengths throughout the genome in anaerobic GMM . 
+ ( EPS ) three transcriptomic data sets used in this study but lack a corresponding FNR ChIP-seq peak upstream ( Category 6 ) . 
+ ( EPS ) 
+ Expression - B ' ( operons found differentially expressed between WT and Δfnr from reanalysis of the Kang et al. data [ 18 ] ) . 
+ ( EPS ) 
+ Acknowledgments
+ We thank Gary Stormo for advice about PatSer and PWM searching , Erin Mettert and Erik Jessen for strain construction , Sarah Teter for affinity purified FNR antibody , and Nicole Beauchene and Dan Park for access to unpublished data . 
+ We also thank members of the Landick and Donohue lab for advice on data analysis and members of the Kiley lab for comments on the manuscript . 
+ Author Contributions
+ Conceived and designed the experiments : KSM RL PJK . 
+ Performed the experiments : KSM FT. Analyzed the data : HY IMO DC KL SK KSM . 
+ Contributed reagents/materials/analysis tools : DC KL SK . 
+ Wrote the paper : KSM RL PJK . 
+ 86 . 
+ Ishihama A ( 2010 ) Prokaryotic genome regulation : multifactor promoters , multitarget regulators and hierarchic networks . 
+ FEMS Microbiol Rev 34 : 628 -- 645 . 
+ 87 . 
+ Martínez-Antonio A , Janga SC , Salgado H , Collado-Vides J ( 2006 ) Internal-sensing machinery directs the activity of the regulatory network in Escherichia coli . 
+ Trends Microbiol 14 : 22 -- 27 . 
+ 88 . 
+ Martínez-Antonio A , Collado-Vides J ( 2003 ) Identifying global regulators in transcriptional regulatory networks in bacteria . 
+ Curr Opin Microbiol 6 : 482 -- 489 . 
+ 89 . 
+ Martínez-Antonio A , Janga SC , Thieffry D ( 2008 ) Functional organisation of Escherichia coli transcriptional regulatory network . 
+ J Mol Biol 381 : 238 -- 247 . 
+ 90 . 
+ Cho B-K , Knight EM , Barrett CL , Palsson BØ ( 2008 ) Genome-wide analysis of Fis binding in Escherichia coli indicates a causative role for A - / AT-tracts . 
+ Genome Res 18 : 900 -- 910 . 
+ 91 . 
+ Dillon SC , Cameron ADS , Hokamp K , Lucchini S , Hinton JCD , et al. ( 2010 ) Genome-wide analysis of the H-NS and Sfh regulatory networks in Salmonella Typhimurium identifies a plasmid-encoded transcription silencing mechanism . 
+ Mol Microbiol 76 : 1250 -- 1265 . 
+ 92 . 
+ Buchet A , Eichler K , Mandrand-Berthelot MA ( 1998 ) Regulation of the carnitine pathway in Escherichia coli : investigation of the cai-fix divergent promoter region . 
+ J Bacteriol 180 : 2599 -- 2608 . 
+ 93 . 
+ Domka J , Lee J , Wood TK ( 2006 ) YliH ( BssR ) and YceP ( BssS ) regulate Escherichia coli K-12 biofilm formation by influencing cell signaling . 
+ Appl Environ Microbiol 72 : 2449 -- 2459 . 
+ 94 . 
+ Quail MA , Guest JR ( 1995 ) Purification , characterization and mode of action of PdhR , the transcriptional repressor of the pdhR-aceEF-lpd operon of Escherichia coli . 
+ Mol Microbiol 15 : 519 -- 529 . 
+ 95 . 
+ Ogasawara H , Ishida Y , Yamada K , Yamamoto K , Ishihama A ( 2007 ) PdhR ( pyruvate dehydrogenase complex regulator ) controls the respiratory electron transport system in Escherichia coli . 
+ J Bacteriol 189 : 5534 -- 5541 . 
+ 96 . 
+ Hommais F , Krin E , Coppée J-Y , Lacroix C , Yeramian E , et al. ( 2004 ) GadE ( YhiE ) : a novel activator involved in the response to acid environment in Escherichia coli . 
+ Microbiology 150 : 61 -- 72 . 
+ 97 . 
+ Iuchi S , Aristarkhov A , Dong J , Taylor J , Lin E ( 1994 ) Effects of nitrate respiration on expression of the Arc-controlled operons encoding succinate dehydrogenase and flavin-linked L-lactate dehydrogenase . 
+ J Bacteriol 176 : 1695 -- 1701 . 
+ 98 . 
+ Neidhardt FC , Bloch PL , Smith DF ( 1974 ) Culture medium for Enterobacteria . 
+ J Bacteriol 119 : 736 -- 747 . 
+ 99 . 
+ Lazazzera BA , Bates DM , Kiley PJ ( 1993 ) The activity of the Escherichia coli transcription factor FNR is regulated by a change in oligomeric state . 
+ Genes Dev 7 : 1993 -- 2005 . 
+ 100 . 
+ Boyd D , Weiss DS , Chen JC , Beckwith J ( 2000 ) Towards single-copy gene expression systems making gene cloning physiologically relevant : lambda InCh , a simple Escherichia coli plasmid-chromosome shuttle system . 
+ J Bacteriol 182 : 842 -- 847 . 
+ 101 . 
+ Derman AI , Puziss JW , Bassford PJ Jr , Beckwith J ( 1993 ) A signal sequence is not required for protein export in prlA mutants of Escherichia coli . 
+ EMBO J 12 : 879 -- 888 . 
+ 102 . 
+ Baba T , Ara T , Hasegawa M , Takai Y , Okumura Y , et al. ( 2006 ) Construction of Escherichia coli K-12 in-frame , single-gene knockout mutants : the Keio collection . 
+ Mol Syst Biol 2 : 2006.0008 . 
+ 103 . 
+ Yu D , Ellis HM , Lee EC , Jenkins NA , Copeland NG , et al. ( 2000 ) An efficient recombination system for chromosome engineering in Escherichia coli . 
+ Proc Natl Acad Sci USA 97 : 5978 -- 5983 . 
+ 104 . 
+ Khodursky AB , Bernstein JA , Peter BJ , Rhodius V , Wendisch VF , et al. ( 2003 ) Escherichia coli spotted double-strand DNA microarrays : RNA extraction , labeling , hybridization , quality control , and data management . 
+ Methods Mol Biol 224 : 61 -- 78 . 
+ 105 . 
+ Irizarry RA , Bolstad BM , Collin F , Cope LM , Hobbs B , et al. ( 2003 ) Summaries of Affymetrix GeneChip probe level data . 
+ Nucleic Acids Res 31 : e15 . 
+ 106 . 
+ Smyth GK ( 2004 ) Linear models and empirical bayes methods for assessing differential expression in microarray experiments . 
+ Stat Appl Genet Mol Biol 3 : Article3 . 
+ 107 . 
+ Cho B-K , Zengler K , Qiu Y , Park YS , Knight EM , et al. ( 2009 ) The transcription unit architecture of the Escherichia coli genome . 
+ Nat Biotechnol 27 : 1043 -- 1049 . 
+ 108 . 
+ Li R , Yu C , Li Y , Lam T-W , Yiu S-M , et al. ( 2009 ) SOAP2 : an improved ultrafast tool for short read alignment . 
+ Bioinformatics 25 : 1966 -- 1967 . 
+ 109 . 
+ Hardcastle TJ , Kelly KA ( 2010 ) baySeq : Empirical Bayesian methods for identifying differential expression in sequence count data . 
+ BMC Bioinformatics 11 : 422 . 
+ 110 . 
+ Davis SE , Mooney RA , Kanin EI , Grass J , Landick R , et al. ( 2011 ) Mapping E. coli RNA Polymerase and associated transcription factors and identifying promoters genome-wide . 
+ Meth Enzymol 498 : 449 -- 471 . 
+ 111 . 
+ Witte K , Schuh AL , Hegermann J , Sarkeshik A , Mayers JR , et al. ( 2011 ) TFG-1 function in protein secretion and oncogenesis . 
+ Nat Cell Biol 13 : 550 -- 558 . 
+ 112 . 
+ Huber W , von Heydebreck A , Sültmann H , Poustka A , Vingron M ( 2002 ) Variance stabilization applied to microarray data calibration and to the quantification of differential expression . 
+ Bioinformatics 18 Suppl 1 : S96 -- S104 . 
+ 113 . 
+ Dufour YS , Landick R , Donohue TJ ( 2008 ) Organization and evolution of the biological response to singlet oxygen stress . 
+ J Mol Biol 383 : 713 -- 730 . 
+ 114 . 
+ Kuan PF , Chun H , Keleş S ( 2008 ) CMARRT : a tool for the analysis of ChIP-chip data from tiling arrays by incorporating the correlation structure . 
+ Pac Symp Biocomput : 515 -- 526 . 
+ 115 . 
+ Valouev A , Johnson DS , Sundquist A , Medina C , Anton E , et al. ( 2008 ) Genome-wide analysis of transcription factor binding sites based on ChIP-seq data . 
+ Nat Methods 5 : 829 -- 834 . 
+ 116 . 
+ Ji H , Jiang H , Ma W , Johnson DS , Myers RM , et al. ( 2008 ) An integrated software system for analyzing ChIP-chip and ChIP-seq data . 
+ Nat Biotechnol 26 : 1293 -- 1300 . 
+ 117 . 
+ Kuan PF , Chung D , Pan G , Thomson JA , Stewart R , et al. ( 2011 ) A statistical framework for the analysis of ChIP-seq data . 
+ J Am Stat Assoc 106 : 891 -- 903 . 
+ 118 . 
+ Liang K , Keleş S ( 2012 ) Normalization of ChIP-seq data with control . 
+ BMC Bioinformatics 13 : 199 . 
+ 119 . 
+ Homann OR , Johnson AD ( 2010 ) MochiView : versatile software for genome browsing and DNA motif analysis . 
+ BMC Biol 8:49 . 
+ doi : 10.1186 / 1741-7007-8-49 . 
+ 120 . 
+ Aleksic J , Russell S ( 2009 ) ChIPing away at the genome : the new frontier travel guide . 
+ Mol Biosyst 5 : 1421 -- 1428 . 
+ 121 . 
+ Bailey TL , Elkan C ( 1994 ) Fitting a mixture model by expectation maximization to discover motifs in biopolymers . 
+ Proceedings on the Second International Conference on Intelligent Systems for Molecular Biology : 28 -- 36 . 
+ 122 . 
+ Schneider TD , Stormo GD , Yarus MA , Gold L ( 1984 ) Delila system tools . 
+ Nucleic Acids Res 12 : 129 -- 140 . 
+ 123 . 
+ Edgar R , Domrachev M , Lash AE ( 2002 ) Gene Expression Omnibus : NCBI gene expression and hybridization array data repository . 
+ Nucleic Acids Res 30 : 207 -- 210 . 
+ 124 . 
+ Neuweger H , Persicke M , Albaum SP , Bekel T , Dondrup M , et al. ( 2009 ) Visualizing post genomics data-sets on customized pathway maps by ProMeTra - aeration-dependent gene expression and metabolism of Corynebacterium glutamicum as an example . 
+ BMC Syst Biol 3 : 82 . 
+ 125 . 
+ Li H , Lovci MT , Kwon YS , Rosenfeld MG , Fu XD , et al. ( 2008 ) Determination of tag density required for digital transcriptome analysis : application to an androgen-sensitive prostate cancer model . 
+ Proc Natl Acad Sci USA 105 : 20179 -- 20184 . 
+ 126 . 
+ Wu H , Tyson KL , Cole JA , Busby SJ ( 1998 ) Regulation of transcription initiation at the Escherichia coli nir operon promoter : a new mechanism to account for co-dependence on two transcription factors . 
+ Mol Microbiol 27 : 493 -- 505 . 
+ 127 . 
+ Sawers G , Kaiser M , Sirko A , Freundlich M ( 1997 ) Transcriptional activation by FNR and CRP : reciprocity of binding-site recognition . 
+ Mol Microbiol 23 : 835 -- 845 . 
+ 128 . 
+ Sawers G , Suppmann B ( 1992 ) Anaerobic induction of pyruvate formate-lyase gene expression is mediated by the ArcA and FNR proteins . 
+ J Bacteriol 174 : 3474 -- 3478 . 
+ 129 . 
+ Green J , Baldwin ML , Richardson J ( 1998 ) Downregulation of Escherichia coli yfiD expression by FNR occupying a site at 293.5 involves the AR1-containing face of FNR . 
+ Mol Microbiol 29 : 1113 -- 1123 . 
+ 130 . 
+ Tyson KL , Bell AI , Cole JA , Busby SJ ( 1993 ) Definition of nitrite and nitrate response elements at the anaerobically inducible Escherichia coli nirB promoter : interactions between FNR and NarL . 
+ Mol Microbiol 7 : 151 -- 157 . 
+ 131 . 
+ Bonnefoy V , DeMoss JA ( 1992 ) Identification of functional cis-acting sequences involved in regulation of narK gene expression in Escherichia coli . 
+ Mol Microbiol 6 : 3595 -- 3602 . 
+ 132 . 
+ Partridge JD , Browning DF , Xu M , Newnham LJ , Scott C , et al. ( 2008 ) Characterization of the Escherichia coli K-12 ydhYVWXUT operon : regulation by FNR , NarL and NarP . 
+ Microbiology 154 : 608 -- 618 . 
+ 133 . 
+ Ziegelhoffer EC ( 1996 ) FNR-dependent transcriptional regulation in Escherichia coli : in vitro investigations of DNA binding and transcriptional activation and repression . 
+ Madison , WI : University of Wisconsin - Madison . 
+ 134 . 
+ Filenko NA , Browning DF , Cole JA ( 2005 ) Transcriptional regulation of a hybrid cluster ( prismane ) protein . 
+ Biochem Soc Trans 33 : 195 -- 197 . 
+ 135 . 
+ Shalel-Levanon S , San K-Y , Bennett GN ( 2005 ) Effect of ArcA and FNR on the expression of genes related to the oxygen regulation and the glycolysis pathway in Escherichia coli under microaerobic growth conditions . 
+ Biotechnol Bioeng 92 : 147 -- 159 . 
+ 136 . 
+ Golby P , Kelly DJ , Guest JR , Andrews SC ( 1998 ) Transcriptional regulation and organization of the dcuA and dcuB genes , encoding homologous anaerobic C4-dicarboxylate transporters in Escherichia coli . 
+ J Bacteriol 180 : 6586 -- 6596 . 
+ 137 . 
+ Zientz E , Janausch IG , Six S , Unden G ( 1999 ) Functioning of DcuC as the C4-dicarboxylate carrier during glucose fermentation by Escherichia coli . 
+ J Bacteriol 181 : 3716 -- 3720 . 
+ 138 . 
+ Mettert EL , Kiley PJ ( 2007 ) Contributions of [ 4Fe-4S ] - FNR and integration host factor to fnr transcriptional regulation . 
+ J Bacteriol 189 : 3036 -- 3043 . 
+ 139 . 
+ Green J , Guest JR ( 1994 ) Regulation of transcription at the ndh promoter of Escherichia coli by FNR and novel factors . 
+ Mol Microbiol 12 : 433 -- 444 . 
+ 140 . 
+ Quail MA , Haydon DJ , Guest JR ( 1994 ) The pdhR-aceEF-lpd operon of Escherichia coli expresses the pyruvate dehydrogenase complex . 
+ Mol Microbiol 12 : 95 -- 104 . 
+ 141 . 
+ Govantes F , Orjalo AV , Gunsalus RP ( 2000 ) Interplay between three global regulatory proteins mediates oxygen regulation of the Escherichia coli cytochrome d oxidase ( cydAB ) operon . 
+ Mol Microbiol 38 : 1061 -- 1073 . 
+ 142 . 
+ Kim D , Hong JS-J , Qiu Y , Nagarajan H , Seo J-H , et al. ( 2012 ) Comparative analysis of regulatory elements between Escherichia coli and Klebsiella pneumoniae by genome-wide transcription start site profiling . 
+ PLoS Genet 8 : e1002867 . 
+ 143 . 
+ Gibert I , Barbé J ( 1990 ) Cyclic AMP stimulates transcription of the structural gene of the outer-membrane protein OmpA of Escherichia coli . 
+ FEMS Microbiol Lett 56 : 307 -- 311 . 
+ 144 . 
+ Shin D , Cho N , Heu S , Ryu S ( 2003 ) Selective regulation of ptsG expression by Fis . 
+ Formation of either activating or repressing nucleoprotein complex in response to glucose . 
+ J Biol Chem 278 : 14776 -- 14781 . 
+ 145 . 
+ Zheng D , Constantinidou C , Hobman JL , Minchin SD ( 2004 ) Identification of the CRP regulon using in vitro and in vivo transcriptional profiling . 
+ Nucleic Acids Res 32 : 5874 -- 5893 . 
+ 146 . 
+ Postma PW , Lengeler JW , Jacobson GR ( 1993 ) Phosphoenolpyruvate : carbohydrate phosphotransferase systems of bacteria . 
+ Microbiol Rev 57 : 543 -- 594 . 
+ 147 . 
+ Hutchings MI , Drabble WT ( 2000 ) Regulation of the divergent guaBA and xseA promoters of Escherichia coli by the cyclic AMP receptor protein . 
+ FEMS Microbiol Lett 187 : 115 -- 122 . 
+ 148 . 
+ Feng Y , Cronan JE ( 2010 ) Overlapping repressor binding sites result in additive regulation of Escherichia coli FadH by FadR and ArcA . 
+ J Bacteriol 192 : 4289 -- 4299 . 
+ 149 . 
+ Nørregaard-Madsen M , Mygind B , Pedersen R , Valentin-Hansen P , Søgaard-Andersen L ( 1994 ) The gene encoding the periplasmic cyclophilin homologue , PPIase A , in Escherichia coli , is expressed from four promoters , three of which are activated by the cAMP-CRP complex and negatively regulated by the CytR repressor . 
+ Mol Microbiol 14 : 989 -- 997 . 
+ 150 . 
+ Peekhaus N , Conway T ( 1998 ) Positive and negative transcriptional regulation of the Escherichia coli gluconate regulon gene gntT by GntR and the cyclic AMP ( cAMP ) - cAMP receptor protein complex . 
+ J Bacteriol 180 : 1777 -- 1785 . 
+ 151 . 
+ Zhang Z , Gosset G , Barabote R , Gonzalez CS , Cuevas WA , et al. ( 2005 ) Functional interactions between the carbon and iron utilization regulators , Crp and Fur , in Escherichia coli . 
+ J Bacteriol 187 : 980 -- 990 . 
+ 152 . 
+ Chen Z , Lewis KA , Shultzaberger RK , Lyakhov IG , Zheng M , et al. ( 2007 ) Discovery of Fur binding site clusters in Escherichia coli by information theory models . 
+ Nucleic Acids Res 35 : 6762 -- 6777 . 
+ 153 . 
+ Lavrrar JL , Christoffersen CA , McIntosh MA ( 2002 ) Fur-DNA interactions at the bidirectional fepDGC-entS promoter region in Escherichia coli . 
+ J Mol Biol 322 : 983 -- 995 . 
+ 154 . 
+ Christoffersen CA , Brickman TJ , McIntosh MA ( 2001 ) Regulatory architecture of the iron-regulated fepD-ybdA bidirectional promoter region in Escherichia coli . 
+ J Bacteriol 183 : 2059 -- 2070 . 
+ 155 . 
+ Brickman TJ , Ozenberger BA , McIntosh MA ( 1990 ) Regulation of divergent transcription from the iron-responsive fepB-entC promoter-operator regions in Escherichia coli . 
+ J Mol Biol 212 : 669 -- 682 . 
+ 156 . 
+ Zhang J , Zeuner Y , Kleefeld A , Unden G , Janshoff A ( 2004 ) Multiple sitespecific binding of Fis protein to Escherichia coli nuoA-N promoter DNA and its impact on DNA topology visualised by means of scanning force microscopy . 
+ Chembiochem 5 : 1286 -- 1289 . 
+ 157 . 
+ Young GM , Postle K ( 1994 ) Repression of tonB transcription during anaerobic growth requires Fur binding at the promoter and a second factor binding upstream . 
+ Mol Microbiol 11 : 943 -- 954 . 
+ 158 . 
+ Vassinova N , Kozyrev D ( 2000 ) A method for direct cloning of Fur-regulated genes : identification of seven new Fur-regulated loci in Escherichia coli . 
+ Microbiology 146 : 3171 -- 3182 . 
+ 159 . 
+ Stojiljkovic I , Bäumler AJ , Hantke K ( 1994 ) Fur regulon in gram-negative bacteria . 
+ Identification and characterization of new iron-regulated Escherichia coli genes by a fur titration assay . 
+ J Mol Biol 236 : 531 -- 545 . 
+ 160 . 
+ McHugh JP , Rodrıguez-Quin ̃ones F , Abdul-Tehrani H , Svistunenko DA , Poole RK , et al. ( 2003 ) Global iron-dependent gene regulation in Escherichia coli . 
+ A new mechanism for iron homeostasis . 
+ J Biol Chem 278 : 29478 -- 29486 .
\ No newline at end of file
--- a/data/TEXT_FILES/useful_txt/24146601.txt 0 → 100644
+++ b/data/TEXT_FILES/useful_txt/24146601.txt 0 → 100644
+ dPeak: High Resolution Identification of Transcription
+ Abstract 
+ Chromatin immunoprecipitation followed by high throughput sequencing ( ChIP-Seq ) has been successfully used for genome-wide profiling of transcription factor binding sites , histone modifications , and nucleosome occupancy in many model organisms and humans . 
+ Because the compact genomes of prokaryotes harbor many binding sites separated by only a few base pairs , applications of ChIP-Seq in this domain have not reached their full potential . 
+ Applications in prokaryotic genomes are further hampered by the fact that well studied data analysis methods for ChIP-Seq do not achieve the resolution required for deciphering the locations of nearby binding events . 
+ We generated single-end tag ( SET ) and paired-end tag ( PET ) ChIP-Seq data for σ70 factor in Escherichia coli ( E. coli ) . 
+ Direct comparison of these datasets revealed that although the PET assay enables higher resolution identification of binding events , standard ChIP-Seq analysis methods are not equipped to utilize PET-specific features of the data . 
+ To address this problem , we developed dPeak as a high resolution binding site identification ( deconvolution ) algorithm . 
+ dPeak implements a probabilistic model that accurately describes the ChIP-Seq data generation process for both the SET and PET assays . 
+ For SET data , dPeak outperforms or performs comparably to the state-of-the-art high-resolution ChIP-Seq peak deconvolution algorithms such as PICS , GPS , and GEM . 
+ When coupled with PET data , dPeak significantly outperforms SET-based analysis with any of the current state-of-the-art methods . 
+ Experimental validations of a subset of dPeak predictions from σ70 PET ChIP-Seq data indicate that dPeak can estimate locations of binding events with as high as 2 to 21 bp resolution . 
+ Applications of dPeak to σ70 ChIP-Seq data in E. coli under aerobic and anaerobic conditions reveal closely located promoters that are differentially occupied and further illustrate the importance of high resolution analysis of ChIP-Seq data . 
+ Introduction
+ Since its introduction , chromatin immunoprecipitation followed by high throughput sequencing ( ChIP-Seq ) has revolutionized the study of gene regulation . 
+ ChIP-Seq is currently the state-of-the-art method for studying protein-DNA interactions genome-wide and is widely used [ 1 -- 5 ] . 
+ ChIP-Seq experiments capture millions of DNA fragments ( 150 -- 250 bp in length ) that the protein under study interacts with using random fragmentation of DNA and a protein-specific antibody . 
+ Then , high throughput sequencing of a small region ( 25 -- 100 bp ) at the 5' end or both ends of each fragment generates millions of reads or tags . 
+ Sequencing one end and both ends are referred to as single-end tag ( SET ) and paired-end tag ( PET ) technologies , respectively ( Figure 1A ) . 
+ Standard preprocessing of these data involves mapping reads to a reference genome and retaining the uniquely mapping ones [ 6,7 ] . 
+ In PET data , start and end positions of each DNA fragment can be obtained by connecting positions of paired reads [ 8 ] . 
+ In contrast , the location of only the 5' end of each DNA fragment is known in SET data . 
+ The usual practice for SET data is to either extend each read to its 3' direction by the average library size , which is a parameter set in the experimental procedure [ 7 ] , or shift the 5' end position of each read by an estimate of the library size [ 9 ] . 
+ Then , genomic regions with large numbers of clustered aligned reads are identified as binding sites using one or more of the many available statistical approaches [ 6,7,9 -- 11 ] ( the first step in Figure 1C ) . 
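+ The two SET conventions described above can be sketched in a few lines of Python. This is a minimal illustration only: the function names, the inclusive 1-based coordinates, and the choice to pass the shift as an explicit argument are assumptions made for the example and are not taken from any of the cited tools.

```python
from typing import Tuple


def extend_read(five_prime: int, strand: str, library_size: int) -> Tuple[int, int]:
    """Extend a SET read from its 5' end toward its 3' direction.

    Returns an inclusive (start, end) interval of length `library_size`.
    Coordinates and naming are illustrative assumptions.
    """
    if strand == "+":
        return five_prime, five_prime + library_size - 1
    if strand == "-":
        return five_prime - library_size + 1, five_prime
    raise ValueError("strand must be '+' or '-'")


def shift_read(five_prime: int, strand: str, shift: int) -> int:
    """Shift the 5' end position of a read by `shift` bp toward its 3' direction."""
    if strand == "+":
        return five_prime + shift
    if strand == "-":
        return five_prime - shift
    raise ValueError("strand must be '+' or '-'")
```

+ In this sketch the caller decides how `shift` is derived from the library size estimate; the text above does not fix that relationship, so it is left as a parameter.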
+ Currently , the SET assay dominates all the ChIP-Seq experiments despite the fact that PET has several obvious , albeit less studied , advantages over SET . 
+ In PET data , paired reads from both ends of each DNA fragment can reduce the alignment ambiguity , increase precision in assigning the fragment locations , and improve mapping rates . 
+ This is especially advantageous for studying regulatory roles of repetitive regions of genomes [ 12,13 ] . 
+ Although many eukaryotic genomes are rich in repetitive elements , PET technology has not been extensively used with eukaryotic genomes [ 8,14 ] . 
+ One of the main reasons for this is that ChIP-Seq data is information rich even when the repetitive regions are not profiled [ 15 ] and that the PET assay costs 1.5 -- 2 times more than the SET assay . 
+ Put differently , given a fixed cost , PET sequencing results in a lower sequencing depth compared to SET sequencing . 
+ In contrast to eukaryotic genomes , prokaryotic genomes are highly mappable , e.g. , 97.8 % of the Escherichia coli ( E. coli ) genome is mappable with 32 bp reads . 
+ This reduces the appeal of the higher mapping rate of the PET assay for these genomes . 
+ In this paper , we systematically investigate advantages of the PET assay from a new perspective and demonstrate both experimentally and computationally that it significantly improves the resolution of protein binding site identification . 
+ Improving resolution in identifying protein-DNA interaction sites is a critical issue in the study of prokaryotic genomes because prokaryotic transcription factors have closely spaced binding sites , some of which are only 10 to 100 bp apart from each other [ 16 -- 19 ] . 
+ These closely spaced binding sites are considered to be multiple `` switches '' that differentially regulate gene expression under diverse growth conditions [ 17 ] . 
+ Therefore , identification and differentiation of closely spaced binding sites are invaluable for elucidating the transcriptional networks of prokaryotic genomes . 
+ Although many methods have been proposed to identify peaks from ChIP-Seq data ( reviewed in [ 20 ] ) , such as MACS [ 9 ] , CisGenome [ 6 ] , and MOSAiCS [ 10 ] , these approaches reveal protein binding sites only in low resolution , i.e. , at an interval of hundreds to thousands of base pairs . 
+ Furthermore , they report only one `` mode '' or `` predicted binding location '' per peak . 
+ More recently , deconvolution algorithms such as CSDeconv [ 21 ] , GPS [ 22 ] ( recently improved as GEM [ 23 ] ) , and PICS [ 11 ] have been proposed to identify binding sites in higher resolution . 
+ However , these methods are specific to SET ChIP-Seq data and are not equipped to utilize the main features of PET ChIP-Seq data . 
+ Although a relatively recent method named SIPeS [ 24 ] is specifically designed for PET data and is shown to perform better than MACS paired-end mode [ 9 ] , our extensive computational and experimental analysis indicated that this approach is not suited for identifying closely located binding events . 
+ To address these limitations , we developed dPeak , a high resolution binding site identification ( deconvolution ) algorithm that can utilize both PET and SET ChIP-Seq data . 
+ The dPeak algorithm implements a probabilistic model that accurately describes the ChIP-Seq data generation process and analytically quantifies the differences in resolution between the PET and SET ChIP-Seq assays . 
+ We demonstrate that dPeak outperforms or performs competitively with the available SET-specific methods such as PICS , GPS , and GEM . 
+ More importantly , dPeak coupled with PET ChIP-Seq data improves the resolution of binding site identification significantly compared to SET-based analysis with any of the available methods . 
+ Generation and analysis of σ70 factor PET and SET ChIP-Seq data from E. coli grown under aerobic and anaerobic conditions reveal the power of the dPeak algorithm in identifying closely located binding sites . 
+ Our study demonstrates the importance of high resolution binding site identification when studying the same factor under diverse biological conditions . 
+ We further support our findings by validating a small subset of our closely located binding site predictions with primer extension experiments . 
+ Results
+ Deeply sequenced E. coli σ70 SET and PET ChIP-Seq data
+ The σ70 factor is responsible for transcription initiation at over 80 % of the known promoters in E. coli [ 25 ] . 
+ σ70 combines with RNA polymerase to bind promoter sequences typically containing two consensus elements located at 35 bp and 10 bp upstream of the transcription start site [ 18 ] ; thus a σ70 binding site spans about 40 bp upstream from the transcription start site . 
+ Many E. coli genes contain multiple σ70 promoters , and much transcriptional regulation by oxygen as well as by other stimuli occurs by selection of one or a subset of the possible promoters in concert with binding of activators and repressors ( e.g. , ArcA and FNR for regulation by oxygen [ 17,19 ] ) . 
+ Understanding such regulation requires knowledge of precisely which promoters are used in a given condition . 
+ Therefore , the highest possible accuracy of ChIP-signal mapping will allow the best determination of promoter binding by σ70-RNA polymerase holoenzyme . 
+ We generated both PET and SET ChIP-Seq data for σ70 factor from E. coli grown under aerobic ( +O2 ) and anaerobic ( -O2 ) conditions in glucose minimal media on the HiSeq2000 and Illumina GA IIx platforms . 
+ We used these experimental data for comparisons of PET and SET assays and evaluation of our high resolution binding site detection method dPeak throughout the paper . 
+ Figure 1B displays PET and SET ChIP-Seq coverage plots for the promoter region of the cydA gene under the aerobic condition . 
+ The height at each position indicates the number of DNA fragments overlapping that position . 
+ The cydA promoter contains five known σ70 binding sites separated by 11 to 84 bp [ 25 ] . 
+ As evidenced in Figure 1B , coverage plots for PET and SET appear almost indistinguishable visually . 
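+ As a hedged illustration of how such a coverage profile can be computed, the sketch below counts, for every position in a region, the number of fragments overlapping it. The function name `coverage` and the inclusive coordinate convention are assumptions made for this example; for PET data the fragments would be the paired-read intervals, and for SET data they would be the extended reads.

```python
from typing import Iterable, List, Tuple


def coverage(fragments: Iterable[Tuple[int, int]],
             region_start: int, region_end: int) -> List[int]:
    """Number of fragments overlapping each position of an inclusive region.

    Uses a difference array: +1 where a fragment starts, -1 just past its end,
    followed by a running sum over the region.
    """
    length = region_end - region_start + 1
    diff = [0] * (length + 1)
    for start, end in fragments:
        lo = max(start, region_start)
        hi = min(end, region_end)
        if lo > hi:
            continue  # fragment lies entirely outside the region
        diff[lo - region_start] += 1
        diff[hi - region_start + 1] -= 1
    profile = []
    running = 0
    for step in diff[:length]:
        running += step
        profile.append(running)
    return profile
```

+ For example, `coverage([(1, 3), (2, 5)], 1, 5)` gives one value per position from 1 to 5.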
+ To further understand how peaks arising from multiple binding events in this region would appear , we simulated PET and SET data with parameters matching those of this region . 
+ Figures S1A , B , C in Text S1 display SET and PET coverage plots of this region when it harbors one and three binding events . 
+ These plots support that when binding events are in close proximity with distances less than the average library size , they appear as uni-modal peaks regardless of the library preparation protocol ( Figure S1C in Text S1 ) . 
+ We next evaluated two peak callers , MACS [ 9 ] and MOSAiCS [ 10 ] , both of which are specifically developed for SET data , on our SET and PET experimental datasets ( Table S1 in Text S1 ) . 
+ Both methods identified broad regions and the median widths of MACS peaks were 5 to 10 times larger than those of the MOSAiCS peaks . 
+ Detailed comparison of the MACS and MOSAiCS peaks revealed that each MACS peak on average has 1.54 to 2.23 MOSAiCS peaks ( Table S2 in Text S1 ) . 
+ Next , we evaluated the number of annotated σ70 binding events from RegulonDB [ 25 ] ( http://regulondb.ccg.unam.mx/ ) in each of the MACS and MOSAiCS peaks and found that MACS peaks , on average , had 1.86 to 2.02 annotated binding events whereas MOSAiCS peaks had 1.47 to 1.48 . 
+ Overall , we did not observe any differences in the peak widths of the PET and SET assays with MOSAiCS whereas MACS peaks from PET data tended to be wider than those of the SET data . 
+ These findings indicate that the potential advantages of the PET assay for elucidating closely located binding sites are not simply revealed from visual inspection and by analysis with methods developed specifically for SET data . 
+ Hence , deciphering the advantages of PET over SET for high resolution binding site identification warrants a statistical assessment . 
+ Next , we developed a generative probabilistic model and an accompanying algorithm , dPeak , that can specifically utilize local read distributions from SET and PET assays . 
+ This algorithm enabled unbiased evaluation of the SET and PET assays using our E. coli SET and PET ChIP-Seq data . 
+ Analytical framework of the dPeak algorithm
+ dPeak requires data in the form of genomic coordinates of paired reads ( for PET ) or genomic coordinates of reads and their strands ( for SET ) obtained from mapping to a reference genome . 
+ For computational efficiency , dPeak first identifies candidate regions ( i.e. , peaks ) that contain at least one binding event and considers each candidate region separately for the prediction of number and locations of binding events ( the first step of Figure 1C ) . 
+ Either two-sample ( using both ChIP and control input samples ) or one-sample ( only using ChIP sample when a control sample is lacking ) analysis can be used to identify candidate regions . 
+ For this purpose , we utilize the MOSAiCS algorithm [ 10 ] which produced narrower peaks than the MACS algorithm [ 9 ] in our ChIP-Seq datasets ( Table S1 in Text S1 ) . 
+ In each candidate region , we model read positions as originating from a mixture of multiple binding events and a background component ( the third step of Figure 1C ) . 
+ dPeak infers the number of binding events and the read sets corresponding to each binding event within each region . 
+ It iterates the following two steps for each candidate region . 
+ First , it assigns each read to a binding event or background , based on the positions and strengths of the binding events . 
+ Then , the position and strength of each binding event are updated using its assigned reads . 
+ In practice , the number of binding events in each candidate region is unknown a priori . 
+ Hence , we consider models with different numbers of binding events and choose the optimal number using Bayesian information criterion ( BIC ) [ 26 ] . 
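+ The assign-then-update iteration and the BIC-based choice of the number of binding events can be sketched as follows. This is a deliberately simplified stand-in, not the dPeak model: it assumes each binding event is a Gaussian component with a fixed, user-supplied spread `sigma`, the background is uniform over the region, and reads are summarized by a single position each. All function names and defaults are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def normal_pdf(x: float, mu: float, sigma: float) -> float:
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def fit_events(positions: Sequence[float], n_events: int,
               region: Tuple[int, int], sigma: float = 20.0,
               n_iter: int = 50) -> Tuple[List[float], float]:
    """EM for n_events Gaussian components plus a uniform background.

    Returns the estimated event positions and the final log-likelihood.
    """
    lo, hi = region
    width = hi - lo + 1
    n = len(positions)
    # Start with evenly spaced events and equal weights (last weight = background).
    mus = [lo + (k + 1) * width / (n_events + 1) for k in range(n_events)]
    weights = [1.0 / (n_events + 1)] * (n_events + 1)

    def densities(x: float) -> List[float]:
        d = [weights[k] * normal_pdf(x, mus[k], sigma) for k in range(n_events)]
        d.append(weights[-1] / width)  # uniform background
        return d

    for _ in range(n_iter):
        # Assignment step: soft-assign each read to an event or to background.
        resp = []
        for x in positions:
            d = densities(x)
            total = sum(d)
            resp.append([v / total for v in d])
        # Update step: re-estimate position and strength of each event.
        for k in range(n_events):
            mass = sum(r[k] for r in resp)
            if mass > 0.0:
                mus[k] = sum(r[k] * x for r, x in zip(resp, positions)) / mass
            weights[k] = mass / n
        weights[-1] = sum(r[-1] for r in resp) / n

    loglik = sum(math.log(sum(densities(x))) for x in positions)
    return mus, loglik


def choose_n_events(positions: Sequence[float], region: Tuple[int, int],
                    max_events: int = 5) -> Tuple[int, List[float]]:
    """Pick the number of events with the smallest BIC."""
    n = len(positions)
    best = None
    for g in range(1, max_events + 1):
        mus, loglik = fit_events(positions, g, region)
        n_params = 2 * g  # g positions + g free mixture weights
        bic = -2.0 * loglik + n_params * math.log(n)
        if best is None or bic < best[0]:
            best = (bic, g, mus)
    return best[1], best[2]
```

+ The cap of five events mirrors the maximum mentioned later in the text; the fixed `sigma` is the main simplification, since the actual model describes fragment and read distributions rather than a Gaussian spread.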
+ We constructed generative probabilistic models for binding event components and a background component for each of the PET and SET data by careful exploratory analyses of multiple experimental ChIP-Seq datasets . 
+ Diagnostic plots of the fitted models ( Figure S3 in Text S1 ) indicate that the dPeak model fits ChIP-Seq data well . 
+ dPeak has two unique features compared to other peak deconvolution algorithms ( Table S3 in Text S1 ) . 
+ First , it accommodates both SET and PET data and explicitly utilizes specific features of both types . 
+ Second , it incorporates a background component that accommodates reads due to nonspecific binding . 
+ Consideration of non-specific binding is critical because the degree of non-specific binding becomes more significant as the sequencing depths get larger . 
+ An additional unique feature of dPeak is the treatment of unknown library size for SET data . 
+ As discussed earlier , to account for unknown library size , each read is either extended to or shifted by an estimate of the library size in most peak calling algorithms [ 20 ] . 
+ This estimate is often specified by users [ 7,10 ] or estimated from ChIP-Seq data [ 9,11 ] . 
+ Currently available algorithms with the exception of PICS use only one extension/shift estimate for all the regions in the genome . 
+ However , our exploratory analysis of real ChIP-Seq data and the empirical distribution of the library size from PET data ( Figure S2A in Text S1 ) indicate that using a single extension/shift length might be suboptimal for peak calling ( data not shown ) . 
+ In order to address this issue , dPeak estimates an optimal extension/shift length for each candidate region . 
+ Comparison of the empirical distribution of the library size from PET data with the estimates of the region-specific extension/shift lengths indicates that the dPeak estimation procedure handles the heterogeneity of the peak-specific library sizes well ( Figures S2B , C , D in Text S1 ) . 
+ This advancement ensures that dPeak is well tuned for deconvolving SET peaks , which then enables an unbiased computational comparison between the SET and PET assays . 
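+ To make the idea of a region-specific extension/shift length concrete, the sketch below estimates one offset per candidate region from the SET reads in that region. It uses a simple strand cross-correlation heuristic, which is a swapped-in technique chosen for brevity and is not the estimation procedure dPeak uses inside its model. The function name and the search bounds `min_offset` and `max_offset` are assumptions.

```python
from collections import Counter
from typing import Sequence


def estimate_region_offset(forward: Sequence[int], reverse: Sequence[int],
                           min_offset: int = 50, max_offset: int = 400) -> int:
    """Offset (bp) between forward- and reverse-strand 5' ends in one region.

    For each candidate offset d, counts how many forward-strand 5' ends have a
    reverse-strand 5' end exactly d bp downstream, and returns the d with the
    highest count (the smallest such d on ties).
    """
    reverse_counts = Counter(reverse)
    best_offset, best_score = min_offset, -1
    for d in range(min_offset, max_offset + 1):
        score = sum(reverse_counts[f + d] for f in forward)
        if score > best_score:
            best_offset, best_score = d, score
    return best_offset
```

+ Calling this once per candidate region, rather than once per genome, is what distinguishes a region-specific estimate from a single global one.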
+ We compared dPeak with two competing algorithms , GPS [ 22 ] and PICS [ 11 ] , for analysis of SET ChIP-Seq data . 
+ We did not include the CSDeconv algorithm [ 21 ] in this comparison because it is computationally several orders of magnitude slower than the algorithms considered here . 
+ We utilized the synthetic ChIP-Seq data which was previously used to evaluate deconvolution algorithms [ 22 ] . 
+ In this synthetic data , binding events were generated by spiking in reads from predicted CTCF binding events at predefined intervals [ 22 ] without explicitly implanting binding sequence motifs . 
+ Therefore , we also excluded GEM [ 23 ] , which capitalizes on motif discovery to infer positions of binding events , from this comparison and used additional computational experiments below to perform comparisons with GEM . 
+ The synthetic data from [ 22 ] consisted of 1,000 joint ( i.e. , close proximity ) binding events , each with two events , and 20,000 single binding events . 
+ We assessed performances of algorithms on these two sets separately . 
+ Figure 2A shows the sensitivity of each algorithm at different distances between the joint binding events . 
+ Here , sensitivity is the proportion of regions for which both of the two true binding events are correctly identified . 
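+ This definition of sensitivity can be written down directly. The sketch below is an illustration under stated assumptions: a true binding event counts as identified when some prediction lies within `tol` bp of it, and the default of 20 bp is a placeholder rather than the criterion used for this comparison. As a simplification, a single prediction may match more than one nearby true event.

```python
from typing import Sequence


def joint_sensitivity(true_events: Sequence[Sequence[int]],
                      predictions: Sequence[Sequence[int]],
                      tol: int = 20) -> float:
    """Proportion of regions in which every true binding event is identified.

    true_events[i] holds the true positions in region i and predictions[i]
    the predicted positions for the same region.
    """
    if not true_events:
        raise ValueError("at least one region is required")
    hits = 0
    for truths, preds in zip(true_events, predictions):
        if all(any(abs(p - t) <= tol for p in preds) for t in truths):
            hits += 1
    return hits / len(true_events)
```

+ With two true events per region, a region contributes to the numerator only when both events are recovered, which is why merging two close events into one prediction lowers this measure.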
+ dPeak outperforms other methods across all considered distances between the joint binding events and especially for closely located binding events separated by less than the average library size of 250 bp . 
+ When the distance between the joint binding events is about 200 bp , dPeak is able to identify both binding events in 80 % of the regions whereas neither PICS nor GPS can detect both binding events in more than 20 % . 
+ Further investigation indicates that PICS merges closely spaced binding events into one event too often ( Figure S4 in Text S1 ) . 
+ We also found that GPS estimates the peak shape incorrectly when ChIP-Seq data harbors many closely located binding events ( Figure S5 in Text S1 ) . 
+ Furthermore , the sensitivity of GPS also decreases significantly when the distance between joint binding events increases . 
+ A closer look at the results reveals that GPS filters out too many predictions for joint binding events . 
+ To ensure that increased sensitivity of dPeak is not a result of increased number of false predictions , we evaluated positive predictive value ( fraction of predictions that are correct ) of each method . 
+ Specifically , we plotted the number of binding events predicted by each algorithm at different distances between the joint binding events in Figure 2B . 
+ Since there are two true binding events in each region , two predictions at every distance correspond to perfect positive predictive value . 
+ dPeak on average generates more than one prediction and does not over-estimate the number of binding events when the distance between joint events is less than the average library size . 
+ This result confirms that the higher sensitivity of dPeak in Figure 2A is not due to increased number of predictions . 
+ In contrast , PICS and GPS on average generate only one prediction for closely located binding events , which recapitulates the conclusions from Figure 2A . 
+ In summary , dPeak outperforms state-of-the-art deconvolution methods across different distances between joint binding events , especially when the distance between the binding events is less than the average library size . 
+ Next , we evaluated the sensitivity and positive predictive value of the three methods on 20,000 candidate regions with a single binding event using the additional synthetic data from [ 22 ] ( Table S4 in Text S1 ) . 
+ Average number of predictions per region with at least one predicted binding event and the corresponding standard errors are as follows : dPeak 1.16 ( 0.42 ) , PICS 1.02 ( 0.16 ) , GPS 2.72 ( 1.69 ) . 
+ Overall , dPeak slightly overestimates the number of binding events for regions with a single binding event , and hence PICS is slightly better than dPeak in positive predictive value for these regions . 
+ However , as revealed by our joint event analysis , this conservative approach of PICS severely under-estimates the number of binding events when multiple events reside closely . 
+ In contrast , GPS significantly under-estimates the number of binding events for the regions with a single binding event since it filters out too many predictions and does not result in a prediction for 82 % of the regions . 
+ In addition , it over-estimates the number of binding events across regions for which it produces at least one prediction . 
+ Comparisons in these two scenarios with and without joint binding events indicate that dPeak strikes a good balance between sensitivity and positive predictive value for 
+ Once we developed dPeak as a high resolution peak detection method for both SET and PET data , we implemented simulation studies to evaluate the PET and SET assays for resolving closely spaced binding events in an unbiased manner . 
+ Although SIPeS [ 24 ] supports PET ChIP-Seq data , we excluded it from the comparison of PET and SET ChIP-Seq datasets due to its poor performance ( Section 16 of Text S1 ) . 
+ We generated 100 simulated PET and SET ChIP-Seq data with two closely spaced binding events and evaluated the predictions of these two data types with dPeak ( Section 11 of Text S1 ; Figure S7 in Text S1 ) . 
+ Figure 2C plots the sensitivity of dPeak as a function of distance between the joint binding events and number of reads for both the PET and SET settings . 
+ Note that we evaluated sensitivity only down to a distance of 50 bp : because we used 20 bp windows to determine whether a binding event is correctly identified , results for distances below 50 bp could be misleading . 
+ When the distance between the events is at least as large as the average library size ( ≥ 150 bp ) , the sensitivity using PET and SET data are comparable . 
+ However , as the distance between joint binding events decreases , the sensitivity using SET data decreases significantly . 
+ In contrast , PET ChIP-Seq retains its high sensitivity even for binding events that are located as close as 50 bp . 
+ As the number of reads decreases , sensitivity for both PET and SET data decreases . 
+ When there are only 20 DNA fragments ( i.e. , 40 reads ) per binding event , sensitivity for PET data also decreases as the distance between joint binding events decreases . 
+ However , even in this case , the sensitivity of PET data is still significantly higher than that of SET data with a much higher number of reads . 
+ Figure 2D displays the number of binding events predicted by dPeak at different distances between joint binding events when 40 reads correspond to each binding event for both PET and SET data and evaluates positive predictive value . 
+ Results are similar for higher number of reads ( data not shown ) . 
+ With PET ChIP-Seq , dPeak accurately chooses the number of binding events by BIC out of a maximum of five binding events at any distance between the joint binding events . 
+ In contrast , SET ChIP-Seq predicts fewer than two binding events when the distance between the events is less than 150 bp . 
+ We present additional simulation results in Section 10 of Text S1 ( Figure S6 in Text S1 ) . 
+ These simulations reveal that even for cases with single binding events , PET has a slight advantage over SET because it predicts the location of the binding event more accurately . 
+ Specifically , PET data always provides higher resolution compared to SET data regardless of the strength of the binding event , which we measure by the number of DNA fragments associated with the event . 
+ For example , for a binding event with 300 DNA fragments , the average distance between the predicted and true binding events is 0.6 bp with a standard deviation of 0.8 bp in the PET data whereas it is 7.6 bp with a standard deviation of 11.8 bp in the SET data . 
+ Note that although this simulation procedure is based on the assumptions of dPeak model for PET data , our exploratory analysis and goodness of fit ( Figure S3A in Text S1 ) show that these assumptions hold well in the real PET ChIP-Seq data and therefore , these results have significant practical implications for real ChIP-Seq data . 
+ Analytical investigation with the dPeak generative model explains the difference in sensitivity between PET and SET data
+ Lower sensitivity of the SET compared to PET data is mainly driven by the loss of information due to unknown library size . 
+ We describe this information loss by two concepts named invasion and truncation ( Figure 3A ) . 
+ Top diagram of Figure 3A depicts two closely spaced binding events and a DNA fragment that is informative for the first binding event ( in red ) in the PET data . 
+ Invasion refers to over-estimation of the library size and extension of the read to a length longer than the true one . 
+ Equivalently , in the shifting procedure , this corresponds to shifting the read more than necessary . 
+ As a result , the read extended to the estimated library size covers both of the closely spaced binding events in the SET data and becomes uninformative or less informative for the binding event it corresponds to . 
+ Bottom diagram of Figure 3A also depicts two closely spaced binding events and illustrates truncation which we define as under-estimation of the library size . 
+ In this case , the displayed DNA fragment is long and spans both binding events ( in red ) . 
+ Therefore , it contributes to estimation of both binding events in the PET data . 
+ In contrast , the read extended to estimated library size only covers the first binding event in the SET data and , as a result , its contribution to the first binding event is overestimated whereas its contribution to the second binding event is underestimated . 
+ We evaluated the frequency by which fragments with invasion and truncation arise in SET data with a simulation study . 
+ Our results ( Table S5 in Text S1 ) indicate that as many as 76.8 % and 25.5 % of the fragments for a typical peak region can be subject to invasion and truncation , respectively , with the SET assay . 
+ already insufficient information to predict two binding events even in PET data and relative loss of information ( i.e. , invasion ) in SET data is insignificant . 
+ These concepts describe how information on binding events can be lost or distorted by the incorrect estimation of the library size in the SET data . 
+ Analytical calculations based on the dPeak generative model show that invasion and truncation influence closely located binding events the most , especially when the library size is not tightly controlled , i.e. , exhibit large variation ( Figures 3B , C ) . 
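+ The invasion and truncation concepts can be illustrated with a small sketch . It is not part of dPeak ; the function names , the event positions , and the fragment coordinates below are hypothetical and chosen only to show how extending a SET read to an assumed library size changes which binding events the read appears to cover . 

```python
def classify_extension(true_length, estimated_length):
    """Label a SET read by comparing the assumed library size with the true fragment length."""
    if estimated_length > true_length:
        return "invasion"      # library size over-estimated: read extended too far
    if estimated_length < true_length:
        return "truncation"    # library size under-estimated: read extended too little
    return "exact"


def covered_events(start, length, events):
    """Return the binding-event positions covered by the fragment [start, start + length - 1]."""
    end = start + length - 1
    return [mu for mu in events if start <= mu <= end]


events = [1000, 1080]  # two hypothetical binding events 80 bp apart

# Invasion: a 100 bp fragment covers only the first event, but the same read
# extended to an assumed 150 bp library size appears to cover both events.
invasion_true = covered_events(960, 100, events)
invasion_set = covered_events(960, 150, events)

# Truncation: a 250 bp fragment spans both events, but the read extended to
# 150 bp only reaches the first one.
truncation_true = covered_events(900, 250, events)
truncation_set = covered_events(900, 150, events)
```

+ In PET data the true fragment length is observed , so neither distortion occurs ; in SET data every read is extended ( or shifted ) by the same assumed library size . 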
+ dPeak analysis of σ70 PET ChIP-Seq data identifies significantly more RegulonDB supported σ70 binding events than the analysis of SET ChIP-Seq data
+ We compared the performance of PET and SET sequencing for the σ70 factor under the aerobic condition by generating ` quasi-SET data ' , randomly sampling one of the two ends of each read pair in the PET data , and comparing the binding events identified from both sets . 
+ In order to match the number of reads with the SET data for a fair comparison , only half of the paired reads were used to construct the PET data . 
+ Comparison with the quasi-SET data controlled for the differences in the sequencing depths of the original PET and SET samples in addition to the biological variation of the replicates . 
+ We then evaluated the dPeak predictions from the PET and SET analyses using the σ70 factor binding site annotations in the RegulonDB database as a gold standard . 
+ Because a significant number of promoter regions lack RegulonDB annotations , we evaluated the sensitivity based on the regions that contain at least one annotated binding site . 
+ This corresponds to 539 binding sites in 363 candidate regions that MOSAiCS identified . 
+ Of these 363 regions , 240 harbor only a single annotated binding event . 
+ For the regions with more than one annotated binding event , the average distance between binding events is 126 bp . 
+ dPeak analysis of the SET data identifies only 38 % of the 539 annotated binding events . 
+ In contrast , analysis of PET data with dPeak detects 66 % of the annotated binding sites . 
+ Figure 4A displays average sensitivity as a function of the average distance between annotated binding events for the regions with at least two RegulonDB annotations . 
+ A linear line is superimposed to capture the trend for both data types . 
+ Notably , the lower sensitivity of SET compared to PET is mainly due to closely located binding events . 
+ We also compared prediction accuracies of the PET and SET assays for the 240 regions that harbor a single annotated binding event . 
+ Figure 4B displays resolutions , which we define as the minimum of distances between predicted and annotated positions of binding events , achieved by the PET and SET assays . 
+ Median resolutions are 11 bp ( IQR = 16.25 bp ) and 28.5 bp ( IQR = 45.25 bp ) for PET and SET , respectively . 
+ This result indicates that positions of binding events can be more accurately predicted with the PET assay compared to SET even for regions with a single binding event . 
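+ As a minimal sketch of how the resolution summary above could be computed , assuming predicted and annotated positions are available as lists of integers per region : the helper names are hypothetical , and the quartile convention of Python 's statistics module may differ from the one used for the reported IQR values . 

```python
import statistics


def resolution(predicted, annotated):
    """Minimum distance between any predicted and any annotated position in a region."""
    return min(abs(p - a) for p in predicted for a in annotated)


def median_and_iqr(values):
    """Median and interquartile range of per-region resolutions."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    return statistics.median(values), q3 - q1
```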
+ To further examine the accuracy of the σ70 dPeak predictions , primer extension analysis was performed to map the transcription start site for eight genes ( Figures S10 -- S13 in Text S1 ; Table S7 in Text S1 ) . 
+ dPeak analysis of the PET ChIP-Seq data predicts two closely spaced σ70 binding sites upstream of each of these eight genes , with the distance between predictions ranging from 34 bp to 177 bp . 
+ Seven of these predictions are not annotated in RegulonDB and thus represent potential novel transcription start sites . 
+ A transcription start site was detected within 21 bp of 14 ( 87.5 % ) of these σ70 binding site predictions ( Figure 5A and Table 1 ) , further supporting the accuracy of the dPeak PET predictions . 
+ We treated these 14 validated sites as a gold standard and evaluated the performance of each deconvolution algorithm for these regions . 
+ Figure 5B shows that dPeak with PET ChIP-Seq data attains significantly higher resolution compared to SET-based analysis regardless of the deconvolution algorithm used ( p-values of paired t-tests between dPeak using PET data and each of the other methods using SET data are < 0.01 ) . 
+ dPeak with SET ChIP-Seq data has a resolution comparable to or better than those of the competing algorithms . 
+ GPS is not included in this plot because it provides significantly worse resolution compared to other methods ( Figure S9C in Text S1 ) . 
+ Genome-wide comparisons using the RegulonDB transcription start site annotations as a gold standard also lead to a similar conclusion , supporting the notion that PET-analysis with dPeak provides the best resolution ( Figures S9A , B in Text S1 ) . 
+ Figures 4C and 4D display two representative peak regions from these analyses . 
+ Figure 4C illustrates two binding events in the promoter regions of sibD and sibE genes separated by 375 bp . 
+ In this case , two peaks are easily distinguishable just by visual inspection and the predictions using both PET and SET data are comparably accurate . 
+ Note that although these two binding events are visually distinguishable , standard applications of MACS and MOSAiCS identify this region as a single peak . 
+ Widths of MOSAiCS and MACS peaks for this region are 900 bp and 2,042 bp , respectively . 
+ MACS identifies the position of the right binding event as the `` summit '' of this region ( position 3,193,216 ) . 
+ Figure 4D displays the promoter region of yejG gene , where the distance between the two experimentally validated binding events is only 122 bp . 
+ In this case , dPeak application to PET data correctly predicts the number of binding events as two and identifies the locations of these events within 12 bp of the validated sites . 
+ In contrast , all of the SET-based analyses with the deconvolution algorithms ( PICS , GPS , GEM ) incorrectly predict one binding event located in the middle of the two experimentally validated binding sites . 
+ dPeak analysis of E. coli σ70 PET ChIP-Seq data identifies closely located binding sites that are differentially occupied between aerobic and anaerobic conditions
+ High resolution identification of binding sites is especially important for differential occupancy analysis where a protein of interest is profiled under different conditions . 
+ Given the high agreement between the dPeak algorithm and experimentally validated transcription start sites at a subset of promoter regions , we set out to identify differential promoter usage between the aerobic and anaerobic growth conditions by profiling the E. coli σ70 factor . 
+ Results from the dPeak analysis of the aerobic and anaerobic PET data are summarized in Figure 5C at both the region ( i.e. , peak ) and binding event levels . 
+ We identified 868 peaks and 967 dPeak binding events that were common between the +O2 and -O2 conditions . 
+ Interestingly , only 82 peaks were unique to the +O2 condition but dPeak analysis identified 247 +O2-specific binding events . 
+ Similarly , we identified 130 peaks unique to the -O2 condition while dPeak analysis resulted in 268 -O2-specific binding events . 
+ We used the SET ChIP-Seq data from additional biological replicates under both conditions as independent validation of the results . 
+ This independent validation using SET data identified 40-60 % of the binding events identified by dPeak using PET ChIP-Seq data ( 56.1 % of the common events , 41.3 % of the +O2-specific binding events and 42.5 % of the -O2-specific binding events ) . 
+ Table S8 in Text S1 further summarizes these results by cross-tabulating the number of predicted binding events in each peak across the two conditions . 
+ It illustrates that there are indeed many peaks with at least one binding event in each condition and different number of binding events across the two conditions . 
+ Figure S14 in Text S1 displays an example of closely located binding sites that are differentially occupied between aerobic and anaerobic conditions in σ70 PET ChIP-Seq data . 
+ These results suggest that dPeak analysis identified many unique σ70 binding events that could not be differentiated in the peak-level analysis . 
+ Discussion
+ High resolution identification of binding sites with ChIP-Seq has profound effects for studying protein-DNA interactions in prokaryotic genomes and differential occupancy . 
+ We evaluated PET and SET ChIP-Seq assays and illustrated that PET has considerably more power for deciphering locations of closely spaced binding events . 
+ Our data-driven computational experiments indicate that when the distance between binding events gets smaller than the average library size , SET analysis has notably less power than PET analysis . 
+ Furthermore , PET provides better resolution than SET even when a region harbors a single binding event . 
+ We developed and evaluated the dPeak algorithm , a model-based approach to identify protein binding sites in high resolution , with data-driven computational experiments and experimental validation . 
+ dPeak is currently the only algorithm that can utilize both PET and SET ChIP-Seq data and can accommodate high levels of non-specific binding apparent in deeply sequenced ChIP samples ( Table S3 in Text S1 ) . 
+ Our data-driven computational experiments and computational analysis of experimentally validated σ70 binding sites indicate that it significantly outperforms the currently available PET ChIP-Seq peak finder SIPeS [ 24 ] . 
+ Application of dPeak to E. coli σ70 ChIP-Seq data under aerobic and anaerobic conditions revealed that although many peaks identified by standard application of popular peak finders might appear as common between the two conditions , a considerable percentage of these may harbor condition-specific binding events . 
+ The high-resolution σ70 binding sites identified by dPeak could be combined with start-site mapping or consensus-sequence identification to assign transcriptional orientation to the σ70 binding sites . 
+ The advantages of using the dPeak algorithm are not limited to the study of prokaryotic genomes . 
+ Applications in eukaryotic genomes include identification of the exact locations of binding motifs when multiple closely located consensus sequences reside in a peak region , studies of cis regulatory modules ( CRM ) , and refining consensus sequences . 
+ Figure S16 in Text S1 displays an example application of dPeak for differentiating among multiple closely located GATA1 binding sites with consensus WGATAR within a ChIP-Seq peak region critical for erythroid differentiation in mouse embryonic stem cells ( data from [ 27 ] ) . 
+ CRM studies investigate relationships between spatial configurations of binding sites of multiple transcription factors and gene expression . 
+ Relative orders , positions , and distances of binding sites of multiple factors and their relative strengths are key factors in CRM studies [ 28 ] . 
+ Because dPeak facilitates identification of binding sites of transcription factors in high resolution from ChIP-Seq data , it can enable construction of complex interaction networks among diverse factors across multiple growth conditions . 
+ We evaluated the performance of dPeak on eukaryotic genome ChIP-Seq data that GPS and PICS were optimized for . 
+ Figure S17 in Text S1 shows the performance comparison results for transcription factor GABPA profiled in GM12878 cell line from the ENCODE database . 
+ It indicates that dPeak performs comparably to or outperforms GPS and PICS . 
+ In the case of sequence-specific factors with well-conserved motifs such as the GABPA factor , we observed that dPeak prediction can be further improved in a straightforward way by incorporating sequence information . 
+ Figure S17 in Text S1 illustrates that dPeak with incorporated sequence information performs comparably to GEM and identifies the GABPA binding sites with high accuracy . 
+ Recently , ChIP-exo assay [ 29 ] , a modified ChIP-Seq protocol using exonuclease , has been proposed as a way of experimentally attaining higher resolution in protein binding site identification . 
+ Because the ChIP-exo protocol is new and relatively laborious , there are not yet many publicly available ChIP-exo datasets . 
+ We utilized ChIP-exo of CTCF factor in human HeLa-S3 cell line [ 29 ] and compared their binding event predictions with dPeak predictions on SET ChIP-Seq data of CTCF in the same cell line . 
+ Figure S18 in Text S1 illustrates that dPeak using SET ChIP-Seq data provides higher resolution than ChIP-exo data and that dPeak can be readily utilized for ChIP-exo data analysis . 
+ Furthermore , it also indicates that dPeak performs comparably to or outperforms currently available methods such as GPS and GEM for both ChIP-exo and SET ChIP-Seq data . 
+ Although the real power of the ChIP-exo technique will be revealed as more ChIP-exo datasets are produced and compared with ChIP-Seq datasets , our results with the currently available data suggest that analyzing ChIP-Seq data with powerful deconvolution methods such as dPeak might perform as well as ChIP-exo . 
+ We implemented dPeak as an R package named dPeak . 
+ dPeak utilizes the fast estimation algorithm we developed and parallel computing . 
+ Analysis of the σ70 data ( ~ 1,000 candidate regions , each with ~ 2,300 reads on average ) using our current sub-optimal implementation of dPeak takes about 5 minutes using 20 CPUs ( 2.2 GHz ) when up to 5 binding events are allowed in each candidate region , while it takes about 20 minutes to run PICS and GPS ( also using 20 CPUs ) . 
+ Similarly , analysis of human ENCODE POL2-H1ESC data ( ~ 14,000 candidate regions , each with ~ 140 reads on average ) takes about 10 minutes for dPeak , while it takes 100 and 30 minutes for GPS and PICS , respectively . 
+ dPeak is currently available at http://www.stat.wisc.edu/~chungdon/dpeak/ and will be contributed to public repositories such as Bioconductor [ 30 ] and Galaxy Tool Shed [ 31 ] upon publication . 
+ Materials and Methods
+ Growth conditions
+ All strains were grown in MOPS minimal medium supplemented with 0.2 % glucose [ 32 ] at 37 °C and sparged with a gas mix of 95 % N2 and 5 % CO2 ( anaerobic ) or 70 % N2 , 5 % CO2 , and 25 % O2 ( aerobic ) . 
+ Cells were harvested during mid-log growth ( OD600 of ~ 0.3 using a Perkin Elmer Lambda 25 UV/Vis Spectrophotometer ) . 
+ WT E. coli K-12 MG1655 ( F- , λ- , rph-1 ) was used for the experiments ( Kiley lab stock ) . 
+ ChIP experiments
+ ChIP assays were performed as previously described [ 33 ] , except that the glycine , the formaldehyde , and the sodium phosphate mix were sparged with argon gas for 20 minutes before use to maintain anaerobic conditions when required . 
+ Samples were immunoprecipitated using 2 μL of RNA Polymerase σ70 antibody from NeoClone ( W0004 ) . 
+ Library preparation, sequencing, and mapping of sequencing reads
+ For ChIP-Seq experiments , 10 ng of immunoprecipitated and purified DNA fragments from the aerobic and anaerobic σ70 samples ( one biological sample for both aerobic and anaerobic growth conditions ) , along with 10 ng of input control ( two biological replicates for anaerobic Input and one biological sample for aerobic Input ) , were submitted to the University of Wisconsin-Madison DNA Sequencing Facility for ChIP-Seq library preparation . 
+ Samples were sheared to 200-500 nt during the IP process to facilitate library preparation . 
+ All libraries were generated using reagents from the Illumina Paired End Sample Preparation Kit ( Illumina ) and the Illumina protocol `` Preparing Samples for ChIP Sequencing of DNA '' ( Illumina part # 11257047 RevA ) as per the manufacturer 's instructions , except products of the ligation reaction were purified by gel electrophoresis using 2 % SizeSelect agarose gels ( Invitrogen ) targeting 275 bp fragments . 
+ After library construction and amplification , quality and quantity were assessed using an Agilent DNA 1000 series chip assay ( Agilent ) and QuantIT PicoGreen dsDNA Kit ( Invitrogen ) , respectively , and libraries were standardized to 10 μM . 
+ For PET ChIP-Seq data , cluster generation was performed using an Illumina cBot Paired End Cluster Generation Kit ( v3 ) . 
+ A paired-read run with 36 bp for each end was performed , using 200 bp v3 SBS reagents and CASAVA ( the Illumina pipeline ) v 1.8.2 , on the HiSeq2000 . 
+ For SET ChIP-Seq data , cluster generation was performed using an Illumina cBot Single Read Cluster Generation Kit ( v4 ) and placed on the Illumina cBot . 
+ A single read , 32 bp run was performed , using standard 36 bp SBS kits ( v4 ) and SCS 2.6 on an Illumina Genome Analyzer IIx . 
+ Base calling was performed using the standard Illumina Pipeline version 1.6 . 
+ Sequence reads were aligned to the published E. coli K-12 MG1655 genome ( U00096.2 ) using the software packages SOAP [ 34 ] and ELAND ( within the Illumina Genome Analyzer Pipeline Software ) , allowing at most two mismatches . 
+ PET experiments yielded 13.8 million ( M ) and 22.3 M mappable paired 36mer reads and SET yielded 7.4 M and 11.5 M mappable 32mer reads for aerobic and anaerobic conditions , respectively . 
+ Control input experiments , generated with SET sequencing , resulted in 4.6 M and 10.2 M mappable 32mer reads for the aerobic and anaerobic conditions , respectively . 
+ Raw and aligned data files are available at ftp://ftp.cs.wisc.edu/pub/users/keles/dPeak and are being processed by GEO for accession number assignment . 
+ For PET data , if a DNA fragment ( paired reads ) belongs to the g-th binding event , we model its leftmost position conditional on its length $L_i$ as a Uniform distribution between $\mu_g - L_i + 1$ and $\mu_g$ , where $\mu_g$ is the position of the g-th binding event . 
+ Lengths of DNA fragments , $L_i$ , are modeled using the empirical distribution obtained from actual PET data . 
+ For SET data , if a read belongs to the g-th binding event , we model its 5' end position conditional on its strand as a Normal distribution . 
+ Specifically , if a read is on the forward strand , its 5' end position is modeled as a Normal distribution with mean $\mu_g - \delta$ and variance $\sigma^2$ . 
+ 5' end positions for reverse strand reads are modeled similarly as a Normal distribution with mean $\mu_g + \delta$ and variance $\sigma^2$ . 
+ Parameters $\delta$ and $\sigma^2$ are common to all binding event components in each candidate region . 
+ Parameters d and s2 are common to all binding event components in each candidate region . 
+ Strands of reads are modeled as Bernoulli distribution . 
+ Background reads are assumed to be uniformly distributed over the candidate region that they belong to . 
+ Parameters are estimated via the Expectation-Maximization ( EM ) algorithm [ 35 ] . 
+ Additional details on the dPeak model and the estimation algorithm for the PET and SET settings are available in Sections 2 and 3 of Text S1 . 
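+ The two component densities described above can be written down directly . The following is a minimal sketch under the stated model , not the dPeak implementation : the function names and the "+" / "-" strand encoding are assumptions made for illustration , and the mixture weights , background component , and EM updates are omitted . 

```python
import math


def pet_start_density(start, length, mu):
    """Uniform density of a fragment's leftmost position on [mu - length + 1, mu]."""
    lower = mu - length + 1
    return 1.0 / length if lower <= start <= mu else 0.0


def set_five_prime_density(pos, strand, mu, delta, sigma):
    """Normal density of a read's 5' end position.

    The mean is mu - delta for forward-strand reads ("+") and mu + delta
    for reverse-strand reads ("-"); sigma is the standard deviation.
    """
    mean = mu - delta if strand == "+" else mu + delta
    z = (pos - mean) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))
```

+ Under this model a PET fragment has non-zero density for an event only if it physically covers the event position , whereas a SET read contributes to every event with a weight that depends on the shared parameters $\delta$ and $\sigma^2$ . 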
+ Method comparison for SET ChIP-Seq data
+ We compared the sensitivity and the number of predictions of dPeak with those of PICS [ 11 ] , GPS [ 22 ] , and GEM [ 23 ] . 
+ Sensitivity is the proportion of regions for which both of the two true binding events are correctly identified . 
+ A binding event is considered as ` identified ' if the distance between the actual binding event and the predicted position is less than 20 bp . 
+ Note that we chose a more stringent criteria than the 100 bp used by GPS for defining true positives because 100 bp is not high enough resolution for prokaryotic genomes . 
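+ The evaluation criterion above can be sketched as follows , assuming each region is given as a pair of lists ( true positions , predicted positions ) . The function names and the input layout are hypothetical ; only the 20 bp threshold and the requirement that every true event in a region be identified come from the text . 

```python
def is_identified(true_pos, predictions, window=20):
    """A true event counts as identified if some prediction lies within `window` bp."""
    return any(abs(true_pos - p) < window for p in predictions)


def sensitivity(regions, window=20):
    """Proportion of regions in which every true binding event is identified.

    `regions` is a list of (true_positions, predicted_positions) pairs.
    """
    hits = sum(
        all(is_identified(t, preds, window) for t in truth)
        for truth, preds in regions
    )
    return hits / len(regions)
```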
+ For the PICS algorithm , we used the R package PICS version 1.10 , which is available from Bioconductor ( http://www.bioconductor.org/packages/2.10/bioc/html/PICS.html ) . 
+ For the GPS algorithm , we used its Java implementation version 1.1 from http://cgs.csail.mit.edu/gps/ . 
+ In the performance comparisons using σ70 ChIP-Seq data , we also incorporated GEM , a recently modified and extended version of GPS , which incorporates genome sequence of the peaks to improve binding event identification . 
+ For the GEM algorithm , we used its Java implementation version 0.9 from http://cgs.csail.mit.edu/gem/ . 
+ We downloaded the synthetic data used for the method comparisons from http://cgs.csail.mit.edu/gps/ and its description is provided in Supplementary information of the GPS paper [ 22 ] . 
+ This synthetic data consists of `` chrA '' with 1,000 regions that harbor two closely spaced binding events and `` chrB '' to `` chrK '' with a total of 20,000 regions with a single binding event . 
+ We evaluated performances of the methods on joint and single binding event regions separately so that we could assess sensitivity and specificity for each of these cases . 
+ Candidate regions for dPeak were identified using the conditional binomial test [ 6 ] with a false discovery rate of 0.05 by applying the Benjamini-Hochberg correction [ 36 ] . 
+ These regions were also explicitly provided to the GPS and GEM algorithms as candidate regions . 
+ Candidate regions for PICS were identified using the function segmentReads ( ) in the PICS R package ( default parameters ) . 
+ Default tuning parameters were used during model fitting for all the methods . 
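+ For reference , the Benjamini-Hochberg step-up procedure used to control the false discovery rate of the candidate regions can be sketched as below . This is a generic textbook version written for illustration ; the function name is hypothetical , and the p-values would come from the conditional binomial test , which is not shown . 

```python
def benjamini_hochberg(pvalues, fdr=0.05):
    """Return a list of booleans marking which hypotheses are rejected at the given FDR."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    cutoff_rank = 0
    # Find the largest rank k with p_(k) <= fdr * k / m.
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= fdr * rank / m:
            cutoff_rank = rank
    rejected = [False] * m
    # Reject every hypothesis ranked at or below the cutoff.
    for i in order[:cutoff_rank]:
        rejected[i] = True
    return rejected
```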
+ Simulation studies to compare PET and SET ChIP-Seq data 
+ We considered distances between binding sites ranging from 50 bp to 200 bp which characterize the typical binding event spacing in E. coli . 
+ We generated and assigned 300 DNA fragments to each of two binding events as follows . 
+ For each DNA fragment , we drew the length ( $L_i$ ) from the distribution of library size , $P(L)$ , estimated empirically from the actual σ70 PET ChIP-Seq data and the group index ( $Z_i$ ) from a multinomial distribution with parameters ( 0.5 , 0.5 ) . 
+ Then , for a given library size and group index ( $Z_i = g$ ) , the leftmost position of the paired reads ( $S_i$ ) was generated from a Uniform distribution between $\mu_g - L_i + 1$ and $\mu_g$ , where $\mu_g$ is the position of the g-th binding event . 
+ The rightmost position was assigned as $E_i = S_i + L_i - 1$ . 
+ SET data was generated by randomly sampling one of two ends from each of these paired reads . 
+ For the SET analysis , average library size was assumed to be 150 bp . 
+ Then , only half of the total number of paired reads was used to construct PET data , in order to match number of reads with SET data for fair comparison . 
+ In addition , we randomly assigned 10 DNA fragments to arbitrary positions within the candidate region to generate non-specific binding ( background ) reads . 
+ The sensitivity and the number of predictions were summarized over 100 simulated datasets generated by this procedure . 
+ A binding event was considered as ` identified ' if the distance between the binding event and the predicted position is less than 20 bp . 
+ We repeated these PET versus SET analyses by comparing all the PET data with SET data constructed from selecting one of two ends of each read pair and obtained little or no change in the results ( data not shown ) . 
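+ The simulation procedure can be sketched roughly as follows . This is an illustrative reading of the steps above , not the authors ' code : the function name , the region bounds , the total fragment count , and the list of library sizes standing in for the empirical distribution are all assumptions . 

```python
import random


def simulate_region(mu, lengths, n_fragments=300, n_background=10,
                    region=(0, 2000), seed=0):
    """Simulate PET fragments for binding events at positions `mu`, plus quasi-SET reads.

    `lengths` stands in for the empirical library-size distribution.
    """
    rng = random.Random(seed)
    pet = []
    for _ in range(n_fragments):
        length = rng.choice(lengths)                     # L_i ~ P(L)
        g = rng.randrange(len(mu))                       # Z_i with equal probabilities
        start = rng.randint(mu[g] - length + 1, mu[g])   # S_i ~ Uniform
        pet.append((start, start + length - 1))          # E_i = S_i + L_i - 1
    for _ in range(n_background):                        # non-specific fragments
        length = rng.choice(lengths)
        start = rng.randint(region[0], region[1] - length)
        pet.append((start, start + length - 1))
    # Quasi-SET data: keep one randomly chosen end of each pair as a 5' end.
    set_reads = [(s, "+") if rng.random() < 0.5 else (e, "-") for s, e in pet]
    # PET data matched in read count: keep half of the pairs.
    pet_half = rng.sample(pet, len(pet) // 2)
    return pet, set_reads, pet_half
```

+ For example , `simulate_region([1000, 1080], [120, 150, 200])` produces fragments for two events 80 bp apart ; every non-background fragment covers the event it was assigned to . 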
+ dPeak analysis of σ70 PET and SET ChIP-Seq data
+ We identified candidate regions , i.e. , peaks with at least one binding event , using the MOSAiCS algorithm [ 10 ] ( two-sample analysis with a false discovery rate of 0.001 ) . 
+ In each candidate region , we fitted the dPeak model , which is a mixture of g binding event components and one background component ( Figure 1C ) . 
+ In the current analysis , up to five binding event components ( $g_{max} = 5$ ) were considered . 
+ The optimal number of binding events was chosen with BIC for each candidate region . 
+ We utilized the top 50 % of the predicted binding events from each condition for the comparison between the aerobic and anaerobic conditions . 
+ Overall conclusions remained the same when the full set of predicted binding events was considered . 
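+ Choosing the number of binding events by BIC can be sketched as below , assuming the maximized log-likelihood of each fitted mixture is already available . The parameter counts per event and per region are placeholders chosen for illustration , not the counts used by dPeak , and the function names are hypothetical . 

```python
import math


def bic(log_likelihood, n_params, n_obs):
    """Bayesian information criterion; smaller values indicate a preferred model."""
    return -2.0 * log_likelihood + n_params * math.log(n_obs)


def choose_n_events(log_likelihoods, n_obs, params_per_event=2, shared_params=1):
    """Pick the number of binding events g that minimizes BIC.

    `log_likelihoods` maps g to the maximized log-likelihood of the model
    with g binding event components.
    """
    best_g, best_score = None, math.inf
    for g, ll in sorted(log_likelihoods.items()):
        score = bic(ll, g * params_per_event + shared_params, n_obs)
        if score < best_score:
            best_g, best_score = g, score
    return best_g
```

+ With this criterion an extra component is kept only if it improves the log-likelihood by more than the penalty for its additional parameters . 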
+ Primer extension experiments
+ Total RNA was isolated as previously described [ 37 ] . 
+ Oligonucleotide primers ( Table S7 in Text S1 ) were labeled at the 5' end using [ γ-32P ] ATP ( 3,000 Ci/mmol ) and T4 polynucleotide kinase ( Promega ) followed by purification with a G25 Sephadex Quick Spin Column ( GE ) . 
+ Labeled primer ( 0.2 pmol ) was annealed with 7-30 μg total RNA in 20 μl and extended with avian myeloblastosis virus reverse transcriptase ( Promega ) as described by the manufacturer , except that actinomycin D was present at 100 μg/ml [ 38 ] . 
+ Primer extension experiments were implemented for spr ( 8 μg +O2 RNA ) , dcuA ( 8 μg -O2 RNA ) , serC ( 8 μg +O2 RNA ) , aroL ( 30 μg and 15 μg -O2 RNA for P1 and P2 , respectively ) , yejG ( 30 μg +O2 RNA ) , hybO ( 30 μg -O2 RNA ) , ybgI ( 9 μg +O2 RNA ) , and ptsG ( 9 μg +O2 RNA ) . 
+ A dideoxy sequencing ladder was electrophoresed in parallel with the primer extension products on an 8 % ( wt/vol ) polyacrylamide gel containing 7 M urea . 
+ In cases where the transcription start site could be assigned to one of two nucleotides , preference was given to the purine nucleotide . 
+ Software availability
+ The dPeak algorithm is implemented as an R package named dpeak and is freely available from http://www.stat.wisc.edu/~chungdon/dpeak/ . 
+ We will commit dpeak to Bioconductor ( http://www.bioconductor.org ) and Galaxy Tool Shed ( http://toolshed.g2.bx.psu.edu ) upon publication . 
+ Supporting Information
+ Text S1 Supplementary methods for `` dPeak : High Resolution Identification of Transcription Factor Binding Sites from PET and SET ChIP-Seq Data '' . 
+ ( PDF )
\ No newline at end of file
--- a/data/TEXT_FILES/useful_txt/24146625.txt 0 → 100644
View file @27818a9
+++ b/data/TEXT_FILES/useful_txt/24146625.txt 0 → 100644
View file @27818a9
+ Binding Site Architecture to Regulate Carbon Oxidation
+ Abstract 
+ Introduction
+ Maintaining redox balance is a crucial function for cell survival . 
+ Alteration of the cellular redox environment has been shown to affect a broad range of biological processes including energy metabolism [ 1 -- 3 ] , protein folding [ 4 ] , signaling and stress responses [ 5 -- 9 ] . 
+ Despite this , we have only a superficial understanding of how cells control redox homeostasis at a global level . 
+ Since the cellular redox environment is a reflection of many different redox couples [ 10 ] , some of which are linked together through enzymatic reactions , an improved understanding of this process requires knowledge of how the redox state of each couple is controlled . 
+ One such important redox couple is NADH/NAD+ , which plays a central role in catabolic pathways , shuttling electrons between donor and acceptor molecules and allowing cells to convert energy from various reduced substrates into cellular ATP . 
+ To ensure that catabolism proceeds , a balance between the rates of oxidation and reduction of NAD+ must be maintained . 
+ Many diverse regulatory mechanisms have evolved amongst different organisms to control the redox state of the NADH/NAD+ couple [ 6,11 -- 14 ] . 
+ In this study we investigated transcriptional inputs into this process by mapping the regulon of the transcription factor ArcA in Escherichia coli . 
+ The ArcAB two component system , comprised of the membrane bound sensor kinase , ArcB , and the response regulator , ArcA , coordinates changes in gene expression in response to changes in the respiratory and fermentative state of the cell [ 15,16 ] . 
+ This system is maximally activated in E. coli under anaerobic fermentative conditions when NADH from central metabolism is recycled to NAD+ by formation of the end products succinate , ethanol and lactate . 
+ The DNA binding activity of ArcA is regulated through reversible phosphorylation by ArcB [ 17 ] , whose kinase activity is governed by the redox states of the ubiquinone and menaquinone pools [ 18 -- 20 ] that are linked to the NADH/NAD+ redox couple through respiration . 
+ In the absence of O2 , decreased flux through the aerobic respiratory chain lowers the ratio of oxidized to reduced quinones , stimulating ArcB kinase activity and transphosphorylation of ArcA [ 19 ] . 
+ Additionally , fermentation products have been shown to enhance the rate of ArcB autophosphorylation [ 21 ] and there is a positive correlation between the rate of fermentation and the levels of phosphorylated ArcA ( ArcA-P ) [ 16 ] . 
+ Thus , enzymatic linkage of the NADH/NAD+ couple to the oxidation state of the quinone pool and the production of fermentation products provides a link between the redox state of the NADH/NAD+ couple and the activity of the ArcAB system . 
+ Indeed , artificial perturbation of the NADH/NAD+ ratio has been shown to alter ArcA activity [ 22 ] . 
+ Consistent with the role of the ArcAB system in redox regulation , the majority of known ArcA targets in E. coli are associated with aerobic respiratory metabolism . 
+ Under anaerobic conditions , ArcA-P directly represses the operons encoding enzymes of the TCA cycle ( gltA , icdA , sdhCDAB-sucABCD , mdh , lpdA ) [ 23 -- 27 ] , and for the β-oxidation of fatty acids ( fadH , fadBA , fadL , fadE , fadD , fadIJ ) [ 25 ] , lactaldehyde ( aldA ) / lactate oxidation ( lldPRD ) [ 24,28 ] , and glycolate/glyoxylate oxidation ( glcC , glcDEFGBA ) [ 29 ] . 
+ In contrast , ArcA-P activates the expression of operons encoding three enzymes that are important for adapting to microaerobic or anaerobic environments [ cytochrome bd oxidase ( cydAB ) [ 24 ] , pyruvate formate lyase ( focA-pflB ) [ 30 ] and hydrogenase 1 ( hya ) [ 31 ] ] . 
+ However , gene expression profiling analyses indicate that the ArcA regulon is more complex than originally expected , including genes encoding a wide variety of functions outside of redox metabolism [ 32,33 ] . 
+ Salmon et al. [ 33 ] and Liu et al. [ 32 ] each identified >350 genes that were differentially expressed when arcA was deleted . 
+ However , there was only a minimal overlap between these datasets and it is unclear how many of these genes are direct vs. indirect targets of ArcA . 
+ Thus , although ArcA plays a prominent role in the anaerobic repression of genes that encode enzymes for aerobic respiratory metabolism , the full extent of the ArcA regulon remains unclear , preventing a comprehensive understanding of its physiological role . 
+ Despite the identification of several ArcA binding regions by footprinting , the sequence determinants for ArcA DNA binding are also not well understood . 
+ This is in large part due to the unusually long length ( 30 -- 60 plus bp ) [ 23,24,26,28 -- 30 ] and degenerate nature of these sequences , which makes bioinformatic searches challenging . 
+ Nevertheless , a 15-bp site consisting of two tandem direct repeats has been proposed as the ArcA recognition site [ 34 ] . 
+ A similar motif has been derived for Shewanella oneidensis ArcA based on binding energy measurements for every possible permutation of a 15-bp site [ 35 ] . 
+ However , a 15-bp site is insufficient to explain the extended footprints , raising the question of whether additional sequence conservation beyond 15 bp is important for ArcA DNA binding and transcriptional regulation . 
+ To determine the in vivo binding locations of ArcA in E. coli under anaerobic fermentative growth conditions , we utilized chromatin immunoprecipitation followed by sequencing ( ChIP-seq ) or hybridization to a microarray ( ChIP-chip ) . 
+ Bioinformatic analyses of sequences corresponding to ArcA-enriched regions were used to predict individual ArcA binding sites and to search for a binding motif that could explain the large ArcA footprints . 
+ Novel ArcA binding site architectures were then validated by DNase I footprinting . 
+ Additionally , gene expression profiling was performed in arcA+ and ΔarcA backgrounds to determine the effect of ArcA DNA binding on gene expression . 
+ This combination of genome-wide approaches provided insight into the mechanism of ArcA DNA binding and transcriptional regulation . 
+ These results also allowed us to identify additional operons under direct ArcA control , thereby providing a more complete understanding of the physiological role of ArcA in E. coli . 
+ Results
+ Identification of the chromosomal binding locations of ArcA 
+ We mapped 176 chromosomal ArcA binding regions ( Table S1 ) across the genome of E. coli K-12 MG1655 during anaerobic fermentation of glucose using ChIP-chip and ChIP-seq ( Figure 1 ) . 
+ These sites include all but five of the 22 previously identified ArcA binding regions ( uvrA/ssb [ 36 ] , oriC [ 37 ] , ptsG [ 38 ] , rpoS [ 39 ] and sodA [ 40 ] ; Figure 1 ) ; the absence of a binding region upstream of sodA is likely the result of Fur outcompeting ArcA from binding [ 40 ] . 
+ ArcA binding was also examined during aerobic respiration using ChIP-chip and as expected , revealed a pronounced decrease in site occupancy ( Figure 1 ) except for a handful of peaks ( e.g. , ygjG and uxaB ) , which were not investigated further . 
+ As ArcA protein levels remained relatively constant between aerobic and anaerobic conditions ( data not shown and [ 16 ] ) , the decrease in occupancy under aerobic conditions can be explained by decreased ArcA-P levels , resulting from the increase in the ratio of oxidized to reduced quinones [ 20 ] . 
+ ChIP-seq analysis provides improved resolution compared to ChIP-chip
+ Overall , there was good agreement between the ChIP-chip and ChIP-seq datasets ( 109 peaks in common ) . 
+ However , 15 regions identified by ChIP-chip were resolved into 32 binding regions ( Table S2 ) using ChIP-seq and the CSDeconv peak deconvolution algorithm [ 41 ] . 
+ For example , compared to only one binding region resolved with ChIP-chip , three binding regions were identified upstream of cydA ( Figure 2A ) and two were identified within the divergent sdhC/gltA ( Figure 2B ) promoter region using ChIP-seq . 
+ Furthermore , the position of the peak calls with CSDeconv is consistent with the position of known ArcA binding sites mapped by DNase I footprinting within these promoters [ 24,27 ] and 29 of these 32 regions contain a predicted ArcA binding site ( Table S2 ) . 
+ The correlation of footprinted sites and predicted sites with CSDeconv peak calls allowed us to establish that binding sites separated by as little as 76 bp ( based on the CSDeconv-defined coordinate for each binding region ) could be resolved . 
+ From this analysis , several novel closely spaced ArcA binding sites , e.g. three binding regions upstream of cyo and two binding regions upstream of nuo and pdhR-aceEF-lpdA , were identified . 
+ Thus , since ChIP-seq provided higher resolution identification of ArcA binding sites , this dataset was used for all other analyses . 
+ More than 50 % of ArcA binding sites have additional DR elements beyond the ArcA box 
+ DNase I footprinting experiments indicate that ArcA-P typically binds to long stretches of DNA ( 30 -- 60 + bp ) [ 23,24,26,28 -- 30 ] . 
+ However , the sequence determinants beyond a 15 bp direct repeat within these long stretches are not well understood . 
+ Using our high resolution binding regions , we searched for a common sequence recognition element [ 42 ] , which identified an 18-bp sequence motif consisting of two direct repeat ( DR ) elements with a center to center ( ctc ) distance of 11 bp , close to the 10.5 bp per helical turn of B-form DNA , in nearly every ( 158 of 176 ) ArcA binding region ( Figure 3A ; Table S3 ) . 
+ While this result extended the previously described ArcA box from 15 to 18 bp [ 34 ] , we also found that many sites contained additional DR elements beyond the two DRs of the ArcA box . 
+ We then systematically searched the sequences surrounding each ArcA box with a 10-bp position weight matrix ( PWM ) , corresponding to a single DR element ( Figure 3B ) , which revealed a diversity in the number and spacing of DR elements within ArcA binding sites . 
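+ The PWM search described above can be sketched roughly as follows . This is a minimal illustration only , not the authors ' pipeline : the aligned training sites , the pseudocount , the uniform background and the score threshold are all hypothetical values chosen so the example is easy to follow . 

```python
import math

BASES = "ACGT"

def build_pwm(sites, pseudocount=0.5, background=0.25):
    """Build a log-odds (bits) weight matrix from equal-length aligned sites."""
    total = len(sites) + 4 * pseudocount
    pwm = []
    for i in range(len(sites[0])):
        column = [site[i] for site in sites]
        pwm.append({
            b: math.log2(((column.count(b) + pseudocount) / total) / background)
            for b in BASES
        })
    return pwm

def scan(sequence, pwm, threshold):
    """Return (start, score) for every window scoring at or above the threshold."""
    width = len(pwm)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        score = sum(pwm[i][base] for i, base in enumerate(window))
        if score >= threshold:
            hits.append((start, score))
    return hits

def ctc_spacings(hits):
    """Center-to-center distances between consecutive hits.

    For elements of equal width this equals the start-to-start distance.
    """
    starts = [start for start, _ in hits]
    return [b - a for a, b in zip(starts, starts[1:])]

# Hypothetical 10-bp DR-like training sites (NOT taken from the paper).
toy_sites = ["TGTTAATTAA", "TGTTAATTAA", "TGTTAACTAA", "TGTTAATAAA"]
toy_pwm = build_pwm(toy_sites)

# Hypothetical region with three DR-like elements spaced 11 bp apart.
toy_region = "GC" + "TGTTAATTAA" + "C" + "TGTTAATTAA" + "C" + "TGTTAATTAA" + "GC"
toy_hits = scan(toy_region, toy_pwm, threshold=10.0)
print([start for start, _ in toy_hits], ctc_spacings(toy_hits))
```

+ In this toy region the three elements are recovered at an 11-bp ctc spacing ; with real data the matrix would be derived from the aligned DR elements and the threshold chosen empirically . 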
+ Although the largest class of binding sites contained just two DR elements at a ctc spacing of 11 bp ( 66 ) , the majority of ArcA-binding sites ( 92 ) contain three to five DR elements predominantly at a ctc spacing of 11 bp ( Figure 3C -- D , 
+ To validate the bioinformatic predictions , DNase I footprinting was performed for a representative set of promoters . 
+ Since the OmpR/PhoB family of response regulators is expected to dimerize upon phosphorylation [ 43 ] , we hypothesized that ArcA would bind as two adjacent dimers to sites with three consecutive DR elements ( e.g. , icdA and acs ) , three DR elements at which the distal DR is separated from DR2 by approximately two helical turns ( 22 bp ; e.g. , trxC ) , or four consecutive DR elements ( e.g. , astC ) and in each case , protect a region the size of four DRs ( ~44 bp ) . 
+ As anticipated , ArcA-P protected a 44 bp region at the astC promoter ( Figure 4A ) and a 48 bp region at the trxC promoter ( Figure 4B ) . 
+ In contrast , ArcA-P only protected 33 bp and 37 bp regions , respectively , at the icdA and acs promoters , which encompassed the three consecutive DR elements ( Figure 4C -- D ) . 
+ The result for icdA is in agreement with previous footprinting data [ 23 ] . 
+ Our footprinting data also suggested that the spacing between DR2 and DR3 is likely important for ArcA-P binding , because ArcA-P did not protect a predicted DR3 element in which the ctc distance between DR2 and DR3 contained an extra bp ( 12-bp spacing ; putP ) ; protection corresponded to only DR1 and DR2 ( Figure 4E ) . 
+ A potential explanation of this result is that the increased spacer distance disrupted potential protein-protein contacts between ArcA dimers . 
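+ One simplified way to picture the effect of the extra base pair is to compute how far each ctc spacing rotates one DR element around the helix relative to its neighbor . The sketch below assumes only the 10.5 bp per turn figure cited in the text ; the geometric reading is an illustration and is not a model proposed by the authors . 

```python
HELICAL_REPEAT_BP = 10.5  # bp per turn of B-form DNA, as cited in the text

def phase_offset_degrees(ctc_bp, helical_repeat=HELICAL_REPEAT_BP):
    """Angular offset (degrees) from perfect same-face alignment.

    The result is folded into the interval (-180, 180].
    """
    angle = (ctc_bp / helical_repeat * 360.0) % 360.0
    return angle - 360.0 if angle > 180.0 else angle

for spacing in (11, 12, 22):
    print(spacing, round(phase_offset_degrees(spacing), 1))
```

+ Under this assumption an 11-bp spacing leaves neighboring elements about 17 degrees out of register , whereas a 12-bp spacing roughly triples that offset to about 51 degrees , which is at least compatible with the idea that the longer spacer disrupts contacts between adjacent dimers . 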
+ Additionally , our footprinting data identified 57 bp and 60 bp ArcA-P-binding regions , respectively , at the paaA and phoH promoters , which spanned from three consecutive predicted DRs to a distal DR element spaced nearly two full helical turns away ( 22 bp ) ( Figure 4F -- G ) . 
+ As expected , no footprints were detected with unphosphorylated ArcA ( data not shown ) . 
+ Unexpectedly , the ArcA-P footprint at the dctA promoter extended 50 bp downstream of the predicted two DR site ( Figure 4H ) , although this extended region was less well protected . 
+ A bioinformatic search revealed a second , but weaker two DR site at the downstream end of this protected region on the opposite DNA strand but no DR elements in the intervening 24 bp region , suggesting that protein-protein contacts may compensate for the absence of identifiable sequence elements at this site . 
+ Altogether , these results suggest that the length of the ArcA-P footprint reflects the location of the outermost DR elements within the binding site . 
+ In addition , these data reveal plasticity in the architecture among ArcA binding sites with anywhere from two to five DR elements of differing predicted strength present at any given site . 
+ The footprinting results also revealed interesting features about ArcA-P DNA binding . 
+ At acs and astC , all DR elements were occupied at the same ArcA-P concentration , whereas at icdA , paaA , phoH , and trxC , occupation of DR3 or DR4 required a higher concentration of ArcA-P . 
+ The difference in concentration dependent occupancy of the DR elements at the icdA and acs promoters likely reflects the fact that DR3 of acs is a better match to the ArcA DR element PWM than DR3 of icdA ( 5 bits versus 3 bits ) . 
+ Furthermore , the transition from an unbound to bound state occurred over a narrow range in ArcA-P concentration , suggesting that ArcA-P binding to DR sites is cooperative , although the apparent degree of cooperativity also varied from site to site . 
+ Cooperative binding was particularly striking at the acs and astC promoters and for the three DR region at the phoH promoter , for which saturation occurred with less than a four-fold increase in ArcA-P levels . 
+ Finally , we also found that the average sequence conservation of DR elements in predicted binding sites with two , three and four equally spaced DR elements decreases with an increasing number of repeats ( Figure S1 ) . 
+ DNase I hypersensitive sites were observed at six of the tested promoters , suggesting that ArcA-P binding to multiple DR sites also results in a bend or kink in the DNA . 
+ However , the locations of these hypersensitive sites differed from site to site . 
+ For example , a hypersensitive site was observed within the spacer region between the 22-bp spaced DR element and the other DR elements at the trxC , paaA and phoH promoters , whereas hypersensitive sites were observed within DR1 and DR2 at the icdA promoter ( +8 and +19 ) . 
+ In contrast , hypersensitive sites were located upstream and downstream of the footprinted regions at the acs and astC promoters , respectively . 
+ Thus , the binding site architecture appears not only to dictate the length of ArcA-P binding sites , but also to affect the concentration dependence of site occupancy and the DNA structure at target operons . 
+ These variations in ArcA-P binding likely have important implications for global transcriptional regulation . 
+ ArcA-P directly regulates the expression of 85 operons under anaerobic fermentative growth conditions
+ To determine which ArcA binding regions exert an effect on transcription , genome-wide mRNA expression profiles for wild type ( WT ) and ΔarcA strains were examined . 
+ In total , 229 differentially expressed operons ( Table S5 ) were identified , 85 of which were associated with one or more of 88 ArcA binding regions ( Text S1 ) and , thus , are directly regulated by ArcA ( Figure 5 , Table S6 ) . 
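+ The logic used here to separate direct from indirect targets amounts to intersecting two operon lists . The following is a minimal sketch under that reading ; the function name and the tiny example sets are illustrative assumptions ( the operons were picked from examples mentioned elsewhere in the text , not from the supplementary tables ) . 

```python
def classify_operons(differentially_expressed, bound_operons):
    """Split operons by expression change and in vivo binding evidence."""
    de = set(differentially_expressed)
    bound = set(bound_operons)
    return {
        # changed in the deletion strain AND associated with a binding region
        "direct": de & bound,
        # changed in the deletion strain but no binding region detected
        "indirect_or_unresolved": de - bound,
        # binding region detected but no expression change under these conditions
        "bound_no_expression_change": bound - de,
    }

# Toy input only; the real analysis used 229 operons and 176 binding regions.
toy_de = {"icdA", "acs", "appCBA-yccB", "sdaC"}
toy_bound = {"icdA", "acs", "paaA"}
toy_classes = classify_operons(toy_de, toy_bound)
for label, operons in toy_classes.items():
    print(label, sorted(operons))
```

+ The third class matters later in the text : operons that are bound but unchanged under one growth condition may still belong to the direct regulon under other conditions . 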
+ More than half of the operons that we found to be regulated directly by ArcA have not been previously reported ( Table S6 ) but consistent with previous studies , ArcA acted predominantly as a transcriptional repressor ( Figure 1 ) . 
+ ArcA directly represses 74 operons . 
+ ArcA functions predominantly as a global repressor of pathways associated with the oxidation of non-glycolytic carbon sources . 
+ This includes all previously identified ArcA targets associated with central metabolism ( e.g. , the genes encoding pyruvate dehydrogenase , cytochrome o ubiquinol oxidase , NADH-quinone oxidoreductase I , and the enzymes of the TCA cycle ) ( Figure 6 ) . 
+ In addition , ArcA repressed the genes encoding enzymes , transcriptional regulators , or transporters associated with short chain acid/aldehyde oxidation ( aldA , lldPRD , acs-yjcH-actP , glcC , glcDEFGBA and fdoGHI ; bolded operons have not been previously reported ) , amino acid and polyamine oxidation ( puuA , puuDR , ygjG , potFGHI , astABCDE , argT-hisQMP , putA , putP ) , β-oxidation of fatty acids ( fadH , fadBA , fadL , fadE , fadD , fadIJ , tesB ) , aromatic compound oxidation ( hcaR , mhpR , feaR ) , other carbon oxidation pathways ( betIBA , betT , ugpBAED , gcd , maeB ) and peptide utilization ( cstA ) . 
+ Other ArcA repressed targets include methionine sulfoxide reductase ( msrB ) , thioredoxin 2 ( trxC ) [ 44,45 ] , a soluble pyridine nucleotide transhydrogenase ( sthA ) that reduces NAD+ with NADPH , and an ADP-sugar pyrophosphorylase ( nudE ) that could play a role in maintaining an optimal NADH/NAD+ ratio based on its ability to use NADH as a substrate [ 46 ] . 
+ Finally , an ArcA-regulated ribonucleoside transporter ( nepI ) and a nucleoside diphosphate kinase ( ndk ) could also function in NAD+ homeostasis via their functions in nucleotide metabolism [ 47 ] . 
+ A few repressed operons ( 9 ) encode proteins with functions not known to be associated with redox metabolism . 
+ This includes bssR and csgD , which encode transcription factors involved in biofilm formation and curli biosynthesis , respectively , and rsd , encoding a stationary phase induced anti-σ factor . 
+ Additionally , ArcA repressed outer membrane proteins ( cirA , ompW ) , a potassium efflux system ( kefGB-yheV ) , the ATPase component of the ClpAP protease ( clpA ) , an ATP binding protein ( phoH ) and a methyl-galactoside ABC transporter ( mgl ) . 
+ Although a rationalization for ArcA repression of each of these operons is not yet known , the control of mgl may be related to the report that E. coli K-12 is unable to grow fermentatively on galactose [ 48 ] . 
+ Finally , 13 repressed operons have only predicted or unknown function [ 47 ] ; four are predicted membrane proteins , two are predicted transcriptional regulators ( ydcI , yjiR ) , and two others are predicted to encode a dehydrogenase ( yeiQ ) and a fimbrial-like adhesin protein ( yehD ) , respectively . 
+ To gain insight into the mechanism of ArcA repression , we examined σ70 ChIP-seq data collected from growth conditions identical to those used with ArcA [ 49 ] . 
+ The vast majority ( 56/65 ) of ArcA-repressed promoters exhibited a statistically significant reduction in σ70 peak height under anaerobic conditions compared to aerobic conditions , consistent with ArcA preventing RNA polymerase binding ( Table S6 ) . 
+ In agreement with this observation , correlation of the position of predicted ArcA binding sites with known σ70-dependent transcription start sites ( TSSs from EcoCyc [ 47 ] or [ 50 ] ) indicated that the majority of repressed targets ( 52/66 ) with a confirmed TSS have an ArcA binding site that overlaps the region bound by σ70-RNAP ( the TSS , the -35 element or the -10 element ; Figure 7A , Table S6 ) . 
+ Eight promoter regions for ArcA-repressed operons did not exhibit a decrease in σ70 occupancy . 
+ Because these sites are located within divergently transcribed regions where the other operon is not affected by ArcA , σ70 occupancy may reflect only the adjacent non-ArcA-regulated promoter . 
+ In summary , the positioning of ArcA binding sites is consistent with the O2-dependent decrease in σ70 occupancy that is observed at nearly all ArcA-repressed operons , suggesting that ArcA likely represses transcription through promoter occlusion . 
+ ArcA directly activates 11 operons . 
+ Analysis of the function of directly activated genes indicated a diversity of functions . 
+ This includes hydrogenase 1 ( hyaABCDEF ) [ 51 ] , the ferrous iron transporter ( feoABC ) , an oligopeptide ABC transporter ( oppA ) , and the acid phosphatase transcriptional regulator ( appY ) that is involved in anaerobic gene regulation [ 52 ] . 
+ Our data also suggest a role for ArcA in the acid resistance response by activating operons encoding regulators of the glutamate dependent acid resistance system ( gadE-mdtEF and gadXW ) [ 53,54 ] , the arginine dependent acid resistance ( adiC ) system [ 55 ] and the resistance to organic acid stress ( slp-dctR ) [ 56 ] . 
+ The remaining ArcA-activated targets encode genes of unknown function ( ybcW and ybfA ) and a small regulatory RNA , fnrS [ 57,58 ] . 
+ Although fnrS was not present on our microarrays , a previous study showed that ArcA is a coactivator of this sRNA [ 57 ] . 
+ Examination of the σ70 occupancy data indicated that there is a statistically significant change in σ70-RNAP occupancy under anaerobic conditions for nine of the 10 directly activated operons , consistent with ArcA functioning in activation of these operons ( Table S6 ) . 
+ However , both the position and orientation of the predicted ArcA binding site relative to the TSS for each operon is variable among activated targets ( Figure 7B ) . 
+ Some binding sites are located downstream of the nearest mapped TSS , whereas others overlap the promoter elements or are located as far as 200 -- 400 bp upstream of the TSS . 
+ Given this variable positioning and orientation of ArcA binding sites , it remains unclear whether ArcA can activate transcription by directly contacting σ70-RNAP as found with some OmpR/PhoB family members [ 59 -- 61 ] . 
+ The direct regulon of ArcA extends beyond the 85 operons identified under our growth conditions
+ Many intergenic ArcA binding regions ( 76 ) were associated with operons that did not show an ArcA dependent change in gene expression in our studies . 
+ However , previous studies indicated that 13 operons are regulated by ArcA but under different growth conditions ( Table S7 ) . 
+ For example , cydAB expression is activated by ArcA under microaerobic growth conditions , when FNR repression is relieved [ 62 ] . 
+ Furthermore , many binding regions ( 31 ) are associated with operons that are poorly expressed under our growth conditions in both the arcA+ and ΔarcA strains ( e.g. , paa operon ; Table S8 ) . 
+ Since ArcA is predominantly a repressor of transcription , we hypothesized that these promoters were repressed by a second transcription factor or require a transcriptional activator and , therefore , growth under inducing conditions would be required to see an effect of ArcA binding on the transcription of these operons . 
+ To test this idea , we constructed a paaA promoter-lacZ fusion and measured β-galactosidase activity in WT and ΔarcA strains supplemented with phenylacetate ( PA ) because the paaABCDEFGHIJK operon is known to be repressed by PaaX in the absence of PA [ 63 ] . 
+ In the presence of PA , ArcA strongly repressed paaA-lacZ expression under anaerobic conditions ( 23 Miller units for WT ) , whereas repression was relieved in a strain lacking ArcA ( 404 Miller units ) or under aerobic conditions ( 294 and 372 Miller units for WT and ΔarcA , respectively ) , indicating that ArcA prevents induction of the paa operon under anaerobic conditions even when PA is present . 
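+ The size of the ArcA effect can be read directly from the reported Miller units as a fold repression ( activity without ArcA divided by activity with ArcA ) . The short calculation below uses only the four values quoted above ; the dictionary layout and function name are illustrative choices . 

```python
# Reported paaA-lacZ activities (Miller units) in the presence of phenylacetate.
miller_units = {
    ("anaerobic", "WT"): 23,
    ("anaerobic", "arcA_deletion"): 404,
    ("aerobic", "WT"): 294,
    ("aerobic", "arcA_deletion"): 372,
}

def fold_repression(condition, data=miller_units):
    """Activity in the deletion strain divided by activity in wild type."""
    return data[(condition, "arcA_deletion")] / data[(condition, "WT")]

for condition in ("anaerobic", "aerobic"):
    print(condition, round(fold_repression(condition), 2))
```

+ This gives roughly 17.6-fold repression anaerobically versus about 1.3-fold aerobically , which is the quantitative basis for saying that repression is specific to anaerobic conditions . 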
+ Examination of regulatory data in EcoCyc [ 47 ] indicated that 11 other poorly expressed operons also are associated with other annotated activators or repressors ( Table S8 ) that may contribute to synergistic regulation with ArcA . 
+ Furthermore , ChIP-chip experiments for other transcriptional repressors indicated that under our growth conditions , 15 targets are also bound by Fur , H-NS , or both [ Beauchene and Kiley , personal communication ; [ 49 ] ] ( Table S8 ) . 
+ Thus , repression by Fur and H-NS may mask effects of ArcA . 
+ Altogether , these results indicate that ArcA repression likely serves as a secondary layer of control at many of these operons , ensuring that induction does not occur under anaerobic conditions even when the specific inducer is encountered . 
+ Thus , the 85 operons that show a change in expression under fermentative growth with glucose represent just a subset of the complete ArcA direct regulon . 
+ The indirect regulon of ArcA may reflect a hierarchical mode of transcriptional regulation 
+ Of the 229 operons regulated by ArcA , 145 lacked ArcA binding in vivo and have not been shown previously to be directly regulated by ArcA . 
+ To assess whether an ArcA binding site was missed by our ChIP analyses at any of these operons , we searched the intergenic region upstream of each operon using a cutoff of 15 bits ( representing the average sequence conservation of the ArcA sequence logo ) . 
+ An ArcA binding site was identified upstream of only seven operons ( acnA , prpR , folE , yibF , yigI , dcuC/crcA ) , indicating that the remaining 135 operons are likely regulated through an indirect mechanism . 
+ Since ArcA directly regulates the expression of 17 transcription factors , a hierarchical mode of regulation could , in part , explain the differential expression of some of these operons . 
+ Although not all of these transcription factors are expected or known to be active under our growth conditions , differential expression of nine operons can likely be traced to one of these transcription factors ( Figure 8 ) . 
+ For example , the expression of the AppY dependent appCBA-yccB operon [ 52 ] is decreased when arcA is deleted , presumably because of the decrease in appY activation by ArcA . 
+ In addition , four target operons ( folE , gpmA , dld and eco ) of the ArcA-activated sRNA , FnrS were upregulated in the arcA mutant [ 57,58 ] . 
+ Finally , although we did not identify an ArcA binding site upstream of arcZ , the downregulation of sdaC ( the most strongly repressed target of the ArcZ sRNA in S. enterica [ 64 ] ) in the absence of arcA is consistent with ArcA-dependent activation of arcZ [ 65 ] . 
+ ArcA prevents the oxidation of non-fermentable carbon sources during fermentation 
+ Examination of EcoCyc ( v15.5 ) [ 47 ] for annotated dehydrogenase enzymes ( MultiFun term BC-1 ) indicated that ArcA either directly or indirectly regulates 37 out of 40 non-glycolytic dehydrogenase enzymes that are favored in the direction of reducing equivalent formation and are not involved in biosynthetic or detoxification functions ( Table S9 ) . 
+ The carbon oxidation pathways and transporters associated with the substrates of each repressed dehydrogenase are displayed in Figure 6 and the majority of these pathways feed into the TCA cycle for further carbon oxidation . 
+ The scope of this repression strongly suggests that a major function of ArcA is to repress all genes encoding enzymes that oxidize non-fermentable carbon compounds , thus preventing the formation of excess reducing equivalents ( e.g. , NADH , FADH2 and quinols ) that can not be readily re-oxidized in the absence of respiration . 
+ Nevertheless , despite the extensive upregulation of dehydrogenase enzymes , ArcA mutants have only a small increase in doubling time from 90 to 105 min ( Figure S2A ) and only a minor alteration in the distribution of fermentation end products ( Figure S2B -- C ) . 
+ Succinate and ethanol production were marginally increased and decreased by equivalent amounts in a ΔarcA strain , respectively , and lactate was not a major fermentation product ( Figure S2C ) . 
+ This suggests that the NADH/NAD+ ratio was not likely perturbed in our ΔarcA strain in agreement with previous results [ 66,67 ] . 
+ Distinct functional roles for ArcA and FNR
+ Although ArcA and FNR are known to mediate widespread changes in gene expression during the transition from aerobic to anaerobic conditions , the extent of the regulatory overlap between these factors has not been established . 
+ Previous gene expression studies have suggested that there may be a large overlap between the genes regulated by ArcA and FNR in both E. coli [ 33 ] and S. enterica [ 68 ] . 
+ However , comparison of our dataset with that determined recently for FNR using identical growth conditions suggests that there is little direct coregulation ( Figure S3 ) . 
+ Of the 37 operons that showed both FNR and ArcA dependent changes in expression , only seven are directly regulated by both ArcA and FNR . 
+ Rather , differential expression may result from an indirect effect of an fnr deletion on ArcA-P levels , which has been previously suggested to explain the FNR-dependent effect on sdhC and lldP expression [ 69 ] . 
+ An additional 12 operons show both ArcA and FNR binding in vivo but are differentially expressed in only one dataset ( e.g. , focA-pflB , cydAB ) . 
+ This minimal overlap in the direct regulons of ArcA-P and FNR suggests that these regulators occupy distinct functional roles in anaerobic gene regulation ; the ArcA regulon is largely centered around the repression of aerobic carbon oxidation pathways while FNR appears to function as a more general activator of anaerobic gene expression [ 49 ] . 
+ Some coregulated operons encode enzymes that direct carbon flow towards either oxidative or fermentative metabolism ( e.g. , pdhR-aceEF-lpdA , focA-pflB , yfiD ) while others encode principal components of the respiratory chain ( e.g. , nuo , ndh , cydA ) . 
+ However , coregulation of other operons ( e.g. bssR , ompW , ompC , oppA , ygjG , msrB ) by ArcA and FNR is surprising and the physiological implications of this coregulation are unknown . 
+ Discussion
+ By comparing ArcA binding in vivo with gene expression profiling data , we have greatly expanded the number of operons regulated by ArcA , leading to important insights into the physiological role , mechanism and sequence requirements for ArcA transcriptional regulation . 
+ Our analysis indicates that ArcA directly regulates the expression of nearly 100 operons and is predominantly a repressor of genes encoding proteins associated with carbon oxidation pathways . 
+ Furthermore , identification of binding sites upstream of many poorly expressed operons ( e.g. , paa ) suggests that the direct regulon of ArcA could actually encompass as many as 150 operons . 
+ Additionally , our bioinformatic and DNase I footprinting analyses reveal a plasticity in the ArcA binding site architecture that likely has important implications for global regulation of carbon oxidation in E. coli . 
+ ArcA is a global repressor of carbon oxidation pathways 
+ Our finding that under anaerobic conditions , ArcA reprograms metabolism by either directly or indirectly repressing expression of nearly all pathways for carbon sources whose oxidation is coupled to aerobic respiration suggests a global mechanism for NAD+ sparing . 
+ This strategy would facilitate the preferential oxidation of the fermentable carbon source glucose and the sparing of NAD+ for glycolysis by recycling NADH to NAD+ via reductive formation of lactic acid , succinate and ethanol . 
+ Thus , ATP synthesis via substrate level phosphorylation is ensured and redox balance of NADH/NAD+ is maintained during anaerobic glucose fermentation . 
+ This function of ArcA exhibits parallels to carbon catabolite repression in that it is another mechanism for selective carbon source utilization in cells . 
+ Although carbon catabolite repression preferentially selects for glucose utilization over other sugars , ArcA reinforces glucose catabolism through the repression of non-glycolytic carbon oxidation pathways . 
+ By integrating signals from both respiratory and fermentative metabolism , which are both enzymatically linked to the NADH/NAD+ redox couple , the ArcAB two component system provides a means for E. coli to maintain the NADH/NAD+ ratio . 
+ Despite the extensive upregulation of dehydrogenase enzymes in an arcA mutant , there was only a minor alteration in fermentation products . 
+ This result is in agreement with previous data , which also showed that the NADH/NAD+ ratio is not perturbed in strains lacking ArcA during fermentation [ 66,67 ] . 
+ The ability of glucose fermenting cells to maintain redox balance in the absence of ArcA likely reflects thermodynamic and kinetic parameters that favor flux via glucose fermentation and the fact that , although many dehydrogenases are upregulated , their substrates are not present , which prevents competition with glycolysis . 
+ Indeed , the activity of several dehydrogenases in cellular extracts was previously shown to be increased in an arcA mutant . 
+ However , the fact that the NADH/NAD+ ratio is altered in an arcB strain [ 70 ] may be explained by the additional roles of ArcB beyond regulating ArcA [ 39,71 ] . 
+ Nevertheless , previous studies suggest that ArcA deficiencies may compromise growth more significantly under conditions that more closely parallel the natural habitats of E. coli . 
+ For example , an arcA mutant is defective in both survival during aerobic carbon starvation [ 72 ] and in colonization of the mouse intestine [ 73 ] . 
+ Increased NADH/NAD+ ratios have been observed in an arcA mutant during microaerobiosis [ 66,67 ] , which may contribute to the poor fitness of arcA mutants in the gut . 
+ Accordingly , it seems reasonable to conclude that this extensive repression of dehydrogenase enzymes by ArcA provides an evolutionary advantage for E. coli in its natural habitats where nutrient conditions are in flux and where many more growth substrates ( i.e. , both carbon sources and electron acceptors ) could be encountered . 
+ Surprisingly , very little in vitro data are available describing mechanisms of ArcA transcription regulation . 
+ Nevertheless , the location of the ArcA binding sites and the decrease in σ70 occupancy indicate that ArcA represses by occluding RNA polymerase binding , like many repressors . 
+ However , the mechanism of activation is unlikely to occur through the direct recruitment of RNA Polymerase as observed with ArcA homologs OmpR [ 60,61 ] and PhoB [ 59 ] since no conserved location or orientation of ArcA binding sites was evident . 
+ Rather , ArcA may increase transcription through an antirepression mechanism . 
+ In support of this notion , in vivo studies of hyaA [ 31 ] , cydAB [ 74 ] , appY [ 75 ] and yfiD [ 76 ] transcription suggest that ArcA activation occurs primarily through disruption of HNS ( cydAB and appY ) , FNR ( yfiD ) or IscR ( hyaA ) binding . 
+ Furthermore , although the mechanism of ArcA activation of focA-pflB [ 77 ] and the PY promoter ( from the conjugative resistance plasmid R1 ) [ 78 ] is unknown , DNA binding by ArcA alone appears insufficient for its transcriptional activation . 
+ In addition , binding of ArcA alone actually repressed transcription of ndh [ 79 ] , despite the observation that ndh expression increased when arcA was deleted [ 32 ] . 
+ Although further in vitro experiments are necessary to investigate the activation mechanism , it seems plausible that ArcA functions solely by binding DNA and activates only indirectly when its binding interferes with the binding and repression by another transcriptional repressor . 
+ Plasticity within the architecture of ArcA binding sites 
+ The variation in the number , spacing , location and predicted strength of DR elements within the chromosomal ArcA binding regions suggests plasticity in the architecture of ArcA binding sites for either repressed or activated operons . 
+ Although the core of each site is an ArcA box containing two DR elements with 11-bp center-to-center ( ctc ) spacing , the majority of binding sites contain an additional one to three DRs predominantly spaced by approximately one or two turns of the helix of B-form DNA ( 11 bp or 22 bp ctc spacing ) . 
+ Multiple DR elements have also been observed for some promoters regulated by OmpR [ 80 ] and PhoB [ 59,81,82 ] . 
+ However , it is unclear how pervasive multiple repeat elements are for these regulators because the 41 genomic PhoB binding locations recently mapped by ChIP-chip were not searched for sequence elements beyond a single PhoB Box [ 83 ] and a conserved sequence motif was not identified within the majority of the 43 OmpR binding sites identified with ChIP-seq [ 84 ] . 
+ Although the three direct repeat binding site architecture represents a particularly novel finding for the OmpR/PhoB family of response regulators , at least one other example of a response regulator , ComA in B. subtilis , which binds three recognition elements ( i.e. , an inverted repeat and an additional half site ) has been reported and all three elements were shown to be important for both DNA binding and transcriptional activation [ 85 ] . 
+ Whether the protection of only three DR elements by ArcA reflects binding by a dimer and monomer or two dimers , where the distal subunit is not bound sufficiently to protect sequences from DNase I cleavage , is not yet known . 
+ Implications of binding site plasticity for global ArcA transcriptional regulation
+ Since the majority of ArcA binding sites overlap the σ70 promoter recognition elements , the plasticity of these cis-regulatory modules may provide an efficient means of encoding binding sites for ArcA , σ70-RNAP and perhaps other transcription factors within the same narrow sequence space . 
+ We propose that having binding sites with different architectures is also an effective mechanism for producing diverse transcriptional regulatory outputs . 
+ First , varying the number , strength or location of DR elements should modulate the extent of anaerobic repression . 
+ Second , embedding transcription factor binding sites within an ArcA binding site could either enhance or antagonize ArcA function . 
+ For example , the DR elements at the trxC , paaA and phoH promoters also overlap a binding site for a transcriptional activator ( CRP for paaA [ 63 ] , OxyR for trxC [ 45 ] ) or a second promoter ( P2 at phoH [ 86 ] ) , allowing additional regulatory control . 
+ Third , sites of varying affinities may also impact the sensitivity of promoters to the phosphorylation state of ArcA . 
+ For example , the different binding affinities of DR elements at the trxC , icdA , paaA and phoH promoters may allow the fine-tuning of expression in response to changing ArcA-P levels when O2 levels vary [ 16 ] . 
+ Fine tuning of ompF and ompC expression by OmpR has been observed in response to medium osmolarity due to the presence of multiple upstream OmpR boxes with different affinities [ 80 ] . 
+ Conversely , the highly cooperative mode of occupancy at the astC and acs promoters would likely render the expression of these operons exquisitely sensitive to changes in ArcA-P levels ; thus , expression may more closely resemble an on-off switch . 
+ Ultimately , such flexibility in transcriptional regulatory outputs may be an important means for linking the redox sensing properties of the ArcAB two component system with the global optimization of carbon oxidation pathway levels . 
+ Further studies are underway to examine the contribution of different binding site architectures to 
+ Materials and Methods
+ Growth conditions
+ All strains were grown in MOPS minimal medium [ 87 ] with 0.2 % glucose at 37°C and sparged with a gas mix of 95 % N2 and 5 % CO2 ( anaerobic ) or 70 % N2 , 5 % CO2 , and 25 % O2 ( aerobic ) . 
+ Cells were harvested during mid-log growth ( OD600 of ~0.3 on a Perkin Elmer Lambda 25 UV/Vis Spectrophotometer ) . 
+ Construction of promoter-lacZ fusions and β-galactosidase assays
+ A paaA promoter-lacZ fusion was constructed as described previously [ 88 ] by amplifying the region from +15 to -194 relative to the translation start using primers flanked by XhoI or BamHI restriction sites . 
+ A TAA stop codon was incorporated after codon 5 to terminate translation from the Shine-Dalgarno sequence present in this region . 
+ The resulting PCR fragment was digested with XhoI and BamHI and directionally cloned into plasmid pPK7035 . 
+ This lacZ promoter construct was then recombined into the chromosomal lac operon as previously described [ 88 ] to create the paaA promoter-lacZ fusion and then transduced using P1 vir into MG1655 and PK9416 ( ΔarcA ) to create PK9959 and PK9960 ( Table S10 ) . 
+ For assays with paaA , 1 mM phenylacetic acid ( Sigma Aldrich ) was added to the minimal glucose media . 
+ To terminate cell growth and any further protein synthesis , chloramphenicol ( final concentration , 20 μg/ml ) was added , and cells were placed on ice until assayed for β-galactosidase activity [ 89 ] . 
+ β-galactosidase values represent the average of at least three replicates . 
+ Cloning , overexpression and purification of His6-ArcA
+ arcA was amplified with primers that incorporated an NheI restriction site , a His6-tag and a TEV protease cleavage site ( order listed in the 5′-3′ direction ) on the 5′ end of the gene and an XhoI site at the 3′ end . 
+ The NheI and XhoI digested fragments were cloned into plasmid pET 21-d to generate plasmid PK9431 for protein production . 
+ E. coli BL21 ( DE3 ) containing PK9431 was grown at 37°C until an OD600 of 0.5 -- 0.6 was reached , then 1 mM isopropyl-1-thio-β-D-galactopyranoside ( IPTG ) was added . 
+ After seven hours at 30°C , cells were harvested , suspended in 5 mM imidazole , 50 mM Tris-Cl , pH 8.3 and 0.3 M NaCl and lysed by sonication . 
+ His6-ArcA was isolated from cell lysates by passage over a Ni-NTA column pre-equilibrated with 5 mM imidazole , washing extensively with the same buffer followed by 50 mM imidazole , and then eluting with a linear gradient of 50 -- 500 mM imidazole . 
+ Fractions containing the overexpressed His6-ArcA , determined by electrophoresis , were dialyzed against 50 mM Tris-Cl , pH 8.0 and 0.1 M NaCl and concentrated . 
+ Antibodies to ArcA were obtained from Harlan ( Indianapolis , IN ) , affinity purified prior to use and determined to be specific to ArcA by Western blot ( data not shown ) . 
+ For DNase I footprinting , the His6 tag was removed from ArcA by overnight incubation with tobacco etch virus ( TEV ) protease at 4°C and passage over a Ni2+-agarose column ( Qiagen ) . 
+ The protein concentration of ArcA ( reported here as monomers ) was determined with the Coomassie Plus protein assay reagent ( Pierce ) , using bovine serum albumin as a standard . 
+ Chromatin immunoprecipitation followed by hybridization to a microarray chip or high-throughput sequencing
+ ChIP was performed as previously described [ 90 ] using the affinity purified ArcA polyclonal antibodies . 
+ ChIP DNA along with corresponding input DNA were amplified by linker-mediated PCR and labeled with Cy3 or Cy5-random 9-mers then hybridized as previously described [ 49 ] to custom-made E. coli K-12 MG1655 tiled genome microarrays ( Roche NimbleGen , Inc , Madison , WI ) . 
+ The hybridized microarrays were scanned using NimbleGen Hybridization System 4 and the PMT was adjusted as previously described [ 49 ] . 
+ Quantile normalization ( `` normalize.quantiles '' in the R package VSN ) [ 91 ] was used to obtain the same empirical distribution across the Cy3 and Cy5 channels and across biological replicate arrays to correct for dye intensity bias and to minimize microarray-to-microarray absolute intensity variations as previously described [ 92 ] . 
+ The log2 of the ratio of experimental signals ( Cy5 ) to control signals ( Cy3 ) was calculated . 
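The two steps just described (forcing a common empirical distribution across channels, then taking the log2 of the Cy5/Cy3 ratio) can be sketched as follows. This is a minimal illustration in plain Python, not the R routine cited in the text; the function names are hypothetical, tied values are not averaged, and the intensities in the usage note are made up.

```python
import math


def quantile_normalize(columns):
    """Give every column (one per channel or array) the same empirical distribution.

    columns: list of equal-length lists of intensities.
    Assumption: ties are broken by position rather than averaged.
    """
    n = len(columns[0])
    sorted_cols = [sorted(col) for col in columns]
    # Reference distribution: mean of the k-th smallest value across columns.
    reference = [sum(col[k] for col in sorted_cols) / len(columns) for k in range(n)]
    normalized = []
    for col in columns:
        order = sorted(range(n), key=lambda i: col[i])  # probe indices, lowest first
        out = [0.0] * n
        for rank, idx in enumerate(order):
            out[idx] = reference[rank]  # probe keeps its rank, takes the reference value
        normalized.append(out)
    return normalized


def log2_ratios(cy5, cy3):
    """log2 of experimental (Cy5) over control (Cy3) signal, probe by probe."""
    return [math.log2(e / c) for e, c in zip(cy5, cy3)]
```

For example, `quantile_normalize([[5, 2, 3], [4, 1, 6]])` maps both columns onto the shared reference values 1.5, 3.5 and 5.5 while preserving each probe's rank within its column.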
+ Regions of the genome enriched for occupancy by ArcA were identified using TAMALPAIS [ 93 ] L2 and L3 stringency levels ( 95th percentile/p < 0.0001 and 98th percentile/p < 0.05 of the log2 ratio for each chip , respectively ) with the anaerobic fermentative ArcA data . 
+ Only enriched regions that were significant in both biological replicates were considered , resulting in the identification of 194 binding regions . 
+ Four false positives were eliminated from the data set by analyzing technical replicate ChIP-chip results from a strain lacking arcA ( PK9416 ; Table S11 ) . 
+ Fifty-three false positives were eliminated because we found that they resulted from ArcA co-immunoprecipitating with RNA polymerase at highly transcribed regions ( Figure S4 ; Table S12 ; Text S1 ) , leaving 137 regions . 
+ The phosphorylation dependence of ArcA DNA binding at these sites was determined by performing a single biological replicate ArcA ChIP-chip experiment under aerobic conditions . 
+ For visualization , the anaerobic ArcA biological replicates were averaged and then median smoothed over a 300 bp window with MochiView [ 94 ] . 
+ For ChIP-seq , enriched ChIP DNA from two additional biological replicates from anaerobic ArcA samples were submitted to the University of Wisconsin-Madison DNA Sequencing Facility for library construction and Illumina sequencing performed as previously described [ 49 ] . 
+ A total of 1,364,908 and 12,074,358 reads were obtained for the ChIP replicates . 
+ Greater than 90 % and 80 % of these reads , respectively , mapped uniquely to the K12 MG1655 genome ( version U00096.2 ) using the software package SOAP release 2.20 , allowing no more than two mismatches [ 95 ] . 
+ The CSDeconv algorithm [ 41 ] was then used to determine significantly enriched regions in high resolution using both ChIP-seq replicates and two anaerobic input samples [ 49 ] from the same sequencing run as the ArcA ChIP samples . 
+ Reads that mapped uniquely within the seven rRNA operon regions were eliminated to allow the algorithm to run more efficiently . 
+ CSDeconv was run with Matlab v7.11.0 ( R2010b ) using the following parameters : LLR = 21.75 and alpha = 800 for replicate one and LLR = 22 and alpha = 550 for replicate two . 
+ The find_enriched function was modified to account for differences in sequencing depth between the IP and Input samples . 
+ Correction factors of 2.98 ( replicate 1 ) and 0.6579 ( replicate 2 ) , calculated by dividing the number of unique reads in the Input sample by the number of reads in the ChIP sample for each replicate , were multiplied by nip and by the kernel density calculations for the forward and reverse strands of the ChIP sample . 
+ FDRs of 0.0154 and 0.0156 for replicates one and two , respectively , were calculated by a sample swap ( the number of peaks in the Input over the ChIP sample divided by the number of detections in the ChIP over the control sample ) . 
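The depth correction and the sample-swap FDR described above are simple ratios, and a short sketch may make the arithmetic explicit. The function names are hypothetical and this is not the modified CSDeconv code; it only restates the two definitions given in the text, and the counts in the usage note are illustrative rather than the paper's data.

```python
def depth_correction_factor(input_unique_reads, chip_reads):
    """Input unique reads divided by ChIP reads, as defined in the text."""
    if chip_reads <= 0:
        raise ValueError("chip_reads must be positive")
    return input_unique_reads / chip_reads


def sample_swap_fdr(peaks_input_over_chip, peaks_chip_over_input):
    """Peaks called with Input treated as the IP, divided by peaks called normally."""
    if peaks_chip_over_input <= 0:
        raise ValueError("no peaks were called in the ChIP over Input comparison")
    return peaks_input_over_chip / peaks_chip_over_input
```

With illustrative counts, `sample_swap_fdr(3, 200)` gives 0.015, and `depth_correction_factor(3_000_000, 1_500_000)` gives 2.0.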
+ From 222 enriched regions generated from two independent ChIP-seq replicates , 146 ArcA-P binding regions ( Table S1 ) were obtained using the same filtering criteria described for ChIP-chip ( Table S12 ; Text S1 ) . 
+ For visualization of the ChIP-seq data , the raw tag density at each position was calculated using QuEST version 2.0 [ 96 ] and normalized as tag density per million uniquely mapped reads . 
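The normalization to tag density per million uniquely mapped reads amounts to a single scaling step. The sketch below assumes the raw per-position densities are already available as a list; the function name is hypothetical and this is not QuEST code.

```python
def tags_per_million(raw_density, uniquely_mapped_reads):
    """Scale raw per-position tag densities to tags per million uniquely mapped reads."""
    if uniquely_mapped_reads <= 0:
        raise ValueError("uniquely_mapped_reads must be positive")
    scale = 1_000_000 / uniquely_mapped_reads
    return [density * scale for density in raw_density]
```

Scaling in this way lets replicates with very different sequencing depths be drawn on a common axis.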
+ The final list of 176 binding regions was obtained by searching binding regions that were found in only one ChIP-seq replicate ( 48 ) or were unique to ChIP-chip ( 28 ) with the ArcA box PWM ( see below ) using a cutoff of 10 bits as 99 % of ArcA boxes in the alignment have an individual information content of 10 bits or greater . 
+ An ArcA binding site was identified in 30 of these binding regions ( 15 from ChIP-chip and 15 from ChIP-seq ) which were , therefore , combined with the 146 regions found in both ChIP-seq replicates to produce the final list of 176 ArcA chromosomal binding regions ( Table S1 ) . 
+ ArcA PWM construction and identification of predicted ArcA binding sites 
+ Based on the improved resolution of ChIP-seq , sequence corresponding to a 200 bp window around each of the 146 CSDeconv binding regions ( averages of the two replicates ) was searched for a common motif using MEME [ 42 ] with the parameters -mod zoops -nmotifs 1 -minw 18 -maxw 25 . 
+ Using the alignment from MEME , a sequence logo was built using the Delila software package with the delila , encode , rseq , dalvec , and makelogo programs [ 97 ] . 
+ A PWM generated from this alignment was used to search the 146 binding regions with a cutoff of 9 bits as this represents the lowest scoring ArcA box included in the MEME alignment . 
+ Using the program localbest , only the best scoring ArcA box within a 200 bp region was retained due to several instances of overlapping ArcA-P boxes being identified ( sites with three and four DR elements ) . 
+ The resulting 128 ArcA-P boxes were used to make the final sequence logo ( Figure 3A ) . 
+ The delila program ri [ 97 ] was used to calculate the information content of individual sequences between positions -3 and 14 , which ranged from 9.1 to 21 bits ( Table S3 ) . 
+ A PWM derived from the conservation of bases between positions -3 and 14 in these 128 ArcA-P boxes is referred to throughout the paper as the ArcA box PWM . 
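As a rough illustration of how an individual information weight matrix scores a candidate site, the sketch below builds per-position weights of the form 2 + log2 f(b, l) from aligned sites and sums them along a sequence, then slides the matrix along a longer sequence with a bit cutoff. It is not the Delila ri or scan program: the small-sample correction is omitted, a pseudocount is added so that unseen bases do not produce log2(0) (an assumption of this sketch), only one strand is scanned, and all function names and toy sequences are hypothetical.

```python
import math

BASES = "ACGT"


def build_weight_matrix(aligned_sites, pseudocount=1.0):
    """Per-position weights in bits: 2 + log2(frequency of each base)."""
    length = len(aligned_sites[0])
    matrix = []
    for pos in range(length):
        counts = {base: pseudocount for base in BASES}  # assumption: pseudocount
        for site in aligned_sites:
            counts[site[pos].upper()] += 1
        total = sum(counts.values())
        matrix.append({b: 2.0 + math.log2(counts[b] / total) for b in BASES})
    return matrix


def individual_information(matrix, sequence):
    """Sum of the weights of the bases observed at each position, in bits."""
    return sum(column[base] for column, base in zip(matrix, sequence.upper()))


def scan_sequence(matrix, sequence, cutoff_bits):
    """Return (start, score) for every window scoring at or above the cutoff."""
    width = len(matrix)
    hits = []
    for start in range(len(sequence) - width + 1):
        score = individual_information(matrix, sequence[start:start + width])
        if score >= cutoff_bits:
            hits.append((start, score))
    return hits
```

A site that matches the alignment well scores several bits, a poor match scores near or below zero, and the cutoff passed to `scan_sequence` plays the role of the bit-score thresholds quoted in the text.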
+ No unique motif was identified within the 18 binding regions without a match to the ArcA box . 
+ The scan program [ 97 ] was used to search DNA sequences upstream of differentially expressed operons that were not enriched in ChIP using the ArcA box PWM . 
+ The E. coli K12 genome sequence [ 98 ] was obtained from GenBank ( v. U00096.2 ) and a bit score cutoff of 15 bits was used as this represents the average information content of the ArcA box PWM . 
+ The localbest program was used to select the best scoring ArcA box within a 200 bp region in cases where two sites were predicted in close proximity . 
+ To construct the 10 bp PWM corresponding to a single direct repeat element , positions -3 to 6 and 8 to 17 from the 128 sequences used to make the ArcA box sequence logo were aligned as they correspond to the nucleotides contacted by each PhoB monomer in the crystal structure of the C-terminus of PhoB bound to its PhoB box [ 99 ] . 
+ Due to the identical spacing between DR elements and the highly similar nucleotide compositions of the PhoB and ArcA boxes , this structure likely serves as a good model for the nucleotides contacted by each ArcA monomer . 
+ A bit score cutoff of 0 , which represents the theoretical lowest limit of binding [ 97 ] , was used to search a 100 bp region surrounding each identified ArcA box with the scan program to identify sites with additional repeat elements . 
+ Where displayed , sequence walkers were used to visualize matches to the ArcA-P binding site using the lister program [ 100 ] . 
+ Gene expression profiling with a microarray
+ An in-frame ΔarcA deletion strain was constructed by replacing the coding region of arcA ( codons 2 -- 238 ) with a Cm resistance cassette flanked by FLP recognition target ( FRT ) sites from plasmid pKD32 in strain BW25993/pKD46 , as described previously [ 101 ] , to generate PK7510 . 
+ Transduction with P1 vir was used to move the arcA::cat allele into MG1655 to produce PK7514 . 
+ The CmR cassette of PK7514 was removed by transforming this strain with pCP20 , which encodes FLP recombinase [ 101 ] , then screening for loss of Cm resistance , generating PK9416 ( Table S10 ) . 
+ The deletion was confirmed by sequencing . 
+ RNA was isolated from triplicate MG1655 and ΔarcA ( PK9416 ) strains using a hot-phenol method [ 102 ] . 
+ The RNA was reverse transcribed to cDNA , labeled with Cy3-random 9-mers and hybridized onto the Roche NimbleGen E. coli 4plex Expression Array Platform ( 4×72,000 probes , Catalog Number A6697-00-01 ) as previously described [ 49 ] . 
+ The expression data was normalized using Robust Multi-Array ( RMA ) [ 103 ] and statistical analysis was performed with Arraystar III software ( DNASTAR ) . 
+ Transcripts exhibiting a statistically significant ( moderated t-test p-value < 0.05 ) change in expression greater than 2-fold were considered differentially expressed and grouped into operons using operon definitions in EcoCyc [ 47 ] if at least two of the genes in a particular operon exhibited differential expression . 
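The selection rule above (p-value below 0.05, change greater than 2-fold in either direction, at least two qualifying genes per operon) can be sketched as a small filter. The data layout, function name and gene names are hypothetical, and the treatment of single-gene operons (accepted when their one gene qualifies) is an assumption of this sketch that the text does not state.

```python
def differentially_expressed_operons(gene_stats, operons,
                                     p_cutoff=0.05, fold_cutoff=2.0, min_genes=2):
    """Return {operon: [qualifying genes]} for operons passing the rule.

    gene_stats: gene -> (fold_change, p_value), fold_change as a mutant/WT ratio.
    operons: operon name -> list of member genes.
    """
    def is_de(gene):
        if gene not in gene_stats:
            return False
        fold, p_value = gene_stats[gene]
        if fold <= 0:
            return False
        magnitude = max(fold, 1.0 / fold)  # treat up and down changes alike
        return p_value < p_cutoff and magnitude > fold_cutoff

    selected = {}
    for name, genes in operons.items():
        hits = [g for g in genes if is_de(g)]
        # Assumption: a single-gene operon needs only its one gene to qualify.
        required = min(min_genes, len(genes))
        if genes and len(hits) >= required:
            selected[name] = hits
    return selected
```

For instance, an operon with three genes of which only one passes both cutoffs is dropped, whereas one with two passing genes is retained.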
+ End product analysis
+ Samples ( 2 ml ) for end product analysis were collected during log phase , the transition to stationary phase and in stationary phase ( Figure 7A ) . 
+ Cells were removed by passage through a 0.2 μm filter and the supernatant was stored at -80°C prior to analysis . 
+ For each sample , glucose , pyruvic acid , succinic acid , lactic acid , formic acid , acetic acid , and ethanol were separated by high-performance liquid chromatography ( HPLC ) and subsequently quantified as previously described [ 104 ] . 
+ DNase I footprinting
+ Plasmids containing predicted ArcA-P binding sites were generated by PCR amplification of chromosomal DNA with primers flanked by XhoI or BamHI restriction sites and cloned into pPK7179 or pPK7035 ( for the icdA promoter ) ( Table S10 ) . 
+ The positions of the promoter fragments relative to the previously identified transcription start sites are as follows : for icdA [ 23 ] , -216 to +65 ; for acs ( P2 ) [ 105 ] , -172 to +44 ; for phoH ( P2 ) [ 86 ] , -161 to +20 ; for paaA [ 106 ] , -132 to +55 ; for astC [ 107 ] , -166 to +62 ; for putP ( P1 ) [ 108 ] , -120 to +56 ; for trxC [ 45 ] , -118 to +50 ; for dctA [ 109 ] , -185 to +32 . 
+ The icdA fragment contains two promoters : one whose expression is dependent on ArcA ( P1 ) and a second promoter whose expression is dependent on FruR ( P2 ) [ 23,110 ] . 
+ To examine icdA expression from only P1 in future expression analyses , transcription from P2 was eliminated using the site-directed mutagenesis protocol described in [ 111 ] to mutate the -10 site from cattat to cggtga . 
+ DNA fragments were isolated from pPK7179 or pPK9476 ( icdA ) after digestion with XhoI and BamHI , radiolabelled at the 3′ BamHI end with [ α-32P ]-dGTP ( PerkinElmer ) and Sequenase Version 2.0 ( USB Scientific ) , isolated from a non-denaturing 5 % acrylamide gel and subsequently purified with elutip-d columns ( Schleicher and Schuell ) . 
+ ArcA was phosphorylated by incubating with 50 mM disodium carbamyl phosphate ( Sigma Aldrich ) in 50 mM Tris , pH 7.9 , 150 mM NaCl , and 10 mM MgCl2 for 1 h at 30°C [ 24 ] and immediately used in the binding assays . 
+ Footprinting assays were performed by incubating phosphorylated ArcA with labeled DNA ( ~5 nM ) for 10 min at 30°C in 40 mM Tris ( pH 7.9 ) , 30 mM KCl , 100 μg/ml BSA and 1 mM DTT followed by the addition of 2 μg/ml DNase I ( Worthington ) for 30 s . 
+ The DNase I reaction was terminated by the addition of sodium acetate and EDTA to final concentrations of 300 mM and 20 mM , respectively . 
+ The reaction mix was ethanol precipitated , resuspended in urea loading dye , heated for 60 s at 90°C , and loaded onto a 7 M urea , 8 % polyacrylamide gel in 0.5× TBE buffer . 
+ An A+G ladder was made by formic acid modification of the radiolabeled DNA , followed by piperidine cleavage [ 112 ] . 
+ The reaction products were 
+ Data deposition
+ All genome-wide data from this publication have been deposited in NCBI 's Gene Expression Omnibus ( GSE46415 ) . 
+ Supporting Information
+ with the relative frequencies of each base depicted by its relative heights . 
+ The two , three and four DR element binding sites used in this figure are listed in Table S4 . 
+ ( EPS ) formate and glucose . 
+ ( C ) Concentration of succinate , ethanol and lactate . 
+ Symbols are described in the legend and error bars represent the standard deviation of three biological replicates . 
+ ( EPS ) when the ChIP-chip experiment was performed in a ΔarcA strain ( red ) . 
+ ( B ) Correlation of the anaerobic WT ( blue ) or ΔarcA ( red ) ChIP-chip signal with that for RNAP β . To construct this plot , the genome was divided into 300 bp non-overlapping bins and the maximum log2 ratio was extracted for each sample in each bin . 
+ The solid lines represent the regression lines for each data set for RNAP β log2 ratios greater than or equal to 1.75 with the corresponding Pearson correlation coefficient ( r ) indicated in the figure legend . 
+ ( C ) Correlation of the aerobic ( cyan ) and anaerobic ( blue ) ArcA ChIP-chip signal with that for RNAP beta performed as described for B. ( D ) Maximal aerobic or anaerobic ArcA log2 ratios within all 137 enriched regions that were retained in the ArcA dataset ( Table S10 ) . 
+ ( E ) Maximal aerobic or anaerobic ArcA log2 ratio within all 53 enriched regions that were eliminated from the ArcA dataset due to ArcA likely crosslinking with RNAP ( Table S9 ) . 
+ Acknowledgments
+ We thank Huihuang Yan for assistance with ChIP-seq data analysis , James Keck for providing TEV protease , Richard Gourse for providing strains and Wilma Ross for assistance with DNase I footprinting experiments . 
+ We also thank Irene Ong for assistance compiling the Delila programs and members of the Kiley lab for comments on the manuscript . 
+ Author Contributions
+ Conceived and designed the experiments : DMP AZA RL PJK . 
+ Performed the experiments : DMP . 
+ Analyzed the data : DMP . 
+ Contributed reagents/materials/analysis tools : MSA AZA . 
+ Wrote the paper : DMP RL PJK . 
+ 89 . 
+ Miller JH ( 1972 ) Experiments in molecular genetics . 
+ [ Cold Spring Harbor , N.Y. ] : Cold Spring Harbor Laboratory . 
+ 90 . 
+ Davis SE , Mooney RA , Kanin EI , Grass J , Landick R , et al. ( 2011 ) Mapping E. coli RNA polymerase and associated transcription factors and identifying promoters genome-wide . 
+ Methods Enzymol 498 : 449 -- 471 . 
+ 91 . 
+ Huber W , von Heydebreck A , Sultmann H , Poustka A , Vingron M ( 2002 ) Variance stabilization applied to microarray data calibration and to the quantification of differential expression . 
+ Bioinformatics 18 Suppl 1 : S96 -- 104 . 
+ 92 . 
+ Dufour YS , Landick R , Donohue TJ ( 2008 ) Organization and evolution of the biological response to singlet oxygen stress . 
+ J Mol Biol 383 : 713 -- 730 . 
+ 93 . 
+ Bieda M , Xu X , Singer MA , Green R , Farnham PJ ( 2006 ) Unbiased location analysis of E2F1-binding sites suggests a widespread role for E2F1 in the human genome . 
+ Genome Res 16 : 595 -- 605 . 
+ 94 . 
+ Homann OR , Johnson AD ( 2010 ) MochiView : versatile software for genome browsing and DNA motif analysis . 
+ BMC Biol 8 : 49 . 
+ 95 . 
+ Li R , Yu C , Li Y , Lam TW , Yiu SM , et al. ( 2009 ) SOAP2 : an improved ultrafast tool for short read alignment . 
+ Bioinformatics 25 : 1966 -- 1967 . 
+ 96 . 
+ Valouev A , Johnson DS , Sundquist A , Medina C , Anton E , et al. ( 2008 ) Genome-wide analysis of transcription factor binding sites based on ChIP-Seq data . 
+ Nat Methods 5 : 829 -- 834 . 
+ 97 . 
+ Schneider TD ( 1997 ) Information content of individual genetic sequences . 
+ J Theor Biol 189 : 427 -- 441 . 
+ 98 . 
+ Blattner FR , Plunkett G , et al. ( 1997 ) The complete genome sequence of Escherichia coli K-12 . 
+ Science 277 : 1453 -- 1462 . 
+ 99 . 
+ Blanco AG , Sola M , Gomis-Ruth FX , Coll M ( 2002 ) Tandem DNA recognition by PhoB , a two-component signal transduction transcriptional activator . 
+ Structure 10 : 701 -- 713 . 
+ 100 . 
+ Schneider TD ( 1997 ) Sequence walkers : a graphical method to display how binding proteins interact with DNA or RNA sequences . 
+ Nucleic Acids Res 25 : 4408 -- 4415 . 
+ 101 . 
+ Datsenko KA , Wanner BL ( 2000 ) One-step inactivation of chromosomal genes in Escherichia coli K-12 using PCR products . 
+ Proc Natl Acad Sci U S A 97 : 6640 -- 6645 . 
+ 102 . 
+ Khodursky AB , Bernstein JA , Peter BJ , Rhodius V , Wendisch VF , et al. ( 2003 ) Escherichia coli spotted double-strand DNA microarrays : RNA extraction , labeling , hybridization , quality control , and data management . 
+ Methods Mol Biol 224 : 61 -- 78 . 
+ 103 . 
+ Bolstad BM , Irizarry RA , Astrand M , Speed TP ( 2003 ) A comparison of normalization methods for high density oligonucleotide array data based on variance and bias . 
+ Bioinformatics 19 : 185 -- 193 . 
+ 104 . 
+ Schwalbach MS , Keating DH , Tremaine M , Marner WD , Zhang Y , et al. ( 2012 ) Complex physiology and compound stress responses during fermentation of alkali-pretreated corn stover hydrolysate by an Escherichia coli ethanologen . 
+ Appl Environ Microbiol 78 : 3442 -- 3457 . 
+ 105 . 
+ Beatty CM , Browning DF , Busby SJ , Wolfe AJ ( 2003 ) Cyclic AMP receptor protein-dependent activation of the Escherichia coli acs P2 promoter by a synergistic class III mechanism . 
+ J Bacteriol 185 : 5148 -- 5157 . 
+ 106 . 
+ Ferrandez A , Minambres B , Garcia B , Olivera ER , Luengo JM , et al. ( 1998 ) Catabolism of phenylacetic acid in Escherichia coli . 
+ Characterization of a new aerobic hybrid pathway . 
+ J Biol Chem 273 : 25974 -- 25986 . 
+ 107 . 
+ Fraley CD , Kim JH , McCann MP , Matin A ( 1998 ) The Escherichia coli starvation gene cstC is involved in amino acid catabolism . 
+ J Bacteriol 180 : 4287 -- 4290 . 
+ 108 . 
+ Nakao T , Yamato I , Anraku Y ( 1987 ) Nucleotide sequence of putC , the regulatory region for the put regulon of Escherichia coli K12 . 
+ Mol Gen Genet 210 : 364 -- 368 . 
+ 109 . 
+ Davies SJ , Golby P , Omrani D , Broad SA , Harrington VL , et al. ( 1999 ) Inactivation and regulation of the aerobic C ( 4 ) - dicarboxylate transport ( dctA ) gene of Escherichia coli . 
+ J Bacteriol 181 : 5624 -- 5635 . 
+ 110 . 
+ Prost JF , Negre D , Oudot C , Murakami K , Ishihama A , et al. ( 1999 ) Cra-dependent transcriptional activation of the icd gene of Escherichia coli . 
+ J Bacteriol 
+ 111 . 
+ Nesbit AD , Giel JL , Rose JC , Kiley PJ ( 2009 ) Sequence-specific binding to a subset of IscR-regulated promoters does not require IscR Fe-S cluster ligation . 
+ J Mol Biol 387 : 28 -- 41 . 
+ 112 . 
+ Maxam AM , Gilbert W ( 1980 ) Sequencing end-labeled DNA with base-specific chemical cleavages . 
+ Methods Enzymol 65 : 499 -- 560 . 
+ 113 . 
+ Schneider TD , Stephens RM ( 1990 ) Sequence logos : a new way to display consensus sequences . 
+ Nucleic Acids Res 18 : 6097 -- 6100 . 
+ 114 . 
+ Neuweger H , Persicke M , Albaum SP , Bekel T , Dondrup M , et al. ( 2009 ) Visualizing post genomics data-sets on customized pathway maps by ProMeTra-aeration-dependent gene expression and metabolism of Corynebacterium glutamicum as an example . 
+ BMC Syst Biol 3 : 82 .
\ No newline at end of file
--- a/data/TEXT_FILES/useful_txt/24244182.txt 0 → 100644
View file @27818a9
+++ b/data/TEXT_FILES/useful_txt/24244182.txt 0 → 100644
View file @27818a9
+ Chromosomal Domain of E. coli
+ Abstract 
+ The E. coli chromosome is compacted by segregation into 400 -- 500 supercoiled domains by both active and passive mechanisms , for example , transcription and DNA-protein association . 
+ We find that prophage Mu is organized as a stable domain bounded by the proximal location of Mu termini L and R , which are 37 kbp apart on the Mu genome . 
+ Formation / maintenance of the Mu ` domain ' configuration , reported by Cre-loxP recombination and 3C ( chromosome conformation capture ) , is dependent on a strong gyrase site ( SGS ) at the center of Mu , the Mu L end and MuB protein , and the E. coli nucleoid proteins IHF , Fis and HU . 
+ The Mu domain was observed at two different chromosomal locations tested . 
+ By contrast , prophage λ does not form an independent domain . 
+ The establishment/maintenance of the Mu domain was promoted by low-level transcription from two phage promoters , one of which was domain dependent . 
+ We propose that the domain confers transposition readiness to Mu by fostering topological requirements of the reaction and the proximity of Mu ends . 
+ The potential benefits to the host cell from a subset of proteins expressed by the prophage may in turn help its long-term stability . 
+ Introduction
+ Bacterial chromatin is spatially organized and condensed ~1000 fold to fit inside a bacterial cell [ 1 ] . 
+ Referred to as a nucleoid , E. coli chromatin is organized in a series of negatively supercoiled loops [ 2,3 ] , segregated by dynamic domain barriers ( defined as entities that prevent the free diffusion of supercoils ) and compacted by several nucleoid-associated proteins ( NAPs ) including HU , IHF , Fis and H-NS [ 4,5 ] . 
+ The chromosome is not randomly condensed , but rather has a ring organization with four structured macro-domains and two less-structured regions ; interactions between these regions are highly restricted as determined by cytological and genetic analyses [ 6 ] . 
+ Macro-domains are thought to orchestrate chromosome movements during the cell cycle [ 7 ] . 
+ Supercoiling not only plays a vital role in compacting the chromosome , but a proper degree of supercoiling is crucial for all DNA-related processes [ 1 ] . 
+ Segregation of supercoils into topological domains protects these processes by preventing DNA breaks from relaxing the entire chromosome [ 2,8 ] . 
+ The level of DNA superhelicity is tightly controlled by the combined activities of topoisomerases and histone-like proteins ; the latter not only constrain negative supercoils and generate diffusion barriers for the formation of topological domains , but are also global regulators of gene transcription [ 4,9 -- 11 ] . 
+ The critical importance of DNA supercoiling in interconnecting chromosome structure and global gene transcription was reinforced in recent evolution experiments where supercoiling was observed to be under strong selection in E. coli populations [ 12 ] . 
+ Transposable phage Mu is a temperate phage that integrates into essentially random locations on the E. coli chromosome [ 13 -- 15 ] . 
+ Transposition from mini-Mu plasmids in vitro requires DNA supercoiling for formation of a high-order transpososome within which the two Mu ends are interwound and synapsed [ 13,16 ] . 
+ Supercoiling is inferred to be similarly important in vivo [ 17 ] , where Mu end pairing during replicative transposition additionally requires a centrally located gyrase binding site SGS [ 18 ] . 
+ This site is the strongest such site studied , and is found only in Mu-like prophages [ 19 ] . 
+ Highly processive supercoiling by gyrase bound at the SGS has been proposed to propagate a supercoiled loop , with SGS at the apex , assisting transposase-mediated synapsis of Mu 
+ In vitro studies found that the Mu transposase mediates an ordered interaction of three cis-acting sites -- Mu L and R ends and an enhancer element E -- which traps five supercoils before pairing the L and R ends in their reactive configuration [ 21 ] . 
+ The original goal of the present study was to test if the three sites interact in a similar order in vivo upon initiation of transposition . 
+ The in vitro studies employed Cre recombinase-mediated exchange at two strategically placed loxP sites to determine the topology of interactions between a given Mu site and the other two [ 16 ] . 
+ Using this same strategy in vivo , we found to our surprise that the ends were already paired in a prophage , and that the transposase was not essential for their pairing . 
+ We have investigated the basis of this pairing using both the Cre-lox system and the 3C crosslinking system [ 22,23 ] . 
+ We show that Mu SGS is important for Mu end pairing , that other Mu cis and trans factors and several host NAPs contribute as well , and that the MuB protein , expressed at a low level in the prophage , likely provides a NAP-like function . 
+ We discuss the implications of this work for the maintenance of large selfish DNA elements on a bacterial genome . 
+ Results
+ The two ends of the 37 kbp Mu prophage genome behave as if they are paired
+ The Cre-loxP site-specific recombination system has simple requirements , needing only two loxP sites ; neither additional cofactors nor special DNA topology is required [ 24 ] . 
+ Cre recombinase can carry out both DNA inversion and deletion equally well , depending on the relative orientation of its loxP target sites [ 25 ] . 
+ The synapsis of loxP sites occurs by random collision ; therefore , the frequency of recombination between these sites indicates their spatial proximity [ 26 ] . 
+ This property of Cre is used here to estimate the distance between recombining loxP sites engineered within the E. coli chromosome . 
+ Other site-specific recombinases have been similarly used in the past [ 6,27 ] . 
+ The experimental strategy for assessing Cre recombination efficiency is diagrammed in Figure 1A , using as an example loxP sites flanking a Mu prophage . 
+ All assays in this study used the deletion reaction , i.e. , loxP pairs were configured in a direct orientation . 
+ Cre recombinase was provided from a plasmid , and reaction conditions were optimized as described under Materials and Methods ( Figure S1 ) . 
+ After recombination , the intervening DNA between the loxP sites will be excised , leaving one loxP site on the chromosome and the other site in the excised product ; the latter will be lost during cell growth . 
+ As diagrammed in Figure 1A , the amounts of substrate and product , both chromosomal , were assessed by qPCR after amplification across the loxP sites with appropriate primers . 
+ Recombination efficiency ( RE ) was calculated as the ratio of the recombination product to the starting substrate as described in Materials and Methods . 
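+ The ratio described above can be sketched in a few lines. This is only an illustration: the function names and the input amounts are hypothetical, the actual qPCR quantification is the one described in Materials and Methods, and the normalization step simply mirrors the convention used later in the text of setting the wild-type MU value at 1.

```python
def recombination_efficiency(product: float, substrate: float) -> float:
    """Return RE as the ratio of recombination product to starting substrate."""
    if substrate <= 0:
        raise ValueError("substrate amount must be positive")
    if product < 0:
        raise ValueError("product amount cannot be negative")
    return product / substrate


def relative_re(re_sample: float, re_reference: float) -> float:
    """Express a sample RE relative to a reference RE (the reference becomes 1)."""
    if re_reference <= 0:
        raise ValueError("reference RE must be positive")
    return re_sample / re_reference


# Hypothetical qPCR-derived amounts in arbitrary units, for illustration only.
re_reference = recombination_efficiency(product=40.0, substrate=100.0)
re_sample = recombination_efficiency(product=4.0, substrate=100.0)
print(relative_re(re_sample, re_reference))
```

+ With these made-up inputs the sample recombines about one tenth as efficiently as the reference, which is the form in which the RE values in the Results are reported.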
+ The distance-dependence of loxP recombination on the E. coli chromosome was first assessed by varying the distance between a pair of directly oriented loxP sites , engineered within the malF locus , from ~190 bp to 37 kbp ( Figure S2A ) . 
+ This locus was chosen because the Mu prophage we wished to monitor in later experiments was located there . 
+ In the loxP-engineered strains , the log of loxP RE decreased linearly over distance , giving a first order decay function ( Figure 1B and inset ; see also Figure S2D ) . 
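+ A log-linear relationship of this kind can be fitted by ordinary least squares on the logarithm of RE. The sketch below assumes the natural logarithm (the text does not say which base was used), uses a hypothetical helper name, and runs on synthetic points generated from an exact exponential; none of the numbers are data from the study.

```python
import math


def fit_log_linear(distances_kbp, re_values):
    """Least-squares fit of ln(RE) = intercept - decay_rate * distance."""
    if len(distances_kbp) != len(re_values) or len(distances_kbp) < 2:
        raise ValueError("need at least two (distance, RE) pairs of equal length")
    logs = [math.log(v) for v in re_values]
    n = len(distances_kbp)
    mean_x = sum(distances_kbp) / n
    mean_y = sum(logs) / n
    sxx = sum((x - mean_x) ** 2 for x in distances_kbp)
    if sxx == 0:
        raise ValueError("distances must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(distances_kbp, logs))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # decay_rate is positive when RE falls with distance
    return intercept, -slope


# Synthetic points generated from RE = exp(-0.1 * d); NOT data from the study.
distances = [0.19, 5.0, 10.0, 20.0, 37.0]
res = [math.exp(-0.1 * d) for d in distances]
intercept, decay_rate = fit_log_linear(distances, res)
print(intercept, decay_rate)
```

+ On these synthetic points the fit recovers a decay rate of about 0.1 per kbp and an intercept of about 0 , as expected for data that follow the assumed model exactly.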
+ loxP sites were next engineered on either side of a malF : : Mu lysogen , ~70 bp outside each L and R end as shown in Figure 1A ( Figure 1C , this wild-type loxP-Mu-loxP construct in ZL524 is labeled MU throughout ) . 
+ The RE of the MU loxP sites ( set at 1 ) was closest to the loxP-site pair placed 190 bp apart within malF in the non-Mu strain ( Figure 1C , malF ) . 
+ To control for recombination at distances similar to the length of the Mu prophage around this region of the chromosome , loxP sites were placed 37 kbp upstream ( yjcF ) or downstream ( purH ) of a loxP site in malF in the non-Mu strain ( see Figure S2A ) . 
+ The RE of both these loxP pairs was similar ( Figure 1C ) , and reflected their linear distance as determined from the graph in Figure 1B . 
+ We conclude that reduction of the linear 37 kbp distance between the L and R Mu ends to a distance equivalent to 190 bp as measured by Cre recombination , is indicative of some form of ` synapsis ' of the Mu ends . 
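+ The idea of a distance "equivalent" to a measured RE amounts to reading the standard curve backwards. A minimal sketch follows, assuming a log-linear standard curve in natural-log units; the calibration constants are hypothetical placeholders, not the values fitted in Figure 1B.

```python
import math


def effective_distance_kbp(re_value, intercept, decay_rate):
    """Invert ln(RE) = intercept - decay_rate * d to get the apparent distance d."""
    if re_value <= 0 or decay_rate <= 0:
        raise ValueError("RE and decay rate must be positive")
    return (intercept - math.log(re_value)) / decay_rate


# Hypothetical calibration (intercept 0, decay 0.1 per kbp); not the study's fit.
INTERCEPT = 0.0
DECAY_PER_KBP = 0.1

# RE expected on this hypothetical curve for sites 37 kbp apart, and for
# sites that recombine as efficiently as a pair only 0.19 kbp apart.
re_unpaired = math.exp(INTERCEPT - DECAY_PER_KBP * 37.0)
re_paired = math.exp(INTERCEPT - DECAY_PER_KBP * 0.19)
print(effective_distance_kbp(re_unpaired, INTERCEPT, DECAY_PER_KBP))
print(effective_distance_kbp(re_paired, INTERCEPT, DECAY_PER_KBP))
```

+ A pair of sites that is 37 kbp apart in sequence but yields an RE typical of a 0.19 kbp pair would therefore be assigned an effective distance of about 0.19 kbp , which is the sense in which the Mu ends are described as synapsed.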
+ The centrally located strong-gyrase-site (SGS) within prophage Mu is important for end-synapsis
+ Central location of SGS ( Figure 1A ) is obligatory for optimal replication of Mu after prophage induction [ 18 ] . 
+ Deletion of this site results in inefficient pairing of Mu ends in vivo , as judged by a low efficiency of transposase-mediated 3' nicking at the ends [ 28,29 ] . 
+ Pato and colleagues have proposed that the requirement for SGS in vivo but not in vitro ( where the distance between the Mu ends is typically ~2 kbp on mini-Mu plasmid substrates ) is an adaptation for aiding synapsis of reactive sites located at large distances [ 20 ] . 
+ We therefore tested the importance of both the presence and position of SGS on the RE of loxP sites flanking Mu . 
+ Deletion of the central SGS decreased loxP recombination 30-fold , while an asymmetric location of SGS to the left ( L ) or right ( R ) of center in separate strains showed 7- and 3-fold reductions , respectively , compared to wild-type ( Figure 2A ; the RE values of 0.142 ± 0.086 and 0.303 ± 0.11 for the SGS ( L ) and SGS ( R ) prophage strains are not significantly different at p < 0.01 ) . 
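+ The fold reductions quoted here follow from the relative RE values by taking the reciprocal , since the wild-type value is set at 1. The short check below assumes the reported means for the SGS ( L ) and SGS ( R ) strains are 0.142 and 0.303 ; the function name is hypothetical.

```python
def fold_reduction(relative_re: float) -> float:
    """Fold reduction relative to a reference whose RE is set at 1."""
    if relative_re <= 0:
        raise ValueError("relative RE must be positive")
    return 1.0 / relative_re


# Mean relative RE values as read from the text (wild type = 1).
reported = {"SGS (L)": 0.142, "SGS (R)": 0.303}
for label, value in reported.items():
    print(label, round(fold_reduction(value), 1))
```

+ This gives roughly 7-fold and 3.3-fold , in line with the rounded 7- and 3-fold figures stated in the text.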
+ Reduction of the Mu genome by 20 kbp upon introducing symmetrical 10 kbp deletions on either side of SGS ( i.e. genome size of 17 kbp ) , still showed Mu end synapsis on the smaller Mu genome , which was disrupted by SGS deletion ( Figure 2A ) . 
+ A similar reduction of effective distance was not observed when an SGS site was engineered at the center of E. coli DNA segments ranging from 5 -- 37 kbp , each flanked by loxP ( Figure S2B , C ) . 
+ This shows that the SGS site alone is not sufficient to synapse distant loxP sites ; Mu sequences are required in addition . 
+ Because the effect of SGS is proposed to be mediated via gyrase-promoted supercoiling , the temperature-sensitive gyrase allele gyrB402ts was introduced into a Mu c+ lysogen , which is not temperature inducible . 
+ Since gyrase temperature-sensitive mutants are reported to have decreased supercoiling even at the permissive temperature [ 30 ] , loxP REs were measured at both permissive ( 26 °C ) and non-permissive ( 37 °C ) temperatures . 
+ loxP sites behaved as if they were unpaired only at the non-permissive temperature ( Figure 2B ; the wild-type loxP-Mu c+-loxP construct is labeled Mu ) . 
+ We conclude that DNA supercoiling is important for reducing the distance between the Mu ends , that SGS plays a critical role in this process when located centrally either on a 37 kbp or a 17 kbp Mu genome , but that SGS does not similarly contribute when located within non-Mu E. coli DNA . 
+ These results support the Mu end-pairing function of SGS as deduced by the transposition / replication results of Pato and colleagues . 
+ However , our data were derived in the absence of prophage induction , i.e. , presumably in the absence of the transpososome proposed to stabilize the synapsed ends . 
+ SGS-mediated Mu DNA synapsis does not extend far outside the Mu ends
+ To test whether Mu ends define the base of the SGS-mediated Mu DNA synapsis loop , we moved the loxP-site pairs symmetrically from 5 kbp inside Mu ( In-loxP ) to 5 -- 25 kbp outside Mu ( Out-loxP ) in separate strains ( Figure 3A ) . 
+ RE of the internal loxPs ( In-5 kbp ) was similar to wild-type MU , while that of external loxPs [ Out-5 kbp ( 0.321 ± 0.12 ) and Out-10 kbp ( 0.18 ± 0.07 ) ] decreased 3 -- 5 fold over 5 and 10 kbp distances , which is a total linear distance of 47 and 57 kbp between the loxP pairs , respectively ( Figure 3B ) . 
+ Synapsis was no longer evident between the Out-25 kbp pair ( 87 kbp linear distance ) , as determined by > 40 fold lower RE values ( 0.023 ± 0.011 ) compared to wild-type MU . 
+ We conclude that the SGS effect extends 5 -- 10 kbp outside Mu ends into the flanking E. coli DNA ; beyond this length , the RE is reflective of the linear rather than paired distance between the loxP sites ( see standard graph in Figure 1B ) . 
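+ The linear distances quoted for the Out-loxP pairs follow from the 37 kbp prophage length plus twice the symmetric offset. A small sketch of that arithmetic is given below ; the function name is hypothetical and the ~70 bp offset of the wild-type loxP sites is ignored.

```python
MU_LENGTH_KBP = 37  # prophage length stated in the text


def linear_distance_kbp(offset_kbp, outside=True, element_kbp=MU_LENGTH_KBP):
    """Linear separation of a loxP pair placed symmetrically `offset_kbp`
    outside (or inside) each end of an element of length `element_kbp`."""
    if offset_kbp < 0:
        raise ValueError("offset must be non-negative")
    if outside:
        span = element_kbp + 2 * offset_kbp
    else:
        span = element_kbp - 2 * offset_kbp
    if span <= 0:
        raise ValueError("inside offsets must leave a positive separation")
    return span


for offset in (5, 10, 25):
    print(f"Out-{offset} kbp:", linear_distance_kbp(offset), "kbp")
print("In-5 kbp:", linear_distance_kbp(5, outside=False), "kbp")
```

+ The three outside placements give 47 , 57 and 87 kbp , matching the distances stated above ; by the same arithmetic the In-5 kbp pair is 27 kbp apart.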
+ 3C reveals an interaction between Mu prophage ends
+ In 3C analysis , protein-protein and protein-DNA crosslinking by formaldehyde is used to permanently capture interactions between two genomic loci [ 22 ] . 
+ After appropriate restriction enzyme digestion and ligation at low DNA concentration , the suspected junctions can be probed by PCR using locus-specific primers . 
+ While this methodology has been widely used to generate DNA contact maps in eukaryotic cells [ 31 ] , it is only beginning to be used in bacteria [ 23,32 ] . 
+ We applied this strategy to test the proximity of Mu L and R ends predicted from the Cre-loxP recombination assay . 
+ To assay DNA interactions both inside and outside Mu , we used digestion at PstI and EcoRI sites , respectively , whose positions in malF : : Mu are shown in Figure 4A . 
+ The PstI sites closest to the L and R ends inside the Mu genome are ~27 kbp apart , whereas the closest EcoRI sites outside the Mu genome are ~52 kbp apart ( L-proximal site is ~13 kbp upstream and R-proximal site is ~2 kbp downstream of the prophage location ) . 
+ Products of the expected size were detected for both PstI and EcoRI joints ( Figure 4B , arrowheads ) in a crosslinking-ligation dependent manner . 
+ Their identities were confirmed by DNA sequencing . 
+ No PCR product corresponding to the joining of DNA cut at two EcoRI sites within Mu was detected , possibly due to cross-linked proteins interfering with ligation . 
+ The ligation products obtained in Figure 4B were quantified further by qPCR and compared to similar reaction products from an isogenic ΔSGS strain ( Figure 4C ) . 
+ Crosslinking efficiency is defined as the ratio of qPCR signal from the products of the ΔSGS strain compared to those from its wild-type parent . 
+ Crosslinking efficiency of PstI ends within Mu was 4-fold ( 0.247 ± 0.11 ) higher in the presence of SGS compared to its absence ( Figure 4C , left ) , and that of EcoRI ends outside Mu was 2.5-fold ( 0.414 ± 0.106 ) higher under similar conditions ( Figure 4C , right ) . 
+ The fold differences in crosslinking efficiencies versus recombination efficiencies , of wild-type MU and its ΔSGS derivative ( Figure 4C vs Figure 2A ) , are likely due to differences in methodology . 
+ Background ( non-specific ) levels of signal generated by unligated but crosslinked samples ( lanes 4 and 8 in Figure 4B ) are shown in each panel ( NL ) , along with similar controls for DNA around the malF locus in a strain where Mu had been excised ( Figure 4C , ΔMu ) . 
+ For additional and independent quantitation of the crosslinked product , we subjected it to Cre-loxP recombination in vitro using a titrated amount of Cre . 
+ While the in vitro efficiency can not be directly compared to the in vivo efficiency , at the lowest Cre concentration required to observe ~30 % recombination in the wild-type crosslinked substrate , the RE of loxP sites in the ΔSGS substrate was 3-fold ( 0.351 ± 0.11 ) lower than the wild-type ( Figure 4D ) . 
+ Taken together , these results are an independent confirmation of the close spatial proximity of Mu ends in the Mu prophage and the important contribution of SGS to this arrangement . 
+ We shall henceforth refer to this apparent Mu-loop 
+ Given that the SGS effect is Mu-specific , we wondered if the L and R ends of Mu are important for closing the Mu loop at its base , and if so , whether the Mu transposase ( A protein ) , which binds to the Mu ends , is expressed in the prophage . 
+ We therefore individually deleted the L and R ends as well as the MuA and MuB genes from the prophage ( MuB regulates MuA function allosterically ; [ 13 ] ) . 
+ Deletion of the L end showed a 15-fold ( 0.066 ± 0.049 ) reduction of RE , but deletion of the R end had no significant effect ( Figure 5A ) . 
+ Deletion of the MuA gene had no effect , but deletion of MuB had an 8-fold effect ( 0.125 ± 0.064 ) ( Figure 5B ) . 
+ To confirm the B gene deletion result , MuB was supplied to this strain from a plasmid [ pMuB ( pJG8 ; [ 15 ] ) ] ; RE levels were restored to wild-type in this strain . 
+ The non-requirement for the R end and for MuA , but the requirement for MuB , in Mu domain formation/maintenance will be discussed later . 
+ MuA and B genes are expressed from the early lytic Pe promoter , expected to be repressed in a prophage ( Figure S3A ; [ 33 ] ) . 
+ To determine if MuB was expressed in the lysogen , we engineered into the prophage genome a functional EGFP-MuB fusion [ 34 ] . 
+ Low-level expression of MuB was detected in all the cells in this strain in the absence of Mu induction ( Figure S3B ) . 
+ 5'-RACE-PCR experiments showed transcripts originating from Pe , as well as from a second site internal to MuA , which we have named Pe * ( Figure S3C ) . 
+ Pe * has significant homology to the sigma 70 sequence ( Figure S3D ) . 
+ Deletion of each of these promoters had a small effect on EGFP-MuB expression as measured by fluorescence , but deletion of both promoters eliminated expression ( Figure 5C ) . 
+ Note that either promoter deletion will render the strain uninducible for Mu lytic growth . 
+ A control EGFP fusion to a late Mu gene E expressed during lytic growth , showed no fluorescence in the prophage ( Figure 5C ) . 
+ Thus , the transcription results can not be due to spontaneous Mu induction in a subpopulation of cells , because all cells expressed EGFP-MuB , none expressed EGFP-MuE , the Pe deletion interrupts the lytic growth program , and the Pe * deletion does the same by disrupting the MuA gene . 
+ To test if the low-level expression of MuB from Pe and Pe * was related to the Mu domain , we monitored EGFP-MuB expression in a ΔSGS strain . 
+ Deletion of SGS diminished EGFP-MuB fluorescence compared to its wild-type parent . 
+ To test if this reduction was specific to either Pe or Pe * , we measured EGFP-MuB fluorescence in the ΔSGS strain carrying separate Pe or Pe * deletions ( Figure 5C ) . 
+ Expression from Pe was the most impacted by deletion of SGS . 
+ Single or double deletion of Pe and Pe * led to a reduction in the RE of loxP sites flanking the wild-type Mu strain , with the double-promoter deletion showing a value similar to that of MuB gene deletion ( Figure 5D , compare with Figure 5B ) . 
+ As described above , the Pe-Pe * deletion disrupts the MuA gene as well as early transcription , thereby eliminating Mu lytic development . 
+ This allowed us to delete the Mu c gene encoding the lysogenic repressor Rep in the Pe-Pe * deletion strain , in order to test its contribution to the Mu domain ( see Figure S3A ) . 
+ The data showed that absence of Rep caused an additional 3-fold decrease in RE ( 0.053 ± 0.028 ) ( Figure 5D ) . 
+ We conclude that the L end is important but that the R end is dispensable to the Mu domain . 
+ MuB , but not MuA , is important for domain formation/maintenance , and Rep likely contributes as well . 
+ Low levels of MuB are expressed in the lysogen from both Pe and a newly identified promoter Pe * . 
+ The Mu domain configuration is important for transcription from Pe but not Pe * . 
+ Thus activity of Pe is domain-dependent , while that of Pe * is not . 
+ Cellular NAPs are critical for maintenance of the Mu domain
+ E. coli NAPs such as H-NS , IHF , FIS , and HU are implicated in maintaining chromosomal supercoiled domains via their DNA bending and bridging properties . 
+ In contrast to the major E. coli NAPs , which were found largely scattered throughout the nucleoid , H-NS was reported to form two compact clusters per chromosome , sequestering and juxtaposing into these clusters numerous H-NS regulated DNA segments distributed throughout the chromosome ; deleting H-NS led to substantial chromosome reorganization [ 23 ] . 
+ IHF , FIS and HU have in addition , specific binding sites on the Mu genome , from where they exert effects on Pe transcription , G-segment recombination , and transposition [ 13 ] ( Figure 6A ) . 
+ To determine the importance of these NAPs to the Mu domain , we assayed for changes in RE of loxP sites flanking Mu in strains individually deleted for genes expressing these proteins ( Figure 6B ) . 
+ Absence of H-NS had no effect on the Mu domain . 
+ Absence of HU and of Fis had 12- and 30-fold effects , respectively ( 0.084 ± 0.054 and 0.034 ± 0.025 ) . 
+ The strongest effect was observed in the absence of IHF , which essentially abrogated the Mu domain . 
+ That NAP deletions do not affect Cre recombination per se was controlled for by simultaneously monitoring the recombination of loxP site pairs placed outside the Mu domain in both wild-type and ΔNAP strains ( see Materials and Methods ) . 
+ To test if the effects of IHF , Fis and HU were exerted at the specific binding sites for these proteins on the Mu genome , we deleted these sites individually within the prophage . 
+ The dramatic reduction in RE seen in the IHF mutant was not observed with deletion of the IHF binding site ( Figure 6B ) . 
+ However , the 6-fold effect observed ( 0.17 ± 0.086 ) could be due to the negative effect on Pe transcription from deleting the IHF site [ 35 -- 38 ] . 
+ Deletion of the HU-binding site at the Mu L end had a small effect , while deletion of the Fis-binding enhancer site sis ( Δsis ) had no effect . 
+ Recent experiments have identified a set of three Fis-binding sites within the promoter region of the mom gene near the R end [ 39 ] . 
+ A deletion spanning all three sites ( ΔPmom ) also had no effect . 
+ However , a combination of sis-attR or sis-Pmom-attR deletions reduced RE 5 -- 7 fold ( 0.209 ± 0.091 and 0.143 ± 0.068 ) . 
+ We conclude that IHF , HU and Fis affect the Mu domain configuration primarily via their global effects on chromosome structure . 
+ The domain organization is unique to Mu, and is not observed for prophage λ
+ The four structured macro-domain regions of the E. coli chromosome are called Ori , Right , Ter , and Left ; the two less or non-structured regions ( NS ) are located on either side of Ori [ 7 ] ( Figure 7A ) . 
+ The malF : : Mu prophage used in all the experiments thus far is located in the Ori macro-domain . 
+ To determine if the Mu domain organization is specific to its location in a macro-domain , we also tested for its presence in a lacZ : : Mu c+ prophage located in the less-structured NS region between Ori and Right . 
+ The results were similar to those seen with the malF : : Mu prophage , with similar negative effects of deletion of SGS or absence of IHF on the domain structure ( Figure 7B , compare to similar data in Figures 2A and 6B ) . 
+ Absence of H-NS had no effect at this location as well . 
+ If the domain organization of Mu were designed to pre-engage Mu ends in a transposition-ready mode for lytic growth , might a similar arrangement be expected for other prophages that depend on pairing of their ends at the start of lytic growth ? 
+ Prophage λ is an example of an insertion element which must pair its attL and attR ends for excisive recombination from the E. coli chromosome [ 40 ] . 
+ The attB insertion site of λ is in the Right macro-domain ( Figure 7A ) . 
+ To determine if λ prophage ends were paired , loxP sites were engineered outside the 48.5 kbp λ genome , as done for Mu . 
+ In contrast to Mu , the Cre RE of these sites reflected the linear distance between the λ ends ( Figure 7B ) . 
+ Thus , domain organization does not occur for phage λ , and may be specific for 
+ Discussion
+ The 37 kbp Mu prophage domain we report in this study , inferred from Cre recombination and supported independently by a crosslinking assay , is the largest stable chromosomal domain in E. coli mapped to date . 
+ ` Stable ' implies that the configuration is long-lived enough to be consistently detected by both genetic recombination and biochemical crosslinking . 
+ This sets the Mu domain apart from the dynamic configuration of the 400 -- 500 supercoiled domains that condense the E. coli nucleoid [ 4 ] . 
+ While the nucleoid exhibits different degrees of compaction around the circular genome referred to as macro-domains [ 41,42 ] , and distant chromosomal regions within these macro-domains cluster via H-NS [ 23 ] , the specific supercoiled domain adopted by the Mu prophage is unique in that it represents a more or less permanent feature of the E. coli chromosome . 
+ The formation/maintenance of the Mu domain requires contributions from both prophage and its 
+ Cre-loxP recombination as a reporter for chromosomal domains
+ The organization of the bacterial chromosome into supercoiled domains has been studied earlier using different strategies : trimethylpsoralen binding , electron microscopy , transcription of supercoiling sensitive genes , or site-specific recombination by Res and Int [ 2,3,43,44 ] . 
+ The Res system initially yielded an average domain size of 25 kbp for a non-essential region spanning ~100 kbp of the Salmonella typhimurium genome [ 43 ] . 
+ Recombination efficiency was found to decrease linearly with the distance between target res sites over the entire region analyzed . 
+ Since the topological constraints of the resolvase reaction require the res sites to be housed within the same supercoiled domain [ 45 ] , it was concluded that topological domains are dynamic , with stochastically distributed end points . 
+ Using an improved system in which the Res protein was designed to have a shorter half-life , the average domain size in Salmonella was re-calculated to be approximately 10 kbp [ 27 ] . 
+ A domain size of 10 kbp agrees with results obtained in E. coli , where measurements from electron microscopy and the spread of DNA relaxation from double strand breaks were used to estimate a domain size centered around 10 kbp , within a 2 -- 66 kbp range [ 3 ] . 
+ When prophage λ attL and attR sites , which can recombine within an Int synapse arranged by random collision [ 40,46 ] , were placed all over the chromosome , Int recombination efficiencies suggested that accessibility of individual loci was not uniform in different regions of the E. coli and Salmonella chromosomes [ 6,44 ] . 
+ The macro-domain organization of the E. coli chromosome deduced from these and cytological studies is shown in Figure 7A [ 6 ] . 
+ The utility of the Cre-loxP system in probing chromosomal domains stems from the simple requirements of Cre recombination , the ease of integrating loxP sites at desired chromosomal locales , and the in vivo distance-dependence of the reaction revealed in this study . 
+ The ~7 kbp value of the slope derived for Cre recombination ( Figure 1B , inset ) , which is the distance at which there is a 50 % probability that barriers to supercoil diffusion exist , defines a domain size in this region of the E. coli chromosome in reasonable agreement with the average 10 kbp domain estimate of Postow et al. [ 3 ] . 
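+ One way to connect a log-linear decay of RE with a 50 % distance is sketched below. This is an interpretation rather than the authors' stated derivation : it assumes natural logarithms , treats the ~7 kbp value as the distance at which RE falls to half of its short-distance value , and introduces the symbols RE_0 , k and d_{1/2} purely for illustration.

```latex
\begin{align}
\ln \mathrm{RE}(d) &= \ln \mathrm{RE}_0 - k\,d
  && \text{(log-linear decay with distance } d\text{)} \\
\frac{\mathrm{RE}(d_{1/2})}{\mathrm{RE}_0} &= e^{-k\,d_{1/2}} = \tfrac{1}{2}
  \;\Longrightarrow\; d_{1/2} = \frac{\ln 2}{k} \\
d_{1/2} \approx 7~\text{kbp} &\;\Longrightarrow\;
  k \approx \frac{0.693}{7~\text{kbp}} \approx 0.099~\text{kbp}^{-1}
\end{align}
```

+ Under these assumptions , a decay constant of roughly 0.1 per kbp would correspond to the ~7 kbp figure quoted above.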
+ Since the Mu genome is much larger than the average E. coli chromosomal domain , efficient Cre recombination at loxP sites placed at the extremities of Mu would be consistent with these sites being contained within a domain . 
+ We note that although loxP recombination by Cre is analogous to attL-attR recombination by λ Int in not requiring negative supercoiling and in following the random collision mechanism , the distance-dependence of recombination frequencies observed for the two systems in vivo can not be strictly compared because of many differences in experimental conditions such as bidirectional ( Cre ) versus unidirectional ( λ Int ) configurations of the recombining sites , differences in recombinase levels and reaction times , different methods for estimating REs ( colony color , PCR , qPCR ) , and differences in growth media and growth conditions [ 6,44 ] . 
+ Sequestration of Mu into an independent supercoiled domain: Mu and host factors
+ The domain organization of prophage Mu , anchored by Mu L and R ends , was observed at two structurally different regions of the E. coli chromosome ( Figures 1 -- 4 , and Figure 7 ) . 
+ The pairing of the neighboring DNA arms was seen to extend 5 -- 10 kbp outside Mu into the E. coli DNA ( Figure 3 ) . 
+ Both SGS and gyrase were critical to domain integrity ( Figure 2 ) . 
+ The requirement of SGS for the formation of the prophage domain is consistent with the role for SGS originally proposed in promoting Mu end synapsis for transposition [ 18,47 ] . 
+ In each case , the gyrase-mediated processive supercoiling initiated at the center of Mu may be cemented at the L and R ends by either the transpososome or by boundary proteins and the Mu repressor ( see below ) to establish two functionally distinct DNA domains . 
+ The Mu L end and the MuB protein , but not the R end and the MuA transposase , were required for formation of the prophage domain ( Figure 5A , B ) . 
+ The non-requirement of the R end is puzzling . 
+ While the L and R ends have specific binding sites for MuA and for lysogenic repressor Rep [ 48 ] , they also have an AT-rich character [ 49 ] . 
+ We speculate that while the R end normally is required , other AT-rich elements can substitute in its absence . 
+ This possibility is suggested by the observation that an effect of R-end deletion on the Mu domain is only manifested when combined with deletion of AT-rich Fis-binding sites near this end ( Figure 6B ) . 
+ In the E. coli genome , AT-rich elements or A-tracts are overrepresented and distributed ` quasi-regularly ' with a 10 -- 12 bp periodicity throughout the genome , organized in ~100 bp long clusters [ 50 ] . 
+ Such elements have been proposed to constitute a ` structural code ' for DNA compaction via NAP binding . 
+ Thus , in the absence of the R end , synapsis of Mu termini could be assisted by NAPs . 
+ Similarly , absence of the MuA transposase could be compensated for by the presence of the lysogenic repressor Rep , which shares sequence homology with the transposase and binds to Mu ends [ 48,51,52 ] ( Figure 5D ) . 
+ The requirement for MuB in domain organization/stability is consistent with the detectable but low level domain-dependent transcription from the early lytic promoter Pe and activity of a domain-independent promoter Pe * in the prophage state ( Figures S3 and 5C ) . 
+ Deletion of the IHF site is expected to impact Pe transcription ( Figure 6 ) . 
+ How might MuB assist the Mu domain ? 
+ MuB is known to have a binding preference for AT-rich sequences [ 53,54 ] and can compete with NAPs for such sequences [ 15 ] . 
+ The distribution of EGFP-MuB fluorescence throughout the cell is noteworthy in this context ( Figure S3B ) . 
+ A potential prophage-specific NAP-like activity would be a novel role to be identified for this multifunctional protein , which affects target site selection , modulates transposase activity and promotes target site immunity during transposition [ 13 ] . 
+ We note that an earlier Res recombination study in Salmonella using a Bam Mu prophage , did not detect the Mu domain we report here [ 55 ] . 
+ It is likely that the lack of MuB impacted the results ; it is also possible that the large Amp-Lac insertion carried near the R end of the Bam Mu prophage used in that study disrupted the symmetric placement of the SGS site . Additionally , Salmonella and E. coli chromosomes show differences in supercoiling [ 56 ] . 
+ The most abundant NAPs in E. coli -- Fis , HU , H-NS and IHF -- engender at least partially overlapping functions , as absence of any one of these proteins results in rather subtle phenotypes [ 4,57 ] . 
+ The formation of a stable Mu domain , however , was essentially abrogated by the absence of IHF , and was strongly impeded by the absence of Fis or HU ; lack of H-NS had no effect on domain establishment ( Figure 6 ) . 
+ Although IHF regulates MuA and MuB levels from Pe [ 35 -- 38 ] , and Fis has been implicated in regulating Rep levels [ 58,59 ] , while HU is required for Mu transposition [ 60,61 ] , the magnitude of the effects of deleting the genes for these proteins was far greater than deleting their known binding sites . 
+ Thus , IHF , HU and Fis proteins appear to facilitate Mu domain formation independently of their site-specific interactions within the Mu genome relevant to transposition or to inversion of the G-segment via site-specific recombination [ 13 ] . 
+ Rather , the process is likely assisted by their global role in nucleoid organization . 
+ A model for the Mu prophage domain: Functional implications
+ The model for the Mu prophage domain that we propose ( Figure 8 ) incorporates our present findings with the earlier proposal of Pato and colleagues [ 20 ] . 
+ According to this model , the processivity of gyrase bound to the SGS site located at the center of the Mu genome , helps to align the left and right arms and promote synapsis of the L and R termini . 
+ End-binding proteins such as the transposase had been proposed earlier to seal the Mu loop and stabilize the synapse . 
+ In the prophage state , it is likely that the Mu repressor Rep rather than MuA is involved in end pairing , since more repressor molecules are expected to be present . 
+ This would explain why deletion of MuA had no effect on the Mu domain ( Figure 5B ) . 
+ However , Rep has a lower affinity for the ends compared to MuA [ 48 ] . 
+ Since MuA must be expressed at a low level from the Pe transcript , a scenario where both proteins contribute to closing the Mu loop is also a plausible mechanism for ensuring domain stability . 
+ MuB likely serves as a NAP [ 15 ] , assisting cellular NAPs such as FIS and IHF in stabilizing the Mu domain . 
+ Other essential NAPs such as SMC proteins could also be involved in maintaining the Mu domain [ 62 ] ( Figure 8 ) . 
+ The domain configuration promotes basal level transcription from Pe , ensuring domain maintenance via MuB . 
+ Like Mu , prophage λ depends on pairing of its attL and attR termini for excision prior to entry into lytic growth . 
+ Yet a domain organization was not detected for λ ( Figure 7 ) . 
+ The contrasts between Mu and λ in the architecture of their prophage genomes perhaps reflect the topological and mechanistic distinctions of the transposition and recombination reactions , respectively , which set forth each phage on the lytic path . 
+ A closed supercoiled domain may not offer a special advantage to λ excision as the Int bound attL and attR sites find each other by random collision , even when present on unlinked DNA molecules [ 46 ] . 
+ In contrast , the Mu synapse is arranged by an ordered series of interactions between three sites -- the L/R ends and the enhancer E -- all of which must be present in cis , on the same DNA molecule [ 13 ] . 
+ The first Mu interactions between E and R , which subsequently engage L to generate an LER synapse , trap five DNA supercoils within the synapse [ 63 ] . 
+ The organization of this highly specific topological filter , that presages the chemical steps of Mu transposition , would be aided by the SGS-assisted formation of a self-contained supercoiled domain [ 16,64 ] . 
+ Prophages , despite being largely repressed in gene activity , are major contributors of genome diversity in some bacterial species [ 65,66 ] . 
+ Many of these prophages appear to be in a state of mutational decay . 
+ Both intact and defective prophages can contribute important biological properties to their bacterial hosts . 
+ The best examples of ` fixation ' of defective prophage genes are the Shiga toxin genes in Shigella dysenteriae , sopE2 in Salmonella enterica , and the sspH and pertussis-like toxin genes in Salmonella typhi [ 66 ] . 
+ Some of the glycosyl transferase ( gtr ) genes of Salmonella may be another such example [ 67 ] . 
+ The genes for virulence factors and antibiotic resistance , often carried by prophages , add to the fitness of their hosts , thereby ensuring long-term self-propagation as well [ 68 ] . 
+ In prophage λ , a small subset of its genes - rex , lom and bor - transcribed at a low level in the prophage , are involved in conferring on the host bacterium resistance to lytic phages or to serum [ 69,70 ] . 
+ In this context , the low level transcription we observe from Pe may also confer some advantage to the host ( Figures 5C and S3B , C ) . 
+ This transcript encompasses a number of ` semi-essential ' ( SE ) genes whose functions are largely unknown ( Figure S3A ) . 
+ A subset of such genes could potentially benefit the host . 
+ The gem gene in the SE region has been reported to modulate host ligase and gyrase functions even in a lysogen [ 71 ] , and gam encodes an orthologue of the eukaryotic protein Ku , which participates in double-strand break repair [ 72 ] . 
+ Pe * , whose activity is domain independent , could serve as a back-up promoter for the expression of MuB and other SE genes during brief periods when the Mu domain is disrupted , either stochastically or for functional reasons such as chromosome replication . 
+ We suggest that the positioning of SGS within the Mu genome and the SGS-induced structuring of the Mu domain are beneficial to both the phage and the host , in keeping the former transposition ready and in providing the latter the NAP-like MuB protein and proteins such as gem and gam that function in cellular physiology . 
+ The proposal that the Mu prophage confers a fitness advantage is supported by earlier competition assays under glucose-limiting conditions , that demonstrated a selective advantage for Mu and other prophage containing bacteria over their prophage-free counterparts [ 73 ] . 
+ Summary
+ This is the first report of a distinct organization of a prophage genome into a stable supercoiled ` loop ' structure which we call the Mu domain . 
+ A closed-loop structure of the Mu genome , where synapsis of prophage ends is assisted by processive supercoiling by gyrase bound to a centrally located SGS site , had been proposed earlier to aid the transposition of Mu during its lytic phase of growth . 
+ The novel result we report is that the Mu domain exists even in the quiescent prophage state , and requires , in addition to SGS , several Mu proteins and host NAPs for its formation / maintenance . 
+ The Mu domain regulates an early promoter that controls the expression of several genes , many with unknown functions ; known functions include DNA transposition , DNA repair and nucleoid structure maintenance . 
+ A domain structure likely benefits the prophage by holding its ends in a transposition-ready configuration , and benefits the host by providing extra housekeeping functions . 
+ The latter proposition can be tested in long-term evolution experiments , where a Mu lysogen with an intact SGS site would be expected to outcompete one without an SGS site . 
+ Materials and Methods
+ Strain construction
+ All strains used in this work were derivatives of E. coli K-12 and are listed in Table 1 . 
+ Plasmids are listed in Table 2 . 
+ Gene disruptions , substitutions , deletions and insertions on the chromosome were made using the phage λ Red-mediated homologous recombination methodology [ 74,75 ] . 
+ Positions of mutations are listed in Table S1 . 
+ Primer sequences are listed in Table S2 . 
+ In strains with the temperature-inducible Mucts prophage , all incubation steps were at 30 °C . 
+ All gene deletions made with the kanamycin resistance gene kan replaced the region from the start codon to the stop codon of the target gene ; the kan cassette was amplified from pKD4 using primers with 50 nt homology extensions to the flanking regions of the gene , and recombinants were selected for kanamycin resistance ( 50 µg/ml ) . 
+ For Mu A , B and c gene deletions , the 1.6 kbp kan cassette was retained at the site of deletion in order to maintain symmetry of the SGS site . 
+ Other deletions were created by a two-step procedure : First , the DNA to be deleted was replaced by a dual selection cassette - either cat-sacB ( amplified from strain SIMD30 ; [ 76 ] ) or kan-ccdB ( amplified from pKD45 , where the ccdB gene is under a rhamnose-inducible promoter ; [ 77 ] ) into the sequence to be replaced . 
+ Selection for the cassettes was on chloramphenicol ( Cam ) ( 100 µg/ml ) or kanamycin . 
+ Next , the cassettes were replaced by homologous recombination with appropriate DNA to create the desired mutation , selecting on either LB plates supplemented with 6 % sucrose or 0.5 % rhamnose to eliminate cat-sacB or kan-ccdB , respectively . 
+ The gyrase mutant in the Muc + lysogen was made by moving the gyrB402ts allele [ 78 ] , flanked by a cat-sacB cassette , from ZL940 into ZL911 by P1 transduction , selecting for Cam resistance . 
+ loxP sites flanking the λ prophage were created by inserting them on either side of the λ integration site in the attB site prior to lysogenization with λcI857 ( Cam ) obtained from SIMD30 . 
+ All constructs were confirmed by DNA sequencing . 
+ In the ΔHU strain , hupA is deleted [ 79 ] . 
+ In the ΔIHF strain , himA is deleted [ 80 ] . 
+ EGFP fusions at the N-termini of MuB and MuE were constructed by amplifying the corresponding genes from MP1999 and cloning them into BglII - SalI restriction enzyme sites on plasmid pEGFP-C1 ( Clontech ) , which generated a 5-amino acid intervening linker SGLRS . 
+ The fused genes were transferred back into the Mu prophage in MP1999 by the λ Red recombination methodology . 
+ A rhamnose-inducible Cre expression vector was constructed by amplifying the gene for this recombinase from pBAD24-his-Cre plasmid and cloning into SalI -- XbaI restriction enzyme sites within the rhaTRS locus on plasmid pRHA113 [ 81 ] , to generate plasmid pRHA113-Cre , where Cre expression is driven from the rhaT promoter . 
+ Cre recombination in vivo
+ For experiments using Cre expressed from the pBAD24-Cre ( Ara ) plasmid [ 82 ] , M9 glucose minimal media were used in the in vivo recombination assay because in LB media , basal level leaky expression of Cre from this plasmid resulted in complete recombination by the time the cultures were grown up after plasmid transformation . 
+ For plasmid transformation , overnight ( O/N ) cell cultures in LB were diluted 1:100 into 20 ml of the same medium , and grown at 30 °C for 4 -- 5 hr until OD600 reached 0.6 . 
+ They were washed thrice with ice-cold 10 % glycerol and brought to a final volume of 200 µl . 
+ 40 µl of the cells were electroporated ( Bio-Rad Gene Pulser , 1.8 kV , 1 mm cuvette ) with 90 ng of the plasmid . 
+ After recovery for an hour in 1 ml of minimal medium at 30 °C , a 1:200 dilution of the culture in the same medium with added ampicillin ( 100 µg/ml ) was propagated at 30 °C ( Figure S1A ) . 
+ Aliquots at different times of growth were tested for extent of recombination with and without inducer ( 1 mM arabinose ) , followed by DNA extraction from 1 ml of culture using the Wizard Genomic DNA purification kit from Promega ( Figure S1B ) . 
+ Recombination products were assayed by qPCR as described below . 
+ An optimal substrate recombination of ~25 -- 30 % for the wild-type malF : : Mu substrate ( ZL524 ) was observed either in early- to mid-log phase cultures ( OD600 0.5 -- 0.6 ) with arabinose added for 20 min , or in late-log phase cultures ( OD600 1.2 -- 1.3 ) without added inducer . 
+ However , in inducer-added mid-log phase cultures , there were large variations in the recombined fraction in different strains , whereas without added inducer , this fraction was reliably reproducible in late-log cultures of all strains . 
+ The latter conditions were therefore chosen for all the experiments reported in this study , except when testing recombination in the gyrase ts strain as described below . 
+ Later in the course of this study we acquired a rhamnose-inducible plasmid ( Rha ) , where the basal-level leaky Cre expression was negligible . 
+ We confirmed that upon induction of Cre with rhamnose in late-log phase , recombination efficiencies ( REs ) were comparable to those obtained with uninduced Cre expression from the Ara plasmid ( Figure S1C ) ; REs obtained with the rhamnose-induced Cre were comparable between mid- and late-log cultures ( Figure S1D ) . 
+ Cre assays in the gyrase ts mutant were carried out with the Rha plasmid . 
+ Wild-type Muc + ( ZL911 ) and its isogenic gyrase ts mutant ( ZL941 ) strains transformed with the pRHA113-Cre plasmid were grown in LB media ( supplemented with 0.2 % glucose ) at 26 °C until OD600 reached 0.5 , then either maintained at 26 °C or shifted to the non-permissive temperature of 37 °C for 30 min [ 78 ] , followed by addition of 1 mM rhamnose for 20 min before DNA extraction . 
+ To control for the effect of NAPs on Cre recombination per se , the RE of loxP site pairs placed outside the Mu domain was monitored simultaneously in both wild-type and ΔNAP strains . 
+ The REs were not affected by deletion of any of the NAPs tested ( Zheng Lou , Ph.D. dissertation ) . 
+ Real-time qPCR
+ Aliquots with 50 ng of DNA , 10 µl SYBR master mix ( Applied Biosystems Inc ; includes dNTPs , enzyme and buffer ) , 0.4 µl of each primer ( 10 µM ) and 8.2 µl of double distilled H2O were held for 10 min at 95 °C , followed by 40 cycles of 15 sec at 95 °C and 1 min at 60 °C ( 7900HT ; Applied Biosystems ) . 
+ Three independent biological replicates were tested , and for each biological replicate three independent technical replicates were performed . 
+ Product integrity was checked using the dissociation curve . 
+ Cycle Threshold ( Ct ) was read out , and the starting template amount was quantified based on the value of Ct assuming exponential growth at early stages of amplification . 
+ Recombination efficiencies were calculated based on the threshold cycle ( Ct ) . 
+ The relative threshold cycle of each sample was calculated as $\Delta C_t = C_t(P) - C_t(S)$ , where ( P ) represents the recombination product and ( S ) the substrate before recombination . 
+ The recombination efficiencies ( RE ) of different samples are normalized by setting the recombination efficiency of the control loxP sites to 1 . 
+ The relative recombination efficiency of each sample is calculated as $RE = 2^{\Delta C_t(\mathrm{WT}) - \Delta C_t(\mathrm{Mutant})}$ . 
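The two formulas above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, the Ct values are made-up placeholders rather than data from the study, and averaging technical replicates before taking the difference is an assumption about how the replicates were combined.

```python
from statistics import mean


def delta_ct(ct_product, ct_substrate):
    """Delta Ct = Ct(P) - Ct(S).

    Each argument is a list of technical-replicate Ct values; averaging
    them first is an assumption, not something the text specifies.
    """
    return mean(ct_product) - mean(ct_substrate)


def relative_re(dct_wt, dct_mutant):
    """Relative recombination efficiency, RE = 2 ** (dCt(WT) - dCt(Mutant)).

    By construction the wild-type control has RE = 1.
    """
    return 2.0 ** (dct_wt - dct_mutant)


# Made-up Ct values for illustration only.
dct_wt = delta_ct([22.0, 22.0, 22.0], [20.0, 20.0, 20.0])
dct_mut = delta_ct([23.0, 23.0, 23.0], [20.0, 20.0, 20.0])

print(relative_re(dct_wt, dct_wt))   # control normalized to 1
print(relative_re(dct_wt, dct_mut))  # product appears one cycle later
```

With this sign convention, a sample whose recombination product crosses the threshold one cycle later than in the wild type has half the relative efficiency.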
+ The primer pairs for amplifying the starting substrates were ( 1 ) ` RT attL loxP t and RT attL loxP b ' for malF : : Mu in MP1999 , ( 2 ) ` RT LR loxP t and RT LR loxP b ' for malF without Mu , ( 3 ) ` RT lacZ L t and RT lacZ L b ' for lacZ : : Mu , and ( 4 ) ` RT λ L t and RT λ L b ' for λ ( other primers are listed in Table S2 ) . 
+ Primer efficiency of a primer pair ( say A & B ) was determined as follows : Primer A was linked to pUC19t ( forward ) and primer B was linked to pUC19b ( reverse ) ; the pUC primers anneal to pUC19 plasmid and amplify a common 180 bp fragment . 
+ The PCR products were purified with the QIAquick PCR Purification Kit ( Qiagen ) and used as templates for qPCR in the following reaction : 12.5 µl SYBR mix ( Qiagen ) , 0.75 µl each of primer A ( 10 µM ) and primer B ( 10 µM ) , 1 µl of template ( 10 ng/ml ) ( the 180 bp fragment , as described above ) and 10 µl of double distilled H2O . 
+ PCR cycles were as described above . 
+ Another qPCR reaction was performed using the internal primer pair pUC19t and pUC19b . 
+ The primer efficiency of primer pair A/B was calculated as the ratio of the Ct value of the PCR product obtained using primer A-pUC19t + primer B-pUC19b to that obtained using primers pUC19t + pUC19b . 
+ Chromosome Conformation Capture (3C) assay
+ The methodology was modified from published protocols [ 83,84 ] . 
+ An O/N cell culture in Luria broth ( LB ) was diluted 1:1000 into 50 ml of fresh medium and grown with shaking at 30 °C until OD600 reached 0.5 -- 0.6 . 
+ DNA crosslinking and cell lysis . 
+ 1.35 ml of formaldehyde ( Fisher-Scientific , 37 % ) was added to the 50 ml cell culture and incubated for 20 min at room temperature with slow shaking . 
+ Crosslinking was stopped with addition of 12 ml of 2.5 M glycine for 5 min at room temperature ( r.t. ) . 
+ Cells were centrifuged at 5000 × g for 10 min at 4 °C , washed twice in 10 ml ice-cold PBS pH 7.5 , resuspended in 1 ml PBS and divided into 100 µl aliquots in 10 eppendorf tubes . 
+ The aliquots were centrifuged at 14000 × g at 4 °C for 5 min and the pellets resuspended in 200 µl of 1× restriction enzyme buffer . 
+ The centrifugation step was repeated , and the pellet resuspended in 192 µl digestion buffer ( 1× restriction enzyme buffer , 1× complete protease inhibitor from Roche ) . 
+ Next , 2 µl of 35 KU/µl Ready-Lyse lysozyme ( Epicentre Biotechnologies ) was added to the suspension and incubated at r.t. for 20 min . 
+ Finally , 6 µl of 10 % SDS solution was added , followed by O/N incubation at 37 °C . 
+ Digestion and ligation . 
+ 100 µl of the cell lysate prepared above was added to 200 µl digestion buffer ( see above ) and 30 µl of 20 % Triton X-100 . 
+ After incubation at 37 °C for 1.5 hr , restriction digestion was performed thrice at 37 °C as follows : 3 µl of 100 U/µl enzyme EcoRI or PstI ( New England BioLabs ) for 3 hr , another 3 µl of enzyme for 3 hr , and finally another 2 µl of enzyme for O/N incubation . 
+ On the following day , digestion was stopped by addition of 50 µl of 10 % SDS and incubation at 65 °C for 20 min . 
+ Samples were centrifuged at 16,000 × g for 5 min at r.t. and their supernatants ( ~390 µl ) transferred to 15 ml tubes , which contained 4095 µl of pre-ligation buffer ( 25 µl of 20 % Triton X-100 , 100 µl of 25× complete protease inhibitor , 50 µl of 1 M Tris-HCl pH 7.8 , 50 µl of 10 mg/ml BSA , and 3645 µl of H2O ) . 
+ After incubation for 1 hr at 37 °C with shaking , 500 µl of 10× T4 DNA ligase buffer and 15 µl of 2000 U/µl T4 DNA ligase ( New England BioLabs ) were added to a final DNA concentration of 0.8 ng/ml . 
+ The ligation reaction was incubated at 4 °C for 3 days , replenishing ATP each day by addition of 50 µl of 100 mM ATP . 
+ Finally , the ligation reaction was incubated for 1 hr at r.t. , followed by addition of Proteinase K mixture ( 210 µl of 5 M NaCl , 10 µl of 0.5 M EDTA , 105 µl of 10 mg/ml Proteinase K ) to stop the reaction . 
+ Crosslinks were reversed by O/N incubation at 65 °C . 
+ DNA purification . 
+ To remove RNA from the samples , 30 µl of 10 mg/ml RNase A ( Promega ) was added for 45 min at 37 °C . 
+ DNA was extracted twice with an equal volume of phenol-chloroform-isoamyl alcohol pH 6.7 ( Fisher-Scientific ) , and once with pure chloroform . 
+ DNA was precipitated by addition of glycogen ( final concentration 50 µg/ml , Affymetrix ) , 500 µl 3 M sodium acetate , and 10 ml isopropyl alcohol , per 5 ml DNA solution . 
+ The air-dried DNA pellet was resuspended in 100 µl of 10 mM Tris-HCl pH 7.5 for qPCR quantification as described above . 
+ All qPCR results were validated by regular PCR , DNA electrophoresis , and DNA sequencing . 
+ Cre recombination in vitro
+ Cre recombination was performed on genomic DNA that was crosslinked with formaldehyde and treated as described above , just prior to addition of restriction enzymes . 
+ 10 µl of Cre ( 1 mg/ml ) was added for 3 hr at 37 °C ( ~30 % recombination in wild-type prophage DNA substrate ) . 
+ Cre protein was a gift from Dr. Makkuni Jayaram [ 85 ] . 
+ DNA was treated with RNase and precipitated as described above under ` DNA purification ' . 
+ RNA Isolation and RACE
+ For RNA isolation , cells were grown with shaking at 30 °C in 10 ml of LB until OD600 reached 0.6 . 
+ Two ml of culture ( ~1 × 10^8 cells ) were harvested for RNA isolation using the ToTALLY RNA Kit from Ambion according to the manufacturer 's specifications . 
+ The MICROBExpress Kit from Ambion was used to enrich for mRNA from 10 µg of purified total RNA by removing the 16S and 23S ribosomal RNAs ( rRNAs ) . 
+ The final yield of enriched mRNA was ~1 µg . 
+ The quality of total RNA was checked by agarose gel electrophoresis and the RNA concentration was determined by measuring OD at 260 and 280 nm . 
+ RNA samples were stored at −80 °C until use . 
+ 5′-RACE ( 5′-Rapid Amplification of cDNA Ends ) was used to determine the transcriptional start sites , using the SMARTer RACE cDNA Amplification Kit ( Clontech , Mountain View , CA ) . 
+ Gene-specific primers ( GSP1-3 for Pe * and GSP4-5 for Pe promoter ) were used to amplify the 5′ end of isolated mRNA . 
+ The RACE PCR amplified bands were gel-purified and sequenced directly . 
+ Fluorescence measurement of EGFP-MuB/MuE strains
+ The fluorescence intensity of EGFP strains was recorded using a PTI Quanta Master Model C scanning spectrofluorometer . 
+ Strains were sub-cultured by 1:50 dilution from an O/N culture into 5 ml of LB , and grown at 30 °C until OD600 reached 0.6 . 
+ Three ml of the cell culture were placed in a Bio-Rad VersaFluor cuvette with a path length of 10 mm . 
+ Excitation and emission wavelengths were set at 488 nm and 507 nm , respectively . 
+ Fluorescence measurements were obtained from three independent cultures propagated on different days , each measured in triplicate . 
+ AU is arbitrary units . 
+ Each fluorescence value was obtained by subtracting the fluorescence of the WT strain without an EGFP fusion from that of the EGFP-MuB/MuE strain . 
+ Fluorescence microscopy
+ 2 µl of culture prepared as described above was placed on a glass slide , and examined with an Olympus BX53 microscope equipped with a GFP filter . 
+ Photographs were taken with Olympus XM10 camera and processed with Photoshop ( Adobe Systems , Palo Alto , CA ) . 
+ Supporting Information
+ with the Rha plasmid . 
+ Recombination efficiency ( RE ) was calculated as described in Materials and Methods . 
+ RE of loxP sites flanking the wild-type malF : : Mu prophage is set to 1 ( MU , ZL524 ) , and compared to a pair of loxP sites separated by an equivalent 37 kbp on chromosomal DNA in malF region ( malF / yjcF , ZL592 ) . 
+ ( D ) As in ( C ) , except at ~30 hr of growth . 
+ ( TIF ) and the exact position of loxP sites is found in Table S1 . 
+ Primers used are listed in Table S2 . 
+ ( B ) The SGS site was engineered at the center of the 37 kbp yjcF-malF E. coli DNA segment ( see A ) and RE of flanking loxP sites measured . 
+ MU ( ZL524 ) , yjcF/malF ( ZL592 ) , yjcF / malF + SGS ( ZL598 ) . 
+ ( C ) Effect of SGS on the RE of loxP site pairs at varying distances in E. coli DNA . 
+ The SGS site was introduced at the center of DNA flanked by loxP pairs separated by 5 -- 37 kbp shown in A . 
+ The RE of these sites is compared in strains with ( red ) and without ( blue ) SGS . 
+ The strains without SGS are listed in A. Those with added SGS are : ZL706 ( 5 kbp ) , ZL710 ( 9 kbp ) , ZL714 ( 25 kbp ) , ZL598 ( 37 kbp ) . 
+ ( D ) Double log plot of RE vs distance as described for the data in Figure 1B , except that the RE value at 37 kbp is omitted . 
+ Here , $y = -5.41x - 1.86$ . 
+ ( TIF ) identified in this study [ 49,86 ] . 
+ ( B ) EGFP-MuB fluorescence in strains containing Mu prophages without ( WT ) or with ( EGFP-MuB ) EGFP fused to the B. Both strains were grown at 30 °C , where the prophage does not enter lytic growth . 
+ WT ( MP1999 ) , EGFP-MuB ( RS033 ) . 
+ ( C ) Characterization of the Mu early transcripts in a Mu lysogen . 
+ Total RNA was isolated from the uninduced strain MP1999 . 
+ 5′-RACE-PCR was performed on the first-strand cDNA synthesized from the total RNA using primers within MuA and MuB genes , and the products were directly sequenced to identify the 5′ ends as described in Materials and Methods . 
+ Two products were initially obtained -- Pe and Pe * . 
+ These were characterized separately using gene-specific primers ( GSPs ) placed at varying distances to confirm that the size of the product varied as predicted from the identity of the 5′ terminus . 
+ GSP positions on the Mu genome are shown in the schematic below . 
+ Lanes 3 and 7 contain DNA size markers . 
+ ( D ) Position of Pe and Pe * with respect to ner and A gene ORFs and their homology to the E. coli sigma 70 promoter consensus sequence . 
+ Transcription start sites determined by 5′-RACE for both transcripts are indicated by magenta coloring of the initiating A nucleotide . 
+ Start of the Pe transcript matches that reported previously by S1 mapping [ 87 ] . 
+ Conserved nucleotides in both promoters are underlined . 
+ Compared to the sigma 70 consensus promoter , Pe * has the same number of conserved nucleotides as found in Pe , i.e. 5/6 at −35 and 4/6 at −10 . 
+ ( TIF ) 
+ We thank M. Jayaram for helpful comments.
+ Author Contributions
+ Conceived and designed the experiments : RPS ZL LM RMH . 
+ Performed the experiments : RPS ZL LM . 
+ Analyzed the data : RPS ZL LM RMH . 
+ Contributed reagents/materials/analysis tools : RPS ZL LM RMH . 
+ Wrote the paper : RMH .
\ No newline at end of file
--- a/data/TEXT_FILES/useful_txt/24272778.txt 0 → 100644
View file @27818a9
+++ b/data/TEXT_FILES/useful_txt/24272778.txt 0 → 100644
View file @27818a9
+ Genome-Scale Analyses of Escherichia coli and Salmonella enterica AraC Reveal Noncanonical Targets and an Expanded Core Regulon
+ Escherichia coli AraC is a well-described transcription activator of genes involved in arabinose metabolism . 
+ Using complementary genomic approaches , chromatin immunoprecipitation ( ChIP ) - chip , and transcription proﬁling , we identify direct regulatory targets of AraC , including ﬁve novel target genes : ytfQ , ydeN , ydeM , ygeA , and polB . 
+ Strikingly , only ytfQ has an established connection to arabinose metabolism , suggesting that AraC has a broader function than previously described . 
+ We demonstrate arabinose-dependent repression of ydeNM by AraC , in contrast to the well-described arabinose-dependent activation of other target genes . 
+ We also demonstrate unexpected read-through of transcription at the Rho-independent terminators downstream of araD and araE , leading to signiﬁcant increases in the expression of polB and ygeA , respectively . 
+ AraC is highly conserved in the related species Salmonella enterica . 
+ We use ChIP sequencing ( ChIP-seq ) and RNA sequencing ( RNA-seq ) to map the AraC regulon in S. enterica . 
+ A comparison of the E. coli and S. enterica AraC regulons , coupled with a bioinformatic analysis of other related species , reveals a conserved regulatory network across the family Enterobacteriaceae comprised of 10 genes associated with arabinose transport and metabolism . 
+ Escherichia coli AraC is the founding member of a large family of transcription factors ( TFs ) found across a wide range of bacterial species ( 1 ) . 
+ AraC was ﬁrst identiﬁed in 1959 by virtue of the requirement of araC for the metabolism of L-arabinose ( 2 ) and is the ﬁrst-described positive regulator of transcription ( 3 , 4 ) . 
+ E. coli AraC activates transcription of the araBAD , araFGH , araE , and araJ transcripts in the presence of its inducer , L-arabinose ( 5 ) . 
+ AraC binds DNA as a dimer . 
+ Dimerization occurs between adjacent DNA sites when AraC binds arabinose . 
+ In the absence of arabinose , AraC represses transcription of araBAD and araC by forming a repression loop mediated by dimerization of distally bound AraC monomers ( 5 , 6 ) . 
+ Chromatin immunoprecipitation ( ChIP ) - chip and ChIP sequencing ( ChIP-seq ) are widely used techniques for genome-wide mapping of protein-DNA interactions in vivo . 
+ Surprisingly , these methods have been used only sparingly to study bacterial systems ( 7 ) . 
+ ChIP-chip and ChIP-seq studies of bacterial TFs have identiﬁed novel regulatory interactions , even for well-studied proteins ( 7 , 8 ) . 
+ Furthermore , TF binding sites have been identiﬁed in unexpected locations , such as inside genes ( 9 ) , upstream of genes that are not detectably regulated by the TF , and in genomic regions that lack a canonical DNA sequence motif for the TF ( 7 , 10 ) . 
+ Transcription proﬁling uses microarrays or RNA-seq to determine differences in genome-wide RNA levels between two growth conditions and/or strains ( 11 ) . 
+ This approach is often used to identify regulatory targets of TFs by comparing RNA levels in wild-type cells and cells lacking the TF . 
+ In contrast to ChIP methods , transcription proﬁling identiﬁes all genes regulated by a TF and the level and direction of regulation . 
+ However , transcription proﬁling identiﬁes both direct and indirect regulatory targets . 
+ By combining ChIP methods and transcription proﬁling , it is possible to identify all direct regulatory targets of a TF for a given growth condition ( 11 ) . 
+ We refer to the set of direct regulatory targets as a regulon . 
+ Many TFs , including AraC , are highly conserved between E. coli and other species in the family Enterobacteriaceae . 
+ This suggests that DNA-binding speciﬁcity is the same for TF homologues across the family , and that TF regulon gene function is likely to be conserved . 
+ Most studies of regulon evolution have focused simply on whether regulon members ( i.e. , target genes ) have homologues in related species . 
+ In contrast , very few studies have determined whether conserved genes are regulated by the TF ( 12 ) . 
+ The best-studied TF in this regard is PhoP , a two-component regulator that is conserved across the family Enterobacteriaceae . 
+ Regulation of only three PhoP target genes is conserved across the family , although in any given species there are many more than three PhoP-regulated genes ( 13 ) . 
+ Most PhoP target genes in any given species lack homologues in other species or the genes are conserved but are only regulated by PhoP in one or two species . 
+ The latter phenomenon is known as network rewiring ( 12 ) . 
+ Most of the known AraC regulon members in E. coli are conserved across other Enterobacteriaceae members , but the extent of rewiring is unknown . 
+ Given that much of our understanding of regulon evolution is based on studies of a single TF , PhoP , it is important to experimentally compare regulons for additional TFs between related species ( 12 ) . 
+ Genome-scale approaches have not been previously used to identify AraC-regulated genes . 
+ We hypothesized that despite the extensive prior work on the AraC regulon , there are likely to be previously undescribed AraC-regulated genes and novel modes of regulation by AraC . 
+ In this work , we use a combination of ChIP-chip and transcription proﬁling with microarrays to identify all binding sites and all direct regulatory targets of E. coli AraC . 
+ In addition to identifying a novel mechanism of repression by AraC , our genomic approach reveals unexpected read-through of transcription terminators in AraC-activated transcripts and AraC-regulated genes with no connection to arabinose metabolism . 
+ We also identify all binding sites and all direct regulatory targets of AraC in the related species Salmonella enterica using a combination of ChIP-seq and RNA-seq . 
+ These targets include two novel , cotranscribed , AraC-activated genes ( STM14_0178 and STM14_0177 ) that encode a putative arabinoside transporter and an α-L-arabinofuranosidase II precursor . 
+ We rename these genes araT and araU . 
+ Together with a bioinformatic analysis of other Enterobacteriaceae species , these data identify a conserved AraC regulon that includes 7 previously described AraC-regulated genes ( araB , araA , araD , araE , araF , araG , and araH ) as well as three novel targets identiﬁed in this work ( ytfQ , araT , and araU ) . 
+ Moreover , our data indicate only limited rewiring of the AraC regulatory network in the Enterobacteriaceae . 
+ MATERIALS AND METHODS
+ Strains and plasmids . 
+ Bacterial strains and plasmids used in this work are listed in Table 1 . 
+ Cells were grown in LB ( 1 % NaCl , 1 % tryptone , 0.5 % yeast extract ) . 
+ All oligonucleotides used in this work are listed in Table S1 in the supplemental material . 
+ AMD054 was constructed using Red recombineering as described previously ( 14 ) . 
+ The PCR product used for recombineering was generated with oligonucleotides JW464 and JW465 , using pKD13 ( 14 ) as the template . 
+ SAC003 ( MG1655 araC-TAP ) was constructed by P1 transduction of the kanamycin resistance ( Kanr ) gene-linked araC-TAP from DY330 araC-TAP ( 15 ) . 
+ The Kanr gene was removed using pCP20 as described previously ( 16 ) . 
+ SAC001 ( MG1655 ΔaraC ) and AMD115 were constructed by P1 transduction of the Kanr-linked ΔaraC from BW25113 ΔaraC ( 17 ) into MG1655 and AMD054 , respectively . 
+ The Kanr gene was removed using pCP20 as described previously ( 16 ) . 
+ Note that SAC001 and AMD115 also contain the Δ( araD-araB ) 567 mutation , which deletes the araBAD operon . 
+ AMD187 ( E. coli MG1655 araC-3×FLAG ) , JTW010 ( E. coli MG1655 with ytfQ AraC site mutation , araC-3×FLAG ) , and CB005 ( S. enterica serovar Typhimurium 14028s araC-3×FLAG ) were constructed using the FRUIT recombineering system ( 18 ) . 
+ The PCR product used to generate the initial tagged strains was made using oligonucleotides JW1141 and JW1142 for E. coli and JW2895 and JW2901 for S. enterica , with pAMD135 as the template . 
+ For construction of JTW010 , the thyA-containing PCR product for insertion upstream of ytfQ was ampliﬁed with oligonucleotides JW601 and JW602 using pAMD001 as the template . 
+ The PCR product for replacing thyA with mutated sequence was constructed using SOEing PCR ( 19 ) with oligonucleotides JW599 , JW600 , JW603 , and JW604 , using a colony of MG1655 as a template . 
+ All lacZ reporter gene fusions were constructed in plasmid pAMD-BA-lacZ using the oligonucleotides listed in Table S1 in the supplemental material . 
+ PCR products were cloned as SphI-HindIII-digested fragments . 
+ pAMD-BA-lacZ has been described previously ( 20 ) , but its construction has not been described in detail . 
+ pAMD-BA-lacZ is a derivative of pBAC-BA-lacZ ( Addgene plasmid 13423 ) in which the NotI-HindIII fragment has been replaced with a PCR product ( cut with NotI and HindIII ) containing an intrinsic terminator from E. coli rrfB and additional restriction sites ( BamHI , XhoI , and SphI ) . 
+ This PCR product was generated using oligonucleotides JW659 and JW660 , with E. coli genomic DNA as the template ( colony PCR ) . 
+ lacZ in this plasmid does not have a start codon or Shine-Dalgarno sequence , so fusions must be made translationally , as is the case for pAMD086 and pAMD007 , or cloned fragments must include a Shine-Dalgarno sequence and start codon , as is the case ( AGAAGGAG ATATACATATG ) for pAMD124 and pAMD132 . 
+ Oligonucleotides used to generate PCR products for cloning of lacZ fusions for regions upstream of araE , ytfQ , and ydeN were JW679 and JW680 ( araE ) , JW675 and JW678 ( ytfQ ) , JW1438 and JW2391 ( ydeN , 371 to 1 ) , and JW1438 and JW1635 ( ydeN , 371 to 14 ) . 
+ The ytfQ and ydeN upstream sequences , indicating the pieces cloned into the lacZ fusion plasmid , are shown in Fig . S1 and S2 , respectively , in the supplemental material . 
+ lacZ fusion plasmids to address transcription termination ( pJTW064 , pJTW055 , pJTW060 , pJTW062 , and pJTW061 ) were cloned using SOE-ing PCR ( 19 ) . 
+ A constitutive promoter was ampliﬁed from pAMD001 ( 18 ) using oligonucleotides JW3415 and JW3379 . 
+ These were joined using SOEing PCR with PCR products ampliﬁed with oligonucleotides JW3381 and JW3416 ( araE terminator ) , JW3476 and JW3478 ( tppB terminator ) , or JW3424 and JW3425 ( ahpF terminator ) . 
+ Final PCR products were cloned into pAMD-BA-lacZ using the In-Fusion method ( Clontech ) . 
+ The mutant tppB terminator construct was isolated serendipitously as a result of a mutation introduced during the cloning of the wild-type construct . 
+ Analysis of binding site conservation . 
+ Sequences surrounding AraC binding sites upstream of E. coli araB , araF , araE , araJ , and ytfQ and within dcp ( 30 bp upstream sequence and 30 bp downstream sequence in addition to the 19-mer AraC site ) were individually aligned with equivalent regions ( i.e. , the sequence 500 bp upstream of the homologous gene , or for the site within E. coli dcp , the entire homologous gene ; for S. enterica araT , sequence was taken from 500 to 100 with respect to the gene start , since these genes may be misannotated ) from S. enterica , Citrobacter rodentium ICC168 , Enterobacter sp. strain 638 , Klebsiella pneumoniae 342 , and Cronobacter sakazakii ES15 using ClustalW ( 21 ) . 
+ Similarly , the AraC site upstream of S. enterica araT was aligned with homologues from the same list of species . 
+ The number of matches to each position of each AraC site was determined , and the fraction of all species with a match to the reference sequence at each position was calculated . 
+ For each AraC binding site , the multispecies collection of aligned sites was used to compute the information content of each position ( 22 ) to generate conservation profiles . 
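+ The two per-position measures described above can be sketched as follows . This is a minimal illustration , not the authors ' script : the function names and the toy sequences are hypothetical , information content is taken as 2 bits minus the Shannon entropy with a uniform background and no small-sample correction , and alignment gaps are not treated specially . 

```python
import math
from collections import Counter


def match_fraction(reference, aligned_sites):
    """Fraction of aligned sequences that match the reference at each position."""
    return [
        sum(1 for site in aligned_sites if site[i] == base) / len(aligned_sites)
        for i, base in enumerate(reference)
    ]


def information_content(aligned_sites):
    """Per-position information content in bits (2 - Shannon entropy).

    Assumes a uniform background over A, C, G, T and applies no
    small-sample correction; a gap character would be counted as a symbol.
    """
    n = len(aligned_sites)
    bits = []
    for i in range(len(aligned_sites[0])):
        counts = Counter(site[i] for site in aligned_sites)
        entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
        bits.append(2.0 - entropy)
    return bits


# Hypothetical 4-bp toy alignment (real AraC sites are 19-mers).
reference = "ACGT"
homologues = ["ACGT", "ACGA", "ACCC", "ATTG"]
print(match_fraction(reference, homologues))
print(information_content([reference] + homologues))
```

+ Fully conserved positions score 2 bits and positions with all four bases equally represented score 0 bits under these assumptions . 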
+ β-Galactosidase assays . 
+ Two to 3 ml cells was grown in LB or LB plus 0.2 % arabinose at 37 °C to an optical density at 600 nm ( OD600 ) of 0.8 to 1.0 , and the OD600 was recorded . 
+ Eight hundred µl cells was pelleted at full speed in a microcentrifuge for 1 min ( 80 µl was used for strongly active fusions , and this was corrected for at the final calculation step ) . 
+ Cell pellets were resuspended in 800 µl Z buffer ( 0.06 M Na2HPO4 , 0.04 M NaH2PO4 , 0.01 M KCl , 0.001 M MgSO4 ) plus 50 mM β-mercaptoethanol ( added fresh ) . 
+ Twenty µl chloroform and 10 µl 0.1 % SDS were added to the cells , followed by vortexing for 5 s . 
+ Assays were started by addition of 160 µl o-nitrophenyl-β-D-galactopyranoside ( ONPG ; 4 mg/ml in distilled H2O ) and stopped by addition of 400 µl 1 M Na2CO3 upon development of an appropriate yellow color . 
+ The reaction time was noted . 
+ Samples were centrifuged at full speed in a microcentrifuge to pellet the chloroform . 
+ The OD420 of the supernatant was recorded . 
+ Arbitrary assay units were calculated as 1,000 × [ A420 / ( A600 × total time ) ] . 
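+ As a worked illustration of this calculation , the sketch below computes arbitrary units from the three recorded values and applies the volume correction mentioned for strongly active fusions . The function and parameter names are hypothetical , the example readings are invented , and the correction is assumed to be a simple scaling by 800 µl divided by the volume actually pelleted . 

```python
def assay_units(a420, a600, total_time, volume_ul=800.0, standard_volume_ul=800.0):
    """Arbitrary units: 1,000 * A420 / (A600 * total time).

    total_time is in whatever unit the reaction time was recorded in.
    The volume scaling (assumed here) corrects for assays that used
    less than the standard 800 ul of cells.
    """
    if a600 <= 0 or total_time <= 0 or volume_ul <= 0:
        raise ValueError("a600, total_time and volume_ul must be positive")
    units = 1000.0 * a420 / (a600 * total_time)
    return units * (standard_volume_ul / volume_ul)


# Hypothetical readings: A420 = 0.5, A600 = 1.0, reaction time = 10.
print(assay_units(0.5, 1.0, 10))                  # standard 800 ul assay
print(assay_units(0.5, 1.0, 10, volume_ul=80.0))  # 80 ul assay, scaled 10-fold
```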
+ RNA puriﬁcation . 
+ RNA was puriﬁed from cells using a modiﬁed version of the hot phenol method that has been described previously ( 11 ) . 
+ Cells were grown in LB or LB plus 0.2 % arabinose at 37 °C to an OD600 of 0.6 to 0.8 . 
+ One ml cells was mixed with 400 µl ice-cold 95 % ethanol and 5 % phenol-chloroform-isoamyl alcohol ( 25:24:1 mix ) . 
+ Cells were pelleted in a microcentrifuge for 1 min at full speed and washed once with Tris-buffered saline . 
+ Cell pellets were resuspended in 400 µl RNA lysis buffer ( 2 % SDS , 4 mM EDTA ) and boiled for 3 min . 
+ Four hundred µl acid phenol-chloroform-isoamyl alcohol mix ( pH 4.3 ) was added and incubated at 65 °C for 6 min and on ice for 5 min . 
+ Samples were centrifuged , and the aqueous layer was extracted once more with phenol-chloroform-isoamyl alcohol mix ( pH 4.3 ) . 
+ RNA was precipitated with 1 ml 100 % ethanol and 40 µl 3 M sodium acetate . 
+ RNA was pelleted in a microcentrifuge for 10 min at full speed and washed once with room temperature 75 % ethanol . 
+ RNA pellets were air dried , resuspended in water , and treated with 10 U of DNase I ( NEB ) in 500 µl for 1 h at 37 °C . 
+ RNA was then phenol extracted and ethanol precipitated as described above . 
+ Transcription proﬁling using microarrays . 
+ RNA was purified from MG1655 ( wild-type ) or SAC001 ( ΔaraC ) cells grown in LB with or without 0.2 % arabinose at 37 °C . 
+ cDNA synthesis , labeling , hybridization to Affymetrix GeneChip E. coli Genome 2.0 microarrays , washing , and scanning were performed according to the manufacturer 's ( Affymetrix ) instructions . 
+ Triplicate data sets for each strain/condition pair were analyzed using GeneSpring software ( Agilent ) to calculate fold changes and P values . 
+ Only genes with 4-fold changes and P values of 0.1 are shown in Tables 1 and 2 . 
+ 5′ RACE . 
+ RNA was purified from MG1655 cells grown in LB . 
+ 5′ Rapid amplification of cDNA ends ( RACE ) was performed using the FirstChoice RLM-RACE kit ( Ambion ) according to the manufacturer 's instructions . 
+ Oligonucleotides JW1485 and JW1486 , specific to ydeN , were used in conjunction with oligonucleotides provided by the manufacturer ( GCTGATGGCGATGAATGAACACTG and CGCGGATCCGAACACTGCGTTTGCTGGCTTTGATG , respectively ) . 
+ Northern blotting . 
+ Ten µg RNA was run per lane on a 1 % agarose , 1× 3-( N-morpholino ) propanesulfonic acid ( MOPS ) , 2 % formaldehyde gel at 70 V for 4 h . 
+ RNA was blotted by capillary action onto Magna nylon transfer membrane ( GE Water & Process Technologies ) and fixed by UV irradiation . 
+ Membranes were incubated with 10^5 cpm PCR-generated double-stranded DNA ( dsDNA ) probe overnight in hybridization buffer ( 0.525 M Na2HPO4 , 7 % SDS , 1 mM EDTA , 10 mg/ml bovine serum albumin [ BSA ] ) and washed twice with wash buffer 1 ( 40 mM Na2HPO4 , 5 % SDS , 1 mM EDTA ) , wash buffer 2 ( 40 mM Na2HPO4 , 1 % SDS , 1 mM EDTA ) , and wash buffer 3 ( 0.2 % SDS , 0.2× SSC [ 1× SSC is 0.15 M NaCl plus 0.015 M sodium citrate ] ) at 55 °C ( 23 ) . 
+ Blots were visualized by phosphorimaging . 
+ Oligonucleotides used to generate PCR products for probe labeling were JW243 and JW1399 for araE and JW2387 and JW2388 for ygeA . 
+ RNA-seq . 
+ RNA was puriﬁed from 1 ml cells grown in LB with or without 0.2 % arabinose at 37 °C to an OD600 of 0.6 to 0.8 . 
+ Duplicate samples were prepared from independent biological replicates for each condition/strain . 
+ rRNA was removed using the RiboZero kit ( Epicentre ) . 
+ Strand-speciﬁc DNA libraries for Illumina sequencing were prepared using the ScriptSeq 2.0 kit ( Epicentre ) . 
+ Sequencing was performed using an Illumina HiSeq instrument ( University at Buffalo Next Generation Sequencing Core Facility ) . 
+ Sequences were aligned to the 14028s genome using the CLC Genomics Workbench , and differences in expression between conditions/strains were determined using the Pathogen Portal RNA-seq Analysis Pipeline ( 24 ) that includes Bowtie ( version 2.02 ; for aligning reads to reference genomes ) ( 25 ) , Cufﬂinks ( version 2.02 ; for transcript mapping ) , and CuffDiff ( for comparing expression of transcripts between samples ) ( 26 ) with default settings . 
+ Table 2 
+ Gene a | Arabinose c | ΔaraC d 
+ araA | 9.6 | 7.6 
+ araB | 9.4 | 7.3 
+ araD | 7.8 | 7.0 
+ araE | 5.8 | 6.1 
+ araG | 4.8 | 4.9 
+ araJ | 4.1 | 4.7 
+ araH | 4.6 | 4.6 
+ araF | 4.8 | 4.3 
+ araH b | 4.8 | 4.0 
+ ygeA | 3.5 | 3.4 
+ isrB | 2.9 | 2.9 
+ cstA | 2.3 | 2.0 
+ melA | 2.2 | 2.0 
+ aldB | 2.5 | 2.1 
+ fucI | 2.0 | 2.2 
+ tdcF | 2.0 | 2.2 
+ tdcA | 2.1 | 2.2 
+ xylF | 2.6 | 2.5 
+ gudX | 2.6 | 2.5 
+ tdcE | 2.6 | 2.6 
+ garR | 2.7 | 2.8 
+ tdcC | 2.7 | 2.9 
+ tdcB | 3.3 | 3.1 
+ ydeN | 3.1 | 3.1 
+ tdcD | 2.4 | 3.1 
+ yjhA | 3.1 | 3.1 
+ tnaL | 3.5 | 3.2 
+ garD | 3.3 | 3.3 
+ garL | 3.9 | 3.4 
+ tnaA | 3.0 | 3.7 
+ garP | 3.2 | 3.8 
+ malG | 3.0 | 3.9 
+ malF | 3.8 | 4.1 
+ tnaB | 3.9 | 4.2 
+ gudP | 4.2 | 4.3 
+ malE | 3.6 | 4.5 
+ malM | 4.1 | 4.6 
+ malK | 4.5 | 5.1 
+ lamB | 4.6 | 5.2 
+ a Arabinose-responsive genes in E. coli were defined by a 4-fold change ( significant difference ) in growth with or without arabinose in wild-type ( MG1655 ) cells and a 4-fold significant difference between wild-type ( MG1655 ) and ΔaraC ( SAC001 ) cells in the presence of arabinose . 
+ Direct regulatory targets of AraC are indicated by boldface . 
+ Previously described regulatory targets of AraC are shaded in gray . 
+ b araH is represented twice on the microarray . 
+ c Fold change in mRNA level for wild-type cells grown with or without arabinose . 
+ d Fold difference in mRNA level between wild-type and ΔaraC cells grown in the presence of arabinose . 
+ Reverse transcription-PCR ( RT-PCR ) . 
+ To assess terminator read-through downstream of araE and araD , RNA was puriﬁed from MG1655 cells grown in LB plus 0.2 % arabinose . 
+ RNA was reverse transcribed using SuperScript III reverse transcriptase ( Invitrogen ) with 100 ng random hexamer according to the manufacturer 's instructions . 
+ A control reaction omitted the reverse transcriptase . 
+ One-twentieth of the cDNA ( or negative control ) was used as a template in a PCR with appropriate primers ( see Table S1 in the supplemental material ) . 
+ Oligonucleotides used for PCR were JW435 and JW436 for araE-ygeA and JW1366 and JW1367 for araD-polB . 
+ ChIP , ChIP-chip , and ChIP-seq . 
+ ChIP methods are presented in the supplemental material . 
+ Accession numbers . 
+ Microarray and sequencing data sets are available in the supplemental material ( E. coli ChIP-chip ) or through the EBI / EMBL ArrayExpress repository under the following accession numbers : E. coli transcription proﬁling , E-MTAB-1916 ; S. Typhimurium ChIP-seq , E-MTAB-1915 ; S. Typhimurium RNA-seq , E-MTAB-1901 . 
+ The Agilent microarray design used for E. coli ChIP-chip is available through ArrayExpress under accession number A-MEXP-2346 . 
+ RESULTS
+ Genome-wide mapping of AraC binding sites in E. coli . 
+ E. coli AraC-regulated genes have been identiﬁed previously through a variety of genetic approaches ( 3 , 27 -- 29 ) . 
+ Here , we used two complementary genomic approaches to comprehensively identify members of the AraC regulon . 
+ First , we mapped the genome-wide binding of TAP ( tandem afﬁnity puriﬁcation ) - tagged AraC ( tagged at its native locus in an unmarked strain ) using chromatin immunoprecipitation ( ChIP ) coupled with custom-designed oligonucleotide microarrays ( ChIP-chip ; see Table S2 in the supplemental material ) . 
+ We identiﬁed seven putative target loci for AraC : upstream of araB-araC , araE , araF , araJ , ytfQ , ydeN , and within dcp . 
+ These included all previously described AraC target loci , with the exception of xylA , which we believe is not a direct target of AraC under these growth conditions ( see below ) . 
+ AraC association has not been previously described for ytfQ , ydeN , or dcp . 
+ We validated the ChIP-chip data using ChIP coupled with quantitative real-time PCR ( ChIP/qPCR ) . 
+ To demonstrate that ChIP signal was not an artifact of the TAP tag , we constructed an unmarked derivative of MG1655 that expresses a C-terminally 3× FLAG-tagged AraC from its native locus . 
+ ChIP/qPCR veriﬁed signiﬁcant association of AraC with all regions tested in the presence of arabinose ( Fig. 1A ; araJ was not tested ) . 
+ Association of AraC with all regions was reduced in the absence of arabinose , with no association detected for ydeN ( Fig. 1A ) . 
+ Thus , our data suggest that the overall afﬁnity of AraC for its DNA sites is increased by association with arabinose . 
+ This is particularly important for AraC binding upstream of ydeN , since this interaction appears to be completely dependent upon arabinose . 
+ The known consensus sequence for AraC ( Fig. 1B ) is based on extensive footprinting and mutagenesis studies of the araBAD , araC , araE , araFGH , and araJ promoters ( 30 -- 34 ) . 
+ From our validated AraC ChIP targets , we inferred a de novo position-speciﬁc weight matrix ( PSWM ) for AraC using MEME , a bioinformatic tool that identiﬁes overrepresented motifs in multiple unaligned sequences ( 35 ) . 
+ The top-scoring motif predicted by MEME is a good match to the known AraC motif ( Fig. 1B ) . 
+ MEME identiﬁed many , but not all , of the known AraC binding sites . 
+ This is unsurprising , since cooperative interactions of AraC dimers stabilize binding to some nonconsensus DNA sites at previously described target loci ( 32 ) . 
+ Effects of AraC and arabinose on global gene expression in E. coli . 
+ We used transcription proﬁling with Affymetrix high-density microarrays to determine the effects of AraC and arabinose on RNA levels genome wide . 
+ Wild-type or araC mutant cells were grown in the absence or presence of 0.2 % L-arabinose . 
+ Table 2 lists the genes whose expression changed significantly by 4-fold in wild-type cells upon addition of arabinose and whose expression differed significantly by 4-fold between wild-type and ΔaraC cells in the presence of arabinose . 
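+ The two-comparison selection rule can be sketched as below . This is an illustrative filter only : the record layout , gene names , and values are hypothetical , and the thresholds are assumed to mean a change of at least 4-fold in either direction with a P value of at most 0.1 in both comparisons . 

```python
def fold_magnitude(ratio):
    """Size of a change regardless of direction (0.25 and 4.0 are both 4-fold)."""
    return ratio if ratio >= 1 else 1 / ratio


def select_regulated(records, min_fold=4.0, max_p=0.1):
    """Keep genes that pass both comparisons.

    Each record holds (ratio, p_value) pairs for the arabinose comparison
    and for the wild-type versus araC-mutant comparison. The inequality
    directions are assumptions for this sketch.
    """
    selected = []
    for rec in records:
        comparisons = (rec["arabinose"], rec["araC"])
        if all(fold_magnitude(r) >= min_fold and p <= max_p for r, p in comparisons):
            selected.append(rec["gene"])
    return selected


# Hypothetical expression ratios and P values.
records = [
    {"gene": "geneA", "arabinose": (8.0, 0.01), "araC": (6.0, 0.02)},    # induced
    {"gene": "geneB", "arabinose": (0.2, 0.01), "araC": (0.125, 0.05)},  # repressed
    {"gene": "geneC", "arabinose": (8.0, 0.01), "araC": (2.0, 0.01)},    # fails fold cut
    {"gene": "geneD", "arabinose": (8.0, 0.50), "araC": (8.0, 0.01)},    # fails P cut
]
print(select_regulated(records))
```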
+ As expected , expression of known AraC-regulated genes , i.e. , araB , araA , araD , araE , araF , araG , araH , and araJ , increased substantially upon addition of arabinose in wild-type cells and was substantially higher in wild-type cells than in ΔaraC cells in the presence of arabinose ( Table 2 ) . 
+ Novel AraC-regulated genes identiﬁed using this approach are discussed below . 
+ We did not detect signiﬁcant AraC-dependent or arabinose-dependent regulation of xylA , a previously described AraC-regulated gene ( 36 ) , nor did we detect binding of AraC upstream of xylA . 
+ Hence , we believe that xylA is not a direct regulatory target of AraC under the conditions tested here ( cells were grown in tryptone broth in the other study ) . 
+ Genes regulated indirectly by arabinose and AraC . 
+ Many of the genes regulated by AraC/arabinose ( Table 2 ) are not associated with binding of AraC , as determined by the ChIP-chip experiment . 
+ We conclude that these genes are indirectly regulated by arabinose and/or AraC . 
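+ The reasoning in this paragraph amounts to intersecting the set of regulated genes with the genes downstream of AraC-bound regions . A minimal sketch follows ; the function name is hypothetical , and the operon membership used here is an illustrative subset drawn from operon names mentioned in the text ( araBAD , araFGH , ydeNM ) rather than a complete annotation . 

```python
def classify_targets(regulated_genes, bound_operons):
    """Split regulated genes into direct (in a bound operon) and indirect."""
    bound_genes = {gene for genes in bound_operons.values() for gene in genes}
    direct = sorted(g for g in regulated_genes if g in bound_genes)
    indirect = sorted(g for g in regulated_genes if g not in bound_genes)
    return direct, indirect


# Illustrative subset: keys are genes with an upstream ChIP signal,
# values are the genes assumed to be transcribed from that region.
bound_operons = {
    "araB": ["araB", "araA", "araD"],
    "araE": ["araE"],
    "araF": ["araF", "araG", "araH"],
    "araJ": ["araJ"],
    "ydeN": ["ydeN", "ydeM"],
}
regulated = ["araA", "araE", "araG", "ydeN", "malE", "tdcA", "isrB"]

direct, indirect = classify_targets(regulated, bound_operons)
print("direct:", direct)
print("indirect:", indirect)
```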
+ Almost all of these indirectly regulated genes are repressed by AraC/arabinose , and they include genes associated with maltose metabolism ( malE , malF , malG , malK , malM , and lamB ) , threonine metabolism ( tdcA , tdcB , tdcC , tdcD , and tdcE ) , D-glucarate/D-galactarate metabolism ( garD , garL , garP , and garR ) , and tryptophan metabolism ( tnaA , tnaB , and tnaL ) . 
+ Only one indirect target gene , isrB , is upregulated 4-fold by both AraC and arabinose . 
+ isrB was originally annotated as a small RNA but has more recently been shown to encode a small membrane protein ( 37 ) . 
+ The mechanisms by which these genes are indirectly regulated by AraC and/or arabinose are unclear . 
+ Arabinose-independent repression of ytfQ transcription by AraC . 
+ The ChIP-chip analysis identiﬁed binding of AraC upstream of ytfQ and ppa ( divergently transcribed genes ) . 
+ The MEME analysis identified a putative AraC binding site centered at positions 133.5 and 94.5 relative to the previously mapped transcription start sites of ytfQ and ppa , respectively ( Fig. S1 in the supplemental material ) ( 38 ) . 
+ To determine experimentally whether this is the true AraC site upstream of ytfQ , we performed a ChIP experiment in a wild-type strain and in a strain in which the putative AraC binding site was mutated . 
+ Association of AraC , as determined by ChIP/qPCR , was signiﬁcantly reduced by mutation of the putative DNA site ( Fig. 1C ) . 
+ We conclude that this is a genuine DNA site for AraC . 
+ We did not detect signiﬁcant regulation of ytfQ or ppa by AraC or arabinose in the transcription proﬁling experiment ; however , ytfQ encodes a transporter that binds arabinose and galactose ( 39 ) , consistent with ytfQ being a regulatory target of AraC . 
+ We constructed a translational fusion of ytfQ to a lacZ reporter gene and performed β-galactosidase assays with or without arabinose in a wild-type and a ΔaraC strain . 
+ We detected a small ( 1.5-fold ) but significant increase in expression in the ΔaraC strain ( see Fig. S3 in the supplemental material ) , suggesting that AraC directly represses transcription of ytfQ , albeit weakly . 
+ This apparent repression did not depend upon the addition of arabinose ( see Fig. S3 ) . 
+ Arabinose-dependent repression of ydeNM transcription by AraC . 
+ The ChIP-chip analysis identiﬁed binding of AraC upstream of ydeN ( Fig. 1A ) . 
+ The relatively low resolution of ChIP-chip precluded precise identiﬁcation of the binding site ( s ) . 
+ We also showed in the transcription proﬁling experiment that expression of ydeN is reduced in the presence of arabinose and reduced in the presence of araC ( Table 2 ) . 
+ Similarly , expression of ydeM , the downstream gene , decreased 3.2-fold in the presence of arabinose and was reduced 7.3-fold by the presence of araC . 
+ This suggests that ydeN and ydeM are transcribed as a two-gene operon that is repressed by AraC . 
+ In the absence of arabinose , we did not detect AraC association upstream of ydeN ( Fig. 1A ) , nor did we detect any signiﬁcant difference in expression of ydeN or ydeM between wild-type and araC mutant cells in the absence of arabinose . 
+ ChIP/qPCR analysis of RNA polymerase ( RNAP ) at ydeN conﬁrmed that transcription decreases in the presence of arabinose and that this decrease is dependent upon araC ( Fig. 2 ) . 
+ Thus , ydeNM is a novel AraC-regulated operon that is directly repressed by AraC in an arabinose-dependent manner . 
+ We mapped the 5′ end of the ydeNM transcript using 5′ RACE and constructed transcriptional fusions to a lacZ reporter gene with fragments starting at position 371 and ending at position 1 or 14 with respect to the transcription start site . 
+ The longer fragment , from 371 to 14 , showed 3-fold arabinose-dependent repression by AraC ( Fig. 3 ) . 
+ In contrast , the shorter fragment , from 371 to 1 , showed no repression by AraC , suggesting association of AraC with the sequence around the transcription start site ( Fig. S2 in the supplemental material ) , although no site matching the AraC motif could be identified in this region . 
+ ydeN and ydeM encode a predicted sulfatase and a predicted sulfatase maturase , respectively ; thus , they have no apparent connection to arabinose metabolism . 
+ To determine whether either ydeN or ydeM is required for normal regulation of AraC-activated genes , we constructed a translational fusion of the araE upstream region to lacZ and measured β-galactosidase activity in a wild-type strain and in isogenic strains containing deletions of either ydeN or ydeM . 
+ We did not detect any substantial difference in β-galactosidase activity relative to the wild-type strain in either mutant ( see Fig. S4 in the supplemental material ) . 
+ AraC binding within dcp is not associated with detectable regulation of transcription . 
+ We detected binding of AraC within dcp ( Fig. 1A ) . 
+ The predicted binding site is located far from the 5′ end of any gene , including dcp itself ( see Fig. S5 in the supplemental material ) , suggesting that it is not associated with regulation of an annotated gene . 
+ Intriguingly , association of AraC with the site in dcp , as measured by ChIP/qPCR , is the highest of all AraC-bound regions in the E. coli genome ( Fig. 1A ) . 
+ To determine whether the AraC site within E. coli dcp is associated with transcription regulation , we used ChIP/qPCR to measure association of RNAP in the presence and absence of arabinose in a wild-type and a ΔaraC strain ( Fig. 2 ) . 
+ We did not detect any signiﬁcant differences in RNAP association , suggesting that under these growth conditions , AraC does not regulate expression of a transcript that initiates within dcp . 
+ tivated transcripts . 
+ In the transcription proﬁling experiment , we found that expression of ygeA is signiﬁcantly induced by arabinose and is dependent on araC ( Table 2 ) . 
+ ygeA is located immediately downstream of araE , in the same orientation , suggesting that some RNAP reads through the terminator downstream of araE and transcribes ygeA . 
+ We tested this hypothesis using RT-PCR to detect RNA that spans the araE and ygeA genes . 
+ Despite the presence of a strong predicted terminator , we were able to detect RNA species that included both araE and ygeA , consistent with terminator read-through ( Fig. 4A ) . 
+ ChIP/qPCR analysis of RNAP demonstrated high levels of RNAP association within ygeA at both the 5′ and 3′ ends , in the presence but not the absence of arabinose and dependent upon araC ( Fig. 2 ) . 
+ Northern blotting using probes speciﬁc to araE and ygeA also demonstrated read-through of the terminator downstream of araE ( Fig. 4B ) , although the level of read-through transcript was lower than that of araE transcript . 
+ We also detected an araC-independent transcript by Northern blotting that is likely due to initiation of transcription immediately upstream of ygeA ( Fig. 4B ) . 
+ Using densitometry analysis , we determined that the araE-ygeA read-through product is 11 % as abundant as the araE transcript . 
+ In contrast , the ChIP/qPCR data ( Fig. 2 ) indicate that 50 % of RNAP complexes read through the terminator downstream of araE . 
+ Together , these data suggest that the read-through transcript is less stable than that for araE alone . 
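+ The inference about transcript stability can be made explicit with a simple steady-state model . The sketch below is an assumption-laden illustration , not the authors ' analysis : it supposes that steady-state abundance is proportional to synthesis rate times transcript lifetime , that the araE band on the Northern blot corresponds to terminated transcripts , and that the ChIP/qPCR read-through fraction reflects the fraction of transcripts that extend into ygeA . The function name is hypothetical . 

```python
def relative_lifetime(abundance_ratio, readthrough_fraction):
    """Lifetime of the read-through transcript relative to the terminated one.

    Model (assumed): abundance is proportional to synthesis * lifetime, so
    abundance_ratio = f * tau_rt / ((1 - f) * tau_term), with f the
    fraction of RNAP complexes that read through the terminator.
    """
    f = readthrough_fraction
    if not 0 < f < 1:
        raise ValueError("readthrough_fraction must be between 0 and 1")
    return abundance_ratio * (1 - f) / f


# Values quoted in the text: 11 % relative abundance, 50 % read-through.
print(relative_lifetime(0.11, 0.5))
```

+ Under these assumptions , a relative lifetime well below 1 is what would be expected if the read-through transcript is less stable than the araE transcript . 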
+ Using the transcription proﬁling data , we analyzed the differences in expression with or without arabinose and in the presence or absence of araC for the genes immediately downstream of araD , araH , and araJ . 
+ Only polB , the gene immediately downstream of araD , showed a 2-fold change in expression . 
+ Specifically , expression of polB increased 2.6-fold in the presence of arabinose and was 2.5-fold higher in wild-type cells than in ΔaraC cells . 
+ This suggests that RNAP also reads through the terminator downstream of araD . 
+ We conﬁrmed this using RT-PCR ( Fig. 4A ) and ChIP/qPCR of RNAP ( Fig. 2 ) , as described above for araE-ygeA . 
+ From the ChIP/qPCR data , we estimate that 30 % of RNAP complexes read through the terminator downstream of araD . 
+ A recent study predicted sites of Rho-independent termination based on RNA sequence and structure ( 40 ) . 
+ The sequence between araE and ygeA ranked 286th on the list of 1,058 predicted terminators , suggesting that it should function effectively to terminate transcription . 
+ To experimentally test the ability of this sequence to terminate transcription , we constructed a lacZ reporter fusion that includes the predicted terminator with limited ﬂanking sequence downstream of a strong , constitutive promoter ( Fig. 5A ) ( 41 ) . 
+ As controls , we constructed fusions with either no terminator sequence or predicted terminators and limited ﬂanking sequence for the ahpF and tppB genes , ranked 293rd and 638th on the list of 1,058 predicted terminators , respectively ( Fig. 5A ) ( 40 ) . 
+ While the ahpF and tppB terminators reduced expression by 98 % and 99 % , respectively , the araE terminator reduced β-galactosidase activity by only 56 % ( Fig. 5B ) . 
+ We also tested a mutant version of the tppB terminator that contains a point mutation in the upstream stem of the terminator stem-loop . 
+ This mutant terminator reduced β-galactosidase activity by 89 % ( Fig. 5B ) . 
+ Thus , the araE terminator is only weakly effective and does not even function as well as a mutant version of a terminator that has lower predicted strength . 
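+ The percent reductions quoted above follow from comparing reporter activity with and without a terminator . A minimal sketch of that arithmetic is given here ; the function name and the activity values are hypothetical , chosen only so that the result matches the 56 % figure reported for the araE terminator . 

```python
def percent_reduction(activity_no_terminator, activity_with_terminator):
    """Percent drop in reporter activity caused by inserting a terminator."""
    if activity_no_terminator <= 0:
        raise ValueError("control activity must be positive")
    return 100.0 * (1.0 - activity_with_terminator / activity_no_terminator)


# Hypothetical assay units for the no-terminator control and a test construct.
control_units = 1000.0
test_units = 440.0
reduction = percent_reduction(control_units, test_units)
print(f"reduction: {reduction:.0f} %, apparent read-through: {100 - reduction:.0f} %")
```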
+ Genome-wide mapping of AraC binding sites in S. enterica . 
+ We mapped the genome-wide binding of C-terminally FLAG-tagged AraC in S. enterica subsp. enterica serovar Typhimurium strain 14028s using ChIP coupled with deep sequencing ( ChIP-seq ) . 
+ We identified five putative target loci for AraC : upstream of araB-araC , araE , araJ , STM14_0178 ( araT ) , and within sseD . 
+ We validated the ChIP-seq data using ChIP/qPCR . 
+ Thus , we confirmed significant association of AraC with all regions identified by ChIP-seq ( Fig. 6 ) . 
+ Effects of AraC and arabinose on global gene expression in S. enterica . 
+ We used RNA-seq to determine the effects of AraC and arabinose on genome-wide RNA levels in S. enterica . 
+ Wild-type or araC mutant cells were grown in the presence or absence of 0.2 % L-arabinose . 
+ Table 3 lists the 16 genes whose expression changed signiﬁcantly ( false discovery rate [ FDR ] , 0.05 ) by 4-fold in wild-type cells upon addition of arabinose and whose expression differed signiﬁcantly ( FDR , 0.05 ) by 4-fold between wild-type and araC cells in the presence of arabinose . 
+ Of the 16 regulated genes , 9 are direct regulatory targets based on the association of AraC with regions upstream of these genes , as determined by ChIP-seq . 
+ All of the direct regulatory targets are positively regulated by AraC and arabinose . 
+ No direct targets were identiﬁed that are regulated by AraC in the absence of arabinose . 
+ We did not detect any signiﬁcant change in expression of sseD or the surrounding genes , suggesting that , like E. coli dcp , this gene contains an AraC binding site that is not associated with regulation of transcription under the conditions tested . 
+ It is important to note , however , that sseD falls within Salmonella pathogenicity island 2 ( SPI2 ) , a region that is transcriptionally silenced by H-NS under the conditions used in our work ( 42 ) . 
+ Thus , it is possible that AraC regulates transcription from the site within sseD under conditions that derepress SPI2 . 
+ The direct regulatory targets of AraC include all classical ara genes that are conserved in S. enterica , with the exception of araH . 
+ Note that araH is part of the araFGH operon in E. coli but araF and araG are not conserved in S. enterica . 
+ As we have shown for E. coli , ygeA is a direct regulatory target of AraC in S. enterica ( cotranscribed with araE ) . 
+ Lastly , STM14_0178 and STM14_0177 are direct regulatory targets of AraC . 
+ STM14_0178 and STM14_0177 do not have close homologues in E. coli and are predicted to encode an arabinoside transporter and an α-L-arabinofuranosidase II precursor , respectively . 
+ Thus , it is likely that S. enterica metabolizes arabinosides as a source of arabinose . 
+ Based on their predicted functions , we rename STM14_0178 and STM14_0177 araT ( arabinoside transporter ) and araU ( arabinofuranosidase II precursor ) , respectively . 
+ The AraC site location upstream of araT can be estimated with 20-bp accuracy from the ChIP-seq data ( predicted AraC sites upstream of araE and araJ are within 20 bp of the corresponding ChIP-seq peaks ) ( see Fig. S6 in the supplemental material ) . 
+ Two regions upstream of araT have sequences similar to the AraC consensus motif . 
+ The location of one of these regions is precisely aligned with the ChIP-seq peak , suggesting that this sequence is bound by AraC under the conditions tested . 
+ The more upstream conserved sequence that resembles an AraC binding site falls outside the region predicted by the ChIP-seq data ; hence , it may bind AraC under other growth conditions , e.g. , in the absence of arabinose . 
+ The end of the downstream putative AraC site is only 21 bp from the annotated gene start for araT , a distance inconsistent with activation of araTU transcription by AraC . 
+ However , the RNA-seq data strongly suggest that the transcription start site is downstream of the annotated gene start for araT . 
+ Gene a | Arabinose b | ΔaraC c 
+ ygeA | 4.1 | 4.3 
+ araJ | 3.8 | 4.1 
+ yjcB | 3.8 | 3.0 
+ ycfR | 2.7 | 2.9 
+ dctA | 2.4 | 3.7 
+ mglC | 2.4 | 3.9 
+ mglA | 3.2 | 4.7 
+ ygbM | 2.5 | 5.3 
+ a Arabinose-responsive genes in S. enterica were defined by 4-fold change in expression ( significant difference ) under growth with or without arabinose in wild-type ( 14028s ) cells and 4-fold change ( significant difference ) between wild-type ( 14028s ) and ΔaraC ( AMD485 ) cells in the presence of arabinose . 
+ Direct regulatory targets of AraC are indicated by boldfaced text . 
+ b Fold change in mRNA level for wild-type cells grown with or without arabinose . 
+ c Fold difference in mRNA level between wild-type and ΔaraC cells grown in the presence of arabinose . 
+ Hence , the translation start site of araT is likely to be incorrectly annotated , and the downstream putative AraC site is likely to be located upstream of position 40 with respect to the araTU transcription start site . 
+ This site position is consistent with transcription activation by AraC using a mechanism similar to that described for E. coli AraC-activated genes . 
+ Conservation of the AraC regulon across the family Enterobacteriaceae . 
+ AraC is highly conserved across the family Enterobacteriaceae , which includes E. coli and S. enterica . 
+ The two helix-turn-helix DNA-binding domains are particularly well conserved , e.g. , they are 100 % identical between E. coli and S. enterica . 
+ Hence , AraC likely binds with similar DNA sequence speciﬁcity across all Enterobacteriaceae species . 
+ To determine whether regulation of AraC target genes is conserved across the family Enterobacteriaceae , we aligned sequence surrounding E. coli and/or S. enterica AraC sites identified in this work with equivalent regions from four other Enterobacteriaceae species ( Citrobacter rodentium , Enterobacter sp. strain 638 , Klebsiella pneumoniae , and Cronobacter sakazakii ; all alignments are shown in Fig. S7 in the supplemental material ) . 
+ S. enterica sseD is not conserved in any of the other species , and E. coli ydeN is only conserved in S. enterica ; hence , these regions were not analyzed . 
+ Conservation of AraC sites was observed for araBAD , araFGH , araE , ytfQ , and araTU ( Fig. 7 ; also see Fig. S6 ) . 
+ No conservation of AraC sites was observed for araJ or dcp . 
+ Conservation was highest for two regions of the AraC binding site : positions 4 to 7 and 13 to 19 . 
+ This is consistent with the information content of the motif derived from the E. coli AraC ChIP-chip data and with the known consensus sequence ( Fig. 1B ) . 
+ DISCUSSION
+ E. coli AraC is one of the best-studied TFs in any bacterial species and was the ﬁrst described transcriptional activator ( 3 , 4 ) . 
+ With the exception of xylA , the last AraC-regulated gene to be identiﬁed was araJ , more than 30 years ago ( 27 ) . 
+ We combined two complementary genomic approaches to expand the known E. coli AraC regulon . 
+ Speciﬁcally , we identiﬁed three novel binding targets of AraC ( upstream of ytfQ and ydeN and within dcp ) and ﬁve novel AraC-regulated genes ( ytfQ , ydeN , ydeM , ygeA , and polB ) . 
+ Strikingly , regulation of four of the five novel target genes is mechanistically distinct from that observed previously for other AraC-regulated genes . 
+ Thus , our data demonstrate the power of integrating ChIP-chip/ChIP-seq and transcription proﬁling as an unbiased and comprehensive approach to identify regulatory networks . 
+ ChIP-chip identiﬁes noncanonical AraC binding sites . 
+ Despite the extensive history of research on E. coli AraC , we identified several novel AraC-bound regions and several novel AraC-regulated genes . 
+ It is perhaps unsurprising that our unbiased , genomic approach identiﬁed AraC sites and AraC-regulated genes that differ functionally from those identiﬁed previously , as this would explain why they were missed in previous studies . 
+ Speciﬁcally , we identiﬁed AraC binding sites that ( i ) repress rather than activate transcription in an arabinose-dependent manner ( ydeN ) , ( ii ) result in little or no observed regulation under standard laboratory growth conditions ( ytfQ and dcp ) , and ( iii ) are located within a gene ( dcp ) . 
+ We also identiﬁed AraC-regulated genes that are transcribed due to read-through of inefﬁcient Rho-independent terminators ( ygeA and polB ) . 
+ Previous ChIP-chip studies in bacteria have identiﬁed many TF binding sites within genes ( 7 ) . 
+ The most striking example in E. coli is RutR , for which 80 % of binding sites are intragenic ( 9 ) . 
+ With the exception of binding sites close to the 5′ end of genes ( 43 ) , very few intragenic TF binding sites have a described function . 
+ We identiﬁed a binding site for AraC inside dcp , a gene that encodes dipeptidyl carboxypeptidase . 
+ Given the lack of conservation of this putative AraC binding site in other Enterobacteriaceae species and the lack of detectable regulation by AraC at this site , we conclude that the site is unlikely to have regulatory function under the tested growth conditions . 
+ We identified an analogous AraC site in S. enterica inside sseD . 
+ We propose that these binding sites have ( i ) regulatory function under a different growth condition , ( ii ) a function unrelated to transcription , or ( iii ) no function . 
+ Novel E. coli AraC binding sites that repress transcription . 
+ We identified two E. coli transcripts that are directly repressed by AraC : ytfQ and ydeNM . 
+ ytfQ encodes a galactose/arabinose transporter ; thus , it has a clear connection to the established function of AraC in regulating arabinose metabolism . 
+ Repression of ytfQ by AraC is weak ( 1.5-fold ; see Fig. S3 in the supplemental material ) , indicating that either AraC has only a minor effect on ytfQ expression or that more substantial regulation by AraC is associated with other growth conditions . 
+ AraC has previously been shown to repress its own transcription by binding to a region overlapping the araC promoter elements ( 32 ) . 
+ This repression occurs independently of the addition of arabinose . 
+ The location of the AraC binding site upstream of ytfQ is too far upstream of the transcription start site to repress transcription by directly occluding RNAP . 
+ We propose that AraC bound at this site interacts with additional regulatory proteins , perhaps another monomer of AraC , bound closer to the transcription start site . 
+ GalR has been shown to regulate ytfQ ( 44 ) ( Fig. S1 in the supplemental material ) . 
+ However , we detected no effect of GalR on regulation of ytfQ by AraC ( data not shown ) . 
+ Unlike AraC-dependent repression of araC and ytfQ , repression of ydeN occurs only in the presence of arabinose ( Table 2 and Fig. 3 ) . 
+ This is consistent with our ChIP data showing binding of AraC upstream of ydeN only in the presence of arabinose ( Fig. 1A ) . 
+ Although arabinose-dependent repression by AraC has not been observed before , there are clear parallels with arabinose-dependent activation of araBAD transcription . 
+ Arabinose binding to AraC alters its DNA binding properties ( 5 ) . 
+ At the araC-araBAD intergenic region , AraC forms a repression loop in the absence of arabinose due to the dimerization of distally bound AraC monomers . 
+ In the presence of arabinose , dimerization occurs at adjacent sites , breaking the repression loop and activating transcription of araBAD ( 6 ) . 
+ This change in DNA binding is due to a rearrangement of the N-terminal arabinose-binding/dimerization domain and the C-terminal DNA-binding domain relative to one another ( 5 ) . 
+ We propose that the DNA binding properties of AraC allow it to bind at ydeN only in the presence of arabinose . 
+ Our reporter fusions indicate that maximal repression by AraC requires sequence between 1 and 15 relative to the transcription start site ( Fig. 3 ) . 
+ This strongly suggests the presence of an AraC binding site overlapping the transcription start site , consistent with a role in transcriptional repression . 
+ We propose that AraC binds as a dimer to adjacent sites overlapping the transcription start site . 
+ Thus , arabinose-dependent repression of ydeNM by AraC would use the same mechanism as arabinose-dependent activation of araBAD . 
+ Read-through of inefficient transcription terminators contributes to the E. coli AraC regulon . 
+ ygeA and polB are positively regulated by AraC and arabinose due to partial read-through of Rho-independent terminators ( Fig. 2 , 4 , and 5 ) . 
+ We analyzed published microarray data from another group that used arabinose to induce overexpression of various proteins unrelated to AraC . 
+ Consistent with our own work , both ygeA and polB were in the top 5 % of all genes when ranked by the level of arabinose induction ( 45 ) . 
+ An equivalent analysis for ydeN showed that it is in the bottom 0.5 % of all genes ( 45 ) . 
+ From the Northern blot ( Fig. 4B ) it is clear that , in the presence of arabinose , the majority of ygeA mRNA is in the form of the read-through transcript , suggesting that read-through is physiologically relevant . 
+ Many predictions have been made for intrinsic terminators in E. coli and other species ( 40 , 46 -- 50 ) . 
+ Sequences downstream of araE and araD have been predicted to form terminators . 
+ This is especially true for the terminator downstream of araE , which has a long , G/C-rich stem-loop followed by a 10-mer sequence with 8 U 's ( Fig. 4 ) . 
+ However , both the araE and araD terminators are only weakly effective . 
+ For the araE terminator this is unlikely to be due to alternative structures influenced by upstream sequence , since a minimal region is insufficient to terminate in the reporter assay we used ( Fig. 5 ) . 
+ Thus , our data suggest that terminator predictions are often inaccurate . 
+ Regulatory functions for AraC beyond arabinose metabolism . 
+ We have identified 7 novel AraC-regulated genes in E. coli and S. enterica . 
+ S. enterica araT and araU encode a likely transport / metabolism system for arabinosides . 
+ This suggests that S. enterica can use arabinosides as a carbon source by metabolizing them to arabinose . 
+ Only one other novel AraC-regulated gene identified in this work , E. coli ytfQ , has a known connection to arabinose metabolism ( 39 ) . 
+ Furthermore , araJ is a long-established member of the AraC regulon but has no known connection to arabinose metabolism ( 51 ) . 
+ It is possible that some or all of the novel AraC-regulated genes have as-yet-unidentified connections to arabinose metabolism , although this seems especially unlikely for polB , which encodes a well-characterized DNA polymerase . 
+ In addition , deletion of ydeN or ydeM did not substantially affect araE expression ( see Fig. S4 in the supplemental material ) , suggesting that AraC and intracellular arabinose levels are unaffected by the absence of these genes . 
+ Regulation of polB by AraC is particularly intriguing given the well-established function of polB in DNA replication and repair ( 52 ) . 
+ A 6-fold increase in polB expression is sufficient to give a detectable increase in the spontaneous mutation rate independent of the SOS response ( 53 ) . 
+ We were not able to detect a significant increase in the spontaneous mutation rate by growth in the presence of arabinose ( data not shown ) , but polB expression increases only 2.6-fold . 
+ While it is likely that an increase in the spontaneous mutation rate would be below our detection threshold , the effect of arabinose on polB expression could contribute to genome variability during long-term growth . 
+ Conservation of the AraC regulon . 
+ The PhoP regulon is by far the best studied with respect to conservation . 
+ Only three genes are consistently regulated by PhoP across the family Enterobacteriaceae ( 13 ) . 
+ In contrast , our data indicate that most members of the AraC regulon are conserved in this family . 
+ This `` core '' regulon is comprised of araBAD , araFGH , araE , ytfQ , and araTU . 
+ Three of these genes , ytfQ , araT , and araU , have not previously been described as AraC targets . 
+ The conservation of regulation of ygeA and polB by transcriptional read-through is more difficult to assess . 
+ araE-ygeA synteny is not well conserved , suggesting that ygeA is not a conserved AraC regulon member . 
+ We did not detect regulation of polB by AraC in S. enterica . 
+ However , there is a two-gene insertion between araD and polB in S. enterica . 
+ In contrast , most other Enterobacteriaceae species maintain the araD-polB synteny . 
+ Hence , polB regulation by AraC may be widely conserved . 
+ Strikingly , one of the conserved regulatory targets of AraC , araTU , is absent from E. coli . 
+ This highlights the risk associated with making inferences on TF regulons if experimental data are only available for one species . 
+ An analysis of AraC regulon conservation based only on E. coli target genes would have missed araTU . 
+ Similarly , an analysis of AraC regulon conservation based only on S. enterica target genes would have missed araFGH . 
+ The importance of using experimental data from multiple species is especially high for TFs that have degenerate binding motifs , such as AraC , since binding sites can not easily be predicted from DNA sequence alone . 
+ Conclusions . 
+ Our unbiased mapping of the AraC regulons of E. coli and S. enterica has revealed new functions and new mechanisms of action for this storied regulator . 
+ Our data suggest that AraC regulates functions beyond arabinose metabolism . 
+ Furthermore , unlike the PhoP regulon , most AraC regulatory targets are conserved across related species , although conservation is limited to genes required for the transport and metabolism of arabinose . 
+ Our work highlights the importance of genome-scale approaches in the study of bacterial gene expression . 
+ ACKNOWLEDGMENTS
+ We thank David Grainger , members of the Wade laboratory , Robert Schleif , and members of Keith Derbyshire and Todd Gray 's group for helpful discussions . 
+ We thank David Grainger , Todd Gray , Keith Derbyshire , and Rick Wolf for comments on the manuscript . 
+ We thank Chunhong Mao for assistance with RNA-seq analysis . 
+ We thank the Wadsworth Center Bioinformatics Core , the Wadsworth Center Applied Genomic Technologies Core , and the University at Buffalo Next Generation Sequencing Core Facility for technical assistance . 
+ This work was supported by National Institutes of Health ( NIH ) grant 1DP2OD007188 and Wadsworth Center start-up funds ( J.W. ) , U.S. National Science Foundation grant MCB-1158056 ( I.E. ) , and appointments ( C.B. and B.P. ) to the Emerging Infectious Diseases ( EID ) Fellowship Program administered by the Association of Public Health Laboratories ( APHL ) and funded by the Centers for Disease Control and Prevention ( CDC ) .
\ No newline at end of file
--- a/data/TEXT_FILES/useful_txt/24699140.txt 0 → 100644
View file @27818a9
+++ b/data/TEXT_FILES/useful_txt/24699140.txt 0 → 100644
View file @27818a9
+ Determining the Control Circuitry of Redox Metabolism
+ Abstract
+ It has long been known that facultative anaerobes will hierarchically utilize external electron acceptors relative to the free energy change provided by each [ 1,2 ] . 
+ Oxygen exists at the top of the hierarchy , electron acceptors like NO3 in the middle , and lactate or acetate or other fermentation products are at the bottom [ 3 -- 5 ] . 
+ Many detailed studies have determined that the transcription factors ( TFs ) ArcA and Fnr are the key players in managing this hierarchy through the activation or repression of the electron transport chain ( ETC ) machinery specific to an available electron acceptor [ 6 -- 11 ] . 
+ It is also largely understood how ArcA senses redox via the flow of reducing equivalents through the ETC , and how Fnr directly senses levels of dissolved O2 [ 1,12,13 ] and glutathione [ 14,15 ] . 
+ However , it is not clear how these two TFs work together and , more importantly , why they regulate hundreds of gene products that lie outside of the ETC and energy metabolism [ 3,5 ] . 
+ Even though many biochemical details of redox regulation have been elucidated [ 6,8,16 ] , systems level principles for the global regulatory response throughout the anaerobic shift remain elusive . 
+ An important missing piece is a clear framework , or design principle , that elucidates how hundreds of transcriptionally regulated gene products are coordinately regulated to produce the necessary quantitative shifts in metabolic flux states . 
+ On the purely metabolic side , certain design principles have emerged through the analysis of stoichiometric models that identified growth and energy generation as the two principal dimensions of metabolic network function [ 17 -- 19 ] . 
+ It was further shown that linear combinations of these two dimensions could account for observed flux patterns throughout nutrient limitations and the anaerobic shift [ 18,20 ] . 
+ A question now becomes , what are the corresponding global TFs and how do they coordinately regulate all the gene products which enable the metabolic flux map to shift from one optimal state to another ? 
+ Here we show how the global TFs ArcA and Fnr coordinately regulate the primary metabolic dimensions of growth and energy generation . 
+ We integrated polyomic data sets and used genome-scale metabolic models to enable a mechanistic understanding of hundreds of simultaneous and individual regulatory events . 
+ This analysis subsequently provides a link between global regulatory circuits and global optimality in microbial metabolism . 
+ Results
+ Genome-scale identification of TF regulatory events
+ We first identified individual TF regulatory events at the genome-scale . 
+ Side-by-side measurements of RNA transcript abundance and TF binding were carried out to determine the structure and causality in E. coli 's transcriptional regulatory network ( TRN ) . 
+ ChIP-chip assays for ArcA and Fnr were performed under both fermentative and nitrate respiratory conditions ( Figure 1A ) . 
+ Gene expression measurements were then used to determine causality of activation or repression for each ArcA or Fnr binding site under these same two conditions ( as detailed in the later heatmap figure legend , Figure S1 ) . 
+ We found 102 and 86 ( and 143 and 132 ) binding regions and 58 and 54 ( and 95 and 55 ) causal regulatory events for ArcA and Fnr under fermentation ( and nitrate respiration ) conditions , respectively ( Figure 1A , Tables S1 , S2 , S3 , S4 ) . 
+ We then compiled the set of genomic sequences underlying these binding regions for each of the TFs and used the MEME program [ 21 ] to recover previously identified binding motifs [ 22,23 ] ( Figure 1B , Tables S5 , S6 ) . 
+ We confirmed 180 of 216 ( 83 % ) previously known regulatory events [ 24 ] and discovered 132 new binding regions relative to RegulonDB ( Figure 1A ) , representing an increase of 74 % over current knowledge of the regulatory functions of these two TFs . 
+ We further performed a detailed comparison of our results to recently published works [ 16,25 ] to determine a 78 % overlap in ArcA binding sites and a 50 % overlap in Fnr binding sites under fermentative conditions ( Figures S5 , S6 , S7 ) . 
+ In addition , we report 88 novel binding sites for ArcA and 52 novel binding sites for Fnr under nitrate respiratory conditions highlighting plasticity of the network throughout shifting external electron acceptors . 
+ We then integrated transcription start sites ( TSS ) [ 26 ] with TF binding regions to identify promoter architectures [ 27 ] . 
+ The location of TF binding motifs within experimentally determined binding regions were used to prepare histograms of the frequency of TF binding relative to the TSS ( Figure 1B ) . 
+ This analysis showed that ArcA spans the TSS or -35 box region and represses transcription while Fnr spans the -41.5 or alpha carboxy terminal domain and activates transcription [ 27 ] . 
+ While each of these regulatory strategies has been shown previously , here we show that each strategy is ubiquitous at the genome-scale . 
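+ The histogram of TF binding relative to the TSS described above can be sketched as follows . This is a minimal illustration under stated assumptions , not the authors ' pipeline : the record layout ( motif centre , TSS coordinate , strand ) , the 10 bp bin width , and the example coordinates are all hypothetical .

```python
from collections import Counter


def relative_position(motif_center, tss, strand):
    """Signed distance of a motif centre from its TSS.

    Negative values are upstream of the TSS, taking the strand of the
    transcription unit into account (assumed convention).
    """
    offset = motif_center - tss
    return offset if strand == "+" else -offset


def histogram(positions, bin_width=10):
    """Count positions per bin; each bin is labelled by its lower edge."""
    counts = Counter()
    for p in positions:
        counts[(p // bin_width) * bin_width] += 1
    return dict(sorted(counts.items()))


# Hypothetical (motif_center, tss, strand) records, for illustration only.
sites = [(965, 1000, "+"), (958, 1000, "+"), (5042, 5000, "-"), (1002, 1000, "+")]
rel = [relative_position(m, t, s) for m, t, s in sites]
hist = histogram(rel)
```

+ With real data , a repressor that overlaps the TSS or the -35 box and an activator centred near -41.5 would show up as peaks in different bins of such a histogram .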
+ Discovery of transcription factor mediated bidirectional transcription 
+ Novel cases of divergent transcriptional regulation were found in these data . 
+ The integration of binding regions with gene expression data revealed 42 regions where two divergent transcriptional units ( TUs ) were simultaneously regulated by a single binding event . 
+ Divergent transcriptional regulation has been observed previously [ 28 ] and is known to be mediated by transcription factors in certain cases . 
+ However , systematic regulation by global TFs has only been observed in limited cases [ 29 ] . 
+ We observe a total of 19 inverse , 16 dual activation , and 13 dual repression events for a total of 48 events spread across the 42 regions as some recur under different experimental conditions . 
+ Two examples ( Figure 1C ) highlight this ` hard coupling ' of the transcriptional regulation of seemingly unrelated but contextually dependent pathways . 
+ The acs-nrfABCDE system represents a lowest common denominator coupling between acetyl-coA synthetase ( acs ) acetate scavenging to acetyl-coA and usage of acetyl-coA via the TCA cycle and nrfABCDE nitrite reductase . 
+ Similarly the aroP-pdhR system couples the transport of aromatic amino acids to the regulation of pyruvate that acts as their principal precursor molecule . 
+ The link between the acs and nrfABCD systems has been inferred/suggested in previous work which attempted to understand how E. coli could survive on acetate as a sole carbon source under anaerobic conditions [ 30 ] . 
+ In particular , E. coli can not utilize acetate under fully anaerobic conditions because acetate must be scavenged into acetyl-coA via acs and then utilized by the TCA cycle . 
+ Anaerobically the TCA cycle can not be used unless there is an electron acceptor in the ETC to enable oxidative phosphorylation . 
+ Thus , some usage of the TCA cycle via an alternative electron acceptor such as nitrite or nitrate is necessary for E. coli to utilize acetate and acetyl-coA anaerobically . 
+ This metabolic feature is physiologically crucial in the gut environment , which is rich in fatty acids that can not be used if E. coli does not utilize alternative electron acceptors like nitrite . 
+ Hence , the direct coupling of acs and nrfABCD through bidirectional transcriptional regulation is consistent with the necessity of a flux through the nrfABCD system in order for the acetyl-coA formed by acs to be utilized . 
+ The transcriptional coupling acts as a bidirectional gate controlled by ArcA and the redox state of the cell to coordinate this evolutionarily crucial metabolic capability . 
+ Similarly the aroP-pdhR system couples the transport of aromatic amino acids to the regulation of pyruvate that acts as their principal precursor molecule through the action of Fnr . 
+ To understand the network level connection between the aromatic amino acid transporter ( aroP ) and the pyruvate dehydrogenase repressor TF ( pdhR ) one can examine Figure 2 , which shows the connection between catabolic biomass precursors and biosynthetic pathways . 
+ Tyrosine and tryptophan are both made directly from PEP that is rapidly dephosphorylated into pyruvate . 
+ The corresponding activation of aroP and repression of pdhR is consistent with an increased need for amino acid transport when the precursors for biosynthesis ( PEP ) are critical to maintain cellular energy levels . 
+ This characteristic is supported by a dampening of the switch upon the transition to nitrate respiration , resulting in decreased transporter expression when less pyruvate is needed for fermentation and can thus be shuttled to amino acid biosynthesis . 
+ In general , pdhR acts as a classic repressor that `` pops off '' of its binding site in the presence of pyruvate and hence allows expression of pyruvate dehydrogenase and other oxidative enzymes . 
+ Anaerobically pyruvate dehydrogenase ( aceEF-lpd ) is repressed regardless of pdhR by ArcA and Fnr and given that there is also a higher concentration of pyruvate it would presumably not be active . 
+ Thus , while this switch is highlighted anaerobically in that full repression of pdhR is concomitant with aroP activation its physiological significance is more prevalent under nitrate or even fully aerobic conditions in which it can function to directly couple and balance the catabolic and anabolic demands around pyruvate which acts as a critical second messenger in the aerobic-anaerobic shift [ 6 ] . 
+ It is very insightful to view such a switch as it is ramped fully up under anaerobic conditions and then turned down under nitrate respiration to maintain a physiologically crucial metabolic 
+ Previous work has identified biomass production and energy production as the two principal dimensions characterizing the overall function of metabolic networks [ 17 -- 19 ] . 
+ This duality in function is conceptually equivalent to considering heterotrophic metabolism as the standard combustion equation ( Figure 2 ) in which an electron donor ( glucose ) is broken apart with an electron acceptor ( oxygen , nitrate , etc. ) to form biomass , energy , waste and heat . 
+ Here we use the terms catabolism to describe oxidation of the electron donor , anabolism to describe biomass formation , and chemiosmosis to describe energy generation . 
+ The genes in each of these categories were determined by a manual curation of the E. coli metabolic model [ 31 ] and associated literature sources [ 4,26 ] . 
+ Catabolic genes correspond to nutrient transporters , recycling machinery , and central catabolic machinery . 
+ Anabolic genes correspond to biosynthetic and macromolecular synthesis pathways . 
+ Chemiosmotic genes correspond to the electron transport chain ( ETC ) , fermentation pathways , and ion pumps ( Figure 3 ) . 
+ From the data sets described above , the regulation of these three classes of genes by ArcA and Fnr can be analyzed using their metabolic functions as context . 
+ ArcA and Fnr directly regulate a total of 127 catabolic genes including 49 transporter genes , 38 recycling or secondary catabolic enzymes , 33 central metabolic genes , and 7 associated TFs ( Figures 2,3 ) . 
+ In particular , recovery of all of the classic targets of ArcA and Fnr is complemented by the simultaneous discovery of transporter genes and recycling enzymes like peptidases and proteases ( Figure 3 ) . 
+ It can also be recognized that there existed many classically unknown glycolytic targets along with generally unrecognized activation of the glucose transporter ptsG . 
+ Activation of ptsG by Fnr is consistent with the fact that cells nearly double their uptake of carbon during fermentative growth compared with aerobic growth . 
+ In anabolism , ArcA and Fnr directly regulate 54 genes including 34 metabolite synthesis genes , 14 macromolecular synthesis genes , and 6 TFs . 
+ Broad trends of nucleotide biosynthesis activation and amino acid biosynthetic activation of nucleotide precursors are consistent with redox related demands . 
+ However , perhaps the most important of these findings is the regulation of both transhydrogenases ( sthA , pntAB ) in E. coli . 
+ Previous work has shown that a large portion of the NADPH used for biosynthetic reactions comes from the membrane bound transhydrogenase PntAB [ 32 ] and that the soluble SthA is used for re-oxidation of NADPH under aerobic growth with excess glucose . 
+ Our data show that ArcA activates pntAB and represses sthA in a redox-dependent fashion consistent with an increased need for NADPH under nitrate respiration relative to fermentation ( Figure 3 ) . 
+ This regulatory shuttling of reduction equivalents thus plays a critical role in maintaining the balance between growth and energy generation by increasing growth only once energy demands are satisfied . 
+ In the chemiosmotic category we observe regulation of 120 genes including 83 genes of the ETC , 6 for fermentation , 21 for ion pumps , 2 for motility , and 8 TFs . 
+ Nearly all of the regulation can be shown to coincide with redox related demands including regulation of ion pumps which coincides with an increased need to maintain a positive electrical gradient across the inner membrane to make up for the diminished proton gradient . 
+ We also observed strong regulation of the flhDC , gadW , and appY transcription factors . 
+ The flhDC system is a master regulator for the motility and flagellum apparatus of the cell that feeds off the chemiosmotic gradient in search of nutrients . 
+ appY and gadW are key regulators of cytochromes and acidic tolerance , respectively . 
+ After including regulation through appY we can conclude that ArcA and Fnr exhibit control either directly or indirectly over 15 out of the 16 known dehydrogenase and oxidoreductase reactions in E. coli [ 4 ] ( Figures 2,3 ) . 
+ High-level architecture of the metabolic-regulatory network
+ Enumerating regulatory events is informative , but how do they all together form a coherent regulatory logic that produces meaningful physiological states ? 
+ Network analysis of these regulatory interactions reveals a qualitative feedforward and feedback flow-based model of the primary metabolic dimensions ( Figure 4A ) . 
+ The model input is the total set of catabolites ( glucose or electron donor ) available to the cell that are oxidized based on the availability of an electron acceptor into a ratio of reduced to oxidized components . 
+ These components ( primarily NADH/NAD and NADPH/NADP ) are then used by the anabolic machinery to generate biomass , or by the chemiosmotic machinery to generate energy as outputs . 
+ The ratio of reduced-to-oxidized components is sensed by ArcA and Fnr [ 1 ] , and they can feedback and feedforward regulate the catabolic , anabolic , and chemiosmotic processes in a coordinated fashion to maintain the ratio . 
+ Consistent with this schema , it has been shown that TFs are ideal flux sensors [ 33 ] . 
+ Analyzing the regulatory events within the context of the qualitative flow-based model reveals a feedforward with feedback-trim architecture of the overall regulatory logic . 
+ Counting the number of genes that are activated or repressed ( Figure 3 ) provides a measure of the extent of feedforward or feedback regulation exerted ( Figure 4B ) . 
+ Under fermentation ArcA represses 70 catabolic genes and Fnr activates 75 chemiosmotic genes . 
+ Under nitrate respiration ArcA represses 73 catabolic genes and Fnr activates 61 chemiosmotic output genes . 
+ A similar trend is observed for regulation of the anabolic circuitry in which Fnr activates 14 and 11 genes under fermentation and nitrate respiration . 
+ This circuitry is consistent with fast sensing of oxygen by Fnr and slow but continuous sensing of redox flow through the ETC by ArcA [ 34 ] . 
+ The regulatory architecture revealed by this qualitative model is comprehensive and novel , but primarily topological . 
+ To more quantitatively assess the functions of the observed transcriptional regulatory architecture on the metabolic network that it regulates we sampled all allowable network flux states of a highly curated genome-scale metabolic model of E. coli metabolism [ 31 ] under both fermentative and nitrate respiratory conditions . 
+ This sampling of allowable flux states of the metabolic network was then integrated with the experimentally determined regulatory architecture to discern the amount of total flux ( sum of flux loads across all reactions ) regulated by ArcA and Fnr under each of the conditions studied . 
+ This calculation revealed that 60 % and 57 % ( and 88 % and 80 % ) of all metabolic flux is directly ( and indirectly ) controlled by ArcA and Fnr under fermentative and nitrate respiratory conditions respectively ( Tables S7 , S8 ) . 
+ We further show that 69 % and 62 % of the catabolic fluxes producing each of the redox molecules and biomass precursors along with 71 % and 69 % of the downstream anabolic and chemiosmotic fluxes are directly regulated under fermentative and nitrate respiratory conditions respectively ( Figure S3 , Table S9 , S10 ) . 
+ From a gene level we find that 246 genes are differentially expressed ( FDR < 0.05 , fold change > 2 ) between fermentative and nitrate respiratory conditions and that 236/246 or ~96 % of the genes are directly ( 73 ) or indirectly ( 163 ) regulated by ArcA or Fnr ( Table S12 ) . 
+ Taken together , these measurements quantify the global metabolic regulation of flux by ArcA and Fnr and provide further evidence towards the proposed feedforward with feedback-trim regulatory architecture . 
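+ The share of total flux under regulatory control , as described above , can be illustrated with a short sketch . It assumes that the " flux load " of a reaction is the mean absolute flux across the sampled states , which the text does not specify ; the function name , reaction identifiers , and numbers are hypothetical .

```python
def regulated_flux_fraction(flux_samples, regulated):
    """Fraction of total flux load carried by regulated reactions.

    flux_samples: dict mapping reaction id -> list of sampled flux values.
    regulated:    set of reaction ids regulated by the TFs of interest.
    Assumption: flux load = mean absolute flux over all samples.
    """
    mean_load = {
        rxn: sum(abs(v) for v in values) / len(values)
        for rxn, values in flux_samples.items()
    }
    total = sum(mean_load.values())
    if total == 0:
        return 0.0
    regulated_load = sum(load for rxn, load in mean_load.items() if rxn in regulated)
    return regulated_load / total


# Hypothetical sampled fluxes for three reactions (arbitrary units).
samples = {"R1": [4.0, 6.0], "R2": [-3.0, -1.0], "R3": [3.0, 3.0]}
fraction = regulated_flux_fraction(samples, {"R1", "R2"})
```

+ In this toy case the regulated reactions carry 7 of 10 units of flux load ; the 60 % and 57 % figures reported above would be the analogous quantity computed over the full genome-scale model .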
+ To provide more validation for the feedforward with feedback-trim architecture at the genome-scale we first assessed the set of 91 reactions that significantly differed ( flux cutoff of 0.25 mmol gDW-1 h-1 ) between fermentation and nitrate respiration ; gDW denotes grams dry weight . 
+ We were then able to show that 89 of the 91 reactions were regulated directly ( 40 reactions ) or indirectly ( 49 reactions ) by ArcA or by Fnr ( Table S11 ) . 
+ We then calculated the change in flux for each of these 89 reactions between the two conditions along with the change in regulatory strength for the genes encoding these 89 reactions across the same conditions ( Table S11 ) . 
+ We plotted the change in flux versus the change in regulation ( Figure 5A ) and calculated an r2 correlation value of 0.71 ( p < 1e-6 ) for the directly regulated genes . 
+ This correlation provides quantitative evidence for the logic of the regulatory circuit in the transition from fermentation to nitrate respiration . 
+ The linear positive slope shows not only that the reactions responsible for the redox shift are regulated , but also that these reactions are quantitatively regulated to help minimize the redox ratio in concert with the quantitative model predictions . 
+ Most of the ArcA regulated reactions are de-repressed , as indicated by the lightening shade of blue under nitrate respiration ( Figure 5B ) . 
+ Most of the Fnr regulated reactions are de-activated as highlighted by the lightening shade of yellow under nitrate respiration ( Figure 5B ) . 
+ The broad repression of crucial catabolic genes by ArcA and activation of chemiosmotic genes by Fnr is also shown through analysis of C-13 MFA data generated between wild type and Δfnr or ΔarcA strains ( Figure S8 ) . 
+ This trend of redox ratio minimization was so strong that the only outliers resulted in identification of new biology in the form of transport-coupled redox balancing for allosterically regulated amino acid biosynthetic reactions ( Figure S4 , Text S1 ) . 
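+ The comparison of change in flux with change in regulation described above can be illustrated with a Pearson correlation . This is only a sketch : the text does not state which correlation statistic or software was used , and the function name and data below are hypothetical .

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical per-reaction changes between two conditions.
delta_flux = [1.0, 2.0, 3.0, 4.0]
delta_regulation = [2.0, 4.0, 6.0, 8.0]
r = pearson_r(delta_flux, delta_regulation)
```

+ A positive value indicates that reactions whose regulation changes most between the two conditions also tend to show the largest change in flux .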
+ We then sought to show that this quantitative regulatory model was truly redox dependent and not just fermentative/nitrate respiration specific . 
+ We thus took C-13 measured flux data [ 35 ] for E. coli grown aerobically in batch under either fully respiratory galactose conditions or partially fermentative glucose conditions . 
+ Even though both conditions are aerobic , we hypothesized that a similar shift in the redox ratio as observed between fully fermentative and nitrate respiration would occur given the comparison between a partially fermentative and fully respiratory condition . 
+ We made the same plot ( Figure 5C ) as in Figure 5A and even used regulatory strengths taken from the fermentative/nitrate shift . 
+ Only 16 flux measurements could be mapped of which only 9 showed any difference between glucose and galactose conditions . 
+ Of those 9 fluxes we were able to see a clear correlation for 7 and an overall weak but significant r2 correlation value of 0.26 ( p = 0.079 ) . 
+ This plot again shows genes regulated by ArcA being de-repressed and genes regulated by Fnr being de-activated upon the shift to more oxidative conditions ( Figure 5D ) . 
+ Hierarchy of the joint metabolic-regulatory network
+ An expansion of the top-level of the flow-based model contextualizes the function of the hundreds of individual gene products and provides a window into the structure of the full metabolic-regulatory network ( Figure 6A ) . 
+ Each different type of catabolite ( Figure 3 , Figure 4A , Figure 6A ) is maintained via production fluxes ( transport or recycling ) and consumption fluxes ( secondary catabolism or central catabolism ) . 
+ The catabolism specific production set consists of genes for amino acid , carbohydrate , lipid , and nucleic acid transport and recycling . 
+ The same expansion can be performed for anabolism and chemiosmosis . 
+ For anabolism , the total biomass is a result of the sum of the rate of metabolite biosynthesis plus the rate of macromolecular synthesis [ 36 ] minus the rate of dilution and recycling . 
+ For chemiosmosis , the total gradient is a sum of protons pumped across the inner membrane via the ETC , proton equivalents pumped across the inner membrane via fermentation , and ions translocated across the inner membrane minus the usage of the gradient for ATP production , nutrient transport , and motility [ 37 ] . 
+ This expansion also accounts for the classically observed hierarchy [ 38 ] of the TRN via sensing of lower level metabolites and subsequent regulatory control of the TFs themselves or of the production or consumption pathways for sensed metabolites ( Figure 6B ) . 
+ A full tracing of the TRN to explain the effects of the global TF deletion is consistent with 69 % of observed differential expression ( Figure S2 ) . 
+ Discussion
+ This work presents a systems level and genome-scale mechanism for the coordinate action of global transcription factors throughout an electron acceptor shift . 
+ Our mechanism accounts for the previously unexplained genes regulated by ArcA and Fnr , it predicts changes in flux patterns , and perhaps most importantly shows that the classically observed hierarchy of transcriptional regulation mirrors the hierarchy of dimensions in the metabolic network . 
+ By basing our work off of the extensive body of detailed biological literature and the more recent work of principal dimensionality in metabolic networks we are able to present a systematic and remarkably consistent genome-scale mechanism . 
+ At the local level , we first greatly expanded the number of cases of promoter architectures [ 39 ] . 
+ This validates and highlights the importance of understanding initiation mechanisms , as they may be extendable to a systems level in future development of computational models . 
+ We were then able to make the novel discovery that 42 regions across the genome contained divergently transcribed TUs controlled by a single global TF binding region . 
+ We recognize that , due to ChIP-chip resolution , it is possible ( and even likely ) that multiple binding sites exist under the larger ChIP peak ; however , the local proximity still affords the same hard coupling within the regulon . 
+ This hard coupling suggests switch like mechanisms in which sets of seemingly unrelated genes are jointly regulated to obey non-obvious systems level constraints . 
+ We identify two such cases of this in the acs-nrfABCDE operon and the aroP-pdhR operon . 
+ To understand systems level mechanisms of transcriptional regulation we turned to previous work that showed the principal dimensions of a metabolic space were biomass and energy generation . 
+ We hypothesized that global regulators must play a role in regulating globally decisive metabolic dimensionality . 
+ This hypothesis is supported by broad regulation across all of these main categories and the abilities of ArcA and Fnr to sense the molecules that govern the branch point between the two dimensions . 
+ Although we were able to make an unbiased characterization of the genes in each of the categories using the iJO1366 model we were still unsatisfied with such a coarse grained approach and sought to understand the composition of each of the categories . 
+ This led us to a hierarchical expansion and classification of pathways around key metabolic intermediates . 
+ Going on in this fashion led us to realize that the global transcriptional regulatory hierarchy plays out not only on the level of TF-TF regulation , but perhaps more importantly at the level of global TFs regulating the production or consumption fluxes of lower level metabolites which are correspondingly sensed by other intermediate regulators . 
+ In essence , the regulatory network is shaped by the underlying metabolite pools and vice versa . 
+ After determining the broad circuitry of the metabolic-regulatory network we mapped our data onto it and discovered that a strong feedforward with feedback trim architecture dominates at the genome scale . 
+ This occurs via ArcA 's strong repression of input catabolic circuits coupled with Fnr 's strong activation of downstream chemiosmotic and anabolic circuitry . 
+ This circuit is corroborated by Fnr 's ability to sense oxygen [ 13 ] which will diffuse quickly whereas ArcA will more continuously sense the flow of reducing equivalents through the ETC by sensing of the ratio of reduced to oxidized quinones [ 12 ] . 
+ This pattern of a fast component feeding forward for downstream `` planning '' coupled with a slower but continuous feedback sensor is a common pattern in basic process control schemes [ 40 ] . 
+ If coupled with other common process control patterns such as hierarchical and PID control one can envision a process control based model 
+ This work presents a formal integration and reconstruction of over 50 years of research on E. coli metabolism and its transcriptional regulation . 
+ The result is a detailed and coherent hierarchical view of the regulation of the principal dimensions of metabolism through a critical environmental shift . 
+ We find that the mathematical notions of optimality in metabolic functions are in line with our observations of global regulation . 
+ TRNs are not just TF-gene networks but rather TF-gene-enzyme-reaction flux networks , that are tightly integrated as levels or ratios of metabolites can drive TF activity [ 41,42 ] . 
+ The full elucidation of an electron acceptor response in the important model organism , E. coli , may have implications for similar metabolic responses in other organisms . 
+ For cancer , recent focus has shifted towards an understanding of the metabolic drivers and Warburg effect , where the hypoxia inducible factor ( HIF ) [ 43 ] senses the redox ratio and feedforward or feedback regulates genes producing or consuming reduction potential . 
+ Taken together , we are able to show how the two principal dimensions of metabolism are controlled in a shifting environment by global TFs through the use of polyomic data sets and genome-scale metabolic models . 
+ This study is likely to be useful as a guide for similar studies in other organisms where the same tools for experimentation and analysis are available . 
+ Methods
+ Bacterial strains and growth conditions
+ All strains used in this study were E. coli K-12 MG1655 and its derivatives . 
+ The E. coli strains harboring Fnr-8myc and ArcA-8myc were generated as described previously [ 44 ] . 
+ The deletion mutants ( Δfnr and ΔarcA ) were constructed by a λ Red and FLP-mediated site-specific recombination method . 
+ Glycerol stocks of E. coli strains were inoculated into M9 minimal medium containing 0.2 % ( w/v ) carbon source ( glucose ) and 0.1 % ( w/v ) nitrogen source ( NH4Cl ) , and cultured overnight at 37 °C with constant agitation . 
+ The cultures were diluted 1:100 into fresh minimal medium and then cultured at 37 °C to an appropriate cell density with constant agitation . 
+ For the anaerobic cultures , the minimal medium was flushed with nitrogen and then continuously monitored using a polarographic-dissolved oxygen probe ( Cole-Parmer Instruments ) to ensure anaerobicity . 
+ For nitrate respiration , 20 mmol potassium nitrate was added . 
+ ChIP-chip
+ To identify Fnr and ArcA binding regions in vivo , we used the ChIP-chip approach as described previously [ 44,45 ] . 
+ Briefly , cells at an appropriate cell density were cross-linked by 1 % formaldehyde at ,20 °C for 25 min . 
+ Following the quenching of the unused formaldehyde with a final concentration of 125 mM glycine at ,20 °C for 5 min , the cross-linked cells were harvested and washed three times with 50 ml of ice-cold Tris-buffered saline . 
+ The washed cells were resuspended in 0.5 ml lysis buffer composed of 50 mM Tris-HCl ( pH 7.5 ) , 100 mM NaCl , 1 mM EDTA , 1 mg/ml RNaseA , protease inhibitor cocktail ( Sigma ) and 1 kU Ready-Lyse lysozyme ( Epicentre ) . 
+ The cells were incubated at 37 °C for 30 min and then treated with 0.5 ml of 2 × IP buffer with the protease inhibitor cocktail . 
+ The lysate was then sonicated four times for 20 s each in an ice bath to fragment the chromatin complexes using a Misonix sonicator 3000 ( output level , 2.5 ) . 
+ The range of the DNA size resulting from the sonication procedure was 300 -- 1,000 base pairs ( bp ) . 
+ Antibodies that specifically recognize the myc tag ( 9E10 , Santa Cruz Biotech ) were used to immunoprecipitate each chromatin complex . 
+ For the control ( mock-IP ) , 2 µg of normal mouse IgG ( Upstate ) was added into the cell extract . 
+ The remaining ChIP-chip procedures were performed as described previously [ 44,45 ] . 
+ The high-density oligonucleotide tiling arrays used to perform ChIP-chip analysis consisted of 371,034 oligonucleotide probes spaced 25 bp apart ( 25 bp overlap between two probes ) across the E. coli genome ( Roche NimbleGen ) . 
+ After hybridization and washing steps , the arrays were scanned on an Axon GenePix 4000B scanner and features were extracted as a pair format by using NimbleScan 2.4 software ( Roche NimbleGen ) . 
+ To monitor the enrichment of promoter regions , 1 µL of immunoprecipitated DNA was used to carry out gene-specific qPCR . 
+ The quantitative real-time PCR of each sample was performed in triplicate using iCycler ( Bio-Rad Laboratories ) and SYBR green mix ( Qiagen ) . 
+ The real-time qPCR conditions were as follows : 25 µL SYBR mix ( Qiagen ) , 1 µL of each primer ( 10 pM ) , 1 µL of immunoprecipitated or mock-immunoprecipitated DNA and 22 µL of ddH2O . 
+ All real-time qPCR reactions were done in triplicates . 
+ The samples were cycled to 94 °C for 15 s , 52 °C for 30 s and 72 °C for 30 s ( total 40 cycles ) on a LightCycler ( Bio-Rad ) . 
+ The threshold cycle values were calculated automatically by the iCycler iQ optical system software ( Bio-Rad Laboratories ) . 
+ Any primer sequences used were described previously [ 44 ] . 
+ Transcriptome analysis
+ Samples for transcriptome analysis were taken from exponentially growing cells . 
+ From the cells treated by RNAprotect Bacteria Reagent ( Qiagen ) , total RNA samples were isolated using RNeasy columns ( Qiagen ) in accordance with the manufacturer 's instructions . 
+ Total RNA yields were measured using a spectrophotometer ( A260 ) , and quality was checked by visualization on agarose gels and by measuring the sample A260/A280 ratio ( > 1.8 ) . 
+ Affymetrix GeneChip E. coli Genome 2.0 arrays were used for genome-scale transcriptional analyses . 
+ cDNA synthesis , fragmentation , end-terminus biotin labeling , and array hybridization were performed as recommended by Affymetrix standard protocol . 
+ Raw CEL files were analyzed using robust multi-array average for normalization and calculation of probe intensities . 
+ The processed probe signals derived from each microarray were averaged for both the wild 
+ ChIP-chip and expression data analysis . 
+ To identify TF-binding regions , we used the peak finding algorithm built into the NimbleScan software . 
+ Processing of ChIP-chip data was performed in three steps : normalization , IP/mock-IP ratio computation ( log base 2 ) , and enriched region identification . 
+ The log2 ratios of each spot in the microarray were calculated from the raw signals obtained from both Cy5 and Cy3 channels , and then the values were scaled by Tukey bi-weight mean . 
+ The log2 ratio of Cy5 ( IP DNA ) to Cy3 ( mock-IP DNA ) for each point was calculated from the scanned signals . 
+ Then , the bi-weight mean of this log2 ratio was subtracted from each point . 
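+ The normalization steps above can be sketched in a few lines of Python . This is an illustrative sketch , not the NimbleScan implementation : the function names are hypothetical , and the one-step biweight form with tuning constants c = 5 and eps = 1e-4 is an assumption that the text does not specify . 

```python
import math
from statistics import median


def tukey_biweight_mean(values, c=5.0, eps=1e-4):
    """One-step Tukey biweight location estimate.

    c and eps are assumed tuning constants; the source does not state them.
    """
    m = median(values)
    mad = median(abs(v - m) for v in values)  # median absolute deviation
    num = 0.0
    den = 0.0
    for v in values:
        u = (v - m) / (c * mad + eps)
        if abs(u) < 1.0:  # points far from the median get zero weight
            w = (1.0 - u * u) ** 2
            num += w * v
            den += w
    return num / den


def centered_log2_ratios(ip_signal, mock_signal):
    """log2(IP / mock-IP) per probe, centered on the biweight mean."""
    ratios = [math.log2(ip / mock) for ip, mock in zip(ip_signal, mock_signal)]
    center = tukey_biweight_mean(ratios)
    return [r - center for r in ratios]
```

+ Centering on a robust mean rather than the arithmetic mean keeps a handful of strongly enriched probes from shifting the baseline of the whole array . 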
+ Each log ratio dataset from duplicate samples was used to identify TF-binding regions using the software ( width of sliding window = 300 bp ) . 
+ Our approach to identify the TF-binding regions was to first determine binding locations from each data set and then combine the binding locations from at least five of six datasets to define a binding region using the MetaScope software ( http://sbrg.ucsd.edu/Downloads/MetaScope ) . 
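+ The five-of-six rule can be illustrated with a small interval-merging sketch . This is not the MetaScope algorithm , whose internals are not described here ; the function name , the representation of peaks as ( start , end ) tuples , and the merge-on-overlap criterion are all assumptions made for illustration . 

```python
def consensus_regions(datasets, min_support=5):
    """Merge overlapping peaks across datasets and keep well-supported regions.

    datasets: one list of (start, end) peaks per replicate dataset.
    A merged region is kept if peaks from at least `min_support`
    distinct datasets contributed to it.
    """
    tagged = sorted(
        (start, end, k)
        for k, peaks in enumerate(datasets)
        for start, end in peaks
    )
    regions = []
    cur_start = cur_end = None
    support = set()
    for start, end, k in tagged:
        if cur_end is not None and start <= cur_end:
            # Peak overlaps the current merged region: extend it.
            cur_end = max(cur_end, end)
            support.add(k)
        else:
            if cur_end is not None and len(support) >= min_support:
                regions.append((cur_start, cur_end))
            cur_start, cur_end, support = start, end, {k}
    if cur_end is not None and len(support) >= min_support:
        regions.append((cur_start, cur_end))
    return regions
```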
+ Raw gene expression CEL files were normalized using background corrected robust multi-array average implemented in the R affy package . 
+ To detect differential expression between the wild type and TF deletion strains we applied a two-tailed unpaired Student 's t-test between the experimental triplicates for the wild type and gene deletion strains . 
+ This was followed by a false discovery rate adjustment . 
+ Before performing the FDR correction we removed all genes that exhibited an expression level below the background across all experiments . 
+ The background level was calculated as the average expression level across all intergenic probes . 
+ We then only considered genes meeting a 5 % FDR ( false discovery rate ) - adjusted P-value cut-off to be differentially expressed . 
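+ A minimal sketch of the filtering and FDR steps follows . The text does not name the adjustment procedure , so the Benjamini-Hochberg step-up method is assumed here ; the function names and the dictionary-based inputs are hypothetical , and the p-values are taken as already computed from the t-test . 

```python
def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values (assumed FDR procedure)."""
    n = len(p_values)
    order = sorted(range(n), key=lambda i: p_values[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


def differentially_expressed(p_by_gene, max_expr_by_gene, background, fdr=0.05):
    """Drop genes never above background, then apply the FDR cut-off."""
    genes = [g for g in p_by_gene if max_expr_by_gene[g] >= background]
    adjusted = benjamini_hochberg([p_by_gene[g] for g in genes])
    return [g for g, q in zip(genes, adjusted) if q <= fdr]
```

+ Filtering before the correction , as described above , reduces the number of tests and therefore the size of the adjustment applied to the remaining genes . 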
+ ChIP binding tracks for Figure 1a and the heatmap for Figure 3 were generated using D3 [ 46 ] . 
+ Related code is available at http://nbviewer.ipython.org/gist/steve-federowicz/7cceedba73982c0ae995 . 
+ All raw and processed data have been deposited in NCBI/GEO under accession number GSE55367 ( http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE55367 ) . 
+ Motif searching
+ The ArcA and Fnr binding motif analysis was completed using the MEME and FIMO tools from the MEME software suite [ 21 ] . 
+ We first determined the proper binding motif and then scanned the full genome for its presence . 
+ The elicitation of the motif was done using the MEME program on the set of sequences defined by the ArcA and Fnr binding regions respectively . 
+ Using default settings the previously determined ArcA and Fnr motifs were recovered and then tailored to the correct size by setting the width parameter to 18-bp and 16-bp respectively . 
+ We then used these motifs and the PSPM ( position specific probability matrix ) generated for each by MEME to rescan the entire genome with the FIMO program . 
+ Promoter architecture determination
+ We integrated transcription start sites ( TSS ) with our TF binding regions to identify promoter architectures genome wide [ 27,47 ] . 
+ We first determined the location of motif binding sites within experimentally determined binding regions . 
+ We then calculated the distance between motif center position and previously determined TSS locations [ 26 ] . 
+ Finally , we prepared a histogram of the number of motifs that occur at varying distances relative to the TSS ( Figure 1B ) and included the gene expression data to determine the regulatory outcome of each binding event . 
+ The results showed that ArcA spans the TSS or -35 box region and represses transcription while Fnr spans the -41.5 or alpha carboxy terminal domain [ 47 ] and activates transcription . 
+ The histograms also reveal the previously reported trend [ 48 ] of motif frequency oscillation at a roughly 10.5 bp interval consistent with helical phasing of the DNA strand . 
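+ The distance calculation and histogram described in this section can be sketched as follows . The function names , the choice of the nearest TSS for each motif , the strand convention ( negative offsets are upstream ) , and the bin width are assumptions made for illustration and are not taken from the original analysis . 

```python
from collections import Counter


def motif_tss_offsets(motif_centers, tss_sites):
    """Offset of each motif center from its nearest TSS.

    tss_sites: list of (position, strand) with strand "+" or "-".
    Offsets are reported relative to the direction of transcription,
    so negative values lie upstream of the TSS.
    """
    offsets = []
    for center in motif_centers:
        pos, strand = min(tss_sites, key=lambda t: abs(center - t[0]))
        offset = center - pos
        offsets.append(offset if strand == "+" else -offset)
    return offsets


def offset_histogram(offsets, bin_width=10):
    """Count offsets per bin; each key is the lower edge of its bin."""
    return Counter((o // bin_width) * bin_width for o in offsets)
```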
+ Genome-scale metabolic sampling
+ To perform sampling we first generated pFBA [ 49 ] constrained models of the iJO1366 [ 31 ] metabolic model corresponding to fermentative and nitrate respiratory conditions . 
+ Fermentative conditions were simulated by setting the lower bound of the oxygen exchange reaction ( EX_o2 ) to zero . 
+ Nitrate respiratory conditions were simulated by setting the lower bound for nitrate uptake ( EX_no3 ) to -20 mmol gDW^-1 h^-1 ( mirroring experimental addition of 20 mmol KNO3 ) along with the lower bound of EX_o2 set to zero . 
+ pFBA constrained models were generated by first using the convertToIrreversible ( ) function of the COBRA toolbox [ 50 ] followed by a standard FBA for growth rate . 
+ This growth rate was then imposed as a constraint in a subsequent optimization that found the minimum sum of flux able to achieve that growth rate . 
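+ The two-step optimization just described can be stated as a pair of linear programs . This is a sketch in generic notation : S ( stoichiometric matrix ) , v ( fluxes of the irreversible model ) , c ( objective vector selecting the growth reaction ) , and u ( upper bounds ) are symbols introduced here rather than taken from the text . 

```latex
\begin{align}
% Step 1: standard FBA for the optimal growth rate
\mu^{*} &= \max_{v}\; c^{\top} v
  \quad \text{s.t.}\quad S v = 0,\; 0 \le v_i \le u_i \\
% Step 2: minimize total flux while holding growth at its optimum
v^{\mathrm{pFBA}} &= \arg\min_{v}\; \sum_{i} v_i
  \quad \text{s.t.}\quad S v = 0,\; c^{\top} v = \mu^{*},\; 0 \le v_i \le u_i
\end{align}
```

+ Because every reaction in the irreversible model carries non-negative flux , the plain sum in the second step equals the sum of absolute fluxes of the original reversible network . 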
+ Finally , using the gpSampler ( ) [ 50 ] method we sampled each of the pFBA constrained models . 
+ All sampling runs were for a full 24 hours to ensure a mixing fraction below .55 . 
+ After sampling was performed we took the average across the 7046 sampling points ( 2n where n = 3,523 reactions in the metabolic model ) . 
+ Sampling results were then interfaced with the regulatory network and metabolic model via the COBRApy project ( http://opencobra.sourceforge.net/openCOBRA ) , iPython notebook [ 51 ] , and in-house databases . 
+ Supporting Information
+ and results for this curation can be viewed at http://nbviewer.ipython.org/gist/steve-federowicz/aa44c9d8add955f4ada7 for Fnr and http://nbviewer.ipython.org/gist/steve-federowicz/1c5017c6ce419234019a for ArcA . 
+ ( PDF ) et al. studies . 
+ The overall conclusion here is that most of the differences in each case were due to genes that were either not expressed or lowly expressed in our data . 
+ These differences can be primarily attributed to different measurement technologies used for gene expression measurement . 
+ We used Affymetrix arrays throughout this study , which generally do not have as high a dynamic range as the RNAseq or Nimblegen tiling arrays used in the studies of Park et al. and Myers et al. . 
+ However , there is still a slight bias towards our ArcA data having reasonable similarity but our Fnr data showing noticeable differences . 
+ All of the code and results for this curation can be viewed at http://nbviewer.ipython.org/gist/steve-federowicz/8c0e96ac208264e623b9 for Fnr and http://nbviewer.ipython.org/gist/steve-federowicz/05659c90b49abc049a42 for ArcA . 
+ ( PDF ) that for ArcA , the 21 discrepancies can be almost uniformly attributed to noise in high-throughput data in which some solid information exists , but ultimately falls below stringent cutoffs . 
+ A similar picture also emerges for Fnr with almost every discrepancy containing some type of comparable data in our study . 
+ All of the code and results for this curation can be viewed at http://nbviewer.ipython.org/gist/steve-federowicz/1cbb68842ab0a0571ff0 for Fnr and http://nbviewer.ipython.org/gist/steve-federowicz/f2b3d25f114914147c81 for ArcA . 
+ ( PDF ) 
+ NADPH . 
+ Each node map diagram shows the split between the amount of regulated vs. unregulated flux that goes into the production or consumption of each metabolite . 
+ The notable pattern is repression of the consumption and often production upon a shift to nitrate respiratory conditions . 
+ This occurs primarily as a means of negative feedback on the flux through these core nodes . 
+ In fact these diagrams fail to show that under fermentative conditions these same fluxes through core nodes are even more highly repressed . 
+ This occurs because the metabolic network at optimality is already in line with the regulation , and hence does not carry flux through many of the reactions that are shown to be repressed under nitrate respiratory conditions . 
+ This result led us to make the scatter plot of Figure 5A which more clearly displays the higher degree of repression in fermentation vs. nitrate respiratory conditions along with deactivation through the shift . 
+ All data 
+ First and second columns indicate identified ArcA-binding peaks ( Start : left-end peak position , End : right-end peak position ) . 
+ The third column indicates the log2 ratio of each ArcA-binding peak . 
+ ( PDF ) consumed and the amount of this flux which is activated or repressed by ArcA and Fnr . 
+ ( PDF ) for Fermentative regulation and Nitrate regulation are the max absolute value levels of regulation ( Fig. 3 ) caused by ArcA or Fnr under that condition across all genes associated with the metabolic reaction . 
+ The flux difference and regulation difference are always the value of the nitrate condition minus the value of the fermentation condition . 
+ The plot in Figure 5C is between the last two columns of this table . 
+ ( PDF ) by at least .25 mmol/GDW-h ) . 
+ Of these 91 reactions , 40 are directly regulated by ArcA or Fnr and another 49 are indirectly regulated . 
+ ( PDF ) 
+ Text S1 Transport coupled redox balancing as shown in Fig. S4 is explained in greater detail . 
+ Briefly , only 5 genes are found that encode for reactions which produce NAD ( P ) H and are not regulated by ArcA or Fnr . 
+ Interestingly , 4/5 of these genes are amino acid biosynthetic enzymes . 
+ Two of these enzymes in particular , serA and tyrA , are feedback inhibited by serine and tyrosine respectively . 
+ Thus , as shown in Figure S4 we are able to corroborate dramatic regulation of the sstT serine transporter and the aroP tyrosine transporter with feedback inhibition of these critical biosynthetic enzymes . 
+ Under this regulatory scheme , serine and tyrosine would be produced at the expense of critical redox potential but immediately shut down if any serine or tyrosine can be scavenged exogenously . 
+ ( DOC ) 
+ Author Contributions
+ Conceived and designed the experiments : SF BP BkC . 
+ Performed the experiments : BkC DK . 
+ Analyzed the data : SF BkC JL KZ . 
+ Contributed reagents/materials/analysis tools : SF AE HN . 
+ Wrote the paper : SF JL KZ BP .
\ No newline at end of file
--- a/data/TEXT_FILES/useful_txt/24927582.txt 0 → 100644
View file @27818a9
+++ b/data/TEXT_FILES/useful_txt/24927582.txt 0 → 100644
View file @27818a9
+ Edited by Tina M. Henkin, Ohio State University, Columbus, OH, and approved May 13, 2014 (received for review January 29, 2014)
+ The molecular mechanisms of ethanol toxicity and tolerance in bacteria , although important for biotechnology and bioenergy applications , remain incompletely understood . 
+ Genetic studies have identified potential cellular targets for ethanol and have revealed multiple mechanisms of tolerance , but it remains difficult to separate the direct and indirect effects of ethanol . 
+ We used adaptive evolution to generate spontaneous ethanol-tolerant strains of Escherichia coli , and then characterized mechanisms of toxicity and resistance using genome-scale DNAseq , RNAseq , and ribosome profiling coupled with specific assays of ribosome and RNA polymerase function . 
+ Evolved alleles of metJ , rho , and rpsQ recapitulated most of the observed ethanol tolerance , implicating translation and transcription as key processes affected by ethanol . 
+ Ethanol induced miscoding errors during protein synthesis , from which the evolved rpsQ allele protected cells by increasing ribosome accuracy . 
+ Ribosome profiling and RNAseq analyses established that ethanol negatively affects transcriptional and translational processivity . 
+ Ethanol-stressed cells exhibited ribosomal stalling at internal AUG codons , which may be ameliorated by the adaptive inactivation of the MetJ repressor of methionine biosynthesis genes . 
+ Ethanol also caused aberrant intragenic transcription termination for mRNAs with low ribosome density , which was reduced in a strain with the adaptive rho mutation . 
+ Furthermore , ethanol inhibited transcript elongation by RNA polymerase in vitro . 
+ We propose that ethanol-induced inhibition and uncoupling of mRNA and protein synthesis through direct effects on ribosomes and RNA polymerase conformations are major contributors to ethanol toxicity in E. coli , and that adaptive mutations in metJ , rho , and rpsQ help protect these central dogma processes in the presence of ethanol . 
+ Microbially produced aliphatic alcohols are important biocommodities but exert toxic effects on cells . 
+ Understanding the mechanisms by which these alcohols inhibit microbial growth and generate resistant microbes will provide insight into microbial physiology and improve prospects for microbial biotechnology and biofuel production . 
+ We find that Escherichia coli ribosomes and RNA polymerase are mechanistically affected by ethanol , identifying the ribosome decoding center as a likely target of ethanol-mediated conformational disruption and showing that ethanol inhibits transcript elongation via direct effects on RNA polymerase . 
+ Our findings provide conceptual frameworks for the study of ethanol toxicity in microbes and for the engineering of ethanol tolerance that may be extensible to other microbes and to other short-chain alcohols . 
+ Aliphatic alcohols such as ethanol are important microbial bioproducts whose toxic effects are known to limit their production in both bacteria and yeast ( 1 -- 4 ) . 
+ Thus , elucidating the mechanisms by which alcohols exert toxic effects and understanding modes of microbial tolerance are important both to understand basic microbial physiology and to engineer microbes with more efficient fermentative capacities ( 5 , 6 ) . 
+ Escherichia coli is a model of choice for these studies . 
+ Its well-defined physiology and powerful genetic tools have allowed the identification of multiple effects of ethanol on biomolecules and cellular processes . 
+ One well-established toxic effect of ethanol on E. coli is an increase in cell-envelope permeability ( 7 -- 9 ) . 
+ E. coli exposed to ethanol exhibit reduced peptidoglycan cross-linking , which is detrimental to viability , and altered membrane-lipid composition , which may represent an attempt to cope with ethanol stress ( 10 , 11 ) . 
+ Ethanol induces broad transcriptional changes in E. coli that extend beyond membrane-stress responses ( 12 , 13 ) , however , suggesting that membrane effects explain only a part of the toxicity of ethanol . 
+ Consistent with this idea , widely varied approaches have successfully been used to achieve modest ethanol tolerance in E. coli , including random transposon insertion 
+ Author contributions : R.J.F.H. , D.H.K. , M.S.S. , M.T. , J.M.P. , M.V.K. , P.J.K. , and R.L. designed research ; R.J.F.H. , D.H.K. , T.S. , M.S.S. , J.V. , M.T. , J.M.P. , M.V.K. , E.L.P. , and J.A.G. performed research ; R.J.F.H. , D.H.K. , T.S. , M.S.S. , M.T. , J.M.P. , M.V.K. , E.L.P. , I.M.O. , and J.A.G. analyzed data ; and R.J.F.H. , D.H.K. , J.M.P. , M.V.K. , and R.L. wrote the paper . 
+ The authors declare no conflict of interest. This article is a PNAS Direct Submission.
+ Freely available online through the PNAS open access option.
+ Data deposition : The data reported in this paper have been deposited in the Gene Expression Omnibus ( GEO ) database , www.ncbi.nlm.nih.gov/geo ( accession no. GSE56408 ) . 
+ Lists of noncoding RNA regions , pseudogenes , and gene sets used in ribosome profiling analysis are available from GEO under accession no. GSE56372 . 
+ 1Present address : Department of Microbiology and Immunology , University of California , San Francisco , CA 94143 . 
+ 2To whom correspondence should be addressed. E-mail:landick@biochem.wisc.edu.
+ This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10.1073/pnas.1401853111/-/DCSupplemental . 
+ ethanol on transcription are suggested by two independent reports linking decreased Rho activity to ethanol tolerance in E. coli by a point mutation in rho ( 14 , 15 ) or by inhibition of Rho activity by overexpression of YaeO/Rof ( 18 ) . 
+ Rho is a hexameric RNA helicase that can terminate transcription when it is not coupled with translation ( 29 ) . 
+ Reduced Rho activity leads to numerous changes in gene expression that might confer ethanol tolerance ( 14 , 30 ) . 
+ The reported effects of ethanol on ribosomes and ( p ) ppGpp induction suggest that uncoupling of translation from transcription could additionally cause maladaptively high Rho termination within genes in ethanol-stressed cells . 
+ We sought to identify the primary effects of ethanol on E. coli and to understand mechanisms of tolerance using a two-stage strategy . 
+ First , we selected for tolerance-conferring mutations by directed evolution , allowing cultures to accumulate spontaneous mutations while minimizing bias against mutations in essential genes that can accompany transposon mutagenesis or overexpression . 
+ Second , we tested for physiological effects of ethanol suggested by the evolved alleles and investigated the mechanisms by which evolved alleles counteracted ethanol toxicity . 
+ Our results demonstrate that ethanol has detrimental effects on central dogma processes in vivo , including increases in translational error , ribosome stalling , and intragenic Rho-dependent transcription termination , as well as on transcript elongation in vitro . 
+ Mutant alleles isolated in this study help to ameliorate in vivo effects of ethanol , underscoring the physiological relevance of transcription and translation in ethanol toxicity and resistance . 
+ Results
+ Mutations in metJ , rho , and rpsQ Confer Ethanol Tolerance and Improve Fermentative Yields . 
+ To reveal physiologically important targets of ethanol , we performed serial-passage evolution experiments to select for spontaneous mutations that conferred a growth advantage to E. coli in minimal glucose medium containing increasing concentrations of ethanol up to 65 g/L ( Fig. 1A ) . 
+ We isolated clonal strains from the evolved cultures by growing culture aliquots on nonselective plates and selecting single colonies for further study . 
+ We tested ethanol tolerance of clonal strains by following growth in media containing 40 -- 65 g of EtOH/L and selected the three highly tolerant isolates MTA156 , MTA157 , and MTA160 for genomic sequencing ( Table 1 ) . 
+ MTA156 and MTA160 were independently isolated from a single evolved culture whereas MTA157 was isolated from a separate culture evolved in parallel . 
+ Mutations present in these strains ( Table S1 ) affected genes involved in various pathways , but six of the eleven mutant alleles encoded variant proteins with clear ties to transcription and translation , including two variants of transcription termination factor Rho , two variants of the master repressor of methionine synthesis ( MetJ ) , a variant ribosomal protein S17 ( RpsQ ) , and a variant of the conserved transcription elongation / termination factor NusA . 
+ We selected for detailed study a single clonal strain , MTA156 , which displayed improved growth in the presence of ethanol ( Fig. 1B and Fig. S1 ) comparable with that reported for highly ethanol-tolerant E. coli strains described by various groups ( 1 , 17 , 18 , 20 , 22 ) . 
+ Genomic resequencing of MTA156 identified mutations affecting ispB , lptF , metJ , rho , rpsQ , and topA ( Table 1 and Table S1 ) . 
+ Replacement of these loci with wild-type alleles reduced ethanol tolerance in each case except that of topA ( Fig. S1A ) . 
+ Tolerance conferred by mutations in genes involved in LPS transport ( lptF ) and quinone biosynthesis ( ispB ) likely act by ameliorating effects of ethanol on processes occurring in the E. coli cell envelope . 
+ The other three tolerance-conferring alleles had clear ties to transcription and translation : metJ [ ΔE91 ] , rho [ L270M ] , and rpsQ [ H31P ] ( Table 2 ) . 
+ Combining these evolved alleles in an otherwise wild-type background improved aerobic growth in the presence of ethanol to a level near that of MTA156 ( Fig. 1B and Fig. S1B ) . 
+ The triple mutant ( strain EP61 ) also showed increased ethanol tolerance under anaerobic conditions ( Fig. S1 C and D ) . 
+ An ethanologenic strain bearing the evolved metJ , rho , and rpsQ alleles produced 30 % more ethanol than the base strain on a per-cell basis in a synthetic mimic of lignocellulose hydrolysate , indicating that the ethanol-tolerant phenotype can promote biofuel production under industrially relevant conditions ( Fig. S1E ) . 
+ The tolerance associated with mutations in metJ , rho , and rpsQ implicated translation and transcription as targets of ethanol . 
+ MetJ represses expression of genes involved in methionine biosynthesis ( 31 ) . 
+ Rho terminates transcription uncoupled from translation , halting mRNA synthesis when translation terminates prematurely ( 29 ) . 
+ RpsQ is ribosomal protein S17 , an essential component of the 30S subunit that plays a role in maintaining translational accuracy ( 32 ) . 
+ We focused on phenotypes related to these three targets to study the effects of ethanol on transcription and translation within the cell . 
+ Ethanol Induces Toxic Translational Misreading . 
+ The evolved rpsQ [ H31P ] allele is identical to the nea301 rpsQ allele previously selected as a neamine-resistance mutation ( 33 ) . 
+ The nea301 allele encodes a variant of protein S17 that increases translational accuracy in vitro in the presence of ethanol and other misreading agents ( 25 ) . 
+ The in vivo effects of ethanol on translational accuracy have not been reported , but isolation of a hyperaccurate ribosomal mutation led us to hypothesize that ethanol would stimulate translational misreading , which would in turn be reduced by the evolved rpsQ allele . 
+ We tested this hypothesis by measuring misreading in the presence of varying concentrations of ethanol using a sensitive assay developed by Kramer and Farabaugh ( 34 ) . 
+ In this assay , an active-site codon of firefly luciferase ( F-luc ) is mutated ( K529N ) such that it produces an inactive enzyme if correctly translated ; if ribosomal error leads to insertion of a lysine at this position , an active enzyme is produced instead . 
+ The firefly luciferase is translationally fused to a wild-type jellyfish luciferase active under different chemical conditions , which was used to control for changes in transcription/translation rate , protein stability , and other factors . 
+ All measurements of mutant F-luc activity were normalized to an isogenic strain expressing wild-type F-luc fused to jellyfish luciferase to control for effects of ethanol on the two enzymes . 
+ Ethanol caused a dose-dependent increase in normalized F-luc activity expressed by wild-type E. coli , up to ninefold higher than untreated controls at 40 g of EtOH/L ( Fig. 2A ) , representing an increase in the miscoding error rate similar to that previously reported for inhibitory concentrations of streptomycin ( 34 ) . 
+ Strikingly , the ethanol-tolerant triple mutant and an otherwise wild-type strain bearing the rpsQ [ H31P ] allele showed dramatically reduced levels of F-luc activity at all concentrations of ethanol , such that F-luc activity in the mutant strains grown at 40 g of EtOH/L was similar to wild-type cultures without ethanol . 
+ These results demonstrate that ethanol increases translational misreading in vivo and that the evolved rpsQ allele strongly reduces translational misreading in the presence or absence of ethanol . 
+ We assessed the physiological importance of ethanol-induced errors in protein synthesis by measuring synergistic toxicity between ethanol and antibiotics that affect different steps in translation . 
+ Unrelated stressors at sublethal concentrations generally have multiplicative effects on culture growth when combined whereas stressors that are functionally related vary from this pattern ( 35 , 36 ) . 
+ When combined with streptomycin , an antibiotic that induces translational misreading , ethanol caused approximately fourfold greater toxicity than predicted by the multiplicative model for unrelated stressors ( Fig. 2B ) . 
+ In contrast , ethanol did not show cooperative synergy with chloramphenicol , which inhibits peptidyl transfer in the ribosome but does not decrease translational accuracy ( 37 ) , suggesting that ethanol does not impede the peptidyl transfer reaction . 
+ These results strongly indicate that sublethal ethanol stress induces physiologically damaging levels of translational misreading , perhaps compromising the conformation of the ribosomal decoding site known to be targeted by streptomycin ( 38 , 39 ) . 
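+ The multiplicative null model used above can be sketched in a few lines. This is only an illustration: the function names and the fitness values are hypothetical, and expressing synergy as the ratio of expected to observed fitness is an assumption about how the fold difference might be quantified, not the authors' stated procedure.

```python
def expected_fitness(fitness_a: float, fitness_b: float) -> float:
    """Null expectation for two unrelated stressors: fitness values multiply."""
    return fitness_a * fitness_b


def synergy_fold(fitness_a: float, fitness_b: float, observed_combined: float) -> float:
    """Ratio of the multiplicative expectation to the observed combined fitness.

    Values above 1 mean the combination is more toxic than the null model predicts.
    """
    if observed_combined <= 0:
        raise ValueError("observed combined fitness must be positive")
    return expected_fitness(fitness_a, fitness_b) / observed_combined


# Hypothetical fitness values (stressed / unstressed growth rate), not data from the study.
ethanol_alone = 0.8
streptomycin_alone = 0.5
observed_together = 0.1

print(expected_fitness(ethanol_alone, streptomycin_alone))
print(synergy_fold(ethanol_alone, streptomycin_alone, observed_together))
```

+ With these made-up inputs the observed combined fitness is about one quarter of the multiplicative expectation, which is the kind of departure the text describes for ethanol plus streptomycin.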
+ The observed synergy between ethanol and streptomycin suggested that ribosomal proteins other than RpsQ classically implicated in translational accuracy , like RpsL ( protein S12 ) , might also play a role in ethanol tolerance . 
+ Consistent with this hypothesis , we observed that the rpsL150 allele ( 40 ) , which confers streptomycin resistance , increased fitness during growth at a modest concentration ( 20 g/L ) of ethanol ( Fig. S2 ) . 
+ Ethanol has also been reported to rescue streptomycin-dependent mutants with lesions mapping at or near rpsL ( 41 ) , which generally display a hyperaccurate translation phenotype ( 42 ) , implying a potent effect of ethanol on ribosome decoding activity . 
+ Ribosome Profiling Reveals Ethanol Induction of Ribosomal Termination . 
+ Translational misreading has been shown to cause ribosome stalling and termination ( 43 ) , and ( p ) ppGpp production in ethanol-treated E. coli may be a consequence of stalled ribosomes ( 27 ) . 
+ To investigate whether ethanol stress might cause aberrant termination of polypeptide synthesis , we assessed the genome-wide effects of ethanol on mRNA levels and ribosomal distribution across cellular mRNAs using ribosome profiling coupled with RNAseq ( 44 ) . 
+ These techniques measure mRNA abundance and ribosomal occupancy with high resolution , revealing the density of ribosomes on various mRNAs and ribosome abundance along the lengths of mRNA transcripts . 
+ We designed the ribosome-profiling experiment to compare three physiological conditions apparent in ethanol-stress experiments ( Fig. 1B ) : logarithmic growth before ethanol stress ( T0 ) , the period of growth cessation due to acute stress ( T1 ) , and the phase in which cells had resumed growth under chronic ethanol stress ( T2 ) . 
+ Practical and cost limitations on the number of samples processed precluded examining the single and double rho , metJ , and rpsQ mutants individually ; thus , we limited the experiment to comparing the triple mutant to wild type at the three indicated time points . 
+ To assess ribosome processivity , we examined the change in relative ribosome occupancy from the 5 ′ end to the 3 ′ end of ORFs , dividing each ORF into eight even segments and averaging signals across sets of genes . 
+ Decreases in relative occupancy at the 3 ′ end of messages would indicate aberrant translational termination between the 5 ′ and the 3 ′ end of the ORF . 
+ Because ribosome occupancy is the ratio of site-specific ribosome abundance over site-specific mRNA abundance , we first quantified mRNA signals from RNAseq across ORFs ( Fig. 3 A -- C ) and then used the ratio of raw ribosome footprint signal to mRNA signal to generate plots of ribosome occupancy ( Fig. 3 D -- F ) . 
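+ The occupancy calculation described here (eight even segments per ORF, footprint signal divided by mRNA signal) can be sketched as follows. The function names are hypothetical, the per-nucleotide input lists are toy values, and normalizing each profile to its first (5 ′) segment is an assumption about what "relative" occupancy means.

```python
def segment_sums(per_position, n_segments=8):
    """Sum a per-nucleotide signal over n_segments roughly equal pieces of an ORF."""
    length = len(per_position)
    if length < n_segments:
        raise ValueError("ORF is shorter than the number of segments")
    bounds = [round(i * length / n_segments) for i in range(n_segments + 1)]
    return [sum(per_position[bounds[i]:bounds[i + 1]]) for i in range(n_segments)]


def relative_occupancy(footprints, mrna, n_segments=8):
    """Footprint / mRNA ratio per segment, scaled to the 5' segment (assumed normalization)."""
    fp = segment_sums(footprints, n_segments)
    rna = segment_sums(mrna, n_segments)
    occupancy = [f / r if r > 0 else float("nan") for f, r in zip(fp, rna)]
    first = occupancy[0]
    if not first > 0:  # also catches NaN
        raise ValueError("no usable signal in the 5' segment")
    return [o / first for o in occupancy]


# Toy 16-nt "ORF": footprint signal halves in the 3' half while mRNA stays constant.
toy_footprints = [10] * 8 + [5] * 8
toy_mrna = [2] * 16
print(relative_occupancy(toy_footprints, toy_mrna))
```

+ In this toy case the profile falls from 1.0 to 0.5 across the ORF, the shape that would be read as termination of translation between the 5 ′ and 3 ′ ends.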
+ To facilitate genome-wide comparisons , we identified a set of 3,048 protein-coding genes with unambiguously mapped reads in all RNAseq and ribosome footprinting datasets . 
+ Initially , we looked for evidence of translation termination by averaging relative occupancy values across this set of `` all genes '' ( Fig. 3D ) . 
+ Ethanol-stimulated translational termination was evident and followed similar patterns in both wild-type and mutant cultures . 
+ Before stress , ribosomes were widely distributed across mRNAs . 
+ During acute stress ( T1 ) , relative ribosomal occupancy near the 3 ′ ends of genes dropped from ∼ 0.95 to ∼ 0.75 . 
+ After cells resumed growth under chronic stress ( T2 ) , genome-wide ribosome occupancy shifted back to near prestress patterns . 
+ These observations indicate that the acute stress phase , which is coupled to growth cessation , was characterized by widespread aberrant termination of translation within mRNAs . 
+ By the time growth resumed , both strains had largely corrected this defect . 
+ In analyzing the ribosome profiling data , we observed that mRNAs with higher prestress ribosome density exhibited greater decreases in 3 ′ ribosome occupancy . 
+ To assess this pattern further , we separated the `` all gene '' set into quintiles based on ribosome density in unstressed wild-type cells and examined the upper and lower quintiles . 
+ The upper ribosome density quintile exhibited a decrease in occupancy at the 3 ′ end of genes that was , on average , greater in magnitude than the analogous decrease in the set of all genes ( Fig. 3E ) . 
+ This result indicates that a high density of ribosomes on the message could not prevent ethanol-induced ribosomal halting . 
+ The low ribosome density quintile , in contrast , did not exhibit such a decrease in ribosome occupancy ( Fig. 3F ) . 
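+ The quintile split can be sketched as a simple ranking of genes by prestress ribosome density. The function name, gene labels, and density values below are hypothetical; how ties and uneven group sizes were handled in the study is not stated, so the rounding used here is an assumption.

```python
def density_quintiles(density_by_gene):
    """Rank genes by ribosome density and cut the ranking into five groups.

    Returns a list of five lists of gene names, lowest density first.
    """
    ranked = sorted(density_by_gene, key=density_by_gene.get)
    n = len(ranked)
    bounds = [round(i * n / 5) for i in range(6)]
    return [ranked[bounds[i]:bounds[i + 1]] for i in range(5)]


# Hypothetical densities for ten placeholder genes.
toy_density = {f"g{i}": float(i) for i in range(10)}
quintiles = density_quintiles(toy_density)
low_quintile, high_quintile = quintiles[0], quintiles[-1]
print(low_quintile, high_quintile)
```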
+ Although this flatter ribosome occupancy curve could indicate a smaller effect of ethanol on translation of low ribosome density mRNAs , a likely alternative explanation is that translation termination on these messages might be coupled to termination of transcription . 
+ Because ribosome occupancy is ribosome abundance corrected for mRNA abundance , parallel decreases in both would appear as relatively flat ribosome occupancy across a message , as observed for the low-density quintile . 
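+ A small numerical illustration of this point, using made-up per-segment values: when footprint and mRNA signals fall in parallel along a message, their ratio stays flat even though both are declining.

```python
def occupancy_profile(footprints, mrna):
    """Per-segment ribosome occupancy: footprint signal divided by mRNA signal."""
    return [f / m for f, m in zip(footprints, mrna)]


# Hypothetical segment totals, 5' to 3'. Both signals fall to 40% of their 5' value.
toy_footprints = [100, 80, 60, 40]
toy_mrna = [50, 40, 30, 20]

print(occupancy_profile(toy_footprints, toy_mrna))
```

+ Every segment gives the same ratio, so an occupancy plot alone cannot distinguish "no termination" from "translation and transcription terminating together"; the mRNA profile is needed to tell them apart.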
+ Coupling of ethanol-induced translation termination to transcription termination could explain why we and others found that mutations reducing Rho activity confer ethanol tolerance . 
+ Rho terminates transcription of untranslated mRNA molecules ; thus , poorly translated mRNAs might be particularly susceptible to ethanol-induced Rho-dependent termination because an ethanol-induced increase in ribosomal termination could not be compensated by other ribosomes on the mRNA . 
+ We therefore used the RNAseq data to assess changes in intragenic transcription termination after ethanol treatment . 
+ Rho-Dependent Transcription Termination Is Increased During Ethanol Stress . 
+ Intragenic transcription termination results in decreased mRNA levels from the 5 ′ end to the 3 ′ end of ORFs . 
+ We tested for this pattern by dividing the RNAseq reads into the same gene sets used to analyze ribosome occupancy ( Fig. 3 A -- C ) . 
+ In the `` all gene '' gene set , we observed statistically significant decreases in 3 ′ abundance relative to 5 ′ abundance , indicative of transcriptional termination , during acute stress for both wild type and the mutant ( T1 ) ( Fig. 3A ) . 
+ This change was markedly greater in magnitude for wild-type cells , suggesting that the variant Rho expressed by the mutant may reduce intragenic transcription termination . 
+ However , mRNAs with high ribosome density did not exhibit obvious ethanol-induced decreases in 3 ′ - proximal signal ( Fig. 3B ) , suggesting that high ribosome density may protect transcripts from termination . 
+ The low ribosome-density gene set exhibited a similar pattern to the set of all genes , but with stronger and more persistent effects ( Fig. 3C ) . 
+ For these genes , mRNA levels were maximal at the 5 ′ end and dropped as transcription proceeded toward the 3 ′ end in both strains . 
+ The decrease in mRNA level after the 5 ′ end of the gene was greatest during acute ethanol stress and only partially recovered during chronic stress , indicating that ethanol inhibited the ability of cells to produce full-length transcripts of these genes even after cells had resumed active growth . 
+ Although both strains exhibited this general pattern of transcription termination , the tolerant mutant had a clear advantage in producing full-length transcripts , maintaining significantly higher mRNA levels across the length of genes than wild type at all time points . 
+ These results suggested that Rho [ L270M ] reduced transcription termination . 
+ To test this possibility , we performed an RNAseq analysis of a strain containing only the rho [ L270M ] mutation . 
+ The experiment confirmed the reduced rho activity phenotype , revealing increased readthrough of a set of previously defined Rho-dependent terminators ( 45 ) in the rho [ L270M ] strain compared with wild type ( Fig. S3 ) . 
+ Of 114 Rho-dependent terminators for which fold change could be reliably calculated using our RNAseq dataset ( Dataset S1 ) , 84 % exhibited increased terminator read-through in the mutant , and nearly half ( 45 % ) exhibited a twofold or greater increase ( Mann -- Whitney U test ; P < 0.001 ) . 
+ Thus , the rho [ L270M ] mutation confers a global defect in Rho-dependent transcription termination . 
+ We concluded that , during ethanol stress , Rho aberrantly terminates transcription of hundreds of genes . 
+ RNA polymerases ( RNAPs ) synthesizing mRNAs with fewer ribosomes than average were more prone to Rho-dependent termination . 
+ Especially for these poorly translated mRNAs , the reduced activity of Rho [ L270M ] favored production of full-length transcripts . 
+ Thus , our results are consistent with the hypothesis that ethanol-induced translational termination uncouples translation from transcription , stimulating Rho to terminate mRNA elongation . 
+ Ethanol Inhibits mRNA Synthesis by RNA Polymerase in Vitro . 
+ Another possible way ethanol could increase Rho-dependent transcription termination is to slow transcript elongation by RNA polymerase , which would increase the time available for Rho action . 
+ To investigate this possibility , we tested whether modest concentrations of ethanol could affect transcript elongation by RNAP in vitro in the absence of translation . 
+ In the presence of ethanol at concentrations of 30 g/L or 60 g/L , the average rate of transcript elongation by E. coli RNAP was reduced by 10 -- 30 % , and transcriptional pausing at a subset of sites was exacerbated ( Fig. 4 ) . 
+ Thus , ethanol not only may increase chances for Rho loading by causing translational misreading and termination , but also may increase Rho action by slowing RNAP and thus increasing the kinetic window within which Rho can effect termination . 
+ Ethanol Inhibits Translation of Nonstart AUG Codons . 
+ The results described above ( see Ribosome Profiling Reveals Ethanol Induction of Ribosomal Termination ) strongly implicated ribosome stalling in the toxic effect of ethanol on E. coli . 
+ Such stalling could be random or could be biased toward specific sites in the genome . 
+ We hypothesized that ribosomes in ethanol-treated cells might be prone to stop at AUG codons due to methionine limitation because a mutation in the master repressor of methionine biosynthesis , metJ , contributed to ethanol tolerance . 
+ We therefore used the ribosome profiling data to assess ethanol-induced changes in ribosome occupancy at start and nonstart AUG codons compared with codons for other amino acids and stop codons ( Fig. 5 ) . 
+ Increased codon occupancy in the ribosome profiles reflects increased dwell time of ribosomes at those codons relative to others , thus revealing codons whose translation was inhibited by ethanol . 
+ Ethanol had small effects on ribosome occupancy at most codons , but strongly affected occupancy at nonstart AUG codons . 
+ Nonstart AUG occupancy dramatically increased during acute toxicity ( T1 ) and only partially recovered during chronic toxicity ( T2 ) in both strains . 
+ This pattern correlates with the overall pattern of translation termination observed by ribosome profiling ( Fig. 3D ) , which was strongly induced at T1 but largely recovered by T2 . 
+ The magnitude of the ethanol effect on nonstart AUG occupancy was less for the mutant than for wild type at both time points ( Wilcoxon matched-pairs rank test ; P < 0.001 ) . 
+ We infer that the metJ [ ΔE91 ] allele protects against ethanol stress at least in part because it leads to increased expression of methionine biosynthesis genes ( Table S2 ) , potentially increasing the methionine pool available to the cell . 
+ Consistent with this hypothesis , deletion of the metJ repressor or addition of excess methionine improved ethanol tolerance of wild-type E. coli ( Fig. S4 ) . 
+ In contrast to nonstart AUG codons , ribosome occupancy at AUG start codons increased in the mutant after ethanol addition but decreased in the wild-type strain ( Fig. 5 ) . 
+ In principle , relative start codon occupancy reflects the rate of translation initiation relative to elongation ( i.e. , how much time ribosomes spend at start codons relative to other codons ) . 
+ Because rates of translation initiation and growth are positively correlated in E. coli ( 46 ) , these different responses likely reflect the lesser effect of ethanol on mutant versus wild-type growth rates ( < 3 × versus > 6 × , respectively ) ( Fig. 1B ) . 
+ Discussion
+ Our study of the effects of ethanol on central dogma processes was motivated by the discovery that significant ethanol tolerance in E. coli could be recapitulated by mutations in genes encoding a ribosomal protein involved in decoding ( RpsQ ) , the master feedback repressor of methionine biosynthesis ( MetJ ) , and a transcription factor that terminates transcription when it becomes uncoupled from translation ( Rho ) . 
+ These three mutations exhibited specific effects on transcription and translation consistent with the hypothesis that ethanol alters the conformations of ribosomes and RNAP in ways that increase misreading , ribosome stalling , and RNAP pausing , leading to increased Rho termination when transcription and translation become uncoupled . 
+ RpsQ [ H31P ] increased translational fidelity and suppressed ethanol-induced misreading ( Fig. 2 ) . 
+ MetJ [ ΔE91 ] up-regulated expression of methionine biosynthesis genes and may have partially countered an ethanol-induced stalling of ribosomes at nonstart AUG codons ( Table S2 , Fig. 5 , and Fig. S4 ) . 
+ Rho [ L270M ] reduced transcript termination at Rho-dependent terminators and contributed to amelioration of ethanol-induced premature termination on poorly translated genes ( Fig. S3 and Fig. 3 ) . 
+ Finally , we found that ethanol directly slows transcript elongation by RNA polymerase , which could increase opportunities for intragenic Rho-dependent termination ( Fig. 4 ) . 
+ Taken together , these data indicate that multiple effects on the transcription and translation machinery are important components of cellular ethanol stress ( Fig. 6 ) . 
+ In the presence of ethanol , increased ribosome stalling ( particularly at nonstart AUG codons ) and increased ribosome misreading inhibit polypeptide synthesis and increase the levels of misfolded and aberrant proteins in the cell . 
+ Ethanol-induced misreading errors may exacerbate stalling and chain termination due to ribosomal proofreading after peptide bond formation ( 43 ) . 
+ Slower translation and increased translational termination can decouple translation from transcription , promoting Rho-dependent termination of transcription within genes . 
+ Ethanol-induced slowing of RNAP further sensitizes transcription to termination by Rho . 
+ The ethanol tolerance conferred by metJ mutations may reflect the inhibitory effects of ethanol on translation that are affected by methionyl-tRNAMet levels . 
+ Ethanol increases ribosome dwell time on internal AUG codons ( Fig. 5 ) , suggesting that methionyl-tRNAMet becomes limiting for protein synthesis ; thus , elevated methionine levels might partially compensate for this effect . 
+ Consistent with this hypothesis , methionine supplementation or metJ deletion increased ethanol tolerance of wild-type E. coli grown in minimal medium ( Fig. S4 ) . 
+ Deletions and point mutations in metJ are known to increase methionine levels in E. coli ( 47 , 48 ) ; the metJ [ ΔE91 ] mutation likely causes a similar effect by elevating transcription of the metABFIKNR genes ( Table S2 ) . 
+ Interestingly , elevating methionine biosynthesis or supplementing with methionine also compensates for other stresses in E. coli , including acetate , organic acids , nitrosating agents , and heat shock ( 49 -- 51 ) . 
+ MetA , which catalyzes the first step in methionine biosynthesis , is known to aggregate during heat stress ( 51 ) , suggesting that stress-induced MetA aggregation may reduce functional MetA and resultant methionine levels . 
+ Ethanol is a potent inducer of the heat shock response ( 27 ) and thus could similarly cause MetA aggregation , resulting in decreased methionine and methionyl-tRNAMet levels that could be partially corrected by the metJ [ ΔE91 ] mutation . 
+ Although other compensatory effects on cell growth of elevating methionine biosynthesis remain possible , effects on translation are consistent with our detection of ethanol-induced stalling on internal Met codons . 
+ In addition to causing toxic effects related to methionine , our data indicate that ethanol interferes with the processes by which ribosomes and RNAP produce proteins and RNA . 
+ Ethanol may alter the activities of the ribosome and RNAP through interactions at specific sites or through less-specific solute effects on macromolecular conformation . 
+ The latter possibility is attractive because both the ribosome and RNAP are multisubunit complexes whose activities require significant conformational changes during repeated cycles of chain extension and because the ethanol levels at which toxic effects occur ( 1-2 M ) are sufficient to alter the activities of water and solutes at protein surfaces ( 52 ) . 
+ Indeed , solute effects are known to alter RNAP pausing and elongation ( 53 ) in patterns and magnitude similar to those we observed for ethanol ( Fig. 4 ) . 
+ Ethanol appears to destabilize proteins by promoting exposure of hydrophobic regions ( 54 , 55 ) through direct interactions ( 56 ) . 
+ Ethanol destabilization of protein structure via global effects as a solute may contribute to the well-known induction of the unfolded protein response ( heat-shock response ) by ethanol in E. coli ( 57 ) although ethanol-induced amino acid misincorporation to generate nonnative proteins also could contribute . 
+ However , ethanol is less well studied than other protein-perturbing solutes , and both mRNA decoding by the ribosome and pausing and elongation by RNAP are complex phenomena . 
+ Thus , further study will be needed to understand which parts of these complex molecular machines may be altered to cause the effects we observed . 
+ Nonetheless , our finding that ethanol causes decoding errors similarly to inhibitors such as streptomycin , coupled with previous findings , suggests that ethanol may alter the conformation of the decoding center of the small ribosomal subunit . 
+ Ribosomes select the correct aminoacyl-tRNA by coupling correct codon-anticodon pairing in the decoding center to GTP hydrolysis by elongation factor Tu ( EF-Tu ) , leading to tRNA accommodation into the peptidyl transferase center ( 58 ) . 
+ Ethanol increases ribosomal misreading in vitro ( 24 -- 26 ) and causes changes to the ribosome footprint in a manner similar to antibiotics such as streptomycin and neomycin ( 59 ) . 
+ A mutation in rpsL , encoding a ribosomal protein known to be important for streptomycin binding and accurate translation ( S12 ) , was shown to increase ethanol tolerance in an E. coli rho mutant background ( 14 ) . 
+ Additionally , ethanol tolerance is conferred on E. coli by overexpression ( 17 ) of RlmH , which methylates a 23S rRNA pseudouridine near the ribosomal decoding center ( 60 ) , and of TruB , which catalyzes formation of pseudouridine-55 on tRNAs and is important for efficient translation of certain codons ( 61 ) . 
+ Finally , ethanol has been reported to promote the growth of streptomycin-dependent mutants of E. coli in the absence of streptomycin ( 41 , 62 ) , suggesting that ethanol , like streptomycin , may induce compensatory conformational changes in the decoding center of the ribosome . 
+ We propose that ethanol disrupts the natural conformation of the ribosomal decoding center and thus , like streptomycin , allows EF-Tu GTP hydrolysis and accommodation upon binding of noncognate aminoacyl-tRNAs . 
+ An ethanol-induced rearrangement of the decoding center might also affect the propensity of the ribosome to stall and possibly its ability to recognize methionyl-tRNA although other , distal effects of ethanol on the ribosome also could cause translational halting . 
+ Our proposal that ethanol exerts its toxic effects on E. coli in part through direct effects on the ribosome and RNAP contrasts with previous suggestions that ethanol tolerance mutations modifying proteins involved in transcription and translation function indirectly by `` rewiring '' gene-expression networks . 
+ For example , increased alcohol tolerance of strains with reduced Rho activity ( 14 , 20 ) or mutations in RNA polymerase subunits or the TATA-binding protein ( 4 , 22 , 63 ) have been attributed to altered expression of specific genes controlled by termination or initiation . 
+ Our results do not preclude the possibility that increased expression of some genes contributes to ethanol tolerance ; both effects on the transcription/translation machinery and effects on gene expression may occur , and both may be important . 
+ Rather , we suggest that the potentially important and simpler explanation that compensatory mutations may help shield the ribosome and RNAP from direct effects of ethanol should not be overlooked . 
+ Indeed , some indirect effects of ethanol may also be consequences of primary effects on the ribosome or RNAP . 
+ For example , we speculate that the ethanol-induced changes in E. coli membrane lipid composition ( 10 ) might be caused in part by ethanol 's effects on the ribosome . 
+ Ethanol stress results in ( p ) ppGpp production ( 27 ) , likely by the RelA synthase bound to stalled ribosomes . 
+ ( p ) ppGpp binds numerous cellular enzymes ( 64 ) and is known to inhibit the activity of the phospholipid synthesis enzyme PlsB in vivo and in cell-free extracts ( 65 , 66 ) . 
+ Mutation of plsB alters fatty acid composition similarly to effects of ethanol ( increased unsaturated C18:1 fatty acids ) ( 67 ) , providing a potential link between the effects of ethanol on the ribosome and on membrane composition ( 10 , 11 ) mediated by ( p ) ppGpp . 
+ Increased ( p ) ppGpp levels affect transcription from numerous promoters in the cell ( 28 , 68 ) , which could lead to other downstream effects as well . 
+ Thus , our finding that ethanol induces ribosome stalling also potentially provides a simple mechanistic explanation for ethanol stimulation of ( p ) ppGpp production and consequent effects on multiple cellular functions as a signal of translational stress . 
+ A key question raised by our work is whether modes of ethanol tolerance related to its direct effects on the ribosome and RNAP are broadly applicable to other solvents and other microbes . 
+ Some ethanol-tolerant mutants of Saccharomyces cerevisiae carry ribosomal mutations that alter antibiotic sensitivities ( 69 ) , suggesting that the toxic effects of ethanol on the ribosome may operate in yeast as well as bacteria . 
+ Do naturally ethanol-tolerant species such as S. cerevisiae protect their ribosomes from ethanol-induced translational error and their RNAPs from ethanol-induced transcriptional slowing ? 
+ Do other small-molecule solvents have ethanol-like toxic effects ? 
+ The answers to these questions could have important consequences for biosynthetic engineering . 
+ If translation and transcription are widely prone to inhibition by solvents such as ethanol , then engineering microbes with resilient ribosomes and RNAPs may enhance biological production of a variety of small molecules . 
+ More generally , our results highlight the importance of considering direct effects on central dogma processes when evaluating the effects of solutes on microbes . 
+ Materials and Methods
+ Bacterial Strains , Media , and Growth Conditions . 
+ All strains are derivatives of E. coli K12 strain MG1655 ( Table 1 ) . 
+ Strains MTA156 , MTA157 , and MTA160 were selected after 24 aerobic serial passages of MG1655 in M9 minimal medium ( 70 ) , supplemented with MgSO4 ( 1 mM ) , CaCl2 ( 0.1 mM ) , and glucose ( 10 g/L ) , and increasing amounts of ethanol up to 65 g/L ( Fig. 1 ) . 
+ We isolated MTA156 , MTA157 , and MTA160 as clonal strains from the evolved cultures by single-colony purification . 
+ The genomes of MG1655 , MTA156 , MTA157 , and MTA160 were sequenced at the University of Wisconsin-Madison ( UW-Madison ) Biotechnology Center using the Illumina HiSeq 2000 platform . 
+ Unmarked transfer of alleles between MG1655 and MTA156 was achieved by P1 transduction , first transducing in an auxotrophic marker from the Keio collection ( 71 ) linked to the desired locus and then selecting for prototrophic transductants from the appropriate donor , as previously described for rho mutations ( 72 ) . 
+ EP23 was constructed by P1 transduction of the metJ : : kan mutation from the Keio collection into MG1655 . 
+ RL2739 was constructed by λ Red recombination of the rpsL150 allele into MG1655 after amplification from strain DH10B ( 40 ) by PCR using forward primer 5 ′ - GCCTGGTGATGGCGGGATCG and reverse primer 5 ′ - CGCGACGACGTGGCATGGAA . 
+ Fermentation Experiments . 
+ For fermentation data shown in Fig. 1E , strains MTA376 and MTA722 were constructed by introducing the ldhA : : kan allele from the Keio collection into MG1655 and EP61 , respectively , by transduction , followed by removal of the kan marker by FLP recombinase ( 71 ) . 
+ The ethanologenic PET cassette was expressed in each strain from plasmid pJGG2 ( 52 ) . 
+ Fermentative cultivations were performed in stirred flasks incubated at 37 °C in an anaerobic chamber with an atmosphere of 10 % CO2 + 10 % H2 . 
+ The medium for fermentations was synthetic corn stover hydrolysate ( 73 ) supplemented with glucose ( 60 g/L ) and xylose ( 30 g/L ) to mimic 9 % glucan loading . 
+ End product detection was done as described previously ( 73 ) . 
+ Rho-Dependent Terminator Readthrough Analysis . 
+ MG1655 and RL2325 cultures were grown , RNA was harvested , and RNAseq libraries were prepared as described ( 30 ) . 
+ Sequencing was performed by the Joint Genome Institute using an Illumina Genome Analyzer II . 
+ Normalized RNAseq reads were summed within regions previously defined by terminator readthrough in Rho-inhibited cells ( 45 ) . 
+ Fold changes in terminator readthrough were calculated by averaging the read counts for two biological replicates , and then dividing the mutant read count by the wild-type read count . 
+ Fold change could not be calculated for terminators with an average read count of zero for the wild-type samples ; such terminators were removed from the final analysis . 
+ Statistical significance was tested by comparing fold changes for Rho-dependent terminators versus random sites in the genome of the same length ( obtained by rotating the positions of terminator readthrough by 1 Mb ) . 
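+ The readthrough calculation and the rotated-position control described above can be sketched as follows. All names and counts are hypothetical, and the genome length is passed in as a parameter; the value used in the example is the commonly cited length of the MG1655 reference chromosome and should be checked against whichever reference assembly is actually in use.

```python
def readthrough_fold_changes(wt_counts, mutant_counts):
    """Mutant / wild-type ratio of replicate-averaged normalized read counts.

    Both arguments map a terminator name to a list of replicate counts.
    Terminators whose wild-type average is zero are dropped, as in the text.
    """
    fold = {}
    for name, wt_reps in wt_counts.items():
        wt_mean = sum(wt_reps) / len(wt_reps)
        if wt_mean == 0:
            continue  # fold change undefined
        mut_reps = mutant_counts[name]
        fold[name] = (sum(mut_reps) / len(mut_reps)) / wt_mean
    return fold


def rotate_region(start, end, genome_length, shift=1_000_000):
    """Move a region by `shift` bp around a circular chromosome, keeping its length.

    If the returned end is smaller than the returned start, the region wraps the origin.
    """
    return (start + shift) % genome_length, (end + shift) % genome_length


# Hypothetical counts for two placeholder terminators, two replicates each.
toy_wt = {"term_a": [10, 30], "term_b": [0, 0]}
toy_mut = {"term_a": [40, 40], "term_b": [5, 7]}
print(readthrough_fold_changes(toy_wt, toy_mut))

ASSUMED_GENOME_LENGTH = 4_641_652  # bp; assumption, verify against the reference used
print(rotate_region(100, 600, ASSUMED_GENOME_LENGTH))
```

+ Applying the same fold-change function to the rotated regions gives a background distribution against which the terminator fold changes can be compared with a rank-based test.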
+ In Vitro Transcription Assay . 
+ Template DNA was made from plasmid pMK110 , which was constructed from pIA267 ( 74 ) by inserting a region from the E. coli bgl operon into the SpeI site using forward primer 5 ′ - GCGAGCACTAGTTGTTCAAGAATACGCCAGGA and reverse primer 5 ′ - GCGAGCACTAGTGGCGATGAGCTGGATAAACT . 
+ pMK110 template DNA contains a λPR promoter followed by a 26-nucleotide C-less cassette . 
+ The linear pMK110 template for transcription reactions was generated by PCR amplification using forward primer 5 ′ - CGTTAAATCTATCACCGCAAGGG and reverse primer 5 ′ - CAGTTCCCTACTCTCGCATG . 
+ PCR products were electroeluted from an agarose gel and phenol extracted . 
+ Core E. coli RNAP and E. coli σ70 were purified as described previously ( 75 , 76 ) . 
+ RNAP holoenzyme ( core ββ ′ α2 plus σ70 ) was prepared by incubating twofold molar excess of σ70 with core for 30 min at 30 °C . 
+ Halted elongation complexes were formed by incubating 10 nM linear pMK110 template and 15 nM RNAP holoenzyme in transcription buffer [ 40 mM Tris · HCl ( pH 8.0 ) , 50 mM KCl , 5 mM MgCl2 , 0.5 mM DTT , and 5 % ( vol/vol ) glycerol ] with 150 μM ApU , 10 μM ATP and UTP , 2.5 μM GTP , and 0.37 μM ( 10 μCi ) [ α-32P ] GTP for 10 min at 37 °C to stall complexes 26 nucleotides downstream of the transcription start site . 
+ Resumption of transcription was allowed by adding 30 μM each ATP , UTP , CTP , and GTP , 100 μg rifampicin / mL , and 0.1 U RNasin / μL ( Promega ) . 
+ Ethanol was added where indicated . 
+ Samples were taken at indicated time points by mixing with an equal volume of 2 × stop dye ( 8 M urea , 30 mM Na2EDTA , and 0.05 % bromophenol blue and xylene cyanol ) . 
+ Samples were heated for 2 min at 90 °C and separated by electrophoresis in a denaturing 6 % polyacrylamide gel ( 19:1 acrylamide : bisacrylamide ) in 7 M urea , 1.25 mM Na2EDTA , and 44 mM Tris borate ( pH 8.3 ) . 
+ Gels were exposed to a PhosphorImager screen , which was scanned using a Typhoon PhosphorImager and quantified using ImageQuant software ( GE Healthcare ) . 
+ Densitometry profiles were generated by converting pixels from the PhosphorImager scan to transcript positions by comparison with end-labeled MspI fragments of pBR322 using a 6-factor polynomial function . 
+ Mean transcript length was calculated from the summed products of the transcript length ( in nt ) times signal intensity divided by total signal intensity . 
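+ The intensity-weighted mean described above can be sketched as follows. This is a minimal illustration, not the authors' analysis code; the function name and the example profile are hypothetical.

```python
def mean_transcript_length(lengths_nt, intensities):
    """Intensity-weighted mean transcript length in nucleotides.

    Implements sum(length * signal) / sum(signal), as described in the text.
    """
    if len(lengths_nt) != len(intensities):
        raise ValueError("lengths and intensities must have the same size")
    total_signal = sum(intensities)
    if total_signal <= 0:
        raise ValueError("total signal intensity must be positive")
    weighted = sum(l * s for l, s in zip(lengths_nt, intensities))
    return weighted / total_signal


# Hypothetical densitometry profile: transcript positions (nt) and signal.
lengths = [26, 100, 200]
signal = [1.0, 2.0, 1.0]
mean_len = mean_transcript_length(lengths, signal)
print(mean_len)
```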
+ Ethanol -- Antibiotic Synergy Experiments . 
+ Indicated strains ( MG1655 and RL2739 ) were cultured in 96-well plates ( BD Falcon ) in Neidhardt rich medium with 2 g glucose/L ( 77 ) supplemented with ethanol , translation inhibitors , or both . 
+ Wells were topped with mineral oil to prevent ethanol evaporation as described ( 78 , 79 ) , and growth was tracked by measuring the absorbance at 595 nm in a Tecan Infinite F200 plate reader . 
+ Fitness was defined as the ratio of stressed to unstressed logarithmic growth rates . 
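+ As an illustration of this fitness definition, the sketch below estimates each logarithmic growth rate as the least-squares slope of ln(absorbance) against time and then takes the stressed/unstressed ratio. The fitting procedure, function names, and example values are assumptions; the text does not state how growth rates were fitted.

```python
import math


def log_growth_rate(times_h, absorbances):
    """Slope of ln(absorbance) versus time (per hour), ordinary least squares."""
    logs = [math.log(a) for a in absorbances]
    n = len(times_h)
    mean_t = sum(times_h) / n
    mean_y = sum(logs) / n
    sxx = sum((t - mean_t) ** 2 for t in times_h)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(times_h, logs))
    return sxy / sxx


def fitness(stressed_rate, unstressed_rate):
    """Ratio of stressed to unstressed logarithmic growth rates."""
    return stressed_rate / unstressed_rate


# Hypothetical exponential-phase readings (A595) for one strain.
times = [0.0, 1.0, 2.0]
unstressed = [0.05 * math.exp(1.0 * t) for t in times]
stressed = [0.05 * math.exp(0.5 * t) for t in times]
w = fitness(log_growth_rate(times, stressed), log_growth_rate(times, unstressed))
print(w)
```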
+ Measurement of Translational Misreading . 
+ Cultures for misreading experiments were diluted from unstressed overnight cultures and grown to midlogarithmic phase in aerobic tubes in Neidhardt rich medium with 2 g glucose/L ( 77 ) supplemented with 100 μg ampicillin/mL ( to maintain luciferase-expressing plasmids ) , 0 -- 40 g EtOH/L , and 10 -- 100 μM isopropyl β-D-1-thiogalactopyranoside ( IPTG ) . 
+ Equivalent numbers of cells were harvested from each culture . 
+ Cells were lysed , and firefly luciferase ( F-luc ) and jellyfish luciferase ( R-luc ) activities were measured as described ( 34 ) . 
+ Luminescence was measured using a Tecan Infinite F200 plate reader . 
+ Values are expressed as F-luc/R-luc ratios normalized to the F-luc/R-luc ratio of an isogenic strain carrying the wild-type firefly luciferase to control for any differences between cultures and differential effects of ethanol on the two enzymes . 
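+ The double normalization described above can be written as a short calculation. The sketch below is illustrative only; the function name and the luminescence readings are hypothetical.

```python
def normalized_misreading(fluc_test, rluc_test, fluc_wt, rluc_wt):
    """F-luc/R-luc ratio of the test strain divided by the same ratio
    measured in the isogenic strain carrying wild-type firefly luciferase."""
    test_ratio = fluc_test / rluc_test
    wt_ratio = fluc_wt / rluc_wt
    return test_ratio / wt_ratio


# Hypothetical luminescence readings (arbitrary units).
value = normalized_misreading(fluc_test=50.0, rluc_test=1000.0,
                              fluc_wt=2000.0, rluc_wt=1000.0)
print(value)
```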
+ Ribosome Profiling . 
+ Cultures were grown aerobically in 1-L volumes of M9 minimal medium ( 70 ) , supplemented with MgSO4 ( 1 mM ) , CaCl2 ( 0.1 mM ) , and glucose ( 10 g/L ) in vigorously shaken ( 225 rpm ) Fernbach flasks . 
+ Ethanol was added to 40 g of EtOH/L once A600 of cultures reached 0.3 . 
+ This ethanol concentration was chosen because it allowed comparison of wild-type and mutant phenotypes under conditions in which both could grow and both exhibited growth responses to ethanol . 
+ 1 . 
+ Yomano LP , York SW , Ingram LO ( 1998 ) Isolation and characterization of ethanol-tolerant mutants of Escherichia coli KO11 for fuel ethanol production . 
+ J Ind Microbiol Biotechnol 20 ( 2 ) :132 -- 138 . 
+ 2 . 
+ Dien BS , Cotta MA , Jeffries TW ( 2003 ) Bacteria engineered for fuel ethanol production : Current status . 
+ Appl Microbiol Biotechnol 63 ( 3 ) :258 -- 266 . 
+ 3 . 
+ Zhao XQ , Bai FW ( 2009 ) Mechanisms of yeast stress tolerance and its manipulation for efficient fuel ethanol production . 
+ J Biotechnol 144 ( 1 ) :23 -- 30 . 
+ 4 . 
+ Alper H , Moxley J , Nevoigt E , Fink GR , Stephanopoulos G ( 2006 ) Engineering yeast transcription machinery for improved ethanol tolerance and production . 
+ Science 314 ( 5805 ) :1565 -- 1568 . 
+ 5 . 
+ Fischer CR , Klein-Marcuschamer D , Stephanopoulos G ( 2008 ) Selection and optimization of microbial hosts for biofuels production . 
+ Metab Eng 10 ( 6 ) :295 -- 304 . 
+ 6 . 
+ Wackett LP ( 2008 ) Biomass to fuels via microbial transformations . 
+ Curr Opin Chem Biol 12 ( 2 ) :187 -- 193 . 
+ 7 . 
+ Ingram LO ( 1990 ) Ethanol tolerance in bacteria . 
+ Crit Rev Biotechnol 9 ( 4 ) :305 -- 319 . 
+ 8 . 
+ Sikkema J , de Bont JA , Poolman B ( 1995 ) Mechanisms of membrane toxicity of hydrocarbons . 
+ Microbiol Rev 59 ( 2 ) :201 -- 222 . 
+ 9 . 
+ Segura A , et al. ( 2012 ) Solvent tolerance in Gram-negative bacteria . 
+ Curr Opin Biotechnol 23 ( 3 ) :415 -- 421 . 
+ 10 . 
+ Dombek KM , Ingram LO ( 1984 ) Effects of ethanol on the Escherichia coli plasma membrane . 
+ J Bacteriol 157 ( 1 ) :233 -- 239 . 
+ 11 . 
+ Ingram LO , Vreeland NS ( 1980 ) Differential effects of ethanol and hexanol on the Escherichia coli cell envelope . 
+ J Bacteriol 144 ( 2 ) :481 -- 488 . 
+ 12 . 
+ Brynildsen MP , Liao JC ( 2009 ) An integrated network approach identifies the isobutanol response network of Escherichia coli . 
+ Mol Syst Biol 5:277 . 
+ 13 . 
+ Gonzalez R , et al. ( 2003 ) Gene array-based identification of changes that contribute to ethanol tolerance in ethanologenic Escherichia coli : Comparison of KO11 ( parent ) to LY01 ( resistant mutant ) . 
+ Biotechnol Prog 19 ( 2 ) :612 -- 623 . 
+ 14 . 
+ Freddolino PL , Goodarzi H , Tavazoie S ( 2012 ) Fitness landscape transformation through a single amino acid change in the Rho terminator . 
+ PLoS Genet 8 ( 5 ) : e1002744 . 
+ 15 . 
+ Goodarzi H , Hottes AK , Tavazoie S ( 2009 ) Global discovery of adaptive mutations . 
+ Nat Methods 6 ( 8 ) :581 -- 583 . 
+ 16 . 
+ Nicolaou SA , Gaida SM , Papoutsakis ET ( 2012 ) Exploring the combinatorial genomic space in Escherichia coli for ethanol tolerance . 
+ Biotechnol J 7 ( 11 ) :1337 -- 1345 . 
+ 17 . 
+ Woodruff LB , Boyle NR , Gill RT ( 2013 ) Engineering improved ethanol production in Escherichia coli with a genome-wide approach . 
+ Metab Eng 17:1 -- 11 . 
+ 18 . 
+ Woodruff LB , et al. ( 2013 ) Genome-scale identification and characterization of ethanol tolerance genes in Escherichia coli . 
+ Metab Eng 15:124 -- 133 . 
+ 19 . 
+ Zingaro KA , Papoutsakis ET ( 2012 ) Toward a semisynthetic stress response system to engineer microbial solvent tolerance . 
+ MBio 3 ( 5 ) : e00308 -- 12 . 
+ 20 . 
+ Goodarzi H , et al. ( 2010 ) Regulatory and metabolic rewiring during laboratory evolution of ethanol tolerance in E. coli . 
+ Mol Syst Biol 6:378 . 
+ 21 . 
+ Chong H , et al. ( 2013 ) Improving ethanol tolerance of Escherichia coli by rewiring its global regulator cAMP receptor protein ( CRP ) . 
+ PLoS ONE 8 ( 2 ) : e57628 . 
+ Ribosome profiling and library generation were performed as described by Oh et al. ( 80 ) , with cells harvested by rapid vacuum filtration onto 0.2 - μm filters and flash-frozen in liquid N2 to arrest ribosomes . 
+ After cryogenic lysis , monosomes were purified by sucrose gradient fractionation . 
+ Sequencing was performed at the UW-Madison Biotechnology Center using an Illumina HiSeq 2000 set for 50-bp single-end reads . 
+ Raw reads were trimmed by 2 nt from the 5 ′ end ( to remove any nontemplated nucleotides added by reverse transcriptase ) and were then mapped using the Burrows-Wheeler Aligner ( 81 ) to the MG1655 genome ( National Center for Biotechnology Information accession no. NC_000913 ) . 
+ Reads were not trimmed to individual codons to avoid errors arising from the variation in ribosome footprint dimensions at different genomic locations ( 82 ) . 
+ Signals mapping to noncoding RNA regions were removed from the dataset , and each dataset was normalized to reads per million per position before further analysis . 
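+ A minimal sketch of this filtering and normalization step is shown below. The authors used custom Perl scripts that are not reproduced here; the Python function, its arguments, and the choice to compute the total after removing noncoding-RNA positions are assumptions made for illustration.

```python
def normalize_rpm(position_counts, excluded_positions=frozenset()):
    """Drop excluded positions (e.g. noncoding RNA regions), then scale
    the remaining per-position counts to reads per million."""
    kept = {pos: n for pos, n in position_counts.items()
            if pos not in excluded_positions}
    total = sum(kept.values())
    if total == 0:
        raise ValueError("no reads left after filtering")
    return {pos: n * 1_000_000 / total for pos, n in kept.items()}


# Hypothetical per-position read counts; position 500 stands in for an rRNA locus.
counts = {100: 30, 101: 50, 102: 20, 500: 900}
rpm = normalize_rpm(counts, excluded_positions={500})
print(rpm)
```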
+ Data manipulation and analyses were performed with custom Perl scripts . 
+ Statistical analyses were performed using GraphPad Prism ( GraphPad Software ) . 
+ For metagene analyses ( gene-segment analyses and codon-type analyses ) , pseudogenes and genes not represented in one or more datasets were excluded , leaving 3,048 genes in the `` all genes '' dataset . 
+ High-translation and low-translation quintiles were defined as the gene sets with highest/lowest ribosome-occupancy-to-RNA signal ratios in MG1655 before ethanol treatment . 
+ For gene-segment analyses , differences between datasets were assessed using area under the curve ( rectangular midpoint approximations ) for each gene in the set . 
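+ The rectangular midpoint approximation mentioned above sums the signal at each bin midpoint multiplied by the bin width. The sketch below assumes equal-width bins along a gene; the function name and the test profile are hypothetical.

```python
def midpoint_auc(values_at_midpoints, bin_width):
    """Area under the curve by the rectangular midpoint rule,
    for equal-width bins with the signal sampled at each bin midpoint."""
    return bin_width * sum(values_at_midpoints)


# Check on f(x) = x over [0, 1] with four bins; the exact area is 0.5.
midpoints = [0.125, 0.375, 0.625, 0.875]
auc = midpoint_auc(midpoints, bin_width=0.25)
print(auc)
```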
+ ACKNOWLEDGMENTS . 
+ We thank our Great Lakes Bioenergy Research Center collaborators and R.L. laboratory colleagues for critical reading of the manuscript . 
+ We are grateful to Gene-Wei Li and David Burkhardt for advice and assistance with ribosome profiling experiments , P. Chu for technical assistance , and Gerwald Jogl for helpful discussions of ribosome decoding . 
+ This work was funded by the Department of Energy Great Lakes Bioenergy Research Center ( DOE BER Office of Science Grant DE-FC02-07ER64494 ) . 
+ 22 . 
+ Alper H , Stephanopoulos G ( 2007 ) Global transcription machinery engineering : A new approach for improving cellular phenotype . 
+ Metab Eng 9 ( 3 ) :258 -- 267 . 
+ 23 . 
+ Luo LH , et al. ( 2009 ) Improved ethanol tolerance in Escherichia coli by changing the cellular fatty acids composition through genetic manipulation . 
+ Biotechnol Lett 31 ( 12 ) : 1867 -- 1871 . 
+ 24 . 
+ Friedman SM , Berezney R , Weinstein IB ( 1968 ) Fidelity in protein synthesis : The role of the ribosome . 
+ J Biol Chem 243 ( 19 ) :5044 -- 5048 . 
+ 25 . 
+ Phoenix P , Melançon P , Brakier-Gingras L ( 1983 ) Characterization of mutants of Escherichia coli with an increased control of translation fidelity . 
+ Mol Gen Genet 189 ( 1 ) : 123 -- 128 . 
+ 26 . 
+ So AG , Davie EW ( 1964 ) The effects of organic solvents on protein biosynthesis and their influence on the amino acid code . 
+ Biochemistry 3:1165 -- 1169 . 
+ 27 . 
+ VanBogelen RA , Kelley PM , Neidhardt FC ( 1987 ) Differential induction of heat shock , SOS , and oxidation stress regulons and accumulation of nucleotides in Escherichia coli . 
+ J Bacteriol 169 ( 1 ) :26 -- 32 . 
+ 28 . 
+ Potrykus K , Cashel M ( 2008 ) ( p ) ppGpp : still magical ? 
+ Annu Rev Microbiol 62:35 -- 51 . 
+ 29 . 
+ Peters JM , Vangeloff AD , Landick R ( 2011 ) Bacterial transcription terminators : The RNA 3 ′ - end chronicles . 
+ J Mol Biol 412 ( 5 ) :793 -- 813 . 
+ 30 . 
+ Peters JM , et al. ( 2012 ) Rho and NusG suppress pervasive antisense transcription in Escherichia coli . 
+ Genes Dev 26 ( 23 ) :2621 -- 2633 . 
+ 31 . 
+ Marincs F , Manfield IW , Stead JA , McDowall KJ , Stockley PG ( 2006 ) Transcript analysis reveals an extended regulon and the importance of protein-protein co-operativity for the Escherichia coli methionine repressor . 
+ Biochem J 396 ( 2 ) :227 -- 234 . 
+ 32 . 
+ Bollen A , Cabezón T , de Wilde M , Villarroel R , Herzog A ( 1975 ) Alteration of ribosomal protein S17 by mutation linked to neamine resistance in Escherichia coli . 
+ I. General properties of neaA mutants . 
+ J Mol Biol 99 ( 4 ) :795 -- 806 . 
+ 33 . 
+ Yaguchi M , et al. ( 1976 ) Alteration of ribosomal protein S17 by mutation linked to neamine resistance in Escherichia coli . 
+ II . 
+ Localization of the amino acid replacement in protein S17 from a neaA mutant . 
+ J Mol Biol 104 ( 3 ) :617 -- 620 . 
+ 34 . 
+ Kramer EB , Farabaugh PJ ( 2007 ) The frequency of translational misreading errors in E. coli is largely determined by tRNA competition . 
+ RNA 13 ( 1 ) :87 -- 96 . 
+ 35 . 
+ Andrusiak K , Piotrowski JS , Boone C ( 2012 ) Chemical-genomic profiling : Systematic analysis of the cellular targets of bioactive molecules . 
+ Bioorg Med Chem 20 ( 6 ) : 1952 -- 1960 . 
+ 36 . 
+ Barker CA , Farha MA , Brown ED ( 2010 ) Chemical genomic approaches to study model microbes . 
+ Chem Biol 17 ( 6 ) :624 -- 632 . 
+ 37 . 
+ Lando D , Cousin MA , Privat de Garilhe M ( 1973 ) Misreading , a fundamental aspect of the mechanism of action of several aminoglycosides . 
+ Biochemistry 12 ( 22 ) :4528 -- 4533 . 
+ 38 . 
+ Demirci H , et al. ( 2013 ) A structural basis for streptomycin-induced misreading of the genetic code . 
+ Nat Commun 4:1355 . 
+ 39 . 
+ Gromadski KB , Rodnina MV ( 2004 ) Streptomycin interferes with conformational coupling between codon recognition and GTPase activation on the ribosome . 
+ Nat Struct Mol Biol 11 ( 4 ) :316 -- 322 . 
+ 40 . 
+ Durfee T , et al. ( 2008 ) The complete genome sequence of Escherichia coli DH10B : Insights into the biology of a laboratory workhorse . 
+ J Bacteriol 190 ( 7 ) :2597 -- 2606 . 
+ 41 . 
+ Momose H , Gorini L ( 1971 ) Genetic analysis of streptomycin dependence in Escherichia coli . 
+ Genetics 67 ( 1 ) :19 -- 38 . 
+ 42 . 
+ Agarwal D , Gregory ST , O'Connor M ( 2011 ) Error-prone and error-restrictive mutations affecting ribosomal protein S12 . 
+ J Mol Biol 410 ( 1 ) :1 -- 9 . 
+ 43 . 
+ Zaher HS , Green R ( 2009 ) Quality control by the ribosome following peptide bond formation . 
+ Nature 457 ( 7226 ) :161 -- 166 . 
+ 44 . 
+ Ingolia NT , Ghaemmaghami S , Newman JR , Weissman JS ( 2009 ) Genome-wide analysis in vivo of translation with nucleotide resolution using ribosome profiling . 
+ Science 324 ( 5924 ) :218 -- 223 . 
+ 45 . 
+ Peters JM , et al. ( 2009 ) Rho directs widespread termination of intragenic and stable RNA transcription . 
+ Proc Natl Acad Sci USA 106 ( 36 ) :15406 -- 15411 . 
+ 46 . 
+ Bremer H , Dennis PP ( 1996 ) Modulation of chemical composition and other parameters of the cell by growth rate . 
+ Escherichia coli and Salmonella : Cellular and Molecular Biology , ed Neidhardt FC ( American Society for Microbiology , Washington , DC ) , pp 1553 -- 1569 . 
+ 47 . 
+ Nakamori S , Kobayashi S , Nishimura T , Takagi H ( 1999 ) Mechanism of L-methionine overproduction by Escherichia coli : The replacement of Ser-54 by Asn in the MetJ protein causes the derepression of L-methionine biosynthetic enzymes . 
+ Appl Microbiol Biotechnol 52 ( 2 ) :179 -- 185 . 
+ 48 . 
+ Usuda Y , Kurahashi O ( 2005 ) Effects of deregulation of methionine biosynthesis on methionine excretion in Escherichia coli . 
+ Appl Environ Microbiol 71 ( 6 ) :3228 -- 3234 . 
+ 49 . 
+ Flatley J , et al. ( 2005 ) Transcriptional responses of Escherichia coli to S-nitrosoglutathione under defined chemostat conditions reveal major changes in methionine biosynthesis . 
+ J Biol Chem 280 ( 11 ) :10065 -- 10072 . 
+ 50 . 
+ Mordukhova EA , Lee HS , Pan JG ( 2008 ) Improved thermostability and acetic acid tolerance of Escherichia coli via directed evolution of homoserine o-succinyltransferase . 
+ Appl Environ Microbiol 74 ( 24 ) :7660 -- 7668 . 
+ 51 . 
+ Gur E , Biran D , Gazit E , Ron EZ ( 2002 ) In vivo aggregation of a single enzyme limits growth of Escherichia coli at elevated temperatures . 
+ Mol Microbiol 46 ( 5 ) :1391 -- 1397 . 
+ 52 . 
+ Record MT , Jr. , Zhang W , Anderson CF ( 1998 ) Analysis of effects of salts and uncharged solutes on protein and nucleic acid equilibria and processes : A practical guide to recognizing and interpreting polyelectrolyte effects , Hofmeister effects , and osmotic effects of salts . 
+ Adv Protein Chem 51:281 -- 353 . 
+ 53 . 
+ Chan CL , Landick R ( 1997 ) Effects of neutral salts on RNA chain elongation and pausing by Escherichia coli RNA polymerase . 
+ J Mol Biol 268 ( 1 ) :37 -- 53 . 
+ 54 . 
+ Brandts JF , Hunt L ( 1967 ) The thermodynamics of protein denaturation . 
+ 3 . 
+ The denaturation of ribonuclease in water and in aqueous urea and aqueous ethanol mixtures . 
+ J Am Chem Soc 89 ( 19 ) :4826 -- 4838 . 
+ 55 . 
+ Bull HB , Breese K ( 1978 ) Interaction of alcohols with proteins . 
+ Biopolymers 17 ( 9 ) : 2121 -- 2131 . 
+ 56 . 
+ Lehmann MS , Mason SA , McIntyre GJ ( 1985 ) Study of ethanol-lysozyme interactions using neutron diffraction . 
+ Biochemistry 24 ( 21 ) :5862 -- 5869 . 
+ 57 . 
+ Straus DB , Walter WA , Gross CA ( 1987 ) The heat shock response of E. coli is regulated by changes in the concentration of σ 32 . 
+ Nature 329 ( 6137 ) :348 -- 351 . 
+ 58 . 
+ Giudice E , Gillet R ( 2013 ) The task force that rescues stalled ribosomes in bacteria . 
+ Trends Biochem Sci 38 ( 8 ) :403 -- 411 . 
+ 59 . 
+ Jerinic O , Joseph S ( 2000 ) Conformational changes in the ribosome induced by translational miscoding agents . 
+ J Mol Biol 304 ( 5 ) :707 -- 713 . 
+ 60 . 
+ Baldridge KC , Contreras LM ( 2014 ) Functional implications of ribosomal RNA methylation in response to environmental stress . 
+ Crit Rev Biochem Mol Biol 49 ( 1 ) :69 -- 89 . 
+ 61 . 
+ Urbonavičius J , Durand JM , Björk GR ( 2002 ) Three modifications in the D and T arms of tRNA influence translation in Escherichia coli and expression of virulence genes in Shigella flexneri . 
+ J Bacteriol 184 ( 19 ) :5348 -- 5357 . 
+ 62 . 
+ Gorini L , Rosset R , Zimmermann RA ( 1967 ) Phenotype masking and streptomycin dependence . 
+ Science 157 ( 3794 ) :1314 -- 1317 . 
+ 63 . 
+ Klein-Marcuschamer D , Santos CN , Yu H , Stephanopoulos G ( 2009 ) Mutagenesis of the bacterial RNA polymerase alpha subunit for improvement of complex phenotypes . 
+ Appl Environ Microbiol 75 ( 9 ) :2705 -- 2711 . 
+ 64 . 
+ Kanjee U , Ogata K , Houry WA ( 2012 ) Direct binding targets of the stringent response alarmone ( p ) ppGpp . 
+ Mol Microbiol 85 ( 6 ) :1029 -- 1043 . 
+ 65 . 
+ Heath RJ , Jackowski S , Rock CO ( 1994 ) Guanosine tetraphosphate inhibition of fatty acid and phospholipid synthesis in Escherichia coli is relieved by overexpression of glycerol-3-phosphate acyltransferase ( plsB ) . 
+ J Biol Chem 269 ( 42 ) :26584 -- 26590 . 
+ 66 . 
+ Merlie JP , Pizer LI ( 1973 ) Regulation of phospholipid synthesis in Escherichia coli by guanosine tetraphosphate . 
+ J Bacteriol 116 ( 1 ) :355 -- 366 . 
+ 67 . 
+ Cronan JE , Jr. , Weisberg LJ , Allen RG ( 1975 ) Regulation of membrane lipid synthesis in Escherichia coli : Accumulation of free fatty acids of abnormal length during inhibition of phospholipid synthesis . 
+ J Biol Chem 250 ( 15 ) :5835 -- 5840 . 
+ 68 . 
+ Dalebroux ZD , Swanson MS ( 2012 ) ppGpp : Magic beyond RNA polymerase . 
+ Nat Rev Microbiol 10 ( 3 ) :203 -- 212 . 
+ 69 . 
+ Bandas EL , Zakharov IA ( 1980 ) Induction of rho - mutations in yeast Saccharomyces cerevisiae by ethanol . 
+ Mutat Res 71 ( 2 ) :193 -- 199 . 
+ 70 . 
+ Miller JH ( 1972 ) Experiments in Molecular Genetics ( Cold Spring Harbor Laboratory Press , Plainview , NY ) . 
+ 71 . 
+ Baba T , et al. ( 2006 ) Construction of Escherichia coli K-12 in-frame , single-gene knockout mutants : The Keio collection . 
+ Mol Syst Biol 2:2006.0008 . 
+ 72 . 
+ Das A , Court D , Adhya S ( 1976 ) Isolation and characterization of conditional lethal mutants of Escherichia coli defective in transcription termination factor rho . 
+ Proc Natl Acad Sci USA 73 ( 6 ) :1959 -- 1963 . 
+ 73 . 
+ Schwalbach MS , et al. ( 2012 ) Complex physiology and compound stress responses during fermentation of alkali-pretreated corn stover hydrolysate by an Escherichia coli ethanologen . 
+ Appl Environ Microbiol 78 ( 9 ) :3442 -- 3457 . 
+ 74 . 
+ Artsimovitch I , Landick R ( 2002 ) The transcriptional regulator RfaH stimulates RNA chain synthesis after recruitment to elongation complexes by the exposed nontemplate DNA strand . 
+ Cell 109 ( 2 ) :193 -- 203 . 
+ 75 . 
+ Gribskov M , Burgess RR ( 1983 ) Overexpression and purification of the sigma subunit of Escherichia coli RNA polymerase . 
+ Gene 26 ( 2-3 ) :109 -- 118 . 
+ 76 . 
+ Toulokhonov I , Zhang J , Palangat M , Landick R ( 2007 ) A central role of the RNA polymerase trigger loop in active-site rearrangement during transcriptional pausing . 
+ Mol Cell 27 ( 3 ) :406 -- 419 . 
+ 77 . 
+ Neidhardt FC , Bloch PL , Smith DF ( 1974 ) Culture medium for enterobacteria . 
+ J Bacteriol 119 ( 3 ) :736 -- 747 . 
+ 78 . 
+ Sasson V , Shachrai I , Bren A , Dekel E , Alon U ( 2012 ) Mode of regulation and the insulation of bacterial gene expression . 
+ Mol Cell 46 ( 4 ) :399 -- 407 . 
+ 79 . 
+ Zaslaver A , et al. ( 2006 ) A comprehensive library of fluorescent transcriptional reporters for Escherichia coli . 
+ Nat Methods 3 ( 8 ) :623 -- 628 . 
+ 80 . 
+ Oh E , et al. ( 2011 ) Selective ribosome profiling reveals the cotranslational chaperone action of trigger factor in vivo . 
+ Cell 147 ( 6 ) :1295 -- 1308 . 
+ 81 . 
+ Li H , Durbin R ( 2009 ) Fast and accurate short read alignment with Burrows-Wheeler transform . 
+ Bioinformatics 25 ( 14 ) :1754 -- 1760 . 
+ 82 . 
+ O'Connor PB , Li GW , Weissman JS , Atkins JF , Baranov PV ( 2013 ) rRNA : mRNA pairing alters the length and the symmetry of mRNA-protected fragments in ribosome profiling experiments . 
+ Bioinformatics 29 ( 12 ) :1488 -- 1491 . 
+ 83 . 
+ Blattner FR , et al. ( 1997 ) The complete genome sequence of Escherichia coli K-12 . 
+ Science 277 ( 5331 ) :1453 -- 1462 .
\ No newline at end of file
--- a/data/TEXT_FILES/useful_txt/25049088.txt 0 → 100644
View file @27818a9
+++ b/data/TEXT_FILES/useful_txt/25049088.txt 0 → 100644
View file @27818a9
+ Rose T. Byrne,a* Stefanie H. Chen,a Elizabeth A. Wood,a Eric L. Cabot,b Michael M. Coxa
+ To further an improved understanding of the mechanisms used by bacterial cells to survive extreme exposure to ionizing radiation ( IR ) , we broadly screened nonessential Escherichia coli genes for those involved in IR resistance by using transposon-directed insertion sequencing ( TraDIS ) . 
+ Forty-six genes were identified , most of which become essential upon heavy IR exposure . 
+ Most of these were subjected to direct validation . 
+ The results reinforced the notion that survival after high doses of ionizing radiation does not depend on a single mechanism or process , but instead is multifaceted . 
+ Many identified genes affect either DNA repair or the cellular response to oxidative damage . 
+ However , contributions by genes involved in cell wall structure/function , cell division , and intermediary metabolism were also evident . 
+ About half of the identified genes have not previously been associated with IR resistance or recovery from IR exposure , including eight genes of unknown function . 
+ Organisms have evolved mechanisms to maintain genomic integrity in the face of extreme environmental stresses . 
+ One class of extremophiles , typified by the bacterium Deinococcus radiodurans ( 1 , 2 ) , exhibits extraordinary resistance to the effects of high doses of ionizing radiation ( IR ) . 
+ The repair of damaged DNA , stalled replication forks , and other damaged cellular components is critical for cells to survive exposure to IR . 
+ The DNA sugar-phosphate backbone is particularly susceptible to both direct and indirect damage caused by IR ( 3 , 4 ) . 
+ Direct damage is caused by absorption of IR by the DNA molecule , which can lead to strand breakage and chemical alterations of bases . 
+ In contrast , indirect damage occurs when reactive oxygen species ( ROS ) , such as hydroxyl radicals , which are formed when IR is absorbed by water , interact with DNA . 
+ Hydroxyl radicals produce single-strand DNA breaks . 
+ Double-strand DNA breaks ( DSBs ) can occur when two IR-induced single-strand DNA breaks are in close proximity ( 5 ) . 
+ DSBs are the most lethal form of DNA damage , because they halt DNA replication , cause the collapse of the replication fork , and are difficult to repair ( 1 , 6 , 7 ) . 
+ Cells repair DSBs and other DNA damage caused by IR by utilizing recombinational DNA repair and nonhomologous end joining ( 6 -- 10 ) . 
+ Because ROS are also by-products of aerobic respiration and general metabolism , it is likely that genes involved in IR survival are also involved in preserving DNA integrity under normal conditions , suggesting an essential role in bacteria . 
+ The capacity of cells to repair DNA , particularly double-strand breaks , has long been linked to cell survival after IR exposure ( 11 -- 19 ) . 
+ DNA repair similarly plays a major role in the extremophile IR resistance phenotype of Deinococcus ( 15 , 17 -- 19 ) . 
+ More recently , the Daly group , and later the Radman group , focused attention on the importance of amelioration of oxidative damage to proteins ( 20 -- 24 ) . 
+ In this mechanism , specialized DNA repair pathways are not necessary . 
+ Passive protection of proteins from oxidative processes ( including a generic complement of DNA repair functions ) facilitates survival at high levels of IR . 
+ Nevertheless , clear evidence has indicated that adaptations to the cellular DNA repair systems can make substantial contributions to extreme levels of IR resistance ( 25 ) . 
+ Given the complexity of bacterial metabolism , it seems unlikely that the list of processes contributing to IR resistance is limited to DNA repair and amelioration of oxidative damage to proteins . 
+ Thus , a broader assessment is needed . 
+ We have carried out an exercise in directed evolution in which the Escherichia coli K-12 strain MG1655 acquired the phenotype of extreme resistance to IR ( 25 , 26 ) . 
+ Four evolved populations of E. coli were obtained , and they exhibited levels of IR resistance approaching that of D. radiodurans . 
+ Analysis of numerous sequenced isolates from these populations allowed us to identify the genetic alterations accounting for most of the acquired IR resistance phenotype ( 25 ) . 
+ In one highly evolved isolate , the phenotype was largely explained by mutations in three DNA metabolism genes , recA , dnaB , and yfjK . 
+ The modified genes provide the beginning of a more complete molecular accounting of adaptations needed for extreme radiation resistance . 
+ Efforts to understand this phenotype have focused to a large extent on Deinococcus radiodurans and related bacteria ( 1 , 2 , 19 , 21 , 22 , 24 ) . 
+ Analysis of transcriptome ( 27 , 28 ) and proteome ( 29 , 30 ) changes upon IR exposure , as well as careful analysis of how the genome is reconstructed over time ( 1 , 31 -- 34 ) , have provided some important insights into this bacterium 's response to IR . 
+ However , broad genetic screens to identify all contributing processes are very difficult to perform with Deinococcus , reflecting its multigenomic status ( 1 , 2 , 19 , 24 ) . 
+ In contrast , E. coli strains with an extreme IR resistance phenotype provide an opportunity to utilize a highly tractable and insight-fertile genetic system to more broadly explore the molecular basis of this phenotype . 
+ One step toward a complete description of the genetic requirements for IR resistance would be the identification of all contributing genes that are not modified in directed evolution trials . 
+ That identification requires a genetic screening approach . 
+ Many screens have been carried out to identify genes involved in DNA repair in E. coli ( 35 -- 42 ) . 
+ These have resulted in the discovery of many of the key DNA repair enzymes we continue to study today . 
+ Screens to identify genes involved in radiation resistance were part of these efforts . 
+ The recN and recG genes were characterized to an extent as genes involved in radiation resistance and given a rad nomenclature ( radB and radC , respectively ) until their functions were further understood ( 12 , 43 ) . 
+ However , modern screening methods are much more robust and sensitive for discovering new genes with particular functions . 
+ We sought to identify the genes involved in survival after extreme IR exposure for three additional reasons . 
+ ( i ) We do not understand the physiological function of nearly one-third of the genes of E. coli , despite its role as the most extensively studied organism . 
+ ( ii ) Radiation resistance is a complex phenotype whose molecular basis remains the subject of some controversy ( 1 , 2 , 22 , 44 ) . 
+ ( iii ) Current research tends to focus on either DNA damage or protein oxidation , and contributions from other processes are possible . 
+ We thus set out to provide a more global assessment of the cellular processes that contribute to IR resistance . 
+ A range of modern screening methods have been described ( 45 -- 49 ) that utilize transposon mutagenesis in combination with Illumina sequencing . 
+ These techniques measure each gene 's contribution to fitness on a genomic scale through massive sequencing of transposon-genome junctions in highly mutagenized populations . 
+ Table 2 oligonucleotide primers : 
+ ORB1 : 5 ′ - CTG TCT CTT ATA CAC ATC TC 
+ ORB5 : 5 ′ - CAA GCA GAA GAC GGC ATA CGA GAT TCC TCA GTG ACT GGA GTT CAG ACG TGT GCT CTT CCG ATC T 
+ ORB6 : 5 ′ - CAA GCA GAA GAC GGC ATA CGA GAT ATT GGC GTG ACT GGA GTT CAG ACG TGT GCT CTT CCG ATC T 
+ ORB7 : 5 ′ - CAA GCA GAA GAC GGC ATA CGA GAT ATT GGC GTG ACT GGA GTT CAG ACG TGT GCT CTT CCG ATC T 
+ ORB8 : 5 ′ - ATT GAT ACG GCG ACC ACC GAG ATC TAC ACT AAT ACG ACT CAC TAT AGG GAG ACC GGC CTC AG 
+ ORB9 : 5 ′ - TAG GGA GAC CGG CCT CAG GGT TGA GAT GTG TA 
+ We employed a relatively new procedure called transposon-directed insertion sequencing ( TraDIS ) ( 45 ) . 
+ In this method , saturating transposon mutagenesis is performed and the resulting insertion mutants are pooled to make an insertion mutant library . 
+ This library is then subjected to repeated exposures to IR . 
+ Genomic DNA from the nontreated population as well as the irradiated populations is isolated , and the location of each transposon insertion as well as the frequency of each insertion mutant within the population are determined . 
+ The change in frequency of insertion mutants within the population is calculated for each gene , reflecting the effect the insertion has on a strain 's ability to survive radiation exposure . 
+ Using TraDIS , we have identified 46 candidate genes that appear to have a significant role in survival after IR exposure . 
+ These are the focus of this report . 
+ MATERIALS AND METHODS
+ Bacterial strains and primers used in this study . 
+ All strains used in this study are E. coli K-12 derivatives and are listed in Table 1 . 
+ Genetic manipulations were performed as previously described ( 50 ) . 
+ Oligonucleotide primers are listed in Table 2 . 
+ Transposome preparation . 
+ Transposon mutagenesis was performed using the Epicentre EZ-Tn5 transposition system , which consists of a transposase dimer conjugated to transposon DNA ( 51 ) . 
+ The transposon is EZ-Tn5 KAN-2 Tnp and was amplified by using the oligonucleotide ORB1 and Phusion polymerase ( Stratagene ) . 
+ One hundred nanograms of this DNA was incubated with Tnp EK54/MA56/LP372 , a hyperactive transposase with reduced target specificity ( 52 ) , at room temperature for 3 h. Transposome complexes were dialyzed against Tris-EDTA ( TE ) to remove all salt from the reaction mixture before electroporation . 
+ Preparation of electrocompetent cells for mutagenesis . 
+ Cells were cultured in Luria-Bertani ( LB ) broth at 37 °C with aeration to an optical density at 600 nm ( OD600 ) of 0.4 to 0.6 , chilled at 4 °C for 30 min with stirring , harvested by centrifugation , and washed three times with ice-cold 10 % glycerol . 
+ In the final wash , cells were resuspended in 1/500 ( vol/vol ) ice-cold glycerol-yeast extract medium and stored at −80 °C . 
+ One hundred microliters of cell suspension was mixed with 10 μl of transposomes and electroporated in a 2-mm electrode gap cuvette with a GenePulser II ( Bio-Rad ) . 
+ Cells were recovered in 1 ml of SOC medium ( 53 ) and incubated at 37 °C for 1 h and then spread on plates containing 40 mg/ml kanamycin and incubated overnight . 
+ The total number of colonies was estimated by counting colonies on several plates . 
+ The colonies on each plate were pooled sterilely in LB plus 20 % glycerol and stored at −80 °C . 
+ Approximately 5 or more electroporations were performed per strain to generate an insertion mutant pool . 
+ The number of mutants per electroporation ranged from 20,000 to 175,000 . 
+ By estimating the total number of mutants per batch , volumes containing similar numbers of mutants from each batch were pooled to create mutant libraries to contain 5 10 mutants . 
+ IR treatment . 
+ An IR dose of 1,000 Gy was applied iteratively to the mutant libraries by using a Mark I 137Cs irradiator ( from J. L. Shepherd and Associates ) . 
+ Mutant pools were inoculated into 100 ml of LB at an initial OD600 of 0.02 and were grown to an OD600 of 0.2 . 
+ Cells were spun down and resuspended in 0.5 ml LB , IR treated , and then allowed to grow for approximately 7 generations to stationary phase before being used to inoculate the next cycle . 
+ This was repeated five times . 
+ Nonirradiated mock cultures were taken through all five passages in parallel but sat outside the irradiator during treatment . 
+ Fragment library sample preparation , sequencing , and data analysis . 
+ Genomic DNA was isolated after each cycle of irradiation and from the nonirradiated cultures . 
+ DNA from the ﬁrst and ﬁfth cycles was sheared to an average size of 300 bp by using hydroshear sonication . 
+ Preparation of the fragment library for sequencing was performed as described by Illumina , except the PCR ampliﬁcation step was modiﬁed so that only sequences ﬂanking transposons were ampliﬁed , using primers ORB5 , -6 , or -7 and ORB8 . 
+ ORB5 to -7 are custom reverse primers with different indexes for multiplexing . 
+ ORB8 was the forward primer , complementary to the adapter used ( Table 2 ) . 
+ Ampliﬁed fragment libraries were separated on an E-gel size select 2 % agarose gel ( Promega ) , and 270-bp fragments were puriﬁed . 
+ The ampliﬁed DNA fragment libraries were sequenced on single-end Illumina ﬂow cells for 75 cycles in an Illumina genome analyzer IIx . 
+ The sequencing primer , ORB9 , was modified to sequence only transposon-containing DNA fragments .
+ Sequence reads from the Illumina FASTQ ﬁles were separated into reads with tag and reads without . 
+ Reads containing the transposon tag sequence were retained for analysis . 
+ The 10-bp tag was removed , and then reads were trimmed to 50 bp and mapped to the E. coli genome by using BOWTIE ( 54 ) , allowing for 1 mismatch and omitting insertion locations with fewer than 10 reads .
+ All insertion locations in the ﬁrst 1 % and last 10 % of gene regions were removed from further analysis . 
+ Further , genes with three or fewer insertion locations that met our criteria were considered essential . 
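+ The read-processing and filtering rules above can be sketched as follows . This is an illustration under stated assumptions , not the authors ' pipeline : the tag sequence TAG is a made-up placeholder , the tag is assumed to sit at the start of each read , strand orientation is ignored , gene coordinates are taken as 1-based and inclusive , and the mapping step with BOWTIE is not shown .

```python
TAG = "ACGTACGTAC"  # hypothetical 10-bp transposon tag, not taken from the paper
TRIM_LEN = 50       # reads are trimmed to 50 bp after the tag is removed
MIN_READS = 10      # insertion locations with fewer reads are omitted


def process_read(seq, tag=TAG, trim_len=TRIM_LEN):
    """Return the genome-derived part of a tagged read, or None if untagged."""
    if not seq.startswith(tag):
        return None
    return seq[len(tag):len(tag) + trim_len]


def keep_insertion(pos, reads, gene_start, gene_end,
                   min_reads=MIN_READS, head=0.01, tail=0.10):
    """Apply the read-count filter and drop insertions that fall in the
    first 1 % or last 10 % of the gene (strand is ignored in this sketch)."""
    if reads < min_reads:
        return False
    length = gene_end - gene_start + 1
    frac = (pos - gene_start) / length  # 0 at the gene start, close to 1 at the end
    return head <= frac < 1.0 - tail


def is_essential(n_insertion_locations):
    """Genes with three or fewer qualifying insertion locations are called essential."""
    return n_insertion_locations <= 3


# Example with invented values: one tagged read and three candidate insertions.
genomic = process_read(TAG + "A" * 60)
kept = [keep_insertion(p, r, 1, 1000) for p, r in [(500, 12), (5, 50), (950, 50)]]
print(len(genomic), kept, is_essential(sum(kept)))
```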
+ Reads per gene were normalized by the total millions of reads collected for the sample to account for variations in total reads between sequencing runs .
+ Contribution values were calculated for each gene $g$ as the log ratio of reads in the irradiated sample , $n_{g,B}$ , to reads in the nonirradiated sample , $n_{g,A}$ : $\log ( n_{g,B} / n_{g,A} )$ .
+ Genes were analyzed for the decrease in insertion locations per gene in a parallel analysis . 
+ The genes shown below in Table 4 had a contribution value of -0.5 to -2 , indicating a loss of 0.5 to 2 logs of reads under the irradiated versus nonirradiated condition .
+ When genes were found to have a low contribution value yet a large decrease in insertion locations , the genes were checked for single insertion locations with over 1,000 reads , which can be artifacts of library ampliﬁcation . 
+ The genes listed below in Table 4 that were discovered by loss-of-insertion locations rather than decreased read counts were in the top 99.999 % of genes for insertion location losses . 
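+ The normalization and contribution value described in this section can be sketched as below . The function names and read counts are invented for illustration , and a base-10 logarithm is assumed because the text speaks of " logs " of reads ; the paper does not state the base explicitly . With the ratio taken as irradiated over nonirradiated , a loss of reads gives a negative value whose magnitude is the number of logs lost .

```python
import math


def reads_per_million(gene_reads, total_reads):
    """Normalize a gene's read count by the total millions of reads in the sample."""
    return gene_reads / (total_reads / 1_000_000)


def contribution(reads_irr, total_irr, reads_ctl, total_ctl):
    """log10 of normalized reads in the irradiated sample (B) over the
    nonirradiated sample (A) for one gene.

    Returns None when the control has no reads (ratio undefined) and
    -inf when the irradiated sample has none (complete loss).
    """
    n_b = reads_per_million(reads_irr, total_irr)
    n_a = reads_per_million(reads_ctl, total_ctl)
    if n_a == 0:
        return None
    if n_b == 0:
        return float("-inf")
    return math.log10(n_b / n_a)


# Invented counts: 10 reads per million after irradiation versus 100 before,
# which is a 10-fold (one log) loss of normalized reads.
value = contribution(reads_irr=150, total_irr=15_000_000,
                     reads_ctl=3_000, total_ctl=30_000_000)
print(round(value, 3))
```

+ Normalizing before taking the ratio matters here because the samples in this study ranged from 15 to 30 million reads , so raw counts alone would confound sequencing depth with loss of mutants .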
+ IR survival assays for gene validation . 
+ Of the 46 genes discovered , 19 genes were veriﬁed by deleting the gene from the Founder strain via the Wanner method ( 50 ) . 
+ Deletion strains were tested for their ability to survive increasing doses of ionizing radiation in comparison to an isogenic wild-type strain .
+ All strains were tested in biological triplicates . 
+ Cells from a fresh single colony of each strain were cultured in LB broth ( 55 ) at 37 °C with aeration . 
+ After growth overnight , cultures were diluted 1:1,000 into 25 ml fresh LB broth in 125-ml ﬂasks and grown at 37 °C with shaking until an OD600 of 0.4 was reached . 
+ For each sample , 15 ml of culture was spun down and resuspended in 0.8 ml of fresh LB . 
+ One-hundred-microliter aliquots were set on ice as the nonirradiated controls , and the other 700 µl was irradiated in a Mark I 137Cs irradiator ( from J. L. Shepherd and Associates ) for the times corresponding to 1 and 2 kGy ( 7 Gy/min ) .
+ Irradiated samples as well as the nonirradiated control samples for each culture were diluted appropriately and plated on LB -- 15 % agar medium to determine the total number of CFU . 
+ Percent survival was calculated by dividing the titer of the surviving population by the titer of the nonirradiated control sample . 
+ For each strain , 3 to 5 biological replicates were performed . 
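+ The survival arithmetic in this assay can be sketched as follows . The colony counts , dilution factors , and plated volume are invented example values , and the function names are hypothetical ; only the 7 Gy/min dose rate and the definition of percent survival ( titer of survivors divided by titer of the nonirradiated control ) come from the text . The 1,000 Gy dose used in the example is the lower of the two doses reported for the validation assays in Results .

```python
def irradiation_minutes(dose_gy, rate_gy_per_min=7.0):
    """Exposure time needed to deliver `dose_gy` at a constant dose rate."""
    return dose_gy / rate_gy_per_min


def titer(colonies, dilution_factor, plated_ml):
    """CFU per ml estimated from one plate count."""
    return colonies * dilution_factor / plated_ml


def percent_survival(irradiated_titer, control_titer):
    """Titer of the surviving population as a percentage of the control titer."""
    return 100.0 * irradiated_titer / control_titer


# Invented plate counts for one culture.
control = titer(colonies=150, dilution_factor=1e6, plated_ml=0.1)
survivors = titer(colonies=30, dilution_factor=1e3, plated_ml=0.1)

print(f"time for 1,000 Gy: {irradiation_minutes(1000):.1f} min")
print(f"survival: {percent_survival(survivors, control):.4f} %")
```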
+ RESULTS
+ TraDIS was performed to identify genes involved in IR survival .
+ The original directed evolution trials were carried out with an aliquot of E. coli strain MG1655 obtained from F. R. Blattner ( 56 ) . 
+ Deep sequencing revealed 6 mutations in this strain ( designated Founder ) relative to the type strain database ( 26 ) . 
+ For TraDIS , a mutant library consisting of 500,000 insertion mutants was generated in Founder and also in one highly evolved strain , CB2000 . 
+ Each library was subjected to 5 rounds of irradiation followed by competitive outgrowth , as diagrammed in Fig. 1 . 
+ A nontreated control was taken through the entire experiment , treated identically except that it sat outside the irradiator during treatment . 
+ To identify genes that contribute to IR survival , genomic DNA from the mutagenized population was isolated after the ﬁrst and ﬁfth IR treatment passages . 
+ TraDIS was carried out using Illumina high-throughput sequencing to identify genes that , when disrupted , caused cells to drop out of the population after IR treatment , as illustrated for uvrB in Fig. 1B .
+ The method can be veriﬁed in part by examining the effects of irradiation on the insertion patterns of genes known to have major roles in the repair of DNA damage inﬂicted by IR . 
+ As expected , uvrB , which encodes a key component of the complex that promotes nucleotide excision repair , exhibited a diagnostic two-part insertion pattern reﬂecting an essential role in IR resistance ( step 1 ) . 
+ Numerous insertions were present in the uvrB gene in the nonirradiated populations , indicating that the gene is not essential for normal growth ( step 2 ) . 
+ Transposon-directed sequence reads were reduced in this gene after IR treatment passage 1 , and there were no sequence reads in this gene after IR treatment passage 5 ( Fig. 1 ) . 
+ These results suggest that any cells that had insertions in uvrB rapidly dropped out of the population upon treatment . 
+ General sequencing results . 
+ The protocol described above generated 15 to 30 million reads per sample . 
+ The reads were mapped to the E. coli genome , and the number of unique transposon insertion sites and the average base pair distance between inserts for each sample were calculated ( Table 3 ) . 
+ These results suggested that the mutagenesis was saturating with 1 insertion per 40 to 50 bases for the mutant pool after 1 passage . 
+ For the mutant pool after 5 passages , 1 transposon insertion per 100 to 200 bases was detected . 
+ The decline was expected , as it was previously reported that passaging reduces the number of unique mutants in the pool , even in the absence of stress ( 45 ) . 
+ This is due to genetic bottlenecks that occur during passaging and competition between strains with different mutations . 
+ Insertion densities were calculated for the mutant pools from each passage , as previously described ( Table 3 ) ( 57 ) . 
+ The gene length boundaries were calculated to determine the minimum length of a gene ( in bp ) required to ensure that the absence of sequenced transposon insertions signaled an essential gene function rather than a random chance occurrence ( P < 0.05 ) .
+ This value differed by sample due to the varied insertion densities obtained for each sample . 
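+ One simple way to see why the gene length boundary depends on insertion density is sketched below . It assumes that insertions land uniformly and independently along the genome , so that a gene of L bp escapes insertion with probability ( 1 - 1/d )^L when there is on average one insertion per d bp . This model and the function name min_gene_length are assumptions made for illustration ; the calculation actually used follows reference 57 and may differ .

```python
import math


def min_gene_length(bp_per_insertion, alpha=0.05):
    """Smallest gene length (bp) for which the chance of seeing no insertion
    at all falls below `alpha`, under a uniform, independent insertion model.

    `bp_per_insertion` is the average spacing between insertions and must
    be greater than 1.
    """
    p_hit = 1.0 / bp_per_insertion               # per-base insertion probability
    ratio = math.log(alpha) / math.log(1.0 - p_hit)
    return math.floor(ratio) + 1                 # first length strictly past the bound


# Spacings spanning the densities reported after passage 1 (40 to 50 bp)
# and after passage 5 (100 to 200 bp).
for spacing in (40, 50, 100, 200):
    print(spacing, min_gene_length(spacing))
```

+ Under this model the boundary grows roughly in proportion to the spacing between insertions , which is consistent with the boundary differing by sample .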
+ We note that approximately 670 genes in the E. coli genome are required for normal growth in an unstressed environment under our growth conditions as indicated by the absence of insertions in these genes in our nonirradiated control that met our threshold criteria ( see Materials and Methods ) . 
+ These genes are summarized in Table S1 in the supplemental material . 
+ We are thus effectively screening the approximately 3,555 genes denoted nonessential in our nonirradiated sample ( approximately 84 % of the genome ) . 
+ Our goal was to identify those genes that are not necessary during normal growth but which become important when cells are heavily dosed with ionizing radiation . 
+ Essential genes have previously been surveyed in E. coli . 
+ Of 620 genes denoted essential in one survey that covered 87 % of the genomic open reading frames in E. coli ( 58 , 59 ) , approximately 55 % overlap the essential genes found in our study . 
+ A second survey carried out under different conditions produced a list of 300 essential genes ( 60 ) , of which 94 % appeared essential in our study . 
+ The differences observed between these three studies are likely due to different growth conditions , the presence or absence of competitive outgrowth , the approach for distinguishing essential from nonessential genes , and the depth to which the mutant libraries were assayed .
+ Because of the requirement for outgrowth in our protocol , any gene inactivation that produces a sufﬁcient decline in growth rate under our conditions will lead to that gene 's inclusion on the list of essential genes . 
+ Identiﬁcation of genes involved in IR survival . 
+ After removal of the transposon tag sequence , each read was mapped to the E. coli genome . 
+ The first genome-derived base pair of each read defined the genomic location of each transposon insertion within the mutant pool .
+ The number of transposon insertion locations for each gene was used to calculate the relative contributions of nonessential genes to IR survival . 
+ Contribution values were only calculated for genes with at least three independent insertion sites to reduce variability that can result in misleading ﬁtness calculations ( 61 ) . 
+ We identiﬁed genes that , when disrupted , resulted in reduced IR survival ﬁtness after passage 1 and after passage 5 . 
+ A total of 46 genes were thus identiﬁed in the Founder strain ( Tables 4 and 5 ) . 
+ We also noted that well over 90 % of the nonessential genes in the E. coli genome exhibited little or no difference in the observed insertion patterns with or without irradiation . 
+ The genes of interest in this study are those exhibiting transposon insertion patterns similar to uvrB in Fig. 1 , and they are listed in Tables 4 and 5 . 
+ Deletion or alteration of some genes involved in DNA repair is known to result in slow growth phenotypes in rich media ( 62 -- 70 ) .
+ The otherwise-nonessential recA protein , clearly important for IR resistance , is not present in our list because strains with alterations resulting in recA gene inactivation grow somewhat slower and are unable to compete with the broader population during outgrowth . 
+ A total of 18 of the 46 genes listed in Tables 4 and 5 exhibited patterns that reﬂected somewhat slow growth , although the decline in growth rate was insufﬁcient to remove them from our screen at least in passage 1 ( Tables 4 and 5 ) . 
+ Cells disrupted for these genes had reduced ﬁtness upon irradiation in passage 1 . 
+ By passage 5 , insertions in these genes disappeared from both the irradiated and the nonirradiated samples , eliminated competitively during outgrowth . 
+ Interestingly , two genes , recR and rep , appeared to be essential for IR survival as early as the ﬁrst IR exposure . 
+ By passage 5 , there were no insertions in these genes in the nonirradiated control samples .
+ a Contributions were calculated as described in Materials and Methods .
+ Genes listed here had the largest decrease in reads upon irradiation compared to the nonirradiated control , with contribution factors between 0.5 and 2 logs .
+ * , the gene was discovered by analyzing loss of unique insertion locations .
+ Validation in this study indicates that the gene was assayed for its contribution to IR resistance as shown in Fig. 2 .
+ We hypothesize that these genes are essential for surviving IR but also make a modest contribution to growth in rich media . 
+ These genes have been reported to be important for normal growth ( 59 , 60 ) . 
+ To further investigate the importance to general radiation resistance of the 46 genes identiﬁed here in Founder , TraDIS was performed on CB2000 , a strain of E. coli previously reported to be highly radiation resistant ( 26 ) . 
+ The 9 genes with the largest contributions to IR resistance ( Table 4 ) were also identiﬁed as top contributors in CB2000 after passage 1 , in spite of the presence of all CB2000 mutations that confer an IR resistance phenotype . 
+ This result validates their importance for timely recovery from damage inﬂicted by IR . 
+ A total of 37 of the reported genes for MG1655 ( 80 % ) were identiﬁed as important in CB2000 as well . 
+ Data were ambiguous for 5 of the 9 genes that were required for MG1655 but not CB2000 . 
+ This was likely due to the genetic bottlenecks that occur during passaging or slow growth of these mutants .
+ a These genes were identified to be important for passage 1 , but cells lacking these genes were likely outcompeted due to growth defects associated with the gene deletion , and there were no data for passage 5 ( in either the nonirradiated control or the irradiated sample ) .
+ Contributions were calculated as described in Materials and Methods .
+ * , the gene was discovered by analyzing loss of unique insertion locations .
+ ** , uvrD exhibited the greatest observed loss of unique insertion locations .
+ Validation in this study indicates that the gene was assayed for its contribution to IR resistance as shown in Fig. 2 .
+ Four genes , pepP , rsxA , crr , and tatC , appear to be important for survival in wild-type E. coli but not the directly evolved CB2000 . 
+ This suggests that one or more of the 69 mutations arising in CB2000 ( 25 ) render these four genes dispensable for IR survival . 
+ IR resistance gene validations . 
+ To directly verify a subset of genes identiﬁed as putative IR resistance genes by TraDIS , we separately deleted 31 of the 46 genes identiﬁed from E. coli MG1655 ( e14 ) and assayed each deletion mutant for survival following exposure to 1,000 and 2,000 Gy ( Fig. 2 ) . 
+ Two of these ( tatC and waaC ) had viability issues that made it impossible to generate survival curves . 
+ The tatC gene is nonessential , as it encodes a component of a system that transports folded proteins from the cell to the outer membrane . 
+ Deletions of the gene compromise the integrity of the outer membrane and render it sensitive to many stresses ( 71 -- 74 ) .
+ The waaC gene ( formerly rfaC ) encodes heptosyltransferase I , which catalyzes a step in the synthesis of outer membrane lipopolysaccharide . 
+ As in the case of tatC , deletions of waaC may render the cell particularly sensitive to stress ( 75 ) , even though the gene may be scored nonessential . 
+ The other 29 exhibited IR recovery deﬁciencies that were readily documented with survival curves , helping to validate that the overall screen was identifying genes of interest . 
+ From our tests , these 29 deletion mutants were clustered into four different groups based on the overall decline in cell survival upon their deletion . 
+ They were numbered 1 through 4 in order of increasing severity of the observed sensitivity to ionizing radiation . 
+ The genes in group 1 ( rdgC , ftsP , radA , rsxB , topB , recX , speA , yabI , yhgF , ybjN , tolA , and ompA ) exhibit relatively modest effects when deleted , with a decline in survival of just over an order of magnitude or less at 2,000 Gy . 
+ Those in group 2 ( pgi , rsxA , dnaJ , yafC , prc , uup , and crr ) exhibit a decline in survival of 1 to 2 orders of magnitude at 2,000 Gy . 
+ The group 3 genes ( uvrA , uvrB , yejH , sbcB , yebC , ybgI , and pstS ) had declines in survival of approximately 2 to 3 orders of magnitude . 
+ Those gene products in group 4 produce the most dramatic effects , a 3- to 5-log decline in survival at 2,000 Gy when deleted ( Fig. 2 ) .
+ This ﬁnal group features three genes ( recF , recG , and recN ) that have long been associated with IR survival . 
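+ The grouping by severity can be sketched as a simple binning of the log decline in survival relative to wild type at 2,000 Gy . The cutoffs of 1 , 2 , and 3 logs are a simplification assumed here ( the text describes group 1 as " just over an order of magnitude or less " , so the actual boundaries were not strictly numeric ) , and the survival values in the example are invented .

```python
import math


def log_decline(mutant_pct_survival, wildtype_pct_survival):
    """Orders of magnitude by which the mutant survives worse than wild type."""
    return math.log10(wildtype_pct_survival / mutant_pct_survival)


def sensitivity_group(decline_logs):
    """Bin a log decline into groups 1 to 4 using assumed cutoffs of 1, 2, 3 logs."""
    if decline_logs < 1.0:
        return 1
    if decline_logs < 2.0:
        return 2
    if decline_logs < 3.0:
        return 3
    return 4


# Invented example: wild type at 10 % survival, deletion mutant at 0.05 %.
decline = log_decline(mutant_pct_survival=0.05, wildtype_pct_survival=10.0)
print(round(decline, 2), sensitivity_group(decline))
```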
+ We note that one other gene initially identiﬁed in the screen and subjected to validation by this method ( the mrcB gene , not included in Table 5 ) turned out to be a false positive . 
+ Although the screen exhibited a very low error level , it is thus possible that one or two additional listings in Table 4 are errant . 
+ We thus estimate the false-positive rate to be 5 % . 
+ Contributions to IR resistance . 
+ The list of gene functions required for survival after extreme IR exposure ( Table 4 ) generally continues themes that were evident in the directed evolution study that examined the genetic adaptations required for extreme resistance to IR ( 25 ) . 
+ In the overall list , 20 of the 46 genes ( or 43 % ) can be clearly deﬁned as DNA repair or DNA metabolism functions . 
+ DNA repair functions have already been amply implicated in recovery from IR exposure ( 11 -- 19 ) , and the heavy representation of DNA repair functions in this list helps to verify the screen . 
+ In the directed evolution study ( 25 ) , a small number of mutations in genes involved in DNA repair provide the major contributions to extreme IR resistance in one isolate ( CB2000 ) derived from directed evolution . 
+ Whereas the idea that DNA is the major target of IR that results in lethality has been challenged in recent years ( 20 -- 24 ) , a range of key DNA repair systems must be intact in order for the cell to survive extreme IR exposure . 
+ Additional contributions are evident . 
+ An additional 8 of the genes identiﬁed ( 17 % ) have not been functionally characterized and represent a new class of genes to be studied for a role in surviving IR exposure ( Fig. 2D ) . 
+ We have directly validated 7 of these ( for reasons that are unclear , it was not possible to make deletion mutants of yqiA ) , demonstrating that they are indeed important to survival when cells are exposed to IR . 
+ In three cases ( yabI , yhgF , and ybjN ) , elimination of gene function has just a modest effect on IR survival and the genes fall into group 1 . 
+ One gene , yafC , falls into group 2 . 
+ Three genes ( yejH , ybgI , and yebC ) have quite substantial effects on IR survival and fall into group 3 . 
+ This result represents the ﬁrst observed phenotype for many of these genes . 
+ A few things are known about the genes that fall into group 3 .
+ Although the function remains enigmatic , the structure of the YbgI protein is known . 
+ It is a toroidal structure consisting of a trimer of dimers , wherein each subunit exhibits two metal binding sites on the inside of the toroid ( 76 ) . 
+ The ybgI gene shares an operon with the gene that encodes endonuclease VIII , eliciting some speculation about a possible DNA repair function ( 76 ) . 
+ The product of the yebC gene appears to have a function in the transcriptional regulation of the RuvABC proteins , which are all involved in recombinational DNA repair ( 77 ) . 
+ The yejH gene is worthy of special mention . 
+ It encodes a putative DNA helicase with significant homology to the human XPB gene , which encodes a nucleotide excision repair helicase conserved in eukaryotes and archaea .
+ By sequence analysis , the protein possesses the 7 helicase motifs central to superfamily 2 helicases in the N-terminal 350 amino acids .
+ It plays a substantial role in survival after IR exposure . 
+ The remaining 19 genes cluster into 5 major categories , which are deﬁned in Table 6 . 
+ Four of the genes fall into a category of oxidative stress signaling ( pgi , speA , and rsxAB ) , in line with the idea that amelioration of protein oxidation is a major mechanism of IR resistance ( 20 -- 24 ) . 
+ Four of these were directly validated . 
+ The pgi and rsxA genes fall into group 2 , while rsxB and speA are group 1 genes .
+ The pgi gene has a complex involvement , as described below . 
+ Also as described below , genes in other categories may affect oxidative stress .
+ The remaining genes have roles in cell wall structure and biosynthesis ( 7 ) , protein stability and turnover ( 5 ) , cell division ( 4 ) , and central metabolism ( 2 ) .
+ This listing does not include a few gene products , such as pgi , that fall into more than one functional category ( Table 6 ) . 
+ Of interest , 5 of the genes listed in Table 6 , pgi , prc , tatC , recF , and pepP , are part of a broader network of 93 genes believed to play a role in promoting the stress-induced mutagenesis ( SIM ) response of E. coli K-12 ( 78 ) .
+ Table 6 : Cellular function | clustered genes | Genes a
+ DNA metabolism | 21 ( 46 % ) | recN , uvrABCD , recD , recF * , recO , sbcB , endA , phr , uup , gph , rdgC , yejH , topB , recX , recR , rep , radA , recG
+ Cell wall structure and biosynthesis | 7 | rfaC , prc * , ompA , tolA , pstS , tatC * , crr
+ Unknown | | yafC , ybjN , yqiA , yhgF , yabI , ybgI , yebC
+ Cell division | | tolA * , slmA , ftsN , ftsP
+ Oxidative stress signaling | | pgi * , rsxAB , speA *
+ Protein stability and turnover | 3 | pepP * , prc * , dnaJ
+ Central metabolism | |
+ SIM response | |
+ Two genes identiﬁed in the screen are not included in Table 6 : trmH and rlmL . 
+ These may contribute to IR resistance . 
+ However , they may have been identiﬁed as IR resistance genes due to possible polar effects on genes immediately downstream that are known to be involved in radiation survival : recG and uup , respectively . 
+ We have not directly tested these insertions to conﬁrm the presence of polar effects . 
+ However , each of these genes is in the same operon as and coexpressed with the indicated downstream genes . 
+ The requirements for genes involved in protein stability and turnover , as well as those involved in oxidative stress signaling , can likely be understood in the context of current research from the Daly and Radman groups that indicates that protein oxidation is a major deleterious effect of IR ( 21 , 24 , 44 , 79 , 80 ) . 
+ As is the case for DNA , proteins are a target of IR-mediated damage . 
+ Among the mutations underlying the acquired extreme resistance to IR documented in the directed evolution study ( 25 ) , mutations in rsxB , which encodes part of a system that controls the cellular response to reactive oxygen species , and in gsiB , which encodes a glutathione transporter , were apparently fixed in population IR-2-20 .
+ Each makes a small but measurable contribution to the acquired IR resistance of evolved strain CB2000 . 
+ The current study ( Table 6 ) indicates that multiple cellular systems involved in ameliorating the effects of oxidative damage play a signiﬁcant role in IR resistance . 
+ The pgi gene deserves special mention . 
+ The product of the pgi gene , phosphoglucose isomerase , catalyzes the second step in glycolysis . 
+ However , it is not an essential gene due to the metabolic bypass provided by the pentose phosphate pathway . 
+ The pentose phosphate pathway generates NADPH , which is interconverted with NADH by the NADH/NADPH transhydrogenases encoded by the genes udhA and pntAB . 
+ One result is a substantial increase in the electrons fed into oxidative phosphorylation and a resultant increase in the production of damaging reactive oxygen species . 
+ Cells deleted for pgi generate suppressor mutations in rpoS , udhA , and pntAB under some growth conditions ( 81 ) . 
+ They also lose the e14 prophage to deletion ( 81 ) . 
+ We note that deletion of the e14 prophage is a stress indicator , and this was the ﬁrst genomic alteration detectable in all trials of our experiment in directed evolution of radiation resistance ( 26 ) . 
+ The role of pgi in oxidative stress may be much more complex . 
+ The Pgi protein was identiﬁed as an interaction partner with the YejH protein in a global search for E. coli protein interactions ( 82 ) , a result we have conﬁrmed ( R. Byrne , unpublished results ) . 
+ YejH is one of the proteins of unknown function identiﬁed in the current study as essential for recovery from heavy IR exposure ( Table 4 ) . 
+ A pgi deletion mutant is hypersensitive to oxidative stress induced by paraquat ( 83 ) , UV sensitive ( 78 ) , defective for the rpoS response ( 78 ) , and defective for spontaneous SOS induction ( 78 ) .
+ The pgi gene is part of the soxR regulon ( 84 , 85 ) . 
+ The pgi gene product may have functions outside its role in glycolysis , perhaps working upstream of rpoS in a pathway that leads to stress-induced mutagenesis ( 78 ) . 
+ A few of the genes listed under other categories may actually affect the cellular oxidative response . 
+ For example , cells with a deletion of the speA gene , which is involved in polyamine biosynthesis , are sensitive to H2O2 ( 86 ) , and we have thus listed speA under both categories . 
+ At least one of the genes of unknown function may also be linked to the oxidative damage response . 
+ The ybjN gene encodes a protein with structural homology to the DR1245 protein of Deinococcus radiodurans and type III secretion system chaperones ( 87 ) . 
+ Overexpression of ybjN leads to induction of the SOS response ( 88 ) . 
+ In general , the product of the ybjN gene appears to play a broad role in cellular survival under conditions of stress ( 88 ) . 
+ Selected aspects of intermediary metabolism are also doubtlessly linked to stress responses .
+ The crr gene identiﬁed in the current screen encodes the phosphotransfer protein EIIA ( glc ) , which is a component of three different sugar transport systems ( glucose , trehalose , and maltose ) ( 89 ) . 
+ EIIA ( glc ) is also a negative regulator of other carbohydrate utilization pathways ( glycerol , lactose , melibiose , and maltose ) and negatively regulates rpoS . 
+ It may positively regulate adenylate cyclase , which controls transcription of genes involved in the stress response . 
+ Survival after IR exposure may require some ﬂexibility in carbon source utilization . 
+ PstS is a periplasmic protein that binds phosphate as part of the phosphate transport system ( 71 , 90 -- 93 ) . 
+ Phosphate limitation itself can trigger a stress response in bacteria . 
+ The combination of limiting phosphate and radiation damage may be synergistic in the effects on lethality . 
+ The requirements for several genes involved in outer membrane structure and biosynthesis continue an additional theme seen in our directed evolution study ( 25 ) . 
+ In the evolved strain CB2000 , mutations in the genes wcaK and nanE again made small but measurable contributions to the acquired IR resistance phenotype ( 25 ) . 
+ These genes encode enzymes involved in the synthesis and/or recycling of peptidoglycan or surface polysaccharides . 
+ In the present study , eight additional genes that contribute to cell wall structure and biosynthesis were identiﬁed that make signiﬁcant contributions to IR survival . 
+ After DNA repair , this is the largest number of genes concentrated in any particular function . 
+ The importance of the bacterial cell wall as a target of IR-mediated damage has not yet been adequately assessed . 
+ The results of these two studies suggest that , in addition to DNA damage and protein inactivation via oxidation , the integrity of the bacterial cell wall , or particular substructures within it , may represent a signiﬁcant factor in the overall lethality of IR . 
+ In each case , the gene functions identiﬁed in the new study can be dispensed with under normal growth conditions , but they become important upon IR exposure . 
+ We can suggest at least three mechanisms that might be at work . 
+ First , the cell wall , particularly the outer membrane , could have substructures that are effectively weak points that are particularly sensitive to damage inﬂicted by IR . 
+ Alternatively , there may be key enzymatic or transport steps that have broad signiﬁcance for cell wall or membrane integrity . 
+ The relevant enzymes or transporters could become essential under stress , as we suggested above for tatC and waaC . 
+ The TolA protein is the inner membrane protein that links to Pal , an outer membrane protein , to maintain cell envelope integrity . 
+ This linkage is likely important under stressed conditions , such as irradiation ( 94 -- 98 ) . 
+ Second , there may be alterations to the peptidoglycan that are part of a general cellular response to stress that are critical to IR survival . 
+ Peptidoglycan plays an important role in osmotic regulation and cell shape ( 99 ) . 
+ Changes to its structure accompany a number of cell stresses , including the nutritional stress that leads to the onset of stationary phase ( 99 , 100 ) . 
+ Third , a somewhat different cell shape may be better suited to IR survival in ways that are hard to predict .
+ The ompA gene , which was mutated in a few isolates studied in the original directed evolution study ( 25 ) and found to contribute to survival in this study , has a proposed role in mediating cell shape . 
+ Cells deleted for ompA have unstable outer membrane structures and the cells tend to be spherical ( 101 , 102 ) . 
+ Additionally , alterations in one of the 12 penicillin binding proteins that catalyze synthesis of peptidoglycan were documented in a long-term evolution experiment that showed increased cellular fitness in a particular medium , and this in turn produced alterations in cell shape ( 103 ) .
+ Additional mechanisms may be considered , and this list is not intended to be exhaustive . 
+ The present study was carried out to identify genes that are critical for survival when cells are exposed to ionizing radiation but which were not necessarily targets of mutation in our recent directed evolution study ( 25 ) . 
+ However , of the 46 genes identiﬁed in this study , 9 ( 19 % ) acquired mutations in one or more of the sequenced isolates characterized as part of that earlier study ( 25 ) . 
+ As the present study indicates that loss of many of these gene functions results in signiﬁcant radiation sensitivity , it is tempting to speculate that the mutations identiﬁed in the earlier study may be either neutral or gain-of-function mutations . 
+ This conclusion must be tempered by the fact that each of the IR-resistant E. coli strains in which the mutations appear has dozens of additional mutant loci that could potentially act as functional suppressors ( 25 ) . 
+ Of the nine , the nonsynonymous mutation in rsxB has already been discussed . 
+ A mutation upstream of tolA was ﬁxed in another of the four separately evolved populations ( IR-3-20 ) ( 25 ) , indicating that a change in the expression of this operon could be beneﬁcial to survival . 
+ A nonsynonymous mutation was also ﬁxed in prc in IR-3-20 . 
+ While no mutations were found among the evolved isolates in ftsN or ftsP , mutations were common in ftsW and ftsZ among isolates of two subpopulations in IR-1-20 , suggesting that alterations of the cell division process might contribute to IR resistance .
+ Other mutations in our previous study were identiﬁed in recG , dnaJ , ompA , tatC , and yejH . 
+ The yejH and dnaJ mutations appeared to be ﬁxed in the isolates taken from the further evolution of strain CB1000 , and it is possible that the observed mutations in these two genes provide a useful gain of function in the context of extreme exposure to ionizing radiation . 
+ However , the mutations in tatC , recG , and ompA appeared in only one or a few IR-resistant isolates , providing little in the way of a pattern to indicate that the mutations in these genes contribute signiﬁcantly to IR resistance . 
+ DISCUSSION
+ Combined with the directed evolution study ( 25 ) , this work reveals a multifaceted and nuanced cellular approach to surviving IR . 
+ The published directed evolution study documents that enhancements to DNA repair processes can make major contributions to an extreme IR resistance phenotype acquired by directed evolution ( 25 ) , even in a genetic background that is otherwise unaltered . 
+ At the same time , contributing mutations appear that provide potential enhancements to cellular systems for protein oxidation amelioration , protein folding and stability , cell division , and maintenance of cell wall structure and function . 
+ Roles for the same cellular functions are evident in the screen carried out here . 
+ A more complete description of the molecular basis of extreme IR resistance might thus consist of ( i ) enhanced DNA repair processes , ( ii ) an enhanced capacity to prevent or ameliorate the effects of protein oxidation , including protein stabilization/refolding , ( iii ) an appropriate control of cell division to ensure that DNA repair can be completed , and ( iv ) an enhancement of key processes affecting the structure and function of the cell wall and maintenance of cell wall integrity . 
+ The current study provides a general screen of gene functions that are not required for normal growth but which become necessary when cells are exposed to high levels of ionizing radiation . 
+ There are at least 46 genes required for cells to recover from IR exposure . 
+ As beﬁts the need to repair IR damage to DNA , DNA repair functions predominate , with 20 identiﬁed genes falling into this category . 
+ Several of these genes , particularly recF , recN , and recG , make very substantial contributions to survival and represent genes long known to be required for survival after IR exposure ( 12 , 104 -- 106 ) . 
+ The results , not surprisingly , highlight the importance of general DNA repair when cells are exposed to ionizing radiation . 
+ At the same time , the requirements for gene functions involved in protein structure stabilization and turnover , the response to oxidative damage , and the maintenance of bacterial cell wall structure and function continue themes that were evident in the directed evolution study ( 25 ) . 
+ Our assessment of genes that contribute to IR survival implicated eight genes of previously unknown function in the recovery of cells , with ﬁve of them validated . 
+ By utilizing high-throughput screening techniques with a simple organism such as E. coli under various growth conditions , we can begin to identify the functions of these enigmatic genes by identifying growth conditions or stress conditions where these genes become essential . 
+ A follow-up study on the cellular role of these eight genes will begin to unravel the basis of their contributions and potentially deﬁne their cellular functions . 
+ This in turn may help deﬁne the role of their homologs in archaeans and eukaryotes . 
+ Of the eight genes , three have homologs identiﬁed in all three domains of life , and all eight have homologs in eukaryotes . 
+ In no instance has the function of one of these homologs been described in detail , although some hints about YbjN , YebC , YejH , and YbgI functions have appeared ( 76 , 77 , 87 , 88 ) . 
+ In several cases , we provide here the ﬁrst phenotype described for cells lacking the functions of these genes . 
+ Among the genes with no previously described phenotype , we were particularly interested in yejH because of the dramatic TraDIS proﬁle and the IR sensitivity of cells lacking a functional copy . 
+ We have found that this gene exhibits signiﬁcant homology to the human gene encoding XPB . 
+ This gene clusters into category 3 , along with two other genes of unknown function ( yebC and ybgI ) and the uvrA , uvrB , and sbcB genes of known function that have been studied for their roles in DNA repair for decades . 
+ Based on the gene sequence , an initial characterization ( R. Byrne and S. Chen , unpublished data ) , and the new phenotype of the yejH gene described here , we hypothesize that YejH is involved in cellular repair of DNA after IR exposure . 
+ Characterization of this gene is the subject of ongoing work . 
+ ACKNOWLEDGMENTS
+ This work was supported by the National Institutes of General Medical Science , grant GM32335 . 
+ We thank John R. Battista for helpful discussions during the course of this work and for comments on the manuscript . 
+ We similarly thank Diana Downs for helpful discussions in planning the screen and Marie Adams for sequencing advice and guidance .
\ No newline at end of file
--- a/data/TEXT_FILES/useful_txt/25177315.txt 0 → 100644
+++ b/data/TEXT_FILES/useful_txt/25177315.txt 0 → 100644
+ Aromatic inhibitors derived from ammonia-pretreated
+ 1 Great Lakes Bioenergy Research Center , University of Wisconsin-Madison , Madison , WI , USA 2 Department of Biomolecular Chemistry , University of Wisconsin-Madison , Madison , WI , USA 3 Department of Biochemistry , University of Wisconsin-Madison , Madison , WI , USA 4 Department of Chemistry , University of Wisconsin-Madison , Madison , WI , USA 5 Pacific Northwest National Laboratory , Richland , WA , USA 6 Department of Chemical and Biological Engineering , University of Wisconsin-Madison , Madison , WI , USA 7 Department of Bacteriology , University of Wisconsin-Madison , Madison , WI , USA 
+ INTRODUCTION
+ Elucidation of metabolic and regulatory barriers in microbial conversion of lignocellulosic sugars to ethanol is crucial for both the immediate goal of economical cellulosic ethanol and for the long-term development of next-generation biofuels and sustainable chemicals from renewable biomass . 
+ Efficient conversion of lignocellulose ( LC ) hydrolysates is limited by multiple factors ( Mills et al. , 2009 ; Lau and Dale , 2010 ) , including high osmolarity ( Underwood et al. , 2004 ; Purvis et al. , 2005 ; Miller and Ingram , 2007 ) , toxicity of the conversion products ( Ingram and Buttke , 1984 ) , and inhibitors of microbial metabolism and growth generated during the deconstruction of LC ( Zaldivar et al. , 1999 ; Wang et al. , 2011a ; Tang et al. , submitted ) . 
+ Understanding and overcoming the barriers created by LC-derived inhibitors presents significant challenges as their composition can vary depending on the biomass source of LC , the methods used to deconstruct the LC , and the diverse metabolic and regulatory responses of microbes to inhibitors ( Klinke et al. , 2004 ; Liu , 2011 ) . 
+ Synergy among the inhibitors , the high osmolarity inherent to hydrolysates , and toxicity of conversion products ( e.g. , ethanol ) are additional factors that contribute to the complex molecular landscape of lignocellulosic hydrolysates ( Klinke et al. , 2004 ; Liu , 2011 ; Piotrowski et al. , 2014 ) . 
+ Release of sugars from LC typically requires either acidic or alkaline treatment of biomass prior to or coupled with chemical or enzymatic hydrolysis ( Chundawat et al. , 2011 ) . 
+ Acidic treatments generate signiﬁcant microbial inhibitors by condensation reactions of sugars ( e.g. , furfural and 5-hydroxymethylfurfural ) . 
+ Microbes typically detoxify these aldehydes by reduction or oxidation to less toxic alcohols or acids ( Booth et al. , 2003 ; Herring and Blattner , 2004 ; Marx et al. , 2004 ; Jarboe , 2011 ) , but these conversions also directly or indirectly consume energy that otherwise would be available for biofuel synthesis ( Miller et al. , 2009a , b ) . 
+ The impact of these inhibitors is especially significant for C5 sugars like xylose , whose catabolism provides slightly less cellular energy ( Lawford and Rousseau , 1995 ) , and can be partially ameliorated by replacing NADPH-consuming enzymes with NADH-consuming enzymes ( Wang et al. , 2013 ) . 
+ Alkaline treatments , for instance with ammonia , are potentially advantageous in generating fewer toxic aldehydes , but the spectrum of inhibitors generated by alkaline treatments is less well characterized and their effects on microbial metabolism are less well understood . 
+ We have developed an approach to elucidate the metabolic and regulatory barriers to microbial conversion in LC hydrolysates using ammonia ﬁber expansion ( AFEX ) of corn stover , enzymatic hydrolysis , and a model ethanologen ( GLBRCE1 ) engineered from the well-studied bacterium E. coli K-12 ( Schwalbach et al. , 2012 ) . 
+ Our strategy is to compare anaerobic metabolic and regulatory responses of the ethanologen in authentic AFEX-pretreated corn stover hydrolysate ( ACSH ) to responses to synthetic hydrolysates ( SynHs ) designed to mimic ACSH with a chemically deﬁned medium . 
+ GLBRCE1 metabolizes ACSH in exponential , transition , and stationary phases but , unlike growth in traditional rich media ( Sezonov et al. , 2007 ) , GLBRCE1 enters stationary phase ( ceases growth ) long before depletion of available glucose but coincident with exhaustion of amino acid sources of organic nitrogen ( Schwalbach et al. , 2012 ) . 
+ The growth-arrested cells remain metabolically active and convert the remaining glucose , but not xylose , into ethanol ( Schwalbach et al. , 2012 ) . 
+ Our ﬁrst version of SynH ( SynH1 ) matched ACSH for levels of glucose , xylose , amino acids , and some inorganics , overall osmolality , and the amino-acid-dependent growth arrest of GLBRCE1 ( Schwalbach et al. , 2012 ) . 
+ However , gene expression proﬁling revealed that SynH1 cells experienced signiﬁcant osmotic stress relative to ACSH cells , whereas ACSH cells exhibited elevated expression of efﬂux pumps , notably of aaeAB that acts on aromatic carboxylates ( Van Dyk et al. , 2004 ) , relative to SynH1 cells ( Schwalbach et al. , 2012 ) . 
+ Osmolytes found in ACSH ( betaine , choline , and carnitine ) likely explained the lower osmotic stress , whereas phenolic carboxylates derived from LC ( e.g. , coumarate and ferulate ) likely explained efﬂux pump induction possibly via the AaeR and MarA/SoxS/Rob regulons known to be induced by phenolic carboxylates ( Sulavik et al. , 1995 ; Dalrymple and Swadling , 1997 ) . 
+ We also observed elevated expression of psp , ibp , and srl genes associated with ethanol stress at ethanol concentrations three-fold lower than previously reported to induce expression ( Yomano et al. , 1998 ; Goodarzi et al. , 2010 ) and thus consistent with a synergistic stress response with the LC-derived inhibitors . 
+ These ﬁndings led us to hypothesize that the collective effects of osmotic , ethanol , and LC-derived inhibitor stresses created an increased need for ATP and reducing equivalents that was partially offset in early growth phase by catabolism of amino acids , as N and possibly S sources . 
+ However , as these amino acids are depleted , cells transition to stationary phase where they continue to catabolize glucose for maintenance ATP and NAD ( P ) H but are unable to generate sufﬁcient energy for cell growth or efﬁcient xylose catabolism . 
+ To test this hypothesis , we developed a new SynH formulation ( SynH2 ) that faithfully replicates the physiological responses in ACSH and the effects of LC-derived inhibitors . 
+ Using SynH2 with and without the LC-derived inhibitors , we generated and analyzed metabolomic , gene expression , and proteomic data to deﬁne the effects of inhibitors on bacterial gene expression and physiology . 
+ The analysis allowed identification of key regulators that may provoke stress responses in the presence of LC-derived inhibitors and suggests that the coping mechanisms employed by E. coli to deal with lignocellulosic stress drain cellular energy , thus limiting xylose conversion . 
+ MATERIALS AND METHODS
+ REAGENTS
+ Reagents and chemicals were obtained from Thermo Fisher Scientiﬁc ( Pittsburgh , Pennsylvania , USA ) or Sigma Aldrich Co. ( Saint Louis , Missouri , USA ) with the following exceptions . 
+ 5-hydroxymethyl-2-furancarboxylic acid and 5 - ( hydroxymethyl ) furfuryl alcohol were obtained from Toronto Research Chemicals Inc. ( Toronto , Ontario , Canada ) . 
+ Deuterated compounds for HS-SPME-GC/IDMS were obtained from C/D/N Isotopes ( Pointe-Claire , Quebec , Canada ) . 
+ D4-acetaldehyde and U13C6-fructose were obtained from Cambridge Isotope Labs ( Andover , Massachusetts , USA ) . 
+ SYNTHESIS OF FERULOYL AND COUMAROYL AMIDES
+ Twenty grams of ferulic or coumaric acid were dissolved in 200 ml of 100 % ethanol in a 3-neck , 250 ml round-bottom ﬂask equipped with a magnetic stir bar and a drying tube on one of the outside arms . 
+ Ten milliliters of acetyl chloride was added and the mixture was incubated with stirring at room temperature overnight . 
+ Ethanol was removed in a rotary evaporator at 40 ◦ C under modest vacuum ; the syrup was re-dissolved in 250 ml 100 % ethanol and re-evaporated twice . 
+ When the ﬁnal syrup was reduced to < 25 ml , ∼ 6 ml portions were transferred to heavy-wall 25 × 150 mm tubes containing ∼ 30 ml concentrated ammonium hydroxide and sealed with a Teﬂon-lined cap . 
+ The sealed tubes were incubated at 95 ◦ C in a heating block covered with a safety shield overnight . 
+ The tubes were cooled and then left open in a hood for 4 -- 8 h to allow evaporation of ammonium hydroxide , during which the feruloyl or coumaroyl amide precipitated . 
+ The crystallized products were collected under vacuum on a glass ﬁlter and washed with 250 ml ice-cold 150 mM ammonium hydroxide . 
+ The product was allowed to air dry in a plastic weigh boat in the hood at room temperature for 2 -- 3 days . 
+ Purity of the products was analyzed by silica gel TLC developed with 5 % methanol in chloroform . 
+ Only preparations exceeding 90 % purity were used for experiments . 
+ PREPARATION OF ACSH
+ ACSH was prepared by one of two methods that differed in whether or not CS was autoclaved prior to enzymatic hydrolysis . 
+ Non-autoclaved CS hydrolysate more closely replicates an industrial process , was used by Tang et al. ( submitted ) for compositional analysis , and was used for some of our fermentation experiments . 
+ Autoclaved CS hydrolysate ensures sterility for bacterial fermentations and was used for our compositional analysis and for experiments to generate RNA-seq data . 
+ We did not observe a signiﬁcant difference in GLBRCE1 behavior in non-autoclaved vs. autoclaved CS hydrolysates , although HMF was detectable in the former , but not the latter ( Table 2 ) . 
+ We observed minor variations in growth with CS harvested in different years . 
+ For autoclaved CS hydrolysate , AFEX-pretreated CS was mixed with water to 6 -- 10 L ﬁnal volume at 60 g glucan/L loading ( 18 -- 22 % solids , adjusted for moisture content ) and autoclaved for 30 -- 120 min in a 15 L Applikon bioreactor vessel ( Schwalbach et al. , 2012 ) . 
+ For non-autoclaved CS hydrolysate , AFEX-pretreated corn stover was added to the vessel after the water was autoclaved for 30 min . 
+ For both , the sample was cooled to ∼ 70 ◦ C , adjusted to 10 L volume with water , and pH adjusted with ∼ 30 ml concentrated HCl . 
+ Hydrolysis was initiated by adding Novozymes CTec2 to 24 mg/g glucan and HTec2 to 6 mg/g glucan , followed by incubation for 5 days at 50 ◦ C with stir speed at 700 rpm . 
+ Some older batches of hydrolysate were prepared using Genencor Accellerase , Genencor Accellerase XY , and Multifect pectinase A in place of Novozymes enzymes ( Schwalbach et al. , 2012 ) . 
+ Solids were then removed by centrifugation ( 8200 × g , 4 ◦ C , 10 -- 12 h ) and the supernatant was ﬁlter-sterilized through 0.5 μm and then 0.2 μm ﬁlters . 
+ Prior to fermentation , the hydrolysate was adjusted to pH 7.0 using NaOH pellets and ﬁltered again through a 0.2 μm ﬁlter to remove precipitates and to ensure sterility . 
+ PREPARATION OF SYNTHETIC HYDROLYSATE (SYNH2)
+ SynH2 ( Table 1 ) was prepared by combining the following ingredients per L final volume . 
+ Water ( ∼ 700 ml ) was mixed with 6.25 ml of 1.6 M KPO4 buffer , pH 7.2 , 20 ml of 1.5 M ammonium sulfate , 20 ml of 2.25 M KCl , 1.25 M NaCl , 20 ml of a 50X amino acid stock giving the final concentrations shown in Table 1 ( except tyrosine ) , 20 ml of 8.75 mM tyrosine dissolved in 50 mM HCl , 50 ml of 1 mM each adenine , guanine , cytosine and uracil dissolved in 10 mM KOH , 10 ml of vitamin stock ( 1 mM each thiamine , calcium pantothenate , p-aminobenzoic acid , p-hydroxybenzoic acid , and 2,3-dihydroxybenzoic acid ) , 1 ml of a 1000X stock of micronutrients ( ZnCl2 , MnCl2 , CuCl2 , CoCl2 , H3BO3 , ( NH4 ) 6Mo7O24 , and FeCl3 ) giving the final concentrations shown in Table 1 , 1 ml of 1 M magnesium chloride , 1 ml of 90 mM CaCl2 , 10 ml of 1 M sodium formate , 10 mM sodium nitrate , and 50 mM sodium succinate , 1 ml of 3 M glycerol , 1 ml of 500 mM lactic acid , 1 ml of 700 mM glycine betaine , 700 mM choline chloride , 200 mM DL-carnitine ( osmolytes ) , 5.61 g acetamide , 2.71 g sodium acetate , 3.3 g sodium pyruvate , 2.94 g sodium citrate , 1.34 g DL-malic acid , 60 g D-glucose , 30 g D-xylose , 5.1 g D-arabinose , 1.48 g D-fructose , 1.15 g D-galactose , and 468 mg D-mannose . 
+ After adjusting to pH 7 with 10 N NaOH , the ﬁnal volume was adjusted to 1 L . 
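The contribution of each liquid stock to the final medium follows from simple dilution, C_final = C_stock × V_stock / V_final. The sketch below is only an illustration of that arithmetic for a few stocks named in the recipe; the function and variable names are hypothetical, and the 1 L final volume is taken from the text.

```python
def final_conc_mM(stock_molar, stock_volume_ml, final_volume_ml=1000.0):
    """Final concentration (mM) after diluting a stock into the final volume."""
    return stock_molar * 1000.0 * stock_volume_ml / final_volume_ml

# (stock concentration in M, volume added in ml), as stated in the recipe
stocks = {
    "potassium phosphate": (1.6, 6.25),
    "ammonium sulfate": (1.5, 20.0),
    "KCl": (2.25, 20.0),
    "NaCl": (1.25, 20.0),
    "magnesium chloride": (1.0, 1.0),
}

final = {name: final_conc_mM(molar, ml) for name, (molar, ml) in stocks.items()}

for name, conc in final.items():
    print(f"{name}: {conc:.1f} mM")
```

The same helper can be reused for any other stock in the recipe by adding its concentration and volume to the dictionary.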
+ This base recipe corresponds to SynH2 − . 
+ To create SynH2 , the aromatic inhibitors were added as solids to the base recipe in the following quantities per L SynH2 and stirred until fully dissolved before filter sterilization : 531 mg feruloyl amide , 448 mg coumaroyl amide , 173 mg p-coumaric acid , 69 mg ferulic acid , 69 mg hydroxymethylfurfural , 59 mg benzoic acid , 15 mg syringic acid , 14 mg cinnamic acid , 15 mg vanillic acid , 2 mg caffeic acid , 20 mg vanillin , 30 mg syringaldehyde , 24 mg 4-hydroxybenzaldehyde , 3.4 mg 4-hydroxyacetophenone . 
+ For some experiments ( Figures S3 , S4 ) , feruloyl amide , coumaroyl amide , p-coumaric acid , ferulic acid , and hydroxymethylfurfural were added at up to twice these concentrations . 
+ The medium was ﬁlter-sterilized through a 0.2 μm ﬁlter . 
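The solid quantities above can be converted to molar concentrations for comparison with the tabulated inhibitor concentrations (mM). The following is a minimal sketch for a subset of the compounds; the molar masses are standard reference values supplied here as assumptions (they are not given in the text), and the names are illustrative.

```python
# Molar masses in g/mol: standard reference values, NOT taken from the paper.
MOLAR_MASS = {
    "p-coumaric acid": 164.16,
    "ferulic acid": 194.18,
    "hydroxymethylfurfural": 126.11,
    "benzoic acid": 122.12,
    "vanillin": 152.15,
}

# Quantities per L of SynH2, as listed in the recipe (mg).
MG_PER_L = {
    "p-coumaric acid": 173.0,
    "ferulic acid": 69.0,
    "hydroxymethylfurfural": 69.0,
    "benzoic acid": 59.0,
    "vanillin": 20.0,
}

def mg_per_l_to_mM(mg_per_l, molar_mass):
    """mg/L divided by g/mol gives mmol/L (mM)."""
    return mg_per_l / molar_mass

conc_mM = {name: mg_per_l_to_mM(MG_PER_L[name], MOLAR_MASS[name]) for name in MG_PER_L}

for name, value in conc_mM.items():
    print(f"{name}: {value:.3f} mM")
```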
+ CHEMICAL ANALYSIS OF ACSH
+ Carbohydrates , ethanol , and short chain acids in ACSH and fermentation media were quantiﬁed using HPLC-RID , NMR , and GC-MS as previously described ( Schwalbach et al. , 2012 ) . 
+ ACSH osmolality was measured using a Vapro osmometer 5520 ( Wescor Inc. , Logan , Utah , USA ) . 
+ The synthetic hydrolysate medium used in these studies ( SynH2 ) was based on a previously described synthetic hydrolysate medium ( Schwalbach et al. , 2012 ) that was modiﬁed to more closely approximate the composition of ACSH media , particularly with regard to the presence of alternative carbon sources and protective osmolytes . 
+ Concentrations of components in the modiﬁed SynH2 are described in Table S1 . 
+ FERMENTATIVE GROWTH CONDITIONS
+ Cell culture was conducted as described previously ( Schwalbach et al. , 2012 ) , except fermentations were carried out in 3 L bioreactors ( Applikon Biotechnology ) containing 2.45 L of ACSH or SynH media , and cultures were diluted into ACSH or SynH with initial OD600 at 0.2 , grown anaerobically overnight , and then inoculated into bioreactors to a starting OD600 of 0.2 . 
+ Fermentation experiments to determine the effect of osmolytes were carried out in 0.5 L Sartorius BIOSTAT Qplus bioreactors ( Sartorius Stedim Biotech , Bohemia , NY ) containing 0.35 L of SynH2 media in the absence or presence of osmolytes or aromatic inhibitors . 
+ Culture density was measured using a Beckman Coulter DU720 in a 1 ml cuvette . 
+ Due to the high absorbance of ACSH at 600 nm , cells were diluted 1:10 in water prior to OD600 measurement , with diluted ACSH ( 1:10 ) as a blank . 
+ For SynH , diluted SynH ( 1:10 ) was used as a blank . 
+ RNA-seq GENE EXPRESSION ANALYSES
+ Samples for RNA-seq were captured and RNA extracted as described previously ( Schwalbach et al. , 2012 ) . 
+ FASTQ formatted sequence files from strand-specific Illumina RNA-Seq reads were aligned to the GLBRCE1 reference genome using Bowtie version 0.12.7 ( Langmead et al. , 2009 ) with the `` --nofw '' strand-specific parameter and a maximal distance between the paired reads of 1000 bp . 
+ Nucleotide-level read quality information was used to weight each alignment at subsequent probabilistic expression counting step using the RNA-Seq by Expectation-Maximization ( RSEM ) version 1.2.4 ( Li and Dewey , 2011 ) . 
+ Posterior mean estimates of counts and FPKM values were used in the downstream analysis . 
+ The program edgeR v. 3.0.2 ( Robinson et al. , 2010 ) was used to compute differential expression by using the procedures and steps described in the package documentation in all function calls with median normalization rather than the default TMM procedure . 
+ We found that median normalization better adjusted for the particular biases within the dataset . 
+ Adjusted p-values for multiple hypothesis corrections were used as calculated by edgeR . 
+ Pairwise fold-changes and adjusted p-values are calculated between media types and within each phase and between phases within each media type . 
+ LC-derived inhibitors ( mM ) e,f | w/o autoclaving g | w/ autoclaving
+ Feruloyl amide | 5.5 | 3.5 ± 0.6 | -- | -- | 2.75
+ Coumaroyl amide | 5.5 | 7.1 ± 1.3 | -- | -- | 2.75
+ Hydroxymethylfurfural | 1.1 | < 0.03 | -- | -- | 0.55
+ p-Coumaric acid | 2.1 | 1.4 ± 0.3 | -- | -- | 1.05
+ Ferulic acid | 0.71 | 0.091 ± 0.003 | -- | -- | 0.355
+ Benzoic acid | 0.48 | 0.32 ± 0.01 | -- | -- | 0.48
+ Syringic acid | 0.08 | 0.036 ± 0.004 | -- | -- | 0.08
+ Cinnamic acid | 0.09 | -- | -- | -- | 0.09
+ Vanillic acid | 0.09 | 0.15 ± 0.02 | -- | -- | 0.09
+ Caffeic acid | 0.01 | 0.006 ± 0.001 | -- | -- | 0.01
+ Vanillin | 0.132 | 0.24 ± 0.04 | -- | -- | 0.132
+ Syringaldehyde | 0.162 | 0.017 ± 0.002 | -- | -- | 0.162
+ 4-Hydroxybenzaldehyde | 0.197 | 0.15 ± 0.02 | -- | -- | 0.197
+ 4-Hydroxyacetophenone | 0.025 | 0.017 ± 0.002 | -- | -- | 0.025
+ Osmolality ( mol/kg ) | 1.16 ± 0.03 | 0.97 | 1.17 ± 0.01 | 1.19 ± 0.01
+ a ACSH data are from Schwalbach et al. ( 2012 ) . Sugar concentrations are averages of HPLC-MS and NMR determinations . 
+ b In the SynH2 recipe , D-Arabinose was substituted for the L-Arabinose present in ACSH to avoid AraC-mediated repression of xylose-utilization genes ( Desai and Rao , 2010 ) . In other contexts , use of L-Arabinose in SynH2 would be appropriate . 
+ c -- , not determined in ACSH or not added in SynH . 
+ d n.d. , not detectable by methods used . 
+ e Aromatic compounds detected at less than 20 µM in ACSH are not reported . 
+ f The sets of acids , amides , and aldehydes used for supplemental studies in formulating SynH2 consisted of p-Coumaric acid , Ferulic acid , Benzoic acid , Syringic acid , Cinnamic acid , Vanillic acid , and Caffeic acid ( acids ) , Feruloyl amide and Coumaroyl amide ( amides ) , and HMF , Vanillin , Syringaldehyde , 4-Hydroxybenzaldehyde , and 4-Hydroxyacetophenone ( aldehydes ) at the concentrations listed for non-autoclaved ACSH or fractions thereof as described in the Supplemental Results . 
+ g ACSH Inhibitor concentrations for non-autoclaved CS hydrolysate are from ( Tang et al. , submitted ) . Hydrolysate preparations are described in Materials and Methods . 
+ Growth ( Exponential ) ( hr − 1 ) b | 0.13 ± 0.01 | 0.09 ± 0.02 | 0.12 ± 0.01
+ Glucose Rate ( Exponential ) b | 4.7 ± 0.5 | 5.9 ± 1.3 | 5.6 ± 1.3
+ Glucose Rate ( Transition ) c | 3.2 ± 0.1 | 2.6 ± 0.4 | 2.7 ± 0.1
+ Xylose Rate ( Transition ) c | 0.6 ± 0.1 | 0.5 ± 0.1 | 0.2 ± 0.1
+ Glucose Rate ( Glu-Stationary ) d | N/A | 1.6 ± 0.2 | 1.4 ± 0.2
+ Xylose Rate ( Glu-Stationary ) d | N/A | 0.11 ± 0.05 | 0.11 ± 0.04
+ Xylose Rate ( Xyl-Stationary ) e | 0.19 ± 0.03 | 0.01 ± 0.01 | 0.04 ± 0.03
+ Total Glucose Consumed ( mM ) | 330 ± 20 | 310 ± 20 | 300 ± 20
+ Total Xylose Consumed ( mM ) | 65 ± 30 | 25 ± 1 | 25 ± 10
+ Total Ethanol produced ( mM ) | 540 ± 30 | 460 ± 60 | 470 ± 60
+ Ethanol Yield ( % ) f | 70 ± 3 | 70 ± 6 | 73 ± 3
+ a Each value is from at least three biological replicates in different bioreactors . 
+ b Exponential phase is between 4 and 12 h in all media . Unit for glucose uptake rate is mM · OD600 − 1 · h − 1 . 
+ c Transition phase is between 12 and 30 h for SynH2 − , and between 12 and 23 h for SynH2 and ACSH . Units for glucose and xylose uptake rate are mM · OD600 − 1 · h − 1 . 
+ d Stationary phase when glucose is present ( Glu-Stationary ) is between 23 and 100 h for SynH2 and ACSH . However , there was no Glu-stationary phase for SynH2 − because it remained in transition phase until the glucose was gone . 
+ e Stationary phase when glucose is gone ( Xyl-Stationary ) is between 47 and 78 h for SynH2 − . The Xyl-Stationary rates for SynH2 and ACSH were measured in follow-up experiments carried out long enough to exhaust glucose in stationary phase . 
+ f Calculated from the total ethanol produced and the total glucose and xylose consumed , assuming 2 ethanol per glucose and 1.67 ethanol per xylose . 
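The ethanol yield defined in footnote f (2 ethanol per glucose and 1.67 ethanol per xylose) can be reproduced from the tabulated totals. This is a small sketch of that calculation only; the function name is illustrative, and the three input tuples are the mean values of total ethanol produced, glucose consumed, and xylose consumed from the three data columns, in the order they appear.

```python
def ethanol_yield_percent(ethanol_mM, glucose_mM, xylose_mM):
    """Percent of theoretical yield, assuming 2 ethanol/glucose and 1.67 ethanol/xylose."""
    theoretical_mM = 2.0 * glucose_mM + 1.67 * xylose_mM
    return 100.0 * ethanol_mM / theoretical_mM

# (total ethanol, total glucose, total xylose) in mM, one tuple per data column
columns = [
    (540.0, 330.0, 65.0),
    (460.0, 310.0, 25.0),
    (470.0, 300.0, 25.0),
]

yields = [ethanol_yield_percent(*row) for row in columns]

for i, y in enumerate(yields, start=1):
    print(f"column {i}: {y:.1f} % of theoretical")
```

Rounded to whole percentages, these values agree with the Ethanol Yield row.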
+ To catalog the most signiﬁcant effects , we examined the ratios using several different strategies . 
+ In addition to identifying the largest changes in expression of individual genes in SynH2 and ACSH relative to SynH2 − ( Table S2 ) , we also used gene set enrichment analyses as described by Subramanian et al. ( 2005 ) and Varemo et al. ( 2013 ) . 
+ We compiled gene sets for these analyses from pathways , transporters , and regulons documented in Ecocyc ( Keseler et al. , 2013 ) and KEGG . 
+ PROTEOMIC MEASUREMENTS
+ Thirty-four Escherichia coli samples were processed for analysis by mass spectrometry at PNNL . 
+ Each sample was typically digested using a global urea digestion ( Pasa-Tolic et al. , 2004 ; Smyth , 2004 ) prior to isobaric labeling with an iTRAQ 4-plex labeling kit , following the manufacturer 's directions ( ABSciex , Redwood City , CA ) ( Ross et al. , 2004 ; Bantscheff et al. , 2008 ) . 
+ Prior to high pH reverse phase fractionation with concatenated pooling ( Wang et al. , 2011b ) , the samples were desalted using C18 solid-phase extraction ( SPE ) ( SUPELCO , Bellefonte , PA ) . 
+ All samples were processed with a custom LC system using reversed-phase C18 columns ( unpublished variation of Maiolica et al. , 2005 ) and the samples were then analyzed with a Velos Orbitrap mass spectrometer ( Thermo Scientiﬁc , San Jose , CA ) that was equipped with an electrospray ionization ( ESI ) interface ( Kelly et al. , 2006 ) . 
+ Raw ﬁles were searched against a concatenated Escherichia coli K-12 database and contaminant database using MS-GF + ( v9018 ) with oxidation as a dynamic modiﬁcation on methionine and 4-plex iTRAQ label as a static modiﬁcation ( Kim et al. , 2008 ) . 
+ The parent ion mass tolerance was set to 50 ppm . 
+ The resulting sequence identifications were filtered down to a 1 % false discovery rate using a target-decoy approach and MS-GF derived q-values . 
+ Reporter ion intensities were quantiﬁed using the tool MASIC ( Monroe et al. , 2008 ) . 
+ Results were then processed with the MAC ( Multiple Analysis Chain ) pipeline , an internal tool which aggregates and ﬁlters data . 
+ Missing reporter ion channel results were retained . 
+ Degenerate peptides , i.e. , peptides occurring in more than one protein , were ﬁltered out . 
+ Proteins with one peptide detected were removed if they were not repeatable across at least two replicates . 
+ Redundant peptide identiﬁcation reporter ions were summed across fractions and median central tendency normalization was applied to account for channel bias . 
+ Each 4-plex sample group was normalized using a pooled sample for comparison between groups . 
+ The ﬁnal protein values were obtained by averaging their associated peptide intensity values and varied from ∼ 5000 to 350000 . 
+ Finally , the protein values were log2 transformed . 
+ All proteins that had missing values in their replicates were removed and the pair-wise protein expression level changes and signiﬁcance p-values between the SynH2 and SynH2 − cells at each growth phase were estimated using limma ( Smyth , 2004 ; Smith , 2005 ) , which ﬁts a linear model across the replicates to calculate the fold changes , smooths the standard errors for signiﬁcance and adjusts the p-values via the Benjamini-Hochberg method . 
+ COMPARISON OF PROTEOMIC DATA TO TRANSCRIPTOMIC DATA
+ Pair-wise RNA expression level changes and signiﬁcance p-values were estimated using the edgeR package as previously discussed . 
+ The log2-fold-changes for the Protein and RNA were z-score scaled separately to correct for the difference in dynamic ranges between the protein and RNA measurements . 
+ Significantly discrepant Protein/RNA ratios between SynH2 and SynH2 − cells were estimated using a two-sample z-test , and the corresponding p-values were adjusted for multiple comparisons using the Benjamini-Hochberg method . 
+ All Protein/RNA ratios that are either signiﬁcant in the RNA or protein ratio ( p < 0.05 ) and that signiﬁcantly disagree ( p < 0.05 ) are tabulated in Table S7 . 
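The comparison described in this section (separate z-score scaling of protein and RNA log2 fold-changes, a z-test on their disagreement, and Benjamini-Hochberg adjustment) can be sketched as follows. This is an illustrative approximation, not the authors' pipeline: the text does not give the exact form of the z-test, so the sketch assumes each scaled score has unit variance and tests the difference with variance 2. All function names are hypothetical.

```python
import math
import statistics

def zscore(values):
    """Scale values to zero mean and unit (sample) standard deviation."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]

def two_sided_p(z):
    """Two-sided tail probability of a standard normal variate."""
    return math.erfc(abs(z) / math.sqrt(2.0))

def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

def discordance(protein_log2fc, rna_log2fc):
    """z statistic and BH-adjusted p-value for protein/RNA disagreement per gene."""
    zp = zscore(protein_log2fc)
    zr = zscore(rna_log2fc)
    # Assumption: the difference of two unit-variance scores has variance 2.
    z_diff = [(p - r) / math.sqrt(2.0) for p, r in zip(zp, zr)]
    return z_diff, benjamini_hochberg([two_sided_p(z) for z in z_diff])
```

Genes would then be flagged when the adjusted p-value falls below the chosen threshold (0.05 in the text).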
+ MEASUREMENT OF INTERNAL METABOLITE ABUNDANCES
+ PREPARATION OF INTRACELLULAR EXTRACTS
+ Two ml of cell culture was rapidly removed from bioreactors with a 10 ml sterile syringe and cells captured on Whatman 0.45 μm nylon syringe filters ( GE Healthcare Bio-Sciences , Pittsburgh , Pennsylvania , USA ) as described previously ( Schwalbach et al. , 2012 ) . 
+ To reduce the background associated with metabolites present in ACSH and SynH the cells on the ﬁlter were then rapidly washed with 5 ml of M9 medium ( Neidhardt et al. , 1974 ) lacking a carbon source . 
+ Acetonitrile-methanol-water ( 40:40:20 ; 2 ml ) containing 0.1 % formic acid was then applied to the ﬁlters , and the eluate captured in a 15 ml conical tube . 
+ The eluate was passed through the cells a second time to ensure complete cell lysis and then ﬂash frozen in a dry ice/ethanol bath . 
+ DETECTION/QUANTIFICATION OF METABOLITES
+ The concentrations of internal glycolytic and TCA cycle intermediates were determined using high performance anion exchange chromatography electrospray ionization tandem mass spectrometry ( HPAEC-ESI-MS/MS ) . 
+ Reagents and non-labeled reference compounds were from Sigma Aldrich Co. 
+ HPAEC was adapted from a previously reported method ( Buescher et al. , 2010 ) and was used for determination of pyruvate , citrate , α-ketoglutarate , glucose-6-phosphate , fructose-6-phosphate , fructose-1,6-bisphosphate , phospho ( enol ) pyruvate , and ATP . 
+ Chromatography was carried out on an Agilent 1200 series HPLC comprising a vacuum degasser , a binary pump , a heated column compartment , and a thermostated autosampler set to maintain 6 ◦ C . 
+ Mobile phase A was 0.5 mM NaOH and mobile phase B was 100 mM NaOH . 
+ Compounds were separated by a gradient elution of 0.35 mL per minute starting at 10 % B , increased to 15 % B over 5 min and held at 15 % B for 10 min , then increased to 100 % B over 12 min and held for 10 min before returning to 10 % B to be re-equilibrated for 5 min prior to the next injection . 
+ The column temperature was 40 ◦ C . 
+ The injection volume was 20 μL of intracellular extract or calibrant standard mixture . 
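The HPAEC gradient described above can be written out as a simple time table, which makes the total run time and mobile phase consumption per injection easy to check. This is a sketch only; the names are illustrative, and it assumes the return from 100 % B to 10 % B is an instantaneous step, since the text gives no ramp time for it.

```python
FLOW_ML_PER_MIN = 0.35

# (start min, end min, %B at start, %B at end), following the description in the text.
# Assumption: the drop back to 10 % B at 37 min is an instantaneous step.
SEGMENTS = [
    (0.0, 5.0, 10.0, 15.0),     # ramp 10 -> 15 % B over 5 min
    (5.0, 15.0, 15.0, 15.0),    # hold at 15 % B for 10 min
    (15.0, 27.0, 15.0, 100.0),  # ramp 15 -> 100 % B over 12 min
    (27.0, 37.0, 100.0, 100.0), # hold at 100 % B for 10 min
    (37.0, 42.0, 10.0, 10.0),   # re-equilibrate at 10 % B for 5 min
]

def percent_b(t_min):
    """Percent mobile phase B at time t (minutes after injection)."""
    for start, end, b_start, b_end in SEGMENTS:
        if start <= t_min <= end:
            fraction = (t_min - start) / (end - start)
            return b_start + fraction * (b_end - b_start)
    raise ValueError("time outside the gradient program")

run_time_min = SEGMENTS[-1][1]
total_volume_ml = FLOW_ML_PER_MIN * run_time_min

print(f"run time: {run_time_min:.0f} min, mobile phase used: {total_volume_ml:.1f} mL")
```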
+ MEASUREMENT OF AROMATIC INHIBITORS IN ACSH AND SynH
+ Samples of ACSH and SynH cultures were prepared by centrifugation as described previously ( Schwalbach et al. , 2012 ) , and then were subjected to reverse phase HPLC high resolution/accurate mass spectrometry ( RP-HPLC-HRAM MS ) and headspace solid-phase microextraction gas chromatography-isotope dilution mass spectrometry ( HS-SPME/IDMS ) analysis . 
+ The majority of phenolic compounds were determined by RP-HPLC-HRAM MS , which was carried out with a MicroAS autosampler ( Thermo Scientiﬁc ) equipped with a chilled sample tray and a Surveyor HPLC pump ( Thermo Scientiﬁc ) coupled to a Q-Exactive hybrid quadrupole/orbitrap mass spectrometer by electrospray ionization . 
+ The analytical column was an Ascentis Express column ( 150 × 2.1 mm × 2.7 μm core-shell particles , Supelco , Bellefonte , PA ) protected by a 5 mm C18 precolumn ( Phenomenex , Torrance , CA ) . 
+ Mobile phase A was 10 mM formic acid adjusted to pH 3 with ammonium hydroxide and mobile phase B was methanol with 10 mM formic acid and the same volume of ammonium hydroxide as was added to mobile phase A . 
+ Compounds were separated by gradient elution . 
+ The initial composition was 95 % A , which was held for 2 min after injection , then decreased to 40 % A over the next 8 min , changed immediately to 5 % A and held for 5 min , then changed back to 95 % A for a column re-equilibration period of 7 min prior to the next injection . 
+ The ﬂow rate was 0.3 mL/min . 
+ The HPLC separation was coupled to the mass spectrometer via a heated electrospray ( HESI ) source ( HESI II Probe , Thermo Scientific ) . 
+ The operating parameters of the source were : spray voltages : +3000 , − 2500 ; capillary temperature : 300 ◦ C ; sheath gas flow : 20 units ; auxiliary gas flow : 5 units ; HESI probe heater : 300 ◦ C . 
+ Spectra were acquired with fast polarity switching to obtain positive and negative mode ionization chromatograms in a single analysis . 
+ In each mode , a full MS1 scan was performed by the Orbitrap analyzer followed by a data dependent MS2 scan of the most abundant ion in the MS1 scan . 
+ The Q-Exactive parameters ( both positive and negative modes ) were : MS1 range : 85 -- 500 Th , resolution : 17,500 ( FWHM at 400 m/z ) , AGC target : 1e6 , maximum ion accumulation time : 100 ms , S-lens level : 50 . 
+ Settings for data dependent MS2 scans were : isolation width : 1.8 Th , normalized collision energy : 50 units , resolution : 17,500 , AGC target : 2e5 , maximum ion accumulation time : 50 ms , underﬁll ratio : 1 % , apex trigger : 5 -- 12 s , isotope exclusion enabled , dynamic exclusion : 10 s . 
+ HS-SPME/IDMS was used to quantify acetaldehyde , acetamide , furfural , furfuryl alcohol , 5 - ( hydroxymethyl ) furfural ( HMF ) , and bis ( hydroxymethyl ) furan ( `` HMF alcohol '' / BHMF ) . 
+ Samples were thawed and brieﬂy vortex mixed prior to measuring 500 microliters of sample , 500 microliters of stable isotope labeled internal standard mixture , and ∼ 300 mg of NaCl into a 20 mL screw top headspace vial , which was quickly capped with a magnetic screw top cap with a 4 mm PTFE backed silicone rubber septum for SPME . 
+ Automated SPME sample processing and analysis was carried out using a Pegasus 4D GCxGC-TOF MS ( Leco Corp. , Saint Joseph , Michigan ) with an Agilent 6890A gas chromatograph coupled to the ToF mass analyzer via a heated capillary transfer line , and a Gerster-LEAP combi PAL autosampler and sample preparation system with Twister heated sample agitator ﬁtted with an automated SPME holder containing a gray hub 50/30 , 23 ga. Stabiliﬂex DVB/Carboxen/PDMS SPME ﬁber ( Supelco , Inc. ) . 
+ Chromatof software ( Leco , Corp. ) V. 4.50.8.0 was used for system control during acquisition and for data processing , calibration and calculation of ﬁnal concentrations . 
+ Sample incubation temperature 95 ◦ C , agitation speed 100 rpm , during extraction time , 100 rpm , agitation on 4 s/off 15 s , sample extraction time ( SPME ﬁber exposed to the sample headspace in heated agitator ) 20 min , desorb time ( SPME ﬁber inserted in hot GC inlet ) 60 min . 
+ GC cycle time 40 min . 
+ Critical injector positions were determined empirically through trial , error , and careful measurement : vial penetration 11 mm , Injector penetration 54 mm , Injector penetration -- needle 40 mm . 
+ GC was carried out using a StabilWAX-DA column ( Restek Corp , Bellefonte , Pennsylvania , USA ) 0.25 mm ID × 30 m , df = 0.25 μm ; carrier gas He , 1 mL/min ; split 5:1 ; purge ﬂow 3 mL/min ; inlet temp 250 ◦ C ; inlet liner type straight split/splitless deactivated glass 0.75 mm ID ; equilibration time 1 min ; oven temperature program : initial temperature 30 ◦ C , hold 2 min ; increase at 10 ◦ C/min to 250 ◦ C , hold 10 min ; MS transfer line 250 ◦ C . 
+ ToF mass spectrometer ( unit mass resolution ) : acquisition delay 85 s ; start mass 10 , end mass 500 ; acquisition 10 spectra/s ; electron multiplier delta V 1475 ( dependent on QC procedure ) ; source temperature 200 ◦ C . 
+ Quantiﬁcation of organic acids in ACSH was carried out by HPAEC-MS/MS in a similar manner to that described for intracellular metabolites . 
+ Transcriptomic data ( RNA-seq and microarray ) have been deposited in NCBI 's Gene Expression Omnibus and are accessible through GEO Series accession number GSE58927 . 
+ Proteomic data can be obtained from the PeptideAtlas database ( http://www.peptideatlas.org/PASS/PASS00514 ) . 
+ RESULTS
+ SynH2 RECAPITULATES THE GROWTH, SUGAR CONSUMPTION, AND ETHANOL PRODUCTION PROFILES OF E. COLI IN ACSH
+ We ﬁrst sought to validate a new SynH recipe ( SynH2 ) that would replicate ACSH composition and effects on cells . 
+ In addition to protective osmolytes , trace carbohydrates , organic acids , acetamide , and alternative electron donors/acceptors detected in ACSH previously ( Schwalbach et al. , 2012 ) , new compositional analyses revealed signiﬁcant quantities of coumarate , coumaroyl amide , ferulate , feruloyl amide , 5-hydroxymethylfurfural ( HMF ) and nine other aromatic carboxylates or aldehydes in ACSH ( Table 1 ) . 
+ To formulate a chemically deﬁned ACSH-mimic ( SynH2 ) for use with E. coli , we tested combinations of the osmolytes and the LC-derived inhibitors in a base medium composition that included the other missing components ( Supplemental Results ; Materials and Methods ) , but substituting D-arabinose for L-arabinose to avoid repression of xylose-utilization genes ( Desai and Rao , 2010 ) . 
+ To verify that SynH2 recapitulates the major properties of ACSH and to prepare samples for gene expression and proteomic analyses , we compared growth of the E. coli ethanologen in SynH2 − ( SynH2 lacking aromatic inhibitors ) , SynH2 , and ACSH . 
+ For each medium , growth could be divided into exponential , transition , stationary , and late stationary growth phases ( Figure 1 and Figure S5 ) . 
+ Growth rates of GLBRCE1 in each phase and ﬁnal cell density were similar for SynH2 and ACSH , with only slight differences , whereas removal of inhibitors ( SynH2 − ) signiﬁcantly increased growth and ﬁnal cell density ( Figure 1 and Figure S5 ; Table 2 ) . 
+ During exponential phase , glucose uptake was similar in all media . 
+ As observed previously in ACSH ( Schwalbach et al. , 2012 ) , cells stopped growth prematurely in both ACSH and SynH , but remained metabolically active and continued glucose assimilation during stationary phase . 
+ However , in SynH2 − , cell growth continued until the glucose was essentially gone ( Figure 1 and Figure S5 ) . 
+ Thus , cessation of cell growth and entry into the metabolically active stationary phase was caused by the presence of LC-derived inhibitors . 
+ In the absence of inhibitors , cell growth ceased when glucose was depleted . 
+ In the presence of inhibitors , cells ceased growth when they ran out of organic N and S sources ( Schwalbach et al. , 2012 ) . 
+ After glucose depletion and entry into stationary phase in SynH2 − , GLBRCE1 consumed xylose ( up to ∼ 50 % by the time the experiments were terminated 80 -- 100 h ; Figure 1 and Figure S5 ; Table 2 ) . 
+ However , little xylose consumption occurred in the presence of inhibitors or in ACSH , presumably in part because glucose conversion continued during stationary phase to near the end of the experiment . 
+ However , even in experiments that exhausted glucose in stationary phase , SynH2 cells and ACSH cells exhibited little or no xylose conversion ( Table 2 ) . 
+ GLBRCE1 generated slightly more ethanol in SynH2 − than in SynH2 or ACSH , consistent with greater sugar consumption , but also generated ethanol much faster than in the inhibitor-containing media ( Figure 1 and Figure S5 ; Table 2 ) . 
+ We conclude that LC-derived inhibitors present in SynH2 and in ACSH cause E. coli cells to cease growth before glucose is consumed , decrease the rate of ethanol production , and to a lesser extent decrease the ﬁnal amounts of ethanol produced . 
+ GLBRCE1 GENE EXPRESSION PATTERNS ARE SIMILAR IN SynH2 AND ACSH
+ To test the similarity of SynH2 to ACSH and the extent to which LC-derived inhibitors impact ethanologenesis , we next used RNA-seq to compare gene expression patterns of GLBRCE1 grown in the two media relative to cells grown in SynH2 − ( Materials and Methods ; Table 1 ) . 
+ We computed normalized gene expression ratios of ACSH cells vs. SynH2 − cells and SynH2 cells vs. SynH2 − cells , and then plotted these ratios against each other using log10 scales for exponential phase ( Figure 2A ) , transition phase ( Figure 2B ) , and stationary phase ( Figure 2C ) . 
+ For simplicity , we refer to these comparisons as the SynH2 and ACSH ratios . 
+ The SynH2 and ACSH ratios were highly correlated in all three phases of growth , although the correlations were lower in transition and stationary phases ( Pearson 's r of 0.84 , 0.66 , and 0.44 in exponential , transition , and stationary , respectively , for genes whose SynH2 and ACSH expression ratios both had corrected p < 0.05 ; n = 390 , 832 , and 1030 , respectively ) . 
+ Thus , SynH2 is a reasonable mimic of ACSH . 
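The comparison described above (log10 expression ratios against SynH2 − for each medium, then Pearson 's r between the two sets of ratios) can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function names and the four-gene example values are hypothetical, and the significance filtering (corrected p < 0.05) is assumed to have been applied beforehand.

```python
import math


def log10_ratio(treated, reference):
    """Per-gene log10 expression ratio, e.g. SynH2 vs. SynH2-."""
    return [math.log10(t / r) for t, r in zip(treated, reference)]


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical normalized expression values for four genes (not real data).
synh2_minus = [100.0, 50.0, 200.0, 80.0]
synh2 = [400.0, 25.0, 200.0, 160.0]
acsh = [800.0, 25.0, 100.0, 160.0]

synh2_ratios = log10_ratio(synh2, synh2_minus)
acsh_ratios = log10_ratio(acsh, synh2_minus)
r = pearson_r(synh2_ratios, acsh_ratios)
```

Working on log ratios keeps up- and down-regulated genes symmetric around zero, which is why the ratios are plotted on log10 scales.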
+ We used these data to investigate the gene expression differences between SynH2 and ACSH ( Table S3 ) . 
+ Several differences likely reﬂected the absence of some trace carbon sources in SynH2 ( e.g. , sorbitol , mannitol ) , their presence in SynH2 at higher concentrations than found in ACSH ( e.g. , citrate and malate ) , and the intentional substitution of D-arabinose for L-arabinose . 
+ Elevated expression of genes for biosynthesis or transport of some amino acids and cofactors conﬁrmed or suggested that SynH2 contained somewhat higher levels of Trp , Asn , thiamine and possibly lower levels of biotin and Cu2 + ( Table S3 ) . 
+ Although these discrepancies point to minor or intentional differences that can be used to reﬁne the SynH recipe further , overall we conclude that SynH2 can be used to investigate physiology , regulation , and biofuel synthesis in microbes in a chemically deﬁned , and thus reproducible , medium to accurately predict behaviors of cells in real hydrolysates like ACSH that are derived from ammonia-pretreated biomass . 
+ AROMATIC ALDEHYDES IN SynH2 ARE CONVERTED TO ALCOHOLS, BUT PHENOLIC CARBOXYLATES AND AMIDES ARE NOT METABOLIZED
+ Before evaluating how patterns of gene expression informed the physiology of GLBRCE1 in SynH2 , we ﬁrst determined the proﬁles of inhibitors , end-products , and intracellular metabolites during ethanologenesis . 
+ The most abundant aldehyde inhibitor , HMF , quickly disappeared below the limit of detection as the cells entered transition phase with concomitant and approximately stoichiometric appearance of the product of HMF reduction , 2,5-bis-HMF ( hydroxymethylfurfuryl alcohol ; Figure 3A , Table S8 ) . 
+ Hydroxymethylfuroic acid did not appear during the fermentation , suggesting that HMF is principally reduced by aldehyde reductases such as YqhD and DkgA , as previously reported for HMF and furfural generated from acid-pretreated biomass ( Miller et al. , 2009a , 2010 ; Wang et al. , 2013 ) . 
+ In contrast , the concentrations of ferulic acid , coumaric acid , feruloyl amide , and coumaroyl amide did not change appreciably over the course of the experiment ( Figure 3B , Table S8 ) , suggesting that E. coli either does not encode activities for detoxiﬁcation of phenolic carboxylates and amides , or that expression of such activities is not induced in SynH2 . 
+ Although HMF disappeared early in fermentation , acetaldehyde accumulated to > 10 mM during exponential and transition phase in both SynH2 and ACSH ( Figure 3C , Table S8 ) . 
+ Elevated acetaldehyde relative to SynH2 − was also observed upon omission of aromatic aldehydes from SynH2 , demonstrating that LC-derived phenolic acids and amides alone can cause accumulation of acetaldehyde ( Figure 3C ) . 
+ Thus , acetaldehyde accumulation was not simply a consequence of diverting reducing equivalents to detoxiﬁcation of the aromatic aldehydes like HMF but likely resulted from a broader impact of LC-derived inhibitors on cellular energetics that decreased the pools of NADH available for conversion of acetaldehyde to ethanol . 
+ LIGNOCELLULOSE-DERIVED INHIBITORS NEGATIVELY IMPACT CARBON AND ENERGY METABOLISM, RESULTING IN ACCUMULATION OF PYRUVATE AND ACETALDEHYDE
+ Examination of intracellular metabolites revealed that aromatic inhibitors decreased the levels of metabolites associated with glycolysis and the TCA cycle ( Figures 4B , E ; Table S1 ) . 
+ Strikingly , metabolites associated with cellular energetics and redox state were also decreased in SynH2 cells relative to SynH2 − cells ( Figures 4A , C , D , F ; Table S1 ) . 
+ ATP was reduced 30 % ; the NADH/NAD + ratio decreased by 63 % ; and the NADPH/NADP + ratio decreased 56 % . 
+ Together , these data indicate that the aromatic inhibitors dramatically decreased cellular energy pools and available reducing equivalents in SynH2 cells . 
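The reported percentage decreases can be restated as the fraction remaining and as a fold-decrease, which makes them easier to compare with fold-changes quoted elsewhere in the text. The sketch below is simple arithmetic on the three percentages given above; the helper names are hypothetical.

```python
# Percent decreases in SynH2 cells relative to SynH2- cells, as reported.
REPORTED_DECREASE = {
    "ATP": 30.0,
    "NADH/NAD+": 63.0,
    "NADPH/NADP+": 56.0,
}


def remaining_fraction(percent_decrease: float) -> float:
    """Fraction of the reference level left after the given decrease."""
    if not 0.0 <= percent_decrease < 100.0:
        raise ValueError("percent decrease must be in [0, 100)")
    return 1.0 - percent_decrease / 100.0


def fold_decrease(percent_decrease: float) -> float:
    """Reference level divided by the decreased level."""
    return 1.0 / remaining_fraction(percent_decrease)


summary = {
    name: (remaining_fraction(pct), fold_decrease(pct))
    for name, pct in REPORTED_DECREASE.items()
}
```

For example , a 63 % decrease in the NADH/NAD + ratio leaves 37 % of the SynH2 − value , i.e. , roughly a 2.7-fold reduction .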
+ The consequences of energetic depletion were readily apparent with an approximate 100-fold increase in the intracellular levels of pyruvate in SynH2 cells ( to ∼ 14 mM ) , despite the disappearance of pyruvate from the growth medium ( Table S1 , Figure 4B , and data not shown ) . 
+ The increase in pyruvate and correspondingly in acetaldehyde ( Figures 3C , 4B ) suggest that the reduced rate of glucose-to-ethanol conversion caused by aromatic inhibitors results from inadequate supplies of NADH to convert acetaldehyde to ethanol . 
+ Transition-phase SynH2 vs. SynH2 − cells exhibited similar trends in aromatic-inhibitor-dependent depletion of some glycolytic intermediates , some TCA intermediates , and ATP , along with elevation of pyruvate and acetaldehyde ( Table S1 ; Figure 3C ) . 
+ Stationary phase cells displayed several differences , however . 
+ Glycolytic intermediates ( glucose 6-phosphate , fructose 6-phosphate , fructose 1,6 diphosphate , and 2 - , 3-phosphoglycerate ) were approximately equivalent in SynH2 and SynH2 − cells , whereas pyruvate concentrations dropped signiﬁcantly ( Table S1 ) . 
+ The impact of the inhibitors was largely attributable to the phenolic carboxylate and amides alone , as removal of the aldehydes from SynH2 changed neither the depletion of glycolytic and TCA intermediates nor the elevation of pyruvate and acetaldehyde ( data not shown ) . 
+ We conclude that phenolic carboxylates and amides in SynH2 and ACSH have major negative impacts on the rate at which cells grow and consequently can convert glucose to ethanol . 
+ AROMATIC INHIBITORS INDUCE GENE EXPRESSION CHANGES REFLECTING ENERGY STRESS
+ Given the major impacts of aromatic inhibitors on ethanologenesis , we next sought to address how these inhibitors impacted gene expression and regulation in E. coli growing in SynH2 . 
+ To that end , we ﬁrst identiﬁed pathways , transporters , and regulons with similar relative expression patterns in SynH2 and ACSH using both conventional gene set enrichment analysis and custom comparisons of aggregated gene expression ratios ( Materials and Methods ) . 
+ These comparisons yielded a curated set of regulons , pathways , and transporters whose expression changed signiﬁcantly in SynH2 or ACSH relative to SynH2 − ( aggregate p < 0.05 ; Table S4 ) . 
+ For many key pathways , transporters , and regulons , similar trends were seen in both SynH2 and ACSH vs. SynH2 − ( Figure 2 and Table S4 ) . 
+ The most upregulated gene sets reﬂected key impacts of aromatic inhibitors on cellular energetics . 
+ Anabolic processes requiring a high NADPH/NADP + potential were significantly upregulated ( e.g. , sulfur assimilation and cysteine biosynthesis , glutathione biosynthesis , and ribonucleotide reduction ) . 
+ Additionally , genes encoding efﬂux of drugs and aromatic carboxylates ( e.g. , aaeA ) and regulons encoding efﬂux functions ( e.g. , the rob regulon ) , were elevated . 
+ Curiously , both transport and metabolism of xylose were downregulated in all three growth phases in both media , suggesting that even prior to glucose depletion aromatic inhibitors reduce expression of xylose genes and thus the potential for xylose conversion . 
+ Currently the mechanism of this repression is unclear , but it presumably reﬂects either an indirect impact of altered energy metabolism or an interaction of one or more of the aromatic inhibitors with a regulator that decreases xylose gene expression . 
+ [ Figure 4 legend , fragment : ... described in the Material and Methods . Shown are intracellular concentrations of ATP ( A ) , pyruvate ( B ) , fructose-1,6-bisphosphate ( E ) , and cAMP ( F ) . ( C , D ) show the ratios of NADH/NAD + and NADPH/NADP + , respectively . ] 
+ During transition phase , a different set of genes involved in nitrogen assimilation were upregulated in SynH2 cells and ACSH cells relative to SynH2 − cells ( Table S5 ) . 
+ Previously , we found that transition phase corresponded to depletion of amino acid nitrogen sources ( e.g. , Glu and Gln ; Schwalbach et al. , 2012 ) . 
+ Thus , this pattern of aromatic-inhibitor-induced increase in the expression of nitrogen assimilation genes during transition phase suggests that the reduced energy supply caused by the inhibitors increased the difﬁculty of ATP-dependent assimilation of ammonia . 
+ Interestingly , the impact on gene expression appeared to occur earlier in ACSH than in SynH2 , which may suggest that availability of organic nitrogen is even more growth limiting in ACSH . 
+ Of particular interest were the patterns of changes in gene expression related to the detoxiﬁcation pathways for the aromatic inhibitors . 
+ Our gene expression analysis revealed inhibitor induction of genes encoding aldehyde detoxiﬁcation pathways ( frmA , frmB , dkgA , and yqhD ) that presumably target LC-derived aromatic aldehydes ( e.g. , HMF and vanillin ) and acetaldehyde that accumulates when NADH-dependent reduction to ethanol becomes inefﬁcient ( Herring and Blattner , 2004 ; Gonzalez et al. , 2006 ; Miller et al. , 2009b , 2010 ; Wang et al. , 2013 ) as well as efﬂux pumps controlled by MarA/SoxS/Rob ( e.g. , acrA and acrB ) and the separate system for aromatic carboxylates ( aaeA and aaeB ) ( Van Dyk et al. , 2004 ) . 
+ Interestingly , we observed that expression of the aldehyde detoxiﬁcation genes frmA , frmB , dkgA , and yqhD paralleled the levels of LC-derived aromatic aldehydes and acetaldehyde detected in the media ( Figure 3 ) . 
+ Initially high-level expression was observed in SynH2 cells , which decreased as the aldehydes were inactivated ( Figure 5A ) . 
+ Conversely , expression of these genes increased in SynH2 − cells , surpassing the levels in SynH2 cells in stationary phase when the level of acetaldehyde in the SynH2 − culture spiked past that in the SynH2 culture . 
+ The elevation of frmA and frmB is particularly noteworthy as the only reported substrate for FrmAB is formaldehyde . 
+ We speculate that this system , which has not been extensively studied in E. coli , may also act on acetaldehyde . 
+ Alternatively , formaldehyde , which we did not assay , may have accumulated in parallel to acetaldehyde . 
+ In contrast to the decrease in frmA , frmB , dkgA , and yqhD expression as SynH2 cells entered stationary phase , expression of aaeA , aaeB , acrA , and acrB remained high ( Figure 5B ) . 
+ This continued high-level expression is consistent with the persistence of phenolic carboxylates and amides in the SynH2 culture ( Figure 3 ) , and presumably reﬂects the futile cycle of antiporter excretion of these inhibitors to compete with constant leakage back into cells . 
+ POST-TRANSCRIPTIONAL EFFECTS OF AROMATIC INHIBITORS WERE LIMITED PRIMARILY TO STATIONARY PHASE
+ We next investigated the extent to which the aromatic inhibitors could exert effects on cellular regulation post-transcriptionally rather than via transcriptional regulators by comparing inhibitor-induced changes in protein levels to changes in RNA levels . 
+ For this purpose , we used iTRAQ quantitative proteomics to assess changes in protein levels ( Material and Methods ) . 
+ We then normalized the log2-fold-changes in protein levels in each of the three growth phases to changes in RNA levels determined by RNA-seq and plotted the normalized values against each other ( Figures 6A -- C ; Tables S6 , S7 ) . 
+ Most proteome and transcriptome fold-changes fall within a factor of 2 of the diagonal , consistent with concordant changes in mRNA and protein and thus limited post-transcriptional effects of aromatic inhibitors . 
+ A small number of RNA-protein pairs exhibited a > 2-fold change with p < 0.05 . 
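The "within a factor of 2 of the diagonal" criterion can be made concrete by comparing log2 fold-changes of protein and RNA for each gene. The sketch below is an assumption-laden illustration: the `Pair` record, the `classify` function, the category labels, and the treatment of the p-value as a single per-pair significance value are all hypothetical and are not taken from the authors' Materials and Methods.

```python
import math
from typing import NamedTuple


class Pair(NamedTuple):
    gene: str
    rna_fc: float      # RNA fold-change, SynH2 vs. SynH2- (linear scale)
    protein_fc: float  # protein fold-change, SynH2 vs. SynH2- (linear scale)
    p_value: float     # assumed per-pair significance of the discrepancy


def classify(pair: Pair, fold: float = 2.0, alpha: float = 0.05) -> str:
    """Label a pair by how far it lies from the RNA = protein diagonal."""
    delta = math.log2(pair.protein_fc) - math.log2(pair.rna_fc)
    if abs(delta) <= math.log2(fold):
        return "concordant"        # within a factor of `fold` of the diagonal
    if pair.p_value >= alpha:
        return "not_significant"   # off-diagonal but not supported
    return "protein_high" if delta > 0 else "rna_high"
```

A pair labeled `protein_high` corresponds to a protein that is more abundant than its RNA change predicts , and `rna_high` to an RNA change with little matching change in protein .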
+ During exponential phase , four proteins were present at elevated levels relative to changes in RNA levels , which actually decreased ( RpoS , TnaA , MalE , and GlnH ; red circles , Figure 6A ; Table S7A ) , whereas 26 RNAs increased or decreased signiﬁcantly with little difference in protein levels ( blue circles , Figure 6A ; Table S7A ) . 
+ These disparate increases in RNA levels included some of the major transcriptional responses to the inhibitors ( S assimilation and the FrmA aldehyde detoxiﬁcation pathway ) , and these proteins were present at high levels both with and without inhibitors ( Table S7D ) . 
+ Several observations led us to conclude that these discrepancies in protein and RNA levels between SynH2 − and SynH2 cells reﬂect induction of expression in SynH2 cells but carryover of elevated protein levels in the inoculum of SynH2 − cells not yet diluted in exponential phase . 
+ First , we sampled exponential phase between one and two cell doublings so that proteins elevated in stationary phase in the inoculum might still be present . 
+ Second , FrmRAB and S assimilation genes are elevated in stationary SynH2 − cells relative to SynH2 cells ( Table S7C ) , likely reﬂecting the greater accumulation of acetaldehyde in SynH2 − cells in stationary phase ( Figure 3C ) . 
+ Finally , RpoS and TnaA are markers of stationary phase ( Lacour and Landini , 2004 ) and may reﬂect elevation of these proteins in SynH stationary cells carried over from the inoculum . 
+ In a similar vein , the apparent overrepresentation of PyrBI , GadABC , and MetEF proteins in SynH2 cells could reﬂect their greater abundance in stationary phase SynH2 cells that were carried over to early exponential phase . 
+ [ Figure 5 legend , fragment : ... lines ) or SynH2 − ( dotted lines ) or their relative ratios ( bottom panel ) from exponential , transition , and stationary phases of growth as indicated . ( A ) Aldehyde detoxiﬁcation genes ( frmA , frmB , dkgA , and yqhC ) . ( B ) Genes that encode efﬂux pumps ( aaeA , aaeB , acrA , acrB ) . ] 
+ Supporting this view , transition phase cells in which the inoculum was diluted > 5-fold exhibited a higher correlation between protein and RNA levels and only limited evidence of post-transcriptional regulation caused by the aromatic inhibitors ( Figure 6B ) . 
+ Three clusters of outliers reﬂected ( i ) reduced transcript levels for S assimilation genes in SynH2 − without a corresponding drop in protein level ( cys genes ) , ( ii ) higher levels of glnAGHLQ transcripts in SynH2 cells than SynH2 − cells with high protein levels in both , and ( iii ) high induction of transcripts for the citrate assimilation system ( citDEFX ) in SynH2 with lesser induction of protein levels . 
+ These effects likely reﬂect adjustment of S assimilation gene expression during transition phase , a greater induction of N assimilation in the more rapidly growing SynH2 cells , and induction of citrate assimilation by the aromatic inhibitors . 
+ The clearest evidence for post-transcriptional regulation caused by the aromatic inhibitors appeared in stationary phase ( Figure 6C ) . 
+ A set of proteins involved in arginine , glutamate , lysine and citrate biosynthesis ( ArgABCGI , GdhA , LysC , GltA ) and periplasmic proteins involved in arginine high-afﬁnity import ( ArtJ ) , histidine high-afﬁnity import ( HisJ ) , molybdate import ( ModA ) , and lysozyme inhibition ( PliG ) decreased dramatically in SynH2 cells relative to SynH2 − cells without corresponding reductions of their transcripts . 
+ GdhA , other biosynthetic enzymes , and other periplasmic binding proteins are degraded by the ClpP protease during C or N starvation ( Maurizi and Rasulova , 2002 ; Weichart et al. , 2003 ) ; Lon protease also has been implicated in proteolysis upon C starvation ( Luo et al. , 2008 ) . 
+ Thus , we suggest that aromatic inhibitors may enhance degradation of proteins involved in N and C metabolism in stationary phase cells . 
+ The periplasmic proteins must be degraded as precursors or mediated by an additional effect involving periplasmic proteases . 
+ DISCUSSION
+ Results of our investigation into the effects of LC-derived inhibitors on E. coli ethanologenesis support several key conclusions that will guide future work . 
+ First , a chemically deﬁned mimic of ACSH ( SynH2 ) that contained the major inhibitors found by chemical analysis of ACSH adequately replicated both growth and the rates of glucose and xylose conversion to ethanol by E. coli . 
+ SynH2-replication of ACSH required inclusion of osmolytes found in ACSH and established that , at the ratios present in ACSH , phenolic carboxylates and amides , which are not metabolized by E. coli , had a greater overall impact on cell growth than phenolic aldehydes and furfurals , which were metabolized . 
+ In both SynH2 and ACSH , E. coli entered a metabolically active stationary phase as cells exhausted organic sources of N and S ( e.g. , amino acids ) and during which the inhibitors greatly reduced xylose conversion . 
+ The impact of inhibitors on cellular energetics reduced levels of ATP , NADH , and NADPH and was seen most dramatically for energetically challenging processes requiring NADPH ( like SO₄²⁻ assimilation and deoxyribonucleotide production ) , during transition to the stationary phase on ATP-dependent NH₃ assimilation , and in elevated pyruvate levels presumably reﬂecting reduced NADH-dependent ﬂux of pyruvate to ethanol ( Figure 7 ) . 
+ The direct effects of the inhibitors on cells appear to be principally mediated by transcriptional rather than translational regulators , with the MarA/SoxS/Rob network , AaeR , FrmR , and YqhC being the most prominent players . 
+ Although the effect of the inhibitors on transcriptional regulation of the efﬂux pumps was striking , increased efﬂux activity itself may perturb cellular metabolism . 
+ For example , Dhamdhere and Zgurskaya ( 2010 ) have shown that deletion of the AcrAB-TolC complex results in metabolic shutdown and high NADH/NAD + ratios . 
+ By analogy , overexpression of efﬂux pumps may have the opposite effect ( e.g. , lowering of NADH/NAD + ratios ) , which is consistent with observations in this study . 
+ In addition , recent work suggests that the acrAB promoter is upregulated in response to certain cellular metabolites ( including those related to cysteine and purine biosynthesis ) , which are normally efﬂuxed by this pump ( Ruiz and Levy , 2014 ) . 
+ Therefore , upregulation of AcrAB-TolC may impact homeostatic mechanisms of cellular biosynthetic pathways , resulting in continuous upregulation of pathways that require large amounts of reducing power in the form of NADPH . 
+ It is also possible that LC-derived inhibitors perturb metabolism directly in ways that generate additional AcrAB-TolC substrates , potentially increasing energy-consuming efﬂux further . 
+ Given these intricacies , further studies to unravel the mechanistic details of the effects of efﬂux pump activity on cellular metabolism , as a result of exposure to LC-derived inhibitors , are warranted . 
+ The inability of cells to convert xylose in the presence of inhibitors appears to result from a combination of both effects on gene expression and some additional effect on transport or metabolism . 
+ The inhibitors lowered xylose gene expression ( XylR regulon ; xylABFGH ) by a factor of 3-5 during all three growth phases ( Table S4 ) . 
+ This effect was not caused by the previously documented AraC repression ( Desai and Rao , 2010 ) , since it persisted in SynH2 when we replaced the AraC effector L-arabinose with D-arabinose , but might reﬂect lower levels of cAMP caused by the inhibitors ( Figure 4 ) ; both the xylAB and xylFGH operons are also regulated by CRP · cAMP . 
+ Nonetheless , signiﬁcant levels of XylA , B , and F were detected in the presence of inhibitors ( Table S7D ) , although xylose conversion remained inhibited even after glucose depletion ( Table 2 ) . 
+ Thus , the inability to convert xylose may also reﬂect either the overall impact of inhibitors on cellular energetics somehow making xylose conversion unfavorable or an effect on xylose transport or metabolism that remains to be discovered . 
+ Further studies of the impact of inhibitors on xylose transport and metabolism are warranted . 
+ It would be particularly interesting to test SynH formulations designed to compare the conversion efﬁciencies of xylose , arabinose , and C6 sugars other than glucose . 
+ The central focus of this study was to understand the impact of inhibitors on gene expression regulatory networks . 
+ The apparent lack of involvement of post-transcriptional regulation suggests that E. coli mounts a defense against LC-derived inhibitors principally by controlling gene transcription , probably reﬂecting evolution of speciﬁc bacterial responses to LC-derived inhibitors . 
+ Although enteric bacteria do not ordinarily encounter industrial lignocellulosic hydrolysates , they likely encounter the same suite of compounds from digested plant material in the mammalian gut . 
+ Thus , evolution of speciﬁc responses is reasonable . 
+ A key question for future studies is whether phenolic amides , not ordinarily present in digested biomass , will also invoke these responses in the absence of carboxylates or aldehydes . 
+ We note that the apparent absence of a translational regulatory response in the cellular defense against LC-derived inhibitors does not preclude involvement of either direct or indirect post-transcriptional regulation in ﬁne-tuning the response . 
+ Our proteomic measurements would likely not have detected ﬁne-tuning . 
+ Additionally , we did detect an apparently indirect induction by inhibitors of protein degradation in stationary phase , possibly in response to C starvation ( Figure 6C ) . 
+ Finally , we note that the sRNA micF , a known post-transcriptional regulator , is a constituent of the MarA/SoxS/Rob regulon and was upregulated by inhibitors . 
+ Although conﬁdence was insigniﬁcant due to poor detection of sRNAs in RNAseq data , the induction of micF was conﬁrmed in a separate study of sRNAs ( Ong and Landick , in preparation ) . 
+ Thus , a more focused study of the involvement of sRNAs in responses to LC inhibitors would likely be informative . 
+ MarA/SoxS/Rob is a complex regulon consisting of the three inter-connected primary AraC-class regulators that bind as monomers to 20-bp sites in promoters with highly overlapping speciﬁcity and synergistically regulate ∼ 50 genes implicated in resistance to multiple antibiotics and xenobiotics , solvent tolerance , outer membrane permeability , DNA repair , and other functions ( Chubiz et al. , 2012 ; Duval and Lister , 2013 ; Garcia-Bernardo and Dunlop , 2013 ) ( Figure 7 ) . 
+ Twenty-three genes , including those encoding the AcrAB · TolC efﬂux pump , the NfsAB nitroreductases , the micF sRNA , superoxide dismutase , some metabolic enzymes ( e.g. , Zwf , AcnA , and FumC ) and incompletely characterized stress proteins are controlled by all three regulators , whereas other genes are annotated as being controlled by only a subset of the regulators ( Duval and Lister , 2013 ; www.ecocyc.org ; Keseler et al. , 2013 ) . 
+ MarA and SoxS lack the C-terminal dimerization domain of AraC ; this domain is present on Rob and appears to mediate regulation by aggregation that can be reversed by effectors ( Grifﬁth et al. , 2009 ) . 
+ Inputs capable of inducing these genes , either through the MarR and SoxR repressors that control MarA and SoxS , respectively , or by direct effects on Rob include phenolic carboxylates , Cu2 + , a variety of organic oxidants , dipyridyl , decanoate , bile salts , Fis , and Crp · cAMP ( Martin and Rosner , 1997 ; Rosner et al. , 2002 ; Rosenberg et al. , 2003 ; Chubiz and Rao , 2010 ; Duval and Lister , 2013 ; Hao et al. , 2014 ) ( Figure 7 ) . 
+ Given these diverse inputs , it seems highly likely that ferulate and coumarate in ACSH induce the MarA/SoxS/Rob regulon via MarR . 
+ Indeed , LC-hydrolysate and ferulate induction of MarA has been reported ( Lee et al. , 2012 ) . 
+ Interestingly , Cu2 + recently was shown to induce MarR by oxidation to create MarR disulﬁde dimer ( Hao et al. , 2014 ) . 
+ Given the elevated levels of Cu2 + in ACSH reﬂected by induction of Cu2 + efﬂux ( Figure 2 ; Table S4 ) , induction of MarA/SoxS/Rob in ACSH may result from synergistic effects of Cu2 + and phenolic carboxylates , oxidants that affect SoxR , and yet-to-be-determined compounds that affect Rob . 
+ A second response in LC-derived inhibitors appears to be mounted by the LysR-type regulator AaeR , which controls the AaeAB aromatic carboxylate efﬂux system ( Van Dyk et al. , 2004 ) ( Figure 7 ) . 
+ Both phenolic and aryl carboxylates induce AaeAB through AaeR , but little is known about its substrate speciﬁcity or mechanism of activation . 
+ Pink panels , direct targets of the regulators that consume reductant ( NADPH ) for detoxiﬁcation reactions or deplete the proton motive force through continuous antiporter efﬂux of aromatic carboxylates . 
+ Blue panels , indirect effects of inhibitors mediated by reductions in ATP and NADPH levels . 
+ Two distinct regulators , YqhC and FrmR , control synthesis of the YqhD/DkgA NADPH-dependent aldehyde reductases and the FrmAB formaldehyde oxidase , respectively ( Herring and Blattner , 2004 ; Turner et al. , 2011 ) . 
+ Even less is known about these regulators , although the DNA-binding properties of YqhC have been determined . 
+ In particular , it is unclear how aldehydes cause induction , although the current evidence suggests effects on YqhC are likely to be indirect . 
+ Given the central role of the regulators AaeR , YqhC , and FrmR in the cellular response to LC-derived inhibitors , further study of their properties and mechanisms is likely to be proﬁtable . 
+ With sufﬁcient understanding and engineering , they could be used as response regulators to engineer cells that respond to LC-inhibitors in ways that maximize microbial conversion of sugars to biofuels . 
+ What types of responses would optimize biofuel synthesis ? 
+ It appears the naturally evolved responses , namely induction of efﬂux systems and NADPH-dependent detoxiﬁcation pathways , may not be optimal for efﬁcient synthesis of biofuels . 
+ We infer this conclusion for several reasons . 
+ First , our gene expression results reveal that crucial pathways for cellular biosynthesis that are among the most energetically challenging processes in cells , S assimilation , N assimilation , and ribonucleotide reduction , are highly induced by LC-derived inhibitors ( Figures 2 , 7 ; Table S4 ) . 
+ A reasonable conjecture is that the diversion of energy pools , including NADPH and ATP , to detoxification makes S assimilation , N assimilation , and ribonucleotide reduction difficult , increasing expression of genes for these pathways indirectly . 
+ The continued presence of the phenolic carboxylates and amides ( Figure 3 ) likely causes futile cycles of efﬂux . 
+ As both the AcrAB and AaeAB efﬂux pumps function as proton antiporters ( Figure 7 ) , continuous efﬂux is expected to decrease ATP synthesis by depleting the proton-motive force . 
+ Although this response makes sense evolutionarily because it protects DNA from damage by xenobiotics , it does not necessarily aid conversion of sugars to biofuels . 
+ Disabling these efﬂux and detoxiﬁcation systems , especially during stationary phase when cell growth is no longer necessary , could improve rates of ethanologenesis . 
+ Indeed , Ingram and colleagues have shown that disabling the NADPH-dependent YqhD/DkgA enzymes or better yet replacing them with NADH-dependent aldehyde reductases ( e.g. , FucO ) can improve ethanologenesis in furfural-containing hydrolysates of acid-pretreated biomass ( Wang et al. , 2011a , 2013 ) . 
+ That simply deleting yqhD improves ethanologenesis argues that , in at least some cases , it is better to expose cells to LC-derived inhibitors than to spend energy detoxifying the inhibitors . 
+ Some previous efforts to engineer cells for improved biofuel synthesis have focused on overexpression of selected efﬂux pumps to reduce the toxic effects of biofuel products ( Dunlop et al. , 2011 ) . 
+ Although this strategy may help cells cope with the effects of biofuel products , our results suggest an added potential issue when dealing with real hydrolysates , namely that efflux pumps may also reduce the rate and yield of biofuel synthesis by futile cycling of LC-derived inhibitors . 
+ Thus , effective use of efﬂux pumps will require careful control of their synthesis ( Harrison and Dunlop , 2012 ) . 
+ An alternative strategy to cope with LC-derived inhibitors may be to devise metabolic routes to assimilate them into cellular metabolism . 
+ In conclusion , our ﬁndings illustrate the utility of using chemically deﬁned mimics of biomass hydrolysates for genome-scale study of microbial biofuel synthesis as a strategy to identify barriers to biofuel synthesis . 
+ By identifying the main inhibitors present in ammonia-pretreated biomass hydrolysate and using these inhibitors in a synthetic hydrolysate , we were able to identify the key regulators responsible for the cellular responses that reduced the rate of ethanol production and limited xylose conversion to ethanol . 
+ Knowledge of these regulators will enable design of new control circuits to improve microbial biofuel production . 
+ ACKNOWLEDGMENTS
+ The authors thank Trey Sato and Jeff Piotrowski for critical reading of the manuscript , Fachuang Lu and John Ralph for advice on synthesis of feruloyl and coumaroyl amide , and Christa Pennacchio and colleagues at the Joint Genome Institute for cDNA library preparation and sequencing . 
+ This work was funded by the DOE Great Lakes Bioenergy Research Center ( DOE BER Office of Science DE-FC02-07ER64494 ) . 
+ Portions of this research were enabled by the DOE GSP under the Pan-omics project . 
+ Work was performed in the Environmental Molecular Science Laboratory , a U.S. Department of Energy ( DOE ) national scientiﬁc user facility at Paciﬁc Northwest National Laboratory ( PNNL ) in Richland , WA . 
+ Battelle operates PNNL for the DOE under contract DE-AC05-76RLO01830 . 
+ SUPPLEMENTARY MATERIAL
+ The Supplementary Material for this article can be found online at : http://www.frontiersin.org/journal/10.3389/fmicb.2014.00402/abstract
\ No newline at end of file
--- a/data/TEXT_FILES/useful_txt/25275371.txt 0 → 100644
View file @27818a9
+++ b/data/TEXT_FILES/useful_txt/25275371.txt 0 → 100644
View file @27818a9
+ Comprehensive Mapping of the Escherichia coli Flagellar
+ Abstract 
+ Flagellar synthesis is a highly regulated process in all motile bacteria . 
+ In Escherichia coli and related species , the transcription factor FlhDC is the master regulator of a multi-tiered transcription network . 
+ FlhDC activates transcription of a number of genes , including some flagellar genes and the gene encoding the alternative Sigma factor FliA . 
+ Genes whose expression is required late in flagellar assembly are primarily transcribed by FliA , imparting temporal regulation of transcription and coupling expression to flagellar assembly . 
+ In this study , we use ChIP-seq and RNA-seq to comprehensively map the E. coli FlhDC and FliA regulons . 
+ We define a surprisingly restricted FlhDC regulon , including two novel regulated targets and two binding sites not associated with detectable regulation of surrounding genes . 
+ In contrast , we greatly expand the known FliA regulon . 
+ Surprisingly , 30 of the 52 FliA binding sites are located inside genes . 
+ Two of these intragenic promoters are associated with detectable noncoding RNAs , while the others either produce highly unstable RNAs or are inactive under these conditions . 
+ Together , our data redefine the E. coli flagellar regulatory network , and provide new insight into the temporal orchestration of gene expression that coordinates the flagellar assembly process . 
+ Introduction
+ Bacterial flagellar synthesis and motility are highly regulated processes involving positive and negative input at the transcriptional , post-transcriptional , translational , and post-translational levels . 
+ This complex regulatory system allows for sequential production of flagellar components in roughly the order they are required for assembly . 
+ One of the primary ways this temporality is established is through the underlying hierarchical transcription network ( reviewed in [ 1 -- 3 ] ) . 
+ Promoters for flagellar genes are divided into three categories ( Classes 1 , 2 , and 3 ) based on their timing and mode of expression . 
+ The atypical transcription factor FlhDC ( more specifically , FlhD4C2 ) serves as the master flagellar regulator in Escherichia coli , Salmonella enterica , and other related enteric bacteria [ 4 ] . 
+ The flhDC operon is considered the sole Class 1 transcription unit . 
+ FlhDC expression and activity is regulated at a number of levels , allowing it to serve as an integrator of environmental , nutritional , and growth-phase signals [ 5 -- 11 ] . 
+ In turn , FlhDC is responsible for activating the transcription , either directly or indirectly , of all structural and regulatory components of the flagellar machinery . 
+ Class 2 promoters are directly activated by FlhDC and transcribed by RNA polymerase ( RNAP ) containing the primary σ-factor , σ70 [ 2 ] . 
+ Contacts between FlhDC and the carboxy-terminal domain of the α-subunit of RNAP are important for activation , but the precise mechanism is not known [ 12 ] . 
+ The seven commonly accepted FlhDC-dependent operons ( flgAMN , flgBCDEFGHIJ , flhBAE , fliAZY , fliE , fliFGHIJK , and fliLMNOPQR ) encode important regulatory factors and the structural components of the membrane-spanning basal body and associated export apparatus [ 1 ] . 
+ Notably , fliA encodes an alternative σ-factor [ 13 -- 15 ] and flgM encodes its cognate anti-σ-factor [ 16 ] . 
+ The interplay between these two factors regulates the transition from early flagellar gene expression to late-stage gene expression . 
+ When both FliA and FlgM are present in the cytoplasm , FlgM binds FliA , preventing interaction with RNAP , and repressing FliA-dependent transcription . 
+ Upon assembly of the basal body and secretion apparatus ( from Class 2 gene products ) , FlgM is exported out of the cell , freeing FliA and allowing initiation of FliA-dependent transcription from Class 3 promoters . 
+ This coupling of transcription and assembly allows for efficient `` just-in-time '' expression kinetics , as has been described for some metabolic pathways [ 17 ] . 
+ FliA drives transcription of six commonly accepted Class 3 operons ( flgKL , fliDST , flgMN , fliC , tar-tap-cheRBYZ ( meche ) , and motAB-cheAW ( mocha ) ) and can initiate transcription of its own operon , fliAZY . 
+ These Class 3 operons encode products needed in late flagellar assembly such as the subunits of the flagellar filament ( flagellin ) , components of the motor , and chemotaxis-related regulatory factors . 
+ In addition to these classical Class 3 operons , FliA has been predicted or shown to drive transcription of genes involved in chemotaxis ( trg [ 18 ] and tsr [ 19 ] ) , aerotaxis ( aer [ 18,20 ] ) , and cyclic-di-GMP regulation of motility ( yhjH and ycgR [ 21 ] ) . 
+ FliA can also drive transcription of the Class 2 fliLMNOPQR operon [ 22 ] , and FliA-dependent transcription of other Class 2 operons has been suggested [ 1,2,23 ] . 
+ In addition to their roles in the flagellar transcription network , FlhDC and FliA have been implicated in the regulation of non-flagellar genes . 
+ Studies have reported a host of non-flagellar genes and processes , such as cell division and anaerobic metabolism , regulated by FlhDC , or by FlhD alone [ 24 -- 28 ] . 
+ However , the evidence provided for most of these additional target genes , such as changes in gene expression with no information on DNA binding , is far from conclusive . 
+ The reports of FlhD acting independently to regulate cell division have been directly refuted by another study [ 29 ] and many putative FlhDC targets fail to be repeatedly detected across multiple studies [ 20,24,25,27,28 ] . 
+ Furthermore , most previous studies fail to accurately distinguish between direct and indirect FlhDC-dependent regulation , as exemplified by the identification of the flagellar-related gene aer as an FlhDC target [ 25 ] when it is actually transcribed by FliA [ 18,20 ] . 
+ There are a few well-characterized FliA-dependent promoters , such as modA [ 30 ] and flxA [ 31 ] , that have no obvious function in flagellar synthesis or motility . 
+ Furthermore , there have been various bioinformatic studies that predicted FliA-dependent promoters based on sequence identity , finding hundreds of promoters , many with non-flagellar functions [ 30,32,33 ] ; however , these predictions have not been tested experimentally . 
+ Though the E. coli flagellar network has been studied extensively over the past few decades , many issues remain to be addressed . 
+ These include ( i ) the ability of FlhD to have regulatory effects independent of FlhC , ( ii ) the extent of co-regulation by FlhDC and FliA and the relative contributions of each at known complex promoters such as fliAZY and fliLMNOPQR , and ( iii ) the extent and composition of the non-flagellar regulons of FlhDC and FliA . 
+ Chromatin immunoprecipitation followed by deep sequencing ( ChIP-seq ) is a powerful technique to study the genome-wide localization of DNA-binding proteins . 
+ ChIP-seq provides high-resolution information about binding location and relative binding affinity in vivo [ 34 ] . 
+ Factors such as local DNA structure and the binding of nucleoid-associated proteins and/or other transcription factors affect binding and regulatory activity , making direct in vivo measurements incredibly valuable [ 35 ] . 
+ RNA-seq is a high-resolution method for assessing the transcriptomic differences between strains or conditions and allows for identification of novel transcripts such as non-coding RNAs [ 36 ] . 
+ By combining ChIP-seq and RNA-seq , one can assess the regulatory effect of each binding site and comprehensively establish which regulatory effects are direct and which are indirect . 
+ The combination of ChIP-seq and transcriptome analysis has recently been applied to many bacterial transcription factors [ 37 -- 39 ] and σ-factors [ 40,41 ] . 
+ This powerful combination of techniques has primarily been used to investigate single DNA-binding proteins , but can also be used to build global transcription networks [ 38 ] . 
+ In this study , we performed ChIP-seq on FlhD , FlhC , and FliA , and RNA-seq on motile wild-type , ΔflhD , ΔflhC , and ΔfliA derivatives of E. coli MG1655 . 
+ These data reveal new binding sites and regulatory targets for both FlhDC and FliA , including the discovery of two FliA-dependent non-coding RNAs . 
+ We have comprehensively determined the direct and indirect regulons of these three proteins and demonstrate that this information can be used to build a 
+ Results
+ Construction and validation of epitope-tagged strains
+ In order to perform ChIP experiments , we constructed strains in which FlhD , FlhC , and FliA were chromosomally epitope-tagged with a 3×FLAG tag . 
+ Since both termini of FlhD and FlhC have been implicated in complex formation or DNA-binding [ 42,43 ] , we inserted epitope tags into internal , unstructured regions ( Figure S1A ) . 
+ FliA was tagged at the N-terminus . 
+ Tagged strains were constructed from a poorly motile MG1655 derivative , which contained no IS element upstream of flhDC . 
+ Spontaneous highly motile isolates were recovered following incubation on motility agar , as has been described previously [ 44 -- 46 ] . 
+ To confirm that all isolates had gained motility through IS element insertion in the region upstream of flhDC , we sequenced the upstream and coding regions of each isolate ( File S1 ) . 
+ Each motile isolate had an IS element inserted in the region upstream of flhDC although the identity and location of the IS elements varied between strains ( Figure S1B ) . 
+ No additional mutations were observed in flhD , flhC , or the region between flhD and uspC , in any of the tagged strains . 
+ To ensure that each of the IS element insertions was responsible for the motile phenotype of the strains , we constructed strains in which a selectable marker gene , thyA , was inserted in each observed insertion location . 
+ Insertion of the thyA cassette in each of the observed locations resulted in fully motile strains ( Figure S1C ) . 
+ This experiment confirmed that while the tagged strains used in this study are not strictly isogenic , they are likely functionally equivalent . 
+ Additionally , it lends insight into IS element-associated motility by showing that motility acquisition is largely sequence- and location-independent . 
+ This is consistent with the previous hypothesis that IS element insertion up-regulates flhDC by displacing repressors bound in the upstream region [ 44,45 ] . 
+ It is formally possible that the tagged strains contain additional mutations that confer motility , but this is extremely unlikely given the frequency with which motile strains were isolated . 
+ While motile isolates were obtained for each tagged strain , all three tagged strains were less motile than a similarly selected wild-type strain , suggesting that epitope-tagging resulted in moderate functional impairment ( Figure S1D ) . 
+ It is possible that the diminished functionality of the tagged proteins could prevent the identification of very weak binding sites ; however , as discussed in detail below , the tagged proteins resulted in robust ChIP-seq signal at all extensively characterized binding sites and at many novel sites . 
+ Furthermore , the ChIP-seq signals generated for FlhD-FLAG and FlhC-FLAG proteins were nearly identical ( Figure 1A ) although the tags are inserted in distinct locations within the quaternary structure of the FlhD4C2 complex . 
+ This suggests that whatever the functional defect is , it is not influencing the ability of the tagged proteins to bind DNA . 
+ Finally , the RNA-seq experiments ( performed with an isogenic set of strains ) do not support the existence of any directly regulated targets not identified by ChIP-seq . 
+ Genome-wide binding of FlhD and FlhC
+ We used ChIP-seq to assess genome-wide binding of FlhD and FlhC in E. coli MG1655 grown to mid-exponential phase ( OD600 0.5 -- 0.7 ) in Lysogeny Broth ( LB ) . 
+ The genome-wide binding profiles of the two proteins are shown in Figure 1A . 
+ Based on a stringent peak-calling analysis , 10 FlhD binding sites and 8 FlhC binding sites were identified ( Table 1 ) . 
+ All 8 FlhC binding sites overlapped with FlhD binding sites . 
+ The 2 additional FlhD binding sites were also associated with substantial FlhC occupancy , barely missing the threshold in the peak calling analysis . 
+ None of the peaks identified in the FlhD or FlhC ChIP-seq experiments overlapped with regions enriched in control ChIP-seq experiments using untagged control strains . 
+ The complete colocalization of FlhD and FlhC binding is consistent with the proteins only binding DNA as a heteromeric complex . 
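The colocalization check described above amounts to an interval-overlap test between two peak lists. The following is a minimal sketch of that idea, not the authors' peak-calling pipeline; the coordinates, the `Peak` alias, and the function names are assumptions made up for illustration.

```python
from typing import List, Tuple

Peak = Tuple[int, int]  # (start, end) genome coordinates, both inclusive


def overlaps(a: Peak, b: Peak) -> bool:
    """Return True when two closed intervals share at least one position."""
    return a[0] <= b[1] and b[0] <= a[1]


def colocalized(query: List[Peak], reference: List[Peak]) -> List[Peak]:
    """Return the query peaks that overlap at least one reference peak."""
    return [q for q in query if any(overlaps(q, r) for r in reference)]


# Hypothetical coordinates for illustration only (not values from the study).
flhd_peaks = [(100, 200), (1000, 1100), (5000, 5100)]
flhc_peaks = [(150, 250), (1050, 1150)]

# FlhC peaks that coincide with an FlhD peak.
shared = colocalized(flhc_peaks, flhd_peaks)

# FlhD peaks without a called FlhC peak; in the text such sites still showed
# sub-threshold FlhC occupancy, which a simple overlap test cannot capture.
flhd_with_partner = colocalized(flhd_peaks, flhc_peaks)
flhd_only = [p for p in flhd_peaks if p not in flhd_with_partner]
```

A quadratic scan like this is adequate for tens of peaks; for genome-scale peak lists a sorted sweep or an interval tree would be the usual choice.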
+ The 10 sites of FlhDC binding include all 5 binding sites associated with the 7 canonical flagellar Class 2 operons . 
+ An FlhDC binding site was identified upstream of yecR , a gene of unknown function , consistent with previous reports [ 20,24 ] . 
+ The additional 4 binding sites identified represent novel FlhDC targets . 
+ Three of the novel FlhDC binding sites are in intergenic regions , upstream of one or more genes ( ampH/sbmA , yciK/sohB , and gntR ) . 
+ The remaining novel binding site is located inside csgC and immediately upstream of a transcription start site for the adjacent gene , ymdA [ 47 ] . 
+ To confirm FlhDC binding sites identified by ChIP-seq , targeted ChIP-qPCR was performed for all loci ( Figure 1B ; Table S1 ) . 
+ Although ChIP-qPCR is not a completely independent validation method , it allows for analysis of more biological replicates , more straightforward comparison with untagged control strains , and reduces common artifacts enhanced by ChIP-seq library amplification [ 48 ] . 
+ Consistent with the ChIP-seq findings , FlhD and FlhC occupancy was significantly above that detected in an untagged strain ( t-test , p-value ≤ 0.05 ) for all loci . 
+ ChIP-qPCR was also used to evaluate FlhDC occupancy at previously predicted binding sites [ 24 ] not identified by ChIP-seq ( Figure S2 ) . 
+ The fliDST promoter was the only site to show significant occupancy by either protein ( Figure 1B , Figure S2 ) . 
+ This site was not identified by our ChIP-seq peak-calling analysis , but small , correctly shaped peaks are visible in the raw data for both proteins at this locus ( see below ) . 
+ Hence , we consider the fliDST promoter to be a genuine FlhDC binding site and have included it in all downstream analysis , bringing the total to 11 FlhDC-bound sites . 
+ The sequences surrounding the FlhDC binding sites identified with ChIP-seq and/or ChIP-qPCR were extracted and analyzed by BioProspector [ 49 ] . 
+ The resulting motif , found in 7 out of 11 sites , is low-scoring and degenerate ( motif score = 1.49 , Figure 1C ) but corresponds well with previously reported motifs for FlhDC binding [ 20,50 ] . 
+ Direct and indirect regulation by FlhDC
+ To determine the effect of the identified FlhDC binding sites on the transcriptome , RNA-seq was performed in the motile wild-type MG1655 strain and isogenic single-gene deletions of flhD and flhC . 
+ Differential gene expression analysis was performed using Rockhopper , an RNA-seq analysis program optimized for bacterial datasets [ 36 ] . 
+ Gene expression was almost identical in the ΔflhD and ΔflhC strains ( Figure S3 ) , as would be expected since the factors likely regulate the same genes and because the flhD deletion has a minor polar effect on flhC expression . 
+ Figure 2 shows a genome-wide comparison of normalized expression values in motile wild-type MG1655 versus the average normalized expression of the ΔflhD and ΔflhC strains . 
+ Overall , 228 genes were differentially expressed due to deletion of flhD/flhC ( Figure 2 ) . 
+ Of those significantly regulated genes , 40 are associated with FlhDC binding sites , indicating direct regulation . 
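The direct-versus-indirect distinction drawn here is a set operation: genes that change expression and sit next to a binding site are candidate direct targets, and the rest are indirect. A minimal sketch follows, assuming gene-level sets have already been derived from the RNA-seq and ChIP-seq results; the function name and the small gene sets are illustrative choices, not the study's tables.

```python
from typing import Dict, List, Set


def partition_targets(de_genes: Set[str], bound_genes: Set[str]) -> Dict[str, List[str]]:
    """Split genes by combining expression changes with binding evidence."""
    return {
        # Differentially expressed and associated with a binding site.
        "direct": sorted(de_genes & bound_genes),
        # Differentially expressed without an associated binding site.
        "indirect": sorted(de_genes - bound_genes),
        # Binding site present but no detectable expression change.
        "bound_unregulated": sorted(bound_genes - de_genes),
    }


# Toy selection for illustration only; the real analysis covers 228 genes.
de_genes = {"fliA", "yecR", "fliC", "tar"}   # assumed differentially expressed
bound_genes = {"fliA", "yecR", "gntR"}       # assumed adjacent to a binding site

result = partition_targets(de_genes, bound_genes)
```

In this toy input, Class 3 genes such as fliC land in the indirect group because their dependence on FlhDC runs through FliA, and gntR illustrates a bound site with no detected regulation.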
+ Of the 11 FlhDC binding sites identified by ChIP , 9 are associated with regulation of one or more adjacent genes ( Figure 3A -- E , Table 1 ) . 
+ We detected strong , positive regulation of all previously reported Class 2 operons by FlhD and FlhC . 
+ Additionally , the RNA-seq data demonstrated FlhDC-dependent activation of yecR , fliDST , and the novel target operon ymdABC ( Table 1 , Figure 3B ) . 
+ The FlhDC binding site upstream of sohB seemed to repress expression by 1.95-fold ( Figure 3C , Table 1 ) . 
+ The RNA-seq experiments did not provide any evidence for regulation associated with the novel binding sites upstream of ampH/sbmA and gntR ( Figure 3D -- E , Table 1 ) . 
+ Selected examples of direct FlhDC-dependent regulation , or lack of regulation , were validated using qRT-PCR ( Figure 3F -- G ) . 
+ The presence or absence of regulation was confirmed for all genes tested , except sohB and the divergently transcribed gene yciK . 
+ While the RNA-seq data suggested that sohB was repressed 1.95-fold by FlhDC and yciK was not regulated , the qRT-PCR data clearly indicate the opposite : no regulation of sohB and significant , but slight positive regulation of yciK ( 1.5-fold for FlhD and 2.29-fold for FlhC ; Figure 3F -- G ) . 
+ Positive FlhDC-dependent regulation of yciK is also supported by complementation experiments . 
+ Small ( less than 4-fold ) but significant changes in ampH and sbmA levels are detected between wild type and ΔflhC ; however , these changes are not detected in the ΔflhD strain or supported by complementation experiments ( Figure 3F -- G ) . 
+ RNA levels were also determined using qRT-PCR in a variety of complemented strains . 
+ As suggested by the motility assays ( Figure S1E ) , flhD overexpression could only partially rescue the flhD deletion , but overexpression of flhDC restored target gene expression to wild-type or higher levels ( Figure 3F ) . 
+ Overexpression of flhC alone in the flhD deletion strain did not rescue expression levels at all , demonstrating that the ΔflhD phenotype was not solely due to the polar effect on flhC expression ( Table S2 ) . 
+ The flhC deletion could be completely complemented by flhC overexpression , but target gene expression never exceeded wild-type levels , suggesting that the wild-type levels of flhD expression limit the possible pool of active FlhDC complexes ( Figure 3G ) . 
+ Mechanism of direct regulation of transcription by FlhDC 
+ To provide further insight into the mechanism of regulation at FlhDC-dependent promoters , ChIP-qPCR was used to evaluate σ70 occupancy at FlhDC-dependent promoters in wild-type and flhD deletion strains ( Figure 4 ) . 
+ For all positively regulated promoters tested , σ70 occupancy was detected in the wild-type strain ; however , σ70 occupancy was completely undetectable in the ΔflhD strain , except at the yciK/sohB promoter . 
+ At that promoter , deletion of flhD significantly reduced , but did not eliminate , σ70 occupancy , consistent with FlhDC-dependent recruitment to the yciK promoter and FlhDC-independent recruitment to the adjacent sohB promoter . 
+ Our data suggest that FlhDC binding is absolutely required for σ70 : RNAP recruitment to FlhDC-dependent promoters . 
+ This is consistent with reports that these promoters have poor matches to the consensus −10 and −35 hexamers [ 1 ] . 
+ Genome-wide binding of FliA
+ To begin assessing the next level of the transcriptional hierarchy , ChIP-seq was used to determine the genome-wide binding of the flagellar σ-factor FliA ( Figure 5A ) . 
+ Following a stringent peak-calling analysis , 52 regions were identified that did not overlap with regions enriched in untagged controls ( Table 2 ) . 
+ These 52 FliA binding sites include 7 canonical flagellar Class 3 promoters and one upstream of the fliLMNOPQR operon . 
+ FliA binding sites were also identified upstream of five other flagellar-related genes previously shown or predicted to be FliA-dependent : aer , ycgR , yhjH , trg , and tsr . 
+ Additionally , FliA binding sites were identified at 4 other previously reported FliA-dependent promoters of non-flagellar genes : modA [ 30 ] , ves [ 20 ] , ynjH [ 30 ] , and flxA [ 31 ] . 
+ The remaining 35 FliA binding sites represent novel FliA promoters for which no previous experimental evidence can be found , although a small fraction have been predicted by various bioinformatic approaches [ 51 ] . 
+ Strikingly , 28 of these novel binding sites are inside genes , with only 4 of these intragenic binding sites < 300 bp upstream of a start codon for an appropriately oriented annotated gene . 
+ A subset of the putative FliA binding sites identified by ChIP-seq was also tested using ChIP-qPCR . 
+ Of the 13 novel FliA binding sites tested , 11 showed significant enrichment in these targeted experiments ( Figure 5B ) . 
+ The 2 novel sites that could not be validated had ChIP-seq `` fold above threshold '' ( FAT ) scores of 4 or lower . 
+ Twenty-two other peaks had similarly low peak scores ( ≤ 4 ) but were not tested by ChIP-qPCR . 
+ Sequences surrounding each of the 52 FliA binding sites were extracted and examined for a motif using MEME [ 52 ] . 
+ A highly significant motif ( E-value = 2.4e−262 ) was identified in all 52 binding regions and matches previously reported FliA promoter motifs well ( Figure 5C , Table 2 ) . 
+ This MEME analysis identified motifs associated with all low-scoring sites , including the two that could not be validated by ChIP-qPCR . 
+ To further assess whether low-scoring sites , as a group , show evidence of sequence-specific FliA binding , motifs were identified separately for high-scoring ( FAT > 4 , n = 26 ) and low-scoring ( FAT ≤ 4 , n = 26 ) peaks ( Figure 5D ) . 
+ The two groups yielded very similar motifs , both with highly significant q-values ( E-value high = 5.8e−237 , E-value low = 2.9e−210 ) . 
+ This strongly suggests that a majority of low-scoring ChIP-seq peaks represent 
+ Not only were motifs identified for all ChIP-seq peaks , but these motifs were also localized in a striking pattern relative to the ChIP-seq peaks ( Figure 5E ) . 
+ For most ChIP-seq experiments , one would expect motifs to be enriched at peak centers [ 53 ] . 
+ However , FliA motifs were clustered with a median position of 25 nt upstream of the peak center ( relative to the direction of the motif ) . 
+ This is consistent with FliA binding to the −10 and −35 hexamer motif while in the context of RNAP holoenzyme [ 54 ] . 
+ This localization of FliA motifs strongly suggests that FliA only binds in the context of RNAP holoenzyme . 
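The positional analysis above depends on measuring each motif's distance from its peak center in the motif's own orientation, so that "upstream" has the same sign on both strands. A minimal sketch of that calculation is shown below; the site coordinates and the `motif_offset` helper are hypothetical and chosen only to illustrate the sign convention.

```python
from statistics import median
from typing import List, Tuple


def motif_offset(peak_center: int, motif_center: int, strand: str) -> int:
    """Signed motif position relative to the peak center.

    Negative values mean the motif lies upstream of the peak center with
    respect to the direction of the motif; positive values mean downstream.
    """
    if strand not in ("+", "-"):
        raise ValueError("strand must be '+' or '-'")
    delta = motif_center - peak_center
    # On the minus strand, higher coordinates are upstream, so flip the sign.
    return delta if strand == "+" else -delta


# Hypothetical (peak_center, motif_center, strand) triples, not study data.
sites: List[Tuple[int, int, str]] = [
    (1000, 975, "+"),
    (5000, 5030, "-"),
    (8000, 7980, "+"),
]

offsets = [motif_offset(p, m, s) for p, m, s in sites]
median_offset = median(offsets)
```

With this convention, a median offset near -25 would correspond to motifs clustering about 25 nt upstream of the peak centers, which is the pattern the text reports for FliA.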
+ Furthermore , this analysis allowed us to identify two weak , non-canonical , putative FliA binding sites ( insB-4 / cspH convergent intergenic region and inside proK ) , that have unusually localized motifs , suggesting that they might not be genuine FliA promoters . 
+ While this manuscript was in preparation , FliA ChIP-chip data was published by another group [ 55 ] . 
+ Our analysis of this study indicates many false positives and false negatives , and the resolution is far lower than that of our data ( Figure S4 ) . 
+ Direct regulation by FliA
+ To detect FliA-dependent changes in gene expression , RNA-seq was performed in a ΔfliA strain and compared to the isogenic motile wild-type . 
+ Analysis of differential gene expression was performed with Rockhopper [ 36 ] . 
+ Overall , 68 genes were differentially expressed between the motile wild-type and a ΔfliA strain ( Figure 6 ) . 
+ Most of the strongly regulated genes are associated with FliA binding sites identified by ChIP-seq ( Figure 6 , green points ) , indicating direct regulation . 
+ Of the 52 FliA binding sites identified by ChIP-seq , 14 were associated with significant gene expression changes under these conditions ( Table 2 , Figure 7 ) . 
+ These included some , but not all , Class 3 promoters , and 8 other previously reported FliA promoters . 
+ Interestingly , three of the intragenic FliA binding sites were also associated with significant regulation of surrounding genes . 
+ The promoters inside flhC , yjdA , and yafY drive transcription of the downstream genes ( motA , yjcZ , and ykfB , respectively ; Figure 7C , Table 2 ) . 
+ In addition to using Rockhopper to analyze differential gene expression , we visually inspected mapped RNA-seq data to find FliA-dependent transcripts that might otherwise have been overlooked . 
+ We identified an unusual transcript associated with the novel FliA binding site inside uhpT . 
+ Unlike the previously discussed examples , this promoter drives transcription of a purely intragenic RNA . 
+ RNA-seq detects a small ( ~50 nt ) RNA encoded in the same orientation as the gene ( Figure 7D ) . 
+ Visual examination of the RNA-seq data also led to the discovery of an antisense orientation noncoding RNA ( asRNA ) associated with the novel intragenic FliA-promoter inside hypD ( Figure 7E ) . 
+ This asRNA is contained entirely within the hypD gene and overlaps ~500 nt of the 5′ end of the hypD open reading frame ( ORF ) . 
+ The hypD antisense RNA is too weakly expressed to be detected by Rockhopper , despite the program 's ability to identify novel antisense RNAs . 
+ The remaining novel FliA binding sites were not associated with detectable FliA-dependent changes in gene expression under these conditions ( e.g. , speA , Figure 7F ) . 
+ A variety of canonical and novel examples of FliA-dependent regulation were further validated using qRT-PCR ( Figure 7G , Table S2 ) . 
+ In all cases , including the two novel non-coding RNAs , the regulation identified in the RNA-seq experiment was confirmed in these targeted experiments . 
+ Additionally , overexpression of fliA in the ΔfliA strain resulted in higher than wild-type levels of expression of all FliA-dependent transcripts tested . 
+ Dual regulation of complex promoters and overlapping operons 
+ Complex promoters and overlapping operons were first evaluated using our ChIP-seq and RNA-seq data ( Figure 8 ) . 
+ The previously described ( fliAZY [ 22 ] and fliLMNOPQR [ 22 ] ) or predicted ( fliDST [ 24 ] ) complex promoters are supported by our findings ( Figure 8A , B , D ) . 
+ The previously described overlapping flgAMN/flgMN operons are also supported by our data ( Figure 8C ) . 
+ Finally , our data confirm that flgKL can be transcribed as part of the upstream FlhDC-dependent flgBCDEFGHIJ operon , in addition to being transcribed from its own Class 3 promoter ( Figure 8C , Table 2 ) . 
+ While these operons have been previously described in Salmonella [ 56,57 ] , this is the first experimental demonstration of the overlapping operons in E. coli . 
+ The novel flgBCDEFGHIJKL operon was further confirmed by using RT-PCR to amplify across the flgJ-flgK boundary ( Figure S5A ) . 
+ In addition to confirming dual regulation of fliDST and discovering that flgKL can be transcribed as part of the upstream operon , the RNA-seq data from this study reveal a surprising pattern for all dual regulated flagellar genes . 
+ In the investigated conditions and growth phase , deletion of FliA has little or no effect on the expression levels of dual regulated genes ( Figure 8A -- D ) . 
+ This suggests that , while FliA occupancy is detected at all dual promoters by ChIP-seq , most transcription is coming from the FlhDC-dependent σ70 promoters under our conditions . 
+ We used qRT-PCR to further investigate the regulatory input of FlhDC and FliA for dual regulated genes ( Figure 8E ) . 
+ As seen in the RNA-seq experiments , deletion of fliA had a much less dramatic effect on gene expression compared to deletion of flhD . 
+ Overexpression of fliA in the ΔflhD strain significantly increased RNA levels of all dual targets ( compared to ΔflhD ) , showing that the FliA promoters are capable of transcribing the target genes . 
+ However , this effect was smaller for fliL than for any other dual regulated gene . 
+ Furthermore , overexpression of fliA in the ΔfliA strain ( hashed green bar ) reduced fliL expression by 2.10-fold compared to the empty vector control ( solid green bar ; t-test , p-value 0.02 ) , suggesting a possible repressive interaction ( Figure 8E ) . 
+ As dual regulation has been suggested for all Class 2 targets [ 23 ] , we performed similar qRT-PCR experiments for these genes . 
+ Overexpression of fliA in the ΔflhD strain was not able to significantly increase the expression of any other Class 2 target tested ( Figure S6 ) . 
+ While not statistically significant , flhB expression does increase with fliA overexpression ( Figure S6 ) . 
+ Since flhB is downstream of the FliA-dependent meche operon , it is possible that the flhBAE operon is dual regulated and can be transcribed as part of the upstream operon . 
+ However , there is no evidence of co-transcription from the RNA-seq or RT-PCR targeting the cheZ-flhB intergenic region ( Figure S5B ) so we conclude that this read-through is likely an artifact of extreme FliA overexpression . 
+ It should also be noted that no FliA ChIP-seq signal was detected in the vicinity of the flhBAE , fliE , or fliFGHIJK operons , nor were any of these Class 2 genes differentially expressed in RNA-seq experiments ( motile wild-type and ΔfliA strains ) . 
+ Therefore , with the exception of the dual targets already discussed , our data do not support widespread dual regulation of Class 2 genes . 
+ Motility assessment of novel FlhDC and FliA targets
+ To determine if novel FlhDC and/or FliA targets were required for motility , knockouts were generated for ampH , ymdA , sbmA , gntR , ykfB , yjcZ , ynjH , hypD , and uhpT . 
+ Motility on semi-solid agar was determined for each deletion strain . 
+ None of the deletions resulted in a substantial change in motility ( Figure S7 ) , suggesting that these genes do not directly contribute to motility under the conditions tested . 
+ Discussion
+ FlhDC regulon
+ To our knowledge , this study provides the first look at genome-wide in vivo binding of FlhD and FlhC . 
+ We have redefined the direct FlhDC regulon by identifying new targets and showing that many previously reported non-flagellar targets are incorrect or not bound under these conditions ( Table 1 , Figure S2 ) . 
+ Our data support a surprisingly limited direct FlhDC regulon and show no evidence of either protein binding DNA independently . 
+ Under the conditions tested , FlhDC binds 11 loci and directly regulates 11 transcriptional units . 
+ However , it is important to note that some binding sites are associated with regulation of two divergent operons while others are not associated with any detectable regulation . 
+ In addition to the classical flagellar Class 2 binding sites , our study detects previously reported binding sites upstream of yecR and fliDST , and 4 novel FlhDC binding sites . 
+ Two of these novel binding sites were associated with detectable FlhDC-dependent regulation of adjacent genes ( ymdA and yciK ) , while the remaining two were not ( gntR and ampH/sbmA ) . 
+ It is likely that the two latter sites are functional , perhaps under different conditions , because the probability of two spurious sites occurring in intergenic regions is very small ( only ~11 % of the genome is intergenic sequence ) . 
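+ The intergenic argument above can be made concrete with a rough calculation. The sketch below assumes that spurious binding sites would fall uniformly at random and independently across the genome; that assumption is ours, for illustration only, and is not a calculation reported by the authors.

```python
# Back-of-the-envelope check, assuming spurious sites land uniformly at random
# and independently across the genome (an illustrative assumption).
intergenic_fraction = 0.11  # approximate intergenic fraction quoted in the text
n_spurious_sites = 2

# Probability that every one of the spurious sites happens to be intergenic.
p_all_intergenic = intergenic_fraction ** n_spurious_sites
print(f"P(both spurious sites are intergenic) = {p_all_intergenic:.4f}")
```

+ Under these assumptions the chance that two spurious sites both land in intergenic sequence is about 1 % .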
+ None of the genes surrounding the 4 novel binding sites have an obvious functional connection to flagellar synthesis , and deletion of these genes had no detectable effect on motility ( Figure S7 ) . 
+ The RNA-seq data provided little evidence of an extensive indirect FlhDC regulon . 
+ While 183 genes were differentially regulated but not associated with FlhDC or FliA binding , the magnitude of the indirect regulation is significantly smaller than for direct targets ( Figure 2 ; Kruskal-Wallis , p < 0.0001 ) . 
+ FliZ is the only transcription factor regulated directly by either FlhDC or FliA , and we do not see regulation of described FliZ targets [ 58 ] . 
+ Therefore , we doubt that FliZ-dependent regulation substantially contributes to the indirect FlhDC-dependent regulation detected in this study . 
+ Instead , it is likely that the indirectly regulated genes do not represent an additional level of the FlhDC transcriptional network but result from physiological and metabolic changes associated with flagellar motility . 
+ Despite having a degenerate consensus motif , FlhDC appears to bind DNA highly specifically - at only 11 sites throughout the genome . 
+ The presence of a motif alone is insufficient for binding in vivo , even at sites that are bound in vitro ( Figure S2 ) . 
+ We propose that as-yet unidentified factors such as DNA conformation or competition with nucleoid-associated proteins play a role in the 
+ Although the presence of a motif and in vitro binding does not always predict in vivo FlhDC behavior , a striking pattern emerges for flagellar Class 2 sites . 
+ There is a perfect correlation between reported in vitro FlhDC affinities [ 54 ] , expression kinetics [ 59 ] , and in vivo FlhDC occupancy ( this study ) . 
+ Coupled with the σ70 ChIP data presented here ( Figure 4 ) , this suggests a simple mechanism of transcriptional activation : σ70-RNAP is unable to bind these promoters in the absence of FlhDC , but as its concentration increases , FlhDC binds DNA and recruits σ70-RNAP to promoters in the order of relative FlhDC binding affinity . 
+ This model of affinity-based temporal ordering has been commonly predicted in transcriptional networks [ 60 ] . 
+ Furthermore , this suggests that for transcription factors with a similar recruitment-based mechanism of action , relative in vivo occupancy derived from ChIP-seq can be used to predict temporal 
+ FliA regulon
+ By coupling ChIP-seq and RNA-seq , we have comprehensively identified FliA binding sites and matched some of these with FliA-dependent transcripts . 
+ We identified 52 FliA binding sites , 35 of which are novel . 
+ Furthermore , we have demonstrated that existing computational approaches [ 30,32,33,51 ] are poor predictors of in vivo FliA binding . 
+ Our data confirm all the better-characterized members of the FliA regulon while discounting most promoters for which only bioinformatic predictions exist 
+ Under the conditions used in our study , 14 out of 52 FliA binding sites were associated with significant changes in the RNA levels of one or more surrounding genes ( wild-type vs. ΔfliA ) . 
+ This suggests , as will be discussed below , that a large number of promoter sequences bind FliA but are relatively inactive . 
+ Similar to FlhDC , the RNA-seq data provide little evidence for an extensive indirect FliA regulon . 
+ Only 40 genes were differentially regulated but not associated with FliA binding . 
+ The magnitude of indirect regulation was dramatically smaller than direct regulation ( Figure 6 ; Mann-Whitney , p < 0.0001 ) . 
+ As with FlhDC , it seems likely that indirect regulation is due to secondary physiological and metabolic changes associated with flagellar motility . 
+ Consistent with this idea , 29 of the 40 indirectly regulated genes are also indirectly regulated by FlhDC . 
+ Non-canonical FliA binding sites
+ The most striking finding from our FliA ChIP-seq experiments is that more than half of FliA binding sites are inside genes . 
+ While intragenic binding has been reported for other σ-factors and for many transcription factors , the proportion of intragenic FliA sites is remarkable . 
+ Similar to our findings , 40 % of the sites bound by the Mycobacterium tuberculosis σ-factor , SigF , are intragenic . 
+ Some of these sites are associated with SigF-dependent transcription in the antisense orientation relative to the overlapping gene [ 61 ] . 
+ RpoH ( σ32 ) has been reported to bind many intragenic sites , but these sites only account for ~25 % of the total sites bound ( 22 out of 87 , [ 62 ] ) . 
+ Lastly , a recent study reported that σ70 can bind and initiate transcription at a large number of intragenic sites , but that this phenomenon is repressed at many locations by the nucleoid-associated protein H-NS [ 63 ] . 
+ Combined with our FliA results , these studies show that intragenic transcription initiation must be taken into account when analyzing genome-wide data . 
+ Of the 30 intragenic FliA sites identified in our study , only 5 are associated with detectable changes in RNA levels . 
+ Three intragenic promoters appear to drive transcription of canonical mRNAs for the downstream gene ( s ) . 
+ Two of these promoters have been accurately predicted before ( inside flhC driving motAB-cheAW [ 32,64 ] , and inside yafY driving ykfB [ 20 ] ) , but one is novel ( inside yjdA driving yjcZ ) . 
+ A FliA promoter upstream of yjdA has been incorrectly predicted , based on expression data [ 20 ] . 
+ These three sites function as canonical promoters and are likely inside genes simply due to the spatial constraints of a small genome . 
+ Additionally , two novel intragenic FliA binding sites are associated with detectable small , intragenic RNAs . 
+ The FliA promoter inside uhpT transcribes a small RNA overlapping the 3' end of the ORF in the sense orientation ( Figure 7D ) , while the FliA promoter inside hypD drives transcription of an antisense RNA ( Figure 7E ) . 
+ A FliA promoter upstream of uhpT has been predicted based on expression data [ 20 ] , but our data clearly show that the FliA-dependent transcript is not an mRNA . 
+ While the FliA promoters inside uhpT and hypD are associated with detectable transcription , the functions of the corresponding RNAs are not known . 
+ Neither intragenic RNA shows evidence of regulating the overlapping mRNA . 
+ It is possible that one or both of these RNAs regulate trans-encoded targets . 
+ Most intragenic FliA binding sites are not associated with detectable transcripts . 
+ Furthermore , most of these putative promoters are greater than 300 nucleotides upstream of the start codon of an adjacent gene , making it improbable that they drive transcription of canonical mRNAs . 
+ More likely , these promoters , if active , transcribe non-coding RNAs . 
+ However , since no changes in local RNA levels were detected flanking these binding sites , it remains unclear whether these FliA binding sites represent functional FliA-dependent promoters . 
+ Many studies have reported widespread spurious transcription , most of which is rapidly terminated and produces highly unstable transcripts [ 65 ] . 
+ Therefore , these promoters may be active but the resulting RNAs are too unstable to detect by standard RNA-seq methods . 
+ Three of these intragenic promoters were previously predicted in E. coli and two of the three , those inside galK and speA , showed FliA-dependent activity when cloned upstream of the chloramphenicol acetyltransferase reporter gene [ 64 ] . 
+ This supports the hypothesis that at least some of the intragenic FliA binding sites are functional promoters , even if no transcripts were detectable in our study . 
+ Alternatively , these binding sites could represent promoter-like sequences where FliA : RNAP can bind but can not initiate transcription under the conditions tested , or potentially , ever . 
+ Regardless of their transcriptional activity , intragenic FliA sites could serve to alter the available pool of FliA , indirectly affecting transcription from canonical promoters , as has been proposed for some transcription factors [ 66,67 ] . 
+ Network topology and the consequences of dual regulation
+ Over the last 15 years there has been a growing interest in modeling cellular processes as networks [ 60 ] . 
+ Several frequently occurring network motifs have been identified , with feed-forward loops ( FFLs ) being one of the most common . 
+ In FFLs , one regulator regulates another regulator , and both regulate a common target gene ( or genes ) [ 68 ] . 
+ A quantitative computational model of the flagellar network has been constructed [ 23 ] and is considered a seminal work in the field of network modeling [ 69 ] . 
+ A key aspect of the existing flagellar network model is that all Class 2 genes are modeled as FFLs with dual FlhDC/FliA input . 
+ However , our data clearly indicate that three Class 2 operons ( 10 genes ) are not dual regulated ( Figure S6 , Figure 9A ) . 
+ This implies that the true behavior of the transcriptional network can not be accurately predicted by the existing model . 
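+ The feed-forward loop definition above can be illustrated with a small sketch that enumerates FFLs in a list of regulator-to-target edges. The edge list is a toy subset drawn from relationships described in this text (FlhDC activates fliA, fliDST is dual regulated, flhBAE is FlhDC-only); the function name and data layout are hypothetical and are not part of the published model.

```python
def feed_forward_loops(edges):
    """Return sorted (X, Y, Z) triples where X regulates Y and both regulate Z."""
    targets = {}
    for regulator, target in edges:
        targets.setdefault(regulator, set()).add(target)

    loops = []
    for x, x_targets in targets.items():
        for y in x_targets:
            # Y must itself be a regulator, distinct from X.
            if y == x or y not in targets:
                continue
            # Z is any gene regulated by both X and Y.
            for z in targets[y] & x_targets:
                if z not in (x, y):
                    loops.append((x, y, z))
    return sorted(loops)


# Toy edge list; FliA stands for the fliA gene product.
toy_edges = [
    ("FlhDC", "FliA"),
    ("FlhDC", "fliDST"),
    ("FliA", "fliDST"),
    ("FlhDC", "flhBAE"),  # Class 2 operon with no FliA input in this study
]
print(feed_forward_loops(toy_edges))
```

+ In this toy network only fliDST sits in an FFL, while flhBAE has a single FlhDC input, which mirrors the distinction the text draws between dual-regulated and FlhDC-only Class 2 operons.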
+ In addition to clarifying the overall topology of the network , our genome-wide and targeted experiments have revealed interesting information about the regulatory inputs for specific dual targets . 
+ One striking pattern that emerges is that , while FliA binding is readily detected at all dual targets , RNA levels detected by RNA-seq and qRT-PCR only change moderately between wild-type and ΔfliA . 
+ In most cases , expression of dual regulated genes is affected less than 4-fold by fliA deletion , but more than 100-fold by flhDC deletion ( Figure 8 ) . 
+ This suggests FliA-dependent transcription is contributing very little to the overall abundance of dual regulated transcripts under these conditions , despite FliA being present at the associated promoters . 
+ Despite the seemingly low level of activity of these FliA promoters , when fliA is overexpressed in a ΔflhD strain , expression of fliD , flgM , and flgK returns to wild-type or higher levels . 
+ This demonstrates that these promoters do have the capacity to be very active , but it remains to be seen if FliA levels and activity are ever high enough to trigger high-level promoter activity in wild-type cells . 
+ Interestingly , fliA overexpression results in lower fliL expression in both ΔflhD and ΔfliA strains compared to wild-type ( Figure 8E ) . 
+ Since the FliA promoter is downstream of the FlhDC-dependent σ70 promoter , we speculate that high FliA : RNAP occupancy yields little FliA-dependent transcription and actually competes with σ70 : RNAP at the level of DNA binding to repress transcription from the upstream promoter . 
+ Overall , our data suggest that flagellar genes likely have diverse temporal expression patterns due to the diversity of regulatory inputs at each promoter ( Figure 9A ) . 
+ Now that we have systematically identified dual-regulated genes , it is tempting to surmise why certain genes are under dual control , and others are not . 
+ The correlation between the gene expression class ( Class 2 or 3 ) and the developmental stage at which the resulting proteins are utilized has been described ( reviewed in [ 3 ] ) . 
+ However , dual-regulated gene products show less of a pattern in localization or correlation with a specific phase of assembly ( Figure 9B ) . 
+ In Salmonella , the Class 3 promoters of fliDST and flgKL have been shown to be required for swarming and for rapid repair of sheared flagella [ 57 ] . 
+ The authors hypothesized that these genes utilize their Class 2 promoters during de novo flagellar assembly but are transcribed from the Class 3 promoters , independent of FlhDC activity , to repair flagella broken during swarming . 
+ Wozniak et al. also suggest that flgMN is dual regulated so that FlgM can fine-tune FliA activity throughout flagellar assembly and so that FlgN can assist in FlgK folding and transport during flagellar repair [ 57 ] . 
+ The final dual-regulated operon identified in our study , fliLMNOPQR , encodes components of the C-ring and secretion apparatus that localize to the proximal portion of the flagella ( Figure 9B ) . 
+ Our results suggest that high levels of FliA may repress this operon ( Figure 8E ) but it is unclear why these components would be specifically targeted by negative feedback . 
+ Alternatively , FliA may positively regulate the fliL operon at some stage of flagellar development but it is equally unclear why this would be required . 
+ The specific roles of the Class 2 and Class 3 promoters of dual-regulated targets should be explored further to determine the physiological importance of the complex temporal gene expression patterns resulting from dual regulation . 
+ Conclusions
+ We have identified many new targets of FlhDC and FliA , including many non-canonical FliA binding sites . 
+ Additionally we have shed new light on the complex topology of the network by systematically defining Class 2 , Class 3 , and dual-regulated targets . 
+ Finally , we have demonstrated that the combined application of ChIP-seq and RNA-seq to related regulators provides sufficient data to build a transcriptional network model from scratch or redefine even the best-characterized networks , such as the flagellar transcription network . 
+ Materials and Methods
+ Strains and plasmids
+ Bacterial strains used in this work are listed in Table S4 . 
+ Cells were grown in Lysogeny Broth ( LB : 1 % NaCl , 1 % tryptone , 0.5 % yeast extract ) or Tryptone broth ( TB : 1 % tryptone , 0.5 % NaCl ) . 
+ Strains harboring plasmids were cultured with the appropriate antibiotic , as listed in Table S4 . 
+ Primers used in strain construction are described in Table S5 . 
+ Epitope-tagged strains ( DMF11 , DMF14 , RPB081 ) were generated using the FRUIT method of recombineering [ 70 ] . 
+ All epitope-tagged strains originate from AMD052 ( MG1655 ΔthyA ) , which has a poorly motile phenotype . 
+ FliA was N-terminally tagged by inserting a 3×FLAG tag after the third amino acid . 
+ FlhD and FlhC were tagged by inserting 3×FLAG tags into internal loop regions ( Figure S1A ) . 
+ Internal sites were chosen since the termini of both proteins are known or predicted to participate in protein-protein and/or protein-DNA contacts [ 42,43 ] . 
+ Following motility selection ( see below ) , the region upstream of flhDC and the flhDC coding region were sequenced in each strain ( Wadsworth Center Applied Genomic Technologies Core ; File S1 ) . 
+ Strains DMF62 -- 65 were also generated using FRUIT . 
+ In each strain , the thyA cassette was inserted ( facing away from flhDC ) at a precise location upstream of flhDC . 
+ thyA insertions were confirmed by PCR and sequencing ( Wadsworth Center Applied Genomic Technologies Core ) . 
+ Deletion strains ( ΔflhD , ΔflhC , and ΔfliA ) were also generated using recombineering . 
+ First , DMF35 was generated from AMD052 by motility selection , as described below . 
+ With DMF35 as the common parent strain , FRUIT was used to generate scarless deletions of flhD ( DMF38 ) and fliA ( DMF40 ) . 
+ The flhC deletion strain ( DMF58 ) was generated by amplifying the flhC::kan allele from the Keio collection [ 71 ] , electroporating it into DMF35 , and then removing the kan cassette by expressing FLP recombinase from pCP20 [ 72 ] . 
+ Following tag insertion or gene deletion , thyA was reintroduced at its native locus and strains were cured of all plasmids . 
+ Strains DMF50 -- 57 were also generated from AMD052 , using FRUIT to replace the genes of interest with the thyA gene . 
+ Note that these strains lack thyA at its native locus , but have a thyA+ phenotype . 
+ Overexpression plasmids were generated by cloning the ORF of interest into pBAD24 [ 73 ] cut with NheI and SphI ( NEB ) using the In-Fusion method ( Clontech ) . 
+ All deletion strains and plasmids were verified by PCR and sequencing ( Wadsworth Center Applied Genomic Technologies Core ) . 
+ Motility selection to generate motile isolates
+ Our lab isolate of MG1655 , and the related strain AMD052 , displayed a poorly motile phenotype . 
+ PCR using JW3100 and JW5358 demonstrated that these strains lacked an IS element upstream of the flhDC , which is required for high-level motility . 
+ Epitope tagged strains were generated from AMD052 , and were thus initially poorly motile as well . 
+ To isolate highly motile derivatives , saturated overnight cultures of strains AMD052 and preliminary epitope-tagged strains were spotted ( 5 µL ) onto soft TB agar ( 0.3 % ) and incubated at 30 °C . 
+ Motile subpopulations began emerging between 20 and 24 hours . 
+ We typically observed 1 -- 5 motile subpopulations on each plate . 
+ Motile cells were collected from stabs , yielding motile strains DMF35 , DMF11 , DMF14 , and RPB081 . 
+ IS element insertion was verified by PCR and sequencing with oligonucleotides JW3100 and JW5358 ( Wadsworth Center Applied Genomic Technologies Core ; File S1 ) . 
+ Motility assays
+ Overnight cultures were grown in TB at 30 °C . 
+ Saturated cultures ( 5 µL ) were spotted onto 155 mm TB soft agar ( 0.2 % ) plates and incubated at 30 °C . 
+ Each plate included DMF36 for reference and 1 -- 4 other strains of interest . 
+ Each assay was performed 5 times from independent overnight cultures . 
+ Images were taken at hourly intervals from 4 -- 6 hours post-inoculation . 
+ Representative images are shown in Figure S1 and S7 . 
+ ChIP-qPCR
+ Strains DMF11 , DMF14 , RPB081 , and DMF36 were used for all ChIP experiments . 
+ Subcultures were grown in LB at 37 °C with aeration to an OD600 of 0.5 -- 0.7 . 
+ Cultures were harvested , crosslinked , sonicated , and immunoprecipitated as previously described [ 74 ] , with minor modifications . 
+ Anti-FLAG ( 2 µL per IP , M2 monoclonal ; Sigma ) or anti-σ70 ( 1 µL per IP ; Neoclone ) antibodies were used for all immunoprecipitations , both for ChIP-qPCR and ChIP-seq . 
+ ChIP and input DNA were purified using the Zymo PCR Clean and Concentrate kit . 
+ Samples were analyzed using qPCR as previously described [ 74 ] . 
+ Oligonucleotides used to amplify the bglB control region and regions of interest are described in Table S6 . 
+ ChIP-qPCR experiments utilized 3 -- 7 biological replicates per strain . 
+ Complete ChIP-qPCR data is presented in Table S1 . 
+ ChIP-seq library preparation and sequencing
+ Cultures for ChIP-seq experiments were grown as described for ChIP-qPCR . 
+ ChIP-seq libraries were constructed and sequenced as previously described [ 63 ] with the exception of one replicate of FlhC-FLAG , which was prepared as described [ 37 ] . 
+ Antibodies used were the same as those used for ChIP-qPCR . 
+ ChIP-seq libraries were constructed and sequenced for 2 biological replicates per strain . 
+ Sequencing was performed on an Illumina Hi-Seq instrument ( University at Buffalo , SUNY ) . 
+ Sequences were aligned to the MG1655 genome ( NC_000913.2 ) using the CLC Genomics Workbench . 
+ Mapped reads were piled up and written to a .gff file using a custom Python script and viewed in SignalMap ( Nimblegen ) . 
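+ The authors' pile-up script is not shown in the paper. As a rough illustration of what such a step involves, the sketch below computes per-base read depth from read intervals and formats it as nine-column, tab-separated GFF lines. The function names, the "pileup" and "coverage" labels, and the choice to write one line per covered base are all assumptions made for this example.

```python
def pileup(reads, genome_length):
    """Per-base read depth from (start, end) intervals, 1-based and inclusive."""
    diff = [0] * (genome_length + 1)
    for start, end in reads:
        diff[start - 1] += 1  # depth rises at the first covered base
        diff[end] -= 1        # and falls just past the last covered base
    coverage, running = [], 0
    for i in range(genome_length):
        running += diff[i]
        coverage.append(running)
    return coverage


def gff_lines(coverage, seqid, strand):
    """Yield one GFF line per covered base, with read depth in the score column."""
    for pos, depth in enumerate(coverage, start=1):
        if depth:
            yield "\t".join(
                [seqid, "pileup", "coverage", str(pos), str(pos), str(depth), strand, ".", "."]
            )


demo_coverage = pileup([(1, 3), (2, 4)], genome_length=5)
demo_lines = list(gff_lines(demo_coverage, "NC_000913.2", "+"))
```

+ Plus- and minus-strand reads would be piled up separately , since the peak calling described below treats the two strands independently .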
+ All ChIP-seq images presented in this study are captured from SignalMap and manipulated in the image editing software GIMP to highlight baselines ( zero reads ) and fill gaps in the data resulting from image artifacts . 
+ Almost all ChIP-seq analysis programs have been designed and optimized for eukaryotic ChIP-seq data and , in our experience , do not perform well with bacterial ChIP-seq data . 
+ We have generated custom Python scripts to identify peaks in bacterial ChIP-seq data . 
+ First , all datasets were normalized to 100 million reads . 
+ Pairs of replicate datasets were considered together . 
+ For each replicate dataset in the pair , an appropriate threshold was determined . 
+ The plus and minus strands were considered separately . 
+ For the first replicate , for a given strand , a value T1 was selected as the threshold . 
+ For the second replicate , a value T2 was selected as the threshold . 
+ Values for T1 and T2 were considered between 1 and 1000 . 
+ For each combination of values for T1 and T2 , the number of genome positions with values ≥ T1 in the first replicate and with values ≥ T2 in the second replicate was determined . 
+ The false discovery rate was estimated using the null hypothesis that no regions are enriched . 
+ The combination of thresholds yielding the highest number of true positive positions , with an estimated false discovery rate of less than 0.01 , was selected . 
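+ The threshold search described above can be sketched as a grid search over ( T1 , T2 ). The paper does not spell out how the false discovery rate is estimated, so the sketch assumes that under the null hypothesis the two replicates are independent, which gives an expected number of chance co-occurrences of n × P( rep1 ≥ T1 ) × P( rep2 ≥ T2 ). That null model, the function name, and the data layout are assumptions for illustration, not the authors' code.

```python
def choose_thresholds(rep1, rep2, candidates, max_fdr=0.01):
    """Pick (T1, T2) maximizing estimated true positives with estimated FDR < max_fdr.

    rep1 and rep2 are per-position read depths for one strand of two replicates.
    Returns (estimated_true_positives, T1, T2, estimated_fdr), or None.
    """
    n = len(rep1)
    best = None
    for t1 in candidates:
        frac1 = sum(v >= t1 for v in rep1) / n
        for t2 in candidates:
            frac2 = sum(v >= t2 for v in rep2) / n
            observed = sum(a >= t1 and b >= t2 for a, b in zip(rep1, rep2))
            if observed == 0:
                continue
            # Assumed null model: replicates are independent, so chance overlap
            # is the product of the marginal fractions times the genome size.
            expected_false = n * frac1 * frac2
            fdr = expected_false / observed
            true_positives = observed - expected_false
            if fdr < max_fdr and (best is None or true_positives > best[0]):
                best = (true_positives, t1, t2, fdr)
    return best
```

+ In the procedure described in the text , candidate values would range from 1 to 1000 , datasets would first be scaled to 100 million reads , and the search would be run separately for each strand .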
+ Once T1 and T2 were chosen , peak calling was performed as previously described ( Supplementary Material of [ 54 ] ) . 
+ Briefly , a region was identified as a peak if both replicates showed enrichment above the corresponding thresholds for each strand . 
+ For a peak to be called there must be a peak on the plus strand within a threshold distance of a peak on the minus strand , as previously described ( Supplementary Material of [ 54 ] ) . 
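+ The strand-pairing requirement above can be sketched as follows. Peak positions are represented as single coordinates and the distance cutoff is a free parameter , because the actual cutoff is given only in the cited supplementary material ; the function name and the midpoint convention are likewise assumptions.

```python
def pair_strand_peaks(plus_peaks, minus_peaks, max_distance):
    """Return (plus, minus, midpoint) for each plus-strand peak that has a
    minus-strand peak within max_distance bp, using the nearest partner."""
    paired = []
    for p in sorted(plus_peaks):
        partners = [m for m in minus_peaks if abs(m - p) <= max_distance]
        if partners:
            nearest = min(partners, key=lambda m: abs(m - p))
            paired.append((p, nearest, (p + nearest) // 2))
    return paired
```

+ For example , with a cutoff of 100 bp a plus-strand peak at 100 and a minus-strand peak at 160 would be paired , while an isolated plus-strand peak at 5000 would be discarded .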
+ To identify regions of artifactual enrichment , peaks identified in tagged strains were compared to those called in a control ChIP-seq experiment using an untagged strain ( DMF35 ) . 
+ For each factor , the calculated T values were adjusted to reflect the total number of reads in control experiment replicates and then applied for peak calling in the controls . 
+ Any regions for which a peak was called in the true ChIP-seq experiment and in the untagged control experiment within 50 bp of each other were considered potential artifacts and excluded from further analysis . 
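+ The control-based filter above amounts to dropping any ChIP peak that lies within 50 bp of a peak called in the untagged control. A minimal sketch , assuming peaks are summarized as single genome coordinates ( the function name is hypothetical ):

```python
def remove_control_peaks(chip_peaks, control_peaks, window=50):
    """Keep only ChIP peaks with no control peak within `window` bp."""
    return [
        p for p in chip_peaks
        if all(abs(p - c) > window for c in control_peaks)
    ]
```

+ A peak exactly 50 bp from a control peak is treated as a potential artifact here , following the "within 50 bp" wording in the text .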
+ RNA isolation and RNA-seq library preparation
+ Strains DMF36 , DMF38 , DMF58 , and DMF40 were used for RNA-seq experiments . 
+ Subcultures were grown in LB at 37 °C with aeration to an OD600 of 0.5 -- 0.7 . 
+ RNA was purified using a modified hot phenol method , as previously described [ 37 ] . 
+ Following isolation , RNA was treated with DNase ( TURBO DNA-free kit ; Life Technologies ) for 45 minutes at 37 °C , followed by phenol extraction and ethanol precipitation . 
+ rRNA was removed using the RiboZero kit ( Epicentre ) and strand-specific DNA libraries were constructed using the ScriptSeq 2.0 kit ( Epicentre ) . 
+ Sequencing was performed using an Illumina Hi-Seq instrument ( University at Buffalo , SUNY ) . 
+ RNA-seq data visualization and differential expression analysis
+ Sequence reads were mapped and visualized as described above . 
+ Differential expression analysis was performed using Rockhopper [ 36 ] with default parameters . 
+ As suggested by the developers , changes in gene expression were considered statistically significant if q-value ≤ 0.01 . 
+ Genes were required to be regulated at least 2-fold to be considered significantly differentially regulated . 
+ Ribosomal RNAs ( rRNAs ) were excluded from RNA-seq analysis since rRNA was removed during library preparation . 
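+ The two significance criteria above ( q-value ≤ 0.01 and at least 2-fold regulation ) can be combined in a small filter. This sketch works on generic normalized expression values rather than on Rockhopper's actual output format , and the pseudocount used to avoid division by zero is our assumption , not something stated in the text.

```python
def significantly_regulated(expr_a, expr_b, q_value,
                            max_q=0.01, min_fold=2.0, pseudocount=1.0):
    """True if a gene passes both the q-value and the fold-change cutoffs."""
    a = expr_a + pseudocount  # pseudocount (assumed) guards against zero expression
    b = expr_b + pseudocount
    fold = max(a, b) / min(a, b)  # direction-independent fold change
    return q_value <= max_q and fold >= min_fold
```

+ With these defaults , a gene expressed at 99 in wild-type and 9 in a deletion strain with q = 0.001 passes , whereas the same gene with q = 0.05 does not .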
+ RNA isolation and qRT-PCR
+ Strains pDMF9 -- 18 were used for qRT-PCR experiments . 
+ Subcultures were grown in LB + 100 µg/ml ampicillin at 37 °C with aeration to an OD600 of 0.4 -- 0.6 . 
+ Arabinose was added to a final concentration of 0.2 % and cultures were incubated at 37 °C for an additional 10 minutes . 
+ RNA was isolated as follows . 
+ 10 ml of culture were harvested by centrifugation , resuspended in 1 ml Trizol reagent ( Life Technologies ) and incubated at room temperature for 5 min . 
+ Samples were centrifuged , supernatants were transferred to new tubes , and 200 µL of chloroform was added . 
+ Samples were mixed well , incubated at room temperature for 3 minutes , and centrifuged . 
+ The aqueous layers were transferred to new tubes and mixed with 500 µL isopropanol . 
+ Samples were mixed well , incubated at room temperature for 10 minutes , and centrifuged to collect precipitated RNA . 
+ Pellets were washed once with 75 % ethanol and then dried . 
+ RNA was resuspended in H2O and treated with TURBO DNase ( Life Technologies ) as described above . 
+ RNA was reverse transcribed using SuperScript III reverse transcriptase ( Invitrogen ) with 100 ng of random hexamer , according to the manufacturer 's instructions . 
+ A control reaction lacking reverse transcriptase ( no RT ) was performed for each sample . 
+ cDNA and no RT samples were diluted and used as templates for qPCR . 
+ qPCR was performed with an ABI 7500 Fast real time PCR machine . 
+ Oligonucleotides used to amplify target genes , and the control ( mreB ) are listed in Table S7 . 
+ Relative expression values were calculated using a modified 2^( -ΔΔCt ) method [ 75 ] . 
+ First , target Ct values were normalized to the control ( mreB ) yielding ΔCt values , then 1.9^( -ΔCt ) ( assuming imperfect PCR efficiency ) was calculated for each target in each strain . 
+ Finally , expression values in all strains were normalized to the average 1.9^( -ΔCt ) value in motile MG1655 + pBAD . 
+ qRT-PCR experiments utilized 3 -- 4 biological replicates per strain and all qPCR reactions were performed in triplicate . 
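+ The relative expression calculation described above can be sketched in a few lines. The sketch follows the stated steps : ΔCt is the target Ct minus the mreB control Ct , expression is 1.9 raised to the power of -ΔCt , and values are divided by the mean reference-strain value. Function names and argument layout are hypothetical.

```python
def relative_expression(ct_target, ct_control, efficiency=1.9):
    """Expression of a target relative to the control gene: efficiency ** (-dCt)."""
    delta_ct = ct_target - ct_control
    return efficiency ** (-delta_ct)


def normalize_to_reference(values, reference_values):
    """Divide each value by the mean of the reference-strain values."""
    reference_mean = sum(reference_values) / len(reference_values)
    return [v / reference_mean for v in values]
```

+ Using 1.9 rather than 2 as the base means that each additional cycle of delay corresponds to a 1.9-fold , not 2-fold , drop in inferred template abundance .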
+ Supporting Information
+ locations for internal 3×FLAG-tagging of FlhD and FlhC . 
+ Black lines represent unstructured regions , red cylinders represent α-helices , and blue boxes represent β-sheets . 
+ Gold stars represent Zn-binding cysteine residues . 
+ Insets show amino acid sequence surrounding tag insertion sites . 
+ ( B ) Location , identity , and direction of IS element insertions present in each motility-selected strain . 
+ The boxes represent regions that are duplicated during insertion . 
+ ( C ) Soft agar motility of motile MG1655 and strains in which a thyA cassette has been inserted in each of the IS element insertion locations described above . 
+ ( D ) Soft agar motility of motile MG1655 and epitope-tagged strains . 
+ ( E ) Soft agar motility of motile MG1655 , isogenic deletions , and complemented strains . 
+ Note that ΔflhC + pflhC is non-motile due to disruption of the promoter of the motAB-cheAW operon . 
+ ( PDF ) is not predictive of in vivo binding . 
+ Stafford et al. [ 25 ] predicted FlhDC binding sites based on the consensus motif of characterized binding sites . 
+ The sites shown in this figure had good matches to the consensus and demonstrated weak in vitro binding . 
+ With the exception of fliD and yecR ( not shown ) , none of the predicted sites showed in vivo FlhDC binding in targeted ChIP-qPCR assays ( n = 4 , * p < 0.05 , ** p < 0.01 ) . 
+ ( PDF ) 
+ Expression of all genes in ΔflhD versus ΔflhC . 
+ Gene expression values represent normalized expression values calculated by Rockhopper . 
+ ( PDF ) and Cho et al. [ 55 ] . 
+ Motifs were generated from sequence surrounding binding sites identified only in our study ( n = 27 ) or only in Cho et al. ( n = 29 ) . 
+ ( A ) FliA binding sites unique to our study yielded a highly significant motif ( 27/30 sites , E-value = 1.5e-27 ) similar to that described for FliA . 
+ ( B ) The best-scoring motif for FliA binding regions unique to Cho et al. is not significantly enriched , and shows no similarity to described FliA motifs ( best-scoring motif : 10/29 , E-value = 3.5 ) . 
+ It should also be noted that Cho et al. failed to detect some well-characterized FliA promoters such as those upstream of fliAZY and fliC . 
+ ( PDF ) 
+ RT-PCR using an upstream primer within flgJ and a downstream primer within flgK yielded a band of the expected size in motile MG1655 , but not in ΔflhD . 
+ This confirms that flgKL can be transcribed as part of the upstream FlhDC-dependent operon . 
+ Lanes labeled `` colony '' are a colony ( genomic DNA ) PCR control , `` + RT '' are RT-PCR , and `` -RT '' are controls in which no reverse transcriptase was added during cDNA synthesis . 
+ ( B ) RT-PCR using an upstream primer within cheZ and a downstream primer within flhBAE yielded very little product . 
+ Product slightly increased , relative to the -RT control , when fliA was overexpressed . 
+ This small amount of read-through at very high FliA levels is unlikely to be physiologically relevant . 
+ ( PDF ) 
+ Expression of flgA , flgB , and fliF can not be rescued by overexpression of FliA in ΔflhD . 
+ Expression of flhB is moderately increased in the ΔflhD + pBAD-fliA strain , potentially due to read-through from the upstream FliA-dependent tar-tap-cheRBYZ operon ( see Figure S4 ) . 
+ ( PDF ) required for motility . 
+ Soft agar motility of motile MG1655 and single gene deletions of FlhDC and FliA target genes . 
+ Images are representative of 5 biological replicates per strain . 
+ ( PDF ) 
+ Acknowledgments
+ We thank Dave Grainger , James Galagan and Todd Gray for comments on the manuscript . 
+ We thank Pascal Lapierre for help implementing the ChIP-seq peak-calling algorithm . 
+ We thank Keith Derbyshire , Randy 
+ Author Contributions
+ Conceived and designed the experiments : DMF JTW . 
+ Performed the experiments : DMF . 
+ Analyzed the data : DMF JTW . 
+ Contributed reagents / materials/analysis tools : RPB . 
+ Contributed to the writing of the manuscript : DMF JTW .
\ No newline at end of file
--- a/data/TEXT_FILES/useful_txt/25375160.txt 0 → 100644
View file @27818a9
+++ b/data/TEXT_FILES/useful_txt/25375160.txt 0 → 100644
View file @27818a9
+ Characterization of the YdeO Regulon in Escherichia coli
+ Abstract 
+ Enterobacteria are able to survive under stressful conditions within animals , such as acidic conditions in the stomach , bile salts during transfer to the intestine and anaerobic conditions within the intestine . 
+ The glutamate-dependent ( GAD ) system plays a major role in acid resistance in Escherichia coli , and expression of the GAD system is controlled by the regulatory cascade consisting of EvgAS → YdeO → GadE . 
+ To understand the YdeO regulon in vivo , we used ChIP-chip to interrogate the E. coli genome for candidate YdeO binding sites . 
+ All of the seven operons identified by ChIP-chip as being potentially regulated by YdeO were confirmed as being under the direct control of YdeO using RT-qPCR , EMSA , DNaseI-footprinting and reporter assays . 
+ Within this YdeO regulon , we identified four stress-response transcription factors , DctR , NhaR , GadE , and GadW and enzymes for anaerobic respiration . 
+ Both GadE and GadW are involved in regulation of the GAD system and NhaR is an activator for the sodium/proton antiporter gene . 
+ In conjunction with co-transcribed Slp , DctR is involved in protection against metabolic end products under acidic conditions . 
+ Taken all together , we suggest that YdeO is a key regulator of E. coli survival in both acidic and anaerobic conditions . 
+ Introduction
+ Enterobacteria such as Escherichia coli exist in the environment and in the gut of warm-blooded animals . 
+ To survive this switch in lifestyles , and upon ingestion by a new host , bacteria are directly exposed to various stresses and hence require sophisticated stress response systems to survive continuous changes in environment such as acidic conditions in the stomach , bile salts , and anaerobic conditions within the intestines [ 1 ] . 
+ For survival under acidic conditions , E. coli possesses three amino acid-dependent acid resistance systems with glutamate , arginine , and lysine [ 2,3,4 ] . 
+ The resistance mechanism involves the transient consumption of the intracellular proton by glutamate , arginine and lysine decarboxylases , and exchange of the amine products with extracellular amino acids through their respective antiporters [ 2,3,5,6 ] . 
+ The most effective system of acid resistance is the GAD ( glutamic acid-dependent ) system which is composed of two glutamate decarboxylase isozymes , GadA and GadB , and the cognate antiporter GadC . 
+ Expression of these components is under the control of a complex network of transcription factors , including GadE , GadX , GadW , EvgA , YdeO , and H-NS [ 1 ] . 
+ YdeO is a transcription factor , belonging to the AraC/XylS family . 
+ Knowledge about the regulatory functions of YdeO is limited except that it is known that YdeO activates transcription of the gad system components , gadE , gadA and gadBC [ 7,8,9 ] . 
+ The expression of ydeO is activated by the two-component system EvgSA [ 9,10,11 ] , forming a regulatory cascade , EvgA → YdeO → GadE [ 9,12 ] . 
+ In this study , we performed a comprehensive interrogation of YdeO-binding sites in vivo on the E. coli genome using ChIP-chip analysis , and identified a set of YdeO-regulated genes , including four stress-response transcription factors , DctR , NhaR , GadE , and GadW , and several genes involved in respiration . 
+ Taking these observations together we propose that YdeO is the regulator which coordinates the response to acid and anaerobic conditions in E. coli . 
+ Materials and Methods
+ E. coli strains and growth conditions
+ E. coli strains and plasmids used in this study are shown in Table S1 . 
+ E. coli cells were grown at 37 °C in Luria-Bertani ( LB ) medium . 
+ Cell growth was monitored by measuring the turbidity with a Mini photo 518R spectrophotometer ( Taitec ) . 
+ The standard procedure for bacterial cell cultivation in this study was as follows : A single colony was isolated from an overnight culture on an LB agar plate , and inoculated into 5 ml of fresh LB medium . 
+ This liquid culture was grown overnight at 37 °C , and the overnight culture was diluted 100-fold into fresh LB medium . 
+ The culture was incubated at 37 °C with reciprocal shaking ( 160 revolutions min⁻¹ ) for aerobiosis or without shaking for anaerobiosis . 
+ Introduction of a tagged gene into the E. coli genome
+ The introduction of a tagged gene into the E. coli genome was carried out using the method of Uzzau et al. [ 13 ] . 
+ In brief , primers were used to make PCR extensions homologous to the last portion of the targeted gene ( forward primer ) and to a region downstream of it ( reverse primer ) as follows ; YDEOF-1 ( forward ) and YDEOR-1 ( reverse ) for ydeO-3xflag ; GADE-F ( forward ) and GADE-R ( reverse ) for gadE-3xflag ; GADW-F ( forward ) and GADW-R ( reverse ) for gadW-3xflag ( Table S1 ) . 
+ DNA fragments including the 3' sequence with the flag tag and a kanamycin-resistance gene were amplified by PCR using pSUB11 as a template , a pair of primers , and Ex-Taq DNA polymerase ( Takara Bio ) . 
+ PCR products were purified using a QIAquick PCR purification kit ( Qiagen ) , and then used directly for electro-transformation . 
+ E. coli carrying a lambda-Red helper plasmid , pKD46 , was used to make competent cells , which were grown at 30 °C in LB medium supplemented with 100 µg ml⁻¹ ampicillin and 1 mM arabinose to an OD600 of 0.4 . 
+ Cells were collected by centrifugation , and washed two times with ice-cold sterile deionized water containing 10 % glycerol . 
+ Aliquots ( 50 µl ) of the bacterial suspensions in 10 % glycerol were mixed with more than 1 µg of PCR product in a chilled cuvette ( 0.2 cm electrode gap ) and subjected to a single pulse ( 2.5 kV ) by a Gene Pulser Xcell ( Bio-Rad ) . 
+ After 1 hr recovery at 37 °C in 1 ml of SOC medium ( 2 % tryptone , 0.5 % yeast extract , 10 mM NaCl , 2.5 mM KCl , 10 mM MgCl2 , 10 mM MgSO4 , 20 mM glucose ) containing 1 mM arabinose , half of the volume of electroporated bacteria in SOC medium was spread on to LB agar plates supplemented with antibiotics for the selection of kanamycin-resistant recombinants . 
+ If none grew on the agar plate after incubation overnight at 37 °C , the stored remainder was spread on to LB kan plates . 
+ The kanamycin-resistant recombinants were isolated once on LB agar at 37 °C , and then examined for ampicillin sensitivity to confirm loss of the helper plasmid . 
+ Construction of YdeO expression plasmids
+ To construct pYY0401 for YdeO-3xFLAG expression , DNA fragments containing the ydeO coding region were amplified by PCR using E. coli YY5001 genomic DNA , which includes the 3xflag tag at the end of ydeO , as a template , and a pair of primers , YDEOF-2 and YDEOR-3 , in which the Bam HI and Eco RI sites were included ( see sequences in Table S1 ) . 
+ After digestion of PCR products with Bam HI and Eco RI , the PCR-amplified fragments were cloned into the pTrc99A vector containing an inducible trc promoter between the Bam HI and Eco RI sites . 
+ To construct pYdeO for expression of intact YdeO , DNA fragments containing the ydeO coding region were amplified by PCR using E. coli W3110 type A [ 14 ] genomic DNA as a template and the primers , YDEOF-2 and YDEOR-2 ( see sequences in Table S1 ) . 
+ After digestion of the PCR product with Bam HI and Eco RI , the PCR-amplified fragments were ligated into the pTrc99A vector between appropriate restriction enzyme sites . 
+ To construct pYdeO-SUMO for overproduction of SUMO ( Small Ubiquitin-related MOdifier ) fused YdeO , DNA fragments containing the ydeO coding region were amplified by PCR using E. coli BW25113 genomic DNA as a template and the primers , YDEO-SUMO-F and YDEO-SUMO-R , in which 15-nt sequences homologous to the pE-SUMO vector ( Life Sensors ) digested with Bsa I were included ( see sequences in Table S1 ) . 
+ The PCR-amplified fragments were cloned into the pE-SUMO vector using In-Fusion HD cloning kit ( Clontech ) . 
+ All of the plasmids were confirmed by DNA sequencing with primers , Trc99A-F and/or Trc99A-R for pTrc99A derivatives and T7 terminator and SUMO forward for pE-SUMO derivatives . 
+ Construction of lacZ and lux reporter plasmids
+ To construct a lacZ fusion gene , the pRS552 plasmid was used as a vector for the construction of translational fusions [ 15 ] . 
+ The promoter DNA fragment was amplified by PCR using the genome of E. coli W3110 type-A strain [ 14 ] as a template and a pair of primers . 
+ The primers used were : APPC-LF and APPC-LR for pAPPC-L ; YIIS-LF and YIIS-LR for pYY0503 ; HYAA-LF and HYAA-LR for pHYAA-L ( Table S1 ) . 
+ The PCR product was digested with BamH I and/or EcoR I and then ligated into pRS552 at the corresponding sites . 
+ A nhaR-lux transcription fusion was also constructed . 
+ First , DNA fragments containing the nhaR promoter were amplified by PCR using the primers NHAR-lux-F and NHA-lux-R , which included 15-nt sequences homologous to the pLUX vector [ 16 ] digested with Xho I and Bam HI ( see sequences in Table S1 ) . 
+ The PCR-amplified fragments were cloned into the pLUX vector using In-Fusion HD cloning kit ( Clontech ) , resulting in the construction of pLUXnhaR ( Table S1 ) . 
+ All of the plasmids were confirmed by DNA sequencing using the lacZ-30R primer complementary to lacZ or Lux-R primer complementary to luxC in a vector . 
+ ChIP-chip analysis
+ The ChIP-chip assay was carried out as described in previous reports [ 17,18,19 ] with a few modifications . 
+ YY0201 ( ΔydeO ) harbouring pYY0401 ( ydeO-3xflag ) was grown to an OD600 of 0.4 then re-incubated in LB medium containing formaldehyde ( final concentration of 1 % ) at 37 °C for 30 min . 
+ The cross-linking reaction was terminated by the addition of glycine , and cells were collected , washed , re-suspended with lysis buffer , and lysed by incubation with Lysozyme . 
+ Lysed cells were dissolved in 4 ml of IP buffer containing PMSF . 
+ The sample was then sonicated 60 times for 30 sec at 30 sec intervals on ice using a BRANSON Digital Sonifier ( Branson ) . 
+ After centrifugation , the supernatant fraction ( whole cell extract ) was mixed with anti-FLAG antibody ( Sigma Aldrich ) - coated-protein A Dynal Dynabeads ( Invitrogen ) and incubated at 4 °C overnight . 
+ After washing twice with IP buffer and IP salt buffer , the DNA -- YdeO-3xFLAG complex bound to the beads was recovered by eluting with elution buffer ( 50 mM Tris -- HCl pH 7.5 , 10 mM EDTA , 1 % SDS ) . 
+ YdeO-3xFLAG in whole cell extracts and in immunoprecipitated DNA fractions was digested with Pronase ( Roche ) . 
+ DNA fragments free of cross-linked DNA -- protein were purified using a QIAquick PCR purification kit ( Qiagen ) . 
+ Recovered DNA fragments were amplified according to the random DNA amplification method using the primers , PF 43 and PF 44 described by Katou et al. [ 17 ] . 
+ PCR was performed over 30 cycles , using Phusion high-fidelity DNA polymerase ( New England Biolabs ) . 
+ Amplified DNA fragments were terminally labeled and hybridized with the custom-designed Affymetrix oligonucleotide tiling array and raw data ( CEL files ) were processed using the Array edition of the In Silico Molecular Cloning ( IMC ) software ( In Silico Biology ) as previously described [ 18,19,20 ] . 
+ To detect DNA fragments by immunoprecipitation , the signal intensities of ChIP DNA were divided by those of the supernatant ( Sup ) fraction . 
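The ChIP/Sup division described above amounts to a per-probe ratio. The sketch below illustrates only that arithmetic; the function name, the plain-list input format, and the handling of zero Sup signal are assumptions, and it does not reproduce the IMC software used in the study.

```python
def enrichment_ratios(chip_signal, sup_signal):
    """Return ChIP/Sup intensity ratios, one per tiling-array probe.

    Probes with a non-positive Sup signal yield None rather than a ratio
    (an assumed convention for this sketch).
    """
    if len(chip_signal) != len(sup_signal):
        raise ValueError("ChIP and Sup signals must cover the same probes")
    ratios = []
    for chip, sup in zip(chip_signal, sup_signal):
        ratios.append(chip / sup if sup > 0 else None)
    return ratios


# Hypothetical intensities for three probes.
example = enrichment_ratios([200.0, 50.0, 10.0], [100.0, 50.0, 0.0])
```

Probes whose ratio rises well above the background level mark candidate binding regions.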
+ Purification of the YdeO protein
+ In a typical procedure [ 21 ] , a single colony of transformed E. coli BL21 ( DE3 ) was grown to OD600 = 0.6 at 37 °C with shaking in LB medium supplemented with 100 µg ml⁻¹ ampicillin . 
+ The culture was then cooled on ice , induced with 4.5 mM IPTG , and incubated at 20 °C overnight with shaking . 
+ Cells were isolated by centrifugation and resuspended in 400 mL of lysis buffer ( 100 mM NaCl , 50 mM Tris-HCl pH 8.0 ) containing 0.2 mM PMSF . 
+ Cells were treated with lysozyme and then subjected to sonication . 
+ Triton X-100 was added to 1 % ( v/v ) and incubated on ice for 1 hr . 
+ The lysate was centrifuged , and the supernatant was decanted and stored at 4 °C . 
+ Supernatant was mixed with 2 ml of 50 % Ni-nitrilotriacetic acid ( NTA ) agarose solution ( Qiagen ) and loaded onto a column . 
+ The column was washed with 10 ml of lysis buffer containing 1 % Triton X-100 , and then washed with 10 ml of lysis buffer containing 1 % Triton X-100 and 25 mM imidazole . 
+ Proteins were eluted with 3 ml of each elution buffer ( lysis buffer containing 1 % Triton X-100 and 0.1 M , 0.2 M , 0.3 M , 0.4 M , or 0.5 M imidazole ) , and peak fractions of transcription factors were pooled and dialyzed against a storage buffer ( 50 mM Tris-HCl , pH 7.5 at 4 °C , 200 mM KCl , 10 mM MgCl2 , 0.1 mM EDTA , 5 mM DTT , and 50 % glycerol ) , and stored at -80 °C until use . 
+ Protein purity was checked on SDS-PAGE . 
+ Preparation of total RNA from E. coli cells
+ Total RNA was prepared using the hot phenol method as previously described [ 22 ] . 
+ A single colony of E. coli was grown in LB medium to OD600 = 0.3 at 37 °C with shaking . 
+ Cells were harvested and total RNAs were prepared using hot phenol . 
+ In brief , total RNA was extracted with H2O-saturated phenol and precipitated with ethanol . 
+ After digestion with RNase-free DNase I ( Takara Bio ) , total RNA was extracted with H2O-saturated phenol and precipitated with ethanol , and dissolved in RNase-free water . 
+ The concentration of total RNA was determined by measuring the absorbance at 260 nm . 
+ The purity of total RNA was checked by agarose gel electrophoresis . 
+ Transcriptome analysis
+ To prepare fluorescently labeled cDNA , total RNA ( 5 µg ) was used . 
+ We used the FairPlay III Microarray Labeling kit ( Agilent ) , CyDye Cy3 mono-reactive Dye , and CyDye Cy5 mono-reactive Dye ( GE Healthcare ) . 
+ For all experiments , two sets of RNAs , each from an independent colony , were labeled with the pair of fluorescent dyes . 
+ The mixture containing 1 µl of random hexanucleotide primers , 5 µg of total RNA , and 12 µl of DEPC-treated water was heated at 75 °C for 10 min and cooled to room temperature . 
+ After addition of 3 µl of Affinity script HC RTase ( Agilent ) , 1X Affinity script RT buffer , 1X dNTP mixture , 75 mM DTT , and 0.5 µl of RNase block to 10 µl of RNA/primer mixture product , cDNA synthesis was carried out at 42 °C for 1 hr and stopped by addition of 10 mM NaOH . 
+ The mixture was neutralized by addition of 10 mM HCl . 
+ The synthesized cDNA was purified by ethanol-precipitation and then labelled by CyDye Cy3 mono-reactive Dye or CyDye Cy5 mono-reactive Dye . 
+ The dye-coupled cDNA was purified using the micro spin cup supplied with the kit . 
+ The E. coli Gene Expression Microarray 8x15K ( Agilent ) was used . 
+ Each 300 ng of Cy3- and Cy5-labeled cDNA were mixed and added to 1X Blocking Buffer ( Agilent ) and 1X HI-RPM GE Hybridization Buffer ( Agilent ) . 
+ After precipitation of impurities , 40 µl of the labelled-cDNA mixture was applied to the DNA chip , and the hybridization was carried out at 65 °C for 17 hr . 
+ The DNA chip was washed at room temperature with Agilent Gene Expression Wash Buffer 1 ( Agilent ) and at 37 °C with Agilent Gene Expression Wash Buffer 2 ( Agilent ) . 
+ The DNA chip was scanned with an Agilent G2565CA microarray scanner Ver. 8.1 , and the intensities of both Cy3 and Cy5 were quantified by Feature Extraction Ver. 8.1 . 
+ The Cy5/Cy3 ratios were then calculated from the normalized values . 
+ RT-qPCR
+ Total RNAs were transcribed to cDNA with random primers using Primer Script 1st strand cDNA synthesis Kit ( Takara Bio ) . 
+ Quantitative PCR ( qPCR ) was conducted using SYBR Green PCR Master Mix ( Applied Biosystems ) . 
+ Pairs of primers used are described in Table S1 . 
+ The cDNA templates were twofold serially diluted and used in the qPCR assays . 
+ The qPCR reaction mixtures , each containing 12.5 µl of 2X Power SYBR Green PCR Master Mix ( Applied Biosystems ) , 0.225 µl of each primer ( 10 µM stock ) , 9.55 µl of water , and 2.5 µl of cDNA , were amplified under the following thermal cycle conditions : 50 °C for 2 min and 95 °C for 10 min , followed by 40 cycles of 15 sec at 95 °C and then 60 sec at 60 °C . 
+ The expression levels of the 16S rRNA gene were used for normalization of data , and the relative expression levels were quantified using the ` Delta-delta method ' presented by PE Applied Biosystems ( Perkin Elmer ) as described in previous reports [ 23,24 ] . 
+ The results presented are averages of the results from the replicate experiments ± standard errors of the means ( SEM ) . 
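The quantification described above can be sketched as follows. This is a minimal illustration, assuming the standard delta-delta Ct formula with a base of 2 (perfect amplification efficiency) and the 16S rRNA gene as the reference; the function names and the example numbers are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def fold_change_ddct(ct_target_test, ct_ref_test, ct_target_ctrl, ct_ref_ctrl):
    """Relative expression 2 ** -ddCt of a target gene, test versus control.

    dCt is Ct(target) - Ct(reference, here 16S rRNA) within each sample,
    and ddCt is dCt(test) - dCt(control).
    """
    ddct = (ct_target_test - ct_ref_test) - (ct_target_ctrl - ct_ref_ctrl)
    return 2.0 ** (-ddct)


def mean_and_sem(values):
    """Return the mean and the standard error of the mean of replicates."""
    return mean(values), stdev(values) / sqrt(len(values))


# Hypothetical Ct values: the target crosses threshold 2 cycles earlier
# in the test sample, with the same reference Ct in both samples.
fold = fold_change_ddct(20.0, 10.0, 22.0, 10.0)
average, sem = mean_and_sem([2.0, 4.0, 6.0])
```

The SEM here uses the sample standard deviation divided by the square root of the number of replicates.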
+ EMSA
+ Probes were amplified by PCR using the previously constructed reporter plasmids as templates , with a pair of primers : a specific primer and an FITC-labeled primer . 
+ PCR products with FITC at their termini were purified using the QIAquick PCR purification kit ( Qiagen ) . 
+ For gel shift assays , mixtures of the FITC-labeled probes and purified SUMO-YdeO were incubated at 37 °C for 30 min in gel shift buffer ( 50 mM Tris-HCl , pH 7.8 at 37 °C , 50 mM NaCl , 3 mM Mg acetate , 0.1 mM EDTA , 0.1 mM DTT , and 0.37 mM BSA ) containing 0.2 mg ml⁻¹ salmon sperm DNA . 
+ After addition of a DNA dye solution , the mixture was directly subjected to 4 % or 7 % PAGE . 
+ Fluorescent-labeled DNA in gels was detected using Typhoon 9410 ( Amersham Biosciences ) . 
+ DNase I footprinting analysis
+ The probe was amplified by PCR using pLUXgadWp as a template , the primer pair GADW-F-2 and Lux-R-FITC , and Ex Taq DNA polymerase ( Takara ) . 
+ 1.0 pmol of a FITC-labeled probe was incubated at 37 °C for 30 min with purified SUMO-YdeO ( 0.5 to 15 pmol ) in 25 µl of gel shift buffer ( 50 mM Tris-HCl , pH 7.8 at 37 °C , 50 mM NaCl , 3 mM Mg acetate , 0.1 mM EDTA , 0.1 mM DTT , and 0.37 mM BSA ) . 
+ After incubation for 30 min , DNA was digested by DNase I ( Takara Bio ) for 30 s at 25 °C , and then the reaction was terminated by addition of phenol . 
+ DNA was precipitated by ethanol , dissolved in formamide dye solution , and analyzed by electrophoresis on a DNA analyzer DSQ-2000L ( Shimadzu ) . 
+ Measurement of luciferase activity in E. coli
+ A single colony of a strain freshly transformed with one of the luciferase reporter plasmids ( Table S1 ) was grown in LB medium supplemented with 50 µg ml⁻¹ kanamycin to OD600 = 0.3 at 37 °C with shaking . 
+ At this point , the culture was transferred to a 96-well micro-titer plate , and reporter activity was monitored in an automated plate reader MTP-880 ( Corona ) . 
+ The Lux ( luciferase activity ) reads were then divided by the equivalent OD reads ( Lux/OD ) to approximate Lux activity unit per cell mass for each well . 
+ The Lux/OD values of the three technical replicate wells of each culture were averaged . 
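The Lux/OD normalization above reduces to a few lines. This is a minimal sketch; the function name and the example readings are hypothetical, and each reading is assumed to be paired with the OD measured in the same well.

```python
from statistics import mean


def lux_per_od(lux_reads, od_reads):
    """Average Lux/OD over the technical replicate wells of one culture."""
    if len(lux_reads) != len(od_reads):
        raise ValueError("each Lux read needs a matching OD read")
    return mean(lux / od for lux, od in zip(lux_reads, od_reads))


# Hypothetical readings from three technical replicate wells.
activity = lux_per_od([300.0, 330.0, 270.0], [0.3, 0.3, 0.3])
```

Dividing well by well before averaging keeps a single well with an unusual OD from distorting the other two ratios.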
+ Measurement of β-galactosidase activity in E. coli
+ E. coli cells were grown in LB medium and subjected to measurement of β-galactosidase activity with o-nitrophenyl-D-galactopyranoside as described in the previous report [ 11 ] . 
+ Western blotting analysis.
+ E. coli cells grown in LB medium were harvested by centrifugation and re-suspended in 0.4 ml lysis buffer containing 8 M urea and sonicated . 
+ After centrifugation , the same volume of supernatant was subjected to 15 % SDS-PAGE and blotted on to PVDF membranes using an iBlot semi-dry transfer apparatus ( Invitrogen ) . 
+ Membranes were first immuno-detected with anti-FLAG ( Sigma ) , anti-NhaR serum ( Lab stock ) , or anti-α ( Neoclone ) and HRP-conjugated anti-mouse IgG ( Nacalai Tesque ) antibodies and then developed with a chemiluminescence kit ( Nacalai Tesque ) . 
+ The image was analyzed with a LAS-4000 IR multi colour imager ( Fuji Film ) . 
+ Results
+ Identification of YdeO associated sites in vivo within the E. coli genome 
+ To identify the genes directly regulated by YdeO , we first determined the genome-wide distribution of YdeO-binding sites by ChIP-chip ( Chromatin ImmunoPrecipitation-DNA chip ) analysis . 
+ For this purpose , we inserted a 3xflag tail into the 3' end of the ydeO gene in the genome and tried to prepare YdeO-DNA complexes for ChIP-chip analysis from the YY5001 strain harbouring ydeO-3xflag grown in LB medium at 37 °C with shaking . 
+ The level of YdeO-3xFLAG expression was , however , not enough to isolate YdeO-DNA complexes using the anti-FLAG antibody . 
+ We then constructed plasmid pYY0401 for the expression of YdeO-3xFLAG and transformed it into the ydeO-deficient mutant . 
+ The ydeO-deficient mutant transformed with pYY0401 was grown until it reached log phase and was then treated with formaldehyde for DNA-protein cross-linking . 
+ The E. coli cells were disrupted with sonication to prepare a whole cell extract from which YdeO-DNA complexes were isolated , sonicated and subjected to immunoprecipitation using anti-FLAG antibody . 
+ After the pronase treatment , ChIP DNA fragments were isolated from the YdeO-DNA complexes for mapping on the genome . 
+ As an internal reference for the specific binding of YdeO with its targets , we interrogated the association of YdeO with the gadE promoter , the only known target of YdeO . 
+ After PCR amplification from the ChIP DNA samples using specific primers , the gadE promoter could be specifically amplified ( data not shown ) . 
+ To identify the genome-wide YdeO-binding sites on the entire E. coli genome , Sup ( the whole extract DNA ) and ChIP samples were each labelled and subjected to hybridization on a tiling array . 
+ Seven chromosomal regions were determined with high-level signal peaks indicating YdeO-binding , which were distinguishable from the background intensities ( Fig. 1 ) , including the gadEp2p3 promoters ( Fig. 1F ) , the only known direct target of YdeO [ 9 ] . 
+ Six additional YdeO-binding sites were identified by ChIP-chip and were located within intergenic chromosomal regions . 
+ These included the intergenic spacer between yccA ( an inner membrane protein ) and hyaA ( hydrogenase I ) ( Fig. 1B ) ; the intergenic spacer upstream of appC ( cytochrome bd-II oxidase ) ( Fig. 1C ) ; the intergenic spacer upstream of the yiiS gene ( a conserved protein ) 
+ Identification of YdeO-binding in vitro to the seven targets
+ In order to confirm the direct interaction of YdeO with the seven target sequences determined by ChIP-chip , we performed EMSA . 
+ We initially failed to purify the YdeO protein using the pET system , because the over-expressed YdeO protein formed inclusion bodies in E. coli cells . 
+ Next , YdeO was over-expressed as a His-SUMO fusion , and the His-SUMO-tagged YdeO protein could be purified in soluble form by affinity chromatography with Ni-NTA agarose ( data not shown ) . 
+ After treatment with SUMO protease to remove the His-SUMO tag , however , the intact YdeO protein became insoluble . 
+ We therefore used this His-SUMO-tagged YdeO as the test protein . 
+ The purified His-SUMO-YdeO protein bound to the gadEp2p3 promoters , the only known target of YdeO ( Fig. 2A-f ) , in good agreement with the previous report [ 9 ] . 
+ Besides the gadE promoter , His-SUMO-YdeO formed complexes with the nhaR ( Fig. 2A-a ) , hyaA ( Fig. 2A-b ) , yiiS ( Fig. 2A-d ) , gadW ( Fig. 2A-e ) , and slp ( Fig. 2A-g ) promoters , which were observed as a smeared band , in the presence of 10-fold molar excess of YdeO over the DNA probes . 
+ A detectable level of the YdeO-probe complex was not formed with the appC promoter even in the presence of 35-fold molar excess of YdeO ( Fig. 2A-c ) . 
+ These results indicate that YdeO directly binds to at least these six sites . 
+ YdeO-DNA was detected as a smeared band in several cases , implying the cooperative binding of YdeO at the higher concentration . 
+ Since the association of YdeO with the appC promoter was observed only in vivo ( see Fig. 1 ) , this association might require another factor ( s ) for effective binding . 
+ Regulation in vivo of the predicted targets by YdeO: Transcriptome and RT-qPCR assays
+ We analyzed the alteration in the E. coli K-12 transcriptome caused by the over-expression of YdeO from a plasmid . 
+ E. coli KP7600 harboring pYY0401 ( ydeO-3xflag ) or the empty expression vector , pTrc99A , were grown until log phase under the same conditions used for ChIP-chip analysis , and total RNAs from these cultures were subjected to transcriptome analysis . 
+ Amongst genes downstream of a YdeO-binding site , 19 genes ( nhaA , nhaR , hyaA , hyaB , hyaC , hyaD , hyaE , hyaF , appC , appB , appA , yiiS , yiiT / uspD , slp , dctR , gadE , mdtE , mdtF , and gadW ) were induced more than 3-fold by the over-expression of YdeO , while three genes , yccA , yiiR , and yhiS , were not affected in either of the duplicate experiments ( Table S2 ; see also Table 1 ) . 
+ These 19 genes induced by YdeO constitute a total of 7 transcriptional units , nhaAR , hyaABCDEF , appCBA , yiiS-yiiT/uspD , slp-dctR , gadE-mdtEF , and gadW . 
+ All 7 of these operons carry promoters containing YdeO-binding sites ( see Fig. 2 ) , and thus should be under the direct control of YdeO . 
+ We also examined the induction of these transcriptional units by RT-qPCR after expression of YdeO . 
+ Transcripts of some representative genes from each operon were measured using specific pairs of the respective primers ( Table S1 ) . 
+ Transcripts were found to increase for all seven operons , nhaAR , hyaABCDEF , appCBA , yiiS-uspD ( yiiT ) , gadW , gadE-mdtEF , and slp-dctR , in the ydeO-expressing cells ( Table 2 ) . 
+ We also measured the level of mRNAs in the ydeO-deficient mutant , but no detectable differences were found in the mRNA levels of YdeO-target genes between the wild-type and the ydeO mutant . 
+ The transcript of yccA , a gene oriented in the opposite direction from hyaA , was also unaffected by the presence or absence of the YdeO-expressing plasmid ( Table 2 ) . 
+ Although the yiiS and uspD genes , encoding conserved proteins with unidentified function , were expressed even without the over-expression of YdeO , their expressions were further increased after YdeO expression . 
+ These results altogether indicate that YdeO plays a role as a positive regulator for expression of all seven operons , nhaAR , hyaABCDEF , appCBA , yiiS-yiiT/uspD , gadW , gadE-mdtEF , and slp-dctR . 
+ Regulation in vivo of the predicted targets by YdeO: Reporter assay
+ To confirm the positive role of YdeO on expression of the newly identified target promoters , we performed the reporter assay using the lacZ reporter [ 15 ] and lux reporter [ 16 ] systems . 
+ The translational fusions , hyaA-lacZ , appC-lacZ , and yiiS-lacZ , on the pRS552 derivative plasmids were introduced at the attachment ( att ) site of the E. coli YY0201 chromosome using the λRS45 phage , resulting in isolation of HYAA-JL ( hyaA-lacZ ) , APPC-JL ( appC-lacZ ) , and YY1101 ( yiiS-lacZ ) . 
+ Three E. coli lysogens containing hyaA , appC , and yiiS translational lac fusions in their chromosomes were transformed with either the YdeO-expression plasmid or the vector plasmid . 
+ The β-galactosidase activities in these transformants were measured in log-phase ( Fig. 3A ) . 
+ YdeO expression was found to induce the expression of all these test promoters , hyaA-lacZ , yiiS-lacZ , and appC-lacZ ( Fig. 3A and B ) . 
+ In the cases of hyaA-lacZ and yiiS-lacZ , the promoter activity increased approximately 1.5-fold upon expression of YdeO . 
+ No detectable level of expression was observed for appC-lacZ in the absence of YdeO expression , but a high level of appC-lacZ activity was detected upon expression of YdeO ( Fig. 3B ) . 
+ The result indicates YdeO has a positive role in activation of the appC , hyaA , and yiiS promoters , in agreement with the observation by transcriptome and RT-qPCR ( see above ) . 
+ The nhaR , slp , gadE and gadW promoters were too weak for quantitation by the LacZ reporter system , so we then employed the more sensitive Lux reporter system . 
+ The lux reporter plasmids of four transcription fusions , slp-lux , gadE-lux and gadW-lux ( kindly provided by Peter Lund [ 16 ] ) and the nhaR-lux plasmid [ constructed in this study ] , were introduced into YY0201 E. coli carrying either the vector plasmid or the YdeO-expression plasmid . 
+ The expression of nhaR-lux , slp-lux , and gadE-lux was found to be activated in the presence of the YdeO-expressing plasmid ( Fig. 3A ) , indicating that YdeO is also a positive regulator for these promoters . 
+ Recently RNA-seq analysis indicated the presence of a novel nhaR promoter inside the coding region of nhaA [ 25 ] . 
+ The binding site of YdeO is located upstream of this putative promoter ( see above ) . 
+ Accordingly , the constructed nhaR-lux reporter plasmid containing this novel nhaR promoter was also activated in the presence of YdeO expression ( Fig. 3A ) . 
+ The expression level of gadW-lux stayed unaltered with and without the YdeO-expression plasmid . 
+ This is inconsistent with the RT-qPCR result , in which the mRNA level of gadW increased in the presence of YdeO expression ( Table 2 ) . 
+ This apparent disagreement might be due to translational inhibition of gadW-lux by the anti-sense RNA of gadW , named gadY , encoded in the gadW-lux plasmid . 
+ Recognition sequence of YdeO transcription factor
+ To identify the YdeO-binding sequence , we performed DNase I footprinting of the gadW promoter with increasing concentrations of YdeO . 
+ At low protein levels , YdeO protected the region from -53 to +8 of the gadW promoter ( Fig. 2B , lanes 2 -- 4 ) . 
+ In the presence of a 15-fold molar excess of YdeO , the region protected by YdeO expanded to span -53 to +84 of the gadW promoter , possibly due to protein-protein interaction ( Fig. 2B , lane 5 ) , in agreement with the smeared band formation observed by EMSA ( see above ) . 
+ Within the core YdeO-binding region , an inverted repeat of the hexa-nucleotide 5'-ATTTCA-3' was identified ( see Figs. 2C and 4A ) . 
+ Using this sequence , we searched for the inverted repeat within the seven YdeO-binding regions detected in vivo by ChIP-chip analysis , and identified it in all the YdeO-binding regions at various positions between -131 and -1 with respect to the transcription start site ( Fig. 4 ) . 
+ The length of the spacer between the two 5'-ATTTCA-3' hexa-nucleotide sequences ranges from 9 to 21 nucleotides ( Fig. 4A ) . 
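The search described above can be sketched in a few lines of Python. This is only an illustration, not the authors' procedure: it assumes exact matches, assumes the repeat is oriented as ATTTCA followed by its reverse complement, and uses a hypothetical function name `find_inverted_repeats`. Only the half-site and the 9 to 21 nt spacer range come from the text.

```python
import re

HALF_SITE = "ATTTCA"  # hexa-nucleotide half-site reported in the text


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    comp = {"A": "T", "T": "A", "C": "G", "G": "C"}
    return "".join(comp[base] for base in reversed(seq))


def find_inverted_repeats(seq, half=HALF_SITE, min_spacer=9, max_spacer=21):
    """Return (start, spacer_length) for each half ... revcomp(half) hit.

    Assumptions (not from the paper): exact matching only, and one
    orientation of the repeat. For each start position the shortest
    acceptable spacer is reported.
    """
    seq = seq.upper()
    partner = reverse_complement(half)
    # A lookahead is used so that overlapping candidates are all reported.
    pattern = re.compile(
        "(?=%s([ACGT]{%d,%d}?)%s)" % (half, min_spacer, max_spacer, partner)
    )
    return [(m.start(), len(m.group(1))) for m in pattern.finditer(seq)]


if __name__ == "__main__":
    # Made-up sequence with a 12-nt spacer between the two half-sites.
    toy = "GG" + "ATTTCA" + "C" * 12 + "TGAAAT" + "GG"
    print(find_inverted_repeats(toy))
```

A real scan would probably need to tolerate mismatches and check both strands, which this sketch does not attempt.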
+ Recent studies show that YpdB and YehT bind to the direct repeat of their specific sequence separated by a 9 - and 13-bp spacer , respectively , in E. coli [ 26,27 ] . 
+ Previous work shows that the spacer length of the specific DNA binding region is diverse for the E. coli transcription factor CpxR [ 28 ] . 
+ Therefore , we have denoted the inverted repeat as the YdeO-box ( Fig. 4 ) . 
+ Induction of NhaR, GadE, and GadW by YdeO
+ Three transcription factors , the LysR-type NhaR , the LuxR-type GadE , and the AraC-type GadW , were found to be under the direct control of YdeO ( see Figs. 1 -- 4 ) . 
+ NhaR is an activator of a sodium/proton antiporter gene [ 29 ] and both GadE and GadW are involved in regulation of the genes for glutamate-dependent acid resistance system [ 8,9 ] . 
+ In addition to these three transcription factors , the gene encoding the CadC-like transcription factor DctR is located downstream of the slp gene which codes for a starvation lipoprotein , and is considered to be co-transcribed with the slp gene . 
+ DctR is involved in protection against metabolic endproducts under acidic conditions [ 30 ] . 
+ To examine the involvement of YdeO in control of expression of the three transcription factors , the cellular levels of these proteins in E. coli , with or without the YdeO-expressing plasmid , were measured by Western blotting assay . 
+ To perform the Western blotting assay of GadE and GadW using the anti-FLAG antibody , we constructed E. coli strains YY5002 and YY5003 carrying a 3xflag tag at the 3'-terminal end of gadE and gadW , respectively , on the E. coli chromosome . 
+ The YdeO-expression plasmid , pYdeO ( ydeO ) , and the empty expression vector pTrc99A were transformed into these E. coli strains and the transformants were grown in LB medium until log phase . 
+ The whole-cell lysates were prepared , and subjected to Western blotting assay by using anti-NhaR , anti-FLAG , and anti-RpoA for detection of NhaR , GadE-3xFLAG and GadW-3xFLAG , and RNA polymerase α subunit , respectively . 
+ All transformants with or without the YdeO-expressing plasmid retained an approximately constant amount of the α subunit of RNA polymerase ( data not shown ) . 
+ The level of GadE increased in the YY5002 harboring the YdeO-expression plasmid , supporting the prediction that the gadE gene is under the direct and positive control of YdeO . 
+ However , we failed to detect NhaR and GadW even in the presence of YdeO expression ( Fig. 5 ) . 
+ Search for the whole set of genes regulated by YdeO → GadE
+ To obtain the gene expression profile of the YdeO → GadE cascade , we performed a transcriptome assay . 
+ E. coli wild-type KP7600 and gadE-deficient JD25278 harbouring pTrc99A and pYY0401 ( ydeO-3xflag ) were incubated in LB medium at 37°C with shaking until log phase and total RNA from these cultures was subjected to transcriptome analysis under standard experimental conditions as described in Materials and methods . 
+ The results revealed that a total of 106 genes were markedly affected by YdeO expression in the wild-type and included 53 up - and the same number of down-regulated genes ( Tables S2 and S3 ) . 
+ Among the 53 genes up-regulated by YdeO expression , clustering analysis showed 23 genes were induced in both the parent strain and the gadE-deficient mutant and 30 genes induced in the wild-type but not the gadE-deficient mutant ( Fig. 6 ) . 
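The split of up-regulated genes into GadE-independent and GadE-dependent groups can be illustrated with a short Python sketch. It is not the clustering analysis used in the study: the fold-change threshold, the function name `classify_upregulated`, and the numbers in the example are all hypothetical.

```python
def classify_upregulated(wt_ratio, mutant_ratio, threshold=2.0):
    """Split genes induced in the wild type by their behaviour in the gadE mutant.

    wt_ratio / mutant_ratio map gene name -> expression ratio
    (YdeO-expressing strain over vector control). The threshold is an
    arbitrary illustrative cut-off, not a value taken from the paper.
    Returns (induced_in_both, induced_in_wild_type_only).
    """
    in_both, wild_type_only = [], []
    for gene, ratio in sorted(wt_ratio.items()):
        if ratio < threshold:
            continue  # not counted as up-regulated in the wild type
        if mutant_ratio.get(gene, 0.0) >= threshold:
            in_both.append(gene)          # induction does not need GadE
        else:
            wild_type_only.append(gene)   # induction lost without GadE
    return in_both, wild_type_only


if __name__ == "__main__":
    # Made-up ratios, for illustration only.
    wt = {"hyaA": 3.0, "gadA": 10.0, "lacZ": 1.0}
    mutant = {"hyaA": 2.8, "gadA": 1.1, "lacZ": 1.0}
    print(classify_upregulated(wt, mutant))
```

Genes in the first list would be candidates for direct YdeO control, and genes in the second list candidates for indirect control through GadE, in line with the interpretation given in the text.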
+ The observed alteration of the transcriptome profile caused by deletion of the gadE gene was similar to that reported by Masuda and Church [ 31 ] . 
+ Genes induced in both strains are organized into a total of 12 transcriptional units ( Table 1 ) , including five transcription units , hyaABCDEF , appCBA , slp-dctR , and nhaAR , that are under the direct control of YdeO ( see above ) . 
+ On the other hand , the rest of the 30 up-regulated genes forming 21 transcription units were induced in the wild-type but not in the gadE-deficient mutant ( Table 1 ) , indicating that these 21 transcription units are under the direct control of GadE but the indirect control of YdeO . 
+ This set of 21 transcription units includes the hitherto identified GadE targets , gadA , gadB , and gadC [ 9 ] . 
+ On the other hand , no detectable change was observed in the transcription pattern between the parent strain and the gadW-deficient mutant , consistent with the lack of YdeO-dependent GadW expression under the conditions herein employed ( Figs. 3 and 5 ) . 
+ The yehX gene was induced by the YdeO-expression plasmid in both the parent strain and the gadE mutant , but the osmF and yehY genes , which are parts of the osmF-yehYXW transcription unit , were not induced in the gadE mutant ( Fig. 6 and Table 1 ) , implying that GadE activates the known promoters located upstream of yehX , which is possibly activated by YdeO . 
+ Physiological roles of YdeO in response to environmental stresses 
+ The level of translational control of the YdeO regulator itself was analyzed using a reporter assay with the ydeO-lacZ fusion . 
+ In E. coli YY0101 ( ydeO-lacZ ) grown under aerobic conditions , β-galactosidase activity from ydeO-lacZ increased two-fold under the acidic condition of pH 5.5 compared with pH 7.0 ( Fig. 3C ) . 
+ Interestingly , a high level of ydeO-lacZ activity was detected at both pH 5.5 and 7.0 when E. coli was grown under anaerobic conditions ( Fig. 3C ) , implying that YdeO plays a role in E. coli respiration under anaerobic conditions , such as in the animal intestine . 
+ Previously , we identified that the transcription of ydeO is induced by exposure to ultraviolet light via the EvgSA two-component system [ 11 ] . 
+ In agreement with this finding , ydeO expression was not induced in the evgA-defective mutant under both acidic and anaerobic conditions ( data not shown ) . 
+ Discussion
+ The YdeO regulon
+ Here we have identified a total of seven YdeO-binding sites on the E. coli genome using ChIP-chip and transcription analyses in vivo . 
+ The EMSA experiments showed that purified YdeO also binds in vitro to these six sites ( see Fig. 2 ) . 
+ The reporter and RT-qPCR assays indicated that all of the promoters located downstream of these YdeO-binding sites are activated by YdeO ( see Table 1 and Fig. 3 ) . 
+ The hexa-nucleotide repeat 5'-ATTTCA-3' , which we have named the YdeO box , is conserved in all of the YdeO-binding sites we identified experimentally ( see Fig. 4 ) . 
+ Even though this YdeO-box like sequence exists within the appC promoter , which is located immediately downstream of a YdeO-binding site ( see Fig. 1 ) , the binding in vitro of YdeO to the appC promoter probe was not high ( see Fig. 2 ) , implying that an as yet unidentified additional transcription factor or DNA secondary structure is needed for efficient binding of YdeO to the target promoter . 
+ Since the appC promoter is transcribed in vivo by RNA polymerase containing the RpoS sigma factor and is induced by AppY [ 32,33 ] , one possibility is that AppY and/or RpoS sigma are required for the efficient binding of YdeO to the appC promoter . 
+ Thus , we conclude that YdeO is a positive regulator for transcription of operons controlled by seven promoters , the nhaR promoter , hyaA promoter , appC promoter , yiiS promoter , slp promoter , gadE promoter , and gadW promoter ( Fig. 7 ) . 
+ Transcription cascade: EvgSA → YdeO → NhaR, GadE, GadW
+ E. coli responds to temporary low pH using the glutamate-dependent acid resistance system , which involves two complex regulatory systems : EvgAS → YdeO → GadE ; and Crp → RpoS → GadX → GadW [ 9,12 ] . 
+ In this study , we showed that YdeO directly regulates the expression of three transcription factor genes , nhaR , gadE , and gadW ( see Fig. 7 ) , establishing a novel transcription cascade : EvgAS → YdeO → NhaR/GadE/GadW . 
+ YdeO not only plays a regulatory role in the positive feedback loop of the EvgAS → YdeO → GadE pathway , but also a positive role in the GadXW pathway , thereby linking the GadE and GadXW pathways for acid resistance . 
+ The GadXW circuit is believed to function during stationary phase . 
+ YdeO-overexpression induced GadE-dependent transcription of the gadW gene but GadW protein was not detected in growing E. coli cells ( Fig. 6 ) , suggesting that stationary phase specific factors are required for GadW . 
+ Transcriptome analysis identified the set of genes directly regulated by YdeO or indirectly through the YdeO → GadE cascade ( Table 2 ; see Fig. 7 for the summary model ) . 
+ GadE induced by YdeO stimulated the transcription of hdeAB-yhiD , hdeD , gadAX , gadCB , mdtEF , gadW , and yhiM as well as those previously reported promoters [ 9,34,35 ] . 
+ The GAD cluster including hdeAB-yhiD , hdeD , gadAX , gadCB , gadE , mdtEF , and gadW , is necessary for glutamine-dependent acid resistance [ 9,34 ] . 
+ Recently the yhiM gene was reported to be essential for growth at pH 2.5 and is necessary for glutamine - and lysine-dependent acid resistance , but is not required for arginine-dependent acid resistance [ 36 ] . 
+ In addition to these operons , the YdeO → GadE cascade induced a total of 19 operons including aidB , blc , cbpAM , elaB , gabDTP , pagP , sufABCDSE , ycaC , yfcG , ygaM , and yjjUV ( see Table 1 ) , of which the yfcG gene encodes a disulfide reductase [ 37 ] and the sufABCDSE operon encodes the complex biosynthetic machinery for iron-sulfur clusters in several enzymes which have critical cysteine residues [ 38 ] , suggesting a relationship between the function of YdeO and cysteine metabolism . 
+ The physiological role of the YdeO regulon
+ In addition to the hitherto-identified target gadE , we have identified a total of seven operons belonging to the YdeO regulon . 
+ The expression of ydeO is induced under acidic conditions ( see Fig. 3C ) . 
+ In good agreement , the gadE operon encodes the master activator for expression of gadA and gadBC , which are involved in the glutamate-dependent acid resistance system that consumes intracellular protons by glutamate decarboxylation . 
+ In addition to acid conditions , the expression of ydeO is also induced under anaerobic growth in both neutral and acid conditions . 
+ Two YdeO-regulated targets , hyaABCDEF and appCBA , encode a hydrogenase and a quinone oxidase , respectively , both being involved in bacterial respiration . 
+ The HyaABC complex oxidizes dihydrogen to two protons , which are released to the outside of the membrane , and donates the electrons to the quinone pool . 
+ The AppBC complex donates electrons taken by a quinone to intracellular oxygen , consuming an intracellular proton per electron ( Fig. 7 ) , resulting in H2O production via oxygen [ 39 ] . 
+ Thus , the hyaABCDEF operon contributes to the consumption of the intracellular proton while the appCBA operon contributes to the utilization of reduced quinone . 
+ Taken together , these physiological systems activated by YdeO stimulate stress response and respiration . 
+ These findings also suggest that YdeO activated genes play an important role in primary adaptation , which enables the cell to colonize animal intestines by contributing to adaptation to acidic conditions in the stomach and to anaerobic conditions in the intestine . 
+ Supporting Information
+ nucleotides used in this study . 
+ E. coli K-12 derivatives used in this study were indicated with characterizations . 
+ The used bacteriophage and plasmids were also shown . 
+ Oligonucleotides were represented with DNA sequences . 
+ ( DOCX ) in E. coli KP7600 . 
+ Transcriptome analysis was performed using total RNAs from KP7600 harboring pTrc99A ( vector ) and pYY0401 ( ydeO-3xflag ) as described in Materials and methods . 
+ The E. coli Gene Expression Microarray 8×15K ( Agilent ) hybridized with the fluorescent cDNAs was scanned with an Agilent G2565CA microarray scanner Ver. 8.1 , the intensities of both Cy3 and Cy5 were quantified by Feature Extraction Ver. 8.1 , and the Cy5/Cy3 ratios were then calculated from the normalized values . 
+ Acknowledgments
+ We gratefully acknowledge Jon Hobman ( Nottingham University ) for valuable comments and proofreading of the manuscript . 
+ This work is supported by MEXT-Supported Program for the Strategic Research Foundation at Private Universities and Special Coordination Funds for Promoting Science and Technology and JSPS-DST International Collaborations . 
+ We are grateful to Peter Lund for providing lux plasmids . 
+ We also thank the National BioResource Project ( NBRP ) of Japan for providing E. coli strains . 
+ Author Contributions
+ Conceived and designed the experiments : YY TO KY . 
+ Performed the experiments : YY TO . 
+ Analyzed the data : YY TO AI KY . 
+ Contributed reagents/materials/analysis tools : TO AI KY . 
+ Contributed to the writing of the manuscript : TO AI KY . 
+ 39 . Borisov VB , Gennis RB , Hemp J , Verkhovsky MI ( 2011 ) The cytochrome bd respiratory oxygen reductases . Biochim Biophys Acta 1807 : 1398 -- 1413 .
\ No newline at end of file
--- a/data/TEXT_FILES/useful_txt/25517076.txt 0 → 100644
+++ b/data/TEXT_FILES/useful_txt/25517076.txt 0 → 100644
+ Metagenomic chromosome conformation
+ 1Groupe Régulation Spatiale des Génomes , Département Génomes et Génétique , Institut Pasteur , Paris , France ; 2Centre national de la recherche scientifique , UMR 3525 , Paris , France ; 3Biological Physics and Evolutionary Dynamics Group , Max Planck Institute for Dynamics and Self-Organization , Göttingen , Germany ; 4Department of Physics , Laboratoire de physique théorique de la matière condensée , Université Pierre et Marie Curie , Paris , France 
+ Introduction
+ Microbial species have for a long time been studied individually , leading to the development of applications in fields as diverse as agronomy , environment , or medicine . 
+ Sequencing and analyzing the genetic material of microbial communities directly collected from various natural environments such as skin , gut , soil , and water have dramatically improved our knowledge and understanding of their diversity and interrelations ( Handelsman et al. , 1998 ; Guermazi et al. , 2008 ; The Human Microbiome Jumpstart Reference Strains Consortium , 2010 ; Mackelprang et al. , 2011 ; Lundberg et al. , 2012 ; Le Chatelier et al. , 2013 ) . 
+ A number of techniques have been developed to improve the resolution and accuracy of individual genome assembly in mixed populations , for instance by separating the species prior to sequencing ( Fitzsimons et al. , 2013 ) or by refining the clustering and scaffolding procedures used to process sequence data ( Albertsen et al. , 2013 ) . 
+ However , these approaches remain generally limited by the use of complex technologies and/or by the need to construct multiple libraries . 
+ eLife digest
+ Microbial communities play vital roles in the environment and sustain animal and plant life . 
+ Marine microbes are part of the ocean 's food chain ; soil microbes support the turnover of major nutrients and facilitate plant growth ; and the microbial communities residing in the human gut support digestion and the immune system , among other roles . 
+ These communities are very complex systems , often containing 1000s of different species engaged in co-dependent relationships , and are therefore very difficult to study . 
+ The entire DNA sequence of an organism constitutes its genome , and much of this genetic information is stored in large structures called chromosomes . 
+ Examining the genome of a species can provide important clues about its lifestyle and how it evolved . 
+ To do this , DNA is extracted from cells and is then usually cut into smaller fragments , amplified , and sequenced . 
+ The small stretches of sequence obtained , called reads , are finally assembled , yielding ideally the complete genome of the organism under study . 
+ Metagenomics attempts to interpret the combined genome of all the different species in a microbial community and has been instrumental in deciphering how the different species interact with each other . 
+ Metagenomics involves sequencing stretches of the community 's DNA and matching these pieces to individual species to ultimately assemble whole genomes . 
+ While this may be a relatively straightforward task for communities that contain only a handful of members , the metagenomes derived from complex microbial communities are huge , fragmented , and incomplete . 
+ This often makes it very difficult or even nearly impossible to match the inferred DNA stretches to individual species . 
+ A method called chromosome conformation capture ( or ` 3C ' for short ) can reveal the physical contacts between different regions of a chromosome and between the different chromosomes of a cell . 
+ How often each of these chromosomal contacts occurs provides a kind of physical signature to each genome and each individual chromosome within it . 
+ Marbouty et al. took advantage of these interactions to develop a technique that combines metagenomics and chromosome conformation capture -- called meta3C -- that can analyze the DNA of many different species mixed together . 
+ Testing meta3C on artificial mixtures of a few species of yeast or bacteria showed that meta3C can separate the genomes of the different species without any prior knowledge of the composition of the mix . 
+ In a single experiment , meta3C can identify individual chromosomes , match each of them to its species of origin , and reveal the three-dimensional structure of each genome in the mix . 
+ Further tests showed that meta3C can also interpret more complex communities where the number and types of the species present are not known . 
+ Meta3C holds great promise for understanding how microbial communities work and how the genomes of the species within a community are organized . 
+ However , further developments of the technique will be required to investigate communities as diverse as those present in most natural environments . 
+ DOI : 10.7554/eLife.03318.002
+ In parallel , an interesting development in the field of genome assembly has recently arisen from the realization that the physical properties of chromosomes contain information regarding their linear structure . 
+ The frequency with which two chromosomal segments come in contact , as measured in genome-wide chromosome conformation capture ( 3C ) experiments ( Dekker et al. , 2002 ; Lieberman-Aiden et al. , 2009 ) , obeys to some extent the laws of polymer physics ( Rippe , 2001 ; Wong et al. , 2012 ) . 
+ Contact frequencies can be used to identify synteny between DNA segments up to a few 100s kilobases ( kb ) apart , and two studies recently applied this concept to refining the human genome sequence ( Burton et al. , 2013 ; Kaplan and Dekker , 2013 ) . 
+ Such an approach therefore shows great promise as a general method for improving genome assemblies . 
+ Using chromosomal contacts data in order to generate the precise scaffold of a given genome is however a difficult task : hence , we developed GRAAL , a robust and explicit statistical method to tackle this issue at the highest resolution possible ( Marie-Nelly et al. , 2014a ; Figure 1A ) . 
+ Since this approach enables the identification of individual chromosomes within the same nuclei , we hypothesized that tridimensional ( 3D ) organization holds enough specific information to also distinguish DNA segments of chromosomes present within different organisms ( Figure 1A ) . 
+ In other words , if the interactions between chromosomes contained within a human or a fungal nucleus can be resolved , then it may be possible to resolve the chromosomes of various species mixed together . 
+ In the present work , we show that this is indeed the case : a single meta3C experiment , performed on a mix of species , allows the de novo assembly and scaffolding of the various genomes present in the mixture without prior knowledge of their genome sequences . 
+ We also show that the method allows deciphering the average 3D organization of these genomes in space , unveiling a remarkable diversity of chromosome organization in microorganisms . 
+ Therefore , meta3C paves the way to the integrated characterization and analysis of metagenomes in complex populations . 
+ Results
+ Meta3C unveils the diversity of chromosome organization of a mix of three bacterial species
+ We first processed a controlled mix of three bacterial species -- Escherichia coli ( Gram − ) , Vibrio cholerae ( Gram − ) , and Bacillus subtilis ( Gram + ) -- into a metagenomic 3C ( meta3C ) library and sequenced it on an Illumina platform ( Hiseq2000 − Paired End − 2 × 104 bp ) . 
+ The reads were aligned on the three reference genome sequences to generate a chromosome contact map of the whole population ( Figure 1B ) . 
+ To construct this map , each genome was divided into bins of 30 kb , and the contact frequencies were normalized so that their sum equaled one for each bin ( see ` Materials and methods ' ; Cournac et al. , 2012 ) . 
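A minimal Python sketch of the binning and normalization step is given below. It is an assumption-laden illustration rather than the published procedure: the iterative square-root scaling is one common way to obtain a symmetric matrix whose bin sums equal one, and the function names and toy read pairs are hypothetical. Only the 30 kb bin size and the sum-to-one target come from the text.

```python
BIN_SIZE = 30_000  # 30 kb bins, as stated in the text


def build_contact_matrix(pairs, genome_length, bin_size=BIN_SIZE):
    """Count read pairs per pair of bins; pairs are (pos1, pos2) coordinates."""
    n = (genome_length + bin_size - 1) // bin_size
    matrix = [[0.0] * n for _ in range(n)]
    for a, b in pairs:
        i, j = a // bin_size, b // bin_size
        matrix[i][j] += 1.0
        if i != j:
            matrix[j][i] += 1.0  # keep the map symmetric
    return matrix


def normalize(matrix, iterations=50):
    """Iteratively rescale a symmetric matrix so that every bin sums to ~1.

    Each entry is divided by sqrt(sum_i * sum_j), which preserves symmetry.
    This scheme is an assumption chosen for the sketch.
    """
    n = len(matrix)
    m = [row[:] for row in matrix]
    for _ in range(iterations):
        sums = [sum(row) for row in m]
        for i in range(n):
            for j in range(n):
                if sums[i] > 0 and sums[j] > 0:
                    m[i][j] /= (sums[i] * sums[j]) ** 0.5
    return m


if __name__ == "__main__":
    # Made-up read pairs on a 90 kb toy genome (three bins).
    toy_pairs = [(100, 200), (100, 300), (100, 40_000), (35_000, 70_000),
                 (500, 65_000), (61_000, 62_000), (31_000, 32_000)]
    raw = build_contact_matrix(toy_pairs, 90_000)
    for row in normalize(raw):
        print([round(x, 3) for x in row], round(sum(row), 6))
```

Bins with no contacts at all are left at zero here; a real pipeline would usually filter such bins before normalizing.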
+ As in a typical 3C experiment , squares appeared on the diagonal of the matrix , revealing individual chromosomes : the circular chromosomes of both E. coli and B. subtilis were recovered as separate entities , whereas the two chromosomes of V. cholerae yielded two squares exhibiting higher contact frequencies with each other than with the chromosomes of the other species -- as expected since these chromosomes share the same cellular compartment ( Figure 1B ) . 
+ The background generated by cross-species interactions was low ( 0.37 % of the total interactions ) , indicating that few chimeric pairs of reads ( in which one read came from the genome of one species and the other read from the genome of another species ) had been generated during the construction of the library ( Figure 1 -- figure supplement 1 ) . 
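The background figure quoted above is a simple proportion, which the following Python sketch makes explicit. The input format (one species label per read of each pair) and the function name are assumptions made for the illustration.

```python
def cross_species_fraction(pairs):
    """Fraction of read pairs whose two reads map to different species.

    pairs is an iterable of (species_of_read1, species_of_read2) labels;
    this input format is hypothetical.
    """
    total = 0
    cross = 0
    for species_1, species_2 in pairs:
        total += 1
        if species_1 != species_2:
            cross += 1
    return cross / total if total else 0.0


if __name__ == "__main__":
    # Made-up labels: one chimeric pair out of four.
    toy = [("E. coli", "E. coli"), ("B. subtilis", "B. subtilis"),
           ("V. cholerae", "V. cholerae"), ("E. coli", "V. cholerae")]
    print(cross_species_fraction(toy))
```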
+ The meta3C contact matrix was subsequently converted into a 3D structure ( ` Materials and methods ' , Figure 1C , Animation 1 ) . 
+ In this representation , bins are shown as beads and the distance between each pair of beads is optimized in proportion to the inverse of their measured contact frequency ( Lesne et al. , 2014 ) . 
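The conversion from contact frequencies to target distances can be sketched as follows. The inverse relation comes from the text; the shortest-path completion (so that pairs of bins with no observed contact still receive a finite distance) is an assumption added for this sketch, and the final step of embedding the distances into 3D coordinates is left out.

```python
import math


def contact_to_distance(freq):
    """Turn a symmetric contact-frequency matrix into a distance matrix.

    Direct distances are 1 / frequency. Pairs without any contact start
    at infinity and are then filled in with the shortest path through
    other bins (Floyd-Warshall), an assumption made for this sketch.
    """
    n = len(freq)
    dist = [
        [
            0.0 if i == j
            else (1.0 / freq[i][j] if freq[i][j] > 0 else math.inf)
            for j in range(n)
        ]
        for i in range(n)
    ]
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if dist[i][k] + dist[k][j] < dist[i][j]:
                    dist[i][j] = dist[i][k] + dist[k][j]
    return dist


if __name__ == "__main__":
    # Three bins in a chain: bins 0 and 2 never contact each other directly.
    toy = [[0.0, 0.5, 0.0],
           [0.5, 0.0, 0.25],
           [0.0, 0.25, 0.0]]
    for row in contact_to_distance(toy):
        print(row)
```

The resulting distance matrix could then be handed to any multidimensional scaling routine to obtain bead coordinates.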
+ Using this approach , we recovered three populations of bins that corresponded to the three bacterial genomes , with the two chromosomes of V. cholerae being visualized in close vicinity to each other . 
+ The continuity and circularity of each genome was clearly apparent in the reconstructed structures . 
+ A sequence corresponding to a F plasmid ( the fertility factor of E. coli ) was characterized in the reads and appeared to interact with the genome of the E. coli HB101 strain used in the analysis ( Figure 1B , black arrow ) . 
+ The 3D data provided us with the opportunity to carefully investigate the structure and spatial positioning of this plasmid with respect to the E. coli chromosome . 
+ We first noticed that a large ( ∼ 140 kb ) region of the E. coli genome exhibited a two-fold increase of the read coverage , indicating a segmental duplication . 
+ This region was also enriched in contacts with the F plasmid sequence ( Figure 1 -- figure supplement 2A , B ) , prompting us to hypothesize that one of the two copies was actually carried by a F ' plasmid ( i.e. , a F plasmid carrying bacterial sequences ) . 
+ A correlation analysis of the contacts between these two regions of interest ( a chromosome region encompassing the duplication and the plasmid ) revealed clearly that , on the one hand , a copy of the duplication is in close contact with the plasmid ( with an IS2 at one of the boundaries [ coord. 390,063 ] and an IS3 within ) , and that , on the other hand , the plasmid contacts with the E. coli genome drop sharply past the duplication boundaries , in agreement with an integration of one copy of the duplication within a F ' plasmid ( Figure 1 -- figure supplement 2C ) . 
+ This hypothesis was confirmed experimentally by a Southern blot of a pulsed-field gel ( Figure 1 -- figure supplement 2D ) . 
+ Having deciphered the linear structure of the F ' plasmid within this strain , we investigated its 3D organization with respect to the E. coli genome ( Figure 1 -- figure supplement 2E ) . 
+ The frequent contacts each copy of the duplication makes with both the chromosome and the F ' plasmid resulted in an artifactual co-localization of the plasmid with the chromosome , since the contacts within each of the copies can not be discriminated and are all positioned along the genome ( see Figure 1 -- figure supplement 2E , i and Animation 2 ) . 
+ To alleviate this artifact , we generated a contact map where all the contacts involving the duplicated region were removed ( Figure 1 -- figure supplement 2E , ii and Animation 3 ) . 
+ In the corresponding 3D structure , the bin representing the plasmid appeared now well isolated from the chromosome ( Figure 1D ) . 
+ As a supplementary control , we verified that removing from the genome a region of a size similar to the duplication did not impair the 3D positioning of a DNA segment of a size similar to the F ' plasmid , positioned within this deleted region ( blue dot ; Figure 1 -- figure supplement 2E , iii ) . 
+ Interestingly , besides this duplication , two regions along the E. coli chromosome appeared enriched in contacts with the F ' plasmid : the replication origin ( Ori ) and , to a lower extent , the termination ( Ter ) regions ( Figure 1 -- figure supplement 2B ) . 
+ This indicates that the F ' plasmid is positioned preferentially in the vicinity of these regions in fast growing cells , as expected given the preferred location of the F plasmid in mid , 1/4 , and 3/4 positions in the cell , which are also preferred positions for Ori/Ter regions during the cell cycle ( Gordon et al. , 1997 ; Niki and Hiraga , 1997 ) . 
+ These results illustrate the high specificity of the meta3C approach , which allows the identification of genomic regions belonging to each species , and the power of using DNA physical contacts to decipher complex structures in individual genomes . 
+ The remaining two genomes were also considered individually and their average 3D reconstructions generated , unveiling a remarkable diversity of global chromosomal organization in these three bacterial species during exponential growth ( Figure 1D , Animation 4 , and Animation 5 ) . 
+ Interestingly , this diversity appeared rooted in shared principles . 
+ One structural feature shared by these bacterial genomes was the global symmetry of the replichores , that is , the two chromosomal arms between the Ori and Ter . 
+ In B. subtilis and V. cholerae this symmetry was made clearly visible , in the growth conditions used in this experiment , by the presence of a counter-diagonal in the map ( running opposite to the continuous and strong signal that accounts for the contact between adjacent DNA segments ; see also the correlation version of the contact map on Figure 1 -- figure supplement 3 ) . 
+ This global symmetry as well as this second opposite diagonal , reflecting a longitudinal organization of the two replichores extending along parallel axes within the cell , were also reported for Caulobacter crescentus ( Umbarger et al. , 2011 ; Le et al. , 2013 ) . 
+ The two chromosomes of V. cholerae were found in two different conformations , with chromosome 1 exhibiting a more ` open ' configuration and the replichores of chromosome 2 appearing paired ( Figure 1D ) . 
+ The two arms of the E. coli chromosome exhibited an open 3D structure , with no visible diagonal in the matrix . 
+ Another 3D feature shared by these bacteria consisted in the presence of at least one well-defined domain centered either on the Ori or Ter region . 
+ The 3D reconstruction of the B. subtilis genome revealed a relatively compact domain at the Ori region that overlapped with the region of the genome containing the centromere-like parS sites , in agreement with the structural role described for the ParS / Spo0J/Smc complex in the literature ( Gruber and Errington , 2009 ; Sullivan et al. , 2009 ) . 
+ By contrast , the E. coli chromosome presented a strong Ter domain that was clearly identifiable in the contact map and exhibited fewer contacts with the rest of the genome , reflecting most likely the structuring role of the MatP/matS system ( Mercier et al. , 2008 ) . 
+ For V. cholerae , the Ter regions of chromosomes 1 and 2 were in closer proximity than the two Ori regions , reflecting the controlled segregation mechanisms of these two unequal-sized chromosomes ( Val et al. , 2008 ) . 
+ Interchromosomal contacts did not span the entire length of chromosome 1 but started between the Ori macrodomain of chromosome 2 and positions located at about one-third of chromosome 1 arms ( black dotted squares , Figure 1D ) . 
+ In order to visualize the interplay between replication and genomic organization , we applied a color code to represent the read coverage of the genome ( reflecting the average progression of the replication fork and thus the relative timing of replication ) atop the genome structure of V. cholerae ( Animation 6 ) . 
+ Whereas the two Ter regions appeared of the same color , consistent with the fact that both chromosomes I and II achieve replication synchronously ( Rasmussen et al. , 2007 ) , the Ori regions presented , as expected , different coverages : chromosome I initiates replication first whereas chromosome II starts only to replicate later . 
+ This analysis provides a glimpse of the spatial and temporal articulation of the replication program of V. cholerae , which would be an interesting example of organization-dependent function . 
+ Overall , the chromosome organizations of the three bacteria analyzed ( under the exponential , rich-medium growth conditions used in the experiment ) appeared remarkably different but shared similar principles , such as the presence of well-defined domains . 
+ In order to see if those shared features are conserved across bacterial species , it will be interesting to use a similar high-resolution 3C approach to investigate the organization of bacterial linear chromosomes . 
+ The low amount of ` background interaction ' , that is , chimeric religation events between restriction fragments belonging to different species , in the experiment incited us to test whether the genomes of these three bacterial species could be directly assembled de novo from the meta3C data by taking advantage of the presence of ∼ 80 % ` regular ' paired-end reads in the library . 
+ This relatively high percentage results from the fact that , unlike the Hi-C protocol ( Lieberman-Aiden et al. , 2009 ) , meta3C does not involve an enrichment step for religated fragments ( ` Materials and methods ' ) . 
+ Using the assembly program IDBA-UD ( Peng et al. , 2012 ) , a set of 2,436 contigs was generated from the 3C read pairs ( N50 = 55 kb , total length 12.5 Mb ) . 
+ The quality of these contigs was assessed by comparing them with published reference genomes , which showed that the assembled contigs covered respectively 96 % , 98 % , and 93 % of the V. cholerae , E. coli , and B. subtilis reference genomes and that only 0.7 % of the contigs ( 52,373 bp total ) were chimeric . 
+ The meta3C reads were then realigned on the de novo contigs and the contact information was used to pool the contigs into communities sharing similar contact behavior ( hence likely belonging to the same genome ) using the Louvain algorithm ( Blondel et al. , 2008 ) . 
+ Three communities of contigs were generated , each corresponding to a different species and covering 96 , 98 , and 92 % of the genomes of V. cholerae , E. coli , and B. subtilis , respectively ( Figure 1E ) . 
+ This shows that meta3C not only allows convenient high-throughput analysis of the 3D organization of a mix of bacterial species but also provides an efficient way to assemble de novo the genomes of these species . 
+ To see if this approach could be applied to a more complex mix of eukaryotic species , we pooled 11 yeast species and performed a meta3C experiment directly on this mixture ( Figure 2A ) . 
+ The meta3C contact matrix of the 11 reference genomes put side by side presented discrete squares on the diagonal , each corresponding to a species from the mix ( Figure 2A ; see also Figure 2 -- figure supplement 2 ) . 
+ The 3D representation of this contact map revealed , again , a very low level of background interactions in the experiment ( Figure 2B , Animation 7 ; see also Figure 2 -- figure supplement 1 ) . 
+ In each of the squares , the co-localization of centromeres resulting from the Rabl configuration was clearly visible ( Figure 2C , blue arrowheads ; see also Figure 2 -- figure supplement 4 ; Duan et al. , 2010 ; Marie-Nelly et al. , 2014b ) . 
+ In contrast to the diversity of structures observed for the bacterial species analyzed above , the Rabl configuration appeared as the primary driver of yeast genome organization , as illustrated by the individual 3D structures of the genomes of Yarrowia lipolytica and Naumovozyma castellii ( Figure 2C , Animation 8,9 ) . 
+ To test the potential of using meta3C reads for assembling de novo such a complex mix of yeast genomes , we assembled them using IDBA-UD ( N50 = 6,914 bp , total length 138 Mb ) . 
+ The breadth of coverage of the 11 genomes by the resulting contigs ranged from 89.8 % ( Candida albicans ) to 98.3 % ( N. castellii ) , with chimeric contigs ( misassemblies ) representing ∼ 20 % of the total ( 37 Mb ) . 
+ This high percentage of chimeras contrasted with the very low level of misassemblies observed previously for the mix of three bacterial genomes . 
+ To monitor the influence of chimeric pairs of reads ( originating from intergenomic 3D contacts ) on the generation of chimeric contigs , we performed an assembly on the same library using the Velvet software , which treats all reads as independent ( i.e. , without pairing them ; Zerbino and Birney , 2008 ) . 
+ This assembly exhibited a dramatic increase ( 73 % , 69 Mb ) in chimeric contigs , demonstrating that these misassemblies were not caused by intergenomic 3C paired-end reads but rather by the frequent occurrence of identical or near-identical genome regions ( such as transposable elements ) in those eukaryotic genomes . 
+ The presence of chimeric contigs in the IDBA-UD assembly did not impede the clustering of most of the contigs based on their contact frequencies as determined using the Louvain algorithm . 
+ The clustering procedure resulted in 13 sub-groups : one for each of the 11 yeast species , one corresponding to an E. coli contaminant , and one comprising various misassembled fragments of mitochondrial genomes . 
+ In total , only 1 % of the assembly ( 2 % of the contigs ) could not be attributed to a given species ( Figure 2D ) . 
+ A de novo assembly followed by scaffolding using our dedicated program GRAAL ( Marie-Nelly et al. , 2014a ) was then applied to the pool of contigs identified as belonging to the genome of N. castellii , for which a high-quality reference sequence was available ( Figure 2E ; Gordon et al. , 2011 ) . 
+ Since this algorithm involves a splitting procedure of the contigs into restriction fragments , chimeric contigs were broken into non-chimeric parts thereby correcting the assembly errors mentioned above . 
+ After processing , 11 superscaffolds were recovered , with the reordered contact map presenting the typical co-localized centromere regions expected from the Rabl configuration ( Figure 2E ; Animation 10 ) . 
+ Overall these 11 scaffolds covered 94.5 % of the reference sequence of the 10 chromosomes ( chromosome 3 was split into two scaffolds because of the presence of the large , unassembled rDNA array on this chromosome ) , illustrating the ability of this de novo analysis to correctly assemble and scaffold unknown genomes ( see Figure 2 -- figure supplement 3 ) . 
+ The same approach was applied to other species in the mixture , such as Saccharomyces bayanus ( Cliften et al. , 2003 ; Scannell et al. , 2011 ; visible in the matrix , emphasizing that 3D signals could also be used to annotate the centromeres of unknown species isolated from meta3C data ( Figure 2F ; Figure 2 -- figure supplement 4B ; Marie-Nelly et al. , 2014b ) . 
+ Meta3C allows exploring the genomes of unknown species in a complex environmental sample
+ As an even more challenging test of the meta3C approach , we confronted it with an environmental sample of unknown and presumably complex composition . 
+ For this purpose , sediments were collected from a tributary of the Seine river near Paris . 
+ Because environmental samples usually contain a tremendous diversity of species , including many eukaryotes of large genome sizes , we decided to enrich our sample for prokaryotic organisms by cultivating it for 14 hr in Luria broth and filtering it prior to meta3C library construction and sequencing . 
+ As an internal control of the assembly process , reads from a V. cholerae 3C library were added to the meta3C sequences before running IDBA-UD ( N50 = 1.2 kb ) . 
+ Contigs generated from the meta3C sediment library were clustered using the Louvain algorithm , yielding 184 significant communities ( total size 111 Mb , median size 300 kb , with 19 communities containing more than 1 Mb ) . 
+ The largest 11 ones were included in a contact matrix , revealing again squares along the diagonal ( Figure 3A ) . 
+ Each community was analyzed using MG-RAST ( Glass et al. , 2010 ) , showing relatively homogeneous taxonomic compositions in most communities ( Figure 3 -- figure supplement 1 ) . 
+ For each community , more than 80 % of the genes identified by MG-RAST within the contigs were attributed to the same taxonomic class . 
+ Going down to family level , 8 out of the 11 largest communities were > 80 % homogeneous , with the remaining three presenting more complex patterns that will require further investigation and development . 
+ One community was composed of contigs covering 95 % of the V. cholerae genome control ( for a total amount of 3.9 Mb congruent with the expected size of the genome ; Figure 3B , i ) , confirming the ability of this approach to pool DNA regions according to their genome of origin . 
+ Other communities contained contigs belonging to other discrete species , related for instance to the ubiquitous bacteria Aeromonas veronii or Exiguobacterium sp . 
+ ( for total amounts of 4 Mb and 3.1 Mb , aligning to 72 % and 36 % of the reference genomes of 4.5 Mb and 3 Mb total sizes , respectively ; Figure 3B , i ) . 
+ Several species belonging to the classes Bacilli and Enterobacteria were also present in the mix ( probably favored by the LB enrichment step ) . 
+ In some instances , weak interactions between communities from these clades suggested that a genome had been split into more than one community or that two communities contained mixtures of closely related species ( see dotted square on Figure 3A ) . 
+ In spite of this problem the approach seemed to perform relatively well , as reflected by the analysis of the contacts with plasmids . 
+ Whether integrated in a genome or under circular forms ( see Figure 1B ) , plasmids are expected to present mostly contacts with their host genome since they share the same cellular compartment ( see also the contacts of plasmid F with the E. coli genome in Figure 1 -- figure supplement 1A ) . 
+ In this study , plasmids annotated as belonging to Bacillus megaterium were retrieved almost entirely within a single Bacilli community ( Figure 3B , ii ) , suggesting that this community presented indeed a relatively homogeneous content and therefore validating our approach . 
+ Although the limited sequencing depth of our experiment restrained our ability to scaffold optimally the contigs of the communities using GRAAL , mapping the reads present in the community related to A. veronii against the reference genome of this species revealed a 3D structure reminiscent of those of B. subtilis and C. crescentus ( Figure 3C ; Animation 12 ) . 
+ Hence , the 3D information contained within the chromosome of this species was efficiently captured during the meta3C experiment , suggesting that increasing the sequencing depth of a meta3C library will likely make it possible to generate de novo scaffolds and 3D clusters for genomes entirely unknown or underrepresented , shedding light at the same time onto their overall organization . 
+ Discussion
+ Our application of meta3C to controlled mixes of bacteria and eukaryotes revealed new 3D genome organizations for four bacterial and several yeast species and showed that this approach can also be used to generate de novo communities of contigs corresponding to individual species in complex metagenomic mixtures ( a flowchart of this approach is presented in Figure 4 ) . 
+ The average organization of yeast genomes was found to differ radically from those of bacterial nucleoids , most likely because of contrasted replication and division processes and timings . 
+ Most of the organization in Saccharomycetes appears driven by the clustering of the centromeres at the spindle pole body ; this clustering remains the prominent structural feature throughout most of the cell cycle ( i.e. , G1 + S phase + G2 ) during exponential growth , with mitosis representing only a fraction of the cycle . 
+ In bacteria , in contrast , the strong overall organizational features are the activation of a unique replication origin per chromosome , the ability to initiate multiple replication forks , and the fast division cycle . 
+ In addition , important topological constraints are exerted on bacterial circular chromosomes around the Ter regions , and our observation of significant contacts at these positions was probably linked to these phenomena . 
+ More imaging and 3C-like analyses will be required to understand the precise choreography of chromosome segregation in these different microorganisms . 
+ Taking advantage of chromatin conformation capture data to address genomic questions is a dynamic field : while this paper was under review , two studies were released that also aimed at exploiting the physical contacts between DNA molecules to deconvolve genomes from controlled mixes of microorganisms ( Beitel et al. , 2014 ; Burton et al. , 2014 ) . 
+ Although these analyses open interesting perspectives , the meta3C approach presented in this work differs in several aspects . 
+ First , both studies used HiC to scaffold genomes from contigs obtained either from simulated assemblies ( Beitel et al. , 2014 ) or from independent experiments ( Burton et al. , 2014 ) . 
+ Our approach conveniently uses a single meta3C library and a blind analysis for generating contigs , binning them , scaffolding them , and revealing the 3D structure of the corresponding species . 
+ Such an approach based on a single experiment may require more sequencing depth than an approach combining multiple libraries ( HiC + shotgun + mate pair , for instance ) , but the exact trade-off remains to be determined and depends on the specifics of the experiment , the sequencing technology , and the amount of starting material available . 
+ Besides , our blind analysis allowed us to delineate communities without prior knowledge of the number of species present in the mix ; this contrasts with the approach of Burton et al. ( 2014 ) , which apparently requires such prior knowledge . 
+ Similarly , we generate de novo chromosome scaffolds without any assumptions regarding the number of chromosomes present in the mixture , a realistic approach when it comes to exploring environmental samples or completing the assembly of complex genomes . 
+ Furthermore , our experience with the analysis of chromosome organization in both bacteria and yeast species emphasized the importance of the initial steps for the success of such an experiment . 
+ The adequacy between the cross-linking step , which depends on the incubation time and the concentration of the fixating agent , and the restriction step and choice of the restriction enzyme , is essential for the recovery of long-distance cis contacts that improve de novo genome scaffolding and , importantly , reveal the 3D structure . 
+ Optimizing the cross-linking conditions allows new insights into the diversity of 3D chromosome organization of several species . 
+ Last but not least , the meta3C approach remains to be applied on a truly natural ( not enriched ) sample , such as a gut or wine microbiome . 
+ Based on our experience , we envision that this experiment should include several ( two at least ) enzymes recognizing sites with various GC percentages , in order for both GC-rich and GC-poor genomes to be appropriately represented in the meta3C library . 
+ Doing the experiment with multiple enzymes would also allow taking into account GC content ( in addition to average coverage ) to improve the binning of contigs into communities . 
+ Identifying contigs presenting largely divergent characteristics compared to their neighbors and redistributing them to their most likely communities would also lead to improved assemblies for each species ( see for instance Albertsen et al. , 2013 ) . 
+ Overall , quantifying physical contacts between chromosomes provides an objective and convenient principle to segregate the genomes of sympatric species and , from there , to explore the biological diversity of complex ecosystems . 
+ Materials and methods
+ Construction of a bacteria meta3C library
+ Meta3C protocols were adapted from 3C protocols ( notably , Dekker et al. , 2002 ; Oza et al. , 2009 ) . 
+ The strains used for the meta3C bacterial library were B. subtilis BS168 ( Burkholder and Giles , 1947 ) , E. coli HB101 ( Boyer and Roulland-Dussoix , 1969 ) , and V. cholerae MV127 ( Val et al. , 2012 ) . 
+ For each strain , 100 ml of LB were inoculated with 10^6 cells/ml and incubated at 37 °C until a final concentration of about 2 × 10^7 cells/ml . 
+ Cells from the different species were then mixed and cross-linked with fresh formaldehyde for 30 min ( 3 % final concentration ; Sigma Aldrich , Saint Louis , Missouri ) at room temperature ( RT ) followed by 30 min at 4 °C . 
+ Formaldehyde was quenched with a final concentration of 0.25 M glycine for 5 min at RT followed by 15 min at 4 °C . 
+ Fixed cells were collected by centrifugation , frozen on dry ice , and stored at − 80 °C until use . 
+ Frozen pellets of 3 × 10^9 cells were thawed on ice and resuspended in a final volume of 650 µl 1 × TE pH 8 before adding 4 µl of Ready-Lyse lysozyme ( 35 U / µl ; Tebu Bio , France ) , followed by incubation at RT for 20 min . 
+ SDS was added to a final concentration of 0.5 % followed by 10 min RT incubation . 
+ 50 µl of lysed cells were put in eight tubes containing 450 µl of digestion mix ( 1 × NEBuffer 1 [ New England Biolabs , Ipswich , Massachusetts ] , 1 % Triton X-100 , and 100U HpaII enzyme [ NEB ; C ^ CGG ] ) . 
+ The chromatin was then digested for 3 hr at 37 °C , split into four aliquots , and diluted with 8 ml ligation buffer ( 1 × ligation buffer NEB without ATP , 1 mM ATP , 0.1 mg/ml BSA , 125 units of T4 DNA ligase [ 5 U / µl − Weiss Units − Thermo Fisher Scientific , Waltham , Massachusetts ] ) . 
+ Ligation was performed at 16 °C for 4 hr followed by a de-cross-linking step consisting of an overnight ( ON ) incubation at 65 °C in the presence of 250 µg / ml proteinase K in 6.2 mM EDTA . 
+ DNA was then precipitated with 800 µl of 3 M sodium-acetate ( pH 5.2 ) and 8 ml iso-propanol . 
+ After 1 hr at − 80 °C , DNA was pelleted by centrifugation . 
+ Pellets were suspended in 500 µl 1 × TE buffer and the RNA degraded with a final concentration of 0.03 mg/ml RNAse for 1 hr at 37 °C . 
+ DNA was transferred into 2 ml centrifuge tubes , extracted twice with 500 µl phenol -- chloroform pH 8.0 , precipitated , washed with 1 ml cold ethanol ( 70 % ) , and diluted in 30 µl 1 × TE buffer . 
+ All tubes were pooled and the resulting 3C library was quantified on gel using the program QuantityOne ( Bio-Rad , Richmond , California ) . 
+ Construction of a yeast meta3C library
+ The strains used for the meta3C yeast library were Y. lipolytica CLIB122 , L. kluyveri CBS3082 , Candida lusitaniae ATCC42720 , C. albicans SC5T314 , Kluyveromyces lactis CLIB210 , S. bayanus 623-6C , Kluyveromyces thermotolerans CBS6340 , Saccharomyces cerevisiae BY4741 , N. castellii CBS 4309 , Candida glabrata CBS138 , and Debaryomyces hansenii CBS767 . 
+ All strains were grown at 30 °C in 50 ml BMW medium until reaching 1 × 10^7 cells/ml ( Thompson et al. , 2013 ) . 
+ The cultures were then mixed and cross-linked for 30 min with fresh formaldehyde ( 3 % ) . 
+ The formaldehyde was quenched with a final concentration of 0.25 M glycine for 5 min at RT followed by 15 min at 4 °C . 
+ Fixed cells were pooled as aliquots of 3 × 10^9 cells , collected by centrifugation , frozen on dry ice , and stored at − 80 °C . 
+ Aliquots were thawed on ice and resuspended in 6 ml of 1 × DpnII buffer ( NEB ) . 
+ The cells were then split into four tubes and lysed using a Precellys grinder ( 3 cycles : 6700 rpm − 3 × 20 s ON/60 s OFF ; Bertin Technologies , France ) and VK05 beads . 
+ Lysed cells were pooled and their volume was adjusted to 6 ml with 1 × DpnII buffer . 
+ SDS was added to a final concentration of 0.3 % and the solution was split into twelve 2 ml tubes ( Eppendorf -- DNA LoBind , Eppendorf , Germany ) and incubated for 20 min at 65 °C followed by 30 min at 37 °C under agitation . 
+ As a next step , 6 µL of 10 × restriction enzyme buffer and 50 µL of Triton X-100 20 % were added to each tube , mixed carefully , and incubated at 37 °C for another 30 min under agitation . 
+ The chromatin was digested for 3 hr with 50 units of restriction enzyme under agitation ( DpnII : G ^ ATC , NEB ) . 
+ Following incubation , 100 units of restriction enzyme were added and the incubation was extended overnight . 
+ The digested chromatin was pooled into four equal reactions and the samples were then processed as described above . 
+ Construction of an environmental meta3C library
+ The river sediments ( 300 g ) were incubated for one night in 500 ml of LB at 30 °C . 
+ The next morning , the culture was filtered on Whatman paper ( with a size cut-off of 15 µm ) and 20 mg of wet material were resuspended in 100 ml of fresh LB , then treated with fresh formaldehyde ( 5 % final concentration ) for 30 min at RT followed by 30 min at 4 °C . 
+ The formaldehyde was quenched with glycine ( 0.4 M final ) for 5 min at RT followed by 15 min at 4 °C . 
+ Fixed cells were collected by centrifugation , frozen on dry ice , and stored at − 80 °C until use . 
+ Frozen pellets were slowly thawed on ice and resuspended in 800 µl of TE 1 × . 
+ Cells were then lysed using a Precellys grinder ( 3 cycles : 6700 rpm − 3 × 20 s ON/60 s OFF ) and VK05 beads . 
+ About 600 µl of lysed cells were recovered , to which SDS was added at a 0.5 % final concentration before incubation at RT for 10 min . 
+ 50 µl of lysed cells were put in eight tubes containing 450 µl of digestion mix ( 50 µl NEB buffer 2 10 × , 50 µl Triton X-100 10 % , 10 µl of HaeIII enzyme [ GG ^ CC ] − 10 U / µl NEB , 340 µl H2O ) . 
+ Chromatin was then digested for 3 hr at 37 °C under agitation . 
+ The digested chromatin was pooled into four equal reactions and the samples were processed as described above . 
+ Illumina sequencing
+ Aliquots of 5 µg of each 3C library were dissolved in water ( final volume 130 µl ) and sheared using a Covaris S220 instrument ( duty cycle 5 , intensity 5 , 200 cycles per burst , 4 cycles of 60 s each ; Covaris Ltd. , Woburn , Massachusetts ) . 
+ The sheared DNA was purified on QIAquick columns and processed using a commercial kit ( Paired-End DNA sample Prep Kit -- Illumina -- PE-930-1001 ; Illumina , San Diego , California ) . 
+ The DNA was ligated to custom-made versions of the Illumina PE adapters ( see Table 1 ) for 3 hr at room temperature in a final volume of 30 µl ( 20 µl of DNA [ around 8 µg ] , 3 µl of ligation buffer 10 × [ NEB ] , 3 µl of T4 DNA ligase [ 400 U / µl from NEB ] , and 4 µl of 10 µM adapter solutions ) . 
+ Tubes were then incubated at 65 °C for 20 min . 
+ DNA fragments ranging in size from 400 -- 800 bp were purified using a PippinPrep apparatus ( SAGE Science , Beverly , Massachusetts ) . 
+ For each library , test PCR reactions were performed to determine the optimal number of PCR cycles and a large-scale PCR ( eight reactions ) was then set-up with the number of PCR cycles determined previously . 
+ The PCR products were finally purified using Qiagen MinElute columns ( Qiagen , Netherlands ) and paired-end ( PE ) sequenced on an Illumina platform ( HiSeq2000 ; PE 2 × 100 ) . 
+ DOI: 10.7554/eLife.03318.027
+ Processing of PE reads
+ The raw data from each 3C experiment were processed as follows : first , reads were demultiplexed using the small tag present at the beginning of each sequence ( contained in the custom-made adapters ) . 
+ Then , PCR duplicates were collapsed using the six Ns present on each adapter ( Table 1 ) . 
+ Reads from the raw data used in the present study were aligned using Bowtie 2 in its most sensitive mode ( Langmead and Salzberg , 2012 ) . 
+ We used an iterative alignment procedure similar to Imakaev et al. ( 2012 ) , that is , only the first 20 base pairs of the read were initially mapped then the length of the read was progressively increased until the mapping became unambiguous ( with a mapping quality superior to 40 ) . 
+ Paired reads were aligned independently . 
+ Indexes were built in one step and included the genome sequences of all the different organisms . 
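+ The iterative mapping loop described above can be sketched as follows . This is only a minimal illustration : the callable ` align ' is a hypothetical stand-in for a call to the aligner ( it is not part of Bowtie 2 ) , and the 5-base extension step is an assumption , since the increment is not stated in the text . 

```python
def iterative_align(read, align, start_len=20, step=5, min_mapq=40):
    """Map a growing prefix of `read` until the hit becomes unambiguous.

    `align` is a hypothetical callable taking a sequence and returning
    (position, mapping_quality) or None; `step` is an assumed increment.
    Returns (position, prefix_length) or None if the read stays ambiguous.
    """
    length = min(start_len, len(read))
    while True:
        hit = align(read[:length])
        if hit is not None and hit[1] > min_mapq:
            return hit[0], length
        if length >= len(read):
            return None  # still ambiguous at full length: discard the read
        length = min(length + step, len(read))


def toy_align(prefix):
    # Toy aligner: pretend prefixes shorter than 30 bases map ambiguously.
    return (1234, 42) if len(prefix) >= 30 else (1234, 0)


result = iterative_align("A" * 100, toy_align)
```

+ Each mate of a pair would be passed through such a loop separately , in line with the independent alignment of paired reads . 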
+ Generation of contact maps
+ Each mapped read was assigned to a restriction fragment . 
+ The matrices were then binned into units of 10 or 200 fragments , resulting in 4292 × 4292 or 2229 × 2229 matrices for the mixtures of three bacteria and 11 yeasts , respectively . 
+ Matrices were normalized using the sequential component normalization procedure ( SCN ) described in Cournac et al. ( 2012 ) , similar to the iterative normalization procedure described in Imakaev et al. ( 2012 ) . 
+ The SCN procedure ensures that the sum over each column and each row of the matrix equals 1 , which reduces the biases inherent to the protocol . 
+ Full resolution contact maps are available for bacteria , yeasts , and the environmental sample on the Dryad Digital Repository : http://dx.doi.org/10.5061/dryad.gv595 ( Marbouty et al. , 2014 ) . 
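+ The steps of this section ( assigning mapped positions to restriction fragments , binning fragments , and normalizing ) can be sketched as below . This is an illustrative sketch under stated assumptions , not the authors ' implementation : all function names are hypothetical , fragments are described by a sorted list of start coordinates , and the normalization simply alternates column and row sum-to-one scaling as described in the text ( the published SCN procedure may use a different norm ) . 

```python
from bisect import bisect_right


def fragment_index(cut_sites, position):
    """Index of the restriction fragment containing `position`.

    `cut_sites` is the sorted list of fragment start coordinates (first is 0).
    """
    return bisect_right(cut_sites, position) - 1


def build_binned_matrix(pairs, cut_sites, frags_per_bin):
    """Count contacts between bins of `frags_per_bin` consecutive fragments."""
    n_bins = (len(cut_sites) + frags_per_bin - 1) // frags_per_bin
    matrix = [[0.0] * n_bins for _ in range(n_bins)]
    for pos_a, pos_b in pairs:
        i = fragment_index(cut_sites, pos_a) // frags_per_bin
        j = fragment_index(cut_sites, pos_b) // frags_per_bin
        matrix[i][j] += 1.0
        if i != j:
            matrix[j][i] += 1.0  # keep the matrix symmetric
    return matrix


def scn_normalize(matrix, n_iter=100):
    """Alternately scale columns, then rows, so that their sums tend to 1."""
    n = len(matrix)
    m = [row[:] for row in matrix]
    for _ in range(n_iter):
        col_sums = [sum(m[i][j] for i in range(n)) for j in range(n)]
        for i in range(n):
            for j in range(n):
                if col_sums[j] > 0:
                    m[i][j] /= col_sums[j]
        for i in range(n):
            row_sum = sum(m[i])
            if row_sum > 0:
                m[i] = [v / row_sum for v in m[i]]
    return m


cut_sites = [0, 100, 250, 400]                       # toy fragment starts
pairs = [(10, 120), (10, 300), (260, 410), (50, 60)]  # toy mapped read pairs
raw = build_binned_matrix(pairs, cut_sites, frags_per_bin=2)
normalized = scn_normalize(raw)
```

+ Empty rows or columns are left at zero rather than rescaled , which is one possible way of handling bins without any mapped read . 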
+ 3D reconstruction of the contact maps
+ In order to build the 3D structures of the different genomes from the chromosomal contact maps we used the algorithm described in Lesne et al. ( 2014 ) . 
+ Briefly , we first converted the normalized contact matrix into an adjacency graph in which each node represented a genomic region and each link had a weight corresponding to the inverse of the number of contacts detected between the two corresponding nodes in the meta3C experiment . 
+ We then converted this graph into a distance matrix using the Floyd -- Warshall algorithm . 
+ This algorithm computes the distance between each pair of genomic regions by determining the shortest distance on the graph between the two corresponding nodes . 
+ We finally converted this distance matrix into a 3D structure using distance geometry theorems as described for molecular structures in Havel et al. ( 1983 ) . 
+ This entire procedure was implemented in Matlab . 
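+ The first two steps of this reconstruction ( inverse-contact graph , then shortest-path distances ) can be illustrated with the short Python sketch below . It is not the Matlab code used in the study : the function name is hypothetical , pairs without any contact are assumed to have no direct edge , and the final embedding of the distance matrix into 3D coordinates is not shown . 

```python
import math


def contacts_to_distances(contacts):
    """Shortest-path distance matrix from a symmetric contact matrix.

    Edge weight = 1 / contact count (no edge when the count is zero),
    followed by the Floyd-Warshall all-pairs shortest-path algorithm.
    """
    n = len(contacts)
    dist = [[math.inf] * n for _ in range(n)]
    for i in range(n):
        dist[i][i] = 0.0
        for j in range(n):
            if i != j and contacts[i][j] > 0:
                dist[i][j] = 1.0 / contacts[i][j]
    for k in range(n):
        for i in range(n):
            d_ik = dist[i][k]
            for j in range(n):
                if d_ik + dist[k][j] < dist[i][j]:
                    dist[i][j] = d_ik + dist[k][j]
    return dist


# Toy example: regions 0 and 2 rarely touch, but both often touch region 1.
toy_contacts = [
    [0.0, 1.0, 0.1],
    [1.0, 0.0, 1.0],
    [0.1, 1.0, 0.0],
]
toy_distances = contacts_to_distances(toy_contacts)
```

+ In this toy case the direct link between regions 0 and 2 has weight 10 , so the path through region 1 ( total weight 2 ) defines their distance . 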
+ Genome assembly
+ Reads containing undetermined bases were removed before the assembly step to retain only good-quality reads . 
+ De novo assemblies were then performed using the program IDBA-UD ( Peng et al. , 2012 ) with the pre-correction option and default parameters . 
+ Quantitative analysis of the meta3C library assemblies
+ Bacterial meta3C assembly
+ 4,040,422 reads were used for assembling the mixture of three bacterial genomes , yielding 2,436 contigs ( 303 > 5 kb , N50 = 55 kb , total length 12.5 Mb ) . 
+ More than 95 % ( ∼ 3,880,000 ) of the initial reads mapped on the contigs . 
+ These contigs were mapped against the reference genomes of the different bacteria using BLAT ( Kent , 2002 ) . 
+ Analyzing the output revealed that 30 and 14 of these contigs covered 96.0 % ( 2,957,563 bp ) and 96.1 % ( 1,071,728 bp ) of chromosomes 1 and 2 of V. cholerae , respectively . 
+ Another group of 55 contigs covered 98.4 % ( 4,630,934 bp ) of the genome of E. coli , whereas a group of 1441 contigs covered 92.5 % ( 3,957,375 bp ) of the genome of B. subtilis ( a higher fragmentation reflecting the lower representation of this bacterium in the mix ) . 
+ We also quantified the amount of chimeric contigs in the assembly using BEDtools ( Quinlan and Hall , 2010 ) . 
+ Such chimeric contigs may occur because of the presence of repeated conserved sequences between bacteria , such as transposons , or because of the background interactions introduced by chimeric religation between restriction fragments belonging to different species . 
+ Only 17 out of the 2436 contigs were chimeric ( 0.7 % ) , accounting for 52,373 bp in total ( 0.4 % of the total size of the assembly ) . 
+ This relatively low number probably results from the limited number of species contained in the mix and their low repeat content . 
+ It also shows that chimeric religation during the experiment introduces little background interactions in the assembly process . 
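+ For readers unfamiliar with the statistics quoted above , the sketch below shows one common way to compute an N50 from a list of contig lengths , and checks the chimeric fractions against the numbers given in the text . The helper name and the toy lengths are illustrative ; the total assembly length is taken as the rounded 12.5 Mb value . 

```python
def n50(lengths):
    """Length L such that contigs of length >= L hold at least half the assembly."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0


toy_n50 = n50([2, 3, 4, 5, 6])  # toy contig lengths, not real data

# Chimeric fractions recomputed from the figures quoted in the text.
chimeric_contig_pct = round(100 * 17 / 2436, 1)        # share of contigs
chimeric_length_pct = round(100 * 52_373 / 12.5e6, 1)  # share of assembly length
```
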
+ Yeast meta3C assembly
+ 47,614 contigs were generated by assembling the ∼ 60 M yeast reads ( 7,212 > 5 kb , N50 = 6.9 kb , total length 138.7 Mb ) . 
+ When mapped against the newly generated contigs , 47 M reads ( ∼ 78 % ) did align . 
+ These contigs were then compared with the reference genomes of the different yeasts using BLAT and the analysis of the output was performed as above . 
+ Genome breadths of coverage ranged from 89 % ( Y. lipolytica ) to 98 % ( N. castellii ) . 
+ In contrast to the previous results on the bacterial meta3C assembly , there were 9,599 chimeric contigs in the yeast case , representing 27.1 % of the total number of contigs . 
+ This shows that chimeric contigs arise when dealing with mixtures of repeat-rich genomes , likely because near-identical repeats ( such as transposable elements ) are found in different genomes in the mixture . 
+ Such assembly mistakes will probably be avoided by using longer reads ( since repeats longer than the read length pose problems during assembly ) or , alternatively , can be resolved a posteriori using GRAAL . 
+ Despite the high percentage of chimeric contigs , we were able to assemble ∼ 95 % of the N. castellii genome from the meta3C reads . 
+ River sediment meta3C assembly
+ 67,920,671 pairs of reads ( 91 bp useful ) were used to assemble the raw culture from the river sediment . 
+ As a positive assembly control , 500,000 reads from a 3C library of V. cholerae were added into this pool of sequences . 
+ The assembly generated 130,713 contigs ( 2,250 > 5 kb , N50 = 1,274 bp , total length = 111 Mb ) . 
+ After realignment of the initial reads along the generated contigs , 42,705,64 PE reads could be mapped against the assembled contigs ( ∼ 62 % ) . 
+ Pulsed-field gel electrophoresis (PFGE)
+ Chromosome plugs were prepared as described previously ( Koszul et al. , 2004 ) , using lysozyme instead of zymolyase ( 1 % agarose gels , 0.25 × TBE buffer at pH 8.3 ) . 
+ PFGE was performed in a Rotaphor R23 tank ( Biometra , Germany ) using the following program : 12 °C , 5 V/cm for 65 hr , angle 110 ° , pulse ramps 200 to 80 V. Southern blot hybridization was performed using probes derived from PCR products obtained with the primers listed in Table 2 . 
+ To group the different contigs into communities reflecting the different genomes present in the sequenced mixtures , we adopted an approach based on graph theory . 
+ Among several community-detection algorithms , we found that the Louvain method ( Blondel et al. , 2008 ) generated the best reconstructions of the controlled mixes of bacteria and yeast species . 
+ Before applying the algorithm , contigs longer than 2.5 kb were divided into equal-sized chunks corresponding to several nodes in the graph ( since large contigs exhibit more contact than smaller ones , they are more prone to clustering when running the Louvain algorithm ) . 
+ We used the Louvain method with default parameters , except for the mix of three bacteria ( r = 10 for three communities ) . 
+ For the bacterial mix , 10 % of the contigs of the smallest size were not attributed to a community with these parameters ( representing ∼ 15 % of the total reads ) . 
+ For yeast , few contigs were left aside , representing in total ∼ 2 % of the number of reads . 
+ For the river sediment experiment , ∼ 15 % of the contigs were left aside at the binning step . 
+ The contigs in each bin resulting from the 3D binning were characterized using BLAST to determine the dominant species . 
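+ The pre-processing step mentioned above , in which long contigs are cut into equal-sized chunks before building the graph , could look like the following sketch . The exact splitting rule is not given in the text , so the rule used here ( the smallest number of near-equal chunks that are each at most 2.5 kb ) and the function name are assumptions . 

```python
def split_contig(name, length, max_len=2500):
    """Split a contig into near-equal chunks of at most `max_len` bases.

    Returns a list of (node_name, start, end) tuples with half-open
    coordinates; contigs no longer than `max_len` are kept as one node.
    """
    if length <= max_len:
        return [(name, 0, length)]
    n_chunks = -(-length // max_len)  # ceiling division
    base, extra = divmod(length, n_chunks)
    chunks, start = [], 0
    for k in range(n_chunks):
        size = base + (1 if k < extra else 0)
        chunks.append((f"{name}_{k}", start, start + size))
        start += size
    return chunks


short_nodes = split_contig("contig_a", 2000)  # hypothetical contig names
long_nodes = split_contig("contig_b", 6000)
```

+ Each resulting chunk would then become one node of the contact graph , with edge weights given by the number of read pairs linking two chunks . 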
+ GRAAL scaffolding of the N. castellii and S. bayanus genomes
+ The program GRAAL ( for Genome ( Re ) - Assembly Assessing Likelihood from 3D ) aims at improving incomplete genome assemblies through the probabilistic exploitation of the physical contacts experienced by chromosomes within a cellular compartment ( Marie-Nelly et al. , 2014a ) . 
+ The program only needs two datasets for initialization : a set of contigs and a 3C/HiC dataset . 
+ Upon initialization and depending on the depth of the HiC dataset , the software splits the contigs into smaller pieces/bins encompassing at least 2 restriction fragments . 
+ It then iteratively ( over 1000s of steps ) searches through a broad range of structures generated from reordering these small bins for genome structures more likely to be true given the 3D data . 
+ Importantly , at each step , a bin is tested for a variety of ` structural variations ' with respect to its most likely neighbors , including duplications , deletions , fusions , inversions , etc. 
+ The likelihood of each of these new genome structures is then computed in light of the contact data , and one of those structures is sampled for the next iteration ( during which the position of a new bin is tested ) . 
+ Each bin is tested several times throughout the entire process , and step by step the genome structure originally made of thousands of small independent bins converges towards a structure reflecting the best solution given the 3D contact data . 
+ The entire program and full source code are freely available online here : https://github.com/koszullab/GRAAL as well as on the website of RK laboratory . 
+ The description of the program and its application are presented in Marie-Nelly et al. ( 2014a ) . 
+ We used GRAAL to reassemble the 2,060 contigs ( N50 = 16,764 bp ) contained in the S. bayanus community identified by the Louvain algorithm . 
+ The resulting scaffolding showed a large improvement for all assembly parameters . 
+ The N50 length now reached 190,514 bp , and 77 % of the total length of the assembly was assembled into regions larger than 50 kb ( as compared to 3 % before running the software ) . 
+ The result was even more spectacular for N. castellii ( probably due to its higher coverage in the meta3C library ) : after running GRAAL on the 999 contigs ( N50 = 27,452 bp ) in the community delineated by the Louvain algorithm , N50 length reached 792,652 bp , and 96 % of the assembled data was in scaffolds larger than 50 kb ( with 11 scaffolds covering 95 % of the 10 chromosomes ) . 
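+ The assembly statistics quoted above (N50 and the fraction of the assembly in sequences larger than 50 kb) can be computed as in the following minimal sketch. The function names and the example lengths are illustrative assumptions, not part of GRAAL, and the sketch uses the common definition of N50 as the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the total length.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths (in bp)."""
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("lengths must not be empty")
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first length at which the running sum reaches half of the
        # total assembly length is the N50.
        if running >= half_total:
            return length


def fraction_in_long_sequences(lengths, min_length=50_000):
    """Fraction of the total length held in sequences longer than min_length."""
    total = sum(lengths)
    return sum(l for l in lengths if l > min_length) / total


# Hypothetical toy lengths, for illustration only.
toy = [100, 200, 300, 400]
toy_n50 = n50(toy)
toy_fraction = fraction_in_long_sequences(toy, min_length=250)
```

+ Running both functions on the contig lengths before and after scaffolding gives the kind of before/after comparison reported in the text.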
+ Data visualization
+ For general visualization of the meta3C assembly data we used force-directed graph-drawing algorithms . 
+ This class of algorithms positions the nodes of a graph by assigning forces among the set of edges according to weighted interactions between the nodes . 
+ In our case , the 3C contacts dictated the strength of interactions between nodes . 
+ These layouts allowed us to visualize conveniently large clusters of nodes that were subsequently confirmed by the use of the Louvain algorithm . 
+ All graphs were visualized using the network software Gephi ( Bastian et al. , 2009 ; Martin et al. , 2011 ) . 
+ Contigs longer than 2.5 kbp were divided into equal-sized chunks corresponding to several nodes in the graph ( to ` normalize ' the appearance of the clusters , since otherwise every contig , whatever its size , would be represented by a single point ) . 
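+ The chunking step can be sketched as follows. The text states only that contigs longer than 2.5 kbp were divided into equal-sized chunks; the rule used here (the smallest number of equal pieces such that no piece exceeds 2.5 kbp) and the node naming scheme are assumptions made for illustration.

```python
import math


def chunk_contig(name, length, max_chunk=2500):
    """Split one contig into equal-sized chunks, each becoming a graph node.

    Returns a list of (node_name, start, end) tuples with 0-based,
    end-exclusive coordinates. Contigs no longer than max_chunk stay whole.
    """
    if length <= max_chunk:
        return [(name, 0, length)]
    n_chunks = math.ceil(length / max_chunk)
    # Evenly spaced boundaries, so chunk sizes differ by at most 1 bp.
    bounds = [round(i * length / n_chunks) for i in range(n_chunks + 1)]
    return [(f"{name}_{i}", bounds[i], bounds[i + 1]) for i in range(n_chunks)]


# Hypothetical contigs, for illustration only.
short_nodes = chunk_contig("contig_a", 2000)
long_nodes = chunk_contig("contig_b", 6000)
```

+ With this rule, a long contig contributes several nodes to the graph while a short contig contributes one, which is the normalization effect the text describes.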
+ For visualizing the meta3C bacterial assembly we used the Force Atlas 2 algorithm ( Jacomy et al. , 2014 ) . 
+ This algorithm assigns spring-like attractive forces and repulsive forces ( like those between electrically charged particles ) to the set of edges and the set of nodes : in the equilibrium state , discrete clusters are obtained . 
+ The entire set of bins was then color-labeled , with each color corresponding to a community detected by the Louvain algorithm ( revealing a strong correlation with the clusters obtained using the visualization approach ) . 
+ For the circular representation and comparison of the sequences present in the communities against known bacterial genomes , we used the CGView server ( Grant and Stothard , 2008 ) . 
+ Acknowledgements
+ We thank Jean-Yves Coppée , Caroline Proux , and Laurence Ma from the Génopole at Institut Pasteur for technical assistance with Illumina sequencing . 
+ Thanks also to Nienke Buddelmeijer and Didier Mazel for providing us the B. subtilis and V. cholerae strains and to Frédéric Boccard , Olivier Espeli , Didier Mazel , Marcelo Nollmann , and Marie-Eve Val for their constructive comments on the results . 
+ We are grateful to Agnès Thierry for providing us the yeast strains and for helping us with pulse-field electrophoresis and hybridization of the plasmid F ' and the E. coli genome . 
+ We are also grateful to Olivier Jaillon and to our colleagues from the group Régulation spatiale des génomes for useful comments and discussions . 
+ Parts of the computational analyses were performed on the computing cluster of the Gesellschaft für Wissenschaftliche Datenverarbeitung mbH Göttingen ( GWDG ) . 
+ Additional information
+ Competing interests
+ MM : GRAAL program is owned by the Institut Pasteur , its use for commercial purposes requires a specific licence . 
+ AC : GRAAL program is owned by the Institut Pasteur , its use for commercial purposes requires a specific licence . 
+ HM-N : GRAAL program is owned by the Institut Pasteur , its use for commercial purposes requires a specific licence . 
+ RK : GRAAL program is owned by the Institut Pasteur , its use for commercial purposes requires a specific licence . 
+ The other authors declare that no competing interests exist . 
+ European Research Council Fondation ARC pour la Recherche sur le
+ The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.
+ Author contributions
+ MM , AC , Conception and design , Acquisition of data , Analysis and interpretation of data ; J-FF , JM , Analysis and interpretation of data , Drafting or revising the article ; HM-N , Analysis and interpretation of data ; RK , Proposed and coordinated the project , Conception and design , Analysis and interpretation of data , Drafting or revising the article 
+ Animation 1 . 3D reconstruction of the bacterial meta3C contact matrix . DOI : 10.7554/eLife.03318.007
+ Animation 2 . 3D reconstruction of the E. coli genome with a plasmid F ' carrying a 140 kb segmental duplication . DOI : 10.7554/eLife.03318.008
+ Animation 3 . 3D reconstruction of the E. coli genome with a plasmid F ' . DOI : 10.7554/eLife.03318.009
+ Animation 4 . 3D reconstruction of the B. subtilis genome . DOI : 10.7554/eLife.03318.010
+ Animation 5 . 3D reconstruction of the V. cholerae genome . DOI : 10.7554/eLife.03318.011
\ No newline at end of file
--- a/data/TEXT_FILES/useful_txt/25735747.txt 0 → 100644
View file @27818a9
+++ b/data/TEXT_FILES/useful_txt/25735747.txt 0 → 100644
View file @27818a9
+ The architecture of ArgR-DNA complexes at the
+ 1Department of Biological Sciences , Korea Advanced Institute of Science and Technology , Daejeon 305-701 , Republic of Korea , 2KI for the BioCentury , Korea Advanced Institute of Science and Technology , Daejeon 305-701 , Republic of Korea , 3Department of Chemical and Biochemical Engineering , Dongguk University-Seoul , Seoul 100-715 , Republic of Korea , 4Department of Bioengineering , University of California , San Diego , La Jolla , CA , USA , 5Department of Pediatrics , University of California , San Diego , La Jolla , CA , USA and 6Center for Biosustainability , Technical University of Denmark , Hørsholm , Denmark 
+ Received September 23, 2014; Revised February 12, 2015; Accepted February 13, 2015
+ ABSTRACT
+ DNA-binding motifs that are recognized by transcription factors ( TFs ) have been well studied ; however , challenges remain in determining the in vivo architecture of TF-DNA complexes on a genome-scale . 
+ Here , we determined the in vivo architecture of Escherichia coli arginine repressor ( ArgR ) - DNA complexes using high-throughput sequencing of exonuclease-treated chromatin-immunoprecipitated DNA ( ChIP-exo ) . 
+ The ChIP-exo has a unique peak-pair pattern indicating 5 ′ and 3 ′ ends of ArgR-binding region . 
+ We identified 62 ArgR-binding loci , which were classified into three groups , comprising single , double and triple peak-pairs . 
+ Each peak-pair has a unique 93 base pair ( bp ) - long ( ± 2 bp ) ArgR-binding sequence containing two ARG boxes ( 39 bp ) and residual sequences . 
+ Moreover , the three ArgR-binding modes defined by the position of the two ARG boxes indicate that DNA bends centered between the pair of ARG boxes facilitate the non-specific contacts between ArgR subunits and the residual sequences . 
+ Additionally , our approach may also reveal other fundamental structural features of TF-DNA interactions that have implications for studying genome-scale transcriptional regulatory networks . 
+ INTRODUCTION
+ Transcription factors ( TFs ) are ubiquitous regulatory proteins found across all domains of life that determine gene expression by controlling the distribution of RNA polymerase ( RNAP ) molecules on promoter sites ( 1 ) . 
+ TFs recognize and bind to specific DNA sequences in response to various environmental conditions and govern transcriptional activation or repression of the genes via promoter-associated RNAP ( 2 ) . 
+ Nucleic Acids Research, 2015, Vol. 43, No. 6 3079–3088 doi: 10.1093/nar/gkv150
+ Therefore , the determination of TF-binding site ( TFBS ) with consensus DNA sequence motif is critical to understand the regulatory mechanism and role of TFs in transcription ( 3 ) . 
+ In bacterial genomes , the TF-binding consensus sequences are generally between 12 and 30 base pairs ( bp ) in length , and are often structured as direct repeats or palindromes spaced with a fixed number of random nucleotides ( 4,5 ) . 
+ Furthermore , the location of the TFBS determines whether the TFs interfere with or support the association of RNAP to a particular promoter . 
+ For example , TFBS in the vicinity of the core promoter elements , the start of the coding region , or the activator-binding site can inhibit transcription by preventing the access of RNAP to those genomic regions ( 3 ) . 
+ Interestingly , TFs often exert regulatory functions such as transcriptional activation and repression even at distal locations by causing topological changes in the structures of the genome such as DNA looping or bending ( 6 -- 8 ) . 
+ Among the bacterial TFs , cAMP receptor protein ( CRP ) and arginine repressor ( ArgR ) are particularly interesting from a DNA structure point of view . 
+ CRP bends the DNA by at least 90 ◦ at the site of interaction with DNA , thereby contributing to transcriptional regulation . 
+ The association of hexameric ArgR complex induces DNA bending with the angle of ∼ 70 − 90 ◦ apparently centered at its binding motif ( 9 -- 11 ) . 
+ Genome-scale studies for mapping of TFBS have been performed using chromatin immunoprecipitation ( IP ) coupled with microarray ( ChIP-chip ) or sequencing ( ChIP-seq ) for various bacterial TFs ( 7,12 -- 18 ) . 
+ These studies , however , have not revealed the broad changes in genome topology and motif recognition mechanism by ArgR in vivo . 
+ Here , we describe in vivo architecture of how DNA wraps around the hexameric ArgR complex on a genome-scale . 
+ The comprehensive determination of ArgR target genes by analysis of unique peak-pair pattern of ChIP-exo demonstrates that the sharp DNA bending ( 70 -- 90 ◦ ) at the TFBS facilitates the non-specific contacts between ArgR subunits and residual sequences of TFBS . 
+ This approach provides a foundation to determine direct regulon members and in vivo architecture of TFs and DNA complexes to elucidate a mechanistic understanding of transcriptional regulatory networks . 
+ MATERIALS AND METHODS
+ Bacterial strains and growth
+ All strains used are Escherichia coli K-12 MG1655 and its derivatives . 
+ The strain harboring ArgR-8myc was constructed as described previously with the tagging primers , AACGGTTTCACAGTCAAAGACCTGTACGAAGCGATTTTAGAGCTGTTCGACCAGGAGCTTGTCGGATCCAGTCTTCGTGAT and GCAGGGGGTTGAGAGGGATAAGCAACATTTTCCCCGCCGTCAGAAACGACGGGGCAGAGAAATTCCGGGGATCCGTCGACC ( 19 ) . 
+ A glycerol stock of the strain was inoculated into 3 ml Luria broth supplemented with 150 μg kanamycin and cultured overnight at 37 ◦ C with constant agitation . 
+ The cultured cells were inoculated with 1:100 dilution into 50 ml of the fresh M9 medium containing 2 g/l-glucose in either the presence or absence of 1 g/l-arginine and continued to be grown at 37 ◦ C until reaching an appropriate cell density ( OD600 ≈ 0.5 ) . 
+ ChIP-exo
+ Cultured cells ( 50 ml ) were cross-linked with 1 % formaldehyde at room temperature for 30 min . 
+ 2 ml of 2.5 M glycine was added to quench the unused formaldehyde . 
+ After washing three times with 50 ml of ice-cold Tris-buffered saline ( TBS ) , the washed cells were resuspended in 0.5 ml of lysis buffer composed of 50 mM Tris-HCl ( pH 7.5 ) , 100 mM NaCl , 1 mM EDTA , 1 μg/ml RNaseA , protease inhibitor cocktail and 1 kU Ready-Lyse lysozyme ( Epicentre , Madison , WI , USA ) , and then incubated at 37 ◦ C for 30 min ( 20 ) . 
+ The cells were then treated with 0.5 ml of 2 × immunoprecipitation ( IP ) buffer ( 100 mM Tris-HCl ( pH 7.5 ) , 100 mM NaCl , 1 mM EDTA , 2 % ( v/v ) Triton X-100 and protease inhibitor cocktail ) , followed by incubation on ice for 30 min . 
+ The lysate was sonicated in an ice bath using Sonic Dismembrator Model 500 ( four times for 20 s each , output level , 2.5 W ) . 
+ Size distribution of the fragmented DNAs was confirmed using agarose gel electrophoresis ( 200 -- 400 bp ) after removing cell debris by centrifugation . 
+ The cross-linked DNA-ArgR complexes in the supernatant were then subjected to IP by adding 10 μl of Anti-myc ( 9E10 ) ( Santa Cruz , Dallas , TX , USA ) . 
+ For mock-IP control , 2 μg of normal mouse IgG ( Santa Cruz ) was added into the supernatant in parallel . 
+ They were then incubated overnight at 4 ◦ C with constant rotation . 
+ The cross-linked DNA-protein and antibody complexes were selectively captured by adding 50 μl of Dynabeads Pan Mouse IgG magnetic beads ( Invitrogen , Grand Island , NY , USA ) . 
+ Next , DNAs were end-polished using T4 DNA polymerase ( NEB , Ipswich , MA , USA ) , ligated with the annealed adaptor 1 ( 5 ′ - Phospho-AACTGCCCCGGGTTGCTCTTCCGATCT and 5 ′ - OH-AGATCGGAAGAGC-OH ) , nick-repaired using phi29 polymerase ( NEB ) , and digested with exonuclease ( NEB ) as illustrated in the Supplementary Figure S1 ( 21 ) . 
+ Then , protein-DNA complexes were reverse-cross-linked by heating at 65 ◦ C overnight and proteins were degraded by 8 μg of protease K ( Invitrogen ) . 
+ The purified DNAs were denatured at 95 ◦ C and extended by P1 primer ( 5 ′ - OH-GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT ) , further ligated with the annealed adaptor 2 ( 5 ′ - OH-ACACTCTTTCCCTACACGACGCTCTTCCGATCT and 5 ′ - OH-AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTAG ) . 
+ The ligated DNA products were purified using Qiagen polymerase chain reaction ( PCR ) purification kit and were PCR-amplified by P2 primer ( 5 ′ - OH-AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT ) and P3 primer ( 5 ′ - OH-CAAGCAGAAGACGGCATACGAGATNNNNNNGTGACTGGAGTTCAGACGTGT ) . 
+ The degenerate sequence ( the underlined 6Ns ) in the P3 primer indicates the index sequence for the Illumina next-generation sequencing ( Illumina , San Diego , CA , USA ) . 
+ The PCR-amplified DNA products were separated on a 2 % agarose gel and the amplicons were excised from the gel and extracted using QIAquick gel purification columns . 
+ Real-time quantitative PCR
+ To measure the enrichment of the ArgR-binding DNA in chromatin IP samples , real-time quantitative PCR ( qPCR ) was performed . 
+ 1 μl of IP or mock-IP DNA was used with specific primers to the previously identified ArgR binding regions ( gltB promoter ) and non-binding regions ( aroH gene ) ( 17 ) . 
+ The primer sequences for gltB were 5 ′ - AAGCTTGCCATTTGACCTGT and 5 ′ - TCCTTTTCGCATCGGTTAAT ; those for aroH were 5 ′ - TCCTCTCGCCAGACAAAAAT and 5 ′ - TCAAACTCGTGCAGCGTATC . 
+ A reaction mixture of 1 μl of IP or mock-IP DNA , 1 μl of 10 μM primers of each region , 15 μl of SYBR mix ( Biorad , Hercules , CA , USA ) and 13 μl of ddH2O was prepared on ice . 
+ All real-time qPCR reactions were conducted in triplicate . 
+ The samples were cycled at 94 ◦ C for 15 s , 54 ◦ C for 30 s and 72 ◦ C for 30 s ( total 40 cycles ) in Thermal Cycler ( Biorad ) . 
+ The threshold cycle ( Ct ) values were calculated automatically by the iCycler iQ optical system software ( Bio-Rad ) . 
+ Normalized Ct ( ΔCt ) values for each sample were calculated by subtracting the Ct value obtained for the mock-IP DNA from the Ct value for the IP-DNA ( ΔCt = Ct , IP − Ct , mock ) . 
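+ The normalization above can be sketched in a few lines. Averaging the triplicate Ct values before subtraction and converting the normalized value to a fold enrichment as 2 to the power of the negated difference (which assumes the product doubles in every cycle) are common conventions assumed here; the text does not state how the authors handled either step. The Ct values in the example are hypothetical.

```python
from statistics import mean


def delta_ct(ct_ip, ct_mock):
    """Normalized Ct: mean Ct of the IP replicates minus mean Ct of the mock-IP replicates."""
    return mean(ct_ip) - mean(ct_mock)


def fold_enrichment(d_ct):
    """Fold enrichment of IP over mock-IP, assuming perfect doubling per cycle."""
    return 2.0 ** (-d_ct)


# Hypothetical triplicate Ct values, for illustration only.
ip_replicates = [20.0, 20.0, 20.0]
mock_replicates = [26.0, 26.0, 26.0]
d_ct = delta_ct(ip_replicates, mock_replicates)
enrichment = fold_enrichment(d_ct)
```

+ A more negative normalized Ct means the target region amplified earlier in the IP sample than in the mock-IP control, that is, a stronger enrichment.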
+ Next-generation sequencing
+ Prior to the high-throughput sequencing , the sequencing libraries for ChIP-exo were cloned into TOPO vector ( Invitrogen ) and several colonies were subjected to Sanger sequencing to confirm the adapter sequences and inserted DNA length of the sequencing library . 
+ Then , the sequencing libraries were quantified using Qubit ® 2.0 fluorometer ( Invitrogen ) and Experion ™ system ( Bio-Rad ) , and sequenced using Illumina Miseq ® V2 ( Supplementary Figure S2 ) . 
+ Read mapping and data processing
+ All sequencing reads from ChIP-exo experiments were mapped to the E. coli MG1655 reference genome ( NC_000913 ) using CLC Genomics Workbench5 with the length fraction of 0.9 and the similarity of 0.99 ( Supplementary Table S1 ) . 
+ To capture target protein binding sites from ChIP-exo data , corresponding genomic position of mapped reads start position ( MRSP ) was counted and stored for visual inspection using in-house scripts . 
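+ The in-house scripts are not described further, so the following is only a minimal sketch of how mapped read start positions could be counted per strand. It assumes reads are given as ( start , end , strand ) tuples with 1-based inclusive coordinates, and it takes the 5 ′ end of a read to be its start coordinate on the top strand and its end coordinate on the bottom strand. The function name and the example reads are hypothetical.

```python
from collections import Counter


def count_read_starts(reads):
    """Count 5' end positions of mapped reads separately for each strand.

    reads: iterable of (start, end, strand) tuples, 1-based inclusive,
           with strand given as "+" or "-".
    Returns (forward_counts, reverse_counts) as Counter objects keyed by
    genomic position.
    """
    forward, reverse = Counter(), Counter()
    for start, end, strand in reads:
        if strand == "+":
            forward[start] += 1   # 5' end of a top-strand read
        elif strand == "-":
            reverse[end] += 1     # 5' end of a bottom-strand read
        else:
            raise ValueError(f"unknown strand: {strand!r}")
    return forward, reverse


# Hypothetical mapped reads, for illustration only.
example_reads = [(100, 150, "+"), (100, 140, "+"), (160, 210, "-")]
fwd_counts, rev_counts = count_read_starts(example_reads)
```

+ Keeping the two strands separate matters here because the exonuclease-defined boundary appears as a pile-up of read starts on one strand at each side of the binding site.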
+ Motif searching
+ The motif search and sequence logos were completed using BioProspector , MEME Suite ver. 4.9.128 , and WebLogo 3 . 
+ Raw experimental data
+ All raw data files can be downloaded from Gene Expression Omnibus through accession number GSE60546 . 
+ RESULTS
+ Immunoprecipitation (IP) of ArgR-DNA complexes
+ ArgR is a transcription factor involved in arginine biosynthesis and metabolism in E. coli . 
+ The high concentration of cellular arginine enhances ArgR affinity for specific genomic regions and concurrently modulates the transcription of the related genes . 
+ Cellular arginine facilitates the formation of the ArgR hexamer . 
+ Consequently , the presence of arginine is essential for ArgR hexamer to bind its binding sites with high affinity for the transcriptional regulation of its regulon members ( 22 ) . 
+ We used the genome-wide ChIP-exo method on the E. coli K-12 MG1655 strain harboring myc-tagged ArgR protein to probe the ArgR-binding sites at single nucleotide resolution in vivo ( 17,21 ) . 
+ Since ArgR responds to the concentration of exogenous L-arginine , the cells were grown in M9 minimal media either in the presence ( + ARG ) or absence ( − ARG ) of the amino acid . 
+ Prior to the genome-wide ChIP-exo assay , we first examined the enrichment of ArgR proteins on the promoter of gltBDF operon in the IP ArgR-DNA complexes under the experimental conditions ( Figure 1a ) . 
+ A cross-linking experiment was performed at mid-log phase , followed by lysis , DNA shearing , and IP using anti-myc antibody and then purification of DNA fragments . 
+ Quantitative PCR was performed to confirm the enrichment of ArgR-binding regions in the immunoprecipitated DNA ( IP-DNA ) samples by using primers that amplified the previously known ArgR-binding region . 
+ ArgR negatively regulates the gltBDF operon , which encodes one of the two main ammonia assimilation pathways in E. coli ( 23 ) . 
+ As a negative control , we examined the level of ArgR enrichment on the promoter region of aroH , which is involved in the biosynthesis of aromatic amino acids ( 24 ) . 
+ The occupancy level of ArgR at the promoter region of gltBDF operon was ∼ 60-fold higher than aroH under both + ARG and − ARG growth conditions ( Figure 1a ) . 
+ This result is in good agreement with the previous ChIP-chip results ( 17 ) , demonstrating that ArgR-bound DNA fragments were selectively enriched under the experimental conditions . 
+ Determination of genome-wide ArgR-binding loci using ChIP-exo
+ The direct analysis of in vivo ArgR-binding across the E. coli genome , previously described using ChIP-chip experiments , revealed a total of 61 unique ArgR-binding regions . 
+ This study demonstrated that integration of the ChIP-chip with transcriptome analysis determines the ArgR regulon along with its transcriptional regulatory network overarching the amino acid metabolism ( 17 ) . 
+ Although a partially conserved 18-bp-long imperfect palindrome sequence was inferred as the consensus ArgR-binding motif from the previous ChIP-chip study , we were unable to elucidate the interaction between ArgR hexamer and the neighboring sequences of the ArgR-binding motif due to the limitation of peak resolution . 
+ Therefore , we employed ChIP-exo assay ( Supplementary Figure S1 ) , which sequentially performs exonuclease trimming , end polishing , blunt-ending and nick-repairing of the IP-DNA followed by high-throughput sequencing ( Figure 1b ) ( 21 ) . 
+ To this end , we modified the ChIP-exo method for the Illumina sequencing platforms . 
+ The high-quality sequencing reads from the + ARG and − ARG samples were uniquely mapped to the E. coli reference genome ( NC_000913 ) , separately , resulting in identification of ArgR-binding sites in the genome-wide landscape ( Figure 1c ) . 
+ In case of the + ARG sample , ArgR-binding occupancy was increased in the identified binding regions ( over 90 % of loci ) , in comparison to the − ARG sample ( Supplementary Figure S3 ) , which is consistent with the previous ChIP-chip result ( 17 ) . 
+ Overall , the genome-wide ChIP-exo profile exhibits a pattern similar to the ChIP-chip profile ; however , we observed a ∼ 100-fold higher signal-to-noise ( S/N ) ratio with the ChIP-exo profile . 
+ The ChIP-exo method enabled the identification of the precise location of the ArgR-binding genomic regions , which are represented by the two peaks ( hereafter , referred to as a peak-pair ) , one from the top strand and the other from the bottom strand ( Figure 1b ) . 
+ The additional exonuclease treatment digested the ArgR-bound DNA up to the first nucleotide point of cross-linking between DNA and ArgR in the 5 ′ to 3 ′ direction . 
+ Thus , these peak-pairs allowed us to identify ArgR-binding locations , which are strand-specific for the interaction between DNA and ArgR . 
+ From this data set , a total of 62 unique ArgR-binding locations were identified ( Supplementary Table S2 ) . 
+ The ChIP-exo profiles represented complete coverage of the 15 ArgR-binding regions , which had been characterized by in vitro DNA-binding experiments and in vivo mutational analysis ( 25 ) . 
+ The previous ChIP-chip assays determined a total of 64 ArgR-binding regions , including two divergent promoter regions ( 17 ) . 
+ From the comparative analysis of the ChIP-chip data with the ChIP-exo data , a majority of the regions ( 90 % ) were identified by both methods ; however , a few exceptions were observed , such as the asnT , yoeI , yqaE , plsC , atpI and phnN promoters ( Figure 1d ) . 
+ These exceptions were attributed to low occupancy level ( ∼ 1.10 ) measured by ChIP-chip , which was significantly lower than other regions ( ∼ 2.78 ) ( Supplementary Table S2 ) . 
+ Thus , exonuclease treatment may eliminate contaminating non-specific DNA fragments that are not bound by ArgR , together with DNA fragments that are only weakly bound by ArgR ( 21 ) . 
+ Additionally , ChIP-exo profiles exhibited four new ArgR-associations from the upstream regions of proV , mltA , yhcC and ygaW , which encode a subunit of glycinebetaine/proline ABC transporter , one of six methionine tRNAs , predicted Fe-S oxidoreductase and L-alanine exporter , respectively ( Supplementary Table S2 ) . 
+ All newly identified ArgR-binding regions were confirmed by electrophoretic mobility shift assays ( EMSA ) ( Supplementary Figures S4 and S5 ) . 
+ The average distance between peaks at the extremities was 116 bp , which indicates a better peak resolution than ChIP-chip analysis ( Figure 1e ) . 
+ The high resolution of ArgR-binding location led us to infer its mode of regulation . 
+ Because 84 % of the ArgR-binding peaks lie upstream of a translation start codon and 76 % lie within ± 100 bp of a transcription start site , ArgR appears to regulate most of the genes in its regulon at the transcriptional level ( Figure 1f and g ) . 
+ Taken together , ChIP-exo profiles show low background and enhanced signals , leading to the attainment of bona fide ArgR-binding locations with high resolution . 
+ Analysis of unique ArgR-binding peak-pair pattern
+ We found that the ArgR-binding signals are often composed of multiple peak-pairs using ChIP-exo analysis . 
+ The presence of such multiple peaks indicates that the interaction between ArgR and the cognate DNA sequence is more complicated than previously thought , when it was assumed to rest upon a simple DNA-binding motif composed of a pair of palindromic sequences ( 9,11,26 ) . 
+ For quantitative analysis of the ChIP-exo profiles , we determined 5 ′ end positions of mapped reads ( MRSPexo ) at each genomic position . 
+ The MRSPexo provides strand-specific first point of cross-linking site between DNA and the ArgR at top and bottom strands , which may directly provide structural information of the complex . 
+ For instance , we found single , double and triple peak-pairs from the promoter regions of hisJ , aroP and argD , which are responsible for the ATP-dependent histidine transport , active transport of three aromatic amino acids across E. coli inner membrane and amination steps in lysine , ornithine and arginine biosynthesis , respectively ( Figure 1h ) ( 27 -- 29 ) . 
+ We sought to analyze the characteristics of the different multiplicities of ArgR at different binding sites . 
+ First , to analyze genome-wide multiple peak-pair patterns , the MRSPexo signals of individual ArgR-binding regions were visualized as heatmaps using the values ranging from − 150 to +150 bp from the center position . 
+ The heatmaps were categorized into three classes of ArgR-binding regions based on the number of peak-pairs ( Figure 2a , Supplementary Table S3 ) . 
+ From the 63 unique ArgR-binding loci , we identified 21 sites ( ∼ 33 % ) with a single peak-pair . 
+ Significant portions of ArgR-binding loci ( ∼ 67 % ) were composed of double ( 25 sites ) and triple peak-pairs ( 17 sites ) ( Figure 2b , Supplementary Table S3 ) . 
+ MRSPexo at the single peak were enriched between − 150 and +150 bp from the center of forward and reverse single peak-pair ( F1-R1 ) . 
+ Double and triple peak-pairs are composed of F1-R1 and F2-R2 ; and F1-R1 , F2-R2 and F3-R3 , respectively ( Figure 2c ) . 
+ In cases of double and triple peak-pairs , the signals were enriched from the center of F1-R2 and F1-R3 between − 150 and +150 bp , respectively . 
+ Thus , the complex interaction between ArgR and the cognate DNA is a genome-wide pattern . 
+ Next , we calculated the distance between forward and reverse peaks from each peak-pair category . 
+ Surprisingly , the pitch had a uniform distance of 93 bp ( ± 2 ) between symmetrically arranged peaks of the peak-pair ( F1-R1 , F2-R2 and F3-R3 ) , regardless of the number of the peak-pair ( Figure 2d ) . 
+ In addition , the distance between each peak-pair was approximately 20 bp ( Figure 2e ) , suggesting that the ArgR binds to the cognate DNA in similar manner ( i.e. sequence specific binding ) but different conformation according to the number of binding events between ArgR and DNA . 
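+ The two distances discussed above (between the forward and reverse peak of each pair, and between neighboring peak-pairs) can be illustrated with a small sketch. Pairing peaks by their sorted order, so that the first forward peak goes with the first reverse peak and so on, is an assumption that mirrors the F1-R1 , F2-R2 , F3-R3 naming; the coordinates are hypothetical and chosen to reproduce the reported 93 bp and roughly 20 bp values.

```python
def peak_pair_distances(forward_peaks, reverse_peaks):
    """Distance (bp) between the forward and reverse peak of each pair.

    Peaks are paired by rank: F1 with R1, F2 with R2, and so on.
    """
    fwd, rev = sorted(forward_peaks), sorted(reverse_peaks)
    if len(fwd) != len(rev):
        raise ValueError("need the same number of forward and reverse peaks")
    return [r - f for f, r in zip(fwd, rev)]


def spacing_between_pairs(forward_peaks):
    """Distance (bp) between the forward peaks of neighboring peak-pairs."""
    fwd = sorted(forward_peaks)
    return [b - a for a, b in zip(fwd, fwd[1:])]


# Hypothetical coordinates for a locus with a triple peak-pair.
forward = [1000, 1020, 1040]
reverse = [1093, 1113, 1133]
pair_widths = peak_pair_distances(forward, reverse)
pair_spacing = spacing_between_pairs(forward)
```

+ A locus with a single peak-pair simply yields one width and an empty spacing list.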
+ We next examined whether the number of peak-pairs at each locus correlates directly with the ArgR-binding occupancy in the ChIP-chip data ( 17 ) . 
+ Indeed , we observed an increase in occupancy between single , double and triple peak-pairs , whose median values were 1.56 , 3.34 and 4.08 , respectively , indicating a positive correlation due to the number of cross-linking sites between ArgR protein and DNA sequence ( Figure 2f ) . 
+ The ChIP-chip or ChIP-seq signal intensities at the ArgR-binding sites serve as a good indicator of the different binding occupancies of ArgR ( 30 ) . 
+ Furthermore , the multiple peak-pairs are a direct consequence of various topological structures of ArgR-DNA complexes . 
+ It was proposed that the association of hexameric ArgR complex induces sharp DNA bend by an angle of ∼ 70 − 90 ◦ ( 9 -- 11 ) , which covers a region of approximately four helical turns through only one side of the DNA helix ( 26,31 ) . 
+ Despite in vitro experimental evidence supporting such a steric-hindrance model , our results argue that the bending angle and region covered by ArgR complex in vivo is variable . 
+ In vivo organization of the ArgR-DNA complexes
+ The hexameric ArgR complex binds to the specific DNA motif composed of a pair of imperfect palindromic sequences that are connected by a fixed length spacer sequence ( 2 or 3 bp ) ( 26 ) . 
+ To examine if the multiple peak-pairs are the consequence of the presence of multiple ArgR-binding motifs , we inferred a de novo position-specific weight matrix ( PSWM ) for ArgR using MEME , which is a bioinformatics tool that identifies overrepresented motifs in multiple unaligned sequences ( 32 ) . 
+ The DNA motifs were screened from the sequences for peak pairs of the three categories . 
+ All peak-pairs contained the 39-bp long ArgR-binding motif comprising two 18 bp palindromic sequences with three nucleotides as a spacer ; however , multiple ArgR-binding motifs were not observed in double and triple peak-pairs ( Figure 3a ) . 
+ Thus , we speculated that the multiple peak-pairs in our ChIP-exo profiles did not originate from interactions between ArgR subunits and multiple binding motifs . 
+ Instead , we hypothesize that the multiple peak-pairs are the consequence of the single binding motif serving as an anchor for the confined non-specific interaction with neighboring sequences by the ArgR subunits . 
+ This hypothesis is further supported by the fact that the distance between forward and reverse peak ( ∼ 93 bp ) is longer than the 39-bp long ArgR-binding motif . 
+ To investigate this hypothesis , we determined the location of the ArgR-binding motif ( i.e. two ArgR boxes connected by 3-bp spacer ) between each paired peak . 
+ A total of 122 individual peak-pairs were identified from the 63 ArgR-binding loci ( Figure 3b , Supplementary Table S4 ) . 
+ Interestingly , these peak-pairs were classified into three groups based upon the location of the two ARG boxes in the DNA sequence between forward and reverse peak ( i.e. left , middle and right position ) . 
+ In the first group ( 34 peak-pairs ) , the two ARG boxes are located at 6.7 bp on average from the left end of the DNA sequence . 
+ In the second ( 47 peak-pairs ) and third group ( 41 peak-pairs ) , the two ARG boxes were located at 26.9 and 47.3 bp from the left end , respectively . 
+ The respective distances between the left ends of consecutive groups were 20.2 and 20.4 bp . 
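+ The arithmetic behind the two gaps, and one possible way to place a new peak-pair into the left , middle or right group, is sketched below. The mean offsets are the values reported in the text; the nearest-mean assignment rule is an assumption for illustration only and is not described as the authors' classification procedure.

```python
# Mean distance (bp) of the two ARG boxes from the left end of the
# peak-pair sequence, as reported for the three groups.
GROUP_MEAN_OFFSETS = {"left": 6.7, "middle": 26.9, "right": 47.3}


def gaps_between_groups():
    """Differences between consecutive group means, rounded to 0.1 bp."""
    values = sorted(GROUP_MEAN_OFFSETS.values())
    return [round(b - a, 1) for a, b in zip(values, values[1:])]


def assign_group(offset_bp):
    """Assign a peak-pair to the group whose mean offset is closest (assumed rule)."""
    return min(GROUP_MEAN_OFFSETS,
               key=lambda group: abs(GROUP_MEAN_OFFSETS[group] - offset_bp))


gaps = gaps_between_groups()
```

+ Both gaps are close to the spacing of about 20 bp between neighboring peak-pairs reported earlier in the Results.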
+ These unique peak-pair patterns suggest that the crosslinking positions detected from ChIP-exo are correlated with the interaction between a multimeric ArgR complex and its binding region . 
+ It is known that two monomeric ArgR subunits bind one ARG box . 
+ Thus , two ARG boxes of 39-bp in length are occupied by four monomeric ArgR subunits through interaction with only one side of the DNA helix that is equivalent to a region of about four helical turns ( 31 ) . 
+ Note that a hexameric ArgR complex , which is the functionally active form for regulating the target genes , is composed of two trimeric ArgR complexes depending on the allosteric effect of arginine ( 33,34 ) . 
+ However , our data show a difference in the sequence length of ArgR-binding region ( ∼ 39 bp ) between in vitro experiments and the protected region ( ∼ 93 bp ) by in vivo ChIP-exo experiment . 
+ Thus , we propose three ArgR-binding modes based upon the participation of the remaining two monomeric ArgR subunits in the interaction with the residual DNA region ( Figure 3c ) . 
+ For modes and , four monomeric ArgR subunits from the extreme left or right positions bind to the two ARG boxes , and the remaining two monomeric ArgR subunits interact non-specifically with the residual DNA ( Figure 3c ( ) and ( ) ) . 
+ The interaction between two ARG boxes and four monomeric ArgR subunits , which bends the DNA by an angle of ∼ 70 − 90 ◦ ( 9 -- 11 ) , may permit the contact of two monomeric ArgR subunits with the residual DNA . 
+ For mode , four monomeric ArgR subunits at the center position hold the ArgR-binding motif by bending DNA . 
+ Each ArgR subunit at the extreme left and right positions interacts with the residual DNA sequences non-specifically ( Figure 3c ( ) ) , which does not require an additional binding motif or identical length of sequence with the ARG box . 
+ Furthermore , the N-terminal domain of ArgR carries a basic charge that interacts with the negatively charged DNA ( 35 ) . 
+ To test this hypothesis , we screened for an additional motif or a single ARG box in the DNA sequences of the non-specific contact regions using the MEME tool . 
+ No significant DNA motifs were found from residual sequences of the mode , and . 
+ For example , the upstream region of hisJQMP operon containing ARG boxes participates in binding and stabilizing ArgR interaction ( 36 ) . 
+ This site is ∼ 90 bp positioned away from ARG boxes ( 37 ) . 
+ Thus , the binding of four monomeric ArgR subunits to ARG boxes facilitates DNA-bending that mediates non-specific contacts between ArgR subunits and the ArgR-binding region . 
+ Next , we elucidated the structural difference between single , double and triple peak-pairs . 
+ The previous gel-retardation experiments suggested that one ArgR hexamer binds to the two palindromic ARG boxes ( 31 ) . 
+ Consistent with this , our data imply that the ArgR-binding regions can be bound in one of the three modes ( Figure 3d ) . 
+ Thus , the number of peak-pairs can be determined by the binding accessibility of ArgR to the ARG boxes , which regulates the bending angle ( ∼ 70 -- 90 ◦ ) . 
+ For example , the higher ArgR-binding accessibility can induce the lower bending angle , resulting in a greater chance of non-specific contact for generating the multiple peak-pairs . 
+ These diverse binding patterns agree well with the fact that the imperfect ArgR consensus sequences are important for increasing the range of the arginine concentration in vivo to regulate genes in a large regulon ( 38 ) . 
+ Interaction between ArgR and RNA polymerase
+ In general , the ArgR represses transcription by steric exclusion of RNAP from the promoter regions ( 26,29,39 ) . 
+ To determine this interaction , we compared the ArgR-binding sites with the − 10 and − 35 promoter elements occupied by RNAP . 
+ We classified the interactions between ArgR and RNAP into three unique modes based on their binding locations . 
+ For instance , ArgR binds to the promoter region of the hisJQMP operon , which is occupied by RNAP for transcriptional initiation ( 36 ) . 
+ 34 genes showed overlap of binding location of ArgR with RNAP , henceforth referred to as the overlapped mode ( O ) ( Figure 4a ) . 
+ In the genes of aroP and yaaU , which encode an aromatic amino acid permease and an uncharacterized member of the major facilitator superfamily ( MFS ) of transporters , the ArgR-binding loci were determined at the upstream ( U ) and downstream ( D ) sites from RNAP-binding region , respectively ( Figure 4b and c ) . 
+ We determined 11 such genes as having the upstream and downstream modes , respectively ( Figure 4d ) . 
+ The relative binding locations of ArgR to the TSS positions ( upstream , downstream and overlapped ) were not directly correlated with the number of peak-pairs and transcriptional activity ( 17 ) ( Figure 4d ) . 
+ Altogether , the binding of ArgR does not simply exclude the RNAP for the transcriptional repression , but instead the transcriptional regulation by ArgR is likely mediated by the combinatorial effect of DNA-bending at the ARG boxes , the ArgR-binding positions , the interaction with other TFs , and the number of peak-pairs ( 23,37 ) . 
+ DISCUSSION
+ In conclusion , we describe in vivo DNA-wrapping modes around the hexameric ArgR complex induced by DNA-bending at the ARG boxes and non-specific contacts on a genome-wide scale . 
+ ArgR is a hexameric transcriptional regulator , which controls the transcription of genes involved in arginine biosynthesis , utilization and transport , as well as histidine transport ( 17,36 ) . 
+ In the presence of L-arginine , the hexameric ArgR complex binds to specific DNA sequences called ARG boxes , which consist of a pair of imperfect palindromic sequences . 
+ The two palindromes are connected by a fixed-length spacer sequence ( 2 or 3 bp ) , resulting in the ArgR-binding site totaling 39 bp in length ( 26 ) . 
+ It has been proposed that the association of hexameric ArgR complex with two ARG boxes bends DNA by an angle of ∼ 70 − 90 ◦ apparently centered between the pair of palindromes ( 9 -- 11 ) . 
+ Additionally , it was postulated that the hexameric ArgR complex covers a region of about four helical turns through only one side of the DNA helix ( 26,31 ) . 
+ Despite in vitro experimental evidence supporting such a steric-hindrance model , the mode of interaction of hexameric ArgR-DNA complex in vivo is unclear . 
+ Our ChIP-exo data indicated comprehensive ArgR-DNA interactions at high-resolution with successful removal of false positives , resulting in a clearer snapshot of in vivo ArgR-binding events than in a previous study ( 17 ) . 
+ The ArgR-binding data showing the unique DNA sequences ( 93 ± 2 bp ) defined by peak-pairs were classified into three modes comprising multiple peak-pairs ( 93 bp-long for each peak-pair and 20-bp-long interval between peak-pairs ) . 
+ Moreover , we discovered that 67 % of ArgR-binding regions contain multiple peak-pairs where one broad peak was shown in the previous ArgR ChIP-chip data ( 17 ) . 
+ Furthermore , the peak-pairs were grouped into three modes defined by the location of the two ARG boxes ( left , middle , right ) . 
+ The sharp DNA bending ( 70 − 90 ◦ ) can be induced by specific interaction between four monomeric ArgR subunits and two ARG boxes . 
+ Subsequently , the interaction facilitates non-specific contacts between residual monomeric ArgR subunits and DNA sequences . 
+ These findings along with results of RNAP-binding loci suggest that the transcriptional regulation by hexameric ArgR complex is likely mediated by the combinatorial effect of DNA-bending at the ARG boxes , the ArgR-binding positions , the interaction with other TFs and the non-specific contacts between ArgR and neighboring sequences . 
+ ChIP-exo data significantly contributed to elucidating protein-DNA binding mechanisms at the genome-scale through the recognition of accurate protein-binding sites . 
+ In the future , this technology will provide fundamental information on various transcription factors , helping to understand the bacterial transcription regulatory network . 
+ Supplementary Data are available at NAR Online.
+ FUNDING
+ Intelligent Synthetic Biology Center of Global Frontier Project [ 2011 -- 0031957 to B.-K.C. ] ; Basic Science Research Program [ NRF-2013R1A1A3010819 to S.C. ] through the National Research Foundation of Korea ( NRF ) funded by the Ministry of Science , ICT and Future Planning . 
+ Funding for open access charge : Intelligent Synthetic Biology Center [ 2011-0031957 to B.-K.C. ] . 
+ Conflict of interest statement . 
+ None declared .
\ No newline at end of file
--- a/data/TEXT_FILES/useful_txt/26020590.txt 0 → 100644
View file @27818a9
+++ b/data/TEXT_FILES/useful_txt/26020590.txt 0 → 100644
View file @27818a9
+ Immunoprecipitation-sequencing
+ In bacteria , selective promoter recognition by RNA polymerase is achieved by its association with σ factors , accessory subunits able to direct RNA polymerase `` core enzyme '' ( E ) to different promoter sequences . 
+ Using Chromatin Immunoprecipitation-sequencing ( ChIP-seq ) , we searched for promoters bound by the σS-associated RNA polymerase form ( EσS ) during transition from exponential to stationary phase . 
+ We identified 63 binding sites for EσS overlapping known or putative promoters , often located upstream of genes ( encoding either ORFs or non-coding RNAs ) showing at least some degree of dependence on the σS-encoding rpoS gene . 
+ EσS binding did not always correlate with an increase in transcription level , suggesting that , at some σS-dependent promoters , EσS might remain poised in a pre-initiation state upon binding . 
+ A large fraction of EσS-binding sites corresponded to promoters recognized by RNA polymerase associated with σ70 or other σ factors , suggesting a considerable overlap in promoter recognition between different forms of RNA polymerase . 
+ In particular , EσS appears to contribute significantly to transcription of genes encoding proteins involved in LPS biosynthesis and in cell surface composition . 
+ Finally , our results highlight a direct role of EσS in the regulation of non coding RNAs , such as OmrA/B , RyeA/B and SibC . 
+ Bacteria are constantly exposed to changes and fluctuations in their environment , to which they can adapt by reprogramming their gene expression through various mechanisms , including use of alternative σ factors . 
+ σ factors are accessory subunits of bacterial RNA polymerase that associate , in a 1:1 stoichiometric ratio , to the core enzyme ( E ) , i.e. , the multi-subunit complex responsible for RNA polymerase catalytic activity . 
+ Binding to any of the different alternative σ factors creates different RNA polymerase holoenzymes ( Eσ ) , proficient in specific promoter recognition and transcription initiation . 
+ After the process of transcription initiation has taken place , the σ factor dissociates from the holoenzyme , and the core enzyme carries out transcription elongation1 . 
+ The number of σ factors varies considerably among bacteria : seven σ factors are known to be present in Escherichia coli , including σ70 ( or σD ) , the `` housekeeping '' σ factor devoted to transcription of a large part of the genome and of most essential genes . 
+ In contrast , alternative σ factors are responsible for the transcription of smaller subsets of genes , fulfilling specific roles or belonging to defined functional groups2 . 
+ One alternative σ factor , σS , strongly affects cell survival during stress conditions , such as starvation , oxidative stress , and exposure to either low or high pH , and controls expression of virulence factors in several pathogens3 . 
+ 1Institute of Biomedical Technologies , National Research Council ( ITB-CNR ) , Segrate ( MI ) , Italy . 
+ 2EAWAG , Swiss Federal Institute for Environmental Science and Technology , Dübendorf , Switzerland . 
+ 3Lab . Adaptation et Pathogénie des Micro-organismes ( LAPM ) , Univ. Grenoble Alpes , F-38000 Grenoble , France . 
+ 4UMR 5163 , Centre National de Recherche Scientifique ( CNRS ) , Grenoble , France . 
+ 5Department of Biosciences , Università degli Studi di Milano , Milan , Italy . 
+ * These authors contributed equally to this work . 
+ Correspondence and requests for materials should be addressed to S.L. ( email : stephan.lacour@ujf-grenoble.fr ) or P.L. ( email : paolo.landini@unimi.it ) . 
+ For its important role in response to cellular stresses , σS is considered the master regulator of the so-called `` general stress response '' and , consistently , it is induced in response to any stressful event leading to reduction in specific growth rate4 ,5 . 
+ Interestingly , σS and σ70 appear to recognize very similar promoter sequences6 . 
+ Consequently , several promoters are recognized with similar efficiency by both EσS and Eσ70 in vitro7 , and their preferential recognition by either form of RNA polymerase in vivo is mediated by accessory regulatory proteins6 . 
+ Selective promoter recognition by either σ70 or σS can be achieved by deviations from a common consensus sequence6 ,8 which confer specificity for either σ factor : for instance the presence of a C nucleotide ( − 13C ) immediately upstream of the − 10 promoter element is a known determinant for σS binding and it is a common feature in σS-dependent promoters9 . 
+ In a previous work , we set out to determine which promoters are preferentially bound in vitro by either Eσ70 or EσS by run-off transcription microarray ( ROMA ) ; we confirmed the importance of sequence elements involved in promoter recognition by σS , such as C residues at positions − 13 and − 12 , and suggested that an A/T-rich discriminator region would favour transcription initiation by EσS in vitro10 . 
+ In this work , we used Chromatin-Immunoprecipitation-sequencing ( ChIP-seq ) to identify promoters bound by EσS at early stationary phase , i.e. , at a moment in which σS accumulates inside the bacterial cell . 
+ Our results led to identification of novel σS-dependent genes , and provided insight on regulation of non-coding RNAs by σS . 
+ We could also show that a significant subset of EσS-bound promoters controls genes whose expression is σS-independent , suggesting considerable overlap in promoter recognition by different σ factors . 
+ Results
+ MG1655-rpoSHis6 construction and σS - His6 immunoprecipitation . 
+ Since no anti-σS antibodies suitable for immunoprecipitation were available at the time of this study , we decided to utilize anti-6xHis-tag antibodies targeting a histidine-tagged σS protein ( σS-His6 ) . 
+ In order to study promoter binding by σS-His6 without perturbing σS physiological levels or rpoS gene expression , we constructed a strain carrying a chromosomal rpoSHis6 allele , i.e. , an otherwise wild type rpoS allele with 6 codons for histidine at its 3 ` end , as described in Materials and Methods . 
+ We verified the effects of the rpoS allele replacement on specific growth rate ( Fig. 1A ) and checked the relative amounts of both the wild type and the σS - His6 proteins at the onset of stationary phase by Western blot , using an anti-σS antibody ( Fig. 1A , inset ) . 
+ A Western blot with the anti-6xHis antibody confirmed that the MG1655-rpoSHis6 strain did indeed produce a 6xHis-tagged σS protein ( data not shown ) . 
+ No differences were detected in either specific growth rate or intracellular σS amounts in the two strains ( Fig. 1A ) . 
+ Western blot analysis clearly showed that , as expected , the amount of σS ( or σS-His6 ) increased significantly at the end of the exponential phase ( compare points 1 and 2 ) : at this point , bacterial cells were growing at a specific growth rate of 0.32 ( ± 0.02 ) h − 1 . 
+ Cells were collected at the growth stage corresponding to point 2 in Fig. 1A in all subsequent experiments . 
+ To verify whether the C-terminal histidine tag might affect σS activity in vivo , we tested the activity of HPII catalase , encoded by the rpoS-dependent katE gene and a marker for rpoS functionality11 . 
+ No statistically significant difference in HPII specific activity was detected between MG1655 and MG1655-rpoSHis6 , while , in contrast , HPII catalase specific activity was almost totally abolished in an rpoS null mutant strain , as expected ( Fig. 1B ) . 
+ These results indicate that introduction of the 6xHis-tag in the σS protein does not affect its abundance , physiological regulation and activity . 
+ Thus , we performed protein-DNA co-immunoprecipitation experiments in the MG1655-rpoSHis6 strain , using anti-6xHis antibodies . 
+ As a quality control of the co-immunoprecipitation experiment , we verified the enrichment of a known binding site for EσS in the immunoprecipitated samples compared to sonicated DNA ( Input sample ) . 
+ To this purpose , we performed qRT-PCR experiments comparing the relative abundance of the promoter region of the σS-dependent dps gene ( Pdps ) to coding sequences within the rpoB and the yeeJ genes . 
+ Both the Pdps/rpoB and Pdps/yeeJ ratios approached 1 in the Input sample , while being 10-fold higher in the σS-His6 immunoprecipitation sample ( σs-IP ; Fig. 1C ) , thus suggesting strong enrichment in EσS binding sites by the immunoprecipitation procedure . 
+ Chromatin immunoprecipitation-sequencing ( ChIP-seq ) . 
+ Two replicates of the Input sample ( MG1655-rpoSHis6 chromosomal DNA ) and of the σS-IP sample ( σS-His6 immunoprecipitated DNA ) were used to prepare sequencing libraries . 
+ The libraries were sequenced into 4 separate lanes of the same GAIIx run . 
+ We obtained more than 50 million mapping reads for both the input samples ( corresponding to a sequencing depth of 543-fold the E. coli genome ) ; for the first and the second IP samples , more than 26 and 32 million mapping reads were obtained , respectively . 
+ Identification of the DNA regions more represented in the σs-IP sample , corresponding to potential binding sites for EσS , was carried out using the CisGenome software12 , which yielded 78 `` peaks '' , i.e. , regions of the genome significantly enriched ( pval ≤ 0.01 ) in the σs-IP sample as compared to the Input sample . 
+ Almost all peaks detected ( 72/78 ) corresponded to DNA regions ≤ 400 bp-long or slightly larger , consistent with the DNA fragment sizes obtained after DNA sonication ( see Materials and Methods , `` σS-His6 immunoprecipitation '' ) . 
+ Three enriched regions were slightly larger in size ( 500-700 bp ) , while only three regions had sizes larger than 1 kbp ( 1049 , 1199 and 3149 bp , respectively ) . 
+ The last one encompassed a DNA region including five different ORFs and several non-coding and regulatory elements , making it impossible to identify a putative binding site for EσS ; thus , this DNA fragment was excluded from further analysis and is listed , together with intragenic peaks , in Supplementary Table S2 ( see below ) . 
+ On the contrary , the two peaks just over 1 kbp overlapped a single known promoter region , and were thus included in the EσS binding site analysis shown in Table 1 . 
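The size-based triage of peaks described above can be illustrated with a minimal Python sketch. The bin boundaries (below 500 bp, 500 to 1000 bp, above 1000 bp) and the function name `classify_peak` are assumptions made for illustration; only the three widths above 1 kbp are taken from the text, and the remaining widths are hypothetical placeholders, not data from the study.

```python
from collections import Counter


def classify_peak(width_bp: int) -> str:
    """Assign a peak to a size group (bin boundaries are illustrative assumptions)."""
    if width_bp <= 0:
        raise ValueError("peak width must be positive")
    if width_bp > 1000:
        return "large"
    if width_bp >= 500:
        return "intermediate"
    return "fragment-sized"


# Widths reported in the text for the three peaks larger than 1 kbp.
reported_large_peaks = [1049, 1199, 3149]
# Hypothetical widths standing in for the other peaks (not from the study).
hypothetical_other_peaks = [280, 350, 400, 520, 690]

counts = Counter(
    classify_peak(width) for width in reported_large_peaks + hypothetical_other_peaks
)
print(dict(counts))
```

A grouping of this kind only flags peaks for manual inspection; whether a large peak is kept, as for the two peaks just over 1 kbp, still depends on whether it overlaps a single known promoter region.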
+ The visualization through Integrative Genomics Viewer ( IGV ) of representative σS binding peaks obtained from the CisGenome analysis is shown in Fig. 2 : significantly enriched genomic regions ( i.e. , peaks ) are reported for the known rpoS-dependent genes osmB , dps , osmE and csrA ( Fig. 2A ) and for loci associated to the small RNAs sibC/ibsC , ryeA/ryeB , and omrA/omrB ( Fig. 2B ; see also section `` Regulation of non-coding RNA by EσS '' ) . 
+ The large majority ( 63 out of 78 ) of the σS-IP peaks was located immediately upstream of coding sequences or known regulatory RNAs , consistent with σS binding to promoter regions . 
+ Out of these 63 peaks , 61 were located in intergenic regions , while two peaks lie within the stfR and wbbH ORFs , but upstream , respectively , of the tfaS and wbbI genes , suggesting that they might define internal promoters within operons . 
+ The remaining peaks fell into intragenic regions at considerable distance from other ORFs ( listed in Supplementary Table S2 ) . 
+ Although it is possible that some of these peaks might define bona fide EσS binding sites ( e.g. , promoters for yet unknown antisense RNAs ) , they were not considered for further characterization within this study . 
+ However , even assuming that all the intragenic peaks are artefacts of ChIP-seq , the resulting percentage of false positives ( 19 % ) would still be lower than what has been reported for similar studies13 . 
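The 19 % figure follows from the peak counts given above: 78 peaks in total, of which 63 lie upstream of coding sequences or regulatory RNAs. A minimal Python sketch of that arithmetic, with the function name `false_positive_fraction` chosen here purely for illustration:

```python
def false_positive_fraction(total_peaks: int, promoter_peaks: int) -> float:
    """Upper bound on false positives if every non-promoter peak were an artefact."""
    if total_peaks <= 0 or not 0 <= promoter_peaks <= total_peaks:
        raise ValueError("expected 0 <= promoter_peaks <= total_peaks and total_peaks > 0")
    return (total_peaks - promoter_peaks) / total_peaks


# Counts taken from the text: 78 peaks in total, 63 in promoter-like positions.
fraction = false_positive_fraction(78, 63)
print(f"{fraction:.1%}")  # 15 of 78 peaks, i.e. 19.2%
```

This is a worst-case estimate, since some of the 15 remaining peaks may be genuine binding sites.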
+ 50 out of the 63 peaks corresponding to known or putative promoter regions could unequivocally be attributed to one specific gene , based on the DNA sequence covered by the peak , the direction of transcription of the neighbouring genes , the distance to the nearest ORFs and , when available , the presence of an experimentally determined transcription start site within the boundaries of the peak . 
+ Of the 50 genes unequivocally identified , 27 had been shown to be at least partially rpoS-dependent in previous reports , as listed in Table 1 . 
+ In contrast , 13 peaks , listed in Table 2 , lie in intergenic regions between divergently transcribed genes or operons and could not be assigned to a specific gene . 
+ However , we often found that one of the two divergent genes ( or even both , as for the dsrB-yodD intergenic region , Table 2 ) had previously been described as rpoS-dependent , thus suggesting that EσS binding was due to the presence of an rpoS-dependent promoter within the intergenic region . 
+ As an example , we assigned the putative EσS binding site in the osmE-nadE intergenic region to osmE , since its promoter is σS-dependent14 -- 16 ( Fig. 2 and Table 2 ) . 
+ Altogether , the peaks identified in the ChIP-seq experiment overlapped with the promoters of 36 genes that had been shown to be at least partially rpoS-dependent ( highlighted in Tables 1 and 2 ) . 
+ Stress-related genes defined the most represented functional category in our ChIP-seq analysis ( see Tables 1 -- 2 ) , in agreement with the role of σS as master regulator of the general stress response . 
+ Interestingly , binding sites for EσS were also found upstream of several genes involved in cell envelope structure ( erfK , lpp , ynhG ) and lipopolysaccharide ( LPS ) biogenesis ( lpxC , wbbH , wbbI ) , suggesting that EσS might be important for the expression of cell surface-related genes in response to growth cessation . 
+ The majority of the intergenic regions not linked to rpoS-dependent genes included known or putative promoters recognized by Eσ70 , in agreement with previous results indicating extensive cross-recognition between EσS and Eσ70 regulons7 ,9 . 
+ Interestingly , however , several promoters are also recognized by other alternative σ factors , namely σE ( ytfJ and lpxP ) and σH ( hepA , sdaA , raiA and rpmE ) ( Tables 1 -- 2 ) . 
+ In vivo expression of genes identified by ChIP-seq analysis . 
+ The results of our ChIP-seq experiments seem to indicate that a large percentage of EσS-binding sites are associated with promoters directing transcription of rpoS-independent genes . 
+ Alternatively , regulation of these genes by σS might have been overlooked in previous investigations of the rpoS regulon , mostly carried out as whole genome transcription analysis comparing an rpoS mutant to its parental strain14 -- 19 . 
+ In order to elucidate the functional role of the EσS-binding sites , we measured relative expression of 10 genes whose promoters , according to our ChIP-seq results , are recognized by EσS , by performing qRT-PCR experiments comparing E. coli MG1655 to its otherwise isogenic rpoS mutant . 
+ As control genes in the qRT-PCR experiment , we chose 4 genes previously proposed to be rpoS-dependent : dps , ycgB , rssA and bsmA15 ,16,20 . 
+ The remaining 6 genes , never previously shown to be rpoS-dependent , were selected based either on their function or on promoter features : lpp encodes Braun lipoprotein , which bridges the outer membrane to peptidoglycan and is extremely abundant in E. coli21 ; ssrA is a transfer-messenger RNA ( tmRNA ) - encoding gene ; uxaB is involved in galacturonate metabolism ; ybiI is a gene of unknown function whose promoter had been indicated as putative EσS-dependent through bioinformatics prediction22 ; ydbK is an oxidative stress-related gene23 ; ygjR , like ybiI , is an unknown function gene with a known transcription start site24 , whose putative − 10 region shows some features typical of EσS-dependent promoters , such as the − 13C . 
+ Results of the qRT-PCR experiments ( Fig. 3 ) could demonstrate rpoS-dependent gene expression for dps , ycgB , ybiI and ydbK , suggesting that the latter two are yet unidentified members of the rpoS regulon . 
+ In contrast , the expression of the remaining genes was not affected by the lack of a functional rpoS gene , at least in the conditions tested . 
+ To further investigate whether these genes showed any kind of dependence on σS , we tested their expression levels in an rpoS-overexpressing strain ( MG1655/pBADrpoS ) grown to early stationary phase in LB medium supplemented with 0.1 % arabinose . 
+ Although intracellular σS amounts were almost 10-fold higher in the pBADrpoS-bearing strains compared to MG1655 , no significant changes in relative expression levels were detected for any of the genes tested ( data not shown ) . 
+ In vitro EσS-promoter interactions . 
+ Results of the ChIP-seq and qRT-PCR experiments failed to show strong correlation between EσS promoter binding and EσS-dependent transcription , even for genes previously described as rpoS-dependent , such as rssA and bsmA ( Fig. 3 ) . 
+ In order to confirm ChIP-seq results , we studied EσS-promoter interactions in vitro , by comparing EσS and Eσ70 for their ability to bind and to promote open complex formation at a subset of the promoters studied in qRT-PCR experiments . 
+ We selected the promoter regions of the two newly identified rpoS-dependent genes , ybiI and ydbK , together with the promoters of the known rpoS-dependent dps and bsmA genes , which , however , showed different behaviour in our qRT-PCR experiments . 
+ Firstly , we performed GMSA with either EσS or Eσ70 , in the presence of heparin to select for open complexes , on regulatory DNA fragments ( extending from 250 bp upstream to 30 bp downstream of the start codon ) . 
+ EσS was clearly more efficient than Eσ70 in promoting open complex formation at the ybiI , ydbK and bsmA promoters ( compare amounts of unbound DNA probes , Fig. 4A ) , while both forms of RNA polymerase showed similar proficiency in open complex formation at the dps promoter , despite its strong EσS-dependence in vivo ( Fig. 3 ; 8,16 ) . 
+ As a negative control for binding by EσS , we performed GMSA experiments on the strictly Eσ70-dependent crl promoter , which clearly showed preferential binding by Eσ70 ( Supplementary Fig. S1 ) . 
+ To further investigate promoter DNA-RNA polymerase interaction , and to map the exact location of the -10 promoter elements for ybiI , ydbK and bsmA , we performed KMnO4 reactivity assays ( Fig. 4B ) . 
+ Treatment with permanganate oxidizes thymidine residues in single-stranded DNA , allowing us to identify precisely the location of open complexes . 
+ As expected , no open complex formation by EσS was detected at the Eσ70-dependent crl promoter ( Supplementary Fig. 1 ) . 
+ In contrast , open complex formation at the bsmA promoter was only observed in the presence of EσS , consistent with GMSA results and confirming specific recognition by EσS at this promoter . 
+ Similarly , at the ybiI promoter , binding by EσS resulted in much stronger reactivity than Eσ70 , indicating more efficient open complex formation . 
+ A more complex picture emerged from KMnO4 experiments at the ydbK promoter , which showed that both EσS and Eσ70 can recognize a promoter located , in agreement with bioinformatics predictions22 , at ca. 70 nucleotides upstream of the ydbK ORF . 
+ However , subtle changes can be observed in the pattern of KMnO4 reactivity induced by the two RNA polymerase-promoter complexes , with binding by EσS resulting in higher reactivity in the T residues at positions − 4 to − 2 ( marked by an arrow in Fig. 4B ) . 
+ Taken together with GMSA results , this observation suggests that , at the ydbK promoter , EσS might trigger formation of an open complex more resistant to heparin challenge and possibly more proficient in transcription initiation . 
+ Finally , at the dps promoter , both EσS and Eσ70 induced open complex formation with equal efficiency , indicating lack of preferential recognition by either form of RNA polymerase in vitro . 
+ Regulation of non-coding RNAs by EσS . 
+ Results of ChIP-seq analysis indicate that three EσS binding sites are positioned in the proximity of genes encoding regulatory RNAs . 
+ A putative EσS binding site was identified upstream of the 88 nt-long regulatory RNA omrA , which controls expression of genes involved in flagellar motility , iron uptake , adhesion factors and various outer membrane proteins25 . 
+ The omrA gene lies next to omrB , which codes for a highly similar small RNA and also regulates some of the targets for omrA25 ,26 . 
+ The other two EσS binding sites were found in proximity of two complex loci : the ryeA/ryeB locus , which includes two small RNAs overlapping in antisense directions27 , and the sibC/ibsC locus , in which a non coding RNA ( sibC ) overlaps a small ORF , ibsC , reading in the opposite direction , and encoding a toxic peptide28 . 
+ The location and extension of the three ChIP-seq peaks suggest that EσS might bind the promoter regions of omrA ( but not omrB ) , and of ryeB and sibC , rather than ryeA and ibsC ( Fig. 2B ) , consistent with recent observations that omrA and ryeB are rpoS-dependent in Salmonella enterica29 ,30 . 
+ To confirm this result , we performed northern blots comparing small RNA levels in the wild type versus the rpoS mutant strain of E. coli ( Fig. 5 ) . 
+ In addition to standard growth conditions ( LB medium at 37 °C ) , we also carried out northern blot experiments at 28 °C , since low growth temperature favors σS accumulation and positively affects stability of some small RNA31 . 
+ Due to difficulties in obtaining a clean result with a probe for RyeB , we measured the relative amounts of RyeA , which upon pairing with RyeB , is degraded in an RNaseIII-dependent fashion and shows therefore transcript levels inversely proportional to ryeB27 ,29 . 
+ Inactivation of the rpoS gene almost abolished omrA transcription , while strongly increasing RyeA transcript levels ( Fig. 5A ) , consistent with rpoS-dependence of transcription of the omrA and ryeB genes . 
+ Interestingly , the OmrA and RyeA transcripts also displayed opposite temperature-dependence , with OmrA being more expressed at 28 °C and RyeA at 37 °C . 
+ As further confirmation that rpoS-dependent regulation specifically targets omrA , but not omrB , we performed gfp reporter assays . 
+ Reporter genes experiments clearly showed very different effects of rpoS inactivation on transcription of the two genes , with omrA showing almost complete rpoS-dependence , while omrB expression was actually slightly increased in the rpoS mutant background ( Fig. 5B ) . 
+ Interestingly , the first nucleotide of the − 10 region of omrA is a − 12C ( Supplementary Table S3 ) , a feature favouring specific promoter opening by EσS but not by Eσ70 32 , while at the omrB promoter , such a selective determinant is replaced by a canonical − 12T for Eσ70 and might explain lack of preferential binding by EσS . 
+ Substitution of the − 12C nucleotide by a − 12T in the omrA − 10 promoter element increases promoter strength by more than 10-fold and almost completely overcomes its dependence on rpoS ( Fig. 5C ) , suggesting that the − 12C acts as a determinant for EσS specificity in the omrA promoter . 
+ A more complex picture emerged from analysis of the SibC transcript , which , like RyeA , showed higher expression at 37 °C than at 28 °C . 
+ At the latter temperature , SibC was transcribed in an rpoS-dependent manner ; however , the effect of the rpoS mutation was reversed at 37 °C , possibly suggesting an additional regulatory mechanism affecting SibC expression at this temperature ( Fig. 5A ) . 
+ The complexity of SibC regulation is also suggested by the presence of two transcripts , either due to the presence of multiple promoters or to RNA processing as already described28 . 
+ Sequence analysis of σS-bound promoters . 
+ In order to assess the importance of σS-specific promoter determinants for binding by σS , we analyzed the sequences of the experimentally determined promoters controlling genes identified in the ChIP-seq experiments ( 30 promoters , listed in Supplementary Table S3 ) . 
+ The promoters were divided in two subsets : the ones directing transcription of genes reported to show some level of dependence on σS ( 21 promoters ) and those controlling genes whose expression is not affected by lack of a functional rpoS gene ( 9 promoters ) . 
+ In good agreement with the previously proposed consensus for σS 4,8,10,16 , − 10 region alignment of σS-dependent genes ( from − 20 to +1 , Fig. 6 ) suggests that their consensus sequence in the − 17 to − 6 region would be TNTGCYAAACTT , where N is any nucleotide , Y is a pyrimidine and W is either A or T ( Fig. 6 ) ; in addition , promoters of σS-dependent genes are characterized by an A/T-rich discriminator region . 
+ Promoters of σS-independent genes lack conservation of the C residues at positions − 13 , − 12 , and − 8 , show a reduced frequency of a T at position − 6 , and display a discriminator region richer in G/C ( Fig. 6 ) . 
+ Alignment of the − 35 regions of σS-bound promoters ( listed in Supplementary Table S4 ) highlighted some conservation of the σ70 consensus sequence , TTGACA , in the promoters of genes whose expression is independent of σS ; in contrast , in the promoters of σS-dependent genes , the − 35 region showed a weakly conserved sequence , GCTGACAAA , with some resemblance to the − 35 promoter element for σ70 ( Supplementary Fig. S2 ) . 
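A consensus written with degenerate nucleotide codes, such as TNTGCYAAACTT, can be compared with a candidate − 17 to − 6 region by counting positions that fall outside the allowed bases. The sketch below is only an illustration of that idea in Python: the function name `count_mismatches` and both candidate sequences are hypothetical, and the code table covers only the IUPAC codes mentioned in the text (N, Y, W) plus the four plain bases.

```python
# Allowed bases for each code used here (subset of the IUPAC nucleotide codes).
IUPAC_BASES = {
    "A": {"A"},
    "C": {"C"},
    "G": {"G"},
    "T": {"T"},
    "N": {"A", "C", "G", "T"},  # any nucleotide
    "Y": {"C", "T"},            # pyrimidine
    "W": {"A", "T"},            # A or T
}


def count_mismatches(consensus: str, site: str) -> int:
    """Count positions where `site` is not allowed by the degenerate `consensus`."""
    if len(consensus) != len(site):
        raise ValueError("consensus and site must have the same length")
    return sum(
        base not in IUPAC_BASES[code]
        for code, base in zip(consensus.upper(), site.upper())
    )


CONSENSUS = "TNTGCYAAACTT"  # consensus for the -17 to -6 region quoted in the text

# Hypothetical candidate regions, for illustration only.
print(count_mismatches(CONSENSUS, "TGTGCTAAACTT"))  # 0: N and Y positions are satisfied
print(count_mismatches(CONSENSUS, "TGTGTTAAACTA"))  # 2: C-to-T and final T-to-A differ
```

A mismatch count treats every position equally; a position weight matrix built from the aligned promoters would weight conserved positions such as the − 13C more heavily, but that requires the alignment itself.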
+ It remains to be understood whether this sequence might play any role in σS -- promoter interactions . 
+ Discussion
+ In this work , we used a ChIP-seq approach in order to identify promoters bound by EσS during the early stationary phase , in which σS concentrations surge in the bacterial cell ( Fig. 1A ) . 
+ The experimental conditions used in this work were chosen in order to identify genes directly regulated by σS that are induced in response to transition into stationary-phase . 
+ Indeed , we only detected 63 promoter regions bound by EσS ( Tables 1 -- 2 ) ; this number only represents a fraction of the σS-bound promoters previously identified either by microarray or by ChIP-on-chip analysis14 ,19,33 , which , however , were performed under a variety of different growth conditions and include genes subject to complex regulation and only indirectly regulated by σS . 
+ Out of the 63 promoters identified in our study , 38 ( 60 % ) control transcription of genes regulated by the σS-encoding rpoS gene ( Tables 1 -- 2 and references within ) . 
+ Two of these , ybiI and ydbK , had not yet been identified as part of the rpoS regulon , and we confirmed their preferential recognition by EσS via in vitro binding and open complex formation experiments ( Fig. 4 ) . 
+ However , a large percentage of σS-bound promoters control genes whose expression is not affected by the presence of this factor ( see Tables 1 -- 2 , Fig. 3 ) , suggesting that these promoters are recognized with similar efficiency by σS and other σ factors , mostly σ70 . 
+ This result is consistent with the notion that σS does not only serve to promote expression of its own regulon , but it can also contribute to transcription of constitutively expressed genes . 
+ Promoter sequence comparison between bona fide σS-dependent genes and those not showing altered expression in an rpoS mutant highlighted the importance of the promoter elements associated with selective recognition by σS ( Fig. 6 ) . 
+ At least some σS-specific determinants might be more important for preventing recognition by σ70 in vivo rather than increasing binding affinity or promoter opening by σS , such as the presence of a C rather than a T as first nucleotide of the − 10 hexamer , as is the case at the omrA promoter ( Fig. 5C ) . 
+ Although the mechanisms of regulation by σS appear to be well conserved in Enterobacteria , some of the σS-independent genes found in our ChIP-seq analysis ( e.g. , tomB , sdaA , bsmA ) appear to be rpoS-dependent in Salmonella Typhimurium30 , possibly suggesting more efficient promoter recognition by EσS in this bacterium . 
+ Promoter cross-recognition with σS also seems to extend to the alternative factors σE and σH ( Tables 1 -- 2 ) , in line with previous results showing similar functions of the rpoE and rpoS regulons and some promoter overlap between the two σ factors in vitro10 ,34 . 
+ Indeed , our results confirm a strong interplay between σS and σH , as the rpoH promoter is directly recognized by EσS ( Table 1 ) , in agreement with its rpoS-dependent expression35 . 
+ Our results would be consistent with recent reports showing co-regulation of the rpoE , rpoH and rpoS regulons in response to osmotic stress in enteropathogenic E. coli O157 : H736 , and an extensive analysis of the σ factor network in E. coli , showing extensive overlap in promoter recognition by alternative σ 's 33 . 
+ At least 10 of the rpoS-dependent genes identified in the ChIP-seq experiments encode small proteins involved in resistance to oxidative stress ( bsmA , dps , uspB , yaiA , ychH , ydbK , ygcG , yggE , yobF and yodD : Tables 1 -- 2 ) , while two more are linked to osmotic stress ( osmB and osmE ) . 
+ Our results would support the notion that , rather than being part of an adaptive response triggered by exposure to specific environmental stresses , the rpoS gene activates , in response to reduction in growth rate , a variety of stress-related genes , thus allowing the bacterial cells to `` brace themselves '' for any stressful conditions that might arise . 
+ However , promoter binding by EσS does not necessarily translate into increased transcription levels for EσS-dependent genes , suggesting that , upon binding , EσS might be unable to initiate transcription efficiently at some promoters . 
+ For the bsmA promoter , this hypothesis would fit with the results of in vitro promoter interaction studies ( Fig. 4 ) and with our previous results , showing EσS-dependent transcription of the bsmA gene in vitro10 , but not in the bacterial cell . 
+ Since bsmA is induced in biofilm growth37 , it is possible that its transcription is repressed in planktonic cells , and triggered during biofilm growth . 
+ Thus , our results suggest that EσS might be poised at various promoters waiting for additional signals ( e.g. , leading to removal of a repressor protein ) in order to form a complex proficient in transcription initiation . 
+ While stress responses are well known examples of gene functions associated with the rpoS regulon , our results suggest direct involvement of σS in the expression of genes involved in biogenesis and structure of the LPS and outer membrane proteins ( Tables 1 -- 2 ) . 
+ Indeed , changes in cell surface structure and composition are known to take place in stationary phase38 . 
+ According to our ChIP-seq results , in addition to LPS genes , EσS also binds to the promoter of lpp , encoding Lpp or Braun lipoprotein , which links the outer membrane to peptidoglycan and is the most abundant outer membrane-associated lipoprotein in E. coli21 . 
+ Although lpp gene expression does not depend on the rpoS gene ( Fig. 3 ) , a connection of the rpoS gene with the function of Braun lipoprotein is further suggested by the identification of two more binding sites for EσS upstream of the erfK and ynhG genes , encoding two of the four alternative transpeptidases that crosslink Lpp to peptidoglycan . 
+ Both the erfK and ynhG genes had already been described as rpoS-dependent15 ,16 . 
+ Thus , it appears that , upon entry in the stationary phase of growth , rpoS might be required for maintenance of Lpp-transpeptidase activity in the periplasmic space . 
+ Finally , our results point to a direct role of EσS in the finely tuned regulation of non-coding RNAs : for instance , EσS promotes transcription of omrA , but not of the flanking gene , omrB ( Fig. 5 ) . 
+ Both genes encode very similar non-coding RNAs which target the same genes . 
+ It appears possible that different dependence on EσS by the two promoters might have evolved so as to allow differential expression of the OmrA and OmrB non-coding RNAs in response to different signals , with OmrA induced as part of the rpoS regulon . 
+ The results of mutagenesis at the -12 position of the omrA promoter strongly reinforce the notion that the -12 C nucleotide can favourably bias transcription initiation by EσS at several promoters39 . 
+ Since both the OmrA and OmrB RNAs affect translation of several outer membrane proteins and extracellular structures such as curli and flagella40 , their selective regulation might mediate the impact of EσS on these structures , contributing to a general reorganization of the bacterial cell surface in response to stationary phase . 
+ Methods
+ Strain construction . 
+ The E. coli MG1655 His6 : : rpoS strain ( from now on MG1655-rpoSHis6 ) , carrying an rpoS gene in which a 6-histidine tag is added to an otherwise wild type allele , was constructed following the genetic procedures described for allele replacement41 ,42 . 
+ Linear DNA fragments containing a kanamycin resistance gene and the ccdB gene under the control of a rhamnose inducible promoter were amplified by PCR from the pKD45 plasmid . 
+ The first 45 nucleotides of either primer used for amplification ( primers rpoS_OF and rpoS_OR , Supplementary Table S1 ) correspond to the DNA regions immediately upstream and downstream of rpoS , targeting the gene for mutagenesis . 
+ After PCR amplification , the resulting DNA fragment including the kanR-ccdB cassette was used to transform the DY330 strain42 ; the rpoS knockout was then P1-transduced into MG1655 , selecting for kanamycin resistance . 
+ The ΔrpoS : : kanR-ccdB cassette was then replaced by an otherwise wild type rpoS sequence to which an additional sequence coding for a 6-histidine tag ( 6xHis-tag ) had been added by PCR amplification , using the rpoS_IF and rpoS_IR primers ( Supplementary Table S1 ) . 
+ To this aim , DY330 cells carrying the rpoS knockout were transformed by electroporation with a linear DNA fragment encoding for the rpoSHis6 gene , carrying the His-tag at the 3 ` end . 
+ Transformant selection was performed on M9 minimal medium agar plates containing 0.2 % rhamnose and 0.01 % biotin : due to the toxicity of the ccdB gene in the presence of rhamnose , only the cells in which an allele replacement has taken place are able to grow on this medium . 
+ The rpoSHis6 allele was P1-transduced into MG1655 carrying the rpoS : : kan-ccdB knockout , again selecting for loss of the ccdB gene by plating on M9 minimal medium agar plates containing 0.2 % rhamnose and 0.01 % biotin . 
+ The stability and functionality of the RpoS protein was verified by Western blot and measurement of HPII catalase activity . 
+ σS-His6 immunoprecipitation . 
+ For immunoprecipitation of the σS protein carrying a 6xHis-tag at its C-terminal end ( σS-His6 ) , the MG1655-rpoSHis6 strain was grown in 50 ml LB medium at 37 °C with vigorous shaking to an OD600 = 3.0 . 
+ In order to enrich the amount of RNA polymerase bound to promoters , cells were treated with rifampicin , which inhibits transcription initiation blocking RNA polymerase at the transcription start site , following the protocol described43 . 
+ To obtain protein-DNA crosslinking , formaldehyde was added at a final concentration of 1 % for 5 minutes at room temperature . 
+ The crosslinking reaction was stopped by addition of 0.25 M glycine followed by 20 minute incubation at 4 °C with gentle shaking . 
+ The cells were washed , resuspended and treated with 100 μg / ml lysozyme for 30 minutes at 37 °C . 
+ The lysate was sonicated in order to fragment chromosomal DNA to a size between 100 and 400 bp , and treated with RNaseI ( 100 μg / ml ) for 15 minutes at 37 °C . 
+ Cell debris was removed by centrifugation ( 10 minutes at 10000 x g ) . 
+ A 250 μl-fraction of the sample was treated with 100 μg / ml Proteinase K and 5 mM CaCl2 for two hours at 42 °C , and then at 65 °C overnight , to remove proteins non specifically bound to DNA . 
+ DNA was recovered by phenol-chloroform extraction and analyzed on a 2 % agarose gel to verify DNA fragmentation . 
+ The sample was mixed at a 5:1 ( vol : vol ) ratio with protein A/G agarose slurry and incubated for 2 h at 4 °C on a rotating wheel to clear the sample and reduce unspecific binding . 
+ Subsequently , the agarose beads were separated from the lysate by centrifugation at 10000Xg . 
+ The cleared lysate was then incubated at 4 °C overnight on a rotating wheel with 5 μl of antibody ( rabbit polyclonal to 6XHis-tag , ChIP grade , # 9108 , Abcam , Cambridge , UK ) . 
+ The rest of the procedure was carried out as previously described44 . 
+ DNA from untreated MG1655-rpoSHis6 was sonicated and 200 μl were taken to be used as a control in sequencing reactions ( Input = non-immunoprecipitated DNA ) . 
+ The Input and immunoprecipitated DNA samples were analyzed with the Agilent Bioanalyzer using the High Sensitivity DNA kit ( Agilent Technologies ) . 
+ Five IP samples were pooled on the same DNA purification column ( minElute , QIAGEN ) to reach 5 ng of total DNA , which is the minimum amount for sequencing library preparation . 
+ Two pools of IP DNAs were produced . 
+ Prior to sequencing libraries construction , quantitative Real Time reverse transcriptase-PCR ( qRT-PCR ) was carried out to assess the enrichment of the promoter region of the rpoS-dependent dps gene in the immunoprecipitated samples in comparison to the Input sample . 
+ The sequences of the primers used for qRT-PCR are listed in Supplementary Table S1 . 
+ Library preparation and sequencing procedure . 
+ Illumina libraries were prepared either from 5 ng of each of the two pools of immunoprecipitated-DNA ( RpoS-IP ) or from 5 ng of the two control DNA ( Input ) following the Illumina TruSeq ChIP-seq DNA sample preparation kit ; then each library was sequenced in a lane of a single strand 51 bp Illumina run on a GAIIx sequencer . 
+ Raw data are publicly available at Sequence Reads Archive under accession number BioProject SRP041323 ; BioSample SRS595203 ; Experiment SRX523029 ; Run1 SRR1265068 ; Run2 SRR1271103 . 
+ Statistical and bioinformatic data analysis . 
+ Raw reads were mapped against the Escherichia coli MG1655 genome using Bowtie45 with zero mismatches . 
+ The resulting BAM files were processed using SAMtools46 and BEDTools47 . 
+ The quality of each sequenced sample was checked using cross-correlation analysis implemented in spp R package48 . 
+ ChIP-seq peak calling was performed using CisGenome12 by imposing default parameters . 
+ Input data ( control DNA ) was used to model the background noise . 
+ Determination of rpoS-dependent gene expression in vivo . 
+ For all gene expression experiments , bacterial strains were grown in LB medium to OD600nm = 3.0 . 
+ For qRT-PCR , RNA was extracted and experiments performed as previously described49 , using 16S RNA as reference . 
+ Primers used in qRT-PCR experiments are listed in Supplementary Table S1 . 
+ For northern blots , total RNA was extracted using a hot-phenol procedure , so as to retain small RNA molecules . 
+ 5 to 20 μg of RNA were separated onto a 6 % denaturing acrylamide gel prior to their electro-transfer onto a nylon membrane . 
+ As gene specific probes , 5 ` - Biotinylated oligomers ( Supplementary Table S1 ) were used at 1 nM in combination with 20 pM of the 5S RNA probe as internal control . 
+ Saturation and hybridization were performed with the ULTRAhyb ® - Oligo buffer ( Ambion ) at 45 °C and signals were detected using a Chemi nucleic acid detection module ( Thermo Scientific Pierce ) . 
+ GFP reporter assays were performed as previously described50 . 
+ RNA polymerase in vitro assays . 
+ RNA polymerase reconstitution , gel mobility shift and KMnO4 reactivity assays were performed as previously described32 . 
+ 32P-labeled DNA was produced by PCR after 5 ` - phosphorylation of the primer complementary to the coding strand ( see Supplementary Table S1 ) in order to generate linear DNA pieces of about 250 bp , typically encompassing the first 10 codons of the gene and 220 bp of the upstream DNA , including the promoter region . 
+ For gel mobility shift assays ( GMSA ) , complexes between reconstituted RNA polymerase ( 18 to 150 nM ) and DNA ( 1 nM ) were allowed to form for 15 min at 37 °C in K - glu100 buffer ( 40 mM HEPES , pH 8.0 , 10 mM magnesium chloride , 100 mM potassium glutamate , 4 mM dithiothreitol ( DTT ) , and 500 μg / ml bovine serum albumin ) , in a final reaction volume of 10 μl . 
+ The reaction mixture was loaded onto a 5 % native polyacrylamide gel after addition of 2.5 μl of heparin-supplemented loading buffer32 and gel electrophoresis was carried out in 0.5 xTBE buffer at 120 V. Experiments were performed at least twice and gave very similar results . 
+ For KMnO4 reactivity assays , 50 nM of either form of RNA polymerase ( EσS and Eσ70 ) were incubated with about 3 nM of labeled promoter DNA for 20 min at 37 °C in K-glu100 buffer without DTT for complex formation . 
+ KMnO4 was added to a final concentration of 10 mM and the reaction was stopped after 30 seconds by adding 2 mM DTT . 
+ Samples were phenol-extracted and precipitated , treated with 1 mM piperidine , resuspended in pure formamide blue before being loaded onto a 7 % polyacrylamide denaturing gel . 
+ A DNA ladder was generated for each labeled DNA fragment by partial G/A sequencing using formic acid and piperidine . 
+ Other methods . 
+ Determination of HPII catalase activity and Western blot experiments were carried out as previously described51 ,52 . 
+ Mutagenesis of the omrA promoter was carried out by generation of PCR products with mutagenic primers carrying the desired substitutions , as previously described32 .
\ No newline at end of file
--- a/data/TEXT_FILES/useful_txt/26125937.txt 0 → 100644
View file @27818a9
+++ b/data/TEXT_FILES/useful_txt/26125937.txt 0 → 100644
View file @27818a9
+ Invasive E. coli Signature Transcripts by
+ 1 Department of Applied Mathematics and Statistics , Stony Brook University , Stony Brook , New York , United States of America , 2 Department of Medicine , Stony Brook University , Stony Brook , New York , United States of America , 3 Department of Pediatrics , Stony Brook University , Stony Brook , New York , United States of America , 4 Department of Pediatrics , Washington University St. Louis , St. Louis , Missouri , United States of America , 5 Department of Molecular Microbiology , Washington University St. Louis , St. Louis , Missouri , United States of America , 6 The Genome Institute , Washington University St. Louis , St. Louis , Missouri , United States of America , 7 Department of Medicine , University of New Mexico , Albuquerque , New Mexico , United States of America , 8 Program in Molecular Structure and Function , The Hospital for Sick Children , Toronto , Canada , 9 Department of Biochemistry & Molecular and Medical Genetics , University of Toronto , Toronto , Canada , 10 Department of Medicine , University of Colorado , Denver , Colorado , United States of America 
+ $ Current Address : Jackson Laboratory for Genomic Medicine , Farmington , Connecticut , United States of America * grace.gathungu@stonybrookmedicine.edu 
+ Abstract
+ Adherent-invasive Escherichia coli ( AIEC ) strains are detected more frequently within mucosal lesions of patients with Crohn 's disease ( CD ) . 
+ The AIEC phenotype consists of adherence and invasion of intestinal epithelial cells and survival within macrophages of these bacteria in vitro . 
+ Our aim was to identify candidate transcripts that distinguish AIEC from non-invasive E. coli ( NIEC ) strains and might be useful for rapid and accurate identification of AIEC by culture-independent technology . 
+ We performed comparative RNA-Sequence ( RNASeq ) analysis using AIEC strain LF82 and NIEC strain HS during exponential and stationary growth . 
+ Differential expression analysis of coding sequences ( CDS ) homologous to both strains demonstrated 224 and 241 genes with increased and decreased expression , respectively , in LF82 relative to HS . 
+ Transition metal transport and siderophore metabolism related pathway genes were up-regulated , while glycogen metabolic and oxidation-reduction related pathway genes were down-regulated , in LF82 . 
+ Chemotaxis related transcripts were up-regulated in LF82 during the exponential phase , but flagellum-dependent motility pathway genes were down-regulated in LF82 during the stationary phase . 
+ CDS that mapped only to the LF82 genome accounted for 747 genes . 
+ We applied an in silico subtractive genomics approach to identify CDS specific to AIEC by incorporating the genomes of 10 other previously phenotyped NIEC . 
+ From this analysis , 166 CDS mapped to the LF82 genome and lacked homology to any of the 11 human NIEC strains . 
+ We compared these CDS across 13 AIEC , but none were homologous in each . 
+ Four LF82 gene loci belonging to clustered regularly interspaced short palindromic repeats region ( CRISPR ) -- CRISPR-associated ( Cas ) genes were identified in 4 to 6 AIEC and absent from all non-pathogenic bacteria . 
+ As previously reported , AIEC strains were enriched for pdu operon genes . 
+ One CDS , encoding an excisionase , was shared by 9 AIEC strains . 
+ Reverse transcription quantitative polymerase chain reaction assays for 6 genes were conducted on fecal and ileal RNA samples from 22 patients with inflammatory bowel disease ( IBD ) and 32 patients without IBD ( non-IBD ) . 
+ The expression of Cas loci was detected in a higher proportion of CD than non-IBD fecal and ileal RNA samples ( p < 0.05 ) . 
+ These results support a comparative genomic/transcriptomic approach towards identifying candidate AIEC signature transcripts . 
+ Introduction
+ Crohn 's disease ( CD ) is a form of inflammatory bowel disease ( IBD ) that is characterized by skip lesions of transmural inflammation , and can occur at multiple sites in the digestive tract . 
+ Inflammation can be found anywhere in the gastrointestinal tract from the mouth to the anus , but in most ( 60 -- 80 % ) CD patients , the distal small intestine is frequently involved [ 1 , 2 ] . 
+ Factors implicated in the pathogenesis of IBD include host genetic predisposition , and continual activation of the mucosal immune system by luminal bacteria and their products [ 3 , 4 ] . 
+ From 16S ribosomal RNA gene sequence data , several laboratories have demonstrated imbalances in the gut microbial composition of CD patients , particularly those with ileal involvement when compared to unaffected individuals [ 5 -- 16 ] . 
+ A consistent feature is a reduction in the relative frequency of Faecalibacterium prausnitzii [ 8 ] and an increase in Proteobacteria , particularly Escherichia coli [ 5 ] . 
+ A greater relative abundance of E. coli has been associated with CD , and particularly in active disease compared to patients in remission [ 17 ] . 
+ Mucosa-associated E. coli in particular are more abundant in CD [ 18 ] and in several small studies were isolated from inflamed tissue that include areas with ulcers and granulomas [ 19 , 20 ] . 
+ In addition E. coli from the neoterminal ileum in post-surgical CD patients are linked to early recurrence of the disease [ 2 ] . 
+ Adherent invasive E. coli ( AIEC ) are considered to be pathobionts [ 21 -- 23 ] and are isolated from the intestinal mucosa in humans with a higher prevalence in CD patients than in healthy subjects [ 2 , 24 , 25 ] . 
+ The AIEC phenotype requires adherence and invasion of intestinal epithelial cells and survival and replication within macrophages [ 26 , 27 ] . 
+ Only a few commensal E. coli have been tested for this phenotype [ 28 ] . 
+ Using these methods , AIEC strains are detected in 22 -- 52 % of ileal CD patients and in 6 -- 18 % of non-IBD subjects [ 2 , 18 , 29 -- 31 ] . 
+ However , these studies differ with respect to the number of biopsies analyzed , the anatomical location of the biopsies , and disease activity . 
+ The design of a culture independent assay is hindered by the fact that although AIEC usually belong to the B2 or D groups , they are phylogenetically heterogeneous [ 18 , 32 ] . 
+ Jensen et al [ 33 ] reported a quantitative real-time PCR ( RT-qPCR ) assay to determine the proportion of E. coli LF82 in DNA from human intestinal biopsies using spiked samples , but did not report the results of this assay using clinical samples . 
+ Furthermore the genomic target of this assay , the pMT1-like plasmid , is not conserved among AIEC . 
+ Dogan et al [ 34 ] reported that genes encoding processes responsible for propanediol utilization ( pdu operon ) and iron acquisition ( yersiniabactin , chu operon ) are overrepresented in human and dog AIEC genomes and might represent AIEC virulence factors . 
+ To gain insight into biological pathways that contribute to AIEC pathogenicity we conducted a comparative transcriptomic analysis of the reference AIEC strain LF82 and the noninvasive commensal strain HS , grown in pure cultures . 
+ Furthermore , the genomic sequences of 11 non-invasive E. coli strains , including MG1655 [ 35 ] and HS [ 36 ] , and a panel of 13 AIEC strains [ 34 , 37 -- 41 ] were compared to identify coding regions that could potentially serve as AIEC probes . 
+ Five of these gene targets and the previously described gene pduC , were tested by reverse transcriptase quantitative polymerase chain reaction ( RT-qPCR ) following extraction of RNA from fecal and ileal biopsy samples from 53 patients with and without IBD . 
+ Materials and Methods
+ Homology searches in AIEC and non-invasive E. coli genomic sequences
+ The characteristics of seven previously published human AIEC ( strains LF82 , UM146 , NRG857c , HM605 , 541_1 , 541_15 , 576_1 ) and three human NIEC ( strains T75 , HS and MG-1655 ) are summarized in Table 1 . 
+ Reference genomes were retrieved from NCBI [ 28 , 34 -- 41 ] . 
+ The characteristics of the six AIEC ( strains MS-107-1 , MS-115-1 , MS-119-1 , MS124-1 , MS145-7 , MS57-2 ) , and 8 NIEC ( strains MS185-1 , MS187-1 , MS196-1 , MS198-1 , MS45-1 , MS60-1 , MS78-1 , MS84-1 ) are also listed in Table 1 . 
+ These MS strains were isolated from de-identified surgical resection specimens collected at Mount Sinai School of Medicine [ 6 ] from CD , UC and non-IBD patients and characterized with respect to AIEC phenotype . 
+ The genomes of these 14 E. coli strains are accessible through the Human Microbiome Project database [ 42 ] . 
+ Homologous CDS were compared for these 13 AIEC and 11 NIEC . 
+ A search was also conducted among diarrheagenic ( DEC ) and extraintestinal ( ExPEC ) pathogenic E. coli ( S1 Table ) using the alignment tool BLASTN ( version 2.2.28 + ) . 
+ Homologous genes were defined as those with 85 % sequence identity over 90 to 110 % of the length of the query as previously described [ 37 ] . 
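Editorial sketch (not part of the committed file): the homology rule stated above can be written as a simple predicate over values that a BLASTN tabular report provides. It assumes that "over 90 to 110 % of the length of the query" means the alignment length divided by the query length; the function and parameter names are hypothetical, and the source does not say how the authors implemented the filter.

```python
def is_homolog(percent_identity, alignment_length, query_length,
               min_identity=85.0, min_coverage=0.90, max_coverage=1.10):
    """Apply the stated homology thresholds to one BLASTN hit.

    percent_identity : identity of the aligned region, in percent
    alignment_length : length of the alignment (may exceed the query
                       length when the alignment contains gaps)
    query_length     : length of the query CDS
    """
    if query_length <= 0:
        raise ValueError("query_length must be positive")
    coverage = alignment_length / query_length
    return (percent_identity >= min_identity
            and min_coverage <= coverage <= max_coverage)
```

Under these assumptions a hit with 92 % identity over 950 of 1000 query bases passes, whereas a 99 % identical hit that covers only half of the query does not.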
+ Bacterial RNA isolation, sequencing and alignment to genomes
+ The reference AIEC strain LF82 , originally isolated by Dr. Darfeuille-Michaud , was provided as a gift by Dr. Phillip Sherman ( University of Toronto ) and its identity was confirmed by multi-locus sequence typing [ 43 ] . 
+ The non-invasive HS strain was purchased from American Type Culture Collection ( ATCC 700891 ) . 
+ Triplicate Luria broth cultures ( 37 °C ) of LF82 and HS were grown with continuous shaking for 2 hours ( exponential phase ) and 24 h without shaking ( stationary phase ) . 
+ Total RNA was extracted from the cells using the RiboPure Bacteria kit ( Life Technologies Corp. , Carlsbad , CA ) , following the manufacturer 's protocol . 
+ The average RNA Integrity Number ( RIN ) over all samples was 7 . 
+ Two micrograms of RNA was depleted of ribosomal RNA using the RiboMinus Transcriptome Isolation Kit ( Life Technologies Corp. , Carlsbad , CA ) . 
+ These samples were then used as a template for strand-specific cDNA synthesis and subjected to single-end 150 bp Illumina sequencing . 
+ The RNA-Seq libraries were prepared and sequenced at the New York Genome Center ( NYGC ) . 
+ Raw sequences were filtered to remove human sequence contamination , short reads ( < 50 bp ) and duplicate reads , and were quality trimmed using Trimmomatic ( v 0.32 ) [ 44 ] . 
+ rRNA sequences were identified and culled using SortMeRNA ( v1.9 ) [ 45 ] . 
+ Raw sequence reads for LF82 and HS were mapped to NCBI reference genomes NC_011993 and NC_009800 , respectively [ 37 ] using the Burrows-Wheeler Aligner ( BWA ) [ 46 ] . 
+ Counts for each annotated genomic locus were determined by HTseq-count ( version 0.6.1 ) [ 47 ] . 
+ The data discussed in this publication have been deposited in NCBI 's Gene Expression Omnibus and are accessible through GEO Series accession number GSE69020 . 
+ Differentially Expressed Genes (DEGs) in LF82 compared to HS
+ Two DEG algorithms were employed , edgeR [ 48 ] and DESeq [ 49 ] . 
+ The raw counts produced by HTseq-count provided the input variables for the DESeq and edgeR packages . 
+ DEGs were defined as 2 fold change and FDR < 0.05 and LF82 and HS transcripts were compared at 2h or 24h , independently . 
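Editorial sketch (not part of the committed file): the DEG definition above can be expressed as a threshold on fold change and FDR. The sketch assumes that "2 fold change" means at least a 2-fold difference in either direction, with fold change given as the LF82/HS expression ratio; both are assumptions, and the function names are hypothetical. In the study itself, fold changes and FDR values come from edgeR and DESeq.

```python
import math


def is_deg(fold_change, fdr, min_fold=2.0, max_fdr=0.05):
    """Return True if a gene passes the fold-change and FDR thresholds.

    fold_change is a ratio (LF82 / HS), so it must be positive; the test
    is made symmetric for up- and down-regulation on the log2 scale.
    """
    if fold_change <= 0:
        raise ValueError("fold_change must be a positive ratio")
    return abs(math.log2(fold_change)) >= math.log2(min_fold) and fdr < max_fdr


def classify(fold_change, fdr):
    """Label a gene as 'up', 'down' or 'unchanged' in LF82 relative to HS."""
    if not is_deg(fold_change, fdr):
        return "unchanged"
    return "up" if fold_change > 1 else "down"
```

Each time point (2 h or 24 h) would be classified separately, matching the independent comparisons described in the text.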
+ DEGs resulting from edgeR were the input variables for knowledge based biological functions using the Gene Ontology ( GO ) plugin BiNGO [ 50 ] and the custom ontology and annotation files found on the Gene Ontology website [ 51 , 52 ] . 
+ DEGs resulting from DESeq were the input variables for knowledge based pathways/modules defined either by the Kyoto Encyclopedia of Genes and Genomes ( KEGG , http://www.genome.jp/kegg/ ) [ 53 ] or a set of modules obtained through clustering a network of high quality functional interactions predicted for E. coli [ 54 ] . 
+ The up-regulated and down-regulated output from DESeq for each time point were entered to identify the perturbed pathways regardless of the overall polarity . 
+ Ethics Statement
+ This study was approved by the Institutional Review Board ( IRB ) at Stony Brook University Hospital . 
+ Pediatric ( age 7 years ) and adult patients are recruited in a consecutive fashion by the Stony Brook Digestive Diseases Research Tissue Procurement Facility and provide verbal and written consent for chart abstraction , blood , stool , tissue biopsies and/or surgical waste collection with analysis for research purposes and for their information to be stored in the hospital database . 
+ For children between 7 -- 17 years old participating in this study , both oral and written parent/legal guardian permission and a separate oral and written assent from the child was obtained . 
+ The IRB at Stony Brook University Hospital approved this consent procedure . 
+ Enrollment of patients and collection of samples
+ After receiving IRB approval , participants previously scheduled to undergo colonoscopy or intestinal resection , were identified and consented . 
+ Pediatric ( ages 7 years ) and adult patients were recruited in a consecutive fashion by the Stony Brook Digestive Diseases Research Tissue Procurement Facility . 
+ The period of enrollment was between March 2011 and June 2014 . 
+ Patients with a confirmed diagnosis of IBD were phenotyped based on endoscopic and radiographic studies as previously described [ 55 ] . 
+ Tissue specimens were collected and immediately placed into RNAlater ( Life Technologies , Carlsbad , CA ) . 
+ DNA isolation from bacteria
+ Nine bacterial strains were processed for DNA isolation : LF82 , MG1655 , HS , and 6 MS AIEC strains . 
+ Following overnight culture , a single colony of each bacterial strain was placed in 5 ml of tryptic soy broth and incubated overnight at 37 °C with shaking . 
+ Total bacterial DNA was extracted using the QIAamp DNA Mini Kit according to the manufacturer 's protocol and stored at -20 °C until batch analysis . 
+ PCR and electrophoresis
+ The forward and reverse primers for the Cas genes ( strains LF82_088 , LF82_091 , LF82_092 and LF82_093 ) were designed using the NCBI primer designing tool Primer-BLAST [ 56 ] . 
+ The E. coli 16S rRNA forward and reverse primers were previously validated [ 57 ] . 
+ The predicted PCR products were 340 bp for E. coli 16S rRNA , 107 bp for LF82_088 , 109 bp for LF82_091 , 97 bp for LF82_092 , and 125 bp for LF82_093 . 
+ Amplification was performed in a 15 μL reaction volume consisting of 1.5 μL 10X PCR buffer ( Qiagen ) , 3 μL Q solution , nuclease free water , 0.5 μM forward and reverse primers , 0.1 μL Qiagen Taq DNA polymerase , and 1 μL template . 
+ PCR was performed using an Eppendorf Mastercycler EPGradient S . 
+ The following thermal cycling conditions were used : 5 min at 94 °C and 36 cycles of amplification consisting of 30 seconds at 95 °C , 30 seconds at 56 °C , and 1 min at 72 °C , with 5 min at 72 °C for the final extension . 
+ PCR product bands were analyzed after electrophoresis in a 1 % agarose gel in 1X TBE containing ethidium bromide and digital imaging using The ChemiDoc MP system ( Biorad , Hercules , CA ) . 
+ RNA isolation from stool and bacteria
+ Total bacterial RNA was extracted from each stool sample using a fecal RNA isolation kit ( Zymo Research Corporation , Irvine , CA ) according to the manufacturer 's protocol . 
+ RNA from strains LF82 , MG1655 , and HS was extracted using the same kit after culture for 2 and 24 hours . 
+ RNA was archived at -80 °C until batch analysis . 
+ RNA isolation from ileal biopsies
+ Fresh frozen ileal biopsies were homogenized individually in 2 mL of Trizol solution ( Life Technologies ) with the PowerGen125 homogenizer ( Fisher Scientific ) and 1 mL aliquots placed into 1.5 mL microcentrifuge tubes . 
+ RNA was subsequently extracted using phenol/chloroform extraction methods as previously described [ 58 ] . 
+ The RNA was reconstituted in 50 μL of RNA Storing Solution ( Life Technologies ) and stored at -80 °C until batch analysis . 
+ Reverse transcription quantitative polymerase chain reaction (RT-qPCR) of E. coli transcripts
+ For cDNA production , 500 nanograms of RNA was added to a 20 μL reaction using the SuperScript VILO cDNA Synthesis Kit ( Life Technologies , Carlsbad , CA ) . 
+ Quantitative PCR was conducted in triplicate on 1:2 dilutions of cDNA from fecal samples and on 1:2 , 1:4 and 1:8 dilutions of cDNA from pure E. coli cultures , using 1 μL volumes . 
+ Amplification was performed in a 20 μL reaction volume consisting of 10 μL of 2x SYBR Green Master Mix , 1 μL each of 10 μM forward and reverse primers , 1 μL of cDNA , and 7 μL of nuclease free water . 
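+ As a quick arithmetic check on the reaction described above, the following sketch sums the listed component volumes and derives the final primer concentration from the dilution relation C1 x V1 = C2 x V2. The dictionary keys and the helper name `final_concentration` are illustrative and do not come from the protocol.

```python
# Component volumes (in microlitres) for one qPCR reaction, as listed in the text.
components_ul = {
    "2x SYBR Green Master Mix": 10.0,
    "forward primer (10 uM stock)": 1.0,
    "reverse primer (10 uM stock)": 1.0,
    "cDNA": 1.0,
    "nuclease-free water": 7.0,
}


def final_concentration(stock_conc, stock_volume, total_volume):
    """Dilution formula C2 = C1 * V1 / V2 (units are carried by the caller)."""
    return stock_conc * stock_volume / total_volume


total_ul = sum(components_ul.values())
primer_um = final_concentration(10.0, 1.0, total_ul)
print(f"total volume: {total_ul} uL, final primer concentration: {primer_um} uM")
```

+ Under these numbers the components add up to the stated 20 μL and each primer ends up at 0.5 μM.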
+ The thermal cycling conditions were : 10 min at 95 °C and 40 cycles of amplification consisting of 30 seconds at 95 °C and 60 seconds at 60 °C using a Mastercycler EPGradient S ( Eppendorf ) . 
+ Primers included Total bacteria and E. coli 16S rRNA forward and reverse primers as previously validated [ 57 ] and the pduC gene as previously described [ 34 ] . 
+ Primers were designed for 5 candidate genes LF82_088 , LF82_091 , LF82_092 , LF82_093 , and LF82_095 , using an online primer design tool [ 56 ] . 
+ The sequences of all primers are listed in S2 Table . 
+ Statistical analysis
+ All analyses were performed using the GraphPad Prism 5 software suite ( GraphPad , San Diego , CA ) . 
+ For each RT-qPCR assay , the average cycle threshold ( Ct ) of 3 replicates per gene was determined . 
+ Positive assays had a mean threshold cycle values ( Ct ) 35 . 
+ The Ct values in negative samples and water ranged from 39 -- 40 . 
+ Fisher 's exact test was performed to compare positive and negative counts in IBD compared to non-IBD and CD compared to non-IBD , for fecal and ileal biopsy samples , respectively . 
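+ For readers who want to see what a comparison of positive and negative counts involves, here is a minimal standard-library sketch of a two-sided Fisher 's exact test on a 2x2 table. It is a stand-in for illustration only: the analyses in the text were run in GraphPad Prism, the function name is hypothetical, and the counts are made up rather than taken from the study.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of every table with the same
    margins that is no more likely than the observed table.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Probability that the top-left cell equals x given fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # The small relative tolerance guards against floating-point ties.
    return sum(
        p for p in (prob(x) for x in range(lowest, highest + 1))
        if p <= p_obs * (1 + 1e-9)
    )


# Hypothetical counts: rows = two patient groups, columns = positive, negative.
p_value = fisher_exact_two_sided(3, 1, 1, 3)
print(round(p_value, 4))
```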
+ The relative abundance of E. coli 16S rRNA transcripts was determined by defining the delta Ct ( ΔCt ) . 
+ ΔCt was generated by subtracting the average Ct value for total bacteria from the average Ct value for E. coli 16S rRNA . 
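+ The ΔCt calculation can be sketched as follows, assuming triplicate Ct values per assay. The sign convention follows the Methods sentence above (E. coli minus total bacteria); the Results label the quantity as total minus E. coli, so the direction should be treated as an assumption. The Ct values and the function name are hypothetical.

```python
from statistics import mean


def delta_ct(target_cts, reference_cts):
    """Mean Ct of the target assay minus mean Ct of the reference assay."""
    return mean(target_cts) - mean(reference_cts)


# Hypothetical triplicate Ct values for a single sample.
ecoli_16s = [28.0, 28.5, 27.5]
total_bacteria = [15.0, 15.5, 14.5]

print(delta_ct(ecoli_16s, total_bacteria))
```

+ Swapping the two arguments gives the opposite sign convention without changing the magnitude.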
+ The nonparametric Mann-Whitney test was used to compare values for IBD compared to non-IBD and CD compared to non-IBD for fecal and ileal biopsy samples , respectively . 
+ Results
+ Identification of differentially expressed genes (DEG) in LF82 vs. HS
+ We analyzed gene expression levels of strains LF82 and HS in separate samples prepared from exponential ( 2h ) and stationary ( 24h ) phase cultures grown at 37 °C , in order to interrogate gene expression under different growth conditions . 
+ Expression levels were standardized by reads per kilobase of exon per million mapped sequence reads ( RPKM ) [ 59 ] . 
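+ The RPKM normalization named above divides a feature 's read count by its length in kilobases and by the number of mapped reads in millions. A minimal sketch, with hypothetical input numbers and a hypothetical function name:

```python
def rpkm(read_count, feature_length_bp, total_mapped_reads):
    """Reads per kilobase of feature per million mapped reads."""
    return read_count * 1e9 / (feature_length_bp * total_mapped_reads)


# Hypothetical example: 500 reads on a 1,000 bp CDS in a library of 10 million mapped reads.
value = rpkm(500, 1000, 10_000_000)
print(value)
```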
+ The edgeR and the DESeq algorithms yielded similar findings . 
+ Results generated using edgeR are shown in S1 File . 
+ For the 2h and 24h samples , 654 and 459 CDS , respectively , had increased expression ( RPKM 2 fold , FDR < 0.05 ) in LF82 compared to HS ( Table A in S1 File ) , with 224 of the CDS exhibiting increased expression in LF82 at both time points . 
+ At 2 h , 6 genes shared by LF82 and HS were expressed only in LF82 . 
+ Similarly at 24 h , 17 genes had detectable transcripts in LF82 and not in HS ( Table A in S1 File ) . 
+ Six genes were detected only in LF82 at both time points . 
+ Some of these genes are involved in bacteriophage infections and others have no known function ( Table 2 ) . 
+ A total of 712 and 492 genes had decreased expression at 2h and 24h respectively , in LF82 compared to HS ( Table B in S1 File ) , with 241 genes exhibiting decreased expression in LF82 ( RPKM 0.05 , FDR < 0.05 ) at both time points . 
+ Functional profiling of genes was accomplished using the Gene Ontology ( GO ) plugin BiNGO [ 50 ] and the custom ontology and annotation files on the Gene Ontology website ( http://www.geneontology.org ) . 
+ This analysis revealed that multiple functional categories have overlapping datasets as shown in Tables A-D in S2 File . 
+ Examples include `` siderophore metabolic process like enterobactin '' , which is up-regulated at both time points , and `` glycogen metabolic process '' and `` oxidation-reduction process '' , which are down-regulated at both time points ( Table 3 ) . 
+ Analysis using alternative pathways/modules gene sets [ 53 , 54 ] facilitated visualization of patterns of gene expression against a very complex background . 
+ For example , the functional category chemotaxis is up-regulated ( FDR = 0.008 ) in LF82 at 2h , but bacterial-type flagellum-dependent cell motility is down-regulated ( FDR = 1.4 x 10^-6 ) at 24h . 
+ However , as shown in Fig 1 , the polarity of the DEGs is preserved at both time points . 
+ These network-based results draw attention to modules that do not overlap with the GO categories ( e.g. modules 24 and 79 in Fig 1 ) . 
+ Table caption ( beginning truncated ) : ... 24h time points . For more comprehensive lists of up-regulated and down-regulated pathways at 2h and 24h please see S2A and S2B Table . The false discovery rate ( FDR ) is indicated for both the 2h and 24h cultures . 
+ Table columns : GO ID | Pathway | Genes at 24h time point | 2h FDR | 24h FDR 
+ Selection of candidate AIEC signature transcripts
+ To identify candidate AIEC signature transcripts , we took a subtractive approach to identify coding DNA sequences that were present in the genome of the reference AIEC strain LF82 but not homologous to sequences in 11 non-invasive E. coli strains . 
+ In addition to the HS and MG1655 strains , we included 9 strains from patients with and without IBD and phenotyped with respect to their inability to invade epithelial cells and survive within macrophages . 
+ Although five of the non-invasive strains were isolated from non-IBD patients , four others were isolated from IBD patients ( 2 UC , 2 CD ) ( Table 1 ) . 
+ Of the 4508 predicted CDS in the LF82 genome [ 37 ] , 3446 could be uniquely mapped to corresponding CDS with 85 % sequence identity in the control HS genome . 
+ Although 747 LF82 CDS lacked homology to the HS genome [ 36 ] , further subtraction was accomplished by including 10 additional NIEC . 
+ In the final analysis , 166 CDS in LF82 were absent from all 11 NIEC genomes . 
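+ The subtractive step described above amounts to a set filter: keep a reference CDS only if none of the control genomes contains a hit at or above the identity cutoff. The sketch below assumes the best percent identity per ( CDS , genome ) pair has already been computed by some alignment tool; the function name, the identifiers and the toy numbers are hypothetical, and the 85 % cutoff is taken from the text.

```python
IDENTITY_CUTOFF = 85.0  # percent sequence identity, as quoted in the text


def strain_specific_cds(reference_cds, best_hits, control_genomes,
                        cutoff=IDENTITY_CUTOFF):
    """Return reference CDS with no hit at or above `cutoff` in any control genome.

    `best_hits` maps (cds_id, genome) to the best percent identity found;
    a missing key means no alignment was reported for that pair.
    """
    kept = []
    for cds in reference_cds:
        has_homolog = any(
            best_hits.get((cds, genome), 0.0) >= cutoff
            for genome in control_genomes
        )
        if not has_homolog:
            kept.append(cds)
    return kept


# Toy data: three made-up CDS screened against two made-up control genomes.
toy_hits = {("cds_A", "ctrl1"): 99.0, ("cds_B", "ctrl2"): 60.0}
print(strain_specific_cds(["cds_A", "cds_B", "cds_C"], toy_hits, ["ctrl1", "ctrl2"]))
```

+ Adding more control genomes can only shrink the retained list, which matches the drop from 747 to 166 CDS reported above as more NIEC genomes were included.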
+ We compared the 166 CDS across six published AIEC genomes ( UM146 , NRG857c , HM605 , 541 -- 1 , 541 -- 15 , 576 -- 1 ) and six MS AIEC strains ( MS-107-1 , MS-115-1 , MS-119-1 , MS124-1 , MS145-7 , MS57-2 ) . 
+ None of the 166 CDS were homologous to all 13 human AIEC genomes ( see S1 Table ) . 
+ The CDS LF82_095 , which encodes an excisionase , was the most prevalent with homology in 9 of 13 AIEC genomes ( see Table 4 and S1 Table ) . 
+ This CDS also shared homology with a number of pathogenic E. coli , particularly DEC ( see S1 Table ) . 
+ The CDS LF82_332 corresponds to the pduC gene and was homologous with 6 of 13 AIEC . 
+ We also selected 4 CDS ( LF82_088 , LF82_091 , LF82_092 , and LF82_093 ) that mapped to a region previously described as `` specific region 6 '' [ 37 ] and corresponded to 4 CRISPR-Cas genes . 
+ Three AIEC ( LF82 , NRG857c , and MS-57-2 ) shared homologous CDS with all 6 candidate AIEC transcripts . 
+ Three AIEC strains ( 541_1 , 576_1 , and MS -- 115 -- 1 ) shared only the pduC gene , and 3 additional AIEC strains ( UM146 , HM605 , and MS-145-7 ) shared only the 4 Cas genes . 
+ To test the in silico results and validate the PCR primers we amplified DNA for each of the 4 candidate genes ( S1 Table ) . 
+ Agarose gel electrophoresis of PCR reactions verified amplification products of the expected sizes ( see methods ) for candidate genes LF82_091 , LF82_092 , LF82_093 and LF82_088 in strains LF82 , MS145-7 , and MS57 -- 2 ( S2 Table ) . 
+ All other strains , including MG1655 , HS and the 4 MS AIEC strains without homologous Cas genes , exhibited no PCR amplification with these primers . 
+ All samples produced the expected band at 340 base pairs for the E. coli 16S rRNA gene product ( S1 & S2 Figs ) . 
+ Screening Candidate Gene Transcripts in Human Clinical Specimens 
+ RNA was isolated from fecal samples collected from 53 individuals at Stony Brook University . 
+ Within this collection , 43 ( 81.1 % ) stool samples were acquired from children ( Table 3 ) . 
+ Twenty-two were IBD patients and 31 individuals were non-IBD controls . 
+ Non-IBD patients included subjects with functional GI disorders , Celiac disease , lactose intolerance and one patient with juvenile polyps . 
+ The number of male patients was significantly higher in both IBD cohorts compared to controls , p = 0.009 and 0.029 for CD and UC respectively . 
+ CD patients were significantly older ( p = 0.024 ) . 
+ The median ages for CD , UC/IC and controls were 20 , 16 and 15 years , respectively . 
+ The IBD patients included 14 patients with CD , 6 patients with UC and 2 with indeterminate colitis ( IC ) . 
+ Three of the CD patients were diagnosed at enrollment . 
+ Parallel ileal biopsies were available for 10 CD patients , 3 UC patients and 23 non-IBD controls . 
+ Table 5 displays the characteristics of all subjects . 
+ For IBD patients , age of diagnosis , disease location and disease behavior ( CD ) are as defined by the Montreal classification [ 60 ] . 
+ Also included are disease duration , body mass index ( BMI ) , smoking , surgical management of IBD , and IBD medications . 
+ To compare the relative abundance of E. coli between clinical specimens , we performed RT-qPCR with E. coli-specific 16S rRNA gene primers and normalized results to total bacterial 16S rRNA gene expression ( Tables 6 & 7 ) . 
+ The median ΔCT values ( Total − E. coli Ct ) among CD , UC/IC and non-IBD fecal samples were -14.40 , -7.14 , and -13.56 , respectively . 
+ There was no statistically significant difference in E. coli abundance compared to non-IBD controls . 
+ Among ileal biopsy specimens , the mean ΔCT values for CD , UC/IC and non-IBD samples were -9.94 , -10.50 , and -11.84 , respectively . 
+ There was no statistically significant elevation in E. coli abundance in IBD specimens compared to controls . 
+ The threshold of detection of transcripts corresponding to excisionase ( LF82_095 ) , pduC ( LF82_332 ) and four Cas homologous genes ( LF82_088 , LF82_091 , LF82_092 , and LF82_093 ) was set at Ct 35 . 
+ The negative Ct values ranged between 39 and 40 . 
+ A higher proportion of CD fecal ( Table 6 ) and ileal ( Table 7 ) cDNA samples were positive for LF82_091 and LF82_092 transcripts than non-IBD fecal and ileal samples ( p < 0.05 ) . 
+ A higher proportion of CD fecal samples than non-IBD fecal samples were positive for LF82_088 , and a higher proportion of CD ileal samples than non-IBD ileal samples were positive for LF82_093 and LF82_095 . 
+ Discussion
+ Although a higher proportion of CD patients harbor AIEC , such organisms can also be recovered from non-IBD patients . 
+ Conversely , NIEC strains are recovered from IBD patients ( Table 1 ) . 
+ The pathogenic potential of AIEC may vary depending on host susceptibility . 
+ Host factors such as IBD risk alleles and Paneth cell function have been linked to alterations in ileal mucosa-associated microbial composition and the Escherichia/Shigella genus [ 12 , 14 , 57 , 61 , 62 ] . 
+ In-vitro analysis has not been performed for many human commensal E. coli strains . 
+ In this study , the complete genomes for 13 AIEC and 11 NIEC , all with prior in-vitro phenotypic analysis , were compared . 
+ Multiple studies have demonstrated that CD patients , particularly those with ileal disease , have altered intestinal microbial biodiversity and composition . 
+ Because most of these studies are based on 16S rRNA sequence analysis , they do not address alterations in microbial function , or in subgroups within identified species . 
+ Shotgun bacterial DNA metagenomics and bacterial metatranscriptomics measure alterations in microbial function more directly than does 16S rRNA sequence analysis . 
+ The advantage of bacterial transcriptomic data over shotgun metagenomics data is that the former provides information on which bacterial genes are actually transcribed . 
+ In this study we compared the transcriptomes of a reference AIEC strain , LF82 , to a control strain , HS , to identify genes associated with the AIEC phenotype . 
+ We selected HS , which was previously demonstrated to be non-invasive [ 28 ] , as the control strain . 
+ A comparative analysis of genes shared between the LF82 and HS genomes indicated that many of the DEG had a relatively low fold change ( ~ 2 -- 4 fold ) making them less suited for clinical assays . 
+ Up-regulated genes in LF82 are involved in many key pathways including iron metabolism , supporting the recent report that AIEC strains are enriched for genes involved in iron utilization [ 37 ] , a feature of many B2 phylotype members . 
+ Comparison of the transcriptional profiles revealed a significant effect of growth conditions ( see Fig 1 ) . 
+ We identified six genes with no detectable expression in HS ( Table 2 ) at both growth conditions . 
+ Four of the genes code for identical proteins in the enteropathogenic bacteria Salmonella and Shigella . 
+ Further characterization of these proteins in AIEC and non-invasive E. coli strains is necessary to determine if they are a component of the AIEC phenotype . 
+ In the comparative analysis of RNA-seq data , 747 CDS that mapped to the LF82 genome did not share homology with CDS in HS ( S1 Table ) . 
+ We extended our comparative analysis to 13 E. coli strains with the AIEC pathotype and 11 NIEC ( Table 1 ) . 
+ Using a subtractive genomics approach , we found 166 LF82 CDS that had no homologs in any of the 11 NIEC ( S1 Table ) . 
+ However , none of the 166 CDS were present in all 13 AIEC strains surveyed . 
+ This observation supports the concept that the AIEC pathovar is formed by a heterogeneous collection of serogroups and serotypes . 
+ As shown in Table 3 , AIEC genomes are enriched in genes belonging to the pdu operon , the ibe operon , and the type VI secretion system [ 34 , 37 , 38 ] . 
+ The pdu operon is a component of a metabolic pathway required for fucose utilization [ 63 ] ; it is present in enteropathogenic bacteria and offers a competitive advantage for energy production under anaerobic conditions [ 34 , 63 ] . 
+ The ibeA gene ( invasion of brain endothelium ) encodes an invasion protein found in several extraintestinal pathogenic E. coli ( ExPEC ) strains [ 64 ] . 
+ This gene may also play a role in E. coli resistance to H2O2 stress [ 65 ] . 
+ IbeA is a necessary component for invasion of IECs , and absence or mutation of this gene limits survival of AIEC within macrophages [ 66 ] . 
+ The type VI secretion system has been implicated in targeting other bacterial and eukaryotic cells [ 67 ] . 
+ We found homologous CDS for chuA and yersiniabactin in 6 of 11 NIEC . 
+ These iron uptake genes are enriched among AIEC strains [ 37 ] and other pathogenic E. coli including ExPEC and EHEC . 
+ However , it remains to be determined whether these genes are expressed in the noninvasive strains . 
+ This analysis is limited by the fact that growth in pure cultures represents a very different environment than within the human intestine , and thus does not take into consideration complex microbe-microbe and host-microbe interactions . 
+ In addition , our subtractive genomics approach was limited to CDS expressed in the reference AIEC strain LF82 . 
+ NRG857C has a genome that is highly similar to LF82 ; however , CDS present in NRG857C but absent in LF82 were present in as many as six of the 13 other AIEC strains . 
+ Additional CDS that were homologous among three or more AIEC except LF82 and absent in the 11 NIEC are listed in S3 Table . 
+ Nonetheless , the results of this analysis provide a useful baseline repertoire of E. coli transcriptional patterns that may aid in the analysis of complex patient based metatranscriptomic data . 
+ Among the 166 CDS mapping to the LF82 genome , we identified four potential signature transcripts belonging to CRISPR-associated ( Cas ) genes . 
+ These genes map to a region of the LF82 genome that is highly specific [ 37 ] and in our analysis these CDS were conserved in 4 of 6 AIEC strains . 
+ We did not find homologous CDS in DEC , although they are homologous to CDS in three ExPEC . 
+ Among the strains with these specific Cas genes , four of the AIEC strains and the three ExPEC are of the B2 phylotype . 
+ AIEC of the B2 phylotype are described to be among the most abundant and the most virulent [ 68 ] . 
+ CRISPR-Cas forms the adaptive immunity system [ 69 -- 71 ] . 
+ Bacterial strains express Cas proteins that recognize foreign genetic elements in plasmids and phages and insert fragments of the exogenous DNA into their own genomes . 
+ Most E. coli harbor CRISPR-Cas systems that belong to subtype I-E [ 72 ] . 
+ LF82 has the I-F system , which has 3 CRISPR arrays and an operon of 6 cas-F genes ( cas6f , csy3 , csy2 , csy1 , cas2-cas3 , and cas1 ) [ 72 ] . 
+ This system is also found in Yersinia pestis , an enterotoxigenic E. coli ( strain B7A ) , and a subset of B2 phylotype E. coli [ 72 ] . 
+ Toro et al. [ 73 ] examined the relationship between CRISPR-Cas systems and virulence in Shiga toxin-producing E. coli ( STEC ) and observed conservation of CRISPR spacer contents among strains of the same serotype and that the highly virulent STEC strains had fewer spacers within CRISPR arrays . 
+ Two other groups have recently identified CRISPR-Cas gene loci for the development of serotype-specific PCR assays of STEC [ 74 , 75 ] and Salmonella enterica serotypes Typhi and Paratyphi A [ 76 ] . 
+ We analyzed 53 fecal samples ( Table 6 ) using 4 Cas gene assays ; 35.7 % of CD samples compared to 6.2 % of non-IBD control samples ( p = 0.02 ) were positive for 3 of the 4 assays . 
+ Using the pduC primers described in Dogan et al. [ 34 ] , expression of the pduC gene was detected in 21.4 % of CD samples compared to 25 % of non-IBD controls ( p = 1.0 ) . 
+ For the excisionase gene 64 % of CD compared to 44 % of non-IBD control samples ( p = 0.34 ) had positive assays . 
+ We also analyzed 38 parallel ileal biopsy samples ( Table 7 ) and 50 % of CD compared to 13 % of non-IBD control samples ( p = 0.04 ) had positive assays . 
+ Expression of the pduC gene was detected in 17 % CD compared to 4 % of non-IBD controls ( p = 0.27 ) . 
+ For the excisionase gene 33 % of CD compared to 0 % of non-IBD control samples ( p = 0.0095 ) had positive assays . 
+ All 4 excisionase positive samples were correspondingly positive for Cas genes . 
+ The p-values for the Cas assays did not reach significance after applying the Bonferroni correction for multiple comparisons ( p < 0.01 ) . 
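+ A Bonferroni correction simply divides the family-wise significance level by the number of tests. The sketch below assumes a family-wise level of 0.05 and five comparisons, which would reproduce the 0.01 threshold quoted above; the text does not state either number, and the assay names and p-values are hypothetical.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under the Bonferroni correction."""
    return alpha / n_tests


def significant_after_bonferroni(p_values, alpha=0.05):
    """Map each named test to True if its p-value clears the corrected threshold."""
    threshold = bonferroni_threshold(alpha, len(p_values))
    return {name: p < threshold for name, p in p_values.items()}


# Hypothetical p-values for a family of five assays.
example = {
    "assay_1": 0.02,
    "assay_2": 0.04,
    "assay_3": 0.004,
    "assay_4": 0.30,
    "assay_5": 0.75,
}
flags = significant_after_bonferroni(example)
print(flags)
```

+ In this toy family only the third assay clears the corrected threshold, even though the first two would pass an uncorrected 0.05 cutoff.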
+ Nevertheless we observed a similar trend in fecal and/or ileal biopsies for all four of the Cas genes tested . 
+ Our data suggest the Cas genes may serve as promising AIEC biomarkers ; this will need to be confirmed in a larger set of patient samples . 
+ We did not detect a significant difference in E. coli 16S rRNA gene expression ( ΔCT ) relative to total bacteria in cases compared to non-IBD controls . 
+ Altogether our sample sizes were small and pduC expression was less discriminating for AIEC infected samples . 
+ However , it may be a useful target for therapeutic intervention as previously described . 
+ It is also possible that other pdu operon genes are more specific and could serve as better targets . 
+ Our study is consistent with other reports that no single gene is able to distinguish AIEC from NIEC . 
+ Furthermore , it remains to be demonstrated whether any candidate AIEC signature transcript with utility as a microbial biomarker has a functional role in pathogenicity . 
+ In summary , these results identify potential candidate AIEC signature transcripts , which may be more prevalent among CD patients than non-IBD patients and serve as proof of principle for our comparative genomic/transcriptomic analysis of AIEC and NIEC . 
+ Supporting Information
+ S1 Fig . 
+ Specific CAS genes are detected in AIEC by PCR . 
+ Agarose gel electrophoresis analysis of PCR products obtained from reactions using forward and reverse primers of the Cas genes LF82_091 and LF82_092 , with E. coli 16S rRNA as a positive control . 
+ Positions of molecular size standards ( in bp ) are indicated , also see methods . 
+ ( TIF ) 
+ S2 Fig . 
+ Specific CAS genes are detected in AIEC by PCR . 
+ Agarose gel electrophoresis analysis of PCR products obtained from reactions using forward and reverse primers of the Cas genes LF82_088 and LF82_093 , with E. coli 16S as a positive control . 
+ Positions of molecular size standards ( in bp ) are indicated , also see methods . 
+ ( TIF ) 
+ S1 File . 
+ Up-regulated and Down regulated transcripts in LF82 and HS . 
+ Table A. Up-regulated transcripts in LF82 compared to HS . 
+ RNA was extracted from bacteria at exponential ( 2h ) and stationary ( 24h ) phases of growth in pure cultures and RNA sequencing completed . 
+ Expression level of homologous CDS in LF82 and HS is compared at 2 h , 24h and at both time points using edgeR . 
+ Up-regulated CDS with fold change 2 , FDR < 0.05 . 
+ The RPKM values for LF82 and HS are shown . 
+ Table B. Down-regulated transcripts in LF82 compared to HS cultures . 
+ RNA was extracted from bacteria at exponential ( 2h ) and stationary ( 24h ) phases of growth in pure cultures and RNA sequencing completed . 
+ Expression level of homologous CDS in LF82 and HS is compared at 2 h , 24h and at both time points using edgeR . 
+ Up-regulated CDS with fold change 2 , FDR < 0.05 . 
+ The RPKM values for LF82 and HS are shown . 
+ ( XLSX ) 
+ S2 File . 
+ GO functional categories in LF82 and HS . 
+ Table A. Up-regulated GO-categories ( FDR < 0.05 ) in LF82 compared to HS cultures at 2h . 
+ Table B. Up-regulated GO-categories ( FDR < 0.05 ) in LF82 compared to HS cultures at 24h . 
+ Table C. Down-regulated GO categories ( FDR < 0.05 ) in LF82 compared to HS cultures at 2h . 
+ Table D. Down-regulated GO categories ( FDR < 0.05 ) in LF82 compared to HS cultures at 24h . 
+ ( XLSX ) 
+ S1 Table . 
+ LF82 transcripts that lack homology ( < 85 % sequence identity ) within 11 noninvasive E. coli strains . 
+ ( XLSX ) 
+ S2 Table. Forward and reverse primers for RT-qPCR assays. (DOCX)
+ S3 Table . 
+ AIEC genes that do not share sequence homology ( < 85 % sequence identity ) with LF82 or non-invasive E. coli strains . 
+ ( XLSX ) 
+ Acknowledgments
+ We would like to acknowledge Dr. Phillip Sherman ( University of Toronto ) for providing the LF82 strain , the assistance of the New York Genome Center in generating and processing the bacterial RNA-sequence data and the Washington University Genome Institute for sequencing all the MS E. coli isolates . 
+ The authors gratefully acknowledge the pediatric and adult Gastroenterologists and all medical staff at the Stony Brook Hospital Endoscopy Unit . 
+ We also thank Donald W. Pettet III and Brian Righter for their technical assistance . 
+ Conceived and designed the experiments : YZ PT EB JP DF EL GG . 
+ Performed the experiments : YZ LR JFK EO NS PT ES GW EB XX JP DF EL GG . 
+ Analyzed the data : YZ LR JFK EO NS PT ES GW EB XX JP DF EL GG . 
+ Contributed reagents/materials/analysis tools : YZ EO NS PT ES GW EB XX JP DF EL GG . 
+ Wrote the paper : YZ LR JFK EO NS PT ES GW EB XX JP DF EL GG .
\ No newline at end of file
--- a/data/TEXT_FILES/useful_txt/26261330.txt 0 → 100644
View file @27818a9
+++ b/data/TEXT_FILES/useful_txt/26261330.txt 0 → 100644
View file @27818a9
+ Edited by Gerald R. Smith, Fred Hutchinson Cancer Research Center, Seattle, WA, and accepted by the Editorial Board July 15, 2015 (received for review December 18, 2014)
+ Understanding molecular mechanisms in the context of living cells requires the development of new methods of in vivo biochemical analysis to complement established in vitro biochemistry. A critically important molecular mechanism is genetic recombination, required for the beneficial reassortment of genetic information and for DNA double-strand break repair (DSBR). Central to recombination is the RecA (Rad51) protein that assembles into a spiral filament on DNA and mediates genetic exchange. Here we have developed a method that combines chromatin immunoprecipitation with next-generation sequencing (ChIP-Seq) and mathematical modeling to quantify RecA protein binding during the active repair of a single DSB in the chromosome of Escherichia coli. We have used quantitative genomic analysis to infer the key in vivo molecular parameters governing RecA loading by the helicase/nuclease RecBCD at recombination hot-spots, known as Chi. Our genomic analysis has also revealed that DSBR at the lacZ locus causes a second RecBCD-mediated DSBR event to occur in the terminus region of the chromosome, over 1 Mb away.
+ DNA double-strand break repair ( DSBR ) is essential for cell survival and repair-deficient cells are highly sensitive to chromosome breakage . 
+ In Escherichia coli , a single unrepaired DNA DSB per replication cycle is lethal , illustrating the critical nature of the repair reaction ( 1 ) . 
+ DSBR in E. coli is mediated by homologous recombination , which relies on the RecA protein to efficiently recognize DNA sequence identity between two molecules . 
+ RecA homologs are widely conserved from bacteriophages to mammals , where they are known as the Rad51 proteins ( 2 ) . 
+ The RecA protein plays its central role by binding single-stranded DNA ( ssDNA ) to form a presynaptic filament that searches for a homologous double-stranded DNA ( dsDNA ) donor from which to repair . 
+ It then catalyzes a strand-exchange reaction to form a joint molecule ( 3 ) , which is stabilized by the branch migration activities of the RecG and RuvAB proteins ( 4 ) . 
+ The joint molecule is then resolved by cleavage at its four-way Holliday junction by the nuclease activity of RuvABC ( 5 , 6 ) . 
+ RecA binding at the site of a DSB is dependent on the activity of the RecBCD enzyme ( Fig. 1A ) . 
+ RecBCD is a helicase-nuclease that binds to dsDNA ends , then separates and unwinds the two DNA strands using the helicase activities of the RecB and RecD subunits ( see refs. 7 and 8 for recent reviews ) . 
+ RecD is the faster motor of the two and this consequently results in the formation of a ssDNA loop ahead of RecB ( Loop 1 in Fig. 1A ) ( 9 ) . 
+ As the enzyme translocates along dsDNA , the 3 ′ - terminated strand is continually passed through the Chi-scanning site thought to be located in the RecC protein ( 10 ) . 
+ When a Chi sequence ( the octamer 5 ′ - GCTGGTGG-3 ′ ) enters this recognition domain , the RecD motor is disengaged and the 3 ′ strand continues to be unwound by RecB . 
+ Under in vitro conditions , where the concentration of magnesium exceeds that of ATP , the 3 ′ end ( unwound by RecB ) is rapidly digested before Chi recognition , whereas the 5 ′ end ( unwound by RecD ) is intermittently cleaved ( 11 , 12 ) . 
+ After Chi recognition the 3 ′ end is no longer cleaved but the nuclease domain of RecB continues to degrade the 5 ′ end as it exits the enzyme ( 11 , 12 ) . 
+ Under in vitro conditions where the concentration of ATP exceeds that of magnesium , unwinding takes place but the only site of cleavage detected is ∼ 5 nucleotides 3 ′ of the Chi sequence ( 13 , 14 ) . 
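+ Because Chi is a fixed octamer , candidate sites can be located by a plain string scan over both strands. The sketch below is illustrative only: the function names and the toy sequence are hypothetical, and the scan reports strand but does not decide which sites are correctly orientated relative to a given DSB end.

```python
CHI = "GCTGGTGG"  # Chi octamer, written 5'->3'


def reverse_complement(seq):
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_chi_sites(seq):
    """Return (0-based start, strand) for every Chi octamer in `seq`.

    '+' means the octamer reads 5'-GCTGGTGG-3' on the given strand;
    '-' means it reads that way on the complementary strand.
    """
    seq = seq.upper()
    rc_chi = reverse_complement(CHI)
    sites = []
    for i in range(len(seq) - len(CHI) + 1):
        window = seq[i:i + len(CHI)]
        if window == CHI:
            sites.append((i, "+"))
        elif window == rc_chi:
            sites.append((i, "-"))
    return sites


# Toy sequence containing one Chi site on each strand.
toy_seq = "AAGCTGGTGGTTCCACCAGCAA"
print(find_chi_sites(toy_seq))
```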
+ Because the RecB motor continues to operate while the RecD motor is disengaged , Loop 1 is converted to a second loop located between the RecB and RecC subunits or to a tail upon release of the Chi sequence from its recognition site . 
+ We therefore describe this single-stranded region as Loop/Tail 2 in Fig. 1A . 
+ After the whole of Loop 1 is converted to Loop/Tail 2 , this second single-stranded region continues to grow as long as the RecB subunit unwinds the dsDNA . 
+ The RecBCD enzyme enables RecA protein to load on to Loop/Tail 2 to generate the presynaptic filament necessary to search for homology and initiate strand-exchange ( 15 ) . 
+ Finally , the RecBCD enzyme stops translocation and disassembles as it dissociates from the DNA , releasing a DNA-free RecC subunit ( 16 ) . 
+ Author contributions : C.A.C. , M.E.K. , and D.R.F.L. designed research ; C.A.C. and M.F. performed research ; C.A.C. , M.F. , V.D. , M.E.K. , and D.R.F.L. analyzed data ; and C.A.C. , V.D. , M.E.K. , and D.R.F.L. wrote the paper . 
+ The authors declare no conflict of interest.
+ This article is a PNAS Direct Submission . 
+ G.R.S. is a guest editor invited by the Editorial Board . 
+ Freely available online through the PNAS open access option . 
+ Data deposition : The data reported in this paper have been deposited in the Gene Expression Omnibus ( GEO ) database , www.ncbi.nlm.nih.gov/geo ( accession no. GSE71249 ) . 
+ 1To whom correspondence may be addressed . 
+ Email : Meriem.Elkaroui@ed.ac.uk or D.Leach@ed.ac.uk . 
+ This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10.1073/pnas.1424269112/-/DCSupplemental . 
+ Our understanding of the action of RecBCD and RecA has been the result of more than 40 years of genetic analysis and biochemical investigation of these purified proteins in vitro . 
+ However , relatively little is known about their activities on the genomic scale . 
+ To investigate these reactions in vivo , we have used RecA chromatin immunoprecipitation with next-generation sequencing ( ChIP-Seq ) in an experimental system that allows us to introduce a single and fully repairable DSB into the chromosome of E. coli ( 1 ) . 
+ Because DSBR by homologous recombination normally involves the repair of a broken chromosome by copying the information on an unbroken sister chromosome , our laboratory has previously developed a procedure for the cleavage of only one copy of two genetically identical sister chromosomes ( 1 ) . 
+ We have made use of the observation that the hairpin nuclease SbcCD specifically cleaves only one of the two sister chromosomes following DNA replication through a 246-bp interrupted palindrome to generate a two-ended DSB ( 1 ) . 
+ As shown in Fig. 1B , this break is fully repairable and we have shown that recombination-proficient cells suffer very little loss of fitness in repairing such breaks ( 17 ) . 
+ Here we investigate in vivo and in a quantitative manner the first steps of DSBR : because the outcome of RecBCD action is understood to be the loading of RecA on DNA in a Chi-dependent manner , we use RecA-ChIP to reveal the consequences of RecBCD action on a genomic scale during DSBR . 
+ Analyses of most ChIP-Seq datasets focus on the identification of regions of significant enrichment of a given protein but do not take into account the underlying mechanisms giving rise to the binding ( 18 ) . 
+ We reasoned that given the detailed mechanistic understanding of RecBCD in vitro , we could gain a deeper insight into its in vivo functions by developing a mathematical model of RecBCD action that would enable us to estimate the mechanistic parameters of the complex in live cells . 
+ Our ChIP data indicate that RecA is indeed loaded on to DNA in a Chi-dependent manner and we have used our mathematical model to infer the parameters of RecBCD action in vivo on a genomic scale . 
+ Furthermore , our analysis reveals that DSBR at lacZ induces DSBR in the terminus region of the chromosome , an unanticipated observation illuminated by the genomic scale of our data . 
+ Results
+ DSB-Dependent RecA Loading to DNA . 
+ We initially investigated the in vivo binding of RecA at the site of a DSB by ChIP and assayed RecA -- DNA interactions by quantitative PCR ( qPCR ) . 
+ The DSB was generated by SbcCD-mediated cleavage of a 246-bp interrupted DNA palindrome inserted in the lacZ gene ( lacZ::pal246 ) ( 1 ) . 
+ In the absence of a DSB , there was no RecA enrichment detected in the 40-kb region surrounding lacZ ( Fig. 2A ) . 
+ However , following the induction of a DSB there was significant RecA binding detected on both sides of lacZ ( Fig. 2B ) . 
+ This binding corresponded to the first correctly orientated Chi site on either side of the DSB and spread out over several kilobases of DNA , consistent with the formation of a RecA filament on a single-strand of DNA generated by RecBCD , followed by strand invasion to form a joint molecule . 
+ These data suggested that , as expected and consistent with in vitro data , RecA is loaded at the DSB in a Chi-dependent manner ( 12 , 19 ) . 
+ After recognition of the Chi sequence , qPCR analysis detected the binding of RecA to a 30-kb region of DNA surrounding the DSB . 
+ Large peaks of RecA enrichment were detected immediately after the first Chi sites on both sides of the DSB , with RecA enrichment decreasing at Chi sites further away from lacZ ( Fig. 2B ) . 
+ On the origin-proximal side of the break we detected binding of RecA following not only the first Chi site encountered but also subsequent Chi sites . 
+ Eighteen-fold RecA enrichment was observed following the Chi site positioned closest to the DSB at a locus on the origin-proximal side and a subsequent peak of 12-fold RecA enrichment was detected at loci positioned ∼ 13 kb origin-proximal to the DSB . 
+ This second peak is consistent with the presence of four Chi sites in this region . 
+ The origin-distal side of the DSB does not have a second peak , with RecA enrichment plateauing at fivefold enrichment at the sites tested between 7 kb and 13 kb , as expected given the presence of only a single Chi site in this region . 
+ The distribution of RecA binding also suggested that the two sides of the DSB might be processed differently , with a higher RecA-enrichment observed at the first Chi position on the origin-proximal side of the break compared with the origin-distal side ( Fig. 2B ) . 
+ RecA Loading at Synthetic Arrays of Three Chi Sites . 
+ To confirm that RecA was indeed loaded in relation to the recognition of Chi sites by the RecBCD enzyme , we investigated RecA binding in the presence of synthetic arrays of three Chi sites inserted at 3 kb on either side of the DSB . 
+ In vitro studies have shown that a single Chi site is recognized by RecBCD with an efficiency of 20 -- 40 % , which suggests that following a DSB , a substantial number of RecBCD molecules fail to recognize Chi ( 20 ) . 
+ An efficiency of Chi recognition in vivo similar to that obtained in vitro would explain the observed Chi distribution at lacZ::pal246 . 
+ Previously , arrays of three synthetic Chi sites have been shown to be recognized by RecBCD with an efficiency of 60 -- 80 % ( 21 ) . 
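+ As a rough illustration of how single-site and array efficiencies relate, the sketch below computes the chance that at least one site in an array is recognized. It assumes each site is recognized independently with the same probability, which is an assumption of this example and not a claim from the cited studies; the function name is hypothetical. Under that assumption, single-site efficiencies of 20 to 40 % give about 49 to 78 % for three sites, in the same range as the reported 60 to 80 %.

```python
def array_recognition_probability(p_single: float, n_sites: int) -> float:
    """Probability that at least one of n_sites Chi sites is recognized.

    Assumes (for illustration only) that every site is recognized
    independently with the same probability p_single.
    """
    if not 0.0 <= p_single <= 1.0:
        raise ValueError("p_single must lie in [0, 1]")
    if n_sites < 0:
        raise ValueError("n_sites must be non-negative")
    # Complement of "every site in the array is missed".
    return 1.0 - (1.0 - p_single) ** n_sites


if __name__ == "__main__":
    for p in (0.2, 0.3, 0.4):
        print(p, round(array_recognition_probability(p, 3), 3))
```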
+ We reasoned that placing these arrays either side of the DSB would focus a similar proportion of RecA loading closer to lacZ . 
+ Furthermore , we placed the Chi sites at equal distances ( 3 kb ) on the two sides of the break to increase the symmetry of the reaction . 
+ ChIP-qPCR revealed that , in vivo , the triple Chi arrays do indeed stimulate RecA binding closer to the DSB and that binding was enhanced relative to that observed at single endogenous Chi sites ( Fig. 2 B and D ) . 
+ Interestingly , the asymmetry in RecA binding to the DNA following a DSB remained , with more RecA bound to the origin-proximal side of the DSB compared with the origin-distal side ( Fig. 2D ) . 
+ Furthermore , despite the 38-fold RecA enrichment detected at a locus ∼ 3 kb origin-proximal to the DSB , there was still as much as 15-fold RecA enrichment observed at loci following endogenous Chi sites subsequent to the triple-Chi arrays . 
+ This finding confirmed that , like single Chi sites , the triple Chi arrays failed to be recognized in a detectable proportion of the population and that successive Chi sites are required for efficient DSBR . 
+ High-Resolution Analysis of RecA Loading by ChIP-Seq . 
+ To quantify RecA binding to DNA in relation to Chi , ChIP was combined with high-throughput sequencing ( ChIP-Seq ) to provide a genome-wide analysis of RecA -- DNA interactions following a DSB ( Fig. 3 ) . 
+ These experiments were carried out with the arrays of three Chi sites at 3 kb on either side of the DSB site to focus the reaction at equidistant sites on both sides of the break . 
+ RecA was loaded , at the site of a DSB in the lacZ gene , in a Chi-dependent manner with approximately twofold more RecA on the origin-proximal side of the DSB compared with the origin-distal side ( Fig. 3B ) . 
+ This finding is consistent with the results obtained by ChIP-qPCR ( Fig. 2 ) and suggests that the two DNA ends are not equally competent for Chi recognition . 
+ Because previous work has shown that SbcCD generates a two-ended break in a RecB mutant ( 1 ) , we suggest that approximately half of the DSBs arising at the interrupted palindrome are converted from two-ended to one-ended structures by RecBCD action . 
+ This could happen if RecBCD traveling on the origin-distal end catches up with the replication fork and dissociates before recognizing a Chi sequence ( Fig. S1 ) . 
+ To further determine the role of subsequent Chi sites during DSBR , we deleted all of the endogenous Chi sites within a 15-kb region on either side of the break , leaving only the triple-Chi arrays positioned 3 kb on either side of lacZ . 
+ ChIP-Seq analysis revealed that , although the level of enrichment stimulated by the triple-Chi array remained unchanged , RecA binding was decreased in the 15-kb region where the Chi sites had been deleted ( Fig. 3C ) . 
+ Strikingly , this decrease was correlated with an increase in RecA binding caused by Chi sites more than 20 kb away from the DSB ( Fig. S2 ) . 
+ This finding confirmed that RecBCD enzyme complexes that did not act at the array of three Chi sites progress many kilobases further on the DNA until they do recognize a Chi site . 
+ Inferring the Parameters of RecBCD Activity in Vivo from High-Resolution ChIP-Seq Data Using Mathematical Modeling . 
+ We reasoned that the high spatial resolution afforded by ChIP-Seq data could be exploited to reveal quantitative aspects of the molecular behavior of RecBCD-mediated loading of RecA in living cells . 
+ To interpret the quantitative information contained in the high-resolution ChIP-Seq data in terms of the parameters of RecBCD action in vivo , we developed a mathematical model of enzyme action . 
+ Our mathematical model is based on the known in vitro properties of RecBCD and its crystal structure ( 7 , 8 , 22 ) and is described in detail in the SI Appendix ( see also Fig. 1A ) . 
+ The model 's main assumptions are the following : Before Chi recognition , the RecBCD complex translocates along the DNA with both the RecD and RecB motors engaged . 
+ RecD is the lead motor and the difference in speed of the two motors leads to the accumulation of a ssDNA loop ( Loop 1 ) ahead of RecB that depends on the motor speed ratio ( VB/VD ) and the distance traveled by the enzyme . 
+ Chi recognition is stochastic and we denote pChi the probability of Chi recognition . 
+ Upon Chi recognition , the RecD motor is disengaged and Loop 1 is converted to Loop/Tail 2 , which is extended using the RecB motor and RecA is loaded with equal probability across the ssDNA . 
+ We denote pstop the probability of RecBCD dissociating from DNA or stopping RecA loading . 
+ Using these assumptions , we calculated the probability of RecA loading at a genomic position in the vicinity of the DSB depending on the position of the DSB and the Chi sites , VB/VD , pChi , and pstop . 
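+ To make the dependence on Chi positions, pChi, and pstop concrete, here is a deliberately simplified sketch of such a loading probability. It is not the model from the SI Appendix. It ignores Loop 1 and VB/VD, assumes independent recognition at each Chi site, assumes no dissociation before Chi, and treats pstop as a per-base-pair stopping probability. All names and these simplifications are assumptions of this example.

```python
from typing import List, Sequence


def reca_loading_profile(
    positions: Sequence[int],
    chi_sites: Sequence[int],
    p_chi: float,
    p_stop: float,
) -> List[float]:
    """Toy probability that each position (bp from the DSB) carries RecA.

    Simplifying assumptions (not from the source model):
      * Chi sites are met in order of distance from the break;
      * each is recognized independently with probability p_chi;
      * after recognition, loading stops with probability p_stop per bp.
    """
    chi_sorted = sorted(chi_sites)
    profile = []
    for x in positions:
        prob = 0.0
        for k, chi in enumerate(chi_sorted):
            if x < chi:
                break  # this and later Chi sites lie beyond x
            # First k sites missed, site k recognized.
            recognized_here = (1.0 - p_chi) ** k * p_chi
            # Loading is still ongoing after (x - chi) base pairs.
            still_loading = (1.0 - p_stop) ** (x - chi)
            prob += recognized_here * still_loading
        profile.append(prob)
    return profile


if __name__ == "__main__":
    # One Chi site 3 kb from the break, mean loading length about 10 kb.
    print(reca_loading_profile([1000, 3000, 13000], [3000], 0.3, 1e-4))
```

+ In this toy version the signal is zero before the first Chi site, jumps to pChi at the site, and decays geometrically beyond it; additional Chi sites add further steps.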
+ Using a strain in which we had deleted all of the endogenous Chi sites within a 15-kb region on either side of the break , leaving only the artificial Chi arrays positioned 3 kb on either side of lacZ , we varied the number of Chi sites in the origin-proximal array from one to six . 
+ We then compared the mathematical model prediction to the RecA ChIP-Seq data obtained for these strains . 
+ We observed that as the number of Chi sites in the origin-proximal array was increased , RecA binding close to the DSB was increased relative to the proportion of events involving recognition of Chi sites further away from the break ( Fig. 4 ; a direct comparison between one Chi site and the six Chi array is provided in Fig. S3 ) . 
+ RecA loading at Chi sites 40 kb away from the break was most clearly noticeable in the strain with a single Chi site 3-kb origin-proximal to the break ( Fig. 4A ) . 
+ Strikingly , the mathematical model accurately captured the shape of the RecA distribution in all different configurations of Chi positioning with respect to the DSB , indicating that it reflects the main features of RecBCD-mediated end resection ( Fig. 4 A -- F ) . 
+ We have used maximum-likelihood estimation to infer the parameters of the mathematical model from the in vivo data ( Fig. 4G ) . 
+ Whereas pChi ( 0.20 -- 0.43 ) and RecBCD processivity ( 10 kb ) estimates were close to those obtained in vitro , the motor speed ratio , VB/VD ( 0.94 -- 0.96 ) , was significantly higher than previously reported in vitro ( 9 , 19 , 20 , 23 , 24 ) . 
+ Interestingly , we observed that the mathematical model 's estimate for pChi decreased as the number of Chi sites in the array was increased ( Fig. 4G ) . 
+ As pChi is the probability of recognizing one Chi site , this suggests that when Chi sites are positioned very close together ( Chi sites are positioned 10 bp apart in the artificial Chi arrays ) they are not recognized independently by RecC . 
+ This would lead to an underestimation of pChi in the strains that have multiple Chi sites in the array . 
+ Therefore , we focused our interpretation of the data on the strain with only one Chi site positioned 3-kb origin-proximal of the DSB ( Fig. 4A ) . 
+ Recent in vitro single-molecule experiments have suggested that there may be two populations of RecBCD molecules each with a different velocity ( 25 ) . 
+ We extended the mathematical model allowing for two populations of RecBCD with distinct VB/VD and pChi . 
+ Strikingly , the extended model showed a better fit to the data ( SI Appendix , Fig. 7 ) and had a lower Bayesian Information Criterion score , indicating that this better fit is statistically significant . 
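+ For readers unfamiliar with the criterion, the sketch below shows the standard Bayesian Information Criterion, BIC = k ln(n) - 2 ln(L), where a lower score is preferred. The parameter counts and log-likelihood values in the usage example are invented for illustration and are not taken from the paper.

```python
import math


def bic(log_likelihood: float, n_params: int, n_obs: int) -> float:
    """Standard BIC: k * ln(n) - 2 * ln(L). Lower is better."""
    if n_obs <= 0:
        raise ValueError("n_obs must be positive")
    return n_params * math.log(n_obs) - 2.0 * log_likelihood


if __name__ == "__main__":
    # Hypothetical numbers: the richer model must improve the likelihood
    # enough to pay for its extra parameters.
    one_population = bic(log_likelihood=-120.0, n_params=3, n_obs=1000)
    two_populations = bic(log_likelihood=-90.0, n_params=7, n_obs=1000)
    print(one_population, two_populations)
```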
+ Maximum-likelihood estimates of the parameters of this extended model indicated two clearly separated populations with 46 % of RecBCD with low pChi ( 0.26 ) and high VB/VD ( 0.86 ) and 54 % with higher pChi ( 0.86 ) and lower VB/VD ( 0.58 ) ( SI Appendix ) . 
+ ChIP-Seq Reveals RecA Binding to Other Regions in the Genome , Including DSB-Dependent Binding in the Terminus of the Chromosome . 
+ Genome-wide analysis of our dataset revealed DSB-independent RecA binding at distinct loci across the genome ( Fig. 5 A and B ) . 
+ These loci include the rRNA genes , tRNA genes , and ribosomal protein genes . 
+ The positions of these loci of RecA binding were not associated with the positions of Chi sites , suggesting that the RecA binding at these sites is not RecBCD-dependent . 
+ ChIP signal at highly transcribed genes has been reported for other proteins , including Smc of Bacillus subtilis and SeqA of E. coli , and it is unclear whether such a signal is directly related to RecA activity ( 26 , 27 ) . 
+ Surprisingly , we observed that the DSB at the lacZ locus induced RecA binding in the region of the chromosome involved in the termination of replication ( Fig. 5 A and C ) . 
+ This RecA binding occurred at positions of Chi sites , characteristic of RecBCD-mediated processing . 
+ This finding therefore indicates the presence of additional , indirectly generated , double-strand ends in the region of the chromosome containing dif , the site responsible for the resolution of chromosome dimers by XerCD site-specific recombination ( 28 ) . 
+ Discussion
+ RecA Protein Binding to a DNA DSB in Vivo Is Determined by the Chi Sites in the Region Surrounding the Break . 
+ We have used a system that accurately introduces a single site-specific DNA DSB into one copy of the replicated E. coli chromosome . 
+ The system uses the fact that a 246-bp interrupted palindrome is cleaved by the SbcCD enzyme at the site of a DNA hairpin structure formed on only one of the replicated chromosomal copies ( 1 ) . 
+ We predict the hairpin to be formed on the lagging-strand template because of its single-stranded nature and that under our growth conditions , repair occurs efficiently , presumably using the uncleaved sister chromosome as a template ( 4 , 17 ) . 
+ Using ChIP in E. coli with antibodies against the RecA protein , we investigated the behavior of this protein as it is engaged in repairing this DSB . 
+ We then combined ChIP with whole-genome sequencing to map these RecA -- DNA interactions on a genome-wide scale . 
+ As predicted by the in vitro biochemistry of RecBCD enzyme , following a DSB , RecA protein is loaded onto DNA in relation to the Chi sites ( 5 ′ - GCTGGTGG-3 ′ ) surrounding the break , and using a simple mathematical model we were able to infer the probability of Chi recognition in vivo . 
+ Although the mathematical model we used is specific to RecBCD , such ability to infer the parameters of enzyme action in vivo from a combination of genomic analysis and mathematical modeling promises to be applicable to other macromolecular reactions , such as the activity of RNA polymerase . 
+ Inference of the Parameters of RecBCD Action in Vivo . 
+ We have coupled quantitative genomic analysis of RecA binding with mathematical modeling of the RecBCD complex in its loading of RecA to infer , based on the assumptions of the mathematical model , the molecular parameters of RecBCD action in live cells . 
+ Initial analysis using a mathematical model with one mode of action of RecBCD led to estimates of the probability of Chi recognition and processivity values similar to those that had previously been measured in vitro ( 19 , 20 ) . 
+ However , we observed that the inferred ratio of the two motors ' speed ( VB/VD = 0.94 -- 0.96 ) was significantly higher than the value of 0.6 observed in vitro from studies of mutant enzymes defective for the helicase motors of RecB or RecD ( 25 ) and from evaluation of the average rate of Loop 1 formation relative to total unwinding by the wild-type enzyme ( 9 ) . 
+ We calculated ( SI Appendix , Eq. [ 2 ] ) that a ratio of 0.95 would result in the production of a single-strand loop before Chi ( Loop 1 ) of 3 kb for a Chi site positioned 60 kb from the break . 
+ In contrast , a VB/VD of 0.6 similar to that reported in vitro would result in an extremely long loop of 40 kb in vivo . 
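+ The two loop lengths quoted here can be reproduced with a simple relation. The SI equation itself is not shown in this text, so the formula below is a reconstruction that happens to match both numbers, not necessarily the authors' Eq. [ 2 ]: if RecB must travel a distance d to reach Chi while the faster RecD motor runs ahead, the loop is d × ( VD/VB - 1 ). The function name is hypothetical.

```python
def loop1_length_kb(chi_distance_kb: float, speed_ratio: float) -> float:
    """Loop 1 length when RecB reaches a Chi site chi_distance_kb away.

    speed_ratio is VB/VD (slow motor over fast motor, so 0 < ratio <= 1).
    Assumed relation, chosen because it reproduces the figures quoted in
    the text: loop = distance * (1 / ratio - 1).
    """
    if not 0.0 < speed_ratio <= 1.0:
        raise ValueError("speed_ratio must lie in (0, 1]")
    return chi_distance_kb * (1.0 / speed_ratio - 1.0)


if __name__ == "__main__":
    for ratio in (0.95, 0.6):
        print(ratio, round(loop1_length_kb(60.0, ratio), 1))
```

+ With a Chi site 60 kb from the break this gives roughly 3 kb at a ratio of 0.95 and 40 kb at a ratio of 0.6.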
+ These differences in VB/VD might be because of differences in RecBCD activity in vitro and in vivo . 
+ However , a wide distribution in the values of VB/VD has been observed in vitro ( 9 ) , and two broad populations of RecBCD molecules with different velocities have been reported ( 25 ) . 
+ When we extended the mathematical model to explore the possibility of two RecBCD populations , each with a different mode of action , we found that a two-population model was supported by the data . 
+ Assuming the existence of two populations could be an oversimplification and we can not rule out more complex RecBCD populations . 
+ However , it is interesting to note that under this two-population model , the inferred parameters show a sharp contrast between molecules with a low probability of Chi recognition associated with a high motor speed ratio and molecules with a higher pChi and lower VB/VD . 
+ Both these combinations of parameters will result in approximately the same average length of Loop 1 given the average density of Chi sites on the genome ( see calculation in SI Appendix ) . 
+ RecBCD complexes with low pChi may have to travel very far before Chi recognition but because of their high VB/VD they accumulate a relatively short Loop 1 . 
+ In contrast , RecBCD molecules with high pChi , which will recognize Chi motifs close to the break , accumulate a longer Loop 1 . 
+ This trade-off may indicate that controlling the size of Loop 1 has important consequences for RecBCD function . 
+ The parameter estimates obtained here need to be interpreted within the assumptions of the mathematical model . 
+ For example , we have assumed that the whole single-stranded region generated by RecBCD is covered equally well by RecA . 
+ However , if only part of it is covered by RecA or if RecA binding extends into the adjacent double-stranded region , the inference of pstop and its interpretation would be affected . 
+ For example , if RecBCD continues unwinding after ceasing RecA loading , this part of the single strand would not be detected in the experimental assay and RecBCD processivity would be underestimated . 
+ Therefore , the pstop inferred here is to be understood as an `` effective processivity of RecA loading by RecBCD , '' which is the combination of its DNA unwinding and RecA loading activities . 
+ Similarly , the estimation of VB/VD could be affected if RecA is not loaded with the same probability across the Chi-proximal region of the ssDNA . 
+ A DSB at lacZ Induces DSBR in the Terminus Region of the Chromosome . 
+ In our system , we induce a DSB at the site of an interrupted DNA palindrome inserted at the lacZ locus , which lies about half way between the single origin of replication and the terminus . 
+ Our genomic analysis has revealed that DSBR in the lacZ region of a chromosome can result in DSBR ( characterized by Chi-correlated RecA binding ) in the terminus region surrounding dif , at a distance of over 1 Mb from lacZ . 
+ This RecA binding indicates that following a DSB at lacZ , dsDNA ends are generated in the region containing the dif site , which is required for the resolution of chromosome dimers by XerCD ( 28 ) . 
+ The signal in this region is significantly lower than at lacZ , suggesting that these double-stranded ends only appear in a subpopulation of cells . 
+ However , it is currently unclear how a break at lacZ causes breakage in the terminus region . 
+ The RecA bound is approximately symmetrically distributed on the two sides of the dif-containing region . 
+ The observation that strains undergoing these breaks are fully viable ( 17 ) leads us to believe that unbroken sister chromosomal DNA in the dif-containing region must also be present to facilitate efficient repair . 
+ Whether this RecA binding implies the presence of two-ended breaks or of two , equally frequent , single-ended breaks remains to be determined . 
+ Interestingly , the existence of double-strand ends in the terminus region of the chromosome has been hypothesized previously . 
+ Kogoma has proposed that recombination-dependent DNA replication may be responsible for induced stable DNA replication , which can be initiated at a sequence known as oriM2 in the terminus region ( 29 ) . 
+ However , oriM2 has not been mapped accurately and it is not known whether the DSB that we observe relates to this origin . 
+ Several other results , such as the existence of terminal recombination ( 30 -- 32 ) and the striking replication profile of a recB mutant ( 33 ) , indicate that the terminus region of the chromosome presents an area of importance to recombination . 
+ However , there are very likely several different reactions taking place . 
+ Our observation of a DSBR event close to dif induced by DSBR at lacZ provides a clear physical demonstration of one such interaction in this region . 
+ Experimental Procedures
+ Bacterial Strains and Growth . 
+ All strains are derivatives of E. coli K12 MG1655 ( 34 ) and are listed in Table S1 . 
+ Cells were grown in M9 minimal media supplemented with 0.2 % casamino acids , 0.5 % glucose , 5 μM CaCl2 , and 1 mM MgSO4 at 37 °C . 
+ Mutations were introduced by P1 transduction or plasmid-mediated gene replacement ( PMGR ) ( 35 -- 38 ) using the plasmids described in Table S2 . 
+ All primers used for cloning and genotyping are detailed in Table S3 . 
+ ChIP Sample Preparation . 
+ All ChIP experiments were performed with ∼ 5 × 10^8 cells growing in exponential growth phase ( OD600nm 0.2 -- 0.25 ) . 
+ RecA -- DNA interactions were chemically cross-linked with formaldehyde ( Sigma-Aldrich ; final concentration 1 % ) for 10 min at 22.5 °C . 
+ Cross-linking was quenched by the addition of glycine ( Sigma-Aldrich ; final concentration 0.5 M ) . 
+ Cells were collected by centrifugation and washed three times in ice-cold 1 × PBS . 
+ The pellet was then resuspended in 250 μL ChIP buffer [ 200 mM Tris · HCl ( pH 8.0 ) , 600 mM NaCl , 4 % ( vol/vol ) Triton X , Complete protease inhibitor mixture EDTA-free ( Roche ) ] . 
+ Sonication of cross-linked samples was performed using the Diagenode Bioruptor at 30-s intervals for 10 min at high amplitude . 
+ After sonication , 350 μL of ChIP buffer was added to each sample , the samples were mixed by gentle pipetting and 100 μL of each lysate was removed and stored as `` input . '' 
+ Immunoprecipitation was performed overnight at 4 °C using 1/100 anti-RecA antibody ( Abcam , ab63797 ) . 
+ IP samples were then incubated with Protein G Dynabeads ( Life Technologies ) for 2 h at room temperature . 
+ All samples were washed three times with 1 × PBS + 0.02 % Tween-20 before resuspending the Protein G dynabeads in 200 μL of TE buffer [ 10 mM Tris ( pH 7.4 ) , 1 mM EDTA ] + 1 % SDS . 
+ Next , 100 μL of TE buffer was added to the input samples and all samples were then incubated at 65 °C for 10 h to reverse formaldehyde cross-links . 
+ DNA was isolated using the MinElute PCR purification kit ( Qiagen ) . 
+ DNA was eluted in 50 μL of TE buffer using a two-step elution . 
+ Samples were stored at − 20 °C . 
+ Library Preparation for High-Throughput Sequencing . 
+ Input and ChIP samples were processed following New England Biolab 's protocol from the NEBNext ChIP-Seq library preparation kit . 
+ Briefly , 200 ng of input and ChIP-enriched DNA was subject to end repair to fill in ssDNA overhangs , remove 3 ′ phosphates and 5 ′ phosphorylate the sheared DNA . 
+ Klenow exo - was used to adenylate the 3 ′ ends of the DNA and NEXTflex DNA barcodes ( Bioo Scientific ) were ligated using T4 DNA ligase . 
+ After each step , the DNA was purified using the Qiagen MinElute PCR purification kit according to the manufacturer 's instructions . 
+ After adaptor ligation , the adaptor-modified DNA fragments were enriched by PCR using primers corresponding to the beginning of each adaptor . 
+ Finally , agarose gel electrophoresis was used to size select adaptor-ligated DNA with an average size of ∼ 275 bp . 
+ All samples were quantified on a Bioanalyzer ( Agilent ) before being sequenced on the Illumina HiSeq 2000 . 
+ RecA binds to both ssDNA and dsDNA in presynaptic and postsynaptic complexes ( 39 -- 41 ) . 
+ It was previously believed that ssDNA could not be detected by ChIP-Seq . 
+ However , several studies have recently shown that this is not the case ; ssDNA is rendered double-stranded during the library preparation process through the formation of DNA hairpins that arise as a result of regions of microhomology ( 42 -- 44 ) . 
+ This allows the DNA to be amplified and detected by ChIP-Seq . 
+ These findings are consistent with our data , which show a similar pattern of RecA binding detected using both qPCR and high-throughput sequencing . 
+ 1 . 
+ Eykelenboom JK , Blackwood JK , Okely E , Leach DRF ( 2008 ) SbcCD causes a double-strand break at a DNA palindrome in the Escherichia coli chromosome . 
+ Mol Cell 29 ( 5 ) :644 -- 651 . 
+ 2 . 
+ Cromie GA , Connelly JC , Leach DR ( 2001 ) Recombination at double-strand breaks and DNA ends : Conserved mechanisms from phage to humans . 
+ Mol Cell 8 ( 6 ) :1163 -- 1174 . 
+ 3 . 
+ Kowalczykowski SC , Dixon DA , Eggleston AK , Lauder SD , Rehrauer WM ( 1994 ) Biochemistry of homologous recombination in Escherichia coli . 
+ Microbiol Rev 58 ( 3 ) :401 -- 465 . 
+ 4 . 
+ Mawer JS , Leach DR ( 2014 ) Branch migration prevents DNA loss during double-strand break repair . 
+ PLoS Genet 10 ( 8 ) : e1004485 . 
+ 5 . 
+ Connolly B , et al. ( 1991 ) Resolution of Holliday junctions in vitro requires the Escherichia coli ruvC gene product . 
+ Proc Natl Acad Sci USA 88 ( 14 ) :6063 -- 6067 . 
+ 6 . 
+ Connolly B , West SC ( 1990 ) Genetic recombination in Escherichia coli : Holliday junctions made by RecA protein are resolved by fractionated cell-free extracts . 
+ Proc Natl Acad Sci USA 87 ( 21 ) :8476 -- 8480 . 
+ 7 . 
+ Dillingham MS , Kowalczykowski SC ( 2008 ) RecBCD enzyme and the repair of double-stranded DNA breaks . 
+ Microbiol Mol Biol Rev 72 ( 4 ) :642 -- 671 . 
+ ChIP-Seq Data Analysis . 
+ For ChIP-Seq analysis , 50-bp single-end reads were mapped to the E. coli K12 MG1655 ( NC000913.3 ) ( 34 ) genome using Novoalign v2.07 ( www.novocraft.com ) . 
+ Novoalign uses the Needleman -- Wunsch algorithm to determine the optimal alignment of reads . 
+ Before mapping , the 3 ′ adaptor sequences were removed using fastx_clipper and the data collapsed using fastx_collapser to remove identical sequence reads ( hannonlab.cshl.edu/fastx_toolkit/index.html ) . 
+ The preparation of ChIP-Seq libraries requires a PCR of the adaptor ligated DNA . 
+ This can result in PCR duplication of certain DNA fragments . 
+ Removing duplicates mitigates the effects of PCR amplification bias so that regions of the genome do not appear more enriched than they actually are . 
+ The ChIP-Seq datasets in this study contained ∼ 4 % PCR duplicates and these were discarded . 
+ The data were also plotted without removing these duplicates and revealed that the trend in RecA binding was unchanged . 
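+ The collapsing step can be pictured with a few lines of Python. This is only a sketch of the idea of keeping one copy of each identical read sequence and reporting the discarded fraction; it is not the fastx_collapser implementation, and the function name and example reads are invented.

```python
from collections import Counter
from typing import List, Sequence, Tuple


def collapse_reads(reads: Sequence[str]) -> Tuple[List[str], float]:
    """Keep one copy of each identical read; return (unique reads, duplicate fraction)."""
    counts = Counter(reads)
    # Counter keeps first-seen order, so the output order is deterministic.
    unique = list(counts)
    duplicate_fraction = 1.0 - len(unique) / len(reads) if reads else 0.0
    return unique, duplicate_fraction


if __name__ == "__main__":
    example = ["ACGT", "ACGT", "TTGA", "GGCC", "TTGA"]
    unique_reads, fraction = collapse_reads(example)
    print(unique_reads, fraction)
```

+ Comparing plots made from the raw and the collapsed read sets, as described above, is a simple check that the enrichment pattern does not depend on the duplicates.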
+ Sequences were mapped with default parameters , allowing for a maximum of one mismatch per read ( novoalign -f DL4900_IP.fasta -d DL4900_genome.nix -r Random > DL4900_IP.sam ) . 
+ To report reads that have multiple alignment loci , we specified the -r parameter as either -r Random or -r None . 
+ In the first case Novoalign chooses a single alignment location at random among all of the alignment results ; in the second case , only the reads that map to a single genomic location are aligned ( www.novocraft.com ) . 
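+ The difference between the two settings can be sketched as a small decision rule. This only mimics the behavior described in the sentence above and is not Novoalign code; the function name and the representation of candidate loci as integers are assumptions of this example.

```python
import random
from typing import Optional, Sequence


def resolve_multimapper(
    loci: Sequence[int],
    strategy: str = "Random",
    rng: Optional[random.Random] = None,
) -> Optional[int]:
    """Pick the locus to report for one read, or None if the read is dropped.

    strategy "Random": choose one candidate locus at random.
    strategy "None":   report only reads with a single candidate locus.
    """
    if not loci:
        return None  # unmapped read
    if len(loci) == 1:
        return loci[0]  # uniquely mapped reads are always reported
    if strategy == "None":
        return None
    if strategy == "Random":
        return (rng or random).choice(list(loci))
    raise ValueError(f"unknown strategy: {strategy!r}")


if __name__ == "__main__":
    print(resolve_multimapper([100, 200], "None"))
    print(resolve_multimapper([100, 200], "Random", random.Random(0)))
```

+ Repeated regions such as the rRNA operons are where the two settings differ most, because reads there have several equally good loci.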
+ PyReadCounters was used to calculate the overlap between aligned reads and E. coli genomic features ( 45 ) . 
+ The distribution of reads along the E. coli genome was visualized using the Integrated Genome Browser ( 46 ) . 
+ Full details of all scripts are available upon request . 
+ Identification of RecA-Binding at the DSB . 
+ Because of the specific mechanism of RecBCD-mediated RecA loading observed around the DSB , classic peakcalling algorithms such as MACS ( 47 ) failed to recapitulate the RecA binding at this site . 
+ This is because RecA loading at a DSB is the result of a complex dynamic process that can not be described as a simple binding event . 
+ This suggests that , as has been observed for other datasets ( 48 ) , the shape of the peaks may carry important information . 
+ In particular , we reasoned that given the high spatial resolution of ChIP-Seq data , the position and shape of the peaks observed at Chi sites could give us quantitative information about the mechanism of RecBCD-mediated DSB repair in vivo . 
+ Therefore , we developed a mathematical model of RecBCD-dependent RecA loading to evaluate the probability that a nucleotide in the vicinity of a DSB is coated by the RecA protein . 
+ We then used maximum-likelihood estimation to extract the parameters of this model from the dataset . 
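+ As a toy illustration of maximum-likelihood estimation in this setting, suppose we could observe, for each break, which Chi site (first, second, third, ...) was the first to be recognized. If recognition were independent at each site, these indices would follow a geometric distribution, whose maximum-likelihood estimate has a closed form. This is far simpler than the fit described in the SI Appendix, and the data and function names below are invented.

```python
import math
from typing import Sequence


def geometric_log_likelihood(p: float, first_hit_indices: Sequence[int]) -> float:
    """Log-likelihood of p when index k has probability p * (1 - p) ** (k - 1)."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    return sum(math.log(p) + (k - 1) * math.log(1.0 - p) for k in first_hit_indices)


def mle_p_chi(first_hit_indices: Sequence[int]) -> float:
    """Closed-form geometric MLE: number of events / sum of indices."""
    if not first_hit_indices or min(first_hit_indices) < 1:
        raise ValueError("indices must be a non-empty sequence of integers >= 1")
    return len(first_hit_indices) / sum(first_hit_indices)


if __name__ == "__main__":
    observed = [1, 1, 2, 3, 1, 4]  # hypothetical first-recognized Chi indices
    estimate = mle_p_chi(observed)
    print(estimate, geometric_log_likelihood(estimate, observed))
```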
+ This mathematical model and the associated data analysis are described in detail in the SI Appendix . 
+ qPCR . 
+ All real-time qPCR reactions were carried out in 15-μL volumes in the MX3000P qPCR machine ( Agilent ) using the Brilliant II SYBR Green qPCR master mix ( Agilent ) . 
+ The temperature profile for all assays was 95 °C for 10 min followed by 40 cycles of 95 °C for 20 s and 60 °C for 60 s . 
+ All reactions were repeated in triplicate and the formation of PCR products of the correct lengths was confirmed by agarose gel electrophoresis . 
+ A full list of primers used for qPCR is given in Table S4 . 
+ Assay performance was checked by standard curve for all assays . 
+ Data were exported from the MxPro software to Microsoft Excel for analysis . 
+ The melting temperature of the qPCR primers was calculated by the manufacturer ( MWG Biotech ) . 
+ ACKNOWLEDGMENTS . 
+ We thank Dr. N. Molina and Dr. G. Sanguinetti for advice on data analysis ; Dr. Sander Granneman for advice on ChIP-Seq analysis ; Dr. Ralph Hector for advice on ChIP-Seq library preparation ; and Dr. M. White for the construction of pDL4690 . 
+ This research has been supported by a Medical Research Council studentship and a Medical Research Council Centenary Award ( to C.A.C. ) ; a Darwin Trust of Edinburgh postgraduate studentship ( to M.F. ) ; Marie Curie Fellowship PIOF-GA-2009-254082 -- DRIBAC ( to M.E.K. ) ; European Research Council Advanced Grant RULE-320823 ( to V.D. ) ; and a Medical Research Council programme Grant G0901622 ( to D.R.F.L. ) . 
+ 8 . 
+ Smith GR ( 2012 ) How RecBCD enzyme and Chi promote DNA break repair and recombination : A molecular biologist 's view . 
+ Microbiol Mol Biol Rev 76 ( 2 ) :217 -- 228 . 
+ 9 . 
+ Taylor AF , Smith GR ( 2003 ) RecBCD enzyme is a DNA helicase with fast and slow motors of opposite polarity . 
+ Nature 423 ( 6942 ) :889 -- 893 . 
+ 10 . 
+ Handa N , et al. ( 2012 ) Molecular determinants responsible for recognition of the single-stranded DNA regulatory sequence , χ , by RecBCD enzyme . 
+ Proc Natl Acad Sci USA 109 ( 23 ) :8901 -- 8906 . 
+ 11 . 
+ Dixon DA , Kowalczykowski SC ( 1995 ) Role of the Escherichia coli recombination hotspot , Chi , in RecABCD-dependent homologous pairing . 
+ J Biol Chem 270 ( 27 ) : 16360 -- 16370 . 
+ 12 . 
+ Anderson DG , Kowalczykowski SC ( 1998 ) SSB protein controls RecBCD enzyme nuclease activity during unwinding : A new role for looped intermediates . 
+ J Mol Biol 282 ( 2 ) :275 -- 285 . 
+ 13 . 
+ Ponticelli AS , Schultz DW , Taylor AF , Smith GR ( 1985 ) Chi-dependent DNA strand cleavage by RecBC enzyme . 
+ Cell 41 ( 1 ) :145 -- 151 . 
+ 14 . 
+ Taylor AF , Schultz DW , Ponticelli AS , Smith GR ( 1985 ) RecBC enzyme nicking at Chi sites during DNA unwinding : Location and orientation-dependence of the cutting . 
+ Cell 41 ( 1 ) :153 -- 163 . 
+ 15 . 
+ Anderson DG , Kowalczykowski SC ( 1997 ) The recombination hot spot Chi is a regulatory element that switches the polarity of DNA degradation by the RecBCD enzyme . 
+ Genes Dev 11 ( 5 ) :571 -- 581 . 
+ 16 . 
+ Taylor AF , Smith GR ( 1999 ) Regulation of homologous recombination : Chi inactivates RecBCD enzyme by disassembly of the three subunits . 
+ Genes Dev 13 ( 7 ) :890 -- 900 . 
+ 17 . 
+ Darmon E , Eykelenboom JK , Lopez-Vernaza MA , White MA , Leach DR ( 2014 ) Repair on the go : E. coli maintains a high proliferation rate while repairing a chronic DNA double-strand break . 
+ PLoS One 9 ( 10 ) : e110784 . 
+ 18 . 
+ Bailey T , et al. ( 2013 ) Practical guidelines for the comprehensive analysis of ChIP-seq data . 
+ PLOS Comput Biol 9 ( 11 ) : e1003326 . 
+ 19 . 
+ Dixon DA , Kowalczykowski SC ( 1993 ) The recombination hotspot Chi is a regulatory sequence that acts by attenuating the nuclease activity of the E. coli RecBCD enzyme . 
+ Cell 73 ( 1 ) :87 -- 96 . 
+ 20 . 
+ Taylor AF , Smith GR ( 1992 ) RecBCD enzyme is altered upon cutting DNA at a chi recombination hotspot . 
+ Proc Natl Acad Sci USA 89 ( 12 ) :5226 -- 5230 . 
+ 21 . 
+ Spies M , et al. ( 2003 ) A molecular throttle : The recombination hotspot Chi controls DNA translocation by the RecBCD helicase . 
+ Cell 114 ( 5 ) :647 -- 654 . 
+ 22 . 
+ Singleton MR , Dillingham MS , Gaudier M , Kowalczykowski SC , Wigley DB ( 2004 ) Crystal structure of RecBCD enzyme reveals a machine for processing DNA breaks . 
+ Nature 432 ( 7014 ) :187 -- 193 . 
+ 23 . 
+ Bianco PR , et al. ( 2001 ) Processive translocation and DNA unwinding by individual RecBCD enzyme molecules . 
+ Nature 409 ( 6818 ) :374 -- 378 . 
+ 24 . 
+ Taylor A , Smith GR ( 1980 ) Unwinding and rewinding of DNA by the RecBC enzyme . 
+ Cell 22 ( 2 Pt 2 ) :447 -- 457 . 
+ 25 . 
+ Liu B , Baskin RJ , Kowalczykowski SC ( 2013 ) DNA unwinding heterogeneity by RecBCD results from static molecules able to equilibrate . 
+ Nature 500 ( 7463 ) :482 -- 485 . 
+ 26 . 
+ Waldminghaus T , Skarstad K ( 2010 ) ChIP on Chip : Surprising results are often artifacts . 
+ BMC Genomics 11:414 . 
+ 27 . 
+ Gruber S , Errington J ( 2009 ) Recruitment of condensin to replication origin regions by ParB/SpoOJ promotes chromosome segregation in B. subtilis . 
+ Cell 137 ( 4 ) :685 -- 696 . 
+ 28 . 
+ Barre FX , et al. ( 2001 ) Circles : The replication-recombination-chromosome segregation connection . 
+ Proc Natl Acad Sci USA 98 ( 15 ) :8189 -- 8195 . 
+ 29 . 
+ Kogoma T ( 1997 ) Stable DNA replication : Interplay between DNA replication , homologous recombination , and transcription . 
+ Microbiol Mol Biol Rev 61 ( 2 ) :212 -- 238 . 
+ 30 . 
+ Corre J , Cornet F , Patte J , Louarn JM ( 1997 ) Unraveling a region-specific hyperrecombination phenomenon : Genetic control and modalities of terminal recombination in Escherichia coli . 
+ Genetics 147 ( 3 ) :979 -- 989 . 
+ 31 . 
+ Horiuchi T , Fujimura Y , Nishitani H , Kobayashi T , Hidaka M ( 1994 ) The DNA replication fork blocked at the Ter site may be an entrance for the RecBCD enzyme into duplex DNA . 
+ J Bacteriol 176 ( 15 ) :4656 -- 4663 . 
+ 32 . 
+ Wendel BM , Courcelle CT , Courcelle J ( 2014 ) Completion of DNA replication in Escherichia coli . 
+ Proc Natl Acad Sci USA 111 ( 46 ) :16454 -- 16459 . 
+ 33 . 
+ Rudolph CJ , Upton AL , Stockum A , Nieduszynski CA , Lloyd RG ( 2013 ) Avoiding chromosome pathology when replication forks collide . 
+ Nature 500 ( 7464 ) :608 -- 611 . 
+ 34 . 
+ Blattner FR , et al. ( 1997 ) The complete genome sequence of Escherichia coli K-12 . 
+ Science 277 ( 5331 ) :1453 -- 1462 . 
+ 35 . 
+ Link AJ , Phillips D , Church GM ( 1997 ) Methods for generating precise deletions and insertions in the genome of wild-type Escherichia coli : Application to open reading frame characterization . 
+ J Bacteriol 179 ( 20 ) :6228 -- 6237 . 
+ 36 . 
+ Merlin C , McAteer S , Masters M ( 2002 ) Tools for characterization of Escherichia coli genes of unknown function . 
+ J Bacteriol 184 ( 16 ) :4573 -- 4581 . 
+ 37 . 
+ Darmon E , et al. ( 2007 ) SbcCD regulation and localization in Escherichia coli . 
+ J Bacteriol 189 ( 18 ) :6686 -- 6694 . 
+ 38 . 
+ White MA , Eykelenboom JK , Lopez-Vernaza MA , Wilson E , Leach DR ( 2008 ) Nonrandom segregation of sister chromosomes in Escherichia coli . 
+ Nature 455 ( 7217 ) : 1248 -- 1250 . 
+ 39 . 
+ Chen Z , Yang H , Pavletich NP ( 2008 ) Mechanism of homologous recombination from the RecA-ssDNA/dsDNA structures . 
+ Nature 453 ( 7194 ) :489 -- 494 . 
+ 40 . 
+ Galletto R , Amitani I , Baskin RJ , Kowalczykowski SC ( 2006 ) Direct observation of individual RecA filaments assembling on single DNA molecules . 
+ Nature 443 ( 7113 ) : 875 -- 878 . 
+ 41 . 
+ Pugh BF , Cox MM ( 1987 ) Stable binding of recA protein to duplex DNA . Unraveling a paradox . 
+ J Biol Chem 262 ( 3 ) :1326 -- 1336 . 
+ 42 . 
+ Croucher NJ , et al. ( 2009 ) A simple method for directional transcriptome sequencing using Illumina technology . 
+ Nucleic Acids Res 37 ( 22 ) : e148 . 
+ 43 . 
+ Khil PP , Smagulova F , Brick KM , Camerini-Otero RD , Petukhova GV ( 2012 ) Sensitive mapping of recombination hotspots using sequencing-based detection of ssDNA . 
+ Genome Res 22 ( 5 ) :957 -- 965 . 
+ 44 . 
+ Yamane A , et al. ( 2013 ) RPA accumulation during class switch recombination represents 5 ′ -3 ′ DNA-end resection during the S-G2 / M phase of the cell cycle . 
+ Cell Reports 3 ( 1 ) :138 -- 147 . 
+ 45 . 
+ Webb S , Hector RD , Kudla G , Granneman S ( 2014 ) PAR-CLIP data indicate that Nrd1-Nab3-dependent transcription termination regulates expression of hundreds of protein coding genes in yeast . 
+ Genome Biol 15 ( 1 ) : R8 . 
+ 46 . 
+ Nicol JW , Helt GA , Blanchard SG , Jr , Raja A , Loraine AE ( 2009 ) The Integrated Genome Browser : Free software for distribution and exploration of genome-scale datasets . 
+ Bioinformatics 25 ( 20 ) :2730 -- 2731 . 
+ 47 . 
+ Zhang Y , et al. ( 2008 ) Model-based analysis of ChIP-Seq ( MACS ) . 
+ Genome Biol 9 ( 9 ) : R137 . 
+ 48 . 
+ Schweikert G , Cseke B , Clouaire T , Bird A , Sanguinetti G ( 2013 ) MMDiff : Quantitative testing for shape changes in ChIP-Seq data sets . 
+ BMC Genomics 14:826 .
\ No newline at end of file
--- a/data/TEXT_FILES/useful_txt/26307168.txt 0 → 100644
View file @27818a9
+++ b/data/TEXT_FILES/useful_txt/26307168.txt 0 → 100644
View file @27818a9
+ Zhong Qian,a Mirjana Macvanin,a Emilios K. Dimitriadis,b Ximiao He,c Victor Zhurkin,d Sankar Adhyaa
+ ABSTRACT Repeated extragenic palindromes ( REPs ) in the enterobacterial genomes are usually composed of individual palindromic units separated by linker sequences . 
+ A total of 355 annotated REPs are distributed along the Escherichia coli genome . 
+ RNA sequence ( RNAseq ) analysis showed that almost 80 % of the REPs in E. coli are transcribed . 
+ The DNA sequence of REP325 showed that it is a cluster of six repeats , each with two palindromic units capable of forming cruciform structures in supercoiled DNA . 
+ Here , we report that components of the REP325 element and at least one of its RNA products play a role in bacterial nucleoid DNA condensation . 
+ These RNAs not only are present in the purified nucleoid but also bind to the bacterial nucleoid-associated HU protein , as revealed by RNA IP followed by microarray analysis ( RIP-Chip ) assays . 
+ Deletion of REP325 resulted in a dramatic increase of the nucleoid size as observed using transmission electron microscopy ( TEM ) , and expression of one of the REP325 RNAs , nucleoid-associated noncoding RNA 4 ( naRNA4 ) , from a plasmid restored the wild-type condensed structure . 
+ Independently , chromosome conformation capture ( 3C ) analysis demonstrated physical connections among various REP elements around the chromosome . 
+ These connections are dependent in some way upon the presence of HU and the REP325 element ; deletion of HU genes and/or the REP325 element removed the connections . 
+ Finally , naRNA4 together with HU condensed DNA in vitro by connecting REP325 or other DNA sequences that contain cruciform structures in a pairwise manner as observed by atomic force microscopy ( AFM ) . 
+ On the basis of our results , we propose molecular models to explain connections of remote cruciform structures mediated by HU and naRNA4 . 
+ IMPORTANCE Nucleoid organization in bacteria is being studied extensively , and several models have been proposed . 
+ However , the molecular nature of the structural organization is not well understood . 
+ Here we characterized the role of a novel nucleoid-associated noncoding RNA , naRNA4 , in nucleoid structures both in vivo and in vitro . 
+ We propose models to explain how naRNA4 together with nucleoid-associated protein HU connects remote DNA elements for nucleoid condensation . 
+ We present the ﬁrst evidence of a noncoding RNA together with a nucleoid-associated protein directly condensing nucleoid DNA . 
+ Noncoding RNAs ( ncRNAs ) present in both prokaryotic and eukaryotic cells do not function as mRNA , tRNA , or rRNA ( 1 ) . 
+ Although many ncRNAs of different sizes and different functions have been widely reported ( 2 -- 6 ) , new ncRNAs with new functions are still being discovered . 
+ Recently , we discovered a novel ncRNA , transcribed from a specific repeated extragenic palindromic element , REP325 , in the chromosome of Escherichia coli by RNA IP followed by microarray analysis ( RIP-Chip ) assays of the nucleoid-associated HU protein ( 7 ) . 
+ In this paper , we term it nucleoid-associated ncRNA ( naRNA ) . 
+ REP elements in the enterobacterial genomes , ﬁrst reported 30 years ago , contain individual palindromes separated by linkers ( 8 -- 10 ) . 
+ The functions of the REPs have been speculated to be related to transcription termination signals , binding sites for proteins , cleavage sites for DNA gyrase , and , possibly , manipulation of nucleoid structures ( 11 -- 14 ) . 
+ In this study , we investigated potential functions of the REP325 element , which is located between genes yjdM ( phnA ) and yjdN ( phnB ) , and its RNA products . 
+ REP325 contains six homologous units ( Fig. 1A ) . 
+ Each repeat is composed of two palindromic cruciform-generating motifs , Y and Z2 , connected by a short linker , l. Cells deleted for the REP325 segment and/or hup genes encoding the nucleoid-associated HU protein showed a decondensed nucleoid structure , suggesting that these two factors participate in nucleoid condensation ( 7 ) . 
+ RNA sequencing ( RNAseq ) analysis and nucleoid RNA tiling array clearly showed the existence of RNA species transcribed from each unit of REP325 , named naRNA1 to naRNA6 ( Fig. 1A ) . 
+ Multialignment of DNA sequences of the 6 repeats in REP325 showed high homology ( Fig. 1B ) . 
+ Each naRNA contains two potential hairpins , corresponding to Y and Z2 motifs ( Fig. 1C ) . 
+ It is unknown whether these six RNAs are transcribed independently or are the result of processing of a larger RNA transcribed from a common promoter ( the promoter of the upstream gene , yjdM ) . 
+ One of these RNAs is naRNA4 , which binds two dimeric forms of HU , HU and HU . 
+ In this report , we show that the expression of naRNA4 from a plasmid reverses the decondensed morphology of the nucleoid caused by REP325 deletion . 
+ We also show intersegmental connections in vivo between different remote cruciform-containing DNA structures like those present in REP elements ; the connections are affected by deletion of hup genes encoding HU subunits and/or of REP325 . 
+ We also demonstrate the connections between cruciforms in DNA in vitro in the presence of naRNA4 and HU . 
+ We propose that these connections are a major part of the cellular nucleoid architecture and help its condensation . 
+ RESULTS
+ The existence of nucleoid-associated RNA : tiling array and RNAseq analysis . 
+ Ohniwa et al. showed that the E. coli nucleoid is a 40-nm-thick ﬁbrous structure as observed by atomic force microscopy ( AFM ) ; the ﬁbers assume 10-nm-thick structures in cells devoid of the nucleoid-associated protein ( NAP ) HU ( 15 ) . 
+ They also showed the existence of 10-nm-thick nucleoid ﬁbers after RNase treatment , suggesting a role of some RNA and HU in nucleoid architecture . 
+ Pettijohn and Hecht also suggested that the E. coli nucleoid contains RNAs which are important for the structural integrity of the nucleoid ( 16 ) . 
+ We isolated RNA from the E. coli nucleoid , which was puriﬁed from cells cultured in minimal medium , and identiﬁed structural elements by the use of a DNA tiling array . 
+ It is clear that the nucleoid RNA contains fragments of rRNAs , tRNAs , a few mRNAs , and many ncRNAs , many of which are also present among HU binding RNAs ( 7 ) . 
+ It is noticeable that more than 30 ncRNAs are transcribed from the REP elements ( see Text S1 in the supplemental material ) . 
+ In order to analyze the transcription proﬁle of the REP elements in E. coli , we analyzed RNAseq data that was previously published and deposited in the NCBI Sequence Read Archive ( 17 ) . 
+ According to the gene annotation of E. coli MG1655 , there are a total of 355 REP elements , 152 of which are transcribed in cells cultured in deﬁned minimal medium ( see Text S3 and Text S4 in the supplemental material ) . 
+ In our previous study , two overlapping RNA reads , which were part of transcripts from REP325 and identiﬁed in RIP-Chip assays of HU protein , were assumed to originate from a single RNA , named nc5 RNA ( 7 ) . 
+ Exploring the RNAseq data with unique alignments revealed RNA sequences matched to six repeated structures ( Y-l-Z2 ) in REP325 ( see Fig. S1A in the supplemental material ) . 
+ It appears to generate six RNA species , now named naRNA1 to naRNA6 ( Fig. 1A ) . 
+ The DNA repeats are connected by ﬁve unknown motifs with identical sequences ( named U motifs ) , which are also transcribed . 
+ We have not determined whether the entire REP325 segment is cotranscribed and processed into six naRNAs . 
+ But the abundance of matched readings suggests a potential direction of transcription of the entire REP325 from the promoter of the upstream yjdM ( phnA ) gene to the position upstream of the yjdN ( phnB ) gene . 
+ Furthermore , sequence analysis of the unmatched gaps for the REP325 segment showed high sequence similarity between the repeats . 
+ Multisequence alignments of DNA sequences encoding the naRNA1 gene to the naRNA6 gene indicated that these six repeats share high sequence homology and that the sequence of the naRNA4 gene is identical to that of the naRNA2 gene ( Fig. 1B ) . 
+ Multiple-sequence alignments also showed that the entire REP325 is transcribed ( see Fig. S1B ) . 
+ RNA secondary structure analysis conﬁrmed that RNAs from the repeats have similar structures and contain the Y and Z2 potential hairpins . 
+ The typical structure of naRNA4 is shown in Fig. 1C . 
+ Decondensation of nucleoid in vivo : TEM analysis . 
+ Transmission electron microscopy ( TEM ) observations previously showed that deletion mutants of HU genes ( hupA hupB strain ) and/or of the REP325 element ( REP325 gene strain ) decondensed the E. coli nucleoid in both growing and nongrowing cells compared to a compacted nucleoid observed in the wild-type strain under similar conditions , suggesting that HU and part or all of REP325 DNA and/or its RNA product affect nucleoid architecture ( 7 ) . 
+ We extended the TEM observations further by investigating the details of REP325 participation in the nucleoid structure . 
+ We first confirmed that in growing cells , compared to wild-type results , deletion of HU genes or of REP325 decondensed the nucleoid , increasing its apparent size ( Fig. 2A ; nucleoids are outlined in red ) . 
+ Moreover , the REP325 deletion strain carrying a plasmid vector showed no change in decondensed nucleoid morphology . 
+ But expression of naRNA4 from the plasmid reproducibly condensed the nucleoid . 
+ We observed some overcondensation that was most likely due to overexpression of naRNA4 . 
+ The expression of an unrelated RNA from the same plasmid had no effect on the decondensed nucleoid . 
+ We also tested the effect of expression of derivatives of naRNA4 containing only a Z2 or Y motif in the same REP325-deleted cells . 
+ The expression of RNA containing only the Z2 or Y motif alone , unlike that of the intact naRNA4 , did not restore the nucleoid morphology to wild type , although we do not know anything about the relative stability of naRNA4 or its truncated derivatives under the conditions of the experiments . 
+ TEM analysis performed with nongrowing cells in the same set of strains gave identical results ; only the presence of an intact naRNA4 caused nucleoid condensation ( Fig. 2B ) . 
+ We note that , in the absence of any simple way to quantify the nucleoid volume in TEM observations , we estimated the size of the nucleoid in two-dimensional ( 2-D ) analysis of the thin sections ( shown by red outlines in Fig. 2 ) . 
+ Although these observations show the involvement of HU and naRNA4 in nucleoid condensation , a direct participation of any part of the REP325 element in the process is not apparent . 
+ If DNA is involved in the condensation , as seems very likely , other DNA sequences homologous to REP325 may fulﬁll the same role . 
+ In summary , this is the first demonstration of a direct involvement of an ncRNA in DNA condensation at the molecular level . 
+ Intersegmental chromosomal interactions in vivo : 3C analysis . 
+ We hypothesized that one plausible mechanism of nucleoid structural organization is that of facilitating contacts between REP elements around the chromosome by naRNA4 and the nucleoid protein HU . 
+ To test the idea , we employed the chromosome conformation capture ( 3C ) approach , the use of which has been established in studies of distal intrachromosomal interactions in vivo in both eukaryotes and prokaryotes ( 18 -- 20 ) . 
+ We designed primers for 23 randomly selected REP segments , including REP325 , around the chromosome to investigate potential connections between REP sites ( primer sequences are listed in Table S1 in the supplemental material ) . 
+ Of 253 pairs tested , 27 combinations showed positive PCR ampliﬁcations , suggesting that the corresponding DNA segments may be connected to each other ( Fig. 3 ) . 
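The figure of 253 tested pairs is what one would expect if every unordered pair of the 23 selected REP segments was tested once. A minimal sketch of that count follows; the segment labels are hypothetical placeholders (the real primer names are in Table S1), and only the numbers 23, 253, and 27 come from the text.

```python
from itertools import combinations
from math import comb

# Hypothetical labels standing in for the 23 selected REP segments.
rep_segments = [f"REP_{i:02d}" for i in range(1, 24)]

# Each unordered pair of distinct segments is one candidate 3C primer combination.
primer_pairs = list(combinations(rep_segments, 2))
n_pairs = len(primer_pairs)

# The enumeration agrees with the closed form C(23, 2).
assert n_pairs == comb(len(rep_segments), 2)

# 27 pairs gave positive PCR amplification, as reported in the text.
n_positive = 27
fraction_positive = n_positive / n_pairs
print(n_pairs, f"{fraction_positive:.1%}")
```

Under this reading , roughly one tested pair in ten gave a positive signal .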
+ These pairs were further tested by 3C in the following mutants : the hup mutant , the REP325 mutant , and the hup REP325 double mutant . 
+ We measured the ratio of PCR signals in each mutant compared to that in the wild-type strain after normalization to an internal control . 
+ In interpreting the positive ampliﬁcation results in 3C experiments in the following discussion , we assume that an observed contact involves the REP elements and not another DNA sequence present nearby in the chromosome . 
+ Similarly , when a contact signal observed in the wild-type strain is missing in the REP325 deletion strain , we assume that it is because of the absence of naRNA4 . 
+ The effects of the HU and REP325 deletions on the observed intrachromosomal interactions were grouped into four classes . 
+ ( i ) HU and naRNA4 independent . 
+ Deletion of HU and/or REP325 had no effect on the interactions , suggesting that other NAPs and RNAs may be involved in these DNA contacts ( 4 out of 27 ) . 
+ ( ii ) HU dependent . 
+ Deletion of the HU gene signiﬁcantly affected the interactions while deletion of REP325 did not ( 6 out of 27 ) . 
+ In these cases , HU together with another RNA may be involved in bringing about DNA contacts . 
+ ( iii ) HU and naRNA4 dependent . 
+ Deletion of either the HU gene or REP325 signiﬁcantly affected the interactions ( 4 out of 27 ) . 
+ In these cases , both HU and naRNA4 are speciﬁcally involved in DNA contacts . 
+ ( iv ) HU or naRNA4 dependent . 
+ Only deletion of both HU and REP325 significantly affected the interactions , while removal of either HU or naRNA4 alone did not ( 13 out of 27 ) . 
+ In this group , HU may collaborate with another RNA , or naRNA4 may collaborate with another protein , for DNA-DNA interactions . 
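The four classes above amount to a small decision rule over three yes/no observations per contact. The sketch below encodes that rule ; the function name and the boolean inputs are illustrative assumptions , not part of the published analysis , and the threshold for a "significant" effect is not modeled.

```python
def classify_contact(hu_deletion_affects, rep325_deletion_affects, double_deletion_affects):
    """Return the class label (i to iv) for one 3C contact.

    Each argument is a bool saying whether that deletion significantly
    changed the 3C signal relative to the wild type.
    """
    if hu_deletion_affects and rep325_deletion_affects:
        return "iii"  # HU and naRNA4 dependent
    if hu_deletion_affects:
        return "ii"   # HU dependent
    if rep325_deletion_affects:
        return None   # REP325-only dependence is not one of the four reported classes
    # Neither single deletion has an effect; the double deletion decides.
    return "iv" if double_deletion_affects else "i"


# Class sizes as reported in the text; together they cover all 27 positive pairs.
reported_counts = {"i": 4, "ii": 6, "iii": 4, "iv": 13}
total_contacts = sum(reported_counts.values())
print(total_contacts)
```

Note that a contact affected by the REP325 deletion alone does not appear among the reported classes , so the sketch returns no label for it .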
+ DNA condensation mediated by HU and naRNA4 in vitro : AFM analysis . 
+ Both TEM and 3C analyses showed the involvement of HU and naRNA4 in nucleoid organization . 
+ However , they did not reveal any mechanistic details . 
+ We used AFM to monitor any condensing effects of HU protein and naRNA4 on naked supercoiled DNA in vitro to get some insights about the mechanism ( s ) of their action . 
+ A plasmid containing one REP325 ( pQZ080 ) was used as the template for AFM ( Fig. 4 ) . 
+ The addition of naRNA4 or of different HU dimers to the plasmid did not noticeably change its supercoiled morphology . 
+ The absence of any effect of HU in this experiment is consistent with previous reports ( 15 , 21 ) . 
+ However , the presence of either HU or HU together with naRNA4 dramatically condensed the DNA , apparently because of the presence of multiple intersegmental contacts , thus demonstrating that naRNA4 collaborates with HU in condensing DNA ( Fig. 4A ) . 
+ Note that either HU or HU dimer works but not HU dimer ( , , and ) . 
+ Since the plasmid DNA contained only one REP325 element , the multiple contacts very likely involve either nonspeciﬁc DNA binding or some other sequences in the plasmid that allow DNA contacts . 
+ The REP325 palindromic repeats generate cruciform structures in a supercoiled state . 
+ We believe that several transcription terminators ( 22 ) that are present in the plasmid and which also generate cruciform structures perform the role of REP325 . 
+ Consistent with this idea , the addition of HU and naRNA4 to the parental plasmid ( pSA508 ) containing no authentic REP element but several transcription terminators also resulted in DNA condensation ( Fig. 4B ) . 
+ To investigate the features of naRNA4 required in DNA condensation , we ﬁrst tested whether an intact naRNA4 is needed to condense DNA . 
+ We performed AFM analysis with three 77-nucleotide ( nt ) RNAs : a nonspeciﬁc control RNA , an RNA containing only the Y motif , and an RNA containing only the Z2 motif ( see Table S2 in the supplemental material ) . 
+ Compared to naRNA4 , none of these RNAs condensed DNA ( Fig. 4B ) , which is in agreement with the TEM data ( Fig. 2 ) . 
+ We also asked whether the exact sequences or only hairpin features of Z2 and Y motifs are involved . 
+ We synthesized a 77-nt-long RNA whose sequence was the exact complement of naRNA4 . 
+ This anti-naRNA4 molecule has an RNA sequence completely different from that of naRNA4 but should contain two hairpin structures . 
+ Figure 4B shows that the antinaRNA4 also condensed DNA in the presence of HU protein , indicating that it is the secondary structure of naRNA4 and not the sequence of the RNA itself that is important in DNA condensation . 
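Why a complementary RNA can keep the hairpins while changing the sequence can be illustrated with a toy example. The sketch assumes that "exact complement" means the reverse complement ( antisense strand ) and uses an invented 18-nt hairpin , not the naRNA4 sequence ; it only checks Watson-Crick pairing of the stem arms and ignores G-U wobble pairs and folding energetics.

```python
# Watson-Crick complement table for RNA.
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement(rna):
    """Return the antisense strand of an RNA sequence, written 5' to 3'."""
    return rna.translate(COMPLEMENT)[::-1]


def has_stem(rna, stem_len):
    """True if the first stem_len bases can pair with the last stem_len bases."""
    return rna[:stem_len] == reverse_complement(rna[-stem_len:])


# Invented hairpin: 7-nt stem arm, 4-nt loop, 7-nt complementary arm.
toy_hairpin = "GGCACUC" + "UUCG" + "GAGUGCC"
anti_hairpin = reverse_complement(toy_hairpin)

print(toy_hairpin, has_stem(toy_hairpin, 7))
print(anti_hairpin, has_stem(anti_hairpin, 7))
```

In this perfectly paired toy case the stem survives and only the loop changes ; in a longer RNA with imperfect stems and linkers , more of the sequence would differ while the potential to form hairpins is retained .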
+ Because of the large sizes of pQZ080 and pSA508 ( 3.9 kb and 3.5 kb , respectively ) , we could not discern the precise organization of the DNA contact points involved in the observed DNA condensation . 
+ To simplify the condensed DNA structure , we tested a number of minicircle DNAs which contain 0 , 1 , or 2 potential condensation sites ( cruciform structures originating from REP325 or from one of the transcription terminators , rpoC , present in the parental plasmid , pSA508 ) at marked positions : mini103 , mini104 , mini105 , mini106 , mini107 , and mini120 ( Fig. 4C ; see also Fig. S2 in the supplemental material ) . 
+ Any looping generated using the marked cruciform sites can be discerned by measuring the size of the DNA loops between contact points . 
+ As elaborated below , consistent with our proposal , HU- and naRNA4-mediated condensation of minicircle DNAs in vitro requires the presence of both HU and naRNA4 and only a cruciform structure in the DNA , not other parts of the multipalindromic unit ( Fig. 4C ) . 
+ `` Figure 8 '' structures , which are caused by either random crossover of DNA or speciﬁc bridging of DNA due to the presence of HU and naRNA4 , were observed in all minicircle DNAs . 
+ To determine whether the looping was random or speciﬁc , we measured each loop from the ﬁgure 8 structures in the minicircle DNAs in the presence of naRNA4 and HU protein . 
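The idea of an "expected loop size" can be made concrete : if two sites on a circle are bridged , the circle is split into two loops whose lengths are fixed by the site positions. The sketch below works in base pairs with invented numbers ( circle length , site positions , and tolerance are all hypothetical ) ; the actual measurements were made on AFM images and their matching criterion is not given in the text.

```python
def expected_loops(circle_bp, site_a, site_b):
    """Loop lengths (short, long) formed when two sites on a circle are bridged."""
    separation = abs(site_a - site_b) % circle_bp
    return tuple(sorted((separation, circle_bp - separation)))


def matches_expected(measured_short_bp, circle_bp, site_a, site_b, tolerance_bp):
    """True if the measured shorter loop is within tolerance of the expected one."""
    expected_short, _ = expected_loops(circle_bp, site_a, site_b)
    return abs(measured_short_bp - expected_short) <= tolerance_bp


# Hypothetical 1,000-bp minicircle with cruciform sites at positions 100 and 400.
loops = expected_loops(1000, 100, 400)
print(loops)

# A figure 8 whose shorter loop measures about 310 bp would count as site specific
# under a 30-bp tolerance, while one measuring 480 bp would count as random.
print(matches_expected(310, 1000, 100, 400, tolerance_bp=30))
print(matches_expected(480, 1000, 100, 400, tolerance_bp=30))
```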
+ We observed that the frequency of ﬁgure 8 structures with the expected loop sizes resulting from interactions between any two marked cruciform structures was signiﬁcantly higher in all minicircle DNAs except in mini103 , which has no cruciform structure . 
+ For example , we found 79 figure 8 structures in mini103 in the presence of naRNA4 and HU , and none was found with an expected loop size . 
+ In mini106 , mini107 , and mini120 containing two cruciforms , the ratios of the numbers of observed ﬁgure 8 structures with expected loop sizes to the total numbers of counted structures were 24/49 , 48/101 , and 28/53 , respectively . 
+ These results suggest that HU and naRNA4 form bridges between two cruciforms present in a minicircle DNA . 
+ For mini104 and mini105 plasmids , which contain one rpoC transcription terminator of pSA508 and one REP325 repeat , respectively , we observed only random figure 8 structures and , occasionally , two or more minicircle DNAs bridged together by a complex core ( red arrows in Fig. S2 in the supplemental material ) , suggesting that two cruciform structures present in different DNA molecules can connect . 
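The reported counts can be turned into fractions to make the comparison between minicircles explicit. The sketch uses only the counts quoted in the text ; the dictionary layout and variable names are illustrative , and no statistical test is implied.

```python
# (figure 8 structures with an expected loop size, total figure 8 structures counted)
counts = {
    "mini103": (0, 79),    # no cruciform structure
    "mini106": (24, 49),   # two cruciforms
    "mini107": (48, 101),  # two cruciforms
    "mini120": (28, 53),   # two cruciforms
}

# Fraction of figure 8 structures that show an expected loop size.
fractions = {name: hits / total for name, (hits, total) in counts.items()}

for name, fraction in fractions.items():
    print(f"{name}: {fraction:.2f}")
```

For the three minicircles with two cruciforms , about half of the counted figure 8 structures have an expected loop size , against none of the 79 counted for mini103 .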
+ DISCUSSION
+ It has become clear that the organization of the chromosome in E. coli is not random . 
+ The chromosome is not merely a disordered aggregate of randomly coiled DNA . 
+ Instead , it is a dynamic but spatially organized deﬁned entity that undergoes strictly controlled and reproducible changes when they are needed ( 23 ) . 
+ Structural models of elements such as `` macro domains , '' `` supercoiled topological loops , '' `` ﬁlaments , '' and `` remote connections '' are suggested to represent structural constituents of chromosomes from observations using different approaches ( 19 , 24 -- 27 ) . 
+ A number of NAPs , such as the HU , Fis , IHF , H-NS , and SMC proteins , modulate chromosome structure . 
+ We focused on HU , which binds to DNA nonspeciﬁcally but prefers distorted DNA structures such as nicks , gaps , bends , and cruciforms ( 28 , 29 ) . 
+ Due to its high abundance and growth-phase-dependent subunit compositions ( HUα2 , HUαβ , and HUβ2 ) , HU is believed to modulate chromosome structure in accordance with the growth phase of the cell ( 30 ) . 
+ We confirmed that HU binds to naRNA4 and to several other RNAs by electrophoretic mobility shift assay ( EMSA ) ( see Fig. S3 in the supplemental material ) , but not all HU-bound RNAs promoted DNA condensation in vivo ( Fig. 2 ) or in vitro ; only those that contained two hairpin structures did ( Fig. 4 ) . 
+ Thus , an HU-naRNA4 interaction may be somewhat unusual and speciﬁc ; the presence of at least two hairpin motifs , such as Z2 and Y , in the RNA is needed for DNA condensation . 
+ We conclude that two cruciform structures in DNA , not yet completely deﬁned , interact with each other in a pairwise fashion for DNA condensation , which needs both HU and naRNA4 . 
+ We propose four mechanisms for interactions between two DNA cruciforms mediated by HU and naRNA4 ( Fig. 5 ) . 
+ ( i ) For DNA-naRNA-HU-naRNA-DNA interactions , each cruciform structure binds to one hairpin of naRNA and two DNA-bound RNAs are bridged together by an HU dimer using the other hairpins of the two naRNAs ( Fig. 5A ) . 
+ ( ii ) For DNA-naRNA ( HU ) - DNA interactions , the model is similar to model i , but the stoichiometry of HU and naRNA in the complex is 1:1 . 
+ HU binding to naRNA makes the latter amenable to interaction with two cruciforms ( Fig. 5B ) . 
+ ( iii ) For DNA-HU-naRNA-HU-DNA interactions , HU binds to cruciform DNA ; two bound-HU dimers are then connected by a molecule of naRNA through the two hairpins ( Fig. 5C ) . 
+ ( iv ) For DNA-HU ( naRNA ) - DNA interactions , the model is similar to model iii , but the stoichiometry of HU and naRNA in the complex is 1:1 . 
+ naRNA binding to HU makes the latter potent for interactions with two cruciform structures . 
+ We note here that in models ii and iv , it is possible that the roles of naRNA and HU , respectively , could be only catalytic and that they are not involved in the complex . 
+ At this stage , we are unable to prefer one model to the others except that a speciﬁc interaction between HU and a DNA cruciform structure has been previously established ( 31 , 32 ) , which would support models iii and iv . 
+ Cross-linking of the condensed DNA complexes followed by fragmentation and chemical identiﬁcation of the products may distinguish between the different models . 
+ MATERIALS AND METHODS
+ Construction of strains and plasmids . 
+ Wild-type E. coli MG1655 and the hupA hupB mutant were previously described ( 7 ) . 
+ The REP325 deletion strain was constructed by mini-λ recombineering , in which REP325 was replaced by a Cat-SacB cassette ( 33 , 34 ) . 
+ Plasmid pQZ080 was constructed with the insertion of REP325 into pSA508 ( 35 ) at SacI to BamHI sites . 
+ Minicircle DNA was puriﬁed according to the method of Choy and Adhya ( 35 ) . 
+ Plasmid pNM12 was from Nadim Majdalani ( NIH , USA ) . 
+ DNA fragments encoding the naRNA4 gene and Con , Y-Con , and Con-Z2 genes were ampliﬁed using chemically synthesized single-stranded DNAs ( ssDNAs ) as templates and inserted into pNM12 at MscI to HindIII sites . 
+ All recombinant plasmids and pNM12 were transformed into the REP325 deletion strain . 
+ Validation of expression of REP325 by analysis of RNAseq data . 
+ Raw RNAseq data for E. coli MG1655 obtained from the NCBI Sequence Read Archive ( accession no. SRP006793 ) ( 17 ) were mapped onto the E. coli genome using Novoalign software , allowing up to two mismatches between a 36-nt read and the genome sequence . 
+ Two different strategies , using uniquely mapped reads and all mapped reads , were applied , and the two sets of results were kept separately ; for the latter , multiple alignments of a read to up to 50 different locations in the genome were retained . 
+ The alignment files ( in sorted BAM format ) were used for visualization as tracks in the genome browser at the University of California , Santa Cruz ( UCSC ) ( 36 ) . 
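The difference between the unique and total mapping strategies matters for repeated elements such as REP325 , where one read can fit several near-identical copies. The sketch below is a naive sliding-window scan that tolerates up to two mismatches ; it is only an illustration of the criterion and not how Novoalign works internally , and the 36-nt sequence and toy genome are invented.

```python
def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(a, b))


def find_hits(read, genome, max_mismatches=2):
    """Start positions where the read matches the genome with few enough mismatches."""
    n = len(read)
    return [
        start
        for start in range(len(genome) - n + 1)
        if hamming(read, genome[start:start + n]) <= max_mismatches
    ]


# Invented 36-nt repeat unit and a second copy carrying one substitution.
unit = "ACGTTGCAAGCTTACGGATCCATGCGTAACTGGTCA"
variant = unit[:5] + "A" + unit[6:]
genome = "T" * 10 + unit + "G" * 10 + variant + "C" * 10

hits = find_hits(unit, genome)
is_unique = len(hits) == 1

print(hits)       # both copies are reported under the total strategy
print(is_unique)  # the read would be dropped under the unique strategy
```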
+ Synthesis of RNA used in AFM analysis . 
+ A series of complementary ssDNAs that contain a T7 promoter sequence ( 5′-TAATACGACTCACTATAGGGAGA-3′ ) followed by experimental sequences and their complements , listed in Table S2 in the supplemental material , were chemically synthesized . 
+ The double-stranded DNAs ( dsDNAs ) were obtained by annealing the appropriate complementary ssDNAs ( 7 ) . 
+ Synthesis and purification of RNAs were completed by the use of an AmpliScribe T7-Flash transcription kit according to the manufacturer 's instructions ( Epicentre , Madison , WI ) . 
+ The quality and quantity of RNAs were determined by the use of an agarose gel and a NanoDrop spectrophotometer ( Thermo Scientiﬁc , Wilmington , DE ) , respectively . 
+ For RNAs used in the gel shift assay , [ α-32P ] UTP was used instead of the unlabeled UTP provided in the kit . 
+ Electrophoretic mobility shift assay . 
+ Electrophoretic mobility shift assays in gels were done as described before ( 37 , 38 ) with modiﬁcations . 
+ Radioactively labeled RNAs were incubated with increasing amounts ( 0 to 1.6 mM ) of HU protein in binding buffer containing 20 mM Tris-HCl ( pH 7.5 ) , 0.2 M NaCl , and 10 % glycerol at 37 °C for 20 min . 
+ The mixtures were separated by the use of 8 % prerun native polyacrylamide gels and 1× Tris-borate-EDTA ( TBE ) buffer . 
+ Gels were finally exposed to X-ray film at -80 °C . 
+ TEM analysis of nucleoid structure in E. coli . 
+ Strains used in TEM analysis were inoculated from plates with appropriate antibiotics into M63 minimal medium with 0.2 % fructose , 0.05 % Casamino Acids , and proper antibiotics and incubated at 37 °C overnight . 
+ The cultures were diluted into fresh medium as mentioned above with 0.1 % arabinose and grown to log or stationary phase for harvest . 
+ One milliliter of fresh culture was mixed with an equal volume of fixation buffer ( 8 % formaldehyde -- 4 % glutaraldehyde -- 0.2 M cacodylate buffer or 2× phosphate-buffered saline [ PBS ] ) and kept at room temperature for 2 h . 
+ The ﬁxed cell solutions were stored at 4 °C until TEM analyses were performed . 
+ Cells were spun down to form a small pellet and then processed for EM analysis of thin sections . 
+ Brieﬂy , the pellet was postﬁxed in 1 % osmium tetroxide ( Electron Microscopy Sciences , Ft. Washington , PA ) -- 0.1 M cacodylate buffer for 1 h at room temperature , stained in 0.5 % uranyl acetate -- 0.1 M acetate buffer for 1 h , and then dehydrated in a series of ethanol ( 35 % , 50 % , 75 % , 95 % , and 100 % ) and propylene oxide ( 100 % ) solutions . 
+ The pellets were inﬁltrated into 100 % propylene oxide and epoxy resin ( 1:1 ) overnight and embedded in pure resin the following day . 
+ The epoxy resin was cured in a 55 °C oven for 48 h , and 70-to-80-nm-thick sections were made and mounted on copper grids ( 300 mesh ) and stained using uranyl acetate followed by lead citrate . 
+ The cells were examined and imaged using a model H7600 TEM ( Hitachi , Tokyo , Japan ) operated at 80 kV . 
+ Images were captured by a bottom-mounted charge-coupled-device ( CCD ) camera ( Gatan , Pleasanton , CA ) . 
+ 3C analysis of intrachromosomal interactions . 
+ 3C analysis was carried out as previously described ( 19 ) . 
+ Primers are listed in Table S1 in the supplemental material . 
+ After PCR , the products were separated by electrophoresis on a 2 % agarose gel . 
+ Each amplified band of the images was quantitatively measured by the use of 1-D gel analysis software ( UVP Bioimaging Systems ) . 
+ The fold change of interaction frequency for each primer pair was determined as the ratio of the 3C products of a given mutant to those of the wild-type strain after normalization to the internal control . 
+ Each frequency value represents the average of the results of four independent experiments . 
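The fold-change calculation described above can be sketched as follows. This is only an illustration of the arithmetic (normalize each band to its internal control, take the mutant-to-wild-type ratio, average the replicates); the function names and the band intensities are hypothetical and do not come from the article or from the UVP software.

```python
from statistics import mean


def fold_change(mutant_band, mutant_control, wt_band, wt_control):
    """Mutant 3C signal relative to wild type, each normalized to its internal control."""
    mutant_norm = mutant_band / mutant_control
    wt_norm = wt_band / wt_control
    return mutant_norm / wt_norm


def mean_fold_change(replicates):
    """Average the fold change over independent experiments."""
    return mean(fold_change(*rep) for rep in replicates)


# Hypothetical band intensities for four independent experiments:
# (mutant band, mutant control, wild-type band, wild-type control)
replicates = [
    (50.0, 100.0, 100.0, 100.0),
    (60.0, 100.0, 100.0, 100.0),
    (40.0, 80.0, 120.0, 120.0),
    (45.0, 90.0, 90.0, 90.0),
]
result = mean_fold_change(replicates)
```

A value below 1 would indicate a lower interaction frequency in the mutant than in the wild-type strain for that primer pair.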
+ AFM analysis of DNA condensation in vitro . 
+ Sample preparations and AFM analysis were performed as reported previously ( 19 ) with some modifications . 
+ After the binding step , samples were not directly delivered to AFM analysis . 
+ Formaldehyde was added to reach a final concentration of 1 % , and the reaction mixture was incubated for 15 min at room temperature , followed by quenching with glycine at a final concentration of 0.125 M for 5 min at room temperature . 
+ Images were preprocessed using the instrument image processing software and then exported for further analysis with the NIH ImageJ image processing software package . 
+ The lengths of DNA loops observed in minicircle DNA were measured by tracing . 
+ SUPPLEMENTAL MATERIAL
+ Supplemental material for this article may be found at http://mbio.asm.org/lookup/suppl/doi:10.1128/mBio.00998-15/-/DCSupplemental . 
+ Text S1 , DOC file , 0.1 MB . 
+ Text S2 , XLSX file , 0.04 MB . 
+ Text S3 , XLSX file , 0.02 MB . 
+ Text S4 , XLSX file , 0.02 MB . 
+ Figure S1 , PDF file , 0.2 MB . 
+ Figure S2 , PDF file , 0.1 MB . 
+ Figure S3 , PDF file , 0.05 MB . 
+ This research was supported by the Intramural Research Program of the National Institutes of Health , National Cancer Institute , Center for Cancer Research . 
+ We thank our colleagues Dale Lewis , Amlan Dhar , Phuoc Le , Sangmi Lee , and Andrei Trostel for assistance .
\ No newline at end of file
--- a/data/TEXT_FILES/useful_txt/26670385.txt 0 → 100644
+++ b/data/TEXT_FILES/useful_txt/26670385.txt 0 → 100644
+ Impact of Anaerobiosis on Expression of the Iron-Responsive Fur and RyhB Regulons
+ ABSTRACT Iron , a major protein cofactor , is essential for most organisms . 
+ Despite the well-known effects of O2 on the oxidation state and solubility of iron , the impact of O2 on cellular iron homeostasis is not well understood . 
+ Here we report that in Escherichia coli K-12 , the lack of O2 dramatically changes expression of genes controlled by the global regulators of iron homeostasis , the transcription factor Fur and the small RNA RyhB . 
+ Using chromatin immunoprecipitation sequencing ( ChIP-seq ) , we found anaerobic conditions promote Fur binding to more locations across the genome . 
+ However , by expression profiling , we discovered that the major effect of anaerobiosis was to increase the magnitude of Fur regulation , leading to increased expression of iron storage proteins and decreased expression of most iron uptake pathways and several Mn-binding proteins . 
+ This change in the pattern of gene expression also correlated with an unanticipated decrease in Mn in anaerobic cells . 
+ Changes in the genes posttranscriptionally regulated by RyhB under aerobic and anaerobic conditions could be attributed to O2-dependent changes in transcription of the target genes : aerobic RyhB targets were enriched in iron-containing proteins associated with aerobic energy metabolism , whereas anaerobic RyhB targets were enriched in iron-containing anaerobic respiratory functions . 
+ Overall , these studies showed that anaerobiosis has a larger impact on iron homeostasis than previously anticipated , both by expanding the number of direct Fur target genes and the magnitude of their regulation and by altering the expression of genes predicted to be posttranscriptionally regulated by the small RNA RyhB under iron-limiting conditions . 
+ IMPORTANCE Microbes and host cells engage in an `` arms race '' for iron , an essential nutrient that is often scarce in the environment . 
+ Studies of iron homeostasis have been key to understanding the control of iron acquisition and the downstream pathways that enable microbes to compete for this valuable resource . 
+ Here we report that O2 availability affects the gene expression programs of two Escherichia coli master regulators that function in iron homeostasis : the transcription factor Fur and the small RNA regulator RyhB . 
+ Fur appeared to be more active under anaerobic conditions , suggesting a change in the set point for iron homeostasis . 
+ RyhB preferentially targeted iron-containing proteins of respiration-linked pathways , which are differentially expressed under aerobic and anaerobic conditions . 
+ Such findings may be relevant to the success of bacteria within their hosts since zones of reduced O2 may actually reduce bacterial iron demands , making it easier to win the arms race for iron . 
+ In nearly all organisms , iron is an essential nutrient that serves as a protein cofactor in pathways ranging from central metabolism to genome maintenance . 
+ In single-cell organisms , such as bacteria , maintaining a pool of intracellular iron sufficient to cofactor proteins requires coordination of synthesis of iron-containing proteins and cofactors with iron uptake and iron storage ( 1 , 2 ) . 
+ Although this process of iron homeostasis has been well studied in Escherichia coli K-12 , most of our understanding comes from analyzing cells grown in the presence of O2 ( 2 -- 4 ) , conditions known to result in oxidation of Fe2+ to Fe3+ , decreased iron solubility , and the formation of reactive oxygen species ( 5 ) . 
+ In contrast , less is known about iron homeostasis during anaerobiosis , conditions in which the soluble form of iron ( Fe2+ ) is more stable and many important iron-requiring activities ( e.g. , cyclic photosynthesis , N2 fixation , and anaerobic respiration ) of bacteria occur ( 6 ) . 
+ We are interested in determining how O2 availability alters the expression of genes needed to maintain cellular pools of iron . 
+ In E. coli K-12 , the transcription factor Fur ( ferric uptake regulator ) ( 7 , 8 ) and the small RNA RyhB ( 9 , 10 ) are the major regulators of iron homeostasis . 
+ Studies from cells grown under aerobic conditions have led to the prevailing view that Fe2+-Fur binds DNA when iron is sufficient ( 8 , 11 , 12 ) , resulting in repression of most of its target genes . 
+ Functions repressed by Fur include RyhB ( 9 ) , the Fe3+-siderophore uptake pathways ( e.g. , fepA , fhuA , and cirA [ 7 , 13 ] ) , one of two Fe-S cluster biogenesis pathways ( sufABCDSE [ 14 -- 16 ] ) , the manganese uptake system ( mntH [ 17 ] ) , the manganese-containing superoxide dismutase ( sodA [ 18 ] ) , and the manganese-containing ribonucleotide reductase complex ( nrdHIEF [ 19 ] ) . 
+ In contrast , Fur increases the expression of ftnA , encoding an iron storage complex ( 20 ) . 
+ Siderophores are regarded as a major route of iron uptake under aerobic conditions , because their high affinity for Fe3+ compensates for the poor solubility of oxidized iron in the presence of O2 ( 21 , 22 ) . 
+ Although repression of iron uptake systems by Fur under iron-sufficient conditions may seem counterintuitive , the low levels of these gene products are apparently adequate to supply iron for protein cofactors and storage . 
+ Upon iron limitation , Fe2+ is not available to bind Fur , and Fur binding to its DNA sites decreases , resulting in decreased expression of Fur-induced genes ( i.e. , ftnA ) and increased expression of most of the Fur regulon , including RyhB ( 13 , 23 ) . 
+ The reported changes in Fur-dependent gene transcription under iron-limiting conditions portray a coordinated strategy of reducing iron storage , scavenging limiting iron , and switching to manganese-dependent proteins to replace those requiring iron for function . 
+ Although Fur function has not been systematically studied under anaerobic conditions in E. coli , some O2-dependent differences in Fur-regulated genes have been reported . 
+ Whereas expression of siderophore-mediated iron transport systems ( fepA , fhuA , cirA , tonB , and exbB [ 7 , 24 , 25 ] ) is more repressed under anaerobic conditions , expression of Fe2+ transport ( feoABC ) is increased anaerobically ( 26 , 27 ) . 
+ Expression of the small RNA RyhB under iron-limiting conditions mediates an iron-sparing response to supply iron for critical iron-containing proteins by decreasing translation of certain iron-containing proteins or increasing translation of iron uptake functions ( 10 , 28 ) . 
+ Base pairing of RyhB with specific mRNA transcripts results in either enhanced translation through disruption of an inhibitory complex ( 29 ) or decreased translation through Hfq recruitment , which is often accompanied by decreased transcript stability ( 30 , 31 ) . 
+ Transcription of a few known RyhB targets is repressed under anaerobic conditions ( sdhCDAB , acnA , acnB , and fumA ) ( 27 , 32 , 33 ) , suggesting that these RNAs would not be posttranscriptionally regulated under anaerobic conditions . 
+ The fact that transcription of genes encoding other iron-containing respiratory proteins is selectively upregulated under anaerobic conditions ( 32 , 34 -- 37 ) raises the question of whether these transcripts might be targets of RyhB under anaerobic conditions . 
+ Since it is challenging to predict direct targets of small RNAs like RyhB bioinformatically ( 38 , 39 ) , experimental studies are needed to identify RyhB candidates under anaerobic conditions . 
+ Here we address how anaerobiosis affects the Fur and RyhB regulons of E. coli K-12 . 
+ Chromatin immunoprecipitation followed by high-throughput sequencing ( ChIP-seq ) identified in vivo Fur DNA binding sites in the presence and absence of O2 . 
+ Global gene expression studies of wild-type and Fur− ( Δfur ) strains cultured under aerobic or anaerobic conditions revealed genes regulated by Fur in an O2-dependent manner . 
+ Promoter fusions to lacZ confirmed new Fur targets . 
+ Global gene expression studies of strains lacking RyhB ( ΔryhB ) or RyhB and Fur ( Δfur ΔryhB ) were used to identify the scope of possible RyhB targets during anaerobiosis . 
+ The metallome of cells grown under aerobic and anaerobic conditions was probed to ask if the intracellular availability of iron or other metals changes under anaerobic conditions . 
+ Our findings reveal major changes in the Fur and RyhB regulons under anaerobic conditions that tailor these gene expression programs to an O2-free lifestyle . 
+ RESULTS Fur binds to more genomic regions under anaerobic growth conditions . 
+ To address how anaerobiosis impacts the Fur regulon , we mapped Fur DNA binding regions genome-wide in E. coli K-12 from cells grown in defined , iron-sufficient medium under aerobic or anaerobic conditions using ChIP-seq ( see Table S2 in the supplemental material ) . 
+ Under aerobic conditions , Fur bound to 96 locations , and two-thirds of these binding peaks were found in intergenic regions ( Fig. 1A ) . 
+ Under anaerobic conditions , Fur bound all sites identified under aerobic conditions and 157 additional locations ( Fig. 1B ) ; only half of these newly identified sites were in intergenic regions . 
+ Fur binding under anaerobic conditions was iron dependent because the vast majority of binding locations ( 247 out of 255 ) were either eliminated or greatly reduced when assayed under iron-limiting conditions ( Fig. 1C ) . 
+ Together , these data show that Fur is bound to more genomic regions under anaerobic conditions , and the iron dependence of its DNA binding suggests that Fur is interacting with its DNA sites in a regulated manner ( 23 ) . 
+ Fur binds to less-conserved sequences under anaerobic conditions . 
+ To understand why Fur is bound to more genomic regions under anaerobic conditions , we asked if there were DNA sequence differences between the sites bound only under anaerobic conditions versus those bound under both aerobic and anaerobic growth conditions . 
+ The DNA sequences within 100 bp of the summit of the iron-dependent Fur binding peaks were analyzed for overrepresented sequences using the motif-ﬁnding algorithm MEME-ChIP ( 40 ) . 
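The window selection described above (sequence within 100 bp of each peak summit) can be sketched as below. This is a minimal illustration under stated assumptions: the function name, the 0-based summit coordinate, and the toy genome string are hypothetical, and the sketch only prepares the input sequence; it does not call or reproduce MEME-ChIP.

```python
def summit_window(genome, summit, flank=100):
    """Return the sequence within `flank` bp on either side of a 0-based summit position.

    The window is clipped at the ends of a linear sequence; wrapping around a
    circular chromosome is deliberately not handled in this sketch.
    """
    start = max(0, summit - flank)
    end = min(len(genome), summit + flank + 1)
    return genome[start:end]


# Toy 400-bp sequence standing in for a genome (hypothetical data)
genome = "ACGT" * 100
interior = summit_window(genome, 200)   # full window: 100 + 1 + 100 bp
near_edge = summit_window(genome, 10)   # clipped at the left end
```

Each extracted window would then be written out (for example as FASTA) and passed to the motif finder.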
+ The motif derived from the iron-dependent DNA regions bound by Fur under both aerobic and anaerobic growth conditions ( Fig. 2 , top panel ) was similar to the signature inverted repeat 5′-GATAAT-N1-ATTATC-3′ previously described as the Fur dimer binding site ( 41 -- 44 ) . 
+ However , the motif derived from the regions bound by Fur only under anaerobic conditions ( Fig. 2 , bottom panel ) revealed less sequence conservation to this canonical motif . 
+ Taken together , this analysis suggests that under anaerobic conditions , Fe2+-Fur is bound to potentially stronger affinity sites represented by the signature Fur motif , as well as potentially weaker affinity DNA sites represented by the less-conserved motif . 
+ Despite Fur binding to more locations under anaerobic conditions , the genes regulated by Fur are quite similar between aerobic and anaerobic conditions . 
+ Transcription profiling of Fur+ and Fur− strains was used first to determine whether Fur binding to the sites identified as specific to anaerobic conditions led to transcriptional regulation . 
+ Of the 178 operons whose expression we found to be Fur regulated under anaerobic conditions ( see Tables S3 to S5 in the supplemental material ) , only four operons were associated with a Fur ChIP-seq peak that was specific to anaerobic conditions and were not RyhB regulated ( see below ) ( Fig. 3 ; see Table S3 ) . 
+ Expression of three of these operons was increased by Fur under anaerobic conditions , and these encode PreTA , an Fe-S dihydropyrimidine dehydrogenase , Dps , a dual-function nucleoid and iron sequestration protein , and AppY , a transcription activator of anaerobic metabolism . 
+ The fourth operon was repressed by Fur under anaerobic conditions and encodes GltA , a citrate synthase , which was reported previously to be regulated by Fur under aerobic conditions ( 13 ) . 
+ Thus , despite the fact that ChIP-seq identified 157 Fur binding sites specific to anaerobic conditions , only four led to detectable transcription regulation under anaerobic conditions . 
+ In contrast , for the DNA regions bound by Fur under both aerobic and anaerobic conditions , more than one-third were associated with operons whose expression was regulated by Fur under anaerobic conditions ( see Table S3 in the supplemental material ) . 
+ Indeed , the majority of these 36 operons were already known from studies carried out under aerobic growth conditions to be members of the Fur regulon ( 7 , 13 , 23 ) and include well-known iron homeostasis functions , such as iron acquisition ( e.g. , fepA-entD and tonB ) , iron storage ( ftnA and bfd ) , and Fe-S cluster biogenesis ( sufABCDSE ) , as well as the small RNA RyhB ( ryhB ) . 
+ Two of the 36 operons , amiA , encoding a peptidoglycan amidohydrolase , and yrbL , a gene of unknown function , were not reported previously to be Fur regulated . 
+ In summary , although this genomic approach correctly identifies most of the known Fur regulon and associated binding sites , the majority of Fur binding sites ( ~ 200 ) identified by ChIP-seq from either aerobic or anaerobic conditions do not lead to Fur-dependent changes in transcription under the growth conditions tested here . 
+ Anaerobiosis enhances the magnitude of Fur regulation for most of the Fur regulon . 
+ To further investigate the impact of O2 availability on expression of the four promoter regions bound by Fur only under anaerobic conditions , we compared RNA levels from gene expression profiling of Fur+ and Fur− strains grown under aerobic or anaerobic conditions . 
+ For these four operons ( preTA , gltA , dps , and appY ) ( Fig. 3 , middle panel ; see Table S3 in the supplemental material ) , we found that Fur-dependent changes in RNA levels were indeed greatest under anaerobic conditions . 
+ A similar trend was observed when individual promoter regions of preT and dps were fused to a lacZ reporter gene ( Fig. 3 , bottom panel ) and assayed for Fur-dependent changes in β-galactosidase activity in Fur+ and Fur− strains grown under aerobic or anaerobic conditions . 
+ In contrast , we did not observe a comparable effect of Fur on expression of the PappY-lacZ fusion as found with appY transcript levels . 
+ Since the activity of the appY promoter is known to be regulated by the nucleoid-associated protein H-NS ( 45 ) , transplanting this promoter out of its normal genomic context may eliminate the ability of Fur to increase expression , if Fur acts to prevent H-NS repression similar to the mechanism of ftnA induction ( 20 ) . 
+ The effect of O2 on Fur-dependent regulation of the entire regulon was also examined . 
+ By comparing RNA levels from gene expression profiling of Fur+ and Fur− strains , we found that Fur-dependent repression was greater under anaerobic compared to aerobic growth conditions for most of the Fur regulon ( Fig. 4 , top panel ; see Table S3 in the supplemental material ) . 
+ The genomic regions upstream of representative operons were further analyzed by assaying expression from promoter-lacZ fusions in the presence and absence of Fur under both aerobic and anaerobic growth conditions . 
+ Expression from promoter-lacZ fusions of three operons ( fhuA , bfd , and nrdHIEF ) recapitulated the increased anaerobic repression by Fur observed by expression profiling ( Fig. 5 ) . 
+ In addition , expression from promoter-lacZ fusions also revealed small increases in Fur repression of fepA and fhuE during anaerobiosis that we were unable to detect in genome-wide experiments due to their low expression levels in Fur+ strains ( Fig. 5 ; see Table S3 ) . 
+ Although not tested further here , it seems likely that other strongly repressed genes with low expression levels ( e.g. , fes , fepDGC , entS , and entCEBAH ) will also show some degree of Fur-dependent O2 regulation if examined by more sensitive assays . 
+ Furthermore , expression of ftnA , which is positively affected by Fur , was also increased under anaerobic conditions ( see Table S3 ) . 
+ Thus , Fur appears to be more active anaerobically , resulting in O2 regulation of many promoters within its regulon . 
+ However , for some Fur-repressed operons , O2 regulation was not solely mediated by Fur . 
+ For example , transcript levels from several operons known to be Fur targets ( fiu-ybiX , feoABC , yddAB-pqqL , sufABCDSE , yoeA , cirA , mntH , nrdHIEF , yrbL , fecABCDE , and sodA ) showed O2-dependent differences in Fur− strains ( see Table S3 ) . 
+ The genomic regions upstream of three such Fur-regulated operons ( feoA , fiu , and mntH ) were further analyzed by assaying expression from promoter-lacZ fusions in the presence and absence of Fur under both aerobic and anaerobic conditions ( Fig. 5 ) . 
+ In the case of feoABC , Fur repression was limited to anaerobic conditions , likely due to the known activation of this operon by the anaerobic transcription factors ArcA and FNR ( 26 , 27 ) . 
+ In contrast , Fur repression of fiu and mntH was greater under aerobic than anaerobic conditions . 
+ Although the mechanism is not known , MntR also represses mntH ( 46 ) , and ArcA binds to the fiu promoter region ( 27 ) . 
+ Finally , although not tested here , expression of the sufABCDSE , nrdHIEF , and sodA operons is known to also be controlled by the transcription factor IscR , whose activity is regulated by O2 availability ( 19 , 47 , 48 ) . 
+ Therefore , it is probable that , in addition to Fur , transcription factors such as IscR , ArcA , or FNR further modulate the expression of operons within this group under anaerobic conditions . 
+ Thus , anaerobiosis appears to have a major effect on expression of the Fur regulon . 
+ Many genes indirectly regulated by Fur appear to be novel RyhB targets . 
+ The small regulatory RNA RyhB is known to posttranscriptionally decrease expression of select iron-containing proteins and increase expression of certain iron uptake functions in an effort to spare iron for critical functions during iron-limiting , aerobic growth conditions ( 10 , 28 ) . 
+ Because RyhB is elevated in strains lacking Fur , we reasoned that some of the operons indirectly regulated by Fur ( i.e. , those that lacked a ChIP-seq peak ) might represent novel RyhB targets . 
+ Using the criteria that transcripts regulated by RyhB should return to wild-type levels when strains lacking Fur are also deleted for ryhB , we found that nearly one-third of the operons indirectly controlled by Fur under anaerobic growth conditions are candidates for direct RyhB regulation ( see Table S4 in the supplemental material ) . 
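The selection criterion described above can be sketched as a simple decision rule. This is only an illustration: the function name, the use of log2 fold changes relative to wild type, and the 2-fold threshold are assumptions made for the example and are not the statistical procedure used in the article. The sketch also ignores the possibility of dual regulation by Fur and RyhB.

```python
def classify_operon(log2_fur, log2_fur_ryhb, has_fur_peak, threshold=1.0):
    """Classify an operon from expression relative to wild type.

    log2_fur      -- log2(delta-fur / wild type) expression ratio
    log2_fur_ryhb -- log2(delta-fur delta-ryhB / wild type) expression ratio
    has_fur_peak  -- True if a Fur ChIP-seq peak lies upstream
    threshold     -- hypothetical cutoff; 1.0 corresponds to a 2-fold change
    """
    if abs(log2_fur) < threshold:
        return "not Fur regulated"
    if has_fur_peak:
        return "direct Fur target"
    # Indirectly regulated: expression returning to wild-type levels once
    # ryhB is also deleted points to regulation through RyhB.
    if abs(log2_fur_ryhb) < threshold:
        return "RyhB candidate"
    return "indirect, RyhB independent"


# Hypothetical values for illustration only
example = classify_operon(log2_fur=-1.5, log2_fur_ryhb=0.1, has_fur_peak=False)
```

In this toy example the operon is lower in the strain lacking Fur, has no Fur peak, and returns to wild-type levels in the double mutant, so it is labeled a RyhB candidate.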
+ The effect of RyhB on RNA levels was generally small ( ~ 2-fold ) , in agreement with previous reports ( 49 ) , although a few showed 5-fold changes . 
+ Control experiments comparing expression of wild-type ( Fur+ RyhB+ ) to Fur− RyhB− strains revealed very little differential gene expression as expected ( see Table S7 in the supplemental material ) . 
+ The remaining genes , whose expression was not RyhB regulated but which were indirectly influenced by Fur , are reported in Table S5 in the supplemental material . 
+ Of the 46 operons regulated by RyhB under anaerobic conditions , most are new candidates for RyhB regulation ( Fig. 4 , bottom panel ; see Table S4 in the supplemental material ) . 
+ For example , 13 of the 15 operons whose RNA levels were increased by RyhB expression are new potential targets . 
+ In addition to the known RyhB-dependent increase of shiA transcripts ( 29 ) , we observed increases in RNA levels for proteins involved in cellular motility ( flgBCDEFG , fliAZY , fliC , and fimICDFGH ) , metabolism ( e.g. , ybiV , ppsR , and bioA ) , and transport ( e.g. , ydeA , ynfM , and yohJ ) ( Fig. 4 , bottom panel ; see Table S4 ) . 
+ If these increases are due to direct effects of RyhB , then these data suggest that the positive effect of RyhB on transcript stability may be broader than just iron homeostasis . 
+ In contrast , the RNA levels of the 31 operons decreased by RyhB expression ( Fig. 4 , bottom panel ; see Table S4 ) under anaerobic conditions encode mostly iron-containing proteins , consistent with the paradigm of iron sparing ( 28 ) . 
+ Some of these operons were previously known or predicted to be regulated by RyhB ( e.g. , sodB , frdABCD , pflA , nuoABCEFGHIJKLMN , dmsABC , hypABCDE , and nirBDC ) ( 39 , 49 ) . 
+ However , most operons are new candidates for RyhB regulation and encode protein complexes involved in anaerobic respiration and metabolism ( hyaABCDEF , hycABCDEFGHI , hydN-hypF , fdhF , narGHJI , nrfABCDEF , appCB-yccB-appA , ynfEFGH , narK , and dppBCDF ) . 
+ The suite of RyhB downregulated genes differs between aerobic and anaerobic conditions . 
+ Many of the newly identified downregulated RyhB candidates are expressed preferentially under anaerobic conditions ( 27 , 32 , 33 , 36 ) , providing a plausible explanation for why they were not previously detected when cells grown under aerobic conditions were analyzed . 
+ Conversely , some known RyhB targets were not found to be regulated by RyhB under anaerobic conditions -- presumably because they were not sufficiently expressed . 
+ To test how extensively O2 influences the transcription of RyhB targets , we compared RNA levels from Fur− versus Fur− RyhB− strains grown under both aerobic and anaerobic growth conditions . 
+ Indeed , of the 44 transcripts negatively regulated by RyhB , 13 are expressed at higher levels in the presence of O2 , 14 are similarly expressed whether O2 is present , and 17 are expressed at higher levels under anaerobic conditions ( Fig. 4 , bottom panel ; see Table S4 in the supplemental material ) . 
+ We also found that Fur− strains grew slower than Fur+ strains under aerobic but not anaerobic conditions ( Fig. 6 ) . 
+ However , wild-type growth rates were reestablished in aerobic Fur− strains upon deletion of RyhB . 
+ The fact that this growth rate phenotype is only observed under aerobic growth conditions suggests the genes whose transcription is limited to aerobic conditions and are targeted by RyhB may be responsible for the observed growth defect . 
+ Together , these analyses highlight a major role of O2-dependent transcriptional changes in determining which mRNAs are targeted by RyhB to promote an iron-sparing response . 
+ Iron-containing proteins that are not regulated by RyhB . 
+ By analyzing the expression levels of all annotated iron-binding proteins ( according to the EcoCyc database [ 50 ] ) in our data set , we could separate out the iron-binding proteins that appear to evade RyhB regulation . 
+ This group was enriched in heme biosynthetic enzymes , cytochrome maturation functions , and heme proteins ( see Table S6 in the supplemental material ) . 
+ In addition , genes coding for iron-containing proteins that function in cofactor biosynthesis ( e.g. , bioB , lipA , nadA , ispG , and ispH ) , DNA repair ( nth and mutY ) , RNA modification ( rlmC , rlmD , rlmN , queG , ttcA , tsaD , and miaB ) , and transcriptional regulation ( soxR , nsrR , and fnr ) also do not appear to be subject to RyhB regulation ( see Table S6 ) . 
+ Perhaps some of these processes escape RyhB regulation because they are more critical to cellular function ( e.g. , RNA modification ) or too costly to abandon ( e.g. , synthesis of cofactors such as heme , thiamine , biotin , ubiquinone , or NAD ) upon iron deprivation . 
+ Possible dual regulation of genes by Fur and RyhB . 
+ Our data suggest that several operons are potentially coregulated by Fur and RyhB because we found an upstream Fur DNA binding site and differential gene expression in Fur− compared to Fur− RyhB− strains . 
+ For example , cirA , fecABCDE , yddAB-pqqL , yncE , ybiV , ydeA , and ppsR are bound upstream by Fur , and their RNA levels are increased by RyhB ( see Tables S3 and S4 in the supplemental material ) . 
+ In fact , cirA is known to be regulated by Fur ( 7 ) and RyhB ( 51 ) . 
+ Maximizing expression of known ( cirA and fecABCDE ) and predicted ( yddAB-pqqL and yncE ) iron uptake systems during iron-limiting conditions could be advantageous . 
+ Yet for a second group of genes ( including acnA , sdhCDAB , hybOAB , oppBCDF , tsx , sseB , and pepB ) , the potential opposing effects of Fur and RyhB and additional regulation by transcription factors such as ArcA and IscR preclude any conclusions without additional data ( see Table S4 ) . 
+ O2-dependent regulation of the metallome . 
+ Since we found that anaerobic conditions led to increased Fur-dependent repression of ferric uptake pathways and genes encoding other divalent cation-binding proteins ( Mn-binding SodA and NrdHIEF ) , we assayed whether cellular metal levels changed between aerobic and anaerobic conditions . 
+ The cellular levels of 11 elements ( Mn , Co , Ni , Zn , Mg , P , S , Fe , K , Cu , and Ca ) were measured by inductively coupled plasma mass spectrometry ( ICP-MS ) ( Table 1 ) . 
+ Fe was present at 19.1 ng/mg cell pellet during aerobic growth , corresponding to ~ 0.0063 % of the cellular dry weight -- ~ 3-fold less than previous reports ( 2 ) -- and showed a small 1.2-fold increase during anaerobic growth . 
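The unit conversion behind these two figures can be checked with a short calculation. The sketch below assumes that "cell pellet" refers to wet pellet mass, which the text does not state; under that assumption, the two reported numbers together imply the dry-weight fraction computed at the end. The variable names are illustrative only.

```python
# Reported values from the text
fe_ng_per_mg_pellet = 19.1        # ng Fe per mg cell pellet (aerobic growth)
reported_percent_dry = 0.0063     # percent of cellular dry weight

# 1 ng/mg is a mass fraction of 1e-6; multiply by 100 to express as percent
percent_of_pellet = fe_ng_per_mg_pellet * 1e-6 * 100

# Assumption: the pellet mass is wet mass, so the two figures differ by the
# fraction of the pellet that is dry matter.
implied_dry_fraction = percent_of_pellet / reported_percent_dry
```

Under this assumption, iron is about 0.0019 % of the pellet mass, and the reported 0.0063 % of dry weight corresponds to a dry-matter fraction of roughly 0.3.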
+ In contrast , Mn , Co , and Ca levels showed large O2-dependent differences in abundance : anaerobic cellular Mn and Co levels decreased 37-fold and 5.5-fold , respectively , whereas anaerobic Ca levels increased 2.9-fold compared to those in aerobic cells ( Table 1 ) . 
+ These data show that O2 availability has a broad effect on cellular metal homeostasis . 
+ DISCUSSION
+ The findings reported here show that the lack of O2 produces large and previously unreported effects on metal ion homeostasis in the enteric bacterium E. coli . 
+ Specifically we found that O2 availability impacts the expression of genes regulated by the two iron global regulators , Fur and RyhB , but for different reasons . 
+ Under anaerobic conditions , the positive or negative effects of the transcription factor Fur on expression of many genes were enhanced . 
+ In contrast , the O2-dependent changes in the genes posttranscriptionally regulated by RyhB could be attributed to differential transcription of these target mRNAs . 
+ This regulatory hierarchy suggests that the iron proteome may be differentially remodeled under iron starvation conditions , depending on O2 availability . 
+ Finally , enhanced Fur-dependent repression of manganese-cofactored enzymes was accompanied by a dramatic decrease in cellular manganese levels , suggesting previously unknown rewiring of the metallome under anaerobic growth conditions . 
+ Adaptation of the Fur regulon to anaerobic conditions . 
+ While our analysis of anaerobic cells allowed us to identify some new genes regulated by Fur , our major finding was the enhancement in Fur regulation in response to anaerobiosis . 
+ Thus , our results provide new insight into the sensitivity of the control of iron homeostasis in E. coli to O2 availability ( summarized in Fig. 7 ) . 
+ For example , expression of genes encoding several Fe3+-siderophore uptake systems was decreased under anaerobic conditions , consistent with decreased demand for ferric uptake . 
+ Fur also increased expression of the genes encoding two iron storage proteins , ferritin A and Dps , and decreased expression of bfd , encoding a protein that would facilitate iron release from bacterioferritin ( 52 ) , suggesting that iron storage may be increased anaerobically . 
+ However , the process by which iron is stored under anaerobic conditions in E. coli is unclear because the best-studied mechanisms for iron storage require O2 , mineralizing ~ 1,000 to 3,000 iron atoms/ferritin ( 1 , 3 ) and ~ 20 to 500 irons/Dps ( 2 , 53 ) . 
+ Thus , in the absence of an O2-dependent mineralization mechanism , less iron may actually be stored anaerobically , despite the increase in dps and ftnA expression . 
+ Nevertheless , we observed a small increase in total cellular iron levels under anaerobic conditions , raising the possibility that iron storage could be increased . 
+ Although the bulk of cellular iron is assumed to be in a bound state , allocated between iron-bound proteins and storage forms ( 2 , 54 ) , the overall distribution of iron in anaerobically grown cells is not known . 
+ Determining whether this increased iron is present in iron stores and protein cofactors or is unbound will be critical in addressing if Fur activity is enhanced under anaerobic conditions because of an increase in the `` labile '' iron pool . 
+ Our results also reinforce previous studies that Fe-S cofactor biosynthesis is regulated by O2 and iron availability ( Fig. 7 ) . 
+ First , expression of both the housekeeping Isc pathway and the stress-induced Suf Fe-S cluster biogenesis pathways is decreased under anaerobic conditions , due to the regulators IscR ( 47 , 55 ) and Fur ( 14 ) , respectively . 
+ Since some Fe-S clusters are known to be labile to O2 or reactive oxygen species ( 4 ) , the decrease in expression of Fe-S biogenesis pathways under anaerobic conditions might reﬂect a decreased demand for Fe-S clusters under conditions where clusters are more stable . 
+ However , when iron is limiting , this coordinate control of Fe-S biogenesis pathways should be disrupted under anaerobic conditions because RyhB downregulates expression of the Isc pathway ( 56 ) , whereas the loss of Fur repression promotes an increase in expression of the Suf pathway ( 14 -- 16 ) . 
+ Surprisingly , the enzymes required for heme biosynthesis or many heme-containing proteins were not found to be part of the Fur or RyhB regulon . 
+ How the ﬂux of iron into this pathway is controlled remains to be determined . 
+ RyhB connects iron status to O2-dependent transcriptional networks . 
+ This study also reveals extensive integration of the RyhB network with those that respond to O2 limitation , ensuring that cells produce the most appropriate suite of iron-containing proteins , depending on environmental conditions . 
+ For example , transcription of several genes encoding iron-containing proteins is regulated by O2 to tailor protein production to the appropriate mode of energy conservation ( e.g. , tricarboxylic acid [ TCA ] cycle and aerobic and anaerobic respiratory pathways ) ( 6 , 32 , 37 ) . 
+ When external iron is sufﬁcient , intracellular iron is available to synthesize the appropriate complement of iron-containing proteins , and transcriptional regulation by O2 is the primary point of control . 
+ However , when external iron is not sufﬁcient , RyhB is expressed and a second level of control is added ( Fig. 7 ) , which decreases mRNA levels of a subset of iron-containing proteins , making iron available for more `` essential '' iron proteins in the so-called `` iron-sparing response '' ( 28 , 57 ) . 
+ The observed downregulation of components of the TCA cycle and respiratory pathways by RyhB under both aerobic and anaerobic growth conditions suggests that this small RNA selectively targets respiration-linked energy conservation pathways to maintain the function of other iron-binding proteins under iron-limiting conditions . 
+ Surprisingly , targeting these mRNAs only affected the growth rate under aerobic conditions . 
+ Although the downregulation of pathways that generate NADH ( i.e. , TCA cycle ) and its oxidation ( i.e. , NADH dehydrogenase I ) by RyhB in E. coli may be sufﬁcient to explain this decreased growth rate , we can not exclude contributions from the downregulation of other RyhB targets , such as superoxide dismutase B ( SodB ) , which functions in reducing oxidative stress ( 58 ) . 
+ It is noteworthy that Bacillus subtilis appears to employ a similar strategy in that Δfur strains can not grow on the respiratory substrate succinate unless these strains also lack the small RNA FsrA , the B. subtilis equivalent of RyhB ( 59 ) . 
+ Anaerobiosis may impact the metallation state of other divalent cation-containing proteins . 
+ Comparison of the metallomes of E. coli K-12 between aerobic and anaerobic conditions revealed a large decrease in cellular manganese levels during anaerobiosis . 
+ This decrease in manganese was accompanied by enhanced anaerobic repression by Fur of two major Mn-containing enzymes , the Mn-superoxide dismutase encoded by sodA ( 18 , 60 ) and the Mn-ribonucleotide reductase encoded by nrdHIEF ( 19 ) . 
+ Both of these enzymes have iron-containing isozymes , encoded by sodB and nrdAB or nrdDG , which substitute for their Mn counterparts under anaerobic conditions ( 61 -- 64 ) . 
+ Thus , decreasing synthesis of the Mn isozymes when they are not needed under anaerobic conditions serves both to avoid wasting energy in synthesizing unnecessary polypeptides and also to possibly avoid their mismetallation when Mn is decreased ; the importance of maintaining the cellular Fe/Mn ratio for protein metal ion selectivity and cellular physiology has recently been reviewed ( 54 ) . 
+ The mechanism by which Mn levels decrease under anaerobic conditions is not known . 
+ Perhaps decreased Mn transport via the MntH Mn2+/Fe2+:H+ symporter , which is driven by the proton motive force ( 65 ) , could be limiting under the fermentative conditions of growth used in these experiments . 
+ We found that expression of the Mn exporter MntP ( 66 ) is reduced anaerobically ( see Table S3 in the supplemental material ) , so it would seem unlikely to play a role in this response . 
+ Interestingly , expression of the Mn-dependent isozyme of phosphoglycerate mutase , GpmM , is induced during anaerobiosis ( 67 ) . 
+ If this is the major form of the enzyme under anaerobic conditions , it is possible that cells have a system to prioritize manganese loading of GpmM under conditions of decreased cellular Mn . 
+ The MntS/RybA protein has been suggested to be an Mn chaperone ( 46 ) , which could perhaps carry out this activity . 
+ Fur binds the canonical Fur motif under either aerobic or anaerobic conditions . 
+ Prior to this study , the in vivo DNA binding sites of Fur had been mapped only under aerobic growth conditions ( 23 ) . 
+ We found good agreement between the locations of 59 high-signal-to-noise , iron-dependent Fur binding regions in our study and the binding regions mapped with the higher-resolution `` ChIP-exo '' approach ( 23 ) , which combines ChIP with exonuclease digestion and high-throughput sequencing ( see Table S2 in the supplemental material ) . 
+ Less overlap between our studies and ChIP-exo experiments was observed for peaks with a low signal-to-noise ratio ( ChIP signal of < 3,000 ) , which may be attributed to the complexity in resolving the background of ChIP experiments . 
+ Nevertheless , a large number of Fur binding regions mapped under either aerobic or anaerobic conditions did not result in transcriptional regulation . 
+ Some of these sites could function under different physiological conditions , as was proposed for similar `` transcriptionally inactive '' binding regions found with the transcription factor FNR ( 36 ) , or some may contribute to the overall nucleoid structure of the genome ( 68 ) . 
+ Perhaps the necessity of global regulators to bind more degenerate DNA sites to achieve regulation within many promoter regions might lead to binding at other regions of the genome as an unintended consequence . 
+ We also did not observe Fur binding in vivo to some sites predicted by an information theory model ( 44 ) , indicating that not all high-quality DNA sites are accessible to transcription factor binding . 
+ This property was also previously observed for the transcription factor FNR ( 36 ) , providing additional support to the notion that global regulators compete for occupancy in vivo with other DNA binding proteins . 
+ Why is Fur binding increased anaerobically ? 
+ Bioinformatic analysis of the regions bound by Fur under both aerobic and anaerobic conditions revealed DNA sites similar to the Fur motif deﬁned from previous studies ( 44 ) . 
+ In contrast , the Fur binding sites speciﬁc to anaerobic conditions were not predicted by this weight matrix model , consistent with the notion that these represent weaker afﬁnity sites . 
+ Because Fur occupancy of these less-conserved sequences increased under anaerobic conditions , we propose that there must be more active Fur under anaerobic conditions to bind to these putative weaker afﬁnity sites . 
+ Increased Fur DNA occupancy of previously known Fur sites could also explain increased repression and induction of many Fur-regulated genes during anaerobiosis . 
+ While increased Fur DNA binding could result from greater Fur protein abundance in the cell during anaerobiosis , we did not observe an increase in fur expression under anaerobic conditions . 
+ Thus , the mechanism to explain increased Fur activity and whether it is connected to the small increases in cellular iron observed under anaerobic conditions requires further study . 
+ In summary , our in vivo DNA binding and expression data suggest that Fur activity increases during anaerobiosis . 
+ Our ﬁndings reveal that in the absence of O2 , the Fur regulon is modiﬁed such that transcription of iron uptake genes is geared toward Fe2+ and expression of iron storage proteins is increased . 
+ Furthermore , Mn levels and expression of Mn-cofactored enzymes that have iron counterparts are decreased under anaerobic conditions . 
+ We also found that many potential targets of RyhB are also transcriptionally regulated by O2 , implying that the iron proteome is likely to be differentially remodeled in response to iron deprivation under anaerobic conditions compared to aerobic conditions . 
+ MATERIALS AND METHODS
+ Strain construction for global analyses and promoter activity assays . 
+ Relevant strains are listed in Table S1 in the supplemental material . 
+ Sequences of primers are available upon request . 
+ E. coli K-12 MG1655 ( F rph-1 ) served as the wild-type strain . 
+ To construct the Δfur strain , PK9427 , fur::kan from the Keio collection ( 69 ) was moved into MG1655 by transduction with P1 vir , selecting for kanamycin resistance ( Kmr ) . 
+ The kan cassette was removed by transforming strains with pCP20 , encoding the FLP recombinase ( 70 ) . 
+ To construct the ΔryhB ( PK10474 ) and Δfur ΔryhB ( PK10475 ) strains , ryhB::cat from PK7875 was moved into MG1655 and PK9427 , respectively , by transduction with P1 vir , selecting for chloramphenicol resistance ( Cmr ) . 
+ PK7875 was made by P1 vir transduction of ryhB::cat from EM1238 ( 9 ) into MG1655 and selection for Cmr . 
+ Strains bearing chromosomal promoter-lacZ fusions were constructed as previously described ( 47 ) . 
+ Brieﬂy , promoter regions of interest were ampliﬁed from MG1655 and cloned into pPK7035 . 
+ A lacI-kan-promoter-lacZ fragment , ampliﬁed from pPK7035 derivatives , was then electroporated into either BW25993 containing pKD46 ( 70 ) or PK9012 ( MG1655 c1857 mutS : : Tn10 cro-bioA ) . 
+ PK9012 was constructed in a manner previously described ( 55 ) . 
+ Chromosomal promoter-lacZ fusions were moved into MG1655 , PK9427 , and JEM609 by transduction with P1 vir , selecting for Kmr . 
+ All constructs were conﬁrmed by colony PCR and/or DNA sequencing . 
+ Growth of strains for global analyses . 
+ Strains were grown in gas-sparged Roux bottles at 37 °C in MOPS ( morpholinepropanesulfonic acid ) minimal medium with 0.2 % glucose and the indicated amount of FeSO4 ( 36 ) . 
+ For ChIP-seq or ChIP-chip ( ChIP with microarray technology ) analysis of MG1655 or PK9427 and for transcriptomic analysis of MG1655 , PK9427 , PK10474 , and PK10475 , the medium contained 10 μM FeSO4 . 
+ For ChIP-seq analysis of JEM609 ( MG1655 lacZ , tonB , feoABC , zupT [ 19 ] ) , the medium contained 1.0 μM FeSO4 to promote iron deﬁciency . 
+ A gas mixture of 70 % N2 , 5 % CO2 , and 25 % O2 was used for aerobic experiments , and a gas mixture of 95 % N2 and 5 % CO2 was used for anaerobic experiments ( 36 ) . 
+ Cells were harvested at an optical density at 600 nm ( OD600 ) of 0.3 to 0.35 , measured using a Perkin Elmer Lambda 25 UV/visible spectrophotometer . 
+ Chromatin immunoprecipitation followed by high-throughput sequencing or hybridization to a microarray chip . 
+ ChIP assays were performed as previously described ( 36 ) using antibodies speciﬁc to Fur that were puriﬁed over a His6-Fur-bound HiTrap N-hydroxysuccinimide ( NHS ) - activated high-performance ( HP ) column ( GE Healthcare ) ( 71 ) . 
+ Western blot analysis , performed as previously described ( 72 ) , showed that the puriﬁed antibody was speciﬁc for Fur ( see Fig. S1 in the supplemental material ) . 
+ For ChIP-seq experiments , DNA enriched from three replicates of aerobic Fur cultures ( MG1655 or MG1655 PfepA-lacZ ) , three replicates of anaerobic Fur cultures ( MG1655 or MG1655 PfepA-lacZ ) , two replicates of anaerobic iron-deﬁcient cultures ( JEM609 or JEM609 PfepA-lacZ ) , or one combined-input sample was submitted to the University of Wisconsin -- Madison DNA Sequencing Facility for library construction and Illumina sequencing ( Illumina Genome Analyzer IIx or Illumina HiSeq2000 [ all single-end , 1 × 50 bp ] ) per the manufacturer 's recommendations . 
+ Strains bearing PfepA-lacZ allowed for readout of cellular iron status ( see Fig. S2 in the supplemental material ) . 
+ Illumina sequencing FASTQ ﬁles were reformatted to the Sanger format using the FASTQ Groomer script ( 73 ) and reads ( aerobic Fur read counts of 21,140,380 , 25,480,433 , and 25,181,847 ; anaerobic Fur read counts of 17,617,567 , 16,084,736 , and 18,255,201 ; anaerobic iron-deﬁcient read counts of 18,361,478 and 20,432,024 ; and an input read count of 23,481,977 ) were mapped to the E. coli K-12 MG1655 genome ( version U00096.2 ) using the Bowtie 2 algorithm ( default settings ) ( 74 ) . 
+ Greater than 90 % of reads mapped to the genome for all samples . 
+ Enriched regions were identiﬁed using the peak-calling algorithm MOSAiCS ( 75 , 76 ) using a false discovery rate ( FDR ) of 0.1 . 
+ The dPeak algorithm ( 77 ) was used to deconvolute close-proximity peaks . 
+ A total of 517 unique peaks , having been found in at least two replicates , were identiﬁed across all strains and growth conditions . 
+ Data sets were normalized to 20 million reads , and 262 peaks of low read count ( < 2,500 reads at the peak summit ) were removed because they were present in both Fur and iron-deﬁcient cultures ( DBChIP [ 78 ] ; P 0.05 ) or did not visually conform to a peak above the local background . 
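The depth normalization described above can be illustrated with a minimal sketch. It assumes a simple linear rescaling of read counts to 20 million reads and treats 2,500 reads at the summit as a lower bound for a "high" peak; the comparator is not legible in the text, and the function names are hypothetical. Flagged peaks are only candidates for removal, since the text applies further criteria (DBChIP comparison and visual inspection) that are not modeled here.

```python
TARGET_DEPTH = 20_000_000  # common depth stated in the Methods


def scale_to_depth(summit_reads, library_size, target=TARGET_DEPTH):
    """Linearly rescale a raw summit read count to the target sequencing depth."""
    return summit_reads * target / library_size


def is_low_count(normalized_summit_reads, cutoff=2500):
    """Flag a peak as low read count (assumed to mean below the cutoff)."""
    return normalized_summit_reads < cutoff


# One of the aerobic read counts quoted in the text, with a made-up summit value.
normalized = scale_to_depth(3000, 25_480_433)
print(round(normalized, 1), is_low_count(normalized))
```

A library that already has 20 million reads is left unchanged by this rescaling.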
+ Normalized ChIP-seq data ﬁles were visualized with MochiView ( 79 ) . 
+ The ﬁnal peak list is given in Table S2 in the supplemental material . 
+ Fur binding site motifs were constructed by analyzing the 100 bp upstream and downstream of the dPeak-identiﬁed peak summits , submitting the sequences to MEME-ChIP ( 40 ) , and using the overrepresented sequences to construct the position weight matrix ( PWM ) . 
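As a rough illustration of the window extraction step, the sketch below cuts the 100 bp on either side of a peak summit out of a genome sequence so that the windows could be passed to a motif finder. The function name, the 0-based coordinate convention, and the decision to truncate at the sequence ends (rather than wrap around the circular chromosome) are assumptions, not details given in the text.

```python
def summit_window(genome, summit, flank=100):
    """Return the sequence covering `flank` bp on each side of a 0-based summit.

    The window is truncated at the ends of the sequence; wrap-around for a
    circular chromosome is deliberately not handled in this sketch.
    """
    start = max(0, summit - flank)
    end = min(len(genome), summit + flank + 1)
    return genome[start:end]


toy_genome = "ACGT" * 100  # 400 bp placeholder sequence, not real data
print(len(summit_window(toy_genome, 200)))  # full window: 2 * 100 + 1 bases
```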
+ For ChIP-chip experiments , DNA enriched from one replicate of an anaerobic Fur culture ( PK9472 ) and input DNA was ampliﬁed , labeled , and hybridized to a custom-made E. coli K-12 MG1655 tiled-genome microarray ( Roche NimbleGen , Inc. , Madison , WI [ 80 ] ) ; hybridized microarrays were scanned using a GenePix 4000B ( Axon Instruments ) microarray scanner as previously described ( 36 ) . 
+ ChIP and input data were quantile normalized using `` normalize.quantiles '' from the preprocessCore package ( 81 ) in R ( 82 ) , and enriched binding regions were identiﬁed using the CMARRT peak-calling algorithm ( P 0.1 ) ( 83 ) . 
+ Enriched regions were removed from the ﬁnal ChIP-seq peak list if they were also present in the Δfur strain data . 
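For readers unfamiliar with quantile normalization, the following is a minimal pure-Python sketch of the idea: each sample is sorted, the values at each rank are averaged across samples, and every value is replaced by the mean for its rank. It is not the preprocessCore implementation; in particular, ties are broken by position here, which may differ from the R package.

```python
def quantile_normalize(columns):
    """Give every column the same distribution (the mean of the sorted columns)."""
    n = len(columns[0])
    if any(len(col) != n for col in columns):
        raise ValueError("all columns must have the same length")
    sorted_cols = [sorted(col) for col in columns]
    # Mean across samples of the values sharing each rank.
    rank_means = [sum(col[i] for col in sorted_cols) / len(columns) for i in range(n)]
    result = []
    for col in columns:
        order = sorted(range(n), key=lambda i: col[i])
        normalized = [0.0] * n
        for rank, idx in enumerate(order):
            normalized[idx] = rank_means[rank]
        result.append(normalized)
    return result


chip = [5.0, 2.0, 3.0]   # made-up probe intensities
inp = [4.0, 1.0, 6.0]
print(quantile_normalize([chip, inp]))
```

After normalization both samples contain the same set of values, so differences between ChIP and input reflect rank changes rather than overall intensity scale.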
+ RNA isolation and whole-genome transcriptomic microarray analysis . 
+ RNA was isolated from two biological replicates of MG1655 , PK9472 , PK10474 , and PK10475 under aerobic or anaerobic growth conditions as previously described ( 36 ) . 
+ Ten micrograms of RNA was reverse transcribed and labeled with Amersham Cy3 monoreactive dye ( GE Healthcare ) as previously described ( 80 ) . 
+ The puriﬁed , labeled cDNAs were fragmented with DNase I ( 0.1 U per μg of cDNA ) for 10 min at 37 °C . 
+ DNase I was then inactivated at 95 °C for 10 min ; samples from ΔryhB and Δfur ΔryhB strains also included 0.85 mM EDTA in this reaction . 
+ Approximately 0.6 to 1.5 μg of precipitated Cy3-labeled cDNA was hybridized to a custom-made E. coli K-12 MG1655 tiled-genome microarray ( Roche NimbleGen , Inc. , Madison , WI [ 80 ] ) as previously described ( 36 ) . 
+ Hybridized microarrays were scanned using a GenePix 4000B ( Axon Instruments ) microarray scanner , and the photomultiplier tube ( PMT ) was adjusted so that the median ﬂuorescence was just below 100 . 
+ Raw probe intensities were normalized across all samples using the Robust Multichip Average ( RMA ) algorithm in the NimbleScan software package ( version 2.5 [ 84 ] ) . 
+ After probes were matched to gene coordinates ( `` IRanges '' package [ 85 ] in R [ 82 ] ) , differential expression of genes between experiments was determined using an analysis of variance ( ANOVA ) test ( `` anova.test '' in R [ 82 ] ) , and P values were adjusted using the Benjamini-Hochberg false discovery rate control procedure ( 86 ) ( `` p.adjust '' with method `` BH '' in R [ 82 ] ; FDR , 0.01 ) to address the multiple testing issue . 
+ For differentially expressed genes , the experiments in which gene expression was signiﬁcantly different from those of other experiments were identiﬁed by a post hoc test using the Tukey 's honestly signiﬁcant difference method ( 87 ) ( `` TukeyHSD '' in R [ 82 ] ; P 0.01 ) . 
+ Genes were further required to show at least a 1.5-fold change in expression between experiments . 
+ A cutoff of 1.5-fold change in expression between experiments was chosen based on known regulation of the Isc pathway by RyhB ( 56 ) . 
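The two ﬁlters described above (Benjamini-Hochberg adjustment at an FDR of 0.01, then a 1.5-fold change requirement) can be sketched as follows. This is an illustrative Python re-implementation, not the R code used in the study; the function names are hypothetical, the strict `<` comparison for the FDR is an assumption, the expression values are assumed to be log2 scale, and the Tukey post hoc step is omitted.

```python
import math


def benjamini_hochberg(pvalues):
    """Return BH-adjusted P values in the original order."""
    n = len(pvalues)
    # Walk from the largest P value down, enforcing monotonicity.
    order = sorted(range(n), key=lambda i: pvalues[i], reverse=True)
    adjusted = [0.0] * n
    running_min = 1.0
    for k, idx in enumerate(order):
        rank = n - k  # 1-based rank of this P value in ascending order
        running_min = min(running_min, pvalues[idx] * n / rank)
        adjusted[idx] = running_min
    return adjusted


def is_regulated(adjusted_p, log2_a, log2_b, fdr=0.01, fold=1.5):
    """Apply the FDR threshold and the minimum fold change to log2 values."""
    return adjusted_p < fdr and abs(log2_a - log2_b) >= math.log2(fold)


print(benjamini_hochberg([0.01, 0.04, 0.03, 0.005]))
```

Working on the log2 scale turns the 1.5-fold requirement into an absolute difference of at least log2(1.5), roughly 0.585.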
+ Regulation of an operon was reported to be RyhB dependent only if expression changed under the condition in which the transcript was most expressed . 
+ The expression value of a given gene is the sum of the intensity of the probes that overlap that gene , from both biological replicates , divided by the length of that gene , log2 transformed . 
+ Promoter activity measurements by β-galactosidase assay . 
+ Strains bearing promoter-lacZ fusions were grown at 37 °C in MOPS minimal medium containing 10 μM FeSO4 and 0.2 % glucose to an OD600 of ~ 0.2 to 0.4 under either aerobic or anaerobic conditions , and promoter activity was measured as previously described ( 88 ) . 
+ Differences in aerobic and anaerobic cell counts for cells grown in minimal medium were corrected by multiplying aerobic activity by 1.5 ( 72 ) . 
+ Assays were repeated at least three times , and error bars represent the standard error from these biological replicates . 
+ Cellular element analysis . 
+ Triplicate cultures of MG1655 were grown in MOPS minimal medium containing 10 μM FeSO4 and 0.2 % glucose under either aerobic or anaerobic growth conditions in gas-sparged Roux bottles as in global analyses . 
+ At an OD600 of ~ 0.4 , equal numbers of cells were centrifuged , resuspended in MOPS minimal medium containing 10 μM FeSO4 , 0.2 % glucose , and 10 mM diethylenetriaminepentaacetic acid ( DTPA [ Sigma Aldrich ] ) , and incubated for 15 min at 37 °C to remove contaminating surface metals ( 89 ) . 
+ Cells were centrifuged and resuspended twice in 20 mM Tris-HCl ( pH 7.4 ) and then transferred to preweighed 10 % HCl-treated tubes . 
+ Cells were centrifuged , aspirated of the remaining liquid , and resuspended in H2O to 0.33 mg cell pellet/μl . 
+ Cells were lysed on ice by sonication with a cup horn-equipped Misonix S-4000 sonicator at 10-s on-off intervals of 60 % output for 60 min . 
+ At the Wisconsin State Laboratory of Hygiene , 200 μl of this cell lysate was digested with 100 μl of tetramethylammonium hydroxide ( TMAH ) for 1 h at 70 °C prior to dilution with 4 % HNO3 ( 90 ) and element analysis by magnetic sector inductively coupled plasma mass spectrometry using a Thermo-Finnigan Element 2 plasma mass spectrometer ( 91 ) . 
+ Microarray data accession number . 
+ ChIP-seq , ChIP-chip , and tiling array data sets have been deposited in the Gene Expression Omnibus ( GEO ) under accession no. GSE74933 . 
+ SUPPLEMENTAL MATERIAL
+ Supplemental material for this article may be found at http://mbio.asm.org/lookup/suppl/doi:10.1128/mBio.01947-15/-/DCSupplemental . 
+ Figure S1 , TIF ﬁle , 0.2 MB . 
+ Figure S2 , TIF ﬁle , 0.04 MB . 
+ Table S1 , DOCX ﬁle , 0.1 MB . 
+ Table S2 , DOCX ﬁle , 0.1 MB . 
+ Table S3 , DOCX ﬁle , 0.1 MB . 
+ Table S4 , DOCX ﬁle , 0.2 MB . 
+ Table S5 , DOCX ﬁle , 0.1 MB . 
+ Table S6 , DOCX ﬁle , 0.1 MB . 
+ Table S7 , PDF ﬁle , 3.3 MB . 
+ ACKNOWLEDGMENTS
+ We thank Marie Adams at the University of Wisconsin -- Madison DNA Sequencing Facility for help with ChIP library construction and high-throughput sequencing . 
+ We acknowledge Martin Shafer of the Wisconsin State Lab of Hygiene for help with whole-cell element analyses . 
+ We thank Jason M. Peters for advice on tiling microarray experiments . 
+ We thank Jim A. Imlay for advice and strain JEM609 . 
+ This work was funded by a grant from the NIH to P.J.K. ( R01GM045844 ) . 
+ N.A.B. was supported by University of Wisconsin -- Madison NIH Chemistry Biology Interface training grant T32GM008505 .
\ No newline at end of file
--- a/data/TEXT_FILES/useful_txt/26673755.txt 0 → 100644
View file @27818a9
+++ b/data/TEXT_FILES/useful_txt/26673755.txt 0 → 100644
View file @27818a9
+ Genomic Targets and Features of BarA-UvrY
+ $ a Current address : Department of Oral Biology , University of Florida , College of Dentistry , Gainesville , FL , 32610 -- 0424 $ b Current address : Integrated DNA Technologies , Molecular Genetics Department , 1710 Commercial Park , Coralville , IA , 52241 $ c Current address : State Key Laboratory for Conservation and Utilization of Subtropical Agro-bioresources , and College of Life Science and Technology , Guangxi University , Nanning , Guangxi , PR China * tromeo@ufl.edu 
+ Abstract
+ The two-component signal transduction system BarA-UvrY of Escherichia coli and its orthologs globally regulate metabolism , motility , biofilm formation , stress resistance , virulence of pathogens and quorum sensing by activating the transcription of genes for regulatory sRNAs , e.g. CsrB and CsrC in E. coli . 
+ These sRNAs act by sequestering the RNA binding protein CsrA ( RsmA ) away from lower affinity mRNA targets . 
+ In this study , we used ChIP-exo to identify , at single nucleotide resolution , genomic sites for UvrY ( SirA ) binding in E. coli and Salmonella enterica . 
+ The csrB and csrC genes were the strongest targets of cross-linking , which required UvrY phosphorylation by the BarA sensor kinase . 
+ Crosslinking occurred at two sites , an inverted repeat sequence far upstream of the promoter and a site near the -35 sequence . 
+ DNAse I footprinting revealed specific binding of UvrY in vitro only to the upstream site , indicative of additional binding requirements and/or indirect binding to the downstream site . 
+ Additional genes , including cspA , encoding the cold-shock RNA-binding protein CspA , showed weaker crosslinking and modest or negligible regulation by UvrY . 
+ We conclude that the global effects of UvrY/SirA on gene expression are primarily mediated by activating csrB and csrC transcription . 
+ We also used in vivo crosslinking and other experimental approaches to reveal new features of csrB/csrC regulation by the DeaD and SrmB RNA helicases , IHF , ppGpp and DksA . 
+ Finally , the phylogenetic distribution of BarA-UvrY was analyzed and found to be uniquely characteristic of γ-Proteobacteria and strongly anticorrelated with fliW , which encodes a protein that binds to CsrA and antagonizes its activity in Bacillus subtilis . 
+ We propose that BarA-UvrY and orthologous TCS transcribe sRNA antagonists of CsrA throughout the γ-Proteobacteria , but rarely or never perform this function in other species . 
+ Introduction
+ The ability of bacteria to flourish under diverse environmental conditions requires their physiology and metabolism to be regulated by complex transcriptional and posttranscriptional circuitries . 
+ The Csr ( carbon storage regulator ) or Rsm ( repressor of stationary phase metabolites ) system is a post-transcriptional regulatory system of E. coli and other γ-proteobacteria that is extensively interconnected with transcriptional regulatory circuits [ 1 -- 8 ] . 
+ Its centerpiece , CsrA , is a small dimeric RNA binding protein that regulates mRNA translation , turnover and transcription termination [ 3 , 9 -- 11 ] . 
+ CsrA activity in E. coli and Salmonella enterica serovar Typhimurium ( hereafter referred to as Salmonella ) is regulated by CsrB and CsrC sRNAs , which utilize multiple CsrA binding sites to sequester CsrA away from its lower affinity mRNA targets [ 3 , 12 -- 14 ] . 
+ CsrB/C levels are regulated by factors affecting both their synthesis and turnover [ 15 ] . 
+ CsrB and CsrC transcription is directly regulated by the BarA-UvrY or BarA-SirA two-component signal transduction system ( TCS ) in E. coli [ 13 , 16 , 17 ] and Salmonella [ 14 , 18 , 19 ] , respectively . 
+ The BarA protein belongs to a family of membrane-associated tripartite sensor-kinases and UvrY belongs to the FixJ family of response regulators [ 14 , 19 -- 21 ] . 
+ BarA is required for sensing the presence of acetate , formate and other carboxylate compounds by an undetermined mechanism [ 22 -- 24 ] . 
+ This leads to autophosphorylation at a conserved histidine residue ( His302 ) , and transphosphorylation of UvrY/SirA through a His302 → Asp718 → His861 → Asp54 phosphorelay [ 20 , 22 ] . 
+ Phosphorylated UvrY-P/SirA-P , in turn , binds to its DNA targets and regulates their transcription . 
+ For instance , in vitro binding studies suggest that the SirA protein ( SirA-P ) binds to DNA sequences located upstream of target genes for Csr sRNAs and activates their transcription in Salmonella [ 18 , 19 , 25 ] . 
+ BarA-UvrY and its orthologs in other γ-proteobacteria , including GacS/GacA ( Pseudomonas ) , VarS/VarA ( Vibrio ) , ExpS/ExpA ( Pectobacterium ) and LetS/LetA ( Legionella pneumophila ) have been reported to regulate virulence , metabolism , biofilm formation , stress resistance , quorum sensing and secretion systems [ 4 , 20 , 26 -- 30 ] . 
+ Transcriptomics studies of UvrY and its orthologs have shown effects on the expression of numerous genes [ 30 -- 32 ] . 
+ An understanding of which of these effects represents direct regulation vs. indirect regulation via effects on the Csr sRNAs is necessary for modeling of the complex genetic circuitry that underpins the systems biology of these species . 
+ This question has been examined in only one case , which used ChIP-on-chip analysis to conclude that only the genes for Csr sRNAs ( RsmY , RsmZ ) in Pseudomonas aeruginosa are direct targets for GacA binding [ 33 ] . 
+ Here , we used ChIP-exo , an advanced procedure for determining genome-wide DNA-protein interactions with single nucleotide resolution [ 34 ] to map the UvrY and SirA DNA binding sites across the E. coli and Salmonella genomes . 
+ The csrB and csrC genes were by far the strongest direct targets of binding in both species . 
+ These genes exhibited crosslinking at two distinct locations . 
+ By comparing the results of in vivo crosslinking with in vitro DNA binding assays , an 18 NT palindrome or inverted repeat sequence ( IR ) located far upstream of the promoter was found to serve as a specific binding site for UvrY at csrB and csrC . 
+ Disruption of barA eliminated UvrY binding in vivo . 
+ IHF ( integration host factor ) was required for optimal UvrY binding to and transcriptional activation of csrB but not csrC . 
+ In addition to the csrB/C genes , weaker genomic binding sites of UvrY were identified by ChIP-exo . 
+ Several genes associated with the weaker binding sites were tested and found to exhibit negligible or only modest regulation by UvrY , suggesting that diverse effects of this TCS are mediated through Csr circuitry . 
+ In addition to BarA-UvrY , other factors activate csrB/C transcription in E. coli , such as CsrA [ 13 , 16 , 35 ] , DksA and ppGpp of the stringent response [ 2 ] , and the DEAD-box RNA helicases DeaD and SrmB [ 4 ] . 
+ The stringent response describes a regulatory network of eubacteria that responds to amino acid starvation and other stresses [ 2 ] . 
+ Activation of this system is characterized by a rapid downshift in synthesis of stable RNAs , such as rRNA and tRNA and stimulation of the expression of genes involved in amino acid biosynthesis and transport , although these processes represent a fraction of its global regulatory role [ 36 ] . 
+ The effector of this response is the nucleotide secondary messenger guanosine tetraphosphate , ppGpp , also known as `` magic spot '' [ 37 ] . 
+ Complexed with RNA polymerase , ppGpp positively or negatively affects transcription [ 38 ] . 
+ In most cases , regulation by ppGpp requires the RNA-polymerase associated transcription factor DksA [ 39 , 40 ] , which acts together with ppGpp to synergistically regulate expression of a number of global regulators and other genes . 
+ The DEAD-box RNA helicases are enzymes that utilize ATP energy to alter RNA structure or RNA-protein interactions [ 4 ] . 
+ Their roles in the posttranscriptional regulation of bacterial gene expression are presently underappreciated . 
+ DeaD regulates uvrY translation by altering mRNA structure , while SrmB uses a distinct , but incompletely understood mechanism . 
+ We used a combination of in vivo and in vitro approaches to refine our understanding of how these regulatory factors affect csrB/C transcription . 
+ Bioinformatics analyses revealed that BarA-UvrY orthologs are strongly anti-correlated with the fliW gene , which encodes a protein that binds to and antagonizes CsrA of B. subtilis [ 41 ] , indicating that few if any species use both FliW and BarA-UvrY transcribed sRNAs as CsrA antagonists . 
+ These studies advance our understanding of this global regulatory circuitry and highlight the importance of including post-transcriptional regulation along with transcriptional control when modeling bacterial regulatory networks . 
+ Bacterial strains, plasmids and primers used in this project are listed in S1 Table.
+ Bacterial strains and culture conditions
+ LB medium ( 1 % [ w/v ] Tryptone ; 1 % [ w/v ] NaCl and 0.5 % [ w/v ] yeast extract ) was used for culture of bacteria unless stated otherwise . 
+ The antibiotics ampicillin ( 100 μg mL−1 ) , tetracycline ( 15 μg mL−1 ) , kanamycin ( 50 μg mL−1 ) , and chloramphenicol ( 25 μg mL−1 ) were included in growth media as needed . 
+ Bacterial cultures were stored at -80 °C in medium containing ~ 15 % glycerol . 
+ To revive cultures , LB medium ( 2 mL ) was inoculated from the frozen stock cultures and incubated with shaking ( 250 rpm ) at 37 °C overnight . 
+ The overnight cultures served as inoculum ( 1:1000 ) for LB medium or Kornberg medium ( 1.1 % [ wt/vol ] K2HPO4 , 0.85 % [ wt / vol ] KH2PO4 , 0.6 % [ wt/vol ] yeast extract containing 0.5 % [ wt/vol ] glucose ) . 
+ Growth was determined by monitoring OD600 and/or by assaying total cellular protein . 
+ Construction of chromosomal deletions and FLAG® fusion proteins
+ Gene deletions ( of E. coli ) and carboxy-terminally 3XFLAG®-tagged constructs ( of E. coli and Salmonella ) were introduced by the Red recombinase method , as described [ 42 , 43 ] . 
+ For gene deletions , primers carrying sequences ( 40 nt ) for the beginning ( forward primer ) or end ( reverse primer ) of the open reading frame were designed ( S1 Table ) and plasmids pKD13 ( or pKD3 ) were used as template DNAs for PCR amplification [ 43 ] . 
+ For constructing 3XFLAG tag constructs , primers carrying sequences ( 40 nt ) matching the terminus of the targeted gene ( forward primer ) and the region downstream from it ( reverse primer ) were designed ( S1 Table ) and plasmid pSUB11 was used as template DNA for PCR amplification [ 42 , 43 ] . 
+ PCR products were gel purified and used directly for electro-transformation . 
+ E. coli and Salmonella strains carrying pKD46 helper plasmid were grown at 30 °C in SOB ( 0.5 % [ w/v ] yeast extract ; 2 % [ w/v ] tryptone ; 10 mM NaCl , 2.5 mM KCl ; 10 mM MgCl2 ; 10 mM MgSO4 ) supplemented with 100 μg/mL ampicillin and 10 mM arabinose to mid-log ( OD600 of 0.5 ) , collected by centrifugation , washed three times with ice-cold 10 % glycerol and resuspended with ice-cold 10 % glycerol . 
+ The transformation was then performed by electroporation . 
+ After 1h recovery at 37 °C in SOB medium , bacteria were spread onto LB agar plates supplemented with antibiotics for the selection of CmR or KnR recombinants . 
+ Correct insertion of the marker and 3XFLAG sequence in the genome was confirmed by PCR amplification and by sequencing of the insertion site using primers listed in S1 Table . 
+ When necessary , the FRT-flanked antibiotic resistance cassette was removed using pCP20 [ 43 ] . 
+ Construction and purification of carboxy-terminally His-tagged UvrY protein
+ His-tagged UvrY ( UvrY-His6 ) protein was constructed by PCR amplification of the coding sequence of uvrY gene using genomic DNA of E. coli MG1655 as template DNA and oligonucleotides UvrY-6xhis-F and UvrY-6xhis-R as primers ( S1 Table ) . 
+ The amplicon was gel-purified , digested using Nde I and Xho I restriction enzymes ( NEB ) , and cloned into a similarly digested and dephosphorylated pET24-a ( + ) vector DNA ( NEB ) using electrocompetent DH5α for the transformation . 
+ The pET-UvrY plasmid was confirmed by gel electrophoresis and sequencing of the inserted DNA using T7-promoter and T7 terminator primers . 
+ The pET-UvrY plasmid was moved into BL21 ( DE3 ) E. coli strain for expression . 
+ The E. coli BL21_DE3 / pET-UvrY cells carrying UvrY-His6 were grown in 500 mL of LB containing 50 μg / mL kanamycin at 37 °C . 
+ At OD600 of 0.6 , expression of the UvrY-His6 was induced by the addition of isopropyl β-D-1-thiogalactopyranoside ( IPTG ) to 1 mM , followed by 2h of shaking incubation at 37 °C , 250 rpm . 
+ The purification procedure for UvrY-His6 was similar to the method previously described for isolation of DeaD-His6 with some modification [ 4 ] . 
+ Cells were collected by centrifugation ( 600 x G , 4 °C , 5 min ) , suspended in buffer ( 20 mM Tris-HCl [ pH 7.9 ] , 500 mM NaCl , 20 mM imidazole ) containing a protease inhibitor cocktail ( cOmplete Mini , EDTA-free protease inhibitor cocktail , Roche Diagnostics ) , and lysed using a French Press . 
+ The lysates were centrifuged ( 20,000 x G , 15 min , 4 °C ) to remove unbroken cells and cell debris and the supernatant solutions were purified by affinity chromatography on a HisTrap HP column as instructed by the manufacturer ( GE Healthcare Life Sciences ) . 
+ Fractions containing UvrY were pooled and dialyzed against UvrY storage buffer ( 20 mM Tris-HCl ( pH 7.9 ) , 150 mM NaCl , 1 mM dithiothreitol and 10 % glycerol ) . 
+ UvrY was then aliquoted , frozen using liquid nitrogen , and stored at − 80 °C . 
+ The bicinchoninic acid method was used for assaying protein concentration , as recommended ( Pierce Biotechnology ) . 
+ UvrY phosphorylation in vitro
+ For in vitro phosphorylation , the purified UvrY-His6 protein was incubated with 100 mM acetyl-phosphate ( Sigma ) in a phosphorylation buffer containing 50 mM HEPES pH 7.5 , 100 mM NaCl and 10 mM MgCl2 . 
+ The reaction mixtures were incubated for 60 min at room temperature . 
+ To determine the relative amount of UvrY-P formed , the reactions were fractionated on Phos-tag™ SDS-PAGE gels [ 1.0 mm Protean 3 ( Bio-Rad ) minigels ] of the following composition : 7.5 % separating gel [ 7.5 % ( 29:1 ) acrylamide : bis-acrylamide , 357 mM Bis-Tris , pH 6.8 , 100 μM Zn ( NO3 ) 2 , and 50 μM Phos-tag reagent ( Wako Pure Chemical Industries ) ] with a 4 % stacking gel [ 4 % ( 29:1 ) acrylamide : bis-acrylamide , 357 mM Bis-Tris , pH 6.8 ] . 
+ UvrY-P was resolved by electrophoresis at constant 150 V at 4 °C for 70 min using modified MOPS running buffer ( 100 mM Tris , 100 mM MOPS , 0.5 % SDS , and 5 mM NaHSO3 ) . 
+ Gels were subsequently stained with Coomassie blue and the signals were imaged using a ChemiDoc XRS + system ( Bio-Rad ) and quantified using Quantity One image analysis software ( Bio-Rad ) . 
+ ChIP-exo and deep sequencing
+ E. coli and Salmonella strains carrying biologically functional UvrY-FLAG [ 4 ] or SirA-FLAG proteins ( S1 Fig ) were grown in Kornberg medium and LB ( supplemented with 10 mM glucose ) , respectively at 37 °C , 250 rpm . 
+ At mid-exponential phase of growth ( OD600 of 0.6 ) , formaldehyde ( 1 % final concentration ) was added and the cultures were incubated for 20 min at 30 °C , 150 rpm . 
+ The crosslinking reaction was then quenched by the addition of 5 mL of 1.0 M glycine , pH 8.0 to 10 mL of culture . 
+ The samples were kept at room temperature for 5 min with gentle swirling . 
+ The cells were harvested by centrifugation ( 6000 x G , 5 min , 4 °C ) , washed twice with ice-cold 1x PBS , suspended in 500 μL lysis buffer ( 10 mM Tris-HCl pH 8.8 , 50 mM NaCl , 20 % [ w/v ] sucrose , 10 mM EDTA ) containing protease inhibitor cocktail ( cOmplete Mini , EDTA-free protease inhibitor cocktail , Roche Diagnostics ) and 2 mg/mL lysozyme . 
+ After 30 min on ice , 500 μl of 2x IP buffer ( 100 mM Tris-HCl , pH 7.0 , 300 mM NaCl , 2 % Triton X-100 , 2 mM EDTA ) was added and the samples were incubated at 37 °C for 10 min , followed by 2 min on ice . 
+ The samples were then sonicated on ice ( 60 pulses total , 5 sec on and 5 sec off per pulse , in a total of 6 sets separated by 2 min on ice ) using a sonicator ( Fisher Scientific , Sonic Dismembrator Model 500 , set at 20 % amplitude ) . 
+ Unbroken cells and debris were removed by centrifugation . 
+ Sonicated samples ( 1 mL ) were treated with 20 μL ANTI-FLAG® M2 beads ( Sigma ) by spinning on a tube rotating mixer ( Thermo Scientific ) at 4 °C for 4 hr to immunoprecipitate UvrY-FLAG . 
+ The beads were washed three times with 1 mL 1X IP buffer and two times with 1 mL 1X TE buffer ( 10 mM Tris-HCl pH 8.0 , 1 mM EDTA ) . 
+ The beads were resuspended in 100 μL TE buffer and the formaldehyde crosslinking was reversed by heating at 95 °C for 20 min . 
+ Afterward , the samples were incubated with 8 μL of 10 mg/mL RNase A for 2 h at 37 °C and with 4 μL of 20 mg/mL proteinase K at 55 °C for 2h . 
+ The resulting DNA was purified using a Qiagen MinElute PCR Purification Kit according to manufacturer 's instructions . 
+ As a control , wild type strains expressing no FLAG epitope tagged protein were treated using the same procedures . 
+ The specificity of the ChIP assay was confirmed by amplifying promoter DNA of the csrB , lacY and 16s rRNA ( rrsH ) genes with primers ( S1 Table ) using the DNA recovered from the ChIP reactions as a template ( S2 Fig ) . 
+ csrB was used as a positive control for UvrY/SirA binding while lacY ( E. coli ) and 16S rDNA ( Salmonella ) were used as negative controls . 
+ After confirming the specificity of the ChIP reactions , the DNA samples were frozen and shipped to Peconic LLC for the remainder of the ChIP-exo procedure and deep sequencing with Illumina Hi-seq 2000 as described previously [ 34 ] . 
+ ChIP-exo data analysis
+ Sequencing reads were mapped to their respective E. coli MG1655 ( NC_000913.3 ) and Salmonella enterica subsp . enterica serovar Typhimurium 14028S ( NC_016856.1 ) genomes with BWA ( Burrows-Wheeler Aligner , available from http://bio-bwa.sourceforge.net ) . 
+ Alignments were visualized using the Integrative Genomics Viewer ( IGV , Broad Institute ) [ 44 , 45 ] . 
+ Peaks were identified in the mapped reads with GeneTrack [ 46 ] . 
+ Low confidence singleton peaks and those without matching peaks on the opposing strand were discarded . 
+ Enriched genomic loci were classified into three groups based on enrichment of read counts relative to read counts within 500 bp upstream of the transcription start sites of lacY and/or 16s rRNA ( rrsH ) , which were determined not to be direct targets of UvrY/SirA ( S2 Fig ) . 
+ Maps of enriched regions are shown as bedgraphs , which display in a continuous fashion the abundance of sequencing reads identified at each position in the genome . 
+ For motif discovery , DNA sequences centered on the peak-pair midpoint of each enriched genomic locus and extending 40 nt upstream and 40 nt downstream were extracted . 
+ The UvrY/SirA crosslinking site is therefore centered in each 81 nt-long extracted DNA sequence . 
+ The extracted DNA sequences were uploaded into MEME Suite [ 47 ] for motif discovery analysis . 
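The window extraction described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `extract_window` and `to_fasta` are hypothetical, midpoints are taken to be 1-based coordinates on a linear sequence (wrap-around on a circular chromosome is not handled), and the toy sequence stands in for a real genome. It is not the authors' pipeline.

```python
def extract_window(genome, midpoint, flank=40):
    """Return the (2 * flank + 1)-nt window centered on a 1-based midpoint.

    With flank=40 this yields the 81-nt sequence described in the text.
    Assumption: the genome is treated as linear.
    """
    start = midpoint - flank  # 1-based, inclusive
    end = midpoint + flank    # 1-based, inclusive
    if start < 1 or end > len(genome):
        raise ValueError("window extends past the end of the sequence")
    return genome[start - 1:end]


def to_fasta(windows):
    """Format a {name: sequence} mapping as FASTA text for motif discovery."""
    return "".join(f">{name}\n{seq}\n" for name, seq in windows.items())


# Toy example: a made-up 200-nt sequence and a hypothetical peak-pair midpoint.
toy_genome = "ACGT" * 50
window = extract_window(toy_genome, midpoint=100)
fasta_text = to_fasta({"toy_locus": window})
```

The resulting FASTA text is the kind of input that a motif-discovery tool accepts; which MEME options were used is not stated in the text and is not assumed here.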
+ The datasets from ChIP-exo analyses were submitted to the NCBI GEO repository under the accession number GSE74810 . 
+ RNA extraction and Northern blotting
+ Procedures for RNA extraction and Northern blotting were conducted as described previously , with minor modification [ 4 ] . 
+ Briefly , bacteria were harvested at mid-exponential phase ( OD600 of 0.6 ) or as otherwise stated and total cellular RNA was prepared ( RNeasy mini kit , Qiagen ) . 
+ RNA was separated by electrophoresis on 5 % polyacrylamide gels containing 7 M urea , transferred to a positively charged nylon membrane ( Roche Diagnostics ) by electro-blotting , and crosslinked to the membrane with UV light . 
+ Crosslinked RNA was hybridized to DIG-labeled antisense probe ( 68 °C , overnight ) and signal was developed using the DIG Northern Starter kit ( Roche Diagnostics ) . 
+ Antisense RNAs were prepared from PCR products using the DIG Northern Starter kit ( Roche Diagnostics ) . 
+ Blots were imaged using a ChemiDoc XRS + system ( BioRad ) and RNA signals quantified using Quantity One image analysis software ( Bio-Rad ) . 
+ The 5S or 16S and 23S rRNAs served as loading controls , and were detected by hybridization or methylene blue staining , respectively . 
+ Western blotting
+ Western blotting was conducted as previously described [ 4 ] . 
+ Briefly , proteins ( 10 μg ) were separated using SDS-PAGE and electroblotted onto 0.2 μm polyvinylidene difluoride membranes . 
+ Anti-FLAG® M2 monoclonal antibody ( Sigma ) and anti-RpoB monoclonal antibody ( Neoclone ) were used for detection of the FLAG epitope and RpoB . 
+ Signal detection used treatment with horseradish peroxidase-linked secondary antibodies followed by SuperSignal® West Femto Chemiluminescent Substrate ( Thermo Scientific ) . 
+ Blots were imaged using the ChemiDoc XRS + system ( Bio-Rad ) and the signals were quantified using Quantity One image analysis software ( Bio-Rad ) . 
+ Analysis of UvrY-FLAG protein stability
+ Bacterial cells were grown to mid-exponential phase of growth , at which point protein synthesis was stopped by the addition of tetracycline ( 200 μg / mL ) and chloramphenicol ( 100 μg / mL ) . 
+ Cells were collected at several times following the addition of the antibiotics , harvested by centrifugation and were immediately mixed with Laemmli sample buffer and lysed by sonication and boiling . 
+ Samples ( 10 μg protein ) were subjected to SDS-PAGE and were analyzed by Western blotting as described above . 
+ The assays for specific β-galactosidase activity were conducted as described previously [ 2 ] . 
+ Values represent the averages from two independent experiments . 
+ Error bars represent standard errors of the means . 
+ Analysis of UvrY phosphorylation in vivo
+ Studies to determine the extent of phosphorylation of the UvrY-FLAG protein in vivo were conducted as previously described [ 4 ] . 
+ Briefly , proteins from cell lysates were fractionated on Phos-tag™ SDS-PAGE gels , which resolve UvrY from UvrY-P , and the FLAG epitope of the UvrY-FLAG protein was detected by Western blotting . 
+ Quantitative RT-PCR
+ Quantitative Real Time-PCR ( q-RT-PCR ) was conducted in an iCycler™ thermocycler ( Bio-Rad ) using the iScript™ One-Step RT-PCR Kit with SYBR® Green ( Bio-Rad ) as described previously , with minor modifications [ 4 ] . 
+ Reactions ( 15 μl ) contained immunoprecipitated DNA ( 50 ng / μl ) , primers ( 0.5 μM each ; S1 Table ) and SYBR® Green RT-PCR Reaction Mix . 
+ PCR cycle parameters were as follows : 40 cycles of denaturation at 95 °C for 10 s , followed by annealing , extension , and detection at 60 °C for 30 s . 
+ The specificity of the PCR product was determined by melting curve analysis with reference to the calculated Tm . 
+ For melting curve analysis , the temperature was increased from 60 °C to 95 °C at a rate of 0.5 °C / 10 s . 
+ PCR product concentration was determined using a standard curve prepared with iCycler iQ optical system software version 3.1 ( Bio-Rad ) , according to the manufacturer 's instructions . 
+ Analysis of UvrY binding to csrB DNA in vivo by ChIP-PCR
+ UvrY binding to csrB genomic DNA in vivo was analyzed by ChIP-quantitative PCR ( ChIP-PCR ) . 
+ The first steps of the ChIP-PCR assay were conducted as described for the ChIP-exo assay . 
+ After releasing the formaldehyde crosslinks , the DNA was extracted and purified using Qiagen MinElute Purification Kit . 
+ The isolated csrB DNA was assayed using quantitative real time-PCR ( q-RT-PCR ) with primers listed in S1 Table . 
+ The lacY gene served as a negative control for this reaction , using primers also listed in S1 Table . 
+ Electrophoretic gel mobility shift assay (EMSA) for DNA binding
+ For DNA electrophoretic gel mobility shift assays , the regions from -246 to +49 and -247 to +56 , with respect to the transcriptional start sites of csrB and csrC , respectively , were amplified by PCR and end-labeled with [ γ-32P ] ATP using T4 polynucleotide kinase . 
+ Binding reactions ( 10 μl ) contained 0.5 nM of end labeled DNA , 20 mM Tris HCl ( pH 7.5 ) , 10 % ( v/v ) Glycerol , 50 mM KCl , 3 mM MgCl2 , 1 mM dithiothreitol , 100 μg / mL BSA and phosphorylated or nonphosphorylated UvrY-His6 protein . 
+ Reactions were incubated for 30 min at 37 °C , then 1 μl xylene cyanol was added and samples were separated by electrophoresis on 7 % non-denaturing polyacrylamide gels with 0.5 X TBE as the running buffer . 
+ After electrophoresis , the gels were vacuum dried , and the radioactive signals were captured by phosphorimage analysis ( PMI™ , Bio-Rad ) and analyzed using Quantity One software . 
+ DNase I footprinting
+ Double stranded DNA for footprinting of csrB ( 466 bp long , extending from 420 bp upstream to 46 bp downstream of the TSS ) or csrC ( 419 bp long , extending from 319 bp upstream to 100 bp downstream of the TSS ) was prepared by PCR using strand-specific [ 32P ] 5 ' - end-primers ( S1 Table ) . 
+ The 5 ' - end-labeled PCR products were gel-purified and the DNA ( 0.5 nM ) was used for binding reactions ( 10 μL ) containing 20 mM Tris HCl ( pH 7.5 ) , 10 % ( v/v ) glycerol , 50 mM KCl , 3 mM MgCl2 , 1 mM dithiothreitol and 100 μg / mL bovine serum albumin along with phosphorylated or non-phosphorylated UvrY-His6 protein . 
+ The binding reactions were incubated at 37 °C for 30 min followed by treatment with 0.025 U DNase I ( Roche ) per reaction , for 1 min at 37 °C . 
+ The DNase I was inactivated by heating at 75 °C for 10 min . 
+ Reactions were then mixed with formamide loading buffer ( 10 mL formamide , 10 mg xylene cyanol FF and 10 mg bromophenol blue ) , denatured at 95 °C for 5 min , and separated by electrophoresis on a 6 % denaturing polyacrylamide gel . 
+ The gel was vacuum dried , and the radioactive signals were captured by phosphor imaging ( PMI™ , Bio-Rad ) and analyzed with Quantity One software . 
+ Construction of plasmid-borne csrB-lacZ and csrC-lacZ fusions for S-30 transcription-translation 
+ The csrB-lacZ and csrC-lacZ carrying plasmids , pLFXcsrB-lacZ and pLFXcsrC-lacZ , were constructed by first amplifying the 502 nt ( -500 to +2 with respect to csrB transcriptional start site ) and 304 nt ( -301 to +3 with respect to csrC transcriptional start site ) genomic regions using the primer pairs csrB lacZ Fwd / csrB lacZ Rev and csrC lacZ Fwd / csrC lacZ Rev ( S1 Table ) . 
+ The PCR products were gel purified , digested with PstI and KpnI , ligated to PstI - and KpnI-digested and dephosphorylated plasmid pLFX , and electroporated into DH5αλpir cells . 
+ DNA sequencing was used to confirm that the cloned regions did not contain any mutations . 
+ In vitro coupled transcription-translation
+ In vitro transcription-translational assays used pLFXcsrB-lacZ and pLFXcsrC-lacZ supercoiled plasmids and were performed with S-30 extracts prepared from the UvrY deficient strain , CF7789 uvrY : : cam , as described previously [ 16 ] , except that reactions ( 32 μl ) contained 0.5 U E. coli RNA polymerase holoenzyme and 3 μl of 35S-methionine ( 1175 Ci/mmol ) . 
+ The UvrY-P ( 2.3 μM ) , ppGpp ( 250 μM ) and DksA ( 2 μM ) were included , as indicated , in the reactions . 
+ IHF ( 1 μM ) was added to all of the reactions that contained pLFXcsrB-lacZ plasmid . 
+ DksA protein was a generous gift of Prof. Richard Gourse , University of Wisconsin , Madison . 
+ IHF protein was a generous gift from Prof. Anca Segall , San Diego State University . 
+ Incorporation of 35S-methionine into protein products was determined after SDS-PAGE separation by using phosphorimaging . 
+ Signal intensity was determined using Quantity One software . 
+ Phylogenetic analysis of BarA, UvrY, CsrA, and FliW distribution
+ We identified CsrA , BarA , UvrY and FliW orthologs from all fully-sequenced bacterial genomes in the NCBI genomes database ( http://www.ncbi.nlm.nih.gov/genome/ ) , using the NCBI Prokaryotic Genome Annotation Pipeline v2.0 to assign orthology [ 48 ] . 
+ This pipeline combines a sequence similarity-based approach with the comparison of the predicted gene products to the nonredundant protein database , Entrez Protein Clusters , and the Conserved Domain Database ( CDD ) [ 48 ] . 
+ In order to filter out false positives and false negatives from the NCBI orthologs data set , we aligned a set of representative sequences against the genomes with predicted orthologs using tblastn [ 49 ] . 
+ The alignment results were filtered by sequence similarity and alignment coverage : BarA was filtered by 50 % similarity and 50 % coverage , UvrY was filtered by 75 % similarity and 80 % coverage , CsrA was filtered by 60 % similarity and 50 % coverage , and FliW was filtered by 50 % similarity and 50 % coverage . 
+ All the similarity and coverage thresholds were established by the minimum similarity and coverage observed between annotated sequences from NCBI nucleotide database and the following set of representatives used as query sequences : BarA from Escherichia coli ( NCBI protein identification number PI : 190908466 ) , UvrY from E. coli ( PI : 190906390 ) , CsrA from E. coli and Bacillus subtilis ( PI : 257755450 and 459391362 ) , and FliW from Bacillus amyloliquefaciens ( PI : 307608134 ) . 
+ After filtering the results by alignment similarity and coverage , we performed a manual refinement of the results by verifying the pairwise alignments , sequence annotations , and bibliographic information . 
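The per-protein similarity and coverage cut-offs listed above can be expressed as a small lookup table. The sketch below is illustrative only: the threshold values come from the text, but the names `THRESHOLDS`, `passes_filter` and `filter_hits`, the hit dictionary layout, and the choice of an inclusive comparison ( >= ) are assumptions, and the example hits are invented.

```python
# Minimum (similarity %, coverage %) per protein, as stated in the text.
THRESHOLDS = {
    "BarA": (50.0, 50.0),
    "UvrY": (75.0, 80.0),
    "CsrA": (60.0, 50.0),
    "FliW": (50.0, 50.0),
}


def passes_filter(protein, similarity, coverage):
    """True if an alignment meets both cut-offs for the given protein.

    Assumption: the cut-offs are inclusive.
    """
    min_similarity, min_coverage = THRESHOLDS[protein]
    return similarity >= min_similarity and coverage >= min_coverage


def filter_hits(hits):
    """Keep only hits (dicts with protein/similarity/coverage keys) that pass."""
    return [
        hit for hit in hits
        if passes_filter(hit["protein"], hit["similarity"], hit["coverage"])
    ]


# Invented example hits, for illustration only.
example_hits = [
    {"protein": "UvrY", "similarity": 82.0, "coverage": 95.0},
    {"protein": "UvrY", "similarity": 70.0, "coverage": 95.0},
    {"protein": "CsrA", "similarity": 61.0, "coverage": 49.0},
    {"protein": "FliW", "similarity": 50.0, "coverage": 50.0},
]
kept = filter_hits(example_hits)
```

Hits that pass this automated step would still go through the manual refinement described in the text.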
+ A reference phylogeny of fully-sequenced genomes encoding CsrA and at least one of BarA , UvrY or FliW was extracted from the NCBI taxonomy database [ 50 ] using NCBI-taxcollector [ 51 ] and PhyloT v2015.1 ( http://phylot.biobyte.de ) , and gene presence/absence data were visualized using iTOL v3.0 [ 52 ] . 
+ Correlations among presence/absence of CsrA , BarA , UvrY and FliW were calculated using Pearson 's product moment correlation , and significance was assessed using Fisher 's Z transform ( implemented in the R stats package , v3.2.0 ) . 
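As a rough illustration of this calculation, the sketch below computes Pearson's r between two presence/absence vectors and a two-sided p-value from the Fisher z-transform, using the normal approximation with standard error 1 / sqrt ( n - 3 ) . The analysis in the text was run in R; this Python version, the function names, and the toy vectors are assumptions made for illustration, and its p-values need not match R's output exactly.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length numeric vectors."""
    n = len(x)
    if n != len(y) or n < 4:
        raise ValueError("need two vectors of equal length with n >= 4")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return sxy / math.sqrt(sxx * syy)


def fisher_z_pvalue(r, n):
    """Two-sided p-value for H0: rho = 0 via the Fisher z-transform."""
    if abs(r) >= 1.0:
        return 0.0  # atanh diverges at |r| = 1
    z = math.atanh(r)
    statistic = abs(z) * math.sqrt(n - 3)
    return math.erfc(statistic / math.sqrt(2.0))


# Toy presence (1) / absence (0) calls for two genes across eight taxa.
gene_a = [1, 1, 0, 0, 1, 0, 1, 0]
gene_b = [1, 1, 0, 0, 1, 0, 0, 1]
r = pearson_r(gene_a, gene_b)
p = fisher_z_pvalue(r, len(gene_a))
```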
+ For calculating the correlations , we clustered the results at species and genus level in order to minimize the number of false negatives and false positives . 
+ For clustering the results at species level , we established the presence/absence as the value that appears most often at strain level ( statistical mode ) . 
+ For clustering the results at genus level , we established the presence/absence as the mean calculated at species level . 
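The two aggregation steps (mode across strains, then mean across species) can be sketched as follows. This is a minimal illustration, not the authors' code: the function names and the tuple layout are hypothetical, the taxon names are placeholders, and tie handling in the mode is an assumption (Python's `statistics.mode` returns the first value encountered on a tie in Python 3.8 and later).

```python
from collections import defaultdict
from statistics import mean, mode


def species_level(strain_calls):
    """Collapse strain calls to one call per species using the mode.

    strain_calls: iterable of (genus, species, present), with present in {0, 1}.
    """
    by_species = defaultdict(list)
    for genus, species, present in strain_calls:
        by_species[(genus, species)].append(present)
    return {key: mode(values) for key, values in by_species.items()}


def genus_level(species_calls):
    """Collapse species calls to one value per genus using the mean."""
    by_genus = defaultdict(list)
    for (genus, _species), present in species_calls.items():
        by_genus[genus].append(present)
    return {genus: mean(values) for genus, values in by_genus.items()}


# Placeholder taxa and invented calls, for illustration only.
strains = [
    ("GenusA", "species1", 1),
    ("GenusA", "species1", 1),
    ("GenusA", "species1", 0),
    ("GenusA", "species2", 0),
    ("GenusB", "species3", 1),
]
per_species = species_level(strains)
per_genus = genus_level(per_species)
```

Note that the genus-level value is a fraction between 0 and 1 rather than a binary call, which is what the mean-based definition in the text implies.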
+ In order to determine if any strong biases in the methodology used to identify orthologous genes could affect our results we used two additional and widely-employed methods to identify alternative ortholog sets : KEGG orthology database ( KO ) [ 53 ] , and UniProt reference clusters of orthologs ( UniRef ) [ 54 ] . 
+ KEGG database defines the orthologs by comparing experimental data with KEGG pathway maps , BRITE functional hierarchies and KEGG modules . 
+ UniProt defines the orthologs by clustering protein sequences by 50 % identity . 
+ We compared the correlations calculated for these alternative approaches to the results obtained from the NCBI Prokaryotic Genome Annotation Pipeline , at genus and species level . 
+ Results and Discussion
+ Determination of genome-wide UvrY/SirA-binding loci using ChIP-exo 
+ To probe for UvrY/SirA DNA binding sites in E. coli and Salmonella , we used ChIP-exo , a comprehensive genomic DNA-protein interaction assay with single nucleotide resolution and high specificity [ 34 ] . 
+ This assay uses nuclease trimming reactions to remove nonspecific DNA from the immunoprecipitated complexes and to identify the crosslinking sites much more precisely than ChIP-seq [ 34 ] . 
+ The experiments were performed with mid-exponential phase cultures ( OD600 of 0.6 ) of strains carrying FLAG-tagged UvrY or SirA proteins , respectively , expressed from the native genomic loci . 
+ Altogether , 44 million sequencing reads for UvrY ( E. coli ) and 31 million for SirA ( Salmonella ) were mapped to their respective genomes . 
+ From these analyses , two highly enriched genomic loci were identified at the csrB and csrC genes of both E. coli ( Fig 1A and 1B ) and Salmonella ( Fig 2A and 2B ) . 
+ These results indicated that the csrB and csrC genes represent the strongest targets of UvrY/SirA binding in these species . 
+ In addition to the csrB and csrC genes , weakly enriched genomic loci were identified proximal to the promoter regions of 286 genes in E. coli and 301 genes in Salmonella ( S2 Table ) . 
+ These enriched genomic loci were arbitrarily classified into three groups based on enrichment of their occupancy peaks over background ( S2 Table ) . 
+ Accordingly , genomic loci enriched 5-fold or greater than the lacY and/or 16s rRNA ( rrsH ) genes were included in group one . 
+ Besides csrB and csrC , the promoter regions of fhuF and spf from E. coli and spf from Salmonella fell into group one . 
+ Genomic loci that were enriched 2 - to 5-fold comprised group two , which included the promoter regions of 9 genes in E. coli and 8 genes in Salmonella . 
+ The rest of the genomic loci , which include regions proximal to the promoter regions of 275 genes in E. coli and 292 genes in Salmonella , showed only 1.5 - to 2.0-fold read occupancy over that of lacY or 16S rDNA ( rrsH ) and were assigned to group three . 
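The three-group classification can be written as a simple threshold function. The fold cut-offs ( 5-fold , 2-fold , 1.5-fold ) are taken from the text; how values falling exactly on a boundary are assigned, and the function name `enrichment_group`, are assumptions made for this sketch.

```python
def enrichment_group(fold_over_control):
    """Assign an enriched locus to group 1, 2 or 3 by fold enrichment.

    fold_over_control: read occupancy relative to the negative-control region.
    Returns None for loci below the 1.5-fold cut-off.
    Assumption: lower bounds are inclusive.
    """
    if fold_over_control >= 5.0:
        return 1
    if fold_over_control >= 2.0:
        return 2
    if fold_over_control >= 1.5:
        return 3
    return None


# Invented fold values, for illustration only.
groups = [enrichment_group(f) for f in (12.0, 5.0, 3.1, 1.7, 1.2)]
```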
+ UvrY/SirA consensus DNA binding motif
+ Further inspection of the ChIP-exo results showed enriched crosslinking in the upstream locations of csrB and csrC , with sequences extending from -222 to -142 from the start of transcription ( csrB ) or -223 to -143 ( csrC ) in E. coli ( Fig 1D and 1E ) and from -215 to -135 ( csrB ) and -181 to -101 ( csrC ) in Salmonella ( Fig 2C and 2D ) . 
+ Distinct sites of crosslinking in these genes were also observed at downstream sequences that extended from -56 to +25 ( csrB ) and from -49 to +32 ( csrC ) in E. coli ( Fig 1D and 1E ) and from -41 to +40 ( csrB ) and from -81 to -1 ( csrC ) in Salmonella ( Fig 2C and 2D ) . 
+ [ Fig 1 legend , opening words missing ] ... co-immunoprecipitation with cross-linked UvrY-FLAG , proximal to the csrB ( A ) , csrC ( B ) , spf ( B ) and cspA ( C ) genes . 
+ Two UvrY crosslinking sites were discovered in the promoter regions of csrB ( panel A , underlined in panel D ) and csrC ( panel B , underlined in panel E ) . 
+ One site is within the region extending from -222 to -142 in csrB or from -223 to -143 in csrC . 
+ The other crosslinking site lies close to the promoter , extending from -56 to +25 in csrB and from -49 to +32 in csrC promoter regions . 
+ A 9 bp inverted repeat DNA sequence within the upstream crosslinking regions is bolded and capitalized . 
+ A putative IHF binding site ( broken underline ) is located between the two UvrY crosslinking sites in the promoter region of csrB ( D ) , but was not apparent in csrC . 
+ The DNA used for DNase I footprinting included 466 bp ( -420 to +46 ) for csrB and 419 bp ( -319 to +100 ) for csrC . 
+ DNA fragments used for electrophoretic mobility shift assay included ( -246 to +49 ) for csrB and ( -247 to +56 ) for csrC . 
+ The images were constructed using the Integrative Genomics Viewer ( IGV , Broad Institute ) [ 44 , 45 ] . 
+ [ Fig 2 legend , opening words missing ] ... proximal to the regulatory regions of csrB ( A ) , csrC ( B ) and spf ( B ) . 
+ Two putative SirA binding sites are shown for the promoter regions of csrB ( rna62 ) ( panel A , underlined in panel C ) and csrC ( panel B , underlined in panel D ) . 
+ One site is located in the region from -215 to -135 in the csrB promoter region and from -181 to -101 in the csrC promoter region . 
+ The other putative binding site is close to the promoter , within the -41 to +40 ( csrB ) and -81 to -1 ( csrC ) promoter regions . 
+ A 9 bp-long inverted repeat DNA sequence in the upstream sites is shown in bold and capitalized . 
+ A putative IHF binding site is marked between the two putative SirA binding sites in the promoter region of csrB ( C , broken underline ) , but was not apparent in csrC . 
+ The images were constructed using the Integrative Genomics Viewer ( IGV , Broad Institute ) [ 44 , 45 ] . 
+ In the center of each of the upstream putative binding sites is an 18 nt nearly perfect palindrome or inverted repeat sequence ( IR ) . 
+ In csrB , the sequence , TGTGAGAGATCTCTTACA , is centered at -183 / -182 in E. coli ( Fig 1D ) and -182 / -181 in Salmonella ( Fig 2C ) . 
+ Moreover , a partially conserved additional sequence ( TGTAGGAGA ) located 5 bp downstream of the IR , is seen in the csrB promoter of E. coli ( Fig 1D ) and Salmonella ( Fig 2C ) . 
+ The IR of csrC shows weaker symmetry in both E. coli ( TGTGAGACATTGCCGATA ) ( Fig 1E ) and Salmonella ( TGTGAGACATTGACCATT ) ( Fig 2D ) . 
+ Moreover , a partially conserved sequence ( TGTAAG ) representing half of the IR , located 16 bp upstream of the IR , is also seen in the csrC promoter of E. coli ( Fig 1E ) and Salmonella ( Fig 2D ) . 
+ In contrast , the IR is not preserved in the downstream crosslinking sites of the csrB or csrC genes . 
+ To search for the conserved IR at the weaker targets of UvrY/SirA ( S2 Table ) , DNA sequences from these enriched promoter regions were analyzed using MEME Suite software [ 47 ] . 
+ These analyses showed that the IR sequence is not conserved in any of the weaker putative binding sites . 
+ UvrY requires phosphorylation for DNA binding
+ In the BarA-UvrY TCS , BarA is the histidine kinase , which upon signal detection autophosphorylates and transfers the phosphoryl group to a conserved aspartate residue of UvrY [ 20 ] . 
+ The phosphorylated UvrY is then thought to bind specific DNA sequences in the promoters of its target genes and regulate their transcription [ 18 , 20 , 22 ] . 
+ MBP-SirA was previously found to bind similarly to csrB DNA in vitro regardless of whether it had been phosphorylated or not , starting at a concentration of 1.5 μM [ 25 ] , suggesting that MBP-SirA phosphorylation might not be required for DNA binding in vitro . 
+ This observation was consistent with another report , showing that phosphorylation only increased the DNA binding affinity of SirA-His6 by approximately two-fold in vitro [ 19 ] . 
+ In contrast to results from in vitro DNA binding experiments , deletion of the gene for BarA sensor kinase caused more than a 10-fold decrease in the level of CsrB RNA [ 16 , 22 , 35 ] . 
+ Similarly , in Salmonella , substitution of alanine for the predicted phosphorylated aspartate residue of SirA , Asp54 , caused the loss of CsrB expression [ 25 ] . 
+ Thus , it appears that phosphorylation is critical for UvrY activity in vivo . 
+ Whether UvrY requires phosphorylation for efficient DNA binding in vivo or for later steps in transcription initiation was not clear . 
+ To address this issue , we used both in vitro and in vivo assays to investigate the role of UvrY phosphorylation in DNA binding . 
+ We performed electrophoretic mobility shift assay ( EMSA ) using phosphorylated ( UvrY-P ) and non-phosphorylated forms of the recombinant protein , UvrY-His6 , which was determined to be functional in vivo ( S1 Fig ) . 
+ DNA fragments that encompass the ChIP-exo-derived putative UvrY binding sites in the promoter regions of csrB ( Fig 1D ) and csrC ( Fig 1E ) were used as probes . 
+ Unlabeled csrB and rrlE DNA fragments were used as competitive and non-competitive DNA probes , respectively . 
+ Similar to the previous observations [ 19 , 25 ] , our results showed that in vitro phosphorylation led to a modest ( ~ 2-fold ) increase in DNA binding affinity of UvrY to csrB ( Fig 3A and 3B ) and csrC DNA ( Fig 3C ) , as compared to the non-phosphorylated UvrY . 
+ However , we also observed that the UvrY protein preparation contained around 7 % of UvrY-P prior to the in vitro phosphorylation reaction ( Fig 3D ) . 
+ It is possible that this contaminating UvrY-P was entirely responsible for binding to csrB and csrC DNA in vitro ( Fig 3B ) . 
+ To determine whether UvrY requires phosphorylation for in vivo DNA binding , we tested UvrY-FLAG binding to csrB in vivo in a strain that lacked BarA ( ΔbarA ) and its isogenic barA wild-type strain . 
+ For this analysis , we first determined the phosphorylation state of UvrY-FLAG protein in the presence and absence of BarA . 
+ [ Fig 3 legend , opening words missing ] ... phosphorylated ( UvrY-P ) and non-phosphorylated ( UvrY ) UvrY-His6 to csrB DNA ( A and B ) and csrC DNA ( C ) was tested as shown . 
+ The csrB and csrC DNA probes ( 0.5 nM ) used for this experiment ( depicted in Fig 1D and 1E , respectively ) were incubated with increasing concentrations of in vitro phosphorylated or nonphosphorylated UvrY-His6 protein for 30 min at room temperature . 
+ The DNA-protein complexes were resolved by electrophoresis on a non-denaturing 7 % polyacrylamide gel . 
+ The phosphorylation state of the UvrY-His6 protein used in these experiments was determined by Phos-tag SDS PAGE gel analysis ( D ) . 
+ The results indicated that in the presence of BarA , approximately 7 % of the UvrY protein was phosphorylated ( S3 Fig ) , similar to a previous determination [ 4 ] . 
+ In the absence of BarA , no detectable phosphorylation of UvrY-FLAG protein was observed in LB medium at mid-exponential phase of growth ( S3 Fig ) . 
+ We next used ChIP-PCR to test for in vivo binding of UvrY to csrB DNA under this growth condition . 
+ In this experiment , an approximate 35-fold reduction in DNA binding was observed in the strain that lacked BarA ( ΔbarA ) relative to the isogenic barA wild-type strain ( Fig 4 ) . 
+ These results show for the first time that UvrY requires phosphorylation for effective DNA binding in vivo . 
+ vivo binding of UvrY to csrB promoter were determined by ChIP-quantitative PCR ( ChIP-PCR ) in a WT strain ( MG1655 expressing UvrY-FLAG ) and isogenic ΔbarA , ΔihfA , ΔihfB and ΔihfAΔihfB strains , as described in the Experimental Procedures . 
+ The lacY gene served as a negative control for this reaction . 
+ Data depict the results of three independent experiments . 
+ Error bars represent the standard errors of the means . 
+ 32P-end labeled DNA probe that included both the upstream and downstream putative UvrY binding sites was used for these experiments ( shown in Fig 1D ) . 
+ Reactions in all lanes except 1 contained DNase I ( 0.025 U / 12.5 μl reaction ) . 
+ Reactions in lanes 3 - 6 and lanes 7 - 10 contained 0.25 , 0.35 , 0.5 , 0.7 μM of phosphorylated or non-phosphorylated UvrY-His6 , respectively . 
+ Lane 2 reaction contained no UvrY protein . 
+ A vertical black bar indicates a protected region , and the sequence corresponding to the protected region is shown in a vertical rectangular box . 
+ An alignment of sequences corresponding to the protected regions from DNase I and the ChIP-exo results is shown in the horizontal rectangular box . 
+ The 18nt-long palindromic sequence and the partially conserved palindromic sequences are marked with broken black lines . 
+ UvrY specifically binds to an 18 nt-long IR DNA sequence
+ In order to more precisely define the UvrY DNA binding sites , we performed in vitro DNase I footprinting experiments on csrB and csrC DNA using phosphorylated and non-phosphorylated UvrY-His6 protein . 
+ The DNA probes for this experiment encompassed both the upstream and downstream putative binding sites of these genes ( Fig 1D and 1E ) . 
+ The results of these experiments showed protection of only the upstream binding sites containing the IR sequences of both csrB ( Fig 5 ) and csrC ( Fig 6 and S4 Fig ) . 
+ The putative downstream binding sites that were observed in vivo ( Fig 1A and 1B & 1D and 1E ) were not protected in vitro . 
+ In addition , only the phosphorylated UvrY-His6 was observed to protect the IR , which further indicates that UvrY requires phosphorylation for tight , specific binding . 
+ Together , these results suggest that UvrY-P binds specifically and directly to the IR sequence of these genes . 
+ However , in vivo binding of UvrY-P to the downstream sequences of these genes either requires conditions or factor ( s ) that were absent from the footprinting reactions or perhaps more likely , UvrY-P becomes cross-linked to DNA indirectly through interactions with DNA binding proteins such as RNA polymerase , which must bind to the downstream regions of these genes in order to initiate transcription . 
+ UvrY requires IHF for optimal binding to and expression of csrB but not csrC 
+ The nucleoid-associated protein Integration Host Factor ( IHF ) is a heterodimeric protein composed of two homologous subunits , IHFα ( IhfA ) and IHFβ ( IhfB ) , which facilitates the transcription of many genes by bending the DNA [ 55 -- 57 ] . 
+ In the Csr/Rsm system , IHF was shown to directly bind to the promoter of the csrB gene of Salmonella [ 18 ] and the rsmZ promoter of Pseudomonas fluorescens [ 58 ] . 
+ In Salmonella , deletion of ihfA was also shown to decrease csrB expression [ 18 ] . 
+ 32P-end labeled DNA probe that included both the upstream and downstream putative UvrY binding sites was used for these experiments ( Fig 1E ) . 
+ Reactions in all lanes except lane 1 contained DNase I ( 0.025 U / 12.5 μl reaction ) . 
+ Reactions in lanes 3 -- 6 and lanes 7 -- 10 contained 0.25 , 0.35 , 0.5 , 0.7 μM of phosphorylated or nonphosphorylated UvrY-His6 , respectively . 
+ Lane 2 reaction contained no UvrY . 
+ A vertical black bar indicates a protected region , and the sequence corresponding to the protected region is shown in a vertical rectangular box . 
+ An alignment of sequences corresponding to the protected regions from DNase I and ChIP-exo results is shown in the horizontal rectangular box . 
+ The 18nt-long palindromic sequence and the partially conserved palindromic sequences are marked with broken black lines . 
+ However , it is not clear whether the observed effect of IHF on sRNA expression in those bacterial species is UvrY-mediated and , if so , whether IHF is required for UvrY to bind to DNA or for later transcription initiation steps . 
+ In this study , we identified a putative IHF binding site in the promoter region of csrB of E. coli ( Fig 1D ) and Salmonella ( Fig 2C ) , in agreement with an earlier report [ 18 ] , located between the upstream and downstream in vivo crosslinking sites for UvrY-P , but we did not identify a similar site within the csrC genes . 
+ To determine whether IHF affects expression of csrB in E. coli , we measured the levels of CsrB in the presence and absence of IHF ( ΔihfA and/or ΔihfB ) . 
+ Our results revealed a ~ 10-fold reduction in CsrB RNA levels in the absence of IhfA , IhfB , or both , compared to the isogenic wild type strain ( Fig 7A ) . 
+ We also performed epistasis experiments in which we measured the effect of IHF ( ΔihfA and/or ΔihfB ) on the expression of csrB in a strain lacking uvrY ( ΔuvrY ) . 
+ Our results showed that neither IhfA nor IhfB affected the levels of CsrB in the uvrY mutant strain ( Fig 7A ) , suggesting that UvrY mediates the effect of IHF on the expression of csrB . 
+ Analysis of csrB DNA binding by UvrY-FLAG using in vivo ChIP-PCR analyses showed that binding was reduced by approximately 2.5-fold in the absence of either IhfA or IhfB ( Fig 4 ) , suggesting that IHF is required for optimal binding of UvrY-P to csrB DNA in vivo . 
+ However , this modest effect of IHF on UvrY DNA binding does not appear to account for its 10 to 12-fold effect on csrB expression . 
+ Therefore , IHF also appears to affect later steps in csrB transcription ( Fig 7A ) . 
+ Western blotting experiments demonstrated that IHF did not affect the phosphorylation ( S3 Fig ) or expression of UvrY-FLAG ( S5 Fig ) . 
+ Hence , our results reveal that IHF is required for optimum binding of UvrY-P to csrB DNA and activation of csrB transcription . 
+ gene deletions on the levels of E. coli CsrB ( A ) and CsrC ( B ) . 
+ Cultures were grown in LB to mid-exponential growth phase ( OD600 of 0.6 ) . 
+ The 16S/23S rRNA loading controls are also shown . 
+ IHF facilitates the transcription of many genes by bringing relatively distant DNA sites ( ~ 200 bp ) closer together in space by bending the DNA ( ~ 140 ° ) [ 59 , 60 ] . 
+ Therefore , we propose that UvrY-P bound at the upstream IR site ( centered at -183 / -182 in csrB of E. coli and at -182 / -181 in the csrB of Salmonella ) is brought into the proximity of the csrB promoter-RNA polymerase complex by IHF-mediated bending of the DNA . 
+ Such DNA bending may lead to UvrY-P binding to the downstream DNA crosslinking site directly or indirectly through interactions with RNA polymerase or perhaps with other unknown DNA binding factor ( s ) . 
+ This type of transcriptional activation , known as repositioning , is observed in promoters where the primary activator is unable to make a productive contact with RNA polymerase without being repositioned by a secondary activator [ 60 , 61 ] . 
+ Such a mechanism appears to be required for the activation of the E. coli narG promoter by NarL , another FixJ family response regulator , which binds at -190 , while IHF binds at -125 and Fnr binds at -41 of narG [ 59 , 60 ] . 
+ In contrast to its effect on csrB , IHF was not needed for expression of csrC ( Fig 7B ) . 
+ In fact , in the single and double IHF deletion strains , CsrC RNA levels were 2-fold higher than in the isogenic wild type strain ( Fig 7B ) . 
+ This may be because in the ΔihfA ΔihfB strains , the lower levels of CsrB RNA ( Fig 7A ) lead to an increase in the concentration of free CsrA in the cell , which in turn leads to increased levels of CsrC via a negative feedback loop in the Csr system [ 13 ] . 
+ UvrY effects on the expression of other putative in vivo targets
+ To examine the regulatory effects of UvrY on the expression of the potential new target genes discovered by ChIP-exo ( S2 Table ) , we tested several gene products and transcripts from groups one , two and three by Western and Northern blotting . 
+ Our results showed little or no regulatory effects of UvrY on the expression of the genes that were tested ( S6A and S6C Fig ) , with the exception of cspA . 
+ UvrY showed a relatively modest , but reproducible negative effect on the expression of the cold-shock protein CspA ( Fig 8 ) . 
+ A 2-fold increase in the levels of CspA-FLAG was observed in a ΔuvrY strain compared to the isogenic wild type strain ( Fig 8 ) . 
+ Furthermore , disruption of csrA caused a modest decrease in CspA-FLAG levels , suggesting that CsrA may activate cspA expression . 
+ To determine whether UvrY regulates cspA indirectly through its effects on CsrB/C , and therefore CsrA activity , we conducted an epistasis analysis ( Fig 8 ) . 
+ A strain that was disrupted in both uvrY and csrA was transformed with plasmids expressing uvrY or csrA from the plasmid cloning vector pBR322 , and CspA-FLAG protein levels were determined in the resulting strains . 
+ We observed that CspA-FLAG levels were essentially identical in strains containing pBR322 and the csrA-expression plasmid pCRA16 . 
+ In contrast , the uvrY-expression plasmid ( pUY14 ) led to a modest decrease in CspA-FLAG levels in this strain . 
+ effects of uvrY deletion and csrA : : kan disruption on the levels of CspA-FLAG protein in an MG1655 derivative expressing CspA-FLAG ( WT ) from the cspA genomic locus . 
+ Effect of csrA : : kan ΔuvrY complemented with UvrY ( pUY14 ) , CsrA ( pCRA16 ) or control ( pBR322 ) expression plasmids is also shown . 
+ Cultures were grown at 37 °C in LB to mid-exponential growth phase ( OD600 of 0.6 ) . 
+ The RpoB protein served as a loading control . 
+ This experiment was repeated at least three times with reproducible results . 
+ These results are consistent with a model in which UvrY inhibits cspA expression by binding to the cspA promoter , while the effect of CsrA may be mediated indirectly . 
+ Also consistent with this model , in silico analyses did not reveal typical CsrA binding sequences ( GGA ) in the 160 NT 5 ' - UTR of the cspA transcript ( data not shown ) and an RNA-seq analysis conducted previously did not reveal cspA mRNA among the 721 different transcripts that copurified with CsrA [ 2 ] . 
+ Based on all of these observations , we conclude that , under the growth conditions tested , the BarA-UvrY TCS directly exerts its global effect on gene expression primarily through the Csr system by activating the transcription of the CsrB and CsrC sRNAs , with cspA representing a likely exception . 
+ This raises the question of why UvrY-FLAG crosslinked in vivo ( though weakly ) to the regulatory regions of 286 genes in E. coli and 301 genes in Salmonella ( S2 Table ) , but exerted little or no regulatory effect in the examples that were tested . 
+ To explain these observations , we propose three possible hypotheses : ( i ) Perhaps other unknown activator ( s ) are required along with UvrY/SirA for the expression of these genes , but were not available or functioning under the chosen growth conditions . 
+ Thus , UvrY/SirA might activate some of these genes under other growth conditions . 
+ ( ii ) Perhaps one or more repressors overrides the influence of UvrY/SirA on the expression of these genes under the conditions tested [ 62 ] . 
+ To address these two hypotheses , we searched for factors known to regulate the expression of the putative UvrY target genes . 
+ This analysis revealed that several DNA binding proteins , including CRP , FNR and IHF , appear to regulate the greatest number of potential UvrY targets ( S7 Fig ) . 
+ Hence , it is conceivable that such factors may mask the effects of UvrY on the expression of these genes under our growth conditions . 
+ ( iii ) Because the putative UvrY targets from the ChIP lack the 18nt-long IR sequence found in the regulatory regions of csrB/C , it is also possible that some of these genes have degenerate UvrY/SirA binding sites that have no functional relevance or perhaps serve as nonspecific holding sites for UvrY/SirA [ 62 , 63 ] . 
+ UvrY expression is activated by CsrA via a putative posttranscriptional mechanism that does not involve the RNA helicase DeaD , a known regulator of uvrY translation
+ In addition to BarA-UvrY , the transcription of CsrB/C is also strongly activated by CsrA [ 13 , 16 , 35 ] by mechanisms that are yet to be determined . 
+ Recent evidence suggested that CsrA positively affects uvrY expression both at the transcriptional and translational levels [ 35 ] . 
+ Moreover , CsrA is required for switching BarA from a protein possessing phosphatase activity to kinase activity on UvrY [ 35 ] . 
+ Thus , we tested the effect of CsrA on the expression , stability and phosphorylation of UvrY-FLAG . 
+ The results showed that while CsrA does not affect UvrY-FLAG protein stability ( S8A Fig ) , it has a strong positive effect on UvrY-FLAG expression gene fusions used in this study were previously depicted in ( 4 ) : lacUV5 promoter fused at the transcription start site to the uvrY mRNA leader and N ( 12 or 22 ) uvrY codons fused in frame to lacZ ( A ) . 
+ The effect of csrA disruption on expression of PlacUV5-uvrY ' -- ' lacZ reporter fusions is shown ( B ) . 
+ Cells were grown in LB and harvested at various times throughout growth and assayed for β-galactosidase specific activity ( A420/mg protein ) . 
+ The values represent the average of two independent experiments . 
+ Error bars depict standard error of the means . 
+ Western blots showing effects of csrA on the level of DeaD-FLAG ( C ) or SrmB-FLAG ( D ) in an MG1655 derivative that expresses the corresponding FLAG-tagged gene from its native genomic locus are shown . 
+ Effect of csrA : : kan complemented by a csrA expression plasmid ( pCRA16 ) or with a control plasmid ( pBR322 ) is also shown for DeaD-FLAG . 
+ RpoB served as a loading control for these blots . 
+ protein . 
+ Therefore , we hypothesized that in addition to activating uvrY expression , CsrA might also activate the expression of csrB by activating the expression of IHF . 
+ However , we observed no effect of CsrA on IhfA-FLAG or IhfB-FLAG levels when these fusions were expressed from their native chromosomal loci ( S8B Fig ) . 
+ Altogether , our results indicate that CsrA activates csrB/C transcription by activating uvrY expression , most likely through a posttranscriptional mechanism that is yet to be defined . 
+ Effects of ppGpp and DksA, mediators of the stringent response, on csrB expression
+ Previously , ppGpp and DksA were shown to strongly activate expression of CsrB and CsrC in E. coli [ 2 ] . 
+ However , the mechanism for these effects is yet to be elucidated . 
+ In an attempt to define this mechanism , we hypothesized that ppGpp and DksA might activate the expression of factor ( s ) that are known to activate CsrB/C transcription i.e. , UvrY , IHF , CsrA , DeaD and / or SrmB . 
+ Previous studies showed that ppGpp and DksA positively affect CsrA levels , but this effect appears to be too modest to account for their strong effects on CsrB/C levels [ 2 ] . 
+ Therefore , we first determined whether DksA and ppGpp affect the in vivo levels of UvrY-FLAG protein ( Western blot ) and the in vivo csrB DNA binding ( ChIP-PCR ) by UvrY in ΔdksA , ΔrelA ( encoding the major ppGpp synthase ) and isogenic wild type strains . 
+ It was found that ppGpp and DksA had weak or negligible effects on UvrY levels ( Fig 11 ) , in vivo binding of UvrY to csrB ( Fig 12 ) , and the in vivo phosphorylation status of the UvrY protein ( S3 Fig ) . 
+ Next , we tested the effects of DksA and ppGpp on IHF expression . 
+ The results showed no effect of DksA or ppGpp on IhfA-FLAG ( Fig 13A ) or IhfB-FLAG levels under our growth conditions ( Fig 13B ) . 
+ Because ppGpp and DksA did not substantially affect the in vivo expression of IHF or alter the in vivo binding of UvrY-P to csrB DNA , we decided to test their direct effects on the in vitro transcription of csrB and csrC genes . 
+ However , neither gene was expressed in defined transcription reactions that contained UvrY-P ( and IHF in the case of csrB ) together with the basal reaction components ( 5 nM RNAP ; 40 mM Tris HCl , pH 7.9 ; 165 mM NaCl ; 5 % glycerol ; 10 mM MgCl2 ; 1 mM DTT ; 0.1 μg / μl BSA ; 500 μM ATP ; 200 μM CTP and UTP ; 10 μM GTP and [ α-32P ] GTP ( 2.5 μCi ) ) . 
+ This suggests that unknown factors may be required for csrB/C transcription ( data not shown ) . 
+ MG1655 ( no FLAG fusion ) , WT ( MG1655 expressing UvrY-FLAG from the uvrY genomic locus ) , and isogenic ΔdksA and ΔrelA strains . 
+ Proteins were collected from cultures grown in LB medium to mid-exponential growth phase ( ~ OD600 of 0.6 ) . 
+ RpoB served as a loading control . 
+ We next used coupled transcription-translation in S-30 extracts to determine whether the addition of ppGpp and DksA would directly regulate expression of csrB/C in the presence of other cellular factors present in the S-30 extracts but not the defined transcription reactions . 
+ As previously reported [ 13 , 16 ] , UvrY-P strongly stimulated the expression of the full-length β-galactosidase from csrB-lacZ ( Fig 14 ) and csrC-lacZ ( S9 Fig ) transcriptional fusions in this assay . 
+ A series of truncated β-galactosidase products was also observed in these reactions , which quantitatively responded to activators similarly to the full-length protein . 
+ While expression in the absence of UvrY-P was weak , the addition of ppGpp or ppGpp and DksA to the UvrY-deficient csrB-lacZ reaction caused a modest increase in expression ( Fig 14 ) . 
+ In the presence of UvrY-P , addition of ppGpp alone modestly activated csrB-lacZ expression ( 1.5-fold ) . 
+ DksA alone also had weak effects in the presence of UvrY-P ( 1.4-fold ) , while the addition of both ppGpp and DksA led to a 1.7-fold increase in csrB-lacZ expression . 
+ Addition of ppGpp alone , in the absence of UvrY-P , resulted in a slight increase in csrC-lacZ expression ( S9 Fig ) . 
+ However , ppGpp and/or DksA failed to activate csrC-lacZ expression in reactions containing UvrY-P . 
+ We conclude that ppGpp and DksA directly activate csrB expression , at least partially accounting for the stimulatory effect of these regulators on the in vivo expression of this gene [ 2 ] . 
+ However , ppGpp and/or DksA may be indirectly involved in or require a factor that was deficient in our assays for csrC expression . 
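The fold effects quoted for the S-30 reactions (for example 1.5-fold for ppGpp alone) are described in the figure legend as being normalized against the internal control β-lactamase and expressed relative to the reaction lacking the factors. A minimal sketch of that arithmetic follows. The band intensities are hypothetical placeholders, not values from the study, and reading "absolute deviation" as the mean absolute deviation from the mean of the two experiments is an assumption.

```python
def fold_effect(target, internal, target_ref, internal_ref):
    """Normalized signal in the test reaction divided by the normalized
    signal in the reference reaction that lacks the regulatory factors."""
    return (target / internal) / (target_ref / internal_ref)


def mean_and_absolute_deviation(values):
    """Mean of the replicates and their mean absolute deviation from it
    (assumed definition; the source does not spell it out)."""
    mean = sum(values) / len(values)
    deviation = sum(abs(v - mean) for v in values) / len(values)
    return mean, deviation


# Hypothetical band intensities from two independent experiments.
# "lacz" is the full-length beta-galactosidase band, "bla" the beta-lactamase
# internal control; the "_ref" entries come from the reaction without factors.
replicates = [
    {"lacz": 300.0, "bla": 100.0, "lacz_ref": 200.0, "bla_ref": 100.0},
    {"lacz": 340.0, "bla": 100.0, "lacz_ref": 200.0, "bla_ref": 100.0},
]

folds = [
    fold_effect(r["lacz"], r["bla"], r["lacz_ref"], r["bla_ref"])
    for r in replicates
]
mean_fold, abs_dev = mean_and_absolute_deviation(folds)
print(f"fold effect = {mean_fold:.2f} +/- {abs_dev:.2f}")
```

Dividing by the internal control first removes lane-to-lane differences in overall synthesis, so the remaining ratio reflects only the effect of the added factor.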
+ As mentioned above , DeaD and SrmB activate csrB and csrC transcription by distinct mechanisms [ 4 ] . 
+ While DeaD activates uvrY translation , the mechanism by which SrmB activates csrB/C transcription is still not defined . 
+ Our ChIP-PCR data revealed that SrmB is required for normal binding of UvrY to csrB in vivo ( S2 Fig ) . 
+ This effect of SrmB on UvrY binding occurs without it altering UvrY or UvrY-P levels [ 4 ] . 
+ Because neither ppGpp nor DksA had substantial effects on in vivo binding of UvrY to csrB DNA ( Fig 12 ) , we infer that the mechanism by which ppGpp and DksA activate csrB/C transcription does not involve SrmB and vice versa . 
+ RelA on in vivo binding of UvrY to csrB promoter was determined by ChIP-PCR assay in a WT ( MG1655 expressing UvrY-FLAG ) and isogenic ΔdksA , ΔrelA and ΔbarA strains . 
+ Agarose gel showing PCR amplification of csrB promoter region recovered from each strain . 
+ The lacY gene served as a negative control in this experiment . 
+ ( A ) and IhfB-FLAG ( B ) proteins examined in WT ( MG1655 expressing ihfB-FLAG or ihfA-FLAG fusions from the native genomic loci ) and an MG1655 control lacking the FLAG fusions . 
+ Proteins from isogenic strains with ΔdksA , ΔrelA and ΔuvrY disruption are as shown . 
+ Proteins were collected from cultures grown in LB medium to mid-exponential growth phase ( ~ OD600 of 0.6 ) . 
+ RpoB loading controls for these analyses are also shown . 
+ In silico analysis of other possible targets of UvrY (SirA) binding
+ In many cases , transcription factors compete for binding to their DNA binding sites with other transcription factors , which may play antagonistic roles in the regulation of the target gene ( s ) [ 64 ] . 
+ Hence , we reasoned that there could be additional targets of UvrY/SirA in the genomes of E. coli / Salmonella that were not captured by ChIP-exo . 
+ fusion . 
+ Reactions contained pLFXcsrB-lacZ plasmid ( 2 μg ) , UvrY-P ( 2.3 μM ) , ppGpp ( 250 μM ) and/or DksA ( 2 μM ) as indicated . 
+ Incorporation of 35S-labeled methionine into protein products was detected by SDS PAGE followed by phosphorimaging . 
+ Signal intensity of the full length protein was determined using Quantity One software . 
+ The fold-effects of regulatory factors were determined with respect to the control reaction lacking the factors , after normalization against the internal control , β-lactamase , which was encoded on the same plasmid . 
+ Absolute deviation for each reaction was determined from two independent experiments . 
+ To test this hypothesis , we performed in silico analysis using the Ab Initio Motif Identification Environment ( AIMIE ) database [ 65 ] . 
+ We scanned the E. coli genome using the first six bases of the 18 bp ( TGTAAGNNNNNNCTTACA ) UvrY binding sequence , followed by manually checking the presence of the rest of the IR DNA sequence in the regulatory region of each discovered putative target . 
+ In this way , we identified 19 putative target genes containing the 18-bp UvrY-P IR sequence either perfectly or near perfectly conserved in their regulatory regions ( S10 Fig ) . 
+ Next , we searched for factors known to directly regulate the expression of these putative targets and compared the respective DNA binding motif of each factor with the 18 bp IR UvrY binding motif . 
+ Out of the 19 putative targets , regulatory factors were previously established for 5 of them ( S11 Fig ) . 
+ When we compared the respective DNA binding motifs of each factor with the IR binding motif of UvrY , we found an overlap in all of them ( S11 Fig ) , supporting the possibility that undiscovered direct target sequences for UvrY binding might have gone undetected in our experiments . 
+ Whether these putative target sequences function in UvrY regulation under other growth conditions will require future investigation . 
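The two-step search described above (scan with the first six bases, then check the rest of the inverted repeat) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it does not use the AIMIE database, the function names and the `max_mismatches` threshold are hypothetical, and the toy sequence is invented rather than taken from the E. coli genome. The manual inspection step in the text is replaced here by a simple mismatch count over the specified (non-N) positions.

```python
# 18-bp inverted repeat quoted in the text; N marks unspecified spacer bases.
IR_MOTIF = "TGTAAGNNNNNNCTTACA"

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Reverse complement of a DNA string (N stays N)."""
    return seq.translate(_COMPLEMENT)[::-1]


def scan_for_ir(sequence, motif=IR_MOTIF, max_mismatches=1):
    """Find exact matches to the first six bases of the motif, then keep
    windows whose remaining specified positions match within the threshold.

    Returns a list of (start, site, mismatches) tuples, 0-based starts.
    """
    sequence = sequence.upper()
    seed = motif[:6]
    hits = []
    start = sequence.find(seed)
    while start != -1:
        window = sequence[start:start + len(motif)]
        if len(window) == len(motif):
            mismatches = sum(
                1 for m, b in zip(motif, window) if m != "N" and m != b
            )
            if mismatches <= max_mismatches:
                hits.append((start, window, mismatches))
        start = sequence.find(seed, start + 1)
    return hits


# Invented toy sequence: one perfect site and one site with a single mismatch.
toy = "AAAA" + "TGTAAGAGGCTCCTTACA" + "CCCC" + "TGTAAGAAAAAACTTGCA" + "GG"
for position, site, mismatches in scan_for_ir(toy):
    print(position, site, mismatches)
```

Because the motif equals its own reverse complement, perfect sites are found from either strand. A near-perfect site whose mismatch falls in the first six bases would be missed by this forward-strand seed and would require scanning the reverse complement as well.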
+ Phylogenetic distribution of BarA, UvrY, CsrA and FliW
+ The BarA-UvrY TCS exerts global effects on gene expression by activating CsrB and CsrC transcription , thus controlling CsrA activity [ 16 , 35 ] . 
+ This signaling pathway is common to the commensal bacterium E. coli , the pathogen Salmonella , as well as a variety of other γ-proteobacterial pathogens [ 66 ] . 
+ In contrast , little is known about the workings of the Csr system in other species that possess csrA homologs , but have not been shown to express CsrA-inhibitory sRNAs . 
+ Recently , Mukherjee et al. have shown that the FliW protein of B. subtilis binds to CsrA and antagonizes its activity , thus preventing it from binding to the flagellin mRNA , hag [ 41 , 67 ] . 
+ Because BarA-UvrY is devoted to transcription of CsrB/C sRNAs in E. coli and Salmonella and fliW is absent in species known to produce CsrA-inhibitory sRNAs [ 41 ] , we hypothesized that BarA-UvrY and FliW might represent different modules for regulating CsrA activity in different bacterial species . 
+ To test this hypothesis , we examined the phylogenetic distributions of CsrA , BarA-UvrY and FliW across fully-sequenced bacteria ( S3 Table ) . 
+ We found that , for species encoding at least one readily identifiable CsrA ortholog , the presence of BarA-UvrY and FliW were strongly anti-correlated at species level ( Spearman correlation = -0.97 , p = 1.10 × 10^-205 ) and genus level ( Spearman correlation = -0.95 , p = 7.31 × 10^-74 ) ( Fig 15 ) . 
+ Of the 346 genomes encoding CsrA , 340 also encoded either BarA-UvrY or FliW , but not both , and only 6 species might encode both BarA-UvrY and FliW systems : Desulfosporosinus acidiphilus , D. meridiei , D. orientis , D. baculatum , Magnetococcus marinus , and Candidatus Sulfuricurvum sp ( Fig 15 and S3 Table ) . 
+ Interestingly , in all cases where both BarA/UvrY and FliW may function , only the BarA component appears to be present , which may indicate that this two-component system has lost or is losing its function in these species . 
+ Additional experimental data are needed in order to confirm this hypothesis . 
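The anti-correlation reported above is a Spearman correlation between two presence/absence vectors, one entry per genome. A minimal, dependency-free sketch of that calculation is given below. The eight-genome toy vectors are invented for illustration and are not the S3 Table data, and the p-value calculation is omitted. For binary data, Spearman's coefficient with average ranks reduces to the phi coefficient.

```python
from math import sqrt


def average_ranks(values):
    """Rank the values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # average of ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Invented presence (1) / absence (0) calls for eight CsrA-encoding genomes.
bar_uvr = [1, 1, 1, 0, 0, 0, 1, 0]
fliw    = [0, 0, 0, 1, 1, 1, 0, 1]
print(spearman(bar_uvr, fliw))   # mutually exclusive systems

# Same toy set, but the last genome now encodes both systems.
bar_uvr_mixed = [1, 1, 1, 0, 0, 0, 1, 1]
print(spearman(bar_uvr_mixed, fliw))
```

In the toy data, a single genome that carries both systems is enough to pull the coefficient away from -1, which is why the handful of species that may encode both systems matters for the reported value.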
+ Accurate and complete identification of orthologs across a broad taxonomic range is a challenging problem , with potential for both false-positives and false-negatives [ 68 ] . 
+ Although difficult to detect , any strong biases in the methodology we used to identify orthologous genes could affect our results . 
+ To address this concern , we used two additional and widely-used methods to identify alternative ortholog sets : the KEGG orthology database ( KO ) [ 53 ] , and UniProt reference clusters of orthologs ( UniRef ) [ 54 ] . 
+ We found that , regardless of the method used to identify orthologs , FliW and BarA-UvrY were strongly negatively correlated , particularly when conditioning on the presence of CsrA ( S3 Table ) . 
+ In addition , we applied sequence similarity filtering and manual curation to reduce false positives and false negatives in the NCBI orthologs data set . 
+ correlation between BarA/UvrY and FliW . 
+ Depicted is a reference phylogeny of fully-sequenced genomes encoding CsrA and at least one of BarA , UvrY or FliW , plotting their presence/absence data obtained from orthology databases . 
+ The results show that the presence of BarA-UvrY and FliW are significantly anticorrelated . 
+ After the sequence similarity filtering and manual curation , we found several false positives for CsrA in the Streptococcus genus , such as misannotated heavy metal stress response proteins ( NCBI identification numbers ( GI ) : 15675047 , 13622200 , 21904470 ) , peptide methionine sulfoxide reductases ( GI : 222114035 , 134272076 , 209540564 , 24638057 , 342165139 , 342165138 ) , and putative CsrA homologs that lack the N-terminal and C-terminal conserved regions of CsrA ( GI : 895760047 , 882844105 , 882819224 ) . 
+ Most of the main false negatives that we identified are BarA and UvrY from genomes that match the representative homologs from E. coli within the similarity and coverage thresholds ( see methods ) such as Pseudomonas ( NCBI genome accession numbers : NC_022594 , NC_022591 , NC_022361 , NC_022360 ) , Xanthomonas ( NC_020815 , NC_017271 , NC_017267 , NC_016010 , NC_013722 ) , Pseudoxanthomonas ( NC_014924 ) , Shewanella ( NC_009052 , NC_009665 , NC_008321 , NC_008322 , NC_017566 , NC_016901 ) , and other members of the γ-Proteobacteria class ( S3 Table ) . 
+ Correcting these false positives and false negatives caused a small increase of 2 % in the strength of the negative correlation , reinforcing the present conclusions ( Spearman correlation = -0.95 vs. -0.97 , comparing the unfiltered and filtered data sets , respectively ) ( S3 Table ) . 
+ These results argue against methodological bias strongly affecting our results . 
+ Together , these results indicate that the negative correlation between FliW and BarA-UvrY regulatory systems is highly unlikely to be artifactual and thus represents a biologically relevant observation . 
+ The implications of this observation for the possible regulation of CsrA activity by inhibitory sRNAs in FliW-encoding species will require additional investigation to unravel . 
+ Conclusions
+ Using ChIP-exo [ 34 ] , we probed the complete repertoire of UvrY ( SirA ) DNA binding sites in the genomes of E. coli and Salmonella . 
+ We discovered that the csrB/C genes are by far the strongest direct targets of UvrY in these species . 
+ UvrY binds specifically to an 18 nt palindromic sequence in the promoter regions of csrB/C and exhibited an almost absolute requirement for phosphorylation by BarA for this binding in vivo under the growth conditions examined . 
+ UvrY-P requires IHF for optimal binding to and activation of csrB but not csrC . 
+ CsrA activates csrB/C transcription by activating uvrY expression by an undefined mechanism , which may require the noncoding mRNA leader , but does not involve the other known posttranscriptional regulator of uvrY , the DeaD-box RNA helicase DeaD [ 4 ] . 
+ The RNA DEAD-box helicase SrmB , which also activates csrB/C transcription in E. coli [ 4 ] , promoted binding of UvrY to csrB DNA in vivo without affecting the expression of other factors known to activate csrB expression . 
+ This suggests that SrmB may regulate unknown factor ( s ) involved in csrB transcription . 
+ The stringent response factors ppGpp and DksA activate CsrB/C expression in vivo [ 2 ] and were found to modestly activate csrB expression in vitro in S-30 extracts . 
+ Whether this involves direct binding to RNA polymerase remains to be determined . 
+ Genomic loci that crosslinked weakly to UvrY were identified proximal to the promoter regions of 286 genes in E. coli and 301 genes in Salmonella , respectively ( S2 Table ) . 
+ However , further analysis showed weak or negligible regulatory effects of UvrY on the expression of the genes that were tested ( S6A -- S6C Fig ) . 
+ Hence , we conclude that , under the growth conditions that we have examined , UvrY-P exerts its global effects on gene expression almost entirely by activating the transcription of CsrB and CsrC . 
+ We suspect that most of the genes that have been found to respond to BarA and UvrY in E. coli [ 31 ] and their orthologs in other species [ 30 , 32 ] are indirect targets of UvrY , which are regulated by CsrA . 
+ The BarA-UvrY/Csr signaling pathway has been studied in the commensal bacterium E. coli , the pathogen Salmonella , and a number of other γ-proteobacterial pathogens [ 66 ] . 
+ B. subtilis , the only Gram-positive bacterium in which CsrA has been studied to date , uses FliW as an α-CsrA protein , which binds to and inhibits CsrA activity [ 41 , 67 ] . 
+ While this is only the first such example , FliW is present in diverse species , where it may act as an inhibitor of CsrA . 
+ For species encoding at least one readily identifiable CsrA ortholog , the presence of BarA-UvrY and FliW were strongly anti-correlated . 
+ This suggests that while γ-Proteobacteria use the BarA-UvrY TCS to control CsrA by activating the transcription of its sRNA antagonists , members of the β-Proteobacteria , δ-Proteobacteria , ε-Proteobacteria , Firmicutes , Spirochaetes , Thermotogae , Actinobacteria , Nitrospira , Thermosulfobacteria , Deferribacteres , Planctomycetes , Chlamydiae , Acidobacteria , and Synergistia may use FliW to regulate CsrA activity . 
+ Supporting Information
+ S1 Fig . 
+ SirA-FLAG and UvrY-His6 are functional in vivo . 
+ Western blot showing expression of SirA-FLAG ( A ) and Northern blots showing CsrB levels in 14028S ( wild type Salmonella ) , 14028S strain with sirA-FLAG fusion integrated at the native sirA locus and sirA and csrB deletion strains ( B ) . 
+ CsrB levels in MG1655 , uvrY deletion and UvrY-His6 ( expressed from pET24-a expression vector ) E. coli strains ( C ) . 
+ For Western blotting , RpoB loading controls are shown ( A ) . 
+ For Northern blotting , the 16S/23S rRNA loading controls are shown . 
+ Cultures were grown in LB to mid-exponential growth phase ( OD600 of 0.6 ) . 
+ ( TIFF ) 
+ S2 Fig . 
+ ChIP specificity confirmation by PCR . 
+ Polymerase chain reaction was used to confirm the specificity of ChIP assay . 
+ Primers ( S1 Table ) annealing to the promoter regions of csrB , lacY and 16S rDNA ( rrsH ) were used to amplify the promoters of csrB , lacY and/or 16S rDNA genes from DNA that was crosslinked and immunoprecipitated from E. coli ( panel A ) or Salmonella ( panel B ) . 
+ In these analyses , csrB was used as a positive control and lacY and 16S rDNA ( rrsH ) were used as negative controls for E. coli and Salmonella , respectively . 
+ ( TIFF ) 
+ S3 Fig . 
+ Effect of BarA , IHF , DksA , RelA and CsrA on UvrY-FLAG and UvrY-FLAG-P levels . 
+ Phos-tag SDS-PAGE with Western blotting was used for detection of the phosphorylated ( P-UvrY-FLAG ) and non-phosphorylated ( UvrY-FLAG ) protein levels expressed in a WT ( MG1655 expressing UvrY-FLAG ) and isogenic ΔbarA , ΔdksA , ΔrelA , ΔihfA , ΔihfB , csrA : : kan and ΔuvrY strains . 
+ Cultures were grown in LB to mid-exponential growth phase ( OD600 of 0.6 ) . 
+ The relative UvrY levels in the WT , barA , dksA , relA , ihfA , ihfB and csrA strains are 1.0 , 1.2 , 1.0 , 1.0 , 1.2 , 1.2 and 0.23 , and the corresponding percentages of UvrY phosphorylation are 7 % , 1 % , 6.8 % , 6.9 % , 9 % , 9 % and 8 % , respectively . 
+ ( TIFF ) 
+ S4 Fig . 
+ DNase I footprinting of csrC DNA by phosphorylated and non-phosphorylated UvrY . 
+ A 32P-end labeled DNA probe ( reverse strand ) that included both the upstream and downstream putative UvrY binding sites was used ( Fig 1E ) . 
+ Reactions in all lanes except lane 1 contained DNase I ( 0.025 U/12.5 μl reaction ) . 
+ Reactions in lanes 3 -- 6 and lanes 7 -- 10 contained 0.25 , 0.35 , 0.5 , 0.7 μM of phosphorylated and non-phosphorylated UvrY-His6 , respectively . 
+ Lane 2 reaction contained no UvrY . 
+ ( TIFF ) 
+ S5 Fig . 
+ Effect of IHF on UvrY-FLAG levels . 
+ Western blotting of UvrY-FLAG levels in strains MG1655 ( no FLAG fusion ) , WT ( MG1655 expressing UvrY-FLAG ) , and isogenic ΔihfA and ΔihfB strains . 
+ Cultures were grown in LB to mid-exponential growth phase ( OD600 of 0.6 ) . 
+ RpoB loading control is also shown . 
+ ( TIFF ) 
+ S6 Fig . 
+ Examination of putative UvrY targets discovered by ChIP-exo . 
+ Northern blot showing effect of UvrY and Csr factors on SpoT42 sRNA ( A ) . 
+ Cultures of MG1655 and the isogenic mutants indicated were grown in Kornberg medium containing 0.5 % glucose , to early stationary growth phase ( OD600 of 2.0 ) . 
+ The 16S/23S rRNA loading controls are also shown . 
+ Western blot showing effect of uvrY deletion on FhuF-FLAG protein ( B ) . 
+ Cultures were grown in LB to mid-exponential growth phase ( OD600 of 0.6 ) , at which point dipyridyl was added to culture ( 1mM final concentration ) . 
+ Samples were collected before and 10 min after the addition of dipyridyl . 
+ RpoB loading control is also shown . 
+ Electrophoretic gel mobility shift assay showing UvrY binding to cspA , spf , fhuF and csrB DNA ( C ) . 
+ Phosphorylated ( UvrY-P ) UvrY-His6 binding to spf , cspA , fhuF , csrB and rrlE DNA was tested by EMSA as shown . 
+ The spf , cspA , fhuF and csrB DNA probes used in these experiments encompass the ChIP-exo derived putative UvrY binding sites discovered in the promoter region of each gene ( shown in Fig 1 and S2 Table ) . 
+ The DNA probes ( 0.5 nM ) were incubated at room temperature with increasing concentration of in vitro phosphorylated UvrY-His6 protein . 
+ End-labeled 0.5 nM rrlE and 50-fold cold csrB ( for the specific competitor , marked with , 0.65 μM UvrY-P was used ) were also used as non-specific and specific competitors , respectively . 
+ The DNA-protein complexes were resolved in a non-denaturing 7 % polyacrylamide gel . 
+ The shifted protein-DNA complex is indicated by black arrows . 
+ Effect of UvrY on the expression of putative sRNA targets ( D ) . 
+ Northern blots showing effect of UvrY on the expression of several sRNA genes . 
+ Cultures were grown in Kornberg , supplemented with 0.5 % glucose , to mid-exponential ( OD600 of 0.6 ) , transition to stationary ( OD600 of 1.2 ) and stationary growth phases ( OD600 of 3.0 ) . 
+ The 5S rRNA loading control is also shown . 
+ ( TIF ) 
+ S7 Fig . 
+ List of transcription factors known to regulate expression of the ChIP-exo derived putative UvrY targets ( listed in S2 Table ) . 
+ ( TIFF ) 
+ S8 Fig . 
+ Effect of CsrA on UvrY-FLAG protein stability and expression of IHF subunits . 
+ Western blot showing UvrY-FLAG protein stability ( A ) in MG1655 ( no FLAG fusion ) , WT ( MG1655 with a uvrY-FLAG fusion integrated at the uvrY locus ) and an isogenic csrA mutant . 
+ Cells were grown in LB to mid-exponential growth phase ( OD600 of 0.6 ) at which point tetracycline and chloramphenicol were added and cultures were sampled thereafter at the times shown . 
+ Western blotting of IhfA-FLAG and IhfB-FLAG proteins ( B ) examined in MG1655 or an isogenic csrA : : kan mutant expressing ihfA-FLAG or ihfB-FLAG fusions from the native genomic loci ( WT ) . 
+ Cultures were grown in LB to mid-exponential growth phase ( ~ OD600 of 0.6 ) . 
+ RpoB loading controls for these analyses are also shown . 
+ ( TIFF ) 
+ S9 Fig . 
+ In vitro transcription-translation of a supercoiled plasmid-encoded csrC-lacZ fusion . 
+ Reactions contained pLFXcsrC-lacZ ( 4 μg ) , UvrY-P ( 2.3 μM ) , ppGpp ( 250 μM ) and/or DksA ( 2 μM ) as indicated . 
+ Incorporation of 35S-labeled methionine into protein products was detected by SDS PAGE with phosphorimaging . 
+ Signal intensity was determined using Quantity One software . 
+ The fold-effects of regulatory factors were determined with respect to the control reaction lacking these factors , after normalization against β-lactamase as an internal control . 
+ Absolute deviation was determined from two independent experiments . 
+ ( TIFF ) 
+ S10 Fig . 
+ Putative UvrY targets derived by in silico analysis . 
+ Nineteen putative UvrY targets were derived by in silico analysis using the Ab Initio Motif Identification Environment ( AIMIE ) database ( 46 ) . 
+ This was done by scanning the E. coli genome using the first six bases of the 18bp-long ( TGTAAGNNNNNNCTTACA ) UvrY consensus binding sequence , followed by manually checking the presence of the rest of the IR DNA sequence in the regulatory region of each discovered putative target . 
+ The predicted UvrY binding sequence in the promoter region of each putative target in comparison to the UvrY consensus sequence is shown . 
+ Distance from the center of the predicted UvrY binding sequence to the known transcription start site ( TSS ) of each putative target is shown . 
+ Nucleotides marked in red are mismatches between the UvrY consensus sequence and the predicted UvrY binding sequence . 
+ ( TIFF ) 
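+ The seed-and-check scan described in the S10 Fig legend can be sketched roughly as follows. This is only an illustration, not the AIMIE procedure: the function name, the mismatch threshold (the legend describes a manual check instead) and the example sequence are all assumptions, and only the forward strand is scanned.

```python
# Illustrative sketch only; not the AIMIE workflow used in the study.
CONSENSUS = "TGTAAGNNNNNNCTTACA"  # 18-bp UvrY consensus quoted in the legend
SEED = CONSENSUS[:6]              # "first six bases" used for the initial scan


def scan_for_sites(sequence, max_mismatches=2):
    """Return (start, site, mismatches) for each seed hit on the forward strand.

    max_mismatches is a hypothetical cutoff standing in for the manual
    inspection of the rest of the inverted repeat.
    """
    sequence = sequence.upper()
    hits = []
    start = sequence.find(SEED)
    while start != -1:
        site = sequence[start:start + len(CONSENSUS)]
        if len(site) == len(CONSENSUS):
            # Positions marked N in the consensus match any base.
            mismatches = sum(
                1 for c, s in zip(CONSENSUS, site) if c != "N" and c != s
            )
            if mismatches <= max_mismatches:
                hits.append((start, site, mismatches))
        start = sequence.find(SEED, start + 1)
    return hits


if __name__ == "__main__":
    # Made-up sequence containing one perfect site at offset 3.
    print(scan_for_sites("AAATGTAAGACGTACCTTACAGGG"))
```

+ Because the seed must match exactly, a site whose mismatches fall in the first six bases would be missed by this sketch.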
+ S11 Fig . 
+ Overlap between predicted UvrY binding site and consensus DNA binding site of known regulators of putative UvrY targets derived by in silico analysis . 
+ Shown in this figure are five of the 19 putative UvrY targets identified by in silico analysis ( S9 Fig ) , for which regulatory factors were previously established . 
+ The list of the five putative targets , their respective known regulators , the consensus DNA binding site of each regulator ( capitalized ) and the overlap between the consensus binding site of the known regulator and the predicted UvrY binding site within the promoter of each putative target ( underlined ) is shown . 
+ ( TIFF ) 
+ S1 Table. List of strains, plasmids, bacteriophages and primers used in this study (DOCX)
+ S2 Table. List of putative UvrY-SirA targets identified by ChIP-exo. (XLSX)
+ S3 Table . 
+ Distribution of BarA , UvrY , CsrA , and FliW at species and genus level . 
+ We identified CsrA , BarA , UvrY and FliW orthologs from all fully-sequenced bacterial genomes in the NCBI genomes database ( http://www.ncbi.nlm.nih.gov/genome/ ) . 
+ The results were compared to other orthology databases and filtered by sequence similarity and alignment coverage in order to minimize the number of false positives/negatives ( see Methods ) . 
+ At both species and genus levels , the results exhibited a strong anti-correlation between BarA-UvrY and FliW . 
+ ( XLSX ) 
+ These studies were supported in part by the National Institutes of Health ( R01AI097116 , R01GM059969 , F32AI100322 ) , the Consejo Nacional de Ciencia y Tecnología ( 178033 ) , the DGAPA-UNAM ( IN209215 ) , a University of Florida Alumni Graduate Fellowship ( T.Z. ) and the Guangxi Scholarship Fund of Guangxi Education Department , PR China ( D.T. ) . 
+ Author Contributions
+ Conceived and designed the experiments : TRZ CAV YL AP AHP RD DT BK DG BMMA TR . 
+ Performed the experiments : TRZ CAV YL AP AHP RD DT . 
+ Analyzed the data : TRZ CAV YL AP AHP RD BK BMMA TR . 
+ Contributed reagents/materials/analysis tools : BK BMMA TR . 
+ Wrote the paper : TRZ CAV AP AHP RD BK TR .
\ No newline at end of file
--- a/data/TEXT_FILES/useful_txt/26789284.txt 0 → 100644
View file @27818a9
+++ b/data/TEXT_FILES/useful_txt/26789284.txt 0 → 100644
View file @27818a9
+ H-NS Facilitates Sequence Diversification of
+ 1 Department of Biological Information , Graduate School of Bioscience and Biotechnology , Tokyo Institute of Technology , Meguro-ku , Tokyo , Japan , 2 Department of Biomedical Informatics , Graduate School of Medicine , Osaka University , Suita , Osaka , Japan , 3 Department of Medical Genome Sciences , Graduate School of Frontier Sciences , The University of Tokyo , Kashiwa-shi , Chiba , Japan , 4 Graduate School of Biological Sciences , Nara Institute of Science and Technology , Nara , Japan , 5 Earth-Life Science Institute , Tokyo Institute of Technology , Meguro-ku , Tokyo , Japan 
+ $ a Current address : Department of Molecular Oncology and Leukemia Program Project , Research Institute for Radiation Biology and Medicine , Hiroshima University , Hiroshima , Hiroshima , Japan $ b Current address : Faculty of Science , Biology Department , Harran University , Sanliurfa , Turkey * torutobe@sahs.med.osaka-u.ac.jp ( TT ) ; ken@bio.titech.ac.jp ( KK ) ; taku@bs.naist.jp ( TO ) 
+ Abstract
+ Bacteria can acquire new traits through horizontal gene transfer . 
+ Inappropriate expression of transferred genes , however , can disrupt the physiology of the host bacteria . 
+ To reduce this risk , Escherichia coli expresses the nucleoid-associated protein , H-NS , which preferentially binds to horizontally transferred genes to control their expression . 
+ Once expression is optimized , the horizontally transferred genes may actually contribute to E. coli survival in new habitats . 
+ Therefore , we investigated whether and how H-NS contributes to this optimization process . 
+ A comparison of H-NS binding profiles on common chromosomal segments of three E. coli strains belonging to different phylogenetic groups indicated that the positions of H-NS-bound regions have been conserved in E. coli strains . 
+ The sequences of the H-NS-bound regions appear to have diverged more than those of H-NS-unbound regions only when the H-NS-bound regions are located upstream of or within coding regions of genes . 
+ Because these regions generally contain regulatory elements for gene expression , sequence divergence in these regions may be associated with alteration of gene expression . 
+ Indeed , nucleotide substitutions in H-NS-bound regions of the ybdO promoter and coding regions have diversified the potential for H-NS-independent negative regulation among E. coli strains . 
+ The ybdO expression in these strains was still negatively regulated by H-NS , which reduced the effect of H-NS-independent regulation under normal growth conditions . 
+ Hence , we propose that , during E. coli evolution , the conservation of H-NS binding sites resulted in the diversification of the regulation of horizontally transferred genes , which may have facilitated E. coli adaptation to new ecological niches . 
+ Introduction
+ The Escherichia coli species consists of genetically diverse strains , for example , in terms of nutrient metabolism , stress responses , and pathogenicity [ 1 ] . 
+ One of the well-known factors causing genetic diversity in bacteria is horizontal gene transfer ; an estimated 10 -- 16 % of genes in E. coli strains have been acquired horizontally [ 2 ] . 
+ However , unregulated expression of newly acquired genes could disrupt the physiology of the host cell [ 3,4 ] . 
+ Both E. coli and Salmonella express the protein H-NS , which preferentially binds adenine and thymine ( AT ) - rich DNA [ 3,5 -- 7 ] . 
+ Many horizontally transferred genes ( HTGs ) have a high AT content relative to E. coli genes , which facilitates H-NS binding to , and repression of , the foreign genes [ 8 ] . 
+ This repression guards host cells from potential physiological perturbations caused by expression of HTGs [ 3,5 ] . 
+ Deficiency in the gene hns impairs Salmonella growth during laboratory cultivation [ 9 ] . 
+ Compensatory mutations for this growth impairment have been identified in the gene stpA , encoding StpA , which is the H-NS paralog . 
+ These mutations alter StpA functionality to resemble that of H-NS [ 9 ] . 
+ In addition , loss of virulence genes in the Salmonella pathogenic island-1 ( SPI-1 ) and frameshift and missense mutations in phoPQ , which encodes the positive transcriptional regulator of virulence genes , could also compensate for the fitness loss of hns deficiency [ 9 ] . 
+ Therefore , the major role of H-NS in Salmonella is purportedly the silencing of genes within SPI-1 [ 9 ] . 
+ In addition , H-NS suppresses transcription of pervasive non-coding and antisense sequences in both coding regions and intergenic regions [ 10 -- 12 ] by inhibiting the recruitment of RNA polymerase to promoters , trapping this polymerase at promoters , or inhibiting transcriptional elongation [ 8,10,11,13 -- 17 ] . 
+ However , AT-rich sequences bound by H-NS can be highly expressed when both hns and stpA are disrupted [ 18 ] . 
+ In this scenario , the spurious expression of non-coding and antisense RNAs and the higher expression of AT-rich genes impose high metabolic costs and reduce the fitness of hns-deficient cells [ 11,18 ] . 
+ Furthermore , H-NS can both directly and indirectly regulate global gene expression in E. coli [ 19,20 ] . 
+ Mutations that counter the slow growth observed for the hns/stpA double mutant have been identified . 
+ One mutation inactivates the sigma factor for stress response , namely RpoS , which is involved in the expression of many genes induced by the hns/stpA double mutation . 
+ The other mutation amplifies ~ 40 % of the E. coli chromosome centered near the origin of replication , which causes remodeling of the transcriptome and partially reverses the imbalance in global gene expression caused by the double mutation [ 21 ] . 
+ Interestingly , the transcriptional repression activity of H-NS is affected by the location of H-NS binding sites throughout the E. coli chromosome . 
+ H-NS is a strong repressor of the hns promoter when this promoter is ectopically placed in the Ter or Left macrodomain of the chromosome [ 22 ] . 
+ It is also known that environmental factors , such as pH , temperature , and osmolarity , can alter H-NS-mediated gene repression [ 23 ] . 
+ Hence , a change in environmental conditions , i.e. , an abiotic stressor , can activate a large number of genes that normally are repressed by H-NS , thereby potentiating the stress response [ 5,23 ] . 
+ Any HTG should be expressed only when its function is beneficial to the host bacteria . 
+ However , transcriptional regulators are not well conserved and transcriptional networks are highly diversified among bacterial species [ 24 ] . 
+ For acquired genes , therefore , the regulation that occurs via a host-cell transcriptional regulator ( s ) and/or regulatory element ( s ) would need to be optimized [ 25 ] . 
+ It has been suggested that , in bacteria , such optimization requires a long time , and this is accomplished through several steps : 1 ) upon integration of the HTG ( s ) into the host genome , the initial expression would be lower than for native host genes ; 2 ) a host-cell activator is required to express HTGs ; and 3 ) the expression of the transferred genes must be fine-tuned to match the needs of host cells [ 25 ] . 
+ On the other hand , Dorman [ 3 ] proposed that H-NS-mediated repression of HTGs could be an effective way to reduce the risk of inappropriate expression of such genes until expression could be optimized . 
+ Although H-NS-mediated repression of virulence genes , which are HTGs , may reduce the fitness cost incurred by the expression of virulence genes and contribute to the evolution of Salmonella [ 9 ] , it remains unclear whether H-NS actually contributes to the optimization of expression of transferred genes so as to benefit host cells . 
+ The aim of our study was to improve our knowledge of how H-NS contributes to the integration of HTGs into E. coli . 
+ Genome-wide H-NS binding profiles were recently obtained with the E. coli K-12 genome using chromatin immunoprecipitation ( ChIP ) - chip and ChIP-seq analyses [ 19,26 -- 28 ] . 
+ Using this information , it is possible to examine the conservation/diversification of H-NS-bound regions within the E. coli genome during evolution . 
+ Hence , we used chromatin affinity precipitation ( ChAP ) - seq to compare H-NS-bound regions within the genomes of genetically diverse E. coli strains belonging to different subgroups , specifically , laboratory strain K-12 ( subgroup A ) , commensal strain SE11 ( subgroup B1 ) , and commensal strain SE15 ( subgroup B2 ) [ 29,30 ] . 
+ This analysis enabled us to investigate the influence of H-NS binding on the diversification of genomic sequences . 
+ Our analysis suggests that the distribution of H-NS-bound regions within E. coli genomes has been highly conserved during evolution . 
+ In addition , sequence diversity in the H-NS-bound regulatory regions tended to be greater than in H-NS-unbound regulatory regions . 
+ Hence , we propose that transcriptional repression by H-NS increases the propensity for nucleotide substitutions in transcriptional regulatory regions of HTGs , which may alter the expression of transferred genes to facilitate adaptation of E. coli cells to new habitats . 
+ Results
+ ChAP-seq analysis of H-NS-bound regions in three E. coli strains
+ Phylogenetic analysis has indicated that group B2 is the ancestral phylogroup in the E. coli lineage , whereas groups A and B1 have diverged [ 31 -- 33 ] . 
+ To assess the impact of E. coli evolution on H-NS binding , we comprehensively compared the localization of H-NS-bound regions on chromosomes among the three E. coli strains K-12 ( group A ) , SE11 ( group B1 ) , and SE15 ( group B2 ) . 
+ Notably , the amino acid sequence of H-NS is completely conserved among these strains . 
+ We created H-NS-12His-expressing recombinant K-12 ( W3110 ) , SE11 , and SE15 strains and determined H-NS binding profiles on the chromosomes for the three strains using ChAP-seq . 
+ Each strain was grown to mid-log phase ( OD600 0.4 ) in LB medium under aerobic conditions and treated with formaldehyde to crosslink H-NS-12His to DNA , followed by ChAP of the crosslinked DNA fragments with H-NS , as described [ 34 ] . 
+ Purified DNA from ChAP and whole-cell extract ( WCE ; pre-ChAP ) was subjected to high-throughput Illumina sequencing , and the H-NS-bound regions were determined ( See details in Materials and Methods ) . 
+ We performed duplicate ChAP analyses for each strain , and the H-NS binding profiles were highly reproducible ( Fig 1A ) . 
+ Thus , we defined overlapping regions of H-NS binding regions in duplicate ChAP analyses as reproducible H-NS binding regions , and used these defined regions in subsequent analyses . 
+ We identified H-NS-bound regions covering 802,561 bp in SE11 , 642,859 bp in SE15 , and 697,762 bp in K-12 , corresponding to 14 -- 16 % of each genome ( Table 1 ) . 
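+ The definition of reproducible regions as the overlap between duplicate ChAP analyses can be sketched as an interval intersection. This is a minimal illustration under stated assumptions: the function names, the half-open (start, end) coordinate convention and the example coordinates are hypothetical and are not taken from the study's pipeline.

```python
# Illustrative sketch only; not the peak-calling pipeline used in the study.

def intersect_regions(rep1, rep2):
    """Intersect two sorted, non-overlapping lists of half-open (start, end) intervals."""
    i = j = 0
    shared = []
    while i < len(rep1) and j < len(rep2):
        start = max(rep1[i][0], rep2[j][0])
        end = min(rep1[i][1], rep2[j][1])
        if start < end:
            shared.append((start, end))
        # Advance whichever interval finishes first.
        if rep1[i][1] < rep2[j][1]:
            i += 1
        else:
            j += 1
    return shared


def covered_fraction(regions, genome_length):
    """Fraction of the genome covered by the given regions."""
    return sum(end - start for start, end in regions) / genome_length


if __name__ == "__main__":
    # Made-up coordinates for two replicates.
    replicate_1 = [(100, 500), (900, 1200)]
    replicate_2 = [(300, 700), (1000, 1100), (1150, 1300)]
    reproducible = intersect_regions(replicate_1, replicate_2)
    print(reproducible, covered_fraction(reproducible, 3500))
```

+ Summing the lengths of the intersected intervals and dividing by the genome length gives a coverage figure of the kind reported as 14 -- 16 % of each genome.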
+ Comparison of H-NS binding profiles from the three E. coli strains
+ To compare the H-NS-bound regions among the E. coli strains , we aligned the three chromosome sequences using the Mauve program developed for the multiple alignment of bacterial chromosome sequences [ 35,36 ] . 
+ We identified the `` common '' ( conserved in all three strains ) , `` shared '' ( conserved in two strains ) , and `` specific '' ( unique for each strain ) chromosome segments ( Table 1 ) . 
+ Whereas the common segments would have been in the ancestral genome before divergence of the E. coli lineage , the `` shared '' and `` specific '' segments would have become integrated in the E. coli genome after the divergence . 
+ We calculated the proportions of H-NS-bound regions in each of the `` common '' , `` shared '' , and `` specific '' segments of the three strains ( Table 1 ) . 
+ The proportion of H-NS-bound sequences was higher in the `` specific '' and `` shared '' segments ( ~ 30 -- 38 % ) than in the `` common '' segments ( ~ 10 -- 12 % ) , suggesting that many genes in the `` specific '' and `` shared '' regions were horizontally transferred during E. coli evolution and retained preferential binding to H-NS . 
+ H-NS-bound regions on `` common '' genome segments are conserved in the E. coli strains 
+ Although many of the `` specific '' and `` shared '' segments were bound by H-NS , more than half of the H-NS-bound regions were located within `` common '' segments , with similar total length among the chromosomes of SE11 ( 451,643 bp ) , SE15 ( 383,226 bp ) , and K-12 ( 427,731 bp ) ( Table 1 ) . 
+ Specifically , 76.2 % ( SE11 ) , 89.8 % ( SE15 ) , and 80.4 % ( K-12 ) of H-NS-bound regions in `` common '' segments overlapped among the three strains ( Fig 1B left [ Common ] and Fig 1C -- 1F ) . 
+ In addition , very few H-NS-bound regions in `` common '' segments ( 3.4 % to 7.7 % ) were identified as unique in each strain , and the remainder of the binding regions were shared by two strains ( Fig 1B left [ Common ] ) . 
+ We manually examined these unique and shared H-NS-bound regions and found that most of them ( 84 % of the unique and shared H-NS-bound regions in common segments ) had detectable H-NS binding signals in all three strains , although signal intensities were below the threshold to be categorized as H-NS-bound regions in one or two strains . 
+ We concluded that the H-NS-bound regions in `` common '' segments are highly conserved in the three strains . 
+ It has been reported that H-NS binding to orthologous genes in E. coli and Salmonella is highly conserved [ 19 ] . 
+ This and our current result indicate that the H-NS-bound regions have been retained in the E. coli lineage during evolution . 
+ Notably , the H-NS-bound regions within `` shared '' segments between two strains are also conserved ( 85.0 -- 94.7 % , Fig 1B right [ Shared ] ) . 
+ Non-synonymous sites in H-NS-bound genes evolve faster than those in H-NS-unbound genes 
+ We concluded that the H-NS binding in `` common '' segments has been conserved during the evolution of E. coli . 
+ Therefore , we were interested in the effects of the conservation of H-NS binding on sequence diversification/conservation among the E. coli genomes . 
+ We initially compared sequence diversities between the H-NS-bound and - unbound orthologous genes . 
+ OrthoMCL was used to search for conserved orthologs that are present in SE15 , SE11 , and K-12 and at least 37 other E. coli strains , of the 44 strains in the curated non-redundant genome collection of reference sequences ( RefSeq ) at NCBI , when we started this analysis [ 37 ] ( See details in Materials and Methods and S1 Fig ) . 
+ Then , 2,702 genes were selected as being well conserved orthologs ( S2 Table ) , and these were used to estimate the synonymous ( dS ) and non-synonymous ( dN ) substitution rates based on multiple sequence alignment . 
+ Genes among these were defined as H-NS bound if their coding regions overlapped with H-NS-bound regions identified in at least one of the SE15 , SE11 , and K-12 strains as determined by ChAP-seq analysis . 
+ As expected , dS was higher than dN for the orthologous genes regardless of H-NS binding ( see sequence diversity scales of Fig 2A and 2B ) , whereas dN in the H-NS-bound genes tended to be higher than that in the H-NS-unbound genes ( Fig 2A ; p < 0.001 , Wilcoxon rank-sum test ) . 
+ In contrast , the dS between H-NS-bound and - unbound genes was not significantly different ( Fig 2B ; p = 0.08 ) . 
+ These observations indicated that the non-synonymous sites in the H-NS-bound genes evolved faster than those in the H-NS-unbound genes . 
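+ For readers unfamiliar with the Wilcoxon rank-sum test used for these comparisons, the sketch below computes the statistic with a normal approximation. It is an assumption-laden illustration: it omits tie and continuity corrections, the function name and input values are hypothetical, and it is not the software used in the study.

```python
# Illustrative sketch only; a statistics package would normally be used instead.
import math


def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank-sum (Mann-Whitney U) test.

    Uses average ranks for ties and a normal approximation without tie or
    continuity correction. Returns (U for x, z score, two-sided p-value).
    """
    pooled = sorted((value, group) for group, sample in enumerate((x, y)) for value in sample)
    rank_sum_x = 0.0
    i = 0
    n = len(pooled)
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        average_rank = (i + 1 + j) / 2  # mean of the 1-based ranks i+1 .. j
        rank_sum_x += average_rank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j
    n1, n2 = len(x), len(y)
    u = rank_sum_x - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mean_u) / sd_u
    p = math.erfc(abs(z) / math.sqrt(2))
    return u, z, p


if __name__ == "__main__":
    # Made-up dN values for unbound and bound genes.
    unbound = [0.010, 0.012, 0.015, 0.011, 0.009]
    bound = [0.020, 0.018, 0.025, 0.030, 0.016]
    print(rank_sum_test(unbound, bound))
```

+ Because the test compares ranks rather than raw values, it does not assume that dN or dS is normally distributed.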
+ Because H-NS preferentially binds to horizontally transferred genes ( HTGs ) [ 3,6,7,27 ] , this apparent faster evolution of non-synonymous sites in H-NS-bound genes could simply reflect the rapid evolution of genes recently transferred to host cells , which was indicated in the Bacillus cereus group [ 38 ] and E. coli lineages [ 39 ] . 
+ To separate the effects of H-NS binding and horizontal transfer , orthologous genes were classified as HTGs ( genes predicted to be horizontally transferred in at least one previous study [ 40 -- 42 ] ) or Core genes ( all other genes ) , and the tendency of dS and dN in each class of H-NS-bound and - unbound genes was evaluated . 
+ The dN of HTGs with or without H-NS binding was greater than that of Core genes ( Fig 2C ) , which is consistent with previous observations [ 38,39 ] . 
+ In addition , dN of H-NS-bound Core genes was greater than that of H-NS-unbound Core genes ( Fig 2C Core genes ; p < 0.001 ) . 
+ Furthermore , dS of H-NS-bound Core genes was also greater than that of H-NS-unbound Core genes ; this difference in dS was smaller than that of dN , but statistically significant ( Fig 2D Core genes ; p = 0.0072 ) . 
+ These results indicated that the non-synonymous and synonymous sites in H-NS-bound Core genes evolve faster than those in H-NS-unbound Core genes in the E. coli lineage . 
+ In contrast , dN of H-NS-bound and - unbound HTGs indicated no significant difference ( Fig 2C HTGs ; p = 0.097 ) . 
+ However , the variance of dN of H-NS-bound HTGs and that of H-NS-unbound HTGs were significantly different ( Fig 2C HTGs ; p = 0.010 , Levene 's test ) . 
+ As shown in Fig 2C , the 75th percentile of dN for H-NS-bound HTGs was shifted upward compared with that for H-NS-unbound HTGs ( Fig 2C HTGs ; compare the height of the upper edges in boxes and whiskers for H-NS-bound [ red ] and - unbound HTGs [ gray ] ) , suggesting that dN of a certain fraction of H-NS-bound HTGs tended to be greater than that of H-NS-unbound HTGs . 
+ These results suggested that the observed larger dN for H-NS-bound regions did not result only from the tendency of HTGs to evolve rapidly . 
+ H-NS-bound Core genes may have been horizontally transferred in ancient ancestors of E. coli
+ To characterize H-NS-bound Core genes , we investigated the conservation of each class of genes in proteobacteria classified into the same family , the same class , or the same phylum with E. coli , using the ortholog table acquired from the Microbial Genome Database for Comparative Analysis ( MBGD ) [ 43 -- 46 ] . 
+ The results indicated that H-NS-bound Core genes have been less conserved in proteobacteria than H-NS-unbound Core genes , but more conserved than H-NS-bound HTGs ( Fig 2E ) . 
+ This result suggested that H-NS-bound Core genes were acquired by ancient ancestors of E. coli . 
+ In contrast , the conservation of H-NS-bound HTGs was lowest in bacteria belonging to the same family as E. coli , suggesting that the genes were more recently acquired by ancestors of E. coli . 
+ To evaluate whether the adaptation of H-NS-bound Core genes to host cells could be assessed based on gene expression level , quantitative RNA-seq data [ 47 ] were analyzed ( Fig 2F ) . 
+ This analysis revealed that the expression of both H-NS-bound and - unbound Core genes was greater than that of H-NS-bound and - unbound HTGs , respectively ( Fig 2F ; p < 0.001 ) . 
+ This suggested that H-NS-bound Core genes have adapted to host cells . 
+ However , the expression level of H-NS-bound Core genes tended to be lower than that of H-NS-unbound Core genes ( Fig 2F ; p < 0.001 ) . 
+ Interestingly , a previous analysis indicated that cellular protein level , rather than functional category , essentiality , or metabolic cost of a protein 's amino acid composition , has been the principal driving force constraining non-synonymous substitutions [ 48 ] . 
+ Therefore , one possible explanation for the tendency of a higher dN in the H-NS-bound Core genes than in H-NS-unbound Core genes might be the H-NS-mediated transcriptional repression of H-NS-bound Core genes . 
+ The H-NS-bound intergenic regions evolve faster than the H-NS-unbound intergenic regions
+ To investigate the relationship between H-NS binding and the evolution of the intergenic regions , we compared sequence diversity between the H-NS-bound and - unbound intergenic regions . 
+ To avoid spurious alignments of the intergenic regions caused by recombination , insertion , or deletion , we selected the `` conserved '' intergenic regions , i.e. , those that were 10 -- 300 bp and were located between two neighbouring orthologous genes in E. coli strains . 
+ In addition , after the multiple alignment of each conserved intergenic region , if there was a difference of 10 % in the length of the aligned sequence with at least one strain , the region was considered as a region with an insertion/deletion and it was removed from the set of `` conserved '' intergenic regions . 
+ Furthermore , after the likelihood phylogenetic analysis , the intergenic regions that showed too large an evolutionary distance for accurate alignment ( evolutionary distance > 1.0 ) were removed from the analysis . 
+ Ultimately , 703 intergenic regions , which included 94 H-NS-bound intergenic regions , were selected for the purpose of calculating sequence diversity ( S3 Table ) . 
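+ The three filters described above ( length of 10 -- 300 bp , no insertion/deletion of 10 % or more , evolutionary distance of at most 1.0 ) can be sketched as a single predicate. The text does not fully specify how the 10 % length difference was measured, so the comparison of each strain's ungapped length against the aligned length is an assumption, as are the function and parameter names.

```python
# Illustrative sketch only; the exact indel criterion is an assumption.

def is_conserved_intergenic(lengths, aligned_length, evolutionary_distance,
                            min_len=10, max_len=300,
                            max_length_diff=0.10, max_distance=1.0):
    """Apply the three filters to one intergenic region.

    lengths: ungapped length of the region in each strain (bp).
    aligned_length: length of the multiple alignment, gaps included (bp).
    evolutionary_distance: distance from the likelihood phylogenetic analysis.
    """
    # Filter 1: every strain's region must be 10-300 bp long.
    if not all(min_len <= n <= max_len for n in lengths):
        return False
    # Filter 2 (assumed reading): a strain whose sequence is at least 10 %
    # shorter than the alignment indicates an insertion/deletion.
    if any((aligned_length - n) / aligned_length >= max_length_diff for n in lengths):
        return False
    # Filter 3: regions with distance > 1.0 are too diverged to align reliably.
    return evolutionary_distance <= max_distance


if __name__ == "__main__":
    # Made-up values for a region present in three strains.
    print(is_conserved_intergenic([120, 118, 121], 122, 0.3))
```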
+ The results indicated that sequence diversity in H-NS-bound intergenic regions tended to be higher than in H-NS-unbound intergenic regions ( Fig 3A ; p < 0.001 ) , suggesting that the H-NS-bound intergenic regions have evolved faster than the H-NS-unbound intergenic regions . 
+ Greater sequence diversity of H-NS-bound intergenic regions is observed only in intergenic regions upstream of genes
+ In general , H-NS functions as a transcriptional repressor in E. coli [ 8 ] . 
+ We investigated whether the higher sequence diversification in the H-NS-bound intergenic regions is related to the regulation of gene expression . 
+ We categorized the intergenic regions into two classes ( Fig 3B ) based on the assumption that the regulatory elements for transcription ( i.e. , promoters and binding sites of transcriptional regulators ) are more frequently present upstream of genes than downstream of genes . 
+ Class I was defined as the region sandwiched between the tails ( 3 ' ends ) of two convergently transcribed genes , representing the non-regulatory intergenic region ( Fig 3B ) ; class II included two subtypes , namely the region sandwiched between the heads ( 5 ' ends ) of two divergently transcribed genes ( head-to-head region ) or that between the tail and the head of two genes ( tail-to-head region ) , representing the regulatory intergenic regions ( Fig 3B ) . 
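+ The class I / class II definition can be expressed in terms of the strands of the two flanking genes. The sketch below assumes that genes are listed in order of increasing chromosome coordinate and that strands are encoded as "+" and "-" ; the function name and return labels are hypothetical.

```python
# Illustrative sketch only; strand encoding and labels are assumptions.

def classify_intergenic(left_strand, right_strand):
    """Classify the region between two neighbouring genes.

    left_strand / right_strand: "+" or "-" for the gene on the lower- and
    higher-coordinate side of the intergenic region, respectively.
    """
    if left_strand not in "+-" or right_strand not in "+-" or not left_strand or not right_strand:
        raise ValueError("strand must be '+' or '-'")
    if left_strand == "+" and right_strand == "-":
        # Convergent genes: the region lies between two 3' ends (tails).
        return ("class I", "tail-to-tail")
    if left_strand == "-" and right_strand == "+":
        # Divergent genes: the region lies between two 5' ends (heads).
        return ("class II", "head-to-head")
    # Same orientation: the region lies downstream of one gene and upstream of the other.
    return ("class II", "tail-to-head")


if __name__ == "__main__":
    for pair in [("+", "-"), ("-", "+"), ("+", "+"), ("-", "-")]:
        print(pair, classify_intergenic(*pair))
```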
+ Then , we compared the sequence diversification between the H-NS-bound and - unbound regions in each class . 
+ The sequence diversity of the class I regions tended to be greater than that of the class II regions ( Fig 3C ; p < 0.001 ) . 
+ In addition , there was no significant difference in sequence diversity between the H-NS-bound and - unbound class I regions ( Fig 3D , class I ; p = 0.29 ) . 
+ In contrast , the sequence diversity in the H-NS-bound regions tended to be greater than in the H-NS-unbound regions within the class II regions ( Fig 3D , class II ; p < 0.001 ) . 
+ These results suggested that the regulatory intergenic regions have evolved slower than non-regulatory intergenic regions , whereas the H-NS-bound regions have evolved faster than the H-NS-unbound regions among the regulatory intergenic regions . 
+ In addition , we extracted the horizontally transferred intergenic regions ( HTG-intergenic ) sandwiched by HTGs and core intergenic regions ( Core-intergenic ) sandwiched by Core genes , respectively , from the class II intergenic regions to evaluate any difference in the effects of H-NS binding on sequence diversification of HTG - and Core-intergenics . 
+ To avoid mixing the Core-intergenic and HTG-intergenic characteristics , which might have occurred in the intergenic regions between Core genes and HTGs , we used the intergenic regions that were uniquely sandwiched only by HTGs or Core genes , as `` HTG-intergenic '' or `` Core-intergenic '' , respectively . 
+ The sequence substitution rates for H-NS-bound HTG-intergenic regions were higher than those for H-NS-unbound HTG-intergenic regions ( Fig 3E ; p = 0.031 ) . 
+ This tendency was also observed in Core-intergenics ( Fig 3E ; p < 0.001 ) . 
+ We thus concluded that the higher sequence substitution rates of H-NS-bound class II intergenic regions could not be explained exclusively by the rapid adaptation of the regulatory regions of recent HTGs . 
+ Evaluation of the effects of sequence substitutions on transcriptional regulation in H-NS-bound regions
+ Our analysis indicated that the sequence substitution rate of H-NS-bound regulatory regions was higher than that of H-NS-unbound regulatory regions . 
+ We hypothesized that these sequence substitutions in H-NS-bound transcriptional regulatory regions could alter the expression of HTGs . 
+ To test this , we selected one of the H-NS-bound HTGs , namely ybdO , which has a large number of sequence substitutions in the upstream intergenic and coding regions ( within the rank of top 50 for sequence substitution rate in the class II and coding regions , S1 Fig ) , and seems to be a single cistron in strains SE11 , SE15 , and K-12 . 
+ In addition , the H-NS binding profile encompassing the upstream and/or coding regions of ybdO was highly conserved among strains SE11 , SE15 , and K-12 , suggesting that H-NS represses ybdO expression in these strains ( S2A Fig ) . 
+ Thus , the effects of sequence substitutions within ybdO on its transcriptional regulation were examined . 
+ Although the transcription start site of ybdO in K-12 was recently identified [ 49 ] ( Fig 4A ) , the transcriptional regulation of ybdO has not been thoroughly investigated . 
+ We , therefore , identified transcriptional regulatory elements for ybdO . 
+ We systematically constructed ybdO-lac operon fusions on the low-copy-number plasmid , pRW50 [ 50 ] , by inserting DNA segments containing upstream intergenic regions and the 5 ' - proximal coding region of ybdO or its deleted derivatives ( Fig 4A ) . 
+ The activities of the ybdO promoters from different E. coli strains were monitored using the recombinant pRW50 plasmids introduced into the E. coli K-12 wildtype and the hns mutant strains . 
+ The presence of the Shine-Dalgarno sequence for the lac operon on the plasmids implies that the β-galactosidase activity of transformants represented the transcriptional activity of the particular DNA segment inserted into pRW50 . 
+ First , we examined the β-galactosidase activity for ybdO promoters from SE11 , SE15 , and K-12 in cloned L2 fragments , which contained the region from -- 250 bp to +239 bp ( Fig 4A , L2 , nucleotide positions are relative to the first nucleotide of the initiation codon [ +1 ] of K-12 ybdO ) in growing cells . 
+ We found that transcription from the ybdO promoters was maximally induced from the early stationary phase in LB medium ( S2B Fig ) . 
+ In addition , ybdO transcription in all strains was higher in the hns mutant cells compared with wild-type cells ( S2B Fig ) , suggesting that H-NS repressed ybdO transcription in all strains . 
+ We also determined transcription start sites of ybdO in SE11 and SE15 during the early stationary phase using 5 ' - RACE as described in Materials and Methods . 
+ The 5 ' ends of the SE11 and SE15 ybdO mRNAs were mapped 1 bp downstream of the transcription start site of K-12 ybdO ( S3A and S3B Fig ) , which is located 107 bp upstream of the initiation codon of ybdO [ 49 ] ( S3B Fig ) . 
+ The results suggested that the promoters of ybdO in the three strains overlap ( S3B Fig ; putative -- 10 element is indicated by a red line ) . 
+ We then looked closely at regions both upstream and downstream of the ybdO promoter , which revealed a number of sequence substitutions in the promoter proximal region among E. coli strains ( S1B Fig ) . 
+ We also determined the elements necessary for H-NS dependent repression by comparing the activities of ybdO-lac operon fusions with systematic deletions in the wild-type and hns mutant strains . 
+ The results indicated that deletions of two specific regions , namely upstream ( from -- 250 to -- 176 bp ) and downstream ( from +27 to +164 bp ) of the region of the genome surrounding the ybdO promoter in SE11 , SE15 , and K-12 , enhanced β-galactosidase activity in the wild-type cells ( Fig 4B -- 4D , compare blue bars of L2 and L3 , R2 and R1 ) . 
+ In addition , comparison of the transcriptional activities of the L3 and R1 fragments in wild-type cells with those in the hns mutant showed that H-NS-mediated repression was abolished or weakened for these fragments ( Fig 4B -- 4D , compare blue bars with red bars in L3 and R1 ) , indicating that there are H-NS-dependent negative transcriptional regulatory elements in the deleted regions . 
+ We concluded that H-NS represses ybdO expression dependent on these two specific regions -- upstream and downstream regulatory regions ( URE and DRE ) -- which are in the same location in each of the three E. coli strains ( Fig 4A , bottom of the panel , H-NS-dependent regions ) . 
+ URE and DRE are required for H-NS-mediated repression of the bgl and proU operons and repression via URE and DRE is synergistic in both operons [ 51 ] . 
+ H-NS may bind both URE and DRE to form a bridge and a stable nucleoprotein complex with consequent spreading of H-NS binding away from the high-affinity H-NS binding sites [ 51 ] . 
+ The URE and DRE of ybdO may also function in a manner similar to that of the URE and DRE for the bgl and proU operons with respect to the effect of H-NS binding . 
+ The β-galactosidase assays of the systematic deletions surrounding the ybdO promoter also indicated that there are sequences involved in repression of promoter activity independently of H-NS . 
+ The β-galactosidase activity for fragment R2 of SE15 was greater than that for fragment F of SE15 in the hns mutant cells ( Fig 4C , compare red bars in R2 and F ) , suggesting that the region from +164 bp to +239 bp is sufficient to reduce ybdO transcription independent of H-NS in SE15 . 
+ In contrast , in the case of SE11 and K-12 , deletion of the same region did not increase the activity for fragment F in the hns mutant cells ( Fig 4B and 4D , compare red bars in R2 and F ) . 
+ Rather , deletion of the region from +27 bp to +164 bp ( fragment R1 lacking +27 bp to +164 bp in fragment R2 and +27 bp to +239 bp in fragment F ) increased β-galactosidase activity ( Fig 4B and 4D , compare red bars in R1 and F ) , suggesting that this region reduces ybdO transcription independent of H-NS in SE11 and K-12 . 
+ These results indicated that there are H-NS-independent transcriptional regulatory elements that reduce ybdO transcription , and that the location of these elements differs in the ybdO loci of SE11 and K-12 , and SE15 ; these elements were designated as negative elements ( NE , Fig 4A , bottom panel ) . 
+ The β-galactosidase activity for the longest DNA fragment , F , of SE15 was ~ 2-fold higher than that for SE11 and K-12 in the hns mutant cells ( Fig 4B , 4C and 4D , compare red bars for F of Fig 4C to those of Fig 4B and 4D ) , whereas the fragment LR , lacking negative elements ( URE , DRE and NE ) , showed similar β-galactosidase levels amongst all strains ( Fig 4B , 4C and 4D , compare red and blue bars for LR of Fig 4B or 4C to those of Fig 4D ) . 
+ This suggested that the NEs of SE15 differ from those of SE11 and K-12 not only in their locations but also in their ability to reduce transcription . 
+ To confirm the different effects of NEs on the promoter activity , we constructed hybrid DNA fragments of the upstream and downstream regions of ybdO promoter for SE11 and SE15 ( Fig 4E ) . 
+ As seen in Fig 4F , the transcription for all hybrid fragments containing the SE11 coding region ( Fig 4F , red bars in lanes a -- d ) tended to be lower than that for all hybrid fragments containing the SE15 coding region in the hns mutant cells ( Fig 4F , red bars in lanes e -- h ) . 
+ We thus concluded that the diversity of ybdO transcription between SE11 and SE15 is a consequence of sequence divergence downstream of the ybdO promoter , including NEs . 
+ Discussion
+ In this analysis , we determined that H-NS-bound regions in the E. coli genome have been highly conserved during E. coli evolution . 
+ This is supported by the previous finding that H-NS-bound genes are conserved in Salmonella and E. coli [ 19 ] . 
+ Phylogenetic analysis indicated that the sequence diversity in H-NS-bound regions tended to be greater than that in H-NS-unbound regions . 
+ This tendency was limited to the regulatory intergenic regions ( upstream of genes ) and coding regions , in which transcriptional regulatory elements often exist . 
+ These findings suggest that H-NS-bound regulatory regions are much freer to evolve than H-NS-unbound regulatory regions because H-NS-mediated repression of genes would reduce the negative impact of sequence substitutions for instances in which such substitutions result in altered expression and/or function of genes that are toxic to host cells . 
+ We have also evaluated whether sequence diversity in H-NS-bound regions contributes to variations in transcription using ybdO as a test gene . 
+ The results indicate that transcription of ybdO differs among E. coli strains and that ybdO expression is repressed by H-NS in wild-type E. coli . 
+ This observation supports our hypothesis that sequence substitutions in H-NS-bound regions contribute to the observed diversity of transcriptional regulation of H-NS-bound genes among E. coli strains , which may provide E. coli strains the opportunity to adapt to new habitats by integrating HTGs . 
+ Interestingly , the H-NS-bound orthologous genes located within the `` common '' segments among SE11 , SE15 , and K-12 significantly overlapped with HTGs ( p < 0.001 , Fisher 's exact test ; S2 Table ) , which were predicted as HTGs based on at least one prediction method [ 40 -- 42 ] . 
+ We have also shown that , in proteobacteria , H-NS-bound Core genes were less conserved than H-NS-unbound Core genes ( Fig 2E ) , suggesting that the H-NS-bound Core genes tend to be genes acquired by ancestors of E. coli . 
+ These observations suggest that H-NS-bound genes located within the `` common '' segments were horizontally transferred into the ancestors of E. coli , and these genes persist in contemporary E. coli strains . 
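+ The overlap test reported above can be illustrated with a small Python sketch . It computes a one-sided Fisher 's exact p-value from the hypergeometric distribution . The function name and the one-sided choice are assumptions made for illustration ( the text does not state the sidedness ) , and no gene counts from the study are used . 

```python
from math import comb


def overlap_p_value(n_both, n_bound, n_htg, n_total):
    """One-sided Fisher's exact test for enrichment of an overlap.

    Returns P(X >= n_both), where X is the number of H-NS-bound genes
    that would also be HTGs if n_htg genes were drawn at random from
    n_total genes, n_bound of which are H-NS-bound.
    All argument names are hypothetical, chosen for this sketch.
    """
    max_overlap = min(n_bound, n_htg)
    total_ways = comb(n_total, n_htg)
    # Sum the hypergeometric probabilities of the observed overlap
    # and of every more extreme (larger) overlap.
    tail = sum(
        comb(n_bound, k) * comb(n_total - n_bound, n_htg - k)
        for k in range(n_both, max_overlap + 1)
    )
    return tail / total_ways
```

+ A small p-value from such a test indicates that the two gene sets share more members than expected by chance . 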
+ Our analysis reveals that the tendency for greater sequence divergence of H-NS-bound intergenic regions compared with those in H-NS-unbound intergenic regions has been limited to regions upstream of genes ( class II intergenic regions ) . 
+ This relative greater sequence diversity of H-NS-bound intergenic regions was observed in both types of intergenic regions : HTG-intergenic regions sandwiched by HTGs , and Core-intergenic regions sandwiched by Core genes ( Fig 3E ) . 
+ Therefore , the relatively greater sequence diversity in the H-NS-bound class II intergenic regions can not be explained only by the rapid adaptation of horizontally transferred DNAs to host cells . 
+ It is likely that , compared with H-NS-unbound class II intergenic regions , H-NS has made H-NS-bound class II intergenic regions much freer to evolve by repressing the expression of HTGs . 
+ It was difficult to clearly determine the contribution of H-NS binding to the observed greater dN values calculated for H-NS-bound genes . 
+ We found that the dN values for H-NS-bound Core genes were significantly greater than those for H-NS-unbound Core genes . 
+ This can be simply explained by the apparently slower evolution of the H-NS-unbound Core genes because these include many essential genes , including `` information '' proteins , e.g. , translation-related proteins that have evolved at a significantly slower rate compared with metabolic proteins including those encoded by HTGs [ 48 ] , and H-NS-bound Core genes may have been horizontally transferred in ancient ancestors of E. coli . 
+ Interestingly , we found that the dS values for H-NS-bound Core genes were also greater than those of H-NS-unbound Core genes ( Fig 2D , Core genes ) . 
+ In addition , the expression of H-NS-bound Core genes tended to be lower than that of H-NS-unbound Core genes ( Fig 2F , Core genes ) . 
+ It is known that the dN and dS values for low-expression genes are greater than those of high-expression genes [ 48 ] . 
+ Therefore , H-NS-mediated repression may increase the sequence diversification of H-NS-bound genes by reducing the expression of H-NS-bound genes . 
+ Furthermore , there are H-NS-bound HTGs that have a greater dN than many H-NS-unbound HTGs ( Fig 2C ) . 
+ Taken together , our results suggest that H-NS-mediated repression contributes , at least partially , to the observed higher rate of sequence substitution in H-NS-bound coding regions compared with H-NS-unbound coding regions . 
+ Recent work indicated that the average mutation rate in regions bound by one of four E. coli nucleoid association proteins ( NAPs ) , H-NS , Fis , IHF-A , IHF-B , in the E. coli genome , is lower than that of NAP-unbound regions [ 52 ] . 
+ In contrast to the analysis by Warnecke et al. , our analysis indicated that the rate of sequence substitution in H-NS-bound regions was higher than that of H-NS-unbound regions . 
+ In our analysis , the effects of H-NS binding were limited to class II intergenic regions and coding regions , while Warnecke et al. reported an average of sequence substitutions at four-fold degenerate ( synonymous ) sites in coding regions and in intergenic regions [ 52 ] . 
+ Therefore , the apparent discrepancy between our results and those of Warnecke et al. may be a consequence of differences in the genes and protein binding regions used for the two analyses . 
+ We also evaluated whether the sequence diversity in H-NS-bound regions could alter transcription of the affected genes . 
+ This indeed was the case for at least one of the H-NS-bound genes , namely ybdO . 
+ We identified H-NS-independent NEs in the coding regions of ybdO , whose locations and activities differed among E. coli strains ( Fig 4A ) . 
+ Although further analyses are needed to reveal the molecular mechanism by which an NE inhibits ybdO transcription , our results suggest that sequence substitutions downstream of ybdO promoters , including NE , dictate the ybdO transcription level . 
+ Recently , hundreds to ~ 20,000 RNA polymerase ( RNAP ) pause sites were identified in exponentially growing E. coli cells , and it was suggested that RNAP pausing is one of the common mechanisms by which gene expression is controlled [ 53 -- 55 ] . 
+ It is difficult to directly evaluate the possibility that RNAP will pause at NEs based on the data from those studies because ybdO expression remained low in exponentially growing cells . 
+ Nevertheless , differential pausing of the transcription machinery at NE sites constitutes one possible explanation for the observed variation in NE potency among E. coli strains . 
+ The assignment of transcription start sites for ybdO in SE11 , SE15 , and K-12 indicated that the location of the ybdO promoter is conserved among E. coli strains , although we found that the nucleotide sequences in ybdO promoter proximal regions were different ( S3B Fig ) . 
+ Although we could not find any typical transcriptional regulator that recognizes sequences affected by substitutions near the ybdO promoter , such substitutions would provide the opportunity to acquire positive regulation because it has been shown that , during evolution , HTGs acquired positive regulation when they became integrated in the host transcriptional network [ 25 ] . 
+ Because HTGs have contributed to the evolution of host-cell metabolic networks that allow adaptation to new environments [ 56 ] , further investigation of ybdO transcriptional regulation under different growth conditions , e.g. , in minimal medium , will be needed to clearly define the effects of sequence substitutions on ybdO promoter function . 
+ In our present study , the β-galactosidase assay did not allow us to directly evaluate whether H-NS-mediated repression is crucial for introducing sequence substitutions that alter the transcriptional regulation of HTGs . 
+ It is possible that H-NS directly enhances the sequence substitution rate in class II intergenic regions and coding regions by unknown mechanisms . 
+ To delineate the importance of H-NS-mediated repression in the evolution of the transcriptional regulation , further investigations must directly evaluate the relationship between transcriptional repression and sequence substitutions , i.e. , in vitro evolution experiments using the hns deletion mutant . 
+ It has been reported that variance in gene expression contributes to the heterogeneity of E. coli strains , which could potentiate the ability of E. coli strains to adapt to new ecological niches . 
+ The mat ( meningitis-associated and temperature regulated ) fimbrial gene cluster is conserved across many E. coli strains [ 57 ] . 
+ However , B2 group strains have acquired the ability to express mat genes despite H-NS-mediated repression at low temperature , low pH , and high acetate concentration , conditions under which mat is not expressed in strains of groups A and B1 [ 57 ] . 
+ Differences in mat regulation among E. coli strains are caused by polymorphisms in gene promoters repressed by H-NS [ 57 ] . 
+ Thus , mat and ybdO might exemplify the biological importance of sequence diversity in H-NS-bound regions for adaptation of E. coli strains to different ecological niches . 
+ Based on our observations , we hypothesize that H-NS-mediated repression helps HTGs to adapt their transcriptional regulation to the local environment for host E. coli strains by accelerating the rate of sequence polymorphism in H-NS-bound regulatory regions . 
+ This hypothesis is supported by the finding that the optimization of HTG expression was initially found to occur via the evolution of regulatory regions rather than coding regions [ 58 ] . 
+ Our results support the proposal that H-NS-mediated repression is a valuable mechanism by which host cells can integrate HTGs into the host transcriptional regulatory network [ 3 ] . 
+ The primers used in this study are listed in S4 Table . 
+ Construction of strains used for ChAP-seq experiments and the β-galactosidase assay
+ Strains used in this study are listed in S5 Table . 
+ To generate the K-12 ( W3110 ) derivative expressing H-NS C-terminally tagged with 12 histidines ( 12His ) , we used a modified one-step gene inactivation method [ 59 ] . 
+ Plasmid pSTV28-C-12His , which was kindly provided by Dr. Mika Yoshimura , was constructed by inserting the chemically synthesized 12His coding sequence and a kanamycin resistance gene derived from plasmid pKD4 [ 59 ] into the multiple cloning site of pSTV28 ( Takara Bio , Japan ) . 
+ We amplified a DNA fragment containing the 12His sequence flanked with the Arg-Gly-Ser linker and kanamycin resistance gene by PCR using pSTV28-C-12His and the TOP705-TOP706 primer set . 
+ To facilitate insertion of the PCR product into the chromosome , we added a ~ 70-bp sequence of the hns coding region and its downstream region to the TOP705 and TOP706 primer sequences , respectively . 
+ The BW25113 cells harboring pKD46 encoding Red recombinase [ 59 ] were transformed with the amplified DNA fragment , and transformants in which linker and 12His sequences were inserted at the 3 ' end of the chromosomal hns through a double-crossover at the coding and downstream regions of hns , were selected with kanamycin to obtain the K-12 ( BW25113 ) H-NS-12His strain . 
+ hns fused with the 12His sequence was transferred into the K-12 ( W3110 ) chromosome , together with the kanamycin resistance gene , via phage P1 transduction . 
+ Because the SE11 and SE15 strains are resistant to P1 , to construct the derivatives expressing 12His-tagged H-NS , we adopted the gene-doctoring method [ 60 ] using plasmid pDEX harboring an I-SceI recognition site and sucB and pACBSR harboring I-SceI and the kanamycin resistance gene [ 61 ] . 
+ The 12His coding sequence and kanamycin resistance gene in pSTV28-C-12His were amplified by PCR using primers hns-His12-H1 and hns-His12-H2-1 ( for SE11 ) or primers hns-His12-H1 and hns-His12-H2-2 ( for SE15 ) . 
+ Amplified fragments were inserted into the EcoRV site of pDEX . 
+ SE11 and SE15 were co-transformed with two plasmids -- pACBSR and the appropriate pDEX-H-NS-His12 -- with subsequent selection for kanamycin and sucrose resistance . 
+ Transformants were cultured in LB liquid medium containing 25 μg / ml chloramphenicol and 0.2 % arabinose for a few hours , inducing inactivation of pDEX-H-NS-His12 by I-SceI . 
+ Cells were harvested by centrifugation and regrown in LB liquid medium containing 5 % sucrose at 30 °C for 2 h to cure pACBSR . 
+ Finally , kanamycin - and sucrose-resistant colonies were selected on an LB plate containing 50 μg / ml kanamycin and 5 % sucrose to isolate transformants in which the 12His sequence and kanamycin resistance gene were integrated into the chromosome via homologous recombination at the hns coding sequence and sequences downstream of hns introduced at the 5 ' and 3 ' ends of the PCR products , respectively . 
+ Expression of H-NS-12His in the created strains was confirmed by western blotting using an antibody against His tag ( MBL , Japan ) . 
+ Sequencing of the introduced hns tagged with 12His revealed a point mutation within the hns coding region in the K-12 derivative , probably attributable to an error during synthesis of the primer used to generate the strain . 
+ Because the identified point mutation ( from AAG [ 136K ] to AAA [ 136K ] ) did not lead to an amino acid substitution in H-NS , the strain was employed for further analysis . 
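+ The reasoning above , that AAG and AAA both encode lysine ( K ) , can be checked with a short Python sketch of the standard genetic code . The table construction and the helper name is_synonymous are illustrative assumptions , not part of the study 's methods . 

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order
# (first base varies slowest, third base fastest).
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def is_synonymous(codon_a, codon_b):
    """Return True if both DNA codons encode the same amino acid."""
    return CODON_TABLE[codon_a.upper()] == CODON_TABLE[codon_b.upper()]
```

+ Calling is_synonymous ( " AAG " , " AAA " ) returns True , which is consistent with the substitution leaving residue 136 unchanged . 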
+ Notably , the C-terminal 12His tag did not negatively affect the growth of K-12 , SE11 , and SE15 in Luria-Bertani ( LB ) medium under aerobic conditions . 
+ The hns deletion mutant ( MC4100 Δhns : : Km ) used in the β-galactosidase assay was constructed using P1 transduction of the hns : : km allele from K-12 ( W3110 ) hns : : km [ 62 ] into MC4100 . 
+ ChAP-seq experiments
+ ChAP was performed according to the reported procedure [ 34 ] using 50-ml cultures of E. coli grown in LB medium under aerobic conditions at 37 °C . 
+ DNA fragments that co-purified with H-NS-12His and in the supernatant fraction before ChAP were sequenced using the Illumina GA sequencer ( Illumina , USA ) . 
+ We performed ChAP-seq experiments twice for each strain , and 36-bp single-end reads provided 8 -- 11 million reads ( first set of sequencing results of ChAP and WCE fractions of three strains ) and 5 -- 10 million reads ( second set ) . 
+ The sequence data used in this publication have been deposited in the DRA database ( DDBJ Sequence Read Archive : http://trace.ddbj.nig.ac.jp/dra/index_e.shtml ) with accession number : DRA000539 . 
+ Multiple alignment of genome sequences of the three strains
+ Complete sequences and annotations of genes in the three genomes ( SE11 [ AP009240.1 ] , SE15 [ AP009378.1 ] , and K-12 [ W3110 ; AP009048.1 ] ) were obtained from the NCBI GenBank database . 
+ We compared the three chromosome sequences and their synteny of gene arrangement using the Mauve 2.3.1 program for Progressive Mauve algorithm with default parameters [ 35,36 ] and determined the segments that were conserved in all three strains ( `` common '' ) and unique to two ( `` shared '' ) or one ( `` specific '' ) strain ( s ) . 
+ The K-12 ( W3110 ) chromosome contains a large inverted region ( ~ 800 kbp ) surrounded by two ribosomal operons ( 3,423,096 -- 4,216,800 bp ) . 
+ To avoid eliminating this region from `` common '' segments by the above analysis , we manually reversed this region in the chromosome sequence of K-12 ( W3110 ) before alignment using the Mauve program . 
+ The sum of the consensus sequences of `` common '' segments was 3,888,365 bp . 
+ However , the DNA sequences of `` common '' segments in each strain occasionally had small gaps compared with the consensus `` common '' segments of all three strains . 
+ Thus , the total length of the `` common '' segment in each strain was shorter than that of the consensus segments , specifically , SE11 : 3,886,369 bp , SE15 : 3,886,157 bp , K-12 ( W3110 ) : 3,886,242 bp . 
+ Short reads mapping , normalization of mapped reads , and estimation of H-NS binding intensities for each nucleotide 
+ Short reads ( 36 bp ) obtained from the Illumina GA sequencer were uniquely mapped on to the reference genome sequences of K-12 ( W3110 ) , SE11 , and SE15 , allowing no gaps and up to two mismatches using the BLAT program [ 63 ] . 
+ Because the purpose of this study was to compare the DNA binding profiles of H-NS in these three strains , we mapped the short reads only on the chromosome in each strain . 
+ Uniquely aligned reads were specifically used for further analysis . 
+ In addition , because it is impossible to specifically map 36-bp reads to one of seven rRNA genes in the E. coli genome and the rRNA genes were not used for the phylogenetic analysis , rRNA coding regions were not included in this study . 
+ Next , mapped reads were extended to 200 bp in length from the 3 ' end of each read , taking into account the length of DNA fragments to construct the sequence library . 
+ We subsequently normalized the number of mapped reads at every nucleotide in each experiment by global scaling , in which the number of mapped reads at each nucleotide was divided by the median number of mapped reads at all nucleotides in each sample . 
+ Finally , to estimate the H-NS binding intensity at every nucleotide , we divided the scaled number of mapped reads for DNA from the ChAP fraction by that from WCE before ChAP-mediated purification to remove the effects of sequence preference of the Illumina GA . 
+ In cases where the number of mapped reads at some positions was zero for the ChAP or WCE fraction , the H-NS binding intensity of the position was defined as zero . 
+ As H-NS binding intensity spanned a wide range of values , log10-scaled values were used for subsequent analysis . 
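+ The normalization steps described above can be sketched in Python as follows . This is a minimal illustration , not the authors ' code : the function names are hypothetical , and returning None for the log10 of a zero intensity is an assumption , because the text does not say how zero values were handled on the log scale . 

```python
import math
from statistics import median


def scale_by_median(counts):
    """Global scaling: divide each per-nucleotide count by the sample median."""
    m = median(counts)
    if m == 0:
        raise ValueError("median coverage is zero; cannot scale")
    return [c / m for c in counts]


def binding_intensity(chap_counts, wce_counts):
    """Ratio of scaled ChAP coverage to scaled WCE coverage per nucleotide.

    Positions with zero reads in either fraction get intensity 0,
    as stated in the text.
    """
    chap = scale_by_median(chap_counts)
    wce = scale_by_median(wce_counts)
    return [0.0 if c == 0 or w == 0 else c / w for c, w in zip(chap, wce)]


def log10_intensity(intensity):
    """log10-scale the intensities; zeros map to None (assumption)."""
    return [math.log10(x) if x > 0 else None for x in intensity]
```

+ Dividing by the median rather than the total read count makes the scaling less sensitive to a small number of very highly covered positions . 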
+ To evaluate our normalization procedure in the comparison of different sequencing outputs , the average H-NS binding intensity in 200-bp windows was calculated in 100-bp steps along whole-genome sequences . 
+ Scatter plots shown in S4A Fig demonstrate that correlation coefficients of estimated average H-NS binding intensities in each window obtained in all experiments for each strain were high ( r > 0.8 ) . 
+ In addition , correlation coefficients of the binding intensities of corresponding windows in `` common '' segments of different strains were greater than 0.69 for all combinations ( S4B Fig ) , indicating that our normalization procedure was adequate . 
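+ The window-based evaluation above can be sketched as follows , assuming per-nucleotide intensities are held in a plain Python list . The function names are hypothetical , and the Pearson coefficient is used here as an assumption , since the text reports r without naming the coefficient . 

```python
import math


def window_means(values, window=200, step=100):
    """Mean of `values` in windows of `window` bp taken every `step` bp.

    Only complete windows are returned.
    """
    return [
        sum(values[start:start + window]) / window
        for start in range(0, len(values) - window + 1, step)
    ]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)
```

+ Comparing window_means from two replicate experiments with pearson gives one coefficient per pair of experiments , analogous to the values shown in the scatter plots . 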
+ Determination of the H-NS binding regions
+ H-NS binding intensity showed a bimodal distribution of `` noise '' components at ~ 1.0 , and `` signal '' components , which ranged from 10.0 to 1000.0 ( S5 Fig ) . 
+ In four experiments ( all data from the 1st experiment and K-12 data from the 2nd experiment ) , the bimodal distribution was clear , and noise components could be clearly discriminated from signal components . 
+ In these cases , noise components could be approximated as a normal distribution in which μ represents mode and σ is 0.2 ( S5 Fig ) . 
+ Thus , we set the threshold value to remove noise components as mode + 3σ ( = 0.6 ) . 
+ In the two remaining experiments ( data for SE11 and SE15 in the 2nd experiment ) , noise components were not clearly separable from signal components , and the two possibly overlapped . 
+ However , we referred to the threshold value from other experiments ( mode + 0.6 ) to infer signal components in these cases ( S5 Fig ) . 
+ Next , we searched for regions in which H-NS binding intensity was greater than the threshold . 
+ To remove the effects of the remaining noise signals by our threshold setting , we extracted regions longer than 200 bp as possible H-NS binding sequences . 
+ Finally , we compared the H-NS-bound regions obtained in the two experiments for each strain , and overlapping regions were identified as H-NS-bound regions for further analysis . 
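+ The region-calling steps above can be sketched in Python . With σ = 0.2 , the threshold mode + 3σ equals mode + 0.6 on the log10 scale . The function names are hypothetical , regions are half-open ( start , end ) intervals , and interpreting `` overlapping regions '' as the intersection of the two replicates is an assumption . 

```python
def call_regions(log_intensity, threshold, min_len=200):
    """Return (start, end) runs above `threshold` that are longer than min_len.

    None entries (undefined log10 values) are treated as below threshold.
    """
    regions = []
    start = None
    for i, v in enumerate(log_intensity):
        above = v is not None and v > threshold
        if above and start is None:
            start = i
        elif not above and start is not None:
            if i - start > min_len:
                regions.append((start, i))
            start = None
    # Close a run that extends to the end of the sequence.
    if start is not None and len(log_intensity) - start > min_len:
        regions.append((start, len(log_intensity)))
    return regions


def intersect_regions(a, b):
    """Intersect two sorted, non-overlapping lists of half-open intervals."""
    out = []
    i = j = 0
    while i < len(a) and j < len(b):
        start = max(a[i][0], b[j][0])
        end = min(a[i][1], b[j][1])
        if start < end:
            out.append((start, end))
        # Advance whichever interval ends first.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return out
```

+ For a noise mode of 1.0 , for example , the call would be call_regions ( values , threshold = 1.0 + 0.6 ) for each replicate , followed by intersect_regions on the two results . 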
+ To evaluate the accuracy of our mapping and determination of H-NS-bound regions , we obtained a second mapping of our short reads with a different mapping program , namely Bowtie 2 [ 64 ] , and determined H-NS-bound regions again from the Bowtie 2 mapping results . 
+ Comparison of H-NS-bound regions determined by BLAT mapping ( original result ) and by Bowtie 2 mapping ( second mapping ) indicated that the H-NS-bound regions that were determined with the two mapping procedures were 97 % identical . 
+ This result clearly indicated that our mapping and determination of H-NS-bound regions were highly reliable , and thus we conducted subsequent analyses using the BLAT mapping results for the H-NS-bound regions . 
+ The reproducibility of H-NS binding profiles for the whole genome of each strain ( SE11 , SE15 , K-12 ) is shown in S6 Fig . 
+ In addition , the conservation of H-NS-bound regions in `` common '' segments within each whole genome is presented in S7 Fig . 
+ Phylogenetic analysis of orthologous genes
+ The 44 E. coli strains whose genome sequences had been annotated in RefSeq were used for our phylogenetic analysis ( S1 Table ) . 
+ All chromosome sequences and the annotations of the 44 strains were obtained from the RefSeq ( NCBI Reference Sequence database ) . 
+ Because RefSeq represents reference sequences for which gene annotation is consistent and standardized , it enabled us to precisely identify orthologous genes in the E. coli lineage . 
+ To identify the conserved orthologous genes in the E. coli strains , we initially evaluated the level of conservation of the amino acid sequence translated from each gene . 
+ We carried out all-against-all reciprocal BLASTP comparisons for all proteins in all strains followed by clustering of the BLASTP hits using OrthoMCL [ 65 ] . 
+ To remove genes encoding mobile elements , duplicate genes , and pseudogenes , which have repetitive sequences , and paralogs that interfere with phylogenetic analysis , the proteins encoded by prophage and insertion ( IS ) genes were searched by BLASTP against the ACLAME database [ 66 ] and ISFinder [ 67 ] and excluded from further analysis . 
+ Paralogs and hidden paralogs were also removed from the orthologous proteins by excluding the gene clusters containing more than two copies of the proteins present in one strain . 
+ Then , we selected the 3,107 orthologous proteins ( gene clusters ) that were conserved in > 90 % of strains ( 40 of 44 ) , in which K-12 , SE11 , and SE15 were always included . 
+ From the selected orthologous proteins , the 405 orthologous proteins encoded by genes that had at least one broken codon with one or two nucleotide deletions or insertions in at least one strain were excluded to remove pseudogenes . 
+ Ultimately , 2,702 orthologous protein clusters were selected for subsequent analysis ( S8 Fig ) . 
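+ The cluster selection described above can be sketched as follows . The data layout ( a dictionary mapping each cluster to the proteins found per strain ) and the function name are hypothetical . The defaults follow the text : at least 40 of 44 strains ( about 91 % ) , K-12 , SE11 , and SE15 always present , and clusters with more than two copies in any one strain discarded . The final count also follows from the text : 3,107 - 405 = 2,702 clusters . 

```python
def select_core_clusters(clusters, required_strains, min_strains=40, max_copies=2):
    """Select ortholog clusters that pass the conservation filters.

    `clusters` maps a cluster id to {strain name: [protein ids in that strain]}.
    A cluster is kept when no strain carries more than `max_copies` proteins,
    at least `min_strains` strains are represented, and every strain in
    `required_strains` is present.
    """
    selected = []
    for cluster_id, members in clusters.items():
        if any(len(proteins) > max_copies for proteins in members.values()):
            continue  # likely paralogs
        if len(members) < min_strains:
            continue  # not conserved widely enough
        if not all(strain in members for strain in required_strains):
            continue  # missing one of the three focal strains
        selected.append(cluster_id)
    return selected
```

+ The pseudogene filter ( broken codons caused by one - or two-nucleotide insertions or deletions ) is not included in this sketch . 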
+ Multiple sequence alignment for each orthologous protein cluster was performed using MAFFT [ 68 ] ( G-INS-i algorithm ) , and each protein alignment was back-translated into the aligned nucleotide sequence . 
+ GBLOCKs [ 69 ] ( codon model , default settings ) was used to remove gaps and unreliably aligned positions . 
+ To assess the accuracy of our orthologous gene sets , we constructed a representative phylogenetic tree based on the concatenated super-alignment . 
+ We concatenated the alignments of 100 randomly chosen orthologous genes and inferred the maximum likelihood ( ML ) tree using PhyML [ 70 ] with the following parameters : -b 100 -d nt -m HKY85 -v 0 -c 4 -a 1 . 
+ The resulting ML tree reflected the phylogenetic relationships revealed in previous studies [ 71 ] ( S9 Fig ) . 
+ The dN and dS values for orthologous genes were computed using Codeml from PAML [ 72 ] ( settings : tree = ML gene tree from PhyML , CodonFreq = F3X4 , clock = 0 , kappa = estimated by ML , omega = estimated by ML , alpha = 0 , rho = 0 ) . 
+ In this analysis , we identified H-NS-bound genes as those that overlapped with H-NS-bound regions determined in at least one strain of SE11 , SE15 , and K-12 , because the H-NS-bound regions in common segments were essentially overlapping . 
+ To evaluate this classification , we manually inspected H-NS binding signals in each H-NS-bound gene , which also indicated that , even if the H-NS-bound region overlapped with the H-NS-bound gene in only one or two strains , possible H-NS binding signals were observed in the H-NS-bound gene in the other strains , albeit the H-NS binding intensity for the gene was lower than the threshold value in most cases . 
+ There were 42 genes ( S6 Table ) that were specifically bound by H-NS in only one or two strains , in which H-NS binding was dependent on the specific or shared segments that were localized in the vicinity ( in many cases , neighbors ) of these 42 genes in the chromosomes ( a typical example is presented in S10 Fig , where ytfI is the H-NS-bound specific segment ) , because H-NS binding was not detected for strains in which the specific segments were absent from the chromosomes . 
+ Therefore , we regarded these 42 genes as H-NS-unbound genes . 
+ We verified the significance of the higher dN in the H-NS-bound regions compared with that in the H-NS-unbound regions by modifying the definition of the H-NS-bound genes . 
+ The results indicated that the dN of the H-NS-bound genes was significantly greater than that of the H-NS-unbound genes , even when we excluded the genes in which H-NS binding was limited to the 3 ' end and the length overlapping with the H-NS-bound regions was 10 % of the total gene length or if the genes included in transcriptional units whose promoters , intergenic , or coding regions could bind H-NS were considered as H-NS-bound genes ( S11B and S11C Fig ) . 
+ Furthermore , even when we regarded the 42 genes that bound to H-NS in a specific - or shared segment -- dependent manner ( described above ) as H-NS-bound genes , the dN in the H-NS-bound genes was still significantly greater than that in the H-NS-unbound genes ( p < 0.001 ) . 
+ These results suggested that our conclusion concerning the sequence diversity of H-NS-bound genes was not affected by the definition of the H-NS-bound genes . 
+ Although we carefully selected orthologous genes based on the above criteria , it was possible that horizontal gene transfer and recombination events among E. coli strains might have affected our results -- particularly the horizontal transfer and recombination events in H-NS-bound orthologous genes . 
+ To evaluate the potential effects of horizontal transfer and recombination events on our analysis , we calculated minimal tree split compatibilities between H-NS-bound and - unbound orthologs , by which we could assess whether the genes had evolved vertically in the E. coli lineage [ 73,74 ] . 
+ If the orthologs were present in the ancestral E. coli genome before the divergence of the E. coli lineage and had not been involved in horizontal transfer or recombination events among E. coli strains , their phylogenies should be similar . 
+ Therefore , if H-NS-bound orthologs tend to be transferred horizontally more so than H-NS-unbound orthologs , phylogenies of trees would differ between H-NS-bound and - unbound orthologs . 
+ To avoid a sample-size bias , we reconstructed five datasets : set A , trees of H-NS-unbound orthologs ( N = 2,183 ) ; set B , trees of H-NS-bound orthologs ( N = 519 ) ; set C , trees of downsampled H-NS-unbound orthologs ( N = 519 , randomly sampled without replacement ) ; set D , trees of H-NS-bound orthologs with a simulated horizontal transfer event ( N = 519 , constructed by a minimal perturbation of set B where for each tree a randomly selected branch was pruned and then regrafted at a random branch ) ; set E , trees of H-NS-bound orthologs with two simulated horizontal transfer events ( N = 519 ) . 
+ We used set A as a reference dataset and calculated minimal tree split compatibilities for each tree in sets B , C , D , and E against set A . 
+ The distributions of compatibility scores for each dataset were compared using the two-sided Kolmogorov-Smirnov test . 
+ We could not reject the null hypothesis that H-NS-bound and H-NS-unbound tree sets were drawn from the same distribution ( S12 Fig , p = 0.16 ) , whereas the slightest perturbation ( a single horizontal transfer event ) strongly rejected the null hypothesis ( S12 Fig , p < 0.001 ) . 
+ This suggested that there was no bias for sequence substitutions caused by horizontal gene transfer or recombination events by which the number of H-NS-bound orthologs would have been much greater than H-NS-unbound orthologs . 
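+ The comparison of two sets of compatibility scores can be sketched in plain Python as follows . This is a minimal illustration of a two-sample , two-sided Kolmogorov-Smirnov test and not the authors ' code : the function names are hypothetical , and the p-value uses the standard asymptotic series with a small-sample correction , so it is only approximate . 

```python
import math


def ks_statistic(a, b):
    """Two-sample KS statistic: largest gap between the empirical CDFs."""
    a, b = sorted(a), sorted(b)
    if not a or not b:
        raise ValueError("both samples must be non-empty")
    i = j = 0
    d = 0.0
    while i < len(a) and j < len(b):
        x = min(a[i], b[j])
        # Advance both samples past every value <= x (handles ties).
        while i < len(a) and a[i] <= x:
            i += 1
        while j < len(b) and b[j] <= x:
            j += 1
        d = max(d, abs(i / len(a) - j / len(b)))
    return d


def ks_pvalue(d, n1, n2, terms=100):
    """Approximate two-sided p-value from the asymptotic Kolmogorov series."""
    root = math.sqrt(n1 * n2 / (n1 + n2))
    lam = (root + 0.12 + 0.11 / root) * d
    if lam < 0.3:
        # The series converges poorly here; the p-value is essentially 1.
        return 1.0
    total = sum((-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam)
                for k in range(1, terms + 1))
    return min(1.0, max(0.0, 2.0 * total))
```

+ In this setting , `a` and `b` would hold the compatibility scores of two tree sets ( for example , sets B and C ) computed against the reference set A . 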
+ It is also known that gaps in alignments can reduce the accuracy of the estimation of sequence diversity because of the difficulty in achieving an accurate alignment around gap positions [ 75 ] . 
+ Thus , we calculated the dN and dS values for coding regions only using the orthologous gene clusters without gaps in their alignments and compared the sequence substitution rates in H-NS-bound and - unbound regions . 
+ The results are shown in S13A and S13B Fig . 
+ The sequence diversity at non-synonymous positions in the H-NS-bound coding regions was significantly greater than that in H-NS-unbound regions . 
+ Therefore , this result suggested that our conclusion concerning the sequence diversity of coding regions was not affected by misalignment caused by insertions/deletions in coding regions . 
+ When we investigated the conservation of the four classes ( H-NS-bound HTGs , H-NS-unbound HTGs , H-NS-bound Core genes , and H-NS-unbound Core genes ) of genes in proteobacterial species classified in the same family , the same class , or the same phylum as E. coli , MBGD was used for the comparison of the conservation of genes [ 43 -- 46 ] . 
+ First , we constructed the ortholog cluster table by using 48 completely sequenced bacterial genomes from MBGD . 
+ Of these 48 genomes , one was E. coli K-12 MG1655 , 25 strains belonged to the same family but a different genus than E. coli ( family Enterobacteriaceae ) , 14 strains belonged to the same class but a different family than E. coli ( class Gammaproteobacteria ) , and 8 strains belonged to the same phylum but a different class than E. coli ( phylum Proteobacteria ) . 
+ Strains used in this analysis are listed in S7 Table . 
+ For clustering parameters , we used the default values of MBGD . 
+ From this ortholog cluster table , we searched our E. coli orthologous genes by gene name . 
+ In total , 2,098 genes were identified ( H-NS-bound HTGs : N = 157 ; H-NS-unbound HTGs : N = 224 ; H-NS-bound Core genes : N = 174 ; H-NS-unbound Core genes : N = 1,543 ) . 
+ Then , we checked for the presence or absence of these genes in each of the 48 genomes ( S14 Fig ) . 
+ The conservation rate for each class of genes was calculated for each genome separately . 
+ Finally , the average conservation rates were calculated separately for the same family genomes , the same class genomes , and the same phylum genomes , and we compared these values for each class of genes . 
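+ The per-genome conservation rate and its group average can be written down in a few lines of Python . This is a sketch under stated assumptions : each genome is represented as a set of the E. coli gene names for which an ortholog was found , and the function names are hypothetical . 

```python
from statistics import mean


def conservation_rate(genes, genome_genes):
    """Fraction of `genes` (one gene class) with an ortholog in one genome."""
    genes = list(genes)
    if not genes:
        raise ValueError("empty gene class")
    return sum(g in genome_genes for g in genes) / len(genes)


def group_average(genes, genomes):
    """Mean of the per-genome conservation rates over a taxonomic group."""
    return mean(conservation_rate(genes, g) for g in genomes)


# Toy example: a four-gene class checked against two genomes of a group.
gene_class = ["a", "b", "c", "d"]
same_family = [{"a", "b"}, {"a", "b", "c", "d"}]
print(group_average(gene_class, same_family))
```

+ The rate is computed for each genome first and averaged afterwards , matching the order described above . 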
+ We defined `` conserved '' intergenic regions as the regions present between two neighbouring orthologous genes ( determined as described above ) that were in the same order and direction in the E. coli strains in which these orthologous genes were identified . 
+ In addition , we selected the regions whose length was no less than 10 bp nor more than 300 bp in all chromosomes . 
+ The multiple alignment of each set of conserved intergenic regions was performed using MAFFT ( G-INS-i algorithm ) . 
+ After the multiple sequence alignment , we selected the clusters in which the differences in length among all intergenic regions in the cluster were less than 10 % of the aligned sequence length of the cluster , so that no intergenic regions with long insertions and/or deletions were used for subsequent analyses . 
+ Consequently , 712 regions were selected as conserved intergenic regions ( average length was 94.8 bp ) . 
+ We then estimated the sequence diversity matrices for those intergenic regions using Baseml from PAML ( setting : tree = ML tree from PhyML , model = 7 , clock = 0 , kappa = 2.5 ( starting value ) , fix_kappa = 0 ( ML estimation of kappa ) , alpha = 0 , fix_alpha = 1 ( fixed value ) , rho = 0 , fix_rho = 1 ( fixed value ) , nparK = 0 , nhomo = 0 , Mgene = 0 ) . 
+ In addition , we removed sets in which the evolutionary distance of at least a pair of strains was > 1.0 , meaning that the sequences of those regions were too divergent to yield a correct alignment . 
+ Finally , we selected the 703 conserved intergenic regions , including 94 H-NS-bound intergenic regions that overlapped with the H-NS-bound regions identified in at least one strain , and compared the sequence diversification rates in intergenic regions bound or not bound to H-NS . 
+ In this analysis , we also calculated the sequence substitution rate for each intergenic region only using the set of the intergenic regions without gaps and concluded that the sequence diversity of intergenic regions was not affected by any potential misalignment caused by insertions/deletions in coding regions ( S13C Fig ) . 
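+ The length and divergence criteria for conserved intergenic regions can be summarized in a short Python sketch . It reflects one reading of the text ( the spread of region lengths within a cluster must be below 10 % of the aligned length ) ; the function names and argument layout are hypothetical . 

```python
def passes_length_filters(lengths, aligned_length, min_len=10,
                          max_len=300, max_diff_fraction=0.10):
    """Check the per-cluster length criteria.

    lengths: unaligned lengths (bp) of the region in every strain.
    aligned_length: number of columns in the multiple alignment.
    """
    # Every copy of the region must be 10-300 bp long.
    if any(n < min_len or n > max_len for n in lengths):
        return False
    # Length spread must stay below 10 % of the aligned length,
    # which excludes clusters with long insertions or deletions.
    return (max(lengths) - min(lengths)) < max_diff_fraction * aligned_length


def passes_distance_filter(pairwise_distances, max_distance=1.0):
    """Reject a cluster if any pair of strains is more divergent than 1.0."""
    return all(d <= max_distance for d in pairwise_distances)
```

+ In the text , the first filter left 712 regions and the second left 703 . 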
+ We further assessed the impact of the presence of a promoter ( s ) on the extent of proximal sequence diversification . 
+ The intergenic regions with known promoters were selected from the class II regions using the information about promoters in the K-12 strain acquired from RegulonDB [ 76 ] . 
+ The sequence diversity of the H-NS-bound regions was greater than that of the H-NS-unbound regions , although the difference in sequence diversity between H-NS-bound and H-NS-unbound was even greater in regions with known promoters than in the regions without known promoters ( S15 Fig ; p < 0.001 [ with known promoters ] , p = 0.0079 [ without known promoters ] ) . 
+ These results suggested that the presence of other transcriptional regulatory elements , such as pause and termination signals , may also affect the observed H-NS binding -- dependent increase of sequence substitution rates . 
+ Analysis of transcriptional activity of sequence-divergent promoters using the β-galactosidase assay
+ Plasmid construction for the β-galactosidase assay . 
+ To investigate the effects of sequence diversity without the influence of differences of genetic backgrounds in the three strains , we examined the effects of sequence divergence of ybdO in different strains under the same genetic background using the β-galactosidase assay . 
+ Plasmids used for the β-galactosidase assay are listed in S5 Table and were constructed using plasmid pRW50 [ 50 ] . 
+ Various DNA fragments including ybdO promoter regions and regions up - and downstream of the promoters indicated in Fig 4A were amplified by PCR using chromosomal DNA purified from strains SE11 , SE15 , and K-12 as templates and appropriate primers ( S4 Table ) , and the products were cloned into pRW50 as EcoRI/HindIII fragments . 
+ Hybrid DNA fragments that fused the SE11 promoter proximal region ( or the upstream region of the SE11 promoter ) to the downstream region of the SE15 promoter ( or the SE15 coding region ) , or the SE15 promoter proximal region ( or the upstream region of the SE15 promoter ) to the downstream region of the SE11 promoter ( or the SE11 coding region ) , were amplified via recombinant PCR using four primers for each hybrid fragment ( S4 Table ) . 
+ Two DNA fragments were independently amplified by PCR using purified SE11 and SE15 chromosomal DNA , and the resultant DNA fragments were purified and used as template DNA for the second PCR . 
+ The first PCR was performed with primers that have sequences corresponding to the 5 ' or 3 ' ends of the fragments and the junction points . 
+ A second PCR was performed with the primers corresponding to the 5 ' and 3 ' ends of the fragments . 
+ The junction point of each fragment is indicated in Fig 4E . 
+ The resultant DNA fragments had EcoRI/HindIII sites at both ends and were cloned into pRW50 . 
+ E. coli K-12 DH5α cells were transformed with the plasmids , and the transformants were selected on the basis of tetracycline ( 5 μg / ml ) resistance . 
+ These plasmids were subsequently introduced into the strains MC4100 and MC4100 Δhns : : km to prepare the reporter strains . 
+ Pre-cultures of the reporter strains were grown overnight at 37 °C in 2 ml of LB medium containing 5 μg / ml tetracycline and then used to reinoculate the E. coli cells in 10 ml fresh LB medium containing 5 μg / ml tetracycline at 1:500 ( v/v ) . 
+ The cells were cultivated under aerobic conditions at 37 °C and harvested at various times ( for time course experiments ) , after 5 h ( for the assay of wild-type cells in stationary phase ) , or after 7 h ( for hns mutant cells in stationary phase ) from the start of cultivation . 
+ β-galactosidase activity was measured as described by Miller [ 77 ] and is expressed in Miller units . 
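+ For reference , the conventional Miller unit calculation can be sketched in Python . The text only cites Miller [ 77 ] , so the exact protocol variant used here is an assumption ; the function and argument names are hypothetical . 

```python
def miller_units(a420, a550, a600, minutes, volume_ml):
    """Classical Miller unit formula.

    a420: absorbance of the reaction mix at 420 nm (o-nitrophenol).
    a550: absorbance at 550 nm (light scattering by cell debris).
    a600: optical density of the culture at 600 nm.
    minutes: reaction time in minutes.
    volume_ml: volume of culture used in the assay, in ml.
    """
    if minutes <= 0 or volume_ml <= 0 or a600 <= 0:
        raise ValueError("time, volume and OD600 must be positive")
    # 1.75 * A550 corrects A420 for scattering by cell debris.
    return 1000.0 * (a420 - 1.75 * a550) / (minutes * volume_ml * a600)


# Example with made-up readings (not data from this study).
print(miller_units(a420=0.5, a550=0.0, a600=0.5, minutes=10, volume_ml=0.1))
```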
+ Investigation of growth phase -- dependent altered expression and H-NS-mediated repression of the ybdO promoter . 
+ To monitor the growth phase -- dependent transcriptional alteration of ybdO , we performed a β-galactosidase assay using pRW50 carrying DNA segments from − 250 to +239 bp relative to the ybdO start codon ( the L2 fragment in Fig 4A ) of the three strains . 
+ The plasmids were introduced into K-12 wild-type ( MC4100 ) or the hns mutant ( MC4100 Δhns : : km ) strain in which hns was replaced by the kanamycin resistance gene . 
+ β-galactosidase activity was measured using cells grown in LB medium at 37 °C under aerobic conditions . 
+ The transcriptional activities of all three fusions gradually increased during growth at log phase and then plateaued during the early stationary phase both in the wild type and the hns mutant ( S2B Fig ) . 
+ In addition , the transcriptional activities of three strains decreased in the wild type as compared with the hns mutant , indicating that ybdO transcription is repressed by H-NS in K-12 , SE11 and SE15 . 
+ Mapping of H-NS-dependent regulatory elements ( URE and DRE ) for ybdO and NEs . 
+ To determine the regions responsible for regulating ybdO transcription , we utilized the β-galactosidase assay using lac-operon fusions involving various lengths of segments in the upstream and coding regions of ybdO of the three strains ( Fig 4A ; segments L1 , L2 , L3 , L4 , LR , R1 , R2 , and F ) . 
+ The segments fused were selected based on sequence conservation in the upstream and coding regions of ybdO of the three strains . 
+ Because the transcriptional activity of ybdO plateaued early during the stationary phase , we measured the activity only during the early stationary phase , which corresponded to 5 h of cultivation for the wild type and 7 h of cultivation for the hns mutant ( S2B Fig ) . 
+ As compared with the transcriptional activity of the fusion with the L2 fragment , the addition of further upstream sequences did not significantly affect the ybdO transcription of the SE11 and K-12 fusions in wild-type cells ( Fig 4B and 4D , compare blue bars of L1 or F with those of L2 ) . 
+ The addition of the upstream sequence to − 298 bp reduced the transcriptional activity of the SE15 fusion to basal level ( fully repressed ; Fig 4C , compare blue bars of L1 and L2 ) in wild-type cells . 
+ On the other hand , deletion of the sequence from − 250 bp to − 176 bp ( see Fig 4A , compare L3 with L2 ) increased the transcriptional activities of all three fusions in wild-type cells ( Fig 4B -- 4D , compare blue bars of L3 and L2 ) . 
+ The transcriptional activities with F , L1 , L2 , and L3 in the hns mutant ( Fig 4B -- 4D , red bars of F , L1 , L2 and L3 ) were at the same level as that with L3 in wild type ( Fig 4B -- 4D , blue bars of L3 ) . 
+ These results indicated that the nucleotide sequence between − 250 and − 176 bp was necessary for H-NS-dependent negative regulation of ybdO transcription of all three strains . 
+ Therefore , this region was denoted as an URE ( upstream regulatory element ; Fig 4A ) . 
+ The sequence from − 298 to − 250 bp also contributed to the full H-NS-mediated repression of the SE15 promoter along with the URE . 
+ Because further deletion of upstream sequences to − 99 bp abolished transcriptional activity in each of the three strains ( even in the hns mutant ; Fig 4B -- 4D , L4 ) , the sequence from − 179 to − 99 bp appeared to be essential for the promoter activity of ybdO of all three strains . 
+ This result was consistent with the mapping of the 5 ' end of SE11 and SE15 ybdO mRNA with 5 ' - RACE and the transcription start site of K-12 ybdO determined by differential RNA-seq [ 49 ] ( Fig 4A ) . 
+ As compared with the promoter activity in the longest fragment F , deletion of +239 to +164 bp ( see Fig 4A , compare R2 with F ) increased the transcriptional activity of the SE15 fusion in the hns mutant ( Fig 4C , compare red bars of R2 and F ) , whereas the same deletion had little effect on the transcriptional activities of the SE11 and K-12 fusions ( Fig 4B and 4D , compare red bars of R2 with those of F ) . 
+ Thus , the region from +239 to +164 is crucial for H-NS-independent negative regulation of SE15 ybdO . 
+ Further deletion of the sequence from +164 to +27 bp increased the transcriptional activities of the SE11 and K-12 fusions in the hns mutant ( Fig 4B and 4D , compare red bars of R1 with those of R2 ) but had little effect on the transcriptional activity of the SE15 fusion ( Fig 4C , compare red bars of R1 and R2 ) . 
+ Hence , the region from +164 to +27 bp is most crucial for the H-NS-independent negative regulation of SE11 and K-12 ybdO . 
+ This sequence of SE15 was required only for the H-NS-dependent negative regulation because deletion of the sequence increased the activity of the SE15 promoter only in the wild type ( Fig 4C , compare blue bars of R1 and R2 ) . 
+ The corresponding regions of SE11 and K-12 are involved both in H-NS-dependent and - independent negative regulation because the H-NS-mediated repression was lower in R1 than in R2 ( Fig 4B and 4D ; in each fusion , the relative ratio of hns mutant and wild type [ red bar / blue bar ] of R2 was greater than that of R1 ; R2 and R1 of SE11 were 8.4 and 3.0 , respectively ; R2 and R1 of K-12 were 3.3 and 1.8 , respectively ) . 
+ From these results , we defined the downstream regulatory regions of ybdO as NESE15 and NESE11 , K12 , which are the H-NS-independent negative elements for SE15 and for SE11 and K-12 , respectively , and as DRE , the downstream regulatory element , which is necessary for H-NS-dependent repression for all three strains ( Fig 4A ) . 
+ Although the activities of the ybdO promoters of SE11 and SE15 with LR and R1 in the hns mutant were comparable to each other , the transcriptional activity of the longest SE15 segment ( Fig 4C , red bar of F ) was higher than that of SE11 ( Fig 4B , red bar of F ) . 
+ This result indicated that the repression potential of NESE15 was weaker than that of NESE11 , K-12 . 
+ We further evaluated the difference in transcriptional regulation of NEs using pRW50 carrying hybrid DNA fragments including the SE11 promoter ( or upstream region of SE11 promoter ) with the SE15 coding region ( or SE15 promoter and coding regions ) or the SE15 promoter ( or upstream region of SE15 promoter ) with the SE11 coding region ( or SE11 promoter and coding regions ) . 
+ We amplified DNA fragments including those containing regions up - and downstream of the ybdO promoter including the 5 ' end of the ybdO coding region of SE11 and SE15 and fused at -- 176 , -- 99 , -- 34 and +27 bp and vice versa ( Fig 4E ) . 
+ In wild-type cells , H-NS still repressed ybdO expression in the hybrid DNA fragments , except for fragment d in which H-NS-mediated repression was quite reduced compared with other hybrid fragments . 
+ Because the mechanism of H-NS-mediated repression was not the focus of our present study , we did not further investigate this phenomenon . 
+ In hns mutant cells , the β-galactosidase activity of each hybrid having the SE15 coding region tended to be greater than that of the hybrids having the SE11 coding region . 
+ The largest difference was detected when the DNA fragments were fused at -- 33 bp ; therefore , part of the intergenic regions might also contribute to H-NS-independent negative regulation of ybdO . 
+ Taken together , ybdO expression has diverged between SE15 and SE11 ( or K-12 ) , mainly attributable to the difference in the activities of NESE15 and NESE11 , K12 . 
+ Notably , H-NS-dependent negative regulation kept the promoter activity at the basal level ( low expression ) in wild-type E. coli cells ( Fig 4B -- 4D , blue bars of F ) . 
+ 5 ' RACE and determination of the 5 ' end of ybdO transcripts
+ Total RNA was extracted and purified from E. coli K-12 ( MC4100 ) transformed by pRW50 carrying LR fragments , in which the major negative regulation of ybdO in an H-NS-dependent or - independent manner was cancelled by the deletion of the NEs , URE , and DRE for SE11 and SE15 , using the RNeasy Mini kit ( Qiagen ) . 
+ RACE was performed with the First-Choice RLM-RACE kit ( Ambion ) according to the manufacturer 's manual with modifications . 
+ Specifically , RNA ( 5 μg ) was treated with tobacco acid pyrophosphatase or left untreated , and then the 5 ' RACE adaptor was ligated to each RNA molecule . 
+ cDNA was synthesized from adapter-attached RNA with a random decamer . 
+ The 5 ' end of ybdO was amplified by PCR with primers ( 5 ' RACE Outer Primer and ybdO-D2 : CAAGTCGTAGAGATTGGCCATACA [ for SE11 ybdO ] or ybdO-SE15-D2 : TAGATCATAAAGATTAGCCATAAC [ for SE15 ybdO ] ) , and products were visualized after electrophoresis in Gel-Red containing agarose gel and cloned with pGEM-easy ( Promega ) . 
+ Sequences of cloned fragments were determined and 5 ' ends were mapped on the genome sequences of SE11 and SE15 . 
+ The raw data and their tables are available on our web page , http://palaeo.bio.titech.ac.jp/Resources/hns2015/ . 
+ Supporting Information
+ S1 Fig . 
+ Nucleotide substitutions in the promoter and the 5 ' end of ybdO . 
+ ( A ) Schematic diagram of fragments used in the β-galactosidase assay . 
+ Shown is a multiple sequence alignment of ybdO including the upstream region for the E. coli strains . 
+ At the top of the first page , the locations of the 5 ' and 3 ' ends of the fragments are shown according to distance from the ybdO start codon . 
+ Blocks correspond to the regions that were truncated in shorter fragments cloned into the reporter plasmids for the β-galactosidase assay . 
+ The horizontal arrows denote the positions of transcription start sites suggested by 5 ' - RACE and differential RNA-seq [ 49 ] . 
+ The alignments of the ybdO promoter region and downstream region are shown below each schematic representation of blocks . 
+ In this analysis , we independently aligned coding and intergenic regions by different methods ( see Materials and Methods ) . 
+ Therefore , we separately indicate the alignment in ortholog0270 ( dsbG in K-12 ) , intergenic region intergenic0112 ( dsbG -- ybdO ) , and ortholog2573 ( ybdO ) . 
+ Numbers at the top of the alignment show the positions relative to the start codon of ybdO in K-12 . 
+ The location of each block is indicated at the bottom of the alignment . 
+ Sequences for ortholog0270 and ortholog2573 were aligned using protein-based alignment , which was then back-translated to yield DNA sequences . 
+ Sequence alignment of the intergenic0112 was performed by the DNA-based alignment . 
+ The alignment of ortholog0270 ( dsbG ) was constructed from the sequences of all 44 E. coli strains , whereas the alignments of intergenic0112 ( dsbG -- ybdO ) and ortholog2573 ( ybdO ) were constructed from 41 sequences . 
+ This is because ybdO is conserved in only 41 E. coli strains , and 3 strains do not possess a ybdO ortholog . 
+ In these 3 strains , a recombination or HGT event might have occurred downstream of ortholog0270 . 
+ Alignments were rendered with the UGENE environment [ 78 ] . 
+ At positions 107 and 106 bp upstream of the ybdO initiation codon , we indicate the transcription start site for each of SE11 , SE15 , and K-12 ybdO . 
+ ( B ) Top panel : schematic diagram of the locations of blocks A ~ G in the upstream and coding regions of ybdO . 
+ Blocks A ~ D , F , and G are the regions systematically deleted from fragment F , and blocks E + D correspond to fragment LR ; these fragments were cloned into pRW50 for the β-galactosidase assay ( Fig 4A ) . 
+ The frequency of segregating sites in each block among all strains was calculated by dividing the number of segregating sites at which at least one strain had a substitution by the sum of alignment positions in each block . 
+ The segregation frequencies among the E. coli strains used to identify orthologous genes in blocks are shown in the second panel , followed by the frequencies of segregating sites between two strains `` SE11 vs K-12 '' , `` SE15 vs SE11 '' , and `` SE15 vs K-12 '' . 
+ The alignment positions at which there are gaps in at least one strain were ignored to calculate the sum of segregating sites in each block both in the total and pairwise comparisons . 
+ ( PDF ) 
+ S2 Fig . 
+ Analysis of ybdO expression with the β-galactosidase assay ( time course ) . 
+ ( A ) H-NS binding profiles near ybdO are presented with CDS maps for SE11 ( top ) , SE15 ( middle ) , and K-12 ( bottom ) , which are segments of the maps in S2 Fig . 
+ The yellow arrows show the locations of ybdO in K-12 , SE11 , and SE15 . 
+ ( B ) Expression profiles of the SE11 , SE15 , and K-12 ybdO promoters in the time course . 
+ The wild type ( MC4100 ) and the hns mutant ( MC4100 Δhns : : km ) transformed with pRW derivatives carrying the L2 fragments of SE11 ( left ) , SE15 ( middle ) , and K-12 ( right ) were grown at 37 °C in LB medium under aerobic conditions . 
+ The optical density ( OD600 ) of the wild-type ( open circles with black line ) and hns mutant ( open triangles with dashed black line ) cultures and the β-galactosidase activities ( Miller units ) of the wild type ( cross with bold dashed line ) and the hns mutant ( open diamond with bold black line ) were measured every hour and plotted on the same graph . 
+ The time points of the early stationary phase , when β-galactosidase activity of the various fragments ( L1 -- F ) was measured and compared ( Fig 4B -- 4D ) , are indicated by black ( wild type ) and dashed arrows ( hns mutant ) on the growth and β-galactosidase activity curves . 
+ The values represent the average of three independent assays . 
+ Standard errors are shown with error bars . 
+ ( PDF ) 
+ S3 Fig . 
+ Raw sequencing results for 5 ' - RACE and the mapping positions of the 5 ' edge of SE11 and SE15 ybdO mRNAs and transcription start site of K-12 mapped by differential RNA-seq . 
+ ( A ) Raw sequencing data for 5 ' - RACE . 
+ The 5 ' edge position of each ybdO mRNA is denoted by an arrow . 
+ ( B ) The panel represents the region encompassing the ybdO transcription start site ( indicated by an arrow ) and promoter regions for each of SE11 , SE15 , and K-12 in the context of the alignment of E. coli genomes with the putative promoter sequence ( the location of the putative -10 sequence is indicated by a red horizontal bar ) . 
+ This is a part of S1 Fig . 
+ ( PDF ) 
+ S4 Fig . 
+ Scatter plots of the H-NS binding intensity as measured in duplicate experiments and with different strains . 
+ ( A ) Average H-NS binding intensity ( logarithmic scale ) in 200-bp windows was calculated at 100-bp steps along the whole genome to compare results obtained from duplicate experiments using scatter plots . 
+ ( B ) Average H-NS binding intensity ( 200-bp windows at 100-bp steps , logarithmic scale ) along connected `` common '' segments was calculated to compare all combinations of ChAP-seq results . 
+ r : Pearson product-moment correlation coefficient . 
+ ( TIF ) 
+ S5 Fig . 
+ Distribution of H-NS binding intensity . 
+ Distribution of H-NS binding intensity for all nucleotides in the E. coli genome obtained with ChAP-seq was assessed via Kernel density estimation using the R program with default parameters . 
+ Vertical axis values represent nucleotide density , with binding intensity [ ChAP/WCE ( log10 ) ] shown on the horizontal axis . 
+ The mode value of the noise component and threshold value ( mode + 0.6 ) to extract H-NS binding 
+ S6 Fig . 
+ Comparison of H-NS binding profiles in duplicate experiments . 
+ H-NS binding profiles in duplicate experiments are presented in CDS maps , which are the original H-NS binding profiles shown in Fig 1A , for SE11 , SE15 , and K-12 . 
+ Overlapping binding regions in the two experiments are indicated with rectangles above the CDS maps . 
+ ( PDF ) 
+ S7 Fig . 
+ H-NS binding profiles on whole `` common '' segments in SE11 , SE15 , and K-12 . 
+ The H-NS binding profiles on the connected `` common '' segments in SE11 , SE15 , and K-12 are shown as for Fig 1C -- 1F . 
+ ( PDF ) 
+ S8 Fig . 
+ Identification of orthologous genes . 
+ The bar graph shows the number of orthologous genes conserved in SE11 , SE15 , K-12 , and the additional E. coli strains used in this study ( see S1 Table ) . 
+ A total of 3,107 genes were conserved in > 90 % of strains ( 40 of 44 , surrounded by a black rectangle ) and were used as orthologous genes in this study . 
+ Among the selected 3,107 orthologous proteins , the 405 orthologs encoded by genes that had at least one broken codon ( with one or two nucleotide deletions or insertions ) in at least one strain were excluded to remove pseudogenes . 
+ Ultimately , 2,702 orthologous protein clusters were selected for phylogenetic analysis . 
+ ( PDF ) 
+ S9 Fig . 
+ Phylogenetic tree for 44 E. coli strains estimated by the ML method . 
+ The ML phylogenetic tree for 44 E. coli strains constructed via the concatenated superalignment of 100 randomly chosen orthologous genes . 
+ The reliability of the internal branches was assessed by bootstrapping with 100 pseudo-replicates . 
+ Strains used in ChAP-seq analysis are indicated with different colored underlines : blue , SE11 ; green , SE15 ; purple , K-12 . 
+ ( PDF ) 
+ S10 Fig . 
+ An example of how the H-NS binding depends on a strain-specific insertion sequence . 
+ There is a locus in which a specific sequence ( ytfI , red arrow ) is inserted into the chromosome ( in this case , the K-12 chromosome ) , and H-NS binding to neighboring genes ( in this case , cpdB [ yellow ] , cysQ [ green ] , ytfJ [ blue ] and ytfK [ purple ] ) is observed ( bottom panel ) . 
+ Without ytfI , H-NS binding to neighboring genes in SE11 and SE15 did not occur ( top and middle panels ) . 
+ ( PDF ) 
+ S11 Fig . 
+ Comparison of sequence diversities of H-NS-bound and - unbound orthologous genes determined using various definitions of `` H-NS-bound '' genes . 
+ Box plots were prepared as for Fig 2 . 
+ ( A ) The same figures are shown as in Fig 2A and 2B . 
+ ( B ) Similar to ( A ) , but orthologous genes in which H-NS bound only 10 % of its gene length at the 3 ' end were regarded as `` H-NS unbound '' ( red ; H-NS bound , N = 474 , gray ; H-NS unbound , N = 2,228 ) . 
+ ( C ) Similar to ( A ) , but orthologous genes whose promoter sequence or the upstream region of its transcriptional unit was bound by H-NS were included as H-NS-bound genes ( red ; H-NS bound , N = 752 , gray ; H-NS unbound , N = 1,950 ) . 
+ The asterisks indicate the statistical significance of the difference between the sequence diversities in the H-NS-bound and - unbound genes as assessed with the Wilcoxon rank-sum test ( p < 0.001 , p < 0.05 , ns : not significant ) . 
+ ( TIF ) 
+ S12 Fig . 
+ Comparison of sets of trees for H-NS-bound and - unbound orthologs . 
+ Cumulative distributions of tree compatibility scores with the H-NS-unbound reference dataset . 
+ The p-values were calculated using the two-sided Kolmogorov-Smirnov test . 
+ Black dots : set C ( H-NS-unbound ) ; red dots : set B ( H-NS-bound ) ; blue dots : set D ( H-NS-bound with random pruning and regrafting ) ; green dots : set E ( H-NS-bound with two rounds of random pruning and regrafting ) . 
+ ( PDF ) 
+ S13 Fig . 
+ Sequence diversities of homologous genes and conserved intergenic regions that contain no gap sites in their alignments . 
+ Each distribution of sequence diversity is indicated as for Fig 3 , but the orthologous genes and intergenic regions including gaps were excluded from the analysis . 
+ ( A ) Distribution of dN in the H-NS-bound ( red ; N = 159 , median value = 0.0026 ) and - unbound ( gray , N = 940 , median value = 0.0019 ) genes . 
+ ( B ) Distribution of dS in the H-NS-bound ( red , N = 159 , median value = 0.054 ) and - unbound ( gray , N = 940 , median value = 0.058 ) genes . 
+ ( C ) Distribution of sequence diversity of H-NS-bound ( red , N = 56 , median value = 0.013 ) and - unbound ( N = 458 , median value = 0.0050 ) conserved intergenic regions . 
+ The asterisks indicate the statistical significance of the difference between the sequence diversities in the H-NS-bound and - unbound genes and intergenic regions as assessed with the Wilcoxon rank-sum test ( p < 0.05 , ns : not significant ) . 
+ ( PDF ) 
+ S14 Fig . 
+ Distribution of E. coli orthologous genes in Proteobacteria . 
+ For each gene cluster ( columns ) , boxes indicate the presence ( black ) or absence ( white ) of genes in the corresponding genomes ( rows ) . 
+ Left panel shows the reference phylogenetic tree for proteobacteria species computed using DnaK protein sequences of these species . 
+ ( PDF ) 
+ S15 Fig . 
+ The relevance of evolutionary distance in class II intergenic regions and the presence ( + known promoter ) or absence ( − known promoter ) of known promoters . 
+ Each distribution of sequence diversity is indicated as for Fig 3 . 
+ The information regarding promoters was acquired from the RegulonDB database [ 76 ] . 
+ Sequence diversity of H-NS-bound ( + known promoter ; N = 50 , median value = 0.019 ) and - unbound ( + known promoter ; N = 267 , median value = 0.0062 ) class II intergenic regions with known promoters ( left ) and of H-NS-bound ( − known promoter ; N = 30 , median value = 0.014 ) and H-NS-unbound ( − known promoter ; N = 264 , median value = 0.0043 ) class II intergenic regions without known promoters ( right ) . 
+ The asterisks indicate the statistical significance of the difference between the sequence diversities in the H-NS-bound and - unbound genes and intergenic regions as assessed with the Wilcoxon rank-sum test ( p < 0.05 , ns : not significant ) . 
+ ( TIF ) 
+ S7 Table. List of strains used in the gene conservation analysis. (XLSX)
+ Acknowledgments
+ We thank David Lee , Jon Hobman , and Mika Yoshimura for providing plasmids to generate the His-tagged strains , Hirofumi Aiba for providing the antibody against H-NS , Tetsuya Hayashi for providing E. coli strains SE15 and SE11 , Charles Dorman for the helpful suggestion concerning promoter heterogeneity , Terumi Horiuchi and Etsuko Sekimori for the primary data handling for Illumina sequencing , and Jon Hobman for the critical reading of the manuscript . 
+ We also thank the editors and the anonymous reviewers for their helpful suggestions . 
+ Author Contributions
+ Conceived and designed the experiments : KH TT KK NO TO . 
+ Performed the experiments : TT EU SI TO AK YS . 
+ Analyzed the data : KH KK TT TO . 
+ Contributed reagents/materials/analysis tools : KH KK TT SI YS TO NO . 
+ Wrote the paper : KH KK TT NO TO .
\ No newline at end of file
--- a/data/TEXT_FILES/useful_txt/26818886.txt 0 → 100644
+++ b/data/TEXT_FILES/useful_txt/26818886.txt 0 → 100644
+ Global transcriptomic responses
+ Volatile organic compounds ( VOCs ) are commonly used as solvents in various industrial settings . 
+ Many of them present a challenge to receiving environments , due to their toxicity and low bioavailability for degradation . 
+ Microorganisms are capable of sensing and responding to their surroundings and this makes them ideal detectors for toxic compounds . 
+ This study investigates the global transcriptomic responses of Escherichia coli K-12 to selected VOCs at sub-toxic levels . 
+ Cells grown in the presence of VOCs were harvested during exponential growth , followed by whole transcriptome shotgun sequencing ( RNAseq ) . 
+ The analysis of the data revealed both shared and unique genetic responses compared to cells without exposure to VOCs . 
+ Results suggest that various functional gene categories , for example , those relating to Fe/S cluster biogenesis , oxidative stress responses and transport proteins , are responsive to selected VOCs in E. coli . 
+ The differential expression ( DE ) of genes was validated using GFP-promoter fusion assays . 
+ A variety of genes were differentially expressed even at non-inhibitory concentrations and when the cells were in balanced growth . 
+ Some of these genes belong to generic stress responses and others could be specific to VOCs . 
+ Such candidate genes and their regulatory elements could be used as the basis for designing biosensors for selected VOCs . 
+ Volatile organic compounds ( VOCs ) are low molecular weight molecules with a vapor pressure of ≥ 10 Pa at 20 °C 1 , while compounds with a 6-months volatility between 5 and 95 % at ambient temperature can be termed semi VOCs ( sVOCs ) 2 . 
+ VOCs such as toluene , methyl acetate , trichloroethylene , benzene , and phenol are common indoor and urban contaminants3 . 
+ Examples of common sVOCs include high molecular weight alkanes , polycyclic aromatic hydrocarbons ( PAH ) , organochlorine pesticides , and substituted benzenes4 ,5 . 
+ While there are natural VOCs ( e.g. cyclopentanone and dimethyl disulfide ) and sVOCs ( e.g. , n-Heptadecane and 1-butyl-3-methyl-imidazolium hexafluorophosphate ) produced biologically as degradation products of plant components or for biochemical signaling6 ,7 , many VOCs and sVOCs originate from fossil fuels , industrial chemicals and solvents . 
+ These compounds present a challenge to receiving environments and wastewater treatment processes , due to their toxicity and low bioavailability for degradation8 . 
+ The toxicity of VOCs and sVOCs has been evaluated in selected animal models . 
+ For example , cyclopentanone , N-methyl-2-pyrrolidone ( NMP ) and dimethylacetamide ( DMA ) , were found to cause developmental toxicity in rat embryos9 and rabbits10 ,11 . 
+ The toxicity of sVOCs commonly found in indoor environments , such as plasticizers , solvents , and flame retardants , is also well studied12 . 
+ In industrial settings , VOCs and sVOCs have been shown to concentrate in both the liquid and gas phases of wastewater treatment plants13 ,14 . 
+ Thus , industrial VOC and sVOC discharges present serious concerns for wastewater treatment . 
+ Microorganisms are constantly sensing and responding to surrounding environmental conditions , including the presence of biologically toxic compounds . 
+ VOCs have been found to affect microbial diversity and biodegradation performance in activated sludge15 and in soil16 . 
+ Microbial tolerance to various VOCs in bacteria falls into three broad mechanisms : 1 ) alteration of membrane protein composition17 -- 19 ; 2 ) export of toxic compounds through membrane transporters20 ,21 ; and 3 ) to a lesser extent , biotransformation of the compound to a less toxic variant , which has been reported for soil microorganisms and a number of Pseudomonas species22 . 
+ Expression of detoxifying enzymes such as reductive dehalogenases23 and oxygenases24 has been exploited in the bioremediation of chlorinated aliphatic hydrocarbons and polycyclic aromatic contaminated soil and groundwater . 
+ 1Singapore Centre for Environmental Life Sciences Engineering ( SCELSE ) . 
+ 60 Nanyang Drive , SBS-01N-27 , Singapore 637551 . 
+ 2School of Chemical and Biomedical Engineering , Nanyang Technological University , 62 Nanyang Drive , Singapore 637459 . 
+ 3School of Biotechnology , Dublin City University , Collins Avenue , Dublin 9 , Ireland . 
+ 4Asian School of the Environment , Nanyang Technological University , 50 Nanyang Avenue , N2-01C-45 , Singapore 639798 . 
+ Correspondence and requests for materials should be addressed to F.M.L. ( email : FLauro@ntu.edu.sg ) 
+ The E. coli K-12 MG1655 strain used in this study is the primary experimental reference model with a highly curated genome sequence with annotation25 . 
+ It is widely considered the E. coli strain of choice and its genome was the first published sequence of a wild-type laboratory strain of E. coli K-12 because it has relatively few genetic modifications compared to most other E. coli strains . 
+ E. coli has also been used extensively as a biosensor due to its ease of genetic manipulation and availability of information26 . 
+ We chose E. coli K-12 also because the GFP : fusion library is readily available27 . 
+ Various genetic mechanisms have been identified to contribute to VOC tolerance in E. coli . 
+ For example , membrane transport proteins like the acrAB-tolC pump28 , mannose transporter29 and phosphate transporters30 in E. coli were found to confer tolerance to various VOCs . 
+ Regulatory elements such as the FadR , MarR31 and purR regulon32 , were found to be involved in conferring tolerance to n-hexane , p-xylene and cyclohexane . 
+ Genes under the central metabolic processes , such as the cyo and nuo operons , responsible for energy conservation and production , and those under galactitol metabolic process ( gat genes ) were up-regulated in response to ethanol30 and butanol33 , respectively . 
+ Overexpression of heat shock proteins , such as the GrpE and GroESL chaperone system also resulted in increased tolerance of various forms of butanol as well as ethanol34 ,35 . 
+ In addition , studies have been conducted to look at tolerance of E. coli to butanol using genomic library screening36 , microarray , and at proteomic , regulatory network and metabolite levels33 ,37,38 . 
+ In most of these studies , genetic responses to sub-toxic VOC and sVOC concentrations have not been described . 
+ Studying gene activation/inactivation following exposures to sub-toxic levels will enable mechanisms of adaptation and enhanced tolerance to be decoupled from general stress responses , which would be expected at higher concentrations . 
+ In addition , information on the genetic responses of microorganisms to non-inhibitory levels would be relevant to understand and improve VOC and sVOC resistance in microorganisms that can be used for biocatalysis ( e.g. for the removal of VOCs and sVOCs ) applications . 
+ Such information would be preliminary to the development of rapid biosensing of VOC and sVOC in contaminated wastewater , offering protective measures for wastewater treatment plants and final users of reclaimed water39 . 
+ In this study , we used transcriptomics to investigate the global gene expression of E. coli K-12 grown in the presence of industrially relevant VOCs and sVOCs . 
+ All of the selected compounds are commonly used as solvents or produced as by-products during manufacturing of polymers , cleaners and industrial chemicals , with an exception of N-methylsuccinimide ( NMS ) , which is one of the metabolites commonly used as a biomarker for exposure of the solvent N-methyl-2-pyrrolidone ( NMP ) 9 . 
+ We aim to understand the specific and non-specific responses to the selected compounds . 
+ The focus in this study is to investigate genes that are responsive at non-growth inhibitory concentrations , yet significant enough to induce a response at the transcriptome level . 
+ Results and Discussion
+ Growth and overall transcriptome profile of E. coli grown with VOCs . 
+ We analyzed the transcriptome of E. coli K-12 grown in the presence and absence of selected VOCs ( Supplementary Figure S1 ) using Illumina RNA-seq . 
+ Growth curve experiments were performed on E. coli with 0 ( as control ) , 0.02 , 0.1 and 0.5 % ( v/v ) of the selected VOCs to determine the highest non-inhibitory concentration to be used in RNAseq experiments ( Supplementary Methods and Figure S2 ) . 
+ The concentrations were established to be : 0.02 % for toluene ( T ) ; 0.1 % for n-butanol ( B ) , N-cyclohexyl-pyrrolidone ( CHP ) , cyclopentanone ( CP ) , dimethyl sulfide ( DMS ) and N-methyl-2-pyrrolidone ( NMP ) ; 0.5 % for N , N-dimethylacetamide ( DMA ) and N-methyl succinimide ( NMS ) ( Table 1 ) . 
+ At these concentrations the cells reach optical density ( 600nm ) of 0.4 in approximately 5 -- 6h from initial O.D. of 0.02 in MOPS media ( Supplementary Figure S2 ) . 
+ There was a slight growth inhibition on DMS and CHP treatment during mid-log growth at concentration of 0.1 % . 
+ We have regarded this inhibition as non-significant and have chosen this concentration for subsequent RNA extraction . 
+ Previous work using E. coli to study the genes involved in tolerance ( using microarray/genomic library screening ) of selected VOCs used a range of concentrations from 0.5 % 36 to 1.7 % butanol36 , and up to 10 % for toluene29 . 
+ The concentration of n-butanol that caused a 50 % growth decrease in M9 medium in E. coli DH1 was 0.8 % ( v/v ) 33 . 
+ Most of these studies used concentrations that are growth inhibitory to E. coli . 
+ We expect that the transcriptome of E. coli using non-inhibitory levels of compounds used in the current study would provide new insights compared to existing literature . 
+ In the present transcriptomic analysis , read mapping against the E. coli K-12 MG1655 genome was performed which allowed us to identify differentially expressed genes . 
+ The analysis identified the expression of 4140 coding DNA sequence ( CDS ) tags . 
+ The non-metric multidimensional scaling ( NMDS ) plot of global mRNA expression profiles revealed separate clustering patterns on cells grown with VOC compared to the no VOC controls , with NMS and NMP-treated cells clustering furthest from the controls on the first dimension ( Fig. 1 ) . 
+ Biological replicates for most VOC treatments clustered tightly , indicating consistency between the replicates , although the clustering for treatments DMS , DMA and B is not as tight as for the rest of the treatments . 
+ The differentially expressed ( DE ) genes identified ( with cut off at log fold change of greater than 1 or less than − 1 , an average logCPM value of greater than or equal to 3 , and a p-value less than 0.05 ) are distributed across a range of average logCPM values ( Supplementary Figure S3 ) . 
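The three cut-offs stated above can be written as a single predicate. The following is a minimal sketch only: the record type `GeneStat`, the function names, and the example values in the usage note are hypothetical and are not taken from the study's pipeline or data. It assumes that each gene already has a log fold change, an average logCPM and a p-value from an upstream differential expression tool.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass
class GeneStat:
    """Hypothetical per-gene summary from a differential expression test."""
    gene: str
    log_fc: float   # log fold change, treatment vs. control
    log_cpm: float  # average log counts per million
    p_value: float


def is_differentially_expressed(g: GeneStat,
                                lfc_cut: float = 1.0,
                                cpm_cut: float = 3.0,
                                p_cut: float = 0.05) -> bool:
    # Cut-offs as stated in the text: logFC > 1 or < -1,
    # average logCPM >= 3, and p-value < 0.05.
    return (abs(g.log_fc) > lfc_cut
            and g.log_cpm >= cpm_cut
            and g.p_value < p_cut)


def split_by_direction(stats: Iterable[GeneStat]) -> Tuple[List[str], List[str]]:
    """Return (up-regulated, down-regulated) gene names that pass the filter."""
    passed = [g for g in stats if is_differentially_expressed(g)]
    up = [g.gene for g in passed if g.log_fc > 0]
    down = [g.gene for g in passed if g.log_fc < 0]
    return up, down
```

For instance, a made-up record `GeneStat("geneA", 1.5, 4.0, 0.01)` passes the filter and is counted as up regulated, whereas a record with a log fold change of exactly 1.0 does not, because the threshold is strict.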
+ More DE genes were up rather than down regulated following treatment by B , DMA , DMS , and T . 
+ The converse was true for CHP , CP , NMP , and NMS-treated cells ( Supplementary Figure S3 , Table 2 ) . 
+ The percentages of genes identified as significantly differentiated over the total gene expression profile in VOC treatments compared to the controls ranged from 9.28 % ( DMA ) up to 25.94 % ( NMS ) ( Table 2 ) . 
+ A similar trend was found for chemical-specific gene responses ( identified based on Venn analysis of DE genes ) , with DMA having the lowest ( 1.30 % ) and NMS the highest percentage ( 24.21 % ) ( Table 2 , Fig. 2 ) . 
+ In addition , a total of 625 DE genes were shared by four or more VOC treatments , suggesting a subset of common genetic responses . 
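The Venn-style bookkeeping behind "shared by four or more treatments" and "chemical-specific" responses can be sketched as a simple count of how many treatments each gene appears in. This is an illustrative sketch, not the authors' code: the function names and the toy gene sets in the usage note are assumptions, and only the treatment abbreviations come from the text.

```python
from collections import Counter
from typing import Dict, Iterable, Set


def treatment_counts(de_sets: Dict[str, Iterable[str]]) -> Counter:
    """Count, for every gene, the number of treatments in which it is DE."""
    return Counter(gene for genes in de_sets.values() for gene in set(genes))


def genes_shared_by(de_sets: Dict[str, Iterable[str]],
                    min_treatments: int = 4) -> Set[str]:
    """Genes that are DE in at least `min_treatments` treatments."""
    counts = treatment_counts(de_sets)
    return {gene for gene, n in counts.items() if n >= min_treatments}


def treatment_specific(de_sets: Dict[str, Iterable[str]]) -> Dict[str, Set[str]]:
    """Genes that are DE in exactly one treatment, grouped by that treatment."""
    counts = treatment_counts(de_sets)
    return {treatment: {g for g in set(genes) if counts[g] == 1}
            for treatment, genes in de_sets.items()}
```

With toy input such as `{"B": {"g1", "g2"}, "T": {"g1"}, "DMS": {"g1"}, "DMA": {"g1", "g3"}}`, `genes_shared_by` returns `{"g1"}`, and `treatment_specific` assigns `g2` to B and `g3` to DMA.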
+ The expression pattern for these DE genes appears to be divided into two major clusters for the VOCs used in this study ( Fig. 3 ) . 
+ Cells grown with B , T , DMS and DMA elicited more similar transcription patterns than CHP , CP , NMP and NMS ( Fig. 3 ) . 
+ These observations suggest that some VOCs might induce more cellular responses compared to others at non-growth inhibitory concentrations . 
+ Clustering of the overall transcriptome patterns of VOC treatments ( Fig. 1 ) had some resemblance compared to the shared DE gene profiles ( Fig. 3 ) . 
+ For example , the profiles of treatments CHP and CP , and of NMP and NMS , cluster closer to each other than to other treatments in both the NMDS and heatmap plots . 
+ The relationship between the chemical properties of the compounds tested and the degree of cellular response in E. coli would be an interesting investigation in the future . 
+ A number of genes relating to cold-shock responses were up regulated in our transcriptomic datasets ( Supplementary Table S1 ) . 
+ We have disregarded these genes as response to VOCs as the promoter clones for these genes failed to show an increase in GFP expression compared to the control in our bioassays at 37 °C ( Supplementary Figure S4 ) . 
+ These cold-shock related genes are likely to be an artifact of concentrating the biomass at 4 °C . 
+ Functional gene categories induced by multiple chemical treatments . 
+ Induction of iron-sulfur assembly system . 
+ Fe/S proteins participate in diverse biological processes such as respiration , central metabolism , DNA repair and gene regulation40 . 
+ The iron-sulfur cluster ( ISC ) and sulfur mobilization ( SUF ) systems carry out biogenesis and maturation of all Fe/S clusters in prokaryotes . 
+ In the ISC system , IscU and IscS are required to build the Fe/S cluster , followed by release of Fe/S cluster by HscA and HscB . 
+ In the SUF system , SufSE forms the Fe/S cluster , and SufBCD complex is responsible for cluster transfer and release40 . 
+ Cells exposed to the compounds used in the current study had higher expression of genes under different Fe/S cluster biogenesis systems compared to the no chemical treatment control . 
+ Both ISC and SUF systems were activated following B and CP treatment , while only the ISC system was activated following CHP , NMP , NMS and T treatment , and only the SUF system was activated when cells were grown with DMA and DMS ( Table 3 ) . 
+ This suggests that different chemicals induced distinctive responses in Fe/S assembly systems . 
+ iscR , a gene encoding the regulator that is responsible for Fe/S homeostasis and regulates the expression of a number of Fe/S proteins41 , was also up regulated in cells exposed to the eight compounds tested in the current study . 
+ The up regulation of iscR was validated through promoter : GFP fused expression assays ( Supplementary Figure S5 ) . 
+ IscR represses its own expression when there is sufficient Fe/S cluster in the cell , and the isc operon is activated when cells are under Fe/S cluster-limiting and oxidative stress conditions . 
+ Overexpression of iscR might indicate that the chemicals tested in the current study elicited an oxidative stress or iron-limiting condition on the cells . 
+ This could be caused by the action of the VOCs on outer membrane proteins33 . 
+ In addition , the SUF system is believed to provide better resistance to iron40 ,42 and oxidative stresses compared to the ISC system43 -- 45 . 
+ Whether the induction of the SUF system when cells were exposed to DMA and DMS is directly linked to oxidative stress is unknown , as other regulators , like Fur , OxyR are also known to be involved in SUF-type Fe/S regulation40 . 
+ Oxidative stress responses . 
+ A number of genes known to be induced by oxidative agents were up regulated in response to at least 4 VOCs used in the current study ( Table 3 and Supplementary Figure S5 ) . 
+ PqiAB is a SoxRS-regulated membrane protein known to be induced by paraquat and other superoxide generators , but it is not induced by hydrogen peroxide , ethanol and heat shock46 . 
+ YhcN was identified as a stress protein associated with hydrogen peroxide , cadmium and acid47 . 
+ MntS confers resistance to hydrogen peroxide by facilitating delivery of Mn2 + to Mn2 + - dependent enzymes48 . 
+ A gene encoding for methionine sulfoxide reductase , msrB , was up regulated as well . 
+ MsrB repairs methionine residues in proteins that have been oxidized by reactive oxygen species49 . 
+ Collectively , the results indicate that E. coli cells exposed to the compounds tested in the study induce oxidative stress responses even at non-inhibitory concentrations . 
+ In addition , there might be proteins oxidized by the presence of VOCs . 
+ yfbU , a gene known to be involved in cell death by oxidative DNA damage50 , was down regulated in all treatments , suggesting that the cells did not go through the toxin : antitoxin response when grown with chemical tested , but instead employ alternative oxidative stress responses as described . 
+ Induction of various transporter proteins . 
+ Transporter proteins for inorganic ions , amino acids , and the PTS systems were among the top three categories to be differentially expressed in at least 4 chemical treatments compared to the control ( Fig. 4 and Supplementary Figure S5 ) . 
+ Genes involved in the uptake of both inorganic iron ( e.g. feoA , feoB and efeO ) , and siderophores ( exbBD , yncD and fhuF ) were up regulated . 
+ Genes involved in iron uptake have been shown to increase E. coli 's tolerance to environmental stresses . 
+ For example , over expression of feoA increases the tolerance of E. coli to butanol36 , and efeO confers resistance to mitomycin C and other stresses such as UV irradiation compared to wild type cells51 . 
+ ExbB and ExbD proteins are required to provide energy for the import of iron-siderophore complexes and vitamin B12 across the outer membrane via TonB52 -- 54 . 
+ YncD , a putative TonB-dependent outer membrane transporter for iron55 , could be one of the protein targets of TonB-ExbB-ExbD . 
+ The FhuF protein is required for cells to use hydroxamate-type siderophores as iron source56 . 
+ Collectively , up regulation of iron uptake genes implies that the cells are actively utilizing iron , possibly for the formation of Fe/S cluster containing proteins as described above . 
+ Transporters for other inorganic ions were also up regulated ( Fig. 4 ) , e.g. , genes for magnesium ( mgtA ) and manganese ( mntH ) uptake . 
+ MntH was shown to support the growth of E. coli cells encountering iron-deficiency and oxidative stress57 . 
+ During H2O2 stress , mutants lacking ability to import manganese and iron suffer high rates of protein oxidation , implying the role of MntH in preventing protein damage . 
+ Potassium efflux genes ( kefB and kefG ) were up regulated too . 
+ Efflux of potassium is known to play a role in protecting the cell from electrophile toxicity through acidification of the cytoplasm58 , suggesting cells grown with VOC might be undergoing electrophilic stress . 
+ The second largest transporter type relates to amino acids ( Fig. 4 ) . 
+ In particular , the dipeptide ABC transporter , encoded by the oppABCDF operon , was up regulated in most VOC treatments . 
+ The OppABCDF system functions in oligopeptide uptake as well as recycling of cell wall peptides59 . 
+ Expression of opp genes was up regulated in cells treated with 1 % isobutanol as an early stage response38 , and oppD increased antibiotic resistance in E. coli during biofilm formation60 . 
+ Increased expression of the opp genes supports previous findings that these transporters are involved in VOC resistance . 
+ The tnaCAB gene cluster , responsible for the uptake of tryptophan , was down regulated in response to most VOCs used . 
+ Mutants lacking tnaCAB had increased isobutanol tolerance61 , supporting our finding that tnaCAB plays a negative role in VOC tolerance . 
+ The cytoplasmic putrescine transporter protein , encoded by potFGHI , was significantly up regulated following n-butanol , DMA , NMP and T treatment . 
+ The up regulation of potG stimulates cell growth in the presence of phenylpropanoids , which indicates that PotFGHI might also be involved in the import of this compound class62 . 
+ Cells grown with VOCs could either have an elevated concentration of putrescine inside the cell , or the transporter could also play a role in the transport of VOCs . 
+ The third most abundant transporter class containing DE genes identified belong to the phosphotransferase ( PTS ) system , which is an active transport system responsible for uptake of nutrients in bacteria ( Fig. 4 ) . 
+ The PTS system is activated when ambient nutrient level is low63 . 
+ In this study , most of the DE genes under the PTS systems were down regulated in most VOC treatments , including those responsible for glucose , dihydroxyacetone , fructose , galactitol , mannose and glucitol . 
+ Down regulation of these systems could be explained by the high nutrient media utilized in growing the cells ( 1.5 % glucose ) , hence the cells do not require active transport for nutrient uptake . 
+ Other transporter types with differential gene responses include multidrug efflux proteins and those related to osmotic response ( Fig. 4 ) . 
+ Three genes related to multidrug efflux proteins , mdtI , mdtJ and emrB , were up regulated in most chemical treatments used in the current study . 
+ MdtJ and I are two components of a spermidine exporter64 and emrB is known to increase tolerance to hydrophobic compounds , such as organomercurials and nalidixic acid65 and thiolactomycin66 . 
+ Multidrug exporters are capable of exporting compounds consisting of different structural components , hence they could potentially export the compounds tested in the current study . 
+ Genes known to be associated with maintaining appropriate osmotic conditions in cells , for example , osmY , and ABC transporters for transport of osmoprotectants like proline , glycine betaine , and taurine ( proP , proV , proX and tauA ) were up regulated . 
+ The VOCs used in the current study might have an effect on the osmotic condition in E. coli cells , hence inducing the expression of this gene class . 
+ In addition , the expression of a DNA-binding transcriptional repressor known to confer resistance to organic and inorganic acid stress , ydcI , was up regulated in all VOC treatments . 
+ YdcI protein is conserved across gram-negative bacteria and a S. typhimurium mutant lacking this gene had decreased resistance to acid stress67 . 
+ Up regulation of ydcI in our study implies that this gene may also be responsive to VOCs . 
+ Universal stress proteins . 
+ E. coli harbors six usp genes -- uspA , C , D , E , F and G68 ,69 . 
+ The functions of Usps overlap to some extent , e.g. both UspA and UspE are involved in oxidative stress defense68 , while UspG and UspF are associated with fimbriae-associated adhesion68 ,70 . 
+ From the transcriptomic results of the current study , we observed a down regulation of uspA and uspG in most VOC treatments , while uspE and uspF were up-regulated in B , DMA , DMS and T ( Table 3 , Supplementary Figure S5 ) . 
+ As UspA has functions that overlap with those of UspE , down regulation of uspA can be compensated for by the expression of uspE . 
+ Similarly , down regulation of uspG expression can be compensated for by up regulation of uspF . 
+ Flagella and cellular motility . 
+ Many genes relating to flagella biosynthesis and motility ( the flg , flh and fli genes ) were significantly down regulated in all VOC treatments , with the exception of treatment NMS ( Table S2 ) . 
+ Previous studies have found that flagellar biosynthesis was down regulated in E. coli exposed to ethanol30 as well as heat stress71 . 
+ Since NMS is not a VOC , it is not surprising that these genes were not repressed . 
+ However , a decrease in expression of flagella genes did not result in a reduction in motility in soft agar motility assays ( Supplementary Methods and Figure S6 ) . 
+ It is possible that the E. coli cells had already synthesized the flagellum before flagellar gene repression occurred in the assay . 
+ Other possible reasons include the differences in growth condition of E. coli due to the nature of the motility assay , e.g. surface-associated soft agar versus liquid , and the time of incubation . 
+ Functional gene categories induced by specific chemical treatments . 
+ Shared DE genes responsive to CHP and CP . 
+ A total of 96 genes responded significantly with specificity to CP and CHP , which shared the highest number of genes compared to other chemical treatment combinations ( Figs 2 and 5 ) . 
+ Top COG categories of the shared DE genes belong to Post-translational modification , protein turnover and chaperones ( O ) , Amino acid transport and metabolism ( E ) , Cell wall/membrane biogenesis ( M ) ( Fig. 5 ) . 
+ A few DE genes identified gave indications that CP and CHP might interfere with protein structure and outer membrane integrity . 
+ For example , the mlaD and mlaF genes , which prevent accumulation of phospholipids ( PLs ) in the outer leaflet of the outer membrane in E. coli cells , were up regulated . 
+ Cells accumulate PLs in the outer leaflet of the OM when exposed to harsh chemical treatments . 
+ This process would disrupt the LPS organization and increase sensitivity to small toxic molecules72 . 
+ Up regulation of mla genes implies that the cells ' OM lipid asymmetry could be disrupted in the presence of the chemicals tested . 
+ In addition , a number of genes encoding for molecular chaperones were significantly up regulated in response to CHP and CP . 
+ These include the protein ( re ) - folding chaperones ( htpG , fkpA , dnaK-DnaJ-GrpE and the GroES ) , protein resolubilization chaperones ( clpB ) and a protease involved in clearing the defective peptides ( hslU ) . 
+ Up regulation of these genes implies that CHP and CP cause cellular protein misfolding in E. coli . 
+ Transporter-related genes specifically up regulated in response to CHP and CP include genes encoding for peptide transport proteins ( dtpD ) , and a putative drug efflux system protein ( mdtG ) . 
+ Overexpression of mdtG has been found to increase resistance to deoxycholate ( bile acid ) and the broad spectrum antibiotic fosfomycin73 . 
+ Up regulation of such multidrug efflux genes could imply that cells perceive CP and CHP compounds as drugs and attempt to export them out of the cells . 
+ Shared DE genes responsive to NMP and NMS . 
+ The next chemical pair sharing the highest number of DE genes is NMP and NMS , sharing 68 genes based on Venn analysis ( Figs 2 and 6 ) . 
+ NMP is an organic compound consisting of a 5-membered lactam and NMS is a metabolite of NMP biodegradation9 . 
+ Although NMS is not considered as a VOC , it is cyclic . 
+ Most DE genes under energy production and conversion responding specifically to NMP and NMS were down regulated ( e.g. hyaDC , cbdAB and frdAD genes ) , except for rsxC , which is part of the rsx operon ( Fig. 6 ) . 
+ The rsxABCDGE gene cluster is involved in switching off the SoxR-mediated induction of SoxS transcription factor when cells are deficient of oxidizing agents74 . 
+ Up regulation of these genes could imply that the cells cultured with NMP and NMS were less prone to oxidative stress and require SoxR reduction to repress downstream activation of SoxS . 
+ Interestingly , rsxA was shown to be important for survival of cells exposed to ionizing radiation75 . 
+ Genes encoding for TolA-TolQ-TolR complex , were up regulated in cells treated with NMP and NMS ( Fig. 6 ) . 
+ The Tol-Pal cell envelope complex is known to be involved in maintaining cell envelope integrity , and mutants have greatly increased sensitivity to drugs and detergents and are prone to periplasmic leakage76 ,77 . 
+ Cells treated with NMP and NMS might respond differently to membrane disruption compared to that of CP and CHP . 
+ NMP and NMS activate the TolAQR complex whereas cells exposed to CP and CHP activate the Mla pathway . 
+ The molecular mechanisms behind activation of different gene clusters in response to maintenance of cell envelope integrity would be an interesting area for future investigations . 
+ Genes under `` Defense mechanisms '' that were up regulated include arnE and nudE which belong to the drug/metabolite transporter superfamily and the Nudix hydrolases family respectively ( Fig. 6 ) . 
+ Genes relating to iron-enterobactin transporter ( fepB and fepD ) and thiosulfate : cyanide ( glpE ) sulfurtransferase were up regulated specifically following NMP and NMS treatment . 
+ The fepBCDG complex together with the TonB-dependent outer-membrane transporter , and fepA , is responsible for the import of ferric enterobactin across the cell envelope . 
+ In addition to the iron-uptake system discussed in the previous sections , cells treated with NMP and NMS appear to have an additional iron-enterobactin transporter up regulated in the conditions tested in this study . 
+ Stress and membrane repair-related DE genes responsive to one chemical treatment . 
+ Genes that responded positively to one particular VOC were identified , with a number of them related to stress ( oxyR , dinF , ydiY ) , transport pumps for metals ( nikC , rcnA and rcnB ) and transport pumps for drugs ( emrKY , mdtA , sbmA , yebQ ) ( Fig. 7 ) .
+ Expression of emrK ( part of the EmrKY-TolC multidrug efflux transport system ) was found to increase in the presence of sub-inhibitory concentrations of a number of antibiotics78 .
+ As the concentration of the chemical used in this study is considered non-inhibitory , the results supported the conclusion that low concentrations of compounds are sufficient to induce a transcriptional response in various functional categories from the cell ( Supplementary Figures S7 ) .
+ A number of genes relating to cell wall biogenesis were specifically up regulated when cells were exposed to NMP ( tonB , phoE , ldtB , wzzB , ugd ) . 
+ Induction of these genes could imply that membrane component biosynthetic pathways are activated specifically upon exposure to NMP , suggesting that NMP damages cell wall components , which then require repair .
+ COG category enrichment of DE genes . 
+ We performed COG enrichment analysis of the total DE genes induced by each chemical treatment against the genome-wide COG distribution of E. coli ( Fig. 8 ) .
+ More than half of the chemicals tested had amino acid related genes overrepresented compared to the E. coli genomic background . 
+ Amino acid metabolism is central to cellular survival and it is related to many parts of cellular metabolism . 
+ Genes under this category have been found to be differentially expressed in E. coli cells exposed to butanol33 ,36 . 
+ Cells exposed to n-butanol , DMS and toluene had a significantly higher number of DE genes belonging to the COG category of energy conversion , implying that genes in this category are responsive to these VOCs .
+ NMS is the only treatment that had genes relating to translation overrepresented . 
+ A total of 31 genes under the translation and ribosomal biogenesis category were specifically responsive to NMS , many of which encode ribosomal subunit proteins , implying that the cells were actively synthesizing proteins .
+ Being a metabolic by-product of NMP , NMS is not a VOC , and it appears that this metabolite does not impair cellular metabolism or growth at all .
+ Both CHP and NMP also had the motility gene class overrepresented compared to the E. coli genomic background .
+ Some COG categories were underrepresented compared to the E. coli genomic background : those related to replication and repair in the NMP and toluene ( T ) treatments , and cell wall biogenesis in the n-butanol and toluene treatments .
+ Collectively these results could imply that different VOCs induced genes under specific COG categories . 
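The over-representation comparison described above can be illustrated with a one-sided hypergeometric test. This is only a minimal sketch: the text does not state which statistical test was used for the COG enrichment, so the hypergeometric tail probability is an assumed, commonly used choice rather than the authors' confirmed method. The function name and all counts in the usage line are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(k, n, K, N):
    """Probability of seeing at least k category genes among n DE genes.

    N: genes in the genome background (assumed input)
    K: background genes assigned to the COG category
    n: DE genes for one treatment
    k: DE genes for that treatment that fall in the category
    """
    upper = min(n, K)
    total = comb(N, n)
    # Sum the upper tail P(X = i) for i = k .. min(n, K).
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / total


# Hypothetical counts, for illustration only (not taken from the study).
p_value = hypergeom_enrichment_p(k=12, n=100, K=200, N=4000)
print(f"enrichment p-value: {p_value:.3g}")
```

A small p-value would suggest that the category is overrepresented among the DE genes; the lower tail (at most k genes) could be used in the same way for underrepresentation.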
+ Catabolic pathways of VOCs and aromatic compounds . 
+ Little is known about the biodegradation of VOCs focused on in this study . 
+ The genome of E. coli K-12 contains neither the genes responsible for the degradation of DMS ( e.g. DMS monooxygenase , DMS dehydrogenase and DMS methyltransferase ) 79 , nor for toluene ( i.e. toluene-2,3-dioxygenase ) 80 .
+ Transcriptomic profiles of genes encoding for ring-hydroxylating oxygenases and transformation of aromatic compounds revealed very few differentially expressed genes in cells treated with the VOCs in the current study , with the exception of tnaA and entA , which were up-regulated following toluene treatment , and ubiX , ubiB , which were up-regulated with CHP and NMS treatment ( Supplementary Table S3 ) .
+ However , these genes are also involved in generic cellular metabolism and their direct involvement in the transformation of VOCs in this study is yet to be determined . 
+ A search for xenobiotics degradation pathways in KEGG ( according to which some pyrrolidones have been classified ) , revealed that most of the described xenobiotics in KEGG are structurally very different from the VOC used here . 
+ Hence to the best of our knowledge , this study is the first to describe transcriptomic responses of E. coli K-12 exposed to VOCs with pyrrolidone backbone . 
+ In conclusion , RNA-seq data in this study suggested that a variety of genes relating to Fe/S cluster biogenesis , oxidative and universal stress responses , as well as transport and membrane bound proteins are responsive to selected VOCs in E. coli . 
+ These genes were differentially expressed when the cells were in balanced-growth and at the highest non-inhibitory concentrations , which is well above the basal detectable environmental levels ( PUB , personal communications ) . 
+ By identifying the transcriptional responses occurring between the basal levels and high concentration spikes , we have set the framework for the analysis of the dose dependent response , a key element in biosensor development . 
+ The numerous changes in gene expression upon exposure to the different VOCs suggest that E. coli might exhibit analogous responses when exposed to chemical compounds of similar nature .
+ It is interesting to speculate that the clustering of DE genes in response to the different VOCs tested could be related to the overall physical properties ( polarity , volatility ) and to the structure of the VOCs ( i.e. linear chain vs cyclic compounds ) used in the current study .
+ Further studies are necessary to uncover the specific molecular mechanisms of E. coli 's cellular responses to chemical compounds of different structures . 
+ In addition , a number of DE genes described in this study , for example , those related to Fe/S cluster biogenesis , and various transporter genes , are conserved in other environmentally relevant bacteria , such as Pseudomonas species20 . 
+ Results from the current study hence could also be applicable to future biosensor development in bacteria other than E. coli . 
+ However , one should note that some Pseudomonas species are known to be able to metabolize a number of VOCs and cyclic hydrocarbons via enzymatic conversions22 ,81 , hence their global genetic response to VOCs might be different from E. coli . 
+ Experimental Procedures
+ Chemicals . 
+ Chemicals used were as follows : n-butanol ( B ) , N-cyclohexyl-pyrrolidone ( CHP ) , cyclopentanone ( CP ) , N,N-dimethylacetamide ( DMA ) , dimethyl sulphide ( DMS ) , N-methyl-2-pyrrolidone ( NMP ) , N-methyl-succinimide ( NMS ) and toluene ( T ) .
+ All were purchased from Sigma-Aldrich ( Taufkirchen , Germany ) and were of analytical purity . 
+ E. coli cultivation and RNA extraction . 
+ E. coli K-12 strain MG1655 was cultured in 10 mL LB5 broth within a shaking incubator at 150 rpm and at 37 °C for 16 h . 
+ The overnight culture was diluted ( 1:100 ) in 10 mL MOPS medium ( Neidhardt et al. 1974 ) supplemented with 1.5 % glucose . 
+ Based on the MIC assays ( Supplementary Methods ) , different VOC concentrations were added at the beginning of cultivation ( Table 1 ) and three biological replicates were used for each chemical treatment . 
+ Cells were grown in Balch-type tubes ( 18 × 150 mm ) with 20 mm butyl rubber stopper and aluminum seal to minimize leakage of VOCs during the cultivation time . 
+ Cells were incubated in a shaking incubator at 37 °C and were harvested for RNA extraction when OD600 reached 0.4 . 
+ The RNA extraction was as follows : 5 mL aliquots of the cultures were added to two volumes of RNAprotect Bacteria Reagent ( Qiagen ) . 
+ The mixture was incubated at room temperature for 5 min followed by centrifugation at 4,000 × g for 10 min at 4 °C . 
+ The supernatant was removed and the cell pellets were stored at − 80 °C until RNA extraction . 
+ RNA was extracted using the RNeasy ® Mini Kit ( Qiagen ) , following the manufacturer 's recommendations . 
+ Contaminating DNA was removed using DNase ( Qiagen ) until DNA concentration was less than 5 % of the RNA .
+ DNA and RNA concentrations were measured using Picogreen and Ribogreen assays ( Invitrogen ) , respectively . 
+ RNA sequencing . 
+ The quality of the RNA samples was determined by running the samples on a Bioanalyzer RNA 6000 Pico Chip ( Agilent ) . 
+ Next-generation sequencing library preparation was performed following Illumina 's TruSeq Stranded mRNA Sample Preparation protocol with the following modifications : RNA samples were added to the elute-fragment-prime step . 
+ The PCR amplification step , which selectively enriches for library fragments that have adapters ligated on both ends , was performed according to the manufacturer 's recommendation . 
+ Each library was uniquely tagged with one of Illumina 's TruSeq LT RNA barcodes to allow library pooling for sequencing . 
+ Library quantitation was performed using Invitrogen 's Picogreen assay and the average library size was determined by running the libraries on a Bioanalyzer DNA 1000 chip ( Agilent ) . 
+ Library concentrations were normalized to 2 nM and validated by qPCR on a ViiA-7 real-time thermocycler ( Applied Biosystems ) , using qPCR primers recommended in Illumina 's qPCR protocol , and Illumina 's PhiX control library as standard .
+ Libraries were then pooled and sequenced in one lane of an Illumina HiSeq2500 rapid sequencing run at a read-length of 101bp paired-end . 
+ Sequencing data have been submitted to GenBank SRA archive with the BioProject ID : PRJNA286974 and SRP accession SRP059483 . 
+ RNAseq data analysis . 
+ Quality trimming and adaptor removal were done using Cutadapt82 v1.9.0 with the following parameters : -q 20 , -m 30 , --overlap 10 , --quality-base 33 .
+ Sequences were mapped to the E. coli str. K-12 MG1655 genome ( NCBI accession : NC_000913.3 ) by bowtie283 with end-to-end and very-sensitive modes .
+ The alignments were converted to .bam and .bam-indexed files using Samtools84 .
+ Sorted alignment files were imported into R to calculate overlapping reads as counts per gene using a combination of the following R packages : Rsamtools , GenomicFeatures and GenomicAlignments85 . 
+ Only the concordant pairs in the sorted *.bam files were imported using the function `` readGAlignmentPairsFromBam '' .
+ Differential genes were identified from the tabular output of gene count abundance using edgeR package86 . 
+ The edgeR package implements a quantile-adjusted conditional maximum likelihood ( qCML ) estimator for the dispersion parameter of the negative binomial distribution86 ,87 . 
+ Testing for DE genes from biological replicates is based on the exact test derived from this model .
+ To calculate differentially expressed genes , all VOC treatments were compared to the control in which the cells did not have any exposure to VOCs . 
+ DE genes that have at least 2-fold change , p-value less than 0.05 and logCPM value greater than 3 were considered significantly different from the no VOC control . 
+ Genes were mapped to COG and KEGG IDs using the December 2014 release of COG database88 and June 2013 release of the KEGG database ( Kanehisa Laboratories ) . 
+ Principal coordinate analysis ( PCoA ) , Venn and heatmap analysis were performed using R packages ( vegan , venn , heatmap.2 , respectively ) , and pathway maps were plotted using iPATH89 .
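The three significance thresholds given above (at least 2-fold change , p-value below 0.05 , logCPM above 3) can be applied to a differential-expression table as in the sketch below. It assumes the table uses the edgeR-style column names logFC (log2 fold change), PValue and logCPM, and that the fold-change cutoff applies in both directions; the gene rows are invented for illustration and do not come from the study.

```python
import math


def is_significant(row, min_fold=2.0, max_p=0.05, min_log_cpm=3.0):
    """Apply the fold-change, p-value and abundance cutoffs to one gene."""
    # logFC is assumed to be on a log2 scale, so 2-fold equals |logFC| >= 1.
    return (
        abs(row["logFC"]) >= math.log2(min_fold)
        and row["PValue"] < max_p
        and row["logCPM"] > min_log_cpm
    )


# Hypothetical rows comparing one VOC treatment with the no-VOC control.
table = [
    {"gene": "geneA", "logFC": 1.5, "PValue": 0.010, "logCPM": 5.0},
    {"gene": "geneB", "logFC": 0.5, "PValue": 0.010, "logCPM": 5.0},
    {"gene": "geneC", "logFC": -2.0, "PValue": 0.001, "logCPM": 4.0},
    {"gene": "geneD", "logFC": 3.0, "PValue": 0.200, "logCPM": 6.0},
    {"gene": "geneE", "logFC": 2.0, "PValue": 0.010, "logCPM": 2.0},
]

de_genes = [row["gene"] for row in table if is_significant(row)]
print(de_genes)
```

Keeping the cutoffs as keyword arguments makes it easy to check how sensitive the DE gene list is to each threshold.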
+ GFP kinetics using fluorescent transcriptional reporter E. coli clones . 
+ Selected E. coli clones with transcriptional fusions of GFP to relevant promoters of the identified DE genes were used to validate the RNAseq results27 . 
+ Reporter strains were inoculated from frozen stocks into 2 × LB broth and incubated for 16 h at 37 °C . 
+ The cells were diluted ( 1:100 ) into fresh 1 × MOPS medium supplemented with 25 μg / mL kanamycin and 1.5 % glucose and grown as described previously . 
+ The VOC were added at the same concentration used in RNA experiments . 
+ When the OD600nm reached 0.35 , an aliquot of culture was transferred to a 96-well microplate .
+ Optical density was measured at 595 nm and GFP intensity was measured at 485/535nm at 15 min intervals for 4 h .
+ Triplicates were performed and cells grown without VOC were used for comparison .
+ An E. coli clone carrying the same vector backbone without any promoter was used for background subtraction .
+ Fold-change analysis was performed and maximum fold-change was recorded . 
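The reporter analysis described above can be sketched as follows. The exact formula is not given in the text, so this sketch makes several assumptions: GFP is normalized to optical density at each time point, the promoterless clone's normalized signal is subtracted as background, and fold change is the treated-to-untreated ratio. The function name and all readings are hypothetical.

```python
def max_fold_change(gfp_voc, od_voc, gfp_ctrl, od_ctrl, gfp_blank, od_blank):
    """Return the largest treated/untreated ratio over a time course.

    Each argument is a list with one reading per time point. The blank
    readings come from the promoterless clone (background).
    """
    ratios = []
    readings = zip(gfp_voc, od_voc, gfp_ctrl, od_ctrl, gfp_blank, od_blank)
    for g_voc, o_voc, g_ctrl, o_ctrl, g_blank, o_blank in readings:
        background = g_blank / o_blank
        treated = g_voc / o_voc - background
        untreated = g_ctrl / o_ctrl - background
        # Skip time points where the untreated signal is not above background.
        if untreated > 0:
            ratios.append(treated / untreated)
    return max(ratios, default=float("nan"))


# Hypothetical readings at three time points.
result = max_fold_change(
    gfp_voc=[300.0, 500.0, 450.0],
    od_voc=[1.0, 1.0, 1.0],
    gfp_ctrl=[200.0, 200.0, 200.0],
    od_ctrl=[1.0, 1.0, 1.0],
    gfp_blank=[100.0, 100.0, 100.0],
    od_blank=[1.0, 1.0, 1.0],
)
print(result)
```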
+ Acknowledgements
+ The authors would like to acknowledge financial support from PUB ( Award No : M4340001.C70 ) and Singapore Centre on Environmental Life Sciences Engineering ( SCELSE ) , whose research is supported by National Research Foundation Singapore , Ministry of Education , Nanyang Technological University and National University of Singapore , under its Research Centre of Excellence Programme . 
+ We also acknowledge Martin Tay for providing the revised mapping table of the COG 's 2014 database . 
+ This work is licensed under a Creative Commons Attribution 4.0 International License . 
+ The images or other third party material in this article are included in the article 's Creative Commons license , unless indicated otherwise in the credit line ; if the material is not included under the Creative Commons license , users will need to obtain permission from the license holder to reproduce the material . 
+ To view a copy of this license , visit http://creativecommons.org/licenses/by/4.0/
\ No newline at end of file
--- a/data/TEXT_FILES/useful_txt/27171414.txt 0 → 100644
View file @27818a9
+++ b/data/TEXT_FILES/useful_txt/27171414.txt 0 → 100644
View file @27818a9
+ Mapping Topoisomerase IV Binding and
+ Abstract 
+ Catenation links between sister chromatids are formed progressively during DNA replication and are involved in the establishment of sister chromatid cohesion . 
+ Topo IV is a bacterial type II topoisomerase involved in the removal of catenation links both behind replication forks and after replication during the final separation of sister chromosomes . 
+ We have investigated the global DNA-binding and catalytic activity of Topo IV in E. coli using genomic and molecular biology approaches . 
+ ChIP-seq revealed that Topo IV interaction with the E. coli chromosome is controlled by DNA replication . 
+ During replication , Topo IV has access to most of the genome but only selects a few hundred specific sites for its activity . 
+ Local chromatin and gene expression context influence site selection . 
+ Moreover strong DNA-binding and catalytic activities are found at the chromosome dimer resolution site , dif , located opposite the origin of replication . 
+ We reveal a physical and functional interaction between Topo IV and the XerCD recombinases acting at the dif site . 
+ This interaction is modulated by MatP , a protein involved in the organization of the Ter macrodomain . 
+ These results show that Topo IV , XerCD/dif and MatP are part of a network dedicated to the final step of chromosome management during the cell cycle . 
+ Introduction
+ DNA replication of a circular bacterial chromosome involves strong DNA topology constraints that are modulated by the activity of DNA topoisomerases [ 1 ] . 
+ Our current understanding of these topological modifications comes from extensive studies on replicating plasmids [ 2 , 3 ] . These studies suggest that positive supercoils are formed ahead of the replication fork , while precatenanes are formed on newly replicated sister strands .
+ At the end of a replication round , unresolved precatenanes accumulate in the region of replication termination and are converted to catenanes between the replicated sister chromosomes . 
+ Neither precatenanes nor catenanes have been directly observed on chromosomes but their presence is generally accepted and failure to resolve them leads to chromosome segregation defects and cell death [ 4 ] .
+ Topo IV is a type II topoisomerase formed by two dimers of the ParC and ParE subunits and is the main decatenase in Escherichia coli [ 5 ] .
+ In vitro , its activity is 100-fold stronger on catenated circles than that of DNA gyrase [ 6 ] .
+ Topo IV activity is dependent on the topology of the DNA substrate ; Topo IV activity is strongest on positively supercoiled DNA and has a marked preference for L-braids , which it relaxes completely and processively . 
+ Topo IV can also unlink R-braids but only when they supercoil to form L-plectonemes [ 7 -- 9 ] . 
+ In vivo , DNA gyrase appears to have multiple targets on the E. coli chromosome [ 10 -- 12 ] , whereas Topo IV cleavage sites seem to occur less frequently [ 11 ] .
+ Interestingly , Topoisomerase IV activity is not essential for replication itself [ 13 ] but is critical for chromosome segregation [ 14 ] . 
+ The pattern of sister chromatid separation has been shown to vary upon Topo IV alteration , leading to the view that precatenanes mediate sister chromatid cohesion by accumulating for several hundred kilobases behind the replication forks keeping the newly replicated DNA together [ 13 , 15 ] . 
+ The regulation of Topo IV and perhaps the accessibility of the protein to chromosome dimers was proposed to be an important factor controlling chromosome segregation [ 15 , 16 ] . 
+ Topo IV activity can be modulated by a number of proteins including MukB and SeqA . 
+ MukB is an SMC-related protein in E. coli and is reported to bind to the C-terminus of Topo IV [ 17 ] to enhance Topo IV unlinking activities [ 18 , 19 ] .
+ MukB also appears to be important in favoring the formation of Topo IV foci ( clusters ) near the origin of replication [ 20 ] . 
+ SeqA , a protein involved in the control of replication initiation , and Topo IV also interact [ 21 ] . 
+ These interactions may play a role in sister chromatid segregation at the late segregating SNAP regions near the origin of replication of the chromosome [ 16 ] . 
+ Besides its role in the resolution of precatenanes , Topo IV is mostly required in the post-replicative ( G2 ) phase of the cell cycle for the resolution of catenation links .
+ Indeed , Espeli et al. showed that Topo IV activity is mostly observed during the G2 phase , suggesting that a number of catenation links persist after replication [ 22 ] . 
+ Recent cell biology experiments revealed that in G2 , the terminal region ( ter ) opposite oriC segregates following a specific pattern [ 23 -- 25 ] . 
+ Sister ter regions remain associated from the moment of their replication to the onset of cell division . 
+ This sister-chromosome association is mediated by the Ter macrodomain organizing protein , MatP [ 26 ] . 
+ At the onset of cell division , the FtsK DNA-translocase processes this region , releasing the MatP-mediated association . 
+ This process ends at the dif site , when the dimeric forms of the sister chromosomes are resolved by the XerC and XerD recombinases . 
+ A functional interaction between the MatP/FtsK/XerCD-dif system and Topo IV has long been suspected . 
+ FtsK interacts with Topo IV , enhancing its decatenation activity in vitro [ 27 , 28 ] and the dif region has been reported as a preferential site of Topo IV cleavage [ 29 ] . 
+ This functional interaction has been poorly documented to date and therefore remains elusive .
+ In this study we have used genomic and molecular biology methods to characterize Topo IV regulation during the Escherichia coli cell cycle on a genome-wide scale . 
+ The present work revealed that Topo IV requires DNA replication to load on the chromosome . 
+ In addition , we have identified two binding patterns : i ) regions where Topo IV binds DNA but is not engaged in a cleavage reaction ; ii ) numerous sites where Topo IV cleavage is frequent . 
+ We show that Topo IV-mediated removal of precatenanes is influenced by both local chromatin structure and gene expression . 
+ We also demonstrate that at the dif site , Topo IV cleavage and binding are enhanced by the presence of the XerCD recombinase and the MatP chromosome-structuring factor .
+ The enhancement of Topo IV activity at dif promotes decatenation of fully replicated chromosomes and through interaction with other DNA management processes , this decatenation ensures accurate separation of the sister chromosomes . 
+ Results
+ Topoisomerase IV binding on the E. coli chromosome
+ To identify Topo IV binding , we performed ChIP-seq experiments in ParE and ParC Flag tagged strains . 
+ The C-terminus fusions of ParE and ParC replaced the wild-type ( WT ) alleles without any observable phenotypes ( S1 Fig ) . 
+ We performed three independent experiments , two ParE-flag IPs and one ParC-flag IP , with reproducible patterns identified in all three experiments .
+ Pearson correlations of 0.8 , 0.9 and 0.7 were observed for ParC-ParE1 , ParE1-ParE2 and ParC-ParE2 respectively .
+ A map of enriched regions observed in each experiment is represented in Fig 1A ( red circles ) .
+ Four of the highly-enriched sites are illustrated at a higher magnification in Fig 1A -- right panels . 
+ Interestingly one of these sites corresponds to the dif site ( position 1.58 Mb ) , which has previously been identified as a strong Topoisomerase IV cleavage site in the presence of norfloxacin [ 29 ] . 
+ We also observed strong enrichment over rRNA operons , tRNA and IS sequences . 
+ To address the significance of the enrichment at rRNA , tRNA and IS , we monitored these sites in ChIP-seq experiments performed in the same conditions with a MatP-flag strain and in a mock IP performed with a strain that did not contain any flag tagged protein .
+ Both MatP and Mock IP presented significant signals on rRNA , tRNA and IS loci ( S2 Fig ) . 
+ This observation suggested that Topo IV enrichment at rRNA , tRNAs and IS was an artifact of the ChIP-Seq technique . 
+ By contrast , no enrichment was observed at the dif site in the MatP and mock-IP experiments ( S2 Fig ) ; we therefore considered dif to be a genuine Topo IV binding site and compared every enriched region ( > 2 fold ) with the dif IP .
+ We filtered the raw data for regions presenting the highest Pearson correlation with the dif signal ( > 0.7 ) . 
+ This procedure discarded many highly enriched regions ( Fig 1A orange circles ) . 
+ We identified 19 sites throughout the chromosome where Topo IV IP/input signal suggested a specific binding for at least two of the experiments ( Fig 1A , outer circle histogram , S1 Table ) . 
+ Most Topo IV binding sites span a 200 bp region . 
+ These sites frequently overlapped intergenic regions , with their mid-points located inside the intergenic region , and did not correlate with any identifiable consensus sequence . 
+ In addition to dif , which exhibited a 10-fold enrichment , three other sites were strongly enriched . 
+ These sites corresponded to positions 1.25 Mb ( 9.4 x ) , 1.85 Mb ( 31x ) and 2.56 Mb ( 19x ) on the chromosome ( Fig 1A , right panels ) . 
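The filtering step described above, in which enriched regions were kept only when their signal correlated with the dif signal (Pearson > 0.7), can be sketched as follows. How the signal profiles were extracted and aligned is not described in this section, so the fixed-length profiles, the function names and the toy values below are all assumptions made for illustration.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a flat profile carries no shape information
    return cov / sqrt(var_x * var_y)


def keep_reference_like(profiles, reference, min_r=0.7):
    """Keep regions whose profile correlates with the reference above min_r."""
    kept = {}
    for name, profile in profiles.items():
        r = pearson(profile, reference)
        if r > min_r:
            kept[name] = r
    return kept


# Toy IP/input profiles sampled at the same offsets around each peak centre.
dif_profile = [1.0, 5.0, 9.0, 5.0, 1.0]
candidates = {
    "site_a": [2.0, 10.0, 18.0, 10.0, 2.0],  # same shape as the reference
    "site_b": [9.0, 5.0, 1.0, 5.0, 9.0],     # inverted shape
}
kept_sites = keep_reference_like(candidates, dif_profile)
print(sorted(kept_sites))
```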
+ Besides these specific sites , Topo IV IP showed non-specific enrichment in the oriC proximal half of the chromosome .
+ This bias was not a consequence of locus copy number , as the enrichment remained after copy number normalization ( Fig 1B ) . 
+ We used MatP-Flag IP [ 30 ] and a control IP in a strain that does not contain a Flag tagged gene to differentiate non-specific Topo IV binding from experimental noise ( S3A Fig ) . 
+ In addition , Topo IV enrichment was also observed in GC rich regions of the chromosomes ( S3B Fig ) . 
+ Importantly , the ori/ter bias was not a result of the GC % bias along the chromosome since it was still explicit after GC % normalization ( S3C Fig ) . 
+ More precisely , the Topo IV binding pattern closely followed gene dosage for a ~ 3Mb region centered on oriC ( S3D and S3E Fig and S1 Text ) . 
+ In the complementary ter-proximal region , gene dosage ( input reads ) was higher than the ChIP-seq profile , suggesting that the nonspecific Topo IV binding was lower or lasted for a shorter time in the cell cycle ( since these data are population-averaged ) .
+ The Terminus region that is depleted in Topo IV binding ( 1.6 Mb ) surpassed , by far , the size of the Ter macrodomain ( 800kb ) . 
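The copy number normalization mentioned above amounts to dividing the IP signal by the input (gene dosage) signal in each genomic bin. The sketch below shows one simple way to do this; the bin size, the library-size scaling and the pseudocount are assumptions, since the section does not give the authors' exact procedure, and the read counts are invented.

```python
def normalized_enrichment(ip_counts, input_counts, pseudocount=1.0):
    """Per-bin IP/input ratio after scaling each library to its total reads.

    A pseudocount is added to every bin to avoid division by zero in bins
    with no input reads.
    """
    ip_total = sum(ip_counts)
    input_total = sum(input_counts)
    ratios = []
    for ip, inp in zip(ip_counts, input_counts):
        ip_fraction = (ip + pseudocount) / ip_total
        input_fraction = (inp + pseudocount) / input_total
        ratios.append(ip_fraction / input_fraction)
    return ratios


# Hypothetical counts: the input is twice as deep near oriC (first two bins),
# and the IP follows gene dosage except for one enriched bin (third bin).
input_reads = [200, 200, 100, 100]
ip_reads = [400, 400, 800, 200]
enrichment = normalized_enrichment(ip_reads, input_reads)
print([round(value, 2) for value in enrichment])
```

A bin whose ratio stays well above the others after this normalization is enriched beyond what gene dosage alone would explain.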
+ Topo IV binding is influenced by replication
+ The influence of Topo IV on sister chromatid interactions [ 15 ] prompted the question of how Topo IV would follow replication forks and bind to the newly replicated sister chromatids throughout the cell cycle . 
+ We performed ChIP-seq experiments in E. coli dnaC2 strains under conditions suitable for cell cycle synchronization of the entire population . 
+ Synchronization was achieved through a double temperature shift , as described previously [ 15 ] . 
+ Using these conditions , in each cell , S phase is initiated on one chromosome , lasts for 40 -- 45 min and is followed by a G2 phase ( 20 min ) ( S4 Fig ) . 
+ We analyzed ParE binding before the initiation of replication , in S phase 20 min ( S20 ) and 40 min ( S40 ) after the initiation of replication and in G2 phase . 
+ The synchronization of replication in the population was monitored by marker frequency analysis of the Input DNA ( Fig 1C ) . 
+ The profile observed for bacteria that did not replicate at nonpermissive temperature was strictly flat , but the S20 replication profile presented two sharp changes of the marker frequency slope around positions 500kb and 2700kb . 
+ This suggested that each replication fork had crossed approximately 1000 to 1300 kb in 20 min . 
+ The S40 replication profile demonstrated that most cells had finished replication , with the unreplicated region being limited to 300 kb around dif in no more than 20 % of the bacteria . 
+ In G2 phase the marker frequency was flat . 
+ We used flow cytometry to demonstrate that at G2 , the amount of DNA in each bacterium was double compared to that of the G1 bacteria , indicating that cytokinesis has not yet occurred ( S4 Fig ) . 
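The fork progression estimate given above (about 1000 to 1300 kb in 20 min) follows from the distance on the circular chromosome between oriC and the two slope changes at 500 kb and 2700 kb. The sketch below works through that arithmetic. The oriC coordinate (about 3926 kb) and the genome length (about 4642 kb) are not stated in this section; they are approximate values for E. coli K-12 MG1655 supplied here as assumptions.

```python
GENOME_KB = 4642  # approximate MG1655 chromosome length (assumption)
ORIC_KB = 3926    # approximate oriC coordinate (assumption)


def circular_distance_kb(a_kb, b_kb, genome_kb=GENOME_KB):
    """Shortest distance between two coordinates on a circular chromosome."""
    direct = abs(a_kb - b_kb) % genome_kb
    return min(direct, genome_kb - direct)


# Positions of the marker frequency slope changes in the S20 sample.
fork_positions_kb = [500, 2700]
minutes = 20

for position in fork_positions_kb:
    travelled = circular_distance_kb(ORIC_KB, position)
    rate = travelled / minutes
    print(f"fork at {position} kb: {travelled} kb travelled, {rate:.1f} kb/min")
```

Both forks come out at roughly 1200 kb from oriC, which is consistent with the range quoted in the text and corresponds to about 60 kb per minute per fork.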
+ We analyzed Topo IV binding at specific binding sites ( Fig 1D ) . 
+ Binding at these sites was strongly impaired in the absence of replication . 
+ Binding at every site started in the S20 sample and was maximal in the S40 or G2 samples , without showing any marked decrease , even in the oriC-proximal region . 
+ These observations suggest that Topo IV binds to specific sites during S phase . 
+ However , since enrichment was observed for non-replicated loci and was maintained for a long time after replication , it was not compatible with a model of Topo IV migration with the replication forks .
+ Synchronization experiments with a higher temporal resolution are required to clarify this observation . 
+ Only certain Topo IV binding sites correspond to Topo IV cleavage sites 
+ To measure Topo IV cleavage at the binding sites , we took advantage of the fact that norfloxacin covalently links type II topoisomerases to the gate segment of DNA and prevents its religation [ 31 ] .
+ We first monitored Topo IV activity on the Topo IV enriched regions ( 1.2 , 1.8 , 2.5 , 3.2 Mb and dif ) by incubating bacteria with norfloxacin for 10 minutes before genomic extraction and performing Southern blot analysis to detect the cleaved DNA products [ 10 , 29 ] . 
+ This revealed cleavage fragments induced by both DNA Gyrase and Topo IV poisoning in the WT strain , but only Topo IV cleavage in a nalR strain where DNA Gyrase is resistant to norfloxacin .
+ Among the 5 tested sites , only two displayed clear Topo IV cleavage at the expected position ( Fig 2A ) . 
+ As expected , the dif site exhibited strong cleavage . 
+ Moreover cleavage was also observed at position 2.56 Mb . 
+ However the 1.2 , 1.8 and 3.2 Mb sites did not show any Topo IV mediated cleavage in the presence of norfloxacin . 
+ Topo IV presents hundreds of cleavage sites on the chromosome
+ The above result prompted us to investigate Topo IV cleavage at the genome-wide scale . 
+ We performed IPs in the presence of norfloxacin as a crosslinking agent instead of formaldehyde . 
+ Following this step , all downstream steps of the protocol were identical to that of the ChIP-Seq assay . 
+ We referred to this method as NorflIP . 
+ The NorflIP profile differed from the ChIP-seq profile ( Fig 2B ) . 
+ Regions immunoprecipitated with Topo IV-norfloxacin cross-links were frequently observed ( Fig 2C orange circle ) . 
+ Similarly to the ChIP-seq experiments , the NorflIP profile revealed strong enrichment over the rRNA operons and IS sequences but not at the tRNA genes ( S5A Fig ) . 
+ We used a Southern blot cleavage assay to demonstrate that these signals did not correspond to Topo IV cleavages ( S5B Fig ) .
+ The NorflIP peaks correspond to a ~ 170 bp forward and reverse enrichment signal separated by a 130 bp segment , which is not enriched . 
+ This pattern is the consequence of the covalent binding of Topo IV to the 5 ' bases at the cleavage site . 
+ After Proteinase K treatment , the cleaving tyrosine residue bound to the 5 ' extremity resulted in poor ligation efficiency and infrequent sequencing of the cleaved extremities ( S6A and S6B Fig ) .
+ This observation confirmed that we were observing genuine Topoisomerase cleavage sites .
+ We used this pattern to define an automatic peak calling procedure ( S6C Fig ) that identified between 134 and 458 peaks in the three NorflIP experiments , two experiments performed with ParC-Flag and one with ParE-Flag ( Fig 2C purple circles and Fig 2D ) . 
+ We observed a total of 571 possible sites in the three experiments with about half of the sites common to at least two experiments and approximately 88 sites common to all three experiments ( S1 Table ) . 
+ We analyzed sequencing reads for the three experiments around the dif , 0.2 Mb and 1.92 Mb positions . 
+ This revealed abrupt depletions of forward and reverse reads in a 100bp center region , suggesting that it corresponds to the site of cleavage .
+ We extrapolated this result for every peak to estimate the cleavage positioning of Topo IV ( ~ 150bp downstream of the center of the forward peak , S6D Fig ) .
+ We manually validated 172 sites that were common to ParC-1 and ParE-1 experiments ( S1 Table ) for further analysis .
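The peak geometry described above (two ~170 bp enriched segments on the forward and reverse strands separated by a 130 bp gap) implies that the centres of a matched forward and reverse peak lie about 85 + 130 + 85 = 300 bp apart, with the cleavage site midway, about 150 bp downstream of the forward peak centre. The sketch below pairs peaks on that basis. It is not the authors' peak calling procedure, which is only referenced as S6C Fig; the function name, the tolerance and the coordinates are hypothetical.

```python
def pair_peaks(forward_centers, reverse_centers, expected_gap=300, tolerance=50):
    """Pair forward and reverse peak centres and estimate cleavage positions.

    A forward peak is paired with the first reverse peak found at roughly
    expected_gap bp downstream; the cleavage position is taken as the
    midpoint between the two centres.
    """
    cleavage_sites = []
    reverse_sorted = sorted(reverse_centers)
    for fwd in sorted(forward_centers):
        for rev in reverse_sorted:
            if abs((rev - fwd) - expected_gap) <= tolerance:
                cleavage_sites.append((fwd + rev) // 2)
                break  # use at most one reverse peak per forward peak
    return cleavage_sites


# Hypothetical peak centres (bp): only the first forward peak has a partner.
forward = [1000, 5000]
reverse = [1300, 9000]
sites = pair_peaks(forward, reverse)
print(sites)
```

Requiring both halves of the pattern is what allows such a procedure to reject enriched regions, such as rRNA operons, that lack the central depleted segment.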
+ Characteristics of Topo IV cleavage sites
+ The Topo IV cleavage at the dif site was the most enriched of the chromosome ( ~ 30 fold ) ; fourteen sites were enriched from 5 to 10 fold and other positions were enriched from 2 to 5 fold ( Fig 2E ) .
+ Most NorflIP sites did not correspond to significant peaks in the ChIP-seq experiment ( Fig 2E ) . 
+ We also did not observe any cleavage for the majority of the strong binding sites observed by ChIP-seq . 
+ This is illustrated for the binding site at 1.85 Mb ( Fig 2E ) . 
+ We verified several Topo IV cleavage sites by Southern blot ; a significant cleaved DNA fragment was observed at the expected size for each of them ( Fig 2F ) .
+ Southern blotting experiments following DNA cleavage in the presence of norfloxacin on synchronized cultures revealed that , like its binding , Topo IV cleavage is coordinated with DNA replication . 
+ In good agreement with ChIP-seq experiments , increased cleavage was observed as soon as 20 minutes after initiation of replication for the dif and 2.56 Mb sites ( Fig 2G ) .
+ Genomic distribution of Topo IV cleavage sites
+ The general genomic distribution of Topo IV cleavage sites was not homogeneous ; a few regions had a large number of sites clustered together , while the 1.2 Mb -- 2.5 Mb region contained a low density of sites ( Fig 2H ) . 
+ We further analyzed the distribution of cleavage sites in the terminus and the oriC regions . 
+ In the terminus region , the average distance of consecutive cleavage sites was long ( around 30 kb in the 1.5 -- 2.5 Mb region ) compared to 8 kb in the 0.8 -- 1.5 Mb or the 2.5 -- 3.1 Mb regions ( S7A Fig ) . 
+ The oriC region displays a mixed distribution ( S7B Fig ) , a high density of sites near oriC flanked by two depleted regions , including the SNAP2 region [ 16 ] . 
+ At the gene scale , the mid-point of Topo IV cleavage signal can be localized inside genes ( 82 % ) or intergenic regions ( 16 % ) but it presents a bias toward the 5 ' or 3 ' gene extremities ( S7C Fig ) . 
+ Since the cleavage signal spans approximately 200bp , nearly 50 % of the sites overlapped , at least partly , with intergenic regions that account for only 11 % of the genome . 
+ Finally , we did not identify any robust consensus between sets of Topo IV cleavage sites . 
+ The only sequence traits that we identified are a bias for GC dinucleotides near the center of the sites ( S7D Fig ) and an increased spacing of GATC motifs around cleavage sites ( S7E Fig ) . 
+ Targeting of Topo IV cleavage activity is influenced by local environment 
+ The bias in the distribution of cleavage sites ( Fig 2H ) was very similar to the Topo IV binding bias revealed by ChIP-seq ( Fig 1C ) . 
+ NorflIP and ChIP-seq data are compared in Fig 3A . 
+ Despite the lack of corresponding ChIP-seq enrichment at the position of most highly enriched NorflIP sites , a number of consistencies were observed between these two data sets . 
+ Overall the NorflIP and ChIP-seq datasets had a Pearson correlation of 0.3 and the averaged data ( 1 kb bin ) revealed a Pearson correlation of 0.5 . 
+ First , a small amount of local enrichment in the ChIP-seq experiments was frequently observed in the regions containing many cleavage sites ( Fig 3A and 3C ) . 
+ This led us to consider that trapped Topo IV engaged in the cleavage reaction could contribute to a small amount of local enrichment in the ChIP-seq experiments . 
+ Second , both Topo IV cleavage and binding sites were rare in highly expressed regions ( Fig 3A ) ; only one of the 172 manually validated Topo IV cleavage sites overlapped a highly expressed region . 
+ However , cleavage sites were observed in the vicinity of highly expressed regions more frequently than expected for a random distribution ( Fig 3C and S8 Fig ) . 
+ Thirty percent ( 50/172 ) of the Topo IV sites are less than 2 kb away from the next highly expressed transcription unit ( Fig 3 ) . 
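+ The two Pearson correlations reported above for the NorflIP and ChIP-seq comparison ( per position and after averaging in 1 kb bins ) can be illustrated with a minimal Python sketch . The function names and the default bin size are assumptions made for illustration ; this is not the authors ' Matlab pipeline . 

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


def bin_average(signal, bin_size):
    """Average a signal in consecutive non-overlapping bins.

    A trailing partial bin is dropped.
    """
    n_bins = len(signal) // bin_size
    return [
        sum(signal[i * bin_size:(i + 1) * bin_size]) / bin_size
        for i in range(n_bins)
    ]


def compare_tracks(track_a, track_b, bin_size=1000):
    """Return (per-position r, binned r) for two coverage tracks."""
    raw_r = pearson(track_a, track_b)
    binned_r = pearson(bin_average(track_a, bin_size),
                       bin_average(track_b, bin_size))
    return raw_r, binned_r
```

+ Averaging in bins suppresses position-to-position noise , which is one plausible reason why a binned correlation can be higher than the per-position one when two tracks share a regional trend . 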
+ We explored correlations between the localization of Topo IV cleavages and binding sites of various NAPs thanks to the Nust database and tools [ 32 ] . 
+ A significant correlation was only observed for Fis binding sites ( Fig 3B ) . 
+ Sixty-eight genes present both Fis binding [ 33 ] and Topo IV cleavage ( P value 2 x 10^-3 ) . 
+ Thirty-three of the 172 manually validated cleavage sites overlapped at least partially with a Fis binding site , and 80 of them are located less than 400 bp away from a Fis binding site . 
+ At the genome scale this correlation is difficult to observe ( Fig 3A ) , but close examination clearly revealed overlapping Topo IV cleavages and Fis binding sites ( Fig 3C ) . 
+ Fis binding sites are more numerous than Topo IV cleavage sites , therefore a large number of them do not present enrichment for Topo IV ( Fig 3C ) . 
+ By contrast , Topo IV peaks are excluded from H-NS rich regions ( Fig 3A , 3B and 3C ) . 
+ Only one of the 172 manually validated Topo IV cleavage sites overlapped with an H-NS binding site . 
+ As observed for highly expressed regions , Topo IV cleavage sites were frequently observed at the border of H-NS rich regions ( Fig 3C ) . 
+ Moreover H-NS rich regions contain less Topo IV than the rest of the chromosome ( Fig 3A -- 3D and S9A Fig ) . 
+ H-NS rich regions correspond to an AT rich segment of the chromosome ( Fig 3C and 3D ) . 
+ Indeed , background levels of Topo IV binding and cleavage were significantly reduced in AT rich regions ( S9B Fig ) . 
+ On rare occasions , binding of H-NS has been observed in regions with a regular AT content ( Fig 3C ) ; notably , Topo IV binding and cleavage were also reduced in these regions . 
+ This observation suggested that H-NS itself rather than AT content limits the accessibility of Topo IV to DNA . 
+ This observation was confirmed by the identification of Topo IV cleavage in regions with an AT content ranging from 20 to 80 % ( S9C and S9D Fig ) . 
+ We performed Southern blot analysis of Topo IV cleavage on representative sites to test whether gene expression and chromatin factors influenced Topo IV site selection . 
+ First , we observed that the exact deletion of cleavage sites at position 1.92 Mb and 2.56 Mb did not abolish Topo IV cleavage activity ( Fig 3D and 3E ) . 
+ Second , since these loci also contain a Fis binding site overlapping Topo IV cleavage signal , we deleted the fis gene . 
+ However , deletion of the fis gene did not modify Topo IV cleavage ( Fig 3D and 3E ) . 
+ Finally we performed cleavage assays in the presence of rifampicin to inhibit transcription . 
+ To limit the pleiotropic effects of rifampicin addition we performed the experiment with a 20 min pulse of rifampicin . 
+ Rifampicin treatment abolished Topo IV cleavage ( Fig 3E ) . 
+ These results suggest that gene expression rather than chromatin factors influences Topo IV targeting . 
+ XerC targets Topo IV to the dif site
+ Our analysis confirms that the dif region is a hot spot for Topo IV activity [ 29 ] . 
+ Indeed , ChIP-seq and NorflIP show that Topo IV binds to and cleaves frequently in the immediate proximity of dif . 
+ We measured DNA cleavage by Topo IV in the presence of norfloxacin in various mutants affecting the structure of dif or genes implicated in chromosome dimer resolution . 
+ Southern blot was used to measure Topo IV cleavage ( Fig 4A ) . 
+ We observed that exact deletion of dif totally abolished Topo IV cleavage . 
+ Interestingly , the deletion of the XerC-binding sequence ( XerC box ) of dif was also sufficient to abolish cleavage , while the deletion of the XerD box only had a weak effect . 
+ Deletion of the xerC and xerD genes abolished Topo IV cleavage at dif . 
+ However , cleavage was restored when the catalytically inactive mutants XerC K172A or XerC K172Q were substituted for XerC ( Fig 4B ) . 
+ This suggests that the role of XerCD/dif in the control of Topo IV activity is structural and independent of XerCD catalysis . 
+ Deletion of dif or xerC did not significantly alter cleavage at any of the other tested Topo IV cleavage sites ( Fig 4C ) . 
+ This suggests that influence of XerC on Topo IV is specific to dif . 
+ To evaluate the role of XerCD-mediated Topo IV cleavage at dif , we attempted to construct parEts xerC , parEts xerD and parCts xerC double mutants . 
+ We could not obtain parCts xerC mutants by P1 transduction at any tested temperature . 
+ We obtained parEts xerC and parEts xerD mutants at 30 °C . 
+ The parEts xerC double mutant presented a growth defect phenotype at 30 °C and did not grow at temperatures above 35 °C ( Fig 4D ) . 
+ The parEts xerD mutant presented a slight growth defect at 37 °C compared to parEts or xerD mutants . 
+ None of the parEts mutants grew above 42 °C . 
+ Next , we used quinolone sensitivity as a reporter of Topo IV activity . 
+ To this aim , we introduced mutants of the FtsK/Xer system into a gyrAnalR ( nalR ) strain ; Topo IV is the primary target of quinolones in such strains . 
+ The absence of XerC , XerD , the C-terminal activating domain of FtsK or dif exacerbated the sensitivity of the nalR strain to ciprofloxacin ( Fig 4D ) . 
+ We therefore concluded that the impairment of Topo IV was more detrimental to the cell when the FtsK/Xer system was inactivated . 
+ Among partners of the FtsK/Xer system the absence of XerC was significantly the most detrimental , suggesting a specific role for XerC in this process . 
+ The above results suggest an interaction between Topo IV and the XerCD/dif complex . 
+ We therefore attempted to detect this interaction directly in vitro ( Fig 4E and 4F ) . 
+ We performed EMSA with two fluorescently labeled linear probes , one containing dif and the other containing a control DNA not targeted by Topo IV in our genomic assays . 
+ Topo IV alone bound poorly to both probes ( Kd > 100nM ) . 
+ Binding was strongly enhanced when XerC or both XerC and XerD were added to the reaction mix . 
+ In contrast , Topo IV binding to dif was slightly inhibited in the presence of XerD alone . 
+ These results were consistent with the observation that deletion of the XerC box but not of the XerD box inhibited Topo IV cleavage at dif and pointed to a specific role for XerC in Topo IV targeting . 
+ The control fragment showed that these effects are specific to dif . 
+ Topo IV-XerC/dif complexes were stable and resisted a challenge by increasing amounts of XerD ( S10A Fig ) . 
+ The positive influence of XerCD on Topo IV binding was also observed on a negatively supercoiled plasmid containing dif . 
+ In the presence of XerCD ( 50nM ) , a delay in the plasmid migration was observed with 40nM of Topo IV . 
+ By contrast , 200 nM was required in the absence of XerCD ( S10B Fig ) . 
+ The Southern blot cleavage assay showed that overexpression of the ParC C-terminal domain ( pET28parC-CTD ) strongly reduced cleavage at dif but enhanced cleavage at the Topo IV site located at 2.56 Mb . 
+ This suggested that , as observed for MukB [ 17 ] , Topo IV might interact with XerC through its C-terminal domain ( Fig 4G ) . 
+ Topo IV activity at dif depends on dynamics of the ter region and chromosome circularity
+ We assayed the effects of the reported Topo IV modulators and proteins involved in chromosome segregation on the activity of Topo IV at dif . 
+ MukB has previously been shown to influence the activity of Topo IV [ 17 , 18 ] . 
+ We measured Topo IV cleavage in a mukB mutant at dif and at position 2.56 Mb ; cleavage was reduced at dif but no significant effect was observed at position 2.56 Mb ( Fig 5A ) . 
+ We did not detect any effect of a seqA deletion on Topo IV cleavage at either position ( Fig 5B ) . 
+ We next assayed the effect of MatP , which is required for compaction and intracellular positioning of the ter region as well as for its progressive segregation pattern ending at dif [ 25 , 26 ] . 
+ The Topo IV cleavage at dif was significantly impaired in the matP mutant ( Fig 5C ) . 
+ The Topo IV cleavage site at position 1.9 Mb is included in the Ter macrodomain , but cleavage at this site was almost unchanged in the absence of MatP ( Fig 5C ) . 
+ Introduction of a matP deletion into the nalR strain yielded an increase in ciprofloxacin sensitivity ( Fig 5D ) . 
+ We also constructed a parEts matP double mutant . 
+ Growth of this strain was significantly altered compared to the parEts parental strain at an intermediate temperature ( Fig 5E ) . 
+ Such a synergistic effect was not found when combining the matP deletion with a gyrBts mutation . 
+ Taken together , these results led us to consider that MatP itself or the folding of the Ter macrodomain might be important for Topo IV targeting at dif . 
+ Since the FtsK/Xer/dif system is dedicated to post-replicative events that are specific to a circular chromosome , it was tempting to postulate that the activity of Topo IV at dif is also dedicated to post-replicative decatenation events and is strictly required for circular chromosomes . 
+ To address this question , we used E. coli strains harboring linear chromosomes [ 34 ] . 
+ In these strains , expression of TelN from the N15 phage promotes linearization of the chromosome at the tos site inserted 6 kb away from dif . 
+ Indeed , chromosome linearization suppresses the phenotypes associated with dif deletion [ 34 ] . 
+ We analyzed cleavage at the dif site by Topo IV in the context of a linearized chromosome . 
+ Cleavage was completely abolished , showing that Topo IV activity at dif is not required on linear chromosomes . 
+ This effect was specific to the dif site , since cleavage at the 1.9 Mb site remained unchanged after chromosome linearization ( Fig 5F ) . 
+ We next assayed if the phenotypes associated with matP deletion , i.e. , formation of elongated cells with non-partitioned nucleoids [ 26 ] , depend on chromosome circularity . 
+ Strikingly , most of the phenotypes observed in the matP mutant were suppressed by linearization of the chromosome ( Fig 5G ) . 
+ Interestingly , the frequency of cleavage at dif sites inserted far ( 300 kb ) from the normal position of dif or in a plasmid was significantly reduced compared to the WT situation ( S11 Fig ) , confirming that Topo IV cleavage at dif is specific to circular chromosomes . 
+ Discussion
+ Specific Topo IV binding and cleavage sites on the chromosome
+ Whole genome analysis of Topo IV binding by ChIP-seq revealed approximately 10 Topo IV binding sites across the E. coli genome . 
+ Among them , only 5 sites were strongly enriched in every experiment and these were mapped to positions 1.25 , 1.58 ( dif ) , 1.85 , 2.56 and 3.24 Mb . 
+ We did not identify any consensus sequence that could explain specific binding to these sites . 
+ Band shift experiments at the dif site and the 1.25 Mb site revealed that Topo IV binding is not sequence-dependent . 
+ This led us to favor models involving exogenous local determinants for Topo IV binding , as is the case for the dif site in the presence of XerC . 
+ Because XerC is only known to bind to dif , we could speculate that other chromatin factors might be involved in Topo IV targeting . 
+ Topo IV and Fis binding sites [ 33 ] overlap more frequently than expected ( Nust P value 10e-03 [ 32 ] ) . 
+ Topo IV and Fis binding sites overlap at the positions 1.25 and 2.56 Mb ; it is therefore possible that Fis plays a role in defining some Topo IV binding sites . 
+ However , our EMSA , cleavage and ChIP experiments did not show any cooperative binding of Topo IV with Fis . 
+ In spite of its co-localization with Topo IV , Fis does not contribute to defining Topo IV binding or cleavage sites . 
+ Nevertheless , the role of the chromatin in Topo IV localization was also illustrated by the strong negative correlation observed for the Topo IV and H-NS bound regions . 
+ H-NS rich regions were significantly less enriched for nonspecific Topo IV binding than the rest of the chromosome . 
+ Topo IV mediated DNA cleavage sites
+ We postulated that loci where Topo IV is catalytically-active could be identified by DNA cleavage mediated by the quinolone drug norfloxacin . 
+ We designed a new ChIP-seq strategy that consisted of capturing DNA-norfloxacin-Topo IV complexes . 
+ We called it NorflIP . 
+ Three independent experiments show that Topo IV was trapped to a large number of loci ( 300 to 600 ) with most of these loci observed in two out of three experiments . 
+ A hundred of these loci were identified in all three experiments . 
+ Dif presented a strong signal in the NorflIP as in the ChIP-seq but this is not the case for most of the other ChIP-seq peaks . 
+ NorflIP peaks presented a characteristic pattern suggesting that they are genuine DNA-norfloxacin-Topo IV complexes . 
+ Considering that norfloxacin does not alter Topo IV specificity , our results suggest that for Topo IV the genome is divided into five categories : i ) loci where Topo IV binds strongly but remains inactive for most of the cell cycle ; ii ) loci where Topo IV is highly active but does not reside for a very long time ; iii ) loci where we observed both binding and activity ( dif and 2.56 Mb ) ; iv ) regions where Topo IV interacts non-specifically with the DNA and where topological activity is not stimulated ; v ) regions where non-specific interactions are restricted ( the Ter domain , chromatin rich regions ( tsEPODs [ 35 ] , H-NS rich regions ) ) . 
+ Detection of norfloxacin-mediated genomic cleavage by pulsed-field gel electrophoresis has previously revealed that when Topo IV is the only target of norfloxacin , the average fragment size is 300 -- 400 kb , while it drops to 20 kb when Gyrase is the target [ 11 ] . 
+ This suggests that , for each cell , no more than 10 to 20 Topo IV cleavages are formed in 10 min of norfloxacin treatment . 
+ To fit this observation with our data , only a small fraction ( 10 -- 20 out of 600 ) of the detected Topo IV cleavage sites would actually be used in each cell . 
+ This might explain why Topo IV cleavage sites were hardly distinguishable from background in the ChIP-seq assay ( Fig 3 ) . 
+ This is in good agreement with the estimation that the catalytic cycle only provokes a short pause ( 1.8 sec ) in Topo IV dynamics [ 36 ] . 
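+ The estimate of 10 to 20 cleavages given above follows from dividing the chromosome length by the average fragment size , since n cuts on a circular molecule yield n fragments . A minimal Python sketch of this arithmetic is given here , assuming a chromosome of about 4.64 Mb for E. coli K-12 ; the constant and the function name are illustrative and do not come from the original analysis . 

```python
# Approximate length of the E. coli K-12 chromosome in bp (assumed value).
GENOME_SIZE_BP = 4_640_000


def cleavages_per_chromosome(mean_fragment_bp, genome_bp=GENOME_SIZE_BP):
    """Approximate number of cuts on a circular chromosome.

    On a circle, n cuts produce n fragments, so the number of cuts is
    the chromosome length divided by the mean fragment length.
    """
    if mean_fragment_bp <= 0:
        raise ValueError("mean fragment size must be positive")
    return genome_bp / mean_fragment_bp


if __name__ == "__main__":
    for label, size in [("Topo IV, 400 kb", 400_000),
                        ("Topo IV, 300 kb", 300_000),
                        ("gyrase, 20 kb", 20_000)]:
        print(f"{label}: about {cleavages_per_chromosome(size):.0f} cleavages")
```

+ Under these assumptions , fragments of 300 to 400 kb correspond to roughly 12 to 15 cleavages per chromosome , that is about 2 to 3 % of 600 candidate sites , whereas 20 kb fragments correspond to roughly 230 cleavages . 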
+ The mechanism responsible for the choice of specific Topo IV cleavage sites is yet to be determined . 
+ As indicated by our findings that deletion of the cleavage site resulted in the formation of a new site or sites in the vicinity , cleavage is not directly sequence-related . 
+ We observed several biases that might be involved in determination of cleavage sites ( GC di-nucleotide skew , GATC spacing , positioning near gene ends or intergenic regions , proximity with highly expressed genes and Fis binding regions ) . 
+ Interestingly , inhibition of transcription with rifampicin inhibits Topo IV cleavage ( Fig 3 ) . 
+ This raises the possibility that transcription , which can be stochastic , may influence stochastic determination of Topo IV activity sites . 
+ The influence of transcription could be direct , if RNA polymerase pushes Topo IV to a suitable place , or indirect if the diffusion of topological constraints results in their accumulation near barriers imposed by gene expression [ 37 , 38 ] . 
+ This accumulation could then , in turn , signal for the recruitment of Topo IV . 
+ Replication influences Topo IV binding and activity
+ Synchronization experiments revealed that , like Topo IV binding at specific sites , Topo IV cleavage activity is enhanced by chromosome replication . 
+ Enrichment was the highest in late S phase or G2 phase ; it seems to persist after the passage of the replication fork at a defined locus . 
+ Enrichment in asynchronous cultures was significantly reduced compared to S40 or G2 synchronized cultures suggesting that Topo IV is not bound to the chromosome for the entire cell cycle . 
+ Unfortunately our experiments did not have the time resolution to determine at what point of the cell cycle Topo IV leaves the chromosome and if it would leave the chromosome during a regular cell cycle . 
+ The role of DNA replication in Topo IV dynamics has recently been observed by a very different approach [ 36 ] . 
+ The authors propose that Topo IV accumulates in the oriC proximal part of the chromosome in a MukB and DNA replication dependent process . 
+ These observations are in good agreement with our data and suggest that Topo IV is loaded on DNA at the time of replication , accumulates towards the origin of replication and remains bound to the DNA until a yet unidentified event triggers its release . 
+ Formation of positive supercoils and precatenanes ahead of and behind the replication forks , respectively , could be the reason for Topo IV recruitment . 
+ One could hypothesize that MukB is used as a DNA topology sensor that is responsible for redistribution of Topo IV . 
+ However , we only detected a modest effect of mukB deletion on Topo IV cleavage at dif ( Fig 5 ) . 
+ Putative events responsible for Topo IV release could be , among others , complete decatenation of the chromosome , SNAPs release , or stripping by other proteins such as FtsK . 
+ Non-specific Topo IV binding
+ Non-specific Topo IV binding presents a very peculiar pattern ; it is significantly higher in the oriC proximal 3Mb than in the 1.6 Mb surrounding dif . 
+ This pattern is not simply explained by the influence of replication ( S3 Fig ) . 
+ Interestingly , ChIP-seq and ChIP-on-Chip experiments have already revealed a similar bias for DNA gyrase [ 12 ] and SeqA [ 39 ] . 
+ The CbpA protein has been shown to present an inverse binding bias [ 40 ] , with enrichment in the terminal region and a reduction in the oriC proximal domain . 
+ The HU regulon has also presented a similar bias [ 41 ] . 
+ The terminus domain defined by these biases always comprises the Ter macrodomain but it frequently extends beyond the extreme matS sites . 
+ The role of MatP in the definition of these biases has not yet been tested . 
+ The group of G. Muskhelishvili proposed a topological model to interpret the DNA gyrase and HU regulon biases , suggesting that HU coordinates the global genomic supercoiling by regulating the spatial distribution of RNA polymerase in the nucleoid [ 41 ] . 
+ Topo IV could benefit from such a supercoiling gradient to load on the chromosome . 
+ Interestingly , the strongest Topo IV binding and cleavage sites are localized inside the Terminus depleted domain . 
+ One possibility could be that these sites minimize Topo IV binding to adjacent nonspecific sequences . 
+ Alternatively , one can propose that a regional reduction of non-specific binding creates a selective advantage for optimal loading onto specific sites . 
+ Dif and the control of decatenation
+ Dif was the strongest Topo IV cleavage site detected by NorflIP ; it was also detected in the ChIP-seq assays . 
+ We have used Southern blot to analyze the determinants involved in this activity . 
+ The binding of XerC on the xerC box of dif and the region downstream of the xerC box are essential . 
+ In vitro , XerC also strongly favors binding of Topo IV at dif . 
+ Interestingly , XerD and the xerD box did not improve Topo IV binding or cleavage . 
+ We propose that XerC works as a scaffold for Topo IV , simultaneously stimulating its binding and its activity . 
+ Topo IV activity at dif is also dependent on the circularity of the chromosome , suggesting that when topological constraints can be evacuated through chromosome ends , Topoisomerase IV does not catalyze strand passage at dif . 
+ This suggests that topological complexity is directly responsible for Topo IV activity . 
+ Topo IV cleavage activity at dif is not influenced by SeqA or FtsK , which are two known Topo IV partners . 
+ Interestingly , mukB and matP deletion mutants slightly reduced this activity . 
+ The synergistic effect observed when a matP deletion is combined with a parEts mutation suggests that MatP indeed influences Topo IV activity . 
+ The phenotypes of the matP mutant are rescued by the linearization of the chromosome . 
+ A similar rescue has been observed for the dif mutant [ 34 ] . 
+ Therefore it is likely that a significant part of the problems that cells encounter in the absence of matP corresponds to failure in chromosome topology management , either decatenation or chromosome dimer resolution [ 25 ] . 
+ In conclusion , we propose that genomic regulation of Topo IV consists of : ( 1 ) Topo IV loading during replication , ( 2 ) Topo IV binding to specific sites that may serve as reservoirs , ( 3 ) Topo IV activation to remove precatenanes or positive supercoils in a dozen stochastically chosen loci , and ( 4 ) XerC and MatP ensuring the loading of Topo IV at the dif site for faithful decatenation of fully replicated chromosomes . 
+ Materials and Methods
+ ChIP-seq assay
+ ParE-flag and ParC-flag C-terminus fusions were constructed by lambda red recombination [ 42 ] . 
+ Cultures were grown in LB or Minimal medium A supplemented with succinate ( 0.2 % ) and casamino acids ( 0.2 % ) . 
+ Cells were fixed with fresh Formaldehyde ( final concentration 1 % ) at an OD600nm 0.2 -- 0.4 . 
+ Sonication was performed with a Bioruptor Pro ( Diagenode ) . 
+ Immunoprecipitations were performed as previously described [ 26 ] . 
+ Libraries were prepared according to Illumina 's instructions accompanying the DNA Sample Kit ( FC-104-5001 ) . 
+ Briefly , DNA was end-repaired using a combination of T4 DNA polymerase , E. coli DNA Pol I large fragment ( Klenow polymerase ) and T4 polynucleotide kinase . 
+ The blunt , phosphorylated ends were treated with Klenow fragment ( 3 ' to 5 ' exo minus ) and dATP to yield a protruding 3 - ` A ' base for ligation of Illumina 's adapters which have a single ` T ' base overhang at the 3 ' end . 
+ After adapter ligation DNA was PCR amplified with Illumina primers for 15 cycles and library fragments of ~ 250 bp ( insert plus adaptor and PCR primer sequences ) were band isolated from an agarose gel . 
+ The purified DNA was captured on an Illumina flow cell for cluster generation . 
+ Libraries were sequenced on the Genome Analyzer following the manufacturer 's protocols . 
+ Norfloxacin ( final concentration 2μM ) was added to the cultures at O before harvesting . 
+ Sonication and immunoprecipitation were performed as described for the ChIP-seq assay . 
+ Analysis of sequencing results
+ Sequencing results were processed by the IMAGIF facility . 
+ Base calls were performed using CASAVA version 1.8.2 . 
+ ChIP-seq and NorflIP reads were aligned to the E. coli NC_000913 genome using BWA 0.6.2 . 
+ A custom made pipeline for the analysis of sequencing data was developed with Matlab ( available on request ) . 
+ Briefly , the number of reads for the input and IP data was smoothed over a 200bp window . 
+ Forward and reverse signals were added , reads were normalized to the total number of reads in each experiment , strong non-specific signals observed in unrelated experiments were removed , and data were exported to the UCSC genome browser for visualization and comparisons . 
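+ The smoothing and normalization steps described above can be sketched in Python as follows . This is a minimal illustration under stated assumptions , not the authors ' Matlab pipeline : the function names are hypothetical , coverage is given as per-position read counts , and the chromosome is treated as linear , so windows are truncated at both ends . 

```python
def moving_average(counts, window=200):
    """Moving average over `window` positions around each position.

    Windows are truncated at the ends of the sequence. A prefix sum
    keeps the cost linear in the sequence length.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    n = len(counts)
    half = window // 2
    prefix = [0.0]
    for c in counts:
        prefix.append(prefix[-1] + c)
    smoothed = []
    for i in range(n):
        lo = max(0, i - half)
        hi = min(n, i + (window - half))
        smoothed.append((prefix[hi] - prefix[lo]) / (hi - lo))
    return smoothed


def normalize_to_total(counts, total_reads):
    """Divide per-position counts by the total number of reads."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return [c / total_reads for c in counts]


def combined_track(forward, reverse, total_reads, window=200):
    """Sum forward and reverse coverage, normalize, then smooth."""
    summed = [f + r for f, r in zip(forward, reverse)]
    return moving_average(normalize_to_total(summed, total_reads), window)
```

+ Because summing , scaling and averaging are all linear operations , the order in which they are applied in this sketch does not change the result . 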
+ The strongest peaks observed with NorflIP experiments ( dif and 1.9 Mb ) present a characteristic shape ( S6 Fig ) that allows the automatic detection of lower amplitude peaks that preserve this shape . 
+ We measured the Pearson correlation coefficient with the dif and the 1.9 Mb site for 600bp sliding windows over the entire genome . 
+ Peaks with a Pearson correlation above 0.72 were considered as putative Topo IV cleavage sites . 
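+ The shape-based peak detection described above amounts to template matching : a reference profile is slid along the genome and windows whose Pearson correlation with it exceeds 0.72 are kept . The following Python sketch illustrates the idea under stated assumptions ; the function names , the step size and the toy template are hypothetical , and only the 0.72 threshold is taken from the text . 

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def scan_for_template(signal, template, step=1, threshold=0.72):
    """Return (start, r) for each window that resembles the template.

    A window is reported when its Pearson correlation with the template
    is above `threshold`. Flat windows are skipped because the
    correlation is undefined for them.
    """
    if max(template) == min(template):
        raise ValueError("template must not be constant")
    width = len(template)
    hits = []
    for start in range(0, len(signal) - width + 1, step):
        window = signal[start:start + width]
        if max(window) == min(window):
            continue
        r = pearson(window, template)
        if r > threshold:
            hits.append((start, r))
    return hits
```

+ Because the Pearson coefficient is insensitive to scale and offset , a low amplitude peak with the same shape as the template scores as highly as a strong one , which is why an additional criterion such as an IP/input ratio is useful to rank the candidates . 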
+ Sequencing data are available on the GEO Repository ( http://www.ncbi.nlm.nih.gov/geo/ ) with the accession number GSE75641 . 
+ Data were plotted with the Circos tool [ 43 ] and UCSC Archaeal Genome Browser [ 44 ] . 
+ Southern blot
+ Cleavage of DNA by Topo IV in the presence of Norfloxacin was monitored by Southern blot as previously described [ 10 ] . 
+ DNA was extracted from E. coli culture grown in minimal medium supplemented with glucose 0.2 % and casaminoacids 0.2 % . 
+ Norfloxacin ( final concentration 10μM ) was added to the cultures at OD 0.2 for 10 min before harvesting . 
+ DNA was transferred by neutral blotting on nitrocellulose membranes . 
+ For synchronization experiments , a flash freeze step in liquid nitrogen was included before harvesting . 
+ Quantification was performed with Image J software . 
+ EMSA
+ Experiments were conducted using Cy3-coupled probes harboring the dif site and a Cy5-coupled probe as control . 
+ Reactions were carried out in EMSA reaction buffer ( 1mM spermidine , 30mM potassium glutamate , 10mM DTT , 6mM magnesium chloride , 10 % glycerol , pH 7.4 ) . 
+ Reactions were incubated for 15 min at RT , loaded on a 4 % native PAGE gel at 25 volts and then run at 125 volts for 2 hours . 
+ Gels were then visualized using a Typhoon FLA 5000 scanner ( GE healthcare Life Science ) . 
+ EMSA of plasmids was performed with unlabeled supercoiled plasmid in the same reaction buffer . 
+ Electrophoresis was performed in a 0.8 % agarose gel in 0.5 x TAE buffer at 4 °C for 80 min at 150V . 
+ DNA labeling was performed with SYBR green . 
+ Supporting Information
+ S1 Fig . 
+ A ) Measure of the colony forming units ( CFU ) of the WT , nalR , ParC-Flag , ParC-Flag nalR and ParE-Flag nalR strains . 
+ Cultures were grown until OD 0.2 , treated for 40 minutes with norfloxacin 2μM and plated on LB plates . 
+ B ) Measure of the growth rate of the nalR , ParC-Flag nalR and ParE-Flag nalR strains . 
+ C ) Southern blot analysis of Topo IV mediated cleavage in the presence of norfloxacin at the 1.9 Mb site in the WT , nalR and ParC-Flag nalR and ParE-Flag nalR strains . 
+ ( PDF ) 
+ S2 Fig . 
+ Genome browser magnifications illustrating common non-specific signal observed over rRNA operons , tRNA and IS sequences . 
+ ParE-Flag ChIP-seq is represented in red , MatP-Flag ChIP-seq is represented in blue , Mock IP with a strain that did not contain Flag tagged proteins is represented in black . 
+ Genes , ribosomal operons and tRNA are represented below ChIP-seq signals . 
+ ( PDF ) 
+ S3 Fig . 
+ A ) Analysis of the Topo IV nonspecific binding . 
+ Normalized enrichment ( Average number of reads in a 1kb sliding window divided by the total amount of reads ) of each flag immuno-precipitation experiment was plotted as a function of the genomic position . 
+ Left panel a 100 kb region near oriC ( positions 4.26 to 4.36 Mb ) is represented . 
+ Right panel a 100 kb region around dif ( positions 1.55 to 1.65 Mb ) is represented . 
+ B ) Scatter plot of the average GC content according to parC-flag IP/Input . 
+ 60 kb sliding windows were used for GC content and IP/Input . 
+ C ) Average IP/Input values were normalized for GC content . 
+ D ) Null model I , a Topo IV comet follows replication forks . 
+ Illustration of the Topo IV binding kinetics under null model I described in S1 Text . 
+ The x axis in the plots represents the chromosome coordinate s , going between 0 ( ori ) and L ( ter ) . 
+ The y axis represents cell cycle time . 
+ The shaded areas are the positions of the Topo IV comets ( also sketched as red lines on a circular representation of the chromosome ) , and the numbers represent the number of bound regions per replichore . 
+ Left panel : case of non-overlapping rounds . 
+ Right panel : case of overlapping rounds , in the case where the B period starts after the termination of replication within the same cell cycle . 
+ E ) Topo IV binding bias , shown by the specific Input/IP values ( each normalized by total reads ) . 
+ This bias is not compatible with a model where Topo IV binding follows replication and persists for a characteristic period of time ( purple trace ) . 
+ ( PDF ) 
+ S4 Fig . 
+ Flow cytometry analysis of the synchronization experiment . 
+ Samples were fixed in ethanol at different time points : after 1h30 at 40 °C ( G1 ) , 20 min after downshift to 30 °C ( S20 ) , 40 min after downshift to 30 °C ( S40 ) , 60 min after downshift to 30 °C ( G2 ) and in stationary phase . 
+ ( PDF ) 
+ S5 Fig . 
+ A ) Genome browser magnifications illustrating common non specific signal observed over rRNA operon , IS sequences in the NorflIP and ChIP-seq experiments . 
+ ParE-Flag NorflIP is represented in purple , MatP-Flag ChIP-seq is represented in blue , Mock IP with a strain that did not contain Flag tagged proteins is represented in black . 
+ Genomic localizations are the same as in S2 Fig . 
+ B ) Southern blot cleavage assays performed in WT and nalR strains at the insH locus , ribosomal operon A and ribosomal operon B . 
+ Topo IV did not present any cleavage 
+ S6 Fig . 
+ A ) Snapshots of the ChIP-seq and NorflIP experiments at the position 1.85 and 1.92 Mb . 
+ Topo IV binding to position 1.85 Mb was only revealed by the ChIP-seq experiment in the presence of formaldehyde . 
+ Topo IV cleavage at position 1.92 Mb was only revealed by the NorflIP experiment . 
+ NorflIP peaks present a characteristic shape illustrated on the 1.92 Mb with a large 200 bp empty region in between the forward and reverse signal ( arrow ) . 
+ B ) Snapshot of the ChIP-seq and NorflIP experiments at the dif position . 
+ Topo IV binding ( ChIP-seq ) and cleavage ( NorflIP ) were detected at the dif position . 
+ C ) Description of the NorflIP peak calling procedure . 
+ Forward and reverse reads from the Flag immunoprecipitation were smoothed over 200 bp , and then subtracted from each other . 
+ The dif and 1.9 Mb signals observed on a 2kb window were used as a probe to test the entire genome with 100 bp sliding intervals . 
+ Pearson coefficients between the dif and 1.9 Mb signals and each interval were measured . 
+ Pearson coefficients above 0.72 were considered as putative Topo IV peaks . 
+ The initial list of Topo IV sites ( S1 Table ) corresponds to sites presenting a Pearson correlation above 0.72 in comparison with dif and 1.9 Mb . 
+ IP/input ratio was measured . 
+ 172 peaks with Pearson coefficient above 0.72 and an IP/input ratio > 2 were manually validated as Topo IV sites ( S1 Table ) . 
+ D ) Analysis of reads orientation in the NorflIP experiment at position 0.2 Mb . 
+ Forward and reverse read peaks are about 200 bp wide , and a 100-nucleotide gap is observed between the peaks . 
+ For the analysis of Topo IV cleavage site distribution we estimated that the center of the 100-nucleotide gap corresponds to the position of Topo IV cleavage . 
+ ( PDF ) 
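+ The peak calling procedure described in panel C can be illustrated with a minimal sketch. It assumes per-base forward and reverse read counts held in plain lists and a single template signal, whereas the legend uses both the dif and 1.9 Mb signals as probes. The function names, the centered moving-average smoothing and the handling of flat windows are illustrative assumptions, not the authors' implementation.

```python
from math import sqrt


def smooth(values, window=200):
    """Centered moving average over roughly `window` positions (assumed smoothing)."""
    half = window // 2
    out = []
    for i in range(len(values)):
        lo = max(0, i - half)
        hi = min(len(values), i + half + 1)
        out.append(sum(values[lo:hi]) / (hi - lo))
    return out


def strand_difference(forward, reverse, window=200):
    """Smooth each strand, then subtract reverse from forward."""
    return [f - r for f, r in zip(smooth(forward, window), smooth(reverse, window))]


def pearson(x, y):
    """Pearson correlation; returns 0.0 when either signal is flat."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def scan(forward, reverse, template, window=200, step=100, threshold=0.72):
    """Slide the template along the genome in `step` bp increments and keep
    intervals whose correlation with the template exceeds `threshold`."""
    diff = strand_difference(forward, reverse, window)
    width = len(template)
    hits = []
    for start in range(0, len(diff) - width + 1, step):
        r = pearson(diff[start:start + width], template)
        if r > threshold:
            hits.append((start, r))
    return hits
```

+ In the legend, candidate intervals from this step are further filtered by IP/input ratio ( > 2 ) and manual validation, which the sketch does not attempt.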
+ S7 Fig . 
+ Measure of the distance between two adjacent Topo IV cleavage sites in the dif region ( A ) and the region containing oriC and SNAP2 ( B ) . 
+ For this analysis the 571 Topo IV cleavage sites observed in the 3 experiments were pooled . 
+ C ) Distribution of the Topo IV cleavages inside genes and intergenic regions . 
+ The gene sizes were normalized to 1 . 
+ D ) RSAT analysis of the NorflIP peak calling results ( http://www.rsat.eu/; Thomas-Chollier M , Defrance M , Medina-Rivera A , Sand O , Herrmann C , Thieffry D , van Helden J. ( 2011 ) RSAT 2011 : regulatory sequence analysis tools . 
+ Nucleic Acids Res . 
+ 2011 Jul ; 39 . 
+ Analysis of the dinucleotide bias in 172 manually validated NorflIP Topo IV cleavage sites . 
+ On average , GC dinucleotides are enriched near the middle of the ChIP signal . 
+ E ) GATC spacing around Topo IV peaks detected with the NorflIP experiment . 
+ Average distances between two consecutive GATC are measured around ( + / - 20 GATC sites ) 172 validated Topo IV cleavage sites and 172 random sequences . 
+ ( PDF ) 
+ S8 Fig . 
+ A ) Box plot of the distribution of distance between TopoIV cleavages and the closest highly expressed transcription unit ( T.U. ) . 
+ For this analysis the 571 Topo IV cleavage sites observed in the 3 experiments were pooled . 
+ T.U. expression was determined by RNAseq . 
+ An arbitrary threshold was set at 500 reads , which corresponds to the 10 % most highly expressed T.U. 
+ The distribution of a random set of cleavage sites was used as control . 
+ The two distributions are statistically different according to an ANOVA test . 
+ The median distance is 8.5 kb for the TopoIV cleavage set and 12.3 kb for the random set . 
+ B ) Genome browser zoom on the 1.92 Mb region , where TopoIV cleavages were observed in a region with a number of highly expressed T.U. 
+ C ) Distribution of 458 Topo IV cleavages ( black ) and random sites ( grey ) between two consecutive highly expressed T.U. 
+ Topo IV cleavages are slightly more frequent near the T.U. than in the middle of the region . 
+ ( PDF ) 
+ S9 Fig . 
+ A ) Distribution of ParE-Flag 1 ChIP-seq enrichment in the region overlapping or not a H-NS binding site . 
+ B ) Box plot of the distribution of GC % in the regions depleted for Topo IV ( IP/input < 0.6 ) or enriched for Topo IV ( IP/input > 1.2 ) or enriched for H-NS . 
+ C ) Distribution of the GC % in 172 validated Topo IV cleavage sites as function of NorflIP IP/input signal . 
+ D ) Measure of the GC % in the 172 validated cleavage sites . 
+ GC % was measured in sliding windows of 20 bp and color coded . 
+ ( PDF ) 
+ S10 Fig . 
+ A ) Analysis of the robustness of the Topo IV-XerC-dif complex in the presence of increasing amounts of XerD protein . 
+ EMSA were performed with prebound Topo IV and XerC on dif and subsequent addition of XerD for 10 minutes before loading on the gel . 
+ B ) Analysis of Topo IV binding to negatively supercoiled plasmid by EMSA on agarose gel . 
+ Topo IV from 10 , 50 , 100 , 200 nM was added to the pFC24 ( dif ) plasmid in the presence of XerCD ( 25 or 50 nM ) . 
+ ( PDF ) 
+ S11 Fig . 
+ A ) Southern Blot analysis of Topo IV cleavage in the nalR strain at dif and an ectopic dif site located at 1.3 Mb on the genomic map . 
+ B ) Southern Blot analysis of Topo IV cleavage on a plasmid ( pFC25 ) carrying the dif region ( 10 kb around dif ) + or -- dif ( PDF ) 
+ S1 Table . 
+ Sheet 1 ) Validated ChIP-seq sites . 
+ Sheet 2 ) NorflIP sites observed in the ParC-Flag 1 NorflIP , ParE-Flag NorflIP and ParC-Flag 2 NorflIP . 
+ Sheet 3 ) Common NorflIP sites for the different experiments . 
+ Sheet 4 ) Manually Validated Topo IV cleavages . 
+ ( XLSX ) 
+ S1 Text . 
+ Model to test the correlation between TopoIV binding and the progression of replication . 
+ To test if ParC and ParE ChIP-seq biases were related to chromosome replication we constructed in silico models . 
+ The result of this null model is that in all cases ( overlapping or non-overlapping rounds ) the observed mean occupancy should follow the dosage . 
+ Hence the occupancy gap observed in S3E Fig in the Ter region ( when occupancy is normalized by dosage ) has to be interpreted as a sign that this model does not apply , at least in this region . 
+ ( DOCX ) 
+ Acknowledgments
+ We thank D. Sherratt , K. Marians , D. Grainger , J. Berger , P. Rousseau and F.X. Barre for the generous gift of proteins and plasmids . 
+ We thank Marie Franquin , Jorgelindo Da Veiga Moreira and Estelle Mignot for preliminary experiments . 
+ We thank Stephane Marcand for Southern blot tips . 
+ We thank Charlotte Cockram for careful reading of the manuscript . 
+ We thank Ivan Junier and Thibault Lepage for technical help with Circos . 
+ We thank the IMAGIF genomic facility for deep sequencing . 
+ Author Contributions
+ Conceived and designed the experiments : HES LLC EL FC OE . 
+ Performed the experiments : HES LLC EL EV CP . 
+ Analyzed the data : HES LLC EL FC MCL OE . 
+ Wrote the paper : FC MCL OE . 
+ Conceived and analyzed models : MCL OE .
\ No newline at end of file
--- a/data/TEXT_FILES/useful_txt/27492737.txt 0 → 100644
View file @27818a9
+++ b/data/TEXT_FILES/useful_txt/27492737.txt 0 → 100644
View file @27818a9
+ Plasmid
+ abstract 
+ Conjugation plays an important role in the horizontal movement of DNA between bacterial species and even genera . 
+ Large conjugative plasmids in Gram-negative bacteria are associated with multi-drug resistance and have been implicated in the spread of these phenotypes to pathogenic organisms . 
+ A/C plasmids often carry genes that confer resistance to multiple classes of antibiotics . 
+ Recently , transcription factors were characterized that regulate A/C conjugation . 
+ In this work , we expanded the regulon of the negative regulator Acr2 . 
+ We developed an A/C variant , pARK01 , by precise removal of resistance genes carried by the plasmid in order to make it more genetically tractable . 
+ Using pARK01 , we conducted RNA-Seq and ChAP-Seq experiments to characterize the regulon of Acr2 , an H-NS-like protein . 
+ We found that Acr2 binds several loci on the plasmid . 
+ We showed , in vitro , that Acr2 can bind speciﬁc promoter regions directly and identify key amino acids which are important for this binding . 
+ This study further characterizes Acr2 and suggests its role in modulating gene expression of multiple plasmid and chromosomal loci . 
+ © 2016 Elsevier Inc. All rights reserved . 
+ Received in revised form 26 July 2016 Accepted 28 July 2016 Available online 1 August 2016
+ 1. Introduction
+ The IncA/C plasmid type has spread globally and is now a major contributor to multidrug resistance in enteric pathogens ( Welch et al. , 2007 ; Fernández-Alarcón et al. , 2011 ; Fricke et al. , 2009 ; Harmer and Hall , 2014 ) . 
+ This plasmid type is modular in nature , containing several variable regions encoding antimicrobial resistance mechanisms and a conserved backbone of core genes for plasmid replication and maintenance ( Fernández-Alarcón et al. , 2011 ; Harmer and Hall , 2014 ; Fricke et al. , 2011 ; Carraro et al. , 2014a ) . 
+ Recent work characterized several transcriptional regulators encoded by the core backbone ( Carraro et al. , 2014b ) . 
+ These regulators were found to regulate conjugative transfer of IncA/C plasmids . 
+ Interestingly , one of the negative regulators , Acr2 , is similar to H-NS , a chromosomally encoded , global transcriptional regulator ( Fernández-Alarcón et al. , 2011 ) . 
+ There are several nucleoid-associated proteins encoded by E. coli . 
+ H-NS is among the most well studied ( Dillon and Dorman , 2010 ) . 
+ H-NS binds preferentially to DNA with a low G + C content ( Fang and Rimsky , 2008 ) . 
+ These sequences serve as sites of initial binding . 
+ After these initial interactions , other copies of H-NS can bind to each other via protein-protein interactions , as well as to the DNA , forming large complexes ( Fang and Rimsky , 2008 ; Bouffartigues et al. , 2007 ) . 
+ This is achieved through two distinct domains . 
+ The N-terminus of H-NS contains an oligomerization domain and the C-terminus contains a DNA binding domain ( Ali et al. , 2013 ) . 
+ Corresponding author at : University of Minnesota , Department of Veterinary and Biomedical Sciences , 1971 Commonwealth Ave , St. Paul , MN 55108 , United States . 
+ E-mail address : joh04207@umn.edu ( T.J. Johnson ) . 
+ Primarily , H-NS binding results in silencing of adjacent gene transcription ( Navarre et al. , 2006 ; Stoebel et al. , 2008 ) . 
+ Chromosomal copies of H-NS play an important role in regulation of horizontally acquired DNA , such as pathogenicity islands ( Dorman , 2007 ; Ali et al. , 2014 ; Navarre et al. , 2007 ) . 
+ In some cases , mobile elements encode their own copies of H-NS homologs ( Yun et al. , 2010 ; Dillon et al. , 2010 ; Müller et al. , 2010 ) . 
+ These horizontally encoded homologs of H-NS have been shown to antagonize the binding of the chromosomal copies of H-NS to horizontally acquired DNA ( Stoebel et al. , 2008 ; Bustamante et al. , 2001 ) . 
+ These antagonistic H-NS homologs have been found only on genomic islands of pathogenic E. coli , which is intuitive , given their speciﬁc function . 
+ There have been a few studies focusing on plasmid encoded H-NS homologs ( Yun et al. , 2010 ; Dillon et al. , 2010 ; Forns et al. , 2005 ; Doyle et al. , 2007 ) . 
+ H-NS homologs from plasmid pSfR27 and pCAR ( Sfh and Pmr , respectively ) seem to play roles in regulating a diverse set of genes , some of which are regulated by the chromosomally encoded H-NS copies ( Yun et al. , 2010 ; Dillon et al. , 2010 ; Doyle et al. , 2007 ) . 
+ It has been proposed that uncontrolled expression of these genes , caused by plasmid acquisition , could lead to a reduction in ﬁtness and subsequent loss of the plasmid from the population . 
+ The H-NS homolog encoded on the R27 plasmid of E. coli , H-NSR27 , has been shown to directly interact with the plasmid 's origin of replication , oriT and other transfer associated genes to regulate conjugation ( Forns et al. , 2005 ) . 
+ H-NSR27 was shown to be involved in an intricate interplay of chromosomally encoded H-NS homologs to thermally regulate the expression of the conjugative transfer apparatus of R27 . 
+ These recent studies exemplify the diverse roles these plasmid encoded H-NS homologs serve . 
+ Given how widely distributed these homologs are among plasmid types , the true diversity of roles for H-NS homologs is unknown ( Shintani et al. , 2015 ) . 
+ In this study , we characterize Acr2 , an H-NS-like protein that was found to negatively regulate conjugative transfer . 
+ RNA-Seq and ChAP-Seq ( Chromatin Afﬁnity Precipitation-Seq ) were used to characterize Acr2 binding sites and regulatory network . 
+ We show that Acr2 binds multiple loci on the plasmid , speciﬁcally in regions of transfer genes and transposons carried by the plasmid . 
+ Additionally , we found that Acr2 binds several loci on the host bacterial chromosome and may directly alter host gene expression . 
+ Our sequence analysis indicates that Acr2 shares a DNA binding motif with that of other H-NS homologs and using site-directed mutagenesis we demonstrate that these amino acids are critical for its function as a repressor of conjugation . 
+ 2. Materials and methods
+ 2.1. Bacterial strains, plasmids and growth conditions
+ The bacterial strains and plasmids are listed in Table 1 . 
+ All strains were routinely grown in Difco ™ Luria-Bertani ( LB ) broth or LB agar at 37 °C unless otherwise noted . 
+ Broth cultures were grown at 37 °C in a shaking incubator ( 200 RPM ) unless otherwise noted . 
+ Media were supplemented with ampicillin ( Amp , 100 μg / mL ) , chloramphenicol ( Cm , 20 μg / mL ) , nalidixic acid ( Nal , 30 μg / mL ) , kanamycin ( Kan , 100 μg / mL ) , rifampicin ( Rif , 100 μg / mL ) and tetracycline ( 12.5 μg / mL ) as needed . 
+ Counter-selections were done using M9 minimal agar supplemented with 0.2 % rhamnose . 
+ Arabinose induction of pBAD22 vector constructs was achieved with a ﬁnal concentration of 0.02 % . 
+ X-gal was added to agar plates at a concentration of 40 μg / mL . 
+ Diaminopimelic acid ( DAP ) was added to a ﬁnal concentration of 300 μM to facilitate growth of WM3064 . 
+ 2.2. Strain construction via recombineering
+ All deletions and mutations were done via λ-Red mediated recombination ( recombineering ) with some variations . 
+ In order to delete the resistance genes from pAR060302 , strain DY331 was used because it expresses recombineering genes from the chromosome and not from a plasmid which must be selected for . 
+ To ease transfer of pAR060302 , it was moved into a nalidixic acid resistant variant of strain WM3064 , which is a DAP auxotroph . 
+ To obtain a nalidixic acid resistant WM3064 , it was grown at 37 °C with shaking ( 200 RPM ) in LB broth supplemented with DAP overnight . 
+ This culture was used to inoculate a new 10 mL LB + Dap broth culture , which was incubated for 4 h at 37 °C with shaking ( 200 RPM ) . 
+ The cells were pelleted by centrifugation at 8000 RPM for 10 min . 
+ The pellet was resuspended with 100 μL of LB broth and used to inoculate an LB agar plate supplemented with Nal and Dap to select a spontaneous nalidixic acid resistant mutant ( WM3064nalR ) . 
+ This plate was incubated at 37 °C overnight . 
+ Isolated colonies were streak puriﬁed on a new LB + Nal + Dap agar plate . 
+ The pAR060302 parental E. coli strain , AR060302 , was mated with WM3064nalR . 
+ Brieﬂy , WM3064nalR was struck on an LB + Dap plate and the WT strain AR060302 was struck over the top . 
+ The plate was incubated overnight . 
+ The resulting growth was struck onto an LB + Nal + Cm + Dap plate to select for WM3064nalR ( pAR060302 ) transconjugants . 
+ WM3064nalR ( pAR060302 ) was used to transfer pAR060302 into strain DY331 in a similar manner except selection for transconjugants was achieved by growth on LB + Cm plates ( No DAP ) to select for DY331 ( pAR060302 ) . 
+ PCR was used to generate the tet knockout amplicon by amplifying the neo-ccdB cassette from pKD45 using the primers pARdeltaTet-fw and rv ( Table 2 ) ( 12.5 μL Phusion 2 × master mix ( Life Technologies ™ ) , 500 nM of each primer , 1 μL of template ( boiled cells ) , PCR conditions : 95 °C 5 min , 25 cycles of 95 °C 30 s , 55 ° C 30 s , 72 °C for 2 min ; then a ﬁnal incubation at 72 °C for 10 min ) . 
+ The resulting amplicon was puriﬁed with a Qiagen ™ PCR cleanup kit . 
+ The neo-ccdB cassette from pKD45 allows for selection with kanamycin and then removal of the cassette ( scar-less or otherwise ) using counterselection as expression of ccdB ( via growth on minimal rhamnose plates ) is lethal . 
+ Strain DY331 ( pAR060302 ) was grown in LB + Cm broth at 32 °C with shaking ( 200 RPM ) overnight . 
+ The overnight culture was used to inoculate a new 50 mL LB + Cm broth culture ( 1:100 dilution ) . 
+ This was grown until an OD600 ~ 0.4 at which time it was moved to a 43 °C shaking water bath and incubated for 20 min to prime cells for recombineering . 
+ After the incubation the ﬂask was cooled on ice for 10 min . 
+ The cells were pelleted by centrifugation at 8000 RPM for 5 min at 4 °C . 
+ The cell pellet was resuspended with 25 mL ice cold H2O . 
+ The cells were collected again by centrifugation and washed an additional time . 
+ The ﬁnal cell pellet was resuspended in 200 μL of ice cold H2O . 
+ A 40 μL aliquot of washed cells was electroporated with 0.5 -- 1 μg of puriﬁed amplicon . 
+ Cells were recovered with LB broth and incubated at 32 ° C for 4 h . 
+ After recovery cells were plated on LB + Kan plates . 
+ Isolated colonies were checked for growth on LB + Tet plates to conﬁrm disruption of the tet ( A ) gene . 
+ Successful mutants were mated on LB plates with E. coli strain K-12 MG1655 carrying pSIM5-Tet at 32 °C overnight . 
+ Successful transconjugants were selected on LB + Cm + Tet plates to select for pAR060302 ( Δtet : : neo-ccdB ) and pSIM5-tet . 
+ This strain was used to remove the neo-ccdB cassette by recombineering using a ssDNA substrate ( Sawitzke et al. , 2011 ) . 
+ Cells were primed for recombineering as above and the 70 nt oligonucleotide DeltaTetRepairOligo ( Table 2 ) was electroporated . 
+ Cells were recovered and plated on M9 + Rhamnose + Cm plates . 
+ This resulted in pAR060302 ( Δtet ) , where the tet locus was deleted and the neo-ccdB cassette was removed . 
+ Isolated colonies were checked for mutations via PCR . 
+ The entire insertion-removal process was then done a second time to target and remove the blaCMY-2 gene using the appropriate primers . 
+ This resulted in pARK01 , a tetracycline and ampicillin susceptible variant of pAR060302 . 
+ Deletions of predicted transcriptional regulators on pARK01 were done via recombineering in strain K12 ( pARK01 , pSIM5-tet ) . 
+ The plasmid pKD4 ( Datsenko and Wanner , 2000 ) carrying the FRT-neo-FRT cassette was used as a template for PCR ( 12.5 μL Phusion 2 × master mix ( Life Technologies ™ ) , 500 nM of each primer , 1 μL of template ( boiled cells ) , PCR conditions : 95 °C 5 min , 25 cycles of 95 °C 30 s , 55 ° C 30 s , 72 °C for 2 min ; then a ﬁnal incubation at 72 °C for 10 min ) . 
+ PCR amplicons were subjected to a DpnI digestion to eliminate template plasmid and then electroporated into cells primed for recombineering . 
+ Resulting colonies were checked by PCR for mutations . 
+ Successful mutants were then grown and made electrocompetent and the plasmid pCP20 was electroporated . 
+ These transformants were grown at 32 °C for 48 h to `` ﬂip '' out the neo cassette , leaving a copy of FRT . 
+ The plasmid pCP20 was lost via incubation at 37 °C for 48 h. Colonies were screened via PCR and amplicons were veriﬁed by sequencing . 
+ Alterations in the AT-Hook motif of Acr2 were achieved by ﬁrst inserting the neo-ccdB cassette ( ampliﬁed using delATHOOKAcr2-kanccdB-fw and - rv primers ) into acr2 , and then through another round of recombineering using the appropriate ssDNA oligo and counterselection on minimal rhamnose plates to remove the cassette and alter the coding sequence as desired . 
+ 2.3. Molecular cloning methods
+ For constructs using pLUXtet , containing a promoterless lux operon , cloning was done via double digestion with SpeI and BamHI and ligation with DNA ligase ( NEB ) . 
+ Oligonucleotides used to generate PCR amplicons of promoter regions with ﬂanking ends containing restriction sites are listed in Table 2 . 
+ To clone genes from pAR060302 into pBAD22 ( Guzman et al. , 1995 ) , in vivo cloning via recombineering was used ( Lee et al. , 2001 ) . 
+ In order to generate a template for PCR , pBAD22 was digested overnight using BamHI . 
+ The resulting digestion was used as template for PCR ( 12.5 μL Phusion 2 × master mix ( Life Technologies ™ ) , 500 nM of each primer , 1 μL of template , PCR conditions : 95 °C 5 min , 25 cycles of 95 °C 30 s , 55 °C 30 s , 72 °C for 6 min ; then a ﬁnal incubation at 72 °C for 10 min ) . 
+ Primers used for in vivo cloning were designed to amplify the vector pBAD22 , and add 40 bp regions of homology starting with 2nd codon of the gene of interest to the reverse primer and 40 bp of homology ending with the codon next to the stop codon for the forward primer . 
+ PCR amplicons were puriﬁed with a Qiagen ™ PCR Cleanup Kit and electroporated into K-12 ( pARK01 , pSIM5-tet ) cells that were primed for recombineering . 
+ After 4 h of recovery in LB broth at 32 °C , cells were plated on LB + Amp + Cm plates to select for strains still carrying pARK01 as well as the newly recombined pBAD22 with appropriate insertion . 
+ Subsequent colonies were checked via PCR for insertions in pBAD22 . 
+ Successfully cloned plasmids were conﬁrmed via DNA sequencing . 
+ The pSIM5-tet plasmid was lost by incubation at 37 °C for 24 h. 
+ 2.4. Conjugation experiments
+ E. coli strain DH10 was used as a recipient in all conjugation assays . 
+ Donor and recipient cells were grown overnight at 37 °C in LB broth with the appropriate selection . 
+ The overnight cultures were used to inoculate new 5 mL cultures in LB broth with no selection . 
+ The new cultures were incubated for 4 h ; if arabinose induction was needed it was done at 2 h of incubation . 
+ After 4 h of incubation , 0.5 mL of donor and recipient cells were added to a 1.5 mL centrifuge tube , mixed by pipetting and incubated at 37 °C without shaking for 1 h . 
+ The mating reactions were then vortexed and placed immediately on ice . 
+ They were subsequently diluted in 1 × phosphate buffered saline ( PBS ) and plated to select for transconjugants and donors . 
+ 2.5. Lux reporter assays
+ Overnight cultures were grown with the appropriate selections in LB broth at 37 °C with shaking ( 200 RPM ) . 
+ These cultures were diluted 1:100 in new broth with selection . 
+ These cultures were grown under the same conditions for 2 h . 
+ At that time , the cultures were split in half . 
+ One half was treated with arabinose ( 0.02 % ) and the other half received no treatment . 
+ The cultures were allowed to grow for 2 h. All cultures were then aliquoted into a 96-well plate , 200 μL per well . 
+ Plates were then read on a Bio-Tek plate reader . 
+ Cell density and arbitrary light units were measured . 
+ Bioluminescence was standardized for cell density by dividing light units by the OD600 absorbance . 
+ Each value represents the mean of 3 experiments . 
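+ The normalization described above amounts to a per-sample division followed by an average over experiments. The following is a minimal sketch, assuming each experiment yields one ( light units , OD600 ) pair per condition; the function names are hypothetical and are not taken from the study's scripts.

```python
from statistics import mean


def normalized_luminescence(light_units, od600):
    """Standardize bioluminescence for cell density (light units / OD600)."""
    if od600 <= 0:
        raise ValueError("OD600 must be positive")
    return light_units / od600


def mean_normalized(replicates):
    """Mean of the density-normalized signal over experiments.

    `replicates` is an iterable of (light_units, od600) pairs, one per experiment.
    """
    return mean(normalized_luminescence(lu, od) for lu, od in replicates)
```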
+ 2.6. RNA isolation and sequencing for RNA-Seq
+ Strains DH10B ( pAR ) and DH10B ( pARΔacr2 ) were grown until an OD600 of 0.5 was achieved . 
+ Cells were pelleted and RNA was puriﬁed using a commercially available RNA extraction kit ( Qiagen ) . 
+ Treatments were included to remove DNA contamination ( Qiagen ) and ribosomal RNA ( MicrobExpress , Ambion ) . 
+ Two biological replicates for each strain were pooled for paired-end library sequencing ( either 50 or 100 bp reads ) via Illumina Genome Analyzer II at the University of Minnesota Genomics Center . 
+ All of the sequencing data are publicly available under the NCBI BioProject ID PRJNA273283 . 
+ 2.7. RNA-Seq analysis
+ All Perl scripts and other computational biology resources used in this study can be found at https://github.com/kevinslang . 
+ cDNA reads were ﬁrst trimmed so that the quality at each base position was above 30 and then mapped to the appropriate genome or plasmid sequence ( for pAR060302 , GenBank accession no. NC_012692 ; for DH10 , the published E. coli K-12 MG1655 sequence was used , GenBank accession no. NC_000913 ) . 
+ Read mapping was done using BOWTIE ( Langmead et al. , 2009 ) . 
+ For each host , transcriptome maps of pAR060302 were constructed using Circos ( Krzywinski et al. , 2009 ) . 
+ To achieve this , a table was generated containing the average number of reads mapped per 250 bp of plasmid sequence . 
+ Each average was then normalized per 1 million total reads in the cognate sequence library . 
+ These averages were then log transformed and plotted as a line plot . 
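+ The binning, per-million normalization and log transformation described above can be sketched as follows. This is an illustration under stated assumptions: reads are represented by their 0-based start positions, each read is assigned to the 250 bp bin containing its start, and the log base ( 10 ) and pseudocount ( 1 ) are choices made here because the text does not specify them.

```python
from math import log10


def binned_log_coverage(read_starts, seq_length, total_reads,
                        bin_size=250, pseudocount=1.0):
    """Count reads per bin, scale to reads per million, then log-transform."""
    n_bins = (seq_length + bin_size - 1) // bin_size  # last bin may be partial
    counts = [0] * n_bins
    for pos in read_starts:
        if 0 <= pos < seq_length:
            counts[pos // bin_size] += 1
    scale = 1_000_000 / total_reads  # normalize per 1 million total reads
    # The pseudocount (an assumption) keeps empty bins defined after the log.
    return [log10(c * scale + pseudocount) for c in counts]
```

+ The resulting list holds one value per bin and could be written out as a track for a line plot.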
+ For statistical testing of differentially expressed genes , the total number of reads mapped to each coding sequence ( CDS ) was calculated using Perl . 
+ These values were then analyzed using the R package EdgeR ( Robinson et al. , 2010 ; Robinson et al. , 2011 ) . 
+ We conservatively estimated the dispersion at 0.001 . 
+ A fold-change cutoff of > 2 or < − 2 and an adjusted p-value of < 0.05 were used to deﬁne signiﬁcantly differentially expressed genes . 
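+ The significance filter can be sketched in a few lines. The sketch assumes a signed fold-change convention in which down-regulation is reported as a negative value ( for example , a ratio of 0.25 becomes − 4 ) , which is one common reading of a two-sided fold-change cutoff of 2; the function names and the input tuple layout are hypothetical, and the adjusted p-values are taken as given rather than recomputed.

```python
def signed_fold_change(mutant, wild_type):
    """Return the expression ratio, or its negative reciprocal if below 1."""
    if mutant <= 0 or wild_type <= 0:
        raise ValueError("expression values must be positive")
    ratio = mutant / wild_type
    return ratio if ratio >= 1 else -1.0 / ratio


def significant_genes(genes, fc_cutoff=2.0, alpha=0.05):
    """Keep genes passing both the fold-change and adjusted p-value cutoffs.

    `genes` is an iterable of (name, signed_fold_change, adjusted_p) tuples.
    """
    return [name for name, fc, adj_p in genes
            if abs(fc) > fc_cutoff and adj_p < alpha]
```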
+ 2.8. ChAP-Seq experiments
+ K12 ( pARK01 , pBacr26xHis ) was grown overnight in LB + Cm + Amp at 37 °C with shaking ( 200 RPM ) . 
+ The overnight culture was used to inoculate a new 10 mL LB + Cm + Amp culture ( 1:100 dilution ) . 
+ This was grown for 2 h at 37 °C with shaking ( 200 RPM ) . 
+ Arabinose was added to a ﬁnal concentration of 0.02 % and the culture was incubated for 2 additional hours . 
+ Cells were ﬁxed with the addition of formalin to a ﬁnal concentration of 1 % . 
+ Fixed cells were incubated at RT for 20 min . 
+ The formalin was quenched by addition of glycine to a ﬁnal concentration of 0.5 M and incubation for 5 min at RT . 
+ Cells were collected by centrifugation at 8000 RPM for 5 min . 
+ The cell pellet was washed 1 × with 10 mL of cold PBS and collected again . 
+ The resulting pellet was resuspended in 1 × lysis solution ( MagneHIS Kit ™ , Promega ) . 
+ The cells were sonicated on ice at 5 watts in 30 s bursts separated by 1 min rests , repeated 10 times . 
+ The cell lysates were spun at 12,000 g for 5 min and the supernatant was moved to a 1.5 mL centrifuge tube containing 50 μL of MagneHIS particles . 
+ This suspension was taken through the MagneHIS kit protocol . 
+ After elution , cross-linked DNA was released via incubation at 65 °C for 18 h . 
+ The resulting DNA was puriﬁed with a Qiagen ™ PCR Clean up Kit and sent for paired-end sequencing using the Illumina MiSeq platform . 
+ The library generation and sequencing was done at the University of Minnesota Genomics Center . 
+ In total , 2 biological replicates were sequenced separately . 
+ As a negative control , two genomic DNA preparations of the same cells were done using a Qiagen ™ DNeasy ™ kit and ~ 400 ng of each was sent for sequencing . 
+ The reads that resulted from the MiSeq run were all trimmed to 50 bp from the 3 ′ end using the tool Trimmomatic ( Bolger et al. , 2014 ) . 
+ The trimmed reads were then mapped to either the pAR060302 sequence or the K-12 genome sequence with BWA using the default parameters ( Li and Durbin , 2009 ) . 
+ Reads that were mapped correctly were ﬁltered using SamTools ( Li et al. , 2009 ) . 
+ The replicate libraries were then merged using SamTools . 
+ The resulting read alignment ﬁles were then analyzed using MACS ( Zhang et al. , 2008 ) . 
+ Peak summit coordinates were used to extract 200 bp of sequence surrounding them using BedTools ( Quinlan and Hall , 2010 ) . 
+ These sequences were combined into a multifasta ﬁle and submitted to MEME for motif analysis ( Bailey et al. , 2009 ) . 
+ ChAP-Seq read alignments were also subjected to read counting on 150 bp windows of the pAR060302 sequence for visualization in Circos ( Krzywinski et al. , 2009 ) . 
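+ The summit-centered extraction step can be illustrated with a short sketch. It assumes 0-based, half-open coordinates and interprets "200 bp of sequence surrounding" a summit as 100 bp on each side, clipped at the sequence ends; both are assumptions, and the functions below are hypothetical stand-ins rather than the BedTools commands used in the study.

```python
def summit_windows(summits, seq_length, flank=100):
    """Return (start, end) intervals of up to 2 * flank bp around each summit."""
    windows = []
    for summit in summits:
        start = max(0, summit - flank)
        end = min(seq_length, summit + flank)
        windows.append((start, end))
    return windows


def to_multifasta(sequence, windows, name="peak"):
    """Format the windowed subsequences as a multi-FASTA string."""
    records = []
    for i, (start, end) in enumerate(windows, 1):
        records.append(f">{name}_{i}:{start}-{end}\n{sequence[start:end]}")
    return "\n".join(records) + "\n"
```

+ A multi-FASTA string built this way is the kind of input a motif finder expects.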
+ 2.9. Protein puriﬁcation and EMSA
+ The pBacr26xHis construct was transformed into the BL21 ( DE3 ) strain . 
+ The resulting strains were cultured in LB + AMP at 37 °C for 1 h with shaking ( 250 RPM ) . 
+ Arabinose was added to a ﬁnal concentration of 0.2 % prior to growing the cultures for 5 h at 28 °C . 
+ Cells were spun at 4500 × g for 30 min , resuspended in 5 mL cell lysis buffer ( 20 mM Tris pH 8 , 500 mM NaCl , 5 mM imidazole , 5 mM β-mercaptoethanol ) and sonicated . 
+ The cellular debris was removed by centrifugation at 12,000 g for 15 min . 
+ 200 μL of MagnePURE beads were added to the supernatant and these were incubated for 1 h on a rocking platform , washed twice with high salt washing buffer ( 20 mM Tris pH 8 , 1 M NaCl , 5 mM β-mercaptoethanol ) and twice with low salt washing buffer ( 20 mM Tris pH 8 , 0.5 M NaCl , 30 mM imidazole , 5 mM β-mercaptoethanol ) , and then eluted with 100 μL elution buffer ( 20 mM Tris pH 8 , 500 mM NaCl , 500 mM imidazole ) . 
+ Pefabloc ™ ( Roche ) and glycerol were added to ﬁnal concentrations of 1 mg/mL and 5 % , respectively . 
+ Puriﬁed proteins were analyzed by SDS-PAGE gel electrophoresis and stored at − 80 °C . 
+ Two DNA fragments were used for EMSA analysis : the region upstream of acr1 ( ~ 480 bp ) and the ﬂoR promoter ( ~ 150 bp ) . 
+ These fragments were ampliﬁed by PCR using the primers Orf184-EMSA-F and Orf184-EMSA-R ( acr1 ) , and Flo-F and Flo-R ( ﬂoR ) . 
+ The amplicons were puriﬁed using a PCR cleanup kit ( Qiagen ) . 
+ Various concentrations of puriﬁed Acr2 were incubated with 10 nM DNA in binding buffer ( 15 mM HEPES pH 7.9 , 40 mM KCl , 1 mM EDTA , 1 mM DTT , 5 % glycerol ) for 30 min . 
+ The reactions were separated by gel electrophoresis for 2.5 h at 70 V on a 7.5 % native polyacrylamide gel at 4 °C ( buffered with Tris glycine pH 8.0 ) . 
+ Gels were stained with SYBR Green for 20 min at room temperature , washed twice with ddH2O , and DNA complexes were visualized with ultraviolet light . 
+ 2.10. Development of acr1-lacZ fusion construct and detection of LacZ activity
+ Recombineering was used to replace the E. coli chromosomal genes lacY and lacA with an FRT-neo-FRT cassette . 
+ The lacZ gene , starting at the 4th codon was then ampliﬁed , along with the FRT-neo-FRT cassette with primers that contained 40 bp of homology with acr1 and the region upstream of acrC . 
+ This amplicon was used to replace acr1-acrC starting with the 4th codon of acr1 in strain MC4100 ( pARK01 , pSIM5-Tet ) via recombineering and the neo-FRT cassette was removed by introduction of pCP20 expressing the FLP recombinase . 
+ In the same strain , acr2 was disrupted via recombineering . 
+ These reporter cells were then transformed with constructs overexpressing Acr2 and the relevant Acr2 variants . 
+ They were struck onto LB + Amp + Xgal plates with and without arabinose . 
+ 3. Results
+ 3.1. Acr2 is an H-NS-like protein that represses conjugative transfer of IncA/C plasmids
+ To further investigate IncA/C plasmid conjugation , we developed a plasmid variant by systematically deleting the tetracycline and beta-lactam resistance genes ( see Materials and Methods ) to generate pARK01 . 
+ We demonstrate that this variant is no different in terms of conjugation frequency than the wild-type plasmid ( Fig. S1A ) . 
+ Recently , several transcriptional regulators on the IncA/C plasmid pVCR94ΔX were characterized in terms of their ability to regulate conjugative transfer ( Carraro et al. , 2014b ) . 
+ However , pVCR94ΔX was isolated from a Vibrio isolate and a deletion of unknown content was made in its development for genetic studies ( Carraro et al. , 2014a ) . 
+ To be certain that the functions were conserved in our IncA/C plasmid , pAR060302 ( isolated from an E. coli ) , we repeated several experiments using mutants of pAR060302 . 
+ Our results were mostly consistent with what was previously found ( Fig. S1C and D ) , which indicated that pAR060302 was regulated in a similar manner to pVCR94ΔX . 
+ The putative protein encoded by orf183 shares homology only with hypothetical proteins . 
+ Because of its proximity to the other predicted transcriptional regulators , we wondered if it may be involved in regulating conjugative transfer ( Fig. S1B ) . 
+ Deletion of this gene does result in a slight increase in transconjugants ; however , we could not rule out that this was due to a polar effect of the deletion ( Fig. S1C ) . 
+ Given the small magnitude of the effect , the putative protein produced by orf183 is unlikely to be a major contributor to repression of conjugation . 
+ In our experiments , deletion of acr1 did not alter the frequency of conjugative transfer of pARK01 . 
+ However , given the strong evidence previously reported on acr1 ( Carraro et al. , 2014b ) , we acknowledge that this could be due to differences in our experimental set up . 
+ To further investigate Acr2 , we generated an Acr26xHis construct and conﬁrmed its ability to complement an acr2 deletion mutant by conducting transfer experiments . 
+ Plasmids lacking acr2 exhibit a 10-fold increase in conjugation frequency compared to wild-type ( Fig. 1 ) . 
+ The observation , from our data and that of others ( Carraro et al. , 2014b ) , that Acr2 negatively regulates conjugation corroborated our RNA-Seq results comparing WT and Δacr2 plasmids . 
+ Nearly all of the genes that are predicted to be involved in conjugative transfer were significantly up-regulated ( at least 2-fold and p-value < 0.05 ) in the Δacr2 strain ( Fig. 2 , 4th ring ) . 
+ In addition to the transfer genes , the region encoding many putative hypothetical proteins and putative phage-like proteins ( bp positions ~ 82 k -- 100 k ) was also up-regulated in our experiment . 
+ Again these results are congruent with what has previously been reported , demonstrating that pARK01 is regulated in the same manner as other IncA/C plasmids and that our Acr26xHis construct is functionally analogous to the wild-type protein . 
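+ The significance criterion quoted above can be sketched as a simple filter , reading the thresholds in the text as a fold change of at least 2 and a p-value below 0.05 . This is an illustrative sketch only : the function name , the gene labels and the example values are hypothetical and are not taken from the RNA-Seq analysis itself . 

```python
def is_upregulated(fold_change, p_value, min_fold=2.0, alpha=0.05):
    """Return True when a gene passes both thresholds.

    Assumed reading of the text: fold change >= 2 and p-value < 0.05.
    """
    return fold_change >= min_fold and p_value < alpha


# Hypothetical (fold change, p-value) pairs keyed by made-up gene labels.
examples = {
    "geneA": (3.1, 0.001),  # passes both thresholds
    "geneB": (1.4, 0.001),  # fold change too small
    "geneC": (2.5, 0.20),   # p-value too large
}

passing = sorted(
    name for name, (fc, p) in examples.items() if is_upregulated(fc, p)
)
print(passing)
```

+ Both conditions must hold at once , so a large fold change with a weak p-value is excluded just like a small but statistically robust change . 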
+ 3.2. Characterization of Acr2 binding sites
+ We performed a ChAP-Seq experiment in vivo to better understand where Acr2 binds on both the plasmid and host bacterial chromosome . 
+ Using nickel afﬁnity chromatography to pull down Acr26xHis , cross-linked DNA was extracted and subsequently sequenced . 
+ Analysis of the ChAP-Seq data revealed that Acr26xHis binds to several loci on the plasmid ( Fig. 2 , rings 1 and 2 ) . 
+ Some of the ChAP peaks overlapped with genes of unknown function , such as near base pair coordinates 19 k , 45 k , 140 k , and 145 k . 
+ The regions that showed greatest binding were the entire ISEcp1 region , the regions within the traFHG and the region upstream of acr1 . 
+ Binding upstream of acr1 would suggest a direct repression of the operon encoding the positive regulators acrDC . 
+ Our ChAP data suggest that Acr26xHis also binds to chromosomal DNA sequences . 
+ However , like other H-NS-like proteins , the interactions are not easily interpretable . 
+ For example , we found only three instances of binding to genes that were subsequently found to be significantly differentially expressed in our RNA-Seq data ( Fig. 3 ) . 
+ Interestingly , all three of these genes have functions associated with metabolism , most notably the glcB gene which was found previously as being up regulated in E. coli carrying pAR060302 , compared to cells lacking the plasmid ( Lang and Johnson , 2015 ) . 
+ We used MACS ( Model-based Analysis of ChIP-Seq ) ( Zhang et al. , 2008 ) to determine significantly enriched peaks in DNA pulled down with Acr26xHis . 
+ The sequence surrounding the summits of each of the signiﬁcant ChAP peaks found by MACS was submitted to MEME ( Multiple Em for Motif Elicitation ) to determine if there were common motifs within the signiﬁcant peaks . 
+ The motif discovered in a majority of the peaks is A + T rich , similar to that of other H-NS-like proteins ( Fig. 4B ) . 
+ Electromobility shift assays ( EMSA ) were used to conﬁrm that Acr26xHis binds to the DNA fragment upstream of acr1 containing the motif , but not a different A + T rich promoter sequence located upstream of the ﬂoR gene on plasmid pAR060302 ( Fig. 4A ) . 
+ Analysis using the tool Bprom ( Solovyev and Salamov , 2011 ) to computationally predict bacterial promoters shows a predicted transcriptional start site 287 bp upstream of the start codon of acr1 ( Fig. 4C ) . 
+ These results demonstrate that Acr2 directly represses transcription of the operon containing acrDC by binding to the sequence upstream of acr1 and that there is sequence speciﬁcity to Acr2 binding regardless of A + T content . 
+ 3.3. The C-terminal domain of Acr2 is crucial for its activity
+ The C-terminal domain of Acr2 is similar to that of other H-NS-like proteins ( Fig. 5A ) . 
+ It contains an AT-hook motif , with conserved amino acids Q/RGR , that has been shown in structure studies to be critical to contact with DNA ( Gordon et al. , 2011 ) . 
+ We examined the possibility that this motif was important for Acr2 activity by making mutations in this region using recombination with a ssDNA substrate ( see Materials and methods ) . 
+ An in-frame deletion of the codons for QGRRPD ( ΔAT-hook ) resulted in abolishment of the ability of Acr2 to repress conjugative transfer ( Fig. 5B ) . 
+ Although substitution of alanine for glutamine at position 116 resulted in increased transfer frequency , the increase was not as dramatic as that observed for alanine substitutions of the arginine residues at positions 118 and 119 ( Fig. 5B ) . 
+ Given that these substitutions could have led to proteins targeted for degradation , which could also explain the acr2 − phenotypes observed , we cloned each of these Acr2 variants in an arabinose inducible vector . 
+ We used these constructs to test the ability to repress the LacZ activity of an acr1-lacZ fusion construct ( Fig. 5C ) . 
+ Only the WT Acr2 and the Q116A variant were able to repress LacZ activity . 
+ The Q116A result contrasts with that of the conjugation experiment . 
+ In the conjugation experiment the Acr2 variants were expressed from the native promoter . 
+ Expression from the native promoter is probably much lower than that achieved by arabinose induction of the pBAD vector ; however , we can not rule out the possibility of degradation . 
+ Taken together , these results suggest that the QGRR motif is critical for the ability of Acr2 to repress the promoter upstream of acr1 . 
+ 4. Discussion
+ 4.2. Acr2 is an H-NS-like protein with a conserved DNA binding motif
+ Our results demonstrate that , like other H-NS homologs , Acr2 binds A + T rich DNA . 
+ Our in vitro results suggest that Acr2 binds in a sequence speciﬁc manner . 
+ It has been proposed that the amino acid composition in the AT-hook motif is the mechanism by which different H-NS homologs distinguish between chromosomal and horizontally acquired DNA ( Gordon et al. , 2011 ) . 
+ It is thought that the number or arrangement of positively charged amino acids , such as arginine , determines DNA recognition by H-NS-like proteins . 
+ Acr2 shares a DNA binding motif with that of other H-NS homologs and our results show that this set of amino acids is critical for its function as a repressor . 
+ When we removed or altered this DNA binding motif in Acr2 , we found that , in some cases , the conjugation frequency rose above that of a Δacr2 mutant . 
+ This result could be explained in two ways : 1 ) Acr2 variants could be imparting a dominant negative effect on other negative regulators , such as Acr1 . 
+ Or 2 ) Acr2 variants could bind the promoter upstream of acr1 in such a way as to promote transcription . 
+ The AT-hook motif in Acr2 contains an additional arginine residue ( R119 ) that is not shared with other H-NS homologs . 
+ We have shown that this arginine is important for the function of Acr2 . 
+ This difference might result in Acr2 having a distinct propensity to bind speciﬁc regions of IncA/C plasmids to accomplish speciﬁc tasks , such as repressing conjugation . 
+ 4.3. Other roles for Acr2
+ We have generated evidence that Acr2 might have other roles in the biology of IncA/C plasmids . 
+ Our ChAP-Seq data suggest that Acr2 binds nearly the entire ISEcp1 element that carries the blaCMY-2 gene . 
+ Comparative genomics studies have demonstrated that the acquisition of this transposon is a recent event in the evolution of IncA/C plasmids ( Fernández-Alarcón et al. , 2011 ; Fricke et al. , 2009 ; Harmer and Hall , 2014 ) . 
+ Could Acr2 bind newly acquired mobile elements within IncA/C plasmid backbones ? 
+ Given the propensity for chromosomally encoded H-NS to bind genomic islands , it seems plausible ( Navarre et al. , 2006 ) . 
+ Furthermore , Acr2 binds to the 3 ′ end of the rhs gene located downstream of the class 1 integron ( bp coordinate ~ 146,000 ) . 
+ The rhs genes have repeat sequence elements that implicate them in genome shuffling ( Lin et al. , 1984 ) . 
+ Recent work from Harmer et al. suggests that the rhs homolog carried on IncA/C plasmids plays an important role in the diversity of newly integrated mobile elements ( Harmer and Hall , 2014 ) . 
+ Binding by Acr2 may inhibit the ability for recombination to happen at the rhs locus . 
+ H-NS has recently been shown to play an important role in the evolution of Salmonella , providing stability for its genomic islands ( Ali et al. , 2014 ) . 
+ Given the broad spatial distribution of highly similar variants of IncA/C plasmids , Acr2 binding of rhs could be a mechanism driving IncA/C plasmid evolution . 
+ Our ChAP-Seq experiment also yielded evidence that Acr2 binds chromosomal DNA . 
+ It is unclear what the true role of this binding is as nearly all genes bound showed no differential expression in RNA-Seq experiments . 
+ We can not rule out that these peaks might be an artifact of overexpressing Acr2 . 
+ The lack of differentially expressed genes could also be explained by the presence of chromosomally encoded nucleoid associated proteins , such as H-NS , IHF and others . 
+ It will take more sophisticated experiments to tease out the meaning behind these results . 
+ There were three bound genes , however , that were differentially expressed in our RNA-Seq data . 
+ All three have to do with different metabolic pathways . 
+ Interestingly , one of the genes was glcB , which is involved in the glyoxylate bypass pathway . 
+ We previously found this pathway to be modulated in several different host bacteria upon acquisition of IncA/C plasmids ( Lang and Johnson , 2015 ) . 
+ This lends some credence to the possibility that IncA/C plasmids encode the ability to speciﬁcally alter host metabolic pathways to improve ﬁtness of plasmid carrying cells . 
+ This presents an almost phage-like scenario where the plasmid co-opts the host bacterium to become adept at carrying and disseminating the plasmid . 
+ It is an interesting area of study that future work must explore . 
+ Supplementary data to this article can be found online at http://dx.doi.org/10.1016/j.plasmid.2016.07.004 . 
+ Acknowledgments
+ The authors would like to thank Dr. Jeff Gralnick ( University of Minnesota ) , Dr. Fitnat Yildiz ( UC Santa Cruz ) , and Dr. Don Court ( NIH ) for sharing of strains . 
+ The authors thank Dr. William Navarre ( University of Toronto ) for sharing puriﬁed H-NS and EMSA technical assistance . 
+ Data analysis was carried out using tools available through the Minnesota Supercomputing Institute at the University of Minnesota . 
+ The primary author , KSL , was supported by a fellowship from the United States Department of Agriculture National Institute of Food and Agriculture grant no. 2013-67011-21276 . 
+ TJJ was supported through funding from the University of Minnesota College of Veterinary Medicine Signature Programs .
\ No newline at end of file
--- a/data/TEXT_FILES/useful_txt/27856567.txt 0 → 100644
View file @27818a9
+++ b/data/TEXT_FILES/useful_txt/27856567.txt 0 → 100644
View file @27818a9
+ Transcriptional and translational regulation by RNA
+ ∗ Corresponding author : Lehrstuhl für Mikrobielle Ökologie , Technische Universität München , Weihenstephaner Berg 3 , 85354 Freising , Germany . 
+ Tel : +498161715549 ; E-mail : neuhaus@wzw.tum.de 
+ One sentence summary : RNAseq and RIBOseq are able to quantify transcriptional and translational regulation of gene expression by ncRNAs in EHEC after adaptation to combined cold and osmotic stress . 
+ Editor : Richard Calendar 
+ ABSTRACT 
+ INTRODUCTION
+ [ Table residue : the rotated column headers could not be recovered from the text extraction ; the surviving section label and numeric rows are kept below . ] 
+ 1. Regulation by RNA thermometers
+ 44 70 178 2.5∗ 0.006 20 1711 689 0.4∗ 0.005
+ 0.487 0.962
+ 5848 5253 7147 2597 0.4
+ 782 789 305 144 0.5
+ 1205 1491 418 1987 4.8
+ 3.0∗
+ 511 2087 239 1317 5.5 4.77E-07
\ No newline at end of file
--- a/data/TEXT_FILES/useful_txt/27876680.txt 0 → 100644
View file @27818a9
+++ b/data/TEXT_FILES/useful_txt/27876680.txt 0 → 100644
View file @27818a9
+ Methods
+ Indeed , other putative targets identiﬁed by MAPS still need validation . 
+ Surprisingly , in Lalaouna et al. [ 1 ] , unusual binding partners were also co-puriﬁed with MS2-sRNA , such as transcribed spacer sequences of polycistronic tRNA transcripts ( glyW-cysT-leuZ and metZ-metW-metV ) . 
+ For a long time , these sequences were considered as non-functional `` junk RNA '' . 
+ Nevertheless , we demonstrated that , during glyW-cysT-leuZ pre-tRNA processing , the 3′ external transcribed spacer sequence ( 3′ETSleuZ ) is excised and acts as an sRNA sponge ( for RyhB and RybB ) , preventing them from binding their real targets [ 1 ] . 
+ Thus , 3′ETSleuZ represents the first functional tRNA-derived RNA fragment ( tRF ) in bacteria ( see reviews [ 6 -- 8 ] for more details on tRFs ) . 
+ Evidence also showed that RybB sRNA interacts with another kind of tRF : the internal transcribed spacers ( ITS ) of the metZ-metW-metV pre-tRNA [ 1,9 ] . 
+ We ﬁrst noticed a strong enrichment of both ITSmetZ-metW/ITSmetW-metV with MS2-RybB [ 1 ] . 
+ Then , we performed the reverse experiment using each ITS as bait and were able to co-purify RybB sRNA conﬁrming that ITSmetZ-metW/ITSmetW-metV and RybB interact in vivo [ 9 ] . 
+ However , the related mechanism of regulation is still under investigation . 
+ Taken together , these results demonstrated that , unlike other methods commonly used to identify targets ( e.g. microarray , Northern blot , RNAseq ) , MAPS can depict the whole interactome of a speciﬁc RNA as it identiﬁes all kinds of interacting partners . 
+ In this report , we concisely describe adjustments made to MAPS to study various kinds of RNA : RNA interactions . 
+ We also describe important modifications that make the MAPS protocol suitable for pathogenic or Gram-positive strains . 
+ Finally , we discuss the future of this technology . 
+ Recently , we developed an in vivo technology , called MS2-afﬁnity puriﬁcation coupled with RNA sequencing ( MAPS ) , to tackle the problem of the identiﬁcation of RNA : RNA interactions [ 1 ] . 
+ Brieﬂy , the technique is based on the strong afﬁnity of the MS2 protein for the MS2 RNA aptamer used to tag an RNA of interest ( see review [ 2 ] for more details on the history of RNA tagging ) . 
+ The MAPS technology was ﬁrst used in two recent reports studying sRNA-mRNA interactions [ 1,3 ] . 
+ Regulatory small RNAs ( sRNAs ) are involved in the adaptation of bacteria facing environmental ﬂuctuations by post-transcriptionally regulating a subset of mRNAs [ 4,5 ] . 
+ Using MAPS , we have already mapped the interactions of RyhB , RybB and DsrA , three well-characterized sRNAs in Escherichia coli . 
+ Here , we conﬁrmed previously known targets but also revealed new ones [ 1,3 ] . 
+ For example , we characterized grxD mRNA , involved in iron-sulfur cluster biosynthesis , as negatively regulated by RyhB sRNA [ 1 ] . 
+ In the same report , we validated yifE , encoding a conserved protein of unknown function , as a RybB positively regulated target . 
+ Moreover , we characterized a new atypical mRNA target of DsrA , coding for the ribose pyranase ( RbsD ) [ 3 ] . 
+ Contrary to the canonical mechanism of repression , DsrA inhibits translation of rbsD mRNA by interacting in the coding sequence and induces mRNA decay by an unknown ribonuclease ( RNase ) . 
+ However , we only touched the tip of the iceberg in these reports . 
+ ⇑ Corresponding author at : Department of Biochemistry , Université de Sherbrooke , 3201 Jean Mignault Street , Sherbrooke , QC J1E 4K8 , Canada . 
+ E-mail address : eric.masse@usherbrooke.ca ( E. Massé ) . 
+ 2. Materials and methods
+ 2.1. MS2 tagging
+ We usually fused the 5′-end of the RNA fragment of interest with the MS2 RNA aptamer . 
+ We recommend 5′-end MS2 tagging in order to keep the endogenous terminator and to minimize secondary structure changes . 
+ To obtain the corresponding chimeric DNA sequence cloned in a plasmid under the control of a pBAD promoter ( arabinose-inducible promoter ) , we first inserted MS2 stemloops into a pNM12 vector using MscI and EcoRI restriction enzymes ( NEB ) . 
+ Then , the DNA sequence corresponding to the RNA of interest is added using EcoRI and SphI ( Fig. 1A ) . 
+ The secondary structure of the chimeric RNA is veriﬁed using mfold software ( http://unafold.rna.albany.edu/?q=mfold ) . 
+ In the case of 5′UTR-mRNA ( untranslated region of mRNA ) , we added MS2 stemloops at the 3′ end ( commonly at the beginning of the coding sequence ) followed by a T7 transcriptional terminator ( Fig. 1B ) . 
+ The MS2 aptamer and the T7 terminator are added to the 5′UTR of mRNA using nested PCR . 
+ A stop codon is inserted in frame , just before the MS2 aptamer sequence to interrupt protein translation . 
+ In all cases , we used the untagged RNA as a control . 
+ We performed Northern blot analysis to verify that the MS2-RNA construct is expressed at a level similar to the untagged RNA . 
+ For sRNA , we also tested the activity of the MS2-sRNA construct on known mRNA targets . 
+ This activity has to be equivalent to the control . 
+ 2.2. Strains
+ Typically , we performed MS2-sRNA MAPS in a ΔsRNA rne131 ( RNA degradosome assembly mutant ) background . 
+ Both mutations maximize mRNA target enrichment by eliminating interaction with the endogenous untagged sRNA and slowing down mRNA target decay . 
+ Indeed , the use of a wild-type background reduces the enrichment ratio of negatively regulated targets ( data not shown ) . 
+ In the case of MS2-tRFs and 5′UTR-mRNA-MS2 , we did not delete the endogenous gene to avoid potential disruptions of cellular metabolism . 
+ 2.3. MS2-MBP protein puriﬁcation
+ The plasmid pHMM , ﬁrst published in Batey and Kieft [ 10 ] , encodes the DNA sequence required to produce the 59 kDa polypeptide His6-MS2-MBP . 
+ This hybrid protein is composed of an N-terminal hexahistidine-tag , a maltose-binding protein ( MBP ) domain , and a C-terminal MS2 coat protein . 
+ The MS2-MBP protein puriﬁcation was performed as described by Batey and Kieft [ 10 ] with major modiﬁcations . 
+ BL21/pLysS E. coli cells containing the pHMM plasmid were first grown in 1 L of rich medium ( Luria Bertani medium , LB ) supplemented with 30 µg/mL chloramphenicol and 10 µg/mL kanamycin at 37 °C to an OD600nm of 0.7 . 
+ At this point , expression of the construct was induced by addition of 0.5 mM IPTG . 
+ After 3 h of induction , cells were harvested by centrifugation ( 3800 × g for 15 min at 4 °C ) . 
+ Cell pellets were resuspended in 50 mL of Lysis buffer ( 50 mM NaH2PO4 pH 8.0 , 300 mM NaCl , 0.5 % Tween 20 , 10 mM imidazole and 10 % glycerol ) supplemented with 50 µL of protease inhibitor cocktail ( 1 mg/mL ; Sigma-Aldrich ) and 5 mL of lysozyme ( 10 mg/mL ) . 
+ Then , cells were lysed by sonication ( 25 % for 4 min ; 5 s sonication , 5 s pause on ice ; Branson Digital Sonifier ) and centrifuged ( 17,000 × g for 45 min ) . 
+ A second step of sonication is required if the supernatant remains too viscous . 
+ After the lysis step , the His6-MS2-MBP protein was puriﬁed using a Ni-NTA agarose column ( 4 mL of Ni-NTA agarose resin ( Qiagen ) on a 25 mL Econo-Pac chromatography column ( Biorad ) ) equilibrated with 25 mL of lysis buffer . 
+ The supernatant was loaded onto the column and the column was washed with 25 mL of lysis buffer . 
+ Samples were eluted by addition of 4 mL of Lysis buffer supplemented with 250 mM imidazole . 
+ After a dialysis step in 1 L of Dialysis buffer ( 25 mM Na-MES pH 6.0 , 25 mM NaCl ) , samples were loaded on an amylose resin column ( 3 mL of amylose resin ( NEB ) on a 25 mL Econo-Pac chromatography column ) equilibrated with 25 mL of Column buffer ( 20 mM Tris-HCl pH 7.4 , 200 mM NaCl , 0.5 mM EDTA ) . 
+ The column was washed with 36 mL of Column buffer . 
+ Samples were eluted with 6 mL of Column buffer supplemented with 10 mM maltose . 
+ The output was dialyzed in 1 L of buffer A ( 20 mM Tris-HCl pH 8.0 , 150 mM KCl , 1 mM MgCl2 ) supplemented with 10 % glycerol . 
+ Finally , the protein concentration was calculated using OD280nm and a molar extinction coefficient of 83,310 M⁻¹ cm⁻¹ . 
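+ The concentration calculation mentioned above follows the Beer-Lambert law , c = A / ( ε · l ) . The sketch below is a minimal illustration under stated assumptions : the extinction coefficient in the text is read as 83,310 M⁻¹ cm⁻¹ , the path length is assumed to be 1 cm , the 59 kDa mass is taken from the description of His6-MS2-MBP , and the absorbance value and function names are hypothetical . 

```python
def protein_concentration(a280, epsilon=83310.0, path_cm=1.0):
    """Molar concentration (mol/L) from A280 via Beer-Lambert: c = A / (epsilon * l).

    epsilon is in M^-1 cm^-1 (assumed reading of the value in the text);
    path_cm is the cuvette path length in cm (assumed 1 cm).
    """
    if epsilon <= 0 or path_cm <= 0:
        raise ValueError("epsilon and path length must be positive")
    return a280 / (epsilon * path_cm)


def molar_to_mg_per_ml(molar, mass_da=59000.0):
    """Convert mol/L to mg/mL: (mol/L) * (g/mol) = g/L = mg/mL."""
    return molar * mass_da


def volume_for_pmol(pmol, molar):
    """Volume in microlitres that contains `pmol` of protein.

    1 mol/L equals 1e6 pmol per microlitre.
    """
    return pmol / (molar * 1e6)


# Hypothetical reading: A280 = 0.8331 in a 1 cm cuvette.
conc = protein_concentration(0.8331)
print(conc, molar_to_mg_per_ml(conc), volume_for_pmol(100, conc))
```

+ The last helper shows how a stock concentration translates into the volume needed to load a given amount of protein ( for instance the 100 or 200 pmol immobilized per column ) . 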
+ 2.4. MS2-afﬁnity puriﬁcation coupled with RNA sequencing
+ A schematic representation of MAPS technology is available in Fig. 2 . 
+ Cells were grown in LB supplemented with 50 µg/mL ampicillin ( diluted 1/1000 from an overnight culture ) . 
+ Note that MAPS was successfully performed in other media ( e.g. M63 minimal medium ) ( data not shown ) . 
+ Cells were harvested in ( A ) exponential ( OD600nm = 0.5 ; 100 mL ) and/or ( B ) stationary phase of growth ( OD600nm > 1 ; 100 mL ) after induction of MS2 construct ( or control ) with 0.1 % arabinose for 10 min and then chilled on ice for 10 min . 
+ For input samples ( collected before affinity purification ) , RNA was extracted from 600 µL of culture using the hot-phenol procedure [ 11 ] . 
+ The remaining cells were washed with 1 mL of buffer A ( supplemented with 1 mM DTT and 1 mM PMSF ) and centrifuged . 
+ Cells were resuspended in ( A ) 2 or ( B ) 3 mL of buffer A and lysed using a French Press Cell Disrupter ( Thermo electron corporation ) with the following parameters : 430 psi , ( A ) three or ( B ) four times . 
+ Next , the lysate was cleared by centrifugation ( 17,000 × g for 30 min at 4 °C ) . 
+ For protein samples ( input ) , 20 µL of the soluble fraction were collected and mixed with 20 µL of protein sample buffer ( 125 mM Tris-HCl pH 6.8 , 1 % SDS , 20 % glycerol , 0.02 % bromophenol blue and 100 mM DTT ) . 
+ The remaining supernatant was subjected to affinity chromatography ( all the following steps were performed at 4 °C ) . 
+ The column was prepared by adding ( A ) 75 or ( B ) 100 µL of amylose resin ( New England Biolabs ) to Bio-Spin disposable chromatography columns ( Bio-Rad ) . 
+ We washed the column with 3 mL of buffer A . 
+ Next , ( A ) 100 or ( B ) 200 pmol of His6-MS2-MBP protein were immobilized on the amylose resin . 
+ Again , we washed the column with 2 mL of buffer A . 
+ The supernatant was then loaded onto the column , which was washed with ( A ) 5 or ( B ) 8 mL of buffer A . 
+ This crucial step requires some adaptation and adjustment according to cell density . 
+ Bound RNA was eluted using 1 mL of buffer A supplemented with 15 mM maltose . 
+ Eluted RNA was extracted with phenol-chloroform ( V/V ) and precipitated by the addition of ethanol ( 2 vol ) and 20 mg of glycogen . 
+ At the same time , the organic phase was subjected to acetone precipitation to recover proteins . 
+ RNA samples were then analyzed by Northern blot and protein samples by Western blot as described in Lalaouna et al. [ 1 ] . 
+ After all these steps , RNA samples ( output ) were treated with TURBO™ DNase ( 4 units ; Ambion ) for 30 min at 37 °C to eliminate remaining genomic DNA . 
+ TURBO DNase was then removed using phenol-chloroform extraction and RNA was again precipitated . 
+ RNA quality and quantity were analyzed on Agilent Nano Chip on the bioanalyzer 2100 . 
+ cDNA libraries were prepared with the ScriptSeq™ v2 RNA-Seq Library Preparation Kit ( Illumina ) for MS2-sRNA samples and with the NEBNext Small RNA Library Prep Set ( E7330S ) for MS2-ITS , MS2-ETS and 5′UTR-mRNA-MS2 . 
+ Libraries were sequenced using Illumina MiSeq . 
+ In the case of MS2-3′ETSleuZ ( Table 1 ) , we extracted RNA from exponential ( OD600nm = 0.5 ; 100 mL ) and stationary phase ( OD600nm = 1.3 ; 100 mL ) cultures ( WT background ) . 
+ We then followed the ( B ) procedure as described above . 
+ GEO accession numbers of MAPS data are : MS2-3′ETSleuZ ( GSE79278 ) , MS2-ITSmetZW/MS2-ITSmetWV ( GSE66517 ) , MS2-RyhB ( GSE66519 ) , MS2-RybB ( GSE66518 ) , MS2-DsrA ( GSE67605 ) and MS2 control ( GSE67606 ) . 
+ 2.5. MAPS with pathogenic and/or Gram positive strains
+ Recently , we adapted MAPS technology to the manipulation of pathogenic strains . 
+ Interestingly , the described modifications are also effective for Gram-positive strains ( data not shown ) . 
+ Here , the medium used for cell growth depends on the studied strain ( e.g. LB medium for Salmonella enterica serovar Typhimurium , brain-heart infusion ( BHI ) broth for Staphylococcus aureus ) and is supplemented with the appropriate antibiotic according to the overexpression system used . 
+ Generally , the extraction is performed on a larger volume ( 300 mL ) . 
+ The lysis is a crucial step and requires adaptations for safety purposes . 
+ Here , the Precellys 24 bead beater was used to break cells instead of French press . 
+ After washing , harvested cells were resuspended in 1 mL of buffer A . 
+ The cell suspension was then transferred to screw cap tubes containing 200 µL of glass beads ( 0.1 mm in diameter ; BioSpec ) . 
+ Cells were disrupted using the bead beater ( three cycles : 6500 rpm for 20 s , followed by 1 min on ice ) . 
+ The lysate was then centrifuged at 17,000 × g for 30 min at 4 °C and the supernatant was applied to the affinity purification column . 
+ The following steps are identical to those described in Section 2.3 . 
+ For input samples , two different procedures were used depending on bacterial classiﬁcation . 
+ For Gram-negative bacteria , RNA was extracted from 1 mL of culture using the hot-phenol procedure [ 11 ] . 
+ For Gram-positive bacteria , 1 mL of culture was centrifuged and resuspended in 500 µL of lysis solution ( 0.5 % SDS , 1 mM EDTA , 20 mM sodium acetate ) and transferred into a screw cap tube containing 200 µL of glass beads and 500 µL of phenol ( pH 4 ) . 
+ To break the cell wall , cells were vortexed using the bead beater ( three cycles : 6500 rpm for 20 s , followed by 1 min on ice ) . 
+ The sample was then centrifuged and RNA was then precipitated as described in Section 2.3 . 
+ 2.6. Data processing
+ The data processing procedure is described in detail in Fig. 3 . 
+ We used bioinformatics tools freely available on the Galaxy Project platform ( https://galaxyproject.org/ ) [ 13 ] . 
+ The first workflow aligns reads to the corresponding genome ( e.g. E. coli K12 ) and visualizes them using the UCSC Genome Browser ( Fig. 3A ) . 
+ First , we used FASTQ Groomer ( Galaxy version 1.0.4 ) to convert FASTQ ﬁles . 
+ Second , the quality of raw sequences data was assessed using FastQC ( Galaxy version 0.52 ) . 
+ Third , checked sequences were aligned to the corresponding genome assembly using Map with Bowtie for Illumina ( Galaxy version 1.1.2 ) . 
+ To finish , we used Create a BedGraph of genome coverage ( Galaxy version 0.1.0 ) to obtain data files in a format compatible with the UCSC Microbial Genome Browser ( http://microbes.ucsc.edu/ ) . 
+ The second workflow is used to assign reads to genes . 
+ All required steps are indicated in Fig. 3B . 
+ Brieﬂy , we compare mapped regions with gene positions ( extracted from NCBI/GenBank ) . 
+ For this purpose , we formatted a gene bank file as indicated by the following example : chr 190 255 + thrL 
+ Finally , read counts were normalized by coverage . 
+ Data are presented in a tab-delimited text ﬁle which includes normalized reads and MS2-RNA/RNA ( Control ) ratio . 
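+ The gene-assignment step described above can be sketched as follows . This is a minimal illustration and not the Galaxy workflow itself : the five-column gene line follows the example in the text ( reading the strand column as + ) , while the read intervals , the pseudocount and the normalization by total mapped reads are assumptions introduced for the example , since the text does not detail how coverage normalization was computed . 

```python
from collections import Counter


def parse_gene_line(line):
    """Parse 'chr start end strand name', e.g. 'chr 190 255 + thrL'."""
    chrom, start, end, strand, name = line.split()
    return chrom, int(start), int(end), strand, name


def count_reads_per_gene(genes, reads):
    """Count reads overlapping each gene.

    genes: list of tuples from parse_gene_line.
    reads: iterable of (chrom, start, end) mapped intervals (inclusive).
    """
    counts = Counter()
    for chrom, r_start, r_end in reads:
        for g_chrom, g_start, g_end, _strand, name in genes:
            if chrom == g_chrom and r_start <= g_end and r_end >= g_start:
                counts[name] += 1
    return counts


def enrichment_ratios(tagged, control, tagged_total, control_total, pseudocount=1.0):
    """MS2-RNA / control ratio per gene after scaling by library size.

    Scaling by total mapped reads and the pseudocount are assumptions
    made for this sketch.
    """
    ratios = {}
    for gene in set(tagged) | set(control):
        t = (tagged.get(gene, 0) + pseudocount) / tagged_total
        c = (control.get(gene, 0) + pseudocount) / control_total
        ratios[gene] = t / c
    return ratios


genes = [parse_gene_line("chr 190 255 + thrL")]
# Hypothetical mapped reads: two overlap thrL, one does not.
reads = [("chr", 200, 230), ("chr", 250, 280), ("chr", 300, 330)]
counts = count_reads_per_gene(genes, reads)
ratios = enrichment_ratios({"thrL": 9}, {"thrL": 1}, 100, 100)
print(counts["thrL"], ratios["thrL"])
```

+ The nested loop is adequate for a small bacterial annotation ; an interval index would be preferable for larger genomes . 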
+ 2.7. Primer extension
+ The cleavage site that releases 3′ETSleuZ was determined using primer extension ( Fig. 4 ) . 
+ This result was required to remove the RNase E cleavage site from the MS2-3′ETSleuZ construct and avoid the loss of the MS2 aptamer . 
+ Briefly , 20 µg of total RNA were incubated with 0.5 pmol of 32P-radiolabelled oligonucleotide ( EM3204 , CCGAAGGTGGTTTCACGACAC ) and 1 mM dNTPs . 
+ After 5 min of incubation at 65 °C , followed by 1 min on ice , 5X reaction buffer , 0.1 M DTT , RNase Inhibitor Murine ( 40 units , NEB ) and ProtoScript II Reverse Transcriptase ( 200 units , NEB ) were added to the reaction . 
+ Reverse transcription was performed for 60 min at 42 °C before the enzyme was inactivated at 90 °C for 10 min . 
+ Samples were then precipitated and migrated on a denaturing 8 % polyacrylamide gel . 
+ As a control , we performed the same experiment without reverse transcriptase . 
+ The sequencing ladder was obtained with a DNA template ( PCR with oligonucleotides EM3205 ( CTCCGGGTACCATGGGAAAG ) and EM3206 ( CCTATCTTACATGCCGGTCCG ) ) and the same radiolabelled primer ( EM3204 ) . 
+ Here , we added ddNTP to stop the reaction performed by the Vent DNA polymerase ( 2 units , NEB ) as described below . 
+ For the G lane , we used 0.36 mM ddGTP , 0.037 mM dGTP , 0.03 mM dATP , 0.1 mM dCTP and 0.1 mM dTTP . 
+ For the A lane , we used 0.9 mM ddATP , 0.03 mM dATP , 0.1 mM dGTP , 0.1 mM dCTP and 0.1 mM dTTP . 
+ For the T lane , we used 0.72 mM ddTTP , 0.033 mM dTTP , 0.1 mM dGTP , 0.03 mM dATP and 0.1 mM dCTP . 
+ For the C lane , we used 0.42 mM ddCTP , 0.041 mM dCTP , 0.1 mM dGTP , 0.03 mM dATP and 0.1 mM dTTP . 
+ We also added 0.1 % Triton to the reaction . 
+ Then , we performed the following PCR steps : 1 min at 95 °C , 1 min at 52 °C and 1 min at 72 °C ( 25 cycles ) . 
+ The reaction was stopped by addition of formamide loading dye and migrated next to primer extension samples . 
+ 3.1. Evolution of MAPS technology
+ Over the years , we modiﬁed and improved key parameters of the MS2-afﬁnity puriﬁcation [ 1,3,9,14 ] . 
+ We describe these modifications in detail in Section 2 . 
+ To perform MAPS with higher cellular concentration ( exponential and stationary phases or stationary phase only ) , we modified the lysis step and increased the loading capacity of the column . 
+ We also adapted the MAPS procedure to the manipulation of pathogenic strains by modifying cell breakage . 
+ We demonstrated that the same protocol can be used for Gram-positive bacteria . 
+ The adjustment of MAPS technology to pathogenic and/or Gram-positive bacteria will certainly facilitate target identification in strains that are remarkably difficult to manipulate genetically . 
+ Deep characterization of the interactome of sRNA involved in virulence in pathogenic strains will also bring to light targets with a therapeutic potential . 
+ To study sRNA interacting with a 5′UTR-mRNA or a tRF , we also used another library preparation kit to specifically enrich small RNA fragments . 
+ This approach has already been successfully used with both MS2-ITSmetZW/MS2-ITSmetWV [ 9 ] and MS2-3′ETSleuZ ( Fig. 4A ) . 
+ Thus , we conﬁrmed results previously obtained with MS2-RyhB and MS2-RybB [ 1 ] and hence the efﬁciency of MAPS technology to determine binding partners of tRNA-derived RNA fragments . 
+ In this case , one of the major limitations is the presence of enzymatic cleavage sites that naturally enable the maturation of pre-tRNAs and , as a consequence , the release of tRFs . 
+ To counter this problem and avoid loss of the MS2 aptamer , we added an additional step : we ﬁrst identify the +1 of each tRF using primer extension ( see Section 2 ) . 
+ Next , we fuse the MS2 aptamer to the determined 5′-end . 
+ This procedure has been validated with 3′ETSleuZ ( Fig. 4B ) . 
+ Here , the cleavage site occurs 15 nt after the 3′CCA end of leuZ gene , releasing a 53 nt-long RNA fragment ( Fig. 4C ) . 
+ MAPS technology was performed with pBAD-MS2 and pNM12 ( Ctrl ) . 
+ Cells were harvested in exponential ( OD600nm = 0.5 ; 100 mL ) and in stationary phase ( OD600nm = 1 ; 100 mL ) ( WT background ) . 
+ After that , the ( B ) procedure described in Section 2 was followed . 
+ The 50 most enriched genes are represented ( with more than 50 reads ) . 
+ The GEO accession number is GSE67606 . 
+ 3.2. Removal of false positives
+ As previously mentioned in the Introduction , a lot of putative targets were revealed by MAPS but not veriﬁed [ 1,3 ] . 
+ To facilitate the analysis , we performed MAPS using MS2 aptamer only as bait to discriminate the subpopulation of genes that interact with the MS2 tag rather than the gene of interest [ 3 ] . 
+ The Top 50 most enriched genes are represented in Table 2 . 
+ The complete list is also available ( GEO accession number GSE67606 ) . 
+ This list is useful to eliminate false positives and reduce the number of candidates . 
+ For example , ompW gene is often co-puriﬁed with MS2-sRNA without being regulated by them ( data not shown ) . 
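+ The filtering step described above can be sketched as a simple list subtraction. This is a minimal illustration, not the authors' pipeline: the function name and the gene lists are hypothetical, and placing ompW in the MS2-only list is an assumption made for the example.

```python
def remove_ms2_background(candidates, ms2_only_enriched):
    """Drop candidates that are also enriched when the MS2 aptamer alone is the bait."""
    background = set(ms2_only_enriched)
    # Preserve the original ranking of the remaining candidates.
    return [gene for gene in candidates if gene not in background]


# Hypothetical inputs: genes enriched with an MS2-sRNA bait, and genes
# enriched with the MS2 tag alone (assumed here to include ompW).
candidates = ["geneA", "ompW", "geneB"]
ms2_only = ["ompW", "geneC"]
print(remove_ms2_background(candidates, ms2_only))
```

+ In practice the MS2-only list would come from the control data set (GEO accession GSE67606), and the comparison would use whatever gene identifiers the enrichment tables share.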
+ 3.3. Perspectives
+ In bacteria , the MS2-affinity purification was successfully used to co-purify the RNA chaperone protein Hfq with sRNAs , 5′UTR-mRNA and tRFs [ 1,3,15 ] . 
+ Other proteins , such as various ribonucleases , are known to be involved in sRNA-mediated regulation [ 16 ] . 
+ Due to transitory interaction , MS2-pulldown assay failed to pick up RNase E with a sRNA : mRNA complex reproducibly ( data not shown ) . 
+ To improve RNase recovery , an additional step of UV crosslink should be performed before cell lysis . 
+ In particular , applying UV crosslinking to MS2-affinity purification coupled with mass spectrometry should enable the identification of still unknown ribonucleases , as in the case of rbsD mRNA [ 3 ] , or the characterization of sRNA-sequestering proteins by mimicking their recognition site . 
+ Recently , McaS sRNA , ﬁrst described as regulator of multiple mRNAs through base pairing , was also shown to directly interact with the pleiotropic regulatory protein CsrA and alleviate its activity [ 17 ] . 
+ Hence , we can easily assume that other well-characterized sRNAs could have a dual function . 
+ Certainly , UV crosslink MS2-affinity purification coupled with mass spectrometry will be complementary to recently published methods in which the authors opted for the opposite approach by using a protein as bait [ 18 -- 20 ] . 
+ Recently , Wade 's group adapted ribosome proﬁling technology to the identiﬁcation of sRNA targets [ 21 ] . 
+ Using RNA sequencing , they compared ribosome-protected mRNA fragments proﬁles in presence or absence of sRNA . 
+ Thus , ribosome proﬁling could be performed in parallel to MAPS . 
+ The combination of these two methods will represent a perfect screen for sRNA targetome characterization . 
+ Finally , in Desnoyers and Masse [ 14 ] we demonstrated that MS2-affinity purification can be used to perform co-variational mutagenesis instead of using classical experiments ( e.g. in vitro probing or β-galactosidase ) to prove direct interaction . 
+ Mutations were introduced at either the sRNA pairing site or the Hfq pairing site in the mRNA sequence , which allowed us to characterize the Hfq-mRNA ribonucleoprotein complex . 
+ During the last few years , we performed MAPS with various kinds of RNA : sRNA [ 1,3 ] , 5′UTR-mRNA [ 3 ] and tRFs [ 1,9 ] . 
+ We overcame technical limitations of MAPS to apply it to all kinds of RNA : RNA interaction regardless of the studied organism . 
+ In particular , extension of MAPS technology to pathogenic strains will help to explore the targetome of sRNAs involved in bacterial virulence and , therefore , increase the reservoir of potential molecules for antibacterial design . 
+ Moreover , the combination of MAPS with other high-throughput technologies will pave the way for an easier sRNA functional screening . 
+ This work has been supported by an operating grant from the Canadian Institutes of Health Research ( CIHR ) to EM .
\ No newline at end of file
--- a/data/TEXT_FILES/useful_txt/27900321.txt 0 → 100644
View file @27818a9
+++ b/data/TEXT_FILES/useful_txt/27900321.txt 0 → 100644
View file @27818a9
+ Genome-Wide Transcriptional
+ 1 Laboratory of Molecular Biology , National Institutes of Health , National Cancer Institute , Bethesda , MD , USA , 2 Microbiomics and Immunity Research Center , Korea Research Institute of Bioscience and Biotechnology , Daejeon , Korea , 3 Laboratory of Metabolism , National Institutes of Health , National Cancer Institute , Bethesda , MD , USA , 4 Wadsworth Center , New York State Department of Health , Albany , NY , USA , 5 Department of Biomedical Sciences , School of Public Health , University of Albany , Albany , NY , USA , 6 Gene Regulation and Chromosome Biology Laboratory , National Institutes of Health , National Cancer Institute , Center for Cancer Research , Frederick , MD , USA , 7 DNASTAR , Inc. , Madison , WI , USA 
+ Keywords: GalR regulon, mega-loop, ChIP-chip, nucleoid, DNA superhelicity
+ INTRODUCTION
+ The 4.6 Mb Escherichia coli chromosomal DNA is packaged into a small volume ( 0.2 -- 0.5 µm3 ) for residing inside a cell volume of 0.5 -- 5 µm3 ( Loferer-Krossbacher et al. , 1998 ; Skoko et al. , 2006 ; Luijsterburg et al. , 2008 ) . 
+ It has been suggested that a bacterial chromosome has a 3-D structure that dictates the entire chromosome 's gene expression pattern ( Kar et al. , 2005 ; Macvanin and Adhya , 2012 ) . 
+ The chromosome structure and the associated volume are defined and environment-dependent . 
+ The compaction of the DNA into a structured chromosome ( nucleoid ) is facilitated by several architectural proteins , often called `` nucleoid-associated proteins '' ( NAPs ) . 
+ NAPs are well-characterized bacterial histone-like proteins such as HU , H-NS , Fis , and Dps ( Ishihama , 2009 ) . 
+ For example , deletion of the gene encoding the NAP HU leads to substantial changes in cell volume and in the global transcription proﬁle , presumably due to changes in chromosome architecture ( Kar et al. , 2005 ; Oberto et al. , 2009 ; Priyadarshini et al. , 2013 ) . 
+ A recent and surprising addition to the list of NAPs in E. coli is the sequence-specific DNA-binding transcription regulatory protein , GalR ( Qian et al. , 2012 ) . 
+ In contrast , related DNA-binding proteins PurR , MalT , FruR , and TyrR do not appear to affect the chromosome structure ( Qian et al. , 2012 ) . 
+ Here , we discuss experimental results that led us to explore the idea that GalR also regulates transcription at a global scale through DNA architectural changes . 
+ GalR regulates transcription of the galETKM , galP , galR , galS , and mglBAC transcripts ( Figure 1 ) . 
+ These genes all encode proteins involved in the transport and metabolism of D-galactose . 
+ Moreover , GalR controls expression of the chiPQ operon , which encodes genes involved in the transport of chitosugar . 
+ The galETKM operon ( Figure 1 ) is transcribed as a polycistronic mRNA from two overlapping promoters , P1 ( +1 ) and P2 ( − 5 ) ( Musso et al. , 1977 ; Aiba et al. , 1981 ) . 
+ GalR regulates P1 and P2 promoters differentially . 
+ GalR binds two operators , OE , located at position − 60.5 , and OI , located at +53.5 ( Irani et al. , 1983 ; Majumdar and Adhya , 1984 , 1987 ) . 
+ Binding of GalR to OE represses P1 and activates P2 by arresting RNA polymerase , and facilitating the step of RNA polymerase isomerization , respectively ( Roy et al. , 2004 ) . 
+ When GalR binds to both OE and OI , which are 113 bp apart and do not overlap with the two promoters , it prevents transcription initiation from both P1 and P2 ( Aki et al. , 1996 ; Aki and Adhya , 1997 ; Semsey et al. , 2002 ; Roy et al. , 2005 ) . 
+ Mechanistically , two DNA-bound GalR dimers transiently associate , creating a loop in the intervening promoter DNA segment . 
+ Kinking at the apex of the loop facilitates binding of HU , which in turn stabilizes the loop ( Figure 2 ; Kar and Adhya , 2001 ) . 
+ The DNA structure in the looped form is topologically closed and binds RNA polymerase , but does not allow isomerization into an actively transcribing complex ( Choy et al. , 1995 ) . 
+ Following the example of GalR-mediated DNA loop formation by interaction of GalR bound to two operators in the galE operon , and considering the fact that GalR operators in the galP , mglB , galS , galR , and chiP promoters are scattered around the chromosome , we hypothesized that GalR may oligomerize while bound to distal sites , thereby forming much larger DNA loops ( `` mega-loops '' ) . 
+ We employed the Chromosome Conformation Capture ( 3C ) method to investigate interactions between distal GalR operators ( Dekker et al. , 2002 ) . 
+ Thus , we showed that GalR does indeed oligomerize over long distances , resulting in the formation of mega-loops . 
+ Moreover , our data suggested the existence of other unidentiﬁed GalR binding sites around the chromosome , with these novel sites also participating in long-distance interactions ( Qian et al. , 2012 ) . 
+ Figure 3 shows in cartoon form the demonstrable GalR-mediated DNA-DNA connections listed in Table 1 . 
+ Although we originally proposed that DNA-bound GalR-mediated mega-loops may serve to increase the local concentrations of GalR around their binding sites for regulation of the adjacent promoters ( Oehler and Muller-Hill , 2010 ) , global regulation of gene expression due to change in chromosome structure may be another consequence of mega-loop formation . 
+ We propose that GalR-mediated mega-loop formation results in the formation of topologically independent DNA domains , with the level of superhelicity in each domain inﬂuencing transcription of the local promoters . 
+ Bacterial and Bacteriophage Strains
+ Bacteriophage P1 lysates of galR : : kanR ( from the Keio collection ; Baba et al. , 2006 ) were made , and E. coli K-12 MG1655 galR deletion strains were constructed from MG1655 by bacteriophage P1 transduction using the lysate . 
+ Cells were then grown in 125 ml Corning flasks ( Corning® 430421 ) containing 30 ml of M63 minimal medium plus D-fructose ( final concentration 0.3 % ) at 37 °C with shaking at 230 rpm . 
+ At OD600 0.6 , cell cultures were separated into two ﬂasks . 
+ Subsequently , D-galactose ( final concentration 0.3 % ) or water was added and cells were cultivated for an additional 1.5 h at 37 °C . 
+ Table 1 ( fragment ) : 
+ 3072949 3072964 O ( F25-1 ) CTTAAATCGATTGCCG 
+ 3072989 3073004 O ( F25-2 ) TTTGAAGCGATTGCGG 
+ Connections were detected among these sites except galEE and galEI by 3C assays . 
+ The first seven operators that showed connections by 3C were known before . 
+ The ones named as F were discovered during the 3C studies ( Qian et al. , 2012 ) . 
+ E. coli MG1655 galR-TAP ( AMD032 ) was constructed by bacteriophage P1 transduction of the kanR-linked TAP tag cassette from DY330 galR-TAP ( Butland et al. , 2005 ) . 
+ The kanR cassette was removed using pCP20 , as described previously ( Datsenko and Wanner , 2000 ) . 
+ E. coli MG1655 galR-FLAG3 ( AMD188 ) was constructed using FRUIT ( Stringer et al. , 2012 ) . 
+ RNA Isolation
+ Cell cultures were placed on ice and RNAprotect™ Bacteria Reagent ( Qiagen® 76506 ) was added to stabilize the RNA ( Lee et al. , 2014 ) . 
+ Cells were harvested for RNA purification with the RNeasy® Mini Kit ( Qiagen® 74104 ) following the manufacturer 's recommendations . 
+ RNA concentrations and purity were measured using a Thermo Scientific NanoDrop™ 1000 . 
+ Further sample processing was performed according to the Affymetrix GeneChip® Expression Analysis Technical Manual , Section 3 : Prokaryotic Sample and Array Processing ( 701029 Rev. 4 ) . 
+ Isolated RNA ( 10 µg ) was used for Random Primer cDNA synthesis using SuperScript II™ Reverse Transcriptase ( Invitrogen Life Technologies 18064-071 ) . 
+ The reaction mixture was treated with 1N NaOH to degrade any remaining RNA and treated with 1N HCl to neutralize the NaOH . 
+ Synthesized cDNA was then purified using MinElute® PCR Purification columns ( Qiagen® 28004 ) . 
+ Purified cDNA concentration and purity were measured using a Thermo Scientific NanoDrop™ 1000 . 
+ Purified cDNA was fragmented to between 50 and 200 bp by 0.6 U / µg of DNase I ( Amersham Biosciences 27-0514-01 ) for 10 min at 37 °C in 1X One-Phor-All buffer ( Amersham Biosciences 27-0901-02 ) . 
+ Heat inactivation of the DNase I enzyme was performed at 98 °C for 10 min . 
+ Fragmented cDNA was then biotin labeled at the 3′ termini using the GeneChip® DNA Labeling Reagent ( Affymetrix 900542 ) and 60 U of Terminal Deoxynucleotidyl Transferase ( Promega M1875 ) at 37 °C for 60 min . 
+ The labeling reaction was then stopped by the addition of 0.5 M EDTA . 
+ Microarray Hybridization
+ Labeled cDNA fragments ( 3 µg ) were then hybridized for 16 h ( 60 rpm ) at 45 °C to tiling array chips ( Ecoli_Tab520346F ) purchased from Affymetrix ( Santa Clara , CA ) . 
+ The chips carry 1,159,908 probes in a 1.4 cm × 1.4 cm area , with a 25-mer probe every 8 bp on both strands of the whole E. coli genome . 
+ In addition , the probes overlap by 4 bp with probes on the other strand . 
+ Each 25-mer DNA probe in the tiling array chip is 8 bp apart from the next probe . 
+ Probes are designed to cover the whole E. coli genome . 
+ Microarray: Washing and Staining
+ The chips were then washed with Wash Buffer A ( Non-Stringent Wash Buffer : 6X SSPE , 0.01 % Tween-20 ) and Wash Buffer B ( 100 mM MES , 0.1 M [ Na+ ] and 0.01 % Tween-20 ) , and stained with Streptavidin Phycoerythrin ( Molecular Probes S-866 ) and biotinylated goat anti-streptavidin antibody ( Vector Laboratories BA-0500 ) on a GeneChip Fluidics Station 450 ( Affymetrix ) according to the washing and staining protocol ProkGE-WS2_450 . 
+ Microarray: Scanning and Data Analysis
+ Hybridized , washed , and stained microarrays were scanned using a Genechip Scanner 3000 ( Affymetrix ) . 
+ Standardized signals , for each probe in the arrays , were generated using the MAT analysis software , which provides a model-based , sequence-specific , background correction for each sample ( Johnson et al. , 2006 ) . 
+ A gene speciﬁc score was then calculated for each gene by averaging all MAT scores ( natural log ) for all probes under the annotated gene coordinates . 
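+ The per-gene averaging described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: each probe is represented by a single genome position, "under the annotated gene coordinates" is taken to mean start <= position <= end, and the function and variable names are hypothetical.

```python
from statistics import mean


def gene_scores(probes, genes):
    """Average probe scores per gene.

    probes: list of (position, mat_score) tuples
    genes:  list of (name, start, end) tuples, coordinates inclusive
    """
    scores = {}
    for name, start, end in genes:
        inside = [score for pos, score in probes if start <= pos <= end]
        # A gene with no probes gets None rather than a misleading zero.
        scores[name] = mean(inside) if inside else None
    return scores


# Hypothetical probes spaced 8 bp apart, plus one distant probe.
probes = [(100, 1.0), (108, 3.0), (116, 5.0), (500, 9.0)]
genes = [("g1", 100, 116), ("g2", 300, 400)]
print(gene_scores(probes, genes))
```

+ For a genome-scale array, sorting the probes once and using binary search per gene would avoid rescanning the full probe list for every gene.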
+ Gene annotation was from the ASAP database at the University of Wisconsin-Madison , for E. coli K-12 MG1655 version m56 ( Glasner et al. , 2003 ) . 
+ Data were graphed with ArrayStar® , version 2.1 ( DNASTAR , Madison , WI ) . 
+ The tiling array data was submitted to NCBI Gene Expression Omnibus . 
+ The accession number is GSE85334 . 
+ ChIP-Chip Assays
+ MG1655 galR-TAP ( AMD032 ) cells were grown in LB at 37 °C to an OD600 of ∼ 0.6 . 
+ ChIP-chip was performed as described previously ( Stringer et al. , 2014 ) . 
+ Data analysis was performed as described previously except that probes were ignored only if they had a score of < 100 pixels , indicating regions that are likely missing from the genome ( Stringer et al. , 2014 ) . 
+ Adjacent probes scoring above the threshold for being called as being in GalR-bound regions were merged , and the highest-scoring probe was selected as the `` peak position . '' 
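+ The merging of adjacent above-threshold probes and the choice of a peak position can be sketched as below. This is a minimal illustration, not the published analysis: "adjacent" is assumed to mean consecutive probes in position order, the threshold value is hypothetical, and the manual separation of closely spaced peaks is not modeled.

```python
def call_regions(probes, threshold):
    """Group consecutive above-threshold probes and pick the top-scoring one.

    probes: list of (position, score) tuples sorted by position
    """
    runs, current = [], []
    for pos, score in probes:
        if score > threshold:
            current.append((pos, score))
        elif current:
            # A below-threshold probe closes the current run.
            runs.append(current)
            current = []
    if current:
        runs.append(current)
    return [
        {
            "start": run[0][0],
            "end": run[-1][0],
            "peak": max(run, key=lambda probe: probe[1])[0],
        }
        for run in runs
    ]


# Hypothetical probe scores along a short stretch of genome.
probes = [(0, 1), (8, 5), (16, 9), (24, 6), (32, 1), (40, 7), (48, 2)]
for region in call_regions(probes, threshold=4):
    print(region)
```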
+ The closely spaced peaks upstream of mglB and galS were manually separated . 
+ The ChIP-chip data was submitted to the EBI Array Express repository . 
+ The accession number is E-MTAB-4903 . 
+ Identiﬁcation of an Enriched Sequence Motif from ChIP-Seq Data
+ For each peak position , we extracted genomic DNA sequence using the following formulae to determine the upstream and downstream coordinates : upstream coordinate : $U_P - ( ( U_P - U_{P-1} ) \cdot ( S_{P-1} / S_P ) )$ ; downstream coordinate : $D_P - ( ( D_{P+1} - D_P ) \cdot ( S_{P+1} / S_P ) )$ ; where S = probe score , U = genome coordinate corresponding to the upstream end of a probe , D = genome coordinate corresponding to the downstream end of a probe , and the subscripts identify the probe : P = peak probe , P − 1 = probe upstream of peak , and P + 1 = probe downstream of peak . 
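+ The two formulae above can be written as a short function. This sketch transcribes them literally as stated in the text, including the minus sign in the downstream formula; the list-based representation of probes and all names are assumptions made for illustration.

```python
def extraction_window(u, d, s, p):
    """Return (upstream, downstream) coordinates for the peak probe at index p.

    u: upstream-end coordinates of the probes, in genome order
    d: downstream-end coordinates of the probes, in genome order
    s: probe scores, in genome order
    """
    if not 1 <= p <= len(s) - 2:
        raise ValueError("peak probe needs a neighbour on each side")
    # Scale the distance to each neighbouring probe by that neighbour's
    # score relative to the peak score, as written in the text.
    upstream = u[p] - (u[p] - u[p - 1]) * (s[p - 1] / s[p])
    downstream = d[p] - (d[p + 1] - d[p]) * (s[p + 1] / s[p])
    return upstream, downstream


# Hypothetical 25-mer probes tiled every 8 bp; the middle probe is the peak.
u = [100, 108, 116]
d = [124, 132, 140]
s = [2.0, 4.0, 1.0]
print(extraction_window(u, d, s, 1))
```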
+ We used MEME ( version 4.11.2 , default parameters except any number of motif repetitions was allowed ) to identify an enriched sequence motif ( Bailey and Elkan , 1994 ) . 
+ ChIP-qPCR
+ MG1655 galR-FLAG3 ( AMD188 ) cells were grown in LB at 37 °C to an OD600 of 0.6 -- 0.8 . 
+ ChIP-qPCR was performed as described previously ( Stringer et al. , 2014 ) . 
+ The motifs in bold letters are also present in Table S2.
+ RESULTS
+ In silico Identification of Novel GalR Target Genes in E. coli
+ A consensus sequence of GalR binding sites from the 9 previously known functional operators in the gal regulon ( galE , galP , mglB , galS , and galR promoters ; Figure 1 ) appears to be a 16-bp hyphenated dyad symmetry sequence with the center between positions 8 and 9 : GTGNAANC.GNTTNCAC ( positions 1 to 16 , with N being any nucleotide ; Weickert and Adhya , 1993a ) . 
+ Genetic analysis showed that mutations at any of the positions 3 , 5 , 9 , and 15 ( labeled in bold ) create a functionally defective operator ( Adhya and Miller , 1979 ) . 
+ Therefore , we used a motif in which nucleotides at positions 3 , 5 , 9 , and 15 were fixed to search through the whole genome of E. coli ( NC_000913.3 ) ( Baba et al. , 2006 ) for putative GalR operators , allowing two mismatches at other non-N positions as described ( Qian et al. , 2012 ) . 
+ Thus , we found 165 potential GalR operators distributed across the genome ( Table S1 ) . 
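+ The search rule described above (positions 3 , 5 , 9 , and 15 fixed , N positions free , at most two mismatches elsewhere) can be sketched as a sliding-window scan. This is an illustration only: it scans a single strand, the function name is hypothetical, and it is not claimed to reproduce the 165 reported operators.

```python
CONSENSUS = "GTGNAANCGNTTNCAC"   # 16-bp consensus, N = any nucleotide
FIXED_POSITIONS = {3, 5, 9, 15}  # 1-based positions that must match exactly


def find_operators(sequence, max_mismatches=2):
    """Return (1-based start, matched sequence, mismatch count) for each hit."""
    sequence = sequence.upper()
    width = len(CONSENSUS)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        mismatches = 0
        accepted = True
        for pos, (expected, observed) in enumerate(zip(CONSENSUS, window), start=1):
            if expected == "N" or expected == observed:
                continue
            if pos in FIXED_POSITIONS:
                accepted = False  # a fixed position may never differ
                break
            mismatches += 1
            if mismatches > max_mismatches:
                accepted = False
                break
        if accepted:
            hits.append((start + 1, window, mismatches))
    return hits


# Hypothetical test sequence with one perfect site embedded in flanking bases.
print(find_operators("AAAAGTGTAACCGATTGCACTTTT"))
```

+ A genome-wide search would load the NC_000913.3 sequence from a FASTA file and would also need a decision on how to treat the reverse strand, which the text does not specify.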
+ Further analysis of the original 9 GalR-target operators sequences with critical information content was conducted ( Figure 1 ; Schneider and Mastronarde , 1996 ) . 
+ A unique alignment of 42 bp length was obtained ; the information content of the optimally aligned sites was Rsequence = 16.1 ± 0.7 bits/site for the 42 bp sequence range ( Shannon , 1948 ; Pierce , 1980 ; Schneider et al. , 1986 ) . 
+ The information content needed to find these 9 sites in the 4,641,652 bp E. coli genome ( NC_000913.3 ) is Rfrequency = 18.98 bits/site ; because Rsequence/Rfrequency = 0.85 ± 0.04 , the binding sites do not have enough information content for them to be located in the genome ( Schneider et al. , 1986 ; Schneider , 2000 ) . 
+ This result implies that there could be 66 ± 32 sites in the genome . 
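+ The figures quoted above follow from two short formulas, sketched here under the assumption that Rfrequency = log2( G / γ ) for a genome of G positions containing γ sites, and that the number of sites a given Rsequence can pick out is G / 2^Rsequence. The function names are hypothetical, and the ± uncertainty is not propagated.

```python
import math


def r_frequency(genome_size, n_sites):
    """Bits needed to locate n_sites among genome_size possible positions."""
    return math.log2(genome_size / n_sites)


def expected_sites(genome_size, r_sequence):
    """Number of sites that r_sequence bits of information can single out."""
    return genome_size / 2 ** r_sequence


GENOME_SIZE = 4_641_652  # bp, as quoted in the text

print(round(r_frequency(GENOME_SIZE, 9), 2))          # bits/site for the 9 known operators
print(round(16.1 / r_frequency(GENOME_SIZE, 9), 2))   # Rsequence / Rfrequency
print(round(expected_sites(GENOME_SIZE, 16.1)))       # implied number of sites
```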
+ As shown in Figure 4 , the sequence logo of the binding sites covers the DNase I protection segment ( Majumdar and Adhya , 1987 ; Schneider and Stephens , 1990 ) . 
+ There may be additional conservation near a DNase I-hypersensitive site in a major groove one helical turn from the central two major grooves bound by GalR ( − 16 and +17 ; Figure 4 ) . 
+ The sequence conservation in the center of the site at bases 0 and 1 exceeds the sine wave , indicating that GalR binds to non-B-form DNA 
+ ( Schneider , 2001 ) as was previously suggested ( Majumdar and Adhya , 1989 ) . 
+ An individual information weight matrix corresponding to positions − 20 to +21 of the logo in Figure 4 was created and scanned across the E. coli genome ( Schneider , 1997 ) . 
+ Sixty sites were identiﬁed that contain more than 9.4 bits , the lowest information content of the biochemically proven sites . 
+ The sequences of novel GalR predicted sites corresponding to the logo are summarized in Table 2 . 
+ Rfrequency for these sites in the genome is 16.24 bits/site , which is close to the observed 16.3 ± 0.1 bits/site from all the predicted genomic sites . 
+ Functional Analysis of the Putative GalR Binding Sites Using ChIP-chip Assays
+ For the functional analysis of the putative binding sites , a ChIP-chip assay was performed to detect GalR target sequences genome-wide in vivo ( Collas , 2010 ; Wade , 2015 ) . 
+ In this ChIP-chip assay the binding of C-terminally TAP ( tandem affinity purification ) - tagged GalR ( tagged at its native locus in an unmarked strain ) was mapped across the E. coli genome . 
+ The experimental data resulting from ChIP-chip analysis were validated by quantitative real-time PCR ( ChIP/qPCR ) . 
+ To demonstrate that the ChIP signal was not an artifact of the TAP tag , we constructed an unmarked derivative of E. coli MG1655 that expressed a C-terminally FLAG3-tagged GalR from its native locus . 
+ We selected six ( ytfQ , galE , purR , talB , cyaA , and chiP ) sites for validation , including ytfQ , talB , and cyaA that had not been described or predicted previously . 
+ In all cases , we detected signiﬁcant signal of GalR binding indicating that these are genuine sites of GalR binding ( Figure 5 ) . 
+ The inferred binding sites from ChIP-chip assays are listed in Table 3 . 
+ We identiﬁed 15 GalR-bound regions , four of which contain two operators . 
+ These include 8 known operators ( in galE , galP , galS , galR , chiP , and mglB ; Weickert and Adhya , 1993b ; Plumbridge et al. , 2014 ) . 
+ Thirteen of the 15 putative GalR-bound regions overlap an intergenic region upstream of a gene start . 
+ This is a strong enrichment over the number expected by chance ( only ∼ 12 % of the genome is intergenic ) . 
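+ One way to quantify the enrichment stated above is a binomial tail probability. The text does not say which test, if any, was used, so this is only a sketch under a simplifying assumption: each of the 15 regions is treated as an independent draw that overlaps intergenic sequence with probability 0.12.

```python
from math import comb


def binomial_tail(k, n, p):
    """Probability of observing at least k successes in n independent trials."""
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k, n + 1))


# 13 of 15 GalR-bound regions overlap intergenic sequence; about 12 % of the
# genome is intergenic (both figures from the text).
p_value = binomial_tail(13, 15, 0.12)
print(f"P(X >= 13) = {p_value:.2e}")
```

+ The independence assumption is crude, since bound regions span several probes and intergenic regions vary in length, so the result should be read as an order-of-magnitude indication only.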
+ Global Transcription Proﬁle in the Presence and Absence of GalR
+ Since both in silico investigation and ChIP-chip assays suggested that the regulatory role of GalR goes beyond D-galactose metabolism , we used transcriptome proﬁling to gain further insight into the impact of GalR on genome-wide transcription . 
+ To evaluate the effect of galR deletion on global gene expression patterns , we compared the ratio of RNA isolated from a ∆ galR mutant to that isolated from wild-type cells , using DNA tiling microarrays ( Tokeson et al. , 1991 ) . 
+ The results of the transcriptional analysis are displayed in the MAT plot shown in Figure 6 . 
+ For all analyses , we arbitrarily selected a stringent ratio cut-off of 3 . 
+ We identiﬁed 238 genes with values exceeding this cut-off ( Table S2 ) . 
+ These 238 genes are transcribed from 158 promoters . 
+ Three transcripts ( 5 genes ) of the 158 promoters are up-regulated ( GalR acting as a repressor ) and 155 transcripts ( 233 genes ) are down-regulated ( GalR acting as an activator ; Table S2 ) . 
+ Interestingly , several genes including mglB are dysregulated by GalR but fall outside of the cut-off range . 
+ All three ( galP , galP1 , and galP2 ) of the up-regulated promoters have adjacent operators . 
+ Of the 155 down-regulated promoters , 4 promoters contain adjacent operators and the remaining 151 do not . 
+ DISCUSSION
+ Using a combination of bioinformatic and experimental approaches we identiﬁed many putative novel GalR operators in the E. coli genome . 
+ As expected , several of these putative operators were identiﬁed by both information theory and ChIP-chip assays , demonstrating that they represent genuine GalR binding sites . 
+ Thus , we have substantially expanded the known GalR regulon . 
+ Surprisingly , our data suggest that GalR , a regulator of D-galactose metabolism , also regulates the expression of genes involved in other cellular processes . 
+ Interestingly , three of the putative novel GalR target genes -- cytR , purR , and adiY -- encode transcription factors , suggesting that GalR may be part of a more complex regulatory network . 
+ Moreover , putative GalR operators upstream of cytR and purR overlap with operators for CytR and PurR , respectively , indicating combinatorial regulation of these genes ( Meng et al. , 1990 ; Rolfes and Zalkin , 1990 ; Mengeritsky et al. , 1993 ) . 
+ Despite our identiﬁcation of GalR operators with high conﬁdence upstream of genes mentioned above , our expression microarray data show little or no regulation of these genes by GalR . 
+ We propose that regulation of these genes by GalR is condition-specific , requiring input from additional regulatory factors . 
+ Role of GalR in Gene Regulation
+ DNA tiling array analysis revealed that the transcription of a surprisingly large number of promoters ( 158 ) in E. coli is dysregulated by deletion of the galR gene . 
+ On the other hand , we identified 165 established or potential GalR operators in the chromosome , 76 of which are located between −200 and +400 bp from the tsp of promoters ( cognate ) , whereas the other 89 operators are not ( Table S1 ) . 
+ We called the former group of operators , `` Gene Regulatory Sites '' ( GRS , listed in Table 4 ) . 
+ Consistent with a previous proposal ( Macvanin and Adhya , 2012 ) , we believe that 89 non-cognate operators around the chromosome are playing an architectural role in chromosome organization . 
+ The unattached operators would be referred to as `` Chromosome Anchoring Sites '' ( CAS ) . 
+ Some of the sites may serve as both GRS and CAS . 
+ The 76 ( 46 % ) GRS and 89 ( 54 % ) CAS are shown in Table S1 . 
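+ The positional rule that separates GRS from CAS can be sketched as follows. This is an illustration under assumptions: an operator is represented by a single coordinate, each promoter by its tsp coordinate and strand, and the names are hypothetical. It assigns one label per operator, so it does not capture sites that may serve as both GRS and CAS.

```python
def classify_operator(operator_pos, promoters, upstream=200, downstream=400):
    """Label an operator GRS if it lies within -200 to +400 bp of any tsp.

    promoters: list of (tsp_position, strand) tuples, strand "+" or "-"
    """
    for tsp, strand in promoters:
        # Express the operator position relative to the tsp in the
        # direction of transcription.
        relative = operator_pos - tsp if strand == "+" else tsp - operator_pos
        if -upstream <= relative <= downstream:
            return "GRS"
    return "CAS"


# Hypothetical promoters on each strand.
promoters = [(1000, "+"), (5000, "-")]
for position in (850, 1500, 4700, 5150):
    print(position, classify_operator(position, promoters))

# Shares reported in the text: 76 GRS and 89 CAS out of 165 operators.
print(round(100 * 76 / 165), round(100 * 89 / 165))
```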
+ Seventy-six GRS include 9 previously known operators of the gal regulon ( see Figure 1 ) ; the other 67 , which control promoters , were not known previously . 
+ The discovery of new GRS indicates that GalR , a well-known regulator of D-galactose metabolism , also regulates the expression of other genes . 
+ Among the new GRS , 3 ( in yaaJ , purR , and ytfQ promoters ) were conﬁrmed by in vivo DNA-binding ( ChIP-chip assays ) as shown in Table 3 . 
+ The salient features of our ﬁndings presented in this paper are shown schematically in Figure 7 . 
+ Although we identiﬁed 158 transcripts whose expression was regulated by GalR , very few of these are associated with a putative GalR operator identiﬁed in silico and/or ChIP-chip assays , strongly suggesting that the majority of regulation by GalR occurs indirectly . 
+ Based on our earlier observation that GalR mediates mega-loop formation , we propose that long-range oligomerization of GalR indirectly regulates transcription by altering chromosome structure . 
+ There are at least three possible mechanisms for such regulation : indirect control , enhancer activity , and modulation of DNA superhelicity . 
+ In the indirect control model , GalR directly regulates another regulator , such as PurR or CytR , and the downstream regulator directly regulates other genes . 
+ The regulation by GalR is indirect , but occurs by a classical regulatory mechanism . 
+ In the enhancer activity model , GalR stimulates transcription of some target genes by binding to a distal site and forming an enhancer-loop with a protein bound to the promoter region . 
+ Examples of enhancer activity have been described before for some prokaryotic and many eukaryotic promoters ( Rombel et al. , 1998 ; Schaffner , 2015 ) . 
+ In the DNA superhelicity modulation model , GalR creates DNA topological domains by mega-loop formation and deﬁnes local chromosomal superhelicity by GalR-GalR interactions between distally bound dimers . 
+ The strength of a promoter is usually defined by the superhelical nature of the DNA ( Pruss and Drlica , 1989 ; Lim et al. , 2003 ) . 
+ We propose that GalR entraps different amounts of superhelicity in different topological domains and thus controls transcription of the constituent promoters . 
+ In the absence of GalR such domains are not formed resulting in a change in local DNA superhelicity , and thus a change in the strength of the constituent promoters . 
+ In this model , GalR protein indirectly regulates gene transcription as an architectural protein . 
+ We are currently studying the regional superhelicities in the entire chromosome in the presence and absence of GalR as well as the implication of genes affected by GalR , but independent of D-galactose metabolism ( Lal et al. , 2016 ) . 
+ AUTHOR CONTRIBUTIONS
+ ZQ : designed genome-wide sequence analysis , interpreted sequence analysis data and tiling array data ; AT and SL : executed tiling array experiments and data analysis ; XH : executed genome-wide sequence analysis ; TD : integrated tiling array and genome-wide sequence data ; AS and JW : executed ChIP-chip and ChIP-qPCR experiments and data analysis ; DL : data analysis ; TS : executed Information Theory and data analysis ; SA : organized and designed experiments , and data analysis . 
+ All authors contributed to the manuscript preparation . 
+ ACKNOWLEDGMENTS
+ This work was supported by the Intramural Research Program of the National Institutes of Health , the National Cancer Institute , and the Center for Cancer Research . 
+ The authors have no conﬂict of interest to declare . 
+ We thank the Wadsworth Center Applied Genomic Technologies Core Facility for assistance with microarrays for ChIP-chip assays . 
+ SUPPLEMENTARY MATERIAL
+ The Supplementary Material for this article can be found online at : http://journal.frontiersin.org/article/10.3389/fmolb.2016.00074/full#supplementary-material
\ No newline at end of file
--- a/data/TEXT_FILES/useful_txt/28174601.txt 0 → 100644
View file @27818a9
+++ b/data/TEXT_FILES/useful_txt/28174601.txt 0 → 100644
View file @27818a9
+ programed vanillin-sensing bacterium
+ Abstract 
+ Background : Lignin is a potential biorefinery feedstock for the production of value-added chemicals including vanillin . 
+ A huge amount of lignin is produced as a by-product of the paper industry , while cellulosic components of plant biomass are utilized for the production of paper pulp . 
+ In spite of vast potential , lignin remains the least exploited component of plant biomass due to its extremely complex and heterogenous structure . 
+ Several enzymes have been reported to have lignin-degrading properties and could be potentially used in lignin biorefining if their catalytic properties could be improved by enzyme engineering . 
+ The much needed improvement of lignin-degrading enzymes by high-throughput selection techniques such as directed evolution is currently limited , as robust methods for detecting the conversion of lignin to desired small molecules are not available . 
+ Results : We identified a vanillin-inducible promoter by RNAseq analysis of Escherichia coli cells treated with a sublethal dose of vanillin and developed a genetically programmed vanillin-sensing cell by placing the ` very green fluorescent protein ' gene under the control of this promoter . 
+ Fluorescence of the biosensing cell is enhanced significantly when grown in the presence of vanillin and is readily visualized by fluorescence microscopy . 
+ The use of fluorescence-activated cell sorting analysis further enhances the sensitivity , enabling dose-dependent detection of as low as 200 µM vanillin . 
+ The biosensor is highly specific to vanillin and no major response is elicited by the presence of lignin , lignin model compound , DMSO , vanillin analogues or non-specific toxic chemicals . 
+ Background
+ Plant biomass is a potential renewable raw material for sustainable production of biofuels and value-added chemicals . 
+ The three major constituents of plant biomass are cellulose ( 40 -- 43 % ) , hemicellulose ( 20 -- 27 % ) and lignin ( 20 -- 30 % ) . 
+ Huge amounts of lignin are produced as a by-product of the paper industry , while cellulosic components of plant biomass are utilized for the production of paper pulp . 
+ In the near future , biorefineries will generate substantial amounts of lignin by-products after converting plant cellulose to bioethanol , which have no significant use apart from burning for energy . 
+ Compared to cellulose , lignin has extremely heterogeneous aromatic building blocks that can potentially be converted into various value-added chemicals or precursors for the synthesis of commodity chemicals . 
+ Lignin could serve as a potential source of aromatics that can substitute fossil-derived consumer products [ 1 -- 3 ] . 
+ In spite of its vast potential , lignin remains the least exploited component of plant biomass due to its recalcitrant nature that is attributed to the extremely complex cross-linked three-dimensional structures of the lignin backbone [ 4 , 5 ] . 
+ Vanillin is the most lucrative lignin degradation product due to its higher cost and notable demand in the food , flavour and cosmetics industries . 
+ Other lignin degradation products like acetovanillone , vanillyl alcohol , syringaldehyde , guaiacol and eugenol also have potential industrial applications . 
+ Although once a common industrial practice , chemical conversion of lignin to vanillin is not widely used today due to the hazardous environmental impacts of chemical conversion methods . 
+ Only one major company is still producing vanillin from spent sulfite liquor by a chemical process [ 6 , 7 ] . 
+ The search for greener alternatives is leading to the development of chemical catalysts that can potentially lead to oxidative lignin degradation under mild conditions [ 8 -- 10 ] . 
+ Biocatalysts may play an important role as several micro-organisms are well known for recycling abundant lignin biomass in nature and a few natural enzymes have been reported with lignin-degrading properties [ 11 -- 17 ] . 
+ Reiter et al. demonstrated depolymerization of complex lignin into small amounts of aromatic monomeric compounds using a combination of Cα-dehydrogenase , β-etherase and glutathione lyase enzymes [ 15 ] . 
+ Studies suggest the production of small phenolic compounds ( acids , ketones and aldehydes ) via oxidative lignin degradation by white-rot fungi [ 18 -- 21 ] . 
+ Very recently , Salvachúa et al. [ 22 ] have reported the partial depolymerization of high-lignin content biorefinery stream using fungal secretomes containing high laccase and peroxidase activity in the presence of an aromatic-catabolic bacterium as a ` microbial sink ' . 
+ However , to date no enzyme has been reported to degrade lignin to the monomeric phenolic subunits with high efficiency . 
+ Development of robust lignin-degrading enzymes by engineering the catalytic efficiency of currently available enzymes would be a valuable step forward in implementing white biotechnology processes for the conversion of lignin biomass to value-added chemicals . 
+ Directed evolution is a protein engineering technique whereby extremely large numbers ( up to 10^10 ) of mutant enzymes are generated and rapidly screened for the desired characteristics [ 23 -- 25 ] . 
+ However , the application of this technique for developing effective lignin-degrading enzymes is limited due to the lack of an efficient high-throughput screening method that is essential for rapid screening of large numbers of mutants . 
+ Several reports have described the directed evolution of lignin-degrading enzymes such as laccase and peroxidase using plate-based colorimetric screening methods [ 26 -- 29 ] . 
+ These strategies have been useful in generating enzymes with desired characteristics such as higher redox potential , improved expression level , altered substrate specificity and organic solvent tolerance . 
+ However , these studies have mainly employed readily oxidizable colorimetric proxy substrates such as 2,2′-azino-bis ( 3-ethylbenzothiazoline-6-sulphonic acid ) ( ABTS ) , and activity on such substrates does not necessarily mean that the enzymes show a lignin-degrading phenotype . 
+ Direct detection of lignin degradation products is the ideal way to identify efficient lignin-degrading enzymes generated by directed evolution , but current product detection methods of choice such as GC/MS and LC/MS are time consuming and not suitable for high-throughput enzyme screening . 
+ Development of a biosensor that can detect a lignin degradation product by transducing the metabolite concentration to reporter gene expression would be useful in rapid phenotypic evaluation of specific product formation and identification of superior enzyme variants within an engineered enzyme library . 
+ Inducible regulator-based systems for gene expression are regulated by the presence of a specific small molecule inducer . 
+ The IPTG-inducible LacI promoter is the prototypical example of a small-molecule-inducible system and is widely used in hyper-expression of recombinant genes . 
+ The engineered LacI promoter-based system has also been used in signal processing and chromosomal visualization [ 30 , 31 ] . 
+ If a reporter gene is controlled by this inducible regulator , the presence of the inducing molecule can be detected by the phenotypic change due to the production of the reporter protein . 
+ While a few well-characterized small-molecule-inducible regulators ( LacI , AraC , TetR etc. ) are widely used in several applications , the development of additional inducible systems will expand their use in innovative areas of biotechnology , such as metabolic engineering . 
+ Detection of a target chemical using live cell biosensors would be a robust technique for directed evolution methodologies . 
+ Such small-molecule-inducible biosensors could be developed using DNA constructs that control the expression of a reporter gene in response to the presence of the specific target molecule . 
+ The utility of inducible microbial biosensors was recently demonstrated through monitoring glucarate production in a heterologous glucarate biosynthesis pathway and identification of superior enzyme variants using a live cell glucarate sensor [ 32 ] . 
+ This study suggests the potential of small-molecule-inducible biosensors in screening enzyme libraries for the production of the inducing chemicals . 
+ Here , we describe the development and characterization of an inducible whole-cell biosensing system ( i.e. an engineered E. coli cell ) that can detect the presence of vanillin , a commercially attractive lignin degradation product ( Fig. 1 ) . 
+ This biosensor has potential use in the screening of engineered enzymes that could convert lignin to vanillin . 
+ Methods
+ Chemicals and reagents
+ All chemicals including kraft lignin , vanillin , vanillin analogues , benzaldehyde , DMSO and acrylic acid used in this study were purchased from Sigma-Aldrich ( USA ) . 
+ The lignin model compound ( guaiacylglycerol-beta-guaiacyl ether ) was from Tokyo Chemical Industries Co. , Ltd. ( Japan ) . 
+ Identification of up-regulated genes by RNAseq analysis
+ Vanillin treatment and RNA isolation
+ Escherichia coli BL21 cells were cultured in LB medium at 37 °C and exposed to increasing concentrations of vanillin at the mid-log phase ( OD600 = 0.5 ) , and the sublethal dose of vanillin was determined from the growth curves . 
+ For RNAseq experiments , fresh E. coli BL21 culture was exposed to 0 and 5 mM vanillin ( sublethal concentration ) at the mid-log phase and the cells were collected at 0 , 1 , 2 and 3 h post exposure . 
+ Total RNA was extracted from the E. coli cells using RNeasy Mini kit ( Qiagen ) following the manufacturer 's protocol , and rRNA was removed using ribo-zero rRNA removal kit ( Epicentre ) following the manufacturer 's protocol . 
+ The experiment was done in triplicate , the RNA was quantified and quality was confirmed from the high RIN values ( > 9.0 ) in the Bioanalyzer quality analysis . 
+ A gene-based expression matrix was generated from the BAM files using Cuffnorm v2.2.0 , a program that is part of Cufflinks [ 33 ] . 
+ Cuffnorm was run with the options `` --library-type fr-unstranded '' and `` --library-norm-method classic-fpkm '' . 
+ The resulting expression matrix is normalized for library size and the values are represented as FPKM ( fragments per kilobase of exon per million fragments mapped ) . 
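+ As a rough illustration of the FPKM unit only, the definition above can be written as a one-line calculation. This is a minimal sketch, not Cuffnorm's implementation (which applies additional corrections); the function name and the example numbers are hypothetical.

```python
def fpkm(fragments, gene_length_bp, total_mapped_fragments):
    """Fragments per kilobase of exon per million fragments mapped.

    Textbook definition only; the numbers used below are made up.
    """
    if gene_length_bp <= 0 or total_mapped_fragments <= 0:
        raise ValueError("length and library size must be positive")
    kilobases = gene_length_bp / 1_000
    millions_mapped = total_mapped_fragments / 1_000_000
    return fragments / (kilobases * millions_mapped)


# Hypothetical gene: 500 fragments, 2 kb long, library of 10 million fragments.
example_value = fpkm(500, 2_000, 10_000_000)  # 500 / (2 * 10)
```

+ Dividing by both gene length and library size is what makes values comparable across genes and across samples.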
+ Hierarchical clustering
+ Hierarchical clustering was performed on the FPKM expression matrix in R v3.1.0 . 
+ The expression matrix was first log2-transformed , and the distance matrix was then computed from Euclidean distances using the dist function of R . 
+ The Spearman correlation was then calculated using the cor function and plotted using the heatmap.2 function from the gplots package from CRAN . 
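+ The three numerical steps named above (log2 transform, Euclidean distance, Spearman correlation) can be sketched in plain Python. The study used R's dist and cor functions; this is only an illustrative re-implementation, and the pseudocount added before taking logarithms is an assumption, since the text does not say how zero FPKM values were handled.

```python
import math


def log2_transform(matrix, pseudocount=1.0):
    """Log2-transform an expression matrix (pseudocount is an assumption)."""
    return [[math.log2(v + pseudocount) for v in row] for row in matrix]


def euclidean(a, b):
    """Euclidean distance between two equal-length profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(a, b):
    """Pearson correlation of two equal-length, non-constant vectors."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def spearman(a, b):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    return pearson(average_ranks(a), average_ranks(b))
```

+ Spearman correlation depends only on ranks, so it is unchanged by the log transform; the transform matters for the Euclidean distances.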
+ Differential expression analysis
+ Cuffdiff v2.1.1 , a program that is part of Cufflinks , was used to identify differentially expressed genes at each time point [ 33 ] . 
+ Default parameters were used except for the options `` --multi-read-correct '' and `` --max-bundle-frags 100000000 '' . 
+ A threshold of FDR < 0.05 and absolute fold change > 2.0 were used for significance . 
+ SAM ( Significance Analysis of Microarrays ) was used to identify genes that were differentially expressed across time points . 
+ This analysis was performed in R v3.1.0 using the samr package from CRAN with the following options : resp.type = `` Two class unpaired timecourse '' , nperms = 100 and time.summary.type = `` slope '' . 
+ Genes having a log-2 fold change > 2.0 were identified as up-regulated , while those having a log-2 fold change smaller than − 2 were identified as down-regulated . 
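+ The up/down classification rule stated above can be expressed as a small helper. This is a sketch under stated assumptions: the function names are hypothetical, and the pseudocount in the fold-change helper is an assumption added only to avoid division by zero; it is not described in the text.

```python
import math


def log2_fold_change(treated, control, pseudocount=1.0):
    """Log2 ratio of treated to control expression (pseudocount assumed)."""
    return math.log2((treated + pseudocount) / (control + pseudocount))


def classify(log2fc, threshold=2.0):
    """Label a gene using the thresholds stated in the text.

    log2 fold change > 2 is 'up', < -2 is 'down', anything else 'unchanged'.
    """
    if log2fc > threshold:
        return "up"
    if log2fc < -threshold:
        return "down"
    return "unchanged"
```

+ Note that the comparison is strict, so a gene with a log2 fold change of exactly 2.0 is not called up-regulated under this rule.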
+ Plasmid construction and biosensor development
+ Prediction of putative promoter regions and plasmid construction
+ The putative promoter regions of the top seven up-regulated genes were arbitrarily predicted to be located within the first 300 bp of the non-coding region immediately upstream of the up-regulated genes ( Additional file 1 : Tables S1 , S2 ) . 
+ The putative promoters were amplified by PCR using suitable infusion cloning primers . 
+ The amplified products were cloned upstream of the very green fluorescence protein ( vGFP ) gene [ 34 ] in a customized plasmid construct developed in pUC19 backbone by replacing ~ 600-bp nucleotides after the origin of replication ( including the lac promoter sequence ) by the vGFP gene , using Infusion HD cloning kit ( Clontech Laboratories ) . 
+ The predicted endogenous ribosome binding sites ( RBS ) were replaced by a strong g10 RBS sequence ` tttaactttaagaaggagatatacat ' [ 32 ] . 
+ The final plasmid constructs contain an ampicillin resistance gene , E. coli origin of replication and a vanillin-inducible putative promoter region followed by the RBS and the vGFP gene ( Fig. 1 ; Additional file 1 : Figure S3 ) . 
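+ The "first 300 bp immediately upstream" rule used for the putative promoters can be sketched as follows. This is an illustration only: the coordinate convention (0-based, half-open), the function names and the toy sequence are assumptions, and the sketch ignores genome circularity and does not check whether the window overlaps a neighbouring coding region.

```python
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Reverse complement of a DNA string (A/C/G/T only)."""
    return seq.translate(_COMPLEMENT)[::-1]


def upstream_region(genome, start, end, strand, length=300):
    """Return up to `length` bp immediately upstream of a gene.

    `start` and `end` are 0-based, half-open coordinates on the forward
    strand (an assumed convention). For a '-' strand gene the upstream
    region lies after `end` and is returned reverse-complemented, so the
    result always reads 5' to 3' towards the gene.
    """
    if strand == "+":
        return genome[max(0, start - length):start]
    if strand == "-":
        return reverse_complement(genome[end:end + length])
    raise ValueError("strand must be '+' or '-'")


# Toy example with a 3-bp window on a made-up 12-bp sequence.
toy_genome = "ATGCATGACGTT"
plus_example = upstream_region(toy_genome, 4, 7, "+", length=3)
minus_example = upstream_region(toy_genome, 4, 7, "-", length=3)
```

+ Returning the minus-strand window as a reverse complement keeps the promoter in the orientation needed for cloning upstream of a reporter gene.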
+ Biosensor development and selection
+ Seven live cell biosensors ( Lcb1 -- Lcb7 ) were developed by transforming chemically competent E. coli BL21 cells with seven plasmid constructs containing different putative promoter sequences ( Fig. 1 ; Additional file 1 : Table S2 ) . 
+ The live cell biosensors were grown in LB medium at 37 °C up to mid-log phase followed by overnight induction with 5 mM vanillin . 
+ The 5 mM final vanillin concentration was achieved by 400 times dilution of 2.0 M stock solution in DMSO ; a set of control experiments was done without the addition of vanillin but in the presence of an equivalent amount of DMSO . 
+ The cells were collected by centrifugation , washed with phosphate-buffered saline ( PBS ) and resuspended in the same buffer to a cell concentration of OD600 = 1.0 . 
+ Expression of the vGFP was estimated by measuring green fluorescence ( Ex/Em = 488/509 nm ) of 100 µl resuspended cells ( OD600 = 1.0 ) using a multilabel plate reader ( PerkinElmer 2104 ) , and the level of induction was calculated from the fluorescence ratio of induced to uninduced cells of each live cell biosensor . 
+ High induction level of the best biosensor ( with the highest increase of fluorescence ) was confirmed by measuring the green fluorescence of induced and uninduced cells by fluorescence-activated cell sorting ( FACS ) analysis using BD FACSAria cell sorter ( BD Biosciences ) and the sensor was selected for further characterization . 
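+ Two small calculations in this subsection can be checked directly: the 400-fold dilution of the 2.0 M stock, and the induction level defined as the fluorescence ratio of induced to uninduced cells. The sketch below uses hypothetical function names, and the fluorescence readings in the example are made-up numbers, not data from the study.

```python
def final_concentration_mM(stock_M, dilution_factor):
    """Final concentration in mM after diluting a molar stock."""
    return stock_M * 1000.0 / dilution_factor


def solvent_percent_v_v(dilution_factor):
    """Carrier solvent carried over by the dilution, as % (v/v)."""
    return 100.0 / dilution_factor


def fold_induction(induced, uninduced):
    """Fluorescence ratio of induced to uninduced cells."""
    if uninduced <= 0:
        raise ValueError("uninduced fluorescence must be positive")
    return induced / uninduced


vanillin_mM = final_concentration_mM(2.0, 400)   # 2.0 M stock, 400x dilution
dmso_percent = solvent_percent_v_v(400)          # DMSO in the culture
example_ratio = fold_induction(430.0, 100.0)     # hypothetical readings
```

+ The same dilution leaves 0.25 % ( v/v ) DMSO in the culture, which is why the uninduced control receives an equivalent amount of DMSO.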
+ Characterization of the selected live cell biosensor
+ Sensitivity of the live cell biosensor to vanillin concentration
+ The relationship between vanillin concentration and the expression of the fluorescent reporter was evaluated . 
+ 1 ml overnight culture of the live cell biosensor was inoculated in 100 ml LB medium and cultured at 37 °C with constant shaking at 175 rpm until the OD600 reached 0.5 . 
+ Then the culture was split into twelve 5-ml portions and induced ( in duplicate ) for 20 h with 0 , 0.2 , 0.5 , 1.0 , 3.0 and 5.0 mM vanillin . 
+ The cells were collected by centrifugation , washed with phosphate-buffered saline ( PBS ) and resuspended in the same buffer . 
+ A portion of the cell suspension was further diluted to prepare a 2-ml sample with a final cell concentration of OD600 = 0.1 . 
+ Response of the live cell biosensor to various vanillin concentrations was estimated from median fluorescence of the cells measured by FACS analysis using a BD FACSAria cell sorter ( BD Biosciences ) . 
+ The experiment was repeated three times independently and the average fluorescence of the biosensor treated with individual vanillin doses was calculated after normalizing fluorescence of the control to 100 . 
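+ The normalization described above (control set to 100 within each experiment, then averaging across independent experiments) might look like the sketch below. The data layout, the function names and the numbers are hypothetical; only the arithmetic follows the text.

```python
from statistics import mean


def normalize_to_control(sample, control):
    """Express a fluorescence value relative to a control set to 100."""
    return 100.0 * sample / control


def mean_normalized_response(experiments, control_dose=0.0):
    """Average normalized fluorescence per dose across experiments.

    `experiments` is a list of dicts mapping vanillin dose (mM) to median
    fluorescence; each dict must contain the untreated control dose.
    Each experiment is normalized to its own control before averaging.
    """
    doses = sorted(experiments[0])
    return {
        dose: mean(
            normalize_to_control(exp[dose], exp[control_dose])
            for exp in experiments
        )
        for dose in doses
    }


# Hypothetical median fluorescence values from two experiments.
made_up = [
    {0.0: 200.0, 5.0: 900.0},
    {0.0: 100.0, 5.0: 450.0},
]
summary = mean_normalized_response(made_up)
```

+ Normalizing within each experiment before averaging removes day-to-day differences in instrument gain, so a value of 450 reads directly as a 4.5-fold increase over the control.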
+ Cross‑reactivity testing
+ Cross-reactivity of the live cell biosensor was tested against a panel of potential inducing compounds including various lignin degradation products ( acetosyringone , acetovanillone , guaiacol , syringaldehyde , vanillic acid and vanillyl alcohol ) , kraft lignin , dimeric lignin model compound ( guaiacylglycerol-beta-guaiacyl ether ) , DMSO , benzaldehyde , veratraldehyde and a non-specific toxic chemical ( acrylic acid ) . 
+ The live cell biosensor was grown to mid-log phase and treated with three different concentrations of each chemical individually . 
+ The cultures were grown for 20 h and cells were collected by centrifugation . 
+ To study its performance in real lignin-degrading condition , the biosensor was also treated with two mixtures : ( 1 ) 5 mg/ml alkaline kraft lignin and 5 mM vanillin and ( 2 ) 5 mg/ml alkaline kraft lignin and 5 mM each of acetosyringone , acetovanillone , guaiacol , syringaldehyde , vanillyl alcohol and vanillin . 
+ Cross-reactivity of the biosensor to the individual chemicals and their mixtures was assessed from the fluorescence of the cells measured by FACS analysis of the samples after washing and diluting with PBS . 
+ The experiment was repeated three times independently in duplicate , and the average fluorescence of the vanillin-sensing cells treated with individual chemicals was calculated after normalizing fluorescence of the control to 100 . 
+ Fluorescence microscopy
+ Increased fluorescence of the vanillin-induced live cell biosensor was visualized under a fluorescence microscope and compared with the fluorescence of untreated and non-specific chemical-treated biosensor . 
+ The biosensor was treated with 5 mM vanillin , guaiacol , acrylic acid or 5 mg/ml lignin and grown for 20 h ; one control was prepared without treatment with any chemical . 
+ The cells were washed with PBS and diluted to OD600 = 1.0 . 
+ One drop of the cell suspension was placed on a microscope slide , air dried and covered with a coverslip . 
+ All samples were observed under the AxioImager Z1 upright fluorescence microscope ( Zeiss ) using a 63 × oil immersion lens and imaged with 500 ms exposure time . 
+ Mean intensity of the cells was measured using Fiji software . 
+ Toxicity test
+ Toxicity of various lignin degradation products and the non-specific chemical acrylic acid was determined by growing the live cell biosensor in the presence of various concentrations of each chemical . 
+ LB medium containing 0.1 mg/ml ampicillin was inoculated with 1 % ( v/v ) overnight culture of the live cell biosensor . 
+ 2 M stock solutions of acrylic acid ( in water ) and the lignin degradation products including vanillin , vanillic acid , vanillyl alcohol , syringaldehyde , guaiacol , acetovanillone and acetosyringone ( in DMSO ) were added immediately to obtain the final concentrations of 2.5 , 5.0 , 10.0 and 20.0 mM . 
+ A control was prepared without the addition of any chemical . 
+ The cells were grown at 37 °C with constant shaking at 175 rpm and growth was monitored by measuring OD600 at regular time intervals . 
+ The experiment was repeated three times independently and the growth curves were obtained by plotting average cell density against time . 
+ Toxicity of the chemicals at each concentration was estimated by comparing the growth curve with the control in which no chemical was added . 
+ Results
+ Vanillin is toxic to E. coli at high concentrations [ 35 ] . 
+ Treatment with vanillin showed that the growth of E. coli was significantly inhibited at concentrations ≥ 5 mM and that the cells started to recover 2.5 h post treatment with 5 mM vanillin ( Additional file 1 : Figure S1 ) . 
+ To identify genes that are regulated to mediate the response to vanillin exposure , we carried out RNAseq analysis of E. coli cells treated with 5 mM vanillin [ 36 ] . 
+ Significant variations in global gene expression profiles were observed between vanillin-treated and control samples collected at different time points for the RNAseq experiment ( Additional file 1 : Figure S2 ) . 
+ Differentially expressed genes were further identified by comparing RNA levels of vanillin-treated cells with those of untreated cells at individual time points . 
+ This analysis identified 759 E. coli genes that were differentially expressed across all time points , of which 725 genes were down-regulated and 34 genes were up-regulated . 
+ There was no clear functional clustering of the up- or down-regulated genes . 
+ Several genes encoding inner membrane proteins such as ygbE , mrp , ydjX and yjgN were down-regulated by vanillin treatment ; however , their precise functions remain unknown . 
+ Analysis of the top seven up-regulated genes suggests their association with multiple physiological functions including osmoprotection , stress response and heavy metal detoxification ( Additional file 1 : Table S1 ) . 
+ The top two up-regulated genes yjhD and yijF have unknown functions , while the three up-regulated genes ydcI , yeiW and sodC potentially contribute to heavy metal detoxification and oxidative stress defence . 
+ The fourth highest up-regulated gene proA is involved in the biosynthesis of the osmoprotective amino acid proline , and the other upregulated gene higA encodes an antitoxin of the HigB -- HigA toxin -- antitoxin system . 
+ Locations of the top seven up-regulated genes within the E. coli genome and their upstream/downstream sequences were manually investigated ( Additional file 1 : Table S1 ) . 
+ Promoter regions were arbitrarily predicted to be located within the first 300 bp of the non-coding region immediately upstream of the up-regulated genes . 
+ The putative promoter regions from these top seven up-regulated genes ( Additional file 1 : Table S2 ) were cloned individually upstream of the vGFP gene in a customized plasmid , and seven live cell biosensors ( Lcb1 -- Lcb7 ) were generated by transforming E. coli BL21 cells with these plasmid constructs ( Fig. 1 ; Additional file 1 : Figure S3 ) . 
+ The vGFP gene of each biosensor was overexpressed by inducing with 5 mM vanillin and the production of the vGFP protein was estimated from the fluorescence of the overnight induced cells ( Additional file 1 : Table S3 ) . 
+ The sensors Lcb4 and Lcb5 showed high levels of fluorescence in the presence of vanillin , although uninduced cells also showed background fluorescence due to the leaky nature of the promoters . 
+ Fold induction of these biosensors was calculated from the fluorescence ratio of induced to uninduced biosensors measured by FACS analysis . 
+ The biosensor Lcb5 constructed with the putative promoter region upstream of the yeiW gene showed higher fluorescence enhancement ( 4.3 fold ) compared to Lcb4 that was made with the putative promoter region upstream of the proA gene . 
+ Lcb4 showed high background fluorescence with 1.8-fold fluorescence enhancement . 
+ The biosensor Lcb5 ( hereafter termed ` vanillin-sensing cell ' or VSC biosensor ) was further characterized for sensitivity to vanillin , cross-reactivity and toxicity towards lignin , lignin model compounds , solvents , potential lignin degradation products , vanillin analogues and non-specific toxic chemicals . 
+ Sensitivity of VSC biosensor to vanillin
+ The VSC biosensor was then tested against different concentrations of vanillin to determine its detection threshold . 
+ The biosensor responds to different concentrations of vanillin in a dose-dependent manner ( Fig. 2 ) . 
+ FACS analysis showed increased fluorescence of the biosensor treated with vanillin at a concentration as low as 200 μM . 
+ The average cell fluorescence increases with increasing vanillin concentration , with a maximum ~ 4.5-fold increase observed at 5.0 mM ( Fig. 2 ) . 
+ Further increases in vanillin concentration adversely affect cell growth due to toxicity . 
+ The observed broadening of peaks in the 0.2 -- 3 mM samples may reflect the presence of mixed populations that comprise cells with different levels of vGFP expression , while in the 5 mM sample all cells have shifted to an `` on '' state with optimum expression of the vGFP protein . 
+ Increased fluorescence in the cells treated with 5.0 mM vanillin was clearly visualized by fluorescence imaging ( Fig. 3 ) , with mean fluorescence intensity being ~ 3.5-fold higher than that of untreated cells . 
+ Specificity of VSC biosensor
+ The VSC biosensor was next assayed against various lignin degradation products and kraft lignin that would be present in lignin-degrading reaction systems , along with a dimeric lignin model compound that is often used as a substrate to test potential lignin-degrading properties of enzymes . 
+ Cross-reactivity was also tested against DMSO , benzaldehyde and acrylic acid to rule out fluorescence enhancement by solvent , non-specific aromatic aldehydes and a non-specific toxic chemical , respectively . 
+ The VSC biosensor does not show any significant response to high concentrations of other potential lignin degradation products ( Fig. 4 ) or non-specific chemicals ( Fig. 5 ) with the exception of syringaldehyde and vanillic acid , both showing ~ 1.6-fold fluorescence enhancement compared to a 4.5-fold fluorescence enhancement by vanillin . 
+ This cross-reactivity may be related to the structural similarity of these compounds with vanillin . 
+ No significant fluorescence enhancement was noticed when the biosensor was treated with DMSO , which is used to dissolve lignin and lignin degradation products . 
+ No cross-reactivity was observed when the cells were treated with lignin or the dimeric lignin model compound ( guaiacylglycerol-beta-guaiacyl ether ) that is often used to study enzymatic lignin degradation [ 11 , 12 , 15 ] . 
+ Response of the biosensor to 5 mM vanillin was not abruptly affected by the presence of 5 mg/ml alkaline kraft lignin alone or in combination with 5 mM each of acetosyringone , acetovanillone , guaiacol , syringaldehyde and vanillyl alcohol . 
+ However , about 5 -- 10 % less fluorescence was observed when the sensor was treated with the mixtures in comparison to treatment with vanillin alone ( Fig. 5 ) . 
+ Fluorescence of the VSC biosensor did not change upon treatment with 5 mM acrylic acid , a chemical toxic to E. coli [ 37 ] . 
+ This observation disfavours fluorescent enhancement by any non-specific toxicity-induced overexpression of the vGFP gene . 
+ Fluorescence imaging of the biosensor treated with acrylic acid , guaiacol and lignin also showed no fluorescence enhancement , which confirms no cross-reactivity of the sensor with these chemicals ( Fig. 3 ) . 
+ Toxicity of potential lignin degradation products to VSC biosensor
+ The toxicity of various lignin degradation products towards the biosensing cells was studied to understand potential inhibition of cell growth with successful lignin degradation ( Fig. 6 ) . 
+ Although sublethal induction is the basis of vanillin-induced fluorescence enhancement of the VSC biosensor , it would not be able to detect positive mutants developed by directed evolution if high toxicity of any lignin degradation product suppresses cell growth . 
+ With the exception of vanillin and vanillic acid , all the major lignin degradation products showed minimal toxicity to the biosensor when treated with up to 20 mM concentrations . 
+ Only a slight inhibition was observed at high concentrations ( 20 mM ) of syringaldehyde . 
+ As expected , vanillin inhibited the growth of the biosensor at 5 mM concentration but the cells recovered from initial toxicity and entered log phase after 6 h . 
+ However , the biosensing cells could not recover within the study time ( 9 h ) from growth inhibition at vanillin concentrations of 10 mM or higher . 
+ The toxicity of vanillic acid was very similar to that of vanillin , in agreement with other studies on the antimicrobial activity of vanillin and vanillic acid on E. coli [ 35 , 38 , 39 ] . 
+ A previous study also showed the complete suppression of E. coli growth in the presence of 15 mM vanillin [ 35 ] . 
+ Friedman et al. [ 39 ] have shown that phenolic benzaldehyde and benzoic acid compounds have significant antimicrobial activity against E. coli , whereas benzoic acid ester does not . 
+ This observation also explains the inhibition of E. coli growth by vanillin , syringaldehyde and vanillic acid but not by guaiacol , acetovanillone or acetosyringone . 
+ Davidson and Naidu reported that the antimicrobial activity of a phenolic compound depends mainly on its chemical structure and concentration , which supports our observations [ 40 ] . 
+ We also studied the toxicity of various concentrations of acrylic acid , a non-specific chemical known to be toxic to E. coli [ 37 ] . 
+ Growth inhibition of the VSC biosensor by acrylic acid was similar to that observed with vanillin ; it showed substantial toxicity at 5 mM concentration and no growth was observed at higher concentrations . 
+ Discussion
+ Lignin can potentially be converted to valuable aromatics such as vanillin , by controlled enzymatic catalysis . 
+ While natural enzymes have great potential , their performance could be further improved using directed evolution approaches . 
+ In addition to other enzymes , several natural and engineered laccases and peroxidases have been studied for enzymatic conversion of lignin to vanillin [ 11 , 14 , 41 ] . 
+ In spite of vast research in this area , no single enzyme has been reported to convert actual lignin substrates to their monomeric phenolic subunits . 
+ Remarkably , there are very few reports of selecting engineered enzymes using genuine lignin substrates . 
+ This is partially due to unavailability of high-throughput screening tools to detect lignin degradation and also because lignins have extremely heterogeneous structures with versatile chemical linkages that are least likely to be cleaved by the action of a single enzyme [ 13 , 42 ] . 
+ Considering the complexity of lignin structures , future research may be directed towards simultaneous evolution of multiple enzymes or a multi-enzyme pathway using high-throughput screening tools like the live cell vanillin sensor described here . 
+ In this respect , the multi-enzyme system described by Reiter et al. [ 15 ] is particularly applicable . 
+ It was possible to release a small amount of lignin monomers from complex lignin structures using a combination of Cα-dehydrogenase , β-etherase and glutathione lyase enzymes . 
+ Salvachúa et al. [ 22 ] reported lignin depolymerization by fungal secretomes containing a high level of laccase and peroxidase enzymes . 
+ The VSC biosensor described here will be a useful tool in selecting vanillin-synthesizing enzymes from both metagenomic and mutant libraries . 
+ Developing enzymes for the conversion of lignin to vanillin would be of particular interest as vanillin is the most important lignin degradation product due to its large-scale use in the food , flavour and cosmetic industries . 
+ Induction of the putative E. coli promoter used in our biosensor is vanillin specific and no vanillin analogue or non-specific toxic chemical ( like acrylic acid ) can induce the expression of the vGFP gene under the control of this promoter , which is particularly interesting considering the absence of a known vanillin metabolism pathway in native E. coli [ 43 ] . 
+ However , vanillin 's mode of antimicrobial activity may explain this ambiguity . 
+ This comes mainly from its ability to damage the plasma membrane of the microbial cells through interaction with the lipids or proteins , which cause subsequent loss of the ionic gradient across the membrane and inhibition of bacterial respiration [ 35 , 44 ] . 
+ A study using propidium iodide staining suggests that a significant proportion of E. coli cells remain alive even after treatment with 50 mM vanillin , although vanillin can completely arrest E. coli growth at a concentration of 15 mM , indicating that microbial growth inhibition by vanillin is bacteriostatic in nature rather than bactericidal [ 35 ] . 
+ This report also showed that E. coli can maintain partial potassium gradients after exposure to 50 mM vanillin for 40 min ; vanillin treatment in this condition completely dissipates potassium ion gradients of Lactobacillus plantarum . 
+ Collectively , these observations suggest that the extent of E. coli membrane damage caused by vanillin is relatively less severe , and that when exposed to sublethal concentrations of vanillin , E. coli may cope with the stress by reestablishing ion gradients by alternative means , without vanillin metabolism . 
+ Although we can not establish any functional group in the up-regulated genes identified by RNAseq analysis of vanillin-treated E. coli cells , the functions of the top seven up-regulated genes imply association with osmoprotection , metal ion transport and heavy metal toxicity ( Additional file 1 : Table S1 ) . 
+ The up-regulated gene ydcI encodes a putative LysR-type DNA-binding transcriptional regulator . 
+ The exact function of ydcI protein is not known yet but other members of LysR-type transcriptional regulators are involved in the expression of various unrelated proteins including sodium -- hydrogen antiporter and proteins involved in zinc homeostasis and oxidative stress defence [ 45 , 46 ] . 
+ The proteins encoded by the other two up-regulated genes yeiW and sodC also play some role in metal ion detoxification and oxidative stress defence . 
+ The fourth highest up-regulated gene proA encodes a subunit of glutamate-5-semialdehyde dehydrogenase and gamma-glutamyl kinase-GP-reductase multi-enzyme complex that catalyses the first step in the synthesis of the osmoprotective amino acid proline [ 47 , 48 ] . 
+ The VSC biosensor responds in a dose-dependent manner and upon induction with 5.0 mM vanillin fluorescence of the sensor is increased more than 4-fold but further increase of signal is not possible as the E. coli can not grow at higher vanillin concentrations . 
+ Detectability within a relatively narrow range of vanillin concentrations may be a limitation in selecting for mutants that can produce very low ( < 0.5 mM ) or very high ( > 5 mM ) concentrations of vanillin . 
+ Transposing the genetic sensing construct into a vanillin-tolerant microorganism could potentially address toxicity issues . 
+ In this respect , the top seven up-regulated genes upon vanillin exposure of E. coli cells are conserved within members of the Enterobacteriaceae family including several strains from the genera Escherichia , Shigella , Salmonella and Enterobacter . 
+ The large molecular weight of lignin precludes ready uptake into microbial cells . 
+ The VSC biosensor will likely find optimal use in screening extracellular enzymes that convert lignin to vanillin ( Fig . 1 ) . 
+ Selection experiments could be carried out on lignin-containing agar plates or using emulsion encapsulation methodology [ 49 ] . 
+ Additionally , this system could find use in screening enzymes that produce vanillin from smaller cell-permeable precursors . 
+ Examples include vanillyl alcohol oxidase and carboxylic acid reductase that produce vanillin from precursors like creosol , vanillylamine and vanillic acid [ 50 , 51 ] . 
+ Versatility of live cell biosensors is also restricted by relatively short cellular lifespans , requirement of specific conditions for growth and survival of microbial systems and restriction of using engineered or live microorganisms in various end products . 
+ However , these issues will not create any major challenge in using this biosensor as a host cell for screening lignin-degrading enzyme libraries . 
+ Inducible regulator-based whole-cell sensing systems are established as valuable analytical tools , which allow highly specific detection of target chemicals using fluorescent or bioluminescent reporters [ 32 , 52 -- 56 ] . 
+ While most of these biosensors were developed for environmental microbiology or bioremediation applications , inducible promoter-based sensing systems could be extremely useful in metabolic engineering applications including high-throughput screening of engineered enzyme libraries . 
+ Despite their vast potential , only a handful of small-molecule-inducible regulators ( e.g. LacI ) are well characterized and repeatedly used for a diverse range of applications . 
+ Development of additional inducible regulators can potentially provide newer biotechnology tools with innovative applications . 
+ The product inhibition approach used here could be potentially applied to identify putative promoter regions and develop biosensors for any chemical that exhibits partial toxicity to a microorganism with known genome sequence . 
+ Using a similar approach , Rogers et al. [ 32 ] developed four genetically encoded biosensors that respond to acrylate , glucarate , erythromycin and naringenin , and demonstrated the usage of the glucarate biosensor for selecting superior enzyme variants from the glucarate biosynthesis pathway . 
+ Conclusions
+ Enzymatic valorization of lignin will most likely be achieved through engineering of multi-enzyme pathways using a directed evolution approach combined with an effective high-throughput screening technique . 
+ The VSC biosensor can provide such a tool , detecting as low as 200 µM vanillin by FACS . 
+ This biosensor does not show any cross-reactivity to lignin , vanillin analogues or nonspecific toxic chemicals . 
+ No major lignin degradation product showed significant toxicity towards the biosensor when treated with up to 20 mM concentration . 
+ We propose the use of this biosensor as a host cell for screening of lignin-degrading enzymes from randomized libraries and metagenomic samples . 
+ Authors' contributions
+ BS conducted the experiment for development and characterization of the vanillin-sensing cell ; KHBC and NN performed the RNAseq analysis ; SSR designed the plasmid construct and helped in FACS analysis ; BS and FJG conceptualized and designed the research and are the major contributors in writing the manuscript ; BR , NN , JS and FJG contributed to the project development and supervising . 
+ All authors read and approved the final manuscript . 
+ Author details
+ 1 p53 Laboratory , Agency for Science Technology And Research ( A * STAR ) , 8A Biomedical Grove , # 06-04/05 Neuros/Immunos , Singapore 138648 , Singapore . 
+ 2 Genome Institute of Singapore , 60 Biopolis Street , Genome , # 02-01 , Singapore 138672 , Singapore . 
+ 3 Institute of Chemical and Engineering Sciences , 8 Biomedical Grove , Neuros , # 07-01 , Singapore 138665 , Singapore . 
+ Acknowledgements
+ The vGFP gene was kindly provided by Dr. Swaine Chen from Genome Institute of Singapore . 
+ Competing interests
+ The authors declare that they have no competing interests.
+ Availability of data and materials
+ All data generated or analysed during this study are included in this published article and its supplementary information files . 
+ Funding
+ This work was supported by funding under Biomass-to-Chemicals program , Science and Engineering Research Council ( SERC ) , Agency for Science , Technology and Research ( A * STAR ) . 
+ The funding body did not play any role in the design of the study , the collection , analysis , and interpretation of data and in writing the manuscript . 
+ Received: 3 December 2016 Accepted: 28 January 2017
\ No newline at end of file
--- a/data/TEXT_FILES/useful_txt/28240544.txt 0 → 100644
View file @27818a9
+++ b/data/TEXT_FILES/useful_txt/28240544.txt 0 → 100644
View file @27818a9
+ ■ INTRODUCTION
+ particles ( QA NPs ) . 
+ Modiﬁcation suggests that synergistic quercetin ( Qe ) improves the antibacterial effect of silver nanoparticles ( Ag NPs ) . 
+ Characterization experiment indicates that QA NPs have a diameter of approximately 10 nm . 
+ QA NPs show highly effective antibacterial activities against drug-resistant Escherichia coli ( E. coli ) and Staphylococcus aureus ( S. aureus ) . 
+ We explore antibacterial mechanisms using S. aureus and E. coli treated with QA NPs . 
+ Through morphological changes in E. coli and S. aureus , mechanisms are examined for bacterial damage caused by particulate matter from local dissociation of silver ion and Qe from QA NPs trapped inside membranes . 
+ Moreover , we note that gene expression proﬁling methods , such as RNA sequencing , can be used to predict or discover mechanisms of toxicity of QA NPs . 
+ Gene ontology ( GO ) assay analyses demonstrate the molecular mechanism of the antibacterial effect of QA NPs . 
+ Regarding cellular component ontology , `` cell wall organization or biogenesis '' ( GO : 0071554 ) and `` cell wall macromolecule metabolic process '' ( GO : 0044036 ) are the most represented categories . 
+ The present study reports that transcriptome analysis of the mechanism offers novel insights into the molecular mechanism of antibacterial assays . 
+ KEYWORDS : quercetin , silver nanoparticles , antibacterial mechanism , RNASeq , transcriptome 
+ Bacteria are microorganisms that cause deadly infections .1 Drug-resistant microorganisms are another major problem for current medicine . 
+ Although antibiotics are the frontline defense against bacterial infection , the emergence of pathogenic antibiotic resistance has prompted the development of highly effective , novel antimicrobial agents .2 Antimicrobial resistance is a worldwide issue because it reduces the effectiveness of antibiotics and increases healthcare costs .3 Thus , new efficient antibacterial material is signiﬁcant and necessary in our life . 
+ Silver and its compounds exert strong inhibitory and bactericidal effects , as well as broad-spectrum antimicrobial activities against fungi and viruses .4,5 Although silver is toxic to microorganisms , it is less dangerous to mammalian cells than other metals .6 Silver NPs as a kind of nanosized silver particles can be used as bactericides .7,8 Someone proposed possible antibacterial mechanisms indicating that Ag NPs release Ag + , which then binds to the thiol groups of bacterial enzymes to interfere with DNA replication . 
+ Another mechanism of bactericidal action was proposed ; this mechanism explains that antibacterial activity is based on electrostatic attraction between a negatively charged cell membrane of microorganisms and positively charged Ag + ions .9 − 11 Particle-speciﬁc interaction of Ag NPs with bacteria , their subsequent penetration , and local release of Ag + ions , which all cause bacterial death , were also proposed as their antibacterial property .12 − 14 Thereby , Ag NPs have attracted great attention because of their effective bactericidal effect . 
+ In recent years , dietary ﬂavonoids have gotten a lot of attention because their potential health beneﬁts are associated with decreased risks of different chronic diseases , 7,15,16 especially cardiovascular disease . 
+ 3,3 ′ ,4 ′ ,5,7 - Pentahydroxy-ﬂavone ( quercetin , Qe ) is a highly abundant ﬂavonoid from fruits and vegetables .17 
+ Moreover , Qe can be extracted from the ﬂowers and leaves of some plants . 
+ As an important component of numerous plant-based medicines , Qe has been used to treat several diseases .18,19 
+ Ag NPs connected with Qe were introduced as a new nanomaterial for antibacterial assays . 
+ Furthermore , conventional toxicity assays may not suffice to fully capture complexities of cellular responses toward NPs . 
+ Antibacterial mechanisms remain unclear with regard to how this speciﬁc effect of exposure to nanomaterial occurs . 
+ Thus , new and more comprehensive approaches are needed . 
+ The transcriptomics ﬁeld has developed rapidly in recent years with the introduction of next-generation sequencing technologies , such as RNA sequencing ( RNASeq ) , which will possibly displace cDNA microarrays as the favored method for gene expression proﬁling of cells and tissues .20,21 
+ RNASeq provides a useful tool to identify differences in the expression levels of genes following treatment with various compounds .22 Compared with whole genome sequencing , the main advantage of RNASeq is that it only analyzes transcribed regions of genomes . 
+ Compared with the conventional method , less is known about antibacterial mechanisms of NPs at gene expression levels . 
+ Gene expression proﬁling can also be used as a new tool to evaluate the interaction between NPs and biological systems to reveal its molecular mechanism .23 
+ In a previous study , mechanisms were unclear regarding how this speciﬁc effect of exposure to silver-nanoparticle-decorated quercetin nano-particles ( QA NPs ) occurs ; damage of the bacterial membrane was probably caused by the presence of particulate matter and/or local dissociation of Ag + and Qe from QA NPs trapped in mucus surrounding bacteria membrane .23,24 Through this study , we hoped to identify the set of complete mechanisms for QA NPs in antibacterial assays . 
+ ■ MATERIALS AND METHODS
+ Materials . 
+ All chemicals were purchased from Sigma-Aldrich ( Sigma ) Chemical Co. . 
+ Ultrapure Luria − Bertani ( LB ) agar powder was homemade or acquired from the School of Life Sciences , Anhui Agricultural University . 
+ Pseudomonas aeruginosa ATCC 27853 ( P. aeruginosa ) , Bacillus subtilis ATCC 6633 ( B. subtilis ) , Escherichia coli ATCC 8739 ( E. coli ) , and Staphylococcus aureus ATCC 6538 ( S. aureus ) strains were acquired from Anhui Agricultural University . 
+ All other chemicals were of analytical grade . 
+ Ultrapure water was used throughout all experiments . 
+ Synthesis of QA NPs . 
+ QA NPs were synthesized based on the methods reported by Sun et al. . 
+ The only difference between this study 's method and that of Sun et al. is that aqueous AgNO3 and Qe solutions with different molarities ( 1/0.75 , 1/1.5 , and 1/2 ) were mixed under vigorous stirring for 5 min . 
+ Finally , NPs were collected via centrifugation at 8000 rpm for 10 min .25,26 
+ QA NP Characterization . 
+ We examined the morphology of the QA NPs with a transmission electron microscope ( TEM , TJEOL 6300 F , Tokyo Japan , Philip ) and a scanning electron microscope ( SEMXL-20 , Holland , Philips ) . 
+ The samples were prepared for viewing by dripping 20 μL of QA NP solution onto a carbon-coated copper grid . 
+ The samples were then air-dried before imaging . 
+ The ζ potentials of the QA NPs were measured with Zetasizer Nano ZS ( Malvern Instruments , U.K. ) .28 The infrared , UV − vis absorption , and ﬂuorescence spectra of the QA NPs were obtained with a Bruker Tensor 27 FT-IR DTGS detector ,27,28 a spectrophotometer ( JASCO , Japan ) , and a ﬂuorescence spectrophotometer ( JASCO FP-6300 , Tokyo , Japan ) , respectively . 
+ All determinations were performed in triplicate .25,29 
+ Cell Culture . 
+ Four kinds of bacterial cells were cultured in LB medium at 37 °C . 
+ A solution of logarithmic-phase ( log-phase ) bacterial cells was acquired by reinoculating into fresh media for 12 h . 
+ The cell solution was incubated in a shaking incubator for 2 − 3 h until reaching an optical density at 600 nm ( OD600 nm ) of 0.5 .30 
+ Antibacterial Activity Test of QA NPs . 
+ Based on the antibacterial method developed by Sun et al. , solutions of log-phase bacterial cells ( P. aeruginosa , S. aureus , B. subtilis , and E. coli ) were inoculated in a solution that contained 20 μg / mL QA NPs , Qe , or Ag NPs . 
+ Then , the solution was incubated for 12 h at 37 °C in a shaking incubator . 
+ The LB-agar plate contained the same number of the four bacterial species . 
+ The bacterial cells that were cultured in a solution without QA NPs served as the control . 
+ The number of viable cells was determined by counting colony-forming units ( CFUs ) .31,32 This study used a concentration unit of micrograms per milliliter based on Qe . 
+ All tests were carried out in triplicate or quadruplicate . 
+ Log-phase S. aureus and E. coli were cultured with different QA NP concentrations ( 5 , 10 , and 15 μg / mL ) under similar culture conditions . 
+ Exactly 5 μL of bacterial suspension was viewed with a FL microscope at 480 nm ( IX-71 , Olympus , Japan ) .33 
+ Cellular Uptake Assay . 
+ The cellular uptake QA NP assay was conducted using a FL microscope . 
+ Log-phase cells ( S. aureus and E. coli ) were treated with QA NPs ( 5 , 10 , and 15 μg / mL ) . 
+ Cells were collected via centrifugation ( 3000 rpm , 15 min ) , washed twice with phosphate-buffered saline ( PBS ; pH 7.5 , 0.1 M ) , and stained with 4 ′ ,6 - diamidino-2-phenylindole ( DAPI ; 5 μg / mL , Life Technologies ) for 30 min in the dark . 
+ Cell suspensions were then washed twice with PBS ( pH 7.5 , 0.1 M ) to remove excess DAPI . 
+ A total of 5 μL of cell suspension was observed under a FL microscope at red and blue channels to visualize QA NP uptake and DAPI staining , respectively .30 
+ Fluorescence Microscopic Observation ( Live/Dead ) . 
+ Log-phase bacterial cells ( E. coli and S. aureus ) were treated with QA NPs ( 5 , 10 , and 15 μg / mL ) under similar culture conditions . 
+ Bacterial cells were collected and washed using the same methods . 
+ Subsequently , bacteria were stained using the LIVE/DEAD BacLight Bacterial Viability Kit ( SYTO9 and propidium iodide ( PI ) , Life Technologies ) for 30 min in the dark . 
+ Cell suspensions also were washed twice with PBS ( pH 7.5 , 0.1 M ) . 
+ Lastly , 5 μL of bacterial suspension was observed under a FL microscope at the green and red channels for SYTO9 and PI , respectively ( PI samples need to be observed within 1 h ) .34,35 
+ Membrane Integrity Studies . 
+ Membrane integrity assays were performed based on the methods reported by Sun et al. . 
+ Log-phase cells ( E. coli and S. aureus ) were subjected to the same treatment with QA NPs ( 5 , 10 , and 15 μg / mL ) . 
+ The bacterial cells that were cultured without QA NPs were utilized as the blank group . 
+ The collected cells were dehydrated with a series of ethanol concentrations and subsequently postﬁxed with 2.5 % glutaraldehyde and 2 % paraformaldehyde for 12 h . 
+ Finally , the air-dried bacterial cells were observed via SEM .36,37 
+ β-Galactosidase assays were performed using the methods established by Koepsel and Russell . 
+ Log-phase E. coli were inoculated in fresh LB medium . 
+ Then , the assay was performed by adding 100 μL of 80 mg/mL o-nitrophenyl-β-D-galactopyranoside ( ONPG ) to 1.5 mL of log-phase E. coli suspension . 
+ The optimum pH values to stimulate E. coli β-galactosidase activity in glycine buffer were 8.0 and 7.5 with lactose and ONPG as substrates , respectively .40 Then , various concentrations of QA NPs ( 5 , 10 , and 15 μg / mL ) were added to the suspension . 
+ The reaction proceeded until a visible yellow color was observed . 
+ To evaluate the effects of o-nitrophenol ( ONP ) , the extent of the reaction was determined by measuring the OD of the suspension at 420 nm . 
+ Then , enzyme concentration was calculated . 
+ Bacterial cells that were cultured in solutions without Ag NPs ( 15 μg / mL ) and with Milli-Q water served as the control groups .38 
+ Wall Destruction Assay . 
+ E. coli and S. aureus were exposed to QA NPs ( 5 , 10 , and 15 μg / mL ) . 
+ E. coli and S. aureus that were exposed without QA NPs were used as the blank controls . 
+ Cells were collected and then ﬁxed with 2 % paraformaldehyde and 2.5 % glutaraldehyde for 12 h . 
+ Subsequently , the bacterial cells were postﬁxed on a rotator with 2 % osmium tetroxide ( OsO4 ) for 1 h . 
+ The ﬁxed bacterial cells were dehydrated in an acetone gradient series ( 35 % , 50 % , 70 % , 80 % , 95 % , and 100 % ) for 20 min . 
+ The cells were treated with a series of processing steps ( including being embedded , sectioned , and mounted on 200-mesh copper grids ) . 
+ Then , the air-dried cells were observed with TEM .39,40 
+ RNA Extraction and Quantiﬁcation . 
+ Log-phase E. coli cells were treated with QA NPs ( 10 μg / mL ) for 12 h at 37 °C . 
+ Cells without treatment served as the blank group . 
+ Total cellular RNA was extracted using a Trizol kit ( Life Technologies ) . 
+ RNA quantiﬁcation was evaluated using the RNA Nano 6000 Assay Kit of the Bioanalyzer 2100 system ( Agilent Technologies , CA , USA ) . 
+ cDNA Library Preparation and Clustering and Sequencing Analysis . 
+ cDNA library preparation is described in detail in the Supporting Information . 
+ Index-coded samples were clustered based on the methods described by Zhang et al. 41 After cluster generation , the cDNA libraries were sequenced on an Illumina HiSeq platform . 
+ Raw RNASeq data ( raw reads ) were processed using in-house Perl scripts with FastQC software . 
+ Raw reads that contained adapter and low-quality sequences were removed from the raw data to obtain clean RNASeq data . 
+ Then , the Q20 and Q30 were counted . 
+ Downstream analyses were performed based on the high-quality clean data . 
+ Quantiﬁcation of Gene Expression Level . 
+ Read numbers that were mapped to each gene were counted using HTSeq v0.6.1 . 
+ Additionally , FPKM was calculated based on the gene length and used to analyze transcript expression levels .42 − 44 
+ Differential Expression Gene Analysis . 
+ The control and QA NP groups were subjected to differential gene expression analysis using DEGSeq R package version 1.20.0 , which identiﬁes differentially expressed genes . 
+ P-values were calibrated for multiple tests as previously described ( Benjamini and Hochberg method ) . 
+ For comparison , a P-value of 0.005 and log2 ( fold change ) of 1 were set as the thresholds for signiﬁcantly differential expression .22 
+ GO and KEGG Enrichment Analysis . 
+ Gene ontology ( GO ) enrichment analysis and pathway enrichment analysis of KEGG ( Kyoto Encyclopedia of Genes and Genome ) were performed as previously described . 
+ Simply , the GOseq R package was used to analyze the GO enrichment analysis of differentially expressed genes . 
+ The P-value denotes the signiﬁcance of GO term enrichment in the differentially expressed genes ( DEG ) . 
+ GO terms with corrected P-values less than 0.05 were considered signiﬁcantly enriched . 
+ The statistical enrichment of differential expression genes was tested using KOBAS software in KEGG pathways .45 
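The quantification and differential-expression steps described in the methods above can be sketched in a few lines. This is a minimal illustration only, not the HTSeq or DEGSeq implementation: the function names are hypothetical, FPKM is computed with the standard definition (reads per kilobase of gene length per million mapped reads), the Benjamini and Hochberg correction is written out by hand, and the strict inequalities at the thresholds (corrected P-value below 0.005, absolute log2 fold change above 1) are an assumption, since the text does not say whether the cut-offs are inclusive.

```python
import math


def fpkm(read_count, gene_length_bp, total_mapped_reads):
    """Standard FPKM: reads per kilobase of gene per million mapped reads."""
    return read_count * 1e9 / (gene_length_bp * total_mapped_reads)


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def is_differential(adj_p, fold_change, p_cut=0.005, log2fc_cut=1.0):
    """Apply the two thresholds quoted in the text (strictness is assumed)."""
    return adj_p < p_cut and abs(math.log2(fold_change)) > log2fc_cut
```

A gene with a corrected P-value of 0.001 and a four-fold change in either direction would pass this filter, whereas a 1.5-fold change would not, whatever its P-value.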
+ Characterization . 
+ In the synthesis procedure of QA NPs , Qe was loaded on surfaces of Ag + ion nanosheets . 
+ Figure 1 shows a full list of characterizations . 
+ Features of QA NPs were strongly inﬂuenced by the interaction between charged Qe and Ag NPs . 
+ These experiments aimed to determine the critical ratio for QA NPs . 
+ TEM images suggested that the optimum ratio was 0.75:1 for QA NPs ( Figure 1A ) . 
+ Under this condition , QA NPs ranged from 5 to 10 nm , as revealed in Figure 1A . 
+ The intensity of ﬂuorescence at the peak position of QA NPs had a maximum value comparable with other ratios . 
+ The ratio was used in succeeding experiments . 
+ Synthetic QA NPs were monitored by UV − vis spectroscopy ( Figure 1B ) . 
+ The surface plasmon resonance peak of Ag NPs at approximately 400 nm suggested formation of QA NPs .46,47 FTIR analysis of interactions between Qe and Ag NPs was carried out . 
+ The FTIR spectra are shown in Figure 1C . 
+ In the FTIR spectrum of Qe , the broad and intense peak centered at 3416 cm − 1 corresponds to OH groups , and the strong peak at 1728 cm − 1 corresponds to stretching vibrations of C=O carboxylic moieties .48 
+ Therefore , this result conﬁrmed that modiﬁcation of Qe plasmonic NPs , such as Ag NPs , was easier .49 
+ For QA NPs , peak positions of functional groups remained on Qe , and their shapes were similar . 
+ Characteristic peaks of QA NPs were observed in the spectrum ; these peaks may be indicative of Qe interaction with Ag NPs . 
+ Regarding ζ-potential measurements , as shown in Figure 1D , the value of Ag NPs was +51.2 ± 0.29 mV , whereas that of QA NPs decreased to +22.7 ± 0.15 mV . 
+ Variation in ζ-potential further conﬁrmed the modiﬁcation of Qe connected with Ag NPs . 
+ Synthesized QA NPs completely dissolved in water and showed ﬂuorescent properties under UV light ( Figure 1E ) . 
+ Raw Qe suspended in water was insoluble and did not show any ﬂuorescence under UV light . 
+ Qe also showed a difference under bright light . 
+ The SEM image showed better morphology of QA NPs compared with the TEM image ( Figure 1F ) . 
+ The left panel suggested distribution of NPs . 
+ But the right panel showed Qe surrounding QA NPs . 
+ Release of Qe from QA NPs into the solutions was measured by testing absorbance using the UV − visible spectrophotometer ( OD260 nm ) . 
+ The initial rate of Qe release was high , and the release rate reached equilibrium at 12 h . 
+ The results suggested that the release rate of Qe from nanoparticles was up to 76 % ( Figure S1 ) . 
+ Testing Antibacterial Activity of QA NPs . 
+ In this work , fabrication of QA NPs modiﬁed the Qe property . 
+ Screening with QA NPs ( 20 μg / mL ) against bacteria was performed using P. aeruginosa , B. subtilis , S. aureus , and E. coli . 
+ All antibacterial experiments used log-phase bacterial cells . 
+ CFU method was carried out in this study . 
+ Different antibacterial activities of QA NPs were observed against four kinds of bacteria ( Figure 2C ) . 
+ New particles showed higher antibacterial activity than raw Qe and Ag NPs . 
+ QA NPs had a more evident effect on the activity of S. aureus and E. coli cells than P. aeruginosa and B. subtilis . 
+ For S. aureus and E. coli , survival rates after QA NP treatment were 12.4 % and 23.1 % , respectively . 
+ Survival rates of other two bacteria were 43.7 % and 56.3 % , respectively . 
+ The inhibitory effect of QA NPs was highest against E. coli . 
+ Therefore , we used S. aureus and E. coli cells as our model bacteria for all drug delivery studies . 
+ The CFU method was also adapted in this part , where bacterial cell numbers were estimated on LB-agar plates . 
+ Photographs of bacterial colonies ( S. aureus and E. coli ) formed on LB-agar plates were taken for the blank , Qe ( 20 μg / mL ) , Ag NPs ( 20 μg / mL ) , and QA NPs ( 20 μg / mL ) groups . 
+ Corresponding data were graphed with Origin software ( Figure 2A ) . 
+ Antibacterial activity of QA NPs was compared with blank , Qe , and Ag NPs groups . 
+ Water was used as blank . 
+ CFU was also adapted in our study , where bacterial cell viability was estimated throughout . 
+ CFU values of blank for S. aureus and E. coli reached 6.7 × 10^8 and 8.5 × 10^8 CFU/mL , respectively . 
+ CFU values of Qe for E. coli and S. aureus were 9.1 × 10^7 and 7.3 × 10^7 CFU/mL , respectively . 
+ CFU values of Ag NPs for E. coli and S. aureus were 1.9 × 10^7 and 1.3 × 10^7 CFU/mL , respectively . 
+ The strongest antibacterial activity was demonstrated by QA NPs , which had a CFU value ( CFU/mL ) of 2.3 × 10^6 for E. coli and 1.7 × 10^6 for S. aureus . 
+ As shown in Figure 2A , the survival rate of bacteria treated with QA NPs was much lower than in the three other groups . 
+ Results suggested that QA NPs had superior antibacterial activities compared with raw Qe and Ag NPs . 
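The comparison between treatment groups can be made explicit by converting the reported plate counts into a log10 reduction relative to the blank. The sketch below is only illustrative: the function and variable names are hypothetical, and the counts assume that the flattened figures in the text (for example "8.5 × 108") are powers of ten, i.e. 8.5 × 10^8 CFU/mL.

```python
import math

# Plate counts (CFU/mL) as quoted in the text, read as powers of ten (assumption).
CFU_PER_ML = {
    "E. coli": {"blank": 8.5e8, "Qe": 9.1e7, "Ag NPs": 1.9e7, "QA NPs": 2.3e6},
    "S. aureus": {"blank": 6.7e8, "Qe": 7.3e7, "Ag NPs": 1.3e7, "QA NPs": 1.7e6},
}


def log10_reduction(treated_cfu, blank_cfu):
    """Log10 drop in viable count relative to the untreated blank."""
    return math.log10(blank_cfu / treated_cfu)


def reductions(counts):
    """Map each treatment (blank excluded) to its log10 reduction."""
    blank = counts["blank"]
    return {
        group: log10_reduction(cfu, blank)
        for group, cfu in counts.items()
        if group != "blank"
    }


if __name__ == "__main__":
    for species, counts in CFU_PER_ML.items():
        for group, value in reductions(counts).items():
            print(f"{species:10s} {group:7s} {value:.2f} log10 reduction")
```

On these numbers the QA NP group shows the largest reduction for both species, which is consistent with the ranking stated in the text (QA NPs, then Ag NPs, then Qe).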
+ As shown in Table S1 , MICs of QA NPs are 2.8 μg / mL for S. aureus and 4.2 μg / mL for E. coli . 
+ The MIC of QA NPs against E. coli was lower than that of kanamycin ( 0.9 μg / mL ) , whereas the MIC of QA NPs against S. aureus was greater than that of kanamycin ( 0.3 μg / mL ) compared with ampicillin and kanamycin . 
+ However , the MICs of QA NPs against S. aureus and E. coli were less than Qe and Ag NPs . 
+ The MICs of Qe are 10.5 μg / mL for S. aureus and 7.5 μg / mL for E. coli . 
+ QA NPs antibacterial activity is predominant . 
+ QA NPs may become new antibacterial nanoparticles for further research . 
+ To further investigate antibacterial activity and drug delivery of QA NPs , the work used the ﬂuorescent property of proposed nanoparticles . 
+ Results suggested that QA NPs could enter cells at low concentration ( 5 μg / mL ) ; the ﬂuorescence intensity of the nanoparticles in cells gradually decreased ( Figure 2B ) with increasing concentration of QA NPs . 
+ This part of the study showed that increasing concentration of QA NPs could effectively inhibit bacteria . 
+ The in vitro experiment results suggested that QA NPs could be considered as a new nanoparticle for antibacterial assays . 
+ Antibacterial Activity . 
+ To investigate drug delivery efficacy , a ﬂuorescence microscopy analysis experiment was carried out to test cellular uptake of NPs . 
+ The present study used DAPI , which is a nucleic acid dye that can stain all cells in ﬂuorescence assays . 
+ QA NPs ( 5 , 10 , and 15 μg / mL ) were used to treat E. coli and S. aureus cell cultures for 12 h . 
+ As shown in Figure 3 , red cells resulted from actions of QA NPs and blue cells were dyed by DAPI . 
+ Fluorescence assays were performed at 210 nm laser excitation and 365 nm emission ﬁlters for QA NPs , at 358 nm laser excitation and 461 nm emission ﬁlters for DAPI . 
+ Pink cells were overlap images , which revealed that NPs exhibited a high uptake rate at 100 % . 
+ At low concentrations , QA NPs could enter the cell , but the inhibitory effect was weak . 
+ After increasing the concentration of QA NPs , pink cells decreased . 
+ Overall , results showed that QA NPs could be easily taken up by bacterial cells . 
+ Hence , QA NPs are possible antibacterial nanoparticles that can be transported into cells in vitro . 
+ Results suggested QA NPs had a good inhibitory effect for E. coli and S. aureus . 
+ In antibacterial assays , QA NPs either killed bacterial cells or incompletely destroyed bacteria but simply harmed cells ; bacteria then could not form visible colonies . 
+ In another ﬂuorescence 
+ Fluorescence assays were performed at 488 nm laser excitation and 530 nm emission ﬁlters for SYTO 9 ( live stain ) , and at 561 nm laser excitation and 640 nm emission ﬁlters for propidium iodide ( dead stain ) . 
+ After brieﬂy being stained with the LIVE/DEAD kit , bacterial cells treated with QA NPs showed intensely red light , indicating dead cells ( Figure 4 ) . 
+ The green channel represented live cells . 
+ The inhibitory assay results showed that QA NPs indeed killed bacterial cells , rather than merely harming them . 
+ Moreover , there are large quantities of living cells and few dead cells in the images ( low concentration of QA NPs treated ) . 
+ However , in experiments , a high concentration QA NPs yielded the completely opposite result . 
+ These results further conﬁrmed that QA NPs were bactericidal . 
+ But altered live or dead bacteria cells do not mean altered subsequent cell functions , such as protein synthesis and secretion . 
+ Cell Integrity Study . 
+ Morphological changes of S. aureus and E. coli treated with QA NPs were observed using SEM and TEM . 
+ Damage on the bacterial membrane is illustrated by SEM ( Figure 5 ) . 
+ E. coli and S. aureus without QA NPs maintained the integrity of the membrane structure after incubation for 12 h . 
+ The zoomed-in region ( control ) showed that bacterial cells were smooth and had an intact cell membrane . 
+ By contrast , cell integrity was compromised when cells were treated with QA NPs for 12 h compared with untreated cells . 
+ After treatment with QA NPs , the quantity of S. aureus and E. coli cells decreased , cell membrane was wrinkled , and intracellular contents were damaged . 
+ At high concentration of QA NPs solution ( 15 μg / mL ) , damage was still observed in surface morphologies of most cells , whereas leaked intracellular contents were observed in most S. aureus and E. coli cells , as shown in Figure 5 . 
+ The form and size of cells also changed signiﬁcantly . 
+ Results showed that QA NPs exhibited evident antibacterial effects on cell and QA NPs changed the morphology of the bacteria cell ; this effect might eventually result in cell death . 
+ Morphological changes of S. aureus and E. coli are indicative of bacterial cell membrane damage , which was caused by particulate matter from local dissociation of silver ion and Qe from QA NPs trapped in bacteria . 
+ Enhanced antibacterial activity was attributed to the synergy of Qe when combined with Ag NPs , as suggested by TEM results . 
+ Nanoparticles killed bacteria by penetrating the bacterial cell membranes and wall . 
+ This penetration is possibly the primary antibacterial mechanism . 
+ We carried out TEM experiments on bacterial sections ( E. coli and S. aureus ) and studied distribution of QA NPs inside bacteria for proving the mechanism . 
+ Control images showed that bacteria without exposure to QA NPs showed intact cell morphology and a clear cell wall . 
+ However , remarkable changes in the cell walls were observed after exposure to QA NPs . 
+ Cell walls were destroyed or disintegrated . 
+ As concentration increased , most bacteria lost cellular integrity after exposure to QA NPs solution for 12 h . 
+ Entire proﬁles became unclear , most cell walls were damaged , and the cytoplasm was leaking . 
+ Results are shown in Figure 6 . 
+ Red arrows indicate cell walls , and yellow squares represent the QA NPs . 
+ QA NPs can directly destroy bacterial cell walls , leading to bacterial death . 
+ The aforementioned results effectively suggested high antibacterial activity of QA NPs because of compromised bacterial cell integrity . 
+ Meanwhile , the cause of disruption of the DNA structure should be investigated . 
+ To further investigate antibacterial effects of QA NPs on bacterial cell membrane , we continue to carry out other experiments . 
+ E. coli has β-galactosidase enzyme which exists in the cytoplasm . 
+ When bacterial cell integrity was compromised by QA NPs , β-galactosidase would be released into solution . 
+ The released β-galactosidase would then produce ONP by catalyzing hydrolysis of ONPG . 
+ ONP can be measured by UV − vis spectroscopy ( OD420 nm ) . 
+ Thus , we used ONPG to assay β-galactosidase release . 
+ As shown in Figure 7 , the concentration of β-galactosidase increased with increasing nanoparticle concentration in E. coli suspensions . 
+ These results indicated that treatment with QA NPs compromised membrane integrity . 
+ Loss of membrane integrity caused cytoplasm release into liquid medium . 
+ In this study , QA NPs can penetrate the cell and enhance release of β-galactosidase . 
+ Some antibacterial agents can disrupt DNA , thus inﬂuencing synthesis of necessary enzymes and cell division , causing death of bacteria . 
+ As shown in Figure S2 , we used a concentration gradient ( 1 , 3 , 5 , 10 , and 15 μg / mL ) of QA NPs solution to treat bacteria ( E. coli and S. aureus ) . 
+ Bacterial cells exhibited prominent speciﬁc DNA degradation , which is typical of necrosis and degeneration , especially when cells were treated at high concentrations . 
+ By contrast , DNA ( control ) disappearance was not observed . 
+ We assumed that expression levels of DNA from both organisms gradually decreased with the concentrations of QA NPs ( Figure S2 ) . 
+ Results showed that QA NPs affected cells at the gene expression level . 
+ In Vivo Study . 
+ In vitro testing by previous experiment showed apparent antibacterial effects of QA NPs on E. coli and S. aureus . 
+ However , few reports illustrated and examined the in vivo operation model . 
+ The lack of characterization and evaluation of QA NPs in vivo greatly hinders their further development toward practical and routine biomedical applications . 
+ S. aureus is an opportunistic pathogen . 
+ Owing to its drug resistance and high mortality rate , S. aureus-caused infections became a widespread problem in the global medical community . 
+ Therefore , we explored and established a bacteremia model of mice infected by S. aureus . 
+ For antibacterial drugs and clinical applications , cytotoxicity to cells in vivo was unexplored . 
+ Histological analysis was used to reveal the cytotoxicity of QA NPs in mice . 
+ As shown in Figure S3 , hematoxylin and eosin ( H&E ) staining images of tissues of a blank group exhibited normal morphology . 
+ Simultaneously , the experimental group ( QA NPs treated ) showed no signiﬁcant effects on mice organs . 
+ Hepatocytes were normal , and signs of destruction were not present in liver samples . 
+ Therefore , QA NPs were also nontoxic to mice at effective concentrations of antibacterial drug agents . 
+ Biodistribution of bacteria in mice infected by micro-organisms at different time points ( 1 , 3 , and 5 days ) was explored . 
+ Bacteria were monitored in major organs at different days ( Figure S4A − C ) . 
+ The blank group had a normal quantity of bacteria ( ( 100 ± 10 ) × 10^2 CFU/mL or g ) , but infection and treatment groups showed almost a similar number of bacteria ( ( 700 ± 50 ) × 10^4 CFU/mL or g ) after intravenous injection ( Figure S4A ) . 
+ Bacteria gradually decreased from the third to the seventh day after QA NPs treatment ( Figure S4B , C ) in vivo . 
+ However , the bacterial number constantly caused organ inflammation , and the number of bacteria reached ( 600 ± 50 ) × 10^6 CFU/mL or g . 
+ When QA NPs were injected on infected mice , bacteria gradually decreased over time and were almost similar to those of the blank group on day 5 ( Figure S4C ) . 
+ Bacteremia also led to death for mice . 
+ Figure S4D shows survival curves of mice . 
+ The infected group had a minimum survival rate compared with the other two groups . 
+ The survival rate of the treatment group obviously increased . 
+ Possibly , the treatment group was cured by QA NPs . 
+ Three groups of mice ( blank , infection , and treatment groups ) were anesthetized and an abdomen incision was made on the seventh day after infection and treatment . 
+ Toxicity in target organs was observed through result of H&E staining images of tissues ; such observation was performed to decide whether S. aureus could cause inﬂammation or lesions . 
+ Five representative organs were ﬁxed , stained , and analyzed . 
+ As shown in Figure S5 , normal morphology was observed from H&E staining images of blank group tissues . 
+ These were observed to be normal such as hepatocytes in liver samples , pulmonary ﬁbrosis in lung samples , and glomerulus structure in kidney sections . 
+ However , the result showed visual inﬂammation or lesions tissue caused by S. aureus from the infection group . 
+ There were distinct results that appeared in the treatment groups . 
+ There were no apparent histopathological abnormalities . 
+ Red circles indicate inflammatory cells . 
+ Therefore , QA NPs could be used as a kind of antibacterial particle for S. aureus in vivo . 
+ E. coli Sequencing Data Results . 
+ Genomic DNA from E. coli ( control and QA NPs groups ) was sequenced in triplicate . 
+ The quality and length of the sequenced fragments were analyzed to select the most reliable target sequences . 
+ In total , the control group generated 11.186574 million raw reads with Q30 over 96 % . 
+ After removing low-quality sequences ( length < 35 bp ; Q < 20 ) , retained clean reads totaled 10.974774 million . 
+ The QA NPs group yielded 10.385656 million raw reads and 9.942204 million clean reads ( Table 1 ) . 
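+ The read filter described above (reads shorter than 35 bp or with Q < 20 are discarded) can be sketched as follows. This is a minimal illustration, not the pipeline used in the study: the text does not say how Q is aggregated per read, so the sketch assumes the mean Phred score of the read and the common Phred+33 quality encoding. The names mean_phred and keep_read are hypothetical.

```python
def mean_phred(quality_string, offset=33):
    """Mean Phred score of a read, assuming Phred+33 ASCII encoding (an assumption)."""
    if not quality_string:
        raise ValueError("quality string must not be empty")
    return sum(ord(ch) - offset for ch in quality_string) / len(quality_string)


def keep_read(sequence, quality_string, min_length=35, min_mean_q=20):
    """Return True if a read passes the length and quality cutoffs from the text.

    Interpreting "Q < 20" as a per-read mean is an assumption of this sketch.
    """
    if len(sequence) != len(quality_string):
        raise ValueError("sequence and quality string must have equal length")
    if len(sequence) < min_length:
        return False
    return mean_phred(quality_string) >= min_mean_q


# Hypothetical reads: a long high-quality read, a short read, a low-quality read.
print(keep_read("A" * 40, "I" * 40))  # 'I' encodes Q40
print(keep_read("A" * 30, "I" * 30))  # shorter than 35 bp
print(keep_read("A" * 40, "#" * 40))  # '#' encodes Q2
```

+ Applied to every read in a FASTQ file, the fraction of reads for which keep_read returns True corresponds to the ratio of clean reads to raw reads reported in Table 1.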
+ All error rates were low . 
+ Statistical analysis revealed a high total number of reads of sequencing samples and a high ratio of high-quality reads . 
+ The results suggested good quality sequencing data . 
+ Gene Expression Analysis . 
+ To identify differentially expressed genes , the expression of each gene was analyzed by FPKM ( the expected number of fragments per kilobase of transcript sequence per million base pairs sequenced ) . 
+ Gene expression levels were calculated based on universal reads . 
+ The differential gene expression was analyzed with the HTSeq program . 
+ The statistical analysis of gene expression identiﬁed genes that the QA NP and control groups differentially expressed . 
+ All uniquely mapped reads were transformed into FPKM by Cufflinks , and the HTSeq package was used to identify the DEGs ( Figure 8A ) . 
+ Table 2 shows the gene expression results for the different FPKM intervals , based on statistical analysis with HTSeq . 
+ FPKM had six intervals ( 0 − 1 , 1 − 3 , 3 − 15 , 15 − 60 , and > 60 ) . 
+ Two groups ( control and QA NPs ) exhibited different gene expression counts in each FPKM interval ( Figure 8B , Table 2 ) . 
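+ As a rough illustration of the FPKM normalization and the interval binning described above, the sketch below uses the conventional definition (fragments per kilobase of transcript per million mapped fragments). The study itself used Cufflinks; the function names, the example numbers, and the choice to treat each interval as open on the left and closed on the right are assumptions made only for this example.

```python
from bisect import bisect_left


def fpkm(fragments, transcript_length_bp, total_mapped_fragments):
    """Fragments per kilobase of transcript per million mapped fragments."""
    if transcript_length_bp <= 0 or total_mapped_fragments <= 0:
        raise ValueError("transcript length and library size must be positive")
    return fragments * 1e9 / (transcript_length_bp * total_mapped_fragments)


# Upper edges of the bounded intervals listed in the text: 0-1, 1-3, 3-15, 15-60, >60.
BIN_EDGES = [1, 3, 15, 60]
BIN_LABELS = ["0-1", "1-3", "3-15", "15-60", ">60"]


def fpkm_interval(value):
    """Assign an FPKM value to an interval; a value on an edge goes to the lower bin."""
    if value < 0:
        raise ValueError("FPKM cannot be negative")
    return BIN_LABELS[bisect_left(BIN_EDGES, value)]


# Hypothetical gene: 500 fragments on a 1000 bp transcript, 10 million mapped fragments.
value = fpkm(500, 1000, 10_000_000)
print(value, fpkm_interval(value))
```

+ Counting how many genes fall into each label for the control and QA NPs samples would give a per-interval comparison of the kind summarized in Table 2.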
+ These results suggested that E. coli treated with QA NPs had different gene expression compared with the control group . 
+ The relationship was assessed using Pearson 's correlation coefficient ( r ) . 
+ Linear regression and heat map diagram analyses were performed to evaluate the association between the control and QA NPs ( Figure 9 ) . 
+ The R^2 value ( 0.687 ) was below 0.8 , indicating poor correlation . 
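+ The correlation check described above can be sketched in a few lines. The example computes Pearson's r and its square for two expression vectors and compares the result with the 0.8 threshold mentioned in the text. The input values are invented for illustration and are not data from the study; pearson_r and r_squared are hypothetical helper names.

```python
import math


def pearson_r(x, y):
    """Pearson's correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


def r_squared(x, y):
    """Square of Pearson's r, the quantity compared with 0.8 in the text."""
    return pearson_r(x, y) ** 2


# Invented log-scale expression values for a handful of genes (illustration only).
control = [1.2, 3.4, 0.5, 7.8, 2.1, 5.5]
treated = [0.9, 1.0, 2.2, 6.0, 0.3, 1.1]
print(r_squared(control, treated) < 0.8)
```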
+ The result was caused by treatment of E. coli with QA NPs . 
+ The observed cause requires analysis in succeeding experiments . 
+ To characterize transcriptome changes in E. coli treated with QA NPs , the present study provided DEGs by comparing QA NPs with control groups . 
+ A total of 460 DEGs ( FPKM > 1 ) were discovered in two groups ; 451 and 9 DEGs were identiﬁed in control and QA NPs , respectively ( Figure 10A ) . 
+ Two groups of genes were discerned : 330 genes were up-regulated and 1294 genes were down-regulated in control and QA NPs , respectively ( Figure 10B ) . 
+ We established the genes differentially expressed between treatment ( QA NPs ) and control groups . 
+ Control and QA NP populations showed high numbers of speciﬁcally expressed genes , suggesting that QA NPs possibly played key roles in antibacterial effect . 
+ Through the FPKM values of two groups , we constructed a heat map ( Figure 10C ) . 
+ Figure 10C showed heat maps of the induced and suppressed transcripts of NPs-exposed samples relative to matched controls and QA NPs exposures ; the ﬁgure illustrated agreement of results from different donors in terms of fold-change values . 
+ Interestingly , the study discovered that speciﬁcally expressed genes were higher than that of differentially expressed lncRNAs . 
+ Here , two gene groups manifested different expressions , and this result was consistent with that of previous antibacterial assays , showing high-effect antibacterial activity of QA NPs . 
+ Functional Classiﬁcations by GO . 
+ The GO project is a collaborative effort to provide reliable gene product descriptions from various databases . 
+ GO offers a set of dynamic , controlled , and structured terminologies to describe gene functions and products in any organism23 ,52 . 
+ All transcripts have been further functionally characterized into GO categories , such as molecular functions , biological processes , and cellular components . 
+ GOseq was used for the GO functional classiﬁcation of the assembled E. coli at the macrolevel . 
+ Enriched GO terms totaled 30 terms . 
+ As shown in Figure 11A , GO analysis identiﬁed a total of 10 terms related to cellular components , 15 terms for biological processes , and 5 terms for molecular functions ( QA NPs versus control ) . 
+ Regarding cellular component ontology , most represented categories were `` cell adhesion '' ( GO : 0007155 ) , `` translation '' ( GO : 0006412 ) , `` cell wall organization or biogenesis '' ( GO : 0071554 ) , and `` cell wall macromolecule metabolic process '' ( GO : 0044036 ) . 
+ Results showed a wall of E. coli culture with QA NPs had been changed . 
+ This result also explained the mechanism of SEM and TEM . 
+ We estimated the expression of the 30 GO terms ( Figure 11B ) . 
+ Results showed that gene expression of E. coli treated by QA NPs was up-regulated or down-regulated and indicated a molecular mechanism for antibacterial effect of QA NPs . 
+ Pathway Analysis by KEGG . 
+ Research on biological pathways is essential in understanding and advancing genomics research . 
+ The highly integrated database Kyoto Encyclopedia of Genes and Genome ( KEGG ) provides data on biological systems and their relationships at the molecular , cellular , and organism levels . 
+ KEGG pathway annotations were generated ( Figure 12 ) from assembled E. coli transcriptome , and results were mapped with GO terms . 
+ KEGG analysis revealed 22 KEGG pathways . 
+ We selected the group related to the microbial metabolic pathway for further analysis : microbial metabolism in diverse environments , the bacterial secretion system , and bacterial chemotaxis ( Figure S6 ) . 
+ KEGG analyses of E. coli treated by QA NPs transcriptome sequences revealed the presence of signiﬁcant DEGs enrichment in three pathways compared with the control group . 
+ Results showed that QA NPs affected the bacterial metabolic pathway , thus inhibiting E. coli growth . 
+ In this study , an environmentally friendly , facile , and simple method was developed to synthesize QA NPs . 
+ Nanomaterial was fully characterized by TEM , SEM , FTIR spectra , UV − vis absorption spectra , ﬂuorescence spectrometry , and Zetasizer ZS Nano instrument . 
+ QA NPs had a diameter of approximately 10 nm . 
+ QA NPs showed highly effective antibacterial activities against drug-resistant E. coli and S. aureus . 
+ CFU assays suggested that QA NPs had more pronounced antibacterial effects than Qe and Ag NPs . 
+ Fluorescence microscopy assays demonstrated that individually dispersed QA NPs had high antibacterial activity and cellular uptake . 
+ SEM , TEM , and ONP experiments were used to investigate mechanisms of cell integrity study through changes in membrane and enzymes . 
+ Disruption of nucleic acids assay assumed that expression levels of DNA from both organisms gradually decreased with the concentrations of QA NPs . 
+ In vivo studies demonstrated that QA NPs could act as a kind of antibacterial particle for S. aureus in vivo . 
+ Gene expression profiling such as RNASeq can be used to discover mechanisms of toxicity of QA NPs . 
+ E. coli sequencing data results revealed a high total number of reads and high ratio of high-quality reads of sequencing samples , suggesting good quality and quantity of sequencing data . 
+ Gene expression analysis showed different expressions of two gene groups ; this result was consistent with that of previous antibacterial assays , showing high-effect antibacterial activity of QA NPs . 
+ The results showed an antibacterial mechanism of NPs at the gene expression level . 
+ GO and KEGG pathway analyses reveal key pathways involved in the biological pathway to E. coli treated by QA NPs . 
+ The present study reports that transcriptome analysis of the mechanism offers novel insights into the molecular mechanism of antibacterial assays . 
+ Supporting Information : The Supporting Information is available free of charge on the ACS Publications website at DOI : 10.1021/acsami.7b02380 . 
+ Qe release from QA NPs ( Figure S1 ) , the expression level of DNA gradually decreased with QA NPs exposure ( Figure S2 ) , cytotoxicity of QA NPs assay ( Figure S3 ) , preferentially distributed bacteria in blood and tissues ( Figure S4 ) , biodistribution of bacteria and histological analysis ( Figure S5 ) , microbial metabolic pathway study in E. coli transcriptome ( Figure S6 ) , and MIC values ( PDF ) 
+ * Tel. :086 551 65786703 . 
+ Fax : 086 551 65786703 . 
+ E-mail : weiywswzy@163.com . 
+ This work was supported by the National Natural Science Foundation of China ( Grant No. 21401002 ) , the Natural Science Foundation of Anhui Province , China ( Grant No. 1508085QB37 ) , and the Youth Science Fund Key Project of Anhui Agricultural University ( Grant No. 2013ZR011 ) .
\ No newline at end of file
--- a/data/TEXT_FILES/useful_txt/28489862.txt 0 → 100644
View file @27818a9
+++ b/data/TEXT_FILES/useful_txt/28489862.txt 0 → 100644
View file @27818a9
+ motility of Escherichia coli ST131
+ Abstract 
+ 1 School of Chemistry and Molecular Biosciences , University of Queensland , Brisbane , Queensland , Australia , 2 Australian Infectious Disease Research Centre , University of Queensland , Brisbane , 
+ Uropathogenic Escherichia coli ( UPEC ) is the cause of ~ 75 % of all urinary tract infections ( UTIs ) and is increasingly associated with multidrug resistance . 
+ This includes UPEC strains from the recently emerged and globally disseminated sequence type 131 ( ST131 ) , which is now the dominant fluoroquinolone-resistant UPEC clone worldwide . 
+ Most ST131 strains are motile and produce H4-type flagella . 
+ Here , we applied a combination of saturated Tn5 mutagenesis and transposon directed insertion site sequencing ( TraDIS ) as a high throughput genetic screen and identified 30 genes associated with enhanced motility of the reference ST131 strain EC958 . 
+ This included 12 genes that repress motility of E. coli K-12 , four of which ( lrhA , ihfA , ydiV , lrp ) were confirmed in EC958 . 
+ Other genes represented novel factors that impact motility , and we focused our investigation on characterisation of the mprA , hemK and yjeA genes . 
+ Mutation of each of these genes in EC958 led to increased transcription of flagellar genes ( flhD and fliC ) , increased expression of the FliC flagellin , enhanced flagella synthesis and a hyper-motile phenotype . 
+ Complementation restored all of these properties to wild-type level . 
+ We also identified Tn5 insertions in several intergenic regions ( IGRs ) on the EC958 chromosome that were associated with enhanced motility ; this included flhDC and EC958_1546 . 
+ In both of these cases , the Tn5 insertions were associated with increased transcription of the downstream gene ( s ) , which resulted in enhanced motility . 
+ The EC958_1546 gene encodes a phage protein with similarity to esterase/deacetylase enzymes involved in the hydrolysis of sialic acid derivatives found in human mucus . 
+ We showed that over-expression of EC958_1546 led to enhanced motility of EC958 as well as the UPEC strains CFT073 and UTI89 , demonstrating its activity affects the motility of different UPEC strains . 
+ Overall , this study has identified and characterised a number of novel 
+ Introduction
+ Uropathogenic Escherichia coli ( UPEC ) are the most common cause of urinary tract infection ( UTI ) , a disease of major significance to global human health [ 1 -- 3 ] . 
+ UPEC employ a range of virulence factors to colonise the urinary tract and cause symptomatic UTI , including adhesins , toxins , iron-acquisition systems , polysaccharide surface structures and flagella [ 4 -- 8 ] . 
+ Overall , the combined effect of genetic variation , redundancy and genomic diversity means that no single virulence factor is uniquely associated with the ability of UPEC to cause disease . 
+ This complex picture is further convoluted by increased resistance to antibiotics , which complicates the treatment of UTI and highlights the urgent need to better understand UPEC pathogenesis . 
+ A major contributor to increased antibiotic resistance among UPEC is the fluoroquinolone-resistant sequence type 131 ( ST131 ) clone , which has emerged recently and disseminated rapidly across the globe [ 9 -- 11 ] . 
+ Flagella are complex multi-subunit , filamentous organelles that contribute to various aspects of UPEC virulence , including motility , chemotaxis , adhesion , biofilm formation and immune modulation [ 5 , 12 -- 14 ] . 
+ In mice , flagella provide a fitness advantage for UPEC colonization of the urinary tract , leading to increased colonization and persistence in mixed competitive infection experiments comprising wild-type and isogenic flagella mutant strains [ 15 , 16 ] . 
+ Flagella-mediated motility is also required for UPEC ascension to the upper urinary tract and subsequent dissemination to other sites [ 17 ] . 
+ Complementing these studies , others have shown that flagella also contribute to UPEC invasion of mouse renal epithelial collecting duct cells [ 5 ] and enhanced adhesion to and invasion of bladder epithelial cells [ 14 ] . 
+ Flagella are also required for UPEC biofilm formation on abiotic surfaces [ 12 ] . 
+ The biosynthesis , assembly and regulation of E. coli flagella have been the subject of extensive research over many decades [ 18 -- 21 ] . 
+ The flagella structure contains three distinct components , the basal body , hook and an extracellular filament composed of the major subunit protein FliC or flagellin . 
+ The FliC is highly immunogenic and sequence variation within its hyper-variable central region defines the E. coli H antigen diagnostic serotype marker [ 22 ] . 
+ The synthesis and assembly of flagella occurs via a highly ordered process that involves a combination of transcriptional , translational and post-translational regulatory mechanisms . 
+ At the transcriptional level , the regulation of flagella is coordinated via a hierarchical cascade that involves three stages of control [ 23 ] ; the FlhDC master regulators control the first stage of this process . 
+ Numerous global regulatory proteins influence flagella expression by either positively or negatively regulating the transcription of flhDC [ 24 , 25 ] . 
+ Major transcriptional activators of flhDC include the cyclic AMP-catabolite activator protein ( CRP ) [ 26 ] , the histone-like nucleoid-structuring ( H-NS ) protein [ 26 , 27 ] , the quorum sensing E. coli regulators B and C ( QseBC ) [ 28 -- 30 ] and the MatA regulator of the E. coli common pilus [ 31 ] . 
+ Conversely , major transcriptional repressors of flhDC include the LysR-type regulator LrhA [ 32 ] , the osmoregulator protein OmpR [ 33 ] , the colanic acid activator Rcs [ 34 ] , the P fimbriae-associated regulator PapX [ 35 , 36 ] , the ferric uptake regulatory protein ( Fur ) [ 37 ] and integration host factor ( IHF ) [ 38 ] . 
+ Mutation of these regulatory genes alters the transcription of flhDC and leads to either reduced or enhanced motility . 
+ Our understanding of E. coli motility has been enhanced by the application of large-scale genetic screens to study flagella expression and chemotaxis . 
+ Overall , these studies have shown that many different cell processes influence this complex phenotype . 
+ For example , Girgis et al. ( 2007 ) performed a powerful genome-wide investigation that combined competitive selection and microarray analysis , and resulted in the characterization of thirty-six novel motility-associated genes [ 39 ] . 
+ These genes encoded for a diverse range of non-flagellar factors , and notably comprised a large number of cell envelope proteins including transporters , periplasmic enzymes and intrinsic membrane proteins . 
+ Another study by Inoue et al. ( 2007 ) screened a comprehensive collection of E. coli K-12 mutants ( the Keio collection ) and compiled a detailed compendium of genes involved in swimming and swarming motility [ 40 ] . 
+ Again , a range of non-flagellar genes were identified , including those encoding factors associated with metabolism , iron acquisition , protein-folding and the biosynthesis of lipopolysaccharide ( LPS ) as well as other cell-surface components . 
+ Large-scale genetic screens to study motility in Salmonella have also been performed , with similar classes of genes identified [ 41 , 42 ] . 
+ Interestingly , a set of genes associated with enhanced motility ( hyper-motility ) of Salmonella were identified in one of these studies ; in some cases this phenotype was associated with increased expression of flagellin on the cell surface [ 41 ] . 
+ While the flagella regulon from E. coli has been extensively studied , the identification and characterisation of genes associated with hyper-motility has not been examined in great detail . 
+ We recently described the combined application of saturated Tn5 mutagenesis and transposon directed insertion site sequencing ( TraDIS ) to comprehensively define the complete set of genes associated with resistance to human serum in the UPEC ST131 strain EC958 [ 43 ] . 
+ Here , we applied TraDIS as a large scale genetic screen and identified a series of genes that when mutated , led to increased motility of the UPEC ST131 reference strain EC958 . 
+ Results
+ Identification of genes associated with enhanced motility of EC958 
+ We devised a swimming assay in combination with a forward genetic screen using a previously generated hyper-saturated mini-Tn5 mutant library [ 43 ] to identify genes associated with enhanced motility of EC958 ( Fig 1 ) . 
+ In this assay , a pool of approximately 1 x 10^7 Tn5 mutants ( input pool ) was spotted in the center of 20 soft LB agar plates and incubated for 10 hours at 37 ˚C . 
+ Motile cells were recovered from the edge of the swimming zone of each plate by extracting the LB agar at a distance of 30mm from the point of inoculation ( output pool ) . 
+ EC958 genomic DNA was purified from the input and output pools and sequenced using a multiplexed TraDIS procedure . 
+ The input and output pools yielded 8.4 x 10^5 -- 7 x 10^6 Tn5-specific
+ Using a stringent threshold cutoff ( log2 fold change [ logFC ] > 5 ; P < 0.001 ) , 30 genes were identified that , when mutated , led to enhanced motility of EC958 ( Table 1 ) . 
+ For each of these genes , the number of reads corresponding to Tn5 insertions was significantly increased in cells at the periphery of the swimming zone ( output pool ) compared to the input pool . 
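+ The hit-calling rule stated above (logFC > 5 and P < 0.001 for output versus input pools) can be illustrated with a small sketch. It is not the authors' TraDIS analysis: the text does not describe the normalization or the statistical test, so the sketch adds a pseudocount of 1 to avoid division by zero, skips library-size normalization, and takes the P value as a given input. The function names and read counts are hypothetical.

```python
import math


def log2_fold_change(input_reads, output_reads, pseudocount=1.0):
    """log2 ratio of output-pool to input-pool insertion reads for one gene.

    The pseudocount is an assumption of this sketch; no library-size
    normalization is applied.
    """
    if input_reads < 0 or output_reads < 0:
        raise ValueError("read counts cannot be negative")
    return math.log2((output_reads + pseudocount) / (input_reads + pseudocount))


def is_hypermotility_hit(logfc, p_value, logfc_cutoff=5.0, p_cutoff=0.001):
    """Apply the thresholds quoted in the text: logFC > 5 and P < 0.001."""
    return logfc > logfc_cutoff and p_value < p_cutoff


# Hypothetical gene: 9 reads in the input pool, 639 reads in the output pool.
logfc = log2_fold_change(9, 639)
print(logfc, is_hypermotility_hit(logfc, p_value=1e-5))
```

+ A gene with strongly enriched insertions in the output pool passes both cutoffs, while one with a large fold change but a P value above 0.001 does not.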
+ Twelve of these genes have previously been shown to repress motility , namely lrhA [ 32 ] , ihfA [ 38 ] , ydiV [ 44 , 45 ] , ihfB [ 38 ] , lrp [ 46 ] , clpXP [ 47 ] , papX [ 35 , 36 ] , rcsB [ 34 ] , pfrA [ 48 ] , yeaI [ 49 ] and fliT [ 50 ] . 
+ The remaining genes represent novel factors that influence the motility of EC958 , with their function ranging across eight Clusters of Orthologous Groups ( COGs ) ( Table 1 ) . 
+ Genetic characterisation of selected hyper-motility mutants
+ In order to extend our TraDIS data we validated a selection of the genes involved in repression of motility by generating targeted mutants for further investigation . 
+ Thus , EC958 mutants containing deletions in four genes previously shown to repress motility in E. coli K-12 ( lrhA , ihfA , ydiV , lrp ) and three novel motility-associated genes ( mprA , hemK , yjeA ) were constructed by λ Red-mediated recombination and characterized in motility assays . 
+ In these experiments , all seven mutants displayed an enhanced swimming phenotype on 0.25 % LB agar compared to the wild-type EC958 strain ( Fig 2 ) . 
+ To further confirm the role of the novel motility-associated mprA , hemK and yjeA genes , the genes were cloned in the low copy number plasmid pSU2718G and introduced into their respective mutants to enable genetic complementation . 
+ The wild-type , mutant and complemented strains were then grown on 0.25 % LB agar to compare their swimming phenotype . 
+ In each case , the motility rate of the complemented mutants was restored to wild-type level ( Fig 3 ) . 
+ Taken together , this data confirms the involvement of mprA , hemK and yjeA in EC958 motility and suggests the TraDIS analysis has accurately iden ¬ 
+ Mutation of mprA, hemK and yjeA enhances the transcription and translation of flagella genes
+ To further understand the mechanism by which mutation of mprA , hemK and yjeA could enhance motility , we analysed our set of wild-type , mutant and complemented strains by examining ( i ) transcription of the flhD and fliC genes , ( ii ) expression of the FliC flagellin protein , and ( iii ) the number of flagella on the cell surface . 
+ Mutation of the mprA , hemK and yjeA genes led to a significant increase in the transcription of flhD ( 10.2 , 6.4 and 6.6 fold increase , respectively ) and fliC ( 16.4 , 21.0 , 15.2 fold increase , respectively ) , while complementation of the mutants restored the transcript of flhD and fliC to wild-type levels ( Fig 4A and 4B ) . 
+ In line with these data , the levels of FliC flagellin observed by western blotting using an H4-specific antibody were also elevated in all three mutants compared to the wild-type and complemented strains ( Fig 4C and 4D ) . 
+ To link these elevated levels of flagella biosynthesis to the number of flagella organelles per cell , transmission electron microscopy was employed . 
+ Based on a count of 200 randomly selected cells for each strain , wild-type EC958 had an average of 0.8 ± 0.1 flagella per cell , which of interest was relatively low compared to strains used for routine studies of motility and chemotaxis [ 51 ] . 
+ In contrast , the three mutants all possessed significantly higher numbers of flagella per cell ; EC958mprA 2.4 ± 0.3 flagella/cell , EC958hemK 3.3 ± 0.7 flagella/cell and EC958yjeA 2.0 ± 0.4 flagella/cell . 
+ Complementation of each of the mutants reduced the average number of flagella per cell back to wild-type level ( Fig 4E , S1 Fig ) . 
+ Taken together , our data suggest these three genes play a role in controlling the number of flagella per cell and disruption of any of the genes results in hyper-motility by increasing the number of flagella on the cell surface . 
+ Transposon insertions in intergenic regions associated with enhanced motility 
+ The very high level of saturation in our miniTn5 library enabled us to compare the insertion frequency within intergenic regions ( IGRs ) between input and output pools , and thus determine the impact of insertions in these regions on hyper-motility . 
+ There are 3973 IGRs on the chromosome of EC958 , eight of which contained significantly more miniTn5 insertions in the output pools than in the input pools . 
+ Out of these eight IGRs , six were located upstream of coding sequences ( CDS ) which are known to repress motility and were discovered in our primary TraDIS analysis ( Table 2 ) . 
+ The orientation of the miniTn5 cassette within these six IGRs was unidirectional , such that the chloramphenicol resistance gene was orientated in the opposite direction of the downstream genes and thus the insertion most likely abolished their transcription . 
+ The identification of an increased miniTn5 insertion frequency in both the promoter region and CDS of these genes provides further evidence to support the conclusion that their disruption leads to a hyper-motile phenotype . 
+ The two remaining IGRs identified by TraDIS ( EC958_IGR1610 , upstream of flhD and EC958_IGR1146 , upstream of EC958_1546 ) contained miniTn5 insertions uniquely located such that the chloramphenicol resistance gene was orientated in the same direction as the downstream gene . 
+ Furthermore , in both cases , the downstream gene was devoid of miniTn5 insertions , suggesting that the function of these genes was required for motility and that their increased transcription ( via read-through from the chloramphenicol resistance gene promoter in the miniTn5 transposon ) could result in hyper-motility . 
+ In the case of insertions in the IGR upstream of flhDC , this interpretation is consistent with other literature that has shown overexpression of these master regulator genes leads to hyper-motility [ 52 -- 57 ] . 
+ However , we also confirmed this by introducing a strong constitutive promoter ( PcL ) upstream of the flhDC CDS to generate EC958PcLflhDC ; as expected , this strain exhibited enhanced motility and pro ¬ 
+ Overexpression of the EC958_1546 gene enhances EC958 motility 
+ The EC958_1546 gene is located within the phi4 prophage ( EC958_Phi4 : 1436674..1490889 ) and encodes a hypothetical phage protein . 
+ We hypothesized that like insertions in the IGR upstream of flhDC , the unidirectional miniTn5 insertions in the IGR upstream of EC958_1546 enhanced its transcription and imparted a positive effect on motility . 
+ To investigate this further , we generated an isogenic EC958_1546 mutant ( EC958Δ1546 ) by λ-red mediated homologous recombination . 
+ EC958Δ1546 motility was unchanged compared to wild-type EC958 ( Fig 5 ) , suggesting its deletion did not alter this phenotype . 
+ Next we cloned the EC958_1546 gene into the low copy number expression vector pSU2718 to generate plasmid p1546 . 
+ Transformation of plasmid p1546 into EC958Δ1546 led to significantly enhanced motility ( Fig 5 ) . 
+ Finally , we constructed an EC958_1546 overexpressing strain by inserting a constitutive PcL promoter upstream of the chromosomal EC958_1546 gene ( strain EC958PcL1546 ) . 
+ The motility rate of EC958PcL1546 was also significantly higher than wild-type EC958 ( Fig 5 ) . 
+ Taken together , these data strongly support a role for the product of the EC958_1546 gene in enhancing motil ¬ 
+ EC958_1546 overexpression enhances transcription of the flhD master regulator 
+ To investigate the mechanism by which EC958_1546 enhances motility , we used the same approach described above and examined flhD and fliC transcription by qRT-PCR , FliC expression by western blot analysis and flagella expression by TEM . 
+ Compared to wild-type EC958 , the transcription of flhD was ~ 2-fold higher for EC958PcL1546 and ~ 4-fold higher for EC958Δ1546 ( p1546 ) ( Fig 6A ) . 
+ Similarly , the transcription of fliC was also significantly increased ( ~ 11-fold for EC958PcL1546 and ~ 32-fold for EC958Δ1546 ( p1546 ) ; Fig 6B ) . 
+ Consistent with our motility analysis , no significant difference was observed in the transcription of flhD and fliC in EC958Δ1546 ( Fig 6B ) . 
+ Overexpression of EC958_1546 also led to an increase in FliC expression ( Fig 6C and 6D ) and flagella production ( Fig 6E , S3 Fig ) compared to wild-type EC958 . 
+ Thus , our data strongly support a mechanism whereby overexpression of EC958_1546 leads to enhanced transcription of the flhDC master regulator genes , resulting in 
+ Overexpression of EC958_1546 also leads to hyper-motility of other UPEC strains 
+ To extend our analysis on the function of EC958_1546 , we also examined its overexpression in two other well-characterised UPEC strains , namely CFT073 and UTI89 . 
+ Plasmid p1546 was transformed into both strains to generate CFT073 ( p1546 ) and UTI89 ( p1546 ) , respectively . 
+ In both strains , overexpression of EC958_1546 led to increased motility compared to vector control strains ( Fig 7 ) , demonstrating that EC958_1546 can enhance motility in multiple UPEC strains . 
+ Discussion
+ The use of TraDIS to identify genes involved in motility represents a novel application for this high throughput forward genetic screen . 
+ We initially hypothesized that all mutants defective in swimming would be absent from the output pool , and thus that our screen would identify the complete flagella regulon of EC958 . 
+ However , analysis of our TraDIS data did not reveal any genes that exhibited a significant reduction in insertion frequency in the output pool ( compared to the input pool ) , suggesting that non-motile mutants are likely to be ` carried ' by the wave of swimming cells in our assay . 
+ Indeed , this is consistent with the previous findings of Girgis et al. [ 39 ] , who demonstrated a requirement for up to five rounds of selection and enrichment of swimming cells to identify genes essential for motility . 
+ Instead , our TraDIS analysis identified 30 genes associated with the enhanced motility of EC958 . 
+ This included 12 genes encoding factors known to repress motility of E. coli K-12 , four of which ( lrhA , ihfA , ydiV , lrp ) were confirmed in this study . 
+ The remaining genes represent novel factors that impact motility , and we focused our investigation on characterisation of the mprA , hemK and yjeA genes . 
+ Mutation of each of these genes in EC958 led to increased transcription of flhD and fliC , increased expression of the FliC flagellin , enhanced flagella synthesis and a hypermotile phenotype . 
+ Importantly , all of these properties were restored to wild-type level upon complementation . 
+ MprA ( also known as EmrR ) is a transcriptional regulator that belongs to the MarR family of winged helix DNA binding proteins , which control the expression of a range of bacterial genes involved in virulence , resistance to antibiotics , response to oxidative stresses and the catabolism of environmental aromatic compounds [ 58 , 59 ] . 
+ In E. coli , the mprA gene is located in an operon together with the emrAB genes that encode a multidrug resistance pump [ 60 , 61 ] . 
+ MprA represses transcription of emrAB by direct binding to its promoter region [ 62 ] . 
+ A recent study reported that MprA also controls UPEC capsule synthesis , and specific inhibitors of MprA prevented polysaccharide capsule production [ 63 ] . 
+ In this case , the effect of MprA on capsule production was indirect and most likely coordinated through a broader regulatory network . 
+ Here , we identified a new role for MprA in UPEC motility . 
+ Although the precise molecular mechanism by which MprA represses UPEC motility remains to be determined , our data suggest its effect is mediated at the transcriptional level , and could occur either directly by binding to the flhDC promoter region or indirectly by affecting the expression of other flhDC regulators . 
+ In this respect , we note that mutation of emrAB did not change the motility of EC958 ( S4 Fig ) , ruling out an effect via altered expression of the EmrAB multidrug resistance pump . 
+ Among the other characterised MarR-like transcriptional regulators , PapX has also been shown to repress the motility of UPEC [ 35 , 46 ] . 
+ PapX directly binds to the flhDC promoter and represses transcription , and its over-expression results in reduced flagellin production and decreased motility [ 35 ] . 
+ Consistent with these data , papX was also identified in our TraDIS screen . 
+ Taken together , our results provide strong evidence that in addition to PapX , MprA also affects UPEC motility . 
+ HemK is a protein ( N5 ) - glutamine methyltransferase that modulates the termination of release factors in ribosomal protein synthesis [ 64 -- 66 ] . 
+ In E. coli , mutation of hemK causes defects in translational termination , leading to reduced growth rate and induction of the oxidative stress response [ 64 , 66 ] . 
+ We also observed a significant growth defect for the EC958hemK mutant in comparison to wild-type EC958 and the complemented mutant EC958hemK ( pHemK ) ( S5 Fig ) . 
+ Based on this knowledge , the observation that deletion of hemK leads to enhanced motility is difficult to understand . 
+ It is possible that the induction of multiple stresses in a hemK mutant background results in increased FlhDC expression . 
+ Indeed , FlhDC expression is responsive to a range of environmental stimuli ( e.g. temperature , osmolarity and pH ) [ 67 ] . 
+ YjeA ( also known as PoxA ) is a lysine 2,3-aminomutase that mediates post-translational modification of elongation factor-P ( EF-P ) [ 68 -- 70 ] . 
+ EF-P is an essential component of bacterial protein synthesis and binds to ribosomes to facilitate peptide bond formation [ 71 , 72 ] . 
+ In E. coli , the lysine residue 34 ( Lys34 ) of EF-P is posttranslationally modified by YjeA , resulting in increased affinity of EF-P to the ribosome [ 73 , 74 ] and prevention of ribosome stalling at polyproline stretches [ 73 , 74 ] . 
+ EF-P Lys34 can also be modified by a second enzyme , YjeK [ 69 ] . 
+ Notably , both yjeA and yjeK were identified in our TraDIS motility screen ( Table 1 ) , suggesting that a defect in EF-P modification actually enhances the motility of EC958 . 
+ In Salmonella , a contrasting motility phenotype for yjeA and yjeK mutants has been reported , with mutation of these genes leading to impaired motility [ 75 , 76 ] . 
+ It is possible that these differences may be related to the relative abundance of flagella-related proteins that contain polyproline stretches between both organisms , although a direct comparison of flagella proteins in EC958 and the Salmonella strain UK-1 did not reveal any major differences ( S3 Table ) . 
+ Alternatively , as observed for hemK , mutation of yjeA and yjeK may induce a stress response that leads to increased FlhDC expression . 
+ Overall , the precise mechanism by which mutation of yjeA results mutated , led to enhanced motility of EC958 . 
+ The function of these genes ranged across seven COG functional categories , including ` cell wall/membrane/envelope biogenesis ' ( 2 genes ) , ` mobilome : prophages , transposons ' ( 2 genes ) , ` signal transduction ' ( 3 genes ) , ` amino acid transport and metabolism ' ( 2 genes ) and others ( Table 1 ) . 
+ Confirmation of the role of these genes in motility via the construction and characterisation of specific mutants is now required . 
+ The use of a highly saturated mutant library in our TraDIS procedure also enabled the interrogation of miniTn5 insertions within IGRs on the EC958 chromosome . 
+ In total , eight IGRs were identified that contained significantly more insertions in the output pool compared to the input pool , indicating insertions within these IGRs led to enhanced motility . 
+ Six of these IGRs were located upstream of CDS for genes known to repress motility , all of which were also identified in our screen . 
+ Close inspection of the Tn5 insertions revealed their orientation was unidirectional and opposite to the direction of the downstream genes , consistent with the notion that the insertion disrupted transcription of the corresponding gene . 
+ The analysis also identified Tn5 insertions in IGRs upstream of flhDC and EC958_1546 . 
+ These Tn5 insertions were also unidirectional , but instead orientated in the same direction as the respective downstream gene , which in both cases was devoid of Tn5 insertions . 
+ We hypothesized that these Tn5 insertions most likely resulted in enhanced transcription of the downstream gene ( s ) ; indeed this was confirmed by introducing the strong constitutive PcL promoter upstream of both genes , which resulted in enhanced motility . 
+ Thus , our approach has revealed a novel application of TraDIS to identify genes that enhance a specific phenotype when their transcription is increased . 
+ EC958_1546 encodes a hypothetical phage protein predicted to be 617 amino acids in length . 
+ EC958_1546 displays 58 % identity over 326 amino acids to NanS , an N-acetylneuraminic acid deacetylase that catalyses the hydrolysis of the 9-O-acetyl group of 9-O-acetyl-N-acetylneuraminate , an alternative sialic acid commonly found in mammalian host mucosal sites such as the human intestine [ 77 -- 79 ] . 
+ We speculate that over-expression of the EC958_1546 protein may enhance motility via an altered chemotactic response ; however , this remains to be experimentally proven . 
+ Interestingly , there are three additional genes on the EC958 chromosome that display similarity to EC958_1546 , namely EC958_1029 ( 82.7 % amino acid identity over the whole protein ) , EC958_3294 ( 57.4 % amino acid identity over 317 amino acids ) and EC958_0037 ( 58.4 % amino acid identity over 319 amino acids ) ( S6 Fig ) . 
+ None of these three genes were identified in our TraDIS screen . 
+ Furthermore , PCR amplification , cloning and overexpression of these genes in EC958 did not alter motility ( S7 Fig ) , confirming the specific effect of EC958_1546 on this phenotype . 
+ We also showed that over-expression of EC958_1546 in two other UPEC strains could invoke an enhanced motility phenotype , demonstrating the effect is not strain specific . 
+ In this respect , an Stx-phage-encoded protein ( 933Wp42 ) from enterohemorrhagic E. coli that possesses 53 % amino acid identity with EC958_1546 has been shown to have esterase activity [ 78 ] , and other phage-encoded variants of nanS have been described [ 80 ] . 
+ Thus , it is possible that the over-expression of other phage proteins with the capacity to degrade different carbon sources could also impact motility . 
+ Overall , this study demonstrates the application of TraDIS to identify novel genes associated with enhanced motility . 
+ A better understanding of the mechanisms by which many of the 
+ Materials and methods
+ Bacterial strains and growth conditions
+ All strains and plasmids used in this study are listed in Table 3 . 
+ Strains were routinely cultured at 37 ˚C on solid or in liquid Lysogeny Broth ( LB ) medium supplemented with the appropriate antibiotics ( chloramphenicol 30 μg / ml or gentamicin 20 μg / ml ) unless indicated otherwise . 
+ Where necessary , gene expression was induced with 1mM isopropyl β-D-1-thiogalactopyrano - 
+ Molecular methods
+ DNA purification , PCR and Sanger DNA sequencing were performed as previously described [ 88 ] . 
+ Targeted mutations were generated using a modified λ-Red recombineering method [ 81 , 84 ] . 
+ A list of primers used in this study is provided in S1 Table . 
+ In brief , the final PCR products were generated by a 3-way PCR that resulted in amplification of the chloramphenicol resistance gene cassette flanked by 500-bp homologous regions matching the target gene to be mutated . 
+ The PCR products were electroporated into EC958 harbouring pKOBEG-Gent . 
+ Mutants were selected by growth in the presence of chloramphenicol and confirmed by sequencing . 
+ Complementation was performed by cloning the gene of interest into pSU2718 [ 87 ] or pSU2718G . 
+ The resultant plasmid was then transformed into the respective mutant and gene expression was induced using 1 mM IPTG . 
+ Screening assay for identification of mutants with enhanced motility
+ Approximately 1x10 cells from a previously constructed miniTn 
+ ( input pool ; [ 43 ] ) were inoculated into the center of each of 20 LB soft agar plates ( 80 mm diameter ) and incubated for 10 hours at 37 ˚C . 
+ Motile cells were recovered by extracting the LB agar at a distance of 30 mm from the point of inoculation ( the edge of the swimming zone ; output pool ) . 
+ Approximately 5g of soft agar ( plus motile cells ) was collected from each plate and vigorously mixed with LB broth to achieve a suspension of 1g agar/ml . 
+ Five ml of this mixture was drawn from each tube ( n = 20 ) and pooled . 
+ The pooled mixture was centrifuged at 6000 rpm for 10 min at room temperature to separate the bacterial pellet from soft agar . 
+ This centrifugation step produced a tight bacterial pellet surrounded by a loose mass of soft agar and a layer of supernatant . 
+ The agar and supernatant were removed , and the pellet was resuspended in LB to an OD600 of 1.8 ; genomic DNA was extracted from 5 ml of this suspension using the Qiagen genomic DNA purification kit . 
+ DNA from the input pool was extracted in the same 
+ Multiplexed TraDIS
+ TraDIS was performed essentially as previously described [ 43 ] , but with some modifications for adaptation to the MiSeq platform [ 89 ] . 
+ Briefly , 50 ng of genomic DNA from each sample ( 2 biological replicates of input and output pools , respectively ) was fragmented and tagged with adapter sequence via one enzymatic reaction ( tagmentation ) . 
+ Following tagmentation , DNA was purified using Zymo DNA Clean & Concentrator ™ kit ( Zymo Research ) . 
+ The PCR enrichment step was run using index primer 1 ( one index per sample ) and a custom transposon specific primer 4844 ( 5 ' - AATGATACGGCGACCACCGAGATCTACACTAGATCGCaacttcggaataggaactaagg-3 ' ) to enrich for transposon insertion sites and allow for multiplexed sequencing ; the thermocycler program was 72 ˚C for 3 minutes , 98 ˚C for 30 seconds followed by 22 cycles of 98 ˚C for 10 seconds , 63 ˚C for 30 seconds and 72 ˚C for 1 minute . 
+ Each library was purified using Agencourt Ampure XP magnetic beads . 
+ Verification and quantification of the resulting libraries were performed using a Qubit 2.0 Fluorometer , 2100 Bioanalyser ( Agilent Technologies ) and qPCR ( KAPA Biosciences ) . 
+ All libraries were pooled in equimolar amounts to a final concentration of 3.2 nM and submitted for sequencing on the MiSeq platform at the Queensland Centre for Medical Genomics ( Institute for Molecular Bioscience , The University of Queensland ) . 
+ The MiSeq sequencer was loaded with 12 pM of pooled library with 5 % PhiX spike-in and sequenced ( single-end , 101 cycles ) using a mixture of standard Illumina sequencing primer and Tn5-specific sequencing primer 4845 ( 5 ' - actaaggaggatattcatatggaccatggctaattcccatgtcAGATGTG-3 ' ) . 
+ A total of two MiSeq runs were performed to achieve sufficient read depth for analysis . 
+ All experiments were performed in duplicate . 
+ The TraDIS sequence data from this study was deposited on the Sequence Read Archive ( SRA ) under the Bio Project number PRJNA339173 ( http://www.ncbi.nlm.nih.gov/sra/SRP082245 ) . 
+ Analysis of TraDIS data
+ The raw , de-multiplexed fastq files from both MiSeq runs were combined and filtered to capture reads containing the 12-bp Tn5-specific barcode ( 5 ' - TATAAGAGACAG-3 ' ) , allowing for 2 mismatches ( fastx_barcode_splitter.pl , FASTX-Toolkit v. 0.0.13 ) . 
+ These reads were trimmed to remove the 12-bp barcode and 58-bp at the 3 ' end ( fastx_trimmer , FASTX-Toolkit v. 0.0.13 ) , resulting in high quality sequence reads of 30-bp in length that were mapped to the EC958 chromosome ( gb | HG941718 ) by Maq version 0.7.1 [ 90 ] . 
+ Subsequent analysis steps were carried out using an in-house Perl script as previously described [ 43 ] to calculate the number of unique insertion sites and the read count at each site for every gene and IGR . 
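+ The read-filtering and trimming steps described above can be illustrated with a minimal Python sketch. It is not the FASTX-Toolkit implementation used in the study; it assumes the 12-bp Tn5 tag sits at the 5' start of each read, and the function names (hamming, extract_insert) are hypothetical.

```python
# Illustrative sketch only: assumes the Tn5 tag is at the 5' start of the read.
TN5_TAG = "TATAAGAGACAG"   # 12-bp Tn5-specific tag quoted in the text
MAX_MISMATCHES = 2         # mismatches tolerated when matching the tag
KEEP_LENGTH = 30           # length of the genomic portion kept for mapping


def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def extract_insert(read: str, tag: str = TN5_TAG,
                   max_mm: int = MAX_MISMATCHES,
                   keep: int = KEEP_LENGTH):
    """Return the genomic bases following the tag, or None if the read fails.

    A read passes when its first len(tag) bases match the tag with at most
    max_mm mismatches and enough sequence remains after the tag.
    """
    if len(read) < len(tag) + keep:
        return None
    if hamming(read[:len(tag)], tag) > max_mm:
        return None
    # Drop the tag and everything beyond the first `keep` genomic bases.
    return read[len(tag):len(tag) + keep]
```

+ Reads returning None would be discarded; the remaining 30-bp fragments are what a short-read mapper would then align to the reference chromosome.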
+ Statistical analysis
+ EC958 genes and IGRs associated with enhanced motility were identified by comparing their relative read abundance in the input and output pools using the Bioconductor package edgeR [ 91 ] as previously described [ 43 ] . 
+ Briefly , the read counts from each sample were loaded into the edgeR package ( version 2.6.12 ) using the R environment ( version 2.15.1 ) . 
+ The composition bias in each sequence library was normalized using the trimmed mean of M value ( TMM ) method [ 92 ] . 
+ The quantile-adjusted conditional maximum likelihood ( qCML ) for negative binomial models was then used to estimate dispersions ( biological variation between replicates ) and to perform exact tests for determining genes and IGRs with significantly lower read counts in the input pools compared to the output pools as previously described [ 93 , 94 ] . 
+ Stringent criteria of log fold-change ( logFC ) ≥ 5 and false discovery rate ≤ 0.001 were used to define a list of the most significant genes for further investigation by phenotypic assays . 
+ All other experimental data were analyzed using unpaired Students t-test and P-values 0.05 
+ Motility assay
+ To evaluate motility , 6 μl of an overnight culture prepared in LB broth was spotted onto the centre or the edge of a freshly prepared 0.25 % LB Bacto-agar plate ( n = 3 ) , supplemented with the appropriate inducer and/or antibiotic . 
+ Plates were incubated at 37 ˚C in a humid environment ( a closed box containing a dish of water ) and the rate of motility was determined by measuring the diameter of the motility zone over time . 
+ qRT-PCR was carried out essentially as previously described [ 14 ] . 
+ In brief , exponentially growing cells ( OD600 0.6 ) were stabilized with two-volumes of RNAprotect Bacteria Reagent ( Qiagen ) prior to RNA extraction using the RNeasy Mini Kit ( Qiagen ) followed by on-column DNase digestion . 
+ First-strand cDNA synthesis was performed using the SuperScript III First-Strand Synthesis System ( Invitrogen ) as per the manufacturer 's recommendation . 
+ Real-time PCR was performed using SYBR Green PCR Master Mix ( Applied Biosystems ) on the ViiA ™ 7 Real-Time PCR System ( Applied Biosystems ) using the following primers : flhD , primers 5613 ( 5 ' - acttgcacagcgtctgattg ) and 5614 ( 5 ' - agcttaaccatttgcggaag ) ; fliC , primers 5683 ( 5 ' - caccaacctgaacaacacca ) and 5684 ( 5 ' - gcacggcgaatatccagttg ) . 
+ Transcript levels of each gene were normalized to gapA as the endogenous gene control ( primers 820 , 5 ' - ggtgcgaagaaagtggttatgac and 821 , 5 ' - ggccagcatatttgtcgaagttag ) . 
+ Gene expression levels were determined using the 2^ ( -ΔΔCT ) method with relative fold-difference expressed against EC958 . 
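+ As a worked illustration of the 2^(-ΔΔCT) calculation, the sketch below computes a relative fold-difference from threshold-cycle (Ct) values. The function name and the Ct values in the usage note are hypothetical and are not data from this study; the reference gene corresponds to gapA and the calibrator to wild-type EC958.

```python
def fold_change(ct_target_sample: float, ct_ref_sample: float,
                ct_target_calibrator: float, ct_ref_calibrator: float) -> float:
    """Relative expression by the 2^(-ddCt) method.

    dCt  = Ct(target gene) - Ct(reference gene), computed per strain
    ddCt = dCt(sample strain) - dCt(calibrator strain)
    """
    delta_sample = ct_target_sample - ct_ref_sample
    delta_calibrator = ct_target_calibrator - ct_ref_calibrator
    delta_delta = delta_sample - delta_calibrator
    return 2.0 ** (-delta_delta)
```

+ For example, with made-up values fold_change(20.0, 15.0, 22.0, 15.0) gives ΔΔCT = (20 - 15) - (22 - 15) = -2, that is, a 4-fold higher transcript level in the sample than in the calibrator.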
+ Protein preparation and western blotting
+ Whole-cell lysates were prepared by pelleting 1 ml of an overnight culture diluted to an optical density at 600 nm ( OD600 ) of 1.0 , and resuspending in 50 μl of distilled water plus 50 μl of 2x SDS loading buffer . 
+ SDS PAGE and transfer of proteins to a PVDF membrane for western blotting were performed as previously described [ 53 ] . 
+ Monospecific antiserum against H4 flagellin was purchased from the Statens Serum Institute , Denmark . 
+ OmpA antiserum was purchased from the Antibody Research Corporation , USA ( item # 111120 ) . 
+ Primary antibodies were detected with commercially purchased alkaline phosphatase-conjugated anti-rabbit antibody ( Sigma Aldrich ) . 
+ SIGMAFAST ™ BCIP/NBT ( Sigma Aldrich ) was used as substrate for 
+ Supporting information
+ S1 Fig . 
+ TEM analysis demonstrating flagella expression for representative EC958 wildtype , mutant and complemented strains . 
+ S2 Fig . 
+ Overexpression of the flhDC master regulator genes in EC958 leads to enhanced motility . 
+ Left panel , motility phenotype expressed as the diameter of the swimming zone per hour for EC958 and EC958PcLflhDC . 
+ The data represents the mean and standard deviation from three independent experiments . 
+ Right panel , western blot analysis of cell lysates prepared from mid-log phase cultures of EC958 and EC958PcLflhDC probed with an antibody against 
+ S4 Fig . 
+ Motility phenotype of EC958 and EC958emrAB strains . 
+ Motility is expressed as the diameter of the swimming zone per hour for EC958 and EC958emrAB . 
+ The data represents the mean and standard deviation from three independent experiments . 
+ ( TIF ) 
+ S5 Fig . 
+ Growth of EC958 , EC958hemK and the complemented mutant EC958hemK ( pHemK ) . 
+ EC958hemK displayed a reduced growth rate compared to the wild-type and complemented strains . 
+ ( TIF ) 
+ S6 Fig . 
+ Amino acid alignment of the translated sequences for EC958_1546 , EC958_1029 , EC958_3294 and EC958_0037 . 
+ Sequence alignments were performed using CLC main workbench 7.0.2 . 
+ Residues identical to EC958_1546 are indicated by dots ; gaps are indicated by dashed lines . 
+ S7 Fig . 
+ Motility phenotype of EC958 ( p1546 ) , EC958 ( p1029 ) , EC958 ( p3294 ) , EC958 ( p0037 ) and EC958 ( pSU2718 ) . 
+ Motility is expressed as the diameter of the swimming zone per hour for each strain . 
+ The data represents the mean and standard deviation from three independent experiments . 
+ S1 Table. Primers used in this study. (XLSX)
+ S2 Table. Summary of sequencing and mapping results of TraDIS runs. (XLSX)
+ S3 Table . 
+ Frequency of proline residues in flagella-related proteins of EC958 and Salmonella enterica serovar Typhimurium strain UK-1 . 
+ ( XLSX ) 
+ Acknowledgments
+ This work was supported by a grant from the National Health and Medical Research Council ( NHMRC ) of Australia ( GNT1067455 ) . 
+ MAS is supported by an NHMRC Senior Research 
+ Writing – review & editing: AK MDP AWL SAB MAS.
+ Sanchez-Torres V, Hu H, Wood TK. GGDEF proteins YeaI, YedQ, and YfiN reduce early biofilm formation and swimming motility in Escherichia coli. Appl Microbiol Biotechnol. 2011; 90(2):651–8. Epub 2010/12/25. https://doi.org/10.1007/s00253-010-3074-5 PMID: 21181144
+ Hung CC, Haines L, Altier C. The flagellar regulator fliT represses Salmonella pathogenicity island 1 through flhDC and fliZ. PLoS One. 2012; 7(3):e34220. Epub 2012/04/06. https://doi.org/10.1371/journal.pone.0034220 PMID: 22479568
+ 53 . 
+ Ulett GC , Webb RI , Schembri MA . 
+ Antigen-43-mediated autoaggregation impairs motility in Escherichia coli . 
+ Microbiology . 
+ 2006 ; 152 ( Pt 7 ) :2101 -- 10 . 
+ https://doi.org/10.1099/mic.0.28607-0 PMID : 16804184 
+ 54 . 
+ Fahrner KA , Berg HC . 
+ Mutations that stimulate flhDC expression in Escherichia coli K-12 . 
+ J Bacteriol . 
+ 2015 ; 197 ( 19 ) :3087 -- 96 . 
+ Epub 2015/07/15 . 
+ https://doi.org/10.1128/JB.00455-15 PMID : 26170415 
+ 57 . 
+ Wang X , Wood TK . 
+ IS5 inserts upstream of the master motility operon flhDC in a quasi-Lamarckian way . 
+ ISME J. 2011 ; 5 ( 9 ) :1517 -- 25 . 
+ https://doi.org/10.1038/ismej.2011.27 PMID : 21390082 
+ 58 . 
+ Wilkinson SP , Grove A. Ligand-responsive transcriptional regulation by members of the MarR family of winged helix proteins . 
+ Curr Issues Mol Biol . 
+ 2006 ; 8 ( 1 ) :51 -- 62 . 
+ Epub 2006/02/03 . 
+ PMID : 16450885 
+ 59 . 
+ Ellison DW , Miller VL . 
+ Regulation of virulence by members of the MarR/SlyA family . 
+ Curr Opin Microbiol . 
+ 2006 ; 9 ( 2 ) :153 -- 9 . 
+ Epub 2006/03/15 . 
+ https://doi.org/10.1016/j.mib.2006.02.003 PMID : 16529980 
+ 60 . 
+ Lomovskaya O , Lewis K. Emr , an Escherichia coli locus for multidrug resistance . 
+ Proc Natl Acad Sci U S A. 1992 ; 89 ( 19 ) :8938 -- 42 . 
+ Epub 1992/10/01 . 
+ PMID : 1409590 
+ 61 . 
+ Lomovskaya O , Lewis K , Matin A. EmrR is a negative regulator of the Escherichia coli multidrug resistance pump EmrAB . 
+ J Bacteriol . 
+ 1995 ; 177 ( 9 ) :2328 -- 34 . 
+ Epub 1995/05/01 . 
+ PMID : 7730261 
+ 64 . 
+ Nakahigashi K , Kubo N , Narita S , Shimaoka T , Goto S , Oshima T , et al. . 
+ HemK , a class of protein methyl transferase with similarity to DNA methyl transferases , methylates polypeptide chain release factors , 
+ 67 . 
+ Soutourina OA , Bertin PN . 
+ Regulation cascade of flagellar expression in Gram-negative bacteria . 
+ FEMS Microbiol Rev. 2003 ; 27 ( 4 ) :505 -- 23 . 
+ Epub 2003/10/11 . 
+ PMID : 14550943 
+ 71 . 
+ Glick BR , Ganoza MC . 
+ Identification of a soluble protein that stimulates peptide bond synthesis . 
+ Proc Natl Acad Sci U S A. 1975 ; 72 ( 11 ) :4257 -- 60 . 
+ Epub 1975/11/01 . 
+ PMID : 1105576
\ No newline at end of file
--- a/data/TEXT_FILES/useful_txt/28842878.txt 0 → 100644
View file @27818a9
+++ b/data/TEXT_FILES/useful_txt/28842878.txt 0 → 100644
View file @27818a9
+ Chapter 8
+ Chromatin ImmunoPrecipitation ( ChIP ) sequencing has become one of the most important methods for discovering the binding sites of NAPs and TFs on the DNA in vivo . 
+ In a ChIP experiment , DNA and proteins are first cross-linked to strengthen protein-DNA interactions . 
+ The cross-linked chromatin is then sheared within a size range of 200 -- 500 base pairs ( bp ) . 
+ Next , the protein of interest is immuno-precipitated using an appropriate antibody . 
+ The cross-links are reversed and the DNA obtained is used either for sequencing ( ChIP-seq ) or for hybridization on a microarray-based platform ( ChIP-chip ) . 
+ ChIP studies have been used to understand developmental processes and disease associations in eukaryotes [ 1 ] . 
+ The roles of DNA binding proteins in bacterial chromosome maintenance and gene regulation have also been uncovered using this method . 
+ One of the first uses of the ChIP method for bacteria was the analysis of the genome-wide distribution of cAMP-receptor protein ( CRP ) on the E. coli chromosome , which resulted in the suggestion that this GTF might in fact be a NAP [ 2 ] . 
+ Since then various groups have carried out experiments to determine the genome wide binding patterns of various NAPs and GTFs including , but not limited to , HNS , Fis , HU , IHF , FNR , Fur , and LRP [ 3 -- 7 ] . 
+ A certain degree of care must be taken while performing a ChIP experiment . 
+ Controls usually fall into two broad categories : ( a ) Input : the fragmented genomic sample extracted before immuno-precipitation ; ( b ) mock-IP : the sample treated without the antibody or with a nonspecific antibody such as IgG ( Immunoglobulin G ) . 
+ This article shares our experience performing ChIP-seq experiments with E. coli NAPs and GTFs , exploring the computational aspects of such studies . 
+ Hardware : Computer with UNIX , Linux or Mac OS X ( with Xcode installed separately ) , with a minimum of 4 GB of RAM . 
+ Software : All software tools listed below are open source . 
+ Whereas some of these procedures make use of sophisticated algorithms including the Burrows-Wheeler procedure for rapidly aligning millions of reads to a reference sequence , many others can also be implemented efficiently using easy-to-write scripts in programming languages such as PERL or PYTHON . 
+ Install the following software : FastQC [ 8 ] , Cutadapt [ 9 ] , Burrows-Wheeler Aligner ( BWA ) [ 10 ] , SAMtools [ 11 ] , Bedtools [ 12 ] , and MACS [ 13 ] . 
+ Install R [ 14 ] ( check the newest stable version ) and Bioconductor packages such as genefilter [ 15 ] . 
+ UCSC archaeal genome browser [ 16 ] for visualization ( web only ) . 
+ MEME-ChIP [ 17 ] ( web only ) . 
+ Sample names : The filenames below assume paired end sequencing . 
+ - ChIP biological replicate 1 -- ChIP1.read1.fastq and ChIP1.read2.fastq . 
+ - ChIP biological replicate 2 -- ChIP2.read1.fastq and ChIP2.read2.fastq . 
+ - Input control replicate 1 -- Input1.read1.fastq and Input1.read2.fastq . 
+ - Input control replicate 2 -- Input2.read1.fastq and Input2.read2.fastq . 
+ 3 Methods
+ Install the software and packages mentioned in Subheading 2 . 
+ All of these are open source and the installation is straightforward . 
+ After obtaining the reads , check the quality using FastQC software . 
+ This tool gives the output in html format where you can see the sequence quality of the reads , sequence duplication , % GC content , and adapter contamination . 
+ Reads are aligned to the reference genome using BWA ( see Note 2 ) . 
+ To check only the mapped reads for further downstream analysis : 
+ - c - count the number of occurrences , - F - exclude reads carrying the given flag , 0x4 flag - unmapped reads . 
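+ The flag test behind this filtering step can be sketched in a few lines of Python. In the SAM specification the FLAG field is a bit mask in which bit 0x4 marks an unmapped read, whereas bit 0x40 marks the first read of a pair. The helper names below are hypothetical, and the sketch reads plain SAM text rather than BAM.

```python
UNMAPPED = 0x4        # SAM FLAG bit: the read itself is unmapped
FIRST_IN_PAIR = 0x40  # SAM FLAG bit: first read of a pair (not a mapping test)


def is_mapped(flag: int) -> bool:
    """True when the unmapped bit (0x4) is not set in the SAM FLAG."""
    return (flag & UNMAPPED) == 0


def count_mapped(sam_lines) -> int:
    """Count alignment records whose FLAG does not carry the unmapped bit.

    Header lines (starting with '@') are skipped; the FLAG is the second
    tab-separated field of each alignment record.
    """
    mapped = 0
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue
        flag = int(line.split("\t")[1])
        if is_mapped(flag):
            mapped += 1
    return mapped
```

+ Excluding records with the 0x4 bit set is what leaves only mapped reads for the downstream steps.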
+ After checking the number of reads , the user can use the following command to work only with reads that are mapped . 
+ The sort command sorts the output according to the user-specified options : - o sets the output filename and - n sorts by read name . 
+ where the - d option computes the coverage per base and - ibam specifies the input BAM file . 
+ This step may take a long time to run . 
+ The .cov output file has three columns , of which two are of interest : the second column holds the base position and the third the coverage computed for that specific position . 
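+ A per-base coverage file of this kind can be summarized with a short script. The sketch below assumes a tab-delimited layout of sequence name, 1-based position and coverage, and the function name summarize_coverage is hypothetical.

```python
def summarize_coverage(handle):
    """Return (mean coverage, position of maximum, maximum coverage).

    Each non-empty line is expected to hold three tab-separated columns:
    sequence name, base position and coverage at that position.
    """
    total = 0
    n_positions = 0
    max_depth = -1
    max_pos = None
    for line in handle:
        line = line.strip()
        if not line:
            continue
        _name, pos, depth = line.split("\t")
        depth = int(depth)
        total += depth
        n_positions += 1
        if depth > max_depth:
            max_depth = depth
            max_pos = int(pos)
    if n_positions == 0:
        raise ValueError("coverage file contains no positions")
    return total / n_positions, max_pos, max_depth
```

+ The function accepts any iterable of lines, so an open file handle for a .cov file can be passed directly.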
+ Model-based Analysis of ChIP-Sequencing ( MACS ) identifies regions bound by a NAP/GTF/Histone modification . 
+ The model assumes the read distribution to be Poisson and then performs three key steps to find enrichment -- removal of redundant reads , adjustment of read position based on fragment size distribution , and calculation of peak enrichment using local background normalization [ 13 , 18 ] . 
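+ The Poisson assumption can be made concrete with a small worked example. This is a simplified illustration, not the MACS implementation: it computes the probability of observing at least k reads in a window when lambda reads are expected, and takes the largest of several background estimates as the local rate. The function names and the numbers in the usage note are hypothetical.

```python
import math


def poisson_sf(k: int, lam: float) -> float:
    """P(X >= k) for X ~ Poisson(lam), via 1 minus the summed lower tail.

    Adequate for the small counts of a worked example; very large k or lam
    would call for a numerically more careful routine.
    """
    lower_tail = sum(math.exp(-lam) * lam ** i / math.factorial(i)
                     for i in range(k))
    return max(0.0, 1.0 - lower_tail)


def local_lambda(genome_rate: float, *window_rates: float) -> float:
    """Most conservative expected count: the largest background estimate."""
    return max((genome_rate, *window_rates))
```

+ For instance, with an expected background of 2.5 reads per window, observing 12 reads gives poisson_sf(12, 2.5), a very small probability, which is the kind of evidence a peak caller uses to report enrichment.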
+ MACS can be installed on a local machine following the authors ' instructions . 
+ We used MACS2 for our analysis . 
+ There are several parameters that one has to consider before running MACS on a dataset . 
+ $ macs2 callpeak -t ChIP1.bam ChIP2.bam -c Input1.bam Input2.bam -f BAMPE -g 4.6e6 -n output 
+ - c for input/mock data control . 
+ MACS can also work without this dataset . 
+ - f for the format of the input files . 
+ MACS takes several read formats including SAM , BAM , BED , ELAND . 
+ For paired end reads BAM and ELAND formats can be used by specifying it as BAMPE and ELANDMULTIPLET . 
+ If this option is not specified , MACS will detect the format automatically ( see Note 6 ) . 
+ - p is the p-value cutoff . 
+ If this is not set , the default is 1e-5 . 
+ The output contains several files named ChIP1_peaks.bed , ChIP1_peaks.xls , ChIP1_summits.bed , etc . 
+ ChIP1_peaks.bed has the start and end genomic coordinates of the putative binding sites . 
+ The fourth column corresponds to the name of the file and the fifth is the - log10 ( q value ) also seen in the ChIP1_peaks.bed . 
+ The log2 fold-change cutoff is 1.2 and greater ( see Note 7 ) . 
+ Peak visualization in the UCSC genome browser gives detailed information on whether peaks are clustered in specific regions of the chromosome , evolutionary conservation with other organisms , and gene annotation tracks ( refseq ) , to name a few ( Fig. 3 ) . 
+ One can also combine different NAP peak files into one file and view the differences and similarities in the same window . 
+ One of the key questions in the gene regulation field is whether the binding of NAP/GTF on a regulatory region of a gene can explain the regulation of expression of that specific gene . 
+ GTFs/NAPs bind to various regions on the chromosome . 
+ But , only those peaks which are present in the regulatory regions of the chromosome are likely to influence gene expression directly . 
+ One point to note here is that the regulation of gene expression is not straightforward , as there is increasing evidence of combinatorial regulation by several GTFs / NAPs ; hence , readers must be cautious before interpreting these results . 
+ We already know from the extensive gene-centric studies of gene regulation and transcription initiation in E. coli that binding of activators and repressors occurs from ~ 150 bp upstream up to the transcription start site [ 19 -- 21 ] . 
+ To probe the role of NAP/GTF follow the below instructions . 
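+ One way to relate peaks to regulatory regions is sketched below, under the assumption that the regulatory window spans 150 bp upstream of the transcription start site (TSS) on the gene's own strand. The Gene record, the function names, and the coordinates in the usage note are hypothetical and only illustrate the overlap test.

```python
from typing import List, NamedTuple, Tuple


class Gene(NamedTuple):
    name: str
    tss: int      # transcription start site coordinate
    strand: str   # "+" or "-"


def regulatory_window(gene: Gene, upstream: int = 150) -> Tuple[int, int]:
    """Interval from `upstream` bp before the TSS up to the TSS itself.

    On the minus strand, 'upstream' lies at higher coordinates.
    """
    if gene.strand == "+":
        return gene.tss - upstream, gene.tss
    return gene.tss, gene.tss + upstream


def genes_bound(peaks: List[Tuple[int, int]], genes: List[Gene],
                upstream: int = 150) -> List[str]:
    """Names of genes whose regulatory window overlaps at least one peak."""
    bound = []
    for gene in genes:
        lo, hi = regulatory_window(gene, upstream)
        if any(start <= hi and end >= lo for start, end in peaks):
            bound.append(gene.name)
    return bound
```

+ With genes [Gene("geneA", 1000, "+"), Gene("geneB", 5000, "-")] and peaks [(900, 950), (5200, 5300)], only geneA is reported, because the second peak lies outside the 5000 to 5150 window of geneB. The resulting list could then be intersected with differentially expressed genes from the mutant background.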
+ 4 Notes
+ Following this , the user obtains a tab-delimited file with genomic regions ( list of operons ) which are bound in their respective regulatory region by the NAP/GTF and are differentially expressed in mutant NAP/GTF background . 
+ This indicates whether the binding effect of the NAP/TF on the gene expression is direct or indirect . 
+ Based on the position of binding from the transcription start site , user can also predict whether the GTF is an activator or repressor ; for this one will presumably require a more precise binding site identification than is permitted by the resolution of the ChIP , something that can be obtained by combining ChIP peaks with motif identification , or by using higher-resolution experimental techniques such as ChIP-exo .
\ No newline at end of file
--- a/data/TEXT_FILES/useful_txt/28902868.txt 0 → 100644
View file @27818a9
+++ b/data/TEXT_FILES/useful_txt/28902868.txt 0 → 100644
View file @27818a9
+ Discovery of numerous novel small genes in the intergenic regions of the Escherichia coli 
+ Deutsche Forschungsgemeinschaft DFG ( SCHE316/3 -2 , KE740/13 -2 ) within SPP 1395 InKoMBio to S.M.H. , Z.A. , S.S. , and K.N. , by Alexander von Humboldt foundation through the German Ministry for Research and Education ( Bundesministerium für Bildung und Forschung 
+ 1 Chair for Microbial Ecology , Technische Universität München , Freising , Germany , 2 ZIEL - Institute for Food & Health , Technische Universität München , Freising , Germany , 3 Department of Informatics -- Bioinformatics & TUM-IAS , Technische Universität München , Garching , Germany , 4 Research Unit Environmental Genomics , Helmholtz Zentrum München , Neuherberg , Germany , 5 Sackler Institute for Comparative Genomics , American Museum of Natural History New York , New York , United States of America , 6 Core Facility Microbiome/NGS , ZIEL - Institute for Food & Health , Technische Universität 
+ Abstract
+ In the past , short protein-coding genes were often disregarded by genome annotation pipelines . 
+ Transcriptome sequencing ( RNAseq ) signals outside of annotated genes have usually been interpreted to indicate either ncRNA or pervasive transcription . 
+ Therefore , in addition to the transcriptome , the translatome ( RIBOseq ) of the enteric pathogen Escherichia coli O157 : H7 strain Sakai was determined at two optimal growth conditions and a severe stress condition combining low temperature and high osmotic pressure . 
+ All intergenic open reading frames potentially encoding a protein of at least 30 amino acids were investigated with regard to coverage by transcription and translation signals and their translatability expressed by the ribosomal coverage value . 
+ This led to discovery of 465 unique , putative novel genes not yet annotated in this E. coli strain , which are evenly distributed over both DNA strands of the genome . 
+ For 255 of the novel genes , annotated homologs in other bacteria were found , and a machine-learning algorithm , trained on small protein-coding E. coli genes , predicted that 89 % of these translated open reading frames represent bona fide genes . 
+ The remaining 210 putative novel genes without annotated homologs were compared to the 255 novel genes with homologs and to 250 short annotated genes of this E. coli strain . 
+ All three groups turned out to be similar with respect to their translatability distribution , fractions of differentially regulated genes , secondary structure composition , and the distribution of evolutionary constraint , suggesting that both novel groups represent legitimate genes . 
+ However , the machine-learn - 
+ Introduction
+ The pathogenic E. coli strain O157 : H7 Sakai ( EHEC ) was first isolated in 1996 from an outbreak in Japan [ 1 ] . 
+ When contaminated food is consumed , EHEC can cause bloody diarrhea and the disease may progress to the life-threatening hemolytic uremic syndrome [ 2 ] . 
+ In addition to humans [ 3 ] and contaminated food , EHEC persists in many environments , such as soil [ 4 ] , plants [ 5 ] , invertebrates [ 6 ] , and cattle [ 7 ] . 
+ These environments represent various challenges requiring expression of a different set of bacterial genes [ 8 ] . 
+ Since there is no vaccination or targeted therapy available [ 9 ] , it is important to better understand the biology of this 
+ In contrast to eukaryotic genomes , bacterial genomes are densely covered with annotated protein-coding genes , e.g. , 88.1 % of the EHEC Sakai genome consists of protein-coding genes according to the most recent genome annotation [ 1 ] . 
+ Nevertheless , it is still possible that intergenic regions harbor overlooked short genes [ 10 , 11 ] . 
+ After sequencing a bacterial genome , bioinformatics tools , such as GLIMMER [ 12 ] or RAST [ 13 ] are used for gene prediction and annotation . 
+ Especially for short genes , these tools are biased in that open reading frames ( ORFs ) shorter than 150 bp are often rejected [ 14 ] and in some cases are not even permitted for database entry [ 15 ] . 
+ Thus , the sensitivity of automated annotation processes in predicting short genes is quite low [ 16 ] . 
+ Additionally , the experimental detection of small proteins in proteome studies is difficult : Many small proteins are lost during proteome purification and many more are not detectable by classic mass spectrometry , because they do not produce enough tryptic peptides of the proper size [ 17 ] . 
+ Therefore , small proteins have been largely ignored in the past and our knowledge of their structures and functions is very limited [ 15 ] . 
+ Although small proteins have recently come more into focus [ 18 , 19 ] , the majority of them still belong to the ` dark proteome ' lacking known folds or domains , thus rendering putative functional the expression status of genomes without any restriction to gene length . 
+ RNAseq strand-specifically determines the global transcriptome and widespread transcription outside of annotated genes has become increasingly obvious [ 22 -- 25 ] . 
+ In the past , these transcription signals were generally interpreted as ncRNAs [ 26 , 27 ] or just pervasive transcription without any biological significance [ 28 -- 30 ] . 
+ However , ribosomal footprinting ( RIBOseq ) can be used to determine the coverage of RNA with ribosomes , indicating translation into a peptide of the associated RNA , thus , facilitating the global investigation of the translatome [ 31 , 32 ] . 
+ Even more , RIBOseq reads usually show a triplet periodicity reflecting the codon-wise movement of the ribosome during the translation process [ 31 , 33 ] . 
+ Combining ribosomal footprinting with RNAseq allows estimation of the translatability of an ORF , expressed by the ribosomal coverage value ( RCV ) , which is the ratio of the reads per kilobase ( of gene ) per million sequenced reads ( RPKM ) value for the translatome over the RPKM value for the transcriptome . 
+ The RCV can be used to distinguish ncRNA from translated mRNA , and RIBOseq allows the discovery of many non-annotated short translated ORFs [ 33 -- 39 ] . 
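+ The RCV definition above can be illustrated with a minimal sketch. The function and parameter names are hypothetical, the read counts in the usage note are invented, and the handling of a zero transcriptome signal is an assumption rather than part of the described method.

```python
def rpkm(read_count, length_bp, total_reads):
    """Reads per kilobase (of gene) per million sequenced reads."""
    kilobases = length_bp / 1_000
    millions = total_reads / 1_000_000
    return read_count / kilobases / millions


def ribosomal_coverage_value(ribo_reads, rna_reads, length_bp,
                             ribo_total, rna_total):
    """RCV = RPKM(translatome) / RPKM(transcriptome) for one ORF.

    Returns None when the ORF has no transcriptome signal
    (assumption: the ratio is treated as undefined in that case).
    """
    rna_rpkm = rpkm(rna_reads, length_bp, rna_total)
    if rna_rpkm == 0:
        return None
    return rpkm(ribo_reads, length_bp, ribo_total) / rna_rpkm
```

+ For example, a 150 bp ORF with 50 RIBOseq reads out of 2 million and 100 RNAseq reads out of 4 million has the same RPKM in both libraries, so `ribosomal_coverage_value(50, 100, 150, 2_000_000, 4_000_000)` gives an RCV of 1.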
+ In bacteria , RIBOseq is less frequently applied . 
+ However , Baek et al. [ 40 ] recently reported 130 novel short genes in Salmonella , the smallest gene encoding a peptide of only 7 amino acids ( AA ) . 
+ The translatome of EHEC strain EDL933 under a single growth condition yielded 72 novel genes encoded in intergenic regions , 95 % of them encoding 
+ In this study , RIBOseq and RNAseq analysis of E. coli O157 : H7 Sakai was compared at three different growth conditions to identify translated ORFs in the intergenic regions . 
+ The resulting candidates for novel genes were further characterized using bioinformatics analysis . 
+ Results
+ Translatome signals of putative novel genes
+ The transcriptome and the translatome of EHEC Sakai were determined at three different growth conditions , each in two biological replicates : two standard lab conditions ( lysogeny broth ( LB ) at 37 ˚C ; Brain-heart-infusion ( BHI ) at 37 ˚C ) and combined cold and osmotic stress ( COS ; BHI supplemented with 4 % NaCl at 14 ˚C ) . 
+ Details about total read number and amount of rRNA , tRNA , and mRNA are listed in S1 Table . 
+ All intergenic ORFs of at least 30 AA length were considered as potentially encoding a protein if significant RIBOseq signals were found . 
+ A RIBOseq signal was assumed significant at a threshold of at least 1 RPKM , at least 50 % ORF coverage , and an RCV of at least 0.25 . 
+ This analysis resulted in 1271 potentially translated intergenic ORFs , which were manually examined for the following additional criteria before consideration as candidate genes . 
+ First , ORFs with identical sequences to others were removed . 
+ Next , every ORF with its mapped RIBOseq reads was visualized in the Artemis viewer [ 41 ] . 
+ False positives were assumed if the signal could have been caused by neighboring annotated genes and not by the putative ORF of interest ; such ORFs were excluded . 
+ In the case of same-strand overlapping ORFs in different reading frames , the ORF with the better fit to the RIBOseq signal was selected . 
+ After individual inspection in which 806 candidates were excluded , we arrived at a conservative estimate of 465 intergenic ORFs , which were considered to show convincing evidence of translation in the RIBOseq experiments . 
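+ The automated part of this selection (at least 1 RPKM, at least 50 % ORF coverage, and an RCV of at least 0.25) can be sketched as a simple filter. This is an illustration only: the record type and field names are hypothetical, and it is assumed here that the RPKM threshold refers to the RIBOseq signal. The manual inspection steps are not modeled.

```python
from dataclasses import dataclass

# Thresholds as stated in the text.
MIN_RPKM = 1.0
MIN_COVERAGE = 0.5   # fraction of the ORF covered by reads
MIN_RCV = 0.25


@dataclass
class OrfSignal:
    """Hypothetical per-ORF summary of the sequencing signals."""
    name: str
    ribo_rpkm: float   # assumption: RPKM of the RIBOseq signal
    coverage: float    # fraction of ORF length covered, 0.0 to 1.0
    rcv: float         # ribosomal coverage value


def is_significant(orf):
    """True if the ORF passes all three thresholds."""
    return (orf.ribo_rpkm >= MIN_RPKM
            and orf.coverage >= MIN_COVERAGE
            and orf.rcv >= MIN_RCV)


def select_candidates(orfs):
    """Keep only ORFs with a significant RIBOseq signal."""
    return [orf for orf in orfs if is_significant(orf)]
```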
+ The novel putative genes were consecutively numbered in the order they appear in the EHEC genome ( XECs001 -- XECs465 ) . 
+ The novel genes were approximately uniformly distributed within the whole genome , occurring on both strands of the chromosome ( Fig 1 ) . 
+ Details about position on the genome , length , RPKM value , coverage , and RCV of all novel genes are found in S2 Table . 
+ Two-hundred-eleven ( 211 ) novel genes show translation at both optimal growth conditions ( LB and BHI at 37 ˚C ) , 210 novel genes are detected in LB only , and four are detected in BHI control only . 
+ RIBOseq signals of 32 novel genes are shared under all three conditions but no gene fulfills the criteria for candidate gene inclusion in BHI COS only ( Fig 2 and S2 Table ) . 
+ One example of a translated intergenic ORF for each growth condition is visualized in Fig 3 . 
+ The three novel gene candidates depicted are clearly covered by RIBOseq reads over their entire length and it is considered highly unlikely that the translation signals are caused by neighboring annotated genes . 
+ Additionally , the novel genes show sufficient RCVs of 0.51 ( XECs135 ) , 0.58 ( XECs029 ) and 0.29 ( XECs459 ) , confirming translation . 
+ Annotated homologs of novel genes
+ The amino acid sequences of the novel genes were used as a query to find annotated homologous proteins in other bacteria with blastp using default parameters against the RefSeq database . 
+ With an e-value threshold of 10^-3 , 55 % of the putative proteins encoded in the novel genes match an annotated homolog ( Table 1 ) . 
+ When a more stringent e-value threshold of 10^-10 was applied , 42 % of novel genes still possess annotated homologs . 
+ The hits with the lowest e-value for each novel gene are listed in S3 Table . 
+ Interestingly , 34 of the novel genes are annotated in other E. coli O157 : H7 strains , of which twelve were found in the EHEC strain 
+ EDL933 [ 42 ] , which is the closest relative to strain Sakai used in this study . 
+ Additionally , eleven of the novel genes detected in the intergenic regions of EHEC EDL933 in a previous study [ 11 ] 
+ Based on the blastp analysis with an e-value threshold of 10^-3 , the 465 novel genes were divided into two groups : one group of 255 ORFs , which have annotated homologs in other bacteria ( ` with annotated homolog ' ) , and a second group of 210 ORFs for which no annotated homologs were found in the database ( ` without annotated homolog ' ) . 
+ Furthermore , the 250 shortest annotated genes of EHEC Sakai with an RCV of at least 0.25 in LB ( S2 Table and S4 the novel genes ( mean 172 bp ) . 
+ The novel genes without annotated homologs are the shortest , with a mean length of 127 bp ( Table 1 ) . 
+ More than 50 % of the latter group would encode a protein of just 30 -- 39 AA ( Fig 4A ) . 
+ However , the largest novel gene would encode a protein of 425 AA . 
+ For the three groups , the RCV distribution is shown for LB in Fig 4B . 
+ All groups show a comparable pattern : the majority of genes have a moderate translatability and a subset of genes is translated with high efficiency . 
+ Growth in BHI control and in BHI COS also yields RCV distributions which are similar among the three gene groups ( S1 Fig ) . 
+ Overall , translatability is somewhat decreased under BHI control , but there is a massive decline of translatability under BHI COS condition ( Table 1 ) . 
+ However , the decline is in a similar range for all three groups and attributable to the stress condition . 
+ Sequence conservation
+ A tblastn search for non-annotated homologs of the novel genes in other organisms , using the RefSeq genomic database , shows high conservation levels within the Escherichia genus and often more widely ( Fig 5 ) . 
+ Six novel genes with annotated homologs ( blastp ) and three putative novel genes without annotated homologs did not have tblastn hits . 
+ Thus , 249 and 207 genes with unique sequences are shown in Fig 5A and 5B , respectively . 
+ The novel genes with annotated homologs ( blastp ) show more unannotated homologs ( tblastn ) with greater average evolutionary distance and AA similarity compared to those novel genes without annotated homologs ( blastp ) . 
+ A two-tailed t-test comparing the maximum distance of intact homologs ( tblastn ) for the novel genes with and without annotated homologs ( blastp ) gives a p-value of p = 0.002 . 
+ Thus , the maximum evolutionary distance of the homologs found using tblastn is significantly different for both groups ( i.e. , genes with and without annotated homologs using blastp ) . 
+ There is some evidence for horizontal gene transfer of some ORFs , with highly similar sequences found in distant bacterial genera , and even eukaryotes , for instance multiple matches between XECs029 and Drosophila genomes . 
+ The sequences in the RefSeq database might be misidentified . 
+ However , the phenomenon of transfer of bacterial genome regions to arthropods has been described [ 43 ] . 
+ Intergenic sequences upstream and downstream of the novel genes were analyzed as above . 
+ As expected , sequence similarity is less preserved in the upstream and downstream regions when compared to the ORF-sequence of the novel genes ( S2 Fig ) . 
+ For intact homologs ( i.e. , no stop codon ) of the novel genes , the average sequence similarity for intact tblastn hits outside of the Escherichia/Shigella genera is 69 % ( S5 Table ) . 
+ Average sequence similarity for all homologs of the sequences upstream and downstream of the novel genes is lower , at 47 % ( S2 Fig ) . 
+ Triplet periodicity of the RIBOseq signal
+ A characteristic of RIBOseq data , at least from eukaryotes , is that the reads show a triplet periodicity reflecting the codon-wise translation by the ribosome [ 31 ] . 
+ Thus , the codon positions of 5 ' ends of all RIBOseq reads with read length 20 bp were determined in the sum signal of all annotated genes and of the novel genes with and without annotated homologs . 
+ Indeed , the annotated genes and the novel genes with annotated homologs show a reading frame signal at codon position two for all investigated growth conditions ( Fig 6 ) . 
+ However , the signal is weak and the novel genes without annotated homologs only show a reading frame at codon position two when grown in BHI COS . 
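+ The reading-frame analysis described above can be sketched as a count of the codon position at which each read's 5 ' end falls, summed over a group of genes. This is a simplified illustration, not the authors' pipeline: it assumes forward-strand genes, 0-based coordinates, and reads already filtered to the desired length; all names are hypothetical.

```python
def codon_position_counts(read_starts, cds_start):
    """Count read 5' ends by codon position (1, 2 or 3) for one gene.

    read_starts: 0-based positions of read 5' ends (forward strand).
    cds_start:   0-based position of the first base of the start codon.
    Reads starting upstream of the start codon are ignored.
    """
    counts = {1: 0, 2: 0, 3: 0}
    for pos in read_starts:
        offset = pos - cds_start
        if offset < 0:
            continue
        counts[offset % 3 + 1] += 1
    return counts


def sum_signal(genes):
    """Sum the codon-position counts over many genes.

    genes: iterable of (read_starts, cds_start) pairs.
    """
    total = {1: 0, 2: 0, 3: 0}
    for read_starts, cds_start in genes:
        for position, n in codon_position_counts(read_starts, cds_start).items():
            total[position] += n
    return total
```

+ A group of genes shows a reading frame in the sum signal when one codon position clearly collects more 5 ' ends than the other two.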
+ ( B ) The translatability expressed by the ribosomal coverage value ( RCV ) when growing in LB . 
+ The RCV was binned into ten groups . 
+ All three gene categories show a similar RCV distribution . 
+ Differential regulation of the novel genes
+ Differential expression at transcriptional and translational levels between growth conditions indicates regulation of gene expression , which implies functionality . 
+ Therefore , we investigated the novel genes for significantly changed transcription and translation using BHI control as the reference condition in comparison to LB and BHI COS . 
+ In addition , the 250 shortest annotated genes were analyzed as a control group . 
+ Comparing growth in BHI and LB medium at 37 ˚C showed that about one third of the genes in each group is differentially expressed . 
+ For all groups , downregulation in LB is more frequent than upregulation . 
+ Downregulation occurs more often at the transcriptional level , whereas for upregulation translational changes are more frequent ( Fig 7B ) . 
+ Fold changes , p-values and false discovery rates determined with edgeR [ 44 ] for all significantly regulated genes are listed in S6 Table . 
+ When the two BHI conditions are compared , even more genes show differential regulation . 
+ For example , the novel gene XECs197 is clearly expressed at the control condition , but transcription and translation are almost switched off at BHI COS ( Fig 7C ) . 
+ For the short annotated genes , 40 % are regulated , but for the novel genes without annotated homologs and the novel genes with annotated homologs 81 % and 82.4 % are differentially expressed , respectively 
+ BHI COS, where translational regulation clearly dominates.
+ Bioinformatics analyses
+ Predicted protein characteristics . 
+ The software PredictProtein [ 45 , 46 ] predicts many parameters of an amino acid sequence including composition , secondary structure , protein localization , disordered regions , as well as the number of DNA/RNA binding sites , disulfide bonds and transmembrane helices . 
+ Prediction of secondary structures is very similar for the three groups ( Fig 8A ) . 
+ The proteins mainly fold into α-helices and loops ; β-sheet-like structures are less common . 
+ Concerning disordered regions , the three groups contain a similar average portion of disorder of about 20 % according to the UCON prediction [ 47 ] ( S8 Table and S9 Table ) . 
+ Forty-four ( 9.5 % ) novel genes show evidence of transmembrane helices ( Fig 8B ) . 
+ The proportion of short annotated genes with predicted transmembrane helices is higher ( 18 % ) . 
+ Novel genes with annotated homologs also more often contain a transmembrane helix than do novel genes without annotated homologs ( 12.9 % compared to 5.2 % , respectively ) . 
+ For the number of predicted disulfide bonds an opposite picture was obtained . 
+ The novel genes without annotated homologs more often have one or more disulfide bonds predicted , followed by the novel genes with annotated homologs , but 90 % of the short annotated genes seem not to contain any disulfide bond ( Fig 8C ) . 
+ The localization of the putative proteins was also predicted : 34 putative novel proteins should localize in the inner or outer membrane , while surprisingly , 85 % are predicted to be secreted ( Fig 8D ) . 
+ Whereas the localization prediction of the novel genes with and without annotated homologs is similar , the result for the short annotated genes is slightly different : Many of them should still be secreted ( 45 % ) , but the number predicted to be cytoplasmic and inner membrane proteins is higher . 
+ Further details and additional properties of the novel genes and the short annotated genes are listed in S8 Table and S9 Table . 
+ [ ... ] mentioned parameters were also predicted for a number of short annotated proteins of Escherichia coli O157 : H7 EDL933 to obtain a positive control set . 
+ Fig 6 ( caption fragment ) : [ ... ] the majority of 5 ' ends at position two for every condition . The novel genes without annotated homologs only show a reading frame at codon position two at the condition BHI + 4 % NaCl at 14 ˚C . 
+ As a negative control set , these natural proteins were scrambled ( for each positive control sequence , 100 randomly scrambled sequences were used ) and submitted for PredictProtein analysis . 
+ A machine-learning algorithm was trained on the positive and negative control sets to distinguish between ` real ' protein sequences and scrambled ones ( ` pseudo ' ) [ 11 ] . 
+ This algorithm was used to investigate the 465 translated ORFs found in this study ( S3 Table ) and the 250 short annotated genes of EHEC Sakai ( S4 Table ) . 
+ Again , every amino acid sequence was scrambled 10 times as a negative control . 
+ As expected , the algorithm recognized 99.4 % of the scrambled proteins as ` pseudo ' and 99.2 % of the short annotated genes as ` real ' based on predicted parameters of those sequences . 
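+ The scrambled negative controls described above can be generated by shuffling the residues of each sequence, which preserves length and amino acid composition while destroying residue order. The sketch below is a hypothetical illustration; the function name, the fixed seed, and the example peptide are assumptions, and the source does not state how its scrambling was implemented.

```python
import random


def scrambled_controls(sequence, n, seed=0):
    """Return n randomly scrambled versions of an amino acid sequence.

    Each control has the same length and composition as the input.
    A seeded generator is used so that the controls are reproducible
    (assumption: reproducibility is desired).
    """
    rng = random.Random(seed)
    controls = []
    for _ in range(n):
        residues = list(sequence)
        rng.shuffle(residues)
        controls.append("".join(residues))
    return controls
```

+ With this helper, `scrambled_controls(seq, 100)` would correspond to the training set-up and `scrambled_controls(seq, 10)` to the negative control used for the translated ORFs.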
+ Overall , 50 % of the novel genes were recognized as ` real ' . 
+ However , the presence of an annotated homolog ( found via blastp ) correlates well with being predicted as ` real ' by the machine-learning algorithm and vice versa ( Table 1 , S10 Table ) . 
+ Only five novel genes without annotated homologs were recognized by the machine-learning algorithm as ` real ' proteins . 
+ Conversely , 29 novel genes with annotated homologs were predicted as ` pseudo ' proteins 
+ Promoter and terminator prediction . 
+ A promoter is required to initiate transcription of an ORF and is recognized by the σ-factor of the RNA polymerase holoenzyme . 
+ The housekeeping σ-factor in E. coli is σ70 ( reviewed in [ 48 ] ) . 
+ Therefore , σ70 promoter sequences were searched in the regions 300 bp upstream of putative start codons of the novel genes using BPROM . 
+ Interestingly , all novel genes without annotated homologs have a predicted promoter in their upstream region and in the upstream regions for the novel genes with annotated homologs a promoter sequence appears to be present in 95 % of the cases ( Table 1 and S3 slightly shorter . 
+ The LDF score is a measure of the promoter strength and a promoter is considered active with an LDF score of at least 0.2 . 
+ The average LDF score of the predicted promoters for the three gene groups is similar : 3.43 for the short annotated genes , 3.44 for the novel genes with annotated homologs and 3.86 for the novel genes without annotated homo-bp downstream of the stop codon was investigated using FindTerm . 
+ For 20.8 % of the novel genes with annotated homologs a terminator was predicted . 
+ For those without annotated 
+ Shine-Dalgarno sequence and start codons . 
+ The presence of a Shine-Dalgarno ( SD ) sequence upstream of the start codon promotes efficient translation initiation [ 50 ] . 
+ The consensus SD motif for E. coli is uaAGGAGGu and base pairing of this sequence with the anti-SD of the 16S rRNA results in a free energy of ΔG˚ = -9.6 [ 51 ] . 
+ Within the region 30 bp upstream of the start codons 41 % of the novel genes with annotated homologs and 35.2 % without annotated homologs have a SD sequence ( Table 1 ) . 
+ A high proportion of the annotated genes have a SD sequence ( 80 % ) . 
+ Additionally , the average free energy of the SD is lower for the annotated genes ( -5.17 compared to -4.61 and -4.47 , respectively ) . 
+ The upstream regions of XECs059 ( novel gene with annotated homolog ) and XECs428 ( novel gene without annotated homolog ) contain a perfect SD sequence ( S3 Table ) . 
+ ATG is the most common start codon , but also GTG , TTG , and the rare start codons CTG , ATT , ATA , and ATC can initiate translation in E. coli [ 52 ] . 
+ Genome annotation algorithms only search for the three most common start codons ( ATG , GTG , and TTG ) [ 12 ] and in accordance with this , 90 % of the annotated genes have an ATG start codon , 7.2 % a GTG start codon , and 2.8 % a TTG start codon , whereas rare start codons are not present at all . 
+ In case of the novel genes , the real start codon is unknown . 
+ Because of that , the potential start codon farthest upstream of the coding region , but within the transcriptome signal , was chosen no matter whether it was a frequent or rare start codon . 
+ Therefore , only 42 % of the novel genes with annotated homologs and 32.8 % of the novel genes without annotated homologs start with either ATG , GTG , or TTG . 
+ All other genes , putatively , have rare start codons . 
+ However , it cannot be excluded that some of these genes possess an ATG , GTG , or TTG start codon further downstream of the open reading frame . 
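+ The start codon choice described above (the in-frame potential start codon farthest upstream that still lies within the transcriptome signal) can be sketched as follows. This is a simplified, hypothetical illustration for a forward-strand ORF with 0-based coordinates; the function name and the way the transcriptome signal is passed in (as a single leftmost covered position) are assumptions.

```python
# Frequent and rare start codons listed in the text.
START_CODONS = {"ATG", "GTG", "TTG", "CTG", "ATT", "ATA", "ATC"}
STOP_CODONS = {"TAA", "TAG", "TGA"}


def farthest_upstream_start(seq, stop_index, transcript_start=0):
    """Return the index of the farthest-upstream in-frame start codon.

    seq:              forward-strand DNA sequence (upper case).
    stop_index:       index of the first base of the ORF's stop codon.
    transcript_start: leftmost index covered by the transcriptome signal.

    Walks upstream codon by codon from the stop codon and stops at the
    previous in-frame stop codon or at the edge of the transcript.
    Returns None if no potential start codon is found.
    """
    best = None
    pos = stop_index - 3
    while pos >= transcript_start:
        codon = seq[pos:pos + 3]
        if codon in STOP_CODONS:
            break
        if codon in START_CODONS:
            best = pos
        pos -= 3
    return best
```

+ Because every codon in the listed set counts as a potential start, this rule often selects a rare start codon upstream of a more common ATG, GTG, or TTG, which is consistent with the low fractions of common start codons reported for the novel genes.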
+ Evolutionary sequence analysis of novel genes . 
+ The rates of non-synonymous ( amino acid changing ) and synonymous ( not amino acid changing ) substitutions per site , kA and kS respectively , reflect the evolutionary processes underlying the divergence of related genes . 
+ In the absence of selection , it is expected that kA ≈ kS , indicating neutrality . 
+ On the other hand , when purifying selection acts to eliminate disadvantageous mutations , the fact that most fitness-altering mutations are nonsynonymous implies that selection will disproportionately slow the rate of divergence at nonsynonymous sites , leading to kA < kS . 
+ Conversely , when positive selection acts to promote advantageous mutations , this will disproportionately increase the rate of divergence at non-synonymous sites , leading to kA > kS . 
+ Although intergenic junk sequences are expected to evolve neutrally , functional genes can also exhibit kA ≈ kS because of near-neutrality or a balance between positive and purifying selective forces . 
+ We reasoned that only functional protein-coding sequences would show significant signs of positive or negative selection and , based on the hypothesis that our novel genes are functional , we predicted that the proportion of genes exhibiting significant signatures of selection should be 
+ To test this hypothesis , the most distant homologous sequences matching the genes , with 100 % coverage and no gaps , were identified using tblastn . 
+ Due to the short size of most of the genes , many sequences were too similar for a kA/kS comparison , leaving 175 of 250 annotated genes , 153 of 255 novel genes with annotated homologs , and 116 of 210 novel genes without annotated homologs available for analysis ( S3 Table and S4 Table ) . 
+ Of these remaining genes , 12 ( 4.8 % ) , 12 ( 4.7 % ) , and 5 ( 2.4 % ) genes showed significant selection in the three respective classes using a Holm-Bonferroni multiple comparisons procedure , which was not a significant difference between classes ( p = 0.335 , Fisher 's Exact Test ) . 
+ However , only the annotated genes included any genes under significant positive selection ( 5 genes ) , which was a significant difference among classes ( p = 0.001 , Fisher 's Exact Test ; Table 1 ) . 
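+ The Holm-Bonferroni procedure mentioned above is a step-down correction for multiple comparisons. A minimal sketch is given below; the significance level of 0.05 is an assumption (the source does not state the level used), and the function name and example p-values are hypothetical.

```python
def holm_bonferroni(p_values, alpha=0.05):
    """Return a list of booleans: True where the null hypothesis is rejected.

    The p-values are tested in ascending order; the k-th smallest
    (k = 0, 1, ...) is compared against alpha / (m - k), where m is the
    number of tests. Testing stops at the first non-significant p-value.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, index in enumerate(order):
        if p_values[index] <= alpha / (m - rank):
            reject[index] = True
        else:
            break
    return reject
```

+ For four tests with p-values 0.01, 0.04, 0.03, and 0.005, only the first and the last are rejected at the assumed level: the third smallest p-value ( 0.03 ) exceeds 0.05 / 2 , so testing stops there.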
+ Discussion
+ RIBOseq is a powerful tool to detect translated mRNA
+ Ribosomal footprinting has been used to detect translation of non-annotated ORFs previously . 
+ In eukaryotes , hundreds of non-annotated ORFs show evidence of translation , e.g. , in yeast [ 53 ] , in Drosophila [ 54 ] , in zebrafish [ 34 ] , in Arabidopsis [ 37 ] , and even in humans [ 55 ] . 
+ Additionally , the translation of previously annotated ncRNAs was reported frequently [ 36 , 39 , 56 ] . 
+ In bacteria , 130 novel genes were detected in Salmonella [ 40 ] and 72 novel genes were detected in EHEC strain EDL933 [ 11 ] . 
+ For the latter strain , translation is also reported for a number of RNAs that were previously classified as ncRNA . 
+ For instance , the ncRNA ryhB encodes a nonamer peptide RyhP [ 39 ] . 
+ Although it was not the focus of their study , Jeong et al. [ 57 ] report translation signals for 31 annotated ncRNAs in Streptomyces coelicolor . 
+ Even the well-studied λ phage with a very small genome of 48.5 kb shows translation of 50 non-annotated ORFs [ 58 ] . 
+ RIBOseq experiments with eukaryotes allow reading frame determination for individual genes [ 33 , 37 , 38 ] . 
+ The reading frame resolution of prokaryotic RIBOseq data is lower such that we cannot determine a reading frame in the RIBOseq signal of single ORFs . 
+ This may be caused by bacterial ribosomes being more flexible and incorporating changing numbers of mRNA nucleotides [ 59 ] . 
+ In addition , the RIBOseq method , originally developed for eukaryotes , has been adapted for bacteria and footprints of more variable read length are obtained [ 60 ] . 
+ Furthermore , the composition of ribosomal proteins and rRNAs can be heterogeneous dependent on the growth condition ; especially at stress conditions , specialized ribosomes are responsible for the translation of a subset of mRNAs [ 61 , 62 ] . 
+ Putatively , the specialized ribosomes protect an mRNA stretch of deviating length . 
+ Recent findings indicate that the usage of a translational inhibitor influences ribosome conformation , which weakens the reading frame signal [ 63 ] . 
+ For instance , chloramphenicol , as used in this study , preferentially arrests translation at positions encoding alanine , serine , or threonine [ 64 ] , which dilutes the triplet signal . 
+ Also , the choice of the ribonuclease used for digestion of mRNA not protected by ribosomes influences RIBOseq results [ 65 ] . 
+ To minimize the influence of any sequence specificity for a single RNase , we applied a mixture of five RNases ( RNase I , MNase , XRN-1 , RNase R , and RNase T ) . 
+ Here , we show a reading frame in the sum signal for all genes for the first time in bacteria using conventional RIBOseq . 
+ Very recently , the addition of the endonuclease RelE to the ribosome preparation has been reported to improve reading frame determination . 
+ The RelE toxin cuts the mRNA within the ribosome very precisely at a specific position in the codon [ 66 ] . 
+ However , as shown in Fig 6 , under our three conditions a reading frame in the sum signal can be extracted from the data , at least for the group of novel genes that have annotated homologs in other bacterial strains or species . 
+ RIBOseq based evidence for translation of 465 intergenic ORFs
+ In this study , 465 intergenic ORFs have been detected , which show a clear RIBOseq signal ( S2 Table ) . 
+ The average size of the novel-gene encoded proteins is only 50 AA . 
+ Standard genome annotation algorithms usually do not predict such very short genes or proteins [ 14 , 16 ] . 
+ In this study , an arbitrary size minimum of 30 AA was applied to restrict the number of ORFs to be investigated and to reduce the possibility of false positives , but even smaller peptides can be functional [ 39 , 40 ] . 
+ Knowledge about the functions of small proteins in bacteria is limited , but small proteins have recently attracted attention ( reviewed in [ 15 , 18 ] ) . 
+ For instance , Baumgartner et al. [ 67 ] confirmed five small proteins in Synechocystis by Western blot . 
+ Neuhaus et al. [ 11 ] detected 72 novel small genes in the intergenic regions of the E. coli strain EDL933 by evaluating RNAseq and RIBOseq data of a single growth condition ( LB , 37 ˚C ) . 
+ Compared to their work , this study on a different EHEC strain achieves a higher sequencing depth and two additional growth conditions including severe stress were investigated . 
+ Moreover , translated ORFs were not only selected by an RPKM-value threshold , but further conservative thresholds for coverage and RCV were applied . 
+ Translation of eleven novel small genes found in EHEC EDL933 by Neuhaus et al. [ 11 ] is present in EHEC Sakai and twelve translated ORFs of EHEC Sakai are annotated proteins in EDL933 . 
+ Vice versa , 28 of the 72 novel EDL933 genes are annotated proteins in strain Sakai . 
+ The 255 translated ORFs with annotated homologs most likely represent protein-coding genes 
+ Blastp analysis revealed that a group of 255 out of the 465 novel ORFs with a clear RIBOseq signal found in this work have annotated homologs in other bacteria . 
+ In addition , many of these 255 genes display predicted protein structures ( Fig 8 ) , as well as σ70 promoters , and in some cases ρ-independent terminators and SD sites , like annotated short proteins . 
+ Even ORFs without these predicted extra features can encode proteins , because those genes could be part of an operon , the promoter could be recognized by an alternative σ-factor [ 68 ] , termination could be ρ-dependent [ 69 ] , and translation of leaderless mRNAs occurs [ 70 ] . 
+ Overall , these novel genes behave similarly in all parameters investigated when compared to 250 short annotated genes of EHEC Sakai . 
+ Both gene groups are transcribed and translated at the same magnitude and the RCV distributions of all growth conditions are comparable . 
+ A similar fraction of genes is differentially transcribed and/or translated , when BHI control is compared to BHI COS or LB . 
+ Even the directions of up/down regulation compare well ( Fig 7 ) . 
+ Additionally , active translation is supported by the presence of a reading frame on codon position two for every growth condition in the sum signal caused by codon-wise progression of the ribosome . 
+ Furthermore , a machine-learning algorithm trained with short annotated proteins of EHEC EDL933 predicted 88.6 % of these genes with annotated homologs as being ` real ' proteins . 
+ Finally , there is no significant difference between the number of genes under selection in this class as compared to either annotated genes or novel genes without annotated homologs . 
+ However , unlike annotated genes , for which the majority of selected genes showed evidence of positive selection , all selected genes in this class were under purifying selection . 
+ This is not unexpected under the hypothesis of functionality , because purifying selection is the most common form of selection in nature [ 52 ] , and because this result was obtained despite choosing the most distant homolog . 
+ However , it is also likely that ascertainment bias plays a role in this result , as it is probable that more emphasis has historically been placed on the annotation of genes which are shared by more distantly related organisms . 
+ This would especially be true if many of the novel genes we identified are orphan genes , since such genes lack distantly related homologs by definition . 
+ Therefore , we conclude that these 255 translated intergenic ORFs 
+ Unusual features of the 210 novel genes without annotated homologs 
+ A second group , 210 out of 465 novel genes , had no annotated homologs when using blastp . 
+ However , homologs in other bacteria may be present but were missed during annotation of these genomes due to their unusual features . 
+ Indeed , a tblastn search confirmed that many non-annotated homologs in the Escherichia genus and , in some cases , in farther related species as well , exist ( Fig 5B ) . 
+ The majority of these ORFs were not classified by the machine-learning algorithm to encode ` real ' proteins . 
+ This appears to be more significant and raises the question whether these ORFs indeed code for proteins . 
+ The following analysis is based on a comparison between three groups : ( i ) 250 annotated small genes , ( ii ) 255 novel small genes with annotated homologs and ( iii ) the group of 210 ORFs without annotated homologs , which may or may not code for proteins ( Table 1 ) . 
+ Several arguments support the hypothesis that these ORFs are functional and not residues due to pervasive transcription [ 29 ] : first , their expression obviously does not lead to a fitness disadvantage , as in misfolded proteins , which are cytotoxic [ 71 ] . 
+ Second , a promoter is present upstream of all 210 ORFs , and thirdly , the same fraction of these ORFs is differentially transcribed , compared to both control groups ( i ) and ( ii ) ( Fig 7 ) . 
+ However , these data would fit the hypothesis either that these ORFs represent ncRNA or that they are protein-coding genes . 
+ The following observations are in favor of the hypothesis that these novel ORFs are protein-coding genes and not ncRNAs : most significantly , RIBOseq signals , and hence significant RCVs , are in the same order of magnitude as those of short annotated genes , many ORFs without homologs are differentially regulated at the translational level , SD sequences are present upstream of one third of the ORFs , and the number of predicted protein structures is very similar to that of annotated protein-coding genes . 
+ Finally , a similar proportion of genes appear to be under selection as among the annotated genes and novel genes with annotated homologs , with the caveat that ascertainment bias has likely favored the detection of genes under purifying selection . 
+ Why , then , does the machine-learning algorithm not recognize these ORFs as protein-coding genes ? 
+ A first explanation is that the algorithm will only predict sequences as ` real ' , which are within the known parameter space of the training set . 
+ Proteins of unknown structure and folds may reside outside the parameter space of ` established ' proteins and , thus , will fail to be classified as ` real ' and will inevitably be binned as ` pseudo ' . 
+ The majority of all established proteins belong to a protein family with known secondary structure or which contains characterized domains . 
+ But 25 % of all protein sequences do not match to any family and , therefore , belong to the ` dark proteome ' [ 72 ] . 
+ In prokaryotes , 13 % of all proteins are ` dark ' [ 20 ] . 
+ Their properties are different when compared to known proteins : They are shorter , they are often secreted , contain more disulfide bonds , have a lower evolutionary reuse [ 20 ] , are more disordered , have a different hydrophobic amino acid topology , and have a higher energy [ 21 ] . 
+ Many of these properties fit well with the PredictProtein data of the proteins encoded by the novel genes without annotated homologs : accordingly , the majority of putative proteins without annotated homologs are very short , are predicted to be secreted , and more often contain disulfide bonds . 
+ Thus , these properties render it unlikely that the machine-learning algorithm will predict these unusual proteins correctly . 
+ A second possibility is that the novel genes without annotated homologs may represent very young taxonomically restricted or ` orphan ' genes . 
+ Yomtovian et al. [ 73 ] reported that orphan genes of EHEC show an amino acid composition more comparable to random sequences than to annotated genes , since they may not yet have a fully adapted function , which makes it difficult for any annotation program , including our machine-learning algorithm , to distinguish them from scrambled proteins . 
+ Also , young genes without annotated homologs are shorter [ 74 ] , which is true for our data set . 
+ Additionally , evolutionary young genes often use uncommon start codons [ 75 ] , which is also true for our data set . 
+ This hypothesis is further supported by the evolutionary distances of the non-annotated homologs detected using tblastn , when comparing the novel genes without annotated homologs to the novel genes with annotated homologs ( Fig 5 ) . 
+ The genes with annotated homologs show intact tblastn hits ( i.e. , ORFs without stop codons ) with a significantly greater evolutionary distance compared to the genes without annotated homologs . 
+ In summary , we believe that our data provide evidence supporting the hypothesis that most of these 210 ORFs are evolutionarily young genes coding for proteins with unusual features . 
+ The data set may contain some false positives , since in a few cases , ribosome binding of the RNA may exert a regulatory function , comparable to a translation regulating riboswitch instead of translation into protein [ 76 , 77 ] ; however , this will not invalidate our general findings . 
+ Conclusion
+ This study supports the view that , in contrast to earlier beliefs , bacterial genomes are probably under-annotated because small genes have been overlooked . 
+ In E. coli O157 : H7 Sakai , at least 
+ 465 non-annotated short ORFs are covered with significant RIBOseq reads indicating active translation and the majority of these ORFs show features of protein-coding genes . 
+ Since the 
+ EHEC Sakai genome harbors about 5200 annotated protein-coding genes , these additional genes would significantly increase the number of protein-coding genes in this bacterium . 
+ Obviously , much further work is required for functional characterization of the novel genes . 
+ It would not be surprising if other bacterial genomes also harbor many overlooked short genes in their intergenic regions , which could be investigated by combined RNAseq and RIBOseq . 
+ In addition , the high-throughput discovery of small proteins in proteome analysis requires modified or improved methods since these proteins likely escape attention with most currently available methods [ 17 , 78 , 79 ] . 
+ Our study supports the notion that it is advisable to improve genome annotation algorithms in order to reduce bias against annotation of short genes [ 16 , 75 ] . 
+ Material and methods
+ Transcriptome and translatome sequencing
+ Strand-specific RNAseq and RIBOseq of Escherichia coli O157 : H7 Sakai ( GenBank accession number BA000007.2 and RefSeq accession NC_002695.1 , version from February 2014 ) [ 1 ] were performed at three different growth conditions in two biological replicates each . 
+ An over-night culture of EHEC was inoculated 1:100 in lysogeny broth ( LB medium ) and incubated at 37 ˚C and 150 rpm until an OD600 of 0.4 was reached . 
+ Additionally , two conditions using brain-heart infusion broth ( BHI ; Merck KGaA ) were investigated . 
+ For the BHI control condition , an overnight culture of EHEC was inoculated 1:100 and incubated at 37 ˚C and 150 rpm until an OD600 of 0.1 was reached . 
+ For the stress condition of combined cold and osmotic stress ( COS ) , 4 % NaCl were added to the BHI medium and incubation was performed at 14 ˚C 
+ RNAseq was performed as described by Landstorfer et al. [8] for the Illumina system. For
+ ribosomal footprinting , the method published by Ingolia et al. [ 31 ] was adapted to bacteria as described [ 11 ] with the following further modifications : mRNA not protected by ribosomes was digested with a mixture of five RNases to exclude sequence specificity . 
+ Buffer NEB 4 plus 1 mM CaCl2 was added to 1 ml cell extract and the solution was incubated for 1 h at RT with 250 U MNase ( Roche ) , 5 U XRN-1 ( NEB ) , 250 U RNase I ( Thermo Fisher Scientific ) , 50 U RNase 
+ R ( Biozym ) and 12 U RNase T ( NEB ) . 
+ The monosome fraction was harvested by sucrose density gradient centrifugation and unprotected mRNA digestion was repeated once . 
+ For the LB condition , rRNA was depleted using the MICROBExpress kit ( Thermo Fisher Scientific ) and for the BHI conditions rRNA depletion was performed using the RiboZero kit for Gram-negative bacteria ( Illumina ) . 
+ All libraries were prepared using the TruSeq Small RNA Sample 
+ Preparation Kit ( Illumina ) and sequenced on a HiSeq 2500 machine according to the manufacturer . 
+ The sequencing raw data is available at the Sequence Read Archive ( SRA , NCBI ) 
+ Read mapping and RCV calculation
+ For processing and mapping of the sequencing raw data , the Galaxy platform was used [ 80 ] as described [ 11 ] . 
+ The data were visualized using BamView [ 81 ] implemented in Artemis 16.0 
+ [ 41 ] . 
+ The RPKM values for all intergenic non-annotated ORFs in EHEC which would encode a peptide of at least 30 AA ( ~ 12,000 ORFs ) were calculated in R ; reads mapping to rRNA or tRNA were excluded [ 82 ] . 
+ Besides the canonical DTG start codons , the rare start codons CTG , ATT , ATA and ATC were allowed according to genetic code table 11 ( https://www.ncbi.nlm.nih.gov/Taxonomy/Utils/wprintgc.cgi ) . 
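+ The ORF definition above can be illustrated with a minimal sketch. It is not the authors' code: the function name find_orfs, the forward-strand-only scan, and the reading of the length limit as "at least 30 AA" are assumptions made for illustration.

```python
# Start codons: canonical DTG (D = A, G or T) plus the rare starts named in the text.
START_CODONS = {"ATG", "GTG", "TTG", "CTG", "ATT", "ATA", "ATC"}
# Stop codons of genetic code table 11.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, min_aa=30):
    """Return (start, end) pairs (0-based, half-open) of forward-strand ORFs.

    Hypothetical helper: every in-frame start codon is a candidate, and the
    ORF runs up to and including the next in-frame stop codon. ORFs that
    reach the end of the sequence without a stop are not reported.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        starts = []  # start codons seen since the last in-frame stop
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if codon in STOP_CODONS:
                for s in starts:
                    # Encoded length in amino acids, start codon included.
                    if (i - s) // 3 >= min_aa:
                        orfs.append((s, i + 3))
                starts = []
            elif codon in START_CODONS:
                starts.append(i)
    return sorted(orfs)
```

+ A real scan would also cover the reverse strand and restrict candidates to intergenic regions; both steps are left out here to keep the sketch short.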
+ The ratio of RPKM translatome over RPKM transcriptome gives the ribosomal coverage value ( RCV ) , which is a measure for the translatability of a certain ORF [ 39 ] . 
+ Novel gene candidates had to fulfill the following criteria for at least one growth condition in both biological replicates to be considered translated : RPKM translatome of at least 1 read per million mapped reads , coverage translatome ≥ 0.5 and RCV ≥ 0.25 . 
+ To exclude false positives , all novel gene candidates were manually inspected in Artemis . 
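+ As a hedged illustration of the RCV definition and the selection criteria, the following sketch computes RPKM and RCV and applies the three thresholds. The function names, the dictionary layout, and the reading of the thresholds as lower bounds are assumptions; the authors performed these calculations in R, and the manual inspection step is not modelled.

```python
import math


def rpkm(read_count, orf_length_bp, total_mapped_reads):
    """Reads per kilobase of ORF per million mapped reads."""
    return read_count * 1e9 / (orf_length_bp * total_mapped_reads)


def rcv(rpkm_translatome, rpkm_transcriptome):
    """Ribosomal coverage value: translatome RPKM over transcriptome RPKM."""
    if rpkm_transcriptome == 0:
        return math.nan  # undefined without any transcriptome signal
    return rpkm_translatome / rpkm_transcriptome


def replicate_passes(rep):
    """Check one replicate against the three thresholds (assumed lower bounds)."""
    return (rep["rpkm_translatome"] >= 1.0
            and rep["coverage"] >= 0.5
            and rep["rcv"] >= 0.25)


def is_translated(conditions):
    """conditions maps a growth condition to its list of replicate dicts.

    An ORF counts as translated if, for at least one condition, both
    biological replicates pass all thresholds.
    """
    return any(len(reps) >= 2 and all(replicate_passes(r) for r in reps)
               for reps in conditions.values())
```

+ For example, an ORF of 100 bp with 50 mapped reads in a library of 5 million mapped reads has an RPKM of 100.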
+ Reading frame determination
+ Adapter removal and quality trimming were performed using AdapterRemoval v2.1.7 [ 83 ] and non-rRNA reads longer than 18 bp were extracted using sortMeRNA v2.0 [ 84 ] . 
+ Extracted reads were mapped to previously annotated genes , novel genes with annotated homologs and novel genes without annotated homologs , in Escherichia coli O157 : H7 Sakai using Vsearch v2.1.2 [ 85 ] . 
+ The reading frame of the 5 ' end of each mapped read of length 20 bp ( maximum of read length distribution ) was determined using a custom script ( S1 File ) , which counts the number of 5 ' ends for the three codon positions and sums the values for the three gene groups ( annotated genes , novel genes with annotated homologs , and novel genes without annotated 
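+ The counting step can be sketched as follows. This is not the custom script from S1 File; the function name frame_counts and the input format (offset of the read 5 ' end relative to the first base of the start codon, plus read length) are assumptions for illustration.

```python
def frame_counts(reads, read_length=20):
    """Count read 5' ends per codon position (positions 1, 2, 3 -> index 0, 1, 2).

    reads: iterable of (offset, length) tuples, where offset is the 0-based
    position of the read's 5' end relative to the first base of the start
    codon of the gene it maps to. Only reads of the given length are counted.
    """
    counts = [0, 0, 0]
    for offset, length in reads:
        if length == read_length:
            counts[offset % 3] += 1  # Python's % keeps negative offsets in frame
    return counts


def sum_signal(genes):
    """Sum the per-gene counts over a whole gene group."""
    total = [0, 0, 0]
    for reads in genes:
        for pos, n in enumerate(frame_counts(reads)):
            total[pos] += n
    return total
```

+ An excess of 5 ' ends at one codon position in the summed signal is what the text interprets as evidence of codon-wise ribosome progression.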
+ Differential gene expression
+ The condition ` BHI at 37 ˚C ' was used as the reference data set and for the LB and BHI COS conditions significant changes on transcriptional and translational level were determined . 
+ Read counts were normalized to the smallest library and differential expression was analyzed by an exact test implemented in the Bioconductor package edgeR ( version 3.2.4 ) [ 44 ] . 
+ A p-value ≤ 0.05 and a false discovery rate ( FDR ) ≤ 0.1 were used to delineate significant expression changes . 
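+ The normalization and the significance filter can be sketched as below. The exact test itself is provided by edgeR and is not reproduced here. Reading "normalized to the smallest library" as scaling every library to the total count of the smallest one is an assumption, as are the function names and the treatment of both cutoffs as upper bounds.

```python
def normalize_to_smallest(libraries):
    """Scale each library so that its total equals the smallest library's total.

    libraries maps a library name to a dict of gene -> raw read count.
    Assumes every library contains at least one read.
    """
    totals = {name: sum(counts.values()) for name, counts in libraries.items()}
    smallest = min(totals.values())
    return {
        name: {gene: c * smallest / totals[name] for gene, c in counts.items()}
        for name, counts in libraries.items()
    }


def is_significant(p_value, fdr, p_cutoff=0.05, fdr_cutoff=0.1):
    """Apply both cutoffs; p_value and fdr would come from the edgeR output."""
    return p_value <= p_cutoff and fdr <= fdr_cutoff
```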
+ Prediction of σ70 promoters
+ The region 300 bp upstream of the start codon was searched for the presence and strength of a σ70 promoter with the program BPROM ( Softberry [ 86 ] ) . 
+ It searches for the -35 and -10 consensus motif and recognition sequences for transcription factors . 
+ With this data , an LDF score ( linear discriminant function ) is calculated , whereupon increasing values indicate growing promoter strength . 
+ An LDF score of 0.2 gives the threshold for promoter prediction with 80 % accuracy and specificity . 
+ The region 300 bp downstream of the stop codon was searched for the presence and strength of a ρ-independent terminator using FindTerm ( Softberry [ 86 ] ) . 
+ This program searches for thymidine-rich regions and calculates the energy of possible terminator structures . 
+ Low energy values indicate strong terminators . 
+ Prediction of Shine-Dalgarno sequence
+ The region 30 bp upstream of the start codon was examined for the presence of a Shine-Dalgarno sequence ( optimum uaAGGAGGu ) . 
+ ΔG˚ was calculated according to Ma et al. [ 51 ] with 
+ Calculation of kA/kS
+ The most distantly related homologs of the short annotated genes and the novel genes were determined with tblastn by selecting the hit with the highest e-value which still has 100 % coverage and no gaps . 
+ In case the sequence pairs were too similar , meaningful kA/kS calculation was not possible . 
+ The ratio of non-synonymous to synonymous substitutions ( kA/kS ) between those gene pairs was computed using the KaKs_Calculator 2.0 [ 87 ] . 
+ The `` bacterial and plant plastid code '' was selected and the method model selection ( MS ) was used . 
+ The ORF is assumed to be under positive selection when kA/kS is significantly greater than 1 and under purifying selection when kA/kS is significantly less than 1 . 
+ Significance was determined using a Holm-Bonferroni multiple comparisons procedure with a family-wise error rate of 0.05 . 
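+ For readers unfamiliar with the Holm-Bonferroni procedure , a minimal sketch of the standard step-down method is given here. The function name is hypothetical and the code is not taken from the authors' analysis.

```python
def holm_bonferroni(p_values, alpha=0.05):
    """Return a list of booleans: True where the null hypothesis is rejected.

    Step-down procedure: sort the p-values in ascending order and compare the
    k-th smallest (k = 0, 1, ...) with alpha / (m - k); stop at the first
    p-value that fails, and reject nothing from there on.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, idx in enumerate(order):
        if p_values[idx] <= alpha / (m - rank):
            reject[idx] = True
        else:
            break
    return reject
```

+ With p-values 0.01 , 0.04 , 0.03 and 0.005 , the two smallest pass their thresholds ( 0.0125 and about 0.0167 ) , while 0.03 exceeds 0.025 , so testing stops there.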
+ A Fisher 's Exact Test was performed in R version 3.3.2 . 
+ Unless otherwise noted , all p-values refer 
+ Detection of annotated homologs
+ Novel gene sequences were translated into the corresponding proteins sequences , which were used to query the GenBank database using blastp with default parameters [ 88 ] . 
+ An e-value cutoff of 10^-3 was applied . 
+ Sequence conservation
+ Sequences of the novel genes were aligned against the full RefSeq genomic database downloaded on 5 April 2017 , using a tblastn search in the local BLAST utilities 2.6.0 + from the 
+ NCBI [ 89 ] with a maximum e-value of 0.001 . 
+ The putative homologues were extracted from the database and those without stop codons were retained as ` intact ' . 
+ The amino acid similarity of each intact subject sequence with the query ORF was calculated using the Needleman-Wunsch algorithm `` Needleall '' from EMBOSS [ 90 ] . 
+ The Achromobacter sp . 
+ ATCC35328 sequences with names beginning NZ_CYUC010 were removed from the analysis , due to abnormally high similarity with E. coli for a very large number of genes . 
+ Thus , we assumed this species to be mislabeled . 
+ To map the results gained using NCBI databases to the SILVA taxonomy , hits were conflated to genus level , which allowed inclusion of over 90 % of genera with hits in each case . 
+ To obtain approximate relative evolutionary distances , the average distance from EHEC Sakai to the last common ancestor with each genus was calculated from the 16S rRNA SILVA reference NR99 guide tree [ 91 ] , release 128 , using Newick Utilities [ 92 ] . 
+ A custom shell script for these tasks , ORFage , was used ( S2 File ) . 
+ A similar pipeline was used to check the conservation of intergenic sequences upstream and downstream of the novel genes . 
+ For the upstream regions , the sequence between the stop codon of the nearest annotated gene upstream and the start codon of the novel gene was taken . 
+ Similarly , for downstream regions , the sequence between the stop codon of the novel gene and the start codon of the next annotated gene downstream was taken . 
+ Some of the regions were too short to obtain ( meaningful ) tblastn hits and were excluded . 
+ Further regions were excluded , when containing another of the novel genes before an annotated gene was reached . 
+ One downstream sequence was abnormally long and could not be processed ( tblastn search > 1 day ) , hence , this region was also excluded . 
+ Within the upstream and downstream sequences , stop codons were allowed . 
+ The shell script used for preparation of the intergenic sequences including the use of ENTREZ DIRECT [ 93 ] is included in S3 File . 
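+ The region definitions and one of the exclusion rules can be sketched in a few lines. This is not the shell script from S3 File: the function name, the half-open coordinate convention, and the simplification that ignores strand ( it uses the nearest annotated gene boundary on either side ) are assumptions for illustration.

```python
def flanking_regions(novel, annotated, other_novel=()):
    """Return (upstream, downstream) intergenic intervals around a novel gene.

    All genes are (start, end) half-open intervals on one coordinate axis.
    A region is None when no annotated neighbour exists, when the region is
    empty, or when another novel gene lies inside it (the exclusion rule
    described in the text).
    """
    ups = [g for g in annotated if g[1] <= novel[0]]
    downs = [g for g in annotated if g[0] >= novel[1]]
    upstream = (max(g[1] for g in ups), novel[0]) if ups else None
    downstream = (novel[1], min(g[0] for g in downs)) if downs else None

    def clean(region):
        if region is None or region[0] >= region[1]:
            return None
        # Drop regions that overlap another novel gene.
        if any(o[0] < region[1] and o[1] > region[0] for o in other_novel):
            return None
        return region

    return clean(upstream), clean(downstream)
```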
+ Predicted protein characteristics
+ The amino acid sequences encoded in the 250 short annotated genes and the 465 novel genes were submitted to PredictProtein [ 46 ] using default parameters . 
+ This software predicts structural and functional features of the putative proteins . 
+ The results of PROFphd ( secondary structure ) [ 94 ] , TMSEG ( transmembrane helices ) [ 95 ] , DISULFIND ( disulfide bonds ) [ 96 ] , 
+ UCON ( disordered regions ) [ 47 ] and LocTree3 ( subcellular localization ) [ 97 ] were analyzed in further detail . 
+ Machine learning based protein recognition
+ A machine-learning algorithm , as described by Neuhaus et al. [ 11 ] , was used to classify the novel proteins based on predicted protein parameters . 
+ Briefly , about 279 short annotated proteins were picked from EHEC EDL933 and these sequences shuffled 100-times . 
+ All sequences , natural and shuffled , were submitted to a PredictProtein analysis [ 45 , 46 ] . 
+ The machine-learning algorithm was trained using the predicted parameters for the annotated proteins ( positive control ) and their shuffled counterparts ( negative control ) . 
+ Both strains , EDL933 and Sakai are very closely related to each other [ 98 ] and , thus , the trained algorithm was used here , as well . 
+ We not only examined the protein sequences of the novel genes in Sakai , but also shuffled 
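+ The construction of shuffled negative controls can be illustrated as follows. The sketch only shows the shuffling step, assuming a uniform random permutation of residues; the function name and the fixed seed are hypothetical, and the PredictProtein analysis and classifier training are not reproduced.

```python
import random


def shuffled_decoys(protein, n=100, seed=0):
    """Return n shuffled copies of a protein sequence.

    Each decoy keeps the amino acid composition and length of the original
    but randomizes the residue order. A fixed seed makes the output
    reproducible.
    """
    rng = random.Random(seed)
    decoys = []
    for _ in range(n):
        residues = list(protein)
        rng.shuffle(residues)
        decoys.append("".join(residues))
    return decoys
```

+ Because composition is preserved , a classifier trained on such pairs has to rely on order-dependent features such as predicted secondary structure rather than on amino acid frequencies alone.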
+ Visualization of the gene’s localization was created using Circos [99].
+ Supporting information
+ S1 Fig . 
+ Distribution of RCV for the short annotated genes , novel genes with and without annotated homologs . 
+ ( A ) RCV distribution at BHI control . 
+ ( B ) RCV distribution at BHI COS . 
+ S2 Fig . 
+ Conservation of intergenic sequences . 
+ A similar process as used for Fig 5 was repeated on unannotated sequences upstream and downstream of the novel genes , but without removing sequences with stop codons . 
+ Many of the sequences had no tblastn hits ( too short ) and some others were excluded as more than one novel gene was situated between two annotated genes ; one was excluded as abnormally long . 
+ Thus , 136 sequences remained for upstream and 122 for downstream . 
+ Most homologs have low similarity . 
+ The custom shell script used is provided in S3 File . 
+ ( A ) Analysis of the sequences upstream of the novel genes without annotated homologs . 
+ ( B ) Analysis of the sequences downstream of the novel genes without annotated 
+ The average evolutionary distance to the tblastn hits ( of at least 80 % similarity ) of the novel proteins without annotated homologs ( blastp ) is 0.643 . 
+ Average distance for their downstream sequences is 0.535 , which is significantly lower ( p = 0.0024 , two-tailed t-test ) . 
+ Average in evolutionary distance for upstream regions is 0.596 , not significantly different compared to distances for genes ( p = 0.1421 ) . 
+ The upstream region may be more conserved ( e.g. , due to reg ¬ 
+ S1 Table . 
+ Summary of NGS results . 
+ The total number of reads , the number of reads mapping to the E. coli O157 : H7 Sakai genome and the distribution of mapped reads to rRNA , tRNA and mRNA are shown . 
+ Only the reads mapping to mRNA were used for further analysis . 
+ Every library contains between 1.5 and 9.7 million mRNA reads . 
+ S2 Table . 
+ RNAseq and RIBOseq results of three different growth conditions for the 465 novel genes and the 250 short annotated genes . 
+ The novel genes are consecutively numbered after their appearance in the EHEC Sakai genome . 
+ The RPKM transcriptome , RPKM translatome , RCV , and coverage values represent mean values of the two biological replicates . 
+ S3 Table . 
+ Properties of the novel genes . 
+ Annotated homologs in other strains/species were searched using blastp . 
+ Only the best hit is listed . 
+ The fourth column illustrates annotated homologs in other E. coli O157 : H7 strains or duplications of annotated genes in EHEC Sakai . 
+ With bioinformatics methods the presence of a σ70 promoter , a ρ-independent terminator , a 
+ Shine-Dalgarno sequence , and selection pressure ( kA/kS ) were predicted or estimated . 
+ The last column gives the classification of the putative novel protein by the machine-learning algorithm trained with short annotated E. coli O157 : H7 EDL933 genes . 
+ S4 Table . 
+ Properties of the 250 short annotated genes . 
+ With bioinformatics methods the presence of a σ70 promoter , a ρ-independent terminator , a Shine-Dalgarno sequence and selection pressure ( kA/kS ) were predicted or estimated . 
+ The last column gives the classification of the short genes by the machine-learning algorithm . 
+ S5 Table . 
+ Conservation of the novel genes . 
+ Summary of ORF conservation as represented in Fig 5 . 
+ S6 Table . 
+ Significant transcriptional and translational regulation in LB compared to BHI control of the novel genes and the short annotated genes . 
+ The mean value of the two biological replicates of transcriptome and translatome counts of the BHI control and the LB condition are shown . 
+ The log-fold change was calculated and differential gene expression was determined using edgeR . 
+ Transcriptional or translational changes are considered significant , when they show a p-value of 0.05 and a false discovery rate ( FDR ) of 0.1 . 
+ Significant changes in 
+ LB compared to BHI control are highlighted in gray . 
+ Only genes with significant changes on transcriptional and/or translational level are listed . 
+ S7 Table . 
+ Transcriptional and translational regulation at BHI COS compared to BHI control of the novel genes and the short annotated genes . 
+ The mean value of the two biological replicates of transcriptome and translatome counts of the BHI control and the stress condition COS are shown . 
+ The log-fold change was calculated and differential gene expression was determined using the software edgeR . 
+ Transcriptional or translational changes are considered significant , when they show a p-value of 0.05 and an FDR of 0.1 . 
+ Significant changes in BHI 
+ COS compared to control are highlighted in gray . 
+ Only genes with significant changes on transcriptional and/or translational level are listed . 
+ S8 Table . 
+ Summary of the Predict Protein results for the putative proteins encoded by the novel genes . 
+ The first columns show the AA composition , followed by predicted cellular localization , number of transmembrane helices , disulfide bonds and binding motives . 
+ Additionally , secondary structures , disordered regions and domains are predicted . 
+ S9 Table . 
+ Summary of the Predict Protein results for the short annotated genes . 
+ The first columns show the AA composition , followed by predicted cellular localization , number of transmembrane helices , disulfide bonds and binding motives . 
+ Additionally , secondary structures , disordered regions and domains are predicted . 
+ S10 Table . 
+ Classification into ` real ' and ` pseudo ' proteins by the machine-learning algorithm . 
+ The upper part of the table shows the results for the novel genes and the lower part for 
+ S1 File . 
+ Custom script used for reading frame determination in the sum signal of gene groups . 
+ S2 File. Custom script used for detecting sequence conservation. (BASH)
+ S3 File . 
+ Custom script used for extracting intergenic sequences -- for comparative conservation analysis . 
+ Acknowledgments
+ We thank Svenja Simon for providing her R scripts for RPKM value and coverage calculation . 
+ Conceptualization: Sarah M. Hücker, Siegfried Scherer, Klaus Neuhaus.
+ Data curation: Sarah M. Hücker, Zachary Ardern, Tatyana Goldberg.
+ Formal analysis : Sarah M. Hücker , Zachary Ardern , Tatyana Goldberg , Andrea Schafferhans , Gisle Vestergaard , Chase W. Nelson , Klaus Neuhaus . 
+ Funding acquisition: Siegfried Scherer.
+ Investigation : Sarah M. Hücker , Zachary Ardern , Andrea Schafferhans , Michael Bernhofer , Gisle Vestergaard , Chase W. Nelson , Klaus Neuhaus . 
+ Methodology : Sarah M. Hücker , Tatyana Goldberg , Michael Bernhofer , Chase W. Nelson , Klaus Neuhaus . 
+ Project administration: Siegfried Scherer, Klaus Neuhaus.
+ Resources : Andrea Schafferhans , Michael Bernhofer , Michael Schloter , Burkhard Rost . 
+ Software : Zachary Ardern , Tatyana Goldberg , Andrea Schafferhans , Gisle Vestergaard , Michael Schloter , Burkhard Rost . 
+ Visualization: Sarah M. Hücker, Zachary Ardern.
+ Writing – original draft: Sarah M. Hücker.
+ Writing -- review & editing : Zachary Ardern , Chase W. Nelson , Siegfried Scherer , Klaus Neuhaus . 
+ 1.
+ 2 . 
+ Lim JY , Yoon J , Hovde CJ . 
+ A brief overview of Escherichia coli O157 : H7 and its plasmid O157 . 
+ J Microbiol Biotechnol . 
+ 2010 ; 20 ( 1 ) :5 -- 14 . 
+ PMID : 20134227 ; PubMed Central PMCID : PMC3645889 . 
+ 14 . 
+ Boekhorst J , Wilson G , Siezen RJ . 
+ Searching in microbial genomes for encoded small proteins . 
+ Microb Biotechnol . 
+ 2011 ; 4 ( 3 ) :308 -- 13 . 
+ https://doi.org/10.1111/j.1751-7915.2011.00261.x PMID : 21518296 . 
+ 15 . 
+ Storz G , Wolf YI , Ramamurthi KS . 
+ Small proteins can no longer be ignored . 
+ Annu Rev Biochem . 
+ 2014 ; 83:753 -- 77 . 
+ https://doi.org/10.1146/annurev-biochem-070611-102400 PMID : 24606146 . 
+ 18 . 
+ Kemp G , Cymer F. Small membrane proteins -- elucidating the function of the needle in the haystack . 
+ Biol Chem . 
+ 2014 ; 395 ( 12 ) :1365 -- 77 . 
+ https://doi.org/10.1515/hsz-2014-0213 PMID : 25153378 
+ 25.
+ 26.
+ 27 . 
+ Dornenburg JE , Devita AM , Palumbo MJ , Wade JT . 
+ Widespread Antisense Transcription in Escherichia coli . 
+ mBio . 
+ 2010 ; 1 ( 1 ) . 
+ https://doi.org/10.1128/mBio.00024-10 PMID : 20689751 . 
+ 29 . 
+ Wade JT , Grainger DC . 
+ Pervasive transcription : illuminating the dark matter of bacterial transcriptomes . 
+ Nat Rev Microbiol . 
+ 2014 ; 12 ( 9 ) :647 -- 53 . 
+ https://doi.org/10.1038/nrmicro3316 PMID : 25069631 . 
+ 41 . 
+ Rutherford K , Parkhill J , Crook J , Horsnell T , Rice P , Rajandream MA , et al. . 
+ Artemis : sequence visualization and annotation . 
+ Bioinformatics . 
+ 2000 ; 16 ( 10 ) :944 -- 5 . 
+ PMID : 11120685 . 
+ 45 . 
+ Rost B , Yachdav G , Liu J . 
+ The predictprotein server . 
+ Nucleic Acids Res . 
+ 2004 ; 32 ( suppl 2 ) : W321 -- W6 . 
+ 48 . 
+ Browning DF , Busby SJ . 
+ The regulation of bacterial transcription initiation . 
+ Nat Rev Microbiol . 
+ 2004 ; 2 ( 1 ) :57 -- 65 . 
+ https://doi.org/10.1038/nrmicro787 PMID : 15035009 . 
+ 49 . 
+ Wilson KS , von Hippel PH. Transcription termination at intrinsic terminators : the role of the RNA hairpin . 
+ Proc Natl Acad Sci U S A. 1995 ; 92 ( 19 ) :8793 -- 7 . 
+ PMID : 7568019 ; PubMed Central PMCID : PMC41053 . 
+ 52 . 
+ Hughes AL. . 
+ Adaptive Evolution of Genes and Genomes . 
+ Oxford University Press , New York . 
+ 1999 . 
+ 65 . 
+ Gerashchenko MV , Gladyshev VN . 
+ Ribonuclease selection for ribosome profiling . 
+ Nucleic Acids Res . 
+ 2017 ; 45 ( 2 ) : e6 . 
+ https://doi.org/10.1093/nar/gkw822 PMID : 27638886 . 
+ 69 . 
+ Banerjee S , Chalissery J , Bandey I , Sen R. Rho-dependent transcription termination : more questions than answers . 
+ J Microbiol . 
+ 2006 ; 44 ( 1 ) :11 -- 22 . 
+ Epub 2006/03/24 . 
+ 2342 [ pii ] . 
+ PMID : 16554712 . 
+ 70 . 
+ Zheng X , Hu G-Q , She Z-S , Zhu H. Leaderless genes in bacteria : clue to the evolution of translation initiation mechanisms in prokaryotes . 
+ BMC Genomics . 
+ 2011 ; 12 ( 1 ) :361 . 
+ 72 . 
+ Levitt M. Nature of the protein universe . 
+ Proc Natl Acad Sci U S A. 2009 ; 106 ( 27 ) :11079 -- 84 . 
+ https://doi.org/10.1073/pnas.0905029106 PMID : 19541617 ; PubMed Central PMCID : PMC2698892 . 
+ 74 . 
+ Tatarinova TV , Lysnyansky I , Nikolsky YV , Bolshoy A . 
+ The mysterious orphans of Mycoplasmataceae . 
+ Biol Direct . 
+ 2016 ; 11 ( 1 ) :1 . 
+ 78 . 
+ Zur H , Aviner R , Tuller T. Complementary Post Transcriptional Regulatory Information is Detected by PUNCH-P and Ribosome Profiling . 
+ Sci Rep. 2016 ; 6 . 
+ Solovyev VV, Tatarinova TV. Towards the integration of genomics, epidemiological and clinical data. Genome Med. 2011; 3(7):48. https://doi.org/10.1186/gm264 PMID: 21867574; PubMed Central PMCID: PMC3221549.
+ Wang D, Zhang Y, Zhang Z, Zhu J, Yu J. KaKs_Calculator 2.0: a toolkit incorporating gamma-series methods and sliding window strategies. Genomics Proteomics Bioinformatics. 2010; 8(1):77–80. https://doi.org/10.1016/S1672-0229(10)60008-3 PMID: 20451164
+ 88 . 
+ Altschul SF , Gish W , Miller W , Myers EW , Lipman DJ . 
+ Basic local alignment search tool . 
+ J Mol Biol . 
+ 1990 ; 215 ( 3 ) :403 -- 10 . 
+ https://doi.org/10.1016/S0022-2836 ( 05 ) 80360-2 PMID : 2231712 
+ 90 . 
+ Rice P , Longden I , Bleasby A. EMBOSS : the European Molecular Biology Open Software Suite . 
+ Trends Genet . 
+ 2000 ; 16 ( 6 ) :276 -- 7 . 
+ Epub 2000/05/29 . 
+ PMID : 10827456 . 
+ 93 . 
+ Kans J. Entrez Direct : E-utilities on the UNIX Command Line : Bethesda ( MD ) : National Center for Biotechnology Information ; 2013 . 
+ 94 . 
+ Rost B , Sander C. Combining evolutionary information and neural networks to predict protein secondary structure . 
+ Proteins . 
+ 1994 ; 19 ( 1 ) :55 -- 72 . 
+ https://doi.org/10.1002/prot.340190108 PMID : 8066087 . 
+ 96 . 
+ Ceroni A , Passerini A , Vullo A , Frasconi P. DISULFIND : a disulfide bonding state and cysteine connectivity prediction server . 
+ Nucleic Acids Res . 
+ 2006 ; 34 ( suppl 2 ) : W177 -- W81 .
\ No newline at end of file
--- a/data/TEXT_FILES/useful_txt/28911122.txt 0 → 100644
View file @27818a9
+++ b/data/TEXT_FILES/useful_txt/28911122.txt 0 → 100644
View file @27818a9
+ Data exploration, quality control and statistical
+ 1Department of Statistics , University of Wisconsin-Madison , Madison , WI 53706 , USA , 2Department of Public Health Sciences , Medical University of South Carolina , SC 29425 , USA , 3Great Lakes Bioenergy Research Center , University of Wisconsin-Madison , Madison , WI 53726 , USA , 4Department of Biochemistry , University of Wisconsin-Madison , Madison , WI 53706 , USA , 5Department of Bacteriology , University of Wisconsin-Madison , Madison , WI 53706 , USA and 6Department of Biostatistics and Medical Informatics , University of Wisconsin-Madison , Madison , WI 53792 , USA 
+ Received April 13, 2017; Revised June 02, 2017; Editorial Decision June 27, 2017; Accepted July 12, 2017
+ ABSTRACT
+ ChIP-exo/nexus experiments rely on innovative modiﬁcations of the commonly used ChIP-seq protocol for high resolution mapping of transcription factor binding sites . 
+ Although many aspects of the ChIP-exo data analysis are similar to those of ChIP-seq , these high throughput experiments pose a number of unique quality control and analysis challenges . 
+ We develop a novel statistical quality control pipeline and accompanying R/Bioconductor package , ChIPexoQual , to enable exploration and analysis of ChIP-exo and related experiments . 
+ ChIPexoQual evaluates a number of key issues including strand imbalance , library complexity , and signal enrichment of data . 
+ Assessment of these features is facilitated through diagnostic plots and summary statistics computed over regions of the genome with varying levels of coverage . 
+ We evaluated our QC pipeline with both large collections of public ChIP-exo/nexus data and multiple , new ChIP-exo datasets from Escherichia coli . 
+ ChIPexoQual analysis of these datasets resulted in guidelines for using these QC metrics across a wide range of sequencing depths and provided further insights for modelling ChIP-exo data . 
+ INTRODUCTION
+ Chromatin Immunoprecipitation followed by exonuclease digestion and next generation sequencing ( ChIP-exo ) is currently one of the state-of-the-art high throughput assays for profiling protein-DNA interactions at or close to single base-pair resolution ( 1 ) . 
+ It presents a powerful alternative to the popular ChIP-seq ( chromatin immunoprecipitation coupled with next generation sequencing ) assay . 
+ ChIP-exo experiments first capture millions of DNA fragments ( 150 -- 250 bps in length ) that the protein under study interacts with , using a protein-specific antibody and random fragmentation of DNA . 
+ Then , λ-exonuclease ( λ-exo ) is deployed to trim the 5 ′ end of each DNA fragment to each protein-DNA interaction boundary . 
+ This step is unique to ChIP-exo and aims to achieve significantly higher spatial resolution compared to ChIP-seq . 
+ Finally , high throughput sequencing of a small region ( 36 -- 100 bps ) at the 5 ′ end of each fragment generates millions of reads . 
+ Similarly , ChIP-nexus ( Chromatin Immunoprecipitation followed by exonuclease digestion , unique barcode , single ligation and next generation sequencing ) ( 2 ) is a further modification of the ChIP-exo protocol . 
+ ChIP-nexus aims to overcome limitations of ChIP-exo by yielding high complexity libraries with numbers of cells comparable to that of ChIP-seq experiments . 
+ This is achieved by reducing the numbers of ligations in the standard ChIP-exo protocol from two to one , and adding unique , randomized barcodes to adaptors to enable monitoring of overamplification . 
+ In addition to these , several other high-resolution protocols have also been considered . 
+ In X-ChIP and ORGANIC ( 3,4 ) , the DNA is fragmented by the application of endonuclease and exonuclease enzymes and then stabilized by sonication . 
+ The main difference between these two protocols is that in X-ChIP , the cells are crosslinked with formaldehyde and then the DNA is extracted by cell lysation , while the ORGANIC protocol achieves this step by nuclear isolation . 
+ Currently , ChIP-exo seems to be the more commonly adopted high-resolution protocol . 
+ Figure 1A illustrates the differences between distinct ChIP-based protocols : ChIP-exo , ChIP-nexus , single-end ( SE ) ChIP-seq , paired-end ( PE ) ChIP-seq . 
+ The 5 ′ ends from a ChIP-exo/nexus experiment are clustered more tightly around the binding sites of the protein than in a ChIP-seq experiment . 
+ In a PE ChIP-seq experiment , both ends are sequenced as opposed to only the 5 ′ end in a SE ChIP-seq . 
+ Although ChIP-exo/nexus protocols are being adopted by the research community , features of ChIP-exo data , especially those pertaining to data quality , have not been investigated . 
+ First , DNA libraries generated by the ChIP-exo protocol are expected to be less complex than the libraries generated by ChIP-seq ( 5 ) because digestion by λ-exo aims to reduce the number of individual genomic positions , to which sequencing reads can map , to small regions located around the actual binding sites . 
+ Therefore , in high quality and deeply sequenced ChIP-exo datasets , it is possible to observe large numbers of reads accumulating at a small number of bases due to actual signal rather than overamplification bias as commonly observed in ChIP-seq experiments . 
+ Second , although we expect approximately the same numbers of reads from both DNA strands at a given binding site , there may be locally more reads in one strand than in the other , owing to λ-exo efficiency , ligation efficiency , or other factors . 
+ This is an important point with implications on the statistical analysis of ChIP-exo data . 
+ Specifically , currently available ChIP-exo specific statistical analysis methods ( e.g. MACE ( 6 ) , CexoR ( 7 ) and Peakzilla ( 8 ) ) rely on the existence of peak-pairs formed by forward and reverse strand reads at the binding site . 
+ Finally , most of the currently used ChIP-seq quality control ( QC ) guidelines ( 9 -- 11 ) may not be directly applicable to ChIP-exo data . 
+ To address these challenges , we develop a suite of diagnostic plots and summary statistics and implement them in a versatile R/Bioconductor package named ChIPexoQual . 
+ The overall pipeline takes into account the characteristics of ChIP-exo/nexus data and addresses the critical shortcomings of the currently available QC pipelines that are not particularly tailored for ChIP-exo/nexus data ( 9 -- 10,12 -- 13 ) . 
+ We apply this pipeline to a large collection of public and newly generated ChIP-exo/nexus data and we validate the QC pipeline by evaluating the samples for features that capture high signal to noise , such as occurrences of motifs recognized by the profiled DNA interacting protein and also utilize blacklisted regions as identified by the ENCODE consortium . 
+ MATERIALS AND METHODS
+ ChIP-seq/exo/nexus datasets
+ E. coli ChIP-exo and ChIP-seq samples . 
+ For simplicity , we introduce some abbreviations for the Escherichia coli σ70 ChIP-exo ( E ) , PE ChIP-seq ( P ) , and SE ChIP-seq ( S ) samples . 
+ We denote the data generated in the first ( second ) batch as E1 ( E2 ) , P1 ( P2 ) and S1 ( S2 ) . 
+ Summaries of the growth conditions and sample IDs for the ChIP-exo samples are included in Table 1 . 
+ The SE and PE ChIP-seq samples generated under the same conditions share the same ID convention . 
+ The procedures for sample preparation and sequencing are described in the supplement . 
+ The ChIP-exo experiments followed the protocol 7 described in ( 1 ) . 
+ Processing of the ChIP-exo and ChIP-nexus samples . 
+ We aligned the ChIP-exo/nexus samples in Table 2 by following the descriptions listed in their respective publications . 
+ When the alignment settings were not discernible in the original publication , we used bowtie ( version 1.1.2 ) ( 14 ) . 
+ We aligned the E1 samples of Table 1 with bowtie -q -m 1 -l 55 -k 1 -5 3 -3 40 --best -S and the E2 samples using bowtie -q -m 1 -v 2 --best . 
+ The average read lengths were 102 and 52 bp for the E1 and E2 samples , respectively . 
+ Hence , to make the alignments for both samples comparable , we trimmed 40 bp from the 3 ′ ends of the reads in the E1 samples . 
+ We trimmed 3 bp from the 5 ′ end to remove the adaptors in the E1 samples . 
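+ The two alignment settings can be written down as argument lists, which makes the per-batch differences explicit. This is only a sketch: the flags are the ones quoted in the text, while the function name `bowtie_args` and the index, read and output file names are hypothetical placeholders. The lists are built but not executed.

```python
def bowtie_args(index, reads, output, batch):
    """Build a bowtie (version 1) argument list for an E1 or E2 sample.

    index, reads and output are placeholder paths supplied by the caller.
    """
    common = ["bowtie", "-q", "-m", "1"]  # FASTQ input, drop multi-mapping reads
    if batch == "E1":
        # 55 bp seed, one alignment per read, trim 3 bp (5' end) and 40 bp (3' end),
        # best-stratum alignments, SAM output
        options = ["-l", "55", "-k", "1", "-5", "3", "-3", "40", "--best", "-S"]
    elif batch == "E2":
        # end-to-end alignment allowing at most 2 mismatches
        options = ["-v", "2", "--best"]
    else:
        raise ValueError("batch must be 'E1' or 'E2'")
    return common + options + [index, reads, output]
```

+ A list of this form could be handed to `subprocess.run` once bowtie and an index are available.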
+ ChIP-exo and ChIP-seq peak calling with MOSAiCS to identify high signal peaks
+ MOSAiCS ( 15 ) is a model-based approach for the analysis of ChIP-seq and ChIP-exo data . 
+ We used MOSAiCS to identify sets of highly significant peaks for ChIP-exo and ChIP-seq under the GC + Mappability and InputOnly modes for background estimation , respectively . 
+ Subsequently , we called peaks with a 5 % FDR and a threshold of at least 100 extended fragments . 
+ Generation of a set of high signal regions from E. coli samples to assess strand imbalance
+ We partitioned the E. coli genome into non-overlapping intervals of length 150 bp and counted the number of reads overlapping each interval . 
+ As is usually the practice with ChIP-seq analysis , each read was extended to the average fragment length of 150 bp toward the 3 ′ direction . 
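+ The binning step above can be sketched as follows. The sketch assumes each read is given by the 0-based position of its 5′ end and its strand, treats the genome as linear (the circularity of the E. coli chromosome is ignored), and uses the hypothetical function name `bin_counts`.

```python
def bin_counts(five_prime_ends, genome_length, bin_width=150, fragment_length=150):
    """Count extended reads overlapping each non-overlapping bin.

    five_prime_ends: iterable of (position, strand), position 0-based,
    strand "+" or "-". Each read is extended to fragment_length toward
    its 3' direction before counting.
    """
    n_bins = -(-genome_length // bin_width)  # ceiling division
    counts = [0] * n_bins
    for position, strand in five_prime_ends:
        if strand == "+":
            start, end = position, position + fragment_length - 1
        else:
            start, end = position - fragment_length + 1, position
        # clip the extended fragment to the genome boundaries
        start = max(start, 0)
        end = min(end, genome_length - 1)
        for b in range(start // bin_width, end // bin_width + 1):
            counts[b] += 1
    return counts
```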
+ To evaluate the strand imbalance , we identified a set of high signal peaks for ChIP-exo and SE ChIP-seq . 
+ The subset of these peaks for which dPeak ( 16 ) analysis identified one or more binding events were used in FSR assessments ( Figure 1B and Supplementary Figure S1E ) . 
+ Existing next generation sequencing data QC metrics and methods
+ We used the ChIP-seq QC metric definitions established by the ENCODE consortium ( 10,11 ) , and described in detail at https://genome.ucsc.edu/ENCODE/qualityMetrics.html . 
+ These QC metrics were calculated with the ChIPUtils package ( version 0.99.0 from https://github.com/keleslab/ChIPUtils ) . 
+ Empirical data from the ENCODE project suggests the following guidelines for interpretation of the QC metrics for human and mouse genomes : a PBC value between 0 -- 0.5 indicates severe bottlenecking , 0.5 -- 0.8 moderate bottlenecking , 0.8 -- 0.9 mild bottlenecking and 0.9 -- 1 no bottlenecking . 
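+ The PBC guideline above maps directly onto a small lookup. The text does not say which category a boundary value such as exactly 0.5 belongs to, so assigning boundaries to the upper category is an assumption of this sketch, as is the function name `pbc_category`.

```python
def pbc_category(pbc):
    """Map a PBC value to the ENCODE bottlenecking category quoted above.

    Boundary values (0.5, 0.8, 0.9) are assigned to the upper category;
    that choice is an assumption, not part of the guideline.
    """
    if not 0.0 <= pbc <= 1.0:
        raise ValueError("PBC must lie in [0, 1]")
    if pbc < 0.5:
        return "severe"
    if pbc < 0.8:
        return "moderate"
    if pbc < 0.9:
        return "mild"
    return "none"
```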
+ In addition to ENCODE QC metrics , we considered FASTQC ( version 0.11.5 ) and htSeqTools ( version 1.16.0 ) ( 9 ) for assessing the overall quality of the ChIP-exo/nexus sequences . 
+ Collectively , these encompass all the metrics available for read-level data in ChiLin ( 13 ) , which is another QC tool for ChIP-seq and DNase-seq , and Q-nexus ( 12 ) , which is a ChIP-nexus analysis pipeline with QC features that are similar to those of FASTQC . 
+ The remaining metrics calculated by the ChiLin pipeline require the use of a peak calling algorithm or external data ( such as DNase hypersensitive sites ) and , therefore , are not utilized in our evaluations . 
+ Blacklisted regions in eukaryotic genomes
+ For the mm9 , hg19 , and dm3 genomes , we used the blacklists generated by the ENCODE consortium ( 17 ) , available at https://sites.google.com/site/anshulkundaje/projects/blacklists . 
+ These lists consist of genomic segments for which next-generation sequencing experiments produce artificially high signal . 
+ These lists were empirically derived from large compendia of data generated by the ENCODE and modENCODE consortia , respectively . 
+ ChIP-exo quality control with R package ChIPexoQual
+ We implemented our proposed QC pipeline with an R/Bioconductor package named ChIPexoQual , available at http://bioconductor.org/packages/release/bioc/html/ChIPexoQual.html . 
+ The analysis in this paper used version 1.0.0 of the ChIPexoQual package . 
+ ChIPexoQual : The package takes a set of N aligned reads from a ChIP-exo ( or ChIP-nexus ) experiment as input and performs the following steps . 
+ 1 . Identify read islands , i.e. overlapping clusters of reads separated by gaps , from read coverage . The gaps are defined as the union of positions in the genome with fewer than h * ( default = 1 ) aligned reads . The remaining islands can be interpreted as the natural partition of the genome determined by a ChIP-exo/nexus experiment . 
+ 2 . Compute Di , the number of reads in island i ; Ui , the number of positions in island i with at least one aligning read ; and Wi , the width of island i defined as the total number of bases in the island , i = 1 , ... , I . 
+ 3 . For each island i , i = 1 , ... , I , compute the island statistics $\mathrm{ARC}_i = D_i / W_i$ , $\mathrm{URC}_i = U_i / D_i$ and $\mathrm{FSR}_i = (\text{number of forward strand reads aligning to island } i) / D_i$ . 
+ 4 . Generate diagnostic plots : ( i ) URC vs. ARC plot ; ( ii ) Region Composition plot ; ( iii ) FSR distribution plot . 
+ 5 . Randomly sample without replacement M ( at least 500 , default = 1000 ) islands and fit $D_i = \beta_1 U_i + \beta_2 W_i + \varepsilon_i$ , where $\varepsilon_i$ denotes the independent error term . Repeat this process B ( default = 1000 ) times and generate box plots of the estimated β1 and β2 . 
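+ Steps 1 to 3 of the list above can be sketched in a few lines. This is not the ChIPexoQual implementation: it handles a single chromosome, represents reads as (start, end, strand) tuples with 0-based inclusive coordinates, and takes the "positions with at least one aligning read" to be distinct 5′-end positions, which is an assumption. The function names are hypothetical.

```python
from collections import defaultdict


def find_islands(reads, h=1):
    """Return (start, end) of maximal runs of positions covered by >= h reads."""
    delta = defaultdict(int)
    for start, end, _ in reads:
        delta[start] += 1      # coverage rises at the read start
        delta[end + 1] -= 1    # and falls just past the read end
    islands, depth, island_start = [], 0, None
    for pos in sorted(delta):
        depth += delta[pos]
        if depth >= h and island_start is None:
            island_start = pos
        elif depth < h and island_start is not None:
            islands.append((island_start, pos - 1))
            island_start = None
    return islands


def island_statistics(reads, h=1):
    """Compute D, U, W, ARC, URC and FSR for every island."""
    stats = []
    for lo, hi in find_islands(reads, h):
        members = [r for r in reads if r[0] <= hi and r[1] >= lo]
        depth = len(members)
        # distinct 5' ends: read start on "+", read end on "-"
        unique = len({s if strand == "+" else e for s, e, strand in members})
        width = hi - lo + 1
        forward = sum(1 for _, _, strand in members if strand == "+")
        stats.append({"D": depth, "U": unique, "W": width,
                      "ARC": depth / width, "URC": unique / depth,
                      "FSR": forward / depth})
    return stats
```

+ With the default h = 1 every read lies entirely inside one island, so all 5′ ends counted in U fall within the island; for larger h a read may extend beyond the island boundaries.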
+ Interpretation of the linear model in the QC pipeline . 
+ The linear model 
+ Di = β1Ui + β2Wi + εi
+ is a re-parametrization of the following relationship from the URC vs. ARC diagnostic plot : $\mathrm{URC}_i = \kappa / \mathrm{ARC}_i + \gamma + \varepsilon_i$ , with β1 = 1/γ and β2 = −κ/γ . 
+ In this setting , γ can be considered as the large-depth URCi , i.e. the limiting ratio between the number of positions with at least one mapping read and depth as the depth tends to infinity . 
+ Equivalently , β1 = 1/γ can be interpreted as the average number of aligned reads per unique position when the sequencing depth is large . 
+ To interpret β2 = −κ/γ , we express κ as a function of ARC and URC and assume that γ is already estimated . 
+ Then , 
+ $\kappa / \gamma = \mathrm{ARC} \cdot \mathrm{URC} / \gamma - \mathrm{ARC} = U / (\gamma W) - \mathrm{ARC} \approx (U / W) \lim_{D \to \infty} D / U(D) - \mathrm{ARC}$ , 
+ where $U(D) / D$ approximates the URC as the sequencing depth increases . 
+ In a low quality experiment where reads accumulate in a small number of positions due to PCR amplification bias or other artifacts , several reads are expected to repeatedly align to the same collection of unique positions , making the term involving the limit diverge from ARC . 
+ In contrast , in a high-quality experiment , κ/γ is expected to converge to zero because the expression with the limit approximates ARC . 
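+ The repeated fit of the no-intercept model Di = β1Ui + β2Wi + εi on random subsets of islands can be sketched with the standard library alone. The closed-form solution of the 2 x 2 normal equations stands in for whatever fitting routine the package uses, and the function names and the (D, U, W) tuple layout are assumptions of this sketch.

```python
import random


def fit_no_intercept(islands):
    """Least-squares estimates (beta1, beta2) for D = beta1*U + beta2*W.

    islands: sequence of (D, U, W) tuples.
    """
    suu = sum(u * u for _, u, _ in islands)
    sww = sum(w * w for _, _, w in islands)
    suw = sum(u * w for _, u, w in islands)
    sud = sum(u * d for d, u, _ in islands)
    swd = sum(w * d for d, _, w in islands)
    det = suu * sww - suw * suw
    if det == 0:
        raise ValueError("U and W are collinear in this sample")
    # Cramer's rule on the normal equations
    beta1 = (sud * sww - suw * swd) / det
    beta2 = (suu * swd - suw * sud) / det
    return beta1, beta2


def resampled_scores(islands, m=1000, b=1000, seed=0):
    """Fit the model on b random samples of m islands drawn without replacement."""
    rng = random.Random(seed)
    m = min(m, len(islands))
    return [fit_no_intercept(rng.sample(islands, m)) for _ in range(b)]
```

+ The b pairs returned by `resampled_scores` are the values that would be summarized as box plots of the estimated β1 and β2.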
+ The ChIPexoQual pipeline is enriched by the following two additional modules that are utilized when the sequencing depth is high and/or blacklisted regions are available . 
+ i. Subsampling analysis . For high depth datasets ( e.g. , ≥ 60M reads for human and mouse samples ) , we subsample N1 < N2 < · · · < N reads , starting with N1 = 20M reads and up to 50M reads in 10M increments as default , and apply steps 1 to 5 for each of the subsampled datasets . 
+ ii . Blacklisted regions analysis . The islands identified by ChIPexoQual are separated into two different collections based on their overlap with a set of blacklisted regions . Then , the β1 and β2 scores are estimated for both collections and compared against the scores obtained from all islands . 
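+ Both modules reduce to simple bookkeeping before the scores are re-estimated. The sketch below lists the default subsampling depths and splits islands by blacklist overlap; the function names and the (chrom, start, end) interval layout with inclusive coordinates are assumptions.

```python
def subsample_depths(total_reads, first=20_000_000, last=50_000_000, step=10_000_000):
    """Default subsampling depths that are smaller than the full dataset."""
    return [n for n in range(first, last + 1, step) if n < total_reads]


def split_by_blacklist(islands, blacklist):
    """Separate islands into those overlapping a blacklisted region and the rest.

    islands, blacklist: lists of (chrom, start, end) with inclusive coordinates.
    """
    overlapping, clean = [], []
    for chrom, start, end in islands:
        hit = any(c == chrom and s <= end and e >= start for c, s, e in blacklist)
        (overlapping if hit else clean).append((chrom, start, end))
    return overlapping, clean
```

+ The linear scan over the blacklist is adequate for the short lists used here; an interval tree would be the usual choice for larger region sets.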
+ Motif analysis of FoxA1 and TBP enriched regions
+ For each ChIP-exo/nexus sample , we used the ChIP-exo QC pipeline to partition its reference genome into a set of islands with their respective summary statistics . 
+ We then filtered them into collections of high quality regions as follows : 
+ i. For FoxA1 experiments : we removed the islands with ( i ) reads residing only on one strand ; ( ii ) Ui ≤ 15 ; ( iii ) Di ≤ 100 . 
+ ii . For TBP experiments : we removed the islands with ( i ) reads residing only on one strand ; ( ii ) Wi < 50 or Wi ≥ 2000 bp ; ( iii ) Ui ≤ 15 ; ( iv ) Di ≤ median_j ( Dj ) . 
+ These thresholds were empirically selected . 
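+ The two filters can be expressed as predicates on the island statistics. In this sketch an island is a dict with keys D, U, W and FSR (a hypothetical layout), a single-strand island is recognized by FSR equal to 0 or 1, and the median in the TBP rule is taken over all islands before filtering, which is an assumption. The `apply_width` switch is included because the width rule is not applied to every sample set.

```python
from statistics import median


def filter_foxa1(islands):
    """Keep islands with reads on both strands, U > 15 and D > 100."""
    return [i for i in islands
            if 0 < i["FSR"] < 1 and i["U"] > 15 and i["D"] > 100]


def filter_tbp(islands, apply_width=True):
    """Keep two-stranded islands with U > 15 and D above the median depth.

    When apply_width is True, islands must also satisfy 50 <= W < 2000.
    """
    cutoff = median(i["D"] for i in islands)
    kept = []
    for i in islands:
        if not 0 < i["FSR"] < 1:
            continue
        if apply_width and not 50 <= i["W"] < 2000:
            continue
        if i["U"] > 15 and i["D"] > cutoff:
            kept.append(i)
    return kept
```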
+ To validate their robustness , we performed an analogous analysis by using the regions that overlapped a set of peaks ( identified by MOSAiCS at FDR 5 % ) with width larger than 3 × rl , where rl is the median read length of the experiment ( Supplementary Figures S34 and S35 ) . 
+ The width filter was not applied to the TBP ChIP-exo samples , and accordingly to the ChIP-nexus samples for consistency , since they exhibited over-amplification ( 2 ) . 
+ We used FIMO ( version 4.9.1 ) ( 18 ) to identify the FoxA1 and TBP motifs within each enriched region using the FoxA1 MA0148.1 and TBP MA0108.1 position weight matrices from the JASPAR database ( 19 ) , respectively . 
+ For the FoxA1 experiments we used the default parameters and for the TBP experiments we considered all motifs identified with FIMO p.value < 0.05 . 
+ RESULTS
+ Publicly available ChIP-exo/nexus and novel E. coli ChIP-seq/exo datasets
+ We utilized a rich collection of publicly available ChIP-exo/nexus data from multiple organisms to build and evaluate our quality control pipeline ( Table 2 ) . 
+ These include : CTCF factor in human HeLa cell lines ( 1 ) ; ER factor in human MCF-7 cell lines ( 20 ) ; GR factor in IMR90 , K562 and U2OS human cell lines ( 21 ) ; TBP factor in human K562 cell lines ( 22 ) ; H3 histone in S. cerevisiae where most , but not all of the tail was deleted ( 1-28 ) ( 23 ) . 
+ ChIP-nexus data included experiments from ( 2 ) profiling TBP in human K562 cells , MyC and Max in D. melanogaster S2 cell lines , and Twist and Dorsal in D. melanogaster embryo . 
+ In order to have a setting where we can compare SE and PE ChIP-seq with their ChIP-exo counterpart , we profiled σ70 under a variety of conditions in E. coli with ChIP-exo ( Table 1 ) , SE and PE ChIP-seq . 
+ Collectively , we generated σ70 factor ChIP-exo , PE and SE ChIP-seq experiments under aerobic ( + O2 ) and anaerobic ( − O2 ) conditions in glucose minimal media . 
+ For simplicity , we named these experiments as E1 , P1 and S1 , respectively . 
+ Similarly , we generated σ70 factor ChIP-exo and PE ChIP-seq experiments in E. coli under aerobic ( + O2 ) conditions with and without rifampicin treatment . 
+ We named these experiments E2 and P2 , respectively . 
+ ChIP-exo versus ChIP-seq: general features
+ We first compared ChIP-seq and ChIP-exo in terms of data features that are well studied in ChIP-seq studies . 
+ Our σ70 ChIP-seq and ChIP-exo samples from E. coli are especially well suited for this task since they are all deeply sequenced compared to the genome size of E. coli . 
+ Figures 1B -- C summarize this comparison for one biological replicate of ChIP-exo and ChIP-seq experiments from the same biological conditions ( samples E1-1 from Table 1 , P1-1 and S1-1 following the same ID convention ) . 
+ Peak-pair assumption . 
+ We evaluated the peak-pair assumption , i.e. a cluster of reads in the forward strand located on the left-hand-side of the binding site is usually paired with a cluster of reads located on the right-hand-side of the binding site in the reverse strand . 
+ This observation is commonly utilized in designing statistical analysis methods for ChIP-exo data ( 6 -- 8 ) . 
+ We considered the set of peaks identified in both the ChIP-seq and ChIP-exo samples as high quality peaks ( Materials and Methods ) and calculated the proportion of forward strand reads in these regions ( Figure 1B and Supplementary Figures S1 -- S3 ) . 
+ This plot reveals a higher level of strand imbalance for ChIP-exo compared to ChIP-seq . 
+ Potential reasons for this observation include ligation efficiency , efficiency of λ-exo digestion , and single-stranded protein-DNA interactions . 
+ Overall , such an imbalance is more likely to occur in low complexity libraries . 
+ Read distributions within signal and background regions . 
+ Using extended raw read counts within 150 bp non-overlapping intervals , i.e. , bins interrogating the genome , Figure 1C depicts that , as observed by others , ChIP read counts from ChIP-exo and ChIP-seq are linearly correlated especially at high read counts . 
+ This indicates that signals for potential binding sites are well reproducible between ChIP-exo and ChIP-seq data . 
+ In contrast , there is a clear difference between the two data types for bins with low read counts , highlighting potential differences in the background read distributions of these data types . 
+ Comparisons with other paired E. coli ChIP-seq and ChIP-exo samples led to similar conclusions ( Supplementary Figures S1 -- S3 ) . 
+ Mappability and GC-content bias . 
+ We next evaluated ChIP-exo data of CTCF in HeLa cells ( 1 ) to investigate biases inherent to next generation sequencing experiments with eukaryotic genomes . 
+ Figures 1D and E ( Supplementary Figure S4 ) display the bin-level average read counts against mappability and GC-content . 
+ Each data point is obtained by averaging the read counts across bins with the same mappability or GC-content . 
+ These biases , increasing linear trend with mappability and non-linear trend with GC-content , are similar to those observed in ChIP-seq datasets ( 15,24 -- 25 ) . 
+ This observation indicates that analysis of ChIP-exo data should benefit from methods that take into account apparent sequencing biases such as mappability and GC content , mostly when an input control sample is not available to account for variability in the background read distribution . 
+ Existing high throughput sequencing quality control metrics applied to ChIP-exo/nexus data
+ We processed the ChIP-exo/nexus samples with FASTQC and observed that in 73.33 % and 93.33 % of the cases , at least a warning is raised for sequence duplication levels and kmer content representation ( Supplementary Table S1 ) , respectively . 
+ The former assumes that most sequences will occur only once in a diverse library and the latter assumes that any small fragment should not have a positional bias in its appearance within a library . 
+ Clearly , these assumptions are not appropriate for ChIP-exo/nexus data , as the exo-enzyme is expected to stop its digestion when it reaches the crosslinking protein . 
+ The ENCODE consortium established empirical and widely used QC metrics on ChIP-seq data ( 10 ) . 
+ We evaluated these metrics , namely the PCR Bottleneck Coefficient ( PBC ) , Normalized Strand Cross-Correlation ( NSC ) , and Relative Strand Cross-Correlation ( RSC ) , as defined at https://genome.ucsc.edu/ENCODE/qualityMetrics.html ( 10,11 ) , on ChIP-exo/nexus data . 
+ Tables 1 and 2 present these metrics for the collection of ChIP-exo/nexus datasets we consider in this paper . 
+ Marinov et al. ( 11 ) discussed that highly complex ChIP-seq libraries can become exhausted by deep sequencing . 
+ Hence , the PBC is expected to decrease as the sequencing depth increases . 
+ This effect is expected to be more severe in ChIP-exo/nexus , as DNA libraries generated by those protocols are expected to be less complex than the libraries generated by ChIP-seq because exonuclease digestion reduces the number of positions to which the reads can align . 
+ This affects the interpretation of the PBC , which is defined as the ratio of the number of genomic positions to which exactly one read maps to the number of genomic positions to which at least one read maps . 
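+ The PBC definition above translates into a short computation over read positions. The sketch assumes each read is summarized by a hashable key such as (chromosome, 5′ position, strand); the function name is hypothetical.

```python
from collections import Counter


def pcr_bottleneck_coefficient(read_positions):
    """Ratio of positions hit by exactly one read to positions hit by any read.

    read_positions: iterable of hashable keys, e.g. (chrom, five_prime, strand).
    """
    counts = Counter(read_positions)
    if not counts:
        raise ValueError("no reads supplied")
    exactly_one = sum(1 for n in counts.values() if n == 1)
    return exactly_one / len(counts)
```

+ Because exonuclease digestion stacks reads on identical 5′ ends, a well enriched ChIP-exo sample lowers the numerator of this ratio even when no PCR duplicates are present.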
+ For ChIP-seq samples , low PBC values ( e.g. , ≤ 0.5 ) indicate high levels of PCR amplification bias , i.e. PCR bottleneck , unless the sequencing depth is high enough to saturate all targets of the factor profiled . 
+ In contrast , for ChIP-exo/nexus , exonuclease digestion will lead to reads with the exact same 5 ′ end even before the PCR amplification step . 
+ We note that the PBC values are especially low for deeply sequenced ChIP-exo and ChIP-nexus samples ; however , this does not automatically indicate severe bottlenecking as suggested by standard ChIP-seq guidelines . 
+ Planet et al. ( 9 ) presented in the R/Bioconductor package htSeqTools the Standardized Standard Deviation ( SSD ) as a metric to assess enrichment efficiency and to compare across samples . 
+ According to the guidelines established by the authors , higher values of this metric indicate higher quality . 
+ We calculated the SSD coefficient for all the ChIP-exo/nexus samples ( Tables 1 and 2 ) . 
+ Detailed examination of these results reveals a key shortcoming of this metric : it tends to label samples with low library complexity as higher quality because the reads in such samples align to fewer positions in the genome . 
+ For example , when comparing the ChIP-exo/nexus TBP samples , the use of this metric suggests that the deeply sequenced ChIP-exo samples ( replicates 2 and 3 ) exhibit higher quality than the first ChIP-nexus replicate . 
+ This is in contrast to evaluation of these datasets with an independent , motif-based metric as we discuss below . 
+ The Strand Cross-Correlation ( SCC ) , introduced by Kharchenko et al. ( 26 ) , is a commonly used quality metric in assessing ChIP-seq enrichment quality . 
+ It aims to quantify how well the reads mapped to each strand are clustered around the locations of the protein -- DNA interaction sites by calculating the Pearson correlation between forward and backward strands reads by shifting them across a range that covers both the read length of the experiment and the expected average fragment length . 
+ Typical SCC profiles exhibit two local maxima : at the average fragment length and the read length . 
+ In high quality experiments with clear ChIP enrichment , the average fragment length maximum coincides with the global maximum . 
+ In an idealized ChIP-exo experiment where the DNA fragments are digested to the boundaries of the protein -- DNA interaction sites , the SCC profile is expected to maximize at the motif length indicating clustering of the forward and reverse strand reads around the binding site . 
+ This hinders the interpretation of SCC for a ChIP-exo/nexus experiment since it is now maximized at an unobserved shorter fragment length that is confounded with the ` phantom peak ' at the read length . 
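+ The SCC profile described above can be sketched as a Pearson correlation between strand-specific count vectors at each shift. The sketch assumes per-base 5′-end counts for one chromosome stored in two equal-length lists; the function names are hypothetical and no normalization such as NSC or RSC is attempted.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def strand_cross_correlation(forward, reverse, max_shift):
    """Correlate forward counts at position i with reverse counts at i + shift."""
    n = len(forward)
    return {shift: pearson(forward[:n - shift], reverse[shift:])
            for shift in range(max_shift + 1)}
```

+ In a ChIP-seq profile the shift with the highest correlation estimates the average fragment length, whereas in an idealized ChIP-exo profile it would sit near the motif length.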
+ Carroll et al. ( 27 ) studied the impact of blacklisted regions and duplicated reads when calculating the SCC for ChIP-exo data . 
+ The authors showed that removing duplicated reads has a dramatic effect on the SCC profile , whereas the effect of removing the blacklisted regions may be limited to a few positions of the profile . 
+ They suggested calculating the SCC using only aligned reads that overlap the experiment 's set of peaks but do not overlap a set of predefined blacklisted regions . 
+ Several biases are introduced into the computation of this modified SCC , because it requires the use and tuning of a peak calling algorithm . 
+ Furthermore , in a lower quality experiment , the peaks may not correspond to actual binding sites . 
+ Figure 1F displays the SCC curves for the CTCF HeLa samples where the ChIP-exo curve actually shows local maxima at 12 bp and the read length , while the SE ChIP-seq curves have an expected local maximum at the read length and a global maximum at the average fragment length . 
+ SCC profiles for other samples are available in Supplementary Figures S5 to S14 . 
+ In ChIP-exo experiments , the read length and the fragment length peaks in the SCC are confounded . 
+ Furthermore , the former is close in proximity to the motif length ; as a result , this may incorrectly suggest experiments to be marginally successful or even failed ( e.g. Supplementary Figure S8 ) and renders QC metrics such as the Normalized Strand Cross-Correlation ( NSC ) or the Relative Strand Cross-Correlation ( RSC ) harder to interpret . 
+ However , in the majority of the cases we present , the profile itself seems informative about the enrichment signal in ChIP-exo/nexus experiments . 
+ ChIP-exo quality control pipeline ChIPexoQual
+ To address the limitations of available analytical exploration approaches discussed above , we developed ChIPexoQual . 
+ In Table 3 , we compare ChIPexoQual against the existing tools discussed above . 
+ We highlight that ChIPexoQual provides a global view of both library enrichment and complexity , and detailed diagnostic plots for the balance between the two . 
+ We first present the overall pipeline and then discuss individual components with a case study using ChIP-exo data of FoxA1 from ( 20 ) and ChIP-nexus data from ( 2 ) . 
+ Figure 2 summarizes the 4-step pipeline and the two additional modules . 
+ Given aligned reads from a ChIP-exo/nexus sample , the first step partitions the reference genome into islands representing overlapping clusters of reads separated by gaps by removing the regions with fewer than h * aligned reads . 
+ In step 2 , the total number of reads overlapping each island ( Di ) and the number of island positions with at least one aligned read ( Ui ) are recorded . 
+ Then , three summary statistics ARCi , URCi , and FSRi are computed for each region i. ARCi denotes the average read coefficient and is defined as the ratio of the number of reads in island i ( Di ) to the width of the island i ( Wi ) ; URCi , unique read coefficient , quantifies the inverse of the effective coverage and is defined as the ratio of the number of genomic positions with at least one aligned read within island i ( Ui ) to the number of reads in island i ( Di ) ; and FSRi denotes the proportion of forward strand reads . 
+ Step 3 of the pipeline generates several diagnostic plots aimed at quantifying ChIP enrichment and strand imbalance , and step 4 generates quantitative summaries of these diagnostic plots . 
+ Figure 2A presents the typical behavior of the URC vs. ARC plot for a high quality ChIP-exo sample . 
+ In general , the plot depicts two strong arms . 
+ The left arm , with low ARC and varying URC values , corresponds to background islands , regions that are usually composed of scattered reads that were not digested during the exonuclease step . 
+ The right arm where the URC decreases as the ARC increases corresponds to regions that are usually ChIP enriched . 
+ As a result , this arm depicts the balance between library enrichment and complexity . 
+ Low URC in this arm corresponds to regions composed by reads concentrated in a smaller number of positions . 
+ We quantify the shape of the URC versus ARC plot by the use of two estimated parameters : β1 , which represents the average number of reads aligned to the unique positions in large depth regions , and β2 , which represents the overall change in depth as the width varies across a large set of regions . 
+ These parameters are estimated by sampling experiments on the original samples . 
+ We provide further details on how to obtain these later in the paper where we apply the pipeline to a large collection of ChIP-exo/nexus experiments . 
+ Figure 2B and C present the typical behavior of the Region Composition and Forward Strand Ratio ( FSR ) distribution plots , both of which quantify the strand imbalance as part of the QC pipeline . 
+ The Region Composition plot depicts how quickly the ratio of islands exclusively composed of fragments on a single strand among the islands with comparable read depth decreases as a function of read depth of the island . 
+ In a high quality sample , the proportion of islands with reads from only one strand is expected to decrease rapidly as we consider higher depth regions . 
+ In contrast , this proportion remains approximately constant in lower quality samples . 
+ The Forward Strand Ratio distribution plot illustrates how quickly the quantiles of the FSR approach 0.5 , the expected FSR value in high quality samples . 
+ Even though not every region in a ChIP-exo experiment is perfectly balanced , the most enriched regions are expected to have approximately equal numbers of reads in both strands . 
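+ The quantities behind these two strand-imbalance plots can be sketched as follows. Islands are given as (D, FSR) pairs, the depth groups and the minimum-depth cut-off are chosen by the caller, and the function names are hypothetical; the plotting itself and the exact grouping used by the package are not reproduced.

```python
from statistics import quantiles


def single_strand_proportion(islands, depth_bins):
    """Proportion of single-strand islands within each depth group.

    islands: list of (D, FSR); depth_bins: list of (low, high) inclusive bounds.
    Returns None for an empty group.
    """
    result = []
    for low, high in depth_bins:
        group = [fsr for depth, fsr in islands if low <= depth <= high]
        if group:
            single = sum(1 for fsr in group if fsr in (0.0, 1.0))
            result.append(single / len(group))
        else:
            result.append(None)
    return result


def fsr_quantiles(islands, min_depth):
    """0.1, 0.5 and 0.9 quantiles of FSR among islands with D >= min_depth."""
    values = [fsr for depth, fsr in islands if depth >= min_depth]
    if len(values) < 2:
        return None
    cuts = quantiles(values, n=10, method="inclusive")
    return cuts[0], cuts[4], cuts[8]
```

+ Evaluating `fsr_quantiles` over increasing values of `min_depth` shows how fast the 0.1 and 0.9 quantiles close in on the median.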
+ Application and validation of ChIPexoQual with the FoxA1 ChIP-exo dataset . 
+ We next illustrate the proposed QC pipeline using FoxA1 ChIP-exo datasets , which were profiled at comparable sequencing depths in three biological replicates of mouse liver cells . 
+ We first investigated various thresholds for partitioning the mouse genome using these ChIP-exo samples . 
+ We specifically considered small thresholds because larger thresholds are likely to partition wider regions into smaller ones , discard parts of wide regions , and ignore background regions completely . 
+ With this in mind we processed the FoxA1 datasets with the following thresholds 1 , 5 , 25 and 50 ( Supplementary Figure S15 ) . 
+ We observed that , in a high-quality experiment , if multiple thresholds are small and close to each other , then the partitions are similar and the distributions of the proposed metrics are similar as well . 
+ Hence , we decided to use the default threshold of h * = 1 when analyzing the FoxA1 samples . 
+ Figure 3A presents URC versus ARC plots for all three replicates . 
+ The first and third replicates exhibit a defined decreasing trend in URC as the ARC increases . 
+ This indicates that these samples exhibit a higher ChIP enrichment than the second replicate . 
+ On the other hand , the overall URC level from the first two replicates is higher than that of the third replicate , elucidating that the libraries for the first two replicates are more complex than that of the third replicate . 
+ Figures 3B and C display the Region Composition and FSR distribution plots , which highlight specific problems with replicates 2 and 3 . 
+ Figure 3B exhibits apparent decreasing trends in the proportions of regions formed by fragments in one exclusive strand . 
+ High quality experiments tend to show exponential decay in the proportion of single stranded regions , while for the lower quality experiments , the trend may be linear or even constant ( Supplementary Figure S21 ) . 
+ FSR distributions of both replicates 2 and 3 are more spread around their respective medians ( Figure 3C ) . 
+ The rate at which the 0.1 and 0.9 quantiles approach the median indicates the aforementioned lower enrichment in the second replicate and the low complexity in the third one . 
+ In addition to step 4 , when a set of blacklisted regions is available , we divide the ChIP-exo/nexus islands into two groups based on whether or not they overlap the blacklisted regions . 
+ Figure 3D illustrates that , first , the β1 and β2 scores are robust to the existence of islands in the blacklisted regions . 
+ Second , for the islands overlapping the blacklisted regions , both summary metrics are significantly higher in both overall level and variance . 
+ Therefore , this stratified analysis further indicates that the β1 and β2 scores provide good overall assessments of the datasets and can clearly separate blacklist regions . 
+ We conclude that replicate 1 is of higher quality than both replicates 2 and 3 . 
+ We validate this observation with a motif analysis on the candidate binding regions identified from these replicates . 
+ A conservative approach to identify high quality binding regions ( Materials and Methods ) reveals 7014 , 1855 , and 2187 regions for replicates 1 , 2 and 3 , respectively . 
+ The lower number of enriched regions from replicate 2 is consistent with the lower ChIP enrichment pattern in the URC vs. ARC diagnostic plot . 
+ Figure 4A compares the FIMO scores among the three replicates , not surprisingly confirming that the first replicate exhibits the highest quality . 
+ Figure 4B displays the average normalized read coverage around the actual motif locations in the candidate binding regions . 
+ These coverage plots reveal that the ChIP signal is slightly more defined for the first and third replicates than the second one , indicating overall strength of the ChIP enrichment in these samples compared to the second replicate . 
+ Figure 4C compares FSR distributions of the ChIP islands overlapping the union of the peaks across the three replicates and highlights that the samples largely satisfy the ` peak-pair ' assumption because peaks with at least one motif tend to be more strand-balanced . 
+ Furthermore , samples with lower library complexity appear to exhibit heavier FSR tails . 
+ High sequencing depth may confound low-complexity library issues . 
+ We evaluated every sample listed in Tables 1 and 2 with the ChIPexoQual QC pipeline ( Supplementary Figures S16 -- S27 ) . 
+ A key observation from this large scale analysis is that the URC versus ARC plots typically display one of the three patterns captured in the FoxA1 study . 
+ We will refer to these as pattern I ( FoxA1 replicate 1 ) , II ( FoxA1 replicate 2 ) , and III ( FoxA1 replicate 3 ) , respectively . 
+ Pattern III , where the two arms along ARC are not distinguishable , can arise due to either a low-complexity library or high sequencing depth . 
+ For example , all three replicates of the TBP ChIP-exo from K562 , with sequencing depths between ∼ 60M and 115M reads , and replicate two of TBP ChIP-nexus in K562 , with a sequencing depth of ∼ 130M reads , exhibit this pattern . 
+ A simple but effective strategy to distinguish the two plausible scenarios from Pattern III is to apply the QC pipeline to sub-samples randomly generated from the full dataset at varying sequencing depths ( sub-sampling analysis module ) . 
+ We applied this strategy by sub-sampling 20M to 50M reads in 10M increments , a range that represents the sequencing depths of the human samples we are using in this paper , from the TBP samples . 
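The sub-sampling step described above can be sketched as follows. This is a minimal illustration, not the ChIPexoQual implementation: the function name `subsample_reads` is hypothetical, reads are represented only by their indices, and toy depths of 20 to 50 reads stand in for the 20M to 50M reads used in the text.

```python
import numpy as np

def subsample_reads(n_reads, depths, seed=0):
    """Return {depth: sorted indices of the reads kept} for each requested depth.

    Hypothetical helper: reads are identified by their index in the full
    aligned-read set, and each sub-sample is drawn without replacement.
    """
    rng = np.random.default_rng(seed)
    out = {}
    for depth in depths:
        if depth > n_reads:
            continue  # cannot sub-sample more reads than the sample contains
        out[depth] = np.sort(rng.choice(n_reads, size=depth, replace=False))
    return out

# Toy scale: depths 20, 30, 40, 50 stand in for 20M..50M in 10M increments.
depths = list(range(20, 51, 10))
subs = subsample_reads(n_reads=60, depths=depths)
```

Each sub-sample would then be passed through the same QC pipeline as the full dataset, so that patterns caused by depth alone can be separated from those caused by low library complexity.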
+ URC vs. ARC diagnostics of these sub-samples ( Supplementary Figures S30 to S33 ) indicate that , among the four TBP samples with this pattern , replicates two and three of K562 ChIP-exo suffer from low-complexity library issues , whereas the other samples exhibit the pattern specific to high quality samples . 
+ To confirm this implication , we compared the top FIMO scores ( 18 ) of the TBP motif for the ChIP-exo and ChIP-nexus replicates . 
+ Figure 4D illustrates that the first ChIP-exo replicate and ChIP-nexus replicates identify binding events with consistently better motif matches than the other ChIP-exo replicates . 
+ This implication on overall quality is further confirmed by the large separation of the β1 and β2 scores between regions that do and do not overlap with the blacklist regions for these high quality samples ( Supplementary Figures S28-S29 ) . 
+ Figure 4E compares the FSR distributions of ChIP islands overlapping the union of peaks across all TBP samples by stratifying them with respect to TBP motif occurrence . 
+ Overall , while the peaks in high quality experiments are more likely to have a motif occurrence if they are balanced , many strand-unbalanced peaks with motifs are also identified . 
+ Specifically , the proportion of peaks with FSR smaller than 0.3 or larger than 0.7 varied between 0.38-0.43 and 0.20-0.22 , for the ChIP-exo and the ChIP-nexus experiments , respectively . 
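The proportion of strand-unbalanced peaks quoted above can be computed along these lines. The sketch assumes that FSR is the fraction of a region's reads that align to the forward strand (the definition is not restated in this section), and the function name and toy counts are illustrative only.

```python
import numpy as np

def unbalanced_fraction(fwd, rev, low=0.3, high=0.7):
    """Fraction of regions whose forward strand ratio (FSR) is < low or > high.

    Assumes FSR = forward reads / (forward + reverse reads) and that every
    region has at least one read.
    """
    fwd = np.asarray(fwd, dtype=float)
    rev = np.asarray(rev, dtype=float)
    fsr = fwd / (fwd + rev)
    return float(np.mean((fsr < low) | (fsr > high)))

# Toy per-peak strand counts (made up for illustration).
fwd = [50, 10, 90, 45]
rev = [50, 90, 10, 55]
frac = unbalanced_fraction(fwd, rev)
```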
+ This further confirms the conclusion of the ChIPexoQual QC pipeline . 
+ Summary statistics for the URC versus ARC diagnostic plot . 
+ We next utilized QC pipeline results for all the samples ( Tables 1 and 2 ) and quantified the relationship between ARC and URC by fitting a reparametrized regression model of URC as a function of ARC . 
+ Specifically , we considered a model of read depth ( Di ) on the number of positions with at least one aligned read ( Ui ) and the width of the island ( Wi ) , i.e. , Di = β1Ui + β2Wi + εi , where εi represents the random error term . 
+ As we discuss in Materials and Methods , this parametrization has a direct connection to a model of the form URCi = κ / ARCi + γ + εi , which aims to recapitulate the relationship in the URC vs. ARC plots . 
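A minimal sketch of fitting the depth model by ordinary least squares without an intercept is given below. It is not the package's fitting routine; the function name and the simulated data are assumptions. If one further assumes ARC = Di / Wi and URC = Ui / Di (definitions not restated in this section), dividing the model by β1 Di gives URC ≈ 1 / β1 − ( β2 / β1 ) / ARC, which is why a large β1 corresponds to a low limiting URC.

```python
import numpy as np

def fit_depth_model(depth, unique_pos, width):
    """Least-squares fit of D_i = beta1 * U_i + beta2 * W_i (no intercept).

    Illustrative only: the paper may fit the model on sampled islands or
    with a different estimator.
    """
    X = np.column_stack([unique_pos, width]).astype(float)
    y = np.asarray(depth, dtype=float)
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    beta1, beta2 = coef
    return float(beta1), float(beta2)

# Noise-free toy islands generated with beta1 = 4 and beta2 = 0.02.
rng = np.random.default_rng(1)
U = rng.integers(5, 200, size=500)
W = rng.integers(50, 1000, size=500)
D = 4.0 * U + 0.02 * W
b1, b2 = fit_depth_model(D, U, W)
```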
+ Figure 5A displays estimated overall change in depth ( β1 ) as the number of positions with at least one aligned read varies across a large collection of ChIP-exo samples from eukaryotic genomes . 
+ The parameter can be interpreted as the limiting ( i.e. , large depth ) URC of a sample . 
+ As discussed earlier , high quality ChIP-exo samples are expected to have two arms in the URC versus ARC plots : one with low ARC and varying URC and another with a decreasing URC as ARC increases and stabilizes . 
+ When the ChIP-exo sample is not deeply sequenced , high values of β1 in Figure 5A indicate that the library complexity is low . 
+ In contrast , lower values correspond to higher quality ChIP-exo experiments . 
+ Taking into account the depths of these samples and visualizing all the diagnostic plots ( Supplementary Figures S16 -- S27 ) , we conclude that samples with estimated β1 values < 10 seem to be high quality samples . 
+ We interpret β2 as the overall change in depth as the width varies and display its estimates across all the eukaryotic samples in Figure 5B . 
+ Under perfect digestion by λ-exo , most of the reads aligned to binding regions are expected to accumulate around binding events . 
+ In a high quality sample , the overall variation in depth is expected to be small as the overall widths of the regions change . 
+ This is because the majority of reads are expected to be located tightly around the binding sites and , as a result , the region width should not significantly affect its depth . 
+ In contrast , low quality sample regions are usually composed of a fixed proportion of reads aligned to a small number of unique positions ; hence , the overall change in depth as the width varies is proportional to this fixed proportion . 
+ For example , although the third replicate of the TBP ChIP-exo experiment has comparable sequencing depth to the second replicate of the TBP ChIP-nexus experiment ( Figure 5B ) , β2 is considerably higher for the ChIP-exo experiment . 
+ This potentially indicates that additional sequencing reads in comparison to replicates 1 and 2 are scattered around new positions instead of accumulating on the existing binding sites . 
+ In summary , samples with estimated β2 values close to zero can be considered as high quality samples . 
+ The interaction between β1 and β2 has implications regarding the quality of ChIP-exo and ChIP-nexus samples . 
+ When either β1 is large or β2 is different from zero , potentially owing to the high sequencing depth of the sample , we suggest randomly sub-sampling reads to form samples of lower depth and evaluating the sub-samples with the QC pipeline . 
+ As an illustration , we apply this strategy to the three replicates of TBP ChIP-exo in K562 ( 22 ) and the second replicate from the K562 ChIP-nexus experiments ( 2 ) . 
+ Figure 5C reveals a much higher β1 ( and larger than 10 ) for replicates 2 and 3 compared to replicate 1 and both ChIP-nexus samples . 
+ Figure 5D illustrates that the β2 estimates remain approximately constant in ChIP-nexus sub-samples and sub-samples of the first replicate of ChIP-exo , while they increase for the second and third ChIP-exo replicates . 
+ This suggests that these two ChIP-exo replicates have low library complexity and overall lower quality than the ChIP-nexus samples , regardless of the fact that all three experiments are deeply sequenced with more than 90M reads each . 
+ Furthermore , the ChIPexoQual diagnostic plots for each subsample ( Supplementary Figures S30 -- S33 ) illustrate that the two arms of the ARC vs. URC plots are clearly visible in moderate depth sub-samples of TBP ChIP-nexus data . 
+ Similarly , Supplementary Figure S32 illustrates that , as expected , the suggested subsampling strategy is also effective for the E1 and E2 samples , which are deeply sequenced , relative to the E. coli genome . 
+ ChIPexoQual R package
+ We implemented ChIPexoQual as an R/Bioconductor package . 
+ ChIPexoQual achieves fast processing through parallel computing . 
+ Supplementary Figure S36 provides ChIPexoQual 's processing times for a collection of samples representing different sequencing depths of the ChIP-exo/nexus experiments listed in Table 2 using four parallel threads on a server with 24 AMD 55Opteron 2.2 GHz processors . 
+ This plot shows that ChIPexoQual requires between 125 and 640 s ( 80 and 420 when the aligned reads are already loaded into memory ) for processing a ChIP-exo/nexus sample . 
+ CONCLUSION
+ We presented a systematic exploration of several ChIP-exo/nexus datasets . 
+ We provided a list of factors that reflect the quality of a ChIP-exo/nexus experiment and developed an easy to use QC pipeline , implemented into an R/Bioconductor package called ChIPexoQual . 
+ ChIPexoQual takes aligned reads as input and automatically generates several diagnostic plots and summary measures that enable assessing enrichment and library complexity . 
+ Our analysis of several datasets indicated that the QC pipeline only requires a set of aligned reads to provide a global overview of the quality of a given ChIP-exo dataset . 
+ The implications of the diagnostic plots and the summary measures align well with more elaborate analysis that is computationally more expensive to perform and/or requires additional inputs that often may not be available , such as motif occurrences in a set of high quality regions or resolution analysis based on a gold-standard . 
+ The ChIPexoQual package ( version 1.0.0 ) is available from Bioconductor ( http://bioconductor.org/packages/release/bioc/html/ChIPexoQual.html ) . 
+ The Bioconductor version does not currently include the blacklist submodule . 
+ A stable version ( version 0.99.15 ) with this additional submodule is available at https://github.com/welch16/ChIPexoQual/tree/devel . 
+ DATA AVAILABILITY
+ Escherichia coli ChIP-exo sequence and processed data are available under the NCBI 's Gene Expression Omnibus ( 28 ) and are accessible through GEO series accession number GSE84830 ( http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE84830 ) . 
+ Supplementary Data are available at NAR Online.
+ ACKNOWLEDGMENTS
+ R.W. acknowledges the funding provided by CONACYT . 
+ Authors ' contributions : R.W. and S.K. developed the ChIPexoQual pipeline . 
+ R.W. implemented the ChIPexoQual pipeline and D.C. implemented the dPeak package . 
+ R.W. and D.C. performed the analysis . 
+ J.G. and R.L. performed the E. coli sequencing experiments . 
+ R.W. and S.K. wrote the manuscript . 
+ All authors approved the final draft . 
+ FUNDING
+ National Institutes of Health ( NIH ) [ HG003747 and HG007019 to S.K. ] ( in part ) ; NIH [ GM38660 to R.L. ] ; CONACYT [ 215196 to R.W. ] . 
+ Funding for open access charge : NHGRI . 
+ Conflict of interest statement . 
+ None declared .
\ No newline at end of file
--- a/data/TEXT_FILES/useful_txt/29177735.txt 0 → 100644
View file @27818a9
+++ b/data/TEXT_FILES/useful_txt/29177735.txt 0 → 100644
View file @27818a9
+ Chapter 6
+ Topoisomerase IV is the main chromosome decatenase of E. coli and most bacteria [ 1 ] . 
+ Topo IV is a heterotetramer formed by dimers of ParC and ParE subunits [ 2 ] which present a high degree of structural homology with the GyrA and GyrB subunits of DNA gyrase , respectively . 
+ Alteration of Topo IV leads to severe chromosome segregation defects [ 2 ] but does not halt chromosome replication [ 3 ] . 
+ Recently a role for Topo IV in the segregation of replicating sister chromatids has been demonstrated [ 3 , 4 ] . 
+ It is therefore proposed that Topo IV works behind replication forks to remove precatenation links [ 5 , 6 ] that are formed by the rotation of the replication fork when DNA gyrase can not eliminate the positive superhelical tension generated by replication . 
+ Sister chromatid segregation is not a homogeneous process in E. coli ; some regions of the chromosome appear to segregate a long time after their replication , while others segregate within minutes following their replication . 
+ Among the late-segregating regions are the SNAPs regions , which are enriched for GATC sequences that might recruit high amounts of SeqA protein that would inhibit Topo IV [ 7 ] . 
+ The terminus region of the chromosome also exhibits late segregation due to a combination of events : ( i ) the MatP-septal ring interaction [ 8 ] ; ( ii ) the MatP-MukB interaction and ( iii ) the MukB-Topo IV interaction [ 9 ] . 
+ These observations suggested that the decatenation activity of Topo IV is highly regulated in time and space . 
+ This is in good agreement with observations that Topo IV works preferentially late in the cell cycle [ 10 ] and in the chromosome terminus region at the dif site [ 12 ] . 
+ To get more insight into Topo IV regulation , we performed whole genome analysis of Topo IV binding and cleavage activity [ 11 ] . 
+ Topo IV has access to most of the genomic regions of E. coli but only selectively cleaves distinct genomic regions . 
+ Among the cleaved sites is the dif site , which is by far the strongest , confirming that for almost every cell cycle , decatenation events take place at dif on fully replicated chromosomes . 
+ To verify these observations , we performed ChIP-seq experiments and developed a new Topo IV-DNA co-immunoprecipitation method aimed at trapping only active Topo IV ( which we called NorflIP ) . 
+ These two methods are described in the present protocol . 
+ All solutions must be prepared using ultrapure water ( obtained by purifying deionized water to attain a resistivity of 18 MΩ-cm at 25 °C ) . 
+ Prepare the following buffers and stock solutions . 
+ Unless otherwise specified , filter solutions using 0.2 μm low-protein-binding nonpyrogenic membranes . 
+ 3 Methods
+ The following steps should be carried out at 4 °C unless otherwise indicated . 
+ 8 . Centrifuge beads for 30 s at 8000 × g and recover the Topo IV-DNA complex found in the supernatant ( IP sample ) . 
+ 9 . The ChIP-seq IP and input samples are de-cross-linked and proteins are degraded overnight at 65 °C with 1 mg/mL proteinase K . The NorflIP IP and input samples are treated overnight at 65 °C with 1 mg/mL proteinase K and 1 % SDS to degrade Topo IV covalently linked to the 5′ end of DNA at the cleavage site . 
+ 10 . Add 0.2 mg/mL RNAse A and incubate for 30 min at 37 °C . 
+ 11 . Purify IP and input samples with a DNA cleanup kit and elute in 30 μL of Milli-Q water . Alternatively , the MinElute kit from Qiagen can be used to limit DNA loss during the cleaning process . 
+ 12 . Measure DNA quality and quantity using Qubit dsDNA HS kit ( see Note 3 ) . 
+ 13 . IP/input enrichment can then be preliminarily tested by qPCR using dif and gapA probes . 
+ Libraries were prepared according to Illumina 's instructions accompanying the DNA Sample Kit ( FC-104-5001 ) . 
+ Briefly , DNA was end-repaired using a combination of T4 DNA polymerase , E. coli DNA Pol I large fragment ( Klenow polymerase ) and T4 polynucleotide kinase . 
+ The blunt , phosphorylated ends were treated with Klenow fragment ( 3′ to 5′ exo minus ) and dATP to yield a protruding 3′ ` A ' base for ligation of Illumina 's adapters , which have a single ` T ' base overhang at the 3′ end . 
+ After adapter ligation , DNA was PCR amplified with Illumina primers for 15 cycles and library fragments of ~ 250 bp ( insert plus adaptor and PCR primer sequences ) were band isolated from an agarose gel . 
+ The purified DNA was captured on an Illumina flow cell for cluster generation . 
+ Libraries were sequenced on the Genome Analyzer following the manufacturer 's protocols with single read for 50 cycles . 
+ Sequencing results were processed by the IMAGIF facility . 
+ Base calls were performed using CASAVA version 1.8.2 . 
+ ChIP-seq and NorflIP reads were aligned to the E. coli NC_000913 genome using BWA 0.6.2 . 
+ A custom made pipeline for the analysis of sequencing data was developed with Matlab ( available upon request ) . 
+ Briefly , the number of reads for the input and IP data was smoothed over a 200 bp window . 
+ Forward and reverse signals were added , reads were normalized to the total number of reads in each experiment , strong nonspecific signals observed in unrelated experiments were removed , and data were exported to the UCSC genome browser ( http://archaea.ucsc.edu ) for visualization and comparisons . 
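The smoothing and normalization steps just described can be sketched as below. The authors' pipeline was written in Matlab and is not shown in the text, so this Python version is only an assumed equivalent: a moving average is used for the 200 bp smoothing, the function name is hypothetical, and edges are zero-padded rather than wrapped around the circular chromosome.

```python
import numpy as np

def smooth_and_normalize(fwd, rev, total_reads, window=200):
    """Sum strand-specific per-base read counts, smooth, and normalize.

    fwd, rev: per-base read counts on each strand.
    total_reads: total number of reads in the experiment.
    Smoothing is a simple moving average over `window` bp (an assumption).
    """
    signal = np.asarray(fwd, dtype=float) + np.asarray(rev, dtype=float)
    kernel = np.ones(window) / window
    smoothed = np.convolve(signal, kernel, mode="same")
    return smoothed / total_reads

# Toy track: one read per base on each strand over 1000 bp.
fwd = np.ones(1000)
rev = np.ones(1000)
track = smooth_and_normalize(fwd, rev, total_reads=2000)
```

Applying the same function to the IP and input tracks yields comparable normalized signals from which an IP/input ratio can be formed.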
+ Several highly-enriched sites were observed in the IP samples . 
+ Interestingly , one of these sites corresponds to the dif site ( position 1.58 Mb ) , which has previously been identified as a strong Topo IV cleavage site in the presence of norfloxacin [ 12 ] . 
+ We also observed strong enrichment over rRNA operons , tRNA and IS sequences . 
+ To address the significance of the enrichment at rRNA , tRNA , and IS , we monitored these sites in ChIP-seq experiments performed in the same conditions with a MatP-flag strain and in a mock IP performed with a strain that did not contain any flag-tagged protein . 
+ Both MatP and mock IP presented significant signals on rRNA , tRNA , and IS loci . 
+ This observation suggested that Topo IV enrichment at rRNA , tRNAs and IS was an artifact of the ChIP-seq technique . 
+ By contrast , no enrichment was observed at the dif site in the MatP and mock-IP experiments ; we therefore considered dif to be a genuine Topo IV binding site and compared every enriched region ( > 2 fold ) with the dif IP . 
+ We filtered the raw data for regions presenting the highest Pearson correlation with the dif signal ( P > 0.7 ) . 
+ 4 Notes
+ This procedure discarded many highly enriched regions . 
+ An example of a site presenting a selected Topo IV IP/input signal suggesting specific binding is presented in Fig. 1 ( red graphs ) . 
+ The strongest IP/input ratio was observed at dif and a locus close to the yebV gene ( 1.9 Mb ) . 
+ They present a characteristic shape ( Fig. 1 blue graphs , see Note 4 ) that allows the automatic detection of lower-amplitude peaks that preserve this characteristic shape . 
+ We measured the Pearson correlation coefficient with the dif and the yebV sites for 600 bp sliding windows over the entire genome . 
+ Peaks with a Pearson correlation above 0.7 were considered as putative Topo IV cleavage sites . 
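The template-matching step described above can be sketched as follows. This is an assumed reimplementation, not the authors' Matlab code: a reference profile (for example the 600 bp signal at dif) is slid along the genome-wide track and the Pearson correlation is computed for every window. The Gaussian template and all names here are illustrative.

```python
import numpy as np

def sliding_pearson(signal, template):
    """Pearson correlation between `template` and every window of `signal`.

    Windows with no variation (constant signal) are assigned 0.
    A plain Python loop is used for clarity; it is slow on a full genome.
    """
    signal = np.asarray(signal, dtype=float)
    template = np.asarray(template, dtype=float)
    w = len(template)
    t = template - template.mean()
    t_norm = np.sqrt((t ** 2).sum())
    out = np.zeros(len(signal) - w + 1)
    for i in range(len(out)):
        s = signal[i:i + w]
        s = s - s.mean()
        denom = np.sqrt((s ** 2).sum()) * t_norm
        if denom > 0:
            out[i] = (s * t).sum() / denom
    return out

# Toy data: a scaled copy of a 600 bp template hidden at position 1000.
x = np.arange(600)
template = np.exp(-((x - 300) / 60.0) ** 2)
signal = np.zeros(3000)
signal[1000:1600] += 0.2 * template
r = sliding_pearson(signal, template)
hits = np.flatnonzero(r > 0.7)  # candidate sites above the 0.7 cutoff
```

Because the Pearson correlation is insensitive to amplitude, the lower-amplitude copy of the template is still detected, which is the property the text relies on.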
+ Interestingly , in the NorflIP experiments , nonspecific signal was observed over rRNA and IS regions but not on tRNA . 
+ This suggested that immunoprecipitation signals over tRNA are artefacts linked to formaldehyde but not to the Flag immunoprecipitation .
\ No newline at end of file
--- a/data/TEXT_FILES/useful_txt/29358050.txt 0 → 100644
View file @27818a9
+++ b/data/TEXT_FILES/useful_txt/29358050.txt 0 → 100644
View file @27818a9
+ Multiscale Structuring of the E. coli Chromosome by
+ SUMMARY
+ As in eukaryotes , bacterial genomes are not randomly folded . 
+ Bacterial genetic information is generally carried on a circular chromosome with a single origin of replication from which two replication forks proceed bidirectionally toward the opposite terminus region . 
+ Here , we investigate the higher-order architecture of the Escherichia coli genome , showing its partition into two structurally distinct entities by a complex and intertwined network of contacts : the replication terminus ( ter ) region and the rest of the chromosome . 
+ Outside of ter , the condensin MukBEF and the ubiquitous nucleoid-associated protein ( NAP ) HU promote DNA contacts in the megabase range . 
+ Within ter , the MatP protein prevents MukBEF activity , and contacts are restricted to 280 kb , creating a domain with distinct structural properties . 
+ We also show how other NAPs contribute to nucleoid organization , such as H-NS , which restricts short-range interactions . 
+ Combined , these results reveal the contributions of major evolutionarily conserved proteins to bacterial chromosome organization . 
+ INTRODUCTION
+ The genomes of all organisms must be folded to fit within a cell that is typically several 1,000-fold smaller than the size of the DNA molecule itself . 
+ The overall chromosome fold is a combination of intertwined structural features resulting from differential accessibility , polymer properties , epigenetic modifications , and binding of proteins to the sequence . 
+ Recent work has highlighted the dynamics and regulation of this complex network and its functional interplay with metabolic processes over time , such as gene expression regulation , chromosome segregation , and repair ( Dekker and Mirny , 2016 ) . 
+ In bacteria , DNA is efficiently compacted into the nucleoid , a dynamic macromolecular complex where the genetic material and its associated proteins are located . 
+ Nucleoid folding and compaction result from a combination of processes ( Wang et al. , 2013 ; Kleckner et al. , 2014 ) : DNA supercoiling , formation of ( elusive ) chromatin-like structures by nucleoid-associated proteins ( NAPs ) , condensation by structural maintenance of chromosome ( SMC ) proteins , macromolecular crowding , and out-of-equilibrium processes such as transcription . 
+ The mostly negatively supercoiled DNA results in branched and plectonemic structures whose precise role ( s ) and distribution remain poorly understood . 
+ Genetic studies revealed the presence of stochastic boundaries between these structures , allowing DNA interactions in cis between sites located more than 100 kb apart ( Higgins et al. , 1996 ) . 
+ RNA polymerase blocks the movement of the plectonemic supercoil in the transcribed track , generating supercoil diffusion barriers ( Booker et al. , 2010 ) . 
+ Those constraints were proposed to segment the chromosome into `` chromosomal interaction domains '' ( CIDs ) ranging in length from 30 to 400 kb and identified through chromosome conformation capture ( 3C/Hi-C ) ( Dekker et al. , 2002 ; Le et al. , 2013 ) because these domains often display highly expressed genes at their boundaries in Caulobacter crescentus , Bacillus subtilis , and Vibrio cholerae ( Le et al. , 2013 ; Marbouty et al. , 2015 ; Val et al. , 2016 ) . 
+ Co-regulated genes were recently found to generate smaller ( 15- to 30-kb ) domains in Mycoplasma pneumoniae ( Trussart et al. , 2017 ) . 
+ NAPs are highly abundant proteins that play diverse roles as chromatin organizers , transcription factors , and , more generally , accessory partners involved in DNA transactions ( Dillon and Dorman , 2010 ) . 
+ In E. coli , at least ten abundant ( 30,000 -- 60,000 copies per cell ) DNA binding proteins have been found to be associated with the nucleoid . 
+ These NAPs bend , wrap , or bridge DNA , to which they show different types of affinities ; for instance , high and specific ( e.g. , Fis ) or non-specific and with a preference for AT-rich sequences ( e.g. , HU ) . 
+ Until recently , NAPs were suspected to contribute to DNA condensation by acting on the structuring and regulation of the chromatin only at a local scale . 
+ Although a recent 3C experiment unveiled involvement of HU for contacts up to 100 kb in C. crescentus ( Le et al. , 2013 ) , the respective contributions of NAPs to chromatin organization and DNA dynamics in vivo remain poorly understood . 
+ Higher-order levels of organization of bacterial chromosomes have been described in recent years , involving long-range contact structuring of the genome over large distances . 
+ The E. coli chromosome is segmented into macrodomains ( MDs ; Niki et al. , 2000 ; Valens et al. , 2004 ; Espeli et al. , 2008 ) , with the DNA binding protein MatP specifying a constrained 800-kb region ( ter ) surrounding the terminus of the replication locus ( Mercier et al. , 2008 ) . 
+ The functional requirement for organizing this ter domain is not completely understood . 
+ The interaction of MatP with the protein ZapB associated to the divisome promotes anchoring of the ter at the midcell and , therefore , controls chromosome choreography during the cell cycle ( Espéli et al. , 2012 ) . 
+ The interplay of MatP with MukBEF associated with Topoisomerase IV may ensure timely chromosome unlinking and segregation ( Nolivos et al. , 2016 ) . 
+ By performing a structure-function analysis of MatP , the molecular bases for MatP-mediated ter formation were identified . 
+ MatP contains a tripartite fold that includes a four-helix bundle ( corresponding to the DNA binding domain ) , a ribbon-helix-helix ( RHH ) domain ( responsible for the formation of the MatP dimer ) , and a C-terminal coiled coil . 
+ Although the RHH domain promotes the formation of MatP dimers , the coiled-coil regions form a bridged MatP tetramer that might flexibly link distant matS sites , prompting a model for a protein-mediated DNA-looping mechanism for ter organization ( Dupaigne et al. , 2012 ) . 
+ Mutating the residues involved in the tetramerization affects both DNA condensation and the ability of MatP to interact with ZapB ; by contrast , these mutants were still able to specify the ter MD ( Dupaigne et al. , 2012 ; Espéli et al. , 2012 ) . 
+ In other species lacking MatP , large domains have also been characterized . 
+ In B. subtilis , a large 800-kb region overlapping the origin of replication is maintained into a constrained , dense state through the action of SMC proteins , as revealed by super-resolution imaging and 3C ( Marbouty et al. , 2015 ) . 
+ It was speculated that the condensation of this domain plays a role in the proper completion of the replication and segregation program . 
+ The disposition of the chromosome within the cell differs between bacterial species . 
+ In B. subtilis , C. crescentus , and M. pneumoniae , the chromosome has a longitudinal disposition , with the two replication arms aligned along the long axis of the cell ( Le et al. , 2013 ; Marbouty et al. , 2014 , 2015 ; Trussart et al. , 2017 ; Umbarger et al. , 2011 ; Wang et al. , 2015 ) . 
+ In E. coli , the chromosome has a transversal disposition , with the two replication arms occupying distinct nucleoid halves and the replication origin in between ( Wang et al. , 2006 ) . 
+ The different chromosome dispositions observed in bacteria suggest that different factors may be involved in chromosomal organization . 
+ Despite numerous efforts to understand the role of each structural factor in the overall chromosome organization among diverse species , their precise effect is yet to be fully understood . 
+ Genomic analyses have shown that bacterial species exhibit diverse combinations of organizing factors . 
+ In E. coli and enterobacteria , these structural factors involve the ubiquitous NAPs such as HU , Fis , and H-NS as well as a specific group that coevolved with Dam methylase , including the condensin complex MukBEF and MatP ( Brézellec et al. , 2006 ) . 
+ In this study , we explored the higher-order E. coli chromosome organization and reveal the effect of several factors controlling its folding . 
+ The analysis of high-resolution ( 5 kb ) 3C contact maps of the E. coli chromosome in wild-type ( WT ) and mutant backgrounds reveals a multilevel 3D organization mediated by the major ubiquitous NAPs as well as the influence of transcription on local chromatin structure . 
+ The teaming up of HU and the condensin MukBEF to promote long-range contacts within chromosome arms and the formation of a specific chromosomal domain through the restraint of condensin activity by MatP provide clues regarding long-range chromosome organization and domain structuring in bacteria . 
+ RESULTS
+ A High-Resolution Contact Map of the E. coli Chromosome
+ 3C coupled with deep sequencing was applied to exponentially growing WT cells to investigate the precise effect of nucleoid structuring factors on chromosome organization in E. coli . 
+ In agreement with the transverse disposition of the E. coli chromosome , the contact map displayed a single strong diagonal resulting from the enrichment of contacts between neighboring loci ( Figures 1A and S1A ) . 
+ The absence of a secondary diagonal perpendicular to this main one reflects the lack of contacts between the two replication arms and offers a sharp contrast to bacterial chromosomes characterized so far , such as that of C. crescentus ( Umbarger et al. , 2011 ; Le et al. , 2013 ) or B. subtilis ( Marbouty et al. , 2015 ; Wang et al. , 2015 ) . 
+ Replicate experiments performed on WT cells produced highly reproducible results , showing the robustness of this 3C protocol to investigate the spatial organization and patterns of DNA interactions throughout the E. coli chromosome . 
+ To facilitate the interpretation of contact maps , we developed a visualization tool dubbed `` scalogram , '' which represents , for each bin , the cumulated contact frequencies as a function of the genomic distance ( STAR Methods ; Figure S1E ) . 
+ A scalogram therefore displays the accumulated distribution of contacts for each bin with its flanking regions , reflecting the relative tightness of the contact distribution ( Figure 1D ) . 
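One way to compute such a scalogram from a contact matrix is sketched below. The exact procedure is given in the paper's STAR Methods, which are not part of this section, so the details here are assumptions: the chromosome is treated as circular, distances are measured in bins, and each row is normalized by its total contacts.

```python
import numpy as np

def scalogram(contacts):
    """Cumulative fraction of each bin's contacts within increasing distance.

    contacts: square (n x n) contact matrix of a circular chromosome.
    Returns an (n, n // 2 + 1) array; entry [i, d] is the fraction of bin i's
    contacts made with bins at circular distance <= d (in bins).
    """
    c = np.asarray(contacts, dtype=float)
    n = c.shape[0]
    idx = np.arange(n)
    diff = np.abs(idx[:, None] - idx[None, :])
    dist = np.minimum(diff, n - diff)  # circular distance between bins
    max_d = n // 2
    cum = np.zeros((n, max_d + 1))
    for d in range(max_d + 1):
        cum[:, d] = np.where(dist <= d, c, 0.0).sum(axis=1)
    totals = c.sum(axis=1, keepdims=True)
    return cum / totals

# Toy 6-bin circular map in which contacts decay with distance.
n = 6
i = np.arange(n)
d = np.abs(i[:, None] - i[None, :])
toy = 1.0 / (1.0 + np.minimum(d, n - d))
s = scalogram(toy)
```

A bin whose curve rises quickly makes most of its contacts at short range (a constrained region), whereas a slowly rising curve indicates contacts spread over larger distances.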
+ Abrupt changes in signal along the chromosome reveal three regions of 0.5 -- 1 Mb in size , each of which exhibits a distinct contact pattern : a single highly constrained domain around ter ( ter ) and two loosely structured regions ( L1 and L2 ) whose loci form contacts over larger distances . 
+ This genome segmentation correlated with domains identified by a directional index analysis that reports the degree of upstream and downstream interactions for a genomic region carried out at a scale of 400 kb ( Figure 1E ; STAR Methods ; Dixon et al. , 2012 ) . 
+ This organization was conserved across three replicates ( Figure S1A ) and in asynchronous E. coli cell populations growing in different media and at different temperatures ( Figures S1F and S1G ) . 
+ Overall , these three regions underline the segmentation of the chromosome into six intervals ( Figure 1F , top ) that correlate well with those defined by a genetic recombination assay that identified four ( Ori , ter , left and right ) MDs and two left and right non-structured regions ( Figure 1F , bottom ; Valens et al. , 2004 ) . 
+ In conclusion , the WT genome-wide contact map of E. coli validates previous genetic and imaging studies that revealed the non-homogeneous organization of the chromosome and confirmed the existence of a peculiar folding for the 
+ To determine whether the distribution of contacts made by a chromosomal locus correlated with its dynamics ( STAR Methods ) , the cumulative contact signal at various distances was compared with mean square displacements ( MSDs ) of chromosomal loci at short timescales ( Espeli et al. , 2008 ; Javer et al. , 2014 ; Figure S1H ) . 
+ This analysis revealed that cumulative contact signals at 200 kb best correlated with the MSD measured at 10 s by Javer et al. ( 2014 ) or at 180 s by Espeli et al. ( 2008 ) . 
+ As shown by the strong anti-correlation observed between the cumulated 3C signal at 200 kb and the MSDs of several loci ( Figure 1H ; Figure S1I ) , contact maps can provide insights into the 
+ Interplay between Transcription and Local Chromatin Structure
+ Chromosomal structures ranging in size from 15 to 33 kb ( Mycoplasma pneumoniae ; Trussart et al. , 2017 ) and 20 to 200 kb ( C. crescentus and B. subtilis ; Le et al. , 2013 ; Marbouty et al. , 2015 ) have been characterized . 
+ A directionality index ( DI ) analysis performed at a scale of 100 kb ( Le et al. , 2013 ) along the E. coli chromosome identiﬁed 31 CIDs ranging in size from 40 to 300 kb in exponential phase ( average , 150 kb ; Figure S2A ) . 
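A directionality index of the kind cited above (Dixon et al. , 2012) compares, for each bin, the contacts made with upstream and downstream neighbors within a fixed window. The sketch below follows the published formula, DI = sign(B - A) * ((A - E)^2 / E + (B - E)^2 / E) with E = (A + B) / 2, but the function name, the circular wrap-around, and the toy matrix are assumptions for illustration, not the authors' pipeline.

```python
def directionality_index(contacts, window_bins):
    """Directionality index per bin of a circular contact map.

    A = contacts with the `window_bins` bins upstream of the bin,
    B = contacts with the `window_bins` bins downstream,
    E = (A + B) / 2 is the expectation under no directional bias.
    Positive values indicate a downstream bias, negative an upstream bias.
    """
    n = len(contacts)
    di = []
    for i in range(n):
        a = sum(contacts[i][(i - d) % n] for d in range(1, window_bins + 1))
        b = sum(contacts[i][(i + d) % n] for d in range(1, window_bins + 1))
        e = (a + b) / 2.0
        if e == 0 or a == b:
            di.append(0.0)  # no contacts or no bias
            continue
        sign = 1.0 if b > a else -1.0
        di.append(sign * ((a - e) ** 2 / e + (b - e) ** 2 / e))
    return di


# Toy 4-bin map, for illustration only (not data from the study).
toy = [
    [0, 3, 0, 1],
    [3, 0, 0, 0],
    [0, 0, 0, 0],
    [1, 0, 0, 0],
]
di_values = directionality_index(toy, 1)
```

Domain boundaries are then typically called where the index switches from strongly negative to strongly positive; the boundary-calling rule itself is not specified in this section and is left out of the sketch.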
+ Boundaries were conserved across all exponential growth conditions ( Figure S1G ) and in different genetic backgrounds ( below ) . 
+ Boundaries were enriched in highly expressed , sometimes long ( Le and Laub , 2016 ; Marbouty et al. , 2015 ) transcription units ( 22 of 31 ; Figure S2B ; STAR Methods ) and in genes encoding proteins with an export signal sequence ( signal recognition particle [ SRP ] genes , 9 of 31 ) . 
+ This diversity suggests that multiple mechanisms may be responsible for deﬁning CID boundaries , including local decompaction of active transcribed regions ( Le and Laub , 2016 ) . 
+ We next investigated how transcription may affect short-range contacts along the chromosomes . 
+ The contact frequency between adjacent bins was plotted along the transcription proﬁle at resolutions of 2 kb and 5 kb . 
+ The strong correlation ( Pearson correlation [ PC ] > 0.5 ) between the two signals suggests that transcription levels correlate with short-range contact frequencies ( Figures S2C and S2D ) . 
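The comparison described above amounts to correlating two per-bin tracks: the contact frequency between each bin and its neighbor, and the transcription level of the same bin. The following is a minimal sketch under stated assumptions: `adjacent_contacts` and `pearson` are hypothetical helper names, the map is treated as circular, and the numbers are toy values rather than data from the study.

```python
import math


def adjacent_contacts(contacts):
    """Contact frequency between each bin and its downstream neighbor."""
    n = len(contacts)
    return [contacts[i][(i + 1) % n] for i in range(n)]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Toy values, for illustration only.
toy_map = [
    [9, 5, 1, 2],
    [5, 9, 3, 1],
    [1, 3, 9, 4],
    [2, 1, 4, 9],
]
toy_transcription = [50.0, 30.0, 40.0, 20.0]
short_range = adjacent_contacts(toy_map)  # [5, 3, 4, 2]
pc = pearson(short_range, toy_transcription)
```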
+ Contact frequencies as a function of genomic distance were then plotted for genes pooled according to their expression level ( Figure S2E ; STAR Methods ) , revealing that higher levels of expression correlated with an enrichment in short-range contacts as well as a stronger decay of the slope . 
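One way to picture this pooled analysis is to average, for a chosen group of bins, the contact frequency at each genomic distance. The sketch below is an assumption-laden illustration: `contact_decay` is a hypothetical helper, bins stand in for genes, the pooling by expression level is reduced to a list of bin indices, and only downstream distances on a circular map are considered.

```python
def contact_decay(contacts, bin_ids, max_distance_bins):
    """Mean contact frequency at distances 1..max_distance_bins.

    Only the bins listed in `bin_ids` (for example, bins pooled by
    expression level) contribute to the average at each distance.
    """
    if not bin_ids:
        raise ValueError("bin_ids must not be empty")
    n = len(contacts)
    decay = []
    for d in range(1, max_distance_bins + 1):
        values = [contacts[i][(i + d) % n] for i in bin_ids]
        decay.append(sum(values) / len(values))
    return decay


# Toy 4-bin map, for illustration only (not data from the study).
toy = [
    [4, 2, 1, 2],
    [2, 4, 2, 1],
    [1, 2, 4, 2],
    [2, 1, 2, 4],
]
highly_expressed_bins = [0, 1]  # hypothetical pooling
decay_curve = contact_decay(toy, highly_expressed_bins, 2)
```

Comparing such curves between pools (for instance on a log-log scale) shows whether one pool is enriched in short-range contacts and how steeply its contacts fall off with distance.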
+ Remarkably , these correlations were also observed in contact maps of other bacteria generated by different laboratories using different enzymes and crosslinking conditions ( Figures S2C and S2E ) , suggesting the existence of transcription-induced constraints that favor interactions between neighboring loci . 
+ Different interpretations can be provided to account for this observation . 
+ For instance , a less mobile locus ( above ) would indeed result in fewer long-range contacts and an increase in short-range contacts . 
+ Alternatively , a more open ﬁber may also lead to a stronger decay of the contact slope . 
+ The correlation was not apparent in C. crescentus , but similar trends were also observed in Drosophila ( Corrales et al. , 2017 ) . 
+ Contact maps of non-dividing E. coli cells in stationary phase revealed a large-scale chromosomal reorganization with an increase of long-range contacts more pronounced in ter ( Figures S1F and S1J ) . 
+ A directional index analysis performed at a scale of 100 kb identiﬁed 30 CIDs ( Figure S2F ) , and the boundaries identiﬁed under this condition were different from those identiﬁed in exponential phase ( Figure S2G ) . 
+ Interestingly , here too a signiﬁcant correlation ( PC = 0.40 ) was found between transcription levels and short-range contact frequencies ( Figures S2H and S2I ) . 
+ The causal relationships between these correlations ( transcription , short-range contact decay , and dynamics ) remain to be deciphered . 
+ Organization of ter and a Role for MatP in ter Insulation
+ The MatP protein plays a major role in the ter organization of enterobacteria ( Mercier et al. , 2008 ; Dupaigne et al. , 2012 ) . 
+ To gain insights into the molecular mechanism by which MatP bound to matS sites structures the ter MD , genomic 3C experiments were performed in a matP mutant . 
+ In the absence of MatP , the pattern of chromosome contacts was conserved across the genome , except for ter , which now appeared similar to the rest of the genome ( Figures 2A , 2B , and S3A ) . 
+ In the absence of MatP , an enrichment in long-range contacts ( more than 280 kb ) within ter and between ter and its ﬂanking regions appeared , accompanied by a compensatory decrease in contacts under 280 kb within ter ( Figure 2B ) . 
+ These results are consistent with genetic recombination assays that show , in the absence of MatP , an enrichment of long-range contacts within ter ( Figure 2C ) and between ter and its ﬂanking domains ( Mercier et al. , 2008 ) . 
+ Remarkably , matS sites did not display enriched contacts with each other ( Figures 2D and S3B ) . 
+ These observations prompted us to investigate more precisely the mechanism of MatP-mediated organization of ter . 
+ A matPΔC20 derivative unable to form MatP tetramers and to interact with ZapB ( Dupaigne et al. , 2012 ; Espéli et al. , 2012 ) did not signiﬁcantly modify the contact map ( Figure 3A ) , as shown by the ratio of normalized contact maps ( Figure S3C ) or the ratio plot of contact signals ( Figure 3B ) . 
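Comparisons of this kind rest on a bin-by-bin ratio between a mutant map and the WT map. A minimal sketch follows, assuming a simple total-sum scaling as a stand-in for whatever normalization the authors used (this section does not specify it) and a small pseudocount to avoid division by zero; `normalize` and `log2_ratio` are hypothetical names.

```python
import math


def normalize(contacts):
    """Scale a contact map so that all entries sum to 1 (stand-in normalization)."""
    total = sum(sum(row) for row in contacts)
    if total == 0:
        raise ValueError("contact map is empty")
    return [[value / total for value in row] for row in contacts]


def log2_ratio(mutant, wild_type, pseudocount=1e-9):
    """Bin-by-bin log2 ratio of two normalized maps of identical shape.

    Positive values mark contacts enriched in the mutant,
    negative values mark contacts depleted in the mutant.
    """
    m = normalize(mutant)
    w = normalize(wild_type)
    return [
        [math.log2((mv + pseudocount) / (wv + pseudocount)) for mv, wv in zip(mr, wr)]
        for mr, wr in zip(m, w)
    ]


# Toy 2-bin maps, for illustration only.
toy_mutant = [[2, 2], [2, 2]]
toy_wt = [[1, 3], [3, 1]]
ratio = log2_ratio(toy_mutant, toy_wt)
```

A ratio map that stays close to zero everywhere corresponds to the "did not significantly modify the contact map" outcome reported for this derivative.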
+ Similarly , a zapB mutation that abolishes the interaction of MatP with the divisome did not affect the contact pattern ( Figures 3C , 3D , and S3D ) . 
+ Therefore , tetramerization of MatP and the interaction of MatP with the division machinery are not required for maintaining the architecture of ter or for insulating it from ﬂanking MDs . 
+ These observations do not support the hypothesis that MatP tetramers bridge matS DNA sites into a single 800-kb intertwined domain ( Dupaigne et al. , 2012 ) but suggest instead that MatP binds to matS sites to organize the ter MD into a succession of overlapping subdomains . 
+ MatP has also been described as keeping sister ter regions together for an extended period following replication . 
+ We tested whether intermolecular interactions between chromosome and plasmids carrying matS sites could be revealed by chromosome conformation capture sequencing ( 3C-seq ) . 
+ The behavior of a plasmid with and without matS sites was investigated in WT , matPΔC20 , and zapB mutants . 
+ In WT cells , targeting of MatP at the septum ring promotes the anchoring of the replicated ter at the midcell . 
+ As a consequence , plasmids carrying matS sites also colocalize with ter at the midcell ( Espéli et al. , 2012 ) . 
+ Accordingly , the plasmid displayed enriched contacts with ter compared with a plasmid devoid of matS ( Figures 3E and 3F ) . 
+ Contacts between the plasmid and the chromosome did not exhibit a discrete pattern that would result from matS-matS contacts ( Figure 3E , inset ) , further conﬁrming that MatP does not speciﬁcally connect these sites . 
+ In the matPΔC20 and zapB mutants , plasmids with or without matS sites do not position at the midcell ( Dupaigne et al. , 2012 ; Espéli et al. , 2012 ) , but whether they interact with ter is unknown . 
+ In these mutants , although ter displays a WT organization ( Figures 3A and 3C ) , contacts between ter and the matS plasmid are lost ( Figures 3G -- 3J ) . 
+ This demonstrates that MatP-dependent intermolecular contacts between molecules ( or replicons ) carrying matS sites are not involved in ter structuring and that the colocalization of matS sites at the midcell requires ZapB . 
+ The Condensin MukBEF Promotes Long-Range Chromosome Folding
+ E. coli has one single SMC complex , MukBEF . 
+ This complex is essential for correct chromosome segregation and conformation ( Nolivos and Sherratt , 2014 ) . 
+ In a mukB deletion mutant , 20 % of anucleate cells are produced at a permissive temperature 
+ ( 11 % of right loci in 0.75 of the cell length in WT cells versus 22 % in mukB ; Figure S4C ) . 
+ To investigate the effect of MukBEF on chromosome organization at the molecular level , a contact map of cells depleted in mukB was generated ( Figures 4A and S5A ) . 
+ In the absence of condensin , the contact map ratio showed a reduction in long-range ( > 280 kb ) contacts concomitant with an increase of mid-range contacts up to 280 kb along the chromosome compared with the WT strain , except in ter ( Figures 4B and S5B ) . 
+ No signiﬁcant changes were detected in the ter MD of the mukB mutant compared with WT cells ( Figure 4B ) . 
+ Altogether , these results suggest that MukBEF promotes long-range ( > 280 kb ) contacts within replication arms outside of ter . 
+ Restriction of MukBEF-Dependent Long-Range Contacts in ter by MatP
+ Previous work in E. coli showed a physical interaction between MatP and MukB both in vivo and in vitro ( Nolivos et al. , 2016 ) . 
+ This interaction has been proposed to promote the displacement of MukB out of ter , facilitating the association of MukBEF with the Ori region . 
+ The unaltered contact pattern observed in ter in muk cells suggests that MukBEF is not active in ter organization . 
+ To test this hypothesis , the chromosome conformation contact map of a double mukB matP mutant ( Figure S5C ) was compared with a matP single mutant ( Figures S5C and S5D ) . 
+ These results showed a reduction in long-range contacts over the entire chromosome , including ter , in the absence of MukB ( Figure 4C ) , indicating that MatP impedes MukBEF activity in ter . 
+ In agreement with these data , the inactivation of MatP allows MukBEF to interact with ter and , hence , to increase long-range interactions ( > 280 kb ) in this region ( Figures 2B and 2C ) . 
+ Combined , our data reveal that MukBEF promotes long-range contacts along the chromosome , except in ter , where this activity is reduced or alleviated by MatP . 
+ Therefore , the peculiar structure of ter appears to result from a lack of access by MukB rather than from active folding promoted by MatP . 
+ HU Is Also Essential to Promote Long-Range Communication
+ Although NAPs have long been known to modulate DNA conformation by bending , wrapping , or bridging it , their exact contribution to chromosome folding in vivo is still unknown . 
+ The involvement of the NAPs Fis , H-NS , and HU in chromosome conformation was therefore investigated . 
+ NAP mutants were grown under conditions where growth defects were minimized ( Figure S4 ; STAR Methods ) . 
+ We ﬁrst focused on the conserved protein HU . 
+ In E. coli , HU exists as a heterodimer ( HUαβ ) or as a homodimer ( HUα2 ) ( Claret and Rouviere-Yaniv , 1997 ) and is one of the most abundant NAPs in exponential phase . 
+ The conserved HU protein binds non-speciﬁcally to DNA with a preference for AT-rich sequences ( Prieto et al. , 2012 ) . 
+ In the hupAB mutant , E. coli cells present segregation defects , ﬁlament formation , and nucleoid compaction ( Figures S4B and S4D ) . 
+ To determine the role of HU in chromosome conformation , 3C contact maps were produced for a hupAB mutant ( Figure 4D ) and 
+ Remarkably , this increase in contacts is similar to that observed in the absence of MukBEF activity outside of ter . 
+ In ter , the absence of HU leads to an increase in contacts in the 5 - to 50-kb range and a reduction in contacts in the 50 - to 280-kb range ( Figures 4E , S5E , and S5F ) . 
+ These results reveal the existence of multiple mechanisms of DNA folding in ter , with HU favoring contacts in the 50 - to 280-kb range . 
+ No correlation between the DNA binding proﬁle of HU with the contact map ratio at short scales was found ( Figures S5G -- S5I ) , whereas most CID borders identiﬁed in WT cells were retained in hupAB mutants ( 23 of 29 identiﬁed under these conditions ; Figure S5J ) . 
+ Altogether , these results show that HU is required to maintain DNA contacts in the megabase range outside of ter and up to 280 kb within ter . 
+ The Roles of Fis and H-NS in Chromosome Organization
+ Fis , the most abundant DNA-binding protein in E. coli , binds to 1,200 sites and modulates the expression of hundreds of genes ( Cho et al. , 2008 ; Kahramanoglou et al. , 2011 ) . 
+ Fis is also thought to play an important role in shaping nucleoid structure by bending DNA and promoting the branching of plectonemes ( Hardy and Cozzarelli , 2005 ; Skoko et al. , 2006 ) . 
+ In the absence of Fis , cells were longer ( 5.21 ± 1.8 μm versus 3.05 ± 0.74 μm ) , with minor chromosome segregation defects , and nucleoids were more spread out compared with WT cells ( Figures S4B and S4E ) . 
+ The 3C contact map for ﬁs showed that the overall chromosome conformation remained conserved compared with the WT ( Figure 5A ) , including CID boundaries ( 22 of 31 ; Figure S6A ) . 
+ However , the ratio of the contact maps ( Figure S6B ) and the ratio plot of contact signals ( Figures 5B and S6C ) between ﬁs and WT cells revealed an enrichment of contacts in the 5 - to 100-kb range , a strong decrease above 200 -- 400 kb , and a strong decrease above 100 kb in ter . 
+ The contact ratio along the genome did not correlate with the density of Fis binding sites ( Figures S6D -- S6G ) . 
+ To further investigate the reduction of contacts , a recombination assay was performed ; it conﬁrmed that contacts in the range of 250 kb are reduced outside of ter and that this effect is more pronounced in ter ( Figure 5C ) . 
+ Although the underlying mechanisms promoted by Fis responsible for this higher-order architecture remain unknown , these results show that this NAP is a global player of chromosome folding by promoting contacts beyond 100 kb without discrimination along the genome . 
+ The transcriptional repressor H-NS prevents the transcription of horizontally acquired genes in enterobacteria . 
+ Chromatin immunoprecipitation sequencing ( ChIP-seq ) experiments conﬁrmed that , in vivo , H-NS binds speciﬁcally to AT-rich sequences and spreads upon binding ( Kahramanoglou et al. , 2011 ) . 
+ To investigate the role of H-NS in chromosome organization , contact maps were determined in a deletion mutant . 
+ The overall conformation of the chromosome ( Figure 5D ) and the distribution of CIDs are highly conserved compared with WT cells ( 26 CID borders of 31 ; Figure S6H ) . 
+ The ratio of the contact maps ( Figure S6I ) and the ratio plot of contact signals ( Figures 5E and S6J ) between hns and WT contact patterns demonstrated variations in DNA contacts in the absence of H-NS , with the removal of H-NS resulting in a signiﬁcant enrichment in short-range contacts of H-NS binding regions ( Figures 5E , 5F , and S6C ) . 
+ To understand this local enrichment , the DNA binding proﬁle of H-NS was correlated with the contact map ratio at short scales and at 5 kb resolution ( Figure S6K ; STAR Methods ) . 
+ A signiﬁcant correlation was observed ( PC , 0.5 ; p = 7.89e-59 ) between the two parameters ( Figure 5G ) , with bins enriched in H-NS binding sites displaying two types of behaviors in the mutant : either an increase in short-range contacts ( 70 % overlap ) or no changes ( STAR Methods ) . 
+ The same analysis performed with 2-kb binning of the maps led to similar , slightly noisier results with 63 % overlap ( PC , 0.38 ; p = 2.6e-79 ; see also Figure S6K ) . 
+ This result shows that the local binding of H-NS in WT cells prevents a large fraction of its targets from interacting with their neighboring loci . 
+ The absence of changes for the other targets may result from other processes maintaining the local folding in the absence of H-NS or preventing H-NS from folding the DNA in the WT . 
+ We did not observe the previously reported H-NS-promoted juxtaposition of H-NS-regulated operons ( Wang et al. , 2011 ; Figures S6I and S6L ) , and no variations in long-range contacts were detected with the recombination assay ( Figure S6M ) . 
+ Thus , in the absence of H-NS , short-range contacts increase in many cases , suggesting that the local binding of H-NS in WT cells prevents these discrete regions from interacting with neighboring loci . 
+ Cooperation of MukBEF and NAPs for Long-Range Chromosome Organization in E. coli
+ Collectively , our results reveal complex , intertwined levels of higher-order organization of the E. coli nucleoid , with different players having contrasting roles in chromosome architecture ( Figures 6A and 6B ) . 
+ The three proteins Fis , HU , and MukBEF promote long-distance DNA contacts in the megabase range outside of ter in WT cells and on the whole chromosome in the matP mutant . 
+ In ﬁs-deﬁcient cells , long-range contacts above 300 -- 500 kb decreased , whereas , in the absence of HU or MukB , contacts below 280 kb are enriched . 
+ In ter , MatP ( or MatPΔC20 ) maintains contacts up to 280 kb by restricting the action of MukBEF . 
+ In the absence of HU , contacts in ter in the 5 - to 50-kb range increased concomitantly with a decrease in the 50 - to 280-kb range . 
+ Finally , in the absence of Fis , most contacts in ter occur in the 5 - to 100-kb range . 
+ Our results support a model in which MukBEF and HU cooperate to promote DNA contacts in the megabase range along the chromosome arms , MatP prevents MukBEF activity in ter , Fis favors DNA communications , and H-NS has only local effects on DNA conformation . 
+ Therefore , MatP appears to insulate ter from 
+ DISCUSSION
+ Global Organization of the E. coli Chromosome
+ The understanding of chromosome structuring and dynamics remains fragmented for most prokaryotic and eukaryotic species , hampering the study of their functional roles . 
+ Using 3C contact maps of the E. coli genome , we disclose the multilayer organization of the chromosome . 
+ The E. coli chromosome contact map points to an absence of contacts between arms , as expected from a transversal chromosome disposition . 
+ DNA collisions along the genome are not uniform , suggesting the existence of processes that modulate the probabilities of contacts at different scales . 
+ At a lower scale ( 40 -- 300 kb ) , CIDs are also present , as in C. crescentus , B. subtilis , and V. cholerae . 
+ The invariable nature of these structures in multiple growth and mutant conditions suggests local imprinting of a mark resulting in the systematic generation of boundaries within 3C datasets . 
+ Some of these boundaries appear to result from topological constraints ( Le et al. , 2013 ) . 
+ Abrupt changes of long-range contact frequencies conﬁrm the existence of larger domains ( 0.5 to 
+ Partitioning of the E. coli Chromosome
+ The present analyses reveal a partitioning of the E. coli genome into two entities , ter and the rest of the chromosome . 
+ In agreement with previous work ( Nolivos et al. , 2016 ) , our data support a model in which MatP impedes MukBEF activity from ter and reveal that MukBEF is required for long-range DNA contacts 
+ ( A ) Schematic of the bipartite structure of the E. coli chromosome , representing the differential contacts inside and outside of ter . 
+ Outside of ter ( nonter ) , contacts occur up to the megabase range . 
+ Upon inactivation of either MukB ( yellow line ) or HU ( red line ) , contacts are limited to 280 kb . 
+ Fis inactivation ( cyan ) variably affects the contact range along the genome . 
+ Inside ter , contacts are limited to 280 kb . 
+ In the absence of MatP , they become similar to those observed in non-ter ( dashed gray line ) . 
+ Upon inactivation of HU ( red line ) or Fis ( cyan ) , contacts above 50 kb and 100 kb are decreased , respectively . 
+ ( B ) Recapitulation of contact ranges ( CRs ) observed in different mutants for the ter region ( dashed blue box ) or the rest of the genome ( dashed gray box ) illustrates the role of MatP as a regulator of chromosomal partition . 
+ ( C ) Schematic of putative chromosome folding in ter and outside of ter . 
+ The arrows in the vicinity of MukBEF outside of ter represent a process not yet characterized , allowing MukBEF to promote long-range contacts . 
+ For clarity , plectonemic structures resulting from DNA supercoiling were omitted . 
+ interplay with the MukBEF complex cements the role of MatP as an important player in chromosome organization ; this activity does not require the formation of tetramers or its anchoring at the septum of division . 
+ MatP has already been shown to confer speciﬁc properties to ter by interacting with ZapB and localizing ter at the midcell ( Espéli et al. , 2012 ) . 
+ Here we demonstrate that the MatP-ZapB interaction favors intermolecular contacts between plasmids and the ter region , unveiling how sister ter MDs might be in contact before their segregation . 
+ MatP therefore speciﬁes ter by two different activities : ﬁrst by connecting the chromosome with the divisome through determinants in the C terminus of the protein and second by inhibiting MukBEF activity through other determinants . 
+ Whether only the promotion of long-range contact by MukBEF is excluded from the ter or whether other activities associated with the condensin complex are excluded as well remains unknown . 
+ A functional link between the role held by Topoisomerase IV ( TopoIV ) in decatenation and the facilitation of chromosome segregation by MukBEF has been proposed ( Zawadzki et al. , 2015 ) . 
+ Modulation of TopoIV recruitment to ter by MatP could thus control the extent of sister chromatid colocalization and coordinate the late stage of chromosome segregation with cell division ( Nolivos et al. , 2016 ) . 
+ It is noteworthy that MatP and MukBEF belong to a group of proteins that coevolved in enterobacteria along with Dam methylase and ten other proteins ( Brézellec et al. , 2006 ) , whose potential role in chromosome organization remains to be investigated . 
+ Crystal structures have revealed the dimerization of MatP dimers bound to matS sites . 
+ In vitro microscopic observations of MatP-dependent loops of DNA molecules carrying multiple matS suggest that ter organization is mediated by the bridging of distant matS sites ( Dupaigne et al. , 2012 ) . 
+ However , 3C analyses did not unveil any discrete in vivo intrachromosomal matS-matS contacts . 
+ Furthermore , the chromosome and plasmids carrying matS sites were not brought together in the absence of ZapB but in the presence of MatP . 
+ The same plasmids in the presence of ZapB contacted ter , but no discrete matS-matS interactions could be identiﬁed . 
+ Combined , these results show that MatP does not promote DNA bridges between matS sites , either in cis or in trans , but promotes the formation of a chromosomal domain by exclusion of a condensin complex . 
+ They also reveal that MatP-ZapB interactions at the divisome are responsible for the clustering of distinct DNA molecules carrying matS sites . 
+ A future challenge is to achieve a full understanding of how such DNA complexes are dynamically organized during the cell cycle . 
+ In the absence of MatP or in the absence of both MukBEF and MatP , ter shows the same range of interactions as the rest of the chromosome . 
+ These results reveal that the absence of MukB is the major determinant of the 3C signal in ter compared with the rest of the genome and that MatP itself has little effect on ter DNA contacts . 
+ However , this effect in ter is exacerbated in the absence of HU or Fis , indicating a counteraction between MatP and these two NAPs . 
+ NAPs Contribute in Diverse Ways to E. coli Chromosome Conformation
+ Decades of biochemical and genomic studies have been carried out for the three important NAPs HU , Fis , and H-NS . 
+ They typically cover 10 -- 30 bp of DNA at their binding sites , and , depending on local binding properties or additional interactions with other protein-bound DNA complexes , they can organize DNA into various conformations . 
+ However , the relation between local DNA binding and the in vivo organization of chromosomal DNA over long scales remains to be deﬁned . 
+ This study provides important insights into distinct activities of three major NAPs in the control of chromosome conformation , with H-NS affecting short-range contacts , whereas HU and Fis promote long-range contacts in different ways . 
+ The H-NS effect is in agreement with its modus operandi of silencing extensive regions of the bacterial chromosome by binding ﬁrst to nucleating high-afﬁnity sites and then spreading along AT-rich DNA ( Lang et al. , 2007 ) . 
+ HU is required along with MukBEF to promote a megabase range of communications in the chromosome . 
+ Thus , one can speculate that either HU cooperates with MukBEF to promote long-range interactions or that MukBEF activity builds on DNA properties generated by HU . 
+ Surprisingly , the inactivation of HU in E. coli has effects opposite to those observed in C. crescentus ( Le et al. , 2013 ) . 
+ This difference may either result from the presence in C. crescentus of SMC , a class of bacterial condensins different from MukBEF that would not require HU for its activity , or from the presence of another NAP in C. crescentus that would play the role of E. coli HU . 
+ The absence of Fis is less dramatic than the absence of HU in E. coli because the decrease of long-range DNA communication varies along the chromosome and may depend on local DNA properties resulting from the absence of Fis interacting with its targets . 
+ Finally , in ter , both HU and Fis are required for optimal contacts up to 280 kb , presumably by counteracting an effect of MatP . 
+ How MatP may 
+ Higher-Order Organization of the E. coli Chromosome
+ Our results reveal two modes of DNA communication in the E. coli chromosome ( Figures 6B and 6C ) . 
+ First , there is a long-range mode , homogeneous throughout most of the chromosome outside of ter , that depends on both MukBEF and HU action . 
+ Although the precise interplay between HU and MukBEF is unknown , these results provide signiﬁcant insights into the organization of bacterial chromosomes . 
+ The effect of the MukBEF complex in E. coli appears to be radically different from that of SMC in B. subtilis ( Marbouty et al. , 2015 ; Wang et al. , 2015 ) . 
+ Instead of aligning the chromosome arms from a centromere-like locus , MukB promotes DNA contacts in the megabase range within each replication arm . 
+ How these structurally related proteins promote such different processes remains unknown , but it may involve a similar mechanism . 
+ As invoked for B. subtilis arm bridging ( Marbouty et al. , 2015 ; Wang et al. , 2015 ) , DNA loop extrusion , a model by which molecular motors actively generate loops ( Alipour and Marko , 2012 ; Dekker and Mirny , 2016 ) , could also account for the MukBEF-dependent formation of dynamic long-range cis contacts along the E. coli chromosome arms . 
+ Interestingly , HU appears as a key cofactor of the DNA management process by MukB . 
+ In hupA hupB mutants , outside of ter , the absence of HU mimics the absence of MukBEF , suggesting a direct link between the activities of the two proteins . 
+ So far , no such general role for an accessory protein has been uncovered for bacterial condensins . 
+ The second mode of DNA communication is revealed in the absence of MukBEF and corresponds to enriched homogeneous DNA contacts within 280 kb that could result either from a condensation process bringing together distant loci or from dynamic process ( es ) resulting in frequent and transient collisions between these sites . 
+ Several factors can cooperate to generate such contacts . 
+ First , DNA supercoiling could inﬂuence the likelihood of distant loci to collide with each other by promoting the sliding of branched plectonemic structures ( Staczek and Higgins , 1998 ) . 
+ Second , NAPs may play an important role in modulating the ability of a locus to make contacts with ﬂanking sequences . 
+ The 3C results reported here indicate that both HU and Fis promote higher-order DNA organization . 
+ Further studies will aim to characterize the precise organization of DNA in the regions that contribute to these contacts . 
+ members of the R.K. , O.E. , and F.B. laboratories for fruitful discussions and advice . 
+ We thank the I2BC genomic facility for high-throughput sequencing . 
+ This research was supported by funding from the European Research Council under the 7th Framework Program ( FP7/2007-2013 , ERC Grant Agreement 260822 to R.K. ) , by the Agence Nationale pour la Recherche ( HiResBac ANR-15-CE11-0023-03 to R.K. , O.E. , and J.M. ) , by the Fondation pour la Recherche Médicale ( to V.S.L. ) , and by the Agence Nationale pour la Recherche ( ANR-12-BSV8-0020-01 to F.B. ) . 
+ AUTHOR CONTRIBUTIONS
+ Conceptualization , V.S.L. , A.C. , O.E. , F.B. , and R.K. ; Methodology , V.S.L. , A.C. , M.M. , and J.M. ; Investigation , V.S.L. , A.C. , and S.D. ; Writing -- Draft , V.S.L. , A.C. , F.B. , and R.K. ; Writing -- Review & Editing , O.E. , F.B. , and R.K. ; Funding Acquisition , F.B. and R.K. ; Supervision , F.B. and R.K. 
+ Dillon, S.C., and Dorman, C.J. (2010). Bacterial nucleoid-associated proteins, nucleoid structure and gene expression. Nat. Rev. Microbiol. 8, 185–195.
+ Niki, H., Yamaichi, Y., and Hiraga, S. (2000). Dynamic organization of chromosomal DNA in Escherichia coli. Genes Dev. 14, 212–223.
+ Nolivos , S. , and Sherratt , D. ( 2014 ) . The bacterial chromosome : architecture and action of bacterial SMC and SMC-like complexes . FEMS Microbiol . Rev. 38 , 380 -- 392 . 
+ Nolivos , S. , Upton , A.L. , Badrinarayanan , A. , Müller , J. , Zawadzka , K. , Wiktor , J. , Gill , A. , Arciszewska , L. , Nicolas , E. , and Sherratt , D. ( 2016 ) . MatP regulates the coordinated action of topoisomerase IV and MukBEF in chromosome segregation . Nat . Commun . 7 , 10466 . 
+ Prieto , A.I. , Kahramanoglou , C. , Ali , R.M. , Fraser , G.M. , Seshasayee , A.S.N. , and Luscombe , N.M. ( 2012 ) . Genomic analysis of DNA binding and gene regulation by homologous nucleoid-associated proteins IHF and HU in Escherichia coli K12 . Nucleic Acids Res . 40 , 3524 -- 3537 . 
+ Skoko , D. , Yoo , D. , Bai , H. , Schnurr , B. , Yan , J. , McLeod , S.M. , Marko , J.F. , and Johnson , R.C. ( 2006 ) . Mechanism of chromosome compaction and looping by the Escherichia coli nucleoid protein Fis . J. Mol . Biol . 364 , 777 -- 798 . 
+ Staczek , P. , and Higgins , N.P. ( 1998 ) . Gyrase and Topo IV modulate chromosome domain size in vivo . Mol . Microbiol . 29 , 1435 -- 1448 . 
+ Tjaden , B. ( 2015 ) . De novo assembly of bacterial transcriptomes from RNA-seq data . Genome Biol . 16 , 1 . 
+ Trussart , M. , Yus , E. , Martinez , S. , Baù , D. , Tahara , Y.O. , Pengo , T. , Widjaja , M. , Kretschmer , S. , Swoger , J. , Djordjevic , S. , et al. ( 2017 ) . Deﬁned chromosome structure in the genome-reduced bacterium Mycoplasma pneumoniae . Nat . Commun . 8 , 14665 . 
+ Umbarger , M.A. , Toro , E. , Wright , M.A. , Porreca , G.J. , Baù , D. , Hong , S.-H. , Fero , M.J. , Zhu , L.J. , Marti-Renom , M.A. , McAdams , H.H. , et al. ( 2011 ) . 
+ The three-dimensional architecture of a bacterial genome and its alteration by ge ¬ 
+ Val , M.-E. , Marbouty , M. , de Lemos Martins , F. , Kennedy , S.P. , Kemble , H. , Bland , M.J. , Possoz , C. , Koszul , R. , Skovgaard , O. , and Mazel , D. ( 2016 ) . A checkpoint control orchestrates the replication of the two chromosomes of Vibrio cholerae . Sci . Adv. 2 , e1501914 . 
+ Valens , M. , Penaud , S. , Rossignol , M. , Cornet , F. , and Boccard , F. ( 2004 ) . Macrodomain organization of the Escherichia coli chromosome . EMBO J. 23 , 4330 -- 4341 . 
+ METHODS
\ No newline at end of file
--- a/data/TEXT_FILES/useful_txt/29394395.txt 0 → 100644
+++ b/data/TEXT_FILES/useful_txt/29394395.txt 0 → 100644
+ Systems assessment of transcriptional regulation on
+ 1Department of Bioengineering , University of California , San Diego , La Jolla , CA 92093 , USA , 2Department of Genetic Engineering , College of Life Sciences , Kyung Hee University , Yongin 446 -- 701 , Republic of Korea , 3School of Chemical and Biological Engineering , Institute of Chemical Process , Seoul National University , 1 Gwanak-ro , Gwanak-gu , Seoul 08826 , Republic of Korea , 4Division of Biological Science , University of California , San Diego , La Jolla , CA 92093 , USA , 5School of Information and Communication , Gwangju Institute of Science and Technology , 123 Cheomdan-gwagiro , Buk-gu , Gwangju , Republic of Korea , 6Department of Biological Sciences , Korea Advanced Institute of Science and Technology , Daejeon 34141 , Republic of Korea , 7Department of Pediatrics , University of California , San Diego , La Jolla , CA 92093 , USA and 8The Novo Nordisk Foundation Center for Biosustainability , Danish Technical University , 6 Kogle Alle , Hørsholm , Denmark 
+ Received October 28, 2016; Revised January 21, 2018; Editorial Decision January 23, 2018; Accepted January 24, 2018
+ ABSTRACT
+ Two major transcriptional regulators of carbon metabolism in bacteria are Cra and CRP . 
+ CRP is considered to be the main mediator of catabolite repression . 
+ Unlike for CRP , in vivo DNA binding information of Cra is scarce . 
+ Here we generate and integrate ChIP-exo and RNA-seq data to identify 39 binding sites for Cra and 97 regulon genes that are regulated by Cra in Escherichia coli . 
+ An integrated metabolic-regulatory network was formed by including experimentally-derived regulatory information and a genome-scale metabolic network reconstruction . 
+ Applying analysis methods of systems biology to this integrated network showed that Cra enables optimal bacterial growth on poor carbon sources by redirecting and repressing glycolysis flux , by activating the glyoxylate shunt pathway , and by activating the respiratory pathway . 
+ In these regulatory mechanisms , the overriding regulatory activity of Cra over CRP is fundamental . 
+ Thus , elucidation of interacting transcriptional regulation of core carbon metabolism in bacteria by two key transcription factors was possible by combining genome-wide experimental measurement and simulation with a genome-scale metabolic model . 
+ INTRODUCTION
+ Catabolite repression is a universal phenomenon , found in virtually all living organisms , ranging from bacteria to plants and animals ( 1,2 ) . 
+ There is accumulating evidence to support that numerous mechanisms of catabolite repression exist within a single bacterium . 
+ A mechanism involving cyclic AMP ( cAMP ) and its receptor protein ( CRP , cAMP receptor protein ) in Escherichia coli was established four decades ago ( Figure 1A ) ( 3 ) . 
+ Given the general acceptance that cAMP-CRP provides the principal means to effect catabolite repression in E. coli and the closely related enteric bacteria , many aspects of CRP have been studied , including protein structure and allosteric activation ( 4 ) , mechanisms of transcriptional regulation ( 5 ) , and catabolite repression ( 6 ) . 
+ Thus , CRP is one of the best characterized transcription factors ( TF ) in bacteria . 
+ The transcriptional regulator CRP is reported to regulate the expression of over 180 genes ( 7,8 ) . 
+ A Chromatin Immuno-Precipitation ( ChIP ) method was used to determine in vivo binding sites of CRP in Escherichia coli K-12 MG1655 ( 7 ) and other strains ( 9 ) . 
+ The concentration of the effector molecule cAMP for CRP has been also determined with experimental methods , and the concentration increased significantly when less favorable carbon sources were supplemented ( 10 ) . 
+ The carbon metabolism of enterobacteria , including E. coli , is globally regulated by two major TFs ( 1 ) . 
+ In addition to the catabolite repression/activation mechanism by CRP , there is another mechanism mediated by catabolite repressor activator ( Cra ) , which was initially named fructose repressor ( FruR ) ( 11 ) . 
+ Cra plays a pleiotropic role to modulate the direction of carbon flow in multiple metabolic pathways , particularly in glycolysis . 
+ However , it has been postulated that Cra works independently of the CRP regulation ( 12,13 ) . 
+ Multiple studies with expression profiling experiments showed that Cra is capable of regulating a large number of genes in the gluconeogenic pathway ( 11 ) , TCA cycle ( 14 ) , glyoxylate shunt ( 15 ) , and Entner-Doudoroff ( ED ) pathway ( 13 ) . 
+ Cra regulates glycolytic flux by sensing the concentration level of fructose-1,6-bisphosphate ( FBP ) or fructose-1-phosphate ( F1P ) ( Figure 1A ) ( 16 ) . 
+ Previous studies demonstrated that the concentration of the effector molecule FBP increases significantly as glucose becomes more limited ( 16,17 ) . 
+ In a recently published study , cAMP , FBP and F1P alone can explain most of the specific transcriptional regulation of the core carbon metabolism through their interaction with CRP and Cra ( 18 ) . 
+ Thus an investigation of the transcriptional regulation by CRP and Cra at the genome-scale would contribute to better understanding of the carbon metabolism in bacteria . 
+ Unlike CRP , the definition of the Cra regulon has mostly relied on transcriptome analysis or in vitro assays ( 19 ) , and the in vivo identification of the Cra regulon is yet to be performed at a genome-scale . 
+ Thus , the recently developed Chromatin Immuno-Precipitation with Exonuclease treatment ( ChIP-exo ) ( 20 -- 23 ) was applied to identify in vivo binding sites of Cra at the genome-scale on three different carbon sources ( glucose , fructose , and acetate ) , to enable the definition of the Cra regulon . 
+ In addition , expression profiling on different carbon sources was performed with E. coli wild-type and the cra deletion mutant to identify causal effects of the ChIP-exo identified Cra binding sites on gene expression . 
+ Using a model-based simulation , regulation of metabolic flux states by both the Cra and CRP regulons was analyzed on 38 different carbon sources including glucose , fructose , and acetate . 
+ Flux states of pathways were established with the genome-scale metabolic network model of E. coli ( 24 ) using flux balance analysis ( FBA ) ( 25 ) . 
+ Integration of experimentally derived regulatory information with in silico calculation of flux states of core carbon metabolism revealed the transcriptional regulation by Cra of glycolysis , the TCA cycle , and the respiratory chain with emphasis on the overriding regulatory activity of Cra over CRP . 
+ MATERIALS AND METHODS
+ Bacterial strains, media and growth conditions
+ All strains used in this study are E. coli K-12 MG1655 and its derivatives , knock-out strains and a myc-tagged strain ( Supplementary Table S1 ) . 
+ For ChIP-exo experiments , the E. coli strain harboring cra-8myc was generated as described previously ( 26 ) . 
+ For growth rate measurement and expression profiling by RNA-seq , the deletion mutants Δcra and Δcrp were constructed using the λ Red-mediated site-specific recombination system ( 27 ) . 
+ The myc-tagged strain E. coli cra-8myc showed no growth change when grown on glucose and acetate compared to the WT , indicating the myc epitope did not change the binding activity of Cra ( Supplementary Figure S1 ) . 
+ For deletion mutants , RpoB ChIP-exo background signals were analyzed to confirm the target gene region was the only removed region for cra and crp ( Supplementary Figure S2 ) . 
+ For growth rate measurement , glycerol stocks of E. coli strains were inoculated into M9 minimal media with different carbon sources , glucose , fructose , galactose , succinate , glycerol or acetate . 
+ The concentration of carbon sources was 0.2 % ( w/v ) . 
+ M9 minimal media was also supplemented with 1 ml trace element solution ( 100 × ) containing 1 g EDTA , 29 mg ZnSO4 · 7H2O , 198 mg MnCl2 · 4H2O , 254 mg CoCl2 · 6H2O , 13.4 mg CuCl2 and 147 mg CaCl2 . 
+ The culture was incubated at 37 °C overnight with agitation , and then was used to inoculate the fresh media ( 1/200 dilution ) . 
+ The volume of the fresh media was 150 ml for each biological replicate . 
+ The growth curve measurement was performed in a batch culture , and was repeated twice with three biological replicates . 
+ From the optical density measurements , the growth rate and the lag phase duration were calculated with GrowthRates 2.0 ( 28 ) . 
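+ As an illustration of the growth rate calculation , a minimal sketch is given below . 
+ It assumes that the specific growth rate is the least-squares slope of ln ( OD ) against time over exponential-phase points ; it does not reproduce the internal algorithm of GrowthRates 2.0 , and the function names and OD readings are hypothetical . 

```python
import math


def growth_rate(times_h, ods):
    """Least-squares slope of ln(OD) versus time, in 1/h."""
    logs = [math.log(od) for od in ods]
    n = len(times_h)
    mean_t = sum(times_h) / n
    mean_y = sum(logs) / n
    sxx = sum((t - mean_t) ** 2 for t in times_h)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(times_h, logs))
    return sxy / sxx


def doubling_time(mu):
    """Doubling time in hours for a specific growth rate mu (1/h)."""
    return math.log(2) / mu


# Hypothetical exponential-phase readings: OD doubles every hour.
times = [0.0, 1.0, 2.0, 3.0]
ods = [0.05, 0.10, 0.20, 0.40]
mu = growth_rate(times, ods)
```

+ Only points from the exponential phase should be passed in ; lag-phase estimation is not covered by this sketch . 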
+ For RNA-seq expression profiling , glycerol stocks of E. coli strains were inoculated into M9 minimal media with different carbon sources , glucose , fructose or acetate . 
+ The concentration of carbon sources was 0.2 % ( w/v ) . 
+ M9 minimal media was also supplemented with 1 ml trace element solution ( 100X ) . 
+ The culture was incubated at 37 °C overnight with agitation , and then was used to inoculate the fresh media . 
+ The fresh culture was incubated at 37 °C with agitation to the mid-log phase ( OD600 ≈ 0.5 for glucose and fructose , and OD600 ≈ 0.25 for acetate ) . 
+ ChIP-exo experiment
+ ChIP-exo experiment was performed following the procedures previously described ( 20 ) . 
+ In brief , to identify Cra binding maps in vivo , we isolated the DNA bound to Cra from formaldehyde cross-linked E. coli cells by chromatin immunoprecipitation ( ChIP ) with an antibody that specifically recognizes the myc tag ( 9E10 , Santa Cruz Biotechnology ) and Dynabeads Pan Mouse IgG magnetic beads ( Invitrogen ) , followed by stringent washings as described previously ( 29 ) . 
+ ChIP materials ( chromatin-beads ) were used to perform on-bead enzymatic reactions of the ChIP-exo method ( 20,30 ) . 
+ Briefly , the sheared DNA of chromatin-beads was repaired by the NEBNext End Repair Module ( New England Biolabs ) followed by the addition of a single dA overhang and ligation of the first adaptor ( 5 ′ - phosphorylated ) using dA-Tailing Module ( New England Biolabs ) and NEBNext Quick Ligation Module ( New England Biolabs ) , respectively . 
+ Nick repair was performed by using PreCR Repair Mix ( New England Biolabs ) . 
+ Lambda exonuclease- and RecJf exonuclease-treated chromatin was eluted from the beads and the protein -- DNA cross-link was reversed by overnight incubation at 65 °C . 
+ DNA samples from which RNAs and proteins had been removed were used to perform primer extension and second adaptor ligation with the following modifications . 
+ The DNA samples incubated for primer extension as described previously ( 20 ) were treated with dA-Tailing Module ( New England Biolabs ) and NEBNext Quick Ligation Module ( New England Biolabs ) for second adaptor ligation . 
+ The DNA sample purified by GeneRead Size Selection Kit ( Qiagen ) was enriched by polymerase chain reaction ( PCR ) using Phusion High-Fidelity DNA Polymerase ( New England Biolabs ) . 
+ The amplified DNA samples were purified again by GeneRead Size Selection Kit ( Qiagen ) and quantified using Qubit dsDNA HS Assay Kit ( Life Technologies ) . 
+ Quality of the DNA sample was checked by running Agilent High Sensitivity DNA Kit using Agilent 2100 Bioanalyzer ( Agilent ) before sequencing on a MiSeq ( Illumina ) in accordance with the manufacturer 's instructions . 
+ Each modified step was also performed in accordance with the manufacturer 's instructions . 
+ ChIP-exo experiments were performed in biological duplicate . 
+ RNA-seq expression profiling
+ Three milliliters of cells from mid-log phase culture were mixed with 6 ml RNAprotect Bacteria Reagent ( Qiagen ) . 
+ Samples were mixed immediately by vortexing for 5 s , incubated for 5 min at room temperature , and then centrifuged at 5000 × g for 10 min . 
+ The supernatant was decanted and any residual supernatant was removed by inverting the tube once onto a paper towel . 
+ Total RNA samples were then isolated using RNeasy Plus Mini kit ( Qiagen ) in accordance with the manufacturer 's instruction . 
+ Samples were then quantified using a NanoDrop 1000 spectrophotometer ( Thermo Scientific ) and quality of the isolated RNA was checked by running RNA 6000 Pico Kit using Agilent 2100 Bioanalyzer ( Agilent ) . 
+ Paired-end , strand-specific RNA-seq was performed using the dUTP method ( 31 ) with the following modifications , as previously described ( 20 ) . 
+ The ribosomal RNAs were removed from 2 μg of isolated total RNA with Ribo-Zero rRNA Removal Kit ( Epicentre ) in accordance with the manufacturer 's instruction . 
+ Subtracted RNA was fragmented for 2.5 min at 70 °C with RNA Fragmentation Reagents ( Ambion ) , and then fragmented RNA was recovered with ethanol precipitation . 
+ Random primers ( 3 μg ) and fragmented RNA in 4 μl were incubated in a total volume of 5 μl at 70 °C for 10 min , and the first-strand cDNA was synthesized using the SuperScript III first-strand synthesis protocol ( Invitrogen ) . 
+ The cDNA was recovered by phenol -- chloroform extraction followed by ethanol precipitation . 
+ The second strand was synthesized from this cDNA with 20 μl of fragmented cDNA : RNA , 4 μl of 5 × first strand buffer , 30 μl of 5 × second strand buffer , 4 μl of 10 mM dNTP with dUTP instead of dTTP , 2 μl of 100 mM DTT , 4 μl of E. coli DNA polymerase ( Invitrogen ) , 1 μl of E. coli DNA ligase ( Invitrogen ) , 1 μl of E. coli RNase H ( Invitrogen ) in 150 μl of total volume . 
+ This reaction mixture was incubated at 16 °C for 2 h , and fragmented DNA was recovered with PCR clean-up kit ( QIAGEN ) and eluted in 30 μl of nuclease-free water . 
+ The fragmented DNA was end-repaired with End Repair Kit ( New England Biolabs ) , and dA-tailed with dA-Tailing Kit ( New England Biolabs ) , and then ligated with 7.5 μg of DNA adaptor mixture with Quick Ligation Kit ( New England Biolabs ) . 
+ The adaptor-ligated DNA was size-selected to remove un-ligated adaptors with GeneRead Size Selection Kit ( QIAGEN ) , treated with 1 U of USER enzyme ( New England Biolabs ) in 30 μl of total volume , and incubated at 37 °C for 15 min followed by 5 min at 95 °C . 
+ The USER-treated DNA was amplified by PCR to generate sequencing library for Illumina sequencing . 
+ The samples were sequenced using MiSeq ( Illumina ) in accordance with the manufacturer 's instructions . 
+ All RNA-seq experiments were performed in biological duplicate . 
+ Real-time qPCR
+ The same total RNA samples used for RNA-seq were used for real-time qPCR . 
+ The starting RNA material was 10 μg of the total RNA . 
+ The reaction mixture ( 60 μl ) contained total RNA , random primers , 1 × first-strand buffer , 10 mM DTT , 0.5 mM deoxyribonucleotide triphosphates , 20 units of SUPERase-In and 600 units of SuperScript II reverse transcriptase ( Life Technologies ) . 
+ The mixture was incubated in a thermocycler at 25 °C for 10 min , 37 °C for 1 h , 42 °C for 1 h and 70 °C for 10 min to inactivate SuperScript II . 
+ The RNA template was then removed by adding 20 μl of 1 M NaOH to the reaction mixture and incubating at 65 °C for 30 min . 
+ The reaction was neutralized by the addition of 20 μl of 1 M HCl . 
+ cDNA was purified using a QIAquick PCR purification column ( QIAGEN ) , following the vendor procedures . 
+ cDNA quantification was performed using a NanoDrop spectrophotometer . 
+ Real-time qPCR was performed on the synthesized cDNA using a QuantiTect SYBR Green PCR Kit ( QIAGEN ) . 
+ The 25 μl qPCR mixtures contained 12.5 μl of 2 × QuantiTect SYBR Green PCR Master Mix , 0.2 μM forward primer , 0.2 μM reverse primer , and cDNA template . 
+ Each qPCR assay was performed in triplicate in a Bio-Rad iCycler under the following conditions : 95 °C for 15 min , followed by 40 cycles of denaturation at 94 °C for 15 s , annealing at 52 °C for 30 s and extending at 72 °C for 30 s , at which point the SYBR fluorescence was measured for qPCR curve generation . 
+ The real-time qPCR was performed in duplicates . 
+ The binding affinity of each primer set was assessed by constructing a standard curve for each primer . 
+ The standard curve allowed for calculation of reaction efficiency . 
+ Relative quantities of cDNA were calculated using the standard curve and normalizing to the quantity of a housekeeping gene , rplW . 
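+ The standard-curve quantification described above can be sketched as follows . 
+ The sketch assumes the usual linear model Ct = slope × log10 ( quantity ) + intercept and the usual efficiency formula 10^( -1/slope ) - 1 ; the function names and the dilution series are hypothetical and are not taken from the study . 

```python
import math


def fit_standard_curve(quantities, cts):
    """Fit Ct = slope * log10(quantity) + intercept by least squares."""
    xs = [math.log10(q) for q in quantities]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(cts) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, cts))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def efficiency(slope):
    """Amplification efficiency implied by the slope (1.0 means perfect doubling)."""
    return 10 ** (-1.0 / slope) - 1.0


def quantity_from_ct(ct, slope, intercept):
    """Invert the standard curve to obtain a relative quantity."""
    return 10 ** ((ct - intercept) / slope)


def normalized_quantity(ct_target, curve_target, ct_ref, curve_ref):
    """Target quantity divided by the housekeeping-gene quantity."""
    return quantity_from_ct(ct_target, *curve_target) / quantity_from_ct(ct_ref, *curve_ref)


# Hypothetical ten-fold dilution series for one primer set.
slope, intercept = fit_standard_curve([1, 10, 100, 1000], [30.0, 26.0, 22.0, 18.0])
```

+ Each primer set gets its own curve , so the target gene and the housekeeping gene ( rplW in the text ) are converted to quantities separately before the ratio is taken . 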
+ Results were reported in a bar chart showing relative normalized enrichment ratio . 
+ Peak calling for ChIP-exo dataset
+ Peak calling was performed as previously described ( 20 ) . 
+ Sequence reads generated from ChIP-exo were mapped onto the reference genome ( NC_000913.2 ) using bowtie ( 32 ) with default options to generate SAM output files ( Supplementary Table S2 ) . 
+ MACE program ( 33 ) was used to define peak candidates from biological duplicates for each experimental condition with sequence depth normalization . 
+ To reduce false-positive peaks , peaks with signal-to-noise ( S/N ) ratio < 1.5 were removed . 
+ The noise level was set to the top 5 % of signals at genomic positions , because the top 5 % forms a plateau at the background level and the top 5 % intensities from each ChIP-exo replicate across conditions correlate well with the total number of reads ( 20 -- 22 ) . 
+ The S/N ratio is calculated in a manner similar to ChIP-chip peak intensity , where the IP signal is divided by the mock signal . 
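+ A minimal sketch of the S/N filter is given below . 
+ It assumes that the noise level is the smallest value within the top 5 % of per-position signals ( that is , the 95th-percentile cutoff ) ; the text does not state the exact statistic , so this choice , the function names and the example values are assumptions . 

```python
def noise_level(signal, top_fraction=0.05):
    """Smallest signal value that still falls within the top fraction of positions."""
    ranked = sorted(signal, reverse=True)
    k = max(1, round(len(ranked) * top_fraction))
    return ranked[k - 1]


def filter_peaks(peak_signals, noise, min_sn=1.5):
    """Keep peaks whose signal-to-noise ratio is at least min_sn."""
    return {name: s / noise for name, s in peak_signals.items() if s / noise >= min_sn}


# Hypothetical per-position signal and two hypothetical peak candidates.
signal = list(range(1, 101))
noise = noise_level(signal)
kept = filter_peaks({"peak1": 192.0, "peak2": 120.0}, noise)
```

+ Peaks with S/N below 1.5 are dropped , matching the threshold stated in the text . 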
+ Then , each peak was assigned to the nearest gene . 
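+ The nearest-gene assignment can be sketched as follows . 
+ The distance definition used here ( zero inside a gene , otherwise the distance to the closer gene boundary ) is an assumption , since the text does not define it ; gene names and coordinates are hypothetical . 

```python
def nearest_gene(peak_pos, genes):
    """Return the name of the gene closest to peak_pos.

    genes is a list of (name, start, end) tuples with start <= end.
    """
    def distance(gene):
        _, start, end = gene
        if start <= peak_pos <= end:
            return 0
        return min(abs(peak_pos - start), abs(peak_pos - end))

    return min(genes, key=distance)[0]


# Hypothetical annotation.
genes = [("geneA", 100, 500), ("geneB", 900, 1500)]
```

+ A strand-aware version would measure the distance to the gene start instead , which matters when a peak lies between two divergently transcribed genes . 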
+ Genome-scale data were visualized using MetaScope ( http://systemsbiology.ucsd.edu/Downloads/MetaScope ) . 
+ The sequence motif analysis for TFs and σ-factors was performed using the MEME software suite ( 34 ) . 
+ For Cra , sequences in binding regions were extracted from the reference sequence ( NC_000913.2 ) . 
+ Calculation of differentially expressed genes
+ Sequence reads generated from RNA-seq were mapped onto the reference genome ( NC_000913.2 ) using bowtie ( 32 ) with the maximum insert size of 1000 bp , and two maximum mismatches after trimming 3 bp at 3 ′ ends ( Supplementary Table S3 ) . 
+ SAM files generated from bowtie , were then used for Cufflinks ( http://cufflinks.cbcb.umd.edu/ ) ( 35 ) to calculate fragments per kilobase of exon per million fragments ( FPKM ) . 
+ Cufflinks was run with default options with the library type of dUTP RNA-seq and the default normalization method ( classic-fpkm ) . 
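+ For reference , the textbook definition of FPKM can be written as a short function . 
+ This is only the basic formula ; Cufflinks additionally estimates effective transcript lengths and fragment assignment , which this sketch does not attempt , and the numbers are hypothetical . 

```python
def fpkm(fragments, length_bp, total_fragments):
    """Fragments per kilobase of transcript per million mapped fragments."""
    return fragments * 1e9 / (length_bp * total_fragments)


# Hypothetical gene: 500 fragments, 1000 bp long, 2 million mapped fragments in total.
value = fpkm(500, 1000, 2_000_000)
```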
+ Differentially expressed genes were calculated with DESeq2 ( 36 ) and expression with log2 fold change ≥ 1.0 and adjusted P-value ≤ 0.05 was considered as differentially expressed . 
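+ The thresholding step can be sketched as a simple filter over a table of DESeq2-style results . 
+ The sketch assumes that the fold-change cutoff applies to the absolute value of log2 fold change ( so that both up- and down-regulated genes are kept ) ; gene identifiers and values are hypothetical . 

```python
def is_differential(log2_fc, padj, lfc_cut=1.0, padj_cut=0.05):
    """True when |log2 fold change| >= lfc_cut and adjusted P-value <= padj_cut."""
    return abs(log2_fc) >= lfc_cut and padj <= padj_cut


# Hypothetical results: gene -> (log2 fold change, adjusted P-value).
results = {
    "g1": (1.8, 0.001),
    "g2": (-2.3, 0.04),
    "g3": (0.4, 0.0001),
    "g4": (3.0, 0.20),
}
de_genes = sorted(g for g, (lfc, padj) in results.items() if is_differential(lfc, padj))
```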
+ Genome-scale data were visualized using MetaScope . 
+ COG functional enrichment
+ Cra regulons were categorized according to their annotated clusters of orthologous groups ( COG ) category . 
+ Functional enrichment of COG categories in Cra target genes was determined by performing hypergeometric test , and P-value < 0.05 was considered significant . 
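+ The hypergeometric enrichment test can be sketched with the standard library alone . 
+ The sketch computes the upper-tail probability of seeing at least k category members among n target genes ; the parameter names and the toy numbers are assumptions for illustration . 

```python
from math import comb


def hypergeom_pvalue(k, n, K, N):
    """P(X >= k) when n genes are drawn from N genes, K of which belong to the category."""
    upper = min(n, K)
    total = comb(N, n)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / total


# Toy case: 10 genes in total, 5 in the category, 5 targets, all 5 in the category.
p = hypergeom_pvalue(5, 5, 5, 10)
```

+ A category would be called enriched when this value falls below the 0.05 cutoff stated in the text . 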
+ FBA and MCMC sampling were performed with the iJO1366 E. coli metabolic model ( 24 ) , COBRA Toolbox v2.0 ( 37 ) and COBRApy ( 38 ) as previously described ( 39 ) . 
+ In brief , the distribution of feasible fluxes for each reaction in the iJO1366 model was determined using Markov chain Monte Carlo ( MCMC ) sampling ( 40 ) . 
+ Specifically , uptake rates for the carbon sources were measured with HPLC and were used to constrain the model : -- 8.437 mmol/gDW/h for glucose , -- 7.546 mmol/gDW/h for fructose and -- 7.671 mmol/gDW/h for acetate . 
+ The biomass objective function ( a proxy for growth rate ) was provided a lower bound of 95 % of the optimal growth rate as computed by FBA . 
+ Thus , the flux distributions obtained by MCMC sampling represent sub-optimal flux distributions . 
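+ In standard notation , the constraints described above can be written as follows . 
+ The symbols are assumptions introduced here for illustration : S is the stoichiometric matrix , v the flux vector with bounds v^min and v^max , and c the vector that selects the biomass reaction . 

```latex
\begin{aligned}
\mu^{*} \;=\; \max_{v}\quad & c^{\top} v \\
\text{subject to}\quad & S\,v = 0, \\
                       & v^{\min} \le v \le v^{\max},
\end{aligned}
\qquad
\text{sampled region: }
\left\{\, v \;:\; S\,v = 0,\;\; v^{\min} \le v \le v^{\max},\;\; c^{\top} v \ge 0.95\,\mu^{*} \,\right\}
```

+ The measured uptake rates enter through the bounds on the corresponding exchange reactions ( for example , -- 8.437 mmol/gDW/h for glucose ) . 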
+ MCMC sampling was used to obtain 10,000 feasible flux distributions , and the average of the flux samples for each reaction was used . 
+ Sampled points of reactions in loops were removed before further analysis . 
+ Reactions in loops were calculated by using flux variability analysis ( FVA ) on iJO1366 model . 
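+ The post-processing of the samples can be sketched as follows . 
+ The sketch assumes that the samples are available as a list of dictionaries and that the set of loop reactions has already been determined ( for example by FVA ) ; reaction identifiers and flux values are hypothetical . 

```python
def mean_fluxes(samples, loop_reactions=frozenset()):
    """Average flux per reaction over all samples, skipping reactions in loops.

    samples is a list of dicts mapping reaction id -> sampled flux.
    """
    reactions = [r for r in samples[0] if r not in loop_reactions]
    return {r: sum(s[r] for s in samples) / len(samples) for r in reactions}


# Hypothetical samples for three reactions, one of which sits in a loop.
samples = [
    {"R1": 1.0, "R2": 4.0, "LOOP1": 900.0},
    {"R1": 3.0, "R2": 6.0, "LOOP1": -900.0},
]
averages = mean_fluxes(samples, loop_reactions={"LOOP1"})
```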
+ RESULTS
+ Decreasing growth by cra knock-out on the poor carbon sources
+ In order to assess the contribution by Cra or CRP to bacterial growth on different carbon sources and to assess how crucial Cra or CRP transcriptional regulation is on growth , E. coli WT , Δcra , and Δcrp were grown on 6 different carbon sources ( glucose , fructose , galactose , succinate , glycerol , and acetate ) to measure the growth rate and the lag-phase time . 
+ Those carbon sources , glucose , fructose , galactose , succinate , glycerol , and acetate , were chosen to span various carbon sources with different numbers of carbons . 
+ On carbon sources with 6 carbon atoms ( glucose , fructose , and galactose ) , the crp mutant showed a more severe growth defect . 
+ However , on carbon sources with 2 - ( acetate ) , 3 - ( glycerol ) and 4 - ( succinate ) carbon atoms which would be expected to relieve catabolite repression and to induce gluconeogenesis , knocking-out the cra gene showed more severe growth defects ( Figure 1B ) and showed a much longer lag-phase time ( Supplementary Figure S3 ) . 
+ This observation confirms the involvement and importance of transcriptional regulation by Cra and CRP as shown in the previous studies . 
+ However , cra knock-out decreased growth rate more severely than crp knock-out on the poor carbon sources , suggesting Cra might have more important regulatory implications in adaptation to those carbon sources . 
+ Genome-wide mapping of Cra binding sites
+ To identify in vivo Cra binding sites , E. coli was grown on three different carbon sources : glucose , fructose , and acetate . 
+ Glucose is a favorable carbon source for E. coli , and is known to cause the most severe cAMP-dependent catabolite repression ( 10 ) . 
+ Cra , which is also called FruR , was first known to repress the fructose-specific operon fruBKA ( 41 ) , thus fructose was believed to alter the activity of Cra . 
+ Acetate was chosen as a representative of less favorable carbon sources for E. coli , which is reported to relieve catabolite repression , thus changing the flux through glycolysis and altering Cra activity ( 16 ) . 
+ A total of 49 Cra binding sites were identified using the ChIP-exo method during growth on those three different carbon sources ( Figure 1C , Supplementary Table S4 ) . 
+ Among them , only 29 binding sites were occupied when bacteria were grown on fructose , indicating least activation of Cra on that carbon source . 
+ In agreement with this observation , Cra ChIP-exo peak intensity on fructose was the weakest on average among the three substrates , while peak intensity of Cra binding on acetate was stronger than the intensity on either glucose or fructose ( ranksum test P-value < 0.05 , Supplementary Figure S4 ) . 
+ E. coli contains two phosphofructokinase ( PFK ) isozymes , PFK I/pfkA and PFK II/pfkB ; however , over 90 % of the phosphofructokinase activity is attributed to PFK I ( 42 ) . 
+ Studies on pfkA in E. coli have previously identified a Cra binding site ( 43 ) upstream of a σ70-dependent promoter ( 29,44 ) , and ChIP-exo experiments identified this binding site with near single-base-pair resolution ( Figure 1D , Supplementary Figure S5A ) . 
+ This Cra binding site overlaps the σ70-dependent promoter , particularly covering the -- 35 box of this promoter , suggesting that Cra binding would repress the expression of the downstream gene , pfkA ( Supplementary Figure S5A ) . 
+ Similarly , Cra binding was also observed upstream of tpiA , which encodes triose phosphate isomerase ( Figure 1D ) . 
+ Cra binding overlaps with the -- 10 and -- 35 boxes of the tpiA promoter , suggesting its repressive effect on tpiA expression ( Supplementary Figure S5B ) . 
+ The genome-wide Cra binding sites were compared to binding sites summarized in a public database ( 45 ) . 
+ There are 17 previously reported Cra binding sites based on experimental evidence , and 13 ( 76.5 % ) of them were identified from ChIP-exo experiments performed in this study ( Figure 1E ) . 
+ The four missing bindings are for cydAB , csgDEFG , hypF and pck ( 12,46,47 ) . 
+ It is possible that these four binding sites were not detected in ChIP-exo experiments because they were previously identified with in vitro methods and/or were identified under different growth conditions such as stationary phase or anaerobic growth . 
+ One possible drawback of determining binding sites with in vitro methods is that they may not represent feasible in vivo interactions between Cra and the genomic DNA . 
+ The ChIP-exo binding sites for Cra were also compared to another dataset that was generated with the in vitro SELEX system ( 19 ) . 
+ This study identified a total of 164 binding sites for Cra using this in vitro method . 
+ Among them , only 33 binding sites overlap with the ChIP-exo binding sites . 
+ This comparison contrasts in vivo and in vitro methods , suggesting that an in vitro analysis could provide information on the full extent of Cra binding sites , while the ChIP-exo method can capture in vivo binding sites that are relevant to the growth conditions . 
+ For the ChIP-exo identified binding sites , the sequence motif was calculated using the MEME suite ( 48 ) . 
+ The sequence motif obtained for Cra binding sites was ctgaAtCGaTtcag ( lower-case characters indicate an information content < 1 bit ) ( Figure 1F ) . 
+ This sequence motif is nearly identical to the previously reported motif gcTGAAtCGaTTCAgc ( 45,49 ) . 
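+ The upper-case and lower-case convention used for the motif can be illustrated with a short calculation of per-position information content . 
+ The sketch uses the uncorrected formula 2 - H for DNA ( no small-sample correction ) and the 1-bit threshold stated in the text ; the base counts are hypothetical and are not taken from the Cra sites . 

```python
import math


def information_content(counts):
    """Information content in bits at one position, from base counts (A, C, G, T)."""
    total = sum(counts.values())
    entropy = 0.0
    for c in counts.values():
        if c > 0:
            p = c / total
            entropy -= p * math.log2(p)
    return 2.0 - entropy


def consensus_char(counts, threshold=1.0):
    """Most frequent base, upper-case if the position carries >= threshold bits."""
    base = max(counts, key=counts.get)
    if information_content(counts) >= threshold:
        return base.upper()
    return base.lower()


# Hypothetical positions: one fully conserved, one uninformative.
conserved = {"A": 0, "C": 0, "G": 10, "T": 0}
uniform = {"A": 5, "C": 5, "G": 5, "T": 5}
```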
+ The ChIP-exo experiments performed here on three different carbon sources provide the first genome-wide in vivo measurement of Cra binding sites . 
+ A total of 49 binding sites were detected , and this dataset is in good agreement with previous knowledge in terms of the genomic locations and the sequence binding motif analysis . 
+ The better resolution that the recently developed ChIP-exo method provides enabled a more precise investigation of molecular interactions between TFs and the regulatory elements , such as promoters , of the genomic DNA . 
+ Orchestrated regulation of carbon metabolism by Cra and CRP
+ The definition of the regulon for Cra necessitates integration of the Cra binding site information with transcription unit ( TU ) annotation . 
+ Thus , the TUs with Cra binding sites in their upstream regulatory region were chosen from the reported TU annotation ( 29,45 ) . 
+ Only Cra binding sites in the regulatory regions were used in this integration , leaving out four binding sites found in the intragenic regions of the y-genes ynfK , yegI , yejG , and yihP . 
+ If the Cra binding site is located in the divergent promoter , then TUs at both sides were considered as possible Cra regulons . 
+ This integration resulted in 63 TUs with 136 genes as candidates for inclusion in the definition of the Cra regulon . 
+ To identify TUs with expression change upon cra knockout , RNA-seq experiments were performed for E. coli WT and Δcra knock-out strains on the three carbon sources . 
+ The RNA-seq transcriptome data was compared to the RT-qPCR gene expression measurement for aceA , gpmM , fbaA , and pgk for WT and Δcra ( Supplementary Figure S6 ) . 
+ In addition , the correlation between the two biological replicates for RNA-seq was over 0.95 ( Supplementary Figure S7 ) ; the quality of the RNA-seq data was thus confirmed , and the data were analyzed further . 
+ Any TU having a gene with an expression change ≥ 2-fold ( q-value ≤ 0.01 ) was considered as a differentially expressed TU . 
+ Out of 63 candidate TUs , 35 TUs ( containing 97 genes ) were differentially expressed with Cra ChIP-exo binding sites , thus 97 genes are defined as the Cra regulon ( Supplementary Table S5 ) . 
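+ The regulon definition described above amounts to intersecting two pieces of evidence , which can be sketched as follows . 
+ The data structures , function name , and TU and gene identifiers are hypothetical ; only the rule itself ( a TU with an upstream binding site and at least one differentially expressed gene ) comes from the text . 

```python
def define_regulon(candidate_tus, de_genes):
    """Select TUs with a binding site that contain at least one DE gene.

    candidate_tus maps TU name -> list of genes, for TUs with an upstream binding site.
    de_genes is the set of differentially expressed genes.
    Returns the selected TUs and the sorted list of all genes they contain.
    """
    regulon_tus = {
        tu: genes
        for tu, genes in candidate_tus.items()
        if any(g in de_genes for g in genes)
    }
    regulon_genes = sorted({g for genes in regulon_tus.values() for g in genes})
    return regulon_tus, regulon_genes


# Hypothetical candidates: three TUs with binding sites, two of them with a DE gene.
candidates = {"tu1": ["g1", "g2"], "tu2": ["g3"], "tu3": ["g4", "g5", "g6"]}
tus, genes_in_regulon = define_regulon(candidates, de_genes={"g2", "g6"})
```

+ Note that every gene of a selected TU enters the regulon , not only the genes that passed the expression threshold . 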
+ Clusters of Orthologous Groups ( COG ) analysis showed that the Cra regulon has enriched functions in energy production/conversion , carbohydrate metabolism/transport , and inorganic ion transport/metabolism ( Supplementary Figure S8 ) . 
+ The average number of genes in the Cra regulon TUs was 2.77 , which is much larger than the average of 1.78 genes per TU for all TUs in E. coli ( 29 ) . 
+ Integration of Cra binding information with differential gene expression revealed the regulatory mode of Cra on its regulon TUs ( Supplementary Table S6 , Supplementary Table S7 ) . 
+ Out of 35 regulon TUs , Cra up-regulated 16 TUs , and down-regulated 16 TUs . 
+ The remaining three TUs are up - or down-regulated depending on which of the three carbon sources was used . 
+ For instance , glk , which encodes the cytoplasmic glucokinase , was down-regulated on fructose , but it was up-regulated on acetate when cra is missing ( Supplementary Table S5 ) . 
+ This result indicates a complex regulation on the expression of this enzyme , which could be true for most of the enzymes in glycolysis/gluconeogenesis and the TCA cycle , since their activity must be finely tuned based on available carbon sources . 
+ With the Cra regulon definition and with CRP regulatory information from the public database ( 45 ) , a regulatory network for Cra and CRP for core carbon metabolism was built ( Figure 2A ) . 
+ In brief , glycolysis is more heavily regulated and always repressed by Cra . 
+ The TCA cycle , however , is more regulated and mostly activated by CRP . 
+ This regulatory logic represents a differential transcriptional regulation of glycolysis and the TCA cycle by Cra and CRP . 
+ Another interesting aspect of this reconstructed regulatory network is that only a few genes , fbaA , gapA , pgk , epd , aceE , aceF , aceB , and aceA , are co-regulated by both Cra and CRP . 
+ The two TFs regulate their overlapping target genes in an antagonizing manner . 
+ For instance , the co-regulated genes in glycolysis , fbaA , gapA , pgk and aceEF , are repressed by Cra but activated by CRP . 
+ On the other hand , CRP represses the glyoxylate shunt , aceBA , but it is activated by Cra . 
+ In addition , it is worth noting that other transcription factors and nucleoid-associated proteins may play an important role in accordance with CRP or Cra . 
+ For example , it was reported that IclR and IHF bind to the promoter region of aceBA ( 50 ) , which may interact with CRP or Cra . 
+ There is differential , but overlapping , transcriptional regulation of core carbon metabolic pathways , glycolysis , and the TCA cycle , by Cra and CRP . 
+ To investigate this complicated transcriptional regulation , expression of genes in carbon metabolism was analyzed . 
+ Thus , the relative transcription of each regulated gene was compared on three different carbon sources ( Figure 2B , Supplementary Table S8 ) . 
+ Except for three genes ( fbp , fbaB and ppsA ) that are known to be active for gluconeogenesis , genes in glycolysis are transcribed more on fructose than glucose or acetate . 
+ fbp encoding fructose-1,6-bisphosphatase and ppsA encoding phosphoenolpyruvate synthetase catalyze the two irreversible reactions that distinguish glycolysis and gluconeogenesis . 
+ fbaB encodes a class I fructose bisphosphate aldolase , that is involved in gluconeogenesis , whereas class II aldolase , which is encoded by fbaA , is involved in glycolysis ( 51 ) . 
+ Thus these genes are expected to be more highly transcribed on acetate where gluconeogenesis is active , as shown by the expression profiling data . 
+ Acetate is primarily metabolized through the TCA cycle , to generate energy and biosynthetic precursors . 
+ Some of the acetate has to be metabolized through gluconeogenesis to synthesize five and six carbon biosynthetic precursors . 
+ Consistent with this expectation , the majority of genes in the TCA cycle were more highly expressed on acetate than on fructose or glucose . 
+ The statistical analysis of relative expression of genes in glycolysis and the TCA cycle on three carbon sources confirms the expected transcription pattern ( Figure 2C ) . 
+ The average relative transcriptional level of glycolysis is the highest on fructose , followed by glucose , and then acetate . 
+ In this analysis , genes that are more active in gluconeogenesis were excluded for the clarity of the figure ; however , including those genes does not change the pattern of relative transcriptional levels ( Supplementary Figure S9 ) . 
+ For the genes in the TCA cycle , however , the average relative transcriptional level is the highest on acetate , followed by fructose and glucose . 
+ The integration of ChIP-exo binding site information for Cra with known TU annotations and expression profiling by RNA-seq revealed the genome-wide transcriptional regulation by Cra . 
+ The Cra regulatory information was then combined with CRP regulatory information to build a regulatory network of the core carbon metabolic pathways including glycolysis , the TCA cycle , and PP ( pentose phosphate ) pathway . 
+ This regulatory network represents a differential , but overlapping , regulation of carbon metabolism by Cra and CRP . 
+ These observations suggest a possible decoupling in the regulation of glycolysis and the TCA cycle ; however , how this decoupling occurs in the context of the function of the entire metabolic network requires more elaboration . 
+ Antagonizing regulatory mode between Cra and CRP on the key enzymes of core carbon metabolism
+ The activity levels of Cra and CRP vary depending on the carbon source . 
+ ChIP-exo experiments show that the binding activity of Cra , in terms of both the number of binding sites and the binding intensity , is lowest on fructose and strongest on acetate . 
+ However , the activity of CRP may be different . 
+ Interestingly , the mRNA and protein levels of crp and cra do not change significantly during growth on glucose , fructose , or acetate ( Supplementary Figure S10 , Supplementary Figure S11 ) . 
+ Therefore , the regulatory activity of CRP could be strongly dependent on the concentration of its effector molecule , cAMP , as suggested by a previous in vitro study ( 52 ) . 
+ The intracellular concentration of cAMP is lowest on glucose , and higher on fructose . 
+ Further , the cAMP concentration is even higher on less favorable carbon sources , such as malate ( 10 ) . 
+ Thus , the DNA binding and the regulatory activity of CRP are expected to be the weakest on glucose and the strongest on acetate ( Figure 2D ) . 
+ Cra and CRP co-regulate a total of 13 TUs . 
+ Of these , four are either co-activated or co-repressed by both regulators ; thus , there is no conflict between Cra and CRP in the regulation of those TUs . 
+ Cra binds upstream of crp ( Supplementary Figure S12 ) and marRAB , and these two TUs are reported to be activated by CRP ( 7 ) . 
+ However , expression of crp or marRAB did not change significantly on different carbon sources , thus they are categorized as undetermined . 
+ Cra and CRP both regulate seven transcription units containing genes encoding several enzymes in carbon metabolism in an opposite , or antagonizing , manner ( Figure 2D ) . 
+ For instance , Cra activates the aceBA operon that encodes enzymes in the glyoxylate shunt , while CRP represses it . 
+ On the other hand , Cra represses epd-pgk-fbaA , gapA and aceEF , most of which are involved in glycolysis , but CRP activates their expression . 
+ This conflicting regulation by Cra and CRP makes sense on glucose and fructose , where either one of them is inactivated . 
+ Cra is more active on glucose , while CRP is more active on fructose . 
+ This differential activation explains why genes in glycolysis and the TCA cycle are more highly transcribed on fructose than on glucose . 
+ Whereas either Cra or CRP is inactivated on glucose or fructose , acetate and possibly poor carbon sources would activate Cra and CRP at the same time . 
+ Since 7 TUs are regulated by both Cra and CRP , the expression change of genes in those TUs on less favorable carbon sources is of interest . 
+ The mRNA expression of aceBA and treBC was up-regulated on acetate , and the expression of epd-pgk-fbaA , gapA , aceEF , raiA and mtlADR was down-regulated . 
+ Interestingly , the expression changes of these TUs always follow the regulatory mode of Cra regardless of CRP regulation . 
+ For example , expression of aceBA was up-regulated on acetate although CRP represses its expression , thus resulting in activation of the glyoxylate shunt . 
+ Similarly , expression of epd-pgk-fbaA , gapA , and aceEF was repressed even though CRP up-regulates their transcription , contributing to down-regulation of genes in glycolysis on acetate . 
+ Collectively , Cra and CRP are most active on poor carbon sources . 
+ When they are active , they regulate target genes in the core carbon metabolism in a variety of modes . 
+ They co-activate or co-repress some of their target genes , while regulating some key genes in glycolysis and the TCA cycle in an antagonizing manner . 
+ The overall regulatory consequence always follows the regulatory mode of Cra , indicating the possible overriding regulatory effect by Cra over CRP for those key enzymes of the core carbon metabolism . 
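The dominance argument above can be restated as a small lookup. The following is a minimal sketch, not the authors' analysis: it encodes only the four antagonistically regulated TUs whose Cra mode, CRP mode, and expression change on acetate are stated explicitly in the text, and the names `ANTAGONISTIC_TUS` and `dominant_regulator` are hypothetical.

```python
# Regulatory modes as stated in the text for four antagonistically regulated TUs.
# +1 means activation / up-regulation, -1 means repression / down-regulation.
ANTAGONISTIC_TUS = {
    "aceBA":        {"cra": +1, "crp": -1, "observed_on_acetate": +1},
    "epd-pgk-fbaA": {"cra": -1, "crp": +1, "observed_on_acetate": -1},
    "gapA":         {"cra": -1, "crp": +1, "observed_on_acetate": -1},
    "aceEF":        {"cra": -1, "crp": +1, "observed_on_acetate": -1},
}


def dominant_regulator(tu_name):
    """Return which regulator's mode matches the observed change on acetate."""
    entry = ANTAGONISTIC_TUS[tu_name]
    follows_cra = entry["observed_on_acetate"] == entry["cra"]
    follows_crp = entry["observed_on_acetate"] == entry["crp"]
    if follows_cra and not follows_crp:
        return "Cra"
    if follows_crp and not follows_cra:
        return "CRP"
    return "undetermined"


if __name__ == "__main__":
    for name in ANTAGONISTIC_TUS:
        print(name, "follows", dominant_regulator(name))
```

Every TU in this illustrative table resolves to Cra, which is simply the pattern the text describes; the table would need the remaining TUs and their measured fold changes before it could serve as an actual test of the claim.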
+ Flux balance analysis leads to a network level understanding of the regulatory roles of Cra and CRP
+ To understand the regulatory decoupling of glycolysis and TCA cycle activation , the metabolic driving force behind increased TCA cycle activation , and how the overriding regulatory effect of Cra works in that context , we turned to the methods of systems biology . 
+ Flux Balance Analysis ( FBA ) ( 25 ) and Markov Chain Monte Carlo ( MCMC ) sampling were applied to the E. coli metabolic model iJO1366 ( 24 ) to simulate feasible flux states for all metabolic reactions during growth on different carbon sources ( Figure 3A ) . 
+ To determine if the predicted flux calculation correlates with the enzyme abundance and is in agreement with previous studies ( 53,54 ) , the calculated flux values through the metabolic reactions were compared with the transcriptional level of the genes that encode enzymes catalyzing these reactions ( Figure 3B ) . 
+ The ratio of fluxes through those reactions showed a good correlation to the expression change calculated between acetate and glucose . 
+ In other words , as observed in expression profiling , fluxes through the TCA cycle were predicted to increase on acetate , while fluxes through glycolysis were calculated to decrease . 
+ Enzymatic reactions in glycolysis showed reduced expression and reduced flux on acetate while reactions in the TCA cycle showed increased expression and increased flux on less favorable carbon sources . 
+ The separation between reactions in glycolysis and the TCA cycle in expression and flux changes upon a carbon source shift reflects a decoupling between glycolysis and the TCA cycle , and differential transcriptional regulation of the two pathways . 
+ Cra redirects the fluxes through the glycolysis pathway towards gluconeogenesis and represses the transcription of the enzymes in glycolysis , thereby reducing the flux through the pathway . 
+ In contrast , CRP activates transcription for the majority of components in the TCA cycle resulting in more reaction flux . 
+ Normalized fluxes , through the reactions in core carbon metabolic pathways , were mapped onto the metabolic network for glucose and acetate to illustrate how differently each pathway is predicted to be active on different carbon sources ( Figure 3A ) . 
+ When a flux of 10.5 ( mmol/gDW/h ) enters glucose 6-phosphate ( g6p ) , the simulation predicted that 4.8 ( 45.7 % ) of the influx would flow into the PP pathway , leaving 5.4 ( 51.4 % ) for glycolysis . 
+ The simulation predicted that a flux of 15.1 would reach phosphoenolpyruvate ( pep ) and then split into two branches leading from pep into the TCA cycle , resulting in a flux of 5.9 from citrate to isocitrate . 
+ Thus , on glucose , glycolysis is calculated to have a higher flux than the TCA cycle . 
+ On acetate , the carbon flow starts from acetate with incoming flux of 44.1 , with 8.0 ( 18.1 % ) predicted to enter into gluconeogenesis and most of the remaining flux , 29.3 ( 66.4 % ) , was predicted to flow into the TCA cycle . 
+ Thus , the TCA cycle flux computed for growth on acetate is almost 5 times larger than the flux on glucose ( Supplementary Figure S13 ) . 
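The percentages quoted above follow directly from the reported flux values, and the arithmetic can be checked with a few lines. This is only a worked check of the numbers in the text; the helper name `share` is hypothetical, and the final ratio assumes that the 29.3 (acetate) and 5.9 (glucose) figures are the two TCA cycle fluxes being compared.

```python
def share(part, total):
    """Return part / total as a percentage rounded to one decimal place."""
    return round(100.0 * part / total, 1)


# Flux values (mmol/gDW/h) quoted in the text.
GLUCOSE_INFLUX = 10.5   # flux entering g6p on glucose
PP_PATHWAY = 4.8        # portion entering the PP pathway
GLYCOLYSIS = 5.4        # portion left for glycolysis
TCA_ON_GLUCOSE = 5.9    # citrate to isocitrate flux on glucose

ACETATE_INFLUX = 44.1   # incoming flux on acetate
GLUCONEOGENESIS = 8.0   # portion entering gluconeogenesis
TCA_ON_ACETATE = 29.3   # portion entering the TCA cycle

if __name__ == "__main__":
    print("PP pathway share on glucose:", share(PP_PATHWAY, GLUCOSE_INFLUX))
    print("Glycolysis share on glucose:", share(GLYCOLYSIS, GLUCOSE_INFLUX))
    print("Gluconeogenesis share on acetate:", share(GLUCONEOGENESIS, ACETATE_INFLUX))
    print("TCA share on acetate:", share(TCA_ON_ACETATE, ACETATE_INFLUX))
    print("TCA flux ratio, acetate / glucose:", TCA_ON_ACETATE / TCA_ON_GLUCOSE)
```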
+ There is a reaction , CITL ( citrate lyase ) , converting citrate to oxaloacetate ( oaa ) ; however , almost zero flux was predicted for this reaction , in accordance with the knowledge that E. coli K-12 MG1655 does not have citrate lyase activity . 
+ The transcriptional level of citCDEFXGT for this reaction was very low ( Supplementary Figure S14 ) . 
+ Thus , this reaction was ignored in further analysis . 
+ Differential activation of the glycolysis pathway and the TCA cycle , which are regulated by Cra and CRP at the transcriptional level , was observed on three representative carbon sources , glucose , fructose , and acetate , for which experimental measurements were performed . 
+ In silico analysis with a genome-scale metabolic model on 38 carbon sources that support E. coli growth confirmed this observation and extended it : decoupling of the glycolysis pathway and the TCA cycle , with reduced activity of the glycolysis pathway and increased activity of the TCA cycle , is expected on most of the poor carbon sources ( Figure 3C ) . 
+ Except for a small number of 3- or 4-carbon sources that need to be converted and fed into the glycolysis pathway , the majority of viable carbon sources with 3 or 4 carbons were predicted to carry a smaller flux through glycolysis , running in the opposite direction towards gluconeogenesis , and a larger flux through the TCA cycle . 
+ Thus , acetate is the best representative of the poor carbon sources that can render reduced reaction flux through glycolysis and activated flux through the TCA cycle . 
+ However , differential regulation and activation of those two pathways is not a phenomenon that is limited to a certain carbon source , acetate , but it is an outcome of a complex regulation that is common in most of the poor carbon sources . 
+ In summary , in silico simulation in the context of orchestrated transcriptional regulation by Cra and CRP on the core carbon metabolism confirmed the decoupling of glycolysis and the TCA cycle at the transcriptional regulation level and reaction flux level . 
+ Repression of glycolysis and activation of the TCA cycle at the reaction flux level was observed on most of the poor carbon sources . 
+ This regulatory activity is mediated by Cra and CRP . 
+ The overriding regulatory activity of Cra over CRP on glycolysis
+ On the poor carbon sources such as acetate , both Cra and CRP are activated to regulate the expression of the key enzymes in glycolysis but in opposite ways . 
+ Cra down-regulates the expression of the majority of enzymes in the glycolysis pathway except for two genes , fbp and ppsA , that are responsible for the flux redirection towards gluconeogenesis and should be up-regulated on the poor carbon sources to supply 5- or 6-carbon precursor molecules . 
+ On the other hand , CRP up-regulates the expression of some metabolic enzymes in glycolysis such as FbaA , GapA , and Pgk . 
+ Despite the transcriptional regulation by Cra and CRP in opposite directions , the transcriptional expression of genes for glycolysis enzymes was down-regulated following the regulatory mode of Cra ( Figure 4A ) . 
+ The in silico simulation with the genome-scale metabolic model , which is independent of the transcriptional regulatory information , suggested that the reaction fluxes through the glycolysis pathway should be decreased to support the optimal growth of a bacterial cell ( Figure 4B ) , indicating the regulatory mode by Cra is optimal while that by CRP is not . 
+ In order to verify that the reduced fluxes through glycolysis support the optimal growth , the genome-scale metabolic model was artificially forced to have a higher flux than optimal , and we computed how increased fluxes through the glycolysis pathway towards gluconeogenesis affected cell growth ( Figure 4C ) . 
+ As postulated , when the in silico model was simulated on acetate but with a higher flux volume through glycolysis , the model predicted that the cell growth rate would decrease as the glycolysis flux volume increased . 
+ This indicates that the enzymatic activity of glycolysis should be lowered to support the maximum cell growth , and Cra provides the necessary transcriptional regulation on those enzymes . 
+ Thus , without the overriding activity of Cra over CRP at the transcriptional level , a bacterial cell would not be able to acquire the ability to adapt to the poor carbon sources and the optimal growth capability . 
+ These independent lines of evidence support the notion that transcriptional regulation of those enzymes should follow the regulatory mode of Cra , ignoring the regulatory activity of CRP , and emphasize the importance of the overriding regulatory activity of Cra on glycolysis . 
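The forced-flux experiment described above can be illustrated with a deliberately tiny linear model. This is a toy sketch under stated assumptions, not the iJO1366 simulation: the two-branch network, the function name `max_growth`, and the `precursor_cost` and `energy_cost` coefficients are all hypothetical, chosen only to show why pushing flux above the optimum through one branch lowers the achievable growth rate.

```python
def max_growth(uptake, forced_gluco_flux=0.0, precursor_cost=2.0, energy_cost=8.0):
    """Maximum growth rate of a toy two-branch network (hypothetical model).

    Carbon taken up splits between a gluconeogenic branch (g) and the TCA
    cycle (t) with g + t <= uptake. Growth mu requires
    precursor_cost * mu <= g and energy_cost * mu <= t.
    The gluconeogenic flux is forced to be at least forced_gluco_flux.
    """
    if forced_gluco_flux > uptake:
        raise ValueError("forced flux cannot exceed the uptake flux")
    # Unconstrained optimum: both branches limit growth at the same time.
    mu_opt = uptake / (precursor_cost + energy_cost)
    g_opt = precursor_cost * mu_opt
    # Above g_opt the TCA branch is limiting, so the best choice is the
    # smallest allowed gluconeogenic flux.
    g = max(g_opt, forced_gluco_flux)
    t = uptake - g
    return min(g / precursor_cost, t / energy_cost)


if __name__ == "__main__":
    acetate_uptake = 44.1  # uptake flux quoted in the text (mmol/gDW/h)
    for forced in (0.0, 10.0, 20.0, 30.0):
        print(forced, max_growth(acetate_uptake, forced))
```

In this toy model the growth rate falls linearly once the forced flux exceeds the optimum, which mirrors the qualitative trend reported for Figure 4C; the real analysis constrains a reaction in a genome-scale model and re-solves the full flux balance problem.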
+ The remaining question is , what is the molecular basis for Cra overriding the activity of CRP under conditions where both are active ? 
+ With the high resolution that ChIP-exo provides , interaction between promoters ( 44 ) , CRP binding sites ( 45 ) , and Cra binding sites were analyzed at a base-pair resolution ( Figure 4D ) . 
+ For epd , which is activated by CRP and repressed by Cra , CRP binds upstream of the core promoter , indicating this interaction between CRP and RNA polymerase ( RNAP ) machinery is Class I ( 5 ) . 
+ Interestingly , however , Cra binding overlaps the promoter region , covering the -10 box and the transcription start site ( TSS ) . 
+ This suggests Cra could block RNAP binding to the promoter or inhibit the transcription process even if CRP recruits RNAP towards the promoter , repressing the expression of the downstream gene . 
+ Similarly , expression of pdhR-aceEF-lpd is repressed by Cra while being activated by CRP . 
+ CRP binds the genomic region including the -35 box ( Class II activation ) ( 5 ) ; however , Cra binds downstream of the two promoters of the TU , obstructing transcription . 
+ The same regulatory interaction between CRP and Cra was observed for mtlA and gapA ( Figure 4D ) . 
+ Thus , the activity of Cra overrides the activity of CRP when CRP activates and Cra represses the target gene , because Cra binds the -10 box of the promoter , the TSS , or downstream of the promoter . 
+ The overriding regulatory activity of Cra over CRP on the glyoxylate shunt
+ Expression profiling showed that the transcriptional expression of enzymes in the TCA cycle pathway was up-regulated on acetate , a poor carbon source . 
+ Computation with the genome-scale metabolic model provided support for an increased flux through the TCA cycle , leading to an increase in the cell growth rate on acetate and other poor carbon sources . 
+ Activation of the TCA cycle may require activation of the glyoxylate shunt pathway , which is encoded by aceBA , because that particular pathway contributes to replenishing the oxaloacetate pool . 
+ In order to investigate this possibility , the expression profiling data was analyzed to confirm the up-regulation of transcriptional expression of the operon aceBA ( Figure 5A ) . 
+ As postulated , the expression of aceBA was up-regulated , consistent with activation of this operon by Cra even though CRP acts in the opposite direction . 
+ The flux prediction from in silico simulation confirmed that the fluxes through the glyoxylate shunt were predicted to increase when grown on acetate ( Figure 5B ) . 
+ As an independent verification of the necessity of the glyoxylate shunt activation , the metabolic model was artificially forced to have a lower flux . 
+ The model predicted that the in silico growth rate would decrease as the glyoxylate shunt was forced to have a lower flux ( Figure 5C ) . 
+ Thus , activation of the glyoxylate shunt is required to support optimal cell growth , and the transcriptional regulation by Cra provides up-regulation of enzymes in that pathway while CRP down-regulates the expression of those enzymes . 
+ Thus , the overriding regulatory activity of Cra over CRP on the glyoxylate shunt pathway is fundamental . 
+ The promoter regions and their neighboring regulatory regions of aceBA were also analyzed to gain insights into the molecular mechanisms of this regulatory overriding ( Figure 5D ) . 
+ To provide independent evidence of transcriptional regulation by Cra and CRP , RpoB ChIP-exo experiments were performed to capture the RNA polymerase occupancy over the relevant operons ( Figure 5E ) . 
+ Cra up-regulates expression of aceBA , whereas CRP down-regulates expression of the operon . 
+ When cra was removed , the RNA polymerase occupancy decreased over the aceBA operon . 
+ When crp was knocked out , the RNA polymerase occupancy increased over the operon . 
+ For the epd-pgk-fbaA operon , the RNA polymerase occupancy analysis likewise confirmed the proposed regulatory modes of Cra and CRP . 
+ In a previous study , CRP was claimed to bind the aceBA promoter , covering the -35 box , and to repress the expression of the target gene ( 55 ) . 
+ From the ChIP-exo dataset , Cra binds upstream of the promoter , and up-regulates the expression . 
+ However , repression by CRP binding does not quash activation of aceBA by Cra on acetate , and this could be because CRP repression was observed to take place when fur is missing ( 55 ) . 
+ Thus , the activity of Cra may prevail over the activity of CRP on regulation of aceBA . 
+ Cra regulates the respiratory chain to keep energy balance between reduced glycolysis and activated TCA cycle
+ Usage of gluconeogenesis requires energy . 
+ To make pep from pyruvate ( pyr ) , 1 ATP is required , and converting 3-phospho-glycerate ( 3pg ) into 3-phospho-glyceroyl phosphate ( 13dpg ) requires another ATP . 
+ Similarly , making glyceraldehyde 3-phosphate ( g3p ) from 13dpg consumes one NADH . 
+ Moreover , one ATP is required to activate acetate to acetyl-CoA ( accoa ) . 
+ The iJO1366 model predicted that most energy molecules would be produced from the TCA cycle , and the PP pathway would be barely used in energy production on acetate or other poor carbon sources . 
+ The number of energy molecules that were calculated to be generated from the TCA cycle is sufficient to accommodate the energy requirements for gluconeogenesis and acetate conversion . 
+ However , NADH or NADPH needs to be converted into ATP , since the major energy expenditure would occur with ATP . 
+ Following the fluxes coming from the TCA cycle , the model-based simulation sheds light on how NADH could be converted into ATP when cells are growing on the poor carbon sources . 
+ NADH oxidoreductase I uses NADH to pump out proton molecules into the periplasm so that ATP synthase can generate ATP from the proton gradient ( Figure 6A ) . 
+ Since the iJO1366 model predicted there would be an increased flux through NADH oxidoreductase I , it was postulated that Cra or CRP may be involved in regulating the expression of the enzyme complex . 
+ NADH oxidoreductase I is encoded by a long operon , nuoABCEFGHIJKLMN , and ChIP-exo experiments provided evidence that Cra binds upstream of this operon , indicating regulation by Cra ( Figure 6B ) . 
+ To determine if this regulation is positive or negative , expression change of this operon upon cra knock-out was analyzed ( Figure 6C ) . 
+ On glucose , knocking out cra did not change the expression of nuoABCEFGHIJKLMN ; however , the expression significantly decreased on acetate ( ranksum test P-value < 1.5 × 10^-5 ) . 
+ Thus , Cra up-regulates the expression of nuoABCEFGHIJKLMN . 
+ Cra was reported to regulate a broad range of metabolic genes , but independent of CRP ( 13 ) . 
+ However , there is supporting evidence that Cra directly regulates the expression of crp ( 19,56 ) . 
+ The Cra ChIP-exo dataset supports its binding upstream of the σ70-dependent promoter of crp ( Supplementary Figure S12 ) ; however , the expression change of crp was not significant between glucose and acetate ( Supplementary Figure S10 ) or between WT and Δcra ( Supplementary Figure S15 ) . 
+ No evidence has been found that CRP regulates the expression of cra . 
+ Thus , the regulatory interaction between CRP and Cra is responsible for the competition between them on the expression of target genes that they both regulate . 
+ DISCUSSION
+ In this study , the complex transcriptional regulatory network of carbon metabolism in E. coli was investigated using a combination of genome-wide experimental measurements and computer simulation of a genome-scale metabolic model . 
+ The ChIP-exo and RNA-seq methods were applied to Cra when E. coli was grown on glucose , fructose , and acetate , and led to the identification of 97 genes in the Cra regulon . 
+ The definition of the Cra regulon showed that Cra and CRP have distinct roles in carbon metabolism regulation . 
+ Cra is involved in the repression of glycolysis , while historical data shows that CRP is focused on the activation of the TCA cycle . 
+ Expression profiling illustrated that the expression of genes in glycolysis is highest on fructose , and genes in the TCA cycle were more highly expressed on acetate . 
+ Model-based simulation and flux balance analysis were employed to explain this observation , and it was found that it is due to the fact that energy molecules are produced from the TCA cycle . 
+ This energy production from the TCA cycle enables gluconeogenesis when growing on unfavorable carbon sources . 
+ The conversion of energy molecules by NADH oxidation to produce ATP happens during this process , and this explains Cra regulation on the redox pathway . 
+ A single base-pair resolution of the experimental methods and detailed sequence analysis on Cra and CRP binding sites clarified how the activity of Cra overrides the activity of CRP in regulation of their target genes . 
+ The previous proteome study with E. coli BL21 reported that adaptation of bacterial cells in defined and rich medium reflected the antagonistic and competitive regulation of central metabolic pathways by Cra , CRP , and ArcA ( 57 ) . 
+ This study , with high-resolution genome-scale experiments and in silico modeling , proposes a more detailed regulatory mechanism for Cra and CRP . 
+ The optimal gene expression on different carbon sources could be implemented by differential activation of Cra and CRP on glucose and fructose , and Cra activity overriding CRP binding on unfavorable carbon sources . 
+ Conservation analysis demonstrated that transcriptional regulation by Cra might be a more widely used strategy in modulating carbon and energy metabolism over regulation by CRP in E. coli . 
+ Most Cra regulon genes encode metabolic enzymes ; however , there are three TF-encoding genes : pdhR , nikR , and baeR . 
+ While the affiliation of nikR or baeR to carbon metabolism is still unclear , pdhR is involved in carbon metabolism by sensing pyruvate ( 58 ) . 
+ The binding sites of PdhR have been investigated with both in vitro ( 59 ) and in vivo ( 60 ) methods . 
+ In both studies , ndh was annotated as a PdhR target gene . 
+ ndh encodes NADH oxidoreductase II ( NDH-2 ) which is one of two distinct NADH dehydrogenases in E. coli . 
+ The other NADH dehydrogenase is NDH-1 that is encoded by nuo genes , and Cra regulates the expression of the nuo operon . 
+ Moreover , Cra and PdhR both regulate cyoABCDE , which encodes cytochrome bo oxidase . 
+ How the involvement of the electron transport system is relevant to growth on pyruvate has not been fully elaborated . 
+ However , it makes sense that the optimal growth on unfavorable carbon sources accompanies regulation on the redox pathway . 
+ It is possible that this may be because of energy production from the TCA cycle and conversion between energy molecules as similarly shown on acetate for Cra . 
+ In summary , cutting-edge experimental measurements with ChIP-exo and RNA-seq provided the regulatory information for Cra on the core carbon metabolism at the genome-scale . 
+ Integration of this experimentally-derived regulatory information and in silico flux calculation with a genome-scale metabolic model expanded the scope of carbon metabolism regulation by Cra . 
+ Cra supports the optimal cell growth on the poor carbon sources by at least three mechanisms ( Figure 7 ) . 
+ First , Cra redirects the enzymatic flux through glycolysis towards gluconeogenesis , but more importantly it decreases the flux volume through this pathway . 
+ Second , Cra activates the activity of the glyoxylate shunt pathway together with activation of the TCA cycle . 
+ Third , Cra up-regulates some components in the respiratory chain to provide the energy balance between the repressed glycolysis pathway and the activated TCA cycle . 
+ Most importantly , the repression of the glycolysis pathway and the activation of glyoxylate shunt pathway crucially depend on the overriding regulatory activity of Cra over that of CRP . 
+ The consolidation of the experimental measurements of in vivo states of transcriptional components and the computational prediction of in silico states of metabolic activities makes for an integrated genome-scale approach with which to investigate the network level mechanisms of transcriptional regulation in bacteria . 
+ Experimental measurements with recently developed methods at the single base-pair resolution enable researchers to determine the transcriptional regulation activity and to follow biological questions from the dataset . 
+ However , experimental methods can only provide a monolithic snapshot of internal in vivo states of transcriptional regulation under the given conditions . 
+ Model-based in silico simulation , on the other hand , allows researchers to investigate the activity of a reaction in association with other connected reactions and to explore feasible cellular states . 
+ Thus it is possible to put biological questions or findings in a broader or expanded context . 
+ For instance , the linkage found in this study could be further investigated in the context of carbon and redox metabolism ( 61 ) in combinatorial conditions , which would contribute to understanding carbon metabolism regulation in the context of oxygen-limiting conditions . 
+ Thus , elucidation of transcriptional regulation of the core carbon metabolism in bacteria exhibited the benefits from combining genome-wide experimental measurement and simulation with a genome-scale metabolic model . 
+ DATA AVAILABILITY
+ The whole dataset of ChIP-exo and RNA-seq has been deposited in GEO under the accession number GSE65643 . 
+ Supplementary Data are available at NAR Online.
+ ACKNOWLEDGEMENTS
+ We thank Marc Abrams for helpful assistance in writing and editing the manuscript . 
+ Author Contributions : D.K. , B.O.P. conceived the study . 
+ D.K. , S.W.S. , Y.G. , G.I.G. performed experiments . 
+ D.K. and H.N. performed the computational analysis . 
+ B.O.P. supervised the study . 
+ D.K. , S.W.S. , H.N. , Y.G. , B.K.C. and B.O.P. wrote the manuscript . 
+ All authors helped edit the final manuscript . 
+ FUNDING
+ Novo Nordisk Foundation Center for Biosustainability at the Danish Technical University [ NNF16CC0021858 ] ; NIH NIGMS ( National Institute of General Medical Sciences ) [ GM102098 ] ; C1 Gas Refinery Program through the National Research Foundation of Korea ( NRF ) funded by the Ministry of Science , ICT & Future Planning [ 2015M3D3A1A01064882 ] ; Basic Science Research Program through the National Research Foundation of Korea ( NRF ) funded by the Ministry of Education [ NRF-2017R1C1B2002441 ] . 
+ Funding for open access charge : Novo Nordisk Foundation Center for Biosustainability at the Danish Technical University [ NNF16CC0021858 ] . 
+ Conflict of interest statement . 
+ None declared .
\ No newline at end of file
--- a/data/TEXT_FILES/useful_txt/29463657.txt 0 → 100644
View file @27818a9
+++ b/data/TEXT_FILES/useful_txt/29463657.txt 0 → 100644
View file @27818a9
+ The Essential Genome of Escherichia coli K-12
+ ABSTRACT Transposon-directed insertion site sequencing ( TraDIS ) is a high-throughput method coupling transposon mutagenesis with short-fragment DNA sequencing . 
+ It is commonly used to identify essential genes . 
+ Single gene deletion libraries are considered the gold standard for identifying essential genes . 
+ Currently , the TraDIS method has not been benchmarked against such libraries , and therefore , it remains unclear whether the two methodologies are comparable . 
+ To address this , a high-density transposon library was constructed in Escherichia coli K-12 . 
+ Essential genes predicted from sequencing of this library were compared to existing essential gene databases . 
+ To decrease false-positive identification of essential genes , statistical data analysis included corrections for both gene length and genome length . 
+ Through this analysis , new essential genes and genes previously incorrectly designated essential were identified . 
+ We show that manual analysis of TraDIS data reveals novel features that would not have been detected by statistical analysis alone . 
+ Examples include short essential regions within genes , orientation-dependent effects , and fine-resolution identification of genome and protein features . 
+ Recognition of these insertion profiles in transposon mutagenesis data sets will assist genome annotation of less well characterized genomes and provides new insights into bacterial physiology and biochemistry . 
+ IMPORTANCE Incentives to define lists of genes that are essential for bacterial survival include the identification of potential targets for antibacterial drug development , genes required for rapid growth for exploitation in biotechnology , and discovery of new biochemical pathways . 
+ To identify essential genes in Escherichia coli , we constructed a transposon mutant library of unprecedented density . 
+ Initial automated analysis of the resulting data revealed many discrepancies compared to the literature . 
+ We now report more extensive statistical analysis supported by both literature searches and detailed inspection of high-density TraDIS sequencing data for each putative essential gene for the E. coli model laboratory organism . 
+ This paper is important because it provides a better understanding of the essential genes of E. coli , reveals the limitations of relying on automated analysis alone , and provides a new standard for the analysis of TraDIS data . 
+ KEYWORDS Escherichia coli, TraDIS, genomics, tn-seq
+ There are many incentives to define lists of genes that are either essential for bacterial survival or important for normal rates of growth . 
+ Essential genes of bacterial pathogens may encode components of novel biochemical pathways or potential targets for antibacterial drug development . 
+ Disruption of genes required for rapid growth results in strains handicapped for exploitation in biotechnology . 
+ Conversely , normal growth of mutants defective in genes previously expected to be essential could reveal unexpected parallel biochemical pathways for fulfilling the essential function . 
+ Multiple attempts have been made to generate definitive lists of essential genes , but there are still many discrepancies between studies even for the model bacterium Escherichia coli strain K-12 . 
+ Two general approaches have been used : targeted deletion of individual genes , as in the Keio collection of mutants ( 1 ) , and random mutagenesis ( 2 , 3 ) . 
+ Data from several studies using different mutagenesis strategies have yielded inconsistent data and hence conﬂicting conclusions . 
+ Transposon-directed insertion site sequencing ( TraDIS ) is one of several high-throughput techniques that combine random transposon mutagenesis with sequencing of the transposon junctions in high-density mutant libraries ( 4 -- 7 ) . 
+ Since its inception in 2009 , this high-throughput method has been applied to a range of biological questions ( 4 , 8 -- 15 ) . 
+ Here , in order to resolve outstanding conﬂicts , we report the use of this approach to identify the essential genes of E. coli K-12 strain BW25113 , a well-studied model organism for which a complete gene deletion library is available ( 1 ) . 
+ A confounding factor in determining the `` essentiality '' of a gene is the deﬁnition of an essential gene . 
+ Complete deletion of an essential gene results , by deﬁnition , in a strain that can not be isolated following growth . 
+ However , it is well-known that certain genes are required for growth under speciﬁc environmental and nutritional conditions . 
+ Such genes can be considered conditionally essential . 
+ For the purposes of this study , we deﬁne a gene as essential if the transposon insertion data reveal that the protein coding sequence ( CDS ) , or a portion of the CDS , is required for growth under the conditions tested here . 
+ To aid our analysis , we developed a statistical model that included corrections for both gene length and genome length in order to decrease false-positive identiﬁcation of essential genes . 
+ An additional challenge with deﬁning essentiality in high-throughput studies is an overreliance on automated analysis of the data . 
+ For example , a consequence of relying only on quantiﬁcation of the number of unique insertions within a gene is that genes with essential regions will be missed . 
+ If only part of a gene encodes the essential function , it should be possible to isolate viable mutants with transposon insertions in nonessential regions of the coding sequence ( 2 ) . 
+ Conversely , reliance on statistical analysis alone can also lead to overestimation of the number of essential genes . 
+ This is a common result from insertion sequencing analysis ( 16 ) . 
+ A low number of transposon insertion events within a gene , which fall below the statistical cutoff threshold , can be due to inaccessibility of the gene to transposition because of extreme DNA structure , exclusion by DNA-binding proteins , polarity effects due to insertion in a gene upstream of a cotranscribed essential gene , and location of the gene close to the replication terminus ( 17 ) . 
+ The most frequent reason for a low number of insertions is that the product of the disrupted gene is required for normal rates of growth under the conditions tested . 
+ In the current study , to minimize the possibility of incorrectly designating genes as essential or contributing to ﬁtness , we have supported our statistical analysis with a gene-by-gene inspection of the insertion distribution within each individual gene . 
+ RESULTS AND DISCUSSION
+ Sequencing of a mini-Tn5 transposon insertion library in E. coli strain BW25113 . 
+ We have used a modiﬁed method to obtain TraDIS data for a transposon mutant library of E. coli K-12 strain BW25113 ( 4 , 9 ) . 
+ The BW25113 strain was chosen because it is the parent strain for the Keio collection of deletion mutants and ideal for a direct comparison between data sets . 
+ A mini-Tn5 transposon with a chloramphenicol resistance cassette was transformed into competent cells and grown overnight on selective medium . 
+ Individual colonies were pooled to construct the initial library , estimated to consist of approximately 3.7 million mutants . 
+ An Illumina MiSeq system was used to obtain TraDIS data from two independent DNA extracts of the transposon library ( TL ) , designated TL1 and TL2 ( Table 1 ) . 
+ Raw data were checked for the presence of an inline index barcode to identify independently processed samples ( Table 1 ) . 
+ This resulted in 4,818,864 sequence reads from TL1 and 6,189,409 from TL2 . 
+ After veriﬁcation of the presence of a transposon sequence and removal of poor-quality data or short sequence reads , 3,891,339 ( 80.75 % ) and 4,387,970 ( 70.89 % ) sequence reads , respectively , were mapped successfully to the E. coli K-12 BW25113 genome ( accession no. CP009273.1 ) ( Table 1 ) . 
+ The distribution of insertion sites covers the full length of the genome ( Fig. 1A ) . 
+ There was a high correlation coefﬁcient of 0.96 between the samples ( Fig. 1B ) . 
+ The data were therefore combined to give a total of 8,279,309 sequences that were mapped to 901,383 unique insertion sites throughout the genome . 
+ Of the 8,279,309 mapped sequences , 199,557 were represented by a single read . 
+ Similar numbers of insertions , 481,360 and 480,072 , were found for both orientations of the transposon . 
+ The high density of unique insertion sites resulted in an average of one insertion every 5.14 bp and a median distance between insertions of 3 bp . 
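The spacing quoted above can be checked from the counts in the text. A minimal sketch, assuming the 4,631,469-bp genome length given in the statistical analysis section; the function and variable names are illustrative and not part of the original pipeline:

```python
# Counts taken from the text; names are illustrative only.
GENOME_LENGTH_BP = 4_631_469      # E. coli BW25113 genome length (bp)
UNIQUE_INSERTION_SITES = 901_383  # unique insertion sites in TL1 + TL2 combined


def mean_spacing_bp(genome_length: int, n_sites: int) -> float:
    """Average number of base pairs per unique insertion site."""
    return genome_length / n_sites


def insertion_density(genome_length: int, n_sites: int) -> float:
    """Unique insertion sites per base pair (the inverse of the spacing)."""
    return n_sites / genome_length


spacing = mean_spacing_bp(GENOME_LENGTH_BP, UNIQUE_INSERTION_SITES)
density = insertion_density(GENOME_LENGTH_BP, UNIQUE_INSERTION_SITES)
print(f"mean spacing: {spacing:.2f} bp, density: {density:.3f} per bp")
```

The same density is the genome-wide average against which the terminus region is compared later in the text.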
+ An example is shown in Fig. 1D . 
+ Identiﬁcation of putative essential genes by TraDIS . 
+ To determine whether a gene was essential or nonessential , the numbers of insertions per CDS were quantiﬁed . 
+ CDS is deﬁned as the protein coding sequence of a gene , inclusive of the start and stop codons . 
+ To normalize for gene length , the number of unique insertion points within the CDS was divided by the CDS length in bases . 
+ This value was termed the insertion index score and has been used previously as a measure of essentiality ( 4 , 8 , 9 , 18 ) , given a sufﬁciently dense library ( 19 ) . 
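The insertion index score defined above is a simple ratio. A minimal sketch, in which the function name and the example genes are hypothetical:

```python
def insertion_index(unique_insertions: int, cds_length_bp: int) -> float:
    """Unique insertion points within a CDS divided by the CDS length in bases."""
    if cds_length_bp <= 0:
        raise ValueError("CDS length must be positive")
    if unique_insertions < 0:
        raise ValueError("insertion count cannot be negative")
    return unique_insertions / cds_length_bp


# Hypothetical genes: one well covered by insertions, one almost free of them.
well_covered = insertion_index(unique_insertions=180, cds_length_bp=900)
sparse = insertion_index(unique_insertions=3, cds_length_bp=1_500)
print(well_covered, sparse)
```

Dividing by CDS length is what makes scores comparable between short and long genes.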
+ The frequency distribution of the insertion index scores was bimodal ( see Fig. S1 in the supplemental material ) , as previously shown by others ( 2 ) . 
+ We assume that genes associated with the left mode ( any data to the left of the trough in Fig. S1 ) , which have a low number of transposon insertions , are either essential for survival or genes that , when disrupted , confer a very severe ﬁtness cost ( Fig. 1D ) . 
+ The second mode is associated with genes with considerably more insertions ; these genes are deemed nonessential ( Fig. 1D ) . 
+ Based on inspection of the distributions , an exponential distribution model was ﬁtted to the mode that includes essential genes , and a gamma distribution model was ﬁtted to the nonessential mode . 
+ For a given insertion index score , the probability of belonging to each mode was calculated , and the ratio of these values was termed the log likelihood score . 
+ A gene was classiﬁed as essential if its log likelihood score was less than log2 ( 12 ) and was therefore 12 times more likely ( see Materials and Methods ) to belong to the essential mode than to the nonessential mode . 
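The two-mode classification can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the distribution parameters are invented, the ratio is taken as essential mode over nonessential mode (so essential calls score above log2(12) here, whereas the direction in the text depends on how the ratio is oriented), and the symmetric cutoff for nonessential calls is an assumption.

```python
import math

THRESHOLD = math.log2(12)  # "12 times more likely", from the text


def exponential_pdf(x: float, rate: float) -> float:
    """Density of the exponential model fitted to the essential mode."""
    return rate * math.exp(-rate * x) if x >= 0 else 0.0


def gamma_pdf(x: float, shape: float, scale: float) -> float:
    """Density of the gamma model fitted to the nonessential mode."""
    if x <= 0:
        return 0.0
    log_pdf = ((shape - 1) * math.log(x) - x / scale
               - math.lgamma(shape) - shape * math.log(scale))
    return math.exp(log_pdf)


def log2_likelihood_ratio(index: float, rate: float,
                          shape: float, scale: float) -> float:
    """log2 of P(essential mode) / P(nonessential mode) for one score."""
    p_ess = exponential_pdf(index, rate)
    p_non = gamma_pdf(index, shape, scale)
    if p_ess == 0.0 and p_non == 0.0:
        return 0.0
    if p_non == 0.0:
        return math.inf
    if p_ess == 0.0:
        return -math.inf
    return math.log2(p_ess / p_non)


def classify(index: float, rate: float, shape: float, scale: float) -> str:
    """Essential, nonessential, or unclear (between the two cutoffs)."""
    llr = log2_likelihood_ratio(index, rate, shape, scale)
    if llr > THRESHOLD:
        return "essential"
    if llr < -THRESHOLD:
        return "nonessential"
    return "unclear"


# Hypothetical fitted parameters, chosen only to make the example run.
PARAMS = dict(rate=50.0, shape=8.0, scale=0.025)
for score in (0.0, 0.01, 0.2):
    print(score, classify(score, **PARAMS))
```

In practice the parameters would come from fitting each model to its mode of the observed score distribution.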
+ Using this approach , sufﬁcient insertions were found in 3,793 genes for them to be classed as nonessential , 162 genes were situated between the two modes and classed as unclear , and 358 genes in the mutant library were identiﬁed as essential ( Table S1 ) . 
+ The 358 putative essential genes identiﬁed in the TL data were compared to the essential genes as deﬁned by the Keio collection and the Proﬁling of the E. coli Chromosome ( PEC ) database ( 1 , 2 ) . 
+ This comparison revealed 248 genes ( 59.5 % ) that were common to all three data sets ( Fig. 2A and Table S2 ) . 
+ This agreement between all three data sets strongly supports the hypothesis that these genes are essential so they were not investigated further . 
+ An additional 169 genes were identiﬁed as potentially essential in only one or two of the data sets . 
+ These genes comprise 16 genes in the Keio and PEC lists that were not identiﬁed by our analysis , 25 exclusive to Keio , 18 exclusive to PEC , and 11 and 18 that overlapped between our method and Keio or PEC , respectively ( Fig. 2A ) . 
+ However , the largest subcategory of 81 genes is unique to our data set . 
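The counts in this comparison can be cross-checked with a few lines of arithmetic. In the sketch below the dictionary keys are illustrative labels for the regions of Fig. 2A; the numbers are those quoted in the text:

```python
# Region sizes quoted in the text; keys are illustrative labels.
venn = {
    "all_three": 248,
    "keio_and_pec_only": 16,
    "keio_only": 25,
    "pec_only": 18,
    "tradis_and_keio_only": 11,
    "tradis_and_pec_only": 18,
    "tradis_only": 81,
}

union_total = sum(venn.values())
not_shared_by_all = union_total - venn["all_three"]
tradis_total = (venn["all_three"] + venn["tradis_and_keio_only"]
                + venn["tradis_and_pec_only"] + venn["tradis_only"])
shared_percent = 100 * venn["all_three"] / union_total

print(union_total, not_shared_by_all, tradis_total, f"{shared_percent:.1f} %")
```

The regions containing TraDIS calls sum to the 358 putative essential genes, and the 248 shared genes make up 59.5 % of the combined list.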
+ Statistical analysis of the transposon insertion density data . 
+ Overestimation of the number of genes that are essential has been noted in studies using transposon insertion sequencing ( 16 ) . 
+ In previous attempts to use statistical analysis to deﬁne an essential gene , a Poissonian model was used to derive a P value for an insertion-free region ( IFR ) of a given length against the null hypothesis that , by chance , no insertions occurred in that region . 
+ We reﬁned this approach for two reasons . 
+ First , genomes are sequences of discrete sites : although a continuous Poisson model can provide an approximation to this structure , a naturally discrete picture is more representative of true genome structure . 
+ Second , unless corrections are applied for gene length or for the genome length , this method risks overestimating the total number of essential genes . 
+ This problem arises because the method implicitly considers only a single , small genomic region , giving the probability that no insertions will be found in a single region of a given base pair length . 
+ However , genes and genomes have many such regions that are effectively independent , so the genome-wide probability of observing a `` false-positive '' insertion-free region across the genome will be much higher . 
+ To avoid this risk of overinterpretation of TraDIS data , we propose a new statistical approach , summarized in Text S1 and Fig. S2 . 
+ First , we replaced the commonly used Poissonian model $\exp(-x/f)$ ( for x consecutive bases without an insert , given inverse insertion density f ; see reference 27 for further discussion of this ) with a geometric model . 
+ This model gives the probability of seeing k `` failures '' ( insertion-free sites ) then a `` success '' ( insertion event ) in a string of independent trials as $P(k) = (1 - \rho)^k \rho$ , where $\rho$ is the probability of a success ( here , an insertion ) . 
+ The P value associated with a string of L sites being insertion-free is then $P = \sum_{k \geq L} P(k)$ , an easily computable quantity . 
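The geometric tail sum has a simple closed form: summing (1 - rho)^k rho over k >= L gives (1 - rho)^L. A minimal sketch comparing it with the Poissonian approximation, assuming the per-site insertion probability equals the library-wide density (901,383 sites in 4,631,469 bp); all function names are illustrative:

```python
import math

RHO = 901_383 / 4_631_469  # assumed per-site insertion probability


def geometric_pmf(k: int, rho: float) -> float:
    """Probability of k insertion-free sites followed by one insertion."""
    return (1 - rho) ** k * rho


def ifr_p_value(length: int, rho: float) -> float:
    """Closed form of the tail sum: P(at least `length` empty sites)."""
    return (1 - rho) ** length


def ifr_p_value_by_summation(length: int, rho: float, terms: int = 2000) -> float:
    """Truncated tail sum, kept only to check the closed form numerically."""
    return sum(geometric_pmf(k, rho) for k in range(length, length + terms))


def poisson_p_value(length: int, rho: float) -> float:
    """Continuous approximation exp(-x/f), with f = 1/rho."""
    return math.exp(-rho * length)


for L in (36, 47, 75):
    print(L, ifr_p_value(L, RHO), poisson_p_value(L, RHO))
```

These are uncorrected single-region values; they do not account for gene or genome length.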
+ Next , to guard against false-positive results , we need to precisely state the statistic of interest and the corresponding null model . 
+ Under a null model of random , independent insertions , the three probabilities most pertinent here are those with which ( i ) a single length L region has no insertions ; ( ii ) a gene of length g contains one or more insertion-free regions of length L ; ( iii ) a genome of length G contains one or more insertion-free regions of length L . 
+ We used stochastic simulations of random insertions with given densities and genome lengths ( Text S1 ) to compute these probabilities . 
+ These values then give P values for insertion-free region observations , correcting for gene and genome length . 
+ Speciﬁcally , pgene ( L ) is the probability of observing one or more insertion-free regions of at least length L in a model gene ( of length g = 1,000 bp ) by chance ( ii ) , and pgenome ( L ) is the probability of observing one or more insertion-free regions of at least length L in a full genome ( of length G = 4.6 Mb ) by chance ( iii ) . 
+ The uncorrected P value ( i ) is that typically reported in other studies . 
+ Statistical analysis of our current data ( 901,383 inserts in a 4,631,469-bp genome ) gives a corrected pgenome of 0.05 for L = 75 bp and pgene of 0.05 for L = 36 bp ( pgene of 0.005 for L = 47 bp ) . 
+ In other words , there is a probability of 0.05 that any insertion-free region of length 75 bp could appear anywhere in the genome by chance , and there is a probability of 0.005 that any insertion-free region of length 47 bp will occur anywhere in a gene of length 1,000 bp by chance . 
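The gene-level correction can be illustrated with a small Monte Carlo simulation. This is a sketch under a simplified null model (each site independently carries an insertion with probability rho); the simulations described in Text S1 may differ in detail, so the estimates are not expected to reproduce the quoted values exactly. All names are hypothetical.

```python
import random


def longest_insertion_free_run(length: int, rho: float, rng: random.Random) -> int:
    """Simulate one region and return its longest run of insertion-free sites."""
    longest = current = 0
    for _ in range(length):
        if rng.random() < rho:   # this site carries an insertion
            current = 0
        else:                    # this site is insertion-free
            current += 1
            if current > longest:
                longest = current
    return longest


def estimate_p_region(length: int, rho: float, run_lengths,
                      trials: int = 2000, seed: int = 0) -> dict:
    """Estimate P(one or more insertion-free runs of >= L sites) for each L."""
    rng = random.Random(seed)
    longest_runs = [longest_insertion_free_run(length, rho, rng)
                    for _ in range(trials)]
    return {L: sum(run >= L for run in longest_runs) / trials
            for L in run_lengths}


if __name__ == "__main__":
    rho = 901_383 / 4_631_469  # library-wide insertion density
    print(estimate_p_region(length=1_000, rho=rho, run_lengths=(36, 47)))
```

All run lengths are scored on the same simulated regions, so the estimates cannot increase with L. The genome-level value would need the same procedure with a 4.6-Mb region, which is slow in pure Python.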
+ To our knowledge , this represents the ﬁrst study with a conﬁdent and genome-wide corrected detection resolution ( Fig. S2 ) , and the closest yet to approaching the length of the smallest annotated gene in our reference genome ( accession no. CP009273.1 ) , which is 45 bp . 
+ In checking for uniformity of insertion density across genomic regions , we found that the density of insertions around the terminus ( taken as a region centered around terABCD ) was slightly lower than the genomic average ( a density of 0.142 in the surrounding 500-kb region , or 0.145 in the surrounding 1-Mb region , compared to a 0.195 average ; Fig. 1A ) . 
+ This density change marginally increases the detection of false-positive essential genes in the vicinity of the terminus but still represents an unprecedented level of coverage . 
+ Resolution of conﬂicts between data sets . 
+ A critical requirement for the validation of a list of essential genes is to explain why the statistical analysis of transposon insertion data failed to identify genes that the Keio library of deletion mutants and the PEC database identiﬁed as essential . 
+ We coupled statistical analysis and manual inspection of the data with literature searches to rationalize conﬂicting results . 
+ We ﬁnd that many of the inconsistencies between data sets can be explained by different methodologies used , deﬁnitions of the term `` essential , '' and statistical approaches ( Fig. 2B ) . 
+ Genes containing transposon-free regions . 
+ Manual inspection of the data revealed genes with transposon-free regions that were large enough to be identiﬁed as signiﬁcant using the algorithm deﬁned in the previous section . 
+ These IFRs do not necessarily report that a gene is essential ; rather , they show that the insertions within these genes are sufﬁciently sparse that the IFR is unlikely to have occurred by chance . 
+ These genes fall loosely into two groups . 
+ The ﬁrst group contains genes for which the 5′ regions are essential and contain no insertions . 
+ However , there are transposon insertions in the nonessential regions of these genes , such as ftsK ( Fig. 3A and Table S3 ) . 
+ FtsK is involved in correct segregation of the chromosome during division ( 20 , 21 ) ; the N-terminal domain of FtsK contains four transmembrane passes and is required for localization of FtsK to the septum ( 22 -- 24 ) . 
+ There is substantial literature reporting the essential function of the N-terminal domain , consistent with our data ( 21 , 22 , 25 ) . 
+ This is a common observation for insertion data and arises when only the function of the N terminus of the protein is required for viability ( 8 ) . 
+ Initial analysis of transposon insertion data would lead to these genes being incorrectly classiﬁed as nonessential , but attempts to construct a deletion mutant would fail . 
+ Indeed , previous transposon sequencing experiments failed to identify the essential nature of some of these genes when relying on statistical analysis alone ( 9 ) . 
+ The second group contains genes with transposon insertion sites throughout the CDS but which have an IFR that passes the signiﬁcance threshold for essentiality . 
+ For example , there is a small IFR within the coding sequence of secM of 66 bp ( Fig. 3B and Table S3 ) . 
+ The secM gene is located upstream of the essential gene secA . 
+ These genes are cotranscribed and also cotranslated , and secM is known to contain a translational stop sequence that interacts with the ribosomal exit tunnel to halt translation , acting as a translational regulator for secA . 
+ Speciﬁc mutations within the translational stop sequence are lethal unless secA is complemented by expression from a plasmid ( 26 ) . 
+ The dependence of secA translation on the secM CDS would explain the Keio classiﬁcation as `` essential . '' 
+ However , the IFR within secM does not fully correspond with the translation stop sequence , suggesting that there is more to be learned about the translational linkage between the two proteins . 
+ Other researchers have used different approaches to minimize false classiﬁcation of essential genes during statistical analysis of the insertion proﬁles by applying a sliding window , quantifying the mean distance between insertions per gene , or variations of truncating the CDS , such as excluding the 3′ end , analyzing only the ﬁrst 60 % of the CDS , or analyzing the central 60 % of the CDS ( 18 , 19 , 27 -- 31 ) . 
+ However , window analysis may overlook genes such as secM and analyzing only the ﬁrst 60 % of the CDS would overlook genes such as ftsK . 
+ We suggest that the algorithmic approach used here is a more appropriate method for identifying essential chromosomal regions in a sufﬁciently dense library . 
+ However , we see a number of IFRs of 45 bp throughout the genome within nonessential genes , suggesting that our null model of random insertions is not capturing the full structural detail of transposon insertion propensity . 
+ This suggests our modeling approach is not based on a perfect representation of biological reality and needs further reﬁnement . 
+ Polar insertions . 
+ A common feature when creating insertion mutants is the introduction of off-target polar effects where expression of adjacent genes is disrupted by the insertion . 
+ To mitigate against such polar effects , we designed a cassette that enabled both transcriptional and translational read-through in one direction only . 
+ To conﬁrm that transcriptional and translational read-through emanates from the transposon , the transposon was cloned in both orientations and in all three reading frames upstream of the lacZ gene in transcription and translation expression vectors pRW224 and pRW225 , derivatives of pRW50 ( 32 , 33 ) . 
+ Transcriptional read-through was conﬁrmed for one orientation of the transposon , consistent with transcriptional read-through from the chloramphenicol resistance cassette into the downstream disrupted CDS ( Fig. 4 ) . 
+ Translational read-through was identiﬁed for two of the three open reading frames that coincided with AUG and GUG start codons in the inverted repeat at the end of the transposon . 
+ More β-galactosidase activity was obtained from the construct in which the AUG codon was in frame than when the GUG codon was in frame , conﬁrming that translation was initiated more strongly from the AUG codon . 
+ Therefore , transcription is initiated from within the transposon , and translation is initiated from within the inverted repeat . 
+ This allows transcription and translation of downstream essential regions , even from within a CDS . 
+ Such events can be identiﬁed by determining to which DNA strand the sequencing data maps ( Fig. 4C ) . 
+ Analysis of our data reveals a number of chromosomal regions with insertions in only one orientation . 
+ Such insertion proﬁles can offer insight into transcriptional regulation of genes when considered in conjunction with neighboring genes . 
+ For example , the gene rnc is located in an operon upstream of the essential gene , era . 
+ Only mutants with transposons that maintain downstream transcription of era are viable ( Fig. 3C ) . 
+ Baba et al. categorized rnc as essential ( 1 ) . 
+ However , in the case of the Keio library , construction of an rnc deletion mutant would disrupt the ability of the native promoter to drive downstream expression of the essential era gene , resulting in apparent lethality . 
+ Similarly , in both the Keio and PEC databases , yceQ is listed as essential , yet we observed many insertions in yceQ , albeit in only one orientation ( Fig. 3D ) . 
+ The gene is located upstream of the essential gene rne and is divergently transcribed . 
+ The promoter for rne is positioned within yceQ ( 35 , 36 ) , and deletion of yceQ would remove the promoter for rne , resulting in an apparent lethal effect . 
+ Our data reveal that while era and rne are essential , rnc and yceQ are not essential . 
+ Like rnc and yceQ , several of the antitoxin genes are reported to be essential in the Keio library but not in our data set or the PEC database ( Table S3 ) . 
+ Antitoxins are required only if the corresponding toxin gene is functional . 
+ One example is yefM . 
+ We observed a substantial number of insertions in one orientation . 
+ Unlike rnc and yceQ where insertions maintained downstream expression , in the case of yefM , the opposite is true ; insertions that disrupt expression of the antitoxin but maintain downstream expression of the downstream toxin ( yoeB ) are lethal ( Fig. 3E ) . 
+ Scrutiny of our data in this manner reveals that these genes are essential . 
+ Another example of insertion bias is observed in a number of genes at the 3′ end of a transcript , such as rplI ( Fig. 3F ) . 
+ While rplI is not reported as essential , it is worth noting because insertions restricted exclusively to one orientation within the gene can not be explained by the positional context between an essential gene and promoter . 
+ One possible explanation for this observation is that transcription promoted from the transposon produces an antisense RNA that inhibits expression of an essential gene . 
+ Insertion bias , irrespective of the underlying cause , can result in false classiﬁcation of genes when quantifying insertion index scores , as these genes have half as many insertions relative to the rest of the genome . 
+ As such , these insertion proﬁles are to be considered when analyzing data with automated statistical approaches . 
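One way to surface such profiles before manual inspection is to flag genes whose insertions fall almost entirely in one transposon orientation. A minimal sketch; the thresholds and the function name are hypothetical choices, not values from the study:

```python
from typing import Optional


def orientation_bias(forward: int, reverse: int,
                     min_insertions: int = 10,
                     min_fraction: float = 0.95) -> Optional[str]:
    """Return the dominant orientation if insertions are strongly one-sided.

    `forward` and `reverse` are unique insertion counts per transposon
    orientation within one gene. Genes with too few insertions are not
    assessed, because a one-sided split is then likely to arise by chance.
    """
    total = forward + reverse
    if total < min_insertions:
        return None
    if forward / total >= min_fraction:
        return "forward"
    if reverse / total >= min_fraction:
        return "reverse"
    return None


# Hypothetical per-gene counts for illustration.
print(orientation_bias(forward=42, reverse=0))   # one-sided
print(orientation_bias(forward=30, reverse=28))  # balanced
print(orientation_bias(forward=3, reverse=0))    # too few insertions to judge
```

Flagged genes would still need gene-by-gene inspection, since the bias can reflect either a downstream essential gene or another cause.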
+ Conditionally essential genes . 
+ In addition to the scenarios listed above , certain genes present challenges for binary classiﬁcation of essentiality . 
+ For example , a gene might code for a protein that is essential at a speciﬁc phase of growth , or for growth under certain environmental parameters such as temperature or nutrient availability . 
+ Our data reveal a range of these conditionally essential genes . 
+ For instance , the Keio and PEC databases list folK as essential , whereas we detected multiple insertions within folK ( Fig. 3G ) . 
+ Loss of folK disrupts the ability of the bacterium to produce folate , which is an essential metabolite . 
+ However , supplementation of the medium with folate abrogates the requirement for folate biosynthesis . 
+ In addition to folK , the Keio and PEC databases report degS as essential . 
+ In our data set , degS has a high density of insertions throughout the CDS , suggesting that degS is not essential for growth on an agar plate ( Fig. 3H ) . 
+ Consistent with this , there is substantial literature showing that degS mutants can be isolated , but they either lyse in the stationary phase of growth or rapidly accumulate suppressor mutations ( 37 -- 40 ) . 
+ The conditional essentiality of such mutants can be tested by growing the transposon library in liquid broth . 
+ One would expect that mutants lacking degS will lyse and that folK mutants will be outcompeted as the limited folate available in the medium is depleted . 
+ To test these scenarios , two independent samples of the transposon library were grown in Luria broth ( LB ) at 37 °C for 5 or 6 generations to an optical density at 600 nm ( OD600 ) of 1.0 and were then sequenced . 
+ These samples , LB1 and LB2 , resulted in 5,908,163 and 6,403,324 sequences of which 5,201,711 ( 88.04 % ) and 5,382,477 ( 84.06 % ) , respectively , were mapped to the E. coli BW25113 genome ( Table 1 ) . 
+ Insertion index scores were calculated as before ( Table S4 ) . 
+ As there was a high correlation coefﬁcient of 0.97 between the gene insertion index scores of each technical replicate ( Fig. 1C ) , the data were combined to give a pool of 10,584,188 sequences . 
+ Scrutiny of our data revealed substantially fewer degS and folK mutants after growth in LB , supporting our hypothesis that they are conditionally essential ( Fig. 3G and H ) . 
+ Other genes showing similar ﬁtness costs can be identiﬁed in the LB outgrowth data set ( Table S4 ) . 
+ Errors in library construction . 
+ The difﬁculty in classifying a gene as essential through deletion analysis is the dependence on a negative result to inform classiﬁcation . 
+ Thus , failure to knock out the gene may result in the false classiﬁcation of a gene as essential . 
+ For example , the Keio database originally reported mlaB ( yrbB ) as essential . 
+ However , our data demonstrate that mlaB is nonessential , and this is supported by the literature ( 41 , 42 ) . 
+ We have observed similar outcomes for several other genes ( Table S3 ) . 
+ The reason why knockouts of these genes were not obtained in the construction of the Keio library is unknown . 
+ In addition to the false-positive outcomes described above , we noted several instances of false-negative results within the Keio library database . 
+ For example , both our TraDIS data and the PEC database identiﬁed 18 genes as essential that are reported as nonessential in the original Keio database ( Table S2 ) . 
+ Subsequently , Yamamoto et al. ( 34 ) demonstrated that for 13 of these mutants , the target gene was duplicated during construction of the Keio library , resulting in a functional protein ; these genes are almost certainly essential . 
+ Another difﬁculty that arises when targeting essential genes for mutagenesis is the potential to select for mutants with a compensatory mutation elsewhere in the genome . 
+ Our data revealed that hda is an essential gene , but it is classiﬁed as nonessential in the Keio database . 
+ Since the initial description of the Keio library , hda has been reported to be essential , but hda mutants rapidly accumulate suppressor mutations that restore viability ( 43 -- 45 ) . 
+ We hypothesize that this is an explanation for the observed essentiality of some genes in the TraDIS data set that were described as nonessential by others ( Table S3 ) . 
+ These effects may arise when creating TraDIS libraries , but the effects are masked by the large number of mutants in the population . 
+ Similarly , in the PEC library , where insertion density is low , essential genes with an insertion in a nonessential region of the gene will be falsely classiﬁed as nonessential when relying on single insertion mutants to inform essentiality . 
+ An example of this false-negative classiﬁcation in the PEC database is tadA ( Table S3 ) . 
+ The TadA protein is a tRNA-speciﬁc deaminase , and its essentiality is reported in the Keio database and our data set and is supported by the literature ( 46 ) . 
+ The PEC database reports a single insertion site within the extreme 3′ end of the tadA gene . 
+ We have identiﬁed a range of underlying causes behind data set discrepancies and highlight that there are numerous possible insertion proﬁles for an `` essential '' gene . 
+ As such , it is important to note that no single statistical method , to our knowledge , would fully identify every essential gene and that manual inspection of data is crucial . 
+ Genes identiﬁed as essential only by TraDIS . 
+ There are 81 genes identiﬁed as essential using our insertion index data , which are not reported as essential in the Keio or PEC database ( Table 2 ) . 
+ These genes fall into two groups , those with no insertions and the remainder with insertions in the CDS . 
+ The ﬁrst group is most likely to be essential . 
+ For example , rpsU is essential in our data and has been described as essential by others ( Fig. 5A ) ( 47 ) . 
+ However , in the Keio library , there is a duplication event , which gives rise to a mutant that produces a functional protein ( 34 ) . 
+ Scrutiny of our data for the remaining genes reveals that there are additional essential genes with a low frequency of insertions . 
+ For instance , holD has been described in the literature as an essential gene ( 48 ) . 
+ Our data support that ﬁnding ( Fig. 5B and Table S1 ) . 
+ However , holD mutants are available in the Keio collection . 
+ The demonstration by Durand et al. and others ( 48 -- 50 ) that holD mutants accumulate extragenic suppressor mutations at high frequency may explain why these mutants are considered nonessential in the Keio database and why we observe a low frequency of insertions in our experiments . 
+ A number of the genes unique to our analysis were not identiﬁed as essential in the Keio collection or PEC database simply because they are not included in either of these data sets . 
+ This is in part because the Keio collection of knockout mutants was based on available annotation data at the time ( 51 ) . 
+ For example , the identiﬁcation and location of ynbG , yobI , and yqcG were published only in 2008 ( 52 ) . 
+ These genes show very sparse or no transposon disruption in our data , and consequently , these genes are potentially essential ( Fig. 5C , D , and E ) . 
+ Further validation studies would be required to conﬁrm this . 
+ As mentioned previously , overreporting of essential genes may occur when nonessential genes have low insertion index scores . 
+ Such low insertion index scores may arise due to attenuated growth . 
+ An example of gene misclassiﬁcation because mutation results in a ﬁtness cost and attenuated growth is guaA . 
+ The low insertion index score results in guaA being classed as essential despite having many insertions . 
+ The ﬁtness effect was conﬁrmed by growing the library in LB , as such mutants are outcompeted ( Fig. 5F ) , and the literature supports the fact that this gene is not essential and has an altered growth rate ( 53 ) . 
+ High-resolution features within a TraDIS data set . 
+ Manual inspection of a TraDIS data set can reveal additional information that might go unnoticed in a high-throughput analysis pipeline . 
+ A common observation from this and previous detailed analysis of data from saturated transposon libraries is the ability to determine , at the base pair level of resolution , the boundaries of essential regions within a gene . 
+ An example of an essential gene with a dispensable 3′ end is yejM ( pbgA in Salmonella enterica serotype Typhimurium ) . 
+ Only the 5′ end of the CDS is essential , up to and including codon 189 , which corresponds with ﬁve transmembrane helices of the protein structure ; the C terminus of the protein is a periplasmic domain that is dispensable for viability ( Fig. 6A ) ( 54 -- 56 ) . 
+ Our TraDIS data revealed insertions in codons 186 and 189 . 
+ Analysis of the transposon orientation at these points revealed that they corresponded with the same transposon insertion location but , due to the 9-bp duplication introduced by the transposon , in different transposon orientations . 
+ The introduced transposon sequence maintains codon 189 , completely consistent with previously reported results ( 54 , 56 ) . 
+ In addition , as a result of our transposon design , a further feature of our TraDIS data is the identiﬁcation of genes with dispensable 5′ ends . 
+ An example of this is yrfF , which encodes an inhibitor of the Rcs stress response ( Fig. 6B ) ( 57 , 58 ) . 
+ This phenomenon , while less well covered in the literature , is not surprising , given that Zhang et al. report equal likelihood of a required intragenic region residing at the 5′ or 3′ end of a gene , albeit in Mycobacterium tuberculosis ( 31 ) . 
+ These mutants will be viable only if the remaining CDS can be translated into a functional product , and one would expect to ﬁnd an orientation bias where the transposon drives downstream transcription and translation of the essential region . 
+ Interestingly , inspection of our data revealed essential genes with isolated insertions within the coding sequence . 
+ An example of this is grpE . 
+ The grpE gene codes for the essential nucleotide exchange factor that forms a dimer and interacts with the DnaK/J complex ( 59 ) . 
+ The isolated insertion occurs only in the orientation that maintains expression of the remaining CDS ( Fig. 6C ) . 
+ Mapping of the site of transposon insertion onto the previously determined protein structure of GrpE indicated that the insertion occurred within the part of the gene encoding a ﬂexible linker between two α-helices
+ MATERIALS AND METHODS
+ Strains and plasmids . 
+ E. coli K-12 strain BW25113 , the parent strain of the Keio library , was used for construction of a transposon library . 
+ The strain has the following genotype : rrnB3 ΔlacZ4787 hsdR514 Δ ( araBAD ) 567 Δ ( rhaBAD ) 568 rph-1 ( 65 ) . 
+ The transposon mutant library was constructed by collaborators from Discuva Ltd. , Cambridge , United Kingdom , following a method described for Salmonella Typhi ( 4 ) . 
+ The main differences were that a mini-Tn5 transposon coding for a chloramphenicol resistance cassette was used . 
+ This was ampliﬁed by PCR from the cat gene of the plasmid vector pACYC184 ( 66 ) using oligonucleotide primers incorporating the Tn5 transposon mosaic ends . 
+ Transposomes were prepared using Tn5 transposase ( Epicentre , Madison , WI , USA ) , and these were introduced into E. coli K-12 strain BW25113 by electrotransformation . 
+ Transposon mutants were selected by growth on LB agar supplemented with chloramphenicol . 
+ Approximately 5.6 million colonies representing an estimated 3.7 million mutants were pooled and stored in 15 % glycerol at −80 °C . 
+ Media and growth conditions . 
+ DNA was extracted from two samples of the transposon library glycerol stock to generate TraDIS data referred to as TL1 and TL2 in the text . 
+ In addition , DNA was extracted from two independent cultures , LB1 and LB2 , of the library grown in Luria broth ( LB ) ( 10 g tryptone , 5 g yeast extract , 10 g NaCl ) and grown for generations at 37 °C with shaking until the culture reached an optical density at 600 nm ( OD600 ) of 1.0 . 
+ β-Galactosidase assay . 
+ β-Galactosidase assays were used to measure the activity of transposon::lacZ fusions . 
+ The transposon was cloned in each orientation , for all three open reading frames , into transcription and translation assay vectors pRW224 and pRW225 ( 33 ) . 
+ Strains carrying the transposon::lacZ fusions were grown overnight at 37 °C with aeration in LB supplemented with 35 µg/ml tetracycline ( Sigma ) . 
+ The density of the overnight culture was determined by measuring OD650 and then used to subculture into 5 ml LB and incubated at 37 °C with aeration until the mid-exponential phase of growth ( OD650 of 0.3 to 0.5 ) . 
+ Each culture was lysed by adding 100 µl each of toluene and 1 % sodium deoxycholate , mixed by vortexing for 15 s and aerating for 20 min at 37 °C . 
+ The β-galactosidase activity of each culture was assayed by the addition of 100 µl of each culture lysate for three technical replicates to 2.5 ml Z buffer ( 10 mM KCl , 1 mM MgSO4 · 7H2O , 60 mM Na2HPO4 , 30 mM NaH2PO4 · 2H2O supplemented with 2.7 ml β-mercaptoethanol per liter of distilled water , adjusted to pH 7 ) supplemented with 13 mM 2-nitrophenyl-β-D-galactopyranoside ( ONPG ) ( Sigma ) . 
+ The reaction mixture was incubated at 37 °C until a yellow color had developed , after which the reaction was stopped by adding 1 ml of 1 M sodium carbonate . 
+ The absorbance of the reaction at 420 nm ( OD420 ) was measured , and β-galactosidase activity was calculated in Miller units . 
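The text does not say which form of the Miller unit calculation was applied. As a rough sketch only, the commonly used simplified form (without a light-scattering correction) can be written as below. The function name, argument names, and example numbers are hypothetical, and the use of the OD650 culture reading as the density term is an assumption.

```python
def miller_units(a420: float, minutes: float, volume_ml: float, culture_od: float) -> float:
    """Simplified Miller units: 1000 * A420 / (t * v * OD).

    a420       -- absorbance of the stopped reaction at 420 nm
    minutes    -- reaction time in minutes
    volume_ml  -- volume of lysate added to the reaction, in ml
    culture_od -- optical density of the culture (assumed here: OD650)
    """
    if minutes <= 0 or volume_ml <= 0 or culture_od <= 0:
        raise ValueError("time, volume and culture density must be positive")
    return 1000.0 * a420 / (minutes * volume_ml * culture_od)


if __name__ == "__main__":
    # Hypothetical reading: 100 µl lysate (0.1 ml), 10 min reaction.
    print(miller_units(a420=0.5, minutes=10, volume_ml=0.1, culture_od=0.4))
```

If cell debris is left in the cuvette, a scatter correction term would be subtracted from the A420 reading before applying this formula.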
+ TraDIS sequencing . 
+ Harvested cells were prepared for sequencing following an amended TraDIS protocol ( 4 , 8 , 9 ) . 
+ Genomic DNA was isolated using a Qiagen QIAamp DNA blood minikit , according to the manufacturer 's speciﬁcations . 
+ DNA was quantiﬁed and mechanically sheared by ultrasonication . 
+ Sheared DNA fragments were processed for sequencing using NEB Next Ultra I kit . 
+ Following adaptor ligation , a PCR step was introduced to enrich for transposon-containing fragments , using a forward primer specific for the transposon 3′ end and a reverse primer specific for the adaptor . 
+ After PCR puriﬁcation , an additional PCR prepared DNA for sequencing through the addition of Illumina-speciﬁc ﬂow cell adaptor sequences and custom inline index barcodes of variable length in the forward primers . 
+ The purpose of this was to increase indexing capacity while staggering introduction of the transposon sequence to increase base diversity during sequencing . 
+ Samples were sequenced using Illumina MiSeq 150 cycle v3 cartridges , aiming for an optimal cluster density of 800 clusters per mm2 . 
+ Sequencing analysis . 
+ Raw data were collected and analyzed using a series of custom scripts . 
+ The Fastx barcode splitter and trimmer tools , of the Fastx toolkit , were used to assess and trim the sequences ( 67 ) . 
+ Sequence reads were ﬁrst ﬁltered by their inline indexes , allowing no mismatches . 
+ Transposon similarity matching was done by identifying the first 35 bp of the sequenced transposon in two parts : 25 bases ( 5′ to 3′ , corresponding to the PCR2 primer binding site ) were matched , allowing for three mismatches , and trimmed ; the remaining 10 bases ( corresponding to the sequenced transposon ) were then matched , allowing for one mismatch , and trimmed . 
+ Sequences less than 20 bases long were removed using Trimmomatic ( 68 ) . 
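The two-stage matching and the length filter described above can be sketched in plain Python. This is an illustration, not the authors' custom scripts and not the Fastx or Trimmomatic tools they used: the sequences `PRIMER_SITE` and `TN_END` are made-up placeholders of the right lengths (25 and 10 bases), and mismatches are counted as simple substitutions.

```python
def hamming(a: str, b: str) -> int:
    """Count mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def trim_transposon(read, primer_site, tn_end,
                    max_primer_mm=3, max_tn_mm=1, min_len=20):
    """Return the genomic remainder of a read, or None if any filter fails.

    Stage 1: the first len(primer_site) bases must match the primer site
             with at most max_primer_mm mismatches.
    Stage 2: the next len(tn_end) bases must match the transposon end
             with at most max_tn_mm mismatches.
    Stage 3: the trimmed remainder must be at least min_len bases long.
    """
    p, t = len(primer_site), len(tn_end)
    if len(read) < p + t:
        return None
    if hamming(read[:p], primer_site) > max_primer_mm:
        return None
    if hamming(read[p:p + t], tn_end) > max_tn_mm:
        return None
    remainder = read[p + t:]
    return remainder if len(remainder) >= min_len else None


# Placeholder sequences (hypothetical, chosen only for their lengths).
PRIMER_SITE = "ACGTA" * 5      # 25 bases
TN_END = "GATTACAGAT"          # 10 bases

if __name__ == "__main__":
    genomic = "ACGT" * 6       # 24 bases of downstream genomic sequence
    print(trim_transposon(PRIMER_SITE + TN_END + genomic, PRIMER_SITE, TN_END))
```

Only the returned remainder would be passed on to alignment; reads returning `None` are discarded.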
+ Trimmed , filtered sequences were then aligned to the reference genome E. coli BW25113 ( accession no. CP009273.1 ) , obtained from the NCBI genome repository ( 69 ) . 
+ Where gene names differed between databases , the BW25113 annotation was used . 
+ The aligner bwa was used , with the mem algorithm ( 0.7.8-r455 [ 75 ] ) . 
+ Aligned reads were ﬁltered to remove any soft clipped reads . 
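One way to drop soft-clipped alignments is to inspect the CIGAR string, which is the sixth tab-separated column of a SAM record; a soft clip appears there as an `S` operation. The sketch below assumes uncompressed SAM text as input. The function names and the example records are hypothetical, and the authors' actual filtering script is not described in the text.

```python
def is_soft_clipped(sam_record: str) -> bool:
    """True if the alignment's CIGAR string contains a soft-clip (S) operation."""
    fields = sam_record.rstrip("\n").split("\t")
    cigar = fields[5]  # SAM column 6 holds the CIGAR string
    return "S" in cigar


def drop_soft_clipped(sam_lines):
    """Yield header lines and alignments that are not soft clipped."""
    for line in sam_lines:
        if line.startswith("@") or not is_soft_clipped(line):
            yield line


if __name__ == "__main__":
    example = [
        "@HD\tVN:1.6",
        "read1\t0\tref\t100\t60\t50M\t*\t0\t0\t*\t*",
        "read2\t16\tref\t200\t60\t5S45M\t*\t0\t0\t*\t*",
    ]
    for kept in drop_soft_clipped(example):
        print(kept)
```

Filtering on the CIGAR string keeps the check independent of any particular aligner flag or tag.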
+ The subsequent steps of conversion from SAM ( sequence alignment/map ) ﬁles to BAM ( binary version of SAM ) ﬁles , and the requisite sorting and indexing , were done using SAMtools ( 0.1.19-44428cd [ 70 ] ) . 
+ The BEDTools suite was used to create BED ( browser extensible data ) files which were intersected against the coding sequence boundaries defined in general feature format ( .gff ) files obtained from the NCBI ( 71 ) . 
+ Custom python scripts were used to quantify insertion sites within the annotated CDS boundaries . 
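The custom scripts themselves are not shown. As a minimal sketch of the counting step, the function below counts unique insertion coordinates that fall within one CDS and divides by the CDS length. Normalizing by gene length is how an insertion index is conventionally defined in TraDIS analyses, but treating it as the definition used here is an assumption, as are the function name and the example coordinates.

```python
from bisect import bisect_left, bisect_right


def insertion_index(sites, cds_start, cds_end):
    """Count insertion sites inside a CDS and normalize by CDS length.

    sites     -- sorted list of unique insertion coordinates (1-based)
    cds_start -- first base of the CDS (1-based, inclusive)
    cds_end   -- last base of the CDS (1-based, inclusive)
    Returns (number of sites in the CDS, sites per base).
    """
    if cds_end < cds_start:
        raise ValueError("cds_end must not be smaller than cds_start")
    n_sites = bisect_right(sites, cds_end) - bisect_left(sites, cds_start)
    length = cds_end - cds_start + 1
    return n_sites, n_sites / length


if __name__ == "__main__":
    unique_sites = [5, 100, 150, 199, 200, 450]   # hypothetical coordinates
    print(insertion_index(unique_sites, 100, 200))
```

Binary search on the sorted coordinate list keeps the per-gene lookup fast even with millions of insertion sites.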
+ Data were inspected manually using the Artemis genome browser ( 72 ) . 
+ Essential gene prediction . 
+ The frequency of insertion index scores was plotted in a histogram using the Freedman-Diaconis rule for choice of bin widths ( see Fig. S1 in the supplemental material ) . 
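The Freedman-Diaconis rule sets the histogram bin width to 2 × IQR × n^(−1/3), where IQR is the interquartile range and n the number of observations. A small sketch follows; the quartile interpolation method ("inclusive") is an assumption, since the text does not say which quantile definition was used, and the function name is hypothetical.

```python
import statistics


def freedman_diaconis_width(values):
    """Bin width from the Freedman-Diaconis rule: 2 * IQR / n**(1/3)."""
    data = sorted(values)
    if len(data) < 2:
        raise ValueError("need at least two observations")
    # quantiles(..., n=4) returns the three quartile cut points.
    q1, _median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return 2.0 * (q3 - q1) / len(data) ** (1.0 / 3.0)


if __name__ == "__main__":
    # Hypothetical insertion index scores.
    scores = [0.0, 0.001, 0.002, 0.05, 0.08, 0.09, 0.1, 0.12]
    print(freedman_diaconis_width(scores))
```

Because the width depends on the IQR rather than the standard deviation, it is less sensitive to the long tail of the insertion index distribution.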
+ Using the R MASS library ( http://www.r-project.org ) , an exponential distribution ( red line ) was fitted to the left , `` essential '' mode ( i.e. , any data to the left of the trough in Fig. S1 ) ; a gamma distribution ( blue line ) was fitted to the right , `` nonessential '' mode ( i.e. , any data to the right of the trough ) . 
+ The probability of a gene belonging to each mode was calculated , and the ratio of these values was used to calculate a log likelihood score . 
+ Using a 12-fold likelihood threshold , based on the log likelihood scores , genes were assigned as `` essential '' if they were 12 times more likely to be in the left mode than in the right mode , and `` nonessential '' if they were 12 times more likely to be in the right mode ( 9 ) . 
+ Genes with log likelihood scores between the upper and lower log2 ( 12 ) threshold values of 3.6 and −3.6 , respectively , were deemed `` unclear . '' 
+ A threshold cutoff of log2 ( 12 ) was chosen , as it is more stringent than log2 ( 4 ) ( used by Langridge et al. [ 4 ] ) , and consistent with analysis used by Phan et al. ( 9 ) . 
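The classification rule can be illustrated with a short sketch. It assumes that the exponential and gamma parameters have already been fitted (the values below are invented, not the paper's fits), and it adopts the sign convention that a positive score favors the "essential" mode, which the text does not specify. Note that log2(12) is about 3.585, which rounds to the reported 3.6.

```python
import math

LOG2_THRESHOLD = math.log2(12)   # about 3.585, reported as 3.6


def exp_pdf(x, rate):
    """Density of an exponential distribution with the given rate."""
    return rate * math.exp(-rate * x)


def gamma_pdf(x, shape, rate):
    """Density of a gamma distribution (shape >= 1 assumed in this sketch)."""
    return (rate ** shape) * (x ** (shape - 1)) * math.exp(-rate * x) / math.gamma(shape)


def log2_likelihood_ratio(x, exp_rate, gamma_shape, gamma_rate):
    """log2 of P(x | essential mode) / P(x | nonessential mode)."""
    p_ess = exp_pdf(x, exp_rate)
    p_non = gamma_pdf(x, gamma_shape, gamma_rate)
    if p_non == 0.0:
        return math.inf
    if p_ess == 0.0:
        return -math.inf
    return math.log2(p_ess / p_non)


def classify(x, exp_rate, gamma_shape, gamma_rate):
    """Assign 'essential', 'nonessential' or 'unclear' using the 12-fold rule."""
    score = log2_likelihood_ratio(x, exp_rate, gamma_shape, gamma_rate)
    if score >= LOG2_THRESHOLD:
        return "essential"
    if score <= -LOG2_THRESHOLD:
        return "nonessential"
    return "unclear"


if __name__ == "__main__":
    # Hypothetical fitted parameters, for illustration only.
    params = dict(exp_rate=100.0, gamma_shape=4.0, gamma_rate=40.0)
    for index in (0.0, 0.02, 0.1):
        print(index, classify(index, **params))
```

Raising the threshold from log2(4) to log2(12) widens the "unclear" band, so fewer genes are called at either extreme.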
+ Essential gene lists . 
+ The Keio essential gene list is composed of the original essential genes minus three open reading frames ( ORFs ) , JW5190 , JW5193 , and JW5379 , as they are not annotated within strain MG1655 and are thought to be spurious , giving a ﬁnal list of 300 genes ( 1 , 73 ) . 
+ The PEC data set is composed of the 300 genes listed as essential for strain W3110 ( 2 ) . 
+ The lists of essential genes were compared using BioVenn ( 74 ) . 
+ Statistical analysis . 
+ For details of the statistical analysis , see Text S1 , Fig. S1 , and Fig. S2 in the supplemental material . 
+ Accession number ( s ) . 
+ TraDIS sequencing data are available from the European Nucleotide Archive under accession no. PRJEB24436 . 
+ SUPPLEMENTAL MATERIAL
+ Supplemental material for this article may be found at https://doi.org/10.1128/mBio.02096-17 . 
+ TEXT S1 , DOCX ﬁle , 0.1 MB . 
+ FIG S1 , PDF ﬁle , 0.02 MB . 
+ FIG S2 , TIF ﬁle , 0.1 MB . 
+ TABLE S1 , XLSX ﬁle , 0.2 MB . 
+ TABLE S2 , PDF ﬁle , 0.04 MB . 
+ TABLE S3 , PDF ﬁle , 0.03 MB . 
+ TABLE S4 , XLSX ﬁle , 0.3 MB . 
+ ACKNOWLEDGMENTS
+ We thank Discuva Ltd. for providing some of their large transposon mutant library . 
+ We thank N. Loman and J. Quick for help with optimization of our MiSeq protocol . 
+ We thank the authors of Langridge et al. ( 2009 ) for kindly supplying their R code for essential gene prediction . 
+ We thank Tony Hitchcock and Steve Williams for their support . 
+ Last , we thank M. J. Collingwood and R. W. Meek for their generous help with drawing ﬁgures . 
+ This research has been supported by the Midlands Integrative Biosciences Training Partnership ( MIBTP , BBSRC ) Ph.D. program , and the University of Birmingham Elite Ph.D. Scholarship to I.R.H. 
+ Cobrabio contributed to the University of Birmingham Elite Ph.D. studentship . 
+ I.G.J. is supported by a Birmingham Fellowship . 
+ S.J. is supported by the BBSRC and MRC .
\ No newline at end of file
--- a/data/TEXT_FILES/useful_txt/29468196.txt 0 → 100644
View file @27818a9
+++ b/data/TEXT_FILES/useful_txt/29468196.txt 0 → 100644
View file @27818a9
+ Altered Distribution of RNA Polymerase Lacking the Omega
+ ABSTRACT The RNA polymerase ( RNAP ) of Escherichia coli K-12 is a complex enzyme consisting of the core enzyme with the subunit structure α2ββ′ω and one of the σ subunits with promoter recognition properties . 
+ The smallest subunit , omega ( the rpoZ gene product ) , participates in subunit assembly by supporting the folding of the largest subunit , β′ , but its functional role remains unsolved except for its involvement in ppGpp binding and stringent response . 
+ As an initial approach for elucidation of its functional role , we performed in this study ChIP-chip ( chromatin immunoprecipitation with microarray technology ) analysis of wild-type and rpoZ-defective mutant strains . 
+ The altered distribution of RpoZ-defective RNAP was identiﬁed mostly within open reading frames , in particular , of the genes inside prophages . 
+ For the genes that exhibited increased or decreased distribution of RpoZ-defective RNAP , the level of transcripts increased or decreased , respectively , as detected by reverse transcription-quantitative PCR ( qRT-PCR ) . 
+ In parallel , we analyzed , using genomic SELEX ( systematic evolution of ligands by exponential enrichment ) , the distribution of constitutive promoters that are recognized by RNAP RpoD holoenzyme alone and of general silencer H-NS within prophages . 
+ Since all 10 prophages in E. coli K-12 carry only a small number of promoters , the altered occupancy of RpoZ-defective RNAP and of transcripts might represent transcription initiated from as-yet-unidentiﬁed host promoters . 
+ The genes that exhibited transcription enhanced by RpoZ-defective RNAP are located in the regions of low-level H-NS binding . 
+ By using phenotype microarray ( PM ) assay , alterations of some phenotypes were detected for the rpoZ-deleted mutant , indicating the involvement of RpoZ in regulation of some genes . 
+ Possible mechanisms of altered distribution of RNAP inside prophages are discussed . 
+ IMPORTANCE The 91-amino-acid-residue small-subunit omega ( the rpoZ gene product ) of Escherichia coli RNA polymerase plays a structural role in the formation of RNA polymerase ( RNAP ) as a chaperone in folding the largest subunit ( β′ , of 1,407 residues in length ) , but except for binding of the stringent signal ppGpp , little is known of its role in the control of RNAP function . 
+ After analysis of genomewide distribution of wild-type and RpoZ-defective RNAP by the ChIP-chip method , we found alteration of the RpoZ-defective RNAP inside open reading frames , in particular , of the genes within prophages . 
+ For a set of the genes that exhibited altered occupancy of the RpoZ-defective RNAP , transcription was found to be altered as observed by qRT-PCR assay . 
+ All the observations here described indicate the involvement of RpoZ in recognition of some of the prophage genes . 
+ This study advances understanding of not only the regulatory role of omega subunit in the functions of RNAP but also the regulatory interplay between prophages and the host E. coli for adjustment of cellular physiology to a variety of environments in nature . 
+ KEYWORDS Escherichia coli, RNA polymerase, omega subunit, prophage, transcription regulation
+ RNA polymerase ( RNAP ) is the key enzyme of transcription . 
+ In Escherichia coli , the RNAP core enzyme is composed of four subunits , α ( RpoA ) , β ( RpoB ) , β′ ( RpoC ) , and ω ( RpoZ ) , in the stoichiometry of α2ββ′ω ( reviewed in references 1 and 2 ) . 
+ The core enzyme is assembled sequentially in the order 2α → α2 → α2β → α2ββ′ω ( 3 , 4 ) . 
+ In this assembly pathway , ω interacts with β′ , forming a β′ω intermediate ( 5 ) , which then binds to the preformed α2β complex , leading to formation of the core enzyme . 
+ A model of chaperone function was proposed for 91-residue-long ω ( RpoZ ) in supporting the folding of the longest polypeptide , β′ ( RpoC ) , of 1,407 residues in size . 
+ RNAP purified from the rpoZ-defective mutant is associated with GroL , indicating the participation of GroL chaperone in place of ω in the process of RNAP formation ( 6 ) . 
+ In agreement with this assembly mechanism of RNAP , the rpoZ gene is not an essential gene and the mutant lacking rpoZ is viable ( 7 ) . 
+ In the core enzyme , RpoZ is present in near-stoichiometric amounts with respect to other subunits ( 8 ) . 
+ The crystal structures of RNAP from Thermus aquaticus ( 9 ) and E. coli ( 10 ) conﬁrm that RpoZ is one of the RNAP subunits . 
+ Structure-function relationships have been extensively characterized for RpoA , RpoB , and RpoC ( 11 , 12 ) , but except for the structural role in assembly of RNAP , the functional role played by RpoZ remains unsolved ( 13 ) . 
+ When E. coli RNAP is associated with a dominant negative variant of RpoZ subunit , the resulting RNAP is defective in initiation of transcription , although preinitiated RNAP-RNA complex can elongate transcription ( 14 ) . 
+ When RpoZ is tethered to DNA-binding protein , it is able to activate transcription through protein-protein contact between RNAP and the RpoZ segment ( 15 ) . 
+ These ﬁndings indicate the direct molecular contact between RpoZ and other subunits of RNAP . 
+ Several lines of observation indicated the involvement of RpoZ in the functional control of RNAP ( 14 , 16 ) . 
+ A functional link between the RpoZ subunit and the stringent response has been elucidated : ﬁrst , in vitro transcription by puriﬁed RNAP without RpoZ subunit was found to be insensitive to ppGpp ( 17 ) , and the RpoZ-less RNAP regains its sensitivity to ( p ) ppGpp upon the external addition of the RpoZ subunit ( 18 ) or the protein DksA , a collaborative player in the stringent response ( 19 ) . 
+ Direct binding of ppGpp at a site near the rifampin binding site on the RpoB subunit was indicated by genetic and cross-linking studies ( 20 -- 22 ) , while the recent crystal structure of RNAP-ppGpp complex and mutational studies on E. coli RNAP showed that ppGpp binds at the interface of RpoC and RpoZ subunit ( 23 -- 25 ) . 
+ The level of RpoZ subunit inﬂuences DNA relaxation in E. coli ( 26 ) , implying difference in DNA-binding properties between wild-type RNAP and RpoZ-defective RNAP . 
+ Microarray analysis of transcriptome in the absence of RpoZ indicated alteration in transcription of a set of genes , including the relA gene encoding ppGpp synthetase ( 7 ) . 
+ These observations altogether indicate that the function of RNAP is controlled by the associated RpoZ subunit . 
+ Besides the ppGpp-binding site on the RpoZ subunit ( ppGpp site 1 ) , ppGpp binding was recently identiﬁed at an interface between RNAP and DksA ( ppGpp site 2 ) ( 27 ) . 
+ In the stringent control , a small transcription factor , DksA , participates in conjunction with ppGpp ( 19 ) . 
+ In this study , we made attempts to ﬁnd the regulatory role of RpoZ subunit in the function of RNAP and transcription . 
+ As a shortcut approach to get insights into the overall functional role of RpoZ subunit in the transcription of the E. coli genome , we have performed , in this study , ChIP-chip ( chromatin immunoprecipitation with microarray technology ) analysis of the distribution of RNAP along the genome in the presence and absence of RpoZ subunit . 
+ Results of the ChIP-chip analysis indicated that the distribution of RNAP lacking RpoZ in growing E. coli K-12 cells is altered compared with that of wild-type RNAP mostly in the middle of the open reading frame ( ORF ) of a speciﬁc set of target genes . 
+ Surprisingly , the majority of these genes are located within the cryptic prophages . 
+ This ﬁnding was conﬁrmed by measuring mRNA for these genes . 
+ These unexpected ﬁndings raise an interesting possibility of the involvement of RpoZ in the recognition of prophage genes . 
+ This could make a breakthrough in the identification of the functional role of RpoZ . 
+ RESULTS
+ Altered distribution of RpoZ-defective RNAP inside the prophages along the E. coli genome . 
+ To examine the possible inﬂuence of the presence and absence of RpoZ on the distribution of RNAP along the genome , we performed ChIP-chip analysis ( 28 , 29 ) for E. coli K-12 wild-type strain BW25113 and its rpoZ-deleted mutant JW3624 from the Keio collection ( 30 ) . 
+ The absence of RpoZ in JW3624 was confirmed by Western blotting analysis using specific anti-RpoZ antibody ( see Fig. S1 in the supplemental material ) . 
+ In a synthetic M9-glucose medium , JW3624 showed a growth pattern similar to that of the parent strain BW25113 until the early stationary phase , supporting the concept that the rpoZ gene is not essential and does not influence cell growth . 
+ The growth patterns differed between the two strains after prolonged incubation ( see below ) . 
+ In the middle of exponential growth phase , both wild-type BW25113 and rpoZ-defective JW3624 strains were treated with formaldehyde at a ﬁnal concentration of 1 % for cross-linking between proteins and genomic DNA . 
+ After 30 min , lysates were prepared , sonicated for DNA fragmentation , and then subjected to immunoprecipitation with anti-RpoA antibody . 
+ RNAP-conjugated DNAs in immunoprecipitates were digested by pronase , and then free DNA fragments were puriﬁed using a QIAquick PCR puriﬁcation kit ( Qiagen ) . 
+ Recovered DNA fragments were ampliﬁed by PCR using a pair of random primers . 
+ DNA from wild-type BW25113 was labeled with Cy3 , while that from rpoZ-defective JW3624 was labeled with Cy5 . 
+ The DNA mixture was subjected to tiling array analysis for mapping DNA segments along the E. coli genome . 
+ The ratio of Cy3 and Cy5 ﬂuorescence intensity bound to each probe was plotted along the E. coli K-12 genome , and thus , the relative intensity of Cy3/Cy5 represents the relative distribution between wild-type and RpoZ-defective mutant RNAP at each probe position . 
+ RNAP-bound DNA fragments of about 250 bp in size should bind to two or more 60-bp-long probes aligned at 105-bp intervals , and thus , a peak consisting of a single probe was considered to be background noise . 
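The "two or more probes" expectation can be checked with a little arithmetic. The sketch below assumes that the 105-bp interval is the start-to-start spacing of consecutive probes and that any overlap of at least one base counts as a hit; both are assumptions, and real hybridization would require a longer overlap. The function name is hypothetical.

```python
def probes_hit(frag_start, frag_len=250, probe_len=60, spacing=105):
    """Count tiling probes overlapped by a fragment.

    Probe k covers the half-open interval [k*spacing, k*spacing + probe_len).
    The fragment covers [frag_start, frag_start + frag_len).
    """
    frag_end = frag_start + frag_len
    count = 0
    for k in range(frag_end // spacing + 1):
        probe_start = k * spacing
        probe_end = probe_start + probe_len
        if probe_start < frag_end and probe_end > frag_start:
            count += 1
    return count


if __name__ == "__main__":
    # Slide the fragment across one full probe period and collect the counts.
    counts = {probes_hit(start) for start in range(1050, 1155)}
    print(sorted(counts))
```

Under these assumptions a 250-bp fragment always touches at least two probes regardless of where it starts, which is why a signal confined to one probe is treated as noise.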
+ The relative distributions between wild-type and RpoZ-defective mutant RNAP were similar along the entire E. coli K-12 genome ( Fig. 1 ) . 
+ One surprising ﬁnding is that the RpoZ-defective RNAP showed a high level of distribution , mostly within open reading frames ( Fig. 1 , orange background ) . 
+ Furthermore , these peaks of the high-level distribution of RpoZ-defective RNAP are located inside some prophages in the E. coli K-12 genome ( see blue marks in Fig. 1 for the location of prophages ) . 
+ E. coli K-12 contains a total of 10 cryptic prophages ( 31 , 32 ) , i.e. , CP4-6 , DLP-12 ( or Qsr ) , e14 , Rac , Qin ( or Kim ) , CP4-44 , CPS-53 ( or KpLE1 ) , CPZ-55 , CP4-57 , and KpLE2 , in this order along the E. coli K-12 genome and one short prophage segment , PR-X , on the genetic map ( Table S1 ) , altogether comprising about 3.6 % of its genome sequence ( 33 , 34 ) . 
+ The prophage set is different between E. coli strains . 
+ For instance , prophages CP4-6 , e14 , Rac , Qin , CPS-53 , and CP4-57 exist in both E. coli K-12 and pathogenic E. coli O157 strains , but DLP-12 , CP4-44 , PR-X , and CPZ-55 are present only in E. coli K-12 strains ( 32 ) . 
+ On the other hand , E. coli O157 strains carry various types of O157-speciﬁc prophages . 
+ Each prophage of E. coli K-12 carries 9 ( CPZ-55 ) to 43 ( CP4-6 ) genes , but the functions are not known for most of these genes . 
+ The gene functions of prophages have been predicted based on the sequence similarity of the related original phages ( for details , see Discussion ) . 
+ Altered distribution of the RpoZ-defective RNAP inside speciﬁc genes . 
+ The ChIP-chip pattern indicates marked differences in the levels of RNAP occupancy inside some speciﬁc genes , in particular within prophages , between wild-type and rpoZ-deleted mutant strains . 
+ The marked increase in the distribution of RpoZ-defective RNAP was observed for about 30 positions by setting a cutoff level of 5 ( occupancy relative to that of wild-type RNAP ) ( Fig. 1 ) . 
+ The genes that exhibited the high-level distribution of RpoZ-defective RNAP are located within some cryptic prophage regions . 
+ The test DNA obtained by ChIP-chip screening often binds more than two probes of the tiling array , forming a single peak , and thus , the total amount of RNAP binding of each single peak was calculated by combining the ﬂuorescence intensity hybridized for each array probe within a single and the same peak . 
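A possible way to turn per-probe ratios into peak totals is sketched below. It assumes that the cutoff of 5 is applied to each probe, that consecutive probes above the cutoff form one peak, that single-probe peaks are discarded as background, and that "combining" means summing. None of these details is stated explicitly in the text, and the function name and example values are hypothetical.

```python
def call_peaks(ratios, cutoff=5.0, min_probes=2):
    """Group consecutive probes at or above the cutoff into peaks.

    ratios -- per-probe occupancy ratios in genomic order
    Returns a list of (first_probe_index, last_probe_index, summed_ratio).
    """
    ratios = list(ratios)
    peaks = []
    run_start = None
    # A sentinel below any cutoff closes a run that reaches the last probe.
    for i, value in enumerate(ratios + [float("-inf")]):
        if value >= cutoff:
            if run_start is None:
                run_start = i
        elif run_start is not None:
            if i - run_start >= min_probes:
                peaks.append((run_start, i - 1, sum(ratios[run_start:i])))
            run_start = None
    return peaks


if __name__ == "__main__":
    example = [1, 6, 1, 7, 9, 8, 2, 5, 5]   # hypothetical probe ratios
    print(call_peaks(example))
```

In the example, the isolated probe at index 1 is dropped, while the two multi-probe runs are each reported with their summed signal.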
+ Among the total of 30 peaks of high-level distribution of RpoZ-defective RNAP detected at the cutoff level of 5.0 , the highest-level distribution of RpoZ-defective RNAP was identiﬁed inside the ORF of the CP4-44 prophage ﬂu gene encoding the Ag43 autotransporter ( Fig. 1 ) . 
+ The relative binding level of RpoZ-defective RNAP for this site was 36.6 compared with wild-type RNAP ( see Table S2 for details ) . 
+ Almost half ( 47 % ) of the high-level distribution of the mutant RNAP belonged to the genes that are organized in speciﬁc prophages , including CP4-6 , e14 , Rac , CP4-44 , CP4-57 , and KpLE2 ( Fig. 1 shows the location of these prophages ; for details , see Table S1 ) . 
+ The increased distribution of RpoZ-defective RNAP was detected in multiple genes for e14 ( 5 genes ) , Rac ( 3 genes ) , CP4-6 ( 2 genes ) , and KpLE2 ( 2 genes ) ( Fig. 2 ) . 
+ The set of genes that showed high-level distribution of RpoZ-lacking RNAP included ymfN ( putative transcription factor [ TF ] ) , ymfK ( putative repressor ) , ycfK ( unknown protein ) , intE ( predicted integrase ) , and ymfM ( unknown protein ) in e14 prophage ; ydaY ( pseudogene ) , stfR ( predicted tail ﬁber protein ) , and ydaV ( putative replication protein ) in Rac prophage ; ykfC ( conserved protein ) and ykfI ( YkfI-YafW T-AT toxin ) in CP4-6 prophage ; and yjhH ( predicted lyase ) and yjhG ( D-xylonate dehydrogenase ) in KpLE2 prophage ( Fig. 2 shows the locations of these genes in each prophage ) . 
+ Even though little is known about the functions of prophage-encoded proteins , these cryptic phages contribute to cell physiology such as cell growth , resistance to antibiotics , stress responses , and bioﬁlm formation ( 34 ) , implying that at least some of the prophage-encoded proteins are expressed in E. coli supposedly under speciﬁc conditions . 
+ The genes showing the high-level distribution of RpoZ-lacking RNAP are not clustered but scattered along each prophage ( Fig. 2 ) . 
+ Including these genes , transcription organization of the prophage genes is not known yet ( for details , see Discussion ) . 
+ Among the top 30 genes that exhibited increased distribution of the RpoZ-deleted RNAP , about half are carried by the host E. coli K-12 genome ( Table 1 ) and are located within a group of transporter genes such as the genes encoding LsrAC transporter for cell-cell communication signal AI-2 and ModABC transporter for molybdate . 
+ Signal transduction apparatus is also included , such as RcsBC two-component system ( TCS ) phosphotransferase and NtrBC TCS sensor . 
+ It is noteworthy that the cell surface receptor for phage N4 is also included in this group . 
+ Some of these genes could be considered the research targets for detailed analysis of the role of RpoZ in the functional control of RNAP . 
+ The ChIP-chip pattern also indicated decreased distribution of the RpoZ-deleted RNAP inside the ORF of speciﬁc genes , again mostly within some prophages ( Table S3 ) . 
+ Marked reduction in the distribution of mutant RNAP was observed for prophage CP4-6 ( Fig. 3 ) . 
+ The lowest difference , of 0.03 , was observed inside the ORF of the CP4-6 prophage yagN and CPZ-55 prophage yffN genes ( Table S3 ) . 
+ Noteworthy is that the genes exhibiting the decreased distribution of the RpoZ-defective RNAP in the prophages are located within the so-called promoter islands containing promoter-like sequences ( 35 ) . 
+ Although the level of constitutive promoters recognized by RpoD holoenzyme alone in the absence of additional supporting factors is low in the prophage regions ( see below ) , the level of promoter islands is high in some prophage regions ( 35 ) . 
+ One possibility is that these promoter-like sequences within the promoter islands might be differentially recognized by RNAP with and without RpoZ subunit ( see below ) . 
+ Transcription of the genes showing altered distribution of the RpoZ-defective RNAP . 
+ The altered distribution of the RpoZ-defective RNAP in some speciﬁc genes might be correlated with altered transcription , pausing , and/or attenuation-termination . 
+ To search for possible relationships between altered distribution of RpoZ-defective RNAP and altered transcription , in particular within prophages , attempts were then made to directly measure transcripts from the genes that showed altered distribution of RpoZ-defective RNAP . 
+ From the set of genes that showed high-level distribution of RpoZ-defective RNAP ( Table 1 ) , we selected 17 genes , i.e. , 5 from the prophages and 12 from the host genome ( Table 2 ) . 
+ Using reverse transcription-quantitative PCR ( RT-qPCR ) , transcripts of these genes were measured . 
+ As expected , the rpoZ mRNA was not detectable in the rpoZ-defective mutant strain . 
+ Among the total of 17 genes examined , the level of transcript was more than 4-fold higher for 7 genes ( ydbA , ybdJ , yfiQ [ CP4-57 ] , paaE , ydaY [ Rac ] , ybcH , and yhcD ) and the transcript level was more than 2-fold higher for 6 genes ( ymfN [ e14 ] , basS , yaiP , ybhA , ycfK , and ymfK [ e14 ] ) ( Table 3 ) . 
+ This ﬁnding indicates that the high-level distribution of RpoZ-defective RNAP correlates with their high-level transcription . 
+ In order to identify possible inﬂuence of the decreased binding of RpoZ-defective RNAP on transcription , we then analyzed the levels of transcripts for seven representative genes ( Table 4 ) from the list of decreased binding of the mutant RNAP ( Table 3 ) . 
+ After RT-qPCR , transcripts were found to markedly decrease for yagM , yffL , yagN , yjfJ , yagL , and intF ( Table 4 ) . 
+ This ﬁnding indicates that the decreased distribution of RpoZ-defective RNAP correlates , to a certain extent , with their decreased transcription . 
+ Taken together , we predicted that the altered distribution of RpoZ-defective RNAP reﬂects the altered transcription of particular genes by the RpoZ-defective RNAP . 
+ Possible mechanisms of how only a specific set of genes is differently transcribed by the RpoZ-defective RNAP remain to be solved in the future . 
+ In this aspect , the identiﬁcation of contact partners , which collaborate with RNAP in different manners in the presence and absence of RpoZ , could be important . 
+ Distribution of the constitutive promoters inside prophages . 
+ The prophage genes are generally silent and are not expressed under the steady state of cell growth , but some genes are induced under specific conditions , thereby influencing cell growth , resistance to antibiotics , stress responses , and biofilm formation ( for instance , see reference 34 ) . 
+ For instance , the minor sigma factor FecI is encoded by the KpLE2 prophage and regulates only the divergently transcribed fecABCDE operon in the same prophage . 
+ This small regulon is employed by E. coli for utilization of the ferric citrate transport system ( 36 ) . 
+ The promoters recognized by E. coli RNAP should be fewer in prophages , because in the case of E. coli phages , only early genes are transcribed by the host RNAP , but afterward , the modiﬁed host RNAP by phage gene products ( in the case of T-even and lambdoid phages ) or the phage-encoded RNAP ( in the case of T7-type phages ) is responsible for the transcription of late genes ( 37 ) . 
+ Except for a limited number of characterized genes such as the fec regulon in prophage KpLE2 , however , almost nothing is known of the locations of promoters inside the prophages . 
+ a Ratio indicates the relative level of mRNA ( rpoZ mutant/wild type ) . 
+ When the target gene is located inside a prophage , the name of the prophage is given . 
+ b The level of mRNA was determined for E. coli K-12 mutants lacking the rpoZ gene by using the qRT-PCR method . 
+ The genes were selected from the list showing the decreased distribution of RpoZ-defective RNAP ( Table 3 and Fig. 3 ) . 
+ As references , two genes ( a and b ) were selected from the list of genes with increased distribution of RpoZ-defective RNAP ( Table 2 and Fig. 2 ) . 
+ We then analyzed the distribution of promoters in each prophage using the genomic SELEX ( systematic evolution of ligands by exponential enrichment ) screening system . 
+ RNAP holoenzyme was reconstituted from the sigma-free core enzyme and 4-fold excess of puriﬁed RpoD sigma . 
+ The constitutive promoters were predicted based on the binding sites of this reconstituted RpoD holoenzyme alone in the absence of other DNA-binding proteins ( 38 ) . 
+ A maximum total of 669 constitutive promoters were identified on the E. coli K-12 genome ( 38 , 39 ) ( see Fig. S2A for the distribution of RpoD constitutive promoters ) . 
+ Only a small number of the constitutive promoters were identified on prophages CP4-6 , e14 , and Rac ( Fig. 4 ) , supposedly each contributing to the transcription of the argF gene encoding ornithine carbamoyltransferase ( CP4-6 ) , the lit gene encoding protease for cleavage of EF-Tu in collaboration with Gol protein ( e14 ) , and the trkG gene encoding K+ transporter ( Rac ) , respectively . 
+ E. coli K-12 contains two genes , argF and argI , both encoding the ornithine carbamoyltransferase involved in the synthesis of L-citrulline from carbamoyl phosphate and L-ornithine along the pathway of arginine biosynthesis . 
+ Thus , the argF gene is considered to be integrated into the prophage CP4-6 , together with its promoter , after duplication or transposition of the original argI gene ( 40 ) . 
+ Likewise , the trkG gene encoding a K+ transporter is closely related to the trkH gene in the genome of E. coli K-12 ( 41 ) . 
+ Both TrkG and TrkH are active as low-affinity transporters of K+ and function in conjunction with TrkA , a membrane binding protein . 
+ Thus , the trkG gene must have been inserted , together with its promoter , into the prophage Rac . 
+ The lit gene product blocks late gene expression of phage T4 , leading to phage exclusion ( 42 ) . 
+ This inhibitory activity depends on Gol protein of T4 gene 23 , together functioning as peptidase for cleavage of EF-Tu ( 43 ) . 
+ The putative promoter of the lit gene is , however , activated only after mutation ( 42 ) . 
+ Taken together , it is unlikely that prophages contain many promoters recognized by E. coli K-12 RNAP . 
+ Distribution of H-NS silencer along the prophages . 
+ To avoid deleterious effects of the expression of foreign genes , including prophages , E. coli carries the gene silencing system , in which H-NS ( 44 , 45 ) and Rho ( 46 -- 48 ) are known to participate as sentries . 
+ Even though these two silencing players have apparently similar functions with respect to the gene silencing , the mechanism is different between H-NS and Rho . 
+ H-NS is one of the nucleoid-associated proteins with both an architectural role in genome folding and a global regulatory role of transcription ( 49 ) . 
+ The altered transcription of a set of prophage genes by RpoZ-defective RNAP might be due to differences in the interaction of RpoZ-defective RNAP with prophage genome-associated H-NS silencer . 
+ We then analyzed the location of primary binding sites of H-NS along the genome of E. coli K-12 by using the Genomic SELEX screening system . 
+ A total of 987 binding sites were identified ( 38 ) ( see Fig. S2B for the H-NS binding sites along the entire genome ) , of which a small number of H-NS binding sites were identified within specific prophages ( Fig. 5 ) . 
+ Noteworthy is that the strong binding sites of H-NS are almost absent for both e14 and Rac prophages , which showed the high-level distribution of RpoZ-defective RNAP . 
+ This ﬁnding implies that the binding of RpoZ-defective RNAP increased in the genes inside some prophages without strong binding sites for H-NS silencer . 
+ Altered phenotypes of the RNAP lacking RpoZ detected by PM analysis . 
+ To see whether the differential occupancy of genes by RpoZ-less RNAP inﬂuences the phenotype of the cell , we studied the growth phenotype of the RpoZ-deleted strain with various carbon substrates and its tolerance to different environmental stresses like osmotic stress , pH , and antibiotic , using phenotype microarray ( PM ) . 
+ PM technology allows us to ﬁnd altered functions of genes by testing mutants for a large number of phenotypes simultaneously ( 50 ) . 
+ An altered gene selection pattern of RpoZ-defective RNAP leads to the different phenotypes under various conditions tested . 
+ We then subjected both wild-type and rpoZ-defective E. coli strains to phenotype microarray ( PM ) assay . 
+ The most prominent effect was seen on chemicals targeting the cell membrane , as the rpoZ-defective strain was remarkably sensitive toward sodium iodonitrotetrazolium ( NT ) violet ( Fig. 6 ) . 
+ NT violet is widely used for measurement of cellular redox activity or the respiratory activity in bacteria ( 50 ) , indicating reduced transcription of the genes involved in respiratory metabolism for the rpoZ mutant . 
+ On the other hand , the growth of this mutant strain was signiﬁcantly better than that of the wild type in the presence of gallic acid ( PM19 , A05-09 ) and phenethicillin ( PM19 , F01-04 ) . 
+ Gallic acid , a type of phenolic acid , is known as a hydrated natural product of tannin and is commonly used in the pharmaceutical industry to produce a psychedelic alkaloid . 
+ Against bacteria , gallic acid is a toxic agent , a mutagen , and a modulator of amyloid formation ( 51 , 52 ) . 
+ These actions of gallic acid are considered to be attributable to changes in membrane properties such as charge and hydrophobicity . 
+ Phenethicillin is a semisynthetic acid-resistant penicillin , which is a methyl analog of phenoxymethyl penicillin . 
+ Penicillin group antibiotics bind to the penicillin-binding proteins ( PBPs ) and inhibit the cross-linking of peptidoglycan chains by PBPs ( 53 ) , ultimately leading to weakening of the bacterial cell wall . 
+ Besides the antibiotics targeting the membrane , the rpoZ-defective mutant was sensitive to the ribosome-targeting oxytetracycline , a broad-spectrum tetracycline ( PM20 , F05-09 ) ( Fig. 6 ) . 
+ E. coli gains resistance to tetracyclines through interference with their binding to ribosomes by ribosome protection proteins such as TetM ( 54 ) . 
+ Resistance to tetracyclines is also mediated through decreased import or enhanced export of the drugs through membranes . 
+ In fact , the resistance by R factors is mediated through decrease of the intracellular level of tetracyclines ( 55 ) . 
+ The observed resistance of the rpoZ-defective mutant to these antibiotics detected by PM assay was then confirmed using individual liquid cultures in the presence of different concentrations of tetracycline ( Fig. S3A ) or penicillin ( Fig. S3B ) . 
+ In the absence of drugs , the growth was retarded for the rpoZ mutant after late stationary phase . 
+ In the presence of these drugs , however , the growth retardation apparently disappeared . 
+ After prolonged culture , the growth of the rpoZ mutant was comparable to that of the wild type ( Fig. S3 ) and continued longer than that of the wild type as observed by the PM assay ( Fig. 6 ) and the tetrazolium reduction assay ( Fig. S4 ) . 
+ These observations altogether imply that the phenotypic resistance of the rpoZ mutant to these antibiotics could be attributable to the physiological modulation through some metabolic shifts due to altered transcription of as-yet-unspeciﬁed genes by the mutant RNAP . 
+ A close correlation has been established between the antibiotic sensitivity and the bacterial metabolism ( 56 ) . 
+ Under starvation conditions after prolonged culture , the susceptibility to antibiotics decreases , leading to display of the phenotypic resistance . 
+ The altered distribution of RpoZ-defective RNAP in prophage regions and the altered sensitivity of the rpoZ-defective mutant to antibiotics both agree with the ﬁnding that the cryptic prophages inﬂuence the sensitivity to a variety of environments , including the sensitivity to antibiotics ( 34 ) . 
+ DISCUSSION
+ E. coli RNAP functionally differentiates through two steps of protein-protein interaction : ﬁrst , seven species of the sigma subunit , and second , more than 300 species of transcription factors ( TFs ) ( 57 ) . 
+ TFs interact with one of the RNAP subunits for function ( 58 -- 60 ) . 
+ As a subunit of RNAP core enzyme , RpoZ might be involved in interaction with TFs . 
+ We then attempted to look into a broader perspective of the regulatory role played by RpoZ . 
+ Here , we identiﬁed the distribution of RNAP with and without RpoZ across the E. coli genome through ChIP-chip assay and found that the RNAP lacking RpoZ showed altered distribution within the prophage regions along the E. coli genome , supporting the prediction that RpoZ is involved in control of the regulatory function of RNAP . 
+ A functional link between the RpoZ subunit and the stringent control has been suggested through its direct interaction with a nucleotide effector , ppGpp ( 17 , 18 , 24 ) . 
+ Genetic and cross-linking studies of E. coli RNAP indicated that the direct binding of ppGpp to RpoB affects the catalytic activity of RNA polymerization ( 20 , 21 ) , while the mutational studies of E. coli RNAP and the recent crystal structure of RNAP-ppGpp complex showed that ppGpp binds at the interface of RpoB , RpoC , and RpoZ subunit ( 23 -- 25 ) . 
+ Besides this ppGpp-binding site 1 , ppGpp-binding site 2 was identiﬁed at an interface between RNAP and DksA ( 27 ) . 
+ In the absence of RpoZ , marked alteration was not identiﬁed for the set of hitherto-identiﬁed genes under stringent control such as rRNA genes ; the ppGpp-binding site 2 might play a major role in the stringent control . 
+ Horizontal transfer of foreign genes is a major contributor to the evolution of prokaryotic genomes ( 61 ) . 
+ The well-characterized model bacterium E. coli K-12 carries a set of 10 cryptic prophages and a short prophage segment ( 32 , 61 ) . 
+ Overall , a total of approximately 304 genes , including predicted pseudogenes , exist in these prophages , and based on the similarity to phage genes , the functions have been predicted , but without experimental confirmation , for some genes in each prophage ( see Table S1 in the supplemental material ) . 
+ One critical problem is whether these horizontally transferred genes are expressed and what the roles of these genes are in prophage survival inside the host E. coli K-12 . 
+ Defective prophages Rac ( 62 ) , e14 ( 63 ) , DLP-12 ( 64 ) , and Qin ( 65 ) are believed to carry some functional genes . 
+ Some of the prophage genes are beneﬁcial to E. coli , including the genes encoding toxins and antibiotic resistance components for survival under various stressful natural conditions and for stable persistence in host animals . 
+ Sometimes , however , prophages kill E. coli through their induction . 
+ For instance , Rac repressor and integrase retain functional activity as conjugational transfer induces gene expression from the prophage and causes excision ( 66 ) . 
+ In agreement with the expression of some functional genes from the Rac prophage , Rac is lethal to the host when its genes are expressed , resulting in the inhibition of cell division ( 67 ) . 
+ To avoid deleterious effects of prophages , E. coli carries the gene silencing system , in which H-NS ( 44 , 45 , 68 ) and Rho ( 46 , 47 ) are known to participate as sentries . 
+ H-NS is known as a nucleoid-associated global silencer to prevent transcription . 
+ The silencing function of H-NS is interfered with by global regulators such as LeuO ( 69 ) . 
+ We predicted that the level of H-NS binding is related to the alteration of transcription in the presence and absence of RpoZ . 
+ On the other hand , Rho acts as a transcription terminator . 
+ The altered transcription of a set of prophage genes by RpoZ-defective RNAP might also be due to the difference in the interaction of RpoZ-defective RNAP with termination factor Rho . 
+ Direct interaction of RNAP with Rho or Rho-Nus protein complexes has been suggested ( 3 , 70 ) . 
+ Involvement of Rho in the maintenance of some prophages has been suggested because the absence of the rho gene induces excision of defective prophages ( 47 ) . 
+ In the total of approximately 304 prophage genes , the genes for one RNAP sigma factor ( FecI ) and 14 TFs exist ( Table S1 ) , of which most are considered to control expression of the prophage genes . 
+ We have determined that the total number of constitutive promoters that are recognized by RNAP RpoD holoenzyme alone in the absence of supporting TFs is 492 to 669 ( 38 ) . 
+ Along this line , the numbers of constitutive promoters recognized by the minor sigma factors were estimated to be 129 to 179 ( RpoS ) , 101 to 142 ( RpoH ) , 34 to 42 ( RpoF ) , and 77 to 106 ( RpoE ) ( 71 ) . 
+ In contrast , FecI is unique because it regulates only the divergently transcribed fecABCDE operon that encodes the ferric citrate transport system , suggesting that the regulatory target is still ﬁxed within the KpLE2 prophage ( 36 ) . 
+ Among the total of 14 TFs , the regulatory target and function have been experimentally examined only for CP4-57-encoded AlpA ( intA regulator ) ( 72 ) , DLP-12-encoded AppY ( acid phosphatase regulator ) ( 73 ) , and CP4-6-encoded XynR ( xylonate catabolism regulator ) ( 74 ) . 
+ Regulatory targets of these TFs seem to be ﬁxed to the genes inside prophages . 
+ For instance , AlpA of CP4-57 regulates only the intA gene for excision of the CP4-57 gene ( 72 ) . 
+ The polysaccharide xylan , one representative renewable plant hemicellulose biopolymer , consists of D-xylose . 
+ The main pathway for utilization of D-xylose by E. coli K-12 depends on the xylFGH and xylAB operons . 
+ XylR is a TF that activates D-xylose import ( xylFGH ) and catabolism ( xylAB ) genes ( 75 ) . 
+ The expression level of XylR is controlled by DicF small RNA ( sRNA ) encoded by Qin prophage ( 76 ) . 
+ E. coli lacks the oxidation pathway of xylose , but once the oxidized product xylonate is provided by coexisting microorganisms in nature , E. coli is able to catabolize xylonate with the use of CP4-6-encoded YagEF enzymes . 
+ XynR ( formerly YagI ) on CP4-6 prophage regulates only the adjacent yagEF genes on the same CP4-6 prophage ( 74 ) , indicating that XynR is a rare single-target TF and its regulatory target is still fixed on the CP4-6 prophage genes . 
+ Thus , E. coli gained the system for utilizing plant-derived xylose from the prophages . 
+ In utilization of plant-derived xylan , host TF ( XylR ) and prophage TFs ( XynR and DicF ) collaborate , thereby contributing to the stable maintenance of prophages CP4-6 and Qin . 
+ Likewise , DLP-12-encoded AppY is induced under anaerobic conditions and , in collaboration with ArcA , plays a role in induction of the expression of the hydrogenase 1 operon ( hyaABCDEF ) ( 77 ) . 
+ ArcA is the response regulator of the quinone-dependent ArcAB two-component signal transduction system to respond to the change in respiratory conditions . 
+ Prophage-encoded AppY and host ArcA collaborate for antirepression against the repressor IscR , the iron-sulfur cluster [ 2Fe-2S ] regulator . 
+ This is another example of the novel mode of prophage-host interaction , in which the prophage-encoded TF collaborates with the host TF so as to modulate the spectrum of regulation targets . 
+ Except for these three TFs , the regulatory function is not known for the other 11 prophage-encoded TFs . 
+ We conclude that not only structurally but functionally as well , RpoZ is crucial for transcription as it guides the transcription machinery to express the genes necessary for viability under various environmental stresses . 
+ Our results indicate a future need to explore the role of RpoZ in alteration of the RNAP functions , including the interaction with as-yet-unidentiﬁed TFs , including H-NS and Rho . 
+ MATERIALS AND METHODS
+ Bacterial strains . 
+ Escherichia coli strains BW25113 ( parent strain ) and JW3624 lacking the rpoZ gene ( 30 ) were used for all experiments . 
+ The strains were cultured in M9 medium ( Difco ) containing 0.4 % glucose at 37 °C . 
+ Cell growth was monitored by measuring the optical density at 600 nm ( OD600 ) . 
+ ChIP-chip analysis . 
+ The ChIP-chip assay was carried out as previously described ( 78 , 79 ) with a few modiﬁcations . 
+ BW25113 and JW3624 strains were grown in M9 glucose medium at 37 °C to an optical density at 600 nm ( OD600 ) of 0.2 and then incubated in M9 medium containing formaldehyde ( ﬁnal concentration of 1 % ) for 30 min for cross-linking between proteins and genomic DNA . 
+ After termination of the cross-linking reaction by the addition of glycine , cells were collected , washed , and lysed by addition of lysozyme . 
+ The lysates were sonicated using a digital Soniﬁer ( Branson ) to fragment the genome . 
+ After centrifugation , the supernatant of the whole-cell extract was subjected to immunoprecipitation with the anti-RpoA antibody ( Neoclone ) - coated protein A-Dynal Dynabeads ( Invitrogen ) . 
+ RNAP-conjugated DNAs in immunoprecipitation fractions were digested by pronase ( Roche ) , and then the free DNA fragments were puriﬁed using a QIAquick PCR puriﬁcation kit ( Qiagen ) . 
+ Recovered DNA fragments were ampliﬁed by PCR using a pair of random primers . 
+ The ampliﬁed DNA fragments from the wild-type strain were labeled with Cy3 , while another sample from the rpoZ mutant was labeled with Cy5 . 
+ The labeled DNAs were mixed and hybridized to a 43,450-feature E. coli tiling array ( Oxford Gene Technology ) ( 80 , 81 ) . 
+ After hybridization of samples to the DNA tiling array , the Cy5/Cy3 ratio was measured and the peaks of scanned patterns were plotted against the positions of DNA probes along the E. coli W3110 genome . 
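+ The ratio step above can be sketched as follows . This is a minimal illustration , not the authors ' pipeline : the function names , the ( position , Cy3 , Cy5 ) tuple layout , the log2 transform , and the pseudocount are all assumptions introduced here . 

```python
import math


def log2_ratio(cy5, cy3, pseudocount=1.0):
    """log2 of the Cy5/Cy3 intensity ratio for one probe.

    The pseudocount is an assumption added here to avoid division by zero;
    it is not described in the source text.
    """
    return math.log2((cy5 + pseudocount) / (cy3 + pseudocount))


def ratio_profile(probes):
    """Turn (position, cy3, cy5) tuples into (position, log2 ratio) pairs.

    The result is sorted by genome position so it can be plotted directly
    against probe coordinates.
    """
    return sorted((pos, log2_ratio(cy5, cy3)) for pos, cy3, cy5 in probes)
```

+ With Cy3 on the wild-type sample and Cy5 on the rpoZ mutant sample , a positive value marks a probe where the mutant RNAP signal is higher than the wild-type signal . 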
+ Genomic SELEX screening . 
+ The genomic SELEX screening was carried out as previously described ( 82 , 83 ) . 
+ A mixture of DNA fragments of the E. coli K-12 W3110 genome was prepared after sonication of puriﬁed genome DNA and cloned into a multicopy plasmid , pBR322 . 
+ In each SELEX screening , the DNA mixture was regenerated by PCR . 
+ For SELEX screening , 5 pmol of the mixture of DNA fragments and 10 pmol reconstituted RNAP RpoD holoenzyme or puriﬁed H-NS were mixed in a binding buffer ( 10 mM Tris-HCl , pH 7.8 , at 4 °C , 3 mM magnesium acetate , 150 mM NaCl , and 1.25 mg/ml bovine serum albumin ) . 
+ RNAP RpoD holoenzyme was prepared by mixing the puriﬁed sigma-free core enzyme and a 4-fold molar excess of the overexpressed and puriﬁed RpoD sigma ( 38 ) while H-NS was puriﬁed from overexpressed cells ( 84 ) . 
+ The sequences of DNA fragments obtained by the genomic SELEX screening were identiﬁed by a SELEX-chip method as described previously ( 82 , 83 ) . 
+ SELEX-chip data were submitted to the transcription factor proﬁling of Escherichia coli ( TEC ) at the National Institute of Genetics , Mishima , Japan ( https://shigen.nig.ac.jp/ecoli/tec/top/; accession code RpoZ_ChIP ) . 
+ RT-qPCR . 
+ Total RNA was prepared from E. coli cells as previously described ( 85 ) . 
+ E. coli was grown in M9 glucose medium to an OD600 of 0.2 at 37 °C with shaking . 
+ Cells were harvested , and total RNA was prepared using hot phenol . 
+ The concentration of total RNA was determined by measuring the absorbance at 260 nm , and its purity was checked by agarose gel electrophoresis . 
+ Next , total RNAs were transcribed to cDNA with random primers using the PrimeScript first-strand cDNA synthesis kit ( TaKaRa Bio ) , and quantitative PCR ( qPCR ) was conducted using SYBR green PCR master mix ( Applied Biosystems ) as previously described ( 38 , 79 ) . 
+ The primers used are described in Table S3 in the supplemental material . 
+ The cDNA templates were serially diluted 2-fold and used in the qPCR assays . 
+ The levels of the 16S rRNA gene were used for normalization of data . 
+ The relative expression levels were quantiﬁed using the threshold cycle method presented by PE Applied Biosystems ( PerkinElmer ) . 
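+ A minimal sketch of the normalization described above , assuming the threshold cycle method refers to the comparative CT ( 2^-ddCt ) calculation with 100 % amplification efficiency . The function and argument names are hypothetical ; the reference gene is the 16S rRNA gene , as stated in the text . 

```python
def relative_expression(ct_target_mut, ct_ref_mut, ct_target_wt, ct_ref_wt):
    """Fold change of a target gene in the mutant relative to the wild type.

    Each target Ct is first normalized to the reference gene (16S rRNA here),
    then the wild-type value is subtracted. Assumes perfect doubling per
    cycle, which is an assumption of this sketch, not a claim from the source.
    """
    delta_mut = ct_target_mut - ct_ref_mut  # dCt in the mutant
    delta_wt = ct_target_wt - ct_ref_wt     # dCt in the wild type
    return 2.0 ** -(delta_mut - delta_wt)   # 2^-ddCt
```

+ A value above 1 means the target transcript is more abundant in the mutant than in the wild type after normalization ; a value below 1 means it is less abundant . 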
+ PM . 
+ Phenotype microarray ( PM ) , a high-throughput technology for simultaneous testing of a large number of cellular phenotypes , was employed according to the manufacturer 's instructions ( BioLog ) ( 86 , 87 ) . 
+ In this study , PM was used for screening of the phenotypic differences between the wild-type strain and the rpoZ-deleted mutant strain . 
+ Growth difference was monitored by measuring the color intensity produced by reduction of tetrazolium violet by NADH . 
+ SUPPLEMENTAL MATERIAL
+ Supplemental material for this article may be found at https://doi.org/10.1128/mSystems.00172-17 . 
+ FIG S1 , PDF ﬁle , 0.4 MB . 
+ We thank Kayoko Yamada and Ayako Kori for technical assistance . 
+ We also thank the NBRP-E. coli center for E. coli strains . 
+ This work was supported by JSPS Indo-Japan Bilateral Joint Research Project , the MEXT Cooperative Research Program of Network Joint Research Centre for Materials and Devices , and the MEXT-Supported Program for the Strategic Research Foundation at Private Universities .
\ No newline at end of file
--- a/data/TEXT_FILES/useful_txt/29476659.txt 0 → 100644
+++ b/data/TEXT_FILES/useful_txt/29476659.txt 0 → 100644
+ Devon M. Fitzgerald 1 , Carol Smith 2, Pascal Lapierre 2, and Joseph T. Wade 1,2,3
+ This article has been accepted for publication and undergone full peer review but has not been through the copyediting , typesetting , pagination and proofreading process which may lead to differences between this version and the Version of Record . 
+ Please cite this article as an ` Accepted Article ' , doi : 10.1111/mmi.13941 . This article is protected by copyright . 
+ All rights reserved 
+ Recent findings have identified thousands of bacterial promoters in unexpected locations , such as inside genes . 
+ Here , we investigate the functions of intragenic promoters for the flagellar sigma factor FliA . 
+ Our data suggest that most of these promoters are not functional , but that one intragenic FliA promoter is broadly conserved , and constrains evolution of the overlapping protein-coding gene . 
+ Our data suggest that intragenic regulatory 
+ ABSTRACT
+ In Escherichia coli , one Sigma factor recognizes the majority of promoters , and six `` alternative '' Sigma factors recognize specific subsets of promoters . 
+ The alternative Sigma factor FliA ( σ28 ) recognizes promoters upstream of many flagellar genes . 
+ We previously showed that most E. coli FliA binding sites are located inside genes . 
+ However , it was unclear whether these intragenic binding sites represent active promoters . 
+ Here , we construct and assay transcriptional promoter-lacZ fusions for all 52 putative FliA promoters previously identified by ChIP-seq . 
+ These experiments , coupled with integrative analysis of published genome-scale transcriptional datasets , strongly suggest that most intragenic FliA binding sites are active promoters that transcribe highly unstable RNAs . 
+ Additionally , we show that widespread intragenic FliA-dependent transcription may be a conserved phenomenon , but that specific promoters are not themselves conserved . 
+ We conclude that intragenic FliA-dependent promoters and the resulting RNAs are unlikely to have important regulatory functions . 
+ Nonetheless , one intragenic FliA promoter is broadly conserved , and constrains evolution of the overlapping protein-coding gene . 
+ Thus , our data indicate that intragenic regulatory elements can influence bacterial protein evolution , and suggest that the impact of intragenic regulatory sequences on genome evolution should be 
+ INTRODUCTION
+ In bacteria , RNA polymerase ( RNAP ) requires a transcription initiation factor , σ , to recognize promoter elements and initiate transcription . 
+ Bacteria encode one housekeeping σ factor that functions at most promoters , and multiple `` alternative '' σ factors that each recognize smaller sets of promoters . 
+ Historically , promoters were thought to be located solely upstream of annotated genes . 
+ However , widespread transcription initiation from inside genes has now been described in Escherichia coli and many other species ( reviewed , ( Lybecker et al. , 2014 ; Wade and Grainger , 2014 ) ) . 
+ Consistent with these observations , the E. coli housekeeping σ factor , σ70 , has been shown to bind many intragenic sites ( Singh et al. , 2014 ) . 
+ Similar findings have been reported for alternative σ factors , e.g. 40 % of Mycobacterium tuberculosis SigF binding sites , 25 % of E. coli σ32 binding et al. , 2013 ; Bonocora et al. , 2015 ) . 
+ The high degree of pervasive transcription involving multiple σ factors 
+ Like σ factors , DNA-binding transcription factors often bind extensively within genes ( Shimada et al. , 2008 ; J. Galagan et al. , 2013 ; J. E. Galagan et al. , 2013 ; Bonocora et al. , 2013 ; Wade and Grainger , 2014 ; Grainger , 2016 ) . 
+ The regulons of most transcription factors have not been mapped , even for E. coli , suggesting that most intragenic binding sites remain to be identified . 
+ Indeed , a study of 51 transcription factors in Mycobacterium tuberculosis suggests that a typical bacterial genome contains > 10,000 intragenic binding sites ( J. E. Galagan et al. , 2013 ) . 
+ The transcriptional activities of most intragenic transcription / σ factor binding sites have not been extensively studied , but many are likely to be functional ( J. E. Galagan et al. , 2013 ) . 
+ Although transcription regulatory networks evolve rapidly , individual regulatory interactions are often maintained by purifying selection ( Lozada-Chávez et al. , 2006 ; Perez and Groisman , 2009 ; Stringer et al. , 2014 ) . 
+ Hence , many intragenic transcription / σ factor binding sites may be functional , and thus are likely to be conserved . 
+ A previous study suggested that purifying selection on intragenic transcription / σ factor binding sites in human cells constrains the evolution of overlapping protein-coding genes ( Stergachis et al. , 2013 ) . 
+ The impact of bacterial 
+ FliA ( σ28 ) is an alternative σ factor involved in transcription of genes associated with flagellar motility and chemotaxis ( reviewed ( Paget , 2015 ) ) . 
+ FliA also initiates transcription of some non-flagellar genes in E. coli ( Fitzgerald et al. , 2014 ) , and is encoded by some non-motile bacteria , such as Chlamydia ( Yu and Tan , 2003 ) , suggesting additional non-flagellar roles . 
+ Recently , we reported that over half of E. coli FliA binding sites are located inside genes , often far from gene starts ( Fitzgerald et al. , 2014 ) . 
+ These intragenic sites were split approximately evenly between those occurring in the sense and antisense orientations , with respect to the overlapping gene . 
+ Most intragenic FliA binding sites were not associated with detectable FliA-dependent RNAs , so it is unclear whether they represent functional promoters . 
+ Notably , FliA is the most highly and broadly conserved alternative σ factor ( Feklístov et al. , 2014 ; Paget , 2015 ) . 
+ The interactions between FliA , RNA polymerase , and promoter DNA are so highly conserved that the Bacillus subtilis homolog , σD , can complement an E. coli ΔfliA strain ( Chen and Helmann , 1992 ) . 
+ Like many alternative σ factors , FliA has a decreased ability to melt DNA as compared to housekeeping σ factors ( Koo , Rhodius , Nonaka , et al. , 2009 ; Feklístov et al. , 2014 ) . 
+ Thus , FliA-dependent transcription initiation requires a stringent match to its consensus promoter sequence ( Koo , Rhodius , Campbell , et al. , 2009 ) . 
+ Together , the high conservation and readily 
+ In this study , we evaluate the promoter activity of intragenic FliA binding sites in E. coli . 
+ We also assess the conservation of intragenic FliA promoters and map the Salmonella FliA regulon . 
+ We conclude that most intragenic FliA binding sites represent bona fide promoters that transcribe unstable intragenic RNAs . 
+ We show that extensive intragenic transcription by FliA is likely to be a conserved phenomenon , but the genetic locations of intragenic FliA promoters are generally not conserved . 
+ Nonetheless , we show that a single intragenic FliA promoter is under strong selective pressure that constrains the evolution of the FlhC protein . 
+ This is the first documented example of intragenic regulatory sequence impacting evolution of the overlapping protein-coding gene in a bacterium , and suggests that selective pressure on intragenic binding sites for σ factors and 
+ RESULTS
+ Most intragenic FliA binding sites represent transcriptionally active promoters
+ To test whether FliA binding sites previously identified by ChIP-seq ( Fitzgerald et al. , 2014 ) represent active promoters , we generated transcriptional fusions of potential promoters to the lacZ reporter gene . 
+ For each of the 52 putative FliA promoters , the region from approximately -200 to +10 was cloned upstream of lacZ on a single-copy plasmid ( Figure 1A ) . 
+ We chose to include 200 bp upstream sequence because at least one FliA promoter is regulated by a transcription factor binding upstream ( Hollands et al. , 2010 ) . 
+ Plasmids were transformed into a motile strain of E. coli MG1655 ( i.e. expressing FliA ) , or an isogenic ΔfliA derivative , and assayed for β-galactosidase activity . 
+ Of the 20 intergenic promoters , 15 displayed significant FliA-dependent activity ( t-test , p ≤ 0.05 ; Figure 1B ) . 
+ Of the 30 intragenic promoters , 10 out of 16 sense - and 7 out of 14 antisense-orientation putative intragenic promoters showed significant FliA-dependent activity ( t-test , p ≤ 0.05 ; with transcription of stable RNAs ( ( flhC ) motAB-cheAW , ( yafY ) ykfB , ( yjdA ) yjcZ , ( uhpT ) , and antisense ( hypD ) , where genes in parentheses indicate those with an internal FliA promoter . 
+ One of the two putative promoters located in convergent intergenic regions also showed significant FliA-dependent activity ( t-test , p ≤ 0.05 ; Figure 1C ) . 
+ It should be noted that some fusions had very high levels of background activity , which may have prevented the detection of lower levels of FliA-dependent transcription from these promoter fusions . 
+ Of note , no FliA-dependent activity was detected for the well-characterized promoters upstream of fliA , fliD , and fliL , likely due to overwhelming transcriptional activity from the strong , σ70-dependent , FlhDC-activated promoters known to be immediately upstream ( Liu and Matsumura , 1996 ; Stafford et al. , 2005 ; Fitzgerald et al. , 2014 ) . 
+ High β-galactosidase activity associated with the lacZ fusions for pntA , cvrA , glyA , proK , and insB-4 / cspH suggests they are also likely to include σ70 promoters that may preclude identification of FliA-dependent 
+ We previously identified FliA-regulated transcripts using RNA-seq , although most intragenic FliA sites were not associated with a detectable RNA ( Fitzgerald et al. , 2014 ) . 
+ However , this method often fails to detect unstable RNAs . 
+ To independently assess whether intragenic FliA binding sites act as promoters , we analyzed two published datasets generated from motile E. coli strains : ( i ) genome-wide transcription start site ( TSS ) mapping by differential RNA-seq ( dRNA-seq ) ( Thomason et al. , 2015 ) , and ( ii ) Nascent Elongating Transcript sequencing ( NET-seq ) ( Larson et al. , 2014 ) . 
+ dRNA-seq identifies TSSs by selectively degrading processed transcripts bearing a 5 ' monophosphate , and then preparing a library from the remaining 5 ' triphosphate-bearing primary transcripts ( Sharma and Vogel , 2014 ) . 
+ By focusing reads to the 5 ' ends of transcripts , this technique is more sensitive than standard RNA-seq , and can distinguish intragenic RNAs from overlapping mRNAs . 
+ NET-seq isolates nascent RNA still bound to RNAP , facilitating detection of unstable transcripts prior to degradation 
+ To compare FliA binding site location to TSS mapping data , we determined the distance from the predicted FliA promoter sequence associated with each FliA binding site ( Fitzgerald et al. , 2014 ) to all downstream TSSs within 500 bp ( Figure 2A ) . 
+ For most well-characterized FliA-dependent promoters for flagellar genes , the distance between the center of the promoter sequence and TSS was between 18 and 22 bp . 
+ For other FliA binding sites , we observed a strong enrichment for TSSs between 18 and 23 bp downstream of FliA motif centers . 
+ In total , 38 of the 52 FliA binding sites have a TSS located 18-23 bp downstream of the center of their predicted promoter . 
+ This positional enrichment is highly significant when compared to the same analysis performed with a randomized TSS dataset ; only one random TSS was between 18-23 bp downstream of a FliA 
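+ The distance analysis above can be sketched as follows . This is an illustration under stated assumptions , not the authors ' code : the function names are hypothetical , coordinates are 1-based integers , and the TSS list is assumed to contain only TSSs on the same strand as the motif . 

```python
def downstream_distances(motif_center, strand, tss_positions, max_dist=500):
    """Distances from a motif center to every TSS downstream of it.

    "Downstream" follows the motif's strand: increasing coordinates on "+",
    decreasing coordinates on "-". TSSs farther than max_dist are ignored.
    """
    dists = []
    for tss in tss_positions:
        d = (tss - motif_center) if strand == "+" else (motif_center - tss)
        if 0 < d <= max_dist:
            dists.append(d)
    return sorted(dists)


def has_tss_in_window(motif_center, strand, tss_positions, lo=18, hi=23):
    """True if any TSS lies lo..hi bp downstream of the motif center."""
    return any(
        lo <= d <= hi
        for d in downstream_distances(motif_center, strand, tss_positions)
    )
```

+ Running has_tss_in_window over every binding site , once with the observed TSSs and once with randomized TSS positions , gives the two counts that the comparison in the text relies on . 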
+ To systematically assess whether FliA binding sites are associated with signal in the NET-seq dataset , the sequence read coverage upstream and downstream of FliA binding sites was determined . 
+ For FliA binding sites associated with a TSS , the read coverage at each position from -100 to +100 was determined relative to the TSS . 
+ For all other FliA binding sites , a TSS was predicted at 20 bp downstream of the predicted promoter sequence center ( average position of other TSSs ) , and coverage was determined from -100 to +100 relative to this position . 
+ The coverage profile for each binding site was normalized to the minimum and maximum coverage in the region and plotted as a heatmap ( Figure 2B ) . 
+ There is a clear trend of higher NET-seq read coverage downstream of FliA binding sites , compared to the regions immediately upstream . 
+ To quantify this trend , the ratio of NET-seq read coverage upstream and downstream of the TSS was calculated for each putative FliA-dependent promoter . 
+ In total , 44 out of the 52 putative promoters showed at least 2-fold higher coverage in the region 100 bp downstream of the TSS than in the region 100 bp upstream of the TSS . 
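+ A minimal sketch of the two coverage calculations described above : min-max normalization of a per-position profile for the heatmap , and the downstream / upstream coverage ratio . The function names , the list-based coverage layout , and the pseudocount are assumptions introduced here for illustration . 

```python
def min_max_normalize(coverage):
    """Rescale a coverage profile to the 0..1 range within its own region."""
    lo, hi = min(coverage), max(coverage)
    if hi == lo:
        # Flat profile: nothing to scale, return zeros (a choice of this sketch).
        return [0.0] * len(coverage)
    return [(c - lo) / (hi - lo) for c in coverage]


def downstream_upstream_ratio(coverage, tss_index, flank=100, pseudocount=1.0):
    """Ratio of summed coverage in the flank after vs. before the TSS.

    coverage is a list of per-position read counts and tss_index is the list
    index of the TSS. The pseudocount (an assumption) avoids division by zero
    when the upstream region has no reads.
    """
    upstream = coverage[max(0, tss_index - flank):tss_index]
    downstream = coverage[tss_index:tss_index + flank]
    return (sum(downstream) + pseudocount) / (sum(upstream) + pseudocount)
```

+ Under these assumptions , a site would count toward the 2-fold criterion in the text when downstream_upstream_ratio returns a value of at least 2 . 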
+ These 44 putative 
+ FliA binding sites with transcriptional activity detected by NET-seq and those detected by TSS association 
+ In total , 26 of the 30 intragenic FliA binding sites , and one of the two FliA sites in a convergent intergenic region , show evidence of promoter activity from at least one assay . 
+ Table 1 summarizes the existing evidence for these sites . 
+ It should be noted that neither the TSS nor NET-seq datasets have matched ΔfliA controls , so it is formally possible that TSSs/transcripts are associated with FliA-independent promoters . 
+ However , this is highly unlikely given the position of putative TSSs and the position of NET-seq signal with respect to the predicted FliA promoter sequences . 
+ Overall , there is substantial overlap between the sets of putative intragenic promoters that display FliA-dependent activity in promoter fusion assays , those with appropriately positioned TSSs , and 
+ Most intragenic FliA promoters are not conserved across species
+ To assess whether intragenic FliA promoters and binding sites are likely to be functionally important , we determined conservation of these sites bioinformatically . 
+ The sequence surrounding each of the 52 FliA binding sites previously identified by ChIP-seq ( Fitzgerald et al. , 2014 ) was extracted and used as a BLAST query to search genomes from 24 γ-proteobacterial genera ( Table S1 ) . 
+ All genomes queried encode FliA , except for those of Klebsiella and Raoultella , which were included as controls . 
+ If a homologous region was identified , it was scored against the previously determined E. coli FliA position-weight matrix ( Fitzgerald et al. , 2014 ) . 
+ These scores are depicted as a heatmap in Figure 3A , where yellow represents the highest-scoring sites and blue the lowest-scoring . 
+ Sites are categorized by location and orientation , and then ranked by total degree of conservation within each category , from left to right . 
+ The well-characterized FliA-dependent promoter inside flhC , which drives transcription of the downstream motABcheAW operon , was the most highly conserved . 
+ All other well-characterized , flagellar-related FliA promoters were well-conserved at the sequence level , with the exception of the promoter upstream of the fliLMNOPQR operon , which is also transcribed by σ70 in E. coli . 
+ Most novel intergenic and intragenic FliA binding sites showed no evidence of conservation , even in close relatives such as Salmonella . 
+ It should be noted that a few intragenic FliA binding sites , such as those inside hslU , glyA , and ybhK , appear conserved , but score equally well in species that lack fliA ( Klebsiella and Raoultella ) , suggesting they are maintained for reasons independent of their ability to bind FliA , most likely because of high levels of conservation for these protein-coding genes . 
+ A few other intragenic promoters , such as those inside uhpC , hypD , metF , and speA , show possible sequence conservation in Salmonella , but not in more 
+ Intragenic FliA promoters are not conserved across E. coli strains
+ Previous studies suggest that while intragenic promoters may not be conserved between species ( Raghavan et al. , 2012 ) , they may be conserved within strains of the same species ( Shao et al. , 2014 ) . 
+ Hence , we bioinformatically determined the conservation of all FliA sites across 9,432 E. coli strains for which a genome sequence is available ( Table S2 ) . 
+ The sequence surrounding each of the 52 FliA binding sites previously identified by ChIP-seq ( Fitzgerald et al. , 2014 ) was extracted and used as a BLAST query to search each E. coli genome or contig . 
+ If a homologous region was identified , we determined whether each position in each E. coli K-12 FliA site is conserved . 
+ We then determined the proportion of strains with a homologous region in which each position of each FliA site is conserved . 
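+ A minimal sketch of this per-position calculation follows . It assumes that the homologous site has already been extracted from each strain as an ungapped sequence of the same length as the K-12 site ; the function name and the toy sequences are hypothetical and are not real FliA sites . 

```python
def positional_conservation(reference_site, homologous_sites):
    """Fraction of strains matching the reference at each site position.

    reference_site: the reference (K-12) site sequence.
    homologous_sites: ungapped sequences of the same length, one per
        strain in which a homologous region was found.
    """
    if not homologous_sites:
        raise ValueError("need at least one homologous site")
    length = len(reference_site)
    if any(len(site) != length for site in homologous_sites):
        raise ValueError("all sites must match the reference length")
    total = len(homologous_sites)
    return [
        sum(site[i] == reference_site[i] for site in homologous_sites) / total
        for i in range(length)
    ]


# Toy input: invented sequences, one per hypothetical strain.
reference = "TAAAGTTT"
strains = ["TAAAGTTT", "TAAAGTTA", "TCAAGTTT", "TAAAGTTT"]
profile = positional_conservation(reference, strains)
```

+ Strains without a homologous region are excluded from the denominator , which matches the wording above ( the proportion of strains with a homologous region ) . 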
+ Figure 3B shows the level of conservation of each position of FliA sites divided into two classes : ( i ) sites that represent promoters of mRNAs ( based on our 
+ The second class includes most of the intragenic FliA sites . 
+ FliA sites that represent promoters of mRNAs are with the lack of sequence requirements in the spacer region for FliA binding . 
+ By contrast , FliA sites that do not conservation between these regions and the spacer . 
+ We conclude that , as a group , FliA binding sites that do not 
+ Genome-wide mapping of the Salmonella Typhimurium FliA regulon
+ Salmonella enterica and E. coli diverged approximately 100 million years ago and exhibit substantial drift at wobble positions ( Gordienko et al. , 2013 ) . 
+ As an independent , empirical test of FliA binding site conservation , we determined the genome-wide binding profile of S. enterica serovar Typhimurium FliA using ChIP-seq of a C-terminally tagged derivative expressed from its native locus . 
+ To facilitate comparison with E. coli ChIP-seq data , we grew cells under similar conditions as those used in our previous study of E. coli FliA ( Fitzgerald et al. , 2014 ) . 
+ A total of 23 high-confidence FliA binding sites were identified ( Table 2 , Figure 4A ) . 
+ Of these 23 sites , three are inside genes but within 300 bp of a gene start ( 13 % ; Figure 4B ) , and five are inside genes and far from a gene start ( 22 % ) . 
+ No equivalent ChIP-seq peaks were identified using a control , untagged strain of S. Typhimurium . 
+ All 23 S. Typhimurium FliA binding sites are associated with a match to the consensus FliA motif ( Figure 4C ; MEME , E-value = 7.4e-49 ) , and motif positions were enriched in the region ~ 25 bp upstream of peak centers , as previously described for FliA binding sites in E. coli ( Fitzgerald et al. , 2014 ) . 
+ As predicted by the sequence conservation analysis ( Figure 3A ) , FliA-dependent promoters upstream of key flagellar operons were conserved in S. Typhimurium . 
+ However , with the exception of the motA promoter that is located inside flhC , no intragenic FliA binding sites were found to be conserved between E. coli and S. Typhimurium . 
+ RNA-seq was used to assess FliA-dependent changes in gene expression by comparing wild-type and ΔfliA strains of S. Typhimurium ( Figure 5 ) . 
+ As for the ChIP-seq experiment , cells were grown under similar significantly differentially expressed between the two strains ( q-value ≤ 0.01 , fold-change ≥ 2 ) , of which 36 were downstream of FliA binding sites identified by ChIP-seq ( Table 2 ) . 
+ The intragenic FliA binding sites within flhC , STM14_3340 , and STM14_3895 were associated with FliA-dependent regulation of the downstream genes , all of which are known flagellar genes . 
+ The other intragenic binding sites were not 
+ The motA promoter within flhC constrains evolution of the FlhC protein
+ Although most intragenic FliA promoters in E. coli are not well conserved in other species , the motA promoter , located inside flhC , is highly conserved ( Figure 3A ) . 
+ However , it is unclear whether this conservation is due to selective pressure on the promoter or on the amino acid sequence of FlhC , which is encoded by the same DNA . 
+ As expected given the conservation of the motA promoter inside flhC , the two FlhC amino acids , Ala177 and Asp178 , that are encoded by sequence overlapping the -10 region , are highly conserved among γ-proteobacteria ( Figure 6A ) , leading us to hypothesize that the Ala-Asp motif is conserved due to selective pressure on the motA promoter , rather than on the amino acids themselves . 
+ To test this hypothesis , we determined whether Asp178 is required for FlhC function . 
+ We created a strain of motile E. coli MG1655 in which the flhDC promoter is transcriptionally active , but flhC is replaced with a cassette containing thyA under the control of a constitutive σ promoter . 
+ Thus , this strain lacks the motA promoter , but we reasoned that motA would be co-transcribed with thyA ( Figure 6B ) . 
+ We then introduced either wild-type FlhC or D178A FlhC from a plasmid , or an empty vector control . 
+ Cells containing the empty vector control were non-motile , as expected given that they lack FlhC . 
+ Cells expressing D178A FlhC were also fully motile ( mean motility level relative to wild-type FlhC of 0.97 ± s.d. 0.09 , n = 3 ; Figure 6B ) . 
+ We conclude that the conserved Asp178 is likely not required for FlhC function . 
+ To further investigate the conservation of the Ala-Asp motif in FlhC , we aligned the sequences of FlhC homologues from 98 different proteobacterial species , each from a different genus in which motA is positioned immediately downstream of flhC ( Table S4 ) . 
+ Although Ala177 and Asp178 are well conserved across these conserved ( Table S4 ) . 
+ We reasoned that if Asp178 is broadly conserved due to selective pressure on the overlapping motA promoter , species in which Asp178 is not conserved are likely to have repositioned the motA promoter . 
+ To test this hypothesis , we extracted the intergenic sequences between flhC and motA for each of the 43 species where Asp178 is not conserved ( Figure S1 ) . 
+ Consistent with our hypothesis , we identified a strongly enriched sequence motif in 19 species ( MEME E-value = 1.5e-32 ) corresponding to a consensus FliA promoter 
+ S1 ) , we did not observe enrichment of a FliA promoter motif in the flhC-motA intergenic region . 
+ Having a FliA promoter for motA within flhC is likely to be the ancestral state , since the position of FliA promoters in flhC-motA intergenic regions differs extensively between species , as do the sequences flanking these promoters . 
+ We also compared the length of the flhC-motA intergenic region in ( i ) the 19 species where FlhC Asp178 is not conserved and for which we identified a likely intergenic FliA promoter , and ( ii ) the 55 species where FlhC Asp178 is conserved . 
+ Intergenic distances in group ( i ) are significantly higher ( median length 207 bp ) than those in group ( ii ) ( median length 131 bp ; Mann-Whitney U Test p = 4.0e-7 ) . 
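+ For readers unfamiliar with the test , the following sketch shows how a two-sided Mann-Whitney U statistic can be computed from two lists of intergenic lengths . It is an illustration only : it uses a normal approximation without tie or continuity correction , the function name is hypothetical , and the lengths are invented rather than taken from the study . In practice an established implementation such as scipy.stats.mannwhitneyu would normally be used . 

```python
import math


def mann_whitney_u(sample_a, sample_b):
    """Two-sided Mann-Whitney U test using a normal approximation.

    Tied values receive average ranks. The variance is not tie-corrected
    and no continuity correction is applied, so the p-value is approximate.
    Returns (U statistic for sample_a, approximate p-value).
    """
    pooled = sorted([(v, 0) for v in sample_a] + [(v, 1) for v in sample_b])
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        j = i
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        average_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[k] = average_rank
        i = j + 1
    n_a, n_b = len(sample_a), len(sample_b)
    rank_sum_a = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    u_a = rank_sum_a - n_a * (n_a + 1) / 2
    mean_u = n_a * n_b / 2
    sd_u = math.sqrt(n_a * n_b * (n_a + n_b + 1) / 12)
    z = (u_a - mean_u) / sd_u
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return u_a, p_value


# Invented intergenic lengths (bp) for two hypothetical groups of species.
group_i = [200, 210, 220, 230]
group_ii = [100, 110, 120, 130]
u_stat, p_approx = mann_whitney_u(group_i, group_ii)
```

+ A rank-based test suits this comparison because intergenic lengths need not be normally distributed . 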
+ We conclude that the selective pressure on Asp178 is lost in species that reposition the motA promoter to the flhC-motA intergenic region , and 
+ DISCUSSION
+ Most FliA Binding Sites are Active Promoters for Unstable RNAs
+ Most FliA binding sites identified by ChIP-seq display FliA-dependent promoter activity when fused upstream of the lacZ reporter gene ( Figure 1 ) . 
+ Many of these FliA binding sites , and some additional sites that had inactive lacZ fusions , are associated with correctly positioned TSSs and NET-seq signal from published studies ( Larson et al. , 2014 ; Thomason et al. , 2015 ) . 
+ Together , these data suggest that almost all FliA binding sites represent transcriptionally active FliA-dependent promoters , regardless of their location relative to protein-coding genes . 
+ The small subset of FliA binding sites that appear to be transcriptionally inert were amongst the most weakly bound sites detected by ChIP-seq ( Fitzgerald et al. , 2014 ) . 
+ Three of these sites have at least one mismatch to key -10 region residues ( Koo , Rhodius , Campbell , et al. , 2009 ) , suggesting that the sites are unlikely to be active promoters , or are so weakly transcribed that their activity is undetectable using standard 
+ Although most intragenic FliA binding sites are likely to represent active promoters , they are not associated with the transcription of stable RNAs , since we previously detected very few such RNAs using standard RNA-seq ( Fitzgerald et al. , 2014 ) . 
+ We conclude that most intragenic FliA promoters drive transcription of unstable RNAs . 
+ This is consistent with the previously described phenomenon of `` pervasive transcription '' that generates large numbers of short , unstable transcripts , primarily from promoters within genes ( Lybecker et al. , 2014 ; Wade and Grainger , 2014 ) . 
+ Intragenic promoters typically drive transcription of non-coding RNAs . 
+ Transcription of these RNAs is rapidly terminated by Rho ( Peters et al. , 2012 ) , and the transcripts are rapidly 
+ Limited conservation of the FliA regulon outside of core flagellar genes
+ Evolutionary conservation of DNA sequences is due to purifying selection , and suggests that the sequence has a beneficial function . 
+ As expected , most flagella-associated FliA promoters are highly conserved at the sequence level ( Figure 3 ) . 
+ Of the intragenic FliA binding sites , only those that drive transcription of an mRNA for a downstream gene appear to be at all functionally conserved . 
+ A few intragenic promoters , such as those within hslU , glyA , and ybhK , are conserved at the sequence level between E. coli and many species ( Figure 3A ) . 
+ However , the fact that these sites are also conserved in two genera not encoding fliA -- Klebsiella and Raoultella -- suggests that the DNA sequences are maintained for reasons independent of FliA , most likely purifying 
+ To experimentally validate the sequence-based conservation predictions , we performed ChIP-seq on S. Typhimurium FliA . 
+ As predicted based on sequence conservation , all key flagellar promoters were functionally conserved , except the one upstream of fliLMNOPQR . 
+ In E. coli , this operon is primarily transcribed from a σ70 promoter that is activated by FlhDC ( Liu and Matsumura , 1996 ; Stafford et al. , 2005 ; Fitzgerald et al. , 2014 ) . 
+ Conservation of the σ70 promoter and FlhDC regulation would ensure that these genes are coordinately regulated with other flagellar genes in S. Typhimurium , potentially relieving the selective pressure to maintain the FliA promoter . 
+ Our ChIP-seq data indicate the only intragenic FliA promoter functionally conserved between E. coli and S. Typhimurium is that within flhC . 
+ While specific intragenic FliA binding sites were not conserved , S. Typhimurium FliA binds multiple intragenic sites . 
+ This suggests that the factors affecting FliA specificity , or lack thereof , are similar between E. coli and S. Typhimurium , and that the phenomenon of intragenic FliA promoters is conserved , even if the specific promoters are not . 
+ Note that we identified fewer intragenic FliA sites in S. Typhimurium than we previously identified in E. coli ( Fitzgerald et al. , 2014 ) , but this is likely due to the data for S. Typhimurium having slightly lower signal-to-noise ratios ( compare ChIP-seq 
+ It should be noted that lack of conservation of specific promoters does not necessarily indicate a lack of functional importance , but could instead reflect lineage-specific evolution . 
+ Indeed , regulatory small RNAs are often poorly conserved , even between closely related species ( Toffano-Nioche et al. , 2012 ; Beauregard et al. , 2013 ; Patenge et al. , 2015 ) . 
+ However , our analysis of conservation within E. coli suggests that most intragenic FliA promoters are not conserved even within the species , although this multi-promoter analysis does not rule out the possibility that a small proportion of the intragenic promoters are functional . 
+ Indeed , one of the two stable , FliA-transcribed non-coding RNAs -- that transcribed from within uhpT -- is likely a functional regulator . 
+ A recent study detected numerous Hfq-mediated interactions between mRNAs and RNA originating from the 3 ' end of uhpT ( Melamed et al. , 2016 ) . 
+ Although the uhpT sequences from these interactions map to locations downstream of the sRNA predicted by RNA-seq ( Fitzgerald et al. , 2014 ) , an earlier microarray study and NET-seq data suggest that the FliA-transcribed sRNA extends further downstream ( Reppas et al. , 2006 ; Larson et al. , 2014 ) . 
+ The other stable , FliA-transcribed non-coding RNA -- that transcribed from within hypD -- was not detected in any sRNA : mRNA interactions ( Melamed et al. , 2016 ) , suggesting that it is not functional . 
+ Unstable FliA-transcribed non-coding RNAs are also unlikely to be functional , given their transient nature , and the lack 
+ Intragenic FliA promoters likely arise as a result of sequence drift during evolution , although the likelihood of creating a FliA promoter as a result of a base substitution is lower than for some other σ factors , since FliA promoters require a more stringent match to the consensus sequence . 
+ Nonetheless , we estimate that there are 474 possible single base substitutions in the E. coli genome that would create a new FliA promoter ( see Methods ) . 
+ Strikingly , this number is similar to the number of single base substitutions that we predict would destroy an existing FliA site , based on the number of actual FliA sites and the information content of the binding motif . 
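+ One way to enumerate site-creating substitutions is sketched below . This is an assumption about how such a count could be made , not the procedure described in the Methods : the scoring matrix , the threshold , the single-strand scan , and the function name are all hypothetical . 

```python
def creating_substitutions(pssm, sequence, threshold):
    """Single-base substitutions that lift a sub-threshold window to a site.

    pssm: list of dicts mapping each base to a score, one dict per motif
        position (a hypothetical scoring matrix).
    sequence: DNA sequence scanned on one strand only.
    threshold: minimum score for a window to count as a site (assumed).
    Returns a set of (sequence_position, new_base) tuples.
    """
    width = len(pssm)
    created = set()
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        score = sum(pssm[i][base] for i, base in enumerate(window))
        if score >= threshold:
            continue  # this window is already a site
        for i, old_base in enumerate(window):
            for new_base in "ACGT":
                if new_base == old_base:
                    continue
                # Rescore the window with only position i changed.
                new_score = score - pssm[i][old_base] + pssm[i][new_base]
                if new_score >= threshold:
                    created.add((start + i, new_base))
    return created


# Toy two-position matrix that rewards "AA" (illustrative values only).
toy_pssm = [
    {"A": 1, "C": 0, "G": 0, "T": 0},
    {"A": 1, "C": 0, "G": 0, "T": 0},
]
hits = creating_substitutions(toy_pssm, "ACA", threshold=2)
```

+ Collecting results in a set means that a substitution creating a site in two overlapping windows is counted once . 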
+ We propose that the number of intragenic FliA sites in E. coli is in equilibrium , but that nonfunctional sites turn over relatively frequently . 
+ The prevalence of intragenic FliA promoters in E. coli and S. Typhimurium suggests that they do not substantially impact expression of the overlapping genes . 
+ Consistent with this , we detected significant FliA-dependent regulation of only three S. Typhimurium genes that have an internal FliA site ( Figure 5 ; Table 2 ) ; one of these genes ( STM14_3340 ) is immediately upstream of a FliA-transcribed flagellar gene , and another ( motB ) is a downstream gene in a FliA-transcribed operon . 
+ While most intragenic FliA promoters are unlikely to be individually functional , the phenomenon of widespread intragenic FliA sites may be functional . 
+ For example , intragenic FliA sites could titrate cellular FliA , thereby sensitizing could reduce stochasticity in effective FliA levels , by requiring that FliA levels be maintained at higher levels . 
+ These functions would be independent of the specific locations of FliA promoters , and more dependent on the number and strength of promoters . 
+ Spontaneous creation of FliA binding sites by genetic drift may also provide a source of novel , functional FliA promoters , e.g. if there is a selective advantage of coordinately regulating the 
+ The motA promoter inside flhC constrains the evolution of FlhC
+ conserved of all FliA promoters . 
+ This promoter has been described previously , and drives transcription of the require a stringent match to the consensus promoter sequence ( Koo , Rhodius , Campbell , et al. , 2009 ) , and this is reflected by the high information content in the sequence motif associated with FliA binding , especially in the 
+ -10 region ( Figure 4C ) ( Fitzgerald et al. , 2014 ) . 
+ Hence , conservation of an intragenic FliA promoter is likely to result in conservation of the amino acid sequence for the overlapping codons . 
+ The -10 region of the FliA promoter in flhC corresponds to an Ala-Asp motif in the FlhC protein . 
+ This motif is broadly conserved . 
+ Multiple independent lines of evidence support the idea that the Ala-Asp sequence motif is conserved due to selective pressure on the intragenic FliA promoter and not on the amino acids themselves : ( i ) amino acids close to the Ala-Asp motif that are not associated with FliA promoter elements are poorly conserved ( Figure 6A ) ; ( ii ) the Ala-Asp motif is not present in the X-ray crystal structure of FlhDC ( Wang et al. , 2006 ) , suggesting that it is in a disordered region ; ( iii ) Asp178 does not detectably contribute to FlhC function ( Figure 6B ) ; and ( iv ) in proteobacterial species where flhC and motA are adjacent genes but FlhC Asp178 is not conserved , an alternative FliA promoter is often located in the intergenic region between flhC and motA ( Figure 6C ) . 
+ Thus , even in cases where the specific FliA promoter inside flhC is not conserved , the presence of a FliA promoter upstream of motA is conserved . 
+ If the FliA promoter inside flhC were conserved because of selective pressure on the Ala-Asp motif , we would expect that ( i ) surrounding amino acids would also be conserved , regardless of whether they are encoded in sequence overlapping key FliA promoter elements , ( ii ) the Ala-Asp motif would be part of an important structural motif , ( iii ) Asp178 would be required for motility , and ( iv ) in species where Asp178 is not conserved , there would be no selective pressure to acquire an alternative FliA promoter for motA . 
+ We therefore conclude that the amino acid sequence of FlhC is constrained by the internal promoter for motA . 
+ Thus , the evolution of FlhC protein sequence is directly impacted by the function of the downstream gene . 
+ The potential for an abundance of bacterial regulatory sequences that constrain protein evolution
+ A recent study reported large numbers of putative transcription factor binding sites in the coding sequences of the human genome , and suggested that these sequences are under selective pressure for both their regulatory and coding functions ( Stergachis et al. , 2013 ) . 
+ While the specific findings of that study have been questioned ( Xing and He , 2015 ) , the FliA promoter inside flhC is clearly analogous . 
+ We propose that conservation of intragenic sequences due to selective pressure on their regulatory function is likely to occur far more frequently in bacteria than in eukaryotes . 
+ The compact nature of bacterial genomes causes them to be gene-dense , greatly limiting the non-coding sequence space ; in E. coli , ~ 90 % of the genome is protein-coding , in stark contrast to the human genome , which is < 2 % protein-coding . 
+ Consistent with the paucity of non-coding sequence in bacterial genomes , numerous intragenic binding sites have been identified for transcription factors and σ factors ( Wade et al. , 2006 ; Shimada et al. , 2008 ; Hartkoorn et al. , 2012 ; J. Galagan et al. , 2013 ; J. E. Galagan et al. , 2013 ; Bonocora et al. , 2013 ; Wade and Grainger , 2014 ; Bonocora et al. , 2015 ; Grainger , 2016 ) . 
+ In some cases , low stringency in the DNA sequence requirements for binding may allow for sequence changes that change encoded amino acids while maintaining regulatory function . 
+ For example , there are many intragenic σ70 consensus ( Singh et al. , 2014 ) . 
+ Hence , even if an intragenic σ70 promoter is under selective pressure , it could acquire mutations that alter the overlapping coding potential without affecting promoter strength . 
+ However , bacterial transcription factors and some alternative σ factors tend to have high information content binding sites , especially compared to their eukaryotic equivalents ( Wade et al. , 2005 ; Wunderlich and Mirny , 2009 ) . 
+ This suggests that functional conservation of intragenic transcription / σ factor binding sites in bacteria will often 
+ Identification of regulatory sequences that constrain protein evolution requires further investigation of intragenic regulatory sites . 
+ Although numerous intragenic binding sites have been identified , their regulatory capacity is often unclear , and their conservation has not been extensively analyzed . 
+ Intragenic promoters have been reported in numerous bacterial species ( Lybecker et al. , 2014 ; Wade and Grainger , 2014 ) . 
+ Limited evolutionary analysis suggests that most promoters for antisense RNAs are not conserved ( Raghavan et al. , 2012 ) , although there is evidence for lineage-specific conservation ( Shao et al. , 2014 ) . 
+ Importantly , there are specific examples of intragenic σ factor binding that likely constrain evolution of the amino acid sequence encoded by the overlapping protein-coding gene . 
+ First , an intragenic promoter for the alternative σ factor , σ24 , is conserved both at the sequence level and functionally ( Guo et al. , 2014 ; Li et al. , 2015 ) . 
+ This promoter drives transcription of a non-coding , regulatory RNA , MicL , that is also conserved ( Guo et al. , 2014 ) . 
+ Hence , both the promoter and non-coding RNA might represent dual-usage sequence . 
+ Second , an alternative σ factor , σ54 , binds many intragenic sites in E. coli and S. Typhimurium that are conserved both at the sequence level and functionally ( Bonocora et al. , 2015 ; Bono et al. , 2017 ) , suggesting that they may constrain protein evolution . 
+ Since conserved intragenic σ54 binding sites are likely to be promoters for downstream genes ( Bonocora et al. , 2015 ) , evolution of the amino acid sequence of proteins encoded by genes containing σ54 promoters may often 
+ Extrapolating from our data for FliA , the majority of intragenic transcription / σ factor binding sites are likely to be non-functional , and hence not under selective pressure . 
+ These sites would therefore not impact protein evolution . 
+ Even though the complete regulons of most E. coli transcription / σ factors remain to be mapped , thousands of intragenic sites have already been identified , implying that there are thousands more sites yet to be discovered . 
+ Even if only a small fraction of intragenic sites are under selection , this would indicate the existence of many such sequences that constrain protein evolution . 
+ Hence , our data suggest that the evolutionary impact of intragenic regulatory sequences should be considered more broadly , as it is likely to be 
+ MATERIALS AND METHODS
+ Strains, plasmids, and growth conditions
+ All bacterial strains and plasmids used in this study are listed in Table 3 . 
+ All oligonucleotides used in this study are listed in Table S5 . 
+ All E. coli strains are derivatives of the motile MG1655 strain ( DMF36 ) described previously ( Fitzgerald et al. , 2014 ) . 
+ To construct strains used for β-galactosidase assays , the native lacZ gene of DMF36 , or the isogenic ΔfliA strain ( DMF40 ) ( Fitzgerald et al. , 2014 ) was replaced by thyA using FRUIT recombineering ( Stringer et al. , 2012 ) with oligonucleotides JW5397 and JW5398 , generating strains DMF122 and DMF123 , respectively . 
+ flhC and 106 bp downstream sequence was replaced with thyA in DMF36 using FRUIT recombineering ( Stringer et al. , 2012 ) to generate strain CDS105 . 
+ Salmonella strains are derivatives of S. enterica serovar Typhimurium 14028s ( Jarvik et al. , 2010 ) . 
+ S. Typhimurium FliA was C-terminally epitope tagged with a 3x-FLAG tag at the native chromosomal locus using FRUIT recombineering ( Stringer et al. , 2012 ) , generating strain DMF087 . 
+ The S. Typhimurium ΔfliA strain , DMF088 , was constructed using FRUIT 
+ Wild-type flhC was PCR-amplified using oligonucleotides JW8879 and JW8880 , and cloned into the SacI and SalI restriction sites of pBAD30 ( Guzman et al. , 1995 ) using the In-Fusion method ( Clontech ) to generate pCDS043 . 
+ D178A mutant flhC was PCR-amplified using oligonucleotides JW8879 and JW8881 , and cloned as described for wild-type flhC , to generate pCDS044 . 
+ Transcriptional fusions of putative FliA promoters to lacZ were constructed in plasmid pAMD-BA-lacZ ( Stringer et al. , 2014 ) . 
+ Putative promoter regions ( nucleotide positions -200 to +10 , relative to the predicted TSS ) were PCR-amplified from MG1655 cells . 
+ PCR products were cloned into pAMD-BA-lacZ cut with SphI and NheI using the In-Fusion method ( Clontech ) . 
+ Oligonucleotides used for the plasmid cloning are listed in Table 3.
+ For all experiments involving liquid growth , subcultures were grown in LB at 37 °C , with aeration , to OD600 
+ Transcriptional lacZ promoter fusion plasmids were transformed into ΔlacZ strains with ( DMF122 ) or without fliA ( DMF123 ) . 
+ Promoter activity was assessed by β-galactosidase assay , as previously described ( Stringer et 
+ Analysis of published TSS data
+ To determine whether FliA binding sites were associated with TSSs , a published list of TSS locations derived from dRNA-seq was used ( Thomason et al. , 2015 ) . 
+ Orientation of putative FliA promoters was determined based on associated motifs . 
+ For each putative FliA promoter , the distance from the motif center to each downstream TSS on the correct strand was calculated . 
+ All pairwise distances < 500 bp are plotted in Figure 2A . 
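+ The distance calculation described above can be sketched as follows . This is a minimal illustration with hypothetical function names and invented coordinates ; it assumes that each TSS is given as a ( position , strand ) pair and that the 18-23 bp window reported in the Results is inclusive . 

```python
def downstream_distances(motif_center, strand, tss_positions, max_distance=500):
    """Distances from a motif center to same-strand TSSs lying downstream.

    strand: "+" or "-".
    tss_positions: iterable of (position, strand) tuples.
    Only distances greater than 0 and below `max_distance` are returned.
    """
    distances = []
    for position, tss_strand in tss_positions:
        if tss_strand != strand:
            continue
        # Downstream means higher coordinates on "+", lower on "-".
        if strand == "+":
            distance = position - motif_center
        else:
            distance = motif_center - position
        if 0 < distance < max_distance:
            distances.append(distance)
    return sorted(distances)


def has_positioned_tss(distances, low=18, high=23):
    """True if any TSS lies within the expected window (inclusive)."""
    return any(low <= d <= high for d in distances)


# Invented coordinates for illustration only.
toy_tss = [(1020, "+"), (1400, "+"), (980, "+"), (1021, "-"), (1600, "+")]
plus_distances = downstream_distances(1000, "+", toy_tss)
minus_distances = downstream_distances(1000, "-", [(979, "-")])
```

+ Running the same functions on a randomized TSS list gives the control comparison . 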
+ As a control , a randomized TSS dataset was generated with the same total number and distribution ( with respect to strand and being intragenic/intergenic ) as the experimental dataset . 
+ The analysis was repeated with this 
+ Analysis of published NET-seq data
+ Raw sequencing data files from NET-seq experiments ( Larson et al. , 2014 ) were obtained and mapped to the E. coli MG1655 genome using CLC Genomics Workbench . 
+ Sequence read depths at positions surrounding putative FliA promoters were calculated using a custom Python script . 
+ For FliA binding sites associated with a TSS , the NET-seq read coverage was calculated at every position from -100 to +100 relative to the TSS . 
+ For FliA binding sites not associated with a TSS , a TSS was predicted to be located 20 bp downstream of the motif center , and NET-seq read coverage was calculated from -100 to +100 relative to this position . 
+ For each region , NET-seq read coverage was normalized to local minimum and maximum values . 
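+ The windowing and normalization steps above can be sketched as follows . The custom script used in the study is not shown in the text , so this is a hypothetical reconstruction : the function names are invented , and returning zeros for a flat profile is an assumption made here to avoid division by zero . 

```python
def predicted_tss(motif_center, strand, offset=20):
    """Predicted TSS a fixed offset downstream of the motif center."""
    return motif_center + offset if strand == "+" else motif_center - offset


def window_around(depths, center, flank=100):
    """Depths from center - flank to center + flank, inclusive."""
    start = center - flank
    if start < 0 or center + flank >= len(depths):
        raise ValueError("window extends beyond the available coverage")
    return depths[start:center + flank + 1]


def min_max_normalize(coverage):
    """Rescale coverage to the range 0-1 using the local minimum and maximum."""
    low, high = min(coverage), max(coverage)
    if high == low:
        # Flat profile: assumed convention, not taken from the source.
        return [0.0] * len(coverage)
    return [(value - low) / (high - low) for value in coverage]


# Toy coverage vector (invented values).
toy_depths = list(range(300))
region = window_around(toy_depths, center=150)
normalized = min_max_normalize(region)
```

+ Normalizing each region separately lets weakly and strongly transcribed sites share one heatmap color scale . 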
+ Normalized read coverage 
+ The locations of all E. coli FliA binding sites described previously ( Fitzgerald et al. , 2014 ) were used to identify homologous sequences in 24 other species ( Table S1 ) . 
+ A Position Specific Scoring Matrix ( PSSM ) was derived from the identified FliA binding sites in E. coli ( Fitzgerald et al. , 2014 ) , as described previously ( Bonocora et al. , 2015 ) . 
+ We then took a 300 bp sequence surrounding each FliA site in E. coli MG1655 . 
+ For sites within ORFs we used BLASTX ( Altschul et al. , 1990 ) to search for homologous protein sequences in the selected bacterial species ( BLAST E-value cut-off of 1e-04 , low-complexity filter turned off ) . 
+ Using the PSSM , we scored the top-scoring BLAST hit for each species , searching within 100 bp of the position corresponding to the binding site in E. coli . 
+ For sites within intergenic regions , we used BLASTN to search for regions homologous to each of the 300 bp sequences in each of the selected species ( BLAST E-value cut-off of 1e-04 , low-complexity filter turned off ) , and extracted 100 bp on either side of the position corresponding to the position of the site in E. coli . 
+ If no hits were found , we took the sequence of the downstream gene in E. coli and used BLASTX to search for homologues in the selected species ( BLAST E-value cut-off of 1e-04 , low-complexity filter turned off ) . 
+ For each top BLAST hit , we used the position of the binding site in E. coli relative to the downstream gene to determine the predicted site of binding , and extracted 100 bp on either side . 
+ We calculated PSSM scores for all sequences in each of the selected regions . 
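+ To make the scoring step concrete , the sketch below builds a simple log-odds matrix and reports the best-scoring window in a region . It is a generic illustration , not the PSSM derivation of Bonocora et al. ( 2015 ) : the pseudocount , the uniform background , the single-strand scan , the function names , and the toy sequences are all assumptions . 

```python
import math


def build_pssm(sites, pseudocount=1.0, background=0.25):
    """Log2-odds matrix from equal-length, aligned binding sites."""
    length = len(sites[0])
    matrix = []
    for i in range(length):
        column = [site[i] for site in sites]
        total = len(column) + 4 * pseudocount
        matrix.append({
            base: math.log2(
                ((column.count(base) + pseudocount) / total) / background
            )
            for base in "ACGT"
        })
    return matrix


def best_window_score(pssm, region):
    """Best (score, start) over all windows on the given strand of `region`.

    Windows containing characters other than A, C, G, T are skipped.
    Returns None if no window can be scored.
    """
    width = len(pssm)
    best = None
    for start in range(len(region) - width + 1):
        window = region[start:start + width]
        if any(base not in "ACGT" for base in window):
            continue
        score = sum(pssm[i][base] for i, base in enumerate(window))
        if best is None or score > best[0]:
            best = (score, start)
    return best


# Toy sites and region (invented sequences, not real FliA sites).
toy_sites = ["GCCGATAA", "GCCGATAA", "GCCGATAT"]
toy_matrix = build_pssm(toy_sites)
best_score, best_start = best_window_score(toy_matrix, "TTTTGCCGATAACCCC")
```

+ A full analysis would also score the reverse complement of each region , which this sketch omits for brevity . 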
+ The best score for each region tested was 
+ FliA binding site conservation in E. coli strains
+ All complete or partial genome sequences for E. coli ( 9432 genomes or contigs ; Table S2 ) were downloaded directly from NCBI and individually scored for the presence of FliA sites using the method described above for 
+ ChIP-seq of S. Typhimurium FliA
+ ChIP-seq was performed with strains DMF087 ( FliA-FLAG3 ) or 14028s ( untagged control ) as previously described ( Stringer et al. , 2014 ) . 
+ Sequence reads were mapped to the S. Typhimurium 14028s genome using CLC Genomics Workbench ( Version 8 ) . 
+ Peaks were called using a previously described analysis pipeline ( Fitzgerald et al. , 2014 ) . 
+ Three peaks with a FAT score of 1 were identified in the control dataset ; these peaks 
+ RNA-seq
+ RNA-seq was performed with strains 14028s and DMF088 , as previously described ( Stringer et al. , 2014 ) . 
+ Read mapping and differential expression analysis were performed using Rockhopper ( McClure et al. , 2013 ) . 
+ The normalized expression values and indicators of statistical significance in Table 2 were generated using Rockhopper . 
+ Analysis of FlhC sequence conservation
+ We used the RSAT `` Comparative Genomics/Get Orthologs '' tool ( default parameters , except we required 50 % amino acid sequence identity ; ( Medina-Rivera et al. , 2015 ) ) to identify 52 FlhC homologues from γ-proteobacterial species , each from a different genus . 
+ We aligned protein sequences using MUSCLE ( v3 .8 , default parameters ; ( Edgar , 2004 ) ; Table S3 ) , and for each FlhC homologue we counted matches at each amino 
+ Identification of enriched sequence motifs in flhC-motA intergenic regions
+ We used the RSAT `` Comparative Genomics/Get Orthologs '' tool ( default parameters , except we required 40 % amino acid sequence identity ; ( Medina-Rivera et al. , 2015 ) ) to identify 130 FlhC homologues from proteobacterial species , each from a different genus . 
+ We aligned these protein sequences using MUSCLE ( v3 .8 , default parameters ; ( Edgar , 2004 ) ; Table S4 ) . 
+ To determine whether the flhC and motA genes are adjacent in each of the 131 species selected , we first used the RSAT `` Comparative Genomics/Get Orthologs '' tool ( default parameters except required 40 % amino acid sequence identity ; ( Medina-Rivera et al. , 2015 ) ) to extract 100 bp of sequence immediately downstream of the end of the intergenic region following flhC for each species . 
+ We then searched for open reading frames similar to that of E. coli K-12 motA using BLASTX ( v2 .2.3 , hosted on 
+ EcoGene 3.0 , default parameters , searching against the E. coli annotated proteome ; ( Altschul et al. , 1997 ; Zhou and Rudd , 2013 ) ) . 
+ We discarded 32 FlhC sequences for which there was no BLASTX match to MotA with the corresponding sequence downstream of flhC . 
+ For each of the 98 remaining FlhC homologues , using the 
+ MUSCLE alignment described above ( Table S4 ) , we determined whether E. coli K-12 Asp178 is conserved . 
+ We used the RSAT `` Comparative Genomics/Get Orthologs '' tool ( Medina-Rivera et al. , 2015 ) to extract intergenic sequence downstream of flhC for the 98 FlhC homologues from genomes where flhC and motA are adjacent genes . 
+ We discarded intergenic sequences < 50 bp . 
+ We used MEME ( v4 .12.0 , default settings , except we selected the `` look on given strand only '' option ; ( Bailey and Elkan , 1994 ) ) to identify enriched sequence motifs in intergenic regions from species where FlhC Asp178 is conserved ( n = 55 ) or is not conserved ( n = 43 ) , 
+ Motility assays were performed as previously described (Fitzgerald et al., 2014).
+ Estimating the number of single base substitutions that would create a new FliA site in E. coli
+ We used the E. coli FliA PSSM ( Fitzgerald et al. , 2014 ) to calculate motif scores for all 27mer sequences in the 
+ E. coli MG1655 genome . 
+ For each score window between integer values ( e.g. scores between 10 and 11 , scores between 11 and 12 , etc. ) , we determined the frequency of sequences that represent actual FliA binding sites , as determined previously by ChIP-seq ( Fitzgerald et al. , 2014 ) . 
+ We then calculated motif scores for every 27mer in the genome with every possible single base substitution ( i.e. 81 scores for each sequence ) . 
+ We binned scores in whole integer windows ( e.g. a bin for scores between 10 and 11 , a bin for scores between 11 and 12 , etc. ) and used the frequencies calculated for actual sites to estimate the number of mutated 27mers that would represent 
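The substitution analysis described above can be sketched in a few lines of Python. This is a minimal illustration and not the authors' code: the PSSM layout (a list of per-position dictionaries of log-odds scores), the function names, and the `site_fraction_by_bin` mapping (integer score bin to the fraction of sequences in that bin that are real sites) are all assumptions made for the example, and only the forward strand is scanned.

```python
import math

BASES = "ACGT"


def score_kmer(pssm, kmer):
    """Sum the per-position PSSM scores for one k-mer."""
    return sum(pssm[i][base] for i, base in enumerate(kmer))


def single_substitution_variants(kmer):
    """Yield every sequence differing from kmer at exactly one position.

    A k-mer has 3 * k such variants, i.e. 81 for a 27mer.
    """
    for i, original in enumerate(kmer):
        for base in BASES:
            if base != original:
                yield kmer[:i] + base + kmer[i + 1:]


def expected_new_sites(pssm, genome, site_fraction_by_bin):
    """Estimate how many single-base mutants would behave as sites.

    Each mutant k-mer is scored, placed in a whole-integer score bin
    (floor of the score), and weighted by the fraction of sequences in
    that bin that are real sites. Bins without an entry contribute 0.
    """
    k = len(pssm)
    expected = 0.0
    for start in range(len(genome) - k + 1):
        kmer = genome[start:start + k]
        if not set(kmer) <= set(BASES):
            continue  # skip windows containing ambiguous bases
        for variant in single_substitution_variants(kmer):
            score_bin = math.floor(score_kmer(pssm, variant))
            expected += site_fraction_by_bin.get(score_bin, 0.0)
    return expected
```

With a real 27-position matrix, `site_fraction_by_bin` would be filled from the frequencies of ChIP-seq-confirmed sites per score window; the toy inputs in any quick check are purely illustrative.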
+ Raw ChIP-seq and RNA-seq data are available from the EBI ArrayExpress repository using accession numbers 
+ This work was funded by the National Institutes of Health through the NIH Director 's New Innovator Award 
+ Program , 1DP2OD007188 ( JTW ) and through grant 5R01GM114812 ( JTW ) . 
+ This material is based on work supported by the National Science Foundation Graduate Research Fellowship under grant number DGE-1060277 ( DMF ) . 
+ DMF was also supported by National Institutes of Health training grant T32AI055429 . 
+ The funders had no role in study design , data collection and interpretation , or the decision to submit the work for 
+ ACKNOWLEDGEMENTS
+ We thank the Applied Genomic Technologies Core Facility for Sanger sequencing , the University at Buffalo 
+ Next Generation Sequencing Core Facility for Illumina sequencing , and the Wadsworth Center Media and 
+ Glassware Core facilities for media and glassware . 
+ We thank David Grainger , Keith Derbyshire , and members of the Wade group for helpful discussions . 
+ We thank the anonymous reviewers for their suggestions and 
+ Mol Biol 215: 403–410.
+ Altschul , S.F. , Madden , T.L. , Schaffer , A.A. , Zhang , J. , Zhang , Z. , Miller , W. , and Lipman , D.J. ( 1997 ) Gapped 
+ BLAST and PSI-BLAST : a new generation of protein database search programs . 
+ Nucleic Acids Res 25 : 3389 -- 
+ 3402.
+ Bailey , T.L. , and Elkan , C. ( 1994 ) Fitting a mixture model by expectation maximization to discover motifs in 
+ Beauregard , A. , Smith , E.A. , Petrone , B.L. , Singh , N. , Karch , C. , McDonough , K.A. , and Wade , J.T. ( 2013 ) 
+ Identification and characterization of small RNAs in Yersinia pestis. RNA Biol 10: 397–405.
+ Bono , A.C. , Hartman , C.E. , Solaimanpour , S. , Tong , H. , Porwollik , S. , McClelland , M. , et al. ( 2017 ) Novel 
+ DNA Binding and Regulatory Activities for σ ( 54 ) ( RpoN ) in Salmonella enterica Serovar Typhimurium 
+ 14028s. J Bacteriol 199.
+ Bonocora , R.P. , Fitzgerald , D.M. , Stringer , A.M. , and Wade , J.T. ( 2013 ) Non-canonical protein-DNA 
+ Bonocora , R.P. , Smith , C. , Lapierre , P. , and Wade , J.T. ( 2015 ) Genome-Scale Mapping of Escherichia coli σ54 
+ Reveals Widespread, Conserved Intragenic Binding. PLoS Genet 11: e1005552.
+ Brewster , R.C. , Weinert , F.M. , Garcia , H.G. , Song , D. , Rydenfelt , M. , and Phillips , R. ( 2014 ) The transcription 
+ Chen , Y.F. , and Helmann , J.D. ( 1992 ) Restoration of motility to an Escherichia coli fliA flagellar mutant by a 
+ Churchman , L.S. , and Weissman , J.S. ( 2011 ) Nascent transcript sequencing visualizes transcription at 
+ Edgar , R.C. ( 2004 ) MUSCLE : multiple sequence alignment with high accuracy and high throughput . 
+ Nucleic 
+ Acids Res 32: 1792–1797.
+ Feklístov , A. , Sharon , B.D. , Darst , S.A. , and Gross , C.A. ( 2014 ) Bacterial sigma factors : a historical , structural , 
+ Fitzgerald , D.M. , Bonocora , R.P. , and Wade , J.T. ( 2014 ) Comprehensive Mapping of the Escherichia coli 
+ Flagellar Regulatory Network. PLOS Genet 10: e1004649.
+ Galagan , J. , Lyubetskaya , A. , and Gomes , A. ( 2013 ) ChIP-Seq and the Complexity of Bacterial Transcriptional 
+ Regulation. Curr Top Microbiol Immunol 363: 43–68.
+ Galagan , J.E. , Minch , K. , Peterson , M. , Lyubetskaya , A. , Azizi , E. , Sweet , L. , et al. ( 2013 ) The Mycobacterium 
+ Gordienko , E.N. , Kazanov , M.D. , and Gelfand , M.S. ( 2013 ) Evolution of pan-genomes of Escherichia coli , 
+ Shigella spp., and Salmonella enterica. J Bacteriol 195: 2786–2792.
+ Grainger , D.C. ( 2016 ) The unexpected complexity of bacterial genomes . 
+ Microbiol Read Engl 162 : 1167 -- 1172 . 
+ Guo , M.S. , Updegrove , T.B. , Gogol , E.B. , Shabalina , S.A. , Gross , C.A. , and Storz , G. ( 2014 ) MicL , a new σE-dependent sRNA , combats envelope stress by repressing synthesis of Lpp , the major outer membrane 
+ Guzman , L.M. , Belin , D. , Carson , M.J. , and Beckwith , J. ( 1995 ) Tight regulation , modulation , and high-level 
+ Hartkoorn , R.C. , Sala , C. , Uplekar , S. , Busso , P. , Rougemont , J. , and Cole , S.T. ( 2012 ) Genome-wide definition in Escherichia coli by the cyclic AMP receptor protein requires an unusual promoter organization . 
+ Mol 
+ Microbiol 75: 1098–1111.
+ Ide , N. , Ikebe , T. , and Kutsukake , K. ( 1999 ) Reevaluation of the promoter structure of the class 3 flagellar 
+ Jarvik , T. , Smillie , C. , Groisman , E.A. , and Ochman , H. ( 2010 ) Short-term signatures of evolutionary change in 
+ Koo , B.-M. , Rhodius , V.A. , Campbell , E.A. , and Gross , C.A. ( 2009 ) Mutational analysis of Escherichia coli sigma28 and its target promoters reveals recognition of a composite -10 region , comprised of an `` extended -10 '' 
+ Koo , B.-M. , Rhodius , V.A. , Nonaka , G. , deHaseth , P.L. , and Gross , C.A. ( 2009 ) Reduced capacity of alternative sigmas to melt promoters ensures stringent promoter recognition . 
+ Genes Dev 23 : 2426 -- 2436 . 
+ Larson , M.H. , Mooney , R.A. , Peters , J.M. , Windgassen , T. , Nayak , D. , Gross , C.A. , et al. ( 2014 ) A pause sequence enriched at translation start sites drives transcription dynamics in vivo . 
+ Science 344 : 1042 -- 1047 . 
+ Li , J. , Overall , C.C. , Johnson , R.C. , Jones , M.B. , McDermott , J.E. , Heffron , F. , et al. ( 2015 ) ChIP-Seq Analysis of the σE Regulon of Salmonella enterica Serovar Typhimurium Reveals New Genes Implicated in Heat Shock 
+ Liu , X. , and Matsumura , P. ( 1996 ) Differential regulation of multiple overlapping promoters in flagellar class II 
+ Lozada-Chávez , I. , Janga , S.C. , and Collado-Vides , J. ( 2006 ) Bacterial regulatory networks are extremely 
+ McClure, R., Balasubramanian, D., Sun, Y., Bobrovskyy, M., Sumby, P., Genco, C.A., et al. (2013)
+ Computational analysis of bacterial RNA-Seq data. Nucleic Acids Res 41: e140.
+ Medina-Rivera , A. , Defrance , M. , Sand , O. , Herrmann , C. , Castro-Mondragon , J.A. , Delerce , J. , et al. ( 2015 ) 
+ RSAT 2015: Regulatory Sequence Analysis Tools. Nucleic Acids Res 43: W50-56.
+ Melamed , S. , Peer , A. , Faigenbaum-Romm , R. , Gatt , Y.E. , Reiss , N. , Bar , A. , et al. ( 2016 ) Global Mapping of 
+ Small RNA-Target Interactions in Bacteria. Mol Cell 63: 884–897.
+ Paget , M.S. ( 2015 ) Bacterial Sigma Factors and Anti-Sigma Factors : Structure , Function and Distribution . 
+ Biomolecules 5: 1245–1265.
+ Park , K. , Choi , S. , Ko , M. , and Park , C. ( 2001 ) Novel sigmaF-dependent genes of Escherichia coli found using 
+ Patenge , N. , Pappesch , R. , Khani , A. , and Kreikemeyer , B. ( 2015 ) Genome-wide analyses of small non-coding 
+ RNAs in streptococci. Front Genet 6: 189.
+ Perez , J.C. , and Groisman , E.A. ( 2009 ) Evolution of transcriptional regulatory circuits in bacteria . 
+ Cell 138 : 
+ 233–244.
+ Peters , J.M. , Mooney , R.A. , Grass , J.A. , Jessen , E.D. , Tran , F. , and Landick , R. ( 2012 ) Rho and NusG suppress 
+ Raghavan , R. , Sloan , D.B. , and Ochman , H. ( 2012 ) Antisense Transcription Is Pervasive but Rarely Conserved 
+ Reppas , N.B. , Wade , J.T. , Church , G. , and Struhl , K. ( 2006 ) The transition between transcriptional initiation 
+ Transcription Start Sites within Genes across a Bacterial Genus. mBio 5: e01398-14.
+ Sharma , C.M. , and Vogel , J. ( 2014 ) Differential RNA-seq : the approach behind and the biological insight 
+ Shimada , T. , Ishihama , A. , Busby , S.J. , and Grainger , D.C. ( 2008 ) The Escherichia coli RutR transcription 
+ Singh , S.S. , Singh , N. , Bonocora , R.P. , Fitzgerald , D.M. , Wade , J.T. , and Grainger , D.C. ( 2014 ) Widespread 
+ Stafford , G.P. , Ogi , T. , and Hughes , C. ( 2005 ) Binding and transcriptional activation of non-flagellar genes by 
+ Stergachis , A.B. , Haugen , E. , Shafer , A. , Fu , W. , Vernot , B. , Reynolds , A. , et al. ( 2013 ) Exonic transcription 
+ Stringer , A.M. , Currenti , S.A. , Bonocora , R.P. , Petrone , B.L. , Palumbo , M.J. , Reilly , A.E. , et al. ( 2014 ) 
+ Genome-Scale Analyses of Escherichia coli and Salmonella enterica AraC Reveal Non-Canonical Targets and 
+ Stringer , A.M. , Singh , N. , Yermakova , A. , Petrone , B.L. , Amarasinghe , J.J. , Reyes-Diaz , L. , et al. ( 2012 ) 
+ FRUIT , a scar-free system for targeted chromosomal mutagenesis , epitope tagging , and promoter replacement 
+ Thomason , M.K. , Bischler , T. , Eisenbart , S.K. , Förstner , K.U. , Zhang , A. , Herbig , A. , et al. ( 2015 ) Global transcriptional start site mapping using differential RNA sequencing reveals novel antisense RNAs in 
+ Toffano-Nioche , C. , Nguyen , A.N. , Kuchly , C. , Ott , A. , Gautheret , D. , Bouloc , P. , and Jacq , A. ( 2012 ) 
+ Transcriptomic profiling of the oyster pathogen Vibrio splendidus opens a window on the evolutionary 
+ Wade , J.T. , and Grainger , D.C. ( 2014 ) Pervasive transcription : illuminating the dark matter of bacterial 
+ Wade , J.T. , Reppas , N.B. , Church , G.M. , and Struhl , K. ( 2005 ) Genomic analysis of LexA binding reveals the permissive nature of the Escherichia coli genome and identifies unconventional target sites . 
+ Genes Dev 2619 -- 
+ 2630.
+ Wade , J.T. , Roa , D.C. , Grainger , D.C. , Hurd , D. , Busby , S.J.W. , Struhl , K. , and Nudler , E. ( 2006 ) Extensive 
+ Wang , S. , Fleming , R.T. , Westbrook , E.M. , Matsumura , P. , and McKay , D.B. ( 2006 ) Structure of the 
+ Escherichia coli FlhDC complex , a prokaryotic heteromeric regulator of transcription . 
+ J Mol Biol 355 : 798 -- 808 . 
+ Wunderlich , Z. , and Mirny , L.A. ( 2009 ) Different gene regulation strategies revealed by analysis of binding 
+ Xing , K. , and He , X. ( 2015 ) Reassessing the `` duon '' hypothesis of protein evolution . 
+ Mol Biol Evol 32 : 1056 -- 
+ Yu , H.H.Y. , and Tan , M. ( 2003 ) σ28 RNA polymerase regulates hctB , a late developmental gene in Chlamydia . 
+ ( speA ) ( ybhK ) ( yqjA ) ( holA ) ( otsA ) ( rmuC ) ( hslU ) ( uhpC ) ( ydcU ) ( yjiN ) - - ✓ ( serT ) hyaA - -- Intergenic ( between convergent genes ) tsr/yjiZ - ✓ ✓ insB-4 / cspH - -- 1 Genes associated with FliA binding sites . 
+ Genes in parentheses have an internal FliA binding site ; genes not in parentheses start < 300 bp downstream of a FliA binding site and are orientated in the same direction as the putative promoter . 
+ Asterisks indicate FliA binding sites previously reported to be associated with transcription of an mRNA ( Fitzgerald et al. , 2014 ) . 
+ 2 Check marks indicate a significant difference in β-galactosidase activity between + fliA and ΔfliA cells for the corresponding lacZ transcriptional fusion ( Figure 1 ) . 
+ 3 Check marks indicate association with a nearby TSS . 
+ 4 Check marks indicate a downstream : upstream ( relative to the putative TSS ) coverage ratio ≥ 2 . 
+ 5 Check marks indicate regulation of the corresponding gene ( s ) , as determined using RNA-seq ( Fitzgerald et al. , 2014 ) . 
+ flgM flgK trg ycgR cheM ( motB ) ( flhC ) motA fliA fliC fliD STM14_2852 ( pepB ) ( STM14_3340 ) TAAAGTTTATGCCTCAAGTGTCGATAAC ( 280 ) 1954 fljBA STM14_3817 STM14_3893 ( STM14_3895 ) TAAAGATAAATAGATTAGCGCCGAAATA aer 3504766 ( arcB ) 3559655 ( yhdA ) 3801092 yhjH 4531425 ( nrfB ) 4802894 tsr 1 Genome coordinate of the ChIP-seq peak center . 
+ Coordinates are relative to the 14028s chromosomal reference sequence ( NC_003198 .1 ) . 
+ 2 Fold Above Threshold ( FAT ) score , a measure of relative ChIP-seq enrichment . 
+ 3 Genome coordinate of the sequence motif identified using MEME . 
+ Coordinates are relative to the 14028s chromosomal reference sequence ( NC_003198 .1 ) . 
+ 4 Genomic strand of the sequence motif identified using MEME . 
+ 5 Sequence of the motif identified using MEME . 
+ 6 For intergenic FliA binding sites , the downstream gene is listed . 
+ Genes containing intragenic FliA binding sites are listed in parentheses . 
+ Underlining indicates that the putative promoter is in the antisense orientation relative to the overlapping gene . 
+ If a gene start is located within 300 bp of a putative intragenic FliA promoter , that gene name is listed as well . 
+ 7 Normalized expression values for the indicated genes , as determined by RNA-seq . 
+ 8 Asterisks indicate significant differential expression between wild-type and ΔfliA cells ( q < 0.01 ) . 
+ Schematic of transcriptional fusions of potential FliA promoters to the lacZ reporter gene . 
+ For all FliA binding sites identified in a previous study , transcriptional fusions to lacZ were constructed using positions -200 to +10 relative to the predicted TSS based on the previously identified FliA binding motif ( Fitzgerald et al. , 2014 ) . 
+ ( B ) β-galactosidase activity for transcriptional fusions for FliA binding sites in intergenic regions upstream of genes , for wild-type ( wt ; DMF122 ; green bars ) and ΔfliA ( DMF123 ; gray bars ) cells . 
+ Reporter fusions that showed significantly lower β-galactosidase activity in ΔfliA cells than wild-type cells ( t-test p < 0.05 ) are indicated . 
+ The genes downstream of the FliA binding sites are listed on the x-axis . 
+ ( C ) As above , but for FliA binding sites within genes or between convergently transcribed genes . 
+ Genes containing FliA binding sites are listed on the x-axis in parentheses . 
+ Genes not in parentheses are downstream of the corresponding FliA binding transcriptome datasets . 
+ ( A ) For each FliA binding site identified previously ( Fitzgerald et al. , 2014 ) , we determined the distance to each downstream TSS identified previously ( Thomason et al. , 2015 ) within a 500 bp range . 
+ The frequencies of these distances are plotted in 10 bp bins ( green line ) , with the inset showing the frequency of binding sites 10-30 bp upstream of TSSs with a bin size of 1 bp . 
+ The gray line shows the frequency of distances from FliA binding sites to a control , randomized TSS dataset ( see Methods ) . 
+ ( B ) 
+ Normalized sequence read coverage from published NET-seq data ( Larson et al. , 2014 ) ( see Methods ) for each previously identified FliA binding site ( Fitzgerald et al. , 2014 ) , plotted 100 bp upstream and downstream of the known/predicted TSS . 
+ Predicted TSSs are indicated by the dashed vertical line . 
+ Darker green indicates higher 
+ Heat-map depicting the match to the FliA consensus binding site for regions in the genomes of a range of bacterial species , where the region analyzed is homologous to a region surrounding a FliA binding site in E. coli . 
+ Genera are listed on the left . 
+ E. coli genes associated with the binding sites are listed across the top of the heat-map . 
+ FliA binding sites are grouped by location/orientation category , as indicated by category labels across the bottom of the heat-map . 
+ Genes containing FliA binding sites are listed in parentheses . 
+ Genes not in parentheses are downstream of the corresponding FliA binding site . 
+ The color scale indicating the strength of the sequence match is shown next to the heat-map . 
+ Empty squares in the heat-map indicate that the corresponding genomic region in E. coli is not sufficiently conserved in the species being analyzed . 
+ ( B ) 
+ Conservation of FliA sites across 9,432 E. coli strains . 
+ For each site from E. coli K-12 , conservation was determined at each position within the site for all strains of E. coli where the surrounding sequence is conserved . 
+ Thus , the fraction of genomes in which each base is conserved was calculated . 
+ Values plotted represent the average ( mean ) level of conservation for ( i ) 18 FliA sites that represent promoters for mRNAs 
+ ( filled circles ; Table 1 ) , and ( ii ) the remaining 34 FliA sites ( empty circles ) . 
+ The FliA binding motif is shown read coverage across the S. Typhimurium genome for a FliA ChIP-seq dataset . 
+ Annotated genes are indicated by gray bars . 
+ The green graph shows relative sequence read coverage , with `` spikes '' corresponding to sites of 
+ FliA association . 
+ ( B ) Pie-chart showing the distribution of identified FliA binding sites relative to genes . 
+ `` Inside '' = FliA binding within a gene . 
+ `` Upstream '' = FliA binding upstream of a gene . 
+ `` Inside + us '' = FliA binding within a gene but within 300 bp of a downstream gene start . 
+ ( C ) Enriched sequence motif associated with FliA binding sites identified by ChIP-seq . 
+ ( D ) Distribution of motifs relative to ChIP-seq peak centers for all FliA binding sites identified by ChIP-seq . 
+ Motifs are enriched in the region ~ 25 bp upstream of the peak normalized expression ( see Methods ) for each gene in S. Typhimurium for wild-type cells ( 14028s ; x-axis ) or 
+ ΔfliA cells ( DMF088 ; y-axis ) . 
+ Gray dots represent genes that are not associated with a FliA binding site and are not significantly differentially expressed between wild-type and ΔfliA cells . 
+ Black dots represent genes that are not associated with a FliA binding site and are significantly differentially expressed between wild-type and 
+ ΔfliA cells . 
+ Green circles represent genes that are associated with an upstream FliA binding site . 
+ Green triangles represent genes that are associated with an internal FliA binding site . 
+ Filled green circles/triangles indicate genes that are significantly differentially expressed between wild-type and ΔfliA cells . 
+ Empty green circles/triangles represent genes that are not differentially expressed between wild-type and ΔfliA cells . 
+ conservation of FlhC amino acid sequence between E. coli and 51 other γ-proteobacterial species . 
+ The graph indicates the level of identity across all species analyzed for each amino acid in FlhC ; data for Ala177 and 
+ Asp178 are highlighted in red . 
+ The nucleotide sequence of flhC in the motA promoter region is indicated , aligned with the previously reported FliA binding motif logo ( Fitzgerald et al. , 2014 ) . 
+ Codons 177 and 178 are shown in red . 
+ ( B ) Motility assay for ΔflhC : : thyA E. coli ( CDS105 ) containing either empty vector ( pBAD30 ) , or plasmid expressing wild-type FlhC ( pCDS043 ) or D178A mutant FlhC ( pCDS044 ) . 
+ Dashed red circles indicate the inoculation sites . 
+ Plates were incubated for 7 hours . 
+ The schematic to the left of the plate image shows how the strain was constructed . 
+ ( C ) Enriched sequence motif found in the flhC-motA intergenic regions of species in which FlhC Asp178 is not conserved . 
+ This motif is a close match to the known FliA binding site consensus . 
+ sequences between flhC and motA for selected proteobacterial species where Asp178 of FlhC is conserved/not conserved . 
+ Putative FliA promoters identified by MEME for species where Asp178 of FlhC is not conserved are
\ No newline at end of file
--- a/data/TEXT_FILES/useful_txt/29484588.txt 0 → 100644
View file @27818a9
+++ b/data/TEXT_FILES/useful_txt/29484588.txt 0 → 100644
View file @27818a9
+ Chapter 5
+ Bacterial small regulatory RNAs ( sRNAs ) are key actors in the fine-tuning of gene expression , ensuring rapid adaptation of bacteria to their ever-changing environment . 
+ sRNAs typically act at the posttranscriptional level by base-pairing to their messenger RNA ( mRNA ) targets in the 5 ′ untranslated region ( UTR ) [ 1 ] . 
+ Remarkably , limited complementarity between the sRNA and its targets not only allows the regulation of multiple mRNAs by a single sRNA but also the regulation of one mRNA by multiple sRNAs . 
+ This added complexity creates an extensive regulatory network where sRNAs act as bridges between various cellular metabolisms [ 2 ] . 
+ In the last decades , such networks were studied and specific sRNA targetomes were , in part , characterized . 
+ Since then , this field of study witnessed an explosion of technological advances [ 3 ] that exposed the versatility of sRNAs in terms of possible pairing sites and mechanisms of action . 
+ Indeed , these short regulators not only pair in the 5 ′ UTR of mRNAs , they can also target the coding sequence ( CDS ) [ 4 , 5 ] or could even pair in the 3 ′ UTR of targets . 
+ Moreover , sRNAs can regulate the translation of mRNA targets without directly affecting the stability of the transcript [ 6 ] . 
+ This new knowledge exposed a significant lack in efficacy of the classical techniques used to identify targets of sRNAs , reviving the challenge of sRNA target identification in bacterial cells . 
+ To tackle this issue , we developed and optimized a technique that combines RNA affinity purification and RNA sequencing ( RNAseq ) allowing genome-wide identification of sRNA -- mRNA interaction in bacterial cells . 
+ The assay is called MAPS : MS2 affinity purification coupled to RNAseq . 
+ Here , we describe the MAPS protocol in detail . 
+ Briefly , an sRNA is tagged with an MS2 RNA aptamer and expressed in vivo . 
+ Following cell lysis , tagged sRNAs are purified through affinity chromatography . 
+ Eluted RNAs are analyzed by high-throughput RNAseq and the ratio of enriched mRNAs in the tagged vs. untagged sRNA experiments is representative of the interaction between the two RNAs . 
+ Moreover , we describe the bioinformatic pipeline used to analyze MAPS data exploiting the Galaxy Project Platform . 
+ Ultrapure water should be used for every solution . 
+ Make sure to work with RNase-free material to avoid degradation of RNAs throughout the experiment . 
+ 3 Methods
+ 2 . 
+ Disposable Bio-Spin chromatography columns . 
+ 3 . 
+ Purified MS2-MBP fusion protein . 
+ MBP : maltose-binding protein [ 7 ] . 
+ 4 . 
+ Buffer A : 20 mM Tris -- HCl ( pH 8 ) , 150 m MgCl2 , 1 mM DTT , and 1 mM PMSF . 
+ 5 . 
+ Elution buffer : Buffer A supplemented with 15 mM maltose . 
+ 6 . 
+ Phenol-water , pH 6 : Melt phenol crystals at 65 °C and preheat an equal volume of ultrapure water to the same temperature . 
+ Carefully mix equal volume of liquid phenol and ultrapure water . 
+ Add 0.1 % w / ( phenol volume ) 8-hydroxyquinoline and mix carefully . 
+ Incubate 5 min at 65 °C . 
+ Aliquot in 50 mL conical tubes . 
+ Keep at 4 °C , protect from light . 
+ 7 . 
+ 25:24:1 ( v/v/v ) phenol-chloroform-isoamyl alcohol . 
+ 8 . 
+ Glycogen . 
+ 9 . 
+ 95 % v/v ethanol . 
+ 10 . 
+ 75 % v/v ethanol . 
+ All steps should be performed on ice . 
+ All buffers should be at 4 °C . 
+ 1 . 
+ Let the cell pellets thaw on ice for 30 min . 
+ 2 . 
+ Resuspend the pellet in 2 mL of Buffer A ( see Notes 6 and 7 ) . 
+ 3 . 
+ Chill the French Press cell by burying it in ice before performing the lysis . 
+ 4 . 
+ Break the bacterial cells using a French Press at 430 psi , 3 times per sample . 
+ Keep samples on ice at all times ( see Note 8 ) . 
+ 5 . 
+ Clear the lysates by centrifugation at 16,000 × g at 4 °C , 30 min . 
+ 6 . 
+ Transfer the soluble fraction ( lysate ) to clean tubes . 
+ Keep on ice . 
+ All steps should be performed on ice . 
+ All buffers should be at 4 °C . 
+ 1 . 
+ Add 75 μL amylose resin to a Bio-Spin disposable chromatography column . 
+ 2 . 
+ Equilibrate the column three times with 1 mL of Buffer A ( see Note 9 ) . 
+ 3 . 
+ Use the provided stopper to seal the column . 
+ Dilute 100 pmol of MS2-MBP coat protein in 1 mL Buffer A. Apply the protein solution to the sealed column and let stand for 5 min . 
+ 4 . 
+ Remove the stopper and let the column drain . 
+ 5 . 
+ Wash the column twice with 1 mL of Buffer A . 
+ 6 . 
+ Load the bacterial lysate onto the column , 1 mL at a time and let the column drain . 
+ 7 . 
+ Wash the column 5 times with 1 mL of Buffer A ( see Note 10 ) . 
+ 8 . 
+ Insert the column in a clean RNase-free collecting tube . 
+ Elute with 1 mL of Elution Buffer . 
+ 9 . 
+ Split the column output into two 1.5 mL microtubes . 
+ 10 . 
+ Add 1 volume of phenol-chloroform-isoamyl alcohol to each tube and mix . 
+ Centrifuge at 16,000 × g at room temperature , 10 min . 
+ 11 . 
+ Transfer the aqueous phase into clean microtubes containing 20 mg of glycogen ( see Note 11 ) . 
+ 12 . 
+ Add two volumes of 95 % EtOH . 
+ Mix thoroughly and precipitate overnight at − 80 °C . 
+ 13 . 
+ Centrifuge the samples at 16,000 × g at 4 °C , 30 min . 
+ 14 . 
+ Remove the supernatant very carefully and add 500 μL of ice-cold 75 % EtOH to the pellets . 
+ Centrifuge the samples at 16,000 × g at 4 °C , 5 min ( see Note 12 ) 
+ 15 . 
+ Remove the supernatant . 
+ Let the RNA pellets dry completely . 
+ 16 . 
+ Resuspend the pellets in 86 μL of ultrapure H2O . 
+ Proceed to the next step ( see Note 13 ) . 
+ 1 . 
+ Add 10 μL of 10 × TURBO ™ DNase Buffer and 4 μL of TURBO ™ DNase to each sample . 
+ 2 . 
+ Incubate at 37 °C , 30 min . 
+ 3 . 
+ Add 100 μL of phenol-chloroform-isoamyl alcohol to each tube and mix . 
+ Centrifuge at 16,000 × g at room temperature , 10 min . 
+ 4 . 
+ Add 2.5 volumes of 95 % EtOH . 
+ Mix thoroughly and precipitate overnight at − 80 °C . 
+ 5 . 
+ Centrifuge the samples at 16,000 × g at 4 °C , 30 min . 
+ 6 . 
+ Remove the supernatant carefully . 
+ Let the RNA pellets dry completely . 
+ 7 . 
+ Resuspend the dried pellets in 6 μL of ultrapure H2O . 
+ 8 . 
+ Quantify and verify the quality of the RNA using Agilent Nano Chip in a Bioanalyzer 2100 . 
+ 9 . 
+ Prepare the cDNA libraries with the ScriptSeq ™ v2 RNA-Seq Library Preparation Kit from Illumina . 
+ 10 . 
+ Sequence the libraries in both directions using Illumina MiSeq . 
+ Bioinformatics tools used are freely available on the Galaxy Platform [ 8 ] ( https://usegalaxy.org/ ) . 
+ The following procedure allows the alignment and visualization of the RNA sequencing reads on the genome of interest . 
+ Note that the procedure has to be performed independently for the experimental data set ( MS2-sRNA ) and the control data set ( sRNA ) . 
+ Here , the procedure will be detailed for one experimental data set with paired-end sequencing . 
+ Refer to Fig. 1a for visual workflow and to Table 1 for Galaxy Project tool details . 
+ name = `` NAME_EXPERIMENT '' description = `` BedGraph format '' visibility = full ( see Note 15 ) . 
+ 8 . 
+ Navigate to the UCSC Microbial Genome Browser [ 13 ] ( http://microbes.ucsc.edu/ ) . 
+ 9 . 
+ Enter the name of the reference genome that you need . 
+ Here , we used Escherichia coli K12 . 
+ 10 . 
+ Click on the Manage custom tracks button and add your custom tracks ( see Note 16 ) . 
+ 11 . 
+ Click on View in Genome Browser . 
+ You can now search for a gene name or genomic position to visually compare read alignment on the genome ( Fig. 2 ) ( see Note 17 ) 
+ The following procedure is performed to assign the RNAseq reads to gene names and to compare the experimental data set ( MS2-sRNA ) to the control ( sRNA ) data set . 
+ To perform this step , three files are required and need to be processed . 
+ Steps 1 and 2 are performed on the SAM files from step 4 in Subheading 3.5.1 for the MS2-sRNA experimental condition and for the sRNA control condition . 
+ Steps 4 and 5 are performed on the file acquired in step 3 of this section . 
+ Refer to Fig. 1b for visual workflow and to Table 1 for Galaxy Project tool details . 
+ 1 . 
+ Run the Convert SAM to interval tool with default parameters on each SAM file . 
+ 2 . 
+ On the converted files , run the Remove beginning of a file tool . 
+ These data sets are ready to use . 
+ 3 . 
+ Through the NCBI database , download the Gene Data Bank of the bacterial strain corresponding to your experimental strain . 
+ This is a .txt file . 
+ 4 . 
+ Upload files : Gene Data Bank file . 
+ Set the file type as Interval 
+ 5 . 
+ Run Compute an expression on every row with the Add Interval parameter c3 - c2 . 
+ 6 . 
+ On the output from step 5 , go to edit attributes and set the Database/Build as the reference genome used in step 4 of Subheading 3.5.1 . 
+ This data file is ready to use . 
+ 7 . 
+ Run Join tool . 
+ In the parameters , join your sample data set ( MS2-sRNA or sRNA ; step 2 in Subheading 3.5.2 ) with the Gene Data Bank data set ( see step 6 ) . 
+ Use default parameters . 
+ 8 . 
+ Run the Group tool for each of the output files obtained at step 7 . 
+ Select the following parameters : ( a ) Group by column : column 23 . 
+ ( b ) Ignore case by grouping : no . 
+ ( c ) Ignore lines beginning with these characters : select all characters except for the dot ( . ) . 
+ ( d ) Operation : insert Operation 1 , Count Distinct on column 5 . 
+ ( e ) Operation : insert Operation 2 , Mean on column 24 . 
+ 9 . 
+ Run Join two Datasets side by side on a specified field tool with outputs from step 8 . 
+ Join the MS2-sRNA sample using column 1 with the sRNA control sample using column 1 . 
+ Set the following parameters : ( a ) Keep lines of first input that do not join with second input : Yes . 
+ ( b ) Keep lines of first input that are incomplete : No . 
+ ( c ) Fill Empty columns : yes . 
+ ( d ) Fill column by : Single fill value . 
+ ( e ) Fill value : 1 . 
+ 10 . 
+ Download the output file . 
+ Open the file with Microsoft Excel ( or any similar software ) . 
+ The first 3 columns represent the gene name , the number of reads and the gene length for the MS2-sRNA experiment . 
+ The last three columns represent the same for the sRNA control . 
+ 11 . 
+ Relativize the number of reads ( see Note 18 ) : ( a ) From the Illumina Dashboard , note the total number of reads and the total number of reads mapped for each sample . 
+ ( b ) For the MS2-sRNA , calculate the relativized number of reads : Reads / ( ( total number of reads ) X ( the total number of reads mapped for `` MS2-sRNA '' ) ) 
+ ( c ) For the sRNA control , calculate the relativized number of reads : Reads / ( ( total number of reads ) X ( the total number of reads mapped for `` sRNA '' ) ) 
+ 12 . 
+ Calculate the enrichment ratio for each gene between the MS2-sRNA experiment and the sRNA control ( Fig. 3 ) 
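+ As a minimal sketch only , steps 9 to 12 can also be reproduced outside a spreadsheet . 
+ The Python below assumes per-gene read counts are already available as dictionaries ( as produced by step 8 ) ; the function names , gene names and all numbers are hypothetical and are not part of the protocol . 
+ The relativization follows the formula exactly as printed in step 11 , and genes missing from the control receive the fill value 1 from step 9 . 

```python
# Illustrative sketch of steps 9-12; every name and number here is hypothetical.

FILL_VALUE = 1  # step 9(e): value used when a gene has no entry in the control


def join_counts(ms2_counts, srna_counts, fill=FILL_VALUE):
    """Left-join per-gene read counts, keeping every gene of the MS2-sRNA sample."""
    return {
        gene: (reads, srna_counts.get(gene, fill))
        for gene, reads in ms2_counts.items()
    }


def relativize(reads, total_reads, mapped_reads):
    """Step 11 as printed: reads / (total reads x total reads mapped)."""
    return reads / (total_reads * mapped_reads)


def enrichment_ratios(ms2_counts, srna_counts, ms2_totals, srna_totals):
    """Step 12: relativized MS2-sRNA reads divided by relativized control reads."""
    ratios = {}
    joined = join_counts(ms2_counts, srna_counts)
    for gene, (ms2_reads, ctrl_reads) in joined.items():
        ms2_rel = relativize(ms2_reads, *ms2_totals)
        ctrl_rel = relativize(ctrl_reads, *srna_totals)
        ratios[gene] = ms2_rel / ctrl_rel
    return ratios


# Made-up example values, not data from the protocol.
ms2 = {"geneA": 400, "geneB": 20, "geneC": 50}
ctrl = {"geneA": 10, "geneB": 20}        # geneC is absent and is filled with 1
ms2_totals = (1_000_000, 800_000)        # (total reads, total reads mapped)
ctrl_totals = (1_000_000, 800_000)

ratios = enrichment_ratios(ms2, ctrl, ms2_totals, ctrl_totals)
# List genes from most to least enriched.
for gene, ratio in sorted(ratios.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{gene}\t{ratio:.1f}")
```

+ Sorting by ratio places the most enriched candidate targets first ; the fill value of 1 keeps genes detected only in the MS2-sRNA sample in the table while avoiding a division by zero . 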
+ 4 Notes
+ It is strongly recommended to perform in vivo validation of putative targets identified by MAPS . 
+ Various methods are useful to perform such validation . 
+ Northern blotting will be effective in assessing the sRNA-dependent modulation of the target at the RNA level [ 14 ] . 
+ Translational regulation can be determined using translational reporter-gene fusions . 
+ These procedures will not be detailed here as they are beyond the scope of this protocol . 
+ input sample will be useful at a later step ( refer to Note 11 ) . 
+ Various methods can be used for RNA extraction . 
+ We suggest the hot-phenol RNA extraction [ 16 ] . 
+ 6 . 
+ Do not resuspend all the pellets at once . 
+ Follow steps 2 -- 4 for each sample individually . 
+ Keep all samples on ice at all times . 
+ 7 . 
+ Depending on your experimental conditions ( for example if the cells were harvested at high OD600nm ) , the pellets can be resuspended in 3 mL . 
+ 8 . 
+ The number of passages on the French Press can vary according to your experimental conditions . 
+ For example , if cells were harvested at high OD600nm , break the cells 4 times per sample . 
+ 9 . 
+ Use a clean 10 mL-syringe to push the first few drops out of the column . 
+ Then , let the elution carry on by gravity only . 
+ 10 . 
+ The number of washes is an important parameter and should be optimized for each experiment . 
+ 11 . 
+ Addition of glycogen is very important , if not essential , to be able to recover the small RNA pellets and avoid their loss after precipitation . 
+ Glycogen must be in contact with the RNAs ( the aqueous phase ) before the addition of ethanol . 
+ 12 . 
+ Be very careful as the RNA pellets do n't always stick to the bottom of the tube . 
+ Remove ethanol using a micropipette . 
+ Avoid using a vacuum system . 
+ 13 . 
+ At this step , it is possible and highly recommended to test your samples by Northern blot . 
+ Compare the input samples ( see Note 5 ) with the output samples . 
+ 14 . 
+ This parameter can be adjusted . 
+ If using the FastQ Quality Trimmer with the threshold at a score of 20 causes the loss of too many sequences , the threshold can be lowered . 
+ If the overall quality of sequences is above 20 , we do not run the FastQ Quality Trimmer . 
+ 15 . 
+ This header is essential for the next step . 
+ It informs the UCSC Microbial Genome Browser on the type of data contained in the file and allows you to name your experiment . 
+ 16 . 
+ Ideally , add both the control track ( sRNA MAPS ) and the experimental track ( MS2-sRNA MAPS ) to compare the reads aligned in each condition . 
+ 17 . 
+ Enrichment at a specific location on a gene of interest does n't always represent the exact pairing site of the sRNA . 
+ Additional experimental data is required to validate the pairing site localization . 
+ 18 . 
+ Normally , reads must be relativized taking into consideration the size of the genes they mapped to . 
+ However , since we calculate the enrichment ratio of reads between the two conditions , gene size is irrelevant in our analysis . 
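+ The reasoning in Note 18 can be written out as a short worked equation . 
+ The symbols are assumptions introduced here for illustration : R_g denotes the relativized number of reads for gene g in a given condition and \ell_g the length of that gene . 
+ Because the same gene length divides both conditions , it cancels in the ratio : 

```latex
\[
\frac{R_g^{\text{MS2-sRNA}} / \ell_g}{R_g^{\text{sRNA}} / \ell_g}
  = \frac{R_g^{\text{MS2-sRNA}}}{R_g^{\text{sRNA}}}
\]
```

+ This cancellation holds only for a per-gene ratio between the two conditions ; comparing read counts between different genes would still require a length correction . 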
\ No newline at end of file