Scientific report: Seed-based systematic discovery of specific transcription factor target genes


Seed-based systematic discovery of specific transcription factor target genes

Ralf Mrowka1,2,3, Nils Blüthgen4 and Michael Fähling1,3

1 Paul-Ehrlich-Zentrum für Experimentelle Medizin, Berlin, Germany
2 AG Systems Biology – Computational Physiology, Berlin, Germany
3 Johannes-Müller-Institut für Physiologie, Charité-Universitätsmedizin Berlin, Germany
4 School of Chemical Engineering and Analytical Sciences, Manchester Interdisciplinary Biocentre, University of Manchester, UK

Keywords: feedback; glaucoma; NF-κB; optineurin; transcription factor target prediction

Correspondence: R. Mrowka, Paul-Ehrlich-Zentrum für Experimentelle Medizin, AG Systems Biology – Computational Physiology, Tucholskystr. 2, D-10117 Berlin, Germany. Fax: +49 30 450528972. Tel: +49 30 450528218. E-mail: ralf.mrowka@charite.de

(Received 26 February 2008, revised 1 April 2008, accepted 16 April 2008)

doi:10.1111/j.1742-4658.2008.06471.x

Reliable prediction of specific transcription factor target genes is a major challenge in systems biology and functional genomics. Current sequence-based methods yield many false predictions, because the DNA-binding motifs are short and degenerate. Here, we describe a new systematic genome-wide approach, the seed-distribution-distance method, that searches large-scale genome-wide expression data for genes whose expression is similar to that of known targets. This method is used to identify genes that are likely targets, allowing sequence-based methods to focus on a subset of genes, giving rise to fewer false-positive predictions. We show by cross-validation that this method is robust in recovering specific target genes. Furthermore, this method identifies genes with typical functions and binding motifs of the seed. The method is illustrated by predicting novel targets of the transcription factor nuclear factor kappaB (NF-κB). Among the new targets is optineurin, which plays a key role in the pathogenesis of acquired blindness caused by adult-onset primary open-angle glaucoma.
We show experimentally that the optineurin gene and other predicted genes are targets of NF-κB. Thus, our data provide a missing link in the signalling of NF-κB and the damping function of optineurin in signalling feedback of NF-κB. We present a robust and reliable method to enhance the genome-wide prediction of specific transcription factor target genes that exploits the vast amount of expression information available in public databases today.

Abbreviations: CASP4, caspase 4; ChIP, chromatin immunoprecipitation; GAPDH, glyceraldehyde-3-phosphate dehydrogenase; HEK, human embryonic kidney; HIF-1, hypoxia-inducible factor 1; HNF4, hepatocyte nuclear factor 4; IKK, IκB kinase; NEMO, nuclear factor kappaB essential modulator; NF-κB, nuclear factor kappaB; OPTN, optineurin; RGA, reporter gene analysis; STAT5A, signal transducer and activator of transcription 5A; TNF-α, tumor necrosis factor-α.

FEBS Journal 275 (2008) 3178–3192 © 2008 The Authors Journal compilation © 2008 FEBS

The prediction and analysis of the regulatory networks underlying gene expression is a central challenge in systems biology and functional genomics [1,2]. Regulation of transcription is the initial mechanism for controlling the expression of genes. Key regulators of transcription are transcription factors, which bind to DNA motifs in noncoding regions that control gene transcription. Therefore, the identification of transcription factor target genes is one major element in the understanding and reconstruction of the regulatory network. Although many DNA motifs for transcription factor binding are known and are contained as consensus sequences and binding matrices in databases such as TRANSFAC [3] and JASPAR [4], their direct use for genome-wide matching in promoter sequences of higher organisms is greatly limited [5]. Current methods that use sequence data give results that are dominated by false predictions [5]. The issue of a high proportion of false positives in pure sequence-based methods has been known for a long time [6], and also applies to the transcription factors analysed in this study. The major problem is the short length and high degeneracy of the DNA-binding motifs, which give rise to one predicted binding site per 1000–10 000 bp by sheer chance. Therefore, other resources, such as phylogenetic footprinting, have been explored to further restrict and 'purify' potential targets to more likely candidates [7,8]. Such methods decrease the number of false predictions by about one order of magnitude, which is still not good enough for genome-wide predictions.

Because the potential list of targets is too large, further information needs to be exploited to concentrate the analysis on the genes that have a higher probability of being true target genes. Gene ontology, as a controlled and computer-readable way to annotate genes, has been used extensively to characterize clusters of genes from microarray data [9,10] and also to validate microarray data [11]. Despite the enormous number of false-positive predictions for transcription factor targets with current methods, significant correlations with gene ontology terms have been found that can be used to enhance prediction quality [12,13]. In addition, statistical methods have been developed to associate genes with disease [14], and seed-based computational procedures have been applied to identify brain cancer-related genes [15].

Currently, experience and knowledge of pathways and an educated literature search may help us to focus on possible candidates. The inclusion of information from expression experiments conducted under different experimental conditions may hint at potential candidates for further evaluation, as these data reflect the relevant biological function of transcription factors, which directly influence mRNA concentrations in the cell.
Well-designed, small-scale expression profile experiments have been successfully used to identify transcription factors involved in certain pathways [16,17]. Especially when applied to time-series data, seed-based clustering methods have been very successful in identifying novel targets by comparing expression kinetics with known targets for p53 and for picking up genes regulated in different cell-cycle phases [18,19]. However, these approaches require dedicated microarray experiments. We addressed the question as to whether it is feasible to exploit the large body of expression information that is already stored in public databases. These datasets might contain information about expression at different time points for different cell lines that might be only marginally related to the transcription factor under investigation, and we wondered whether these datasets would allow us to extract the relevant information about the action of transcription factors on their targets.

In recent years, several microarray techniques have been developed to measure mRNA concentration on a genome-wide scale [20]. In addition, efforts have been made to store individual microarray experiments in databases. Microarray expression data have recently been used to improve transcription factor target prediction [21]. In this work, we developed a method to exploit a dataset of approximately 1200 microarray experiments in conjunction with a seed group of known transcription factor target genes and show that the information available in the databases is sufficient to increase the accuracy of prediction drastically. We elucidate and exemplify our seed-distribution-distance method by predicting novel nuclear factor kappaB (NF-κB) targets. NF-κB is involved in pathways important for both physiological processes and disease conditions.
It plays an important role in the control of immune function, differentiation, inflammation, stress response, apoptosis, cell survival, processes of development, and progression of cancers [22]. Thus, NF-κB has become one of the most widely studied transcription factors. Five NF-κB genes (NFKB1, NFKB2, RELA, c-REL and RELB) belong to the NF-κB gene family, and the resulting proteins are able to form homodimers or heterodimers [23]. Prior to activation, NF-κB is localized in the cytoplasm and is tightly associated with its inhibitors (IκB proteins) and p100 proteins. Multiple stimuli, such as tumor necrosis factor-α (TNF-α), UV radiation and free radicals, activate NF-κB signalling through activation of IκB kinases (IKKs), which phosphorylate IκBs and p100 proteins, subsequently leading to their polyubiquitination and degradation [24].

Results

The seed-distribution-distance method

We defined a 'seed' group by collecting the known NF-κB targets mentioned in an NF-κB review paper [25] that match Ensembl entries, which yielded 91 genes. Of these, 81 were also present in the microarray set and were used as the seed. We obtained the large-scale microarray expression data [26] (detailed description of data in supplementary Doc S1) from the Stanford microarray database [27]. The set contains genome-wide data from 1202 hybridization experiments from human tissues and cell lines. Subsequently, we ranked each gene x according to its similarity L(x) of expression to the seed group (detailed results given in supplementary Doc S2). We defined the similarity L(x) for a gene x by taking the median correlation of gene x to the seed and subtracting its median correlation to all genes (typical distributions of correlations of genes to the seed group are shown in supplementary Fig. S1).
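The verbal definition can be written compactly as follows. The symbols S (the seed set), G (the set of all genes in the dataset) and r(x, g) (the correlation between the expression profiles of genes x and g across the hybridization experiments) are notation introduced here for illustration; the text above does not fix them.

```latex
L(x) \;=\; \operatorname*{median}_{s \in S} \, r(x, s) \;-\; \operatorname*{median}_{g \in G} \, r(x, g)
```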
Thus, if L(x) showed high values, the particular gene was regulated similarly to the seed gene group. In contrast, a low absolute value of L(x) indicated that the median of the gene's correlations to the seed was close to the median of its correlations to a randomly selected group of genes. Using the similarity measure L, we then sorted all remaining human genes and thereby obtained a ranking of the genes according to their similarity to the seed group. To avoid a circular argument, we would like to stress that the seed group was excluded from all statistical analyses and characterizations of the rank. A schematic representation of this procedure is given in Fig. 1.

The essence of the method is that if a gene's correlation to the genes in the seed set (represented by the median) is larger than the median of its correlation to all genes, then it is more likely to be related to the seed set and hence more likely to be a target of the transcription factor. This method requires that at least the initial seed set of true targets is known, and that other targets are correlated to several genes in the seed set. Furthermore, the method is based on the assumption that there is a relationship between gene coexpression and gene coregulation. The ranking can also be done by other scores than the median correlation.

Fig. 1. Schematic diagram of the workflow in this study. Expression profiles of a gene g are compared to the expression profiles of the seed genes and randomly selected genes. A distance score L(x) is calculated that quantifies specific expression similarity to the seed. The genes are then ranked on the basis of L(x), searched for putative binding sites in their promoter region, and subjected to a reporter gene assay.
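The ranking step described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: it assumes Pearson correlation (the text only says 'correlation'), a complete expression matrix without missing values, and the hypothetical function names seed_distance_scores and rank_non_seed_genes.

```python
import numpy as np

def seed_distance_scores(expr, seed_idx):
    """Return L(x) for every gene (row) of expr.

    expr: 2-D array, genes x experiments (assumed to have no missing values).
    seed_idx: row indices of the seed genes.
    """
    corr = np.corrcoef(expr)            # gene-by-gene Pearson correlation (assumption)
    np.fill_diagonal(corr, np.nan)      # ignore each gene's correlation with itself
    seed_idx = np.asarray(seed_idx)
    med_seed = np.nanmedian(corr[:, seed_idx], axis=1)  # median correlation to the seed
    med_all = np.nanmedian(corr, axis=1)                # median correlation to all genes
    return med_seed - med_all

def rank_non_seed_genes(expr, seed_idx):
    """Gene indices sorted by decreasing L(x); the seed itself is excluded."""
    scores = seed_distance_scores(expr, seed_idx)
    seed = {int(i) for i in seed_idx}
    candidates = [g for g in range(expr.shape[0]) if g not in seed]
    return sorted(candidates, key=lambda g: scores[g], reverse=True)
```

On a matrix in which a few non-seed genes share a common signal with the seed, those genes should appear at the top of the returned list, while unrelated genes scatter around L(x) = 0.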
As one alternative to the median-based score, we ranked the genes using a one-sided P-value derived from a computationally more expensive Mann–Whitney rank-sum test, and found performance similar to that obtained with L(x) (see supplementary Fig. S3).

Top members in the rank show typical NF-κB functions

We next analysed the top members of the obtained rank with regard to their gene ontology classification. For the top 600 genes, we examined whether any gene ontology classification is significantly enriched using rigorous statistics [12]. It turns out that the list of significant gene functions of the top 600 genes, as shown in supplementary Table S1, is congruent with the functions of NF-κB described in the literature. We further analysed the occurrences of typical NF-κB functions within the rank. We found that there was a steep increase in the density of genes involved in 'immune response', starting at approximately rank 700 when moving from lowest to highest ranks. The probability of a gene being involved in the immune response is therefore greatly increased for the top members in the rank, as seen in Fig. 2.

High density of putative NF-κB DNA-binding sites in promoters in the top group of the rank

As the overrepresentation of typical NF-κB-related biological functions might be due to coexpression mediated by different transcription factors, we decided to analyse the sequences of putative promoter regions of the high-ranking genes. We predicted binding sites for all vertebrate transcription factors contained in the TRANSFAC database in the 500 bp putative promoter region of all genes in the ranking. We derived the 500 bp sequences upstream of the transcriptional start site from the Ensembl database. We chose to limit our search to 500 bp, because we and others observed earlier that the majority of promoter sequences fall within this region [12,28].
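Scanning a 500 bp upstream sequence for a degenerate consensus motif can be sketched as below. The sketch assumes that consensus motifs are written in IUPAC nucleotide codes and that both strands are searched; neither detail is stated in the text, and the function names are hypothetical. The example motif GGGRNNYYCC is a commonly quoted κB-like consensus used only for illustration, not necessarily the TRANSFAC entry used in the study.

```python
import re

# IUPAC nucleotide codes expanded to regular-expression character classes
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")

def reverse_complement(consensus):
    """Reverse complement of an IUPAC consensus string."""
    return consensus.translate(COMPLEMENT)[::-1]

def consensus_to_regex(consensus):
    """Compile a consensus into a lookahead regex so overlapping sites are found."""
    body = "".join(IUPAC[c] for c in consensus.upper())
    return re.compile("(?=(" + body + "))")

def find_sites(promoter, consensus):
    """Sorted 0-based start positions of consensus matches on either strand."""
    promoter = promoter.upper()
    hits = set()
    for motif in (consensus.upper(), reverse_complement(consensus.upper())):
        hits.update(m.start() for m in consensus_to_regex(motif).finditer(promoter))
    return sorted(hits)
```

A gene would then count as carrying a putative site if find_sites returns a non-empty list for its upstream sequence. Because no score threshold is involved, the only choices are the motif itself and the length of the searched region.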
To illustrate our method, we chose to search for consensus sequences from the TRANSFAC database in the putative promoter regions because, unlike more sophisticated weight-matrix methods, which typically require a cut-off score, this approach needs no additional parameter (see also supplementary Table S5). We analysed the distribution of occurrence of all predicted factor-binding sites in the promoters of genes along the rank. For each predicted binding motif, we calculated the ratio of the number of occurrences in the upper 5% of the rank divided by the expected occurrence in the top 5% (given by 0.05 times the total number of occurrences). A list of the motifs sorted by this ratio has NF-κB-binding motifs in the top ranks, namely NFKAPPAB65 (P = 0.0028) and NFKAPPAB50 (P = 0.0239) (P-values from the binomial test; see Experimental procedures). In addition, this list includes motifs of the transcription factors BACH2 (P = 0.0025), signal transducer and activator of transcription 5A (STAT5A) (P = 0.0036), and VBP (P = 0.0106), which are enriched on average in the top group. A graphical representation is given in Fig. 3 (see also supplementary Table S4).

Fig. 2. Density of occurrences of genes annotated with the term 'immune response' in the ranking after applying the seed-distribution-distance method (axes: position of gene in the ranking versus density of occurrence). Immune response genes are highly enriched in the top members of the rank (P < 0.0001, two-sided Mann–Whitney rank-sum test). Red, individual occurrences of immune response genes; black line, density of genes that are annotated with the term. Inset: density for all genes in the rank.

Robustness of seed-distribution-distance method

The original seed group contained 81 known NF-κB targets (supplementary Table S2). As fewer targets are known for most transcription factors, we investigated whether the seed-distribution-distance method might also give reliable results if the seed was substantially smaller.
We applied a cross-validation strategy by randomly dividing the original 81 targets into two groups, one group being the seed, and the remaining genes constituting the other group, named the test group, t. Several sizes of the seed were used (1, 10, 20 and 50 are shown in Fig. 4; cumulative representations of the distributions are provided in supplementary Fig. S2). After rank construction using the reduced seed, the test group was then analysed regarding its position in the rank. This procedure was repeated 100 times. It turned out that the test group members were strongly present in the top positions of the rank, and this was preserved even if a considerable part of the original targets was not used for the seed.

Fig. 3. Distribution of enrichment of putative transcription factor-binding motifs in the ranking after applying the seed-distribution-distance method. The seed-distribution-distance method enriches genes with putative NF-κB-binding sites in the respective promoter. The top gene group of the seed rank was analysed regarding transcription factor-binding motif enrichment within the −500 bp promoter region. The binding motifs for NF-κB 50 and NF-κB 65 are among the transcription factor-binding sites that are most strongly enriched. Note that the initial seed group was not contained in this analysis. Motifs labelled in the plot: NF-κB 65, NF-κB 50, STAT5a, VBP1 and BACH2, shown against the binding sites for 234 other vertebrate transcription factors; enriched and depleted sites are marked at P < 0.025.
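The recovery test described above can be sketched as follows. This is an illustration under stated assumptions rather than the procedure actually used: Pearson correlation is assumed, the function names are hypothetical, and positions are counted among non-seed genes with 0 denoting the top of the rank.

```python
import numpy as np

def seed_scores(corr, seed_idx):
    """L(x) from a precomputed correlation matrix with a NaN diagonal."""
    return np.nanmedian(corr[:, seed_idx], axis=1) - np.nanmedian(corr, axis=1)

def recovery_positions(expr, targets, seed_size, repeats=100, rng=None):
    """Rank positions (0 = top) of held-out known targets over random splits.

    expr: genes x experiments matrix; targets: row indices of known targets.
    In each repeat, seed_size targets form the seed and the rest the test group.
    """
    rng = np.random.default_rng() if rng is None else rng
    corr = np.corrcoef(expr)             # Pearson correlation (assumption)
    np.fill_diagonal(corr, np.nan)
    targets = np.asarray(targets)
    positions = []
    for _ in range(repeats):
        shuffled = rng.permutation(targets)
        seed, test = shuffled[:seed_size], shuffled[seed_size:]
        scores = seed_scores(corr, seed)
        scores[seed] = -np.inf           # push seed genes to the bottom (excluded)
        order = np.argsort(-scores)      # gene indices by decreasing score
        rank_of = np.empty(len(order), dtype=int)
        rank_of[order] = np.arange(len(order))
        positions.extend(rank_of[test].tolist())
    return positions
```

A histogram of the returned positions for different values of seed_size corresponds to the kind of comparison shown in Fig. 4.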
Even if one used, for example, only 10 of 81 members of the seed, the remaining 71 genes in the test group were highly enriched in the top ranks, as shown in Fig. 4.

Moreover, we addressed the question of whether the seed-distribution-distance method is also effective in enriching targets for other transcription factors. We chose E2F [29,30], ETS1 [31,32], hypoxia-inducible factor 1 (HIF-1) [33], hepatocyte nuclear factor 4 (HNF4), and c-Myc [34], and collected seed groups for these factors (supplementary Tables S2 and S3). We applied our method to these seed groups in a jackknife manner (i.e. we iteratively left one seed member out and determined its position in the rank). For all of these additional transcription factors, the seed members left out were strongly enriched in the top of the rank (Fig. 5). Moreover, the top members of the rank were strongly enriched with typical gene ontology terms of the factors for E2F and HNF4.

Fig. 4. Recovery of target genes in a cross-validation test (axes: recovered position in gradient versus relative occurrence). The original seed was divided into two parts: (a) a group of members for rank construction; and (b) a test group with the remaining members of the original seed. Histograms of the recovery position of the test group are shown for the newly constructed ranks using the seed without the test group. If, for example, 10 genes are used as a seed (71 in the test group), the relative occurrence of the recovered positions is still very high, i.e. the enrichment capability of the seed-distribution-distance method is still highly preserved. For comparison, the relative occurrence of members of the original seed in the corresponding rank is given. The error bars indicate the 5th and 95th percentiles of the distribution. Corresponding cumulative histograms are given in supplementary Fig. S2.
For ETS1, HIF-1 and c-Myc, this ontology enrichment is not as clear as for the other three tested factors. One reason could be the considerably lower number of gene ontology annotated genes for the specific terms and, in the case of c-Myc, the broad-spectrum ontologies [34]. The results of this jackknife procedure also provide an estimate of how many of the true positives will lie in the upper 5%: about 18–39% of all targets would be in the upper 5% of genes of the rank (26% for NF-κB, 39% for E2F, 29% for ETS1, 18% for HIF-1, 36% for HNF4, and 20% for c-Myc). Thus, relative to the 5% expected by chance, applying the seed-distribution-distance method will enrich the true targets in the top 5% of the rank by a factor of 4–8 (18%/5% ≈ 4 to 39%/5% ≈ 8).

Taken together, these results suggest that the seed-distribution-distance method is applicable to other transcription factors as well, and might be used for much smaller seed sizes than the 81 genes used in the NF-κB seed.

The list of predicted NF-κB targets and experimental verification

We assembled a list of predicted NF-κB target genes by selecting all genes that showed a putative NF-κB-binding site (a match of a TRANSFAC consensus motif of NF-κB) in the 500 bp upstream of the transcription start site and were members of the upper 5% in the rank. The resulting list is shown in Table 1. Eight of the 16 predicted targets have already been reported in the literature to be direct targets of NF-κB, but were not in the seed.

Table 1. Potential NF-κB targets identified by the seed-distribution-distance method that are in the top group of the rank and have predicted NF-κB-binding motifs within their −500 bp upstream promoter region. Interestingly, eight of the 16 identified new targets are known targets of NF-κB. Note that none of the potential new targets was in the initial seed group, so the otherwise known targets constitute a good validation of our method. The third column contains additional information about the results of the ChIP assays and the reporter gene analysis (RGA), followed by a + or − in the case of a positive or negative result, respectively.

ENSEMBL ID | Description | Reference for evidence as an NF-κB target
ENSG00000100906 | NF-κB inhibitor alpha (NFKBIA) | Sun et al. [58], this article, ChIP+, RGA+ (positive control)
ENSG00000197635 | Dipeptidyl peptidase 4 (DPP4) |
ENSG00000142539 | Transcription factor Spi-B (SPI-B) | This paper, ChIP+, RGA+
ENSG00000123240 | Optineurin (OPTN) | This paper, ChIP+, RGA+
ENSG00000173432 | Serum amyloid A protein precursor (SAA1) | Edbrooke et al. [59]
ENSG00000163739 | Growth-regulated protein α precursor (CXCL1) | O'Donnell et al. [60]
ENSG00000081041 | Macrophage inflammatory protein 2α precursor (CXCL2) | Guitart et al. [61]
ENSG00000169245 | Small inducible cytokine B10 precursor (CXCL10) | O'Donnell et al. [60], suggested
ENSG00000117151 | Di-N-acetylchitobiase precursor (CTBS) |
ENSG00000135604 | Syntaxin-11 (STX11) |
ENSG00000023445 | Baculoviral IAP repeat-containing protein 3 (BIRC3) | Hosokawa et al. [62]
ENSG00000196954 | Caspase-4 precursor (EC 3.4.22.-) (CASP4) | This article, RGA+, ChIP−
ENSG00000166718 | Hypothetical protein |
ENSG00000077150 | Nuclear factor NF-κB p100 subunit (NFKB2) | Lombardi et al. [63]
ENSG00000158714 | SLAM family member 8 precursor (SLAMF8) |
ENSG00000163435 | E74-like factor 3 (ELF3) | Grall et al. [64]

We decided to validate three of the novel predicted targets by performing luciferase reporter assays. We focused on optineurin (OPTN), SPI-B and caspase 4 (CASP4), and chose NFKBIA as a positive control and DARS, from the bottom of our rank, as a negative control. We cloned their human promoters in a luciferase reporter plasmid and generated identical plasmids in which the predicted consensus sequence of the NF-κB-binding site was deleted. A widely used method to induce NF-κB is stimulation by means of TNF-α. Human HEK293 cells were transiently transfected with the reporter plasmids, and TNF-α stimulation (1.25–20 ng·mL−1) was applied. For all three unmodified promoters, luciferase activity was strongly induced in a concentration-dependent manner under TNF-α stimulation in the undeleted plasmid, very similar to our positive control NFKBIA. In contrast, in the experiment with the plasmids in which we had deleted the putative NF-κB sites, the concentration-dependent stimulation effect was not seen for the OPTN and CASP4 promoters, and was strongly reduced for the Spi-B promoter (Fig. 6), indicating that the NF-κB action was blocked in the deleted mutant. The negative control (DARS) did not show any significant dose-dependent change in expression.

Furthermore, we applied chromatin immunoprecipitation (ChIP) analysis in order to verify NF-κB interaction with the predicted NF-κB-binding sites. A positive ChIP signal was obtained for OPTN and SPI-B as well as for NFKBIA in stimulated cells (Fig. 6). NF-κB-dependent activation of the CASP4 promoter was not indicated by ChIP analysis in HEK293 cells (Fig. 6Be). This correlates well with a very low basal promoter activity, and therefore may be attributed to a silenced CASP4 promoter in the cellular model used.

Discussion

We have described the seed-distribution-distance method for the identification of specific transcription factor target genes.
This strategy extracts relevant information about gene regulation from large-scale microarray experiments to generate a distribution-distance-derived target prediction based on a seed set of known target genes of a specific transcription factor. The target prediction is based on a combination of transcription factor-binding site information and the distribution distance. We took special care to keep our method simple and the number of free parameters as low as possible, so our results do not depend on any parameter fine-tuning. Despite the simplicity of the method, our predictions are very reliable, with 11 of the 16 predictions being true targets, corresponding to an upper bound of the false discovery rate of 33%. On the basis of a jackknife method, we estimate that our seed-based method of ranking genes will enrich true target genes within the top 5% by a factor of 4–8. Thus, incorporating the vast amount of microarray data stored in databases can help to reduce the extraordinarily high number of false positives obtained with purely sequence-based methods [5,7,35]. More sophisticated clustering methods might even improve the prediction quality further. We provide both statistical and biological evidence that the seed-distribution-distance method is robust and applicable to other transcription factors and is hence very useful in predicting specific transcription factor target genes.

Fig. 5. Left column: cross-validation of the seed distribution method for six different transcription factors (number of genes versus position in rank). By means of a jackknife method, the recovery position of the gene left out in the rank was calculated for each transcription factor seed group. There is a clear and high enrichment in the top ranks for each transcription factor tested. Right column: we applied the seed distribution method to rank genes. We calculated the gene ontology density (%, versus position in rank) for typical ontologies of the corresponding factor. Enrichment corresponds to an increased density at the top ranks as compared with the density at the bottom ranks. Ontologies shown: E2F, cell cycle; ETS1, extracellular matrix; HIF-1, response to hypoxia and angiogenesis; HNF4, liver development, blood coagulation and lipid metabolic process; NF-κB, immune response; c-Myc, cell proliferation.

Top rank members are involved in typical NF-κB-regulated functions and are enriched with putative NF-κB-binding sites

The distance criterion for generating the rank is a kind of expression profile similarity measure with respect to the seed group. It is not a priori clear that similarly regulated genes share the same gene function. The NF-κB analysis, however, reveals that the seed-distribution-distance method highly enriches genes in the top ranks that share typical NF-κB-regulated functions. For instance, the processes immune response, complement activation, regulation of T-cell differentiation and immune cell activation are significantly present in the top group (supplementary Table S1). Moreover, we found specific enrichment of predicted binding motifs for NF-κB 50 and NF-κB 65 in the top 5% of the genes, together with three other motifs. We would expect the other factors to be functionally related to NF-κB. This is the case for STAT5A, which has been reported to be involved in severe combined immunodeficiency [36] and is involved in the immune response [37].
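The motif enrichment in the top 5% reported in the Results (occurrences in the top 5% divided by 0.05 times the total number of occurrences, with a binomial P-value) can be sketched as follows. The one-sided upper-tail form of the binomial test is an assumption made here, as the exact variant is specified only in the Experimental procedures; the function names are hypothetical.

```python
from math import comb

def enrichment_ratio(hits_in_top, total_hits, top_fraction=0.05):
    """Observed occurrences in the top group divided by the expected number."""
    return hits_in_top / (top_fraction * total_hits)

def binomial_upper_tail(hits_in_top, total_hits, top_fraction=0.05):
    """P(X >= hits_in_top) for X ~ Binomial(total_hits, top_fraction).

    One-sided test for enrichment (assumption): under the null hypothesis each
    occurrence of the motif falls into the top group with probability top_fraction.
    """
    p = top_fraction
    return sum(
        comb(total_hits, k) * p**k * (1 - p) ** (total_hits - k)
        for k in range(hits_in_top, total_hits + 1)
    )
```

For a motif with, say, 100 predicted sites in total of which 10 lie in the top 5% of the rank (made-up numbers), enrichment_ratio(10, 100) gives a twofold enrichment, and binomial_upper_tail(10, 100) gives the probability of seeing at least that many by chance.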
Please note that these statistics were obtained without the initial seed group. Therefore, it would have been possible in our example to determine with high certainty from the constructed rank which seed group was used to build up the rank, namely a group with NF-κB targets.

OPTN is a direct NF-κB target

We predict a list of new NF-κB targets that were not in the initial seed (Table 1). Eight of the 16 predicted novel targets have been previously confirmed. Three other predicted NF-κB targets were experimentally investigated in this study, and were identified as direct NF-κB targets. OPTN, Spi-B and CASP4 were in our predicted list of new targets. Deletions in the OPTN gene are causative for adult-onset primary open-angle glaucoma [38]. Glaucoma affects 67 million people worldwide [39], and is the second largest cause of bilateral blindness in the world [40]. It has been suggested that OPTN is involved in the TNF-α signalling pathway [41]; however, the molecular mode of action has been unknown up to now. It has been suggested that OPTN blocks the protective effect of E3-14.7K on TNF-α-mediated cell killing, and hence OPTN may be part of the TNF-α signalling pathway that can shift the equilibrium towards induction of apoptosis [38,41]. Recently, it has been shown that OPTN increases cell survival and translocates to the nucleus upon an apoptotic stimulus that is dependent upon the GTPase activity of Rab8, an interaction partner of OPTN [42]. Interestingly, this protective function of OPTN is lost when the OPTN protein is changed to the mutated form E50K, which is typical for patients with normal tension glaucoma [42]. We show that a deletion of a putative NF-κB-binding site in the promoter region of OPTN completely abolishes the enhancing action and modulatory effect of NF-κB on OPTN (Fig. 6). Our experiments show clearly that OPTN is a direct target of NF-κB.
Recent findings indicated that TNF-α potentiates glutamate neurotoxicity through the blockade of glutamate transporter activity [43,44]. Furthermore, it was shown that OPTN and NF-κB essential modulator (NEMO) are competitive inhibitors of one another [45]. NEMO represents the regulatory subunit of IKK, which is essential for NF-κB activation [46]. Together with our data, this makes it apparent that OPTN is part of a negative feedback system that is important for NF-κB action. Elevated OPTN expression reduces induced NF-κB activation [45], and is therefore protective against induced neuronal cell death, which depends on NF-κB activity. This is in line with findings indicating that the protective function of OPTN is lost upon truncation resulting from the insertion of a premature stop codon, and when the OPTN protein is changed to the mutated form E50K, which is markedly reduced in patients suffering from glaucoma [42]. Our data provide the missing link in the signalling of NF-κB and the damping function of OPTN in signalling feedback of NF-κB. The knowledge about the direct action of NF-κB on OPTN will greatly enhance our understanding of the signalling pathways relevant for antiapoptosis, and will be helpful in designing possible new cell survival strategies in glaucoma patients.

The two other newly identified and verified target genes of the NF-κB transcription factor seem to be involved in important physiological processes related to typical known functions of NF-κB. It is known that the Spi-B transcription factor is expressed in adult pro-T cells, with Spi-B being maximal in the newly committed cells at the DN3 stage [47]. Furthermore, Spi-B can interfere with T-cell development [47].

Fig. 6. A. Reporter gene activity: luciferase activity (firefly/renilla) of the NFKBIA, DARS, OPTN, SPI-B and CASP4 promoter constructs, including the OPTN, SPI-B and CASP4 constructs with the putative NF-κB site deleted, for control and TNF-α at 1.25, 2.5, 5, 10 and 20 ng·mL−1. B. ChIP analysis: relative values for input, anti-rabbit antibody and anti-NF-κB antibody, control versus TNF-α, for control DNA, NFKBIA, OPTN, SPIB and CASP4.
CASP4 can function as an endoplasmic reticulum stress-specific caspase in humans, and may be involved in the pathogenesis of Alzheimer’s disease [48].

When does the seed-distribution-distance method work?

The major assumption of our method is that genes that are regulated by the same factor show at least some coregulation. We use a genome-wide similarity measure, L(x), based on a comparison of the median values of two correlation distributions. For each gene x in the genome, we calculate L(x), which is the median correlation of gene x with all the genes within the seed set minus the median correlation of gene x with all the remaining genes in the genome. Our approach is able to ‘add up’ contributions from all the genes in the seed set and, by using the median rather than the mean, it can tolerate a reasonable number of outliers. Subtracting the median correlation with the rest of the genome corrects for the correlation structure of the expression dataset as a whole. We also tried a more sophisticated scoring scheme, ranking the genes on the basis of a Mann–Whitney rank-sum test, but this did not improve the performance of the ranking procedure. The seed-distribution-distance method is extremely robust and produces high enrichment even if a considerable part of the seed is not present. This was shown by the cross-validation procedure and the subsequent recovery test.

The seed-distribution-distance method is expected to produce a biologically meaningful rank if the seed group is homogeneous with respect to its expression correlation. If, for instance, the seed group contains completely unrelated expression clusters that are located in the cluster space in a linearly independent way, the resulting distance measure might not be capable of building up a transcription factor-specific rank. In this case, one would need to cluster the seed group into subseeds and to build up individual cluster-specific ranks.
For instance, this might be necessary in the case of transcription factors that target different genes depending on the splice form of the transcription factor. Interestingly, however, in our analysis, the performance of the method does not seem to depend crucially on the homogeneity of the expression of the seed group, as some seed groups that performed well in the cross-validation test had large intraseed variations (supplementary Fig. S4).

A second consideration relates to the expression dataset. The seed-distribution-distance method relies on the assumption that the transcription factor of interest shows some biological activity in the data. If, for example, the transcription factor of interest is completely shut down in all experiments, one would not expect to be able to recover the regulation response of that factor. This issue might be of importance for genes that are only active during narrow periods of development. One solution to this problem would be to generate expression experiments with artificial expression of that transcription factor or to include native material from that developmental period in the microarray analysis.

The third consideration relates to the size of the seed. One would expect that if the seed is too small to define the target response adequately, the rank will be poorly defined. However, our bootstrapping test showed that 10 seed genes are capable of enriching

Fig. 6. Experimental validation of predicted NF-κB targets by functional analyses and physical NF-κB interaction with the predicted NF-κB-binding sites in the nuclear chromatin context. (A) RGA. HEK293 cells were transfected and treated for 24 h with TNF-α in a dose-dependent manner (n = 4). (a) Schematic illustration of experimental design. RGA was measured with unmodified native promoter constructs (left column) and in constructs where the putative NF-κB-binding sites were deleted (right column, NF-κB del).
(b) Promoter activity for NFKBIA, which is known to be a target of NF-κB, and a negative control (DARS). Only the NFKBIA promoter responded in a dose-dependent manner under stimulation with TNF-α. (c, d, e) RGA for the (c) OPTN, (d) SPI-B and (e) CASP4 promoter: all experiments showed a dose-dependent increase in promoter activity under stimulation with TNF-α. Deletion of the putative NF-κB-binding site resulted in significantly attenuated dose-dependent responses. (B) ChIP analysis. HEK293 cells were cultured with TNF-α (10 ng·mL⁻¹) or without (control) for 24 h prior to crosslinking and ChIP using anti-rabbit serum (negative control) or an antibody to NF-κB. Relative values of immunoprecipitated DNA were assessed by real-time PCR (n = 3). (a) Amplification of a coding region part of the intron-less gene encoding GAPDH, which should show no promoter-like activity and contains no potential NF-κB-binding element, served as control DNA. (b–e) Verification of the predicted NF-κB-binding sites was obtained for the (b) positive control NFKBIA as well as (c) OPTN and (d) SPI-B. NF-κB-dependent activation of (e) the CASP4 promoter is not indicated by ChIP analysis in HEK293 cells.
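
As an illustration of the score L(x) described under 'When does the seed-distribution-distance method work?', the following minimal Python sketch computes, for each gene, the median correlation with the seed genes minus the median correlation with the remaining genes. The use of Pearson correlation, the exclusion of a gene's correlation with itself, the function name seed_distribution_distance and the toy data are assumptions made for illustration only; they are not taken from the study.

```python
import numpy as np


def seed_distribution_distance(expr, seed_idx):
    """Return L(x) for every gene (row) of a genes x experiments matrix.

    L(x) = median correlation of gene x with the seed genes
           minus median correlation of gene x with all remaining genes.
    Assumptions: Pearson correlation; self-correlation is excluded.
    """
    expr = np.asarray(expr, dtype=float)
    n_genes = expr.shape[0]
    corr = np.corrcoef(expr)  # gene-by-gene correlation matrix (rows are genes)
    seed_mask = np.zeros(n_genes, dtype=bool)
    seed_mask[np.asarray(seed_idx)] = True

    scores = np.empty(n_genes)
    for x in range(n_genes):
        not_self = np.arange(n_genes) != x
        with_seed = corr[x, seed_mask & not_self]
        with_rest = corr[x, ~seed_mask & not_self]
        scores[x] = np.median(with_seed) - np.median(with_rest)
    return scores


# Hypothetical toy data: 30 genes x 100 experiments.
# Genes 0-5 share one regulatory signal; genes 0-4 act as the seed,
# gene 5 plays the role of an unknown co-regulated target.
rng = np.random.default_rng(0)
signal = rng.normal(size=100)
expr = rng.normal(size=(30, 100))
expr[:6] += 5.0 * signal

seed = [0, 1, 2, 3, 4]
scores = seed_distribution_distance(expr, seed)

# Rank all genes by decreasing L(x), leaving the seed genes out of the rank.
candidates = [int(g) for g in np.argsort(-scores) if g not in seed]
print(candidates[:3])
```

In this sketch, a gene that is co-expressed with the seed obtains a high L(x) and therefore appears near the top of the rank, whereas the subtraction of the second median removes correlation that a gene shares with the dataset as a whole.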