Poster Abstracts for Category B: Computational genomics
|
Poster B01
Microbial Genome Signatures and their Application in Metagenomics to Monitor the Environment and to Control Infection Outbreaks
O. Reva
Bioinformatics and Computational Biology Unit and Department of Biochemistry, University of Pretoria
|
Abstract:
In the upcoming era of metagenomics, the development of systems for routine, high-throughput identification of bacteria in environmental samples will become the major challenge. Of particular importance is the reliable identification of biohazardous agents, i.e., the proportion of pathogenic bacteria in the sample. New approaches based on oligonucleotide usage statistics are proposed to resolve the problem of species identification from DNA fragments of 3-5 kb generated by mass sequencing of environmental and clinical samples.
Contact: oleg.reva [at] up.ac.za
Keywords: Nucleotide Usage Signature, Metagenomics
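The abstract does not say which oligonucleotide statistics are used, so the following is only a minimal sketch of the general idea: a fragment is summarized by the relative frequencies of its k-mers, and two fragments are compared by the distance between those frequency vectors. The word length (k=4), the Euclidean distance and the function names are assumptions made for illustration, not the author's method.

```python
from collections import Counter
from itertools import product
import math


def kmer_signature(seq, k=4):
    """Relative frequency of every DNA word of length k in seq.

    Windows containing characters other than A, C, G, T are skipped.
    k=4 is an illustrative choice, not taken from the abstract.
    """
    seq = seq.upper()
    alphabet = set("ACGT")
    counts = Counter(
        seq[i:i + k]
        for i in range(len(seq) - k + 1)
        if set(seq[i:i + k]) <= alphabet
    )
    total = sum(counts.values())
    words = ["".join(p) for p in product("ACGT", repeat=k)]
    return {w: (counts[w] / total if total else 0.0) for w in words}


def signature_distance(sig_a, sig_b):
    """Euclidean distance between two signatures built with the same k."""
    return math.sqrt(sum((sig_a[w] - sig_b[w]) ** 2 for w in sig_a))
```

In such a scheme, a fragment of unknown origin would be assigned to the reference genome whose signature is nearest; how well that works on 3-5 kb fragments is exactly the question the poster addresses.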
|
Poster B02
Tool for Automatic Detection of Co-regulated Genes
Elena D. Stavrovskaya (1,2), Vsevolod J. Makeev (3), Igor V. Merkeev (3), Andrey A. Mironov (1,2,3)
(1) Institute for Information Transmission Problems RAS; (2) Department of Bioengineering and Bioinformatics, Moscow State University; (3) State Scientific Center GosNIIGenetica
|
Abstract:
To study the regulation of transcription, it is important to identify co-regulated genes (regulons). One way to do that is to cluster similar potential regulatory signals found by various experimental or computational techniques, for instance, phylogenetic footprinting. We have developed a computer tool for automatic detection of co-regulated genes. It implements the phylogenetic footprinting technique to find potential regulatory signals and uses a clustering procedure to identify potential regulons. The tool is intended for the analysis of sufficiently closely related bacterial genomes.
Contact: stavrovskaya [at] gmail.com
Keywords: Bacterial Genome, Regulon, Clustering
|
Poster B03
Comparative Genomics: Testing the Limit of Cross-Species Hybridization
Lukas Chavez (1, 2), Silvia Fluch (2)
(1) Max-Planck-Institute for Molecular Genetics; (2) Austrian Research Centers
|
Abstract:
Evolutionary theory and experimental data both indicate that similar genes with similar functions are conserved across species. The idea is to use genes found in model organisms for cross-species hybridization analysis of less well studied organisms. Hybridization is the central reaction of all microarray experiments. To test the limit of cross-species hybridization, an in silico model was created. A combination of the Smith-Waterman alignment algorithm and the Nearest Neighbor model simulates the hybridization behaviour of DNA sequences on a cDNA microarray.
Contact: chavez [at] molgen.mpg.de
Keywords: Cross-Hybridization, Thermodynamic Alignment
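As a rough illustration of the alignment half of the model, the sketch below computes a Smith-Waterman local alignment score with a linear gap penalty. The scoring values (match 2, mismatch -1, gap -2) are arbitrary assumptions, and the Nearest Neighbor thermodynamic term that the abstract combines with the alignment is not modelled here.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score of sequences a and b (linear gap cost).

    Only the previous row of the dynamic-programming matrix is kept,
    which is enough when the score alone is needed.
    """
    cols = len(b) + 1
    prev = [0] * cols
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * cols
        for j in range(1, cols):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A local alignment may restart anywhere, hence the floor at 0.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best
```

With these scores, two identical 4-mers score 8 and sequences with no shared base score 0.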
|
Poster B04
IVOMs for Identification of Horizontally Acquired DNA: Revisiting the Salmonella Pathogenicity Islands
Georgios S Vernikos, Julian Parkhill
The Wellcome Trust Sanger Institute
|
Abstract:
Interpolated Variable Order Motifs (IVOMs) exploit compositional biases using variable order motif distributions to predict Horizontal Gene Transfer (HGT) events more reliably than fixed-order methods. For optimal localization of the predicted boundaries, a 2nd order, 2-state Hidden Markov Model (HMM) is implemented in a change-point detection framework.
Contact: gsv [at] sanger.ac.uk
Keywords: Motifs, Salmonella, Pathogenicity Islands
|
Poster B05
Effects of DNA Copy Number on Gene Expression in Glioblastoma Multiforme (GBM)
Tal Shay (1), Wanyu L. Lambiv (2), Anastasia Murat (2), Eugenia Migliavacca (3,4), Anjan Misra (5), Burt Feuerstein (5), Mauro Delorenzi (3,4), Roger Stupp (6), Monika E. Hegi (2,4), Eytan Domany (1)
(1) Department of Physics of Complex Systems, Weizmann Institute of Science, Rehovot, Israel; (2) Laboratory of Tumor Biology and Genetics, Neurosurgery, University Hospital Lausanne (CHUV), Lausanne, Switzerland; (3) Institut Suisse de Recherche Experimentale sur le Cancer (ISREC), Epalinges, Switzerland; (4) National Center of Competence in Research (NCCR) Molecular Oncology, ISREC, Switzerland; (5) Departments of Laboratory Medicine and Neurosurgery, University of California, San Francisco, USA; (6) Multidisciplinary Oncology Center, CHUV, Lausanne, Switzerland
|
Abstract:
GBM is the most common and aggressive brain tumor. Expression profiles and DNA copy number (aCGH) of GBM cases were measured to study relationships between expression and DNA copy number and to identify new oncogenes and tumor suppressors (TSGs). Amplicons and delicons were identified in the aCGH data. Genes in the affected region that are upregulated in all amplicon carriers (vs. non-carriers) are putative oncogenes, whereas downregulation in all delicon carriers is indicative of putative TSGs. We found known and new oncogenes and TSGs, and characterized the effects of DNA aberrations on expression.
Contact: tal.shay [at] weizmann.ac.il
Keywords: Glioblastoma, DNA chips, aCGH, CIN
|
Poster B06
Evolution of the Methionine Biosynthesis Regulation in Streptococci
Galina Kovaleva (1,2), Mikhail Gelfand (1,2)
(1) Institute for Information Transmission Problems, Moscow, Russia; (2) Moscow State University, Department of Bioengineering and Bioinformatics, Moscow, Russia
|
Abstract:
Regulation of the methionine biosynthesis mainly involves the premature termination of transcription, determined by the T- and S-boxes. We suggest an evolutionary scenario for the methionine biosynthesis regulation in streptococcal genomes lacking T- and S-boxes. We show that Streptococci have two transcriptional regulators, one controlling genes of the methionine synthesis, and the other, cysteine synthesis genes. These factors also form a regulatory cascade.
Contact: kovaleva [at] iitp.ru
Keywords: Methionine Biosynthesis, Comparative Genomics
|
Poster B07
cRNA Microarrays and Bioinformatics Application in Lung Adenocarcinoma Samples
A. Molares (1), E. Carrillo (2), G. Gomez (1), G. Munoz Molina (2), I. Muguruza (2), I. Sanchez (3), R. Moreno (4), V. Leiro (1), F. Carrillo (3), E. Caso (1)
(1) University Hospital, Vigo, Spain; (2) Ramon y Cajal Hospital, Madrid, Spain; (3) University Hospital, Guadalajara, Spain; (4) La Princesa Hospital, Madrid, Spain
|
Abstract:
We applied microarray technology (Affymetrix), statistics and bioinformatics tools to study gene expression in patients with lung adenocarcinoma.
Methods: 8500 known genes were analyzed. cRNA was prepared from healthy (H), tumour limit (P) and tumoral (T) tissues. Acuity 4.0 software (Molecular Devices) was used. ANOVA and Student's t-test with Bonferroni's correction (p<0.05), hierarchical clustering (HC) and principal component analysis (PCA) with 70% variance were applied.
Results: 24 genes remained after ANOVA and t-test. HC and PCA show two groups: H and T arrays.
Contact: amolares [at] hotmail.com
Keywords: Lung Cancer, Microarrays, ANOVA, T-test
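For readers unfamiliar with Bonferroni's correction, a minimal sketch follows: with m tests, a gene is kept only if its p-value is below alpha / m. If the correction were applied over all 8500 genes at alpha = 0.05 (an assumption; the abstract does not state the number of tests), the per-gene threshold would be about 5.9e-6. The function name and the toy p-values are hypothetical.

```python
def bonferroni_significant(p_values, alpha=0.05):
    """Return the names whose p-value passes the Bonferroni threshold.

    p_values maps a gene name to its raw p-value; the threshold is
    alpha divided by the number of tests performed.
    """
    if not p_values:
        return []
    threshold = alpha / len(p_values)
    return [name for name, p in p_values.items() if p < threshold]


# Toy example with three tests: threshold = 0.05 / 3, roughly 0.0167.
toy = {"gene_a": 0.00001, "gene_b": 0.01, "gene_c": 0.04}
kept = bonferroni_significant(toy)  # gene_a and gene_b pass, gene_c does not
```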
|
Poster B08
Organ-Specific Linear Order of Gene Transcript Abundance in Arabidopsis
Anat Kats (1,2), Sagi Shporer (1), Benny Chor (1), Hanne Volpin (2)
(1) Tel Aviv University; (2) Volcani Agricultural Research Organization
|
Abstract:
We present a novel approach for organ specification based on quantitative relations of gene transcript abundance. We applied the order preserving sub-matrix (OPSM) approach to NASC Arabidopsis gene expression data, from various experimental conditions and labs. We discovered that ubiquitously expressed (but not co-expressed) genes preserve highly significant, linear orders of expression levels that are organ specific. We propose that these quantitative relations are involved in the maintenance of organ identity.
Contact: anatkats [at] hotmail.com
Keywords: Organ Specification, Gene Expression, OPSM
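The property reported in this abstract, a linear order of expression levels that a gene set keeps across samples of one organ, can be checked with a few lines of code. The sketch below covers only that verification step for a given gene set and is not the OPSM search itself; the data layout (a dict of gene to a dict of sample to value), the function name and the toy numbers are all assumptions.

```python
def shared_order(expression, genes, samples):
    """Return the gene order common to all samples, or None if absent.

    The candidate order is the ascending ranking in the first sample;
    every sample must then show strictly increasing values along it.
    """
    order = sorted(genes, key=lambda g: expression[g][samples[0]])
    for sample in samples:
        values = [expression[g][sample] for g in order]
        if any(x >= y for x, y in zip(values, values[1:])):
            return None
    return order


# Toy data: the three genes keep one order in both root samples,
# but the order is reversed in the leaf sample.
toy_expression = {
    "gene_a": {"root1": 1.0, "root2": 2.0, "leaf1": 9.0},
    "gene_b": {"root1": 3.0, "root2": 2.5, "leaf1": 4.0},
    "gene_c": {"root1": 5.0, "root2": 7.0, "leaf1": 1.0},
}
toy_genes = ["gene_a", "gene_b", "gene_c"]
```

Note that the genes in the toy data are not co-expressed (their levels do not move together), yet their ranking is preserved within the root samples, which is the distinction the abstract draws.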
|
Poster B10
Computational Study of Human Cancer: Properties of Proto-Oncogenes and Tumour Suppressor Genes
Simon Furney (1,2), Stephen Madden (1), Des Higgins (1), Nuria Lopez-Bigas (2)
(1) Conway Institute, University College Dublin, Ireland; (2) Biomedical Genomics Group, GRIB, University Pompeu Fabra, Barcelona, Spain
|
Abstract:
The computational study of human cancer genes is expected to give insights into the mechanisms of tumour development. We focus on the properties of proto-oncogenes and tumour suppressor genes finding important differences in the evolutionary history and pattern of regulation of these two sets of genes. Proto-oncogenes have greater coding sequence and promoter region conservation, and are significantly more targeted by microRNAs than either tumour suppressor genes or non-cancer genes.
Contact: simon.furney [at] ucd.ie
Keywords: Cancer, Sequence Conservation, microRNAs
|
Poster B11
Genomic Location and the Evolution of Paralogous Genes
Alex Mira (1), Ravindra Pushker (2), Francisco Rodriguez-Valera (1)
(1) Evolutionary Genomics Group, Universidad Miguel Hernandez, San Juan de Alicante, Spain; (2) Conway Institute, University College Dublin, Ireland
|
Abstract:
Multiple-copy genes whose sequences must remain identical in order to increase the dosage of the transcript are located at positions that maximize the probability of gene conversion. Even recently formed copies are far from each other, and we have identified different mechanisms by which paralogs move across the genome, such as insertions, genomic inversions, large duplications and horizontal gene transfer. Thus, the location of duplicated genes in the genome is vital for the evolution of gene families.
Contact: pushker [at] ucd.ie
Keywords: Paralogs, HGT, IS elements, tuf genes
|
Poster B12
The AMADEUS Motif Discovery Tool
Yonit Halperin (1), Chaim Linhart (1), Gidi Weber (2), Ron Shamir (1)
(1) School of Computer Science, Sackler Faculty of Exact Sciences, Tel-Aviv University; (2) School of Computer Science and Engineering, The Hebrew University
|
Abstract:
Reverse-engineering the transcriptional program of an organism requires identifying the binding sites (BSs) of its transcription factors (TFs). In a common scenario, we are given a set of co-regulated genes and our goal is to find TFs whose BSs are statistically over-represented in the promoters of these genes.
We present AMADEUS, a user-friendly software suite for efficient genome-scale detection of known and novel motifs in multiple species. AMADEUS produced highly satisfactory results on several real datasets, outperforming three common motif discovery tools we tested.
Contact: yonithal [at] post.tau.ac.il
Keywords: De-novo Motif Discovery, Promoter Analysis
|
Poster B13
GEVALT - An Integrated Software Tool for Genotype Analysis
Ofir Davidovich, Gad Kimmel, Ron Shamir
Tel Aviv University, School of Computer Science
|
Abstract:
Genotype information carries the promise of revolutionizing disease studies and the association of phenotypes with alleles and haplotypes. Tools for analyzing, interpreting and visualizing these data sets are of critical importance to researchers everywhere.
Here we present GEVALT (Genotype Visualization and Algorithmic Tool), a software package designed to simplify and expedite the process of haplotype analysis by providing a common interface to several tasks relating to such analyses. GEVALT is based on HaploView, and combines the visual abilities of HaploView with several powerful algorithms.
Contact: offirdav [at] post.tau.ac.il
Keywords: SNP, Genotypes, Tag SNPs, Algorithms, Phasing
|
Poster B14
Whole Genome Microarray in Arabidopsis Facilitates Global Analysis of Retained Introns
Hadas Ner-Gaon
Weizmann Institute of Science
|
Abstract:
In Arabidopsis, intron retention is the most prevalent alternative splicing (AS) type (~40%). Here we show that direct transcript expression analysis using whole-genome microarrays (WGAs) is particularly well suited to assessing global intron retention in Arabidopsis. By applying a novel algorithm we show that retained introns are detected in 8% of the transcripts examined, 86% of which can be confirmed by RT-PCR. These findings will facilitate monitoring constitutive and dynamic whole-genome splicing on the next generation of WGA slides.
Contact: hadas.ner-gaon [at] weizmann.ac.il
Keywords: Microarray, AS, Arabidopsis, Intron Retention
|
Poster B15
Microarray and Comparative Genomic Analysis of sigB Regulon in Bacillus Group: An Estimation of Correspondence
Elizabeth Permina (1), Mikhail Gelfand (2)
(1) GosNIIGenetika, Russia; (2) Institute of Information Transmission Problems, Russia
|
Abstract:
The SigB regulon is relatively large (the estimated number of genes is about 200) and the data about it are quite contradictory. We performed a combined analysis of microarray data and comparative analysis of the sigB regulon in the group of Bacillus subtilis relatives and refined the estimate of regulon and modulon members.
Contact: epermina [at] mail.ru
Keywords: sigB, Comparative Analysis, Microarray
|
Poster B16
Towards a Comprehensive, Realistic and Interactive 4-Dimensional Model of Pancreatic Development
Yaki Setty (1,2), David Harel (1), Irun Cohen (2)
(1) Dept. of Math and Computer Science, Weizmann Institute of Science; (2) Dept. of Immunology, Weizmann Institute of Science
|
Abstract:
We present a reactive animation model for the early stages of pancreatic development, whose results correspond well with previous biological knowledge. In particular, the simulation mimics the 3D morphogenetic formation of the pancreas. Furthermore, analysis of the model based on in silico experiments captured the essence of the relevant process. These preliminary results provide a basis for further modeling of pancreatic development.
Contact: yaki.setty [at] weizmann.ac.il
Keywords: RA, Computational Biology, Pancreas
|
Poster B17
A Probabilistic Model for Identifying the Key Features of Protein-DNA Interactions
Eilon Sharon, Tali Sadka, Eran Segal
Department of Computer Science and Applied Mathematics, Weizmann Institute of Science, Israel
|
Abstract:
Protein-DNA interactions are central to the process of transcriptional regulation. Learning the true nature of these interactions can provide much insight into how transcription factors identify their target binding sites, and thereby improve our understanding of transcriptional regulation. Here, we present a new method for representing and learning protein-DNA interactions. Our method uses undirected probabilistic graphical models to learn the important features of the interaction directly from sequence data.
Contact: eilon.sharon [at] weizmann.ac.il
Keywords: Probabilistic Model, Transcription Regulation
|
Poster B18
A Genome-wide Computational Scan for Novel Single-stem H/ACA-like snoRNA
Tirza Doniger (1), Yair Horesh (2), Inna Myslyuk (1), Shulamit Michaeli (1), Ron Unger (2)
(1) The Mina and Everard Goodman Faculty of Life Sciences, Bar-Ilan University; (2) Department of Computer Science, Bar-Ilan University
|
Abstract:
H/ACA molecules are small nucleolar RNAs (snoRNAs) that guide the conversion of a uridine into a pseudouridine, which is crucial for ribosomal RNA (rRNA) processing and maturation. Unlike the double-stem structure of H/ACA molecules in other organisms, H/ACA-like molecules in trypanosomes have a single-stem structure. Programs developed to scan genomes for H/ACA depend heavily on the double-stem structure, and thus are ineffective for use in trypanosomes. We have developed SinglePsiScan, a package that enables a genome-wide scan for single-stem H/ACA-like molecules.
Contact: tirza [at] biomodel.os.biu.ac.il
Keywords: snoRNA, H/ACA, ncRNA, Computational Data Mining
|
Poster B19
Multiple Regulation of Respiration in Gamma-Proteobacteria: FNR, ArcA, NarP and ModE Regulons
Anna V. Gerasimova (1), Dmitriy A. Ravcheev (2,3), Mikhail S. Gelfand (2,3)
(1) Laboratory of Bioinformatics, State Scientific Center GosNIIGenetika, Moscow, Russia; (2) Institute for Information Transmission Problems, RAS, Moscow, Russia; (3) Lomonosov Moscow State University, Department of Bioengineering and Bioinformatics, Moscow, Russia
|
Abstract:
Gamma-proteobacteria, such as Escherichia coli, can use a variety of respiratory substrates employing numerous aerobic and anaerobic respiratory systems controlled by multiple transcription regulators. We applied the comparative genomic approach to the analysis of four global regulatory systems: Fnr, ArcA, NarP and ModE. We identified new members of these regulons. Using a new variation of the comparative technique, we demonstrated taxon-specific changes in regulatory interactions and predicted taxon-specific regulatory cascades.
Contact: anyagerasimova [at] yandex.ru
Keywords: Respiration, Bacteria, Comparative Analysis
|
Poster B21
Regulation of Alternative Splicing by Natural Antisense Transcripts is a Common Mechanism in the Human Genome
Ido Sher (1,2), Sarit Edelheit (1), Ronen Shemesh (1), Dvir Dahary (1)
(1) Compugen Ltd., Tel Aviv, Israel; (2) Faculty of Life Sciences, Bar Ilan University, Israel
|
Abstract:
We report the identification of at least 55 gene pairs in the human genome where the expression of an antisense gene directly influences the ratio between two functional variants of the sense gene. This is the first genome-wide search for the relationship between natural antisense transcription and functional alternative splicing. Importantly, we experimentally support our findings and show high correlations between mRNA expression levels for two selected cases.
Contact: ido.sher [at] gmail.com
Keywords: Antisense Transcript, Alternative Splicing
|
Poster B23
Phenotypic Divergence among Diverse Yeast Species is Affected by Translational Control Signals in Protein Coding Sequence
Orna Man, Yitzhak Pilpel
Department of Molecular Genetics, Weizmann Institute of Science
|
Abstract:
The utilization of sequence data to gain functional insights into the mechanisms underlying the biology of organisms, and their phenotypic divergence, is a challenging task. In this abstract we show that phenotypic differences between related yeast species may be reflected in translational signals embedded within the coding sequences of the genes underlying the phenotype. By analyzing multiple fully-sequenced yeast genomes we identified sets of genes that are predicted to be efficiently translated in some species but not in others. The functions of many such genes are often related to life style.
Contact: pilpel [at] weizmann.ac.il
Keywords: Gene Expression, Yeast, Metabolism, Translation
|
Poster B25
Choosing the End: Regulation of Alternative 3' Splice Sites
Martin Akerman, Yael Mandel-Gutfreund
Faculty of Biology, Technion-Israel Institute of Technology
|
Abstract:
In this study we apply a Support Vector Machine to identify alternative 3' splice site events. We show that we can distinguish between alternative and constitutive events both for tandem acceptor motifs and for splice sites that are far apart. Finally, we suggest a possible mechanism of splice site selection.
Contact: makerman [at] tx.technion.ac.il
Keywords: Alternative Splicing, Splice Site, SVM
|
Poster B29
Approaching Alternative Splicing Patterns in Exon Microarray Data from Thymoma Tumors
Lilach Soreq (1), Adi Gilboa-Geffen (2), Paul Lacoste (3), Hagai Bergman (1,4), Sonia Berrih-Aknin (3), Hermona Soreq (2,4)
(1) Hadassah Faculty of Medicine, The Hebrew University of Jerusalem, Israel; (2) The Institute of Life Sciences, The Hebrew University of Jerusalem, Israel; (3) CNRS UMR8162-CCML, Le Plessis Robinson, Paris, France; (4) The Interdisciplinary Center for Neural Computation and Eric Roland Center of Neurodegenerative Diseases, Hadassah Medical School, The Hebrew University of Jerusalem, Israel
|
Abstract:
Affymetrix exon-level expression profiling provides unprecedented advantages for identifying dynamic changes in gene expression. Here, we report its use for studying alternative splicing in cortical thymoma tumors. Exonic data were subjected to both continuous Kolmogorov-Smirnov statistics and discrete, threshold-based hypergeometric statistics. We found a wide spectrum of distributional divergence among GO terms. Specifically, we identified selective decreases in alternative splicing of defense/immune transcripts in thymomas. Combining exon arrays with continuous analysis thus offers previously unrecognized options for detecting gene expression changes underlying tumorigenic events.
Contact: lilacht [at] pob.huji.ac.il
Keywords: GO, Microarrays, Alternative Splicing, Tumor
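The continuous statistic named in this abstract can be illustrated with a two-sample Kolmogorov-Smirnov distance, i.e. the largest gap between the empirical cumulative distributions of two samples. The sketch below is a generic implementation; how the authors grouped exon-level values (for example, exons of one GO term versus a background set) is not stated in the abstract and is left as an assumption.

```python
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic D = max |F_a(x) - F_b(x)|.

    Both samples must be non-empty. The empirical CDFs only change at
    observed values, so it is enough to evaluate them there.
    """
    a, b = sorted(sample_a), sorted(sample_b)
    d = 0.0
    for x in a + b:
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        d = max(d, abs(cdf_a - cdf_b))
    return d
```

Identical samples give D = 0 and fully separated samples give D = 1. Unlike a threshold-based hypergeometric test, no cutoff on the values is needed, which is the contrast the abstract draws between the continuous and discrete analyses.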
|
Poster B31
EXPANDER - A One-Stop Shop for Microarray Data Analysis
Igor Ulitsky, Adi Maron-Katz, Amos Tanay, Chaim Linhart, Israel Steinfeld, Roded Sharan, Yosef Shiloh, Ran Elkon, Ron Shamir
Tel-Aviv University, Israel
|
Abstract:
EXPANDER 3.0 (EXPression ANalyzer and DisplayER) is an integrative package for the analysis of gene expression data, designed as a 'one-stop shop' tool that implements various data analysis algorithms ranging from the initial steps of normalization and filtering, through clustering and biclustering, to high-level functional enrichment analysis and promoter cis-regulatory elements analysis. EXPANDER is available with pre-compiled functional Gene Ontology (GO) and promoter sequence-derived data files for yeast, worm, fly, arabidopsis, rat, mouse and human.
Contact: ulitskyi [at] post.tau.ac.il
|
Poster B32
A Mix of Bernoulli Distributions for Identification of Differentially Expressed Genes Detected by Replicated Microarray
Yefim Ronin (1), Irine Ronin (2)
(1) Institute of Evolution, University of Haifa, Haifa, Israel; (2) The Hebrew University of Jerusalem, Jerusalem, Israel
|
Abstract:
Replicated microarray analyses improve the detection of differentially expressed genes by reducing biases of the system. Genes displaying an identical expression profile across all hybridization experiments have greater statistical significance than genes showing inconsistent results. Our research aimed to evaluate the statistical significance for groups of such genes in order to determine the level of negative results that could be interpreted as false negatives. To deal with genes displaying inconsistent results, we suggest using a mixture of Bernoulli distributions as a model.
Contact: efim [at] research.haifa.ac.il
Keywords: Mix of Bernoulli Distributions, Microarray
|
Poster B34
Differentiated Evolutionary Rates in Alternative Exons and the Implications for Splicing Regulation
Eduardo Eyras (1,2), Mireya Plass (1)
(1) Biomedical Informatics, Pompeu Fabra University, Barcelona, Spain; (2) Catalan Institution for Research and Advanced Studies (ICREA), Barcelona, Spain
|
Abstract:
Two contradictory properties have been associated with conserved alternative exons: higher sequence conservation and a higher rate of non-synonymous substitutions, relative to constitutive exons. We provide evidence showing that most of the observed differences can be explained by the conservation of the transcript exonic structure. We also give a description of the evolution of splicing regulators in terms of positional conservation. These results provide evidence for a selection pressure linked to the regulation of splicing of the pre-mRNA.
Contact: eduardo.eyras [at] upf.edu
Keywords: Alternative Splicing, Regulation, Evolution
|
Poster B36
Computational Analysis of ROMA-derived Copy Number Variations in Cancer
J. Hicks (1), A. Krasnitz (1), B. Lakshmi (1), X. Zhao (1), L. Norton (2), T. Sorlie (3), A.-L. Borresen-Dale (3), A. Zetterberg (4), P. Lundin (4), M. Wigler (1)
(1) Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, USA; (2) Memorial Sloan-Kettering Cancer Center, New York, NY, USA; (3) Radium Hospital, Oslo, Norway; (4) Karolinska Institute, Stockholm, Sweden
|
Abstract:
Cancer genomes are known to exhibit wide-ranging copy number variations (CNV). In our ongoing study we performed a genome-wide determination of relative copy number for a collection of over 250 frozen-tissue samples from Scandinavian breast-cancer patients, using microarray-based ROMA technology. We examined these CNV data for association with clinical outcome and found a novel whole-genome measure which is a strong prognostic indicator uncorrelated with other clinical parameters. We show that CNV distribution in the genome can be used to guide a search for genetic elements involved in cancer.
Contact: krasnitz [at] cshl.edu
Keywords: ROMA, CNV, Cancer, Survival, Epicenters
|
Poster B38
CrossChip: A Tool for Correlating Gene Expression Data with Genomic Imbalances
Nicolas Delhomme, Natalia Becker, Felix Engel, Frederic Blond, Peter Lichter, Grischa Toedt
German Cancer Research Center, Division of Molecular Genetics (B060)
|
Abstract:
Microarray experiments have become a standard procedure in molecular genetics to detect alterations in gene expression patterns, imbalances of genomic regions or CpG island methylation status. In most cases, the relationship between the independently detected alterations is poorly understood. We developed an application (CrossChip) which first combines the resulting data of different microarray techniques, then assigns a correlation score to each feature, and finally visualizes the comparison results.
Contact: n.delhomme [at] dkfz.de
Keywords: Expression Profiling, Array CGH, Correlation
|
Poster B43 Late-Breaking Results
Predicting Novel Transcription Factor Binding Sites in Human Using a Machine Learning Approach
Sonya Liberman, Nir Friedman, Hanah Margalit
Hebrew University of Jerusalem, Israel
|
Abstract:
Transcription factors (TFs) regulate gene expression by binding to specific sequences on the DNA. At present only a partial map of the interactions between transcription factors and their targets in human is available. A major challenge is to expand the known repertoire of TF-target pairs by identifying novel Transcription Factor Binding Sites (TFBS) based on sequence data. One main difficulty in such computational predictions is the large number of false positives they generate. Here we examine the association of several sequence features with TFBS and show that they differ between true binding sites and similar sequences that are predicted as binding sites. These features include evolutionary conservation, distance from the transcription start site of the target gene, number of neighboring known binding sites of other transcription factors, number of neighboring sites with a similar sequence, and orientation of the transcription factor binding. Using machine learning approaches, we developed a computational scheme for TFBS prediction, in which sites predicted from sequence data are filtered and further classified according to these features. This results in a significant reduction in the number of false positive predictions and enables the construction of a more accurate transcription regulation network.
Contact: hanah [at] md2.huji.ac.il
|
|