Poster Abstracts K

Poster Abstracts for Category K: Sequence analysis

Poster K02
A Computer Program to Predict Gene Position and Reveal Inconsistent Designation of Similar Genes in Different Organisms
Marjanca Starcic Erjavec (1), Marko Erjavec (2), Darja Zgur-Bertok (1)
(1) Department of Biology, Biotechnical Faculty, University of Ljubljana; (2) UnistarLC

Abstract:
The DNA sequence is the foundation on which the structure and function of an organism largely depend. The computer program BLAST is used to find the most similar sequences. The presented program links GenBank data with BLAST results and draws a figure showing the sequence name, the percentage of similarity, and gene positions and designations. The figure can be used to unravel the positions of genes on the examined DNA sequence and to reveal different designations for genes with the same function in different organisms. Examples of the program's use are given.

Contact: Marjanca.Starcic.Erjavec [at] bf.uni-lj.si

Keywords: Gene Position and Designation, BLAST, GenBank

Poster K03
Detection and Reduction of Evolutionary Noise in Correlated Mutation Analysis
Orly Noivirt, Miriam Eisenstein, Amnon Horovitz
Weizmann Institute of Science

Abstract:
Direct or indirect inter-residue interactions in proteins are often reflected by mutations at one site that compensate for mutations at another site. Correlated mutation analysis for non-interacting proteins showed that the signal due to real interactions is of similar magnitude to the noise that arises from other evolutionary processes. A new method for detecting correlated mutations is presented that reduces the evolutionary noise by considering the evolutionary distances within the protein family.

Contact: orly.noivirt [at] weizmann.ac.il

Keywords: Coordinated Mutations, Co-evolving Residues
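
The K03 abstract does not state which correlation score is used. As a generic baseline only, and not the authors' method, the sketch below computes the mutual information between two columns of a multiple sequence alignment. The function name and the toy alignment are hypothetical, and no correction for evolutionary distance is applied here.

```python
import math
from collections import Counter


def column_mutual_information(msa, i, j):
    """Mutual information (in bits) between alignment columns i and j.

    msa is a list of equal-length aligned sequences (hypothetical input format).
    """
    n = len(msa)
    col_i = Counter(seq[i] for seq in msa)
    col_j = Counter(seq[j] for seq in msa)
    pairs = Counter((seq[i], seq[j]) for seq in msa)
    mi = 0.0
    for (a, b), count in pairs.items():
        p_ab = count / n
        # Compare the joint frequency with the product of the marginals.
        mi += p_ab * math.log2(p_ab / ((col_i[a] / n) * (col_j[b] / n)))
    return mi


# Toy alignment: columns 0 and 1 covary perfectly, columns 0 and 2 are independent.
toy_msa = ["AKL", "AKV", "GEL", "GEV"]
mi_covarying = column_mutual_information(toy_msa, 0, 1)
mi_independent = column_mutual_information(toy_msa, 0, 2)
```

A raw score of this kind is exactly what picks up the phylogenetic background the abstract calls evolutionary noise, which is why some distance-aware correction is needed on top of it.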

Poster K06
Mammalian microRNA Prediction using a Support Vector Machine Model of Sequence and Structure
Ying Sheng, Par G. Engstrom, Boris Lenhard
Computational Biology Unit, Bergen Computational Center of Science, University of Bergen, Norway

Abstract:
MicroRNAs are endogenous small noncoding RNAs with important regulatory roles in animals and plants. We present an efficient microRNA prediction method that uses sequence conservation profiles and secondary structure characteristics. The method predicts an extensive set of potential human and mouse microRNAs. We expect our predictions to contribute a significant number of new candidate miRNAs for experimental verification.

Contact: ying.sheng [at] bccs.uib.no

Keywords: microRNA, Support Vector Machine, Prediction

Poster K08
Classifying High-Level Protein Functionality: Toxins and Toxin-Like Proteins
Noam Kaplan (1,2), Michal Linial (2)
(1) Weizmann Institute; (2) The Hebrew University

Abstract:
Toxins are short proteins that appear in animal venom and are extremely varied in sequence, structure and function. We have developed a toxin classifier based on extracting sequence-derived features that are related to the notion of structural stability. Application of the classifier to the honey bee genome and to a set of recently discovered mouse sequences reveals three novel toxin-like proteins. Remarkably, these proteins seem to be expressed in the brain rather than in the venom. We suggest that the novel endogenous toxin-like proteins may function as neuromodulators via toxin-like mechanisms.

Contact: noam.kaplan [at] weizmann.ac.il

Keywords: Toxins, Machine Learning, Function Prediction

Poster K12
Built-in Switches Allow Versatility in Domain-Domain Interactions
Eyal Akiva, Hanah Margalit
Department of Molecular Genetics and Biotechnology, Faculty of Medicine, The Hebrew University

Abstract:
Certain domains have been identified as mediating homo-dimerization. As these domains are found also in proteins that function as monomers, a question arises as to what determines the oligomeric state of proteins having such domains. By comparing multiple sequence alignments of known dimers and monomers that contain such domains, we identify the domain segments that are responsible for this versatility. Analysis of our results in view of relevant solved structures provides insight into the molecular basis of this phenomenon and has implications for predicting the oligomeric states of proteins.

Contact: akiva2 [at] md.huji.ac.il

Keywords: Protein, Interaction, Domain, Dimer, Monomer

Poster K13
Augmenting Protein Sequence Alignment of Remote Homologues using Secondary Structure Confidence Scores
Yaniv Loewenstein, Elon Portugaly, Michal Linial
Hebrew University of Jerusalem, Israel

Abstract:
Incorporating sequence together with protein secondary structure has a synergistic effect on alignment quality and remote homology detection, without affecting the alignment speed. Yet this synergism vanishes when noisy structure prediction is used. Using Bayesian confidence levels in the alignment scoring function, we overcome this inherent problem and produce substantially better performance than current methods, even when no experimental structure is at hand. Our method is applicable to BLAST searches, better sequence profiles, and large-scale unsupervised classification of protein domains.

Contact: lonshy [at] cs.huji.ac.il

Keywords: Alignment, Confidence, Homology Detection

Poster K14
Using Co-occurrence of Transcription Factor Binding Sites for the Assessment of Regulatory Potential
Holger Klein, Martin Vingron
Max Planck Institute for Molecular Genetics

Abstract:
We present an approach for the detection of co-occurrence of transcription factor binding sites within known regulatory sequences. We annotate a set of upstream regions of human genes with predicted TFBSs based on a set of representative position weight matrices. We count co-occurring pairs of TFBSs using a sliding window and calculate a log-odds score of observed vs. expected number of pairs to identify significant combinations. To assess the co-occurrence scores we use known interactions of TFs. A way to use the introduced co-occurrence scores for promoter prediction is outlined.

Contact: holger.klein [at] molgen.mpg.de

Keywords: Gene Regulation, Co-occurrence of TFBS
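
The sliding-window counting and log-odds scoring described in the K14 abstract can be sketched roughly as follows. The abstract does not define the expected pair count, so this sketch assumes independence between factors across windows. The function name, the input format, the window and step sizes, and the pseudocount are all hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def cooccurrence_log_odds(hits_per_seq, seq_length, window=50, step=10, pseudocount=0.5):
    """Log2 odds of observed vs. expected TFBS pair counts over sliding windows.

    hits_per_seq: one list per upstream region of (position, tf_name) predictions.
    Expected counts assume the two factors occur independently (an assumption).
    """
    single = Counter()  # number of windows containing each factor
    pair = Counter()    # number of windows containing both factors of a pair
    n_windows = 0
    for hits in hits_per_seq:
        for start in range(0, seq_length - window + 1, step):
            present = sorted({tf for pos, tf in hits if start <= pos < start + window})
            n_windows += 1
            single.update(present)
            pair.update(combinations(present, 2))
    scores = {}
    for (a, b), observed in pair.items():
        expected = single[a] * single[b] / n_windows
        scores[(a, b)] = math.log2((observed + pseudocount) / (expected + pseudocount))
    return scores


# Toy data: SP1 and NFY sites lie close together in two regions, E2F occurs alone.
toy_hits = [
    [(10, "SP1"), (20, "NFY")],
    [(15, "SP1"), (25, "NFY")],
    [(80, "E2F")],
]
toy_scores = cooccurrence_log_odds(toy_hits, seq_length=100)
```

Pairs with a positive score co-occur more often than the independence model predicts. Overlapping windows count the same pair several times, so the scores are best read as relative rankings.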

Poster K15
Olfactory Receptors in Peptide Space
Assaf Gottlieb (1), Tsviya Olender (2), Doron Lancet (2), David Horn (1)
(1) School of Physics and Astronomy, Tel Aviv University, Israel; (2) Dept of Molecular Genetics, Weizmann Institute of Science, Israel

Abstract:
We use the Motif EXtraction algorithm (MEX) to extract specific peptides from Olfactory Receptor (OR) proteins of seven vertebrate species. The peptides trace evolutionary trends well. 567 peptides are found to be common to all tetrapod species, probably related to shared structure and function. While mammals share most of their peptides (3016), many peptides in non-mammalian species are species-specific. Within mammals, 1874 peptides are OR-family specific. Peptides that lie on the extracellular part of the ORs may be related to specific OR functions.

Contact: assafgot [at] post.tau.ac.il

Keywords: Olfactory Receptors, Motif Extraction

Poster K17
SIGI-HMM: Score-Based Prediction of Genomic Islands in Prokaryotic Genomes using HMMs
Oliver Keller (1), Katharina Surovcik (1), Thomas Brodag (1), Rainer Merkl (2), Roman Asper (1), Carsten Damm (3), Stephan Waack (1)
(1) Institut fur Informatik, Universitat Gottingen; (2) Institut fur Biophysik und Physikalische Biochemie, Universitat Regensburg; (3) Institut fur Numerische und Angewandte Mathematik, Universitat Gottingen

Abstract:
Horizontal gene transfer, i.e., the process of acquiring genes from foreign species, is a frequent phenomenon among microbial species. It is considered a strong evolutionary force allowing rapid adaptation to changing environmental demands. The transferred pieces of DNA often comprise a large number of genes that are found in contiguous regions called Genomic Islands (GIs). We have implemented the program SIGI-HMM, which predicts GIs and the putative donor of each individual alien gene using an HMM that takes codon usage into account.

Contact: keller [at] cs.uni-goettingen.de

Keywords: Genomic Islands, HGT, HMM, Codon Usage
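
To illustrate what a codon-usage score can look like, the sketch below sums per-codon log-odds between a candidate donor table and the host table. This is only the kind of per-gene signal such a model could build on; it is not the SIGI-HMM scoring scheme, and it contains no HMM. The function name, the frequency tables, and the floor value for unseen codons are hypothetical.

```python
import math


def codon_log_odds(gene, host_freq, donor_freq, floor=1e-6):
    """Sum of log(donor frequency / host frequency) over the codons of a gene.

    Positive values mean the gene's codon usage fits the donor table better.
    Codons missing from a table get the small floor frequency (an assumption).
    """
    usable = len(gene) - len(gene) % 3  # ignore a trailing partial codon
    codons = [gene[k:k + 3] for k in range(0, usable, 3)]
    return sum(
        math.log(donor_freq.get(c, floor) / host_freq.get(c, floor)) for c in codons
    )


# Toy tables covering two synonymous alanine codons only.
host_table = {"GCG": 0.6, "GCA": 0.4}
donor_table = {"GCG": 0.1, "GCA": 0.9}
alien_like = codon_log_odds("GCAGCAGCA", host_table, donor_table)
native_like = codon_log_odds("GCGGCGGCG", host_table, donor_table)
```

Scoring each gene against several candidate donor tables and keeping the best one would give a crude donor guess per gene; smoothing those calls along the genome is the part an HMM would handle.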

Poster K18
A New Approach for Estimation of Statistical Significance in Sequence Profile-Profile Comparisons
Mindaugas Margelevicius, Ceslovas Venclovas
Institute of Biotechnology, Lithuania

Abstract:
Sequence profile-profile comparison is one of the most sensitive techniques for distant homology detection. We propose a new approach to estimating the statistical significance of profile-profile alignments. Exhaustive testing showed that this approach, combined with a newly developed comparison procedure, compares favorably with a number of existing methods.

Contact: minmar [at] ibt.lt

Keywords: Profile Comparisons, Statistical Significance

Poster K20
An Algorithm for the Identification of Active Site Residues Using Pfam Alignments
Jaina Mistry, Alex Bateman, Rob Finn
Wellcome Trust Sanger Institute

Abstract:
Over 10% of the 8300 Pfam families are enzymatic, but within these families only a small fraction of the sequences have been biochemically characterized. We have developed an algorithm that takes experimentally determined active site residues contained within a family, and identifies the presence of these active site residue patterns within other family members. We compare our results to UniProt and the Catalytic Site Atlas. Using this algorithm we have predicted 606110 active site residues and have increased the number of active site annotations in Pfam more than 200-fold.

Contact: jm14 [at] sanger.ac.uk

Keywords: Catalytic, Active Site, Annotation, Function
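
The transfer step described in the K20 abstract can be sketched as a lookup through alignment columns. The matching rule used here, exact residue identity at every annotated column, is an assumption, since the abstract does not say how pattern presence is decided. The function name and the toy alignment are hypothetical.

```python
def transfer_active_sites(alignment, reference_id, active_positions):
    """Transfer annotated active site residues to other aligned family members.

    alignment: dict mapping sequence id to a gapped sequence of equal length.
    active_positions: 1-based ungapped positions in the reference sequence.
    Returns {sequence id: [(ungapped position, residue), ...]} for members
    that carry the same residue at every annotated column (assumed rule).
    """
    reference = alignment[reference_id]
    columns = {}  # alignment column -> active site residue in the reference
    position = 0
    for col, residue in enumerate(reference):
        if residue != "-":
            position += 1
            if position in active_positions:
                columns[col] = residue
    predictions = {}
    for seq_id, seq in alignment.items():
        if seq_id == reference_id:
            continue
        if all(seq[col] == residue for col, residue in columns.items()):
            # Convert each column back to an ungapped position in this member.
            predictions[seq_id] = [
                (len(seq[:col + 1].replace("-", "")), seq[col])
                for col in sorted(columns)
            ]
    return predictions


toy_alignment = {"ref": "MS-HDK", "a": "-SAHDR", "b": "MSAHEK"}
toy_predictions = transfer_active_sites(toy_alignment, "ref", {2, 4})
```

In the toy data, sequence "a" keeps both annotated residues and receives the annotation at its own coordinates, while "b" has a substitution at one annotated column and is skipped.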

Poster K23
Information Content and Sequence Periodicity
Alexander Bolshoy
Genome Diversity Center, University of Haifa

Abstract:
Here we report a novel method for sequence analysis based on information content. The method can conveniently indicate all kinds of periodicity and repeat-related features in a set of genomic DNA sequences. We illustrate the power of the method by studying the nucleosomal database of Ioshikhes et al. and intergenic regions of E. coli.

Contact: bolshoy [at] research.haifa.ac.il

Poster K24
DeltaProt: Toolbox for Molecular Comparison of Proteins based on Sequence Alignments
Steinar Thorvaldsen (1), Tor Fla (1), Nils P. Willassen (2)
(1) University of Tromso, Faculty of Science, Tromso, Norway; (2) Norwegian Structural Biology Centre, Tromso, Norway

Abstract:
We present statistical methods, trend-tests and visualisations that are useful when the protein sequences in alignments can be divided into two or more populations based on known phenotypic traits such as preference of temperature, pH, salt concentration or pressure. The algorithms have been successfully applied in the research on extremophile organisms.

Contact: steinart [at] math.uit.no

Keywords: Physicochemical Properties, Extremophiles

Poster K25
Assessment of Protein Domain Classifications: Automatic Sequence-Based Method and Methods based on 3D Structures
Elon Portugaly (1), Nathan Linial (1), Michal Linial (2)
(1) School of Computer Science and Engineering, The Hebrew University of Jerusalem; (2) Dept. of Biochemistry, Inst. of Life Sciences, The Hebrew University of Jerusalem

Abstract:
EVEREST is an automatic system that identifies and classifies domains within a database of protein sequences. We show that the set of EVEREST families is as similar to the set of CATH families and to the set of SCOP families as the latter two sets are similar to each other.

Contact: elonp-eccb06 [at] cs.huji.ac.il

Keywords: Automatic Annotation, Protein Domains

Poster K27
Enhancing Transmembrane Beta-Barrel Topology Prediction by Information Encoded in Multiple Sequence Alignments
Costas Tsirigos (1), Pantelis Bagos (1), Vasilis Promponas (1,2), Stavros Hamodrakas (1)
(1) Department of Cell Biology & Biophysics, Faculty of Biology, University of Athens, Greece; (2) Department of Biological Sciences, University of Cyprus, Nicosia, Cyprus

Abstract:
In this work we demonstrate that some common pitfalls of predictive algorithms (for example broken strands, false positives or dipping loops) may be resolved by using the positional conservation and gap distribution of multiple sequence alignments. Our results indicate that it is possible to use information encoded in multiple sequence alignments to post-process and enhance topology prediction for transmembrane beta-barrels. We also have preliminary data suggesting that positions in the alignment mainly occupied by gaps may accurately indicate shifted predictions.

Contact: vprobon [at] ucy.ac.cy

Keywords: TM Beta-Barrel, Multiple Alignment, Prediction

Poster K28
On the Kinetics of Prokaryotic Transcription Initiation
Johanna Weindl (1), Pavol Hanus (1), Zaher Dawy (2), Juergen Zech (3), Joachim Hagenauer (1), Jakob C. Mueller (3,4)
(1) Institute for Communications Engineering (LNT), Technical University of Munich (TUM); (2) Department of Electrical and Computer Engineering (ECE), American University of Beirut; (3) Institute for Medical Statistics and Epidemiology, Technical University of Munich (TUM); (4) Hertie-Institute for Clinical Brain Research, University Clinic Tuebingen

Abstract:
Transcription initiation in prokaryotes has been extensively studied in the past decades. However, little is known about the kinetics involved in promoter detection by the RNA polymerase (RNAP) and its sigma subunit. We present an approach to relate the binding energy between the sigma factor and the DNA to the kinetics of promoter detection during the first step of transcription. Results suggest that the sequence surrounding the promoters contains important information that guides the RNAP and its sigma subunit in a way that increases the probability of transcription initiation.

Contact: jweindl [at] tum.de

Keywords: Kinetics, Dynamics, Transcription Initiation

Poster K29 Late-Breaking Results
Towards Alignment Independent Quantitative Assessment of Homology Detection Methods
Avihay Apatoff (1), Eddo Kim (1), Yossef Kliger (2)
(1) The Mina and Everard Goodman Faculty of Life Sciences, Bar-Ilan University; (2) Compugen Ltd.

Abstract:
Identification of homologous proteins provides a basis for protein annotation. Sequence alignment tools reliably identify homologs sharing high sequence similarity. However, identification of homologs that share low sequence similarity remains a challenge. Lowering the cutoff value could enable the identification of diverged homologs, but also introduces numerous false hits. Methods are being continuously developed to minimize this problem. Estimation of the fraction of homologs in a set of protein alignments can help in the assessment and development of such methods. Herein, we present a computational approach that estimates the number of homologs in a set of protein pairs. The method requires a prevalent and detectable protein feature that is conserved between homologs. By analyzing the feature prevalence in a set of pairwise protein alignments, the method can estimate the number of homolog pairs in the set, independently of alignment quality. Using the HomoloGene database as a standard of truth, we implemented this approach in a proteome-wide analysis. The results revealed that this approach works well for estimating the number of homologous proteins over a wide range of homology values. In summary, the presented method can accompany homology searches and method development, provide validation of search results, and allow tuning of tools and methods.

Contact: kliger [at] compugen.co.il
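
One possible reading of the K29 estimation step is a two-component mixture: homolog pairs share the chosen feature at one rate, unrelated pairs at another, and the observed sharing rate lies in between. The sketch below solves that mixture for the homolog fraction. The abstract does not give the estimator, so the formula, the function name, and both rate parameters are assumptions.

```python
def estimate_homolog_fraction(shared_flags, p_homolog, p_random):
    """Estimate the fraction of homolog pairs from feature sharing.

    shared_flags: one boolean per protein pair, True if both proteins carry
    the feature. p_homolog and p_random are the assumed sharing rates among
    true homolog pairs and among unrelated pairs, respectively.
    Solves observed = x * p_homolog + (1 - x) * p_random for x.
    """
    observed = sum(shared_flags) / len(shared_flags)
    fraction = (observed - p_random) / (p_homolog - p_random)
    # Sampling noise can push the raw estimate outside [0, 1]; clamp it.
    return min(1.0, max(0.0, fraction))


# Toy set: 60 of 100 aligned pairs share the feature.
toy_flags = [True] * 60 + [False] * 40
toy_fraction = estimate_homolog_fraction(toy_flags, p_homolog=0.9, p_random=0.1)
```

Because only feature sharing enters the estimate, alignment scores play no role, which matches the alignment-independent framing of the abstract. The two rates would have to be calibrated on a trusted set, for example pairs taken from a curated homology database.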