NWO Special Year on Mathematical Biology, 2001

CWI Colloquium Crossroads of Mathematics, Informatics, and Life Sciences

Date:	Friday, 23 March 2001
Place:	CWI, Zaal Z009 Kruislaan 413 Amsterdam

Bioinformatic pattern analysis

As an activity within the NWO Special Year on Mathematical Biology, CWI is organising a seminar day in the field of Computational Molecular Biology, as part of the colloquium series Crossroads of Mathematics, Informatics, and Life Sciences.
More information on the other events of the NWO 2001 Special Theme Mathematical Biology can be found at http://www.cwi.nl/projects/NWO-jaarthema. More information on the series Crossroads of Mathematics, Informatics, and Life Sciences can be found at http://homepages.cwi.nl/~peletier/crossroads_gen.shtml.

Speakers

Jaap Heringa, Division of Mathematical Biology, National Institute for Medical Research, London, England.
Paulien Hogeweg, Bioinformatics Group, Department of Theoretical Biology and Bioinformatics, Utrecht University, The Netherlands.
Can Kesmir, Bioinformatics Group, Department of Theoretical Biology and Bioinformatics, Utrecht University, The Netherlands.
Thomas Lengauer, Institute for Algorithms and Scientific Computing, GMD-German National Research Center for Information Technology, Sankt Augustin, Germany.

Program

9.30h	Coffee and stroopwafels
10.00h	Opening
10.05h	Paulien Hogeweg: From Bioinformatic Pattern Analysis to Evolutionary Dynamics: two case studies
11.35h	Break
11.45h	Can Kesmir: Bioinformatic pattern analysis of antigen processing and presentation
12.35h	Lunch
13.45h	Jaap Heringa: Multiple sequence alignment in the post-genomic era: pitfalls, remedies and an application to domain boundary prediction
14.35h	Break
14.50h	Thomas Lengauer: Computational Biology at the brink of the post-genomic era
16.20h	Closing

Organisation

Monique Laurent, CWI
Lex Schrijver, CWI and UvA
Leen Stougie, TUE and CWI

Lunch is offered by CWI and there is no fee for participation. If you would like to join the lunch, please let us know by sending an e-mail to Nada Mitrovic (Nada.Mitrovic@cwi.nl).
For directions to CWI, see http://www.cwi.nl/about

Abstracts

From Bioinformatic Pattern Analysis to Evolutionary Dynamics: two case studies.
Paulien Hogeweg

Pattern analysis of DNA, RNA and protein sequences, whole genomes and expression data gives important insights into the dynamics of evolution.

Analysis of the mapping of RNA sequences to RNA secondary structure, and of the structure of the fitness landscape defined by this mapping, has led to a quantitative theory of evolutionary dynamics governed by neutral networks in an otherwise rugged landscape. I will discuss some of the highlights of this theory, the formalisms used to derive it, and some recent experimental support for it.

Examining whole genomes, of which about 50 are now available, we see that the gene content of genomes changes relatively rapidly: gene duplication, gene loss and gene generation are ubiquitous. Large-scale micro-array studies, in which the expression of every gene can be measured simultaneously, give a first glimpse of the 'division of labor' between duplicated genes. A preliminary analysis suggests that differential expression is often the primary event that allows duplicated genes to be maintained in a genome, but alternative routes also exist, most notably the mere need for a large amount of product on the one hand, and differentiation within multi-protein complexes consisting of homologous genes on the other. I will discuss these results in terms of multilevel evolution, in particular in terms of information integration and the alternatives of 'individual based' vs 'population based' diversity.

The tools used in analyzing expression data are still very crude. I will discuss some central questions in this respect.

Bioinformatic pattern analysis of antigen processing and presentation
Can Kesmir

T cells can detect the presence of intracellular pathogens because infected cells display on their surface peptide fragments derived from pathogenic proteins. Whether an antigenic peptide fragment can invoke a cytotoxic T cell response depends on three conditions. First, the peptide has to be generated as a product of cytolytic degradation by the proteasome. Second, it has to bind TAP with sufficient affinity to be transported into the endoplasmic reticulum. Finally, it has to have high affinity for the host major histocompatibility complex (MHC) Class I molecule.

In the talk, I will give an overview of recent work on understanding the generation and presentation of these antigenic peptides. Special emphasis will be placed on the specificity of the human (immuno)proteasome.

Multiple sequence alignment in the post-genomic era: pitfalls, remedies and an application to domain boundary prediction
Jaap Heringa

With about 70 complete genomes sequenced today, the magnitude and diversity of sequence data place much greater demands on the speed, sensitivity and versatility of sequence analysis programs. Multiple sequence alignment is one of the most important tools for making biological sense of sequence data. The most widely used methods are based on the dynamic programming (DP) protocol, which guarantees an optimal solution for the alignment of a pair of sequences. However, using this strategy for simultaneous multiple sequence alignment quickly becomes computationally prohibitive when more than a few sequences need to be aligned.
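
The pairwise DP protocol mentioned above can be illustrated with a minimal sketch of global alignment in the Needleman-Wunsch style. This is not code from the talk: the function name and the scoring values (match, mismatch and a linear gap penalty) are assumptions chosen only for illustration, and real alignment programs use substitution matrices and more refined gap penalties.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Globally align strings a and b.

    Returns (score, aligned_a, aligned_b). The scoring values are
    illustrative assumptions, not taken from the abstract.
    """
    n, m = len(a), len(b)

    def sub(i, j):
        # Score for aligning a[i-1] with b[j-1] (1-based table indices).
        return match if a[i - 1] == b[j - 1] else mismatch

    # score[i][j] holds the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(i, j),  # align two residues
                score[i - 1][j] + gap,            # gap in b
                score[i][j - 1] + gap,            # gap in a
            )

    # Trace back from the bottom-right cell to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


if __name__ == "__main__":
    print(needleman_wunsch("ACGT", "AGT"))
```

For two sequences of lengths n and m the table has (n+1)(m+1) cells. Extending the same recurrence to k sequences of length n requires on the order of n^k cells, which is why the simultaneous approach becomes prohibitive so quickly.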

A long-standing heuristic for circumventing this problem is progressive alignment, which applies the pairwise DP algorithm repeatedly, following a preconceived order, until all sequences are aligned. Although this is a reasonable strategy, it has many pitfalls, and alignment engines continue to struggle with sequences of low homology, internal sequence repeats, local similarity, long insertions/deletions, etc. In my talk I will focus on these problems and discuss our strategies to address them. How the information from a multiple alignment can be applied will be shown using a new method to predict protein domain boundaries based on multiple alignment and protein secondary structure prediction.
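
As a rough illustration of the "preconceived order" in progressive alignment, the sketch below scores all sequence pairs with pairwise DP and then derives a greedy joining order: it starts from the best-scoring pair and repeatedly adds the sequence most similar to one already included. This covers only the ordering step. The function names, the scoring values and the greedy rule are assumptions made for illustration. Practical programs typically build a guide tree and align profiles, and this is not the method presented in the talk.

```python
from itertools import combinations


def nw_score(a, b, match=1, mismatch=-1, gap=-2):
    """Global pairwise alignment score by DP, keeping only one table row.

    Scoring values are illustrative assumptions.
    """
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        cur = [i * gap]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            cur.append(max(diag, prev[j] + gap, cur[j - 1] + gap))
        prev = cur
    return prev[-1]


def progressive_order(seqs):
    """Return sequence indices in the order a greedy scheme would add them."""
    if len(seqs) < 2:
        return list(range(len(seqs)))

    # Score every pair once; store it under both index orders.
    scores = {}
    for i, j in combinations(range(len(seqs)), 2):
        scores[(i, j)] = scores[(j, i)] = nw_score(seqs[i], seqs[j])

    # Start with the most similar pair.
    first = max(combinations(range(len(seqs)), 2), key=lambda p: scores[p])
    order = list(first)
    remaining = set(range(len(seqs))) - set(order)

    # Repeatedly add the sequence closest to any already chosen sequence.
    while remaining:
        nxt = max(
            sorted(remaining),
            key=lambda k: max(scores[(k, o)] for o in order),
        )
        order.append(nxt)
        remaining.remove(nxt)
    return order


if __name__ == "__main__":
    print(progressive_order(["GATTACA", "GATTACA", "CCCCCCC", "GATTTCA"]))
```

Because each step commits to the alignments made so far, an early misalignment between poorly related sequences cannot be undone later, which is one source of the pitfalls described above.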

References:
Heringa, J. (1999) Two strategies for sequence comparison: profile-preprocessed and secondary structure-induced multiple alignment. Comput. Chem., 23, 341-364.
Notredame, C., Higgins, D.G., and Heringa, J. (2000) T-Coffee: A novel method for fast and accurate multiple sequence alignment. J. Mol. Biol., 302, 205-217.

Computational Biology at the brink of the post-genomic era
Thomas Lengauer

The year 2000 will be remembered in history as the year in which the human genome was sequenced. This marks the end of the pre-genomic era, which was characterized by strong world-wide efforts to sequence the human genome and, in fact, ended significantly ahead of schedule. Today, we are at the threshold of the probably much longer post-genomic era, which is characterized by the grand quest of making sense of the genomic text. This goal can only be achieved by a concerted effort involving biological experiments and computer analyses. Conquering the computer part is the task of the scientific field of computational biology or bioinformatics.

Here we will describe two facets of computational biology. One is that of a discipline shaped by several grand challenge basic research problems. The other is that of a field driven by a strong demand for immediate answers to pressing practical problems in biotechnology, notably in pharmaceutics and medicine.