| Date: | Friday, 23 March 2001 |
| Place: | CWI, Zaal Z009, Kruislaan 413, Amsterdam |
| 9.30h | Coffee and stroopwafels |
| 10.00h | Opening |
| 10.05h | Paulien Hogeweg: From Bioinformatic Pattern Analysis to Evolutionary Dynamics: two case studies |
| 11.35h | Break |
| 11.45h | Can Kesmir: Bioinformatic pattern analysis of antigen processing and presentation |
| 12.35h | Lunch |
| 13.45h | Jaap Heringa: Multiple sequence alignment in the post-genomic era: pitfalls, remedies and an application to domain boundary prediction |
| 14.35h | Break |
| 14.50h | Thomas Lengauer: Computational Biology at the brink of the post-genomic era |
| 16.20h | Closing |
From Bioinformatic Pattern Analysis to Evolutionary Dynamics: two case studies
Paulien Hogeweg
Pattern analysis of DNA, RNA and protein sequences, whole genomes and expression data gives important insights into the dynamics of evolution.
Analysis of the mapping of RNA sequences to RNA secondary structure, and of the structure of the fitness landscape defined by this mapping, has led to a quantitative theory of evolutionary dynamics governed by neutral networks in an otherwise rugged landscape. I will discuss some of the highlights of this theory, the formalisms used to derive it, and some recent experimental support for it.
Examining whole genomes, of which about 50 are now available, we see that the gene content of genomes is changing relatively rapidly: gene duplication, gene loss and gene generation are ubiquitous. Large-scale micro-array studies, in which the expression of every gene can be measured simultaneously, give a first glimpse of the 'division of labor' between duplicated genes. A preliminary analysis suggests that differential expression is often the primary event that allows duplicated genes to be maintained in a genome, but alternative routes also exist: most notably, on the one hand, the mere need for a large amount of product and, on the other hand, differentiation within multi-protein complexes consisting of homologous genes. I will discuss these results in terms of multilevel evolution, in particular in terms of information integration and the alternatives of 'individual based' vs 'population based' diversity.
The tools used in analyzing expression data are still very crude. I will discuss some central questions in this respect.
Bioinformatic pattern analysis of antigen processing and presentation
Can Kesmir
T cells can detect the presence of intracellular pathogens because infected cells display on their surface peptide fragments derived from pathogenic proteins. Whether an antigenic peptide fragment can invoke a cytotoxic T cell response depends on three steps. First, the peptide has to be generated as a product of cytolytic degradation by the proteasome. Then it has to bind TAP with sufficient affinity to be carried to the endoplasmic reticulum. Finally, the peptide has to have high affinity for the host major histocompatibility complex (MHC) Class I molecule.
In the talk, I will give an overview of recent work on understanding the generation and presentation of these antigenic peptides. Special emphasis will be on the specificity of the human (immuno)proteasome.
Multiple sequence alignment in the post-genomic era: pitfalls, remedies and an application to domain boundary prediction
Jaap Heringa
With about 70 complete genomes sequenced today, the magnitude and diversity of sequence data place much greater demands on the speed, sensitivity and versatility of sequence analysis programs. Multiple sequence alignment is one of the most important tools for making biological sense of sequence data. The most widely used methods are based on the dynamic programming (DP) protocol, which gives a guaranteed best solution for the alignment of a pair of sequences. However, using this strategy for simultaneous multiple sequence alignment quickly becomes computationally prohibitive when more than a few sequences need to be aligned.
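As an illustration of the pairwise DP protocol mentioned above, here is a minimal sketch of global alignment in the style of Needleman and Wunsch. It is not the speaker's implementation; the function name `global_alignment`, the scoring values (match +1, mismatch -1, gap -2) and the linear gap penalty are assumptions chosen only for the example.

```python
def global_alignment(a, b, match=1, mismatch=-1, gap=-2):
    """Global pairwise alignment by dynamic programming.

    The scoring scheme (match, mismatch, linear gap penalty) is an
    illustrative assumption, not taken from the abstract.
    Returns (score, aligned_a, aligned_b).
    """
    n, m = len(a), len(b)
    # score[i][j] holds the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    # Fill the table: each cell depends on its three neighbours.
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + sub,  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,      # gap in b
                score[i][j - 1] + gap,      # gap in a
            )

    # Trace back from the bottom-right corner to recover one optimal alignment.
    top, bottom = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            sub = match if a[i - 1] == b[j - 1] else mismatch
            if score[i][j] == score[i - 1][j - 1] + sub:
                top.append(a[i - 1])
                bottom.append(b[j - 1])
                i -= 1
                j -= 1
                continue
        if i > 0 and score[i][j] == score[i - 1][j] + gap:
            top.append(a[i - 1])
            bottom.append("-")
            i -= 1
        else:
            top.append("-")
            bottom.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(top)), "".join(reversed(bottom))


if __name__ == "__main__":
    s, x, y = global_alignment("ACGT", "AGT")
    print(s)
    print(x)
    print(y)
```

For two sequences of length L the table has (L+1)^2 cells. Extending the same recurrence to N sequences simultaneously requires a table of (L+1)^N cells, which is why the exact approach becomes impractical beyond a few sequences.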
A long-standing heuristic around this problem is progressive alignment, which applies the pairwise DP algorithm repeatedly, in a predetermined order, until all sequences are aligned. Although this is a reasonable strategy, it has many pitfalls, leading to ongoing problems for alignment engines when faced with sequences of low homology, internal sequence repeats, local similarity, long insertions/deletions, etc. In my talk I will focus on these problems and discuss our strategies to address them. I will show how the information from a multiple alignment can be applied, using a new method that predicts protein domain boundaries from multiple alignment and protein secondary structure prediction.
References:
Heringa, J. (1999) Two strategies for sequence comparison: profile-preprocessed and secondary structure-induced multiple alignment. Comput. Chem., 23, 341-364.
Notredame, C., Higgins, D.G., and Heringa, J. (2000) T-Coffee: A novel method for fast and accurate multiple sequence alignment. J. Mol. Biol., 302, 205-217.
Computational Biology at the brink of the post-genomic era
Thomas Lengauer
The year 2000 will be remembered in history as the year in which the human genome was sequenced. This marks the end of the pre-genomic era, which was characterized by strong world-wide efforts to sequence the human genome and which, in fact, ended significantly ahead of schedule. Today, we are at the entry of the probably much longer post-genomic era, which is characterized by the grand quest of making sense of the genomic text. This goal can only be achieved by a concerted effort involving biological experiments and computer analyses. Conquering the computer part is the task of the scientific field of computational biology, or bioinformatics.
Here we will describe two facets of computational biology. One is that of a discipline shaped by several grand challenge basic research problems. The other is that of a field driven by a strong demand for immediate answers to pressing practical problems in biotechnology, notably in pharmaceutics and medicine.