Tutorial 2

Artificial intelligence techniques for bioinformatics

Tutorial delivered by:

Professor A. Narayanan
Professor of Artificial Intelligence
Director of Bioinformatics
School of Engineering and Computer Science
University of Exeter

E-mail: A.Narayanan@ex.ac.uk

Co-authored by:

Dr Ed Keedwell
Research Fellow
School of Engineering and Computer Science
University of Exeter

E-mail: e.c.keedwell@exeter.ac.uk
Webpage: http://www.ex.ac.uk/~eckeedwe

Professor A. Narayanan (A.Narayanan@ex.ac.uk) was appointed Lecturer in Computer Science at the University of Exeter in 1980 and has taught various modules in computer science (artificial intelligence and machine learning techniques), cognitive science (mind/brain issues) and philosophy (philosophy of mind and language). He designed and developed the MSc/MRes programme in Bioinformatics in 1999. He teaches machine learning techniques for bioinformatics and bioethics on that programme. His CV can be found at http://www.dcs.ex.ac.uk/~anarayan/. He has published several papers on the application of artificial intelligence techniques to bioinformatics. His recent professional activity includes being an advisory board member for the 2004 IEEE Symposium on Computational Intelligence in Bioinformatics and Computational Biology (CIBCB04) and the 2004 IEEE International Conference on Intelligent Data Engineering and Automated Learning (IDEAL’04).

Dr Keedwell is a researcher in the School of Engineering and Computer Science whose PhD thesis concerned a neural-genetic model for gene expression analysis. He has several publications on the application of neural networks and genetic algorithms. He is co-author with Professor Narayanan of ‘Intelligent Bioinformatics’, a book to be published by Wiley in 2005. Further details can be found at http://www.ex.ac.uk/~eckeedwe.

The presenters gave a four-hour tutorial on Machine Learning Techniques for Bioinformatics at the Intelligent Systems for Molecular Biology conference (ISMB03) in Brisbane, Australia to an audience of 150 delegates. The slides from that tutorial can be found at http://www.dcs.ex.ac.uk/~anarayan/ismb03/ismb_tutorial.ppt to provide an indication of the style and quality of presentation for this proposed tutorial.

Expected Goals, Objectives and Motivation:

There is growing interest in the application of artificial intelligence (AI) techniques in bioinformatics. In particular, there is an appreciation that many bioinformatics problems need to be addressed in new ways, given either the intractability of current approaches or the lack of an informed and intelligent way to exploit biological data. As an instance of the latter, there is an urgent need to identify new methods for extracting gene and protein networks from the rapidly proliferating gene expression and proteomic datasets. As an instance of the former, predicting the way a protein folds from first principles may well be feasible with some algorithms for protein sequences of 20 or so amino acids, but once the sequences reach a biologically plausible length (200 or 300 amino acids or more), current first-principles protein folding algorithms rapidly become intractable.
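To give a feel for why exhaustive first-principles search becomes intractable as sequences lengthen, the following sketch counts candidate conformations under a deliberately crude assumption that is ours, made only for illustration: every residue independently adopts one of a fixed number of backbone states (3 here). The function name and the figure of 3 are hypothetical and are not taken from any particular folding algorithm.

```python
def conformation_count(n_residues: int, states_per_residue: int = 3) -> int:
    """Number of candidate conformations if each residue independently
    takes one of `states_per_residue` states (an illustrative assumption)."""
    return states_per_residue ** n_residues


for n in (20, 200, 300):
    count = conformation_count(n)
    # Report the order of magnitude rather than the full integer.
    magnitude = len(str(count)) - 1
    print(f"{n} residues: roughly 10^{magnitude} candidate conformations")
```

Under this assumption the count grows exponentially with sequence length, which is the sense in which enumeration stops being practical long before sequences reach 200 or 300 residues.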

AI is an area of computer science that has been around since the 1950s and deals with problems considered intractable by computer scientists through the use of heuristics and probabilistic approaches. AI approaches excel when dealing with problems where there is no requirement for ‘the absolutely provably correct or best’ answer (a ‘strong’ constraint) but where, rather, the requirement is for an answer which is better than one currently known or which is acceptable within certain defined constraints (a ‘weak’ constraint). Given that many problems in bioinformatics do not have strong constraints, there is plenty of scope for the application of AI techniques to a number of bioinformatics problems.
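The distinction between ‘strong’ and ‘weak’ constraints can be illustrated with a minimal hill-climbing sketch. It accepts any candidate that is better than the best one currently known and makes no claim to find the provably best answer. The toy scoring problem (maximising the number of 1s in a bit string) and all function names are hypothetical, chosen only to keep the example self-contained.

```python
import random


def hill_climb(score, start, neighbours, max_steps=1000, rng=None):
    """Move to a randomly chosen neighbour whenever it scores higher.

    Returns the best candidate found and its score; this need not be
    the global optimum.
    """
    rng = rng or random.Random(0)  # fixed seed so runs are repeatable
    best, best_score = start, score(start)
    for _ in range(max_steps):
        candidate = rng.choice(neighbours(best))
        candidate_score = score(candidate)
        # Weak constraint: accept anything better than what is known.
        if candidate_score > best_score:
            best, best_score = candidate, candidate_score
    return best, best_score


def count_ones(bits):
    """Toy objective: the number of 1s in the bit string."""
    return sum(bits)


def flip_one_bit(bits):
    """All bit strings that differ from `bits` in exactly one position."""
    return [bits[:i] + (1 - bits[i],) + bits[i + 1:] for i in range(len(bits))]


start = (0,) * 12
best, best_score = hill_climb(count_ones, start, flip_one_bit, max_steps=50)
print(best, best_score)
```

With a small step budget the search may stop short of the optimum, yet it still returns an answer at least as good as its starting point, which is all a weak constraint demands.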

The aim of this tutorial is to introduce researchers in adaptive and natural algorithms to these fundamental problems in bioinformatics and computational biology and to the AI techniques that can be applied to them, in particular neural networks, symbolic machine learning, genetic algorithms, genetic programming and cellular automata. One of the intriguing aspects of using evolutionary computation techniques is the rather philosophically appealing idea of applying to biological problems techniques from AI which have themselves been influenced by developments in biology.

The objectives of the tutorial are to ensure that participants will have a basic knowledge of current problems in bioinformatics and computational biology, as well as the knowledge and skills to apply artificial intelligence techniques to biological data and to evaluate bioinformatics problems for their potential analysis by natural and adaptive algorithms.

Detailed outline of the presentation

The tutorial will consist of three parts. First, the basics of molecular biology will be introduced, informed by the latest discoveries in genomics, spliceosomics, transcriptomics and proteomics. Next, a variety of AI approaches to problems in these areas will be described, including classical symbolic machine learning techniques (nearest neighbour and identification tree approaches), supervised and unsupervised neural networks, and evolutionary computation techniques (genetic algorithms, genetic programming, cellular automata). Finally, novel hybrid methods will be introduced, including genetic neural networks and symbolically informed neural networks. The tutorial will be delivered at the pace the audience requires, with questions during presentation being encouraged. Problems and application areas will include: secondary structure protein folding prediction; viral protease cleavage prediction; cancer gene expression data mining; temporal gene expression data analysis; multiple sequence alignment; reverse engineering gene regulatory networks.

Tutorial material

The material for the tutorial will be based on a paper, ‘Artificial intelligence techniques for Bioinformatics’, written by the presenters and recently published in Applied Bioinformatics (available from http://www.dcs.ex.ac.uk/~anarayan/publications/).

Slides and examples will be taken from this paper, supplemented with further material from our research papers and the forthcoming book, Intelligent Bioinformatics by Narayanan and Keedwell (Wiley, 2005). The tutorial presenters will use See5 and SNNS (Stuttgart Neural Network Simulator) for demonstration purposes, as well as other purpose-built genetic algorithm and evolutionary computation software. Tutorial attendees will receive free copies of slides.

Target audience

There is currently a great deal of interest among computer scientists concerning the application of genetic algorithms, neural networks and machine learning techniques to bioinformatics problems. The tutorial will introduce the basics of molecular biology and then provide examples of how AI techniques can be used to help solve the problems of analysing gene expression data, reverse engineering gene regulatory networks, forming multiple alignments and predicting protein structure. Examples will be taken from the bioinformatics literature and the research programmes of the tutorial presenters. The tutorial audience will therefore immediately see the relevance of these techniques to bioinformatics problems. Additionally, new ‘discoveries’ made by these techniques will be presented to demonstrate the value of applying AI and machine learning techniques to a variety of bioinformatics problems, including alternative models to the standard theory of cancer and the identification of new drug targets.