Computer Science 245/345
Introduction to Bioinformatics

Summer 2016: June 6 to June 24
Monday through Friday: 10:00 am to 1:00 pm

Helpful links


Biology is not a prerequisite for this course, but an interest in biology is. I will spend the first two lectures of the course going over some notions in biology. See Biology Terms.

Information about the Instructor

Name: Sami Khuri, visiting professor from San Jose State University
Office: 335W
Phone: TBD
Office Hours: TBD

Catalog Description

The course starts with a brief introduction to molecular biology and then investigates the main algorithms used in Bioinformatics. After a brief description of commonly used tools, algorithms, and databases in Bioinformatics, the course describes specific tasks that can be completed using combinations of these tools and databases. The course then focuses on the algorithms behind the most successful tools, such as the local and global sequence alignment packages: BLAST, Smith-Waterman, and Needleman-Wunsch. Lecture topics include Hidden Markov Models for pattern recognition, profile-based searches, phylogenetic tree construction, and transmembrane protein structure prediction.
The course is self-contained and does not assume any background knowledge in biology, although an interest in molecular biology is helpful.
The course will also be complemented by hands-on computer lab sessions that allow students to practice with some of the major tools and databases. Students will solve hands-on problems on HIV, the BRCA1 gene, Thalassemia, MYH, and other topics.
Students will be given projects that will have to be completed and submitted by midnight (California time) on July 29, 2016.


To acquaint students with some of the most challenging problems in life science and show how computer science can be used to better understand and, in some cases, solve some of these problems.


Learning Outcome

Upon successful completion of this course, students should be able to use dynamic programming for pairwise alignment and (to some extent) for RNA secondary structure prediction (Nussinov's algorithm), understand how multiple sequence alignment algorithms work, have a clear understanding of phylogenetic tree algorithms, and know various databases for DNA and protein sequences. They should also be able to assess and evaluate novel computational methods for use in bioinformatics, including machine learning techniques (mainly Hidden Markov Models) and pattern recognition techniques.
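
As an illustration of dynamic programming for pairwise alignment, here is a minimal Python sketch of Needleman-Wunsch global alignment. It is not taken from the course material: the function name and the scoring values (match = 1, mismatch = -1, linear gap penalty = -2) are assumptions chosen only for the example, and real tools use substitution matrices and more refined gap models.

```python
def needleman_wunsch(seq_a, seq_b, match=1, mismatch=-1, gap=-2):
    """Globally align two sequences; return (score, aligned_a, aligned_b).

    The scoring parameters are illustrative assumptions.
    """
    rows, cols = len(seq_a) + 1, len(seq_b) + 1

    def substitution(i, j):
        # Score for aligning seq_a[i-1] with seq_b[j-1].
        return match if seq_a[i - 1] == seq_b[j - 1] else mismatch

    # score[i][j] = best score for aligning seq_a[:i] with seq_b[:j].
    score = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        score[i][0] = i * gap          # seq_a prefix aligned to gaps only
    for j in range(1, cols):
        score[0][j] = j * gap          # seq_b prefix aligned to gaps only

    for i in range(1, rows):
        for j in range(1, cols):
            score[i][j] = max(
                score[i - 1][j - 1] + substitution(i, j),  # align two residues
                score[i - 1][j] + gap,                     # gap in seq_b
                score[i][j - 1] + gap,                     # gap in seq_a
            )

    # Trace back from the bottom-right corner to recover one optimal alignment.
    aligned_a, aligned_b = [], []
    i, j = len(seq_a), len(seq_b)
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + substitution(i, j):
            aligned_a.append(seq_a[i - 1])
            aligned_b.append(seq_b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            aligned_a.append(seq_a[i - 1])
            aligned_b.append("-")
            i -= 1
        else:
            aligned_a.append("-")
            aligned_b.append(seq_b[j - 1])
            j -= 1

    return score[-1][-1], "".join(reversed(aligned_a)), "".join(reversed(aligned_b))


if __name__ == "__main__":
    best, top, bottom = needleman_wunsch("ACGT", "AGT")
    print(best)
    print(top)
    print(bottom)
```

The table is filled in time proportional to the product of the two sequence lengths. When several alignments share the optimal score, this traceback returns only one of them (it prefers the diagonal move, then a gap in the second sequence).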

Lecture Material and Schedule

Recommended Textbooks (Not required)

Copies of lecture notes, hands-on exercises and case studies for all classes, from Monday, June 6, to Friday, June 24, 2016, can be found here.

Cover Sheet

Lecture Notes: 1) Biology Review; 2) What is Bioinformatics?
Biology Motivation: Transcription and Translation
Algorithm: Transcription Translation Algorithms
Hands-On Exercise: ONE: Human Traits; TWO: Interesting Problems; THREE: Transcription-Translation & NCBI
Additional Links: Biology Terms; Basic Genetics; Interactive Explore; A Sunny Day (video 6.1 MB); Translation (video 2.0 MB); Translation (YouTube); Transcription & Translation (YouTube)
Articles: Luscombe et al.

Lecture Notes: 3) Pairwise Sequence Alignment
Biology Motivation: Pairwise Sequence Alignment
Algorithm: DP for Alignment Algorithm
Hands-On Exercise: FOUR: Investigating Inherited Diseases; HandsOn_Sequences [txt]
Additional Links: DP for Alignment Problems (visualization in Java); NCBI Tutorials
Articles: Needleman et al.

Lecture Notes: 4) Multiple Sequence Alignment
Biology Motivation: Multiple Sequence Alignment
Algorithm: CLUSTAL Algorithm
Hands-On Exercise: FIVE: Multiple Sequence Alignment
Additional Links: CLUSTALW
Articles: Higgins_2007

Lecture Notes: 5) Phylogenetic Trees
Biology Motivation: Phylogenetic Tree Construction
Algorithm: UPGMA Algorithm
Hands-On Exercise: SIX: Phylogenetic Tree
Additional Links: UPGMA: An Example
Articles: Baldauf

The Big Jaw; The Big Jaw Case Study
Hands-On Exercise: SEVEN: The Big Jaw; myh16_Sequences [txt]
Additional Links: Supplementary Info
Articles: Stedman et al.

Beta Thalassemia; Beta Thalassemia Case Study
Hands-On Exercise: EIGHT: Beta Thalassemia; beta_globin_sequence [pdf]; beta_globin_sequence [doc]
Additional Links: Armenian Association; Videos on Thalassemia
Articles: Cao et al.; Treisman et al.

PAX6; PAX6 Case Study
Hands-On Exercise: NINE: PAX6; zebrafish_pax6_protein_fasta [txt]
Articles: Cooper et al.

BRCA1; BRCA1 Case Study
Hands-On Exercise: TEN: BRCA1 Gene & Protein; sequences_BRCA1 [txt]
Additional Links: Familial vs Sporadic; Cancer Genetics; Genetics Home Reference

Origins of HIV; Origins of HIV Case Study
Hands-On Exercise: ELEVEN: HIV Genome; hiv_sequence [txt]; Genome Maps [pdf]; TWELVE: Origins of HIV; env_protein_sequences [txt]; gag_protein_sequences [txt]; pol_protein_sequences [txt]
Additional Links: The Nine Genes of HIV
Articles: Rambaut et al.

Lecture Notes: 6) Motifs and Logos
Biology Motivation: Motif Detection
Hands-On Exercise: THIRTEEN: Detecting Motifs; pwm [Excel]; Excel Functions [pdf]; FOURTEEN: Constructing PWM; Hands_On_14_Skeleton_1 [Excel]; Hands_On_14_Skeleton_2 [Excel]; FIFTEEN: Scoring Short Sequences with PWM

Lecture Notes: 7) Hidden Markov Models
Biology Motivation: Hidden Markov Models
Hands-On Exercise: SIXTEEN: Arabidopsis; arabidopsis_rad1_genomic [txt]; arabidopsis_rad1_cDNA [txt]; splice_site [pdf]
Articles: Lawrence Rabiner; Sean Eddy

Lecture Notes: 8) Transmembrane Protein Structure Prediction
Biology Motivation: Transmembrane Protein Structure Prediction
Hands-On Exercise: EIGHTEEN: CFTR Protein; CFTR_Screening_Mary_Tom [txt]

Lecture Notes: Programming in Python
Biology Motivation: Bioinformatics Problems
Algorithm: Python Coding Style
Hands-On Exercise: NINETEEN: Manipulating DNA Sequences; Programs for Hands-On 19; TWENTY: Transcription, Translation, and GC Content; Programs for Hands-On 20
Additional Links: 1) Install Python on Windows; 2) Copy, Paste, Edit, Run Program in Codeskulptor
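
To give a flavor of the kind of Python program the last two hands-on topics involve (GC content, transcription, translation), here is a minimal sketch assuming Python 3. It is not one of the course's "Programs for Hands-On" files: the function names are hypothetical, the input is assumed to contain only the letters A, C, G, and T (or U), and the codon table is the standard genetic code written in the usual T, C, A, G ordering.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def gc_content(dna):
    """Return the fraction of G and C bases in a DNA string (0.0 if empty)."""
    dna = dna.upper()
    if not dna:
        return 0.0
    return (dna.count("G") + dna.count("C")) / len(dna)


def transcribe(coding_strand):
    """Return the mRNA matching a DNA coding strand (T replaced by U)."""
    return coding_strand.upper().replace("T", "U")


def translate(mrna):
    """Translate an mRNA from its first base, stopping at the first stop codon.

    Trailing bases that do not form a full codon are ignored. A codon with
    any letter other than A, C, G, U/T raises KeyError in this sketch.
    """
    dna = mrna.upper().replace("U", "T")
    protein = []
    for start in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[start:start + 3]]
        if amino_acid == "*":      # stop codon
            break
        protein.append(amino_acid)
    return "".join(protein)


if __name__ == "__main__":
    gene = "ATGGCCTAA"
    print(gc_content(gene))
    print(transcribe(gene))
    print(translate(transcribe(gene)))
```

The sketch reads a single frame starting at position 0; locating the correct start codon or scanning all six reading frames is left out on purpose.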

Course Requirements

Term Project

There will be a group project. Each group consists of two students, who choose one topic and write a term project on it. The group may choose only one of the five suggested topics. Alternatively, a group may propose its own project by submitting a one-page proposal describing the project by noon on Friday, June 16. If the proposal is accepted, the group needs to submit its work by the deadline. The term project is due by 11:59 pm PST on Friday, July 29, 2016.
The cover sheet for the project (pdf).
The cover sheet for the project (doc).


Final Exam: In-class, closed-book and comprehensive. Date: Friday, June 24, 2016.
Review sheet for Final Exam.

Grading Policy

The final grade will be computed as shown below:

Hands-On Exercises 30%
Term-project 30%
Final Exam 40%

Note: Students get full credit on Hands-On Exercises as long as they are in class during the discussion of the solutions. Students marked absent who want credit for the exercises solved and discussed during their absence must hand in a hard copy of detailed solutions to those exercises within 2 days of the absence. This is not meant as a punishment, but rather to make sure that students do not fall behind.

[97, 100] A+
[93, 97) A
[90, 93) A-
[87, 90) B+
[82, 87) B
[80, 82) B-
[77, 80) C+
[72, 77) C
[70, 72) C-
[67, 70) D+
[62, 67) D
[60, 62) D-
[0, 60) F
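
The weights and intervals above can be read as a short calculation. The Python sketch below is only an illustration, not an official grade calculator: it assumes each component is scored from 0 to 100 and that no rounding is applied before the letter grade is looked up, and the names `final_score` and `letter_grade` are hypothetical.

```python
# Component weights in percent, as listed in the grading policy.
WEIGHTS = {"hands_on": 30, "term_project": 30, "final_exam": 40}

# Lower bound of each interval, highest first; anything below 60 is an F.
GRADE_CUTOFFS = [
    (97, "A+"), (93, "A"), (90, "A-"),
    (87, "B+"), (82, "B"), (80, "B-"),
    (77, "C+"), (72, "C"), (70, "C-"),
    (67, "D+"), (62, "D"), (60, "D-"),
]


def final_score(hands_on, term_project, final_exam):
    """Weighted course score, assuming each component is on a 0-100 scale."""
    total = (
        WEIGHTS["hands_on"] * hands_on
        + WEIGHTS["term_project"] * term_project
        + WEIGHTS["final_exam"] * final_exam
    )
    return total / 100


def letter_grade(score):
    """Map a 0-100 score to a letter using the half-open intervals above."""
    if not 0 <= score <= 100:
        raise ValueError("score must be between 0 and 100")
    for cutoff, letter in GRADE_CUTOFFS:
        if score >= cutoff:
            return letter
    return "F"


if __name__ == "__main__":
    score = final_score(hands_on=100, term_project=90, final_exam=85)
    print(score, letter_grade(score))
```

For example, full credit on the hands-on exercises, 90 on the term project, and 85 on the final exam give 30 + 27 + 34 = 91, which falls in [90, 93).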
