Computational Genomics Specialization Area

Computational Genomics entails efforts to digest the daunting quantity of genomic and proteomic data now available through the systematic development and application of probability and statistics theories, information technologies and data mining techniques. Linguistic methods are viewed as promising tools for elucidating sequence-structure-function relations and complementing computational genomics studies. Computational genomics targets understanding gene/protein function, identifying and characterizing cellular regulatory networks and discerning the link between genes and diseases. Discovery and processing of this information is pivotal in the development of novel gene therapy strategies and tools.

Life Science Electives

Specialization Electives (3 credits/9 units)

Molecular Evolution

Sequencing technology is continually progressing, and genome sequences from different species and populations continue to become available in increasing numbers. Such data allows questions about molecular function and evolution to be addressed in new and exciting ways. This course introduces students to the evolutionary analysis of DNA and amino acid sequences. Lectures on theory will be accompanied by practical instruction in the use of contemporary computational methods and software. Topics include: population genetics of selection and mutation, models of sequence evolution, phylogenetic models, analysis of multiple sequence alignments for rates and patterns of divergence, inference of natural selection, and co-evolution between proteins. Emphasis is placed on quantitative modeling and the biological principles underlying observed patterns of molecular evolution. Interested students should have a basic grasp of molecular biology and calculus.

Computational Molecular Biology and Genomics

An advanced introduction to computational molecular biology, using an applied algorithms approach. The first part of the course will cover established algorithmic methods, including pairwise sequence alignment and dynamic programming, multiple sequence alignment, fast database search heuristics, hidden Markov models for molecular motifs and phylogeny reconstruction. The second part of the course will explore emerging computational problems driven by the newest genomic research. Course work includes four to six problem sets, one midterm, and a final exam. A project based on recent results from the genomics literature will be required of students taking 03-711.

Computational Methods for Biological Modeling and Simulation

This course covers a variety of computational methods important for modeling and simulation of biological systems. It is intended for graduate students and advanced undergraduates with either biological or computational backgrounds who are interested in developing computer models and simulations of biological systems. The course will emphasize practical algorithms and algorithm design methods drawn from various disciplines of computer science and applied mathematics that are useful in biological applications. The general topics covered will be models for optimization problems, simulation and sampling, and parameter tuning. Course work will include problem sets with significant programming components and independent or group final projects.

Advanced Topics in Computational Genomics

Research in biology and medicine is undergoing a revolution due to the availability of high-throughput technology for probing various aspects of a cell at a genome-wide scale. Next-generation sequencing technology allows researchers to inexpensively generate a large volume of genome sequence data. In combination with various other high-throughput techniques for the epigenome, transcriptome, and proteome, we have unprecedented opportunities to answer fundamental questions in cell biology and understand disease processes with the goal of finding treatments in medicine. The challenge in this new genomic era is to develop computational methods for integrating different data types and extracting complex patterns accurately and efficiently from a large volume of data. This course will discuss computational issues arising from high-throughput techniques recently introduced in biology, and cover very recent developments in computational genomics and population genetics, including genome structural variant discovery, association mapping, epigenome analysis, cancer genomics, and transcriptome analysis. The course material will be drawn from very recent literature. Grading will be based on weekly written critiques of the papers to be discussed in class, class participation, and a final project. The course assumes a basic knowledge of machine learning and computational genomics.

Automation of Biological Research

Biology has been revolutionized by automated methods for generating large amounts of data on diverse biological processes. This, in addition to the finding that many more components are involved in each process than had earlier been thought, has led to a transition from a reductionist paradigm of biological research involving detailed study of single molecules or events to a systems biology paradigm involving comprehensive, systematic studies combined with computational data analysis. Integration of data from many types of experiments will be required to construct detailed, predictive models of cell, tissue or organism behaviors, and the complexity of the systems suggests the need for these models to be constructed automatically. This will require iterative cycles of acquisition, analysis, modeling, and experimental design, since it is not feasible to do all possible biological experiments. This course will cover a range of automated biological research methods, especially high-throughput screening and next-generation sequencing, and a range of relevant computational methods, especially model structure learning and active learning. It assumes a basic knowledge of machine learning. Class sessions will consist of a combination of lectures and discussions of important research papers. Grading will be based on class participation, homework and a final project.

Statistical Foundations for Bioinformatics Data Mining

This course provides an intermediate-level understanding of statistical foundations to prepare students for the competent use of data analysis methods in common practice in bioinformatics. Statistical ideas covered include probability distributions, likelihood theory, Bayesian and frequentist concepts, estimation, hypothesis testing and significance testing, multiplicity adjustments, the EM and MCMC algorithms, random walks, Poisson processes and Markov chains. Application areas include biological sequence analysis and microarray analysis. Students will learn the R statistical language. The R packages Bioconductor and BRB array tools for microarray analysis will be studied.

Introductory High-Throughput Genomics Data Analysis I: Data Mining and Applications

-

Human Population Genetics

-

Computational Methods for Proteomics and Metabolomics

Proteomics and metabolomics are the large-scale studies of proteins and metabolites, respectively. In contrast to genomes, proteomes and metabolomes vary with time and with the specific stress or conditions an organism is under. Applications of proteomics and metabolomics include determination of protein and metabolite functions (including in immunology and neurobiology) and discovery of biomarkers for disease. These applications require advanced computational methods to analyze experimental measurements, create models from them, and integrate them with information from diverse sources. This course specifically covers computational mass spectrometry, structural proteomics, proteogenomics, metabolomics, genome mining and metagenomics.

Computational Medicine

Modern medical research increasingly relies on the analysis of large patient datasets to enhance our understanding of, and our ability to treat, human diseases. This course will focus on the computational problems that arise from studies of human diseases and the translation of research to the bedside to improve human health. The topics to be covered include computational strategies for advancing personalized medicine, pharmacogenomics for predicting individual drug responses, metagenomics for learning the role of the microbiome in human health, mining electronic medical records to identify disease phenotypes, and case studies in complex human diseases such as cancer and asthma. We will discuss how machine learning and other computational methods are being used by clinicians. Class sessions will consist of lectures, discussions of papers from the literature, and guest presentations by clinicians and other domain experts. Students enrolled in 02-518 will be graded based on homework, paper summaries, and a course project. Students enrolled in 02-718 will be graded based on in-class presentations, written summaries of papers, and a course project.

Genomics and Epigenetics of the Brain

This course will provide an introduction to genomics, epigenetics, and their application to problems in neuroscience. The rapid advances in genomic technology are in the process of revolutionizing how we conduct molecular biology research. These new techniques have given us an appreciation for the role that epigenetic modifications of the genome play in gene regulation, development, and inheritance. In this course, we will cover the biological basis of genomics and epigenetics, the basic computational tools to analyze genomic data, and the application of those tools to neuroscience. Through programming assignments and reading primary literature, the material will also serve to demonstrate important concepts in neuroscience, including the diversity of neural cell types, neural plasticity, the role that epigenetics plays in behavior, and how the brain is influenced by neurological and psychiatric disorders. Although the course focuses on neuroscience, the material is accessible and applicable to a wide range of topics in biology.