Computational Genomics Specialization Area

Computational genomics seeks to make sense of the vast quantity of genomic and proteomic data now available through the systematic development and application of probability and statistical theory, information technologies, and data mining techniques. Linguistic methods are viewed as promising tools for elucidating sequence-structure-function relations and for complementing computational genomics studies. Computational genomics aims to understand gene and protein function, identify and characterize cellular regulatory networks, and discern the links between genes and diseases. The discovery and processing of this information is pivotal to the development of novel gene therapy strategies and tools.

Life Science Electives

Specialization Electives (3 credits/9 units)

Molecular Evolution

Sequencing technology is continually progressing, and genome sequences from different species and populations continue to become available in increasing numbers. Such data allows questions about molecular function and evolution to be addressed in new and exciting ways. This course introduces students to the evolutionary analysis of DNA and amino acid sequences. Lectures on theory will be accompanied by practical instruction in the use of contemporary computational methods and software. Topics include: population genetics of selection and mutation, models of sequence evolution, phylogenetic models, analysis of multiple sequence alignments for rates and patterns of divergence, inference of natural selection, and co-evolution between proteins. Emphasis is placed on quantitative modeling and the biological principles underlying observed patterns of molecular evolution. Interested students should have a basic grasp of molecular biology and calculus.

Computational Molecular Biology and Genomics

An advanced introduction to computational molecular biology, using an applied algorithms approach. The first part of the course will cover established algorithmic methods, including pairwise sequence alignment and dynamic programming, multiple sequence alignment, fast database search heuristics, hidden Markov models for molecular motifs, and phylogeny reconstruction. The second part of the course will explore emerging computational problems driven by the newest genomic research. Course work includes four to six problem sets, a midterm, and a final exam. A project based on recent results from the genomics literature will be required of students taking 03-711.
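
As an illustration of the first topic listed above, the following is a minimal sketch of global pairwise alignment scoring by dynamic programming (the Needleman-Wunsch recurrence). It is not taken from the course materials; the function name and the scoring values (match, mismatch, and a linear gap penalty) are assumptions chosen only for the example.

```python
def needleman_wunsch_score(a, b, match=1, mismatch=-1, gap=-2):
    """Return the optimal global alignment score of sequences a and b.

    The scoring scheme (match/mismatch/linear gap) is an illustrative
    assumption, not a recommended parameter set.
    """
    rows, cols = len(a) + 1, len(b) + 1
    # score[i][j] holds the best score for aligning a[:i] with b[:j].
    score = [[0] * cols for _ in range(rows)]

    # Aligning a prefix against the empty string costs one gap per residue.
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap

    for i in range(1, rows):
        for j in range(1, cols):
            substitution = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + substitution,  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,               # gap in b
                score[i][j - 1] + gap,               # gap in a
            )
    return score[-1][-1]
```

Under these assumed scores, aligning "ACGT" with "AGT" yields three matches and one gap. The table is filled in time proportional to the product of the two sequence lengths.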

Computational Methods for Biological Modeling and Simulation

This course covers a variety of computational methods important for modeling and simulation of biological systems. It is intended for graduate students and advanced undergraduates with either biological or computational backgrounds who are interested in developing computer models and simulations of biological systems. The course will emphasize practical algorithms and algorithm design methods drawn from various disciplines of computer science and applied mathematics that are useful in biological applications. The general topics covered will be models for optimization problems, simulation and sampling, and parameter tuning. Course work will include problem sets with significant programming components and independent or group final projects.

Advanced Topics in Computational Genomics

Research in biology and medicine is undergoing a revolution due to the availability of high-throughput technology for probing various aspects of a cell at a genome-wide scale. Next-generation sequencing technology allows researchers to generate large volumes of genome sequence data inexpensively. In combination with other high-throughput techniques for probing the epigenome, transcriptome, and proteome, it offers unprecedented opportunities to answer fundamental questions in cell biology and to understand disease processes, with the goal of finding treatments in medicine. The challenge in this new genomic era is to develop computational methods for integrating different data types and extracting complex patterns accurately and efficiently from a large volume of data. This course will discuss computational issues arising from high-throughput techniques recently introduced in biology, and cover very recent developments in computational genomics and population genetics, including genome structural variant discovery, association mapping, epigenome analysis, cancer genomics, and transcriptome analysis. The course material will be drawn from very recent literature. Grading will be based on weekly written critiques of the papers to be discussed in class, class participation, and a final project. The course assumes a basic knowledge of machine learning and computational genomics.

Cross-Species Systems Modeling

Model organisms have long played an important role in basic science and in the pharmaceutical industry. These organisms, ranging from yeast to worms to flies, share many processes with humans, which has made them and other animals the focus of many laboratory studies. Similarly, almost all drugs are initially tested on mice, making cross-species studies a key issue in drug development. However, many drugs that work well in mice fail in late-stage human trials. Likewise, many interactions between highly conserved proteins in one species are not conserved in others, even in closely related species. In this class we will discuss recent studies that compare and contrast genomics and functional genomics data across species, with the goal of identifying the conserved and divergent processes that are active in each of the species being studied. The class will be divided into three parts. The first will focus on sequence analysis and comparative genomics, covering issues related to whole-genome sequence alignment, motif discovery using conservation data, and miRNA identification using sequence data from multiple species. The second will focus on comparisons of a single type of functional genomics data, including gene expression, protein interactions, and protein-DNA interactions. This part will rely on recent studies of the integration of expression data across species; of combining, comparing, and aligning protein interaction networks in multiple species; and on experimental studies that compare protein-DNA interactions across species and in hybrids. In the final part of the class we will discuss methods that attempt to combine multiple functional genomics datasets for a systems biology comparison of interactions across species. Students will be required to present one or two papers and to complete a class project in which they compare or contrast genomics data across species.

Automation of Biological Research

Biology has been revolutionized by automated methods for generating large amounts of data on diverse biological processes. This, in addition to the finding that many more components are involved in each process than had earlier been thought, has led to a transition from a reductionist paradigm of biological research, involving detailed study of single molecules or events, to a systems biology paradigm involving comprehensive, systematic studies combined with computational data analysis. Integration of data from many types of experiments will be required to construct detailed, predictive models of cell, tissue, or organism behaviors, and the complexity of the systems suggests the need for these models to be constructed automatically. This will require iterative cycles of acquisition, analysis, modeling, and experimental design, since it is not feasible to do all possible biological experiments. This course will cover a range of automated biological research methods, especially high-throughput screening and next-generation sequencing, and a range of relevant computational methods, especially model structure learning and active learning. It assumes a basic knowledge of machine learning. Class sessions will consist of a combination of lectures and discussions of important research papers. Grading will be based on class participation, homework, and a final project.

Statistical Foundations for Bioinformatics Data Mining

This course provides an intermediate-level understanding of statistical foundations to prepare students for the competent use of data analysis methods in common practice in bioinformatics. Statistical ideas covered include probability distributions, likelihood theory, Bayesian and frequentist concepts, estimation, hypothesis testing and significance testing, multiplicity adjustments, the EM and MCMC algorithms, random walks, Poisson processes, and Markov chains. Application areas include biological sequence analysis and microarray analysis. Students will learn the R statistical language. The R packages Bioconductor and BRB-ArrayTools for microarray analysis will be studied.
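
As an illustration of one of the listed ideas, the sketch below shows a multiplicity adjustment using the Benjamini-Hochberg procedure for controlling the false discovery rate. The description above does not say which adjustments the course covers, so the choice of this procedure, the function name, and the use of Python (the course itself uses R) are assumptions made only for the example.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values, in the input order.

    Illustrative sketch only: each p-value is scaled by m / rank, and a
    running minimum taken from the largest rank downward keeps the
    adjusted values monotone.
    """
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value (rank m) down to the smallest (rank 1).
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted
```

A test is then called significant at false discovery rate q when its adjusted p-value is at most q. In a microarray setting, the input would typically be one p-value per gene.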

Practical Analysis of High-Throughput Genomic & Proteomic Data

This course provides an in-depth comparative study of methods for the analysis and interpretation of high-throughput genomic and proteomic data sources. Using a broad survey of the literature, the student will learn approaches to normalization and transformation, finding predictive biomarkers, classification, cross-validation, and functional interpretation. Ways to integrate diverse data sources, including clinical outcomes, are explored. The course includes lectures, exercises in the use of publicly available software, and intensive experience in the analysis and interpretation of published data sets.

Introductory High-Throughput Genomics Data Analysis I: Data Mining and Applications

-

Statistical Methods and Data Mining in Microarray Analysis

Introduces the student to specialized topics that are not covered in the formal curriculum.

Human Population Genetics

-

Quantitative Genetics

-

Linkage Analysis

-

Bioinformatics of Gene Regulation

This is a graduate-level course designed primarily for students who want to learn about the computational methods and tools used in the analysis of promoter regions and transcription regulation data. Students with a biological background and knowledge of introductory-level statistics can participate, as can students with a quantitative background. The course will primarily focus on the methods used to identify transcription factor binding sites in the promoter regions of genes. Both sequence-based and structure-based methods will be discussed. Various technologies for data collection will also be presented, including DNA arrays, SELEX, ChIP, and their derivatives.

Bioinformatics of Cancer Biology and Therapeutics

Reading and discussion on bioinformatics resources available to enhance research on cancer biology and therapeutics. We will discuss bioinformatics databases and other resources related to: regulatory networks and signal transduction pathways; genes associated with cancer risk and the progression of cancer; cytogenomics; sources of information on the distribution of cancer occurrence and trends in the US population; databases of DNA repair genes, their structure and function; models of cancer progression and responses to therapy; and biomarkers for cancer detection, treatment, and prevention.