Carnegie Mellon University

Core Courses

All students will be required to take five core graduate courses. The core courses are:

It is hard to imagine anything more fascinating than automated systems that improve their own performance. The study of learning from data is commercially and scientifically important. This course is designed to give graduate-level students a thorough grounding in the methodologies, technologies, mathematics and algorithms currently needed by people who do research in learning and data mining, or who may need to apply learning or data mining techniques to a target problem. The topics of the course draw from classical statistics, machine learning, data mining, Bayesian statistics and statistical algorithmics. Students entering the class with a working knowledge of probability, statistics and algorithms will be at an advantage, but the class has been designed so that anyone with a strong numerate background can catch up and fully participate.

This is an upper-level critical paper reading and discussion course in the area of experimental genomics. Introductory lectures will be interspersed within the topic blocks, with most of the meeting time devoted to critical discussion of assigned journal articles. Students will be responsible for leading class discussions and fielding questions on those articles. At the end of the semester, students will present their final proposals in written and oral form.

The course will meet for two 2-hour sessions per week for six weeks. If necessary, multiple sections will be offered to keep class sizes small enough for productive discussion.

Goals:

  • Developing skills for critical reading and thinking about the primary literature
  • Gaining experience and comfort with robust in-person scientific discussion and civil debate
  • Gaining experience and comfort with leading discussions
  • Gaining experience and comfort with the public presentation of original scientific ideas
  • Building a fund of knowledge around large-scale biological data and its computational analysis

A graduate-level introduction to the mathematical modeling and analysis of biological systems at the molecular, cellular and other levels. This condensed and broad course conveys the unity of the modeling methodology in biology. It spans a range of perspectives derived from the disciplines in which this new area of research originated: biology, mathematics, engineering, and computer science. The systems covered include quantitative physiology, quantitative cell biology, computational structural biology, biophysical modeling, biological networks, dynamic systems, cell mechanics, and systems modeling of critical illness.

The course will survey computational methods and models that are broadly useful across the various system types examined. These will include random walk models, statistical mechanics, molecular dynamics, master equations, and continuous and discrete models of chemistry at the cellular and molecular levels.

Across the entire range of topics and systems, the universality of the systems modeling methodology and its role in biomedical research are emphasized.

This course focuses on advanced algorithms and algorithm design techniques for classes of algorithmic problems that frequently arise in biological contexts. On completing the course, students will be familiar with the algorithmic toolkit at their disposal and will master the art of combining algorithmic building blocks to devise solutions to novel problems in the biological domain. The course presents both provably optimal algorithms and practical heuristics. Students will learn both proof techniques and procedures for the empirical assessment of algorithmic performance.

Algorithmic families that will be presented include efficient algorithms for manipulation of strings, graphs, trees, images, and other abstractions of biological data. Modern algorithmic techniques such as succinct data structures, amortized and probabilistic analysis of algorithms, sketching, and convex optimization will be discussed for solving large-scale biological problems.

The discussion of each of these algorithmic concepts will be motivated by and tightly integrated with approaches to one or more biological problems such as the following:

  • Read mapping, genomic sequence assembly and variant detection (succinct data structures, amortized and probabilistic analysis of algorithms)
  • Gene expression quantification (convex optimization)
  • Whole genome alignments (heuristics, string algorithms)
  • Biological network analysis (graph algorithms)
  • Clustering data and inferring phylogenies (sketching)
  • Image analysis (signal processing)
  • Molecular mechanics (convex optimization, global search strategies)

Each offering of the course will contain new topics in response to algorithmic challenges that arise as new types of biological data and innovations in the field of algorithms emerge.

The course will take a mathematical viewpoint, focusing on provable properties of algorithms and data structures. Although some topics may introduce heuristics that work well empirically, these supplement the main focus of the course: the provable theoretical properties of algorithms.

Computational biologists frequently focus on analyzing and modeling large amounts of biological data, often from high-throughput assays or diverse sources. It is therefore critical that students training in computational biology be familiar with the paradigms and methods of experimentation and measurement that produce these data. This one-semester laboratory course gives students a deeper appreciation of the principles and challenges at the interface of biological experimentation and the computation required to analyze the resulting data. Students learn a range of topics, including experimental design, structural biology, next-generation sequencing, genomics, proteomics, bioimaging, and high-content screening. Class sessions are primarily devoted to designing experiments, performing them in the lab using the above techniques, and analyzing the resulting data. Students are required to summarize their results in written abstracts and in oral presentations given at class-hosted lab meetings. With its emphasis on the basics of experimentation and on broad views of multiple cutting-edge and high-throughput techniques, this course is appropriate both for students who have never taken a traditional undergraduate biology lab course and for those who have and are looking for introductory training in more advanced approaches. Grading: Letter grade based on class participation, experimental design assignments, and written and oral presentations.

As an alternative to 02-760, 02-761 and 02-762 can be taken in sequence to fulfill both the lab methods requirement and a specialization elective.

In the 02-761 and 02-762 sequence, students will receive the same content as in 02-760, and they will also learn the essential technical and biological laboratory skills used to design and execute automated biological experiments. Students will learn the principles, experimental paradigms, and techniques for automating biological experimentation, with the goal of enabling its complete automation (AI-driven robotic experimentation). Students will learn the design concepts for automated experiments, the engineering of hardware for sample preparation and automated data collection, and the software for controlling that hardware. Instruments used will include liquid handling robots, plate readers, and automated microscopes. All experiments will be performed in the CMU Automation Lab, the first Automation Lab designed specifically for training students in biological laboratory automation.